Avoid losing MRR to gateway outages, timeouts, and PSP-specific hiccups. In one sprint you can add fallback routing for recurring charges—without rebuilding your billing system.
What “fallback for subscriptions” is (and when to use it)
A fallback subscription charge attempts Gateway A first and, on specific platform signals (PSP error, timeout, degraded latency) or policy (issuer/BIN/region with poor renewal performance), automatically retries via Gateway B—only when it’s safe and compliant.
Use it for
- PSP/platform failures (5xx, timeouts, maintenance/incidents).
- Known bad cohorts (BIN/country/issuer) with low renewal approval rates.
- Latency spikes breaching your SLO for charge attempts.
Do not use it for
- Hard issuer declines (e.g., invalid card, stolen card).
- Funds/limit issues where dunning cadence is the right lever.
Fallback complements dunning; it doesn’t replace it.
Stored credentials & compliance (CIT/MIT)
Subscriptions rely on card-on-file rules:
- CIT at signup (customer-initiated), MIT for renewals (merchant-initiated).
- When routing to another PSP, preserve stored-credential indicators, original transaction/reference IDs, and the transaction reason (recurring).
- Prefer network tokens or a portable vault so payment methods remain usable across gateways.
Guardrail: if a token isn’t portable, route only when you have a compatible token or a network token for that PAN.
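As a rough illustration of that guardrail, the sketch below builds a renewal (MIT) request only when the target gateway can actually use the stored card. Every name here (StoredCard, build_mit_request, the request fields) is hypothetical and not taken from any PSP's API; real gateways each have their own field names for stored-credential indicators.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StoredCard:
    # All field names are hypothetical, for illustration only.
    network_token: Optional[str]   # usable on any gateway that accepts network tokens
    gateway_tokens: dict           # gateway name -> PSP-specific token
    original_reference_id: str     # reference from the CIT made at signup


def build_mit_request(card: StoredCard, gateway: str,
                      amount_minor: int, currency: str) -> Optional[dict]:
    """Return a renewal request for `gateway`, or None if no usable token exists."""
    # Prefer the gateway's own token, then fall back to a network token.
    token = card.gateway_tokens.get(gateway) or card.network_token
    if token is None:
        return None  # guardrail: never route to a gateway that cannot use this card
    return {
        "gateway": gateway,
        "token": token,
        "amount_minor": amount_minor,
        "currency": currency,
        "initiator": "merchant",   # MIT, not CIT
        "reason": "recurring",
        "original_reference_id": card.original_reference_id,
    }
```

A caller would treat a None result as "skip fallback for this invoice" and leave the invoice to the normal dunning flow.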
Orchestration middleware for recurring charges
Place a thin layer between your billing engine and PSPs.
Core responsibilities
- Routing policy: A → B based on PSP signals or issuer cohort rules.
- Idempotency per invoice/period: key on invoice_id + attempt_ordinal to prevent double charges across gateways.
- Normalization: unify states/codes (authorized, captured, soft/hard decline).
- Observability: emit events with labels (gateway, BIN, country, plan, attempt).
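To make the normalization and observability responsibilities concrete, here is a minimal sketch that maps gateway-specific codes onto one shared set of outcomes and emits a labeled event. The gateway names and raw codes in the table are placeholders chosen for the example; a real map would come from each PSP's documentation.

```python
from enum import Enum


class Outcome(Enum):
    AUTHORIZED = "authorized"
    CAPTURED = "captured"
    SOFT_DECLINE = "soft_decline"
    HARD_DECLINE = "hard_decline"
    UNKNOWN = "unknown"  # unmapped codes; treat conservatively (no fallback)


# Hypothetical (gateway, raw code) pairs, for illustration only.
RAW_CODE_MAP = {
    ("gateway_a", "approved"): Outcome.AUTHORIZED,
    ("gateway_a", "settled"): Outcome.CAPTURED,
    ("gateway_a", "insufficient_funds"): Outcome.SOFT_DECLINE,
    ("gateway_a", "stolen_card"): Outcome.HARD_DECLINE,
    ("gateway_b", "00"): Outcome.AUTHORIZED,
    ("gateway_b", "51"): Outcome.SOFT_DECLINE,
    ("gateway_b", "43"): Outcome.HARD_DECLINE,
}


def normalize(gateway: str, raw_code: str) -> Outcome:
    """Translate a gateway-specific code into the shared outcome vocabulary."""
    return RAW_CODE_MAP.get((gateway, raw_code), Outcome.UNKNOWN)


def to_event(gateway: str, raw_code: str, *, bin_prefix: str,
             country: str, plan: str, attempt: int) -> dict:
    """Build an observability event carrying the labels used for slicing metrics."""
    return {
        "outcome": normalize(gateway, raw_code).value,
        "gateway": gateway,
        "bin": bin_prefix,
        "country": country,
        "plan": plan,
        "attempt": attempt,
    }
```

Keeping the map as data rather than branching logic makes it easier to review per gateway and to extend when a PSP adds codes.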
Activation signals (subscription-friendly)
- PSP/platform: 5xx, timeout, degraded status/health flag.
- Cohort policy: issuer/BIN/region below target renewal rate.
- Latency: p95 over threshold during charge window.
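The three signal families above could be evaluated together along these lines. This is a sketch under assumptions: the function and parameter names are invented for the example, the p95 uses the nearest-rank method, and thresholds are left to the caller.

```python
def p95(samples_ms: list) -> float:
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not samples_ms:
        raise ValueError("p95 needs at least one sample")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) using integer math
    return ordered[rank - 1]


def activation_reasons(*, status_code, timed_out, gateway_degraded,
                       recent_latencies_ms, p95_threshold_ms,
                       cohort_renewal_rate, target_renewal_rate) -> list:
    """Return the list of signals that would justify routing to Gateway B."""
    reasons = []
    # PSP/platform signals
    if timed_out:
        reasons.append("timeout")
    elif status_code is not None and 500 <= status_code <= 599:
        reasons.append("psp_5xx")
    if gateway_degraded:
        reasons.append("degraded")
    # Latency signal over the recent charge window
    if recent_latencies_ms and p95(recent_latencies_ms) > p95_threshold_ms:
        reasons.append("latency_p95")
    # Cohort policy signal
    if cohort_renewal_rate < target_renewal_rate:
        reasons.append("cohort_policy")
    return reasons
```

An empty list means no activation signal fired, so the charge stays on Gateway A.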
Error mapping table (example)
Signal | Action | Note |
---|---|---|
PSP 5xx on auth/capture | Retry on B | 1 retry max |
Timeout (>SLO) | Retry on B | Short backoff (100–300 ms) |
BIN cohort < target renewal rate | Route to B | Policy-based |
Hard issuer decline (stolen/invalid) | No fallback | Trigger dunning flow |
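
One way to keep a table like this reviewable is to store it as data and look actions up from it. The sketch below mirrors the four example rows; the enum and function names are made up for illustration, and the jittered backoff simply picks a value in the 100–300 ms range mentioned in the table.

```python
import random
from enum import Enum


class Signal(Enum):
    PSP_5XX = "psp_5xx"
    TIMEOUT = "timeout"
    COHORT_BELOW_TARGET = "cohort_below_target"
    HARD_DECLINE = "hard_decline"


class Action(Enum):
    RETRY_ON_B = "retry_on_b"
    ROUTE_TO_B = "route_to_b"
    NO_FALLBACK = "no_fallback"


# Mirrors the example table row by row.
ACTION_TABLE = {
    Signal.PSP_5XX: Action.RETRY_ON_B,
    Signal.TIMEOUT: Action.RETRY_ON_B,
    Signal.COHORT_BELOW_TARGET: Action.ROUTE_TO_B,
    Signal.HARD_DECLINE: Action.NO_FALLBACK,
}


def action_for(signal: Signal) -> Action:
    """Look up the action; anything unmapped defaults to no fallback."""
    return ACTION_TABLE.get(signal, Action.NO_FALLBACK)


def backoff_seconds(signal: Signal, rng=random) -> float:
    """Short jittered backoff (100-300 ms) before retrying after a timeout."""
    if signal is Signal.TIMEOUT:
        return rng.uniform(0.100, 0.300)
    return 0.0
```

Defaulting unknown signals to no fallback keeps the behavior conservative when a gateway starts returning something the map does not cover yet.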
Retry policy (safe defaults)
- Max 1 cross-gateway retry per attempt window.
- Never retry on hard declines/fraud signals.
- Log route + result to attribute renewal lift to fallback (vs. dunning).
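These defaults fit in a few lines of orchestration code. The following is a simplified sketch, not a production implementation: gateways are modeled as plain callables returning a normalized result string, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Only platform-side failures are eligible for a cross-gateway retry.
RETRYABLE = {"psp_5xx", "timeout"}


@dataclass
class Attempt:
    gateway: str
    result: str  # e.g. "captured", "psp_5xx", "timeout", "soft_decline", "hard_decline"


Gateway = Tuple[str, Callable[[str], str]]  # (name, charge function)


def charge_with_fallback(invoice_id: str, primary: Gateway, fallback: Gateway,
                         fallback_enabled: bool = True) -> List[Attempt]:
    """Charge on the primary gateway, then at most once on the fallback."""
    attempts: List[Attempt] = []

    name, charge = primary
    result = charge(invoice_id)
    attempts.append(Attempt(name, result))

    # Declines and fraud signals never reach this branch, so they are never retried.
    # A real system would also confirm that a timed-out charge did not actually
    # succeed on the primary before charging elsewhere; that check is omitted here.
    if fallback_enabled and result in RETRYABLE:
        name, charge = fallback
        attempts.append(Attempt(name, charge(invoice_id)))

    return attempts  # route + result log, usable for attribution
```

Because the fallback result is never inspected for another retry, the function cannot make more than one cross-gateway attempt per call.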
Idempotency & scheduling for renewals
- Keying: customer_id + invoice_id + attempt_ordinal (and persist across gateways).
- Windows: align retries with issuer-friendly hours; don’t collide with your dunning cadence.
- Separation of concerns: fallback handles platform failures; dunning handles issuer/funds issues.
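A key of this shape might be derived and enforced as sketched below. The ledger is an in-memory stand-in invented for the example; a real one would need a durable store with an atomic claim so that concurrent workers cannot both charge.

```python
import hashlib
import json


def idempotency_key(customer_id: str, invoice_id: str, attempt_ordinal: int) -> str:
    """Derive one stable key per customer, invoice, and attempt."""
    # JSON-encode the parts so ("a:b", "c") and ("a", "b:c") cannot collide.
    payload = json.dumps([customer_id, invoice_id, attempt_ordinal])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ChargeLedger:
    """Hypothetical in-memory ledger shared by all gateway adapters."""

    def __init__(self):
        self._captured_by = {}  # key -> gateway that captured the charge

    def charge_once(self, key: str, gateway: str, charge_fn):
        """Run charge_fn unless this key has already been captured somewhere."""
        if key in self._captured_by:
            return self._captured_by[key], "captured"
        result = charge_fn(key)  # the same key is passed to whichever gateway runs
        if result == "captured":
            self._captured_by[key] = gateway
        return gateway, result
```

Only successful captures are recorded, so a failed attempt on Gateway A still leaves room for the fallback on Gateway B under the same key, while a repeat call after success returns the stored result instead of charging again.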
Observability: subscription KPIs that matter
- Renewal success rate (overall, by gateway/BIN/country/plan).
- Involuntary churn (pre/post fallback).
- Timeout rate and latency p95/p99 during charge windows.
- Route distribution (% using fallback).
- Attribution: renewal lift from fallback vs. dunning.
Practical KPI target: +1–4 percentage points of renewal lift with <150 ms added latency during charge windows. (Typical range; your results will vary.)
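If each invoice's final outcome is emitted as a labeled event, most of these KPIs reduce to simple aggregations. The sketch below assumes a hypothetical event shape with "result" and "route" fields plus whatever labels you slice by; it is meant to show the arithmetic, not a metrics pipeline.

```python
from collections import defaultdict


def success_rate_by(events: list, label: str) -> dict:
    """Renewal success rate grouped by one label (gateway, BIN, country, plan)."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for event in events:
        totals[event[label]] += 1
        if event["result"] == "captured":
            wins[event[label]] += 1
    return {value: wins[value] / totals[value] for value in totals}


def fallback_share(events: list) -> float:
    """Fraction of invoices whose final attempt went through the fallback route."""
    if not events:
        return 0.0
    return sum(1 for e in events if e["route"] == "fallback") / len(events)


def fallback_lift_pp(events: list) -> float:
    """Percentage points of all invoices that were captured via fallback."""
    if not events:
        return 0.0
    saved = sum(1 for e in events
                if e["route"] == "fallback" and e["result"] == "captured")
    return 100.0 * saved / len(events)
```

For example, if 1 of 4 invoices in a window was captured on the fallback route, fallback_lift_pp reports 25.0, which is the share of renewals attributable to fallback rather than dunning.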
Tests & canary
- Simulate PSP outage/timeout on A in staging.
- Canary by cohort (e.g., 10% of invoices, or specific issuers/regions).
- Feature flags to toggle policies instantly.
- Rollback: single switch to disable cross-gateway retries.
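A percentage canary with a kill switch can be as small as the sketch below. It assumes a hypothetical flags dictionary (the flag names are invented here) and buckets invoices deterministically by hashing the invoice ID, so the same invoice always lands in the same group.

```python
import hashlib


def in_canary(invoice_id: str, percent: int, salt: str = "fallback-canary-v1") -> bool:
    """Deterministically place an invoice into one of 100 buckets."""
    digest = hashlib.sha256(f"{salt}:{invoice_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


def fallback_allowed(invoice_id: str, flags: dict) -> bool:
    """Apply the rollback switch first, then the canary percentage."""
    # Hypothetical flag names; missing flags default to the safe setting.
    if not flags.get("cross_gateway_retry_enabled", False):
        return False  # single switch that disables cross-gateway retries
    return in_canary(invoice_id, flags.get("canary_percent", 0))
```

Changing the salt reshuffles which invoices are in the canary, which can be useful when starting a new experiment.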
Expected outcomes
- Higher renewal rate where PSP incidents used to kill cycles.
- Lower involuntary churn from platform-side failures.
- Clear attribution of what saved the invoice (fallback vs. dunning).
Quick checklist
- Portable token strategy (network tokens or shared vault)
- Error/timeout map per gateway (subscriptions context)
- Idempotency keys per invoice/attempt
- Metrics with labels (gateway, BIN, country, plan, route)
- Canary + feature flags + rollback playbook
CTA
Create your account and enable subscription fallback today.
Ship a safe canary in one sprint with built-in error maps and retries.