Configure Failover Behavior¶
When an SD-WAN member fails (SLA out of bounds), what happens depends on your rules and failover settings. Tune the response speed and recovery behavior to match your tolerance.
Key Settings¶
Failure detection time¶
config system sdwan
    config health-check
        edit "Google-DNS-Health"
            # Milliseconds between probes
            set interval 500
            # Consecutive failed probes before the member is marked down
            set failtime 5
            # Consecutive successful probes before the member is marked up
            set recoverytime 5
        next
    end
end
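Detection time is roughly interval × failtime:
- Default: 500 ms × 5 = 2.5 seconds to detect failure.
- Aggressive: 200 ms × 3 = 0.6 seconds (faster failover, but more sensitive to brief loss and more prone to flapping).
- Lazy: 1000 ms × 10 = 10 seconds (less flapping, slower failover).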
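The arithmetic above can be reproduced with a small helper when comparing candidate settings. This is a minimal sketch: the function name and the profile labels are illustrative, and it assumes detection time is simply interval × failtime, ignoring any per-probe timeout.

```python
def detection_time_s(interval_ms: int, failtime: int) -> float:
    """Approximate seconds until a member is marked down.

    Assumes failtime consecutive probes fail, one every interval_ms.
    """
    return interval_ms * failtime / 1000


# (interval in ms, failtime) pairs taken from the three profiles in the text
profiles = {
    "default": (500, 5),
    "aggressive": (200, 3),
    "lazy": (1000, 10),
}

for name, (interval_ms, failtime) in profiles.items():
    print(f"{name}: {detection_time_s(interval_ms, failtime):.1f} s")
# default: 2.5 s
# aggressive: 0.6 s
# lazy: 10.0 s
```

The same calculation with recoverytime in place of failtime gives the approximate time to mark a member up again.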
Hold timer (prevent flapping)¶
config system sdwan
    config service
        edit 1
            # Seconds to wait before moving traffic back to a recovered member
            set hold-down-time 30
        next
    end
end
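The hold-down timer keeps traffic on the backup member for the configured period after the primary recovers, which prevents oscillation on a borderline link.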
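To see why a hold-down helps, the following sketch simulates a borderline primary link and counts how often traffic switches members. It is a simplified model, not FortiOS's actual implementation: the function `count_switches`, its state machine, and the probe pattern are assumptions made for illustration.

```python
def count_switches(probes, interval_ms=500, failtime=5, recoverytime=5, hold_down_s=0):
    """Count traffic switches between primary and backup.

    probes: iterable of booleans, True when a health-check probe succeeds.
    """
    alive = True        # health-check state of the primary
    on_primary = True   # where traffic currently flows
    fails = oks = 0     # consecutive failure / success counters
    up_since_ms = 0     # time the primary was last marked up
    switches = 0

    for i, ok in enumerate(probes):
        now_ms = (i + 1) * interval_ms
        if ok:
            oks, fails = oks + 1, 0
        else:
            fails, oks = fails + 1, 0

        if alive and fails >= failtime:
            alive = False
            if on_primary:          # fail over to the backup
                on_primary = False
                switches += 1
        elif not alive and oks >= recoverytime:
            alive = True
            up_since_ms = now_ms    # hold-down starts counting here

        # Return to the primary only once it has stayed up for the hold-down
        if alive and not on_primary and now_ms - up_since_ms >= hold_down_s * 1000:
            on_primary = True
            switches += 1
    return switches


# Borderline link: 5 failed probes, then 5 good ones, repeated 6 times
pattern = ([False] * 5 + [True] * 5) * 6
print(count_switches(pattern))                  # 12
print(count_switches(pattern, hold_down_s=30))  # 1
```

In this model the link never stays healthy for 30 seconds, so with the hold-down traffic fails over once and stays on the backup instead of bouncing on every recovery.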
Per-rule behavior on SLA failure¶
In an SD-WAN rule:
- Auto: automatically fails over to the next member in the priority list.
- Manual: stays on the configured member even when it fails its SLA (use cautiously).
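The difference between the two behaviors can be modeled in a few lines. This sketch only mirrors the description above; `pick_member` and its arguments are hypothetical names, and the fallback when no member meets its SLA is an assumption, not documented FortiOS behavior.

```python
def pick_member(mode, members, sla_ok):
    """Return the member that carries traffic.

    members: member names ordered by priority, highest first.
    sla_ok:  dict mapping member name to True when its SLA is met.
    """
    if mode == "manual":
        # Manual: always the configured (first) member, healthy or not
        return members[0]
    if mode == "auto":
        # Auto: first member in priority order whose SLA is met
        for member in members:
            if sla_ok[member]:
                return member
        # Assumption: if nothing is healthy, stay on the first member
        return members[0]
    raise ValueError(f"unknown mode: {mode}")


state = {"wan1": False, "wan2": True}   # wan1 is out of SLA
print(pick_member("auto", ["wan1", "wan2"], state))    # wan2
print(pick_member("manual", ["wan1", "wan2"], state))  # wan1
```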
Visual Failover Test¶
- Network → SD-WAN → SD-WAN Members: note which member is currently in use.
- Physically unplug wan1.
- Within ~2-5 seconds, the members reshuffle and traffic moves to wan2.
- Plug wan1 back in. After recoverytime (and any hold-down), traffic returns.
Logging Failover Events¶
By default, SD-WAN logs major state changes:
# Show recent event-log entries, which include SD-WAN failover events
# (category 1 is the event log)
execute log filter category 1
execute log display
Or in the GUI: Log & Report → System Events, filter by subtype = sdwan.
CLI Equivalent¶
config system sdwan
    set status enable
    set neighbor-hold-down enable
    set neighbor-hold-down-time 30
end
Common Issues¶
- Failover takes 30+ seconds. The health check is too slow. Lower interval and failtime.
- Failover is instant but app sessions still drop. Stateful sessions cannot survive a change of WAN IP. Keep the same external IP across members (if it is BGP-advertised), or accept the reset for non-persistent apps.
- Constant flapping. The health check is too aggressive, or the link quality is genuinely borderline. Increase failtime and hold-down-time.
- wan1 came back but traffic stays on wan2. No "preferred" member is configured. Set the priority in the rule.