Architecture Documentation
System Overview
Route-Switcher is a network failover system that runs as a user-space application to provide automatic network redundancy. It monitors connectivity over two network interfaces (primary and secondary) with ICMP probes and adjusts the kernel routing table so that traffic keeps flowing when the primary link fails.
Component Architecture
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Main Thread   │    │  Async Pingers   │    │  Route Manager  │
│                 │    │                  │    │                 │
│ • State Machine │◄──►│ • Interface A    │◄──►│ • Netlink API   │
│ • Decision Logic│    │ • Interface B    │    │ • Route Add/Del │
│ • Coordination  │    │ • ICMP Monitoring│    │ • Metric Mgmt   │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
                        ┌──────────────────┐
                        │   Linux Kernel   │
                        │                  │
                        │ • Routing Table  │
                        │ • Network Stack  │
                        │ • Netlink Socket │
                        └──────────────────┘
Data Flow
- Monitoring Phase
  - Async pingers send ICMP packets via both interfaces
  - Results are collected and sent to main thread
  - State machine evaluates connectivity patterns
- Decision Phase
  - State machine determines if failover is needed
  - Verifies secondary interface health
  - Triggers route changes if conditions are met
- Action Phase
  - Route manager updates kernel routing table
  - Changes are applied via netlink interface
  - System continues monitoring in new state
State Machine Design
States
- Boot: Initial state, gathering connectivity data
- Primary: Using primary interface for routing
- Fallback: Using secondary interface for routing
Transitions
- Boot → Primary: After 10 seconds of sampling (regardless of ping results)
- Primary → Fallback: After 3 consecutive ping failures on the primary, provided the secondary is healthy
- Fallback → Primary: After 60 seconds of stable primary connectivity
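The transition rules above can be sketched as a pure function over the current state and a snapshot of what the monitor has observed. This is only an illustration: the names `State`, `Observation`, and `next_state`, and the fields of `Observation`, are assumptions and not taken from the project. The thresholds (10 seconds, 3 failures, 60 seconds) come from the list above.

```rust
use std::time::Duration;

/// The three states described in this section.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Boot,
    Primary,
    Fallback,
}

/// Hypothetical snapshot of what the monitor knows at decision time.
#[derive(Debug, Clone, Copy)]
pub struct Observation {
    /// Time spent in the current state.
    pub time_in_state: Duration,
    /// Consecutive failed pings on the primary interface.
    pub primary_failures: u32,
    /// How long the primary has been continuously reachable.
    pub primary_stable_for: Duration,
    /// Whether the secondary interface currently answers pings.
    pub secondary_healthy: bool,
}

const BOOT_SAMPLING: Duration = Duration::from_secs(10);
const FAILURE_THRESHOLD: u32 = 3;
const FAILBACK_STABILITY: Duration = Duration::from_secs(60);

/// Returns the state to move to, or the current state if no rule fires.
pub fn next_state(current: State, obs: &Observation) -> State {
    match current {
        // Boot always ends after the sampling window, whatever the ping results.
        State::Boot if obs.time_in_state >= BOOT_SAMPLING => State::Primary,
        // Fail over only when the secondary can actually carry traffic.
        State::Primary
            if obs.primary_failures >= FAILURE_THRESHOLD && obs.secondary_healthy =>
        {
            State::Fallback
        }
        // Fail back once the primary has been stable long enough.
        State::Fallback if obs.primary_stable_for >= FAILBACK_STABILITY => State::Primary,
        other => other,
    }
}
```

Keeping the decision free of I/O means it can be checked with plain values, which fits the "State machine logic verification" goal listed under Unit Tests.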
Routing Behavior
- Boot State: Both routes are set up initially - primary (metric 10) and secondary (metric 20)
- Primary State: Primary route (metric 10) and secondary route (metric 20) present
- Fallback State: All three routes present - primary (metric 10), secondary (metric 20), and failover secondary (metric 5)
- Exit: Only the failover route (metric 5) is removed
Route Management Strategy
The system follows a "both routes always present, extra failover on-demand" approach:
- Initialization: Set up primary route (metric 10) and secondary route (metric 20)
- Boot Phase: Collect 10 seconds of ping samples to establish baseline connectivity
- Normal Operation: Primary route serves traffic (metric 10), secondary available as backup (metric 20)
- Failover: Add an extra secondary route with the highest priority (metric 5, lower than both base metrics, so the kernel prefers it) for immediate failover
- Failback: Remove extra failover route when primary recovers
- Cleanup: Only remove the extra failover route on exit, preserving base routes
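To make the strategy concrete, the sketch below computes which default routes should be present and which of them carries traffic. It relies on the fact that, among routes to the same destination, Linux prefers the one with the lowest metric. The types `Link` and `DefaultRoute` and the functions `desired_routes` and `preferred_route` are hypothetical names used for illustration; only the metric values come from the text.

```rust
use std::net::Ipv4Addr;

// Metrics taken from the text; the route with the lowest metric is preferred.
pub const PRIMARY_METRIC: u32 = 10;
pub const SECONDARY_METRIC: u32 = 20;
pub const FAILOVER_METRIC: u32 = 5;

/// Hypothetical description of one uplink (not a type from the project).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Link {
    pub gateway: Ipv4Addr,
    pub interface: String,
}

/// Hypothetical in-memory view of a default route.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DefaultRoute {
    pub gateway: Ipv4Addr,
    pub interface: String,
    pub metric: u32,
}

fn route(link: &Link, metric: u32) -> DefaultRoute {
    DefaultRoute {
        gateway: link.gateway,
        interface: link.interface.clone(),
        metric,
    }
}

/// Routes expected in the table: both base routes always, plus the extra
/// metric-5 route through the secondary while failover is active.
pub fn desired_routes(primary: &Link, secondary: &Link, failover_active: bool) -> Vec<DefaultRoute> {
    let mut routes = vec![
        route(primary, PRIMARY_METRIC),
        route(secondary, SECONDARY_METRIC),
    ];
    if failover_active {
        routes.push(route(secondary, FAILOVER_METRIC));
    }
    routes
}

/// The route that carries traffic: the one with the lowest metric.
pub fn preferred_route(routes: &[DefaultRoute]) -> Option<&DefaultRoute> {
    routes.iter().min_by_key(|r| r.metric)
}
```

Because the base routes are never touched, failback and cleanup reduce to deleting a single route, and an unclean exit leaves the host with a working primary and secondary default route.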
State Persistence
- Current state is maintained in memory
- State changes are logged for debugging
- No persistent storage required (state rebuilds on restart)
Interface Design
Pinger Interface
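use std::net::Ipv4Addr;
// `PingResult` and `Receiver` are defined elsewhere (not shown here).
pub trait Pinger {
    // Send a single probe to `target` through `interface`.
    async fn ping(&self, target: Ipv4Addr, interface: &str) -> PingResult;
    // Start continuous monitoring; results arrive on the returned channel.
    async fn start_monitoring(&self, targets: &[Ipv4Addr], interfaces: &[String]) -> Receiver<PingResult>;
}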
Route Manager Interface
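use std::net::Ipv4Addr;
// `Result` and `RouteInfo` are defined elsewhere (not shown here).
pub trait RouteManager {
    // Install a default route via `gateway` on `interface` with the given metric.
    fn add_default_route(&self, gateway: Ipv4Addr, interface: &str, metric: u32) -> Result<()>;
    // Remove the default route matching `gateway`, `interface`, and `metric`.
    fn delete_default_route(&self, gateway: Ipv4Addr, interface: &str, metric: u32) -> Result<()>;
    // Return the routes currently installed.
    fn get_current_routes(&self) -> Result<Vec<RouteInfo>>;
}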
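One benefit of putting route manipulation behind a trait is that the state machine can be exercised without touching the kernel. Below is a minimal sketch of an in-memory test double. The document does not show the `Result` alias or the `RouteInfo` fields, so both are assumed here, and `InMemoryRouteManager` is a hypothetical name; the trait is repeated only to keep the example self-contained.

```rust
use std::net::Ipv4Addr;
use std::sync::Mutex;

/// Assumed error alias; the project's real `Result` type is not shown.
pub type Result<T> = std::result::Result<T, String>;

/// Assumed shape of `RouteInfo`; the real fields are not shown.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RouteInfo {
    pub gateway: Ipv4Addr,
    pub interface: String,
    pub metric: u32,
}

pub trait RouteManager {
    fn add_default_route(&self, gateway: Ipv4Addr, interface: &str, metric: u32) -> Result<()>;
    fn delete_default_route(&self, gateway: Ipv4Addr, interface: &str, metric: u32) -> Result<()>;
    fn get_current_routes(&self) -> Result<Vec<RouteInfo>>;
}

/// Test double that keeps routes in a vector instead of talking to netlink.
/// The mutex provides interior mutability because the trait takes `&self`.
#[derive(Default)]
pub struct InMemoryRouteManager {
    routes: Mutex<Vec<RouteInfo>>,
}

impl RouteManager for InMemoryRouteManager {
    fn add_default_route(&self, gateway: Ipv4Addr, interface: &str, metric: u32) -> Result<()> {
        let route = RouteInfo { gateway, interface: interface.to_string(), metric };
        let mut routes = self.routes.lock().map_err(|e| e.to_string())?;
        if routes.contains(&route) {
            return Err(format!("route already exists: {route:?}"));
        }
        routes.push(route);
        Ok(())
    }

    fn delete_default_route(&self, gateway: Ipv4Addr, interface: &str, metric: u32) -> Result<()> {
        let mut routes = self.routes.lock().map_err(|e| e.to_string())?;
        let before = routes.len();
        routes.retain(|r| !(r.gateway == gateway && r.interface == interface && r.metric == metric));
        if routes.len() == before {
            return Err("no matching route".to_string());
        }
        Ok(())
    }

    fn get_current_routes(&self) -> Result<Vec<RouteInfo>> {
        let routes = self.routes.lock().map_err(|e| e.to_string())?;
        Ok(routes.clone())
    }
}
```

A test can then drive a failover and a failback against this double and assert that only the metric 5 route was added and removed.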
Threading Model
Main Thread
- Runs the state machine
- Handles signals and graceful shutdown
- Coordinates between components
Async Pinger Tasks
- One task per interface
- Non-blocking ICMP operations
- Results sent via channels
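The "one worker per interface, results over a channel" pattern can be illustrated with the standard library alone. Note the substitutions: this sketch uses OS threads and `std::sync::mpsc` instead of async tasks so that it runs without an async runtime, and the probe is a caller-supplied closure rather than a real ICMP ping. `PingResult`'s fields, the `rounds` parameter, and the `probe` parameter are assumptions made for the example.

```rust
use std::net::Ipv4Addr;
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Assumed shape of a ping result; the project's real type is not shown.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PingResult {
    pub interface: String,
    pub target: Ipv4Addr,
    pub success: bool,
}

/// Spawns one worker per interface. Each worker probes every target
/// `rounds` times and sends each outcome to the shared channel.
pub fn start_monitoring<F>(
    targets: &[Ipv4Addr],
    interfaces: &[String],
    rounds: usize,
    probe: F,
) -> Receiver<PingResult>
where
    F: Fn(&str, Ipv4Addr) -> bool + Send + Clone + 'static,
{
    let (tx, rx) = mpsc::channel();
    for interface in interfaces {
        // Each worker owns its own sender, probe, and copies of its inputs.
        let tx = tx.clone();
        let probe = probe.clone();
        let interface = interface.clone();
        let targets = targets.to_vec();
        thread::spawn(move || {
            for _ in 0..rounds {
                for &target in &targets {
                    let success = probe(&interface, target);
                    let result = PingResult { interface: interface.clone(), target, success };
                    // Stop quietly if the receiving side has gone away.
                    if tx.send(result).is_err() {
                        return;
                    }
                }
            }
        });
    }
    // The original sender is dropped here, so the receiver's iterator ends
    // once every worker has finished.
    rx
}
```

The consumer (the main thread in this design) simply iterates over the receiver and feeds each `PingResult` to the state machine, which keeps all decision logic in one place.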
Route Manager
- Synchronous operations (netlink requests are issued as blocking calls)
- Called from main thread
- Thread-safe operations
Error Handling Strategy
Categories
- Network Errors: Temporary connectivity issues
- System Errors: Permission problems, interface not found
- Configuration Errors: Invalid IP addresses, missing interfaces
Recovery Mechanisms
- Network Errors: Retry with exponential backoff
- System Errors: Log and exit (requires admin intervention)
- Configuration Errors: Validate on startup, exit if invalid
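The mapping from error category to recovery action can be written down as a small policy function. In this sketch, `SwitcherError`, `Recovery`, `backoff_delay`, and `recovery_for` are hypothetical names, and the base delay (500 ms) and cap (30 s) are assumed values that the document does not specify.

```rust
use std::time::Duration;

/// The three error categories listed above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SwitcherError {
    Network(String),
    System(String),
    Configuration(String),
}

/// What the caller should do next.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Recovery {
    RetryAfter(Duration),
    Exit,
}

// Assumed values; the document does not state the backoff parameters.
const BASE_DELAY: Duration = Duration::from_millis(500);
const MAX_DELAY: Duration = Duration::from_secs(30);

/// Exponential backoff: the delay doubles with each attempt (starting at
/// attempt 0) and is capped at `MAX_DELAY`. Saturating arithmetic avoids
/// overflow for large attempt counts.
pub fn backoff_delay(attempt: u32) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    BASE_DELAY.saturating_mul(factor).min(MAX_DELAY)
}

/// Network errors are retried; system and configuration errors are fatal.
pub fn recovery_for(error: &SwitcherError, attempt: u32) -> Recovery {
    match error {
        SwitcherError::Network(_) => Recovery::RetryAfter(backoff_delay(attempt)),
        SwitcherError::System(_) | SwitcherError::Configuration(_) => Recovery::Exit,
    }
}
```

Capping the delay keeps the worst-case reaction time bounded, which matters for a tool whose purpose is to restore connectivity quickly.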
Security Considerations
Privileges
- Requires root privileges for route manipulation (on Linux, the specific capability involved is CAP_NET_ADMIN)
- Drops unnecessary privileges where possible
- Validates all user inputs
Network Security
- Only sends ICMP packets to configured targets
- No arbitrary packet crafting
- Interface binding prevents traffic leakage
Performance Characteristics
Resource Usage
- Memory: Minimal (~10 MB)
- CPU: Low (periodic ICMP packets)
- Network: Very low (only ping traffic)
Scalability
- Single target machine design
- Supports multiple ping targets
- Limited to 2 interfaces (current design)
Testing Architecture
Unit Tests
- Individual component testing
- Mock network interfaces
- State machine logic verification
Integration Tests
- Component interaction testing
- Real network interface usage
- Netlink operation verification
End-to-End Tests
- Full system testing in containers
- Network failure simulation
- Failover timing verification