working base
doc/ARCHITECTURE.md
# Architecture Documentation

## System Overview

Route-Switcher is a network failover system that runs as a user-space application and provides automatic network redundancy. It monitors connectivity over two network interfaces (primary and secondary) and manages the kernel routing table so that traffic keeps flowing when the primary link fails.

## Component Architecture

```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Main Thread   │    │  Async Pingers   │    │  Route Manager  │
│                 │    │                  │    │                 │
│ • State Machine │◄──►│ • Interface A    │◄──►│ • Netlink API   │
│ • Decision Logic│    │ • Interface B    │    │ • Route Add/Del │
│ • Coordination  │    │ • ICMP Monitoring│    │ • Metric Mgmt   │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
                       ┌──────────────────┐
                       │   Linux Kernel   │
                       │                  │
                       │ • Routing Table  │
                       │ • Network Stack  │
                       │ • Netlink Socket │
                       └──────────────────┘
```

## Data Flow

1. **Monitoring Phase**
   - Async pingers send ICMP packets via both interfaces
   - Results are collected and sent to the main thread
   - The state machine evaluates connectivity patterns

2. **Decision Phase**
   - The state machine determines whether failover is needed
   - It verifies that the secondary interface is healthy
   - It triggers route changes if the conditions are met

3. **Action Phase**
   - The route manager updates the kernel routing table
   - Changes are applied via the netlink interface
   - The system continues monitoring in the new state

## State Machine Design

### States
- **Boot**: Initial state, gathering connectivity data
- **Primary**: Using primary interface for routing
- **Fallback**: Using secondary interface for routing

### Transitions
```
Boot → Primary: After 10 seconds of sampling (regardless of ping results)
Primary → Fallback: After 3 consecutive failures AND secondary is healthy
Fallback → Primary: After 60 seconds of stable primary connectivity
```

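
The transition table above can be read as a pure function from the current state and the latest observations to the next state. The following is a minimal sketch under that reading, not the project's actual implementation: the `Observations` struct, its field names, and `next_state` are hypothetical, and only the three thresholds (10 seconds, 3 failures, 60 seconds) come from the table.

```rust
use std::time::Duration;

/// The three states listed in the States section.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Boot,
    Primary,
    Fallback,
}

// Thresholds taken from the transition table.
const BOOT_SAMPLING: Duration = Duration::from_secs(10);
const FAILURES_BEFORE_FAILOVER: u32 = 3;
const STABLE_BEFORE_FAILBACK: Duration = Duration::from_secs(60);

/// Hypothetical snapshot of what the pingers have reported so far.
#[derive(Debug, Clone, Copy)]
pub struct Observations {
    /// Time spent in the current state.
    pub time_in_state: Duration,
    /// Consecutive failed probes on the primary interface.
    pub primary_failures: u32,
    /// How long the primary interface has answered without a failure.
    pub primary_stable_for: Duration,
    /// Whether the secondary interface currently answers probes.
    pub secondary_healthy: bool,
}

/// Returns the state to move to, or the current state if no rule fires.
pub fn next_state(current: State, obs: &Observations) -> State {
    match current {
        // Boot always ends after the sampling window, whatever the ping results were.
        State::Boot if obs.time_in_state >= BOOT_SAMPLING => State::Primary,
        // Fail over only when the secondary link can actually carry traffic.
        State::Primary
            if obs.primary_failures >= FAILURES_BEFORE_FAILOVER && obs.secondary_healthy =>
        {
            State::Fallback
        }
        // Fail back once the primary link has been stable long enough.
        State::Fallback if obs.primary_stable_for >= STABLE_BEFORE_FAILBACK => State::Primary,
        unchanged => unchanged,
    }
}
```

Keeping the decision logic free of I/O like this makes it straightforward to cover every transition in unit tests without touching a network interface.
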
### Routing Behavior
- **Boot State**: Both base routes are set up initially: primary (metric 10) and secondary (metric 20)
- **Primary State**: Primary route (metric 10) and secondary route (metric 20) are present
- **Fallback State**: All three routes are present: primary (metric 10), secondary (metric 20), and failover secondary (metric 5)
- **Exit**: Only the failover route (metric 5) is removed; the two base routes stay in place

### Route Management Strategy
The system follows a "both routes always present, extra failover on-demand" approach:
1. **Initialization**: Set up the primary route (metric 10) and the secondary route (metric 20)
2. **Boot Phase**: Collect 10 seconds of ping samples to establish baseline connectivity
3. **Normal Operation**: The primary route serves traffic (metric 10); the secondary route is available as a backup (metric 20)
4. **Failover**: Add an extra route via the secondary interface with the lowest metric (5), which gives it the highest priority, so traffic switches immediately
5. **Failback**: Remove the extra failover route when the primary recovers
6. **Cleanup**: On exit, remove only the extra failover route and preserve the base routes

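
To make the metric arithmetic concrete, here is a small sketch of which routes are expected in each state and which link wins. It is an illustration only: `Link`, `Route`, `routes_for`, and `preferred_link` are hypothetical names, and the sketch assumes the usual Linux behavior that, among routes to the same destination, the one with the lowest metric is preferred.

```rust
// Metrics named in the strategy above; a lower metric means higher priority.
const PRIMARY_METRIC: u32 = 10;
const SECONDARY_METRIC: u32 = 20;
const FAILOVER_METRIC: u32 = 5;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Boot,
    Primary,
    Fallback,
}

/// Hypothetical label for the interface a default route goes through.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Link {
    Primary,
    Secondary,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Route {
    pub link: Link,
    pub metric: u32,
}

/// Routes expected in the kernel table for a given state.
pub fn routes_for(state: State) -> Vec<Route> {
    // The two base routes are always present.
    let mut routes = vec![
        Route { link: Link::Primary, metric: PRIMARY_METRIC },
        Route { link: Link::Secondary, metric: SECONDARY_METRIC },
    ];
    // The extra failover route exists only while in Fallback.
    if state == State::Fallback {
        routes.push(Route { link: Link::Secondary, metric: FAILOVER_METRIC });
    }
    routes
}

/// The link that carries traffic: the route with the lowest metric wins.
pub fn preferred_link(routes: &[Route]) -> Option<Link> {
    routes.iter().min_by_key(|r| r.metric).map(|r| r.link)
}
```

Because failover only adds a route and failback only removes it, the base routes never change, and a crash during fallback leaves at most one stray metric-5 route behind.
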
### State Persistence
- Current state is maintained in memory
- State changes are logged for debugging
- No persistent storage required (state rebuilds on restart)

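## Interface Design

### Pinger Interface
```rust
use std::net::Ipv4Addr;

// `PingResult` and `Receiver` (a channel receiver) are not shown here.
pub trait Pinger {
    /// Sends a single probe to `target` through `interface`.
    async fn ping(&self, target: Ipv4Addr, interface: &str) -> PingResult;
    /// Starts continuous probing and returns the channel that delivers the results.
    async fn start_monitoring(&self, targets: &[Ipv4Addr], interfaces: &[String]) -> Receiver<PingResult>;
}
```
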
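### Route Manager Interface
```rust
use std::net::Ipv4Addr;

// `RouteInfo` and the `Result` alias are not shown here.
pub trait RouteManager {
    /// Adds a default route via `gateway` on `interface` with the given metric.
    fn add_default_route(&self, gateway: Ipv4Addr, interface: &str, metric: u32) -> Result<()>;
    /// Removes the default route that matches gateway, interface, and metric.
    fn delete_default_route(&self, gateway: Ipv4Addr, interface: &str, metric: u32) -> Result<()>;
    /// Lists the default routes currently installed.
    fn get_current_routes(&self) -> Result<Vec<RouteInfo>>;
}
```
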
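
One benefit of putting route manipulation behind a trait is that tests can swap in a fake that never touches netlink. The sketch below shows one way such a test double might look. The `RouteInfo` fields, the `String`-based `Result` alias, and the `InMemoryRoutes` type are assumptions made for illustration; the document does not define them.

```rust
use std::net::Ipv4Addr;
use std::sync::Mutex;

/// Assumed shape of `RouteInfo`; the real definition is not shown in this document.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RouteInfo {
    pub gateway: Ipv4Addr,
    pub interface: String,
    pub metric: u32,
}

/// Assumed error type, kept as a plain string for the sketch.
pub type Result<T> = std::result::Result<T, String>;

pub trait RouteManager {
    fn add_default_route(&self, gateway: Ipv4Addr, interface: &str, metric: u32) -> Result<()>;
    fn delete_default_route(&self, gateway: Ipv4Addr, interface: &str, metric: u32) -> Result<()>;
    fn get_current_routes(&self) -> Result<Vec<RouteInfo>>;
}

/// Test double that keeps routes in a vector instead of talking to the kernel.
#[derive(Default)]
pub struct InMemoryRoutes {
    routes: Mutex<Vec<RouteInfo>>,
}

impl RouteManager for InMemoryRoutes {
    fn add_default_route(&self, gateway: Ipv4Addr, interface: &str, metric: u32) -> Result<()> {
        let route = RouteInfo { gateway, interface: interface.to_string(), metric };
        let mut routes = self.routes.lock().map_err(|e| e.to_string())?;
        // Reject exact duplicates so tests notice a double add.
        if routes.contains(&route) {
            return Err(format!("route already exists: {:?}", route));
        }
        routes.push(route);
        Ok(())
    }

    fn delete_default_route(&self, gateway: Ipv4Addr, interface: &str, metric: u32) -> Result<()> {
        let mut routes = self.routes.lock().map_err(|e| e.to_string())?;
        let before = routes.len();
        routes.retain(|r| !(r.gateway == gateway && r.interface == interface && r.metric == metric));
        if routes.len() == before {
            return Err("no such route".to_string());
        }
        Ok(())
    }

    fn get_current_routes(&self) -> Result<Vec<RouteInfo>> {
        let routes = self.routes.lock().map_err(|e| e.to_string())?;
        Ok(routes.to_vec())
    }
}
```

The `Mutex` is there because the trait methods take `&self`; it also keeps the fake usable from more than one thread.
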
## Threading Model

### Main Thread
- Runs the state machine
- Handles signals and graceful shutdown
- Coordinates between components

### Async Pinger Tasks
- One task per interface
- Non-blocking ICMP operations
- Results sent via channels

### Route Manager
- Synchronous operations (the netlink calls are blocking)
- Called from the main thread
- Thread-safe operations

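
The "one worker per interface, results over a channel" pattern described above can be sketched with the standard library alone. Note the substitution: this example uses OS threads and `std::sync::mpsc` in place of the async tasks the project describes, so that it runs without an async runtime. The `PingResult` fields, the `probe` function pointer (a stand-in for the real ICMP call), and the `rounds` limit are all assumptions made for the sketch.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Hypothetical result type; the real `PingResult` is not shown in this document.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PingResult {
    pub interface: String,
    pub success: bool,
}

/// Spawns one worker per interface and returns the receiving end of a shared channel.
/// `probe` stands in for the real ICMP call; `rounds` bounds the loop so the sketch terminates.
pub fn start_monitoring(
    interfaces: &[String],
    rounds: usize,
    probe: fn(&str) -> bool,
) -> Receiver<PingResult> {
    let (tx, rx) = mpsc::channel();
    for interface in interfaces {
        let tx = tx.clone();
        let interface = interface.clone();
        thread::spawn(move || {
            for _ in 0..rounds {
                let success = probe(&interface);
                let result = PingResult { interface: interface.clone(), success };
                // Stop quietly if the main thread has dropped the receiver.
                if tx.send(result).is_err() {
                    break;
                }
            }
        });
    }
    // The original sender is dropped on return, so the receiver
    // reports end-of-stream once every worker has finished.
    rx
}
```

The main thread then drains the receiver and feeds each result to the state machine, which keeps all route changes on a single thread.
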
## Error Handling Strategy

### Categories
1. **Network Errors**: Temporary connectivity issues
2. **System Errors**: Permission problems, interface not found
3. **Configuration Errors**: Invalid IP addresses, missing interfaces

### Recovery Mechanisms
- **Network Errors**: Retry with exponential backoff
- **System Errors**: Log and exit (requires admin intervention)
- **Configuration Errors**: Validate on startup, exit if invalid

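
The mapping from error category to recovery action, including the exponential backoff, might look like the sketch below. The base delay of 500 ms and the 30 second cap are assumed values chosen for illustration (the document does not state them), and `ErrorKind`, `Recovery`, `backoff_delay`, and `recovery_for` are hypothetical names.

```rust
use std::time::Duration;

/// Error categories from the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    Network,
    System,
    Configuration,
}

/// What the caller should do next.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Recovery {
    RetryAfter(Duration),
    Exit,
}

// Assumed tuning values; the document does not specify them.
const BASE_DELAY: Duration = Duration::from_millis(500);
const MAX_DELAY: Duration = Duration::from_secs(30);

/// Delay before retry number `attempt` (0-based): BASE_DELAY * 2^attempt, capped at MAX_DELAY.
pub fn backoff_delay(attempt: u32) -> Duration {
    // Saturating arithmetic keeps very large attempt counts from overflowing.
    let factor = 2u32.saturating_pow(attempt);
    BASE_DELAY.saturating_mul(factor).min(MAX_DELAY)
}

/// Network errors are retried; system and configuration errors end the process.
pub fn recovery_for(kind: ErrorKind, attempt: u32) -> Recovery {
    match kind {
        ErrorKind::Network => Recovery::RetryAfter(backoff_delay(attempt)),
        ErrorKind::System | ErrorKind::Configuration => Recovery::Exit,
    }
}
```

With these values the delays run 0.5 s, 1 s, 2 s, 4 s and so on until they reach the 30 s cap. A production version would often add random jitter, which is omitted here for brevity.
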
## Security Considerations

### Privileges
- Requires root privileges for route manipulation
- Drops unnecessary privileges where possible
- Validates all user inputs

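
As an illustration of input validation at startup, the sketch below checks the two kinds of user input this document mentions: ping target addresses and interface names. The function names are hypothetical and the exact rules the project applies are not documented here. The interface-name checks follow the basic Linux constraints as I understand them (at most 15 bytes, not `.` or `..`, and no `/`, `:`, or whitespace), so treat them as an assumption rather than a specification.

```rust
use std::net::Ipv4Addr;

/// Linux interface names fit in IFNAMSIZ (16) bytes including the NUL terminator.
const MAX_IFNAME_LEN: usize = 15;

/// Parses a ping target, rejecting anything that is not a dotted-quad IPv4 address.
pub fn parse_target(raw: &str) -> Result<Ipv4Addr, String> {
    raw.trim()
        .parse::<Ipv4Addr>()
        .map_err(|e| format!("invalid IPv4 address {:?}: {}", raw, e))
}

/// Checks an interface name before it is handed to any system call.
pub fn validate_interface_name(name: &str) -> Result<(), String> {
    if name.is_empty() || name.len() > MAX_IFNAME_LEN {
        return Err(format!(
            "interface name {:?} must be 1 to {} bytes long",
            name, MAX_IFNAME_LEN
        ));
    }
    if name == "." || name == ".." {
        return Err(format!("interface name {:?} is reserved", name));
    }
    if name.chars().any(|c| c == '/' || c == ':' || c.is_whitespace()) {
        return Err(format!("interface name {:?} contains a forbidden character", name));
    }
    Ok(())
}
```

Running both checks before any socket or netlink call means a bad configuration is reported as a configuration error at startup instead of surfacing later as a system error.
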
### Network Security
- Only sends ICMP packets to configured targets
- No arbitrary packet crafting
- Interface binding prevents traffic leakage

## Performance Characteristics

### Resource Usage
- **Memory**: Minimal (~10 MB)
- **CPU**: Low (periodic ICMP packets)
- **Network**: Very low (only ping traffic)

### Scalability
- Designed to run on a single host
- Supports multiple ping targets
- Limited to 2 interfaces (current design)

## Testing Architecture

### Unit Tests
- Individual component testing
- Mock network interfaces
- State machine logic verification

### Integration Tests
- Component interaction testing
- Real network interface usage
- Netlink operation verification

### End-to-End Tests
- Full system testing in containers
- Network failure simulation
- Failover timing verification