Compare commits

...

5 Commits

Author          SHA1        Message                      Date
Michal Humpula  d0a9262cd8  simplify netlink operations  2026-03-01 08:27:16 +01:00
Michal Humpula  c913fc0fb1  add constants to routes      2026-03-01 08:22:18 +01:00
Michal Humpula  e2e68c8e81  simplify env                 2026-03-01 08:20:24 +01:00
Michal Humpula  3a289ecff2  simplify ping handling       2026-03-01 08:17:26 +01:00
Michal Humpula  7d0392b703  cleanup docs                 2026-03-01 07:40:32 +01:00
6 changed files with 234 additions and 796 deletions

127
README.md

@@ -8,31 +8,25 @@ Route-Switcher monitors connectivity to specified IP addresses via multiple netw
 ## Architecture
-### Core Components
-1. **Async Pingers** (`src/pinger.rs`)
-   - Dual-interface ICMP monitoring
-   - Explicit interface binding (equivalent to `ping -I <interface>`)
-   - Configurable ping targets and intervals
-   - Async/await implementation with tokio
-2. **Route Manager** (`src/routing.rs`)
-   - Netlink-based route manipulation
-   - No external dependencies on `ip` command
-   - Route addition and deletion
-   - Metric-based route prioritization
-3. **State Machine** (`src/main.rs`)
-   - Failover logic with anti-flapping protection
-   - Three consecutive failures trigger failover
-   - One minute of stable connectivity triggers failback
-   - Prevents switching when both interfaces fail
-4. **Configuration**
-   - Interface definitions (primary/secondary)
-   - Gateway configurations
-   - Ping targets and timing
-   - Route metrics
+Route-Switcher consists of three main components:
+1. **Async Pingers** (`src/pinger.rs`) - ICMP monitoring with explicit interface binding
+2. **Route Manager** (`src/routing.rs`) - Netlink-based route manipulation
+3. **State Machine** (`src/main.rs`) - Failover logic with anti-flapping protection
+### State Machine
+```
+Boot → Primary: After 10 seconds of sampling
+Primary → Fallback: After 3 consecutive failures AND secondary is healthy
+Fallback → Primary: After 60 seconds of stable primary connectivity
+```
+### Route Management Strategy
+- **Primary route**: metric 10 (default priority)
+- **Secondary route**: metric 20 (lower priority)
+- **Failover route**: metric 5 (highest priority, added only during failover)
+The system maintains both base routes continuously and adds/removes the failover route as needed.
 ## Key Features
@@ -105,62 +99,74 @@ RUST_LOG=debug sudo cargo run
RUST_LOG=info sudo cargo run RUST_LOG=info sudo cargo run
``` ```
## Testing Environment ## Testing
### Podman-Compose Setup
The project includes a complete testing environment using podman-compose:
### Quick Test
```bash ```bash
# Start test environment # Start test environment
podman-compose up -d podman-compose up -d
# Run automated failover test
./scripts/test-failover.sh
# View logs # View logs
podman-compose logs -f route-switcher podman-compose logs -f route-switcher
# Stop test environment # Stop environment
podman-compose down podman-compose down
``` ```
### End-to-End Testing ### Manual Testing
```bash ```bash
# Simulate primary interface failure # Test primary connectivity
podman-compose exec primary ip link set eth0 down podman-compose exec route-switcher ping -c 3 -I eth0 192.168.202.100
# Observe failover in logs # Test secondary connectivity
podman-compose logs -f route-switcher podman-compose exec route-switcher ping -c 3 -I eth1 192.168.202.100
# Restore primary interface # Simulate primary router failure
podman-compose exec primary ip link set eth0 up podman-compose exec primary-router ip link set eth0 down
# Observe failback after 1 minute # Check routing table
podman-compose exec route-switcher ip route show
``` ```
## Implementation Details ## API (Optional)
### State Machine The route-switcher includes an optional HTTP REST API for monitoring and control.
```
[Boot] -> [Primary] (after initial connectivity check) ### Configuration
[Primary] -> [Fallback] (after 3 consecutive failures) ```bash
[Fallback] -> [Primary] (after 60 seconds of stability) # Enable API
API_ENABLED=true
API_USERNAME=admin
API_PASSWORD_HASH=<bcrypt-hash>
API_PORT=8080
``` ```
### Route Management ### Endpoints
- Primary route: `ip r add default via <primary-gw> dev <primary-iface> metric 10` - **GET /api/state** - Returns current state and ping statistics
- Secondary route: `ip r add default via <secondary-gw> dev <secondary-iface> metric 20` - **POST /api/state** - Manually set state (primary/secondary)
- Routes are managed via netlink, not external commands
### Failover Logic ### Example Response
1. **Detection**: 3 consecutive ping failures on primary interface ```json
2. **Verification**: Secondary interface must be responsive {
3. **Switch**: Update routing table to use secondary gateway "state": "Primary",
4. **Monitor**: Continue monitoring both interfaces "primary_stats": {
5. **Recovery**: After 60 seconds of stable primary connectivity, switch back "success_rate": 95.5,
"failures": 2,
### Error Handling "total_pings": 44,
- Graceful degradation on interface failures "last_ping": "Ok"
- Comprehensive logging for debugging },
- Signal handling for clean shutdown "secondary_stats": {
- Recovery from temporary network issues "success_rate": 98.2,
"failures": 1,
"total_pings": 56,
"last_ping": "Ok"
},
"last_failover": "2024-02-15T10:30:00Z"
}
```
## Dependencies ## Dependencies
@@ -169,13 +175,8 @@ podman-compose exec primary ip link set eth0 up
- `netlink-sys` - Netlink kernel communication - `netlink-sys` - Netlink kernel communication
- `anyhow` - Error handling - `anyhow` - Error handling
- `log` + `env_logger` - Logging - `log` + `env_logger` - Logging
- `crossbeam-channel` - Inter-thread communication - `clap` - Command line parsing
- `signal-hook` - Signal handling
## Development Phases
- [ ] End-to-end automated tests
## License ## License
GPLv3 GPLv


@@ -1,231 +0,0 @@
# Route-Switcher API Design
## Overview
HTTP REST API with Basic Authentication for Home Assistant integration, exposing state machine state and ping statistics.
## Design Principles
- **Minimal surface area**: Only expose necessary information
- **Simple authentication**: HTTP Basic Auth (no JWT complexity)
- **State-focused**: Centered on state machine state and ping history
- **Home Assistant friendly**: Structured for HA REST integration
- **Opt-in**: API disabled by default
## API Endpoints
### GET /api/state
Returns current state machine state with ping statistics.
**Response:**
```json
{
  "state": "Primary",
  "primary_stats": {
    "success_rate": 95.5,
    "failures": 2,
    "total_pings": 44,
    "last_ping": "Ok"
  },
  "secondary_stats": {
    "success_rate": 98.2,
    "failures": 1,
    "total_pings": 56,
    "last_ping": "Ok"
  },
  "last_failover": "2024-02-15T10:30:00Z"
}
```
**Fields:**
- `state`: Current state machine state (Boot/Primary/Fallback)
- `primary_stats`: Ping statistics for primary interface
- `secondary_stats`: Ping statistics for secondary interface
- `last_failover`: ISO 8601 timestamp of last failover (null if never)
### POST /api/state
Manually set state machine state.
**Request:**
```json
{
"state": "fallback"
}
```
**Response:**
```json
{
  "state": "Fallback",
  "previous_state": "Primary",
  "primary_stats": { ... },
  "secondary_stats": { ... },
  "last_failover": "2024-02-15T10:30:00Z"
}
```
**Valid states:** `primary`, `fallback`
## Authentication
HTTP Basic Authentication with username/password configured via environment variables.
**Security considerations:**
- Passwords stored as bcrypt hash
- HTTPS recommended for production
- Local network access only
- No token management (stateless)
## Data Structures
### PingStats
Calculated from state machine ping history (60 entries per interface):
```rust
struct PingStats {
    success_rate: f64,  // Percentage of successful pings
    failures: usize,    // Number of failed pings in history
    total_pings: usize, // Total pings in history
    last_ping: String,  // "Ok" or "Failed"
}
```
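As a rough illustration of how these statistics could be derived from a bounded history, the sketch below keeps the last 60 results in a ring buffer and computes the fields on demand. `PingHistory`, `HISTORY_LEN`, and the choice to return `None` for an empty history are assumptions made for this example; the design above only fixes the field names and the 60-entry window.

```rust
use std::collections::VecDeque;

/// Window size taken from the text above (60 entries per interface).
const HISTORY_LEN: usize = 60;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PingResult {
    Ok,
    Failed,
}

#[derive(Debug, Clone, PartialEq)]
pub struct PingStats {
    pub success_rate: f64,
    pub failures: usize,
    pub total_pings: usize,
    pub last_ping: String,
}

/// Hypothetical bounded history; the oldest entry is dropped when full.
#[derive(Debug, Default)]
pub struct PingHistory {
    entries: VecDeque<PingResult>,
}

impl PingHistory {
    pub fn push(&mut self, result: PingResult) {
        if self.entries.len() == HISTORY_LEN {
            self.entries.pop_front();
        }
        self.entries.push_back(result);
    }

    /// Returns `None` while no ping has been recorded (an assumption of this sketch).
    pub fn stats(&self) -> Option<PingStats> {
        let last = *self.entries.back()?;
        let total_pings = self.entries.len();
        let failures = self
            .entries
            .iter()
            .filter(|r| **r == PingResult::Failed)
            .count();
        let success_rate = (total_pings - failures) as f64 * 100.0 / total_pings as f64;
        let last_ping = match last {
            PingResult::Ok => "Ok",
            PingResult::Failed => "Failed",
        };
        Some(PingStats {
            success_rate,
            failures,
            total_pings,
            last_ping: last_ping.to_string(),
        })
    }
}
```

Computing the numbers on read keeps the write path to a single push, which matters if the history sits behind the shared mutex described later in this document.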
### StateResponse
```rust
struct StateResponse {
    state: String,
    primary_stats: PingStats,
    secondary_stats: PingStats,
    last_failover: Option<String>,
}
```
## Home Assistant Integration
### REST Sensor Configuration
```yaml
sensor:
  - platform: rest
    name: Route Switcher State
    resource: http://route-switcher.local:8080/api/state
    username: !secret route_switcher_user
    password: !secret route_switcher_pass
    value_template: "{{ value_json.state }}"
    json_attributes:
      - primary_stats
      - secondary_stats
      - last_failover
  - platform: template
    sensors:
      route_switcher_primary_success_rate:
        value_template: "{{ state_attr('sensor.route_switcher_state', 'primary_stats').success_rate | default(0) }}"
        unit_of_measurement: "%"
      route_switcher_secondary_success_rate:
        value_template: "{{ state_attr('sensor.route_switcher_state', 'secondary_stats').success_rate | default(0) }}"
        unit_of_measurement: "%"
      route_switcher_primary_failures:
        value_template: "{{ state_attr('sensor.route_switcher_state', 'primary_stats').failures | default(0) }}"
      route_switcher_secondary_failures:
        value_template: "{{ state_attr('sensor.route_switcher_state', 'secondary_stats').failures | default(0) }}"
switch:
  - platform: rest
    name: Route Switcher Control
    resource: http://route-switcher.local:8080/api/state
    username: !secret route_switcher_user
    password: !secret route_switcher_pass
    body_on: '{"state": "fallback"}'
    body_off: '{"state": "primary"}'
    is_on_template: "{{ value_json.state == 'fallback' }}"
```
## Configuration
### Environment Variables
```bash
# API Configuration
API_ENABLED=true
API_BIND_ADDRESS=0.0.0.0
API_PORT=8080
API_USERNAME=admin
API_PASSWORD_HASH=<bcrypt-hash>
# CORS Configuration
API_CORS_ORIGINS=http://homeassistant.local:8123
```
### Password Hash Generation
```bash
# Generate a bcrypt hash (htpasswd ships with apache2-utils / httpd-tools)
htpasswd -bnBC 10 "" 'your-password' | tr -d ':\n'
```
## Implementation Details
### Dependencies
```toml
axum = "0.7"
tokio = { version = "1.42", features = ["full"] }
tower = "0.4"
tower-http = { version = "0.5", features = ["cors", "auth"] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
chrono = { version = "0.4", features = ["serde"] }
bcrypt = "0.15"
base64 = "0.22"
```
### Architecture
- **API Module**: `src/api.rs` - HTTP server and endpoints
- **State Sharing**: Thread-safe access to state machine and ping history
- **Authentication**: Basic Auth middleware with bcrypt validation
- **Error Handling**: Standardized JSON error responses
- **Integration**: Minimal changes to existing state machine
### Thread Safety
- `Arc<Mutex<StateMachine>>` for shared state access
- Non-blocking async operations
- Minimal locking duration
## Error Handling
Standardized error responses:
```json
{
"error": "Invalid state",
"message": "State must be 'primary' or 'fallback'"
}
```
HTTP Status Codes:
- 200: Success
- 400: Bad Request (invalid state)
- 401: Unauthorized (invalid credentials)
- 500: Internal Server Error
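The validation behind the 400 response can be sketched in a few lines. This is only an illustration: `RequestedState`, `ApiError`, `parse_requested_state`, and `to_json` are hypothetical names, and exact lowercase matching is an assumption based on the "Valid states" line above. The hand-written JSON is safe here only because both strings are fixed; a real handler would more likely serialize through `serde_json`, which the dependency list already includes.

```rust
/// States a client may request via POST /api/state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RequestedState {
    Primary,
    Fallback,
}

/// Hypothetical error value carrying the HTTP status and the JSON fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ApiError {
    pub status: u16,
    pub error: &'static str,
    pub message: String,
}

impl ApiError {
    /// Renders the body in the shape shown above.
    /// No escaping is done, so this is only valid for fixed, quote-free strings.
    pub fn to_json(&self) -> String {
        format!(
            r#"{{"error":"{}","message":"{}"}}"#,
            self.error, self.message
        )
    }
}

/// Accepts exactly "primary" or "fallback"; anything else maps to 400.
pub fn parse_requested_state(raw: &str) -> Result<RequestedState, ApiError> {
    match raw {
        "primary" => Ok(RequestedState::Primary),
        "fallback" => Ok(RequestedState::Fallback),
        _ => Err(ApiError {
            status: 400,
            error: "Invalid state",
            message: "State must be 'primary' or 'fallback'".to_string(),
        }),
    }
}
```

Note that `boot` is rejected as well, since the design lists only two states as settable.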
## Security Considerations
- Network access restrictions (local only recommended)
- HTTPS for credential protection
- Rate limiting considerations
- Audit logging for manual state changes
- No configuration exposure (state only)
## Backward Compatibility
- API disabled by default
- No changes to existing CLI functionality
- Service continues without API if disabled
- Graceful degradation on API errors


@@ -1,167 +0,0 @@
# Architecture Documentation
## System Overview
Route-Switcher is a network failover system that operates at the application layer to provide automatic network redundancy. The system monitors network connectivity through multiple interfaces and manages routing tables to ensure continuous connectivity.
## Component Architecture
```
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Main Thread │ │ Async Pingers │ │ Route Manager │
│ │ │ │ │ │
│ • State Machine │◄──►│ • Interface A │◄──►│ • Netlink API │
│ • Decision Logic│ │ • Interface B │ │ • Route Add/Del │
│ • Coordination │ │ • ICMP Monitoring│ │ • Metric Mgmt │
└─────────────────┘ └──────────────────┘ └─────────────────┘
│ │ │
└───────────────────────┼───────────────────────┘
┌──────────────────┐
│ Linux Kernel │
│ │
│ • Routing Table │
│ • Network Stack │
│ • Netlink Socket │
└──────────────────┘
```
## Data Flow
1. **Monitoring Phase**
- Async pingers send ICMP packets via both interfaces
- Results are collected and sent to main thread
- State machine evaluates connectivity patterns
2. **Decision Phase**
- State machine determines if failover is needed
- Verifies secondary interface health
- Triggers route changes if conditions are met
3. **Action Phase**
- Route manager updates kernel routing table
- Changes are applied via netlink interface
- System continues monitoring in new state
## State Machine Design
### States
- **Boot**: Initial state, gathering connectivity data
- **Primary**: Using primary interface for routing
- **Fallback**: Using secondary interface for routing
### Transitions
```
Boot → Primary: After 10 seconds of sampling (regardless of ping results)
Primary → Fallback: After 3 consecutive failures AND secondary is healthy
Fallback → Primary: After 60 seconds of stable primary connectivity
```
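The transition table can be read as a pure function from the current state and a summary of recent observations to an optional next state. The sketch below is one way to write that down; `Observation` and its fields are hypothetical, and the real `StateMachine` may derive these values differently from its ping history.

```rust
use std::time::Duration;

/// States named in the transition table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Boot,
    Primary,
    Fallback,
}

/// Hypothetical summary of what the pingers have reported so far.
#[derive(Debug, Clone, Copy)]
pub struct Observation {
    /// Time spent in the current state.
    pub time_in_state: Duration,
    /// Consecutive failed pings on the primary interface.
    pub primary_consecutive_failures: u32,
    /// Whether the secondary interface currently answers pings.
    pub secondary_healthy: bool,
    /// How long the primary has answered without a failure.
    pub primary_stable_for: Duration,
}

const BOOT_SAMPLING: Duration = Duration::from_secs(10);
const FAILURE_THRESHOLD: u32 = 3;
const FAILBACK_DELAY: Duration = Duration::from_secs(60);

/// Returns the next state, or `None` when no transition applies.
pub fn next_state(current: State, obs: &Observation) -> Option<State> {
    match current {
        // Boot always ends in Primary, regardless of ping results.
        State::Boot if obs.time_in_state >= BOOT_SAMPLING => Some(State::Primary),
        // Fail over only when there is a healthy interface to fail over to.
        State::Primary
            if obs.primary_consecutive_failures >= FAILURE_THRESHOLD && obs.secondary_healthy =>
        {
            Some(State::Fallback)
        }
        // Fail back only after the primary has been stable long enough.
        State::Fallback if obs.primary_stable_for >= FAILBACK_DELAY => Some(State::Primary),
        _ => None,
    }
}
```

Returning `Option<State>` mirrors the `Some((old_state, new_state))` pattern used when a transition fires: the caller touches routes only on an actual change.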
### Routing Behavior
- **Boot State**: Both routes are set up initially - primary (metric 10) and secondary (metric 20)
- **Primary State**: Primary route (metric 10) and secondary route (metric 20) present
- **Fallback State**: All three routes present - primary (metric 10), secondary (metric 20), and failover secondary (metric 5)
- **Exit**: Only the failover route (metric 5) is removed
### Route Management Strategy
The system follows a "both routes always present, extra failover on-demand" approach:
1. **Initialization**: Set up primary route (metric 10) and secondary route (metric 20)
2. **Boot Phase**: Collect 10 seconds of ping samples to establish baseline connectivity
3. **Normal Operation**: Primary route serves traffic (metric 10), secondary available as backup (metric 20)
4. **Failover**: Add extra secondary route with highest priority (metric 5) for immediate failover
5. **Failback**: Remove extra failover route when primary recovers
6. **Cleanup**: Only remove the extra failover route on exit, preserving base routes
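To make the metric arithmetic concrete, the following sketch models the default routes present in each situation and picks the one the kernel would prefer (lowest metric among routes with the same prefix). It does not talk to netlink; `DefaultRoute`, `expected_routes`, and `effective_route` are illustrative names, not part of the route manager.

```rust
use std::net::Ipv4Addr;

/// Metrics from the strategy above: a lower number wins.
pub const FAILOVER_METRIC: u32 = 5;
pub const PRIMARY_METRIC: u32 = 10;
pub const SECONDARY_METRIC: u32 = 20;

/// Hypothetical in-memory model of one default route.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DefaultRoute {
    pub gateway: Ipv4Addr,
    pub interface: String,
    pub metric: u32,
}

fn route(target: (Ipv4Addr, &str), metric: u32) -> DefaultRoute {
    DefaultRoute {
        gateway: target.0,
        interface: target.1.to_string(),
        metric,
    }
}

/// Both base routes are always present; the failover route is added on demand.
pub fn expected_routes(
    primary: (Ipv4Addr, &str),
    secondary: (Ipv4Addr, &str),
    in_fallback: bool,
) -> Vec<DefaultRoute> {
    let mut routes = vec![
        route(primary, PRIMARY_METRIC),
        route(secondary, SECONDARY_METRIC),
    ];
    if in_fallback {
        // Same gateway as the secondary route, but with the best metric.
        routes.push(route(secondary, FAILOVER_METRIC));
    }
    routes
}

/// The route that carries traffic: the one with the lowest metric.
pub fn effective_route(routes: &[DefaultRoute]) -> Option<&DefaultRoute> {
    routes.iter().min_by_key(|r| r.metric)
}
```

Because failback only removes the metric 5 entry, the primary route at metric 10 becomes the preferred route again without being re-added.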
### State Persistence
- Current state is maintained in memory
- State changes are logged for debugging
- No persistent storage required (state rebuilds on restart)
## Interface Design
### Pinger Interface
```rust
pub trait Pinger {
    async fn ping(&self, target: Ipv4Addr, interface: &str) -> PingResult;
    async fn start_monitoring(&self, targets: &[Ipv4Addr], interfaces: &[String]) -> Receiver<PingResult>;
}
```
### Route Manager Interface
```rust
pub trait RouteManager {
    fn add_default_route(&self, gateway: Ipv4Addr, interface: &str, metric: u32) -> Result<()>;
    fn delete_default_route(&self, gateway: Ipv4Addr, interface: &str, metric: u32) -> Result<()>;
    fn get_current_routes(&self) -> Result<Vec<RouteInfo>>;
}
```
## Threading Model
### Main Thread
- Runs the state machine
- Handles signals and graceful shutdown
- Coordinates between components
### Async Pinger Tasks
- One task per interface
- Non-blocking ICMP operations
- Results sent via channels
### Route Manager
- Synchronous operations (netlink is sync)
- Called from main thread
- Thread-safe operations
## Error Handling Strategy
### Categories
1. **Network Errors**: Temporary connectivity issues
2. **System Errors**: Permission problems, interface not found
3. **Configuration Errors**: Invalid IP addresses, missing interfaces
### Recovery Mechanisms
- **Network Errors**: Retry with exponential backoff
- **System Errors**: Log and exit (requires admin intervention)
- **Configuration Errors**: Validate on startup, exit if invalid
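For the "retry with exponential backoff" item, a delay calculation along these lines would be typical. The function name, the doubling factor, and the idea of a cap are assumptions; the document does not specify base delay, maximum delay, or retry count.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): `base * 2^attempt`, capped at `max`.
/// All parameters are hypothetical; the source does not give concrete values.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // 2^attempt, saturating once the shift would overflow a u32.
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    // checked_mul returns None on Duration overflow; treat that as "use the cap".
    base.checked_mul(factor).map_or(max, |delay| delay.min(max))
}
```

With a 1 second base and a 30 second cap, the sequence runs 1 s, 2 s, 4 s, 8 s, 16 s, then stays at 30 s.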
## Security Considerations
### Privileges
- Requires root privileges for route manipulation
- Drops unnecessary privileges where possible
- Validates all user inputs
### Network Security
- Only sends ICMP packets to configured targets
- No arbitrary packet crafting
- Interface binding prevents traffic leakage
## Performance Characteristics
### Resource Usage
- **Memory**: Minimal (~10MB)
- **CPU**: Low (periodic ICMP packets)
- **Network**: Very low (only ping traffic)
### Scalability
- Single target machine design
- Supports multiple ping targets
- Limited to 2 interfaces (current design)
## Testing Architecture
### Unit Tests
- Individual component testing
- Mock network interfaces
- State machine logic verification
### Integration Tests
- Component interaction testing
- Real network interface usage
- Netlink operation verification
### End-to-End Tests
- Full system testing in containers
- Network failure simulation
- Failover timing verification


@@ -1,112 +1,43 @@
# Testing Guide # Testing Guide
## Overview ## Test Environment
This document describes the testing strategy and environment for the Route-Switcher project. The testing environment uses podman-compose to create a network topology with routers and an ICMP target:
## Testing Environment
### Podman-Compose Setup
The testing environment uses podman-compose to create a realistic network topology with routers and a single ICMP target:
``` ```
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Route-Switcher │ │ Primary Router│ │ │ Route-Switcher │ │ Primary Router│ │ ICMP Target
│ │ │ │ │ │ │ │ │ │ │ │
│ eth0 ────────────┼────►│ eth0 ──────────┼────►│ ICMP Target │ eth0 ────────────┼────►│ eth0 ──────────┼────►│ 192.168.202.100
│ eth1 ────────────┼────►│ eth1 ──────────┼────►│ │ │ eth1 ────────────┼────►│ eth1 ──────────┼────►│ │
│ │ │ │ │ │
└─────────────────┘ └─────────────────┘ └─────────────────┘ └─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
│ │ │
▼ ▼ ▼
primary-net secondary-net target-net
192.168.1.0/24 192.168.2.0/24 10.0.0.0/24
``` ```
### Container Architecture ### Container Setup
- **route-switcher**: Dual interfaces (eth0→primary-net, eth1→secondary-net) - **route-switcher**: Dual interfaces (eth0→primary-net, eth1→secondary-net)
- **primary-router**: Connects primary-net ↔ target-net (192.168.1.1 ↔ 10.0.0.1) - **primary-router**: Connects primary-net ↔ target-net (192.168.200.11 ↔ 192.168.202.11)
- **secondary-router**: Connects secondary-net ↔ target-net (192.168.2.1 ↔ 10.0.0.2) - **secondary-router**: Connects secondary-net ↔ target-net (192.168.201.11 ↔ 192.168.202.12)
- **icmp-target**: Single IP on target-net (10.0.0.100), reachable via either router - **icmp-target**: Single IP on target-net (192.168.202.100)
### Quick Start
```bash
# Start the testing environment
podman-compose up -d
# Run automated failover test
./scripts/test-failover.sh
# View logs
podman-compose logs -f route-switcher
# Stop environment
podman-compose down
```
### Network Configuration
**Route-Switcher:**
- eth0: 192.168.1.10 (primary network)
- eth1: 192.168.2.10 (secondary network)
- Default gateway: 192.168.1.1 (primary router)
**Primary Router:**
- eth0: 192.168.1.1 (primary network)
- eth1: 10.0.0.1 (target network)
- Routes traffic between networks with NAT
**Secondary Router:**
- eth0: 192.168.2.1 (secondary network)
- eth1: 10.0.0.2 (target network)
- Routes traffic between networks with NAT
**ICMP Target:**
- Single IP: 10.0.0.100
- Default route: 10.0.0.1 (primary router)
- Responds to ping from both routers
## Test Scenarios ## Test Scenarios
### 1. Basic Connectivity Test ### 1. Basic Connectivity Test
**Objective**: Verify basic ping functionality on both interfaces
```bash ```bash
# Start environment
podman-compose up -d podman-compose up -d
podman-compose exec route-switcher ping -c 3 -I eth0 192.168.202.100
# Test primary connectivity podman-compose exec route-switcher ping -c 3 -I eth1 192.168.202.100
podman-compose exec route-switcher ping -c 3 -I eth0 10.0.0.100
# Test secondary connectivity
podman-compose exec route-switcher ping -c 3 -I eth1 10.0.0.100
# Check routing table
podman-compose exec route-switcher ip route show
``` ```
### 2. Failover Test ### 2. Failover Test
**Objective**: Verify automatic failover when primary router fails
```bash ```bash
# Start monitoring logs # Monitor logs
podman-compose logs -f route-switcher & podman-compose logs -f route-switcher &
# Simulate primary router failure # Simulate primary router failure
podman-compose exec primary-router ip link set eth0 down podman-compose exec primary-router ip link set eth0 down
# Verify failover occurs (should see in logs) # Verify failover occurs and connectivity works
# Wait for state change to Fallback podman-compose exec route-switcher ping -c 3 192.168.202.100
# Check routing table after failover
podman-compose exec route-switcher ip route show
# Test connectivity via secondary router
podman-compose exec route-switcher ping -c 3 10.0.0.100
# Restore primary router # Restore primary router
podman-compose exec primary-router ip link set eth0 up podman-compose exec primary-router ip link set eth0 up
@@ -115,119 +46,45 @@ podman-compose exec primary-router ip link set eth0 up
``` ```
### 3. Dual Failure Test ### 3. Dual Failure Test
**Objective**: Verify system doesn't failover when both routers fail
```bash ```bash
# Start monitoring logs # Fail both routers - system should NOT switch
podman-compose logs -f route-switcher &
# Fail both routers
podman-compose exec primary-router ip link set eth0 down podman-compose exec primary-router ip link set eth0 down
podman-compose exec secondary-router ip link set eth0 down podman-compose exec secondary-router ip link set eth0 down
# Verify no routing changes occur # Verify no routing changes occur
# System should remain in current state
# Restore routers
podman-compose exec primary-router ip link set eth0 up
podman-compose exec secondary-router ip link set eth0 up
``` ```
### 4. Router Target Interface Failure ## Automated Testing
**Objective**: Test upstream network failure simulation
Run the comprehensive test script:
```bash ```bash
# Fail primary router's connection to target network
podman-compose exec primary-router ip link set eth1 down
# Should trigger failover to secondary router
# Verify connectivity still works via secondary path
# Restore primary router's target connection
podman-compose exec primary-router ip link set eth1 up
```
### 5. Automated Failover Test
**Objective**: Run complete automated test sequence
```bash
# Run the comprehensive test script
./scripts/test-failover.sh ./scripts/test-failover.sh
# This script will:
# 1. Start the environment
# 2. Verify initial connectivity
# 3. Simulate primary router failure
# 4. Monitor failover
# 5. Restore primary router
# 6. Verify failback after 60 seconds
``` ```
This script:
1. Starts the test environment
2. Verifies initial connectivity
3. Simulates primary router failure
4. Monitors failover
5. Restores primary router
6. Verifies failback
## Unit Tests ## Unit Tests
### Running Tests
```bash ```bash
# Run all tests # Run all tests
cargo test cargo test
# Run specific test module # Run specific module
cargo test pinger cargo test pinger
cargo test routing
# Run with coverage cargo test state_machine
cargo tarpaulin --out Html
``` ```
### Test Structure ## Debug Commands
```
tests/
├── unit/
│ ├── pinger_tests.rs
│ ├── routing_tests.rs
│ └── state_machine_tests.rs
├── integration/
│ ├── netlink_tests.rs
│ └── dual_interface_tests.rs
└── e2e/
└── failover_tests.rs
```
## Performance Testing
### Load Testing
```bash ```bash
# Test with multiple ping targets # Check container interfaces
cargo run -- --ping-target 8.8.8.8
# Monitor resource usage
podman stats route-switcher
# Test long-running stability
# Run for 24 hours and monitor for memory leaks
```
### Network Latency Testing
```bash
# Measure failover time
# Start script to time the state transition
start_time=$(date +%s%N)
# Trigger failure
# Wait for state change
end_time=$(date +%s%N)
failover_time=$((($end_time - $start_time) / 1000000))
echo "Failover time: ${failover_time}ms"
```
## Debugging Tests
### Common Issues
1. **Permission Denied**: Ensure containers run with privileged mode
2. **Interface Not Found**: Check network configuration in compose file
3. **Netlink Errors**: Verify kernel supports required operations
4. **Timing Issues**: Adjust test timeouts for your environment
### Debug Commands
```bash
# Check container network interfaces
podman-compose exec route-switcher ip addr show podman-compose exec route-switcher ip addr show
# Check routing table # Check routing table
@@ -235,36 +92,4 @@ podman-compose exec route-switcher ip route show
# Monitor network traffic # Monitor network traffic
podman-compose exec route-switcher tcpdump -i any icmp podman-compose exec route-switcher tcpdump -i any icmp
# Check system logs
podman-compose exec route-switcher dmesg | tail -20
``` ```
## Test Data
### Sample Ping Results
```rust
// Mock data for testing
let mock_ping_results = vec![
PingResult::Ok, // Normal operation
PingResult::Failed, // Single failure
PingResult::Failed, // Consecutive failure
PingResult::Failed, // Trigger failover
];
```
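A small helper like the one below shows how such mock data maps onto the "3 consecutive failures" rule. `trailing_failures` is a hypothetical name used for this sketch; the real state machine may count failures differently.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PingResult {
    Ok,
    Failed,
}

/// Counts failures at the end of the history, stopping at the most recent success.
pub fn trailing_failures(history: &[PingResult]) -> usize {
    history
        .iter()
        .rev()
        .take_while(|r| **r == PingResult::Failed)
        .count()
}
```

For the mock sequence above the helper yields 3, which is exactly the failover threshold.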
### Network Configuration
```bash
# Test network setup
ip addr add 192.168.1.10/24 dev eth0
ip addr add 192.168.2.10/24 dev eth1
ip route add default via 192.168.1.1 dev eth0 metric 10
ip route add default via 192.168.2.1 dev eth1 metric 20
```
## Test Coverage Goals
- **Unit Tests**: 90%+ code coverage
- **Integration Tests**: All major component interactions
- **E2E Tests**: All user scenarios and edge cases
- **Performance Tests**: Resource usage and timing validation


@@ -54,6 +54,18 @@ struct Config {
     failback_delay: u64,
 }
+fn apply_env_overrides(mut config: Config) -> Config {
+    config.primary_interface =
+        std::env::var("PRIMARY_INTERFACE").unwrap_or(config.primary_interface);
+    config.secondary_interface =
+        std::env::var("SECONDARY_INTERFACE").unwrap_or(config.secondary_interface);
+    config.primary_gateway = std::env::var("PRIMARY_GATEWAY").unwrap_or(config.primary_gateway);
+    config.secondary_gateway =
+        std::env::var("SECONDARY_GATEWAY").unwrap_or(config.secondary_gateway);
+    config.ping_target = std::env::var("PING_TARGET").unwrap_or(config.ping_target);
+    config
+}
 #[tokio::main]
 async fn main() -> Result<()> {
     let env = Env::default().filter_or("RUST_LOG", "info");
@@ -64,22 +76,7 @@ async fn main() -> Result<()> {
     let config = Config::parse();
     // Override with environment variables if present
-    let primary_interface =
-        std::env::var("PRIMARY_INTERFACE").unwrap_or(config.primary_interface.clone());
-    let secondary_interface =
-        std::env::var("SECONDARY_INTERFACE").unwrap_or(config.secondary_interface.clone());
-    let primary_gateway =
-        std::env::var("PRIMARY_GATEWAY").unwrap_or(config.primary_gateway.clone());
-    let secondary_gateway =
-        std::env::var("SECONDARY_GATEWAY").unwrap_or(config.secondary_gateway.clone());
-    let ping_target = std::env::var("PING_TARGET").unwrap_or(config.ping_target.clone());
-    let mut config_with_env = config;
-    config_with_env.primary_interface = primary_interface;
-    config_with_env.secondary_interface = secondary_interface;
-    config_with_env.primary_gateway = primary_gateway;
-    config_with_env.secondary_gateway = secondary_gateway;
-    config_with_env.ping_target = ping_target;
+    let config_with_env = apply_env_overrides(config);
     debug!("Configuration: {:?}", config_with_env);
@@ -127,6 +124,45 @@ async fn main() -> Result<()> {
 use state_machine::StateMachine;
+async fn handle_ping_result(
+    result: pinger::PingResult,
+    interface_name: &str,
+    state_machine: &Arc<tokio::sync::Mutex<StateMachine>>,
+    last_failover: &Arc<tokio::sync::Mutex<Option<chrono::DateTime<Utc>>>>,
+    route_manager: &mut routing::RouteManager,
+    primary_gateway: &Ipv4Addr,
+    secondary_gateway: &Ipv4Addr,
+    config: &Config,
+) -> Result<()> {
+    debug!("{} ping result: {}", interface_name, result);
+    let mut sm = state_machine.lock().await;
+    // Add result to appropriate history based on interface
+    if interface_name == "primary" {
+        sm.add_primary_result(result);
+    } else {
+        sm.add_secondary_result(result);
+    }
+    if let Some((old_state, new_state)) = sm.update_state() {
+        let mut last_failover_lock = last_failover.lock().await;
+        if new_state == state_machine::State::Fallback
+            && old_state != state_machine::State::Fallback
+        {
+            *last_failover_lock = Some(Utc::now());
+        }
+        state_machine::handle_state_change(
+            new_state,
+            old_state,
+            route_manager,
+            primary_gateway,
+            secondary_gateway,
+            config,
+        )?;
+    }
+    Ok(())
+}
 async fn main_service(
     config: Config,
     primary_gateway: Ipv4Addr,
@@ -204,32 +240,30 @@ async fn main_service(
         tokio::select! {
             // Handle primary ping results
             Some(result) = primary_rx.recv() => {
-                debug!("Primary ping result: {}", result);
-                let mut sm = state_machine.lock().await;
-                sm.add_primary_result(result);
-                if let Some((old_state, new_state)) = sm.update_state() {
-                    let mut last_failover_lock = last_failover.lock().await;
-                    if new_state == state_machine::State::Fallback && old_state != state_machine::State::Fallback {
-                        *last_failover_lock = Some(Utc::now());
-                    }
-                    state_machine::handle_state_change(new_state, old_state, &mut route_manager, &primary_gateway, &secondary_gateway, &config)?;
-                }
+                handle_ping_result(
+                    result,
+                    "primary",
+                    &state_machine,
+                    &last_failover,
+                    &mut route_manager,
+                    &primary_gateway,
+                    &secondary_gateway,
+                    &config,
+                ).await?;
             }
             // Handle secondary ping results
             Some(result) = secondary_rx.recv() => {
-                debug!("Secondary ping result: {}", result);
-                let mut sm = state_machine.lock().await;
-                sm.add_secondary_result(result);
-                if let Some((old_state, new_state)) = sm.update_state() {
-                    let mut last_failover_lock = last_failover.lock().await;
-                    if new_state == state_machine::State::Fallback && old_state != state_machine::State::Fallback {
-                        *last_failover_lock = Some(Utc::now());
-                    }
-                    state_machine::handle_state_change(new_state, old_state, &mut route_manager, &primary_gateway, &secondary_gateway, &config)?;
-                }
+                handle_ping_result(
+                    result,
+                    "secondary",
+                    &state_machine,
+                    &last_failover,
+                    &mut route_manager,
+                    &primary_gateway,
+                    &secondary_gateway,
+                    &config,
+                ).await?;
             }
             // Handle shutdown signal


@@ -2,11 +2,20 @@ use anyhow::Result;
 use libc::if_nametoindex;
 use log::{debug, info};
 use netlink_packet_route::route::RouteAddress;
+use netlink_packet_route::{
+    AddressFamily, RouteNetlinkMessage,
+    route::{RouteAttribute, RouteHeader, RouteMessage, RouteProtocol, RouteScope, RouteType},
+};
 use std::ffi::CString;
 use std::net::Ipv4Addr;
 const MAIN_TABLE_ID: u8 = 254;
+// Route metrics - higher priority = lower number
+const FAILOVER_METRIC: u32 = 5;
+const PRIMARY_METRIC: u32 = 10;
+const SECONDARY_METRIC: u32 = 20;
 #[derive(Debug, Clone)]
 pub struct RouteInfo {
     pub gateway: Ipv4Addr,
@@ -60,8 +69,6 @@ impl RouteManager {
     }
     pub fn set_primary_route(&mut self, gateway: Ipv4Addr, interface: String) -> Result<()> {
-        let primary_metric = 10;
         // Remove existing routes for this interface if any
         if let Some(pos) = self.routes.iter().position(|r| r.interface == interface) {
             let existing_route = self.routes[pos].clone();
@@ -73,7 +80,7 @@ impl RouteManager {
         }
         // Add as primary route
-        self.add_route(gateway, interface, primary_metric)?;
+        self.add_route(gateway, interface, PRIMARY_METRIC)?;
         Ok(())
     }
@@ -88,20 +95,17 @@ impl RouteManager {
         self.set_primary_route(primary_gateway, primary_interface)?;
         // Set secondary route with metric 20 (lower priority)
-        let secondary_metric = 20;
-        self.add_route(secondary_gateway, secondary_interface, secondary_metric)?;
+        self.add_route(secondary_gateway, secondary_interface, SECONDARY_METRIC)?;
         Ok(())
     }
     pub fn add_failover_route(&mut self, gateway: Ipv4Addr, interface: String) -> Result<()> {
-        let failover_metric = 5; // Higher priority than both primary (10) and secondary (20)
-        self.add_route(gateway, interface, failover_metric)?;
+        self.add_route(gateway, interface, FAILOVER_METRIC)?;
         Ok(())
     }
     pub fn remove_failover_route(&mut self, gateway: Ipv4Addr, interface: String) -> Result<()> {
-        let failover_metric = 5;
-        self.remove_route(gateway, &interface, failover_metric)?;
+        self.remove_route(gateway, &interface, FAILOVER_METRIC)?;
         Ok(())
     }
@@ -123,36 +127,16 @@ impl RouteManager {
use netlink_packet_core::{ use netlink_packet_core::{
NLM_F_ACK, NLM_F_CREATE, NLM_F_REQUEST, NetlinkHeader, NetlinkMessage, NetlinkPayload, NLM_F_ACK, NLM_F_CREATE, NLM_F_REQUEST, NetlinkHeader, NetlinkMessage, NetlinkPayload,
}; };
use netlink_packet_route::{
AddressFamily, RouteNetlinkMessage,
route::RouteProtocol,
route::RouteScope,
route::{RouteAttribute, RouteHeader, RouteMessage, RouteType},
};
use netlink_sys::{Socket, SocketAddr, protocols::NETLINK_ROUTE}; use netlink_sys::{Socket, SocketAddr, protocols::NETLINK_ROUTE};
let mut socket = Socket::new(NETLINK_ROUTE)?; let mut socket = Socket::new(NETLINK_ROUTE)?;
let _port_number = socket.bind_auto()?.port_number(); let _port_number = socket.bind_auto()?.port_number();
socket.connect(&SocketAddr::new(0, 0))?; socket.connect(&SocketAddr::new(0, 0))?;
let route_msg_hdr = RouteHeader {
address_family: AddressFamily::Inet,
table: MAIN_TABLE_ID,
destination_prefix_length: 0, // Default route
protocol: RouteProtocol::Boot,
scope: RouteScope::Universe,
kind: RouteType::Unicast,
..Default::default()
};
let mut route_msg = RouteMessage::default(); let route_msg = create_route_message(route_info.gateway, index, route_info.metric);
route_msg.header = route_msg_hdr;
route_msg.attributes = vec![
RouteAttribute::Gateway(RouteAddress::Inet(route_info.gateway)),
RouteAttribute::Oif(index),
RouteAttribute::Priority(route_info.metric),
];
let mut nl_hdr = NetlinkHeader::default(); let mut nl_hdr = NetlinkHeader::default();
nl_hdr.flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_ACK; // Remove NLM_F_EXCL to allow updates nl_hdr.flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_ACK;
let mut msg = NetlinkMessage::new( let mut msg = NetlinkMessage::new(
nl_hdr, nl_hdr,
@@ -161,11 +145,8 @@ impl RouteManager {
msg.finalize(); msg.finalize();
let mut buf = vec![0; 1024 * 8]; let mut buf = vec![0; 1024 * 8];
msg.serialize(&mut buf[..msg.buffer_len()]); msg.serialize(&mut buf[..msg.buffer_len()]);
// Debug: Log the netlink message being sent
debug!("Netlink message being sent: {:?}", &buf[..msg.buffer_len()]);
debug!( debug!(
"Route addition attempt: gateway={}, interface={}, metric={}, interface_index={}", "Route addition attempt: gateway={}, interface={}, metric={}, interface_index={}",
route_info.gateway, route_info.interface, route_info.metric, index route_info.gateway, route_info.interface, route_info.metric, index
@@ -198,33 +179,18 @@ impl RouteManager {
route_info.metric route_info.metric
); );
} else { } else {
let error_str = match error_code { return handle_netlink_error(error_code);
-1 => "EPERM - Operation not permitted (need root privileges)",
-2 => "ENOENT - No such file or directory",
-13 => "EACCES - Permission denied",
-22 => "EINVAL - Invalid argument",
_ => "Unknown error",
};
return Err(anyhow::anyhow!(
"Failed to add route: {} (code: {}): {:?}",
error_str,
error_code,
error_msg
));
} }
} }
debug!("Route added successfully"); debug!("Route added successfully");
} }
Ok(())
} }
Err(e) => { Err(e) => Err(anyhow::anyhow!(
return Err(anyhow::anyhow!( "Failed to deserialize netlink message: {}",
"Failed to deserialize netlink message: {}", e
e )),
));
}
} }
Ok(())
} }
fn delete_default_route_internal( fn delete_default_route_internal(
@@ -242,35 +208,13 @@ impl RouteManager {
use netlink_packet_core::{ use netlink_packet_core::{
NLM_F_ACK, NLM_F_REQUEST, NetlinkHeader, NetlinkMessage, NetlinkPayload, NLM_F_ACK, NLM_F_REQUEST, NetlinkHeader, NetlinkMessage, NetlinkPayload,
}; };
use netlink_packet_route::{
AddressFamily, RouteNetlinkMessage,
route::RouteProtocol,
route::RouteScope,
route::{RouteAttribute, RouteHeader, RouteMessage, RouteType},
};
use netlink_sys::{Socket, SocketAddr, protocols::NETLINK_ROUTE}; use netlink_sys::{Socket, SocketAddr, protocols::NETLINK_ROUTE};
let mut socket = Socket::new(NETLINK_ROUTE)?; let mut socket = Socket::new(NETLINK_ROUTE)?;
let _port_number = socket.bind_auto()?.port_number(); let _port_number = socket.bind_auto()?.port_number();
socket.connect(&SocketAddr::new(0, 0))?; socket.connect(&SocketAddr::new(0, 0))?;
let route_msg_hdr = RouteHeader { let route_msg = create_route_message(gateway, index, metric);
address_family: AddressFamily::Inet,
table: MAIN_TABLE_ID,
destination_prefix_length: 0, // Default route
protocol: RouteProtocol::Boot,
scope: RouteScope::Universe,
kind: RouteType::Unicast,
..Default::default()
};
let mut route_msg = RouteMessage::default();
route_msg.header = route_msg_hdr;
route_msg.attributes = vec![
RouteAttribute::Gateway(RouteAddress::Inet(gateway)),
RouteAttribute::Oif(index),
RouteAttribute::Priority(metric),
];
let mut nl_hdr = NetlinkHeader::default(); let mut nl_hdr = NetlinkHeader::default();
nl_hdr.flags = NLM_F_REQUEST | NLM_F_ACK; nl_hdr.flags = NLM_F_REQUEST | NLM_F_ACK;
@@ -282,14 +226,8 @@ impl RouteManager {
msg.finalize(); msg.finalize();
let mut buf = vec![0; 1024 * 8]; let mut buf = vec![0; 1024 * 8];
msg.serialize(&mut buf[..msg.buffer_len()]); msg.serialize(&mut buf[..msg.buffer_len()]);
// Debug: Log the netlink message being sent
debug!(
"Netlink delete message being sent: {:?}",
&buf[..msg.buffer_len()]
);
debug!( debug!(
"Route deletion attempt: gateway={}, interface={}, metric={}, interface_index={}", "Route deletion attempt: gateway={}, interface={}, metric={}, interface_index={}",
gateway, interface, metric, index gateway, interface, metric, index
@@ -315,16 +253,13 @@ impl RouteManager {
} }
debug!("Route deleted successfully"); debug!("Route deleted successfully");
} }
Ok(())
} }
Err(e) => { Err(e) => Err(anyhow::anyhow!(
return Err(anyhow::anyhow!( "Failed to deserialize netlink message: {}",
"Failed to deserialize netlink message: {}", e
e )),
));
}
} }
Ok(())
} }
} }
@@ -338,3 +273,44 @@ fn get_interface_index(iface_name: &str) -> Result<u32> {
Ok(index) Ok(index)
} }
} }
fn create_route_header() -> RouteHeader {
RouteHeader {
address_family: AddressFamily::Inet,
table: MAIN_TABLE_ID,
destination_prefix_length: 0, // Default route
protocol: RouteProtocol::Boot,
scope: RouteScope::Universe,
kind: RouteType::Unicast,
..Default::default()
}
}
fn handle_netlink_error(error_code: i32) -> Result<()> {
if error_code == -17 {
// EEXIST - Route already exists, treat as success
return Ok(());
}
let error_str = match error_code {
-1 => "EPERM - Operation not permitted (need root privileges)",
-2 => "ENOENT - No such file or directory",
-13 => "EACCES - Permission denied",
-22 => "EINVAL - Invalid argument",
_ => "Unknown error",
};
Err(anyhow::anyhow!("Netlink operation failed: {}", error_str))
}
fn create_route_message(gateway: Ipv4Addr, interface_index: u32, metric: u32) -> RouteMessage {
let route_msg_hdr = create_route_header();
let mut route_msg = RouteMessage::default();
route_msg.header = route_msg_hdr;
route_msg.attributes = vec![
RouteAttribute::Gateway(RouteAddress::Inet(gateway)),
RouteAttribute::Oif(interface_index),
RouteAttribute::Priority(metric),
];
route_msg
}
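
Review note on `handle_netlink_error`: the new helper matches on raw negative numbers and, unlike the code it replaces, no longer includes the numeric code in the error message. The sketch below shows one way to keep the code in the message and use named errno constants. It is an illustration under assumptions, not part of this changeset: `NetlinkError` and `classify_netlink_error` are hypothetical names, the constants are copied locally from the Linux errno values (the crate could take them from `libc` instead), and the real code returns `anyhow::Result<()>`.

```rust
use std::fmt;

// Linux errno values, mirrored locally so the sketch has no dependencies.
// Assumption: the real crate would more likely use libc::EPERM and friends.
const EPERM: i32 = 1;
const ENOENT: i32 = 2;
const EACCES: i32 = 13;
const EEXIST: i32 = 17;
const EINVAL: i32 = 22;

/// Hypothetical error type; the diff itself uses `anyhow::Error`.
#[derive(Debug, PartialEq)]
pub struct NetlinkError {
    /// Raw code as carried in the netlink error message (a negated errno).
    pub code: i32,
    pub description: &'static str,
}

impl fmt::Display for NetlinkError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "Netlink operation failed: {} (code: {})",
            self.description, self.code
        )
    }
}

impl std::error::Error for NetlinkError {}

/// Maps a netlink error code to a result.
/// A code of 0 is a plain ACK; EEXIST is treated as success, as in the diff.
pub fn classify_netlink_error(error_code: i32) -> Result<(), NetlinkError> {
    // Netlink reports a negated errno; flip the sign before matching.
    // wrapping_neg avoids an overflow panic on i32::MIN.
    let errno = error_code.wrapping_neg();
    let description = match errno {
        0 | EEXIST => return Ok(()),
        EPERM => "EPERM - Operation not permitted (need root privileges)",
        ENOENT => "ENOENT - No such file or directory",
        EACCES => "EACCES - Permission denied",
        EINVAL => "EINVAL - Invalid argument",
        _ => "Unknown error",
    };
    Err(NetlinkError {
        code: error_code,
        description,
    })
}
```

With this shape, an unrecognized code such as `-95` still appears in the log line as `Unknown error (code: -95)`, which the string-only version in the diff would hide.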