cleanup docs
127
README.md
@@ -8,31 +8,25 @@ Route-Switcher monitors connectivity to specified IP addresses via multiple netw
|
|||||||
|
|
||||||
## Architecture
|
## Architecture
|
||||||
|
|
||||||
### Core Components
|
Route-Switcher consists of three main components:
|
||||||
|
|
||||||
1. **Async Pingers** (`src/pinger.rs`)
|
1. **Async Pingers** (`src/pinger.rs`) - ICMP monitoring with explicit interface binding
|
||||||
- Dual-interface ICMP monitoring
|
2. **Route Manager** (`src/routing.rs`) - Netlink-based route manipulation
|
||||||
- Explicit interface binding (equivalent to `ping -I <interface>`)
|
3. **State Machine** (`src/main.rs`) - Failover logic with anti-flapping protection
|
||||||
- Configurable ping targets and intervals
|
|
||||||
- Async/await implementation with tokio
|
|
||||||
|
|
||||||
2. **Route Manager** (`src/routing.rs`)
|
### State Machine
|
||||||
- Netlink-based route manipulation
|
```
|
||||||
- No external dependencies on `ip` command
|
Boot → Primary: After 10 seconds of sampling
|
||||||
- Route addition and deletion
|
Primary → Fallback: After 3 consecutive failures AND secondary is healthy
|
||||||
- Metric-based route prioritization
|
Fallback → Primary: After 60 seconds of stable primary connectivity
|
||||||
|
```
|
||||||
|
|
||||||
3. **State Machine** (`src/main.rs`)
|
### Route Management Strategy
|
||||||
- Failover logic with anti-flapping protection
|
- **Primary route**: metric 10 (default priority)
|
||||||
- Three consecutive failures trigger failover
|
- **Secondary route**: metric 20 (lower priority)
|
||||||
- One minute of stable connectivity triggers failback
|
- **Failover route**: metric 5 (highest priority, added only during failover)
|
||||||
- Prevents switching when both interfaces fail
|
|
||||||
|
|
||||||
4. **Configuration**
|
The system maintains both base routes continuously and adds/removes the failover route as needed.
|
||||||
- Interface definitions (primary/secondary)
|
|
||||||
- Gateway configurations
|
|
||||||
- Ping targets and timing
|
|
||||||
- Route metrics
|
|
||||||
|
|
||||||
## Key Features
|
## Key Features
|
||||||
|
|
||||||
@@ -105,62 +99,74 @@ RUST_LOG=debug sudo cargo run
|
|||||||
RUST_LOG=info sudo cargo run
|
RUST_LOG=info sudo cargo run
|
||||||
```
|
```
|
||||||
|
|
||||||
## Testing Environment
|
## Testing
|
||||||
|
|
||||||
### Podman-Compose Setup
|
|
||||||
The project includes a complete testing environment using podman-compose:
|
|
||||||
|
|
||||||
|
### Quick Test
|
||||||
```bash
|
```bash
|
||||||
# Start test environment
|
# Start test environment
|
||||||
podman-compose up -d
|
podman-compose up -d
|
||||||
|
|
||||||
|
# Run automated failover test
|
||||||
|
./scripts/test-failover.sh
|
||||||
|
|
||||||
# View logs
|
# View logs
|
||||||
podman-compose logs -f route-switcher
|
podman-compose logs -f route-switcher
|
||||||
|
|
||||||
# Stop test environment
|
# Stop environment
|
||||||
podman-compose down
|
podman-compose down
|
||||||
```
|
```
|
||||||
|
|
||||||
### End-to-End Testing
|
### Manual Testing
|
||||||
```bash
|
```bash
|
||||||
# Simulate primary interface failure
|
# Test primary connectivity
|
||||||
podman-compose exec primary ip link set eth0 down
|
podman-compose exec route-switcher ping -c 3 -I eth0 192.168.202.100
|
||||||
|
|
||||||
# Observe failover in logs
|
# Test secondary connectivity
|
||||||
podman-compose logs -f route-switcher
|
podman-compose exec route-switcher ping -c 3 -I eth1 192.168.202.100
|
||||||
|
|
||||||
# Restore primary interface
|
# Simulate primary router failure
|
||||||
podman-compose exec primary ip link set eth0 up
|
podman-compose exec primary-router ip link set eth0 down
|
||||||
|
|
||||||
# Observe failback after 1 minute
|
# Check routing table
|
||||||
|
podman-compose exec route-switcher ip route show
|
||||||
```
|
```
|
||||||
|
|
||||||
## Implementation Details
|
## API (Optional)
|
||||||
|
|
||||||
### State Machine
|
The route-switcher includes an optional HTTP REST API for monitoring and control.
|
||||||
```
|
|
||||||
[Boot] -> [Primary] (after initial connectivity check)
|
### Configuration
|
||||||
[Primary] -> [Fallback] (after 3 consecutive failures)
|
```bash
|
||||||
[Fallback] -> [Primary] (after 60 seconds of stability)
|
# Enable API
|
||||||
|
API_ENABLED=true
|
||||||
|
API_USERNAME=admin
|
||||||
|
API_PASSWORD_HASH=<bcrypt-hash>
|
||||||
|
API_PORT=8080
|
||||||
```
|
```
|
||||||
|
|
||||||
### Route Management
|
### Endpoints
|
||||||
- Primary route: `ip r add default via <primary-gw> dev <primary-iface> metric 10`
|
- **GET /api/state** - Returns current state and ping statistics
|
||||||
- Secondary route: `ip r add default via <secondary-gw> dev <secondary-iface> metric 20`
|
- **POST /api/state** - Manually set state (primary/fallback)
|
||||||
- Routes are managed via netlink, not external commands
|
|
||||||
|
|
||||||
### Failover Logic
|
### Example Response
|
||||||
1. **Detection**: 3 consecutive ping failures on primary interface
|
```json
|
||||||
2. **Verification**: Secondary interface must be responsive
|
{
|
||||||
3. **Switch**: Update routing table to use secondary gateway
|
"state": "Primary",
|
||||||
4. **Monitor**: Continue monitoring both interfaces
|
"primary_stats": {
|
||||||
5. **Recovery**: After 60 seconds of stable primary connectivity, switch back
|
"success_rate": 95.5,
|
||||||
|
"failures": 2,
|
||||||
### Error Handling
|
"total_pings": 44,
|
||||||
- Graceful degradation on interface failures
|
"last_ping": "Ok"
|
||||||
- Comprehensive logging for debugging
|
},
|
||||||
- Signal handling for clean shutdown
|
"secondary_stats": {
|
||||||
- Recovery from temporary network issues
|
"success_rate": 98.2,
|
||||||
|
"failures": 1,
|
||||||
|
"total_pings": 56,
|
||||||
|
"last_ping": "Ok"
|
||||||
|
},
|
||||||
|
"last_failover": "2024-02-15T10:30:00Z"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
## Dependencies
|
## Dependencies
|
||||||
|
|
||||||
@@ -169,13 +175,8 @@ podman-compose exec primary ip link set eth0 up
|
|||||||
- `netlink-sys` - Netlink kernel communication
|
- `netlink-sys` - Netlink kernel communication
|
||||||
- `anyhow` - Error handling
|
- `anyhow` - Error handling
|
||||||
- `log` + `env_logger` - Logging
|
- `log` + `env_logger` - Logging
|
||||||
- `crossbeam-channel` - Inter-thread communication
|
- `clap` - Command line parsing
|
||||||
- `signal-hook` - Signal handling
|
|
||||||
|
|
||||||
## Development Phases
|
|
||||||
|
|
||||||
- [ ] End-to-end automated tests
|
|
||||||
|
|
||||||
## License
|
## License
|
||||||
|
|
||||||
GPLv3
|
GPLv
|
||||||
@@ -1,231 +0,0 @@
|
|||||||
# Route-Switcher API Design
|
|
||||||
|
|
||||||
## Overview
|
|
||||||
|
|
||||||
HTTP REST API with Basic Authentication for Home Assistant integration, exposing state machine state and ping statistics.
|
|
||||||
|
|
||||||
## Design Principles
|
|
||||||
|
|
||||||
- **Minimal surface area**: Only expose necessary information
|
|
||||||
- **Simple authentication**: HTTP Basic Auth (no JWT complexity)
|
|
||||||
- **State-focused**: Centered on state machine state and ping history
|
|
||||||
- **Home Assistant friendly**: Structured for HA REST integration
|
|
||||||
- **Opt-in**: API disabled by default
|
|
||||||
|
|
||||||
## API Endpoints
|
|
||||||
|
|
||||||
### GET /api/state
|
|
||||||
|
|
||||||
Returns current state machine state with ping statistics.
|
|
||||||
|
|
||||||
**Response:**
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"state": "Primary",
|
|
||||||
"primary_stats": {
|
|
||||||
"success_rate": 95.5,
|
|
||||||
"failures": 2,
|
|
||||||
"total_pings": 44,
|
|
||||||
"last_ping": "Ok"
|
|
||||||
},
|
|
||||||
"secondary_stats": {
|
|
||||||
"success_rate": 98.2,
|
|
||||||
"failures": 1,
|
|
||||||
"total_pings": 56,
|
|
||||||
"last_ping": "Ok"
|
|
||||||
},
|
|
||||||
"last_failover": "2024-02-15T10:30:00Z"
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
**Fields:**
|
|
||||||
- `state`: Current state machine state (Boot/Primary/Fallback)
|
|
||||||
- `primary_stats`: Ping statistics for primary interface
|
|
||||||
- `secondary_stats`: Ping statistics for secondary interface
|
|
||||||
- `last_failover`: ISO 8601 timestamp of last failover (null if never)
|
|
||||||
|
|
||||||
### POST /api/state
|
|
||||||
|
|
||||||
Manually set state machine state.
|
|
||||||
|
|
||||||
**Request:**
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"state": "fallback"
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
**Response:**
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"state": "Fallback",
|
|
||||||
"previous_state": "Primary",
|
|
||||||
"primary_stats": { ... },
|
|
||||||
"secondary_stats": { ... },
|
|
||||||
"last_failover": "2024-02-15T10:30:00Z"
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
**Valid states:** `primary`, `fallback`
|
|
||||||
|
|
||||||
## Authentication
|
|
||||||
|
|
||||||
HTTP Basic Authentication with username/password configured via environment variables.
|
|
||||||
|
|
||||||
**Security considerations:**
|
|
||||||
- Passwords stored as bcrypt hash
|
|
||||||
- HTTPS recommended for production
|
|
||||||
- Local network access only
|
|
||||||
- No token management (stateless)
|
|
||||||
|
|
||||||
## Data Structures
|
|
||||||
|
|
||||||
### PingStats
|
|
||||||
|
|
||||||
Calculated from state machine ping history (60 entries per interface):
|
|
||||||
|
|
||||||
```rust
|
|
||||||
struct PingStats {
|
|
||||||
success_rate: f64, // Percentage of successful pings
|
|
||||||
failures: usize, // Number of failed pings in history
|
|
||||||
total_pings: usize, // Total pings in history
|
|
||||||
last_ping: String, // "Ok" or "Failed"
|
|
||||||
}
|
|
||||||
```
|
|
||||||
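As a rough illustration of how these fields could be derived from a 60-entry history, here is a minimal sketch using only the standard library. `PingHistory`, `record`, and `stats` are hypothetical names, not taken from the project, and reporting `"Failed"` for an empty history is an assumption.

```rust
use std::collections::VecDeque;

/// Number of results kept per interface, per the text above.
const HISTORY_LEN: usize = 60;

#[derive(Debug, Clone, PartialEq)]
pub struct PingStats {
    pub success_rate: f64,  // Percentage of successful pings
    pub failures: usize,    // Number of failed pings in history
    pub total_pings: usize, // Total pings in history
    pub last_ping: String,  // "Ok" or "Failed"
}

/// Hypothetical bounded history; `true` marks a successful ping.
#[derive(Debug, Default)]
pub struct PingHistory {
    results: VecDeque<bool>,
}

impl PingHistory {
    /// Append a result, dropping the oldest one once the window is full.
    pub fn record(&mut self, ok: bool) {
        if self.results.len() == HISTORY_LEN {
            self.results.pop_front();
        }
        self.results.push_back(ok);
    }

    /// Summarize the current window.
    pub fn stats(&self) -> PingStats {
        let total_pings = self.results.len();
        let failures = self.results.iter().filter(|ok| !**ok).count();
        let success_rate = if total_pings == 0 {
            0.0
        } else {
            (total_pings - failures) as f64 / total_pings as f64 * 100.0
        };
        // Assumption: an empty history is reported as "Failed".
        let last_ping = match self.results.back() {
            Some(true) => "Ok",
            _ => "Failed",
        };
        PingStats {
            success_rate,
            failures,
            total_pings,
            last_ping: last_ping.to_string(),
        }
    }
}
```

With 44 recorded pings of which 2 failed, `success_rate` comes out at roughly 95.45, which matches the `95.5` in the example response once rounded to one decimal place.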
|
|
||||||
### StateResponse
|
|
||||||
|
|
||||||
```rust
|
|
||||||
struct StateResponse {
|
|
||||||
state: String,
|
|
||||||
primary_stats: PingStats,
|
|
||||||
secondary_stats: PingStats,
|
|
||||||
last_failover: Option<String>,
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
## Home Assistant Integration
|
|
||||||
|
|
||||||
### REST Sensor Configuration
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
sensor:
|
|
||||||
- platform: rest
|
|
||||||
name: Route Switcher State
|
|
||||||
resource: http://route-switcher.local:8080/api/state
|
|
||||||
username: !secret route_switcher_user
|
|
||||||
password: !secret route_switcher_pass
|
|
||||||
value_template: "{{ value_json.state }}"
|
|
||||||
json_attributes:
|
|
||||||
- primary_stats
|
|
||||||
- secondary_stats
|
|
||||||
- last_failover
|
|
||||||
|
|
||||||
- platform: template
|
|
||||||
sensors:
|
|
||||||
route_switcher_primary_success_rate:
|
|
||||||
value_template: "{{ state_attr('sensor.route_switcher_state', 'primary_stats').success_rate | default(0) }}"
|
|
||||||
unit_of_measurement: "%"
|
|
||||||
route_switcher_secondary_success_rate:
|
|
||||||
value_template: "{{ state_attr('sensor.route_switcher_state', 'secondary_stats').success_rate | default(0) }}"
|
|
||||||
unit_of_measurement: "%"
|
|
||||||
route_switcher_primary_failures:
|
|
||||||
value_template: "{{ state_attr('sensor.route_switcher_state', 'primary_stats').failures | default(0) }}"
|
|
||||||
route_switcher_secondary_failures:
|
|
||||||
value_template: "{{ state_attr('sensor.route_switcher_state', 'secondary_stats').failures | default(0) }}"
|
|
||||||
|
|
||||||
switch:
|
|
||||||
- platform: rest
|
|
||||||
name: Route Switcher Control
|
|
||||||
resource: http://route-switcher.local:8080/api/state
|
|
||||||
username: !secret route_switcher_user
|
|
||||||
password: !secret route_switcher_pass
|
|
||||||
body_on: '{"state": "fallback"}'
|
|
||||||
body_off: '{"state": "primary"}'
|
|
||||||
is_on_template: "{{ value_json.state | lower == 'fallback' }}"
|
|
||||||
```
|
|
||||||
|
|
||||||
## Configuration
|
|
||||||
|
|
||||||
### Environment Variables
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# API Configuration
|
|
||||||
API_ENABLED=true
|
|
||||||
API_BIND_ADDRESS=0.0.0.0
|
|
||||||
API_PORT=8080
|
|
||||||
API_USERNAME=admin
|
|
||||||
API_PASSWORD_HASH=<bcrypt-hash>
|
|
||||||
|
|
||||||
# CORS Configuration
|
|
||||||
API_CORS_ORIGINS=http://homeassistant.local:8123
|
|
||||||
```
|
|
||||||
|
|
||||||
### Password Hash Generation
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Generate bcrypt hash
|
|
||||||
htpasswd -bnBC 10 "" "your-password" | tr -d ':\n'
|
|
||||||
```
|
|
||||||
|
|
||||||
## Implementation Details
|
|
||||||
|
|
||||||
### Dependencies
|
|
||||||
|
|
||||||
```toml
|
|
||||||
axum = "0.7"
|
|
||||||
tokio = { version = "1.42", features = ["full"] }
|
|
||||||
tower = "0.4"
|
|
||||||
tower-http = { version = "0.5", features = ["cors", "auth"] }
|
|
||||||
serde = { version = "1.0", features = ["derive"] }
|
|
||||||
serde_json = "1.0"
|
|
||||||
chrono = { version = "0.4", features = ["serde"] }
|
|
||||||
bcrypt = "0.15"
|
|
||||||
base64 = "0.22"
|
|
||||||
```
|
|
||||||
|
|
||||||
### Architecture
|
|
||||||
|
|
||||||
- **API Module**: `src/api.rs` - HTTP server and endpoints
|
|
||||||
- **State Sharing**: Thread-safe access to state machine and ping history
|
|
||||||
- **Authentication**: Basic Auth middleware with bcrypt validation
|
|
||||||
- **Error Handling**: Standardized JSON error responses
|
|
||||||
- **Integration**: Minimal changes to existing state machine
|
|
||||||
|
|
||||||
### Thread Safety
|
|
||||||
|
|
||||||
- `Arc<Mutex<StateMachine>>` for shared state access
|
|
||||||
- Non-blocking async operations
|
|
||||||
- Minimal locking duration
|
|
||||||
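The "minimal locking duration" point can be sketched as follows: copy what is needed while holding the lock, then release it before any slow work such as building a response. This is a standard-library sketch only; `StateMachine` here is a hypothetical stand-in with made-up fields, and `snapshot` and `set_state` are illustrative helper names.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Boot,
    Primary,
    Fallback,
}

/// Hypothetical stand-in for the real state machine.
#[derive(Debug)]
pub struct StateMachine {
    pub state: State,
    pub transitions: u64,
}

pub type SharedStateMachine = Arc<Mutex<StateMachine>>;

/// Copy the values out under the lock; the guard is dropped on return,
/// so callers never hold the mutex while formatting a response.
pub fn snapshot(shared: &SharedStateMachine) -> (State, u64) {
    let guard = shared.lock().expect("state machine mutex poisoned");
    (guard.state, guard.transitions)
}

/// Set a new state and return the previous one.
pub fn set_state(shared: &SharedStateMachine, new_state: State) -> State {
    let mut guard = shared.lock().expect("state machine mutex poisoned");
    let previous = guard.state;
    if previous != new_state {
        guard.state = new_state;
        guard.transitions += 1;
    }
    previous
}
```

In an async handler the same idea applies: take the lock, copy, and drop the guard before the next `.await`, so a `std::sync::Mutex` guard is never held across a suspension point.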
|
|
||||||
## Error Handling
|
|
||||||
|
|
||||||
Standardized error responses:
|
|
||||||
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"error": "Invalid state",
|
|
||||||
"message": "State must be 'primary' or 'fallback'"
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
HTTP Status Codes:
|
|
||||||
- 200: Success
|
|
||||||
- 400: Bad Request (invalid state)
|
|
||||||
- 401: Unauthorized (invalid credentials)
|
|
||||||
- 500: Internal Server Error
|
|
||||||
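One way to keep the status codes and error bodies above in a single place is an error enum. The sketch below is an assumption about structure, not the project's code: `ApiError`, `RequestedState`, and `parse_state` are hypothetical names, and the `error`/`message` texts for 401 and 500 are invented for illustration (only the "Invalid state" body appears in this document).

```rust
/// States a client may request via POST /api/state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RequestedState {
    Primary,
    Fallback,
}

/// Hypothetical error type covering the non-200 codes listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApiError {
    InvalidState,
    Unauthorized,
    Internal,
}

impl ApiError {
    /// HTTP status code for this error.
    pub fn status_code(&self) -> u16 {
        match self {
            ApiError::InvalidState => 400,
            ApiError::Unauthorized => 401,
            ApiError::Internal => 500,
        }
    }

    /// JSON body in the `{"error", "message"}` shape shown above.
    /// Only static strings are interpolated, so no escaping is needed.
    pub fn to_json(&self) -> String {
        let (error, message) = match self {
            ApiError::InvalidState => ("Invalid state", "State must be 'primary' or 'fallback'"),
            // The two texts below are assumptions made for this sketch.
            ApiError::Unauthorized => ("Unauthorized", "Invalid credentials"),
            ApiError::Internal => ("Internal error", "Unexpected server failure"),
        };
        format!(r#"{{"error": "{}", "message": "{}"}}"#, error, message)
    }
}

/// Parse the `state` field of a POST body.
pub fn parse_state(value: &str) -> Result<RequestedState, ApiError> {
    match value {
        "primary" => Ok(RequestedState::Primary),
        "fallback" => Ok(RequestedState::Fallback),
        _ => Err(ApiError::InvalidState),
    }
}
```

A handler would call `parse_state` on the request body's `state` value and, on error, reply with `status_code()` and `to_json()`.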
|
|
||||||
## Security Considerations
|
|
||||||
|
|
||||||
- Network access restrictions (local only recommended)
|
|
||||||
- HTTPS for credential protection
|
|
||||||
- Rate limiting considerations
|
|
||||||
- Audit logging for manual state changes
|
|
||||||
- No configuration exposure (state only)
|
|
||||||
|
|
||||||
## Backward Compatibility
|
|
||||||
|
|
||||||
- API disabled by default
|
|
||||||
- No changes to existing CLI functionality
|
|
||||||
- Service continues without API if disabled
|
|
||||||
- Graceful degradation on API errors
|
|
||||||
@@ -1,167 +0,0 @@
|
|||||||
# Architecture Documentation
|
|
||||||
|
|
||||||
## System Overview
|
|
||||||
|
|
||||||
Route-Switcher is a network failover system that operates at the application layer to provide automatic network redundancy. The system monitors network connectivity through multiple interfaces and manages routing tables to ensure continuous connectivity.
|
|
||||||
|
|
||||||
## Component Architecture
|
|
||||||
|
|
||||||
```
|
|
||||||
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
|
|
||||||
│ Main Thread │ │ Async Pingers │ │ Route Manager │
|
|
||||||
│ │ │ │ │ │
|
|
||||||
│ • State Machine │◄──►│ • Interface A │◄──►│ • Netlink API │
|
|
||||||
│ • Decision Logic│ │ • Interface B │ │ • Route Add/Del │
|
|
||||||
│ • Coordination │ │ • ICMP Monitoring│ │ • Metric Mgmt │
|
|
||||||
└─────────────────┘ └──────────────────┘ └─────────────────┘
|
|
||||||
│ │ │
|
|
||||||
└───────────────────────┼───────────────────────┘
|
|
||||||
│
|
|
||||||
┌──────────────────┐
|
|
||||||
│ Linux Kernel │
|
|
||||||
│ │
|
|
||||||
│ • Routing Table │
|
|
||||||
│ • Network Stack │
|
|
||||||
│ • Netlink Socket │
|
|
||||||
└──────────────────┘
|
|
||||||
```
|
|
||||||
|
|
||||||
## Data Flow
|
|
||||||
|
|
||||||
1. **Monitoring Phase**
|
|
||||||
- Async pingers send ICMP packets via both interfaces
|
|
||||||
- Results are collected and sent to main thread
|
|
||||||
- State machine evaluates connectivity patterns
|
|
||||||
|
|
||||||
2. **Decision Phase**
|
|
||||||
- State machine determines if failover is needed
|
|
||||||
- Verifies secondary interface health
|
|
||||||
- Triggers route changes if conditions are met
|
|
||||||
|
|
||||||
3. **Action Phase**
|
|
||||||
- Route manager updates kernel routing table
|
|
||||||
- Changes are applied via netlink interface
|
|
||||||
- System continues monitoring in new state
|
|
||||||
|
|
||||||
## State Machine Design
|
|
||||||
|
|
||||||
### States
|
|
||||||
- **Boot**: Initial state, gathering connectivity data
|
|
||||||
- **Primary**: Using primary interface for routing
|
|
||||||
- **Fallback**: Using secondary interface for routing
|
|
||||||
|
|
||||||
### Transitions
|
|
||||||
```
|
|
||||||
Boot → Primary: After 10 seconds of sampling (regardless of ping results)
|
|
||||||
Primary → Fallback: After 3 consecutive failures AND secondary is healthy
|
|
||||||
Fallback → Primary: After 60 seconds of stable primary connectivity
|
|
||||||
```
|
|
||||||
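The three transitions can be expressed as a single pure function, which also makes them easy to unit test. This is a minimal sketch, not the code in `src/main.rs`: the `Observation` struct and its field names are hypothetical, and only the thresholds (10 seconds, 3 failures, 60 seconds) come from the table above.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Boot,
    Primary,
    Fallback,
}

// Thresholds taken from the transition table.
const BOOT_SAMPLING: Duration = Duration::from_secs(10);
const FAILURE_THRESHOLD: u32 = 3;
const FAILBACK_STABLE: Duration = Duration::from_secs(60);

/// Hypothetical summary of what the pingers have observed so far.
#[derive(Debug, Clone, Copy)]
pub struct Observation {
    /// Time spent in the current state.
    pub time_in_state: Duration,
    /// Consecutive failed pings on the primary interface.
    pub primary_consecutive_failures: u32,
    /// Whether the secondary interface currently answers pings.
    pub secondary_healthy: bool,
    /// How long the primary has answered without a failure.
    pub primary_stable_for: Duration,
}

/// Returns the state that follows `current` given `obs`.
pub fn next_state(current: State, obs: &Observation) -> State {
    match current {
        // Boot ends after the sampling window, regardless of ping results.
        State::Boot if obs.time_in_state >= BOOT_SAMPLING => State::Primary,
        // Fail over only if the secondary can actually carry traffic.
        State::Primary
            if obs.primary_consecutive_failures >= FAILURE_THRESHOLD
                && obs.secondary_healthy =>
        {
            State::Fallback
        }
        // Fail back only after the primary has been stable long enough.
        State::Fallback if obs.primary_stable_for >= FAILBACK_STABLE => State::Primary,
        other => other,
    }
}
```

Because the function stays in `Primary` when the secondary is unhealthy, it also covers the dual-failure case in which no switch should happen.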
|
|
||||||
### Routing Behavior
|
|
||||||
- **Boot State**: Both routes are set up initially - primary (metric 10) and secondary (metric 20)
|
|
||||||
- **Primary State**: Primary route (metric 10) and secondary route (metric 20) present
|
|
||||||
- **Fallback State**: All three routes present - primary (metric 10), secondary (metric 20), and failover secondary (metric 5)
|
|
||||||
- **Exit**: Only the failover route (metric 5) is removed
|
|
||||||
|
|
||||||
### Route Management Strategy
|
|
||||||
The system follows a "both routes always present, extra failover on-demand" approach:
|
|
||||||
1. **Initialization**: Set up primary route (metric 10) and secondary route (metric 20)
|
|
||||||
2. **Boot Phase**: Collect 10 seconds of ping samples to establish baseline connectivity
|
|
||||||
3. **Normal Operation**: Primary route serves traffic (metric 10), secondary available as backup (metric 20)
|
|
||||||
4. **Failover**: Add extra secondary route with highest priority (metric 5) for immediate failover
|
|
||||||
5. **Failback**: Remove extra failover route when primary recovers
|
|
||||||
6. **Cleanup**: Only remove the extra failover route on exit, preserving base routes
|
|
||||||
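The "both routes always present, extra failover on-demand" approach can be modelled as a function from state to the desired set of default routes. The sketch below only computes that set; it does not talk to netlink. `Uplink`, `DefaultRoute`, `desired_routes`, and `active_route` are hypothetical names, while the metrics 10, 20, and 5 come from the text above.

```rust
use std::net::Ipv4Addr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Boot,
    Primary,
    Fallback,
}

pub const PRIMARY_METRIC: u32 = 10;
pub const SECONDARY_METRIC: u32 = 20;
pub const FAILOVER_METRIC: u32 = 5;

/// Hypothetical description of one uplink.
#[derive(Debug, Clone)]
pub struct Uplink {
    pub gateway: Ipv4Addr,
    pub interface: String,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DefaultRoute {
    pub gateway: Ipv4Addr,
    pub interface: String,
    pub metric: u32,
}

fn route(uplink: &Uplink, metric: u32) -> DefaultRoute {
    DefaultRoute {
        gateway: uplink.gateway,
        interface: uplink.interface.clone(),
        metric,
    }
}

/// Default routes that should exist while in `state`.
pub fn desired_routes(state: State, primary: &Uplink, secondary: &Uplink) -> Vec<DefaultRoute> {
    // The two base routes are present in every state.
    let mut routes = vec![
        route(primary, PRIMARY_METRIC),
        route(secondary, SECONDARY_METRIC),
    ];
    // Only Fallback adds the extra high-priority route via the secondary.
    if state == State::Fallback {
        routes.push(route(secondary, FAILOVER_METRIC));
    }
    routes
}

/// Among otherwise equal default routes, the lowest metric wins.
pub fn active_route(routes: &[DefaultRoute]) -> Option<&DefaultRoute> {
    routes.iter().min_by_key(|r| r.metric)
}
```

Failback then amounts to removing the one route that appears in the `Fallback` set but not in the `Primary` set, which is also the only route removed on exit.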
|
|
||||||
### State Persistence
|
|
||||||
- Current state is maintained in memory
|
|
||||||
- State changes are logged for debugging
|
|
||||||
- No persistent storage required (state rebuilds on restart)
|
|
||||||
|
|
||||||
## Interface Design
|
|
||||||
|
|
||||||
### Pinger Interface
|
|
||||||
```rust
|
|
||||||
pub trait Pinger {
|
|
||||||
async fn ping(&self, target: Ipv4Addr, interface: &str) -> PingResult;
|
|
||||||
async fn start_monitoring(&self, targets: &[Ipv4Addr], interfaces: &[String]) -> Receiver<PingResult>;
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
### Route Manager Interface
|
|
||||||
```rust
|
|
||||||
pub trait RouteManager {
|
|
||||||
fn add_default_route(&self, gateway: Ipv4Addr, interface: &str, metric: u32) -> Result<()>;
|
|
||||||
fn delete_default_route(&self, gateway: Ipv4Addr, interface: &str, metric: u32) -> Result<()>;
|
|
||||||
fn get_current_routes(&self) -> Result<Vec<RouteInfo>>;
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
## Threading Model
|
|
||||||
|
|
||||||
### Main Thread
|
|
||||||
- Runs the state machine
|
|
||||||
- Handles signals and graceful shutdown
|
|
||||||
- Coordinates between components
|
|
||||||
|
|
||||||
### Async Pinger Tasks
|
|
||||||
- One task per interface
|
|
||||||
- Non-blocking ICMP operations
|
|
||||||
- Results sent via channels
|
|
||||||
|
|
||||||
### Route Manager
|
|
||||||
- Synchronous operations (netlink is sync)
|
|
||||||
- Called from main thread
|
|
||||||
- Thread-safe operations
|
|
||||||
|
|
||||||
## Error Handling Strategy
|
|
||||||
|
|
||||||
### Categories
|
|
||||||
1. **Network Errors**: Temporary connectivity issues
|
|
||||||
2. **System Errors**: Permission problems, interface not found
|
|
||||||
3. **Configuration Errors**: Invalid IP addresses, missing interfaces
|
|
||||||
|
|
||||||
### Recovery Mechanisms
|
|
||||||
- **Network Errors**: Retry with exponential backoff
|
|
||||||
- **System Errors**: Log and exit (requires admin intervention)
|
|
||||||
- **Configuration Errors**: Validate on startup, exit if invalid
|
|
||||||
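For the "retry with exponential backoff" mechanism, a minimal standard-library sketch might look like the following. The function names, the doubling factor, and the cap are assumptions for illustration; the document does not specify the actual parameters.

```rust
use std::thread;
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): `base * 2^attempt`,
/// capped at `max`. Overflow is treated as "use the cap".
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}

/// Run `op` until it succeeds or `max_attempts` tries have been made,
/// sleeping with exponential backoff between tries. `op` always runs
/// at least once.
pub fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: hand the last error back to the caller.
            Err(err) if attempt + 1 >= max_attempts => return Err(err),
            Err(_) => {
                thread::sleep(backoff_delay(attempt, base, max));
                attempt += 1;
            }
        }
    }
}
```

This blocking version suits synchronous code such as the route manager; inside an async task the sleep would need to be replaced by the runtime's own timer.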
|
|
||||||
## Security Considerations
|
|
||||||
|
|
||||||
### Privileges
|
|
||||||
- Requires root privileges for route manipulation
|
|
||||||
- Drops unnecessary privileges where possible
|
|
||||||
- Validates all user inputs
|
|
||||||
|
|
||||||
### Network Security
|
|
||||||
- Only sends ICMP packets to configured targets
|
|
||||||
- No arbitrary packet crafting
|
|
||||||
- Interface binding prevents traffic leakage
|
|
||||||
|
|
||||||
## Performance Characteristics
|
|
||||||
|
|
||||||
### Resource Usage
|
|
||||||
- **Memory**: Minimal (~10MB)
|
|
||||||
- **CPU**: Low (periodic ICMP packets)
|
|
||||||
- **Network**: Very low (only ping traffic)
|
|
||||||
|
|
||||||
### Scalability
|
|
||||||
- Single target machine design
|
|
||||||
- Supports multiple ping targets
|
|
||||||
- Limited to 2 interfaces (current design)
|
|
||||||
|
|
||||||
## Testing Architecture
|
|
||||||
|
|
||||||
### Unit Tests
|
|
||||||
- Individual component testing
|
|
||||||
- Mock network interfaces
|
|
||||||
- State machine logic verification
|
|
||||||
|
|
||||||
### Integration Tests
|
|
||||||
- Component interaction testing
|
|
||||||
- Real network interface usage
|
|
||||||
- Netlink operation verification
|
|
||||||
|
|
||||||
### End-to-End Tests
|
|
||||||
- Full system testing in containers
|
|
||||||
- Network failure simulation
|
|
||||||
- Failover timing verification
|
|
||||||
233
doc/TESTING.md
@@ -1,112 +1,43 @@
|
|||||||
# Testing Guide
|
# Testing Guide
|
||||||
|
|
||||||
## Overview
|
## Test Environment
|
||||||
|
|
||||||
This document describes the testing strategy and environment for the Route-Switcher project.
|
The testing environment uses podman-compose to create a network topology with routers and an ICMP target:
|
||||||
|
|
||||||
## Testing Environment
|
|
||||||
|
|
||||||
### Podman-Compose Setup
|
|
||||||
|
|
||||||
The testing environment uses podman-compose to create a realistic network topology with routers and a single ICMP target:
|
|
||||||
|
|
||||||
```
|
```
|
||||||
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
|
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
|
||||||
│ Route-Switcher │ │ Primary Router│ │ │
|
│ Route-Switcher │ │ Primary Router│ │ ICMP Target │
|
||||||
│ │ │ │ │ │
|
│ │ │ │ │ │
|
||||||
│ eth0 ────────────┼────►│ eth0 ──────────┼────►│ ICMP Target │
|
│ eth0 ────────────┼────►│ eth0 ──────────┼────►│ 192.168.202.100│
|
||||||
│ eth1 ────────────┼────►│ eth1 ──────────┼────►│ │
|
│ eth1 ────────────┼────►│ eth1 ──────────┼────►│ │
|
||||||
│ │ │ │ │ │
|
|
||||||
└─────────────────┘ └─────────────────┘ └─────────────────┘
|
└─────────────────┘ └─────────────────┘ └─────────────────┘
|
||||||
│ │ │
|
|
||||||
│ │ │
|
|
||||||
▼ ▼ ▼
|
|
||||||
primary-net secondary-net target-net
|
|
||||||
192.168.1.0/24 192.168.2.0/24 10.0.0.0/24
|
|
||||||
```
|
```
|
||||||
|
|
||||||
### Container Architecture
|
### Container Setup
|
||||||
|
|
||||||
- **route-switcher**: Dual interfaces (eth0→primary-net, eth1→secondary-net)
|
- **route-switcher**: Dual interfaces (eth0→primary-net, eth1→secondary-net)
|
||||||
- **primary-router**: Connects primary-net ↔ target-net (192.168.1.1 ↔ 10.0.0.1)
|
- **primary-router**: Connects primary-net ↔ target-net (192.168.200.11 ↔ 192.168.202.11)
|
||||||
- **secondary-router**: Connects secondary-net ↔ target-net (192.168.2.1 ↔ 10.0.0.2)
|
- **secondary-router**: Connects secondary-net ↔ target-net (192.168.201.11 ↔ 192.168.202.12)
|
||||||
- **icmp-target**: Single IP on target-net (10.0.0.100), reachable via either router
|
- **icmp-target**: Single IP on target-net (192.168.202.100)
|
||||||
|
|
||||||
### Quick Start
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Start the testing environment
|
|
||||||
podman-compose up -d
|
|
||||||
|
|
||||||
# Run automated failover test
|
|
||||||
./scripts/test-failover.sh
|
|
||||||
|
|
||||||
# View logs
|
|
||||||
podman-compose logs -f route-switcher
|
|
||||||
|
|
||||||
# Stop environment
|
|
||||||
podman-compose down
|
|
||||||
```
|
|
||||||
|
|
||||||
### Network Configuration
|
|
||||||
|
|
||||||
**Route-Switcher:**
|
|
||||||
- eth0: 192.168.1.10 (primary network)
|
|
||||||
- eth1: 192.168.2.10 (secondary network)
|
|
||||||
- Default gateway: 192.168.1.1 (primary router)
|
|
||||||
|
|
||||||
**Primary Router:**
|
|
||||||
- eth0: 192.168.1.1 (primary network)
|
|
||||||
- eth1: 10.0.0.1 (target network)
|
|
||||||
- Routes traffic between networks with NAT
|
|
||||||
|
|
||||||
**Secondary Router:**
|
|
||||||
- eth0: 192.168.2.1 (secondary network)
|
|
||||||
- eth1: 10.0.0.2 (target network)
|
|
||||||
- Routes traffic between networks with NAT
|
|
||||||
|
|
||||||
**ICMP Target:**
|
|
||||||
- Single IP: 10.0.0.100
|
|
||||||
- Default route: 10.0.0.1 (primary router)
|
|
||||||
- Responds to ping from both routers
|
|
||||||
|
|
||||||
## Test Scenarios
|
## Test Scenarios
|
||||||
|
|
||||||
### 1. Basic Connectivity Test
|
### 1. Basic Connectivity Test
|
||||||
**Objective**: Verify basic ping functionality on both interfaces
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Start environment
|
|
||||||
podman-compose up -d
|
podman-compose up -d
|
||||||
|
podman-compose exec route-switcher ping -c 3 -I eth0 192.168.202.100
|
||||||
# Test primary connectivity
|
podman-compose exec route-switcher ping -c 3 -I eth1 192.168.202.100
|
||||||
podman-compose exec route-switcher ping -c 3 -I eth0 10.0.0.100
|
|
||||||
|
|
||||||
# Test secondary connectivity
|
|
||||||
podman-compose exec route-switcher ping -c 3 -I eth1 10.0.0.100
|
|
||||||
|
|
||||||
# Check routing table
|
|
||||||
podman-compose exec route-switcher ip route show
|
|
||||||
```
|
```
|
||||||
|
|
||||||
### 2. Failover Test
|
### 2. Failover Test
|
||||||
**Objective**: Verify automatic failover when primary router fails
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Start monitoring logs
|
# Monitor logs
|
||||||
podman-compose logs -f route-switcher &
|
podman-compose logs -f route-switcher &
|
||||||
|
|
||||||
# Simulate primary router failure
|
# Simulate primary router failure
|
||||||
podman-compose exec primary-router ip link set eth0 down
|
podman-compose exec primary-router ip link set eth0 down
|
||||||
|
|
||||||
# Verify failover occurs (should see in logs)
|
# Verify failover occurs and connectivity works
|
||||||
# Wait for state change to Fallback
|
podman-compose exec route-switcher ping -c 3 192.168.202.100
|
||||||
|
|
||||||
# Check routing table after failover
|
|
||||||
podman-compose exec route-switcher ip route show
|
|
||||||
|
|
||||||
# Test connectivity via secondary router
|
|
||||||
podman-compose exec route-switcher ping -c 3 10.0.0.100
|
|
||||||
|
|
||||||
# Restore primary router
|
# Restore primary router
|
||||||
podman-compose exec primary-router ip link set eth0 up
|
podman-compose exec primary-router ip link set eth0 up
|
||||||
@@ -115,119 +46,45 @@ podman-compose exec primary-router ip link set eth0 up
|
|||||||
```
|
```
|
||||||
|
|
||||||
### 3. Dual Failure Test
|
### 3. Dual Failure Test
|
||||||
**Objective**: Verify the system does not fail over when both routers fail
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Start monitoring logs
|
# Fail both routers - system should NOT switch
|
||||||
podman-compose logs -f route-switcher &
|
|
||||||
|
|
||||||
# Fail both routers
|
|
||||||
podman-compose exec primary-router ip link set eth0 down
|
podman-compose exec primary-router ip link set eth0 down
|
||||||
podman-compose exec secondary-router ip link set eth0 down
|
podman-compose exec secondary-router ip link set eth0 down
|
||||||
|
|
||||||
# Verify no routing changes occur
|
# Verify no routing changes occur
|
||||||
# System should remain in current state
|
|
||||||
|
|
||||||
# Restore routers
|
|
||||||
podman-compose exec primary-router ip link set eth0 up
|
|
||||||
podman-compose exec secondary-router ip link set eth0 up
|
|
||||||
```
|
```
|
||||||
|
|
||||||
### 4. Router Target Interface Failure
|
## Automated Testing
|
||||||
**Objective**: Test upstream network failure simulation
|
|
||||||
|
|
||||||
|
Run the comprehensive test script:
|
||||||
```bash
|
```bash
|
||||||
# Fail primary router's connection to target network
|
|
||||||
podman-compose exec primary-router ip link set eth1 down
|
|
||||||
|
|
||||||
# Should trigger failover to secondary router
|
|
||||||
# Verify connectivity still works via secondary path
|
|
||||||
|
|
||||||
# Restore primary router's target connection
|
|
||||||
podman-compose exec primary-router ip link set eth1 up
|
|
||||||
```
|
|
||||||
|
|
||||||
### 5. Automated Failover Test
|
|
||||||
**Objective**: Run complete automated test sequence
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Run the comprehensive test script
|
|
||||||
./scripts/test-failover.sh
|
./scripts/test-failover.sh
|
||||||
|
|
||||||
# This script will:
|
|
||||||
# 1. Start the environment
|
|
||||||
# 2. Verify initial connectivity
|
|
||||||
# 3. Simulate primary router failure
|
|
||||||
# 4. Monitor failover
|
|
||||||
# 5. Restore primary router
|
|
||||||
# 6. Verify failback after 60 seconds
|
|
||||||
```
|
```
|
||||||
|
|
||||||
|
This script:
|
||||||
|
1. Starts the test environment
|
||||||
|
2. Verifies initial connectivity
|
||||||
|
3. Simulates primary router failure
|
||||||
|
4. Monitors failover
|
||||||
|
5. Restores primary router
|
||||||
|
6. Verifies failback
|
||||||
|
|
||||||
## Unit Tests
|
## Unit Tests
|
||||||
|
|
||||||
### Running Tests
|
|
||||||
```bash
|
```bash
|
||||||
# Run all tests
|
# Run all tests
|
||||||
cargo test
|
cargo test
|
||||||
|
|
||||||
# Run specific test module
|
# Run specific module
|
||||||
cargo test pinger
|
cargo test pinger
|
||||||
|
cargo test routing
|
||||||
# Run with coverage
|
cargo test state_machine
|
||||||
cargo tarpaulin --out Html
|
|
||||||
```
|
```
|
||||||
|
|
||||||
### Test Structure
|
## Debug Commands
|
||||||
```
|
|
||||||
tests/
|
|
||||||
├── unit/
|
|
||||||
│ ├── pinger_tests.rs
|
|
||||||
│ ├── routing_tests.rs
|
|
||||||
│ └── state_machine_tests.rs
|
|
||||||
├── integration/
|
|
||||||
│ ├── netlink_tests.rs
|
|
||||||
│ └── dual_interface_tests.rs
|
|
||||||
└── e2e/
|
|
||||||
└── failover_tests.rs
|
|
||||||
```
|
|
||||||
|
|
||||||
## Performance Testing
|
|
||||||
|
|
||||||
### Load Testing
|
|
||||||
```bash
|
```bash
|
||||||
# Test with multiple ping targets
|
# Check container interfaces
|
||||||
cargo run -- --ping-target 8.8.8.8  # only one target is passed here
|
|
||||||
|
|
||||||
# Monitor resource usage
|
|
||||||
podman stats route-switcher
|
|
||||||
|
|
||||||
# Test long-running stability
|
|
||||||
# Run for 24 hours and monitor for memory leaks
|
|
||||||
```
|
|
||||||
|
|
||||||
### Network Latency Testing
|
|
||||||
```bash
|
|
||||||
# Measure failover time
|
|
||||||
# Start script to time the state transition
|
|
||||||
start_time=$(date +%s%N)
|
|
||||||
# Trigger failure
|
|
||||||
# Wait for state change
|
|
||||||
end_time=$(date +%s%N)
|
|
||||||
failover_time=$((($end_time - $start_time) / 1000000))
|
|
||||||
echo "Failover time: ${failover_time}ms"
|
|
||||||
```
|
|
||||||
|
|
||||||
## Debugging Tests
|
|
||||||
|
|
||||||
### Common Issues
|
|
||||||
1. **Permission Denied**: Ensure containers run with privileged mode
|
|
||||||
2. **Interface Not Found**: Check network configuration in compose file
|
|
||||||
3. **Netlink Errors**: Verify kernel supports required operations
|
|
||||||
4. **Timing Issues**: Adjust test timeouts for your environment
|
|
||||||
|
|
||||||
### Debug Commands
|
|
||||||
```bash
|
|
||||||
# Check container network interfaces
|
|
||||||
podman-compose exec route-switcher ip addr show
|
podman-compose exec route-switcher ip addr show
|
||||||
|
|
||||||
# Check routing table
|
# Check routing table
|
||||||
@@ -235,36 +92,4 @@ podman-compose exec route-switcher ip route show
|
|||||||
|
|
||||||
# Monitor network traffic
|
# Monitor network traffic
|
||||||
podman-compose exec route-switcher tcpdump -i any icmp
|
podman-compose exec route-switcher tcpdump -i any icmp
|
||||||
|
|
||||||
# Check system logs
|
|
||||||
podman-compose exec route-switcher dmesg | tail -20
|
|
||||||
```
|
```
|
||||||
|
|
||||||
## Test Data
|
|
||||||
|
|
||||||
### Sample Ping Results
|
|
||||||
```rust
|
|
||||||
// Mock data for testing
|
|
||||||
let mock_ping_results = vec![
|
|
||||||
PingResult::Ok, // Normal operation
|
|
||||||
PingResult::Failed, // Single failure
|
|
||||||
PingResult::Failed, // Consecutive failure
|
|
||||||
PingResult::Failed, // Trigger failover
|
|
||||||
];
|
|
||||||
```
|
|
||||||
|
|
||||||
### Network Configuration
|
|
||||||
```bash
|
|
||||||
# Test network setup
|
|
||||||
ip addr add 192.168.1.10/24 dev eth0
|
|
||||||
ip addr add 192.168.2.10/24 dev eth1
|
|
||||||
ip route add default via 192.168.1.1 dev eth0 metric 10
|
|
||||||
ip route add default via 192.168.2.1 dev eth1 metric 20
|
|
||||||
```
|
|
||||||
|
|
||||||
## Test Coverage Goals
|
|
||||||
|
|
||||||
- **Unit Tests**: 90%+ code coverage
|
|
||||||
- **Integration Tests**: All major component interactions
|
|
||||||
- **E2E Tests**: All user scenarios and edge cases
|
|
||||||
- **Performance Tests**: Resource usage and timing validation
|
|
||||||
|
|||||||