BlackBox Flight Recorder
Why You Need This
Your robot runs overnight in a warehouse. At 3:17 AM, it stops moving. By morning, the logs have rotated, the process restarted, and nobody knows what happened.
The BlackBox solves this. Like an aircraft's flight data recorder, it continuously records the last N events in a fixed-size ring buffer. When something goes wrong, the BlackBox contains the exact sequence of events leading up to the failure — deadline misses, node panics, budget violations, emergency stops — all timestamped and structured.
BlackBox vs Logging: Logs are text, grow forever, and require parsing. The BlackBox stores structured events in a fixed-size buffer (it never fills your disk) and is queryable by type (for example, show only anomalies).
BlackBox vs Record/Replay: Record/Replay captures full node state (inputs/outputs) for deterministic replay, which is great for debugging but storage-heavy. The BlackBox captures lightweight events (what happened, not the full data), so it can stay always-on with minimal overhead, and it is crash-safe.
When to Use
| Situation | Use BlackBox? |
|---|---|
| Robot runs unattended (production, field tests) | Yes — you need crash forensics |
| Safety-critical system (motors, arms, drones) | Yes — every deadline miss is recorded |
| Development with debugger attached | Optional — you can inspect directly |
| Short test runs (< 5 minutes) | Optional — logs are usually sufficient |
| Overnight regression testing | Yes — find intermittent failures |
How It Works
The BlackBox is a circular buffer — it keeps the last N events and discards the oldest when full. This means:
- Fixed memory: never grows beyond the configured size
- Always-on: negligible performance impact (events are tiny structs)
- Crash-safe: data persists even if the process is killed
- No manual instrumentation: the Scheduler records events automatically
Enabling BlackBox
Use the .blackbox(size_mb) builder method to enable the BlackBox:
use horus::prelude::*;
// 16MB black box for general production
let mut scheduler = Scheduler::new()
    .blackbox(16);
// 1GB black box for safety-critical systems with watchdog
let mut scheduler = Scheduler::new()
    .watchdog(500_u64.ms())
    .blackbox(1024);
// 100MB black box for hard real-time systems
let mut scheduler = Scheduler::new()
    .blackbox(100);
What Gets Recorded
The BlackBox automatically captures events during scheduler execution:
| Event | Description |
|---|---|
| Scheduler start/stop | When the scheduler begins and ends |
| Node execution | Each node tick with duration and success/failure |
| Node errors | Failed node executions |
| Deadline misses | Nodes that missed their timing deadline |
| Budget violations | Nodes that exceeded their execution time budget |
| Failure policy events | Failure policy state transitions |
| Emergency stops | Safety system activations |
| Custom events | User-defined markers |
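To make the table concrete, here is a minimal, self-contained sketch of how such events could be modeled and filtered. The `Event` enum, its fields, `is_anomaly`, `anomalies`, and `sample_events` are hypothetical illustrations, not the actual horus types. The variant names are borrowed from the walkthrough output later on this page, and the anomaly set follows the CLI description of `--anomalies` (errors, deadline misses, e-stops). Whether other event kinds, such as budget violations, count as anomalies is not stated here, so they are left out.

```rust
// Hypothetical event model for illustration only; not the real horus types.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Event {
    SchedulerStart { nodes: usize, rate_hz: u32 },
    NodeTick { name: String, duration_us: u64, success: bool },
    DeadlineMiss { name: String, deadline_us: u64, actual_us: u64 },
    NodeError { name: String, error: String },
    EmergencyStop { reason: String },
}

impl Event {
    /// Assumed anomaly set: errors, deadline misses, and emergency stops.
    fn is_anomaly(&self) -> bool {
        matches!(
            self,
            Event::DeadlineMiss { .. } | Event::NodeError { .. } | Event::EmergencyStop { .. }
        )
    }
}

/// Returns references to the anomalous events, preserving their order.
fn anomalies(events: &[Event]) -> Vec<&Event> {
    events.iter().filter(|e| e.is_anomaly()).collect()
}

/// A small made-up event sequence to exercise the filter.
fn sample_events() -> Vec<Event> {
    vec![
        Event::SchedulerStart { nodes: 4, rate_hz: 500 },
        Event::NodeTick { name: "planner".into(), duration_us: 2100, success: true },
        Event::DeadlineMiss { name: "collision_checker".into(), deadline_us: 1900, actual_us: 4200 },
        Event::EmergencyStop { reason: "deadline miss threshold exceeded".into() },
    ]
}

fn main() {
    let events = sample_events();
    for event in anomalies(&events) {
        println!("{:?}", event);
    }
}
```

Because events are typed values rather than text, filtering by kind is a pattern match instead of a log-parsing step.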
Post-Mortem Debugging
After a failure, the BlackBox contains the sequence of events leading up to it. Inspect it with the horus blackbox CLI, or programmatically through the Scheduler's get_blackbox() accessor:
use horus::prelude::*;
let mut scheduler = Scheduler::new()
    .blackbox(16);
// ... application runs ...
// After a failure, inspect the blackbox via CLI:
// horus blackbox --anomalies
// horus blackbox --json
// Or programmatically after scheduler.run() returns:
if let Some(bb) = scheduler.get_blackbox() {
    let anomalies = bb.lock().expect("blackbox lock").anomalies();
    println!("=== ANOMALIES ({}) ===", anomalies.len());
    for record in &anomalies {
        println!("[tick {}] {:?}", record.tick, record.event);
    }
}
Circular Buffer Behavior
The BlackBox uses a fixed-size circular buffer. When full, the oldest events are discarded:
Buffer capacity: 50,000 records (10MB)
Event 1 → [1, _, _, _, _] New events fill the buffer
Event 2 → [1, 2, _, _, _]
...
Event N → [1, 2, ..., N-1, N] Buffer full
Event N+1 → [2, 3, ..., N, N+1] Oldest dropped
This ensures bounded memory usage while keeping the most recent events for debugging.
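The drop-oldest behavior in the diagram can be sketched in a few lines. This is a simplified illustration built on `std::collections::VecDeque`, and the `RingBuffer` type is hypothetical. It is not how horus implements the BlackBox internally, which this page does not describe.

```rust
use std::collections::VecDeque;

/// Hypothetical fixed-capacity buffer that drops the oldest item when full.
struct RingBuffer<T> {
    capacity: usize,
    items: VecDeque<T>,
}

impl<T> RingBuffer<T> {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { capacity, items: VecDeque::with_capacity(capacity) }
    }

    /// Appends an item and returns the evicted oldest item if the buffer was full.
    fn push(&mut self, item: T) -> Option<T> {
        let evicted = if self.items.len() == self.capacity {
            self.items.pop_front()
        } else {
            None
        };
        self.items.push_back(item);
        evicted
    }

    /// Iterates from the oldest retained item to the newest.
    fn iter(&self) -> std::collections::vec_deque::Iter<'_, T> {
        self.items.iter()
    }

    fn len(&self) -> usize {
        self.items.len()
    }
}

fn main() {
    let mut buf = RingBuffer::new(5);
    for event in 1..=7 {
        buf.push(event);
    }
    let contents: Vec<i32> = buf.iter().copied().collect();
    // Events 1 and 2 were evicted to make room for 6 and 7.
    println!("{} items: {:?}", buf.len(), contents); // 5 items: [3, 4, 5, 6, 7]
}
```

The buffer never holds more than `capacity` items, so memory stays bounded no matter how long the process runs.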
Recommended Buffer Sizes
| Use Case | Configuration | Buffer Size |
|---|---|---|
| Development | .blackbox(16) | 16 MB |
| Long-running production | .blackbox(100) | 100 MB |
| Safety-critical | .blackbox(1024) | 1 GB |
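To choose between these sizes, it helps to estimate how much history each one retains. The sketch below is back-of-the-envelope arithmetic under loud assumptions: an average record size of about 200 bytes (inferred from the "50,000 records (10MB)" example above, reading MB as 1,000,000 bytes) and an event rate that you supply. The constant and both helper functions are hypothetical and not part of horus. Actual record sizes may differ.

```rust
/// Assumed average record size in bytes, inferred from "50,000 records (10MB)".
const ASSUMED_RECORD_BYTES: u64 = 200;

/// Approximate number of records that fit in a buffer of `size_mb` megabytes.
fn record_capacity(size_mb: u64) -> u64 {
    size_mb * 1_000_000 / ASSUMED_RECORD_BYTES
}

/// Approximate seconds of history retained at a given event rate.
/// Returns None when the event rate is zero.
fn retention_secs(size_mb: u64, events_per_sec: u64) -> Option<u64> {
    record_capacity(size_mb).checked_div(events_per_sec)
}

fn main() {
    // Hypothetical workload: 4 nodes ticking at 500 Hz, one event per tick.
    let events_per_sec: u64 = 4 * 500;
    for size_mb in [16u64, 100, 1024] {
        if let Some(secs) = retention_secs(size_mb, events_per_sec) {
            println!(
                "{} MB: ~{} records, ~{} s of history",
                size_mb,
                record_capacity(size_mb),
                secs
            );
        }
    }
}
```

Under these assumptions, a 16 MB buffer holds roughly 80,000 records, which is about 40 seconds of history at 2,000 events per second. Measure your own event rate before settling on a size.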
CLI Usage
Inspect the BlackBox from the command line:
# View all events
horus blackbox
# View anomalies only (errors, deadline misses, e-stops)
horus blackbox --anomalies
# Follow in real-time (like tail -f)
horus blackbox --follow
# Filter by node
horus blackbox --node motor_ctrl
# Filter by event type
horus blackbox --event DeadlineMiss
# JSON output for scripts/dashboards
horus blackbox --json
Debugging Walkthrough: "My Robot Crashed Overnight"
Scenario: Your mobile robot stopped moving during an overnight warehouse test. The process restarted, and the logs from the original crash are gone.
Step 1: Check the BlackBox
horus blackbox --anomalies
Step 2: Read the timeline
[03:17:01.001] SchedulerStart { nodes: 4, rate: 500Hz }
[03:17:01.500] NodeTick { name: "planner", duration_us: 2100, success: true }
[03:17:01.502] DeadlineMiss { name: "collision_checker", deadline_us: 1900, actual_us: 4200 }
[03:17:01.503] DeadlineMiss { name: "collision_checker", deadline_us: 1900, actual_us: 5100 }
[03:17:01.504] NodeError { name: "arm_controller", error: "joint limit exceeded" }
[03:17:01.504] EmergencyStop { reason: "deadline miss threshold exceeded" }
Step 3: Diagnose
The collision checker started missing its 1.9 ms deadline (taking 4-5 ms instead). During that time, the planner sent a trajectory that would have been rejected, but the check arrived too late. The arm exceeded its joint limits.
Step 4: Fix
- Tighten the collision checker's budget: .budget(1500_u64.us())
- Or add a safety interlock: hold trajectory execution until collision check completes
- Or move collision checking to the same RT thread as the arm controller
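The safety interlock option could look roughly like the sketch below. It is a self-contained illustration of the idea (hold a trajectory until its collision check has passed), not horus API: `Trajectory`, `TrajectoryGate`, `submit`, and `on_check_result` are all hypothetical names, and a real system would also need a timeout for checks that never arrive.

```rust
/// Hypothetical trajectory message tagged with an id the checker can refer to.
#[derive(Debug, Clone, PartialEq)]
struct Trajectory {
    id: u64,
    joint_targets: Vec<f64>,
}

/// Holds the latest planned trajectory until the collision checker clears it.
#[derive(Debug, Default)]
struct TrajectoryGate {
    pending: Option<Trajectory>,
}

impl TrajectoryGate {
    /// Planner side: the trajectory is held, not executed.
    /// A newer submission replaces any trajectory still waiting.
    fn submit(&mut self, trajectory: Trajectory) {
        self.pending = Some(trajectory);
    }

    /// Checker side: report a verdict for trajectory `id`.
    /// Releases the trajectory only if the verdict is for the pending id and is safe.
    fn on_check_result(&mut self, id: u64, safe: bool) -> Option<Trajectory> {
        match self.pending.take() {
            Some(t) if t.id == id && safe => Some(t),
            Some(t) if t.id != id => {
                // Verdict for a different trajectory: keep waiting.
                self.pending = Some(t);
                None
            }
            // Rejected by the checker, or nothing was pending.
            _ => None,
        }
    }
}

fn main() {
    let mut gate = TrajectoryGate::default();
    gate.submit(Trajectory { id: 1, joint_targets: vec![0.2, -0.4] });

    // The arm controller only ever executes what the gate releases.
    match gate.on_check_result(1, true) {
        Some(t) => println!("executing trajectory {}", t.id),
        None => println!("holding position"),
    }
}
```

With this arrangement a slow collision check delays motion instead of letting an unchecked trajectory reach the arm, which trades latency for safety.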
BlackBox vs Other Debugging Tools
| Tool | What it captures | Storage | When to use |
|---|---|---|---|
| BlackBox | Scheduler events (lightweight) | Fixed ring buffer (16-1024 MB) | Always-on crash forensics |
| Record/Replay | Full node state (inputs/outputs) | Grows with time | Reproduce specific bugs |
| horus log | Text log messages | Grows with time | Verbose debugging |
| horus monitor | Live system state | None (real-time only) | Active debugging |
See Also
- Safety Monitor — Real-time safety monitoring
- Fault Tolerance — Failure policies and recovery
- Record & Replay — Full recording and playback
- Debugging Workflows — Step-by-step debugging guides