BlackBox Flight Recorder

Why You Need This

Your robot runs overnight in a warehouse. At 3:17 AM, it stops moving. By morning, the logs have rotated, the process restarted, and nobody knows what happened.

The BlackBox solves this. Like an aircraft's flight data recorder, it continuously records the last N events in a fixed-size ring buffer. When something goes wrong, the BlackBox contains the exact sequence of events leading up to the failure — deadline misses, node panics, budget violations, emergency stops — all timestamped and structured.

BlackBox vs Logging: Logs are free-form text, keep growing until they are rotated, and must be parsed before you can query them. The BlackBox stores structured events in a fixed-size buffer (so it never fills your disk) and can be queried by type (for example, show only anomalies).

BlackBox vs Record/Replay: Record/Replay captures full node state (inputs and outputs) for deterministic replay, which is great for debugging but storage-heavy. The BlackBox captures lightweight events (what happened, not the full data), so it can stay always-on with low overhead, and it is crash-safe.

When to Use

| Situation | Use BlackBox? |
| --- | --- |
| Robot runs unattended (production, field tests) | Yes: you need crash forensics |
| Safety-critical system (motors, arms, drones) | Yes: every deadline miss is recorded |
| Development with debugger attached | Optional: you can inspect directly |
| Short test runs (< 5 minutes) | Optional: logs are usually sufficient |
| Overnight regression testing | Yes: find intermittent failures |

How It Works

The BlackBox is a circular buffer — it keeps the last N events and discards the oldest when full. This means:

  • Fixed memory: never grows beyond the configured size
  • Always-on: negligible performance impact (events are tiny structs)
  • Crash-safe: data persists even if the process is killed
  • No manual instrumentation: the Scheduler records events automatically
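
To make the "keep the last N, discard the oldest" behavior concrete, here is a minimal sketch of a fixed-capacity ring buffer. It is an illustration only and not HORUS's implementation: the RingBuffer type and its methods are hypothetical names, and it uses the standard library's VecDeque in place of whatever storage the BlackBox really uses.

```rust
use std::collections::VecDeque;

/// Hypothetical fixed-capacity buffer that drops the oldest item when full.
struct RingBuffer<T> {
    capacity: usize,
    items: VecDeque<T>,
}

impl<T> RingBuffer<T> {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be at least 1");
        Self {
            capacity,
            items: VecDeque::with_capacity(capacity),
        }
    }

    /// Append an item, evicting the oldest one if the buffer is already full.
    fn push(&mut self, item: T) {
        if self.items.len() == self.capacity {
            self.items.pop_front();
        }
        self.items.push_back(item);
    }

    /// Number of items currently held (never more than `capacity`).
    fn len(&self) -> usize {
        self.items.len()
    }

    /// Copy out the contents, oldest first.
    fn to_vec(&self) -> Vec<T>
    where
        T: Clone,
    {
        self.items.iter().cloned().collect()
    }
}

fn main() {
    let mut buffer = RingBuffer::new(5);
    for tick in 1..=7u64 {
        buffer.push(tick);
    }
    // Ticks 1 and 2 were evicted; only the five most recent remain.
    println!("{} items: {:?}", buffer.len(), buffer.to_vec());
}
```

Because eviction happens inside push, memory use is bounded by the capacity chosen up front, which is the property the list above relies on.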

Enabling BlackBox

Use the .blackbox(size_mb) builder method to enable the BlackBox:

use horus::prelude::*;

// 16MB black box for general production
let mut scheduler = Scheduler::new()
    .blackbox(16);

// 1GB black box for safety-critical systems with watchdog
let mut scheduler = Scheduler::new()
    .watchdog(500_u64.ms())
    .blackbox(1024);

// 100MB black box for hard real-time systems
let mut scheduler = Scheduler::new()
    .blackbox(100);

What Gets Recorded

The BlackBox automatically captures events during scheduler execution:

| Event | Description |
| --- | --- |
| Scheduler start/stop | When the scheduler begins and ends |
| Node execution | Each node tick with duration and success/failure |
| Node errors | Failed node executions |
| Deadline misses | Nodes that missed their timing deadline |
| Budget violations | Nodes that exceeded their execution time budget |
| Failure policy events | Failure policy state transitions |
| Emergency stops | Safety system activations |
| Custom events | User-defined markers |
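
The sketch below shows why structured events are easier to query than log text: filtering for anomalies becomes a pattern match instead of string parsing. The Event enum, its field names, and the rule for what counts as an anomaly are all assumptions made for illustration, and the real HORUS types may differ. The sample values are borrowed from the walkthrough later on this page.

```rust
/// Hypothetical event type modeled on the table above.
/// The real HORUS definitions may use different names and fields.
#[allow(dead_code)] // not every variant or field is used in this short demo
#[derive(Debug, Clone, PartialEq)]
enum Event {
    SchedulerStart { nodes: u32, rate_hz: u32 },
    NodeTick { name: &'static str, duration_us: u64, success: bool },
    NodeError { name: &'static str, error: &'static str },
    DeadlineMiss { name: &'static str, deadline_us: u64, actual_us: u64 },
    BudgetViolation { name: &'static str, budget_us: u64, actual_us: u64 },
    EmergencyStop { reason: &'static str },
    Custom { label: &'static str },
}

impl Event {
    /// Assumption: errors, deadline misses, budget violations, and
    /// emergency stops count as anomalies; everything else is routine.
    fn is_anomaly(&self) -> bool {
        matches!(
            self,
            Event::NodeError { .. }
                | Event::DeadlineMiss { .. }
                | Event::BudgetViolation { .. }
                | Event::EmergencyStop { .. }
        )
    }
}

/// Keep only the anomalous events, preserving their order.
fn anomalies(events: &[Event]) -> Vec<&Event> {
    events.iter().filter(|e| e.is_anomaly()).collect()
}

/// A short sample sequence for the demo.
fn sample_events() -> Vec<Event> {
    vec![
        Event::SchedulerStart { nodes: 4, rate_hz: 500 },
        Event::NodeTick { name: "planner", duration_us: 2100, success: true },
        Event::DeadlineMiss { name: "collision_checker", deadline_us: 1900, actual_us: 4200 },
        Event::NodeError { name: "arm_controller", error: "joint limit exceeded" },
        Event::EmergencyStop { reason: "deadline miss threshold exceeded" },
    ]
}

fn main() {
    let events = sample_events();
    for event in anomalies(&events) {
        println!("{:?}", event);
    }
}
```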

Post-Mortem Debugging

After a failure, the BlackBox contains the sequence of events leading up to it. Inspect it from the command line, or programmatically through the handle returned by scheduler.get_blackbox():

use horus::prelude::*;

let mut scheduler = Scheduler::new()
    .blackbox(16);

// ... application runs ...

// After a failure, inspect the blackbox via CLI:
//   horus blackbox --anomalies
//   horus blackbox --json

// Or programmatically after scheduler.run() returns:
if let Some(bb) = scheduler.get_blackbox() {
    let anomalies = bb.lock().expect("blackbox lock").anomalies();
    println!("=== ANOMALIES ({}) ===", anomalies.len());
    for record in &anomalies {
        println!("[tick {}] {:?}", record.tick, record.event);
    }
}

Circular Buffer Behavior

The BlackBox uses a fixed-size circular buffer. When full, the oldest events are discarded:

Buffer capacity: 50,000 records (10MB)

Event 1 → [1, _, _, _, _]     New events fill the buffer
Event 2 → [1, 2, _, _, _]
...
Event N → [1, 2, ..., N-1, N]  Buffer full
Event N+1 → [2, 3, ..., N, N+1]  Oldest dropped

This ensures bounded memory usage while keeping the most recent events for debugging.

| Use Case | Configuration | Buffer Size |
| --- | --- | --- |
| Development | .blackbox(16) | 16 MB |
| Long-running production | .blackbox(100) | 100 MB |
| Safety-critical | .blackbox(1024) | 1 GB |
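
When picking a size, it helps to estimate how much history a buffer holds. The sketch below is back-of-the-envelope arithmetic, not a statement about HORUS internals. It derives a record size from the "50,000 records (10MB)" figure above (taking 1 MB as 10^6 bytes), and it assumes one record per node tick for 4 nodes at 500 Hz. The function names are hypothetical, and real record sizes and event rates will differ.

```rust
/// Bytes per record implied by a buffer size and a record count.
fn bytes_per_record(buffer_bytes: u64, records: u64) -> u64 {
    buffer_bytes / records
}

/// Number of records that fit in a buffer of `size_mb` megabytes.
fn capacity_records(size_mb: u64, record_bytes: u64) -> u64 {
    size_mb * 1_000_000 / record_bytes
}

/// Seconds of history retained at a steady event rate.
fn retention_secs(records: u64, events_per_sec: u64) -> f64 {
    records as f64 / events_per_sec as f64
}

fn main() {
    // From the figure above: 50,000 records in 10 MB, taking 1 MB = 10^6 bytes.
    let record_bytes = bytes_per_record(10_000_000, 50_000);

    // Assumption: 4 nodes ticking at 500 Hz, one record per node tick.
    let events_per_sec = 4 * 500;

    for size_mb in [16, 100, 1024] {
        let records = capacity_records(size_mb, record_bytes);
        let secs = retention_secs(records, events_per_sec);
        println!(
            "{} MB -> {} records, about {:.0} s of history",
            size_mb, records, secs
        );
    }
}
```

Under these assumptions a record is 200 bytes, so a 16 MB buffer holds 80,000 records, or about 40 seconds at 2,000 events per second. Measure your own event rate before relying on numbers like these.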

CLI Usage

Inspect the BlackBox from the command line:

# View all events
horus blackbox

# View anomalies only (errors, deadline misses, e-stops)
horus blackbox --anomalies

# Follow in real-time (like tail -f)
horus blackbox --follow

# Filter by node
horus blackbox --node motor_ctrl

# Filter by event type
horus blackbox --event DeadlineMiss

# JSON output for scripts/dashboards
horus blackbox --json

Debugging Walkthrough: "My Robot Crashed Overnight"

Scenario: Your mobile robot stopped moving during an overnight warehouse test. The process has since restarted, and the logs from before the crash are gone.

Step 1: Check the BlackBox

horus blackbox --anomalies

Step 2: Read the timeline

[03:17:01.001] SchedulerStart { nodes: 4, rate: 500Hz }
[03:17:01.500] NodeTick { name: "planner", duration_us: 2100, success: true }
[03:17:01.502] DeadlineMiss { name: "collision_checker", deadline_us: 1900, actual_us: 4200 }
[03:17:01.503] DeadlineMiss { name: "collision_checker", deadline_us: 1900, actual_us: 5100 }
[03:17:01.504] NodeError { name: "arm_controller", error: "joint limit exceeded" }
[03:17:01.504] EmergencyStop { reason: "deadline miss threshold exceeded" }

Step 3: Diagnose

The collision checker started missing its 1.9 ms deadline, taking 4 to 5 ms instead. During that window the planner sent a trajectory that the checker would have rejected, but the check result arrived too late, and the arm exceeded its joint limits.

Step 4: Fix

  • Tighten the collision checker's budget so overruns are flagged before the deadline passes: .budget(1500_u64.us())
  • Or add a safety interlock: hold trajectory execution until collision check completes
  • Or move collision checking to the same RT thread as the arm controller
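
The safety-interlock option can be sketched as a small gate that refuses to release a trajectory until its collision check has reported. This is a hypothetical illustration, not a HORUS API: Interlock, CheckResult, Decision, and the trajectory IDs are invented for the example, and a real system would also need timeouts and cleanup of stale entries.

```rust
use std::collections::HashMap;

/// Outcome of a collision check for one trajectory (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum CheckResult {
    Clear,
    Collision,
}

/// What the arm controller should do with a trajectory.
#[derive(Debug, PartialEq)]
enum Decision {
    Execute,
    Hold,
    Reject,
}

/// Holds each trajectory until its collision check has reported.
#[derive(Default)]
struct Interlock {
    results: HashMap<u64, CheckResult>,
}

impl Interlock {
    /// Called by the collision checker when a check finishes.
    fn report(&mut self, trajectory_id: u64, result: CheckResult) {
        self.results.insert(trajectory_id, result);
    }

    /// Called by the arm controller before executing a trajectory.
    fn decide(&self, trajectory_id: u64) -> Decision {
        match self.results.get(&trajectory_id) {
            None => Decision::Hold, // check still pending: do not move
            Some(CheckResult::Clear) => Decision::Execute,
            Some(CheckResult::Collision) => Decision::Reject,
        }
    }
}

fn main() {
    let mut interlock = Interlock::default();

    // The check for trajectory 7 is late, so the arm holds instead of moving.
    println!("before check: {:?}", interlock.decide(7));

    interlock.report(7, CheckResult::Clear);
    println!("after check:  {:?}", interlock.decide(7));
}
```

With a gate like this, a slow collision checker delays motion instead of letting an unchecked trajectory through, which is the failure seen in the timeline above.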

BlackBox vs Other Debugging Tools

| Tool | What it captures | Storage | When to use |
| --- | --- | --- | --- |
| BlackBox | Scheduler events (lightweight) | Fixed ring buffer (16-1024 MB) | Always-on crash forensics |
| Record/Replay | Full node state (inputs/outputs) | Grows with time | Reproduce specific bugs |
| horus log | Text log messages | Grows with time | Verbose debugging |
| horus monitor | Live system state | None (real-time only) | Active debugging |

See Also