Execution Classes
A motor controller that misses its deadline by even a millisecond can cause a robot arm to overshoot and collide with a person. A path planner that takes 50 ms of CPU time is completely normal — but if it runs on the same thread as the motor controller, it blocks 50 ticks. A logging node that takes an extra 10 ms is harmless. A cloud uploader that blocks on a network request shouldn't hold up anything.
These are fundamentally different workloads. Running them all the same way — in a single sequential loop — forces every node to compromise. The fast ones wait for the slow ones. The critical ones share a thread with the optional ones. A single slow node can cascade timing failures across the entire system.
HORUS solves this with execution classes: six different executors, each optimized for a specific workload type. The scheduler automatically selects the right class based on how you configure the node — you describe what your node needs, and the scheduler figures out how to run it.
The Six Classes
BestEffort (Default)
Nodes tick sequentially in the main loop, ordered by .order(). This is the simplest and lowest-overhead class.
// simplified
sched.add(display_node)
.order(100)
.build()?; // No .rate(), .compute(), .on(), or .async_io() → BestEffort
How it works: The main scheduler thread calls tick() on each BestEffort node in sequence, once per scheduler cycle. No threads are spawned. No synchronization is needed.
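Conceptually, the dispatch is just a loop over the registered BestEffort nodes. The following is a simplified sketch of that idea only; the Node trait and the names here are hypothetical, not the HORUS API.
// illustrative sketch; Node, tick, and run_cycle are hypothetical names, not the HORUS API
trait Node {
    fn tick(&mut self);
}
fn run_cycle(best_effort_nodes: &mut [Box<dyn Node>]) {
    // nodes were sorted by .order() at build time; lower order ticks first
    for node in best_effort_nodes.iter_mut() {
        node.tick(); // one call per node, once per scheduler cycle
    }
}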
Use for: Logging, telemetry, display, diagnostic reporting — anything without timing requirements or heavy computation.
Characteristics:
- Runs at the scheduler's global tick_rate()
- Deterministic ordering (same sequence every cycle)
- Lowest overhead — no thread spawn, no atomics, no synchronization
Rt (Real-Time)
Each RT node gets a dedicated thread with optional OS-level priority scheduling. The scheduler enforces timing budgets and deadlines, and takes action when a node runs too long.
// simplified
// Auto-derived: budget = 80% of period, deadline = 95% of period
sched.add(motor_ctrl)
.order(0)
.rate(1000_u64.hz()) // 1 kHz → budget=800µs, deadline=950µs
.on_miss(Miss::SafeMode) // Enter safe state on deadline miss
.build()?;
// Explicit budget and deadline
sched.add(safety_monitor)
.order(0)
.budget(100_u64.us())
.deadline(200_u64.us())
.on_miss(Miss::Stop)
.build()?;
There is no .rt() method. RT is always auto-detected from timing constraints. Setting .rate(), .budget(), or .deadline() — any of them — automatically assigns the Rt class. This maps developer intent ("this node needs to run at 1 kHz") directly to the right executor.
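As a sketch of the derivation shown in the first example above (budget at 80% of the period, deadline at 95%); the helper below is illustrative, not a HORUS function.
// illustrative helper, not part of HORUS; shows the stated 80% / 95% rule
use std::time::Duration;
fn derived_budget_deadline(rate_hz: u64) -> (Duration, Duration) {
    let period = Duration::from_secs_f64(1.0 / rate_hz as f64);
    (period.mul_f64(0.80), period.mul_f64(0.95)) // 1000 Hz => 800 µs budget, 950 µs deadline
}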
How it works: Each RT node runs on its own dedicated thread. If .require_rt() or .prefer_rt() is set on the scheduler and the OS supports it, the thread gets SCHED_FIFO real-time priority. The scheduler measures every tick() call and applies the Miss policy when budget or deadline is exceeded.
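The measurement step can be pictured roughly as below. This is a simplified sketch, not the executor's actual code; the Miss handling is reduced to a comment.
// simplified sketch of per-tick enforcement; not the HORUS internals
use std::time::{Duration, Instant};
fn measured_tick(budget: Duration, deadline: Duration, mut tick: impl FnMut()) {
    let start = Instant::now();
    tick(); // the node's tick() runs on its dedicated RT thread
    let elapsed = start.elapsed();
    if elapsed > budget || elapsed > deadline {
        // over budget or past the deadline: apply the configured Miss policy
        // (Skip, Stop, SafeMode, ...)
    }
}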
Additional RT configuration:
// simplified
sched.add(critical_node)
.order(0)
.rate(1000_u64.hz())
.budget(300_u64.us())
.deadline(900_u64.us())
.on_miss(Miss::Skip)
.priority(90) // OS-level thread priority (1-99, higher = more urgent)
.core(0) // Pin to CPU core 0
.watchdog(500_u64.ms()) // Per-node freeze detection
.build()?;
Use for: Motor control, safety monitoring, sensor fusion — anything where missing a deadline has physical consequences.
Compute
For CPU-heavy work that benefits from parallelism. Multiple Compute nodes run simultaneously on a shared thread pool.
// simplified
sched.add(path_planner)
.order(5)
.compute()
.rate(10_u64.hz()) // Optional: limit how often the node ticks
.build()?;
How it works: Compute nodes are dispatched to a thread pool (similar to rayon). Multiple compute nodes can run in parallel on different CPU cores. They don't block the main tick loop or RT threads.
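A rough picture of that dispatch, using scoped threads as a stand-in for the pool; a sketch of the idea only, not the executor itself.
// illustrative sketch; the real executor dispatches to a rayon-like pool instead of spawning per cycle
fn dispatch_compute(ticks: Vec<Box<dyn FnOnce() + Send>>) {
    std::thread::scope(|s| {
        for tick in ticks {
            s.spawn(move || tick()); // compute ticks can overlap on different cores
        }
    }); // the scope waits for every tick to finish before returning
}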
Use for: Path planning, SLAM, point cloud processing, ML inference on CPU, image processing — any CPU-bound work that takes more than ~1 ms per tick.
.rate() on a Compute node does not make it RT — it only limits how often the node ticks. A .rate(10_u64.hz()).compute() node ticks at most 10 times per second but has no budget or deadline enforcement.
Event
Nodes that sleep until a specific topic receives new data. Zero CPU usage when idle.
// simplified
sched.add(estop_handler)
.order(0)
.on("emergency.stop")
.build()?;
How it works: The node's thread sleeps. When any publisher calls send() on the named topic, the Event node wakes and tick() is called. If multiple messages arrive between wakes, the node ticks once — call recv() in a loop inside tick() to drain all pending messages.
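A minimal sketch of that drain pattern, assuming a non-blocking receive that returns Option; the message type and the closure shape are illustrative, not the HORUS API.
// hypothetical sketch; StopCmd and the recv closure are illustrative, not the HORUS API
struct StopCmd;
fn handle_estop_tick(mut recv: impl FnMut() -> Option<StopCmd>) {
    // one wake may cover several published messages, so drain until empty
    while let Some(cmd) = recv() {
        apply_stop(cmd);
    }
}
fn apply_stop(_cmd: StopCmd) {
    // cut motor power, latch the e-stop state, ...
}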
Use for: Emergency stop handlers, command receivers, sparse event processors — anything where the node should be completely idle until something specific happens.
Characteristics:
- Zero CPU when no messages arrive
- Wake latency: ~microseconds from send() to tick()
- .order() still applies: if two event nodes wake simultaneously, lower order runs first
- The topic name in .on("name") must match a Topic::new("name") in another node
AsyncIo
For network or file I/O operations that would block a real-time thread. Runs on a tokio runtime.
// simplified
sched.add(cloud_uploader)
.order(50)
.async_io()
.rate(1_u64.hz()) // Upload once per second
.build()?;
How it works: The node's tick() runs via tokio::task::spawn_blocking on a tokio-managed thread pool. The node can safely block on network requests, file I/O, or database queries without affecting any other node.
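The hand-off can be sketched as follows; a simplified picture of the dispatch rather than the HORUS implementation, and it assumes a running tokio runtime.
// simplified sketch; not the HORUS internals (requires the tokio crate)
async fn dispatch_async_io(tick: impl FnOnce() + Send + 'static) {
    // blocking work moves to tokio's blocking pool, so no tick loop or RT thread stalls
    tokio::task::spawn_blocking(tick)
        .await
        .expect("async-io tick panicked");
}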
Use for: HTTP/REST API calls, database writes, file logging, cloud telemetry.
Gpu
For nodes that launch CUDA kernels. The scheduler manages a dedicated GPU thread with one CUDA stream per node. Kernels launched in tick() execute asynchronously on the GPU; the executor synchronizes after all GPU nodes have launched.
// simplified
sched.add(preprocess_node)
.order(2)
.gpu()
.rate(30_u64.hz())
.build()?;
How it works: The GpuExecutor runs on a dedicated thread with a CUDA context. Each GPU node gets its own CUDA stream. Per tick cycle: (1) all ready nodes call tick() which launches GPU kernels non-blocking, (2) all streams synchronize, (3) GPU-side timing is recorded via CUDA events. Inside tick(), access the scheduler-managed stream via horus::gpu_stream().
Use for: Image preprocessing (resize, normalize, color convert), neural network inference, point cloud filtering, any CUDA kernel work.
.gpu() takes precedence over .rate() for execution class: a GPU node with .rate(30_u64.hz()).gpu() runs on the GPU executor at 30 Hz, not the RT executor.
Graceful degradation: If CUDA is not available, the GpuExecutor logs a warning and GPU nodes fall back to BestEffort execution on the main thread.
How Classes Are Selected
The scheduler selects the execution class based on which builder methods you call:
| Configuration | Resulting Class |
|---|---|
| (nothing special) | BestEffort |
| .rate() | Rt (auto-derived budget/deadline) |
| .budget() | Rt |
| .deadline() | Rt |
| .rate() + .budget() + .deadline() | Rt (explicit overrides) |
| .compute() | Compute |
| .compute().rate() | Compute (rate-limited, not RT) |
| .on("topic") | Event |
| .async_io() | AsyncIo |
| .async_io().rate() | AsyncIo (rate-limited, not RT) |
| .gpu() | Gpu |
| .gpu().rate() | Gpu (rate-limited, CUDA stream managed) |
Key rule: .rate() only auto-enables RT when no explicit execution class (.compute(), .on(), .async_io(), .gpu()) is set. When combined with an explicit class, .rate() just limits tick frequency.
Deferred Finalization
Class selection happens at .build() time, not when individual methods are called. This means:
- .rate(100_u64.hz()).compute() → Compute (.compute() overrides auto-RT)
- .compute().rate(100_u64.hz()) → Compute (same result regardless of order)
- .compute().async_io() → AsyncIo (last explicit class wins, warning logged)
Decision Guide
| Your node does... | Use | Builder |
|---|---|---|
| Motor control at 500+ Hz | Rt | .rate(500_u64.hz()) |
| Safety monitoring with deadlines | Rt | .rate().budget().deadline().on_miss() |
| Path planning (takes 10-50 ms) | Compute | .compute() |
| ML inference on CPU | Compute | .compute() |
| React to emergency stop | Event | .on("emergency.stop") |
| Process commands as they arrive | Event | .on("command") |
| Upload telemetry to cloud | AsyncIo | .async_io() |
| Write logs to disk | AsyncIo | .async_io() |
| Display dashboard updates | BestEffort | (default) |
| Simple sensor reading | BestEffort or Rt | .rate() if timing matters |
Validation and Common Mistakes
.build() validates your configuration and catches mistakes:
What's Rejected
| Configuration | Error |
|---|---|
| .compute().budget() | Budget only meaningful for RT nodes |
| .on("topic").deadline() | Deadline only meaningful for RT nodes |
| .async_io().budget() | Budget only meaningful for RT nodes |
| .budget(Duration::ZERO) | Budget must be > 0 |
| .on("") | Empty topic — node can never trigger |
What's Warned
| Configuration | Warning |
|---|---|
| .compute().async_io() | Last class wins (AsyncIo), first silently overridden |
| .compute().priority(99) | Priority ignored on non-RT nodes |
| .on_miss(Miss::Stop) without deadline | No deadline to miss — policy has no effect |
Common Mistakes
1. Thinking .priority() works on Compute nodes
// simplified
// Priority is silently ignored — only RT nodes get SCHED_FIFO threads
sched.add(planner).compute().priority(99).build()?;
// Make it RT if you need OS-level priority
sched.add(planner).rate(100_u64.hz()).priority(99).build()?;
2. Setting .on_miss() without a deadline
// simplified
// No deadline means Miss::Stop can never trigger
sched.add(ctrl).compute().on_miss(Miss::Stop).build()?;
// Add .rate() so a deadline exists to miss
sched.add(ctrl).rate(100_u64.hz()).on_miss(Miss::Stop).build()?;
3. Chaining multiple execution classes
// simplified
// Only the LAST class applies — compute() is silently overridden
sched.add(node).compute().async_io().build()?; // → AsyncIo, NOT Compute
// Pick one
sched.add(node).async_io().build()?;
Complete Example: Mixed Execution Classes
// simplified
use horus::prelude::*;
fn main() -> Result<()> {
let mut sched = Scheduler::new()
.tick_rate(100_u64.hz())
.prefer_rt();
// Rt — 1 kHz motor control with strict timing
sched.add(MotorController::new()?)
.order(0)
.rate(1000_u64.hz())
.budget(300_u64.us())
.on_miss(Miss::Skip)
.build()?;
// Event — only runs when emergency_stop topic updates
sched.add(EmergencyHandler::new()?)
.order(0)
.on("emergency.stop")
.build()?;
// Rt — 100 Hz sensor reading
sched.add(ImuReader::new()?)
.order(1)
.rate(100_u64.hz())
.build()?;
// Compute — path planning in parallel
sched.add(PathPlanner::new()?)
.order(5)
.compute()
.rate(10_u64.hz())
.build()?;
// AsyncIo — telemetry upload every 5 seconds
sched.add(TelemetryUploader::new()?)
.order(50)
.async_io()
.rate(0.2_f64.hz())
.build()?;
// BestEffort — display node in main loop
sched.add(Dashboard::new()?)
.order(100)
.build()?;
sched.run()
}
Design Decisions
Why six classes instead of just RT and non-RT? A thread-per-node model (RT) is wasteful for logging nodes — dedicating OS threads and SCHED_FIFO slots to telemetry is overkill. A single-threaded model (BestEffort) can't handle 50 ms path planning without stalling the control loop. A two-class split (RT vs non-RT) doesn't distinguish between CPU-bound work (Compute), event-driven reactions (Event), I/O-bound operations (AsyncIo), and GPU kernel work (Gpu) — each of which has a fundamentally different optimal executor. The six-class model matches the six common robotics workload patterns.
Why auto-detection instead of explicit class selection?
Most developers don't think in terms of "execution classes" — they think "this node needs to run at 1 kHz" or "this node does heavy computation." Auto-detection from .rate(), .compute(), .on(), .async_io(), and .gpu() maps intent to the right executor without requiring framework knowledge. If you set .rate(1000_u64.hz()), the scheduler knows you need a dedicated real-time thread. You don't have to explicitly request one.
Why does .rate() + .compute() not become RT?
Because rate-limiting and real-time are different things. A path planner at 10 Hz means "tick at most 10 times per second" — not "this node has a 100 ms deadline that must be enforced." Mixing the two concepts would force compute nodes to pay RT overhead (dedicated threads, timing measurement) for no benefit. The rule is clear: .rate() only triggers RT when no explicit class is set.
Trade-offs
| Gain | Cost |
|---|---|
| Right executor per workload — each node runs optimally | Must understand which class fits your node |
| Auto-detection — .rate() infers RT without explicit configuration | Less explicit — must know the .rate() + .compute() interaction |
| RT isolation — a slow Compute node can't block an RT motor controller | RT nodes consume one OS thread each |
| Event nodes — zero CPU when idle | Must match .on("topic") name exactly to a Topic::new("topic") |
| AsyncIo — I/O never blocks the tick loop | tokio runtime overhead for simple file writes |
See Also
- Builder Composition Guide — How builder methods interact, override, and compose
- Scheduler — Full Reference — How the scheduler manages all execution classes
- Scheduler API — Complete builder method reference
- Scheduler Configuration — Advanced tuning and RT setup
- Choosing Configuration — Progressive complexity guide
- Real-Time Control Tutorial — Hands-on RT tutorial