Emergency Stop
Every robot that moves can hurt someone. IEC 61508 (functional safety) and IEC 62061 (safety of machinery) both mandate that autonomous systems provide a reliable way to cease all hazardous motion immediately. Whether you are building a warehouse AGV, a surgical arm, or a hobby rover, an emergency stop is not optional — it is the single most important safety subsystem in your robot.
HORUS provides the building blocks for a cooperative software E-stop: the scheduler's enter_safe_state() callback, Miss::SafeMode deadline enforcement, and shared-memory topics for cross-node signal propagation. This recipe shows you how to wire them together into a production-ready pattern with debounce, fail-safe defaults, and testable shutdown behavior.
When To Use This
- Any robot with actuators (motors, servos, grippers)
- When safety regulations require guaranteed shutdown
- When you need a fast, bounded response to safety events (within one tick of the E-stop node, 10 ms at 100 Hz)
Use Fault Tolerance instead if you need graceful degradation (reduced speed, limited range of motion) rather than full shutdown.
Prerequisites
- Familiarity with Nodes and CmdVel
- Understanding of Miss policies and enter_safe_state()
Solution
horus.toml
[package]
name = "emergency-stop"
version = "0.1.0"
description = "E-stop monitor with safety state handling"
src/main.rs (or src/main.py)
// simplified
use horus::prelude::*;
/// E-stop trigger from hardware or software
#[derive(Debug, Clone, Copy, Default, Serialize, Deserialize, LogSummary)]
#[repr(C)]
struct EStopSignal {
triggered: u8, // 0 = clear, 1 = triggered (u8 for repr(C) compat)
source: u8, // 0 = hardware, 1 = software, 2 = remote
}
/// Status published by the E-stop monitor
#[derive(Debug, Clone, Copy, Default, Serialize, Deserialize, LogSummary)]
#[repr(C)]
struct SafetyStatus {
estop_active: u8,
consecutive_clears: u32,
uptime_ticks: u64,
}
struct EStopNode {
estop_sub: Topic<EStopSignal>,
cmd_pub: Topic<CmdVel>,
status_pub: Topic<SafetyStatus>,
estop_active: bool,
consecutive_clears: u32,
ticks: u64,
}
impl EStopNode {
fn new() -> Result<Self> {
Ok(Self {
estop_sub: Topic::new("safety.estop")?,
cmd_pub: Topic::new("cmd_vel")?,
status_pub: Topic::new("safety.status")?,
estop_active: false,
consecutive_clears: 0,
ticks: 0,
})
}
}
impl Node for EStopNode {
fn name(&self) -> &str { "EStop" }
fn tick(&mut self) {
self.ticks += 1;
if let Some(signal) = self.estop_sub.recv() {
if signal.triggered != 0 {
// SAFETY: immediately activate E-stop
self.estop_active = true;
self.consecutive_clears = 0;
} else {
self.consecutive_clears += 1;
}
} else {
// No signal received — treat as potential fault (fail-safe)
self.consecutive_clears = 0;
}
// Require N consecutive clear signals before releasing
const CLEAR_THRESHOLD: u32 = 50; // 50 ticks at 100 Hz = 0.5 seconds
if self.estop_active && self.consecutive_clears >= CLEAR_THRESHOLD {
self.estop_active = false;
}
if self.estop_active {
// SAFETY: override cmd_vel with zero — stops all motion
self.cmd_pub.send(CmdVel::zero());
}
self.status_pub.send(SafetyStatus {
estop_active: self.estop_active as u8,
consecutive_clears: self.consecutive_clears,
uptime_ticks: self.ticks,
});
}
fn shutdown(&mut self) -> Result<()> {
// SAFETY: zero velocity on shutdown
self.cmd_pub.send(CmdVel::zero());
Ok(())
}
fn enter_safe_state(&mut self) {
// SAFETY: called by scheduler if this node misses its deadline
self.estop_active = true;
self.cmd_pub.send(CmdVel::zero());
}
fn is_safe_state(&self) -> bool {
self.estop_active
}
}
fn main() -> Result<()> {
let mut scheduler = Scheduler::new()
.watchdog(500_u64.ms())
.max_deadline_misses(1); // aggressive — isolate after 1 miss
// E-stop runs LAST — overrides any cmd_vel from other nodes
scheduler.add(EStopNode::new()?)
.order(100) // after drive/planner nodes
.rate(100_u64.hz()) // 100 Hz safety monitoring
.budget(200_u64.us()) // tight budget
.deadline(500_u64.us()) // tight deadline
.on_miss(Miss::SafeMode) // force safe state on deadline miss
.build()?;
scheduler.run()
}
Understanding the Code
- enter_safe_state() is called by the scheduler when this node misses its deadline; the robot stops automatically without any application logic
- Miss::SafeMode is the strictest miss policy: any deadline overrun triggers safe state
- High .order(100) ensures E-stop runs AFTER drive/planning nodes, so it overrides their cmd_vel output
- Debounce with CLEAR_THRESHOLD prevents a flickering E-stop signal from repeatedly releasing the stop
- No signal = fault: if the E-stop topic stops publishing, consecutive_clears resets to zero, so an active E-stop can never release (fail-safe design). As written, silence alone does not activate an E-stop that is currently clear
- 200 us budget is generous for this simple node and keeps safety checks deterministic
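The latch-and-debounce rule described above can be tried out without HORUS at all. The sketch below is a plain-Rust model of the same state machine; the names Poll and EStopLatch are hypothetical stand-ins (Poll plays the role of the result of recv() on one tick), not part of the HORUS API.

```rust
/// Outcome of polling the E-stop topic on one tick (stand-in for `recv()`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Poll {
    Triggered,
    Clear,
    NoSignal,
}

/// Latch that trips instantly and releases only after a run of clear polls.
#[derive(Debug)]
pub struct EStopLatch {
    active: bool,
    consecutive_clears: u32,
    clear_threshold: u32,
}

impl EStopLatch {
    pub fn new(clear_threshold: u32) -> Self {
        Self { active: false, consecutive_clears: 0, clear_threshold }
    }

    /// Advance one tick and return whether the E-stop is active afterwards.
    pub fn step(&mut self, poll: Poll) -> bool {
        match poll {
            // Trigger takes effect on the same tick, with no debounce.
            Poll::Triggered => {
                self.active = true;
                self.consecutive_clears = 0;
            }
            Poll::Clear => {
                self.consecutive_clears = self.consecutive_clears.saturating_add(1);
            }
            // Silence never counts toward release.
            Poll::NoSignal => self.consecutive_clears = 0,
        }
        if self.active && self.consecutive_clears >= self.clear_threshold {
            self.active = false;
        }
        self.active
    }
}
```

With a threshold of 3, the sequence Triggered, Clear, Clear, NoSignal keeps the latch active and restarts the count; only three clears in a row release it. The saturating_add is a small hardening choice in this sketch: a plain += 1 on a u32 would overflow after roughly 497 days of uninterrupted clear signals at 100 Hz.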
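The E-stop node must run on the same scheduler as the motor controller. If they run in separate processes, nothing guarantees that the E-stop's zero command is written after the other cmd_vel publishers, so it cannot reliably override them. Keep both nodes on one scheduler, give the E-stop node a higher .order() than the motor node, and use shared-memory topics to bring the safety.estop signal in from other processes.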
Full System Example
A real robot does not run the E-stop node alone. Below is a complete scheduler with three nodes — DriveNode (motor control), PlannerNode (path planning), and EStopNode (safety override) — showing how enter_safe_state() and is_safe_state() cooperate across the system.
// simplified
use horus::prelude::*;
// --- Messages (same EStopSignal and SafetyStatus as above) ---
#[derive(Debug, Clone, Copy, Default, Serialize, Deserialize, LogSummary)]
#[repr(C)]
struct EStopSignal {
triggered: u8,
source: u8,
}
#[derive(Debug, Clone, Copy, Default, Serialize, Deserialize, LogSummary)]
#[repr(C)]
struct SafetyStatus {
estop_active: u8,
consecutive_clears: u32,
uptime_ticks: u64,
}
#[derive(Debug, Clone, Copy, Default, Serialize, Deserialize, LogSummary)]
#[repr(C)]
struct WheelCmd {
left_rpm: f32,
right_rpm: f32,
}
// --- DriveNode: converts cmd_vel to wheel commands ---
struct DriveNode {
cmd_sub: Topic<CmdVel>,
wheel_pub: Topic<WheelCmd>,
}
impl DriveNode {
fn new() -> Result<Self> {
Ok(Self {
cmd_sub: Topic::new("cmd_vel")?,
wheel_pub: Topic::new("wheel.cmd")?,
})
}
}
impl Node for DriveNode {
fn name(&self) -> &str { "Drive" }
fn tick(&mut self) {
if let Some(cmd) = self.cmd_sub.recv() {
let wheel_base = 0.3_f32;
let radius = 0.05_f32;
let to_rpm = 60.0 / (2.0 * std::f32::consts::PI);
let left = ((cmd.linear - cmd.angular * wheel_base / 2.0) / radius) * to_rpm;
let right = ((cmd.linear + cmd.angular * wheel_base / 2.0) / radius) * to_rpm;
self.wheel_pub.send(WheelCmd {
left_rpm: left.clamp(-200.0, 200.0),
right_rpm: right.clamp(-200.0, 200.0),
});
}
}
fn enter_safe_state(&mut self) {
// Zero both motors immediately — this is the critical safety action
self.wheel_pub.send(WheelCmd { left_rpm: 0.0, right_rpm: 0.0 });
}
fn shutdown(&mut self) -> Result<()> {
self.wheel_pub.send(WheelCmd { left_rpm: 0.0, right_rpm: 0.0 });
Ok(())
}
}
// --- PlannerNode: publishes cmd_vel, checks path safety ---
struct PlannerNode {
cmd_pub: Topic<CmdVel>,
path_clear: bool,
}
impl PlannerNode {
fn new() -> Result<Self> {
Ok(Self {
cmd_pub: Topic::new("cmd_vel")?,
path_clear: true,
})
}
}
impl Node for PlannerNode {
fn name(&self) -> &str { "Planner" }
fn tick(&mut self) {
// In production, check lidar/camera for obstacles
if self.path_clear {
self.cmd_pub.send(CmdVel::new(0.5, 0.0));
} else {
self.cmd_pub.send(CmdVel::zero());
}
}
fn is_safe_state(&self) -> bool {
// Scheduler queries this — if false, the safety monitor can escalate
!self.path_clear
}
fn enter_safe_state(&mut self) {
self.path_clear = false;
self.cmd_pub.send(CmdVel::zero());
}
}
// --- EStopNode: same as the Solution section above ---
// (omitted for brevity — use the full EStopNode from above)
fn main() -> Result<()> {
let mut scheduler = Scheduler::new()
.watchdog(500_u64.ms())
.max_deadline_misses(1);
// Planner runs first — publishes cmd_vel
scheduler.add(PlannerNode::new()?)
.order(0)
.rate(20_u64.hz())
.on_miss(Miss::Skip)
.build()?;
// Drive runs second — converts cmd_vel to wheel commands
scheduler.add(DriveNode::new()?)
.order(10)
.rate(50_u64.hz())
.budget(500_u64.us())
.on_miss(Miss::SafeMode)
.build()?;
// E-stop runs LAST — overrides cmd_vel if triggered
scheduler.add(EStopNode::new()?)
.order(100)
.rate(100_u64.hz())
.budget(200_u64.us())
.deadline(500_u64.us())
.on_miss(Miss::SafeMode)
.build()?;
scheduler.run()
}
The execution order matters: Planner (0) produces velocity, Drive (10) converts it to motor commands, and EStop (100) overrides cmd_vel with zero if triggered. Because EStop writes to the same cmd_vel topic after Planner, the zero command propagates to Drive on the next tick.
When the scheduler calls enter_safe_state() on DriveNode (due to a deadline miss or watchdog expiry), DriveNode zeros its wheel output independently of the E-stop signal. This gives you two layers of protection: the E-stop node zeroes the command, and the drive node zeroes the output.
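To make the Drive step concrete, here is the same differential-drive conversion pulled out into a pure function with worked numbers. This is a sketch only: the DiffDrive struct and its field names are hypothetical, and the geometry (0.3 m wheel base, 0.05 m wheel radius, 200 RPM limit) is copied from the constants in DriveNode above.

```rust
use std::f32::consts::PI;

/// Hypothetical robot geometry; values in the example mirror DriveNode.
pub struct DiffDrive {
    pub wheel_base_m: f32,
    pub wheel_radius_m: f32,
    pub max_rpm: f32,
}

impl DiffDrive {
    /// Convert body velocity (m/s, rad/s) into clamped (left, right) wheel RPM.
    pub fn wheel_rpm(&self, linear: f32, angular: f32) -> (f32, f32) {
        // rad/s -> revolutions per minute
        let rad_per_s_to_rpm = 60.0 / (2.0 * PI);
        let half_base = self.wheel_base_m / 2.0;
        // Wheel surface speed divided by radius gives wheel angular speed.
        let left = (linear - angular * half_base) / self.wheel_radius_m * rad_per_s_to_rpm;
        let right = (linear + angular * half_base) / self.wheel_radius_m * rad_per_s_to_rpm;
        (
            left.clamp(-self.max_rpm, self.max_rpm),
            right.clamp(-self.max_rpm, self.max_rpm),
        )
    }
}
```

For the planner's command of 0.5 m/s straight ahead, each wheel turns at 0.5 / 0.05 = 10 rad/s, which is about 95.5 RPM. A zero command maps to exactly (0.0, 0.0), which is what both the E-stop override and enter_safe_state() rely on.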
Hardware GPIO Integration
Physical E-stop buttons connect to GPIO pins. The pattern below reads a hardware button and publishes to the safety.estop topic so the EStopNode (running in the same or a different process) can react.
GPIO access varies by platform. This example uses the gpio_cdev crate (Linux character device interface). On Raspberry Pi, you could also use rppal. The HORUS E-stop pattern is the same regardless of which GPIO library you choose.
// simplified
use horus::prelude::*;
use gpio_cdev::{Chip, LineRequestFlags};
struct GpioEstopPublisher {
estop_pub: Topic<EStopSignal>,
gpio_line: gpio_cdev::Line,
pin: u32,
}
impl GpioEstopPublisher {
fn new(chip_path: &str, pin: u32) -> Result<Self> {
let mut chip = Chip::new(chip_path)
.map_err(|e| horus::Error::msg(format!("GPIO chip: {e}")))?;
let line = chip.get_line(pin)
.map_err(|e| horus::Error::msg(format!("GPIO line {pin}: {e}")))?;
Ok(Self {
estop_pub: Topic::new("safety.estop")?,
gpio_line: line,
pin,
})
}
}
impl Node for GpioEstopPublisher {
fn name(&self) -> &str { "GpioEstop" }
fn tick(&mut self) {
// Request the line as input each tick (some drivers require re-request).
// SAFETY: a failed request or read defaults to 1 (triggered) instead of panicking
let value = self.gpio_line
.request(LineRequestFlags::INPUT, 0, "horus-estop")
.and_then(|handle| handle.get_value())
.unwrap_or(1);
// E-stop buttons are normally-closed (NC). Assuming the button pulls the pin to
// ground and a pull-up is enabled: pin LOW = safe, pin HIGH = triggered
// Wiring a NC button means a broken wire also triggers the E-stop (fail-safe)
self.estop_pub.send(EStopSignal {
triggered: value as u8,
source: 0, // hardware
});
}
}
Why normally-closed wiring? A normally-closed (NC) button keeps the circuit completed when not pressed. If the wire breaks or the connector fails, the circuit opens — which reads the same as "button pressed." This is the standard industrial fail-safe pattern: any hardware fault triggers the E-stop rather than silently disabling it.
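The fail-safe mapping from pin level to E-stop state can be written as one small function. The sketch below assumes the NC button connects the pin to ground and a pull-up holds the pin HIGH when the circuit opens; estop_triggered is a hypothetical helper name, and the error type is left generic so it works with whichever GPIO library you use.

```rust
/// Map a raw pin read to "is the E-stop triggered?".
/// Assumed wiring: NC button to ground with a pull-up, so LOW = circuit intact.
pub fn estop_triggered<E>(pin_read: Result<u8, E>) -> bool {
    match pin_read {
        Ok(0) => false,  // circuit closed: button not pressed, wire intact
        Ok(_) => true,   // circuit open: button pressed or wire broken
        Err(_) => true,  // read failed: safety cannot be proven, assume triggered
    }
}
```

Only one of the three cases allows motion, and it is the one that requires a working circuit and a successful read. If your board uses the opposite polarity, swap the match arms rather than inverting the result elsewhere, so the error case keeps failing safe.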
Multi-Process E-stop
HORUS topics use shared memory. When two processes open Topic::new("safety.estop"), they share the same SHM segment. This means a hardware GPIO publisher in one process and an EStopNode in another process see the same signal with zero serialization overhead.
The pattern for multi-process E-stop:
// simplified
// Process 1: Hardware monitor (publishes E-stop from GPIO)
fn main() -> Result<()> {
let mut scheduler = Scheduler::new();
scheduler.add(GpioEstopPublisher::new("/dev/gpiochip0", 17)?)
.order(0)
.rate(200_u64.hz()) // sample GPIO at 200 Hz for fast response
.budget(100_u64.us())
.build()?;
scheduler.run()
}
// Process 2: Main robot controller (subscribes to E-stop)
fn main() -> Result<()> {
let mut scheduler = Scheduler::new()
.watchdog(500_u64.ms())
.max_deadline_misses(1);
scheduler.add(PlannerNode::new()?)
.order(0)
.rate(20_u64.hz())
.build()?;
scheduler.add(DriveNode::new()?)
.order(10)
.rate(50_u64.hz())
.on_miss(Miss::SafeMode)
.build()?;
// E-stop reads from SHM — sees GPIO publisher's writes
scheduler.add(EStopNode::new()?)
.order(100)
.rate(100_u64.hz())
.budget(200_u64.us())
.deadline(500_u64.us())
.on_miss(Miss::SafeMode)
.build()?;
scheduler.run()
}
Every process that controls actuators should subscribe to safety.estop. The publisher only needs to exist once (in the hardware monitor process), but any number of subscribers can react to it. Because SHM is lock-free, a published signal becomes visible to every subscriber in under 1 microsecond regardless of how many subscribers exist; each subscriber then acts on it at its next tick.
If the hardware monitor process crashes, subscribers stop receiving new messages. The EStopNode already handles this: no signal received = consecutive_clears resets to zero, which prevents release of an active E-stop. For the initial case (E-stop not yet active), consider starting with estop_active: true and requiring the first clear signal before allowing motion.
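One way to cover the initial case is to track silence explicitly and treat a lost publisher as a trigger. The following is a minimal sketch under stated assumptions, not HORUS API: SilenceGuard and max_silent_ticks are hypothetical names, and the node would set estop_active = true whenever publisher_lost() returns true.

```rust
/// Tracks how long the E-stop topic has been silent, measured in ticks.
pub struct SilenceGuard {
    silent_ticks: u32,
    max_silent_ticks: u32,
    seen_first_message: bool,
}

impl SilenceGuard {
    pub fn new(max_silent_ticks: u32) -> Self {
        Self { silent_ticks: 0, max_silent_ticks, seen_first_message: false }
    }

    /// Call once per tick with `true` if a message arrived on this tick.
    /// Returns `true` when the publisher must be presumed missing or dead.
    pub fn publisher_lost(&mut self, message_arrived: bool) -> bool {
        if message_arrived {
            self.seen_first_message = true;
            self.silent_ticks = 0;
        } else {
            self.silent_ticks = self.silent_ticks.saturating_add(1);
        }
        // Before the first message nothing is known, so report "lost".
        !self.seen_first_message || self.silent_ticks > self.max_silent_ticks
    }
}
```

The guard reports a fault from the first tick until a message is seen, which has the same effect as starting with estop_active: true. The tolerance of a few silent ticks is a design choice for this sketch: it avoids false trips from scheduling jitter between publisher and subscriber, at the cost of reacting to a crash a few ticks later.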
Testing E-stop
Use tick_once() to verify E-stop behavior deterministically without running the full scheduler loop. Each call to tick_once() executes exactly one tick of every node in order.
// simplified
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn motor_runs_when_estop_clear() {
let mut scheduler = Scheduler::new();
scheduler.add(DriveNode::new().unwrap())
.order(10)
.build().unwrap();
scheduler.add(EStopNode::new().unwrap())
.order(100)
.build().unwrap();
// Publish a clear E-stop signal
let estop_pub: Topic<EStopSignal> = Topic::new("safety.estop").unwrap();
estop_pub.send(EStopSignal { triggered: 0, source: 0 });
// Publish a velocity command
let cmd_pub: Topic<CmdVel> = Topic::new("cmd_vel").unwrap();
cmd_pub.send(CmdVel::new(1.0, 0.0));
// Tick once — Drive should produce non-zero wheel output
scheduler.tick_once();
let wheel_sub: Topic<WheelCmd> = Topic::new("wheel.cmd").unwrap();
let wheel = wheel_sub.recv().unwrap();
assert!(wheel.left_rpm.abs() > 0.0, "motor should be running");
assert!(wheel.right_rpm.abs() > 0.0, "motor should be running");
}
#[test]
fn motor_stops_when_estop_triggered() {
let mut scheduler = Scheduler::new();
scheduler.add(DriveNode::new().unwrap())
.order(10)
.build().unwrap();
scheduler.add(EStopNode::new().unwrap())
.order(100)
.build().unwrap();
// Publish a triggered E-stop
let estop_pub: Topic<EStopSignal> = Topic::new("safety.estop").unwrap();
estop_pub.send(EStopSignal { triggered: 1, source: 1 });
// Publish a velocity command — E-stop should override it
let cmd_pub: Topic<CmdVel> = Topic::new("cmd_vel").unwrap();
cmd_pub.send(CmdVel::new(1.0, 0.0));
// Tick once — EStopNode overwrites cmd_vel with zero
scheduler.tick_once();
// Tick again — DriveNode reads the zeroed cmd_vel
scheduler.tick_once();
let wheel_sub: Topic<WheelCmd> = Topic::new("wheel.cmd").unwrap();
let wheel = wheel_sub.recv().unwrap();
assert_eq!(wheel.left_rpm, 0.0, "motor must be stopped");
assert_eq!(wheel.right_rpm, 0.0, "motor must be stopped");
}
#[test]
fn estop_requires_debounce_to_clear() {
let mut scheduler = Scheduler::new();
scheduler.add(EStopNode::new().unwrap())
.order(100)
.build().unwrap();
let estop_pub: Topic<EStopSignal> = Topic::new("safety.estop").unwrap();
let status_sub: Topic<SafetyStatus> = Topic::new("safety.status").unwrap();
// Trigger E-stop
estop_pub.send(EStopSignal { triggered: 1, source: 0 });
scheduler.tick_once();
let status = status_sub.recv().unwrap();
assert_eq!(status.estop_active, 1, "should be active after trigger");
// Send a single clear — should NOT release (need 50 consecutive)
estop_pub.send(EStopSignal { triggered: 0, source: 0 });
scheduler.tick_once();
let status = status_sub.recv().unwrap();
assert_eq!(status.estop_active, 1, "should still be active — debounce not met");
}
#[test]
fn no_signal_keeps_estop_active() {
let mut scheduler = Scheduler::new();
scheduler.add(EStopNode::new().unwrap())
.order(100)
.build().unwrap();
let estop_pub: Topic<EStopSignal> = Topic::new("safety.estop").unwrap();
let status_sub: Topic<SafetyStatus> = Topic::new("safety.status").unwrap();
// Trigger E-stop, then stop publishing entirely
estop_pub.send(EStopSignal { triggered: 1, source: 0 });
scheduler.tick_once();
// Tick without publishing — simulates publisher crash
scheduler.tick_once();
let status = status_sub.recv().unwrap();
assert_eq!(status.estop_active, 1, "no signal = fault = stay active");
}
}
These tests run in milliseconds and cover the four critical scenarios: normal operation, triggered stop, debounce behavior, and publisher failure.
Safety Standards Note
HORUS provides cooperative software safety. The E-stop pattern in this recipe is a software-level safety layer — it depends on the scheduler running, the process being alive, and shared memory being accessible. This is necessary but not sufficient for safety-critical systems.
For systems that must meet SIL (Safety Integrity Level) ratings under IEC 61508 or performance levels under ISO 13849:
- Hardware E-stop circuit: A physical relay that cuts power to actuators independently of software. This is the primary safety system. The relay must be rated for the motor's stall current and must be fail-safe (normally-closed contacts).
- Software E-stop (this recipe): A secondary layer that reacts within one tick (10 ms at 100 Hz, on top of sub-microsecond SHM propagation), comparable to a relay switching time of 5-20 ms, and provides richer behavior (debounce, status reporting, coordinated shutdown). It monitors the same physical button and coordinates the software stack.
- Watchdog timer: A hardware watchdog (e.g., on the microcontroller or SBC) that resets the system if software stops sending heartbeats. HORUS's .watchdog() is a software watchdog; pair it with a hardware one for defense in depth.
The layered approach: hardware relay catches catastrophic failures (software crash, kernel panic, power brownout), software E-stop handles the 99% case with better UX (status reporting, coordinated shutdown, logging).
Design Decisions
Why fail-safe (no signal = fault)? If the E-stop publisher crashes, the subscriber stops receiving messages. A "fail-dangerous" design would interpret silence as "all clear" and let the robot keep moving with no safety monitoring. The fail-safe design treats silence as a fault: a silent tick never counts toward release, and a node that starts with estop_active: true blocks motion until the publisher actively reports clear. This is the same principle as a dead man's switch: you must actively assert safety, not passively assume it.
Why debounce on clear (not on trigger)? Triggering the E-stop must be instant — any delay could mean the robot travels further into a hazard. But releasing the E-stop can afford 0.5 seconds of delay. Debounce on clear prevents a flickering signal (e.g., a loose wire on the GPIO pin) from rapidly cycling the robot between motion and stop, which can damage motors and gearboxes.
Why same-scheduler ordering instead of priority? HORUS uses cooperative scheduling with deterministic ordering, not preemptive priorities. The .order(100) guarantee means the E-stop node always runs after drive nodes within the same tick. This is simpler to reason about than preemptive priority inversion, and the worst-case latency is bounded by the tick period (10 ms at 100 Hz) rather than being unbounded.
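Why u8 instead of bool for triggered? The #[repr(C)] attribute ensures the struct has a predictable memory layout for zero-copy SHM transport. Rust's bool is one byte, but only the bit patterns 0x00 and 0x01 are valid for it; reading any other byte as a bool is undefined behavior. Because SHM contents can be written by another process, u8 keeps every possible byte a valid value and makes the wire format explicit: 0 = clear, 1 = triggered.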
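With raw u8 fields, decoding happens at the point of use, which is where a fail-safe interpretation can be applied. The sketch below repeats the EStopSignal layout with its derives trimmed to the standard library; the Source enum and the two accessor methods are hypothetical additions for illustration.

```rust
use std::mem::size_of;

/// Same field layout as the recipe's EStopSignal (derives trimmed to std only).
#[derive(Debug, Clone, Copy, Default)]
#[repr(C)]
pub struct EStopSignal {
    pub triggered: u8,
    pub source: u8,
}

// Compile-time check that the wire format is exactly two bytes.
const _: () = assert!(size_of::<EStopSignal>() == 2);

/// Hypothetical typed view of the `source` byte.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    Hardware,
    Software,
    Remote,
    Unknown(u8),
}

impl EStopSignal {
    /// Any nonzero byte counts as triggered, so a corrupted value fails safe.
    pub fn is_triggered(&self) -> bool {
        self.triggered != 0
    }

    /// Decode the source byte without ever producing an invalid enum value.
    pub fn source(&self) -> Source {
        match self.source {
            0 => Source::Hardware,
            1 => Source::Software,
            2 => Source::Remote,
            other => Source::Unknown(other),
        }
    }
}
```

This matches the check the EStopNode already performs (signal.triggered != 0): a stray value such as 0xFF stops the robot instead of being undefined behavior.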
Trade-offs
| Approach | Latency | Reliability | Complexity | When to use |
|---|---|---|---|---|
| Same-scheduler E-stop (this recipe) | <1 tick (10 ms at 100 Hz) | High — deterministic ordering | Low | Single-process robots, most use cases |
| Multi-process SHM E-stop | <1 us propagation + subscriber tick period | High — survives publisher crash via fail-safe | Medium | Multi-process architectures |
| Event-driven .on("safety.estop") | <1 us (wake on signal) | Medium — no periodic checking | Low | When lowest latency matters more than periodic monitoring |
| Hardware relay only (no software) | 5-20 ms relay switching | Very high — independent of software | Very low | Certification requirement, last-resort backup |
| Hardware relay + software E-stop | <1 tick software, 5-20 ms hardware backup | Highest — defense in depth | Medium | Production robots, SIL-rated systems |
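The latency figures in the table add up along the signal path. As a rough sketch, assuming each polling stage can wait up to one full period before it notices new input, and ignoring execution time and the sub-microsecond SHM propagation, a worst-case estimate is the sum of the stage periods. The function names below are hypothetical.

```rust
/// Worst-case wait for a periodic stage to notice new input: one full period.
pub fn period_ms(rate_hz: f64) -> f64 {
    1000.0 / rate_hz
}

/// Upper-bound estimate: one period per polling stage in the chain, summed.
pub fn worst_case_latency_ms(stage_rates_hz: &[f64]) -> f64 {
    stage_rates_hz.iter().map(|&hz| period_ms(hz)).sum()
}
```

Using the rates from the multi-process example (GPIO publisher at 200 Hz, EStopNode at 100 Hz, DriveNode at 50 Hz), worst_case_latency_ms(&[200.0, 100.0, 50.0]) gives 5 + 10 + 20 = 35 ms from button press to zeroed wheel command. In this estimate the slowest stage dominates, so raising the drive node's rate shortens the path more than sampling the GPIO faster.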
Common Errors
| Symptom | Cause | Fix |
|---|---|---|
| E-stop doesn't override motors | E-stop .order() lower than motor node | Set E-stop .order() HIGHER than motor node |
| Motors resume after E-stop | Only sending zero once | Send CmdVel::zero() every tick while active |
| E-stop flickers on/off | No debounce on clear signal | Use CLEAR_THRESHOLD consecutive clears |
| System continues after deadline miss | Using Miss::Warn instead of Miss::SafeMode | Set .on_miss(Miss::SafeMode) |
| E-stop never releases | CLEAR_THRESHOLD too high or no clear signals | Verify upstream publishes clear signals |
| GPIO reads inverted | Button wired normally-open instead of normally-closed | Invert the logic or rewire as NC for fail-safe |
| Multi-process E-stop has lag | Subscriber tick rate too low | Increase E-stop node .rate() or use event-driven |
See Also
- Safety Monitor — Graduated degradation and watchdog
- Fault Tolerance — Circuit breaker pattern for graceful degradation
- CmdVel — Velocity command type
- Miss Enum — All deadline miss policies
- Differential Drive — Motor control node that pairs with E-stop