Telemetry Export
You need to send HORUS runtime metrics (node tick durations, message counts, deadline misses) to an external monitoring system like Grafana, Prometheus, or a custom dashboard. Here is how to enable telemetry export with a single builder method.
When To Use This
- Sending scheduler metrics to Grafana, InfluxDB, or Prometheus via an HTTP receiver
- Logging metrics to a file for offline analysis
- Streaming metrics over UDP to a LAN monitoring tool
- Debugging performance with stdout metric output
Use Monitor instead if you want a built-in web/TUI dashboard without external tooling.
Use Logging instead if you need text-based diagnostic messages, not numeric metrics.
Prerequisites
- A HORUS Rust project with `use horus::prelude::*;`
- The `telemetry` feature enabled on `horus_core` (enabled by default)
- For HTTP export: a running HTTP endpoint that accepts JSON POST requests
Quick Start
Enable telemetry with a single builder method:
// simplified
let mut scheduler = Scheduler::new()
.tick_rate(100_u64.hz())
.telemetry("http://localhost:9090/metrics"); // HTTP POST endpoint
scheduler.add(MyNode::new()?).order(0).rate(100_u64.hz()).build()?;
scheduler.run()?;
The scheduler exports a JSON snapshot every 1 second (default interval).
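Before wiring up a receiver, you can inspect the exact payload by pointing telemetry at the `"stdout"` endpoint (see Endpoint Types below). A sketch based on the Quick Start above:

```rust
// simplified — same setup, but printing snapshots locally for inspection
let mut scheduler = Scheduler::new()
    .tick_rate(100_u64.hz())
    .telemetry("stdout"); // pretty-printed JSON once per export interval

scheduler.add(MyNode::new()?).order(0).rate(100_u64.hz()).build()?;
scheduler.run()?;
```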
Architecture
Telemetry in HORUS is designed around one constraint: the scheduler tick loop must never block on I/O. The architecture achieves this with a three-stage pipeline that decouples metric collection from serialization and delivery.
┌─────────────────────────────────────────────────────────────┐
│ Scheduler Tick Loop (RT thread) │
│ │
│ for each node: │
│ tick() → measure duration → record in Profiler │
│ │
│ if export_interval elapsed: │
│ Profiler.node_stats → TelemetryManager.gauge/counter │
│ TelemetryManager.build_snapshot() │
│ TelemetryManager.export() │
│ ├─ HTTP → try_send(snapshot) on bounded channel ──┐ │
│ ├─ UDP → sendto() (single syscall, fire-forget) │ │
│ ├─ File → write_all() (buffered, local disk) │ │
│ └─ Stdout→ println() (debugging only) │ │
└──────────────────────────────────────────────────────┬───┘ │
│ │
┌─────────────────────────────────────┘ │
│ Bounded Channel (capacity: 4) │
│ try_send() — non-blocking │
│ If full → snapshot silently dropped │
└──────────────┬──────────────────────────────┘
│
┌──────────────▼──────────────────────────────┐
│ Background Thread ("horus-telemetry-http") │
│ │
│ loop: │
│ snapshot = rx.recv() │
│ TcpStream::connect(url) │
│ HTTP POST (JSON body) │
│ read status line (200/201/204 = ok) │
│ │
│ On Drop: sender dropped → recv returns Err │
│ → thread exits, handle.join() │
└──────────────────────────────────────────────┘
Data flow step by step
1. **Collect** -- The scheduler tick loop executes each node's `tick()` and records timing data in the internal `Profiler`. This is just a `HashMap` update per node, with no allocation on the hot path.
2. **Package** -- At the configured export interval (default: 1 second), `TelemetryManager` reads the profiler stats and calls `gauge_with_labels()` / `counter_with_labels()` for each node. It then calls `build_snapshot()`, which clones the accumulated `HashMap<String, Metric>` into a `TelemetrySnapshot` struct.
3. **Deliver** -- `export()` dispatches the snapshot to the configured endpoint:
   - HTTP: The snapshot is sent via `try_send()` on a bounded `mpsc::sync_channel(4)`. If the background thread is busy with a slow POST and the channel is full, the snapshot is silently dropped. The scheduler thread returns immediately in all cases.
   - UDP: A single `sendto()` syscall. The socket is pre-bound at startup and cached.
   - File: `File::create()` + `write_all()` with pretty-printed JSON.
   - Stdout: Direct `println!()` for debugging.
Why the background thread only applies to HTTP
UDP, file, and stdout exports are inherently fast and bounded -- a UDP sendto() completes in microseconds, file writes go to the kernel page cache, and stdout is local. Only HTTP involves potentially unbounded network latency (DNS, TCP handshake, slow receivers), which is why it gets its own dedicated thread.
Graceful shutdown
When the TelemetryManager is dropped (scheduler exit), it drops the channel sender. The background thread's rx.recv() returns Err, breaking the loop. The Drop implementation then calls handle.join() to drain any queued snapshots before the process exits, ensuring no data is lost on clean shutdown.
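The HTTP path, including shutdown, can be condensed into a short sketch. This is an illustration of the pattern described above, not the actual HORUS source: `Snapshot` and `post_json` are stand-ins, and error handling is reduced to best-effort drops.

```rust
use std::sync::mpsc;
use std::thread;

struct Snapshot(String); // stand-in for a serialized TelemetrySnapshot

fn spawn_http_exporter(url: String) -> (mpsc::SyncSender<Snapshot>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::sync_channel::<Snapshot>(4); // bounded, capacity 4
    let handle = thread::Builder::new()
        .name("horus-telemetry-http".into())
        .spawn(move || {
            // Blocks only this thread; exits once every sender is dropped
            // and the queued snapshots have been drained.
            while let Ok(snapshot) = rx.recv() {
                let _ = post_json(&url, &snapshot.0); // best-effort delivery
            }
        })
        .expect("failed to spawn telemetry thread");
    (tx, handle)
}

/// Called on the scheduler thread: succeeds or drops instantly, never blocks.
fn export(tx: &mpsc::SyncSender<Snapshot>, snapshot: Snapshot) {
    let _ = tx.try_send(snapshot); // Err(Full) => snapshot silently dropped
}

fn post_json(_url: &str, _body: &str) -> std::io::Result<()> {
    // Placeholder for TcpStream::connect + HTTP POST + status-line check.
    Ok(())
}

fn main() {
    let (tx, handle) = spawn_http_exporter("http://localhost:9090/metrics".into());
    export(&tx, Snapshot(r#"{"metrics":[]}"#.into()));
    drop(tx); // sender dropped => recv() returns Err, background loop exits
    handle.join().unwrap(); // queued snapshots drained before we return
}
```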
Endpoint Types
The `.telemetry(endpoint)` method accepts a string that determines the export backend:
| String Format | Endpoint | Behavior |
|---|---|---|
"http://host:port/path" | HTTP POST | Non-blocking — background thread handles network I/O |
"https://host:port/path" | HTTPS POST | Same as HTTP with TLS |
"udp://host:port" | UDP datagram | Compact JSON, single packet per snapshot |
"file:///path/to/metrics.json" | Local file | Pretty-printed JSON, overwritten each export |
"/path/to/metrics.json" | Local file | Same as file:// prefix |
"stdout" or "local" | Stdout | Pretty-printed to terminal (debugging) |
"disabled" or "" | Disabled | No export (default) |
HTTP Endpoint (Recommended for Production)
// simplified
let mut scheduler = Scheduler::new()
.telemetry("http://localhost:9090/metrics");
HTTP export is fully non-blocking for the scheduler:
- Scheduler calls `export()` on its cycle — posts a snapshot to a bounded channel (capacity 4)
- A dedicated background thread reads from the channel and performs the HTTP POST
- If the channel is full (receiver slow), the snapshot is silently dropped — the scheduler never blocks
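To exercise this path locally, a minimal test receiver is sketched below. It uses only the standard library and naive HTTP handling (a single read, no real header parsing), so treat it as a debugging aid rather than a production receiver; the port and path are assumptions matching the examples above.

```rust
use std::io::{Read, Write};
use std::net::TcpListener;

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:9090")?;
    for stream in listener.incoming() {
        let mut stream = stream?;
        let mut buf = vec![0u8; 64 * 1024];
        let n = stream.read(&mut buf)?; // naive: assumes one read gets the request
        let request = String::from_utf8_lossy(&buf[..n]);
        // The JSON body follows the blank line after the HTTP headers.
        if let Some(body) = request.split("\r\n\r\n").nth(1) {
            println!("{body}");
        }
        // 200/201/204 are all treated as success by the exporter.
        stream.write_all(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")?;
    }
    Ok(())
}
```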
UDP Endpoint (Low Overhead)
// simplified
let mut scheduler = Scheduler::new()
.telemetry("udp://192.168.1.100:9999");
Sends compact single-line JSON per snapshot. Good for LAN monitoring where packet loss is acceptable.
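A matching listener for the UDP endpoint, again a std-only sketch (the port mirrors the example above):

```rust
use std::net::UdpSocket;

fn main() -> std::io::Result<()> {
    let socket = UdpSocket::bind("0.0.0.0:9999")?;
    let mut buf = [0u8; 64 * 1024]; // one compact JSON snapshot per datagram
    loop {
        let (n, src) = socket.recv_from(&mut buf)?;
        println!("{src}: {}", String::from_utf8_lossy(&buf[..n]));
    }
}
```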
File Endpoint (Debugging & Logging)
// simplified
let mut scheduler = Scheduler::new()
.telemetry("/tmp/horus-metrics.json");
Overwrites the file on each export cycle. Useful for debugging or feeding into log aggregation pipelines.
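Because the file is rewritten in place, a consumer can simply re-read it on a timer. A small polling sketch, using the path from the example above:

```rust
use std::{fs, thread, time::Duration};

fn main() {
    loop {
        // The exporter overwrites the whole file each cycle, so a plain
        // re-read always yields the latest snapshot.
        match fs::read_to_string("/tmp/horus-metrics.json") {
            Ok(json) => println!("{json}"),
            Err(err) => eprintln!("snapshot not available yet: {err}"),
        }
        thread::sleep(Duration::from_secs(1));
    }
}
```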
JSON Payload Format
Every export produces a `TelemetrySnapshot`:
{
"timestamp_secs": 1710547200,
"scheduler_name": "motor_control",
"uptime_secs": 42.5,
"metrics": [
{
"name": "node.tick_duration_us",
"value": { "Gauge": 145.2 },
"labels": { "node": "MotorCtrl" },
"timestamp_secs": 1710547200
},
{
"name": "node.total_ticks",
"value": { "Counter": 4250 },
"labels": { "node": "MotorCtrl" },
"timestamp_secs": 1710547200
},
{
"name": "scheduler.deadline_misses",
"value": { "Counter": 0 },
"labels": {},
"timestamp_secs": 1710547200
}
]
}
Metric Value Types
| Type | JSON | Description |
|---|---|---|
| Counter | `{ "Counter": 42 }` | Monotonically increasing (total ticks, messages sent) |
| Gauge | `{ "Gauge": 3.14 }` | Current value (tick duration, CPU usage) |
| Histogram | `{ "Histogram": [0.1, 0.2, 0.15] }` | Distribution of values |
| Text | `{ "Text": "Healthy" }` | String status |
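Consumers written in Rust can deserialize the payload with `serde`: the value variants map directly onto an externally tagged enum. A sketch, assuming `serde` and `serde_json` as dependencies (the struct names are illustrative, not the HORUS types):

```rust
use std::collections::HashMap;

use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Snapshot {
    timestamp_secs: u64,
    scheduler_name: String,
    uptime_secs: f64,
    metrics: Vec<Metric>,
}

#[derive(Debug, Deserialize)]
struct Metric {
    name: String,
    value: MetricValue, // e.g. { "Gauge": 145.2 } deserializes directly
    #[serde(default)]
    labels: HashMap<String, String>,
    timestamp_secs: u64,
}

// serde's default externally tagged representation matches the JSON above.
#[derive(Debug, Deserialize)]
enum MetricValue {
    Counter(u64),
    Gauge(f64),
    Histogram(Vec<f64>),
    Text(String),
}

fn main() -> serde_json::Result<()> {
    let json = r#"{"timestamp_secs":1710547200,"scheduler_name":"motor_control",
                   "uptime_secs":42.5,"metrics":[]}"#;
    let snapshot: Snapshot = serde_json::from_str(json)?;
    println!("{snapshot:?}");
    Ok(())
}
```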
Auto-Collected Metrics
When telemetry is enabled, the scheduler automatically exports:
| Metric Name | Type | Labels | Description |
|---|---|---|---|
| `node.total_ticks` | Counter | `node` | Total ticks executed |
| `node.tick_duration_us` | Gauge | `node` | Last tick duration in microseconds |
| `node.errors` | Counter | `node` | Total tick errors |
| `scheduler.deadline_misses` | Counter | — | Total deadline misses across all nodes |
| `scheduler.uptime_secs` | Gauge | — | Scheduler uptime |
Feature Flag
Telemetry requires the `telemetry` feature on `horus_core` — enabled by default.
To disable at compile time (saves binary size):
[dependencies]
horus = { version = "0.1", default-features = false, features = ["macros", "blackbox"] }
Integration with External Tools
Grafana + Custom Receiver
Write a small HTTP server that receives the JSON POST and forwards metrics to Prometheus/InfluxDB:
# receiver.py — minimal Flask example
from flask import Flask, request
app = Flask(__name__)
@app.route("/metrics", methods=["POST"])
def metrics():
snapshot = request.json
for m in snapshot["metrics"]:
print(f"{m['name']} = {m['value']}")
return "ok", 200
app.run(port=9090)
horus monitor (Alternative)
For local debugging, `horus monitor` provides a built-in TUI dashboard — no external setup needed. See Monitor for details.
Prometheus Integration
Prometheus scrapes metrics over HTTP in its own exposition format, not JSON. HORUS exports JSON, so you need a small bridge that converts the JSON payload into Prometheus-compatible text. There are two approaches.
Approach A: Python bridge (recommended for quick setup)
This script receives HORUS JSON POSTs and serves a /metrics endpoint that Prometheus scrapes:
# prometheus_bridge.py
from flask import Flask, request
from prometheus_client import Gauge, Counter, generate_latest, REGISTRY
app = Flask(__name__)
# Dynamic metric storage
gauges = {}
counters = {}
def get_or_create_gauge(name, labels):
key = name
if key not in gauges:
label_keys = list(labels.keys()) if labels else []
gauges[key] = Gauge(
name.replace(".", "_"),
f"HORUS metric: {name}",
label_keys,
)
return gauges[key]
def get_or_create_counter(name, labels):
key = name
if key not in counters:
label_keys = list(labels.keys()) if labels else []
counters[key] = Counter(
name.replace(".", "_"),
f"HORUS metric: {name}",
label_keys,
)
return counters[key]
@app.route("/ingest", methods=["POST"])
def ingest():
"""Receives HORUS telemetry JSON and updates Prometheus metrics."""
snapshot = request.json
for m in snapshot["metrics"]:
labels = m.get("labels", {})
value_dict = m["value"]
if "Gauge" in value_dict:
g = get_or_create_gauge(m["name"], labels)
if labels:
g.labels(**labels).set(value_dict["Gauge"])
else:
g.set(value_dict["Gauge"])
elif "Counter" in value_dict:
c = get_or_create_counter(m["name"], labels)
# Prometheus counters only increment; set to the HORUS total
current = c._value.get() if not labels else 0
delta = value_dict["Counter"] - current
if delta > 0:
if labels:
c.labels(**labels).inc(delta)
else:
c.inc(delta)
return "ok", 200
@app.route("/metrics")
def metrics():
"""Prometheus scrape endpoint."""
return generate_latest(REGISTRY), 200, {"Content-Type": "text/plain; charset=utf-8"}
if __name__ == "__main__":
app.run(host="0.0.0.0", port=9090)
Configure HORUS to push to the bridge:
// simplified
let mut scheduler = Scheduler::new()
.tick_rate(100_u64.hz())
.telemetry("http://localhost:9090/ingest"); // POST to the bridge
Add the bridge to your `prometheus.yml`:
scrape_configs:
- job_name: "horus"
scrape_interval: 5s
static_configs:
- targets: ["localhost:9090"]
Approach B: File-based with node_exporter textfile collector
For environments where you cannot run an extra HTTP service, write telemetry to a file that Prometheus node_exporter picks up:
// simplified
let mut scheduler = Scheduler::new()
.tick_rate(100_u64.hz())
.telemetry("/var/lib/node_exporter/textfile/horus.json");
Then use a cron job or sidecar script to convert the JSON file to Prometheus exposition format:
#!/usr/bin/env python3
# json_to_prom.py — convert HORUS JSON to Prometheus textfile format
import json, sys
with open(sys.argv[1]) as f:
snapshot = json.load(f)
output = sys.argv[2] if len(sys.argv) > 2 else "/var/lib/node_exporter/textfile/horus.prom"
lines = []
for m in snapshot["metrics"]:
name = m["name"].replace(".", "_")
labels = m.get("labels", {})
label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
label_part = "{" + label_str + "}" if label_str else ""
value_dict = m["value"]
for vtype in ("Gauge", "Counter"):
if vtype in value_dict:
lines.append(f"horus_{name}{label_part} {value_dict[vtype]}")
with open(output, "w") as f:
f.write("\n".join(lines) + "\n")
Run it periodically: `watch -n 1 python3 json_to_prom.py /var/lib/node_exporter/textfile/horus.json`
Performance Overhead
All telemetry export happens outside the scheduler's critical timing path. The overhead per export cycle depends on the endpoint:
| Endpoint | Overhead per export | Thread | Notes |
|---|---|---|---|
| HTTP | ~50us on the scheduler thread | Background thread does the actual POST | Scheduler only pays for build_snapshot() + try_send() into the bounded channel. Network I/O is fully off the RT thread. |
| UDP | ~5us | Scheduler thread (single syscall) | sendto() is fire-and-forget. Socket is pre-bound at startup. No connection overhead per export. |
| File | ~10us | Scheduler thread (buffered write) | File::create() + write_all(). Goes to kernel page cache, not disk. Actual disk flush is asynchronous. |
| Stdout | Negligible | Scheduler thread | println!() to terminal. Only useful for debugging; not recommended in production. |
| Disabled | Zero | N/A | No-op. should_export() returns false immediately. |
What "overhead" means in practice
At a 100 Hz tick rate, the scheduler has a 10 ms budget per tick. A 50us telemetry export (the worst case for HTTP) consumes 0.5% of that budget, and it only runs once per second (every 100th tick), not every tick. The amortized per-tick cost is effectively 0.005%.
For UDP and file endpoints, the overhead is even smaller. The should_export() check (a single Instant::elapsed() comparison) runs every tick and costs <1us.
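That interval check amounts to a single timestamp comparison. A sketch of the pattern (names are illustrative, not the HORUS internals):

```rust
use std::time::{Duration, Instant};

struct ExportTimer {
    last_export: Instant,
    interval: Duration, // default: 1 second
}

impl ExportTimer {
    /// Runs every tick: one Instant::elapsed() comparison, well under 1 us.
    fn should_export(&mut self) -> bool {
        if self.last_export.elapsed() >= self.interval {
            self.last_export = Instant::now();
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut timer = ExportTimer {
        last_export: Instant::now(),
        interval: Duration::from_secs(1),
    };
    // In the scheduler loop this gates the Package/Deliver stages.
    assert!(!timer.should_export()); // interval has not elapsed yet
}
```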
Memory overhead
`TelemetryManager` maintains a `HashMap<String, Metric>` that grows with the number of unique metric names. For a typical system with 10 nodes, this is approximately 2-4 KB. The bounded channel for HTTP holds at most 4 snapshots, each roughly 1-2 KB serialized.
Complete Example
// simplified
use horus::prelude::*;
struct SensorNode {
pub_topic: Topic<Imu>,
}
impl SensorNode {
fn new() -> Result<Self> {
Ok(Self { pub_topic: Topic::new("imu.raw")? })
}
}
impl Node for SensorNode {
fn name(&self) -> &str { "Sensor" }
fn tick(&mut self) {
self.pub_topic.send(Imu {
orientation: [1.0, 0.0, 0.0, 0.0],
angular_velocity: [0.0, 0.0, 0.0],
linear_acceleration: [0.0, 0.0, 9.81],
});
}
}
fn main() -> Result<()> {
let mut scheduler = Scheduler::new()
.tick_rate(100_u64.hz())
.telemetry("http://localhost:9090/metrics"); // export every 1s
scheduler.add(SensorNode::new()?)
.order(0)
.rate(100_u64.hz())
.build()?;
scheduler.run()
}
Common Errors
| Symptom | Cause | Fix |
|---|---|---|
| No metrics exported | Telemetry endpoint not configured | Add `.telemetry("http://localhost:9090/metrics")` to the `Scheduler::new()` builder |
| HTTP metrics silently dropped | Receiver slow or unreachable; bounded channel full (capacity 4) | Check the receiver is running and increase its throughput — the scheduler never blocks |
| Compile error: `telemetry` method not found | `telemetry` feature disabled | Ensure the `horus` dependency has the `telemetry` feature (enabled by default) |
| File endpoint not updating | File is overwritten each cycle but the viewer caches it | Re-read the file or use `watch cat /tmp/horus-metrics.json` |
| UDP metrics lost | Network packet loss on LAN | Expected behavior for UDP; use HTTP for reliable delivery |
Design Decisions
Understanding why telemetry works this way helps you make the right integration choices for your system.
Why non-blocking (RT safety)
HORUS is a real-time robotics framework. The scheduler tick loop has hard timing budgets -- a motor control node running at 1 kHz has exactly 1 ms per tick. If the telemetry export blocked on a slow HTTP server or a full TCP buffer, it would cause deadline misses and potentially unsafe behavior. The non-blocking design ensures that telemetry is always best-effort and never degrades control loop timing.
The bounded channel with try_send() is the key mechanism. On the scheduler thread, export() either succeeds instantly (channel has space) or drops the snapshot instantly (channel full). Both paths complete in nanoseconds. The actual network I/O happens on a separate thread that the scheduler never waits on.
Why bounded channel with drop (not backpressure)
Backpressure means "slow the producer when the consumer is slow." In telemetry for a real-time system, this is exactly wrong -- you never want your 1 kHz motor controller to slow down because Grafana's ingest endpoint is overloaded.
The bounded channel (capacity 4) provides a small buffer for transient slowness (e.g., a brief network hiccup) while hard-capping memory usage. When the buffer is full, `try_send()` drops the incoming snapshot. For periodic telemetry this is the correct behavior: the data is stale within a second anyway, and a fresher snapshot will be offered at the next export interval.
The capacity of 4 is deliberate: at the default 1-second export interval, it absorbs up to 4 seconds of receiver downtime before dropping begins. This handles common transient issues (GC pauses, brief network congestion) without unbounded memory growth.
Why JSON (not Protocol Buffers or MessagePack)
Three reasons:
1. **Debuggability** -- You can pipe HORUS telemetry to `jq`, read it in a browser, or `cat` the file endpoint directly. Binary formats require dedicated tools to inspect.
2. **No build dependency** -- Protocol Buffers require `protoc` and `.proto` files. MessagePack requires additional codec libraries. JSON serialization via `serde_json` is already a transitive dependency of HORUS and adds zero extra build complexity.
3. **Adequate performance** -- Serializing a typical telemetry snapshot (~10 metrics) to JSON takes <10us. For a 1 Hz export, this is negligible. If HORUS ever needs sub-millisecond export rates (unlikely for telemetry), a binary format would be reconsidered.
Why feature-gated
The `telemetry` feature on `horus_core` is enabled by default, but it can be disabled for two reasons:
- **Binary size** -- Disabling `telemetry` removes `serde_json`, the `TelemetryManager`, and the HTTP background thread infrastructure. For deeply embedded targets with tight flash constraints, this matters.
- **Attack surface** -- In locked-down production deployments, some teams prefer to compile out all network-facing code paths. Disabling the feature guarantees at the type level that no telemetry endpoint is reachable.
When disabled, all telemetry builder methods are either absent (compile error if used) or no-op, depending on the feature gate configuration. There is no runtime overhead.
Trade-offs
| Decision | Benefit | Cost |
|---|---|---|
| Non-blocking `try_send()` for HTTP | Scheduler never stalls on network I/O | Snapshots may be silently dropped if receiver is slow |
| Bounded channel (capacity 4) | Predictable memory usage, absorbs transient slowness | Only 4 seconds of buffering before drops begin |
| Silent drop (no error log on full channel) | No log spam when receiver is down for extended periods | Harder to notice when telemetry delivery is degraded (check receiver-side metrics) |
| JSON serialization | Human-readable, no build dependencies, easy debugging | ~5x larger payload than protobuf; ~3x slower serialization (still <10us per snapshot) |
| Single background thread (HTTP only) | Minimal resource usage, simple shutdown semantics | HTTP throughput limited to one in-flight POST at a time; pipelining not supported |
| File endpoint overwrites (not appends) | Simple, bounded disk usage, always shows latest state | No history; use HTTP + a time-series database for historical data |
| UDP fire-and-forget | Lowest overhead (~5us), good for LAN monitoring | No delivery guarantees; packet loss on congested networks |
| 1-second default interval | Low overhead, sufficient for most dashboards | Not suitable for sub-second alerting (use horus monitor for real-time views) |
| Feature-gated (compile-time opt-out) | Zero overhead when disabled, smaller binary | Must recompile to enable/disable; no runtime toggle |
| Per-metric `HashMap` accumulation | Deduplicates metrics by name, latest value always wins | Small allocation per unique metric name; not lock-free (acceptable since only the scheduler thread writes) |
See Also
- Monitor — Built-in web and TUI dashboards (no external setup needed)
- Logging — Structured text logging with `hlog!` macros
- Scheduler API — Scheduler builder methods including `.telemetry()`
- Debugging Workflows — Using telemetry data to diagnose performance issues
- Operations — Production monitoring and deployment patterns