Perception Messages
Perception messages carry computer vision results — detected objects, tracked targets, body pose keypoints, segmentation masks, and plane surfaces. These are the outputs of your ML models and the inputs to your planning/control systems.
from horus import (
    BoundingBox2D, BoundingBox3D, Detection, Detection3D,
    TrackedObject, TrackingHeader,
    Landmark, Landmark3D, LandmarkArray,
    PlaneDetection, PlaneArray,
    SegmentationMask,
)
BoundingBox2D
Axis-aligned bounding box in 2D image coordinates. The fundamental output of object detectors such as YOLO, SSD, and Faster R-CNN.
Constructor
bbox = BoundingBox2D(x=10.0, y=20.0, width=100.0, height=200.0)
.from_center(cx, cy, width, height) — From Center Point
bbox = BoundingBox2D.from_center(cx=60.0, cy=120.0, width=100.0, height=200.0)
Many ML models output bounding boxes as (center_x, center_y, width, height). This factory creates a BoundingBox2D from that format.
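The conversion itself is simple arithmetic. A plain-Python sketch (no horus dependency) of what the factory computes:

```python
def center_to_corner(cx, cy, w, h):
    """Convert (center_x, center_y, width, height) to (x_min, y_min, width, height)."""
    return (cx - w / 2.0, cy - h / 2.0, w, h)

# Matches the two constructor examples above: same box, two formats
print(center_to_corner(60.0, 120.0, 100.0, 200.0))  # (10.0, 20.0, 100.0, 200.0)
```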
.area() — Box Area in Pixels
print(bbox.area()) # 20000.0
Width × height. Use for filtering — ignore very small detections (noise) or very large ones (false positives spanning the whole image).
.iou(other) — Intersection Over Union
bbox_a = BoundingBox2D(x=0.0, y=0.0, width=100.0, height=100.0)
bbox_b = BoundingBox2D(x=50.0, y=50.0, width=100.0, height=100.0)
print(bbox_a.iou(bbox_b)) # ~0.143 (partial overlap)
Returns 0.0 (no overlap) to 1.0 (identical boxes). This is the core metric for non-maximum suppression (NMS) — when your detector finds multiple boxes for the same object, keep the highest-confidence one and suppress any box with IoU > threshold (typically 0.3-0.5).
# Simple NMS pattern
detections.sort(key=lambda d: d.confidence, reverse=True)
kept = []
for det in detections:
    if all(det.bbox.iou(k.bbox) < 0.5 for k in kept):
        kept.append(det)
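The IoU figure in the example above can be verified with plain arithmetic. This standalone sketch (no horus dependency) computes IoU for two (x, y, width, height) boxes:

```python
def iou_xywh(a, b):
    """IoU of two boxes given as (x, y, width, height) tuples."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    # Intersection rectangle (zero area if the boxes do not overlap)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

# Same boxes as the example: intersection 2500, union 17500
print(round(iou_xywh((0, 0, 100, 100), (50, 50, 100, 100)), 3))  # 0.143
```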
.as_tuple() / .as_xyxy() — Format Conversion
x, y, w, h = bbox.as_tuple() # (x, y, width, height)
x1, y1, x2, y2 = bbox.as_xyxy() # (x_min, y_min, x_max, y_max)
Different drawing libraries expect different formats. OpenCV uses (x, y, w, h), some plotting tools use (x1, y1, x2, y2).
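The two formats are interchangeable with a couple of additions. A plain-Python sketch of the round trip:

```python
def xywh_to_xyxy(x, y, w, h):
    """(x, y, width, height) -> (x_min, y_min, x_max, y_max)."""
    return (x, y, x + w, y + h)

def xyxy_to_xywh(x1, y1, x2, y2):
    """(x_min, y_min, x_max, y_max) -> (x, y, width, height)."""
    return (x1, y1, x2 - x1, y2 - y1)

print(xywh_to_xyxy(10.0, 20.0, 100.0, 200.0))  # (10.0, 20.0, 110.0, 220.0)
```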
BoundingBox3D
A 3D bounding box with center, dimensions, and orientation. The constructor takes a single yaw angle for ground-plane rotation (the most common case). For full 3D orientation, use with_rotation.
Constructor
bbox = BoundingBox3D(cx=1.0, cy=2.0, cz=0.5, length=2.0, width=1.0, height=1.5, yaw=0.3)
.with_rotation(cx, cy, cz, length, width, height, roll, pitch, yaw) — Full 3D Rotation
bbox = BoundingBox3D.with_rotation(
    cx=1.0, cy=2.0, cz=0.5,
    length=2.0, width=1.0, height=1.5,
    roll=0.0, pitch=0.1, yaw=0.3
)
Use this when the detected object is tilted or on a slope. The constructor only accepts yaw (rotation around the vertical axis), which is sufficient for objects on flat ground. with_rotation lets you specify all three Euler angles for objects at arbitrary orientations — a crate on a ramp, a drone in flight, or a wall-mounted sensor.
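For reference, this sketch shows how three Euler angles combine into a rotation matrix under the common intrinsic ZYX (yaw-pitch-roll) convention. Note the caveat: this page does not state which Euler convention horus uses, so the ordering here is an assumption, not the library's documented behavior:

```python
import math

def rotation_matrix_zyx(roll, pitch, yaw):
    """3x3 rotation matrix R = Rz(yaw) @ Ry(pitch) @ Rx(roll) (intrinsic ZYX convention)."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp,     cp * sr,                cp * cr],
    ]

# With roll = pitch = 0 this reduces to a planar rotation about z,
# which is what the yaw-only constructor represents.
R = rotation_matrix_zyx(0.0, 0.0, 0.3)
```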
Detection
A single 2D object detection result — class + confidence + bounding box.
Constructor
det = Detection(class_name="person", confidence=0.95,
                x=10.0, y=20.0, width=100.0, height=200.0)
.is_confident(threshold) — Filter Low Confidence
if det.is_confident(0.5):
    print(f"Detected {det.class_name} at {det.confidence:.0%}")
Returns True if confidence exceeds the threshold. Typical thresholds:
- 0.3-0.5: Real-time applications (more detections, some false positives)
- 0.7-0.9: High-precision applications (fewer detections, almost no false positives)
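As a concrete illustration of threshold filtering, a plain-Python sketch over hypothetical (class_name, confidence) pairs (the real code would call det.is_confident on Detection objects):

```python
# Hypothetical raw detector output as (class_name, confidence) pairs
raw = [("person", 0.95), ("person", 0.42), ("dog", 0.71)]

threshold = 0.5
confident = [d for d in raw if d[1] > threshold]
print(confident)  # [('person', 0.95), ('dog', 0.71)]
```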
.with_class_id(class_id) — Set Numeric Class ID
det = det.with_class_id(1) # COCO class ID for "person"
Returns a new Detection with the class ID set. Many ML frameworks output numeric class IDs alongside string names.
Detection3D
3D object detection with position, size, and optional velocity.
.with_velocity(vx, vy, vz) — Add Motion Estimate
det3d = Detection3D(class_name="car", confidence=0.9,
                    cx=5.0, cy=2.0, cz=0.0,
                    length=4.5, width=1.8, height=1.5)
det3d = det3d.with_velocity(vx=10.0, vy=0.0, vz=0.0) # Moving at 10 m/s in x
Returns a new Detection3D with velocity components. Use when your 3D detector also estimates object motion (e.g., from multi-frame tracking or radar fusion).
TrackedObject
Multi-object tracking state with a lifecycle: tentative → confirmed → deleted.
A new detection starts as tentative. After being seen in multiple consecutive frames, it's confirmed. If it's not seen for too long, it's deleted. This lifecycle prevents spurious single-frame detections from being treated as real objects.
Constructor
tracked = TrackedObject(track_id=42, class_id=1, confidence=0.9,
                        x=1.0, y=2.0, width=3.0, height=4.0)
.is_tentative() / .is_confirmed() / .is_deleted() — State Queries
if tracked.is_tentative():
    print("New detection — not yet reliable")
elif tracked.is_confirmed():
    print("Stable track — use for planning")
elif tracked.is_deleted():
    print("Lost track — remove from state")
.confirm() — Promote to Confirmed
tracked.confirm() # Tentative → Confirmed
Call after the object has been matched across enough frames (typically 3-5). Only confirmed tracks should be used for navigation and planning decisions.
.update(bbox, confidence) — New Frame Data
tracked.update(new_bbox, new_confidence)
Updates the track with the latest detection. Resets the "time since update" counter. Call this every frame where the object is re-detected.
Common mistake: Forgetting to call update() for matched tracks. Without it, time_since_update grows and the track eventually gets deleted even though you keep detecting the object.
.mark_missed() — Not Seen This Frame
tracked.mark_missed()
Call when the object was NOT detected in the current frame. Increments the miss counter — after enough misses, the track should be deleted.
.delete() — Remove Track
tracked.delete()
Marks the track as deleted. is_deleted() returns True.
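Putting the lifecycle methods together: this plain-Python sketch mirrors the tentative → confirmed → deleted transitions. The counters and thresholds are illustrative only; TrackedObject manages its own internal state:

```python
CONFIRM_HITS = 3   # illustrative: frames matched before tentative -> confirmed
MAX_MISSES = 5     # illustrative: frames unmatched before the track is deleted

class MiniTrack:
    """Toy stand-in for TrackedObject's lifecycle bookkeeping."""

    def __init__(self):
        self.state = "tentative"
        self.hits = 0
        self.time_since_update = 0

    def update(self):
        # Matched to a detection this frame: reset the miss counter
        self.hits += 1
        self.time_since_update = 0
        if self.state == "tentative" and self.hits >= CONFIRM_HITS:
            self.state = "confirmed"

    def mark_missed(self):
        # Not matched this frame: count toward deletion
        self.time_since_update += 1
        if self.time_since_update > MAX_MISSES:
            self.state = "deleted"

t = MiniTrack()
for _ in range(3):
    t.update()
print(t.state)  # confirmed
for _ in range(6):
    t.mark_missed()
print(t.state)  # deleted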
.speed() / .heading() — Motion Estimation
print(f"Speed: {tracked.speed():.1f} px/frame")
print(f"Heading: {tracked.heading():.1f} rad")
Computed from the tracked trajectory. Speed is in pixels per frame (or meters per frame if tracking in world coordinates). Heading is the direction of motion in radians.
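Underneath, this is the displacement between consecutive trajectory points. A plain-Python sketch of the computation, assuming the simplest case of two successive positions:

```python
import math

def speed_and_heading(p_prev, p_curr):
    """Speed (distance per frame) and heading (radians) from two successive positions."""
    dx = p_curr[0] - p_prev[0]
    dy = p_curr[1] - p_prev[1]
    return math.hypot(dx, dy), math.atan2(dy, dx)

speed, heading = speed_and_heading((0.0, 0.0), (3.0, 4.0))
print(speed)              # 5.0
print(round(heading, 3))  # 0.927
```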
Landmark, Landmark3D, LandmarkArray
Body pose estimation keypoints — skeleton joints from COCO, MediaPipe, or custom pose models.
Landmark — 2D Keypoint
lm = Landmark(x=100.0, y=200.0, visibility=0.95, index=5)
visible_lm = Landmark.visible(x=100.0, y=200.0, index=5) # visibility=1.0
.is_visible(threshold) — Filter Occluded Keypoints
if lm.is_visible(0.5):
    # Keypoint is visible — use for pose estimation
    pass
Visibility is a confidence score (0.0 = occluded/not detected, 1.0 = clearly visible). Filter low-visibility keypoints to avoid using unreliable data.
.distance_to(other) — Keypoint Distance
dist = left_wrist.distance_to(right_wrist)
Landmark3D — 3D Keypoint
lm3d = Landmark3D(x=1.0, y=2.0, z=0.5, visibility=0.9, index=10)
lm2d = lm3d.to_2d() # Project to 2D (drops z)
LandmarkArray — Skeleton Presets
# Standard presets for popular pose models
skeleton = LandmarkArray.coco_pose() # 17 COCO keypoints
skeleton = LandmarkArray.mediapipe_pose() # 33 MediaPipe pose keypoints
hand = LandmarkArray.mediapipe_hand() # 21 hand keypoints
face = LandmarkArray.mediapipe_face() # 478 face mesh keypoints
These presets set the correct number of landmarks and dimension for each model. Fill in the actual keypoint coordinates from your model's output.
PlaneDetection
Detected planar surfaces — floors, walls, tables. Used for navigation (floor detection), manipulation (table surface), and augmented reality.
.distance_to_point(px, py, pz) — Point-to-Plane Distance
plane = PlaneDetection(...)
dist = plane.distance_to_point(1.0, 2.0, 0.5)
# Signed distance — positive = above plane, negative = below
The signed perpendicular distance from a point to the plane. Use this to check if objects are on, above, or below a surface.
.contains_point(px, py, pz, tolerance) — Is a Point on This Plane?
if plane.contains_point(1.0, 2.0, 0.01, tolerance=0.05):
    print("Point is on the table surface (within 5 cm)")
Returns True if the point is within tolerance meters of the plane. Use for classifying which objects are on which surface.
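Both methods reduce to the standard plane equation n · p + d = 0. This standalone sketch assumes a plane stored as a unit normal plus offset; PlaneDetection's internal representation is not documented here, so treat that as an assumption:

```python
def signed_distance(normal, d, point):
    """Signed distance from a point to the plane n . p + d = 0 (n must be unit length)."""
    nx, ny, nz = normal
    px, py, pz = point
    return nx * px + ny * py + nz * pz + d

def on_plane(normal, d, point, tolerance):
    """True if the point lies within `tolerance` meters of the plane."""
    return abs(signed_distance(normal, d, point)) <= tolerance

# Horizontal table surface at z = 0.75 m: normal (0, 0, 1), d = -0.75
table_normal, table_d = (0.0, 0.0, 1.0), -0.75
print(round(signed_distance(table_normal, table_d, (1.0, 2.0, 0.80)), 2))  # 0.05
print(on_plane(table_normal, table_d, (1.0, 2.0, 0.76), 0.05))             # True
```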
SegmentationMask
Pixel-level image segmentation — semantic (class per pixel), instance (unique ID per object), or panoptic (both).
Factory Methods
# Semantic: what class is each pixel?
mask = SegmentationMask.semantic(width=640, height=480, num_classes=21)
# Instance: which object is each pixel?
mask = SegmentationMask.instance(width=640, height=480)
# Panoptic: both class AND instance for each pixel
mask = SegmentationMask.panoptic(width=640, height=480, num_classes=21)
.is_semantic() / .is_instance() / .is_panoptic() — Check Type
if mask.is_semantic():
    print("Semantic segmentation mask")
.data_size() / .data_size_u16()
print(f"Data: {mask.data_size()} bytes (u8), {mask.data_size_u16()} elements (u16)")
PointField
Describes a single field in a point cloud — name, byte offset, datatype, and element count. Used when defining custom point cloud formats (e.g., XYZ + RGB + intensity).
Constructor
field = PointField(name="x", offset=0, datatype=7, count=1) # FLOAT32
.field_size() — Byte Size of One Element
size = field.field_size() # e.g., 4 for FLOAT32, 8 for FLOAT64
Returns the byte size of a single element based on the datatype. Useful when computing byte offsets for the next field in a point cloud layout, or when parsing raw point cloud buffers.
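The offset bookkeeping looks like the following sketch. The datatype-to-size table assumes the sensor_msgs/PointField code convention (1=INT8 … 7=FLOAT32, 8=FLOAT64), which the constructor example above (datatype=7 for FLOAT32) appears to follow; verify against horus before relying on it:

```python
# Byte sizes per datatype code, assuming the sensor_msgs/PointField convention
FIELD_SIZES = {1: 1, 2: 1, 3: 2, 4: 2, 5: 4, 6: 4, 7: 4, 8: 8}

def layout(fields):
    """Compute (name, byte_offset) pairs and the total per-point stride.

    `fields` is a list of (name, datatype, count) tuples packed back to back.
    """
    offset = 0
    out = []
    for name, datatype, count in fields:
        out.append((name, offset))
        offset += FIELD_SIZES[datatype] * count
    return out, offset

# Hypothetical XYZ + intensity layout, all FLOAT32
offsets, stride = layout([("x", 7, 1), ("y", 7, 1), ("z", 7, 1), ("intensity", 7, 1)])
print(offsets)  # [('x', 0), ('y', 4), ('z', 8), ('intensity', 12)]
print(stride)   # 16
```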
See Also
- Vision Messages — Image, PointCloud, DepthImage
- Geometry Messages — Point3 for 3D positions
- Rust Perception Messages — Rust API reference