Perception Messages

Perception messages carry computer vision results — detected objects, tracked targets, body pose keypoints, segmentation masks, and plane surfaces. These are the outputs of your ML models and the inputs to your planning/control systems.

from horus import (
    BoundingBox2D, BoundingBox3D, Detection, Detection3D,
    TrackedObject, TrackingHeader,
    Landmark, Landmark3D, LandmarkArray,
    PlaneDetection, PlaneArray,
    SegmentationMask, PointField,
)

BoundingBox2D

Axis-aligned bounding box in 2D image coordinates. The fundamental output of object detectors such as YOLO, SSD, and Faster R-CNN.

Constructor

bbox = BoundingBox2D(x=10.0, y=20.0, width=100.0, height=200.0)

.from_center(cx, cy, width, height) — From Center Point

bbox = BoundingBox2D.from_center(cx=60.0, cy=120.0, width=100.0, height=200.0)

Many ML models output bounding boxes as (center_x, center_y, width, height). This factory creates a BoundingBox2D from that format.
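The conversion is simple arithmetic; a plain-Python sketch of what the factory computes (center_to_topleft is a hypothetical helper, not part of the horus API):

```python
def center_to_topleft(cx, cy, w, h):
    """(center_x, center_y, w, h) -> (x, y, w, h): shift the center back by half the size."""
    return cx - w / 2.0, cy - h / 2.0, w, h

print(center_to_topleft(60.0, 120.0, 100.0, 200.0))  # (10.0, 20.0, 100.0, 200.0)
```

Note this yields exactly the constructor arguments from the example above.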

.area() — Box Area in Pixels

print(bbox.area())  # 20000.0

Width × height. Use for filtering — ignore very small detections (noise) or very large ones (false positives spanning the whole image).
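A minimal sketch of area-based filtering in plain Python; the thresholds (100 px² minimum, 80% of the frame maximum) and the plausible helper are illustrative choices, not library defaults:

```python
FRAME_AREA = 640 * 480  # assumed frame resolution

def plausible(width, height, min_area=100.0, max_frac=0.8):
    """Reject boxes that are too small (noise) or cover most of the frame (false positive)."""
    area = width * height
    return min_area <= area <= max_frac * FRAME_AREA

print(plausible(100, 200))   # True: a typical person-sized box
print(plausible(3, 4))       # False: 12 px² speck
print(plausible(640, 480))   # False: spans the whole image
```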

.iou(other) — Intersection Over Union

bbox_a = BoundingBox2D(x=0.0, y=0.0, width=100.0, height=100.0)
bbox_b = BoundingBox2D(x=50.0, y=50.0, width=100.0, height=100.0)
print(bbox_a.iou(bbox_b))  # ~0.143 (partial overlap)

Returns 0.0 (no overlap) to 1.0 (identical boxes). This is the core metric for non-maximum suppression (NMS) — when your detector finds multiple boxes for the same object, keep the highest-confidence one and suppress any box with IoU > threshold (typically 0.3-0.5).

# Simple NMS pattern
detections.sort(key=lambda d: d.confidence, reverse=True)
kept = []
for det in detections:
    if all(det.bbox.iou(k.bbox) < 0.5 for k in kept):
        kept.append(det)

.as_tuple() / .as_xyxy() — Format Conversion

x, y, w, h = bbox.as_tuple()       # (x, y, width, height)
x1, y1, x2, y2 = bbox.as_xyxy()    # (x_min, y_min, x_max, y_max)

Different drawing libraries expect different formats: OpenCV's cv2.boundingRect works in (x, y, w, h), while cv2.rectangle and many plotting tools take corner coordinates (x1, y1, x2, y2).


BoundingBox3D

A 3D bounding box with center, dimensions, and orientation. The constructor takes a single yaw angle for ground-plane rotation (the most common case). For full 3D orientation, use with_rotation.

Constructor

bbox = BoundingBox3D(cx=1.0, cy=2.0, cz=0.5, length=2.0, width=1.0, height=1.5, yaw=0.3)

.with_rotation(cx, cy, cz, length, width, height, roll, pitch, yaw) — Full 3D Rotation

bbox = BoundingBox3D.with_rotation(
    cx=1.0, cy=2.0, cz=0.5,
    length=2.0, width=1.0, height=1.5,
    roll=0.0, pitch=0.1, yaw=0.3
)

Use this when the detected object is tilted or on a slope. The constructor only accepts yaw (rotation around the vertical axis), which is sufficient for objects on flat ground. with_rotation lets you specify all three Euler angles for objects at arbitrary orientations — a crate on a ramp, a drone in flight, or a wall-mounted sensor.


Detection

A single 2D object detection result — class + confidence + bounding box.

Constructor

det = Detection(class_name="person", confidence=0.95,
                x=10.0, y=20.0, width=100.0, height=200.0)

.is_confident(threshold) — Filter Low Confidence

if det.is_confident(0.5):
    print(f"Detected {det.class_name} at {det.confidence:.0%}")

Returns True if confidence exceeds the threshold. Typical thresholds:

  • 0.3-0.5: Real-time applications (more detections, some false positives)
  • 0.7-0.9: High-precision applications (fewer detections, almost no false positives)

.with_class_id(class_id) — Set Numeric Class ID

det = det.with_class_id(1)  # COCO class ID for "person"

Returns a new Detection with the class ID set. Many ML frameworks output numeric class IDs alongside string names.


Detection3D

3D object detection with position, size, and optional velocity.

.with_velocity(vx, vy, vz) — Add Motion Estimate

det3d = Detection3D(class_name="car", confidence=0.9,
                     cx=5.0, cy=2.0, cz=0.0,
                     length=4.5, width=1.8, height=1.5)
det3d = det3d.with_velocity(vx=10.0, vy=0.0, vz=0.0)  # Moving at 10 m/s in x

Returns a new Detection3D with velocity components. Use when your 3D detector also estimates object motion (e.g., from multi-frame tracking or radar fusion).


TrackedObject

Multi-object tracking state with a lifecycle: tentative → confirmed → deleted.

A new detection starts as tentative. After being seen in multiple consecutive frames, it's confirmed. If it's not seen for too long, it's deleted. This lifecycle prevents spurious single-frame detections from being treated as real objects.

Constructor

tracked = TrackedObject(track_id=42, class_id=1, confidence=0.9,
                         x=1.0, y=2.0, width=3.0, height=4.0)

.is_tentative() / .is_confirmed() / .is_deleted() — State Queries

if tracked.is_tentative():
    print("New detection — not yet reliable")
elif tracked.is_confirmed():
    print("Stable track — use for planning")
elif tracked.is_deleted():
    print("Lost track — remove from state")

.confirm() — Promote to Confirmed

tracked.confirm()  # Tentative → Confirmed

Call after the object has been matched across enough frames (typically 3-5). Only confirmed tracks should be used for navigation and planning decisions.

.update(bbox, confidence) — New Frame Data

tracked.update(new_bbox, new_confidence)

Updates the track with the latest detection. Resets the "time since update" counter. Call this every frame where the object is re-detected.

Common mistake: Forgetting to call update() for matched tracks. Without it, time_since_update grows and the track eventually gets deleted even though you keep detecting the object.

.mark_missed() — Not Seen This Frame

tracked.mark_missed()

Call when the object was NOT detected in the current frame. Increments the miss counter — after enough misses, the track should be deleted.

.delete() — Remove Track

tracked.delete()

Marks the track as deleted. is_deleted() returns True.
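Putting the lifecycle methods together: a plain-Python mirror of the state machine, useful for reasoning about thresholds. TrackState, n_init, and max_age are illustrative names and values, not part of the horus API:

```python
TENTATIVE, CONFIRMED, DELETED = "tentative", "confirmed", "deleted"

class TrackState:
    """Minimal mirror of the TrackedObject lifecycle (not the horus class)."""
    def __init__(self, n_init=3, max_age=5):
        self.state = TENTATIVE
        self.hits = 1                 # the creating detection counts as the first hit
        self.time_since_update = 0
        self.n_init = n_init          # consecutive hits needed to confirm
        self.max_age = max_age        # tolerated misses before deletion

    def update(self):                 # matched to a detection this frame
        self.hits += 1
        self.time_since_update = 0
        if self.state == TENTATIVE and self.hits >= self.n_init:
            self.state = CONFIRMED    # seen often enough: promote

    def mark_missed(self):            # no matching detection this frame
        self.time_since_update += 1
        if self.state == TENTATIVE or self.time_since_update > self.max_age:
            self.state = DELETED      # tentative tracks die fast; confirmed ones age out

track = TrackState()
track.update(); track.update()
print(track.state)                    # confirmed (3 hits with n_init=3)
for _ in range(6):
    track.mark_missed()
print(track.state)                    # deleted (6 misses with max_age=5)
```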

.speed() / .heading() — Motion Estimation

print(f"Speed: {tracked.speed():.1f} px/frame")
print(f"Heading: {tracked.heading():.1f} rad")

Computed from the tracked trajectory. Speed is in pixels per frame (or meters per frame if tracking in world coordinates). Heading is the direction of motion in radians.
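To convert the per-frame quantities into a per-second velocity vector, scale by your frame rate and decompose by heading; a small sketch where velocity_components and the fps value are assumptions about your pipeline, not horus API:

```python
import math

def velocity_components(speed_per_frame, heading_rad, fps=30.0):
    """Per-frame speed + heading -> (vx, vy) in units per second."""
    speed = speed_per_frame * fps           # px/frame * frames/s = px/s
    return speed * math.cos(heading_rad), speed * math.sin(heading_rad)

vx, vy = velocity_components(2.0, 0.0, fps=30.0)
print(vx, vy)  # 60.0 0.0  (moving along +x at 60 px/s)
```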


Landmark, Landmark3D, LandmarkArray

Body pose estimation keypoints — skeleton joints from COCO, MediaPipe, or custom pose models.

Landmark — 2D Keypoint

lm = Landmark(x=100.0, y=200.0, visibility=0.95, index=5)
visible_lm = Landmark.visible(x=100.0, y=200.0, index=5)  # visibility=1.0

.is_visible(threshold) — Filter Occluded Keypoints

if lm.is_visible(0.5):
    # Keypoint is visible — use for pose estimation
    pass

Visibility is a confidence score (0.0 = occluded/not detected, 1.0 = clearly visible). Filter low-visibility keypoints to avoid using unreliable data.

.distance_to(other) — Keypoint Distance

dist = left_wrist.distance_to(right_wrist)

Landmark3D — 3D Keypoint

lm3d = Landmark3D(x=1.0, y=2.0, z=0.5, visibility=0.9, index=10)
lm2d = lm3d.to_2d()  # Project to 2D (drops z)

LandmarkArray — Skeleton Presets

# Standard presets for popular pose models
skeleton = LandmarkArray.coco_pose()        # 17 COCO keypoints
skeleton = LandmarkArray.mediapipe_pose()   # 33 MediaPipe pose keypoints
hand = LandmarkArray.mediapipe_hand()       # 21 hand keypoints
face = LandmarkArray.mediapipe_face()       # 478 face mesh keypoints

These presets set the correct number of landmarks and dimension for each model. Fill in the actual keypoint coordinates from your model's output.


PlaneDetection

Detected planar surfaces — floors, walls, tables. Used for navigation (floor detection), manipulation (table surface), and augmented reality.

.distance_to_point(px, py, pz) — Point-to-Plane Distance

plane = PlaneDetection(...)
dist = plane.distance_to_point(1.0, 2.0, 0.5)
# Signed distance — positive = above plane, negative = below

The signed perpendicular distance from a point to the plane. Use this to check if objects are on, above, or below a surface.
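The math behind it is the standard plane equation evaluated at the point; a plain-Python sketch assuming a unit normal and the n·p + d = 0 convention (signed_distance is a hypothetical helper, not the horus implementation):

```python
def signed_distance(normal, d, point):
    """n·p + d for a unit normal n: positive on the side the normal points toward."""
    nx, ny, nz = normal
    px, py, pz = point
    return nx * px + ny * py + nz * pz + d

# Horizontal floor plane z = 0: normal (0, 0, 1), d = 0
print(signed_distance((0.0, 0.0, 1.0), 0.0, (1.0, 2.0, 0.5)))  # 0.5 (above the floor)
```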

.contains_point(px, py, pz, tolerance) — Is a Point on This Plane?

if plane.contains_point(1.0, 2.0, 0.01, tolerance=0.05):
    print("Point is on the table surface (within 5cm)")

Returns True if the point is within tolerance meters of the plane. Use for classifying which objects are on which surface.


SegmentationMask

Pixel-level image segmentation — semantic (class per pixel), instance (unique ID per object), or panoptic (both).

Factory Methods

# Semantic: what class is each pixel?
mask = SegmentationMask.semantic(width=640, height=480, num_classes=21)

# Instance: which object is each pixel?
mask = SegmentationMask.instance(width=640, height=480)

# Panoptic: both class AND instance for each pixel
mask = SegmentationMask.panoptic(width=640, height=480, num_classes=21)

.is_semantic() / .is_instance() / .is_panoptic() — Check Type

if mask.is_semantic():
    print("Semantic segmentation mask")

.data_size() / .data_size_u16()

print(f"Data: {mask.data_size()} bytes (u8), {mask.data_size_u16()} elements (u16)")

PointField

Describes a single field in a point cloud — name, byte offset, datatype, and element count. Used when defining custom point cloud formats (e.g., XYZ + RGB + intensity).

Constructor

field = PointField(name="x", offset=0, datatype=7, count=1)  # FLOAT32

.field_size() — Byte Size of One Element

size = field.field_size()  # e.g., 4 for FLOAT32, 8 for FLOAT64

Returns the byte size of a single element based on the datatype. Useful when computing byte offsets for the next field in a point cloud layout, or when parsing raw point cloud buffers.
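For example, laying out an XYZ + intensity point format by accumulating field sizes (plain Python; the datatype codes follow the ROS sensor_msgs/PointField convention, where 7 = FLOAT32, so verify against your library's constants):

```python
# Byte size per datatype code (ROS PointField convention: 1=INT8 ... 7=FLOAT32, 8=FLOAT64)
SIZES = {1: 1, 2: 1, 3: 2, 4: 2, 5: 4, 6: 4, 7: 4, 8: 8}

def layout(fields):
    """fields: list of (name, datatype, count) -> ([(name, offset), ...], point stride)."""
    out, offset = [], 0
    for name, datatype, count in fields:
        out.append((name, offset))
        offset += SIZES[datatype] * count   # next field starts after this one's bytes
    return out, offset

fields, stride = layout([("x", 7, 1), ("y", 7, 1), ("z", 7, 1), ("intensity", 7, 1)])
print(fields)   # [('x', 0), ('y', 4), ('z', 8), ('intensity', 12)]
print(stride)   # 16 bytes per point
```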


See Also