Vision Messages

Vision messages carry image data, point clouds, depth maps, and camera calibration. The pool-backed types (Image, PointCloud, DepthImage) use zero-copy shared memory for maximum throughput — no serialization overhead when passing images between nodes.

from horus import (
    Image, PointCloud, DepthImage,
    CameraInfo, CompressedImage,
    RegionOfInterest, StereoInfo,
)

Image (Pool-Backed)

Zero-copy image with numpy and PyTorch interop. Memory lives in shared memory — passing an Image between nodes costs ~3μs regardless of resolution.

Constructor

img = Image(height=480, width=640, encoding=0)  # RGB8

Numpy Integration

import numpy as np

# Create from numpy array
pixels = np.zeros((480, 640, 3), dtype=np.uint8)
img = Image.from_numpy(pixels)

# Convert to numpy (zero-copy view when possible)
arr = img.to_numpy()
arr[100, 200] = [255, 0, 0]  # Set a red pixel

PyTorch Integration

tensor = img.to_torch()  # CHW format for PyTorch models

Common mistake: Holding references to Image across ticks. Pool-backed images are recycled — the same memory slot may be reused on the next tick. Process the image within the current tick, or copy the data with img.to_numpy().copy().
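The recycling hazard can be demonstrated with plain numpy, no horus required. The buffer below stands in for a pool slot; the view/copy distinction mirrors `img.to_numpy()` versus `img.to_numpy().copy()`:

```python
import numpy as np

# Simulated pool slot: a buffer the pool overwrites on the next tick.
pool_slot = np.zeros((480, 640, 3), dtype=np.uint8)

view = pool_slot[:]          # zero-copy view, like img.to_numpy()
snapshot = pool_slot.copy()  # independent copy, like img.to_numpy().copy()

pool_slot[:] = 255           # next tick: the slot is recycled with new data

print(view[0, 0, 0])      # 255 -- the view shows the recycled data
print(snapshot[0, 0, 0])  # 0   -- the copy is unaffected
```

If an image must outlive the tick that produced it, copy it eagerly; the view is only valid while the pool slot still holds that frame.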


PointCloud (Pool-Backed)

Zero-copy 3D point cloud with field descriptors.

cloud = PointCloud(num_points=1000)

# From numpy XYZ array
import numpy as np
xyz = np.random.randn(1000, 3).astype(np.float32)
cloud = PointCloud.from_xyz(xyz)

print(cloud.point_count())  # 1000
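Because `from_xyz` ingests a plain (N, 3) float32 array, cloud operations like cropping or computing a centroid can be prototyped on the raw array first. This sketch is pure numpy; nothing here depends on horus:

```python
import numpy as np

rng = np.random.default_rng(0)
xyz = rng.standard_normal((1000, 3)).astype(np.float32)  # shape from_xyz expects

centroid = xyz.mean(axis=0)            # cloud center of mass
dists = np.linalg.norm(xyz, axis=1)    # per-point distance from the origin
near = xyz[dists < 1.0]                # crop: keep points within 1 m

print(xyz.shape, centroid.shape, near.shape[1])
```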

DepthImage (Pool-Backed)

Depth map from a depth camera (RealSense, Kinect, stereo).

depth = DepthImage(height=480, width=640)

# Per-pixel access
d = depth.get_depth(320, 240)  # Depth at image center (meters)
depth.set_depth(320, 240, 1.5)  # Set depth to 1.5m

# Statistics
stats = depth.depth_statistics()  # min, max, mean, valid_count
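A plausible sketch of what `depth_statistics()` computes, assuming (this is an assumption, not documented API behavior) that invalid pixels are stored as 0 or NaN and excluded from the statistics:

```python
import numpy as np

def depth_statistics(depth_m: np.ndarray) -> dict:
    """min/max/mean over valid pixels; treats 0 and NaN as invalid (assumed)."""
    valid = depth_m[np.isfinite(depth_m) & (depth_m > 0)]
    if valid.size == 0:
        return {"min": None, "max": None, "mean": None, "valid_count": 0}
    return {
        "min": float(valid.min()),
        "max": float(valid.max()),
        "mean": float(valid.mean()),
        "valid_count": int(valid.size),
    }

depth = np.array([[0.0, 1.0], [2.0, np.nan]])  # one invalid zero, one NaN
stats = depth_statistics(depth)
print(stats)  # {'min': 1.0, 'max': 2.0, 'mean': 1.5, 'valid_count': 2}
```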

CameraInfo

Camera intrinsic calibration parameters — the link between 2D pixel coordinates and 3D world coordinates.

Constructor

cam = CameraInfo(width=640, height=480, fx=525.0, fy=525.0, cx=320.0, cy=240.0)

.focal_lengths() — Camera Focal Length

fx, fy = cam.focal_lengths()
print(f"Focal length: ({fx:.0f}, {fy:.0f}) pixels")

Returns (fx, fy) in pixels. The focal length determines how "zoomed in" the camera is — higher values = narrower field of view. In the pinhole camera model, a 3D point (X, Y, Z) projects to pixel (fx·X/Z + cx, fy·Y/Z + cy).
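The pinhole projection above is two lines of arithmetic. A minimal numpy-free sketch using the intrinsics from the constructor example (camera-frame coordinates, Z forward):

```python
fx, fy, cx, cy = 525.0, 525.0, 320.0, 240.0

def project(X, Y, Z):
    """Pinhole projection: 3D camera-frame point -> pixel (u, v)."""
    return fx * X / Z + cx, fy * Y / Z + cy

# A point 2 m ahead, 10 cm right, 5 cm above the optical axis:
u, v = project(0.1, -0.05, 2.0)
print(u, v)  # approximately 346.25 226.875
```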

.principal_point() — Optical Center

cx, cy = cam.principal_point()
print(f"Principal point: ({cx:.0f}, {cy:.0f})")

Returns (cx, cy) — the pixel where the optical axis hits the image sensor. Usually near the image center, but not exactly — calibration corrects for manufacturing imprecision.

.with_distortion_model(model) — Set Lens Distortion Type

cam = cam.with_distortion_model("plumb_bob")

Returns a new CameraInfo with the distortion model name. Common models:

  • "plumb_bob": Standard radial + tangential (5 coefficients)
  • "equidistant": Fisheye cameras
  • "none": Synthetic cameras (no distortion)

Example — 3D Projection from Depth:

from horus import CameraInfo, DepthImage, Point3

cam = CameraInfo(width=640, height=480, fx=525.0, fy=525.0, cx=320.0, cy=240.0)

def pixel_to_3d(cam, depth, u, v):
    """Convert pixel (u,v) + depth to 3D point."""
    fx, fy = cam.focal_lengths()
    cx, cy = cam.principal_point()
    z = depth.get_depth(u, v)
    if z is None or z <= 0:
        return None
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return Point3(x=x, y=y, z=z)

CompressedImage

JPEG/PNG compressed image for network transmission. Smaller than raw Image but requires decompression.

img = CompressedImage(format="jpeg", data=jpeg_bytes, width=640, height=480)
print(img.format_str())  # "jpeg"

Use CompressedImage for network topics or when bandwidth is limited. Use Image for local processing (zero-copy, no decompression needed).


RegionOfInterest

Rectangular region in an image — for cropping detections or specifying areas of interest.

.contains(x, y) — Is a Pixel Inside?

roi = RegionOfInterest(x=100, y=200, width=50, height=60)
if roi.contains(120, 220):
    print("Pixel is inside the ROI")

.area() — Region Size

print(f"ROI area: {roi.area()} pixels")  # 3000

StereoInfo

Stereo camera pair calibration with depth↔disparity conversion.

.depth_from_disparity(disparity) — Convert Disparity to Depth

stereo = StereoInfo(baseline=0.12)  # 12cm between cameras
depth = stereo.depth_from_disparity(disparity=30.0)
# depth = baseline × fx / disparity

The fundamental stereo vision equation: Z = B × f / d. Given disparity (pixel offset between left and right images), compute real-world depth. Larger baseline = better depth accuracy at long range.

.disparity_from_depth(depth) — Convert Depth to Disparity

disp = stereo.disparity_from_depth(depth=2.0)
# Expected disparity for an object at 2 meters

The inverse — useful for setting depth range limits in stereo matching algorithms.
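Since Z = B·f/d is its own inverse (swap Z and d), both conversions are the same division. A round-trip check using the 12 cm baseline above and the fx = 525 intrinsic from the CameraInfo example:

```python
baseline, fx = 0.12, 525.0  # 12 cm baseline; fx borrowed from the CameraInfo example

def depth_from_disparity(d: float) -> float:
    return baseline * fx / d   # Z = B * f / d

def disparity_from_depth(z: float) -> float:
    return baseline * fx / z   # same formula; the relation is its own inverse

z = depth_from_disparity(30.0)
print(round(z, 3))                        # 2.1 m at 30 px of disparity
print(round(disparity_from_depth(z), 3))  # 30.0 -- round trip recovers disparity
```

Note the hyperbolic relationship: disparity resolution is fixed (pixels), so depth precision degrades quadratically with distance — the motivation for the larger-baseline advice above.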


See Also