Python Image
A camera image backed by shared memory for zero-copy inter-process communication. Only a small descriptor travels through the ring buffer; the actual pixel data stays in a shared memory pool. This enables real-time image pipelines at full camera frame rates without serialization overhead.
When to Use
Use Image when your robot has a camera and you need to share frames between nodes — for example, between a camera driver, a vision node, and a display node. A 1080p RGB image transfers in microseconds, not milliseconds.
ROS2 equivalent: `sensor_msgs/Image` — same concept, but HORUS uses shared memory pools instead of serialized byte buffers.
Constructor
```python
from horus import Image

# Image(height, width, encoding)
img = Image(480, 640, "rgb8")
```
Parameters:

- `height: int` — Image height in pixels
- `width: int` — Image width in pixels
- `encoding: str` — Pixel format (default: `"rgb8"`; see the encoding table below)
Factory Methods
```python
# From a NumPy array — copies data into the shared memory pool
import numpy as np

pixels = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
img = Image.from_numpy(pixels)                   # encoding auto-detected from shape
img = Image.from_numpy(pixels, encoding="bgr8")  # explicit encoding

# From a PyTorch tensor — copies into the pool
import torch

tensor = torch.zeros(480, 640, 3, dtype=torch.uint8)
img = Image.from_torch(tensor, encoding="rgb8")

# From raw bytes — copies into the pool
img = Image.from_bytes(raw_data, height=480, width=640, encoding="rgb8")
```
| Factory | Parameters | Use case |
|---|---|---|
| `Image(h, w, enc)` | height, width, encoding | Create empty image to fill manually |
| `Image.from_numpy(arr, enc?)` | ndarray, optional encoding | Camera capture, OpenCV output |
| `Image.from_torch(tensor, enc?)` | Tensor, optional encoding | ML model output |
| `Image.from_bytes(data, h, w, enc)` | bytes, height, width, encoding | Network/file loading |
Note: Python takes `(height, width)`, while Rust takes `(width, height)`. This matches each language's convention — NumPy/OpenCV are row-major `(H, W)`, while graphics APIs use `(W, H)`.
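The shape-to-encoding auto-detection mentioned for `from_numpy()` can be sketched as follows. `guess_encoding` is a hypothetical helper for illustration, not the library's actual implementation:

```python
import numpy as np

def guess_encoding(arr):
    """Hypothetical sketch of encoding auto-detection from array shape
    and dtype (illustration only, not HORUS's actual logic)."""
    if arr.ndim == 2:
        return "mono8" if arr.dtype == np.uint8 else "mono16"
    channels = arr.shape[2]                # (H, W, C) — row-major
    return {1: "mono8", 3: "rgb8", 4: "rgba8"}.get(channels)

frame = np.zeros((480, 640, 3), dtype=np.uint8)
print(guess_encoding(frame))  # rgb8 — shape is (height, width, channels)
```

Passing an explicit `encoding=` sidesteps any ambiguity, e.g. between `"rgb8"` and `"bgr8"`, which share the same shape.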
Supported Encodings
| Encoding | Channels | Bytes/Pixel | When to use |
|---|---|---|---|
| `"mono8"` | 1 | 1 | Grayscale cameras, edge detection output |
| `"mono16"` | 1 | 2 | High dynamic range grayscale |
| `"rgb8"` | 3 | 3 | Standard color cameras (default) |
| `"bgr8"` | 3 | 3 | OpenCV output (OpenCV uses BGR internally) |
| `"rgba8"` | 4 | 4 | Images with transparency |
| `"bgra8"` | 4 | 4 | Windows/DirectX style with transparency |
| `"yuv422"` | 2 | 2 | Raw USB camera output |
| `"mono32f"` | 1 | 4 | ML model output (float grayscale) |
| `"rgb32f"` | 3 | 12 | HDR imaging, ML float output |
| `"bayer_rggb8"` | 1 | 1 | Raw sensor data before debayering |
| `"depth16"` | 1 | 2 | 16-bit depth in millimeters (use DepthImage for float meters) |
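The Bytes/Pixel column determines the `step` and `nbytes` properties. A minimal sketch of that arithmetic, with the per-pixel sizes copied from the table above:

```python
# Bytes-per-pixel for each encoding (values from the table above).
BYTES_PER_PIXEL = {
    "mono8": 1, "mono16": 2, "rgb8": 3, "bgr8": 3,
    "rgba8": 4, "bgra8": 4, "yuv422": 2, "mono32f": 4,
    "rgb32f": 12, "bayer_rggb8": 1, "depth16": 2,
}

def image_sizes(height, width, encoding):
    """Row stride (step) and total pixel buffer size for an encoding."""
    bpp = BYTES_PER_PIXEL[encoding]
    step = width * bpp           # bytes per row
    nbytes = step * height       # total pixel data
    return step, nbytes

print(image_sizes(480, 640, "rgb8"))  # (1920, 921600)
```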
Properties
| Property | Type | Description |
|---|---|---|
| `height` | int | Image height in pixels |
| `width` | int | Image width in pixels |
| `channels` | int | Number of color channels (1, 2, 3, or 4) |
| `encoding` | str | Encoding string (e.g., `"rgb8"`) |
| `dtype` | str | Data type string (e.g., `"uint8"`) |
| `nbytes` | int | Total pixel data size in bytes |
| `step` | int | Row stride in bytes (`width * bytes_per_pixel`) |
| `frame_id` | str | Coordinate frame (e.g., `"camera_front"`) |
| `timestamp_ns` | int | Timestamp in nanoseconds since epoch |
Methods
Pixel Access
```python
# Read the pixel at (x, y) — returns channel values as a list
pixel = img.pixel(320, 240)  # e.g., [128, 64, 255] for RGB

# Write the pixel at (x, y)
img.set_pixel(320, 240, [255, 0, 0])  # red pixel

# Fill the entire image with one color
img.fill([0, 0, 0])  # black

# Copy raw bytes into the image
img.copy_from(raw_bytes)

# Extract a region of interest (returns raw bytes)
roi_data = img.roi(x=100, y=100, w=200, h=200)
```
| Method | Signature | Description |
|---|---|---|
| `pixel(x, y)` | `(int, int) -> list[int]` | Read pixel channel values |
| `set_pixel(x, y, val)` | `(int, int, list[int]) -> None` | Write pixel |
| `fill(val)` | `(list[int]) -> None` | Fill entire image |
| `copy_from(data)` | `(bytes) -> None` | Overwrite pixel data from bytes |
| `roi(x, y, w, h)` | `(int, int, int, int) -> bytes` | Extract region of interest as raw bytes |
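A sketch of what `roi(x, y, w, h)` computes, using a plain NumPy array as a stand-in for the image. This assumes the region is cropped from a row-major `(H, W, C)` buffer and returned as raw bytes, per the comment above:

```python
import numpy as np

def roi_bytes(pixels, x, y, w, h):
    """Crop a w-by-h region whose top-left corner is (x, y) from an
    (H, W, C) array and return its raw bytes."""
    return pixels[y:y + h, x:x + w].tobytes()

frame = np.zeros((480, 640, 3), dtype=np.uint8)
crop = roi_bytes(frame, x=100, y=100, w=200, h=200)
print(len(crop))  # 200 * 200 * 3 = 120000 bytes
```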
Framework Conversions
```python
# To NumPy — zero-copy (shared memory view)
np_array = img.to_numpy()  # shape: (H, W, C) for color, (H, W) for mono

# To PyTorch — zero-copy via DLPack
torch_tensor = img.to_torch()

# To JAX — zero-copy via DLPack
jax_array = img.to_jax()
```
All `to_*()` methods are zero-copy (~3 us). They return views into the shared memory pool — no pixel data is copied.

`from_*()` methods copy data into the pool (one copy at publish time). This is necessary because the pool allocator controls memory layout.
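The copy semantics can be illustrated with plain NumPy, using a flat array as a stand-in for a pool slot. This is an analogy only, not HORUS internals:

```python
import numpy as np

# A stand-in "pool slot" modeled as a flat byte buffer.
pool = np.zeros(480 * 640 * 3, dtype=np.uint8)

# from_numpy(): one copy of the frame into the slot.
frame = np.full((480, 640, 3), 7, dtype=np.uint8)
pool[:] = frame.ravel()

# to_numpy(): a reshaped view over the same buffer — no copy.
view = pool.reshape(480, 640, 3)
view[0, 0] = [1, 2, 3]  # writes through to the backing buffer

print(pool[:3])  # [1 2 3] — the write is visible in the "pool"
```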
Metadata
```python
# Set the coordinate frame for TransformFrame integration
img.set_frame_id("camera_front")

# Set the timestamp for time-based queries
img.set_timestamp_ns(horus.timestamp_ns())
```
Complete Example
```python
import horus
from horus import Image, Topic
import numpy as np

img_topic = Topic(Image)

def camera_tick(node):
    # Simulate camera capture
    frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
    img = Image.from_numpy(frame, encoding="rgb8")
    img.set_frame_id("camera_front")
    img.set_timestamp_ns(horus.timestamp_ns())
    img_topic.send(img)

def vision_tick(node):
    img = img_topic.recv()
    if img:
        # Zero-copy to NumPy for OpenCV-style processing
        pixels = img.to_numpy()
        gray = np.mean(pixels, axis=2).astype(np.uint8)
        # Widen to int16 before diffing so negative gradients don't wrap
        edges = np.abs(np.diff(gray.astype(np.int16), axis=1))
        node.log_info(f"Detected {np.sum(edges > 128)} edge pixels")

camera = horus.Node(name="camera", tick=camera_tick, rate=30, order=0, pubs=["image"])
vision = horus.Node(name="vision", tick=vision_tick, rate=30, order=1, subs=["image"])
horus.run(camera, vision)
```
Tensor Interop
Convert an Image to a general-purpose Tensor for Pythonic operations. This is zero-copy; the Tensor shares the same shared memory:
```python
t = img.as_tensor()        # shape=[480, 640, 3], dtype=uint8
t[0:10] += 128             # brighten the top rows (writes to SHM)
features = t.flatten()     # Tensor reshape
pt = torch.from_dlpack(t)  # zero-copy to PyTorch
```
Images also support direct indexing and arithmetic:
```python
pixel = img[240, 320]     # read the pixel at (y, x)
img[0:10] = 255           # write to rows
bright = img + 50         # returns a Tensor
normalized = img / 255.0  # returns a Tensor
```
See Tensor for the full Pythonic API (reshape, arithmetic, reductions, type conversion).
Design Decisions
Why pool-backed shared memory instead of serialized byte buffers? Serializing a 1080p RGB image (6 MB) takes ~2 ms and doubles memory usage (sender buffer + receiver buffer). With pool-backed shared memory, only the 64-byte descriptor is copied; the pixel data stays in one place and every subscriber maps the same physical memory. Latency stays under 10 us regardless of resolution.
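The 1080p figure above is easy to verify; the 64-byte descriptor size comes from the same paragraph:

```python
# Checking the arithmetic behind the paragraph above:
height, width, bpp = 1080, 1920, 3    # 1080p RGB, 3 bytes per pixel
payload = height * width * bpp        # pixel data that never moves
descriptor = 64                       # bytes copied per publish

print(payload)                # 6220800 bytes ≈ 6 MB
print(payload // descriptor)  # ~97000x less data through the ring buffer
```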
Why fixed encoding enums instead of arbitrary format strings? Fixed enums enable compile-time size calculations (step = width * bytes_per_pixel) and prevent encoding mismatches between publisher and subscriber. The enum covers all common camera output formats; for exotic encodings, use GenericMessage with manual layout.
Why does `from_numpy()` copy while `to_numpy()` doesn't? Writing into the shared memory pool requires placing data at a specific pool slot. `from_numpy()` copies once into that slot. Reading (`to_numpy()`) returns a view into the existing pool memory — no copy needed. One copy on publish, zero copies on subscribe.
See Also
- Tensor — General-purpose tensor with full Pythonic API
- Image (Stdlib) — Image message type overview
- PointCloud (Python) — 3D point cloud data
- DepthImage (Python) — Depth maps
- Python CV Node Recipe — Computer vision with Python
- ML Utilities — ML framework integration