Python Image

A camera image backed by shared memory for zero-copy inter-process communication. Only a small descriptor travels through the ring buffer; the actual pixel data stays in a shared memory pool. This enables real-time image pipelines at full camera frame rates without serialization overhead.

When to Use

Use Image when your robot has a camera and you need to share frames between nodes — for example, between a camera driver, a vision node, and a display node. A 1080p RGB image transfers in microseconds, not milliseconds.

ROS2 equivalent: sensor_msgs/Image — same concept, but HORUS uses shared memory pools instead of serialized byte buffers.

Constructor

from horus import Image

# Image(height, width, encoding)
img = Image(480, 640, "rgb8")

Parameters:

  • height: int — Image height in pixels
  • width: int — Image width in pixels
  • encoding: str — Pixel format (default: "rgb8", see encoding table below)

Factory Methods

# From NumPy array — copies data into shared memory pool
import numpy as np
pixels = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
img = Image.from_numpy(pixels)                # encoding auto-detected from shape
img = Image.from_numpy(pixels, encoding="bgr8")  # explicit encoding

# From PyTorch tensor — copies into pool
import torch
tensor = torch.zeros(480, 640, 3, dtype=torch.uint8)
img = Image.from_torch(tensor, encoding="rgb8")

# From raw bytes — copies into pool
img = Image.from_bytes(raw_data, height=480, width=640, encoding="rgb8")
Factory                            Parameters                       Use case
Image(h, w, enc)                   height, width, encoding          Create empty image to fill manually
Image.from_numpy(arr, enc?)        ndarray, optional encoding       Camera capture, OpenCV output
Image.from_torch(tensor, enc?)     Tensor, optional encoding        ML model output
Image.from_bytes(data, h, w, enc)  bytes, height, width, encoding   Network/file loading

Note: the Python constructor takes (height, width), while the Rust constructor takes (width, height). This matches each language's convention — NumPy/OpenCV use row-major (H, W), while graphics APIs use (W, H).
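The (height, width) ordering is the same row-major convention NumPy itself uses; a dependency-free sketch makes this concrete:

```python
import numpy as np

# A 480x640 RGB frame in NumPy's row-major (H, W, C) layout
frame = np.zeros((480, 640, 3), dtype=np.uint8)

height, width, channels = frame.shape
# height comes first, matching Image(height, width, encoding)
print(height, width, channels)  # -> 480 640 3
```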

Supported Encodings

Encoding        Channels  Bytes/Pixel  When to use
"mono8"         1         1            Grayscale cameras, edge detection output
"mono16"        1         2            High dynamic range grayscale
"rgb8"          3         3            Standard color cameras (default)
"bgr8"          3         3            OpenCV output (OpenCV uses BGR internally)
"rgba8"         4         4            Images with transparency
"bgra8"         4         4            Windows/DirectX style with transparency
"yuv422"        2         2            Raw USB camera output
"mono32f"       1         4            ML model output (float grayscale)
"rgb32f"        3         12           HDR imaging, ML float output
"bayer_rggb8"   1         1            Raw sensor data before debayering
"depth16"       1         2            16-bit depth in millimeters (use DepthImage for float meters)
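The Bytes/Pixel column determines the step and nbytes properties. A minimal sketch of that arithmetic in plain Python (the dictionary reproduces the table values; this is not the HORUS implementation):

```python
# Bytes per pixel for each encoding, taken from the table above
BYTES_PER_PIXEL = {
    "mono8": 1, "mono16": 2, "rgb8": 3, "bgr8": 3,
    "rgba8": 4, "bgra8": 4, "yuv422": 2, "mono32f": 4,
    "rgb32f": 12, "bayer_rggb8": 1, "depth16": 2,
}

def image_sizes(height, width, encoding):
    """Return (step, nbytes): row stride and total pixel data size."""
    step = width * BYTES_PER_PIXEL[encoding]
    return step, height * step

print(image_sizes(480, 640, "rgb8"))   # -> (1920, 921600)
print(image_sizes(480, 640, "mono8"))  # -> (640, 307200)
```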

Properties

Property      Type  Description
height        int   Image height in pixels
width         int   Image width in pixels
channels      int   Number of color channels (1, 2, 3, or 4)
encoding      str   Encoding string (e.g., "rgb8")
dtype         str   Data type string (e.g., "uint8")
nbytes        int   Total pixel data size in bytes
step          int   Row stride in bytes (width * bytes_per_pixel)
frame_id      str   Coordinate frame (e.g., "camera_front")
timestamp_ns  int   Timestamp in nanoseconds since epoch
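channels and dtype follow directly from the encoding. A hedged sketch of that mapping, consistent with the encoding table (the dictionary name is illustrative, not a HORUS internal):

```python
import numpy as np

# (channels, NumPy dtype) for the uint8/uint16/float32 encodings above
ENCODING_INFO = {
    "mono8": (1, np.uint8),     "mono16": (1, np.uint16),
    "rgb8": (3, np.uint8),      "bgr8": (3, np.uint8),
    "rgba8": (4, np.uint8),     "bgra8": (4, np.uint8),
    "mono32f": (1, np.float32), "rgb32f": (3, np.float32),
    "depth16": (1, np.uint16),
}

channels, dtype = ENCODING_INFO["rgb32f"]
bytes_per_pixel = channels * np.dtype(dtype).itemsize
print(channels, np.dtype(dtype).name, bytes_per_pixel)  # -> 3 float32 12
```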

Methods

Pixel Access

# Read pixel at (x, y) — returns channel values as list
pixel = img.pixel(320, 240)  # e.g., [128, 64, 255] for RGB

# Write pixel at (x, y)
img.set_pixel(320, 240, [255, 0, 0])  # Red pixel

# Fill entire image with one color
img.fill([0, 0, 0])  # Black

# Copy raw bytes into image
img.copy_from(raw_bytes)

# Extract region of interest (returns raw bytes)
roi_data = img.roi(x=100, y=100, w=200, h=200)
Method                Signature                          Description
pixel(x, y)           (int, int) -> list[int]            Read pixel channel values
set_pixel(x, y, val)  (int, int, list[int]) -> None      Write pixel
fill(val)             (list[int]) -> None                Fill entire image
copy_from(data)       (bytes) -> None                    Overwrite pixel data from bytes
roi(x, y, w, h)       (int, int, int, int) -> bytes      Extract region of interest (raw bytes)
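The roi() geometry can be sketched in plain NumPy, assuming row-major (H, W, C) layout: an (x, y, w, h) rectangle maps to a slice on each of the first two axes.

```python
import numpy as np

frame = np.arange(480 * 640 * 3, dtype=np.uint8).reshape(480, 640, 3)

def roi(arr, x, y, w, h):
    """Extract a w-by-h region whose top-left corner is at (x, y)."""
    return arr[y:y + h, x:x + w]

region = roi(frame, x=100, y=100, w=200, h=200)
print(region.shape)  # -> (200, 200, 3)
```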

Framework Conversions

# To NumPy — zero-copy (shared memory view)
np_array = img.to_numpy()  # Shape: (H, W, C) for color, (H, W) for mono

# To PyTorch — zero-copy via DLPack
torch_tensor = img.to_torch()

# To JAX — zero-copy via DLPack
jax_array = img.to_jax()

All to_*() methods are zero-copy (~3 us). They return views into the shared memory pool — no pixel data is copied.

from_*() methods copy data into the pool (one copy at publish time). This is necessary because the pool allocator controls memory layout.
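The view-versus-copy distinction is the same one NumPy exposes; a small illustration in plain NumPy, with an ordinary array standing in for the pool slot:

```python
import numpy as np

pool = np.zeros((4, 4), dtype=np.uint8)  # stands in for a shared pool slot

view = pool[:]       # like to_numpy(): a view, no pixel data copied
view[0, 0] = 255
print(pool[0, 0])    # -> 255: writes through the view land in the pool

copy = pool.copy()   # like from_numpy(): one copy into a separate buffer
copy[1, 1] = 7
print(pool[1, 1])    # -> 0: the source is unaffected by the copy
```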

Metadata

# Set coordinate frame for TransformFrame integration
img.set_frame_id("camera_front")

# Set timestamp for time-based queries
img.set_timestamp_ns(horus.timestamp_ns())

Complete Example

import horus
from horus import Image, Topic
import numpy as np

img_topic = Topic(Image)

def camera_tick(node):
    # Simulate camera capture
    frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
    img = Image.from_numpy(frame, encoding="rgb8")
    img.set_frame_id("camera_front")
    img.set_timestamp_ns(horus.timestamp_ns())
    img_topic.send(img)

def vision_tick(node):
    img = img_topic.recv()
    if img:
        # Zero-copy to NumPy for OpenCV processing
        pixels = img.to_numpy()
        gray = np.mean(pixels, axis=2).astype(np.uint8)
        # Cast to a signed type before diff so negative gradients don't wrap
        edges = np.abs(np.diff(gray.astype(np.int16), axis=1))
        node.log_info(f"Detected {np.sum(edges > 128)} edge pixels")

camera = horus.Node(name="camera", tick=camera_tick, rate=30, order=0, pubs=["image"])
vision = horus.Node(name="vision", tick=vision_tick, rate=30, order=1, subs=["image"])
horus.run(camera, vision)

Tensor Interop

Convert an Image to a general-purpose Tensor for Pythonic operations. The conversion is zero-copy; the Tensor views the same shared memory:

t = img.as_tensor()              # shape=[480, 640, 3], dtype=uint8
t[0:10] += 128                   # brighten top rows (writes to SHM)
features = t.flatten()           # Tensor reshape
pt = torch.from_dlpack(t)        # zero-copy to PyTorch

Images also support direct indexing and arithmetic:

pixel = img[240, 320]            # read pixel at (y, x)
img[0:10] = 255                  # write to rows
bright = img + 50                # returns Tensor
normalized = img / 255.0         # returns Tensor

See Tensor for the full Pythonic API (reshape, arithmetic, reductions, type conversion).


Design Decisions

Why pool-backed shared memory instead of serialized byte buffers? Serializing a 1080p RGB image (6 MB) takes ~2 ms and doubles memory usage (sender buffer + receiver buffer). With pool-backed shared memory, only the 64-byte descriptor is copied; the pixel data stays in one place and every subscriber maps the same physical memory. Latency stays under 10 us regardless of resolution.
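The numbers above are easy to verify: a 1080p RGB frame is about 6 MB of pixel data, while the descriptor that travels through the ring buffer stays a fixed 64 bytes regardless of resolution.

```python
# Payload for a 1920x1080 RGB frame at 3 bytes per pixel
payload = 1920 * 1080 * 3
descriptor = 64  # fixed-size descriptor copied through the ring buffer

print(payload)                # -> 6220800 bytes (~6 MB)
print(payload // descriptor)  # -> 97200: payload is ~100,000x larger
```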

Why fixed encoding enums instead of arbitrary format strings? Fixed enums enable compile-time size calculations (step = width * bytes_per_pixel) and prevent encoding mismatches between publisher and subscriber. The enum covers all common camera output formats; for exotic encodings, use GenericMessage with manual layout.

Why from_numpy() copies but to_numpy() doesn't? Writing into the shared memory pool requires placing data at a specific pool slot. from_numpy() copies once into that slot. Reading (to_numpy()) returns a view into the existing pool memory — no copy needed. One copy on publish, zero copies on subscribe.


See Also