Python Image
A camera image backed by shared memory for zero-copy inter-process communication. Only a small descriptor travels through the ring buffer; the actual pixel data stays in a shared memory pool. This enables real-time image pipelines at full camera frame rates without serialization overhead.
When to Use
Use Image when your robot has a camera and you need to share frames between nodes — for example, between a camera driver, a vision node, and a display node. A 1080p RGB image transfers in microseconds, not milliseconds.
ROS2 equivalent: `sensor_msgs/Image` — same concept, but HORUS uses shared memory pools instead of serialized byte buffers.
Constructor
```python
from horus import Image

# Image(height, width, encoding)
img = Image(480, 640, "rgb8")
```
Parameters:

- `height: int` — Image height in pixels
- `width: int` — Image width in pixels
- `encoding: str` — Pixel format (default: `"rgb8"`; see the encoding table below)
Factory Methods
```python
# From a NumPy array — copies data into the shared memory pool
import numpy as np

pixels = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
img = Image.from_numpy(pixels)                   # encoding auto-detected from shape
img = Image.from_numpy(pixels, encoding="bgr8")  # explicit encoding

# From a PyTorch tensor — copies into the pool
import torch

tensor = torch.zeros(480, 640, 3, dtype=torch.uint8)
img = Image.from_torch(tensor, encoding="rgb8")

# From raw bytes — copies into the pool
img = Image.from_bytes(raw_data, height=480, width=640, encoding="rgb8")
```
| Factory | Parameters | Use case |
|---|---|---|
| `Image(h, w, enc)` | height, width, encoding | Create empty image to fill manually |
| `Image.from_numpy(arr, enc?)` | ndarray, optional encoding | Camera capture, OpenCV output |
| `Image.from_torch(tensor, enc?)` | Tensor, optional encoding | ML model output |
| `Image.from_bytes(data, h, w, enc)` | bytes, height, width, encoding | Network/file loading |
Note: Python takes `(height, width)`, while Rust takes `(width, height)`. This matches each language's convention — NumPy/OpenCV are row-major `(H, W)`, while graphics APIs use `(W, H)`.
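The shape-to-encoding auto-detection mentioned for `from_numpy()` can be sketched as follows. `guess_encoding` is a hypothetical helper for illustration, not the library's actual implementation:

```python
import numpy as np

def guess_encoding(arr):
    """Hypothetical sketch of encoding auto-detection from array shape
    and dtype (illustration only, not HORUS's actual logic)."""
    if arr.ndim == 2:
        return "mono8" if arr.dtype == np.uint8 else "mono16"
    channels = arr.shape[2]                # (H, W, C) — row-major
    return {1: "mono8", 3: "rgb8", 4: "rgba8"}.get(channels)

frame = np.zeros((480, 640, 3), dtype=np.uint8)
print(guess_encoding(frame))  # rgb8 — shape is (height, width, channels)
```

Passing an explicit `encoding=` sidesteps any ambiguity, e.g. between `"rgb8"` and `"bgr8"`, which share the same shape.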
Supported Encodings
| Encoding | Channels | Bytes/Pixel | When to use |
|---|---|---|---|
| `"mono8"` | 1 | 1 | Grayscale cameras, edge detection output |
| `"mono16"` | 1 | 2 | High dynamic range grayscale |
| `"rgb8"` | 3 | 3 | Standard color cameras (default) |
| `"bgr8"` | 3 | 3 | OpenCV output (OpenCV uses BGR internally) |
| `"rgba8"` | 4 | 4 | Images with transparency |
| `"bgra8"` | 4 | 4 | Windows/DirectX style with transparency |
| `"yuv422"` | 2 | 2 | Raw USB camera output |
| `"mono32f"` | 1 | 4 | ML model output (float grayscale) |
| `"rgb32f"` | 3 | 12 | HDR imaging, ML float output |
| `"bayer_rggb8"` | 1 | 1 | Raw sensor data before debayering |
| `"depth16"` | 1 | 2 | 16-bit depth in millimeters (use DepthImage for float meters) |
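The Bytes/Pixel column determines the `step` and `nbytes` properties. A minimal sketch of that arithmetic, with the per-pixel sizes copied from the table above:

```python
# Bytes-per-pixel for each encoding (values from the table above).
BYTES_PER_PIXEL = {
    "mono8": 1, "mono16": 2, "rgb8": 3, "bgr8": 3,
    "rgba8": 4, "bgra8": 4, "yuv422": 2, "mono32f": 4,
    "rgb32f": 12, "bayer_rggb8": 1, "depth16": 2,
}

def image_sizes(height, width, encoding):
    """Row stride (step) and total pixel buffer size for an encoding."""
    bpp = BYTES_PER_PIXEL[encoding]
    step = width * bpp           # bytes per row
    nbytes = step * height       # total pixel data
    return step, nbytes

print(image_sizes(480, 640, "rgb8"))  # (1920, 921600)
```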
Properties
| Property | Type | Description |
|---|---|---|
| `height` | int | Image height in pixels |
| `width` | int | Image width in pixels |
| `channels` | int | Number of color channels (1, 2, 3, or 4) |
| `encoding` | str | Encoding string (e.g., `"rgb8"`) |
| `dtype` | str | Data type string (e.g., `"uint8"`) |
| `nbytes` | int | Total pixel data size in bytes |
| `step` | int | Row stride in bytes (`width * bytes_per_pixel`) |
| `frame_id` | str | Coordinate frame (e.g., `"camera_front"`) |
| `timestamp_ns` | int | Timestamp in nanoseconds since epoch |
Methods
Pixel Access
```python
# Read the pixel at (x, y) — returns channel values as a list
pixel = img.pixel(320, 240)  # e.g., [128, 64, 255] for RGB

# Write the pixel at (x, y)
img.set_pixel(320, 240, [255, 0, 0])  # red pixel

# Fill the entire image with one color
img.fill([0, 0, 0])  # black

# Copy raw bytes into the image
img.copy_from(raw_bytes)

# Extract a region of interest (returns raw bytes)
roi_data = img.roi(x=100, y=100, w=200, h=200)
```
| Method | Signature | Description |
|---|---|---|
| `pixel(x, y)` | `(int, int) -> list[int]` | Read pixel channel values |
| `set_pixel(x, y, val)` | `(int, int, list[int]) -> None` | Write pixel |
| `fill(val)` | `(list[int]) -> None` | Fill entire image |
| `copy_from(data)` | `(bytes) -> None` | Overwrite pixel data from bytes |
| `roi(x, y, w, h)` | `(int, int, int, int) -> bytes` | Extract region of interest as raw bytes |
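A sketch of what `roi(x, y, w, h)` computes, using a plain NumPy array as a stand-in for the image. This assumes the region is cropped from a row-major `(H, W, C)` buffer and returned as raw bytes, per the comment above:

```python
import numpy as np

def roi_bytes(pixels, x, y, w, h):
    """Crop a w-by-h region whose top-left corner is (x, y) from an
    (H, W, C) array and return its raw bytes."""
    return pixels[y:y + h, x:x + w].tobytes()

frame = np.zeros((480, 640, 3), dtype=np.uint8)
crop = roi_bytes(frame, x=100, y=100, w=200, h=200)
print(len(crop))  # 200 * 200 * 3 = 120000 bytes
```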
Framework Conversions
```python
# To NumPy — zero-copy (shared memory view)
np_array = img.to_numpy()  # shape: (H, W, C) for color, (H, W) for mono

# To PyTorch — zero-copy via DLPack
torch_tensor = img.to_torch()

# To JAX — zero-copy via DLPack
jax_array = img.to_jax()
```
All `to_*()` methods are zero-copy (~3 us). They return views into the shared memory pool — no pixel data is copied.

`from_*()` methods copy data into the pool (one copy at publish time). This is necessary because the pool allocator controls memory layout.
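The copy semantics can be illustrated with plain NumPy, using a flat array as a stand-in for a pool slot. This is an analogy only, not HORUS internals:

```python
import numpy as np

# A stand-in "pool slot" modeled as a flat byte buffer.
pool = np.zeros(480 * 640 * 3, dtype=np.uint8)

# from_numpy(): one copy of the frame into the slot.
frame = np.full((480, 640, 3), 7, dtype=np.uint8)
pool[:] = frame.ravel()

# to_numpy(): a reshaped view over the same buffer — no copy.
view = pool.reshape(480, 640, 3)
view[0, 0] = [1, 2, 3]  # writes through to the backing buffer

print(pool[:3])  # [1 2 3] — the write is visible in the "pool"
```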
Metadata
```python
# Set the coordinate frame for TransformFrame integration
img.set_frame_id("camera_front")

# Set the timestamp for time-based queries
img.set_timestamp_ns(horus.timestamp_ns())
```
Complete Example
```python
import horus
from horus import Image, Topic
import numpy as np

img_topic = Topic(Image)

def camera_tick(node):
    # Simulate camera capture
    frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
    img = Image.from_numpy(frame, encoding="rgb8")
    img.set_frame_id("camera_front")
    img.set_timestamp_ns(horus.timestamp_ns())
    img_topic.send(img)

def vision_tick(node):
    img = img_topic.recv()
    if img:
        # Zero-copy to NumPy for OpenCV-style processing
        pixels = img.to_numpy()
        gray = np.mean(pixels, axis=2).astype(np.uint8)
        # Widen to int16 before diffing so negative gradients don't wrap
        edges = np.abs(np.diff(gray.astype(np.int16), axis=1))
        node.log_info(f"Detected {np.sum(edges > 128)} edge pixels")

camera = horus.Node(name="camera", tick=camera_tick, rate=30, order=0, pubs=["image"])
vision = horus.Node(name="vision", tick=vision_tick, rate=30, order=1, subs=["image"])
horus.run(camera, vision)
```
Tensor Interop
Convert an Image to a general-purpose Tensor for Pythonic operations. This is zero-copy; the Tensor shares the same shared memory:
```python
t = img.as_tensor()        # shape=[480, 640, 3], dtype=uint8
t[0:10] += 128             # brighten the top rows (writes to SHM)
features = t.flatten()     # Tensor reshape
pt = torch.from_dlpack(t)  # zero-copy to PyTorch
```
Images also support direct indexing and arithmetic:
```python
pixel = img[240, 320]     # read the pixel at (y, x)
img[0:10] = 255           # write to rows
bright = img + 50         # returns a Tensor
normalized = img / 255.0  # returns a Tensor
```
See Tensor for the full Pythonic API (reshape, arithmetic, reductions, type conversion).
Design Decisions
Why pool-backed shared memory instead of serialized byte buffers? Serializing a 1080p RGB image (6 MB) takes ~2 ms and doubles memory usage (sender buffer + receiver buffer). With pool-backed shared memory, only the 64-byte descriptor is copied; the pixel data stays in one place and every subscriber maps the same physical memory. Latency stays under 10 us regardless of resolution.
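The 1080p figure above is easy to verify; the 64-byte descriptor size comes from the same paragraph:

```python
# Checking the arithmetic behind the paragraph above:
height, width, bpp = 1080, 1920, 3    # 1080p RGB, 3 bytes per pixel
payload = height * width * bpp        # pixel data that never moves
descriptor = 64                       # bytes copied per publish

print(payload)                # 6220800 bytes ≈ 6 MB
print(payload // descriptor)  # ~97000x less data through the ring buffer
```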
Why fixed encoding enums instead of arbitrary format strings? Fixed enums enable compile-time size calculations (step = width * bytes_per_pixel) and prevent encoding mismatches between publisher and subscriber. The enum covers all common camera output formats; for exotic encodings, use GenericMessage with manual layout.
Why does `from_numpy()` copy while `to_numpy()` doesn't? Writing into the shared memory pool requires placing data at a specific pool slot. `from_numpy()` copies once into that slot. Reading (`to_numpy()`) returns a view into the existing pool memory — no copy needed. One copy on publish, zero copies on subscribe.
See Also
- Tensor — General-purpose tensor with full Pythonic API
- Image (Stdlib) — Image message type overview
- PointCloud (Python) — 3D point cloud data
- DepthImage (Python) — Depth maps
- Python CV Node Recipe — Computer vision with Python
- ML Utilities — ML framework integration