Tensor

A lightweight tensor descriptor for zero-copy ML data sharing across nodes and processes.

use horus::prelude::*;

Most users do not need Tensor directly. For camera images use Image, for 3D points use PointCloud, for depth data use DepthImage. Tensor is for advanced ML pipelines where you need direct control over shape, dtype, and layout — for example, feeding preprocessed batches into a model or reading raw model outputs.

Overview

Tensor is a lightweight descriptor that references data in shared memory. Only the descriptor is transmitted through topics — the actual tensor data stays in-place, enabling zero-copy transport for large ML payloads.

Methods

| Method | Return Type | Description |
| --- | --- | --- |
| shape() | &[u64] | Tensor dimensions (e.g., [1080, 1920, 3]) |
| strides() | &[u64] | Byte strides per dimension |
| numel() | u64 | Total number of elements |
| nbytes() | u64 | Total size in bytes (numel * dtype.element_size()) |
| dtype() | TensorDtype | Element data type |
| device() | Device | Device location (CPU or CUDA) |
| is_cpu() | bool | True if data resides in CPU shared memory |
| is_cuda() | bool | True if the device descriptor is set to CUDA |
| is_contiguous() | bool | True if the memory layout is C-contiguous |
| view(new_shape) | Option<Self> | Reshape without copying (fails if not contiguous or if the element count changes) |
| slice_first_dim(start, end) | Option<Self> | Slice along the first dimension, adjusting strides |
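The stride and size figures in the table follow directly from the shape and element size. A minimal sketch of that arithmetic in plain Rust (independent of horus), assuming a C-contiguous layout:

```rust
// Compute C-contiguous byte strides, element count, and byte size for a
// [1080, 1920, 3] f32 tensor -- the same arithmetic behind strides(),
// numel(), and nbytes() in the table above.
fn c_contiguous_strides(shape: &[u64], elem_size: u64) -> Vec<u64> {
    let mut strides = vec![0u64; shape.len()];
    let mut acc = elem_size;
    // Walk dimensions from innermost to outermost, accumulating byte steps.
    for i in (0..shape.len()).rev() {
        strides[i] = acc;
        acc *= shape[i];
    }
    strides
}

fn main() {
    let shape = [1080u64, 1920, 3];
    let elem_size = 4; // f32
    let strides = c_contiguous_strides(&shape, elem_size);
    let numel: u64 = shape.iter().product();
    assert_eq!(strides, vec![23040, 12, 4]);
    assert_eq!(numel, 6_220_800);
    assert_eq!(numel * elem_size, 24_883_200); // nbytes
    println!("strides = {:?}, numel = {}, nbytes = {}", strides, numel, numel * elem_size);
}
```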

Reshape and Slice

let topic: Topic<Tensor> = Topic::new("model.input")?;

if let Some(tensor) = topic.recv() {
    // Reshape a flat 1D tensor into a batch of images
    if let Some(reshaped) = tensor.view(&[4, 3, 224, 224]) {
        println!("Batch shape: {:?}", reshaped.shape()); // [4, 3, 224, 224]

        // Take the first 2 items from the batch
        if let Some(sliced) = reshaped.slice_first_dim(0, 2) {
            println!("Sliced shape: {:?}", sliced.shape()); // [2, 3, 224, 224]
        }
    }
}
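view() can only succeed when the new shape describes exactly the same number of elements (and, per the table above, the layout is contiguous). A sketch of the element-count half of that check, in plain Rust with no horus dependency:

```rust
// Sketch of the element-count check behind view(): a reshape is valid
// only when the new shape covers exactly as many elements as the old one.
// (The real view() additionally requires a C-contiguous layout.)
fn view_compatible(old_shape: &[u64], new_shape: &[u64]) -> bool {
    old_shape.iter().product::<u64>() == new_shape.iter().product::<u64>()
}

fn main() {
    // 4 * 3 * 224 * 224 = 602_112 elements
    assert!(view_compatible(&[602_112], &[4, 3, 224, 224]));
    // A changed dimension changes the element count, so view() would fail
    assert!(!view_compatible(&[602_112], &[4, 3, 224, 225]));
    println!("ok");
}
```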

TensorDtype

Supported element types with sizes and common use cases:

| Dtype | Size | Use Case |
| --- | --- | --- |
| F32 | 4 bytes | ML training and inference |
| F64 | 8 bytes | High-precision computation |
| F16 | 2 bytes | Memory-efficient inference |
| BF16 | 2 bytes | Training on modern GPUs |
| I8 | 1 byte | Quantized inference |
| I16 | 2 bytes | Audio, sensor data |
| I32 | 4 bytes | General integer |
| I64 | 8 bytes | Large signed values |
| U8 | 1 byte | Images |
| U16 | 2 bytes | Depth sensors (mm) |
| U32 | 4 bytes | Large indices |
| U64 | 8 bytes | Counters, timestamps |
| Bool | 1 byte | Masks |

TensorDtype Methods

let dtype = TensorDtype::F32;

// Size in bytes
assert_eq!(dtype.element_size(), 4);

// Display (lowercase string representation)
println!("{}", dtype); // "float32"

// Parse from string — accepts common aliases
let parsed = TensorDtype::parse("float32").unwrap(); // F32
let parsed = TensorDtype::parse("f16").unwrap();     // F16
let parsed = TensorDtype::parse("uint8").unwrap();   // U8
let parsed = TensorDtype::parse("bool").unwrap();    // Bool

Device

Fixed-size device descriptor supporting CPU and CUDA device tags. Device is metadata only — Device::cuda(N) tags a tensor with a device target but does not allocate GPU memory (GPU tensor pools are not yet implemented).

// Constructors
let cpu = Device::cpu();
let gpu0 = Device::cuda(0);  // Descriptor only — no GPU allocation

// Check device type
assert!(cpu.is_cpu());
assert!(gpu0.is_cuda());

// Display
println!("{}", gpu0); // "cuda:0"

// Parse from string
let dev = Device::parse("cpu").unwrap();
let dev = Device::parse("cuda:0").unwrap();

ML Pipeline Example

A camera node captures frames using Image, while a preprocessing node converts them into batched Tensor data for model inference:

use horus::prelude::*;

// Producer: camera capture node — uses Image, not raw Tensor
node! {
    CameraNode {
        pub { frames: Image -> "camera.rgb" }
        data { frame_count: u64 = 0 }

        tick {
            let image = Image::new(640, 480, ImageEncoding::Rgb8);
            // ... fill pixel data from camera driver ...
            self.frames.send(&image);
            self.frame_count += 1;
        }
    }
}

// Preprocessor: converts Image frames into batched Tensor for ML
node! {
    PreprocessNode {
        sub { frames: Image -> "camera.rgb" }
        pub { batch: Tensor -> "model.input" }
        data { buffer: Vec<Image> = Vec::new() }

        tick {
            if let Some(img) = self.frames.recv() {
                self.buffer.push(img);

                if self.buffer.len() >= 4 {
                    // Build a [4, 3, 224, 224] F32 batch tensor for the model
                    let tensor = Tensor::from_shape(
                        &[4, 3, 224, 224],
                        TensorDtype::F32,
                        Device::cpu(),
                    );
                    // ... resize, normalize, and copy frames into tensor ...
                    self.batch.send(&tensor);
                    self.buffer.clear();
                }
            }
        }
    }
}

// Consumer: inference node — works with raw Tensor input/output
node! {
    InferenceNode {
        sub { input: Tensor -> "model.input" }
        pub { detections: GenericMessage -> "model.detections" }

        tick {
            if let Some(tensor) = self.input.recv() {
                hlog!(debug, "Input: {:?}, {} bytes", tensor.shape(), tensor.nbytes());
                // Run inference on the batch tensor, publish results ...
            }
        }
    }
}
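The preprocessing step elided above ("resize, normalize, and copy") typically converts interleaved HWC u8 pixels into planar CHW f32 in [0, 1]. A hedged sketch of that conversion in plain Rust — the function name and layout choice are illustrative, not part of the horus API:

```rust
// Convert one HWC u8 image into CHW f32 normalized to [0, 1] -- the kind
// of work hidden behind the "resize, normalize, copy" comment above.
fn hwc_u8_to_chw_f32(hwc: &[u8], h: usize, w: usize, c: usize) -> Vec<f32> {
    let mut chw = vec![0.0f32; c * h * w];
    for y in 0..h {
        for x in 0..w {
            for ch in 0..c {
                // Source index is interleaved (HWC); destination is planar (CHW).
                chw[ch * h * w + y * w + x] = hwc[(y * w + x) * c + ch] as f32 / 255.0;
            }
        }
    }
    chw
}

fn main() {
    let hwc = [255u8, 0, 0, 0, 255, 0]; // 1x2 RGB: one red pixel, one green pixel
    let chw = hwc_u8_to_chw_f32(&hwc, 1, 2, 3);
    // Planar output: R plane [1, 0], G plane [0, 1], B plane [0, 0]
    assert_eq!(chw, vec![1.0, 0.0, 0.0, 1.0, 0.0, 0.0]);
    println!("{:?}", chw);
}
```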

Python Usage

In Python, use Image, PointCloud, or DepthImage for zero-copy tensor data — they wrap the pool-backed tensor system automatically and provide .to_numpy(), .to_torch(), .to_jax() conversions:

import horus
import numpy as np

# Image → NumPy (zero-copy)
img = horus.Image(480, 640, "rgb8")
arr = img.to_numpy()  # shape: (480, 640, 3), dtype: uint8

# PointCloud → PyTorch (zero-copy via DLPack)
cloud = horus.PointCloud.from_numpy(np.random.randn(1000, 3).astype(np.float32))
tensor = cloud.to_torch()

# DepthImage → JAX
depth = horus.DepthImage(480, 640, "float32")
jax_arr = depth.to_jax()

See Python Memory Types for the full API.


TensorDtype

Enumerates all supported tensor element types. Matches common ML framework dtypes for seamless interop with PyTorch, NumPy, JAX, and DLPack.

| Variant | Value | Size | NumPy | Use Case |
| --- | --- | --- | --- | --- |
| F32 | 0 | 4 bytes | <f4 | Default for most ML models |
| F64 | 1 | 8 bytes | <f8 | High-precision computation |
| F16 | 2 | 2 bytes | <f2 | GPU inference, mixed precision |
| BF16 | 3 | 2 bytes | <V2 | Training, transformer models |
| I8 | 4 | 1 byte | \|i1 | Quantized models |
| I16 | 5 | 2 bytes | <i2 | Audio, depth sensors |
| I32 | 6 | 4 bytes | <i4 | Labels, indices |
| I64 | 7 | 8 bytes | <i8 | Timestamps, large indices |
| U8 | 8 | 1 byte | \|u1 | Images (RGB pixels) |
| U16 | 9 | 2 bytes | <u2 | Depth images (mm) |
| U32 | 10 | 4 bytes | <u4 | Point cloud indices |
| U64 | 11 | 8 bytes | <u8 | Large counters |
| Bool | 12 | 1 byte | \|b1 | Masks, flags |

Methods

use horus::prelude::*;

let dtype = TensorDtype::F32;

// Element size in bytes
assert_eq!(dtype.element_size(), 4);

// NumPy type string (for __array_interface__)
assert_eq!(dtype.numpy_typestr(), "<f4");

// DLPack interop (code, bits, lanes)
let (code, bits, lanes) = dtype.to_dlpack();
assert_eq!((code, bits, lanes), (2, 32, 1)); // FLOAT, 32-bit, 1 lane

// Round-trip from DLPack
let recovered = TensorDtype::from_dlpack(code, bits, lanes);
assert_eq!(recovered, Some(TensorDtype::F32));

// Parse from string
let parsed = TensorDtype::parse("f16");
assert_eq!(parsed, Some(TensorDtype::F16));

// Safe construction from raw u8 (for SHM/network data)
let safe = TensorDtype::from_raw(255); // invalid → defaults to F32
assert_eq!(safe, TensorDtype::F32);
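The (code, bits, lanes) triple above follows the standard DLPack type codes (0 = int, 1 = uint, 2 = float, 4 = bfloat, 6 = bool). A sketch of that mapping in plain Rust — an illustration of the convention, not the horus implementation:

```rust
// Map a DLPack (code, bits, lanes) triple to a dtype name using the
// standard DLPack type codes. Vectorized types (lanes != 1) are rejected,
// matching the scalar-only dtypes in the table above.
fn dlpack_name(code: u8, bits: u8, lanes: u16) -> Option<String> {
    if lanes != 1 {
        return None; // only scalar (single-lane) types are supported here
    }
    match code {
        0 => Some(format!("int{}", bits)),
        1 => Some(format!("uint{}", bits)),
        2 => Some(format!("float{}", bits)),
        4 => Some(format!("bfloat{}", bits)),
        6 => Some("bool".to_string()),
        _ => None,
    }
}

fn main() {
    assert_eq!(dlpack_name(2, 32, 1).as_deref(), Some("float32")); // matches the F32 example above
    assert_eq!(dlpack_name(4, 16, 1).as_deref(), Some("bfloat16"));
    assert_eq!(dlpack_name(2, 32, 4), None); // vectorized: unsupported
    println!("ok");
}
```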

Device

Specifies where tensor data resides: CPU or CUDA GPU. Any 32-bit GPU index is supported.

Layout

device_type (1 byte): 0=CPU, 1=CUDA
_pad        (3 bytes): alignment
device_id   (4 bytes): GPU index (0 for CPU)
Total: 8 bytes
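The layout above can be reproduced with a #[repr(C)] struct. This sketch is illustrative (the field names mirror the layout table, but the struct is not the horus type); it confirms the 8-byte total:

```rust
// A #[repr(C)] struct matching the documented layout: 1-byte device type,
// 3 bytes of explicit padding, then a 4-byte device id.
#[repr(C)]
struct DeviceRepr {
    device_type: u8, // 0 = CPU, 1 = CUDA
    _pad: [u8; 3],   // alignment padding
    device_id: u32,  // GPU index (0 for CPU)
}

fn main() {
    // 1 + 3 + 4 = 8 bytes, aligned to the u32 field
    assert_eq!(std::mem::size_of::<DeviceRepr>(), 8);
    assert_eq!(std::mem::align_of::<DeviceRepr>(), 4);
    println!("ok");
}
```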

Methods

use horus::prelude::*;

// CPU (default)
let cpu = Device::cpu();
assert!(cpu.is_cpu());

// CUDA GPU 0
let gpu0 = Device::cuda(0);
assert!(gpu0.is_cuda());

// CUDA GPU 3 (multi-GPU)
let gpu3 = Device::cuda(3);

// Parse from string
let dev = Device::parse("cuda:1").unwrap();
assert!(dev.is_cuda());
assert_eq!(dev.device_id, 1);

// Also supports "gpu" alias
let dev = Device::parse("gpu:0").unwrap();
assert!(dev.is_cuda());

// Constants
let _ = Device::CPU;   // CPU constant
let _ = Device::CUDA0; // CUDA device 0 constant

DLPack Interop

use horus::prelude::*;

let device = Device::cuda(2);

// Convert to DLPack format
let dl_type = device.to_dlpack_device_type(); // 2 = kDLCUDA
let dl_id = device.to_dlpack_device_id();     // 2

// Create from DLPack
let recovered = Device::from_dlpack(2, 2); // kDLCUDA, device 2
assert_eq!(recovered, Some(Device::cuda(2)));

String Parsing

| Input | Result |
| --- | --- |
| "cpu" | Device::cpu() |
| "cuda" or "gpu" | Device::cuda(0) |
| "cuda:0" or "gpu:0" | Device::cuda(0) |
| "cuda:3" | Device::cuda(3) |
| "tpu" | None (unsupported) |
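These rules reduce to a small match plus an optional ":N" suffix. A sketch in plain Rust, returning a (is_cuda, device_id) pair instead of the actual Device type:

```rust
// Sketch of the parsing rules in the table above: "cpu", bare "cuda"/"gpu"
// (defaulting to index 0), and "cuda:N"/"gpu:N". Anything else, such as
// "tpu", yields None. Illustration only -- not the horus implementation.
fn parse_device(s: &str) -> Option<(bool, u32)> {
    match s {
        "cpu" => Some((false, 0)),
        "cuda" | "gpu" => Some((true, 0)),
        _ => {
            let (name, id) = s.split_once(':')?;
            if name == "cuda" || name == "gpu" {
                Some((true, id.parse().ok()?))
            } else {
                None
            }
        }
    }
}

fn main() {
    assert_eq!(parse_device("cpu"), Some((false, 0)));
    assert_eq!(parse_device("gpu"), Some((true, 0)));
    assert_eq!(parse_device("cuda:3"), Some((true, 3)));
    assert_eq!(parse_device("tpu"), None);
    println!("ok");
}
```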

See Also