Operations over Tensor Collections

This section highlights some important utility functions and classes used in the codebase for working with collections of tensors.


Most models in robomimic operate on nested tensor dictionaries, both for inputs and for training labels. We provide a suite of utilities for working with these dictionaries in robomimic.utils.tensor_utils. Consider a nested dictionary of observations stored as numpy arrays:

import numpy as np

x = {
    'image': np.random.randn(3, 224, 224),
    'proprio': {
        'eef_pos': np.random.randn(3),
        'eef_rot': np.random.randn(3)
    }
}

We can use robomimic.utils.tensor_utils to convert the arrays to PyTorch tensors, add a batch dimension, and move them to the GPU:

import torch
import robomimic.utils.tensor_utils as TensorUtils

# convert all numpy arrays in a nested dictionary, list, or tuple to torch tensors
x = TensorUtils.to_tensor(x)

# add a batch dimension to all tensors in the dict
x = TensorUtils.to_batch(x)

# send all nested tensors to the GPU if one is available, otherwise keep them on the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
x = TensorUtils.to_device(x, device)

The library also supports nontrivial shape operations on the nested dict. For example:

# create a new dimension at dim=1 and expand it to size 10
x = TensorUtils.unsqueeze_expand_at(x, size=10, dim=1)
# x["image"].shape == torch.Size([1, 10, 3, 224, 224])

# repeat the 0-th dimension 10 times
x = TensorUtils.repeat_by_expand_at(x, repeats=10, dim=0)
# x["image"].shape == torch.Size([10, 10, 3, 224, 224])

# for each batch element, gather one entry along the sequence dimension (dim=1) by index
x = TensorUtils.gather_sequence(x, indices=torch.arange(10, device=x["image"].device))
# x["image"].shape == torch.Size([10, 3, 224, 224])
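To make the shape semantics of these three operations concrete, the following is a NumPy-only analogue applied to a single array. It is a sketch for illustration rather than robomimic's implementation, and it assumes the behavior described in the comments above (insert and expand a dimension, repeat along a dimension, then pick one sequence entry per batch element).

```python
import numpy as np

# A single "image" array with a batch dimension, shape [1, 3, 224, 224]
img = np.zeros((1, 3, 224, 224), dtype=np.float32)

# Analogue of unsqueeze_expand_at(size=10, dim=1):
# insert a new axis at position 1, then broadcast it to length 10
expanded = np.expand_dims(img, axis=1)                      # [1, 1, 3, 224, 224]
expanded = np.broadcast_to(expanded, (1, 10, 3, 224, 224))  # [1, 10, 3, 224, 224]

# Analogue of repeat_by_expand_at(repeats=10, dim=0):
# repeat every entry along axis 0 ten times
repeated = np.repeat(expanded, 10, axis=0)                  # [10, 10, 3, 224, 224]

# Analogue of gather_sequence(indices=arange(10)):
# for batch element b, keep only the entry at sequence position indices[b]
indices = np.arange(10)
gathered = repeated[np.arange(repeated.shape[0]), indices]  # [10, 3, 224, 224]
```

The TensorUtils functions apply the same kind of operation to every tensor in the nested structure, so each leaf goes through the shape change shown here.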

In addition, map_tensor applies an arbitrary function to every tensor in a nested dictionary or list of tensors and returns the results in the same nested structure:

x = TensorUtils.map_tensor(x, your_func)
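The idea behind such a mapping can be sketched as a short recursive function. The helper below, map_nested, is a hypothetical stand-in written with NumPy arrays as leaves; it is not the library's code and only illustrates how the nesting is preserved while a function is applied to each leaf.

```python
import numpy as np

def map_nested(x, func):
    """Apply func to every array leaf of a nested dict/list/tuple (hypothetical helper)."""
    if isinstance(x, dict):
        return {k: map_nested(v, func) for k, v in x.items()}
    if isinstance(x, (list, tuple)):
        return type(x)(map_nested(v, func) for v in x)
    if isinstance(x, np.ndarray):
        return func(x)
    return x  # leave non-array leaves untouched

obs = {
    "image": np.zeros((3, 224, 224)),
    "proprio": {"eef_pos": np.zeros(3), "eef_rot": np.zeros(3)},
}

# add a leading batch dimension to every array, keeping the nesting intact
batched = map_nested(obs, lambda a: a[None, ...])
```

Many of the utilities above (conversion, batching, device transfer) can be viewed as this pattern with a different per-leaf function.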

The complete documentation of robomimic.utils.tensor_utils.py is available here.


robomimic.utils.obs_utils implements utility functions that preprocess different observation modalities, such as images, as well as functions that determine the type of each observation so that suitable encoder network architectures can be created. The most important functions are listed below.

  • initialize_obs_utils_with_obs_specs(obs_modality_specs)

    This function initializes a global registry that maps observation key names to observation modalities (e.g. which ones are low-dimensional and which ones are rgb images). For example, given an obs_modality_specs of the following format:

        {
            "obs": {
                "low_dim": ["robot0_eef_pos", "robot0_eef_quat"],
                "rgb": ["agentview_image", "robot0_eye_in_hand"],
            },
            "goal": {
                "low_dim": ["robot0_eef_pos"],
                "rgb": ["agentview_image"]
            }
        }

    The function will create a mapping between observation names such as 'agentview_image' and observation modalities such as 'rgb'. The registry is stored in OBS_MODALITIES_TO_KEYS and can be accessed globally. Utility functions such as key_is_obs_modality() rely on this global registry to determine observation modalities.

  • process_obs(obs_dict)

    Preprocess a dictionary of observations to be fed to a neural network. For example, image observations are cast to float format, rescaled to the range [0, 1], and axis-transposed to [C, H, W] format.

  • unprocess_obs(obs_dict)

    Revert the preprocessing transformation applied to observations by process_obs. Useful for converting images back to uint8 for efficient storage.

  • normalize_obs(obs_dict, obs_normalization_stats)

    Normalize observations using the mean and standard deviation given in obs_normalization_stats for each observation key (per dimension), so that each dimension has zero mean and unit variance.
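The functions listed above can be illustrated with a small NumPy sketch. Everything in it is an assumption made for illustration: the names modality_to_keys, key_is_modality, process_image, unprocess_image, and normalize are hypothetical, the raw image is assumed to be uint8 in [H, W, C] layout, and the statistics are assumed to be stored under "mean" and "std". It is not the library's implementation.

```python
import numpy as np

# Hypothetical spec in the same format as obs_modality_specs above
specs = {
    "obs": {"low_dim": ["robot0_eef_pos"], "rgb": ["agentview_image"]},
    "goal": {"low_dim": ["robot0_eef_pos"], "rgb": ["agentview_image"]},
}

# Build a modality -> keys registry by merging all groups ("obs", "goal")
modality_to_keys = {}
for group in specs.values():
    for modality, keys in group.items():
        bucket = modality_to_keys.setdefault(modality, [])
        for key in keys:
            if key not in bucket:
                bucket.append(key)

def key_is_modality(key, modality):
    """Return True if the key is registered under the given modality."""
    return key in modality_to_keys.get(modality, [])

def process_image(img):
    """uint8 [H, W, C] in [0, 255] -> float32 [C, H, W] in [0, 1]."""
    return np.transpose(img.astype(np.float32) / 255.0, (2, 0, 1))

def unprocess_image(img):
    """float [C, H, W] in [0, 1] -> uint8 [H, W, C] in [0, 255]."""
    return np.round(np.transpose(img, (1, 2, 0)) * 255.0).astype(np.uint8)

def normalize(x, mean, std):
    """Shift and scale each dimension to zero mean and unit variance."""
    return (x - mean) / std

# Round trip for an image observation
raw = np.random.randint(0, 256, size=(84, 84, 3), dtype=np.uint8)
proc = process_image(raw)
restored = unprocess_image(proc)

# Normalize a low-dimensional observation with per-dimension statistics
pos = np.array([[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]])
stats = {"mean": pos.mean(axis=0), "std": pos.std(axis=0)}
normed = normalize(pos, stats["mean"], stats["std"])
```

In this sketch the statistics are computed from the same small array for convenience; in practice they would be computed once over a dataset and then passed in, which matches the obs_normalization_stats argument in the signature above.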