robomimic.models package#

Submodules#

robomimic.models.base_nets module#

Contains torch Modules that correspond to basic network building blocks, like MLP, RNN, and CNN backbones.

class robomimic.models.base_nets.Conv1dBase(input_channel=1, activation='relu', out_channels=(32, 64, 64), kernel_size=(8, 4, 2), stride=(4, 2, 1), **conv_kwargs)#

Bases: robomimic.models.base_nets.Module

Base class for stacked Conv1d layers.

Parameters:
  • input_channel (int) – Number of channels for inputs to this network

  • activation (None or str) – Per-layer activation to use. Defaults to “relu”. Valid options are currently {relu, None}, where None applies no activation

  • out_channels (list of int) – Output channel size for each sequential Conv1d layer

  • kernel_size (list of int) – Kernel sizes for each sequential Conv1d layer

  • stride (list of int) – Stride sizes for each sequential Conv1d layer

  • conv_kwargs (dict) – additional nn.Conv1d arguments to use, in list form, where the ith element corresponds to the argument passed to the ith Conv1d layer. See https://pytorch.org/docs/stable/generated/torch.nn.Conv1d.html for specific possible arguments.

forward(inputs)#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

output_shape(input_shape)#

Function to compute output shape from inputs to this module.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

training: bool#
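A minimal usage sketch (the batch size, channel count, and sequence length below are illustrative assumptions, not values from the source):

    import torch
    from robomimic.models.base_nets import Conv1dBase

    # stack of three Conv1d layers with ReLU activations, using the defaults shown above
    net = Conv1dBase(input_channel=1, out_channels=(32, 64, 64), kernel_size=(8, 4, 2), stride=(4, 2, 1))

    x = torch.randn(8, 1, 50)          # assumed input: batch of 8 sequences, 1 channel, length 50
    y = net(x)                         # convolved features of shape [8, 64, L_out]
    print(net.output_shape([1, 50]))   # per-sample output shape (no batch dimension)
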
class robomimic.models.base_nets.ConvBase#

Bases: robomimic.models.base_nets.Module

Base class for ConvNets.

forward(inputs)#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

output_shape(input_shape)#

Function to compute output shape from inputs to this module.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

training: bool#
class robomimic.models.base_nets.CoordConv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', coord_encoding='position')#

Bases: torch.nn.modules.conv.Conv2d, robomimic.models.base_nets.Module

2D Coordinate Convolution

Source: An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution https://arxiv.org/abs/1807.03247 (i.e. adds 2 channels per input feature map corresponding to the (x, y) location on the map)

bias: Optional[torch.Tensor]#
dilation: Tuple[int, ...]#
forward(input)#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

groups: int#
kernel_size: Tuple[int, ...]#
out_channels: int#
output_padding: Tuple[int, ...]#
output_shape(input_shape)#

Function to compute output shape from inputs to this module.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

padding: Tuple[int, ...]#
padding_mode: str#
stride: Tuple[int, ...]#
transposed: bool#
weight: torch.Tensor#
class robomimic.models.base_nets.FeatureAggregator(dim=1, agg_type='avg')#

Bases: robomimic.models.base_nets.Module

Helpful class for aggregating features across a dimension. This is useful in practice when training models that break an input image up into several patches, since features can be extracted per-patch using the same encoder and then aggregated using this module.

clear_weight()#
forward(x)#

Forward pooling pass.

output_shape(input_shape)#

Function to compute output shape from inputs to this module.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

set_weight(w)#
training: bool#
class robomimic.models.base_nets.MLP(input_dim, output_dim, layer_dims=(), layer_func=<class 'torch.nn.modules.linear.Linear'>, layer_func_kwargs=None, activation=<class 'torch.nn.modules.activation.ReLU'>, dropouts=None, normalization=False, output_activation=None)#

Bases: robomimic.models.base_nets.Module

Base class for simple Multi-Layer Perceptrons.

forward(inputs)#

Forward pass.

output_shape(input_shape=None)#

Function to compute output shape from inputs to this module.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

training: bool#
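A minimal usage sketch (the dimensions are illustrative assumptions):

    import torch
    from robomimic.models.base_nets import MLP

    # 10-dim input -> two hidden layers of 64 units (ReLU) -> 2-dim output
    net = MLP(input_dim=10, output_dim=2, layer_dims=(64, 64))

    x = torch.randn(16, 10)       # batch of 16 inputs
    y = net(x)                    # output of shape [16, 2]
    print(net.output_shape())     # [2]
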
class robomimic.models.base_nets.MVPConv(input_channel=3, mvp_model_class='vitb-mae-egosoup', freeze=True)#

Bases: robomimic.models.base_nets.ConvBase

Base class for ConvNets pretrained with MVP (https://arxiv.org/abs/2203.06173)

forward(inputs)#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

output_shape(input_shape)#

Function to compute output shape from inputs to this module.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

training: bool#
class robomimic.models.base_nets.Module#

Bases: torch.nn.modules.module.Module

Base class for networks. The only difference from torch.nn.Module is that it requires implementing @output_shape.

abstract output_shape(input_shape=None)#

Function to compute output shape from inputs to this module.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

training: bool#
class robomimic.models.base_nets.Parameter(init_tensor)#

Bases: robomimic.models.base_nets.Module

A class that is a thin wrapper around a torch.nn.Parameter to make for easy saving and optimization.

forward(inputs=None)#

Forward call just returns the parameter tensor.

output_shape(input_shape=None)#

Function to compute output shape from inputs to this module.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

training: bool#
class robomimic.models.base_nets.R3MConv(input_channel=3, r3m_model_class='resnet18', freeze=True)#

Bases: robomimic.models.base_nets.ConvBase

Base class for ConvNets pretrained with R3M (https://arxiv.org/abs/2203.12601)

output_shape(input_shape)#

Function to compute output shape from inputs to this module.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

training: bool#
class robomimic.models.base_nets.RNN_Base(input_dim, rnn_hidden_dim, rnn_num_layers, rnn_type='LSTM', rnn_kwargs=None, per_step_net=None)#

Bases: robomimic.models.base_nets.Module

A wrapper class for a multi-step RNN and a per-step network.

forward(inputs, rnn_init_state=None, return_state=False)#

Forward a sequence of inputs through the RNN and the per-step network.

Parameters:
  • inputs (torch.Tensor) – tensor input of shape [B, T, D], where D is the RNN input size

  • rnn_init_state – rnn hidden state, initialize to zero state if set to None

  • return_state (bool) – whether to return hidden state

Returns:

outputs of the per_step_net

rnn_state: return rnn state at the end if return_state is set to True

Return type:

outputs

forward_step(inputs, rnn_state)#

Forward a single step input through the RNN and per-step network, and return the new hidden state.

Parameters:
  • inputs (torch.Tensor) – tensor input of shape [B, D], where D is the RNN input size

  • rnn_state – rnn hidden state, initialize to zero state if set to None

Returns:

outputs of the per_step_net

rnn_state: return the new rnn state

Return type:

outputs

get_rnn_init_state(batch_size, device)#

Get a default RNN state (zeros).

Parameters:
  • batch_size (int) – batch size dimension

  • device – device the hidden state should be sent to.

Returns:

returns hidden state tensor or tuple of hidden state tensors

depending on the RNN type

Return type:

hidden_state (torch.Tensor or tuple)

output_shape(input_shape)#

Function to compute output shape from inputs to this module.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

property rnn_type#
training: bool#
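A minimal sketch of running a sequence batch through RNN_Base (the dimensions, and leaving per_step_net unset, are assumptions):

    import torch
    from robomimic.models.base_nets import RNN_Base

    rnn = RNN_Base(input_dim=10, rnn_hidden_dim=32, rnn_num_layers=2, rnn_type="LSTM")

    x = torch.randn(4, 20, 10)                                   # [B, T, D] sequence batch
    h0 = rnn.get_rnn_init_state(batch_size=4, device=x.device)   # zero hidden state
    out, h_last = rnn(x, rnn_init_state=h0, return_state=True)
    # with no per_step_net, `out` is the raw RNN output of shape [4, 20, 32]
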
class robomimic.models.base_nets.ResNet18Conv(input_channel=3, pretrained=False, input_coord_conv=False)#

Bases: robomimic.models.base_nets.ConvBase

A ResNet18 block that can be used to process input images.

output_shape(input_shape)#

Function to compute output shape from inputs to this module.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

training: bool#
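A minimal usage sketch (the image resolution is an assumption):

    import torch
    from robomimic.models.base_nets import ResNet18Conv

    net = ResNet18Conv(input_channel=3, pretrained=False)

    imgs = torch.randn(8, 3, 224, 224)       # [B, C, H, W]
    feats = net(imgs)                        # spatial feature maps of shape [8, 512, 7, 7]
    print(net.output_shape([3, 224, 224]))   # per-sample output shape, no batch dimension
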
class robomimic.models.base_nets.Sequential(*args, has_output_shape=True)#

Bases: torch.nn.modules.container.Sequential, robomimic.models.base_nets.Module

Compose multiple Modules together (defined above).

freeze()#
output_shape(input_shape=None)#

Function to compute output shape from inputs to this module.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

train(mode)#

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters:

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns:

self

Return type:

Module

training: bool#
class robomimic.models.base_nets.ShallowConv(input_channel=3, output_channel=32)#

Bases: robomimic.models.base_nets.ConvBase

A shallow convolutional encoder from https://rll.berkeley.edu/dsae/dsae.pdf

output_shape(input_shape)#

Function to compute output shape from inputs to this module.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

training: bool#
class robomimic.models.base_nets.SpatialMeanPool(input_shape)#

Bases: robomimic.models.base_nets.Module

Module that averages inputs across all spatial dimensions (dimension 2 and after), leaving only the batch and channel dimensions.

forward(inputs)#

Forward pass - average across all dimensions except batch and channel.

output_shape(input_shape=None)#

Function to compute output shape from inputs to this module.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

training: bool#
class robomimic.models.base_nets.SpatialSoftmax(input_shape, num_kp=32, temperature=1.0, learnable_temperature=False, output_variance=False, noise_std=0.0)#

Bases: robomimic.models.base_nets.ConvBase

Spatial Softmax Layer.

Based on Deep Spatial Autoencoders for Visuomotor Learning by Finn et al. https://rll.berkeley.edu/dsae/dsae.pdf

forward(feature)#

Forward pass through spatial softmax layer. For each keypoint, a 2D spatial probability distribution is created using a softmax, where the support is the pixel locations. This distribution is used to compute the expected value of the pixel location, which becomes a keypoint of dimension 2. K such keypoints are created.

Returns:

mean keypoints of shape [B, K, 2], and possibly

keypoint variance of shape [B, K, 2, 2] corresponding to the covariance under the 2D spatial softmax distribution

Return type:

out (torch.Tensor or tuple)

output_shape(input_shape)#

Function to compute output shape from inputs to this module.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

training: bool#
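A minimal sketch of turning convolutional feature maps into keypoints (the feature-map shape and keypoint count are assumptions):

    import torch
    from robomimic.models.base_nets import SpatialSoftmax

    # expects per-sample inputs of shape [C, H, W]; produces num_kp keypoints of dimension 2
    pool = SpatialSoftmax(input_shape=[512, 7, 7], num_kp=32)

    feats = torch.randn(8, 512, 7, 7)
    kp = pool(feats)                         # expected pixel locations, shape [8, 32, 2]
    print(pool.output_shape([512, 7, 7]))    # [32, 2]
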
class robomimic.models.base_nets.Squeeze(dim)#

Bases: robomimic.models.base_nets.Module

Trivial class that squeezes the input. Useful for including in an nn.Sequential network

forward(x)#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

output_shape(input_shape=None)#

Function to compute output shape from inputs to this module.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

training: bool#
class robomimic.models.base_nets.Unsqueeze(dim)#

Bases: robomimic.models.base_nets.Module

Trivial class that unsqueezes the input. Useful for including in an nn.Sequential network

forward(x)#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

output_shape(input_shape=None)#

Function to compute output shape from inputs to this module.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

training: bool#
robomimic.models.base_nets.rnn_args_from_config(rnn_config)#

Takes a Config object corresponding to RNN settings (for example config.algo.rnn in BCConfig) and extracts rnn kwargs for instantiating rnn networks.

robomimic.models.base_nets.transformer_args_from_config(transformer_config)#

Takes a Config object corresponding to Transformer settings (for example config.algo.transformer in BCConfig) and extracts transformer kwargs for instantiating transformer networks.

robomimic.models.distributions module#

Contains distribution models used as parts of other networks. These classes usually inherit or emulate torch distributions.

class robomimic.models.distributions.DiscreteValueDistribution(values, probs=None, logits=None)#

Bases: object

Extension to torch categorical probability distribution in order to keep track of the support (categorical values, or in this case, value atoms). This is used for distributional value networks.

property logits#
mean()#

Categorical distribution mean, taking the value support into account.

property probs#
sample(sample_shape=torch.Size([]))#

Sample from the distribution. Make sure to return value atoms, not categorical class indices.

property values#
variance()#

Categorical distribution variance, taking the value support into account.

class robomimic.models.distributions.TanhWrappedDistribution(base_dist, scale=1.0, epsilon=1e-06)#

Bases: torch.distributions.distribution.Distribution

Class that wraps another valid torch distribution, such that sampled values from the base distribution are passed through a tanh layer. The corresponding (log) probabilities are also modified accordingly. Tanh Normal distribution - adapted from rlkit and CQL codebase (https://github.com/aviralkumar2907/CQL/blob/d67dbe9cf5d2b96e3b462b6146f249b3d6569796/d4rl/rlkit/torch/distributions.py#L6).

log_prob(value, pre_tanh_value=None)#
Parameters:
  • value (torch.Tensor) – some tensor to compute log probabilities for

  • pre_tanh_value – If specified, will not calculate atanh manually from @value. More numerically stable

property mean#

Returns the mean of the distribution.

rsample(sample_shape=torch.Size([]), return_pretanh_value=False)#

Sampling in the reparameterization case - for differentiable samples.

sample(sample_shape=torch.Size([]), return_pretanh_value=False)#

Gradients will not and should not pass through this operation. See https://github.com/pytorch/pytorch/issues/4620 for discussion.

property stddev#

Returns the standard deviation of the distribution.
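A minimal sketch of wrapping a torch distribution (the diagonal Gaussian base distribution and the shapes are assumptions):

    import torch
    import torch.distributions as D
    from robomimic.models.distributions import TanhWrappedDistribution

    # diagonal Gaussian over a 7-dim action, batch of 4
    base = D.Independent(D.Normal(torch.zeros(4, 7), torch.ones(4, 7)), 1)
    dist = TanhWrappedDistribution(base_dist=base, scale=1.0)

    a = dist.sample()          # samples squashed into (-1, 1) by tanh
    logp = dist.log_prob(a)    # log-probabilities corrected for the tanh transform
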

robomimic.models.obs_core module#

Contains torch Modules for core observation processing blocks such as encoders (e.g. EncoderCore, VisualCore, ScanCore, …) and randomizers (e.g. Randomizer, CropRandomizer).

class robomimic.models.obs_core.ColorRandomizer(input_shape, brightness=0.3, contrast=0.3, saturation=0.3, hue=0.3, num_samples=1)#

Bases: robomimic.models.obs_core.Randomizer

Randomly sample color jitter at input, and then average across color jitters at output.

get_batch_transform(N)#

Generates a batch transform, where each set of sample(s) along the batch (first) dimension will have the same @N unique ColorJitter transforms applied.

Parameters:

N (int) – Number of ColorJitter transforms to apply per set of sample(s) along the batch (first) dimension

Returns:

Aggregated transform which will automatically apply a different ColorJitter transform to each subset of samples along the batch dimension, assumed to be the FIRST dimension in the input tensor. Note: This function will MULTIPLY the first dimension by N.

Return type:

Lambda

get_transform()#

Get a randomized transform to be applied on image.

Implementation taken directly from:

https://github.com/pytorch/vision/blob/2f40a483d73018ae6e1488a484c5927f2b309969/torchvision/transforms/transforms.py#L1053-L1085

Returns:

Transform which randomly adjusts brightness, contrast and saturation in a random order.

Return type:

Transform

output_shape_in(input_shape=None)#

Function to compute output shape from inputs to this module. Corresponds to the @forward_in operation, where raw inputs (usually observation modalities) are passed in.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

output_shape_out(input_shape=None)#

Function to compute output shape from inputs to this module. Corresponds to the @forward_out operation, where processed inputs (usually encoded observation modalities) are passed in.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

training: bool#
class robomimic.models.obs_core.CropRandomizer(input_shape, crop_height=76, crop_width=76, num_crops=1, pos_enc=False)#

Bases: robomimic.models.obs_core.Randomizer

Randomly sample crops at input, and then average across crop features at output.

output_shape_in(input_shape=None)#

Function to compute output shape from inputs to this module. Corresponds to the @forward_in operation, where raw inputs (usually observation modalities) are passed in.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

output_shape_out(input_shape=None)#

Function to compute output shape from inputs to this module. Corresponds to the @forward_out operation, where processed inputs (usually encoded observation modalities) are passed in.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

training: bool#
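A minimal sketch of how the randomizer brackets an encoder (the image size, crop size, and the stand-in flatten "encoder" are assumptions):

    import torch
    from robomimic.models.obs_core import CropRandomizer

    rand = CropRandomizer(input_shape=[3, 84, 84], crop_height=76, crop_width=76, num_crops=1)

    imgs = torch.randn(8, 3, 84, 84)
    x = rand.forward_in(imgs)         # sampled crops, shape [8 * num_crops, 3, 76, 76]
    feats = x.flatten(start_dim=1)    # stand-in for per-crop encoder features [8 * num_crops, D]
    out = rand.forward_out(feats)     # features averaged across crops, shape [8, D]
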
class robomimic.models.obs_core.EncoderCore(input_shape)#

Bases: robomimic.models.base_nets.Module

Abstract class used to categorize all cores used to encode observations

training: bool#
class robomimic.models.obs_core.GaussianNoiseRandomizer(input_shape, noise_mean=0.0, noise_std=0.3, limits=None, num_samples=1)#

Bases: robomimic.models.obs_core.Randomizer

Randomly sample Gaussian noise at input, and then average across noise samples at output.

output_shape_in(input_shape=None)#

Function to compute output shape from inputs to this module. Corresponds to the @forward_in operation, where raw inputs (usually observation modalities) are passed in.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

output_shape_out(input_shape=None)#

Function to compute output shape from inputs to this module. Corresponds to the @forward_out operation, where processed inputs (usually encoded observation modalities) are passed in.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

training: bool#
class robomimic.models.obs_core.Randomizer#

Bases: robomimic.models.base_nets.Module

Base class for randomizer networks. Each randomizer should implement the @output_shape_in, @output_shape_out, @forward_in, and @forward_out methods. The randomizer’s @forward_in method is invoked on raw inputs, and @forward_out is invoked on processed inputs (usually processed by a @VisualCore instance). Note that the self.training property can be used to change the randomizer’s behavior at train vs. test time.

forward_in(inputs)#

Randomize raw inputs if training.

forward_out(inputs)#

Processing for network outputs.

output_shape(input_shape=None)#

This function is unused. See @output_shape_in and @output_shape_out.

abstract output_shape_in(input_shape=None)#

Function to compute output shape from inputs to this module. Corresponds to the @forward_in operation, where raw inputs (usually observation modalities) are passed in.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

abstract output_shape_out(input_shape=None)#

Function to compute output shape from inputs to this module. Corresponds to the @forward_out operation, where processed inputs (usually encoded observation modalities) are passed in.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

training: bool#
class robomimic.models.obs_core.ScanCore(input_shape, conv_kwargs=None, conv_activation='relu', pool_class=None, pool_kwargs=None, flatten=True, feature_dimension=None)#

Bases: robomimic.models.obs_core.EncoderCore, robomimic.models.base_nets.ConvBase

A network block that combines a Conv1D backbone network with optional pooling and linear layers.

forward(inputs)#

Forward pass through visual core.

output_shape(input_shape)#

Function to compute output shape from inputs to this module.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

training: bool#
class robomimic.models.obs_core.VisualCore(input_shape, backbone_class='ResNet18Conv', pool_class='SpatialSoftmax', backbone_kwargs=None, pool_kwargs=None, flatten=True, feature_dimension=64)#

Bases: robomimic.models.obs_core.EncoderCore, robomimic.models.base_nets.ConvBase

A network block that combines a visual backbone network with optional pooling and linear layers.

forward(inputs)#

Forward pass through visual core.

output_shape(input_shape)#

Function to compute output shape from inputs to this module.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

training: bool#
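A minimal end-to-end sketch (the image size, keypoint count, and feature dimension are assumptions):

    import torch
    from robomimic.models.obs_core import VisualCore

    net = VisualCore(
        input_shape=(3, 84, 84),
        backbone_class="ResNet18Conv",     # resolved to the base_nets class of the same name
        pool_class="SpatialSoftmax",
        pool_kwargs={"num_kp": 32},
        flatten=True,
        feature_dimension=64,
    )

    imgs = torch.randn(8, 3, 84, 84)
    feats = net(imgs)                      # flat visual features of shape [8, 64]
    print(net.output_shape((3, 84, 84)))   # [64]
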

robomimic.models.obs_nets module#

Contains torch Modules that help deal with inputs consisting of multiple modalities. This is extremely common when networks must deal with one or more observation dictionaries, where each input dictionary can have observation keys of a certain modality and shape.

As an example, an observation could consist of a flat “robot0_eef_pos” observation key, and a 3-channel RGB “agentview_image” observation key.

class robomimic.models.obs_nets.MIMO_MLP(input_obs_group_shapes, output_shapes, layer_dims, layer_func=<class 'torch.nn.modules.linear.Linear'>, activation=<class 'torch.nn.modules.activation.ReLU'>, encoder_kwargs=None)#

Bases: robomimic.models.base_nets.Module

Extension to MLP to accept multiple observation dictionaries as input and to output dictionaries of tensors. Inputs are specified as a dictionary of observation dictionaries, with each key corresponding to an observation group.

This module utilizes @ObservationGroupEncoder to process the multiple input dictionaries and @ObservationDecoder to generate tensor dictionaries. The default behavior for encoding the inputs is to process visual inputs with a learned CNN and concatenate the flat encodings with the other flat inputs. The default behavior for generating outputs is to use a linear layer branch to produce each modality separately (including visual outputs).

forward(**inputs)#

Process each set of inputs in its own observation group.

Parameters:

inputs (dict) – a dictionary of dictionaries with one dictionary per observation group. Each observation group’s dictionary should map modality to torch.Tensor batches. Should be consistent with @self.input_obs_group_shapes.

Returns:

dictionary of output torch.Tensors, that corresponds

to @self.output_shapes

Return type:

outputs (dict)

output_shape(input_shape=None)#

Returns output shape for this module, which is a dictionary instead of a list since outputs are dictionaries.

training: bool#
class robomimic.models.obs_nets.MIMO_Transformer(input_obs_group_shapes, output_shapes, transformer_embed_dim, transformer_num_layers, transformer_num_heads, transformer_context_length, transformer_emb_dropout=0.1, transformer_attn_dropout=0.1, transformer_block_output_dropout=0.1, transformer_sinusoidal_embedding=False, transformer_activation='gelu', transformer_nn_parameter_for_timesteps=False, encoder_kwargs=None)#

Bases: robomimic.models.base_nets.Module

Extension to Transformer (based on GPT architecture) to accept multiple observation dictionaries as input and to output dictionaries of tensors. Inputs are specified as a dictionary of observation dictionaries, with each key corresponding to an observation group. This module utilizes @ObservationGroupEncoder to process the multiple input dictionaries and @ObservationDecoder to generate tensor dictionaries. The default behavior for encoding the inputs is to process visual inputs with a learned CNN and concatenate the flat encodings with the other flat inputs. The default behavior for generating outputs is to use a linear layer branch to produce each modality separately (including visual outputs).

embed_timesteps(embeddings)#

Computes timestep-based embeddings (aka positional embeddings) to add to embeddings.

Parameters:

embeddings (torch.Tensor) – embeddings before positional embeddings are added

Returns:

positional embeddings to add to embeddings

Return type:

time_embeddings (torch.Tensor)

forward(**inputs)#

Process each set of inputs in its own observation group.

Parameters:

inputs (dict) – a dictionary of dictionaries with one dictionary per observation group. Each observation group’s dictionary should map modality to torch.Tensor batches. Should be consistent with @self.input_obs_group_shapes. First two leading dimensions should be batch and time [B, T, …] for each tensor.

Returns:

dictionary of output torch.Tensors, that corresponds

to @self.output_shapes. Leading dimensions will be batch and time [B, T, …] for each tensor.

Return type:

outputs (dict)

input_embedding(inputs)#

Process encoded observations into embeddings to pass to the transformer, adding timestep-based embeddings (aka positional embeddings) to the inputs.

Parameters:

inputs (torch.Tensor) – outputs from observation encoder

Returns:

input embeddings to pass to transformer backbone.

Return type:

embeddings (torch.Tensor)

output_shape(input_shape=None)#

Returns output shape for this module, which is a dictionary instead of a list since outputs are dictionaries.

training: bool#
class robomimic.models.obs_nets.ObservationDecoder(decode_shapes, input_feat_dim)#

Bases: robomimic.models.base_nets.Module

Module that can generate observation outputs by modality. Inputs are assumed to be flat (usually outputs from some hidden layer). Each observation output is generated with a linear layer from these flat inputs. Subclass this module in order to implement more complex schemes for generating each modality.

forward(feats)#

Predict each modality from input features, and reshape to each modality’s shape.

output_shape(input_shape=None)#

Returns output shape for this module, which is a dictionary instead of a list since outputs are dictionaries.

training: bool#
class robomimic.models.obs_nets.ObservationEncoder(feature_activation=<class 'torch.nn.modules.activation.ReLU'>)#

Bases: robomimic.models.base_nets.Module

Module that processes inputs by observation key and then concatenates the processed observation keys together. Each key is processed with an encoder head network. Call @register_obs_key to register observation keys with the encoder and then finally call @make to create the encoder networks.

forward(obs_dict)#

Processes modalities according to the ordering in @self.obs_shapes. For each modality, it is processed with a randomizer (if present), an encoder network (if present), and again with the randomizer (if present), flattened, and then concatenated with the other processed modalities.

Parameters:

obs_dict (OrderedDict) – dictionary that maps modalities to torch.Tensor batches that agree with @self.obs_shapes. All modalities in @self.obs_shapes must be present, but additional modalities can also be present.

Returns:

flat features of shape [B, D]

Return type:

feats (torch.Tensor)

make()#

Creates the encoder networks and locks the encoder so that more modalities cannot be added.

output_shape(input_shape=None)#

Compute the output shape of the encoder.

register_obs_key(name, shape, net_class=None, net_kwargs=None, net=None, randomizer=None, share_net_from=None)#

Register an observation key that this encoder should be responsible for.

Parameters:
  • name (str) – modality name

  • shape (int tuple) – shape of modality

  • net_class (str) – name of class in base_nets.py that should be used to process this observation key before concatenation. Pass None to flatten and concatenate the observation key directly.

  • net_kwargs (dict) – arguments to pass to @net_class

  • net (Module instance) – if provided, use this Module to process the observation key instead of creating a different net

  • randomizer (Randomizer instance) – if provided, use this Module to augment observation keys coming in to the encoder, and possibly augment the processed output as well

  • share_net_from (str) – if provided, use the same instance of @net_class as another observation key. This observation key must already exist in this encoder. Warning: Note that this does not share the observation key randomizer

training: bool#
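A minimal sketch of the register_obs_key / make / forward workflow described above. The observation key names, shapes, and the explicit VisualCore head are assumptions; in practice encoders are usually built from a config via @obs_encoder_factory.

    import torch
    from collections import OrderedDict
    from robomimic.models.obs_nets import ObservationEncoder
    from robomimic.models.obs_core import VisualCore

    enc = ObservationEncoder()

    # flat key: no encoder head, just flatten and concatenate
    enc.register_obs_key(name="robot0_eef_pos", shape=(3,))

    # image key: encode with an explicit VisualCore instance
    enc.register_obs_key(
        name="agentview_image",
        shape=(3, 84, 84),
        net=VisualCore(input_shape=(3, 84, 84), feature_dimension=64),
    )

    enc.make()   # lock the encoder and build the networks

    obs = OrderedDict(
        robot0_eef_pos=torch.randn(8, 3),
        agentview_image=torch.randn(8, 3, 84, 84),
    )
    feats = enc(obs)            # concatenated flat features of shape [8, D]
    print(enc.output_shape())   # total flat feature dimension
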
class robomimic.models.obs_nets.ObservationGroupEncoder(observation_group_shapes, feature_activation=<class 'torch.nn.modules.activation.ReLU'>, encoder_kwargs=None)#

Bases: robomimic.models.base_nets.Module

This class allows networks to encode multiple observation dictionaries into a single flat, concatenated vector representation. It does this by assigning each observation dictionary (observation group) an @ObservationEncoder object.

The class takes a dictionary of dictionaries, @observation_group_shapes. Each key corresponds to an observation group (e.g. ‘obs’, ‘subgoal’, ‘goal’) and each OrderedDict should be a map between modalities and expected input shapes (e.g. { ‘image’ : (3, 120, 160) }).

forward(**inputs)#

Process each set of inputs in its own observation group.

Parameters:

inputs (dict) – dictionary that maps observation groups to observation dictionaries of torch.Tensor batches that agree with @self.observation_group_shapes. All observation groups in @self.observation_group_shapes must be present, but additional observation groups can also be present. Note that these are specified as kwargs for ease of use with networks that name each observation stream in their forward calls.

Returns:

flat outputs of shape [B, D]

Return type:

outputs (torch.Tensor)

output_shape()#

Compute the output shape of this encoder.

training: bool#
class robomimic.models.obs_nets.RNN_MIMO_MLP(input_obs_group_shapes, output_shapes, mlp_layer_dims, rnn_hidden_dim, rnn_num_layers, rnn_type='LSTM', rnn_kwargs=None, mlp_activation=<class 'torch.nn.modules.activation.ReLU'>, mlp_layer_func=<class 'torch.nn.modules.linear.Linear'>, per_step=True, encoder_kwargs=None)#

Bases: robomimic.models.base_nets.Module

A wrapper class for a multi-step RNN and a per-step MLP and a decoder.

Structure: [encoder -> rnn -> mlp -> decoder]

All temporal inputs are processed by a shared @ObservationGroupEncoder, followed by an RNN, and then a per-step multi-output MLP.

forward(rnn_init_state=None, return_state=False, **inputs)#
Parameters:
  • inputs (dict) – a dictionary of dictionaries with one dictionary per observation group. Each observation group’s dictionary should map modality to torch.Tensor batches. Should be consistent with @self.input_obs_group_shapes. First two leading dimensions should be batch and time [B, T, …] for each tensor.

  • rnn_init_state – rnn hidden state, initialize to zero state if set to None

  • return_state (bool) – whether to return hidden state

Returns:

dictionary of output torch.Tensors, that corresponds

to @self.output_shapes. Leading dimensions will be batch and time [B, T, …] for each tensor.

rnn_state (torch.Tensor or tuple): return the new rnn state (if @return_state)

Return type:

outputs (dict)

forward_step(rnn_state, **inputs)#

Unroll network over a single timestep.

Parameters:
  • inputs (dict) – expects same modalities as @self.input_shapes, with additional batch dimension (but NOT time), since this is a single time step.

  • rnn_state (torch.Tensor) – rnn hidden state

Returns:

dictionary of output torch.Tensors, that corresponds

to @self.output_shapes. Does not contain time dimension.

rnn_state: return the new rnn state

Return type:

outputs (dict)

get_rnn_init_state(batch_size, device)#

Get a default RNN state (zeros)

Parameters:
  • batch_size (int) – batch size dimension

  • device – device the hidden state should be sent to.

Returns:

returns hidden state tensor or tuple of hidden state tensors

depending on the RNN type

Return type:

hidden_state (torch.Tensor or tuple)

output_shape(input_shape)#

Returns output shape for this module, which is a dictionary instead of a list since outputs are dictionaries.

Parameters:

input_shape (dict) – dictionary of dictionaries, where each top-level key corresponds to an observation group, and the low-level dictionaries specify the shape for each modality in an observation dictionary

training: bool#
robomimic.models.obs_nets.obs_encoder_factory(obs_shapes, feature_activation=<class 'torch.nn.modules.activation.ReLU'>, encoder_kwargs=None)#

Utility function to create an @ObservationEncoder from kwargs specified in config.

Parameters:
  • obs_shapes (OrderedDict) – a dictionary that maps observation key to expected shapes for observations.

  • feature_activation – non-linearity to apply after each obs net - defaults to ReLU. Pass None to apply no activation.

  • encoder_kwargs (dict or None) –

    If None, results in default encoder_kwargs being applied. Otherwise, should be a nested dictionary containing relevant per-modality information for encoder networks. Should be of form:

    obs_modality1: dict
        feature_dimension: int
        core_class: str
        core_kwargs: dict
        obs_randomizer_class: str
        obs_randomizer_kwargs: dict

    obs_modality2: dict
        ...

robomimic.models.policy_nets module#

Contains torch Modules for policy networks. These networks take an observation dictionary as input (and possibly additional conditioning, such as subgoal or goal dictionaries) and produce action predictions, samples, or distributions as outputs. Note that actions are assumed to lie in [-1, 1], and most networks will have a final tanh activation to help ensure this range.

class robomimic.models.policy_nets.ActorNetwork(obs_shapes, ac_dim, mlp_layer_dims, goal_shapes=None, encoder_kwargs=None)#

Bases: robomimic.models.obs_nets.MIMO_MLP

A basic policy network that predicts actions from observations. Can optionally be goal conditioned on future observations.

forward(obs_dict, goal_dict=None)#

Process each set of inputs in its own observation group.

Parameters:

inputs (dict) – a dictionary of dictionaries with one dictionary per observation group. Each observation group’s dictionary should map modality to torch.Tensor batches. Should be consistent with @self.input_obs_group_shapes.

Returns:

dictionary of output torch.Tensors, that corresponds

to @self.output_shapes

Return type:

outputs (dict)

output_shape(input_shape=None)#

Returns output shape for this module, which is a dictionary instead of a list since outputs are dictionaries.

training: bool#
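A minimal sketch of building and querying the policy. The observation spec registration, key names, and dimensions are assumptions describing a typical low-dim setup; encoder_kwargs is left as None so default encoders are used.

    import torch
    from collections import OrderedDict
    import robomimic.utils.obs_utils as ObsUtils
    from robomimic.models.policy_nets import ActorNetwork

    # register which modality each observation key belongs to (needed before building obs encoders)
    ObsUtils.initialize_obs_utils_with_obs_specs(
        {"obs": {"low_dim": ["robot0_eef_pos", "robot0_gripper_qpos"]}}
    )

    policy = ActorNetwork(
        obs_shapes=OrderedDict(robot0_eef_pos=(3,), robot0_gripper_qpos=(2,)),
        ac_dim=7,
        mlp_layer_dims=[256, 256],
    )

    obs = dict(robot0_eef_pos=torch.randn(16, 3), robot0_gripper_qpos=torch.randn(16, 2))
    actions = policy(obs)   # predicted actions of shape [16, 7]
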
class robomimic.models.policy_nets.GMMActorNetwork(obs_shapes, ac_dim, mlp_layer_dims, num_modes=5, min_std=0.01, std_activation='softplus', low_noise_eval=True, use_tanh=False, goal_shapes=None, encoder_kwargs=None)#

Bases: robomimic.models.policy_nets.ActorNetwork

Variant of actor network that learns a multimodal Gaussian mixture distribution over actions.

forward(obs_dict, goal_dict=None)#

Samples actions from the policy distribution.

Parameters:
  • obs_dict (dict) – batch of observations

  • goal_dict (dict) – if not None, batch of goal observations

Returns:

batch of actions from policy distribution

Return type:

action (torch.Tensor)

forward_train(obs_dict, goal_dict=None)#

Return full GMM distribution, which is useful for computing quantities necessary at train-time, like log-likelihood, KL divergence, etc.

Parameters:
  • obs_dict (dict) – batch of observations

  • goal_dict (dict) – if not None, batch of goal observations

Returns:

GMM distribution

Return type:

dist (Distribution)

training: bool#
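At train time, @forward_train returns a distribution that supports a behavior-cloning negative log-likelihood loss. A minimal sketch (setup and dimensions are assumptions, mirroring the sketch above):

    import torch
    from collections import OrderedDict
    import robomimic.utils.obs_utils as ObsUtils
    from robomimic.models.policy_nets import GMMActorNetwork

    ObsUtils.initialize_obs_utils_with_obs_specs({"obs": {"low_dim": ["robot0_eef_pos"]}})

    policy = GMMActorNetwork(
        obs_shapes=OrderedDict(robot0_eef_pos=(3,)),
        ac_dim=7,
        mlp_layer_dims=[256, 256],
        num_modes=5,
    )

    obs = dict(robot0_eef_pos=torch.randn(16, 3))
    demo_actions = torch.rand(16, 7) * 2 - 1    # assumed demonstration actions in [-1, 1]

    dist = policy.forward_train(obs)            # GMM distribution over actions
    loss = -dist.log_prob(demo_actions).mean()  # behavior-cloning negative log-likelihood
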
class robomimic.models.policy_nets.GaussianActorNetwork(obs_shapes, ac_dim, mlp_layer_dims, fixed_std=False, std_activation='softplus', init_last_fc_weight=None, init_std=0.3, mean_limits=(- 9.0, 9.0), std_limits=(0.007, 7.5), low_noise_eval=True, use_tanh=False, goal_shapes=None, encoder_kwargs=None)#

Bases: robomimic.models.policy_nets.ActorNetwork

Variant of actor network that learns a diagonal unimodal Gaussian distribution over actions.

forward(obs_dict, goal_dict=None)#

Samples actions from the policy distribution.

Parameters:
  • obs_dict (dict) – batch of observations

  • goal_dict (dict) – if not None, batch of goal observations

Returns:

batch of actions from policy distribution

Return type:

action (torch.Tensor)

forward_train(obs_dict, goal_dict=None)#

Return full Gaussian distribution, which is useful for computing quantities necessary at train-time, like log-likelihood, KL divergence, etc.

Parameters:
  • obs_dict (dict) – batch of observations

  • goal_dict (dict) – if not None, batch of goal observations

Returns:

Gaussian distribution

Return type:

dist (Distribution)

training: bool#
class robomimic.models.policy_nets.PerturbationActorNetwork(obs_shapes, ac_dim, mlp_layer_dims, perturbation_scale=0.05, goal_shapes=None, encoder_kwargs=None)#

Bases: robomimic.models.policy_nets.ActorNetwork

An action perturbation network - primarily used in BCQ. It takes states and actions and returns action perturbations.

forward(obs_dict, acts, goal_dict=None)#

Forward pass through perturbation actor.

training: bool#
class robomimic.models.policy_nets.RNNActorNetwork(obs_shapes, ac_dim, mlp_layer_dims, rnn_hidden_dim, rnn_num_layers, rnn_type='LSTM', rnn_kwargs=None, goal_shapes=None, encoder_kwargs=None)#

Bases: robomimic.models.obs_nets.RNN_MIMO_MLP

An RNN policy network that predicts actions from observations.

forward(obs_dict, goal_dict=None, rnn_init_state=None, return_state=False)#

Forward a sequence of inputs through the RNN and the per-step network.

Parameters:
  • obs_dict (dict) – batch of observations - each tensor in the dictionary should have leading dimensions batch and time [B, T, …]

  • goal_dict (dict) – if not None, batch of goal observations

  • rnn_init_state – rnn hidden state, initialize to zero state if set to None

  • return_state (bool) – whether to return hidden state

Returns:

predicted action sequence

rnn_state: return rnn state at the end if return_state is set to True

Return type:

actions (torch.Tensor)

forward_step(obs_dict, goal_dict=None, rnn_state=None)#

Unroll RNN over single timestep to get actions.

Parameters:
  • obs_dict (dict) – batch of observations. Should not contain time dimension.

  • goal_dict (dict) – if not None, batch of goal observations

  • rnn_state – rnn hidden state, initialize to zero state if set to None

Returns:

batch of actions - does not contain time dimension

state: updated rnn state

Return type:

actions (torch.Tensor)

output_shape(input_shape)#

Returns output shape for this module, which is a dictionary instead of a list since outputs are dictionaries.

Parameters:

input_shape (dict) – dictionary of dictionaries, where each top-level key corresponds to an observation group, and the low-level dictionaries specify the shape for each modality in an observation dictionary

training: bool#
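A minimal closed-loop rollout sketch using @get_rnn_init_state and @forward_step (the observation spec, key names, and dimensions are assumptions):

    import torch
    from collections import OrderedDict
    import robomimic.utils.obs_utils as ObsUtils
    from robomimic.models.policy_nets import RNNActorNetwork

    ObsUtils.initialize_obs_utils_with_obs_specs({"obs": {"low_dim": ["robot0_eef_pos"]}})

    policy = RNNActorNetwork(
        obs_shapes=OrderedDict(robot0_eef_pos=(3,)),
        ac_dim=7,
        mlp_layer_dims=[256, 256],
        rnn_hidden_dim=64,
        rnn_num_layers=2,
    )

    state = policy.get_rnn_init_state(batch_size=1, device=torch.device("cpu"))
    for _ in range(10):
        obs = dict(robot0_eef_pos=torch.randn(1, 3))                 # stand-in for an environment observation
        action, state = policy.forward_step(obs, rnn_state=state)    # action has shape [1, 7]
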
class robomimic.models.policy_nets.RNNGMMActorNetwork(obs_shapes, ac_dim, mlp_layer_dims, rnn_hidden_dim, rnn_num_layers, rnn_type='LSTM', rnn_kwargs=None, num_modes=5, min_std=0.01, std_activation='softplus', low_noise_eval=True, use_tanh=False, goal_shapes=None, encoder_kwargs=None)#

Bases: robomimic.models.policy_nets.RNNActorNetwork

An RNN GMM policy network that predicts sequences of action distributions from observation sequences.

forward(obs_dict, goal_dict=None, rnn_init_state=None, return_state=False)#

Samples actions from the policy distribution.

Parameters:
  • obs_dict (dict) – batch of observations

  • goal_dict (dict) – if not None, batch of goal observations

Returns:

batch of actions from policy distribution

Return type:

action (torch.Tensor)

forward_step(obs_dict, goal_dict=None, rnn_state=None)#

Unroll RNN over single timestep to get sampled actions.

Parameters:
  • obs_dict (dict) – batch of observations. Should not contain time dimension.

  • goal_dict (dict) – if not None, batch of goal observations

  • rnn_state – rnn hidden state, initialize to zero state if set to None

Returns:

batch of actions - does not contain time dimension

state: updated rnn state

Return type:

acts (torch.Tensor)

forward_train(obs_dict, goal_dict=None, rnn_init_state=None, return_state=False)#

Return full GMM distribution, which is useful for computing quantities necessary at train-time, like log-likelihood, KL divergence, etc.

Parameters:
  • obs_dict (dict) – batch of observations

  • goal_dict (dict) – if not None, batch of goal observations

  • rnn_init_state – rnn hidden state, initialize to zero state if set to None

  • return_state (bool) – whether to return hidden state

Returns:

sequence of GMM distributions over the timesteps

rnn_state: return rnn state at the end if return_state is set to True

Return type:

dists (Distribution)

forward_train_step(obs_dict, goal_dict=None, rnn_state=None)#

Unroll RNN over single timestep to get action GMM distribution, which is useful for computing quantities necessary at train-time, like log-likelihood, KL divergence, etc.

Parameters:
  • obs_dict (dict) – batch of observations. Should not contain time dimension.

  • goal_dict (dict) – if not None, batch of goal observations

  • rnn_state – rnn hidden state, initialize to zero state if set to None

Returns:

GMM action distributions

state: updated rnn state

Return type:

ad (Distribution)

training: bool#
class robomimic.models.policy_nets.TransformerActorNetwork(obs_shapes, ac_dim, transformer_embed_dim, transformer_num_layers, transformer_num_heads, transformer_context_length, transformer_emb_dropout=0.1, transformer_attn_dropout=0.1, transformer_block_output_dropout=0.1, transformer_sinusoidal_embedding=False, transformer_activation='gelu', transformer_nn_parameter_for_timesteps=False, goal_shapes=None, encoder_kwargs=None)#

Bases: robomimic.models.obs_nets.MIMO_Transformer

A Transformer policy network that predicts actions from observation sequences (assumed to be frame-stacked from previous observations) and possibly from previous actions as well (in an autoregressive manner).

forward(obs_dict, actions=None, goal_dict=None)#

Forward a sequence of inputs through the Transformer.

Parameters:
  • obs_dict (dict) – batch of observations - each tensor in the dictionary should have leading dimensions batch and time [B, T, …]

  • actions (torch.Tensor) – batch of actions of shape [B, T, D]

  • goal_dict (dict) – if not None, batch of goal observations

Returns:

contains predicted action sequence, or dictionary

with predicted action sequence and predicted observation sequences

Return type:

outputs (torch.Tensor or dict)

output_shape(input_shape)#

Returns output shape for this module, which is a dictionary instead of a list since outputs are dictionaries.

training: bool#
class robomimic.models.policy_nets.TransformerGMMActorNetwork(obs_shapes, ac_dim, transformer_embed_dim, transformer_num_layers, transformer_num_heads, transformer_context_length, transformer_emb_dropout=0.1, transformer_attn_dropout=0.1, transformer_block_output_dropout=0.1, transformer_sinusoidal_embedding=False, transformer_activation='gelu', transformer_nn_parameter_for_timesteps=False, num_modes=5, min_std=0.01, std_activation='softplus', low_noise_eval=True, use_tanh=False, goal_shapes=None, encoder_kwargs=None)#

Bases: robomimic.models.policy_nets.TransformerActorNetwork

A Transformer GMM policy network that predicts sequences of action distributions from observation sequences (assumed to be frame stacked from previous observations).

forward(obs_dict, actions=None, goal_dict=None)#

Samples actions from the policy distribution.

Parameters:
  • obs_dict (dict) – batch of observations

  • actions (torch.Tensor) – batch of actions

  • goal_dict (dict) – if not None, batch of goal observations

Returns:

batch of actions from policy distribution

Return type:

action (torch.Tensor)

forward_train(obs_dict, actions=None, goal_dict=None, low_noise_eval=None)#

Return full GMM distribution, which is useful for computing quantities necessary at train-time, like log-likelihood, KL divergence, etc.

Parameters:
  • obs_dict (dict) – batch of observations

  • actions (torch.Tensor) – batch of actions

  • goal_dict (dict) – if not None, batch of goal observations

Returns:

sequence of GMM distributions over the timesteps

Return type:

dists (Distribution)

training: bool#
class robomimic.models.policy_nets.VAEActor(obs_shapes, ac_dim, encoder_layer_dims, decoder_layer_dims, latent_dim, device, decoder_is_conditioned=True, decoder_reconstruction_sum_across_elements=False, latent_clip=None, prior_learn=False, prior_is_conditioned=False, prior_layer_dims=(), prior_use_gmm=False, prior_gmm_num_modes=10, prior_gmm_learn_weights=False, prior_use_categorical=False, prior_categorical_dim=10, prior_categorical_gumbel_softmax_hard=False, goal_shapes=None, encoder_kwargs=None)#

Bases: robomimic.models.base_nets.Module

A VAE that models a distribution of actions conditioned on observations. The VAE prior and decoder are used at test-time as the policy.

decode(obs_dict=None, goal_dict=None, z=None, n=None)#

Thin wrapper around @VaeNets.VAE implementation.

Parameters:
  • obs_dict (dict) – a dictionary that maps modalities to torch.Tensor batches. Only needs to be provided if @decoder_is_conditioned or @z is None (since the prior will require it to generate z).

  • goal_dict (dict) – a dictionary that maps modalities to torch.Tensor batches. These should correspond to goal modalities.

  • z (torch.Tensor) – if provided, these latents are used to generate reconstructions from the VAE, and the prior is not sampled.

  • n (int) – this argument is used to specify the number of samples to generate from the prior. Only required if @z is None - i.e. sampling takes place

Returns:

dictionary of reconstructed inputs (this will be a dictionary

with a single “action” key)

Return type:

recons (dict)

encode(actions, obs_dict, goal_dict=None)#
Parameters:
  • actions (torch.Tensor) – a batch of actions

  • obs_dict (dict) – a dictionary that maps modalities to torch.Tensor batches. These should correspond to the observation modalities used for conditioning in either the decoder or the prior (or both).

  • goal_dict (dict) – a dictionary that maps modalities to torch.Tensor batches. These should correspond to goal modalities.

Returns:

dictionary with the following keys:

mean (torch.Tensor): posterior encoder means

logvar (torch.Tensor): posterior encoder logvars

Return type:

posterior params (dict)

forward(obs_dict, goal_dict=None, z=None)#

Samples actions from the policy distribution.

Parameters:
  • obs_dict (dict) – batch of observations

  • goal_dict (dict) – if not None, batch of goal observations

  • z (torch.Tensor) – if not None, use the provided batch of latents instead of sampling from the prior

Returns:

batch of actions from policy distribution

Return type:

action (torch.Tensor)

forward_train(actions, obs_dict, goal_dict=None, freeze_encoder=False)#

A full pass through the VAE network used during training to construct KL and reconstruction losses. See @VAE class for more info.

Parameters:
  • actions (torch.Tensor) – a batch of actions

  • obs_dict (dict) – a dictionary that maps modalities to torch.Tensor batches. These should correspond to the observation modalities used for conditioning in either the decoder or the prior (or both).

  • goal_dict (dict) – a dictionary that maps modalities to torch.Tensor batches. These should correspond to goal modalities.

Returns:

a dictionary that contains the following outputs.

encoder_params (dict): parameters for the posterior distribution

from the encoder forward pass

encoder_z (torch.Tensor): latents sampled from the encoder posterior

decoder_outputs (dict): action reconstructions from the decoder

kl_loss (torch.Tensor): KL loss over the batch of data

reconstruction_loss (torch.Tensor): reconstruction loss over the batch of data

Return type:

vae_outputs (dict)

get_gumbel_temperature()#

Return current Gumbel-Softmax temperature. Should only be used if @prior_use_categorical is True.

output_shape(input_shape=None)#

This implementation is required by the Module superclass, but is unused since we never chain this module to other ones.

sample_prior(obs_dict=None, goal_dict=None, n=None)#

Thin wrapper around @VaeNets.VAE implementation.

Parameters:
  • n (int) – this argument is used to specify the number of samples to generate from the prior.

  • obs_dict (dict) – a dictionary that maps modalities to torch.Tensor batches. Only needs to be provided if @prior_is_conditioned.

  • goal_dict (dict) – a dictionary that maps modalities to torch.Tensor batches. These should correspond to goal modalities.

Returns:

latents sampled from the prior

Return type:

z (torch.Tensor)

set_gumbel_temperature(temperature)#

Used by external algorithms to schedule Gumbel-Softmax temperature, which is used during reparametrization at train-time. Should only be used if @prior_use_categorical is True.

training: bool#

robomimic.models.transformers module#

Implementation of transformers, mostly based on Andrej Karpathy’s minGPT model. See https://github.com/karpathy/minGPT/blob/master/mingpt/model.py for more details.

class robomimic.models.transformers.CausalSelfAttention(embed_dim, num_heads, context_length, attn_dropout=0.1, output_dropout=0.1)#

Bases: robomimic.models.base_nets.Module

forward(x)#

Forward pass through Self-Attention block. Input should be shape (B, T, D) where B is batch size, T is seq length (@self.context_length), and D is input dimension (@self.embed_dim).

output_shape(input_shape=None)#

Function to compute output shape from inputs to this module.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

training: bool#
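
A hedged usage sketch, following the (B, T, D) input convention described above; the concrete dimensions are illustrative:

    import torch
    from robomimic.models.transformers import CausalSelfAttention

    attn = CausalSelfAttention(embed_dim=64, num_heads=4, context_length=10)
    x = torch.randn(8, 10, 64)  # (B, T, D) with T == context_length, D == embed_dim
    out = attn(x)               # self-attention preserves the (B, T, D) shape
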
class robomimic.models.transformers.GEGLU#

Bases: torch.nn.modules.module.Module

References

Shazeer et al., “GLU Variants Improve Transformer,” 2020. https://arxiv.org/abs/2002.05202

Implementation: https://github.com/pfnet-research/deep-table/blob/237c8be8a405349ce6ab78075234c60d9bfe60b7/deep_table/nn/layers/activation.py

forward(x)#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

geglu(x)#
training: bool#
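
As a hedged illustration of the GEGLU activation from the references above (a standalone function mirroring the referenced implementation, not robomimic’s class): the last dimension is split in half, and one half gates the GELU of the other, so the output has half the input’s final dimension.

    import torch
    import torch.nn.functional as F

    def geglu_sketch(x):
        # split the last dimension in two and gate one half with GELU of the other
        a, gates = x.chunk(2, dim=-1)
        return a * F.gelu(gates)

    y = geglu_sketch(torch.randn(8, 128))  # last dimension 128 -> 64
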
class robomimic.models.transformers.GPT_Backbone(embed_dim, context_length, attn_dropout=0.1, block_output_dropout=0.1, num_layers=6, num_heads=8, activation='gelu')#

Bases: robomimic.models.base_nets.Module

The GPT model, with a context size given by @context_length.

forward(inputs)#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

output_shape(input_shape=None)#

Function to compute output shape from inputs to this module.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

training: bool#
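
A hedged construction sketch; the input is assumed to be a batch of already-embedded tokens of shape (B, T, embed_dim), and the dimensions are illustrative:

    import torch
    from robomimic.models.transformers import GPT_Backbone

    gpt = GPT_Backbone(embed_dim=64, context_length=10, num_layers=2, num_heads=4)
    tokens = torch.randn(8, 10, 64)  # (B, T, embed_dim) token embeddings (assumed input format)
    features = gpt(tokens)           # per-timestep transformer features, same (B, T, embed_dim) shape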
class robomimic.models.transformers.PositionalEncoding(embed_dim)#

Bases: torch.nn.modules.module.Module

Taken from https://pytorch.org/tutorials/beginner/transformer_tutorial.html.

forward(x)#

Input timesteps of shape (B, T).

training: bool#
class robomimic.models.transformers.SelfAttentionBlock(embed_dim, num_heads, context_length, attn_dropout=0.1, output_dropout=0.1, activation=GELU())#

Bases: robomimic.models.base_nets.Module

A single Transformer Block, that can be chained together repeatedly. It consists of a @CausalSelfAttention module and a small MLP, along with layer normalization and residual connections on each input.

forward(x)#

Forward pass - chain self-attention + MLP blocks, with residual connections and layer norms.

output_shape(input_shape=None)#

Function to compute output shape from inputs to this module.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

training: bool#
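
The residual structure described above follows the usual minGPT-style pre-norm pattern. The following is an illustrative re-implementation using plain torch modules (a sketch only: it is not robomimic’s class and it omits the causal mask for brevity):

    import torch
    import torch.nn as nn

    class PreNormBlockSketch(nn.Module):
        # hypothetical block: layer norm -> attention -> residual, then layer norm -> MLP -> residual
        def __init__(self, embed_dim, num_heads):
            super().__init__()
            self.ln1 = nn.LayerNorm(embed_dim)
            self.ln2 = nn.LayerNorm(embed_dim)
            self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
            self.mlp = nn.Sequential(
                nn.Linear(embed_dim, 4 * embed_dim), nn.GELU(), nn.Linear(4 * embed_dim, embed_dim)
            )

        def forward(self, x):
            h = self.ln1(x)
            attn_out, _ = self.attn(h, h, h, need_weights=False)
            x = x + attn_out               # residual connection around attention
            x = x + self.mlp(self.ln2(x))  # residual connection around the MLP
            return x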

robomimic.models.vae_nets module#

Contains an implementation of Variational Autoencoder (VAE) and other variants, including other priors, and RNN-VAEs.

class robomimic.models.vae_nets.CategoricalPrior(latent_dim, categorical_dim, device, learnable=False, obs_shapes=None, mlp_layer_dims=(), goal_shapes=None, encoder_kwargs=None)#

Bases: robomimic.models.vae_nets.Prior

A class that holds functionality for learning categorical priors for use in VAEs.

forward(batch_size, obs_dict=None, goal_dict=None)#

Computes prior logits (unnormalized log-probs).

Parameters:
  • batch_size (int) – batch size - this is needed for parameters that are not obs-dependent, to make sure the leading dimension is correct for downstream sampling and loss computation purposes

  • obs_dict (dict) – inputs according to @obs_shapes. Only needs to be provided if any prior parameters are obs-dependent.

  • goal_dict (dict) – inputs according to @goal_shapes (only if using goal observations)

Returns:

dictionary containing prior parameters

Return type:

prior_params (dict)

kl_loss(posterior_params, z=None, obs_dict=None, goal_dict=None)#

Computes KL divergence loss between the Categorical posterior distribution given by the unnormalized logits in @posterior_params and the prior distribution.

Parameters:
  • posterior_params (dict) – dictionary with key “logits” corresponding to torch.Tensor batch of unnormalized logits of shape [B, D * C] that corresponds to the posterior categorical distribution

  • z (torch.Tensor) – samples from encoder - unused for this prior

  • obs_dict (dict) – inputs according to @obs_shapes. Only needs to be provided if any prior parameters are obs-dependent.

  • goal_dict (dict) – inputs according to @goal_shapes (only if using goal observations)

Returns:

KL divergence loss

Return type:

kl_loss (torch.Tensor)

sample(n, obs_dict=None, goal_dict=None)#

Returns a batch of samples from the prior distribution.

Parameters:
  • n (int) – this argument is used to specify the number of samples to generate from the prior.

  • obs_dict (dict) – inputs according to @obs_shapes. Only needs to be provided if any prior parameters are obs-dependent. Leading dimension should be consistent with @n, the number of samples to generate.

  • goal_dict (dict) – inputs according to @goal_shapes (only if using goal observations)

Returns:

batch of sampled latent vectors.

Return type:

z (torch.Tensor)

training: bool#
class robomimic.models.vae_nets.GaussianPrior(latent_dim, device, latent_clip=None, learnable=False, use_gmm=False, gmm_num_modes=10, gmm_learn_weights=False, obs_shapes=None, mlp_layer_dims=(), goal_shapes=None, encoder_kwargs=None)#

Bases: robomimic.models.vae_nets.Prior

A class that holds functionality for learning both unimodal Gaussian priors and multimodal Gaussian Mixture Model priors for use in VAEs.

forward(batch_size, obs_dict=None, goal_dict=None)#

Computes means, logvars, and GMM weights (if using GMM and learning weights).

Parameters:
  • batch_size (int) – batch size - this is needed for parameters that are not obs-dependent, to make sure the leading dimension is correct for downstream sampling and loss computation purposes

  • obs_dict (dict) – inputs according to @obs_shapes. Only needs to be provided if any prior parameters are obs-dependent.

  • goal_dict (dict) – inputs according to @goal_shapes (only if using goal observations)

Returns:

dictionary containing prior parameters

Return type:

prior_params (dict)

kl_loss(posterior_params, z=None, obs_dict=None, goal_dict=None)#

Computes sample-based KL divergence loss between the Gaussian distribution given by @mu, @logvar and the prior distribution.

Parameters:
  • posterior_params (dict) – dictionary with keys “mu” and “logvar” corresponding to torch.Tensor batch of means and log-variances of posterior Gaussian distribution.

  • z (torch.Tensor) – samples from the Gaussian distribution parametrized by @mu and @logvar. Only needed if @self.use_gmm is True.

  • obs_dict (dict) – inputs according to @obs_shapes. Only needs to be provided if any prior parameters are obs-dependent.

  • goal_dict (dict) – inputs according to @goal_shapes (only if using goal observations)

Returns:

KL divergence loss

Return type:

kl_loss (torch.Tensor)

sample(n, obs_dict=None, goal_dict=None)#

Returns a batch of samples from the prior distribution.

Parameters:
  • n (int) – this argument is used to specify the number of samples to generate from the prior.

  • obs_dict (dict) – inputs according to @obs_shapes. Only needs to be provided if any prior parameters are obs-dependent. Leading dimension should be consistent with @n, the number of samples to generate.

  • goal_dict (dict) – inputs according to @goal_shapes (only if using goal observations)

Returns:

batch of sampled latent vectors.

Return type:

z (torch.Tensor)

training: bool#
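
A hedged sketch of an unconditional standard-normal prior (no observation conditioning, parameters not learned); the dimensions are illustrative, and the posterior parameters use the documented “mu” and “logvar” keys:

    import torch
    from robomimic.models.vae_nets import GaussianPrior

    prior = GaussianPrior(latent_dim=16, device=torch.device("cpu"))
    z = prior.sample(n=32)  # expected shape (32, 16)

    posterior_params = {"mu": torch.zeros(32, 16), "logvar": torch.zeros(32, 16)}
    kl = prior.kl_loss(posterior_params)  # KL divergence over the batch
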
class robomimic.models.vae_nets.Prior(param_shapes, param_obs_dependent, obs_shapes=None, mlp_layer_dims=(), goal_shapes=None, encoder_kwargs=None)#

Bases: robomimic.models.base_nets.Module

Base class for VAE priors. It’s basically the same as a @MIMO_MLP network (it instantiates one) but it supports additional methods such as KL loss computation and sampling, and also may learn prior parameters as observation-independent torch Parameters instead of observation-dependent mappings.

forward(batch_size, obs_dict=None, goal_dict=None)#

Computes prior parameters.

Parameters:
  • batch_size (int) – batch size - this is needed for parameters that are not obs-dependent, to make sure the leading dimension is correct for downstream sampling and loss computation purposes

  • obs_dict (dict) – inputs according to @obs_shapes. Only needs to be provided if any prior parameters are obs-dependent.

  • goal_dict (dict) – inputs according to @goal_shapes (only if using goal observations)

Returns:

dictionary containing prior parameters

Return type:

prior_params (dict)

kl_loss(posterior_params, z=None, obs_dict=None, goal_dict=None)#

Computes sample-based KL divergence loss between the Gaussian distribution given by @mu, @logvar and the prior distribution.

Parameters:
  • posterior_params (dict) – dictionary with keys “mu” and “logvar” corresponding to torch.Tensor batch of means and log-variances of posterior Gaussian distribution.

  • z (torch.Tensor) – samples from the Gaussian distribution parametrized by @mu and @logvar. May not be needed depending on the prior.

  • obs_dict (dict) – inputs according to @obs_shapes. Only needs to be provided if any prior parameters are obs-dependent.

  • goal_dict (dict) – inputs according to @goal_shapes (only if using goal observations)

Returns:

KL divergence loss

Return type:

kl_loss (torch.Tensor)

output_shape(input_shape=None)#

Returns output shape for this module, which is a dictionary instead of a list since outputs are dictionaries.

sample(n, obs_dict=None, goal_dict=None)#

Returns a batch of samples from the prior distribution.

Parameters:
  • n (int) – this argument is used to specify the number of samples to generate from the prior.

  • obs_dict (dict) – inputs according to @obs_shapes. Only needs to be provided if any prior parameters are obs-dependent. Leading dimension should be consistent with @n, the number of samples to generate.

  • goal_dict (dict) – inputs according to @goal_shapes (only if using goal observations)

Returns:

batch of sampled latent vectors.

Return type:

z (torch.Tensor)

training: bool#
class robomimic.models.vae_nets.VAE(input_shapes, output_shapes, encoder_layer_dims, decoder_layer_dims, latent_dim, device, condition_shapes=None, decoder_is_conditioned=True, decoder_reconstruction_sum_across_elements=False, latent_clip=None, output_squash=(), output_scales=None, output_ranges=None, prior_learn=False, prior_is_conditioned=False, prior_layer_dims=(), prior_use_gmm=False, prior_gmm_num_modes=10, prior_gmm_learn_weights=False, prior_use_categorical=False, prior_categorical_dim=10, prior_categorical_gumbel_softmax_hard=False, goal_shapes=None, encoder_kwargs=None)#

Bases: torch.nn.modules.module.Module

A Variational Autoencoder (VAE), as described in https://arxiv.org/abs/1312.6114.

Models a distribution p(X) or a conditional distribution p(X | Y), where each variable can consist of multiple modalities. The target variable X, whose distribution is modeled, is specified through the @input_shapes argument, which is a map between modalities (strings) and expected shapes. In this way, a variable that consists of multiple kinds of data (e.g. image and flat-dimensional) can be modeled as well. A separate @output_shapes argument is used to specify the expected reconstructions - this allows for asymmetric reconstruction (for example, reconstructing low-resolution images).

This implementation supports learning conditional distributions as well (cVAE). The conditioning variable Y is specified through the @condition_shapes argument, which is also a map between modalities (strings) and expected shapes. In this way, variables with multiple kinds of data (e.g. image and flat-dimensional) can jointly be conditioned on. By default, the decoder takes the conditioning variable Y as input. To force the decoder to reconstruct from just the latent, set @decoder_is_conditioned to False (in this case, the prior must be conditioned).

The implementation also supports learning expressive priors instead of using the usual N(0, 1) prior. There are three kinds of priors supported - Gaussian, Gaussian Mixture Model (GMM), and Categorical. For each prior, the parameters can be learned as independent parameters, or be learned as functions of the conditioning variable Y (by setting @prior_is_conditioned).
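
A hedged construction sketch for the unconditional case. The modality names, shapes, and layer sizes are illustrative, and the example assumes observation keys have been registered with robomimic’s observation utilities beforehand; the registration call and spec format shown here are an assumption about that setup:

    import torch
    from collections import OrderedDict
    import robomimic.utils.obs_utils as ObsUtils
    from robomimic.models.vae_nets import VAE

    # assumed setup: register "action" as a flat (low_dim) key so encoders/decoders can be built
    ObsUtils.initialize_obs_utils_with_obs_specs({"obs": {"low_dim": ["action"]}})

    vae = VAE(
        input_shapes=OrderedDict(action=(7,)),
        output_shapes=OrderedDict(action=(7,)),
        encoder_layer_dims=(300, 400),
        decoder_layer_dims=(300, 400),
        latent_dim=16,
        device=torch.device("cpu"),
    )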

decode(conditions=None, goals=None, z=None, n=None)#

Pass latents through decoder. Latents should be passed in to this function at train-time for backpropagation, but they can be left out at test-time. In this case, latents will be sampled using the VAE prior.

Parameters:
  • conditions (dict) – a dictionary that maps modalities to torch.Tensor batches. These should correspond to the modalities used for conditioning in either the decoder or the prior (or both). Only for cVAEs.

  • goals (dict) – a dictionary that maps modalities to torch.Tensor batches. These should correspond to goal modalities. Only for cVAEs.

  • z (torch.Tensor) – if provided, these latents are used to generate reconstructions from the VAE, and the prior is not sampled.

  • n (int) – this argument is used to specify the number of samples to generate from the prior. Only required if @z is None, i.e. when sampling from the prior takes place.

Returns:

dictionary of reconstructed inputs

Return type:

recons (dict)

encode(inputs, conditions=None, goals=None)#
Parameters:
  • inputs (dict) – a dictionary that maps input modalities to torch.Tensor batches. These should correspond to the encoder-only modalities (i.e. @self.encoder_only_shapes).

  • conditions (dict) – a dictionary that maps modalities to torch.Tensor batches. These should correspond to the modalities used for conditioning in either the decoder or the prior (or both). Only for cVAEs.

  • goals (dict) – a dictionary that maps modalities to torch.Tensor batches. These should correspond to goal modalities. Only for cVAEs.

Returns:

dictionary with posterior parameters

Return type:

posterior params (dict)

forward(inputs, outputs, conditions=None, goals=None, freeze_encoder=False)#

A full pass through the VAE network to construct KL and reconstruction losses.

Parameters:
  • inputs (dict) – a dictionary that maps input modalities to torch.Tensor batches. These should correspond to the encoder-only modalities (i.e. @self.encoder_only_shapes).

  • outputs (dict) – a dictionary that maps output modalities to torch.Tensor batches. These should correspond to the modalities used for reconstruction (i.e. @self.output_shapes).

  • conditions (dict) – a dictionary that maps modalities to torch.Tensor batches. These should correspond to the modalities used for conditioning in either the decoder or the prior (or both). Only for cVAEs.

  • goals (dict) – a dictionary that maps modalities to torch.Tensor batches. These should correspond to goal modalities. Only for cVAEs.

  • freeze_encoder (bool) – if True, don’t backprop into encoder by detaching encoder outputs. Useful for doing staged VAE training.

Returns:

a dictionary that contains the following outputs.

encoder_params (dict): parameters for the posterior distribution from the encoder forward pass

encoder_z (torch.Tensor): latents sampled from the encoder posterior

decoder_outputs (dict): reconstructions from the decoder

kl_loss (torch.Tensor): KL loss over the batch of data

reconstruction_loss (torch.Tensor): reconstruction loss over the batch of data

Return type:

vae_outputs (dict)
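
Continuing the construction sketch above, a hedged example of turning these documented outputs into a beta-VAE style objective (the batch and KL weight are illustrative; see the note under reconstruction_loss below about tuning beta):

    batch = {"action": torch.randn(32, 7)}
    out = vae.forward(inputs=batch, outputs=batch)
    beta = 1e-4  # illustrative KL weight; needs to be tuned per problem
    loss = out["reconstruction_loss"] + beta * out["kl_loss"]
    loss.backward()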

get_gumbel_temperature()#

Return current Gumbel-Softmax temperature. Should only be used if @self.prior_use_categorical is True.

kl_loss(posterior_params, encoder_z=None, conditions=None, goals=None)#

Computes KL divergence loss given the results of the VAE encoder forward pass and the conditioning and goal modalities (if the prior is input-dependent).

Parameters:
  • posterior_params (dict) – dictionary with keys “mu” and “logvar” corresponding to torch.Tensor batch of means and log-variances of posterior Gaussian distribution. This is the output of @self.encode.

  • encoder_z (torch.Tensor) – samples from the Gaussian distribution parametrized by @mu and @logvar. Only required if using a GMM prior.

  • conditions (dict) – inputs according to @self.condition_shapes. Only needs to be provided if any prior parameters are input-dependent.

  • goals (dict) – inputs according to @self.goal_shapes (only if using goal observations)

Returns:

VAE KL divergence loss

Return type:

kl_loss (torch.Tensor)

reconstruction_loss(reconstructions, targets)#

Reconstruction loss. Note that we compute the average per-dimension error in each modality and then average across all the modalities.

The beta term for weighting between the reconstruction and KL losses will need to be tuned in practice for each situation (see https://twitter.com/memotv/status/973323454350090240 for more discussion).

Parameters:
  • reconstructions (dict) – reconstructed inputs, consistent with @self.output_shapes

  • targets (dict) – reconstruction targets, consistent with @self.output_shapes

Returns:

VAE reconstruction loss

Return type:

reconstruction_loss (torch.Tensor)

reparameterize(posterior_params)#
Parameters:

posterior_params (dict) – dictionary from the encoder forward pass that parametrizes the encoder distribution

Returns:

sampled latents that are also differentiable

Return type:

z (torch.Tensor)
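
For the Gaussian case, the reparameterization trick that makes the sampled latents differentiable is the standard one sketched below (an illustrative standalone function, not robomimic’s method):

    import torch

    def reparameterize_sketch(mu, logvar):
        # z = mu + sigma * eps with eps ~ N(0, I), so gradients flow through mu and logvar
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + std * eps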

sample_prior(n, conditions=None, goals=None)#

Samples from the prior using the prior parameters.

Parameters:
  • n (int) – this argument is used to specify the number of samples to generate from the prior.

  • conditions (dict) – a dictionary that maps modalities to torch.Tensor batches. These should correspond to the modalities used for conditioning in either the decoder or the prior (or both). Only for cVAEs.

  • goals (dict) – a dictionary that maps modalities to torch.Tensor batches. These should correspond to goal modalities. Only for cVAEs.

Returns:

sampled latents from the prior

Return type:

z (torch.Tensor)

set_gumbel_temperature(temperature)#

Used by external algorithms to schedule Gumbel-Softmax temperature, which is used during reparametrization at train-time. Should only be used if @self.prior_use_categorical is True.

training: bool#
robomimic.models.vae_nets.vae_args_from_config(vae_config)#

Generate a set of VAE args that are read from the VAE-specific part of a config (for example see config.algo.vae in BCConfig).
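
A hedged usage sketch, assuming the standard robomimic config factory and that the VAE settings live at config.algo.vae as mentioned above (exact attribute paths may differ across versions):

    from robomimic.config import config_factory
    from robomimic.models.vae_nets import vae_args_from_config

    config = config_factory(algo_name="bc")
    vae_kwargs = vae_args_from_config(config.algo.vae)  # dict of VAE constructor keyword arguments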

robomimic.models.value_nets module#

Contains torch Modules for value networks. These networks take an observation dictionary as input (and possibly additional conditioning, such as subgoal or goal dictionaries) and produce value or action-value estimates or distributions.

class robomimic.models.value_nets.ActionValueNetwork(obs_shapes, ac_dim, mlp_layer_dims, value_bounds=None, goal_shapes=None, encoder_kwargs=None)#

Bases: robomimic.models.value_nets.ValueNetwork

A basic Q (action-value) network that predicts values from observations and actions. Can optionally be goal conditioned on future observations.

forward(obs_dict, acts, goal_dict=None)#

Modify forward from super class to include actions in inputs.

training: bool#
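
A hedged usage sketch with a single flat observation key; as with the VAE sketch above, the observation-key registration step and its spec format are an assumption about the required setup, and all dimensions are illustrative:

    import torch
    from collections import OrderedDict
    import robomimic.utils.obs_utils as ObsUtils
    from robomimic.models.value_nets import ActionValueNetwork

    # assumed setup: register the observation key as low_dim
    ObsUtils.initialize_obs_utils_with_obs_specs({"obs": {"low_dim": ["robot0_eef_pos"]}})

    q_net = ActionValueNetwork(
        obs_shapes=OrderedDict(robot0_eef_pos=(3,)),
        ac_dim=7,
        mlp_layer_dims=(300, 400),
        value_bounds=(-100.0, 0.0),
    )
    obs = {"robot0_eef_pos": torch.randn(32, 3)}
    acts = torch.randn(32, 7)
    q_values = q_net(obs, acts)  # expected shape (32, 1)
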
class robomimic.models.value_nets.DistributionalActionValueNetwork(obs_shapes, ac_dim, mlp_layer_dims, value_bounds, num_atoms, goal_shapes=None, encoder_kwargs=None)#

Bases: robomimic.models.value_nets.ActionValueNetwork

Distributional Q (action-value) network that outputs a categorical distribution over a discrete grid of value atoms. See https://arxiv.org/pdf/1707.06887.pdf for more details.

forward(obs_dict, acts, goal_dict=None)#

Return mean of critic categorical distribution. Useful for obtaining point estimates of critic values.

Parameters:
  • obs_dict (dict) – batch of observations

  • acts (torch.Tensor) – batch of actions

  • goal_dict (dict) – if not None, batch of goal observations

Returns:

expectation of value distribution

Return type:

mean_value (torch.Tensor)

forward_train(obs_dict, acts, goal_dict=None)#

Return full critic categorical distribution.

Parameters:
  • obs_dict (dict) – batch of observations

  • acts (torch.Tensor) – batch of actions

  • goal_dict (dict) – if not None, batch of goal observations

Returns:

value_distribution (DiscreteValueDistribution instance)

training: bool#
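
A hedged sketch, reusing the observation-key registration, obs, and acts from the previous example; value_bounds and num_atoms are required here, and their values are illustrative:

    from robomimic.models.value_nets import DistributionalActionValueNetwork

    dist_q_net = DistributionalActionValueNetwork(
        obs_shapes=OrderedDict(robot0_eef_pos=(3,)),
        ac_dim=7,
        mlp_layer_dims=(300, 400),
        value_bounds=(-100.0, 0.0),
        num_atoms=51,
    )
    mean_q = dist_q_net(obs, acts)                    # point estimate (mean of the categorical)
    value_dist = dist_q_net.forward_train(obs, acts)  # DiscreteValueDistribution instance
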
class robomimic.models.value_nets.ValueNetwork(obs_shapes, mlp_layer_dims, value_bounds=None, goal_shapes=None, encoder_kwargs=None)#

Bases: robomimic.models.obs_nets.MIMO_MLP

A basic value network that predicts values from observations. Can optionally be goal conditioned on future observations.

forward(obs_dict, goal_dict=None)#

Forward through value network, and then optionally use tanh scaling.

output_shape(input_shape=None)#

Function to compute output shape from inputs to this module.

Parameters:

input_shape (iterable of int) – shape of input. Does not include batch dimension. Some modules may not need this argument, if their output does not depend on the size of the input, or if they assume fixed size input.

Returns:

list of integers corresponding to output shape

Return type:

out_shape ([int])

training: bool#

Module contents#