robomimic natively supports multiple different observation modalities, and provides integrated support for modifying observations and adding your own custom ones.
First, we highlight semantic distinctions when referring to different aspects of observations:
Keys are individual observations that are received from an environment / dataset. For example,
joint_velcould be keys, depending on the dataset / environment.
Modalities are different observation modes. For example, low dimensional states are considered a single mode, whereas RGB observations might be another mode. robomimic natively supports four modalities:
scan. Each modality owns it own set of observation keys.
Groups consist of potentially multiple modalities and multiple keys per modality, which are together passed to a learning model. For example, robomimic commonly uses three different groups:
obs, which contains the normal observations passed to any model using these as inputs, and
subgoal, which means any specified modalities / keys correspond to a goal / subgoal to be learned.
Observations are handled in the following way:
Each observation key is according to their modality via their
All observations for a given modality are concatenated and passed through an
ObservationEncoderfor that modality,
All processed observations over all modalities are concatenated together and passed to a learning network
Modifying and Adding Your Own Observation Modalities#
robomimic natively supports the following modalities:
low_dim: low-dimensional states
rgb: RGB images
depth: depth images
scan: scan arrays
The way each of these modalities are processed and encoded can be easily specified by modifying their respective
encoder parameters in your
You may want to specify your own custom modalities that get processed and encoded in a certain way (e.g.: semantic segmentation, optical flow, etc…). This can also easily be done, and we refer you to our example script which walks through the process.