Multimodal Observations
Contents
Multimodal Observations#
robomimic natively supports multiple different observation modalities, and provides integrated support for modifying observations and adding your own custom ones.
First, we highlight semantic distinctions when referring to different aspects of observations:
Keys are individual observations that are received from an environment / dataset. For example,
rgb_wrist
,eef_pos
, andjoint_vel
could be keys, depending on the dataset / environment.Modalities are different observation modes. For example, low dimensional states are considered a single mode, whereas RGB observations might be another mode. robomimic natively supports four modalities:
low_dim
,rgb
,depth
, andscan
. Each modality owns it own set of observation keys.Groups consist of potentially multiple modalities and multiple keys per modality, which are together passed to a learning model. For example, robomimic commonly uses three different groups:
obs
, which contains the normal observations passed to any model using these as inputs, andgoal
/subgoal
, which means any specified modalities / keys correspond to a goal / subgoal to be learned.
Observations are handled in the following way:
Each observation key is according to their modality via their
Modality
class,All observations for a given modality are concatenated and passed through an
ObservationEncoder
for that modality,All processed observations over all modalities are concatenated together and passed to a learning network
Modifying and Adding Your Own Observation Modalities#
robomimic natively supports the following modalities:
low_dim
: low-dimensional statesrgb
: RGB imagesdepth
: depth imagesscan
: scan arrays
The way each of these modalities are processed and encoded can be easily specified by modifying their respective encoder
parameters in your Config
class.
You may want to specify your own custom modalities that get processed and encoded in a certain way (e.g.: semantic segmentation, optical flow, etc…). This can also easily be done, and we refer you to our example script which walks through the process.