Overview

Dataset Pipeline
Datasets capture recorded environment data and are used as inputs to a given offline RL or IL algorithm in robomimic. In general, you can use datasets with robomimic by:

1. Downloading the desired dataset
2. Postprocessing the dataset to guarantee compatibility with robomimic
3. Training agent(s) in robomimic with the dataset
robomimic currently supports the following datasets out of the box. Click on the corresponding (1) Downloading link to download the dataset and the corresponding (2) Postprocessing link for postprocessing that dataset.
| Dataset | Task Types | Downloading | Postprocessing |
|---|---|---|---|
| robomimic v0.1 | Sim + Real Robot Manipulation | Link | Link |
| MimicGen | Sim Robot Manipulation | Link | Link |
| D4RL | Sim Locomotion | Link | Link |
| MOMART | Sim Mobile Manipulation | Link | Link |
| RoboTurk Pilot | Sim Robot Manipulation | Link | Link |
After downloading and postprocessing, (3) Training with the dataset is straightforward and unified across all datasets:
python train.py --dataset <PATH_TO_POSTPROCESSED_DATASET> --config <PATH_TO_CONFIG>
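If you prefer to launch training from Python rather than the command line, the sketch below mirrors the command above. It is a minimal sketch, assuming the config_factory and train entry points from robomimic's getting-started example (exact config fields can differ across versions), and the dataset/output paths are hypothetical:

```python
# Minimal sketch of a programmatic training launch (paths are hypothetical).
import robomimic.utils.torch_utils as TorchUtils
from robomimic.config import config_factory
from robomimic.scripts.train import train

# build a default config for an algorithm (here: behavioral cloning)
config = config_factory(algo_name="bc")

# point the config at the postprocessed dataset and an output directory
config.train.data = "/path/to/postprocessed_dataset.hdf5"   # hypothetical path
config.train.output_dir = "/path/to/output_dir"             # hypothetical path

# launch the training run
device = TorchUtils.get_torch_device(try_to_use_cuda=True)
train(config, device=device)
```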
Generating Your Own Dataset
robomimic provides tutorials for collecting custom datasets on specific environment platforms. Click on any of the links below for more information on the specific environment setup:
| Environment Platform | Task Types |
|---|---|
| robosuite | Robot Manipulation |
Create Your Own Environment Wrapper!
If you want to generate your own dataset in a custom environment platform that is not listed above, please see this page.
Dataset Structure

All postprocessed robomimic-compatible datasets share the same data structure. A single dataset is a single HDF5 file with the following structure:
HDF5 Structure

- data (group)
  - total (attribute) - number of state-action samples in the dataset
  - env_args (attribute) - a json string that contains metadata on the environment and relevant arguments used for collecting data. Three keys: env_name, the name of the environment or task to create, env_type, one of robomimic's supported environment types, and env_kwargs, a dictionary of keyword-arguments to be passed into the environment of type env_name.
  - demo_0 (group) - group for the first trajectory (every trajectory has a group)
    - num_samples (attribute) - the number of state-action samples in this trajectory
    - model_file (attribute) - the xml string corresponding to the MJCF MuJoCo model. Only present for robosuite datasets.
    - states (dataset) - flattened raw MuJoCo states, ordered by time. Shape (N, D) where N is the length of the trajectory, and D is the dimension of the state vector. Should be empty or have dummy values for non-robosuite datasets.
    - actions (dataset) - environment actions, ordered by time. Shape (N, A) where N is the length of the trajectory, and A is the action space dimension.
    - rewards (dataset) - environment rewards, ordered by time. Shape (N,) where N is the length of the trajectory.
    - dones (dataset) - done signal, equal to 1 if playing the corresponding action in the state should terminate the episode. Shape (N,) where N is the length of the trajectory.
    - obs (group) - group for the observation keys. Each key is stored as a dataset.
      - <obs_key_1> (dataset) - the first observation key. Note that the name of this dataset and shape will vary. As an example, the name could be "agentview_image", and the shape could be (N, 84, 84, 3).
      - ...
    - next_obs (group) - group for the next observations.
      - <obs_key_1> (dataset) - the first observation key.
      - ...
  - demo_1 (group) - group for the second trajectory
  - ...
- mask (group) - this group will exist in hdf5 datasets that contain filter keys
  - <filter_key_1> (dataset) - the first filter key. Note that the name of this dataset and length will vary. As an example, this could be the "valid" filter key, and contain the list ["demo_0", "demo_19", "demo_35"], corresponding to 3 validation trajectories.
  - ...
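Since this is plain HDF5, a postprocessed dataset can be browsed directly with h5py. The snippet below is a minimal sketch for inspecting the structure above (the dataset path is hypothetical):

```python
# Minimal sketch: inspect a postprocessed robomimic hdf5 dataset with h5py.
import json
import h5py

with h5py.File("/path/to/postprocessed_dataset.hdf5", "r") as f:  # hypothetical path
    data = f["data"]

    # dataset-level attributes
    print("total samples:", data.attrs["total"])
    env_args = json.loads(data.attrs["env_args"])
    print("env name:", env_args["env_name"])

    # iterate over the first few trajectory groups (demo_0, demo_1, ...)
    for demo_name in list(data.keys())[:3]:
        demo = data[demo_name]
        print(demo_name, "num_samples:", demo.attrs["num_samples"])
        print("  actions shape:", demo["actions"].shape)
        print("  obs keys:", list(demo["obs"].keys()))
```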
Data Conventions
robomimic-compatible datasets expect certain values (such as images and actions) to be formatted a specific way. See the below sections for further details:
Storing images

Warning!

Dataset images should be of type np.uint8 and be stored in channel-last (H, W, C) format. This is because:

(1) this is a common format that many gym environments and all robosuite environments return image observations in
(2) using np.uint8 (vs floats) saves space in dataset storage

Note that the robosuite observation extraction script (dataset_states_to_obs.py) already stores images in the correct format.
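For custom data pipelines, the sketch below shows one way image observations could be checked and converted to this convention before being written to a dataset. It is a minimal sketch, not part of dataset_states_to_obs.py, and the helper name is hypothetical:

```python
# Minimal sketch: coerce an image observation to uint8, channel-last (H, W, C).
import numpy as np

def to_dataset_image(img):
    """Return a copy of img as np.uint8 in channel-last format (hypothetical helper)."""
    img = np.asarray(img)
    # convert channel-first (C, H, W) to channel-last, assuming C is 1, 3, or 4
    if img.ndim == 3 and img.shape[0] in (1, 3, 4) and img.shape[-1] not in (1, 3, 4):
        img = np.transpose(img, (1, 2, 0))
    # convert float images in [0, 1] to uint8 in [0, 255]
    if np.issubdtype(img.dtype, np.floating):
        img = (np.clip(img, 0.0, 1.0) * 255.0).astype(np.uint8)
    assert img.dtype == np.uint8 and img.shape[-1] in (1, 3, 4)
    return img
```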
Storing actions

Warning!

Actions should be normalized between -1 and 1, since this range enables easier policy learning via the use of tanh layers.

The get_dataset_info.py script can be used to sanity check stored actions, and will throw an Exception if there is a violation.
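As a quick illustration of this convention, the sketch below checks the stored actions of every demo against the [-1, 1] range. It is similar in spirit to, but not the same as, get_dataset_info.py, and the dataset path is hypothetical:

```python
# Minimal sketch: verify that all stored actions lie in [-1, 1].
import h5py

with h5py.File("/path/to/postprocessed_dataset.hdf5", "r") as f:  # hypothetical path
    for demo_name in f["data"]:
        actions = f["data"][demo_name]["actions"][()]
        if actions.min() < -1.0 or actions.max() > 1.0:
            raise Exception(
                "{} has actions outside [-1, 1]: min={}, max={}".format(
                    demo_name, actions.min(), actions.max()
                )
            )
```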
Filter Keys
Filter keys enable arbitrary splitting of a dataset into sub-groups, and allow training on a specific subset of the data.
A common use-case is to split data into train-validation splits. We provide a convenience script for doing this in the robomimic/scripts directory:
$ python split_train_val.py --dataset /path/to/dataset.hdf5 --ratio 0.1 --filter_key <FILTER_KEY_NAME>
- --dataset specifies the path to the hdf5 dataset
- --ratio specifies the amount of validation data you would like to create. In the example above, 10% of the demonstrations will be put into the validation group.
- --filter_key (optional) By default, this script splits all demonstration keys in the hdf5 into 2 new hdf5 groups - one under mask/train, and one under mask/valid. If this argument is provided, the demonstration keys corresponding to this filter key (under mask/<FILTER_KEY_NAME>) will be split into 2 groups - mask/<FILTER_KEY_NAME>_train and mask/<FILTER_KEY_NAME>_valid.
Note!

You can easily list the filter keys present in a dataset with the get_dataset_info.py script (see this link), and you can even pass a --verbose flag to list the exact demonstrations that each filter key corresponds to.
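You can also verify a split programmatically by reading the filter keys directly with h5py, as in this minimal sketch (the dataset path is hypothetical, and it assumes the default mask/train and mask/valid groups created above):

```python
# Minimal sketch: read the train-valid split back from the mask group.
import h5py

with h5py.File("/path/to/dataset.hdf5", "r") as f:  # hypothetical path
    # filter keys are stored as datasets of demo names under the "mask" group
    train_demos = [elem.decode("utf-8") for elem in f["mask/train"][:]]
    valid_demos = [elem.decode("utf-8") for elem in f["mask/valid"][:]]
    print("train demos:", len(train_demos))
    print("valid demos:", len(valid_demos))
```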
Using filter keys during training is easy. To use the generated train-valid split, you can set config.experiment.validate=True to ensure that validation will run after each training epoch, and then set config.train.hdf5_filter_key="train" and config.train.hdf5_validation_filter_key="valid" so that the demos under mask/train are used for training, and the demos under mask/valid are used for validation.
You can also use a custom filter key for training by setting config.train.hdf5_filter_key=<FILTER_KEY_NAME>. This ensures that only the demos under mask/<FILTER_KEY_NAME> are used during training. You can also specify a custom filter key for validation by setting config.train.hdf5_validation_filter_key.
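For reference, the same filter-key settings can be applied when building a config in Python. This is a minimal sketch reusing the config_factory pattern shown earlier, with the train/valid values from this section as example settings:

```python
# Minimal sketch: set filter-key-related config values programmatically.
from robomimic.config import config_factory

config = config_factory(algo_name="bc")

# run validation after each training epoch
config.experiment.validate = True

# train on demos under mask/train, validate on demos under mask/valid
config.train.hdf5_filter_key = "train"
config.train.hdf5_validation_filter_key = "valid"
```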