# robosuite Datasets

The repository is fully compatible with datasets collected using [robosuite](https://robosuite.ai/). See [this link](https://robosuite.ai/docs/algorithms/demonstrations.html) for more information on collecting your own human demonstrations using robosuite.

## Converting robosuite hdf5 datasets

The raw `demo.hdf5` file generated by the `collect_human_demonstrations.py` robosuite script can easily be modified in-place to be compatible with **robomimic**:

```sh
$ python conversion/convert_robosuite.py --dataset /path/to/demo.hdf5
```

### Post-Processed Dataset Structure

This post-processed `demo.hdf5` file in its current state is _missing_ observations (e.g., proprioception, images, ...), rewards, and dones, which are necessary for training policies. However, keeping these observation-free datasets is useful because it **allows flexibility in [extracting](robosuite.md#extracting-observations-from-mujoco-states) different kinds of observations and rewards**.

- `data` (group)
  - `total` (attribute) - number of state-action samples in the dataset
  - `env_args` (attribute) - a json string that contains metadata on the environment and relevant arguments used for collecting data
  - `demo_0` (group) - group for the first demonstration (every demonstration has a group)
    - `num_samples` (attribute) - the number of state-action samples in this trajectory
    - `model_file` (attribute) - the xml string corresponding to the MJCF MuJoCo model
    - `states` (dataset) - flattened raw MuJoCo states, ordered by time
    - `actions` (dataset) - environment actions, ordered by time
  - `demo_1` (group) - group for the second demonstration
    ...
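
As a rough illustration of how this layout can be traversed, the sketch below summarizes a `data` group. `summarize_demos` is a hypothetical helper, not part of robomimic. It assumes the object passed in behaves like an `h5py` group (mapping-style access plus an `.attrs` mapping) and that the file follows the structure listed above.

```python
import json


def summarize_demos(data_group):
    """Summarize a `data` group laid out as described above.

    `data_group` is assumed to behave like an h5py Group: it maps
    demo names to sub-groups and exposes metadata through `.attrs`.
    This helper is illustrative only and is not part of robomimic.
    """
    # `env_args` is stored as a json string, so decode it first.
    env_args = json.loads(data_group.attrs["env_args"])

    # Sort demo_0, demo_1, ..., demo_10 numerically rather than lexically.
    demo_names = sorted(
        (name for name in data_group.keys() if name.startswith("demo_")),
        key=lambda name: int(name.split("_")[1]),
    )

    demos = {}
    for name in demo_names:
        demo = data_group[name]
        demos[name] = {
            "num_samples": int(demo.attrs["num_samples"]),
            "states_shape": tuple(demo["states"].shape),
            "actions_shape": tuple(demo["actions"].shape),
        }

    return {
        "total": int(data_group.attrs["total"]),
        "env_args": env_args,
        "demos": demos,
    }


# Possible usage, assuming h5py is installed and the path exists:
#
#   import h5py
#   with h5py.File("/path/to/demo.hdf5", "r") as f:
#       summary = summarize_demos(f["data"])
#       print(summary["total"], list(summary["demos"]))
```

Because the helper only reads attributes and dataset shapes, it does not load the `states` or `actions` arrays into memory.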

### Warning: Train-Validation Data Splits

For robosuite datasets, if you use your own [train-val splits](overview.md#filter-keys), generate these splits _before_ extracting observations. This ensures that every post-processed hdf5 file generated from `demo.hdf5` inherits the same filter keys.

### Saving storage space

Image datasets can be quite large, so we offer two flags that help save storage. First, the `--compress` flag runs lossless compression on the extracted observations, resulting in datasets that are up to 5x smaller (in our testing). However, training will be marginally slower due to decompression costs when loading batches. Second, the `--exclude-next-obs` flag excludes the `next_obs` keys from each trajectory, since they are not needed for imitation learning algorithms like `BC` and `BC-RNN`. In our testing, enabling both flags reduced the Square (PH) Image dataset size from 2.5 GB to 307 MB, at the cost of increasing BC-RNN training time from 7 hours to 8.5 hours.
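
To put the reported numbers in perspective, the short calculation below derives the relative savings and slowdown. It is only arithmetic on the figures quoted above; the function name `storage_tradeoff` is hypothetical, and the conversion of 2.5 GB to 2500 MB (1 GB = 1000 MB) is an assumption.

```python
def storage_tradeoff(size_before_mb, size_after_mb, hours_before, hours_after):
    """Return relative storage savings and training slowdown."""
    return {
        # How many times smaller the dataset became.
        "size_reduction_factor": size_before_mb / size_after_mb,
        # Fraction of the original storage that was saved.
        "storage_saved_fraction": 1.0 - size_after_mb / size_before_mb,
        # Fractional increase in training time.
        "training_slowdown_fraction": hours_after / hours_before - 1.0,
    }


# Square (PH) Image dataset with both flags enabled, as reported above.
# Assumes 1 GB = 1000 MB for the 2.5 GB figure.
stats = storage_tradeoff(2500.0, 307.0, 7.0, 8.5)

for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the dataset is roughly 8x smaller (about 88% less storage) in exchange for about 21% more BC-RNN training time.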