# robosuite Datasets

The repository is fully compatible with datasets collected using [robosuite](https://robosuite.ai/). See [this link](https://robosuite.ai/docs/algorithms/demonstrations.html) for more information on collecting your own human demonstrations using robosuite. 

## Converting robosuite hdf5 datasets

The raw `demo.hdf5` file generated by the `collect_human_demonstrations.py` robosuite script can easily be modified in-place to be compatible with **robomimic**:

```sh
$ python conversion/convert_robosuite.py --dataset /path/to/demo.hdf5
```

<div class="admonition info">
<p class="admonition-title">Post-Processed Dataset Structure</p>

This post-processed `demo.hdf5` file is still _missing_ observations (e.g. proprioception, images), rewards, and dones, which are necessary for training policies.

However, keeping these observation-free datasets is useful because it **allows flexibility in [extracting](robosuite.md#extracting-observations-from-mujoco-states) different kinds of observations and rewards**.

<details>
  <summary><b>Dataset Structure <span style="color:red;">(click to expand)</span></b></summary>
<p>

- `data` (group)

  - `total` (attribute) - number of state-action samples in the dataset

  - `env_args` (attribute) - a json string that contains metadata on the environment and relevant arguments used for collecting data

  - `demo_0` (group) - group for the first demonstration (every demonstration has a group)

    - `num_samples` (attribute) - the number of state-action samples in this trajectory
    - `model_file` (attribute) - the xml string corresponding to the MJCF MuJoCo model
    - `states` (dataset) - flattened raw MuJoCo states, ordered by time
    - `actions` (dataset) - environment actions, ordered by time

  - `demo_1` (group) - group for the second demonstration

    ...
</p>
</details>

</div>
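
As a quick sanity check after conversion, you can walk the file with `h5py` and confirm it matches the structure above. The sketch below is illustrative only; the dataset path is a placeholder.

```python
import h5py

# Placeholder path -- point this at your converted demo.hdf5.
dataset_path = "/path/to/demo.hdf5"

with h5py.File(dataset_path, "r") as f:
    data = f["data"]
    print("total samples:", data.attrs["total"])
    print("env_args:", data.attrs["env_args"])
    for demo_name in list(data.keys())[:3]:  # peek at the first few demos
        demo = data[demo_name]
        print(
            demo_name,
            "num_samples =", demo.attrs["num_samples"],
            "| states", demo["states"].shape,
            "| actions", demo["actions"].shape,
        )
```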


Next, we will extract observations from this raw dataset.


## Extracting Observations from MuJoCo states

<div class="admonition warning">
<p class="admonition-title">Warning! Train-Validation Data Splits</p>

For robosuite datasets, if you are using your own [train-val splits](overview.md#filter-keys), generate these splits _before_ extracting observations. This ensures that all post-processed hdf5s generated from `demo.hdf5` inherit the same filter keys.

</div>
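
If you have already created filter keys in `demo.hdf5`, a quick check along these lines can confirm they are present before extraction. This is a sketch that assumes filter keys are stored as lists of demo names under a `mask/` group, as in robomimic-formatted hdf5s; the path is a placeholder.

```python
import h5py

# Placeholder path -- the converted demo.hdf5, after adding your filter keys.
dataset_path = "/path/to/demo.hdf5"

with h5py.File(dataset_path, "r") as f:
    # Assumption: filter keys (e.g. train / valid) live under a "mask" group.
    if "mask" in f:
        for key in f["mask"]:
            demos = [name.decode("utf-8") for name in f["mask"][key][:]]
            print(f"filter key '{key}': {len(demos)} demos")
    else:
        print("no filter keys found -- create your splits before extraction")
```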

Generating observations from a dataset is straightforward and can be done with a single command from `robomimic/scripts`:

```sh
# For low dimensional observations only, with done on task success
$ python dataset_states_to_obs.py --dataset /path/to/demo.hdf5 --output_name low_dim.hdf5 --done_mode 2

# For including image observations
$ python dataset_states_to_obs.py --dataset /path/to/demo.hdf5 --output_name image.hdf5 --done_mode 2 --camera_names agentview robot0_eye_in_hand --camera_height 84 --camera_width 84

# For including depth observations too
$ python dataset_states_to_obs.py --dataset /path/to/demo.hdf5 --output_name depth.hdf5 --done_mode 2 --camera_names agentview robot0_eye_in_hand --camera_height 84 --camera_width 84 --depth

# Using dense rewards
$ python dataset_states_to_obs.py --dataset /path/to/demo.hdf5 --output_name image_dense.hdf5 --done_mode 2 --dense --camera_names agentview robot0_eye_in_hand --camera_height 84 --camera_width 84

# (space saving option) extract 84x84 image observations with compression and without 
# extracting next obs (not needed for pure imitation learning algos)
$ python dataset_states_to_obs.py --dataset /path/to/demo.hdf5 --output_name image.hdf5 \
    --done_mode 2 --camera_names agentview robot0_eye_in_hand --camera_height 84 --camera_width 84 \
    --compress --exclude-next-obs

# Only writing done at the end of the trajectory
$ python dataset_states_to_obs.py --dataset /path/to/demo.hdf5 --output_name image_done_1.hdf5 --done_mode 1 --camera_names agentview robot0_eye_in_hand --camera_height 84 --camera_width 84

# For seeing descriptions of all the command-line args available
$ python dataset_states_to_obs.py --help
```
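
Each extracted hdf5 now contains per-demo observations, rewards, and dones alongside the original states and actions. A hedged sketch for spot-checking the result (the observation key names depend on the flags you passed, and the path is a placeholder):

```python
import h5py

# Placeholder path -- an hdf5 produced by dataset_states_to_obs.py.
dataset_path = "/path/to/low_dim.hdf5"

with h5py.File(dataset_path, "r") as f:
    demo = f["data/demo_0"]
    # Observation modalities actually written depend on the extraction flags.
    print("obs keys:", list(demo["obs"].keys()))
    print("rewards:", demo["rewards"].shape)
    print("dones:", demo["dones"].shape)
    # Which timesteps are marked done depends on the --done_mode you chose.
    print("num done timesteps:", int(demo["dones"][:].sum()))
```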

<div class="admonition tip">
<p class="admonition-title">Saving storage space</p>

Image datasets can be large, so the extraction script offers two flags to save storage. First, the `--compress` flag runs lossless compression on the extracted observations, producing datasets that are up to 5x smaller in our testing, at the cost of marginally slower training due to decompression when loading batches. Second, the `--exclude-next-obs` flag omits the `next_obs` keys per trajectory, since they are not needed for imitation learning algorithms like `BC` and `BC-RNN`.

In our testing, enabling both flags reduced the Square (PH) Image dataset size from 2.5 GB to 307 MB at the cost of increasing BC-RNN training time from 7 hours to 8.5 hours.
</div>
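
To confirm the space-saving flags took effect, you can check that `next_obs` is absent and that the image datasets report a compression filter (h5py exposes this through each dataset's `compression` attribute). The `agentview_image` key is just an example matching the commands above, and the path is a placeholder.

```python
import h5py

# Placeholder path -- an hdf5 extracted with --compress and --exclude-next-obs.
dataset_path = "/path/to/image.hdf5"

with h5py.File(dataset_path, "r") as f:
    demo = f["data/demo_0"]
    print("has next_obs:", "next_obs" in demo)     # expect False with --exclude-next-obs
    img = demo["obs/agentview_image"]              # example key for the agentview camera
    print("compression filter:", img.compression)  # e.g. "gzip" when --compress was used
```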

## Citation
```bibtex
@article{zhu2020robosuite,
  title={robosuite: A modular simulation framework and benchmark for robot learning},
  author={Zhu, Yuke and Wong, Josiah and Mandlekar, Ajay and Mart{\'\i}n-Mart{\'\i}n, Roberto and Joshi, Abhishek and Nasiriany, Soroush and Zhu, Yifeng},
  journal={arXiv preprint arXiv:2009.12293},
  year={2020}
}
```