# D4RL

## Overview

The [D4RL](https://arxiv.org/abs/2004.07219) benchmark provides a set of locomotion tasks and demonstration datasets.

## Downloading

Use `convert_d4rl.py` in the `scripts/conversion` folder to automatically download and postprocess the D4RL dataset in a single step. For example:

```sh
# by default, download to robomimic/datasets
$ python convert_d4rl.py --env walker2d-medium-expert-v2
# download to specific folder
$ python convert_d4rl.py --env walker2d-medium-expert-v2 --folder /path/to/output/folder/
```

- `--env` specifies the dataset to download
- `--folder` specifies where you want to download the dataset. If no folder is provided, the `datasets` folder at the top-level of the repository will be used.

The script downloads the raw hdf5 dataset to `--folder`, and writes the converted dataset, which is compatible with this repository, into the `converted` subfolder.

## Postprocessing

No postprocessing is required, assuming the above script is run!

## D4RL Results

Below, we provide a table of results on common D4RL datasets using the algorithms included in the released codebase. We follow the convention in the TD3-BC paper and average results over the final 10 rollout evaluations, but we use 50 rollouts per evaluation instead of 10. All results are reported on the `-v2` environment variants. Apart from a small handful of the halfcheetah results, the results align with those presented in the [TD3-BC paper](https://arxiv.org/abs/2106.06860). We suspect the halfcheetah results differ because our evaluations used `mujoco-py` version `2.1.2.14` rather than `1.5`; we made this choice to be consistent with the version we were using for robosuite datasets. The results below were generated with `gym` version `0.24.1` and this `d4rl` [commit](https://github.com/Farama-Foundation/D4RL/tree/305676ebb2e26582d50c6518c8df39fd52dea587).

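
Each table entry pairs a percentage with a raw return in parentheses. The percentages appear to follow the usual D4RL normalized-score convention, `100 * (return - random) / (expert - random)`. The sketch below illustrates that conversion, along with the "average over the final 10 evaluations" step described above. The reference returns in `REF_SCORES` are an assumption recalled from the D4RL package and are not stated on this page, and the function names are hypothetical helpers rather than part of this repository.

```python
# Assumed (random, expert) reference returns per environment family.
# These values are recalled from the D4RL package, not taken from this page.
REF_SCORES = {
    "halfcheetah": (-280.178953, 12135.0),
    "hopper": (-20.272305, 3234.3),
    "walker2d": (1.629008, 4592.3),
}


def normalized_score(env, raw_return):
    """Map a raw return to a 0-100 scale where random is 0 and expert is 100."""
    random_score, expert_score = REF_SCORES[env]
    return 100.0 * (raw_return - random_score) / (expert_score - random_score)


def final_average(eval_returns, last_n=10):
    """Average the last `last_n` evaluation results.

    Each element of `eval_returns` is assumed to already be the mean return
    over the rollouts of one evaluation (50 rollouts in the setup above).
    """
    tail = eval_returns[-last_n:]
    return sum(tail) / len(tail)


if __name__ == "__main__":
    # Raw return taken from the HalfCheetah-Medium / BCQ cell of the table.
    print(f"{normalized_score('halfcheetah', 5535):.1f}%")
```

Under these assumed reference values, the raw returns in the Medium rows reproduce the listed percentages to one decimal place, which suggests the table was computed the same way.
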

|                               | **BCQ**       | **CQL**       | **TD3-BC**    | **IQL**       |
| ----------------------------- | ------------- | ------------- | ------------- | ------------- |
| **HalfCheetah-Medium**        | 46.8% (5535)  | 46.7% (5516)  | 47.9% (5664)  | 45.6% (5379)  |
| **Hopper-Medium**             | 63.9% (2059)  | 59.2% (1908)  | 61.0% (1965)  | 53.7% (1729)  |
| **Walker2d-Medium**           | 74.6% (3426)  | 79.7% (3659)  | 82.9% (3806)  | 77.0% (3537)  |
| **HalfCheetah-Medium-Expert** | 89.9% (10875) | 77.6% (9358)  | 92.1% (11154) | 89.0% (10773) |
| **Hopper-Medium-Expert**      | 79.5% (2566)  | 62.9% (2027)  | 89.7% (2900)  | 110.1% (3564) |
| **Walker2d-Medium-Expert**    | 98.7% (4535)  | 109.0% (5007) | 111.1% (5103) | 109.7% (5037) |
| **HalfCheetah-Expert**        | 92.9% (11249) | 67.7% (8126)  | 94.6% (11469) | 93.3% (11304) |
| **Hopper-Expert**             | 92.3% (2984)  | 104.2% (3370) | 108.5% (3512) | 110.5% (3577) |
| **Walker2d-Expert**           | 108.6% (4987) | 108.5% (4983) | 110.3% (5066) | 109.1% (5008) |

### Reproducing D4RL Results

In order to reproduce the results above, first make sure that the `generate_paper_configs.py` script has been run, where the `--dataset_dir` argument is consistent with the folder where the D4RL datasets were downloaded using the `convert_d4rl.py` script. This is also the first step for reproducing results on the released robot manipulation datasets. The `--config_dir` directory used in the script (`robomimic/exps/paper` by default) will contain a `d4rl.sh` script, and a `d4rl` subdirectory that contains all the json configs. The table results above can be generated simply by running the training commands in the shell script.

## Citation

```bibtex
@article{fu2020d4rl,
  title={D4rl: Datasets for deep data-driven reinforcement learning},
  author={Fu, Justin and Kumar, Aviral and Nachum, Ofir and Tucker, George and Levine, Sergey},
  journal={arXiv preprint arXiv:2004.07219},
  year={2020}
}
```