# D4RL
## Overview
The D4RL benchmark provides a set of locomotion tasks and demonstration datasets.
## Downloading
Use `convert_d4rl.py` in the `scripts/conversion` folder to automatically download and postprocess the D4RL dataset in a single step. For example:
```sh
# by default, download to robomimic/datasets
$ python convert_d4rl.py --env walker2d-medium-expert-v2

# download to specific folder
$ python convert_d4rl.py --env walker2d-medium-expert-v2 --folder /path/to/output/folder/
```
- `--env` specifies the dataset to download
- `--folder` specifies where you want to download the dataset. If no folder is provided, the `datasets` folder at the top level of the repository will be used.
The script will download the raw hdf5 dataset to `--folder`, and write the converted dataset that is compatible with this repository into the `converted` subfolder.
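
As a quick sanity check after conversion, you can open the converted file and list its contents. Below is a minimal sketch, assuming `h5py` is installed and that the file follows the standard robomimic hdf5 layout (a top-level `data` group with one group per trajectory); the path shown is hypothetical and depends on where the dataset was downloaded.

```python
# Minimal sketch (assumes h5py is installed): peek inside a converted dataset.
# The path below is illustrative -- point it at the file that convert_d4rl.py
# wrote into the "converted" subfolder of your --folder location.
import h5py

path = "robomimic/datasets/walker2d-medium-expert-v2/converted/walker2d_medium_expert_v2.hdf5"
with h5py.File(path, "r") as f:
    demos = list(f["data"].keys())             # robomimic-format files group trajectories under "data"
    print(f"{len(demos)} trajectories, e.g. {demos[0]}")
    print(list(f[f"data/{demos[0]}"].keys()))  # typically actions, rewards, dones, obs, ...
```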
## Postprocessing
No postprocessing is required, assuming the above script is run!
## D4RL Results
Below, we provide a table of results on common D4RL datasets using the algorithms included in the released codebase. We follow the convention in the TD3-BC paper of averaging results over the final 10 rollout evaluations, but we use 50 rollouts per evaluation instead of 10. All results are reported on the `-v2` environment variants. Apart from a small handful of the HalfCheetah results, the results align with those presented in the TD3-BC paper. We suspect the HalfCheetah results differ because we used `mujoco-py` version `2.1.2.14` in our evaluations, as opposed to `1.5`, in order to be consistent with the version we were using for the robosuite datasets. The results below were generated with `gym` version `0.24.1` and this `d4rl` commit.
| Dataset | BCQ | CQL | TD3-BC | IQL |
| --- | --- | --- | --- | --- |
| HalfCheetah-Medium | 46.8% (5535) | 46.7% (5516) | 47.9% (5664) | 45.6% (5379) |
| Hopper-Medium | 63.9% (2059) | 59.2% (1908) | 61.0% (1965) | 53.7% (1729) |
| Walker2d-Medium | 74.6% (3426) | 79.7% (3659) | 82.9% (3806) | 77.0% (3537) |
| HalfCheetah-Medium-Expert | 89.9% (10875) | 77.6% (9358) | 92.1% (11154) | 89.0% (10773) |
| Hopper-Medium-Expert | 79.5% (2566) | 62.9% (2027) | 89.7% (2900) | 110.1% (3564) |
| Walker2d-Medium-Expert | 98.7% (4535) | 109.0% (5007) | 111.1% (5103) | 109.7% (5037) |
| HalfCheetah-Expert | 92.9% (11249) | 67.7% (8126) | 94.6% (11469) | 93.3% (11304) |
| Hopper-Expert | 92.3% (2984) | 104.2% (3370) | 108.5% (3512) | 110.5% (3577) |
| Walker2d-Expert | 108.6% (4987) | 108.5% (4983) | 110.3% (5066) | 109.1% (5008) |
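
Each table entry reports the D4RL normalized score as a percentage, with the corresponding raw average return in parentheses. As a minimal sketch of how a raw return maps to a normalized score, assuming `gym` and `d4rl` are installed, you can use d4rl's own `get_normalized_score` helper; the value printed below should come out close to the corresponding table entry.

```python
# Minimal sketch: convert a raw rollout return into a D4RL normalized score (as a percentage).
import gym
import d4rl  # noqa: F401  (importing d4rl registers its environments with gym)

env = gym.make("walker2d-medium-expert-v2")
raw_return = 5103.0  # e.g. the TD3-BC Walker2d-Medium-Expert entry from the table above
normalized = env.get_normalized_score(raw_return) * 100.0  # table percentages are normalized scores * 100
print(f"{normalized:.1f}%")
```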
## Reproducing D4RL Results
In order to reproduce the results above, first make sure that the `generate_paper_configs.py` script has been run, with the `--dataset_dir` argument pointing to the folder where the D4RL datasets were downloaded using the `convert_d4rl.py` script. This is also the first step for reproducing results on the released robot manipulation datasets. The `--config_dir` directory used in the script (`robomimic/exps/paper` by default) will contain a `d4rl.sh` script and a `d4rl` subdirectory that contains all the json configs. The table results above can be generated simply by running the training commands in the shell script.
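
Alternatively, the runs can be launched programmatically rather than through `d4rl.sh`. The sketch below is a rough illustration only: the glob pattern and the default config location are assumptions based on the description above, and each `train.py --config ...` invocation mirrors one line of the generated shell script.

```python
# Rough sketch: launch every generated D4RL config with robomimic's train script.
# Assumes the repo layout described above (json configs under robomimic/exps/paper/d4rl/).
import glob
import subprocess

for config_path in sorted(glob.glob("robomimic/exps/paper/d4rl/**/*.json", recursive=True)):
    # each call corresponds to one training command in the generated d4rl.sh script
    subprocess.run(["python", "robomimic/scripts/train.py", "--config", config_path], check=True)
```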
## Citation
```bibtex
@article{fu2020d4rl,
  title={D4rl: Datasets for deep data-driven reinforcement learning},
  author={Fu, Justin and Kumar, Aviral and Nachum, Ofir and Tucker, George and Levine, Sergey},
  journal={arXiv preprint arXiv:2004.07219},
  year={2020}
}
```