References

[Sutton and Barto, 1998]: Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learning: An Introduction. The MIT Press.
[Madgwick et al., 2010]: Madgwick, S., Vaidyanathan, R., and Harrison, A. (2010). An efficient orientation filter for inertial measurement units (IMUs) and magnetic angular rate and gravity (MARG) sensor arrays. Technical report, Department of Mechanical Engineering.
[Kingma and Ba, 2014]: Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization.
[Schulman et al., 2015]: Schulman, J., Moritz, P., Levine, S., Jordan, M., and Abbeel, P. (2015). High-dimensional continuous control using generalized advantage estimation.
[Coumans and Bai, 2016]: Coumans, E. and Bai, Y. (2016). PyBullet, a Python module for physics simulation for games, robotics and machine learning.
[Brockman et al., 2016]: Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., and Zaremba, W. (2016). OpenAI Gym.
[Duan et al., 2016]: Duan, Y., Chen, X., Houthooft, R., Schulman, J., and Abbeel, P. (2016). Benchmarking deep reinforcement learning for continuous control. CoRR.
[Henderson et al., 2017]: Henderson, P., Islam, R., Bachman, P., Pineau, J., Precup, D., and Meger, D. (2017). Deep reinforcement learning that matters.
[Schulman et al., 2017]: Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. CoRR.
[Achiam, 2018]: Achiam, J. (2018). Spinning up in deep reinforcement learning.
[Tan et al., 2018]: Tan, J., Zhang, T., Coumans, E., Iscen, A., Bai, Y., Hafner, D., Bohez, S., and Vanhoucke, V. (2018). Sim-to-real: Learning agile locomotion for quadruped robots.
[Hill et al., 2018]: Hill, A., Raffin, A., Ernestus, M., Gleave, A., Kanervisto, A., Traore, R., Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., and Wu, Y. (2018). Stable Baselines.
[Raffin, 2018]: Raffin, A. (2018). RL Baselines Zoo.
[Akiba et al., 2019]: Akiba, T., Sano, S., Yanase, T., Ohta, T., and Koyama, M. (2019). Optuna: A next-generation hyperparameter optimization framework. CoRR.