
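imitation_learning: PyTorch implementations of some reinforcement learning algorithms: Advantage Actor-Critic (A2C), Proximal Policy Optimization (PPO), V-MPO, Behavior Cloning (BC). More algorithms will be added (source code)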

Uploader: weixin_42128015 | Upload time: 2016/4/5 15:54:46 | File size: 11.42MB | File type: ZIP
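Imitation learning

This repo contains simple PyTorch implementations of some reinforcement learning algorithms:

  • Advantage Actor-Critic (A2C): the synchronous variant of A3C
  • Proximal Policy Optimization (PPO): one of the most popular RL algorithms
  • On-policy Maximum a Posteriori Policy Optimization (V-MPO): the algorithm DeepMind used in their recent work (not working yet...)
  • Behavior Cloning (BC): a simple technique for cloning some expert behavior into a new policy

Each algorithm supports vector/image/dict observation spaces and discrete/continuous action spaces.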
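To make the behavior cloning idea above concrete, here is a minimal sketch that fits a policy to expert (observation, action) pairs by supervised learning. It is not the repo's code: it uses only the Python standard library rather than PyTorch, and the names `bc_train`, `act`, and the toy "expert" rule are assumptions made for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bc_train(demos, n_features, n_actions, lr=0.5, epochs=200):
    """Fit a linear softmax policy to expert (observation, action) pairs.

    Minimizes the negative log-likelihood of the expert's actions with
    plain per-sample gradient descent (hypothetical helper, not repo code).
    """
    # weights[a][j] is the weight of feature j for action a
    weights = [[0.0] * n_features for _ in range(n_actions)]
    for _ in range(epochs):
        for obs, expert_action in demos:
            logits = [sum(w * x for w, x in zip(row, obs)) for row in weights]
            probs = softmax(logits)
            for a in range(n_actions):
                # d(-log pi(expert_action | obs)) / d(logit_a) = probs[a] - one_hot[a]
                grad = probs[a] - (1.0 if a == expert_action else 0.0)
                for j in range(n_features):
                    weights[a][j] -= lr * grad * obs[j]
    return weights


def act(weights, obs):
    """Greedy action of the cloned policy."""
    logits = [sum(w * x for w, x in zip(row, obs)) for row in weights]
    return max(range(len(logits)), key=logits.__getitem__)


# Toy "expert" (an assumption): choose action 1 when the angle is positive, else 0.
random.seed(0)
demos = []
for _ in range(50):
    angle = random.uniform(-1.0, 1.0)
    demos.append(([angle, 1.0], 1 if angle > 0 else 0))  # second feature is a bias term

weights = bc_train(demos, n_features=2, n_actions=2)
```

After training, `act(weights, [0.5, 1.0])` and `act(weights, [-0.5, 1.0])` should reproduce the expert's choices for clearly positive and clearly negative angles. A PyTorch version would follow the same pattern, with a network in place of the weight table and a cross-entropy loss.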
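Why is the repo called "Imitation Learning"?

When I started this project and created the repo, I thought imitation learning would be my main focus, with model-free methods used only at the beginning to train the "experts".
However, the PPO implementation (and its tricks) took more time than I expected.
As a result, most of the code is now related to PPO, but I am still interested in imitation learning and plan to add some related algorithms.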
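For readers unfamiliar with PPO, its core is the clipped surrogate objective. The sketch below shows that objective alone, in plain Python; it is not taken from this repo, the function name `ppo_clip_loss` and the default `clip_eps=0.2` are assumptions, and none of the implementation "tricks" mentioned above are included.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate loss over a batch (to be minimized).

    new_log_probs: log pi_new(a|s) under the policy being optimized
    old_log_probs: log pi_old(a|s) under the policy that collected the data
    advantages:    advantage estimates for the same (s, a) pairs
    """
    losses = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new / pi_old, computed in log space
        ratio = math.exp(new_lp - old_lp)
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic (minimum) of the unclipped and clipped objectives
        surrogate = min(ratio * adv, clipped_ratio * adv)
        losses.append(-surrogate)
    return sum(losses) / len(losses)
```

With a positive advantage, the objective stops rewarding the policy once the ratio exceeds `1 + clip_eps`; with a negative advantage, it stops once the ratio falls below `1 - clip_eps`. This is what keeps each update close to the data-collecting policy.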
Current functionality

For now, this repo contains implementations of some model-free on-policy algorithms: A2C, PPO, V-MPO, and BC.
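Each algorithm supports discrete (Categorical, Bernoulli, GumbelSoftmax) and continuous (Beta, Normal, tanh(Normal)) policy distributions, as well as vector or image observation environments.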
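As an illustration of the tanh(Normal) distribution listed above: an action is obtained by squashing a Normal sample through tanh, and its log-density needs a change-of-variables correction. The following is a minimal standard-library sketch under that definition, not the repo's `distributions.py`; all function names here are assumptions.

```python
import math
import random


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def normal_log_prob(u, mean, std):
    """Log-density of Normal(mean, std) at u."""
    z = (u - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def tanh_normal_log_prob(u, mean, std):
    """Log-density of a = tanh(u), where u ~ Normal(mean, std).

    Change of variables: log p(a) = log N(u) - log(1 - tanh(u)^2),
    with the stable identity log(1 - tanh(u)^2) = 2 * (log 2 - u - softplus(-2u)).
    """
    log_jacobian = 2.0 * (math.log(2.0) - u - softplus(-2.0 * u))
    return normal_log_prob(u, mean, std) - log_jacobian


def tanh_normal_sample(mean, std, rng=random):
    """Draw an action in (-1, 1) and return it with its log-probability."""
    u = rng.gauss(mean, std)
    return math.tanh(u), tanh_normal_log_prob(u, mean, std)


random.seed(0)
action, log_prob = tanh_normal_sample(mean=0.0, std=1.0)
```

The log-probability is computed from the pre-squash value `u` rather than by inverting tanh on the action, which avoids numerical trouble when the action is very close to -1 or 1. Bounded actions of this kind suit environments whose action spaces are limited to a fixed range.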
Beta and tanh(Normal) worked best in my experiments (on BipedalWalker and Huma
Software ID: 14988612


Resource details

(41 files, 11.42MB)

imitation_learning-master/
    utils/
        vec_env.py (11.44KB)
        batch_crop.py (773B)
        utils.py (816B)
        __init__.py (0B)
        env_wrappers.py (8.27KB)
    algorithms/
        agents/
            v_mpo.py (7.77KB)
            bc.py (3.01KB)
            ppo.py (7.62KB)
            agent_train.py (7.81KB)
            a2c.py (2.69KB)
        kl_divergence.py (1.56KB)
        real_nvp.py (6.71KB)
        nn/
            conv_encoders.py (3.25KB)
            recurrent_encoders.py (2.48KB)
            actor_critic.py (3.31KB)
            agent_model.py (5.47KB)
        normalization.py (2.82KB)
        distributions.py (9.36KB)
    test.py (6.31KB)
    requirements.txt (92B)
    trainers/
        base_trainer.py (2.29KB)
        rollout.py (7.51KB)
        on_policy.py (9.09KB)
        behavior_cloning.py (1.51KB)
    train_scripts/
        bc/
            cart_pole_10_episodes.py (1.62KB)
        ppo/
            bipedal_rnn.py (2.38KB)
            car_racing.py (2.21KB)
            cart_pole.py (1.69KB)
            bipedal_hardcore.py (2.58KB)
            bipedal.py (1.80KB)
            humanoid.py (1.94KB)
            cart_pole_rnn.py (2.16KB)
        a2c/
            cart_pole.py (1.61KB)
            cart_pole_rnn.py (2.13KB)
    .gitignore (90B)
    gifs/
        cartpole.gif (84.82KB)
        car_racing.gif (5.93MB)
        humanoid.gif (3.67MB)
        bipedal.gif (1.78MB)
    custom_environments/
        mario_wrapper.py (1.42KB)
    readme.md (7.00KB)

