Value Iteration and Policy Iteration

1. Value iteration algorithm

2. Policy iteration algorithm

3. Truncated policy iteration algorithm

References

This article is a set of study notes; all of its content is drawn from the following video:

https://www.bilibili.com/video/BV1Pz5C6iE3X/?p=6&spm_id_from=333.1007.top_right_bar_window_history.content.click&vd_source=44ed90827c8f67247cab0ab288133c80