1. epsilon-greedy strategy
11111
2. UCB strategy
222
3. Softmax strategy
333
4. Gradient strategy
444
References
1\] [科学网---【RL系列】Multi-Armed Bandit笔记------Softmax选择策略 - 管金昱的博文](https://blog.sciencenet.cn/home.php?mod=space&uid=3189881&do=blog&quickforward=1&id=1122463 "科学网—【RL系列】Multi-Armed Bandit笔记——Softmax选择策略 - 管金昱的博文") \[2\] [The Epsilon-Greedy Algorithm \| James D. McCaffrey](https://jamesmccaffrey.wordpress.com/2017/11/30/the-epsilon-greedy-algorithm/ "The Epsilon-Greedy Algorithm | James D. McCaffrey")