Reinforcement Learning (2021)-04

Textbook
Course Contents
1. Introduction to RL
2. Making Sequences of Good Decisions Given a Model of the World
3. Model-Free Policy Evaluation and Control
4. Value Function Approximation and Deep Q Learning
5. Deep Q Learning Variants
6. Policy Gradient I
7. Policy Gradient II
Recall: Reinforcement Learning Involves
–Optimization
–Delayed consequences
–Exploration
–Generalization
Today: Focus on Generalization
–Optimization
–Delayed consequences
–Exploration
–Generalization
Value Function Approximation (VFA)
Represent a (state-action/state) value function with a parameterized function instead of a table
Motivation for VFA
Don't want to have to explicitly store or learn, for every single state, a
–Dynamics or reward model
–Value
–State-action value
–Policy
Want a more compact representation that generalizes across states, or across states and actions
Benefits of Generalization
Reduce memory needed to store (P,R)/V/Q/π
Reduce computation needed to compute (P,R)/V/Q/π
Reduce experience needed to find a good (P,R)/V/Q/π
Value Function Approximation (VFA)
Represent a (state-action/state) value function with a parameterized function instead of a table
Which function approximator?
Function Approximators
Many possible function approximators, including
–Linear combinations of features
–Neural networks
–Decision trees
–Nearest neighbors
–Fourier/wavelet bases
In this class we will focus on function approximators that are differentiable (Why?)
Two very popular classes of differentiable function approximators
–Linear feature representations
–Neural networks
Recall: Stochastic Gradient Descent
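The formulas on this recall slide are not reproduced in this outline; as a reminder (standard notation, not quoted from the slides), the idea is to minimize a mean-squared objective and step along the gradient evaluated on a single sample:

J(w) = \mathbb{E}_\pi\big[(V^\pi(s) - \hat{V}(s; w))^2\big]
\Delta w = \alpha\,(V^\pi(s) - \hat{V}(s; w))\,\nabla_w \hat{V}(s; w)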
Model-Free VFA Prediction / Policy Evaluation
Recall model-free policy evaluation (Lecture 3)
–Following a fixed policy π (or had access to prior data)
–Goal is to estimate V^π and/or Q^π
Maintained a lookup table to store estimates of V^π and/or Q^π
Updated these estimates after each episode (Monte Carlo methods) or after each step (TD methods)
Now: in value function approximation, change the estimate update step to include fitting the function approximator
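Concretely, the standard sampled updates (a recap in the usual notation, with G_t the Monte Carlo return; not quoted from the slides) become:

Monte Carlo: \Delta w = \alpha\,(G_t - \hat{V}(s_t; w))\,\nabla_w \hat{V}(s_t; w)
TD(0): \Delta w = \alpha\,(r_t + \gamma \hat{V}(s_{t+1}; w) - \hat{V}(s_t; w))\,\nabla_w \hat{V}(s_t; w)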
State Action Value Function Approximation
Incremental Model-Free Control Approaches
RL with Function Approximator
Linear value function approximators assume the value function is a weighted combination of a set of features, where each feature is a function of the state
Linear VFAs often work well given the right set of features, but can require carefully hand-designing that feature set
An alternative is to use a much richer function approximation class that is able to go directly from states without requiring an explicit specification of features
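For illustration, here is a minimal sketch (not from the slides) of TD(0) policy evaluation with a linear VFA; it assumes a gym-style environment interface and a user-supplied, hand-designed feature_fn:

import numpy as np

def td0_linear_evaluation(env, policy, feature_fn, num_features,
                          alpha=0.01, gamma=0.99, num_episodes=1000):
    # Linear VFA: V_hat(s; w) = w . x(s), with x(s) = feature_fn(s)
    w = np.zeros(num_features)
    for _ in range(num_episodes):
        state, done = env.reset(), False
        while not done:
            action = policy(state)
            next_state, reward, done, _ = env.step(action)
            x = feature_fn(state)
            # Bootstrapped TD target; do not bootstrap from terminal states
            target = reward + (0.0 if done else gamma * np.dot(w, feature_fn(next_state)))
            td_error = target - np.dot(w, x)
            # For a linear VFA, the gradient of V_hat w.r.t. w is just the feature vector
            w += alpha * td_error * x
            state = next_state
    return w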
Local representations, including kernel-based approaches, have some appealing properties (including convergence results in certain cases), but typically can't scale well to enormous spaces and datasets
Deep Neural Networks (DNN)
Uses distributed representations instead of local representations
Universal function approximator
Can potentially need exponentially fewer nodes/parameters (compared to a shallow net) to represent the same function
Can learn the parameters using stochastic gradient descent
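As a concrete (illustrative, not from the slides) example of such a network, a small fully connected state-action value network in PyTorch, with layer sizes chosen arbitrarily:

import torch.nn as nn

class QNetwork(nn.Module):
    # Small MLP mapping a state vector to one Q-value per discrete action
    def __init__(self, state_dim, num_actions, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_actions),
        )

    def forward(self, state):
        return self.net(state)  # shape: (batch_size, num_actions)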
Why Do We Care About CNNs?
CNNs extensively used in computer vision
If we want to go from pixels to decisions, likely useful to leverage insights for visual input
Deep Reinforcement Learning
•Use deep neural networks to represent
•Value function
•Policy
•Model
•Optimize loss function by stochastic gradient descent (SGD)
Deep Q-Networks (DQNs)
Recall: Incremental Model-Free Control Approaches
Using these ideas to do Deep RL in Atari
DQNs in Atari
Q-Learning with Value Function Approximation
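The corresponding update (standard semi-gradient Q-learning with function approximation; a recap rather than a verbatim slide, where the bootstrapped max target is treated as a constant) is:

\Delta w = \alpha\,\big(r + \gamma \max_{a'} \hat{Q}(s', a'; w) - \hat{Q}(s, a; w)\big)\,\nabla_w \hat{Q}(s, a; w)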
DQNs: Experience Replay
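A minimal sketch of the idea (a hypothetical helper, not the paper's implementation): store transitions as they are collected and train on randomly sampled minibatches, which breaks the temporal correlations between consecutive samples:

import random
from collections import deque

class ReplayBuffer:
    # Stores (s, a, r, s', done) transitions; random sampling decorrelates updates
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)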
DQNs: Fixed Q-Targets
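A sketch of one gradient step with fixed Q-targets (assumes the QNetwork and ReplayBuffer sketches above; the hyperparameters and MSE loss are illustrative choices, not the paper's exact setup):

import torch
import torch.nn.functional as F

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    # batch tensors: states (B, d), actions (B,) long, rewards (B,), next_states (B, d), dones (B,) float
    states, actions, rewards, next_states, dones = batch
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Fixed Q-targets: bootstrap from a frozen copy of the network
        max_next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * (1.0 - dones) * max_next_q
    loss = F.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Periodically (e.g. every few thousand steps): target_net.load_state_dict(q_net.state_dict())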
DQNs Summary
DQN
Figure: Human-level control through deep reinforcement learning, Mnih et al., 2015.
