MIT OpenCourseWare: Dynamic Programming Lecture (14)

MIT Course List
MIT OCW Networks for Learning Regression and Classification MIT OCW Statistical Learning Theory and Applications MIT OCW Investigating the Neural Substrates of Remote Memory using fMRI MIT OCW Topics in Brain and Cognitive Sciences Human Ethology MIT OCW Computational Cognitive Science MIT OCW Cellular and Molecular Computation MIT OCW Language Acquisition MIT OCW Language Processing MIT OCW Psycholinguistics MIT OCW Language Acquisition I MIT OCW Laboratory in Cognitive Science MIT OCW Introduction to Neural Networks MIT OCW Cognitive Processes MIT OCW Object and Face Recognition MIT OCW Affect Biological, Psychological, and Social Aspects of Feelings MIT OCW Foundations of Cognition MIT OCW Reasonable Conduct in Science MIT OCW Special Topics in Brain and Cognitive Sciences MIT OCW Modularity, Domain specificity, and the Organization of Knowledge MIT OCW Probability and Causality in Human Cognition MIT OCW Cognitive Neuroscience of Remembering Creating and Controlling Memory MIT OCW Language and Mind MIT OCW Media Education and the Marketplace MIT OCW Introduction to Psychology 2001 MIT OCW Introduction to Psychology 2002 MIT OCW Neuroscience and Behavior MIT OCW The Brain and Cognitive Sciences I MIT OCW The Brain and Cognitive Sciences II MIT OCW Brain Laboratory MIT OCW Evolutionary Psychology MIT OCW Introduction to Computational Neuroscience MIT OCW Social Psychology MIT OCW Functional MRI of High Level Vision MIT OCW Foundations of Human Memory and Learning MIT OCW Psychology of Gender MIT OCW Intensive Neuroanatomy MIT OCW Pattern Recognition for Machine Vision MIT OCW Research Topics in Neuroscience MIT OCW Experimental Methods of Adjustable Tetrode Array Neurophysiology MIT OCW Economic History MIT OCW Econometrics Spring MIT OCW Political Economy I Theories of the State and the Economy MIT OCW Political Economy of Western Europe MIT OCW Political Economy of Chinese Reform MIT OCW Political Economy of Latin America

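Overseas open courses and websites of some overseas universities

1. UC Berkeley
University of California, Berkeley /courses.php
As the top-ranked public university in the United States, Berkeley offers podcasts and video lectures by many outstanding professors, and you can follow the latest lectures. To see the homework and lecture notes a professor assigns, go to that professor's web page; he or she usually gives the address in the first class. If that fails, search with Google. Berkeley's videos are all in .rm format, so be prepared to convert them.

2. MIT
Massachusetts Institute of Technology /OcwWeb/web/courses/courses/index.htm
MIT is the pioneer of free, open educational courseware. It plans to put the materials for 1,800 courses on its website this year, with course materials and assignments available as PDF downloads. MIT provides only a small number of video lectures, however. Chinese students have a distinct advantage with MIT: mirror sites have been set up in mainland China and in Taiwan, and they translate MIT's courses into Chinese. Since the materials are PDFs, Foxit Reader is recommended.
(Mainland China, recommended) (Taiwan)

3. Carnegie Mellon
Carnegie Mellon University /oli/
Carnegie Mellon offers course videos in 10 subjects, aimed at students who are just entering university. As with other universities' free courses, people who are not Carnegie Mellon students can take them, but so that learners can keep track of their own progress, Carnegie Mellon suggests that visitors register on the site and create their own profile. In that case you must finish a course within a limited time and sit several exams. Even if you score 100, of course, Carnegie Mellon will not issue a certificate, let alone award credit.

4. Utah
University of Utah /front-page/Courese_listing
Like MIT, the University of Utah provides a large amount of courseware.

5. Tufts
Tufts University is also one of the pioneers of OpenCourseWare. Its initial offerings emphasize the school's strengths: the life sciences, interdisciplinary approaches, international perspectives, and the underlying theory of service to regional and national communities in the United States.

6. The Open University (UK)
The Open University /course/index.php
More than ten British universities joined together to form the Open University. Some courses are open only to registered students, but a number of very good courses are free and include video. Each course also has a forum where people post opinions, share other learning resources, and learn from one another.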

Lectures from Famous Universities: Selected Recommendations

Appendix I. Open courses currently available on VERYCD

(1) MIT OpenCourseWare
Highlights of Calculus; Single Variable Calculus; Multivariable Calculus; Differential Equations (Spring 2004); Linear Algebra; Classical Mechanics; Physics I; Electricity & Magnetism; Vibrations and Waves; Aircraft Systems Engineering; Introduction to Algorithms; MIT Introduction to Computer Science and Programming; Structure and Interpretation of Computer Programs; Introduction to Solid State Chemistry; Biology; Introduction to Biology; Introduction to Bioengineering; Philosophy of Love in the Western World; Gödel, Escher, Bach: A Mental Space Odyssey; Architecture Studio: Building in Landscapes; Philosophy of Film; Feeling and Imagination in Art, Science, and Technology; Introduction to Psychology; Learn Spanish.

(2) Stanford University open courses
Programming Paradigms; Programming Abstractions; iPhone Application Programming; Introduction to Computer Science - Programming Abstractions (C and C++); Programming Methodology; Human-Computer Interaction Seminar; Engineering Everywhere - Machine Learning; Introduction to Robotics; The Fourier Transform and Its Applications; Modern Physics - Cosmology; Modern Physics - Classical Mechanics; Modern Physics - Statistical Mechanics; Modern Physics - Quantum Mechanics; Modern Physics - Quantum Entanglement, Part 1; Modern Physics - Quantum Entanglement, Part 3; Modern Physics - Einstein's Theory (general relativity); Modern Physics - Special Relativity; Introduction to Linear Dynamical Systems; Economics; Business Leaders and Entrepreneurs; Law; Darwin's Legacy; The Future of Human Health: 7 Very Short Talks That Will Blow Your Mind; Mini Med School: Medicine, Human Health, and the Frontiers of Science; Mini Med School: The Dynamics of Human Health.

(3) Yale University open courses
Fundamentals of Physics; Frontiers and Controversies in Astrophysics; Freshman Organic Chemistry; Frontiers of Biomedical Engineering; Game Theory; Financial Markets; Introduction to Theory of Literature; Modern Poetry; The American Novel Since 1945; Milton; European Civilization; Introduction to the Old Testament (Hebrew Bible); Introduction to New Testament History and Literature; France Since 1871; Introduction to Ancient Greek History; The Civil War and Reconstruction Era, 1845-1877; Global Problems of Population Growth; Principles of Evolution, Ecology, and Behavior; Philosophy: Death; Introduction to Political Philosophy; The Psychology, Biology and Politics of Food; Introduction to Psychology; Roman Architecture; Listening to Music.

(4) Harvard University open courses
Positive Psychology at Harvard; Justice: What's the Right Thing to Do?; Building Dynamic Websites.

(5) University of Oxford open courses
Nietzsche on Mind and Nature; General Philosophy; Philosophy for Beginners; Critical Reasoning for Beginners.

(6) Other universities
Princeton University: Human Nature (InnerCore); Princeton University: The Free Will Theorem; University of Cambridge: Anthropology; Wharton School: Knowledge@Wharton; Columbia University: Real Estate Finance I.

Appendix II. Open course websites of some American and British universities

United States
1. MIT /index.htm
2. Carnegie Mellon University /openlearning/forstudents/freecourses
3. Johns Hopkins Bloomberg School of Public Health /
4. Stanford University /
5. University of Notre Dame /courselist
6. Duke Law Center for the Study of the Public Domain /cspd/lectures
7. Harvard Medical School /public/
8. Princeton University /main/index.php
9. Yale University /
10. University of California, Berkeley

United Kingdom
1. University of Oxford text archive
2. Gresham College /default.asp
3. University of Glasgow /downloads.html
4. University of Surrey /Teaching/
5. University of Nottingham /
6. University of Cambridge podcasts /main/Podcasts.html

Reference: /thread-42142-1-1.html

MIT14_15JF09_lec09

In all of these cases, interactions with other agents you are connected to affect your payoff, well-being, utility. How to make decisions in such situations? → “multiagent decision theory” or game theory.
Reading: Osborne, Chapters 1-2. EK, Chapter 6.
Networks: Lecture 9
Introduction
Motivation
In the context of social networks, or even communication networks, agents make a variety of choices. For example:
“Rational Decision-Making”
Powerful working hypothesis in economics: individuals act rationally in the sense of choosing the option that gives them higher “payoff”.
$$U(a) = \int u(c)\, dF^a(c) \;\ge\; U(b) = \int u(c)\, dF^b(c).$$
From Single Person to Multiperson Decision Problems

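Open course websites of 50 top universities worldwide

Open courses offered by 50 well-known universities (Top 50 University Open Courseware Collections)

Academic authorities

1. MIT: Many people consider MIT to have the most extensive open courseware collection in the United States, and it was also the first among the famous universities. Subjects range from architecture and planning to the humanities and sciences; the catalog holds an astonishing amount of information. (/OcwWeb/web/home/home/index.htm) People in Taiwan began producing Chinese versions of the MIT courseware long ago; interested readers can search for them.

2. Carnegie Mellon University: This university has an outstanding academic tradition. Its "Open Learning Initiative" aims to give everyone the opportunity to learn. (/openlearning/forstudents/freecourses)

3. Johns Hopkins Bloomberg School of Public Health: Johns Hopkins is one of the world's important universities. Although its course offerings are limited to health topics, its specialist expertise makes this large collection one of the best. (/)

4. Stanford University: This famous university offers courses that can be selected through iTunes. (/)

5. University of Notre Dame: Considered by many to be one of the best schools in the United States, if not in the world. With open courseware in subjects such as history, English, and mathematics, anyone can benefit from this school's knowledge. (/courselist)

6. Duke Law Center for the Study of the Public Domain: Duke is one of the best schools in the South. If you are interested in law, Duke's open courseware in this field can go a long way toward helping you understand the judicial system. (/cspd/lectures)

Ivy League

7. Harvard Medical School: Although its courses are limited to medicine, they are a good resource for anyone looking for information from the Ivy League. The course topics range from biomedical to business subjects. (/public/)

8. Princeton University Channel: This Ivy League school offers a full set of guest lectures. (/main/index.php)

9. Yale University: This wonderful Ivy League institution has a great number of Ivy-quality open courses available to all.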

MIT OpenCourseWare: Dynamic Programming Lecture (22)

6.231 DYNAMIC PROGRAMMING

LECTURE 22

LECTURE OUTLINE
• Approximate DP for large/intractable problems
• Approximate policy iteration
• Simulation-based policy iteration
• Actor-critic interpretation
• Learning how to play tetris: A case study
• Approximate value iteration with function approximation

APPROX. POLICY ITERATION - DISCOUNTED CASE
• Suppose that the policy evaluation is approximate, according to
$$\max_x |J_k(x) - J_{\mu^k}(x)| \le \delta, \qquad k = 0, 1, \ldots$$
and policy improvement is also approximate, according to
$$\max_x |(T_{\mu^{k+1}} J_k)(x) - (T J_k)(x)| \le \epsilon, \qquad k = 0, 1, \ldots$$
where $\delta$ and $\epsilon$ are some positive scalars.
• Error Bound: The sequence $\{\mu^k\}$ generated by the approximate policy iteration algorithm satisfies
$$\limsup_{k \to \infty} \max_{x \in S} \big( J_{\mu^k}(x) - J^*(x) \big) \le \frac{\epsilon + 2\alpha\delta}{(1-\alpha)^2}$$
• Typical practical behavior: The method makes steady progress up to a point and then the iterates $J_{\mu^k}$ oscillate within a neighborhood of $J^*$.

APPROXIMATE POLICY ITERATION - SSP
• Suppose that the policy evaluation is approximate, according to
$$\max_{i=1,\ldots,n} |J_k(i) - J_{\mu^k}(i)| \le \delta, \qquad k = 0, 1, \ldots$$
and policy improvement is also approximate, according to
$$\max_{i=1,\ldots,n} |(T_{\mu^{k+1}} J_k)(i) - (T J_k)(i)| \le \epsilon, \qquad k = 0, 1, \ldots$$
where $\delta$ and $\epsilon$ are some positive scalars.
• Assume that all policies generated by the method are proper (they are guaranteed to be if $\delta = \epsilon = 0$, but not in general).
• Error Bound: The sequence $\{\mu^k\}$ generated by approximate policy iteration satisfies
$$\limsup_{k \to \infty} \max_{i=1,\ldots,n} \big( J_{\mu^k}(i) - J^*(i) \big) \le \frac{n(1 - \rho + n)(\epsilon + 2\delta)}{(1-\rho)^2}$$
where
$$\rho = \max_{i=1,\ldots,n;\ \mu:\,\text{proper}} P\{x_n \ne t \mid x_0 = i, \mu\}$$

SIMULATION-BASED POLICY EVALUATION
• Given $\mu$, suppose we want to calculate $J_\mu$ by simulation.
• Generate by simulation sample costs. Approximation:
$$J_\mu(i) \approx \frac{1}{M_i} \sum_{m=1}^{M_i} c(i, m)$$
$c(i, m)$: $m$th sample cost starting from state $i$
• Approximating each $J_\mu(i)$ is impractical for a large state space. Instead, a "compact representation" $\tilde J_\mu(i, r)$ may be used, where $r$ is a tunable parameter vector. We may calculate an optimal value $r^*$ of $r$ by a least squares fit
$$r^* = \arg\min_r \sum_{i=1}^{n} \sum_{m=1}^{M_i} \big( c(i, m) - \tilde J_\mu(i, r) \big)^2$$
• This idea is the starting point for more sophisticated simulation-related methods, to be discussed in the next lecture.

ACTOR-CRITIC INTERPRETATION
• The critic calculates approximately (e.g., using some form of a least squares fit) $J_{\mu^k}$ by processing state/sample cost pairs, which are generated by the actor by simulation
• Given the approximate $J_{\mu^k}$, the actor implements the improved policy $\mu^{k+1}$ by
$$(T_{\mu^{k+1}} J_k)(i) = (T J_k)(i)$$

LEARNING HOW TO PLAY TETRIS: A CASE STUDY
• The state consists of the board position $i$, and the shape of the current falling block (astronomically large number of states).
• It can be shown that all policies are proper!!
• Use a linear approximation architecture with feature extraction
$$\tilde J(i, r) = \sum_{m=1}^{s} \phi_m(i)\, r_m,$$
where $r = (r_1, \ldots, r_s)$ is the parameter vector and $\phi_m(i)$ is the value of the $m$th feature associated w/ $i$.
• Approximate policy iteration was implemented with the following features:
− The height of each column of the wall
− The difference of heights of adjacent columns
− The maximum height over all wall columns
− The number of "holes" on the wall
− The number 1 (provides a constant offset)
• Playing data was collected for a fixed value of the parameter vector $r$ (and the corresponding policy); the policy was approximately evaluated by choosing $r$ to match the playing data in some least-squares sense.
• The method used for approximate policy evaluation was the $\lambda$-least squares policy evaluation method, to be described in the next lecture.
• See: Bertsekas and Ioffe, "Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming," in: 8001//people/dimitrib/publ.html

VALUE ITERATION W/ FUNCTION APPROXIMATION
• Suppose we use a linear approximation architecture $\tilde J(i, r) = \phi(i)' r$, or
$$\tilde J = \Phi r$$
where $r = (r_1, \ldots, r_s)$ is a parameter vector, and $\Phi$ is a full rank $n \times s$ feature matrix.
• Approximate value iteration method: Start with initial guess $r_0$; given $r_t$, generate $r_{t+1}$ by
$$r_{t+1} = \arg\min_r \|\Phi r - T(\Phi r_t)\|$$
where $\|\cdot\|$ is some norm.
• Questions: Does $r_t$ converge to some $r^*$? How close is $\Phi r^*$ to $J^*$?
• Convergence Result: If $T$ is a contraction with respect to a weighted Euclidean norm ($\|J\|^2 = J' D J$, where $D$ is positive definite, symmetric), then $r_t$ converges to (the unique) $r^*$ satisfying
$$r^* = \arg\min_r \|\Phi r - T(\Phi r^*)\|$$

GEOMETRIC INTERPRETATION
• Consider the feature subspace
$$S = \{\Phi r \mid r \in \Re^s\}$$
of all cost function approximations that are linear combinations of the feature vectors. Let $\Pi$ denote projection on this subspace.
• The approximate value iteration is
$$\Phi r_{t+1} = \Pi T(\Phi r_t), \qquad \text{i.e.,} \quad r_{t+1} = \arg\min_r \|\Phi r - T(\Phi r_t)\|$$
and amounts to starting at the point $\Phi r_t$ of $S$, applying $T$ to it, and then projecting on $S$.
• Proof Idea: Since $T$ is a contraction with respect to the norm of projection, and projection is nonexpansive, $\Pi T$ (which maps $S$ to $S$) is a contraction (with respect to the same norm).

PROOF
• Consider two vectors $\Phi r$ and $\Phi r'$ in $S$. The (Euclidean) projection is a nonexpansive mapping, so
$$\|\Pi T(\Phi r) - \Pi T(\Phi r')\| \le \|T(\Phi r) - T(\Phi r')\|$$
Since $T$ is a contraction mapping (with respect to the norm of projection),
$$\|T(\Phi r) - T(\Phi r')\| \le \beta \|\Phi r - \Phi r'\|$$
where $\beta \in (0, 1)$ is the contraction modulus, so
$$\|\Pi T(\Phi r) - \Pi T(\Phi r')\| \le \beta \|\Phi r - \Phi r'\|$$
and it follows that $\Pi T$ is a contraction (with respect to the same norm and with the same modulus).
• In general, it is not clear how to obtain a Euclidean norm for which $T$ is a contraction.
• Important fact: In the case where $T = T_\mu$, where $\mu$ is a stationary policy, $T$ is a contraction for the norm $\|J\|^2 = J' D J$, where $D$ is diagonal with the steady-state probabilities along the diagonal.

ERROR BOUND
• If $T$ is a contraction with respect to a weighted Euclidean norm $\|\cdot\|$ with modulus $\beta$, and $r^*$ is the limit of $r_t$, i.e.,
$$r^* = \arg\min_r \|\Phi r - T(\Phi r^*)\|$$
then
$$\|\Phi r^* - J^*\| \le \frac{\|\Pi J^* - J^*\|}{1 - \beta}$$
where $J^*$ is the fixed point of $T$, and $\Pi J^*$ is the projection of $J^*$ on the feature subspace $S$ (with respect to norm $\|\cdot\|$).
Proof: Using the triangle inequality,
$$\|\Phi r^* - J^*\| \le \|\Phi r^* - \Pi J^*\| + \|\Pi J^* - J^*\| = \|\Pi T(\Phi r^*) - \Pi T(J^*)\| + \|\Pi J^* - J^*\| \le \beta \|\Phi r^* - J^*\| + \|\Pi J^* - J^*\|$$
Q.E.D.
• Note that the error $\|\Phi r^* - J^*\|$ is proportional to $\|\Pi J^* - J^*\|$, which can be viewed as the "power of the approximation architecture" (measures how well $J^*$ can be represented by the chosen features).
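The approximate value iteration and its error bound can be tried out numerically. The following is a minimal sketch, not the lecture's own code: it assumes a single fixed policy, so that T is the contraction named in the "Important fact" above, and it invents a small 3-state chain (`P`, `g`, `ALPHA`, `PHI` are all hypothetical values chosen for illustration). The projection is computed from the weighted normal equations, with the steady-state probabilities as weights.

```python
# Projected value iteration for ONE fixed policy (illustrative sketch, pure Python).

def apply_T(P, g, alpha, J):
    """(T J)(i) = g(i) + alpha * sum_j P[i][j] * J(j) for a fixed policy."""
    return [g[i] + alpha * sum(p * x for p, x in zip(P[i], J)) for i in range(len(J))]

def steady_state(P, iters=1000):
    """Steady-state probabilities by power iteration (chain assumed ergodic)."""
    n = len(P)
    xi = [1.0 / n] * n
    for _ in range(iters):
        xi = [sum(xi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return xi

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def project(Phi, d, J):
    """Return r minimizing sum_i d[i] * (Phi[i] . r - J[i])**2."""
    n, s = len(Phi), len(Phi[0])
    A = [[sum(d[i] * Phi[i][a] * Phi[i][c] for i in range(n)) for c in range(s)]
         for a in range(s)]
    rhs = [sum(d[i] * Phi[i][a] * J[i] for i in range(n)) for a in range(s)]
    return solve(A, rhs)

def phi_times(Phi, r):
    """Compute the vector Phi r."""
    return [sum(f * w for f, w in zip(row, r)) for row in Phi]

def weighted_norm(d, v):
    """Weighted Euclidean norm sqrt(v' D v) with D = diag(d)."""
    return sum(di * x * x for di, x in zip(d, v)) ** 0.5

def projected_value_iteration(P, g, alpha, Phi, iters=500):
    """Iterate r_{t+1} = argmin_r || Phi r - T(Phi r_t) || starting from r_0 = 0."""
    d = steady_state(P)
    r = [0.0] * len(Phi[0])
    for _ in range(iters):
        r = project(Phi, d, apply_T(P, g, alpha, phi_times(Phi, r)))
    return r, d

# Hypothetical data: transition matrix, stage costs, discount factor, features.
P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
g = [1.0, 2.0, 4.0]
ALPHA = 0.9
PHI = [[1.0, 0.0], [1.0, 1.0], [1.0, 4.0]]

r_star, d = projected_value_iteration(P, g, ALPHA, PHI)
J_star = [0.0] * len(g)
for _ in range(2000):                      # exact fixed point of T by plain value iteration
    J_star = apply_T(P, g, ALPHA, J_star)

approx = phi_times(PHI, r_star)            # Phi r*
best = phi_times(PHI, project(PHI, d, J_star))  # Pi J*
lhs = weighted_norm(d, [a - j for a, j in zip(approx, J_star)])
rhs = weighted_norm(d, [b - j for b, j in zip(best, J_star)]) / (1 - ALPHA)
print("||Phi r* - J*|| =", lhs, " bound =", rhs)
```

Here the modulus β is taken to be the discount factor `ALPHA`, which is an assumption consistent with the fixed-policy case only; for the general operator T with minimization over controls, the sketch makes no claim.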

10960-运筹学-21_Dynamic+Programming+I

[Slide figures: a table of positions 1 to 20, with 1 and 2 marked W and 3 marked L.]

Decisions with 4 or 5 matches left.
With 4 matches left: Take 1, leaving 3. Your opponent is Y. With 3 left, Y loses. You win.

Decisions with n matches left.
Take 1: n − 1 matches remain, with value f(n−1).
Take 2: n − 2 matches remain, with value f(n−2).
Take 6: n − 6 matches remain, with value f(n−6).
n: the state. f(n): the optimal value function.

f(n) = W if f(n−1) = L or f(n−2) = L or f(n−6) = L; f(n) = L otherwise.
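The recursion over the state n can be sketched in a few lines. This is an illustrative sketch, not the lecture's code: it assumes the moves are "take 1, 2, or 6 matches" as on the slide and, as an explicit assumption, that the player who takes the last match wins, so a player facing 0 matches has lost. The function name `match_game` is hypothetical.

```python
def match_game(n_max, moves=(1, 2, 6)):
    """Label each state n as 'W' (player to move can force a win) or 'L'."""
    # Assumption: whoever takes the last match wins, so facing 0 matches is a loss.
    f = {0: "L"}
    for n in range(1, n_max + 1):
        # n is winning if some legal move leaves the opponent in a losing state.
        can_win = any(m <= n and f[n - m] == "L" for m in moves)
        f[n] = "W" if can_win else "L"
    return f

table = match_game(20)
losing = [n for n in range(1, 21) if table[n] == "L"]
print(losing)
```

Each state is filled in once from smaller states, which is the essence of the dynamic programming recursion: f(n) depends only on f(n−1), f(n−2), and f(n−6).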

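Introduction to MIT OpenCourseWare

Introduction to MIT OpenCourseWare (MIT OCW). Website: //ocwweb/

MIT announced its free online open course project in April 2001, planning to put all of MIT's course content online over the following ten years under the name "MIT OpenCourseWare" (MIT OCW). The project formally launched in fall 2001, with a large-scale OCW pilot planned for the following two years, more than 500 courses online within two and a half years, and a total of 1,800 courses by 2007.

The aim of the project is to advance MIT's own teaching, raise MIT's profile worldwide, and make the materials free for everyone in the world to use. OCW does not, however, offer an online MIT degree program.

So far, MIT has published 500 courses free of charge, covering 33 of MIT's disciplines and all 5 of its schools.

At our university, instructors preparing lessons and courseware, students studying on their own, and especially instructors who teach bilingually can all use it as a good learning and reference resource, so that we can build our courses from a higher starting point and raise teaching standards.

Website: /

Every course on the MIT site has a syllabus, a course calendar, and lecture notes. Many courses also have assignments, exams, problem sets (with solutions), labs, projects, hypertext textbooks, simulations, demonstrations, tutorials, live lecture videos, and links to web resources.

The following introduces some of the open course content by example, in the hope of giving instructors a more concrete picture so that they can make better use of it.

I. The 33 disciplines
1. Aeronautics and Astronautics
2. Anthropology
3. Architecture
4. Biological Engineering Division
5. Biology
6. Brain and Cognitive Sciences
7. Chemical Engineering
8. Chemistry
9. Civil and Environmental Engineering
10. Comparative Media Studies
11. Earth, Atmospheric, and Planetary Sciences
12. Economics
13. Electrical Engineering and Computer Science
14. Engineering Systems Division
15. Foreign Languages and Literatures
16. Health Science and Technology
17. History
18. Linguistics and Philosophy
19. Literature
20. Material Science and Technology
21. Mathematics
22. Mechanical Engineering
23. Media Arts and Sciences
24. Music and Theater Arts
25. Nuclear Engineering
26. Ocean Engineering
27. Physics
28. Political Science
29. Science, Technology and Society
30. Sloan School of Management
31. Urban Studies and Planning
32. Women's Studies
33. Writing and Humanistic Studies

II. Course syllabi
We take the Department of Architecture as an example.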

Dynamic Programming

References
1. Richard E. Bellman. Dynamic Programming. Princeton University Press, Princeton, USA, 1957.
2. Stuart E. Dreyfus and Averill M. Law. The Art and Theory of Dynamic Programming. Mathematics in Science and Engineering, Volume 130. Academic Press, New York, USA, 1977.
3. Vipin Kumar, Ananth Grama, Anshul Gupta, and George Karypis. Parallel Computing. Benjamin Cummings, Redwood City (CA), USA, 1994.
4. David K. Smith. Dynamic Programming: a practical introduction. Mathematics and its Applications. Ellis Horwood, Chichester, GB, 1991.
5. Moshe Sniedovich. Dynamic Programming. Marcel Dekker, New York, USA, 1992.
Dynamic Programming
Marc Gengler
Swiss Federal Institute of Technology Lausanne, Computer Science Theory Laboratory, EPFL-DI-LITH, Ecublens (IN), CH-1015 Lausanne

… of this chapter is devoted to a certain number of example problems from different application areas, showing how standard problems can be solved using dynamic programming. The examples are selected to emphasize two points. First, we want to show the reader how a given problem can be expressed so as to obtain a well-suited state space, transition function, and so on. Second, and more importantly, we try to convince the reader that choosing, for instance, a convenient state space may demand some imagination from a modeler trying to fit a problem into the dynamic programming framework.

We also address the parallelization of dynamic programming programs. To do so, we distinguish between several classes of transition functions, according to the set of states used to compute a new state. Classically, a dynamic programming problem is called monadic if the computation of a new state uses only one existing state, and polyadic otherwise. A problem is serial if it needs only states that were computed at the level immediately preceding the current level. Otherwise, it is nonserial. These distinctions reflect different kinds of dependencies between the states in the state space. These dependencies naturally influence the parallelization of the dynamic programming problem, because the only new states that can be computed in parallel are those that are pairwise independent and for which all required input states have already been computed.

The principles of dynamic programming are described and discussed in a large number of publications. [1, 2, 3, 4, 5] are, for instance, some general references that present and analyze the concepts of dynamic programming or address the problem of their parallelization.

The work presented in this chapter is mostly based on the previously cited references, as well as on a large number of research papers that address specific points of the dynamic programming methodology, present the dynamic programming formulation for given problems, or propose ways to parallelize them.

MIT Introduction to Algorithms open course: Problem Set 7

Introduction to Algorithms
November 7, 2005
Massachusetts Institute of Technology
6.046J/18.410J
Professors Erik D. Demaine and Charles E. Leiserson
Handout 22

Problem Set 7

MIT students: This problem set is due in lecture on Monday, November 14, 2005. There will be two homework labs for this problem set, one held 6–8 P.M. on Wednesday, November 9, 2005 and one held 2–4 P.M. on Sunday, November 13, 2005.

Reading: Chapter 15, 16.1–16.3, 22.1, and 23.

Problem 7-1 is mandatory. Failure to turn in a solution will result in a serious and negative impact on your term grade!

Both exercises and problems should be solved, but only the problems should be turned in. Exercises are intended to help you master the course material. Even though you should not turn in the exercise solutions, you are responsible for material covered in the exercises.

Mark the top of each sheet with your name, the course number, the problem number, your recitation section, the date and the names of any students with whom you collaborated. Please staple and turn in your solutions on 3-hole punched paper.

You will often be called upon to "give an algorithm" to solve a certain problem. Your write-up should take the form of a short essay. A topic paragraph should summarize the problem you are solving and what your results are. The body of the essay should provide the following:
1. A description of the algorithm in English and, if helpful, pseudo-code.
2. At least one worked example or diagram to show more precisely how your algorithm works.
3. A proof (or indication) of the correctness of the algorithm.
4. An analysis of the running time of the algorithm.

Remember, your goal is to communicate. Full credit will be given only to correct solutions which are described clearly. Convoluted and obtuse descriptions will receive low marks.

Exercise 7-1. Do Exercise 15.4-3 on page 356 of CLRS.
Exercise 7-2. Do Exercise 16.1-3 on page 379 of CLRS.
Exercise 7-3. Do Exercise 16.3-2 on page 384 of CLRS.
Exercise 7-4. Do Exercise 22.1-5 on page 530 of CLRS.
Exercise 7-5. Do Exercise 23.1-5 on page 566 of CLRS.
Exercise 7-6. Do Exercise 23.2-4 on page 574 of CLRS.
Exercise 7-7. Do Exercise 23.2-5 on page 574 of CLRS.

Problem 7-1. Edit distance

In this problem you will write a program to compute edit distance. This problem is mandatory. Failure to turn in a solution will result in a serious and negative impact on your term grade! We advise you to start this programming assignment as soon as possible, because getting all the details right in a program can take longer than you think.

Many word processors and keyword search engines have a spelling correction feature. If you type in a misspelled word x, the word processor or search engine can suggest a correction y. The correction y should be a word that is close to x. One way to measure the similarity in spelling between two text strings is by "edit distance." The notion of edit distance is useful in other fields as well. For example, biologists use edit distance to characterize the similarity of DNA or protein sequences.

The edit distance d(x, y) of two strings of text, x[1..m] and y[1..n], is defined to be the minimum possible cost of a sequence of "transformation operations" (defined below) that transforms string¹ x[1..m] into string y[1..n]. To define the effect of the transformation operations, we use an auxiliary string z[1..s] that holds the intermediate results. At the beginning of the transformation sequence, s = m and z[1..s] = x[1..m] (i.e., we start with string x[1..m]). At the end of the transformation sequence, we should have s = n and z[1..s] = y[1..n] (i.e., our goal is to transform into string y[1..n]). Throughout the transformation, we maintain the current length s of string z, as well as a cursor position i, i.e., an index into string z. The invariant 1 ≤ i ≤ s + 1 holds at all times during the transformation. (Notice that the cursor can move one space beyond the end of the string z in order to allow insertions at the end of the string.)

Each transformation operation may alter the string z, the size s, and the cursor position i. Each transformation operation also has an associated cost. The cost of a sequence of transformation operations is the sum of the costs of the individual operations on the sequence. The goal of the edit-distance problem is to find a sequence of transformation operations of minimum cost that transforms x[1..m] into y[1..n].

There are five transformation operations:

¹ Here we view a text string as an array of characters. Individual characters can be manipulated in constant time.

Problem 7-2. GreedSox

GreedSox, a popular major-league baseball team, is interested in one thing: making money. They have hired you as a consultant to help boost their group ticket sales. They have noticed the following problem. When a group wants to see a ballgame, all members of the group need seats (in the bleacher section), or they go away. Since partial groups can't be seated, the bleachers are often not full. There is still space available, but not enough space for the entire group. In this case, the group cannot be seated, losing money for the GreedSox.

The GreedSox want your recommendation on a new seating policy. Instead of seating people first-come/first-serve, the GreedSox decide to seat large groups first, followed by smaller groups, and finally singles (i.e., groups of 1).

You are given a set of groups, G[1..m] = [g_1, g_2, ..., g_m], where g_i is a number representing the size of the group. Assume that the bleachers seat n people. Consider the following greedy seating algorithm, where the function ADMIT(i) admits group i, and REJECT(i) sends away group i.

SEAT(G[1..m], n)
    admitted ← 0
    G ← SORT(G)          ▷ Sort groups largest to smallest.
    for i ← 1 to m
        do if G[i] ≤ n
              then ADMIT(i)
                   n ← n − G[i]
                   admitted ← admitted + G[i]
              else REJECT(i)
    return admitted

The SEAT algorithm first sorts the groups by size. It then iterates through the groups from largest to smallest, seating any group that fits in the bleachers. It returns the number of people admitted.

(a) The GreedSox owners are right: the greedy seating algorithm works pretty well. Show that if, given G and n, it is possible to admit k people, then the greedy seating algorithm admits at least k/2 people.

(b) Unfortunately, the SEAT algorithm does not work perfectly. Show that SEAT is not optimal by giving a counterexample in which, asymptotically as n gets large, the ratio between greedy seating and optimal seating approaches 1/2.

When you present your results to the GreedSox owners, they point out the following problem: unlike numbers in a computer's memory, real people are hard to move around. In particular, people waiting in line do not like to be "sorted." The GreedSox owners ask you to develop a version of the greedy seating algorithm that does not modify the set G. (You can think of G as being stored in read-only memory.) You suggest the following algorithm:

RESEAT(G[1..m], n)
    admitted ← 0
    for j ← 1 to ⌈lg n⌉
        do for i ← 1 to m
              do if G[i] ≥ n/2^j and G[i] ≤ n
                    then ADMIT(i)
                         n ← n − G[i]
                         admitted ← admitted + G[i]
                    else if G[i] > n
                         then REJECT(i)
    return admitted

The RESEAT algorithm iterates through the list of groups several times. In the first iteration, it admits any group of size at least n/2. In the second iteration, it admits any group of size at least n/4. It continues in the same manner, seating smaller and smaller groups until the theater is filled. When RESEAT finishes, it returns the number of people admitted.

(c) Assume that, given G and n, it is possible to admit at least k people. Show that the RESEAT algorithm still seats at least k/2 people.

(d) The RESEAT algorithm runs in O(m lg n) time. Devise a new algorithm that runs in O(m) time and still guarantees that if k people can be seated, your algorithm seats at least k/2 people.
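For readers who want to experiment with the greedy policy, the SEAT pseudocode translates almost line for line into Python. This is only a sketch of SEAT as stated in the handout, not a solution to any of the parts (a) to (d); the function name `seat` and the sample inputs are illustrative assumptions, and ADMIT/REJECT are reduced to bookkeeping.

```python
def seat(groups, capacity):
    """Greedy seating: consider groups from largest to smallest, admit any that fits."""
    admitted = 0
    remaining = capacity
    # sorted() returns a new list, so the caller's list is left unmodified.
    for size in sorted(groups, reverse=True):
        if size <= remaining:      # ADMIT: the whole group fits
            remaining -= size
            admitted += size
        # otherwise REJECT: the group is sent away
    return admitted

# Hypothetical instance where greedy is not optimal: seating 5 + 5 would fill all 10 seats.
print(seat([6, 5, 5], 10))
```

The sample call shows the kind of gap part (b) asks about: the largest group is admitted first and blocks the two groups that together would have filled the bleachers.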

MIT Network Technology 14_15JF09_lec16

6.207/14.15: Networks
Lecture 16: Cooperation and Trust in Networks
Daron Acemoglu and Asu Ozdaglar
MIT
November 4, 2009
Networks: Lecture 16
Introduction
Outline
The role of networks in cooperation
A model of social norms
Cohesion of groups and social norms
Trust in networks
Reading: Osborne, Chapters 14 and 15.
The Role of Social Networks
Recall the importance of “social contacts” in finding jobs, especially of “weak ties” (e.g., Granovetter (1973) “The Strength of Weak Ties”): most people find jobs through acquaintances, not close friends. The idea is that recommendations from people you know are more trusted.
Similarly, are social networks important in starting businesses? Recall that in many developing economies (but also even in societies with very strong institutions), networks of “acquaintances and contacts” shape business behavior (e.g., Munshi (2009) “Strength in Numbers: A Network-Based Solution to Occupational Traps”).
The Indian diamond industry is dominated by a few small subcastes, the Marwaris, the Palanpuris, the Kathiawaris, in the same way that the Antwerp and New York diamond trade used to be dominated by ultra-Orthodox Jews.

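MIT OCW: Massachusetts Institute of Technology OpenCourseWare

MIT OCW: Massachusetts Institute of Technology OpenCourseWare
Original: MIT's OpenCourseWare project has so far put 900 courses online, reaching half of its goal of having all courses online by 2007. These courses cover 33 different disciplines across MIT's five schools.
Simplified Chinese: Traditional Chinese:

JHSPH: Johns Hopkins Bloomberg School of Public Health OpenCourseWare
Original: Our site will initially publish two courses, with eight more expected to open before April 2005 and more courses to follow over the next several years.
Simplified Chinese: Traditional Chinese:

USU OpenCourseWare: Utah State University OpenCourseWare
Original: Utah State University OpenCourseWare provides free and open educational resources to students, self-learners, and educators around the world. OpenCourseWare fits Utah State University's goal of serving the public through learning, discovery, and engagement, which is the principle that guides the university: "Academics first." Whether you are a student looking for extra help, an instructor preparing a new course, or a self-learner who simply wants to study a topic of interest, we hope Utah State University OpenCourseWare is valuable to you. Utah State University OpenCourseWare does not grant credit or degrees, and it does not provide a channel for contacting Utah State University faculty. What it gives you free access to is the resources and content used in the university's courses.
Simplified Chinese: Traditional Chinese:

Osaka University OpenCourseWare project
Japanese: Osaka University holds to the motto "rooted locally, looking to the world." To develop people who combine judgment that society can trust, rich planning ability, and the ability to communicate across different cultural backgrounds, the university works tirelessly toward the concrete educational goals of "liberal education," "design," and "internationalism." Advancing research depends first on "networks" and "connections"; while making good use of collaborations of every kind to develop new academic fields that fuse disciplines, the university hopes to feed the results back into education. Six Japanese universities are working together for the first time to publish their educational and research assets on OpenCourseWare pages of the kind advocated by MIT, jointly building a "knowledge network." For a university, whose mission is to be a place where knowledge is exchanged, this is a natural responsibility and part of its contribution to society. The university also believes that even the limited teaching materials offered on this knowledge network will help realize its educational goals of "design" and "internationalism."
Traditional Chinese:

Kyoto University OpenCourseWare pages
Japanese: The Kyoto University OpenCourseWare pages start from the all-humanity viewpoint of a "creative global and local knowledge community." They aim to contribute to the accumulation of international knowledge assets, raise Kyoto University's visibility, attract outstanding faculty and students from around the world, and advance international online education.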

《斯坦福大学开放课程：编程方法》讲义#13

Mehran Sahami
CS 106A
Handout #12
October 5, 2007

Assignment #2: Simple Java Programs
Due: 3:15pm on Monday, October 15th
Based on a handout by Eric Roberts

Your job in this assignment is to write programs to solve each of these six problems.

1. Write a GraphicsProgram subclass that draws a pyramid consisting of bricks arranged in horizontal rows, so that the number of bricks in each row decreases by one as you move up the pyramid, as shown in the following sample run:

[Figure: sample run showing the brick pyramid]

The pyramid should be centered at the bottom of the window and should use constants for the following parameters:

BRICK_WIDTH      The width of each brick (30 pixels)
BRICK_HEIGHT     The height of each brick (12 pixels)
BRICKS_IN_BASE   The number of bricks in the base (14)

The numbers in parentheses show the values for this diagram, but you must be able to change those values in your program.

2. Suppose that you've been hired to produce a program that draws an image of an archery target (or, if you prefer commercial applications, a logo for a national department store chain) that looks like this:

[Figure: target made of three concentric circles]

This figure is simply three GOval objects, two red and one white, drawn in the correct order. The outer circle should have a radius of one inch (72 pixels), the white circle has a radius of 0.65 inches, and the inner red circle has a radius of 0.3 inches. The figure should be centered in the window of a GraphicsProgram subclass.

3. Write a GraphicsProgram subclass that draws a partial diagram of the acm.program class hierarchy, as follows:

[Figure: partial diagram of the acm.program class hierarchy]

The only classes you need to create this picture are GRect, GLabel, and GLine. The major part of the problem is specifying the coordinates so that the different elements of the picture are aligned properly. The aspects of the alignment for which you are responsible are:

· The width and height of the class boxes should be specified as named constants so that they are easy to change.
· The labels should be centered in their boxes. You can find the width of a label by calling label.getWidth() and the height it extends above the baseline by calling label.getAscent(). If you want to center a label, you need to shift its origin by half of these distances in each direction.
· The connecting lines should start and end at the center of the appropriate edge of the box.
· The entire figure should be centered in the window.

4. In high-school geometry, you learned the Pythagorean theorem for the relationship of the lengths of the three sides of a right triangle:

$a^2 + b^2 = c^2$

which can alternatively be written as:

$c = \sqrt{a^2 + b^2}$

Most of this expression contains simple operators covered in Chapter 3. The one piece that's missing is taking square roots, which you can do by calling the standard function Math.sqrt. For example, the statement

double y = Math.sqrt(x);

sets y to the square root of x.

Write a ConsoleProgram that accepts values for a and b as ints and then calculates the solution of c as a double. Your program should be able to duplicate the following sample run:

[Figure: sample run]

5. Write a ConsoleProgram that reads in a list of integers, one per line, until a sentinel value of 0 (which you should be able to change easily to some other value). When the sentinel is read, your program should display the smallest and largest values in the list, as illustrated in this sample run:

[Figure: sample run]

Your program should handle the following special cases:

· If the user enters only one value before the sentinel, the program should report that value as both the largest and smallest.
· If the user enters the sentinel on the very first input line, then no values have been entered, and your program should display a message to that effect.

6. Douglas Hofstadter's Pulitzer-prize-winning book Gödel, Escher, Bach contains many interesting mathematical puzzles, many of which can be expressed in the form of computer programs. In Chapter XII, Hofstadter mentions a wonderful problem that is well within the scope of the control statements from Chapter 4. The problem can be expressed as follows:

Pick some positive integer and call it n.
If n is even, divide it by two.
If n is odd, multiply it by three and add one.
Continue this process until n is equal to one.

On page 401 of the Vintage edition, Hofstadter illustrates this process with the following example, starting with the number 15:

15 is odd, so I make 3n + 1: 46
46 is even, so I take half: 23
23 is odd, so I make 3n + 1: 70
70 is even, so I take half: 35
35 is odd, so I make 3n + 1: 106
106 is even, so I take half: 53
53 is odd, so I make 3n + 1: 160
160 is even, so I take half: 80
80 is even, so I take half: 40
40 is even, so I take half: 20
20 is even, so I take half: 10
10 is even, so I take half: 5
5 is odd, so I make 3n + 1: 16
16 is even, so I take half: 8
8 is even, so I take half: 4
4 is even, so I take half: 2
2 is even, so I take half: 1

As you can see from this example, the numbers go up and down, but eventually, at least for all numbers that have ever been tried, come down to end in 1. In some respects, this process is reminiscent of the formation of hailstones, which get carried upward by the winds over and over again before they finally descend to the ground. Because of this analogy, this sequence of numbers is usually called the Hailstone sequence, although it goes by many other names as well.

Write a ConsoleProgram that reads in a number from the user and then displays the Hailstone sequence for that number, just as in Hofstadter's book, followed by a line showing the number of steps taken to reach 1. For example, your program should be able to produce a sample run that looks like this:

[Figure: sample run]

The fascinating thing about this problem is that no one has yet been able to prove that it always stops. The number of steps in the process can certainly get very large. How many steps, for example, does your program take when n is 27?
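The Hailstone rule in problem 6 can be sketched in a few lines. The assignment itself asks for a Java ConsoleProgram, so the Python below is only an illustration of the control flow; the function names `hailstone` and `describe` and the wording of the final summary line are assumptions, not part of the handout.

```python
def hailstone(n):
    """Return the Hailstone sequence that starts at n and ends at 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    sequence = [n]
    while n != 1:
        if n % 2 == 0:
            n //= 2          # even: take half
        else:
            n = 3 * n + 1    # odd: make 3n + 1
        sequence.append(n)
    return sequence


def describe(n):
    """Format each step in the style of Hofstadter's example."""
    seq = hailstone(n)
    lines = []
    for current, following in zip(seq, seq[1:]):
        if current % 2 == 0:
            lines.append(f"{current} is even, so I take half: {following}")
        else:
            lines.append(f"{current} is odd, so I make 3n + 1: {following}")
    # The number of steps is the number of transitions, not the number of values.
    lines.append(f"The process took {len(seq) - 1} steps to reach 1")
    return lines


if __name__ == "__main__":
    print("\n".join(describe(15)))
```

Separating the sequence generation from the formatting keeps the step count easy to check: it is simply one less than the length of the returned list.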

Dynamic Programming Training Slides

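Contents
• Overview of dynamic programming
• Fundamentals of dynamic programming
• Implementing dynamic programming algorithms
• Advanced topics in dynamic programming
• Common dynamic programming problems
• Worked examples of dynamic programming

01 Overview of Dynamic Programming

Definition: Dynamic programming is an algorithm design technique that obtains an optimal solution by decomposing a problem into overlapping subproblems. The optimal solutions of subproblems that have already been solved are stored, so each subproblem is computed only once and the running time drops accordingly.

Characteristics

Bottom-up approach: dynamic programming starts from the smallest subproblems and works step by step toward larger ones.

Overlapping subproblems and memoization: the solutions of subproblems are stored in a table so that they are never recomputed.

Optimal substructure: the optimal solution of the problem can be derived from the optimal solutions of its subproblems.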
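A small sketch may help make these three characteristics concrete. The minimum-coin problem used here is an illustrative choice, not an example from the slides, and the function names are hypothetical. The first function memoizes a top-down recursion; the second fills a table bottom-up. Both rely on the same optimal substructure: the best way to pay an amount is one coin plus the best way to pay what remains.

```python
from functools import lru_cache


def min_coins_top_down(coins, amount):
    """Fewest coins summing to amount, using memoized recursion."""

    @lru_cache(maxsize=None)
    def solve(rest):
        if rest == 0:
            return 0
        best = float("inf")
        for coin in coins:
            if coin <= rest:
                # Optimal substructure: reuse the best answer for rest - coin.
                best = min(best, solve(rest - coin) + 1)
        return best

    return solve(amount)


def min_coins_bottom_up(coins, amount):
    """Same problem, solved from the smallest subproblem upward."""
    table = [0] + [float("inf")] * amount   # table[v] = fewest coins for value v
    for value in range(1, amount + 1):
        for coin in coins:
            if coin <= value:
                table[value] = min(table[value], table[value - coin] + 1)
    return table[amount]
```

An unreachable amount is reported as `float("inf")` in both versions. The top-down version only visits subproblems it actually needs, while the bottom-up version avoids recursion depth limits for large amounts.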

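Application scenarios of dynamic programming

Resource allocation problems: for problems such as the knapsack problem and other resource allocation problems, dynamic programming yields an optimal solution.

Shortest path problems: algorithms such as Floyd-Warshall and Dijkstra's algorithm apply the dynamic programming idea to shortest paths (Dijkstra's algorithm is usually classed as greedy, although it relies on the same principle of optimality).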

Sequence alignment problems: algorithms such as Smith-Waterman and Longest Common Subsequence (LCS) use dynamic programming to align sequences.
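As an illustration of the sequence-comparison case, here is a minimal sketch of the standard LCS table and traceback. The function name `lcs` and the string-based interface are assumptions made for this example; the slides do not give an implementation.

```python
def lcs(a, b):
    """Return one longest common subsequence of strings a and b."""
    m, n = len(a), len(b)
    # table[i][j] = length of the LCS of a[:i] and b[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover the subsequence.
    result = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            result.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(result))
```

The table costs O(mn) time and space. When several subsequences share the maximum length, the traceback returns just one of them.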

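History and development of dynamic programming

The idea of dynamic programming originated in the 1950s, when the mathematician Bellman and others posed the well-known problem of "optimization of multistage decision processes": how to find an optimal policy that minimizes the cost of moving from an initial state to a goal state.

Since Bellman's pioneering work, dynamic programming has developed extensively in both theory and practice.

Today dynamic programming is applied in many fields, such as computer vision, natural language processing, and machine learning.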

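02 Fundamentals of Dynamic Programming

State transition equation
• Describes how the state of the system evolves
• Is usually solved by recurrence
• Can be represented as a table or as a function

Optimal substructure
• The optimal solution of the problem contains the optimal solutions of its subproblems
• Is used to decompose the original problem into subproblems that are solved separately
• Each subproblem can be solved recursively, with its result stored for later use

Boundary conditions
• Describe the start and end states of the problem
• Usually serve as the boundary constraints of the equation
• Can be evaluated one subproblem at a time in a loop

State transition matrix
• Can be represented as a table or as a function
• Each element can be computed by recurrence

Computational complexity
• Describes the running time and space requirements of the algorithm
• Can be evaluated by mathematical analysis
• Can be reduced by optimizing the algorithm

03 Implementing Dynamic Programming Algorithms

Summary: in a dynamic programming algorithm, storing intermediate results in a one-dimensional array can effectively reduce the space complexity and improve efficiency.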
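The one-dimensional-array remark can be illustrated with the 0/1 knapsack problem. This is a sketch under assumed inputs (non-negative integer weights and capacity); the function name `knapsack` is hypothetical. The usual table indexed by item and capacity is collapsed into a single row, which works only because the capacity loop runs from high to low.

```python
def knapsack(weights, values, capacity):
    """Best total value of a 0/1 knapsack, using O(capacity) memory."""
    # best[c] = best value achievable with capacity c using the items seen so far
    best = [0] * (capacity + 1)
    for weight, value in zip(weights, values):
        # Iterate capacities downward so best[c - weight] still refers to the
        # previous item row; iterating upward would reuse the same item twice.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]
```

Space drops from O(n · capacity) to O(capacity) while the running time stays O(n · capacity). The trade-off is that the chosen items can no longer be recovered by tracing back through a full table.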

Princeton Computer Science Open Course (2nd edition of the original book)

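Chapter 8: Networks
8.1 Telephones and modems
8.2 Cable TV and DSL
8.3 Local area networks and Ethernet
8.4 Wireless networks
8.5 Cell phones
8.6 Bandwidth
8.7 Compression
8.8 Error detection and correction
8.9 Summary

9.1 Overview of the Internet
9.2 Domain names and addresses
9.3 Routing
9.4 TCP/IP
9.5 Higher-level protocols
9.7 The Internet of Things
9.8 Summary

Chapter 10: The World Wide Web
10.1 How the World Wide Web works
10.2 HTML
10.3 Cookies
10.4 Dynamic web pages
10.5 Dynamic content beyond web pages
10.6 Viruses, worms, and Trojan horses
10.7 Web security
10.8 Defending yourself
10.9 Summary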
Reading notes template

Contents
01 Mind map
02 Summary
03 Table of contents analysis
04 Reading notes
05 Highlights
06 About the author

Mind map

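[Figure: keyword mind map of the book. Keywords: technology, language, edition, work, computer, course, hardware, open course, world, summary, programming, Princeton, chapter, byte, software, part, information, program]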
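Summary
Since 1999 the author has taught a course at Princeton University called "Computers in Our World" (COS 109: Computers in Our World). The course introduces the basics of computing to students who are not computer science majors, and it has been very popular with students over the years. This book is based on the lecture notes for that course. It explains how computers and communication systems work and also analyzes the privacy and security problems raised by new technology. The chapters added in the second edition cover Python programming, artificial intelligence, machine learning, and big data. The book suits any reader who wants to understand the digital world: knowing how the technology works, where it came from, and where it is heading helps us better understand and change the world we live in.
Chapter 5: Programming and programming languages
Chapter 4: Algorithms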


6.231 DYNAMIC PROGRAMMING

LECTURE 14

LECTURE OUTLINE

• Limited lookahead policies
• Performance bounds
• Computational aspects
• Problem approximation approach
• Vehicle routing example
• Heuristic cost-to-go approximation
• Computer chess

LIMITED LOOKAHEAD POLICIES

• One-step lookahead (1SL) policy: At each $k$ and state $x_k$, use the control $\bar\mu_k(x_k)$ that attains the minimum in

$$\min_{u_k \in U_k(x_k)} E\Big\{ g_k(x_k,u_k,w_k) + \tilde J_{k+1}\big(f_k(x_k,u_k,w_k)\big) \Big\},$$

where
  - $\tilde J_N = g_N$.
  - $\tilde J_{k+1}$: approximation to the true cost-to-go $J_{k+1}$.

• Two-step lookahead policy: At each $k$ and $x_k$, use the control $\tilde\mu_k(x_k)$ attaining the minimum above, where the function $\tilde J_{k+1}$ is obtained using a 1SL approximation (solve a 2-step DP problem).

• If $\tilde J_{k+1}$ is readily available and the minimization above is not too hard, the 1SL policy is implementable on-line.

• Sometimes one also replaces $U_k(x_k)$ above with a subset of "most promising controls" $\bar U_k(x_k)$.

• As the length of lookahead increases, the required computation quickly explodes.

PERFORMANCE BOUNDS

• Let $\bar J_k(x_k)$ be the cost-to-go from $(x_k,k)$ of the 1SL policy, based on functions $\tilde J_k$.

• Assume that for all $(x_k,k)$, we have

$$\hat J_k(x_k) \le \tilde J_k(x_k), \qquad (*)$$

where $\hat J_N = g_N$ and for all $k$,

$$\hat J_k(x_k) = \min_{u_k \in U_k(x_k)} E\Big\{ g_k(x_k,u_k,w_k) + \tilde J_{k+1}\big(f_k(x_k,u_k,w_k)\big) \Big\}$$

[so $\hat J_k(x_k)$ is computed along with $\bar\mu_k(x_k)$]. Then

$$\bar J_k(x_k) \le \hat J_k(x_k), \qquad \text{for all } (x_k,k).$$

• Important application: When $\tilde J_k$ is the cost-to-go of some heuristic policy (then the 1SL policy is called the rollout policy).

• The bound can be extended to the case where there is a $\delta_k$ in the RHS of (*). Then

$$\bar J_k(x_k) \le \tilde J_k(x_k) + \delta_k + \cdots + \delta_{N-1}$$

COMPUTATIONAL ASPECTS

• Sometimes nonlinear programming can be used to calculate the 1SL or the multistep version [particularly when $U_k(x_k)$ is not a discrete set]. Connection with the methodology of stochastic programming.

• The choice of the approximating functions $\tilde J_k$ is critical, and they are calculated with a variety of methods.

• Some approaches:
  (a) Problem Approximation: Approximate the optimal cost-to-go with some cost derived from a related but simpler problem
  (b) Heuristic Cost-to-Go Approximation: Approximate the optimal cost-to-go with a function of a suitable parametric form, whose parameters are tuned by some heuristic or systematic scheme (Neuro-Dynamic Programming)
  (c) Rollout Approach: Approximate the optimal cost-to-go with the cost of some suboptimal policy, which is calculated either analytically or by simulation

PROBLEM APPROXIMATION

• Many (problem-dependent) possibilities
  - Replace uncertain quantities by nominal values, or simplify the calculation of expected values by limited simulation
  - Simplify difficult constraints or dynamics

• Example of enforced decomposition: Route $m$ vehicles that move over a graph. Each node has a "value." The first vehicle that passes through the node collects its value. Maximize the total collected value, subject to initial and final time constraints (plus time windows and other constraints).

• Usually the 1-vehicle version of the problem is much simpler. This motivates an approximation obtained by solving single vehicle problems.

• 1SL scheme: At time $k$ and state $x_k$ (position of vehicles and "collected value nodes"), consider all possible $k$th moves by the vehicles, and at the resulting states approximate the optimal value-to-go with the value collected by optimizing the vehicle routes one-at-a-time

HEURISTIC COST-TO-GO APPROXIMATION

• Use a cost-to-go approximation from a parametric class $\tilde J(x,r)$ where $x$ is the current state and $r = (r_1,\ldots,r_m)$ is a vector of "tunable" scalars (weights).

• By adjusting the weights, one can change the "shape" of the approximation $\tilde J$ so that it is reasonably close to the true optimal cost-to-go function.

• Two key issues:
  - The choice of parametric class $\tilde J(x,r)$ (the approximation architecture).
  - Method for tuning the weights ("training" the architecture).

• Successful application strongly depends on how these issues are handled, and on insight about the problem.

• Sometimes a simulator is used, particularly when there is no mathematical model of the system.

APPROXIMATION ARCHITECTURES

• Divided in linear and nonlinear [i.e., linear or nonlinear dependence of $\tilde J(x,r)$ on $r$].

• Linear architectures are easier to train, but nonlinear ones (e.g., neural networks) are richer.

• Architectures based on feature extraction.

• Ideally, the features will encode much of the nonlinearity that is inherent in the cost-to-go approximated, and the approximation may be quite accurate without a complicated architecture.

• Sometimes the state space is partitioned, and "local" features are introduced for each subset of the partition (they are 0 outside the subset).

• With a well-chosen feature vector $y(x)$, we can use a linear architecture

$$\tilde J(x,r) = \hat J\big(y(x),r\big) = \sum_i r_i\, y_i(x)$$

COMPUTER CHESS

• Programs use a feature-based position evaluator that assigns a score to each move/position

• Most often the weighting of features is linear but multistep lookahead is involved.

• Most often the training is done by trial and error.

• Additional features:
  - Depth first search
  - Variable depth search when dynamic positions are involved
  - Alpha-beta pruning

• Multistep lookahead tree

[Figure: multistep lookahead tree with position scores at the nodes]

• Alpha-beta pruning: As the move scores are evaluated by depth-first search, branches whose consideration (based on the calculations so far) cannot possibly change the optimal move are neglected
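The alpha-beta rule stated in the last bullet can be sketched as follows. This is an illustrative sketch, not code from the lecture: the tree is encoded as nested Python lists whose leaves are position scores, and the names `alphabeta`, `minimax`, and the `visited` argument are assumptions made for this example. Scores are treated as rewards to be maximized by the player at the root.

```python
import math


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    """Depth-first minimax value of node, skipping branches that cannot matter.

    node is either a number (leaf score) or a list of child nodes.
    visited, if given, collects the leaves that were actually evaluated.
    """
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:      # the minimizer above will never allow this line
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if alpha >= beta:          # the maximizer above already has a better option
            break
    return value


def minimax(node, maximizing):
    """Plain minimax without pruning, used only as a reference."""
    if not isinstance(node, list):
        return node
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)
```

On the two-level tree `[[3, 5], [2, 9], [0, 1]]` with the maximizer at the root, both functions return 3, but `alphabeta` evaluates only the leaves 3, 5, 2, and 0. Once the second and third moves are known to be worth at most 2 and 0, their remaining replies cannot change the choice of the first move.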