MIT OpenCourseWare Dynamic Programming Lecture (14)
In all of these cases, interactions with other agents you are connected to affect your payoff, well-being, utility. How to make decisions in such situations? → “multiagent decision theory” or game theory.
Reading: Osborne, Chapters 1-2. EK, Chapter 6.
Networks: Lecture 9
In the context of social networks, or even communication networks, agents make a variety of choices. For example:
“Rational Decision-Making”
Powerful working hypothesis in economics: individuals act rationally in the sense of choosing the option that gives them higher “payoff”.
$$U(a) = \int u(c)\, dF^a(c) \;\ge\; U(b) = \int u(c)\, dF^b(c).$$
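As an illustration of comparing two options by expected utility, here is a minimal Python sketch. It assumes discrete distributions over outcomes c (so the integral becomes a sum) and a square-root utility; the names `expected_utility`, `option_a`, and `option_b` and the numbers are hypothetical and are not taken from the lecture.

```python
import math

def expected_utility(u, distribution):
    """Return sum of p * u(c) over (outcome c, probability p) pairs."""
    return sum(p * u(c) for c, p in distribution)

# Hypothetical options: a is a sure outcome, b is a risky lottery.
option_a = [(100.0, 1.0)]
option_b = [(0.0, 0.5), (250.0, 0.5)]

u = math.sqrt  # assumed concave (risk-averse) utility
U_a = expected_utility(u, option_a)
U_b = expected_utility(u, option_b)
print("choose a" if U_a >= U_b else "choose b")
```

With a concave utility the sure outcome can be preferred even though the lottery has the larger expected outcome (125 versus 100).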
From Single Person to Multiperson Decision Problems
MIT OpenCourseWare Dynamic Programming Lecture (22)
6.231 DYNAMIC PROGRAMMING

LECTURE 22

LECTURE OUTLINE
• Approximate DP for large/intractable problems
• Approximate policy iteration
• Simulation-based policy iteration
• Actor-critic interpretation
• Learning how to play tetris: A case study
• Approximate value iteration with function approximation

APPROX. POLICY ITERATION - DISCOUNTED CASE
• Suppose that the policy evaluation is approximate, according to
$$\max_x |J_k(x) - J_{\mu^k}(x)| \le \delta, \qquad k = 0, 1, \ldots$$
and policy improvement is also approximate, according to
$$\max_x |(T_{\mu^{k+1}} J_k)(x) - (T J_k)(x)| \le \epsilon, \qquad k = 0, 1, \ldots$$
where $\delta$ and $\epsilon$ are some positive scalars.
• Error Bound: The sequence $\{\mu^k\}$ generated by the approximate policy iteration algorithm satisfies
$$\limsup_{k \to \infty} \max_{x \in S} \big( J_{\mu^k}(x) - J^*(x) \big) \le \frac{\epsilon + 2\alpha\delta}{(1 - \alpha)^2}$$
• Typical practical behavior: The method makes steady progress up to a point and then the iterates $J_{\mu^k}$ oscillate within a neighborhood of $J^*$.

APPROXIMATE POLICY ITERATION - SSP
• Suppose that the policy evaluation is approximate, according to
$$\max_{i=1,\ldots,n} |J_k(i) - J_{\mu^k}(i)| \le \delta, \qquad k = 0, 1, \ldots$$
and policy improvement is also approximate, according to
$$\max_{i=1,\ldots,n} |(T_{\mu^{k+1}} J_k)(i) - (T J_k)(i)| \le \epsilon, \qquad k = 0, 1, \ldots$$
where $\delta$ and $\epsilon$ are some positive scalars.
• Assume that all policies generated by the method are proper (they are guaranteed to be if $\delta = \epsilon = 0$, but not in general).
• Error Bound: The sequence $\{\mu^k\}$ generated by approximate policy iteration satisfies
$$\limsup_{k \to \infty} \max_{i=1,\ldots,n} \big( J_{\mu^k}(i) - J^*(i) \big) \le \frac{n(1 - \rho + n)(\epsilon + 2\delta)}{(1 - \rho)^2}$$
where
$$\rho = \max_{\substack{i=1,\ldots,n \\ \mu:\ \text{proper}}} P\{x_n \ne t \mid x_0 = i, \mu\}$$

SIMULATION-BASED POLICY EVALUATION
• Given $\mu$, suppose we want to calculate $J_\mu$ by simulation.
• Generate by simulation sample costs. Approximation:
$$J_\mu(i) \approx \frac{1}{M_i} \sum_{m=1}^{M_i} c(i, m)$$
$c(i, m)$: $m$th sample cost starting from state $i$
• Approximating each $J_\mu(i)$ is impractical for a large state space. Instead, a “compact representation” $\tilde J_\mu(i, r)$ may be used, where $r$ is a tunable parameter vector. We may calculate an optimal value $r^*$ of $r$ by a least squares fit
$$r^* = \arg\min_r \sum_{i=1}^{n} \sum_{m=1}^{M_i} \big| c(i, m) - \tilde J_\mu(i, r) \big|^2$$
• This idea is the starting point for more sophisticated simulation-related methods, to be discussed in the next lecture.

ACTOR-CRITIC INTERPRETATION
• The critic calculates approximately (e.g., using some form of a least squares fit) $J_{\mu^k}$ by processing state/sample cost pairs, which are generated by the actor by simulation
• Given the approximate $J_{\mu^k}$, the actor implements the improved policy $\mu^{k+1}$ by
$$(T_{\mu^{k+1}} J_k)(i) = (T J_k)(i)$$

LEARNING HOW TO PLAY TETRIS: A CASE STUDY
• The state consists of the board position $i$, and the shape of the current falling block (astronomically large number of states).
• It can be shown that all policies are proper!!
• Use a linear approximation architecture with feature extraction
$$\tilde J(i, r) = \sum_{m=1}^{s} \phi_m(i)\, r_m,$$
where $r = (r_1, \ldots, r_s)$ is the parameter vector and $\phi_m(i)$ is the value of the $m$th feature associated with $i$.
• Approximate policy iteration was implemented with the following features:
− The height of each column of the wall
− The difference of heights of adjacent columns
− The maximum height over all wall columns
− The number of “holes” on the wall
− The number 1 (provides a constant offset)
• Playing data was collected for a fixed value of the parameter vector $r$ (and the corresponding policy); the policy was approximately evaluated by choosing $r$ to match the playing data in some least-squares sense.
• The method used for approximate policy evaluation was the $\lambda$-least squares policy evaluation method, to be described in the next lecture.
• See: Bertsekas and Ioffe, “Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming,” in: 8001//people/dimitrib/publ.html

VALUE ITERATION W/ FUNCTION APPROXIMATION
• Suppose we use a linear approximation architecture $\tilde J(i, r) = \phi(i)' r$, or
$$\tilde J = \Phi r$$
where $r = (r_1, \ldots, r_s)$ is a parameter vector, and $\Phi$ is a full rank $n \times s$ feature matrix.
• Approximate value iteration method: Start with initial guess $r_0$; given $r_t$, generate $r_{t+1}$ by
$$r_{t+1} = \arg\min_r \|\Phi r - T(\Phi r_t)\|$$
where $\|\cdot\|$ is some norm.
• Questions: Does $r_t$ converge to some $r^*$? How close is $\Phi r^*$ to $J^*$?
• Convergence Result: If $T$ is a contraction with respect to a weighted Euclidean norm ($\|J\|^2 = J'DJ$, where $D$ is positive definite, symmetric), then $r_t$ converges to (the unique) $r^*$ satisfying
$$r^* = \arg\min_r \|\Phi r - T(\Phi r^*)\|$$

GEOMETRIC INTERPRETATION
• Consider the feature subspace
$$S = \{\Phi r \mid r \in \Re^s\}$$
of all cost function approximations that are linear combinations of the feature vectors. Let $\Pi$ denote projection on this subspace.
• The approximate value iteration is
$$\Phi r_{t+1} = \Pi T(\Phi r_t), \qquad r_{t+1} = \arg\min_r \|\Phi r - T(\Phi r_t)\|$$
and amounts to starting at the point $\Phi r_t$ of $S$, applying $T$ to it, and then projecting on $S$.
• Proof Idea: Since $T$ is a contraction with respect to the norm of projection, and projection is nonexpansive, $\Pi T$ (which maps $S$ to $S$) is a contraction (with respect to the same norm).

PROOF
• Consider two vectors $\Phi r$ and $\Phi r'$ in $S$. The (Euclidean) projection is a nonexpansive mapping, so
$$\|\Pi T(\Phi r) - \Pi T(\Phi r')\| \le \|T(\Phi r) - T(\Phi r')\|$$
Since $T$ is a contraction mapping (with respect to the norm of projection),
$$\|T(\Phi r) - T(\Phi r')\| \le \beta \|\Phi r - \Phi r'\|$$
where $\beta \in (0, 1)$ is the contraction modulus, so
$$\|\Pi T(\Phi r) - \Pi T(\Phi r')\| \le \beta \|\Phi r - \Phi r'\|$$
and it follows that $\Pi T$ is a contraction (with respect to the same norm and with the same modulus).
• In general, it is not clear how to obtain a Euclidean norm for which $T$ is a contraction.
• Important fact: In the case where $T = T_\mu$, where $\mu$ is a stationary policy, $T$ is a contraction for the norm $\|J\|^2 = J'DJ$, where $D$ is diagonal with the steady-state probabilities along the diagonal.

ERROR BOUND
• If $T$ is a contraction with respect to a weighted Euclidean norm $\|\cdot\|$ with modulus $\beta$, and $r^*$ is the limit of $r_t$, i.e.,
$$r^* = \arg\min_r \|\Phi r - T(\Phi r^*)\|$$
then
$$\|\Phi r^* - J^*\| \le \frac{\|\Pi J^* - J^*\|}{1 - \beta}$$
where $J^*$ is the fixed point of $T$, and $\Pi J^*$ is the projection of $J^*$ on the feature subspace $S$ (with respect to norm $\|\cdot\|$).
Proof: Using the triangle inequality,
$$\|\Phi r^* - J^*\| \le \|\Phi r^* - \Pi J^*\| + \|\Pi J^* - J^*\| = \|\Pi T(\Phi r^*) - \Pi T(J^*)\| + \|\Pi J^* - J^*\| \le \beta \|\Phi r^* - J^*\| + \|\Pi J^* - J^*\|$$
Q.E.D.
• Note that the error $\|\Phi r^* - J^*\|$ is proportional to $\|\Pi J^* - J^*\|$, which can be viewed as the “power of the approximation architecture” (measures how well $J^*$ can be represented by the chosen features).
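The approximate value iteration described above (apply T, then project back onto the feature subspace) can be sketched in a few lines of Python. This is only an illustration under strong assumptions that are not in the lecture: a single feature vector (s = 1), a fixed policy with a doubly stochastic transition matrix (so the steady-state distribution is uniform and the weighted norm reduces to the ordinary Euclidean norm), and made-up costs. The names `apply_T`, `project`, and `approximate_value_iteration` are hypothetical.

```python
def apply_T(J, g, P, alpha):
    """Bellman operator of a fixed policy: (T J)(i) = g(i) + alpha * sum_j P[i][j] * J(j)."""
    n = len(J)
    return [g[i] + alpha * sum(P[i][j] * J[j] for j in range(n)) for i in range(n)]

def project(J, phi):
    """Least-squares fit of J by one feature: the scalar r minimizing ||phi * r - J||."""
    return sum(p * x for p, x in zip(phi, J)) / sum(p * p for p in phi)

def approximate_value_iteration(g, P, alpha, phi, r0=0.0, iterations=200):
    """Iterate r <- argmin_r ||phi * r - T(phi * r_t)|| starting from r0."""
    r = r0
    for _ in range(iterations):
        r = project(apply_T([p * r for p in phi], g, P, alpha), phi)
    return r

# Hypothetical 3-state chain; P is doubly stochastic, so the uniform
# distribution is its steady state (an assumption made for this sketch).
P = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.5]]
g = [1.0, 2.0, 3.0]      # per-stage costs
phi = [1.0, 1.0, 1.0]    # single constant feature
r_star = approximate_value_iteration(g, P, alpha=0.9, phi=phi)
print(r_star)
```

With the constant feature the iteration reduces to r ← mean(g) + 0.9 r, so it contracts with modulus 0.9 toward its fixed point, which matches the convergence result stated above for this special case.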
Dynamic Programming
1. Richard E. Bellman. Dynamic Programming. Princeton University Press, Princeton, USA, 1957.
2. Stuart E. Dreyfus and Averill M. Law. The Art and Theory of Dynamic Programming. Mathematics in Science and Engineering, Volume 130. Academic Press, New York, USA, 1977.
3. Vipin Kumar, Ananth Grama, Anshul Gupta, and George Karypis. Parallel Computing. Benjamin Cummings, Redwood City (CA), USA, 1994.
4. David K. Smith. Dynamic Programming: a practical introduction. Mathematics and its Applications. Ellis Horwood, Chichester, GB, 1991.
5. Moshe Sniedovich. Dynamic Programming. Marcel Dekker, New York, USA, 1992.
Dynamic Programming
Marc Gengler
Swiss Federal Institute of Technology Lausanne
Computer Science Theory Laboratory
EPFL-DI-LITH, Ecublens (IN)
CH-1015 Lau[...]

[...] of this chapter is devoted to a certain number of example problems from different application areas showing how standard problems can be solved using dynamic programming. The examples will be selected so as to emphasize the two points which follow. First, we simply want to present to the reader how one may express a given problem so as to obtain a well-suited state space, the transition function, and so on. Second and more importantly, we also try to convince the reader that the choice of, for instance, a convenient state space may require some imagination from the modeler trying to force a problem into the dynamic programming framework.

We also address the topic of parallelization of dynamic programming programs. In order to do so, we will distinguish between several classes of transition functions, the difference being made by considering the set of states that are used in order to compute a new state. Classically, a dynamic programming problem is called monadic if the computation of a new state uses only one existing state, and polyadic otherwise. A problem is serial if it needs only states that were computed at the level immediately preceding the current level. Otherwise, it is nonserial. These distinctions in fact reflect different kinds of dependencies that exist between the states in the state space. These dependencies naturally influence the parallelization of the dynamic programming problem, as the only new states one can compute in parallel are those that are pairwise independent and for which all required input states have already been computed.

The principles of dynamic programming are described and discussed in a large number of publications. [1, 2, 3, 4, 5] are for instance some general references that present and analyze the concepts of dynamic programming or address the problem of their parallelization.
The work presented in this chapter is mostly based on the previously cited references as well as on a large number of research papers that address specific points of the dynamic programming methodology, present the dynamic programming formulation for given problems, or propose ways to parallelize them.
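To make the monadic/serial classification concrete, the following Python sketch solves a small multistage shortest-path problem. It is an assumed example, not one from the chapter: each candidate value for a new state uses exactly one existing state (monadic), and only states of the immediately preceding level are read (serial). The function name `serial_monadic_dp` and the cost data are hypothetical.

```python
def serial_monadic_dp(stage_costs):
    """Cheapest cost of reaching each state of the last level.

    stage_costs[k][i][j] is the cost of moving from state i at level k
    to state j at level k + 1.
    """
    best = [0.0] * len(stage_costs[0])  # every level-0 state is a free start
    for costs in stage_costs:
        n_next = len(costs[0])
        # Each entry depends only on the previous level, so the entries of
        # one level are independent of each other and could be computed
        # in parallel.
        best = [min(best[i] + costs[i][j] for i in range(len(best)))
                for j in range(n_next)]
    return best

# Two transitions: 2 states -> 2 states -> 1 final state.
stage_costs = [
    [[1, 4], [2, 1]],
    [[3], [1]],
]
print(serial_monadic_dp(stage_costs))
```

A polyadic problem, by contrast, would combine two or more previously computed states in a single candidate, which constrains the order in which states become available for parallel evaluation.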
MIT Introduction to Algorithms open course: Problem Set 7
Introduction to Algorithms    November 7, 2005
Massachusetts Institute of Technology    6.046J/18.410J
Professors Erik D. Demaine and Charles E. Leiserson    Handout 22

Problem Set 7

MIT students: This problem set is due in lecture on Monday, November 14, 2005. There will be two homework labs for this problem set, one held 6–8 P.M. on Wednesday, November 9, 2005 and one held 2–4 P.M. on Sunday, November 13, 2005.

Reading: Chapter 15, 16.1–16.3, 22.1, and 23.

Problem 7-1 is mandatory. Failure to turn in a solution will result in a serious and negative impact on your term grade!

Both exercises and problems should be solved, but only the problems should be turned in. Exercises are intended to help you master the course material. Even though you should not turn in the exercise solutions, you are responsible for material covered in the exercises.

Mark the top of each sheet with your name, the course number, the problem number, your recitation section, the date and the names of any students with whom you collaborated. Please staple and turn in your solutions on 3-hole punched paper.

You will often be called upon to “give an algorithm” to solve a certain problem. Your write-up should take the form of a short essay. A topic paragraph should summarize the problem you are solving and what your results are. The body of the essay should provide the following:
1. A description of the algorithm in English and, if helpful, pseudo-code.
2. At least one worked example or diagram to show more precisely how your algorithm works.
3. A proof (or indication) of the correctness of the algorithm.
4. An analysis of the running time of the algorithm.
Remember, your goal is to communicate. Full credit will be given only to correct solutions which are described clearly. Convoluted and obtuse descriptions will receive low marks.

Exercise 7-1. Do Exercise 15.4-3 on page 356 of CLRS.
Exercise 7-2. Do Exercise 16.1-3 on page 379 of CLRS.
Exercise 7-3. Do Exercise 16.3-2 on page 384 of CLRS.
Exercise 7-4. Do Exercise 22.1-5 on page 530 of CLRS.
Exercise 7-5. Do Exercise 23.1-5 on page 566 of CLRS.
Exercise 7-6. Do Exercise 23.2-4 on page 574 of CLRS.
Exercise 7-7. Do Exercise 23.2-5 on page 574 of CLRS.

Problem 7-1. Edit distance

In this problem you will write a program to compute edit distance. This problem is mandatory. Failure to turn in a solution will result in a serious and negative impact on your term grade! We advise you to start this programming assignment as soon as possible, because getting all the details right in a program can take longer than you think.

Many word processors and keyword search engines have a spelling correction feature. If you type in a misspelled word x, the word processor or search engine can suggest a correction y. The correction y should be a word that is close to x. One way to measure the similarity in spelling between two text strings is by “edit distance.” The notion of edit distance is useful in other fields as well. For example, biologists use edit distance to characterize the similarity of DNA or protein sequences.

The edit distance d(x, y) of two strings of text, x[1..m] and y[1..n], is defined to be the minimum possible cost of a sequence of “transformation operations” (defined below) that transforms string (see Footnote 1) x[1..m] into string y[1..n]. To define the effect of the transformation operations, we use an auxiliary string z[1..s] that holds the intermediate results. At the beginning of the transformation sequence, s = m and z[1..s] = x[1..m] (i.e., we start with string x[1..m]). At the end of the transformation sequence, we should have s = n and z[1..s] = y[1..n] (i.e., our goal is to transform x[1..m] into string y[1..n]). Throughout the transformation, we maintain the current length s of string z, as well as a cursor position i, i.e., an index into string z. The invariant 1 ≤ i ≤ s + 1 holds at all times during the transformation. (Notice that the cursor can move one space beyond the end of the string z in order to allow insertions at the end of the string.)

Each transformation operation may alter the string z, the size s, and the cursor position i. Each transformation operation also has an associated cost. The cost of a sequence of transformation operations is the sum of the costs of the individual operations on the sequence. The goal of the edit-distance problem is to find a sequence of transformation operations of minimum cost that transforms x[1..m] into y[1..n].

There are five transformation operations: [...]

Footnote 1: Here we view a text string as an array of characters. Individual characters can be manipulated in constant time.

Problem 7-2. GreedSox

GreedSox, a popular major-league baseball team, is interested in one thing: making money. They have hired you as a consultant to help boost their group ticket sales. They have noticed the following problem. When a group wants to see a ballgame, all members of the group need seats (in the bleacher section), or they go away. Since partial groups can’t be seated, the bleachers are often not full. There is still space available, but not enough space for the entire group. In this case, the group cannot be seated, losing money for the GreedSox.

The GreedSox want your recommendation on a new seating policy. Instead of seating people first-come/first-serve, the GreedSox decide to seat large groups first, followed by smaller groups, and finally singles (i.e., groups of 1).

You are given a set of groups, G[1..m] = [g_1, g_2, ..., g_m], where g_i is a number representing the size of the group. Assume that the bleachers seat n people. Consider the following greedy seating algorithm, where the function ADMIT(i) admits group i, and REJECT(i) sends away group i.

SEAT(G[1..m], n)
    admitted ← 0
    G ← SORT(G)        ▷ Sort groups largest to smallest.
    for i ← 1 to m
        do if G[i] ≤ n
            then ADMIT(i)
                 n ← n − G[i]
                 admitted ← admitted + G[i]
            else REJECT(i)
    return admitted

The SEAT algorithm first sorts the groups by size. It then iterates through the groups from largest to smallest, seating any group that fits in the bleachers. It returns the number of people admitted.

(a) The GreedSox owners are right: the greedy seating algorithm works pretty well. Show that if, given G and n, it is possible to admit k people, then the greedy seating algorithm admits at least k/2 people.

(b) Unfortunately, the SEAT algorithm does not work perfectly. Show that SEAT is not optimal by giving a counterexample in which, asymptotically as n gets large, the ratio between greedy seating and optimal seating approaches 1/2.

When you present your results to the GreedSox owners, they point out the following problem: unlike numbers in a computer’s memory, real people are hard to move around. In particular, people waiting in line do not like to be “sorted.” The GreedSox owners ask you to develop a version of the greedy seating algorithm that does not modify the set G. (You can think of G as being stored in read-only memory.) You suggest the following algorithm:

RESEAT(G[1..m], n)
    admitted ← 0
    for j ← 1 to ⌈lg n⌉
        do for i ← 1 to m
            do if G[i] ≥ n/2^j and G[i] ≤ n
                then ADMIT(i)
                     n ← n − G[i]
                     admitted ← admitted + G[i]
                else if G[i] > n
                    then REJECT(i)
    return admitted

The RESEAT algorithm iterates through the list of groups several times. In the first iteration, it admits any group of size at least n/2. In the second iteration, it admits any group of size at least n/4. It continues in the same manner, seating smaller and smaller groups until the theater is filled. When RESEAT finishes, it returns the number of people admitted.

(c) Assume that, given G and n, it is possible to admit at least k people. Show that the RESEAT algorithm still seats at least k/2 people.

(d) The RESEAT algorithm runs in O(m lg n) time. Devise a new algorithm that runs in O(m) time and still guarantees that if k people can be seated, your algorithm seats at least k/2 people.
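The greedy SEAT procedure from Problem 7-2 can be tried out with a short Python sketch. This is a direct transcription under a few assumptions: ADMIT and REJECT are modeled simply by recording the admitted group sizes, the function name `seat` and the sample group sizes are made up for illustration, and the input list is not modified (a sorted copy is used).

```python
def seat(groups, capacity):
    """Greedy seating: consider groups from largest to smallest and admit
    every group that still fits. Returns (people admitted, admitted sizes)."""
    admitted = 0
    admitted_groups = []
    for size in sorted(groups, reverse=True):  # largest group first
        if size <= capacity:                   # the whole group fits: ADMIT
            capacity -= size
            admitted += size
            admitted_groups.append(size)
        # otherwise the group is sent away: REJECT
    return admitted, admitted_groups

# Hypothetical instance: seating both groups of 5 would fill all 10 seats,
# but the greedy rule admits the group of 6 first.
print(seat([5, 6, 5], 10))
```

Running it on small instances like this one is a quick way to see that the greedy choice can leave seats empty that a different selection would have filled.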
MIT Network Technology 14_15JF09_lec16
Lecture 16: Cooperation and Trust in Networks
Daron Acemoglu and Asu Ozdaglar
November 4, 2009
Networks: Lecture 16
Outline: the role of networks in cooperation; a model of social norms; cohesion of groups and social norms; trust in networks.
Reading: Osborne, Chapters 14 and 15.
The Role of Social Networks
Recall the importance of “social contacts” in finding jobs, especially of “weak ties” (e.g., Granovetter (1973), “The Strength of Weak Ties”): most people find jobs through acquaintances, not close friends. The idea is that recommendations from people you know are more trusted. Similarly, are social networks important in starting businesses? Recall that in many developing economies (but also even in societies with very strong institutions), networks of “acquaintances and contacts” shape business behavior (e.g., Munshi (2009), “Strength in Numbers: A Network-Based Solution to Occupational Traps”). The Indian diamond industry is dominated by a few small subcastes (the Marwaris, the Palanpuris, and the Kathiawaris), in the same way that the Antwerp and New York diamond trade used to be dominated by ultra-Orthodox Jews.
MIT OCW (MIT OpenCourseWare). Original text: MIT's OpenCourseWare initiative currently has nine hundred courses online, which is half of its goal of putting all courses online by 2007.
USU OpenCourseWare (Utah State University OpenCourseWare). Original text: Utah State University OpenCourseWare provides free and open educational resources to students, self-learners, and educators around the world.
Mehran Sahami    CS 106A    Handout #12    October 5, 2007

Assignment #2: Simple Java Programs
Due: 3:15pm on Monday, October 15th
Based on a handout by Eric Roberts

Your job in this assignment is to write programs to solve each of these six problems.

1. Write a GraphicsProgram subclass that draws a pyramid consisting of bricks arranged in horizontal rows, so that the number of bricks in each row decreases by one as you move up the pyramid, as shown in the following sample run:

The pyramid should be centered at the bottom of the window and should use constants for the following parameters:

BRICK_WIDTH       The width of each brick (30 pixels)
BRICK_HEIGHT      The height of each brick (12 pixels)
BRICKS_IN_BASE    The number of bricks in the base (14)

The numbers in parentheses show the values for this diagram, but you must be able to change those values in your program.

2. Suppose that you’ve been hired to produce a program that draws an image of an archery target (or, if you prefer commercial applications, a logo for a national department store chain) that looks like this:

This figure is simply three GOval objects, two red and one white, drawn in the correct order. The outer circle should have a radius of one inch (72 pixels), the white circle has a radius of 0.65 inches, and the inner red circle has a radius of 0.3 inches. The figure should be centered in the window of a GraphicsProgram subclass.

3. Write a GraphicsProgram subclass that draws a partial diagram of the acm.program class hierarchy, as follows:

The only classes you need to create this picture are GRect, GLabel, and GLine. The major part of the problem is specifying the coordinates so that the different elements of the picture are aligned properly. The aspects of the alignment for which you are responsible are:
· The width and height of the class boxes should be specified as named constants so that they are easy to change.
· The labels should be centered in their boxes.
You can find the width of a label by calling label.getWidth() and the height it extends above the baseline by calling label.getAscent(). If you want to center a label, you need to shift its origin by half of these distances in each direction.
· The connecting lines should start and end at the center of the appropriate edge of the box.
· The entire figure should be centered in the window.

4. In high-school geometry, you learned the Pythagorean theorem for the relationship of the lengths of the three sides of a right triangle:

$$a^2 + b^2 = c^2$$

which can alternatively be written as:

$$c = \sqrt{a^2 + b^2}$$

Most of this expression contains simple operators covered in Chapter 3. The one piece that’s missing is taking square roots, which you can do by calling the standard function Math.sqrt. For example, the statement

double y = Math.sqrt(x);

sets y to the square root of x.

Write a ConsoleProgram that accepts values for a and b as ints and then calculates the solution of c as a double. Your program should be able to duplicate the following sample run:

5. Write a ConsoleProgram that reads in a list of integers, one per line, until a sentinel value of 0 (which you should be able to change easily to some other value). When the sentinel is read, your program should display the smallest and largest values in the list, as illustrated in this sample run:

Your program should handle the following special cases:
· If the user enters only one value before the sentinel, the program should report that value as both the largest and smallest.
· If the user enters the sentinel on the very first input line, then no values have been entered, and your program should display a message to that effect.

6. Douglas Hofstadter’s Pulitzer-prize-winning book Gödel, Escher, Bach contains many interesting mathematical puzzles, many of which can be expressed in the form of computer programs. In Chapter XII, Hofstadter mentions a wonderful problem that is well within the scope of the control statements from Chapter 4.
The problem can be expressed as follows:

Pick some positive integer and call it n.
If n is even, divide it by two.
If n is odd, multiply it by three and add one.
Continue this process until n is equal to one.

On page 401 of the Vintage edition, Hofstadter illustrates this process with the following example, starting with the number 15:

15 is odd, so I make 3n + 1: 46
46 is even, so I take half: 23
23 is odd, so I make 3n + 1: 70
70 is even, so I take half: 35
35 is odd, so I make 3n + 1: 106
106 is even, so I take half: 53
53 is odd, so I make 3n + 1: 160
160 is even, so I take half: 80
80 is even, so I take half: 40
40 is even, so I take half: 20
20 is even, so I take half: 10
10 is even, so I take half: 5
5 is odd, so I make 3n + 1: 16
16 is even, so I take half: 8
8 is even, so I take half: 4
4 is even, so I take half: 2
2 is even, so I take half: 1

As you can see from this example, the numbers go up and down, but eventually (at least for all numbers that have ever been tried) the sequence comes down to end in 1. In some respects, this process is reminiscent of the formation of hailstones, which get carried upward by the winds over and over again before they finally descend to the ground. Because of this analogy, this sequence of numbers is usually called the Hailstone sequence, although it goes by many other names as well.

Write a ConsoleProgram that reads in a number from the user and then displays the Hailstone sequence for that number, just as in Hofstadter’s book, followed by a line showing the number of steps taken to reach 1. For example, your program should be able to produce a sample run that looks like this:

The fascinating thing about this problem is that no one has yet been able to prove that it always stops. The number of steps in the process can certainly get very large. How many steps, for example, does your program take when n is 27?
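The Hailstone rule described above is short enough to sketch directly. The version below is in Python rather than the Java ConsoleProgram the assignment asks for, so it illustrates the loop logic only; the function names `hailstone` and `describe` are hypothetical, and the printed wording follows Hofstadter's example.

```python
def hailstone(n):
    """Return the Hailstone sequence starting at n and ending at 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    sequence = [n]
    while n != 1:
        # even: take half; odd: make 3n + 1
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        sequence.append(n)
    return sequence

def describe(n):
    """Print each step in the style of the example, then the step count."""
    sequence = hailstone(n)
    for current, following in zip(sequence, sequence[1:]):
        if current % 2 == 0:
            print(f"{current} is even, so I take half: {following}")
        else:
            print(f"{current} is odd, so I make 3n + 1: {following}")
    print(f"The process took {len(sequence) - 1} steps to reach 1")

describe(15)
```

The number of steps is the number of transitions, which is one less than the length of the sequence.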
policy, based on functions $\tilde J_k$.
• Assume that for all $(x_k, k)$, we have
$$\hat J_k(x_k) \le \tilde J_k(x_k), \qquad (*)$$
where $\hat J_N = g_N$ and for all $k$,
$$\hat J_k(x_k) = \min_{u_k \in U_k(x_k)} E\big\{ g_k(x_k, u_k, w_k) + \tilde J_{k+1}\big(f_k(x_k, u_k, w_k)\big) \big\},$$
[so $\hat J_k(x_k)$ is computed along with $\bar\mu_k(x_k)$]. Then
$$\bar J_k(x_k) \le \hat J_k(x_k), \qquad \text{for all } (x_k, k).$$
• Important application: When $\tilde J_k$ is the cost-to-go of some heuristic policy (then the 1SL policy is called the rollout policy).
• The bound can be extended to the case where there is a $\delta_k$ in the RHS of (*). Then
$$\bar J_k(x_k) \le \tilde J_k(x_k) + \delta_k + \cdots + \delta_{N-1}$$

COMPUTATIONAL ASPECTS
• Sometimes nonlinear programming can be used to calculate the 1SL or the multistep version [particularly when $U_k(x_k)$ is not a discrete set]. Connection with the methodology of stochastic programming.
• The choice of the approximating functions $\tilde J_k$ is critical, and is calculated with a variety of methods.
• Some approaches:
(a) Problem Approximation: Approximate the optimal cost-to-go with some cost derived from a related but simpler problem
(b) Heuristic Cost-to-Go Approximation: Approximate the optimal cost-to-go with a function of a suitable parametric form, whose parameters are tuned by some heuristic or systematic scheme (Neuro-Dynamic Programming)
(c) Rollout Approach: Approximate the optimal cost-to-go with the cost of some suboptimal policy, which is calculated either analytically or by simulation

PROBLEM APPROXIMATION
• Many (problem-dependent) possibilities
− Replace uncertain quantities by nominal values, or simplify the calculation of expected values by limited simulation
− Simplify difficult constraints or dynamics
• Example of enforced decomposition: Route m vehicles that move over a graph. Each node has a “value.” The first vehicle that passes through the node collects its value. Maximize the total collected value, subject to initial and final time constraints (plus time windows and other constraints).
• Usually the 1-vehicle version of the problem is much simpler. This motivates an approximation obtained by solving single vehicle problems.
• 1SL scheme: At time $k$ and state $x_k$ (position of vehicles and “collected value nodes”), consider all possible $k$th moves by the vehicles, and at the resulting states we approximate the optimal value-to-go with the value collected by optimizing the vehicle routes one-at-a-time

HEURISTIC COST-TO-GO APPROXIMATION
• Use a cost-to-go approximation from a parametric class $\tilde J(x, r)$ where $x$ is the current state and $r = (r_1, \ldots, r_m)$ is a vector of “tunable” scalars (weights).
• By adjusting the weights, one can change the “shape” of the approximation $\tilde J$ so that it is reasonably close to the true optimal cost-to-go function.
• Two key issues:
− The choice of parametric class $\tilde J(x, r)$ (the approximation architecture).
− Method for tuning the weights (“training” the architecture).
• Successful application strongly depends on how these issues are handled, and on insight about the problem.
• Sometimes a simulator is used, particularly when there is no mathematical model of the system.

APPROXIMATION ARCHITECTURES
• Divided into linear and nonlinear [i.e., linear or nonlinear dependence of $\tilde J(x, r)$ on $r$].
• Linear architectures are easier to train, but nonlinear ones (e.g., neural networks) are richer.
• Architectures based on feature extraction
• Ideally, the features will encode much of the nonlinearity that is inherent in the cost-to-go approximated, and the approximation may be quite accurate without a complicated architecture.
• Sometimes the state space is partitioned, and “local” features are introduced for each subset of the partition (they are 0 outside the subset).
• With a well-chosen feature vector $y(x)$, we can use a linear architecture
$$\tilde J(x, r) = \hat J\big(y(x), r\big) = \sum_i r_i\, y_i(x)$$

• Programs use a feature-based position evaluator that assigns a score to each move/position
• Most often the weighting of features is linear but multistep lookahead is involved.
• Most often the training is done by trial and error.
• Additional features:
− Depth-first search
− Variable depth search when dynamic positions are involved
− Alpha-beta pruning
• Multistep lookahead tree (figure not reproduced; node scores: +8 +20 +18 +16 +24 +20 +10 +12 -4 +8 +21 +11 -5 +10 +32 +27 +10 +9 +3)
• Alpha-beta pruning: As the move scores are evaluated by depth-first search, branches whose consideration (based on the calculations so far) cannot possibly change the optimal move are neglected