Green supply chain network design to reduce carbon emissionsSamir Elhedhli ⇑,Ryan MerrickDepartment of Management Sciences,University of Waterloo,200University Avenue,Waterloo,Ontario,Canada N2L 3G1a r t i c l e i n f o Keywords:Green supply chain design LogisticsTransportation CO 2emissions Carbon footprinta b s t r a c tWe consider a supply chain network design problem that takes CO 2emissions into account.Emission costs are considered alongside fixed and variable location and production costs.The relationship between CO 2emissions and vehicle weight is modeled using a concave function leading to a concave minimization problem.As the direct solution of the resulting model is not possible,Lagrangian relaxation is used to decompose the problem into a capacitated facility location problem with single sourcing and a concave knapsack problem that can be solved easily.A Lagrangian heuristic based on the solution of the subproblem is proposed.When evaluated on a number of problems with varying capacity and cost char-acteristics,the proposed algorithm achieves solutions within 1%of the optimal.The test results indicate that considering emission costs can change the optimal configuration of the supply chain,confirming that emission costs should be considered when designing supply chains in jurisdictions with carbon costs.Ó2012Elsevier Ltd.All rights reserved.1.IntroductionWith the globalization of supply chains,the distance between nodes in the distribution network has grown considerably.Longer travel distances lead to increased vehicle emissions on the transportation routes,resulting in an inflated carbon foot-print.Hence,there is a need to effectively and efficiently design eco-friendly supply chains,to both improve environmental conditions and the bottom line of the work design is a logical place to start when looking to green a supply chain design.Wu and Dunn (1995)cite transportation as the largest source of environmental hazards in the logistics system.This claim is supported by the fact that transportation via combustion engine vehicles accounted for 27%of the Canadian greenhouse gas (GHG)inventory in 2007(Environment Canada,2009).And while heavy duty diesel vehicles,such as diesel tractors commonly used in logistics,account for only 4.2%of vehicles on the road,they accounted for 29.2%of Canadian GHG emissions from transportation in 2007.Thus,reducing the number of vehicle kilometers travelled through the strategic placement of nodes could play a significant role in reducing the carbon footprint of the nation.Supply chain design models have traditionally focused on minimizing fixed and operating costs without taking carbon emissions into account.Recent studies,however,started to take emissions into account.This includes Cruz and Matsypura (2009),Nagurney et al.(2007),Benjaafar et al.(2010),Merrick and Bookbinder (2010),and Ando and Taniguchi (2006).This paper develops a green supply chain design model that incorporates the cost of carbon emissions into the objective function.The goal of the model is to simultaneously minimize logistics costs and the environmental cost of CO 2emissions by strategically locating warehouses within the distribution network.A three echelon,supply chain design model is proposed that uses published experimental data to derive nonlinear concave expressions relating vehicle weight to CO 2emissions.The resulting concave mixed integer programming model is tackled using Lagrangian relaxation to decompose it by echelon and by warehouse site.The nonlinearity in one of 
the subproblems is eliminated by exploiting its special structure.This 1361-9209/$-see front matter Ó2012Elsevier Ltd.All rights reserved./10.1016/j.trd.2012.02.002⇑Corresponding author.E-mail address:elhedhli@uwaterloo.ca (S.Elhedhli).S.Elhedhli,R.Merrick/Transportation Research Part D17(2012)370–379371decomposition results in subproblems that require less computational effort than the initial problem.By keeping most of the features of the original problem in the subproblems,a strong Lagrangian bound is achieved.A primal heuristic is proposed to generate a feasible solution in each iteration using information from the subproblems.The quality of the heuristic is mea-sured against the Lagrangian bound.Test results indicate that the proposed method is effective infinding good solutions.The remainder of the paper is organized as follows.In the next section we look at the emission data,followed by the prob-lem formulation in Section3.We then delve into the Lagrangian relaxation procedure and proposed heuristic in Sections4 and5,respectively.Finally,we test the algorithm and heuristic in Section6,and conclude in Section7.2.Emissions dataFew comprehensive data sets exist that show the relationship between vehicle weights and exhaust emissions.While the exact emission levels will depend on the engine type,terrain driven and the driver tendencies,the general relationship be-tween vehicle weight and emissions will not change(i.e.linear,concave or convex relationship).This section reviews the available emissions data and draws conclusions about the relationship between emissions and the vehicle operating weight.The most comprehensive data set of vehicular GHG emissions for is that contained in the Mobile6computer program (Environmental Protection Agency,2006).Mobile6contains an extensive database of carbon dioxide(CO2)emissions for hea-vy heavy-duty diesel vehicles obtained from full scale experiments.The database contains emissions factors for various vehi-cle weights,ranging from class2trucks up to class8b.Speed correction factors,outlined by the California Air Resources Board(Zhou,2006)for use with the Mobile6program,can also be applied to relate CO2emission levels with vehicle weight and speed of travel.Fig.2.1displays the relationship between vehicle weight and CO2emissions for various speeds of travel. 
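To make the use of such data concrete, the sketch below shows one way a concave emission-cost function of shipment size could be built from weight-versus-emissions curves indexed by travel speed. This is a minimal illustration, not the authors' implementation: the square-root curve shape, its coefficients, and the dollars-per-gram conversion are placeholders rather than the Mobile6/CARB values, while the 75 lb unit weight, 15,000 lb empty weight, and 100 kph mean speed reused as defaults are the values assumed later in Section 6.

```python
import math

# Illustrative concave emission-rate curves: e(w) in grams of CO2 per
# vehicle-kilometre travelled, as a function of loaded vehicle weight w (lb),
# one curve per travel speed (kph).  The square-root form and all coefficients
# are placeholders, not the Mobile6/CARB data.
EMISSION_CURVES = {
    60:  lambda w: 900.0 + 14.0 * math.sqrt(w),
    80:  lambda w: 1000.0 + 15.0 * math.sqrt(w),
    100: lambda w: 1100.0 + 16.0 * math.sqrt(w),
}

def emission_rate(weight_lb, mean_speed_kph):
    """Emission rate from the curve whose speed is closest to the route's mean speed."""
    speed = min(EMISSION_CURVES, key=lambda s: abs(s - mean_speed_kph))
    return EMISSION_CURVES[speed](weight_lb)

def emission_cost(units, distance_km, unit_weight_lb=75.0,
                  empty_weight_lb=15000.0, mean_speed_kph=100.0,
                  dollars_per_gram=0.0002):
    """Concave emission cost of shipping `units` over `distance_km` in one vehicle."""
    if units <= 0:
        return 0.0
    weight = empty_weight_lb + units * unit_weight_lb
    return dollars_per_gram * emission_rate(weight, mean_speed_kph) * distance_km
```

Because the per-kilometre rate grows sub-linearly with the load, the resulting cost is concave in the number of units shipped, which is the property the formulation below relies on.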
The units for CO2emissions are grams(g)per vehicle kilometer travelled(VKT)and the vehicle weight is in pounds(Note that‘‘vehicle weight’’represents the empty weight plus the cargo).The speed at which a vehicle travels at is a function of the travel route,whereas as the vehicle weight is dependent on the demand requirement at lower echelons.Consequently,the function used to compute the total pollution emissions can be chosen based on the mean travel speed over the route(e.g.an average highway speed of100kph).With multiple modes of transportation(trucks),the emission cost function is the lower envelope of the individual cost functions,which is concave even for linear individual emission cost functions.As a function of total shipment,modeling the emission costs will depend on the transportation strategy ing full truckloads and a single mode,the cost func-tion could very well be modeled usingfixed and linear cost functions.Whereas,in the general case where multiple modes and less-than-full truckloads are allowed,the long term cost is better modeled using a concave function.3.Problem formulationLet us define the indices i=1,...,m,j=1,...,n and k=1,...,p corresponding to plant locations,potential distribution centers(DCs)and customers,respectively.A distribution center at location j has a maximum capacity V j and afixed cost g j.Each customer has a demand of d k.The variable cost of handling and shipping a production unit from a plant at location i to distribution center j is c ij.Similarly,h jk denotes the average handling and shipping cost to move a production unit from distribution center j to customer k.We introduce one continuousflow variable and two binary location variables:x ij is theunits shipped from plant i to warehouse j ;y jk takes a value of one if customer k is assigned to distribution center j and zero otherwise;and z j takes a value of one if distribution center j is opened and zero otherwise.The capacity of the plants is as-sumed to be unlimited.The resulting MIP is:½FLM :min P m i ¼1P n j ¼1f ðx ij ÞþPn j ¼1P p k ¼1f ðd k y jk ÞþP m i ¼1P n j ¼1c ij x ij þP n j ¼1P p k ¼1h jk d k y jk þP n j ¼1g j z js :t :Pn j ¼1y jk ¼18kð1ÞP m i ¼1x ij ¼P p k ¼1d k y jk 8j ð2ÞPm i ¼1x ij 6V j z j 8j ð3ÞP p k ¼1d k y jk 6V j z j8jð4Þy jk ;z j 2f 0;1g ;x ij P 08i ;j ;kð5ÞThe first two terms of the objective function minimize the pollution cost to the environment,where f (x )is the emissions costfunction.The rest of the terms are the fixed cost of opening DCs and the handling and transportation cost to move goods between nodes.1Constraints (1)guarantee that each customer is assigned to exactly one distribution center.Constraints (2)balance the flow of goods into and out of the warehouse,thus linking the decisions between echelons in the network.Con-straints (3)and (4)force capacity restrictions on the distribution centers and ensure that only open facilities are utilized.Note that constraints (1)and (2)ensure that total customer demand is satisfigrangian relaxationGiven the difficulty in solving [FLM]directly,we use Lagrangian relaxation to exploit the echelon structure of the prob-lem.It is important to select the constraints for relaxation,as relaxing more constraints may deteriorate the quality of the bound and heuristics.We relax constraints (2)using Lagrangian multipliers,l j ,since they link the echelons of the supply chain.This leads to the following subproblem:½LR-FLM :minPm i ¼1P n j ¼1f ðx ij ÞþP n j ¼1P p k ¼1f ðd k y jk ÞþP m i ¼1P n j ¼1ðc ij Àl j Þx ij þP n j ¼1P p k ¼1ðh jk d k 
þd k l j Þy jk þPn j ¼1g j z js :t :ð1Þ;ð3Þ;ð4Þand ð5Þwhich is separable by echelon.Furthermore,it decomposes to two subproblems:½SP1 :minPn j ¼1P p k ¼1f ðd k y jk ÞþP n j ¼1P p k ¼1ðh jk d k þd k l j Þy jk þP n j ¼1g j z js :t :ð1Þand ð4Þy jk ;z j 2f 0;1g ;8j ;kwhich determines the assignment of customers to distribution centers.As y jk is binary and due to (1),the objective can bewritten as min P n j ¼1P p k ¼1ðf ðd k Þþh jk d k þd k l j Þy jk þP nj ¼1g j z j ,making [SP1]a capacitated facility location problem with single sourcing.The second subproblem:½SP2 :minPm i ¼1P n j ¼1f ðx ij ÞþP m i ¼1P n j ¼1ðc ij Àl j Þx ijs :t :ð3Þx ij P 08i ;jcan be decomposed by potential warehouse site,resulting in n subproblems [SP2j ].½SP2j :min Pm i ¼1f ðx ij ÞþPm i ¼1ðc ij Àl j Þx ijs :t :P m i ¼1x ij 6V j z jx ij P 08i1The model does not take congestion and or breakdowns into account.Congestion and the need to keep reserve capacity for emergencies affect theenvironmental costing of the supply chain.2The model assumes that total demand has to be satisfied.If only partial demand is to be satisfied then the model has to be posed as a profit maximization rather than a cost minimization model.372S.Elhedhli,R.Merrick /Transportation Research Part D 17(2012)370–379As z j =0is trivial,we focus on the case when z j =1,which makes [SP2j ]a concave knapsack problem.An important property of concave functions is that a global solution is achieved at some extreme point of the feasible domain (Pardalos and Rosen,1986).Therefore,[SP2j ]has an optimal solution at an extreme point of x ij P 0;P mi ¼1x ij 6V j ÈÉ.This implies that at optimality at most one x ij will take the value of V j and the remaining x ij will be equal to 0.This allows us to reformulate [SP2j ]as:½SP2j :minPm i ¼1ðf ðV j Þþc ij Àl j Þx ij s :t :Pm i ¼1x ij 6V jx ij P 08iwhich is now a linear knapsack problem.The advantage of the relaxation is that [SP1]retains several important characteristics of the initial problem,such as the assignment of all customers to a single warehouse and the condition that the demand of all customers is satisfied.In addi-tion,[SP2]reduces to n subproblems,which can be solved with little computational effort relative to the original problem.The drawback of this relaxation is that [SP1]is a capacitated facility location problem with single sourcing that is a bit dif-ficult to solve.However,[SP1]is still easier to solve than [FLM]and by retaining the critical characteristics of [FLM]in [SP1],the Lagrangian bound can be achieved in a relatively small number of iterations and reduce the overall solution time while still obtaining a high quality bound.Similarly,by exploiting the solution of [SP1]in a heuristic,high quality feasible solutions will be achieved.The Lagrangian relaxation starts by initializing the Lagrangian multipliers,and solving the subproblems.The solutions to the subproblems yield a lower bound:LB ¼½v ½SP 1 þPn j ¼1v ½SP 2jThe best Lagrangian lower bound,LB ⁄,is:LB ümax l½v ½SP 1 þP n j ¼1v ½SP 2jthat can be found by solving:max lminh 2I x Pn j ¼1P p k ¼1ðf ðd k Þþh jk d k þd k l j Þy h jk þPn j ¼1g j z h j þPn j ¼1minh j 2I yj Pm i ¼1ðf ðV j Þþc ij Àl j Þx hjij ()ð6Þwhere I x is the index set of feasible integer points of the set:y h jk ;z h j :P n j ¼1y h jk ¼1;P p k ¼1d k y h jk 6V j z h j ;y h jk 2f 0;1g ;z h j 2f 0;1g ;8j ;k()and I yj is the index set of extreme points of the set:x hj ij :P m i ¼1x h jij 6V j ;x h jij P 0;8i&'We can then reformulate (6)as the Lagrangian master problem:½LMP :max h 0þPn j 
¼1h js :t :h 0ÀPn j ¼1P p k ¼1d k y h jkl j 6P n j ¼1P p k ¼1ðf ðd k Þþh jk d k Þy h jk þP n j ¼1g j z h jh 2I xh j þPm i ¼1x h j ij l j6P m i ¼1f x h j ij þP m i ¼1c ij x hjij h j 2I yj ;8j[LMP]can be solved as a linear programming problem.I x &I x and I yj &I yj define a relaxation of [LMP].An initial set ofLagrangian multipliers,l ,is used to solve [SP1]and [SP2]and generate n +1cuts of the form:h 0ÀP n j ¼1Pp k ¼1d k ya jkl j 6Pn j ¼1P p k ¼1ðf ðd k Þþh jk d k Þy a jk þP n j ¼1g j za jh j þPm i ¼1x bj ij l j6P m i ¼1f x bjij þP mi ¼1c ij x b j ij 8jThe index sets x and yj are updated at each iteration as x [f ag and yj [f b j g ,respectively.S.Elhedhli,R.Merrick /Transportation Research Part D 17(2012)370–379373The solution to [LMP]produces an upper bound,UB ,to the full master problem and a new set of Lagrangian multipliers.The new set of Lagrangian multipliers is input to [SP1]and [SP2]to generate a new solution to the subproblems and an addi-tional set of cuts to the [LMP].The procedure of iterating through subproblems and master problem solutions is terminated when the best lower bound is equal to the upper bound,at which point the Lagrangian bound is achieved.Note that the heu-ristic procedure designed to generate a feasible solution is outlined and discussed in the next section.5.A primal heuristic for generating feasible solutionsWhile the Lagrangian algorithm provides the Lagrangian bound,it does not reveal the combination of product flows,cus-tomer assignments and open facilities that will produce this result.Hence,heuristics are commonly used in conjunction with Lagrangian relaxation algorithms to generate feasible solutions.To generate feasible solutions,we devise a primal heuristic based on the solution of the subproblems.Subproblem [SP1]generates the assignments of customers to distribution centers and determines if a distribution center is open or closed.Using y h jk and z hj from [SP1],the units demanded by the retailers at each distribution center can be determined.With the de-mand at each distribution center being deterministic,the original problem could be reduced to a simple continuous flow transportation problem,[TP],which will always have a feasible solution.½TP :minP n j ¼1P p k ¼1ðf ðd k Þþh jk d k Þy h jk þP m i ¼1P n j ¼1f ðx ij ÞþP m i ¼1P n j ¼1c ij x ij þPn j ¼1g j z h js :t :P m i ¼1x ij ¼P n k ¼1d k y h jk 8jx ij P 08i ;jThe first and fourth terms in [TP]are simply constants,thus leaving only two terms in the objective function.Again,be-cause [TP]is concave there is an extreme point that is optimal,implying that each warehouse will be single-sourced by oneplant and that the goods will be transported on a single truck,as opposed to being spread over multiple vehicles.Therefore,the optimal flow of units from a plant to warehouse will be equal to the quantity demanded by the warehouse or zero.We can then formulate [TP]as an assignment problem:½TP2 :min P m i ¼1f P p k ¼1d k y h jk þc ij P p k ¼1d k y hjkw ij þCs :t :P m i ¼1P p k ¼1d k y hjk w ij ¼P p k ¼1d k y h jk 8jw ij 2f 0;1g 8i ;jwhere w ij takes a value of one if warehouse j is supplied by plant i and zero otherwise.In numerical testing of the algorithm,the heuristic was activated at each iteration to find a feasible solution.6.Numerical testingThe solution algorithm is implemented in Matlab 7and uses Cplex 11to solve the subproblems,the heuristic and the master problems.The test problems were generated similar to the capacitated facility location instances suggested by Cor-nuejols et al.(1991).The 
procedure calls for problems to be generated randomly while keeping the parameters realistic.The coordinates of the plants,distribution centers and customers were generated uniformly over [10,200].From the coordinates,the Euclidean distance between each set of nodes is computed.The transportation and handling costs between nodes are then set using the following relationship:c ij ¼b 1½10Âd ij h jk ¼b 2½10Âd jkwhere b is a scaling parameter to exploit different scenarios in numerical testing,and i ,j ,k are nodes.The demand of each customer,d k ,is generated uniformly on U [10,50].The capacities of the distribution centers,V j ,are set to:V j ¼j ½U ½10;160where j is used to scale the ratio of warehouse capacity to demand.In essence,j dictates the rigidity or tightness of the problem and has a large impact on the time required to solve the problem.The capacities of the distribution centers were scaled so as to satisfy:j ¼P nj ¼1V jP pk ¼1d k!¼3;5;10The fixed costs of the DCs were designed to reflect economies of scale.The fixed cost to open a distribution center,g j ,is:374S.Elhedhli,R.Merrick /Transportation Research Part D 17(2012)370–379g j ¼a ½U ½0;90 þU ½100;110 ÂffiffiffiffiffiV j qAgain,a is a scaling parameters used to test different scenarios in the numerical analysis.Just as the problem formulation was extended to include emissions costs,the test problems must also be extended.In order to compute the emission costs,the distance travelled,vehicle weight and emission rate must be known.The distance travelled can easily be determined from the randomly generated coordinates of the sites.The vehicle weight is determined by the number of units loaded on the truck (x ij or d k y jk ).To compute the weight of the vehicle,an empty vehicle weight of 15,000lb was assumed and the weight of a single production unit was assumed to be 75lb.The payload was calculated as the number of units on the truck multiplied by the weight of a single unit,which resulted in loads between 0and 45,000lb.The sum of the empty tractor-trailer weight and the payload results in a loaded vehicle weight range of 15,000–60,000lb.It is assumed that single vehicle trips would be made between nodes,thus the vehicle weights are reasonable and the emissions curve for a single truck is used.However,the emissions curve could be substituted with a best fit concave line that would represent a number of vehicle trips,if so desired.Finally,the emission rate,e ,is determined using the US EPA lab data,shown in Fig.2.1(Environmental Protection Agency,2006).Using these parameters,the emission cost of the net-work,f ,is determined using the following equation:Table 6.1Comparison based on different capacity utilizations.Problem Heur.Time (%,%,%,%,s)i .j .kDCLR FCR_DC VCR ECR Iters.Quality SP1SP2Heur.MP Total Tight capacities (j =3)5.10.200.9170.4040.4890.10740.10291.7 5.3 1.7 1.3 2.55.10.400.8860.3710.5220.10740.09298.60.90.20.214.05.10.600.8580.3430.5370.12040.09998.80.80.20.222.08.15.250.9470.4860.4230.09140.05394.1 3.6 1.50.81898.15.500.9520.3800.5090.11140.02599.60.20.10.13588.15.750.9800.3390.5390.12250.00899.60.20.10.127610.20.500.9780.4540.4440.10240.02499.60.30.1$098610.20.750.9440.3980.4980.10540.06099.90.1$0$076510.20.1000.9720.4090.4850.10640.01099.90.1$0$0154310.20.1250.9810.3630.5190.11740.01399.90.0$0$0132810.20.1500.9800.3650.5270.10850.01999.80.1$0$02194Min 0.860.340.420.0940.0191.700.00$0$0Mean 0.950.390.500.11 4.180.0598.32 1.050.350.25Max0.980.490.540.1250.1099.90 5.30 1.70 1.30Moderate capacities (j 
=5)5.10.200.8490.4020.4890.10940.06385.09.6 3.1 2.4 1.005.10.400.8180.3310.5520.11740.07797.5 1.50.50.47.55.10.600.9170.2800.5950.12440.04197.9 1.20.50.317.08.15.250.9170.4280.4610.11150.03992.7 4.7 1.6 1.01478.15.500.8870.4020.4730.12540.03694.3 3.0 1.3 1.41808.15.750.9390.3520.5310.11640.03099.30.40.20.159210.20.500.9700.3710.5060.12340.00995.6 2.60.90.958310.20.750.9370.3440.5320.12440.02399.9$0$0$088310.20.1000.9350.3430.5310.12740.01499.60.30.10.1106110.20.1250.8970.3100.5630.12740.04199.80.1$0$0124210.20.1500.9160.2980.5780.12540.04198.80.80.30.21548Min 0.820.280.460.1140.0185.00$0$0$0Mean 0.910.350.530.12 4.090.0496.40 2.200.770.62Max 0.970.430.600.1350.0899.909.60 3.10 2.4Excess capacities (j =10)5.10.200.6230.4830.3990.11840.20499.90.10.0$096.15.10.400.6550.2590.5800.16140.03994.8 2.9 1.40.9 2.45.10.600.6400.2730.6110.11630.08395.5 2.90.90.7 1.48.15.250.9040.3560.5330.11140.04099.10.50.20.116.88.15.500.7720.3260.5230.15140.11892.7 5.0 1.3 1.0 1.88.15.750.8260.3150.5580.12760.04995.7 2.90.70.6 4.510.20.500.9020.4500.4340.11640.05099.80.1$0$010610.20.750.9970.3070.5640.12940.00196.8 1.80.70.7 6.610.20.1000.6460.3270.5380.13540.03499.70.20.1$059.210.20.1250.9270.2980.5770.12550.00899.90.1$0$019910.20.1500.6930.2970.5620.14140.03598.6 1.00.20.2173Min 0.620.260.400.1130.0092.700.10$00.0Mean 0.780.340.530.13 4.180.0697.50 1.590.500.38Max1.000.480.610.1660.2099.905.001.401.0S.Elhedhli,R.Merrick /Transportation Research Part D 17(2012)370–379375f ðx ij Þ¼X Â0:2Âe ðx ij ÞÂd ij f ðd k y jk Þ¼X Â0:2Âe ðd k y jk ÞÂd jkX is used as a scaling parameter to test various network scenarios.The constants on the right-hand sides of the above equa-tions are used for unit conversions and to associate a dollar value to the emission quantity.For all test cases,a travel speed of 100kph was used to compute emission levels,which was assumed to be representative of highway transportation.The solution algorithm underwent rigorous testing to measure its effectiveness.Several statistics are collected during the solution procedure.Foremost,the load ratio of the open distribution centers was calculated.The DC load ratio,DCLR,relates the total capacity of all open DCs to the total units demanded by the customers,and is computed as:DCLR ¼P nj ¼1ðV j Áz j ÞP pk ¼1ðd k ÞThe cost breakdown of the best feasible network resulting from the solution algorithm is also evaluated.Three primary cost groups were considered:the fixed costs to open the distribution centers (FCR_DC),the variable logistics costs (VCR)and the emissions costs (ECR).These statistics were computed as a percentage of the total system expense,denoted as Z ,using the following formulas:Table 6.2Comparison based on different dominant cost scenarios.Problem Heur.Time (%,%,%,%,s)i .j .kDCLR FCR_DC VCR ECR Iters.Quality SP1SP2Heur.MP Total Dominant fixed costs 5.10.200.9600.8240.1490.02840.01091.8 5.5 1.5 1.3 1.005.10.400.9830.7440.2130.04240.003$100$0$0$02955.10.600.9970.7160.2370.04740.00099.10.50.30.112.58.15.250.9530.8460.1230.03050.00583.59.2 3.5 3.8 1.28.15.500.9440.8110.1500.03950.006100.00.00.00.032638.15.750.9940.8080.1550.03750.00198.2 1.00.40.434.010.20.500.9990.8280.1420.03050.000$100$0$0$0241010.20.750.9720.7910.1780.03250.004$100$0$0$0147810.20.1000.9830.7680.1900.04250.00299.50.30.10.149.310.20.1250.9990.7450.2160.03950.00099.90.10.00.024810.20.1500.9910.7150.2350.05050.00299.90.1$0$0194Min 0.940.720.120.0340.0083.50$0$0$0Mean 0.980.780.180.04 4.730.0097.45 1.520.530.52Max1.000.850.240.0550.01100.009.20 3.50 3.8Dominant variable costs 
5.10.200.5840.1850.7700.04550.13073.314.08.2 4.50.435.10.400.6030.1270.8410.03240.05197.7 1.20.70.4 4.15.10.600.5150.1520.8180.03040.08996.5 1.7 1.10.6 2.98.15.250.7800.2160.7480.03650.05391.7 4.9 2.1 1.3 1.88.15.500.8760.1660.7990.03540.05093.6 3.6 1.8 1.0 2.08.15.750.4690.2020.7620.03540.07996.2 2.2 1.10.6 3.410.20.500.6990.2290.7330.03840.05794.7 3.1 1.40.8 3.010.20.750.7630.1960.7720.03250.02099.80.10.1$087.610.20.1000.6550.1480.8150.03740.04099.80.1$0$010710.20.1250.7500.1440.8210.03550.03499.80.1$0$012010.20.1500.7420.1360.8280.03640.02999.80.10.10.2213Min 0.470.130.730.0340.0273.300.10$0$0Mean 0.680.170.790.04 4.360.0694.81 2.83 1.510.85Max 0.880.230.840.0550.1399.8014.008.20 4.5Dominant emissions cost 5.10.200.7010.3530.3150.3325 1.01494.6 2.8 1.7 1.0 2.15.10.400.5570.3290.3360.33540.77787.5 6.3 4.0 2.10.775.10.600.6330.2300.3890.38140.50199.60.20.10.125.28.15.250.8750.3150.3140.37140.327$1000.008158.15.500.8560.2530.3840.36350.42497.6 1.40.60.4 6.28.15.750.6780.3140.3490.33740.58699.60.20.10.134.110.20.500.8140.3640.2800.35540.39397.3 1.60.70.4 5.410.20.750.7060.3060.3510.34240.56499.90.00.00.018510.20.1000.7090.2890.3620.34940.46999.50.30.10.131.310.20.1250.7880.3150.3540.33150.41199.90.10.00.020710.20.1500.6900.2870.3730.34040.63699.40.30.10.126.6Min 0.560.230.280.3340.3387.500.000.000.0Mean 0.730.310.350.35 4.270.5597.72 1.200.670.39Max 0.880.360.390.3851.011006.304.002.1376S.Elhedhli,R.Merrick /Transportation Research Part D 17(2012)370–379FCR DC ¼P nj ¼1ðg j Áz j ÞZVCR ¼P m i ¼1P nj ¼1c ij x ij þP n j ¼1P p k ¼1h jk d k y jkECR ¼P m i ¼1P n j ¼1f ðx ij ÞþP n j ¼1P pk ¼1f ðd k y jk ÞZThe quality of the heuristic is measured by comparing the cost of the feasible solution vs.the Lagrangian bound,LR,as follows:Heuristic Quality ¼100Âheuristic solution ÀLRData on the evaluation times required to solve each section of the Lagrangian algorithm were also collected,as the solu-tion times can be used to give insight as to the relative difficulty of the particular parison for different capacity utilizationsWe tested the solution algorithm using a variety of cases.The first test case considered is the base scenario,which serves as the baseline for comparison.The base case is constructed with b 1=b 2=1,a =100,X =1.The DC capacity ratio is varied from tight capacities (j =3),moderate capacities (j =5),to excess capacities (j =10).The results are shown in Table 6.1.The test statistics present several insights about the problem formulation and solution algorithm.The data shows that the rigidity of the problem (dictated by j )has a large impact on the DCLR,both in terms of the average and range of the ratio.Table 6.1shows that as the tightness of the problem is decreased (or as j is increased),the DCLR also decreases.Furthermore,the results show that the range of the DCLRs increases as j increases.Thus,it is evident that the tightness of the problem has an adverse effect on the load ratio of the distribution center.The cost breakdowns for the base scenario test cases are also presented in Table 6.1.In contrast to the DCLR,the distri-bution of costs is fairly stable across the varying DC capacity levels.Hence,the value of j has little impact on the cost dis-tribution of the network.Computational times are also shown in Table 6.1.Intuitively,the computation increases as more decisions variables are added to the problem.Additionally,the average solution time increases as the tightness of the problem increases.The data shows that the majority of the solution time is spent solving [SP1],accounting 
for roughly 96to 98%of the total time.The table shows that the primal heuristic produces very good feasible solutions that are less than.2%from the optimum.Contributing to the strength of the heuristic was the fact that the information we use to construct the heuristic solution is taken from [SP1].Furthermore,[SP1]retains many attributes of the original problem and is already a very strong formula-tion,which is evident by the large amount of time spent solving [SP1]parison for different dominant cost scenariosTo test the behavior of the algorithm and the characteristics of the optimal network design,we vary the cost structure to make one of the cost components dominant.Three cases are considered:dominant fixed costs,dominant variable costs,and dominant emissions costs.All three are compared for the moderate capacity case (j =5).The dominant fixed cost scenario enlarges the scaling parameter on the fixed costs to establish the distribution centers.This case is constructed with b 1=b 2=1,a =1000,X =1.The results are displayed in Table 6.2.S.Elhedhli,R.Merrick /Transportation Research Part D 17(2012)370–379377。
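As a companion to Tables 6.1 and 6.2, the following sketch shows how the reported statistics could be computed from a candidate design. The dictionary-based encoding of the decision variables and the callable emission-cost function are assumptions made for illustration, and the heuristic-quality figure is taken as the percentage gap of the feasible cost over the Lagrangian bound.

```python
def design_statistics(x, y, z, c, h, d, g, V, emis, lagrangian_bound):
    """Cost breakdown of a feasible network design (Section 6 statistics).

    x[(i, j)]     : units shipped from plant i to DC j
    y[(j, k)]     : 1 if customer k is assigned to DC j, else 0
    z[j]          : 1 if DC j is open, else 0
    c, h, d, g, V : unit shipping costs, handling costs, demands, fixed costs, capacities
    emis(q, a, b) : emission cost of moving q units from node a to node b
    """
    variable_cost = (sum(c[i, j] * q for (i, j), q in x.items())
                     + sum(h[j, k] * d[k] for (j, k), used in y.items() if used))
    emission_cost = (sum(emis(q, i, j) for (i, j), q in x.items())
                     + sum(emis(d[k], j, k) for (j, k), used in y.items() if used))
    fixed_cost = sum(g[j] for j, opened in z.items() if opened)
    total = fixed_cost + variable_cost + emission_cost
    return {
        "DCLR":   sum(V[j] for j, opened in z.items() if opened) / sum(d.values()),
        "FCR_DC": fixed_cost / total,
        "VCR":    variable_cost / total,
        "ECR":    emission_cost / total,
        # gap of the best feasible solution relative to the Lagrangian lower bound (%)
        "heuristic_quality": 100.0 * (total - lagrangian_bound) / lagrangian_bound,
    }
```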
AltAlt:Combining the Advantages of Graphplan andHeuristic State SearchRomeo Sanchez Nigenda,XuanLong Nguyen&Subbarao KambhampatiDepartment of Computer Science and EngineeringArizona State University,Tempe AZ85287-5406Email:rsanchez,xuanlong,rao@AbstractMost recent strides in scaling up planning have centered around two competing themes–disjunctive planners,exemplified by Graphplan,and heuristic state search planners,exemplified by HSP and HSP-R.In this paper,we describe a planner called AltAlt,which successfully combines the advantages ofthe two competing paradigms to develop a planner that is significantly more powerful than either of theapproaches.AltAlt uses Graphplan’s planning graph in a novel manner to derive very effective searchheuristics which are then used to drive a heuristic state search planner.AltAlt is implemented bysplicing together implementations of STAN,a state-of-the-art Graphplan implementation,and HSP-r,a heuristic search planner.We present empirical results in a variety of domains that show the significantscale-up power of our combined approach.We will also present a variety of possible optimizationsfor our approach,and discuss the rich connections between our work and the literature on state-spacesearch heuristics.1IntroductionThere has been a rapid progress in plan synthesis technology in the past few years,and many approaches have been developed for solving large scale deterministic planning problems.Two of the more promi-nent approaches are“disjunctive”planners,as exemplified by Graphplan[2]and its many successors including IPP[11]and STAN[14];and heuristic state search planners exemplified by UNPOP[15],HSP [4]and HSP-R[3].Graphplan-style systems set up bounded length encodings of planning problems, solve those encodings using some combinatorial workhorse(such as CSP,SAT or ILP solvers),and ex-tend the encoding length iteratively if no solution is found at the current encoding level.State search planners depend on a variety of heuristics to effectively control a search in the space of world states. These two approaches have generally been seen to be orthogonal and competing.Although both of them have produced quite powerful planning systems,they both do suffer from some important disadvantages. Graphplan-style planners typically need to exhaustively search for plans at every encoding length until a solution is found.This leads to prohibitively large space and time requirements in certain problems. In contrast,state search planners can,in the best case,find a solution with linear space and time.Un-fortunately,the existing heuristics for state search planners are unable to handle problems with complex subgoal interactions,making them fail on some domains that Graphplan-style sytems are able to handle comfortably.In this paper,we describe a new hybrid planning system called AltAlt1that cleverly leverages the L L hatcomplementary strengths of both the Graphplan-style planners and the heuristic state search planners. Specifically,AltAlt uses a Graphplan-style planner to generate a polynomial time planning data ing the theory we developed in recent work[9],we extract several highly effective state search heuristics from the planning graph.These heuristics are then used to control a heuristic search planner. 
AltAlt is implemented on top of two highly optimized existing planners–STAN[14]that is a very ef-fective Graphplan style planner is used to generate planning graphs,and HSP-r[3],a heuristic search planner provides an optimized state search engine.Empirical results show that AltAlt can be orders of magnitude faster than both STAN and HSP-r,validating the utility of hybrid approach.In the rest of this paper,we discuss the implementation and evaluation of the AltAlt planning sys-tem.Section2starts by providing the high level architecture of the AltAlt system.Section3briefly reviews the theory behind extraction of state search heuristics[9].Section4discusses a variety of op-timizations used in AltAlt implementation to drive down the cost of heuristic computation,as well as the state search.Section5presents extensive empirical evaluation of AltAlt system that demonstrate its domination over both STAN and HSP-r planners.This section also presents experiments to study the cost and effectiveness tradeoffs involved in the computation of AltAlt’s planning graph-based heuristics. Section6discusses some related work and Section7summarizes our contributions.2Architecture of AltAltAs mentioned earlier,AltAlt system is based on a combination of Graphplan and heuristic state space search technology.The high-level architecture of AltAlt is shown in Figure1.The problem specification and the action template description arefirst fed to a Graphplan-style planner,which constructs a planning graph for that problem in polynomial time.We use the publicly available STAN implementation[14] for this purpose as it provides a highly memory efficient implementation of planning graph(see below). This planning graph structure is then fed to a heuristic extractor module that is capable of extracting a variety of effective and admissible heuristics,based on the theory that we have developed in our recent work[9,10].This heuristic,along with the problem specification,and the set of ground actions in thefinal action level of the planning graph structure(see below for explanation)are fed to a regression state-search planner.The regression planner code is adapted from HSP-R[3].To explain the operation of AltAlt at a more detailed level,we need to provide some further back-ground on its various components.We shall start with the regression search module.This module starts with the goal state and regresses it over the set of relevant action instances from the domain.An action instance is considered relevant for a state if the effects of give at least one element of and do not delete any element of.The result of regressing over is then–which is essentially the set of goals that still need to be achieved before the application of,such that everything in would have been achieved once is applied.For each relevant action,a separate search branch is generated,with result of regressing over that action as the new state in that branch.Search terminates with success at a node if every literal in the state corresponding to that node is present in the initial state of the problem.Figure2pictorially depicts the initial andfinal state specification of a simple grid problem,and presents thefirst level of the regression search for that problem.In this problem,a robot that is in the cell (0,0)in the beginning,is required to pick a key that is in cell(0,1)and place it in the cell(2,2)and get back to its original cell(0,0).The actions in this domain include picking and dropping a key,and moving from one cell to an adjacent cell.If we use the 
predicate to represent the location of the key, and to represent the location of the robot,the initial state of the problem iswhile the goal state is.Figure2(b)shows the search branches generated by theFigure 1:Architecture of AltAlt(a)Grid Problem Key(2,2), At(0,0)Goal state (6)Drop_key(2,2)Move(0,1,0,0) Move(1,0,0,0) At(2,2), At(0,0),Have_keyKey(2,2), At(1,0) Key(2,2), At(0,1) (6) (7)(7) (b)Regression SearchFigure 2:A simple grid problem and the first level of regression search on it.regression search.Notice that there is no branch corresponding to a pickup action instance since none of them are relevant for achieving any of the top-level goals.The crux of controlling a regression search involves providing a heuristic function that can estimate the relative goodness of the states on the fringe of the current search tree and guide the search in most promising directions.Such heuristics can be quite tricky to develop.Consider,for example,the fringe states in the search tree of Figure 2(b).Given the robot moves,it is clear that the left most state can never be reached from the initial state–as it requires the robot to be in two positions at the same time.Unfortunately,as we shall show below,naive heuristic functions may actually consider this to be a more promising state than the other two.In fact,as we discuss in [9,10],HSP-R,a state-of the art heuristic search regression planner is unable to solve this relatively simple problem!The issue turns out to be that each one of the three subgoals in the left most state are easier to achieve in isolation than the subgoals of the two other states.Thus any heuristic that considers the cost of achieving subgoals in isolation winds up ranking the left most state as the more promising one.However,once we consider the interactions among the subgoals,the ranking can change quite drastically (as it does in this problem).Taking interactions into account in a principled way turns out to present several technical challenges.Fortunately,our recent work[9]provides an interesting way of leveraging the Graphplan technology to generate very effective heuristics.In the next section,we provide a brief review of this work,and explain how it is used in AltAlt.3Extraction of Heuristics from Graphplan’s Planning Graph3.1Structure of the Planning GraphGraphplan algorithm[2]involves two interleaved stages–expansion of the“planning graph”data struc-ture,and a backward search on the planning graph to see if any subgraph of it corresponds to a valid solution for the given problem.The expansion of the planning graph is a polynomial time operation while the backward search process is an exponential time operation.Since AltAltfinds solutions using regression search,our only interest in Graphplan is in its planning graph data structure.Figure3shows part of the planning graph constructed for the3x3grid problem shown in Figure2. As illustrated here,a planning graph is an ordered graph consisting of two alternating structures,called “proposition lists”and“action lists”.We start with the initial state as the zeroth level proposition list. 
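Referring back to the regression search of Section 2, its node-expansion step can be sketched as follows. STRIPS-style ground actions with precondition, add, and delete sets are assumed, and states are sets of ground literals; the names are illustrative, not AltAlt's internal data structures.

```python
from collections import namedtuple

# STRIPS-style ground action; all three literal fields are Python sets.
Action = namedtuple("Action", ["name", "preconds", "adds", "deletes"])

def is_relevant(state, action):
    """Relevant for a regression state: gives at least one goal literal, deletes none."""
    return bool(action.adds & state) and not (action.deletes & state)

def regress(state, action):
    """Goals that must hold before `action` so that all of `state` holds after it."""
    return (state - action.adds) | action.preconds

def expand(state, actions):
    """Children of a regression node, one branch per relevant action."""
    return [(a, regress(state, a)) for a in actions if is_relevant(state, a)]
```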
Given a level planning graph,the extension of the structure to level involves introducing all actions whose preconditions are present in the level proposition list.In addition to the actions given in the domain model,we consider a set of dummy“noop”actions,one for each condition in the level proposition list(the condition becomes both the single precondition and effect of the noop).Once the actions are introduced,the proposition list at level is constructed as just the union of the effects of all the introduced actions.Planning-graph maintains the dependency links between the actions at level and their preconditions in level proposition list and their effects in level proposition list.The critical asset of the planning graph,for our purposes,is the efficient marking and propagation of mutex constraints during the expansion phase.The propagation starts at level1,with the actions that are statically interfering with each other(i.e.,their preconditions and effects are inconsistent)labeled mutex.Mutexes are then propagated from this level forward by using two simple propagation rules:Two propositions at level are marked mutex if all actions at level that support one proposition are pair-wise mutex with all actions that support the second proposition.Two actions at level are mutex if they are statically interfering or if one of the propositions(preconditions)supporting thefirst action is mutually exclusive with one of the propositions supporting the second action.Figure3shows a part of the planning graph for the robot problem specified in Figure2.The curved lines with x-marks denote the mutex relations.3.2Heuristics based on the planning graphTo guide a regression search in the state space,a heuristic function needs to evaluate the cost of some set of subgoals,comprising a regression state,from the initial state–in terms of number of actions needed to achieve them from the initial state.We now discuss how such a heuristic can be computed from the planning graph.Normally,the planning graph data structure supports“parallel”plans–i.e.,plans where at each step more than one action may be executed simultaneously.Since we want the planning graph to provide heuristics to the regression search module,which generates sequential solutions,wefirst make a modifi-cation to the algorithm so that it generates“serial planning graph.”A serial planning graph is a planning graph in which,in addition to the normal mutex relations,every pair of non-noop actions at the same level are marked mutex.These additional action mutexes propagate to give additional propositional mu-texes.Finally,a planning graph is said to level off when there is no change in the action,proposition and mutex lists between two consecutive levels.We will assume for now that given a problem,the Graphplan module of AltAlt is used to generate and expand a serial planning graph until it levels off.(As we shall see later,we can relax the requirement of growing the planning graph to level-off,if we can tolerate a graded loss of informedness of heuristics derived from the planning graph.)We will start with the notion of level of a set of propositions:Definition1(Level)Given a set of propositions,denote as the index of thefirst level in the leveled serial planning graph in which all propositions in appear and are non-mutexed with one an-other.(If is a singleton,then is just the index of thefirst level where the singleton element occurs.)If no such level exists,then if the planning graph has been grown to level-off,and ,where is the index of the last 
level that the planning graph has been grown to(i.e not until level-off).The intuition behind this definition is that the level of a literal in the planning graph provides a lower bound on the number of actions required to achieve from the initial ing this insight,a simple way of estimating the cost of a set of subgoals will be to sum their levels.Heuristic1(Sum heuristic)The sum heuristic is very similar to the greedy regression heuristic used in UNPOP[15]and the heuristic used in the HSP planner[4].Its main limitation is that the heuristic makes the implicit assump-tion that all the subgoals(elements of)are independent.For example,the heuristic winds up ranking the left most state of the search tree in Figure2(b)as the most promising among the three fringe states.Sum heuristic is neither admissible nor particularly informed.Specifically,since subgoals can be interacting negatively(in that achieving one winds up undoing progress made on achieving the others), the true cost of achieving a pair of subgoals may be more than the sum of the costs of achieving themindividually.This makes the heuristic inadmissible.Similarly,since subgoals can be positively interact-ing in that achieving one winds up making indirect progress towards the achievement of the other,the true cost of achieving a set of subgoals may be lower than the sum of their individual costs.To develop more effective heuristics,we need to consider both positive and negative interactions among subgoals in a limited fashion.In[9,10],we discuss a variety of ways of capturing the negative and positive interactions into the heuristic estimate using the planning graph structure,and discuss their relative tradeoffs.One of the best heuristics according to that analysis was a heuristic called.We adopted this heuristic as the default heuristic in AltAlt.In the following,we briefly describe this heuristic.The basic idea of is to adjust the sum heuristic to take positive and negative interactions into account.This heuristic approximates the cost of achieving the subgoals in some set as the sum of the cost of achieving,while considering positive interactions and ignoring negative interactions,plus the penalty for ignoring the negative interactions.Thefirst component can be computed as the length of a“relaxed plan”for supporting,which is extracted by ignoring all the mutex relations.To approximate the penalty induced by the negative interactions alone,we proceed with the following argument.Consider any pair of subgoals.If there are no negative interactions between and,then,the level at which and are present together is exactly the maximum of and.The degree of negative interaction between and can thus be quantified by:We now want to use the-values to characterize the amount of negative interactions present among the subgoals of a given set.If all subgoals in are pair-wise independent,clearly,all values will be zero,otherwise each pair of subgoals in will have a different value.The largest such value among any pair of subgoals in is used as a measure of the negative interactions present in in the heuristic .In summary,we haveHeuristic2(Adjusted heuristic2M)The analysis in[9,10]shows that this is one of the more robust heuristics in terms of both solution time and quality.This is thus the default heuristic used in AltAlt.4Implementational issues of extracting heuristics from planning graphWhile we described the main components and design issues underlying AltAlt system,there are several optimization issues that still deserve attention.Two of them 
are discussed in this section.4.1Controlling the Cost of Computing the HeuristicThefirst issue is the cost of computing the heuristic using planning graphs.Although,as we mentioned earlier,planning graph construction is a polynomial time operation,it does lead to relatively high time and space consumption in many problems.The main issues are the sheer size of the planning graph,and the cost of marking and managing mutex relations.Fortunately,however,there are several possible ways of keeping the heuristic computation cost in check.To begin with,one main reason for basing AltAlt onSTAN rather than other Graphplan implementations is that STAN provides a particularly compact and efficient planning graph construction.In particular,as described in[14],STAN exploits the redundancy in the planning graph and represents it using a very compact bi-level representation of the planning graph. Secondly,STAN uses efficient data structures to mark and manage the“mutex”relations.While the use of STAN system reduces planning graph construction costs significantly,heuristic computation cost can still be a large fraction of the total run time.For example,in one of the benchmark problems,bw-large-d,the heuristic computation takes3.5m.sec.while the search takes3m.sec. Worse yet,in some domains such as the Scheduling World from the AIPS2000competition suite[1], the graph construction phase winds up overwhelming the memory of the system.Thankfully,however,by trading off heuristic quality for reduced cost,we can aggressively limit the heuristic computation costs.Specifically,in the previous section,we discussed the extraction of heuristics from a full leveled planning graph.Since AltAlt does not do any search on the planning graph directly,there is no strict need to use the full leveled graph to preserve rmally, any subgraph of the full leveled planning graph can be gainfully utilized as the basis for the heuristic computation.There are at least three ways of computing a smaller subset of the leveled planning graph:1.Grow the planning graph to some length that is less than the length where it levels off.For example,we may grow the graph until the top level goals of the problem are present without any mutex relations in thefinal proposition level of the planning graph.2.Spend only limited time on marking mutexes on the planning graph.3.Introduce only a subset of the“applicable”actions at each level of the planning graph.For exam-ple,we can exploit the techniques such as RIFO[16]and identify a subset of the action instances in the domain that are likely to be“relevant”for solving the problem.Any combination of the above three techniques can be used to limit the space and time resources expended on computing the planning graph.What is more,it can be shown that the admissibility and completeness characteristics of the heuristic will remain unaffected as long as we do not use the third approach(recall that the definition of level in section3.2avoids assigning as the cost of a set if the underlying planning graph is not grown to level off).Only the informedness of the heuristic is affected. 
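For concreteness, the level computation of Section 3.2 and the adjusted heuristic built on it can be sketched as below; the same code works whether the planning graph was grown to level-off or stopped early as just discussed. The graph encoding (one proposition set and one set of mutex pairs per level) and the externally supplied relaxed-plan length estimate are simplifying assumptions for illustration only.

```python
INF = float("inf")

def level(props, graph_levels, mutexes, grown_to_level_off):
    """First level at which all of `props` appear and are pairwise non-mutex.

    graph_levels[l] : set of propositions present at level l
    mutexes[l]      : set of frozenset pairs marked mutex at level l
    """
    props = set(props)
    for l, layer in enumerate(graph_levels):
        if props <= layer and not any(
                frozenset((p, q)) in mutexes[l]
                for p in props for q in props if p != q):
            return l
    # not found: infinite cost if the graph leveled off, else the last level built
    return INF if grown_to_level_off else len(graph_levels) - 1

def h_adjusted_2m(state, graph_levels, mutexes, leveled, relaxed_plan_length):
    """Adjusted-sum heuristic: relaxed-plan length (positive interactions)
    plus the largest pairwise negative-interaction penalty."""
    lev = lambda ps: level(ps, graph_levels, mutexes, leveled)
    penalty = max((lev({p, q}) - max(lev({p}), lev({q}))
                   for p in state for q in state if p != q), default=0)
    return relaxed_plan_length(state) + penalty
```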
We shall see in the next section that in many problems the loss of informedness is more than offset by the improved time and space costs of the heuristic.4.2Limiting the Branching Factor of Regression Search Using Planning Graphs Although the preceding discussion focused on the use of the planning graphs for computing the heuris-tic in AltAlt,from Figure1,we see that the planning graph is also used to pick the action instances considered in expanding the regression search tree.The advantages of using the action instances from the planning graph are that in many domains there are a prohibitively large number of ground action instances,only a very small subset of which are actually applicable in any state reachable from the initial ing all such actions in regression search can significantly increase the cost of node expansion (and may,on occasion,lead the search down the wrong paths).In contrast,the action instances present in the planning graph are more likely to be applicable in states reachable from the initial state.The simplest way of picking action instances from the planning graph is to consider all action in-stances that are present in thefinal level of the planning graph.If the graph has been grown to level off,it can be proved that limiting regression search to this subset of actions is guaranteed to preserve complete-ness.A more aggressive selective expansion approach,that we call sel-exp,involves the following.HSP-r HSP2.0Problem Time Length Length Time gripper-15-45570.31 gripper-20-57730.84 gripper-25-6783 1.57 gripper-30-7793 2.83 tower-30.04770.04 tower-50.2131310.16 tower-7 2.63-127 1.37 tower-9108.85-51148.45 8-puzzle137.4045590.69 8-puzzle235.9252480.74 8-puzzle30.6324340.19 8-puzzle4 4.8826420.41 aips-grid1 1.07-140.88 aips-grid2--2695.98 mystery20.2089 3.53 mystery30.13440.26 mystery6 4.99-1662.25 mystery90.12880.49 mprime20.56799 5.79 mprime3 1.0244 1.67 mprime40.83810 1.29 mprime70.418-- 1.32 mprime16 5.56-6 4.74 mprime27 1.90-7 2.67Schedule Domain (AIPS-00)ProblemsT i m e (S e c o n d s )(a)Scheduling WorldFigure 4:Results in Jobshop Scheduling worldOur experiments were all done on a Linux system running on a 500mega hertz pentium III CPU with 256megabytes of RAM.We compared AltAlt with the latest versions of both STAN and HSP-r system running on the same hardware.HSP2.0is a recent variant of the HSP-r system that opportunistically shifts between regression search (HSP-r)and progression search (HSP).We also compare AltAlt to HSP2.0.The problems used in our experiments come from a variety of domains,and were derived primarily from the AIPS-2000competition suites [1],but also contain some other benchmark problems known in the literature.Unless noted otherwise,in all the experiments,AltAlt was run with the heuristic,and with a planning graph grown only until the first level where top level goals are presentwithout being mutex (see discussion in Section 4.1).Only the action instances present in the final level of the planning graph are used to expand nodes in the regression search (see Section 4.2).Table 1shows some statistics gathered from head-on comparisons between AltAlt ,STAN,HSP-r and HSP2.0across a variety of domains.For each system,the table gives the time taken to produce the solution,and the length (measured in the number of actions)of the solution produced.Dashes show problem instances that could not be solved by the corresponding system under a time limit of 10minutes.We note that AltAlt demonstrates robust performance across all the domains.It decisively 
outperforms STAN and HSP-r in most of the problems,easily solving both those problems that are hard for STAN as well as those that are hard for HSP-r.We also note that the quality of he solutions produced by AltAlt is as good or better than those produced by the other two systems in most problems.The table also shows a comparison with HSP2.0.While HSP2.0predictably outperforms HSP-r,it is still dominated by AltAlt ,especially in terms of solution quality.The plots in Figure 5compare the time performance of STAN,AltAlt and HSP2.0in specific do-mains.Plot a summarizes the problems from blocks world and the plot b refers to the problems from logistics domain,and the plot in Figure 4is from the scheduling world,three of the standard benchmark domains that have ben used in the recent planning competition [1].We see that in all domains,AltAlt clearly dominates STAN.It dominates HSP2.0in logistics and is very competitive with it in blocks world.Scheduling world was a very hard domain for most planners in the recent planning competition [1].We see that AltAlt scales much better than both STAN and HSP2.0.(Recall,once again,that HSP2.0uses aBlocks-world domain (AIPS-00)Problems T i m e (s e c o n d s )(a)Blocks World(b)LogisticsFigure 5:Results in Blocks World and Logisticscombination of progression and regression parison with HSP-r system would be even more decisively in favor of AltAlt .)Although not shown in the plots,the length of the solutions found by AltAlt in all these domains was as good or better than the other two systems.Cost/Quality tradeoffs in the heuristic computation:We mentioned earlier that in all these experi-ments we used a partial (non-leveled)planning graph that was grown only until all the goals are present and are non-mutex in the final level.As the discussion in Section 4.1showed,deriving heuristics from such partial planning graphs trades cost of the heuristic computation with quality.To get an an idea of how much of a hit on solution quality we are taking,we ran experiments comparing the same heuristicderived once from full leveled planning graph,and once from the partial planning graphstopped at the level where goals first become non-mutexed.The plots in Figure 6show the results of experiments with a large set of problems from the schedul-ing domain.Plot a shows the total time taken for heuristic computation and search together,and Plot b compares the length of the solution found for both strategies.We can see very clearly that if we insist on full leveled planning graph,we are unable to solve problems beyond 81,while the heuristic derived from the partial planning graph scales all the way to 161problems.The time taken by the partial plan-ning graph strategy is significantly lower,as expected.Plot b shows that even on the problems that are solved by both strategies,we do not incur any appreciable loss of solution quality because of the use of partial planning graph.The few points below the diagonal correspond to the problem instances on which the plans generated with the heuristic derived from the partial planning graph were longer than those generated with heuristic derived from the full leveled planning graph.This validates our contention in Section 4.1that the heuristic computation cost can be kept within limits without an appreciable loss in efficiency of search or the quality of the solution.It should be mentioned here that the planning graph computation cost depends a lot upon domains.In domains such as Towers of hanoi,where there are very few irrelevant 
actions,the full and partial planning graph strategies are almost indistinguishable in terms of cost.In contrast,domains such as grid world and scheduling world incur significantly higher planning graph construction costs,and thus benefit more readily by the use of partial planning graphs.Heuristic extracted from partial graph vs. leveled graphProblems T i m e (S e c o n d s )(a)RunningTime Comparing Solution QualitiesLev(S)L e v e l s o f f(b)Solution QualityFigure 6:Results on trading heuristic quality for cost by extracting heuristics from partial planning graphs.6Related workAs we had already discussed in the paper,by its very nature,AltAlt has obvious rich connections to the existing work on Graphplan [2,14,11]and heuristic state search planners [4,3,15,17].The idea of using the planning graph to select action instances to focus the regression search is similar to techniques such as RIFO [16],that use relevance analysis to focus progression search.As discussed in [9,10],there are several rich connections between our strategies for deriving the heuristics from the planning graphs,and recent advances in heuristic search,such as pattern databases [5],and capturing subproblem interactions[12,13].Finally,given that the informedness of our heuristics is closely related to the subgoal interaction analysis,pre-processing and consistency enforcement techniques,such as those described in [7,18,6]can be used to further improve the informedness of the heuristics.7Concluding RemarksWe described the implementation and evaluation of a novel plan synthesis system,called AltAlt .Al-tAlt is designed to exploit the complementary strengths of two of the currently popular competing ap-proaches for plan generation–Graphplan,and heuristic state search.It uses the planning graph to derive effective heuristics that are then used to guide heuristic state search.The heuristics derived from the planning graph do a better job of taking the subgoal interactions into account and as such are signif-icantly more effective than existing heuristics.AltAlt was implemented on top of two state of the art planning systems–STAN3.0a Graphlan-style planner,and HSP-r,a heuristic search planner.Our exten-sive empirical evaluation shows that AltAlt convincingly outperforms both STAN3.0and HSP-r.In fact,AltAlt ’s performance is very competitive with the planning systems that took part in the recent AI Plan-ning Competition [1].We also demonstrate that it is possible to aggressively reduce the cost of heuristic。
Heuristics for Thelen's Prime Implicant Method
SCHEDAE INFORMATICAE
(The research is partially supported by the Polish State Committee for Scientific Research (KBN), grant No. 4T11C00624.)
On the other hand, new variants of minimization methods requiring all the prime implicants are still being developed [8,10,11]. And there are many other applications of a method for prime implicant generation: for example, calculation of the complement of a Boolean function (in DNF), or transformation of a Boolean equation from CNF to DNF, and vice versa, since by De Morgan's laws a transformation from DNF to CNF can be performed via a transformation from CNF to DNF. One more application is detecting deadlocks and traps in a Petri net, which can be performed by solving logical equations [13,14]. Generally, the solutions of a logical equation can easily be obtained from the prime implicants of its left-hand side, if the right-hand side is 1. There are also tasks which can be solved by calculating the shortest prime implicant, or prime implicants satisfying certain conditions. In [5] several such logical design tasks are discussed. Covering problems, both unate and binate covering, can easily be represented as logical expressions in CNF and are usually solved by one of two approaches: BDD-based [1] or branch and bound, for which the shortest prime implicant would correspond to an optimal solution [2]. The same is true for some graph problems, such as decyclisation of graphs [4]. The task of detecting deadlocks in FSM networks can be reduced to the task of generating a subset of the prime implicants. The approach discussed in this paper can be applied (directly or with some modifications) to the whole range of the mentioned problems.
For the generation of prime implicants several algorithms are known. The method of Nelson [9], probably historically the first such method for CNF, is based on straightforward multiplication of the disjunctions and deletion of the products that subsume other products. Such a transformation is very time- and memory-consuming. More efficient methods are known: an algorithm based on a search tree, proposed by B. Thelen [12], and a recursive method described in [8]. A comparison of those two methods is beyond the scope of our paper; the paper is dedicated to heuristics allowing Thelen's method to be accelerated. The execution time of this algorithm depends remarkably on the order of clauses and literals in the expression. Hence we may suppose that some reordering of the expression will increase the efficiency of the algorithm. Since the search tree in Thelen's method is reduced by means of certain rules (described below), it is difficult to evaluate a priori the effects of different variants of reordering. So it is reasonable to use a heuristic approach and to verify the heuristics statistically. Some such heuristics are described in [5,6]. This article describes some new heuristics, their analysis and a comparison with the known heuristics. The experiments are performed using randomly generated samples; the optimal combination of the heuristics is formulated on the basis of the experimental results.
2. Thelen's algorithm
Thelen's prime implicant algorithm is based on the method of Nelson [9], who has shown that all the prime implicants of a Boolean function in a conjunctive form can be obtained by its transformation into a disjunctive form. Nelson's transformation is performed by straightforward multiplication of the disjunctions and deletion of the products that subsume other products.
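As an illustration of the Nelson-style transformation just described, here is a minimal sketch (mine, not from the paper): a clause is represented as a list of signed literals such as 'a' or '~a', the CNF is multiplied out clause by clause, contradictory products are dropped, and products that subsume (contain) other products are deleted, leaving the prime implicants. The data representation is an assumption made purely for the example.

```python
def neg(lit):
    return lit[1:] if lit.startswith('~') else '~' + lit

def nelson_prime_implicants(cnf):
    """Multiply out a CNF (list of clauses, each a list of literals) and delete
    subsumed products; the surviving minimal products are the prime implicants."""
    products = {frozenset()}
    for clause in cnf:
        expanded = set()
        for term in products:
            for lit in clause:
                if neg(lit) in term:     # x and ~x in one product: drop that product
                    continue
                expanded.add(term | {lit})
        products = expanded
    # a product is deleted if it subsumes (is a proper superset of) another product
    return {t for t in products if not any(other < t for other in products)}

# (a or b)(~a or c) yields the prime implicants {a,c}, {~a,b} and {b,c}
print(nelson_prime_implicants([['a', 'b'], ['~a', 'c']]))
```

As the text notes next, the number of intermediate products that must be kept grows exponentially, which is exactly the cost that Thelen's tree-based method avoids.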
Such a transformation is very time- and memory-consuming, because all the intermediate products have to be kept in memory, and their number grows exponentially. Thelen's algorithm transforms CNF into DNF in a much more efficient way. It requires linear memory for the transformation and additional memory for the calculated prime implicants. The subsuming products are not kept in memory. A search tree is built such that every level of it corresponds to a clause of the CNF, and the outgoing arcs of a node correspond to the literals of the disjunction. The conjunction of all the literals corresponding to the arcs on the path from the root of the tree to a node is associated with that node. The leaf nodes of the tree are the elementary conjunctions that are either the prime implicants of the expression or implicants subsuming prime implicants calculated before. A sample tree is shown in Fig. 1. The tree is searched in DFS order, and several pruning rules are used to minimize it. The rules are listed below.
R1 An arc is pruned if its predecessor node-conjunction contains the complement of the arc-literal.
R2 An arc is pruned if another non-expanded arc on a higher level still exists which has the same arc-literal.
R3 A disjunction is discarded if it contains a literal which also appears in the predecessor node-conjunction.
The rules above are based on the following laws of Boolean algebra:
a ∧ a = a (1)
a ∨ a ∧ b = a (2)
a ∧ ¬a = 0 (3)
a ∨ 1 = 1 (4)
a ∧ 1 = a (5)
Rules R1 and R3 follow immediately from (2) and (3). Rule R2 provides that the implicants associated with the leaf nodes, if they are not prime, subsume the implicants calculated before. That means that the first calculated implicant is always prime. An arc at level k with arc-literal x, such that there is a non-expanded arc with the same arc-literal at a level l higher than k, is pruned by this rule. An implicant obtained by expanding the mentioned arc would be at least one literal shorter than the implicant that would be obtained without applying rule R2. Since the path passes twice through literal x (at levels k and l), according to (1) and (2) the longer of those two implicants subsumes the shorter one. Hence the first calculated implicant cannot subsume the implicants calculated later, but it can be subsumed by them. So applying rule R2 allows one to check whether an implicant is prime immediately after its calculation: it is enough to compare it with all the implicants calculated before. Due to this property the algorithm is less memory-consuming, because only prime implicants are kept.
Fig. 1. An example of the tree for the Boolean formula: (a ∨ …
R4 An arc j is pruned if another already expanded arc k with the same arc-literal exists on a higher level v, and if rule R2 was not applied in the subtree of arc k with respect to the arc p on level v which leads to arc j.
But using this rule complicates the algorithm remarkably, because additional information on the application of rule R2 has to be kept. The additional reduction lowers the probability of non-prime implicants appearing at the leaf nodes.
But there is no guarantee that such implicants will not appear, and it is still necessary to perform the check, the same as in the case of a tree built using only the 3 pruning rules. The next expression is an example for which non-prime implicants still appear even if all 4 rules are used: (x ∨ y) y (y ∨ z) z (x ∨ z) (Fig. 3).
3. Heuristics for Thelen's method
One of the possibilities for reducing the search tree is sorting the disjunctions by their size in ascending order.
Heuristic 1 (Sort by Length [5]). Choose the disjunction D_j with the smallest number of literals.
The effect of this heuristic can be illustrated with a complete search tree (without arc pruning). Its size (number of nodes) can be calculated according to the formula:
|V| = 1 + Σ_{i=1}^{n} Π_{j=1}^{i} L_j    (6)
where L_j is the number of literals in clause number j. Let a formula consist of 5 clauses, each having a different number of literals, from 2 to 6. If they are sorted from maximal to minimal length, the complete search tree will contain 1237 nodes; if sorted from minimal to maximal, the tree will contain only 873 nodes. In the second case it is 30% smaller. So the sorting of clauses influences the tree size remarkably. Of course, for the reduced search trees the relation may differ.
Now let us turn to the pruning rules. Note that every rule can be applied only if the disjunction under consideration contains the same variables as the disjunctions corresponding to the predecessor nodes. That means that if the next disjunction involves variables which appear in the previous disjunctions, there are possibilities of reduction at that level, and there is no possibility of reduction for the new variables. So we may suppose that sorting the clauses according to the variables may also lead to tree reduction. Here a similar effect is exploited as in the case of sorting by length: disjunctions containing many repeating variables allow the tree to be reduced remarkably, and if such a reduction can be performed not far from the root, the tree will grow more slowly. So the following heuristics reorder the disjunctions in such a way that a minimal number of new variables appears at every next level of the search tree.
Heuristic 2a (Sort by Literals). Choose the disjunction D_j with the smallest number of literals that do not appear in the disjunctions chosen before.
Heuristic 2b (Sort by Variables). Choose the disjunction D_j with the smallest number of variables that do not appear in the disjunctions chosen before.
The only difference between these two variants is that heuristic 2a compares clauses according to literals and heuristic 2b according to variables.
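To make the algorithm and the sorting heuristic concrete, the sketch below (mine, with an assumed data representation: literals as strings 'a'/'~a', node-conjunctions as frozensets) implements the DFS tree with pruning rules R1-R3 and the prime-implicant check that R2 makes possible, with the clauses optionally pre-sorted by length as in Heuristic 1; rule R4 and the literal-ordering heuristics are omitted.

```python
def neg(lit):
    return lit[1:] if lit.startswith('~') else '~' + lit

def thelen_prime_implicants(cnf, sort_by_length=True):
    """Thelen-style search tree with pruning rules R1-R3.
    cnf: list of clauses, each clause a list of literals ('a' or '~a')."""
    clauses = sorted(cnf, key=len) if sort_by_length else list(cnf)  # Heuristic 1
    primes = []

    def expand(level, term, pending):
        # R3: skip clauses already satisfied by the node-conjunction `term`
        while level < len(clauses) and any(l in term for l in clauses[level]):
            level += 1
        if level == len(clauses):
            # leaf: thanks to R2, comparing with implicants found before suffices
            if not any(p <= term for p in primes):
                primes.append(term)
            return
        clause = clauses[level]
        for i, lit in enumerate(clause):
            if neg(lit) in term:                         # R1: contradicts the path
                continue
            if any(lit in later for later in pending):   # R2: same literal still pending above
                continue
            rest = set(clause[i + 1:])                   # arcs not yet expanded at this level
            expand(level + 1, term | {lit}, pending + [rest])

    expand(0, frozenset(), [])
    return primes

print(thelen_prime_implicants([['a', 'b'], ['~a', 'c']]))   # {a,c}, {~a,b}, {b,c}
```

Heuristic 1 corresponds to the sorted(..., key=len) call; heuristics 2a/2b and heuristic 4 below would replace that sort key and the literal iteration order, respectively.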
This means that …
Fig. 2. An example of the tree in which the effects of heuristic 4 and rule R4 are the same.
… appearing non-prime implicants at the leaf nodes. Since, due to rule R2, an implicant can subsume only the implicants calculated before, if the implicants calculated later are in most cases shorter than those calculated earlier, then the chance of subsuming is small. The next heuristic is a reversal of heuristic 3.
Heuristic 4 (Reordering Literals). Choose the literal v_i with the minimum frequency in the non-expanded part of the expression.
In many cases (but not always) the effects of rule R4 and heuristic 4 are very similar. Rule R4 prunes an arc if at a higher level there is a non-expanded arc with the same arc-literal (let it be a). That means that at level k literal a is not the last literal in the clause. Let b be the literal through which the path to the node under consideration passes at level l. If literal a were the last one in the clause, rule R2 would be applicable instead of R4, with the same effect (Fig. 2). We may also state that if literal a appears at level k and also at a lower level l (that is, in clauses D_k and D_l, k > l), then if b does not appear in the clauses with numbers greater than k, after applying heuristic 4 in clause D_k literal b will appear before a, and R2 will be applicable instead of R4. But if b appears in the clauses with numbers greater than k, such an effect will not always occur. Here is an example of heuristic 4: (a ∨ c ∨ d) …; after applying the heuristic: (c ∨ d …. Such an ordering of literals causes the arc leading to the non-prime implicant … (a ∨ b ∨ c)(a ∨ c ∨ d)(b …, appearing in clause 2, to appear in the next clauses with the same frequency, and without applying rule R4 the algorithm will generate a non-prime implicant.
Tab. 1. Experimental results for randomly generated expressions: tree size T, number of prime implicants P and number of non-prime leaf implicants N, without heuristics and with heuristic H1, heuristic H4 and rule R4 ('%' is the tree size relative to the no-heuristic case).
Fig. 3. An example of the tree in which there are differences between heuristic 4 and rule R4.
… implicants. But rule R4 is more difficult to implement and increases the necessary amount of memory. So it seems that applying heuristic 4 is more reasonable, because it allows a similar effect to be obtained with less effort. The results of the computer experiments are summarized in Tab. 1. For the tests, randomly generated Boolean expressions were used. In the first column the number of variables and the number of clauses of an expression are given (e.g.
20x18). T denotes the tree size (number of nodes); P denotes the number of prime implicants; N denotes the number of non-prime implicants, being the leaves of the search tree. A column '%' shows, for every heuristic, the percentage of the tree size with respect to the size in the case when no heuristic is used. The experiments show that it is best to sort the disjunctions according to heuristic 2a, and the literals in the disjunctions according to heuristic 4.
4. Conclusion and further work
The presented heuristics, according to the experimental results, allow all the prime implicants of a logical expression represented in conjunctive normal form to be generated more quickly than can be done by using Thelen's method with the heuristics known before. Besides that, the presented heuristics remarkably reduce the number of leaf nodes in the search tree corresponding to non-prime implicants.
A prospective direction of future work is the evaluation of the efficiency of the proposed heuristics for solving the problems mentioned in the Introduction, for which Thelen's algorithm can be applied. That may require taking additional optimization parameters into account and modifying the heuristics. One more direction is a comparison between Thelen's approach and the BDD-based approach to solving problems such as covering problems.
5. References
[1] Brayton R.K. et al.; VIS: A System for Verification and Synthesis, in: Proceedings of the Conf. on Computer-Aided Verification, August 1996, Springer Verlag, 1102, pp. 332-334.
[2] Coudert O., Madre J.K.; New Ideas for Solving Covering Problems, Design Automation Conference, 1995, pp. 641-646.
[3] Coudert O., Madre J.K., Fraisse H.; A New Viewpoint on Two-Level Logic Minimization, Design Automation Conference, 1993, pp. 625-630.
[4] Karatkevich A.; On Algorithms for Decyclisation of Oriented Graphs, in: Proceedings of the International Workshop DESDes'01, Zielona Góra, Poland, 2001, pp. 35-40.
[5] Mathony H.J.; Universal logic design algorithm and its application to the synthesis of two-level switching circuits, IEE Proceedings, 136, 3, 1989, pp. 171-177.
[6] Mathony H.J.; Algorithmic Design of Two-Level and Multi-Level Switching Circuits (in German), PhD thesis, ITIV, Univ. of Karlsruhe, 1988.
[7] McGeer P.C. et al.; Espresso-Signature: A New Exact Minimizer for Logic Functions, Design Automation Conference, 1993, pp. 618-624.
[8] De Micheli D.; Synthesis and Optimization of Digital Circuits, Stanford Univ., McGraw-Hill, Inc., 1994.
[9] Nelson R.; Simplest Normal Truth Functions, Journal of Symbolic Logic, 20, 2, 1955, pp. 105-108.
[10] Rudell R., Sangiovanni-Vincentelli A.; Multiple-valued Minimization for PLA Optimization, IEEE Transactions on CAD/ICAS, CAD-6, 5, 1987, pp. 727-750.
[11] Rytsar B., Minziuk V.; The Set-theoretical Modification of Boolean Functions Minimax Covering Method, in: Proceedings of the International Conference TCSET'2004, Lviv-Slavsko, Ukraine, 2004, pp. 46-48.
[12] Thelen B.; Investigations of algorithms for computer-aided logic design of digital circuits (in German), PhD thesis, ITIV, Univ. of Karlsruhe, 1981.
[13] Węgrzyn A., Węgrzyn M.; Symbolic Verification of Concurrent Logic Controllers by Means of Petri Nets, in: Proceedings of the Third International Conference CAD DD'99, Minsk, Belarus, 1999, pp. 45-50.
[14] Węgrzyn A., Karatkevich A., Bieganowski J.; Detection of deadlocks and traps in Petri nets by means of Thelen's prime implicant method, AMCS, 14, 1, 2004, pp. 113-121.
Received March 8, 2004.
Topic 05: Reading Comprehension, Passage D (2024 New Curriculum Standard Paper I) (expert commentary + three years of past exam questions + full-mark strategies + multi-dimensional variations), original-paper edition
In-Depth Analysis of the 2024 College Entrance Examination English New Curriculum Standard Paper and Post-Exam Improvement, Topic 05: Reading Comprehension Passage D (New Curriculum Standard Paper I), original-paper edition (expert commentary + full translation + three years of past exam questions + vocabulary variations + full-mark strategies + topic variations)
Contents
I. The original question
II. Answers and explanations
III. Expert commentary
IV. Full translation
V. Vocabulary variations
(1) Word-form conversion of syllabus vocabulary
(2) Recognizing syllabus vocabulary and knowing its meaning
(3) High-frequency phrases, accumulated little by little
(4) Single-sentence gap-fill variations on the reading passage
(5) Analysis of long and difficult sentences
VI. Three years of past exam questions
(1) 2023 New Curriculum Standard Paper I, Reading Comprehension Passage D
(2) 2022 New Curriculum Standard Paper I, Reading Comprehension Passage D
(3) 2021 New Curriculum Standard Paper I, Reading Comprehension Passage D
VII. Full-mark strategies (expository reading comprehension)
VIII. Reading comprehension variations
Variation 1: Biodiversity research, findings and progress (6 passages)
Variation 2: Variations on question 35 of Passage D (popular-science research-advice type) (6 passages)
I. The original question
Reading Comprehension Passage D. Keywords: expository writing; people and society; research on social science research methods; biodiversity; the spirit of scientific inquiry; scientific literacy
In the race to document the species on Earth before they go extinct, researchers and citizen scientists have collected billions of records. Today, most records of biodiversity are often in the form of photos, videos, and other digital records. Though they are useful for detecting shifts in the number and variety of species in an area, a new Stanford study has found that this type of record is not perfect.
"With the rise of technology it is easy for people to make observations of different species with the aid of a mobile application," said Barnabas Daru, who is lead author of the study and assistant professor of biology in the Stanford School of Humanities and Sciences. "These observations now outnumber the primary data that comes from physical specimens(标本), and since we are increasingly using observational data to investigate how species are responding to global change, I wanted to know: Are they usable?"
Using a global dataset of 1.9 billion records of plants, insects, birds, and animals, Daru and his team tested how well these data represent actual global biodiversity patterns.
"We were particularly interested in exploring the aspects of sampling that tend to bias (使有偏差) data, like the greater likelihood of a citizen scientist to take a picture of a flowering plant instead of the grass right next to it," said Daru.
Their study revealed that the large number of observation-only records did not lead to better global coverage. Moreover, these data are biased and favor certain regions, time periods, and species. This makes sense because the people who get observational biodiversity data on mobile devices are often citizen scientists recording their encounters with species in areas nearby. These data are also biased toward certain species with attractive or eye-catching features.
What can we do with the imperfect datasets of biodiversity?
"Quite a lot," Daru explained. "Biodiversity apps can use our study results to inform users of oversampled areas and lead them to places – and even species – that are not well-sampled. To improve the quality of observational data, biodiversity apps can also encourage users to have an expert confirm the identification of their uploaded image."
32. What do we know about the records of species collected now?
A. They are becoming outdated. B. They are mostly in electronic form. C. They are limited in number. D. They are used for public exhibition.
33. What does Daru's study focus on?
A. Threatened species. B. Physical specimens. C. Observational data. D. Mobile applications.
34. What has led to the biases according to the study?
A. Mistakes in data analysis. B. Poor quality of uploaded pictures. C. Improper way of sampling. D. Unreliable data collection devices.
35. What is Daru's suggestion for biodiversity apps?
A. Review data from certain areas. B. Hire experts to check the records. C. Confirm the identity of the users. D. Give guidance to citizen scientists.
II. Answers and explanations
III. Expert commentary
Testing key competencies and promoting the development of thinking quality. The 2024 national college entrance examination English papers continue to strengthen innovation in content and form, optimize the angles and ways in which questions are posed, and increase the openness and flexibility of the items, guiding students to think and judge independently and cultivating their logical, critical and creative thinking abilities.
A review of feature selection techniques in bioinformatics
Abstract
Feature selection techniques have become an apparent need in many bioinformatics applications. In addition to the large pool of techniques that have already been developed in the machine learning and data mining fields, specific applications in bioinformatics have led to a wealth of newly proposed techniques.
In this article, we make the interested reader aware of the possibilities of feature selection, providing a basic taxonomy of feature selection techniques, and discussing their use, variety and potential in a number of both common as well as upcoming bioinformatics applications.
1 INTRODUCTION
During the last decade, the motivation for applying feature selection (FS) techniques in bioinformatics has shifted from being an illustrative example to becoming a real prerequisite for model building. In particular, the high dimensional nature of many modelling tasks in bioinformatics, going from sequence analysis over microarray analysis to spectral analyses and literature mining, has given rise to a wealth of feature selection techniques being presented in the field.
In this review, we focus on the application of feature selection techniques. In contrast to other dimensionality reduction techniques like those based on projection (e.g. principal component analysis) or compression (e.g. using information theory), feature selection techniques do not alter the original representation of the variables, but merely select a subset of them. Thus, they preserve the original semantics of the variables, hence offering the advantage of interpretability by a domain expert.
While feature selection can be applied to both supervised and unsupervised learning, we focus here on the problem of supervised learning (classification), where the class labels are known beforehand. The interesting topic of feature selection for unsupervised learning (clustering) is a more complex issue, and research into this field is recently getting more attention in several communities (Liu and Yu, 2005; Varshavsky et al., 2006).
The main aim of this review is to make practitioners aware of the benefits, and in some cases even the necessity, of applying feature selection techniques. Therefore, we provide an overview of the different feature selection techniques for classification: we illustrate them by reviewing the most important application fields in the bioinformatics domain, highlighting the efforts done by the bioinformatics community in developing novel and adapted procedures. Finally, we also point the interested reader to some useful data mining and bioinformatics software packages that can be used for feature selection.
2 FEATURE SELECTION TECHNIQUES
As many pattern recognition techniques were originally not designed to cope with large amounts of irrelevant features, combining them with FS techniques has become a necessity in many applications (Guyon and Elisseeff, 2003; Liu and Motoda, 1998; Liu and Yu, 2005). The objectives of feature selection are manifold, the most important ones being: (a) to avoid overfitting and improve model performance, i.e. prediction performance in the case of supervised classification and better cluster detection in the case of clustering, (b) to provide faster and more cost-effective models and (c) to gain a deeper insight into the underlying processes that generated the data.
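To illustrate the distinction drawn above between feature selection and projection-based dimensionality reduction, here is a small sketch (my own illustration, not from the review): variance-based selection returns a subset of the original columns, so their identity and meaning are preserved, whereas PCA returns linear combinations of all inputs. NumPy is the only assumed dependency.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))           # 50 samples, 6 original features

# Feature selection: keep the k original columns with the largest variance.
k = 2
keep = np.argsort(X.var(axis=0))[-k:]  # indices of retained features
X_sel = X[:, keep]                     # still columns of X, semantics preserved

# Projection (PCA via SVD): each new axis mixes all original features.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X_pca = Xc @ Vt[:k].T                  # scores on the first k principal components

print("selected original features:", keep)
print("PCA loadings (each row mixes every feature):\n", Vt[:k].round(2))
```

Selecting rather than transforming features keeps the domain meaning of each retained variable, which is the interpretability advantage the review emphasizes.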
However, the advantages of feature selection techniques come at a certain price, as the search for a subset of relevant features introduces an additional layer of complexity in the modelling task. Instead of just optimizing the parameters of the model for the full feature subset, we now need to find the optimal model parameters for the optimal feature subset, as there is no guarantee that the optimal parameters for the full feature set are equally optimal for the optimal feature subset (Daelemans et al., 2003). As a result, the search in the model hypothesis space is augmented by another dimension: the one of finding the optimal subset of relevant features. Feature selection techniques differ from each other in the way they incorporate this search in the added space of feature subsets in the model selection.In the context of classification, feature selection techniques can be organized into three categories, depending on how they combine the feature selection search with the construction of the classification model: filter methods, wrapper methods and embedded methods. Table 1 provides a common taxonomy of feature selection methods, showing for each technique the most prominent advantages and disadvantages, as well as some examples of the most influential techniques.Table 1.A taxonomy of feature selection techniques. For each feature selection type, we highlight a set of characteristics which can guide the choice for a technique suited to the goals and resources of practitioners in the fieldFilter techniques assess the relevance of features by looking only at the intrinsic properties of the data. In most cases a feature relevance score is calculated, and low-scoring features are removed. Afterwards, this subset of features is presented as input to the classification algorithm. Advantages of filter techniques are that they easily scale to very high-dimensional datasets, they are computationally simple and fast, and they are independent of the classification algorithm. As a result, feature selection needs to be performed only once, and then different classifiers can be evaluated.A common disadvantage of filter methods is that they ignore the interaction with the classifier (the search in the feature subset space is separated from the search in the hypothesis space), and that most proposed techniques are univariate. This means that each feature is considered separately, thereby ignoring feature dependencies, which may lead to worse classification performance when compared to other types of feature selection techniques. In order to overcome the problem of ignoring feature dependencies, a number of multivariate filter techniques were introduced, aiming at the incorporation of feature dependencies to some degree.Whereas filter techniques treat the problem of finding a good feature subset independently of the model selection step, wrapper methods embed the model hypothesis search within the feature subset search. In this setup, a search procedure in the space of possible feature subsets is defined, and various subsets of features are generated and evaluated. The evaluation of a specific subset of features is obtained by training and testing a specific classification model, rendering this approach tailored to a specific classification algorithm. To search the space of all feature subsets, a search algorithm is then ‘wrapped’ around the classification model. 
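As a concrete, deliberately naive instance of the wrapper idea just outlined, the sketch below wraps a greedy forward search around a classifier and scores each candidate subset by cross-validation; scikit-learn is assumed, and this sequential strategy is only one of the deterministic search methods discussed next.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def forward_selection(X, y, estimator, max_features=5, cv=5):
    """Greedy wrapper: repeatedly add the feature whose inclusion gives the
    best cross-validated accuracy of the wrapped classifier."""
    selected, remaining = [], list(range(X.shape[1]))
    best_score = -np.inf
    while remaining and len(selected) < max_features:
        scores = {f: cross_val_score(estimator, X[:, selected + [f]], y, cv=cv).mean()
                  for f in remaining}
        f_best, s_best = max(scores.items(), key=lambda kv: kv[1])
        if s_best <= best_score:      # stop when no candidate improves the score
            break
        selected.append(f_best)
        remaining.remove(f_best)
        best_score = s_best
    return selected, best_score

X, y = make_classification(n_samples=200, n_features=20, n_informative=4,
                           random_state=0)
subset, score = forward_selection(X, y, LogisticRegression(max_iter=1000))
print(subset, round(score, 3))
```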
However, as the space of feature subsets grows exponentially with the number of features, heuristic search methods are used to guide the search for an optimal subset. These search methods can be divided in two classes: deterministic and randomized search algorithms. Advantages of wrapper approaches include the interaction between feature subset search and model selection, and the ability to take into account feature dependencies. A common drawback of these techniques is that they have a higher risk of overfitting than filter techniques and are very computationally intensive, especially if building the classifier has a high computational cost.In a third class of feature selection techniques, termed embedded techniques, the search for an optimal subset of features is built into the classifier construction, and can be seen as a search in the combined space of feature subsets and hypotheses. Just like wrapper approaches, embedded approaches are thus specific to a given learning algorithm. Embedded methods have the advantage that they include the interaction with the classification model, while at the same time being far less computationally intensive than wrapper methods.Previous SectionNext Section3 APPLICATIONS IN BIOINFORMATICS3.1 Feature selection for sequence analysisSequence analysis has a long-standing tradition in bioinformatics. In the context of feature selection, two types of problems can be distinguished: content and signal analysis. Content analysis focuses on the broad characteristics of a sequence, such as tendency to code for proteins or fulfillment of a certain biological function. Signal analysis on the other hand focuses on the identification of important motifs in the sequence, such as gene structural elements or regulatory elements.Apart from the basic features that just represent the nucleotide or amino acid at each position in a sequence, many other features, such as higher order combinations of these building blocks (e.g.k-mer patterns) can be derived, their number growing exponentially with the pattern length k. As many of them will be irrelevant or redundant, feature selection techniques are then applied to focus on the subset of relevant variables.3.1.1 Content analysisThe prediction of subsequences that code for proteins (coding potential prediction) has been a focus of interest since the early days of bioinformatics. Because many features can be extracted from a sequence, and most dependencies occur between adjacent positions, many variations of Markov models were developed. To deal with the high amount of possible features, and the often limited amount of samples, (Salzberg et al., 1998) introduced the interpolated Markov model (IMM), which used interpolation between different orders of the Markov model to deal with small sample sizes, and a filter method (χ2) to select only relevant features. In further work, (Delcher et al., 1999) extended the IMM framework to also deal with non-adjacent feature dependencies, resulting in the interpolated context model (ICM), which crosses a Bayesian decision tree with a filter method (χ2) to assess feature relevance. Recently, the avenue of FS techniques for coding potential prediction was further pursued by (Saeys et al., 2007), who combined different measures of coding potential prediction, and then used the Markov blanket multivariate filter approach (MBF) to retain only the relevant ones.A second class of techniques focuses on the prediction of protein function from sequence. The early work of Chuzhanova et al. 
(1998), who combined a genetic algorithm in combination with the Gamma test to score feature subsets for classification of large subunits of rRNA, inspired researchers to use FS techniques to focus on important subsets of amino acids that relate to the protein's; functional class (Al-Shahib et al., 2005). An interesting technique is described in Zavaljevsky et al. (2002), using selective kernel scaling for support vector machines (SVM) as a way to asses feature weights, and subsequently remove features with low weights.The use of FS techniques in the domain of sequence analysis is also emerging in a number of more recent applications, such as the recognition of promoter regions (Conilione and Wang, 2005), and the prediction of microRNA targets (Kim et al., 2006).3.1.2 Signal analysisMany sequence analysis methodologies involve the recognition of short, more or less conserved signals in the sequence, representing mainly binding sites for various proteins or protein complexes. A common approach to find regulatory motifs, is to relate motifs to gene expressionlevels using a regression approach. Feature selection can then be used to search for the motifs that maximize the fit to the regression model (Keles et al., 2002; Tadesse et al.,2004). In Sinha (2003), a classification approach is chosen to find discriminative motifs. The method is inspired by Ben-Dor et al. (2000) who use the threshold number of misclassification (TNoM, see further in the section on microarray analysis) to score genes for relevance to tissue classification. From the TNoM score, a P-value is calculated that represents the significance of each motif. Motifs are then sorted according to their P-value.Another line of research is performed in the context of the gene prediction setting, where structural elements such as the translation initiation site (TIS) and splice sites are modelled as specific classification problems. The problem of feature selection for structural element recognition was pioneered in Degroeve et al. (2002) for the problem of splice site prediction, combining a sequential backward method together with an embedded SVM evaluation criterion to assess feature relevance. In Saeys et al. (2004), an estimation of distribution algorithm (EDA, a generalization of genetic algorithms) was used to gain more insight in the relevant features for splice site prediction. Similarly, the prediction of TIS is a suitable problem to apply feature selection techniques. In Liu et al. (2004), the authors demonstrate the advantages of using feature selection for this problem, using the feature-class entropy as a filter measure to remove irrelevant features.In future research, FS techniques can be expected to be useful for a number of challenging prediction tasks, such as identifying relevant features related to alternative splice sites and alternative TIS.3.2 Feature selection for microarray analysisDuring the last decade, the advent of microarray datasets stimulated a new line of research in bioinformatics. Microarray data pose a great challenge for computational techniques, because of their large dimensionality (up to several tens of thousands of genes) and their small sample sizes (Somorjai et al., 2003). 
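Looking back at the sequence-analysis applications above, the k-mer feature construction they rely on, followed by a simple univariate filter, can be sketched as below; the mutual-information score is a generic stand-in for the χ² and feature-class entropy measures cited in the text, and the tiny dataset is invented for illustration.

```python
import numpy as np
from itertools import product

def kmer_features(seqs, k=3, alphabet="ACGT"):
    """Count-of-occurrence features: one column per possible k-mer (4**k columns)."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {km: j for j, km in enumerate(kmers)}
    X = np.zeros((len(seqs), len(kmers)))
    for i, s in enumerate(seqs):
        for j in range(len(s) - k + 1):
            X[i, index[s[j:j + k]]] += 1
    return X, kmers

def mutual_information(x_binary, y):
    """MI between a binarized feature (k-mer present/absent) and the class label."""
    mi = 0.0
    for xv in (0, 1):
        for yv in np.unique(y):
            p_xy = np.mean((x_binary == xv) & (y == yv))
            p_x, p_y = np.mean(x_binary == xv), np.mean(y == yv)
            if p_xy > 0:
                mi += p_xy * np.log2(p_xy / (p_x * p_y))
    return mi

seqs = ["ACGTACGT", "ACGTTTTT", "GGGTACGA", "TTTTACGT"]
y = np.array([1, 1, 0, 0])
X, kmers = kmer_features(seqs, k=2)
scores = [mutual_information((X[:, j] > 0).astype(int), y) for j in range(X.shape[1])]
top = np.argsort(scores)[::-1][:3]
print([kmers[j] for j in top])       # the k-mers most associated with the class
```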
Furthermore, additional experimental complications like noise and variability render the analysis of microarray data an exciting domain.In order to deal with these particular characteristics of microarray data, the obvious need for dimension reduction techniques was realized (Alon et al., 1999; Ben-Dor et al., 2000; Golub et al., 1999; Ross et al., 2000), and soon their application became a de facto standard in the field. Whereas in 2001, the field of microarray analysis was still claimed to be in its infancy (Efron et al., 2001), a considerable and valuable effort has since been done to contribute new and adapt known FS methodologies (Jafari and Azuaje, 2006). A general overview of the most influential techniques, organized according to the general FS taxonomy of Section 2, is shown in Table 2.Table 2.Key references for each type of feature selection technique in the microarray domain3.2.1 The univariate filter paradigm: simple yet efficientBecause of the high dimensionality of most microarray analyses, fast and efficient FS techniques such as univariate filter methods have attracted most attention. The prevalence of these univariate techniques has dominated the field, and up to now comparative evaluations of different classification and FS techniques over DNA microarray datasets only focused on the univariate case (Dudoit et al., 2002; Lee et al., 2005; Li et al., 2004; Statnikov et al., 2005). This domination of the univariate approach can be explained by a number of reasons:the output provided by univariate feature rankings is intuitive and easy to understand;the gene ranking output could fulfill the objectives and expectations that bio-domain experts have when wanting to subsequently validate the result by laboratory techniques or in order to explore literature searches. The experts could not feel the need for selection techniques that take into account gene interactions;the possible unawareness of subgroups of gene expression domain experts about the existence of data analysis techniques to select genes in a multivariate way;the extra computation time needed by multivariate gene selection techniques.Some of the simplest heuristics for the identification of differentially expressed genes include setting a threshold on the observed fold-change differences in gene expression between the states under study, and the detection of the threshold point in each gene that minimizes the number of training sample misclassification (threshold number of misclassification, TNoM (Ben-Dor etal.,2000)). However, a wide range of new or adapted univariate feature ranking techniques has since then been developed. These techniques can be divided into two classes: parametric and model-free methods (see Table 2).Parametric methods assume a given distribution from which the samples (observations) have been generated. The two sample t-test and ANOVA are among the most widely used techniques in microarray studies, although the usage of their basic form, possibly without justification of their main assumptions, is not advisable (Jafari and Azuaje, 2006). Modifications of the standard t-test to better deal with the small sample size and inherent noise of gene expression datasets include a number of t- or t-test like statistics (differing primarily in the way the variance is estimated) and a number of Bayesian frameworks (Baldi and Long, 2001; Fox and Dimmic, 2006). 
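A minimal version of the univariate parametric ranking described above is sketched below: a Welch-style two-sample t statistic is computed gene by gene and genes are ranked by its magnitude. This is a plain t-test stand-in, not one of the modified statistics cited in the text.

```python
import numpy as np

def welch_t(x1, x2):
    """Two-sample t statistic with unequal variances, for a single gene."""
    m1, m2 = x1.mean(), x2.mean()
    v1, v2 = x1.var(ddof=1), x2.var(ddof=1)
    return (m1 - m2) / np.sqrt(v1 / len(x1) + v2 / len(x2))

def rank_genes(X, y, top=10):
    """X: samples x genes expression matrix; y: binary phenotype labels."""
    g1, g2 = X[y == 0], X[y == 1]
    t = np.array([welch_t(g1[:, j], g2[:, j]) for j in range(X.shape[1])])
    order = np.argsort(-np.abs(t))          # most differential genes first
    return order[:top], t[order[:top]]

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 1000))             # 30 samples, 1000 genes
y = np.repeat([0, 1], 15)
X[y == 1, :5] += 2.0                         # plant 5 truly differential genes
idx, tvals = rank_genes(X, y)
print(idx)                                   # the planted genes should rank near the top
```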
Although Gaussian assumptions have dominated the field, other types of parametrical approaches can also be found in the literature, such as regression modelling approaches (Thomas et al., 2001) and Gamma distribution models (Newton et al.,2001).Due to the uncertainty about the true underlying distribution of many gene expression scenarios, and the difficulties to validate distributional assumptions because of small sample sizes,non-parametric or model-free methods have been widely proposed as an attractive alternative to make less stringent distributional assumptions (Troyanskaya et al., 2002). Many model-free metrics, frequently borrowed from the statistics field, have demonstrated their usefulness in many gene expression studies, including the Wilcoxon rank-sum test (Thomas et al., 2001), the between-within classes sum of squares (BSS/WSS) (Dudoit et al., 2002) and the rank products method (Breitling et al., 2004).A specific class of model-free methods estimates the reference distribution of the statistic using random permutations of the data, allowing the computation of a model-free version of the associated parametric tests. These techniques have emerged as a solid alternative to deal with the specificities of DNA microarray data, and do not depend on strong parametric assumptions (Efron et al., 2001; Pan, 2003; Park et al., 2001; Tusher et al., 2001). Their permutation principle partly alleviates the problem of small sample sizes in microarray studies, enhancing the robustness against outliers.We also mention promising types of non-parametric metrics which, instead of trying to identify differentially expressed genes at the whole population level (e.g. comparison of sample means), are able to capture genes which are significantly disregulated in only a subset of samples (Lyons-Weiler et al., 2004; Pavlidis and Poirazi, 2006). These types of methods offer a more patient specific approach for the identification of markers, and can select genes exhibiting complex patterns that are missed by metrics that work under the classical comparison of two prelabelled phenotypic groups. In addition, we also point out the importance of procedures for controlling the different types of errors that arise in this complex multiple testing scenario of thousands of genes (Dudoit et al., 2003; Ploner et al., 2006; Pounds and Cheng, 2004; Storey, 2002), with a special focus on contributions for controlling the false discovery rate (FDR).3.2.2 Towards more advanced models: the multivariate paradigm for filter, wrapperand embedded techniquesUnivariate selection methods have certain restrictions and may lead to less accurate classifiers by, e.g. not taking into account gene–gene interactions. Thus, researchers have proposed techniques that try to capture these correlations between genes.The application of multivariate filter methods ranges from simple bivariate interactions (Bø and Jonassen, 2002) towards more advanced solutions exploring higher order interactions, such as correlation-based feature selection (CFS) (Wang et al., 2005; Yeoh et al., 2002) and several variants of the Markov blanket filter method (Gevaert et al., 2006; Mamitsuka, 2006; Xing et al., 2001). 
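A toy sketch of the relevance-versus-redundancy trade-off behind correlation-based multivariate filters of this kind is given below; the greedy criterion is generic and is not the exact CFS or Markov blanket formulation.

```python
import numpy as np

def corr(a, b):
    return abs(np.corrcoef(a, b)[0, 1])

def greedy_relevance_redundancy(X, y, k=5):
    """Greedy multivariate filter: maximize |corr(feature, class)| minus the
    mean |corr(feature, already-selected features)|."""
    n_feat = X.shape[1]
    relevance = np.array([corr(X[:, j], y) for j in range(n_feat)])
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            redundancy = np.mean([corr(X[:, j], X[:, s]) for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

rng = np.random.default_rng(2)
base = rng.normal(size=100)
X = np.column_stack([base + rng.normal(scale=0.1, size=100) for _ in range(3)] +
                    [rng.normal(size=100) for _ in range(7)])
y = (base + rng.normal(scale=0.5, size=100) > 0).astype(float)
print(greedy_relevance_redundancy(X, y, k=3))  # avoids picking all three redundant copies
```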
The Minimum Redundancy-Maximum Relevance (MRMR) (Ding and Peng, 2003) and Uncorrelated Shrunken Centroid (USC) (Yeung and Bumgarner, 2003) algorithms are two other solid multivariate filter procedures, highlighting the advantage of using multivariate methods over univariate procedures in the gene expression domain.Feature selection using wrapper or embedded methods offers an alternative way to perform a multivariate gene subset selection, incorporating the classifier's; bias into the search and thus offering an opportunity to construct more accurate classifiers. In the context of microarray analysis, most wrapper methods use population-based, randomized search heuristics (Blanco et al., 2004; Jirapech-Umpai and Aitken, 2005; Li et al., 2001; Ooi and Tan, 2003), although also a few examples use sequential search techniques (Inza et al., 2004; Xiong et al., 2001). An interesting hybrid filter-wrapper approach is introduced in (Ruiz et al., 2006), crossing a univariatelypre-ordered gene ranking with an incrementally augmenting wrapper method.Another characteristic of any wrapper procedure concerns the scoring function used to evaluate each gene subset found. As the 0–1 accuracy measure allows for comparison with previous works, the vast majority of papers uses this measure. However, recent proposals advocate the use of methods for the approximation of the area under the ROC curve (Ma and Huang, 2005), or the optimization of the LASSO (Least Absolute Shrinkage and Selection Operator) model (Ghosh and Chinnaiyan, 2005). ROC curves certainly provide an interesting evaluation measure, especially suited to the demand for screening different types of errors in many biomedical scenarios.The embedded capacity of several classifiers to discard input features and thus propose a subset of discriminative genes, has been exploited by several authors. Examples include the use of random forests (a classifier that combines many single decision trees) in an embedded way to calculate the importance of each gene (Díaz-Uriarte and Alvarez de Andrés, 2006; Jiang et al., 2004). Another line of embedded FS techniques uses the weights of each feature in linear classifiers, such as SVMs (Guyon et al., 2002) and logistic regression (Ma and Huang, 2005). These weights are used to reflect the relevance of each gene in a multivariate way, and thus allow for the removal of genes with very small weights.Partially due to the higher computational complexity of wrapper and to a lesser degree embedded approaches, these techniques have not received as much interest as filter proposals. However, an advisable practice is to pre-reduce the search space using a univariate filter method, and only then apply wrapper or embedded methods, hence fitting the computation time to the available resources.3.3 Mass spectra analysisMass spectrometry technology (MS) is emerging as a new and attractive framework for disease diagnosis and protein-based biomarker profiling (Petricoin and Liotta, 2003). A mass spectrum sample is characterized by thousands of different mass/charge (m / z) ratios on the x-axis, each with their corresponding signal intensity value on the y-axis. A typical MALDI-TOF low-resolution proteomic profile can contain up to 15 500 data points in the spectrum between 500 and 20 000 m / z, and the number of points even grows using higher resolution instruments.For data mining and bioinformatics purposes, it can initially be assumed that each m / z ratio represents a distinct variable whose value is the intensity. 
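The embedded use of linear-classifier weights mentioned above for gene expression, and revisited for mass spectra below, can be sketched as a simple recursive elimination loop; scikit-learn's LinearSVC is assumed, and the elimination schedule is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

def svm_weight_elimination(X, y, n_keep=10, drop_frac=0.5):
    """Recursively fit a linear SVM and discard the features with the smallest
    absolute weights, in the spirit of SVM-based embedded selection."""
    active = np.arange(X.shape[1])
    while len(active) > n_keep:
        clf = LinearSVC(C=1.0, dual=False, max_iter=5000).fit(X[:, active], y)
        w = np.abs(clf.coef_).ravel()
        n_next = max(n_keep, int(len(active) * (1 - drop_frac)))
        active = active[np.argsort(w)[::-1][:n_next]]   # keep the largest-weight features
    return active

X, y = make_classification(n_samples=100, n_features=500, n_informative=8,
                           random_state=0)
print(sorted(svm_weight_elimination(X, y)))
```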
As Somorjai et al. (2003) explain, the data analysis step is severely constrained by both high-dimensional input spaces and their inherent sparseness, just as it is the case with gene expression datasets. Although the amount of publications on mass spectrometry based data mining is not comparable to the level of maturity reached in the microarray analysis domain, an interesting collection of methods has been presented in the last 4–5 years (see Hilario et al., 2006; Shin and Markey, 2006 for recent reviews) since the pioneering work of Petricoin et al.(2002).Starting from the raw data, and after an initial step to reduce noise and normalize the spectra from different samples (Coombes et al., 2007), the following crucial step is to extract the variables that will constitute the initial pool of candidate discriminative features. Some studies employ the simplest approach of considering every measured value as a predictive feature, thus applying FS techniques over initial huge pools of about 15 000 variables (Li et al., 2004; Petricoin et al., 2002), up to around 100 000 variables (Ball et al.,2002). On the other hand, a great deal of the current studies performs aggressive feature extraction procedures using elaborated peak detection and alignment techniques (see Coombes et al., 2007; Hilario et al., 2006; Shin and Markey, 2006 for a detailed description of these techniques). These procedures tend to seed the dimensionality from which supervised FS techniques will start their work in less than 500 variables (Bhanot et al., 2006; Ressom et al., 2007; Tibshirani et al., 2004). A feature extraction step is thus advisable to set the computational costs of many FS techniques to a feasible size in these MS scenarios. Table 3 presents an overview of FS techniques used in the domain of mass spectrometry. Similar to the domain of microarray analysis, univariate filter techniques seem to be the most common techniques used, although the use of embedded techniques is certainly emerging as an alternative. Although the t-test maintains a high level of popularity (Liu et al., 2002; Wu et al., 2003), other parametric measures such as F-test (Bhanot et al., 2006), and a notable variety of non-parametric scores (Tibshirani et al., 2004; Yu et al., 2005) have also been used in several MS studies. Multivariate filter techniques on the other hand, are still somewhat underrepresented (Liu et al., 2002; Prados et al., 2004).Table 3.Key references for each type of feature selection technique in the domain of mass pectrometryWrapper approaches have demonstrated their usefulness in MS studies by a group of influential works. Different types of population-based randomized heuristics are used as search engines in the major part of these papers: genetic algorithms (Li et al., 2004; Petricoin et al., 2002), particle swarm optimization (Ressom et al., 2005) and ant colony procedures (Ressom et al., 2007). It is worth noting that while the first two references start the search procedure in ∼ 15 000 dimensions by considering each m / z ratio as an initial predictive feature, aggressive peak detection and alignment processes reduce the initial dimension to about 300 variables in the last two references (Ressom et al., 2005; Ressom et al., 2007).An increasing number of papers uses the embedded capacity of several classifiers to discard input features. Variations of the popular method originally proposed for gene expression domains by Guyon et al. 
(2002), using the weights of the variables in the SVM formulation to discard features with small weights, have been broadly and successfully applied in the MS domain (Jong et al., 2004; Prados et al., 2004; Zhang et al., 2006). Based on a similar framework, the weights of the input masses in a neural network classifier have been used to rank the features' importance in Ball et al. (2002). The embedded capacity of random forests (Wu et al., 2003) and other types of decision tree-based algorithms (Geurts et al., 2005) constitutes an alternative embedded FS strategy.
4 DEALING WITH SMALL SAMPLE DOMAINS
Small sample sizes, and their inherent risk of imprecision and overfitting, pose a great challenge for many modelling problems in bioinformatics (Braga-Neto and Dougherty, 2004; Molinaro et al., 2005; Sima and Dougherty, 2006). In the context of feature selection, two initiatives have emerged in response to this novel experimental situation: the use of adequate evaluation criteria, and the use of stable and robust feature selection models.
4.1 Adequate evaluation criteria
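The excerpt breaks off at this heading. As one standard example of what an adequate evaluation criterion looks like in small-sample settings (my illustration, not necessarily the specific criteria the full section discusses), the sketch below re-runs feature selection inside every cross-validation fold rather than once on the complete dataset, so that the reported accuracy is not optimistically biased by the selection step; scikit-learn is assumed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

def cv_accuracy(X, y, k_features=20, n_splits=5, selection_inside=True):
    """Cross-validated accuracy of 'univariate filter + classifier'.
    With selection_inside=False the filter sees the test folds, which tends to
    overstate performance on small, high-dimensional datasets."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    accs = []
    if not selection_inside:                      # (biased) selection on all the data
        scores = np.abs(np.corrcoef(X.T, y)[-1, :-1])
        keep_all = np.argsort(scores)[::-1][:k_features]
    for train, test in skf.split(X, y):
        if selection_inside:                       # (unbiased) selection on the training fold only
            scores = np.abs(np.corrcoef(X[train].T, y[train])[-1, :-1])
            keep = np.argsort(scores)[::-1][:k_features]
        else:
            keep = keep_all
        clf = LogisticRegression(max_iter=1000).fit(X[train][:, keep], y[train])
        accs.append(clf.score(X[test][:, keep], y[test]))
    return float(np.mean(accs))

X, y = make_classification(n_samples=40, n_features=2000, n_informative=5,
                           random_state=0)
print("selection inside CV :", round(cv_accuracy(X, y, selection_inside=True), 3))
print("selection outside CV:", round(cv_accuracy(X, y, selection_inside=False), 3))
```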
Emergence Trumps Authority (original English text)
"Emergence Trumps Authority"
In today's rapidly changing and interconnected world, the concept of emergence is gaining increasing attention as a more effective way to tackle complex problems and drive innovation. Emergence refers to the phenomenon where new and unexpected patterns, properties, or behaviors emerge from the interactions of simpler elements within a system. This stands in stark contrast to the traditional top-down approach of authority, where decisions and solutions are handed down from a single source of power.
The main advantage of emergence over authority is its ability to harness the collective intelligence and creativity of a group. Instead of relying on the expertise of a few individuals at the top, emergence draws on the diverse perspectives and experiences of many. This leads to more robust and innovative solutions, as well as greater buy-in and support from those involved in the process.
Furthermore, emergence is better suited to navigating the complexities and uncertainties of modern challenges. With the pace of change accelerating and the interdependencies of various systems becoming more evident, no single authority figure can possibly possess all the knowledge and insight needed to address the diverse and evolving issues we face. In contrast, emergence allows for a more organic and adaptive approach, where solutions can emerge and evolve over time as new information and perspectives come to light.
Additionally, emergence encourages participation and empowerment, as individuals feel a sense of ownership and responsibility for the outcomes of the collective efforts. This can lead to increased motivation, collaboration, and resilience within the group, as well as a greater sense of satisfaction and fulfillment for all involved.
While authority certainly has its time and place, especially in situations requiring clear direction and decisive action, the benefits of emergence cannot be overlooked. By recognizing and harnessing the power of emergence, organizations, communities, and individuals can better adapt to the complexities and uncertainties of our modern world and drive more effective and sustainable solutions. Ultimately, "Emergence Trumps Authority."
English Vocabulary for Academic Writing
I. Introduction. In academic writing, the use of accurate and varied English vocabulary improves the quality of a paper.
This article introduces some commonly used English vocabulary for academic writing, to help readers choose appropriate words while writing and make their articles more precise and fluent.
II. Literature review
1. Background and frontiers of the research topic:
- The background and forefront of the research topic
- The historical context and recent developments in the research area
- Previous studies in the field and their contributions
2. Purpose and significance of the literature review:
- The purpose and significance of the literature review
- Reviewing the existing literature on the topic
- Synthesizing the findings from previous studies
3. Methods and process of the literature review:
- The methods and procedures of the literature review
- Searching and selecting relevant articles
- Analyzing and synthesizing the data
III. Methods and materials
1. Research design and sample selection:
- The research design and sample selection
- Choosing an appropriate research methodology
- Selecting a representative sample
2. Data collection and analysis:
- Data collection and analysis
- Collecting quantitative/qualitative data
- Conducting statistical analysis
IV. Results and discussion
1. Description and interpretation of the results:
- Describing and interpreting the results
- Presenting the findings of the study
- Explaining the outcomes of the research
2. Importance and impact of the results:
- The significance and implications of the results
- The impact and relevance of the findings
- The practical and theoretical implications of the study
3. Comparison of the results with previous research:
- Comparing the results with previous studies
- Contrasting the findings with existing literature
- Relating the outcomes to previous research
V. Conclusion
1. Summary and synthesis of the conclusions:
- Summarizing and synthesizing the conclusions
- Drawing overall inferences from the findings
- Making generalizations based on the results
2. Implications and recommendations of the conclusions:
- The implications and recommendations of the conclusions
- Suggesting future research directions
- Providing practical suggestions and advice
VI. Acknowledgements
1. Thanking the supervisor or team members:
- Acknowledging the guidance of the supervisor or team members
- Expressing gratitude to those who provided assistance
- Thanking individuals for their support and contributions
2. Thanking funding support and the provision of experimental facilities:
- Thanking funding agencies and institutions for financial support
- Acknowledging the provision of experimental facilities
VII. References
1. Journal papers:
- Journal articles
- Papers published in academic journals
- Scholarly publications
2. Conference papers:
- Conference papers
- Papers presented at conferences
- Proceedings of academic conferences
3. Books and chapters:
- Books and chapters
- Monographs and book chapters
- Edited volumes
VIII. Summary
This article has introduced English vocabulary commonly used in academic writing, covering the literature review, methods and materials, results and discussion, conclusion, acknowledgements and references sections.
Specialized Vocabulary for Graduate Students
2-dimensional space3D mapabstractaccess dataAccessibilityaccuracyacquisitionad-hocadjacencyadventaerial photographsAge of dataagglomerationaggregateairborneAlbers Equal-Area Conic projection (ALBER alignalphabeticalphanumericalphanumericalalternativealternativealtitudeameliorateanalogue mapsancillaryANDannotationanomalousapexapproachappropriatearcarc snap tolerancearealAreal coverageARPA abbr.Advanced Research Projects Agen arrangementarrayartificial intelligenceArtificial Neural Networks (ANN) aspatialaspectassembleassociated attributeattributeattribute dataautocorrelationautomated scanningazimuthazimuthalbar chartbiasbinary encodingblock codingBoolean algebrabottombottom leftboundbreak linebufferbuilt-incamouflagecardinalcartesian coordinate system cartographycatchmentcellcensuscentroidcentroid-to-centroidCGI (Common Gateway Interface) chain codingchainscharged couple devices (ccd) children (node)choropleth mapclass librariesclassesclustercodecohesivelycoilcollinearcolumncompactcompasscompass bearingcomplete spatial randomness (CSR) componentcompositecomposite keysconcavityconcentricconceptual modelconceptuallyconduitConformalconformal projectionconic projectionconnectivityconservativeconsortiumcontainmentcontiguitycontinuouscontourcontour layercontrol pointsconventionconvertcorecorrelogramcorrespondencecorridorCostcost density fieldcost-benefit analysis (CBA)cost-effectivecouplingcovariancecoveragecoveragecriteriacriteriacriterioncross-hairscrosshatchcross-sectioncumbersomecustomizationcutcylindrical projectiondangledangle lengthdangling nodedash lineDATdata base management systems (DBMS) data combinationdata conversiondata definition language (DDL)data dictionarydata independencedata integritydata itemdata maintenancedata manipulationData manipulation and query language data miningdata modeldata representationdata tabledata typedatabasedateDBAdebris flowdebugdecadedecibeldecision analysisdecision makingdecomposededicateddeductiveDelaunay criterionDelaunay triangulationdelete(erase)delineatedemarcationdemographicdemonstratedenominatorDensity of observationderivativedetectabledevisediagonaldictatedigital elevation model (DEM)digital terrain model (DTM) digitizedigitizedigitizerdigitizing errorsdigitizing tablediscrepancydiscretediscretedisparitydispersiondisruptiondissecteddisseminatedissolvedistance decay functionDistributed Computingdividedomaindot chartdraftdragdrum scannersdummy nodedynamic modelingeasy-to-useecologyelicitingeliminateellipsoidellipticityelongationencapsulationencloseencodeentity relationship modelingentity tableentryenvisageepsilonequal area projectionequidistant projectionerraticerror detection & correctionError Maperror varianceessenceet al.EuclideanEuclidean 2-spaceexpected frequencies of occurrences explicitexponentialextendexternal and internal boundaries external tablefacetfacilityfacility managementfashionFAT (file allocation table)faultyfeaturefeaturefeedbackfidelityfieldfield investigationfield sports enthusiastfields modelfigurefile structurefillingfinenessfixed zoom infixed zoom outflat-bed scannerflexibilityforefrontframe-by framefreefrom nodefrom scratchfulfillfunction callsfuzzyFuzzy set theorygantrygenericgeocodinggeocomputationgeodesygeographic entitygeographic processgeographic referencegeographic spacegeographic/spatial information geographical featuresgeometricgeometric primitive geoprocessinggeoreferencegeo-relational geosciences geospatialgeo-spatial analysis geo-statisticalGiven that GNOMONIC projection grain tolerance graticulegrey 
scalegridhand-drawnhand-heldhandicaphandlehand-written header recordheftyheterogeneity heterogeneous heuristichierarchical hierarchicalhill shading homogeneoushosthouseholdshuehumichurdlehydrographyhyper-linkedi.e.Ideal Point Method identicalidentifiable identification identifyilluminateimageimpedanceimpedanceimplementimplementimplicationimplicitin excess of…in respect ofin terms ofin-betweeninbuiltinconsistencyincorporationindigenousinformation integration infrastructureinherentinheritanceinlandinstanceinstantiationintegerintegrateinteractioninteractiveinteractiveinternet protocol suite Internet interoperabilityinterpolateinterpolationinterrogateintersectintersectionIntersectionInterval Estimation Method intuitiveintuitiveinvariantinventoryinvertedirreconcilableirreversibleis adjacent tois completely withinis contained iniso-iso-linesisopleth mapiterativejunctionkeyframekrigingKriginglaglanduse categorylatitudelatitude coordinatelavalayerlayersleaseleast-cost path analysisleftlegendlegendlegendlength-metriclie inlightweightlikewiselimitationLine modelline segmentsLineage (=history)lineamentlinearline-followinglitho-unitlocal and wide area network logarithmiclogicallogicallongitudelongitude coordinatemacro languagemacro-like languagemacrosmainstreammanagerialmanual digitizingmany-to-one relationMap scalemarshalmaskmatricesmatrixmeasured frequencies of occurrences measurementmedialMercatorMercator projectionmergemergemeridiansmetadatameta-datametadatamethodologymetric spaceminimum cost pathmirrormis-representmixed pixelmodelingmodularmonochromaticmonolithicmonopolymorphologicalmosaicmovemoving averagemuiticriteria decision making (MCDM) multispectralmutually exclusivemyopicnadirnatureneatlynecessitatenestednetworknetwork analysisnetwork database structurenetwork modelnodenodenode snap tolerancenon-numerical (character)non-spatialnon-spatial dataNormal formsnorth arrowNOTnovicenumber of significant digit numeric charactersnumericalnumericalobject-based modelobjectiveobject-orientedobject-oriented databaseobstacleomni- a.on the basis ofOnline Analytical Processing (OLAP) on-screen digitizingoperandoperatoroptimization algorithmORorderorganizational schemeoriginorthogonalORTHOGRAPHIC projectionortho-imageout ofoutcomeoutgrowthoutsetovaloverdueoverheadoverlapoverlayoverlay operationovershootovershootspackagepairwisepanpanelparadigmparent (node)patchpath findingpatternpatternpattern recognitionperceptionperspectivepertain phenomenological photogrammetric photogrammetryphysical relationships pie chartpilotpitpixelplanarplanar Euclidean space planar projection platformplotterplotterplottingplug-inpocketpoint entitiespointerpoint-modepointspolar coordinates polishingpolygonpolylinepolymorphism precautionsprecisionpre-designed predeterminepreferences pregeographic space Primary and Foreign keys primary keyprocess-orientedprofileprogramming tools projectionprojectionproprietaryprototypeproximalProximitypseudo nodepseudo-bufferpuckpuckpuckPythagorasquadquadrantquadtreequadtree tessellationqualifyqualitativequantitativequantitativequantizequasi-metricradar imageradii bufferrangelandrank order aggregation method ranking methodrasterRaster data modelraster scannerRaster Spatial Data Modelrating methodrational database structureready-madeready-to-runreal-timerecordrecreationrectangular coordinates rectificationredundantreference gridreflexivereflexive nearest neighbors (RNN) regimeregisterregular patternrelationrelationalrelational algebra operators relational databaseRelational joinsrelational model 
relevancereliefreliefremarkremote sensingremote sensingremote sensingremotely-sensed repositoryreproducible resemblanceresembleresemplingreshaperesideresizeresolutionresolutionrespondentretrievalretrievalretrievalretrieveridgerightrobustrootRoot Mean Square (RMS) rotateroundaboutroundingrowrow and column number run-length codingrun-length encoded saddle pointsalientsamplesanitarysatellite imagesscalablescalescanscannerscannerscannerscarcescarcityscenarioschemascriptscrubsecurityselectselectionself-descriptiveself-documentedsemanticsemanticsemi-automatedsemi-major axessemi-metricsemi-minor axessemivariancesemi-variogram modelsemi-varogramsensorsequencesetshiftsillsimultaneous equations simultaneouslysinusoidalskeletonslide-show-stylesliverslope angleslope aspectslope convexitysnapsnapsocio-demographic socioeconomicspagettiSpatial Autocorrelation Function spatial correlationspatial dataspatial data model for GIS spatial databaseSpatial Decision Support Systems spatial dependencespatial entityspatial modelspatial relationshipspatial relationshipsspatial statisticsspatial-temporalspecificspectralspherical spacespheroidsplined textsplitstakeholdersstand alonestandard errorstandard operationsstate-of-the-artstaticSTEREOGRAPHIC projection STEREOGRAPHIC projection stereoplotterstorage spacestovepipestratifiedstream-modestrideStructured Query Language(SQL) strung outsubdivisionsubroutinesubtractionsuitesupercedesuperimposesurrogatesurveysurveysurveying field data susceptiblesymbolsymbolsymmetrytaggingtailoredtake into account of … tangencytapetastefullyTelnettentativeterminologyterraceterritorytessellatedtextureThe Equidistant Conic projection (EQUIDIS The Lambert Conic Conformal projection (L thematicthematic mapthemeThiessen mapthird-partythresholdthroughputthrust faulttictiertiletime-consumingto nodetolerancetonetopographic maptopographytopologicaltopological dimensiontopological objectstopological structuretopologically structured data set topologytopologytrade offtrade-offTransaction Processing Systems (TPS) transformationtransposetremendousTriangulated Irregular Network (TIN) trimtrue-direction projectiontupleunbiasednessuncertaintyunchartedundershootsunionunionupupdateupper- mosturban renewaluser-friendlyutilityutility functionvaguevalidityvarianceVariogramvectorvector spatial data model vendorverbalversusvertexvetorizationviablevice versavice versaview of databaseview-onlyvirtualvirtual realityvisibility analysisvisualvisualizationvitalVoronoi Tesselationvrticeswatershedweedweed toleranceweighted summation method whilstwithin a distance ofXORzoom inzoom out三维地图摘要,提取,抽象访问数据可获取性准确,准确度 (与真值的接近程度)获得,获得物,取得特别邻接性出现,到来航片数据年龄聚集聚集,集合空运的, (源自)航空的,空中的艾伯特等面积圆锥投影匹配,调准,校直字母的字母数字的字母数字混合编制的替换方案替代的海拔,高度改善,改良,改进模拟地图,这里指纸质地图辅助的和注解不规则的,异常的顶点方法适合于…弧段弧捕捉容限来自一个地区的、 面状的面状覆盖范围(美国国防部)高级研究计划署排列,布置数组,阵列人工智能人工神经网络非空间的方面, 方向, 方位, 相位,面貌采集,获取关联属性属性属性数据自动扫描方位角,方位,地平经度方位角的条状图偏差二进制编码分块编码布尔代数下左下角给…划界断裂线缓冲区分析内置的伪装主要的,重要的,基本的笛卡儿坐标系制图、制图学流域,集水区像元,单元人口普查质心质心到质心的公共网关接口链式编码链电荷耦合器件子节点地区分布图类库类群编码内聚地线圈在同一直线上的列压缩、压紧罗盘, 圆规, 范围 v.包围方位角完全空间随机性组成部分复合的、混合的复合码凹度,凹陷同心的概念模型概念上地管道,导管,沟渠,泉水,喷泉保形(保角)的等角投影圆锥投影连通性保守的,守旧的社团,协会,联盟包含关系相邻性连续的轮廓,等高线,等值线等高线层控制点习俗,惯例,公约,协定转换核心相关图符合,对应走廊, 
通路费用花费密度域,路径权值成本效益分析有成本效益的,划算的结合协方差面层,图层覆盖,覆盖范围标准,要求标准,判据,条件标准,判据,条件十字丝以交叉线作出阴影截面麻烦的用户定制剪切圆柱投影悬挂悬挂长度悬挂的节点点划线数据文件的扩展名数据库管理系统数据合并数据变换数据定义语言数据字典与数据的无关数据的完整性数据项数据维护数据操作数据操作和查询语言数据挖掘数据模型数据表示法数据表数据类型数据库日期数据库管理员泥石流调试十年,十,十年期分贝决策分析决策,判定分解专用的推论的,演绎的狄拉尼准则狄拉尼三角形删除描绘划分人口统计学的说明分母,命名者观测密度引出的,派生的可察觉的发明,想出对角线的,斜的要求数字高程模型数字地形模型数字化数字化数字化仪数字化误差数字化板,数字化桌差异,矛盾不连续的,离散的不连续的,离散的不一致性分散,离差中断,分裂,瓦解,破坏切开的,分割的发散,发布分解距离衰减函数分布式计算分割域点状图草稿,起草拖拽滚筒式扫描仪伪节点动态建模容易使用的生态学导出消除椭球椭圆率伸长包装,封装围绕编码实体关系建模实体表进入,登记想像,设想,正视,面对希腊文的第五个字母ε等积投影等距投影不稳定的误差检查和修正误差图误差离散,误差方差本质,本体,精华以及其他人,等人欧几里得的,欧几里得几何学的欧几里得二维空间期望发生频率明显的指数的延伸内外边界外部表格(多面体的)面工具设备管理样子,方式文件分配表有过失的,不完善的(地理)要素,特征要素反馈诚实,逼真度,重现精度字段现场调查户外运动发烧友场模型外形, 数字,文件结构填充精细度以固定比例放大以固定比例缩小平板式扫描仪弹性,适应性,机动性,挠性最前沿逐帧无…的起始节点从底层完成,实现函数调用模糊的模糊集合论构台,桶架, 跨轨信号架通用的地理编码地理计算大地测量地理实体地理(数据处理)过程地理参考地理空间地理信息,空间信息地理要素几何的,几何学的几何图元地理(数据)处理过程地理坐标参考地理关系的地球科学地理空间的地学空间分析地质统计学的假设心射切面投影颗粒容差地图网格灰度栅格,格网手绘的手持的障碍,难点处置、处理手写的头记录重的,强健的异质性异构的启发式的层次层次的山坡(体)阴影图均匀的、均质的主机家庭色调腐植的困难,阻碍水文地理学超链接的即,换言之,也就是理想点法相同的可识别的、标识识别阐明图像,影像全电阻,阻抗阻抗实现,履行履行,实现牵连,暗示隐含的超过…关于根据…在中间的嵌入的,内藏的不一致性,矛盾性结合,组成公司(或社团)内在的,本土的信息集成基础设施固有的继承,遗传, 遗产内陆的实例,例子实例,个例化整数综合,结合相互作用交互式的交互式的协议组互操作性内插插值询问相交交集、逻辑的乘交区间估值法直觉的直觉的不变量存储,存量反向的,倒转的,倒置的互相对立的不能撤回的,不能取消的相邻完全包含于包含于相等的,相同的线族等值线图迭代的接合,汇接点主帧克里金内插法克里金法标签,标记间隙,迟滞量土地利用类别纬度 (B)纬度坐标熔岩,火山岩图层图层出租,租用最佳路径分析左图例图例图例长度量测在于小型的同样地限制,限度,局限线模型线段谱系,来源容貌,线性构造线性的,长度的,直线的线跟踪的岩性单元局域和广域网对数的逻辑的逻辑的经度 (L)经度坐标宏语言类宏语言宏主流管理人的, 管理的手工数字化多对一的关系地图比例尺排列,集合掩膜matrix 的复数矩阵实测发生频率量测中间的合并墨卡托墨卡托投影法合并合并,融合子午线元数据元数据,也可写为 metadata元数据方法学,方法论度量空间最佳路径镜像错误表示混合像素建模模块化的单色的,单频整体的垄断, 专利权, 专卖形态学镶嵌, 镶嵌体移动移动平均数多准则决策分析多谱线的,多谱段的相互排斥的短视,没有远见的最低点,天底,深渊,最底点本性,性质整洁地成为必要嵌套的、巢状的网络网络分析网状数据库结构网络模型节点节点节点捕捉容限非数值的(字符)非空间的非空间数据范式指北针非新手,初学者有效位数数字字符数值的数值的基于对象的模型客观的,目标的面向对象的模型面向对象的数据库阻碍全能的,全部的以…为基础在线分析处理屏幕数字化运算对象,操作数算子,算符,操作人员优化算法或次,次序组织方案原点,起源,由来直角的,直交的正射投影正射影像缺少结果长出,派出,结果,副产物开头 ,开端卵形的,椭圆形的迟到的管理费用重叠,叠加叠加叠置运算超出过头线软件包成对(双)地,两个两个地平移面,板范例、父节点补钉,碎片,斑点路径搜索图案式样,图案, 模式模式识别感觉,概念,理解力透视图从属, 有关, 适合现象学的,现象的摄影测量的摄影测量物理关系饼图导航洼坑象素平面的平面欧几里得空间平面投影平台绘图仪绘图仪绘图插件便携式,袖珍式,小型的点实体指针点方式点数,分数极坐标抛光多边形多义线,折线多形性,多态现象预防措施精确, 精度(多次测量结果之间的敛散程度) 预定义的,预设计的预定、预先偏好先地理空间主外键主码面向处理的纵剖面、轮廓编程工具投影投影所有权,业主原型,典型最接近的,近侧的接近性假的, 伪的伪节点缓冲区查询(数字化仪)鼠标数字化鼠标鼠标毕达哥拉斯方庭,四方院子象限,四分仪四叉树四叉树方格限定,使合格定性的量的定量的、数量的使量子化准量测雷达影像以固定半径建立缓冲区牧场,放牧地等级次序集合法等级评定法栅格栅格数据模型栅格扫描仪栅格空间数据模型分数评定法关系数据结构现成的随需随运行的实时记录娱乐平面坐标纠正多余的,过剩的, 冗余的参考网格自反的自反最近邻体制,状态,方式配准规则模式关系关系关系代数运算符关系数据库关系连接中肯,关联,适宜,适当地势起伏,减轻地势的起伏评论,谈论,谈到遥感遥感遥感遥感的知识库可再产生的相似,相似性,相貌相似类似,像重取样调整形状居住, 驻扎调整大小分辨率分辨率回答者,提取检索检索检索高压脊右稳健的根部均方根旋转迂回的舍入的、凑整的行行和列的编号游程长度编码行程编码鞍点显著的,突出的,跳跃的,凸出的样品, 标本, 样本卫生状况卫星影像可升级的比例尺扫描扫描仪扫描仪扫描仪缺乏,不足情节模式脚本,过程(文件)灌木安全, 安全性选择选择自定义的自编程的语义的,语义学的语义的,语义学的半自动化长半轴半量测短半轴半方差半变差模型半变差图传感器次序集合、集、组改变, 移动基石,岩床联立方程同时地正弦的骨骼,骨架滑动显示模式裂片坡度坡向坡的凸凹性咬合捕捉社会人口统计学的社会经济学的意大利面条自相关函数空间相互关系空间数据GIS的空间数据模型 空间数据库空间决策支持系统空间依赖性空间实体空间模型空间关系空间关系空间统计时空的具体的,特殊的光谱的球空间球状体,回转椭圆体曲线排列文字分割股票持有者单机标准误差,均方差标准操作最新的静态的极射赤面投影极射赤面投影立体测图仪存储空间火炉的烟囱形成阶层的流方式步幅,进展,进步结构化查询语言被串起的细分,再分子程序相减组, 套件,程序组,代替,取代叠加,叠印代理,代用品,代理人测量测量,测量学野外测量数据免受...... 影响的(地图)符号符号,记号对称性给...... 贴上标签剪裁讲究的考虑…接触,相切胶带、带子风流地,高雅地远程登录试验性的术语台地,露台领域,领地,地区棋盘格的,镶嵌的花样的纹理等距圆锥投影兰伯特保形圆锥射影专题的专题图主题,图层泰森图第三方的阈值生产量,生产能力,吞吐量逆冲断层地理控制点等级,一排,一层,平铺费时间的终止节点允许(误差)、容差、容限、限差色调地形图地形学拓扑的拓扑维数拓扑对象拓扑结构建立了拓扑结构的数据集拓扑关系拓扑交替换位,交替使用,卖掉交换,协定,交易事务处理系统变换,转换转置,颠倒顺序巨大的不规则三角网修整真方向投影元组不偏性不确定性海图上未标明的,未知的欠头线合并并集、逻辑的和上升级最上面的城市改造用户友好的效用, 实用,公用事业效用函数含糊的效力,正确,有效性方差,变差变量(变化记录)图矢量矢量空间数据模型经销商言语的, 动词的对,与…相对顶点 (单数)矢量化可实行的,可行的反之亦然反之亦然数据库的表示只读的虚拟的虚拟现实通视性分析视觉的可视化,使看得见的重大的沃伦网格顶点(复数)分水岭杂草,野草 v.除草,铲除清除容限度加权求和法同时在 ...... 距离内异或放大缩小。
Linguistics outline notes
Chapter 1 Invitation to LinguisticsLanguage The Definition(语言的定义)The Design Features Arbitrariness(本质特征)DualityCreativityDisplacement语言先天反射理论The Origin Of Language The bow-bow theory(语言的起源) The pooh-pooh theoryThe “yo-he-yo”theoryJacobos(与The Prague School一致)Referential Functions Of Language Ideational PoeticEmotiveHalliday Interpersonal ConativePhaticTextual MetalingualThe Basic Functions InformativeInterpersonalPerformativeEmotive functionPhatic communion(B.Malinowski 提出)Recreation functionMetalingual function Linguistics The DefinitionThe Main Branches of Linguistics Phonetics(微观语言学) PhonologyMorphologySyntaxSemanticsPragmaticsMacrolinguistics Psycholinguistics(宏观语言学)SociolinguisticsAnthropological LinguisticsComputaioanl LinguisticsDescriptive &PrescriptiveSynchronic&DiachronicImportant Distinctions Langue&ParoleCompetence&PerformanceChapter 2 Speech SoundsPhonetics Acoustic Phonetics (声学语音学)语音学Auditory Phonetics(听觉语言学)Articulatory Phonetics(发声语音学)Speech Organs/Vocal organs(lungs ,trachea,throat,nose.mouth)IPA/Diacritics(变音符)Consonants The definitionThe manner of articulationArticulatory Phonetics The place of articulation(发声语音学)Vowels The definitionThe sound of English:RP/GACardinal vowelsThe requirements of descriptionCoarticulation Anticipatory CoarticulationPerseverative CoarticulationPhonetics transcription Narrow transcriptionBroad transcriptionPhonology 音位理论Minimal Pairs(c ut&p ut)Phone&Phonemes&Allophone(音素&音位&音位变体)音系学C omplementary DistributionFree variants(自由变体)/variation(自由变体现象)Phonological contrasts or opposition(音位对立)Distinctive Features(First developed by Jacobson as a meansof working out a set of phonological contrasts or opposition toCapture particular aspect of language sounds)progressive assimilationPhonological Process音系过程Assimilation Progressive assimilation音素是语音学研究的单位。
Heuristic
Heuristic (computer science)
In computer science, artificial intelligence, and mathematical optimization, a heuristic is a technique designed for solving a problem more quickly when classic methods are too slow, or for finding an approximate solution when classic methods fail to find any exact solution. This is achieved by trading optimality, completeness, accuracy, or precision for speed. In a way, it can be considered a shortcut.

Definition and motivation
The objective of a heuristic is to produce a solution in a reasonable time frame that is good enough for solving the problem at hand. This solution may not be the best of all the actual solutions to this problem, or it may simply approximate the exact solution. But it is still valuable because finding it does not require a prohibitively long time. Heuristics may produce results by themselves, or they may be used in conjunction with optimization algorithms to improve their efficiency (e.g., they may be used to generate good seed values). Results about NP-hardness in theoretical computer science make heuristics the only viable option for a variety of complex optimization problems that need to be routinely solved in real-world applications.

Trade-off
The trade-off criteria for deciding whether to use a heuristic for solving a given problem include the following:
Optimality: When several solutions exist for a given problem, does the heuristic guarantee that the best solution will be found? Is it actually necessary to find the best solution?
Completeness: When several solutions exist for a given problem, can the heuristic find them all? Do we actually need all solutions? Many heuristics are only meant to find one solution.
Accuracy and precision: Can the heuristic provide a confidence interval for the purported solution? Is the error bar on the solution unreasonably large?
Execution time: Is this the best known heuristic for solving this type of problem? Some heuristics converge faster than others. Some heuristics are only marginally quicker than classic methods.
In some cases, it may be difficult to decide whether the solution found by the heuristic is good enough, because the theory underlying that heuristic is not very elaborate.

Examples
Simpler problem
One way of achieving the computational performance gain expected of a heuristic consists in solving a simpler problem whose solution is also a solution to the initial problem. Such a heuristic is unable to find all the solutions to the initial problem, but it may find one much faster because the simpler problem is easy to solve.

Traveling salesman problem
An example of approximation is described by Jon Bentley for solving the traveling salesman problem (TSP) so as to select the order to draw using a pen plotter. TSP is known to be NP-complete, so an optimal solution for even a moderate-size problem is intractable. Instead, the greedy algorithm can be used to give a good but not optimal solution (it is an approximation to the optimal answer) in a reasonably short amount of time. The greedy algorithm heuristic says to pick whatever is currently the best next step regardless of whether that precludes good steps later.
It is a heuristic in that practice says it is a good enough solution, while theory says there are better solutions (and can even tell how much better in some cases).[1]

Search
Another example of a heuristic making an algorithm faster occurs in certain search problems. Initially, the heuristic tries every possibility at each step, like the full-space search algorithm. But it can stop the search at any time if the current possibility is already worse than the best solution already found. In such search problems, a heuristic can be used to try good choices first so that bad paths can be eliminated early (see alpha-beta pruning).

Newell and Simon: Heuristic Search Hypothesis
In their Turing Award acceptance speech, Allen Newell and Herbert A. Simon discuss the Heuristic Search Hypothesis: a physical symbol system will repeatedly generate and modify known symbol structures until the created structure matches the solution structure. Each successive iteration depends upon the step before it; thus the heuristic search learns which avenues to pursue and which to disregard by measuring how close the current iteration is to the solution. Therefore, some possibilities will never be generated, as they are measured to be less likely to complete the solution. A heuristic method can accomplish its task by using search trees. However, instead of generating all possible solution branches, a heuristic selects branches more likely to produce outcomes than other branches. It is selective at each decision point, picking branches that are more likely to produce solutions.[2]

Virus scanning
Many virus scanners use heuristic rules for detecting viruses and other forms of malware. Heuristic scanning looks for code and/or behavioral patterns indicative of a class or family of viruses, with different sets of rules for different viruses. If a file or executing process is observed to contain matching code patterns and/or to be performing that set of activities, then the scanner infers that the file is infected. The most advanced part of behavior-based heuristic scanning is that it can work against highly randomized polymorphic viruses, which simpler string-scanning-only approaches cannot reliably detect. Heuristic scanning has the potential to detect many future viruses without requiring the virus to first be detected somewhere, submitted to the virus scanner developer, analyzed, and a detection update provided to the scanner's users.
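To make the greedy rule described under "Traveling salesman problem" above concrete, here is a minimal nearest-neighbour sketch in Python; the city coordinates are invented for the example, and this is just one possible greedy variant, not an implementation taken from Bentley.

import math

def tour_length(points, order):
    # Total length of the closed tour that visits the points in the given order.
    return sum(math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def greedy_tsp(points, start=0):
    # Nearest-neighbour heuristic: always move to the closest unvisited city.
    # Runs in O(n^2) time but offers no guarantee of optimality.
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        last = tour[-1]
        nearest = min(unvisited, key=lambda j: math.dist(points[last], points[j]))
        tour.append(nearest)
        unvisited.remove(nearest)
    return tour

cities = [(0, 0), (1, 5), (5, 2), (6, 6), (8, 3)]   # made-up coordinates
tour = greedy_tsp(cities)
print(tour, round(tour_length(cities, tour), 2))

The tour it prints is usually short but not guaranteed to be the shortest; that is exactly the optimality-for-speed trade-off discussed above.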
Pengshan District No. 1 High School, Meishan City, Sichuan Province: 2024-2025 academic year, Senior Three, first-semester opening examination, English test paper
四川省眉山市彭山区第一中学2024-2025学年高三上学期开学考试英语试题一、阅读理解There are tons of physics textbooks available around the world. Based on our web research, here are our top four picks with the introduction of physics in simple, practical language.Mechanics, Relativity, and ThermodynamicsThis book is a collection of online teachings by Professor R. Shankar. Shankar is one of the first to be involved in the innovative Open Yale Courses program. It is a perfect introduction to college-level physics. Students of chemistry, engineering, and AP Physics will find this book helpful.Physics for Students of Science and EngineeringThis book helps students to read scientific data, answer scientific questions, and identify fundamental concepts. The new and improved 10th edition features multi-media resources, and questions to test students’ understanding of each concept.The Feynman Lectures on PhysicsRichard Feynman is regarded as one of the greatest teachers of physics to walk the face of the earth. This book is a collection of Feynman’s lectures. In his words, these lectures all began as an experiment, which, in turn, formed the basis of this book.University Physics with Modern PhysicsThe book is recognized for teaching and applying principles of physics through a narrative (叙事的) method. To ensure a better understanding and ability to apply these concepts, worked examples are provided, giving students tools to develop problem-solving skills and conceptual understanding.1.What do the first two books have in common?A.They are improved editions.B.They are written by professors.C.They favor students of engineering.D.They feature multi-media resources.2.Which book best suits students who enjoy learning physics through practical examples?A.Mechanics, Relativity, and Thermodynamics.B.Physics for Students of Science and Engineering.C.The Feynman Lectures on Physics.D.University Physics with Modern Physics.3.Where is this text probably taken from?A.An online article.B.A research paper.C.A physics textbook.D.A science journal.Tech businessman Jared Isaacman, who made a fortune in tech and fighter jets, bought an entire flight and took three “everyday” people with him to space. He aimed to use the private trip to raise $200 million for St. Jude Children’s Research Hospital, half coming from his own pocket.His crew included a St. Jude worker with direct ties to the activity, representing the activity’s pillar (核心) of Hope, a professor, and another person, representing the pillar of Generosity, chosen as part of a $200 million St. Jude fundraising program. All were invited to join in donating to reach the ambitious overall campaign goal in support of St. Jude’s current multi-billion dollar expansion to speed up research advances and save more children worldwide. Anyone donating to St. Jude would be entered into a random drawing for the “Generosity” seat.Isaacman has been “really interested in space” since he was in kindergarten. He dropped out of high school when he was 16, got a GED certificate and started a business in his parents’ basement that became the beginning of Shift4 Payments, a credit card processing company. He set a speed record flying around the world in 2009 while raising money for the Make-A-Wish program, and later established Draken International, the world’s largest private fleet (舰队) of fighter jets.Now he has realized his childhood dream-boarding a spaceship, launched in Florida and orbiting the Earth for three days in the history-making event. He called it an “epic (史诗般的) adventure”. 
“I truly want us to live in a world 50 or 100 years from now where people are jumping their rockets,” Isaacman said. “And if we’re going to live in that world, we’d better deal withchildhood cancer successfully along the way.”4.Why did Isaacman raise funds for St. Jude?A.To expand a fundraising programme.B.To perform an act of great generosity.C.To make his childhood dream come true.D.To encourage St. Jude’s life-saving work. 5.What is mainly talked about in paragraph 3?A.The commercial skills of Isaacman.B.The growth experience of Isaacman.C.The reason for Isaacman’s good deeds.D.The beginning of Isaacman’s business. 6.What can be learned about the “epic adventure”?A.It was a multi-day journey.B.It will be common in the future.C.It involved three civilians in total.D.It is a symbol of hope for a better life. 7.What message is conveyed in Isaacman’s story?A.No sweet without sweat.B.Many hands make light work.C.Nothing is impossible to a willing heart.D.A penny saved is a penny earned.Is diet soda safe? If you’re concerned about sugar, diet products seem a better option, sweet and not so bad for you. Wrong! Drinking diet soda regularly can increase your risk of diseases. Despite the fact that we call these drinks “diet”, the artificial sweeteners they contain are linked to weight gain, not loss.There’s the latest evidence that they increase the risk of depression, which comes from a new analysis by researchers at Harvard Medical School. The team drew upon a data set of nearly 32,000 female nurses, ages 42 to 62 when the study began. It turned out that the nurses who consumed the most diet drinks had a 37 percent higher chance of depression, compared to those who drank the least or none.Diet soda also increases your risk of stroke (中风), according to a separate meta-analysis that included 72 studies. Looking for the causes behind the stroke, researchers took various blood measurements when 12 healthy volunteers in their 20s drank water, soda, or diet soda. The result showed that both sodas slowed the flow of blood within the brain. Though the effect didn’t seem sufficient to cause stroke, slower blood flow could have accumulating effects.Other researchers have found that diet soda increases the risk of dementia (痴呆), from data from nearly 178,000 volunteers tracked over an average of nine years. That’s not a big surprise.An earlier study of about 4,300 volunteers concluded that drinking diet soda every day was tied to three times the risk of dementia over the following decade. The researchers looked at brain scans and the results of mental function assessments. A daily diet soda was linked to smaller brains and aggravates long-term memory, two risk factors for dementia.Avoiding depression, stroke, and dementia is an obvious goal for whoever desires to age healthily. So you know what to do.8.How does the author present his point of view?A.By analyzing causes.B.By giving opinions.C.By quoting specialists.D.By presenting research.9.What effect might diet soda have on people?A.Slight weight loss.B.Increased blood flow.C.Raised depression risk.D.Severe mental decline.10.Which can best replace the underlined word “aggravates” in paragraph 4?A.Deletes.B.Worsens.C.Motivates.D.Stimulates. 11.What might the author advise us to do?A.Quit consuming diet sodas.B.Limit the daily sugar intake.C.Set achievable health goals.D.Follow fixed aging process.Recent developments in robotics, artificial intelligence, and machine learning have brought us in the eye of the storm of a new automation age. 
About half of the work carried out by people was likely to be automated by 2055 with adaption to technology, a McKinsey Global Institute report predicted.Automation can enable businesses to improve performance by reducing errors and improving quality and speed, and in some cases achieving outcomes that go beyond human capabilities. At a time of weak productivity growth worldwide, automation technologies can provide the much-needed promotion of economic growth, according to the report. Automation could raise productivity growth globally by 0.8 percent to 1.4 percent. At a global level, technically automated activities involved 1.1 billion employees and 11.9 trillion U.S. dollars in wages, the report said.The report also showed that activities most influenced by automation were physical ones inhighly structured and predictable environments, as well as data collection and processing. In the United States, these activities make up 51 percent of activities in the economy, accounting for almost 2.7 trillion dollars in wages. They are most common in production, accommodation and food service, and the retail (零售) trade. And it’s not just low-skill, low-wage work that is likely to be influenced by automation; middle-skill and high-paying, high-skill occupations, too, have a degree of automation potential.The robots and computers not only can perform a range of routine physical work activities better and more cheaply than humans, but are also increasingly capable of accomplishing activities that require cognitive (认知的) capabilities, such as feeling emotions or driving.While much of the current debate about automation has focused on the potential that many people may be replaced and therefore lose their financial resources, the analysis shows that humans will still be needed: The total productivity gains will only come about if people work alongside machines.12.What is the report mainly about?A.Comparisons of robots with humans.B.Analysis of automation’s potential in economy.C.Prediction of the unemployment problem.D.Explanations of the concept of the automation age.13.What might happen in 2055 according to the text?A.Automation will cause weak productivity growth.B.Automation will reduce employees’ wages.C.Activities like data collection and processing will disappear.D.Activities involve feeling emotions can be performed by robots.14.How does the author feel about human workers?A.Worried.B.Mixed.C.Optimistic.D.Doubtful.15.Which can be a suitable title for the text?A.Automation: A challenge to all?B.Automation: Where to go from here?C.Automation: Who is the eventual winner?D.Automation: A future replacement for humans?Sustainable travel is now one of the fastest-growing movements. Its goal is to meet the needs of the tourism industry without harming natural and cultural environments. 16 Here are some concrete ways to reduce your environmental impact as a traveler.17 Travel doesn’t have to be about going somewhere far away. It’s the art of exploration, discovery and getting out of your comfort zone, all of which can just as well be nearby. Find somewhere nearby you haven’t been, get in your car, and go for a visit. You never know what you’ll come across.Make greener transportation choices. After walking, public transportation is the next best way to explore new destinations. 18 When it comes to longer distances, buses and trains are your best way of getting around, both of which can be quite an experience in and of itself.Avoid over-visited destinations. If you can, avoid places with over-tourism. 
You’ll find fewer crowds and lower prices, and you also won’t be putting as much pressure on local communities struggling to keep up. And, from a personal-enjoyment point of view, who wants to deal with crowds or long lines? No one. 19Take a nature-related trip. If you want to better understand and appreciate the natural world, try taking a trip with the single purpose of connecting with nature. 20 I promise that when you come home, you’ll have a new viewpoint on why we’re all so focused on being environmentally friendly these days.A.Stay close to home.B.Find an ideal place to explore.C.Sustainable travel can be useful to support communities.D.Not only is it better for the environment, but it’s cheaper as well.E.Get in touch with the world in a way that sitting at home doesn’t.F.If not managed properly, tourism can have incredibly negative impacts.G.Visiting less-visited destinations can be much more enjoyable and rewarding.二、完形填空Last Friday, I headed to work on a crowded subway. Eyes glued to my 21 , I surfed the Internet. As the doors closed, I heard the overhead voice. I generally 22 the repeated announcements. But this one was 23 .“Good morning,” said an energetic voice. It was such a nice voice, with such a nice 24 , that I looked up, catching the eye of a fellow 25 . “Paddington Station will be your next stop, your first opportunity to change for the two or three trains. It’s a new day, a new year, and a time for second chances. Please 26 your steps as you leave the train!”I smiled, and the woman whose eyes I’d caught smiled, too. We 27 . Then we did the thing that nobody ever does on a subway — we 28 to each other. Other passengers smiled, too. Our smiles lasted as the train reached Paddington Station. Together, we 29 to the very train that we might have the opportunity to 30 in limited time. On this train, I felt relieved and smiled. Then I got off at my stop and started my day. I felt so good in the office. That nice feeling 31 all day.What happened? Could it be that an unusually 32 announcement and small talks with a 33 changed my mood? Yes, I believed so. Maybe I enjoyed the smile, the laugh, and the 34 philosophy. I realized that just saying “hello” might make you feel unexpectedly good. It’s the 35 , though, that makes me feel most important.21.A.seat B.phone C.book D.exit 22.A.forget B.doubt C.mistake D.ignore 23.A.different B.similar C.terrible D.funny 24.A.greet B.sense C.tone D.note 25.A.director B.passenger C.worker D.guide 26.A.take out B.speed up C.arrange for D.watch out for 27.A.laughed B.stopped C.refused D.wondered 28.A.referred B.objected C.spoke D.turned 29.A.walked B.rushed C.moved D.headed 30.A.miss B.repair C.control D.catch 31.A.ended B.began C.lasted D.changed 32.A.optimistic B.meaningful C.amusing D.powerful33.A.friend B.colleague C.stranger D.broadcaster 34.A.irregular B.improper C.illogical D.unexpected 35.A.transportation B.connection C.direction D.invitation三、语法填空阅读下面短文,在空白处填入1个适当的单词或括号内单词的正确形式。
Considering it from an ecosystem perspective (English)
从生态系统的角度考虑英文English:From an ecosystem perspective, it is crucial to consider the interconnected web of relationships between living organisms and their physical environment. Ecosystems rely on the delicate balance of energy flow, nutrient cycling, and biodiversity to maintain their functionality and resilience. Each organism plays a unique role in the ecosystem, whether as a producer, consumer, or decomposer, contributing to the overall stability of the system. Human activities, such as deforestation, pollution, and overexploitation of natural resources, can disrupt these intricate relationships and have far-reaching consequences on the health and sustainability of ecosystems. Therefore, it is essential to manage ecosystems sustainably, taking into account their inherent complexity and interconnectedness to ensure their long-term health and productivity.中文翻译:从生态系统的角度来看,考虑生物与其物理环境之间相互关系的错综复杂网格至关重要。
and years of study lie ahead for whale researchers
and years of study lie ahead forwhale researchersFor whale researchers, the road ahead is paved with years of study. Their journey is one of dedication, perseverance, and a deep fascination with these magnificent creatures of the sea. The study of whales is a complex and ever-evolving field, requiring a diverse array of skills and knowledge.Years of study lie ahead as they delve into the intricacies of whale behavior, migration patterns, and communication. They will spend countless hours observing whales in their natural habitats, documenting their movements, and analyzing their sounds and signs. This painstaking work is essential in understanding the lives of these elusive animals and protecting them for future generations.Furthermore, whale researchers must stay up-to-date with the latest scientific advancements and techniques. They will engage in ongoing research, collaborate with colleagues, and contribute to the global body of knowledge on whales. By sharing their findings and insights, they can drive conservation efforts and raise awareness about the importance of these gentle giants.The years of study also involve facing challenges and uncertainties. Fieldwork can be physically and mentally demanding, with long hours at sea and unpredictable weather conditions. However, the reward lies in the discovery of new insights and the ability to make a meaningful contribution to the conservation of whale populations.In the face of these challenges, whale researchers remain driven by a passion for the conservation and understanding of these incredible animals. Their work is not only about adding to scientific knowledge but also about safeguarding the future of whales and their ocean homes. With each passing year, their study brings us closer to a greater appreciation and protection of these majestic creatures.。
English expressions in finance: individual recognition
个别认定的英文财经表述Title: The Impact of Individual Recognition on Financial PerformanceIn the dynamic field of finance, the concept of individual recognition has gained significant attention in recent years. It's the acknowledgment of unique skills, contributions, and achievements of individuals within an organization, and its impact on financial performance is profound.Firstly, individual recognition fosters a sense of belonging and motivation among employees. When employees feel valued and respected for their efforts, they are more likely to work harder, be more innovative, and take on additional responsibilities. This positive attitude trickles down to their work output, leading to improved financial outcomes.Secondly, individual recognition can lead to a more efficient allocation of resources. By recognizing the unique skills and abilities of individuals, organizations can allocate tasks and projects based on their expertise. This targeted approach not only maximizes the use of resources but also leads to better financial returns on investments.Moreover, individual recognition can enhance employee retention. When employees feel recognized and rewarded for their work, they are less likely to seek employment elsewhere. This reduction in employee turnover can significantly reduce the costs associated with hiring and training new employees, leading to financial savings.However, it's crucial to ensure that the recognition is authentic and well-deserved. Improper or biased recognition can lead to dissatisfaction and a negative work environment. Organizations need to establish clear criteria for recognition and ensure that the process is fair and transparent.In conclusion, individual recognition has a significant impact on financial performance. It not only boosts employee morale and motivation but also leads to more efficient resource allocation and reduced employee turnover. By implementing meaningful recognition programs, organizations can enhance their financial outcomes while fostering a positive and inclusive work environment.。
Detailed scoring criteria for the Hunan English essay
湖南英语大作文评分细则Sure, here is a detailed scoring rubric for the English essay in the Hunan province:1. Content (40 points):Relevance to the Topic (10 points): Does the essay directly address the given topic? Is there a clear understanding of the prompt?Depth of Analysis (15 points): Does the essay demonstrate a thorough understanding of the topic? Are ideas explored in depth with supporting evidence and examples?Originality and Creativity (10 points): Does the essay offer fresh insights or perspectives on the topic? Are there unique ideas presented?Coherence and Organization (5 points): Is the essaywell-structured and logically organized? Are there clear transitions between paragraphs and ideas?2. Language Use (30 points):Grammar and Syntax (10 points): Are sentences grammatically correct? Is there varied sentence structure? Are there any syntactical errors?Vocabulary (10 points): Is a wide range of vocabulary used appropriately? Are words used accurately to convey meaning?Idiomatic Language (5 points): Are idiomatic expressions and phrases used appropriately? Does the language sound natural and fluent?Spelling and Punctuation (5 points): Are there any spelling errors? Are punctuation marks used correctly?3. Argumentation and Persuasion (20 points):Clarity of Argument (5 points): Is the main argument clearly articulated and supported throughout the essay?Evidence and Examples (10 points): Are there sufficient evidence and examples to support the main argument? Are they relevant and effectively integrated into the essay?Counterargument and Rebuttal (5 points): Does the essay address potential counterarguments? Are they effectively refuted?4. Overall Impression (10 points):Engagement and Interest (5 points): Does the essay engage the reader? Is it interesting and thought-provoking?Adherence to Instructions (5 points): Does the essay meet the specified length requirement? Does it adhere to the guidelines provided in the prompt?Total Score: 100 points.Remember, this rubric is just a general guideline, and actual scoring may vary depending on specific instructions and the discretion of the examiners.。
Sequence Classification
Abstract. In this paper we present an application of neural networks to biomedical data mining. Specifically we propose a hybrid approach, combining similarity search and Bayesian neural networks, to classify protein sequences. We apply our techniques to recognizing the globin sequences obtained from the database maintained in the Protein Information Resources (PIR) at the National Biomedical Research Foundation. Experimental results indicate an excellent performance of the proposed approach.
Classification, or supervised learning, is one of the major data mining processes. Classification is to partition a set of data into two or more categories. When there are only two categories, it is called binary classification. Here we focus on binary classification of protein sequences. In binary classification, we are given some training data including both positive and negative examples. The positive data belongs to a target class, whereas the negative data belongs to the non-target class. The goal is to assign unlabeled test data to either the target class or the non-target class. In our case, the test data are some unlabeled protein sequences, the positive data are protein sequences belonging to the globin superfamily in the PIR database and the negative data are non-globin sequences. We use globin sequence classification as an example, though our techniques should generalize to any type of protein sequences.
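The hybrid similarity-search-plus-Bayesian-neural-network method itself is not spelled out in this excerpt, but the flavour of similarity-based binary sequence classification can be sketched as follows; the k-mer profile representation, the 1-nearest-neighbour rule, and the toy sequences are illustrative assumptions, not the method or the PIR data of the paper.

from collections import Counter

def kmer_profile(seq, k=3):
    # Counts of overlapping k-mers; a crude length-independent representation.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def similarity(p, q):
    # Cosine similarity between two k-mer count profiles.
    dot = sum(p[m] * q[m] for m in set(p) & set(q))
    norm = (sum(v * v for v in p.values()) * sum(v * v for v in q.values())) ** 0.5
    return dot / norm if norm else 0.0

def classify(test_seq, labelled_seqs, k=3):
    # Assign the label of the most similar labelled training sequence (1-NN).
    profile = kmer_profile(test_seq, k)
    best = max(labelled_seqs, key=lambda item: similarity(profile, kmer_profile(item[0], k)))
    return best[1]

train = [("MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMF", "globin"),      # toy fragments,
         ("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQA", "non-globin")]  # not real PIR entries
print(classify("MVLSGEDKSNIKAAWGKIGGHGAEYGAEALERMF", train))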
Guangdong Province 2025 Senior High School Graduating Class First Diagnostic Examination (English)
Guangdong Province 2025 Senior High School Graduating Class First Diagnostic Examination: English. This paper has 8 pages; the time allowed is 120 minutes and the full score is 120 points.
Notes: 1. Before answering, candidates must fill in their city (county/district), school, class, name, examination room number and seat number on the answer sheet, and stick the barcode horizontally in the "barcode area" at the top left corner of each answer sheet.
2. For multiple-choice questions, after selecting the answer to each item, use a 2B pencil to blacken the information point of the corresponding option on the answer sheet; if a change is needed, erase the mark cleanly before filling in another answer.
Answers must not be written on the test paper.
3. Non-multiple-choice questions must be answered with a black-ink fountain pen or roller-ball pen, and the answers must be written in the designated area of each question on the answer sheet; if a change is needed, cross out the original answer first and then write the new one; pencils and correction fluid are not allowed.
Answers that do not follow the requirements above are invalid.
4. Candidates must keep the answer sheet clean.
After the examination, hand in the test paper together with the answer sheet.
Part II Reading (two sections, 50 points in total). Section 1 (15 items; 2.5 points each, 37.5 points in total). Read the following passages and choose the best option from A, B, C and D for each question.
ATICKJETS FOR KENSINGTON PALACJE AND UNTOJLJD JLIVESKensington Palace TicketsAn admission ticket includes access to all public areas of the palace and gardens including: Untold Lives exhibition, Victoria:A Royal Childhood, The King's State Apartments and The Queen’s State Apartments.(续表)How to Get Tickets You've Bought OnlineDownload your PDF ticket to your mobile for scanning(扫描) at the entrance or click the link in the email that we’ll send you and print out all your tickets.If you are not able to download your e-tickets using the link in your confirmation email, please show your reference number which begins 42xxxxxxxxx to the ticket desk when you arrive and staff on site will be able to print your tickets for you.21. What can a Kensington Palace ticket be used to do?A. Serve as an identification card.B. Provide discounts for kid tickets.C. Offer free visit to several places.D. Show how to print online tickets.22. How much should a class of 20 pupils and a teacher pay for the entry?A. About £21.B. About£264.C. About £404.D. About£464.23. What is needed when you have your tickets printed on site?A. The cellphone screen.B. The reference number.C. The ticket price table.D. The confirmation email.BAs a college professor, I am required to hold an office hour before my lecture. These office hours are optional and tend to be busier at the beginning and end of a semester(学期).In the middle, they can become quiet. A few years ago I was given a flute(长笛) as a gift, so I decided that I would use my quiet office hours to practice this new instrument. The experience brought unexpected insights into performance anxiety.I held my office hour in the near-empty lecture hall, one hour before the class began. The hall was open to any student who wished to talk with me about coursework or to take a seat and quietly read before the lecture began. I would assemble (组装) my flute, open my lesson book, and begin working on the instrument I had never played before. I also followed online video lesson s-a ll done in front of a few students who would come early to class.I would begin playing l ong tones, closing my eyes and “forgetting” that anyone was in the room with me. I was surprised to find that I felt no anxiety while learning a new instrument in front of others. Had I been playing my main instrument, I would have had more concern about the level of my playing and how my playing was being received. However, in this setting, it was clear that I was an absolute beginner with no expectations of impressing anyone with my mastery. My attention was set on figuring the instrument out. I had no expectations of how I would sound and had little expectations of sounding like anything more than a beginner.There have been many things I have learned from my experiment of learning an instrument in public. Whenever musicians talk with me about their stage fright, I offer them this story.24. What is “an office hour” for?A. The professors to show talents.B. The students to appreciate music.C. The teachers to offer consultation.D. The lecturers to make preparations.25. Why did the author play a flute?A. To pass the time.B. To give a lecture.C. To do a research.D. To attract students.26. What made the author at ease when playing the flute?A. The technique from the video.B. His impressive performance.C. The audience’s active response.D. His concentration on playing.27. Which of the following is a suitable title for the text?A. My Joy of Learning a New ThingB. My Tip on Performing in the PublicC. 
My Discovery to Ease Stage FrightD. My Office Hour Before Every LessonCAs AI develops, it becomes challenging to distinguish between its content and human-created work. Before compar ing both, it’s good to know about the Perplexity & Burstiness of a text.Perplexity is a measurement used to evaluate the performance of language models in predicting the next word in a group of words. It measures how well the model can estimate the probability of a word occurring based on the previous context. A lower perplexity score indicates better predictability and understanding of the language, while a higher perplexity score suggests a higher degree of uncertainty and less accurate predictions. The human mind is so complex compared to current AI models that human-written text has high perplexity compared to AI-generated text.Examples :High Perplexity: “The teapot sang an opera of hot, wheeling tea, every steamy note a symphony of flavor. ”Low Perplexity: “I poured hot water into the teapot, and a fresh smell filled the room. ”Burstiness refers to the variation in the length and structure of sentences within a piece of content. It measures the degree of diversity and unpredictability in the arrangement of sentences. Human writing often exhibits bursts and lulls (间歇) , with a mix of long and short sentences, while AI-generated content tends to have a more uniform and regular pattern. Higher burstiness indicates greater creativity, spontaneity (自发性) , and engagement in writing, while lower burstiness reflects a more robotic and monotonous (单调的) style. Just like the perplexity score, human-written content usually has a high burstiness score.Examples :High Burstiness: “The alarm screamed. Feet hit the floor. The tea kettle whistled. Steam streamed. Heart pounded. The world, awake. ”Low Burstiness: “In the peaceful morning, the alarm clock’s soft ring greeted a new day. I walked to the kitchen, my steps light and unhurried. The tea kettle whistled its gentle song, a comforting tune that harmonized with the steam’s soft whisper. ”Here, I wrote a passage on the “Importance of l ifelong learning”myself and also asked ChatGPT to do the same to compare better AI-generated and human-written text.28. What do Perplexity & Burstiness probably serve as?A. Complexities of a language.B. Criteria on features of a text.C. Phenomena of language varieties.D. References in generating a text.29. What are the characteristics of an Al-generated text?A. Low perplexity and low burstiness.B. High perplexity and low burstiness.C. Low perplexity and high burstiness.D. High perplexity and high burstiness.30. Which of the writing ways below does the author skip when developing the article?A. Quoting sayings.B. Showing examples.C. Giving definitions.D. Making comparisons.31. What will be probably talked about next?A. Some essays from ChatGPT.B. An illustration for differences.C. An example of the writer’s own.D. Analyses of lifelong learning.DWhen stressed out, many of us turn to junk food like deep-fried food for comfort. But a new research suggests this strategy may backfire. The study found that in animals, a high-fat diet disrupts resident gut bacteria (肠道细菌) , changes behavior and, through a complex pathway connecting the gut to the brain, influences brain chemicals in ways chat fuel anxiety.“Everyone knows that these are not healthy foods, but we tend to think about them strictly in terms of a little weight gain,”said lead author Christopher Lowry, a professor of integrative physiology at CU Boulder. 
“If you understand that they also impact your brain in a way that can promote anxiety, that makes the risk even higher.”Lowry’s team divided mice into two groups: Half got a standard diet of about 11% fat for nine weeks; the others got a high-fat diet of 45% fat, consisting mostly of fat from animal products. The typical American diet is about 36% fat, according to the Centers for Disease Control and Prevention.When compared to the control group, the group eating a high-fat diet, not surprisingly, gained weight. But the animals also showed significantly less diversity of gut bacteria. Generally speaking, more bacterial diversity is associated with better health, Lowry explained. The high-fat diet group also showed higher expression of three genes(基因)(tph2, htrla, and slc6a4) involved in production and signaling of the brain chemical called serotoni n-particularly in a region of the central part of the brain known as the dorsal raphe nucleus cDRD, which is associated with stress and anxiety. While serotonin is often billed as a “feel-good brain chemical”, Lowry notes that certain subsets of serotonin neurons(神经元)can, when activated, touch off anxiety-like responses in animals. Especially, heightened expression of tph2 in the cDRD has been associated with mood disorders in humans.“To think that just a high-fat diet could change expression of these genes in the brain is extraordinary,” said Lowry.“The high-fat group essentially had a high anxiety state in their brain. ” However, L owry stresses that not all fats are bad, and that healthy fats like those found in fish, nuts and seeds can be good for the brain.32. What is山e new finding?A. Junk food leads to overweight.B. High-fat food brings bad moods.C. Brain chemicals cause anxiety.D. Gut bacteria benefit brain health.33. What does the underlined word “disrupts” in paragraph l mean?A. Upsets.B. Facilitates.C. Loosens.D. Generates.34. How were the mice eating a high-fat diet by contrast with the control group?A. They looked more anxious.B. They lost much more weight.C. They suffered mood disorders.D. They lacked gut bacteria variety.35. What does Lowry agree with?A. Every fat is harmful.B. Fish fat is harmless.C. Stress comes from fat.D. Some fats are good.第二节(共5小题;每小题2.5分,满分12.5分)阅读下面短文,从短文后的选项中选出可以填入空白处的最佳选项。
Feature derivation (English)
特征衍生英语全文共四篇示例,供读者参考第一篇示例:Feature engineering is the process of transforming raw data into useful features that can help machine learning models perform better. It is a crucial step in the data preprocessing phase, as the quality of features directly impacts the performance of the model. In this article, we will explore the concept of feature engineering, its importance, and some common techniques used in the process.1. What is feature engineering?Feature engineering plays a critical role in the success of a machine learning project. Here are some reasons why feature engineering is important:4. Conclusion第二篇示例:Feature Engineering in Machine LearningImportance of Feature EngineeringFeature engineering is crucial for the success of a machine learning project. Good features can significantly improve the performance of a model, while bad features can lead to poor performance. Some of the reasons why feature engineering is important include:第三篇示例:特征衍生是数据科学中一个重要的概念,它指的是通过对现有数据特征进行组合、转换或处理,创造出新的特征,以提高模型的性能或解释力。
Environmental Ecology --- Chapter 2: Adaptation
Coevolution
Definition of coevolution: a trait of one species evolves in response to a trait of another species, while that trait of the second species has itself evolved in response to the trait of the first; evolution of this kind is called coevolution.
"Co-" implies a non-antagonistic, relaxed relationship; the two sides evolve at different rates -- asymmetric evolution.
Predators find it hard to specialize; the rarity-avoidance effect; the life-dinner principle; predator densities are low, so inbreeding is more common; predators reproduce slowly (while parasites evolve quickly).
Adaptation
Any heritable characteristic of an organism that helps it survive and reproduce is an adaptation. Adaptive characteristics may be physiological or behavioral. Adaptation is the result of natural selection.
Adaptation
Adaptation: the process by which organisms adjust to environmental pressures. It is divided into genotypic and phenotypic adaptation; the latter includes reversible and irreversible adaptation. An example is the change in color morphs of the peppered moth in polluted areas.
Environmental variation
Most organisms must cope with an external environment that changes continually over a range of time scales. Some environmental factors change over seconds or minutes (such as the intensity of sunlight when clouds pass over), others over days or seasons, or over even longer periods (such as glacial cycles).
Internal regulation
Biological cells cannot function with a wildly fluctuating
Negative feedback
The homeostatic mechanisms of most organisms work in roughly the same way: if the internal level of some factor (such as temperature or osmotic pressure) is too high, the mechanism reduces it; if the level is too low, it raises it. This process is called negative feedback. A negative-feedback response acts in the direction opposite to the signal.
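A negative-feedback controller of this kind is easy to caricature in code; the set point, step size, and starting value below are arbitrary and purely illustrative.

def regulate(level, set_point=37.0, step=0.5):
    # Negative feedback: push the level back toward the set point.
    if level > set_point:
        return level - step   # too high -> reduce it
    if level < set_point:
        return level + step   # too low  -> raise it
    return level

temperature = 40.0
for _ in range(10):
    temperature = regulate(temperature)
print(round(temperature, 1))   # has moved back to the set point of 37.0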
Tolerance
Organisms can cope with variation in their external
extended with a simple feature-weighting similarity function, outperforms ib1, and sometimes also outperforms both connectionist approaches and knowledge-based "linguistic engineering" approaches [VD93]. The similarity function we introduced in lazy learning [DV92] consisted simply of multiplying, when comparing two feature vectors, the similarity between the values for each feature with the corresponding information gain, or in case of features with different numbers of values the gain ratio, for that feature. We call this version of lazy learning ib1-ig. During experimentation with the linguistic problems, we also found that accuracy (generalisation performance) decreased considerably when the case base is pruned in some way (e.g., using ib2 [Aha91], or by eliminating non-typical cases). Keeping available all potentially relevant case information turns out to be essential for good accuracy on our linguistic problems, because they often exhibit a lot of sub-regularity and pockets of exceptions that have potential relevance in generalisation. Unfortunately, as the prediction function in lazy learning has to compare a test case to all stored cases, and our linguistic datasets typically contain hundreds of thousands of cases or more, processing of new cases is prohibitively slow on single-processor machines. Based on these findings with ib1-ig, we designed a variant of ib1 in which the case base is compressed into a tree-based data structure in such a way that access to relevant cases is faster, and no relevant information about the cases is lost. This simple algorithm, igtree [VD93, DVW97], uses a feature-relevance metric such as information gain to restructure the case base into a decision tree. In Section 2, we describe the igtree model and its properties. Section 3 describes comparative experiments with igtree, ib1, and ib1-ig on learning one of our linguistic tasks and some of the uci benchmark problems. In Section 4, we discuss problems for igtree with other benchmark datasets, introduce tribl, and describe comparative experiments. We discuss related research in Section 5, and present our conclusions in Section 6.
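In symbols, the ib1-ig weighting just described amounts to a weighted overlap distance; the formulation below is a standard reconstruction of information-gain weighting rather than a quotation from [DV92]:

\Delta(X, Y) = \sum_{i=1}^{n} w_i \, \delta(x_i, y_i), \qquad \delta(x_i, y_i) = \begin{cases} 0 & \text{if } x_i = y_i \\ 1 & \text{otherwise} \end{cases}

w_i = H(C) - \sum_{v \in V_i} P(v) \, H(C \mid v)

where C is the set of class labels, V_i the set of values of feature i, and H the entropy. A test case is assigned the class of the stored case(s) with the smallest \Delta; for features with many values, the gain ratio divides w_i by the split information -\sum_{v \in V_i} P(v) \log_2 P(v).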
Abstract. This paper reports results with igtree, a formalism for indexing and compressing large case bases in Instance-Based Learning (ibl) and other lazy-learning techniques. The concept of information gain (entropy minimisation) is used as a heuristic feature-relevance function for performing the compression of the case base into a tree. igtree reduces storage requirements and the time required to compute classifications considerably for problems where current ibl approaches fail for complexity reasons. Moreover, generalisation accuracy is often similar, for the tasks studied, to that obtained with information-gain-weighted variants of lazy learning, and alternative approaches such as c4.5. Although igtree was designed for a specific class of problems -- linguistic disambiguation problems with symbolic (nominal) features, huge case bases, and a complex interaction between (sub)regularities and exceptions -- we show in this paper that the approach has a wider applicability when generalising it to tribl, a hybrid combination of igtree and ibl.
2 IGTree
In this Section, we provide both an intuitive and an algorithmic description of igtree, together with some analysis of complexity issues. A more detailed discussion can be found in [DVW97]. igtree compresses a case base into a decision tree by recursively partitioning the case base on the basis of the most relevant features. All nodes of the tree contain a test (based on one of the features) and a class label (representing the most probable (most frequently occurring) class of the case-base partition indexed by that node). Nodes are connected via arcs denoting the outcomes for the test (feature values), so that individual cases are stored as paths of connected nodes. A feature-relevance ordering technique (e.g., information gain) is used to determine a fixed order in which features are used as tests throughout the whole tree. Thus, the maximal depth of the tree is always equal to the number of features. A considerable compression is obtained as similar cases share partial
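A compact sketch of the construction and retrieval procedure described above is given below; it is a simplified reading of the algorithm (no pruning of leaves that merely repeat their parent's default class, no gain-ratio option) over a toy symbolic dataset, and should not be taken as the authors' reference implementation.

import math
from collections import Counter, defaultdict

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(cases, labels, feature):
    # H(C) minus the expected entropy after partitioning on one feature.
    by_value = defaultdict(list)
    for case, label in zip(cases, labels):
        by_value[case[feature]].append(label)
    remainder = sum(len(part) / len(labels) * entropy(part) for part in by_value.values())
    return entropy(labels) - remainder

def build_igtree(cases, labels, feature_order):
    # A node is (default class, feature tested here, {feature value: child node}).
    default = Counter(labels).most_common(1)[0][0]
    if not feature_order or len(set(labels)) == 1:
        return (default, None, {})
    feature, rest = feature_order[0], feature_order[1:]
    partitions = defaultdict(lambda: ([], []))
    for case, label in zip(cases, labels):
        partitions[case[feature]][0].append(case)
        partitions[case[feature]][1].append(label)
    children = {value: build_igtree(sub_cases, sub_labels, rest)
                for value, (sub_cases, sub_labels) in partitions.items()}
    return (default, feature, children)

def classify(tree, case):
    default, feature, children = tree
    if feature is None or case[feature] not in children:
        return default            # unseen path: fall back on the stored default class
    return classify(children[case[feature]], case)

# Toy symbolic data: three nominal features, two classes (invented for illustration).
cases  = [("a", "x", "p"), ("a", "y", "p"), ("b", "x", "q"), ("b", "y", "q"), ("a", "x", "q")]
labels = ["+", "+", "-", "-", "+"]
order  = sorted(range(3), key=lambda f: -information_gain(cases, labels, f))
tree   = build_igtree(cases, labels, order)
print(classify(tree, ("b", "x", "p")))

Because the feature order is fixed for the whole tree, classification follows at most one arc per feature and falls back on the most frequent class of the last matching node, which is what makes retrieval fast compared to comparing a test case against every stored case.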
A Feature-Relevance Heuristic for Indexing and Compressing Large Case Bases
Walter Daelemans1, Antal van den Bosch2, and Jakub Zavrel1
1 Computational Linguistics, Tilburg University, The Netherlands
2 Dept. of Computer Science, Universiteit Maastricht, The Netherlands