Marginal Deformations with U(1)^3 Global Symmetry
International Economics
CHAPTER 3: THE STANDARD THEORY OF INTERNATIONAL TRADE

Multiple-Choice Questions

1. The marginal rate of transformation (MRT) of X for Y refers to:
a. the amount of Y that a nation must give up to produce each additional unit of X
b. the opportunity cost of X
c. the absolute slope of the production frontier at the point of production
d. all of the above

2. Community indifference curves:
a. are negatively sloped
b. are convex to the origin
c. should not cross
d. all of the above

3. Which of the following statements is true with respect to the MRS of X for Y?
a. It is given by the absolute slope of the indifference curve
b. It declines as the nation moves down an indifference curve
c. It rises as the nation moves up an indifference curve
d. all of the above

4. Which of the following is not true for a nation that is in equilibrium in isolation?
a. It consumes inside its production frontier
b. It reaches the highest indifference curve possible with its production frontier
c. The indifference curve is tangent to the nation's production frontier
d. MRT of X for Y equals MRS of X for Y, and they are equal to Px/Py

5. Nation 1's share of the gains from trade will be greater:
a. the greater is nation 1's demand for nation 2's exports
b. the closer Px/Py with trade settles to nation 2's pretrade Px/Py
c. the weaker is nation 2's demand for nation 1's exports
d. the closer Px/Py with trade settles to nation 1's pretrade Px/Py

6. With free trade under increasing costs:
a. neither nation will specialize completely in production
b. at least one nation will consume above its production frontier
c. a small nation will always gain from trade
d. all of the above

7. The gains from exchange with respect to the gains from specialization are always:
a. greater
b. smaller
c. equal
d. we cannot say without additional information

8. If nations have identical production possibilities frontiers, it is possible for them to experience gains from trade if
a. they have the same tastes and preferences
b. they have different tastes and preferences
c. they have different natural resources
d. they have constant opportunity costs

9. If a nation has a steeper indifference curve relative to that of another nation, it means that
a. it has stronger tastes and preferences for good Y
b. it has stronger tastes and preferences for good X
c. it has a higher opportunity cost for good Y
d. it has a higher opportunity cost for good X

10. The primary factor for the loss in manufacturing employment in developed nations is
a. Trade
b. Investment
c. Productivity growth
d. Outsourcing

True/False

1. With increasing opportunity costs, comparative advantage depends on a nation's supply conditions and demand conditions; with constant opportunity costs, comparative advantage depends only on demand conditions.
2. The Heckscher-Ohlin theory asserts that relative differences in labor productivity underlie comparative advantage.
3. A nation achieves autarky equilibrium at the point where its community indifference curve is tangent to its production possibilities schedule.
4. A nation realizes maximum gains from trade at the point where the international terms-of-trade line is tangent to its community indifference curve.
5. Assume that the United States and Canada engage in trade. If the international terms of trade coincides with the U.S. cost ratio, the United States realizes all of the gains from trade with Canada.

Short Answer

1. Carefully explain what an indifference curve is.
2. What is the reason for increasing opportunity cost?

Essay

1. Given: (1) two nations (1 and 2) which have the same technology but different factor endowments and tastes, (2) two commodities (X and Y) produced under increasing cost conditions, and (3) no transportation costs, tariffs, or other obstructions to trade. Prove geometrically that mutually advantageous trade between the two nations is possible.
(Note: Your answer should show the autarky (no-trade) and free-trade points of production and consumption for each nation, the gains from trade of each nation, and express the equilibrium condition that should prevail when trade stops expanding.)
A Simple Additive Re-weighting Strategy for Improving Margins
Fabio Aiolli and Alessandro Sperduti
Department of Computer Science, Corso Italia 40, Pisa, Italy
e-mail: aiolli,perso@di.unipi.it

Abstract

We present a sample re-weighting scheme inspired by recent results in margin theory. The basic idea is to add to the training set replicas of samples which are not classified with a sufficient margin. We prove the convergence of the input distribution obtained in this way. As a case study, we consider an instance of the scheme involving a 1-NN classifier implementing a Vector Quantization algorithm that accommodates tangent distance models. The tangent distance models created in this way have shown a significant improvement in generalization power with respect to the standard tangent models. Moreover, the obtained models were able to outperform state-of-the-art algorithms, such as SVM.

1 Introduction

In this paper we introduce a simple additive re-weighting method that is able to improve the margin distribution on the training set. Recent results in computational learning theory [Vapnik, 1998; Schapire et al., 1998; Bartlett, 1998] have tightly linked the expected risk of a classifier (i.e. the probability of misclassification of a pattern drawn from an independent random distribution) with the distribution of the margins in the training set. In general, we can expect the best generalization performance (minimal error on test data) when most of the patterns have high margins.

The aforementioned results are at the basis of the theory of two of the most impressive algorithms: Support Vector Machines and Boosting. The effectiveness of both SVMs and Boosting is largely due to the fact that they, directly or not, effectively improve the margins on the training set. In particular, SVM explicitly finds the hyper-plane with the largest minimum margin in a dimensional-augmented space where training points are mapped by a kernel function. In this case, margin theory makes it possible to explain impressive performance
even in very high dimensional spaces where data are supposed to be more separated. Most of the recent efforts in SVMs are in the choice of the right kernels for particular applications. For example, in OCR problems, the polynomial kernel was proven to be very effective.

On the other side, boosting algorithms, and in particular the most famous version, AdaBoost, produce a weighted ensemble of hypotheses, each one trained so as to minimize the empirical error on a given "difficult" distribution of the training set. Again, it has been shown [Schapire, 1999] that boosting essentially is a procedure for finding a linear combination of weak hypotheses which minimizes a particular loss function dependent on the margins on the training set. Recently, research efforts related to boosting algorithms faced the direct optimization of the margins on the training set. For example, this has been done by defining different margin-based cost functions and searching for combinations of weak hypotheses so as to minimize these functions [Mason et al., 1998].

We will follow a related approach that aims to find a single (possibly non-linear) optimal hypothesis, where the optimality is defined in terms of a loss function dependent on the distribution of the margins on the training set. In order to minimize this loss we propose a re-weighting algorithm that maintains a set of weights associated with the patterns in the training set. The weight associated with a pattern is iteratively updated when the margin of the current hypothesis does not reach a predefined threshold on it. In this way a new distribution on the training data is induced. A new hypothesis is then computed that improves the expectation of the margin on the new distribution. In the following we prove that the distribution converges to a uniform distribution on a subset of the training set.

We apply the above scheme to an OCR pattern recognition problem, where the classification is based on a 1-NN tangent distance classifier [Simard et al., 1993], obtaining a significant improvement in generalization. Basically, the algorithm builds a set of models for each class by an extended version of the Learning Vector Quantization procedure (LVQ [Kohonen et al., 1996]) adapted to tangent distance. In the following we will refer to this new algorithm as Tangent Vector Quantization (TVQ).

The paper is organized as follows. In Section 2, we introduce the concept of margin regularization via the input distribution on the training set. Specifically, we present the θ-Margin Re-weighting Strategy, which is guaranteed to make the input distribution converge. In Section 3, we introduce a definition for the margins in a 1-NN scheme that considers the discriminative ratio observed for a particular pattern, and in Section 4 we define the TVQ algorithm. Finally, in Section 5 we present empirical results comparing TVQ with other 1-NN based algorithms, including SVM.

2 Regularization of the margins

When learning takes place, the examples tend to influence the discriminant function of a classifier in different ways. A discriminant function can be viewed as a resource that has to be shared among different clients (the examples). Often, when the pure Empirical Risk Minimization (ERM) principle is applied, that resource is used in a wrong way since, with high probability, it is almost entirely used by a fraction of the training set. Margin theory formally tells us that it is preferable to regularize the discriminant function so that the examples share its support more equally.

Inspired by the basic ideas of margin optimization, we propose here a simple general procedure applicable, potentially, to any ERM-based algorithm. It makes it possible to regularize the parameters of a discriminant function so as to obtain hypotheses with large margins for many examples in the training set.
Without generality loss we consider the margin for a train-ing example as a real number,taking values in,rep-resenting a measure of the confidence shown by a classifier in the prediction of the correct label.In a binary classifier,e.g. the perceptron,the margin is usually defined as where is the target and is the output computed by the clas-sifier.Anyway,it can be easily re-conduced to therange by a monotonic(linear or sigmoidal)transformation of the output.In any case,a positive value of the margin must correspond to a correct classification of the example. Given the function that,provided an hypothesis, associates to each pattern its margin,we want to define a loss-function that,when minimized,permits to obtain hypotheses with large margins(greater than afixed threshold)for many examples in the training set.For this,we propose to minimize a function that,basically,is a re-formulation of SVM’s slack variables:(1)where is a training set with examples,andif,and0otherwise.The function is null for margins higher than the threshold and is linear with re-spect to the values of the margins when they are below this threshold.We suggest to minimize indirectly via a two-step itera-tive method that“simultaneously”(1)searches for an a priori distribution for the examples that,given the current hy-pothesis,better approximates the function and (2)searches for a hypothesis(e.g.by a gradient based procedure)that,provided the distribution,improves the weighted function(2)This new formulation is equivalent to that given in eq.(1) provided that converges to the uniform distribution on the-mistakes(patterns that have margin less than the thresh-old).-Margin Re-weighting Strategy Input:T:number of iterations;:hypotheses space;:margin threshold;:bounded function;:training set;Initialize(initial hypothesis);for,;endreturn;Figure1:The-Margin Re-weighting Strategy.The algorithm,shown in Figure1,consists of a series of trials.An optimization process,that explicitly maximizes the function 
according to the current distribution for the examples,works on an artificial training set,initialized to be equal to the original training set.For each, replicas of those patterns in that have margin below the fixed threshold are added to augmenting their density in and consequently their contribution in the optimization process.Note that denotes the number of occurrences in the extended training set of the pattern.In the following,we will prove that the simple iterative procedure just described makes the distribution approaching a uniform distribution on the-mistakes,provided thatis bounded.2.1Convergence of the distributionFor each trial,given the margin of each example in the training set,we can partition the training sample aswhere is the set of-mistakes andis the complementary set of-correct patterns.Let denote and let be the number of occurrences of pattern in at time,with density.Moreover,let be a suitable function of such that.Let be the update rule for the number of occurences in,where is bounded and takes values in(note that may change at different iterations but it is independent from).It’s easy to verify that for each because of the monotonicity of,and thatwith the number of iterations.In factFirst of all we show that the distribution converges.Thiscan be shown by demonstrating that the changes tend to zero with the number of iterations,i.e.,We havewhich can be easily bounded in module by a quantity thattends to0:We now show to which values they converge.Let and be,respectively,the cumulative number and the mean ratio of-mistakes for on thefirst epochs,Given the convergence of the optimization process that maximizes in eq.(2),the two sets and are going to become stable and the distribution on will tend to a uniformdistribution in(where)and will be null elsewhere (where).This can be understood in the following way as well.Given the definition of the changes made on the gamma values on each iteration of the algorithm,we calculate the function that we 
indeed minimize.Since,after some algebra,we can rewrite as: Thus,whenis reached.Note that,the minimum of is consistent with the constraint.In general,the energy function is modulated by a term de-creasing with the number of iterations,dependent on the used but independent from gamma,that can be viewed as a sort of annealing introduced in the process.In the following,we study a specific instance of the-Margin Re-weighting Strategy.3Margins in a1-NN framework and tangent distanceGiven a training example,and afixed number of models for each class,below,we give a definition of the margin for the example when classified by a distance based1-NN classifier.Given the example,let and be the squared distances between the nearest of the positive set of models and the nearest of the negative sets of models,respectively. We can define the margin of a pattern in the training set as:.The corresponding problem for the two-sided tangent dis-tance can be solved by an iterative algorithm,called(two-sided)HSS,based on Singular Value Decomposition,pro-posed by Hastie et al.[Hastie et al.,1995].When the one-sided version of tangent distance is used,HSS and PCA co-incide.So,in the following,the one sided version of this algorithm will be simply referred as to HSS.Given a one-sided tangent distance model,it is quite easy to verify that the squared tangent distance between a pat-tern and the model can be written as:(6)where,and denotes the transpose of.Consequently,in our definition of margin,we have,and.Given this definition of margin,we can implement the choice of the new hypothesis in the-Margin Re-weighting Strategy by maximizing the margin using gradient ascent on the current input distribution.3.1Improving margins as a driven gradient ascent Considering the tangent distance formulation as given in equation(6)we can verify that it is defined by scalar prod-ucts.Thus,we can derivate it with respect to the centroid and the tangent vectors of the nearest positive model 
obtaining:Considering that we can compute the derivative of the margin with respect to changes in the nearest positive model:A similar solution is obtained for the nearest negative model since it only differs in changing the sign and in exchanging indexes and.Moreover,the derivatives are null for all the other models.Thus,we can easily maximize the average margin in the training set if for each pattern presented to the classifier we move the nearest models in the direction suggested by the gra-dient.Note that,like in the LVQ algorithm,for each training example,only the nearest models are changed.When maximizing the expected margin on the current dis-tribution,i.e.,,for each model we have:where is the usual learning rate parameter.In the algorithm (see Figure2),for brevity,we will group the above variations by referring to the whole model,i.e.,;,initialize with random models;for;,select s.t.andare the nearest,positive and negative,-pute as in eq.(3)and accumulate the changes on the nearest modelsand orthonormalize its tangents;,update the distribution by the ruleErr%3.583.523.513.403.1610.642.823.234.021The number of pixel with value equal to1is used as the grey value for the corresponding pixel in the new image.2http://www-ai.cs.uni-dortmund.de/SOFTWARE/SVM=10=15=02.40 2.22 6.402.10 2.10–Table2:Test results for TVQ.with a polynomial kernel of degree2.We ran the TVQ al-gorithm with two different values for and four different ar-chitectures.Moreover,we ran also an experiment just using a single centroid for class(i.e.,)with.The smaller value for has been chosen just to account for the far smaller complexity of the model.In almost all the experiments the TVQ algorithm obtained the best performance.Results on the test data are reported in Table2.Specifically,the best result for SVM is worst than almost all the results obtained with TVQ.Particularly inter-esting is the result obtained by just using a single centroid for each class.This corresponds to perform an LVQ with 
just 10 codebooks, one for each class.

In addition, TVQ returns far more compact models, allowing a reduced response time in classification. In fact, the 1-NN using polynomial SVMs needs 2523 support vectors, while in the worst case the models returned by the TVQ involve a total of 480 vectors (one centroid plus 15 tangents for each model).

In Figure 3, typical error curves for the training and test errors, as well as the margin distributions on the training set and the induced margin distribution on the test set, are reported. From these plots it is easy to see that the TVQ doesn't show overfitting. This was also confirmed by the experiments involving the models with higher complexity and smaller values of the corresponding parameter. Moreover, the impact of the margin threshold on the final margin distribution on the training set is clearly shown in Figure 3, where a steep increase of the distribution is observed in correspondence of the threshold, at the expense of higher values of margin. Even if to a minor extent, a similar impact on the margin distribution is observed for the test data.

In Figure 4 we have reported the rejection curves for the different algorithms. As expected, the TVQ algorithm was competitive with the best SVM, resulting to be the best algorithm for almost the whole error range.

6 Conclusions

We proposed a provably convergent re-weighting scheme for improving margins, which focuses on "difficult" examples.
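The additive re-weighting idea can be illustrated with a minimal Python sketch. This is not the TVQ algorithm: as a stand-in hypothesis it assumes a linear classifier whose margin is tanh(y · w·x), and the names `margins`, `reweight`, `theta`, and `replicas`, the learning rate, and the toy data in the usage note are all hypothetical choices made for illustration. Patterns whose margin stays below the threshold receive extra replicas, which raises their weight in the next gradient-ascent step on the weighted mean margin.

```python
import numpy as np

def margins(w, X, y):
    # Margin squashed into (-1, 1); positive iff the example is classified correctly.
    return np.tanh(y * (X @ w))

def reweight(X, y, theta=0.5, replicas=1.0, iters=200, lr=0.1):
    """Additive re-weighting sketch (assumed linear hypothesis, not TVQ)."""
    n, d = X.shape
    counts = np.ones(n)  # occurrences of each pattern in the extended training set
    w = np.zeros(d)      # initial hypothesis
    for _ in range(iters):
        gamma = counts / counts.sum()  # current distribution over the examples
        m = margins(w, X, y)
        # Gradient of the gamma-weighted mean margin with respect to w.
        grad = ((gamma * (1.0 - m ** 2) * y)[:, None] * X).sum(axis=0)
        w = w + lr * grad
        # Additive update: add replicas of the patterns whose margin is below theta.
        below = margins(w, X, y) < theta
        counts[below] += replicas
    return w, counts / counts.sum()
```

On a toy separable set such as X = [[1, 0], [2, 0], [-1, 0], [-2, 0]] with labels [1, 1, -1, -1], the examples closer to the decision boundary stay below the threshold longer and therefore end up with a larger share of the distribution.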
On the basis of this general approach, we defined a Vector Quantization algorithm based on tangent distance, which experimentally outperformed state-of-the-art classifiers both in generalization and model compactness. These results confirm that the control of the shape of the margin distribution has a great effect on the generalization performance.

When comparing the proposed approach with SVM, we may observe that, while our approach shares with SVM the Statistical Learning Theory concept of uniform convergence of the empirical risk to the ideal risk, it exploits the input distribution to directly work on non-linear models instead of resorting to predefined kernels. This way to proceed is very similar to the approach adopted by Boosting algorithms. However, in Boosting algorithms, several hypotheses are generated and combined, while in our approach the focus is on a single hypothesis. This justifies the adoption of an additive re-weighting scheme, instead of a multiplicative scheme which is more appropriate for committee machines.

Figure 3: TVQ with 15 tangents: comparison with different initialization methods (HSS, random, and centroid initialization); test and training error (%); cumulative margins on the training set at different iterations; cumulative margins on the test set at different iterations. The margin plots compare the HSS model with TVQ after 50, 500, and 5000 iterations.

Figure 4: Detail of rejection curves (% rejection versus % error) for the different 1-NN methods. The rejection criterion is the difference between the distances of the input pattern with respect to the first and the second nearest models.

Acknowledgments

Fabio Aiolli wishes to thank Centro META - Consorzio Pisa Ricerche for supporting his PhD fellowship.

References

[Bartlett, 1998] P. L. Bartlett. The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Trans. on Infor. Theory, 44(2):525–536, 1998.

[Hastie et al., 1995] T. Hastie, P. Y. Simard, and E. Säckinger. Learning prototype models for tangent distance. In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances in Neural Inform. Proc. Systems, volume 7, pages 999–1006. MIT Press, 1995.

[Kohonen et al., 1996] T. Kohonen, J. Hynninen, J. Kangas, J. Laaksonen, and K. Torkkola. Lvq
Evaluation of Recommendations: Rating-Prediction and Ranking
Harald Steck∗
Netflix Inc.
Los Gatos, California
hsteck@netfl

∗Part of this work was done while at Bell-Labs, Alcatel-Lucent, Murray Hill, New Jersey.

RecSys'13, October 12–16, 2013, Hong Kong, China. Copyright is held by the author/owner(s). ACM 978-1-4503-2409-0/13/10. DOI 10.1145/2507157.2507160.

ABSTRACT
The literature on recommender systems typically distinguishes between two broad categories of measuring recommendation accuracy: rating prediction, often quantified in terms of the root mean square error (RMSE), and ranking, measured in terms of metrics like precision and recall, among others. In this paper, we examine both approaches in detail, and find that the dominating difference lies instead in the training and test data considered: rating prediction is concerned with only the observed ratings, while ranking typically accounts for all items in the collection, whether the user has rated them or not. Furthermore, we show that predicting observed ratings, while popular in the literature, only solves a (small) part of the rating prediction task for any item in the collection, which is a common real-world problem. The reasons are selection bias in the data, combined with data sparsity. We show that the latter rating-prediction task involves the prediction task 'Who rated What' as a sub-problem, which can be cast as a classification or ranking problem. This suggests that solving the ranking problem is not only valuable by itself, but also for predicting the rating value of any item.

Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications - Data Mining

Keywords
Recommender Systems; Selection Bias; Rating Prediction; Ranking

1. INTRODUCTION
The idea of recommender systems (RS) is to automatically suggest items to each user that s/he may find appealing, e.g., see [2] for an overview. Prior to a user study or deployment, offline testing on historical data provides a time- and cost-efficient initial assessment of an RS. A meaningful offline test ideally provides a good approximation to the utility function to be optimized by the deployed system (e.g., user satisfaction, increase in sales). Recommendation accuracy is an important part of such an offline metric.

The literature on recommender systems typically distinguishes between two ways of measuring recommendation accuracy: rating prediction, often measured in terms of the root mean square error (RMSE); and ranking, which is measured in terms of metrics like precision and recall, among others. In this paper, we examine both kinds of accuracy measures in detail and identify different variants.

Concerning rating prediction, the literature has focused mainly on predicting the rating values for those items that a user has deliberately chosen to rate. This kind of data can be collected easily, and is hence readily available for offline training and testing of recommender systems. Moreover, the root mean square error (RMSE), the most popular accuracy metric in the recommender literature, can easily be evaluated on the user-item pairs that actually have a rating value in the data.

The objective of common real-world rating prediction tasks, however, is often different from this scenario: typically, the goal is to predict the rating value for any item in the collection, regardless of whether the user rates it or not. The reason for the difference is that the ratings are missing not at random (MNAR) in the data sets typically collected [14, 13, 22, 23, 17]. For instance, on the Netflix web site, a personalized rating value is predicted for any video in the collection, so as to provide the user with an
indication of enjoyment if s/he watches it.

These two variants of rating prediction motivated us to examine it in more detail. In the first part of this paper, we identify three variants of rating prediction, and show how they differ from each other in terms of the answers they provide to the recommendation problem and in terms of the degree of difficulty of solving them. Moreover, we show that many real-world rating prediction tasks require an additional sub-problem to be solved: namely, which user deliberately chooses to assign a rating to which item in the collection (also known as 'Who rated What' [1]). This sub-problem may be cast as a classification or ranking problem.

In the second part of this paper, we focus on ranking for solving real-world recommendation tasks. As just motivated, ranking may not only be an important sub-problem of many real-world rating-prediction tasks; it is also a relevant approach on its own, for instance when the objective is to select a small subset of items from the entire collection (top N recommendations). We examine three variants of ranking: ranking of all items in the collection, ranking of only those items that the user has not rated yet, and ranking of only those items that the user has already rated.
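The three ranking variants just listed differ only in which items are eligible for the ranked list. A minimal Python sketch of that distinction follows; the helper names `candidate_items` and `top_n`, the protocol labels, and the scores in the usage note are hypothetical and are not taken from the paper.

```python
def candidate_items(all_items, rated_items, protocol):
    """Return the items eligible for ranking under one of three protocols."""
    items = set(all_items)
    rated = set(rated_items)
    if protocol == "all":      # every item in the collection
        return items
    if protocol == "unrated":  # only items the user has not rated yet
        return items - rated
    if protocol == "rated":    # only items the user has already rated
        return items & rated
    raise ValueError(f"unknown protocol: {protocol}")

def top_n(scores, candidates, n):
    """Rank candidates by descending predicted score; ties broken by item ID."""
    ranked = sorted(candidates, key=lambda item: (-scores[item], item))
    return ranked[:n]
```

With scores {1: 0.9, 2: 0.8, 3: 0.7, 4: 0.6} and rated items {1, 3}, the same model yields the top-2 list [1, 2] under "all", [2, 4] under "unrated", and [1, 3] under "rated", so any ranking metric is computed on a different list depending on the protocol.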
Our experiments provide strong empirical evidence that the difference in ranking accuracy due to these variants is considerably larger than the difference due to different ranking metrics. This suggests that the appropriate choice of training and testing protocol, e.g., which user-item pairs to consider, may be more important than the ranking metric for solving a given real-world ranking problem.

The main contributions of this paper can be summarized as follows:

1. We identify three variants of rating prediction. We show that the variant that is the main focus in the literature solves only a part of many real-world rating-prediction problems. This suggests further research is needed to develop rating prediction approaches for the (various) real-world tasks.
2. Concerning ranking, we find that in practice the choice of an appropriate training and testing protocol makes a bigger difference than the choice of ranking metric.
3. When comparing rating prediction with ranking, we find that their main difference is not caused by the different metrics (e.g., RMSE vs. ranking metrics). Instead, it is due to the user-item pairs that are taken into account: only the subset of user-item pairs where a rating is observed in the data (e.g., for RMSE), vs. all user-item pairs, whether the user deliberately chose to assign a rating value to an item or not. As main causes, we identify selection bias in the data combined with data sparsity.

This paper is organized into two main parts: first, we examine rating prediction, which we model as a two-stage problem: the user's deliberate choice as to which item to rate, and which rating value to assign (Section 2.1). This leads immediately to three variants of rating prediction (Section 2.2), and we derive their general relationships and differences in Section 2.3. In Section 2.4, we discuss general implications on modeling the two-stage rating process, while Section 2.5 justifies a particular model approach. In the second part of this paper (Section 3), we outline three variants of ranking
protocols (Section 3.1), which are analogous to the variants for rating prediction. In Section 3.2, we briefly review various ranking metrics, so that we are ready to compare the effects of ranking protocols and ranking metrics in our experiments in Section 3.3. We finish with our Conclusions.

2. RATING PREDICTION
The root mean square error (RMSE) is by far the most popular accuracy measure in the recommender literature. It is commonly associated with the objective of rating prediction, i.e., predicting the rating value that a user would assign to an item which s/he has not rated yet. In this section, we examine this objective in detail, and identify three variants with important differences.

2.1 Modeling the Decision to Rate
There has been recent work on the fact that ratings are missing not at random (MNAR) in the data typically collected in recommender system applications [14, 13, 22, 23, 17]. The main reason for this selection bias in the observed data is that the users are free to deliberately choose which items to rate. This motivates us to model the rating-prediction task as shown in Figure 1 (left). In this graphical model, U denotes the random variable concerning users, and takes values u, where u ∈ {1, ..., n} denotes user IDs, and n is the number of users. Similarly, I is the random variable concerning items, taking values i ∈ {1, ..., m}, where m is the number of items in the collection. The decision whether a user u deliberately chooses to rate item i is represented by the random variable C, which takes values c ∈ {c+, c−}, where c+ denotes that the user chooses to rate the item, while c− means that the user does not deliberately choose to rate the item. Finally, R is the random variable regarding rating values, taking values r. For instance, r ∈ {1, ..., 5} in the case of ratings on a scale from 1 to 5 stars, as in the Netflix Prize competition [3] or MovieLens data [5].

Note that C depends on the user and the item, such that this model is able to capture any reason that depends on u and i for choosing to give a rating (e.g., the item's popularity [23, 17]). This is more general than the approaches presented in [13], where this decision does not depend on the user-item pair directly, but only on the rating value itself. The dependence on the rating value as a decision criterion for whether to provide a rating is included as a special case in the model in Figure 1: using theory on graphical models, the graph in Figure 1 is Markov-equivalent to the graph where the orientation of the edge between C and R is reversed. Markov-equivalence means that both graphical models represent the same probability distribution (even though the graphs look different). The reason why the edge can be reversed is that both nodes C and R in the graph have the same parents, namely U and I. This is a key difference to the graphs used in [13], where C is not connected to the user or item.

Given that all random variables U, I, C, and R are discrete, we assume the most general probability distribution, the multinomial distribution. Note that this may be too general a model assumption to learn a "good" model in practice, but in the first part of this paper, we are concerned with general insights that we can obtain without resorting to a more specific model.

2.2 Rating-Prediction Variants
Rating prediction is concerned with predicting the rating value r that a user u assigns to an item i. It can be cast as predicting the probability of each rating value r that user u assigns to each item i, denoted by p(r_{u,i}) = p(r|u,i). Based on the graphical model in Figure 1, this can be decomposed as follows:

$$\underbrace{p(r|u,i)}_{(1)} = \underbrace{p(r|c^+,u,i)}_{(2)}\, p(c^+|u,i) + \underbrace{p(r|c^-,u,i)}_{(3)}\, p(c^-|u,i) \qquad (1)$$

This equation shows that there are three variants of rating prediction, i.e., three probabilities concerning rating value r.
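A small numerical illustration of Eq. 1 may help. The Python sketch below uses made-up probabilities for a single user-item pair (all numbers and the function names `rating_distribution` and `expected_rating` are hypothetical, not values from the paper). It shows that when the probability of choosing to rate is small, the rating distribution for an arbitrary item is dominated by the term conditioned on c−, which is not observed in typical data.

```python
import numpy as np

RATING_VALUES = np.arange(1, 6)  # ratings on a 1-5 star scale

def rating_distribution(p_r_given_rated, p_r_given_unrated, p_rate):
    """Eq. 1: p(r|u,i) = p(r|c+,u,i) p(c+|u,i) + p(r|c-,u,i) p(c-|u,i)."""
    p_pos = np.asarray(p_r_given_rated, dtype=float)
    p_neg = np.asarray(p_r_given_unrated, dtype=float)
    return p_pos * p_rate + p_neg * (1.0 - p_rate)

def expected_rating(p_r):
    # Mean rating value under a distribution over the five rating values.
    return float(np.dot(RATING_VALUES, p_r))

# Hypothetical numbers for one (user, item) pair.
p_rated = [0.05, 0.05, 0.10, 0.30, 0.50]    # p(r | c+, u, i): skewed toward high ratings
p_unrated = [0.30, 0.30, 0.20, 0.10, 0.10]  # p(r | c-, u, i): skewed toward low ratings
p_choose = 0.02                             # p(c+ | u, i): sparse data

p_any = rating_distribution(p_rated, p_unrated, p_choose)
```

Under these assumed numbers the expected rating given c+ is 4.15, while the expected rating for the item regardless of the decision to rate is about 2.44, much closer to the 2.40 obtained under c−.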
Additionally, Eq. 1 also shows that these terms are linked via two terms regarding the probability that the user deliberately chooses to rate an item ('Who rated What' [1]), which is discussed at the end of this sub-section.

Figure 1: When a user (node U) provides a rating value (node R) for an item (node I), the user typically has the freedom to deliberately choose which item(s) to rate. The user's implicit decision as to whether to rate an item or not is represented by node C in the graphical model. The graph on the right-hand side is obtained by marginalizing out U and I.

We first discuss the three variants of rating prediction:

1. p(r|u,i): This is the probability that user u assigns rating value r to item i in the collection. It is important that there is no restriction on which item i it is; it can be any item in the collection. The ability to make an accurate rating prediction for any item i is extremely useful in practical applications, as it allows one to find the items with the highest (predicted) enjoyment value from the entire collection.

2. p(r|c+,u,i): This is the probability that user u assigns rating value r to item i under the condition that the user deliberately chooses to rate this item. This condition may not hold for every item i in the collection. In fact, due to data sparsity, it may apply to only a (small) subset of items, as shown in the next section. For instance, it is commonly observed in practice that users tend to assign ratings mainly to items they like or know [14, 13, 22, 23, 17]. The reason is that the observed data is collected by allowing users to choose which items to rate. As a consequence, this conditional probability is learned when optimizing RMSE or MAE on the observed data.

3. p(r|c−,u,i): This is the probability that user u would assign rating value r to item i under the condition that the user would not deliberately choose to rate this item. In practice, this rating value could be elicited, for instance, by providing the user with some
reward for completing this task. These ratings are typically costly to elicit, and burdensome for the user to provide. For this reason, ratings under this condition are typically not available, which obviously makes it difficult to estimate this probability distribution accurately. Due to data sparsity, this distribution applies to the vast majority of user-item pairs, as discussed in more detail in the next section. In the rare case that they were collected, for instance in the Yahoo! Music data [13], it becomes obvious that this conditional distribution of rating values can be very different from the rating distribution under the condition that the user deliberately chose an item to rate. For instance, compare Figures 2(a) and (b) in [13], or Figure 1 in [17].

The decomposition in Eq. 1 also shows that, in addition, the two conditional probabilities p(c+|u,i) and p(c−|u,i) have to be estimated in order to obtain valid rating predictions for any item in the collection. These probabilities capture whether user u deliberately chooses to rate item i or not. This shows their importance for rating prediction of any item, and motivates further research beyond the few works conducted so far, like the KDD Cup 2007, titled 'Who rated What' [1]; there, one of the insights was that it was difficult to make accurate predictions as to which user deliberately chooses to rate which items.

2.3 Relationship of the Three Variants

This section presents our main results concerning rating prediction, i.e., a general relationship between the three variants of rating prediction.

2.3.1 Marginal Probabilities

A necessary condition for accurate personalized rating prediction is that the unpersonalized rating predictions, i.e., those averaged over all users and items, are accurate as well. This is outlined in detail in the following. To this end, let us consider p(r), which is the probability of rating value r averaged over all users u and items i. Based on our graphical model in Figure 1, this can be derived by marginalizing over u
and i in Eq. 1:

\begin{align}
p(r) &= \sum_{c,u,i} p(r \mid c,u,i) \cdot p(c \mid u,i) \cdot p(u) \cdot p(i) \tag{2}\\
&= \sum_{c} p(r \mid c) \cdot p(c) \tag{3}\\
&= p(r \mid c^{+}) \cdot p(c^{+}) + p(r \mid c^{-}) \cdot p(c^{-}) \tag{4}\\
&= p(r \mid c^{-}) + p(c^{+}) \{ p(r \mid c^{+}) - p(r \mid c^{-}) \} \tag{5}\\
&\approx p(r \mid c^{-}) \tag{6}
\end{align}

The line-by-line comments are as follows: equality (2) in the first line follows from the definition of p(r) in terms of the graphical model in Figure 1. In equality (3), the random variables U and I are marginalized out from the probability distribution, so that we obtain the marginal distribution over C and R, which is also represented in Figure 1 (right). This is valid [11] because the class of graphical models representing probability distributions from the exponential family, like the multinomial distribution used here, is closed when marginalizing out over a variable that is connected with all its edges to the same clique¹ in the graph, which is obviously the case in Figure 1. Equality (4) re-states the previous line, using c ∈ {c+, c−}, and equality (5) uses p(c−) = 1 − p(c+). The interesting approximation in the last line is accurate due to the sparsity of the data, which is typical for recommender applications: based on the graph, we have

\[
p(c^{+}) = \sum_{u,i} p(c^{+} \mid u,i) \cdot p(u) \cdot p(i). \tag{7}
\]

Any accurate prediction for p(c+) has to be close to the empirical estimate p̂(c+), which is given by

\[
\hat{p}(c^{+}) = \frac{\#\text{ratings}}{|U| \cdot |I|} = \text{data sparsity},
\]

where #ratings is the number of ratings in the data; |U| = n and |I| = m denote the number of users and items, respectively. Hence, p(c+) equals the data sparsity. In typical applications the data sparsity is in the single-digit percent range, and often even one or more orders of magnitude lower. For instance, it was about 1% in the Netflix Prize data [3], which may be considered a relatively dense data set compared to other recommender applications. Hence, the approximation in Eq. 6 is accurate up to a few percent in general. This is orders of magnitude more accurate than any real-world rating-prediction accuracy, e.g., see RMSE for the Netflix

¹ A clique is a completely connected sub-graph.
Prize data [3] or MovieLens data [5]. We hence arrive at

Conclusion 1: Due to data sparsity, the rating prediction variants 1 and 3, as outlined in Section 2.2, have to be closely related. In contrast, variant 2 is not required to be similar to variants 1 or 3. This suggests that there is a difference between the main focus of the literature (variant 2), and many real-world rating-prediction problems (variants 1 and 3).

2.3.2 Average Ratings

The average of the predicted rating values of a recommender system has an important impact on its accuracy in terms of RMSE (and similarly for MAE). This is outlined in the following. Analogous to Eqs. (2)-(6), one obtains for the average rating value:

\begin{align}
E[R] &= \sum_{r,c,u,i} r \cdot p(r \mid c,u,i) \cdot p(c \mid u,i) \cdot p(u) \cdot p(i) \tag{8}\\
&= E[R \mid c^{-}] + p(c^{+}) \{ E[R \mid c^{+}] - E[R \mid c^{-}] \} \tag{9}\\
&\approx E[R \mid c^{-}] \tag{10}
\end{align}

where E[·] denotes the (conditional) expectation/average of the random variable R, i.e., the rating values. Not surprisingly, this confirms the previous relationships among the three variants of rating prediction: the average rating value of variants 1 and 3 has to be approximately the same due to data sparsity, while the average rating of variant 2 is not required to be similar.

In fact, there is strong empirical evidence, for instance, provided by the Yahoo! Music data² [13], suggesting that these average rating values can actually be very different: Figures 2(a) and (b) in [13] show that, on a rating scale from 1 to 5, we have:

• E[R|c−] ≈ E[R] ≈ 1.8: this is the average rating when users were asked to rate songs that were randomly selected by Yahoo! (instead of selected by the users). These rating data may hence be considered as (approximately) fulfilling the missing at random condition [18, 12], such that its rating distribution provides an unbiased estimate of the (unknown) true distribution.

• E[R|c+] ≈ 2.9: this is the average rating when users were free to deliberately choose which items to rate. Evidently, this value is significantly larger than 1.8, suggesting a strong selection bias.

² The actual Yahoo! Music data set was not available to us, so that we
had to resort to the histograms published in [13].

This difference in the average rating is extremely large when compared to typical improvements in terms of RMSE, as outlined in the following. In the Netflix Prize data, the winning approach achieved RMSE ≈ 0.86, which was so difficult to achieve that it required about three years of research work. In comparison, when simply predicting the average rating for all users and all items, one achieves RMSE ≈ 1.0. This shows that an improvement in RMSE of 0.14 (on a rating scale of 1 to 5) may look small, but is actually very large. The numbers for MovieLens data are slightly different, but also here, the improvement of a sophisticated approach over a simple approach in terms of RMSE is much less than 1.

This shows that a difference in the average predicted rating of about 1 can easily dominate over the improvement of a more accurate recommender system. This becomes clear from the following thought experiment: Let us assume we train a recommender system on available rating data where the users were free to choose which items to rate. This is the typical scenario for collecting data. Let its RMSE (on a test set) be given by RMSE_0; its average predicted rating value will be close to the average rating value in the training data, E[R|c+]. This recommender system will hence be accurate for variant 2. Now, let us consider the common real-world task of rating prediction for any/each item in the collection (variant 1): in this case, the (unknown) true average rating is E[R]; now, the previously trained recommender system has a bias b = E[R|c+] − E[R]. This results in a degraded RMSE:

\[
\mathrm{RMSE}_1 = \sqrt{\mathrm{RMSE}_0^{2} + b^{2}}.
\]

Using the numerical values from above, this suggests that RMSE_1 > 1 (the bias alone, b ≈ 2.9 − 1.8 = 1.1, already exceeds 1). Interestingly, this is worse than RMSE ≈ 1, which can be achieved by an (unpersonalized) recommender system that predicts the average rating E[R]. Even though the value E[R] may be unknown in many practical applications, it may also be possible to determine its value with some additional effort, for instance, by running a truly
randomized experiment with a small subset of the users. This leads us to the following conclusion:

Conclusion 2: A recommender system with a low RMSE concerning the rating-prediction variant 2 is not guaranteed to achieve a low RMSE regarding rating-prediction tasks 1 or 3.

Among these three variants, variant 1 refers to a common real-world rating-prediction task, e.g., on the Netflix web site, which provides a personalized rating prediction for any video on its website. While many excellent solutions have been developed for variant 2 in the literature, this conclusion suggests that additional research is needed to develop accurate prediction models for variant 1.

2.4 Implications for Modeling

Conclusions 1 and 2 above show that a key challenge in building real-world recommender systems is that rating distributions p(R|U,I) (variant 1) or p(R|c−,U,I) (variant 3) are relevant for the user experience in many real-world applications; in contrast, the data that are readily available in large quantities follow the distribution p(R|c+,U,I) (i.e., variant 2). Given the practical importance of variants 1 and 3, it is crucial to build recommender systems that account for the items (and their ratings) that a user has not rated. Developing solutions for these new objectives of rating prediction is an interesting area for future work.

In the following, we outline a Bayesian approach that uses an informative prior distribution that incorporates the rating distributions of the items that were not rated by a user. There are different ways of defining this prior distribution. First, one may run an experiment and elicit ratings for random items from users, like in the Yahoo! Music data. This provides a good estimate of the ratings concerning items that a user would not have rated otherwise. But it is also a costly experiment, and puts a burden onto users. In particular, if the items to rate are movies, it would be very time-consuming for users to watch random movies, which they might not enjoy. Second, one may use a prior
distribution with a small number of free parameters, which can then be tuned to achieve the desired result, e.g., by cross-validation. Such an approach was outlined in [22] for rating data, and a probabilistic version with prior distributions in [25]. The latter paper also shows that several rating values assigned to the same item i by a user u, i.e., a distribution of rating values p(R|u,i), is equivalent to using the average rating r̄_{u,i} = E[R|u,i] = Σ_r r · p(r|u,i) when optimizing the least squares objective function.³ This allows one to parameterize the prior distribution in terms of its mean. In [22], a single mean rating value is used for all users and items. It is an interesting result that, when this mean rating value is optimized to achieve the best ranking⁴ performance on the Netflix test set, a mean rating value of about 2 is found. This appears to be a reasonable value for approximating the (unknown) mean rating value for the items that a user did not rate, as it agrees well with the results found for the Yahoo! Music data. Hence, this approach may provide a way for approximating relevant parameters of the (otherwise unknown) rating distribution concerning the items that a user did not rate. The approach in [22] is summarized in the following section; it will also be used in our experiments later in this paper.

2.5 Model and Training

In this section, we briefly review the low-rank matrix-factorization (MF) model named AllRank in [22], which was introduced as a point-wise ranking approach that accounts for all (unrated) items for each user. However, one can also view it from the perspective of rating prediction: the observed ratings in the data (i.e., p(r|c+,u,i)) are complemented by imputed ratings with low values (to approximate the unknown p(r|c−,u,i)). As a result, an approximation to p(r|u,i) is achieved, which applies to any item in the collection. For comparison, the standard MF approach of minimizing RMSE on the observed ratings is also discussed, and denoted
by MF-RMSE.

The matrix of predicted ratings R̂ ∈ R^{m×n}, where m denotes the number of items, and n the number of users, is modeled as

\[
\hat{R} = r_0 + P Q^{\top}, \tag{11}
\]

with matrices P ∈ R^{m×j_0} and Q ∈ R^{n×j_0}, where the rank j_0 ≪ m, n; and r_0 ∈ R is a (global) offset.

³ Note that this holds only for optimization/maximum-a-posteriori estimates, but does not apply for estimating the full posterior distribution, e.g., by means of Markov Chain Monte Carlo.

⁴ Note that, in contrast to RMSE, (some) ranking measures can be applied to the entire collection of items, and hence account for both items with and without ratings assigned by the user.

For computationally efficient training, the square error (with the usual L2-norm regularization) is optimized:

\[
\sum_{\text{all } u} \sum_{\text{all } i} W_{i,u} \cdot \Big\{ \big( R^{o\&i}_{i,u} - \hat{R}_{i,u} \big)^{2} + \lambda \sum_{j=1}^{j_0} \big( P_{i,j}^{2} + Q_{u,j}^{2} \big) \Big\}, \tag{12}
\]

where R̂_{i,u} denotes the ratings predicted by the model in Eq. 11; and R^{o&i}_{i,u} equals the actual rating value in the training data if observed for item i and user u; otherwise the value R^{o&i}_{i,u} = r_0 ∈ R is imputed. The key is in the training weights,

\[
W_{i,u} =
\begin{cases}
1 & \text{if } R^{\text{obs}}_{i,u} \text{ observed} \\
w_0 & \text{otherwise.}
\end{cases} \tag{13}
\]

In AllRank [22], the weight assigned to the imputed ratings is positive, i.e., w_0 > 0 [22], and the imputed rating value is lower than the observed average rating in the data. This captures the fact that the (unknown) probability p(r|c−,u,i) is larger for lower rating values, compared to the observed probabilities p(r|c+,u,i). In contrast, MF-RMSE is obtained for w_0 = 0. This seemingly small difference has the important effect that AllRank is trained on a combination of both distributions, p(r|c−,u,i) and p(r|c+,u,i), geared towards approximating p(r|u,i). In contrast, MF-RMSE is optimized towards p(r|c+,u,i). Due to this difference, in [22] AllRank was found to achieve a considerably larger top-N hit-rate or recall when ranking all items in the collection, compared to various state-of-the-art approaches optimized on the observed ratings only, see results in [22] and compare to [10, 4].

Alternating least squares can be
used for computationally efficient training of AllRank [22] and MF-RMSE. We found the following values for the tuning parameters in Eq. 12 for the Netflix data (see also [22]) to yield the best results: r_0 = 2, w_0 = 0.005 and λ = 0.04 for AllRank; and λ = 0.07 for MF-RMSE (and w_0 = 0); we use rank j_0 = 50 for both approaches.

While there are several state-of-the-art approaches, e.g., [6, 9, 10, 16, 19], that achieve a lower RMSE on observed ratings than MF-RMSE does, note that their test performances on all (unrated) items are quite similar to each other, e.g., see [10, 4]. This is interesting, as some of these approaches, like [16, 19, 10, 20], actually account in some sense for the MNAR nature of the data, but in the context of observed ratings (minimizing RMSE).

Note that AllRank does not only apply to explicit feedback data, but can also be used for implicit feedback data, similar to [8, 15].

3. RANKING

In this section, we examine ranking as a means for assessing recommendation accuracy. Ranking is a useful approach when the recommendation task is, for each user, to pick a small number, say N, of items from among all available items in the collection. We divide the ranking task into two parts in this section: ranking protocols and ranking metrics, as outlined and compared to each other in the following.

3.1 Ranking Protocols

With ranking protocols, we refer to the choice of which items are ranked. One may distinguish between two slightly
CFA Level III Notes

Monte Carlo simulation (special topic)

Definition: Monte Carlo simulation allows an asset manager to model the uncertainty of several key variables. It generates random outcomes according to an assumed probability distribution for these key variables. It is a flexible approach for exploring different market or investment scenarios. In other words, values of variables whose distributions are specified in advance are drawn at random to generate outcomes, so that conditions under different market and investment environments can be explored flexibly.

Advantages over MVO:

1. Rebalancing and taxes. Monte Carlo simulation allows one to analyze different rebalancing policies and their costs over time (in a multi-period situation). It can therefore be used to analyze the impact of different rebalancing strategies and of taxes.

2. Path dependence. When there are cash outflows each year, terminal wealth (the portfolio's value at a given point) will be path-dependent because of the interaction of cash flows and returns. Cash flows in and out of a portfolio and the sequence of returns will have a material effect on terminal wealth; this is termed path dependence.

3. Non-normal distributions. Monte Carlo simulation can incorporate statistical properties outside the normal distribution, such as skewness and excess kurtosis.
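The path-dependence point can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not part of the notes: the function names, the fixed annual withdrawal, and the normally distributed returns (mean 6%, standard deviation 15%) are all hypothetical choices.

```python
import random


def terminal_wealth(start, returns, withdrawal):
    """Apply each period's return, then take out a fixed cash flow."""
    wealth = start
    for r in returns:
        wealth = wealth * (1.0 + r) - withdrawal
    return wealth


def simulate(start, years, withdrawal, n_paths, seed=0):
    """Monte Carlo: draw random return paths and record terminal wealth.

    Assumption: annual returns are i.i.d. normal(6%, 15%); any other
    distribution (e.g. a skewed one) could be substituted here.
    """
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n_paths):
        path = [rng.gauss(0.06, 0.15) for _ in range(years)]
        outcomes.append(terminal_wealth(start, path, withdrawal))
    return outcomes


# Same two returns, different order: with withdrawals the result differs.
good_first = terminal_wealth(100.0, [0.20, -0.10], withdrawal=10.0)
bad_first = terminal_wealth(100.0, [-0.10, 0.20], withdrawal=10.0)

# Without cash flows the order of returns does not matter.
no_flow_a = terminal_wealth(100.0, [0.20, -0.10], withdrawal=0.0)
no_flow_b = terminal_wealth(100.0, [-0.10, 0.20], withdrawal=0.0)

outcomes = simulate(100.0, years=10, withdrawal=5.0, n_paths=1000)
```

In this sketch, the two orderings end at different values only when the withdrawal is non-zero, which is the interaction of cash flows and returns described in point 2.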
The law of diminishing marginal utility
-- The most basic economic principle
Marginal utility
• The term 'marginal' refers to a small change, starting from some baseline level.
• In economics, the marginal utility of a good or service is the gain from an increase, or loss from a decrease, in the consumption of that good or service.
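As a small worked illustration of the definition above, marginal utility can be computed as the change in total utility from consuming one more unit. This is only a sketch: the square-root utility function is an assumed example of a concave utility, and `marginal_utilities` is a hypothetical helper name.

```python
import math


def marginal_utilities(utility, max_units):
    """Gain in utility from the q-th unit, for q = 1..max_units."""
    return [utility(q) - utility(q - 1) for q in range(1, max_units + 1)]


# Assumed concave utility u(q) = sqrt(q): each extra unit adds less.
mu = marginal_utilities(math.sqrt, 5)
is_diminishing = all(mu[k] > mu[k + 1] for k in range(len(mu) - 1))
```

With this assumed utility function, every additional unit yields a smaller gain than the one before it, which is what the law of diminishing marginal utility states.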
It has been proposed that different tasks may require different levels of arousal.
Results
• It dictates that performance increases with cognitive arousal, but only up to a certain point: when levels of arousal become too high, performance will decrease. A corollary is that there is an optimal level of arousal for a given task.
Who can be successful? It must be the man who can control his arousal level!
The Yerkes-Dodson law
Marginality in non-compatible random events
arXiv:quant-ph/0312183v1 22 Dec 2003

Oľga Nánásiová∗, Andrei Yu. Khrennikov†

Dept. of Math. and Descr. Geom., Faculty of Civil Engineering, Slovak University of Technology, Radlinského 11, 813 68 Bratislava, Slovakia

International Center for Mathematical Modeling in Physics and Cognitive Sciences, University of Växjö, S-35195, Sweden

Emails: olga@math.sk, Andrei.Khrennikov@msi.vxu.se

February 1, 2008

We present a way of introducing a joint distribution function and its marginal distribution functions for non-compatible observables. Each such marginal distribution function has the property of commutativity. Models based on this approach can be used to better explain some classical phenomena in stochastic processes.

1 Introduction

Let (Ω, ℑ, P) be a probability space and let ξ_1, ξ_2, ξ_3 be random variables. Then

\[
F_{\xi_1,\xi_2,\xi_3}(r_1,r_2,r_3) = P\Big(\Big\{\omega \in \Omega;\ \omega \in \bigcap_{i=1}^{3} \xi_i^{-1}\big((-\infty, r_i)\big)\Big\}\Big)
\]

is the distribution function, and the marginal distribution function is defined in the following way:

\[
F_{\xi_1,\xi_2}(r_1,r_2) = \lim_{r_3 \to \infty} F_{\xi_1,\xi_2,\xi_3}(r_1,r_2,r_3).
\]

From the definition of a distribution function it follows that all random variables are simultaneously measurable. It means that they can be observed at the same time.

Let (Ω_i, ℑ_i, P_i) be probability spaces, where i = 1, 2, ..., n is the time coordinate. Let ξ_i be a random variable on the probability space (Ω_i, ℑ_i, P_i). How can the joint distribution function be defined now?

In this paper, we will study such random events which are not simultaneously measurable. One of the approaches to this problem is to study an algebraic structure, an orthomodular lattice (an OML) [1], [10].

Definition 1.1 Let L be a nonempty set endowed with a partial ordering ≤. Let there exist the greatest element 1 and the smallest element 0. We consider the operations supremum (∨) and infimum (∧) (the lattice operations) and a map ⊥: L → L defined as follows.
(i) For any {a_n}_{n∈A} ⊂ L, where A ⊂ N is finite, ⋁_{n∈A} a_n, ⋀_{n∈A} a_n ∈ L.
(ii) For any a ∈ L, (a⊥)⊥ = a.
(iii) If a ∈ L, then a ∨ a⊥ = 1.
(iv) If a, b ∈ L such that a ≤ b, then b⊥ ≤ a⊥.
(v) If a, b ∈ L such that a ≤ b then
b = a ∨ (a⊥ ∧ b) (orthomodular law).
Then (L, 0, 1, ∨, ∧, ⊥) is said to be the orthomodular lattice (briefly the OML).

Let L be an OML. Then elements a, b ∈ L will be called:
• orthogonal (a ⊥ b) iff a ≤ b⊥;
• compatible (a ↔ b) iff there exist mutually orthogonal elements a_1, b_1, c ∈ L such that a = a_1 ∨ c and b = b_1 ∨ c.

If a_i ∈ L for any i = 1, 2, ..., n and b ∈ L is such that b ↔ a_i for all i, then b ↔ ⋁_{i=1}^{n} a_i and

\[
b \wedge \Big( \bigvee_{i=1}^{n} a_i \Big) = \bigvee_{i=1}^{n} (a_i \wedge b)
\]

([1], [?], [10]).

Let a, b ∈ L. It is easy to show that a ↔ b if and only if a = (a ∨ b) ∧ (a ∨ b⊥) (distributive law). Moreover, L is a Boolean algebra if and only if L is distributive.

The well known example of an OML is the lattice of orthogonal projectors in a Hilbert space.

Let (Ω, ℑ, P) be a probability space. Then a statement A is represented as a measurable subset of Ω (A ∈ ℑ). For example, "A or B" means A ∪ B, and "non A" means A^c (the set complement in Ω). If the basic structure is an OML, then "a and b" means the infimum (a ∧ b), "a or b" means the supremum (a ∨ b), and "non a" means a⊥.

If (Ω, ℑ, P) is a probability space, then for any A, B ∈ ℑ

A = (A ∩ B) ∪ (A ∩ B^c).

If L is an OML, then for any a, b ∈ L

a ≥ (a ∧ b) ∨ (a ∧ b⊥).

Example 1 Let L be the Hilbert space R². Then 1 := R² and 0 := [0, 0]. If a ∈ L − {1, 0}, then a is a linear subspace of R², which means that a is a line containing the point [0, 0]. We can write a: y = k_a x. Let a, b ∈ L, a ≠ b. If a: y = k_a x, b: y = k_b x, then a⊥: y = −(1/k_a) x [...]

A map m: L → [0, 1] such that
(i) m(0) = 0 and m(1) = 1;
(ii) if a ⊥ b then m(a ∨ b) = m(a) + m(b)
is called a state on L.

Let B(R) be a σ-algebra of Borel sets. A homomorphism x: B(R) → L is called an observable on L. If x is an observable, then R(x) := {x(E); E ∈ B(R)} is called a range of the observable x. It is clear that R(x) is a Boolean algebra [Var]. A spectrum of an observable x is defined in the following way: σ(x) = ∩{E ∈ B(R); x(E) = 1}. If g is a real function, then g∘x is an observable on L such that:
(1) R(g∘x) ⊂ R(x);
(2) σ(g∘x) = {g(t); t ∈ σ(x)};
(3) for any E ∈ B(R), g∘x(E) = x({t ∈ σ(x); g(t) ∈ E}).

We say that x and y are compatible (x ↔ y) if there exists a Boolean sub-algebra B ⊂ L such that R(x) ∪ R(y) ⊂ B. In other words, x ↔ y if for any E, F ∈ B(R), x(E) ↔ y(F).

We call an
observable x finite if σ(x) is a finite set. It means that σ(x) = {t_i}_{i=1}^{n}, n ∈ N. Let us denote by O the set of all finite observables on L.

A state is a notion analogous to the probability measure; an observable is analogous to a random variable.

2 s-map

Let L be an OML. In the papers [8], [9] an s-map is defined in the following way:

Definition 2.1 Let L be an OML. The map p: L² → [0, 1] will be called an s-map if the following conditions hold:
(s1) p(1, 1) = 1;
(s2) if a ⊥ b, then p(a, b) = 0;
(s3) if a ⊥ b, then for any c ∈ L,
p(a ∨ b, c) = p(a, c) + p(b, c),
p(c, a ∨ b) = p(c, a) + p(c, b).

The s-map allows us, e.g., to define a conditional probability for non-compatible random events, a joint distribution, a conditional expectation and a covariance for non-compatible observables. Such random events cannot be described by the classical probability theory [5]. These problems are studied, for example, in [6]-[9].

In this section we will introduce an n-dimensional s-map (briefly an s_n-map) and we will show its basic properties.

Definition 2.2 Let L be an OML. The map p: L^n → [0, 1] will be called an s_n-map if the following conditions hold:
(s1) p(1, ..., 1) = 1;
(s2) if there exists i such that a_i ⊥ a_{i+1}, then p(a_1, ..., a_n) = 0;
(s3) if a_i ⊥ b_i, then
p(a_1, ..., a_i ∨ b_i, ..., a_n) = p(a_1, ..., a_i, ..., a_n) + p(a_1, ..., b_i, ..., a_n),
for i = 1, ..., n.

Proposition 2.1 Let L be an OML and let p be an s_n-map. Then
(1) if a_i ⊥ a_j, then p(a_1, ..., a_n) = 0;
(2) the map ν: L → [0, 1] such that ν(a) := p(a, ..., a) for any a ∈ L is a state on L;
(3) for any (a_1, ..., a_n) ∈ L^n, p(a_1, ..., a_n) ≤ ν(a_i) for each i = 1, ..., n;
(4) if a_i ↔ a_j, then
p(a_1, ..., a_n) = p(a_1, ..., a_{i−1}, a_i ∧ a_j, ..., a_j ∧ a_i, a_{j+1}, ..., a_n).
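Conditions (s1)-(s3) and parts (2) and (3) of Proposition 2.1 can be checked by brute force in the classical special case. The sketch below is an illustration only and rests on assumptions not made in the paper: L is taken to be the Boolean algebra of all subsets of a three-point set with hypothetical weights, and p(a, b) is taken to be the probability of the intersection, so all events are compatible. It does not cover the non-compatible case that the paper is about.

```python
from fractions import Fraction
from itertools import combinations, product

# Assumed finite sample space with hypothetical point weights.
OMEGA = frozenset({1, 2, 3})
WEIGHTS = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}

# All subsets of OMEGA: a Boolean algebra, hence a (distributive) OML.
EVENTS = [
    frozenset(c)
    for k in range(len(OMEGA) + 1)
    for c in combinations(sorted(OMEGA), k)
]


def prob(event):
    """Classical probability measure on subsets of OMEGA."""
    return sum((WEIGHTS[w] for w in event), Fraction(0))


def s_map(*events):
    """Classical candidate for an s_n-map: probability of the intersection."""
    common = OMEGA
    for e in events:
        common = common & e
    return prob(common)


def orthogonal(a, b):
    """a is orthogonal to b iff a is contained in the complement of b."""
    return a <= (OMEGA - b)


# (s1)-(s3) for n = 2.
s1 = s_map(OMEGA, OMEGA) == 1
s2 = all(
    s_map(a, b) == 0
    for a, b in product(EVENTS, repeat=2)
    if orthogonal(a, b)
)
s3 = all(
    s_map(a | b, c) == s_map(a, c) + s_map(b, c)
    and s_map(c, a | b) == s_map(c, a) + s_map(c, b)
    for a, b, c in product(EVENTS, repeat=3)
    if orthogonal(a, b)
)

# Proposition 2.1 (2) and (3): nu(a) = p(a, a) is a state and bounds p.
state_ok = all(s_map(a, a) == prob(a) for a in EVENTS)
bound_ok = all(
    s_map(a, b) <= min(s_map(a, a), s_map(b, b))
    for a, b in product(EVENTS, repeat=2)
)
```

Exact rational arithmetic via `Fraction` is used so that the additivity checks are equalities rather than floating-point comparisons.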
Proof.(1)It is enought to prove,that p(a1,...,a n)=0if a1⊥a n.Let(a1,...,a n)∈L n and let a1⊥a n.Then0≤p(a1,...,a n)≤p(a1,...,a n−1,a n)+p(a1,...,a⊥n−1,a n)=p(a1,...,a n−2,1,a n)=p(a1,...,a n−2,a n,a n)+p(a1,...,a n−2,a⊥n,a n)=p(a1,...,a n−2,a n,a n)≤...≤p(a1,a n,...,a n)=0.From this follows,that p(a1,...,a n)=0.(2)It is clear,thatν(0)=0,andν(1)=1.Let a,b∈L,such that a⊥b.Thenν(a∨b)=p(a∨b,...,a∨b)=p(a,a∨b,...,a∨b)+p(b,a∨b,...,a∨b)=p(a,a,a∨b,...,a∨b)+p(a,b,a∨b,...,a∨b)+p(b,a,a∨b,...,a∨b)+p(b,b,a∨b,...,a∨b)=p(a,a,a∨b,...,a∨b)+p(b,b,a∨b,...,a∨b)=....=p(a,...,a)+p(b,...,b)=ν(a)+ν(b).From it follows,thatνis a state on L.(3)Let(a1,...,a n∈L n.Then for any i=1,...,n we havep(a1,...,a i,...,a n)≤p(a1,...,a i,...,a n)+p(a⊥1,...,a i,...,a n) and sop(a1,a2,...,a i,...,a n)≤p(1,a2,...,a i,...,a n)=p(a i,a2,...,a i,...,a n).From it follows,thatp(a i,a2,...,a i,...,a n)≤p(a i,1,...,a i,...,a n)=p(a i,a i,a3,...,a i,...,a n).Hencep(a1,...,a n)≤p(a i,...,a i)=ν(a i).(4)Let a,b∈L,such that a↔b.Then a=(a∧b)∨(a∧b⊥)andb=(b∧a)∨(b∧a⊥).Let(a1,...,a n)∈L n and let a1⊥a2.Thenp(a1,a2,...,a n)=p((a1∧a2)∨(a1∧a2⊥),a2,...,a n).From the property(s3)and for the property(1)we getp(a1,a2,...,a n)=p(a1∧a2,a2,...,a n).And hencep(a1,a2,a3,...,a n)=p(a1∧a2,a2∧a1,a3...,a n).(Q.E.D.)Let¯a=(a1,...,a n)∈L n.Let us denoteπ(¯a)a permutation of(a1,...,a n).Proposition2.2Let L be an OML.Let p be an s n-map and let(a1,...,a n)∈L n.(1)If there exists i∈{1,...,n},such that a i=1,thenp(a1,...,a n)=p(a1,...,a i−1.a j,a i+1,...,a n)for each j=1,...,n.(2)If there exist i=j such that a i=a j,thenp(a1,...,a n)=p(π(a1,...,a n)).(3)If there exist i,j such that a i↔a j,thenp(a1,...,a n)=p(π(a1,...,a n)).Proof.(1)Let a i=1and let i=j.Then a i=a j∨a⊥j and from the Proposition2.1.(1)follows that p(a i,...,a i−1,a⊥j,a j+1,...a n)=0.From the property(s3)we getp(a1,...,a i−1,1,a i+1,...a n)=p(a1,...,a i−1,a j,a i+1,...a n)+p(a1,...,a i−1,a⊥j,a i+1,...a n).And sop(a1,...,a i−1,1,a i+1,...a n)=p(a1,...,a i−1,a j,a i+1,...a 
n).(2)If n=2and a1=a2then it is clear that p(a1,a2)=p(a2,a1).Letn≥3and let1=i and i=n.Let a1=a n=a.It is enought to prove, thatp(a,a2,...,a i,...,a n−1,a)=p(a i,a2,...,a i−1,a,a i+1,...,a n−1,a).From the(1)we havep(a,a2,...,a i,...,a n−1,a)=p(1,a2,...,a i−1,a i,a i+1,...,a n−1,a).From it follows,thatp(1,a2,...,a i,...,a n−1,a)=p(a i,a2,...,a i−1,a i,a i+1,...,a n−1,a)andp(a i,a2,...,a i,...,a n−1,a)=p(a i,a2,...,a i−1,1,a i+1,...,a n−1,a).From the aditivity it follows,thatp(a i,a2,...,a i−1,1,a i+1,...,a n−1,a)=p(a i,a2,...,a i−1,a,a i+1,...,a n−1,a).Hencep(a,a2,...,a i−1,a i,a i+1,...,a n−1,a)=p(a i,a2,...,a i−1,a1,a i+1,...,a n−1,a n).From it follows that p(a1,...,a n)p(π(a1,...,a n),if there exist i,j,suchthat i=j and a i=a j.(3)Let a1↔a n.Thenp(a1,...,a n)=p(a1∧a n,a2,...,a n−1,a n∧a1).Because a1∧a n=a n∧a1and from the property(2)it follows,thatp(a1,...,a n)=p(π(a1,...,a n)).(Q.E.D.)LetΠ(¯a)be the set of all permutions and let¯a(i)=(a1,...,a k−1,a k,a k+1,...,a i−1,a k,a i+1,...,a n).(k)Corollary2.2.1Let L be an OML.Let p be an s n-map and let¯a∈L n.(1)If there exists i∈{1,...,n},such that a i=1,thenp(¯a)=p(¯b)for each¯b∈ kΠ(¯a(i)(k)).(2)If there exist i=j such that a i=a j,thenp(¯a)=p(¯b)for each¯b∈ kΠ(¯a(i)(k)).(3)If there exist i,j such that a i↔a j,thenp(¯a)=p(¯b)for each¯b∈ kΠ(¯a(i)(k)).Example2.1Let n=3and a,b∈L.If¯a=(a,a,b),thenΠ(¯a)={(a,a,b),(b,a,a),(a,b,a)}and¯a(1)(3)=(b,a,b),¯a(2)(3)=(a,b,b),¯a(1)(2)=(a,a,b).Hencep(a,a,b)=p(a,b,a)=p(b,a,a)=p(b,b,a)=p(a,b,b)=p(b,a,b).Let n=4and a,b,c∈L.If¯a=(a,b,c,c),then¯a(4)(2)=(a,b,c,b)and p(a,a,b,c)=p(a,b,c,a)=p(b,b,c,a)=...=p(c,a,b,c).3The joint distribution function and marginal distribution funtionsDefinition3.1Let L be an OML and let p be an s n-map.If x1,...,x2are observables on L,then the mapp x1,...,x n:B(R)n→[0,1],such thatp x1,...,x n(E1,...,E n)=p(x1(E1),...,x n(E n))is called the joint distribution of the observables x1,...,x n.Definition3.2Let L be an OML and let p be an s n-map.If x1,...,x2be 
observables on $L$, then the map $F_{x_1,\dots,x_n}: R^n \to [0,1]$ such that

$$F_{x_1,\dots,x_n}(r_1,\dots,r_n) = p(x_1(-\infty,r_1),\dots,x_n(-\infty,r_n))$$

is called the joint distribution function of the observables $x_1,\dots,x_n$.

Definition 3.3 Let $L$ be an OML and let $p$ be an $s_n$-map. If $x_1,\dots,x_n$ are observables on $L$, then a marginal distribution function is

$$\lim_{r_i\to\infty} F_{x_1,\dots,x_i,\dots,x_n}(r_1,\dots,r_i,\dots,r_n).$$

Definition 3.4 Let $L$ be an OML and let $p$ be an $s_n$-map. Let $x_1,\dots,x_n$ be observables on $L$ and $F_{x_1,\dots,x_n}$ be the joint distribution function of the observables $x_1,\dots,x_n$. Then we say that $F_{x_1,\dots,x_n}$ has the property of commutativity if for each $(r_1,\dots,r_n)\in R^n$

$$F_{x_1,\dots,x_n}(r_1,\dots,r_n) = F_{\pi(x_1,\dots,x_n)}(\pi(r_1,\dots,r_n)).$$

It is clear that $F_{x_1,\dots,x_n}$ has the property of commutativity if and only if

$$p(x_1(E_1),\dots,x_n(E_n)) = p(\pi(x_1(E_1),\dots,x_n(E_n)))$$

for each $E_i\in B(R)$, $i=1,\dots,n$.

Proposition 3.1 Let $L$ be an OML and let $p$ be an $s_n$-map. Let $x_1,\dots,x_n\in O$ and let $F_{x_1,\dots,x_n}(r_1,\dots,r_n)$ be the joint distribution function of the observables $x_1,\dots,x_n$.

(1) For each $(r_1,\dots,r_n)\in R^n$, $0\le F_{x_1,\dots,x_n}(r_1,\dots,r_n)\le 1$.

(2) If $r_i\le s_i$, then $F_{x_1,\dots,x_n}(r_1,\dots,r_i,\dots,r_n)\le F_{x_1,\dots,x_n}(r_1,\dots,s_i,\dots,r_n)$.

(3) For each $i=1,\dots,n$, $\lim_{r_i\to\infty}F_{x_1,\dots,x_n}(r_1,\dots,r_n)=F_{x_1,\dots,x_n}(r_1,\dots,r_{i-1},1,r_{i+1},\dots,r_n)$.

(4) For each $i=1,\dots,n$, $\lim_{r_i\to-\infty}F_{x_1,\dots,x_n}(r_1,\dots,r_n)=0$.

(5) If there exist $i,j$ such that $i\ne j$ and $x_i\leftrightarrow x_j$, then $F_{x_1,\dots,x_n}(r_1,\dots,r_n)=F_{\pi(x_1,\dots,x_n)}(\pi(r_1,\dots,r_n))$.

Proof. (1) It follows directly from the definition of the function $F_{x_1,\dots,x_n}$.

(2) Let $r_i\le s_i$. Then $(-\infty,r_i)\subseteq(-\infty,s_i)$ and so $x_i((-\infty,r_i))\le x_i((-\infty,s_i))$ and $x_i((-\infty,s_i))=x_i((-\infty,r_i))\vee x_i([r_i,s_i))$. From this it follows that

$$F_{x_1,\dots,x_n}(r_1,\dots,s_i,\dots,r_n)=F_{x_1,\dots,x_n}(r_1,\dots,r_i,\dots,r_n)+p(x_1((-\infty,r_1)),\dots,x_i([r_i,s_i)),\dots,x_n((-\infty,r_n)))$$

and so $F_{x_1,\dots,x_n}(r_1,\dots,s_i,\dots,r_n)\ge F_{x_1,\dots,x_n}(r_1,\dots,r_i,\dots,r_n)$.

(3) Because $x_i\in O$, there exists $r_{i0}\in R$ such that for any $r\ge r_{i0}$, $\sigma(x_i)\subseteq(-\infty,r)$ and so $x_i((-\infty,r))=1$. Hence

$$\lim_{r_i\to\infty}F_{x_1,\dots,x_n}(r_1,\dots,r_n)=F_{x_1,\dots,x_n}(r_1,\dots,r_{i-1},1,r_{i+1},\dots,r_n).$$

(4) Because $x_i\in O$, there exists $r_{i0}\in R$ such that for each $r\le r_{i0}$, $(-\infty,r)\cap\sigma(x_i)=\emptyset$ and so $x_i((-\infty,r))=0$. Hence $\lim_{r_i\to-\infty}F_{x_1,\dots,x_n}(r_1,\dots,r_n)=0$.

(5) Because $F_{x_1,\dots,x_n}(r_1,\dots,r_n)=p(x_1((-\infty,r_1)),\dots,x_n((-\infty,r_n)))$, it follows directly from Proposition 2.2. (Q.E.D.)

Proposition 3.2 Let $L$ be an OML and let $p$ be an $s_n$-map. Let $x_1,\dots,x_n\in O$ and let $F_{x_1,\dots,x_n}(r_1,\dots,r_n)$ be the joint distribution function. The compatibility of just two observables implies the total commutativity.

Proof. It follows directly from the definition of the joint distribution function and from Proposition 2.2.

Proposition 3.3 Let $L$ be an OML and let $x_1,\dots,x_n\in O$. Then there exist a probability space $(\Omega,\Im,P)$ and random variables $\xi_1,\dots,\xi_n$ on it such that

$$F_{x_1,\dots,x_n}(r_1,\dots,r_n)=F_{\xi_1,\dots,\xi_n}(r_1,\dots,r_n)$$

and $P_{\xi_i}$, given by $P_{\xi_i}((-\infty,r))=\nu(x_i(-\infty,r))$ for $r\in R$ and $i=1,\dots,n$, is the probability distribution of the random variable $\xi_i$.

Proof. Let $\Omega=\sigma(x_1)\times\dots\times\sigma(x_n)$ and let $\Im=2^{\Omega}$. Each $\omega\in\Omega$ has the form $\omega=(r_1,\dots,r_n)$; let $\xi_i(\omega_1,\dots,\omega_n)=\omega_i$. Let $A\in\Im$ and let $P:\Im\to[0,1]$ be such that

$$P(A)=\sum_{\omega\in A}p(x_1(\xi_1(\omega)),\dots,x_n(\xi_n(\omega))).$$

It is clear that $P(\emptyset)=0$ and $P(\Omega)=P(\sigma(x_1)\times\dots\times\sigma(x_n))=1$. Let $A,B\in\Im$ be such that $A\cap B=\emptyset$. Then

$$P(A\cup B)=\sum_{\omega\in A\cup B}p(x_1(\xi_1(\omega)),\dots,x_n(\xi_n(\omega)))=\sum_{\omega\in A}p(x_1(\xi_1(\omega)),\dots,x_n(\xi_n(\omega)))+\sum_{\omega\in B}p(x_1(\xi_1(\omega)),\dots,x_n(\xi_n(\omega))),$$

so $P(A\cup B)=P(A)+P(B)$. Because $\Omega$ is a finite set, $P$ is a $\sigma$-additive measure, and so $(\Omega,\Im,P)$ has the same properties as a classical probability space and $\xi_i:\Omega\to R$ is a measurable function on it. For each $r\in R$,

$$P_{\xi_1}((-\infty,r))=P(\xi_1^{-1}(-\infty,r))=P((-\infty,r)\times\sigma(x_2)\times\dots\times\sigma(x_n))$$

and then

$$P_{\xi_1}(-\infty,r)=\sum_{\omega\in(-\infty,r)\times\sigma(x_2)\times\dots\times\sigma(x_n)}p(x_1(\xi_1(\omega)),\dots,x_n(\xi_n(\omega))).$$

From this it follows that $P_{\xi_1}(-\infty,r)=p(x_1((-\infty,r)),1,\dots,1)=\nu(x_1(-\infty,r))$. From the definition of the marginal distribution function it follows that $\nu(x_i(-\infty,r))=F_{\xi_i}(r)$ is the distribution function for $\xi_i$ and that

$$F_{x_1,\dots,x_n}(r_1,\dots,r_n)=F_{\xi_1,\dots,\xi_n}(r_1,\dots,r_n)$$

is a joint distribution for the vector of random variables $(\xi_1,\dots,\xi_n)$. (Q.E.D.)

If we consider a quantum model as an OML, a marginal distribution function defined by using an $s_n$-map has the property of commutativity. It follows that, in general, it need not be true that

$$F_{x_1,\dots,x_n}(t_1,\dots,t_n)=F_{x_1,\dots,x_{n+1}}(t_1,\dots,t_n,\infty),$$

where $F_{x_1,\dots,x_n}(t_1,\dots,t_n)$ and $F_{x_1,\dots,x_{n+1}}(t_1,\dots,t_n,\infty)$ are joint distribution functions and $x_1,\dots,x_{n+1}$ are observables on $L$. Consequently, we can find an $s_n$-map and an $s_{n+1}$-map such that $p(a_1,\dots,a_n)=p(a_1,\dots,a_n,1)$ on $L$. Moreover, if $p(a_1,\dots,a_n)=p(a_1,\dots,a_n,1)$ on $L$, then the $s_n$-map has the property of commutativity. This is not true in general, either ([8],[9]).

Example 3.1 Let $L=\{a,a^\perp,b,b^\perp,c,c^\perp,0,1\}$. Let $x,y,z\in O$. Let $\sigma(x)=\sigma(y)=\sigma(z)=\{-1,1\}$. Let $x(1)=a$, $y(1)=b$ and $z(1)=c$. Let an $s_3$-map be defined in the following way:

p(a,a,a)=0.3, p(b,b,b)=0.4, p(c,c,c)=0.5,
p(a,b,1)=0.1, p(a,b⊥,1)=0.2, p(a⊥,b,1)=0.3, p(a⊥,b⊥,1)=0.4,
p(a,c,1)=0.2, p(a,c⊥,1)=0.1, p(a⊥,c,1)=0.3, p(a⊥,c⊥,1)=0.4,
p(b,c,1)=0.2, p(b,c⊥,1)=0.2, p(b⊥,c,1)=0.3, p(b⊥,c⊥,1)=0.3,
p(a,b,c)=0, p(a,b,c⊥)=0.1, p(a,b⊥,c)=0.2, p(a,b⊥,c⊥)=0,
p(a⊥,b,c)=0.2, p(a⊥,b,c⊥)=0.1, p(a⊥,b⊥,c)=0.1, p(a⊥,b⊥,c⊥)=0.3,
p(b,a,c)=0.1, p(b,a,c⊥)=0, p(b⊥,a,c)=0.1, p(b⊥,a,c⊥)=0.1,
p(b,a⊥,c)=0.1, p(b,a⊥,c⊥)=0.2, p(b⊥,a⊥,c)=0.2, p(b⊥,a⊥,c⊥)=0.2,
p(c,a,b)=0.01, p(c,a,b⊥)=0.19, p(c,a⊥,b)=0.19, p(c,a⊥,b⊥)=0.11,
p(c⊥,a,b)=0.09, p(c⊥,a,b⊥)=0.01, p(c⊥,a⊥,b)=0.11, p(c⊥,a⊥,b⊥)=0.29,
p(a,b,c)=p(a,c,b), p(b,a,c)=p(b,c,a), p(c,a,b)=p(c,b,a),
...
p(a⊥,b⊥,c⊥)=p(a⊥,c⊥,b⊥), p(b⊥,a⊥,c⊥)=p(b⊥,c⊥,a⊥), p(c⊥,a⊥,b⊥)=p(c⊥,b⊥,a⊥).
Then p is an $s_3$-map and

$F_{x_1,x_2,x_3}(1,1,1)$ = p(a⊥,b⊥,c⊥) = 0.3,
$F_{x_2,x_1,x_3}(1,1,1)$ = p(b⊥,a⊥,c⊥) = 0.2,
$F_{x_3,x_2,x_1}(1,1,1)$ = p(c⊥,b⊥,a⊥) = 0.29,

$$\lim_{r_1\to\infty}F_{x_1,x_2,x_3}(r_1,r_2,r_3)=p(1,y(r_2),z(r_3))=p(1,z(r_3),y(r_2))=\lim_{r_1\to\infty}F_{x_1,x_3,x_2}(r_1,r_3,r_2),$$

where $r_2,r_3\in R$.

References

[1] A. Dvurečenskij, S. Pulmannová, New Trends in Quantum Structures, Kluwer Acad. Publ., (2000).
[2] A. Yu. Khrennikov, Contextual viewpoint to quantum stochastics, J. Math. Phys., 44, N. 6, 2471-2478 (2003).
[3] A. Yu. Khrennikov, Contextual viewpoint to quantum stochastics, J. Math. Phys., 44, N. 6, 2471-2478 (2003).
[4] A. Yu. Khrennikov, Representation of the Kolmogorov model having all distinguishing features of quantum probabilistic model, Phys. Lett. A, 316, 279-296 (2003).
[5] A. N. Kolmogoroff, Grundbegriffe der Wahrscheinlichkeitsrechnung, Springer, Berlin, (1933).
[6] O. Nánásiová, A note on the independent events on a quantum logic, Busefal, vol. 76, 53-57, (1998).
[7] O. Nánásiová, Principle conditioning, Sent to Int. Jour. of Theor. Phys., (2001).
[8] O. Nánásiová, Map for Simultaneous Measurements for a Quantum Logic, Int. Journ. of Theor. Phys., Vol. 42, No. 8, 1889-1903, (2003).
[9] O. Nánásiová, A. Yu. Khrennikov, Observables on a quantum logic, Foundation of Probability and Physics-2, Ser. Math. Modelling in Phys., Engin., and Cogn. Sc., vol. 5, 417-430, Växjö Univ. Press, (2002).
[10] V. Varadarajan, Geometry of quantum theory, Princeton, New Jersey, D. Van Nostrand, (1968).
International Financial Management: End-of-Chapter Answers, Chapter 10
CHAPTER 10 MANAGEMENT OF TRANSLATION EXPOSURE
SUGGESTED ANSWERS AND SOLUTIONS TO END-OF-CHAPTER QUESTIONS AND PROBLEMS
QUESTIONS
1. Explain the difference in the translation process between the monetary/nonmonetary method and the temporal method.
Answer: Under the monetary/nonmonetary method, all monetary balance sheet accounts of a foreign subsidiary are translated at the current exchange rate. Other balance sheet accounts are translated at the historical exchange rate in effect when the account was first recorded. Under the temporal method, monetary accounts are translated at the current exchange rate. Other balance sheet accounts are also translated at the current rate if they are carried on the books at current value. If they are carried at historical value, they are translated at the rate in effect on the date the item was put on the books. Since fixed assets and inventory are usually carried at historical cost, the temporal method and the monetary/nonmonetary method will typically provide the same translation.
2.
Marginal Fisher Analysis

In economics, Marginal Fisher Analysis is a technique used to assess the economic performance of a firm or industry by analyzing the Marginal Fisher ratio. The Marginal Fisher ratio measures the incremental change in a firm's profitability relative to a change in its level of investment. This analysis helps in evaluating the efficiency and profitability of a firm in allocating its resources and making investment decisions.

The first step in conducting a Marginal Fisher Analysis is to gather the necessary financial data of the firm or industry under consideration. This data includes the firm's income statement, balance sheet, and cash flow statement. It is important to consider multiple years of data to establish any underlying trends before conducting the analysis.

Once the financial data is collected, the next step is to calculate the components of the Marginal Fisher ratio. The ratio is calculated using the formula:

Marginal Fisher Ratio = (Change in Profitability) / (Change in Investment)

To calculate the change in profitability, the net income of the previous period is subtracted from the net income of the current period. This measures how much the firm's profitability has changed over time.

To calculate the change in investment, the total investment of the previous period is subtracted from the total investment of the current period. This measures the change in the firm's investment level over time.

Once the components are calculated, the ratio is obtained by dividing the change in profitability by the change in investment. A ratio greater than 1 indicates that incremental investment has resulted in a more than proportionate increase in profitability, indicating higher efficiency in resource allocation.

A ratio less than 1 suggests that the firm's investment decisions have not been effective in generating profitability, as the change in investment has not resulted in a proportionate change in profitability. This may indicate inefficient resource allocation or poor investment decisions.

However, a single Marginal Fisher ratio may not provide a complete picture of a firm's performance. Multiple periods of analysis are usually necessary to identify any underlying trends or patterns.

In addition, it is important to compare the Marginal Fisher ratio with industry benchmarks or competitor performance to assess the relative performance of the firm. A Marginal Fisher ratio higher than the industry average indicates better performance in resource allocation and investment decisions.

Moreover, Marginal Fisher Analysis can be used to determine the optimal level of investment for a firm. By analyzing the ratio at different investment levels, it is possible to identify the point at which increasing investment no longer leads to a proportionate increase in profitability. This helps in avoiding overinvestment or underinvestment.

Overall, Marginal Fisher Analysis provides insight into a firm's performance, resource allocation efficiency, and investment decision-making. By analyzing the Marginal Fisher ratio, managers can identify areas of improvement and make informed decisions regarding investment allocation.
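The ratio described above can be sketched in a few lines of Python. This is only an illustration of the formula as the text states it; the function name `marginal_fisher_ratio` and its parameter names are hypothetical, and the figures in the usage lines are made up for the example.

```python
def marginal_fisher_ratio(net_income_prev, net_income_curr,
                          investment_prev, investment_curr):
    """Change in profitability divided by change in investment.

    Both changes are measured as current period minus previous period.
    """
    change_in_profitability = net_income_curr - net_income_prev
    change_in_investment = investment_curr - investment_prev
    if change_in_investment == 0:
        # The ratio is undefined when the investment level did not change.
        raise ValueError("change in investment is zero; ratio is undefined")
    return change_in_profitability / change_in_investment


# Illustrative (made-up) figures: net income rises from 100 to 130
# while total investment rises from 500 to 520.
ratio = marginal_fisher_ratio(100, 130, 500, 520)
efficient = ratio > 1  # the text reads a ratio above 1 as efficient allocation
```

Raising an error when investment is unchanged is a design choice of this sketch; the text does not say how that case should be treated.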
Singular Values and Eigenvalues of Tensors: A Variational Approach
Perry Mehrling (Columbia University), Money and Banking, Chinese-English translation 4: The Money View, Micro and Macro

(see full matrix at beginning) Notable features: household deleveraging, switching from credit to money, instrument discrepancy is repo, sectoral discrepancies.

Last time we saw how the US banking system was born from the strains of war finance and financial crisis, and we also saw how understanding balance sheet relationships can help us to understand the underlying processes. Today we focus more specifically on the balance sheet approach that will be used throughout the course, and to aid that focus we confine our discussion to the most placid of events, namely the use of the banking system to facilitate ordinary daily exchange.
Level Set Methods: An Overview and Some Recent Results

Stanley Osher, Ronald P. Fedkiw

September 5, 2000

Abstract

The level set method was devised by Osher and Sethian in [64] as a simple and versatile method for computing and analyzing the motion of an interface $\Gamma$ in two or three dimensions. $\Gamma$ bounds a (possibly multiply connected) region $\Omega$. The goal is to compute and analyze the subsequent motion of $\Gamma$ under a velocity field $\vec v$. This velocity can depend on position, time, the geometry of the interface and the external physics. The interface is captured for later time as the zero level set of a smooth (at least Lipschitz continuous) function $\varphi(\vec x,t)$, i.e., $\Gamma(t)=\{\vec x \mid \varphi(\vec x,t)=0\}$. $\varphi$ is positive inside $\Omega$, negative outside $\Omega$ and is zero on $\Gamma(t)$. Topological merging and breaking are well defined and easily performed.

In this review article we discuss recent variants and extensions, including the motion of curves in three dimensions, the Dynamic Surface Extension method, fast methods for steady state problems, diffusion generated motion and the variational level set approach. We also give a user's guide to the level set dictionary and technology, couple the method to a wide variety of problems involving external physics, such as compressible and incompressible (possibly reacting) flow, Stefan problems, kinetic crystal growth, epitaxial growth of thin films, vortex dominated flows and extensions to multiphase motion. We conclude with a discussion of applications to computer vision and image processing.

1 Introduction

The original idea behind the level set method was a simple one. Given an interface $\Gamma$ in $R^n$ of codimension one, bounding a (perhaps multiply connected) open region $\Omega$, we wish to analyze and compute its subsequent motion under a velocity field $\vec v$. This velocity can depend on position, time, the geometry of the interface (e.g. its normal or its mean curvature) and the external physics. The idea, as devised in 1987 by S. Osher and J. A.
Sethian [64], is merely to define a smooth (at least Lipschitz continuous) function $\varphi(x,t)$ that represents the interface as the set where $\varphi(x,t)=0$. Here $x=(x_1,\dots,x_n)\in R^n$. The level set function $\varphi$ has the following properties:

$$\varphi(x,t)>0 \ \text{for}\ x\in\Omega,\qquad \varphi(x,t)<0 \ \text{for}\ x\notin\bar\Omega,\qquad \varphi(x,t)=0 \ \text{for}\ x\in\partial\Omega=\Gamma(t).$$

Thus, the interface is to be captured for all later time by merely locating the set $\Gamma(t)$ for which $\varphi$ vanishes. This deceptively trivial statement is of great significance for numerical computation, primarily because topological changes such as breaking and merging are well defined and performed "without emotional involvement".

The motion is analyzed by convecting the $\varphi$ values (levels) with the velocity field $\vec v$. This elementary equation is

$$\frac{\partial\varphi}{\partial t}+\vec v\cdot\nabla\varphi=0. \qquad (1)$$

Only the normal component of the velocity, $v_N=\vec v\cdot\frac{\nabla\varphi}{|\nabla\varphi|}$, is needed, so (1) becomes

$$\frac{\partial\varphi}{\partial t}+v_N|\nabla\varphi|=0. \qquad (2)$$

We emphasize that all this is easy to implement in the presence of boundary singularities, topological changes, and in 2 or 3 dimensions. Moreover, in the case in which $v_N$ is a function of the direction of the unit normal (as in kinetic crystal growth [62], and Uniform Density Island Dynamics [15], [36]), equation (2) becomes the first order Hamilton-Jacobi equation

$$\frac{\partial\varphi}{\partial t}+|\nabla\varphi|\,v_N\!\left(\frac{\nabla\varphi}{|\nabla\varphi|}\right)=0. \qquad (3)$$

High order accurate, essentially non-oscillatory discretizations to general Hamilton-Jacobi equations including (3) were obtained in [64], see also [65] and [43].

Theoretical justification of this method for geometric based motion came through the theory of viscosity solutions for scalar time dependent partial differential equations [23], [30]. The notion of viscosity solution (see e.g. [8, 27]), which applies to a very wide class of these equations, including those derived from geometric based motions, enables users to have confidence that their computer simulations give accurate, unique solutions. A particularly interesting result is in [29] where motion by mean curvature, as defined by Osher and Sethian in [64], is shown to be essentially the same motion as is obtained from the asymptotics in the phase field reaction diffusion equation.
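As a rough illustration of the convection equation $\varphi_t+\vec v\cdot\nabla\varphi=0$ above, the following sketch moves a circle with a constant velocity on a uniform grid. It is a minimal example under stated assumptions, not the high order ENO discretization the paper refers to: it uses first-order upwind differences, a constant velocity, and periodic wrap-around from `np.roll` as a convenience (harmless here because the interface stays far from the boundary). The function name `advect_level_set` and all parameter values are hypothetical.

```python
import numpy as np


def advect_level_set(phi, vx, vy, dx, dt, steps):
    """First-order upwind update for phi_t + v . grad(phi) = 0, constant v."""
    phi = phi.copy()
    for _ in range(steps):
        # One-sided differences (np.roll wraps periodically at the boundary).
        dxm = (phi - np.roll(phi, 1, axis=0)) / dx   # backward in x
        dxp = (np.roll(phi, -1, axis=0) - phi) / dx  # forward in x
        dym = (phi - np.roll(phi, 1, axis=1)) / dx   # backward in y
        dyp = (np.roll(phi, -1, axis=1) - phi) / dx  # forward in y
        # Upwinding: take the difference on the side the flow comes from.
        phi_x = dxm if vx > 0 else dxp
        phi_y = dym if vy > 0 else dyp
        phi = phi - dt * (vx * phi_x + vy * phi_y)
    return phi


# Circle of radius 0.5 centred at (-0.5, 0); phi > 0 inside, as in the text.
n = 100
dx = 4.0 / n
g = -2.0 + dx * np.arange(n)
X, Y = np.meshgrid(g, g, indexing="ij")
phi0 = 0.5 - np.sqrt((X + 0.5) ** 2 + Y ** 2)

# Move with velocity (1, 0) up to time 0.5 (25 steps at CFL number 0.5).
phi = advect_level_set(phi0, vx=1.0, vy=0.0, dx=dx, dt=0.5 * dx, steps=25)

# The region {phi > 0} should now be centred near the origin.
inside = phi > 0
cx, cy = X[inside].mean(), Y[inside].mean()
```

The interface is never tracked explicitly: its position is read off afterwards from the sign of `phi`, which is the point of the level set representation.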
The motion in the level set method involves no superfluous stiffness as is required in phase field models. As was proven in [53], this stiffness due to a singular perturbation involving a small parameter will lead to incorrect answers as in [48], without the use of adaptive grids [59]. This is not an issue in the level set approach.

The outline of this paper is as follows: In section 2 we present recent variants, extensions and a rather interesting selection of related fast numerical methods. This section might be skipped at first, especially by newcomers to this subject. Section 3 contains the key definitions and basic level set technology, as well as a few words about the numerical implementation. Section 4 describes applications in which the moving interfaces are coupled to external physics. Section 5 concerns the variational level set approach with applications to multiphase (as opposed to two phase) problems. Section 6 gives a very brief introduction to the ever-increasing use of the level set method and related methods in image analysis.

2 Recent Variants, Extensions and Related Fast Methods

2.1 Motion of Curves in Three Spatial Dimensions

In this section we discuss several new and related techniques and fast numerical methods for a class of Hamilton-Jacobi equations. These are all relatively recent developments and less experienced readers might skip this section at first.

As mentioned above, the level set method was originally developed for curves in $R^2$ and surfaces in $R^3$. Attempts have been made to modify it to handle objects of high codimension. Ambrosio and Soner [5] were interested in moving a curve in $R^3$ by curvature. They used the squared distance to the curve as the level set function, thus fixing the curve as the zero level set, and evolved the curve by solving a PDE for the level set function. The main problem with this approach is that one of the most significant advantages of the level set method, the ability to easily handle merging and pinching, does not carry over. A phenomenon
called "thickening" emerges, where the curve develops an interior.

Attempts have also been made in other directions, such as front tracking, e.g. see [41]. This is where the curve is parameterized and then numerically represented by discrete points. The problem with this approach lies in finding when merging and pinching will occur and in reparameterizing the curve when it does.

The representation we derived in [13] makes use of two level set functions to model a curve in $R^3$, an approach Ambrosio and Soner also suggested but did not pursue because the theoretical aspects become very difficult. In this formulation, a curve is represented by the intersection between the zero level sets of two level set functions $\phi$ and $\psi$, i.e., where $\phi=\psi=0$. From this, many properties of the curve can be derived, such as the tangent vectors,

$$\vec T=\frac{\nabla\psi\times\nabla\phi}{|\nabla\psi\times\nabla\phi|}.$$

A simple example involves moving the curve according to its curvature vectors, for which $\vec v=\kappa\vec N$. We have shown that this system can also be obtained by applying a gradient descent algorithm minimizing the length of the curve,

$$L(\phi,\psi)=\int_{R^3}|\nabla\psi\times\nabla\phi|\,\delta(\psi)\,\delta(\phi)\,d\vec x.$$

This follows the general procedure derived in [88] for the variational level set method for codimension one motion, also described in [90]. Numerical simulations performed in [13] on this system of PDEs, and shown in figures 1 and 2, show that merging and pinching off are handled automatically and follow curve shortening principles.

We repeat the observation made above that makes this sort of motion easily accessible to this vector valued level set method. Namely, all geometric properties of a curve $\Gamma$ which is expressed as the zero level set of the vector equation

$$\phi(x,y,z,t)=0,\qquad \psi(x,y,z,t)=0$$

can easily be obtained numerically by computing discrete gradients and higher derivatives of the functions $\phi$ and $\psi$ restricted to their common zero level set.

This method will be used to simulate the dynamics of defect lines as they arise in heteroepitaxy of non-lattice matched materials, see [79] and [80] for Lagrangian calculations.

An interesting variant of the level set method
for geometry based motion was introduced in [53] as diffusion generated motion, and has now been generalized to forms known as convolution generated motion or threshold dynamics. This method splits the reaction diffusion approach into two highly simplified steps. Remarkably, a vector valued generalization of this approach, as in the vector valued level set method described above, gives an alternative approach [74] to easily compute the motion (and merging) of curves moving normal to themselves in three dimensions with velocity equal to their curvature.

2.2 Dynamic Surface Extension (DSE)

Another fixed grid method for capturing the motion of self-intersecting interfaces was obtained in [73]. This is a fixed grid, interface capturing formulation based on the Dynamic Surface Extension (DSE) method of Steinhoff et al. [82]. The latter method was devised as an alternative to the level set method of Osher and Sethian [64] which is needed to evolve wavefronts according to geometric optics. The problem is that the wavefronts in this case are supposed to pass through each other, not merge as in the viscosity solution case. Ray-tracing can be used but the markers tend to diverge, which leads to loss of resolution and aliasing.

The original (ingenious) DSE method was not well suited to certain fundamental self intersection problems such as formation of swallowtails. In [73] we extended the basic DSE scheme to handle this fundamental problem, as well as all other complex intersections.

The method is designed to track moving sets $\Gamma$ of points of arbitrary (perhaps changing) codimension; moreover, there is no concept of "inside" or "outside". The method is, in some sense, dual to the level set method. In the latter, the distance representation is constant tangential to a surface.
In the DSE method, the closest point to a surface is constant in directions orthogonal to the surface.

The version of DSE presented in [73] can be described as follows: For each point $\vec x$ in $R^n$, set the tracked point $TP(\vec x)$ equal to $CP(\vec x)$, the closest point (to $\vec x$) on the initial surface $\Gamma_0$. Set $\vec N$ equal to the surface normal at the tracked point $TP(\vec x)$. Let $\vec v(TP(\vec x))$ be the velocity of the tracked point.

Repeat for all steps:

(1) Evolve the tracked point $TP(\vec x)$ according to the local dynamics $TP(\vec x)_t=\vec v(TP(\vec x))$.

(2) Extend the surface representation by resetting each tracked point $TP(\vec x)$ equal to the true closest point $CP(\vec x)$ on the updated surface $\Gamma$, where $\Gamma$ is defined to be the locus of all tracked points, i.e. $\Gamma=\{TP(\vec x)\mid \vec x\in R^n\}$. Replace each $\vec N(\vec x)$ by the normal at the updated $TP(\vec x)$.

This method treats self intersection by letting moving sets pass through each other. This is one of its main virtues in the ray tracing case. However, it has other virtues, namely the generality of the moving set: curves can end or change dimension.

An important extension is motivated by considering first arrival times.
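The two-step DSE iteration described above can be sketched for a toy case: a circle expanding with unit normal speed. This is a simplified illustration, not the scheme of [73]. It assumes the surface normal is known analytically (radial, for a circle centred at the origin) and it approximates the "true closest point on the updated surface" by the nearest tracked point, found by brute force. The function name `dse_expand_circle` and its parameters are hypothetical.

```python
import numpy as np


def dse_expand_circle(radius=1.0, speed=1.0, dt=0.05, steps=10, n=20):
    """Toy Dynamic Surface Extension for a circle moving along its normal."""
    # Fixed grid; an even n on a symmetric interval keeps the origin
    # (where the closest point on a centred circle is undefined) off the grid.
    g = np.linspace(-2.0, 2.0, n)
    X, Y = np.meshgrid(g, g, indexing="ij")
    pts = np.stack([X.ravel(), Y.ravel()], axis=1)

    # Initialise TP(x) = CP(x): closest point on the initial circle.
    tp = radius * pts / np.linalg.norm(pts, axis=1, keepdims=True)

    for _ in range(steps):
        # (1) Evolve each tracked point with the local dynamics TP_t = v(TP).
        #     Assumption: the normal at TP is radial for this toy problem.
        normals = tp / np.linalg.norm(tp, axis=1, keepdims=True)
        tp = tp + dt * speed * normals

        # (2) Extension: reset TP(x) to the closest point of the updated
        #     locus of tracked points (nearest-sample approximation).
        d2 = ((pts[:, None, :] - tp[None, :, :]) ** 2).sum(axis=2)
        tp = tp[np.argmin(d2, axis=1)]
    return pts, tp


pts, tp = dse_expand_circle()
radii = np.linalg.norm(tp, axis=1)  # all tracked points lie on the new circle
```

Note that nothing in the loop refers to an inside or an outside of the curve; the representation is just a closest-point map on the grid, which is what lets the method handle sets that end or change dimension.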
This extension enables us to easily compute swallowtails, for example, and other singular points. We actually use a combination of distance and direction of motion. One interesting choice arises when nodal values of $TP(\vec x)$ are set equal to the "Minimizing Point"

$$MP(\vec x)=\min_{\vec y\in\text{Interface}}\left\{\beta\,\big|(\vec x-\vec y)\cdot\vec N^{\perp}(\vec y)\big|+\|\vec x-\vec y\|_2\right\}$$

for $\beta>0$ (rather than $CP(\vec x)$), since a good agreement with the minimal arrival time representation is found near the surface. Recall that the minimal arrival time at a point $\vec x$ is the shortest time it takes a ray emanating from the surface to reach $\vec x$. Using this idea gives a very uniform approximation and naturally treats the prototype swallowtail problem.

For the special case of curvature dependent motion we may use an elegant observation of DeGiorgi [28]. Namely, the vector mean curvature for a surface of arbitrary codimension is given by

$$\kappa\vec N=-\Delta\nabla\left(\frac{d^2}{2}\right),$$

where $d$ is the distance to the surface. [...] as in [66] for the level set method would be of great practical importance. It would be particularly interesting to determine if surfaces fatten (or develop interiors) when mergers occur. See [9] for a detailed discussion of this phenomenon.

Additionally, in [73] we successfully calculated a geometric optics expansion by retaining the wave front curvature. Thus this method has the possibility of being quite useful in electromagnetic calculations. We hope to investigate its three dimensional performance and include the effects of diffraction.

2.3 A Class of Fast Hamilton-Jacobi Solvers

Another important set of numerical algorithms involves the fast solution of steady (time independent) Hamilton-Jacobi equations. We also seek methods which are faster than the globally defined schemes originally used to solve equation (2).

The level set method of Osher and Sethian [64] for time dependent problems can be localized. This means that the problem

$$\varphi_t+\vec v\cdot\nabla\varphi=0$$

with $\Gamma(t)=\{\vec x\mid\varphi(\vec x,t)=0\}$ as the evolving front can be solved locally near $\Gamma(t)$. Several algorithms exist for doing this, see [66] and [2]. These both report an $O(N)$ algorithm where $N$ is the total number of grid
points on or near the front. However, the algorithm in [66] has $O(N\log(N))$ complexity because a partial differential equation based reinitialization step requires log(1 [...]

[...] then we proved in [63] that the $t$ level set $\{\vec x\mid\psi(\vec x)=t\}=\Gamma'(t)$ is the same as the zero level set $\Gamma(t)$ of $\varphi(\vec x,t)$, for $t>0$, where $\varphi$ satisfies [...] $a(\vec x)>0$. So we find first arrival times instead of zero level sets.

In [86] J. N. Tsitsiklis devised a fast algorithm for the eikonal equation. He obtained the viscosity solution using ideas involving Dijkstra's algorithm, adapted to the eikonal equation, heap sort and control theory. From a numerical PDE point of view, however, Tsitsiklis had an apparently nonstandard approximation to $|\nabla\psi|$ on a uniform Cartesian grid.

In 1995 Sethian [76] and Helmsen et al. [40] independently published what appeared to be a simpler algorithm making use of the Rouy-Tourin algorithm to approximate $|\nabla\varphi|$. This has become known as the "fast marching method". However, together with Helmsen [61] we have proven that Tsitsiklis' approximation is the usual Rouy-Tourin [69] version of Godunov's monotone upwind scheme. That is, the algorithm in [76] and [40] is simply Tsitsiklis' algorithm with a different (simpler) exposition.

Our goal here is to extend the applicability of this idea from the eikonal equation to any geometrically based Hamiltonian. By this we mean a Hamiltonian satisfying the properties:

$$H(\vec x,\nabla\psi)>0 \ \text{ if } \nabla\psi\ne 0 \qquad (4)$$

and

$$H(\vec x,\nabla\psi) \ \text{is homogeneous of degree one in } \nabla\psi. \qquad (5)$$

We wish to obtain a fast algorithm to approximate the viscosity solution of

$$\tilde H(\vec x,\nabla\psi)=H(\vec x,\nabla\psi)-a(\vec x)=0. \qquad (6)$$

The first step is to set up a monotone upwind scheme to approximate this problem. Such a scheme is based on the idea of Godunov used in the approximation of conservation laws. In Bardi and Osher [7], see also [65], the following was obtained (for simplicity we exemplify using two space dimensions and ignore the explicit $\vec x$ dependence in the Hamiltonian):

$$H(\psi_x,\psi_y)\approx H_G(D^x_+\psi,D^x_-\psi;\,D^y_+\psi,D^y_-\psi)=\operatorname*{ext}_{u\in I(u^-,u^+)}\ \operatorname*{ext}_{v\in I(v^-,v^+)}H(u,v)$$

where

$$I(a,b)=[\min(a,b),\max(a,b)],$$

$$\operatorname*{ext}_{u\in I(a,b)}=\begin{cases}\min_{a\le u\le b} & \text{if } a\le b\\ \max_{b\le u\le a} & \text{if } a>b\end{cases}$$

$$u^{\pm}=D^x_{\pm}\psi_{ij}=\pm\frac{\psi_{i\pm1,j}-\psi_{ij}}{\Delta x},\qquad v^{\pm}=D^y_{\pm}\psi_{ij}=\pm\frac{\psi_{i,j\pm1}-\psi_{ij}}{\Delta y}.$$

(Note, the order may be reversed in the ext operations above; we always obtain a monotone upwind scheme which is often, but not always, order invariant [65].)

This is a monotone upwind scheme which is obtained through the Godunov procedure involving Riemann problems, extended to general Hamilton-Jacobi equations [7], [65].

If we approximate $H(\nabla\psi)=a(x,y)$ by

$$H_G(D^x_+\psi,D^x_-\psi;\,D^y_+\psi,D^y_-\psi)=a(x,y) \qquad (7)$$

for Hamiltonians satisfying (4), (5) above, then there exists a unique solution for $\psi_{i,j}$ in terms of $\psi_{i\pm1,j}$, $\psi_{i,j\pm1}$ and $\psi_{i,j}$. Furthermore, $\psi_{i,j}$ is a nondecreasing function of all these variables.

However, the fast algorithm needs to have property F: The solution to (7) depends on the neighboring $\psi_{\mu,\nu}$ only for $\psi_{\mu,\nu}<\psi_{i,j}$. This gives us a hint as to how to proceed.

For special Hamiltonians of the form $H(u,v)=F(u^2,v^2)$, with $F$ nondecreasing in these variables, we have the following result [61]:

$$H_G(u^+,u^-;v^+,v^-)=F\big(\max((u^{-+})^2,(u^{+-})^2);\ \max((v^{-+})^2,(v^{+-})^2)\big) \qquad (8)$$

where $x^+=\max(x,0)$, $x^-=\min(x,0)$, so that $u^{-+}=\max(u^-,0)$ and $u^{+-}=\min(u^+,0)$. It is easy to see that this numerical Hamiltonian has property F described above. This formula, as well as the one obtained in equation (10) below, enables us to extend the fast marching method algorithm to a much wider class than was done before. For example, using this observation we were able to solve an etching problem, also considered in [3] where the authors did not use a fast marching method algorithm, but instead used a local narrow band approach and schemes devised in [64].
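To make formula (8) concrete, the sketch below evaluates the Godunov numerical Hamiltonian for the simplest member of that class, the eikonal case $F(u^2,v^2)=\sqrt{u^2+v^2}$, at the interior nodes of a uniform grid. It is an illustrative evaluation only, not a fast marching solver; the function name `godunov_eikonal` and the test function are assumptions made for the example.

```python
import numpy as np


def godunov_eikonal(psi, dx, dy):
    """Formula (8) with F(u^2, v^2) = sqrt(u^2 + v^2), at interior nodes.

    psi is indexed as psi[i, j] with i along x and j along y.
    """
    c = psi[1:-1, 1:-1]
    u_plus = (psi[2:, 1:-1] - c) / dx    # D^x_+ psi
    u_minus = (c - psi[:-2, 1:-1]) / dx  # D^x_- psi
    v_plus = (psi[1:-1, 2:] - c) / dy    # D^y_+ psi
    v_minus = (c - psi[1:-1, :-2]) / dy  # D^y_- psi

    # max(((u^-)^+)^2, ((u^+)^-)^2): only neighbours with smaller psi
    # contribute, which is property F from the text.
    a = np.maximum(np.maximum(u_minus, 0.0) ** 2, np.minimum(u_plus, 0.0) ** 2)
    b = np.maximum(np.maximum(v_minus, 0.0) ** 2, np.minimum(v_plus, 0.0) ** 2)
    return np.sqrt(a + b)


# For a linear function psi = 3x + 4y the exact |grad psi| is 5.
g = np.linspace(0.0, 1.0, 11)
X, Y = np.meshgrid(g, g, indexing="ij")
H = godunov_eikonal(3.0 * X + 4.0 * Y, dx=g[1] - g[0], dy=g[1] - g[0])

# At a local minimum (psi = |x|, node x = 0) no neighbour is smaller,
# so the upwind Hamiltonian vanishes there.
g2 = np.linspace(-1.0, 1.0, 21)
X2, _ = np.meshgrid(g2, g2, indexing="ij")
H_kink = godunov_eikonal(np.abs(X2), dx=g2[1] - g2[0], dy=g2[1] - g2[0])
```

The second check shows why the scheme can be solved in increasing order of $\psi$: the value at a node never depends on neighbours with larger values.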
The Hamiltonian for this etching problem was $H(\varphi_x,\varphi_y,\varphi_z)=$ [...] $\varphi_x^2+\varphi_y^2+\varphi_z^2$ [...]. We are able to use the same heap sort technology as for the eikonal equation for problems of this type. See figures 3 and 4. These figures represent the level contours of an etching process whose normal velocity is a function of the direction of the normal. The process moves down in figure 3 and up in figure 4.

More generally, for $H(u,v)$ having the property

$$uH_1\ge 0,\qquad vH_2\ge 0 \qquad (9)$$

we also proved [61]

$$H_G(u^+,u^-;v^+,v^-)=\max\big[H(u^{-+},v^{-+}),\,H(u^{+-},v^{-+}),\,H(u^{-+},v^{+-}),\,H(u^{+-},v^{+-})\big] \qquad (10)$$

and property F is again satisfied.

Again in [61], we were able to solve a somewhat interesting and very anisotropic etching problem with this new fast algorithm. Here we took

$$H(\varphi_x,\varphi_y)=|\varphi_y|\left(1-\frac{a(\varphi_y)\,\varphi_y}{\sqrt{\varphi_x^2+\varphi_y^2}}\right)$$

where $a=0$ if $\varphi_y<0$ and $a=0.8$ if $\varphi_y>0$, and observed merging of two fronts. See figures 5 and 6. These figures show a two dimensional etching process resulting in a merger.

The fast method originating in [86] is a variant of Dijkstra's algorithm and, as such, involves the tree like heap sort algorithm in order to compute the smallest of a set of numbers. Recently Boué and Dupuis [11] have proposed an extremely simple fast algorithm for a class of convex Hamiltonians including those which satisfy (4) and (5) above. Basically, their statement is that the standard Gauss-Seidel algorithm, with a simple ordering, converges in a finite number of iterations for equation (7). This would give an $O(N)$, not $O(N\log N)$, operation count, with an extremely simple to program algorithm; no heap sort is needed. Moreover, for the eikonal equation with $a(x,y)=1$, the algorithm would seem to converge in $2^dN$ iterations in $R^d$, $d=1,2,3$, which is quite fast. This gives a very simple and fast redistancing algorithm.

For more complicated problems we have found more iterations to be necessary, but still obtained promising results, together with some theoretical justification. See [85] for details, which also include results for a number of nonconvex Hamiltonians. We call this technique the "fast sweeping method" in [85]. We refer to it in section 3 when we discuss the
basic distance reinitialization algorithm.

Figure 3: Three dimensional etching using a fast algorithm. Reprinted from [61].

Figure 4: Three dimensional etching using a fast algorithm. Reprinted from [61].

Figure 5: Two dimensional etching with merging using a fast algorithm. Reprinted from [61].

Figure 6: Two dimensional etching with merging using a fast algorithm. Reprinted from [61].

3 Level Set Dictionary, Technology and Numerical Implementation

We list key terms and define them by their level set representation.

1. The interface boundary $\Gamma(t)$ is defined by $\{\vec x\mid\varphi(\vec x,t)=0\}$. The region $\Omega(t)$ is bounded by $\Gamma(t)$: $\{\vec x\mid\varphi(\vec x,t)>0\}$, and its exterior is defined by $\{\vec x\mid\varphi(\vec x,t)<0\}$.

2. The unit normal $\vec N$ to $\Gamma(t)$ is given by $\vec N=-\dfrac{\nabla\varphi}{|\nabla\varphi|}$.

3. The mean curvature $\kappa$ of $\Gamma(t)$ is given by $\kappa=-\nabla\cdot\left(\dfrac{\nabla\varphi}{|\nabla\varphi|}\right)$.

4. The Dirac delta function concentrated on an interface is $\delta(\varphi)|\nabla\varphi|$, where $\delta(x)$ is a one dimensional delta function.

5. The characteristic function $\chi$ of a region $\Omega(t)$ is $\chi=H(\varphi)$, where $H(x)\equiv 1$ if $x>0$ and $H(x)\equiv 0$ if $x<0$ is a one dimensional Heaviside function.

6. The surface (or line) integral of a quantity $p(\vec x,t)$ over $\Gamma$: $\int_{R^n}p(\vec x,t)\,\delta(\varphi)\,|\nabla\varphi|\,d\vec x$.

7. The volume (or area) integral of $p(\vec x,t)$ over $\Omega$: $\int_{R^n}p(\vec x,t)\,H(\varphi)\,d\vec x$.

Next we describe three key technological advances which are important in many, if not most, level set calculations.

8. The distance reinitialization procedure replaces a general level set function $\varphi(\vec x,t)$ by $d(\vec x,t)$, which is the value of the distance from $\vec x$ to $\Gamma(t)$, positive inside and negative outside. This assures us that $\varphi$ does not become too flat or too steep near $\Gamma(t)$. Let $d(\vec x,t)$ be the signed distance of $\vec x$ to the closest point on $\Gamma$. The quantity $d(\vec x,t)$ satisfies $|\nabla d|=1$, $d>0$ in $\Omega$, $d<0$ in $(\bar\Omega)^c$, and is the steady state solution (as $\tau\to\infty$) to

$$\frac{\partial\psi}{\partial\tau}+\operatorname{sgn}(\varphi)\,(|\nabla\psi|-1)=0,\qquad \psi(\vec x,0)=\varphi(\vec x,t).$$

[...]

9. [...] This was first suggested and implemented in [24], analyzed carefully in [88], and further discussed and implemented in both [32] and [66]. A computationally efficient algorithm based on heap sort technology and fast marching methods was devised in [1]. There are many reasons to extend a quantity off of $\Gamma$, one of which is to obtain a well conditioned normal velocity for level contours
of $\varphi$ close to $\varphi=0$ [24]. Others involve implementation of the Ghost Fluid Method of [32] discussed in the next section.

10. The basic level set method concerns a function $\varphi(\vec x,t)$ which is defined throughout space. Clearly this is wasteful if one only cares about information near the zero level set. The local level set method defines $\varphi$ only near the zero level set. We may solve (2) in a neighborhood of $\Gamma$ of width $m\Delta x$, where $m$ is typically 5 or 6. Points outside of this neighborhood need not be updated by this motion. This algorithm works in "$\varphi$" space, so not too much intricate computer science is used. For details see [66]. Thus this local method works easily in the presence of topological changes and for multiphase flow. An earlier local level set approach called "narrow banding" was devised in [2].

Finally, we repeat that, in the important special case where $v_N$ in equation (2) is a function only of $\vec x$, $t$ and $\nabla\varphi$ (e.g. $v_N=1$), equation (2) becomes a Hamilton-Jacobi equation whose solutions generally develop kinks (jumps in derivatives). We seek the unique viscosity solution. Many good references exist for this important subject, see e.g. [8, 27]. The appearance of these singularities in the solution means that special, but not terribly complicated, numerical methods have to be used, usually on uniform Cartesian grids. This was first discussed in [64], and numerical schemes developed there were generalized in [65] and [43]. The key ideas involve monotonicity, upwind differencing, essentially nonoscillatory (ENO) schemes and weighted essentially nonoscillatory (WENO) schemes. See [64], [65] and [43] for more details.

4 Coupling of the Level Set Method with External Physics

Interface problems involving external physics arise in various areas of science.
The computation of such problems has a very long history. Methods of choice include front tracking, see e.g. [87] and [41], phase-field methods, see e.g. [48] and [59], and the volume of fluid (VOF) approach, see e.g. [60] and [12]. The level set method has had major successes in this area. Much of the level set technology discussed in the previous two sections was developed with such applications in mind.

Here, we shall describe level set approaches to problems in compressible flow, incompressible flow, flows having singular vorticity, Stefan problems, kinetic crystal growth and a relatively new island dynamics model for epitaxial growth of thin films. We shall also discuss a recently developed technique, the ghost fluid method (GFM), which can be used (1) to remove numerical smearing and nonphysical oscillations in flow variables near the interface and (2) to simplify the numerical linear algebra arising in some of the problems in this section and elsewhere.

4.1 Compressible Flow

Chronologically, the first attempt to use the level set method in this area came in two phase inviscid compressible flow [55]. There, to the equations of conservation of mass, momentum and energy, we appended equation (1), which we rewrote in conservation form as

$$(\rho\varphi)_t+\nabla\cdot(\rho\varphi\vec v)=0 \qquad (12)$$

using the density of the fluid $\rho$.

The sign of $\varphi$ is used to identify which gas occupied which region, so it determines the local equation of state. This (naive) method suffered from spurious pressure oscillations at the interface, as shown in [46] and [45].
These papers proposed a new method which reduced these errors by using a nonconservative formulation near the interface. However, [46] and [45] still smear out the density across the interface, leading to terminal oscillations for many equations of state.

A major breakthrough in this area came in the development of the ghost fluid method (GFM) in [32]. This enables us to couple the level set representation of discontinuities to finite difference calculations of compressible flows. The approach was based on using the jump relations for discontinuities which are tracked using equation (1) (for two phase compressible flow). What the method amounts to (in any number of space dimensions) is to populate cells next to the interface with "ghost values", which, for two phase compressible flow, retain their usual values of pressure and normal velocity (quantities which are continuous across the interface), with extrapolated values of entropy and tangential velocity (which jump across the interface). These quantities are used in the numerical flux when "crossing" an interface.

An important aspect of the method is its simplicity. There is no need to solve a Riemann problem normal to the interface, consider the Rankine-Hugoniot jump conditions, or solve an initial-boundary value problem. Another important aspect is its generality. The philosophy appears to be: at a phase boundary, use a finite difference scheme which takes only values which are continuous across the interface, using the natural values whenever possible. Of course, this implies that the tangential velocity is treated in the same fashion as the normal velocity and the pressure when viscosity is present. The same holds true for the temperature in the presence of thermal conductivity.

Figure 7 shows results obtained for two phase compressible flow using the GFM together with the level set method. Air with density around 1 kg/m$^3$ [...] is to the right of the interface. Note that there is no numerical smearing of the density at the interface itself which is
fortunate as water cavitates at a density above999kgG-equation forflame discontinuities which was originally proposed in[50]. The G-equation represents theflame front as a discontinuity in the same fashion as the level set method so that one can easily consult the abundant literature on the G-equation to obtain deflagration speeds for the Ghost Fluid Method.Figure10shows two initially circular deflagration fronts that have just recently merged together.Note that the light colored region surrounding the deflagration fronts is a precursor shock wave that causes the initially circular deflagration waves to deform as they attempt to merge.The GFM was extended in[34]in order to treat the two phase compress-ible viscous Navier Stokes equations in a manner that allows for a large jump in viscosity across the interface.This paper spawned the technology needed to extend the GFM to multiphase incompressibleflow including the effects of viscosity,surface tension and gravity as discussed in the next subsection.4.2Incompressible FlowThe earliest real success in the coupling of the level set method to prob-lems involving external physics came in computing two-phase Navier-Stokes incompressibleflow[84],[22].The equations can be written as:u t+ u·∇ u+∇pρ+δ(ϕ)σκ Nρterm is used to approximate the surface tension forces which are lost when using a continuous pressure[84].Successful computations using this model were performed in[84]and elsewhere[22].Problems involving area loss were observed and significant improvements were made in[83].As mentioned above,the technology from[34]motivated the extension of the Ghost Fluid Method to this two phase incompressibleflow problem.25First,a new boundary condition capturing approach was devised and applied to the variable coefficient Poisson equation to solve problems of the form∇1ρ∇p· N]=h are given andρisdiscontinuous across the interface.This was accomplished in[49].A sample calculation from[49]is shown infigure11where one can see that both 
the solution,p,and itsfirst derivatives are sharp across the interface without numerical smearing.Next,this new technique was applied to multiphase incompressibleflow in[44].Here,since one can model the jumps in pressure directly,there is no need to add theσκ Nm3while the water has density near 1000kg。
SUBMITTED TO IEEE TVCG 1 Discrete Surface Ricci Flow
I. INTRODUCTION

Ricci flow is a curvature flow method, which has been applied to the proof of the Poincaré conjecture on three dimensional manifolds [1]–[3]. Ricci flow was introduced by Richard Hamilton for general Riemannian manifolds in his seminal work [4] in 1982.

a) Physical Intuition: Ricci flow has a simple physical intuition. Given a surface with a Riemannian metric, the metric induces the Gaussian curvature function. If the metric is changed, then the Gaussian curvature changes accordingly. We deform the metric in the following way: at each point, we locally scale the metric, such that the scaling factor is proportional to the curvature at the point. After the deformation, the curvature has changed. We repeat the deformation process, so that both the metric and the curvature evolve, and the curvature evolution behaves like a heat diffusion process. Eventually, the Gaussian curvature function is constant everywhere. If the surface is closed and simply connected, then the surface eventually becomes a sphere. (The analogue of this process for three dimensional manifolds is the basic idea of the proof of the Poincaré conjecture.)

b) Motivations: Surface Ricci flow is a powerful tool to design a Riemannian metric, such that the metric induces the user-defined Gaussian curvature function on the surface, and is conformal (i.e., angle-preserving) to the original metric. Many applications in engineering fields can be formulated as finding some metrics with desired properties, where Ricci flow can be directly utilized. In graphics, a surface parametrization is commonly used,
International Economics, Chapter 3: The Ricardian Model
RD
– If PC / PW > aLC / aLW, the country specializes in producing cheese
– If PC / PW < aLC / aLW, the country specializes in producing wine
– Only when PC / PW = aLC / aLW does the country produce both cheese and wine
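The three cases above can be sketched as a small decision function. This is only an illustration: the function name `specialization` and the tolerance-based equality check are assumptions, and `a_lc`, `a_lw` stand for the unit labor requirements aLC and aLW.

```python
import math

def specialization(pc_over_pw, a_lc, a_lw):
    """Return which good(s) a one-factor economy produces.

    pc_over_pw : relative price of cheese, PC / PW
    a_lc, a_lw : unit labor requirements for cheese and wine (aLC, aLW)
    """
    opportunity_cost = a_lc / a_lw  # opportunity cost of cheese in terms of wine
    if math.isclose(pc_over_pw, opportunity_cost):
        return "both"    # indifferent: any point on the production frontier
    if pc_over_pw > opportunity_cost:
        return "cheese"  # cheese pays more per hour of labor
    return "wine"        # wine pays more per hour of labor

print(specialization(1.0, 1.0, 2.0))
```

With aLC = 1 and aLW = 2 (hypothetical values), the opportunity cost of cheese is 0.5, so any relative price above 0.5 leads to complete specialization in cheese.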
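Deriving the RS curve
(1) If PC / PW < aLC / aLW, RS = ? (2) If PC / PW = aLC / aLW, RS = ? (3) If aLC / aLW < PC / PW < aLC* / aLW*,
III. Trade in a One-Factor World
2. International Trade Equilibrium
Relative supply (RS) is the quantity supplied of one good divided by the quantity supplied of the other good; relative demand (RD) is the quantity demanded of one good divided by the quantity demanded of the other good.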
Relative Supply and Relative Demand
6. Only two countries are modeled: domestic and foreign.
II. A One-Factor Economy
One factor of production: labor. Let L = the home country's total labor supply
(unit: person-hours)
RS = ?
(4) If PC / PW = aLC* / aLW*, RS = ? (5) If aLC* / aLW* < PC / PW, RS = ?
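The five cases of the RS derivation can be worked through in a short sketch. It assumes the standard setup in which Home has the comparative advantage in cheese (aLC / aLW < aLC* / aLW*); the function name, the representation of RS as a (low, high) interval, and the numbers in the usage line are all illustrative assumptions.

```python
import math

def world_relative_supply(p, a_lc, a_lw, a_lc_f, a_lw_f, labor, labor_f):
    """World relative supply of cheese (cheese / wine) as an interval (low, high).

    p              : relative price of cheese, PC / PW
    a_lc, a_lw     : Home unit labor requirements
    a_lc_f, a_lw_f : Foreign unit labor requirements (aLC*, aLW*)
    labor, labor_f : Home and Foreign labor supplies (L, L*)
    Assumes a_lc / a_lw < a_lc_f / a_lw_f.
    """
    home, foreign = a_lc / a_lw, a_lc_f / a_lw_f
    # Home makes only cheese (L / aLC), Foreign makes only wine (L* / aLW*).
    k = (labor * a_lw_f) / (a_lc * labor_f)
    if math.isclose(p, home):
        return (0.0, k)               # case (2): Home is indifferent
    if math.isclose(p, foreign):
        return (k, math.inf)          # case (4): Foreign is indifferent
    if p < home:
        return (0.0, 0.0)             # case (1): nobody produces cheese
    if p < foreign:
        return (k, k)                 # case (3): complete specialization
    return (math.inf, math.inf)       # case (5): nobody produces wine

print(world_relative_supply(1.0, 1, 2, 6, 3, 1200, 800))
```

An interval is returned because RS is a horizontal segment at the two prices where one country is indifferent between the goods, and a single value elsewhere.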
Relative Supply and Relative Demand
• There is no supply of cheese if the relative price of cheese falls below aLC /aLW .
Notation
*Correspondence address:Laboratory for Open Information Systems,Brain Science Institute Riken,2-1Hirosawa,Wako-shi,Saitama 351-0198,Japan.Tel.:(#81)48-467-9668;fax:(#81)48-467-9686;e-mail:cia @brain.riken.go.jpNeurocomputing 24(1999)55—93Neural networks for blind separation with unknown numberof sourcesAndrzej Cichocki *,Juha Karhunen ,Wlodzimierz Kasprzak ,Ricardo Viga rioLaboratory for Open Information Systems,Brain Science Institute Riken,2-1Hirosawa,Wako-shi,Saitama 351-0198,JapanLaboratory of Computer and Information Science,Helsinki Uni v ersity of Technology,P.O.Box 2200,FIN-02150Espoo,FinlandDepartment of Electrical Engineering,Warsaw Uni v ersity of Technology,Pl.Politechniki 1,PL-00-661Warsaw,PolandInstitute of Control and Computation Engineering,Warsaw Uni v ersity of Technology,Nowowiejska 15/19,PL-00-665Warsaw,PolandReceived 8December 1996;accepted 2September 1998AbstractBlind source separation problems have recently drawn a lot of attention in unsupervised neural learning.In the current approaches,the number of sources is typically assumed to be known in advance,but this does not usually hold in practical applications.In this paper,various neural network architectures and associated adaptive learning algorithms are discussed for handling the cases where the number of sources is unknown.These techniques include estimation of the number of sources,redundancy removal among the outputs of the networks,and extraction of the sources one at a time.Validity and performance of the described approaches are demonstrated by extensive computer simulations for natural image and magnetoencephalographic (MEG)data. 
1999Elsevier Science B.V.All rights reserved.Keywords:Blind separation;Image processing;Neural networks;Unsupervised learning;Signal reconstruction0925-2312/99/$—see front matter 1999Elsevier Science B.V.All rights reserved.PII:S 0925-2312(98)00091-556 A.Cichocki et al./Neurocomputing24(1999)55–93NotationR VV"E+xx2,covariance matrix of signal vector x(t)u G the i th principal eigenvector of the matrix R VVG the i th eigenvalue of the matrix R VVthe normalized kurtosis of a source signal(t)learning ratem number of sourcesn number of sensorsl3[1,2,n]number of outputss(t)m-dimensional vector of source signalsx(t)n-dimensional vector of mixed signalsy(t)n-or l-dimensional vector of separated(output)signalsz(t)n-dimensional vector of output signals after redundancyeliminationn(t)n-dimensional vector of noise signals(t)m-or l-dimensional vector of pre-whitened signalsA(t)"[a GH]L"K(unknown)full rank mixing matrixV(t)"[v GH]J"L pre-whitening matrixW(t)"[w GH]J"L global de-mixing matrixW(t)"[w GH]J"J source separation matrix after pre-whiteningW I (t)de-mixing matrix of the k th layerP(t)generalized permutation matrixJ(y,W)cost(risk)function1.IntroductionIn blind source separation(BSS),the goal is to extract statistically independent but otherwise unknown source signals from their linear mixtures without knowing the mixing coefficients[1—54].This kind of blind techniques have applications in several areas,such as data communications,speech processing,and various biomedical signal processing problems(MEG/EEG data);see for example[34,46].The study of BSS began about10years ago mainly in the area of statistical signal processing,even though the related single channel blind deconvolution problem has been studied already earlier.Quite recently,BSS has become a highly popular research topic in unsupervised neural learning.Neural network researchers have approached the BSS problem from different starting points,such as information theory[1,4,5]and nonlinear generalizations of 
Hebbian/anti-Hebbian learning rules [15—17,27,30,32,36,43].Despite of recent advances in neural BSS,there still exist several open questions and possible extensions of the basic mixing model that have received only limited attention thus far[32].Although many neural learning algorithms have been proposed for the BSS problem,in their corresponding models and network architectures it is usually assumed that the number of source signals is known a priori.Typically it should beequal to the number of sensors and outputs.However,in practice,these assumptions do not often hold.The main objective of this paper is to study the behavior of various network structures for a BSS problem,where the number of sources is different from the number of outputs and where the number of sources is in general unknown.We shall propose several alternative solutions to these problems.The paper is organized as follows.In Section2wefirst define the general BSS problem,and then briefly consider special but important cases that may appear in BSS problems.In Section3we discuss two alternative source separation approaches for solving the BSS problem.Thefirst approach uses pre-whitening,while the second approach tries to separate and to determine the source number directly from the input data.The theoretical basics of proposed learning rules are given in an appendix. 
Computer simulation results are given in Section 4, and the last Section 5 contains discussion of the achieved results and some conclusions.

2. Problem formulation

2.1. The general blind source separation problem

Assume that there exist m zero mean source signals $s_1(t), \ldots, s_m(t)$ that are scalar-valued and mutually (spatially) statistically independent (or as independent as possible) at each time instant or index value t. The original sources $s_i(t)$ are unknown to the observer, who has to deal with n possibly noisy but different linear mixtures $x_1(t), \ldots, x_n(t)$ of the sources (usually for $n \ge m$). The mixing coefficients are some unknown constants. The task of blind source separation is to find the waveforms $\{s_i(t)\}$ of the sources, knowing only the mixtures $x_j(t)$ and usually the number m of sources.

Denote by $x(t) = [x_1(t), \ldots, x_n(t)]^T$ the n-dimensional t-th data vector made up of the mixtures at discrete index value (usually time) t. The BSS mixing (data) model can then be written in the vector form

$$x(t) = A s(t) + n(t) = \sum_{i=1}^{m} s_i(t)\, a_i + n(t). \qquad (1)$$

Here $s(t) = [s_1(t), \ldots, s_m(t)]^T$ is the source vector consisting of the m source signals at the index value t. Furthermore, each source signal $s_i(t)$ is assumed to be a stationary zero mean stochastic process. $A = [a_1, \ldots, a_m]$ is a constant full-rank $n \times m$ mixing matrix whose elements are the unknown coefficients of the mixtures (for $n \ge m$). The vectors $a_i$ are basis vectors of independent component analysis (ICA) [19,20,32].
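As a minimal sketch of the mixing model (1) for a single time index, the following plain-Python function forms x = A s + n. The function name `mix` and the example matrix are hypothetical; they only illustrate the shapes involved (n mixtures from m sources).

```python
def mix(A, s, noise=None):
    """Return x = A s + n for one time index.

    A     : n x m mixing matrix as a list of rows
    s     : list of m source values s_i(t)
    noise : optional list of n noise values n(t); zero if omitted
    """
    if noise is None:
        noise = [0.0] * len(A)
    # Each mixture x_j is a weighted sum of all sources plus sensor noise.
    return [sum(a_ji * s_i for a_ji, s_i in zip(row, s)) + n_j
            for row, n_j in zip(A, noise)]

# Three sensors (n = 3) observing two sources (m = 2), noise-free case.
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(mix(A, [1.0, -1.0]))
```

In a BSS setting A and s are of course unknown to the learning algorithm; the sketch only shows how test mixtures could be generated.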
Besides the above general case,we also discuss the noise-free simplified mixing model,where the additive noise n(t)is negligible so that it can be omitted from the considerations.We assume further that in the general case the noise signal has a Gaussian distribution but none of the sources is Gaussian.In the simplified case at most one of the source signals s G(t)is allowed to have a Gaussian distribution.These assumptionsA.Cichocki et al./Neurocomputing24(1999)55–9357Fig.1.Illustration of the mixing model and neural network for blind means learning algorithm.follow from the fact that it is impossible to separate several Gaussian sources from each other [6,48].In standard neural and adaptive source separation approaches,an m ;n separating matrix W (t )is updated so that the m -vectory (t )"W (t )x (t )(2)becomes an estimate y (t )"s (t )of the original independent source signals.Fig.1shows a schematic diagram of the mixing and source separation system.In neural realizations,y (t )is the output vector of the network and the matrix W (t )is the total weight matrix between the input and output layers.The estimate s G (t )of the i th source signal may appear in any component y H (t )of y (t ).The amplitudes of the source signals s G (t )and their estimates y H (t )are typically scaled so that they have unit variance.This ambiguity can be expressed mathematically asy (t )"s (t )"WAs (t )"PDs (t ),(3)where P is a permutation matrix and D is a nonsingular scaling matrix.With a neural realization in mind,it is desirable to choose the learning algorithms so that they are as simple as possible but yet provide sufficient performance.Many different neural separating algorithms have been proposed recently [2—6,9—18,24,26—38,41—46,52,54].Their performance usually strongly depends on stochastic properties of the source signals.These properties can be determined from higher-order statistics (cumulants)of the sources.Especially useful is a fourth-order cumulant called kurtosis .For 
the i-th source signal $s_i(t)$, the normalized kurtosis is defined by

$$\kappa_4[s_i(t)] = \frac{E\{s_i(t)^4\}}{\left(E\{s_i(t)^2\}\right)^2} - 3. \qquad (4)$$

If $s_i(t)$ is Gaussian, its kurtosis $\kappa_4[s_i(t)] = 0$. Source signals that have a negative kurtosis are often called sub-Gaussian ones. Typically, their probability distribution is “flatter” than the Gaussian distribution. Correspondingly, super-Gaussian sources (with a positive kurtosis) usually have a distribution with a longer tail and a sharper peak than the Gaussian distribution.

If the sign of the kurtosis (4) is the same for all the sources $s_i(t)$, $i = 1, \ldots, m$, and the input vectors are pre-whitened, one can use a particularly simple separating criterion. This is the sum of the fourth moments of the outputs [36,44]

$$J(y) = \sum_{i=1}^{m} E\{y_i(t)^4\}, \qquad (5)$$

usually subject to one of the constraints

$$E\{y_i^2\} = 1, \ \forall i; \qquad \|w_i\| = 1, \ \forall i; \qquad (6)$$

$$\text{or} \quad w_{ii} = 1, \ \forall i; \qquad \text{or} \quad E\{f(y_i)x(t) - f(w)\|w\|\} = 0. \qquad (7)$$

Here we have assumed that the number l of outputs equals the number m of the sources. A separating matrix W minimizes Eq. (5) for sub-Gaussian sources, and maximizes it for super-Gaussian sources [43]. The choice of nonlinear functions in neural separating algorithms depends on the sign of the normalized kurtoses of the sources. This is discussed briefly later on in this paper.

2.2. Separation with estimation of the number of sources

A standard assumption in BSS is that the number m of the sources should be known in advance. As in most neural BSS approaches, we have assumed up to now that the number m of the sources and the number l of outputs are equal in the separating network. Generally, both these assumptions may not hold in practice. In this paper we shall propose two different approaches for neural blind separation with simultaneous determination of the source number m. The only additional requirement in these approaches is that the number of available mixtures is greater than or equal to the true number of the sources, that is, $n \ge m$.

For completeness of our considerations, let us first
briefly discuss the difficult case where there are fewer mixtures than sources: $n < m$. Then the $n \times m$ mixing matrix A in Eq. (1) has more columns than rows. In this case, complete separation is usually out of the question. This is easy to understand by considering the much simpler situation where the mixing matrix A is known (recall that in BSS this does not hold), and there is no noise. Even then the set of linear equations (1) has an infinite number of solutions because there are more unknowns than equations, and the source vector s(t) cannot be determined for arbitrarily distributed sources. However, some kind of separation may still be achievable at least in special instances. This topic has recently been studied theoretically in [6]. The authors show that it is possible to separate the m sources into n disjoint groups if and only if A has n linearly independent column vectors, and the remaining $m - n$ column vectors satisfy the special condition that each of them is parallel to one of these n column vectors.

Before proceeding, we point out that it is not always necessary or even desirable in BSS problems to separate all the sources contained in the mixtures. This holds for example in situations where the number of sensors is large and only a few most powerful sources are of interest. In particular, Hyvärinen and Oja [27,46] have recently developed separating algorithms which estimate one source at a time.
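The normalized kurtosis of Eq. (4), which decides whether a source counts as sub- or super-Gaussian, can be estimated from samples as in the following sketch. It assumes a zero mean signal, as in the mixing model, and replaces expectations by sample averages; the function name is an assumption.

```python
def normalized_kurtosis(samples):
    """Sample estimate of E{s^4} / (E{s^2})^2 - 3 for a zero mean signal."""
    n = len(samples)
    second_moment = sum(v ** 2 for v in samples) / n
    fourth_moment = sum(v ** 4 for v in samples) / n
    return fourth_moment / (second_moment ** 2) - 3.0

# A binary +/-1 signal is strongly sub-Gaussian (negative kurtosis).
print(normalized_kurtosis([-1.0, 1.0, -1.0, 1.0]))
# A sparse signal with rare large values is super-Gaussian (positive kurtosis).
print(normalized_kurtosis([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, -2.0]))
```

The sign of the result is what matters for choosing the nonlinearities in the separating algorithms; for short records the estimate can be noisy.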
However,the sources are extracted in somewhat arbitrary order depending on the initial values,etc.,though thefirst separated sources are usually among the mostA.Cichocki et al./Neurocomputing24(1999)55–9359Fig.2.The two-layer feed-forward network for pre-whitening and blind separation:(a)block diagram,(b)detailed neural network with signal reduction during pre-whitening.powerful ones.Instead of neural gradient rules which converge somewhat slowly and may not be applicable to high-dimensional problems,one can use semi-neural fixed-point algorithms [26,28,48]for separating sources.An example of extracting one source at a time from auditory evoked fields in given in Section 4.3.Two neural network approaches to BSS3.1.Source separation with a pre-whitening layerFig.2shows a two-layer neural network for blind source separation,where the first layer performs pre -whitening (sphering )and the second one separation of sources.The respective weight matrices are denoted by V and W .The operation of the network is described byy (t )"W (t ) (t )"W Vx (t )"W (t )x (t ),(8)where W ,W V is the total separating matrix.The network of Fig.2is useful in context with such BSS algorithms that require whitening of the input data for good performance.In whitening (sphering ),the data vectors x (t )are pre-processed using a whitening transformation(t )"V (t )x (t ).(9)Here (t )denotes the whitened vector,and V (t )is an m ;n whitening matrix.60 A.Cichocki et al./Neurocomputing 24(1999)55–93A.Cichocki et al./Neurocomputing24(1999)55–9361 If n'm,where m is known in advance,V(t)simultaneously reduces the dimension of the data vectors from n to m.In whitening,the matrix V(t)is chosen so that the covariance matrix E+ (t) (t)2,becomes the unit matrix I K.Thus,the components ofthe whitened vectors (t)are mutually uncorrelated and they have unit variance. Uncorrelatedness is a necessary pre-requisite for the stronger independence condition. 
After pre-whitening the separation task usually becomes easier,because the sub-sequent separating matrix W can be constrained to be orthogonal[36,46]: W W2"I K,(10) where I K is the m;m unit matrix.Whitening seems to be especially helpful in large-scale problems,where separation of sources can sometimes be impossible in practice without resorting to it.3.1.1.Neural learning rules for pre-whiteningThere exist many solutions for whitening the input data[15,32,36,47].The simplest adaptive,on-line learning rules for pre-whitening have the following matrix forms[22]:V(t#1)"V(t)# (t)[I! (t) 2(t)](11) or[10,47]V(t#1)"V(t)# (t)[I! (t) 2(t)]V(t).(12) Thefirst algorithm is a local one,in the sense that the update of every weight v GH is made on the basis of two neurons i and j only.The second algorithm is a robust one with equivariant property[10]as the global system(in the sense that the update of every synaptic weight v GH depends on outputs of all neurons),described by combined mixing and de-correlation processP(t#1) "V(t#1)A"P(t)# (t)[I!P(t)s(t)s2(t)P2(t)]P(t)(13)is completely independent of the mixing matrix A.Both these pre-whitening rules can be used in context with neural separating algorithms.Thefirst rule(11)seems to be more reliable than Eq.(12)if a large number of iterations or tracking of mildly non-stationary sources is required.In these instances,the latter algorithm(12)may sometimes suffer from stability problems.3.1.2.Nonlinear principal subspace learning rule for the separation layerThe nonlinear PCA subspace rule developed and studied by Oja,Karhunen,Xu and their collaborators(see[35,38,45])employs the following update rule for the ortho-gonal separating matrix W:W(t#1)"W(t)# (t)u[y(t)][ (t)!W2(t)u[y(t)]]2,(14)where (t)"V(t)x(t),x(t)"As(t),and y(t)"W(t) (t).Here and later on,u[y(t)]denotes the column vector whose i th component is g G[y G(t)],where g G(t)is usually an odd and62 A.Cichocki et al./Neurocomputing24(1999)55–93monotonically increasing nonlinear 
activation function.The learning rate (t)must be positive for stability reasons.A major advantage of the learning rule(14)is that it can be realized using a simple modification of one-layer standard symmetric PCA network,allowing a relatively simple neural implementation[35,36].The separation properties of Eq.(14)have been analyzed mathematically in simple cases in[45].In a recent paper[38]it is shown that the Nonlinear PCA rule(14)is related to several other ICA and BSS approaches and contrast functions.Efficient recursive least-squares type algorithms for minimizing the nonlinear PCA criterion in blind separation have been developed in[37,38].They provide a clearly faster convergence than the stochastic gradient rule (14)at the expense of somewhat greater computational load.3.2.Signal number reduction by pre-whiteningThefirst class of approaches for source number determination in the BSS problem is based on the natural compression ability of the pre-whitening layer.If standard Principal Component Analysis(PCA)is used for pre-whitening,one can then simulta-neously compress information optimally in the mean-square error sense andfilter the possible noise[15,36].In fact the PCA whitening matrix V can be computed as V"(R\VV" \ U2,(15) where "diag( ,2, L)is a diagonal matrix of the eigenvalues and U"[u ,u ,2,u L]is the orthogonal matrix of the associated eigenvectors of the covariance matrix R VV"E[x(t)x2(t)]"U U2.If there are more mixtures than sources(n'm),it is possible to use the PCA approach for estimating the number m of the sources.If m is estimated correctly and the input vectors x(t)are compressed to m-dimensional vectors (t)in the whitening stage using the network structure in Fig.2b,then there are usually no specific problems in the subsequent separation stage.In practice,the source number is determined byfirst estimating the eigen-values G of the data covariance matrix E+x(t)x(t)2,.Let us denote these ordered eigenvalues by5 525 L50.(16) In the ideal case 
where the noise term n(t)in Eq.(1)is zero,only the m largest“signal”eigenvalues ,2, K are nonzero,and the rest n!m“noise”eigenvalues of the data covariance matrix are zero.If the powers of the sources are much larger than the power of noise,the m largest signal eigenvalues are still clearly larger than noise eigenvalues,and it is straightforward to determine m from the breakpoint.However,if some of the sources are weak or the power of the noise is not small,it is generally hard to see what is the correct number m of sources just by inspecting the eigenvalues.In [33]it is demonstrated that two well-known information-theoretic criteria,MDL and AIC,yield in practice good estimates of the number of sources for noisy mixtures on certain conditions.A.Cichocki et al./Neurocomputing24(1999)55–9363 We have also considered a modified network structure,where the possible data compression takes place in the separation layer instead of the pre-whitening layer.The nonlinear PCA subspace rule(14)can well be used for learning the separating matrix W,because this algorithm has originally been designed for situations where data compression takes place simultaneously with learning of the weight matrix W[35,52]. 
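The idea of reading the source number m off the breakpoint in the ordered eigenvalues (16) can be sketched with a simple largest-ratio heuristic. This heuristic is an assumption made for illustration; it is not the MDL or AIC criterion mentioned above, and it is only expected to work when the signal eigenvalues clearly dominate the noise eigenvalues.

```python
def estimate_num_sources(eigenvalues, eps=1e-12):
    """Guess m as the position of the largest drop between ordered eigenvalues.

    eigenvalues : eigenvalues of the data covariance matrix (any order)
    eps         : floor that avoids division by zero for noise-free data
    """
    lam = sorted(eigenvalues, reverse=True)
    best_m, best_ratio = len(lam), 1.0  # no breakpoint: treat all as signal
    for i in range(len(lam) - 1):
        ratio = lam[i] / max(lam[i + 1], eps)
        if ratio > best_ratio:
            best_m, best_ratio = i + 1, ratio
    return best_m

# Three clear signal eigenvalues followed by two small noise eigenvalues.
print(estimate_num_sources([5.0, 3.0, 2.0, 0.01, 0.02]))
```

When some sources are weak or the noise power is not small, the largest ratio may fall in the wrong place, which is the situation where the text recommends information-theoretic criteria instead.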
If the number n of mixtures equals to the number m of sources,and the goal is to extract only some sources,so that the number of outputs l(m,this alternative structure seems to perform better.On the other hand,if n'm(the number of mixtures is larger than that of sources),and l"m,the quality of the separation results was in our experiments slightly better when the data compression from n to m took place in the whitening stage instead of the separation stage.Generally,this modified network structure is not recommendable if the power of the noise is not small or the number of mixtures n is larger than the number m of the sources.This is easy to understand,because in this case whitening without data compression tends to amplify the noise by making the variances of n components of the whitened vectors (t)all equal to unity.3.3.Source separation without pre-whiteningWhitening has some disadvantages,too.The most notable of these is that for ill-conditioned mixing matrices and weak sources the separation results may be poor. Therefore,some other neural algorithms have been developed that learn the separat-ing matrix W directly.A single layer performs the linear transformationy(t)"Wx(t),(17)where W is an n;n square nonsingular matrix of synaptic weights updated according to some on-line learning rule.In this section we discuss simple neural network models and associated adaptive learning algorithms,which do not require any pre-processing.3.3.1.General(robust)global ruleThe whitening algorithms discussed so far can be easily generalized for the blind source separation problem.For example,a general form of the learning rule(12)was proposed in[17,18],asW(t#1)"W(t)# (t)+I!f[y(t)]u[y2(t)],W(t),(18)which can be written in scalar form asw GH(t#1)"w GH(t)# (t) w GH(t)!f G[y G(t)]K I w IH(t)g I[y I(t)] ,(19)where (t)'0is the adaptive learning rate and I is the n;n identity matrix. 
$f(y) = [f(y_1), \ldots, f(y_n)]^T$ and $u(y^T) = [g(y_1), \ldots, g(y_n)]$ are vectors of nonlinear activation functions, where f(y), g(y) is a pair of suitably chosen nonlinear functions. These nonlinear functions are used in the above rule mainly for introducing higher-order statistics or cross-cumulants into computations. The rule tries to cancel these higher-order statistics, leading to at least approximate separation of sources (or independent components). The choice of the activation functions f(y), g(y) depends on the statistical distribution of the source signals (this problem is discussed in the appendix). The above rule is derived more rigorously in the appendix by using the concept of Kullback-Leibler divergence (or mutual information) and the natural gradient concept developed by Amari [1,4].

3.3.2. Simplified (nearly local) rule

The learning rule (18) can be simplified by applying another generalized gradient form [11,12]:

$$W(t+1) = W(t) \mp \eta(t)\, \frac{\partial J}{\partial W}\, W^T(t). \qquad (20)$$

In this case we obtain a relatively simple self-normalized local learning rule [12,16]:

$$W(t+1) = W(t) \pm \eta(t)\,\{I - f[y(t)]\, y^T(t)\}. \qquad (21)$$

This learning rule, which can be written in scalar form as $w_{ij}(t+1) = w_{ij}(t) \pm \eta_i(t)\,[\delta_{ij} - f_i(y_i(t))\, y_j(t)]$, is stable for both signs + and − under zero initial conditions. The local learning rule (21) can be regarded as a generalization of the local whitening rule (11). Furthermore, this is the simplest on-line learning rule for the BSS problem that to our knowledge has been proposed thus far.

3.3.3. Equivariant property

It is very interesting to observe that the learning rule (18) has a so-called equivariant property [3,4,10,17]. This means that its performance is independent of the scaling factors and/or mixing matrix A. Therefore, the algorithm is able to extract extremely weak signals mixed with strong ones provided that there is no noise. Moreover, the mixing matrix can then be extremely ill-conditioned, and the achievable separation depends only on the precision of the calculations [17,18]. The simplified local learning rule (21) does not have the equivariant property.
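To make the global rule (18) concrete, the sketch below performs one update W(t+1) = W(t) + η{I − f[y] u[y]^T} W(t) in plain Python. The nonlinearities f(y) = y³ and g(y) = y are an illustrative assumption (the appropriate pair depends on the source distributions, as discussed in the appendix), and the function name is hypothetical.

```python
def global_rule_step(W, x, eta, f=lambda v: v ** 3, g=lambda v: v):
    """One step of W <- W + eta * (I - f(y) g(y)^T) W with y = W x.

    W   : n x n separating matrix as a list of rows
    x   : list of n mixture values x(t)
    eta : positive learning rate
    """
    n = len(W)
    y = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
    fy = [f(v) for v in y]
    gy = [g(v) for v in y]
    # G = I - f(y) g(y)^T measures the remaining higher-order cross-statistics.
    G = [[(1.0 if i == k else 0.0) - fy[i] * gy[k] for k in range(n)]
         for i in range(n)]
    # Right-multiplying by W gives the equivariant (natural gradient) form.
    return [[W[i][j] + eta * sum(G[i][k] * W[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

W0 = [[1.0, 0.0], [0.0, 1.0]]
print(global_rule_step(W0, [1.0, 0.0], eta=0.1))
```

In the example the first output already satisfies f(y₁)g(y₁) = 1, so only the second row of W is changed by the update.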
Hence,a single layer neural network with this learning rule may sometimes fail to separate signals,especially if the problem is ill-conditioned.However,we have discovered that by applying a multi-layer structure(feed-forward or recurrent)this algorithm is also able to solve very ill-conditioned separation problems[11—13].In such a case we apply the same simple local learning rule(21)for each layer,as illustrated by Fig.3.However,for each layer we can use different nonlinear functions for introducing different higher-order statistics,which usually improves the quality of separation.64 A.Cichocki et al./Neurocomputing24(1999)55–93Fig.4.The scheme of a two-layer neural network for blind separation and redundancyelimination.Fig.3.A multi-layer feed-forward neural network architecture for blind source separation without pre-whitening.3.4.Noise-free redundancy reductionThe separation algorithms (18)and (21)presented so far for the complete (deter-mined)source case (m "n )can be applied in the more general (over-determined)case,when the number of sources is unknown,but not larger than the number of sensors,that is if n 5m .In this case we assume that the dimension of matrix W (t )is still n ;n .If n 'm there appears a redundancy among the separated signal set,meaning that one or more signals are extracted in more than one channel.If additive noise exist in each sensor channel,then they appear on the redundant outputs.But consider the noise-free case.Then some separated signals appear in different channels with different scaling factors.In [11]we have proposed to add a post-processing layer to the separation network for the elimination of redundant signals.Thus the applied neural network consists of two or more layers (Fig.4),where the first sub-network (a single layer or a multi-layer)simultaneously separates the sources and the last (post-processing)layer eliminates redundant signals.The post-processing layer determines the number of active sources in the case where the 
number of sensors (mixtures)n is greater than the number of the primary sources m .Such a layer is described by the linear transformation z (t )"W (t )y (t ),where the synaptic weights (elements of the matrix W (t ))are updated using the following adaptive local learning algorithm:w GG (t )"1,∀t ∀i , w GH (t )"! (t )f [z G (t )]g [z H(t )],i O j ,(22)where g (z )is a nonlinear odd activation function (e.g.g (z )"tanh( z ))and f (z )is eithera linear or slightly nonlinear odd function.Table 1Three separation algorithms considered in the paper Pos.Separation rule Description1 W " [I !f (y )u (y )2]W Global algorithm with equivariant property2 W "$ [I !f (y )u (y )2]Simple local algorithm3W " f (y )[ 2!f (y )2W ]K Nonlinear PCA with pre-whitening[f (y )y 2!yf (y )2]W ,y "W , "Vx "VAsConstraints imposed on the matrix W (t )ensure mutual de-correlation of the output signals,which eliminates the redundancy of the output signals and may also improve the separation results.However,we have found that this performance strongly depends on the statistical distributions of the sources.If the source signals are not completely independent,for example when the sources consist of two or more natural images,the post-processing layer should be used for redundancy elimination only.The learning rules for redundancy elimination given above can be derived using the same optimization criterion which was used for source separation (Eqs.(18)and (21)),but with some constraints for the elements of the matrix W (t ),e.g.w GG(t )"1,∀i .Itshould be noted that the rule (22)is similar to the well-known Herault —Jutten rule [30],but it is now applied to a feed-forward network with different activation functions.puter simulation results 4.1.Experimental arrangementsIn this section some illustrative experiments are presented using the proposed approaches,in particular the three separation algorithms summarized in Table 1.In order to estimate the quality of separation,we use in our simulations known 
original source signals (images) and a known mixing matrix. Of course, these quantities are unknown to the learning algorithms that are being tested. The separation results are best inspected by comparing images showing the true sources and the separated sources. This gives a good qualitative assessment of the achieved performance. Different types of image sources are applied in a single experiment: sources with positive or negative kurtosis and a Gaussian noise image are mixed together (compare Table 2). By scanning them, they can easily be transformed to 1-D signals (see Fig. 5). It should be noted that the stochastic characteristics of a 1-D signal corresponding to a natural image change frequently. Hence, in order to achieve convergence of the weights during the learning process, we apply a descending learning rate.
Marginal analysis example

Marginal Analysis: An Illustration

Marginal analysis is an economic tool used to assess incremental changes in costs and benefits when making decisions. To better understand this concept, let's consider a hypothetical example of a manufacturing company deciding whether to increase the production of a popular product.

Suppose ABC Manufacturing produces and sells a particular widget. The current production level is 10,000 units per month, resulting in a profit of $50,000. However, market research suggests that there is a growing demand for this widget, and ABC Manufacturing is contemplating increasing production by 2,000 units.

To conduct a marginal analysis, ABC Manufacturing must determine the additional costs and benefits associated with producing these extra 2,000 units. Let's break it down:

Cost Analysis:
1. Direct Costs: ABC Manufacturing needs to consider the materials, labor, and overhead expenses required to produce the additional 2,000 units. Assuming that the direct cost per unit is $10, the total direct costs for the additional production would amount to $20,000.
2. Indirect Costs: Increasing production might require the company to invest in additional machinery or hire more personnel. These indirect costs must be calculated to have a comprehensive view of the overall expenses associated with the increased production.

Benefit Analysis:
1. Revenue: With the increase in production, ABC Manufacturing expects to generate additional revenue from selling the extra 2,000 widgets. Considering a selling price of $25 per unit, the additional revenue would amount to $50,000.
2. Market Share: By meeting the growing demand, ABC Manufacturing can secure a larger share of the market, potentially enhancing its brand image and customer loyalty.

Having analyzed the costs and benefits, ABC Manufacturing can now assess whether the increase in production is financially viable. The approach is to compare the marginal cost (additional cost) with the marginal benefit (additional revenue and intangible market share gains). If the marginal benefit surpasses the marginal cost, then the increased production is likely to yield positive returns.

In this example, the marginal cost of $20,000 (direct costs) is low compared to the marginal benefit of $50,000 in additional revenue, before counting any market share gains. As a result, ABC Manufacturing would most likely choose to increase production, provided the indirect costs do not absorb the difference, as the marginal analysis suggests that the benefits outweigh the costs.

By employing marginal analysis, businesses can make informed decisions at the margin, considering the incremental changes and optimizing their resources. This economic tool enables companies to maximize profits and allocate resources efficiently.

Remember, marginal analysis is a versatile concept that can be applied to various scenarios, allowing decision-makers to weigh the costs and benefits of any incremental change before making a well-informed choice.
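The comparison in the ABC Manufacturing example can be sketched in a few lines of Python. This is only an illustration: the function name `marginal_analysis` and the `extra_indirect_cost` parameter are assumptions introduced here, and the indirect cost is left at zero because the example gives no figure for it.

```python
def marginal_analysis(extra_units, unit_cost, unit_price, extra_indirect_cost=0):
    """Compare the marginal cost and marginal revenue of a production increase.

    extra_indirect_cost is a hypothetical placeholder for machinery or hiring
    costs, which the worked example mentions but does not quantify.
    """
    marginal_cost = extra_units * unit_cost + extra_indirect_cost
    marginal_revenue = extra_units * unit_price
    return {
        "marginal_cost": marginal_cost,
        "marginal_revenue": marginal_revenue,
        "net_gain": marginal_revenue - marginal_cost,
        "worthwhile": marginal_revenue > marginal_cost,
    }


# Figures from the example: 2,000 extra units, $10 direct cost, $25 selling price.
result = marginal_analysis(extra_units=2_000, unit_cost=10, unit_price=25)
print(result)
```

With the example's figures the net gain is $30,000, so under these assumptions the expansion remains worthwhile only while the unquantified indirect costs stay below that amount.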
09-Marginal-And-Absorption-Costing
CH9 MARGINAL AND ABSORPTION COSTING

1. MARGINAL COST AND MARGINAL COSTING

Marginal costing is an alternative method of costing to absorption costing. In marginal costing, only variable costs are charged as a cost of sale and a contribution is calculated (sales revenue minus variable cost of sales). Closing stocks of work in progress or finished goods are valued at marginal (variable) production cost. Fixed costs are treated as a period cost, and are charged in full to the profit and loss account of the accounting period in which they are incurred.

Marginal cost is the cost of a unit of a product or service which would be avoided if that unit were not produced or provided.

The marginal production cost per unit of an item usually consists

Surat is a small business which has the following budgeted marginal costing

budgeted at £4,000 per month and absorbed on the normal level of activity of units produced.

2. CONTRAST BETWEEN ABSORPTION vs. MARGINAL COSTING

Example: A company sells a single product at a price of £14 per unit. Variable manufacturing costs of the product are £6.40 per unit. Fixed manufacturing overheads, which are absorbed into the cost of production at a unit rate (based on normal activity of 20,000 units per period), are £92,000 per period. 
Any over-absorbed or under-absorbed fixed manufacturing overhead balances are transferred to the profit and loss account at the end of each period, in order to establish the manufacturing profit.

(a) Prepare a trading statement to identify the manufacturing profit for Period 2 using the existing absorption costing method.
(b) Determine the manufacturing profit that would be reported in Period 2 if marginal costing was used.
(c) Explain, with supporting calculations, why the manufacturing profit in (a) and (b) differs.

2.1 The cost accounting process

The key issue between absorption costing and marginal costing is how the costs of a business's input resources are best organized and presented so as to identify individual product/service and total business profit.

The choice of costing system may be influenced by the costing method adopted. Specific order costing methods will frequently deploy full absorption costing. One reason for this is that the pricing of each unique piece of work will invariably make reference to the total costs incurred. Continuous operation costing methods are more likely to deploy marginal costing (although this may be in addition to absorption costing) because of the opportunities in such an environment to use cost-volume-profit (CVP) analysis.

2.2 Absorption costing principles

In product/service costing, an absorption costing system allocates or apportions a share of all costs incurred by a business to each of its products/services. In this way, it can be established whether, in the long run, each product/service makes a profit. This can only be a guide. Arbitrary assumptions have to be made about the apportionment of many of the costs which, given that some costs will tend to remain fixed during a period, will also be dependent on the level of activity.

An absorption costing system traditionally classifies costs by function. Sales less production costs (of sales) measures the gross profit (manufacturing profit) earned. 
Gross profit less costs incurred in other business functions establishes the net profit (operating profit) earned.

Using an absorption costing system, the profit reported for a manufacturing business for a period will be influenced by the level of production as well as by the level of sales. This is because of the absorption of fixed manufacturing overheads into the value of work-in-progress and finished goods stocks. If stocks remain at the end of an accounting period, then the fixed manufacturing overhead costs included within the stock valuation will be transferred to the following period.

2.3 Absorption costing profit statement

The first stage in the preparation of absorption costing profit statements is the measurement of the gross profit (manufacturing profit) earned. This requires the calculation of unit production costs, including the establishment of absorption rates for manufacturing overheads.

Referring to the example, variable manufacturing costs per unit are given in the question (at £6.40 per unit). Fixed manufacturing overheads of £92,000 per period are to be absorbed at a unit rate (based on normal production activity of 20,000 units per period). The fixed manufacturing overhead absorption rate is therefore £4.60 per unit (£92,000 ÷ 20,000 units), giving a total manufacturing cost of £11.00 per unit (£6.40 + £4.60).

The use of normal activity as the basis for overhead absorption is similar to the use of budgeted activity. It is to be expected that actual activity (and indeed actual expenditure also) will be different to normal/budget, thus giving rise to overhead over or under absorption. It is important that this is highlighted in profit statements. 
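The absorption rate and unit cost quoted above follow from a short calculation. The sketch below simply restates that arithmetic in Python; the function and variable names are illustrative assumptions, while the figures are those given in the example.

```python
def absorption_unit_cost(variable_cost, fixed_overhead, normal_activity):
    """Return (overhead absorption rate, total manufacturing cost) per unit."""
    rate = fixed_overhead / normal_activity   # fixed overhead spread over normal output
    return rate, variable_cost + rate


# Figures from the example: £6.40 variable cost, £92,000 fixed overhead,
# normal activity of 20,000 units per period.
rate, unit_cost = absorption_unit_cost(6.40, 92_000, 20_000)
print(f"absorption rate: £{rate:.2f} per unit")
print(f"total manufacturing cost: £{unit_cost:.2f} per unit")
```

Because the rate is fixed in advance from normal activity, any period in which actual production differs from 20,000 units will absorb more or less than the £92,000 actually incurred.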
The use of normal (or budgeted) activity and expenditure to establish the absorption rate not only helps to focus attention on overhead recovery but also has the effect of 'normalising' per unit product/service costs.

Fixed manufacturing overheads will be over-absorbed in Period 2 because the actual production of 21,000 units exceeds the normal activity, the basis used to establish the absorption rate, by 1,000 units. The extent of the over-absorption (the balance remaining in the fixed manufacturing overhead account) is, therefore, £4,600 (1,000 units at £4.60 per unit). This amount will be transferred to the profit and loss account in order to establish the manufacturing profit. It will have a positive effect (i.e., it will be added to profit) because more manufacturing overhead has been absorbed into stock in the period than has been incurred (see the entries in the fixed manufacturing overhead account demonstrated later in examples).

Some candidates wrongly calculated the over/under-absorption based upon the difference between sales and production quantities, rather than upon the difference between actual production and normal production.

Preparation of the remainder of the trading statement (to identify the manufacturing profit for Period 2 using absorption costing) should be straightforward (see Example 2). The manufacturing cost of sales (the transfer out of the finished goods stock account) is simply the 21,600 units sold in the period (which at a selling price of £14.00 per unit yields total sales for the period of £302,400) multiplied by the unit manufacturing cost of £11.00.

examination scripts marked) of candidates matching the cost of the goods produced in the period (not the manufacturing cost of the goods sold) with sales.

Many examination candidates also got into difficulty because they attempted to introduce stock into the profit statement. 
There is generally no requirement to do this (unless opening and closing stocks are valued at a different rate per unit) but such an approach should (but often did not) lead to the same profit result.

In this example, no details of stock are provided in the question, but it could be assumed, for example, that 3,000 units were in stock at the end of Period 1 (production of 18,000 units less sales of 15,000 units). Finished goods stock at the end of Period 2 would then be 2,400 units due to the excess of sales over production of 600 units in that period (see Example 3).

information provided about costs incurred in other business functions (selling, administration, distribution) and thus it is not possible to complete the profit statement. If information on other costs was available, the costs incurred in the period would simply be deducted from the adjusted manufacturing profit to arrive at net profit.

2.4 Marginal costing principles

In product/service costing, a marginal costing system emphasises the behavioural, rather than the functional, characteristics of costs. The focus is on separating costs into variable elements (where the cost per unit remains the same with total cost varying in proportion to activity) and fixed elements (where the total cost remains the same in each period regardless of the level of activity). Whilst this is not easily achieved with accuracy, and is an oversimplification of reality, marginal costing information can be very useful for short-term planning, control and decision-making, especially in a multi-product business.

In a marginal costing system, sales less variable costs (regardless of function) measures the contribution that individual products/services make towards the total fixed costs incurred by the business. 
The fixed costs (regardless of function) are treated as period costs and, as such, are simply deducted from contribution in the period incurred to arrive at net profit.

2.5 Marginal costing profit statement

Referring back to Example 1, as there is no information regarding non-manufacturing costs, it may be assumed that sales less variable manufacturing costs measures contribution. NB. If any variable costs are incurred in non-manufacturing functions then they would also be deducted from sales in the measurement of contribution. (This point is very important and must be thoroughly understood.)

The trading statement for Period 2, assuming that a marginal costing system was in place instead, would be as in Example 4.

costing approach is more straightforward. The Examiner's Report noted that candidates generally had rather more success with the marginal costing statement in part (b) than with the absorption costing statement in part (a), although some confusion between the two was demonstrated.

In practice, the marginal costing profit statement would be completed by the further deduction of fixed costs incurred in other functions.

2.6 Profit reconciliation

The net profit reported by absorption and marginal costing systems may not be the same owing to the differing treatment of fixed manufacturing overheads. As has been demonstrated above, whilst marginal costing systems treat fixed manufacturing overheads as period costs (i.e. a charge against profit in the period incurred), in absorption costing systems they are absorbed into the cost of goods produced and are only charged against profit in the period in which those goods are sold.

As a result, if quantities produced and sold in a period are not the same (i.e., if the levels of work-in-progress or finished goods stock change) a different profit will be reported by the two systems. 
The differing profits can be reconciled, and the difference explained, by an analysis of the product of the stock change and the fixed

of Example 1: £69,400 and the marginal costing manufacturing profit of £72,160. Absorption costing has a lower profit because more goods are being taken out of stock (including a charge for fixed manufacturing overhead) than are going into stock. This is demonstrated by the entries in the respective cost accounting systems (fixed manufacturing overhead only) as in Example

2,760 more fixed manufacturing overheads and less profit in absorption costing.

NB. The difference in profit between absorption and marginal costing systems is nothing to do with overhead over/under absorption, a popular misconception amongst examination candidates. Despite an over-absorption of £4,600, which is a positive adjustment to the absorption costing profit, the profit was nevertheless less than the marginal costing profit.

To emphasise again: the difference in reported profit demonstrated above, which can only be a timing difference, is due to changes in the level of finished goods stock which in an absorption costing system moves overhead, and therefore profit, from one period to another. As a general rule:

If production quantity > sales quantity then absorption costing profit > marginal costing profit.
If production quantity < sales quantity then absorption costing profit < marginal costing profit.

Many candidates, when answering examination questions on this topic, do appreciate that the different treatment of fixed manufacturing overhead is the reason for profit differences but they are rarely able to reconcile the profits.
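The Period 2 figures can be reproduced and reconciled with a short script. This is a sketch of the arithmetic only, not a reconstruction of the missing example statements; the function and constant names are assumptions, while every number comes from the worked example (price £14, variable cost £6.40, fixed overhead £92,000, normal activity 20,000 units, production 21,000 units, sales 21,600 units).

```python
PRICE = 14.00
VARIABLE_COST = 6.40
FIXED_OVERHEAD = 92_000.0
NORMAL_ACTIVITY = 20_000
PRODUCTION = 21_000
SALES = 21_600

RATE = FIXED_OVERHEAD / NORMAL_ACTIVITY        # fixed overhead absorption rate per unit


def absorption_profit():
    """Manufacturing profit with fixed overhead absorbed into unit cost."""
    unit_cost = VARIABLE_COST + RATE
    gross = SALES * (PRICE - unit_cost)                # sales less full cost of sales
    over_absorbed = (PRODUCTION - NORMAL_ACTIVITY) * RATE
    return gross + over_absorbed                       # over-absorption is added back


def marginal_profit():
    """Manufacturing profit with fixed overhead charged as a period cost."""
    contribution = SALES * (PRICE - VARIABLE_COST)
    return contribution - FIXED_OVERHEAD


def stock_change_effect():
    """Fixed overhead carried into (+) or released from (-) finished goods stock."""
    return (PRODUCTION - SALES) * RATE


a, m = absorption_profit(), marginal_profit()
print(f"absorption costing profit: £{a:,.0f}")
print(f"marginal costing profit:   £{m:,.0f}")
print(f"difference:                £{a - m:,.0f}")
print(f"stock change x rate:       £{stock_change_effect():,.0f}")
```

The difference between the two profits equals the stock change of 600 units multiplied by the £4.60 absorption rate, and it does not depend on the £4,600 over-absorption, which is the point the reconciliation is meant to establish.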
Selected Readings from British and American Newspapers (Minor): Sample Essay - Reply
The question: "Should college athletes be paid?"

Introduction:

College sports in the United States have always been a source of excitement and entertainment for millions of fans. Sports like football and basketball draw large crowds and generate millions of dollars in revenue. However, amidst this frenzy, a debate has emerged regarding whether college athletes should be paid for their services. In this article, we will explore both sides of the argument and analyze the potential impact of such a decision.

Body:

1. Arguments against paying college athletes:
a. Amateurism: One of the main arguments against paying college athletes is that they are amateurs, not professional athletes. College sports are meant to be played for the love of the game and should not be transformed into a source of income for the players.
b. Scholarship benefits: College athletes already receive various benefits such as scholarships, which cover their tuition fees and living expenses. These benefits are often valued at tens of thousands of dollars a year, providing athletes with a valuable opportunity to pursue higher education without the burden of student loans.
c. Equality among athletes: Paying college athletes would create an imbalance among different sports and colleges. Only revenue-generating sports like football and basketball would be able to pay their athletes, while sports like soccer or swimming, which generate less revenue, would struggle to do so. This would create an unfair advantage for certain athletes and teams.

2. Arguments in favor of paying college athletes:
a. Exploitation of athletes: College athletes bring in significant revenue for their universities and the NCAA through ticket sales, merchandise, and advertising deals. It is argued that these athletes should receive a fair share of the profits they help generate.
b. Time commitment: College athletes dedicate a significant amount of time to their sport, often at the expense of their studies and personal lives. They have to balance grueling training schedules, travel, and academic commitments, all while generating revenue for their institutions. Being able to earn a salary would compensate athletes for their time and effort.
c. Economic impact: Paying college athletes could have an economic impact on both athletes and the surrounding communities. Athletes would have the opportunity to save for their future or support their families financially. Additionally, college towns heavily rely on sports programs for revenue, and paying athletes could further boost the local economy.

3. Potential consequences of paying college athletes:
a. Financial strain: Paying college athletes could place a significant financial burden on universities, especially those with smaller sports programs or limited resources. This could result in program cuts, increased tuition fees, or diminished resources for non-athletic programs.
b. Corrupting influence: The introduction of monetary compensation could lead to recruiting violations, cheating, and a more professional approach to college sports. This may erode the spirit of amateurism and fair competition that college sports are known for.
c. Educational priorities: Paying college athletes might further shift the focus away from academics, as athletes may prioritize their sports careers over their studies. This could undermine the original purpose of attending college, which is to receive a quality education.

Conclusion:

The debate on whether college athletes should be paid is a complex and contentious one. While there are valid arguments on both sides, the decision ultimately lies in determining the appropriate balance between the athletes' contributions to revenue generation and the preservation of the amateur spirit of college sports. Any potential changes should carefully consider the potential consequences and find a solution that ensures fairness and integrity in the world of college athletics.
Marginal Distribution in Evolutionary Algorithms
Martin Pelikan
GMD Forschungszentrum Informationstechnik, D-53754 Sankt Augustin, Germany
email: muehlenbein@gmd.de, tel/fax: +49-2241-14 2405 / +49-2241

Introduction
Evolutionary algorithms work over populations of strings. The main schema of evolutionary algorithms is simple. The initial population is generated randomly. From the current population, a set of high quality individuals is selected first. The better the individual, the bigger the chance for its selection. The information contained in the set of selected individuals is then used in order to create the offspring population. New individuals then replace a part of the old population or the whole old population. The process of the selection, processing the information from the selected set, and the incorporation of new individuals into the old population, is repeated until the population satisfies the termination criteria. In a simple genetic algorithm, the information contained in the selected set is processed using Mendelian crossover and mutation on pairs of individuals. Crossover combines the information contained in two individuals together by swapping some of the genes between them. Mutation is a small perturbation to the genome in order to preserve the diversity of the population and to introduce new information. The theory on GAs is based on the fundamental theorem that claims that the number of schemata increases exponentially for schemata with fitness above the average of the population. Schemata with a fitness lower than average exponentially vanish. When schemata of a large defining length are needed to obtain the optimum, a simple genetic algorithm does not work well. The other approach to processing the information contained in the selected set is to estimate the distribution of selected individuals and to generate new individuals according to this distribution. A general schema of these algorithms is called the Estimation of Distribution Algorithm (EDA) [1]. The estimation of the distribution of the selected set is a very complex problem. It has to be done efficiently and the estimate should be able to cover a large number of problems. 
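The general schema described above (select, estimate a distribution, sample a new population) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from the cited work: the distribution estimate is the simplest possible one (independent per-bit frequencies), selection is plain truncation rather than fitness-proportional, the whole population is replaced each generation, and the OneMax fitness function and all names and parameter values are chosen here purely for demonstration.

```python
import random


def onemax(bits):
    """Toy fitness function (an assumption for this sketch): number of ones."""
    return sum(bits)


def eda_onemax(n_bits=30, pop_size=100, n_selected=50, generations=40, seed=0):
    """Estimate-and-sample loop with independent per-bit probabilities."""
    rng = random.Random(seed)
    # Initial population is generated randomly.
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(population, key=onemax)
    for _ in range(generations):
        # Selection: keep the n_selected fittest individuals (truncation selection).
        selected = sorted(population, key=onemax, reverse=True)[:n_selected]
        # Estimation: frequency of a 1 at each position in the selected set.
        probs = [sum(ind[i] for ind in selected) / n_selected for i in range(n_bits)]
        # Sampling: the new population replaces the whole old population.
        population = [[1 if rng.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]
        candidate = max(population, key=onemax)
        if onemax(candidate) > onemax(best):
            best = candidate
        if onemax(best) == n_bits:      # termination criterion: optimum found
            break
    return best


best = eda_onemax()
print(onemax(best), "ones out of", len(best))
```

No crossover or mutation operator appears anywhere in the loop; all information from the selected set passes through the estimated probability vector, which is what distinguishes this family from a simple genetic algorithm.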
A general implementation of EDA was presented in [1]. The weak point of this implementation was the determination of the distribution. The Univariate Marginal Distribution Algorithm (UMDA) [2] uses simple univariate marginal distributions. The theory shows that UMDA works perfectly for linear problems. It works very well for problems without many significant dependencies. The Bivariate Marginal Distribution Algorithm (BMDA) [3] is an extension of UMDA. It is based on bivariate marginal distributions. Bivariate distributions allow the most important pairwise dependencies to be taken into account. BMDA therefore performs well for linear as well as quadratic problems. Difficulties arise for problems with significant dependencies of a higher order, although many such problems can be solved efficiently by BMDA as well. If the structure of a problem were known, there would be a way to cope with dependencies of a higher order. The Factorized Distribution Algorithm (FDA) [4] works very efficiently for decomposable
1 Introduction
A marginal deformation of N = 4 super Yang-Mills theory to N = 1 theory preserving U(1) × U(1) global symmetry, along with a U(1)_R symmetry, provides a new type IIB supergravity background geometry [1] via the AdS/CFT correspondence [2, 3, 4]. The gravity dual of the original undeformed theory has an isometry group which includes U(1) × U(1). The marginal deformation of the gauge theory can be described by an SL(2, R) transformation acting on a 2-torus T^2 in the gravity solution. This particular SL(2, R) transformation produces a non-singular geometry provided that the original geometry is non-singular. Gravity duals corresponding to marginal deformations of other field theories based on conifolds and toric manifolds are also given in [1].

A prescription for finding the marginal deformations of eleven-dimensional gravity solutions with U(1)^3 global symmetry was also provided in [1], and was applied to the case of AdS_4 × S^7. The isometry group of S^7 is SO(8). The U(1)^3 symmetries can be embedded in the SU(4) subgroup of SO(8), which implies that the deformed solution preserves two supersymmetries in three dimensions. With the appropriate coordinates, the angular directions corresponding to the three U(1)'s can be identified, and the metric can be written in a form that explicitly shows the three-torus T^3 symmetry. An additional angle is related to the SO(2)_R = U(1)_R R-symmetry, which is a symmetry of the usual 3-dimensional N = 2 superconformal field theories. Two of the T^3 angles are used to dimensionally reduce and T-dualize the solution to type IIB theory. Performing an SL(2, R) transformation and then T-dualizing and lifting back to eleven dimensions on the transformed directions yields a new 11-dimensional solution. This deformed solution has a warp factor, as well as an additional term in the 4-form field strength, which depends on the deformation parameter. In the limit of vanishing deformation parameter, the original AdS_4 × S^7 solution is regained.
Abstract

We generate new 11-dimensional supergravity solutions from deformations based on U(1)^3 symmetries. The initial geometries are of the form AdS_4 × Y_7, where Y_7 is a 7-dimensional Sasaki-Einstein space. We consider a general family of cohomogeneity one Sasaki-Einstein spaces, as well as the recently-constructed cohomogeneity three L^{p,q,r,s} spaces. For certain cases, such as when the Sasaki-Einstein space is S^7, Q^{1,1,1} or M^{1,1,1}, the deformed gravity solutions correspond to a marginal deformation of a known dual gauge theory.
This prescription can be readily applied to other 11-dimensional solutions with geometries of the form AdS_4 × Y_7 provided that, in addition to the R-symmetry group, the isometry
This U(1) corresponds to the R-symmetry of the gauge theory. The requirement that the
global symmetry group of the gauge theory includes U(1)^3 corresponds to the condition that the U(1)^3 lies within the isometry group of the Einstein-Kähler base space. The isometry groups of most of the Sasaki-Einstein spaces we consider have SU(2) or SU(3) elements, which contain U(1) and U(1)^2 subgroups, respectively. Thus, the resulting deformed spaces have isometry groups in which the SU(2) and SU(3) factors are replaced by U(1) and U(1)^2 accordingly.

Until recently, few explicit metrics were known for Sasaki-Einstein spaces. A countably infinite number of 5-dimensional Sasaki-Einstein manifolds of topology S^2 × S^3 has been constructed in [5, 6]. These Y^{p,q} spaces are characterized by two coprime positive integers p and q. The marginal deformations of type IIB solutions with geometries of the form AdS_5 × Y^{p,q} have already been considered in [1].

Higher-dimensional Sasaki-Einstein spaces were found in [7]. We will focus on the 7-dimensional spaces, which we refer to as X^{p,q}. These spaces are cohomogeneity one and include the previously-known homogeneous spaces S^7, Q^{1,1,1} and M^{1,1,1} as special cases. The 6-dimensional Einstein-Kähler base space of X^{p,q} can be expressed as a 2-dimensional bundle over a 4-dimensional Einstein-Kähler space B_4. We have a couple of choices for the base space B_4, namely CP^2 or CP^1 × CP^1. For B_4 = CP^2, S^7 and M^{1,1,1} arise as particular cases while, for B_4 = CP^1 × CP^1, Q^{1,1,1} arises as a special case.

The X^{p,q} family of spaces can be further generalized for B_4 = CP^1 × CP^1 by rendering the characteristic radii of the two 2-spheres to be different [10, 11]. This is also a cohomogeneity one family of Sasaki-Einstein spaces. We will refer to these more general spaces as Z^{p,q,r,s}, which are characterized by four positive