Bayesian Summarization at DUC and a Suggestion for Extrinsic Evaluation


Few-Shot Human-Like Concept Learning with Bayesian Learning (PPT courseware)

That is, compute P(F1, F2, ..., Fn | Ci) P(Ci) for i = 1, ..., m.
Since the feature attributes are conditionally independent: P(F1, ..., Fn | Ci) P(Ci) = P(Ci) ∏j P(Fj | Ci), for i = 1, ..., m and j = 1, ..., n.
4. If P(Ck | F) = max{ P(C1 | F), P(C2 | F), ..., P(Cm | F) } for some k in 1, ..., m, then F is assigned to class Ck.
1. Determine the feature attributes and how they are partitioned.
2. Bayesian Fundamentals
2.4 Theory: Bayes' Rule
P(h | D) = P(D | h) P(h) / P(D)
P(h): prior probability of hypothesis h. P(D): prior probability of the training data D. P(D | h): probability of observing data D given that hypothesis h holds. P(h | D): probability that h holds given the training data D.
P(h) denotes the prior probability that hypothesis h is correct before any training data are seen; it reflects whatever background knowledge we have about the chance that h is the right hypothesis. If no such knowledge is available, we can simply assign the same prior probability to every candidate hypothesis.
2.5.1 Flowchart of the Naive Bayes classifier
Preparation stage — output: the feature attributes and the training samples.
Training stage — input: feature attributes and training samples; output: the classifier. Application stage — input: the classifier and the items to be classified; output: the mapping from each item to its class.
2.5.2 Application of the Naive Bayes classifier: detecting fake accounts in an SNS community
Let C = 0 denote a genuine account and C = 1 a fake account.
By Bayes' rule, this is equivalent to:
argmax_c P(w | c) * P(c) / P(w)
Since P(w) is the same for every possible c, we can ignore it, giving:
argmax_c P(w | c) * P(c)
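To make the decision rule concrete for the SNS example above, here is a tiny numerical sketch. It is not from the slides: the prior and the per-feature conditional probabilities are invented purely for illustration of argmax_c P(F | c) P(c).

```python
# Hypothetical numbers, purely for illustrating argmax_c P(F | c) P(c).
# C = 0 (genuine account), C = 1 (fake account); three binary features observed.
prior = {0: 0.89, 1: 0.11}
likelihood = {            # assumed P(F_j = observed value | C = c), conditionally independent
    0: [0.5, 0.7, 0.2],
    1: [0.1, 0.2, 0.9],
}

def posterior_scores(prior, likelihood):
    """Score each class by P(c) * prod_j P(F_j | c); P(F) cancels in the argmax."""
    scores = {}
    for c in prior:
        p = prior[c]
        for p_feature in likelihood[c]:
            p *= p_feature
        scores[c] = p
    return scores

scores = posterior_scores(prior, likelihood)
print(scores)
print("predicted class:", max(scores, key=scores.get))
```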
Human-like machine learning from small samples
Bayesian learning
1. Machine learning from small samples


Research on Hierarchical Interactive Teaching Model Based on Naive Bayesian ClassificationDongyan FanInformation faculty, Business College of Shanxi University, Taiyuan 030031, ChinaAbstract—The purpose of this research is improving the current inject classroom teaching mode that ignores individual differences and inefficiency of students. By studying classification algorithm in data mining and applying the classification method based on Naive Bayes algorithm, we designed and implemented scientific classification of students, and draw lessons from stratified and interactive teaching mode, so as to builded a new effective teaching mode. The results show that through scientific classification of students, real-time hierarchical interaction teaching effectively stimulate students' interest in learning, improve cooperation ability, and improve classroom teaching efficiency.Keywords—Naive Bayesian; student classification; hierarchical interactive; teaching modelI.I NTRODUCTIONUnder the background of big data era, the current teaching mode is not adapt to the cultivation of innovative talents, there are many problems, such as low efficiency of classroom, teachers' manipulation of teaching process, ignore the individual differences of students in knowledge transfer ability. Therefore, this study aimed at these problems, by studying classification algorithm in data mining and applying the classification method based on Naive Bayes algorithm, we design and implement scientific classification of students, and draw lessons from stratified and interactive teaching mode, so as to build a new effective teaching mode. The mode enable students to learn efficiently, so as to adapt to the trend of rapid development of new technology and cultivate innovative talents.II.R ESEARCH M ETHODThe research and practice of the hierarchical interactive teaching model based on the Naive Bayesian classification is based on the classification of students' differences. So there are two major tasks need to do: the approaches to the students' difference measurement and grouping and the design of hierarchical interactive teaching framework. Its method flow is shown in Figure I.FIGURE I. RESEARCH METHOD FLOWFirst of all, based on the samples, the naive Bayes algorithm according to the student's attribute value is used to test the students' differences. Then, according to the results to make a scientific difference classification to achieve effective grouping for students. At the same time, the design of the hierarchical interactive teaching framework is carried out by the two subjects (the student is the main body, the teacher is the leading part). Finally, the teaching effect is evaluated and analyzed.III.S TUDENT C LASSIFICATION D ESIGN B ASED ON N AIVEB AYESIANA.Naive Bayesian Theoretical PrincipleAt present, there are many kinds of algorithms in data mining, such as based on Bayes algorithm, decision tree algorithm, neural network algorithm, rough set algorithm, genetic algorithm, support vector machine algorithm and so on. In the practical application of many classification algorithms, the most widely used algorithm is Naive Bayesian algorithm model. Naive Bayes is a simple and effective classification model.From Bayes’ theorem recall that:()()()||P A B P BP B AP A= (1)Equation (1): P(A) and P(B) separate representation the probability of occurrenceof events A andevents B.()|P A B indicates the probability of occurrence of event A under the premise that event B occurs. 
()|P A B is a priori probability, and its value is often easily obtained.()|P B A indicates the probability of occurrence of event B under the premise that event A occurs. ()|P B A is a posteriori probability, and its value is the result of the solution of the Bayesian formula.The classifier structure diagram based on the naive Bayes algorithm is shown in Figure II. It’s leaf node Am represents the m attribute, and the root node C represents the category. Suppose {},,D C A S=are training samples, it includes the studentcategory {}12,,iC C C C= and the student attribute {}12,,mA A A A= .Suppose {}12,,nS S S S= represents acollection of classified students, in whichnS represents nthstudent. Suppose {}12,,k mX a a a= is a student to be classified,International Conference on Computer Science, Electronics and Communication Engineering (CSECE 2018)in which each m a represents an attribute eigenvalue of the pending item k X .FIGURE II. THE CLASSIFIER STRUCTURE DIAGRAMB. Design the Individualized Attributes of StudentsThe student classification method based on the naive Bayes algorithm is used the information of the past students as the sample set , which is used to construct the naive Bayes classifier.Students are classified according to the information of the students' attributes. The students divided into the same category are not simply using the score as criterion of evaluation. Its are classified by comprehensive evaluation after combination of other attributes.The difference classification based on the naive Bayes algorithm is select the individual attributes of the students as shown in Figure III. The students which 8 attribute values similar in the two dimensions (character and learning style) are put into one category, while the 12 attributes values of the three dimensions of personal basic situation, learning interest and cognitive ability are different. The purpose of the classification is to carry out differential teaching to implicit dynamic stratification and heterogeneous cooperation for students'cognitive ability, learning interest and basic information.FIGURE III. INDIVIDUALIZED ATTRIBUTES OF STUDENTSC. Student Classification Design Based on Naive Bayesian The process based on the naive Bayes classification is shown in Figure IV.FIGURE IV. STUDENT CLASSIFICATION CYCLE FLOW CHARTBASED ON NAIVE BAYES ALGORITHM1)()i P C is set to indicate the frequency of the occurrence of the student category i C in the training sample concentration, that is the category probability. 
For sample data sets, there are different levels of students in each category, which avoids the discrimination of students.()()i i P C Count C n= (2)The function ()i Count C represents the number of students belonging to category i which is in the entire student sample collection of S .n represents the total number of the entire student sample collection of S .2)()|j j i P A C a = is set to represent the conditional probability of each characteristic attribute value of the student in the category.()()()|i C j j j j i i P A C Count A a a Count C ===(3)j j A a =indicates that the value of the j attribute is j a .Thefunction ()i C j j Count A a =represents the number of students which the attribute name is j A and attribute value is j a in the i student category.3) ()|k i P X C is set to represent the conditional probability of the students k X to be classified in the student category i C , m represents the number of attributes that describe student differences.()()1||mk i j j i j P X C P A C a ===∏ (4)4) ()j j P A a = is set to represent the probability of the student's attribute j A when the value is j a . ()()j j j j P A Count A a a n=== (5)The function ()j j Count A a = indicates the number when the value of attribute j is j a .5) ()k P X is set to indicate the probability that the student k X should be classified in the training sample concentration. ()()1mk j j j P X P A a ===∏ (6)6) ()|i k P C X is set to represent the conditional probability that the student k X should be classified to category i . ()()()()||k i i i k k P X C P C P C X P X =(7)7) ()max |k P C X is set to represent the maximum category probability of the student k X which should be classified to the student category .()()()(){}max 12|max |,|,,|k k k i k P C X P C X P C X P C X = (8) max C indicates the maximum category of conditionalprobability which is obtained by (8).Finally, (8) is used to calculate the maximum category probability of the students to be classified in the students category. That is the category of the students to be classified. At this point, one classification ends.IV. T HE D ESIGN OF THE H IERARCHICAL I NTERACTIVET EACHING F RAMEWORK The hierarchical interactive teaching model is an independent, inquiring and cooperative teaching model based on the classification of the naive Bayes algorithm. This model breaks the original classroom structure, and takes the interaction of teachers and students as the carrier, and also group autonomy, and let the students as the subject of the class. This model is guided by the task of the problem, and it is based on the students' self-study, and it aims at the completion of the task of the group. This model creates an ecological chain class based on group mutual learning to solve problems. It pays attention to the state of learning and the quality of life for every student. The design of the hierarchical interactive teaching model framework is shown in Figure V.FIGURE V. THE HIERARCHICAL INTERACTIVE TEACHING MODELFRAMEWORKThe four layers of the hierarchical interactive teaching model are closely related to each other, and support each other dynamically with the spiral. The five segments drive each other to form a whole, interlace and connect with each other. This teaching mode makes the classroom an active area for teachers and students to resonate with their ideology and to show their personality together.V.A NALYSIS OF T EACHING E FFECTIn this paper, the teaching effect is analyzed from two aspects by using the method of questionnaire and comparative experiment. 
First, the experimental class's comparative analysis before and after the experiment is carried out. Then, a comparative analysis between the experimental class and the contrast class is carried out.The comparative data of the experimental class before and after the experiment are shown in Figure VI. From Figure VI, it can be seen that 85.72% of the students have An attitude of approval towards the application of the hierarchical interactive teaching model based on the naive Bayes algorithm in the teaching. There are 70.13% of the students satisfied with the improved teaching effect. At the same time, it can be seen that the students' interest in learning and the ability to communicate and cooperate have improved obviously.FIGURE VI. THE COMPARATIVE DATA OF THE EXPERIMENTALCLASS BEFORE AND AFTER THE EXPERIMENT The comparison between the experimental class and the contrast class is shown in Figure VII. From Figure VII, we can see that students' satisfaction degree, teaching effect satisfaction and group learning atmosphere based on Naive Bayes algorithm classification are higher than those of the contrast class. At the same time, it can be seen that the students' interest in learning and the ability to communicate and cooperate have also been improved.FIGURE VII. THE COMPARISON BETWEEN THE EXPERIMENTALCLASS AND THE CONTRAST CLASSVI.C ONCLUSIONThe comprehensive analysis shows that, in the implementation of the hierarchical interactive teaching model based on the naive Bayes algorithm, the new teaching mode was accepted by the students , it was welcomed by the students. The new teaching mode can improve the ability of learning interest and collaboration of students. It has a very good teaching effect. Experiments show that the classification algorithm based on Naive Bayes has better feasibility and effectiveness in solving student classification problem.However, due to the limited personal time and ability, there are still some shortcomings in the study. In order to better achieve hierarchical interaction teaching mode based on Naive Bayes algorithm and improve teaching effect, we still need to further improve the limitation of applying naive Bayes algorithm, that is, suppose the attributes of students are independent.A CKNOWLEDGMENTThis work was supported by “Research and construction of the practice teaching system of information specialty(J2016138, The major project of teaching reform research in Shanxi Education Department)” and “The optimization and the platform construction of the practice teaching system of information specialty (SYJ201509, The major project of the research on teaching reform Business College of Shanxi University)”. Our special thanks are due to Prof. Ma Shangcai, for his helpful discussion with preparing the manuscript.R EFERENCES[1]Jonathan Rauh. Problems in Identifying Public and Private Organizations:A Demonstration Using a Simple Naive Bayesian Classification[J]. PublicOrganization Review,2015,15(1).[2]SangitaB, P., Deshmukh, S.R.. Use of Support Vector Machine, decisiontree and Naive Bayesian techniques for wind speed classification[P].Power and Energy Systems (ICPS), 2011 International Conference on,2011.[3]Yan Dong. Hierarchical interactive teaching mode and its practice andexploration of mathematics teaching in Senior High School [D].Southwest University,2016.[4]Chen Zhiqiang. Hierarchical interactive teaching mode and its practiceand exploration of mathematics teaching in Senior High School [D].Henan University,2016.[5]S. Mukherjee and N. Sharma. 
Intrusion detection using naïve Bayesclassifier with feature reduction[J].Procedia Technology,vol. 4, pp. 119–128, 2012.[6]L. Jiang, Z. Cai, D. Wang, and H. Zhang. Improving tree augmented naiveBayes for class probability estimation[J]. Knowledge-Based Systems, vol.26, pp. 239–245, 2012.[7]Sharma RK, Sugumaran V, Kumar H, Amarnath M. A comparative studyof naïve Bayes classifier and Bayes net classifier for fault diagnosis of roller bearing using sound signal[J].International Journal of Decision Support Systems. 2015 Jan 1; 1(1):115-29.[8]Hamse Y Mussa, John BO Mitchell,Robert C Glen.Full “Laplacianised”posterior naïve Bayesian algorithm[J]. Journal of Cheminformatics. 2013 5:37.[9]K. Magesh Kumar, P. Valarmathie. Domain and Intelligence BasedMultimedia Question Answering System[J]. International Journal of Evaluation and Research in Education, Vol. 5, No. 3, September 2016 : 227 – 234.[10][11]Zhijun Wang1, Li Chen, Terry Anderson. A Framework forInteraction and Cognitive Engagement in Connectivist Learning Contexts[J]. International Review of Research in Open and DistanceLearning, Vol. 15,No.2, Apr 2014:121-141.。
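The paper's equations (2)–(8) amount to a standard Naive Bayes posterior over student categories. The sketch below is not the authors' implementation: the attribute names and groupings are hypothetical, and it computes only the quantities that survive the argmax (P(X_k) from eqs. (5)–(6) cancels). As in the paper's eq. (3) there is no smoothing, so an attribute value never seen within a category zeroes that category's score; in practice one would add smoothing.

```python
from collections import Counter, defaultdict

def fit(students, categories):
    """students: list of attribute tuples A_1..A_m; categories: parallel list of labels C_i.
    Collects the counts used in equations (2) and (3) of the paper."""
    n = len(students)
    cat_counts = Counter(categories)                      # Count(C_i), eq. (2)
    attr_counts = defaultdict(Counter)                    # Count_{C_i}(A_j = a_j), eq. (3)
    for x, c in zip(students, categories):
        for j, a in enumerate(x):
            attr_counts[c][(j, a)] += 1
    return n, cat_counts, attr_counts

def classify(x, n, cat_counts, attr_counts):
    """Return argmax_i P(C_i | X_k) using eqs. (4)-(8): P(C_i | X_k) ∝ P(X_k | C_i) P(C_i)."""
    best, best_score = None, -1.0
    for c, n_c in cat_counts.items():
        score = n_c / n                                   # P(C_i), eq. (2)
        for j, a in enumerate(x):
            score *= attr_counts[c][(j, a)] / n_c         # P(A_j = a_j | C_i), eq. (3)
        if score > best_score:
            best, best_score = c, score
    return best

# Hypothetical attribute tuples (character, learning_style, interest) and groupings.
students = [("extrovert", "visual", "high"), ("introvert", "verbal", "low"),
            ("extrovert", "verbal", "high"), ("introvert", "visual", "low")]
categories = ["group_A", "group_B", "group_A", "group_B"]
print(classify(("introvert", "visual", "high"), *fit(students, categories)))
```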

An Intuitive Explanation of the Bayesian Optimization Algorithm


Bayesian optimization is a powerful tool that is widely used in fields such as machine learning, engineering design, and computer experiments. It is a probabilistic approach that helps optimize the parameters of a system by making informed decisions based on previous observations. The key idea behind it is to model the objective function as a probability distribution and to update this distribution as new data become available.

One of the main advantages of Bayesian optimization is its ability to handle noisy or expensive objective functions. Unlike traditional optimization techniques, which require a large number of function evaluations, Bayesian optimization uses a probabilistic model to guide the search process efficiently. This makes it particularly useful in scenarios where each evaluation is time-consuming or costly.
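A minimal sketch of the loop just described, assuming a Gaussian-process surrogate and an expected-improvement acquisition function; the objective, search grid, kernel choice, and number of iterations are all invented for illustration rather than taken from the text.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):                      # stand-in for an expensive, noisy objective
    return -np.sin(3 * x) - x**2 + 0.7 * x + np.random.normal(scale=0.05)

X_grid = np.linspace(-2, 2, 400).reshape(-1, 1)
X = np.array([[-1.5], [0.0], [1.5]])                    # a few initial evaluations
y = np.array([objective(x[0]) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-3, normalize_y=True)

for _ in range(15):
    gp.fit(X, y)                                        # update the surrogate with all data so far
    mu, sigma = gp.predict(X_grid, return_std=True)
    best = y.max()
    z = (mu - best) / np.maximum(sigma, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)   # expected improvement (maximization)
    x_next = X_grid[np.argmax(ei)]                      # next point to evaluate
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next[0]))

print("best x found:", X[np.argmax(y), 0], "value:", y.max())
```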

The English Abbreviation for Bayesian Classification


Bayesian classification, often abbreviated as "Naive Bayes," is a popular machine learning algorithm used for classification tasks. It is based on Bayes' theorem and assumes that features are independent of each other, hence the "naive" aspect.

One of the main advantages of Naive Bayes classification is its simplicity and efficiency. It is easy to implement and works well with large datasets. Additionally, it performs well even with few training examples. However, its main downside is the assumption of feature independence, which may not hold true in real-world scenarios.

From a mathematical perspective, Naive Bayes classification calculates the probability of each class given a set of features using Bayes' theorem. It estimates the likelihood of each class based on the training data and the probabilities of different features belonging to each class. The class with the highest probability is assigned to the input data point.
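As a concrete illustration of the text-classification use case mentioned above, here is a generic scikit-learn sketch; the toy texts and labels are made up and not tied to any dataset from the text.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["win money now", "meeting at noon", "cheap pills win", "project status meeting"]
labels = ["spam", "ham", "spam", "ham"]

# Bag-of-words features followed by a multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["win a meeting"]))   # class with the highest posterior probability
```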

Stata: Manual for the Postestimation Tools for biprobit (Bivariate Probit Regression)


Title biprobit postestimation—Postestimation tools for biprobitPostestimation commands predict margins Also seePostestimation commandsThe following postestimation commands are available after biprobit:Command Descriptioncontrast contrasts and ANOV A-style joint tests of estimatesestat ic Akaike’s,consistent Akaike’s,corrected Akaike’s,and Schwarz’s Bayesian in-formation criteria(AIC,CAIC,AIC c,and BIC)estat summarize summary statistics for the estimation sampleestat vce variance–covariance matrix of the estimators(VCE)estat(svy)postestimation statistics for survey dataestimates cataloging estimation resultsetable table of estimation results∗hausman Hausman’s specification testlincom point estimates,standard errors,testing,and inference for linear combinations ofcoefficients∗lrtest likelihood-ratio testmargins marginal means,predictive margins,marginal effects,and average marginal effects marginsplot graph the results from margins(profile plots,interaction plots,etc.)nlcom point estimates,standard errors,testing,and inference for nonlinear combinationsof coefficientspredict probabilities for joint,marginal,and conditional outcomespredictnl point estimates,standard errors,testing,and inference for generalized predictions pwcompare pairwise comparisons of estimatessuest seemingly unrelated estimationtest Wald tests of simple and composite linear hypothesestestnl Wald tests of nonlinear hypotheses∗hausman and lrtest are not appropriate with svy estimation results.12biprobit postestimation—Postestimation tools for biprobitpredictDescription for predictpredict creates a new variable containing predictions such as probabilities,linear predictions, and standard errors.Menu for predictStatistics>PostestimationSyntax for predictpredicttypenewvarifin,statistic nooffsetpredicttypestub*ifin,scoresstatistic DescriptionMainp11Φ2(x j b,z j g,ρ),predicted probability Pr(y1j=1,y2j=1);the defaultp10Φ2(x j b,−z j g,−ρ),predicted probability Pr(y1j=1,y2j=0)p01Φ2(−x j b,z j g,−ρ),predicted probability Pr(y1j=0,y2j=1)p00Φ2(−x j b,−z j g,ρ),predicted probability Pr(y1j=0,y2j=0)pmarg1Φ(x j b),marginal success probability for equation1pmarg2Φ(z j g),marginal success probability for equation2pcond1Φ2(x j b,z j g,ρ)/Φ(z j g),conditional probability of success for equation1pcond2Φ2(x j b,z j g,ρ)/Φ(x j b),conditional probability of success for equation2 xb1x j b,linear prediction for equation1xb2z j g,linear prediction for equation2stdp1standard error of the linear prediction for equation1stdp2standard error of the linear prediction for equation2whereΦ(·)is the standard normal-distribution function andΦ2(·)is the bivariate standardnormal-distribution function.These statistics are available both in and out of sample;type predict...if e(sample)...if wanted only for the estimation sample.Options for predict££Main p11,the default,calculates the bivariate predicted probability Pr(y1j=1,y2j=1).p10calculates the bivariate predicted probability Pr(y1j=1,y2j=0).p01calculates the bivariate predicted probability Pr(y1j=0,y2j=1).biprobit postestimation—Postestimation tools for biprobit3p00calculates the bivariate predicted probability Pr(y1j=0,y2j=0).pmarg1calculates the univariate(marginal)predicted probability of success Pr(y1j=1).pmarg2calculates the univariate(marginal)predicted probability of success Pr(y2j=1).pcond1calculates the conditional(on success in equation2)predicted probability of success Pr(y1j=1,y2j=1)/Pr(y2j=1).pcond2calculates the conditional(on success in equation1)predicted probability of success 
Pr(y1j=1,y2j=1)/Pr(y1j=1).xb1calculates the probit linear prediction x j b.xb2calculates the probit linear prediction z j g.stdp1calculates the standard error of the linear prediction for equation1.stdp2calculates the standard error of the linear prediction for equation2.nooffset is relevant only if you specified offset1(varname)or offset2(varname)for biprobit.It modifies the calculations made by predict so that they ignore the offset variables;the linear predictions are treated as x j b rather than as x j b+offset1j and z jγrather than as z jγ+offset2j. scores calculates equation-level score variables.Thefirst new variable will contain∂ln L/∂(x jβ).The second new variable will contain∂ln L/∂(z jγ).The third new variable will contain∂ln L/∂(atanhρ).4biprobit postestimation—Postestimation tools for biprobitmarginsDescription for marginsmargins estimates margins of response for probabilities and linear predictions.Menu for marginsStatistics>PostestimationSyntax for marginsmarginsmarginlist,optionsmarginsmarginlist,predict(statistic...)predict(statistic...)...optionsstatistic Descriptionp11Φ2(x j b,z j g,ρ),predicted probability Pr(y1j=1,y2j=1);the default p10Φ2(x j b,−z j g,−ρ),predicted probability Pr(y1j=1,y2j=0)p01Φ2(−x j b,z j g,−ρ),predicted probability Pr(y1j=0,y2j=1)p00Φ2(−x j b,−z j g,ρ),predicted probability Pr(y1j=0,y2j=0)pmarg1Φ(x j b),marginal success probability for equation1pmarg2Φ(z j g),marginal success probability for equation2pcond1Φ2(x j b,z j g,ρ)/Φ(z j g),conditional probability of success for equation1 pcond2Φ2(x j b,z j g,ρ)/Φ(x j b),conditional probability of success for equation2 xb1x j b,linear prediction for equation1xb2z j g,linear prediction for equation2stdp1not allowed with marginsstdp2not allowed with marginsStatistics not allowed with margins are functions of stochastic quantities other than e(b).For the full syntax,see[R]margins.Also see[R]biprobit—Bivariate probit regression[U]20Estimation and postestimation commandsStata,Stata Press,and Mata are registered trademarks of StataCorp LLC.Stata andStata Press are registered trademarks with the World Intellectual Property Organizationof the United Nations.Other brand and product names are registered trademarks ortrademarks of their respective companies.Copyright c 1985–2023StataCorp LLC,College Station,TX,USA.All rights reserved.®。
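The p11–p00, marginal, and conditional formulas in the statistic table can be reproduced directly from the bivariate standard-normal CDF. Below is a small sketch in Python rather than Stata; the linear predictions xb and zg and the correlation rho are made-up inputs, and Phi2 is a helper name introduced here, not a Stata or SciPy function.

```python
from scipy.stats import multivariate_normal, norm

def biprobit_probs(xb, zg, rho):
    """Joint, marginal, and conditional probabilities analogous to predict after biprobit."""
    def Phi2(a, b, r):
        # bivariate standard normal CDF with correlation r
        return multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, r], [r, 1.0]]).cdf([a, b])
    p11 = Phi2(xb, zg, rho)
    p10 = Phi2(xb, -zg, -rho)
    p01 = Phi2(-xb, zg, -rho)
    p00 = Phi2(-xb, -zg, rho)
    return {
        "p11": p11, "p10": p10, "p01": p01, "p00": p00,
        "pmarg1": norm.cdf(xb), "pmarg2": norm.cdf(zg),
        "pcond1": p11 / norm.cdf(zg), "pcond2": p11 / norm.cdf(xb),
    }

print(biprobit_probs(xb=0.3, zg=-0.2, rho=0.4))
```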

An Overview of All Artificial Intelligence Courses at Stanford University


List of related AI Classes CS229covered a broad swath of topics in machine learning,compressed into a sin-gle quarter.Machine learning is a hugely inter-disciplinary topic,and there are many other sub-communities of AI working on related topics,or working on applying machine learning to different problems.Stanford has one of the best and broadest sets of AI courses of pretty much any university.It offers a wide range of classes,covering most of the scope of AI issues.Here are some some classes in which you can learn more about topics related to CS229:AI Overview•CS221(Aut):Artificial Intelligence:Principles and Techniques.Broad overview of AI and applications,including robotics,vision,NLP,search,Bayesian networks, and learning.Taught by Professor Andrew Ng.Robotics•CS223A(Win):Robotics from the perspective of building the robot and controlling it;focus on manipulation.Taught by Professor Oussama Khatib(who builds the big robots in the Robotics Lab).•CS225A(Spr):A lab course from the same perspective,taught by Professor Khatib.•CS225B(Aut):A lab course where you get to play around with making mobile robots navigate in the real world.Taught by Dr.Kurt Konolige(SRI).•CS277(Spr):Experimental Haptics.Teaches haptics programming and touch feedback in virtual reality.Taught by Professor Ken Salisbury,who works on robot design,haptic devices/teleoperation,robotic surgery,and more.•CS326A(Latombe):Motion planning.An algorithmic robot motion planning course,by Professor Jean-Claude Latombe,who(literally)wrote the book on the topic.Knowledge Representation&Reasoning•CS222(Win):Logical knowledge representation and reasoning.Taught by Profes-sor Yoav Shoham and Professor Johan van Benthem.•CS227(Spr):Algorithmic methods such as search,CSP,planning.Taught by Dr.Yorke-Smith(SRI).Probabilistic Methods•CS228(Win):Probabilistic models in AI.Bayesian networks,hidden Markov mod-els,and planning under uncertainty.Taught by Professor Daphne Koller,who works on computational biology,Bayes nets,learning,computational game theory, and more.1Perception&Understanding•CS223B(Win):Introduction to computer vision.Algorithms for processing and interpreting image or camera information.Taught by Professor Sebastian Thrun, who led the DARPA Grand Challenge/DARPA Urban Challenge teams,or Pro-fessor Jana Kosecka,who works on vision and robotics.•CS224S(Win):Speech recognition and synthesis.Algorithms for large vocabu-lary continuous speech recognition,text-to-speech,conversational dialogue agents.Taught by Professor Dan Jurafsky,who co-authored one of the two most-used textbooks on NLP.•CS224N(Spr):Natural language processing,including parsing,part of speech tagging,information extraction from text,and more.Taught by Professor Chris Manning,who co-authored the other of the two most-used textbooks on NLP.•CS224U(Win):Natural language understanding,including computational seman-tics and pragmatics,with application to question answering,summarization,and inference.Taught by Professors Dan Jurafsky and Chris Manning.Multi-agent systems•CS224M(Win):Multi-agent systems,including game theoretic foundations,de-signing systems that induce agents to coordinate,and multi-agent learning.Taught by Professor Yoav Shoham,who works on economic models of multi-agent interac-tions.•CS227B(Spr):General game playing.Reasoning and learning methods for playing any of a broad class of games.Taught by Professor Michael Genesereth,who works on computational logic,enterprise management and e-commerce.Convex Optimization•EE364A(Win):Convex 
Optimization.Convexity,duality,convex programs,inte-rior point methods,algorithms.Taught by Professor Stephen Boyd,who works on optimization and its application to engineering problems.AI Project courses•CS294B/CS294W(Win):STAIR(STanford AI Robot)project.Project course with no lectures.By drawing from machine learning and all other areas of AI, we’ll work on the challenge problem of building a general-purpose robot that can carry out home and office chores,such as tidying up a room,fetching items,and preparing meals.Taught by Professor Andrew Ng.2。

Bayesian Inference


p(black) = 18/38 = 0.473
p(red) = 18/38 = 0.473
p(color )

What is a conditional probability?
p(black|col1) = 6/12 = 0.5
p(color | col1)

p(black|col2) = 8/12 = 0.666
p(x)
Bayesian Roulette
• We’re interested in which column will win. • p(column) is our prior. • We learn color=black.
Bayesian Roulette
• We're interested in which column will win. • p(column) is our prior. • We learn color=black. • What is p(color=black|column)?
What do you need to know to use it?
• You need to be able to express your prior beliefs about x as a probability distribution, p(x) • You must be able to relate your new evidence to your variable of interest in terms of its likelihood, p(y|x) • You must be able to multiply.
p(black|col1) = 6/12 = 0.5 p(black|col2) = 8/12 = 0.666 p(black|col3) = 4/12 = 0.333 p(black|zeros) = 0/2 = 0
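Putting the prior and the likelihoods above together, the short script below computes the posterior p(column | color = black) using the numbers from the slides (an American wheel: three columns of twelve numbers plus the two zeros, 38 slots in total).

```python
# Prior over "columns": three columns of 12 numbers each, plus the two zeros.
prior = {"col1": 12/38, "col2": 12/38, "col3": 12/38, "zeros": 2/38}
# Likelihood of seeing black given each column (from the slides).
likelihood_black = {"col1": 6/12, "col2": 8/12, "col3": 4/12, "zeros": 0/2}

joint = {col: prior[col] * likelihood_black[col] for col in prior}
p_black = sum(joint.values())                 # marginal p(black) = 18/38
posterior = {col: joint[col] / p_black for col in joint}

print(round(p_black, 3))                                   # 0.474
print({col: round(p, 3) for col, p in posterior.items()})
# After observing black, the posterior for col2 rises to 8/18 ≈ 0.444.
```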

SAR Building Recognition

• Data preprocessing consists of two steps: geometric correction and denoising. First, using the SPOT panchromatic data as the reference, the COSMO-SkyMed data and the SPOT data required for the study are co-registered. Speckle in the SAR images is removed with the FROST adaptive filter.
Detection and extraction of building from interferometric SAR data
• Because the edges extracted by the ant colony algorithm have poor continuity, this work uses chain-code features to extract straight lines, obtaining line segments through chain-code splitting.
• The chain-code tracking algorithm of Ren Mingwu and the chain-code splitting algorithm are improved so that they are suitable for straight-line extraction.
• Because of factors such as illumination and weather, image quality is often poor; combined with the poor continuity of the edges extracted by the ant colony algorithm, the extracted building edges are mostly short line segments, and many spurious lines appear around building boundaries. These lines inevitably affect the accuracy of the search for building structures, and the incompleteness and breaks in the boundaries themselves make detection even more difficult.
• In this work, we focus on the task to extract information on urban structures of interest from high-resolution IFSAR data. Specifically, we want to automate the detection of the height and shape of the buildings present in a given area. To this aim, we apply to the original data a segmentation algorithm able to exploit their resolution, while maintaining at the same time a high robustness to noise.

Bayesian Statistical Methods (Bayesian methods)

24-25 January 2007 An Overview of State-of-the-Art Data Modelling
The triplot
• A triplot gives a graphical representation of prior to posterior updating.
Prior to posterior updating
Prior → Data → Posterior. Bayes's theorem is used to update our beliefs.
Bayesian methods, priors and Gaussian processes
John Paul Gosling Department of Probability and Statistics
Overview
• The Bayesian paradigm • Bayesian data modelling • Quantifying prior beliefs • Data modelling with Gaussian processes
• So, once we have our posterior, we have captured all our beliefs about the parameter of interest. • We can use this to do informal inference, i.e. intervals, summary statistics. • Formally, to make choices about the parameter, we must couple this with decision theory to calculate the optimal decision.
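As a concrete instance of prior-to-posterior updating, here is a small conjugate normal–normal example; it is a generic illustration with made-up numbers, not the example used in the course, and a triplot would simply overlay the three densities computed here.

```python
import numpy as np

# Prior beliefs about an unknown mean: theta ~ N(prior_mean, prior_var).
prior_mean, prior_var = 0.0, 4.0
# Data: n observations with known sampling variance.
data = np.array([1.2, 0.8, 1.5, 1.1])
sampling_var = 1.0

n = len(data)
# Conjugate update: posterior precision = prior precision + data precision.
post_var = 1.0 / (1.0 / prior_var + n / sampling_var)
post_mean = post_var * (prior_mean / prior_var + data.sum() / sampling_var)

print(f"posterior: N({post_mean:.3f}, {post_var:.3f})")
# A triplot would overlay the prior, the (normalized) likelihood, and this posterior.
```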

Artificial Intelligence: Bayesian Networks

Probability theory can be expressed in terms of two simple equations:
– Sum Rule: the probability of a variable is obtained by marginalizing, i.e. summing, over the other variables.
– Product Rule: a joint probability is expressed in terms of a conditional probability.
Independence /Conditional Independence
A and B are independent iff P(A| B) = P(A) or P(B| A) = P(B) or P(A, B) = P(A) P(B)
A is conditionally independent of B given C: P(A | B, C) = P(A | C)
• In a probabilistic graphical model, each node represents a random variable (or a group of random variables), and the edges represent probabilistic relationships between the variables.
Graphical Models in CS
• A natural tool for handling uncertainty and complexity, which pervade applied mathematics and engineering.
• The most important idea in graphical models is the notion of modularity: a complex system is built by combining simpler parts.
All probabilistic inference and learning amounts to repeatedly applying the sum and product rules.
Outline
• Graphical models • Bayesian networks
– Syntax – Semantics • Inference in Bayesian networks
What is a graphical model?
A graphical representation of a probability distribution – a combination of probability theory and graph theory.
A simple, graphical data structure for representing dependencies among variables (conditional independence), providing a concise specification of any full joint probability distribution.
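A tiny numerical illustration of the sum rule, the product rule, and the independence check stated above, on a made-up joint distribution over two binary variables.

```python
# Joint distribution P(A, B) over two binary variables (made-up numbers summing to 1).
joint = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}

# Sum rule: P(A = a) = sum_b P(A = a, B = b)
p_A = {a: sum(p for (a2, _), p in joint.items() if a2 == a) for a in (0, 1)}
p_B = {b: sum(p for (_, b2), p in joint.items() if b2 == b) for b in (0, 1)}

# Product rule: P(A = a, B = b) = P(B = b | A = a) P(A = a)
p_B_given_A = {(a, b): joint[(a, b)] / p_A[a] for (a, b) in joint}

# Independence: A and B are independent iff P(A, B) = P(A) P(B) for every (a, b).
independent = all(abs(joint[(a, b)] - p_A[a] * p_B[b]) < 1e-12 for (a, b) in joint)

print(p_A, p_B)
print(p_B_given_A[(1, 1)])     # P(B = 1 | A = 1)
print("independent:", independent)
```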

An Essay on the Training-Set Accuracy of Gaussian Naive Bayes


高斯朴素贝叶斯训练集精确度的英语Gaussian Naive Bayes (GNB) is a popular machine learning algorithm used for classification tasks. It is particularly well-suited for text classification, spam filtering, and recommendation systems. However, like any other machine learning algorithm, GNB's performance heavily relies on the quality of the training data. In this essay, we will delve into the factors that affect the training set accuracy of Gaussian Naive Bayes and explore potential solutions to improve its performance.One of the key factors that influence the training set accuracy of GNB is the quality and quantity of the training data. In order for the algorithm to make accurate predictions, it needs to be trained on a diverse and representative dataset. If the training set is too small or biased, the model may not generalize well to new, unseen data. This can result in low training set accuracy and poor performance in real-world applications. Therefore, it is crucial to ensure that the training data is comprehensive and well-balanced across different classes.Another factor that can impact the training set accuracy of GNB is the presence of irrelevant or noisy features in the dataset. When the input features contain irrelevant information or noise, it can hinder the algorithm's ability to identify meaningful patterns and make accurate predictions. To address this issue, feature selection and feature engineering techniques can be employed to filter out irrelevant features and enhance the discriminative power of the model. Byselecting the most informative features and transforming them appropriately, we can improve the training set accuracy of GNB.Furthermore, the assumption of feature independence in Gaussian Naive Bayes can also affect its training set accuracy. Although the 'naive' assumption of feature independence simplifies the model and makes it computationally efficient, it may not hold true in real-world datasets where features are often correlated. When features are not independent, it can lead to biased probability estimates and suboptimal performance. To mitigate this issue, techniques such as feature extraction and dimensionality reduction can be employed to decorrelate the input features and improve the training set accuracy of GNB.In addition to the aforementioned factors, the choice of hyperparameters and model tuning can also impact the training set accuracy of GNB. Hyperparameters such as the smoothing parameter (alpha) and the covariance type in the Gaussian distribution can significantly influence the model's performance. Therefore, it is important to carefully tune these hyperparameters through cross-validation andgrid search to optimize the training set accuracy of GNB. By selecting the appropriate hyperparameters, we can ensure that the model is well-calibrated and achieves high accuracy on the training set.Despite the challenges and limitations associated with GNB, there are several strategies that can be employed to improve its training set accuracy. By curating a high-quality training dataset, performing feature selection and engineering, addressing feature independence assumptions, and tuning model hyperparameters, we can enhance the performance of GNB and achieve higher training set accuracy. Furthermore, it is important to continuously evaluate and validate the model on unseen data to ensure that it generalizes well and performs robustly in real-world scenarios. 
By addressing these factors and adopting best practices in model training and evaluation, we can maximize the training set accuracy of Gaussian Naive Bayes and unleash its full potential in various applications.
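To ground the discussion, here is a small scikit-learn sketch that measures training-set and held-out accuracy while tuning the smoothing hyperparameter mentioned above; the data are synthetic, and the "alpha-like" knob in scikit-learn's GaussianNB is called var_smoothing rather than alpha.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic data standing in for a real training set.
X, y = make_classification(n_samples=500, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Cross-validated search over the smoothing hyperparameter.
search = GridSearchCV(GaussianNB(), {"var_smoothing": [1e-12, 1e-9, 1e-6, 1e-3]}, cv=5)
search.fit(X_train, y_train)

best = search.best_estimator_
print("best var_smoothing:", search.best_params_)
print("training accuracy:", best.score(X_train, y_train))
print("held-out accuracy:", best.score(X_test, y_test))
```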

Risk Analysis of Empirical Bayes Estimators for the Parameters of Several Distributions


Huazhong University of Science and Technology — Master's Thesis — Abstract
The Bayesian school regards the unknown parameter θ as a random variable and, according to the prior information about the parameter,
Introduction
Through the efforts of Bayesian scholars such as Lindley and Berger, Bayesian methods have been continually refined in outlook, methodology, and theory, gradually forming a systematic approach to statistical inference. By the 1930s the Bayesian school had taken shape, and by the 1950s–60s it had grown into an influential school of statistics, thoroughly breaking the monopoly of classical statistics and holding up half the sky of the discipline. In recent years, Bayesian theory has been applied widely in many countries, above all in economics. The famous Bayesian scholar Lindley once predicted: "the future of statistics — a Bayesian twenty-first century" [1]. Whether or not that judgment is one-sided, Bayesian statistics has indeed developed very quickly in recent years. Leafing through domestic and international journals, especially JASA of the American Statistical Association and JRSS of the Royal Statistical Society, one finds papers on Bayesian statistics in almost every issue. It is fair to say that Bayesian statistics is a new hot spot of international statistical research.
Γ(θ, 1/2) is constructed under the squared loss function by using a kernel estimation method.
Then the asymptotic optimality of the estimator is proved and its convergence rate is obtained. The Linex loss function is then introduced. The Bayes estimate, the empirical Bayes estimate, and the maximum likelihood estimate of the parameter θ of the positive exponential family are obtained under the squared loss function and the Linex loss function, and the three estimates are compared with one another.
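For reference, the Linex loss mentioned above is usually written in the following standard form (e.g. Zellner's; this is not copied from the thesis), together with the Bayes estimators it induces under Linex and squared-error loss.

```latex
% Linex loss with shape parameter a \neq 0, for estimating \theta by \hat{\theta}:
L(\Delta) \;=\; e^{a\Delta} - a\Delta - 1, \qquad \Delta = \hat{\theta} - \theta .
% The Bayes estimator under Linex loss minimizes posterior expected loss:
\hat{\theta}_{\mathrm{Linex}} \;=\; -\frac{1}{a}\,\ln \mathbb{E}\!\left[e^{-a\theta}\mid \text{data}\right],
% whereas under squared-error loss L(\Delta)=\Delta^{2} it is the posterior mean
\hat{\theta}_{\mathrm{SEL}} \;=\; \mathbb{E}\!\left[\theta \mid \text{data}\right].
```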

Thesis — Graduation Thesis


Bayesian Estimation of the Parameters of the Erlang (Эрланга) Distribution

Abstract. In studying the repair time of weapons and equipment, Russian mathematicians introduced the Erlang distribution, which has played a positive role in research on equipment-maintenance theory.

First, in the classical-statistics setting, this thesis derives the moment estimator and the maximum likelihood estimator of the parameter of the Erlang distribution. Then, taking an exponential distribution as the prior, it studies the Bayes estimators of the Erlang distribution under the Linex, compound Linex, MLinex, and compound MLinex loss functions, and goes on to study hierarchical Bayes estimation of the parameter under the compound Linex and MLinex loss functions. Finally, a set of random numbers is generated with MATLAB software to compare the moment, maximum likelihood, and Bayes estimates under the Linex loss function, and the effect of different parameter values on the Bayes estimates of the Erlang distribution under the different loss functions is examined.

Key words: loss function; Bayes estimation; hierarchical Bayes estimation; numerical simulation; MATLAB software

Bayesian Estimation of Эрланга Distribution Parameters under Different Loss Functions

Contents
Introduction
1 Parameter estimation in classical statistics
1.1 Moment estimation of the parameter
1.2 Maximum likelihood estimation of the parameter
2 Bayes estimation under different loss functions
2.1 Bayes estimation under the Linex loss function
2.2 Bayes estimation under the compound Linex loss function
2.3 Bayes estimation under the MLinex loss function
2.4 Bayes estimation under the compound MLinex loss function
3 Estimation of the hyperparameter λ
4 Hierarchical Bayes estimation under different loss functions
4.1 Hierarchical Bayes estimation under the compound Linex loss function
4.2 Hierarchical Bayes estimation under the compound MLinex loss function
5 Example analysis and data simulation
5.1 Comparison of the estimators under the Linex loss function
5.2 Comparison of the Bayes estimates of the Erlang distribution parameter under the various loss functions
5.3 Comparison and analysis of the estimators under the compound Linex loss function
5.4 Comparison of the estimators under the MLinex loss function
Conclusion
References
Acknowledgements

Bayesian Model Comparison by Monte Carlo Chaining


Abstract
The techniques of Bayesian inference have been applied with great success to many problems in neural computing including evaluation of regression functions, determination of error bars on predictions, and the treatment of hyper-parameters. However, the problem of model comparison is a much more challenging one for which current techniques have significant limitations. In this paper we show how an extended form of Markov chain Monte Carlo, called chaining, is able to provide effective estimates of the relative probabilities of different models. We present results from the robot arm problem and compare them with the corresponding results obtained using the standard Gaussian approximation framework.
∫ exp{−E(w)} dw    (3)
Generally, it is straightforward to evaluate E(w) for a given value of w, although it is extremely difficult to evaluate the corresponding model evidence using (3) since the posterior distribution is typically very small except in narrow regions of the high-dimensional parameter space, which are unknown a-priori. Standard numerical integration techniques are therefore inapplicable. One approach is based on a local Gaussian approximation around a mode of the posterior (MacKay, 1992). Unfortunately, this approximation is expected to be accurate only when the number of data points is large in relation to the number of parameters in the model. In fact it is for relatively complex models, or problems for which data is scarce, that Bayesian methods have the most to offer. Indeed, Neal, R. M. (1996) has argued that, from a Bayesian perspective, there is no reason to limit the number of parameters in a model, other than for computational reasons. We therefore consider an approach to the evaluation of model evidence which overcomes the limitations of the Gaussian framework. For additional techniques and references to Bayesian model comparison, see Gilks et al. (1995) and Kass and Raftery (1995).
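The following toy sketch illustrates why naive numerical evaluation of (3) is hard and how a better-matched proposal helps. It is ordinary importance sampling with a made-up one-dimensional E(w), not the chaining method of the paper; all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def E(w):
    """Made-up 'error' function; exp(-E) is a narrow unnormalized posterior around w = 2."""
    return 0.5 * ((w - 2.0) / 0.1) ** 2

# Naive Monte Carlo over a broad prior-like region: most samples contribute ~0.
w_naive = rng.uniform(-10, 10, size=100_000)
z_naive = 20.0 * np.mean(np.exp(-E(w_naive)))        # interval length * mean integrand

# Importance sampling with a proposal q roughly matched to the posterior region.
mu_q, sd_q = 2.0, 0.2
w_is = rng.normal(mu_q, sd_q, size=100_000)
log_q = -0.5 * ((w_is - mu_q) / sd_q) ** 2 - np.log(sd_q * np.sqrt(2 * np.pi))
z_is = np.mean(np.exp(-E(w_is) - log_q))             # E_q[exp(-E)/q] = integral of exp(-E)

exact = 0.1 * np.sqrt(2 * np.pi)                      # closed form for this Gaussian E(w)
print("naive:", z_naive, "importance sampling:", z_is, "exact:", exact)
```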

BAYESIAN MODEL AVERAGING AND BAYESIAN

The problem of evaluating the goodness of the predictive distributions developed by the Bayesian model averaging approach is investigated. Considering the maximization of the posterior mean of the expected log-likelihood of the predictive distributions (Ando (2007a)), we develop the Bayesian predictive information criterion (BPIC). According to the numerical examples, we show that the posterior mean of the log-likelihood has a positive bias comparing with the posterior mean of the expected log-likelihood, and that the bias estimate of BPIC is close to the true bias. One of the advantages of BPIC is that we can optimize the size of Occam’s razor. Monte Carlo simulation results show that the proposed method performs well. Key words and phrases : Bayesian model averaging, Bayesian predictive information criterion, Markov chain Monte Carlo.

Computational Issues in Bayesian Statistics — Basics of Bayesian Computation


Bayesian Methods & ComputationLecture 8Basics of Bayesian ComputationDr. Ke DengCenter for statistical ScienceTsinghua University, Beijingkdeng@Challenges in Bayesian Inference v Two key computational challenges:Øcomputation of the posterior distribution p(θ|y)Øcomputation of the posterior predictive distribution p(!"|y)v Normalized and unnormalized densitiestarget distribution unnormalized densityUsually impossible to compute analytically in close formInference via Monte Carlo Simulation Remark: with samples from the posterior distribution, we can do any posterior inference we want in a numerical way!Approximation based on samplesv A distribution can be well represented by its samplesSignificance of Sampling v Numerical integrationv Optimization LLN. Convergence rate = #$%based on CLTiid samples from p(θ|y)ØPositive target function to be minimized: f(x)ØDefine Bolzeman distribution of f(x)as 123∝56789ØSampling 123 for small T is equivalent to minimizing f(x) Almost all optimization problems can be solved by samplingSampling a Univariate Distributionv Creating uniformly distributed (pseudo) random numbers ØRipley (1987)Ølinear congruential generator (LCG)v Simulating from standard univariate distributions ØGentle (2003)v Sample a general univariate distributionØInverse transform samplingØRejection samplingØImportance samplingInverse Transform Sampling Target distribution: F(x)cdfRandom variable: U~ Unif[0,1]:=<6$= ~F(x)Advantage: in principle can be used to simulate any univariate distribution Disadvantage: it’s often expensive to calculate <6$=Rejection SamplingTwo steps of rejection sampling:Target distribution: f(x)Envelope function: g(x)s.t.Mg(x) ≥f(x)(unnormalized) pdfLogic behind rejection samplingAdvantage: easy to implementDisadvantage: sometimes it’s difficult to find a good envelope function, and theacceptance rate can be too low.Assume ℎE ⊥D EAssume M F ℎE ≈M I F(E)I(E)=1Reduce Weight Variance byResampling •Importance resampling•Rejection control samplingChallenges in Sampling a High-Dimentional DistributionHow to sample a high-dimentional distribution?A(3$,3^,⋯,3_)There is a easy solution for independent case:A3$,3^,⋯,3_=A3$⋯A(3_)For non-independent, non-Gaussian case,there is NO easy solution!A toy example of transition kernel/matrixMarkov Chain Monte Carlo Target distribution:A(3⃗)=A(3$,3^,⋯,3_)Basic idea:construct a Markov transition kernel `(3⃗,!⃗)on the state space Swhose stationary distribution 13⃗=A(3⃗)Procedure:ØStart from an arbitrary location 3⃗b in SØTravel in S with the guidance of T, i.e., let 3⃗cd$∼`(3⃗c,⋅)ØWhen t is large enough, the chain converges to 13⃗Advantage: can be used to sample any distribution (even high-dimensional ones) Disadvantage: may take a long time to convergeKey Challenge: how to construct a proper Markov transition kernel `(3⃗,!⃗)Example: Bivariate Normal Joint distribution:Conditional distribution:Four independent sequences of the Gibbs sampler for a bivariateMetropolis-Hastings AlgorithmA(3⃗)=A(3$,3^,⋯,3_)Target distribution:`3⃗,!⃗=3⃗+n(0,Σ)Proposal distribution at one step or local transition kernel:a local random walkProcedure:ØStart from an arbitrary location 3⃗b in SØTravel in S with the guidance of T , i.e., let 3⃗cd$∼`(3⃗c ,⋅)ØCalculate i =F(p)2p,E ⃗F(E ⃗)2E ⃗,p , which becomes i =F(p)F(E ⃗) if `!⃗,3⃗ =`3⃗,!⃗ØDraw u~U[0,1]Move to new place !⃗if u<r , otherwise stay at 3⃗ØRepeat this for many time until convergeAdvantage: can be used to sample any distribution (even high-dimensional ones)Disadvantage: may take a long time to 
convergeKey Challenge: converge rate highly depends on the transition kernel `(3⃗,!⃗)Example: Bivariate Unit NormalTrace plots of 5 independent chains based on random walk MetropolisInference & Accessing Convergencebetween-sequence & within-sequencevarianceANOV A mindependent chain with equal lengthgood bad badbadReference•Gelman, A., Carlin, J.B., Stern, H.S. and Rubin, D.B. (2003). Bayesian Data Analysis (3rd ed), Chapman & Hall: London. (Textbook) –Chapter 10 & 11•Liu, J.S. (2001). Monte Carlo Strategies in Scientific Computing, Springer-Verlag: New York. Chapter 1-2, 5-621。
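Below is a compact random-walk Metropolis sketch for a bivariate-normal target like the one discussed in the lecture. It is a generic implementation of the standard algorithm, not the course's code; the correlation, step size, starting point, and burn-in length are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.8
inv_cov = np.linalg.inv(np.array([[1.0, rho], [rho, 1.0]]))

def log_target(x):
    """Unnormalized log density of a bivariate normal with correlation rho."""
    return -0.5 * x @ inv_cov @ x

def metropolis(n_iter=20_000, step=0.5, start=(4.0, -4.0)):
    x = np.array(start)
    samples, accepted = [], 0
    for _ in range(n_iter):
        proposal = x + rng.normal(scale=step, size=2)        # symmetric random-walk proposal
        # Accept with probability min(1, pi(proposal)/pi(x)).
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x, accepted = proposal, accepted + 1
        samples.append(x.copy())
    return np.array(samples), accepted / n_iter

samples, acc_rate = metropolis()
burned = samples[5000:]                    # discard burn-in before summarizing
print("acceptance rate:", round(acc_rate, 2))
print("posterior means:", burned.mean(axis=0))
print("sample correlation:", np.corrcoef(burned.T)[0, 1])
```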

An Introduction to Bayesian Statistics in Stata: Basic Concepts


An introduction to Bayesian statistics in Stata: basic concepts. This is a non-technical introduction to Bayesian statistics.

Bayesian methods are becoming more and more popular, and in Stata you can fit Bayesian models with the bayesmh command.

This article briefly introduces the concepts and terminology of Bayesian statistics, as well as the bayesmh syntax.

Introducing Bayesian statistics through an example: most of us learned frequentist statistics, in which a parameter is treated as fixed even though its value is unknown.

We can estimate the parameter from a sample of the population, but different samples yield different estimates.

The distribution of these different estimates is called the sampling distribution, and it quantifies the uncertainty of our estimate.

But the parameter itself is still considered fixed.

Bayesian statistics is a different way of thinking about statistics.

Parameters are treated as random variables and can be described by probability distributions.

We do not even need data to describe a parameter's distribution; probability is simply a degree of belief.

Let us work through a coin-tossing example to examine our intuition.

I will call the two sides of the coin "heads" and "tails"; if I toss the coin into the air, it must land with either heads or tails facing up.

I use θ to denote the probability that the coin lands heads up.

Prior distribution. The first step of the Bayesian example is to define a prior distribution for θ.

The prior distribution is a mathematical expression of our beliefs about the parameter's distribution.

The prior can be set from our experience or assumptions about the parameter, or even a simple guess.

I can express my belief with a uniform distribution: the probability of heads can be any number between 0 and 1.

Figure 1 shows a Beta distribution with parameters 1 and 1, which is equivalent to a uniform distribution on the interval from 0 to 1.

Figure 1: Uninformative Beta(1,1) prior. The Beta(1,1) distribution is called an uninformative prior because all values of the parameter have equal probability.

Intuitively, the probability of heads should be close to 0.5, and by increasing the parameters of my Beta distribution I can express this belief mathematically.

Figure 2 shows a Beta distribution with parameters 30 and 30.

Figure 2: Informative Beta(30,30) prior. This is called an informative prior because not all parameter values have equal probability.

Likelihood function. The second step is to collect data and define a likelihood function.

For example, I toss the coin 10 times, and 4 of the tosses come up heads.
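Continuing the example with a quick calculation: combining the Beta(30,30) prior from Figure 2 with the 4-heads-in-10-tosses data gives a Beta posterior by conjugacy. The sketch below uses plain Python/SciPy rather than Stata's bayesmh, purely to show the arithmetic.

```python
from scipy.stats import beta

# Informative prior from Figure 2 and the observed data: 4 heads in 10 tosses.
a_prior, b_prior = 30, 30
heads, tosses = 4, 10

# Beta prior + binomial likelihood => Beta posterior (conjugacy).
a_post = a_prior + heads
b_post = b_prior + (tosses - heads)

posterior = beta(a_post, b_post)
print("posterior: Beta(%d, %d)" % (a_post, b_post))
print("posterior mean of theta:", round(posterior.mean(), 3))          # 34/70 ≈ 0.486
print("95% credible interval:", [round(q, 3) for q in posterior.interval(0.95)])
```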

User Guide for the Bayesian Kernel Machine Regression Toolkit (bkmrhat)


Package‘bkmrhat’October12,2022Title Parallel Chain Tools for Bayesian Kernel Machine RegressionVersion1.1.3Date2022-03-29Author Alexander Keil[aut,cre]Maintainer Alexander Keil<*************>Description Bayesian kernel machine regression(from the'bkmr'package)is a Bayesian semi-parametric generalized linear model approach underidentity and probit links.There are a number of functions in thispackage that extend Bayesian kernel machine regressionfits to allowmultiple-chain inference and diagnostics,which leverage functionsfrom the'future','rstan',and'coda'packages.Reference:Bobb,J.F.,Henn,B.C.,Valeri,L.,&Coull,B.A.(2018).Statisticalsoftware for analyzing the health effects of multiple concurrentexposures via Bayesian kernel machine regression.;<doi:10.1186/s12940-018-0413-y>.License GPL(>=3)Depends coda,R(>=3.5.0)Imports bkmr,data.table,future,rstanSuggests knitr,markdownVignetteBuilder knitrEncoding UTF-8Language en-USRoxygenNote7.1.2NeedsCompilation noRepository CRANDate/Publication2022-03-2908:50:05UTCR topics documented:as.mcmc.bkmrfit (2)12as.mcmc.bkmrfit as.mcmc.list.bkmrfit.list (3)ExtractPIPs_parallel (4)kmbayes_combine (5)kmbayes_combine_lowmem (6)kmbayes_continue (8)kmbayes_diagnose (9)kmbayes_parallel (10)kmbayes_parallel_continue (11)OverallRiskSummaries_parallel (12)predict.bkmrfit (12)PredictorResponseBivar_parallel (13)PredictorResponseUnivar_parallel (14)SamplePred_parallel (14)SingVarRiskSummaries_parallel (15)Index16 as.mcmc.bkmrfit Convert bkmrfit to mcmc object for coda MCMC diagnosticsDescriptionConverts a kmrfit(from the bkmr package)into an mcmc object from the coda package.The coda package enables many different types of single chain MCMC diagnostics,including geweke.diag, traceplot and effectiveSize.Posterior summarization is also available,such as HPDinterval and summary.mcmc.Usage##S3method for class bkmrfitas.mcmc(x,iterstart=1,thin=1,...)Argumentsx object of type kmrfit(from bkmr package)iterstartfirst iteration to use(e.g.for implementing burnin)thin keep1/thin%of the total iterations(at regular intervals)...unusedValueAn mcmc objectas.mcmc.list.bkmrfit.list3 Examples#following example from https://jenfb.github.io/bkmr/overview.htmlset.seed(111)library(coda)library(bkmr)dat<-bkmr::SimData(n=50,M=4)y<-dat$yZ<-dat$ZX<-dat$Xset.seed(111)fitkm<-kmbayes(y=y,Z=Z,X=X,iter=500,verbose=FALSE,varsel=FALSE)mcmcobj<-as.mcmc(fitkm,iterstart=251)summary(mcmcobj)#posterior summaries of model parameters#compare with default from bkmr package,which omits first1/2of chainsummary(fitkm)#note this only works on multiple chains(see kmbayes_parallel)#gelman.diag(mcmcobj)#lots of functions in the coda package to usetraceplot(mcmcobj)#will also fail with delta functions(when using variable selection)try(geweke.plot(mcmcobj))as.mcmc.list.bkmrfit.listConvert multi-chain bkmrfit to mcmc.list for coda MCMC diagnosticsDescriptionConverts a kmrfit.list(from the bkmrhat package)into an mcmc.list object from the coda pack-age.The coda package enables many different types of MCMC diagnostics,including geweke.diag, traceplot and effectiveSize.Posterior summarization is also available,such as HPDinterval and ing multiple chains is necessary for certain MCMC diagnostics,such as gelman.diag,and gelman.plot.Usage##S3method for class list.bkmrfit.listas.mcmc(x,...)Argumentsx object of type kmrfit.list(from bkmrhat package)...arguments to as.mcmc.bkmrfit4ExtractPIPs_parallelValueAn mcmc.list objectExamples#following example from 
https://jenfb.github.io/bkmr/overview.htmlset.seed(111)library(coda)dat<-bkmr::SimData(n=50,M=4)y<-dat$yZ<-dat$ZX<-dat$Xset.seed(111)future::plan(strategy=future::multisession,workers=2)#run2parallel Markov chains(more usually better)fitkm.list<-kmbayes_parallel(nchains=2,y=y,Z=Z,X=X,iter=1000,verbose=FALSE,varsel=FALSE)mcmcobj=as.mcmc.list(fitkm.list)summary(mcmcobj)#Gelman/Rubin diagnostics won t work on certain objects,#like delta parameters(when using variable selection),#so the rstan version of this will work better(does not give errors)try(gelman.diag(mcmcobj))#lots of functions in the coda package to useplot(mcmcobj)#both of these will also fail with delta functions(when using variable selection) try(gelman.plot(mcmcobj))try(geweke.plot(mcmcobj))closeAllConnections()ExtractPIPs_parallel Posterior inclusion probabilities by chainDescriptionPosterior inclusion probabilities by chainUsageExtractPIPs_parallel(x,...)Argumentsx bkmrfit.list object from kmbayes_parallel...arguments to ExtractPIPskmbayes_combine5Valuedata.frame with all chains togetherkmbayes_combine Combine multiple BKMR chainsDescriptionCombine multiple chains comprising BKMRfits at different starting values.Usagekmbayes_combine(fitkm.list,burnin=NULL,excludeburnin=FALSE,reorder=TRUE)comb_bkmrfits(fitkm.list,burnin=NULL,excludeburnin=FALSE,reorder=TRUE) Argumentsfitkm.list output from kmbayes_parallelburnin(numeric,or default=NULL)add in custom burnin(number of burnin iterations per chain).If NULL,then default to half of the chainexcludeburnin(logical,default=FALSE)should burnin iterations be excluded from thefinal chains?Note that all bkmr package functions automatically exclude burnin fromcalculations.reorder(logical,default=TRUE)ensures that thefirst half of the combined chain con-tains only thefirst half of each individual chain-this allows unaltered use ofstandard functions from bkmr package,which automatically trims thefirst halfof the iterations.This can be used for posterior summaries,but certain diag-nostics may not work well(autocorrelation,effective sample size)so the diag-nostics should be done on the individual chains#’@param...arguments toas.mcmc.bkmrfitDetailsChains are not combined fully sequentiallyValuea bkmrplusfit object,which inherits from bkmrfit(from the kmbayes function)with multiplechains combined into a single object and additional parameters given by chain and iters,which index the specific chains and iterations for each posterior sample in the bkmrplusfit objectExamples#following example from https://jenfb.github.io/bkmr/overview.htmlset.seed(111)library(bkmr)dat<-bkmr::SimData(n=50,M=4)y<-dat$yZ<-dat$ZX<-dat$Xset.seed(111)future::plan(strategy=future::multisession,workers=2)#run4parallel Markov chains(low iterations used for illustration)fitkm.list<-kmbayes_parallel(nchains=2,y=y,Z=Z,X=X,iter=500,verbose=FALSE,varsel=TRUE)#use bkmr defaults for burnin,but keep thembigkm=kmbayes_combine(fitkm.list,excludeburnin=FALSE)ests=ExtractEsts(bigkm)#defaults to keeping second half of samplesExtractPIPs(bigkm)pred.resp.univar<-PredictorResponseUnivar(fit=bigkm)risks.overall<-OverallRiskSummaries(fit=bigkm,y=y,Z=Z,X=X,qs=seq(0.25,0.75,by=0.05),q.fixed=0.5,method="exact")#additional objects that are not in a standard bkmrfit object:summary(bigkm$iters)#note that this reflects how fits are re-ordered to reflect burnin table(bigkm$chain)closeAllConnections()kmbayes_combine_lowmemCombine multiple BKMR chains in lower memory settingsDescriptionCombine multiple chains comprising BKMRfits at different starting 
values.This function writes some results to disk,rather than trying to process fully within memory which,in some cases,will result in avoiding"out of memory"errors that can happen with kmbayes_combine.Usagekmbayes_combine_lowmem(fitkm.list,burnin=NULL,excludeburnin=FALSE,reorder=TRUE)comb_bkmrfits_lowmem(fitkm.list,burnin=NULL,excludeburnin=FALSE,reorder=TRUE)Argumentsfitkm.list output from kmbayes_parallelburnin(numeric,or default=NULL)add in custom burnin(number of burnin iterations per chain).If NULL,then default to half of the chainexcludeburnin(logical,default=FALSE)should burnin iterations be excluded from thefinal chains?Note that all bkmr package functions automatically exclude burnin fromcalculations.reorder(logical,default=TRUE)ensures that thefirst half of the combined chain con-tains only thefirst half of each individual chain-this allows unaltered use ofstandard functions from bkmr package,which automatically trims thefirst halfof the iterations.This can be used for posterior summaries,but certain diag-nostics may not work well(autocorrelation,effective sample size)so the diag-nostics should be done on the individual chains#’@param...arguments toas.mcmc.bkmrfitDetailsChains are not combined fully sequentially(see"reorder")Valuea bkmrplusfit object,which inherits from bkmrfit(from the kmbayes function)with multiplechains combined into a single object and additional parameters given by chain and iters,which index the specific chains and iterations for each posterior sample in the bkmrplusfit objectExamples#following example from https://jenfb.github.io/bkmr/overview.htmlset.seed(111)library(bkmr)dat<-bkmr::SimData(n=50,M=4)y<-dat$yZ<-dat$ZX<-dat$Xset.seed(111)future::plan(strategy=future::multisession,workers=2)#run4parallel Markov chains(low iterations used for illustration)fitkm.list<-kmbayes_parallel(nchains=2,y=y,Z=Z,X=X,iter=500,verbose=FALSE,varsel=TRUE)8kmbayes_continue #use bkmr defaults for burnin,but keep thembigkm=kmbayes_combine_lowmem(fitkm.list,excludeburnin=FALSE)ests=ExtractEsts(bigkm)#defaults to keeping second half of samplesExtractPIPs(bigkm)pred.resp.univar<-PredictorResponseUnivar(fit=bigkm)risks.overall<-OverallRiskSummaries(fit=bigkm,y=y,Z=Z,X=X,qs=seq(0.25,0.75,by=0.05),q.fixed=0.5,method="exact")#additional objects that are not in a standard bkmrfit object:summary(bigkm$iters)#note that this reflects how fits are re-ordered to reflect burnin table(bigkm$chain)closeAllConnections()kmbayes_continue Continue sampling from existing bkmrfitDescriptionUse this when you’ve used MCMC sampling with the kmbayes function,but you did not take enough samples and do not want to start over.Usagekmbayes_continue(fit,...)Argumentsfit output from kmbayes...arguments to kmbayes_continueDetailsNote this does not fully start from the prior values of the MCMC chains.The kmbayes function does not allow full specification of the kernel function parameters,so this will restart the chain at the last values of allfixed effect parameters,and start the kernel r parmeters at the arithmetic mean of all r parameters from the last step in the previous chain.Valuea bkmrfit.continued object,which inherits from bkmrfit objects similar to kmbayes output,andwhich can be used to make inference using functions from the bkmr package.See Alsokmbayes_parallelkmbayes_diagnose9Examplesset.seed(111)dat<-bkmr::SimData(n=50,M=4)y<-dat$yZ<-dat$ZX<-dat$X##Not run:fitty1=bkmr::kmbayes(y=y,Z=Z,X=X,est.h=TRUE,iter=100)#do some diagnostics here to see if100iterations(default)is enough#add100additional 
iterations(for illustration-still will not be enough)fitty2=kmbayes_continue(fitty1,iter=100)cobj=as.mcmc(fitty2)varnames(cobj)##End(Not run)kmbayes_diagnose MCMC diagnostics using rstanDescriptionGive MCMC diagnostistics from the rstan package using the Rhat,ess_bulk,and ess_tail functions.Note that r-hat is only reported for bkmrfit.list objects from kmbayes_parallelUsagekmbayes_diagnose(kmobj,...)kmbayes_diag(kmobj,...)Argumentskmobj Either an object from kmbayes or from kmbayes_parallel...arguments to monitorExamplesset.seed(111)dat<-bkmr::SimData(n=50,M=4)y<-dat$yZ<-dat$ZX<-dat$Xset.seed(111)future::plan(strategy=future::multisession)fitkm.list<-kmbayes_parallel(nchains=2,y=y,Z=Z,X=X,iter=1000,10kmbayes_parallelverbose=FALSE,varsel=TRUE)kmbayes_diag(fitkm.list)kmbayes_diag(fitkm.list[[1]])#just the first chaincloseAllConnections()kmbayes_parallel Run multiple BKMR chains in parallelDescriptionFit parallel chains from the kmbayes function.These chains leverage parallel processing from the future package,which can speedfitting and enable diagnostics that rely on multiple Markov chains from dispersed initial values.Usagekmbayes_parallel(nchains=4,...)Argumentsnchains number of parallel chains...arguments to kmbayesValuea"bkmrfit.list"object,which is just an R list object in which each entry is a"bkmrfit"object kmbayesExamplesset.seed(111)dat<-bkmr::SimData(n=50,M=4)y<-dat$yZ<-dat$ZX<-dat$Xset.seed(111)future::plan(strategy=future::multisession,workers=2)#only50iterations fit to save installation timefitkm.list<-kmbayes_parallel(nchains=2,y=y,Z=Z,X=X,iter=50,verbose=FALSE,varsel=TRUE)closeAllConnections()kmbayes_parallel_continue11 kmbayes_parallel_continueContinue sampling from existing bkmr_parallelfitDescriptionUse this when you’ve used MCMC sampling with the kmbayes_parallel function,but you did not take enough samples and do not want to start over.Usagekmbayes_parallel_continue(fitkm.list,...)Argumentsfitkm.list output from kmbayes_parallel...arguments to kmbayes_continueValuea bkmrfit.list object,which is just a list of bkmrfit objects similar to kmbayes_parallelSee Alsokmbayes_parallelExamplesset.seed(111)dat<-bkmr::SimData(n=50,M=4)y<-dat$yZ<-dat$ZX<-dat$X##Not run:future::plan(strategy=future::multisession,workers=2)fitty1p=kmbayes_parallel(nchains=2,y=y,Z=Z,X=X)fitty2p=kmbayes_parallel_continue(fitty1p,iter=3000)cobj=as.mcmc.list(fitty2p)plot(cobj)##End(Not run)12predict.bkmrfit OverallRiskSummaries_parallelOverall summary by chainDescriptionOverall summary by chainUsageOverallRiskSummaries_parallel(x,...)Argumentsx bkmrfit.list object from kmbayes_parallel...arguments to OverallRiskSummariesValuedata.frame with all chains togetherpredict.bkmrfit Posterior mean/sd predictionsDescriptionProvides observation level predictions based on the posterior mean,or,alternatively,yields the pos-terior standard deviations of predictions for an observation.This function is useful for interfacing with ensemble machine learning packages such as SuperLearner,which utilize only point esti-mates.Usage##S3method for class bkmrfitpredict(object,ptype=c("mean","sd.fit"),...)Argumentsobjectfitted object of class inheriting from"bkmrfit".ptype"mean"or"sd.fit",where"mean"yields posterior mean prediction for every observation in the data,and"sd.fit"yields the posterior standard deviation forevery observation in the data....arguments to SamplePredPredictorResponseBivar_parallel13 Valuevector of predictions the same length as the outcome in the bkmrfit objectExamples#following example from 
predict.bkmrfit          Posterior mean/sd predictions

Description

Provides observation level predictions based on the posterior mean, or, alternatively, yields the posterior standard deviations of predictions for an observation. This function is useful for interfacing with ensemble machine learning packages such as SuperLearner, which utilize only point estimates.

Usage

## S3 method for class 'bkmrfit'
predict(object, ptype = c("mean", "sd.fit"), ...)

Arguments

object    fitted object of class inheriting from "bkmrfit".
ptype     "mean" or "sd.fit", where "mean" yields posterior mean prediction for every observation in the data, and "sd.fit" yields the posterior standard deviation for every observation in the data.
...       arguments to SamplePred

Value

vector of predictions the same length as the outcome in the bkmrfit object

Examples

# following example from https://jenfb.github.io/bkmr/overview.html
library(bkmr)
set.seed(111)
dat <- bkmr::SimData(n = 50, M = 4)
y <- dat$y
Z <- dat$Z
X <- dat$X

set.seed(111)
fitkm <- kmbayes(y = y, Z = Z, X = X, iter = 200, verbose = FALSE, varsel = TRUE)
postmean <- predict(fitkm)
postmean2 <- predict(fitkm, Znew = Z / 2)
# mean difference in posterior means
mean(postmean - postmean2)

PredictorResponseBivar_parallel
                         Bivariate predictor response by chain

Description

Bivariate predictor response by chain

Usage

PredictorResponseBivar_parallel(x, ...)

Arguments

x      bkmrfit.list object from kmbayes_parallel
...    arguments to PredictorResponseBivar

Value

data.frame with all chains together

PredictorResponseUnivar_parallel
                         Univariate predictor response summary by chain

Description

Univariate predictor response summary by chain

Usage

PredictorResponseUnivar_parallel(x, ...)

Arguments

x      bkmrfit.list object from kmbayes_parallel
...    arguments to PredictorResponseUnivar

Value

data.frame with all chains together

SamplePred_parallel      Posterior samples of E(Y|h(Z), X, beta) by chain

Description

Posterior samples of E(Y|h(Z), X, beta) by chain

Usage

SamplePred_parallel(x, ...)

Arguments

x      bkmrfit.list object from kmbayes_parallel
...    arguments to SamplePred

Value

data.frame with all chains together

SingVarRiskSummaries_parallel
                         Single variable summary by chain

Description

Single variable summary by chain

Usage

SingVarRiskSummaries_parallel(x, ...)

Arguments

x      bkmrfit.list object from kmbayes_parallel
...    arguments to SingVarRiskSummaries

Value

data.frame with all chains together
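Because each of these *_parallel summary functions returns a data.frame with all chains together, per-chain agreement can be checked before pooling. The sketch below assumes fitkm.list is a bkmrfit.list from kmbayes_parallel and that the returned data.frame carries chain and est columns; those column names are an assumption about the output layout, not something stated above.

# Per-chain overall risk summaries; arguments after x are passed through to
# OverallRiskSummaries, mirroring the earlier combined-chain example.
risks_by_chain <- OverallRiskSummaries_parallel(fitkm.list,
                                                y = y, Z = Z, X = X,
                                                qs = seq(0.25, 0.75, by = 0.05),
                                                q.fixed = 0.5)
# If the per-chain estimates disagree markedly, the chains probably need more
# iterations (see kmbayes_parallel_continue). Column names are assumed here.
by(risks_by_chain$est, risks_by_chain$chain, summary)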


Bayesian Summarization at DUC and a Suggestion for Extrinsic Evaluation

Hal Daumé III and Daniel Marcu
Information Sciences Institute
4676 Admiralty Way, Suite 1001
Marina del Rey, CA 90292
{hdaume,marcu}@

Abstract

We describe our entry into the Document Understanding Conference competition for evaluating query-focused multi-document summarization systems. Our system is based on a Bayesian Query-Focused Summarization model, similar to the system we entered into the MSE competition. This paper begins by describing the (few) differences between our DUC system and our MSE system and describes our placement in the competition. The remainder of this paper argues in favor of performing extrinsic evaluation of summarization systems, and suggests a method for doing so.

1 Our DUC System

The system we entered into DUC this year is nearly identical to the system we entered into MSE a few months ago. Please refer to (Daumé III and Marcu, 2005) for more details on the system. In brief, the system is based on a Bayesian Query-Focused Summarization (BQFS) model (manuscript, unpublished). This model can be considered a black box that takes as input a collection of queries, a collection of documents, and links that connect queries to relevant documents (relevance judgments). As output, the system will produce scores for each sentence in each document for its respective query. These scores are generalized distances, so that a sentence achieving a score of zero is best, and all other scores are strictly positive. Using these scores, we can easily rank sentences for extraction.

Once sentences have been scored by the BQFS method, we use the same discriminative training technique described in our MSE paper to perform the actual sentence selection. This aspect of the model learns weights for the following sentence-level features: similarity to previously extracted sentences via MMR (Carbonell and Goldstein, 1998), score according to the BQFS model, position score, a binary feature that detects quotes, the sentence length, the document length, the document similarity to the mean document, KL divergence between the sentence and the centroid document, the number of pronouns in this sentence, and the number of attribution verbs in this sentence. We use binary search on feature weights, one feature at a time, with ROUGE-BE as the objective function. Finally, like our MSE system, we perform weak sentence compression, by dropping constituents of various labels. Unlike MSE, this turned out to not be so useful for optimizing ROUGE-BE performance on development data. We believe this is because MSE aimed for 100-word summaries, while DUC aimed for 250-word summaries; we believe the former case to be more interesting.
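To make the selection step concrete, the sketch below shows, in rough form, how a greedy extractor of this kind can be organized: each candidate sentence receives a weighted combination of feature values, and an MMR-style penalty discourages sentences similar to those already extracted. The feature set, weights, similarity function and scoring direction here are illustrative placeholders, not the ones learned by our system.

# Illustrative greedy query-focused extractor (a sketch, not the actual DUC system).
# sentences: character vector of candidate sentences
# feats:     data.frame of per-sentence feature values (columns named as in w)
# w:         named numeric vector of feature weights (higher score = better here)
# sim:       function returning a similarity between two sentences
greedy_extract <- function(sentences, feats, w, sim, max_words = 250, lambda = 0.5) {
  chosen <- integer(0)
  words_used <- 0
  base_score <- as.numeric(as.matrix(feats[, names(w), drop = FALSE]) %*% w)
  while (words_used < max_words) {
    remaining <- setdiff(seq_along(sentences), chosen)
    if (length(remaining) == 0) break
    # MMR-style redundancy penalty: max similarity to any already-chosen sentence.
    penalty <- sapply(remaining, function(i) {
      if (length(chosen) == 0) return(0)
      max(sapply(chosen, function(j) sim(sentences[i], sentences[j])))
    })
    mmr <- base_score[remaining] - lambda * penalty
    best <- remaining[which.max(mmr)]
    chosen <- c(chosen, best)
    words_used <- words_used + length(strsplit(sentences[best], "\\s+")[[1]])
  }
  sentences[chosen]
}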
2 DUC Results

We briefly describe three sets of the official DUC results. In Figure 1, we have graphed the official pyramid scores, both precision-based and recall-based; we have also presented the aggregated f-score. We have not included the human summaries in this graph, though they score better than any other system.

             Q1    Q2    Q3    Q4    Q5
Best Human   4.97  5.00  4.34  4.14  4.00
Med Human    4.91  4.91  3.92  2.98  2.10
Our System   4.04  3.08

Table 1: Quality assessments for the DUC quality questions for: the best human (for each question), the best system (for each question), the median human, the median system and our system.

1 Thanks to Liang Zhou for aggregating these results.
2 Thanks to Guy Lapalme for aggregating these results.

The questions are as follows:

Q1. Grammaticality: no datelines, system-internal formatting, capitalization errors or ungrammatical sentences.
Q2. Non-redundancy: no unnecessary repetition in the form of whole sentences, facts or noun phrases.
Q3. Referential clarity: no dangling references.
Q4. Focus: no sentences containing irrelevant information.
Q5. Structure and coherence: no disconnected facts; summary should build into a coherent body of information.

As we can see from Table 1, our system falls significantly below the median for grammaticality (Q1), and slightly below for non-redundancy (Q2), focus (Q4) and coherence (Q5). We are slightly above the median for referential clarity (Q3). Our relatively poor grammaticality performance is almost certainly due to the simplistic sentence compression we employ. Our slightly better performance on referential clarity most likely has to do with the fact that we explicitly include a feature in our model for looking at the presence of pronouns, and we have observed that after training the model, this feature receives a negative weight (sentences with many pronouns are dispreferred).

Another observation one can draw from Table 1 is that the best system (and the median system) suffer most on Q3 and Q5: referential clarity and structure and coherence (the latter being the worst). This suggests that further research into these areas is needed for all systems, not just for our own.

4 Toward Extrinsic Evaluation

Evaluation of summarization systems is difficult. As a community, DUC has, over the past several years, employed a variety of manual and automatic measures of summary quality, including ROUGE (Lin and Hovy, 2003), SEE (Lin, 2001) and pyramid (Nenkova and Passonneau, 2004).

Figure 1: Official DUC pyramid results; we are system 10.

Figure 2: Official DUC human evaluation results; we are system 10.

Additionally, we have this year employed the linguistic quality questions (derived from SEE) and a responsiveness question (related to the fact that this year's summaries are query-focused). With the exception of ROUGE, all of these metrics are manually computed and are increasingly time consuming and expensive (pyramid, especially so). In this section, we argue in favor of moving away from these intrinsic evaluation metrics and toward extrinsic evaluation metrics. We will not discuss automatic metrics at all.

4.1 Why Extrinsic?

When one wishes to solve a real-world task, the world hands to us an evaluation metric. For instance, in information retrieval, a reasonable evaluation metric is: does the system improve my ability to find information quickly. Unfortunately, this metric is often too difficult or too time-consuming or too expensive to measure. When this is the case, one often seeks a surrogate measure that is expected to correlate well with the true metric. Again, in IR, measures such as mean average precision, or mean reciprocal rank, have been used to evaluate IR systems without having to set up large-scale experiments.
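Both of these surrogate measures are simple to compute from a ranked list. The small sketch below (purely illustrative; it is not part of our system or of the DUC evaluation) computes reciprocal rank and average precision from a vector of 0/1 relevance labels given in rank order.

# rel: 0/1 vector of relevance labels in ranked order (rank 1 first).
reciprocal_rank <- function(rel) {
  first_hit <- which(rel == 1)[1]
  if (is.na(first_hit)) 0 else 1 / first_hit
}

average_precision <- function(rel) {
  hits <- which(rel == 1)
  if (length(hits) == 0) return(0)
  mean(cumsum(rel)[hits] / hits)   # precision at each relevant rank, averaged
}

# Toy ranked list: relevant documents at ranks 2 and 5.
rel <- c(0, 1, 0, 0, 1)
reciprocal_rank(rel)    # 0.5
average_precision(rel)  # mean(1/2, 2/5) = 0.45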
Unfortunately,our current methods for evaluatingFigure3:Official DUC automatic ROUGE results;we are system10.summarization systems intrinsically are time con-suming and error prone.Given that we are already expending a huge amount of time doing system eval-uation,wefind it interesting to consider whether we might not be better off performing an extrin-sic evaluation,instead.This presents its own prob-lems,most importantly,how to select a task.We will shortly present one possible task,which we be-lieve to be both realistic and easy to perform,but it is likely that having a small collection of maximally orthogonal tasks would be worthwhile.Moreover, it is important to keep in mind that when perform-ing an intrinsic evaluation,as we have been doing, we are implicitly assuming that better performance on the intrinsic evaluation will lead to better perfor-mance on a task.However,at this point,we wonder what sort of task the current evaluation systems mea-sure:Do high pyramid scores imply good perfor-mance for web search?For law search?For product comparison?4.2Proposal:Relevance PredictionWe propose the use of summaries in the context of a search application.This is inherently a query-focused single document task.The idea behind the relevance-prediction evaluation scheme is that a query-focused summary of a document should give sufficient information about the document to be able to judge relevance,without reading the full docu-ment.The task is executed as follows:a large collection of documents is assembled,and a number of queries (e.g.,20)are assembled similarly to those for TREC (we ensure that there are some relevant documents in the set,but not too many).A baseline IR sys-tem is run against the collection and documents are judged by humans for relevance.For each query, a number of relevant documents and a number of non-relevant documents are set aside(perhaps10-20of each).A summarization system is then given the query and the documents and asked to produce a single-document summary of each(both for the rel-evant and the non-relevant documents).A human, different from the one used for the original docu-ment judgments,now reads each of the queries and summaries and based on only this information,at-tempts to judge whether the document is or is not relevant.A system that enables the judge to cor-rectly predict relevance is better than one that does not.This evaluation metric has several good points. First and most importantly,improved performance in this task clearly has applications to several real world tasks.Second,the more expensive annotation—the original relevance annotations—has already been done by NIST for the TREC evalu-ations;reading the summaries and judging relevance would be a reasonably efficient process,and not un-like the task we all perform daily when interacting with a search engine.Anyone could do such an eval-uation;we would not need specially trained people (aside from being able to understand the queries). 
Finally, since the evaluation is based on the output of a real IR system, it is unlikely that simple techniques, such as pulling out the top ranked words, will perform well (because most IR engines use something like tf-idf for ranking, so the tf-idf scores of most top rated documents will be similar).

A potential disadvantage to this approach is that it might be advantageous for a system to attempt to preclassify each document as relevant or non-relevant, and build a bogus (or empty) summary for documents it deems irrelevant. In this case, the evaluation will measure something other than summary quality. We do not believe this to be a significant issue, so long as the system that performs the initial retrieval is state-of-the-art. In that case, improvement in classification performance beyond that done implicitly by the IR system is unlikely.

One possibly controversial aspect of this evaluation is that it does not require any humans to write any summaries. This is good, because it significantly cuts down the amount of human effort needed to perform the evaluation. The only downside is that it does not provide the DUC community with additional training data for building systems.

4.3 Previous Experiments with this Metric

This metric has been proposed before (Dorr et al., 2004; Zajic et al., 2004; Dorr et al., 2005) as providing a litmus test for automatic evaluation metrics. The experiments we describe here can be seen as additional evidence that this metric is potentially useful, but looking at it from a different perspective. The systems we compare using this metric are also different from those compared previously.

4.4 Evaluation Experiments

We have performed some initial experiments validating the effectiveness and efficiency of this approach. We have used the TREC data sets and TREC relevance judgments as our data source. We used queries 204, 205, 207, 208, 210, 211, 212, 216, 217, 220, 221, 223, 225, 227, 228, 234, 235, 238, 239, 240, 242, 243 and 248 in the experiments. For each query, we took a collection of (on average) 16 documents, divided unevenly between known relevant documents and known irrelevant documents. The irrelevant documents are the highest-rated documents for a given query from the best performing system from the corresponding TREC evaluation.

System    P     R     F
KWIC      48.2  65.6  55.2
PREFIX    36.3  51.8  37.4

Table 2: Precision, recall and f-score of the filtering task for the four systems evaluated.

Four summarization systems were run on these document/query pairs, aiming at a summary length of 40 words. The first, PREFIX, simply took the first 40 words. The second, TF-IDF, selected out the 40 words from a document with the highest TF-IDF scores and presented these as a bag of words. The third, KWIC, performed key-word extraction, similar to that done by major search engines. This system identified the locations of the query terms, and took windows of three words on each side until the 40-word limit was reached. The final system, KL, performed sentence extraction using KL-divergence between sentences and queries as the ranking function. Both the KL system and the KWIC system had access only to the short title section of the query.
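As an illustration of the last of these baselines, the sketch below ranks sentences by smoothed KL divergence between the query's word distribution and each sentence's word distribution (smaller divergence ranks higher). The tokenization and add-alpha smoothing are choices made for the example and were not part of the description above.

# Illustrative KL-divergence sentence ranker (a sketch, not the exact KL baseline).
tokenize <- function(x) unlist(strsplit(tolower(x), "[^a-z0-9]+"))

kl_rank <- function(query, sentences, alpha = 0.01) {
  q_tok <- tokenize(query)
  vocab <- unique(c(q_tok, unlist(lapply(sentences, tokenize))))
  dist_over <- function(toks) {
    counts <- table(factor(toks, levels = vocab))
    (counts + alpha) / (sum(counts) + alpha * length(vocab))  # add-alpha smoothing
  }
  p_q <- dist_over(q_tok)
  kl <- sapply(sentences, function(s) {
    p_s <- dist_over(tokenize(s))
    sum(p_q * log(p_q / p_s))   # KL(query || sentence); smaller is better
  })
  order(kl)  # sentence indices, best first
}

# Toy usage:
kl_rank("solar energy policy",
        c("The bill changes national solar energy policy.",
          "The game ended in extra innings."))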
We had six human evaluators perform the evaluation. They were presented with the query (title, description and summary) and the summaries of the documents (both relevant and irrelevant). They were instructed to select those documents that they thought were likely to be relevant to the query. In total, we obtained results on 2772 document/query pairs. On average, it took annotators approximately 100 seconds to evaluate one query (corresponding to roughly 16 document/query pairs).

We evaluated each system to ascertain whether the humans using this system could correctly separate relevant from irrelevant documents, based only on the summaries. We computed this by looking at the precision, recall and f-score between the true relevance judgments and the relevance judgments based on the summaries alone. These results are shown in Table 2. As we can see from this table, the KWIC system performed best, followed by the sentence extraction KL system. The PREFIX system performed significantly worse than either the KWIC or KL systems, and the TF-IDF system performed significantly worse again. Interestingly, KWIC did not universally dominate KL: in 40% of the queries, KL performed better. This suggests that a hybrid method (i.e., a sentence extraction system with compression capabilities) should be able to do better than either alone.

4.5 Agreement Experiments

In order to judge annotator agreement, we had multiple annotators judge the same queries. We computed the kappa score on these multiple annotations (in a pairwise fashion, 2 categories, 2 codes, 1260 data points) and achieved a score of κ = 0.424, which was not as high as we would have hoped. In order to better understand these numbers, we also computed the kappa values on a per-system basis (for instance, for a bad system, we would expect humans to have more difficulty deciding relevance and thus have lower agreement). These results are as follows: for KWIC (341 points), κ = 0.513; for KL (269 points), κ = 0.495; for PREFIX (348 points), κ = 0.437; for TF-IDF (302 points), κ = 0.226.
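Pairwise kappa of the kind reported above can be computed directly from two annotators' binary decisions on the same items; the sketch below is a generic two-category Cohen's kappa, not the exact script used to produce these numbers.

# a, b: 0/1 (or logical) vectors of two annotators' relevance decisions on the same items.
cohen_kappa <- function(a, b) {
  a <- as.integer(a); b <- as.integer(b)
  po <- mean(a == b)                                        # observed agreement
  pe <- mean(a) * mean(b) + (1 - mean(a)) * (1 - mean(b))   # chance agreement
  (po - pe) / (1 - pe)
}

# Toy example with 10 shared judgments:
a <- c(1, 1, 0, 0, 1, 0, 1, 1, 0, 0)
b <- c(1, 0, 0, 0, 1, 0, 1, 1, 1, 0)
cohen_kappa(a, b)  # 0.6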
These kappa values, which top out at 0.513, are still lower than we would like. However, in post-evaluation discussion with evaluators, there arose several issues that would need to be sorted out before such an evaluation were used on a large scale, and which might serve to improve agreement. The most pertinent issue was: how strict should the annotation be? For instance, someone in desperate need of information would eventually click every link until something is found. On the other hand, someone only mildly curious about a bit of information might only select two documents before giving up. Some effort would need to be put into making this explicit before deploying this evaluation metric.

A second issue that came up was that the TF-IDF system was incredibly difficult to evaluate. This is also seen in the shockingly low (0.226) kappa value for this system. This system also took roughly 50% more time to evaluate than the others, since each word was high-content and out of context.

4.6 Discussion of Evaluation

In this section, we have argued in favor of moving toward extrinsic evaluation metrics: essentially, because we are spending so much time and money on evaluation as is, it makes sense to consider evaluations that more closely measure performance for a particular task. We have suggested the use of the relevance-prediction metric, which we have found to not be time consuming and to provide at least an intuitive ranking of four baseline systems on sample data. Moreover, the best system only achieves an f-score of 57.4, leaving significant room for improvement. Since this metric is inherently a single-document, query-focused summarization task, it might be worthwhile to investigate other tasks for which multidocument summarization is natural (for instance, product review summaries).

Acknowledgments

This work was supported by DARPA-ITO grant N66001-00-1-9814 and NSF grants IIS-0326276 and IIS-0097846.
References

Jaime G. Carbonell and Jade Goldstein. 1998. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Research and Development in Information Retrieval, pages 335-336.

Hal Daumé III and Daniel Marcu. 2005. Bayesian multi-document summarization at MSE. In ACL 2005 Workshop on Intrinsic and Extrinsic Evaluation Measures.

Bonnie Dorr, Christof Monz, Douglas Oard, Stacy President, David Zajic, and Richard Schwartz. 2004. Extrinsic evaluation of automatic metrics for summarization. Technical Report LAMP-TR-115, CAR-TR-999, CS-TR-4610, UMIACS-TR-2004-48, University of Maryland, College Park.

Bonnie Dorr, Christof Monz, Stacy President, and Richard Schwartz. 2005. A methodology for extrinsic evaluation of text summarization: Does ROUGE correlate? In Proceedings of the Association for Computational Linguistics Workshop on Intrinsic and Extrinsic Evaluation Measures for MT and/or Summarization.

Chin-Yew Lin and Eduard Hovy. 2003. Automatic evaluation of summaries using n-gram co-occurrence statistics. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics and Human Language Technology (NAACL/HLT), Edmonton, Canada, May 27-June 1.

Chin-Yew Lin. 2001. SEE - Summary Evaluation Environment. /~cyl/SEE/.

Ani Nenkova and Rebecca Passonneau. 2004. Evaluating content selection in summarization: The pyramid method. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics and Human Language Technology (NAACL/HLT), Boston, MA, USA, May.

David Zajic, Bonnie Dorr, Richard Schwartz, and Stacy President. 2004. Headline evaluation experiment results. Technical Report LAMP-TR-111, CS-TR-4573, UMIACS-TR-2004-18, University of Maryland, College Park.
