Simple Random Sampling
Simple Random Sampling Explained

Simple random sampling (SRS) is a quick and time-efficient method of learning about an entire population. SRS differs from other types of random sampling in that every subject in the population has an equal probability of being selected. In this type of sampling, simplicity is stressed above all.

What is Simple Random Sampling?

Simple random sampling selects a subset of subjects (commonly referred to as a sample) at random from a larger group in which every subject has an equal chance of being selected. Researchers take care not to select the same subject more than once, so the sample is drawn without replacement. The sample is meant to be representative of the larger population. This is the easiest type of random sampling to conduct, and it produces fairly accurate results.

Advantages

One of the biggest advantages of SRS is that it is highly resistant to selection bias. It allows researchers to conduct surveys without advance knowledge of the population, so it is time-efficient and well suited to surveys where simplicity is a major goal. Simple random sampling is an easy way to infer information about a whole population without having to study every member of that population individually.

Disadvantages

Unfortunately, SRS is not the best choice for extremely large or diverse populations, because the selection process requires a complete list of every member of the population, which is time-consuming to compile and not always available to researchers. Additionally, this type of sampling is very methodical and relies heavily on probability, so it may not suit researchers whose strength is not mathematics.

Methods

There are several ways to carry out SRS in a survey that calls for it. One is the lottery method: all subjects in a population are numbered and written on slips, the slips are mixed, one is drawn, and the process continues until enough subjects have been selected for the sample. Random number tables are also used, but computerization, which leaves practically no room for error, is the most popular method.

When To Use Simple Random Sampling

SRS is a good choice if you are looking for a simple way to get results for a population that is not very large and about which information is easy to access. You may also want to use this method when time efficiency is important and when you need a representative sample that supports the most accurate possible generalizations about the population. Simple random sampling may be the easiest way to get unbiased, representative results, especially for surveys where researchers do not have much prior information about the population being studied. Whatever kind of information you need to gather, SRS is a fast and accurate way to get it.
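The computerized selection described above can be sketched in a few lines. This is an illustrative example only: the population of 1,000 numbered subjects and the sample size of 50 are invented, and Python's standard library is used to draw without replacement.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw a simple random sample of size k without replacement.

    Every subject has an equal chance of selection and no subject is
    selected twice: the computerized equivalent of the lottery method
    (numbered slips mixed and drawn until k subjects are chosen).
    """
    rng = random.Random(seed)
    return rng.sample(population, k)

# Hypothetical population: 1,000 numbered subjects.
population = list(range(1, 1001))
sample = simple_random_sample(population, k=50, seed=42)
print(len(sample), len(set(sample)))  # 50 50: fifty distinct subjects
```

Because `random.sample` draws without replacement, repeats are impossible by construction, mirroring the researcher's care not to select the same subject more than once.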
Marketing Research: Sampling
Simple Random Sampling
This is the ideal choice, as it is a 'perfect' random method. Using this method, individuals are randomly selected from a list of the population, and every single individual has an equal chance of selection.

Quota Sampling
The main difference lies in the fact that quotas (i.e. the number of people to be surveyed) within subgroups are set beforehand (e.g. 25% 16-24 yr olds, 30% 25-34 yr olds, 20% 35-55 yr olds, and 25% 56+ yr olds). Interviewers then select respondents according to these criteria rather than at random.

Cluster or Multi-stage Sampling

Stratified Sampling
A stratified sample is constructed by classifying the population into sub-populations (or strata), based on some well-known characteristics of the population, such as age, gender or socioeconomic status. The selection of elements is then made separately from within each stratum, usually by random or systematic sampling methods.
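The two steps of stratified sampling, classifying the population into strata and then sampling within each stratum, can be sketched as below. The population, the age-band strata, and the sizes are invented for illustration; within each stratum, proportional allocation with simple random selection is assumed.

```python
import random
from collections import defaultdict

def stratified_sample(subjects, stratum_of, total_n, seed=None):
    """Proportional stratified sampling: group subjects into strata,
    then draw a simple random sample within each stratum, sized in
    proportion to that stratum's share of the population."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for s in subjects:
        strata[stratum_of(s)].append(s)
    total = len(subjects)
    sample = []
    for members in strata.values():
        quota = round(total_n * len(members) / total)
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample

# Invented population: (id, age_band) pairs, 750 'young' and 250 'old'.
people = [(i, 'young' if i % 4 else 'old') for i in range(1000)]
chosen = stratified_sample(people, stratum_of=lambda p: p[1], total_n=100, seed=1)
print(len(chosen))  # 100: 75 young + 25 old, matching the 3:1 population split
```

Note that with awkward stratum proportions the per-stratum rounding can make the total deviate slightly from `total_n`; production implementations typically use a largest-remainder allocation instead.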
Nyquist Sampling Theorem
The Nyquist sampling theorem is a fundamental theorem in the fields of communication engineering and signal processing. It states that a continuous-time signal that is band-limited to a frequency of B hertz or less must be sampled at a rate of at least 2B samples per second to avoid aliasing distortion. In other words, the sampling frequency must be greater than or equal to twice the highest frequency present in the signal.

Step 1: Understand the Need for the Sampling Theorem
To understand why the Nyquist sampling theorem is needed, consider the nature of continuous-time signals. Continuous-time signals change with respect to time and are described by mathematical functions. To process and communicate signals using computers and digital devices, we must convert continuous-time signals into discrete-time signals: sequences of values sampled at discrete points in time.

Step 2: Learn About Aliasing
One of the biggest challenges in converting continuous-time signals to discrete-time signals is aliasing. Aliasing occurs when the sampling frequency is too low, causing the high-frequency components of the signal to appear as low-frequency components in the resulting discrete-time signal. This can cause significant distortion and make the signal impossible to interpret accurately.

Step 3: Understand the Nyquist Sampling Theorem
The theorem addresses aliasing by specifying a minimum sampling rate for a given signal based on its maximum frequency component: to avoid aliasing, the sampling frequency must be greater than or equal to twice the maximum frequency present in the signal.

Step 4: Apply the Nyquist Sampling Theorem
To apply the theorem, determine the maximum frequency component of the signal to be converted from continuous time to discrete time. Once that is known, the minimum sampling rate required to avoid aliasing is twice that frequency.

Step 5: Conclusion
In conclusion, the Nyquist sampling theorem specifies the minimum sampling rate required to avoid aliasing distortion when converting a continuous-time signal to a discrete-time signal. It has significant implications for modern technology, including digital signal processing, wireless communication, and data storage.
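The folding behavior described above can be illustrated numerically. Under the simplifying assumption of ideal (impulse) sampling of a pure tone, a sampled sinusoid appears at the distance from the nearest multiple of the sampling rate; the function and the example frequencies below are illustrative, not from the original text.

```python
def apparent_frequency(f_signal, f_sample):
    """Frequency (Hz) at which a sampled pure tone of f_signal Hz appears.

    Under ideal sampling, a tone folds to its distance from the nearest
    multiple of f_sample. If the Nyquist criterion f_sample >= 2 * f_signal
    holds, the tone is preserved unchanged.
    """
    return abs(f_signal - f_sample * round(f_signal / f_sample))

# A 3 kHz tone sampled at 8 kHz (8000 >= 2 * 3000): no aliasing.
print(apparent_frequency(3000, 8000))  # 3000
# The same tone sampled at 4 kHz (4000 < 2 * 3000): aliases down to 1 kHz.
print(apparent_frequency(3000, 4000))  # 1000
```

This is why the sampling frequency must exceed twice the highest signal frequency: below that rate, distinct input frequencies become indistinguishable in the samples.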
The Formula for Calculating Sample Size Proposed by Israel (1992)
In statistical research, sample size is a crucial element that determines the validity and reliability of the research findings. Selecting an appropriate sample size is essential to ensure that the results of a study are generalizable and accurate. Among the various methods for calculating sample size, the formula presented by Israel in 1992 stands out as a widely used and reliable approach.

The Israel (1992) formula is based on several key considerations, including the population size, the desired level of confidence, and the margin of error. It takes these factors into account to determine the minimum number of samples required to achieve statistically significant results. The formula can be expressed as follows:

n = N / (1 + N * e^2)

where:
n is the sample size
N is the population size
e is the margin of error, expressed as a decimal (e.g., 0.05 for a 5% margin of error)

This formula allows researchers to calculate the sample size based on their specific research requirements and constraints. By plugging in the values for the population size and the desired margin of error, the formula provides a scientifically sound estimate of the minimum number of samples needed for the study.

It is important to note that while the Israel (1992) formula provides a useful starting point for sample size calculation, it may not be applicable in all scenarios. The formula assumes simple random sampling without replacement, and it may need to be adjusted for more complex sampling designs or specific research contexts. Nevertheless, it remains a valuable tool for researchers seeking to determine an appropriate sample size for their studies.
By carefully considering the relevant factors and using this formula, researchers can ensure that their sample size is adequate to support statistically valid and reliable research findings.
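The formula n = N / (1 + N * e^2) is straightforward to compute directly. The sketch below is illustrative (the population of 1,000 is an invented example); it rounds up, since a sample cannot contain a fraction of a subject.

```python
import math

def israel_sample_size(N, e):
    """Minimum sample size n = N / (1 + N * e**2), the simplified
    formula reported by Israel (1992), where N is the population size
    and e is the margin of error as a decimal. Assumes simple random
    sampling; rounds up to a whole number of subjects."""
    return math.ceil(N / (1 + N * e ** 2))

# Population of 1,000 with a 5% margin of error:
print(israel_sample_size(1000, 0.05))  # 286  (1000 / 3.5 = 285.71...)
```

Note how weakly the result depends on N for large populations: doubling N to 2,000 at the same 5% margin raises the required sample only to 334.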
SADISA package (version 1.2): species abundance distributions with the independent-species assumption (package manual)
Package 'SADISA'    October 12, 2022

Type: Package
Title: Species Abundance Distributions with Independent-Species Assumption
Version: 1.2
Author: Rampal S. Etienne & Bart Haegeman
Maintainer: Rampal S. Etienne <******************>
Description: Computes the probability of a set of species abundances of a single or multiple samples of individuals with one or more guilds under a mainland-island model. One must specify the mainland (metacommunity) model and the island (local) community model. It assumes that species fluctuate independently. The package also contains functions to simulate under this model. See Haegeman, B. & R.S. Etienne (2017). A general sampling formula for community structure data. Methods in Ecology & Evolution 8: 1506-1519 <doi:10.1111/2041-210X.12807>.
License: GPL-3
LazyData: FALSE
RoxygenNote: 6.1.1
Encoding: UTF-8
Depends: R (>= 3.5)
Imports: pracma, DDD (>= 4.1)
Suggests: testthat, knitr, rmarkdown
VignetteBuilder: knitr
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2019-10-23 12:10:02 UTC

R topics documented: convert_fa2sf, datasets, fitresults, integral_peak, SADISA_loglik, SADISA_ML, SADISA_sim, SADISA_test

convert_fa2sf: Converts different formats to represent multiple sample data

Description: Converts the full abundance matrix into species frequencies. If S is the number of species and M is the number of samples, then fa is the full abundance matrix of dimension S by M. For example, fa = [0 1 0; 3 2 1; 0 1 0] leads to sf = [0 1 0 2; 3 2 1 1].

Usage: convert_fa2sf(fa)

Arguments:
  fa: the full abundance matrix with species in rows and samples in columns

Value: the sample frequency matrix

References: Haegeman, B. & R.S. Etienne (2017). A general sampling formula for community structure data. Methods in Ecology & Evolution. In press.

datasets: Data sets of various tropical forest communities

Description: Various tree community abundance data sets to test and illustrate the Independent Species approach.
  - dset1.abunvec contains a list of 6 samples of tree abundances from 6 tropical forest plots (BCI, Korup, Pasoh, Sinharaja, Yasuni, Lambir).
  - dset2.abunvec contains a list of 11 lists with one of 11 samples from BCI combined with samples from Cocoli and Sherman.
  - dset3.abunvec contains a list of 6 lists with 2 samples, each from one dispersal guild, for 6 tropical forest communities (BCI, Korup, Pasoh, Sinharaja, Yasuni, Lambir).
  - dset4a.abunvec contains a list of 6 samples from 6 censuses of BCI (1982, 1985, 1990, 1995, 2000, 2005) with dbh > 1 cm.
  - dset4b.abunvec contains a list of 6 samples from 6 censuses of BCI (1982, 1985, 1990, 1995, 2000, 2005) with dbh > 10 cm.

Usage: data(datasets)

Format: A list of 5 data sets. See the description for information on each of these data sets.

Author(s): Rampal S. Etienne & Bart Haegeman

Source: Condit et al. (2002). Beta-diversity in tropical forest trees. Science 295: 666-669. See also: Janzen, T., B. Haegeman & R.S. Etienne (2015). A sampling formula for ecological communities with multiple dispersal syndromes. Journal of Theoretical Biology 387: 258-261.

fitresults: Maximum likelihood estimates and corresponding likelihood values for various fits to various tropical forest communities

Description: Maximum likelihood estimates and corresponding likelihood values for various fits to various tropical forest communities, to test and illustrate the Independent Species approach.
  - fit1a.llikopt and fit1a.parsopt: maximum likelihood values and parameter estimates of the fit of the pm-dl model to dset1.abunvec
  - fit1b.llikopt and fit1b.parsopt: fit of the pmc-dl model to dset1.abunvec
  - fit2.llikopt and fit2.parsopt: fit of the rf-dl model to dset1.abunvec
  - fit3.llikopt and fit3.parsopt: fit of the dd-dl model to dset1.abunvec
  - fit4.llikopt: maximum likelihood values of the fit of the pm-dl model to dset2.abunvec (multiple samples)
  - fit4.parsopt: maximum likelihood parameter estimates of the fit of the pm-dl model to dset1.abunvec (multiple samples)
  - fit5.llikopt and fit5.parsopt: fit of the pm-dl model to dset3.abunvec (multiple guilds)
  - fit6.llikopt and fit6.parsopt: fit of the pr-dl model to dset1.abunvec
  - fit7.llikopt and fit7.parsopt: fit of the pm-dd model to dset1.abunvec
  - fit8a.llikopt and fit8a.parsopt: fit of the pm-dd model to dset4a.abunvec
  - fit8b.llikopt and fit8b.parsopt: fit of the pm-dd model to dset4b.abunvec

Usage: data(fitresults)

Format: A list of 20 lists, each containing either likelihood values or the corresponding parameter estimates. See the description.

Author(s): Rampal S. Etienne & Bart Haegeman

Source: Condit et al. (2002). Beta-diversity in tropical forest trees. Science 295: 666-669.

integral_peak: Computes the integral of a very peaked function

Description: Computes the logarithm of the integral of exp(logfun) from 0 to Inf under the following assumptions:

Usage: integral_peak(logfun, xx = seq(-100, 10, 2), xcutoff = 2, ycutoff = 40, ymaxthreshold = 1e-12)

Arguments:
  logfun: the logarithm of the function to integrate
  xx: the initial set of points on which to evaluate the function
  xcutoff: when the maximum has been found among the xx, this parameter sets the width of the interval in which to find the maximum
  ycutoff: sets the threshold below which (on a log scale) the function is deemed negligible, i.e. it does not contribute to the integral
  ymaxthreshold: sets the deviation allowed in finding the maximum among the xx

Value: the result of the integration

References: Haegeman, B. & R.S. Etienne (2017). A general sampling formula for community structure data. Methods in Ecology & Evolution. In press.

SADISA_loglik: Computes loglikelihood for requested model

Description: Computes the loglikelihood for the requested model using the independent-species approach.

Usage: SADISA_loglik(abund, pars, model, mult = "single")

Arguments:
  abund: abundance vector or a list of abundance vectors. When a list is provided and mult = 'mg', it is assumed that the different vectors apply to different guilds. When mult = 'ms', the different vectors apply to multiple samples from the same metacommunity; in this case the vectors should have equal lengths and may contain zeros, because there may be species that occur in multiple samples and species that do not occur in some of the samples. When mult = 'both', abund should be a list of lists, each list representing multiple guilds within a sample.
  pars: a vector of model parameters or a list of vectors of model parameters. When a list is provided and mult = 'mg', it is assumed that the different vectors apply to different guilds. Otherwise, it is assumed that they apply to multiple samples.
  model: the chosen combination of metacommunity model and local community model as a vector, e.g. c('pm', 'dl') for a model with point mutation in the metacommunity and dispersal limitation. The choices for the metacommunity model are: 'pm' (point mutation), 'rf' (random fission), 'pr' (protracted speciation), 'dd' (density-dependence). The choices for the local community model are: 'dl' (dispersal limitation), 'dd' (density-dependence).
  mult: when set to 'single' (the default), the loglikelihood for a single sample is computed. When set to 'mg', the loglikelihood for multiple guilds is computed. When set to 'ms', the loglikelihood for multiple samples from the same metacommunity is computed. When set to 'both', the loglikelihood for multiple guilds within multiple samples is computed.

Details: Not all combinations of metacommunity model and local community model have been implemented yet, because this requires checking for numerical stability of the integration. The currently available model combinations are, for a single sample, c('pm', 'dl'), c('pm', 'rf'), c('dd', 'dl'), c('pr', 'dl'), c('pm', 'dd'), and for multiple samples, c('pm', 'dl').

Value: loglikelihood

References: Haegeman, B. & R.S. Etienne (2017). A general sampling formula for community structure data. Methods in Ecology & Evolution 8: 1506-1519. doi:10.1111/2041-210X.12807

Examples:
  data(datasets);
  abund_bci <- datasets$dset1.abunvec[[1]];
  data(fitresults);
  data.paropt <- fitresults$fit1a.parsopt[[1]];
  result <- SADISA_loglik(abund = abund_bci, pars = data.paropt, model = c('pm', 'dl'));
  cat('The difference between result and the value in fitresults.RData is:',
      result - fitresults$fit1a.llikopt[[1]]);

SADISA_ML: Performs maximum likelihood parameter estimation for requested model

Description: Computes the maximum loglikelihood and corresponding parameters for the requested model using the independent-species approach. For optimization it uses various auxiliary functions in the DDD package.

Usage:
  SADISA_ML(abund, initpars, idpars, labelpars, model = c("pm", "dl"),
            mult = "single", tol = c(1e-06, 1e-06, 1e-06),
            maxiter = min(1000 * round((1.25)^sum(idpars)), 1e+05),
            optimmethod = "subplex", num_cycles = 1)

Arguments:
  abund: abundance vector or a list of abundance vectors. When a list is provided and mult = 'mg', it is assumed that the different vectors apply to different guilds. When mult = 'ms', the different vectors apply to multiple samples from the same metacommunity; in this case the vectors should have equal lengths and may contain zeros, because there may be species that occur in multiple samples and species that do not occur in some of the samples.
  initpars: a vector of initial values of the parameters to be optimized and fixed. See labelpars for more explanation.
  idpars: a vector stating whether the parameters in initpars should be optimized (1) or remain fixed (0).
  labelpars: a vector, a list of vectors, or a list of lists of vectors indicating the labels (integers, starting at 1) of the parameters to be optimized and fixed. These integers correspond to the positions in initpars and idpars. The order of the labels in the vector/list is first the metacommunity parameters (theta, and phi (for protracted speciation) or alpha (for density-dependence or abundance-dependent speciation)), then the dispersal parameters (I). See the example and the vignette for more explanation.
  model: the chosen combination of metacommunity model and local community model as a vector, e.g. c('pm', 'dl') for a model with point mutation in the metacommunity and dispersal limitation. The choices for the metacommunity model are: 'pm' (point mutation), 'rf' (random fission), 'pr' (protracted speciation), 'dd' (density-dependence). The choices for the local community model are: 'dl' (dispersal limitation), 'dd' (density-dependence).
  mult: when set to 'single' (the default), the loglikelihood for a single sample and single guild is computed. When set to 'mg', the loglikelihood for multiple guilds is computed. When set to 'ms', the loglikelihood for multiple samples from the same metacommunity is computed.
  tol: a vector containing three numbers for the relative tolerance in the parameters, the relative tolerance in the function, and the absolute tolerance in the parameters.
  maxiter: sets the maximum number of iterations
  optimmethod: sets the optimization method to be used, either subplex (the default) or an alternative implementation of simplex.
  num_cycles: the number of cycles of optimization. If set to Inf, it will do as many cycles as needed to meet the tolerance set for the target function.

Details: Not all combinations of metacommunity model and local community model have been implemented yet, because this requires checking for numerical stability of the integration. The currently available model combinations are, for a single sample, c('pm', 'dl'), c('pm', 'rf'), c('dd', 'dl'), c('pr', 'dl'), c('pm', 'dd'), and for multiple samples, c('pm', 'dl').

References: Haegeman, B. & R.S. Etienne (2017). A general sampling formula for community structure data. Methods in Ecology & Evolution 8: 1506-1519. doi:10.1111/2041-210X.12807

Examples:
  utils::data(datasets);
  utils::data(fitresults);
  result <- SADISA_ML(
    abund = datasets$dset1.abunvec[[1]],
    initpars = fitresults$fit1a.parsopt[[1]],
    idpars = c(1, 1),
    labelpars = c(1, 2),
    model = c('pm', 'dl'),
    tol = c(1E-1, 1E-1, 1E-1));
  # Note that tolerances should be set much lower than 1E-1 to get the best results.

SADISA_sim: Simulates species abundance data

Description: Simulates species abundance data using the independent-species approach.

Usage: SADISA_sim(parsmc, ii, jj, model = c("pm", "dl"), mult = "single", nsim = 1)

Arguments:
  parsmc: the model parameters. For the point mutation (pm) model this is theta and I. For the protracted model (pr) this is theta, phi and I. For the density-dependent model (dd), which can also be interpreted as the per-species speciation model, this is theta and alpha.
  ii: the I parameter. When I is a vector, it is assumed that each value describes a sample or a guild, depending on whether mult == 'ms' or mult == 'mg'. When mult = 'both', a list of lists must be specified, where each list element relates to a sample and contains a list of values across guilds.
  jj: the sample sizes for each sample and each guild. Must have the same structure as ii.
  model: the chosen combination of metacommunity model and local community model as a vector, e.g. c('pm', 'dl') for a model with point mutation in the metacommunity and dispersal limitation. The choices for the metacommunity model are: 'pm' (point mutation), 'rf' (random fission), 'pr' (protracted speciation), 'dd' (density-dependence). The choices for the local community model are: 'dl' (dispersal limitation), 'dd' (density-dependence).
  mult: when set to 'single', the loglikelihood of a single abundance vector will be computed. When set to 'mg', the loglikelihood for multiple guilds is computed. When set to 'ms', the loglikelihood for multiple samples from the same metacommunity is computed. When set to 'both', the loglikelihood for multiple guilds within multiple samples is computed.
  nsim: number of simulations to perform

Details: Not all combinations of metacommunity model and local community model have been implemented yet, because this requires checking for numerical stability of the integration. The currently available model combinations are c('pm', 'dl').

Value: abund: an abundance vector, a list of abundance vectors, a list of lists of abundance vectors, or a list of lists of lists of abundance vectors. The first layer of the lists corresponds to different simulations. When mult = 'mg', each list contains a list of abundance vectors for different guilds. When mult = 'ms', each list contains a list of abundance vectors for different samples from the same metacommunity; in this case the vectors should have equal lengths and may contain zeros, because there may be species that occur in multiple samples and species that do not occur in some of the samples. When mult = 'both', each list will be a list of lists of multiple guilds within a sample.

References: Haegeman, B. & R.S. Etienne (2017). A general sampling formula for community structure data. Methods in Ecology & Evolution 8: 1506-1519. doi:10.1111/2041-210X.12807

SADISA_test: Tests SADISA for the data sets included in the paper by Haegeman & Etienne

Description: Tests SADISA for the data sets included in the paper by Haegeman & Etienne.

Usage: SADISA_test(tol = 0.001)

Arguments:
  tol: tolerance of the test

References: Haegeman, B. & R.S. Etienne (2017). A general sampling formula for community structure data. Methods in Ecology & Evolution. In press.
DIN 75201-2011
Translation by DIN-Sprachendienst.
In case of doubt, the German-language original shall be considered authoritative.
© No part of this translation may be reproduced without prior permission of DIN Deutsches Institut für Normung e. V., Berlin. Beuth Verlag GmbH, 10772 Berlin, Germany, has the exclusive right of sale for German Standards (DIN-Normen).
English price group 12 www.din.de www.beuth.de
DIN 75201:2011-11
Contents
English Scientific Paper Writing (Beijing Institute of Technology), China University MOOC: end-of-chapter and final exam answer bank, 2023
英文科技论文写作_北京理工大学中国大学mooc课后章节答案期末考试题库2023年1.If a real physical system shows a variation of both material properties acrossthe graded layer, the assumed linear variation may not give the bestapproximation.答案:may2.The idea of 'community' in terms of GRT lives is very strong and could beseen to correspond to some of the nostalgic constructs that non-GRT groups place on 'community'.答案:could be seen3.Is the research topic “How safe is nuclear power” effective?答案:正确4.Decide whether the following statement is true or false.c.Introductionincludes more detailed information than abstract.答案:正确5.Tertiary education may be ________ asthe period of study which is spent atuniversity.答案:defined6.Unbalanced Force ________ tothe sum total or net force exerted on an object.答案:refers7.This scatter can be attributed to the difficulties in measuring the dent depthdue to specimen processing.答案:can be attributed8.Choose a proper word from the choices to complete the following sentence.Arocket traveling away from Earth ____________ a speed greater than 11.186kilometers per second (6.95 miles per second) or 40,270 kilometers per hour (25,023 mph) will eventually escape Earth’s gravity.答案:at9.Choose a proper word from the choices to complete the following sentence.Inmechanical systems, power, the rate of doing work, can be computed____________ the product of force × velocity.答案:as10.Choose a proper word from the choices to complete the followingsentence.N ewton’s first law, the law of inertia, __________ that it takes a force to change the motion of an object.答案:states11.Choose a proper word from the choices to complete the followingsentence.Newton’s second law relates force, acceleration, and mass and it is often ___________ as the equation:f = ma答案:written12.Choose a proper word from the choices to complete the followingsentence.Because all types of energy can be expressed ___________ the sameunits, joules, this conversion can be expressed quantitatively in simplemodels.答案:in13.Choose a 
proper word from the choices to complete the followingsentence.So a key difference between a rocket and a jet plane is ____________ a rocket’s engine lifts it directly upward into the sky, whereas a jet’s engin es simply speed the plane forward so its wings can generate lift.答案:that14.Which of the following are the guidelines for writing formulas and equations?答案:Numbering all equations in sequence if referred to later._Centeringequations on their own separate lines._Using equations as grammatical units in sentences._Defining the symbols that are used.15.Acceleration relates to motion. It ________ a change in motion.答案:means16.Assertiveness is ________ asa skill of being able to stand up for your own orother people's rights in a calm and positive way, without being eitheraggressive, or passively accepting 'wrong'.答案:viewed17.The force that pushes a rocket upward is ________ thrust.答案:called18.Water ________ a liquid made up of molecules of hydrogen and oxygen in theratio of 2 to 1.答案:is19.The number of private cars increased ______60% from 2015 to 2016.答案:by20.Which can be the situations for writing a researchproposal?答案:Applying for an opportunity for a project_Applying for a bachelor’s, or master’s or doctor’s degree_Applying for some research funds or grants21.Who are usually the readers of the research proposals?答案:Specialists_Professors_Supervisors for the students_Professionals22.What are the elements to make the research proposal persuasive?答案:Reasonable budget_Clear Schedule_A Capable research team_Theimportance and necessity of the research question23.What are the language features of the research proposal?答案:Future tense_First person24.The purpose of writing a proposal is to ________________ the readers that theresearch plan is feasible and we are capable to do it.答案:persuade25.What types of information are generally supposed to be included in theintroduction section in the report?答案:Background_Summary of the results and conclusion_The purpose of the 
research26.Please decide whether the following statement is T(true) orF(false)according to the video.Discussion section analyzesand evaluates the research methods.答案:错误27.Please decide whether the following statement is T(true) orF(false)according to the video.Conclusion and recommendation sectionstates the significance of the findings and usually includes possible directions for further research.答案:正确28.These causes affected different regions differently in the 1990s, ______ Europehaving as much as 9.8% of degradation due to deforestation.答案:with29.Coal is predicted to increase steadily to 31q in 2030, whereas gas will remainstable ______ 25q.答案:at30.Manufacturing value added amounted ______12.3% of total U.S. grossdomestic product (GDP) in 2012, according to United Nations calculations.答案:to31.Chinese manufacturing value added accounted ______ 30.6% of its economy’stotal output in 2012, according to the UN.答案:for32.Japan ranked third ______ manufacturing value added at $1.1 trillion (seeFigure 1).答案:in33.About 4.2% of the 1,120 respondents were younger than 20 years, and 26.7%were ______ 21 and 30 years old.答案:between34.______ all the respondents, 67.1% were married and 32.9% were single.答案:of35.Decide whether the following statement is true or false.b.Both introductionand abstract include research findings.答案:错误36.Decide whether the following statement is true or false.a.It is possible to findtables or diagrams in introduction.答案:正确37.What are the possible contents of an introduction?答案:Reviewing the existing literature relevant to the presentstudy_Announcing the purpose/focus of the study_Identifying a gap in the existing literature_Explaining the significance or necessity of the research38.Choose the proper answers for the following questions.Ways to organize thereferences include:答案:a. Chronological order of publications_b. Researchmethods_c. Research theories_d. 
Research modes39.This indicates that there is a possibility of obtaining fluid density from soundspeed measurements and suggests that it is possible to measure soundabsorption with an ultrasonic cell to determine oil viscosity.In this sentence, the writer presents答案:Implication40.The measurements were shown to lead to an accurate determination of thebubble point of the oil.In this sentence, the writer presents答案:Results and achievement41.An ultrasonic cell was constructed to measure the speed of sound and testedin a crude oil sample. The speed of sound was measured at temperaturesbetween 260 and 411 K at pressures up to 75 MPs.In this sentence, thewriter presents答案:Methodology42.The aim of this study was to investigate the use of an ultrasonic cell todetermine crude oil properties, in particular oil density.In this sentence, the writer presents答案:Research aim43. A citation gives the s____ where the information or idea is from.答案:source44.An in-text citation usually includes information about the author and thep____ year.答案:publishing##%_YZPRLFH_%##publication45.To avoid plagiarism, using citations is the best way to give c____ to theoriginal author.答案:credit46.The publication details of the references listed at the end of the paper usuallyare put in a____ order.答案:alphabetical##%_YZPRLFH_%##alphabetic##%_YZPRLFH_%##alphab et47.The speed of sound in a fluid is determined by, and therefore an indicator of,the thermodynamic properties of that fluid.In this sentence, the writerpresents答案:Background factual information48.Citations are not necessary if the source is not clear.答案:错误49.Unintentional plagiarism can be excused.答案:错误50.Citing will make our writing less original.答案:错误51.Citing can effectively stress the originality of someone’s work.答案:正确52.As for the purposes of a literature review, which one is not included?答案:predicting the trend in relation to a central research question orhypothesis53. 
A literature review could be possibly presented as a/an ______. Answer: all of the above
54. The heading "Brief review of literature: drawing a timeline from 2005 to 2017" shows the literature review is arranged in ______ order. Answer: chronological
55. About writing a literature review, which of the following statements is not correct? Answer: To show respect to others' work, our own interpretations should not be included.
56. In terms of the writing feature, a research paper resembles a/an ______. Answer: argumentation
57. Each citation can only have one particular citing purpose. Answer: False
58. Compared with in-text citations, the end-of-text references are more detailed. Answer: True
59. In-text citations provide the abbreviation of an author's given/first name rather than family/last name. Answer: False
60. When the Chinese writers' ideas are cited, the first names in Pinyin will be given in in-text citations. Answer: False
61. When a process is described, _____________ are usually used to show the order of the stages or steps. Answer: sequencers
62. To help the reader better understand a complicated process, _____________ is (are) very often used. Answer: visual aids
63. What information is usually included when defining a process? Answer: Equipment / Product / Material
64. Decide whether the following statement is true or false. Researchers are required to use past tense when describing a process. Answer: False
65. Decide whether the following statement is true or false. A definition of the process is very often given first when a process is described. Answer: True
66. Escherichia coli, when found in conjunction with urethritis, often indicate infection higher in the uro-genital tract. Answer: True
67. The 'management' of danger is also not the sort of language to appear within policy documents that refer to GRT children, which reflects systematic failures in schools. Answer: False
68. Conceivably, different forms, changing at different rates and showing contrasting combinations of characteristics, were present in different areas. Answer: True
69. Viewing a movie in which alcohol is portrayed appears to lead to higher total alcohol consumption of young people while watching the movie. Answer: True
70. Furthermore, this proves that humans are wired to imitate. Answer: False
71. One possibility is that generalized latent inhibition is likely to be weaker than that produced by pre-exposure to the CS itself and thus is more likely to be susceptible to the effect of the long interval. Answer: True
72. It is unquestionable that our survey proved that the portrayal of alcohol and drinking characters in movies directly leads to more alcohol consumption in young adult male viewers when alcohol is available within the situation. Answer: False
73. Implications of these findings may be that, if moderation of alcohol consumption in certain groups is strived for, it may be sensible to cut down on the portrayal of alcohol in programmes aimed at these groups and the commercials shown in between. Answer: True
74. This effect might occur regardless of whether it concerns a real-life interaction. Answer: True
75. It definitely proves that a movie in which a lot of partying is involved triggers a social process between two participants that affects total drinking amounts. Answer: False
76. It is believed that alcohol related health problems are on the rise. Answer: believed
77. Drinking to excess, or 'binge drinking', is often the cause of inappropriate behaviour amongst teenagers. Answer: often
78. It seems as though the experiment conducted simply confirms suspicions held by the academic and medical professions. Answer: seems
79. However, attrition was greatest among the heaviest drinking segment of the sample, suggesting under-estimation in the findings, and although the study provided associational, prospective evidence on alcohol advertising effects on youth drinking, it addressed limitations of other research, particularly the unreliability of exposure measures based on self-reporting (Snyder and Slater, 2006). Answer: suggesting
80. These differences may be due to the fact that participants reporting higher consumption levels were primed to overrate their weekly drinking by the condition they were in. Answer: may
81. The
crack tends to grow into the more brittle material and then stay in there, whether the initial crack tip lies in the graded material or in the more ductile material (and thereafter advances across the graded layer). Answer: tends
82. Decide whether hedging language is used in the sentence below. Light smoking seems to have dramatic effects on cardiovascular disease. Answer: True
83. Decide whether hedging language is used in the sentence below. The impact of the UK's ageing population will lead to increased welfare costs. Definitely, this will result in higher taxes and an increased retirement age for younger people. Answer: False
84. Decide whether hedging language is used in the sentence below. Although duration of smoking is also important when considering risk, it is highly correlated with age, which itself is a risk factor, so separating their effects can be difficult. Answer: True
85. Decide whether hedging language is used in the sentence below. All these facts taken together point toward the likely presence of calcium carbonate in the soils that Phoenix has analyzed. Answer: True
86. Decide whether hedging language is used in the sentence below. Because these features are carved into the Tharsis Plateau, they must have an intermediate age. Answer: False
87. Decide whether hedging language is used in the sentence below. They appear to be covered with multiple layers of volcanic flows and sedimentary debris that originated in the south. Answer: True
88. Decide whether hedging language is used in the sentence below. Steven M. Clifford of the Lunar and Planetary Science Institute in Houston, among others, has conjectured that melting under a glacier or a thick layer of permafrost could also have recharged subterranean water sources. Answer: True
89. Decide whether hedging language is used in the sentence below. Earlier this year Philip Christensen of Arizona State University discovered gullies that clearly emerge from underneath a bank of snow and ice. Answer: False
90. Put the following expressions in the proper place of the Discussion. A. These data suggest B. In this study, we demonstrate C. it is critical to emphasize D. additional research will be required E. we were unable to determine
Discussion
Individuals who recover from certain viral infections typically develop virus-specific antibody responses that provide robust protective immunity against re-exposure, but some viruses do not generate protective natural immunity, such as HIV-1. Human challenge studies for the common cold coronavirus 229E have suggested that there may be partial natural immunity. However, there is currently no data whether humans who have recovered from SARS-CoV-2 infection are protected from re-exposure. This is a critical issue with profound implications for vaccine development, public health strategies, antibody-based therapeutics, and epidemiologic modeling of herd immunity. _____1_______ that SARS-CoV-2 infection in rhesus macaques provided protective efficacy against SARS-CoV-2 rechallenge. We developed a rhesus macaque model of SARS-CoV-2 infection that recapitulates many aspects of human SARS-CoV-2 infection, including high levels of viral replication in the upper and lower respiratory tract and clear pathologic evidence of viral pneumonia. Histopathology, immunohistochemistry, RNAscope, and CyCIF imaging demonstrated multifocal clusters of virus infected cells in areas of acute inflammation, with evidence for virus infection of alveolar pneumocytes and ciliated bronchial epithelial cells. ______2_______ the utility of rhesus macaques as a model for SARS-CoV-2 infection for testing vaccines and therapeutics and for studying immunopathogenesis. However, neither nonhuman primate model led to respiratory failure or mortality, and thus further research will be required to develop a nonhuman primate model of severe COVID-19 disease. SARS-CoV-2 infection in rhesus macaques led to humoral and cellular immune responses and provided protection against rechallenge.
Residual low levels of subgenomic mRNA in nasal swabs in a subset of animals and anamnestic immune responses in all animals following SARS-CoV-2 rechallenge suggest that protection was mediated by immunologic control and likely was not sterilizing. Given the near-complete protection in all animals following SARS-CoV-2 rechallenge, ______3_______ immune correlates of protection in this study. SARS-CoV-2 infection in rhesus monkeys resulted in the induction of neutralizing antibody titers of approximately 100 by both a pseudovirus neutralization assay and a live virus neutralization assay, but the relative importance of neutralizing antibodies, other functional antibodies, cellular immunity, and innate immunity to protective efficacy against SARS-CoV-2 remains to be determined. Moreover, ______4_______ to define the durability of natural immunity. In summary, SARS-CoV-2 infection in rhesus macaques induced humoral and cellular immune responses and provided protective efficacy against SARS-CoV-2 rechallenge. These data raise the possibility that immunologic approaches to the prevention and treatment of SARS-CoV-2 infection may in fact be possible. However, ______5_______ that there are important differences between SARS-CoV-2 infection in macaques and humans, with many parameters still yet to be defined in both species, and thus our data should be interpreted cautiously. Rigorous clinical studies will be required to determine whether SARS-CoV-2 infection effectively protects against SARS-CoV-2 re-exposure in humans. Answer: BAEDC
91. Rearrange the order of the following sentences to make a coherent and meaningful abstract.
1. These antibodies neutralized 10 representative SARS-CoV-2 strains, suggesting a possible broader neutralizing ability against other strains. Three immunizations using two different doses, 3 or 6 micrograms per dose, provided partial or complete protection in macaques against SARS-CoV-2 challenge, respectively, without observable antibody-dependent enhancement of infection.
2. The coronavirus disease 2019 (COVID-19) pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has resulted in an unprecedented public health crisis. Because of the novelty of the virus, there are currently no SARS-CoV-2–specific treatments or vaccines available.
3. Therefore, rapid development of effective vaccines against SARS-CoV-2 are urgently needed.
4. Here, we developed a pilot-scale production of PiCoVacc, a purified inactivated SARS-CoV-2 virus vaccine candidate, which induced SARS-CoV-2–specific neutralizing antibodies in mice, rats, and nonhuman primates.
5. These data support the clinical development and testing of PiCoVacc for use in humans.
Answer: 23415
92. It seems likely that the details of the predictions depend on the assumed variations of the toughness parameter and the yield stress. Answer: It seems likely that
93. The Relationships of Meteorological Factors and Nutrient Levels with Phytoplankton Biomass in a Shallow Eutrophic Lake Dominated by Cyanobacteria, Lake Dianchi from 1991 to 2013
A. The SHs, WS, and TP concentrations controlled the bloom dynamics during the dry season, among which the TP concentration was the most important factor, whereas the TN and TP concentrations were the primary factors during the rainy season.
B. Interannual analysis revealed that the phytoplankton biomass increased with increases in air temperature and TP concentration, with TP concentration as the main contributing factor.
C. The results of our study demonstrated that both meteorological factors and nutrient levels had important roles in controlling cyanobacterial bloom dynamics.
D.
All of these results suggest that both climate change regulation and eutrophication management should be considered in strategies aimed at controlling cyanobacterial blooms.
E. In summary, we analyzed the effects of meteorological factors and nutrient levels on bloom dynamics in Lake Dianchi to represent the phytoplankton biomass.
F. Further studies should assess the effects of climate change and eutrophication on cyanobacterial bloom dynamics based on data collected over a longer duration and more frequent and complete variables, and appropriate measures should be proposed to control these blooms.
G. Decreasing nutrient levels, particularly the TP load, should be initially considered during the entire period and during the dry season, and decreasing both the TN and TP loads should be considered during the rainy season.
H. However, the relative importance of these factors may change according to precipitation patterns.
1. 2.B 3.A 4.G 5.  5. __________ Answer: F
94. (Same passage and options A-H as question 93.) 4. __________ Answer: D
95. (Same passage and options A-H as question 93.) 3. __________ Answer: H
96. (Same passage and options A-H as question 93.) 2. __________ Answer: C
97. (Same passage and options A-H as question 93.) 1. __________ Answer: E
98. It is rare to offer recommendations for future research in the Conclusion section.
Test Configuration Items: Beginner Single-Choice Question Bank
English Answer:
1. Which of the following is not a primary testing configuration item? Test environment. / Test data. / Test procedure. / Test case.
2. What is the purpose of a test plan? To define the scope, objectives, and approach for testing. / To identify the resources needed for testing. / To establish the criteria for passing or failing tests. / All of the above.
3. Which of the following is a benefit of using a test management tool? Improved test planning and execution. / Increased test coverage. / Reduced testing costs. / All of the above.
4. What is the purpose of regression testing? To ensure that changes to the software do not introduce new defects. / To identify defects that were not caught during initial testing. / To verify that the software meets its requirements. / None of the above.
5. Which of the following is a type of performance testing? Load testing. / Stress testing. / Endurance testing. / All of the above.
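Question 4 above defines regression testing as re-running existing checks after a change to make sure no new defects were introduced. As a minimal sketch of that idea (the function and case names here are illustrative, not from any specific tool or standard):

```python
def discount(price: float, rate: float) -> float:
    """Apply a percentage discount; its behavior is locked in by the suite below."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    return round(price * (1.0 - rate), 2)


def run_regression_suite() -> list:
    """Re-run all historical test cases; any failure indicates a regression."""
    failures = []
    # (price, rate, expected) pairs with expectations fixed from earlier releases
    cases = [
        (100.0, 0.2, 80.0),
        (19.99, 0.0, 19.99),
        (50.0, 1.0, 0.0),
    ]
    for price, rate, expected in cases:
        if discount(price, rate) != expected:
            failures.append("discount(%s, %s) != %s" % (price, rate, expected))
    return failures
```

After any change to `discount`, the whole suite is re-run; an empty failure list means no regression was introduced by the change.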
Checklist-Nature
Reporting Checklist For Life Sciences Articles
This checklist is used to ensure good reporting standards and to improve the reproducibility of published results. For more information, please read Reporting Life Sciences Research.
▸ Figure legends
• Check here to confirm that the following information is available in all relevant figure legends (or Methods section if too long):
• the exact sample size (n) for each experimental group/condition, given as a number, not a range;
• a description of the sample collection allowing the reader to understand whether the samples represent technical or biological replicates (including how many animals, litters, cultures, etc.);
• a statement of how many times the experiment shown was replicated in the laboratory;
• definitions of statistical methods and measures (for small sample sizes (n<5) descriptive statistics are not appropriate; instead plot individual data points):
o very common tests, such as t-test, simple χ2 tests, Wilcoxon and Mann-Whitney tests, can be unambiguously identified by name only, but more complex techniques should be described in the methods section;
o are tests one-sided or two-sided?
o are there adjustments for multiple comparisons?
o statistical test results, e.g., P values;
o definition of 'center values' as median or mean;
o definition of error bars as s.d. or s.e.m. or c.i.
This checklist will not be published. Please ensure that the answers to the following questions are reported in the manuscript itself. We encourage you to include a specific subsection in the Methods section for statistics, reagents and animal models. Below, provide the page number or section and paragraph number (e.g. "Page 5" or "Methods, 'reagents' subsection, paragraph 2").
Corresponding Author Name: ________________________________________
Manuscript Number: ______________________________
▸ Statistics and general methods (Reported in section/paragraph or page #:)
1.
How was the sample size chosen to ensure adequate power to detect a pre-specified effect size? (Give section/paragraph or page #) For animal studies, include a statement about sample size estimate even if no statistical methods were used.
2. Describe inclusion/exclusion criteria if samples or animals were excluded from the analysis. Were the criteria pre-established? (Give section/paragraph or page #)
3. If a method of randomization was used to determine how samples/animals were allocated to experimental groups and processed, describe it. (Give section/paragraph or page #) For animal studies, include a statement about randomization even if no randomization was used.
4. If the investigator was blinded to the group allocation during the experiment and/or when assessing the outcome, state the extent of blinding. (Give section/paragraph or page #) For animal studies, include a statement about blinding even if no blinding was done.
5. For every figure, are statistical tests justified as appropriate? Do the data meet the assumptions of the tests (e.g., normal distribution)? Is there an estimate of variation within each group of data? Is the variance similar between the groups that are being statistically compared? (Give section/paragraph or page #)
6. To show that antibodies were profiled for use in the system under study (assay and species), provide a citation, catalog number and/or clone number, supplementary information or reference to an antibody validation profile (e.g., Antibodypedia, 1DegreeBio).
7. Cell line identity:
a. Are any cell lines used in this paper listed in the database of commonly misidentified cell lines maintained by ICLAC (also available in NCBI Biosample)?
b. If yes, include in the Methods section a scientific justification of their use – indicate here on which page (or section and paragraph) the justification can be found.
c.
For each cell line, include in the Methods section a statement that specifies:
- the source of the cell lines
- have the cell lines been authenticated? If so, by which method?
- have the cell lines been tested for mycoplasma contamination?
In this checklist, indicate on which page (or section and paragraph) the information can be found.
▸ Animal Models (Reported in section/paragraph or page #:)
8. Report species, strain, sex and age of animals.
9. For experiments involving live vertebrates, include a statement of compliance with ethical regulations and identify the committee(s) approving the experiments.
10. We recommend consulting the ARRIVE guidelines (PLoS Biol. 8(6), e1000412, 2010) to ensure that other relevant aspects of animal studies are adequately reported.
▸ Human subjects (Reported in section/paragraph or page #:)
11. Identify the committee(s) approving the study protocol.
12. Include a statement confirming that informed consent was obtained from all subjects.
13. For publication of patient photos, include a statement confirming that consent to publish was obtained.
14. Report the clinical trial registration number (at ClinicalTrials.gov or equivalent).
15. For phase II and III randomized controlled trials, please refer to the CONSORT statement and submit the CONSORT checklist with your submission.
16. For tumor marker prognostic studies, we recommend that you follow the REMARK reporting guidelines.
17. Provide accession codes for deposited data. Data deposition in a public repository is mandatory for:
a. Protein, DNA and RNA sequences
b. Macromolecular structures
c. Crystallographic data for small molecules
d. Microarray data
Deposition is strongly recommended for many other datasets for which structured public repositories exist; more details on our data policy are available here. We encourage the provision of other source data in supplementary information or in unstructured repositories such as Figshare and Dryad.
We encourage publication of Data Descriptors (see Scientific Data) to maximize data reuse.
18. If computer code was used to generate results that are central to the paper's conclusions, include a statement in the Methods section under "Code availability" to indicate whether and how the code can be accessed. Include version information as necessary and any restrictions on availability.
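Several of the statistics items above (exact n, center values, s.d. versus s.e.m., one- versus two-sided tests) can be computed directly from raw data. A minimal standard-library sketch, assuming the hypothetical helper names `describe` and `welch_t` (the checklist itself mandates no particular code):

```python
import math
import statistics


def describe(sample):
    """Return n, mean, sample s.d., and s.e.m. - the quantities the
    checklist asks authors to define explicitly in figure legends."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)   # sample standard deviation (n - 1 denominator)
    sem = sd / math.sqrt(n)         # standard error of the mean
    return n, mean, sd, sem


def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two independent
    samples with possibly unequal variances (a two-sided comparison)."""
    n1, m1, s1, _ = describe(a)
    n2, m2, s2, _ = describe(b)
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df
```

Reporting the resulting t, df, and the named test (here, Welch's two-sided t-test) satisfies the "tests identified by name" and "definitions of measures" items; the p-value would then come from the t distribution with df degrees of freedom.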
PHASE Manual
Modelling Linkage Disequilibrium, And Identifying Recombination Hotspots Using SNP Data
Na Li∗ and Matthew Stephens†
July 25, 2003
∗ Department of Biostatistics, University of Washington, Seattle, WA 98195
† Department of Statistics, University of Washington, Seattle, WA 98195
Running Head: Linkage Disequilibrium and Recombination
Key Words: Linkage Disequilibrium, Population Genetics, Recombination Rate, Recombination Hotspot, Coalescent, Approximate Likelihood
Corresponding Author: Matthew Stephens, Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195; (206) 543-4302 (ph.); (206) 685-7419 (fax); stephens@

Abstract
We introduce a new statistical model for patterns of Linkage Disequilibrium (LD) among multiple SNPs in a population sample. The model overcomes limitations of existing approaches to understanding, summarizing, and interpreting LD by (i) relating patterns of LD directly to the underlying recombination process; (ii) considering all loci simultaneously, rather than pairwise; (iii) avoiding the assumption that LD necessarily has a "block-like" structure; and (iv) being computationally tractable for huge genomic regions (up to complete chromosomes). We examine in detail one natural application of the model: estimation of underlying recombination rates from population data.
Using simulation, we show that in the case where recombination is assumed constant across the region of interest, recombination rate estimates based on our model are competitive with the very best of currently available methods. More importantly, we demonstrate, on real and simulated data, the potential of the model to help identify and quantify fine-scale variation in recombination rate from population data. We also outline how the model could be useful in other contexts, such as in the development of more efficient haplotype-based methods for LD mapping.

1 Introduction
Linkage disequilibrium (LD) is the non-independence, at a population level, of the alleles carried at different positions in the genome. The patterns of LD observed in natural populations are the result of a complex interplay between genetic factors, and the population's demographic history. In particular, recombination plays a key role in shaping patterns of LD in a population. When a recombination occurs between two loci, it tends to reduce the dependence between the alleles carried at those loci, and thus reduce LD. Although recombination events in a single meiosis are relatively rare over small regions, the large total number of meioses that occur each generation in a population have a substantial cumulative effect on patterns of LD, and so molecular data from population samples contain valuable information on fine-scale variations in recombination rate.
Despite the undoubted importance of understanding patterns of LD across the genome, most obviously because of its potential impact on the design and analysis of studies to map disease genes in humans, most current methods for interpreting and analyzing patterns of LD suffer from at least one of the following limitations:
1. They are based on computing some measure of LD defined only for pairs of sites, rather than considering all sites simultaneously.
2. They assume a "block-like" structure for patterns of LD, which may not be appropriate at all loci.
3. They do not directly relate
patterns of LD to biological mechanisms of interest, such as the underlying recombination rate.
As an example of the limitations of current methods, consider Figure 1, which shows a graphical display of pairwise LD measures for six simulated data sets, simulated under various models for heterogeneity in the underlying recombination rate. The reader is invited to speculate on what the underlying models are in each case; the answer appears in the caption to Figure 8. In each of the six figures one could identify by eye, or by some quantitative criteria (e.g. Daly et al. 2001, Olivier et al. 2001, Wang et al. 2002), "blocks" of sites, such that LD tends to be high among markers within a block. In some cases there might also be little LD between markers in different "blocks", which might be interpreted as evidence for variation in local recombination rates: low recombination rates within the blocks, and higher rates between the blocks. Indeed, Jeffreys et al. (2001) have shown, using sperm-typing, that in the class II region of MHC, variations in local recombination rate are indeed responsible for block-like patterns of LD. However, without this type of experimental confirmation, which is currently technically challenging and time consuming, it is difficult to distinguish between blocks that arise due to recombination rate heterogeneity, and blocks that arise due to chance, perhaps through chance clustering of recombination events in the ancestry of the particular sample being considered (Wang et al. 2002). The ability to distinguish between these cases would of course be interesting from a basic science standpoint, for example in helping to identify sequence characteristics associated with recombination hotspots.

[Figure 1: Plots of LD measurement, |D′|, (lower right triangle) and p-value for Fisher's exact test (upper right triangle) for every pair of sites with minor allele frequency > 0.15, in data sets simulated under varying assumptions about variation in the local recombination rate. Details of the models used to simulate each data set appear in the caption to Figure 8, which is based on the same six data sets. Color keys: upper, FET p from > 0.1 down to < 0.0001; lower, |D′| from 0–0.2 up to 0.99–1.0.]

In addition, it would have important implications for the design and analysis of LD mapping studies. For example, it would help in predicting patterns of variation at sites that have not been genotyped (perhaps sites influencing susceptibility to a disease), and it would provide some indication of whether block structures observed in one sample are likely to be replicated in other samples, a crucial requirement for being able to select representative "tag" SNPs (Johnson et al. 2001) based on LD patterns observed in some reference sample.
In this paper we introduce a statistical model for LD that overcomes the limitations of existing approaches by relating genetic variation in a population sample to the underlying recombination rate. We examine in detail one natural application of the model: estimation of underlying recombination rates from population data. Using simulation, we show that in the case where recombination is assumed constant across the region of interest, recombination rate estimates based on our model are competitive with the very best of currently available methods. More importantly, we demonstrate, on real and simulated data, the potential of the model to help identify and quantify fine-scale variation in recombination rate (including "recombination hotspots") from population data.
Although we focus here on estimating recombination rates, we view the model as being useful more broadly, in interpreting and analyzing patterns of LD across multiple loci. In particular, as we outline in our discussion, the model could be helpful in the development of more efficient haplotype-based methods for LD mapping, along the lines of, for example, McPeek and Strahs (1999), Morris et al. (2000), and Liu et al. (2001).

2 Models
2.1 Background
The most successful current approaches to constructing statistical models
relating genetic variation to the underlying recombination rate (and to other genetic and demographic factors) are based on the coalescent (Kingman 1982), and its generalization to include recombination (Hudson 1983). Although these approaches are based on rather simplistic assumptions about the demographic history of the population from which individuals were sampled, and about the evolutionary processes acting on the genetic region being studied, they have nonetheless proven useful in a variety of applications. In particular, they provide a helpful simulation tool (e.g. software described in Hudson 2002), allowing more realistic data to be generated under various assumptions about underlying biology and demography, and hence aid exploration of what patterns of LD might be expected under different scenarios (Kruglyak 1999; Pritchard and Przeworski 2001). Despite the ease with which coalescent models can be simulated from, using these models for inference remains extremely challenging. For example, consider the problem of estimating the underlying recombination rate in a region, using data from a random population sample. It follows from coalescent theory that population samples contain information on the value of the product of the recombination rate c and the effective (diploid) population size N, but not on c and N separately. It has therefore become standard to attempt to estimate the compound parameter ρ = 4Nc, and several methods have been proposed. Some (e.g. Griffiths and Marjoram 1996; Nielsen 2000; Kuhner et al. 2000; Fearnhead and Donnelly 2001) try to make use of the full molecular data available. However, although such methods have been applied successfully to small regions and non-recombining parts of the genome (Harding et al. 1997; Hammer et al. 1998; Nielsen 2000; Kuhner et al. 2000; Fearnhead and Donnelly 2001), for even moderate-sized autosomal regions (e.g. a few kilobases in humans) they become computationally impractical (Fearnhead and Donnelly 2001). Other methods, many of which
are considered by Wall (2000), make use of only summaries of the data, substantially reducing computational requirements at the expense of some loss in efficiency. More recently, Hudson (2001) and Fearnhead and Donnelly (2002) proposed "composite likelihood" methods for estimating ρ over moderate to large genomic regions. Hudson's method is based on multiplying together likelihoods for every pair of sites genotyped, where these pairwise likelihoods are computed via simulation, assuming an "infinite-sites" mutation model (i.e. no repeat mutation). This method has been modified by McVean et al. (2002) to allow for repeat mutation. Fearnhead and Donnelly's method is based on dividing data on a large region into smaller regions, and multiplying likelihoods obtained for each smaller region. These methods, together with the best of the summary-statistic-based methods of Wall (2000), appear to be the most accurate of existing methods for estimating recombination rates from patterns of LD over moderate to large genomic regions. None of these methods, as currently implemented, allows explicitly for variation in recombination rate along the region under study.

2.2 A New Model

Here we describe a new model for LD, which enjoys many of the advantages of coalescent-based methods (e.g. it directly relates LD patterns to the underlying recombination rate) while remaining computationally tractable for huge genomic regions, up to entire chromosomes. Our model relates the distribution of sampled haplotypes to the underlying recombination rate, by exploiting the identity

Pr(h_1, ..., h_n | ρ) = Pr(h_1 | ρ) Pr(h_2 | h_1; ρ) ... Pr(h_n | h_1, ..., h_{n−1}; ρ),   (1)

where h_1, ..., h_n denote the n sampled haplotypes, and ρ denotes the recombination parameter (which may be a vector of parameters if the recombination rate is allowed to vary along the region).
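Identity (1) is simply the chain rule of probability applied to the joint distribution of the sample. As a sanity check, the toy example below (with made-up probabilities for a pair of single-locus binary observations, illustrative numbers only) verifies the factorization exactly:

```python
from itertools import product

# Made-up joint distribution over two binary observations (illustrative numbers only)
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def marginal(h1):
    """Pr(h1): sum the joint over the second observation."""
    return sum(p for (a, _), p in joint.items() if a == h1)

def conditional(h2, h1):
    """Pr(h2 | h1) = Pr(h1, h2) / Pr(h1)."""
    return joint[(h1, h2)] / marginal(h1)

# Identity (1) for n = 2: Pr(h1, h2) = Pr(h1) * Pr(h2 | h1), exactly
for h1, h2 in product([0, 1], repeat=2):
    assert abs(joint[(h1, h2)] - marginal(h1) * conditional(h2, h1)) < 1e-12
```

The identity is exact; the difficulty, as the text goes on to explain, is that the conditional factors are intractable for realistic genetic models and must be approximated.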
This identity expresses the unknown probability distribution on the left as a product of conditional distributions on the right. For simplicity we will often use the notation π to denote these conditional distributions. While the conditional distributions are not computationally tractable for models of interest, they are amenable to approximation, as we describe below. Our strategy is to substitute an approximation for these conditional distributions (π̂, say) into the right hand side of (1), to obtain an approximation to the distribution of the haplotypes h given ρ:

Pr(h_1, ..., h_n | ρ) ≈ π̂(h_1 | ρ) π̂(h_2 | h_1; ρ) ... π̂(h_n | h_1, ..., h_{n−1}; ρ).   (2)

We refer to this model as a "Product of Approximate Conditionals" (PAC) model, and to the corresponding likelihood as a PAC likelihood, which we denote L_PAC. Explicitly,

L_PAC(ρ) = π̂(h_1 | ρ) π̂(h_2 | h_1; ρ) ... π̂(h_n | h_1, ..., h_{n−1}; ρ).   (3)

Similarly, we will refer to the value of ρ that maximizes L_PAC as a maximum PAC likelihood estimate for ρ, and denote it by ρ̂_PAC. The utility of the model (3) will naturally depend on the use of an appropriate approximation for the conditional distribution π. This approximation should be designed to answer the following question: if, at a particular locus, in a random sample of k chromosomes from a population, we observe genetic types h_1, ..., h_k, what is the conditional distribution of the type of the next sampled chromosome, Pr(h_{k+1} | h_1, ..., h_k)? We are aware of three forms for π in the literature, each of which attempts to answer this question under different assumptions for the genetic model underlying the loci being studied. The first and best-known comes from the Ewens sampling formula (Ewens 1972). This arises from considering a neutral locus in a randomly-mating population, evolving with constant (diploid) size N and mutation rate µ per generation, and assuming an "infinite alleles" mutation model, where each mutation creates a novel (previously unseen) haplotype.
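The factorization in (3) translates directly into code: L_PAC is a product of per-haplotype conditional terms, so its log is a sum. The sketch below shows only this structure; `toy_pi_hat` is a hypothetical placeholder conditional (it even ignores ρ), not the π_A or π_B defined in the appendices.

```python
import math

def pac_log_likelihood(haplotypes, rho, pi_hat):
    """log L_PAC(rho) = sum over k of log pi_hat(h_k | h_1, ..., h_{k-1}; rho)."""
    log_lik = 0.0
    for k, h in enumerate(haplotypes):
        log_lik += math.log(pi_hat(h, haplotypes[:k], rho))
    return log_lik

def toy_pi_hat(h, previous, rho):
    """Hypothetical placeholder conditional (NOT the paper's pi_A or pi_B):
    mixes 'copy a previously seen haplotype' with a small novelty term.
    Note it ignores rho entirely; the real pi depends on it."""
    uniform = 0.5 ** len(h)                # uniform over binary haplotypes
    if not previous:
        return uniform
    novelty = 1.0 / (len(previous) + 2.0)  # shrinks as more haplotypes are seen
    copy_prob = sum(1 for p in previous if p == h) / len(previous)
    return (1.0 - novelty) * copy_prob + novelty * uniform

haps = [(0, 1, 0), (0, 1, 0), (1, 1, 0)]
log_lik = pac_log_likelihood(haps, rho=5.0, pi_hat=toy_pi_hat)
```

Maximizing this quantity over ρ (with a serious π̂ in place of the toy) is exactly the maximum PAC likelihood estimation described in the text.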
Under these idealized conditions, if we let θ = 4Nµ, then with probability k/(k+θ) the (k+1)st haplotype is an exact copy of one of the first k haplotypes, chosen at random; otherwise it is a novel haplotype. Although the assumptions underlying this formula will never hold in practice, it does capture the following properties that we would expect to hold more generally:

(i) the next haplotype is more likely to match a haplotype that has already been observed many times than one that has been observed less frequently.

(ii) the probability of seeing a novel haplotype decreases as k increases.

(iii) the probability of seeing a novel haplotype increases as θ increases.

However, for modern molecular data, and for sequence data and SNP data in particular, it fails to capture the two following properties:

(iv) if the next haplotype is not exactly the same as an existing (i.e. previously-seen) haplotype, it will tend to differ by a small number of mutations from an existing haplotype, rather than to be completely different from all existing haplotypes.

(v) due to recombination, the next haplotype will tend to look somewhat similar to existing haplotypes over contiguous genomic regions, the average physical length of these regions being larger in areas of the genome where the local rate of recombination is low.

Stephens and Donnelly (2000) suggested a form for π that captures properties (i)-(iv) above. In their suggested form for π, the next haplotype differs by M mutations from a randomly-chosen existing haplotype, where M has a geometric distribution with Pr(M = 0) = k/(k+θ) (so that it reproduces the Ewens sampling formula in the special case of the infinite alleles mutation model).
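The Ewens sampling formula conditional above is straightforward to simulate. The toy sketch below reduces haplotypes to opaque labels (the label scheme is our own, purely illustrative): the (k+1)st draw copies a uniformly chosen existing haplotype with probability k/(k+θ) and is otherwise novel, which is exactly where properties (i)-(iii) come from.

```python
import random

def ewens_next(existing, theta, rng):
    """Draw the (k+1)st haplotype: with probability k/(k+theta) copy a
    uniformly chosen existing haplotype (so common types are copied more
    often, property (i)); otherwise return a brand-new label."""
    k = len(existing)
    if rng.random() < k / (k + theta):
        return rng.choice(existing)
    return f"novel-{k}"  # k strictly increases between draws, so labels are unique

def grow_sample(n, theta, seed=0):
    """Build a sample of size n one haplotype at a time."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        sample.append(ewens_next(sample, theta, rng))
    return sample

# Property (ii): the novelty probability theta/(k+theta) falls as k grows.
# Property (iii): for fixed k it rises with theta, so larger theta yields
# more distinct haplotypes in a sample of the same size.
sample = grow_sample(100, theta=1.0)
```

Note that the very first draw is always novel (k = 0 gives copy probability 0), and repeated copying makes already-common labels ever more likely to be copied again.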
Thus the next haplotype is a (possibly imperfect) "copy" of a randomly-chosen existing haplotype. Fearnhead and Donnelly (2001) (henceforth FD) extended this form for π to also capture property (v) above. In FD's approximation, the (k+1)st haplotype is made up of an imperfect mosaic of the first k haplotypes, with the size of the mosaic fragments being smaller for higher values of the recombination rate. Here we use two new forms for π that also capture properties (i)-(v) above. The first, described in detail in Appendix A and illustrated in Figure 2, and which we denote π_A, is a simplification of FD's approximation that is easier to understand and slightly quicker to compute. (Dr. N. Patterson, personal communication, has independently suggested a similar simplification.) The second, which we describe in detail in Appendix B and denote π_B, is a slight modification of π_A, developed using empirical results from Section 3.1 to produce a likelihood L_PAC that gives more accurate estimates of ρ. Where necessary, we denote the PAC likelihoods and maximum PAC likelihood estimates corresponding to π_A (respectively π_B) by L_PAC-A and ρ̂_PAC-A (respectively L_PAC-B and ρ̂_PAC-B). A key property of both π_A and π_B is that they are easy and fast to compute. Unlike the Ewens sampling formula, but like the approximations of Stephens and Donnelly (2000) and FD, neither corresponds exactly to the actual conditional distribution under explicit assumptions about population demography and the evolutionary forces on the locus under consideration. Indeed, no closed-form expressions for π, based on such explicit assumptions, and capturing (iv) or (v), are known. However, the suggested forms for π were motivated by considering both the Ewens sampling formula, and the underlying genealogy (or, in the case with recombination, genealogies) relating a random sample of haplotypes from a neutrally-evolving, constant-sized panmictic population. As such, it may be helpful to view them as approximations to the (unknown) true conditional distribution under these
assumptions. In particular, there are certain aspects of many real populations (e.g. population expansion, or population structure), and biological factors (e.g. gene conversion, selection), that these forms for π do not attempt to capture. For some applications this may not matter very much. For others it may be necessary to develop forms for π that do capture these aspects, a point we return to in the discussion. An unwelcome feature of the PAC likelihoods corresponding to our choices of π (and indeed the forms for π from Stephens and Donnelly (2000) and FD) is that they depend on the order in which the haplotypes are considered. In other words, although these likelihoods each correspond to a valid probability distribution on the haplotypes, these probability distributions do not enjoy the property of exchangeability that we would expect to be satisfied by the true (unknown) distribution. Practical experience, and theory in Stephens and Donnelly (2000) (their Proposition 1, part d), suggests that this problem cannot be rectified by making a simple modification to π. Although in principle the dependence on ordering could be removed by averaging the PAC likelihood over all possible orderings of the haplotypes, in practice this would require a sum over n! terms, which is infeasible even for rather small values of n. Instead, as a pragmatic alternative solution, we propose to average L_PAC over several random orders of the haplotypes. Unless otherwise stated, all results reported here were obtained by averaging over 20 random orders. In our experience, the performance of the method is not especially sensitive to the number of random orders used: results based on 100 random orders gave qualitatively similar results, and results based on a single random order were often not much worse (data not shown). It is, however, important that when comparing likelihoods for different values of ρ, the same set of random orders should be used for each value of ρ.

3 Estimating constant recombination rate

In this section we consider
estimating the recombination rate when it is assumed to be constant across the region of interest. More precisely, we assume that crossovers in a single meiosis occur as a Poisson process of constant rate c per unit (physical) distance, and consider estimating the scalar parameter ρ = 4Nc.

[Figure 2: Illustration of how π_A(h_{k+1} | h_1, ..., h_k) builds h_{k+1} as an imperfect mosaic of h_1, ..., h_k. The figure illustrates the case k = 3, and shows two possible values (h_4A and h_4B) for h_4, given h_1, h_2, h_3. Each of the possible h_4s can be thought of as having been created by "copying" (imperfectly) parts of h_1, h_2 and h_3. The shading in each case shows which haplotype was "copied" at each position along the chromosome. Intuitively, we think of h_4 as having recent shared ancestry with the haplotype that it copied in each segment. We assume that the copying process is Markov along the chromosome, with jumps (i.e. changes in the shading) occurring at rate ρ/k per unit physical distance. Thus the more frequent jumps in h_4B suggest a higher value of ρ than the less frequent jumps in h_4A. Note that for very large values of ρ the loci become independent, as they should. Each column of circles represents a SNP locus, with black and white representing the two alleles. The imperfect nature of the copying process is exemplified at the third locus, where each of h_4A and h_4B has the black allele, although they "copied" h_2, which has the white allele. In practice, of course, the shading is not observed, and so to compute the probability of observing a particular h_4 we must sum over all possible shadings. The Markov assumption allows us to do this efficiently using standard methods for Hidden Markov Models, as described in Appendix A.]

We first use simulated data to examine the properties of the estimator ρ̂_PAC-A, corresponding to the conditional distribution π_A described in Appendix A, under what we will call the "standard coalescent model": constant-sized, panmictic population, with an infinite-sites
mutation model. We show that, although quite accurate, ρ̂_PAC-A exhibits a systematic bias. We use the empirical results to develop a modified conditional distribution π_B (described in detail in Appendix B), whose corresponding estimator ρ̂_PAC-B exhibits considerably less bias, and is more accurate. We compare the performance of models based on both π_A and π_B with results from other methods.

3.1 Properties of the point estimate ρ̂_PAC

We used the program mksample (Hudson 2002) to simulate data sets consisting of samples of SNP haplotypes from the standard coalescent model, for various values of:

1. the number n of haplotypes in the sample.
2. the number S of markers typed.
3. the value of ρ. (We measure physical distance so that the total physical length of each simulated haplotype equals 1.0; thus the value of ρ is also the total value of ρ across the region.)

For each data set we found ρ̂_PAC-A by numerically maximizing the PAC likelihood (using a golden bisection search method, Press et al. 1992), and compared it with the true value of ρ used to generate the data. It seems natural to measure the error in estimates for ρ on a relative, rather than an absolute, scale. For example, Wall (2000) reported the frequency with which different methods for estimating ρ gave estimates within a factor of 2 of the true value, and both FD and Hudson (2001) examine the distribution of the ratio ρ̂/ρ for their estimates ρ̂, and the deviation of this ratio from the "optimal" value of 1. A problem with working with this ratio directly is that it tends to penalize over-estimation more heavily than under-estimation. For example, overestimating ρ by a factor of 10 gives a larger deviation from 1 than underestimating ρ by a factor of 10. To avoid this problem, we quantify the relative error of an estimate ρ̂ for ρ by Err(ρ, ρ̂) = log10(ρ̂/ρ). This gives, for example, an error of 0 if ρ̂ = ρ, an error of 1 if ρ̂ overestimates ρ by a factor of 10, and an error of −1 if ρ̂ underestimates ρ by a factor of 10. We note that Err(ρ, ρ̂) can also be viewed as the error (on an absolute scale) in
estimating log10(ρ) by log10(ρ̂). Thus, if the usual asymptotic theory for maximum likelihood estimation applies for estimation of log10(ρ) in this setting (which, as discussed in FD, it may not), then for the actual MLE ρ̂_MLE of ρ, Err(ρ, ρ̂_MLE) would be asymptotically normally distributed, centered on 0. Optimistically, we might therefore hope that for sufficiently large data sets (large in terms of the number of haplotypes, the number of markers, or both) Err(ρ, ρ̂_PAC-A) might be approximately normally distributed, centered on 0. In our simulations, we found that for some combinations of n, S and ρ this did indeed appear to be the case (e.g. Figure 3(b)), but that for other combinations, although the distribution often appeared close to normal, it was centered around some non-zero value (e.g. Figure 3(a), (c)), indicating a systematic tendency for ρ̂_PAC-A to over- or under-estimate ρ. We will refer to the median of Err as the "bias" (of log(ρ̂_PAC-A) in estimating log(ρ)). Although bias is usually defined as a mean error, this is not particularly helpful here, since the mean is often heavily influenced by a small number of very large values, and may even be infinite in some cases (see also FD). We therefore follow previous authors, including Hudson (2001) and FD, in concentrating on the behavior of the median, rather than the mean, of the error. Despite the biases evident in Figure 3(a) and (c), ρ̂_PAC-A gives reasonably accurate estimates of ρ. For example, even in the right-hand panel (c) of Figure 3, which shows one of the most extreme biases we observed in our simulations, the bias corresponds to underestimating ρ by approximately a factor of 2, and ρ̂_PAC-A is within a factor of 2 of the true value of ρ in 68% of cases. Although in many statistical applications estimates within a factor of 2 of the truth would not be considered particularly helpful or impressive, in this setting this kind of accuracy is often not easy to achieve (see for example Wall 2000). We performed extensive simulations to better characterize the bias noted above, and
found that although the bias depends on all 3 variables (n, S, and ρ), it is especially dependent on the average spacing between sites. More specifically, for fixed n and S we observed a striking linear relationship between the bias and the log of the average marker spacing (Figure 4). This linear relationship was also apparent for data simulated under an assumption of population expansion (data not shown). The slope of the linear relationship is negative in each case, indicating a tendency for ρ̂_PAC-A to overestimate ρ when the markers are very closely spaced, and underestimate ρ when the markers are far apart. As the number of sampled haplotypes increases, both the slope and intercept of the line appear to get closer to 0 (Table 1). Based on these empirical results we can modify π_A to reduce the bias of the point estimates (see Appendix B for details). The improved performance of this modified conditional distribution, which we denote π_B, is illustrated in the next section. Figure 4 also illustrates the effect of varying parameter values on the variability of point estimates. As might be expected, the variance of the error reduces with increased sample size and increased number of sites, with the latter providing the more substantial decrease. For example, doubling the number of sites from 50 to 100 roughly halved the variance of the error in most cases, while doubling the number of individuals from 50 to 100 resulted in much smaller decreases. For a fixed sample size and number of sites, the variance of the error decreases as the spacing between sites grows. This may be due to the fact that for larger spacings more recombination events occur, increasing the relative accuracy with which ρ can be estimated, although we would not expect this pattern to continue indefinitely as the marker spacing is increased beyond the range considered here.

[Figure 3: Histograms of the error Err(ρ, ρ̂_PAC-A) = log10(ρ̂_PAC-A/ρ), each based on 100 data sets simulated from the standard coalescent model with n = 50 haplotypes and S = 50 segregating sites. The values of ρ are (a) ρ = 5, (b) ρ = 25, and (c) ρ = 500. Superposed curves are normal densities with the same mean and standard deviation as the 100 values making up the histogram. These results, as well as those in Figure 4 and Table 1, are based on averaging the likelihoods over 10 random orders of the haplotypes.]

Table 1: The intercepts and slopes of the linear relationship between log10(ρ̂_PAC/ρ) and log10(spacing) (see also Figure 4).

              Intercept                 Slope
n \ S     20      50     100       20      50     100
20     −0.16   −0.12   −0.09    −0.18   −0.21   −0.26
50     −0.12   −0.07   −0.06    −0.16   −0.21   −0.24
100    −0.12   −0.06   −0.04    −0.09   −0.17   −0.21
200    −0.10   −0.05   −0.02    −0.06   −0.14   −0.17

[Figure 4: Box plots showing the relationship of the bias to the average marker spacing. For each combination of parameters, 100 data sets each were simulated under the standard coalescent model. The parameters involved are: the number of haplotypes in each sample n = 20, 50, 100, 200; the number of segregating sites S = 20, 50, 100; and the average marker spacing ρ/S = 0.1, 0.5, 1.0, 5.0 and 10.0. In humans a marker spacing of ρ/S = 0.5 corresponds to roughly 1 kb between markers. The unlabeled tick marks on the y-axis correspond to ρ̂_PAC-A = 2ρ and ρ̂_PAC-A = ρ/2.]
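The copying-process view of π_A described in the Figure 2 caption lends itself to a compact implementation via the forward algorithm for Hidden Markov Models. The sketch below is a minimal illustration, not the paper's exact Appendix A construction: the transition and emission probabilities, and the mutation parameter theta used in them, are simplified stand-ins for the forms given there.

```python
import math

def copying_hmm_conditional(h_next, prev_haps, rho, theta, distances):
    """Forward algorithm for a copying HMM in the spirit of pi_A.

    Hidden state: which of the k previous haplotypes is being copied at
    each SNP. Transitions: with probability exp(-rho*d/k) keep copying the
    same haplotype across a gap of length d, otherwise jump to one chosen
    uniformly at random. Emissions: the copied allele is reproduced with
    high probability and "mutated" with a small probability governed by
    theta. (Simplified illustration; exact forms are in Appendix A.)
    """
    k = len(prev_haps)
    p_match = k / (k + theta) + 0.5 * theta / (k + theta)
    p_mismatch = 0.5 * theta / (k + theta)

    # forward[x] = joint probability of the first j sites and copying haplotype x at site j
    forward = [
        (1.0 / k) * (p_match if prev_haps[x][0] == h_next[0] else p_mismatch)
        for x in range(k)
    ]
    for j in range(1, len(h_next)):
        stay = math.exp(-rho * distances[j - 1] / k)
        total = sum(forward)
        new_forward = []
        for x in range(k):
            trans = stay * forward[x] + (1.0 - stay) * total / k
            emit = p_match if prev_haps[x][j] == h_next[j] else p_mismatch
            new_forward.append(trans * emit)
        forward = new_forward
    return sum(forward)  # approximate pi(h_next | prev_haps; rho), summed over shadings

prev = [(0, 0, 1), (1, 0, 0), (1, 1, 1)]
p = copying_hmm_conditional((1, 0, 1), prev, rho=5.0, theta=0.1, distances=[0.5, 0.5])
```

Summing the forward variables over the hidden copying states is exactly the "sum over all possible shadings" mentioned in the Figure 2 caption, and costs O(S·k²) time rather than the k^S of naive enumeration.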
An English essay on learning to live within your means
Title: Learning the Principle of "Input Equals Output"
In the pursuit of mastering a new language, the principle of "input equals output" holds immense significance. Essentially, it suggests that the quality and quantity of information we absorb in the form of input directly influence the quality and fluency of our output in that language. This principle underpins the very essence of language acquisition, emphasizing the importance of immersion, practice, and exposure. In this essay, we will delve into the various aspects of this principle, its implications, and effective strategies for implementing it in our language learning journey.

First and foremost, let us explore the significance of input in language acquisition. Input refers to any form of exposure to the target language, including reading, listening, watching videos, and interacting with native speakers. It serves as the foundation upon which language skills are built. When we expose ourselves to a diverse range of input materials, we are essentially feeding our brains with the building blocks of the language: vocabulary, grammar structures, idiomatic expressions, and cultural nuances. The more varied and extensive our input, the richer our linguistic repertoire becomes.

Moreover, input plays a pivotal role in developing our receptive skills, namely listening and reading comprehension. Through extensive listening and reading practice, we not only enhance our understanding of the language but also internalize its patterns and structures. This subconscious assimilation of linguistic elements lays the groundwork for proficient language production.

However, the mere accumulation of input is insufficient without active engagement and processing. This is where the principle of "input equals output" comes into play. To effectively internalize and utilize the input we receive, we must engage in meaningful practice and output activities. Output encompasses speaking and writing, the active expression of language.
By engaging in regular speaking and writing practice, we reinforce our grasp of the language, solidify our understanding of its nuances, and refine our communication skills.

Furthermore, the principle of "input equals output" underscores the importance of feedback in the language learning process. Constructive feedback acts as a guiding light, illuminating our strengths and weaknesses and pointing out areas for improvement. Whether it comes from teachers, language exchange partners, or language learning apps, feedback helps us fine-tune our language skills and progress towards fluency.

Now, let us explore some effective strategies for implementing the principle of "input equals output" in our language learning journey:

1. Immersive Learning: Surround yourself with the target language as much as possible. Immerse yourself in authentic language environments by watching movies, listening to music, and engaging with native speakers.

2. Extensive Reading and Listening: Devote time to extensive reading and listening practice. Choose materials that are slightly above your current proficiency level to challenge yourself while still maintaining comprehension.

3. Active Engagement: Actively engage with the input material by taking notes, summarizing, and rephrasing. This helps reinforce your understanding and retention of the language.

4. Regular Speaking Practice: Practice speaking regularly, whether with language partners, tutors, or through language exchange platforms. Don't be afraid to make mistakes; they are an integral part of the learning process.

5. Structured Writing Practice: Set aside time for regular writing practice. Start with simple exercises such as journaling, writing summaries, or composing emails, and gradually progress to more complex writing tasks.

6. Seek Feedback: Solicit feedback from teachers, language exchange partners, or online communities.
Use this feedback constructively to identify areas for improvement and tailor your learning approach accordingly.

In conclusion, the principle of "input equals output" encapsulates the essence of effective language learning. By immersing ourselves in diverse input materials, actively engaging with the language, and seeking feedback, we can accelerate our progress towards language proficiency and fluency. Remember, language learning is a journey, not a destination: embrace the process, stay motivated, and never cease to explore the boundless possibilities that language acquisition offers.
Techniques and Methods for English Listening Test Preparation
Familiarity with professional vocabulary in different fields
Summary: Familiarity with professional vocabulary in different fields can help candidates better understand listening materials. Candidates should have knowledge of professional terminology and commonly used vocabulary in different fields.
Use authentic recordings, such as podcasts, news broadcasts, or conversations, to improve familiarity with real-world English
Utilize multiple listening resources
Detailed description
Candidates can continuously accumulate new vocabulary by reading original English works, news reports, and academic papers, and can try to apply it in speaking and writing to deepen their understanding and memory of the vocabulary.
Consider the topics and language used
Select materials that cover a range of topics and use language that is similar to that found in the target test
Assumptions of Parametric Tests

Parametric tests are statistical tests used to make inferences about a population based on a sample. These tests are widely used in various fields, including psychology, economics, biology, and sociology, to name a few. However, they come with a set of underlying assumptions that must be met for the results to be valid and reliable. In this article, we will explore the assumptions of parametric tests and discuss them step by step.

1. Independence: One of the fundamental assumptions of parametric tests is that the observations within a sample or between samples must be independent of each other. This means that the value of one observation should not be influenced by the value of another observation. Violating this assumption can lead to biased and inaccurate results. To ensure independence, researchers often use random sampling or random assignment techniques in their studies.

2. Normality: Another important assumption is that the data should follow a normal distribution. This assumption holds for many parametric tests, including t-tests and analysis of variance (ANOVA). A normal distribution is symmetrical and follows a bell-curve shape. Violating the normality assumption can lead to Type I or Type II errors and affect the validity of the statistical results. Researchers often check for normality using graphical methods, such as histograms or Q-Q plots, or statistical tests, such as the Shapiro-Wilk test or the Kolmogorov-Smirnov test.

3. Homogeneity of variance: Homogeneity of variance assumes that the variances of the groups being compared should be equal. This assumption is crucial for tests such as ANOVA or t-tests that compare means across different groups. Violating this assumption can lead to inaccurate p-values and unreliable results. Researchers often use statistical tests, such as Levene's test or Bartlett's test, to assess the homogeneity of variance assumption.

4.
Linearity: Linearity assumes that there is a linear relationship between the independent variable(s) and the dependent variable. This assumption is particularly relevant in regression analysis or analysis of covariance (ANCOVA). Violating the linearity assumption can lead to biased estimates and invalid inferences. Researchers often assess linearity by examining scatterplots or by testing for interaction effects in their models.

5. Absence of outliers: Outliers are extreme values that deviate significantly from the rest of the data. They can have a substantial impact on the results of parametric tests. It is important to identify and deal with outliers appropriately to avoid biased estimates and incorrect inferences. Researchers often use graphical methods, such as boxplots or scatterplots, or statistical measures, such as the Z-score or the Mahalanobis distance, to detect outliers.

It is worth noting that not all parametric tests share the same assumptions. The ones mentioned above are the most common across various parametric tests, but some specific tests may have additional assumptions. For example, the assumptions of normality and independence are particularly important for parametric tests involving regression analysis or multivariate analysis.

In conclusion, parametric tests are powerful tools for making statistical inferences. However, they come with assumptions that must be met for the results to be valid and reliable. These assumptions include independence, normality, homogeneity of variance, linearity, and absence of outliers. Researchers need to carefully assess these assumptions before applying parametric tests to ensure the accuracy and validity of their findings.
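As a brief sketch of how the normality and homogeneity-of-variance checks above might look in practice, the following uses SciPy's Shapiro-Wilk and Levene tests on two simulated groups. The group names, the simulated data, and the 0.05 threshold are illustrative choices for this sketch, not prescriptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(loc=5.0, scale=1.0, size=40)  # simulated sample, group A
group_b = rng.normal(loc=5.5, scale=1.0, size=40)  # simulated sample, group B

# Normality (assumption 2): Shapiro-Wilk test on each group
_, p_norm_a = stats.shapiro(group_a)
_, p_norm_b = stats.shapiro(group_b)

# Homogeneity of variance (assumption 3): Levene's test across groups
_, p_levene = stats.levene(group_a, group_b)

# Proceed to the t-test only if neither assumption is clearly violated
if min(p_norm_a, p_norm_b) > 0.05 and p_levene > 0.05:
    t_stat, p_value = stats.ttest_ind(group_a, group_b)
else:
    # Fall back to a rank-based alternative such as the Mann-Whitney U test
    u_stat, p_value = stats.mannwhitneyu(group_a, group_b)
```

Failing an assumption check does not end the analysis; it simply redirects it, here toward a nonparametric alternative that does not require normality or equal variances.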
Primary School (First Semester) English Midterm Exam No. 4, Unit 2
Primary School (First Semester) English Unit 2 Midterm Exam

English Test Questions

Part I: Comprehensive questions (100 questions, 1 point each, 100 points total; unanswered or incorrect answers receive no points.)

1. The capital of Armenia is ________ (Yerevan).
2. Did you ever see a _______ (little dragonfly) hovering by the water?
3. I like to ______ (participate) in student council meetings.
4. The cat is under the ______ (table).
5. My sister enjoys __________ (playing with friends).
6. A ______ is a geographical feature that influences local climates.
7. Kittens are baby _______ (cats).
8. We enjoy _____ with our friends. (playing)
9. I always ______ my teeth after meals.
10. How many vowels are in the English language? A. Three B. Four C. Five D. Six
11. The ______ is known for its elaborate courtship dance.
12. A __________ is formed when rock layers are folded.
13. The forecast predicts a ______ (scorching hot) week.
14. The chemical formula for sodium chloride is _______.
15. What do we call a female chicken? A. Rooster B. Hen C. Duck D. Goose (Answer: B)
16. The ______ (little bird) builds a nest in the spring.
17. What do we call the process of a liquid turning into a gas? A. Evaporation B. Condensation C. Freezing D. Melting (Answer: A)
18. The ancient Egyptians had a complex social ________ (hierarchy).
19. The __________ (historical geography) shapes cultural development.
20. The __________ is a layer that protects life on earth.
21. What do we call the person who studies space? A. Astronomer B. Astrologer C. Meteorologist D. Physicist (Answer: A, Astronomer)
22. What do we call the study of the interactions between organisms and their environment? A. Ecology B. Biology C. Botany D. Zoology (Answer: A)
23. What do we call a baby cat? A. Kitten B. Puppy C. Cub D. Foal (Answer: A)
24. My dad encourages me to be __________ (creative) in my projects.
25. We have _____ (many) friends.
26. The _____ (lamp/desk) is bright.
27. My collection of ____ is displayed on my shelf. (toy name)
28. I like to practice ______ (writing) stories. It helps me improve my language skills and creativity.
29. A ____ is a tiny animal with whiskers that likes to explore.
30. My brother plays the ____ (keyboard) in a band.
31. On rainy days, I like to ______ (verb) with it indoors. It always makes me feel ______ (adjective).
32. What do we call a series of connected points? A.
LineB. CurveC. ShapeD. Path33.What is the chemical symbol for oxygen?A. OB. O2C. O3D. OzA34.advocacy organization) campaigns for change. The ____35.What is the name of the famous landmark in Sydney?A. Opera HouseB. Harbour BridgeC. Bondi BeachD. Great Barrier ReefA Opera House36.__________ are important for the growth of crops.37.The ______ (植物的物种多样性) is crucial for resilience.38.What is the name of the planet known for its rings?A. SaturnB. JupiterC. MarsD. NeptuneA39.What is the term for a baby rabbit?A. KitB. CubC. FoalD. PupA Kit40.What instrument has keys and is played by pressing?A. GuitarB. DrumsC. PianoD. Flute41.I have a magical ________ (玩具名称).42.The _____ (tulip) is blooming.43.The chemical formula for hydrobromic acid is _____.44.I believe in setting goals. This year, my goal is to __________.45.What do you put on your head to protect it from the sun?A. ShoesB. GlovesC. HatD. ScarfC Hat46.What is the smallest unit of life?A. OrganB. TissueC. CellD. Organism47.What is the color of grass?A. BlueB. GreenC. YellowD. Brown48.The ____ has a distinctive call and is known for its intelligence.49.What color do you get when you mix red and white?A. PinkB. PurpleC. BrownD. Gray50.The _____ (灯塔) guides ships.51.I built a spaceship with my toy ____. (玩具名称)52.The turtle is _____ slowly. (moving)53.The _____ is the distance between two celestial bodies.54.Flowers can express ______ (情感).55.I like to ______ (学习) about animals.56.The sloth sleeps most of the ________________ (时间).57.I want to _____ (go/stay) home now.58.What do you call the skin covering the body?A. EpidermisB. DermisC. TissueD. MuscleA59.What do you call a book that tells a fictional story?A. NovelB. BiographyC. AnthologyD. TextbookA60. A _______ can measure the amount of energy used by appliances.61.My brother loves to __________ (参加) school events.62.What is the color of grass?A. BlueB. YellowC. GreenD. Purple63.The teacher is _______.64.We go swimming in the ______. 
(summer)65.The chemical formula for potassium bromide is ______.66.I like to watch ________ (体育赛事) on TV.67.What do you call a baby kangaroo?A. JoeyB. CalfC. KittenD. PupA68.What is the capital city of Tunisia?A. TunisB. SfaxC. KairouanD. Bizerte69.I like to visit ______ during spring break.70.The fruit is ___. (juicy)71.Every morning, I write in my _______ (日记). It helps me organize my _______ (思想).72.My friend is a ______. He enjoys inventing new things.73. (50) is known for its rainforests. The ____74.What is the color of a ripe tomato?A. GreenB. YellowC. RedD. Blue75.What is the name of the longest river in the world?A. AmazonB. NileC. MississippiD. YangtzeB76.Cherry _______ are red and delicious to eat.77. A ______ is a natural feature that can be explored for research.78.The _______ of light can create different colors when passing through a prism.79.What do we call the person who teaches us in school?A. DoctorB. TeacherC. ChefD. EngineerB80.My favorite holiday is ________ (万圣节) for the costumes.81.When I grow up, I want to be __________ because I want to help people by__________. I admire __________ because he/she is very __________ and inspires others.82.My brother plays the ____ (keyboard) in a band.83.What do we call a period of one hundred years?A. CenturyB. DecadeC. MillenniumD. Generation84. A _______ is a small plant that grows close to the ground.85.How many months are in a year?A. 10B. 11C. 12D. 13C86.Plants can also _____ (提供) food for animals.87.What is the shape of a soccer ball?A. SquareB. TriangleC. CircleD. Rectangle88.What is the main source of water on Earth?A. RiversB. LakesC. OceansD. PondsC89.My brother is a ______. He enjoys playing chess.90.My sister is my best _______ because we laugh together.91. a is the largest ________ (沙漠) in the world. The Samu92.ipation Proclamation freed the _______ in the United States. (奴隶) The Eman93.I enjoy making ________ (生日卡) for my friends.94.Which fruit is red and often mistaken for a vegetable?A. 
TomatoB. StrawberryC. WatermelonD. CherryA95.The _____ (天空) is blue.96. A rabbit has long _________. (耳朵)97. A ____(team-building exercise) fosters collaboration.98.I have a ______ (collection) of coins.99. A __________ is a layer of rock below the soil.100. A __________ is an example of a physical change.。
Particle Filtering Approaches for Multiple Acoustic Source Detection and 2-D Direction of Arrival Estimation Using a Single Acoustic Vector Sensor

Xionghu Zhong, Member, IEEE, and A. B. Premkumar, Senior Member, IEEE

Abstract—This paper considers the problem of tracking multiple acoustic sources using a single acoustic vector sensor (AVS). Firstly, a particle filtering (PF) approach is developed to track the directions of arrival of a fixed and known number of sources. Secondly, a more realistic tracking scenario, which assumes that the number of acoustic sources is unknown and time-varying, is considered. A random finite set (RFS) framework is employed to characterize the randomness of the state process, i.e., the dynamics of source motion and the number of active sources, as well as the measurement process. As deriving a closed-form solution for the multi-source probability density is difficult, a particle filtering approach is employed to arrive at a computationally tractable approximation of the RFS densities. The proposed RFS-PF algorithm is able to simultaneously detect and track multiple sources. Simulations under different tracking scenarios demonstrate the ability of the proposed approaches in tracking multiple acoustic sources.
Index Terms—Acoustic vector sensor, detection and tracking, direction of arrival, particle filtering, random finite set.

I. INTRODUCTION

DETECTION, localization and tracking of 2-D (azimuth and elevation) directions of arrival (DOA) of multiple acoustic sources in a noisy environment are important topics in signal processing and have many applications such as room speech enhancement, underwater target surveillance, sonar and acoustic radar signal processing. The tasks are traditionally performed by using an array equipped with several pressure sensors together with estimation techniques developed based on the acoustic pressure measurements [1], [2]. However, such techniques usually require either an array with large aperture or multiple hybrid arrays. In recent years, a new technology, namely the acoustic vector sensor (AVS), has been widely employed for acoustic source detection and localization, and different signal processing algorithms have been developed accordingly [3]–[25].

Manuscript received November 17, 2011; revised March 21, 2012; accepted May 02, 2012. Date of publication May 17, 2012; date of current version August 07, 2012. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Joseph Tabrikian. The authors are with the School of Computer Engineering, College of Engineering, Nanyang Technological University, Singapore, 639798 (e-mail: xhzhong@.sg; asannamalai@.sg). Color versions of one or more of the figures in this paper are available online at . Digital Object Identifier 10.1109/TSP.2012.2199987

An acoustic vector sensor employs a co-located sensor structure which consists of two or three orthogonally oriented velocity sensors and an optional pressure sensor [26], [27]. It measures acoustic pressure as well as particle velocity at the sensor position. Given an AVS with three velocity components located at the origin of the three-dimensional (x-, y- and z-coordinate) space, the sensor manifold has the form

h(φ, ϑ) = [1, cos φ cos ϑ, sin φ cos ϑ, sin ϑ]^T,  (1)

where the first component represents the output of the pressure
sensor, and the remaining three components are the velocity responses along the three directions, respectively. The angles φ and ϑ are the azimuth and the elevation, respectively. This structure gives the AVS the following advantages over traditional pressure sensors.
1) It produces both the azimuth and elevation information and enables 2-D DOA estimation with a single vector sensor.
2) Its elevation ranges between −π/2 and π/2, which allows elevation angle estimation.
3) The manifold is independent of the source's frequency, which makes the AVS suitable for wideband source signals or scenarios where the source's signal frequency is unknown a priori.
Due to the advantages mentioned above, both the theoretical aspects and the applications of the AVS have been widely studied in the last decade [3]–[25]. An AVS is used for infrasonic measurement in the ocean environment in [26]. In [28], it has been theoretically proved that a single AVS is able to uniquely identify up to two arbitrarily distributed acoustic sources.

The AVS was first introduced to signal processing and acoustic source localization problems in [3]. An intensity based algorithm that uses both the pressure and particle velocity vector, and a velocity covariance based algorithm that uses only the particle velocity vector, are fully presented in [5]. A maximum likelihood based DOA estimation algorithm is developed in [28]. The conventional (Bartlett) beamforming and Capon beamforming for 2-D DOA estimation using acoustic vector sensors are investigated in [9], which shows that both the azimuth and elevation can be unambiguously estimated by using an AVS array. Further, subspace based approaches such as MUSIC [12], [17] and ESPRIT [7], [10], [12], [15] have been used for the AVS localization problem. The problem of 2-D DOA estimation using a single AVS is particularly addressed and investigated in [14]. More practically, AVS localization in impulsive noise environments and shallow ocean environments is investigated by employing fractional lower order statistics [29] and subspace
intersection method [24], respectively.

1053-587X/$31.00 © 2012 IEEE

The application of the AVS in the room reverberant environment is studied in [30]. The authors in [31] employ a towed AVS to track the angles and frequencies of sperm whales.

The existing 2-D DOA estimation schemes assume that the source is static and rely extensively on localization approaches. Further, for DOA estimation of multiple sources, the number of sources is usually assumed to be known and fixed. These assumptions are often violated in real applications, since the sources (e.g., submarines underwater or speakers in a room environment) are in fact dynamic, and the number of sources may be unknown and time-varying. Although the narrowband detection of an acoustic source using an AVS is addressed in [21], it cannot identify the number of sources and its use is limited to the narrowband scenario. For dynamic sources, it is desirable to model the source motion as well as the AVS measurements, and to develop a multi-source tracking approach that detects the number of sources and tracks the DOA of each source simultaneously. To the best of the authors' knowledge, no such method has yet been derived for the multiple-source AVS DOA estimation problem.

In the recent past, particle filtering (PF) [32], [33] has been found to be effective in coping with nonlinear and non-Gaussian system models and has been widely employed for target tracking problems. It uses a number of particles to represent the probability density function (PDF) of the unknown state vector, and evaluates the importance weights of these particles according to the source dynamic model and the likelihood. Subsequently, particles are duplicated or discarded according to their high or low importance weights, and the resampled particles are able to represent the posterior PDF of the state. For more details of the PF and its application to tracking problems, the reader is referred to two books [34], [35]. PF has been employed for array signal based multiple target tracking in [36].
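To make the predict/weight/resample loop described above concrete, the following is a minimal bootstrap particle filter on a toy one-dimensional random-walk state observed in Gaussian noise. This is only an illustrative sketch of the generic PF mechanics, not the paper's AVS likelihood; the function name and noise settings are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_pf(z, n_particles=500, q=0.01, r=0.1):
    """Minimal bootstrap particle filter (toy model, not the AVS likelihood).
    State model: random walk with process noise variance q.
    Measurement: state plus Gaussian noise with variance r."""
    x = rng.normal(0.0, 1.0, n_particles)  # initial particle cloud
    estimates = []
    for zt in z:
        # predict: sample from the prior importance function (dynamic model)
        x = x + rng.normal(0.0, np.sqrt(q), n_particles)
        # weight: evaluate the Gaussian likelihood of each particle
        logw = -0.5 * (zt - x) ** 2 / r
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # resample: duplicate high-weight, discard low-weight particles
        idx = rng.choice(n_particles, n_particles, p=w)
        x = x[idx]
        estimates.append(x.mean())  # posterior-mean estimate
    return np.array(estimates)
```

With the prior importance function, the weights reduce to the likelihood, exactly as noted later in the paper for equation (22).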
Also, it has been successfully used for underwater acoustic source and geoacoustic tracking problems [37], [38], and for the room acoustic source tracking problem [39], [40]. Using an AVS and such a PF scheme to track a single acoustic source has recently been studied in [41]–[43].

In this paper, a PF is developed to track the 2-D DOAs of multiple acoustic sources using an AVS. Firstly, the tracking problem in which the number of sources is assumed to be known and fixed at all time steps is considered. A constant velocity (CV) model [44] is used to model the source dynamics. The covariance matrices of the source signal and the measurement noise process are unknown in practice. These parameters are regarded as nuisance parameters and estimated by using a maximum likelihood estimator. The likelihood of the particles is consequently a concentrated likelihood, formulated based on the measurement sequence and the estimated covariance matrices. Due to the sample-based representation of the posterior PDF of the state vector, the PF is able to cope with the nonlinear measurement model and is well suited for DOA estimation. It is observed that the mainlobe of the likelihood function is spread under low signal-to-noise ratio (SNR) environments. Therefore, the likelihood function is further exponentially weighted to generate a sharper peak and to emphasize the particles sampled in the high-likelihood area. A Rao-Blackwellization step is used to marginalize out the velocity component of the state and thus reduce the dimension of the PF. The key advantage of the proposed PF tracking algorithm is that it incorporates both the temporal and spatial information; hence it is able to estimate the DOA accurately and efficiently even when the source is dynamic in a low SNR environment.

Secondly, we consider a more challenging problem where both the source motion dynamics and the number of sources are assumed to be unknown and time-varying. The unknown parameters of interest are thus the number of sources as well as the corresponding 2-D
DOAs. We also consider a more practical sensor model in which the measurements can either be regular signals emitted by sources or spurious signals due to transient interference. The former is modeled by the AVS signal model and the latter is modeled by an empty set. Both the source dynamic model and the sensor measurement model thus have a random geometry, and all the randomness can be encapsulated in a random finite set (RFS) framework. Generally, the RFS framework neglects the intrinsic data association between sources and measurements, and has been found promising for the multi-object tracking problem [45]–[49]. In the state space, each element of an RFS is a random vector which can be employed to describe the source motion dynamics, and the cardinality is a random variable that can be used to model the time-varying number of sources. A similar structure can also be constructed for the collected measurements. Subsequently, a PF implementation for the RFS state tracking is developed to obtain a computationally tractable approximation of the RFS probability densities. For a rigorous mathematical description of the RFS framework and its application to the multi-object tracking problem, the reader is referred to [45]–[49]. In particular, RFS is employed for room acoustic source detection and tracking in [47].

The core contribution of this work is that full probabilistic model based approaches have been derived for AVS based multiple acoustic source tracking problems. Compared to the existing RFS approaches [46], [47], the main contributions are twofold: a Rao-Blackwellization step is employed to marginalize out the source velocity, and both regular and irregular signals are modeled. In particular, the RFS-PF algorithm developed here is able to simultaneously detect and track multiple acoustic sources based on the received AVS signals.

The rest of this paper is organized as follows. In Section II, the AVS signal model and particle filtering method are introduced.
Section III presents the tracking algorithm developed for a known and fixed number of sources. The enhanced likelihood model and the Rao-Blackwellization step are also formulated. Section IV presents the RFS-PF tracking algorithm developed for an unknown and time-varying number of acoustic sources. The performance metric and a discussion of real applications of the proposed approaches are presented in Section V. Simulated experiments are organized in Section VI. Finally, conclusions are drawn and future directions of this work are discussed in Section VII.

II. PROBLEM FORMULATION

This section provides a brief review of DOA estimation based on the AVS acoustic signal. The signal model for an AVS is introduced first.

A. AVS Measurement Model

The goal of this paper is to develop an approach to detect and track dynamic acoustic sources simultaneously. The number of sources as well as the position of each source are thus assumed to be time-varying. We first formulate the AVS signal model for multiple dynamic sources. Assume that there are N_t simultaneously active acoustic source signals arriving at an AVS at discrete time t. The source signals can be written in a collection given as

s_t(τ) = [s_{1,t}(τ), ..., s_{N_t,t}(τ)]^T,  (2)

where s_{k,t}(τ) is a wide-band signal. It has an independent and identically distributed (i.i.d.) random amplitude and random phase; the phase is assumed to be uniformly distributed over [0, 2π). Further assume that the kth source signal is emitted from a 2-D direction given by

θ_{k,t} = [φ_{k,t}, ϑ_{k,t}]^T,  (3)

with φ_{k,t} and ϑ_{k,t} denoting the azimuth and the elevation angles. An acoustic vector sensor measures the acoustic pressure as well as the three-component particle velocity. Let u_{k,t} be the unit direction vector pointing from the origin toward the source position, factorized by a constant term as given in (4), where ρ and c represent the ambient density and the propagation speed of the acoustic wave in the medium, respectively.
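For concreteness, the AVS manifold in (1) can be sketched in code as follows. This is a hedged sketch assuming the conventional azimuth/elevation parameterization of the manifold; the function name is ours.

```python
import numpy as np

def avs_steering(azimuth, elevation):
    """AVS manifold h(phi, theta) = [1, cos(phi)cos(theta),
    sin(phi)cos(theta), sin(theta)]^T: one pressure component followed
    by the three particle-velocity components along x, y, z.
    Convention assumed: elevation in [-pi/2, pi/2]."""
    return np.array([
        1.0,
        np.cos(azimuth) * np.cos(elevation),
        np.sin(azimuth) * np.cos(elevation),
        np.sin(elevation),
    ])
```

Note that the velocity part is a unit vector and the expression contains no frequency term, which is the basis of advantage 3) above: the same manifold applies to narrowband and wideband sources alike.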
Using a phasor representation, the received signal model for an AVS can be written as in (5) [5], where the pressure and velocity noise terms appear as separate components, and the time delay of the kth wave between the sensor and the origin of the coordinate system enters the phase term. For an acoustic source that moves relatively slowly, the DOA can be assumed to be stable if only a small number of snapshots is processed at each time step. Assume that L snapshots are taken into account at time step t; the number of sources is N_t, and the snapshots of the source signal can be written as in (6). The noise and received data matrices can be expressed as in (7) and (8). Accordingly, Θ_t is used to express the DOAs, and (5) can thus be written as in (9), where the steering vector is defined by (10) and (11). The received signal includes both the azimuth and elevation information, and can be used for 2-D DOA estimation.

The measurement equation is the AVS data model (9). Assume that 1) the noise terms in (9) are independent identically distributed (i.i.d.), zero-mean complex circular Gaussian processes that are independent across channels, and 2) the source signal and the noise are independent. The PDF of the measurements can then be written as in (12), where CN(μ, Σ) denotes a multivariate complex Gaussian distribution with mean μ and covariance matrix Σ. The source signal process and the noise process are characterized by their covariance matrices, given by (13) and (14), where I_L is an Lth-order identity matrix and σ_p² and σ_v² are the noise variances for the pressure and velocity components, respectively. The corresponding overall covariance matrix is given in (15). We are now faced with a 2-D DOA estimation problem with unknown nuisance parameters.

B. Source Motion Model

DOA estimation based on localization approaches such as beamforming and subspace methods uses only the spatial information in the current measurements. Since the DOAs at adjacent time steps are highly correlated, it is desirable to model the source motion trajectories and estimate the source DOA by
incorporating the temporal information (implied in the source dynamic model). In this section, the CV model is introduced to model the source motion.

Assume that source k moves with an angular velocity (in rad/s), for k = 1, ..., N_t. The source state can be constructed by cascading the DOA and the motion velocity, i.e., x_{k,t} = [θ_{k,t}^T, θ̇_{k,t}^T]^T. The CV model [44] is employed here to model the source dynamics and is given as

x_{k,t} = F x_{k,t-1} + v_t,  (16)

where the coefficient matrix F is defined by

F = [1 T; 0 1] ⊗ I_2,  (17)

in which T represents the time period in seconds between the previous and the current time step, ⊗ denotes the Kronecker product, and v_t is a zero-mean real Gaussian process used to model the turbulence on the source velocity; its covariance is a diagonal matrix with nonzero entries only in the velocity components and 0 elsewhere.

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 60, NO. 9, SEPTEMBER 2012

For the source states, we have the following assumptions:
• each active source follows the CV motion model described in (16);
• the source motions are independent of each other.
The polar system based CV model (16) has been widely employed for DOA tracking problems in [42], [43], and [50].
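One CV-model propagation step as in (16) can be sketched as follows, under the state ordering [azimuth, elevation, azimuth rate, elevation rate]. The symbols and the noise level here are illustrative choices, not values from the paper.

```python
import numpy as np

def cv_step(x, T=1.0, sigma_v=1e-3, rng=None):
    """One step of the constant-velocity (CV) model for a 2-D DOA state
    x = [azimuth, elevation, azimuth_rate, elevation_rate] (rad, rad/s).
    As in the text, process noise perturbs only the velocity components."""
    rng = rng or np.random.default_rng()
    # transition matrix: position advances by T * velocity, velocity persists
    F = np.array([[1.0, 0.0, T,   0.0],
                  [0.0, 1.0, 0.0, T],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    # zero-mean Gaussian turbulence on the velocity components only
    v = np.concatenate([np.zeros(2), rng.normal(0.0, sigma_v, 2)])
    return F @ x + v
```

Drawing particles from this model is exactly the prior importance function used in the tracking algorithms later in the paper.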
C. Sequential Monte Carlo Estimation

Let z_{1:t} = {z_1, ..., z_t} denote all measurements obtained until time step t, and let x_{0:t} be the collection of states. The task is to estimate the posterior p(x_t | z_{1:t}) recursively. The Bayesian solution to this problem can be given as follows:
• Predict:

p(x_t | z_{1:t-1}) = ∫ p(x_t | x_{t-1}) p(x_{t-1} | z_{1:t-1}) dx_{t-1}  (18)

• Update:

p(x_t | z_{1:t}) ∝ p(z_t | x_t) p(x_t | z_{1:t-1})  (19)

In this recursion, p(x_{t-1} | z_{1:t-1}) is the posterior distribution estimated at the last time step, and p(x_t | z_{1:t-1}) is the prior distribution for the current time step. The Bayesian recursion states that, given the posterior distribution of the state estimated at the previous time step and the system models, the current probability distribution of the state can be obtained recursively. Although the Kalman filter can be used to solve the Bayesian recursion in (18) and (19), its use is limited to the case of linear and Gaussian system models. Hence, the PF, which provides an excellent solution to the nonlinear problem, is employed in this work [32]. The core idea of the PF is that it uses a set of particles and the importance weights of these particles to approximate the posterior distribution. Assuming that N_p particles are used to approximate the above Bayesian recursion, the PDF is represented by the weighted particle set {x_t^(i), w_t^(i)}, i = 1, ..., N_p. The PF processing proceeds as follows. At time step t, the particles are sampled according to an importance function, given by

x_t^(i) ~ q(x_t | x_{t-1}^(i), z_t)  (20)

The importance weights of the particles are then evaluated by

w_t^(i) ∝ w_{t-1}^(i) p(z_t | x_t^(i)) p(x_t^(i) | x_{t-1}^(i)) / q(x_t^(i) | x_{t-1}^(i), z_t)  (21)

Usually, the optimal importance function, i.e., q(x_t | x_{t-1}, z_t) = p(x_t | x_{t-1}, z_t), is able to provide the minimum estimation variance. However, this importance function cannot be obtained in a straightforward manner. One alternative is to employ the prior importance function. Given the state particles x_{t-1}^(i), for i = 1, ..., N_p, at the previous time step, the particles are sampled at the current time step according to the source dynamic model (16), given by

x_t^(i) ~ p(x_t | x_{t-1}^(i))  (22)

The particles are thus weighted according to their likelihood, with w̃_t^(i) denoting the normalized weight. After the resampling scheme, the posterior distribution of the state is approximated by

p(x_t | z_{1:t}) ≈ Σ_i w̃_t^(i) δ(x_t − x_t^(i)),  (23)

where δ(·) is a Dirac delta function. For slow DOA motion and simple trajectories, the CV model is able to model the
source motion accurately and the particles can be drawn from the prior importance function effectively. The remaining task is then to construct an appropriate likelihood function to evaluate the importance of these particles.

III. TRACKING A FIXED AND KNOWN NUMBER OF ACOUSTIC SOURCES

Tracking a known and fixed number of multiple acoustic sources using an AVS is considered in this section. This work can be regarded as an extension of our previous investigation of tracking a single acoustic source in [43]. The main difference comes from the derivation of the likelihood and the formulation of the importance function. In addition, a Rao-Blackwellization step is employed here to marginalize out the velocity component of the state. The general idea of 2-D DOA tracking using the PF will form the basis of our solution to the more complicated problem in the next section, in which an unknown and time-varying number of sources is considered.

A. Concentrated Likelihood Model

Since the measurement noise process is assumed to be Gaussian, the likelihood function can be written as in (24), where |·| denotes the determinant, tr{·} represents the trace operation, and the estimate of the covariance matrix of the AVS measurements is given by

R̂_t = (1/L) Σ_{l=1}^{L} z_t(l) z_t(l)^H,  (25)

where the superscript H represents the Hermitian transpose. The statistics of the source signal and the noise process are unknown in practice. Taking all these parameters into account would result in a high-dimensional problem. Here, we use a concentrated likelihood function which estimates the nuisance parameters with a maximum likelihood estimator. According to [5], the concentrated likelihood function can be written as in (26), with the intermediate quantities defined in (27) and (28). The likelihood is then concentrated on the parameters of interest, the 2-D DOAs. The DOA estimates can be obtained by implementing a 2-D search over the possible DOA range to maximize (26). However, such a method is computationally expensive since
a 2-D search is required. In this work, we introduce a particle filtering approach to estimate the 2-D DOA, by which the 2-D search can be avoided. We draw state samples randomly over the 2-D DOA space. The likelihood for the particles can then be obtained by substituting the particle state into the likelihood function (26). Generally, particles closer to the ground truth state produce a larger likelihood. Consequently, these particles are weighted more significantly and are more likely to be selected for state estimation. An alternative method to formulate the likelihood is presented in [33], where uninformative prior densities for the nuisance parameters are specified first, and these parameters are then integrated out of the problem. Another advantage of using the PF tracking approach to estimate the 2-D DOA is that it models the source dynamics and employs both temporal information (from the source dynamic model) and spatial information (from the measurements) to track the DOAs, while the localization approaches use only the spatial information.

It can be observed in Section II-C that the particles are in fact weighted by their likelihood when the prior importance function is employed. It is desired that particles around the ground truth present a large likelihood and are weighted significantly more than those far away from the ground truth, so that the particles which contribute more to the state estimation can be replicated. However, it is well known that the mainlobe of the concentrated likelihood function (26) is usually spread and flat in low SNR environments. The particles sampled close to the ground truth then cannot be significantly weighted. Another drawback of the likelihood function (26) is that it has an exponential factor. When the number of snapshots is very large, the likelihood value becomes extremely small and may be truncated to zero in a computer implementation. It is thus necessary to reshape the likelihood function and make it more amenable to our
problem.

To eliminate the effect of a large number of snapshots, the likelihood is exponentially weighted so that the effect of the number of snapshots is canceled. On the other hand, the likelihood function remains flat and the particles cannot be weighted efficiently in the low SNR scenario. We therefore further exponentially weight the likelihood by a constant value, so that the likelihood function becomes more peaked and the weights of particles located in the high-likelihood area are enhanced. The likelihood function (26) can now be written as in (29). After this normalization and weighting, the likelihood function is reshaped and the weights of particles sampled in the high-likelihood area are enhanced. This step is important since it helps the subsequent resampling algorithm to select and replicate the particles more efficiently.

B. Rao-Blackwellization Step

Given the transition and likelihood models derived above, formulating the PF algorithm for AVS source tracking is straightforward. It can be observed from the CV model that the state is constructed from a position component and a velocity component, and that the source position holds a linear relationship with the velocity. The position state can be regarded as a measurement of the velocity component. The CV model (16) can be decomposed into an auxiliary state space model, given by (30) and (31). As previously defined, the velocity state process and the auxiliary measurement process (the position state process) each have their own noise variance. The velocity component of the state can thus be marginalized out by using a Kalman filter (KF), and only the position part needs to be handled by the PF. Such a technique is referred to as Rao-Blackwellization [34] and is widely used for state estimation where part of the state space equations is linear and Gaussian [34], [35]. Using Bayes' theorem, the posterior distribution of the whole state can be written as in (32), in which one factor is analytically tractable and the other is estimated by PF
approximation. Since part of the state can be estimated optimally (by using a KF), the dimension of the state to be processed by the PF is reduced. Consequently, the Rao-Blackwellization based PF is able to provide better estimates than the standard PF when the same number of particles is used [34]. The KF marginalization can be summarized by (33) to (38). For the KF implementation, the velocity state and its variance are initialized before tracking begins.

The complete tracking steps are summarized in Algorithm 1. The filtering algorithm is quite general and can also be used for multiple target tracking based on other sensor models, e.g., traditional acoustic pressure sensors. Since the DOA states in the particles are in an arbitrary order, extracting the final state estimates by taking the expectation of the particles is not meaningful. In this paper, a K-means algorithm is employed to cluster the particles, and the final estimate is obtained from the centroids of these clusters. It is observed that a one-step K-means implementation is able to provide accurate estimates. It is worth mentioning that, due to the structure of the AVS steering vector, the DOA tracking algorithm does not require the frequency information of each individual source, and is appropriate for both narrowband and wideband acoustic source tracking problems.

Algorithm 1: RB-PF for AVS 2-D DOA tracking.
Initialization: for i = 1, ..., N_p, draw particles x_0^(i), where the initial DOA is obtained from the MUSIC method; set the initial weights to be uniform;
for t = 1 to T do
  1) calculate the covariance matrix according to (25);
  for i = 1 to N_p do
    2) compute the likelihood from (26);
    3) compute the importance weight;
  end
  4) normalize the weights;
  5) resample the particles according to the weights;
  6) Kalman filtering according to (33) to (38);
  7) draw particles according to (31);
  8) output the estimates by using the one-step K-means method.
end

IV. TRACKING AN UNKNOWN AND TIME-VARYING NUMBER OF SOURCES

The previous section presented the PF algorithm for tracking a known and fixed number of acoustic sources. In practice, the number of sources is unknown and time-varying. It is thus necessary to consider detection and tracking of multiple acoustic sources jointly using an AVS. This section presents our solution to tracking an unknown and time-varying number of sources. Essentially, an RFS framework is formulated to characterize the randomness of the source dynamics as well as the measurement uncertainties. The RFS state process is introduced first. It is worth mentioning that the RFS state model formulated here is similar to that in [47].

A. RFS State Model Formulation

For simultaneous detection and tracking of an unknown number of multiple acoustic sources, the parameters of interest are the number of sources as well as the 2-D DOA of each source. The state of a single source at the current time step is x_{k,t}, for k = 1, ..., N_t. All the parameters of interest can be characterized by a single finite set, given as

X_t = {x_{1,t}, ..., x_{N_t,t}},  (39)

where N_t = |X_t| is the number of sources, with |·| representing the cardinality. Given a realization of the RFS state X_{t-1} at the previous time step, the source state at the current step is modeled by

X_t = B_t ∪ S_t(X_{t-1}),  (40)

where B_t is the state set of sources born at time step t (footnote 1) and S_t(X_{t-1}) denotes the RFS of states that have survived to time step t, with the initial state vector defined under the birth hypothesis. The DOA part of all new-born states is assumed to be uniformly distributed over the possible DOA range, and the velocity part is
assumed to be a Gaussian distribution around a certain velocity. The birth density is thus given as in (41). For the birth process, we assume the following:
• at most one source is born at a time step;
• the number of sources is bounded by the maximum number of sources allowed at each time step.
The first assumption is employed to simplify the problem. It is also plausible since the number of sources considered here is relatively small; in practice, it is possible that multiple sources appear simultaneously. Further, the maximum number of sources in the surveillance area is bounded, which means that when the maximum is reached, the new-born state set is an empty set. It is observed in [28] that with a single AVS, up to two sources can be uniquely identified. Hence, a maximum of two sources is chosen in this work (footnote 2). The source birth process can thus be formulated as in (42), in terms of the hypotheses for the birth and non-birth processes, respectively. The surviving state set can be formulated by considering a death process, with corresponding hypotheses for the death and non-death processes. When a death process happens, the corresponding state is set to empty, and the remaining states evolve following the motion dynamics (16), as given in (43).

Footnote 1: Since a time-varying number of sources is considered, the source dynamics comprise not only the source motion itself, but also the source birth and death processes. In this paper, we use birth and death processes to describe source appearance and disappearance in the tracking scene; accordingly, the terminology 'born' is employed to represent a new source appearing in the surveillance area, and 'die' refers to an existing source disappearing from the surveillance area.

Footnote 2: Note that when the number of sources is larger than two, an AVS array could always be employed.
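The birth/death cardinality dynamics described above can be sketched as a toy simulation: at most one birth per step, independent deaths, and a hard cap of two sources for a single AVS. The birth and death probabilities here are illustrative placeholders, not values from the paper.

```python
import numpy as np

def evolve_cardinality(n, p_birth=0.05, p_death=0.05, n_max=2, rng=None):
    """Toy birth/death step for the number of active sources.
    - each existing source survives independently with prob. 1 - p_death
    - at most one source is born per step, and only if the cap n_max
      (two for a single AVS) has not been reached
    Probabilities are illustrative, not from the paper."""
    rng = rng or np.random.default_rng()
    survivors = sum(rng.random() > p_death for _ in range(n))
    born = 1 if (survivors < n_max and rng.random() < p_birth) else 0
    return survivors + born
```

In the full RFS-PF, each particle would carry such a cardinality together with the per-source CV states, and the weights would come from the AVS likelihood.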
Outlier Detection By Sampling With Accuracy Guarantees
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. KDD ’2006 Philadelphia, USA Copyright 200X ACM X-XXXXX-XX-X/XX/XX ...$5.00.
time, where the constant c is usually above three. In these domains, state-of-the-art algorithms may take days to detect DB-outliers, even in small data sets. The goal of this paper is to define an algorithm that can provide users with interactive-speed performance over the most expensive distance computations, giving the user the ability to "try out" various distance functions and queries during exploratory mining. The question we consider is: how can we reduce the required number of distance computations in DB-outlier mining? For certain distance measures and data sets, indexing and pruning techniques can be used to reduce the number of distance computations. Unfortunately, indexing [3] is not useful in the domains mentioned above due to high data dimensionality, and pruning [2] tends only to reduce the required number of distance computations to a few hundred or a few thousand per data point; as we will show experimentally, this is still too costly if every distance computation requires seconds of CPU time.

In this paper, we consider a simple sampling algorithm for mining kth-nearest-neighbor (kth-NN) DB-outliers [12]. Given integer parameters k and γ, kth-NN outliers are the top γ points whose distances to their kth-NNs are greatest. Sampling has been considered before as an option for detecting outliers [8], but never along with a rigorous treatment of the effect on result quality. If the user is able to tolerate some rigorously-measured inaccuracy, our algorithm can give arbitrarily fast response times.

Our algorithm works as follows. For each data point i, we randomly sample α points from the data set. Using the user-specified distance function, we find the kth-NN distance of point i among those α samples. After repeating this process for each point, the sampling algorithm returns the γ points whose sampled kth-NN distance is greatest.
Algorithm 1 presents the corresponding pseudo-code:

Algorithm 1 Sampling Algorithm
Input: a data set G of size n; k, specifying which NN distance is the criterion; α (α > k), the sample size; γ, the number of outliers to return.
Output: γ points as the kth-NN outliers
1: for each point i ∈ G do
2:   Draw a random sample of size α from G (not including point i)
3:   Calculate point i's kth-NN distance in its sample
4: end for
5: Return the top γ points whose kth-NN distance in their sample is the greatest
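A direct Python rendering of Algorithm 1 (a sketch; the function and variable names are ours, and `dist` stands for the user-specified distance function):

```python
import random

def sample_outliers(data, dist, k, alpha, gamma, seed=0):
    """kth-NN outlier detection by sampling, following Algorithm 1:
    score each point by its kth-NN distance within a random sample
    of alpha other points, and return the gamma highest-scored points."""
    rng = random.Random(seed)
    scored = []
    for i, p in enumerate(data):
        others = data[:i] + data[i + 1:]                    # step 2: exclude point i
        sample = rng.sample(others, alpha)
        kth_nn = sorted(dist(p, q) for q in sample)[k - 1]  # step 3: kth-NN distance
        scored.append((kth_nn, p))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [p for _, p in scored[:gamma]]                   # step 5: top gamma points
```

Each point now costs α distance computations rather than n − 1, which is the entire saving when a single distance evaluation takes seconds of CPU time.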
NBER WORKING PAPER SERIES

ONE SIMPLE TEST OF SAMUELSON'S DICTUM FOR THE STOCK MARKET

Jeeman Jung
Robert J. Shiller

Working Paper 9348
/papers/w9348

NATIONAL BUREAU OF ECONOMIC RESEARCH
1050 Massachusetts Avenue
Cambridge, MA 02138
November 2002

The authors are indebted to John Y. Campbell, Paul A. Samuelson, and Tuomo Vuolteenaho for comments. Ana Fostel provided research assistance. The views expressed herein are those of the authors and not necessarily those of the National Bureau of Economic Research.

© 2002 by Jeeman Jung and Robert J. Shiller. All rights reserved. Short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source.

One Simple Test of Samuelson's Dictum for the Stock Market
Jeeman Jung and Robert J. Shiller
NBER Working Paper No. 9348
November 2002
JEL No. G14

ABSTRACT

Samuelson (1998) offered the dictum that the stock market is "micro efficient" but "macro inefficient." That is, the efficient markets hypothesis works much better for individual stocks than it does for the aggregate stock market. In this paper, we present one simple test, based both on regressions and on a simple scatter diagram, that vividly illustrates that there is some truth to Samuelson's dictum. The data comprise all U.S. firms on the CRSP tape that have survived since 1926.

Jeeman Jung, Division of Economics and International Trade, Sangmyung University, Seoul, Korea 110-743
Robert J. Shiller, Cowles Foundation and International Center for Finance, Yale University, New Haven, CT 06511, and NBER; robert.shiller@

One Simple Test of Samuelson's Dictum for the U.S. Stock Market1
by Jeeman Jung and Robert J. Shiller

Paul A.
Samuelson has argued that one would expect that the efficient markets hypothesis should work better for individual stocks than for the stock market as a whole:

"Modern markets show considerable micro efficiency (for the reason that the minority who spot aberrations from micro efficiency can make money from those occurrences and, in doing so, they tend to wipe out any persistent inefficiencies). In no contradiction to the previous sentence, I had hypothesized considerable macro inefficiency, in the sense of long waves in the time series of aggregate indexes of security prices below and above various definitions of fundamental values."2

We will put this dictum to the test in terms of the simplest efficient markets model, which asserts that stock prices equal the expected present value (with constant discount rates) of future dividends. We will examine Samuelson's dictum by the simple method of running a regression of future multi-year dividend changes on current dividend-price ratios and testing whether the dividend-price ratio predicts these changes, along lines shown in Campbell and Shiller [1998], [2001], but for individual stocks 1926-2001 as well as for stock indexes. This will allow us to see in very direct terms whether the simple efficient markets model works better for individual stocks than it does for indexes. It will also allow us some new insights into the claim of LeRoy and Porter [1981] and Shiller [1981] that stocks are excessively volatile relative to what could be justified in terms of information about

1. The authors are indebted to John Y. Campbell, Paul A. Samuelson, and Tuomo Vuolteenaho for comments. Ana Fostel provided research assistance.
2. This quote is from a private letter from Paul Samuelson to John Campbell and Robert Shiller. The quote appears, and is discussed, in Robert J. Shiller, Irrational Exuberance, 2nd Edition, 2001, p. 243.
Samuelson's dictum is also treated in Samuelson [1998].

future dividends, and the conclusion of Campbell [1991] that the variance of news about future cash flows accounts for only a third to a half of the variance of unexpected stock returns.

Our use of individual stock data over a 75-year interval also allows us another advantage over tests of market efficiency based on stock-price indexes. When we assume that stock prices are, according to efficient markets theory, optimal forecasts of the present value of dividends discounted at an estimated constant rate, it follows that the present value gives weight to dividends many years in the future. Since few firms survive as separate firms for as long a time as the present value formula gives substantial weight to, the efficient markets model has usually been tested using stock price indexes, which continue without interruption through time. But with stock price indexes, the changing composition of the index over the years means that the subsequent dividends reported for the index at time t+k are not the dividends accruing on the stocks comprising the index at time t. While one may argue that this changing composition of the index is not a problem for index-based tests of market efficiency, it does introduce a layer of complexity to the analysis. In this paper, we take the simpler approach of just looking at how well individual stock prices relative to dividends predict each stock's actual own dividend changes far into the future.

The Efficient Markets Model in Dynamic Gordon Model Form

One way of writing the simple efficient markets model expresses the dividend-price ratio as a function of expected future dividend growth.
Assuming a constant discount rate but a varying growth rate of real dividends, the dividend-price ratio D_t/P_t can be derived from the simple expected present value relation with discount rate r as:

D_t/P_t = r − E_t g_t^D, where g_t^D = Σ_{k=1}^∞ ΔD_{t+k} / [P_t (1+r)^k],   (1)

P_t is the real (inflation-corrected) stock price at the end of year t, D_t is the real dividend during year t, ΔD_t = D_t − D_{t−1}, r is the discount rate used in the present value formula for stock prices, and E_t denotes expectation conditional on information at time t.3 Note that in the equation g_t^D, representing a dividend growth rate, is expressed as the sum of discounted amounts of future dividend changes from a $1 investment at time t.4 In other words, the growth rates are computed relative to price P rather than D, and this is important since with individual firms there are in fact some zero dividends, and so growth rates of dividends themselves could not be calculated.

The equation can be viewed as a dynamic counterpart of the Gordon model, D/P = r − g, where g is the constant expected dividend growth rate. Equation (1) implies that at times when the dividend-price ratio is high, it portends relatively low growth of dividends over future years, while when the dividend-price ratio is low, it portends relatively rapid growth of dividends over future years. We take this model as representing the essence of the simple efficient markets model. While there are other versions of the efficient markets model, with additional complexities, this simple version

3. Note that efficient markets theory implies (1) even if firms repurchase shares in lieu of paying as much in dividends: the share repurchase has the effect of raising subsequent per-share dividends.
4.
Campbell and Shiller [1988a, 1988b] used a log-approximation of the dividend-price model as follows:

log(D_t/P_t) = E_t log(D_t/P_t)*, where log(D_t/P_t)* = C − Σ_{j=1}^∞ ρ^{j−1} Δlog D_{t+j}.

The formula is closely analogous to (1) in this paper.

has sufficient currency in public thinking, at least as a first approximation, to warrant learning whether it is at least approximately true.

We could in theory evaluate this model, after turning the efficient markets equation around to E_t g_t^D = r − D_t/P_t, by regressing, with time series data, g_t^D onto a constant and the dividend-price ratio D_t/P_t, and testing the null hypothesis that the coefficient of D_t/P_t is minus one. Such a test of the efficient markets hypothesis would be recommended by its simplicity and immediacy. There is, however, the practical difficulty that the summation extends to infinity, and so the right-hand side can never be computed with finite data. Campbell and Shiller [1988b] showed a rigorous way of testing a log-linearized version of this model under the assumption of a vector-autoregressive model for the change in log dividends and the log dividend-price ratio.5 A simpler and more direct way, without adding the additional assumptions implicit in the vector-autoregressive model, is to approximate the right-hand side and run a regression of the approximated right-hand side onto the dividend-price ratio. This was done in Campbell and Shiller [1998], [2001] for aggregate stock market indexes. Campbell and Shiller [2001] regressed ten-year log dividend growth rates ln(D_{t+10}/D_t) onto ln(D_t/P_t) with annual Standard & Poor's Composite stock price data using the long time-series data of 1871 to 2000. The coefficient of ln(D_t/P_t) turned out to be positive, to have the wrong sign. The result was interpreted as indicating that in the entire history of the U.S. stock market, the dividend-price ratio has never predicted dividend growth in accordance with the simple efficient markets theory.
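As a sanity check on equation (1): if real dividends grow at a constant rate g and the price satisfies the static Gordon relation D/P = r − g, the discounted sum g_t^D converges to g, so r − g_t^D recovers the dividend-price ratio. A small numerical sketch of this (our own illustration, not from the paper):

```python
def growth_sum(D0, P0, g, r, K):
    """g_t^D of eq. (1): discounted dividend changes per dollar of price,
    truncated at K years, for dividends growing at constant rate g."""
    total = 0.0
    for k in range(1, K + 1):
        delta_D = D0 * (1 + g) ** k - D0 * (1 + g) ** (k - 1)  # ΔD at t+k
        total += delta_D / (P0 * (1 + r) ** k)
    return total

r, g, D0 = 0.064, 0.02, 1.0
P0 = D0 / (r - g)                       # static Gordon model: D/P = r - g
gD = growth_sum(D0, P0, g, r, K=2000)
assert abs(gD - g) < 1e-9               # discounted growth sum recovers g
assert abs(D0 / P0 - (r - gD)) < 1e-9   # equation (1) holds
```

Truncating at a small K leaves out a geometrically discounted tail, which is one reason a regression slope on finite data need not be exactly minus one.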
More complex versions of the efficient markets model, involving time-varying interest rates, were also explored using a generalization of this model, and also found wanting (Campbell and Shiller [1988a]). In this paper, which concentrates on individual firm differences, we focus on the simpler version of the model, with constant discount rates, since this version represents the most popular version of efficient markets theory, asserting just that movements in the price of any stock relative to its dividend reflect new information about the outlook for the future payoff of that stock.

5. Campbell and Shiller rejected the efficient markets model using index data, while Vuolteenaho [2002] found more encouraging results for efficient markets theory when he applied the vector-autoregressive methods to individual firm data of 1954-96.

Running the Regression with Individual Stock Data

A fundamental problem with testing this model with individual stock data is, as we have noted, that while the model concerns growth rates of dividends from decade to decade, there are not many firms that survive for many decades. In fact, when we did a search on the Center for Research in Security Prices (CRSP) tape, we found that there were only 49 firms that appear on the tape continuously without missing information during the period 1926 to 2001.6 Since the number of surviving firms is so small, there is a risk that they are atypical, not representative of all firms. While this risk must be borne in mind in evaluating our results, we believe that looking at the universe of surviving U.S. firms on the CRSP tape still offers some substantial insights, at least as a case study. Note that the mere fact of survival would be expected, if anything, to put an upward bias on the average return on the stocks. It would have no obvious implication for

6. When Poterba and Summers [1988] did a similar search of the CRSP tape, they found 82 surviving firms during the 1926-1985 period.
The smaller number here apparently reflects the continuing disappearance of firms through time. While the number of firms is small, we observe that they span a wide variety of industries. Among the 49 firms, there are 31 manufacturing firms, 5 utility companies, 5 wholesale & retail firms, 3 financial firms, 4 mining & oil companies, and one telecommunications company.

either the time-series or cross-sectional ability of the dividend-price ratio to predict future changes in dividends.

Using monthly data from the CRSP tape, we create the series of annual dividends, D_t, by summing the twelve monthly dividends from January to December of the year; the price P_t is for the end of the year.7 We exclude from the series non-ordinary dividends due to liquidation, acquisition, reorganization, rights offerings, and stock splits. All the dividends and stock prices are adjusted by the proper price adjustment factors obtained from the CRSP tape and then are expressed in real terms using the Consumer Price Index.

As a proxy for the future dividend growth we use ĝ_t^D, the summation truncated after K years:

ĝ_t^D = Σ_{k=1}^K ΔD_{t+k} / [P_t (1+r)^k],   (2)

and we set r equal to 0.064, which is the annual average return over all firms and dates in the sample.8

To confirm statistical significance, we regress ĝ_t^D onto a constant and D_t/P_t with the data on the 49 individual firms in three different ways: A. separately for each of the 49 firms (49 regressions, each with 76−K observations); B. pooled over all firms with a dummy for each firm (one stacked regression with 49×(76−K) observations); and C. for the equally-weighted portfolio composed of the 49 firms (one regression with 76−K

7. The results are invariant to the starting month for the calculation of annual dividends. We also perform the same estimation using data on firms surviving after World War II. There are 125 firms that existed during the 1946-2001 period without any missing information on stock prices and dividends.
The results of the regressions on these samples are basically similar to those reported in the paper.

8. We avoid the common practice of using the terminal price P_{t+K} to infer dividend changes beyond t+K, since that would bring us back to using a sort of return variable as the dependent variable in our regressions: we want our method to have a simple interpretation, here just whether

observations). Table 1 shows the three results for K = 10, 15, 20, and 25, while for the pooled regression K = 75 is also shown. When appropriate, t-statistics were computed using a Hansen-Hodrick [1980] procedure to correct these statistics for the effects of serial correlation in the error term due to the overlapping 10-, 15-, 20-, or 25-year intervals with annual data. For the stacked regressions (B) for K = 10, 15, 20, and 25, the Hansen-Hodrick procedure was modified to take account as well of the contemporaneous correlation of errors across firms.9

If there were no problem of survivorship bias and if the truncation of our infinite sum for ĝ_t^D were not a problem, then we would expect the slope in the regressions to be minus one and the intercept to be the average return on the market. In fact, the truncation of the infinite sum means that the coefficient might be something other than minus one. Hence, we merely test here for the negativity of the coefficient of the dividend-price ratio, looking only to see if it is significant in predicting future dividend changes in the right direction. Because of survivorship bias, the fact that we are looking only at surviving firms would appear to put a possible upward bias on the intercept, and hence we do not focus on the intercept here.

Table 1 Panel A reports the summary results of the 49 individual regressions. For K = 10, the average coefficient and the average t-statistic on D_t/P_t are −0.440 and −2.11, respectively.
We find that for K = 10, 42 out of the 49 firms had negative coefficients as predicted by the theory, and 20 of them are statistically significant at the 5% significance level.10 As K is increased, the average t-statistic and R-squared decrease. The coefficient of D/P always has the negative sign predicted by the Gordon model, though it is far from −1.00. Thus, D/P does seem to forecast future dividend growth, although the coefficient is shrunken from minus one towards zero, as one might expect if there is some extraneous noise in D/P (caused, say, by investor fads), causing an errors-in-variables bias in the coefficient.

the dividend-price ratio predicts future dividend growth.

9. The variance matrix Ω of the error term in the stacked regression, for computation of the variance matrix of the coefficients, (X′X)⁻¹(X′ΩX)(X′X)⁻¹, consists of 49×49 blocks, one for each firm pair. Each block has the usual Hansen-Hodrick form, but we allow for cross-covariance in the off-diagonal blocks.

Table 1 Panel B shows the results when the regressions were pooled, so that there are (except where K = 75) many more observations in the regression than in Panel A and hence more power to the test. In the K = 75 case, the limiting case with our 76 annual observations, the regression reduces to a simple cross-section regression of the 49 firms for t = 1926. Since there are only 49 observations in the K = 75 case, the test is not powerful here, and we report it only for completeness. For K = 10, 15, 20, and 25 the t-statistic is highly significant and negative. As K is increased, the coefficient of the dividend-price ratio decreases, and at K = 75 the coefficient is very close to its theoretical value of −1.00 (though poorly measured, since only 1926 D/P are used).
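The errors-in-variables attenuation mentioned above is easy to illustrate numerically: if the observed dividend-price ratio is the fundamental ratio plus extraneous noise, OLS of the dividend-growth proxy on the observed ratio yields a slope shrunken from −1 toward zero. A sketch with simulated data (all magnitudes are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 5000, 0.064
fundamental = rng.uniform(0.02, 0.10, n)          # "true" dividend-price ratios
g_hat = r - fundamental                           # eq. (1) holding exactly: slope -1
observed = fundamental + rng.normal(0.0, 0.02, n) # fad-driven noise added to D/P

X = np.column_stack([np.ones(n), observed])
beta = np.linalg.lstsq(X, g_hat, rcond=None)[0]
# attenuation factor var(f)/(var(f)+var(noise)) = 5.3e-4/9.3e-4, so the
# estimated slope is near -0.57 rather than -1
assert -1.0 < beta[1] < -0.3
```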
These results provide impressive evidence for the Gordon model as applied to individual firm data, in the sense that the estimated coefficients are significantly negative, though usually above minus one.

Table 1 Panel C shows the results when the regressions were put together into one regression (by using an equally-weighted portfolio) so that we can test the Gordon model as applied to an index of the 49 stock prices. The coefficient of the dividend-price ratio has a positive sign, the wrong sign from the standpoint of the Gordon model, and it is no longer statistically significant except for K = 25. The wrong sign mirrors the negative result for the efficient markets model that Campbell and Shiller [1988a] found with a much broader stock market index.

10. Those results, not reported in the table to conserve space, are available from the authors on request.

The t-statistics reported for Panel C are for the null hypothesis that the coefficient of D/P is zero; the statistics are much larger against the efficient markets hypothesis that the coefficient equals minus one. However, there is an issue that the distribution of our t-statistics may not approximate the normal distribution if D/P is nonstationary, or nearly so. While our financial theory suggests that the dividend yield should be stationary, in fact the dividend yield is at best slowly mean-reverting. Elliott and Stock [1994] show that the size distortion in the t-statistic caused by near-unit-root behavior may be substantial. Campbell and Yogo [2002] show, however, that if we rule out explosive processes for the dividend-price ratio in regressions like those of Panel C, there is good evidence against market efficiency.

We interpret these results as confirming the Samuelson dictum.
In our results there is substantial evidence that individual-firm dividend-price ratios predict future dividend growth in the right direction, but no evidence that aggregate dividend-price ratios do.

A Look at the Data

Figure 1 shows a scatter diagram of ĝ_t^D for K = 25 against D_t/P_t for all 2,499 observations, that is, for all 49 firms and for t = 1926 to 1976 (1976 being the last year for which 25 subsequent years are available). The range of D_t/P_t is from 0.0 to 0.4, several times as wide as the range of the dividend-price ratio for the aggregate stock market over the sample period. Over this entire range there is a distinct negative slope to the curve, as the efficient markets theory would predict: firms with lower dividend-price ratios did indeed have higher subsequent dividend growth, offering some evidence for micro efficiency. Plots for K = 10, 15, and 20 look very similar to Figure 1.

One should be cautious in interpreting this diagram, however. Note that by construction all points lie on or above a line from (0,0) with a slope of minus one, reflecting the simple fact that dividends cannot go below zero. The efficient markets model, together with our assumption that dividends beyond K years into the future cannot be forecasted, instead says that the scatter should cluster around a line from (0, r−c) with a slope of minus one, a line that lies above the other line and is parallel to it, where c is the mean of the truncated portion of the present value formula, as well as any possible survivorship bias. But our results are not guaranteed by construction. Indeed, when the scatter of points for the aggregated firms (corresponding to the third set of regressions, Panel C in the table) is plotted, it lies above this line but does not have a negative slope.

This line from (0,0) with a slope of minus one is easily spotted visually as the lower envelope of the scatter of points.
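The envelope property is mechanical and can be checked directly: since future dividends are nonnegative, the truncated sum of equation (2) is minimized when all future dividends are zero, giving ĝ_t^D ≥ −(D_t/P_t)/(1+r) > −D_t/P_t, so every point lies on or above the slope-minus-one line through the origin. A quick numerical check (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
r, K = 0.064, 25
for _ in range(1000):
    D0 = rng.uniform(0.1, 2.0)                  # current dividend
    P0 = rng.uniform(5.0, 50.0)                 # current price
    # nonnegative future dividend path, arbitrary otherwise
    D = np.concatenate([[D0], rng.uniform(0.0, 3.0, K)])
    g_hat = np.sum(np.diff(D) / (P0 * (1 + r) ** np.arange(1, K + 1)))
    assert g_hat >= -D0 / P0                    # on or above the envelope line
```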
Any observation of D_t/P_t that is followed by a dramatic drop in dividends (to approximately zero for K years) will lie approximately on this line. Some of the most visible points on the scatter represent such firms. For example, the extreme right outlier on the scatter, representing Schlumberger Ltd. in 1931, represents nothing more than a situation in which the firm attempted to maintain its dividend level in spite of rapidly declining fortunes. Its stock price fell precipitously after the 1929 crash, converting a roughly 8% dividend into a 40% dividend, which was cut to zero in 1932 and held there for many years. This extreme case may be regarded as a victory for the efficient markets model, in that it does show that the dividend-price ratio predicts future dividend growth, though it is not the usual case we think of when we consider market efficiency. It is plain from the fact that the points are so dense around the lower envelope line that much of the fit derives from firms whose dividends dropped sharply.

Another simple story is that of firms that pay zero dividends. Note that all firm-year pairs with zero dividends can be seen arrayed next to the vertical axis, and that the dividend growth for these firms tends to be higher than for the firms with non-zero dividends, as the dynamic Gordon model would predict. Firms with zero dividends showed higher dividend growth as measured by ĝ_t^D: the mean ĝ_t^D for the zero-dividend observations is 0.149, which is greater than r = 0.064, possibly reflecting the selection bias for surviving firms noted above. The fact that these points along the vertical axis cluster above 0.064 might also be considered a sort of approximate victory for market efficiency. Also note that even if we deleted these firms, there would still be a pronounced negative slope to the scatter.
The predictive ability of the dynamic Gordon model is not just due to the phenomenon of zero dividends. Even if we delete all observations of zero dividends and look at dividend-price ratios less than the discount rate r, that is, less than 0.064, the slope of the regression line for K = 25 changes to −0.479, not much closer to zero. This means that there are also observations of a low but non-zero dividend-price ratio successfully predicting above-normal dividend growth.

Regression diagnostics following Belsley, Kuh and Welsch [1980] revealed that no particularly influential observations were responsible for the results in the pooled regressions.

Summary

With these data on the universe of U.S. individual firms on the CRSP tape with continuous data since 1926, Samuelson's dictum appears to have some validity. Over the interval of U.S. history since 1926, individual-firm dividend-price ratios have had some significant predictive power for subsequent growth rates in real dividends: this is evidence of micro efficiency. A look at a scatter plot of the data confirms that this result is not exclusively due to zero dividends. Moreover, when the 49 firms are aggregated into an index, the dividend-price ratio gets the wrong sign in the regressions and is usually insignificant. If anything, high aggregate dividend-price ratios predict high aggregate dividend growth, and so there is no evidence of macro efficiency.11

The very negative results on the efficiency of the stock market that were reported by LeRoy and Porter [1981] and Shiller [1981] appear to apply much more to the aggregate stock market than to individual stocks.

11. The results are consistent with those of Vuolteenaho [2002], who uses firm-level data in conjunction with a vector autoregressive model and a variance decomposition along lines first described in Campbell [1991] to conclude that firm-level stock returns are predominantly driven by fundamentals.
Cohen, Polk and Vuolteenaho [2002] provide a similar variance decomposition of firm-level price-to-book ratios, finding that fundamentals predominate. Jung [2002] finds, using variance and covariance ratio tests, that individual stock returns show quite different mean-reversion characteristics from the portfolio of them.

Figure 1. Scatter diagram showing dividend-price ratio D_t/P_t (horizontal axis, 0 to 0.5) and subsequent 25-year dividend growth ĝ_t^D (equation 2, K = 25; vertical axis, −0.5 to 1.5); 2,499 observations shown, comprising 49 firms, t = 1926 through 1976. [Plot itself not reproduced.]

Table 1. Results of Regressions of Future Dividend Growth on Current Dividend-Price Ratio: ĝ_t^D = α + β(D_t/P_t) + ε_t

                                        Coefficient of D_t/P_t   t-statistic   R-squared
A. Average of 49 separate regressions
  i)   K = 10, n = 66 each regression          −0.440               −2.11        0.182
  ii)  K = 15, n = 61 each regression          −0.498               −1.85        0.167
  iii) K = 20, n = 56 each regression          −0.490               −1.67        0.173
  iv)  K = 25, n = 51 each regression          −0.499               −1.55        0.162
B. Pooled over all firms
  i)   K = 10, n = 3,234                       −0.589               −5.91        0.174
  ii)  K = 15, n = 2,989                       −0.648               −5.69        0.217
  iii) K = 20, n = 2,744                       −0.666               −4.82        0.216
  iv)  K = 25, n = 2,499                       −0.711               −4.84        0.149
  v)   K = 75, n = 49                          −1.087               −1.41        0.041
C. Using the portfolio of the 49 firms
  i)   K = 10, n = 66                           0.336                1.79        0.084
  ii)  K = 15, n = 61                           0.322                1.52        0.063
  iii) K = 20, n = 56                           0.463                1.84        0.101
  iv)  K = 25, n = 51                           0.697                2.40        0.175

References

Belsley, David A., Edwin Kuh, and Roy E. Welsch, Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, New York: Wiley, 1980.

Campbell, John Y., "A Variance Decomposition for Stock Returns," Economic Journal, 101:157-79, 1991.

Campbell, John Y., and Motohiro Yogo, "Efficient Tests of Stock Return Predictability," unpublished paper, Harvard University, 2002.

Campbell, John Y., and Robert J. Shiller, "The Dividend-Price Ratio and Expectations of Future Dividends and Discount Factors," Review of Financial Studies, 1:195-228, 1988(a).

Campbell, John Y., and Robert J.
Shiller, "Stock Prices, Earnings, and Expected Dividends," Journal of Finance, 43:661-76, 1988(b).

Campbell, John Y., and Robert J. Shiller, "Valuation Ratios and the Long-Run Stock Market Outlook," Journal of Portfolio Management, Winter 1998, pp. 11-26.

Campbell, John Y., and Robert J. Shiller, "Valuation Ratios and the Long-Run Stock Market Outlook: An Update," NBER Working Paper No. 8221, 2001, forthcoming in Richard Thaler, editor, Advances in Behavioral Finance II, New York: Sage Foundation, 2003.

Cohen, Randolph, Christopher Polk, and Tuomo Vuolteenaho, "The Value Spread," unpublished paper, Harvard Business School, 2002, forthcoming, Journal of Finance.

Elliott, Graham, and James H. Stock, "Inference in Time Series Regression when the Order of Integration of a Regressor is Unknown," Econometric Theory, 10:672-700, 1994.

Hansen, Lars P., and Robert J. Hodrick, "Forward Exchange Rates as Optimal Predictors of Future Spot Rates: An Econometric Analysis," Journal of Political Economy, 88:829-53, 1980.

Jung, Jeeman, "Efficiency and Volatility of Stock Markets: Mean Reversion Detected by Covariance Ratios," unpublished manuscript, 2002.

LeRoy, Stephen, and Richard Porter, "The Present Value Relation: Tests Based on Variance Bounds," Econometrica, 49:555-74, 1981.

Poterba, James M., and Lawrence H. Summers, "Mean Reversion in Stock Prices: Evidence and Implications," Journal of Financial Economics, 22:26-59, 1988.

Samuelson, Paul A., "Summing Up on Business Cycles: Opening Address," in Jeffrey C. Fuhrer and Scott Schuh, eds., Beyond Shocks: What Causes Business Cycles, Boston: Federal Reserve Bank of Boston, 1998.

Shiller, Robert J., "Do Stock Prices Move Too Much to be Justified by Subsequent Changes in Dividends?" American Economic Review, 71:421-36, 1981.

Shiller, Robert J., Irrational Exuberance, 2nd Edition, New York: Broadway Books, 2000.

Vuolteenaho, Tuomo, "What Drives Firm-Level Stock Returns?" Journal of Finance, 57:233-64, 2002.