Empirical distributions of galactic $\lambda$ spin parameters from the SDSS


Capital Theory and Investment Behavior (1963), by Dale Jorgenson

…short of a correct formulation of this theory that the issue of the validity of the neoclassical theory remains undecided. There is not sufficient space to document this point in detail here, but I will try to illustrate what I would regard as a correct formulation of the theory in what follows.

Stated baldly, the purpose of this paper is to present a theory of investment behavior based on the neoclassical theory of optimal accumulation of capital. Of course, demand for capital is not demand for investment. The short-run determination of investment behavior depends on the time form of the lagged response to changes in the demand for capital. For simplicity, the time form of the lagged response will be assumed to be fixed. At the same time, a more general hypothesis about the form of the lag is admitted than is customary in the literature. Finally, it will be assumed that replacement investment is proportional to capital stock. This assumption, while customary, has a deep justification, which will be presented below. A number of empirical tests of the theory are presented, along with an analysis of new evidence on the time form of the lagged response and on changes in the long-run demand for capital resulting from changes in underlying market conditions and in the tax structure.
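The two assumptions in this passage, a fixed distributed lag on changes in desired capital and replacement investment proportional to the capital stock, can be made concrete in a few lines of code. The sketch below is illustrative only and is not Jorgenson's own specification: the desired-capital path `k_star`, the lag weights `mu`, and the depreciation rate `delta` are all invented for the example.

```python
import numpy as np

# A minimal sketch of the two assumptions above: net investment as a fixed
# distributed lag on changes in desired capital, and replacement investment
# proportional to the capital stock. All numbers are invented for illustration.
k_star = np.array([100.0, 104.0, 110.0, 118.0, 123.0, 125.0, 126.0])  # desired K*_t
mu = np.array([0.5, 0.3, 0.2])   # fixed time form of the lagged response
delta = 0.08                     # depreciation rate: replacement = delta * K

dk_star = np.diff(k_star)        # changes in desired capital, one per period
capital = k_star[0]              # start the stock at the initial desired level

for t in range(len(mu), len(k_star)):
    # Net investment closes past gaps between desired and actual capital.
    net = sum(mu[tau] * dk_star[t - 1 - tau] for tau in range(len(mu)))
    gross = net + delta * capital          # gross = net + replacement
    capital += net
    print(f"t={t}: net investment {net:5.2f}, gross investment {gross:5.2f}")
```

Because the lag weights sum to one, net investment eventually closes the whole gap between desired and actual capital; replacement investment then carries all the steady-state gross investment, which is the sense in which the proportionality assumption matters for short-run dynamics.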

The Dynamical Equilibrium of Galaxy Clusters

arXiv:astro-ph/961124v1, 25 Nov 1996

The Dynamical Equilibrium of Galaxy Clusters

R. G. Carlberg, H. K. C. Yee, E. Ellingson, S. L. Morris, R. Abraham, P. Gravel, C. J. Pritchet, T. Smecker-Hane, F. D. A. Hartwick, J. E. Hesser, J. B. Hutchings, & J. B. Oke

ABSTRACT

If a galaxy cluster is effectively in dynamical equilibrium, then all galaxy populations within the cluster must have distributions in velocity and position that individually reflect the same underlying mass distribution, although the derived virial masses can be quite different. Specifically, within the CNOC cluster sample the virial radius of the red galaxy population is, on the average, a factor of 2.05 ± 0.34 smaller than that of the blue population. The red galaxies also have a smaller RMS velocity dispersion, by a factor of 1.31 ± 0.13 within our sample. Consequently, the virial mass calculated from the blue galaxies is 3.5 ± 1.3 times larger than that from the red galaxies. However, applying the Jeans equation of stellar-hydrodynamical equilibrium to the red and blue subsamples separately gives statistically identical cluster mass profiles. This is strong evidence that these clusters are effectively equilibrium systems, and it empirically demonstrates that the masses in the virialized region are reliably estimated using dynamical techniques.

Subject headings: galaxies: clusters; cosmology: large-scale structure of universe

1. Introduction

The primary goal of the CNOC (Canadian Network for Observational Cosmology) cluster redshift survey is to obtain a value of the density parameter, Ω₀, for those components of the mass field (possibly all) that participate in gravitational clustering (Carlberg et al. 1996, hereafter CNOCi). The product of the field luminosity density, j, with its mass-to-light ratio, M/L, estimates the mean mass density of the universe, ρ₀ (Oort 1958), which in ratio to the critical density ρ_c is equal to Ω₀ (Gott & Turner 1976), when calculated in current-epoch co-ordinates.
The crucial quantity is an estimator of the field mass-to-light ratio, for which we use the cluster virial mass-to-light ratio, M_v/L. The main technical concern of the CNOC survey is that there are various possibilities that M_v is a biased estimator of the cluster mass. Besides statistical biases, dissipative processes can cause the cluster galaxies to become more centrally concentrated than the cluster mass. In addition, when estimating the field M/L from the cluster results, allowance must be made for the differences between the cluster and field galaxy populations.

To address these issues, a sample of high X-ray luminosity galaxy clusters was selected from the EMSS catalogue (Henry et al. 1992; Gioia & Luppino 1994). This produces a relatively homogeneous set of 16 clusters that likely contain a substantial virialized component. The sample, at a mean redshift of about 1/3, gives a substantial column in redshift space which we use to calculate the field luminosity density in precisely the same measurement system as the cluster light. The galaxies were selected with no knowledge of their colour or whether they were cluster members, although great care is taken to control the selection effects that are unavoidable to maintain efficiency (Yee, Ellingson & Carlberg 1996).

From these data we calculated the line-of-sight velocity dispersions, σ₁, and virial radii, r_v, of the clusters to derive the virial masses (Binney & Tremaine 1987), GM_v = 3σ₁²r_v. The velocity dispersions are calculated with an iterative technique that statistically removes interlopers by taking advantage of the relatively large foreground and background sample to estimate interloper densities (CNOCi). This technique leads to velocity dispersions that are on the average 7% less than those found from precisely the same data in the redshift range, and about 13% less with a somewhat less restrictive redshift cut, even when evaluated with the iterated bi-weight estimator (Beers, Flynn & Gebhardt 1990). The virial radius is calculated from the data in the redshift range of the cluster with a "ring-wise" potential estimator, which allows correction for the strip sampling and helps reduce the noise (details are given in CNOCi). All these global quantities are derived from the individual galaxy positions, velocities, colors and luminosities, for which objective errors are calculated using the Jackknife method (Efron & Tibshirani 1986). The errors are sufficiently small that the assumption of a symmetric distribution is appropriate, which is verified by comparing to Bootstrap error estimates.

The average cluster virial mass-to-light (r band, k-corrected) ratio is M_v/L_r^k = 289 ± 50 h M⊙/L⊙ (CNOCi). In the field, over the same redshift range, with the same sampling and corrections, the closure mass-to-light ratio is ρ_c/j = 1136 ± 138 h M⊙/L⊙ (in co-moving units, for q₀ = 0.1). There are two additional corrections applied to M_v/L_r^k. The mean luminosity of galaxies brighter than an M_r^k of −18.5 mag is 0.11 ± 0.05 lower in the cluster than in the field, and the analysis of the mass profile (Carlberg, Yee & Ellingson 1997, hereafter CNOCii) showed that M_v needed to be multiplied by 0.82 ± 0.14, which we attribute to neglecting the surface term in the virial mass estimator. Making these corrections leads to a redshift-adjusted Ω₀ = 0.19 ± 0.06.

Our conclusion that the stellar-hydrodynamical (SHD) mass profile of the clusters averaged together closely follows the number density profile of the galaxies (CNOCii) rests on the assumption that the clusters are effectively in equilibrium, such that the Jeans equation can be used to derive the mass profile from the radial and velocity distribution of any tracer population.
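As an illustration of the arithmetic chain described above, the sketch below evaluates the virial mass estimator GM_v = 3σ₁²r_v and the Oort-style Ω₀ estimate from the quoted mass-to-light ratios. The σ₁ and r_v inputs are invented, and the way the 0.82 and 0.11 corrections combine is our reading of the text rather than a verbatim reproduction of the CNOC calculation, although it does reproduce the quoted 0.19.

```python
G = 4.301e-9  # gravitational constant in Mpc (km/s)^2 / Msun

def virial_mass(sigma1, r_v):
    """Virial mass from G M_v = 3 sigma_1^2 r_v (sigma1 in km/s, r_v in Mpc)."""
    return 3.0 * sigma1**2 * r_v / G

# Illustrative cluster values (assumed, not CNOC measurements):
print(f"M_v ~ {virial_mass(1000.0, 1.5):.2e} Msun")

# Oort-style Omega_0 from the quoted ratios. How the two corrections combine
# is our reading of the text, but it reproduces the quoted value of 0.19.
ml_cluster   = 289.0    # M_v / L_r^k, in h Msun/Lsun
surface_term = 0.82     # multiplicative correction to M_v (surface term)
lum_offset   = 0.11     # cluster mean luminosity lower than the field by this fraction
closure_ml   = 1136.0   # rho_c / j, in h Msun/Lsun
omega0 = ml_cluster * surface_term / ((1.0 + lum_offset) * closure_ml)
print(f"Omega_0 ~ {omega0:.2f}")
```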
In this paper we demonstrate that two radically different tracer populations are consistent with the same underlying mass profile, which we take as strong support of the equilibrium assumption. In the following section we describe the division of the sample into two completely disjoint blue and red subsamples, whose surface density and projected velocity dispersion profiles are measured and demonstrate the gross differences in the calculated virial masses. In Section 3 we derive the stellar-hydrodynamical potential-generating mass profiles in which the galaxies are orbiting. The implications of the statistical equality of the two profiles are discussed in the final section. All calculations in this paper assume H₀ = 100 km s⁻¹ Mpc⁻¹ and q₀ = 0.1, although the results are either independent of, or not sensitive to, these choices.

2. The Blue and Red Subsamples

The CNOC cluster redshift survey catalogues (Yee, Ellingson & Carlberg 1996) contain ∼2600 galaxies having Gunn r magnitudes, g−r colors, and redshifts. The sample is split at the redshift-corrected color (g−r)_z = (g−r)[1.2 + 2.33(z−0.3)]⁻¹ of 0.7 (CNOCi). This is approximately the color of an Sab galaxy, which roughly divides galaxies into spheroid-dominated, "old" stellar systems and disk-dominated, "young" stellar systems. Fourteen clusters are averaged together after removing two "binary" clusters. To a limit of M_r^k = −18.5 mag, about 70% of the cluster galaxies fall into the red subsample.

The methods of analysis are precisely the same as used previously for the full sample (CNOCii). The velocities are normalized to the σ₁ of each cluster calculated from galaxies of all colors, so that the two subsamples have a common reference. The RMS velocity dispersions of the red and blue galaxies are 0.97 ± 0.04 and 1.27 ± 0.11, respectively, in units where the velocities are normalized to the velocity dispersions of all the galaxies in each cluster. That is, the velocity dispersion of blue cluster galaxies is about 30% higher than for red galaxies (Rood et al. 1972; Kent & Gunn 1982; Stein 1996). The projected radii are normalized to r₂₀₀, the radius within which the mean interior density is 200 times the critical density.

The projected surface number density profiles, Σ_N(R), of the two subsamples are displayed in Figure 1. The profiles are fitted to the projection of the density function (Hernquist 1990),

ν(r) = A / [r (r + a)³],  (1)

and the projected velocity dispersions of Figure 2 are modelled as

σ_p²(R) Σ_N(R) = 2 ∫_R^∞ (1 − β R²/r²) ν(r) σ_r²(r) r / √(r² − R²) dr,  (2)

where β = 1 − σ_θ²/σ_r² is the velocity anisotropy parameter. The radial velocity dispersion is described with a fitted parametric form σ_r²(r), with normalization B (3), constrained by the data of Figure 2; the mass profile then follows from the Jeans equation,

M_SHD(r) = −(σ_r² r / G) [d ln ν/d ln r + d ln σ_r²/d ln r + 2β].  (4)

We will call this mass the stellar-hydrodynamical mass. The resulting M_SHD(r) from the red and blue galaxies are shown in Figure 3 for β = 0 and 1/2. The quantity plotted is the virial mass-to-light bias,

b_Mv(r) = [M_SHD(r)/L(r)] / (M̃_v/L̃),  (5)

where L̃ is an arbitrary normalization of the total light that cancels between numerator and denominator. The quantity M̃_v is the normalization of the virial mass of the dimensionless sample as a whole, M̃_v = 3σ̃₁²r̃_v, where σ̃₁ = 1.05 ± 0.04 and r̃_v = 3.35 ± 0.32. If the virial mass calculated from the sample as a whole gave the correct M/L at all radii, then b_Mv(r) = 1 everywhere. If the tracer population follows the true mass profile but with some scale error, then b_Mv will be a constant other than unity. Figure 3 shows that beyond about 0.3 r₂₀₀ the mass profiles deduced from the two subsamples are identical within their errors (about 15 and 30% for the red and blue mass profiles, respectively). Figure 4 shows the value of b_Mv evaluated at r₂₀₀, demonstrating that the blue and red subsamples give statistically identical values. Furthermore, the value is always less than unity, as we found for the sample as a whole (CNOCii), which has the implication that, although the galaxy numbers fairly accurately trace the cluster mass profile, the virial mass-to-light ratio is always an overestimate of the mean M/L inside r₂₀₀.
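A minimal numerical sketch of the stellar-hydrodynamical mass of Eq. (4) is given below, assuming a Hernquist-form tracer density and a gently falling radial dispersion profile; all parameter values are illustrative and are not the CNOC fits.

```python
import numpy as np

# Sketch of the stellar-hydrodynamical mass of Eq. (4) for an assumed tracer
# density and radial dispersion profile; the values are illustrative only.
G = 4.301e-9  # Mpc (km/s)^2 / Msun

def m_shd(r, nu, sigma_r, beta):
    """M(r) = -(sigma_r^2 r / G)(dln nu/dln r + dln sigma_r^2/dln r + 2 beta).

    r       : radii in Mpc, ascending
    nu      : tracer number density nu(r)
    sigma_r : radial velocity dispersion in km/s
    beta    : velocity anisotropy parameter
    """
    lnr = np.log(r)
    dln_nu = np.gradient(np.log(nu), lnr)
    dln_s2 = np.gradient(np.log(sigma_r**2), lnr)
    return -(sigma_r**2 * r / G) * (dln_nu + dln_s2 + 2.0 * beta)

r = np.linspace(0.1, 3.0, 200)                # Mpc
a = 0.5                                       # Hernquist scale radius (assumed)
nu = 1.0 / (r * (r + a) ** 3)                 # Hernquist-form tracer density
sigma_r = 800.0 * (1.0 + r / a) ** -0.25      # gently falling dispersion (assumed)

for beta in (0.0, 0.5):
    m = m_shd(r, nu, sigma_r, beta)
    print(f"beta={beta}: M_SHD(1 Mpc) ~ {np.interp(1.0, r, m):.2e} Msun")
```

Running this for β = 0 and 1/2 mirrors the two curves plotted per subsample in Figure 3: a larger β makes the same tracer data imply a different enclosed mass, which is why the anisotropy assumption is flagged in the conclusions as the first thing a larger sample would refine.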
There is no statistically significant radial gradient of b_Mv(r) for r ≲ r₂₀₀.

There are several important conclusions to be drawn from Figure 3. First, it provides strong evidence that the blue and red galaxies are sufficiently in equilibrium with the cluster mass distribution that these dynamical methods recover the true mass profile. Second, the fitted scale radius of the mass profile (taken to be equal to that of the full sample), a = 0.66 ± 0.09, is slightly more extended than that of the red galaxies (Tyson & Fischer 1995), a = 0.56 ± 0.10, but is substantially more compact than the blue galaxy distribution, a = 1.82 ± 0.27, all measured in r₂₀₀ units. The differences in scale radii imply that neither the blue nor the red galaxies are distributed like the mass, so the virial masses calculated from these subsamples are not correct.

4. Conclusions

The densities and velocity dispersions of the red and blue galaxy subsamples, although very different from each other, imply a statistically identical mass profile for the cluster potential. This is necessary if both populations are in equilibrium with the potential, validating our use of the Jeans equation. This mass profile is statistically identical to the number density distribution of all the galaxies of our full sample. The limiting factor in the precise numerical agreement is that there are relatively few blue galaxies in clusters, meaning that their density and velocity dispersion profiles are less accurately measured.

There are problems and issues that cannot be addressed with a sample of this size. A much larger sample would likely find that a common value of β would not work for both red and blue galaxies, since it is expected that on the average blue galaxies will avoid the core, hence near the centre have smaller β than the red galaxies, and at large radius include many objects on their first cluster crossing, hence have higher β than the red galaxies. Furthermore, the galaxies themselves are not invariant mass points. The blue galaxies are being altered by the cluster environment such that some of their members are likely leaving the blue population to join the red population (Abraham et al. 1996). The virial mass calculated from the full sample, empirically adjusted for its measurement biases, will correctly estimate the mass enclosed.

With these two subsamples we have now tested each step in the chain of logic which supports our corrected, population-adjusted value of Ω₀ = 0.19 ± 0.06. There is no compelling evidence for any remaining systematic errors of the cluster M/L as an estimator of the field value over the redshift range of this sample.

We thank the Canadian Time Assignment Committee of the CFHT for allocations of observing time, and the CFHT organization for the technical support which made these observations feasible.
Funding was provided by NSERC and NRC of Canada.

REFERENCES

Abraham, R. G., Smecker-Hane, T. A., Hutchings, J. B., Carlberg, R. G., Yee, H. K. C., Ellingson, E., Morris, S., Oke, J. B., & Davidge, T. 1996, ApJ, 471, 694
Beers, T. C., Flynn, K., & Gebhardt, K. 1990, AJ, 100, 32
Binney, J., & Tremaine, S. 1987, Galactic Dynamics (Princeton: Princeton University Press)
Carlberg, R. G., Yee, H. K. C., Ellingson, E., Abraham, R., Gravel, P., Morris, S. M., & Pritchet, C. J. 1996 (CNOCi), ApJ, 462, 32
Carlberg, R. G., Yee, H. K. C., & Ellingson, E. 1997 (CNOCii), ApJ, in press
Dressler, A. 1980, ApJ, 236, 351
Efron, B., & Tibshirani, R. 1986, Statistical Science, 1, 54
Gioia, I. M., & Luppino, G. A. 1994, ApJS, 94, 583
Gott, J. R., & Turner, E. L. 1976, ApJ, 209, 1
Henry, J. P., Gioia, I. M., Maccacaro, T., Morris, S. L., Stocke, J. T., & Wolter, A. 1992, ApJ, 386, 408
Hernquist, L. 1990, ApJ, 356, 359
Kent, S., & Gunn, J. E. 1982, AJ, 87, 945
Oort, J. H. 1958, in La Structure et L'Évolution de L'Univers, Onzième Conseil de Physique, ed. R. Stoops (Bruxelles: Solvay Institute), p. 163
Rood, H. J., Page, T. L., Kintner, E. C., & King, I. R. 1972, ApJ, 175, 627
Stein, P. 1996, A&A, in press (astro-ph/9606162)
Tyson, J. A., & Fischer, P. 1995, ApJ, 446, L55
Yee, H. K. C., Ellingson, E., & Carlberg, R. G. 1996, ApJS, 102, 269

Fig. 1.— The projected number density profiles of the blue (solid squares and dotted lines) and red (circles and solid lines) cluster galaxies. The 1σ confidence range from a Bootstrap error estimate is shown.

Fig. 2.— The RMS line-of-sight velocity dispersion, σ_p, as a function of projected radius. The blue galaxies (solid squares and dotted line) have a larger σ_p than the red ones (circles and solid line). The σ_p(R) are fitted with β = [0.0, 0.5].

Fig. 3.— The bias function referred to the total light profile, b_Mv(r), calculated from the mass profiles derived from the blue (dotted line) and red (solid line) subsamples. The upper line for both subsamples is for β = 0, the lower line for β = 1/2.

Fig. 4.— The value of b_Mv evaluated at r₂₀₀. The blue subsample results are denoted with solid squares and the red with open circles.

Laser Diffraction Determination of pMDI Particle Size

Respiratory Drug Delivery 2014 – Cooper et al.

A QbD Method Development Approach for the Ex-actuator Particle Size Distribution (PSD) Determination of pMDIs by Laser Diffraction

Andy Cooper,¹ Chris Blatchford,¹ and Stephen Stein²
¹3M Drug Delivery Systems Ltd, Loughborough, Leicestershire, UK
²3M Drug Delivery Systems, St. Paul, MN, USA

KEYWORDS: pressurized metered dose inhaler (pMDI), quality by design (QbD), laser diffraction, method development, particle size

INTRODUCTION

The aerodynamic particle size distribution (APSD) is a critical quality attribute (CQA) of orally inhaled and nasal drug products (OINDPs). Cascade impactor methodology [1] is typically used to determine this during development and registered product testing. However, this is a time-consuming analysis, which makes it impractical as a rapid screening tool for early phase development and investigations. Laser diffraction (LD) analysis of the ex-actuator plume has previously been proposed as an alternative [2] for geometric particle size distribution (PSD) determination. This is considered appropriate as the measured ex-actuator PSD would be influenced by the active pharmaceutical ingredient (API) within the formulation in three ways. One, the dry API particles would be present in the actuation plume after atomization. Two, droplet formation during atomization may be influenced by the particle size of the API [3]. Lastly, the API concentration will influence the number of particles within each droplet and hence the measured PSD [2, 4]. This publication will focus on the specific considerations for method development/validation of this methodology [5-11], using a quality by design (QbD) approach [11, 12]. Data from these methods have the potential to correlate with APSD data [3, 14], as many variables are common to both techniques [15], despite the difference in the principle of measurement (geometric versus aerodynamic).

METHODOLOGY

Measurements were made with a Sympatec™ instrument, consisting of a Helos (Helium-Neon Laser Optical System)/BF™ laser diffraction sensor with the Inhaler™ dry dispersion accessory (see instrument settings in Table 1). The pMDI unit is manually actuated into the Inhaler accessory, allowing the actuation plume to pass through the laser (He/Ne @ 633 nm), where light is scattered and then focused by a chosen lens onto a detector array. The detector consists of 31 concentric ring elements. The innermost element is referred to as R1 and the outermost element as R31. Scattered light registered on elements R1-R6 is discounted due to beam steer [16]. The signals from all the other detector elements are combined and a PSD is inferred from the scattering pattern based on the chosen optical model, after subtraction of background levels.

Table 1. Sympatec™ instrument settings.
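The inversion from ring-detector signals to a PSD is performed by the instrument software and depends on the chosen optical model; what a method report typically consumes downstream are cumulative volume percentiles such as d(v, 0.5). The sketch below shows only that percentile step, with invented size bins and volume fractions standing in for a real Sympatec output.

```python
import numpy as np

# Sketch: extracting d(v, 0.5), the volume median diameter, from a binned
# volume distribution. The size bins and volume fractions are invented for
# illustration; a real instrument report supplies them.
bin_edges = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 16.0])   # microns
vol_frac  = np.array([0.05, 0.20, 0.40, 0.25, 0.10])    # sums to 1

cum = np.concatenate([[0.0], np.cumsum(vol_frac)])      # cumulative volume

def dv(q):
    """Diameter below which a fraction q of the total particle volume lies."""
    return np.interp(q, cum, bin_edges)

print(f"d(v,0.1) = {dv(0.1):.2f} um")
print(f"d(v,0.5) = {dv(0.5):.2f} um")   # the median tracked in the results below
print(f"d(v,0.9) = {dv(0.9):.2f} um")
```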
RESULTS AND DISCUSSION

There are multiple variables which can influence laser diffraction data, as shown in Figure 1. Aside from the product-related factors, risk assessments of the other variables are required. While some parameters have no impact if they are suitably controlled (e.g., cleaning), other more critical parameters require experimentation to determine their effects (e.g., trigger conditions). The Inhaler accessory was chosen since the droplets are entrained in an air flow which dilutes the sample and reduces potential artefacts such as velocity bias and geometric effects [15].

Figure 1. Fishbone diagram of sources of variability for laser diffraction data.

The length of cylinder used for the Inhaler accessory must be optimized. This dictates the distance between the point of actuation and the laser beam. The optical concentration (OC) must be sufficient for sensitivity purposes but not so high as to cause multiple scattering. Data are shown in Figures 2A and 2B. The similarity in OC for the active and placebo shows that the majority of light did not reach the central detector, due to the propellant shifting the laser beam (beam steer). The median particle size for the placebo decreases with increased distance. This is likely due to the increased evaporation of co-solvent, rather than an indication of multiple scattering. This signal observed for the placebo led to the refractive index of the co-solvent being chosen for the method. Although the OC decreased with increased firing distance due to reduced beam steer, the active PSD remains reasonably consistent [<10% shift in d(v, 0.5)] and is considered to be real. A slight increase in active median particle size is consistent with reduced interference from the co-solvent, and therefore the long cylinder is considered to be the most accurate.

Figure 2. A: Optical concentration. B: d(v, 0.5) data for high strength active and placebo for various cylinder lengths.

The selection of sampling criteria (trigger conditions) is critical [4], and they can also interact with the method of shaking and firing, particularly for a suspension pMDI. Designed experiments are crucial for method optimization and for proving method robustness. Parameters, which are selected via a suitable risk assessment, are evaluated over an appropriate design space, as shown in Figure 3.

Figure 3. PSD data for sample preparation DoE – low strength active.

Validation data for this method are shown in Table 2. Validation was performed on both high and low strength actives. RSD values are higher than those typically observed for a laser diffraction method for an API [17]; however, this is likely due to the inherent product variability (see Figure 1) – measurements are made of volatile droplets with a wide range of velocities, contributing to differing amounts of evaporation per droplet. Increased variability is therefore expected. However, one important factor to limit variability is humidity level [4].

Table 2. Method validation data.

This validated method has been used to try to understand trends in APSD data. The data shown in Table 3 show that the PSD is influenced by the temperature of the product, with trends similar to those observed for ACI data in the literature [18]. The vapor pressure of the formulation increases with temperature, which results in atomization of smaller droplets, which will contain fewer drug particles, hence a decrease in PSD. Faster propellant/co-solvent evaporation, due to the increased temperature, will also result in a decrease in PSD.

Table 3. PSD data for a range of temperatures (n = 6 at each condition) – high strength active.

CONCLUSIONS

A QbD approach to development of laser diffraction (LD) methodology to determine the PSD ex-actuator from a pressurized metered dose inhaler (pMDI) is presented. Robust and repeatable data were obtained; however, the method is inherently more variable than methodology for the PSD determination of the API. The LD methodology can be used to understand trends in cascade impaction data – the industry standard, but more time-consuming, methodology for APSD determination.
REFERENCES

1. USP Chapter <601>: Aerosols, nasal sprays, metered-dose inhalers, and dry powder inhalers.
2. Gonda, I: Development of a systematic theory of suspension inhalation aerosols. A framework to study the effects of aggregation on the aerodynamic behaviour of drug particles, International Journal of Pharmaceutics 1985, 27: 99-116.
3. Pu, Y, Kline, LC, Berry, J: The application of "in-flight" laser diffraction to the particle size characterization of a model suspension metered dose inhaler, Drug Development and Industrial Pharmacy 2011, 37 (5): 552-58.
4. Cooper, A, Bell, T: Monitoring of droplet size changes in a suspension pMDI by laser diffraction on a Sympatec™ instrument. In Drug Delivery to the Lungs 19, 2008.
5. Mitchell, JP, Nagel, MW, Nichols, S, Nerbrink, O: Laser diffractometry as a technique for the rapid assessment of aerosol particle size from inhalers, Journal of Aerosol Medicine 2006, 19 (4): 409-33.
6. ICH Harmonised Tripartite Guideline Q2(R1) (1994): Validation of analytical procedures, text and methodology.
7. ISO Standard 13320-1 (2009): Particle size analysis, laser diffraction methods.
8. Ph. Eur. Chapter 2.9.31: Particle size analysis by laser light diffraction.
9. USP Chapter <429>: Light diffraction measurement of particle size.
10. Ward-Smith, RS, Gummery, N, Rawle, AF: Validation of wet and dry laser diffraction particle characterisation methods. Malvern Instruments Ltd. /malvern/kbase.nsf/allbyno/KB000167/$file/Laser%20Diffraction%20Method%20Validation.pdf.
11. Rawle, A, Kippax, P: Setting new standards for laser diffraction particle size analysis. Malvern Instruments Ltd. /malvern/kbase.nsf/allbyno/KB002403/$file MRK 1399-01.pdf.
12. Schweitzer, M, Pohl, M, Hanna-Brown, M, Nethercote, P, Borman, P, Hansen, G, Smith, K, Wegener, G: Implications and opportunities of applying QbD principles to analytical measurements, Pharmaceutical Technology 2010, 34 (2): 52-59.
13. Borman, P, Chatfield, M, Jackson, P, Laures, A, Okafo, G: Reduced-method robustness testing of analytical methods driven by a risk-based approach, Pharmaceutical Technology 2010, 34 (4): 72-86.
14. Jones, SA, Martin, GP, Brown, MB: High-pressure aerosol suspensions – A novel laser diffraction particle sizing system for hydrofluoroalkane pressurised metered dose inhalers, International Journal of Pharmaceutics 2005, 302: 154-65.
15. Blatchford, C: From powder to patient – optimisation of particle sizing techniques. In Drug Delivery to the Lungs 24, 2013.
16. Ranucci, J: Dynamic plume – particle size analysis using laser diffraction, Pharmaceutical Technology 1992, 16: 108-14.
17. Cooper, A, Blatchford, C, Kelly, M: Laser diffraction methodology for particle size distribution (PSD) determination during pMDI product development – A QbD approach. In Respiratory Drug Delivery Europe 2013. Volume 2. Edited by Dalby, RN, Byron, PR, Peart, J, Suman, JD, Young, PM, Traini, D. DHI Publishing; River Grove, IL: 2013: 197-202.
18. Stein, S, Cocks, P: Size distribution measurements from metered dose inhalers at low temperatures. In Respiratory Drug Delivery Europe 2013. Volume 2.
Edited by Dalby, RN, Byron, PR, Peart, J, Suman, JD, Young, PM, Traini, D. DHI Publishing; River Grove, IL: 2013: 203-08.

Product and destination mix in export markets

Rev World Econ (2013) 149: 23–53. DOI 10.1007/s10290-012-0136-z

ORIGINAL PAPER

Product and destination mix in export markets

João Amador · Luca David Opromolla

Published online: 9 October 2012. © Kiel Institute 2012

Abstract  This article studies the joint destination and product strategies of exporters, using the universe of export transactions for firms located in Portugal in the period 1995–2005. The article breaks down the annual growth rate of total exports along different margins and details choices made by multi-product, multi-destination firms regarding their export portfolio. In addition, the article looks at similar features for the subsample of new exporters. We find that both the firm-level extensive and intensive margins are important in driving the year-to-year variation in aggregate exports. However, variation over time in the sales of continuing exporters is mainly driven by their sales in continuing destinations. In addition, a product's export tenure within a firm varies largely across currently exported products, in the context of an intense activity of product and destination switching. Moreover, the higher the importance of a product, the more its sales are concentrated in the firm's top destination. Finally, the article finds that, while continuing exporters enter new markets mainly by selling old products, new exporters access new destinations mainly by exporting new products.

Keywords  Multi-product firm · Product scope · Market penetration · Exports

JEL Classification  F1 · L25 · D21

J. Amador: Nova School of Business and Economics, Campus de Campolide, 1099-032 Lisbon, Portugal. E-mail: jamador@bportugal.pt
L. D. Opromolla: Research Department, Central Bank of Portugal, R. do Ouro, 27, 1100 Lisbon, Portugal. E-mail: luca.opromolla@
L. D. Opromolla: UECE (Research Unit on Complexity and Economics), Technical University of Lisbon, Rua Miguel Lupi 20, 1249-078 Lisbon, Portugal

1 Introduction

The empirical research in international trade has made significant progress in the characterization of firms' exporting decisions. This type of research is progressively providing insights on how firms behave in international markets, and it has unveiled a complex set of potential export determinants [see Wagner (2012b)]. Existing research ranges from the impact of heterogeneity at the firm level to the different dimensions of exporting decisions, including the choice of destinations and products. Examples of articles exploring the determinants of the geographical expansion of exporters are Eaton et al. (2004, 2007, 2011). In parallel, examples of articles exploring the product dimension for exporting firms are Arkolakis and Muendler (2011), Bernard et al. (2007) and Iacovone and Javorcik (2010). A related strand of research focuses on multi-product, multi-destination firms, which have been identified as strong contributors to total export activity. One key example of the literature on multi-product firms and product switching is Bernard et al. (2010).
Finally, special attention has been given to developments in new exporters. For example, Wagner (2012c) performs an analysis based on cohorts of new exporters in Germany. All these different strands of research have been made possible by the availability of detailed firm-level databases for several countries.

A firm's export mix is the result of a complex combination of factors. The decision of which products to offer in each market depends on the attributes of firms, destinations and products. Theoretical models are progressively incorporating these elements, departing from the initial contributions of Melitz (2003) and Yeaple (2005) and evolving to richer frameworks like the one proposed by Bernard et al. (2011). This latter model allows for heterogeneity in ability across firms and in product attributes within firms, generating endogenous entry and exit decisions and an optimal export portfolio for each market. As in Melitz (2003), one key ingredient is the existence of sunk entry costs in export markets, which reveal firms' profitability. Another important aspect underlying firms' exporting decisions relies on search and learning. Before making decisions on expansion, firms can learn from other domestic firms operating in those foreign markets or from their own experience in other markets, as argued by Fabling et al. (2011). An example of a theoretical model incorporating these features is Eaton et al. (2012), where firms take stock of the available information, update their beliefs concerning the scope for export profits and adjust the intensity of search efforts accordingly, seeking to maximize expected profit streams.

In this paper we take a fully empirical approach and focus on the joint destination and product strategies of Portuguese manufacturing exporters. The Portuguese case is particularly interesting: EU accession in 1986 significantly increased trade flows and participation, and its experience can prove relevant for other small countries taking part in economic integration processes. The paper starts by quantifying the role of firm, destination and product margins in explaining the annual growth rate of total exports. Next, it focuses on the choices of multi-product, multi-destination firms, which are mostly experienced exporters. In addition, the article provides insights on product and destination switching, the relation between destination and product ranks and the choice of new markets. These empirical facts result from a complex combination of firm, destination and market attributes with search and learning processes at the level of the firm. The third part of the paper focuses on the joint destination and product choices of new exporters. The relevant questions are the importance of the destination and product margins for their expansion and their choices regarding entrance into new markets.

Overall, a set of stylized facts is provided, some confirming existing empirical research, while others are new to the literature. The joint analysis of the destination and product dimensions has barely been discussed in previous work. Therefore, there is substantial scope for the identification of new stylized facts that may lead to future research. We believe that some of the new stylized facts presented are informative for developing a dynamic theory of multi-product exporters, potentially integrating joint destination and product dimensions as in Bernard et al. (2011), but also adding components related to search and learning as in Eaton et al. (2012).
Results show that roughly one quarter of the variation in firms' exports is explained by variation in the number of destinations served, and that higher sales in a destination are mainly due to the product intensive margin, i.e., higher product sales instead of sales of more products. In our context, the different extensive margins in total exports reflect the foreign sales attributed to new exporters, new destinations or new products, while the different intensive margins reflect exports attributed to existing firms, existing markets or existing products. In addition, both the firm-level extensive margin (entry and exit of exporters) and intensive margin (sales of continuing exporters) are important in driving the year-to-year variation in aggregate exports. This result is partly consistent with evidence from Eaton et al. (2007) for Colombia but reveals a much more important role for the extensive margin, possibly due to the fact that we restrict attention to manufacturing firms.¹ Variation over time in the sales of continuing exporters is instead mainly driven by the intensive margin at the destination level, i.e., by variation in the sales of continuing exporters in continuing destinations. Similarly, the latter closely follows the sales of continuing products, by continuing exporters in continuing destinations, i.e., the intensive margin at the firm-destination-product level. At all levels (firm, destination, and product) the level of churning is quite high, implying that gross entry and exit flows are much bigger than net flows. Moreover, we find that continuing firms enter new markets mainly by selling old products, i.e., products that were previously sold somewhere by the same firm. The article also confirms that multi-product and multi-destination exporters (which do not necessarily coincide) are the majority and account for a more than proportional share of total exports. In addition, the range of products that they export is very diversified, spanning multiple 2-digit sectors. In this respect, our results are consistent with the recent literature on carry-along trade (Bernard et al. 2012).

¹ In a previous version of this article, which circulated under the same name, the set of firms included in the analysis accounted for all merchandize exporters and not just those that belong to the manufacturing sector, i.e., it included services firms that export goods (notably retailers and wholesalers). In this context, we find that the intensive margin in the firm dimension explains a larger share of the year-to-year growth rate of aggregate exports.
This article also reinforces the empirical evidence on the skewness of the product exports distribution within firms. Bernard et al. (2011) show that, under the assumption of a Pareto distribution of product attributes, their model implies a linear relationship between the log rank of products in firms' exports and their log share of exports within destinations. Our results are in line with their theoretical implication and with the supporting empirical evidence they provide using US data. However, we take two more steps. First, we show that a very similar pattern holds for the log rank of destinations in firms' exports and the log share of firms' exports within products. Second, joining the product and destination dimensions, we show that the results in terms of the degree of skewness and the Pareto-like behavior of the within-firm exports distribution across destinations carry through when considering separately products that rank differently within a firm. Further, we show that the degree of skewness is positively related to the product rank. In other words, for each of a firm's top three products, the distribution of sales across destinations resembles a Pareto distribution, but with different slope coefficients. Therefore, while the firm's top product is sold mainly in the firm's top destination, the second and third top products are sold more homogeneously across destinations. These novel features identified in the data should be taken into account when developing the next generation of trade models dealing with multi-product and multi-destination firms.

Furthermore, we find that products vary a lot in terms of tenure within the firm, which is a consequence of frequent product (and destination) switching, with important effects in terms of the growth of aggregate exports as well. Overall, this suggests the importance of extending the current models of multi-product and multi-destination firms to a dynamic setting, where products and destinations are added and dropped in the steady state. The literature on industry dynamics, which emphasizes firms' idiosyncratic shocks to productivity [e.g., Luttmer (2007)], and some of its extensions to trade [e.g., Irarrazabal and Opromolla (2008) and Arkolakis (2009)], can be a source of inspiration. Modelling firms' search efforts directed at learning about the appeal of their products in the foreign market is a complementary way of capturing the intense and frequent changes in the product and destination export portfolio [see Eaton et al. (2012)].

Finally, as for the dynamics of new exporters, we show that the intensive margin (sales of continuing products in continuing destinations) has a much bigger weight in explaining the growth rate of exports, when compared with experienced exporters. Moreover, new exporters enter new destinations mainly by selling new products.

The article is based on an extensive transaction-level database for Portuguese exporters over the period 1995–2005. Given the nature of the analysis performed in the article, the use of this type of data is absolutely key. Other studies based on transaction-level data are, for example, Eaton et al. (2007) for Colombia, Eaton et al. (2011) for France, Muûls and Pisu (2009) for Belgium, Bernard et al. (2009) for the United States, Mayer and Ottaviano (2008) for a sample of European firms, Manova and Zhang (2009) for China, Békés et al. (2011) for Hungary and Wagner (2012a) for Germany.
The article is organized as follows. The next section describes the database used. Section 3 takes a broad view and identifies the joint role of different margins in aggregate export growth. Section 4 provides evidence on the export behavior of multi-product and multi-destination firms, looking at their joint product and destination portfolio and at product and destination switching. Section 5 focuses on the behavior of cohorts of new exporters, looking at the joint role of the destination and product dimensions in their expansion and their choice of new markets. Section 6 concludes.

2 Data

The analysis of the product and destination mix is made possible by the use of a new database that combines detailed and comprehensive information on the international trading behavior of firms. The database used in this article includes all export transactions by firms that are located in Portugal, on a monthly basis, from 1995 to 2005. A transaction record includes the firm tax identification number, an 8-digit combined nomenclature (CN) product code, the value of the transaction, the quantity of exported goods (expressed in kilos), the destination country, the type of transport, the relevant international commercial term (e.g., FOB or CIF) and a variable indicating the type of transaction (e.g., transfer of ownership after payment or return of a product). The data come from customs returns forms in the case of extra-EU trade and from the Intrastat form in the case of intra-EU trade, and they aggregate to total Portuguese exports, as reported by Statistics Portugal (Instituto Nacional de Estatística).

In the analysis, we consider only manufacturing firms and transactions that are worth more than 100 euros.² Still, the data cover, on average, about 80% of total exports and about one quarter of all (i.e., manufacturing and non-manufacturing) exporters. The data are aggregated at the annual level and all values are expressed in current euros. The analysis focuses on the 1996–2005 period. Although it would be possible to work at the 6-digit CN level, we define products at the 4-digit level according to the HS. This avoids possible classification problems related to the CN and still allows for a set of 1,241 potential products.³ As a robustness test, we computed most of the results at the 6-digit level and the results qualitatively hold.

As shown in Table 1, the sample includes 6,507 exporters in 2005, exporting 1,028 products to 209 countries. The average exporter in 2005 ships 4.1 products to 5.2 destinations and receives about 3.3 million euros. Table 1 shows that, at the aggregate level, the number of exporters increased considerably between 1996 and 2005, while both the number of products exported and the number of destinations served have been more stable.

² Manufacturing firms were identified using a sectoral classification, by crossing trade data with a matched employer-employee data set for the same time period. For details see Martins and Opromolla (2011).
³ The CN system is comprised of the harmonized system (HS) nomenclature with further European Community subdivisions. The HS is run by the World Customs Organization. This classification of commodities is used by most trading nations and in international trade negotiations. The first 6 digits of the CN system approximately coincide with the HS classification. While the CN system is changed almost every year, the HS, created in 1988, was updated on January 1st 1996, January 1st 2002 and January 1st 2007. The adjustments were made at the 6-digit level and implied the aggregation of some categories.
At the firm level, the average number of products exported by a firm has not changed much, while the average number of destinations reached has decreased slightly.

Table 1. Summary statistics, selected years

                             1996      1999      2002      2005
Firm level
  Number of products
    Mean                      3.9       3.8       4.0       4.1
    Median                    2         2         2         2
    SD                        5.3       5.0       6.3       6.3
  Number of destinations
    Mean                      5.6       5.5       5.4       5.2
    Median                    3         3         2         2
    SD                        6.7       6.8       7.4       7.2
  Export (million euros)
    Mean                      3.2       3.4       3.5       3.3
    Median                    0.4       0.4       0.3       0.2
    SD                       34.4      29.3      33.2      29.0
Aggregate level
  Number of firms            5,343     5,731     5,700     6,507
  Number of products         963       977       1,003     1,028
  Number of destinations     202       200       216       209
  Export (million euros)     17,295.5  19,427.3  19,673.9  21,519.1

The top panel shows firm-level summary statistics while the bottom panel shows country-level aggregate statistics. A product is defined as an HS 4-digit code. Sources: INE (trade data) and authors' calculations.

The lack of dynamics in the total number of products exported or destinations served, and in the average number of products exported by a firm, suggested by Table 1 is misleading. There is a high degree of reallocation of resources across firms and within firms along the product and destination dimensions. The stability in the average number of products exported by a firm, or in the total number of products exported by Portugal, hides not only a considerable degree of firm entry and exit flows but also frequent and pervasive product and destination switching within firms. The high figures for the standard deviations in Table 1 indicate a high degree of heterogeneity in terms of the number of destinations served, the number of products exported and exports. Such heterogeneity and reallocation stand as major motivations for this article, and they are analyzed in detail in the next sections.

3 Interaction of firms, products and destinations

This section analyzes how the firm, product, and destination dimensions affect the structure of Portuguese exports. We start by taking a cross-sectional approach and examine how the variation in the number of exported products and destinations reached translates into variation in exports across firms. We proceed by decomposing the growth rate of Portuguese exports into the contributions of firms, destinations and products.
3.1 Cross-sectional variation in exports

We examine how the variation in the number of exported products and destinations reached translates into variation in exports across firms through simple regressions. We decompose total exports of firm x in year t, denoted x_xt, into the number of destinations reached, n^d_xt, and the average sales per destination, x̄^d_xt, i.e., in logarithms: ln x_xt = ln n^d_xt + ln x̄^d_xt. A regression of ln n^d_xt against ln x_xt then tells us the extent to which variation in firm exports is accounted for by variation in the number of destinations served. That is:

ln n^d_xt = a + b ln x_xt + ε_xt.  (1)

Table 2 presents the results of the associated regression, taking the entire set of exporters for the 11 years of the sample. The coefficient of 0.27 indicates that the intensive margin is the dominant one. Firms with larger exports sell more to each destination with an elasticity of 0.73, so that the number of destinations reached only increases with an elasticity of 0.27. A robustness-test regression that includes both firm and year fixed effects gives a similar result, with an estimate of 0.14. Moreover, the coefficients of the year dummies (not reported but always significant at 99%) increase over time, showing an expansion in the number of destinations reached, even after controlling for the growth in firms' exports.

Table 2. Dependent variable: ln n^d_xt

Regressor         Base            Robustness
Intercept         -1.95 (0.02)    -0.44 (0.03)
ln x_xt            0.27 (0.00)     0.14 (0.00)
Firm effects       No              Yes
Year effects       No              Yes
R²                 0.40            0.88
Number of obs.     63,603          63,603

Standard errors in parentheses. Sources: INE and authors' calculations.

Next, we can take one step further in the analysis to understand to what extent the growth in average sales per destination reflects exports of a higher number of products as opposed to higher average sales per product. We can decompose average sales per destination x̄^d_xt in year t into the number of products shipped, n^i_xt, and the average sales per product and destination, x̄^di_xt, i.e., in logarithms: ln x̄^d_xt = ln n^i_xt + ln x̄^di_xt. A regression of ln n^i_xt against ln x̄^d_xt shows the extent to which variation in firm average sales per destination is accounted for by variation in the number of products sold:

ln n^i_xt = a + b ln x̄^d_xt + ε_xt.  (2)

Table 3 presents the results of the associated regression, across all exporters for the 11 years of the sample. The coefficient of 0.17 indicates that the intensive margin is again the dominant one. Firms that export more to each destination sell more of each product in each destination with an elasticity of 0.83, i.e., the number of products shipped only increases with an elasticity of 0.17. In this case, a regression that includes both firm and year fixed effects gives an estimate of 0.12.
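The mechanics of regression (1) can be illustrated on synthetic data. In the sketch below the firm-level draws are invented; the point is that, because ln x_xt = ln n^d_xt + ln x̄^d_xt holds as an identity, the extensive- and intensive-margin elasticities estimated this way must sum to one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic firm data (invented for illustration): log total exports and
# log number of destinations, loosely mimicking regression (1).
n_firms = 5000
ln_x = rng.normal(13.0, 2.0, n_firms)                  # log exports
ln_nd = 0.27 * ln_x + rng.normal(-1.95, 0.5, n_firms)  # log #destinations

# OLS of ln n^d on ln x: the slope is the share of export variation
# absorbed by the extensive (destination-count) margin.
X = np.column_stack([np.ones(n_firms), ln_x])
beta, *_ = np.linalg.lstsq(X, ln_nd, rcond=None)
print(f"extensive-margin elasticity ~ {beta[1]:.2f}")

# Because ln x = ln n^d + ln xbar^d identically, the intensive-margin
# elasticity is the complement:
print(f"intensive-margin elasticity ~ {1 - beta[1]:.2f}")
```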
Again, the coefficients of the year dummies grow considerably over time, showing an expansion in the number of products shipped, even after controlling for the growth in firms' exports per destination.

Table 3. Dependent variable: ln n^i_xt

Regressor         Base            Robustness
Intercept         -1.01 (0.02)    -0.42 (0.03)
ln x̄^d_xt          0.17 (0.00)     0.12 (0.00)
Firm effects       No              Yes
Year effects       No              Yes
R²                 0.16            0.80
Number of obs.     63,603          63,603

Standard errors in parentheses. Sources: INE and authors' calculations.

3.2 Lessons for growth

In this subsection we study the impact of firms' entry and exit, and of firms' product and destination switching, on Portuguese total exports. First, the firm dimension is considered by tracking the number of continuing, exiting and entering firms (extensive margin) and their average exports (intensive margin). Then we decompose the growth rate of Portuguese exports into the contributions of firms, destinations and products.

3.2.1 The firm margin

Table 4 decomposes the total number of exporters in each year into those continuing, exiting, entering or just staying one year. We follow Eaton et al. (2007) in defining firm categories, as sketched in code after this subsection. Entrants in year t are those firms that did not export in t−1, export in t and will export in t+1 as well; exiters in year t are those firms that exported in t−1, export in t but will not export in t+1; continuing firms in year t are those firms that exported in t−1, export in t and will export in t+1 as well; finally, single-year exporters in year t are those firms that did not export in t−1, export in t but will not export in t+1.

Table 4. Continuing, entering, exiting and single-year exporters, 1997–2004

Number of firms
Year    Continuing   Exiting   Entering   Single-year
1997    3,595        553       791        533
1998    3,756        630       737        637
1999    3,808        685       654        584
2000    3,727        735       706        621
2001    3,590        843       630        606
2002    3,542        678       814        666
2003    3,698        658       966        684
2004    4,092        572       1,013      638

Export per firm (1,000 euros)
Year    Continuing   Exiting   Entering   Single-year
1997    4,492        1,641     1,049      165
1998    4,380        2,029     2,035      411
1999    3,832        5,052     1,255      947
2000    4,129        2,275     2,553      278
2001    3,908        1,991     4,854      973
2002    4,904        1,490     1,261      400
2003    4,570        1,114     1,946      227
2004    4,599        506       1,138      139

Sources: INE (trade data) and authors' calculations.

The top panel of Table 4 reports the number of firms falling into each category over time, and the bottom panel reports average exports per firm for each category. Results show that nearly three quarters of the firms are continuing exporters, single-year exporters represent about 10% of the total, and entering exporters tend to be more numerous than exiting ones. The share of continuing firms in total exports is quite variable, representing more than 50% on average. In contrast, single-year firms represent less than 5% of total exports. Thus the gap between exports per firm in each group is very wide. Entering and exiting firms are on average smaller, in terms of sales per firm, than incumbents, just as in the IO literature.⁴ Eaton et al. (2007) also find that single-year Colombian exporters are numerous but count little in terms of exports.

⁴ Examples of articles referring to these stylized facts in the context of the IO literature are Jovanovic (1982), Hopenhayn (1992), Cabral and Mata (2003), and Luttmer (2007).
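The following toy sketch applies the Eaton et al. (2007) status definitions above to a made-up panel of firm-year export indicators; the firm histories are invented for illustration.

```python
# Sketch of the firm-status classification behind Table 4. The toy panel of
# firm-year export indicators is invented; real data come from transaction
# records.
exported = {                       # firm -> set of years with positive exports
    "A": {1996, 1997, 1998},       # continuing in 1997
    "B": {1996, 1997},             # exiting in 1997
    "C": {1997, 1998},             # entering in 1997
    "D": {1997},                   # single-year in 1997
}

def status(firm, t):
    before = t - 1 in exported[firm]
    now = t in exported[firm]
    after = t + 1 in exported[firm]
    if not now:
        return "non-exporter"
    if before and after:
        return "continuing"
    if before:
        return "exiting"
    if after:
        return "entering"
    return "single-year"

for firm in "ABCD":
    print(firm, status(firm, 1997))
```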
3.2.2 Decomposing export growth: firms, destinations and products

We decompose Portugal's total export growth between any two years into the contribution of three distinct decisions: the decision to enter/stay/exit in export markets, the decision of where to export and the decision of what to export. First we decompose total export growth into the contributions of "entering", "exiting" and "continuing" exporters, that is, into the extensive and intensive margins at the aggregate level along the firm dimension:

ΔY_t = Σ_{j∈N} ΔY_jt + Σ_{j∈X} ΔY_jt + Σ_{j∈C} ΔY_jt,  (3)

where ΔY_t is the change in Portuguese exports from year t−1 to year t, N is the set of entering exporters, X is the set of exiting exporters and C is the set of continuing exporters. The next step is to decompose the change in exports by continuing exporters into "added destinations" (AD), "dropped destinations" (DD) and "continuing destinations" (CD), that is, into the extensive and intensive margins at the firm level along the destination dimension:

Σ_{j∈C} ΔY_jt = Σ_{j∈C} [Σ_{z∈AD} ΔY_zjt + Σ_{z∈DD} ΔY_zjt + Σ_{z∈CD} ΔY_zjt].  (4)

Next, we consider the products that firms choose to export in "continuing" and "added" destinations. First we distinguish among "added" (AP), "dropped" (DP) and "continuing" (CP) products exported by firms in continuing destinations, that is, the extensive and intensive margins at the firm level along the product dimension:

Σ_{z∈CD} ΔY_zjt = Σ_{z∈CD} [Σ_{v∈AP} ΔY_vzjt + Σ_{v∈DP} ΔY_vzjt + Σ_{v∈CP} ΔY_vzjt].  (5)

Finally, we split the export change associated with new destinations into products already sold by the firm somewhere, i.e., old products (OP), and products that were not sold by the firm anywhere, i.e., new products (NP). We consider this an interaction between the extensive margin along the destination dimension and the product margin:

Σ_{z∈AD} ΔY_zjt = Σ_{z∈AD} [Σ_{v∈OP} ΔY_vzjt + Σ_{v∈NP} ΔY_vzjt].  (6)

Substituting, we can write the change in Portuguese exports as:

ΔY_t = Σ_{j∈N} ΔY_jt + Σ_{j∈X} ΔY_jt
     + Σ_{j∈C} [Σ_{z∈AD} (Σ_{v∈OP} ΔY_vzjt + Σ_{v∈NP} ΔY_vzjt) + Σ_{z∈DD} ΔY_zjt]
     + Σ_{j∈C} Σ_{z∈CD} [Σ_{v∈AP} ΔY_vzjt + Σ_{v∈DP} ΔY_vzjt + Σ_{v∈CP} ΔY_vzjt].  (7)

We compute the percent change in total exports by dividing each term in Eq. 7 by (Y_t + Y_{t−1})/2, i.e., the average of exports in t and t−1.⁵

Results from this decomposition are presented in Table 5. The table shows that both the firm-level extensive margin (entry and exit of exporters) and intensive margin (sales of continuing exporters) are important in driving the year-to-year variation in aggregate exports. Over the period from 1997 to 2005, average nominal aggregate export growth was 2.2%. Nearly three quarters of this average growth rate is accounted for by the extensive margin along the firm dimension. This result is partly consistent with evidence from Eaton et al. (2007), using Colombian data for the 1997–2005 period, but reveals a much more important role for the extensive margin, possibly due to the fact that we restrict our attention to manufacturing firms only. When the next level of disaggregation (destinations) is considered, we see that the intensive margin, i.e., export growth in continuing destinations, accounts for almost all of the intensive margin along the firm dimension. However, the gross contribution of added and dropped destinations among continuing firms is quite high. Therefore, there is a high level of reallocation of economic resources associated with destination switching. The decomposition at the product level also offers some interesting patterns. The net contribution of added and dropped products at continuing firms is usually small, but the level of churning is very high.
In this vein, based on Costa Rican data, Lederman et al. (2011) find that the rate of firm turnover into and out of exporting activities is quite high.

At the end of each panel of Table 5 we present the importance of each margin in the long run (1997–2005). As noted by Eaton et al. (2007), the role of the firms' destination and product extensive margins is more substantial in the long run. They find that net entry over the course of the sample period accounts for one quarter of the cumulative total export expansion, while gross entry explains about half of total growth. This is due to the fact that surviving new exporters are typically able to expand rapidly. Such a pattern is confirmed for Portuguese exporters as well, and we focus on this aspect in Sect. 5.

The role of continuing products in continuing firms is crucial in explaining changes in Portugal's export growth. Table 6 shows that continuing exporters tend to ship old products to newly added destination markets. A product is defined as old, within the firm, if it was exported, the year before, by the firm in at least one destination. Nevertheless, the role of new products is relevant, notably in added destinations, i.e., there is an interaction between the extensive margin along the destination dimension and the product margin. Moreover, added destinations represent an important market for new products. Table 7 shows that, on average, 26% of the sales of a new product refer to added destinations.

⁵ As Eaton et al. (2007) explain, computing growth as the change between two dates divided by the average level in the two dates, rather than the change divided by the level in the earlier date, has at least two advantages: (i) x percent growth followed by −x percent growth returns us to the same level, and (ii) values close to zero in the first year have a less extreme effect on the growth rate.
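Footnote 5's midpoint convention and the firm-level split of Eq. (3) can be illustrated with a toy two-year example. The export flows below are invented, and a real application would extend the same bookkeeping to the destination and product levels of Eqs. (4)-(7).

```python
# Sketch of the midpoint growth decomposition of equation (3): aggregate
# export growth split across entering, exiting and continuing firms.
# The two-year toy flows are invented for illustration.
exports_t0 = {"A": 100.0, "B": 40.0}    # year t-1: firms A and B export
exports_t1 = {"A": 130.0, "C": 25.0}    # year t:   B exits, C enters

firms = set(exports_t0) | set(exports_t1)
y0, y1 = sum(exports_t0.values()), sum(exports_t1.values())
mid = 0.5 * (y0 + y1)                   # midpoint base (footnote 5)

contrib = {"entering": 0.0, "exiting": 0.0, "continuing": 0.0}
for f in firms:
    d = exports_t1.get(f, 0.0) - exports_t0.get(f, 0.0)
    if f not in exports_t0:
        contrib["entering"] += d / mid
    elif f not in exports_t1:
        contrib["exiting"] += d / mid
    else:
        contrib["continuing"] += d / mid

print(contrib)                          # margin contributions to growth
print("total growth:", (y1 - y0) / mid) # equals the sum of the parts
```

The margin contributions sum exactly to total growth by construction, which is what makes the decomposition in Table 5 additive across panels.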

PROCEEDINGS OF THE IEEE INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING, PP. 1175-1178, 2004.

MULTISCALE ANALYSIS OF FMRI DATA WITH MIXTURE OF GAUSSIAN DENSITIES

François G. Meyer and Xilin Shen
University of Colorado at Boulder, Department of Electrical Engineering. E-mail: francois.meyer@.

ABSTRACT

We describe in this work an exploratory analysis of fMRI data. We regard the fMRI dataset as a set of spatiotemporal signals s_x indexed by their position x. The analysis is performed on the wavelet packet coefficients of the fMRI signals s_x, and we show that we can characterize the coefficients in terms of a mixture model of multivariate Gaussian distributions.

1. INTRODUCTION

Functional Magnetic Resonance Imaging (fMRI) can quantify hemodynamic changes induced by neuronal activity. The goal of the analysis is to detect the "activated" voxels x where the dynamic changes in the fMRI signal s_x(t), t = 0, ..., T−1, can be considered to be triggered by the (sensory or cognitive) stimulus. Statistical techniques are commonly used for the detection of activated voxels. Unfortunately, these techniques often rely on oversimplified assumptions. We describe in this work an exploratory analysis of the fMRI data. We regard the fMRI dataset as a set of spatiotemporal signals s_x, indexed by their position x. Our analysis is not performed directly on the raw fMRI signal. Instead, the raw data are projected on a set of basis functions conveniently chosen for their ability to reveal the structure of the dataset. Several studies indicate that one finds dynamic changes of the fMRI signal in time and in frequency [1]. Wavelet packets are time-frequency "atoms" that are localized in time and in frequency. We therefore favor the use of wavelet packets to perform the analysis of fMRI data.

2. MULTISCALE ANALYSIS AND WAVELET PACKETS

A wavelet packet is given by ψ_{j,k,l}(t) = ψ_k(2^j t − l), where

- j = 0, ..., J represents the scale: ψ_{j,k,l} has a support of size 2^{−j}. J is the maximum scale (2^J ≤ T).
- k = 0, ..., 2^j − 1 represents the frequency index at a given scale j: ψ_{j,k,l} has roughly k oscillations.
- l = 0, ..., 2^{J₀−j} − 1 represents the translation index within a node (j, k): ψ_{j,k,l} is located at l 2^{−j}.

The library of wavelet packets can be constructed iteratively, starting from the scaling function ψ₀ and the wavelet ψ₁. As shown in Fig. 1, the library of wavelet packets organizes itself into a binary tree, where the nodes of the tree represent subspaces with different time-frequency localization characteristics. We define the index γ = (j, k, l), and we consider ψ_γ = ψ_{j,k,l} to be a T×1 vector. The wavelet packet coefficient

α_x(γ) = ψ_γᵀ s_x  (1)

can be computed with a fast algorithm at each node of the tree.

Fig. 1. Top: wavelet packet tree. Bottom: wavelet packet basis functions ψ_γ = ψ_{j,k,l} of the subset of nodes W_l. The wavelet packets at each node of the tree are shown on the corresponding row in an enclosed box.

This work was supported by a Whitaker Foundation Biomedical Engineering Research Grant.
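For readers who want to reproduce this kind of analysis, the sketch below computes wavelet packet coefficients for one synthetic voxel time series with PyWavelets. The block design mimics the experiment described in the next section, but the wavelet filter (db2) and the decomposition depth are our assumptions; the paper does not specify its filters.

```python
import numpy as np
import pywt  # PyWavelets

# Sketch of the wavelet packet coefficients of Eq. (1) for one synthetic
# voxel time series. The filter (db2) and depth are assumptions.
T = 80                                   # 80 volumes, one every 3 s
t = np.arange(T)
stimulus = (t // 10) % 2                 # 30 s on / 30 s off block design
s_x = stimulus + 0.5 * np.random.default_rng(1).normal(size=T)

wp = pywt.WaveletPacket(data=s_x, wavelet='db2', mode='symmetric', maxlevel=3)

# Nodes at level j = 3, ordered by frequency; each node (j, k) holds the
# coefficients alpha_x(gamma) for all translations l within that node.
for k, node in enumerate(wp.get_level(3, order='freq')):
    energy = np.sum(node.data ** 2)
    print(f"node (j=3, k={k}, path={node.path}): energy {energy:.1f}")
```

For an activated time series like this one, the energy concentrates in the low-frequency nodes whose oscillation rate matches the stimulus period, which is the behavior exploited in the next section.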
We denote by W_l the set of basis functions from each node of the subtree that is enclosed in a box shown in the top of Fig. 1. These basis functions have lost their temporal localization, but are more precisely located in the frequency domain: they roughly behave as sinusoidal functions oscillating at low frequencies. One wavelet packet, ψ_γ (the third from the right on the last row, with a bold frame), is of particular interest to us, because its frequency and phase match the frequency and the phase of the periodic stimulus that will be described in the next section.

3. GLOBAL STATISTICS OF THE COEFFICIENTS
We consider an fMRI dataset that demonstrates activation of the visual cortex [2]. A visual stimulus composed of a flashing checkerboard was presented to a subject for 30 s, and a blank image was
Fig.2shows the histogram of the wavelet packets(forψγ∈W l) of the background time series.The empirical distributions appear to be zero-mean Gaussian distributions,with a standard deviation of about20.Figure3shows the histogram of the wavelet packets (forψγ∈W l)of the activated time series.Because there are much fewer activated time series than background time series(fromfive to ten percent of the whole brain is activated)the histograms are coarser than the corresponding histograms for the background time series.The stimulus is periodic,and we expect the energy of the activated time series to be distributed over only a small number of coefficients in the Fourier domain[1,4].Most of the energy of the activated time series will be at the stimulus frequency.Unfortu-nately,the wavelet packet library does not include true sinusoidal functions.However,as noted in the previous section there existsone basis function,ψγ0,with a frequency and a phase that matchthe frequency and the phase of the periodic stimulus.Most distri-butions have a small mean.The distribution with the largest mean is obtained with the wavelet packet whose frequency match exactly the frequency of the stimulus.In fact,a close inspection ofFig.3.Histogram of the wavelet packet coefficients(forψγ∈W l)of the activated time series.The mean(top)and the variance (bottom)is shown for each distribution.Fig.3reveals that other nodes also have a large mean(albeit rel-atively smaller).As shown in Fig.1the wavelet packets at these nodes also oscillate at the frequency of the stimulus,but have a smaller support(they have local oscillations),and thus contribute to a smaller correlation with s x.The variance at these nodes is much larger(1500-3000)than at the other nodes(400).Also,the distribution at these nodes appears to be skewed and less Gaus-sian than at the other nodes.The empirical distributions appear to be zero-mean Gaussian distributions at all other nodes.As indi-cated in[4],there is very little energy at harmonics of the stimulus frequency,and therefore the histograms of the wavelet packet co-efficients for the other nodes of the tree(not shown here)also have a zero mean.Wefinally considered a three dimensional neigh-borhood N(x0)that was placed around an activated region and included a mixture of activated and background time series.Fig.4.Histogram of the wavelet packet coefficients(forψγ∈W l)of the mixture time series.The mean(top)and the variance (bottom)is shown for each distribution.Fig.5.Histogram of the W statistic computed from activated time series only.The mean(top),the variance(center),and theχ2score (bottom)are displayed for each distribution.Figure4shows the histogram of the coefficients(forψγ∈W l) of all the time series in N(x0).Most projections appear to be Gaussian with a small mean and a variance around400.The samewavelet packetψγ0gives rise to a multimodal distribution that isclearly non Gaussian.4.LOCAL STATISTICS OF THE COEFFICIENTSIn the previous section we studied the empirical distribution of the wavelet packet coefficients computed over the entire brain.While we expect the background signal to be constant(except for a pos-sible slowly varying drift),the strength of the signal s x may vary from one activated voxel x to another.We can interpret the empiri-cal distribution ofαx(γ0)in Fig.4as follows:the activated region is composed of a small number of subregions whereinαx(γ)is Gaussian distributed;and the mean is different for each subregion. 
4. LOCAL STATISTICS OF THE COEFFICIENTS
In the previous section we studied the empirical distribution of the wavelet packet coefficients computed over the entire brain. While we expect the background signal to be constant (except for a possible slowly varying drift), the strength of the signal s_x may vary from one activated voxel x to another. We can interpret the empirical distribution of α_x(γ0) in Fig. 4 as follows: the activated region is composed of a small number of subregions wherein α_x(γ) is Gaussian distributed, and the mean is different for each subregion. The distribution of α_x(γ0) can thus be described by a mixture of Gaussian distributions. We can test this hypothesis by performing a local analysis of the distribution of α_x(γ). For each position of the neighborhood N(x0) we test the hypothesis that the distribution of the coefficients {α_x(γ), x ∈ N(x0)} is Gaussian. Several tests for normality exist, and we use the Shapiro-Wilk W test [5] because of its ability to detect non-Gaussian distributions. As x0 is moved throughout the brain we collect many samples of the W statistic. We can compute the empirical distribution of W (over all values of x0), as is shown in Fig. 5 and Fig. 6. In order to quantify non-normality, we compare this empirical distribution with the distribution of W obtained under the normality assumption. The comparison is performed using the χ2 distance. A large χ2 score indicates that the distribution of the wavelet packet coefficient is non-Gaussian. Fig. 5 shows the empirical distribution of the W statistic computed locally from activated time series only (we use α_x in the computation of W only if x is activated). Because all the χ2 distances are small, the hypothesis that α_x is Gaussian can be accepted.

[Fig. 5. Histogram of the W statistic computed from activated time series only. The mean (top), the variance (center), and the χ2 score (bottom) are displayed for each distribution.]

[Fig. 6. Histogram of the W statistic computed from a mixture of activated and background time series. The mean (top), the variance (center), and the χ2 score (bottom) are displayed for each distribution.]

A similar experiment was conducted, and the results (not shown here) indicate that the distribution of α_x(γ) is Gaussian for all γ if s_x is a background time series. Fig. 6 shows the empirical distribution of the W statistic computed locally when N(x0) contains a part of an activated region as well as background voxels. While almost all distributions appear to be Gaussian (small χ2), there exist a few non-Gaussian distributions. The most non-Gaussian distribution is obtained again for the wavelet packet whose frequency matches exactly the frequency of the stimulus.
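A minimal sketch of this local normality scan, using SciPy's Shapiro-Wilk test on synthetic stand-ins for the coefficients; the neighborhood size, scales, and the injected "activated" patch are illustrative assumptions, not the paper's data.

    import numpy as np
    from scipy.stats import shapiro

    rng = np.random.default_rng(0)
    n_voxels, n_nbhd = 4000, 27                      # e.g., 3x3x3 neighborhoods
    alpha = rng.normal(0.0, 20.0, n_voxels)          # "background" coefficients
    alpha[:200] += rng.normal(120.0, 30.0, 200)      # a small "activated" patch

    W_obs = []
    for start in range(0, n_voxels - n_nbhd, n_nbhd):
        W, _ = shapiro(alpha[start:start + n_nbhd])  # W close to 1 => Gaussian
        W_obs.append(W)

    # Reference distribution of W under normality, by Monte Carlo.
    W_ref = [shapiro(rng.normal(size=n_nbhd))[0] for _ in range(len(W_obs))]
    print(np.mean(W_obs), np.mean(W_ref))  # a lower mean W flags non-Gaussian mixing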
5. MIXTURE OF GAUSSIAN DENSITIES
Our experimental results indicate that it is possible to find a small number of interesting projections that reveal the presence of activated time series. Because most one-dimensional projections of high-dimensional data are approximately Gaussian [6], bimodal, or more generally non-Gaussian, distributions of the wavelet packet coefficients are likely to reveal the presence of non-background time series. A natural model for the joint distribution of the wavelet packet coefficients α_x = [α_x(γ1), ..., α_x(γT)]^T is a finite mixture of multivariate Gaussian densities,

    p(α) = Σ_{m=1}^{M} π_m φ(α; μ_m, Σ_m).        (2)

The mixing parameters π_m are positive weights that add up to 1. The density φ is the normal density function. This assumption is in perfect agreement with our experimental findings. Indeed, since the marginals of a mixture of multivariate Gaussian densities are mixtures of Gaussian densities, we expect the distributions of the α_x(γ) to be mixtures of Gaussian densities. This is exactly what we observed in our experiments. We note that the model (2) is global, and we need more than one activated component in order to describe different levels of activation. This model can be interpreted in the temporal domain. The signal s_x can be decomposed as follows:

    s_x(t) = θ_x(t) + a_x(t) + ν_x(t),        (3)

where θ_x(t) is a baseline drift and a_x(t) is the stimulus-induced response. The noise ν_x(t) is correlated, with a 1/f spectral behavior associated with long-memory processes [7-9]. The wavelet coefficients of the drift θ(t) correspond to low frequencies, and should be zero for γ = γ0. The wavelet coefficients of the noise will be uncorrelated [8, 9] and Gaussian distributed. The noise coefficients will contribute to the zero-mean component of the mixture. The wavelet coefficients of the activated component will be non-zero for γ = γ0, and Gaussian distributed. The mean of the distribution will vary with the strength of the activation at the voxel x. Voxels with similar activation strength will contribute to a common component in the mixture. The maximum likelihood estimates of μ_m and Σ_m given the observations can be computed with the Expectation-Maximization (EM) algorithm. Figures 7 and 8 show the mixture model superimposed on the empirical distribution of the wavelet packets α_x(γ0), where the x are taken in a neighborhood N(x0) that overlaps with an activated region. In order to capture the tail on the right side of the distribution we used several activated components. Tables 1 and 2 show the mixture parameters for all the components.

Table 1. Parameters of the mixture for a 3-component model.
    m       1      2      3
    π_m     0.74   0.15   0.11
    μ_m     10     77     114
    σ_m     27     38     100

Table 2. Parameters of the mixture for a 5-component model.
    m       1      2      3      4      5
    π_m     0.84   0.08   0.05   0.02   0.002
    μ_m     13     91     139    224    377
    σ_m     30     23     33     48     26

[Figs. 7 and 8. The mixture model superimposed on the empirical distribution of the wavelet packets.]

6. REFERENCES
[1] P.P. Mitra and B. Pesaran, "Analysis of dynamic brain imaging data," Biophysical Journal, vol. 76, pp. 691-708, 1999.
[2] F.G. Meyer and J. Chinrungrueng, "Spatiotemporal clustering of fMRI time series in the spectral domain," accepted for publication in Medical Image Analysis, 2003.
[3] K.J. Friston, A.P. Holmes, K.J. Worsley, J.P. Poline, C.D. Frith, and R.S.J. Frackowiak, "Statistical parametric maps in functional imaging: A general linear approach," Human Brain Mapping, vol. 2, pp. 189-210, 1995.
[4] J.L. Marchini and B.D. Ripley, "A new statistical approach to detecting significant activation in functional MRI," NeuroImage, vol. 12, pp. 366-380, 2000.
[5] S.S. Shapiro and M.B. Wilk, "An analysis of variance test for normality (complete samples)," Biometrika, vol. 52, pp. 591-611, 1965.
[6] P. Diaconis and D. Freedman, "Asymptotics of graphical projection pursuit," Annals of Statistics, vol. 12, no. 3, pp. 793-815, 1984.
[7] E. Zarahn, G.K. Aguire, and M. D'Esposito, "Empirical analysis of fMRI statistics: I. Spatially unsmoothed data collected under null hypothesis conditions," NeuroImage, vol. 5, pp. 179-197, 1997.
[8] J. Fadili and E. Bullmore, "Wavelet-generalized least squares: a new BLU estimator of linear regression models with 1/f errors," NeuroImage, vol. 15, pp. 217-232, 2002.
[9] F.G. Meyer and J. Chinrungrueng, "Analysis of event-related fMRI data using best clustering bases," IEEE Transactions on Medical Imaging, vol. 22(8), pp. 933-939, 2003.
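A minimal sketch of fitting the finite Gaussian mixture of Eq. (2) by EM, here via scikit-learn's GaussianMixture; the component count and toy coefficients loosely mimic Table 1 and are illustrative, not the paper's data.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(1)
    # Toy coefficients: a dominant near-zero background component plus two
    # "activated" components of different strengths.
    alpha = np.concatenate([
        rng.normal(10.0, 27.0, 740),
        rng.normal(77.0, 38.0, 150),
        rng.normal(114.0, 100.0, 110),
    ]).reshape(-1, 1)

    gm = GaussianMixture(n_components=3, random_state=0).fit(alpha)
    for pi, mu, var in zip(gm.weights_, gm.means_.ravel(),
                           gm.covariances_.ravel()):
        print(f"pi={pi:.2f}  mu={mu:6.1f}  sigma={np.sqrt(var):6.1f}")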

Proof of the Glivenko-Cantelli Theorem

1 The Glivenko-Cantelli Theorem

Let X_i, i = 1, ..., n be an i.i.d. sequence of random variables with distribution function F on R. The empirical distribution function is the function of x defined by

    F̂_n(x) = (1/n) Σ_{1≤i≤n} I{X_i ≤ x}.

For a given x ∈ R, we can apply the strong law of large numbers to the sequence I{X_i ≤ x}, i = 1, ..., n to assert that F̂_n(x) → F(x) a.s. (in order to apply the strong law of large numbers we only need to show that E[|I{X_i ≤ x}|] < ∞, which in this case is trivial because |I{X_i ≤ x}| ≤ 1). In this sense, F̂_n(x) is a reasonable estimate of F(x) for a given x ∈ R. But is F̂_n(x) a reasonable estimate of F(x) when both are viewed as functions of x? The Glivenko-Cantelli Theorem provides an answer to this question. It asserts the following:

Theorem 1.1 Let X_i, i = 1, ..., n be an i.i.d. sequence of random variables with distribution function F on R. Then

    sup_{x∈R} |F̂_n(x) − F(x)| → 0 a.s.        (1)

This result is perhaps the oldest and most well known result in the very large field of empirical process theory, which is at the center of much of modern econometrics. The statistic (1) is an example of a Kolmogorov-Smirnov statistic. We will break the proof up into several steps.

Lemma 1.1 Let F be a (nonrandom) distribution function on R. For each ε > 0 there exists a finite partition of the real line of the form −∞ = t_0 < t_1 < ··· < t_k = ∞ such that for 0 ≤ j ≤ k−1,

    F(t_{j+1}^−) − F(t_j) ≤ ε.

Proof: Let ε > 0 be given. Let t_0 = −∞ and for j ≥ 0 define

    t_{j+1} = sup{z : F(z) ≤ F(t_j) + ε}.

Note that F(t_{j+1}) ≥ F(t_j) + ε. To see this, suppose that F(t_{j+1}) < F(t_j) + ε. Then, by right continuity of F there would exist δ > 0 so that F(t_{j+1} + δ) < F(t_j) + ε, which would contradict the definition of t_{j+1}. Thus, between t_j and t_{j+1}, F jumps by at least ε. Since this can happen at most a finite number of times, the partition is of the desired form, that is −∞ = t_0 < t_1 < ··· < t_k = ∞ with k < ∞. Moreover, F(t_{j+1}^−) ≤ F(t_j) + ε. To see this, note that by definition of t_{j+1} we have F(t_{j+1} − δ) ≤ F(t_j) + ε for all δ > 0. The desired result thus follows from the definition of F(t_{j+1}^−).

Lemma 1.2 Suppose F_n and F are (nonrandom) distribution functions on R such that F_n(x) → F(x) and F_n(x^−) → F(x^−) for all x ∈ R. Then

    sup_{x∈R} |F_n(x) − F(x)| → 0.

Proof: Let ε > 0 be given. We must show that there exists N = N(ε) such that for n > N and any x ∈ R, |F_n(x) − F(x)| < ε. Consider a partition of the real line into finitely many pieces of the form −∞ = t_0 < t_1 < ··· < t_k = ∞ such that for 0 ≤ j ≤ k−1,

    F(t_{j+1}^−) − F(t_j) ≤ ε/2.

The existence of such a partition is ensured by the previous lemma. For any x ∈ R, there exists j such that t_j ≤ x < t_{j+1}. For such j,

    F_n(t_j) ≤ F_n(x) ≤ F_n(t_{j+1}^−),
    F(t_j) ≤ F(x) ≤ F(t_{j+1}^−),

which implies that

    F_n(t_j) − F(t_{j+1}^−) ≤ F_n(x) − F(x) ≤ F_n(t_{j+1}^−) − F(t_j).

Furthermore,

    F_n(t_j) − F(t_j) + F(t_j) − F(t_{j+1}^−) ≤ F_n(x) − F(x),
    F_n(t_{j+1}^−) − F(t_{j+1}^−) + F(t_{j+1}^−) − F(t_j) ≥ F_n(x) − F(x).

By construction of the partition, we have that

    F_n(t_j) − F(t_j) − ε/2 ≤ F_n(x) − F(x),
    F_n(t_{j+1}^−) − F(t_{j+1}^−) + ε/2 ≥ F_n(x) − F(x).

For each j, let N_j = N_j(ε) be such that for n > N_j, F_n(t_j) − F(t_j) > −ε/2, and let M_j = M_j(ε) be such that for n > M_j, F_n(t_j^−) − F(t_j^−) < ε/2. Let N = max_{1≤j≤k} max{N_j, M_j}. For n > N and any x ∈ R, we have that |F_n(x) − F(x)| < ε. The desired result follows.

Lemma 1.3 Suppose F_n and F are (nonrandom) distribution functions on R such that F_n(x) → F(x) for all x ∈ Q. Suppose further that F_n(x) − F_n(x^−) → F(x) − F(x^−) for all jump points of F. Then, for all x ∈ R, F_n(x) → F(x) and F_n(x^−) → F(x^−).

Proof: Let x ∈ R. We first show that F_n(x) → F(x). Let s, t ∈ Q such that s < x < t. First suppose x is a continuity point of F. Since F_n(s) ≤ F_n(x) ≤ F_n(t) and s, t ∈ Q, it follows that

    F(s) ≤ lim inf_{n→∞} F_n(x) ≤ lim sup_{n→∞} F_n(x) ≤ F(t).

Since x is a continuity point of F,

    lim_{s→x^−} F(s) = lim_{t→x^+} F(t) = F(x),

from which the desired result follows. Now
suppose x is a jump point of F. Note that

    F_n(s) + F_n(x) − F_n(x^−) ≤ F_n(x) ≤ F_n(t).

Since s, t ∈ Q and x is a jump point of F,

    F(s) + F(x) − F(x^−) ≤ lim inf_{n→∞} F_n(x) ≤ lim sup_{n→∞} F_n(x) ≤ F(t).

Since lim_{s→x^−} F(s) = F(x^−) and lim_{t→x^+} F(t) = F(x), the desired result follows.

We now show that F_n(x^−) → F(x^−). First suppose x is a continuity point of F. Since F_n(x^−) ≤ F_n(x),

    lim sup_{n→∞} F_n(x^−) ≤ lim sup_{n→∞} F_n(x) = F(x) = F(x^−).

For any s ∈ Q such that s < x, we have F_n(s) ≤ F_n(x^−), which implies that

    F(s) ≤ lim inf_{n→∞} F_n(x^−).

Since lim_{s→x^−} F(s) = F(x^−), the desired result follows. Now suppose x is a jump point of F. By assumption, F_n(x) − F_n(x^−) → F(x) − F(x^−), and, by the above argument, F_n(x) → F(x). The desired result follows.

Proof of Theorem 1.1: If we can show that there exists a set N such that Pr{N} = 0 and for all ω ∉ N (i) F̂_n(x, ω) → F(x) for all x ∈ Q and (ii) F̂_n(x, ω) − F̂_n(x^−, ω) → F(x) − F(x^−) for all jump points of F, then the result will follow from an application of Lemmas 1.2 and 1.3.

For each x ∈ Q, let N_x be a set such that Pr{N_x} = 0 and for all ω ∉ N_x, F̂_n(x, ω) → F(x). Let N_1 = ∪_{x∈Q} N_x. Then, for all ω ∉ N_1, F̂_n(x, ω) → F(x) by construction. Moreover, since Q is countable, Pr{N_1} = 0.

For integer i ≥ 1, let J_i denote the set of jump points of F of size at least 1/i. Note that for each i, J_i is finite. Next note that the set of all jump points of F can be written as J = ∪_{1≤i<∞} J_i. For each x ∈ J, let M_x denote a set such that Pr{M_x} = 0 and for all ω ∉ M_x, F̂_n(x, ω) − F̂_n(x^−, ω) → F(x) − F(x^−). Let N_2 = ∪_{x∈J} M_x. Since J is countable, Pr{N_2} = 0.

To complete the proof, let N = N_1 ∪ N_2. By construction, for ω ∉ N, (i) and (ii) hold. Moreover, Pr{N} = 0. The desired result follows.

2 The Sample Median

We now give a brief application of the Glivenko-Cantelli Theorem. Let X_i, i = 1, ..., n be an i.i.d. sequence of random variables with distribution F. Suppose one is interested in the median of F. Concretely, we will define

    Med(F) = inf{x : F(x) ≥ 1/2}.

A natural estimator of Med(F) is the sample analog, Med(F̂_n). Under what conditions is Med(F̂_n) a reasonable estimate of Med(F)?

Let m = Med(F) and suppose that F is well behaved at m in the sense that F(t) > 1/2 whenever t > m. Under this condition, we can show using the Glivenko-Cantelli Theorem that Med(F̂_n) → Med(F) a.s. We will now prove this result. Suppose F_n is a (nonrandom) sequence of distribution functions such that

    sup_{x∈R} |F_n(x) − F(x)| → 0.

Let ε > 0 be given. We wish to show that there exists N = N(ε) such that for all n > N,

    |Med(F_n) − Med(F)| < ε.

Choose δ > 0 so that

    δ < 1/2 − F(m − ε),
    δ < F(m + ε) − 1/2,

which in turn implies that

    F(m − ε) < 1/2 − δ,
    F(m + ε) > 1/2 + δ.

(It might help to draw a picture to see why we should pick δ in this way.) Next choose N so that for all n > N,

    sup_{x∈R} |F_n(x) − F(x)| < δ.

Let m_n = Med(F_n). For such n, m_n > m − ε, for if m_n ≤ m − ε, then

    F(m − ε) > F_n(m − ε) − δ ≥ 1/2 − δ,

which contradicts the choice of δ. We also have that m_n < m + ε, for if m_n ≥ m + ε, then

    F(m + ε) < F_n(m + ε) + δ ≤ 1/2 + δ,

which again contradicts the choice of δ. Thus, for n > N, |m_n − m| < ε, as desired. By the Glivenko-Cantelli Theorem, it follows immediately that Med(F̂_n) → Med(F) a.s.
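A minimal numerical illustration of the theorem and of the sample-median application; the standard normal is an arbitrary choice of F.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    for n in (100, 1_000, 10_000, 100_000):
        x = np.sort(rng.normal(size=n))
        # F_n jumps at the data points, so the supremum of |F_n - F| is
        # attained just before or at a jump.
        i = np.arange(1, n + 1)
        ks = np.maximum(i / n - norm.cdf(x), norm.cdf(x) - (i - 1) / n).max()
        med = x[n // 2]                    # Med(F_n); here Med(F) = 0
        print(f"n={n:6d}  sup|F_n-F|={ks:.4f}  |median|={abs(med):.4f}")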

electromagnetic wave diffraction by a slot and strip

Article history: Received 28 October 2009; accepted 26 January 2010. Keywords: Electromagnetic diffraction; Exact solution; Slot; Strip; Dual integral equations. E-mail address: serdyukvm@bsu.by. doi:10.1016/j.aeue.2010.04.002
develop other techniques for solving the problem, for example by utilizing an expansion in the inverse value, equal to the ratio of slot (strip) width to wavelength [5]. But it appears that the corresponding series are poorly convergent. Strictly speaking, an exact solution of the diffraction problem for a strip is known as a limiting case of the scattering solution for an elliptic cylinder [8]. However, this solution is expressed in terms of Mathieu functions and is inconvenient for computation. As far as we know, even with the modern development of computers, nobody uses this solution. It can be simplified by means of asymptotic expansion [9], but such a transformation essentially limits its field of application. So obtaining a simple and convenient form of the exact solution for slot and strip diffraction remains relevant up to now, especially for the case of large wavelengths. From the mathematical point of view, the problem reduces to solving the well-known system of dual integral equations in the Fourier amplitudes of the diffraction field [5,6]. It is now clear that one cannot obtain for them a solution as simple and compact as that for a half-plane [1]. For a slot or strip, a solution of the diffraction problem can be presented as some complex integral or an infinite series. The first case is undesirable, since the calculation of integral sums usually requires a great number of terms, and correspondingly one has to deal with just as great a number of unknowns. Hence, it is preferable to have a solution in the form of a discrete convergent series whose terms have a simple form and can be calculated easily. As such a series, it is possible to consider a power series or a Fourier one, and also to employ specific basis functions appropriate to more complex diffraction structures [10]. A similar representation of the solution in the form of a Fourier series, or a sum of slot modes, has long been used to simulate wave diffraction by slots in conducting screens of finite thickness by means of the mode matching technique [11–16]. It provides opportunity

Small-scale clumps in the galactic halo and dark matter annihilation

PACS numbers: 12.60.Jv, 95.35.+d, 98.35.Gi
arXiv:astro-ph/0301551v3 15 Jan 2004
I. INTRODUCTION
Both analytic calculations [1, 2] and numerical simulations [3, 4, 5] predict the existence of dark matter clumps in the Galactic halo. The density profile in these clumps, according to analytic calculations [6, 7, 8, 9] and numerical simulations [4, 10], is ρ(r) ∝ r^{−β}. The average density of the dark matter (DM) in a galactic halo itself exhibits a similar density profile (relative to the galactic center) in both approaches. The DM profiles in clusters of galaxies are discussed in [11] and in references therein. In the analytic approach of Gurevich and Zybin (see review [9] and references therein) the density profiles are predicted to be universal, with β ≈ 1.7–1.9 for clumps, galaxies, and two-point correlation functions of galaxies. In numerical simulations the density profiles can be evaluated only for relatively large scales, owing to the limited mass resolution. The value of β differs between simulations, from β = 1.0 [10] to β = 1.5 [3], and may be non-universal for objects of different mass scales [12]. An attempt at an analytical explanation of the results of numerical simulations has been made in [13, 14]. The phase-space density profiles of DM halos are investigated in [15].
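Since the annihilation signal of a clump scales as the volume integral of ρ², the slope β is the controlling parameter. The following is a minimal numerical sketch of that integral for a truncated power-law profile; the dimensionless core radius and the normalization are illustrative assumptions, not values from this paper.

    import numpy as np
    from scipy.integrate import quad

    def annihilation_measure(beta: float, x_c: float = 1e-3) -> float:
        """Integral of rho^2 dV for rho(x) = x^-beta on x in [x_c, 1], x = r/R.
        The small core radius x_c keeps the integral finite for beta >= 1.5."""
        integrand = lambda x: 4.0 * np.pi * x**2 * x**(-2.0 * beta)
        value, _ = quad(integrand, x_c, 1.0)
        return value

    for beta in (1.0, 1.5, 1.7, 1.9):
        print(f"beta={beta:.1f}  integral of rho^2 dV = {annihilation_measure(beta):.1f}")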

A Compilation of Sentence Patterns for SCI Paper Writing

A summary of sentence patterns commonly used in SCI paper writing. (Part 1) "Many studies have already discussed this"

1. A sentence often used in the Introduction: saying that many studies have already discussed this.

There are many, many possible ways to phrase this; here are several I collected long ago:

A. Solar energy conversion by photoelectrochemical cells has been intensively investigated. (Nature 1991, 353, 737–740)
B. This was demonstrated in a number of studies that showed that composite plasmonic-metal/semiconductor photocatalysts achieved significantly higher rates in various photocatalytic reactions compared with their pure semiconductor counterparts.
C. Several excellent reviews describing these applications are available, and we do not discuss these topics.
D. Much work so far has focused on wide band gap semiconductors for water splitting for the sake of chemical stability. (DOI: 10.1038/NMAT3151)
E. Recent developments of Lewis acids and water-soluble organometallic catalysts have attracted much attention. (Chem. Rev. 2002, 102, 3641–3666)
F. An interesting approach in the use of zeolite as a water-tolerant solid acid was described by Ogawa et al. (Chem. Rev. 2002, 102, 3641–3666)
G. Considerable research efforts have been devoted to the direct transition metal-catalyzed conversion of aryl halides to aryl nitriles. (J. Org. Chem. 2000, 65, 7984–7989)
H. There are many excellent reviews in the literature dealing with the basic concepts of the photocatalytic process, and the reader is referred in particular to those by Hoffmann and coworkers, Mills and coworkers, and Kamat. (Metal Oxide Catalysis, 19, p. 755)
I. Nishimiya and Tsutsumi have reported on (proposed) the influence of the Si/Al ratio of various zeolites on the acid strength, which was estimated by calorimetry using ammonia. (Chem. Rev. 2002, 102, 3641–3666)

2. Often used in Results and Discussion: "as shown in Figure X"

A. GIXRD patterns in Figure 1A show the bulk structural information on as-deposited films.
B. As shown in Figure 7B, the steady-state current density decreases after cycling between 0.35 and 0.7 V, which is probably due to the dissolution of FeOx.
C. As can be seen from parts a and b of Figure 7, the reaction cycles start with the thermodynamically most favorable VOx structures. (J. Phys. Chem. C 2014, 118, 24950–24958)

"This is corroborated by XX":

A. This is supported by the appearance in the Ni-doped compounds of an ultraviolet-visible absorption band at 420–520 nm (see Fig. 3 inset), corresponding to an energy range of about 2.9 to 2.3 eV.
B. This is consistent with the observation from SEM-EDS. (Z. Zou et al. / Chemical Physics Letters 332 (2000) 271–277)
C. This indicates a good agreement between the observed and calculated intensities in monoclinic with space group P2/c when the O atoms are included in the model.
D. The results are consistent with the observed photocatalytic activity...
E. Identical conclusions were obtained in studies where the SPR intensity and wavelength were modulated by manipulating the composition, shape, or size of plasmonic nanostructures.
F. It was also found that areas of persistent divergent surface flow coincide with regions where convection appears to be consistently suppressed even when SSTs are above 27.5 °C.

(Part 2)

1. "It is worth noting that ..."

A. It must also be mentioned that the recycling of aqueous organic solvent is less desirable than that of pure organic liquid.
B. Another interesting finding is that zeolites with 10-membered ring pores showed high selectivities (>99%) to cyclohexanol, whereas those with 12-membered ring pores, such as mordenite, produced large amounts of dicyclohexyl ether. (Chem. Rev. 2002, 102, 3641–3666)
C. It should be pointed out that the nanometer-scale distribution of electrocatalyst centers on the electrode surface is also a predominant factor for high ORR electrocatalytic activity.
D. Notably, the Ru(II) and Rh(I) complexes possessing the same BINAP chirality form antipodal amino acids as the predominant products. (Angew. Chem. Int. Ed., 2002, 41: 2008–2022)
E. Given the multitude of various transformations published, it is noteworthy that only very few distinct activation methods have been identified. (Chem. Soc. Rev., 2009, 38, 2178–2189)
F. It is important to highlight that these two directing effects will lead to different enantiomers of the products even if both the "H-bond catalyst" and the catalyst acting by steric shielding have the same absolute stereochemistry. (Chem. Soc. Rev., 2009, 38, 2178–2189)
G. It is worthwhile mentioning that these PPNDs can be very stable for several months without the observation of any floating or precipitated dots, which is attributed to the electrostatic repulsion between the positively charged PPNDs resulting in electrosteric stabilization. (Adv. Mater., 2012, 24: 2037–2041)

2. "... remains a challenge"

A. There is thereby an urgent need, but it is still a significant challenge, to rationally design and delicately tailor the electroactive MTMOs for advanced LIBs, ECs, MOBs, and FCs. (Angew. Chem. Int. Ed. 2014, 53, 1488–1504)
B. However, systems that are sufficiently stable and efficient for practical use have not yet been realized.
C. It remains challenging to develop highly active HER catalysts based on materials that are more abundant at lower costs. (J. Am. Chem. Soc., 2011, 133, 7296–7299)
D. One of the great challenges in the twenty-first century is unquestionably energy storage. (Nature Materials 2005, 4, 366–377)

"As is well known"

A. It is well established (accepted) / It is known to all / It is commonly known that many characteristics of functional materials, such as composition, crystalline phase, structural and morphological features, and the sur-/interface properties between the electrode and electrolyte, would greatly influence the performance of these unique MTMOs in electrochemical energy storage/conversion applications. (Angew. Chem. Int. Ed. 2014, 53, 1488–1504)
B. It is generally accepted (believed) that for α-Fe2O3-based sensors the change in resistance is mainly caused by the adsorption and desorption of gases on the surface of the sensor structure. (Adv. Mater. 2005, 17, 582)
C. As we all know, soybean abounds with carbon, nitrogen and oxygen elements owing to the existence of sugar, proteins and lipids. (Chem. Commun., 2012, 48, 9367–9369)
D. There is no denying that their presence may mediate spin moments to align parallel without acting alone to show d0-FM. (Nanoscale, 2013, 5, 3918–3930)

(Part 3)

1. "As will be described below ..."

A. As will be described below (or: as we shall see below), as the Si/Al ratio increases, the surface of the zeolite becomes more hydrophobic and possesses stronger affinity for ethyl acetate, and the number of acid sites decreases. (Chem. Rev. 2002, 102, 3641–3666)
B. This behavior is to be expected and will be further discussed below. (J. Am. Chem. Soc., 1955, 77, 3701–3707)
C. There are also some small deviations with respect to the flow direction, which we will discuss below. (Science, 2001, 291, 630–633)
D. Below, we will see what this implies.
E. Complete details of this case will be provided at a later time.
F. In many papers, "see below" is simply used directly, for example: The observation of nanocluster spheres at the ends of the nanowires is suggestive of a VLS growth process (see below). (Science, 1998, 279, 208–211)

2. "This is corroborated by XX ..."

A. This is supported by the appearance in the Ni-doped compounds of an ultraviolet-visible absorption band at 420–520 nm (see Fig. 3 inset), corresponding to an energy range of about 2.9 to 2.3 eV.
B. This is consistent with the observation from SEM-EDS. (Chem. Phys. Lett. 2000, 332, 271–277)
C. Identical conclusions were obtained in studies where the SPR intensity and wavelength were modulated by manipulating the composition, shape, or size of plasmonic nanostructures. (Nat. Mater. 2011, DOI: 10.1038/NMAT3151)
D. In addition, the shape of the titration curve versus the PPi/1 ratio, coinciding with that obtained by fluorescence titration studies, suggested that both 2:1 and 1:1 host-to-guest complexes are formed. (J. Am. Chem. Soc. 1999, 121, 9463–9464)
E. This unusual luminescence behavior is in accord with a recent theoretical prediction; MoS2, an indirect bandgap material in its bulk form, becomes a direct bandgap semiconductor when thinned to a monolayer. (Nano Lett., 2010, 10, 1271–1275)

3. In what areas our work might find applications

A. Our findings suggest that the use of solar energy for photocatalytic water splitting might provide a viable source for "clean" hydrogen fuel, once the catalytic efficiency of the semiconductor system has been improved by increasing its surface area and suitable modifications of the surface sites.
B. Along with this green and cost-effective protocol of synthesis, we expect that these novel carbon nanodots have potential applications in bioimaging and electrocatalysis. (Chem. Commun., 2012, 48, 9367–9369)
C. This system could potentially be applied as the gain medium of solid-state organic-based lasers or as a component of high-value photovoltaic (PV) materials, where destructive high-energy UV radiation would be converted to useful low-energy NIR radiation. (Chem. Soc. Rev., 2013, 42, 29–43)
D. Since the use of graphene may enhance the photocatalytic properties of TiO2 under UV and visible-light irradiation, graphene-TiO2 composites may potentially be used to enhance the bactericidal activity. (Chem. Soc. Rev., 2012, 41, 782–796)
E. It is the first report that CQDs are both amino-functionalized and highly fluorescent, which suggests their promising applications in chemical sensing. (Carbon, 2012, 50, 2810–2815)

(Part 4)

1. Saying that something has not yet been discovered or systematically studied

A. However, systems that are sufficiently stable and efficient for practical use have not yet been realized.
B. Nevertheless, for conventional nanostructured MTMOs as mentioned above, some problematic disadvantages cannot be overlooked. (Angew. Chem. Int. Ed. 2014, 53, 1488–1504)
C. There are relatively few studies devoted to determination of cmc values for block copolymer micelles. (Macromolecules 1991, 24, 1033–1040)
D. This might be the reason why, despite the great influence of the preparation on the catalytic activity of gold catalysts, no systematic study concerning the synthesis conditions has been published yet. (Applied Catalysis A: General 2002, 226, 1–13)
E. These possibilities remain to be explored.
F. Further effort is required to understand and better control the parameters dominating the particle surface passivation and resulting properties for carbon dots of brighter photoluminescence. (J. Am. Chem. Soc., 2006, 128, 7756–7757)

2. "Because of / owing to ..."

A. Liquid ammonia is particularly attractive as an alternative to water due to its stability in the presence of strong reducing agents such as alkali metals that are used to access lower oxidation states.
B. The unique nature of the cyanide ligand results from its ability to act both as a σ donor and a π acceptor, combined with its negative charge and ambidentate nature.
C. Qdots are also excellent probes for two-photon confocal microscopy because they are characterized by a very large absorption cross section. (Science 2005, 307, 538–544)
D. As a result of the reductive strategy we used and of the strong bonding between the surface and the aryl groups, low residual currents (similar to those observed at a bare electrode) were obtained over a large window of potentials, the same as for the unmodified parent GC electrode. (J. Am. Chem. Soc. 1992, 114, 5883–5884)
E. The small Tafel slope of the defect-rich MoS2 ultrathin nanosheets is advantageous for practical applications, since it will lead to a faster increment of HER rate with increasing overpotential. (Adv. Mater., 2013, 25: 5807–5813)
F. Fluorescent carbon-based materials have drawn increasing attention in recent years owing to exceptional advantages such as high optical absorptivity, chemical stability, biocompatibility, and low toxicity. (Angew. Chem. Int. Ed., 2013, 52: 3953–3957)
G. On the basis of measurements of the heat of immersion of water on zeolites, Tsutsumi et al. claimed that the surface consists of siloxane bondings and is hydrophobic in the region of low Al content. (Chem. Rev. 2002, 102, 3641–3666)
H. Nanoparticle spatial distributions might have a large significance for catalyst stability, given that metal particle growth is a relevant deactivation mechanism for commercial catalysts.

3. "... is important"

A. The inhibition of additional nucleation during growth, in other words, the complete separation of nucleation and growth, is critical (essential, important) for the successful synthesis of monodisperse nanocrystals. (Nature Materials 2004, 3, 891–895)
B. In the current study, Cys, homocysteine (Hcy) and glutathione (GSH) were chosen as model thiol compounds since they play important (significant, vital, critical) roles in many biological processes, and monitoring of these thiol compounds is of great importance for the diagnosis of diseases. (Chem. Commun., 2012, 48, 1147–1149)
C. This is because, according to nucleation theory, what really matters, in addition to the change in temperature ΔT (or supersaturation), is the cooling rate. (Chem. Soc. Rev., 2014, 43, 2013–2026)

(Part 5)

1. "On the contrary / unlike ..."

A. On the contrary, mononuclear complexes, called single-ion magnets (SIMs), have shown hysteresis loops of butterfly/phonon-bottleneck type, with negligible coercivity, and therefore with much shorter relaxation times of magnetization. (Angew. Chem. Int. Ed., 2014, 53: 4413–4417)
B. In contrast, the Dy compound has a significantly larger value of the transversal magnetic moment already in the ground state (ca. 10^-1 μB), therefore allowing a fast QTM. (Angew. Chem. Int. Ed., 2014, 53: 4413–4417)
C. In contrast to the structural similarity of these complexes, their magnetic behavior exhibits strong divergence. (Angew. Chem. Int. Ed., 2014, 53: 4413–4417)
D. Contrary to other conducting polymer semiconductors, carbon nitride is chemically and thermally stable and does not rely on complicated device manufacturing. (Nature Materials, 2009, 8(1): 76–80)
E. Unlike the spherical particles they are derived from, which Rayleigh light-scatter in the blue, these nanoprisms exhibit scattering in the red, which could be useful in developing multicolor diagnostic labels on the basis not only of nanoparticle composition and size but also of shape. (Science 2001, 294, 1901–1903)

2. Find, elucidate, report, confirm. Possible verbs include: verify, confirm, elucidate, identify, define, characterize, clarify, establish, ascertain, explain, observe, illuminate, illustrate, demonstrate, show, indicate, exhibit, present, reveal, display, manifest, suggest, propose, estimate, prove, imply, disclose, report, describe, facilitate the identification of. Examples:

A. These stacks appear as nanorods in the two-dimensional TEM images, but tilting experiments confirm that they are nanoprisms. (Science 2001, 294, 1901–1903)
B. Note that TEM shows that about 20% of the nanoprisms are truncated. (Science 2001, 294, 1901–1903)
C. Therefore, these calculations not only allow us to identify the important features in the spectrum of the nanoprisms but also the subtle relation between particle shape and the frequency of the bands that make up their spectra. (Science 2001, 294, 1901–1903)
D. We observed a decrease in intensity of the characteristic surface plasmon band in the ultraviolet-visible (UV-Vis) spectroscopy for the spherical particles at λmax = 400 nm, with a concomitant growth of three new bands of λmax = 335 (weak), 470 (medium), and 670 nm (strong), respectively. (Science 2001, 294, 1901–1903)
E. In this article, we present data demonstrating that opiate and nonopiate analgesia systems can be selectively activated by different environmental manipulations and describe the neural circuitry involved. (Science 1982, 216, 1185–1192)
F. This suggests that the cobalt in CoP has a partial positive charge (δ+), while the phosphorus has a partial negative charge (δ−), implying a transfer of electron density from Co to P. (Angew. Chem., 2014, 126: 6828–6832)

3. How to point out the shortcomings of current research

A. Although these inorganic substructures can exhibit a high density of functional groups, such as bridging OH groups, and the substructures contribute significantly to the adsorption properties of the material, surprisingly little attention has been devoted to the post-synthetic functionalization of the inorganic units within MOFs. (Chem. Eur. J., 2013, 19: 5533–5536)
B. Little is known, however, about the microstructure of this material. (Nature Materials 2013, 12, 554–561)
C. So far, very little information is available, and only on the absorber film, not on whole operational devices. (Nano Lett., 2014, 14(2), 888–893)
D. In fact, it should be noted that very little optimisation work has been carried out on these devices. (Chem. Commun., 2013, 49, 7893–7895)
E. By far the most architectures have been prepared using a solution-processed perovskite material, yet a few examples have been reported that have used an evaporated perovskite layer. (Adv. Mater., 2014, 27: 1837–1841)
F. Water balance issues have been effectively addressed in PEMFC technology through a large body of work encompassing imaging, detailed water content and water balance measurements, materials optimization and modeling, but very few of these activities have been undertaken for anion exchange membrane fuel cells, primarily due to limited materials availability and device lifetime. (J. Polym. Sci. Part B: Polym. Phys., 2013, 51: 1727–1735)
G. However, none of these studies tested for Th17 memory, a recently identified T cell that specializes in controlling extracellular bacterial infections at mucosal surfaces. (PNAS, 2013, 111, 787–792)
H. However, uncertainty still remains as to the mechanism by which Li salt addition results in an extension of the cathodic reduction limit. (Energy Environ. Sci., 2014, 7, 232–250)
I. There have been a number of high-profile cases where failure to identify the most stable crystal form of a drug has led to severe formulation problems in manufacture. (Chem. Soc. Rev., 2014, 43, 2080–2088)
J. However, these measurements systematically underestimate the amount of ordered material. (Nature Materials 2013, 12, 1038–1044)

(Part 6)

1. "Depends on"

a. This is an important distinction, as the overall activity of a catalyst will depend on the material properties, synthesis method, and other possible species that can be formed during activation. (Nat. Mater. 2017, 16, 225–229)
b. This quantitative partitioning was determined by growing crystals of the 1:1 host–guest complex between ExBox4+ and corannulene. (Nat. Chem. 2014, 6, 177–178)
c. They suggested that the Au particle size may be the decisive factor for achieving highly active Au catalysts. (Acc. Chem. Res., 2014, 47, 740–749)
d. Low-valent late transition-metal catalysis has become indispensable to chemical synthesis, but homogeneous high-valent transition-metal catalysis is underdeveloped, mainly owing to the reactivity of high-valent transition-metal complexes and the challenges associated with synthesizing them. (Nature 2015, 517, 449–454)
e. The polar effect is a remarkable property that enables considerably endergonic C–H abstractions that would not be possible otherwise. (Nature 2015, 525, 87–90)
f. Advances in heterogeneous catalysis must rely on the rational design of new catalysts. (Nat. Nanotechnol. 2017, 12, 100–101)
g. Likely, the origin of the chemoselectivity may be also closely related to the H-bonding with the N or O atom of the nitroso moiety; a similar H-bonding effect is known in enamine-based nitroso chemistry. (Angew. Chem. Int. Ed. 2014, 53: 4149–4153)

2. "Has great potential"

a. The quest for new methodologies to assemble complex organic molecules continues to be a great impetus to research efforts to discover or to optimize new catalytic transformations. (Nat. Chem. 2015, 7, 477–482)
b. Nanosized faujasite (FAU) crystals have great potential as catalysts or adsorbents to more efficiently process present and forthcoming synthetic and renewable feedstocks in oil refining, petrochemistry and fine chemistry. (Nat. Mater. 2015, 14, 447–451)
c. For this purpose, vibrational spectroscopy has proved promising and very useful. (Acc. Chem. Res. 2015, 48, 407–413)
d. While a detailed mechanism remains to be elucidated and there is room for improvement in the yields and selectivities, it should be remarked that chirality transfer upon trifluoromethylation of enantioenriched allylsilanes was shown.
(Top Catal. 2014, 57: 967.)
e. The future looks bright for the use of PGMs as catalysts, both on laboratory and industrial scales, because the preparation of most kinds of single-atom metal catalyst is likely to be straightforward, and because characterization of such catalysts has become easier with the advent of techniques that readily discriminate single atoms from small clusters and nanoparticles. (Nature 2015, 525, 325–326)
f. The unique mesostructure of the 3D-dendritic MSNSs with mesopore channels of short length and large diameter is supposed to be the key role in immobilization of active and robust heterogeneous catalysts, and it would have more hopeful prospects in catalytic applications. (ACS Appl. Mater. Interfaces, 2015, 7, 17450–17459)
g. Visible-light photoredox catalysis offers exciting opportunities to achieve challenging carbon–carbon bond formations under mild and ecologically benign conditions. (Acc. Chem. Res., 2016, 49, 1990–1996)

3. "Therefore". Synonyms: therefore, thus, consequently, hence, accordingly, so, as a result. This one is simple; the main point is the adverbial character of these words and their flexible use.

Empirical processes of dependent random variables

2 Preliminaries

... the marginal and empirical distribution functions. Let G be a class of measurable functions from R to R. The centered G-indexed empirical process is given by

    (P_n − P) g = (1/n) Σ_{i=1}^{n} g(X_i) − ∫ g dP.

Processes that have been discussed include linear processes and Gaussian processes; see Dehling and Taqqu (1989) and Csörgő and Mielniczuk (1996) for long- and short-range dependent subordinated Gaussian processes, and Ho and Hsing (1996) and Wu (2003a) for long-range dependent linear processes. A collection of recent results is presented in Dehling, Mikosch and Sorensen (2002). In that collection Dedecker and Louhichi (2002) made an important generalization of Ossiander's (1987) result. Here we investigate the empirical central limit problem for dependent random variables from another angle that avoids strong mixing conditions. In particular, we apply a martingale method and establish a weak convergence theory for stationary, causal processes. Our results are comparable with the theory for independent random variables in that the imposed moment conditions are optimal or almost optimal. We show that, if the process is short-range dependent in a certain sense, then the limiting behavior is similar to that of iid random variables in that the limiting distribution is a Gaussian process and the norming sequence is √n. For long-range dependent linear processes, one needs to apply asymptotic expansions to obtain √n-norming limit theorems (Section 6.2.2). The paper is structured as follows. In Section 2 we introduce some mathematical preliminaries necessary for the weak convergence theory and illustrate the essence of our approach. Two types of empirical central limit theorems are established. Empirical processes indexed by indicators of left half lines, absolutely continuous functions, and piecewise differentiable functions are discussed in Sections 3, 4 and 5 respectively. Applications to linear processes and iterated random functions are made in Section 6. Section 7 presents some integral and maximal inequalities that may be of independent interest. Some proofs are given in Sections 8 and 9.
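A minimal simulation sketch of the centered empirical process for a short-range dependent stationary causal process, here an AR(1) linear process with illustrative parameters; the √n-norming keeps the fluctuations O(1).

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    n, a = 50_000, 0.5                       # sample size, AR coefficient
    eps = rng.normal(size=n)
    X = np.empty(n)
    X[0] = eps[0]
    for i in range(1, n):                    # X_i = a X_{i-1} + eps_i
        X[i] = a * X[i - 1] + eps[i]

    sigma = 1.0 / np.sqrt(1.0 - a**2)        # stationary marginal is N(0, sigma^2)
    grid = np.linspace(-3.0, 3.0, 13)
    Fn = np.searchsorted(np.sort(X), grid, side='right') / n
    proc = np.sqrt(n) * (Fn - norm.cdf(grid, scale=sigma))
    print(np.round(proc, 2))                 # O(1) fluctuations under sqrt(n)-norming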

Simulation of infrared atmospheric transmittance based on measured data and parameter fitting

Laser & Infrared, Vol. 47, No. 2, February 2017. Article ID: 1001-5078(2017)02-0183-06. Infrared technology and applications.

Simulation of infrared atmospheric transmittance based on measured data and parameter fitting

SONG Fu-yin¹, LU Yuan¹, YANG Xing¹, QIAO Ya¹, WU Xiao-di¹, CHEN Jie²
(1. Key Laboratory of Infrared and Low Temperature Plasma of Anhui Province, State Key Laboratory of Pulsed Power Laser Technology, Electronic Engineering Institute, Hefei 230037, China; 2. Institute of Electronics and Information Engineering, Anhui Jianzhu University, Hefei 230037, China)

Abstract: There are two conventional methods to calculate infrared atmospheric transmittance: the empirical formula method and the professional software calculating method. However, the empirical formula method has large errors, and the professional software calculating method has a complicated computational process and is difficult to port to other infrared simulation systems. Therefore, based on many years of measured atmospheric data for a specific region, the absorption coefficients of water vapor and carbon dioxide at different temperatures were calculated by the molecular single-line absorption method, and analytic expressions for the temperature, pressure and aerosol backscattering coefficient distributed over different heights were fitted month by month.

On this basis, a simulation model of the atmospheric transmittance of infrared radiation was built.

The simulation results are very close to the accurate results calculated by the user-defined mode of the mainstream professional software MODTRAN; moreover, the method is simple to use and highly portable, and has reference value for engineering applications.

Keywords: atmospheric transmittance; measured data; parameter fitting; absorption coefficients; simulation
CLC number: TN211; document code: A; DOI: 10.3969/j.issn.1001-5078.2017.02.011

1 Introduction
... has been showing ever greater power on the battlefield.
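A minimal sketch of the kind of transmittance calculation the abstract describes: integrate an absorption-coefficient profile along the path and apply the Beer-Lambert law. The exponential profile shape, scale height, and coefficient value are illustrative assumptions, not the paper's fitted parameters.

    import numpy as np

    def transmittance_vertical(z0_km: float, z1_km: float, k0: float = 0.13,
                               scale_h_km: float = 8.0, n: int = 2000) -> float:
        """tau = exp(-integral of k(z) dz) with k(z) = k0 * exp(-z/H):
        Beer-Lambert transmittance along a vertical path from z0 to z1."""
        z = np.linspace(z0_km, z1_km, n)
        k = k0 * np.exp(-z / scale_h_km)     # stand-in for a fitted profile
        dz = (z1_km - z0_km) / (n - 1)
        return float(np.exp(-np.sum((k[:-1] + k[1:]) * 0.5) * dz))  # trapezoid

    for top in (1.0, 5.0, 20.0):
        print(f"0 -> {top:4.1f} km: tau = {transmittance_vertical(0.0, top):.3f}")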

Lecture 4 Asymptotic Distribution Theory (Econometrics, English edition)

P(|X_n Y_n| / (a_n b_n) > ε) ≤ P(|X_n|/a_n > ε) + P(|Y_n|/b_n > 1) → 0.

If |X_n + Y_n| / max(a_n, b_n) > ε, then either |X_n|/a_n > ε/2 or |Y_n|/b_n > ε/2, so

    P(|X_n + Y_n| / max(a_n, b_n) > ε) ≤ P(|X_n|/a_n > ε/2) + P(|Y_n|/b_n > ε/2).

Proof: let M be a positive real number. Then

    P(|g(X_n) − g(X)| > ε) ≤ P(|g(X_n) − g(X)| > ε, |X_n| ≤ M, |X| ≤ M) + P({|X_n| > M} ∪ {|X| > M})

(the above inequality uses P(A ∪ B) ≤ P(A) + P(B), where A = {|g(X_n) − g(X)| > ε, |X_n| ≤ M, |X| ≤ M} and B = {|X_n| > M} ∪ {|X| > M}). Recall that if a function g is uniformly continuous on {x : |x| ≤ M}, then for every ε > 0 there exists η(ε) such that |X_n − X| < η(ε) implies |g(X_n) − g(X)| < ε. Then

    {|g(X_n) − g(X)| > ε, |X_n| ≤ M, |X| ≤ M} ⊆ {|X_n − X| > η(ε)}.

Therefore,

    P(|g(X_n) − g(X)| > ε) ≤ P(|X_n − X| > η(ε)) + P(|X_n| > M) + P(|X| > M)
                            ≤ P(|X_n − X| > η(ε)) + P(|X| > M) + P(|X| > M/2) + P(|X_n − X| > M/2).

Given any δ > 0, we can choose M to make the second and third terms each less than δ/4. Since X_n → X, the first and fourth terms will each be less than δ/4 for n large enough. Therefore, we have P(|g(X_n) − g(X)| > ε) ≤ δ. Then g(X_n) → g(X).

Definition 3 (Convergence almost surely) A sequence {X_n} is said to converge to X almost surely, or with probability one, if

    P(lim_{n→∞} |X_n − X| > ε) = 0.
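A minimal Monte Carlo illustration of the continuous-mapping argument proved above; the map g and the rate at which X_n approaches X are illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)
    g = np.exp                        # a continuous map
    eps, reps = 0.05, 100_000
    X = rng.normal(size=reps)
    for n in (10, 100, 1_000, 10_000):
        Xn = X + rng.normal(scale=1.0 / np.sqrt(n), size=reps)  # X_n -> X in prob.
        p = np.mean(np.abs(g(Xn) - g(X)) > eps)
        print(f"n={n:6d}  P(|g(X_n) - g(X)| > {eps}) ~ {p:.4f}")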

Review of Final LEP Results or A Tribute to LEP

A word on the four LEP detectors ALEPH, DELPHI, L3, OPAL. All collaborations improved their detectors substantially during the years of data taking, the most important improvements being:
1. The development of silicon micro-vertex detectors for high-resolution secondary vertex measurements. The installation of these detectors greatly improved the quality of heavy flavour physics.
2. All experiments replaced their first luminosity detectors by new high-precision detectors capable of measuring small-angle Bhabha scattering with an accuracy well below 0.1%.
The LEP Collaborations also created a new style of working together, the LEP Working Groups, of which the Electroweak Working Group (EWWG) is best known. These groups have the task of combining the results obtained by the four LEP Collaborations and also by the SLD Collaboration, working at the SLAC e+e− linear collider SLC, taking proper account of all systematic correlations between the data.

An Empirical Study on Higher-Order Mutation-Based Multiple Fault Localization

An Empirical Study on Higher-Order Mutation-Based Multiple Fault Localization
LOU Kun, SHANG Ying, WANG Hai-Feng (College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China). Corresponding author: SHANG Ying.

Abstract: Fault localization is one of the most expensive activities in software debugging. Mutation-Based Fault Localization (MBFL) assumes that the mutants killed by most of the failed test cases provide a good indication of the location of a fault. Previous studies showed that MBFL achieves desirable results in a Single-Fault Localization Scenario (SFL-Scenario), but its performance in a Multiple-Fault Localization Scenario (MFL-Scenario) has not been thoroughly evaluated. Recently, Higher-Order Mutants (HOMs) have been proposed to model complex faults that are hard to kill, but whether HOMs can improve the performance of MBFL is still unknown. In this study, we investigate the impact of First-Order Mutants (FOMs) and HOMs on MBFL in an MFL-Scenario. Moreover, we divide HOMs into three groups according to the mutation locations in the program, i.e., accurate, partially accurate, and inaccurate HOMs, to find which type of HOMs is more effective in fault localization. Based on the empirical results on five real-world projects, we find that in an MFL-Scenario HOMs can behave better than FOMs, and that the influence of the type of HOMs on the effectiveness of MBFL cannot be ignored. In particular, accurate HOMs can contribute more than inaccurate ones. Therefore, researchers should propose effective methods to generate this type of HOMs for future MBFL studies.
Keywords: fault localization; Mutation-Based Fault Localization (MBFL); mutation testing; First-Order Mutants (FOMs); Higher-Order Mutants (HOMs)

Citation: Lou K, Shang Y, Wang HF. Empirical study on higher order mutation-based multiple fault localization. Computer Systems & Applications, 2021, 30(5): 47-58. /1003-3254/7942.html [doi: 10.15888/ki.csa.007942]
Foundation item: National Natural Science Foundation of China (62077003, 61902015). Received: 2020-09-26; revised: 2020-10-21, 2020-11-03; accepted: 2020-11-12; published online: 2021-04-28.

1 Introduction
Fault localization is the process of identifying the elements whose execution causes a program to fail [1]. Among the many activities of software debugging, fault localization is one of the most complex and time-consuming, especially for large, complex programs. To reduce the manual cost of locating faults, researchers have proposed many fault localization methods, such as slicing-based methods [2], spectrum-based methods [3,4], and mutation-based methods [5].

Among these automated methods, Spectrum-Based Fault Localization (SBFL) [3,4,6] is widely applied. SBFL relies on a binary coverage matrix of program elements, but its localization precision is limited. Current research shows that mutation-based fault localization achieves higher precision than state-of-the-art spectrum-based methods [7,8]. MBFL is built on mutation testing [7] [9]. To date there are two MBFL techniques: Metallaxis-FL [5] and MUSE [9]. Studies [10,11] show that Metallaxis-FL is more efficient and more effective than MUSE, so this paper adopts Metallaxis-FL as the baseline MBFL method.

In MBFL, a program p is transformed by simple syntactic changes into a series of faulty programs p' (mutants); the rules that generate mutants are called mutation operators. According to the number of times mutation operators are applied, mutants fall into two classes: First-Order Mutants (FOMs), generated by a single application of a mutation operator, and Higher-Order Mutants (HOMs), generated by applying mutation operators several times [12].

In previous MBFL studies, only FOMs were used to localize single-fault programs [5,13]. However, Xue et al. [14] found that localizing multiple faults is more difficult, time-consuming and costly, and that interference among faults degrades the effectiveness of existing fault localization techniques. On the other hand, Offutt et al. [15] found that whether killing HOMs can detect complex faults is uncertain.
To fill this gap, we conduct a large-scale empirical study of whether HOMs can improve fault localization precision, and we analyze the relation between the different classes of HOMs and multiple faults.

In this paper we focus on the performance of FOMs and HOMs on multiple-fault programs, and we divide HOMs into three classes to study the localization effectiveness of each class. In our experimental setup, FOMs are first generated with the mutation operators proposed by Agrawal et al. [16], and HOMs are then constructed from the FOMs. In particular, for the multiple-fault localization scenario, we combine 63 single-fault programs to generate 100 multiple-fault programs, with 2 to 5 faults each. Finally, we divide the HOMs into three classes to compare their performance.

2 Background and motivation

2.1 Mutation-based fault localization

Mutation-based fault localization is built on mutation analysis [8]. Mutation analysis applies simple semantic changes to the program under test, generating versions that differ from the original program. These programs with artificially injected faults are called mutants, and the rules that generate them are called mutation operators. This paper uses the C-language mutation operators proposed by Agrawal et al. [16].

In mutation analysis, mutants are used to assess the quality of test cases according to whether the mutant's output differs from the original program's. If the execution result of a test case differs from that of the original program, the mutant is killed (recorded as killed or detected); otherwise the mutant is not killed (not killed, or live).

Traditional mutation-based fault localization consists of the following four steps:

(1) Obtain the statements covered by failing test cases: run the test suite T on the program under test P, collecting coverage information and execution results (pass or fail). The test cases are then split into a passing set T_p and a failing set T_f. The set of statements covered by failing test cases is denoted cov_f.

(2) Generate and execute mutants: apply different mutation operators to the statements covered by failing test cases to inject faults and generate mutants. The set of mutants generated for a statement s is denoted M(s). All test cases are then executed against each mutant m; according to the results, T_k(m) is the set of test cases that kill m and T_n(m) is the set that do not.

(3) Compute statement suspiciousness: the suspiciousness of a mutant can be computed with different MBFL formulas, all based on the following four parameters: a_np = |T_n ∩ T_p|, a_kp = |T_k ∩ T_p|, a_nf = |T_n ∩ T_f|, a_kf = |T_k ∩ T_f|. Here a_np is the number of passing test cases that do not kill the mutant, a_kp the number of passing test cases that kill it, a_nf the number of failing test cases that do not kill it, and a_kf the number of failing test cases that kill it. Table 1 lists three suspiciousness formulas commonly used by researchers (Ochiai [17], Tarantula [18], Dstar [19]). Our experiments use Ochiai as the MBFL formula, because it is widely used in MBFL research [5,9] and performs better than the other formulas [13]. After computing the mutant suspiciousness values, the maximum suspiciousness over a statement's mutants is assigned as the suspiciousness of that statement.

(4) Generate the fault localization report: statements are ranked in descending order of suspiciousness; developers can inspect the ranked list from top to bottom to find and fix the faults.

Table 1. Commonly used suspiciousness formulas.
    Name             Expression
    Ochiai [17]      Sus(m) = a_kf / sqrt((a_kf + a_nf)(a_kf + a_kp))
    Tarantula [18]   Sus(m) = (a_kf/(a_kf + a_nf)) / (a_kf/(a_kf + a_nf) + a_kp/(a_kp + a_np))
    Dstar [19]       Sus(m) = a_kf^* / (a_kp + a_nf)

From the above description, MBFL rests on the assumption that "mutants killed by most failing test cases are related to the program fault", with two underlying hypotheses [20]: (1) a mutant can be regarded as a potential repair of the original program; (2) a mutant can be regarded as an approximate version of the original faulty program. After executing the test cases, a mutant is either killed or not killed; the killed state splits into killed by failing test cases (a_kf) and killed by passing test cases (a_kp). For mutants killed by failing test cases there are two situations: (1) the mutant's status changes from fail to pass, i.e., the program is repaired; (2) the mutant still fails, but with an output different from the original program's. Both situations help reveal the fault location: in the repair case, the mutation location indicates the fault location; in the second case, the mutant whose output differs from the original is behaviorally closer to the faulty program. On the other hand, mutants killed by passing test cases are more likely mutations of correct statements whose outputs then differ from the original. Moreover, Moon et al. [9] found that mutants generated from faulty statements are more likely to pass under failing test cases, while mutants generated from correct statements are more likely to fail under passing test cases.

As the formulas in Table 1 show, a mutant's suspiciousness is positively correlated with a_kf and negatively correlated with a_kp. We measure the influence of a mutant m on fault localization by the difference between a_kf and a_kp over the test suite, i.e., its contribution C:

    C(T, P, m) = a_kf − a_kp,        (1)

where T is the test suite and P the program under test; a higher C(T, P, m) indicates a higher contribution. Similarly, the Average Contribution (AC) of a set of mutants M is

    AC(T, P, M) = (1/|M|) Σ_{m∈M} C(T, P, m),        (2)

where |M| is the number of mutants in the set.

Researchers have studied the relation between FOMs and HOMs. Gopinath et al. [21] showed that many HOMs differ semantically from the FOMs they are composed of. Langdon et al. [22], however, showed that the number of HOMs killed by test cases is higher than the number of killed FOMs, so HOMs are easier to detect than FOMs.

In early work, Offutt et al. [15] pointed out that whether killing n-order mutants means we can detect complex faults remains to be determined. To answer this question, we are the first to run a controlled experiment comparing FOMs and HOMs on localizing program faults.

2.2 Research motivation

In most previous studies, MBFL techniques rest on the single-fault assumption [5,9,13,23]. However, empirical studies show [24] that a single program failure is often triggered by multiple faults in the system. DiGiuseppe and Jones found that multiple faults have a negative impact on localization precision [25]. In addition, Offutt's results suggest that whether killing n-order mutants can detect complex faults remains open [15]. Debroy and Wong [26] found that their proposed strategy could not repair multiple faults in the same program, because they considered only FOMs. In other words, using HOMs to localize or repair multiple faults in a program is a potentially feasible approach. This paper therefore empirically studies the localization effectiveness of HOMs on single-fault and multiple-fault programs, and analyzes the relation between multiple faults and HOMs.

(1) HOM classification

According to the mutation locations in the program, we divide HOMs into three classes. To ease understanding, consider a program p with two faults (f1 and f2). First, let HOM_f1 denote the set of HOMs that mutate the faulty statement f1, HOM_f1 ⊆ HOMs, and HOM_f2 the set of HOMs that mutate the faulty statement f2, HOM_f2 ⊆ HOMs. Then, as shown in Fig. 1, HOMs are divided into the following three classes:

Class A: accurate HOMs, i.e., HOMs that mutate both faulty statements f1 and f2 (HOM_f1 ∩ HOM_f2).
Class B: partially accurate HOMs, i.e., HOMs that mutate only f1 or only f2 ((HOM_f1 \ HOM_f2) ∪ (HOM_f2 \ HOM_f1)).
Class C: inaccurate HOMs, i.e., HOMs that mutate only other statements (HOMs \ (HOM_f1 ∪ HOM_f2)).

[Fig. 1. Classification of HOMs: a Venn diagram of HOM_f1 and HOM_f2 within the set of HOMs, with regions A, B and C.]

These three classes reflect different ways of generating HOMs. We conjecture that the three classes behave differently in fault localization, and based on this conjecture we conduct a large-scale empirical study to analyze their characteristics.
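Before the worked example, here is a minimal sketch of the per-mutant computation just described: the Ochiai suspiciousness of Table 1 and the contribution C of Eq. (1). The kill matrix below is a hypothetical toy, not the data of Fig. 2.

    import math

    def ochiai(akf: int, anf: int, akp: int) -> float:
        """Sus(m) = a_kf / sqrt((a_kf + a_nf) * (a_kf + a_kp))."""
        denom = math.sqrt((akf + anf) * (akf + akp))
        return akf / denom if denom > 0 else 0.0

    def contribution(akf: int, akp: int) -> int:
        """C(T, P, m) = a_kf - a_kp, as defined in Eq. (1)."""
        return akf - akp

    # kills[m] = tests that killed m; 't1' and 't2' are the failing tests.
    failing = {"t1", "t2"}
    passing = {"t3", "t4", "t5", "t6"}
    kills = {"m1": {"t1", "t2", "t3"}, "m2": {"t1"}, "m3": {"t3", "t4"}}

    for m, killed_by in kills.items():
        akf = len(killed_by & failing)
        akp = len(killed_by & passing)
        anf = len(failing) - akf
        print(m, round(ochiai(akf, anf, akp), 3), contribution(akf, akp))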
2.2 Research motivation

Most previous MBFL techniques rest on the single-fault assumption [5,9,13,23]. Empirical studies show, however, that a single program failure is often triggered by several faults in the system [24], and Digiuseppe and Jones found that multiple faults have a negative effect on localization accuracy [25]. In addition, Offutt's results suggest that whether killing an n-order mutant can detect complex faults remains to be determined [15]. In the work of Debroy and Wong [26], the proposed strategy could not repair multiple faults in the same program, because only FOMs were considered. In other words, using HOMs to localize or repair multiple defects in a program is a potentially feasible approach. This paper therefore studies empirically the localization effectiveness of HOMs on single-fault and multiple-fault programs, and analyzes the relation between multiple faults and HOMs.

(1) Classification of HOMs

According to the mutation locations in the program, we divide HOMs into three classes. To ease understanding, take as an example a program p with two faults, f1 and f2. Let HOM_f1 ⊆ HOMs be the set of HOMs that mutate the faulty statement f1, and HOM_f2 ⊆ HOMs the set of HOMs that mutate the faulty statement f2. As shown in Figure 1, the HOMs then divide into:

Class A: accurate HOMs, i.e., HOMs that mutate both faulty statements f1 and f2 (HOM_f1 ∩ HOM_f2).
Class B: partially accurate HOMs, i.e., HOMs that mutate only f1 or only f2 ((HOM_f1 \ HOM_f2) ∪ (HOM_f2 \ HOM_f1)).
Class C: inaccurate HOMs, i.e., HOMs generated by mutating other statements (HOMs \ (HOM_f1 ∪ HOM_f2)).

Figure 1. Classification of HOMs

These three classes reflect different ways of generating HOMs. We conjecture that they behave differently in fault localization and, based on this conjecture, we conduct a large-scale empirical study of their characteristics.

(2) An MBFL example

To further illustrate our motivation, the example in Figure 2 shows how FOMs and HOMs are used in MBFL. In Figure 2, from left to right, the first column is the source code of the program under test, in which statements s4 and s11 are faulty. The second column is the set of mutants generated for each statement. The third column, divided into six parts, gives the execution information of six test cases on the mutants, where "1" means the test case kills the mutant and "0" means it does not. The fourth and fifth columns give the computed mutant suspiciousness and statement suspiciousness, and the last column gives the rank of each statement. In this example, every mutant suspiciousness is computed with the Ochiai formula. Figure 2 presents two sets of results, one for FOMs and one for HOMs.

Figure 2. MBFL example

Fault localization with FOMs. Suppose the MBFL technique generates only two mutants for each statement covered by failed test cases, giving 14 FOMs in total for this program (column "FOMs"). MBFL first computes the suspiciousness of the FOMs from the kill information of the test cases (column "FOMs suspiciousness"). Next, among the mutants of one statement, the maximum suspiciousness is taken as the suspiciousness of that statement. Finally, as shown in the "rank" column, MBFL ranks both faulty statements s4 and s11 in third place.

Fault localization with HOMs. We first construct HOMs from two FOMs of different statements, finally generating 14 HOMs covering the three classes (column "HOMs"); their computed suspiciousness is given in the column "HOMs suspiciousness". Then, for fairness, the suspiciousness of a statement is taken to be the mean suspiciousness of the HOMs related to it. Taking statement s1 as an example, three HOMs are related to s1 (HOM6, HOM11 and HOM13), with suspiciousness 1.00, 0.41 and 1.00 respectively, so the suspiciousness of s1 is Sus(s1) = (1.00 + 0.41 + 1.00)/3 = 0.80. The statement suspiciousness computed with HOMs is given in the column "statement suspiciousness"; in the end, HOMs rank the faulty statements s4 and s11 in third and second place respectively.

From this example we can see that FOMs rank the two faulty statements within the top five, whereas HOMs rank them within the top three, showing that HOMs localize the faults better in this example. Furthermore, in higher-order mutation fault localization the three classes of mutants contribute differently to localization: computing the average contribution of each class with Eq. (2) shows that the average contribution of the accurate HOMs equals that of the partially accurate HOMs, while the inaccurate HOMs have the lowest average contribution. To the best of our knowledge, this paper is the first to study the localization effectiveness of FOMs and HOMs on multiple-fault programs; going further, we study the localization effectiveness of the three classes of HOMs and analyze their differences.

3 Experimental Design

This section discusses the subject programs and the experimental workflow used to answer the research questions. Figure 3 shows the design: subject programs, fault seeding by mutation, FOMs and HOMs, multiple-fault localization, and evaluation of localization effectiveness. Each part of the design is introduced in turn below.

Figure 3. Experimental design workflow

3.1 Subject programs

We select five programs from the Subject Infrastructure Repository (SIR) [27], a benchmark library commonly used in fault localization: printtokens2, schedule2, totinfo, tcas and sed. All are open-source C programs; the first four come from the Siemens suite, and sed is a large program with real faults. The faulty versions and test cases used in the experiments can be downloaded from the SIR repository. These programs are widely used in higher-order mutation testing [12,28-30] and frequently appear in fault localization research [18,19,23,26], so we consider the conclusions drawn from this data set to have a degree of generality.

Table 2 lists the details of the benchmark programs: name, total number of versions (and the number used in our experiments), average lines of code, number of test cases, and the numbers of FOMs and HOMs, where the first figure is the total number generated for the program and the figure in parentheses is the number actually executed in the experiments. We select 63 single-fault versions in total; some versions are excluded because no valid mutant can be generated for the faulty statement, so that no test case detects the fault of that version, or because execution raised exceptions and complete execution information could not be collected.

Table 2. Benchmark programs and mutant information
Program       Versions (used)  LOC  Test cases  FOMs generated (used)  HOMs generated (used)
printtokens2  9 (6)            508  4115        21 864 (11 548)        131 184 (23 892)
schedule2     10 (4)           307  2710        8451 (4698)            50 706 (8038)
totinfo       23 (18)          406  1052        39 351 (22 476)        236 106 (37 254)
tcas          41 (30)          173  1608        27 245 (19 062)        163 470 (62 991)
sed           9 (5)            712  5360        64 307 (32 302)        385 842 (47 551)

3.2 Generating the mutants

To study the behaviour of FOMs and HOMs on single-fault and multiple-fault programs, the experiments first generate FOMs and HOMs. In this step we collect the program statements covered by failed test cases and seed faults into them with the mutation operators, generating the corresponding mutants. Table 3 lists the ten classic C-language mutation operators proposed by Agrawal et al. [16].

Table 3. Classic C mutation operators
Operator  Description                               Example
CRCR      required constant replacement             a = b + *p → a = 0 + *p
OAAN      arithmetic operator mutation              a + b → a * b
OAAA      arithmetic assignment operator mutation   a += b → a -= b
OCNG      predicate negation                        if (a) → if (!a)
OIDO      increment/decrement mutation              ++a → a++
OLLN      logical operator mutation                 a && b → a || b
OLNG      logical negation                          a && b → !(a && b)
ORRN      relational operator mutation              a < b → a <= b
OBBA      bitwise assignment operator mutation      a &= b → a |= b
OBBN      bitwise operator mutation                 a & b → a | b

To generate FOMs, we apply all mutation operators to every statement covered by failed test cases, mutating one statement at a time, which finally yields 161 218 FOMs; the figures in parentheses in the FOMs column of Table 2 give the number of FOMs actually used for each program.

For higher-order mutants, existing studies of higher-order mutation testing differ in the orders considered: some focus on low orders (2 to 4) [15,28,29,31], others on higher orders (2 to 15) [12,32-35]. This paper is the first to apply higher-order mutants to multiple-fault localization; since the number of faults in a real program is unknown, and drawing on this earlier work, we generate mutants of orders 2 to 7 to simulate multiple-fault situations. On this basis, to explore further the relation between the mutation locations of HOMs and fault localization, we partition the mutants by mutation location and find, both analytically and experimentally, that mutants generated at faulty statements (accurate and partially accurate HOMs) localize faults better. On the other hand, considering the large execution overhead of MBFL, we limit the number of HOMs by generating the same number of HOMs of each order as there are FOMs: if 1000 FOMs are generated, then 1000 second-order and 1000 third-order mutants are generated, and so on, for a final total of 6000 HOMs. In our experiments, HOMs are constructed from the first-order mutation operators: we first randomly select k statements covered by failed test cases and then, for each selected statement, randomly choose one of its applicable first-order mutation operators, finally producing one k-order mutant. In total the experiments generate 967 308 HOMs; the numbers actually used are shown in parentheses in the HOMs column of Table 2.
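The random construction of a k-order mutant just described can be sketched schematically; in the following R fragment the statement identifiers, the coverage set and the operator list are invented placeholders (an actual tool would rewrite the program text at each selected statement):

make_hom <- function(cov_f, operators, k) {
  stmts <- sample(cov_f, k)                      # k distinct statements covered by failed tests
  ops   <- sample(operators, k, replace = TRUE)  # one first-order operator per statement
  data.frame(statement = stmts, operator = ops)  # recipe for one k-order mutant
}
set.seed(1)
cov_f     <- c(3, 7, 12, 18, 25, 31)        # hypothetical failed-test coverage
operators <- c("ORRN", "OAAN", "OCNG")      # a subset of the operators in Table 3
make_hom(cov_f, operators, k = 3)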
3.3 Constructing the multiple-fault scenario

To build the multiple-fault localization scenario used in the experiments, we obtain multiple-fault programs by randomly combining the original single-fault programs from the SIR repository. Each multiple-fault program contains from 2 to 5 faults, and 100 multiple-fault versions are generated in total. Finally, the mutants generated for the multiple-fault programs are executed and the test results are collected for the effectiveness analysis.

3.4 Evaluating MBFL effectiveness

To evaluate the localization effectiveness of FOMs and HOMs in MBFL, we use three measures commonly adopted by researchers [36-39].

(1) EXAM. EXAM [36,37] is one of the most widely used measures in fault localization; it evaluates the proportion of program entities a developer must examine before reaching the actual fault location, so smaller EXAM values indicate better localization [36,37]. EXAM is defined as

EXAM = rank(s_f) / N × 100%,   (6)

where the numerator is the rank of the faulty statement and the denominator N is the total number of program statements to be examined. The rank is computed as

rank = [(i + 1) + (i + j)] / 2,   (7)

where i is the number of correct statements whose suspiciousness is greater than that of the faulty statement and j is the number of correct statements whose suspiciousness equals it. To be closer to a realistic localization scenario, we take the average of the (i+1)-th and (i+j)-th positions as the rank of the faulty statement.

(2) Top-N. Top-N evaluates how many real faults are located within the first N candidate program elements of the ranking [38]. Kochhar et al. found that 73.58% of developers inspect only the top 5 ranked program elements, and that almost all developers regard inspecting the top 10 as an acceptable upper limit [39]. Following previous studies [36,38], we set N to 1, 3 and 5. When several statements share the same suspiciousness we likewise average their ranks (as in Eq. (7)). Larger Top-N indicates a better localization technique.

(3) MAP. MAP (Mean Average Precision) is a measure from information retrieval for the quality of a ranking; it is the mean of the average precision over all faults [40]. The Average Precision (AP) is

AP = Σ_{i=1}^{M} [P(i) × pos(i)] / (number of faulty statements),   (8)

where i is the rank of a program statement, M is the total number of statements in the ranking, pos(i) is a Boolean function with pos(i) = 1 if the i-th statement is faulty and pos(i) = 0 otherwise, and P(i) is the localization precision at each rank i. MAP is the mean of the AP values over the set of faults; larger MAP indicates a better localization technique.
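As an illustration of the tie handling in Eqs. (6) and (7), the following R sketch (with invented suspiciousness values) computes the tie-aware rank and the EXAM score of a single faulty statement, following Eq. (7) literally:

exam_score <- function(sus, faulty) {
  i <- sum(sus > sus[faulty])          # correct statements strictly more suspicious
  j <- sum(sus == sus[faulty]) - 1     # correct statements tied with the faulty one
  rank <- ((i + 1) + (i + j)) / 2      # Eq. (7): average of positions i+1 and i+j
  100 * rank / length(sus)             # Eq. (6): percentage of statements examined
}
exam_score(c(0.9, 0.8, 0.8, 0.1), faulty = 3)   # i = 1, j = 1: rank 2, EXAM = 50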
HOMs在3错误和4错误程序上有更大的Top-N值, 并且在一些阶数的HOMs下, 计算的MAP 均值都要高于FOMs.2021 年 第 30 卷 第 5 期计算机系统应用表4 FOMs和不同阶HOMs的TOP-N值比较指标程序FOMs2-HOMs3-HOMs4-HOMs5-HOMs6-HOMs7-HOMs HOMs Top1351351351351351351351352错误printtokens2688688688688688688688688 schedule2466466466466466466466466 totinfo356356356356356356356356 tcas267245256256256256266234 sed455455455455455455455455 Total1930321928301929311929311929311929311930311927293错误printtokens2589589589589589688688589 schedule2344344244344344344344344 totinfo456456456456456456456456 tcas555455556556555555556456 sed311322222311311311311312 Total2023251924261824272023262023252023252023261923274错误printtokens2344344344344344344344344 schedule2345345345345344345345345 totinfo235235235235235235235235 tcas166112246122134144155235 sed344344344344344344344344 Total1221241216201319241217201218221219221220231318235错误printtokens2233233233233233233233233 schedule2345345345345344345345345 totinfo235235235235235235235235 tcas345224029002145356347223 sed333333333333333333333333 Total131721121520101525101318111721131823131723121519表5 FOMs和不同阶HOMs的平均MAP值比较类型程序FOMs2-HOMs3-HOMs4-HOMs5-HOMs6-HOMs7-HOMs HOMs2错误printtokens20.00220.00220.00220.00220.00220.00220.00220.0022 schedule20.00240.00240.00240.00240.00240.00240.00240.0024 totinfo0.00180.00180.00180.00180.00180.00180.00180.0018 tcas0.00420.00310.00330.00380.00360.00380.00390.0041 sed0.00010.00010.00010.00010.00010.00010.00010.0001 Average0.00210.00190.00200.00210.00200.00210.00210.00213错误printtokens20.00150.00150.00150.00150.00150.00150.00150.0015 schedule20.00100.00100.00100.00100.00100.00100.00100.0010 totinfo0.00150.00150.00150.00150.00150.00150.00150.0015 tcas0.00410.00320.00320.00410.00410.00410.00400.0040 sed0.00000.00000.00000.00000.00000.00000.00000.0000 Average0.00160.00140.00140.00160.00160.00160.00160.00164错误printtokens20.00070.00070.00070.00070.00070.00070.00070.0007 schedule20.00100.00100.00100.00100.00100.00100.00100.0010 totinfo0.00070.00070.00070.00070.00070.00070.00070.0007 tcas0.00140.00120.00100.00130.00100.00110.00110.0013 sed0.00000.00000.00000.00000.00000.00000.00000.0000 Average0.00080.00070.00070.00070.00070.00070.00070.00075错误printtokens20.00040.00040.00040.00040.00040.00040.00040.0004 schedule20.00070.00070.00070.00070.00070.00070.00070.0007 totinfo0.00060.00060.00060.00060.00060.00060.00060.0006 tcas0.00170.00130.00140.00080.00050.00110.00170.0016 sed0.00000.00000.00000.00000.00000.00000.00000.0000 Average0.00070.00060.00060.00050.00040.00060.00070.0007计算机系统应用2021 年 第 30 卷 第 5 期由于FOMs只使用一次变异算子生成而HOMs使用多次变异算子生成. 因此在对同一个程序变异生成等量变异体时, HOMs有更大的概率变异到错误语句,从而增大变异体被杀死的概率, 相应a kf值也会更高,则变异体怀疑度也越高, 最终计算的语句怀疑度也越高, 其定位效果也更优(如图4(a), 图4(c), 图4(e); 表4“3错误”行, “5错误”行; 表5 “3错误”行, “5错误”行).但如果HOMs中更多变异体是对正确语句变异生成的, 那么相应的a kp值会更高, 计算的语句怀疑度值也更高, 错误定位效果将更差. (如图4(d); 表4 “2错误”行, “4错误”行; 表5 “2错误”行, “4错误”行). 综合比较可以得出HOMs在一定程度上能提高多错误定位的效果.为探究RQ2, 我们首先收集多错误程序所有版本下3类HOMs的EXAM值, 然后分别计算Top-N和MAP指标. 为了便于展示, 我们将3类HOMs分别表示为“Accurate”(准确HOMs), “Part-accurate”(部分准确HOMs)和“Inaccurate”(不准确HOMs). 图5表示MBFL使用FOMs和3类HOMs在不同程序上所有版本的错误检查比例. 从图5(a)–图5(c)中可以看出Accurate HOMs与FOMs有相近的表现, 并且Accurate HOMs的检测效果优于FOMs. 而在tcas和sed (图5(d)、图5(e))程序上, Part-accurate的检测效果更好, 检查更少量的代码而找到更多的错误. 
同时在所有程序上, Inaccurate 的检测效果最差.0102030405060 Percentage of code examined (%)(a) Printtokens205101520253035 Percentage of code examined (%)(b) Schedule20102030405060 Percentage of code examined (%)(c) Totinfo0510152025Percentage of code examined (%)(d) Tcas024681012Percentage of code examined (%)(e) SedFOMs Accurate Part-accurate Inaccurate HOMs图5 FOMs 与不同类 HOMs 代码检查比例比较从Top-N指标来看, 准确HOMs比FOMs和另外两类变异体能将更多错误排在前1, 3, 5名. 表6中显示, 在2错误程序上, 准确变异体与FOMs能够排列相同数量的错误在Top-1, Top-3和Top-5, 而在其他错误程序版本中, 准确HOMs的Top-N指标均为最大. 同时可以发现, 部分准确HOMs在4错误和5错误程序上, 有更高的Top-5值. 然而不准确HOMs 的表现最差.从MAP指标来看, 准确HOMs的表现同样优于FOMs, 部分准确和不准确HOMs. 表7中准确HOMs与FOMs在2错误程序下有相同MAP平均值, 而在3, 4, 5错误程序下, 准确HOMs仍然比另外两类变异体的定位效果好, 其MAP平均值分别为0.0017, 0.0009, 和0.0008.综上所述我们可以发现, 准确HOMs的错误定位精度高于FOMs、部分准确HOMs和不准确HOMs.在一些情况下, 部分准确HOMs有更好的定位效果, 但普遍情况下不准确HOMs的表现都很差.2021 年 第 30 卷 第 5 期计算机系统应用。

Chapter 2 Model Checking


Suppose E(Y) = a. Consider a transformation T = h(Y) such that V(T) = σ² is constant. A Taylor expansion at the mean gives

h(y) ≈ h(a) + (y − a) h′(a),

so that {h(y) − h(a)}² ≈ (y − a)² {h′(a)}² and, taking expectations (with E(h(Y)) ≈ h(a)),

V(T) ≈ V(Y) {h′(a)}².

If the variance is proportional to the mean, V(Y) = a, then σ² = a {h′(a)}², i.e.

dh/dy |_{y=a} = σ/√a,

so h(·) = k √y.

Example 3 (s.d. proportional to the mean of the independent variable): by the same argument, if √V(Y) is proportional to a, then h′(a) ∝ 1/a, so h(·) = ln y, whose variance is approximately independent of a.
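A quick numerical check of the first result in R (variance proportional to the mean, as for Poisson counts), showing that the square-root transform makes the variance roughly constant; the sample size and means below are arbitrary:

set.seed(1)
means <- c(2, 10, 50)
sapply(means, function(a) {
  y <- rpois(1e5, a)                            # V(Y) = a grows with the mean
  c(var_y = var(y), var_sqrt_y = var(sqrt(y)))  # var(sqrt(Y)) stays near 1/4
})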
Thus the F test is:

S.V.           SS                                d.f.    MS
Lack of fit    Σ_{i=1}^{k} n_i (ȳ_i − ŷ_i)²      k − 2   Σ n_i (ȳ_i − ŷ_i)² / (k − 2)
Pure error     Σ_i Σ_j (y_ij − ȳ_i)²             n − k   Σ Σ (y_ij − ȳ_i)² / (n − k)
Total          SSE                               n − 2

The 2 d.f. lost in the total are due to estimating α̂ and β̂; F = MS(lack of fit)/MS(pure error) on (k − 2, n − k) degrees of freedom.

Note: Full model = Alternative hypothesis H1; Reduced model = Null hypothesis H0.
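In practice this test can be run by comparing the nested models directly; a minimal R sketch (with invented data: k = 5 x-levels, n_i = 3 replicates each) is:

set.seed(1)
x <- rep(c(1, 2, 3, 4, 5), each = 3)        # k = 5 levels, n = 15 observations
y <- 2 + 0.5 * x + rnorm(length(x), sd = 0.3)
reduced <- lm(y ~ x)                        # H0: straight line (n - 2 residual d.f.)
full    <- lm(y ~ factor(x))                # H1: one mean per level (n - k residual d.f.)
anova(reduced, full)                        # F = MS(lack of fit)/MS(pure error), d.f. (k-2, n-k)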
Lack of Fit Example

The empirical distribution function in R


1. Introduction. R is a widely used statistical analysis environment and programming language with rich data analysis and visualization features, including the empirical distribution function.

2. What is the empirical distribution function? The Empirical Distribution Function (EDF) is a cumulative distribution function that describes sample data.

It is used to estimate the shape and parameters of the underlying population distribution, and it can be used to test hypotheses, compare different samples, and so on.

3. How is the empirical distribution function computed? The steps are as follows: 1. Sort the sample data in increasing order.

2. For each data point, compute the proportion of the total sample size accounted for by its position in the sorted order.

3. Plot these proportions on the vertical axis against the corresponding data points on the horizontal axis.

4. How is the empirical distribution function implemented in R? In R, the ecdf() function computes the empirical distribution function.

It returns an object that can then be drawn with the plot() function.

1. Define the sample data. Suppose we have a vector x containing 100 random numbers:

x <- rnorm(100)

2. Compute and plot the empirical distribution function. Use ecdf() and plot() to compute and draw it:

f <- ecdf(x)
plot(f, main = "Empirical Distribution Function", xlab = "x", ylab = "F(x)")

Here main and xlab set the plot title and the x-axis label, and ylab sets the y-axis label.

3. Add a theoretical distribution function. To compare the empirical distribution function with a theoretical one, the theoretical curve can be overlaid, for example with curve():

curve(pnorm, add = TRUE, lty = 2)

Here pnorm is the cumulative distribution function of the standard normal distribution, and lty sets the line type (dashed).

5. Worked example. The following example demonstrates how to compute and plot an empirical distribution function in R.

1. Import the data. Suppose we have a csv file "height.csv" containing 100 height measurements, with an index in the first column and the height (in cm) in the second:

1,165
2,173
3,180
4,175
5,168
6,170
7,178
8,163
9,172
10,166
...

Import it into R with read.csv():

data <- read.csv("height.csv", header = FALSE)
x <- data[, 2]

The header parameter specifies whether the file contains a header row (default TRUE).
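Putting the pieces of this example together, a minimal end-to-end script might look as follows (it assumes a height.csv laid out as above; the overlay uses a moment-matched normal rather than the standard normal, since heights are not standardized):

data <- read.csv("height.csv", header = FALSE)   # two columns: index, height (cm)
x <- data[, 2]
f <- ecdf(x)                                     # empirical distribution function
plot(f, main = "Empirical Distribution Function",
     xlab = "height (cm)", ylab = "F(x)")
m <- mean(x); s <- sd(x)                         # moment-matched normal for comparison
curve(pnorm(x, mean = m, sd = s), add = TRUE, lty = 2)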

Model validation and predictive capability


Model validation and predictive capability for the thermal challenge problem

Scott Ferson a,*, William L. Oberkampf b, Lev Ginzburg c

a Applied Biomathematics, 100 North Country Road, Setauket, NY 11733, USA
b Validation and Uncertainty Estimation Department, Mail Stop 0828, Department 1544, Post Office Box 5800, Sandia National Laboratories, Albuquerque, NM 87185-0828, USA
c Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY 11794, USA

Received 8 June 2007; received in revised form 5 July 2007; accepted 5 July 2007; available online 23 December 2007. Comput. Methods Appl. Mech. Engrg. 197 (2008) 2408-2430.

Abstract

We address the thermal problem posed at the Sandia Validation Challenge Workshop. Unlike traditional approaches that confound calibration with validation and prediction, our approach strictly distinguishes these activities, and produces a quantitative measure of model-form uncertainty in the face of available data. We introduce a general validation metric that can be used to characterize the disagreement between the quantitative predictions from a model and relevant empirical data when either or both predictions and data are expressed as probability distributions. By considering entire distributions, this approach generalizes traditional approaches to validation that focus only on the mean behaviors of predictions and observations. The proposed metric has several desirable properties that should make it practically useful in engineering, including objectiveness and robustness, retaining the units of the data themselves, and generalizing the deterministic difference. The metric can be used to assess the overall performance of a model against all the experimental observations in the validation domain and it can be extrapolated to express predictive capability of the model under conditions for which direct experimental observations are not available. We apply the metric and the scheme for characterizing predictive capability to the thermal problem.
(c) 2007 Elsevier B.V. All rights reserved.

Keywords: Validation; Predictive capability; Thermal challenge problem; Area metric

1. Introduction

Although there is increasing consistency in the formal definition of the term 'validation' [1,2], there is still wide disagreement about the precise steps involved in the validation process. In this paper, we use the terminology and the following three steps identified by [3]:

(i) Validation assessment: assessment of model accuracy by comparison of predictions against experimental data,
(ii) Model extrapolation: extrapolation (or possibly interpolation) of the model to the intended use, and
(iii) Adequacy decision: determination of whether the model is adequate for the intended use.

Validation is often contrasted with calibration or updating, which is the fitting of a model to empirical observations to maximize the match between predictions and observations, usually by changing model parameters but sometimes by introducing or omitting model components. Because we want to distinguish assiduously between these two activities, we will assume that a model is fixed while a validation is undertaken. As soon as the model is changed in structure or parameter values in any way to account for data, the activity is no longer validation in our strict sense.

The problem of validation might initially seem to be a relatively straightforward one: all one needs to do is take the prediction the modeler gives us and compare it to the new observation the empiricist gives us and see whether they match.
Clearly, the answer could be that they match perfectly, that the model is a little off, or that the model is way off. But this simplicity is immediately corrupted by the complexity of the real world. In fact, analysts must have strategies to address several questions:

1. What if the prediction is a probability distribution rather than a point value?
2. What if there is experiment-to-experiment variability in the measured data?
3. What if there are statistical trends in the experimental data?
4. What if there are multiple predictions about different outputs to be assessed?
5. What if the predictions from the model are extremely expensive to compute?
6. What if the data were collected under conditions other than the intended application?
7. What if there is non-negligible experimental measurement uncertainty in the data?

This paper addresses the first six of these issues through the example of the Thermal Challenge Problem [4,5], which is the first of three problems considered at the Sandia Validation Challenge Workshop [6] in Albuquerque, New Mexico, over 21-23 May 2006. We interpreted the challenge problem to involve two goals. The first goal is to measure in some objective way the conformance of predictions from the model with the experimental measurements. The second goal is to use this measure to characterize the reliability, i.e., estimate the uncertainty, of other predictions made by the model.

Validation is not about checking whether a model is right or wrong per se. Indeed, we believe, as George Box famously asserted [7], that all models of any physical reality are wrong, at least in the narrow sense that they can never be perfect. Validation is about assessing the accuracy of the model and assessing whether a model is good enough for some intended purpose. For a deterministic model, validation can be a fairly straightforward affair. The model makes a point estimate for its prediction about some quantity. This prediction would be compared against one or more measurements about that quantity and the difference(s) would be understood as a measure of how accurate the model was. A model could be consistently inaccurate and yet close enough for its purpose. Likewise, even a highly accurate model might not be good enough if it is needed for some delicate, high-consequence decisions.

Two pervasive issues complicate these comparisons in general validation problems. The first is that, today, most serious simulation models generate entire distributions rather than point estimates as their predictions of system response quantities. These distributions often characterize the stochastic variability of these quantities and perhaps the epistemic uncertainty about these quantities. In many cases, experimental observations are small collections of numbers, although they can be abundant enough to be conveniently characterized as distributions.

The second issue is that the data available for validation may not be directly relevant to the prediction of interest. In particular, we might be able to collect data under conditions that are similar, but not identical, to those for which a prediction is desired. This necessitates some sort of extrapolation that will allow us to characterize the validity of a model for prediction even when no immediately relevant data are available to compare against this prediction. In a strong sense, any forecast about future or general predictive capability is necessarily an extrapolation of some kind, even if directly relevant data are available, because such a forecast is always about hypothetical data that might be observed, rather than merely a summary of the consistency with past observations.
Section 2 gives a synopsis of the thermal challenge problem. Section 3 considers a risk-analytic approach to the problem that translates the variability observed in the material characterization data into exceedance probabilities of the system response quantity temperature. The analysis is refined and extended in Section 4 to account for a subtle trend in the material characterization data that causes values of one of the material properties to depend partially on the material's temperature. Section 5 suggests a general validation metric and procedures for its use that can be applied when a model's predictions take the form of probability distributions. Section 6 applies this metric to the challenge problem to assess the overall performances of the models developed in Sections 3 and 4 relative to all of the experimental observations over the validation domain. Section 6 also considers the extrapolation from the measured performances of the model in the face of the individual data points to express the predictive capability of the model under conditions of regulatory interest. Section 7 answers several specific questions posed as part of the validation challenge and Section 8 offers conclusions and outlines further research needs.

2. Sandia validation challenge problem

The formulation and numerical details of the thermal challenge problem are given in [4] and will not be reiterated here except in briefest outline. The problem consists of a mathematical model, three sets of experimental data which differ in size ('low', 'medium' and 'high'), and a regulatory requirement. The mathematical model is of the temperature under heating of a device constructed of some material and has the form

T(x,t) = T_i + (qL/k) [ (k/ρC_p) t / L² + 1/3 − x/L + (1/2)(x/L)² − (2/π²) Σ_{n=1}^{6} (1/n²) exp(−n²π² (k/ρC_p) t / L²) cos(nπ x/L) ]   for t > 0,
T(x,t) = T_i   for t = 0,   (1)

where T is temperature, x is location within the material, t is time since the onset of heating, T_i is the initial ambient temperature, q is the heat flux, L is the thickness of the material, and k and ρC_p are properties of the material.
The regulatory requirement is

Prob(900 °C < T | x = 0 cm, t = 1000 s, T_i = 25 °C, q = 3500 W/m², L = 1.90 cm) < 0.01,   (2)

given material properties of the device associated with a particular manufacturing process. The challenge is to use available empirical data to estimate whether the regulatory requirement in Expression (2) is satisfied. The empirical data include material characterization data consisting of several measurements of the material properties k and ρC_p, and 'ensemble' and 'accreditation' data consisting of experimental observations of temperature for various values of x, t, q and L defining a validation domain, none of which were collected at the conditions of regulatory interest.

In this paper we use probability distributions to characterize both the observed variability in data and the forecasted variability of predictions. The predicted temperature would be a distribution because, although the values of x, t, T_i, q, and L are prescribed for us in the statement of the problem, the values of the material properties k and ρC_p are only known by sample data, which ought to be characterized by probability distributions. The regulatory requirement prescribes a critical temperature of 900 °C and a critical probability of 1%. If the predicted distribution for temperature ventures anywhere into the region of the temperature-probability plane where values larger than 900 °C are more probable than 0.01, then we know that we are out of compliance with the regulatory condition.

In principle, it would have been possible for those who designed the challenge problem to simply present us with the predictions from the model (expressed as distributions) and the observed data (expressed as collections of numbers), rather than giving us the model and asking us to generate the predictions. In a sense, the activity of producing predictions is not part of validation per se, but rather part of modeling. However, meeting the challenge in this paper requires delving into the modeling process to some extent.
We recognize, moreover, that modeling and validation are usually intricately intertwined in practice, just as validation and calibration are often intertwined. Although we intend to distinguish between the generation of predictions and the validation of those predictions against data (the focus of this paper is on the latter), for the purposes of the challenge, we don the modeler's hat where necessary and implement (and even slightly modify) the model to create predictions from it to be compared to observations.

The validation challenge problem was purposefully designed with some model weaknesses and inconsistencies in order that it realistically reflect the common situations analysts encounter. Thus, there are assumptions that seem arguable or incorrect, including assertions that certain parameters are constants, that interacting variables are mutually independent, and that there is no uncertainty in measured data. We believe that a comprehensive validation study demands that such assertions be critically examined, dealt with in a realistic manner, and perhaps rejected in favor of more realistic assumptions.

3. Risk-analytic approach to stochastic variation

The materials characterization data described in the challenge problem suggest there is variability in both the thermal conductivity k and heat capacity ρC_p. The step functions in Fig. 1 show the cumulative empirical distribution functions for the values for ρC_p and k from the 'medium' materials characterization data. These observed patterns likely understate the true variabilities in these parameters because they represent only 20 observations for each of them. If we had observed different sets of values, it is likely we would have seen slightly different patterns, and we may well have seen some values above or below the observed ranges from the 20 values. To model this possibility of more extreme values than were seen among the limited samples, risk analysts commonly fit a distribution to data to model the variability of the underlying population. We used normal distributions for this purpose, configured so that they had the same mean and standard deviation as the data themselves, according to the method of matching moments [8].
The fitted normal distributions are shown in Fig. 1 as the smooth cumulative distributions. We do not consider this fitting of distributions to be model calibration because the distributions are not selected with reference to the system response quantity of temperature. Instead, the distributions merely summarize the material characterization data which are not otherwise used in the validation process.

We also fitted normal distributions to the 'low' and 'high' data sets as well. The computed moments for the three data sets for each parameter are given in the following table:

                                          Low (n = 6)   Medium (n = 20)   High (n = 30)
Thermal conductivity k, W/m °C
  Arithmetic mean                         0.06002       0.06187           0.06284
  Standard deviation                      0.01077       0.00923           0.00991
Volumetric heat capacity ρC_p, J/m³ °C
  Arithmetic mean                         405,500       402,250           393,900
  Standard deviation                      42,065        39,511            36,251

To express the uncertainty about the temperature prediction that arises from the stochastic variability of k and ρC_p observed in the characterization of the material properties, we need to project these normal distributions through the temperature response model in Expression (1). The projection can be effected with a straightforward Monte Carlo simulation [8,9]. Thermal conductivity and volumetric heat capacity are the only distributional variables in the simulation; the rest are constants:

Variable                   Symbol   Value(s)                    Units
Thermal conductivity       k        Normal(0.06187, 0.00923)    W/m °C
Volumetric heat capacity   ρC_p     Normal(402250, 39511)       J/m³ °C
Heat flux                  q        3500                        W/m²
Thickness                  L        0.019                       m
Initial temperature        T_i      25                          °C
Location                   x        0                           m
Time                       t        1000                        s

There could also be variability in some of the other inputs too, especially in variables such as thickness L and heat flux q. We neglect these variabilities here only because the challenge problem instructs us to.

Whenever there is stochasticity in more than a single variable, there is a possibility that correlation or dependence between the variables may influence any arithmetic functions of those variables [10,11]. We looked for evidence of such dependence in the paired thermal conductivity and heat capacity data collected during materials characterization. Fig. 2 shows the scattergram of these two variables for the 'medium' data set, which reveals no apparent trends or evidence of statistical dependence. The Pearson correlation coefficient between these twenty points is 0.0595, which is not remotely statistically significant (p > 0.5, df = 18). Because there are no physical reasons to expect correlations or other dependencies between these variables, at least over the variability ranges considered here, it would seem reasonable to assume that these quantities are statistically independent of one another. Plotting and correlation analysis for the 'high' and 'low' data sets gave qualitatively similar results.

We used 10,000 replications in the Monte Carlo simulation, although many fewer replications could have sufficed if the model had been computationally intensive. We implemented the simulation in the R programming language [12]. When the stochasticity of k and ρC_p is projected through the heating model, it produces a distribution for the surface temperature after 1000 s. The output distribution produced is displayed as the complementary cumulative distribution in Fig. 3 so that the ordinate gives the probability that the random variable temperature exceeds the value given on the abscissa. We use the complementary display for temperature because it makes it easier to visualize the probabilities of large values, which are the focus of concern in the regulatory statement. This result suggests that the probability of the temperature being larger than 900 °C is 0.22, much larger than the target of 0.01. This result argues strongly that the system is out of compliance with the regulatory requirement in Expression (2). This is obvious in the depiction in Fig. 3 because the distribution makes a deep incursion into the upper, right quadrant of the temperature-probability plane. The mean of the predicted temperature distribution is about 850 °C, which is similar to the value 838 °C from a deterministic calculation based on mean values for k and ρC_p. Although the mean is well below the temperature threshold of 900 °C, the stochasticity observed in the material properties implies that the regulatory threshold is regularly exceeded much more often than 1% of the time.
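The simulation just described is easy to reproduce; the following R sketch (our own minimal re-implementation for illustration, not the authors' code, with Eq. (1) truncated at n = 6 and the 'medium' parameter values from the table above) propagates the two normal distributions through the heating model:

temp <- function(k, rcp, q = 3500, L = 0.019, Ti = 25, x = 0, t = 1000) {
  a <- k / rcp                                  # thermal diffusivity k/(rho*Cp)
  n <- 1:6
  s <- sum(exp(-n^2 * pi^2 * a * t / L^2) * cos(n * pi * x / L) / n^2)
  Ti + (q * L / k) * (a * t / L^2 + 1/3 - x/L + 0.5 * (x/L)^2 - (2/pi^2) * s)
}
set.seed(42)
k   <- rnorm(10000, mean = 0.06187, sd = 0.00923)   # W/m C
rcp <- rnorm(10000, mean = 402250,  sd = 39511)     # J/m3 C
T1000 <- mapply(temp, k, rcp)                       # surface temperature at t = 1000 s
mean(T1000 > 900)                                   # exceedance probability, about 0.22

With the mean parameter values, temp(0.06187, 402250) returns about 838 °C, matching the deterministic calculation quoted in the text.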
3.1. Robustness of the prediction

The robustness of this distributional result can be estimated by exploring its sensitivity to the choice of the distributional parameters for the k and ρC_p inputs, the normal shapes used to model their variation, and the assumptions about intervariable dependence. (An analyst might also want to explore the effects of variability or measurement error in the other inputs of the heating model, and the effect of uncertainty about the structure of the heating equation itself, but the challenge problem instructs us to ignore these uncertainties.) Because there were few observations collected in the material characterization study (6, 20 and 30 per variable in the 'low', 'medium' and 'high' data sets respectively), the estimates of the means and variances used to parameterize the normal distributions are associated with an appreciable degree of sampling uncertainty. Of course the effect of such sampling uncertainty on the estimate of the final temperature distribution is for the most part symmetric. That is, it creates bands of uncertainty on either side of the central distributional estimate displayed in Fig. 3. Therefore, if we were to enlarge our assessment by accounting for the sampling uncertainties about the k and ρC_p inputs, the result would be that the exceedance probability estimate of 0.22 would expand to an interval around that value. From a decision maker's point of view, this could only make the outcome seem worse for the hypothesis that the system is in compliance with the regulatory requirement in Expression (2), which specifies the probability not be larger than 0.01, because the uncertainty assessment reveals it might be even larger than 0.22.

An assessment accounting for uncertainty about the input distribution shapes has similar import. The model for k and ρC_p that would produce the smallest dispersion in the final temperature distribution, while still being consistent with the observed variability for k and ρC_p, would use the empirical distribution functions for these two inputs rather than parametric distributions such as normals. The empirical distribution functions simply summarize the observed data actually seen in the material characterization data. They are nonparametric estimates of the distributions because they do not require the analyst to select any parameters to specify the distribution. The model based on these distributions would be at least arguably reasonable, although it is likely to understate the chances of extreme values of the inputs. The result of the simulation based on this resampling strategy is very similar to the result shown in Fig. 3; in fact the final temperature distributions are largely indistinguishable from one another given the line thickness used in the display. The probability of exceeding 900 °C increases slightly to about 0.25, but the distribution tails contract so that, for instance, 1050 °C is the largest possible temperature (corresponding to the smallest observed values for k and ρC_p).

The only way that we might be in compliance with the regulatory requirement is if our risk analysis has overestimated the variation in the resultant temperature distribution. One assumption possibly worth reconsideration is whether k and ρC_p really have stationary distributions that are independent of temperature changes in the material. If there is some dependence of these material properties on temperature, it might be the case that our model predictions are in error. We consider this possibility in the next section.

4. Temperature dependence

In the description of the mathematical model for heat conduction in Expression (1), the volumetric heat capacity ρC_p and thermal conductivity k are assumed to be independent of temperature T. It is reasonable to ask whether this assumption is tenable given the available materials characterization data. Fig. 4 is the scattergram for the 'medium' data set for heat capacity as a function of temperature. Linear and quadratic regression analysis reveal no statistically significant trend among these points.
The pictures are qualitatively the same for the 'low' and 'high' data sets in that no trend or other stochastic dependence is evident. Thus, the experimental data for heat capacity support the assumption in the mathematical model.

The materials characterization data for thermal conductivity, on the other hand, seem to be strongly related to temperature. Fig. 5 shows the scattergram of thermal conductivity as a function of temperature for the 'medium' data together with a regression line fitted by the least squares criterion. This regression is statistically significant (p < 0.001, df = 18). The resulting model for this trend and residual scatter is

k ~ a + bT + normal(0, σ) = 0.0505 + 2.25 × 10⁻⁵ T + normal(0, 0.0047),

where a = 0.0505 and b = 2.25 × 10⁻⁵ are the fitted regression coefficients for the intercept and slope, the normal function denotes a normal distribution with the given mean and standard deviation, and σ = 0.0047 is the residual standard error from the regression analysis.
tighter distribution than would be suggested by the shallow middle distribution which ignores the temperature dependence.The condition-ing of thermal conductivity on temperature reduces the variability compared to the traditional risk analysis of the previous section,although there remains an appreciable amount of variability in k.We can combine this regression model relating k and T with the challenge problem’s heating function Expression (1),albeit in an ad hoc fashion because one of the assump-tions underlying Expression(1)is that k is independent of T.This combination creates a system of two equations that can be solved iteratively.In this iterative approach,we start from the(unconditional)distribution of k observed in the materials characterization data,and compute from it the resulting distribution of T(just as we did in Section3).S.Ferson et al./Comput.Methods Appl.Mech.Engrg.197(2008)2408–24302413We then project this distribution of T through the regres-sion function to compute another distribution of k.That is,we compute the new distribution k$0.0505+2.25Â10À5T+normal(0,0.0047),where T is the just computed distribution of temperatures and the normal function gen-erates normally distributed random deviates centered at zero with standard deviation0.0047(which are indepen-dent of the temperatures).The resulting distribution of k’s conditional on temperature is then used to reseed the process,which is repeated until the distribution of T con-verges.We found that only two or three iterations were suf-ficient for convergence.When we undertake this iterative solution,we are clearly trespassing into the domain of the physics modeler.We said we did not want to do this because it confounds the activities of validation with modeling,but the challenge seems designed to invite us to do it,so we are taking the bait because the model seems clearly deficient as originally stated.We offer the result,not as our belief that it is the best approach,but merely as an alternative model which we will subject to a validation process.Note that we are certainly not asserting,nor do we necessarily believe,that this regression model is the best or even an appropriate way to account for the dependence of k and T.Accounting for the dependence is delicate;one might prefer to send the issue back to the modeler who could devise a new model with a solution to an altered differential equation.A phys-ics modeler who knows about their interactions should really be making pronouncements about such things.We are just exploring this as an exercise to see whether it can reduce the variability of the resulting surface temperatures and improve thefit to data(which we will consider in Sec-tion6).We revisited the Monte Carlo simulation described in the previous section with the ad hoc model for the depen-dence of k and T.The resulting predicted distribution of surface temperatures after1000s is shown as the solid dis-tribution in Fig.7.This distribution of temperatures has a smaller range and reduced variance compared to that seen in the traditional risk analysis of the previous section,but it is still in violation of the regulatory requirement in Expres-sion(2)because it also encroaches into the danger zone of the upper,right quadrant.However,it does so much less than the result from the risk analysis conducted in Section 3with the unmodified model.The estimated probability that temperature exceeds the critical value of900°C is about0.05,onlyfive times larger than that specified in the regulatory requirement.5.Validation 
assessmentThe topic of model validation has received considerable attention recently[13–16].Three main approaches are com-monly used for validation in engineering settings,including hypothesis testing[17–21,cf.10],Bayesian methods[22–28,cf.29,cf.30],and mean-based comparisons[31,32,3]. All three approaches have drawbacks.For instance,the purpose of hypothesis testing is to identify statements for which there is compelling evidence of truth.This is a rather different goal than that of validation,which is focused on assessing the quantitative accuracy of a model.The Bayes-ian approach to validation is primarily interested in evalu-ating the probability(i.e.,the belief)that the model is correct.Yet,to our minds,this is not the proper focus of validation.We are not concerned about anyone’s belief that the model is right;we are interested in objectively mea-suring the conformance of predictions against data that have not previously been used to develop or calibrate the model. The main limitation of approaches based on comparing means or other summary statistics is that it considers only the central tendencies or other specific behaviors of data and predictions and not their entire distributions.When predictions are distributions,they can contain a consider-able amount of detail and it is not always easy to know what is important,nor to be sure that a comparison of means will be sufficiently informative for a particular application.In this section,we introduce the notion of comparing probabilistic quantities,describe the desirable properties that a validation metric for making such comparisons should have,and suggest a particular one that has these properties.Section6applies this metric to the thermal chal-lenge problem.paring values that vary randomlyThere are a variety of standard ways to compare ran-dom variables in probability theory(/Random_variables#Equivalence_of_random_vari-ables).If random numbers X and Y always have the same value,the random variables are said to be‘‘equal”,or sometimes‘‘surely equal”.A much weaker notion of equal-ity is often useful.If we can only say that the expectation (i.e.,the average)of the absolute values of the differences between X and Y is zero,the random variables are said2414S.Ferson et al./Comput.Methods Appl.Mech.Engrg.197(2008)2408–2430。

distrr: tools for estimating and managing empirical distributions


Package 'distrr', October 13, 2022

Title: Estimate and Manage Empirical Distributions
Description: Tools to estimate and manage empirical distributions, which should work with survey data. One of the main features is the possibility to create data cubes of estimated statistics, that include all the combinations of the variables of interest (see for example functions dcc5() and dcc6()).
Depends: R (>= 3.1.2)
Imports: magrittr (>= 1.5), dplyr (>= 0.7.4), rlang, utils, stats, tidyr (>= 0.7.0)
Version: 0.0.6
License: GPL-2
URL: https://gibonet.github.io/distrr, https:///gibonet/distrr
Maintainer: Sandro Petrillo Burri <******************>
RoxygenNote: 7.1.1
Encoding: UTF-8
LazyData: yes
Collate: 'distrr.R' 'invented_data.R' 'compat-lazyeval.R' 'dplyr_new_wrappers.R' 'gibutils.R' 'jointfuns.R' 'Fhat_conditional.R' 'distrr_funs.R' 'dcc_new.R' 'wq_df.R' 'dcc6_fixed.R'
NeedsCompilation: no
Author: Sandro Petrillo Burri [aut, cre]
Repository: CRAN
Date/Publication: 2020-07-14 09:20:07 UTC

R topics documented: combn_char, dcc, dcc6, distrr, extract_unique, Fhat_conditional_, Fhat_df_, invented_wages, jointfun_, only_joint, wq

combn_char: Generate all combinations of the elements of a character vector

Usage:
combn_char(x)

Arguments:
x: a character vector

Value: a nested list. A list whose elements are lists containing the character vectors with the combinations of their elements.

Examples:
combn_char(c("gender", "sector"))
combn_char(c("gender", "sector", "education"))

dcc: Data cube creation (dcc)

Usage:
dcc(.data, .variables, .fun = jointfun_, ...)
dcc2(.data, .variables, .fun = jointfun_, order_type = extract_unique2, ...)
dcc5(.data, .variables, .fun = jointfun_, .total = "Totale", order_type = extract_unique4, .all = TRUE, ...)

Arguments:
.data: data frame to be processed
.variables: variables to split data frame by, as a character vector (c("var1", "var2"))
.fun: function to apply to each piece (default: jointfun_)
...: additional functions passed to .fun
order_type: a function like extract_unique or extract_unique2
.total: character string with the name to give to the subset of data that includes all the observations of a variable (default: "Totale")
.all: logical, indicating if functions have to be evaluated on the complete dataset

Value: a data cube, with a column for each categorical variable used, and a row for each combination of all the categorical variables' modalities. In addition to all the modalities, each variable will also have a "Total" possibility, which includes all the others. The data cube will contain marginal, conditional and joint empirical distributions...

Examples:
data("invented_wages")
str(invented_wages)
tmp <- dcc(.data = invented_wages, .variables = c("gender", "sector"), .fun = jointfun_)
tmp
str(tmp)
tmp2 <- dcc2(.data = invented_wages, .variables = c("gender", "education"), .fun = jointfun_, order_type = extract_unique2)
tmp2
str(tmp2)
# dcc5 works like dcc2, but has an additional optional argument, .total,
# that can be added to give a name to the groups that include all the
# observations of a variable.
tmp5 <- dcc5(.data = invented_wages, .variables = c("gender", "education"), .fun = jointfun_, .total = "TOTAL", order_type = extract_unique2)
tmp5

dcc6: Data cube creation

Usage:
dcc6(.data, .variables, .funs_list = list(n = ~dplyr::n()), .total = "Totale", order_type = extract_unique4, .all = TRUE)
dcc6_fixed(.data, .variables, .funs_list = list(n = ~dplyr::n()), .total = "Totale", order_type = extract_unique5, .all = TRUE, fixed_variable = NULL)
Arguments:
.data: data frame to be processed
.variables: variables to split data frame by, as a character vector (c("var1", "var2"))
.funs_list: a list of function calls in the form of right-hand formula
.total: character string with the name to give to the subset of data that includes all the observations of a variable (default: "Totale")
order_type: a function like extract_unique or extract_unique2
.all: logical, indicating if functions have to be evaluated on the complete dataset
fixed_variable: name of the variable for which you do not want to estimate the total

Examples:
dcc6(invented_wages, .variables = c("gender", "sector"), .funs_list = list(n = ~dplyr::n()), .all = TRUE)
dcc6(invented_wages, .variables = c("gender", "sector"), .funs_list = list(n = ~dplyr::n()), .all = FALSE)

distrr: Estimate and manage empirical distributions

Tools to estimate and manage empirical distributions, which should work with survey data. One of the main features is the possibility to create data cubes of estimated statistics, that include all the combinations of the variables of interest (see for example functions dcc5() and dcc6()).

extract_unique: Functions to be used in conjunction with the 'dcc' family

Usage:
extract_unique(df)
extract_unique2(df)
extract_unique3(df)
extract_unique4(df)
extract_unique5(df)

Arguments:
df: a data frame

Value: a list whose elements are character vectors of the unique values of each column

Examples:
data("invented_wages")
tmp <- extract_unique(df = invented_wages[, c("gender", "sector")])
tmp
str(tmp)

Fhat_conditional_: Weighted empirical cumulative distribution function (ecdf), conditional on one or more variables

Usage:
Fhat_conditional_(.data, .variables, x, weights)

Arguments:
.data: a data frame
.variables: a character vector with one or more column names
x: character vector of length one, with the name of the numeric column whose conditional ecdf has to be estimated
weights: character vector of length one, indicating the name of the positive numeric column of weights, which will be used in the estimation of the conditional ecdf

Value: a data frame, with the variables used to condition, the x variable, and columns wsum (aggregated sum of weights, based on unique values of x) and Fhat (the estimated conditional Fhat). In addition to data frame, the object will be of classes grouped_df, tbl_df and tbl (from package dplyr).

Examples:
Fhat_conditional_(mtcars, .variables = c("vs", "am"), x = "mpg", weights = "cyl")

Fhat_df_: Weighted empirical cumulative distribution function (data frame version)

Usage:
Fhat_df_(.data, x, weights)

Arguments:
.data: a data frame
x: name of the numeric column (as character)
weights: name of the weight column (as character)

Value: a data frame with columns: x, wcum and Fhat

Examples:
data(invented_wages)
Fhat_df_(invented_wages, "wage", "sample_weights")

invented_wages: Invented dataset with wages of men and women

Description: This dataset has been completely invented, in order to do some examples with the package.
Usage:
invented_wages

Format: A data frame (tibble) with 1000 rows and 5 variables:
gender: gender of the worker (men or women)
sector: economic sector where the worker is employed (secondary or tertiary)
education: educational level of the worker (I, II or III)
wage: monthly wage of the worker (in an invented currency)
sample_weights: sampling weights

Details: Every row of the dataset consists in a fake/invented individual worker. For every individual there is his/her gender, the economic sector in which he/she works, his/her level of education and his/her wage. Furthermore there is a column with the sampling weights.

jointfun_: A minimal function which counts the number of observations by groups in a data frame

Usage:
jointfun_(.data, .variables, ...)

Arguments:
.data: data frame to be processed
.variables: variables to split data frame by, as a character vector (c("var1", "var2"))
...: additional function calls to be applied on the .data

Value: a data frame, with a column for each categorical variable used, and a row for each combination of all the categorical variables' modalities.

Examples:
data("invented_wages")
tmp <- jointfun_(.data = invented_wages, .variables = c("gender", "sector"))
tmp
str(tmp)

only_joint: Keeps only the joint distribution (removes '.total')

Description: Removes all the rows where variables have value .total.

Usage:
only_joint(.cube, .total = "Totale", .variables = NULL)

Arguments:
.cube: a data cube with 'Totale' modalities
.total: modality to eliminate (filter out) (default: "Totale")
.variables: a character vector with the names of the categorical variables

Value: a subset of the data cube with only the combinations of all variables' modalities, without the "margins".

Examples:
data(invented_wages)
str(invented_wages)
vars <- c("gender", "education")
tmp <- dcc2(.data = invented_wages, .variables = vars, .fun = jointfun_, order_type = extract_unique2)
tmp
str(tmp)
only_joint(tmp, .variables = vars)
# Compare dimensions (number of groups)
dim(tmp)
dim(only_joint(tmp, .variables = vars))

wq: Empirical weighted quantile

Usage:
wq(x, weights, probs = c(0.5))

Arguments:
x: A numeric vector
weights: A vector of (positive) sample weights
probs: a numeric vector with the desired quantile levels (default 0.5, the median)

Value: The weighted quantile (a numeric vector)

References: Ferrez, J., Graf, M. (2007). Enquête suisse sur la structure des salaires. Programmes R pour l'intervalle de confiance de la médiane. (Rapport de méthodes No. 338-0045). Neuchâtel: Office fédéral de statistique.

Examples:
wq(x = rnorm(100), weights = runif(100))


arXiv:astro-ph/0607605v2 10 Nov 2006
Mon. Not. R. Astron. Soc. 000, 000-000 (0000)   Printed 5 February 2008   (MN LaTeX style file v2.2)

Empirical distributions of galactic λ spin parameters from the SDSS

X. Hernandez 1, Changbom Park 2, B. Cervantes-Sodi 1 and Yun-Young Choi 2
1 Instituto de Astronomía, Universidad Nacional Autónoma de México, A.P. 70-264, México 04510 D.F., México
2 Korea Institute for Advanced Study, Dongdaemun-gu, Seoul 130-722, Korea

5 February 2008

ABSTRACT
Using simple dimensional arguments for both spiral and elliptical galaxies, we present formulas to derive an estimate of the halo spin parameter λ for any real galaxy, in terms of common observational parameters. This allows a rough estimate of λ, which we apply to a large volume limited sample of galaxies taken from the SDSS data base. The large numbers involved (11,597) allow the derivation of reliable λ distributions, as signal adds up significantly in spite of the errors in the inferences for particular galaxies. We find that if the observed distribution of λ is modeled with a log-normal function, as often done for this distribution in dark matter halos that appear in cosmological simulations, we obtain parameters λ0 = 0.04 ± 0.005 and σλ = 0.51 ± 0.05, interestingly consistent with values derived from simulations. For spirals, we find a good correlation between empirical values of λ and visually assigned Hubble types, highlighting the potential of this physical parameter as an objective classification tool.

Key words: galaxies: statistics - galaxies: formation - galaxies: fundamental parameters - galaxies: structure - cosmology: observations - cosmology: miscellaneous

1 INTRODUCTION

Perhaps one of the most useful tools for describing the nature of observed galactic populations is the luminosity function. The relative numbers of galaxies of different luminosities offer a quantitative description of the result of the structure formation scenario of the universe, nowadays available for inspection even as a function of redshift. The luminosity function, or mass function of galaxies, once a modeling of star formation histories and dust is included to yield mass to light ratios, is indeed one of the principal constraints against which cosmological simulations and structure formation theories in general are tested. Often star formation efficiencies and histories are calibrated through a fitting of predicted halo mass functions and observed luminosity functions, e.g. Davis et al. (1985) and references thereof.

Galaxies however, have many more properties than just their mass; most notably, galactic classification schemes center on sorting galaxies as ellipticals, lenticulars and spirals, subdividing the latter again into types, typically along the Hubble sequence. In spite of the many well known virtues of this classification scheme, its somewhat subjective, qualitative and relative nature has made it difficult to use in comparisons against cosmological simulations, where the 'type' of a modeled galaxy is rather difficult to assess, in terms of Hubble's classification scheme.

Inspired by varied theoretical studies which invariably identify the λ spin parameter of a host halo as the principal physical parameter in determining the morphological and visual characteristics of a spiral galaxy, which are then subjectively integrated into the qualitative assignment of 'type', in Hernandez & Cervantes-Sodi (2006) two of us derived a simple estimate of λ for any observed spiral. There it was shown that the scalings of the derived λ against various type determining properties such as colour, disk thickness and bulge to disk ratio, are comparable to the corresponding scalings of these parameters against Hubble type, using a large sample of nearby spirals.

It is interesting that a generic prediction of cosmological N-body simulations over the past 2 decades has been the functional form and parameters which define the predicted distribution of λ for dark matter halos, e.g. Bullock et al. (2001). Nevertheless, it has not been easy to test this prediction directly, as λ is not a straightforward observable feature of a real galaxy. Not only is a measurable λ distribution relevant as a test of the general structure formation scheme, but also as a further way of independently constraining cosmological parameters, which to a certain extent alter its details, e.g. Gardner (2001).
thickness and bulge to disk ratio,are comparable to the corresponding scalings of these parameters against Hubble type,using a large sample of nearby spirals.It is interesting that a generic prediction of cosmolog-ical N-body simulations over the past 2decades has been the functional form and parameters which define the pre-dicted distribution of λfor dark matter halos e.g.Bullock et al.(2001).Nevertheless,it has not been easy to test this prediction directly,as λis not a straight forward observ-able feature of a real galaxy.Not only is a measurable λdistribution relevant as a test of the general structure for-mation scheme,but also as a further way of independently2X.Hernandez,Changbom Park,B.Cervantes-Sodi and Yun-Young Choi constraining cosmological parameters,which to a certain ex-tent alter its details e.g.Gardner(2001).Having the approximate estimates of Hernandez&Cervantes-Sodi(2006)(henceforth HC06),here we set outto measure empirical distributions of galacticλspin param-eters.Firstly,given the approximate nature of these esti-mates,the only way of having signal adding up to some-thing significant is through the use of very large samples.Also,if we want a meaningful comparison against cosmolog-ical models,use of a volume limited sample is crucial.Thesetwo constraints drive us inevitably to the Sloan Digital SkySurvey(SDSS).Two of us have worked extensively with thisdatabase,and recently constructed an interesting morphol-ogy segregation scheme of galaxies in the SDSS in Park&Choi(2005),henceforth PC05.Here we use a volume limitedsample from the SDSS having galaxies with redshifts in theinterval0.025<z<0.055,corresponding to distances of be-tween75h−1Mpc and162.57h−1Mpc,assuming a WMAPcosmology ofΩM=0.27,ΩΛ=0.73and h=0.71,which wekeep throughout.The following section includes a review of the derivationofλspin parameters for spirals of HS06,and gives our resultsfor the spirals in our SDSS sample.The correspondence be-tweenλfor spirals,galactic type through visual inspectionand the colour and colour gradient morphology segregationscheme of PC05is also made in Section2,where we alsopresent empirical distributions of galacticλspin parametersfor spirals.These are wellfitted by a log-normal function,and give parameters in accordance with recent cosmologicalsimulations.In section3we extend the dimensional estimateofλfor ellipticals,and construct and analyze the empiricaldistributions ofλfor the complete sample.Our results aresummarized in4.2λDISTRIBUTIONS FOR SPIRAL GALAXIESWe are concerned with the determination of the dimension-less angular momentum parameter of a galactic halo:λ=L|E|1/24πG V dGM H.(6)Finally,we can replace M H for M d/F,and introduce adisk Tully-Fisher relation:M d=A T F V3.5d(7)into eq(6)to yield:λ= 21/2F GV3/2d.(8)The existence of a general baryonic Tully-Fisher rela-tion between the total baryonic component and V d of thetype used here,rather than a specific Tully-Fisher relationinvolving total magnitude in any particular band,is sup-ported by recent studies,in general in agreement with the3.5exponent we assume(e.g.Gurovich et al.2004or Kregelet al.2005whofind3.33±0.37).All that remains is tochoose numeric values for F and A T F,to obtain an esti-mate of the spin parameter of an observed galaxy in termsof structural parameters readily accessible to observations,R d and V d.Taking the Milky Way as a representative example,we can use a total baryonic mass of1×1011M⊙andM H=2.5×1012M⊙(where the estimate of the bary-onic mass includes all such 
First, we note that by construction, what we are estimating is not strictly λ, but what has been defined as λ′, the equivalent λ for a singular truncated isothermal halo. The relation between λ and λ′ is slightly a function of halo structure; for example, for NFW profiles with a concentration of 10, one gets λ′ = 0.9λ, e.g. Mo et al. (1998). This slight difference is smaller than the error introduced through the dispersion in the Tully-Fisher relation used, and will not be considered further; in any case, it is more rigorously λ′ and not λ that we will be talking about. Also, it is clear that eq. (9) is at best a first order approximation, and not a precise evaluation of λ for a real galaxy.

However, in HC06 it was shown that this λ parameter shows a one-to-one correlation, with small dispersion, when compared to the input λ of detailed galactic formation models from two distinct cosmological groups. Also, the scalings seen against colour, disk thickness and bulge-to-disk ratios, for a sample of nearby spirals from Courteau (1996, 1997), de Grijs (1998) and Kregel et al. (2002), similar to what is seen against Hubble type, highlight the use of this parameter as a physical classification scheme. This is not surprising, as it is λ as defined by eq. (1) that has been repeatedly identified in analytic and numerical studies of galactic formation as the principal determinant of galactic type, e.g. Fall & Efstathiou (1980), Flores et al. (1993), Firmani et al. (1996), Dalcanton et al. (1997), Zhang & Wyse (2000), Silk (2001), Kregel et al. (2005).

For example, the ratio of disk scale height to disk scale length, h/R_d, is one of the type-defining characteristics of a galaxy which, it is easy to show, will scale with λ. Starting from Toomre's stability criterion,

Q(r) = κ(r) σ(r) / (πGΣ(r)),   (10)

and the scale height of an isothermal self-gravitating disk,

h = σ^2 / (2πGΣ),   (11)

we use this relation for h to replace the gas velocity dispersion appearing in equation (10) by a combination of h and the surface density. The dependence on Σ is replaced by one on M_d and R_d through the disk profile of eq. (2). Replacing κ(r) by 2^{1/2} V_d/r for a flat rotation curve, and evaluating at r = R_d, gives

h/R_d = Q^2 G M_d / (8e V_d^2 R_d),   (12)

where e is the base of natural logarithms, arising from evaluating Σ at r = R_d. Introducing eq. (6) and F = M_d/M_H, this becomes

h/R_d = 2^{1/2} Q^2 F / (8e λ),   (13)

so that at fixed Q and F, disks of larger λ are thinner, as observed along the Hubble sequence.
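A short numerical check of this scaling, under our reconstruction of eq. (13) above and the fiducial choices Q = 1 (marginal stability) and F = 1/25, might read:

```python
import math

def relative_thickness(lam, Q=1.0, F=1.0 / 25.0):
    """h/R_d from eq. (13): sqrt(2) Q^2 F / (8 e lambda)."""
    return math.sqrt(2.0) * Q**2 * F / (8.0 * math.e * lam)

# For the Milky Way value lambda ~ 0.0234 this gives h/R_d ~ 0.11:
# larger spin parameters correspond to thinner disks.
print(relative_thickness(0.0234))
```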
Figure 1. Some of the galaxies in our sample on a colour, colour-gradient plane, with photos of galaxies representative of those found in each region, as visually classified by Park & Choi (2005). Red dots are early type galaxies and the others are late types.

Figure (1), taken from PC05, shows some of the galaxies in our sample, with photos of galaxies representative of those found in each region, as visually classified by PC05. Notice the well defined cluster of ellipticals centered on Δ(g−i) = −0.04, u−r = 2.82. Figure (2) shows all our spirals in a colour, colour-gradient plane, with the shading giving the average values of λ within each shaded square. We see that disks with high values of λ are found in the lower left hand area, whilst disks having low values of λ populate the right and upper regions. It is interesting that PC05 find precisely the same segregation pattern for late and early spirals, respectively, reinforcing the results of HC06 that the quantitative and objective λ parameter constructed in eq. (9) is a good physical classification parameter for spirals, reproducing the broad trends of the classical subjective, qualitative Hubble sequence.

Figure 2. Spiral galaxies in our sample on a colour, colour-gradient plane; the shading shows the average values of λ in each shaded region, using eq. (9). Regions of high and low values of λ closely correspond to regions populated by late type and early type spirals, respectively, as visually classified by Park & Choi (2005).

Next we use our collection of values of λ from our pruned sample of face-on spirals to construct a histogram, shown in figure (3) by the broken curve. The shape of this histogram suggests a log-normal distribution:

P(λ_0, σ_λ; λ) dλ = (1 / ((2π)^{1/2} σ_λ)) exp[ −ln^2(λ/λ_0) / (2σ_λ^2) ] dλ/λ.   (14)

The moments of the above distribution are analytical, which allows one to express

M_1(λ_0, σ_λ) = ∫_0^∞ P(λ) λ dλ / ∫_0^∞ P(λ) dλ = λ_0 exp(σ_λ^2/2),   (15)

M_2(λ_0, σ_λ) = ∫_0^∞ P(λ) λ^2 dλ / ∫_0^∞ P(λ) dλ = λ_0^2 exp(2σ_λ^2),   (16)

from which one solves:

λ_0 = M_1^2 / M_2^{1/2},   σ_λ = [ ln(M_2 / M_1^2) ]^{1/2}.   (17)

Equating the 1st and 2nd order moments of our empirical distribution to M_1 and M_2, we find the best fit log-normal curve to our distribution for spiral galaxies. This is given by the smooth curve in figure (3), where the empirical histogram has been rescaled to a unit integral. The parameters of the best fit distribution are: λ_0 = 0.0585, σ_λ = 0.446. Both the form of this distribution and the particular values for the parameters we find are in good agreement with results from cosmological simulations, e.g. Shaw et al. (2006).

Figure 3. Distribution of values of λ for 7,753 spirals using eq. (9), binned into 150 intervals, broken curve. Best log-normal fit to the data, having parameters λ_0 = 0.0585 and σ_λ = 0.446, solid curve.

From figure (3) it is apparent that the log-normal functional form is a good description of our inferred data; a more formal confirmation is provided by a K-S test. We construct the cumulative distribution functions for both the log-normal P(λ) and our data out to λ = 0.2, shown in figure (4) by the solid and dotted curves, respectively. These yield a maximum difference D = 0.029, with a significance given by:

Q_KS(X) = 2 Σ_{j=1}^{∞} (−1)^{j−1} exp(−2 j^2 X^2),   (18)

where X = N^{1/2} D, with N = 7,746 the total number of points in our sample, giving X = 2.55 and a significance level of Q_KS = 0.000004. We hence see that our inferred distribution of galactic λ's is consistent with having come from the best fit log-normal P(λ), to a very high degree.

Figure 4. Cumulative distribution functions for both our spiral galaxies data and the best fit log-normal P(λ), dotted and solid curves, respectively.

We note the important precedent of Syer et al. (1999) who, using similar estimates of λ for spiral galaxies on a sample from Lauberts (1982), find a distribution of values of λ well fitted by a log-normal function, with parameters λ_0 = 0.05 and σ_λ = 0.36. The small differences with our inference might come from their lack of a large volume limited unbiased sample, their lack of an attempt at modeling λ for ellipticals (which further limited the scope of comparisons which could then be made against cosmological N-body models), or their use of a disk formation model calibrated directly from cosmological N-body simulations.
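The moment-matching fit of eq. (17) and the K-S significance of eq. (18) are easily scripted; in this minimal sketch, `lam` stands for a hypothetical NumPy array holding the per-galaxy λ estimates of eq. (9).

```python
import numpy as np

def lognormal_from_moments(lam):
    """Best-fit log-normal parameters (lambda_0, sigma_lambda) via eq. (17)."""
    M1 = np.mean(lam)      # first raw moment of the empirical distribution
    M2 = np.mean(lam**2)   # second raw moment
    return M1**2 / np.sqrt(M2), np.sqrt(np.log(M2 / M1**2))

def ks_significance(D, N, jmax=100):
    """Q_KS of eq. (18), with X = sqrt(N) D; the series converges rapidly."""
    X = np.sqrt(N) * D
    j = np.arange(1, jmax + 1)
    return 2.0 * np.sum((-1.0) ** (j - 1) * np.exp(-2.0 * j**2 * X**2))

# The numbers quoted in the text: D = 0.029 and N = 7,746 give X ~ 2.55
# and Q_KS ~ 4e-6.
print(ks_significance(0.029, 7746))
```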
From equation (8) we see that our inferred λ's are proportional to the constants (F/A_TF); since the validity and constancy of the T-F relation is well established, we can think of our inferred λ's as λ(25F), where F is the baryon fraction of a spiral system. We have shown that under the simplest assumption of a constant F = 1/25, a log-normal distribution results from the data; clearly, any other constant F will also yield a log-normal distribution, with the same σ_λ and a mean of 25Fλ_0. Also, notice that most cosmological simulations yield distributions of λ which are independent of mass (e.g. Bullock et al. 2001), which if true ensures that our inferences will not be skewed by the velocity cuts applied to the spirals. A baryon fraction which is a strong function of R_d or V_d, however, could easily result in a best fit P(λ) no longer looking like a log-normal distribution. In the absence of any definitive handle on F or its possible variations, we limit ourselves to concluding that a constant F implies a log-normal P(λ) for the spirals from the SDSS we studied, while a varying F(R_d, V_d) could doubtlessly imply something else.

3 λ DISTRIBUTIONS FOR THE COMPLETE SAMPLE

Before attempting an empirical total distribution of λ's, we must derive an estimate of this parameter for an elliptical galaxy. We propose a model equivalent to the one used for spirals, including only two components: a baryonic matter distribution with a Hernquist density profile, and a dark matter halo having a singular, truncated isothermal profile. In order to calculate the energy of the galaxy, we assume that the total potential is dominated by the energy of the virialized dark matter halo. In principle, we expect that the halos where elliptical galaxies are found are no different from those of disk galaxies (White & Rees 1978, Kashlinsky 1982). In the case of a disk galaxy, the halo density profile has the form 4πGρ(r) = (V_c/r)^2, with potential energy −V_c^2 M_H. The velocity V_c is the circular velocity of an element in centrifugal equilibrium with the halo. In an elliptical, nothing is actually moving at that velocity, but it can still be used to characterize the halo. The dependence on velocity can be changed to one on mass using again a baryonic Tully-Fisher relation (Gurovich et al. 2004), M_b = A_TF V_c^{3.5}, thinking of the dynamics of a halo hosting an elliptical galaxy as being the same as those of the halos of disk galaxies. Finally, we define the baryonic mass fraction as F = M_b/M_H to express the mass of the halo in terms of the baryonic mass. Validating the above assumptions, we note the interesting new results of Treu et al. (2006) and Koopmans et al. (2006) who find, through dynamical studies of velocity dispersion in ellipticals and direct inferences on the halos of the studied ellipticals through lensing measurements, 1) the presence of extensive dark halos, and 2) that these halos have equivalent velocity dispersions equal to those of their galaxies, the dynamical equivalent of a flat rotation curve for ellipticals.
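Numerically, the inversion of the baryonic Tully-Fisher relation that assigns an equivalent circular velocity and halo mass to an elliptical can be sketched as follows, reusing the Section 2 calibrations A_TF = 633 M_⊙ (km/s)^{-3.5} and F = 1/25:

```python
def halo_from_baryons(M_b_msun, A_TF=633.0, F=1.0 / 25.0):
    """Equivalent circular velocity (km/s) and halo mass (Msun) of the host halo."""
    V_c = (M_b_msun / A_TF) ** (2.0 / 7.0)   # inverts M_b = A_TF V_c^3.5
    M_H = M_b_msun / F                       # baryonic mass fraction F = M_b/M_H
    return V_c, M_H

# Consistency check against the Milky Way numbers of Section 2:
print(halo_from_baryons(1.0e11))  # ~ (220 km/s, 2.5e12 Msun)
```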
For the angular momentum, if we assume that the specific angular momenta of dark matter and baryons are equal, l_b = l_H, we can obtain the angular momentum of the entire configuration using only the directly observable component. It is important to remember that the baryonic component is susceptible to dissipation, while the dark matter is not; this will affect the above assumption and our λ estimates, making the value obtained a lower limit. In order to obtain the angular momentum of the baryonic distribution, we need to know the rotation velocity, information not always available due to technical observational difficulties and the fact that in many elliptical galaxies the rotational velocity is at best of the same order as the velocity dispersion (Pinkney et al. 2003, Sarzi et al. 2006). This forces us to determine the angular momentum of the system using some other observable parameter. In many works (e.g. Binney 1978, Franx et al. 1989) it is shown that the ellipticity of a galaxy is related to its rotation velocity, and more precisely to its angular momentum. By dimensional analysis we expect the specific angular momentum of the galaxy to be proportional to (G M_b a)^{1/2}, with a the semi-major axis, and to the eccentricity e of the galaxy in the i-band, as shown by Gott & Thuan (1976) and in Marchant & Shapiro (1977). Introducing a proportionality constant C related to the details of the matter distribution, the specific angular momentum is given by:

l_b = C e (G M_b a)^{1/2}.   (19)

For the baryonic mass we take the dynamical estimate

M_dyn = σ^2 a / G   (20)

from Padmanabhan et al. (2004), where σ is the velocity dispersion. For elliptical galaxies it is well known that the presence of dark matter in the inner part of the galaxy is negligible, so the total baryonic mass will be simply M_b = 2 M_dyn.

Now we have the tools to estimate the energy, the angular momentum and the mass of an elliptical galaxy. If we introduce this information into equation (1), we obtain:

λ = 0.051 e (a/kpc)^{2/7} / (σ/km s^{-1})^{3/7},   (21)

with the constants C, F and A_TF absorbed into the numerical prefactor.
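In code, the estimator of eq. (21) reads as below; note that the σ^{-3/7} dependence follows our reconstruction of the equation from the surrounding derivation, and the example values of e, a and σ are illustrative only, not taken from our sample.

```python
def spin_parameter_elliptical(e, a_kpc, sigma_kms):
    """Spin parameter of an elliptical from eq. (21).

    e: eccentricity in the i-band; a_kpc: semi-major axis in kpc;
    sigma_kms: velocity dispersion in km/s.
    """
    return 0.051 * e * a_kpc ** (2.0 / 7.0) / sigma_kms ** (3.0 / 7.0)

# Illustrative values: e = 0.3, a = 4 kpc, sigma = 200 km/s give
# lambda ~ 0.0024, an order of magnitude below typical spiral values.
print(spin_parameter_elliptical(0.3, 4.0, 200.0))
# The 2.3x rescaling adopted for the full-sample analysis below:
print(2.3 * spin_parameter_elliptical(0.3, 4.0, 200.0))
```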
The dispersions present in the empirical relations entering these estimates introduce errors into the inferred λ's of individual galaxies, of order 25-30 per cent. We take the pessimistic view that such dispersions are intrinsic to the data (pessimistic in the sense that a measure of this dispersion is undoubtedly observational noise), and hence propagate them as errors into our estimates of λ. We now estimate the best underlying log-normal distribution, assuming that the histograms we obtained are only a degraded version of an intrinsic log-normal distribution. We determine the best intrinsic log-normal function through a full maximum likelihood analysis of the data. This is done for purposes of comparison against predicted λ distributions: we fix the functional form of the distribution, and set out only to evaluate the best fit parameters λ_0 and σ_λ of eq. (14).

We construct the likelihood function as the probability that an inferred set of n empirical values of λ, {λ_i(λ)}_n, might arise from a given model (λ_0, σ_λ), as:

L(λ_0, σ_λ; {λ_i(λ)}_n) = Σ_{i=0}^{n} ln[ ∫_0^∞ P(λ_0, σ_λ; λ) λ_i(λ) dλ ],   (22)

where:

λ_i(λ) = (F_i / ((2π)^{1/2} σ_i)) exp[ −(λ − λ_{0i})^2 / (2σ_i^2) ],

with λ_{0i} the value inferred for the i-th galaxy through eq. (9) or eq. (21), σ_i its estimated error, and F_i a normalization constant over λ > 0. In the limit of the error in our inferred λ tending to zero, F_i tends to 1, and λ_i(λ) tends to a Dirac delta function, reducing the integral in equation (22) to an evaluation of P(λ) at the inferred value. The advantage of a full likelihood formulation is that parameter inference can be performed in a way which naturally incorporates the errors in the data sample, without the need of binning the data, a process which intrinsically reduces the information content of the sample.

Equation (22) is then evaluated over a fine grid of values of (λ_0, σ_λ), and the point where the maximum is found is selected as the best fit model. For the 7,753 spirals the result is (λ_0 = 0.0517, σ_λ = 0.362), with very small confidence intervals of ±0.0003 and ±0.004, respectively. Although our errors in individual λ estimates are of order 25% of λ, it is thanks to having a sample running into the thousands that we can retrieve details of the λ distribution with accuracy.

Finally, we repeat the evaluation of eq. (22), but using this time the complete sample of 11,597 spirals plus ellipticals. Here λ for the ellipticals is taken as 2.3× the estimate of eq. (21). This factor can not be much larger than what we are using, since then the ellipticals would start overlapping significantly with the spirals in their λ distributions; if, as mentioned previously, this factor is much reduced, the total λ distribution becomes double peaked, with ellipticals appearing as a distinct population at very low λ, which would be hard to explain. The results this time are λ_0 = 0.0394 and σ_λ = 0.509. A comparison of the maximum likelihood model and the collection of inferred λ's is given in fig. (6), which is analogous to fig. (3), but includes the complete sample, binned into 150 discrete intervals. A direct fit of eq. (14) to the full data gives λ_0 = 0.046, σ_λ = 0.535.

Figure 6. Distribution of values of λ for 11,597 spirals plus ellipticals using eq. (9) and eq. (21), binned into 150 intervals, broken curve. Best log-normal fit to the data, having parameters λ_0 = 0.046 and σ_λ = 0.535, smooth solid curve. Maximum likelihood underlying log-normal distribution, assuming a 30% error in our inferred λ's, with parameters λ_0 = 0.0394 and σ_λ = 0.509, dotted curve.
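A minimal sketch of this likelihood evaluation follows. The grid ranges, the integration grid in λ and the mock data are illustrative assumptions of ours; the constant 30 per cent fractional error matches the value adopted for figure (6).

```python
import numpy as np

def lognormal_pdf(lam, lam0, sigma):
    """P(lambda_0, sigma_lambda; lambda) of eq. (14)."""
    return (np.exp(-np.log(lam / lam0) ** 2 / (2.0 * sigma**2))
            / (lam * sigma * np.sqrt(2.0 * np.pi)))

def log_likelihood(lam0, sigma, lam_obs, frac_err=0.30):
    """Eq. (22): each inferred lambda enters as a Gaussian, normalized on lambda > 0."""
    lam = np.linspace(1e-4, 0.5, 1000)       # integration grid in lambda
    P = lognormal_pdf(lam, lam0, sigma)
    logL = 0.0
    for lam_i in lam_obs:
        s_i = frac_err * lam_i               # error of the i-th galaxy
        kern = np.exp(-(lam - lam_i) ** 2 / (2.0 * s_i**2))
        kern /= np.trapz(kern, lam)          # F_i: normalization constant
        logL += np.log(np.trapz(P * kern, lam))
    return logL

# Grid evaluation over (lambda_0, sigma_lambda), here on mock log-normal data:
rng = np.random.default_rng(0)
lam_obs = rng.lognormal(np.log(0.04), 0.5, size=200)
grid0 = np.linspace(0.03, 0.06, 16)
grids = np.linspace(0.35, 0.65, 16)
L = [[log_likelihood(a, b, lam_obs) for b in grids] for a in grid0]
i, j = np.unravel_index(np.argmax(L), (16, 16))
print(grid0[i], grids[j])    # maximum likelihood (lambda_0, sigma_lambda)
```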
The formal errors in the method are again of the order of what was found for the spirals, but this time we are dominated by the unknown correction factor between our estimate and the actual λ for ellipticals. This uncertainty, although bounded, dominates our final error estimates, and yields ±0.005 in λ_0 and ±0.05 in σ_λ. Another possible source of error in our estimates would be the existence of a large population of low surface brightness galaxies of high λ, which would lead to larger values of both λ_0 and σ_λ.

The above results qualitatively agree with generic predictions of structure formation models regarding the values of λ_0 and σ_λ; e.g. Shaw et al. (2006) review recent results from the literature giving values in the range 0.03 < λ_0 < 0.05 and 0.48 < σ_λ < 0.64. The availability of empirical measurements of these parameters could serve as a further independent guideline for models of structure formation.

We stress that for the full sample, the log-normal nature of both the inferred and the underlying λ distributions is an assumption which we can not confirm, taken only to permit a comparison with models. Under this assumption, the parameters of the distribution can be derived from the data. An inference of the actual functional form of the total distribution would require more solid constraints on the hypotheses introduced in the derivation of the λ's for ellipticals. However, we do note that any similar scheme modeling ellipticals as residing in halos equivalent to those of spirals, and having significantly lower angular momentum than spirals, will also yield a P(λ) not very different from what is shown in figure (6).

4 CONCLUSIONS

We apply simple estimates of the λ spin parameter to a large volume limited sample from the SDSS. The sample is split into ellipticals and spirals using colour, concentration and colour gradients. We find that for spiral galaxies, the average value of the inferred λ correlates well with standard type, as determined by visual inspection. This highlights the potential of λ as an objective and quantitative classification tool for spiral galaxies.

For the spiral galaxies sample we obtain a distribution which is statistically consistent with a log-normal distribution, as is commonly found in cosmological N-body simulations, with parameters λ_0 = 0.0585, σ_λ = 0.446.

Ellipticals have, as expected, average values of λ an order of magnitude lower than spirals. The details of their λ distribution are harder to quantify, but average values of around 0.005 are derived.

If the distribution of λ parameters for the full sample is fitted by a log-normal distribution, of the type found to reproduce the corresponding distribution of modeled halos arising in cosmological simulations, the parameters we find are λ_0 = 0.0394 ± 0.005 and σ_λ = 0.509 ± 0.05, derived this time from a sample of real galaxies.

5 ACKNOWLEDGMENTS

The work of X. Hernandez was partly supported by DGAPA-UNAM grant No. IN117803-3 and CONACYT grants 42809/A-1 and 42748. CBP is supported by the Korea Science and Engineering Foundation (KOSEF) through the Astrophysical Research Center for the Structure and Evolution of Cosmos (ARCSEC). The work of B. Cervantes-Sodi is supported by a CONACYT scholarship.

REFERENCES

Barton E. J., et al., 2001, AJ, 121, 625
Binney J., 1978, MNRAS, 183, 501
Binney J., Tremaine S., 1987, Galactic Dynamics (Princeton: Princeton Univ. Press)
Bullock J. S., et al., 2001, ApJ, 555, 240
Choi Y.-Y., et al., 2006, in preparation
Courteau S., 1996, ApJS, 103, 363
Courteau S., 1997, AJ, 114, 2402
Dalcanton J. J., Spergel D. N., Summers F. J., 1997, ApJ, 482, 659
Davis M., Efstathiou G., Frenk C. S., White S. D. M., 1985, ApJ, 292, 371
de Grijs R., 1998, MNRAS, 299, 595
Dopita M. A., Ryder S. D., 1994, ApJ, 430, 163
Fall S. M., Efstathiou G., 1980, MNRAS, 193, 189
Firmani C., Hernandez X., Gallagher J., 1996, A&A, 308, 403
Flores R., Primack J. R., Blumenthal G. R., Faber S. M., 1993, ApJ, 412, 443
Franx M., Illingworth G., Heckman T., 1989, ApJ, 344, 613
Gardner J. P., 2001, ApJ, 557, 616
Gott J. R., Thuan T. X., 1976, ApJ, 204, 649
Gurovich S., et al., 2004, PASA, 21, 412
Halliday C., et al., 2001, MNRAS, 326, 473
Hernandez X., Avila-Reese V., Firmani C., 2001, MNRAS, 327, 329
Hernandez X., Cervantes-Sodi B., 2006, MNRAS, 368, 351 (HC06)
Hernandez X., Gilmore G., 1998, MNRAS, 294, 595
Kashlinsky A., 1982, MNRAS, 200, 585
Koeppen J., Theis C., Hensler G., 1995, A&A, 296, 99
Koopmans L. V. E., Treu T., Bolton A. S., Burles S., Moustakas L. A., 2006, astro-ph/0601628, ApJ in press
Kregel M., van der Kruit P. C., de Grijs R., 2002, MNRAS, 334, 646
Kregel M., van der Kruit P. C., Freeman K. C., 2005, MNRAS, 358, 503
Lauberts A., 1982, The ESO/Uppsala Survey of the ESO(B) Atlas, European Southern Observatory
Marchant A. B., Shapiro S. L., 1977, ApJ, 215, 1
Mo H. J., Mao S., White S. D. M., 1998, MNRAS, 295, 319
Naab T., Ostriker J. P., 2006, MNRAS, 366, 899
Padmanabhan N., et al., 2004, NewA, 9, 329
Park C., Choi Y.-Y., 2005, ApJ, 635, L29 (PC05)
Pinkney J., et al., 2003, ApJ, 596, 903
Sarzi M., et al., 2006, MNRAS, 366, 1151
Shaw L. D., Weller J., Ostriker J. P., Bode P., 2006, ApJ, 646, 815
Silk J., 2001, MNRAS, 324, 313
Syer D., Mao S., Mo H. J., 1999, MNRAS, 305, 357
Treu T., Koopmans L. V. E., Bolton A. S., Burles S., Moustakas L. A., 2006, ApJ, 640, 662
van der Kruit P. C., 1987, A&A, 173, 59
White S. D. M., Rees M. J., 1978, MNRAS, 183, 341
Wilkinson M. I., Evans N. W., 1999, MNRAS, 310, 645
Yoachim P., Dalcanton J. J., 2006, AJ, 131, 226
Zhang B., Wyse R. F. G., 2000, MNRAS, 313, 310
