The Difference Between Clusters and Groups: A Journey from Cluster Cores to their Outskirts


robust standard errors


When a linear-regression model's assumption of uniform variance (homoscedasticity) is violated, robust standard errors can be used. Heteroscedasticity means that the outcome's variance is not constant across observations.

Why do we use robust standard errors for heteroskedasticity?
Heteroskedasticity-consistent standard errors are used to fit a model that does contain heteroskedastic residuals. Huber (1967) proposed the first such approach, and estimators have since been developed for cross-sectional data, time-series data, and GARCH estimation.

Should I use a robust standard error method?
Robust standard errors are safe to use, especially when the sample size is large. If there is no heteroskedasticity, the robust standard errors are close to the conventional OLS standard errors, so robust standard errors are acceptable even under homoskedasticity.

How are robust standard errors calculated?
The Huber-White robust standard errors equal the square roots of the diagonal elements of the robust covariance matrix; the elements of the inner matrix S are the squared residuals from the OLS fit. These standard errors are referred to as heteroskedasticity-consistent (HC) standard errors.

Is it important to use standard errors?
The standard error is important because it allows you to estimate how well your sample data represent the entire population. Increasing the sample size reduces the standard error, and the best way to minimize sampling bias is to use a large, random sample.

Why do robust standard errors have a better reputation in some fields?
In the social sciences, where the structure of variation is unknown, robust standard errors are useful; in the physical sciences, where the amount of variation is the same for each observation, they are usually avoided.
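The Huber-White ("sandwich") calculation described above — square roots of the diagonal of (X'X)⁻¹ X' diag(e²) X (X'X)⁻¹ — can be sketched directly in numpy. This is an illustration on simulated data; `hc0_standard_errors` is a name chosen here, not a library function.

```python
import numpy as np

def hc0_standard_errors(X, y):
    """White (HC0) heteroskedasticity-consistent standard errors for OLS:
    square roots of the diagonal of (X'X)^-1 X' diag(e^2) X (X'X)^-1,
    where e are the OLS residuals."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y             # OLS coefficients
    e = y - X @ beta                     # residuals
    meat = X.T @ (e[:, None] ** 2 * X)   # X' diag(e^2) X
    cov = XtX_inv @ meat @ XtX_inv       # sandwich covariance matrix
    return beta, np.sqrt(np.diag(cov))

# Simulated heteroskedastic data: the noise standard deviation grows with x
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(0, 0.5 + 0.5 * x)  # non-constant variance

beta, robust_se = hc0_standard_errors(X, y)
print(beta, robust_se)
```

With homoskedastic data the same formula gives values close to the conventional OLS standard errors, which is the point made above.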
Robust standard errors are usually larger than non-robust standard errors, but they can also be smaller.

When is an estimator said to be robust?
If a regression estimator is still reliable in the presence of outliers, it is said to be robust. Likewise, if it is still reliable when the regression errors are autocorrelated and/or heteroskedastic, it is said to be robust.

Why do we cluster standard errors?
The authors argue that there are two reasons to cluster standard errors: a sampling-design reason, which arises when you sample data from a population using clustered sampling and want to say something about the larger population; and an experimental-design reason, which arises when the treatment-assignment mechanism is clustered.

Why are clustered standard errors more common?
In difference-in-differences examples with panel data, the cluster-robust standard errors can be significantly larger than the default because both the regressor and the errors are highly correlated within the cluster. This serial correlation can result in a significant difference between cluster-robust and default standard errors.

Can robust standard errors be smaller than OLS standard errors?
The lesson we can take away from this is that robust standard errors are not a panacea. Because of the small-sample bias discussed above and the higher sampling variance of these estimators, they can be smaller than OLS standard errors; in finite samples, standard error estimates may be biased.

What do you learn from robust standard errors?
Under heteroscedasticity, "robust" standard errors are a technique that obtains unbiased standard errors of OLS coefficients.
Remember that heteroscedasticity violates the Gauss-Markov assumptions required for OLS to be the best linear unbiased estimator (BLUE).

What are the consequences of cluster-robust standard errors?
Clustered standard errors are designed to allow for correlation between observations within a cluster.

When data are contaminated with outliers or influential observations, robust regression is an alternative to least squares regression, and it can also be used to detect influential observations.

What is the best way to interpret the standard error?
The standard error of the mean indicates how far sample means are likely to fall from the population mean, in the original measurement units. Larger values, once again, indicate wider distributions; for a SEM of 3, we know that the average difference between a sample mean and the population mean is 3.

At what level should standard errors be clustered?
One paper shows that researchers should cluster standard errors at the pair level, and demonstrates with simulations that those results extend to stratified experiments with only a few units per stratum. Another states in its conclusion that if the sampling process and treatment assignment aren't clustered, you shouldn't cluster standard errors, even if clustering alters your standard errors; in three possible cases, clustering will yield approximately correct standard errors.

Is it better to use robust regression?
Robust regression is a less restrictive alternative to least squares regression.
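Cluster-robust (Liang-Zeger) standard errors replace the single-observation "meat" of the sandwich with per-cluster score outer products, allowing arbitrary correlation within a cluster. A minimal numpy sketch on simulated data with a shared within-cluster shock; `cluster_robust_se` is a name chosen here:

```python
import numpy as np

def cluster_robust_se(X, y, groups):
    """Cluster-robust standard errors: residuals may be arbitrarily
    correlated within a cluster but are independent across clusters."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        idx = groups == g
        s = X[idx].T @ e[idx]        # cluster score: X_g' e_g
        meat += np.outer(s, s)       # sum_g X_g' e_g e_g' X_g
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))

# Data with a strong cluster-level error component
rng = np.random.default_rng(1)
n_clusters, per = 40, 25
groups = np.repeat(np.arange(n_clusters), per)
x = rng.normal(size=n_clusters * per)
u = np.repeat(rng.normal(size=n_clusters), per)   # shared within-cluster shock
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + u + 0.5 * rng.normal(size=x.size)

beta, cl_se = cluster_robust_se(X, y, groups)
print(beta, cl_se)
```

When regressors are also correlated within clusters (as in the DiD panel example above), these standard errors become much larger than the defaults.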
When outliers are present in the data, it provides much better regression coefficient estimates; in least squares regression, outliers violate the assumption of normally distributed residuals.

What does robustness mean?
In statistics, the term robust or robustness refers to the strength of a statistical model, test, or procedure under deviations from the conditions the analysis assumes. To put it another way, a robust statistic is resistant to errors in the results.

Is there a regression method robust to heteroskedasticity?
We also require a robust heteroskedastic regression method that is compatible with the specified form of heteroskedasticity. A very general approach uses ordinary least squares (OLS) with "heteroskedasticity-robust" standard errors (White, 1980).

In statistics, what is a robust test?
In the case of tests, robustness refers to the test's validity even after its assumptions are changed. To put it another way, whether or not the outcome is significant is only meaningful if the test's assumptions are met; the test is said to be robust when those assumptions can be relaxed (i.e., are not as important).

What is the best way to test for robustness?
Fault injection is a system-wide testing method that can be used to check system robustness. The authors worked on a cost-effective method that helps fault injection discover critical flaws that could fail the system.
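The robust regression mentioned in this section can be sketched as a Huber M-estimator fit by iteratively reweighted least squares (IRLS). This is one common approach, not the only one; `huber_irls`, the tuning constant, and the simulated data are all illustrative.

```python
import numpy as np

def huber_irls(X, y, c=1.345, n_iter=50):
    """Huber M-estimator via IRLS: observations with large scaled residuals
    get weight c/|u| instead of 1, bounding the influence of outliers."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # start from OLS
    for _ in range(n_iter):
        r = y - X @ beta
        # Robust scale estimate from the median absolute deviation (MAD)
        scale = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12
        u = r / scale
        w = np.where(np.abs(u) <= c, 1.0, c / np.abs(u))  # Huber weights
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return beta

# Simulated line with 10% gross outliers
rng = np.random.default_rng(2)
n = 200
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(0, 0.3, n)
out = rng.choice(n, 20, replace=False)
y[out] += 30.0                                    # contamination

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]   # pulled toward the outliers
beta_hub = huber_irls(X, y)                       # stays near (1, 2)
print(beta_ols.round(2), beta_hub.round(2))
```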

GSM Certification Exam Question Set (II)


Published by China Telecom Talent Network, 2010-12-02 17:39:07. Source: web.

Multiple-choice questions

100. [Required] The MS is paged on an incoming call in all cells within a: C (1 point)
A. PLMN  B. MSC/VLR Service Area  C. Location Area

101. [Required] Where is paging initiated? C (1 point)
A. BSC  B. BTS  C. MSC

102. [Required] What part was added to the MTP in the SS7 signaling system to enable it to support connection-oriented and connectionless signaling? A (1 point)
A. ISUP  B. TCAP  C. SCCP

103. [Required] The interface between the MSC and the BSC is known as: B (1 point)
A. A-interface  B. A-bis interface  C. B-interface

104. [Required] Which of the following affects the size of a cell? A (1 point)
A. The type of BSC that the BTS is connected to  B. Timing Advance  C. Number of cells in a cluster

105. [Required] Which of the following groups of cells would be most likely to be used in a city area? B (1 point)
A. Umbrella, Micro and Pico  B. Omnidirectional, Sector and Overlaid  C. Underlaid, Overlaid and Omnidirectional

106. [Required] Which logical channel transmits the BSIC of a cell? B (1 point)
A. BCCH  B. SCH  C. RACH

107. [Required] Where are the measurement reports, which determine handover of a call, carried out? A (1 point)
A. BSC and MS  B. MS and BTS  C. MS and MSC

108. [Required] The MS is close to the cell border defined by the Timing Advance. The BSC compares the current average value of TA to the defined value Timing Advance LIMit (TALIM). If TA exceeds the limit, what happens? A (1 point)
A. An attempt is made to hand over the call to a neighbouring cell  B. An intracell handover takes place  C. The TALIM is increased until a handover can be effected

109. [Required] The maximum Timing Advance (TA) allowed in GSM is? B (1 point)
A. 63 bit periods  B. 128 bit periods  C. 2048 bit periods

110. [Required] Which unit performs LAPD concentration and multiplexing? (1 point)
A. GS  B. TRAU  C. SRS

111. [Required] Intracell handover involves? A (1 point)
A. Change of timeslot only  B. Change of frequency only  C. Change of timeslot and/or change of frequency

112. [Required] The normal maximum cell radius for GSM is? A (1 point)
A. 35 km  B. 70 km  C. 50 km

113. [Required] Which of the following types of signals would carry the information "Location Updating Request"?
B (1 point)
A. Initial MS Message  B. BSSMAP  C. DTAP

114. [Required] Which unit is used for the transcoding of speech information? A (1 point)
A. TRAU  B. TRH  C. TRX

115. [Required] The two different types of multiframes are? B (1 point)
A. 13- and 26-frame  B. 26- and 51-frame  C. 51- and 102-frame

116. [Required] When an MS is powered on, what happens? B (1 point)
A. The MS scans the radio frequencies of the network  B. The mobile scans the frequencies carrying the BCCH channels only  C. The mobile scans the frequencies, carrying the BCCH channels, of the neighbouring cells only

117. [Required] CALL CONFIRMED is a signal sent from the MS to the MSC for the purpose of? A (1 point)
A. Responding to the SETUP signal from the MSC  B. Responding to the AUTHENTICATION REQUEST signal from the MSC  C. Responding to the PAGING COMMAND from the MSC

118. [Required] Which of the following is used by the BTS and MS in providing security for the speech and data over the air interface? B (1 point)
A. Ki  B. Kc  C. SRES

119. [Required] The subsystem that is concerned with the PCM links is? B (1 point)
A. RCS  B. RTS  C. TAS

120. [Required] The following: Normal, IMSI Detach, IMSI Attach, Periodic Registration are all forms of: A (1 point)
A. Location Updating  B. Authentication  C. Paging

121. [Required] Which of the following is false regarding Location Areas? C (1 point)
A. One Location Area can belong to several BSCs  B. One Location Area can belong to several MSCs  C. One Location Area is always handled by only one MSC

122. [Required] Why is the TMSI used? B (1 point)
A. As a temporary personal code to access the network  B. For security reasons  C. For use with a rented mobile telephone

123. [Required] Which of the following channels operates in bit-stealing mode? C (1 point)
A. SDCCH  B. SACCH  C. FACCH

124. [Required] Timers are used for Periodic Location Updating. Where are these timers located? B (1 point)
A. MS and MSC  B. MS and BSC  C. BSC and MSC

125. [Required] The Cell Identity is unique to? C (1 point)
A. The Location Area  B. The MSC/VLR Service Area  C. The PLMN

126. [Required] How many traffic channels can a TRX handle at maximum? C (1 point)
A. 1  B. 4  C. 8

127.
[Required] How many TS are required on the Abis interface per TRX if LAPD? C (1 point)
A. 2  B. 2.25  C. 3

128. [Required] How many TSs and E1s are required to connect the following access network to a BSC when a DXC is used to get maximum utilization of the E1s? Access network with 30 sites of type 3/3/3 and 20 sites of type 2/2/2. LAPD multiplexing and LAPD concentration are used. C (1 point)
A. 870 TSs, 29 E1s  B. 848 TSs, 28 E1s  C. 780 TSs, 26 E1s

129. [Required] How many physical connections are required in a network with 10 MSCs in a flat, fully meshed network and in a hierarchical transit network with two transits? (1 point)
A. 100 (meshed), 10 (transit)  B. 45 (meshed), 20 (transit)  C. 100 (meshed), 20 (transit)

130. [Required] A link set has three signaling links on which load sharing is used, and the maximum load per signaling link is 30%. What is the maximum signaling volume that can be carried on this link set? (1 point)
A. 192 kbit/s  B. 57.6 kbit/s  C. 51.2 kbit/s

131. [Required] What is the difference between message transfer on a) MTP and b) SCCP level in an STP? (1 point)
A. a) GTT in STP, b) no GTT in STP  B. a) no GTT in STP, b) GTT in STP  C. No difference

132. [Required] Which of the following is not a valid SDCCH configuration? A (1 point)
A. SDCCH/4 combined with CCCH on timeslot one  B. SDCCH/4 including CBCH combined with CCCH on timeslot zero  C. SDCCH/8 non-combined on timeslot zero  D. SDCCH/8 including CBCH non-combined on timeslot one  E. SDCCH/8 non-combined on timeslot one

133. [Required] What does this user formula calculate? E (1 point)
A. TCH assignment success rate  B. TCH availability  C. TCH channel utilization  D. TCH traffic level  E. TCH mean holding time

134. [Required] What is a probable cause of random access failure? A (1 point)
A. MAXTA exceeded  B. TALIM exceeded  C. Congestion on the SDCCH  D. All of these  E. None of these

135. [Required] The AUC is used for: B (1 point)
A. Authenticating the subscriber's identity.
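Two of the numeric questions above can be checked with a few lines of arithmetic. The 64 kbit/s figure per SS7 signaling link is the standard assumption (one E1 timeslot per link), not stated in the question itself:

```python
# Q129: 10 MSCs. Flat fully meshed -> every pair of MSCs needs a link,
# i.e. n(n-1)/2 connections. Hierarchical with two transits -> each MSC
# connects to both transit nodes.
n_msc = 10
meshed = n_msc * (n_msc - 1) // 2   # 45 connections
transit = n_msc * 2                 # 20 connections -> option B

# Q130: three signaling links with load sharing, at most 30% load each,
# assuming each link is one 64 kbit/s timeslot.
capacity = 3 * 64 * 0.30            # kbit/s carried by the link set
print(meshed, transit, round(capacity, 1))   # 45 20 57.6 -> option B
```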

Machine Learning and Data Mining Written-Test and Interview Questions

What is a decision tree? What are some business reasons you might want to use a decision tree model? How do you build a decision tree model? What impurity measures do you know? Describe some of the different splitting rules used by different decision tree algorithms. Is a big, bushy tree always good? How would you compare a decision tree to a regression model? Which is more suitable under different circumstances? What is pruning and why is it important?

Ensemble models:
Why do we combine multiple trees? What is a Random Forest? Why would you prefer it to an SVM?

Logistic regression: What is logistic regression? How do we train a logistic regression model? How do we interpret its coefficients?

Support Vector Machines: What is the maximal margin classifier? How can this margin be achieved, and why is it beneficial? How do we train an SVM? What about hard-margin and soft-margin SVM? What is a kernel? Explain the kernel trick. Which kernels do you know? How do you choose a kernel?

Neural Networks: What is an Artificial Neural Network? How do you train an ANN? What is backpropagation? How does a neural network with three layers (one input layer, one hidden layer and one output layer) compare to logistic regression? What is deep learning? What is a CNN (Convolutional Neural Network) or an RNN (Recurrent Neural Network)?

Other models: What other models do you know? How can we use a Naive Bayes classifier for categorical features? What if some features are numerical? What are the tradeoffs between different types of classification models, and how do you choose the best one? Compare logistic regression with decision trees and neural networks.

What is regularization? Which problem does regularization try to solve? Answer: it is used to address the overfitting problem; it penalizes your loss function by adding a multiple of the L1 (LASSO) or the L2 (ridge) norm of your weight vector w (the vector of learned parameters in your linear regression). What does it mean (practically) for a design matrix to be "ill-conditioned"? When might you want to use ridge regression instead of traditional linear regression? What is the difference between L1 and L2 regularization? Why (geometrically) does LASSO produce solutions with zero-valued coefficients (as opposed to ridge)?

What is the purpose of dimensionality reduction and why do we need it?
Are dimensionality reduction techniques supervised or not? Are all of them (un)supervised? What ways of reducing dimensionality do you know? Is feature selection a dimensionality reduction technique? What is the difference between feature selection and feature extraction? Is it beneficial to perform dimensionality reduction before fitting an SVM? Why or why not?

Why do you need cluster analysis? Give examples of some cluster analysis methods. Differentiate between partitioning methods and hierarchical methods. Explain K-Means and its objective. How do you select K for K-Means?
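The impurity measures asked about above are small formulas worth being able to write down. A sketch of the two most common ones, Gini impurity and entropy, with `gini` and `entropy` as names chosen here:

```python
import numpy as np

def gini(p):
    """Gini impurity of a class-probability vector: 1 - sum(p_i^2)."""
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def entropy(p):
    """Shannon entropy in bits: -sum(p_i * log2 p_i), with 0*log 0 := 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                         # drop zero-probability classes
    return float(-np.sum(p * np.log2(p)) + 0.0)

# Both are maximal for a 50/50 split and zero for a pure node
print(gini([0.5, 0.5]), entropy([0.5, 0.5]))   # 0.5 1.0
print(gini([1.0, 0.0]), entropy([1.0, 0.0]))   # 0.0 0.0
```

A split is then chosen to maximize the decrease in (weighted) impurity from parent to children.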
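For the regularization questions above, ridge regression has a closed form, w = (X'X + λI)⁻¹X'y, which also shows why it helps with an ill-conditioned design matrix: the λI term pushes the eigenvalues away from zero. The nearly collinear data below are simulated for illustration; `ridge` is a name chosen here.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimate: w = (X'X + lam*I)^{-1} X'y.
    lam = 0 recovers ordinary least squares; larger lam shrinks w
    toward zero, trading a little bias for much lower variance."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

# Ill-conditioned design: the second column is almost a copy of the first,
# so OLS coefficients are huge and unstable, while their sum is well defined.
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 1e-4 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=n)

w_ols = ridge(X, y, 0.0)     # exploding, offsetting coefficients
w_ridge = ridge(X, y, 1.0)   # both near 1, summing to about 2
print(np.abs(w_ols).max(), np.abs(w_ridge).max())
```

LASSO has no closed form (the L1 penalty is not differentiable at zero), which is why it is usually fit by coordinate descent and why it can zero coefficients out entirely.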
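K-Means, asked about at the end, is short enough to write out. This is plain Lloyd's algorithm with a simple deterministic farthest-point initialization (real libraries typically use k-means++); the two-blob data are made up for illustration:

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm: alternately assign each point to its nearest
    centroid and move each centroid to the mean of its assigned points,
    which monotonically decreases the within-cluster sum of squares."""
    # Init: first point, then repeatedly the point farthest from all centers
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):   # converged
            break
        centers = new
    return labels, centers

# Two well-separated blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
labels, centers = kmeans(X, 2)
print(centers.round(1))
```

The objective it minimizes is the within-cluster sum of squared distances; plotting that objective against k ("elbow method") is one common way to choose K.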

60 Multiple-Choice Questions on Statistical Analysis (Senior High English, Grade 12)


1. In a survey, the mean age of the participants was 25 years. What does "mean" refer to in statistics?
A. The most common value  B. The middle value  C. The average value  D. The difference between the highest and lowest values
Answer: C.

In this question, "mean" in statistics denotes the average value, obtained by adding all the data together and then dividing by the number of data points.

Option A, "the most common value", refers to the mode; option B, "the middle value", refers to the median; option D, "the difference between the highest and lowest values", refers to the range.

2. When analyzing data, we often use "variance" as a measure. What does variance describe?
A. How spread out the data is  B. The central tendency of the data  C. The frequency of each value  D. The total sum of the data
Answer: A.

Variance measures the dispersion of the data, that is, how spread out the distribution is.

Option B, "the central tendency of the data", refers to where the data are centered; option C, "the frequency of each value", refers to how often each value occurs; option D, "the total sum of the data", refers to the sum of all the values.
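The statistics distinguished in these two questions (mean, median, mode, range, variance) can all be computed with Python's standard `statistics` module; the ages below are made up for illustration:

```python
import statistics

ages = [22, 25, 25, 27, 31]

mean = statistics.mean(ages)           # average value: sum / count
median = statistics.median(ages)       # middle value of the sorted data
mode = statistics.mode(ages)           # most common value
data_range = max(ages) - min(ages)     # highest minus lowest
variance = statistics.pvariance(ages)  # average squared deviation from mean

print(mean, median, mode, data_range, variance)   # 26 25 25 9 8.8
```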

Standard Abbreviations of English Drilling-Geology Terms

English word — abbreviation — meaning

crossbedding — xbdg — cross bedding (交错层理)
cross-laminated — xlam — cross-laminated (交错纹层)
cross-stratified — xstrat — cross-stratified (交错层理)
crumpled — crpld — squeezed, crumpled, contorted (挤压，变皱，扭弯)
crypto (prefix) — crp — crypto-, hidden (隐，隐蔽)
cryptocrystalline — cot d, cotg — cryptocrystalline (隐晶质的)
cobble — cbl — cobble (中砾)
color — cbl — color (颜色，显色)
combination — comb — combination, mixture, compound (组合，混合，结合，化合)
common — com — common, shared (普通的，共同的)
compact — cpct — compact, compacted (致密，压实)
completed — compld — complete, completed (完全，完成)
completion — compl — well completion, finish (完井，完成，结束)
concentric — car — concentric, coaxial (同心的，同轴的)
calcium — ca — calcium (钙)
calcium chloride — cacl — calcium chloride (氯化钙)
calculate — calc — calculate, predict, plan (计算，预测，打算)
caliper — clpr — caliper log (井径仪)
cambrian — cam — Cambrian (寒武纪，寒武系)
carbon tetrachloride — ccl — carbon tetrachloride (四氯化碳)
carbonaceous — carb — carbonaceous (碳质的，含碳的)
casing — cncn — casing, run casing (套管，下套管)
conchoidal — conch — conchoidal (贝壳状，介壳状的)
nodule, concretion (结核，凝结)

2009 Molecular Biology Quiz 1 Reference Answers


2. What is the central dogma? What contents have you learned in each part of the central dogma? (5')
The central dogma is the pathway for the flow of genetic information. We have learned the maintenance of the genome (the structure of the genetic material and its faithful duplication) and the expression of the genome (the conversion of genetic instructions contained in DNA into proteins). (One point deducted if reverse transcription is missing.)

3. Write out the structures of the four bases, and label the positions that participate in Watson-Crick base pairing and those that connect to the ribose. (10')
(Partial credit is given if the pairing positions and the attachment to the sugar are marked: A and G are bicyclic purines, T and C are monocyclic pyrimidines; an A-T pair forms two hydrogen bonds and a C-G pair forms three.)

4. What are the features of DNA structure? And how does RNA structure differ from DNA?
The features of DNA structure:
Two antiparallel polynucleotide chains twist around each other in the form of a double helix. 6' (polynucleotide (building block) 2', antiparallel 2', double helix 2')
Hydrogen bonding determines the specificity of base pairing (complementarity). 2'
Stacking interactions between bases determine the stability of the DNA double helix; hydrogen bonds also contribute to stability. 2'
The double helix has minor and major grooves. (A, B and Z forms) 2'
Differences:
Primary structure (building block): 2'. dNTP vs. rNTP; T vs. U; double-stranded vs. single-stranded.
Secondary structure: 4'. DNA has a stable double-helical structure (full complementarity). RNA chains fold back on themselves to form local regions of double helix similar to A-form DNA; RNA helices are the base-paired segments between short stretches of complementary sequences, which adopt various stem-loop structures and pseudoknots (inter- and intra-molecular base pairing).
Tertiary structure: 2'. DNA: no tertiary structure. RNA can fold up into complex tertiary structures, because RNA has enormous rotational freedom in the backbone of its non-base-paired regions.

5. Compare the chemistry of DNA synthesis and RNA synthesis.
(5') (Clues: compare = differences + the same; chemistry = substrate + direction + energy)
Differences:
(1) DNA synthesis requires deoxyribonucleotide triphosphates while RNA synthesis requires ribonucleotide triphosphates; (1')
(2) The bases for DNA synthesis are A/T/C/G while those for RNA synthesis are A/U/C/G; (1')
(3) DNA synthesis needs a primer:template junction while RNA synthesis does not. (1')
The same:
(1) The direction of both DNA synthesis and RNA synthesis is 5' to 3'; (1')
(2) The energy needed for both DNA synthesis and RNA synthesis comes from hydrolysis of pyrophosphate (PPi). (1')

6. Describe the functions of each domain of the DNA polymerase. (14'+1')
Palm domain (1'):
(1) Contains two catalytic sites, one for addition of dNTPs (1') and one for removal of mispaired dNTPs. (1')
(2) The polymerization site: (a) binds two metal ions that alter the chemical environment around the catalytic site and lead to catalysis; (1') (b) monitors the accuracy of base pairing of the most recently added nucleotides by forming extensive hydrogen-bond contacts with the minor groove of the newly synthesized DNA. (1')
(3) The exonuclease/proofreading site: the mechanism of proofreading is kinetic selectivity (1'), and the mismatched dNMP is removed by the proofreading exonuclease in the 3'-to-5' direction. (1')
Finger domain (1'):
(1) Binds the incoming dNTP and encloses the correctly paired dNTP in the position for catalysis; (1')
(2) Bends the template to expose, at the template, only the nucleotide that is ready to form a base pair with the incoming nucleotide; (1')
(3) Stabilizes the pyrophosphate. (1')
Thumb domain (1'):
(1) Not directly involved in catalysis;
(2) Interacts with the synthesized DNA to maintain the correct position of the primer and the active site, (1') and to maintain a strong association between the DNA polymerase and its substrate. (1')

7. How is replication of a DNA molecule accomplished in bacteria?
Initiation:
(1) Recognition and binding of oriC by DnaA (the initiator) bound to ATP.
(2) Helicase (DnaB) loading with DnaC (the helicase loader), and DNA unwinding.
(3) Primase synthesizes an RNA primer, and DNA polymerase III synthesizes the new DNA strand.
Elongation (the "trombone" model was developed to explain how the lagging strand and the leading strand are synthesized simultaneously):
(1) Leading strand: the newly synthesized DNA strand that is continuously copied from the template strand by a DNA polymerase after the first RNA primer has been made by a primase. A sliding clamp is usually loaded onto the DNA polymerase to increase its processivity. The leading strand grows in the same direction as the moving replication fork.
(2) Lagging strand: discontinuously copied from the template strand, growing in the direction opposite to the moving replication fork. Primase makes RNA primers periodically after the template strand is unwound and becomes single-stranded. DNA polymerase extends each primer to synthesize short DNA fragments, called Okazaki fragments, and dissociates from the template strand when it meets the previous Okazaki fragment. Finally, the RNA primers are digested by an RNase H activity, and the gaps are filled by DNA polymerase.
At last, the adjacent Okazaki fragments are covalently joined by a DNA ligase to generate a continuous, intact strand of new DNA.
Termination: Type II topoisomerases separate the daughter DNA molecules.

8. How is transcription of an RNA molecule by RNA polymerase II initiated, elongated and terminated in eukaryotes? (25')
Initiation: (11' + bonus)
A. 1. Promoter recognition: TBP in TFIID binds to the TATA box; (1') TFIIA and TFIIB are recruited, with TFIIB binding to the BRE. (1')
2. RNA Pol II recruitment: the RNA Pol II-TFIIF complex is then recruited. (1')
3. TFIIE and TFIIH then bind upstream of Pol II (to form the pre-initiation complex). (1 bonus point for mentioning the pre-initiation complex)
B. Promoter melting, using energy from ATP hydrolysis by TFIIH. (2')
C. Promoter escape after phosphorylation of the CTD tail. (2')
Additional proteins are needed for transcription initiation in vivo:
- The Mediator complex (1')
- Transcriptional regulatory proteins (1')
- Nucleosome-modifying enzymes (1')
Tips:
1. The question asks about eukaryotes, so the relevant eukaryotic factors must all be given; otherwise no credit.
2. Points B and C are very important and carry large scores, which shows the importance of grasping the key points.
3. Very few students could name the three kinds of proteins required in vivo.
4. A small bonus is given for further details, e.g. "TBP binds to and distorts DNA using a β sheet inserted into the minor groove".

Advanced Single-Cell Data Analysis: Initial Dimensionality Reduction and Clustering



Some personal musings: for clustering, k-means comes to mind immediately, and there is also hierarchical clustering, but neither is used much in single-cell analysis. Why? As I recall, only one scoring model uses k-means, for coarse clustering.

(10x, in fact, runs PCA first and then clusters with k-means.) Given how many single-cell tutorials there are, there are also no fewer than ten clustering methods designed specifically for single-cell data.

Dimensionality reduction usually goes together with clustering, so the two can seem hard to tell apart.

Is PCA a dimensionality-reduction method, a clustering method, or a visualization method? What about t-SNE? A moment's thought shows that PCA, t-SNE, and the diffusion map discussed below are all dimensionality-reduction methods.

The difference is that PCA obtains the PCs by a purely linear transformation, while t-SNE and diffusionMap are both nonlinear.

Why reduce dimensionality? Because we have too many features; genes number in the tens of thousands, and only after reduction can we apply k-means and the like.

Second, reduction is what makes visualization possible: the highest dimension we can visualize is three, and tens of thousands of dimensions cannot be visualized.

In papers, though, we use at most the first two dimensions; three dimensions projected onto a page look even worse than two.

Clustering strategy: does clustering even need a strategy? Don't you just pick good features, choose a k, and get the clusters? Yes, a routine analysis really has nothing deep in it.

But we usually do not cluster for clustering's sake; the results serve a biological question. If your clustering result cannot be interpreted from any angle, why cluster at all? You can hardly write in a paper that "we clustered and obtained some markers" and leave it at that. So what counts as question-driven clustering? The paper below clusters to answer a concrete question.

Prior knowledge: we know our cells include some contaminating cells; how can we identify them through clustering? A concrete problem like this cannot be solved by running the standard pipeline; we have to find another way!

Dimensionality reduction. Throughout the manuscript we use diffusion maps, a non-linear dimensionality reduction technique. We calculate a cell-to-cell distance matrix using 1 - Pearson correlation and use the diffuse function of the diffusionMap R package with default parameters to obtain the first 50 DMCs. To determine the significant DMCs, we look at the reduction of eigenvalues associated with DMCs. We determine all dimensions with an eigenvalue of at least 4% relative to the sum of the first 50 eigenvalues as significant, and scale all dimensions to have mean 0 and standard deviation of 1.

A bit unconventional: they use diffusionMap for the reduction, compute cell-to-cell distances, obtain 50 DMCs, identify the significant ones, and scale them.
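The selection rule quoted above (keep the components whose eigenvalue is at least 4% of the sum of the returned eigenvalues, then scale each kept dimension to mean 0 and standard deviation 1) is easy to sketch in numpy. `select_and_scale` and the toy eigenvalues are illustrative, not taken from the paper:

```python
import numpy as np

def select_and_scale(embedding, eigenvalues, frac=0.04):
    """Keep components whose eigenvalue is at least `frac` of the sum of
    all eigenvalues, then z-score each kept dimension (mean 0, sd 1)."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    keep = eigenvalues >= frac * eigenvalues.sum()
    E = embedding[:, keep]
    return (E - E.mean(axis=0)) / E.std(axis=0)

# Toy example: 100 cells, 5 components with rapidly decaying eigenvalues
rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 5))
eig = np.array([10.0, 5.0, 1.0, 0.4, 0.1])  # sum = 16.5; 4% cutoff = 0.66
scaled = select_and_scale(emb, eig)
print(scaled.shape)   # only the first three components pass the cutoff
```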

What is the Difference between Quantitative and Qualitative Research


Research in Social Sciences: Qualitative and Quantitative Methods

1. Research in social sciences

Research in social sciences largely depends on measurement and on the analysis and interpretation of numerical as well as non-numerical data. Quantitative research methods focus on statistical approaches, while qualitative methods are based on content analysis, comparative analysis, grounded theory, and interpretation (Strauss, 1990). Quantitative methods emphasise objective measurement and numerical analysis of data collected through polls, questionnaires or surveys; qualitative research focuses on understanding social phenomena through interviews, personal comments, etc. Quantitative and qualitative methods are studied within the context of positivistic and phenomenological paradigms (2006).

● The applications of research methods could be studied in the context of business and management, or in social psychology to understand a social process. Some of the basic tools for qualitative or quantitative research are related to data collection methods, which can be case studies, questionnaires or interviews (Simon et al., 1985). Research methods in management focus on leadership studies, and leadership issues are examined in accordance with contingency theory and organisational theory. The effects of leadership are best studied with the help of qualitative or quantitative research methods and analyses of questionnaires sent to participants in management positions. Research methods are, however, focused not just on management issues but also on social processes, for example a study on the interaction between age, physical exercise and gender. Some disadvantages and possible flaws of such methods may be related to abuse or misuse of interview techniques, inadequacy of data collection methods, and the reliability of data. The methodological approach and data collection techniques are important in research and form an important aspect of the study (Simon, 1985).
The case study approach is especially useful in the analysis of business environments, and perceptual mapping techniques are used for marketing research. Focus groups and surveys are used as other preferred methods of data collection, especially within business environments (2006).

● Interpretive research can be considered an important aspect of qualitative analysis, although, as with all qualitative data, subjective bias can be a deterrent to the validity of such approaches. In studying research methods, it is necessary to highlight the differences between qualitative and quantitative methods, although it has also been argued that an integrated approach to social analysis could close the gap between quantitative and qualitative methods, and that both methods could be used for social research studies. In fact, there may not be a single correct method of research, as each method seems to have its strengths and weaknesses, and these factors should be examined carefully before a particular method is selected or used for studying a social process.

2. What is the Difference between Quantitative and Qualitative Research?

Qualitative and quantitative research are the two main schools of research, and although they are often used in tandem, the benefits and disadvantages of each are hotly debated. Particularly in the social sciences, the merits of both qualitative and quantitative research are fought over, with intense views held on both sides of the argument. It is generally agreed, however, that there are some phases of research where one or the other is clearly more useful than the other, and so few people completely dismiss either.

● Quantitative research is probably the less contentious of the two schools, as it is more closely aligned with what is viewed as the classical scientific paradigm. Quantitative research involves gathering data that is absolute, such as numerical data, so that it can be examined in as unbiased a manner as possible.
There are many principles that go along with quantitative research, which help promote its supposed neutrality. Quantitative research generally comes later in a research project, once the scope of the project is well understood.

● The main idea behind quantitative research is to be able to separate things easily so that they can be counted and modeled statistically, removing factors that may distract from the intent of the research. A researcher generally has a very clear idea of what is being measured before they start measuring it, and the study is set up with controls and a very clear blueprint. The tools used are intended to minimize any bias, so ideally they are machines that collect information, and less ideally carefully randomized surveys. The result of quantitative research is a collection of numbers, which can be subjected to statistical analysis to arrive at results.

● Remaining emotionally separate from the research is a key aspect of quantitative research, as is removing researcher bias. For fields like astronomy and the other hard sciences, this means that quantitative research carries a very minimal amount of bias. For sociological data, it means that the majority of bias is hopefully limited to that introduced by the people being studied, which can be somewhat accounted for in models. Quantitative research is ideal for testing hypotheses, and for hard sciences trying to answer specific questions.

● Qualitative research, on the other hand, is a much more subjective form of research, in which the researchers allow themselves to introduce their own bias to help form a more complete picture. Qualitative research may be necessary in situations where it is unclear what exactly is being looked for in a study, so that the researcher needs to be able to determine what data is important and what isn't.
While quantitative research generally knows exactly what it is looking for before the research begins, in qualitative research the focus of the study may become more apparent as time progresses.

● Often the data presented from qualitative research will be much less concrete than pure numbers. Instead, qualitative research may yield stories, pictures, or descriptions of feelings and emotions. The interpretations given by research subjects are given weight in qualitative research, so there is no attempt to limit their bias. At the same time, researchers tend to become more emotionally attached to qualitative research, and so their own bias may also play heavily into the results.

● Within the social sciences, there are two opposing schools of thought. One holds that fields like sociology and psychology should attempt to be as rigorous and quantitative as possible, in order to yield results that can be more easily generalized and to sustain the respect of the scientific community. The other holds that these fields benefit from qualitative research, as it allows for a richer study of a subject and for gathering information that would otherwise be entirely missed by a quantitative approach. Although attempts have been made in recent years to find a stronger synthesis between the two, the debate rages on, with many social scientists falling sharply on one side or the other.

Features of Qualitative & Quantitative Research
Qualitative: The researcher tends to become subjectively immersed in the subject matter.
Quantitative: The researcher tends to remain objectively separated from the subject matter.
(The two quotes are from Miles & Huberman (1994, p. 40), Qualitative Data Analysis.)

Main Points
• Qualitative research involves analysis of data such as words (e.g., from interviews), pictures (e.g., video), or objects (e.g., an artifact).
• Quantitative research involves analysis of numerical data.
•The strengths and weaknesses of qualitative and quantitative research are a perennial, hot debate, especially in the social sciences. The issues invoke classic 'paradigm war'.•The personality / thinking style of the researcher and/or the culture of the organization is under-recognized as a key factor in preferred choice of methods.•Overly focusing on the debate of "qualitative versus quantitative" frames the methods in opposition. It isimportant to focus also on how the techniques can beintegrated, such as in mixed methods research. More good can come of social science researchers developing skills in both realms than debating which method is superior.。

Database (English, 6th Edition) Exercise Answers (32)


CHAPTER 11: Indexing and Hashing

This chapter covers indexing techniques ranging from the most basic to highly specialized ones. Due to the extensive use of indices in database systems, this chapter constitutes an important part of a database course.

A class that has already had a course on data structures would likely be familiar with hashing and perhaps even B+-trees. However, this chapter is necessary reading even for those students, since data-structures courses typically cover indexing in main memory. Although the concepts carry over to database access methods, the details (e.g., block-sized nodes) will be new to such students.

The sections on B-trees (Section 11.4.5) and bitmap indexing (Section 11.9) may be omitted if desired.

Exercises

11.15 When is it preferable to use a dense index rather than a sparse index? Explain your answer.
Answer: It is preferable to use a dense index instead of a sparse index when the file is not sorted on the indexed field (such as when the index is a secondary index), or when the index file is small compared to the size of memory.

11.16 What is the difference between a clustering index and a secondary index?
Answer: The clustering index is on the field which specifies the sequential order of the file. There can be only one clustering index, while there can be many secondary indices.

11.17 For each B+-tree of Practice Exercise 11.3, show the steps involved in the following queries:
a. Find records with a search-key value of 11.
b. Find records with a search-key value between 7 and 17, inclusive.
Answer: With the structure provided by the solution to Practice Exercise 11.3a:
a. Find records with a value of 11:
   i. Search the first-level index; follow the first pointer.
   ii. Search the next level; follow the third pointer.
   iii. Search the leaf node; follow the first pointer to records with key value 11.
b. Find records with values between 7 and 17 (inclusive):
   i. Search the top index; follow the first pointer.
   ii. Search the next level; follow the second pointer.
   iii. Search the third level; follow the second pointer to records with key value 7, and after accessing them, return to the leaf node.
   iv. Follow the fourth pointer to the next leaf block in the chain.
   v. Follow the first pointer to records with key value 11, then return.
   vi. Follow the second pointer to records with key value 17.

With the structure provided by the solution to Practice Exercise 11.3b:
a. Find records with a value of 11:
   i. Search the top level; follow the second pointer.
   ii. Search the next level; follow the second pointer to records with key value 11.
b. Find records with values between 7 and 17 (inclusive):
   i. Search the top level; follow the second pointer.
   ii. Search the next level; follow the first pointer to records with key value 7, then return.
   iii. Follow the second pointer to records with key value 11, then return.
   iv. Follow the third pointer to records with key value 17.

With the structure provided by the solution to Practice Exercise 11.3c:
a. Find records with a value of 11:
   i. Search the top level; follow the second pointer.
   ii. Search the next level; follow the first pointer to records with key value 11.
b. Find records with values between 7 and 17 (inclusive):
   i. Search the top level; follow the first pointer.
   ii. Search the next level; follow the fourth pointer to records with key value 7, then return.
   iii. Follow the eighth pointer to the next leaf block in the chain.
   iv. Follow the first pointer to records with key value 11, then return.
   v. Follow the second pointer to records with key value 17.

11.18 The solution presented in Section 11.3.4 to deal with nonunique search keys added an extra attribute to the search key. What effect could this change have on the height of the B+-tree?
Answer: The resultant B+-tree's extended search key is unique. This results in a larger number of nodes. A single node (which points to multiple records with the same key) in the original tree may correspond to multiple nodes in the resulting tree. Depending on how they are organized, the height of the tree may increase; it might be more than that of the original tree.

11.19 Explain the distinction between closed and open hashing. Discuss the relative merits of each technique in database applications.
Answer: Open hashing may place keys with the same hash-function value in different buckets. Closed hashing always places such keys together in the same bucket. Thus in this case, different buckets can be of different sizes, though the implementation may be by linking together fixed-size buckets using overflow chains. Deletion is difficult with open hashing, as all the buckets may have to be inspected before we can ascertain that a key value has been deleted, whereas in closed hashing only the bucket whose address is obtained by hashing the key value need be inspected. Deletions are more common in databases, and hence closed hashing is more appropriate for them. For a small, static set of data, lookups may be more efficient using open hashing. The symbol table of a compiler would be a good example.

11.20 What are the causes of bucket overflow in a hash file organization? What can be done to reduce the occurrence of bucket overflows?
Answer: The causes of bucket overflow are:
a. Our estimate of the number of records that the relation will have was too low, and hence the number of buckets allotted was not sufficient.
b. Skew in the distribution of records to buckets. This may happen either because there are many records with the same search-key value, or because the hash function chosen did not have the desirable properties of uniformity and randomness.
To reduce the occurrence of overflows, we can:
a. Choose the hash function more carefully, and make better estimates of the relation size.
b. If the estimated size of the relation is n_r and the number of records per block is f_r, allocate (n_r / f_r) * (1 + d) buckets instead of (n_r / f_r) buckets. Here d is a fudge factor, typically around 0.2. Some space is wasted: about 20 percent of the space in the buckets will be empty. But the benefit is that some of the skew is handled and the probability of overflow is reduced.

11.21 Why is a hash structure not the best choice for a search key on which range queries are likely?
Answer: A range query cannot be answered efficiently using a hash index; we would have to read all the buckets. This is because key values in the range do not occupy consecutive locations in the buckets; they are distributed uniformly and randomly throughout all the buckets.

11.22 Suppose there is a relation r(A, B, C), with a B+-tree index with search key (A, B).
a. What is the worst-case cost of finding records satisfying 10 < A < 50 using this index, in terms of the number of records retrieved n1 and the height h of the tree?
b. What is the worst-case cost of finding records satisfying 10 < A < 50 ∧ 5 < B < 10 using this index, in terms of the number of records n2 that satisfy this selection, as well as n1 and h defined above?
c. Under what conditions on n1 and n2 would the index be an efficient way of finding records satisfying 10 < A < 50 ∧ 5 < B < 10?
Answer:
a. This query does not correspond to a range query on the search key, as the condition on the first attribute of the search key is a comparison condition. It looks up records which have the value of A between 10 and 50. However, each record is likely to be in a different block, because of the ordering of records in the file, leading to many I/O operations. In the worst case, for each record the system needs to traverse the whole tree (cost h), so the total cost is n1 * h.
b. This query can be answered by using an ordered index on the search key (A, B). For each value of A that is between 10 and 50, the system locates records with B values between 5 and 10. However, each record is likely to be in a different disk block. This amounts to executing the query based on the condition on A, which costs n1 * h. Then these records are checked to see if the condition on B is satisfied. So the total cost in the worst case is n1 * h.
c. n1 records satisfy the first condition and n2 records satisfy the second condition. When both conditions are applied, n1 records are output in the first stage. So, in the case where n1 = n2, no extra records are output in the first stage. Otherwise, the records which do not satisfy the second condition are also output, with an additional cost of h each (worst case).

11.23 Suppose you have to create a B+-tree index on a large number of names, where the maximum size of a name may be quite large (say 40 characters) and the average name is itself large (say 10 characters). Explain how prefix compression can be used to maximize the average fanout of nonleaf nodes.
Answer: Two problems arise in the given scenario. The first is that names can be of variable length. The second is that names can be long (the maximum is 40 characters), leading to a low fanout and a correspondingly increased tree height. With variable-length search keys, different nodes can have different fanouts even if they are full. The fanout of nodes can be increased by using a technique called prefix compression. With prefix compression, the entire search-key value is not stored at internal nodes; only a prefix of each search key that is sufficient to distinguish between the key values in the subtrees that it separates. The full name can be stored in the leaf nodes; this way we do not lose any information and also maximize the average fanout of internal nodes.
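The prefix-compression idea in 11.23 can be illustrated with a small sketch. The helper below (hypothetical, not from the textbook) finds the shortest prefix of a separator key that still routes correctly: any key that belongs in the left subtree compares less than the prefix, and any key in the right subtree compares greater than or equal to it.

```python
def separator_prefix(left_max: str, right_min: str) -> str:
    """Shortest prefix of right_min that still separates the subtrees:
    every key <= left_max compares less than the prefix, and every
    key >= right_min compares greater than or equal to it."""
    assert left_max < right_min
    for i in range(1, len(right_min) + 1):
        prefix = right_min[:i]
        if prefix > left_max:   # prefix already distinguishes the subtrees
            return prefix
    return right_min            # no shorter prefix suffices

# Instead of storing the full separator "Silverstein" at an internal node:
print(separator_prefix("Silas", "Silverstein"))   # -> "Silv"
```

Storing "Silv" rather than the full 11-character name is exactly what raises the fanout of the internal node, since more such abbreviated separators fit in one block.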
11.24 Suppose a relation is stored in a B+-tree file organization. Suppose secondary indices store record identifiers that are pointers to records on disk.
a. What would be the effect on the secondary indices if a node split happens in the file organization?
b. What would be the cost of updating all affected records in a secondary index?
c. How does using the search key of the file organization as a logical record identifier solve this problem?
d. What is the extra cost due to the use of such logical record identifiers?
Answer:
a. When a leaf page is split in a B+-tree file organization, a number of records are moved to a new page. In such cases, all secondary indices that store pointers to the relocated records would have to be updated, even though the values in the records may not have changed.
b. Each leaf page may contain a fairly large number of records, and each of them may be in different locations in each secondary index. Thus, a leaf-page split may require tens or even hundreds of I/O operations to update all affected secondary indices, making it a very expensive operation.
c. One solution is to store the values of the primary-index search-key attributes in secondary indices, in place of pointers to the indexed records. Relocation of records because of leaf-page splits then does not require any update on any secondary index.
d. Locating a record using the secondary index now requires two steps: first we use the secondary index to find the primary-index search-key values, and then we use the primary index to find the corresponding records. This approach reduces the cost of index update due to file reorganization, although it increases the cost of accessing data using a secondary index.

11.25 Show how to compute existence bitmaps from other bitmaps. Make sure that your technique works even in the presence of null values, by using a bitmap for the value null.
Answer: The existence bitmap for a relation can be calculated by taking the union (logical OR) of all the bitmaps on that attribute, including the bitmap for the value null.

11.26 How does data encryption affect index schemes? In particular, how might it affect schemes that attempt to store data in sorted order?
Answer: Note that indices must operate on the encrypted data, or someone could gain access to the index to interpret the data. Otherwise, the index would have to be restricted so that only certain users could access it. To keep the data in sorted order, the index scheme would have to decrypt the data at each level in a tree. Note that hash systems would not be affected.

11.27 Our description of static hashing assumes that a large contiguous stretch of disk blocks can be allocated to a static hash table. Suppose you can allocate only C contiguous blocks. Suggest how to implement the hash table, if it can be much larger than C blocks. Access to a block should still be efficient.
Answer: A separate list/table as shown below can be created:

    Starting address of the first set of C blocks    C
    Starting address of the next set of C blocks     2C
    ... and so on

Desired block address = starting address (from the table, depending on the block number) + block size * (block number % C)

For each set of C blocks, a single entry is added to the table. In this case, locating a block requires two steps: first we use the block number to find the actual chunk's starting address, and then we can access the desired block.
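The two-step lookup of 11.27 can be sketched in a few lines. The chunk table, block size, and starting addresses below are made-up illustration values, not anything prescribed by the exercise:

```python
BLOCK_SIZE = 4096   # bytes per block (assumed)
C = 8               # contiguous blocks per allocated chunk

# One entry per chunk of C blocks: the chunk's starting disk address.
# These addresses are invented for illustration.
chunk_table = [100_000, 740_000, 2_310_000]

def block_address(block_number: int) -> int:
    """Step 1: pick the chunk from the table; step 2: offset within it."""
    start = chunk_table[block_number // C]
    return start + BLOCK_SIZE * (block_number % C)

# Block 10 lives in chunk 1 (blocks 8..15), two blocks into it:
print(block_address(10))   # -> 740000 + 4096*2 = 748192
```

The extra cost over a single contiguous allocation is just the in-memory table lookup, so access to a block stays efficient.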

English Terminology for Clustering Algorithms


1. Clustering
2. Distance Metric
3. Similarity Metric
4. Pearson Correlation Coefficient
5. Euclidean Distance
6. Manhattan Distance
7. Chebyshev Distance
8. Cosine Similarity
9. Hierarchical Clustering
10. Divisive Clustering
11. Agglomerative Clustering
12. K-Means Clustering
13. Gaussian Mixture Model Clustering
14. Density-Based Clustering
15. DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
16. OPTICS (Ordering Points To Identify the Clustering Structure)
17. Mean Shift
18. Clustering Evaluation Metrics
19. Silhouette Coefficient
20. Calinski-Harabasz Index
21. Davies-Bouldin Index
22. Cluster Center
23. Cluster Radius
24. Noise Point
25. Within-Cluster Variation
26. Between-Cluster Variation
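Several of the distance and similarity measures in the list above reduce to one-line formulas. A minimal sketch, using only the standard library (in practice these come from libraries such as SciPy's `scipy.spatial.distance`):

```python
import math

def euclidean(a, b):
    # straight-line distance: sqrt of summed squared differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # city-block distance: summed absolute differences
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    # largest single-coordinate difference
    return max(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    # cosine of the angle between the two vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

p, q = (0.0, 3.0), (4.0, 0.0)
print(euclidean(p, q))          # -> 5.0
print(manhattan(p, q))          # -> 7.0
print(chebyshev(p, q))          # -> 4.0
print(cosine_similarity(p, q))  # -> 0.0  (orthogonal vectors)
```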

Interview Q&A 2

• You installed a new AD domain and the new (and first) DC has not registered its SRV records in DNS. Name a few possible causes.
• What are the benefits and scenarios of using stub zones?
• What is CIDR?
• You have the following network ID: 192.115.103.64/27. What is the IP range for your network?
• You have the following network ID: 131.112.0.0. You need at least 500 hosts per network. How many networks can you create? What subnet mask will you use?
• What is DHCPINFORM?
• Describe the integration between DHCP and DNS.
• What options in DHCP do you regularly use for an MS network?
• What are user classes and vendor classes in DHCP?
• What could cause the Forwarders and Root Hints to be grayed out?
• What is a "single-label domain name" and what sort of issues can it cause?
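The two subnetting questions above can be checked mechanically with Python's standard `ipaddress` module (a worked sketch of the arithmetic, not the expected interview answer):

```python
import ipaddress

# 192.115.103.64/27: 27 network bits leave 5 host bits (32 - 27 = 5),
# i.e. 2**5 - 2 = 30 usable host addresses.
net = ipaddress.ip_network("192.115.103.64/27")
hosts = list(net.hosts())
print(hosts[0], "-", hosts[-1])   # -> 192.115.103.65 - 192.115.103.94
print(net.broadcast_address)      # -> 192.115.103.95

# 131.112.0.0 with at least 500 hosts per network: 9 host bits are
# needed (2**9 - 2 = 510 >= 500), giving /23 (mask 255.255.254.0).
# Subnetting the /16 into /23s yields 2**(23 - 16) = 128 networks.
subnets = list(ipaddress.ip_network("131.112.0.0/16").subnets(new_prefix=23))
print(len(subnets))               # -> 128
print(subnets[0].netmask)         # -> 255.255.254.0
```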

TEM-8 English Reading


英语专业八级考试TEM-8阅读理解练习册(1)(英语专业2012级)UNIT 1Text AEvery minute of every day, what ecologist生态学家James Carlton calls a global ―conveyor belt‖, redistributes ocean organisms生物.It’s planetwide biological disruption生物的破坏that scientists have barely begun to understand.Dr. Carlton —an oceanographer at Williams College in Williamstown,Mass.—explains that, at any given moment, ―There are several thousand marine species traveling… in the ballast water of ships.‖ These creatures move from coastal waters where they fit into the local web of life to places where some of them could tear that web apart. This is the larger dimension of the infamous无耻的,邪恶的invasion of fish-destroying, pipe-clogging zebra mussels有斑马纹的贻贝.Such voracious贪婪的invaders at least make their presence known. What concerns Carlton and his fellow marine ecologists is the lack of knowledge about the hundreds of alien invaders that quietly enter coastal waters around the world every day. Many of them probably just die out. Some benignly亲切地,仁慈地—or even beneficially — join the local scene. But some will make trouble.In one sense, this is an old story. Organisms have ridden ships for centuries. They have clung to hulls and come along with cargo. What’s new is the scale and speed of the migrations made possible by the massive volume of ship-ballast water压载水— taken in to provide ship stability—continuously moving around the world…Ships load up with ballast water and its inhabitants in coastal waters of one port and dump the ballast in another port that may be thousands of kilometers away. A single load can run to hundreds of gallons. Some larger ships take on as much as 40 million gallons. The creatures that come along tend to be in their larva free-floating stage. When discharged排出in alien waters they can mature into crabs, jellyfish水母, slugs鼻涕虫,蛞蝓, and many other forms.Since the problem involves coastal species, simply banning ballast dumps in coastal waters would, in theory, solve it. 
Coastal organisms in ballast water that is flushed into midocean would not survive. Such a ban has worked for North American Inland Waterway. But it would be hard to enforce it worldwide. Heating ballast water or straining it should also halt the species spread. But before any such worldwide regulations were imposed, scientists would need a clearer view of what is going on.The continuous shuffling洗牌of marine organisms has changed the biology of the sea on a global scale. It can have devastating effects as in the case of the American comb jellyfish that recently invaded the Black Sea. It has destroyed that sea’s anchovy鳀鱼fishery by eating anchovy eggs. It may soon spread to western and northern European waters.The maritime nations that created the biological ―conveyor belt‖ should support a coordinated international effort to find out what is going on and what should be done about it. (456 words)1.According to Dr. Carlton, ocean organism‟s are_______.A.being moved to new environmentsB.destroying the planetC.succumbing to the zebra musselD.developing alien characteristics2.Oceanographers海洋学家are concerned because_________.A.their knowledge of this phenomenon is limitedB.they believe the oceans are dyingC.they fear an invasion from outer-spaceD.they have identified thousands of alien webs3.According to marine ecologists, transplanted marinespecies____________.A.may upset the ecosystems of coastal watersB.are all compatible with one anotherC.can only survive in their home watersD.sometimes disrupt shipping lanes4.The identified cause of the problem is_______.A.the rapidity with which larvae matureB. a common practice of the shipping industryC. 
a centuries old speciesD.the world wide movement of ocean currents5.The article suggests that a solution to the problem__________.A.is unlikely to be identifiedB.must precede further researchC.is hypothetically假设地,假想地easyD.will limit global shippingText BNew …Endangered‟ List Targets Many US RiversIt is hard to think of a major natural resource or pollution issue in North America today that does not affect rivers.Farm chemical runoff残渣, industrial waste, urban storm sewers, sewage treatment, mining, logging, grazing放牧,military bases, residential and business development, hydropower水力发电,loss of wetlands. The list goes on.Legislation like the Clean Water Act and Wild and Scenic Rivers Act have provided some protection, but threats continue.The Environmental Protection Agency (EPA) reported yesterday that an assessment of 642,000 miles of rivers and streams showed 34 percent in less than good condition. In a major study of the Clean Water Act, the Natural Resources Defense Council last fall reported that poison runoff impairs损害more than 125,000 miles of rivers.More recently, the NRDC and Izaak Walton League warned that pollution and loss of wetlands—made worse by last year’s flooding—is degrading恶化the Mississippi River ecosystem.On Tuesday, the conservation group保护组织American Rivers issued its annual list of 10 ―endangered‖ and 20 ―threatened‖ rivers in 32 states, the District of Colombia, and Canada.At the top of the list is the Clarks Fork of the Yellowstone River, whereCanadian mining firms plan to build a 74-acre英亩reservoir水库,蓄水池as part of a gold mine less than three miles from Yellowstone National Park. The reservoir would hold the runoff from the sulfuric acid 硫酸used to extract gold from crushed rock.―In the event this tailings pond failed, the impact to th e greater Yellowstone ecosystem would be cataclysmic大变动的,灾难性的and the damage irreversible不可逆转的.‖ Sen. 
Max Baucus of Montana, chairman of the Environment and Public Works Committee, wrote to Noranda Minerals Inc., an owner of the ― New World Mine‖.Last fall, an EPA official expressed concern about the mine and its potential impact, especially the plastic-lined storage reservoir. ― I am unaware of any studies evaluating how a tailings pond尾矿池,残渣池could be maintained to ensure its structural integrity forev er,‖ said Stephen Hoffman, chief of the EPA’s Mining Waste Section. ―It is my opinion that underwater disposal of tailings at New World may present a potentially significant threat to human health and the environment.‖The results of an environmental-impact statement, now being drafted by the Forest Service and Montana Department of State Lands, could determine the mine’s future…In its recent proposal to reauthorize the Clean Water Act, the Clinton administration noted ―dramatically improved water quality since 1972,‖ when the act was passed. But it also reported that 30 percent of riverscontinue to be degraded, mainly by silt泥沙and nutrients from farm and urban runoff, combined sewer overflows, and municipal sewage城市污水. Bottom sediments沉积物are contaminated污染in more than 1,000 waterways, the administration reported in releasing its proposal in January. Between 60 and 80 percent of riparian corridors (riverbank lands) have been degraded.As with endangered species and their habitats in forests and deserts, the complexity of ecosystems is seen in rivers and the effects of development----beyond the obvious threats of industrial pollution, municipal waste, and in-stream diversions改道to slake消除the thirst of new communities in dry regions like the Southwes t…While there are many political hurdles障碍ahead, reauthorization of the Clean Water Act this year holds promise for US rivers. Rep. 
Norm Mineta of California, who chairs the House Committee overseeing the bill, calls it ―probably the most important env ironmental legislation this Congress will enact.‖ (553 words)6.According to the passage, the Clean Water Act______.A.has been ineffectiveB.will definitely be renewedC.has never been evaluatedD.was enacted some 30 years ago7.“Endangered” rivers are _________.A.catalogued annuallyB.less polluted than ―threatened rivers‖C.caused by floodingD.adjacent to large cities8.The “cataclysmic” event referred to in paragraph eight would be__________.A. fortuitous偶然的,意外的B. adventitious外加的,偶然的C. catastrophicD. precarious不稳定的,危险的9. The owners of the New World Mine appear to be______.A. ecologically aware of the impact of miningB. determined to construct a safe tailings pondC. indifferent to the concerns voiced by the EPAD. willing to relocate operations10. The passage conveys the impression that_______.A. Canadians are disinterested in natural resourcesB. private and public environmental groups aboundC. river banks are erodingD. the majority of US rivers are in poor conditionText CA classic series of experiments to determine the effects ofoverpopulation on communities of rats was reported in February of 1962 in an article in Scientific American. The experiments were conducted by a psychologist, John B. Calhoun and his associates. In each of these experiments, an equal number of male and female adult rats were placed in an enclosure and given an adequate supply of food, water, and other necessities. The rat populations were allowed to increase. Calhoun knew from experience approximately how many rats could live in the enclosures without experiencing stress due to overcrowding. He allowed the population to increase to approximately twice this number. Then he stabilized the population by removing offspring that were not dependent on their mothers. He and his associates then carefully observed and recorded behavior in these overpopulated communities. 
At the end of their experiments, Calhoun and his associates were able to conclude that overcrowding causes a breakdown in the normal social relationships among rats, a kind of social disease. The rats in the experiments did not follow the same patterns of behavior as rats would in a community without overcrowding.The females in the rat population were the most seriously affected by the high population density: They showed deviant异常的maternal behavior; they did not behave as mother rats normally do. In fact, many of the pups幼兽,幼崽, as rat babies are called, died as a result of poor maternal care. For example, mothers sometimes abandoned their pups,and, without their mothers' care, the pups died. Under normal conditions, a mother rat would not leave her pups alone to die. However, the experiments verified that in overpopulated communities, mother rats do not behave normally. Their behavior may be considered pathologically 病理上,病理学地diseased.The dominant males in the rat population were the least affected by overpopulation. Each of these strong males claimed an area of the enclosure as his own. Therefore, these individuals did not experience the overcrowding in the same way as the other rats did. The fact that the dominant males had adequate space in which to live may explain why they were not as seriously affected by overpopulation as the other rats. However, dominant males did behave pathologically at times. Their antisocial behavior consisted of attacks on weaker male,female, and immature rats. This deviant behavior showed that even though the dominant males had enough living space, they too were affected by the general overcrowding in the enclosure.Non-dominant males in the experimental rat communities also exhibited deviant social behavior. Some withdrew completely; they moved very little and ate and drank at times when the other rats were sleeping in order to avoid contact with them. 
Other non-dominant males were hyperactive; they were much more active than is normal, chasing other rats and fighting each other. This segment of the rat population, likeall the other parts, was affected by the overpopulation.The behavior of the non-dominant males and of the other components of the rat population has parallels in human behavior. People in densely populated areas exhibit deviant behavior similar to that of the rats in Calhoun's experiments. In large urban areas such as New York City, London, Mexican City, and Cairo, there are abandoned children. There are cruel, powerful individuals, both men and women. There are also people who withdraw and people who become hyperactive. The quantity of other forms of social pathology such as murder, rape, and robbery also frequently occur in densely populated human communities. Is the principal cause of these disorders overpopulation? Calhoun’s experiments suggest that it might be. In any case, social scientists and city planners have been influenced by the results of this series of experiments.11. Paragraph l is organized according to__________.A. reasonsB. descriptionC. examplesD. definition12.Calhoun stabilized the rat population_________.A. when it was double the number that could live in the enclosure without stressB. by removing young ratsC. at a constant number of adult rats in the enclosureD. all of the above are correct13.W hich of the following inferences CANNOT be made from theinformation inPara. 1?A. Calhoun's experiment is still considered important today.B. Overpopulation causes pathological behavior in rat populations.C. Stress does not occur in rat communities unless there is overcrowding.D. Calhoun had experimented with rats before.14. Which of the following behavior didn‟t happen in this experiment?A. All the male rats exhibited pathological behavior.B. Mother rats abandoned their pups.C. Female rats showed deviant maternal behavior.D. Mother rats left their rat babies alone.15. 
The main idea of the paragraph three is that __________.A. dominant males had adequate living spaceB. dominant males were not as seriously affected by overcrowding as the otherratsC. dominant males attacked weaker ratsD. the strongest males are always able to adapt to bad conditionsText DThe first mention of slavery in the statutes法令,法规of the English colonies of North America does not occur until after 1660—some forty years after the importation of the first Black people. Lest we think that existed in fact before it did in law, Oscar and Mary Handlin assure us, that the status of B lack people down to the 1660’s was that of servants. A critique批判of the Handlins’ interpretation of why legal slavery did not appear until the 1660’s suggests that assumptions about the relation between slavery and racial prejudice should be reexamined, and that explanation for the different treatment of Black slaves in North and South America should be expanded.The Handlins explain the appearance of legal slavery by arguing that, during the 1660’s, the position of white servants was improving relative to that of black servants. Thus, the Handlins contend, Black and White servants, heretofore treated alike, each attained a different status. There are, however, important objections to this argument. First, the Handlins cannot adequately demonstrate that t he White servant’s position was improving, during and after the 1660’s; several acts of the Maryland and Virginia legislatures indicate otherwise. Another flaw in the Handlins’ interpretation is their assumption that prior to the establishment of legal slavery there was no discrimination against Black people. It is true that before the 1660’s Black people were rarely called slaves. But this shouldnot overshadow evidence from the 1630’s on that points to racial discrimination without using the term slavery. 
Such discrimination sometimes stopped short of lifetime servitude or inherited status—the two attributes of true slavery—yet in other cases it included both. The Handlins’ argument excludes the real possibility that Black people in the English colonies were never treated as the equals of White people.The possibility has important ramifications后果,影响.If from the outset Black people were discriminated against, then legal slavery should be viewed as a reflection and an extension of racial prejudice rather than, as many historians including the Handlins have argued, the cause of prejudice. In addition, the existence of discrimination before the advent of legal slavery offers a further explanation for the harsher treatment of Black slaves in North than in South America. Freyre and Tannenbaum have rightly argued that the lack of certain traditions in North America—such as a Roman conception of slavery and a Roman Catholic emphasis on equality— explains why the treatment of Black slaves was more severe there than in the Spanish and Portuguese colonies of South America. But this cannot be the whole explanation since it is merely negative, based only on a lack of something. A more compelling令人信服的explanation is that the early and sometimes extreme racial discrimination in the English colonies helped determine the particular nature of the slavery that followed. (462 words)16. Which of the following is the most logical inference to be drawn from the passage about the effects of “several acts of the Maryland and Virginia legislatures” (Para.2) passed during and after the 1660‟s?A. The acts negatively affected the pre-1660’s position of Black as wellas of White servants.B. The acts had the effect of impairing rather than improving theposition of White servants relative to what it had been before the 1660’s.C. The acts had a different effect on the position of white servants thandid many of the acts passed during this time by the legislatures of other colonies.D. 
The acts, at the very least, caused the position of White servants to remain no better than it had been before the 1660's.

17. With which of the following statements regarding the status of Black people in the English colonies of North America before the 1660's would the author be LEAST likely to agree?
A. Although black people were not legally considered to be slaves, they were often called slaves.
B. Although subject to some discrimination, black people had a higher legal status than they did after the 1660's.
C. Although sometimes subject to lifetime servitude, black people were not legally considered to be slaves.
D. Although often not treated the same as White people, black people, like many white people, possessed the legal status of servants.

18. According to the passage, the Handlins have argued which of the following about the relationship between racial prejudice and the institution of legal slavery in the English colonies of North America?
A. Racial prejudice and the institution of slavery arose simultaneously.
B. Racial prejudice most often took the form of the imposition of inherited status, one of the attributes of slavery.
C. The source of racial prejudice was the institution of slavery.
D. Because of the influence of the Roman Catholic Church, racial prejudice sometimes did not result in slavery.

19. The passage suggests that the existence of a Roman conception of slavery in Spanish and Portuguese colonies had the effect of _________.
A. extending rather than causing racial prejudice in these colonies
B. hastening the legalization of slavery in these colonies
C. mitigating some of the conditions of slavery for black people in these colonies
D. delaying the introduction of slavery into the English colonies

20. The author considers the explanation put forward by Freyre and Tannenbaum for the treatment accorded Black slaves in the English colonies of North America to be _____________.
A. ambitious but misguided
B. valid (有根据的) but limited
C. popular but suspect
D.
anachronistic (过时的, 时代错误的) and controversial

UNIT 2
Text A
The sea lay like an unbroken mirror all around the pine-girt, lonely shores of Orr's Island. Tall, kingly spruces wore their regal (王室的) crowns of cones high in air, sparkling with diamonds of clear exuded gum (流出的树胶); vast old hemlocks (铁杉) of primeval (原始的) growth stood darkling in their forest shadows, their branches hung with long hoary moss (久远的青苔); while feathery larches (羽毛般的落叶松), turned to brilliant gold by autumn frosts, lighted up the darker shadows of the evergreens. It was one of those hazy (朦胧的), calm, dissolving days of Indian summer, when everything is so quiet that the faintest kiss of the wave on the beach can be heard, and white clouds seem to faint into the blue of the sky, and soft swathing (一长条) bands of violet vapor make all earth look dreamy, and give to the sharp, clear-cut outlines of the northern landscape all those mysteries of light and shade which impart such tenderness to Italian scenery.
The funeral was over—the tread (踏) of many feet, bearing the heavy burden of two broken lives, had been to the lonely graveyard, and had come back again—each footstep lighter and more unconstrained (不受拘束的) as each one went his way from the great old tragedy of Death to the common cheerful of Life.
The solemn black clock stood swaying with its eternal "tick-tock, tick-tock," in the kitchen of the brown house on Orr's Island. There was there that sense of a stillness that can be felt—such as settles down on a dwelling (住处) when any of its inmates have passed through its doors for the last time, to go whence they shall not return. The best room was shut up and darkened, with only so much light as could fall through a little heart-shaped hole in the window-shutter—for except on solemn visits, or prayer-meetings or weddings, or funerals, that room formed no part of the daily family scenery.
The kitchen was clean and ample, with a hearth (灶台) and oven on one side, and rows of old-fashioned splint-bottomed chairs against the wall.
A table scoured to snowy whiteness, and a little work-stand whereon lay the Bible, the Missionary Herald, and the Weekly Christian Mirror, before named, formed the principal furniture. One feature, however, must not be forgotten—a great sea-chest (水手用的储物箱), which had been the companion of Zephaniah through all the countries of the earth. Old, and battered (破旧的, 磨损的), and unsightly (难看的) it looked, yet report said that there was good store within, which men for the most part respect more than anything else; and indeed it proved often, when a deed of grace was to be done—when a woman was suddenly made a widow in a coast gale (大风, 狂风), or a fishing-smack (小渔船) was run down in the fogs off the banks, leaving in some neighboring cottage a family of orphans—in all such cases, the opening of this sea-chest was an event of good omen (预兆) to the bereaved (丧亲者); for Zephaniah had a large heart and a large hand, and was apt (有…的倾向) to take it out full of silver dollars when once it went in. So the ark of the covenant (约柜) could not have been looked on with more reverence (崇敬) than the neighbours usually showed to Captain Pennel's sea-chest.

1.
The author describes Orr's Island in a(n) ______ way.
A. emotionally appealing, imaginative
B. rational, logically precise
C. factually detailed, objective
D. vague, uncertain

2. According to the passage, the "best room" _____.
A. has its many windows boarded up
B. has had the furniture removed
C. is used only on formal and ceremonious occasions
D. is the busiest room in the house

3. From the description of the kitchen we can infer that the house belongs to people who _____.
A. never have guests
B. like modern appliances
C. are probably religious
D. dislike housework

4. The passage implies that _______.
A. few people attended the funeral
B. fishing is a secure vocation
C. the island is densely populated
D. the house belonged to the deceased

5. From the description of Zephaniah we can see that he _________.
A. was physically a very big man
B. preferred the lonely life of a sailor
C. always stayed at home
D. was frugal and saved a lot

Text B
Basic to any understanding of Canada in the 20 years after the Second World War is the country's impressive population growth. For every three Canadians in 1945, there were over five in 1966. In September 1966 Canada's population passed the 20 million mark. Most of this surging growth came from natural increase. The depression of the 1930s and the war had held back marriages, and the catching-up process began after 1945. The baby boom continued through the decade of the 1950s, producing a population increase of nearly fifteen percent in the five years from 1951 to 1956. This rate of increase had been exceeded only once before in Canada's history, in the decade before 1911, when the prairies were being settled. Undoubtedly, the good economic conditions of the 1950s supported a growth in the population, but the expansion also derived from a trend toward earlier marriages and an increase in the average size of families. In 1957 the Canadian birth rate stood at 28 per thousand, one of the highest in the world. After the peak year of 1957, the birth rate in Canada began to decline.
It continued falling until in 1966 it stood at the lowest level in 25 years. Partly this decline reflected the low level of births during the depression and the war, but it was also caused by changes in Canadian society. Young people were staying at school longer, more women were working, young married couples were buying automobiles or houses before starting families, and rising living standards were cutting down the size of families. It appeared that Canada was once more falling in step with the trend toward smaller families that had occurred all through the Western world since the time of the Industrial Revolution. Although the growth in Canada's population had slowed down by 1966 (the increase in the first half of the 1960s was only nine percent), another large population wave was coming over the horizon. It would be composed of the children of the children who were born during the period of the high birth rate prior to 1957.

6. What does the passage mainly discuss?
A. Educational changes in Canadian society.
B. Canada during the Second World War.
C. Population trends in postwar Canada.
D. Standards of living in Canada.

7. According to the passage, when did Canada's baby boom begin?
A. In the decade after 1911.
B. After 1945.
C. During the depression of the 1930s.
D. In 1966.

8. The author suggests that in Canada during the 1950s ____________.
A. the urban population decreased rapidly
B. fewer people married
C. economic conditions were poor
D. the birth rate was very high

9. When was the birth rate in Canada at its lowest postwar level?
A. 1966.
B. 1957.
C. 1956.
D. 1951.

10. The author mentions all of the following as causes of declines in population growth after 1957 EXCEPT _________________.
A. people being better educated
B. people getting married earlier
C. better standards of living
D. couples buying houses

11. It can be inferred from the passage that before the Industrial Revolution _______________.
A. families were larger
B. population statistics were unreliable
C.
the population grew steadily
D. economic conditions were bad

Text C
I was just a boy when my father brought me to Harlem for the first time, almost 50 years ago. We stayed at the Hotel Theresa, a grand brick structure at 125th Street and Seventh Avenue. Once, in the hotel restaurant, my father pointed out Joe Louis. He even got Mr. Brown, the hotel manager, to introduce me to him, a bit punchy but still champ as far as I was concerned.
Much has changed since then. Business and real estate are booming. Some say a new renaissance is under way. Others decry (责难) what they see as outside forces running roughshod (肆意践踏) over the old Harlem. New York meant Harlem to me, and as a young man I visited it whenever I could. But many of my old haunts are gone. The Theresa shut down in 1966. National chains that once ignored Harlem now anticipate yuppie money and want pieces of this prime Manhattan real estate. So here I am on a hot August afternoon, sitting in a Starbucks that two years ago opened a block away from the Theresa, snatching (抓取, 攫取) at memories between sips of high-priced coffee. I am about to open up a piece of the old Harlem—the New York Amsterdam News—when a tourist…

SIMATIC Energy Manager PRO V7.2 - Operation

2 Energy Manager PRO Client................................................................................................................. 19
2.1 Basics ................................................ 19
2.1.1 Start Energy Manager ............................... 19
2.1.2 Client as navigation tool .......................... 23
2.1.3 Basic configuration ................................ 25
2.1.4 Search for object .................................. 31
2.1.5 Quicklinks ......................................... 33
2.1.5.1 Create Quicklinks ................................ 33
2.1.5.2 Editing Quicklinks ............................... 35
2.1.6 Help ............................................... 38

latent class model


Latent Class Models

by Jay Magidson, Ph.D., Statistical Innovations Inc.
and Jeroen K. Vermunt, Ph.D., Tilburg University, the Netherlands

Over the past several years, more significant books have been published on latent class and other types of finite mixture models than on any other class of statistical models. The recent increase in interest in latent class models is due to the development of extended computer algorithms, which allow today's computers to perform latent class analysis on data containing more than just a few variables. In addition, researchers are realizing that the use of latent class models can yield powerful improvements over traditional approaches to cluster, factor, and regression/segmentation analysis, as well as to multivariate biplots and related graphical displays.

What are Latent Class Models?

Traditional models used in regression, discriminant and log-linear analysis contain parameters that describe only relationships between the observed variables. Latent class (LC) models (also known as finite mixture models) differ from these by including one or more discrete unobserved variables. In the context of marketing research, one will typically interpret the categories of these latent variables, the latent classes, as clusters or segments (Dillon and Kumar 1994; Wedel and Kamakura 1998). In fact, LC analysis provides a powerful new tool to identify important market segments in target marketing. LC models do not rely on the traditional modeling assumptions which are often violated in practice (linear relationships, normal distributions, homogeneity). Hence, they are less subject to biases associated with data not conforming to model assumptions. In addition, LC models have recently been extended (Vermunt and Magidson, 2000a, 2000b) to include variables of mixed scale types (nominal, ordinal, continuous and/or count variables) in the same analysis.
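In its basic form, the LC model expresses the joint distribution of J observed responses as a finite mixture over K latent classes, with responses assumed independent within each class (the local independence assumption):

```latex
P(y_1,\dots,y_J) \;=\; \sum_{k=1}^{K} \pi_k \prod_{j=1}^{J} P(y_j \mid \mathrm{class}=k),
\qquad \sum_{k=1}^{K} \pi_k = 1 .
```

Here the class sizes \(\pi_k\) and the conditional response probabilities are estimated from the data, and each case's class-membership probabilities then follow from Bayes' rule.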
Also, for improved cluster or segment description, the relationship between the latent classes and external variables (covariates) can be assessed simultaneously with the identification of the clusters. This eliminates the need for the usual second stage of analysis, where a discriminant analysis is performed to relate the cluster results to demographic and other variables.

Kinds of Latent Class Models

Three common statistical application areas of LC analysis are those that involve:
1) clustering of cases,
2) variable reduction and scale construction, and
3) prediction.

This paper introduces the three major kinds of LC models:
• LC Cluster Models,
• LC Factor Models,
• LC Regression Models.

Our illustrative examples make use of the new computer program (Vermunt and Magidson, 2000b) called Latent GOLD®.

LC Cluster Models

The LC Cluster model:
• identifies clusters which group together persons (cases) who share similar interests/values/characteristics/behavior,
• includes a K-category latent variable, each category representing a cluster.

Advantages over traditional types of cluster analysis include:
• probability-based classification: cases are classified into clusters based upon membership probabilities estimated directly from the model,
• variables may be continuous, categorical (nominal or ordinal), or counts, or any combination of these,
• demographics and other covariates can be used for cluster description.

Typical marketing applications include:
• exploratory data analysis,
• development of behavior-based and other segmentations of customers and prospects.

Traditional clustering approaches utilize unsupervised classification algorithms that group cases together that are "near" each other according to some ad hoc definition of "distance". In the last decade interest has shifted towards model-based approaches, which use estimated membership probabilities to classify cases into the appropriate cluster.
The most popular model-based approach is known as mixture-model clustering, where each latent class represents a hidden cluster (McLachlan and Basford, 1988). Within the marketing research field, this method is sometimes referred to as "latent discriminant analysis" (Dillon and Mulani, 1989). Today's high-speed computers make these computationally intensive methods practical.

For the general finite mixture model, not only continuous variables, but also variables that are ordinal, nominal or counts, or any combination of these, can be included. Also, covariates can be included for improved cluster description.

As an example, we used the LC cluster model to develop a segmentation of current bank customers based upon the types of accounts they have. Separate models were developed specifying different numbers of clusters, and the model selected was the one that had the lowest BIC statistic. This criterion resulted in 4 segments, which were named:
1) Value Seekers (15% of customers),
2) Conservative Savers (35% of customers),
3) Mainstreamers (40% of customers),
4) Investors (10% of customers).

For each customer, the model gave estimated membership probabilities for each segment based on their account mix. The resulting segments were verified to be very homogeneous and to differ substantially from each other, not only with respect to their mix of accounts, but also with respect to demographics and profitability. In addition, examination of survey data among the sample of customers for which customer satisfaction data were obtained found some important attitudinal and satisfaction differences between the segments as well. Value Seekers were youngest, and a high percentage were new customers. Conservative Savers were oldest. Investors were the most profitable customer segment by far: although only 10% of all customers, they accounted for over 30% of the bank's deposits.
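Latent GOLD itself is a commercial program, but the mixture-model clustering idea above can be sketched in a few lines of code. The following is a minimal, illustrative EM fit of a latent class model with binary indicators (think of account-ownership flags); the data are simulated and all names are hypothetical, and the fit is run from several random starts to guard against local maxima.

```python
import numpy as np

def lc_em(X, K, n_iter=200, seed=0):
    """One EM run for a latent class model with binary indicators.
    X is an (n, J) 0/1 array; K is the number of latent classes."""
    rng = np.random.default_rng(seed)
    n, J = X.shape
    pi = np.full(K, 1.0 / K)                   # class sizes
    theta = rng.uniform(0.25, 0.75, (K, J))    # P(item j = 1 | class k)
    for _ in range(n_iter):
        # E-step: posterior membership probabilities via Bayes' rule
        logp = (np.log(pi) + X @ np.log(theta).T
                + (1 - X) @ np.log(1 - theta).T)        # shape (n, K)
        m = logp.max(axis=1, keepdims=True)
        post = np.exp(logp - m)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: update class sizes and item probabilities
        nk = post.sum(axis=0)
        pi = nk / n
        theta = np.clip((post.T @ X) / nk[:, None], 1e-6, 1 - 1e-6)
    ll = (m.squeeze(1) + np.log(np.exp(logp - m).sum(axis=1))).sum()
    return ll, pi, theta, post

def fit_lc(X, K, n_starts=10):
    """Keep the best of several randomly started EM runs."""
    return max((lc_em(X, K, seed=s) for s in range(n_starts)),
               key=lambda r: r[0])

# Simulated "account mix" data: two hidden customer segments
rng = np.random.default_rng(1)
true_theta = np.array([[0.9, 0.8, 0.1, 0.1],    # segment A item probabilities
                       [0.1, 0.2, 0.9, 0.8]])   # segment B item probabilities
z = rng.integers(0, 2, 600)                     # hidden segment labels
X = (rng.random((600, 4)) < true_theta[z]).astype(int)

ll, pi, theta, post = fit_lc(X, K=2)
```

Each row of `post` gives the estimated membership probabilities described in the bank example; classifying each case to its highest-probability class yields the hard segmentation.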
Survey data pinpointed the areas of the bank with which this segment was least satisfied, and a LC regression model (see below) on follow-up data related their dissatisfaction to attrition. The primary use of the survey data was to identify reasons for low satisfaction and to develop strategies for improving satisfaction in a manner that increased retention.

This methodology of segmenting based on behavioral information available on all customers offers many advantages over the common practice of developing segments from survey data and then attempting to allocate all customers to the different clusters. Advantages of developing a segmentation based on behavioral data include:
• past behavior is known to be the best predictor of future behavior,
• all customers can be assigned to a segment directly, not just the sample for which survey data is available,
• improved reliability over segmentations based on attitudes, demographics, purchase intent and other survey variables (when segment membership is based on survey data, a large amount of classification error is almost always present for non-surveyed customers).

LC Factor Models

The LC Factor model:
• identifies factors which group together variables sharing a common source of variation,
• can include several ordinal latent variables, each of which contains 2 or more levels,
• is similar to maximum likelihood factor analysis in that its use may be exploratory or confirmatory, and factors may be assumed to be correlated or uncorrelated (orthogonal).

Advantages over traditional factor analysis are:
• factors need not be rotated to be interpretable,
• factor scores are obtained directly from the model without imposing additional assumptions,
• variables may be continuous, categorical (nominal or ordinal), or counts, or any combination of these,
• extended factor models can be estimated that include covariates and correlated residuals.

Typical marketing applications include:
• development of composite variables from attitudinal survey items,
• development of perceptual maps and other kinds of biplots which relate product and brand usage to behavioral and attitudinal measures and to demographics,
• estimation of factor scores,
• direct conversion from factors to segments.

The conversion of ordinal factors to segments is straightforward. For example, consider a model containing 2 dichotomous factors. In this case, the LC factor model provides membership classification probabilities directly for 4 clusters (segments) based on the classification of cases as high vs. low on each factor: segment 1 = (low, low); segment 2 = (low, high); segment 3 = (high, low); and segment 4 = (high, high). Magidson and Vermunt (2000) found that LC factor models specifying uncorrelated factors often fit data better than comparable cluster models (i.e., cluster models containing the same number of parameters).

Figure 1 provides a bi-plot in 2-factor space of lifestyle interests, where the horizontal axis represents the probability of being high on factor 1 and the vertical axis the probability of being high on factor 2. The variable AGE was included directly in the LC Factor model as a covariate and therefore shows up in the bi-plot to assist in understanding the meaning of the factors. For example, we see that persons aged 65+ are most likely to be in the (low, high) segment, as are persons expressing an interest in sewing. As a group, their (mean) factor scores are (Factor 1, Factor 2) = (.06, .67). Since these factor scores have a distinct probabilistic interpretation, this bi-plot represents an improvement over traditional biplots and perceptual maps (see Magidson and Vermunt 2000). Individual cases can also be plotted based on their factor scores.

Figure 1: Bi-plot for life-style data

The factor model can also be used to deal with measurement and classification errors in categorical variables.
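The factor-to-segment conversion described above is just a product of independent membership probabilities. A toy sketch (the probabilities here are made up purely for illustration):

```python
import numpy as np

# P(high on factor 1) and P(high on factor 2) for three hypothetical cases
p1 = np.array([0.10, 0.80, 0.45])
p2 = np.array([0.70, 0.20, 0.55])

# With uncorrelated dichotomous factors, the four segment memberships are
# products of the marginal probabilities, in the order
# (low, low), (low, high), (high, low), (high, high):
segments = np.column_stack([(1 - p1) * (1 - p2),
                            (1 - p1) * p2,
                            p1 * (1 - p2),
                            p1 * p2])

most_likely = segments.argmax(axis=1)   # hard classification per case
```

The rows of `segments` sum to 1, and (`p1`, `p2`) are exactly the coordinates at which a case would be plotted in the bi-plot of Figure 1.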
The LC factor model is actually equivalent to a latent trait (IRT) model without the requirement that the traits be normally distributed.

LC Regression Models

The LC Regression model, also known as the LC Segmentation model:
• is used to predict a dependent variable as a function of predictors,
• includes an R-category latent variable, each category representing a homogeneous population (class, segment),
• estimates a different regression for each population (for each latent segment),
• classifies cases into segments and develops regression models simultaneously.

Advantages over traditional regression models include:
• relaxing the traditional assumption that the same model holds for all cases (R = 1) allows the development of separate regressions to be used to target each segment,
• diagnostic statistics are available to determine the value for R,
• for R > 1, covariates can be included in the model to improve classification of each case into the most likely segment.

Typical marketing applications include:
• customer satisfaction studies: identify particular determinants of customer satisfaction that are appropriate for each customer segment,
• conjoint studies: identify the mix of product attributes that appeal to different market segments,
• more generally: identify segments that differ from each other with respect to some dependent variable criterion.

Like traditional regression modeling, LC regression requires a computer program. As LC regression modeling is relatively new, very few programs currently exist. Our comparisons between LC regression and traditional linear regression are based on the particular forms of LC regression that are implemented in the Latent GOLD® program. For other software see Wedel and DeSarbo (1994) and Wedel and Kamakura (1998).
Typical regression programs utilize ordinary least squares estimation in conjunction with a linear model. In particular, such programs are based on two restrictive assumptions about data that are often violated in practice:
1) the dependent variable is continuous, with prediction error normally distributed,
2) the population is homogeneous: one model holds for all cases.

LC regression as implemented in the Latent GOLD® program relaxes these assumptions:
1) it accommodates dependent variables that are continuous, categorical (binary, polytomous nominal or ordinal), binomial counts, or Poisson counts,
2) the population need not be homogeneous (i.e., there may be multiple populations, as determined by the BIC statistic).

One potential drawback of LC models is that there is no guarantee that the solution will be the maximum likelihood solution. LC computer programs typically employ the EM or Newton-Raphson algorithm, which may converge to a local as opposed to a global maximum. Some programs provide randomized starting values that allow users to increase the likelihood of converging to a global solution by starting the algorithm at different randomly generated starting places. An additional approach is to use Bayesian prior information in conjunction with randomized starting values, which eliminates the possibility of obtaining boundary (extreme) solutions and reduces the chance of obtaining local solutions. Generally speaking, we have achieved good results using 10 randomized starting values and small Bayes constants (the default option in the Latent GOLD program).

In addition to using predictors to estimate a separate regression model for each class, covariates can be specified to refine class descriptions and improve classification of cases into the appropriate latent classes.
In this case, LC regression analysis consists of 3 simultaneous steps:
1) identify latent classes or hidden segments,
2) use demographic and other covariates to predict class membership, and
3) classify cases into the appropriate classes/segments.

Dependent variables may also include repeated/correlated observations of the kind often collected in conjoint marketing studies, where persons are asked to rate different product profiles. Below is an example of a full factorial conjoint study designed to assist in the determination of the mix of product attributes for a new product.

Conjoint Case Study

In this example, 400 persons were asked to rate each of 8 different attribute combinations regarding their likelihood to purchase. Hence, there are 8 records per case, one record for each cell in this 2x2x2 conjoint design based on the following attributes:
• FASHION (1 = Traditional; 2 = Modern),
• QUALITY (1 = Low; 2 = High),
• PRICE (1 = Lower; 2 = Higher).

The dependent variable (RATING) is the rating of purchase intent on a five-point scale. The three attributes listed above are used as predictor variables in the model, and the following demographic variables are used as covariates:
• SEX (1 = Male; 2 = Female),
• AGE (1 = 16-24; 2 = 25-39; 3 = 40+).

The goal of a traditional conjoint study of this kind is to determine the relative effects of each attribute in influencing one's purchase decision, a goal attained by estimating regression (or logit) coefficients for these attributes. When the LC regression model is used with the same data, a more general goal is attained. First, it is determined whether the population is homogeneous or whether there exist two or more distinct populations (latent segments) which differ with respect to the relative importance placed on each of the three attributes. If multiple segments are found, separate regression models are estimated simultaneously for each.
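Latent GOLD's ordinal-logit machinery is proprietary, but the core idea of "different regressions for different hidden segments" can be illustrated with a continuous outcome: an EM fit of a two-class mixture of linear regressions. The data and all names below are purely illustrative, not the conjoint data of this study.

```python
import numpy as np

def lc_regression(x, y, K=2, n_iter=300, seed=0):
    """EM for a K-class mixture of simple linear regressions.
    Returns class weights, per-class (intercept, slope), noise s.d."""
    rng = np.random.default_rng(seed)
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    pi = np.full(K, 1.0 / K)
    # random perturbations around the pooled fit break the symmetry
    beta = np.linalg.lstsq(X, y, rcond=None)[0] + rng.normal(0, 1, (K, 2))
    sigma = np.full(K, y.std())
    for _ in range(n_iter):
        # E-step: responsibilities from the class-specific normal densities
        dens = np.empty((n, K))
        for k in range(K):
            resid = y - X @ beta[k]
            dens[:, k] = pi[k] / sigma[k] * np.exp(-0.5 * (resid / sigma[k]) ** 2)
        post = dens / np.maximum(dens.sum(axis=1, keepdims=True), 1e-300)
        # M-step: weighted least squares within each class
        nk = post.sum(axis=0)
        pi = nk / n
        for k in range(K):
            sw = np.sqrt(post[:, k])
            beta[k] = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
            resid = y - X @ beta[k]
            sigma[k] = max(np.sqrt((post[:, k] * resid ** 2).sum() / nk[k]), 1e-3)
    return pi, beta, sigma

# Two simulated hidden segments with opposite "price" slopes
rng = np.random.default_rng(3)
x = rng.uniform(0, 2, 600)
z = rng.integers(0, 2, 600)                       # hidden segment labels
y = np.where(z == 0, 1 + 2 * x, 8 - 2 * x) + rng.normal(0, 0.2, 600)

pi, beta, sigma = lc_regression(x, y)
```

With well-separated segments like these, the two fitted slopes typically land near +2 and -2, mirroring the situation described above where one segment is price sensitive and another is not.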
For example, for one segment, price may be found to influence the purchase decision, while a second segment may be price insensitive but influenced by quality and modern appearance.

We will treat RATING as an ordinal dependent variable and estimate several different models to determine the number of segments (latent classes). We will then show how this methodology can be used to describe the demographic differences between these segments and to classify each respondent into the most appropriate segment. We estimated one- to four-class models, with and without covariates. Table 1 reports the obtained test results. The BIC values indicate that the three-class model is the best model (BIC is lowest for this model) and that the inclusion of covariates significantly improves the model.

Table 1: Test results for regression models for conjoint data

Model                 Log-likelihood   BIC value   Number of parameters
Without covariates
  One segment             -4402          8846               7
  Two segments            -4141          8319              15
  Three segments          -4087          8312              23
  Four segments           -4080          8346              31
With covariates
  Two segments            -4088          8284              18
  Three segments          -4036          8246              29
  Four segments           -4026          8293              40

The parameter estimates of the three-class model with covariates are reported in Tables 2, 3 and 4. As can be seen from the first row of Table 2, segment 1 contains about 50% of the subjects, segment 2 contains about 25%, and segment 3 contains the remaining 25%.
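The BIC values in Table 1 follow the usual definition BIC = -2*log-likelihood + npar*ln(N). The reported values are reproduced with N = 400 (the number of respondents rather than the 3200 records), which suggests the case count is what the program uses; a quick check on a few rows:

```python
from math import log

def bic(loglik, n_params, n=400):
    """BIC = -2 * log-likelihood + n_params * ln(n)."""
    return -2 * loglik + n_params * log(n)

# Rows of Table 1: one segment; three segments; three segments w/ covariates
print(round(bic(-4402, 7)))    # 8846
print(round(bic(-4087, 23)))   # 8312
print(round(bic(-4036, 29)))   # 8246
```

Lower BIC is better: the penalty term npar*ln(N) is what makes the three-class model beat the four-class model despite the latter's higher log-likelihood.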
Examination of the class-specific probabilities shows that overall, segment 1 is least likely to buy (only 5% are Very Likely to buy) and segment 3 is most likely (21% are Very Likely to buy).

Table 2: Profile output

                    Class 1   Class 2   Class 3
Segment Size          0.49      0.26      0.25
Rating
  Very Unlikely       0.21      0.10      0.05
  Not Very Likely     0.43      0.20      0.12
  Neutral             0.20      0.37      0.20
  Somewhat Likely     0.10      0.21      0.43
  Very Likely         0.05      0.11      0.21

Table 3: Betas (parameters of the model for the dependent variable)

           Class 1   Class 2   Class 3     Wald     p-value    Wald(=)   p-value
Fashion      1.97      1.14      0.04     440.19    4.4e-95    191.21    3.0e-42
Quality      0.04      0.85      2.06     176.00    6.5e-38    132.33    1.8e-29
Price       -1.04     -0.99     -0.94     496.38    2.9e-107     0.76      0.68

The beta parameter for each predictor is a measure of the influence of that predictor on RATING. The beta estimates under the column labeled Class 1 suggest that segment 1 is influenced in a positive way by products for which FASHION = Modern (beta = 1.97) and in a negative way by PRICE = Higher (beta = -1.04), but not by QUALITY (beta is approximately 0). We also see that segment 2 is influenced by all 3 attributes, having a preference for product choices that are modern (beta = 1.14), high quality (beta = 0.85) and lower priced (beta = -0.99). Members of segment 3 prefer high-quality (beta = 2.06) and lower-priced (beta = -0.94) product choices, but are not influenced by FASHION.

Note that PRICE has more or less the same influence on all three segments. The Wald(=) statistic indicates that the differences in these beta effects across classes are not significant (the p-value = 0.68, which is much higher than 0.05, the standard level for assessing statistical significance). This means that all 3 segments exhibit price sensitivity to the same degree. This is confirmed when we estimate a model in which this effect is specified to be class-independent.
The p-value for the Wald statistic for PRICE is 2.9x10^-107, indicating that the amount of price sensitivity is highly significant.

With respect to the effect of the other two attributes, we find large between-segment differences. The predictor FASHION has a strong influence on segment 1, a less strong effect on segment 2, and virtually no effect on segment 3. QUALITY has a strong effect on segment 3, a less strong effect on segment 2, and virtually no effect on segment 1. The fact that the influence of FASHION and QUALITY differs significantly between the 3 segments is confirmed by the significant p-values associated with the Wald(=) statistics for these attributes. For example, for FASHION, the p-value = 3.0x10^-42.

The beta parameters of the regression model can be used to name the latent segments. Segment 1 could be named the "Fashion-Oriented" segment, segment 3 the "Quality-Oriented" segment, and segment 2 is the segment that takes all 3 attributes into account in its purchase decision.

Table 4: Gammas (parameters of the model for the latent distribution)

              Class 1   Class 2   Class 3    Wald    p-value
Sex
  Male         -0.56      0.71     -0.15     24.47   4.9e-6
  Female        0.56     -0.71      0.15
Age
  16-25         0.84     -0.59     -0.24     53.09   8.1e-11
  26-40        -0.32      0.59     -0.27
  40+          -0.52      0.01      0.51

The parameters of the (multinomial logit) model for the latent distribution appear in Table 4. These show that females have a higher probability of belonging to the "Fashion-Oriented" segment (segment 1), while males more often belong to segment 2. The Age effects show that the youngest age group is over-represented in the "Fashion-Oriented" segment, while the oldest age group is over-represented in the "Quality-Oriented" segment.

Conclusions

We introduced three kinds of LC models and described applications of each that are of interest in marketing research, survey analysis and related fields.
It was shown that LC analysis can be used as a replacement for traditional cluster analysis techniques, as a factor-analytic tool for reducing dimensionality, and as a tool for estimating separate regression models for each segment. In particular, these models offer powerful new approaches for identifying market segments.

BIOS
Jay Magidson is founder and president of Statistical Innovations, a Boston-based consulting, training and software development firm specializing in segmentation modeling. His clients have included A.C. Nielsen, Household Finance, and National Geographic Society. He is widely published on the theory and applications of multivariate statistical methods, and was awarded a patent for an innovative graphical approach for the analysis of categorical data. He taught statistics at Tufts and Boston University, and is chair of the Statistical Modeling Week workshop series. Dr. Magidson designed the SPSS CHAID™ and GOLDMineR® programs, and is the co-developer (with Jeroen Vermunt) of Latent GOLD®.

Jeroen Vermunt is Assistant Professor in the Methodology Department of the Faculty of Social and Behavioral Sciences, and Research Associate at the Work and Organization Research Center, at Tilburg University in the Netherlands. He has taught a variety of courses and seminars on log-linear analysis, latent class analysis, item response models, models for non-response, and event history analysis all over the world, as well as published extensively on these subjects. Professor Vermunt is the developer of the LEM program and co-developer (with Jay Magidson) of Latent GOLD®.

Euclidean distance and between-class distance: the minimum-distance (single-linkage) method (lecture slides)

Worked example: starting from an initial partition G with m = 5 classes (gibbon, symphalangus, human, gorilla, chimpanzee) and an initial between-class distance matrix D, each step merges the closest pair of classes, reducing the number of classes step by step from m = 4 down to m = 1. The two main factors that influence the clustering result are the definition of the distance d_ij between samples and the definition of the distance D_ij between classes. In hierarchical clustering, the linkage method directly determines the result, because it specifies how the between-class distance is defined.
[Figure: two classes G1 and G2 in the (x1, x2) plane, separated by the linear decision boundary L: c1·x1 + c2·x2 - c = 0]
Pattern classification algorithms

• Linear classifiers
• Neural networks
• Nearest neighbor
• Bayesian classifiers
• Hidden Markov model classifiers
• Decision trees
• Support vector machines
Principal component analysis (PCA)

• Microarray data are high-dimensional and therefore hard to visualize
• Microarray data contain considerable noise
• Main applications of PCA
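As a sketch of the dimensionality-reduction use just listed, the following applies PCA via the singular value decomposition to a toy samples-by-genes matrix; the data here are random stand-ins for real microarray measurements, not an actual data set.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 500))        # 20 samples, 500 "genes" (toy data)

Xc = X - X.mean(axis=0)               # center each gene
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T                # project samples onto the top 2 PCs

explained = S**2 / np.sum(S**2)       # fraction of variance per component
print(scores.shape, explained[:2])
```

Plotting `scores` gives the usual 2-D visualization of the high-dimensional samples; `explained` tells you how much structure (versus noise) the leading components actually capture.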
D(3)      X(5)   C(4)   C(3)
X(5)       0
C(4)       6      0
C(3)       2      2.5    0

Step 4
From D(3), the closest pair is X(5) and C(3) (distance 2), so they are merged into a new class C(2) = {X(5), C(3)}, giving:

New G(4) = {C(4), C(2)}; new number of classes m = 2; new between-class distance matrix D(4):

D(4)      C(4)   C(2)
C(4)       0
C(2)       2.5    0

Step 5
From D(4), the last two classes C(4) and C(2) are merged into a single class C(1) = {C(4), C(2)}, giving:

New G(5) = {C(1)}; new number of classes m = 1; new between-class distance matrix D(5).
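The merges in Steps 4 and 5 follow a generic single-linkage rule: at each step, join the closest pair of classes, and set the distance from the merged class to any other class to the minimum of the two old distances. A minimal sketch, with class names and distances taken from the matrices above:

```python
# Symmetric between-class distances taken from D(3).
dist = {("X(5)", "C(4)"): 6.0, ("X(5)", "C(3)"): 2.0, ("C(4)", "C(3)"): 2.5}

def d(a, b):
    return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

classes = ["X(5)", "C(4)", "C(3)"]
merges = []
while len(classes) > 1:
    # Find the closest pair of current classes.
    pairs = [(a, b) for i, a in enumerate(classes) for b in classes[i + 1:]]
    a, b = min(pairs, key=lambda p: d(*p))
    merged = "{" + a + ", " + b + "}"
    merges.append((a, b, d(a, b)))
    # Single linkage: distance to the merged class is the minimum of the two.
    for c in classes:
        if c not in (a, b):
            dist[(merged, c)] = min(d(a, c), d(b, c))
    classes = [c for c in classes if c not in (a, b)] + [merged]

print(merges)
```

Running this reproduces the two steps above: X(5) and C(3) merge first at distance 2, and the resulting class merges with C(4) at distance 2.5.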

The Maximum Between-Class Variance Method: English Abbreviation


1. Introduction
The maximum between-class variance method is a commonly used data-analysis and clustering method. It searches for the best cluster partition by maximizing the between-class variance, and thereby obtains high-quality clustering results. This article examines the method in detail.

2. Overview of clustering
Clustering is the process of partitioning a set of unlabeled data objects into classes. Clustering algorithms partition the data objects according to the similarity or distance between them, so that objects in the same class are highly similar while objects in different classes are not.

3. Between-class variance and within-class variance
Before introducing the maximum between-class variance method, we need the concepts of between-class variance and within-class variance.

3.1 Between-class variance
The between-class variance measures the degree of dispersion between different classes. It expresses how spread out the class centers are, and can be obtained by computing the sum of squared Euclidean distances between the class centers. The larger the between-class variance, the better the different classes are separated.

3.2 Within-class variance
The within-class variance measures how tightly packed a single class is. It is the sum of squared distances between the data objects of a class and that class's center. The smaller the within-class variance, the more similar the objects within the class are.

4. Principle of the maximum between-class variance method
The goal of the method is to find the cluster partition that makes the between-class variance as large as possible. The basic procedure is as follows:
1. Initialization: treat all data objects as a single class.
2. Compute the class centers.
3. Iterate over all data objects, moving each object to the class whose center is farthest from it.
4. Repeat steps 2 and 3 until the between-class variance is maximized.
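If we specialize the criterion to two classes of sorted 1-D values, maximizing the between-class variance w0·w1·(μ0 - μ1)² over all split points recovers the idea behind Otsu's well-known thresholding method. A minimal sketch in plain Python; note this is a direct search over split points, not the iterative reassignment described above:

```python
def max_between_class_split(values):
    """Split sorted 1-D data into two classes, xs[:i] and xs[i:],
    at the index i that maximizes the between-class variance."""
    xs = sorted(values)
    n = len(xs)
    best_i, best_var = 1, -1.0
    for i in range(1, n):
        lo, hi = xs[:i], xs[i:]
        w0, w1 = len(lo) / n, len(hi) / n           # class weights
        mu0 = sum(lo) / len(lo)                     # class means
        mu1 = sum(hi) / len(hi)
        var_between = w0 * w1 * (mu0 - mu1) ** 2    # between-class variance
        if var_between > best_var:
            best_i, best_var = i, var_between
    return xs[:best_i], xs[best_i:]

lo, hi = max_between_class_split([1, 2, 1, 2, 9, 8, 9, 8])
print(lo, hi)  # -> [1, 1, 2, 2] [8, 8, 9, 9]
```

On clearly bimodal data, the maximizing split falls between the two modes, which is exactly the behavior the criterion is designed to produce.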

5. Advantages and disadvantages of the maximum between-class variance method
As a commonly used clustering method, it has the following advantages:
• Easy to understand: the principle of the method is simple to understand and implement.
• Effective: by maximizing the between-class variance, high-quality clustering results can be obtained.
However, the method also has some drawbacks:
• Sensitivity to the initial cluster centers: the choice of initial centers affects the final clustering result.
• The number of clusters must be specified in advance.

6. Applications of the maximum between-class variance method
The method is widely applied in many fields, for example:
1. Data mining: it can help discover meaningful patterns and trends in a data set.


arXiv:astro-ph/0306342 v2 28 Jan 2004

Carnegie Observatories Astrophysics Series, Vol. 3: Clusters of Galaxies: Probes of Cosmological Structure and Galaxy Evolution, ed. J. S. Mulchaey, A. Dressler, and A. Oemler (Cambridge: Cambridge Univ. Press)

The Difference Between Clusters and Groups: A Journey from Cluster Cores to their Outskirts

R. G. Bower and M. L. Balogh

Abstract
In this review, we take the reader on a journey. We start by looking at the properties of galaxies in the cores of rich clusters. We have focused on the overall picture: star formation in clusters is strongly suppressed relative to field galaxies at the same redshift. We will argue that the increasing activity and blue populations of clusters with redshift result from a greater level of activity in field galaxies rather than a change in the transformation imposed by the cluster environment. With this in mind, we travel out from the cluster, focusing first on the properties of galaxies in the outskirts of clusters and then on galaxies in isolated groups. At low redshift, we are able to efficiently probe these environments using the Sloan Digital Sky Survey and 2dF redshift surveys. These allow an accurate comparison of galaxy star formation rates in different regions. The current results show a strong suppression of star formation above a critical threshold in local density. The threshold seems similar regardless of the overall mass of the system. At low redshift at least, only galaxies in close, isolated pairs have their star formation rate boosted above the global average. At higher redshift, work on constructing homogeneous catalogs of galaxies in groups and in the infall regions of clusters is still at an early stage. In the final section, we draw these strands together, summarizing what we can deduce about the mechanisms that transform star-forming field galaxies into their quiescent cluster counterparts. We discuss what we can learn about the impact of environment on the global star formation history of the Universe.

1.1 Introduction
Let us start with an outline of this review. We will begin by looking at galaxies in the cores of clusters. We have been
observing clusters for many years. Some milestones are the papers on the morphological differences between cluster galaxies and the general field (Hubble & Humason 1931), the discovery of a global morphology-density relation (Oemler 1974; Dressler 1980), and the realization of the importance of the color-magnitude relation (Sandage & Visvanathan 1978). We will attempt to summarize what we have learned from looking at clusters since this time. In particular, recent observations now span a wide range of redshift, allowing us to look directly at how the galaxy populations evolve. In the second section, we will investigate how galaxy star formation rates vary with radius and local density. In particular, we will focus on the recent results from the 2dF galaxy redshift survey. The aim here is to understand how galaxy properties are influenced by their environment. As we will discuss, it seems that the group environment is critical to the evolution of galaxies, creating a distinctive threshold. In the third section, we will review some of the ideas about how this can all be put together, and how we can hope to use the environmental studies that many groups are undertaking to build a better understanding of the evolution of the Universe. Throughout, this paper will focus on galaxy star formation rates as the measure of galaxy properties, and will leave aside the whole issue of galaxy morphology for other reviewers to deal with. Clearly, the two issues are related, since galaxy morphology partly reflects the strength of H II regions in the galaxy disk (Sandage 1961), but the two factors are not uniquely linked. Morphology and star formation may be influenced differently by different environments (Dressler et al. 1997; Poggianti et al. 1999; McIntosh, Rix, & Caldwell 2003).
We will also skirt around the important issue of E+A galaxies (Couch & Sharples 1987; Dressler & Gunn 1992; Barger et al. 1996; Balogh et al. 1999; Poggianti et al. 1999) and star formation that is obscured from view in the optical (Poggianti & Wu 1999; Smail et al. 1999; Duc et al. 2002; Miller & Owen 2002). These are discussed in detail in Poggianti (2003). Wherever possible we will use Hα as the star formation indicator (Kennicutt 1992), but as we probe to higher redshift, we are forced to use [O II] λ3727 unless we shift our strategy to infrared spectrographs. We will also stick to talking about bright galaxies, by which we mean galaxies brighter than 1 mag fainter than L∗. It would need another complete review if we were to compare the properties of dwarf galaxies over the same range of environments. A good place to start would be Drinkwater et al. (2001), or the many presentations on cluster dwarfs at this Symposium. By the same token, we will avoid discussion of the evolution of the galaxy luminosity function (Barger et al. 1998; De Propris et al. 1999); this is summarized in Rudnick et al. (2003). To avoid confusion, it is worth laying out exactly what we mean by the terms "cluster" and "group." We will use the term cluster to mean a virialized halo with mass greater than 10^14 M⊙ and the term group to mean a halo more massive than about 10^13 M⊙ (but less than 10^14 M⊙). If an isolated L∗ galaxy has a halo mass of order 10^12 M⊙ (Evans & Wilkinson 2000; Guzik & Seljak 2002; Sakamoto, Chiba, & Beers 2003), then our definition of a group contains more than five L∗ galaxies at the present day. At higher redshift, the conversion between mass and galaxy numbers is more complicated, since it depends on whether L∗ evolves or not. If we stick to a definition in terms of mass, then at least everything is clear from a theoretical perspective, and we can make quite definite predictions about the numbers of such halos, their clustering as a function of redshift (Press & Schechter 1974; Jenkins et al. 2001; Sheth, Mo, & Tormen 2001), and how mass accumulates from smaller
halos into large clusters (Bond et al. 1991; Bower 1991; Lacey & Cole 1993; Mo & White 2002).

1.2 Clusters of Galaxies
At the outset, it's worth reminding ourselves of why we study galaxy evolution in clusters. One popular reason is that the cluster is a good laboratory in which to study galaxy evolution. Another is that it is "easy": when we observe the galaxy spectra, we know that most objects will be in this dense environment and that our observations will be highly efficient. The same reason allows us to recognize clusters out to very high redshifts and thus to extend our studies to a very long baseline. But we should remember that clusters do have a significant drawback: they are rare objects. For the standard ΛCDM cosmology (Ωm = 0.3, ΩΛ = 0.7, h = 0.7, σ8 = 0.9), the space density of >10^14 M⊙ halos is 7 × 10^-5 h^3 Mpc^-3. Even though such clusters contain ~100 L∗ galaxies, less than 10% of the cosmic galaxy population is found in such objects. There is an emerging consensus that suggests that the stellar populations of galaxies in cluster cores are generally old, with most of the stars formed at z > 2. Most of these galaxies also have early-type morphology. It is possible to derive remarkably tight constraints from looking at colors (Bower, Lucey, & Ellis 1992; Bower, Kodama, & Terlevich 1998; Gladders et al. 1998; van Dokkum et al. 1998), at the Mg-σ relation (Gúzman et al. 1992), or at the scatter in the fundamental plane (Jørgensen et al. 1999; Fritz et al. 2003). These results rely on the argument that recent star formation would lead to excessive scatter in these tight relations, unless it was in some way coordinated, or the color variations due to age were cancelled out by variations in metal abundance (Faber et al. 1999; Ferreras, Charlot, & Silk 1999). Line-index measurements generally suggest very old populations (Jørgensen 1999; Poggianti et al. 2001), but these relations tend to show somewhat more scatter. This has been interpreted as evidence for the cancellation effects in broad-band colors. To improve the evidence, one
can compare clusters at high redshift. For example, if we concentrate on the color-magnitude relation, we would expect the narrow relation seen in local clusters to break down as we approach the epoch when star formation was prevalent. In fact, we have discovered that the color-magnitude relation is well established in high-redshift clusters (Ellis et al. 1997; van Dokkum et al. 1998), and that the line-index correlations, fundamental plane (Kelson et al. 2001), and Tully-Fisher relation measurements (e.g., Metavier 2003; Ziegler et al. 2003; but see Milvang-Jensen et al. 2003) also show little increase in scatter compared to local clusters. So far, tight relations have been identified in clusters out to z = 1.27 (van Dokkum et al. 2000; Barrientos et al. 2003; van Dokkum & Stanford 2003). The tight relation does eventually seem to break down, and we are not aware of any strong color-magnitude relation that has been identified in "proto-clusters" at z > 2. There is a bias here, however, that should be clearly recognized. Although we are discovering that clusters at high redshifts seem also to contain old galaxies, this does not mean that all galaxies in local clusters must have these old populations. A large fraction of galaxies that are bound into local clusters would have been isolated "field" galaxies at z ≈ 1. An even stronger bias of this type has been termed "progenitor bias" by van Dokkum & Franx (2001).
They point out that if only a subset of the cluster population is studied (for example, only the galaxies with early-type morphology), then it is quite easy to arrive at a biased view. To get the full picture, one needs to study the galaxy population of the cluster as a whole. An interesting strategy is therefore to simply measure the star formation rate in clusters at different epochs. The general consensus seems to be that there is little star formation (relative to field galaxies at the same redshift) in virialized cluster cores below z = 1.5. For example, Couch et al.'s (2001) survey of the AC114 cluster found that star formation was suppressed by an order of magnitude compared to the field. Similar levels of suppression are seen in poor clusters (Balogh et al. 2002). While these studies find some exciting objects (see Finn & Zaritsky 2003 for further examples), the general trend is for the star formation rate to be strongly suppressed relative to the field at the same redshift. Infrared measurements (Duc et al. 2002) and radio measurements (Miller 2003; Morrison & Owen 2003) have generally come to similar conclusions. The E+A galaxies (Dressler & Gunn 1992) or post-starburst galaxies (Couch & Sharples 1987) are a puzzling exception. The large numbers found by the MORPHs group (Dressler et al. 1997) suggest that there was strong star formation activity in the recent past in many galaxies (but see Balogh et al. 1999). A possible explanation is that these galaxies have only recently arrived in the cluster from much lower-density environments. Indeed, field studies at low redshift have shown this type of object to be more common in low-density regions than in clusters (Zabludoff et al. 1996; Goto et al. 2003; Quintero et al. 2003). Therefore, the greater numbers of E+A galaxies found in high-redshift clusters may result from the greater star formation activity of galaxies outside clusters; this idea gains strong support from Tran et al.'s (2003) observations presented at the Symposium. The next step is to
compare the star formation rates in cluster cores at different redshifts. Work is only just starting on this using emission-line strengths (e.g., Ellingson et al. 2001), since it is essential to control systematic uncertainties, such as the aperture through which the star formation rate is measured. However, extensive comparisons have been made on the basis of colors, starting with Butcher & Oemler (1978, 1984) and Couch & Newell (1984). These papers showed a startling increase in the numbers of blue galaxies in z > 0.2 galaxy clusters compared to the present day. These results have been confirmed by more recent studies (e.g., Rakos & Schombert 1995; Margoniner et al. 2001), although the effect of the magnitude limit and cluster selection play at least as important a role as the redshift (Fairley et al. 2002). There are two issues that complicate the comparison of the galaxies in cluster cores, however. Firstly, we must be careful how we select galaxies that are to be compared. Most of the blue galaxies lie close to the photometric completeness limit. These galaxies will fade by up to 1 mag if star formation is turned off, and thus they are not directly comparable to the red-galaxy population selected at the same magnitude limit (Smail et al. 1998; Kodama & Bower 2001). Secondly, we are observing galaxy clusters in projection. There is little doubt that the field galaxy population at intermediate redshift is much bluer than in the local Universe (Lilly et al. 1995; Madau, Pozzetti, & Dickinson 1998); thus, although a small level of contamination by field galaxies has little influence on the overall color distribution, the same contamination will have a much bigger impact on the distribution at intermediate redshift. This problem is only partially eliminated if a complete sample of galaxy redshifts is available, since the velocity dispersion of the cluster makes it impossible to distinguish cluster members from "near-field" galaxies that are close enough to the cluster to be indistinguishable in redshift
space (Allington-Smith et al. 1993; Balogh et al. 1999; Ellingson et al. 2001). This idea is reinforced by experiments with numerical simulations. Galaxies can be associated with dark matter particles, and then "observed" to measure the extent to which radial information is lost. Diaferio et al. (2001) found that a contamination of 10% can easily occur; furthermore, since most of the contaminating galaxies are blue (and in these models most genuine cluster galaxies are red), the fraction of blue galaxies can then be boosted by 50%. Despite this, Ellingson et al. (2001) conclude that the rate at which clusters are being built up must also be higher in the past in order for this explanation to work. Kauffmann (1995) shows that there is good theoretical justification for this. It will be interesting to see if the evolution in the colors of the cluster population is consistent with the evolution in the emission-line strengths. We might expect to see a difference because of the different time scales probed by colors and by emission lines. For example, if galaxies that fall into the cluster have their star formation quickly suppressed, they will remain blue (in the Butcher-Oemler sense) for a significant period after the line emission subsides (Ellingson et al. 2001). Combining these factors, it seems quite possible to accommodate both weak evolution in emission-line strength and more rapid evolution in the colors of cluster galaxies.

Fig. 1.1. The median star formation rate as a function of local density from galaxies around clusters in the 2dF survey (based on Lewis et al. 2002). The left panel shows the star formation rate of all galaxies in the sample; the right panel shows the effect of removing galaxies within 2 virial radii of the cluster center. Error bars are jackknife estimates. The relations are amazingly similar, showing that the local density is more important than the overall mass of the group or cluster.

1.3 The Other Axis: Density

1.3.1 The Cluster Outskirts
So far we have been
discussing the properties of galaxies within the cores of clusters, but the dependence on density (or, nearly equivalently, cluster-centric radius) provides another axis over which to study galaxy properties. We have seen that star formation is strongly suppressed in the cores of rich clusters, but at what radius do the galaxies become more like the field? We should also realize that it might be better to compare galaxy properties with their local densities (Dressler 1980; Kodama et al. 2001), as the large-scale structure surrounding clusters may have the dominant impact on galaxy evolution. One of the first steps at studying galaxies in the transition zone around clusters was made with the CNOC2 survey (Balogh et al. 1999). They showed that there was a strong radial dependence in the star formation rate, but that the star formation rate had not yet reached the field value even at r ≈ r_vir. The Sloan Digital Sky Survey (SDSS) and 2dF galaxy redshift surveys have allowed us to make a huge leap forward in this respect. In the local Universe, we are able to map galaxy star formation rates, using the complete redshift information to eliminate contamination by interlopers. In this section we will concentrate on what we have learned from the 2dF survey (Lewis et al. 2002), but the results from the SDSS give very consistent answers (Gómez et al. 2003). Figure 1.1 shows the median star formation rate as a function of local density. What is remarkable in this plot is that there is quite a sharp transition between galaxies with field-like star formation rates at Σ < 1 Mpc^-2
If star formation is plotted against radius,the transition is considerably smeared out,but does occur at around the cluster virial radius—well outside the core region on which a lot of previous work has been focused.The2dF galaxy redshift survey sample is sufficiently large that we can remove the cluster completely from this diagram.By only plotting galaxies more than2virial radii from the cluster centers,we concentrate on thefilaments of infalling material.The correlation with local density is shown in Figure1.1.Amazingly,the relation hardly changes compared to the complete cluster diagram.This is a great success:we have identified the region where galaxy transformation occurs! It is in the infallingfilaments(consisting of chains of groups)where galaxies seem to change from star-forming,field-like galaxies to passive,cluster-like objects.Of course,it is tempt-ing to associate the transformation in star formation rate with a transformation from late-to early-type morphology.Unfortunately,this test cannot be undertaken with the available2dF data,but we can expect clearer results from SDSS.What happens at higher redshift?In fact,thefirst claim of a sharp transition in galaxy properties was made by Kodama et al.(2001)for the distant cluster A851at z=0.41(top panel in Fig.1.2).Kodama et al.(2001)used photometric redshifts to eliminate foreground objects,and thus to reduce contamination of the cluster members to a level that allowed the color distribution to be studied in the outer parts of the cluster.Their results show an amazing transition in color.Direct comparison with the local clusters is difficult,however,as the magnitude limits are very different(Kodama et al.’s photometric data reach much fainter than the local spectroscopic samples),but Gómez et al.(2003)concluded that the threshold seen by Kodama et al.(2001)was at a significantly higher local density.Perhaps dwarf galaxies are more robust to this environmental transformation;we are not going to attempt to 
cover this issue. A number of researchers are now engaged in spectroscopic programs to study the transformation threshold in higher-redshift systems. The results of Treu et al. (2003) are perhaps the most advanced. They also have the advantage of panoramic WFPC2 imaging that will allow them to compare the transformation of galaxy morphology (see Treu 2003). The highest redshifts that can be studied require a combination of photometric preselection of objects for spectroscopy. Nakata et al. (2003) have used the photometric technique to map the large-scale structure around the Lynx cluster at z = 1.27 (lower panel in Fig. 1.2), and similar techniques are described by Demarco et al. (2003). These groups identify several candidate filaments; spectroscopy of these regions is now underway.

Fig. 1.2. Top: Contour lines pinpoint the transition zone around the rich cluster A851 at z = 0.41 studied by Kodama et al. (2001). Bottom: z = 1.27 groups tracing the large-scale structure surrounding the Lynx clusters (based on Nakata et al. 2003). These figures illustrate how photometric methods can be used to reduce contamination rates in the outskirts of distant clusters so that the galaxy population can be studied there.

1.3.2 Galaxy Groups
Returning to the local Universe, it is interesting to see if we can probe the properties of galaxies in groups directly. A lot of work has been carried out looking at small samples of groups selected from the CfA redshift survey (Geller & Huchra 1983; Moore, Frenk, & White 1993), from the Hickson compact group catalog (Hickson, Kindl, & Auman 1989), and also from X-ray surveys (Henry et al. 1995; Mulchaey et al. 2003). In the era of the 2dF and SDSS redshift surveys, we can construct robust catalogs containing thousands of groups (Eke et al. 2003). It is interesting to compare the star formation rate in the groups as a function of local density with the relation found in clusters. The relation

Fig. 1.3. The star formation rate as a
function of density, comparing groups of galaxies with clusters. The upper and lower horizontal dashed lines show the 75th percentile and the median of the equivalent widths. The hashed region shows the relation for the complete sample, while the solid line shows the relation for systems with 500 km s^-1 < σ < 1000 km s^-1 (left) and σ < 500 km s^-1 (right). The dependence on local density is identical irrespective of the velocity dispersion of the whole system. Figure based on Balogh et al. (2003).

for the 2dF survey is shown in Figure 1.3 (Balogh et al. 2003). The panels show the effect of selecting systems on the basis of their velocity dispersion. There is actually very little difference between the trends. The galaxies in dense regions suffer the same suppression of their star formation rate, regardless of the system's total mass. It is also possible to show that the groups in the infall regions of clusters show the same pattern as isolated groups. We have to conclude that the suppression of star formation is very much a local process. This is an important clue to distinguish between the different transformation mechanisms. Interestingly, in the local Universe, there is little evidence for the environment producing a rise in the star formation rate above the field value. The only exception to this appears to be the close, low-velocity encounters of isolated galaxies (Barton, Geller, & Kenyon 2000; Lambas et al. 2003). Figure 1.4 shows the star formation rate as a function of separation for systems of different total velocity dispersion. A spike in the median star formation rate appears only in the smallest bin of the first panel. It will be interesting to study this trend within groups and clusters (Balogh et al. 2003). One of the next goals is to extend studies of groups to higher redshifts. The first steps in this direction were made by Allington-Smith et al. (1993). They used radio galaxies to pick out galaxy groups at redshifts up to 0.5. By stacking photometric catalogs, they showed that the galaxy
populations of rich groups (N_-19^0.5 > 30)* became increasingly blue with redshift, while poorer groups contained similar populations of blue galaxies at all redshifts. A survey of redshift-space selected groups at intermediate redshift became possible with the CNOC2 redshift survey. Carlberg et al. (2001) report a statistical sample of 160 groups out to redshift 0.4. On the Magellan telescopes, we have been following up the systems at z > 0.3 in order to determine the complete membership and measure total star formation rates. Figure 1.5 shows the membership of a sample group. The initial results are exciting: star formation rates in many galaxies are more comparable to the surrounding field values. If these results are confirmed as we derive more redshifts and improve the group completeness, it represents a very interesting change from the properties seen in the 2dF groups. At higher redshifts, a tantalizing glimpse of the properties of a few groups can be obtained from the Caltech redshift survey (Cohen et al. 2000).

* Group richness defined as the number of galaxies with M_V ≤ -19 mag within a 0.5 Mpc radius of the radio galaxy (H_0 = 50 km s^-1 Mpc^-1 and q_0 = 0 assumed).

Fig. 1.4. The star formation rate of galaxy pairs (b) as a function of separation. The different line styles distinguish the spectral type of the first galaxy of the pair, with the sequence of dot-dash, dotted, and solid lines showing the effect of restricting the sample to more and more active central objects. A strong enhancement is only seen when the separation is less than 100 kpc. The figure is based on Lambas et al. (2003).

Fig. 1.5. A sample group from the CNOC2/Magellan group survey. This poor group, containing 10 members, is at z = 0.393.

1.4 What Does It All Mean?

1.4.1 The Mechanisms Driving Galaxy Evolution
The mechanisms that have been proposed to drive galaxy evolution in dense environments can be broadly separated into three categories.

• Ram
pressure stripping. Galaxies traveling through a dense intracluster medium suffer a strong ram pressure effect that sweeps cold gas out of the stellar disk (Gunn & Gott 1972; Abadi, Moore, & Bower 1999; Quilis, Moore, & Bower 2000). The issue with this mechanism is whether it can be effective outside dense, rich cluster cores, where the galaxy velocities are very high and the intracluster medium is very dense. Quilis et al. (2000) found that incorporating holes in the galaxy H I distribution made galaxies easier to strip (Fig. 1.6), but it still required clusters more massive than the Virgo cluster to have a great effect.

• Collisions and harassment. Collisions or close encounters between galaxies can have a strong effect on their star formation rates. The tidal forces generated tend to funnel gas toward the galaxy center (Barnes & Hernquist 1991; Barnes 2002; Mihos 2003). It is likely that this will fuel a starburst, ejecting a large fraction of the material (Martin 1999). Gas in the outer parts of the disk, on the other hand, will be drawn out of the galaxy by the encounter. Although individual collisions are expected to be most effective in groups, because the velocity of the encounter is similar to the orbital time scale within the galaxy, Moore et al. (1996) showed that the cumulative effect of many weak encounters can also be important in clusters of galaxies.

• Strangulation. Current theories of galaxy formation suggest that isolated galaxies continuously draw a supply of fresh gas from a hot, diffuse reservoir in their halo (Larson, Tinsley, & Caldwell 1980; Cole et al. 2000). Although the reservoir is too cool and diffuse to be easily detected (Benson et al. 2000; Fang, Sembach, & Canizares 2003), this idea is supported by the observation that 90% of the baryonic content of clusters is in the form of a hot, diffuse intracluster medium. The baryon reservoir in galaxy halos is entirely analogous. When an isolated galaxy becomes part of a group, it may lose its preferential location at the center of the halo and thus be unable to draw further
on the baryon reservoir. Without a mechanism for resupplying the material that is consumed in star formation and feedback, the galaxies' star formation rate will decline. The exact rate depends on the star formation law that is used (Schmidt 1959; Kennicutt 1989) and on whether feedback is strong enough to drive an outflow from the disk.

Semi-analytic models (e.g., Cole et al. 2000) generally incorporate only the third of these mechanisms. The observational data strongly suggest that the ram pressure stripping scenario cannot be important for the majority of galaxies. As we have seen, the suppression of star formation seems to occur well outside of the clusters and is equally effective in low-velocity groups, which do not possess a sufficiently dense intracluster medium. Distinguishing between the remaining two scenarios is rather harder, since they have similar dependence on environment. Indeed, they may both play a role. The key difference is the time scale on which they operate: collisions are expected to produce changes in galaxy properties on short time scales (~100 Myr), while the changes due to strangulation are much more gradual (>1 Gyr). The time scale for harassment is less well defined; while the individual encounters may induce short-lived bursts of star formation, the overall effect may accumulate over several Gyr. The radial gradients that we observe appear to prefer long time scales and, hence, a mechanism like strangulation or harassment (Balogh, Navarro, & Morris 2000). To make further progress in this area, we need to compile detailed observations of galaxies that are caught in the transition phase. In particular, morphological measurements will provide another important distinction (e.g., McIntosh et al. 2003).
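The jackknife error bars quoted in the caption of Figure 1.1 can be sketched as follows for the median star formation rate in a single density bin; the SFR values below are invented placeholders, not 2dF measurements.

```python
import math

def median(xs):
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else 0.5 * (s[mid - 1] + s[mid])

def jackknife_median_error(xs):
    """Delete-one jackknife estimate of the standard error of the median."""
    n = len(xs)
    leave_one_out = [median(xs[:i] + xs[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    var = (n - 1) / n * sum((m - mean_loo) ** 2 for m in leave_one_out)
    return math.sqrt(var)

sfr = [0.2, 0.5, 0.1, 0.8, 0.4, 0.3, 0.6, 0.2]  # placeholder SFRs in one bin
print(median(sfr), jackknife_median_error(sfr))
```

The delete-one resampling makes no assumption about the (strongly non-Gaussian) distribution of star formation rates, which is why it is a natural choice for errors on a binned median.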
