Rational R-matrices, centralizer algebras and tensor identities for e_6 and e_7 exceptional
Cancer Classification Using Rotation Forest

Abstract: We address the classification of cancers from microarray datasets using a recently proposed multiple classifier system (MCS), Rotation Forest.
To the best of our knowledge, this is the first time that Rotation Forest has been applied to the classification of microarray datasets.
In the Rotation Forest framework, a linear transformation method is used to project the data into a new feature space for each base classifier; each base classifier is then trained in a different new space, which improves both the accuracy of the base classifiers and the diversity of the ensemble.
In the original Rotation Forest, principal component analysis (PCA), nonparametric discriminant analysis (NDA) and random projections (RP) were applied as the feature transformations.
In this paper, we use independent component analysis (ICA) as a new transformation.
Keywords: cancer classification; DNA microarray datasets; multiple classifier system (MCS); Rotation Forest; linear transformation methods.

1. Introduction. With the development of microarray technology, it has become possible to diagnose and classify certain specific cancers directly on the basis of DNA microarray datasets.
Up to now, more and more new prediction, classification and clustering techniques have been applied to the analysis of microarray data.
For example, Golub et al. [1] used a nearest-neighbor classification method to distinguish acute myeloid leukemia (AML) from acute lymphoblastic leukemia (ALL) in children.
A number of studies have been reported on the application of molecular classification to the analysis of microarray gene expression data in cancer.
In short, microarray experiments lead to a more complete understanding of the molecular variation among tumors, and hence to a finer and more reliable classification.
However, one characteristic of microarray datasets is that the number of tumor samples collected tends to be much smaller than the number of genes.
That is, the former is usually on the order of tens or hundreds, while microarray datasets typically contain thousands of genes on each chip.
Because this is a typical "large p, small n" problem [4], finding efficient and effective methods for gene expression data analysis remains a challenge.
At present, a variety of algorithms and mathematical models have been proposed for the management, analysis and interpretation of microarray datasets, and many researchers are still devoted to designing different linear or nonlinear classification systems.
However, it should be noted that a single classification system cannot always achieve high classification accuracy.
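To make the ensemble idea concrete, here is a minimal sketch of a Rotation-Forest-style classifier that uses ICA as the per-tree feature transformation, in the spirit of the approach described above. It is not the authors' implementation: the class name, the number of trees and feature groups, the bootstrap step and the majority-vote rule are all illustrative choices, and class labels are assumed to be non-negative integers.

```python
# A minimal sketch of Rotation Forest with ICA as the feature transformation.
# Group sizes, number of trees and the voting scheme are illustrative choices,
# not taken from the paper. Labels are assumed to be non-negative integers.
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.tree import DecisionTreeClassifier

class ICARotationForest:
    def __init__(self, n_trees=10, n_groups=3, random_state=0):
        self.n_trees = n_trees
        self.n_groups = n_groups
        self.rng = np.random.RandomState(random_state)
        self.members = []  # list of (feature_groups, fitted ICAs, tree)

    def _rotate(self, X, groups, icas):
        # Apply each group's ICA and concatenate the rotated blocks.
        return np.hstack([ica.transform(X[:, g]) for g, ica in zip(groups, icas)])

    def fit(self, X, y):
        n_samples, n_features = X.shape
        for _ in range(self.n_trees):
            # Randomly partition the features into disjoint groups.
            perm = self.rng.permutation(n_features)
            groups = np.array_split(perm, self.n_groups)
            # Fit one ICA per group on a bootstrap sample (adds diversity).
            boot = self.rng.choice(n_samples, n_samples, replace=True)
            icas = [FastICA(n_components=len(g), random_state=0, max_iter=1000)
                    .fit(X[boot][:, g]) for g in groups]
            Xr = self._rotate(X, groups, icas)
            tree = DecisionTreeClassifier(random_state=0).fit(Xr, y)
            self.members.append((groups, icas, tree))
        return self

    def predict(self, X):
        votes = np.array([tree.predict(self._rotate(X, groups, icas))
                          for groups, icas, tree in self.members])
        # Majority vote across the ensemble.
        return np.apply_along_axis(
            lambda v: np.bincount(v.astype(int)).argmax(), 0, votes)
```

A typical call would be `ICARotationForest(n_trees=10).fit(X_train, y_train).predict(X_test)`, with `X` arrays of expression values and `y` integer class labels.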
English Vocabulary for Econometrics
• Correct specification: The regression model is correctly specified.
• Linearity: The regression model is linear in the parameters.
• No correlation with the disturbance: The covariances between Xi and μi are zero.
• Variation in X: X values in a given sample must not all be the same.
• No perfect multicollinearity: There is no perfect multicollinearity among the explanatory variables.
• Zero mean: The conditional mean value of μi is zero.
• Homoscedasticity: The conditional variances of μi are identical.
• No serial correlation: The correlation between any two μi and μj is zero.
• Normality: The μ's follow the normal distribution.
• Testing the overall significance of a multiple regression (the F test)
• Hypothesis testing; testing the significance of variables (the t test)
• Confidence interval of a parameter
• Confidence coefficient (level of confidence)
• Confidence limit
• Engel curves
• Phillips curves
A small worked example of the F test, t tests and confidence intervals follows.
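As a hedged illustration of several of the terms above (the overall F test, per-variable t tests and parameter confidence intervals), the following sketch fits an OLS regression on synthetic data with statsmodels; the variable names and data are invented for the example.

```python
# Illustrative example (synthetic data) of the overall F test, per-coefficient
# t tests and confidence intervals for an OLS regression with statsmodels.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)               # e.g., income
x2 = rng.normal(size=n)               # e.g., price
u = rng.normal(scale=0.5, size=n)     # disturbance: zero mean, homoscedastic
y = 1.0 + 2.0 * x1 - 0.5 * x2 + u     # linear in the parameters

X = sm.add_constant(np.column_stack([x1, x2]))
model = sm.OLS(y, X).fit()

print(model.fvalue, model.f_pvalue)   # overall significance (F test)
print(model.tvalues, model.pvalues)   # significance of each variable (t tests)
print(model.conf_int(alpha=0.05))     # 95% confidence intervals for parameters
```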
Screening Key Differentially Expressed Genes in Hepatocellular Carcinoma Using Robust Rank Aggregation
(Continued from page 65)
Recanalization of an arteriovenous fistula depends on how long the fistula has been occluded, and early thrombolysis is the key to treatment [6]. In the early stage of thrombus formation the fistula can be recanalized by thrombolysis, but the longer a thrombus has been present, the poorer the thrombolytic effect. In nursing care we should instruct patients on fistula precautions: the fistula limb should avoid bearing weight, being struck, and contact with sharp objects; patients should palpate the fistula thrill and auscultate the vascular bruit 3-4 times per day, monitor fistula patency, and perform self-checks of the fistula, seeking medical attention promptly once occlusion is suspected. Once thrombus formation in the fistula is confirmed after presentation, color Doppler ultrasound should be performed immediately, because color Doppler ultrasound is an effective means of initially locating the site of thrombus formation [7,8] and provides the basis for choosing the sites for urokinase thrombolysis and far-infrared irradiation. Far-infrared physiotherapy is a simple, convenient and non-invasive treatment; it reduces monocyte adhesion to the vascular endothelium, reduces the inflammatory response and the endothelial hyperplasia caused by inflammation, and at the same time improves the blood flow and patency rate of arteriovenous fistulas in hemodialysis patients. In addition, urokinase thrombolysis has few complications, and studies have shown that it can effectively dissolve small thrombi
[4] Han XT, Ma HY, He Y, et al. Observation of the efficacy of Voltaren combined with far-infrared irradiation in the treatment of early arteriovenous fistula stenosis [J]. Chinese Journal of Blood Purification, 2014, 13(01): 62-63.
[5] Wang ZG. Blood Purification [M]. 2nd ed. Beijing: Beijing Science and Technology Press, 2003: 744-760.
With the rapid development of genomics, major transformative changes have also taken place in the field of primary liver cancer (PLC) research. The emergence of high-throughput sequencing technology has produced large amounts of gene expression data, making it possible to characterize the gene expression patterns of liver cancer tissues and cells under specific conditions and the changes in key genes. However, because experimental conditions differ among laboratories, the populations represented by clinical samples differ ethnically, and chip platforms differ, the results reported by different studies are often inconsistent. Therefore, finding an effective way to evaluate the results of different gene expression profiling studies is of great significance.
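As a rough sketch of the robust rank aggregation idea referred to in the title (comparing each gene's ranks across studies with the order statistics of random ranks, as in Kolde et al.'s RRA method), the following code computes a simplified RRA score for a toy set of ranked gene lists. The gene lists are fabricated for illustration, and the real RobustRankAggreg package additionally applies a multiple-testing correction that is omitted here.

```python
# Simplified sketch of the robust rank aggregation (RRA) score: for each gene,
# compare its normalized ranks across studies with the order statistics of
# uniform random ranks. The example gene lists are made up.
import numpy as np
from scipy.stats import beta

def rra_score(norm_ranks):
    """norm_ranks: normalized ranks of one gene across m ranked lists, values in (0, 1]."""
    r = np.sort(np.asarray(norm_ranks))
    m = len(r)
    # P(k-th smallest of m uniform ranks <= r_k) for each k; take the minimum.
    probs = [beta.cdf(r[k - 1], k, m - k + 1) for k in range(1, m + 1)]
    return min(probs)

# Toy example: three studies, each ranking five genes (best first).
studies = [["TP53", "MYC", "EGFR", "KRAS", "BRCA1"],
           ["TP53", "EGFR", "KRAS", "MYC", "BRCA1"],
           ["MYC", "TP53", "BRCA1", "EGFR", "KRAS"]]
genes = sorted(set(g for s in studies for g in s))
scores = {g: rra_score([(s.index(g) + 1) / len(s) for s in studies]) for g in genes}

for g, sc in sorted(scores.items(), key=lambda kv: kv[1]):
    print(g, round(sc, 4))   # smaller score = more consistently top-ranked
```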
Bioinformatics: The Autophagy Gene Set
In biology, autophagy is an important intracellular degradation process: membrane-bound vesicles containing stored proteins, lipids or other cellular components are delivered to lysosomes for degradation, allowing the cell to reuse the breakdown products to sustain its activities. The autophagy process is regulated by many genes, and the autophagy gene set refers collectively to the group of genes associated with autophagy. It includes many genes, such as ATG5, ATG7, ATG12 and ATG16L. The products encoded by these genes participate in the synthesis and modification of autophagy-related proteins and play important regulatory roles in the autophagy process. In addition, many other genes are closely involved in autophagy, such as ULK1, Beclin-1 and LC3.

The autophagy gene set is important in many respects. Its members participate in key biological processes such as metabolic regulation, cell growth and differentiation, and apoptosis. At the same time, in many diseases, including cancer, neurological diseases and cardiovascular diseases, abnormal expression of autophagy genes is closely associated with disease progression.

With the continuing development of biotechnology, research on the autophagy gene set keeps deepening. Using gene-editing technology and animal models, scientists continue to explore the role of autophagy genes in disease and to search for therapies that can target them. Research on the autophagy gene set will provide an important reference for better understanding the mechanism of cellular autophagy.
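One common first analysis with such a gene set is an over-representation test: asking whether autophagy genes appear among a list of differentially expressed genes more often than chance would predict. The sketch below uses a hypergeometric test from SciPy; the gene lists, the background size of 20,000 genes, and the use of official symbols (ATG16L1 for ATG16L, BECN1 for Beclin-1, MAP1LC3B for LC3) are illustrative assumptions, not results from any particular study.

```python
# Minimal over-representation check (hypergeometric test) asking whether an
# autophagy gene set is enriched among a list of differentially expressed
# genes. Gene lists and background size here are illustrative only.
from scipy.stats import hypergeom

autophagy_set = {"ATG5", "ATG7", "ATG12", "ATG16L1", "ULK1", "BECN1", "MAP1LC3B"}
deg_list = {"ATG5", "ULK1", "BECN1", "TP53", "MYC", "EGFR", "VEGFA", "MAP1LC3B"}

N = 20000                            # background: approx. number of annotated genes
K = len(autophagy_set)               # autophagy genes in the background
n = len(deg_list)                    # differentially expressed genes
k = len(autophagy_set & deg_list)    # overlap between the two sets

# P(overlap >= k) under random sampling without replacement.
p_value = hypergeom.sf(k - 1, N, K, n)
print(f"overlap = {k}, p = {p_value:.3e}")
```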
Introduction to IB Mathematics and the IB Programme
Mr. Wu
Introduction to IB Mathematics and the IB Programme
• Overview of the IB programme
• Features of the IB programme
• Aims of the IB programme
• Structure of the IB programme
• Characteristics of the IBDP mathematics course
• IB Mathematics HL syllabus
• Some suggestions
About the IB Programme
• IB is short for the International Baccalaureate Diploma Programme.
• At present, 2,771 schools in 138 countries and regions, with more than 763,000 students, offer the IB programme.
• The IB diploma is recognized by many universities around the world, including those in the UK, the US, Canada and Australia, and serves as a passport to leading universities.
IB Mathematics HL Syllabus (Core)
• Algebra
• Functions & equations
• Circular functions & trigonometry
• Matrices
• Vectors
• Statistics & probability
• Calculus
Characteristics of IBDP Mathematics
• Covers a broad range of knowledge areas (Syllabus)
• Offers sub-courses at different levels of difficulty (HL, SL, etc.)
• Emphasizes the use of mathematical software and graphing calculators (Technology)
• Focuses on how mathematical knowledge is developed (TOK)
• Develops students' ability to apply mathematical knowledge (IA)
• Stimulates students' enthusiasm for mathematics as a discipline (EE)
Features of the IB Programme
• Self-contained: not based on the curriculum system of any single country; it draws widely on the strengths of the mainstream curricula of many developed countries and covers their core content.
• Challenging
• Widely recognized
• International
Aims of the IB Programme
• The International Baccalaureate aims to develop inquiring, knowledgeable and caring young people who help to create a better and more peaceful world through intercultural understanding and respect.
• To this end the organization works with schools, governments and international organizations to develop challenging programmes of international education and rigorous assessment.
• These programmes encourage students across the world to become active, compassionate and lifelong learners who understand that other people, with their differences, can also be right.
Local Rademacher complexities
arXiv:math/0508275v1 [math.ST] 16 Aug 2005
The Annals of Statistics 2005, Vol. 33, No. 4, 1497-1537. DOI: 10.1214/009053605000000282. © Institute of Mathematical Statistics, 2005.

LOCAL RADEMACHER COMPLEXITIES

By Peter L. Bartlett, Olivier Bousquet and Shahar Mendelson
University of California at Berkeley, Max Planck Institute for Biological Cybernetics and Australian National University

We propose new bounds on the error of learning algorithms in terms of a data-dependent notion of complexity. The estimates we establish give optimal rates and are based on a local and empirical version of Rademacher averages, in the sense that the Rademacher averages are computed from the data, on a subset of functions with small empirical error. We present some applications to classification and prediction with convex function classes, and with kernel classes in particular.

1. Introduction. Estimating the performance of statistical procedures is useful for providing a better understanding of the factors that influence their behavior, as well as for suggesting ways to improve them. Although asymptotic analysis is a crucial first step toward understanding the behavior, finite sample error bounds are of more value as they allow the design of model selection (or parameter tuning) procedures. These error bounds typically have the following form: with high probability, the error of the estimator (typically a function in a certain class) is bounded by an empirical estimate of error plus a penalty term depending on the complexity of the class of functions that can be chosen by the algorithm. The differences between the true and empirical errors of functions in that class can be viewed as an empirical process. Many tools have been developed for understanding the behavior of such objects, and especially for evaluating their suprema, which can be thought of as a measure of how hard it is to estimate functions in the class at hand. The goal is thus to obtain the sharpest possible estimates on the complexity of function classes. A problem arises since the notion of complexity might depend on the (unknown) underlying probability measure according to which the data is produced. Distribution-free notions of the complexity, such as the Vapnik-Chervonenkis dimension [35] or the metric entropy [28], typically give conservative estimates. Distribution-dependent estimates, based for example on entropy numbers in the L2(P) distance, where P is the underlying distribution, are not useful when P is unknown.
Thus,it is desirable to obtain data-dependent estimates which can readily be computed from the sample.One of the most interesting data-dependent complexity estimates is the so-called Rademacher average associated with the class.Although known for a long time to be related to the expected supremum of the empirical process (thanks to symmetrization inequalities),it wasfirst proposed as an effective complexity measure by Koltchinskii[15],Bartlett,Boucheron and Lugosi [1]and Mendelson[25]and then further studied in[3].Unfortunately,one of the shortcomings of the Rademacher averages is that they provide global estimates of the complexity of the function class,that is,they do not reflect the fact that the algorithm will likely pick functions that have a small error, and in particular,only a small subset of the function class will be used.As a result,the best error rate that can be obtained via the global Rademacher√averages is at least of the order of1/LOCAL RADEMACHER COMPLEXITIES3 general,power type inequalities.Their results,like those of van de Geer,are asymptotic.In order to exploit this key property and havefinite sample bounds,rather than considering the Rademacher averages of the entire class as the complex-ity measure,it is possible to consider the Rademacher averages of a small subset of the class,usually the intersection of the class with a ball centered at a function of interest.These local Rademacher averages can serve as a complexity measure;clearly,they are always smaller than the corresponding global averages.Several authors have considered the use of local estimates of the complexity of the function class in order to obtain better bounds. Before presenting their results,we introduce some notation which is used throughout the paper.Let(X,P)be a probability space.Denote by F a class of measurable func-tions from X to R,and set X1,...,X n to be independent random variables distributed according to P.Letσ1,...,σn be n independent Rademacher random variables,that is,independent random variables for which Pr(σi= 1)=Pr(σi=−1)=1/2.For a function f:X→R,defineP n f=1nni=1σi f(X i).For a class F,setR n F=supf∈FR n f.Define Eσto be the expectation with respect to the random variablesσ1,...,σn, conditioned on all of the other random variables.The Rademacher averageof F is E R n F,and the empirical(or conditional)Rademacher averages of FareEσR n F=1rx/n+4P.L.BARTLETT,O.BOUSQUET AND S.MENDELSONc3/n,which can be computed from the data.Forˆr N defined byˆr0=1,ˆr k+1=φn(ˆr k),they show that with probability at least1−2Ne−x,2xPˆf≤ˆr N+r)≥EσR n{f∈F:P n f≤r},and if the number of iterations N is at least1+⌈log2log2n/x⌉,then with probability at least1−Ne−x,ˆr N≤c ˆr∗+xr)=bining the above results,one has a procedure to obtain data-dependent error bounds that are of the order of thefixed point of the modulus of continuity at0of the empirical Rademacher averages.One limitation of this result is that it assumes that there is a function f∗in the class with P f∗=0.In contrast,we are interested in prediction problems where P f is the error of an estimator, and in the presence of noise there may not be any perfect estimator(even the best in the class can have nonzero error).More recently,Bousquet,Koltchinskii and Panchenko[9]have obtained a more general result avoiding the iterative procedure.Their result is that for functions with values in[0,1],with probability at least1−e−x,∀f∈F P f≤c P n f+ˆr∗+t+log log nr)≥EσR n{f∈F:P n f≤r}.The main difference between this and the results of[16]is that there is no 
requirement that the class contain a perfect function.However,the local Rademacher averages are centered around the zero function instead of the one that minimizes P f.As a consequence,thefixed pointˆr∗cannot be expected to converge to zero when inf f∈F P f>0.In order to remove this limitation,Lugosi and Wegkamp[19]use localized Rademacher averages of a small ball around the minimizerˆf of P n.However, their result is restricted to nonnegative functions,and in particular functions with values in{0,1}.Moreover,their bounds also involve some global in-formation,in the form of the shatter coefficients S F(X n1)of the function class(i.e.,the cardinality of the coordinate projections of the class F onLOCAL RADEMACHER COMPLEXITIES5 the data X n1).They show that there are constants c1,c2such that,with probability at least1−8/n,the empirical minimizerˆf satisfiesP f+2 ψn(ˆr n),Pˆf≤inff∈Fwhereψn(r)=c1 EσR n{f∈F:P n f≤16P nˆf+15r}+log n log n P nˆf+randˆr n=c2(log S F(X n1)+log n)/n.The limitation of this result is thatˆr n has to be chosen according to the(empirically measured)complexity of the whole class,which may not be as sharp as the Rademacher averages,and in general,is not afixed point of ψn.Moreover,the balls over which the Rademacher averages are computed in ψn contain a factor of16in front of P nˆf.As we explain later,this induces a lower bound on ψn when there is no function with P f=0in the class.It seems that the only way to capture the right behavior in the general, noisy case is to analyze the increments of the empirical process,in other words,to directly consider the functions f−f∗.This approach wasfirst proposed by Massart[22];see also[26].Massart introduces the assumption Var[ℓf(X)−ℓf∗(X)]≤d2(f,f∗)≤B(Pℓf−Pℓf∗),whereℓf is the loss associated with the function f[in other words,ℓf(X,Y)=ℓ(f(X),Y),which measures the discrepancy in the prediction made by f],d is a pseudometric and f∗minimizes the expected loss.(The previous results could also be stated in terms of loss functions,but we omitted this in order to simplify exposition.However,the extra notation is necessary to properly state Massart’s result.)This is a more refined version of the assumption we mentioned earlier on the relationship between the variance and expectation of the increments of the empirical process.It is only satisfied for some loss functionsℓand function classes F.Under this assumption,Massart considers a nondecreasing functionψsatisfying|P f−P f∗−P n f+P n f∗|+c xψ(r)≥E supf∈F,d2(f,f∗)2≤rr is nonincreasing(we refer to this property as the sub-root property later in the paper).Then,with probability at least1−e−x,∀f∈F Pℓf−Pℓf∗≤c r∗+x6P.L.BARTLETT,O.BOUSQUET AND S.MENDELSONsituations of interest,this bound suffices to prove minimax rates of conver-gence for penalized M-estimators.(Massart considers examples where the complexity term can be bounded using a priori global information about the function class.)However,the main limitation of this result is that it does not involve quantities that can be computed from the data.Finally,as we mentioned earlier,Mendelson[26]gives an analysis similar to that of Massart,in a slightly less general case(with no noise in the target values,i.e.,the conditional distribution of Y given X is concentrated at one point).Mendelson introduces the notion of the star-hull of a class of functions(see the next section for a definition)and considers Rademacher averages of this star-hull as a localized measure of complexity.His results also involve a priori knowledge of the class,such as the 
rate of growth of covering numbers.We can now spell out our goal in more detail:in this paper we com-bine the increment-based approach of Massart and Mendelson(dealing with differences of functions,or more generally with bounded real-valued func-tions)with the empirical local Rademacher approach of Koltchinskii and Panchenko and of Lugosi and Wegkamp,in order to obtain data-dependent bounds which depend on afixed point of the modulus of continuity of Rademacher averages computed around the empirically best function.Ourfirst main result(Theorem3.3)is a distribution-dependent result involving thefixed point r∗of a local Rademacher average of the star-hull of the class F.This shows that functions with the sub-root property can readily be obtained from Rademacher averages,while in previous work the appropriate functions were obtained only via global information about the class.The second main result(Theorems4.1and4.2)is an empirical counterpart of thefirst one,where the complexity is thefixed point of an empirical local Rademacher average.We also show that thisfixed point is within a constant factor of the nonempirical one.Equipped with this result,we can then prove(Theorem5.4)a fully data-dependent analogue of Massart’s result,where the Rademacher averages are localized around the minimizer of the empirical loss.We also show(Theorem6.3)that in the context of classification,the local Rademacher averages of star-hulls can be approximated by solving a weighted empirical error minimization problem.Ourfinal result(Corollary6.7)concerns regression with kernel classes, that is,classes of functions that are generated by a positive definite ker-nel.These classes are widely used in interpolation and estimation problems as they yield computationally efficient algorithms.Our result gives a data-dependent complexity term that can be computed directly from the eigen-values of the Gram matrix(the matrix whose entries are values of the kernel on the data).LOCAL RADEMACHER COMPLEXITIES7 The sharpness of our results is demonstrated from the fact that we recover, in the distribution-dependent case(treated in Section4),similar results to those of Massart[22],which,in the situations where they apply,give the minimax optimal rates or the best known results.Moreover,the data-dependent bounds that we obtain as counterparts of these results have the same rate of convergence(see Theorem4.2).The paper is organized as follows.In Section2we present some prelimi-nary results obtained from concentration inequalities,which we use through-out.Section3establishes error bounds using local Rademacher averages and explains how to compute theirfixed points from“global information”(e.g., estimates of the metric entropy or of the combinatorial dimensions of the indexing class),in which case the optimal estimates can be recovered.In Section4we give a data-dependent error bound using empirical and local Rademacher averages,and show the connection between thefixed points of the empirical and nonempirical Rademacher averages.In Section5we ap-ply our results to loss classes.We give estimates that generalize the results of Koltchinskii and Panchenko by eliminating the requirement that some function in the class have zero loss,and are more general than those of Lugosi and Wegkamp,since there is no need have in our case to estimate global shatter coefficients of the class.We also give a data-dependent exten-sion of Massart’s result where the local averages are computed around the minimizer of the empirical loss.Finally,Section6shows that 
the problem of estimating these local Rademacher averages in classification reduces to weighted empirical risk minimization.It also shows that the local averages for kernel classes can be sharply bounded in terms of the eigenvalues of the Gram matrix.2.Preliminary results.Recall that the star-hull of F around f0is de-fined bystar(F,f0)={f0+α(f−f0):f∈F,α∈[0,1]}. Throughout this paper,we will manipulate suprema of empirical processes, that is,quantities of the form sup f∈F(P f−P n f).We will always assume they are measurable without explicitly mentioning it.In other words,we assume that the class F and the distribution P satisfy appropriate(mild) conditions for measurability of this supremum(we refer to[11,28]for a detailed account of such issues).The following theorem is the main result of this section and is at the core of all the proofs presented later.It shows that if the functions in a class have small variance,the maximal deviation between empirical means and true means is controlled by the Rademacher averages of F.In particular, the bound improves as the largest variance of a class member decreases.8P.L.BARTLETT,O.BOUSQUET AND S.MENDELSON Theorem2.1.Let F be a class of functions that map X into[a,b]. Assume that there is some r>0such that for every f∈F,Var[f(X i)]≤r. Then,for every x>0,with probability at least1−e−x,sup f∈F (P f−P n f)≤infα>0 2(1+α)E R n F+n+(b−a) 1α x 1−αEσR n F+ n+(b−a) 1α+1+αn .Moreover,the same results hold for the quantity sup f∈F(P n f−P f).This theorem,which is proved in Appendix A.2,is a more or less directconsequence of Talagrand’s inequality for empirical processes[30].However,the actual statement presented here is new in the sense that it displays thebest known constants.Indeed,compared to the previous result of Koltchin-skii and Panchenko[16]which was based on Massart’s version of Talagrand’sinequality[21],we have used the most refined concentration inequalitiesavailable:that of Bousquet[7]for the supremum of the empirical process and that of Boucheron,Lugosi and Massart[5]for the Rademacher averages.This last inequality is a powerful tool to obtain data-dependent bounds,since it allows one to replace the Rademacher average(which measures thecomplexity of the class of functions)by its empirical version,which can beefficiently computed in some cases.Details about these inequalities are givenin Appendix A.1.When applied to the full function class F,the above theorem is not useful.Indeed,with only a trivial bound on the maximal variance,better resultscan be obtained via simpler concentration inequalities,such as the boundeddifference inequality[23],which would allow x/n. 
However,by applying Theorem2.1to subsets of F or to modified classesobtained from F,much better results can be obtained.Hence,the presence ofan upper bound on the variance in the square root term is the key ingredientof this result.A last preliminary result that we will require is the following consequenceof Theorem2.1,which shows that if the local Rademacher averages are small,then balls in L2(P)are probably contained in the corresponding empiricalballs[i.e.,in L2(P n)]with a slightly larger radius.Corollary2.2.Let F be a class of functions that map X into[−b,b] with b>0.For every x>0and r that satisfyr≥10b E R n{f:f∈F,P f2≤r}+11b2xLOCAL RADEMACHER COMPLEXITIES9 then with probability at least1−e−x,{f∈F:P f2≤r}⊆{f∈F:P n f2≤2r}.Proof.Since the range of any function in the set F r={f2:f∈F, P f2≤r}is contained in[0,b2],it follows that Var[f2(X i)]≤P f4≤b2P f2≤b2r.Thus,by thefirst part of Theorem2.1(withα=1/4),with probability at least1−e−x,every f∈F r satisfiesP n f2≤r+52b2rx3n≤r+52+16b2x2+16b2xr is nonincreasing for r>0.We only consider nontrivial sub-root functions,that is,sub-root functions that are not the constant functionψ≡0.10P.L.BARTLETT,O.BOUSQUET AND S.MENDELSON Lemma3.2.Ifψ:[0,∞)→[0,∞)is a nontrivial sub-root function,then it is continuous on[0,∞)and the equationψ(r)=r has a unique positive solution.Moreover,if we denote the solution by r∗,then for all r>0,r≥ψ(r)if and only if r∗≤r.The proof of this lemma is in Appendix A.2.In view of the lemma,we will simply refer to the quantity r∗as the unique positive solution ofψ(r)=r, or as thefixed point ofψ.3.1.Error bounds.We can now state and discuss the main result of this section.It is composed of two parts:in thefirst part,one requires a sub-root upper bound on the local Rademacher averages,and in the second part,it is shown that better results can be obtained when the class over which the averages are computed is enlarged slightly.Theorem3.3.Let F be a class of functions with ranges in[a,b]and assume that there are some functional T:F→R+and some constant B such that for every f∈F,Var[f]≤T(f)≤BP f.Letψbe a sub-root function and let r∗be thefixed point ofψ.1.Assume thatψsatisfies,for any r≥r∗,ψ(r)≥B E R n{f∈F:T(f)≤r}.Then,with c1=704and c2=26,for any K>1and every x>0,with probability at least1−e−x,∀f∈F P f≤K B r∗+x(11(b−a)+c2BK)K P f+c1Kn.2.If,in addition,for f∈F andα∈[0,1],T(αf)≤α2T(f),and ifψsatisfies,for any r≥r∗,ψ(r)≥B E R n{f∈star(F,0):T(f)≤r},then the same results hold true with c1=6and c2=5.The proof of this theorem is given in Section3.2.We can compare the results to our starting point(Theorem2.1).The improvement comes from the fact that the complexity term,which was es-sentially sup rψ(r)in Theorem2.1(if we had applied it to the class F di-rectly)is now reduced to r∗,thefixed point ofψ.So the complexity term is always smaller(later,we show how to estimate r∗).On the other hand,LOCAL RADEMACHER COMPLEXITIES11 there is some loss since the constant in front of P n f is strictly larger than1. 
Section5.2will show that this is not an issue in the applications we have in mind.In Sections5.1and5.2we investigate conditions that ensure the assump-tions of this theorem are satisfied,and we provide applications of this result to prediction problems.The condition that the variance is upper bounded by the expectation turns out to be crucial to obtain these results.The idea behind Theorem3.3originates in the work of Massart[22],who proves a slightly different version of thefirst part.The difference is that we use local Rademacher averages instead of the expectation of the supremum of the empirical process on a ball.Moreover,we give smaller constants.As far as we know,the second part of Theorem3.3is new.3.1.1.Choosing the functionψ.Notice that the functionψcannot be chosen arbitrarily and has to satisfy the sub-root property.One possible approach is to use classical upper bounds on the Rademacher averages,such as Dudley’s entropy integral.This can give a sub-root upper bound and was used,for example,in[16]and in[22].However,the second part of Theorem3.3indicates a possible choice for ψ,namely,one can takeψas the local Rademacher averages of the star-hull of F around0.The reason for this comes from the following lemma, which shows that if the class is star-shaped and T(f)behaves as a quadratic function,the Rademacher averages are sub-root.Lemma3.4.If the class F is star-shaped aroundˆf(which may depend on the data),and T:F→R+is a(possibly random)function that satis-fies T(αf)≤α2T(f)for any f∈F and anyα∈[0,1],then the(random) functionψdefined for r≥0byψ(r)=EσR n{f∈F:T(f−ˆf)≤r}is sub-root and r→Eψ(r)is also sub-root.This lemma is proved in Appendix A.2.Notice that making a class star-shaped only increases it,so thatE R n{f∈star(F,f0):T(f)≤r}≥E R n{f∈F:T(f)≤r}. However,this increase in size is moderate as can be seen,for example,if one compares covering numbers of a class and its star-hull(see,e.g.,[26], Lemma4.5).12P.L.BARTLETT,O.BOUSQUET AND S.MENDELSON3.1.2.Some consequences.As a consequence of Theorem3.3,we obtain an error bound when F consists of uniformly bounded nonnegative functions. 
Notice that in this case the variance is trivially bounded by a constant times the expectation and one can directly use T(f)=P f.Corollary3.5.Let F be a class of functions with ranges in[0,1].Let ψbe a sub-root function,such that for all r≥0,E R n{f∈F:P f≤r}≤ψ(r),and let r∗be thefixed point ofψ.Then,for any K>1and every x>0,with probability at least1−e−x,every f∈F satisfiesP f≤Kn.Also,with probability at least1−e−x,every f∈F satisfiesP n f≤K+1n.Proof.When f∈[0,1],we have Var[f]≤P f so that the result follows from applying Theorem3.3with T(f)=P f.We also note that the same idea as in the proof of Theorem3.3gives a converse of Corollary2.2,namely,that with high probability the intersection of F with an empirical ball of afixed radius is contained in the intersection of F with an L2(P)ball with a slightly larger radius.Lemma3.6.Let F be a class of functions that map X into[−1,1].Fix x>0.Ifr≥20E R n{f:f∈star(F,0),P f2≤r}+26xLOCAL RADEMACHER COMPLEXITIES13 Corollary3.7.Let F be a class of{0,1}-valued functions with VC-dimen-sion d<∞.Then for all K>1and every x>0,with probability at least1−e−x,every f∈F satisfiesP f≤Kn+x14P.L.BARTLETT,O.BOUSQUET AND S.MENDELSON(b)Upper bound the Rademacher averages of this weighted class,by“peeling off”subclasses of F according to the variance of their elements,and bounding the Rademacher averages of these subclasses usingψ.(c)Use the sub-root property ofψ,so that itsfixed point gives a common upper bound on the complexity of all the subclasses(up to some scaling).(d)Finally,convert the upper bound for functions in the weighted classinto a bound for functions in the initial class.The idea of peeling—that is,of partitioning the class F into slices wherefunctions have variance within a certain range—is at the core of the proof of thefirst part of Theorem3.3[see,e.g.,(3.1)].However,it does not appearexplicitly in the proof of the second part.One explanation is that when oneconsiders the star-hull of the class,it is enough to consider two subclasses:the functions with T(f)≤r and the ones with T(f)>r,and this is done by introducing the weighting factor T(f)∨r.This idea was exploited inthe work of Mendelson[26]and,more recently,in[4].Moreover,when oneconsiders the set F r=star(F,0)∩{T(f)≤r},any function f′∈F with T(f′)>r will have a scaled down representative in that set.So even though it seems that we look at the class star(F,0)only locally,we still take intoaccount all of the functions in F(with appropriate scaling).3.2.Proofs.Before presenting the proof,let usfirst introduce some ad-ditional notation.Given a class F,λ>1and r>0,let w(f)=min{rλk:k∈N,rλk≥T(f)}and setG r= rT(f)∨r:f∈F ,and define˜V+ r =supg∈˜G rP g−P n g and˜V−r=supg∈˜G rP n g−P g.Lemma3.8.With the above notation,assume that there is a constant B>0such that for every f∈F,T(f)≤BP f.Fix K>1,λ>0and r>0.LOCAL RADEMACHER COMPLEXITIES15If V+r≤r/(λBK),then∀f∈F P f≤KλBK.Also,if V−r≤r/(λBK),then∀f∈F P n f≤K+1λBK. Similarly,if K>1and r>0are such that˜V+r≤r/(BK),then∀f∈F P f≤K BK.Also,if˜V−r≤r/(BK),then∀f∈F P n f≤K+1BK.Proof.Notice that for all g∈G r,P g≤P n g+V+r.Fix f∈F and define g=rf/w(f).When T(f)≤r,w(f)=r,so that g=f.Thus,the fact that P g≤P n g+V+r implies that P f≤P n f+V+r≤P n f+r/(λBK).On the other hand,if T(f)>r,then w(f)=rλk with k>0and T(f)∈(rλk−1,rλk].Moreover,g=f/λk,P g≤P n g+V+r,and thusP fλk+V+r.Using the fact that T(f)>rλk−1,it follows thatP f≤P n f+λk V+r<P n f+λT(f)V+r/r≤P n f+P f/K. 
Rearranging,P f≤KK−1P n f+r2rx3+1n.16P.L.BARTLETT,O.BOUSQUET AND S.MENDELSONLet F(x,y):={f∈F:x≤T(f)≤y}and define k to be the smallest integer such that rλk+1≥Bb.ThenE R n G r≤E R n F(0,r)+E supf∈F(r,Bb)rw(f)R n f(3.1)=E R n F(0,r)+kj=0λ−j E supf∈F(rλj,rλj+1)R n f≤ψ(r)Bkj=0λ−jψ(rλj+1).By our assumption it follows that forβ≥1,ψ(βr)≤√Bψ(r) 1+√r/r∗ψ(r∗)=√B √2rx3+1n.Set A=10(1+α)√2x/n and C=(b−a)(1/3+1/α)x/n,and note that V+r≤A√r+C=r/(λBK).It satisfies r0≥λ2A2B2K2/2≥r∗and r0≤(λBK)2A2+2λBKC,so that applying Lemma3.8, it follows that every f∈F satisfiesP f≤KK−1P n f+λBK 100(1+α)2r∗/B2+20(1+α)2xr∗n+(b−a) 1α x2xr∗/n≤Bx/(5n)+ 5r∗/(2B)completes the proof of thefirst statement.The second statement is proved in the same way,by considering V−r instead of V+r.LOCAL RADEMACHER COMPLEXITIES17 Proof of Theorem3.3,second part.The proof of this result uses the same argument as for thefirst part.However,we consider the class˜G rdefined above.One can easily check that˜G r⊂{f∈star(F,0):T(f)≤r}, and thus E R n˜G r≤ψ(r)/B.Applying Theorem2.1to˜G r,it follows that,for all x>0,with probability1−e−x,˜V+ r≤2(1+α)2rx3+1n.The reasoning is then the same as for thefirst part,and we use in the very last step thatn .(3.2)Clearly,if f∈F,then f2maps to[0,1]and Var[f2]≤P f2.Thus,Theo-rem2.1can be applied to the class G r={rf2/(P f2∨r):f∈F},whose functions have range in[0,1]and variance bounded by r.Therefore,with probability at least1−e−x,every f∈F satisfiesr P f2−P n f22rx3+1n.Selectα=1/4and notice thatP f2∨r≤52+19xr 54+19x18P.L.BARTLETT,O.BOUSQUET AND S.MENDELSON4.Data-dependent error bounds.The results presented thus far use distribution-dependent measures of complexity of the class at hand.In-deed,the sub-root functionψof Theorem3.3is bounded in terms of theRademacher averages of the star-hull of F,but these averages can only becomputed if one knows the distribution P.Otherwise,we have seen that it is possible to compute an upper bound on the Rademacher averages using apriori global or distribution-free knowledge about the complexity of the classat hand(such as the VC-dimension).In this section we present error boundsthat can be computed directly from the data,without a priori information. Instead of computingψ,we compute an estimate, ψn,of it.The function ψn is defined using the data and is an upper bound onψwith high probability.To simplify the exposition we restrict ourselves to the case where the func-tions have a range which is symmetric around zero,say[−1,1].Moreover, we can only treat the special case where T(f)=P f2,but this is a minor restriction as in most applications this is the function of interest[i.e.,for which one can show T(f)≤BP f].4.1.Results.We now present the main result of this section,which givesan analogue of the second part of Theorem3.3,with a completely empiricalbound(i.e.,the bound can be computed from the data only).Theorem4.1.Let F be a class of functions with ranges in[−1,1]and assume that there is some constant B such that for every f∈F,P f2≤BP f. Let ψn be a sub-root function and letˆr∗be thefixed point of ψn.Fix x>0 and assume that ψn satisfies,for any r≥ˆr∗,ψn(r)≥c1EσR n{f∈star(F,0):P n f2≤2r}+c2xK−1P n f+6Kn.Also,with probability at least1−3e−x,∀f∈F P n f≤K+1Bˆr∗+x(11+5BK)。
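The quantity at the center of the excerpt above, the empirical (conditional) Rademacher average E_sigma sup_{f in F} (1/n) sum_i sigma_i f(X_i), together with its local variant restricted to functions with small empirical mean, can be estimated numerically for a small finite class. The sketch below is not from the paper: the class of threshold functions, the simulated sample, the number of Monte Carlo sign draws and the radius r are all illustrative choices.

```python
# Monte Carlo estimate of the empirical (conditional) Rademacher average
#   E_sigma sup_{f in F} (1/n) * sum_i sigma_i f(X_i)
# for a small finite function class on simulated data. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.uniform(-1, 1, size=n)

# A finite class F: threshold functions f_t(x) = 1{x <= t} on a grid of t.
thresholds = np.linspace(-1, 1, 21)
F = np.array([(X <= t).astype(float) for t in thresholds])  # shape (|F|, n)

def empirical_rademacher(F_values, n_draws=2000, rng=rng):
    n = F_values.shape[1]
    total = 0.0
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)   # Rademacher signs
        total += np.max(F_values @ sigma) / n     # sup over f in F
    return total / n_draws

print("empirical Rademacher average:", empirical_rademacher(F))

# A "local" version: restrict to functions with small empirical mean P_n f <= r.
r = 0.3
local_F = F[F.mean(axis=1) <= r]
print("local (P_n f <= r):", empirical_rademacher(local_F))
```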
From Data Mining to Knowledge Discovery in Databases
s Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media atten-tion of late. What is all the excitement about?This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases. The article mentions particular real-world applications, specific data-mining techniques, challenges in-volved in real-world applications of knowledge discovery, and current and future research direc-tions in the field.A cross a wide variety of fields, data arebeing collected and accumulated at adramatic pace. There is an urgent need for a new generation of computational theo-ries and tools to assist humans in extracting useful information (knowledge) from the rapidly growing volumes of digital data. These theories and tools are the subject of the emerging field of knowledge discovery in databases (KDD).At an abstract level, the KDD field is con-cerned with the development of methods and techniques for making sense of data. The basic problem addressed by the KDD process is one of mapping low-level data (which are typically too voluminous to understand and digest easi-ly) into other forms that might be more com-pact (for example, a short report), more ab-stract (for example, a descriptive approximation or model of the process that generated the data), or more useful (for exam-ple, a predictive model for estimating the val-ue of future cases). At the core of the process is the application of specific data-mining meth-ods for pattern discovery and extraction.1This article begins by discussing the histori-cal context of KDD and data mining and theirintersection with other related fields. A briefsummary of recent KDD real-world applica-tions is provided. Definitions of KDD and da-ta mining are provided, and the general mul-tistep KDD process is outlined. This multistepprocess has the application of data-mining al-gorithms as one particular step in the process.The data-mining step is discussed in more de-tail in the context of specific data-mining al-gorithms and their application. Real-worldpractical application issues are also outlined.Finally, the article enumerates challenges forfuture research and development and in par-ticular discusses potential opportunities for AItechnology in KDD systems.Why Do We Need KDD?The traditional method of turning data intoknowledge relies on manual analysis and in-terpretation. For example, in the health-careindustry, it is common for specialists to peri-odically analyze current trends and changesin health-care data, say, on a quarterly basis.The specialists then provide a report detailingthe analysis to the sponsoring health-care or-ganization; this report becomes the basis forfuture decision making and planning forhealth-care management. In a totally differ-ent type of application, planetary geologistssift through remotely sensed images of plan-ets and asteroids, carefully locating and cata-loging such geologic objects of interest as im-pact craters. Be it science, marketing, finance,health care, retail, or any other field, the clas-sical approach to data analysis relies funda-mentally on one or more analysts becomingArticlesFALL 1996 37From Data Mining to Knowledge Discovery inDatabasesUsama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth Copyright © 1996, American Association for Artificial Intelligence. All rights reserved. 
0738-4602-1996 / $2.00areas is astronomy. Here, a notable success was achieved by SKICAT ,a system used by as-tronomers to perform image analysis,classification, and cataloging of sky objects from sky-survey images (Fayyad, Djorgovski,and Weir 1996). In its first application, the system was used to process the 3 terabytes (1012bytes) of image data resulting from the Second Palomar Observatory Sky Survey,where it is estimated that on the order of 109sky objects are detectable. SKICAT can outper-form humans and traditional computational techniques in classifying faint sky objects. See Fayyad, Haussler, and Stolorz (1996) for a sur-vey of scientific applications.In business, main KDD application areas includes marketing, finance (especially in-vestment), fraud detection, manufacturing,telecommunications, and Internet agents.Marketing:In marketing, the primary ap-plication is database marketing systems,which analyze customer databases to identify different customer groups and forecast their behavior. Business Week (Berry 1994) estimat-ed that over half of all retailers are using or planning to use database marketing, and those who do use it have good results; for ex-ample, American Express reports a 10- to 15-percent increase in credit-card use. Another notable marketing application is market-bas-ket analysis (Agrawal et al. 1996) systems,which find patterns such as, “If customer bought X, he/she is also likely to buy Y and Z.” Such patterns are valuable to retailers.Investment: Numerous companies use da-ta mining for investment, but most do not describe their systems. One exception is LBS Capital Management. Its system uses expert systems, neural nets, and genetic algorithms to manage portfolios totaling $600 million;since its start in 1993, the system has outper-formed the broad stock market (Hall, Mani,and Barr 1996).Fraud detection: HNC Falcon and Nestor PRISM systems are used for monitoring credit-card fraud, watching over millions of ac-counts. The FAIS system (Senator et al. 1995),from the U.S. Treasury Financial Crimes En-forcement Network, is used to identify finan-cial transactions that might indicate money-laundering activity.Manufacturing: The CASSIOPEE trou-bleshooting system, developed as part of a joint venture between General Electric and SNECMA, was applied by three major Euro-pean airlines to diagnose and predict prob-lems for the Boeing 737. To derive families of faults, clustering methods are used. CASSIOPEE received the European first prize for innova-intimately familiar with the data and serving as an interface between the data and the users and products.For these (and many other) applications,this form of manual probing of a data set is slow, expensive, and highly subjective. In fact, as data volumes grow dramatically, this type of manual data analysis is becoming completely impractical in many domains.Databases are increasing in size in two ways:(1) the number N of records or objects in the database and (2) the number d of fields or at-tributes to an object. Databases containing on the order of N = 109objects are becoming in-creasingly common, for example, in the as-tronomical sciences. Similarly, the number of fields d can easily be on the order of 102or even 103, for example, in medical diagnostic applications. Who could be expected to di-gest millions of records, each having tens or hundreds of fields? 
We believe that this job is certainly not one for humans; hence, analysis work needs to be automated, at least partially.The need to scale up human analysis capa-bilities to handling the large number of bytes that we can collect is both economic and sci-entific. Businesses use data to gain competi-tive advantage, increase efficiency, and pro-vide more valuable services to customers.Data we capture about our environment are the basic evidence we use to build theories and models of the universe we live in. Be-cause computers have enabled humans to gather more data than we can digest, it is on-ly natural to turn to computational tech-niques to help us unearth meaningful pat-terns and structures from the massive volumes of data. Hence, KDD is an attempt to address a problem that the digital informa-tion era made a fact of life for all of us: data overload.Data Mining and Knowledge Discovery in the Real WorldA large degree of the current interest in KDD is the result of the media interest surrounding successful KDD applications, for example, the focus articles within the last two years in Business Week , Newsweek , Byte , PC Week , and other large-circulation periodicals. Unfortu-nately, it is not always easy to separate fact from media hype. Nonetheless, several well-documented examples of successful systems can rightly be referred to as KDD applications and have been deployed in operational use on large-scale real-world problems in science and in business.In science, one of the primary applicationThere is an urgent need for a new generation of computation-al theories and tools toassist humans in extractinguseful information (knowledge)from the rapidly growing volumes ofdigital data.Articles38AI MAGAZINEtive applications (Manago and Auriol 1996).Telecommunications: The telecommuni-cations alarm-sequence analyzer (TASA) wasbuilt in cooperation with a manufacturer oftelecommunications equipment and threetelephone networks (Mannila, Toivonen, andVerkamo 1995). The system uses a novelframework for locating frequently occurringalarm episodes from the alarm stream andpresenting them as rules. Large sets of discov-ered rules can be explored with flexible infor-mation-retrieval tools supporting interactivityand iteration. In this way, TASA offers pruning,grouping, and ordering tools to refine the re-sults of a basic brute-force search for rules.Data cleaning: The MERGE-PURGE systemwas applied to the identification of duplicatewelfare claims (Hernandez and Stolfo 1995).It was used successfully on data from the Wel-fare Department of the State of Washington.In other areas, a well-publicized system isIBM’s ADVANCED SCOUT,a specialized data-min-ing system that helps National Basketball As-sociation (NBA) coaches organize and inter-pret data from NBA games (U.S. News 1995). ADVANCED SCOUT was used by several of the NBA teams in 1996, including the Seattle Su-personics, which reached the NBA finals.Finally, a novel and increasingly importanttype of discovery is one based on the use of in-telligent agents to navigate through an infor-mation-rich environment. Although the ideaof active triggers has long been analyzed in thedatabase field, really successful applications ofthis idea appeared only with the advent of theInternet. These systems ask the user to specifya profile of interest and search for related in-formation among a wide variety of public-do-main and proprietary sources. 
For example, FIREFLY is a personal music-recommendation agent: It asks a user his/her opinion of several music pieces and then suggests other music that the user might like (<http:// www.ffl/>). CRAYON(/>) allows users to create their own free newspaper (supported by ads); NEWSHOUND(<http://www. /hound/>) from the San Jose Mercury News and FARCAST(</> automatically search information from a wide variety of sources, including newspapers and wire services, and e-mail rele-vant documents directly to the user.These are just a few of the numerous suchsystems that use KDD techniques to automat-ically produce useful information from largemasses of raw data. See Piatetsky-Shapiro etal. (1996) for an overview of issues in devel-oping industrial KDD applications.Data Mining and KDDHistorically, the notion of finding useful pat-terns in data has been given a variety ofnames, including data mining, knowledge ex-traction, information discovery, informationharvesting, data archaeology, and data patternprocessing. The term data mining has mostlybeen used by statisticians, data analysts, andthe management information systems (MIS)communities. It has also gained popularity inthe database field. The phrase knowledge dis-covery in databases was coined at the first KDDworkshop in 1989 (Piatetsky-Shapiro 1991) toemphasize that knowledge is the end productof a data-driven discovery. It has been popular-ized in the AI and machine-learning fields.In our view, KDD refers to the overall pro-cess of discovering useful knowledge from da-ta, and data mining refers to a particular stepin this process. Data mining is the applicationof specific algorithms for extracting patternsfrom data. The distinction between the KDDprocess and the data-mining step (within theprocess) is a central point of this article. Theadditional steps in the KDD process, such asdata preparation, data selection, data cleaning,incorporation of appropriate prior knowledge,and proper interpretation of the results ofmining, are essential to ensure that usefulknowledge is derived from the data. Blind ap-plication of data-mining methods (rightly crit-icized as data dredging in the statistical litera-ture) can be a dangerous activity, easilyleading to the discovery of meaningless andinvalid patterns.The Interdisciplinary Nature of KDDKDD has evolved, and continues to evolve,from the intersection of research fields such asmachine learning, pattern recognition,databases, statistics, AI, knowledge acquisitionfor expert systems, data visualization, andhigh-performance computing. The unifyinggoal is extracting high-level knowledge fromlow-level data in the context of large data sets.The data-mining component of KDD cur-rently relies heavily on known techniquesfrom machine learning, pattern recognition,and statistics to find patterns from data in thedata-mining step of the KDD process. A natu-ral question is, How is KDD different from pat-tern recognition or machine learning (and re-lated fields)? The answer is that these fieldsprovide some of the data-mining methodsthat are used in the data-mining step of theKDD process. KDD focuses on the overall pro-cess of knowledge discovery from data, includ-ing how the data are stored and accessed, howalgorithms can be scaled to massive data setsThe basicproblemaddressed bythe KDDprocess isone ofmappinglow-leveldata intoother formsthat might bemorecompact,moreabstract,or moreuseful.ArticlesFALL 1996 39A driving force behind KDD is the database field (the second D in KDD). 
Indeed, the problem of effective data manipulation when data cannot fit in the main memory is of fun-damental importance to KDD. Database tech-niques for gaining efficient data access,grouping and ordering operations when ac-cessing data, and optimizing queries consti-tute the basics for scaling algorithms to larger data sets. Most data-mining algorithms from statistics, pattern recognition, and machine learning assume data are in the main memo-ry and pay no attention to how the algorithm breaks down if only limited views of the data are possible.A related field evolving from databases is data warehousing,which refers to the popular business trend of collecting and cleaning transactional data to make them available for online analysis and decision support. Data warehousing helps set the stage for KDD in two important ways: (1) data cleaning and (2)data access.Data cleaning: As organizations are forced to think about a unified logical view of the wide variety of data and databases they pos-sess, they have to address the issues of map-ping data to a single naming convention,uniformly representing and handling missing data, and handling noise and errors when possible.Data access: Uniform and well-defined methods must be created for accessing the da-ta and providing access paths to data that were historically difficult to get to (for exam-ple, stored offline).Once organizations and individuals have solved the problem of how to store and ac-cess their data, the natural next step is the question, What else do we do with all the da-ta? This is where opportunities for KDD natu-rally arise.A popular approach for analysis of data warehouses is called online analytical processing (OLAP), named for a set of principles pro-posed by Codd (1993). OLAP tools focus on providing multidimensional data analysis,which is superior to SQL in computing sum-maries and breakdowns along many dimen-sions. OLAP tools are targeted toward simpli-fying and supporting interactive data analysis,but the goal of KDD tools is to automate as much of the process as possible. Thus, KDD is a step beyond what is currently supported by most standard database systems.Basic DefinitionsKDD is the nontrivial process of identifying valid, novel, potentially useful, and ultimate-and still run efficiently, how results can be in-terpreted and visualized, and how the overall man-machine interaction can usefully be modeled and supported. The KDD process can be viewed as a multidisciplinary activity that encompasses techniques beyond the scope of any one particular discipline such as machine learning. In this context, there are clear opportunities for other fields of AI (be-sides machine learning) to contribute to KDD. KDD places a special emphasis on find-ing understandable patterns that can be inter-preted as useful or interesting knowledge.Thus, for example, neural networks, although a powerful modeling tool, are relatively difficult to understand compared to decision trees. KDD also emphasizes scaling and ro-bustness properties of modeling algorithms for large noisy data sets.Related AI research fields include machine discovery, which targets the discovery of em-pirical laws from observation and experimen-tation (Shrager and Langley 1990) (see Kloes-gen and Zytkow [1996] for a glossary of terms common to KDD and machine discovery),and causal modeling for the inference of causal models from data (Spirtes, Glymour,and Scheines 1993). 
Statistics in particular has much in common with KDD (see Elder and Pregibon [1996] and Glymour et al.[1996] for a more detailed discussion of this synergy). Knowledge discovery from data is fundamentally a statistical endeavor. Statistics provides a language and framework for quan-tifying the uncertainty that results when one tries to infer general patterns from a particu-lar sample of an overall population. As men-tioned earlier, the term data mining has had negative connotations in statistics since the 1960s when computer-based data analysis techniques were first introduced. The concern arose because if one searches long enough in any data set (even randomly generated data),one can find patterns that appear to be statis-tically significant but, in fact, are not. Clearly,this issue is of fundamental importance to KDD. Substantial progress has been made in recent years in understanding such issues in statistics. Much of this work is of direct rele-vance to KDD. Thus, data mining is a legiti-mate activity as long as one understands how to do it correctly; data mining carried out poorly (without regard to the statistical as-pects of the problem) is to be avoided. KDD can also be viewed as encompassing a broader view of modeling than statistics. KDD aims to provide tools to automate (to the degree pos-sible) the entire process of data analysis and the statistician’s “art” of hypothesis selection.Data mining is a step in the KDD process that consists of ap-plying data analysis and discovery al-gorithms that produce a par-ticular enu-meration ofpatterns (or models)over the data.Articles40AI MAGAZINEly understandable patterns in data (Fayyad, Piatetsky-Shapiro, and Smyth 1996).Here, data are a set of facts (for example, cases in a database), and pattern is an expres-sion in some language describing a subset of the data or a model applicable to the subset. Hence, in our usage here, extracting a pattern also designates fitting a model to data; find-ing structure from data; or, in general, mak-ing any high-level description of a set of data. The term process implies that KDD comprises many steps, which involve data preparation, search for patterns, knowledge evaluation, and refinement, all repeated in multiple itera-tions. By nontrivial, we mean that some search or inference is involved; that is, it is not a straightforward computation of predefined quantities like computing the av-erage value of a set of numbers.The discovered patterns should be valid on new data with some degree of certainty. We also want patterns to be novel (at least to the system and preferably to the user) and poten-tially useful, that is, lead to some benefit to the user or task. Finally, the patterns should be understandable, if not immediately then after some postprocessing.The previous discussion implies that we can define quantitative measures for evaluating extracted patterns. In many cases, it is possi-ble to define measures of certainty (for exam-ple, estimated prediction accuracy on new data) or utility (for example, gain, perhaps indollars saved because of better predictions orspeedup in response time of a system). No-tions such as novelty and understandabilityare much more subjective. In certain contexts,understandability can be estimated by sim-plicity (for example, the number of bits to de-scribe a pattern). 
An important notion, called interestingness (for example, see Silberschatz and Tuzhilin [1995] and Piatetsky-Shapiro and Matheus [1994]), is usually taken as an overall measure of pattern value, combining validity, novelty, usefulness, and simplicity. Interestingness functions can be defined explicitly or can be manifested implicitly through an ordering placed by the KDD system on the discovered patterns or models. Given these notions, we can consider a pattern to be knowledge if it exceeds some interestingness threshold, which is by no means an attempt to define knowledge in the philosophical or even the popular view. As a matter of fact, knowledge in this definition is purely user oriented and domain specific and is determined by whatever functions and thresholds the user chooses.

Data mining is a step in the KDD process that consists of applying data analysis and discovery algorithms that, under acceptable computational efficiency limitations, produce a particular enumeration of patterns (or models) over the data. Note that the space of patterns is often infinite, and the enumeration of patterns involves some form of search in this space. Practical computational constraints place severe limits on the subspace that can be explored by a data-mining algorithm.

The KDD process involves using the database along with any required selection, preprocessing, subsampling, and transformations of it; applying data-mining methods (algorithms) to enumerate patterns from it; and evaluating the products of data mining to identify the subset of the enumerated patterns deemed knowledge. The data-mining component of the KDD process is concerned with the algorithmic means by which patterns are extracted and enumerated from data. The overall KDD process (figure 1) includes the evaluation and possible interpretation of the mined patterns to determine which patterns can be considered new knowledge. The KDD process also includes all the additional steps described in the next section.

The notion of an overall user-driven process is not unique to KDD: analogous proposals have been put forward both in statistics (Hand 1994) and in machine learning (Brodley and Smyth 1996).

The KDD Process

The KDD process is interactive and iterative, involving numerous steps with many decisions made by the user. Brachman and Anand (1996) give a practical view of the KDD process, emphasizing the interactive nature of the process. Here, we broadly outline some of its basic steps:

First is developing an understanding of the application domain and the relevant prior knowledge and identifying the goal of the KDD process from the customer's viewpoint.

Second is creating a target data set: selecting a data set, or focusing on a subset of variables or data samples, on which discovery is to be performed.

Third is data cleaning and preprocessing. Basic operations include removing noise if appropriate, collecting the necessary information to model or account for noise, deciding on strategies for handling missing data fields, and accounting for time-sequence information and known changes.

Fourth is data reduction and projection: finding useful features to represent the data depending on the goal of the task. With dimensionality reduction or transformation methods, the effective number of variables under consideration can be reduced, or invariant representations for the data can be found.

Fifth is matching the goals of the KDD process (step 1) to a particular data-mining method. For example, summarization, classification, regression, clustering, and so on, are described later as well as in Fayyad, Piatetsky-Shapiro, and Smyth (1996).

Sixth is exploratory analysis and model and hypothesis selection: choosing the data-mining algorithm(s) and selecting method(s) to be used for searching for data patterns. This process includes deciding which models and parameters might be appropriate (for example, models of categorical data are different than models of vectors over the reals) and matching a particular data-mining method with the overall criteria of the KDD process (for example, the end user might be more interested in understanding the model than its predictive capabilities).

Seventh is data mining: searching for patterns of interest in a particular representational form or a set of such representations, including classification rules or trees, regression, and clustering. The user can significantly aid the data-mining method by correctly performing the preceding steps.

Eighth is interpreting mined patterns, possibly returning to any of steps 1 through 7 for further iteration. This step can also involve visualization of the extracted patterns and models or visualization of the data given the extracted models.

Ninth is acting on the discovered knowledge: using the knowledge directly, incorporating the knowledge into another system for further action, or simply documenting it and reporting it to interested parties. This process also includes checking for and resolving potential conflicts with previously believed (or extracted) knowledge.

The KDD process can involve significant iteration and can contain loops between any two steps. The basic flow of steps (although not the potential multitude of iterations and loops) is illustrated in figure 1 (Figure 1. An Overview of the Steps That Compose the KDD Process). Most previous work on KDD has focused on step 7, the data mining. However, the other steps are as important (and probably more so) for the successful application of KDD in practice. Having defined the basic notions and introduced the KDD process, we now focus on the data-mining component, which has, by far, received the most attention in the literature.

The Data-Mining Step of the KDD Process

The data-mining component of the KDD process often involves repeated iterative application of particular data-mining methods. This section presents an overview of the primary goals of data mining, a description of the methods used to address these goals, and a brief description of the data-mining algorithms that incorporate these methods.

The knowledge discovery goals are defined by the intended use of the system. We can distinguish two types of goals: (1) verification and (2) discovery. With verification, the system is limited to verifying the user's hypothesis. With discovery, the system autonomously finds new patterns. We further subdivide the discovery goal into prediction, where the system finds patterns for predicting the future behavior of some entities, and description, where the system finds patterns for presentation to a user in a human-understandable form. In this article, we are primarily concerned with discovery-oriented data mining.

Data mining involves fitting models to, or determining patterns from, observed data. The fitted models play the role of inferred knowledge: Whether the models reflect useful or interesting knowledge is part of the overall, interactive KDD process where subjective human judgment is typically required. Two primary mathematical formalisms are used in model fitting: (1) statistical and (2) logical. The statistical approach allows for nondeterministic effects in the model, whereas a logical model is purely deterministic. We focus primarily on the statistical approach to data mining, which tends to be the most widely used basis for practical data-mining applications given the typical presence of uncertainty in real-world data-generating processes.

Most data-mining methods are based on tried and tested techniques from machine learning, pattern recognition, and statistics: classification, clustering, regression, and so on. The array of different algorithms under each of these headings can often be bewildering to both the novice and the experienced data analyst. It should be emphasized that of the many data-mining methods advertised in the literature, there are really only a few fundamental techniques. The actual underlying model representation being used by a particular method typically comes from a composition of a small number of well-known options: polynomials, splines, kernel and basis functions, threshold-Boolean functions, and so on. Thus, algorithms tend to differ primarily in the goodness-of-fit criterion used to evaluate model fit or in the search method used to find a good fit.

In our brief overview of data-mining methods, we try in particular to convey the notion that most (if not all) methods can be viewed as extensions or hybrids of a few basic techniques and principles. We first discuss the primary methods of data mining and then show that the data-mining methods can be viewed as consisting of three primary algorithmic components: (1) model representation, (2) model evaluation, and (3) search. In the discussion of KDD and data-mining methods, we use a simple example to make some of the notions more concrete. Figure 2 (A Simple Data Set with Two Classes Used for Illustrative Purposes) shows a simple two-dimensional artificial data set consisting of 23 cases. Each point on the graph represents a person who has been given a loan by a particular bank at some time in the past. The horizontal axis represents the income of the person; the vertical axis represents the total personal debt of the person (mortgage, car payments, and so on). The data have been classified into two classes: (1) the x's represent persons who have defaulted on their loans and (2) the o's represent persons whose loans are in good status with the bank. Thus, this simple artificial data set could represent a historical data set that can contain useful knowledge from the point of view of the bank making the loans. Note that in actual KDD applications, there are typically many more dimensions (as many as several hundreds) and many more data points (many thousands or even millions).
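The following is a minimal sketch, not taken from the article, of what the data-mining step might look like on a synthetic stand-in for the figure 2 loan data; the feature values, the rule used to label the synthetic points, and the choice of a shallow decision tree are all invented for illustration.

```python
# Sketch of the data-mining step on synthetic loan-style data (income, debt, defaulted).
# Numbers and the labeling rule are invented; they only mimic the shape of figure 2.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 23
income = rng.uniform(20, 120, n)     # income, arbitrary units
debt = rng.uniform(5, 80, n)         # total personal debt, arbitrary units
# Invented "ground truth": default is more likely when debt is high relative to income.
defaulted = (debt > 0.6 * income).astype(int)

X = np.column_stack([income, debt])
X_train, X_test, y_train, y_test = train_test_split(X, defaulted, test_size=0.3, random_state=0)

# Model representation: a shallow decision tree (threshold rules on income and debt).
# Model evaluation: classification accuracy.  Search: the tree-growing heuristic.
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```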
Pharmacogenomics

Applications

• Drugs that failed in clinical trials may be "started over". Among drugs that have been discarded or never approved, there may be ones that work very well for certain patients. If such a drug is given a genetic label indicating that it is effective in a particular population, then prior genetic diagnosis of that population using gene-chip technology may help with the development of new drugs.

• In addition, because genomics is large in scale, methodologically new and highly systematic, pharmacogenomics can directly accelerate the discovery of new drugs.

Platforms integrating genomics and pharmacogenomics, and their diversity analyses
• Epidauros Biotechnologie: target-gene polymorphism analysis
• Janssen Pharmaceutica: mitochondrial gene diversity analysis
• Nova Molecular: central nervous system disease maps

Basic principles

• Pharmacogenomics = functional genomics + molecular pharmacology
• Its main purpose is not to discover new genes in the human genome,
• but rather, more simply, to apply existing genetic knowledge to improve patient treatment.
• Put another way, pharmacogenomics takes drug efficacy and safety as its goal,

Research activity

Laboratory and/or company: research area
• Aeiveos Sciences Group (Seattle, WA): age-related genes and gene action
• Avitech Diagnostics (Malvern, PA): methods for detecting enzyme gene mutations
• Eurona Medical, AB (Uppsala, SE): relationship between drug response and genetics
• Genome Therapeutics Corp (Waltham, MA): individual differences in drug efficacy

• Pharmacogenomics is the science that studies how variation in DNA and RNA characteristics correlates with drug response, that is, the relationship between polymorphisms in gene sequences and the diversity of drug effects.

It is a new discipline that studies the genetic characteristics underlying individual differences in drug absorption, transport, metabolism, clearance and effect, that is, all the genes that determine drug behavior and sensitivity. It mainly clarifies how polymorphisms in drug-metabolism, drug-transport and drug-target genes relate to drug efficacy and adverse reactions, and on this basis develops new drugs or new ways of using existing drugs.
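As a toy illustration of the polymorphism-versus-drug-response relationship described above (not taken from the source), one might test whether responder status is associated with a hypothetical metabolizer genotype; the contingency counts below are invented.

```python
# Toy example: association between a hypothetical metabolizer genotype and drug response.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: responder / non-responder; columns: poor / intermediate / extensive metabolizer.
# Counts are invented for illustration only.
table = np.array([[12, 30, 45],
                  [28, 22, 13]])

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
```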
How to generate whole-brain voxel-wise time series

Generating a whole-brain voxel-wise time series involves acquiring data from each voxel in the brain over a period of time. This process typically requires the use of functional magnetic resonance imaging (fMRI) to capture changes in blood flow and oxygenation that are associated with neural activity. By collecting data at multiple time points, researchers can create a time series that represents the dynamic activity of different brain regions.

One approach to generating voxel-wise time series data is to preprocess the fMRI images to correct for artifacts and noise before extracting the time series. This preprocessing may involve motion correction, spatial normalization, and temporal filtering to improve the quality of the data. By carefully processing the raw fMRI images, researchers can reduce the impact of confounding factors and obtain more accurate time series data for further analysis.
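A minimal sketch of this workflow, under the assumption that nilearn is used; the 4D image file name, repetition time, and filter cutoff are placeholders, not values from the source.

```python
# Assumed workflow: extract whole-brain voxel-wise time series from a 4D fMRI image
# with nilearn. "func.nii.gz", t_r and high_pass are placeholders.
from nilearn.maskers import NiftiMasker

masker = NiftiMasker(
    mask_strategy="epi",   # estimate a whole-brain mask from the EPI data
    detrend=True,          # remove linear drifts
    standardize=True,      # z-score each voxel's time series
    high_pass=0.008,       # Hz; simple temporal filtering
    t_r=2.0,               # repetition time in seconds (assumed)
)

# Result: array of shape (n_timepoints, n_voxels), one column per in-mask voxel.
time_series = masker.fit_transform("func.nii.gz")
print(time_series.shape)
```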
AI-Assisted Drug Design

[Figure: peak memory (GB/TB scale) versus protein sequence length (aa) for structure prediction, with and without TPP; lower is better. Memory grows roughly polynomially without TPP and roughly linearly with TPP. Accompanying panels show examples of predicted protein structures, the order of magnitude of the memory footprint (~20 GB vs. ~40 GB), the note that a single GPU is the optimal choice, and the platforms compared (ICX6330, SPR9462).]

Thank you!
AI-Assisted Drug Design

Contents

1. Market challenges and breakthrough opportunities  2. Optimization of representative macromolecular drug-design scenarios  3. Optimization of representative small-molecule drug-design scenarios

Scenarios and challenges in AI drug design

Macromolecular drug design
I. Anishchenko et al., Nature 2021
Tool molecules for basic research
Hallucination AfDesign trRosetta
• Baseline + icc: use the Intel C++ compiler and configure jemalloc
• Baseline + icc + AVX-512: use the Intel AVX-512 instruction set to optimize the MSA hotspot functions
• Baseline + icc + AVX-512 + parallel MSA: run the MSA search in parallel (see the sketch after this list)

[Figure: relative speedup, 1.00 for the baseline, 1.25 with the Intel C++ compiler + jemalloc, 1.97 with the full set of optimizations.]

[Hardware fragment recovered from the slide: 64 GB HBM2e; up to 112.5 MB.]
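A sketch of the parallel-MSA idea from the list above; run_msa_search is a placeholder for whatever MSA tool the slides actually wrap (e.g. an HHblits or jackhmmer invocation), not an API taken from them.

```python
# Illustrative only: shard independent MSA searches across worker processes.
from concurrent.futures import ProcessPoolExecutor

def run_msa_search(sequence: str) -> str:
    # Placeholder for the real MSA tool invocation (e.g. via subprocess);
    # returns the path of the resulting alignment in a real implementation.
    return f"msa_for_{len(sequence)}_residues.a3m"

def parallel_msa(sequences: list[str], workers: int = 8) -> list[str]:
    # Each query is independent, so the searches can run concurrently on separate cores.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_msa_search, sequences))

if __name__ == "__main__":
    print(parallel_msa(["ACDEFGHIKL", "MNPQRSTVWY" * 3]))
```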
Introduction to R packages by application area

Introduction to R packages by application area, by R-Fox
Analysis of Pharmacokinetic Data
URL: /web/views/Pharmacokinetics.html
Maintainer: Suzette Blanchard; version: 2008-02-15; translation: R-fox, 2008-04-12
The main purpose of pharmacokinetic data analysis is to determine the relationship between the dosing regimen and the body's response to the drug, using non-linear concentration-time curves or related summaries (such as the area under the curve).
The nls() function in base R estimates the parameters of non-linear models by non-linear least squares; it returns an object of class nls, which has coef(), formula(), resid(), print(), summary(), AIC(), fitted() and vcov() methods.
Once this main goal is achieved, interest turns to whether the dose needs to be changed for populations with different attributes (e.g. age, weight, concomitant medication, renal function).
In pharmacokinetics, analysing combined data from many individuals to estimate population parameters is called population PK.
Non-linear mixed models provide a natural tool for analysing population PK data, including likelihood-based and Bayesian estimation methods.
The nlme package fits non-linear mixed-effects models using the likelihood-based approach of Lindstrom and Bates (1990, Biometrics 46, 673-87); it allows nested random effects, and within-group errors may be correlated and/or have unequal variances.
It returns an object of class nlme representing the fit, whose results can be displayed with the print(), plot() and summary() methods.
The nlme object provides detailed information about the fit and extraction methods.
The nlmeODE package combines the odesolve and nlme packages for mixed-effects modelling, including several pharmacokinetic/pharmacodynamic (PK/PD) models.
Bayesian estimation methods for panel data are described in CRAN's Bayesian Inference task view (/web/views/Bayesian.html).
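For comparison, here is a sketch of the same kind of non-linear least-squares PK fit in Python (the text above describes R's nls()); the one-compartment model, data, and starting values are invented for illustration.

```python
# Fit a one-compartment oral-absorption model to concentration-time data by
# non-linear least squares. Data and starting values are made up.
import numpy as np
from scipy.optimize import curve_fit

def one_compartment(t, ka, ke, scale):
    # C(t) = scale * ka/(ka - ke) * (exp(-ke*t) - exp(-ka*t)); "scale" bundles dose/V/F.
    return scale * ka / (ka - ke) * (np.exp(-ke * t) - np.exp(-ka * t))

t = np.array([0.25, 0.5, 1, 2, 4, 6, 8, 12, 24])                 # hours
conc = np.array([2.1, 3.4, 4.6, 4.9, 3.8, 2.9, 2.2, 1.2, 0.3])   # mg/L (invented)

params, cov = curve_fit(one_compartment, t, conc, p0=[1.5, 0.2, 6.0])
ka, ke, scale = params
# For this model the area under the curve is scale/ke.
print(f"ka = {ka:.3f} 1/h, ke = {ke:.3f} 1/h, AUC ~ {scale / ke:.1f} mg*h/L")
```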
A Well-Conditioned Estimator for Large-Dimensional Covariance Matrices

Olivier Ledoit*
October 26, 1996
Unpublished working paper

Abstract

Many economic problems require a covariance matrix estimator that is not only invertible, but also well-conditioned (i.e. inverting it does not amplify estimation error). For large-dimensional covariance matrices, the usual estimator, the sample covariance matrix, is typically not well-conditioned. This paper introduces an estimator that is both well-conditioned and more accurate than the sample covariance matrix asymptotically. This estimator is distribution-free and has a simple explicit formula that is easy to compute and interpret. It is the asymptotically optimal linear combination of the sample covariance matrix with the identity matrix. Optimality is meant with respect to a quadratic loss function, asymptotically as the number of observations and the number of variables go to infinity together. Extensive Monte-Carlo simulations confirm that all asymptotic results hold well in finite sample.

Keywords: portfolio selection, generalized least squares, generalized method of moments, positive definite, empirical Bayesian, decision theory, shrinkage.

* The Anderson School at UCLA, Los Angeles, California 90095-1481; e-mail: ledoit@; phone: (310) 206 6562; fax: (310) 206 5455. Routines implementing my estimator can be found on the Web at /acad unit/finance/olivier/homepage.htm. This work originally circulated as my job market paper (Ledoit, 1994), and formed the first chapter of my Ph.D. thesis at MIT. I wish to acknowledge the help of my professors at MIT, especially the members of my thesis committee John Heaton, Bin Zhou and the chairman Andrew Lo. I got valuable feedback from participants to seminars at MIT, the NBER, UCLA, Washington University in Saint Louis, Yale, Chicago, Wharton and UBC. I also thank Petr Adamek, Dimitri Bertsimas, Minh Chau, Whitney Newey, Firooz Partovi, Pedro Santa-Clara and especially Tim Crack and Jack Silverstein. Errors are still mine.

Many problems of variance minimization in Finance and Economics are solved by inverting a covariance matrix. Sometimes the matrix dimension can be large. Examples include selecting a mean-variance efficient portfolio from a large universe of stocks (Markowitz, 1952), running Generalized Least Squares (GLS) regressions in large cross-sections (see e.g. Kandel and Stambaugh, 1995), and choosing an optimal weighting matrix in the General Method of Moments (GMM; see Hansen, 1982) when the number of moment restrictions is large. In such situations the usual estimator, the sample covariance matrix, is known to perform poorly. When the matrix dimension p is larger than the number of time-series observations available n, the sample covariance matrix is not even invertible. When the ratio p/n is less than one but not negligible, the sample covariance matrix is theoretically invertible, but numerically ill-conditioned, which means that inverting it amplifies estimation error dramatically.(1)

For large p, it is difficult to find enough observations to make p/n negligible. Therefore it is important to develop a well-conditioned estimator for large-dimensional covariance matrices. If we wanted a well-conditioned estimator at any cost, we could always impose some ad-hoc structure on the covariance matrix to force it to be well-conditioned, such as diagonality or a factor model. But, in the absence of prior information about the true structure of the matrix, this ad-hoc structure will in general be misspecified. The resulting estimator can be so biased that it may bear little resemblance to the true covariance matrix. To the best of our knowledge, no existing estimator is both well-conditioned and more accurate than the sample covariance matrix.(2) The contribution of this paper is to propose an estimator that possesses both these properties asymptotically.

(1) The condition number is defined as the ratio of the largest to the smallest eigenvalue. It measures how invertible a matrix is. A matrix with low condition number can be safely inverted, and is called well-conditioned. A matrix with high condition number is almost not invertible, and is called ill-conditioned.
(2) Jobson and Korkie (1980) and Michaud (1989) show that it is difficult to use the sample covariance matrix for portfolio selection because it is typically ill-conditioned.

One way to get a well-conditioned structured estimator is to impose that all variances are the same and all covariances are zero. The estimator that we recommend is a weighted average of this structured estimator with the sample covariance matrix. The average inherits the good conditioning of the structured estimator. By choosing the weight optimally according to a quadratic loss function, we can ensure that our weighted average of the sample covariance matrix and the structured estimator is more accurate than both of them. The only difficulty is that the true optimal weight depends on the true covariance matrix, which is unobservable. We solve this difficulty by finding a consistent estimator for the optimal weight. We also show that replacing the true optimal weight with a consistent estimator makes no difference asymptotically. Thus, our estimator is the optimal weighted average of the sample covariance matrix and the structured estimator asymptotically.

Standard asymptotics assume that the number of variables p is finite, while the number of observations n goes to infinity. Under standard asymptotics, the sample covariance matrix is well-conditioned (in the limit), and has some appealing optimality properties (e.g. maximum likelihood for the normal distribution). This is clearly a bad approximation of real-world situations where the number of variables p is of the same order of magnitude as the number of observations n, and possibly larger. We introduce a different framework, called general asymptotics, where we allow the number of variables p to go to infinity too. The only constraint is that the ratio p/n must remain bounded. We see standard asymptotics as a special case in which it is optimal to put all the weight on the sample covariance matrix and none on the structured estimator. In the general case, however, our estimator is different from the sample covariance matrix, substantially more accurate, and of course well-conditioned.

Extensive Monte-Carlo simulations indicate that: (i) the new estimator is more accurate than the sample covariance matrix, even for very small numbers of observations and variables, and usually by a lot; (ii) it is essentially as accurate or substantially more accurate than some
estimators proposed in finite sample decision theory,as soon as there are at least ten variables and observations;(iii)it is better-conditioned than the true covariance matrix;and(iv)general asymptotics are a good approximation of finite sample behavior when there are at least twenty observations and variables.The first section characterizes in finite sample the linear combination of the identity matrix and the sample covariance matrix with minimum quadratic risk.The following section develops a linear shrinkage estimator with uniformly minimum quadratic risk in its class asymptotically as the number of observations and the number of variables go to infinity together.In Section3, Monte-Carlo simulations indicate that this estimator behaves well in finite sample.The conclusion suggests directions for future research.1Analysis in Finite SampleThe easiest way to explain what we do is to first analyze in detail the finite sample case.Let ;denote a S d Q matrix of Q iid observations on a system of S random variables with mean zero and covariance matrix h .Following the lead of Muirhead and Leung (1987),we consider the Frobenius norm:N $N T WU $$W S . Our goal is to find the linear combination h e | , | 6of the identity matrix ,and the sample covariance matrix 6 ;;W Q whose expected quadratic loss (>N h e b h N @is minimum.Haff (1980)studied this class of linear shrinkage estimators,but did not get any optimality results.The optimality result that we obtain in finite sample will come at a price:h e will not be a bona fide estimator,because it will require hindsight knowledge of four scalar functions of the true (and unobservable)covariance matrix h .This would seem like a high price to pay but,interestingly,it is not:In the next section,we are able to develop a bona fide estimator 6e with the same properties as h e asymptotically as the number of observations and the number of variables go to infinity together .Furthermore,extensive Monte-Carlo simulations will indicate that twenty observations and variables are enough for the asymptotic approximations to hold accurately.Even the formulas for h e and 6e will look the same and have the same interpretations.This is why we study the properties of h e in finite sample “as if”it was a bona fide estimator.1.1Optimal Linear ShrinkageThe squared Frobenius norm NcN is a quadratic form whose associated inner product is:$ p $WU $ $W S .Four scalars play a central role in the analysis:x h p ,,m N h b x,N ,n (>N 6b h N @,and p (>N 6b x,N @.We do not need to assume that the random variables in ;follow a specific distribution,but we do need to assume that they have finite fourth moments,so that n and p are finite.The following relationship holds.Dividing by the dimension 5is not standard,but it does not matter in this section because 5remains finite.The advantages of this convention are that the norm of the identity matrix is simply one,and that it will be consistent with Definition 2below.Theorem1.1m n p .Proof of Theorem1.1(>N6b x,N @(>N6b h h b x,N @(1)(>N6b h N @ (>N h b x,N @ (> 6b h p h b x, @(2)(>N6b h N @ N h b x,N (>6b h@p h b x, (3)Notice that(>6@h,therefore the third term on the right hand side of Equation(3)is equal to zero.This completes the proof of Theorem1.1.(The optimal linear combination h e| , | 6of the identity matrix,and the sample covariance matrix6is the standard solution to a simple quadratic programming problem under linear equality constraint.Theorem1.2Consider the optimization problem:PLQ| |(>N h e b h N @V W h e| , | 6(4) where the 
coefficients| and| are nonrandom.Its solution verifies:h e npx,mp6(5)(>N h e b h N @m np(6)Proof of Theorem1.2By a change of variables,Problem(4)can be rewritten as:PLQ| y(>N h e b h N @V W h e|y, b| 6(7)With a little algebra,and using(>6@h as in the proof of Theorem1.1,we can rewrite the objective as:(>N h e b h N @| N h b y,N b| (>N6b h N @ (8)Therefore the optimal value of y can be obtained as the solution to a reduced problem that does not depend on |:PLQ y N h b y,N .Remember that the norm of the identity is one by convention,so the objective of this problem can be rewritten as:N h b y,N N h N b y h p , y .The first order condition is:b h p , y .The solution is:y h p , x .Replacing y by its optimal value x in Equation (8),we can rewrite the objective of the original problem as:(>N h e b h N @ | m b | n .The first order condition is: |m b b | n .The solution is:| n m n n p .Note that b | m p .At the optimum,the objective is equal to: n p m m p n m n p .This completes the proof.(x,can be interpreted as a shrinkage target,and the weight n p placed on x,as a shrinkage intensity.The Percentage Relative Improvement in Average Loss (PRIAL)over the sample covariance matrix is equal to:(>N 6b h N @b (>N h e b h N @(>N 6b h N @ n p (9)same as the shrinkage intensity.Therefore everything is controlled by the ratio n p ,which is a properly normalized measure of the error of the sample covariance matrix 6.Intuitively,if 6is relatively accurate,then you should not shrink it too much,and shrinking it will not help you much either;if 6is relatively inaccurate,then you should shrink it a lot,and you also stand to gain a lot from shrinking.1.2InterpretationsThe mathematics underlying Theorem 1.2are so rich that we are able to provide four comple-mentary interpretations of it.One is geometric,and the others echo some of the most important ideas in finite sample multivariate statistics.First,we can see Theorem 1.2as a projection theorem in Hilbert space.The appropriate Hilbert space is the space of S -dimensional symmetric random matrices $such that (>N $N @ .The associated norm is,of course,T (>N c N @,and the inner product of two random matrices $ and $ is (>$ p $ @.With this structure,Theorem 1.1is just a rewriting of the Pythagorean Theorem.Furthermore,Formula (5)can be justified as follows:In order to project the truecovariance matrix h onto the space spanned by the identity matrix ,and the sample covariance matrix 6,we first project it onto the line spanned by the identity,which yields the shrinkage target x,;then we project h onto the line joining the shrinkage target x,to the sample covariance matrix 6.Whether the projection h e ends up closer to one end of the line (x,)or to the other(6)depends on which one of them h was closer to.Figure 1provides a geometrical illustration.ΣSµΙαβδΣ*Figure 1:Theorem 1.2Interpreted as a Projection in Hilbert Space.The second way to interpret Theorem 1.2is as a trade-off between bias and variance.We seek to minimize mean squared error,which can be decomposed into variance and squared bias:(>N h e b h N @ (>N h e b (>h e @N @ N (>h e @b h N (10)The mean squared error of the shrinkage target x,is all bias and no variance,while for the sample covariance matrix 6it is exactly the opposite:all variance and no bias.h e represents the optimal trade-off between error due to bias and error due to variance.See Figure 2for an illustration.The idea of a trade-off between bias and variance was already central to the original James-Stein (1961)shrinkage technique.The third 
interpretation is Bayesian.h e can be seen as the combination of two signals:prior information and sample information.Prior information states that the true covariance matrix h lies on the sphere centered around the shrinkage target x,with radius m .Sample information states that h lies on another sphere,centered around the sample covariance matrix 6with radius n .Bringing together prior and sample information,h must lie on the intersection of the two spheres,which is a circle.At the center of this circle stands h e .The relative importance givenMean Squared Error Variance Squared Bias01Shrinkage Intensity E r r o rFigure 2:Theorem 1.2Interpreted as a Trade-off Between Bias and Variance.Shrinkage intensity zero corresponds to the sample covariance matrix 6.Shrinkage intensity one corresponds to the shrinkage target x,.Optimal shrinkage intensity (represented by q )corresponds to the minimum expected loss combination h e .to prior vs.sample information in determining h e depends on which one is more accurate. See Figure 3for an illustration.The idea of drawing inspiration from the Bayesian perspective to obtain an improved estimator of the covariance matrix was used by Haff (1980).The fourth and last interpretation involves the cross-sectional dispersion of covariance matrix eigenvalues.Let w w S denote the eigenvalues of the true covariance matrix h ,and O O S those of the sample covariance matrix 6.We can exploit the Frobenius norm’s elegant relationship to eigenvalues.Note that x S S ;L w L ( S S ;L O L (11)represents the grand mean of both true and sample eigenvalues.Then Theorem 1.1can be rewritten as: S ( S ;L O L b x S S ;L w L b x (>N 6b h N @ (12)In words,sample eigenvalues are more dispersed around their grand mean than true ones,andStrictly speaking,a full Bayesian approach would specify not only the support of the distribution of D ,but also the distribution itself.We could assume that D is uniformly distributed on the sphere,but it might be difficult to justify.Thus,D A should not be thought of as the expectation of the posterior distribution,as is traditional,but rather as the center of mass of its support.Figure3:Bayesian Interpretation.The left sphere has center x,and radius m and represents priorinformation.The right sphere has center6and radius n.The distance between sphere centers is p and represents sample information.If all we knew was that the true covariance matrix h lies on the left sphere,our best guess would be its center:the shrinkage target x,.If all we knewwas that the true covariance matrix h lies on the right sphere,our best guess would be its center:the sample covariance matrix6.Putting together both pieces of information,the true covariancematrix h must lie on the circle where the two spheres intersect,therefore our best guess is itscenter:the optimal linear shrinkage h e.the excess dispersion is equal to the error of the sample covariance matrix.Excess dispersion implies that the largest sample eigenvalues are biased upwards,and the smallest ones downwards.Therefore we can improve upon the sample covariance matrix by shrinking its eigenvalues towards their grand mean,as in:L S w e L n p x m p O L (13)Note that w e w e S defined by Equation (13)are precisely the eigenvalues of h e .Surprisingly,their dispersion (>3S L w e L b x @ S m p is even below the dispersion of true eigenvalues.For the interested reader,the next paragraphs explain why.The idea that shrinking sample eigenvalues towards their grand mean yields an improved estimator of the covariance 
matrix was highlighted in Muirhead’s (1987)review paper.1.3Further Results on Sample EigenvaluesThe following paragraphs contain additional insights about the eigenvalues of the sample covari-ance matrix,but the reader can skip them and go directly to Section 2if he or she so wishes.We discuss:1)why the eigenvalues of the sample covariance matrix are more dispersed than those of the true covariance matrix (Equation (12));2)how important this effect is in practice;and 3)why we should use instead an estimator whose eigenvalues are less dispersed than those of the true covariance matrix (Equation (13)).The explanation relies on a result from matrix algebra.Theorem 1.3The eigenvalues are the most dispersed diagonal elements that can be obtained by rotation.Proof of Theorem 1.3Let 5denote a S -dimensional symmetric matrix and 9a S -dimensional rotation matrix:99 9 9 ,.First,note that S WU 9 59 S WU 5 .The average of the diagonal elements is invariant by rotation.Call it U .Let Y L denote the L WK column of9.The dispersion of the diagonal elements of 9 59is S3S L Y L 5Y L b U .Note that3S L Y L 5Y L b U 3S L 3S M M L Y L 5Y M WU> 9 59b U, @ WU> 5b U, @is invariant by rotation.Therefore the rotation 9maximizes the dispersion of the diagonal elements of 9 59if and only if it minimizes 3S L 3SM M L Y L 5Y M .This is achieved by setting Y L 5Y M to zero for all L M .In this case,9 59is a diagonal matrix,call it '.9 59 'is equivalent to 5 959 .Since9is a rotation and'is diagonal,the columns of9must contain the eigenvectors of5and the diagonal of'its eigenvalues.Therefore the dispersion of the diagonal elements of9 59is maximized when these diagonal elements are equal to the eigenvalues of5.This completes the proof of Theorem1.3.(Decompose the true covariance matrix into eigenvalues and eigenvectors:h b eb,where e is a diagonal matrix,and b is a rotation matrix.The diagonal elements of e are the eigenvalues w w S,and the columns of b are the eigenvectors o o S.Similarly,decompose the sample covariance matrix into eigenvalues and eigenvectors:6* /*,where/is a diagonal matrix, and*is a rotation matrix.The diagonal elements of/are the eigenvalues O O S,and the columns of*are the eigenvectors J J S.Since6is unbiased and b is nonstochastic,b 6b is an unbiased estimator of e b hb.The diagonal elements of b 6b are approximately as dispersed as the ones of b hb.For convenience, let us speak as if they were exactly as dispersed.By contrast,/* 6*is not at all an unbiased estimator of b hb.This is because the errors of*and6interact.Theorem1.3shows us the effect of this interaction:the diagonal elements of* 6*are more dispersed than those of b 6b (and hence than those of b hb).This is why sample eigenvalues are more dispersed than true ones.See Table1for a summary.b 6b * 6*RRb hb * h*Table1:Dispersion of Diagonal ElementsThis table compares the dispersion of the diagonal elements of certain products of matrices.The symbols ,{,and pertain to diagonal elements,and mean less dispersed than,approximately as dispersed as,and more dispersed than,respectively.We illustrate how important this effect is in a particular case:when the true covariance matrix is the identity matrix.Let us sort the eigenvalues of the sample covariance matrix from largest to smallest,and plot them against their rank.The shape of the plot depends on the ratio S Q,but does not depend on the particular realization of the sample covariance matrix,at least approximately when S and Q are very large.Figure4shows the distribution of sample eigenvalues 
for various values of the ratio S Q.This figure is based on the asymptotic formula proven byLargest Smallest01234S o r t e d E i g e n v a l u e s p/n=0.1Largest Smallest 01234S o r t e d E i g e n v a l u e sp/n=0.5Largest Smallest 01234S o r t e d E i g e n v a l u e s p/n=1 Largest Smallest01234S o r t e d E i g e n v a l u e sp/n=2Figure 4:Sample vs.True Eigenvalues.The solid line represents the distribution of the eigenval-ues of the sample covariance matrix.Eigenvalues are sorted from largest to smallest,then plotted against their rank.In this case,the true covariance matrix is the identity,i.e.true eigenvalues are all equal to one.The distribution of true eigenvalues is plotted as a dashed horizontal line at one.Distributions are obtained in the limit as the number of observations Q and the number of variables S both go to infinity,when their ratio S Q converges to a finite positive limit.The four plots correspond to different values of the limit.Marcenko and Pastur(1967).We notice that the largest sample eigenvalues are severely biased upwards,and the smallest ones downwards.The bias increases in S Q.This phenomenon is very general and is not limited to the identity case.It is similar to the effect observed by Brown (1989)in Monte-Carlo simulations.Finally,let us remark that the sample eigenvalues O L J L6J L should not be compared to the true eigenvalues w L o L h o L,but to J L h J L.We should compare estimated vs.true variance associated with vector J L.By Theorem1.3again,the diagonal elements of* h*are even less dispersed than those of b hb.Not only are sample eigenvalues more dispersed than true ones, but they should be less dispersed.This effect is attributable to error in the sample eigenvectors. Intuitively:Statisticians should shy away from taking a strong stance on extremely small and extremely large eigenvalues,because they know that they have the wrong eigenvectors.The sample covariance matrix is guilty of taking an unjustifiably strong stance.The optimal linear shrinkage h e corrects for that.2Analysis under General AsymptoticsIn the previous section,we have shown that h e has an appealing optimality property and fits well in the existing literature.It has only one drawback:it is not a bona fide estimator,since it requires hindsight knowledge of four scalar functions of the true(and unobservable)covariance matrix h:x,m ,n and p .We now fix this problem.The idea is that,asymptotically,there exists consistent estimators for x,m ,n and p ,hence for h e too.The difficulty is that we cannot use standard asympotics,where the number of observations Q goes to infinity and the number of variables S remains finite,since it would imply that the sample covariance matrix is optimal.Specifically,the optimal shrinkage intensity and the improvement from shrinking would vanish under standard asymptotics.We have to consider a more general framework where these two quantities need not vanish in the limit.This is achieved by allowing the number of variables S to go to infinity at the same speed as the number of observations Q.It is called general asymptotics. 
In practice,general asymptoticsTo the best of our knowledge,the framework of general asymptotics has not been used before to improve overare appropriate when the number of variables and the number of observations are both large (so that asymptotic approximations are accurate),and of the same order of magnitude(so that we can improve over the sample covariance matrix).This happens frequently in the real world: for example,Ledoit(1994)describes an application to Financial Economics.In the following section,extensive Monte-Carlo simulations will indicate that twenty observations and variables are enough for general asymptotic approximations to hold accurately.In a nutshell,general asymptotics allow for improvement over the sample covariance matrix, as in finite sample,while retaining the tractability of standard asymptotics.A key advantage over finite sample is that,asymptotically,we can neglect the error that we introduce when we replace the four unobservable scalars by consistent estimators.Furthermore,only general asymptotics allow the number of variables to be higher than the number of observations.2.1General AsymptoticsLet Q index a sequence of statistical models.For every Q,;Q is a S Q d Q matrix of Q iid observations on a system of S Q random variables with mean zero and covariance matrix h Q. The number of variables S Q can change and even go to infinity with the number of observations Q,but not too fast.Assumption1There exists a constant. independent of Q such that S Q Q . . Assumption1is very weak.It does not require S Q to change and go to infinity,therefore standardasymptotics are included as a particular case.It is not even necessary for the ratio S Q Q to converge to any limit.Decompose the covariance matrix into eigenvalues and eigenvectors:h Q b Q e Q b W Q,where e Q is a diagonal matrix,and b Q a rotation matrix.The diagonal elements of e Q are the eigenvaluesw Q w Q SQ ,and the columns of b Q are the eigenvectors o Q o Q SQ.<Q b W Q;Q is a S Q d Qmatrix of Q iid observations on a system of S Q uncorrelated random variables that spans the same space as the original system.We impose restrictions on the higher moments of<Q.Let \Q \Q SQW denote the first column of the matrix<Q.the sample covariance matrix,but only to characterize the distribution of its eigenvalues,as in Silverstein(1994).Assumption2There exists a constant. independent of Q such thatS QS Q;L(> \Q L @ . 
.Assumption3OLPQ S QQd3L M N O 4Q&RY>\Q L \Q M \Q N \Q O @&DUGLQDO RI4Q,where4Q denotes the set of allthe quadruples that are made of four distinct integers between and S Q.Assumption2states that the eighth moment is bounded(on average).Assumption3states that products of uncorrelated random variables are themselves uncorrelated(on average,in the limit).In the case where general asymptotics degenerate into standard asymptotics(S Q Q ), Assumption3is trivially verified as a consequence of Assumption2.Assumption3is verified when random variables are normally or even elliptically distributed,but it is much weaker than that.Assumptions1-3are implicit throughout the paper.Our matrix norm is based on the Frobenius norm.Definition1The norm of the S Q-dimensional matrix$is:N$N Q I S Q WU $$W ,where I S Q is a scalar function of the dimension.It defines a quadratic form on the linear space of S Q-dimensional symmetric matrices whose associated inner product is:$ p Q$ I S Q WU $ $W .The behavior of N c N Q across dimensions is controlled by the function I c .The norm N c N Q is used mainly to define a notion of consistency.A given estimator will be called consistent if the norm of its difference with the true covariance matrix goes to zero(in quadratic mean)as Q goes to infinity.If S Q remains bounded,then all positive functions I c generate equivalent notions of consistency.But this particular case similar to standard asymptotics is not very representative. If S Q(or a subsequence)goes to infinity,then the choice of I c becomes much more important. If I S Q is too large(small)as S Q goes to infinity,then it will define too strong(weak)a notion of consistency.I c must define the notion of consistency that is“just right”under general asymptotics.Our solution is to define a relative norm.The norm of a S Q-dimensional matrix is divided by the norm of a benchmark matrix of the same dimension S Q.The benchmark must be chosen carefully.For lack of any other attractive candidate,we take the identity matrix as benchmark. Therefore,by convention,the identity matrix has norm one in every dimension.This determines the function I c uniquely as follows.。
miR-124 inhibits glioma cell proliferation by targeting AURKA

[Received] 2020-04-03  [Revised] 2020-05-14
[Funding] National Natural Science Foundation of China Youth Science Fund Project (No. 81801232)
[Affiliation] The First Affiliated Hospital of Xinjiang Medical University, Urumqi 830054, Xinjiang, China
[First author] 沙娅·玛哈提 (b. 1988), female, Kazakh, from Xinjiang, Ph.D.; research interests: tumours of the digestive and nervous systems. E-mail: sasa870716@126.com
[Corresponding author] 王增亮 (b. 1983), male, from Xinjiang, associate chief physician; research interest: nervous system tumours. E-mail: 653391638@qq.com

Statistical analysis was performed with GraphPad Prism 6; between-group data were compared with the paired t-test, and P < 0.05 was considered statistically significant.
2 Results
2.1 Expression of miR-124 and AURKA in glioma tissue

The expression of miR-124 and AURKA in glioma and non-glioma tissue was measured by PCR. As shown in Figure 1, miR-124 was expressed at a low level in glioma tissue (0.17 ± 0.08) and at a relatively high level in non-glioma tissue (0.38 ± 0.18); the difference was statistically significant (P < 0.001). The expression level of AURKA was 2356.84 ± 429.76 in glioma tissue and 1092.32 ± 234.35 in non-glioma tissue, indicating that AURKA is highly expressed in glioma tissue and expressed at a low level in non-glioma tissue; the difference was statistically significant (P < 0.001).

Cells from the miR-124 mimics, anti-miR-124 and miR-NC groups were collected, and total protein was extracted by lysing the cells with RIPA buffer plus PMSF. Proteins were separated by SDS-PAGE and incubated with AURKA and GAPDH antibodies; the next day membranes were washed with TBST, developed with an ECL detection system, and band intensities were compared.
1.8 Luciferase assay
Effect of morroniside on the expression of Wnt-signaling transcription factors in the cortex of rats after cerebral ischemia-reperfusion

艾厚喜; 孙芳玲; 侯虹丽; 张丽; 王文

Abstract: Objective To study the effects of morroniside on the expression of the Wnt signaling-related transcription factors neurogenin 2 (Ngn2), Pax6 and Tbr2 in the ischemic ipsilateral cortex 7 days after cerebral ischemia-reperfusion in rats. Methods 15 male Sprague-Dawley rats were randomly divided into a sham group (n=3), an ischemia group (n=3), and morroniside groups (low-, medium- and high-dose groups, n=3 each). The middle cerebral artery was occluded for 30 min and then reperfused. Morroniside was administered intragastrically once a day at doses of 30 mg/kg, 90 mg/kg and 270 mg/kg starting 3 hours after operation. The expression of Ngn2, Pax6 and Tbr2 in the ischemic ipsilateral cortex was detected by Western blotting 7 days after operation. Results The expression of Ngn2 increased in the ischemia group compared with the sham group (P<0.05), and it increased further in the medium- and high-dose morroniside groups compared with the ischemia group (P<0.01). There was no significant difference between the ischemia group and the sham group in the expression of Pax6, while it increased in the medium- and high-dose morroniside groups compared with the ischemia group (P<0.01). There was no significant difference among the groups in the expression of Tbr2. Conclusion Morroniside can increase the expression of Ngn2 and Pax6 in the ischemic ipsilateral cortex 7 days after ischemia-reperfusion in rats, suggesting that it promotes neurogenesis after ischemia.
RNA interference of silent mating-type information regulation 2 homolog 1 (SIRT1) arrests the cell cycle of prostate cancer PC3 cells

李驰; 王忠利
Journal: 中国肿瘤临床 (Chinese Journal of Clinical Oncology), 2014, (20)

Abstract: Objective: To observe the effects of double-stranded small interfering RNA (siRNA) against silent mating-type information regulation 2 homolog 1 (SIRT1) on cell proliferation, cell cycle progression, and the expression levels of the negative cell-cycle regulators P21, P27 and phosphorylated retinoblastoma (pRb) protein in prostate cancer PC3 cells, and to explore the possible underlying mechanism. Methods: PC3 cells were cultured in vitro and randomly divided into a mock group, a scramble siRNA-transfected group, and a SIRT1 siRNA-transfected group. SIRT1 siRNA efficiency was examined by reverse transcription polymerase chain reaction and Western blot analysis. The inhibitory rate of PC3 cell growth was determined with a methyl thiazolyl tetrazolium (MTT) assay, and the cell cycle was analysed by flow cytometry. P21 and P27 protein expression and pRb status were determined by Western blot. Results: Compared with the mock and scramble siRNA groups, SIRT1 mRNA and protein expression decreased significantly in SIRT1 siRNA-transfected cells. In addition, the inhibitory rate of PC3 cell growth was markedly increased, and the cell cycle of the PC3 cells was arrested at the G1 stage. The expression of the negative cell-cycle regulators P21 and P27 increased, whereas Rb protein phosphorylation was inhibited, in SIRT1 siRNA-transfected PC3 cells. Conclusion: SIRT1 RNA interference inhibits PC3 cell growth and arrests cell-cycle progression through upregulation of P21 and P27 and inhibition of Rb phosphorylation.
Effects of endothelin receptor antagonists on the expression of transforming growth factor-beta 1 and type I collagen mRNA in rats with cirrhosis

陈汇; 许冰 (Department of Hepatobiliary and Pancreatic Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou 450052)
Journal: 郑州大学学报(医学版) (Journal of Zhengzhou University, Medical Sciences), 2011, 46(5): 745-747

Abstract: Aim: To investigate the effects of endothelin (ET) receptor antagonists on the expression of type I collagen and TGF-beta1 mRNA in carbon tetrachloride-induced cirrhotic rats. Methods: A total of 40 male SD rats were allocated to a carbon tetrachloride group, a normal group, an endothelin A receptor antagonist group, an endothelin B receptor antagonist group, and a combined treatment group. Besides carbon tetrachloride treatment, the latter three groups were injected subcutaneously twice a day with BQ-123 (12.5 μg/kg), BQ-788 (15 μg/kg), and BQ-123 + BQ-788, respectively. The expression of type I collagen and TGF-beta1 mRNA was determined by RT-PCR, and a portion of liver tissue from every group was taken for routine pathology. Results: The expression of TGF-beta1 mRNA did not differ significantly among the 5 groups (F = 1.857, P = 0.765), but the expression of type I collagen mRNA in the 5 groups [(0.437 ± 0.082), (0.623 ± 0.142), (0.655 ± 0.124), (0.558 ± 0.183), and (0.874 ± 0.170)] differed significantly (F = 11.235, P = 0.023). Microscopy showed that the inflammatory reaction decreased in cirrhotic rats injected with either ET receptor antagonist or their combination. Conclusion: Endothelin receptor antagonists can effectively inhibit the expression of hepatic type I collagen in cirrhotic rats and reduce the inflammatory reaction and liver fibrosis.
Rational R-matrices, centralizer algebras and tensor identities for e6 and e7 exceptional families of Lie algebras

arXiv:math/0608248v3 [math.QA] 24 Nov 2007

N. J. MacKay$^1$ and A. Taylor$^2$

Department of Mathematics, University of York, York YO10 5DD, U.K.

$^1$ nm15@   $^2$ at165@

Abstract

We use Cvitanović's diagrammatic techniques to construct the rational solutions of the Yang-Baxter equation associated with the e6 and e7 families of Lie algebras, and thus explain Westbury's observations about their uniform spectral decompositions. In doing so we explore the extensions of the Brauer and symmetric group algebras to the centralizer algebras of e7 and e6 on their lowest-dimensional representations and (up to three-fold) tensor products thereof, giving bases for them and a range of identities satisfied by the algebras' defining invariant tensors.

1   Introduction

The Yang-Baxter equation (YBE) [1], which appears in 1+1D physics as the factorizability condition for S-matrices in integrable models, is closely bound up with Lie algebras and their representation theory, essentially because of the asymptotic behaviour of its rational solutions (see (2.3) below). Indeed, if one were to investigate the YBE knowing nothing of Lie algebras, one would very soon find oneself re-discovering a great deal about them. In fact surprisingly little is known about the rational YBE solutions associated with the exceptional Lie algebras: this paper investigates these, and finds, in precisely the spirit of the preceding sentence, an intricate relationship between the YBE and various identities satisfied by the algebras' invariant tensors.

Our point of departure is a remarkable observation made a few years ago by Westbury [2]: that certain solutions of the YBE ('R-matrices') associated with the Lie algebras of the e6 and e7 series (the second and third rows of the Freudenthal-Tits 'magic square') have spectral decompositions which may be expressed simply and uniformly in terms of the dimension (= 1, 2, 4 or 8) of the underlying division algebra. In this paper we shall explicitly construct these R-matrices, prove that they solve the Yang-Baxter equation, and thus provide an explanation of Westbury's observation. More interesting, perhaps, is what we shall learn along the way. In particular we will need to understand the structure of, and provide a basis for, the centralizer of the Lie group action on tensor cubes of the defining representation. (These are the analogues for the exceptional series of the symmetric group algebra for su(n) and of Brauer's algebra for the other classical groups.) We shall also discover a host of secondary identities satisfied by the groups' defining invariant tensors, all of them subtly necessary in solving the YBE. Our method is to use Cvitanović's 'birdtrack' diagrams [3, 4] (which extend earlier ideas of Penrose) to handle the calculations. An alternative approach to the en centralizers, which utilizes the braid matrices (the q-deformed but spectral-parameter u-independent R-matrices) and is complementary to ours, appears in [5].

2   The Yang-Baxter equation and a unified spectral decomposition for exceptional R-matrices

2.1   The Yang-Baxter equation

The Yang-Baxter equation (YBE), between expressions in End(V ⊗ V ⊗ V) for V = C^n, is
\[
\check R(u)\otimes 1 \,\cdot\, 1\otimes\check R(u+v) \,\cdot\, \check R(v)\otimes 1 \;=\; 1\otimes\check R(v) \,\cdot\, \check R(u+v)\otimes 1 \,\cdot\, 1\otimes\check R(u), \tag{2.1}
\]
or, with its indices made explicit (each running from 1 to n, and with repeated indices summed),
\[
\check R^{ij}_{lm}(u)\,\check R^{mk}_{sr}(u+v)\,\check R^{ls}_{pq}(v) \;=\; \check R^{jk}_{lm}(v)\,\check R^{il}_{ps}(u+v)\,\check R^{sm}_{qr}(u), \tag{2.2}
\]
for $\check R(u) \in \mathrm{End}(V\otimes V)$. We first note that this equation is homogeneous in $\check R$ and in u, so that $\mu\check R(\lambda u)$ is still a solution for arbitrary C-scalings λ and μ. We shall therefore rescale both $\check R$ and u wherever it is convenient for us to do so. (In the physical construction of factorized S-matrices, in contrast, the scale of u is fixed, and scaling $\check R$ affects its analytic properties and thus the bootstrap spectrum.) The simplest class of solutions of the YBE (which we refer to as 'R-matrices') has rational dependence on u, and an expansion in powers of 1/u of the form
\[
\check R(u) = P\Big(1_n\otimes 1_n + \frac{C}{u} + \ldots\Big), \qquad \text{where} \qquad C=\sum_{a,b}\rho_V(I^a)\otimes\rho_V(I^b)\,g_{ab}, \tag{2.3}
\]
in which $1_n$ is the n × n identity matrix, $I^a$ are the generators of a Lie algebra g, $g_{ab}$ its Cartan-Killing form, $\rho_V$ its suitably-chosen representation on a module V (usually its defining representation), and P the transposition operator on the two components of V ⊗ V. Thus, from the outset, the investigation of R-matrices naturally involves the investigation of Lie algebras and their representations.

A natural consequence of this (see, for example, [6]) is that $\check R(u)$ commutes with the action of g on V ⊗ V, so that, by Schur's lemma,
\[
\check R(u) =
\]
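As an illustration of the equation being solved (not of the exceptional R-matrices constructed in the paper), the following sketch numerically checks the braid-form YBE (2.1) for the simplest rational solution, Yang's sl_n R-matrix, Ř(u) = 1 + uP.

```python
# Numerical check (illustrative only) that Rcheck(u) = 1 + u*P satisfies the
# braid-form Yang-Baxter equation (2.1); P is the transposition on C^n (x) C^n.
import numpy as np

n = 3
I_n = np.eye(n)

# P swaps the two tensor factors: P[(i,j),(k,l)] = delta_{il} delta_{jk}.
P = np.einsum("il,jk->ijkl", I_n, I_n).reshape(n * n, n * n)

def R(u):
    return np.eye(n * n) + u * P

def lhs_rhs(u, v):
    E = np.eye(n)
    R12 = lambda w: np.kron(R(w), E)   # R(w) (x) 1 acting on sites 1,2 of V (x) V (x) V
    R23 = lambda w: np.kron(E, R(w))   # 1 (x) R(w) acting on sites 2,3
    lhs = R12(u) @ R23(u + v) @ R12(v)
    rhs = R23(v) @ R12(u + v) @ R23(u)
    return lhs, rhs

lhs, rhs = lhs_rhs(0.7, -1.3)
print(np.allclose(lhs, rhs))   # expected: True
```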