Rates and patterns of gene duplication and
Important Papers in Plant Genomics
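It took me a weekend and a half to put together, from memory, a list of papers in plant genomics that I consider important, or at least interesting. If you have read them all and can explain their key points, congratulations: you already have at least one of the proverbial "three axe strokes" in hand. I should stress that this is one person's view. I did not reread the papers while compiling the list, so the recommendations rest only on my earlier impressions, and errors are inevitable. Please read them critically. Nor can I guarantee that these really are classics. The list covers 13 areas and recommends 140 papers; consider it winter-break homework. Corrections and additions are welcome. (Note: because the focus is on genomes, functional genomics is not included for now.)
Areas of plant genome research that interest me:
1. Genome structure and variation
2. Construction of molecular-marker linkage maps and gene mapping
3. Principles and methods of QTL mapping
4. Fine mapping of QTLs
5. Cloning of genes and QTLs
5.1 Insertional mutagenesis
5.2 Map-based cloning (including comparative map-based cloning)
5.3 Candidate-gene approach
6. Germplasm evaluation and utilization
7. Marker-assisted selection (including molecular design breeding)
8. Transgenics
8.1 Transformation systems and empirical studies
8.2 Ecological safety of transgenic crops
9. Comparative genomics
9.1 Comparisons at the marker level
9.2 Comparisons at the sequence level
9.3 Comparisons at the trait level
9.4 Functional comparisons
10. Heterosis
10.1 Genetic explanations
10.2 Molecular explanations
11. Molecular evolution (mainly maize evolution)
12. Association analysis based on linkage disequilibrium
12.1 Empirical studies
12.2 Methodology
13. Applications of new technologies in genome research
13.1 DNA microarrays
13.2 DNA shuffling
13.3 Gene trap
13.4 Gene therapy in plants
13.5 TILLING
1. Plant genome structure and variation
As more and more plant genomes have been sequenced, the importance of this area has become increasingly clear, and the literature on it is enormous. In maize, the leading researchers in this area are E. S. Buckler, J. Messing, H. K. Dooner, J. Doebley, and B. S. Gaut.
1. Buckler ES, Gaut BS, McMullen MD. Molecular and functional diversity of maize. Curr Opin Plant Biol 2006;9:172-176.
A review of maize genome structure. Read it for the big picture first, then read the research papers closely. Its classic observation that any two maize inbred lines differ genetically more than humans and chimpanzees do shows how extensive maize variation is, although recent progress in human genomics suggests the comparison may need revising.
2. Messing J, Dooner HK. Organization and variability of the maize genome. Curr Opin Plant Biol 2006;9(2):157-163.
A joint review by two leaders in the field, well worth reading.
3. Goff SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H, Hadley D, Hutchison D, Martin C, Katagiri F, Lange BM, Moughamer T, Xia Y, Budworth P, Zhong J, Miguel T, et al. A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 2002;296:92-100.
Many people have probably heard of this paper, but I suspect few have read it all the way through. Of the many whole-genome sequencing papers, this is the one I most strongly recommend, because its discussion is superbly written. The Chinese sequencing paper published in the same issue pales somewhat by comparison. The later publication of the finished rice sequence is also worth reading:
4. International Rice Genome Sequencing Project. The map-based sequence of the rice genome. Nature 2005;436:793-800.
5. Fu H, Dooner HK. Intraspecific violation of genetic colinearity and its implications in maize. Proc Natl Acad Sci USA 2002;99:9573-9578.
This paper taught me a great deal. One lesson is that the presence or absence of a gene is itself a form of allelic variation, although the paper's conclusions have since been repeatedly revised.
6. Song R, Messing J. Gene expression of a gene family in maize based on noncolinear haplotypes.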
Proc Natl Acad Sci USA 2003;100:9055-9060.
One of Song Rentao's representative papers. It is similar in spirit to Fu's paper and offers a new explanation for heterosis.
7. Brunner S, Fengler K, Morgante M, Tingey S, Rafalski A. Evolution of DNA sequence nonhomologies among maize inbreds. Plant Cell 2005;17:343-360.
Building on papers 5 and 6, it provides much more data.
8. Lai J, Li Y, Messing J, Dooner HK. Gene movement by Helitron transposons contributes to the haplotype variability of maize. Proc Natl Acad Sci USA 2005;102:9068-9073.
One of Lai Jinsheng's representative papers. It offers a comprehensive explanation of maize genome expansion.
9. Lai J, Ma J, Swigonova Z, Ramakrishna W, Linton E, Llaca V, Tanyolac B, Park YJ, Jeong OY, Bennetzen JL, et al. Gene loss and movement in the maize genome. Genome Res 2004;14:1924-1931.
Partly explains how the structure of the maize genome arose: more through insertions than through deletions.
10. Morgante M, Brunner S, Pea G, Fengler K, Zuccolo A, Rafalski A. Gene duplication and exon shuffling by helitron-like transposons generate intraspecies diversity in maize. Nat Genet 2005;37:997-1002.
Tells the same story as paper 8.
11. Tenaillon MI, Sawkins MC, Long AD, Gaut RL, Doebley JF, Gaut BS. Patterns of DNA sequence polymorphism along chromosome 1 of maize (Zea mays ssp. mays L.). Proc Natl Acad Sci USA 2001;98:9161-9166.
These data indicate that maize retains only about 60% of the genetic diversity of its ancestor, teosinte.
12. Messing J, Bharti AK, Karlowski WM, Gundlach H, Kim HR, Yu Y, Wei F, Fuks G, Soderlund CA, Mayer KF, et al. Sequence composition and genome organization of maize. Proc Natl Acad Sci USA 2004;101:14349-14354.
The prediction that maize has 59,000 genes comes from this paper.
13. Bruggmann R, Bharti AK, Gundlach H, Lai J, Young S, Pontaroli AC, Wei F, Haberer G, Fuks G, Du C, Raymond C, Estep MC, Liu R, Bennetzen JL, Chan AP, Rabinowicz PD, Quackenbush J, Barbazuk WB, Wing RA, Birren B, Nusbaum C, Rounsley S, Mayer KF, Messing J. Uneven chromosome contraction and expansion in the maize genome. Genome Res 2006 Oct;16(10):1241-1251.
14. Emrich SJ, Li L, Wen TJ, Yandeau-Nelson MD, Fu Y, Guo L, Chou HH, Aluru S, Ashlock DA, Schnable PS. Nearly identical paralogs: implications for maize (Zea mays L.) genome evolution. Genetics 2007 Jan;175(1):429-439.
The concept of nearly identical paralogs (NIPs) proposed by Schnable poses new challenges for future association analysis and many other kinds of study, even though NIPs account for only about 1% of the maize genome.
15.
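Fu Y, Emrich SJ, Guo L, Wen TJ, Ashlock DA, Aluru S, Schnable PS. Quality assessment of maize assembled genomic islands (MAGIs) and large-scale experimental verification of predicted genes. Proc Natl Acad Sci USA 2005 Aug 23;102(34):12282-12287.
Read this to learn what a MAGI is. It is another of Schnable's contributions. His very large group (by US standards) and tireless energy have made him extraordinarily prolific, with one high-impact paper after another.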
Gene Mutation and Variant Proportion: Usage in English
In English, gene mutation and the proportion of variants are usually expressed with the following terms:
1. Gene Mutation: This term is commonly used to refer to any change in the DNA sequence, which may or may not affect the structure or function of an organism. It is a broad term that encompasses various types of genetic alterations.
2. Mutation Rate: This refers to the frequency at which a particular gene or set of genes undergoes changes in its DNA sequence. It is usually expressed as the number of mutations per gene (or per base pair) per generation, per cell division, or per unit of time.
3. Variant Frequency: This term is used to describe the proportion of individuals in a population who carry a specific genetic variant. It is often expressed as a percentage or a fraction.
4. Allele Frequency: This refers to the relative frequency of a particular allele (one of two or more alternative forms of a gene) among all copies of that gene in a population. It is a measure of how common a specific genetic variant is within a population.
5. Genetic Variation: This term encompasses all the differences in DNA sequence between individuals within a population. It can be used to describe the overall level of genetic diversity within a population.
These terms are commonly used in scientific literature, research papers, and discussions related to genetics, genomics, and evolutionary biology to describe the occurrence and prevalence of genetic mutations and variations within populations.
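The difference between items 3 and 4 is easy to miss, so here is a minimal Python sketch. It assumes diploid genotypes written as pairs of allele labels, and the sample is invented for illustration. Allele frequency counts allele copies. Variant (carrier) frequency, as defined in item 3, counts individuals who carry at least one copy, so the two numbers generally differ.

```python
from collections import Counter


def allele_frequency(genotypes, allele):
    """Fraction of all allele copies in the sample that are `allele`."""
    copies = Counter(a for genotype in genotypes for a in genotype)
    return copies[allele] / sum(copies.values())


def carrier_frequency(genotypes, allele):
    """Fraction of individuals carrying at least one copy of `allele`."""
    carriers = sum(1 for genotype in genotypes if allele in genotype)
    return carriers / len(genotypes)


# Hypothetical sample: five diploid individuals at one biallelic site.
sample = [("A", "A"), ("A", "G"), ("G", "G"), ("A", "A"), ("A", "G")]
print(allele_frequency(sample, "G"))   # 4 of 10 copies -> 0.4
print(carrier_frequency(sample, "G"))  # 3 of 5 individuals -> 0.6
```

A carrier can hold one or two copies, so the carrier frequency is at least as large as the allele frequency in a diploid sample.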
The Four Great Inventions of ancient China
Essay 1
The Four Great Inventions of Ancient China
As a country with a long history and profound cultural heritage, China has made numerous significant contributions to human civilization. Among these, the "Four Great Inventions" (papermaking, printing, gunpowder, and the compass) stand out as remarkable achievements.
The invention of papermaking is traditionally attributed to Cai Lun during the Eastern Han Dynasty. Before the advent of paper, materials such as bamboo slips and silk were used for writing, which were cumbersome or expensive. Making paper from plant fibers revolutionized the way information was recorded and disseminated, facilitating the spread of knowledge and culture within China and beyond.
Printing, another crucial invention, further enhanced the dissemination of knowledge. Woodblock printing, which emerged in the Tang Dynasty, enabled the mass production of books and other written materials, making them accessible to a wider audience. This breakthrough played a vital role in promoting literacy and preserving written works, fostering the exchange of ideas.
Gunpowder, discovered by alchemists in the Tang Dynasty, had a profound impact on warfare as well as on entertainment. It was used early on in fireworks and later found applications in military weapons, transforming the nature of armed conflict and leading to the development of various weapons and explosives.
Finally, the magnetic compass, which came into use for navigation during the Song Dynasty, revolutionized seafaring. By harnessing the Earth's magnetic field, it provided sailors with a reliable means of determining direction, greatly facilitating maritime trade and exploration. This invention not only expanded trade networks but also played a crucial role in the Age of Exploration, connecting different cultures and facilitating the exchange of goods and ideas across continents.
In conclusion, the Four Great Inventions of Ancient China have left an indelible mark on human history.
They showcase the ingenuity and creativity of the ancient Chinese people and continue to inspire and influence the world today.
Essay 2
The Four Great Inventions
The compass, gunpowder, papermaking, and movable-type printing are the four great inventions of ancient China. They symbolize China's status as an ancient civilization and occupy an important position in the history of human civilization.
The compass grew out of the ancient Chinese working people's long-term understanding of magnetism. Through productive labor, they discovered that a magnet points in a fixed direction. After many experiments, the compass was finally invented, and after it was introduced to Europe it played an important role in navigation.
Gunpowder is made by mixing three materials: niter (saltpeter), sulfur, and charcoal. In the Tang Dynasty, gunpowder began to be applied to military affairs. People used catapults to hurl lit bags of gunpowder at the enemy to set them on fire, which was the most primitive form of artillery.
The invention of papermaking is China's contribution to world civilization. In 105 AD, during the Eastern Han Dynasty, Cai Lun built on his predecessors' experience in processing silk and made paper from plant fibers, using bark, worn-out fishing nets, rags, and other raw materials. From then on, paper became the common writing material, and it came to be called "Cai Hou paper" (Marquis Cai's paper).
Movable-type printing was one of the most important of the four inventions. Woodblock printing appeared in the Sui and Tang dynasties: characters were carved in relief on a woodblock with a knife, and ink was then brushed onto the block and transferred to paper. In the Northern Song Dynasty, the artisan Bi Sheng (1004-1048 AD) made rectangular blocks of sticky clay, carved a single Chinese character on each, and fired them to harden them into type.
This technique is known as movable-type printing.
In a word, the four great inventions of ancient China wrote a magnificent chapter in the history of human science and culture. These great inventions influenced and benefited the whole world and promoted the advancement of human history.
Essay 3
The Marvels of Ancient China: The Four Great Inventions
China's ancient Four Great Inventions have had a profound and lasting impact on the world. Papermaking, for instance, emerged in China and revolutionized the way information was recorded and shared. Before its invention, people relied on heavy and cumbersome materials for writing.
Printing was another significant advancement. Woodblock printing allowed texts to be reproduced on a larger scale, spreading knowledge more widely and promoting literacy.
Gunpowder, though it also found peaceful uses in fireworks, had a major influence on warfare, changing the dynamics of battles.
The compass was a crucial invention that greatly enhanced navigation. It enabled sailors to venture into uncharted waters with more confidence, facilitating trade and exploration.
These inventions not only showcase the ingenuity and creativity of the ancient Chinese but also highlight their important contributions to the progress of human civilization. They have left an indelible mark on the course of history, shaping the world as we know it today.
Essay 4
The Four Great Inventions: Treasures of Ancient China
The Four Great Inventions of ancient China are like bright stars in the long history of human civilization. Papermaking provided a convenient and economical medium for recording and transmitting knowledge. It replaced the previous heavy and expensive writing materials, making the spread of information more efficient.
The compass, with its ability to determine direction, opened up new horizons for exploration and navigation.
Sailors could rely on it to travel to distant places, expanding the scope of human activities.
Gunpowder, despite its destructive power in warfare, also had positive uses. Fireworks, for example, brought joy and celebration to people's lives.
Printing, especially movable-type printing, revolutionized publishing. It made the mass production of books possible, promoting the dissemination of ideas and cultural exchange.
These four inventions are not only the pride of the Chinese nation but also the common wealth of all mankind. They have had a far-reaching impact on the development of human society, and their significance will always be remembered.
Essay 5
An Introduction to the Four Great Inventions of Ancient China
China's ancient Four Great Inventions have played a vital role in the development of human civilization. Papermaking provided a lightweight and accessible material for writing, replacing cumbersome and expensive alternatives.
Printing was a significant breakthrough that made the duplication and dissemination of books and knowledge much easier. This led to an increase in literacy and the sharing of ideas on a larger scale.
Gunpowder brought changes to warfare, and it also had applications in other areas, such as fireworks.
The compass was an essential tool for navigation. It allowed people to determine their direction accurately at sea, which greatly promoted maritime trade and exploration.
In summary, the Four Great Inventions of ancient China are evidence of the intelligence and innovation of the Chinese people. They have had a profound influence on the world, shaping the course of history and contributing to the progress of human society. These inventions continue to be admired and studied, serving as a source of inspiration for future generations.
Gret-15
Introduction
The aim of this document is to provide detailed information about Gret-15. This document will cover various aspects of Gret-15, including its features, functions, and usage. Gret-15 is a powerful software application designed to streamline workflow processes and increase efficiency in an organization.
Features
1. Workflow Automation
Gret-15 offers extensive workflow automation capabilities that allow users to automate repetitive tasks and streamline their business processes. This feature eliminates the need for manual intervention and reduces the chances of errors or delays. With Gret-15, users can create custom workflows that automate tasks such as document approvals, data entry, and email notifications.
2. Task Management
Gret-15 provides a comprehensive task management system that allows users to create, assign, and track tasks effectively. Users can easily manage their individual tasks or collaborate with team members on shared tasks. The task management feature of Gret-15 provides a centralized platform to track progress, set deadlines, and allocate resources for efficient task completion.
3. Document Collaboration
Collaboration is essential for effective teamwork, and Gret-15 offers a robust document collaboration feature. Users can upload, edit, and share documents securely within the Gret-15 platform. Multiple users can collaborate on the same document simultaneously, making it easy to track changes and ensure everyone is on the same page. The document collaboration feature also includes version control, allowing users to revert to previous versions if necessary.
4. Reporting and Analytics
Gret-15 comes equipped with advanced reporting and analytics capabilities. Users can generate customized reports that provide valuable insights into various aspects of the organization. These reports can include data on workflow performance, task completion rates, and resource utilization.
The analytics feature of Gret-15 allows users to identify trends, patterns, and areas for improvement, enabling data-driven decision making.
5. Integration
Gret-15 integrates seamlessly with other popular software applications, providing users with a comprehensive and interconnected work environment. Integration with tools such as email clients, project management software, and productivity suites allows for smooth data transfer and eliminates the need for manual data entry. This feature enhances productivity and reduces the risk of data duplication or inconsistencies.
How to Use Gret-15
To start using Gret-15, follow these steps:
1. Sign up for a Gret-15 account on the official website.
2. Once signed up, log in to your Gret-15 account using your credentials.
3. Familiarize yourself with the user interface and navigation.
4. Set up your organization’s workflows by creating custom workflows or using pre-built templates.
5. Assign tasks to team members using the task management feature and set deadlines.
6. Upload and collaborate on documents with your team members using the document collaboration feature.
7. Generate reports and analyze data to gain insights into your organization’s performance.
8. Integrate Gret-15 with other software applications to streamline your workflow further.
Conclusion
Gret-15 is a powerful software application that offers an extensive range of features to streamline workflow processes and increase efficiency. With its workflow automation, task management, document collaboration, reporting, and integration capabilities, Gret-15 provides users with a comprehensive solution for their business needs. By utilizing Gret-15, organizations can achieve improved productivity, reduced manual intervention, and better decision making based on data-driven insights.
Paralogous Genes (in English)
Paralogous Genes: An Insight into Their Evolution and Function.
Paralogous genes, also known as duplicate genes or duplicated genes, are genes that have arisen from the duplication of an ancestral gene within the genome of an organism. Depending on how they arose, they may lie next to each other (tandem duplicates), elsewhere on the same chromosome, or on different chromosomes. The duplication process can occur via various mechanisms, including whole-genome duplication, segmental duplication, or tandem duplication. The resulting paralogous genes may retain similar or identical functions, acquire novel functions, or become pseudogenes due to mutational inactivation.
The concept of paralogy was introduced by Walter Fitch in 1970 to distinguish genes related by duplication within a lineage from orthologous genes, which are related by speciation. Since then, paralogous genes have been extensively studied in various organisms, providing insights into the mechanisms of gene duplication, the evolution of gene families, and the functional diversity within genomes.
Evolutionary Origin of Paralogous Genes.
Gene duplication is a common phenomenon in the evolution of genomes. Duplicates arise through mutational events, and whether a new duplicate spreads and persists depends on evolutionary forces such as selection (for example, during adaptive radiation) and genetic drift. The duplicated gene may retain the original function of the ancestral gene or evolve a new function, depending on the selective pressures acting on the gene and the genomic context.
Adaptive radiation occurs when a species diverges into multiple lineages, each adapting to different environments. In such cases, gene duplication can provide the genetic raw material for the evolution of novel traits, allowing the species to occupy multiple niches.
For example, the duplication of a gene encoding a transcription factor may lead to the evolution of two genes with distinct expression patterns and functions, allowing the species to respond to different environmental cues.
Genetic drift, on the other hand, occurs when random sampling of genes during reproduction leads to changes in gene frequencies within a population. This process can result in the fixation of gene duplications, even if they do not confer a direct advantage to the organism. Such duplications may persist in the genome as pseudogenes or may evolve new functions over time.
Mutational events, such as unequal crossing over, retrotransposition, or chromosomal rearrangements, generate the duplications themselves. These events can duplicate entire chromosome segments or individual genes, forming paralogous genes.
Functional Divergence of Paralogous Genes.
Once duplicated, paralogous genes can diverge in function due to mutations in their coding sequences, regulatory regions, or both. These mutations can change protein structure or expression patterns, leading to the evolution of novel functions.
For example, the duplication of a gene encoding an enzyme may lead to the evolution of two genes with distinct substrate specificities or catalytic activities. Similarly, the duplication of a gene encoding a transcription factor may result in the evolution of two genes with distinct DNA binding sites or regulatory networks.
In some cases, one of the duplicated genes retains the original function while the other acquires a new one. This process is known as neofunctionalization. Alternatively, the two copies may divide the ancestral gene's functions between them, so that both are needed to perform the full original role. This process is known as subfunctionalization.
Biological Importance of Paralogous Genes.
Paralogous genes play crucial roles in various biological processes, including metabolism, development, and responses to environmental stresses.
By providing genetic redundancy, they can buffer the effects of mutations and maintain the stability of biological systems. Additionally, they can contribute to the evolution of novel traits and adaptations, allowing species to respond to changing environments.
For instance, in humans, paralogous genes are involved in a wide range of biological processes, including immune response, signal transduction, and transcription regulation. Mutations in these genes can lead to various diseases, such as cancer, neurodegenerative diseases, and other inherited disorders.
Conclusion.
Paralogous genes are an important component of genomes, providing genetic diversity and evolutionary plasticity. By understanding the mechanisms of gene duplication and the functional divergence of paralogous genes, we can gain insights into the evolution of genomes and the adaptive strategies employed by organisms. Future studies in this field will continue to reveal the complex relationships between paralogous genes and their roles in maintaining the stability and diversity of life on Earth.
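The genetic-drift point in the section on evolutionary origin (a duplicate can become fixed without conferring any advantage) can be illustrated with a small Wright-Fisher simulation. This is a minimal Python sketch under simplifying assumptions that are not part of the essay:

- The duplicate is treated as a neutral allele at a single locus.
- The population is diploid with constant size N.
- There is no selection or recurrent mutation.

Classical neutral theory predicts that a variant starting as a single copy is fixed with probability equal to its initial frequency, 1/(2N).

```python
import random


def neutral_fixation_fraction(pop_size: int, replicates: int, seed: int = 1) -> float:
    """Fraction of replicate populations in which a new neutral duplicate is fixed.

    Assumed model (a simplification): Wright-Fisher sampling in a diploid
    population of constant size `pop_size` (2N gene copies), with the duplicate
    treated as a neutral allele and no selection or recurrent mutation.
    """
    rng = random.Random(seed)
    two_n = 2 * pop_size
    fixed = 0
    for _ in range(replicates):
        count = 1  # the duplication arises as a single copy
        while 0 < count < two_n:
            freq = count / two_n
            # Next generation: 2N independent draws, i.e. Binomial(2N, freq).
            count = sum(1 for _ in range(two_n) if rng.random() < freq)
        if count == two_n:
            fixed += 1
    return fixed / replicates


n = 10
estimate = neutral_fixation_fraction(n, replicates=10_000)
print(f"simulated fixation fraction: {estimate:.3f}")
print(f"neutral theory, 1/(2N):      {1 / (2 * n):.3f}")
```

The simulated fraction should land close to 0.05 for N = 10. Most new neutral duplicates are lost within a few generations, but a few drift all the way to fixation.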
In Vitro Mammalian Cell Gene Mutation Test (in English)
Gene mutation is a crucial process in the evolution of species, as it introduces genetic diversity and drives adaptation to changing environments. Mutations can also be induced by chemicals and other agents, so safety testing checks whether substances cause gene mutations in mammalian cells. One commonly used method for this purpose is the in vitro mammalian cell gene mutation assay.
The in vitro mammalian cell gene mutation assay is a widely accepted and standardized test for assessing the mutagenic potential of chemicals and other substances. It is based on the principle that mutations in specific reporter genes, such as HPRT or TK, change the cellular phenotype, typically by making the cells resistant to a selective agent that kills non-mutant cells. By exposing mammalian cells to a test substance and then counting the mutant cells that arise, researchers can determine the mutagenic potential of the substance in question.
To conduct the assay, researchers typically start by selecting a suitable mammalian cell line. Commonly used cell lines include Chinese hamster ovary (CHO) cells, L5178Y mouse lymphoma cells, and TK6 human lymphoblastoid cells. These cell lines are chosen because they respond sensitively to induced mutations and give reproducible results with known mutagens.
Once a cell line has been selected, researchers expose the cells to a range of concentrations of the test substance, usually both with and without an exogenous metabolic activation system, for a specified period. The cells are then cultured for an expression period, during which they replicate and divide so that induced mutations become fixed and the mutant phenotype is expressed.
After this period, mutations are usually detected phenotypically. Cells are plated in medium containing a selective agent (for example, 6-thioguanine for HPRT mutants or trifluorothymidine for TK mutants), and the resistant mutant colonies are counted. Cells are also plated without the selective agent to measure plating efficiency, so that the mutant frequency can be expressed per viable cell. Molecular analyses such as polymerase chain reaction (PCR) or DNA sequencing can then be used to characterize the induced mutations.
The results of an in vitro mammalian cell gene mutation assay provide valuable information about the mutagenic potential of a test substance. If the test substance induces a significant, concentration-related increase in mutant frequency in treated cells compared with concurrent untreated or solvent controls, it may be considered mutagenic. This information is important for assessing the safety of chemicals and other substances, because mutagenic compounds can cause genetic damage and increase the risk of cancer.
In conclusion, the in vitro mammalian cell gene mutation assay is a powerful tool for studying the mutagenic potential of chemicals and other substances. By exposing mammalian cells to test substances and measuring the resulting mutant frequency, researchers can identify agents that induce gene mutations. This assay plays a crucial role in assessing the safety of chemicals and informing regulatory decisions to protect human health and the environment.
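The mutant-frequency calculation at the heart of this assay can be sketched in a few lines of Python. This is a simplified illustration with invented colony counts, resting on two assumptions:

- Plating efficiency is estimated from colony counts on non-selective plates. Microwell versions of the assay estimate it from the fraction of empty wells instead.
- Mutant frequency is the number of mutant colonies divided by the number of viable cells plated under selection.

Real evaluations follow the relevant test guideline and its statistical criteria. The fold increase printed here is only descriptive.

```python
def plating_efficiency(colonies: int, cells_plated: int) -> float:
    """Fraction of cells plated without selective agent that form colonies."""
    return colonies / cells_plated


def mutant_frequency(mutant_colonies: int, cells_plated_selective: int, pe: float) -> float:
    """Mutant colonies per viable cell plated in selective medium."""
    return mutant_colonies / (cells_plated_selective * pe)


# Hypothetical counts, for illustration only (not data from the text).
control_pe = plating_efficiency(colonies=160, cells_plated=200)  # 0.80
treated_pe = plating_efficiency(colonies=120, cells_plated=200)  # 0.60, some toxicity

control_mf = mutant_frequency(8, 1_000_000, control_pe)   # 1.0e-05
treated_mf = mutant_frequency(30, 1_000_000, treated_pe)  # 5.0e-05

print(f"control: {control_mf * 1e6:.1f} mutants per 10^6 viable cells")
print(f"treated: {treated_mf * 1e6:.1f} mutants per 10^6 viable cells")
print(f"fold increase over control: {treated_mf / control_mf:.1f}")
```

Dividing by plating efficiency matters when the test substance is cytotoxic. Fewer treated cells survive to form colonies, so raw colony counts alone would understate the mutant frequency.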
The Significance of Digital Gene Expression Profiles
Stéphane Audic and Jean-Michel Claverie¹
Laboratory of Structural and Genetic Information, Centre National de la Recherche Scientifique–E.P. 91, Marseille 13402, France
Genes differentially expressed in different tissues, during development, or during specific pathologies are of foremost interest to both basic and pharmaceutical research. “Transcript profiles” or “digital Northerns” are generated routinely by partially sequencing thousands of randomly selected clones from relevant cDNA libraries. Differentially expressed genes can then be detected from variations in the counts of their cognate sequence tags. Here we present the first systematic study on the influence of random fluctuations and sampling size on the reliability of this kind of data. We establish a rigorous significance test and demonstrate its use on publicly available transcript profiles. The theory links the threshold of selection of putatively regulated genes (e.g., the number of pharmaceutical leads) to the fraction of false positive clones one is willing to risk. Our results delineate more precisely and extend the limits within which digital Northern data can be used.
Very large-scale, single-pass partial sequencing of cDNA clones from a large number of libraries has led to the identification of ~50,000 human genes (Adams et al. 1995; Aaronson et al. 1996; Hillier et al.
1996). However, a precise function or a complete transcript sequence is known for <5000 of these (Adams et al. 1995; Boguski and Schuler 1995). In the absence of functional clues for most of the newly identified genes, evidence of differential expression is the most important criterion for prioritizing the exploitation of anonymous sequence data in both basic and pharmaceutical (Nowak 1995; Adams 1996; Bains 1996; Editorial 1996) research. For example, the study of expression profiles in various tumors is central to the new Cancer Genome Anatomy Project (Kuska 1996; O’Brien 1997). In contrast to functional assays, the quantitative analysis of gene expression levels lends itself to large-scale implementation. Two main approaches have been proposed: (1) “analog” methods based on hybridization to arrayed cDNA libraries (Lennon and Lehrach 1991; Gress et al. 1992; Nguyen et al. 1995; Schena et al. 1995; Zhao et al. 1995) or oligonucleotide “chips” (Fodor et al. 1991; Southern et al. 1992; Guo et al. 1994; Matson et al. 1995); and (2) “digital” methods, based on the generation of sequence tags. This paper focuses on the latter.
The sequence tag-based method (Okubo et al. 1992; Matsubara and Okubo 1994) consists of generating a large number (thousands) of expressed sequence tags (ESTs) (Adams et al. 1991; Wilcox et al. 1991; Adams et al. 1992; Khan et al. 1992) from 3′-directed regional non-normalized cDNA libraries. Recently, Velculescu et al. (1995) have introduced the serial analysis of gene expression (SAGE). Although tags are 100–300 nucleotides in length in the original EST approach, the SAGE method only requires nine nucleotides, therefore allowing a larger throughput.
In both protocols, the number of tags is reported to be proportional to the abundance of cognate transcripts in the tissue or cell type used to make the cDNA library. The variation in the relative frequency of those tags, stored in computer databases, is then used to point out the differential expression of the corresponding genes: This is the concept of a “digital Northern” comparison. In the absence of a sound theoretical framework, the validity of the method has only been verified for a handful of genes in the context of two cellular differentiation systems (Lee et al. 1995; Okubo et al. 1995) inducible in vitro. Yet, with a total number of human genes of ~80,000 or more, it is not intuitive that sequencing a mere few thousand tags (a typical experiment) from highly redundant non-normalized cDNA libraries can produce a useful picture, or realistic “transcript profile,” of a given tissue, development stage, or cell type. What variations in tag numbers allow for a reliable inference about differential expression? How many tags should be generated? Here we present the statistical framework required to answer those questions and analyze transcript profiles in a quantitative manner.
¹Corresponding author. E-MAIL jmc@rs-mrs.fr; FAX 33491164549.
RESULTS
In Methods we establish the probability distribution governing the occurrence of the same rare event in duplicate experiments. This probability distribution is a general result applicable to a wide variety of experimental situations, although this paper focuses on its use to analyze digital gene expression patterns. The main and only mathematical assumption behind the derivation is that the observed events are rare and part of a large population of possible outcomes (the distribution of which is not specified). In the context of a digital Northern, one event is the observation of a given cDNA sequence tag, and the experiment consists of the random picking and partial sequencing of a number N of cDNA clones. Given the usual
complexity (i.e., the number of different genes expressed) of cDNA libraries, observing a given cDNA qualifies as a rare event, as the abundance of most individual messages is of the order of a few percent or less.
Random Fluctuation vs. Significant Change in Tag Number: When to Infer Differential Expression
Let us randomly pick N = 1000 clones from a cDNA library and generate the corresponding sequence tags; a given message (e.g., interleukin-2) will be picked x (e.g., two) times, with x in a typical (0–10) range. If we now redo this experiment, that is, again pick 1000 clones and generate the tags, the same message will now be picked y (e.g., 3) times. If the experiments have been duplicated correctly and the clones selected at random, we expect x and y to be close, albeit often different because of random fluctuations. In the Methods section, we show that the expected probability of observing y occurrences of a clone already observed x times is given by the simple formula:
$$p(y \mid x) = \frac{(x+y)!}{x!\,y!\,2^{(x+y+1)}} \qquad (1)$$
Equation 1 can be used to compute a confidence interval [y_min, y_max]_ε within which we expect to find y with a given probability, noted 1 − 2ε, where 2ε is the significance level. For ε small (e.g., 2.5% or less), y values falling outside the [y_min, y_max]_ε interval correspond to p(y|x) ≪ 1, therefore pointing out very unlikely random fluctuations between the two experiments. The confidence intervals for the usual 1% and 5% significance levels are given in Table 1.
The same confidence intervals listed in Table 1 can in fact be used to analyze the results of sampling N clones from two different libraries. Provided all experimental factors are well replicated, significant discrepancies between x (from one library) and y (from the other) will now characterize differentially expressed genes, that is, genes whose relative abundance is unlikely to be the same in the two libraries. Simply reading Table 1, we see that variations in counts such as 7→0, or 2→12 are significant (P<0.01) evidence of regulated gene expression, whereas
variations such as 3→0 or 8→16 are not (P>0.05). However, we do not advocate the use of rigid significance thresholds to analyze digital transcript profiles, as discussed below.
Influence of the Sampling Size
Surprisingly at first, p(y|x) in Equation 1 does not involve the sampling size N, that is, the total number of picked clones. The fluctuation probabilities, and confidence intervals, depend only on the values of the observed counts. To understand why, we must remember that Equation 1 governs the results of strictly duplicated experiments. Given that N clones are sampled, the most likely tags to be picked up are, intuitively, those corresponding to cDNAs whose abundance is of the order of 1/N or larger (according to Equation 3, the probability of finding a given cDNA with 1/N abundance while picking up N clones is 0.63; see also Equation 13). Choosing a sampling size therefore corresponds to targeting a given subset of genes, whose level of expression allows their tags to occur at reasonable frequencies. As expected, more reliable inferences can be made on clones corresponding to larger absolute frequencies (i.e., the ones more often picked up).
For example (see Table 1), a variation in counts from 1 to 3 (threefold increase) is not indicative of a significant (P < 0.05) increase, whereas a variation from 4 to 12 is significant at P < 0.05, and a variation from 7 to 21 is significant at P < 0.01. For a gene expressed at a given rate, increasing the sampling size N leads to higher tag counts and allows more stringent statistical inference to be made for the same proportional variation.

STATISTICAL ANALYSIS OF TRANSCRIPT PROFILES
GENOME RESEARCH

Most often in practice one wishes to compare digital Northerns or gene profiles that have been computed from the random picking of different numbers of clones, N1 and N2. The mathematical problem is now to establish the probability for a given cDNA (e.g., interleukin-2) to be picked up x times when the sampling size was N1 and y times when the sampling size was N2. Equation 1 then becomes (see Methods):

$$p(y \mid x) = \left(\frac{N_2}{N_1}\right)^{y} \frac{(x+y)!}{x!\,y!\,\left(1+\frac{N_2}{N_1}\right)^{(x+y+1)}} \qquad (2)$$

Whereas Equation 1 applies to the analysis of fluctuations in counts in strictly identical experiments, Equation 2 applies to the analysis of counts in experiments differing only by the total number of clones randomly picked up. In practice, Equation 2 will be used to analyze experiments performed on two different libraries using different sampling sizes. As for Equation 1, small p(y|x) values are expected to characterize the genes exhibiting regulated expression, that is, genes whose relative abundance is unlikely to be the same in the two libraries.

Table 1. Confidence Intervals as a Function of the Value of x
The value of x (first column) is one of the occurrence numbers. The intervals are given for the 95% (2ε = 0.05) and 99% (2ε = 0.01) confidence levels. Up to x = 20, the exact boundaries immediately outside the confidence interval (first significantly different values) are indicated. A star is used when none are possible. For larger values, the boundaries are given as percentages to be subtracted from or added to x. Ricker's confidence interval characterizes the value of μ, not y (see Methods). The use of a flat p(μ) prior
distribution results in the most stringent test, as expected. Although the number (N) of clones sampled does not appear in the expression of p(y|x) (Equation 1), its influence shows in the fact that the confidence interval becomes proportionally smaller as x (and y) increases (e.g., 1→7 has the same statistical significance as 40→60). For the same expression level, larger N will result in larger absolute values for x and y, making the detection of significant differential expression more sensitive.

Comparison with Fisher's (2 × 2) Exact Test
The (2 × 2) contingency tables arising from treatment versus control experiments are traditionally analyzed with Fisher's exact test (Siegel 1956; Agresti 1996). Differential EST count data can be presented in a tabulated form so as to suggest the use of this test, as follows:

                        Brain cDNA library   Liver cDNA library
Number of actin ESTs                     2                   11
Number of other ESTs                   998                 1189
Total clones sampled                  1000                 1200

The statistical significance according to Fisher's exact test for such a result is 4.6% (two-tail P-value, i.e., the probability of obtaining a table at least this extreme under the hypothesis that actin EST frequencies are independent of the cDNA libraries). In comparison, the P-value computed from the cumulative form (Equation 9; see Methods) of Equation 2 is 1.6%. This is the probability, if the relative frequency of actin ESTs is the same in both libraries, of observing at least 11 cognate ESTs in the liver library after two were observed in the brain library. Fisher's (2 × 2) exact test is always more conservative than our test (e.g., a Fisher P-value of 1.6% requires a 2→13 EST count transition in the above setting). Besides being too conservative, this test faces a more fundamental difficulty when applied to EST count data. The sampling scheme assumed by Fisher's exact test in principle requires the total number of data values in the contingency table to be fixed, as well as both the row marginal totals and the column marginal totals.
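For comparison, here is a minimal standard-library sketch of the two-sided Fisher exact test applied to the actin table above. It uses the common convention, also followed by R's `fisher.test`, of summing the hypergeometric probabilities of every table that is no more probable than the observed one. The function name is ours. Note that the computation conditions on both sets of margins, which is exactly the sampling assumption questioned in the text.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability that the top-left cell equals k,
        # with row and column totals all held fixed.
        return comb(row1, k) * comb(n - row1, col1 - k) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    total = 0.0
    for k in range(lo, hi + 1):
        p = prob(k)
        # Small relative tolerance so tables tied with the observed one are kept.
        if p <= observed * (1 + 1e-7):
            total += p
    return total

# Rows: actin ESTs, other ESTs; columns: brain library, liver library.
print(fisher_exact_two_sided(2, 11, 998, 1189))
```

With this convention the printed value is close to the 4.6% quoted in the text.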
In our prospective experimental situation, only the column marginals (i.e., the numbers of clones sampled from each library) are fixed. The extension of Fisher's exact test to cases where only one set of marginal totals is fixed (Tocher 1950) is still controversial. In the context of the above EST counting results, there is an additional problem with the lack of homogeneity in the definition of the "other EST" category. This category represents different subsets of transcripts for different libraries. Fisher's (2 × 2) exact test is more naturally suited to a different type of EST data analysis: the study of library-dependent alternative transcripts of the same gene (i.e., splice or polyadenylation variants) (D. Gautheret, O. Poirot, F. Lopez, S. Audic, and J.-M. Claverie, in prep.). Here, the results for a hypothetical gene G1 may look as follows:

                          G1-related transcripts   G1-related transcripts
                          in brain library         in liver library
Long-form mRNA                        2                      10
Short-form mRNA                       8                       3
Total G1-related clones              10                      13

where the alternative categories are unambiguously defined and refer to the same objects. For example, the above results constitute good evidence that G1 is expressed in different forms in those tissues (Fisher's exact test two-tail P-value = 1.2%).

False Leads in the Selection of Candidate Genes
A crucial measure of the power of statistical significance tests is their rate of false alarm, that is, how often random fluctuations are expected to be mistaken for significant differences in the results. When analyzing the transcript profiles from two different libraries, a false alarm would cause a gene to be deemed differentially transcribed when in fact it is not. The rate of false alarm is therefore a direct estimate of the fraction of false leads when searching for differentially expressed genes on the basis of differences in tag counts. The rates of false alarm associated with the P < 0.01 and P < 0.05 confidence intervals listed in Table 1 have been computed by Monte-Carlo simulation on the basis of two experimental sequence
tag distributions (Table 2; Fig. 1). The rate of false alarms associated with the use of Equation 1 (in fact, its cumulative form, Equation 9; see Methods) is very small for genes represented by small tag counts and slowly increases for higher tag counts, without ever exceeding the selected significance level. Such good behavior validates the use of the confidence intervals (Table 1) computed from Equations 1 and 9 to assess the statistical significance of variations in digital Northern data. The curves labeled "window" characterize the very similar behavior of a slightly less conservative derivation of the same test (see Methods, Equation 15). For comparison, Figure 1 also presents the behavior of another test, based on an inappropriate application of Ricker's confidence intervals (Ricker 1937) (see Methods).

DISCUSSION
An appropriate statistical test is now at our disposal to begin analyzing digital gene expression profiles in a more quantitative way. For example, the test can be used to determine how many genes appear regulated at various confidence levels using the data from a typical experiment (e.g., sampling a thousand clones). We analyzed the data gathered by Okubo et al. (1995) on the human promyelocytic leukemia cell line HL60 induced by dimethylsulfoxide (DMSO) or tetradecanoylphorbol acetate (TPA). Table 3 shows the 21 EST classes whose occurrences exhibit significant variations at the 1% level.
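The false-alarm rates discussed above were estimated by Monte-Carlo simulation, and that procedure can be sketched as follows. Both samples are drawn from the same abundance profile, so every flagged species is a false alarm. The paper used the Velculescu and Okubo tag distributions (Table 2) and reported rates per tag class size. This sketch instead uses a made-up abundance profile and pools all classes. The profile, the seed and the rule "flag when either single tail of Equation 9 is ≤ ε" are assumptions of the sketch.

```python
import math
import random
from collections import Counter
from functools import lru_cache

@lru_cache(maxsize=None)
def p_eq1(y: int, x: int) -> float:
    """Equation 1 (equal sampling sizes), evaluated in log space."""
    return math.exp(math.lgamma(x + y + 1) - math.lgamma(x + 1) - math.lgamma(y + 1)
                    - (x + y + 1) * math.log(2.0))

@lru_cache(maxsize=None)
def is_significant(x: int, y: int, eps: float) -> bool:
    """True when either single tail of Equation 9 at y is <= eps (two-sided level 2*eps)."""
    lower = sum(p_eq1(k, x) for k in range(y + 1))    # C(y' <= y | x)
    upper = 1.0 - sum(p_eq1(k, x) for k in range(y))  # D(y' >= y | x)
    return lower <= eps or upper <= eps

def false_alarm_rate(weights, n_tags, eps, reps, seed=0):
    """Fraction of species comparisons flagged when both samples of n_tags tags
    come from the SAME abundance profile (so every flag is a false alarm)."""
    rng = random.Random(seed)
    species = range(len(weights))
    flagged = tested = 0
    for _ in range(reps):
        first = Counter(rng.choices(species, weights=weights, k=n_tags))
        second = Counter(rng.choices(species, weights=weights, k=n_tags))
        for s in species:
            x, y = first[s], second[s]
            if x + y == 0:
                continue  # transcript absent from both samples: nothing to compare
            tested += 1
            flagged += is_significant(x, y, eps)
    return flagged / tested

# Hypothetical library: 300 rare, 40 medium and 5 abundant transcript species.
profile = [1.0] * 300 + [10.0] * 40 + [50.0] * 5
print(false_alarm_rate(profile, n_tags=1000, eps=0.025, reps=200))
```

Caching `is_significant` keeps the simulation fast because the same (x, y) pairs recur across repetitions.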
Most of the corresponding genes make biological sense in terms of differentiation along the granulocyte or monocyte pathways. This example serves to discuss a subtle point in the interpretation of the P values computed from Equations 1, 2, and 9. Rigorously, these equations apply to the case where a given gene (e.g., lipocortin) would have been selected for scrutiny before looking at the differences in cognate tag counts between libraries. Suppose instead that two libraries are compared without specifying in advance the transcripts to follow, and attention then turns a posteriori to any transcripts exhibiting significant variations. In that case the average number of expected false positives is N_false = p × N_species, where N_species is the number of different transcript species encountered and p is a given significance level. For instance, in the experiment analyzed in Table 3, N_species is of the order of 600 (Okubo et al. 1995). It is therefore possible that up to four (600 × 7 × 10⁻³ ≈ 4) out of the 21 transcript species listed in Table 3 are not truly differentially expressed. Therefore, when two libraries are compared without prior gene selection, the use of a predetermined significance threshold is not advisable. The P values computed from Equations 1, 2, and 9 should simply be used to rank all observed variations by order of decreasing statistical significance (analogous to how "similarity hits" are listed after database searches). The end-users can then decide how many candidate target genes to retain from the top of the list, bearing in mind the corresponding number of expected false positives. Although the present interpretation of a digital Northern focuses on the genes exhibiting the most spectacular differential expressions, there is already ample evidence that small changes can cause drastic effects. Disease states caused by haploinsufficiency and trisomy suggest that 2→1 or 2→3 proportional changes in expression level may be of biological significance. Table 1 shows that there is no theoretical limit to the detection of such small variations from the comparison of digital expression patterns. Simply, the sampling size has to be increased enough for the required numbers of cDNA tags to reach a significance threshold (for instance 40→60, for a confidence level of 95%).

Table 2. Publicly Available Distributions of Sequence Tags
(Left) Data from Velculescu et al. (1995): frequency of occurrence of each of the 428 transcript species represented in 840 SAGE tags randomly generated from a 3′-directed cDNA library from human pancreas. (Right) Data from Okubo et al. (1992): frequency of occurrence of each of 641 transcript species represented in 982 randomly sequenced clones from a 3′-directed cDNA library from the human liver cell line HepG2.

AUDIC AND CLAVERIE 990

Analog hybridization-based methods (Fodor et al. 1991; Lennon and Lehrach 1991; Gress et al. 1992; Southern et al. 1992; Guo et al. 1994; Matson et al. 1995; Nguyen et al. 1995; Schena et al. 1995; Zhao et al. 1995) are traditionally opposed to digital tag-counting methods (Okubo et al. 1992; Matsubara and Okubo 1994; Lee et al. 1995; Okubo et al. 1995; Velculescu et al. 1995) for the analysis of differential gene expression. Both types of methods are sensitive to the quality of the original messenger RNA preparation and/or cDNA libraries. Analog methods promise higher throughput and lower cost, and have the capacity of studying transcripts on a much wider scale of abundance. They are therefore expected to supersede digital methods. On the downside, however, hybridization signals are not easily reproducible and can be affected by many unknown properties such as the cDNA library complexity, as well as clone- and sequence-specific features (e.g., insert size, nucleotide composition, presence of repeats, secondary structure, triple helix interaction, etc.). Therefore, the hybridization-based methods require an estimation of the dispersion of the signal associated with each clone (i.e., enough repetitions of each experiment), and multiple standardization and calibration procedures to
allow the meaningful comparison of hybridization patterns obtained from various sources (tissues, cell types, etc.) or from different membranes or chips. This is far from routine and has yet to be worked out. In contrast, and thanks to the unique properties of the Poisson distribution, digital methods have the capacity of providing a quantitative assessment of differential expression without the repetition or the standardization of individual tag-counting experiments. The statistical analysis presented here provides an objective method to analyze digital transcript profile data, and adapts it to fit (1) the number of leads one wants to be followed; (2) the fraction of false clues to be tolerated; and (3) the level of modulation in gene expression considered of biological interest. A program is available on our web site (http:// rs-mrs.fr) to compute the confidence intervals corresponding to arbitrary significance levels and sampling sizes N1 and N2.

Figure 1. Rate of false alarm computed according to the confidence intervals listed in Table 1. (Top) Monte-Carlo simulation of the random sampling of 840 tags distributed according to the data from Velculescu et al. (1995; see Table 2). (Bottom) Monte-Carlo simulation of the random sampling of 982 ESTs distributed according to the data from Okubo et al. (1992; see Table 2). The frequency of false alarm was computed for two significance levels (2ε = 5%, left, and 2ε = 1%, right) and plotted as a function of the tag class size (from 1 to 64 for Velculescu et al., from 1 to 22 for Okubo et al.). In all cases, the rate of false alarm increases up to a plateau for larger class sizes. The test (cumulative form of Equation 1) derived from the flat p(μ) prior shows perfect behavior, with a maximal rate of false alarm always less than the significance levels (broken lines). The test (cumulative form of Equation 15) derived from the window p(μ) prior exhibits a slightly higher rate of false alarms. Both versions of the test exhibit conservative behavior for class size < 5, with a false alarm rate even lower than expected. In contrast, Ricker's confidence intervals (Equation 12) are grossly inadequate and lead to false alarm rates up to four times the significance level. Graphs are computed from the analysis of 1000 repetitions of each experiment.

Table 3. List of ESTs Exhibiting Significant (P < 0.01) Differences in Abundance in the HL60 Cell Line Induced by DMSO or TPA
EST ID   HL60   HL60+TPA   HL60+DMSO   Significance
418221013×10⁻⁷
211241024×10⁻⁷
1982328×10⁻⁷
35616203×10⁻⁶
38012106×10⁻⁵
13541206×10⁻⁵
28514811×10⁻⁴
201501102×10⁻⁴
24401143×10⁻⁴
29313613×10⁻⁴
29211015×10⁻⁴
65014525×10⁻⁴
33515339×10⁻⁴
44410412×10⁻³
16740814×10⁻³
1550834×10⁻³
8616107×10⁻³
3056207×10⁻³
18060607×10⁻³
18080607×10⁻³
17660607×10⁻³
Only the probability (computed according to Equations 7 and 8) corresponding to the most significant transition (numbers in bold) is listed (Okubo et al. 1995). The total EST numbers sampled from the HL60, HL60+TPA, and HL60+DMSO cDNA libraries are 845, 845, and 1058, respectively. ESTs 418, 211, 356, 285, 293, 292, 650, 335, 444, 861, and 305 correspond to ribosomal proteins. These, together with EST 380 (a tag to an unknown gene), exhibit a marked reduction of expression level in the DMSO- and/or TPA-induced differentiated states. In contrast, ESTs 135 (ferritin), 2015 (LD78/macrophage inflammatory protein), 1674 (methionine adenosyltransferase), 155 (thymosin β4), 1806 (lipocortin), 1808 (thymosin β10), and 1766 (a metallothionein) appear more abundant in the TPA-induced state. That state is also highly enriched in EST 19 (the ubiquitous elongation factor 1-α). β-Actin (EST 244) is the only markedly increased tag in the DMSO-induced state. EST numbers, abundance data, and protein assignments are from the "body map" public expression data repository at http://www.imcb.osaka-u.ac.jp (K. Okubo and K. Matsubara).

METHODS
Let us denote p(x) the probability to observe x sequence tags of the same gene (i.e., from the 3′ end of the same transcript) when N cDNA clones are picked randomly. For each transcript representing a small (i.e., less than 5%) fraction of the library and N ≥ 1000, p(x) will closely follow the Poisson distribution:

$$p(x) = \frac{e^{-\mu}\,\mu^{x}}{x!} \qquad (3)$$

where μ is the actual (albeit unknown) number of transcripts of this type per N clones in the library. If we duplicate this experiment (i.e., once again randomly pick N clones of the same library and generate sequence tags), we will now observe y occurrences of the same transcript. What is the probability of the various y values? An approximate solution consists in using x as the maximum likelihood estimate for μ and computing the probability for y occurrences given a Poisson distribution of mean μ = x:

$$p(y \mid x) = \frac{e^{-x}\,x^{y}}{y!} \qquad (4)$$

Equation 4 is not symmetrical in x and y. This is an obvious flaw, as the probability should not depend on which of the x or y values was observed first: p(y|x) = p(x|y) should hold provided that an equal number N of clones is sampled in both experiments. Equation 4 is not the correct formula because we have not yet taken into account the fluctuation of x around the unknown mean. To account for the fact that the actual value of μ is unknown, we have to integrate Equation 4 over all possible μ values:

$$p(y \mid x) = \int_0^\infty d\mu\; p(\mu \mid x)\, p(y \mid \mu) \qquad (5)$$

p(μ|x) in Equation 5 is the probability that the actual abundance of a given transcript is μ given that x occurrences of a cognate tag have been observed in one experiment. The second term in the integral is the probability of drawing y occurrences given a Poisson distribution of mean μ:

$$p(y \mid \mu) = \frac{e^{-\mu}\,\mu^{y}}{y!} \qquad (6)$$

Using Bayes' theorem, p(μ|x) can be written as

$$p(\mu \mid x) = \frac{p(x \mid \mu)\, p(\mu)}{\int_0^\infty d\mu'\; p(x \mid \mu')\, p(\mu')} \qquad (7)$$

To evaluate Equation 7, we need to define the prior distribution p(μ). The least constrained hypothesis (i.e., the one with the least information content) is to attribute an equal a priori probability to all μ values in the [0, ∞] range. Incorporating such a flat prior in Equation 5 leads to

$$p(y \mid x) = \frac{1}{x!\,y!} \int_0^\infty d\mu\; e^{-2\mu}\,\mu^{(x+y)} \qquad (8)$$

From the definition of the Γ function for
integer arguments, we observe that

$$\int_0^\infty d\mu\; e^{-2\mu}\,\mu^{(x+y)} = \frac{(x+y)!}{2^{(x+y+1)}}$$

and finally obtain the expression given in Results:

$$p(y \mid x) = \frac{(x+y)!}{x!\,y!\,2^{(x+y+1)}} \qquad (1)$$

This equation can be used in a wide variety of experimental situations. Equation 1 defines the probability of observing x and y occurrences of the same rare event in duplicated experiments, regardless of the detailed probability distribution of those events among the set of possible outcomes. In particular, in the context of transcription profiles, p(y|x) can be evaluated regardless of the distribution of each transcript (provided it is rare) within a cDNA library. To compute the confidence intervals listed in Table 1, we made use of the cumulative distributions:

$$C(y \le y_{\min} \mid x) = \sum_{y=0}^{y_{\min}} p(y \mid x) \qquad (9a)$$

$$D(y \ge y_{\max} \mid x) = \sum_{y=y_{\max}}^{\infty} p(y \mid x) \qquad (9b)$$

These equations allow the computation of an interval [y_min, y_max]_ε such that C(y ≤ y_min|x) ≤ ε and D(y ≥ y_max|x) ≤ ε. Given that an event is observed x times in one experiment, the number y of occurrences of this event in a duplicate experiment is expected to fall within the interval [y_min, y_max]_ε with a probability of 1 − 2ε. Equations 9a and 9b can therefore serve as a significance test when comparing, for instance, the results of sampling N clones from two different libraries. For 2ε small (e.g., 5% or less), y values falling outside the [y_min, y_max]_ε interval correspond to p(y|x) ≪ 1 and point out significant differences between the two experiments.
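Table 1-style boundaries can be derived from Equations 9a and 9b with a short sketch. It evaluates Equation 2, which reduces to Equation 1 when N1 = N2 (the defaults below). It returns the "first significantly different values" just outside [y_min, y_max]_ε, with `None` playing the role of the star in Table 1. The names are ours. The exact inequality conventions behind the published table are an assumption.

```python
import math
from typing import Optional

def p_tag(y: int, x: int, n1: int = 1, n2: int = 1) -> float:
    """Equation 2 in log space; with n1 == n2 it reduces to Equation 1."""
    r = n2 / n1
    return math.exp(y * math.log(r) + math.lgamma(x + y + 1) - math.lgamma(x + 1)
                    - math.lgamma(y + 1) - (x + y + 1) * math.log1p(r))

def lower_boundary(x: int, eps: float, n1: int = 1, n2: int = 1) -> Optional[int]:
    """Largest y with C(y' <= y | x) <= eps (Equation 9a), or None if no y qualifies."""
    c, y, best = 0.0, 0, None
    while True:
        c += p_tag(y, x, n1, n2)  # c is now C(y' <= y | x)
        if c > eps:
            return best
        best, y = y, y + 1

def upper_boundary(x: int, eps: float, n1: int = 1, n2: int = 1) -> int:
    """Smallest y with D(y' >= y | x) <= eps (Equation 9b)."""
    d, y = 1.0, 0  # D(y' >= 0 | x) = 1
    while d > eps:
        d -= p_tag(y, x, n1, n2)  # D(y' >= y+1) = D(y' >= y) - p(y|x)
        y += 1
    return y

for x in (0, 2, 7):
    # 99% level (2*eps = 0.01), equal sampling sizes.
    print(x, lower_boundary(x, 0.005), upper_boundary(x, 0.005))
# Unequal sampling sizes, e.g. 1058 clones in one library and 845 in the other.
print(upper_boundary(0, 0.005, n1=1058, n2=845))
```

For x = 0 and unequal sizes, the upper tail has the closed form (r/(1+r))^y with r = N2/N1, which is a convenient check on the implementation.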
They should include differentially expressed genes, that is, genes for which μ differs between the two libraries.

Generalization to Different Sampling Sizes
When different numbers of clones N1 and N2 are sequenced from the same library, Equation 5 becomes

$$p(y \mid x) = \int_0^\infty d\mu_2 \int_0^\infty d\mu_1\; p(\mu_1 \mid x)\, p(y \mid \mu_2)\, \delta\!\left(\mu_2 - \frac{N_2}{N_1}\mu_1\right) \qquad (10)$$

where the two abundance values μ1 and μ2 are forced into the same ratio as N1 and N2. Using the same Bayesian argument as before (Equation 7) leads to

$$p(y \mid x) = \frac{1}{x!\,y!}\left(\frac{N_2}{N_1}\right)^{y} \int_0^\infty d\mu_1\; e^{-\mu_1\left(1+\frac{N_2}{N_1}\right)}\,\mu_1^{(x+y)} \qquad (11)$$

The last integral is simply

$$\frac{(x+y)!}{\left(1+\frac{N_2}{N_1}\right)^{(x+y+1)}}$$

leading to the formula presented in the Results section:

$$p(y \mid x) = \left(\frac{N_2}{N_1}\right)^{y} \frac{(x+y)!}{x!\,y!\,\left(1+\frac{N_2}{N_1}\right)^{(x+y+1)}} \qquad (2)$$

Ricker's Confidence Interval
The confidence interval computed from Equation 1 (and its cumulative form, Equations 9a and 9b) is different from the one introduced previously by Ricker (1937), although at first the two may appear to be related. Given x occurrences of a sequence tag, Ricker's formula defines a confidence interval [μ_min, μ_max]_x for μ (again, the actual number of transcripts of this type per N clones in the library) such that

$$p(k \le x) = \sum_{k=0}^{x} \frac{e^{-\mu_{\max}}\,\mu_{\max}^{k}}{k!} \le \frac{\alpha}{2} \qquad (12a)$$

and

$$p(k \ge x) = \sum_{k=x}^{\infty} \frac{e^{-\mu_{\min}}\,\mu_{\min}^{k}}{k!} \le \frac{\alpha}{2} \qquad (12b)$$

where α is typically 5% or 1%. Ricker's confidence intervals for various values of x are given in Table 1. Those intervals are close to those computed from Equation 1, but they delineate the range of likely μ values, not y (the number of occurrences of the same event in a duplicated experiment). It is possible for x and y to fall outside each other's Ricker confidence interval [μ_min, μ_max] while still being nonsignificant fluctuations around the same μ value. The confidence intervals computed from Equations 12a and 12b are therefore too narrow to properly define significant discrepancies between x and y, and the false alarm rate associated with their use is too high (Fig. 1). However, an interesting use of Equations 12a and 12b is the estimation of the range of possible frequencies [μ_min, μ_max]_{x=0} for cDNAs not yet encountered after picking N clones. For example, the 95% confidence interval is given
by:

$$0 < \mu N < 3.7 \qquad (13)$$

That is, the abundance of a cDNA not picked up among
A comparative study of vertebrate adaptive evolution based on gene family size
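Hereditas (Beijing), February 2019, 41(2): 158–174
Received: 2018-08-06; Revised: 2018-12-13
About the author: Meng Yu, master's student; field: genetics.
E-mail: m1994yu@
Corresponding author: Yang Ruolin, professor and doctoral supervisor; research interests: evolutionary genetics and bioinformatics.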
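E-mail: desert.ruolin@
DOI: 10.16288/j.yczz.18-225
Published online: 2019/1/14 13:15:21
URI: /kcms/detail/11.1913.R.20190114.1315.004.html

Research Report
Meng Yu, Yang Ruolin
College of Life Sciences, Northwest A&F University, Yangling 712100

Abstract: The copy numbers of homologous gene families commonly differ among species, and these differences arise from differing rates of gene gain and loss. Gene copy number variation is widely recognized as a possible cause of phenotypic innovation in particular species. In this study, we selected 64 species that represent the major vertebrate groups and span about 600 million years of evolution, identified their homologous gene families, and characterized the evolutionary patterns of gene family size in vertebrates. Of the 6857 gene families inferred to have been present in the most recent common ancestor of vertebrates, 6712 changed in size in at least one lineage, and gene families contracted in most lineages. Hoffmann's two-toed sloth (Choloepus hoffmanni) showed the highest level of gene family contraction, whereas zebrafish (Danio rerio) showed the opposite pattern. Given the highly dynamic evolution of vertebrate gene family size, we used changes in gene family size to identify several genomic signals that may be associated with the evolution of particular vertebrate groups. We observed a high proportion of gene family expansions in the genome of the most recent common ancestor of extant teleosts, possibly resulting from whole-genome duplication, followed by gene contraction events in the descendant species. In addition, we found evidence that bony-fish-specific orphan genes may have contributed to the adaptive evolution of these fishes in aquatic environments; for example, in some bony fishes, orphan genes are associated with the development of fins, tails, kidneys, and other structures.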
Term Definitions: 5 Questions
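Five term definitions (genome (homologs…); gene mutation; protein degradation; expression regulation)

Genome (Chapter 2)
Genome: the totality of the genetic material in an organism's cells, including all the DNA that makes up the genes and the regions between genes.
C value: the total amount of DNA in a genome is called the C value.
Families of genes (gene families): a group of genes within the same species that are similar in structure and function and closely related in evolutionary origin.
Multigene family: a group of genes derived from an ancestral gene through duplication and subsequent variation.
Pseudogene: in a multigene family, some genes do not produce a functional gene product; these are called pseudogenes (origin: mutation, or reverse transcription of an RNA followed by reinsertion into the genome).
Classical multigene family: the members have identical or nearly identical sequences and are thought to derive from duplication of an ancestral gene (e.g., rRNA genes).
"Complex" multigene family: the members have similar sequences, but their encoded products differ in their properties.
Orthologs: genes in two separate species that derive from the same ancestral gene in the last common ancestor of those two species.
Paralogs: related genes that have resulted from a gene duplication event within a single genome; they are likely to have diverged in function.
Homologs: genes that are related by descent in either way are called homologs, a general term used to cover both types of relationship.
Gene superfamily: sometimes it is possible to see relationships not only within a single gene family but also between different families (e.g., the α- and β-globin families).
Operon (specific to prokaryotes): a group of genes located adjacent to one another in the genome, with perhaps just one or two nucleotides between the end of one gene and the start of the next. All the genes in an operon are expressed as a single unit.

Protein degradation (Chapter 4)
Molten globule: a state that contains most of the secondary structure elements and whose structure is already close to the final structure of the protein.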
Evolution of p53 pathway-related genes provides insights into anticancer mechanisms of natural longevity in cetaceans

DEAR EDITOR,

Despite the generally increased cancer risk in large, long-lived organisms, cetaceans, among the largest and longest-living mammals, appear to possess a counteracting mechanism. Nevertheless, the genetic basis underlying this mechanism remains poorly understood. The p53 pathway serves as an ideal target for studying the mechanisms behind cancer resistance, as most cancer types have evolved strategies to circumvent its suppressive functions. Here, comparative genetic analysis of 73 genes involved in the p53 pathway in cetaceans (Supplementary Table S1) was undertaken to explore the potential anticancer mechanisms behind natural longevity. Results showed that long-lived species contained three positively selected genes (APAF1, CASP8, and TP73) and three duplicated genes (IGFBP3, PERP, and CASP3) related to apoptosis regulation. Additionally, the evolutionary rates of three genes associated with angiogenesis (SERPINE1, CD82, and TSC2) showed a significant relationship with longevity quotient (LQ) and maximum lifespan (MLS), suggesting angiogenesis inhibition as another potential strategy protecting cetaceans from cancer. Interestingly, several positively selected tumor suppressor genes with high copy numbers were correlated with body size in the large-bodied and long-lived cetacean lineages, corroborating Peto's paradox, which posits no link between cancer incidence and body size or longevity across species. In conclusion, we identified several candidate genes that may confer cancer resistance in cetaceans, providing a new avenue for further research into the mechanisms of lifespan extension.

Mammalian lifespans and body masses (BM) exhibit considerable variability, with the shortest- and longest-living mammals differing by more than 100-fold and the smallest and largest mammals differing by more than 100-million-fold (Tacutu et al., 2018).
The largest extant mammal, the blue whale (Balaenoptera musculus), has an average adult weight of 136 000 kg and an MLS of 110 years, while the longest-lived mammal, the bowhead whale (Balaena mysticetus), has an average weight of over 100 000 kg and an MLS of 211 years. Typically, large and long-lived organisms face elevated cancer risk due to increased cell divisions, which increase the likelihood of DNA damage and potential cellular transformation to malignancy. Nonetheless, large whales, possessing approximately 1 000 times more cells than humans, demonstrate an unexpectedly low cancer risk despite their extended lifespans (Nagy et al., 2007). These observations align with Peto's paradox, suggesting that these cetaceans have evolved an effective mechanism for suppressing cancer (Peto et al., 1975). The p53 pathway is vital for tumor suppression and lifespan extension. Its central component, the transcription factor p53, prevents tumor formation and development by selectively regulating target genes to induce cell cycle arrest, promote cell apoptosis or senescence, and accelerate DNA repair (Cha & Yim, 2013). Thus, studying the p53 pathway is a promising approach for uncovering the mechanisms by which large and long-lived species inhibit cancer.

Based on MLS and BM records of non-flying eutherian mammals from the AnAge online dataset (Supplementary Table S2), a new allometric equation was derived:

The LQ of all cetacean species was calculated using the allometric equation:

Species with an LQ or MLS more than 0.5 standard deviations (SD) above the mean of 65 cetaceans were classified as long-lived. This cut-off was chosen after testing sequential thresholds from 0 to 1.0 SD from the cetacean mean; species designated as long-lived consistently fell within the 0.4 to 0.8 SD range (Supplementary Figure S1). The mean LQ value for the 65 cetaceans was 0.97, with a 0.5 SD of 0.17 (LQ = 0.97 ± 0.17; Supplementary Table S3).
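The source does not reproduce the fitted allometric equation, so the sketch below only illustrates the usual construction of a longevity quotient, under stated assumptions. It fits log10(MLS) on log10(BM) by ordinary least squares and defines LQ as observed MLS divided by the MLS predicted from body mass. It then applies the mean ± 0.5 SD cut-offs described above. The reference data are hypothetical, not the authors' AnAge dataset. The classification calls reuse the reported summaries (LQ 0.97 ± 0.17; MLS 47.90 ± 15.67 years, where ± is 0.5 SD).

```python
import math
import statistics

def fit_allometry(body_mass, max_lifespan):
    """Ordinary least squares fit of log10(MLS) = a + b * log10(BM); returns (a, b)."""
    lx = [math.log10(m) for m in body_mass]
    ly = [math.log10(t) for t in max_lifespan]
    mx, my = statistics.fmean(lx), statistics.fmean(ly)
    b = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    return my - b * mx, b

def longevity_quotient(mass, lifespan, a, b):
    """LQ = observed MLS / MLS predicted from body mass by the fitted allometry."""
    return lifespan / (10 ** (a + b * math.log10(mass)))

def classify(value, mean, half_sd):
    """Long-lived above mean + 0.5 SD, short-lived below mean - 0.5 SD, else control."""
    if value > mean + half_sd:
        return "long-lived"
    if value < mean - half_sd:
        return "short-lived"
    return "control"

# Hypothetical reference mammals (body mass in g, MLS in years), for illustration only.
masses = [20.0, 500.0, 8000.0, 60000.0, 4.0e6]
lifespans = [4.0, 9.0, 20.0, 30.0, 70.0]
a, b = fit_allometry(masses, lifespans)
print(round(longevity_quotient(60000.0, 30.0, a, b), 2))
print(classify(1.20, mean=0.97, half_sd=0.17))    # LQ criterion: 1.20 > 1.14
print(classify(30.0, mean=47.90, half_sd=15.67))  # MLS criterion: 30 < 32.23 years
```

The thresholds 1.14/0.80 (LQ) and 63.57/32.23 years (MLS) quoted in the text are exactly mean ± 0.5 SD of the reported summaries.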
Six cetacean species, namely bowhead whales, long-finned pilot whales (Globicephala melas), Pacific white-sided dolphins (Lagenorhynchus obliquidens), killer whales (Orcinus orca), bottlenose dolphins (Tursiops truncatus), and Indo-Pacific bottlenose dolphins (T. aduncus), were classified as long-lived (LQ > 1.14). Five species were classified as short-lived (LQ < 0.80), with the remaining intermediate species (0.80 < LQ < 1.14) serving as controls (Figure 1A). In terms of MLS, the mean value across all cetaceans was 47.90 ± 15.67 years (mean ± 0.5 SD). Five long-lived species, namely blue whales, bowhead whales, humpback whales (Megaptera novaeangliae), killer whales, and sperm whales (Physeter catodon), were identified with an MLS > 63.57 years, and four short-lived species with an MLS < 32.23 years. Ancestral reconstructions based on MLS and LQ were performed to classify long-lived lineages at ancestral nodes (Supplementary Figure S2).

Received: 09 April 2023; Accepted: 11 September 2023; Online: 11 September 2023
Foundation items: This work was supported by the National Key Program of Research and Development, Ministry of Science and Technology of China (2022YFF1301600), National Natural Science Foundation of China (32070409, 32270453 to S.X.X.), Priority Academic Program Development of Jiangsu Higher Education Institutions to G.Y. and S.X.X., and Qing Lan Project of Jiangsu Province to S.X.X.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright ©2023 Editorial Office of Zoological Research, Kunming Institute of Zoology, Chinese Academy of Sciences
Liu et al. Zool. Res. 2023, 44(5): 947–949
https:///10.24272/j.issn.2095-8137.2023.058
Subsequent analyses were then performed on the cetacean species identified as long-lived under each of the two criteria (LQ > 1.14, MLS > 63.57). In this study, 23 genes exhibited copy number gains in at least one cetacean lineage (Figure 1B; Supplementary Tables S4, S5). Four of these genes had two copies unique to the long-lived cetacean lineages (LQ > 1.14), whereas only one copy was identified in the other cetacean lineages. They were BCL2L1 in long-finned pilot whales, IGFBP3 and STEAP3 in Indo-Pacific bottlenose dolphins, and PERP in bottlenose dolphins. In addition, the long-lived sperm whale contained three copies of CCNB2 and two copies of MDM4, while only one copy of each was found in the other cetacean lineages. Results also showed CASP3 and CCNG1 duplication in all large, long-lived cetacean species (MLS > 63.57), except for the common minke whale (Balaenoptera acutorostrata). Ancestral state reconstructions estimated that the last common ancestor of baleen whales lived to over 110 years of age (Supplementary Figure S2). This suggests that the increased copy numbers of both genes in common minke whales were consistent with those of the other baleen whales classified as long-lived based on MLS. In contrast, only one RCHY1 copy was identified in the five large, long-lived species, while two copies were found in the other cetacean species. Expanded analyses of 17 non-cetacean mammals indicated higher CASP3 and PERP copy numbers in notable cancer-resistant species, including primates and naked mole-rats (Gorbunova et al., 2014). Neither genome assembly length nor scaffold N50 influenced the estimated gene copy number (Supplementary Figure S3). A total of 46 "one-to-one" orthologous genes were identified among the 73 genes involved in the p53 pathway (Supplementary Table S6).
To detect signatures of episodic selection in genes of long-lived cetaceans, four methods were used: the free-ratio and branch-site models in PAML v4.9, and aBSREL and BUSTED in Datamonkey v2.0. Likelihood ratio tests (LRTs) indicated that the free-ratio model, which assumes an independent ω for each branch, fit the data significantly better than the one-ratio model for APAF1, CASP8, and AIFM2 (P<0.05; Figure 1A; Supplementary Table S7). An ω value greater than 1 was observed only in specific branches of APAF1 and CASP8. For APAF1, these were the last common ancestor (LCA) of the humpback whale and the terminal branch of the Indo-Pacific bottlenose dolphin. For CASP8, they were the LCA of delphinids and the branch leading to the Pacific white-sided dolphin. The branch-site model in CODEML, which is designed to detect strong positive selection at a limited number of sites against a background of predominant purifying selection, yielded consistent results. Positive selection was detected in two long-lived branches: the sperm whale for TP73 and the long-finned pilot whale for AIFM2 (Figure 1A; Supplementary Table S7). Using the Bayes empirical Bayes (BEB) approach, two positively selected sites were identified in these genes (TP73: 506; AIFM2: 459), and both exhibited radical changes in at least one property. The alternative branch-site method aBSREL in Datamonkey v2.0 appeared markedly more sensitive in detecting episodic selection than the branch-site model in PAML v4.9. After correction for multiple testing, TP73 showed positive selection in the large, long-lived humpback whale (Supplementary Table S8). BUSTED, which may be particularly effective at testing for selection restricted to foreground branches, further revealed positive selection in TP73 in the long-lived cetaceans (P<0.05; Supplementary Table S9). Across these four methods, three positively selected genes (APAF1, CASP8, and TP73) were unique to the long-lived cetacean species. All three of these genes, together with most genes showing copy number gains (IGFBP3, PERP, and CASP3), play roles in regulating apoptosis in the long-lived cetacean species (Supplementary Figure S4). This suggests that these long-lived lineages may have evolved apoptosis-related mechanisms that help prevent cancer.
Figure 1 Evidence of p53 pathway-related gene evolution in cetaceans
A: Long-lived species identified based on LQ and MLS are marked in green and red, respectively. Short-lived species and the control group are marked in purple and gray, respectively. Significant positive selection identified by the free-ratio model, branch-site model, and aBSREL is indicated by a circle, pentagram, and triangle, respectively. Photo credit: NOAA Fisheries.
B: Copy number variation in p53 pathway-related genes in cetaceans. Colors correspond to copy number, with red indicating higher copy number. Red bars represent long-lived species.
C: Regression analyses between root-to-tip ω and three longevity traits (MLS, BM, and LQ).
Phylogenetic generalized least squares (PGLS) regression was performed between the evolutionary rate of each orthologous gene (represented by root-to-tip ω) and three lifespan-associated traits (MLS, BM, and LQ). The evolutionary rates of two angiogenesis-inhibiting genes, SERPINE1 (R²=0.343, P=0.010) and CD82 (R²=0.392, P=0.004), were significantly correlated with LQ (Figure 1C). SERPINE1, which encodes endothelial plasminogen activator inhibitor-1 (PAI1), plays an important role in inhibiting vascular endothelial growth factor (VEGF)-induced angiogenesis in mice (Wu et al., 2015). The evolutionary rate of the tumor suppressor gene TSC2 was significantly positively associated with MLS in cetaceans (R²=0.364, P=0.008).
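Each LRT above compares twice the log-likelihood difference between nested models (2ΔlnL) against a χ² distribution. The following is a minimal, dependency-free sketch of that step. The log-likelihoods in the example are placeholders, not the study's values. The degrees of freedom are an input you must supply: they equal the difference in the number of free parameters, which for the free-ratio versus one-ratio comparison depends on the number of branches in the tree.

```python
import math
from typing import Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail erfc(sqrt(x/2)) plus a finite series.
    tail = math.erfc(math.sqrt(half))
    if df == 1:
        return tail
    term = math.sqrt(x)  # r = 1 term: x^(1/2) / 1
    series = term
    for r in range(2, (df - 1) // 2 + 1):
        term *= x / (2 * r - 1)  # x^((2r-1)/2) / (1*3*...*(2r-1))
        series += term
    return tail + math.sqrt(2.0 / math.pi) * math.exp(-half) * series


def likelihood_ratio_test(lnl_null: float, lnl_alt: float,
                          df: int) -> Tuple[float, float]:
    """Return (2*delta lnL, p-value) for two nested models."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, df)


# Placeholder log-likelihoods (not values from the study).
stat, p = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-995.0, df=1)
print(f"2dlnL = {stat:.2f}, P = {p:.4g}")
```

Codeml reports lnL for each model, so in practice only these two numbers and the parameter difference are needed.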
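PGLS is ordinary regression that accounts for the expected covariance among species due to shared ancestry. The sketch below assumes Brownian motion, under which the covariance between two species equals the branch length they share from the root. It estimates only the intercept and slope; it does not compute R² or optimize λ, and it does not parse trees. The four-species tree and the trait and ω values are hypothetical, and this is not the software used in the study.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def pgls(x: List[float], y: List[float], v: Matrix) -> Tuple[float, float]:
    """GLS estimate of (intercept, slope) for y = b0 + b1*x with residual covariance v."""
    ones = [1.0] * len(x)
    # Apply V^-1 to each column by solving V z = column.
    vi_1, vi_x, vi_y = solve(v, ones), solve(v, x), solve(v, y)
    a11 = sum(vi_1)                              # 1' V^-1 1
    a12 = sum(vi_x)                              # 1' V^-1 x (= x' V^-1 1, V symmetric)
    a22 = sum(xi * w for xi, w in zip(x, vi_x))  # x' V^-1 x
    r1 = sum(vi_y)                               # 1' V^-1 y
    r2 = sum(xi * w for xi, w in zip(x, vi_y))   # x' V^-1 y
    b0_hat, b1_hat = solve([[a11, a12], [a12, a22]], [r1, r2])
    return b0_hat, b1_hat


# Hypothetical tree ((A:1,B:1):1,(C:1,D:1):1); under Brownian motion,
# V[i][j] = branch length shared by species i and j from the root.
V = [[2.0, 1.0, 0.0, 0.0],
     [1.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 2.0, 1.0],
     [0.0, 0.0, 1.0, 2.0]]
trait = [10.0, 12.0, 30.0, 35.0]   # hypothetical trait values (e.g. MLS)
omega = [0.10, 0.12, 0.20, 0.26]   # hypothetical root-to-tip omega
print(pgls(trait, omega, V))
```

With an identity matrix for V, the estimate reduces to ordinary least squares. This is a quick way to check the implementation.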
TSC2 is implicated in the regulation of angiogenesis. Loss of this gene raises levels of hypoxia-inducible factor 1α (HIF-1α) and VEGF and activates the HIF-1α/VEGF pathway, which mediates hypoxia-induced angiogenesis (Brugarolas et al., 2003). In addition, TSC2 negatively regulates mTOR signaling, and genetic inhibition of TOR activity produces a two-fold extension in lifespan in Caenorhabditis elegans (Brugarolas et al., 2003; Vellai et al., 2003). Furthermore, the evolutionary rate of TP73 was positively correlated with BM (R²=0.226, P=0.031). Angiogenesis plays a crucial role in cancer development: it delivers oxygen, nutrients, and growth factors to tumors and promotes metastasis to distant organs (Al-Ostoot et al., 2021). Inhibition of angiogenesis may therefore represent another anticancer mechanism in cetaceans.
Peto’s paradox refers to the observation that large-bodied, long-lived species do not have a greater lifetime risk of cancer than small, short-lived species (Peto et al., 1975). For instance, although African elephants have about 100 times as many cells as humans, they do not show a higher incidence of cancer (Abegglen et al., 2015). A similar pattern has been reported in large-bodied, long-lived cetaceans (Nagy et al., 2007). Our findings provide strong evidence in support of Peto’s paradox: a series of positively selected genes and tumor suppressor genes with copy number variations were identified in the large, long-lived species. Moreover, the evolutionary rate of the tumor suppressor gene TP73 showed a significant positive relationship with body size. This suggests that cetaceans evolved mechanisms to counteract the risk of cancer caused by the accumulation of cellular mutations.
However, further laboratory experiments are needed to verify this.
SUPPLEMENTARY DATA
Supplementary data to this article can be found online.
COMPETING INTERESTS
The authors declare that they have no competing interests.
AUTHORS’ CONTRIBUTIONS
S.X.X. designed the study. X.L. was responsible for the data collection and analysis. S.X.X. and X.L. drafted the manuscript. S.X.X. revised the manuscript. F.Y., Y.L., X.H., and L.X.S. participated in the data collection. Z.P.Y., W.H.R., and G.Y. helped edit the manuscript. All authors read and approved the final version of the manuscript.
ACKNOWLEDGMENTS
We thank members of the Jiangsu Key Laboratory for Biodiversity and Biotechnology, Nanjing Normal University, for their contributions to this paper. We thank Mr. Tian-Zhen Wu and Mr. Xu Zhou for their helpful suggestions. We are particularly grateful to Dr. Ran Tian and Dr. Wei-Jian Guo for their technical support.
Xing Liu¹, Fei Yang¹, Yi Li¹, Zhen-Peng Yu¹, Xin Huang¹, Lin-Xia Sun¹, Wen-Hua Ren¹, Guang Yang¹, Shi-Xia Xu¹,*
¹ Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing, Jiangsu 210023, China
* Corresponding author, E-mail: *****************.cn
REFERENCES
Abegglen LM, Caulin AF, Chan A, et al. 2015. Potential mechanisms for cancer resistance in elephants and comparative cellular response to DNA damage in humans. JAMA, 314(17): 1850−1860.
Al-Ostoot FH, Salah S, Khamees HA, et al. 2021. Tumor angiogenesis: Current challenges and therapeutic opportunities. Cancer Treatment and Research Communications, 28: 100422.
Brugarolas JB, Vazquez F, Reddy A, et al. 2003. TSC2 regulates VEGF through mTOR-dependent and -independent pathways. Cancer Cell, 4(2): 147−158.
Cha HJ, Yim H. 2013. The accumulation of DNA repair defects is the molecular origin of carcinogenesis. Tumor Biology, 34(6): 3293−3302.
Gorbunova V, Seluanov A, Zhang ZD, et al. 2014. Comparative genetics of longevity and cancer: insights from long-lived rodents. Nature Reviews Genetics, 15(8): 531−540.
Nagy JD, Victor EM, Cropper JH. 2007. Why don't all whales have cancer? A novel hypothesis resolving Peto's paradox. Integrative and Comparative Biology, 47(2): 317−328.
Peto R, Roe FJ, Lee PN, et al. 1975. Cancer and ageing in mice and men. British Journal of Cancer, 32(4): 411−426.
Tacutu R, Thornton D, Johnson E, et al. 2018. Human ageing genomic resources: new and updated databases. Nucleic Acids Research, 46(D1): D1083−D1090.
Vellai T, Takacs-Vellai K, Zhang Y, et al. 2003. Influence of TOR kinase on lifespan in C. elegans. Nature, 426(6967): 620.
Wu JB, Strawn TL, Luo M, et al. 2015. Plasminogen activator inhibitor-1 inhibits angiogenic signaling by uncoupling vascular endothelial growth factor receptor-2-αVβ3 integrin cross talk. Arteriosclerosis, Thrombosis, and Vascular Biology, 35(1): 111−120.
Zoological Research 44(5): 947−949, 2023
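Glossary of Genetics Terms (Chinese-English Edition)
abortive transduction 流产转导: The transduced DNA fragment fails to integrate into the recipient chromosome and is lost in the progeny.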
acentric chromosome 无着丝粒染色体: A chromosome or chromosome fragment that lacks a centromere.
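Achondroplasia 软骨发育不全: A human autosomal dominant disorder whose phenotype includes short, thick limbs, a saddle nose, and lumbar lordosis.
acrocentric chromosome 近端着丝粒染色体: A chromosome whose centromere lies near one end.
active site 活性位点: The region (domain) of a protein structure that carries out its biological activity.
adaptation 适应: An evolutionary change in the heritable traits of organisms that enables them to survive and reproduce better in a given environment.
adenine 腺嘌呤: The base that pairs with thymine in DNA.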
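albino 白化体: An autosomal recessive mutant. The skin and hair of affected animals or humans are white, mainly because of a mutation in the gene encoding tyrosinase, an enzyme required for melanin synthesis.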
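allele 等位基因: One of the alternative forms of a gene at a given locus.
allelic frequencies (gene frequencies) 等位基因频率: The frequency of an allele at a given locus among all individuals in a population.
allelic exclusion 等位排斥: In a heterozygous immunoglobulin locus, only one allele is rearranged and expressed; the other allele is not rearranged and remains inactive.
allopolyploid 异源多倍体: A polyploid organism in which one or more chromosome sets come from a different species.
Ames test 埃姆斯测验法: A method developed by Bruce Ames in the 1970s that uses Salmonella typhimurium together with a rat liver microsome preparation to test whether a substance is mutagenic.
amino acids 氨基酸: The basic building blocks of proteins; 20 standard amino acids are used to build proteins.
aminoacyl-tRNA 氨基酰-tRNA: A tRNA carrying its corresponding amino acid on its amino acid (acceptor) arm; it delivers the amino acid to the ribosome for protein synthesis.
aminoacyl-tRNA synthetase 氨基酰-tRNA合成酶: An enzyme that catalyzes the attachment of a specific amino acid to its corresponding tRNA molecule.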
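The allele-frequency entry amounts to counting allele copies in the population. The following is a minimal sketch for a diploid, biallelic locus. The genotype counts are hypothetical.

```python
from typing import Tuple


def allele_frequencies(n_AA: int, n_Aa: int, n_aa: int) -> Tuple[float, float]:
    """Return (p, q), the frequencies of alleles A and a, from diploid genotype counts."""
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no individuals")
    # Each AA individual carries two A copies and each Aa one; 2n copies in total.
    p = (2 * n_AA + n_Aa) / (2 * n)
    return p, 1.0 - p


p, q = allele_frequencies(30, 50, 20)  # hypothetical genotype counts
print(round(p, 2), round(q, 2))  # 0.55 0.45
```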
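Glossary of Genetics Terms (2)
Term definitions for Genetics (《遗传学》)
Mitosis 有丝分裂: The form of cell division in eukaryotic cells in which the daughter cells keep the same chromosome number as the parent cell.
Meiosis 减数分裂: The division by which sexually reproducing organisms form germ cells; the DNA replicates once and the cell divides twice, so the chromosome number of the daughter cells is halved.
Chromosome 染色体: A complex structure of characteristic number and shape formed by condensation of chromatin during cell division.
Synapsis 联会: The pairing of homologous chromosomes, which begins at the zygotene stage of meiosis.
Homologous chromosomes 同源染色体: A pair of chromosomes similar in shape, structure, and function, one from the paternal parent and one from the maternal parent.
Chromatids 染色单体: The copies of a chromosome produced by replication.
Bivalent 二价体: A pair of synapsed homologous chromosomes.
Univalent 单价体: A chromosome that fails to synapse although it would normally pair.
Wild-type gene 野生型基因: The allele that predominates at a locus in natural populations.
Alleles 等位基因: Genes located at the same position on homologous chromosomes that control the same trait.
Principle of linkage 连锁遗传定律: Two or more traits present in a parent tend to be inherited together.
Locus 基因座: The position of a particular gene on a chromosome.
Heterozygous 杂合子: Describes an individual whose two alleles at a locus are different.
Homozygous 纯合子: Describes an individual whose two alleles at a locus are identical.
Phenotype 表现型: The physical expression of a particular trait in an individual.
Diploid (2n) 二倍体: An individual or cell whose nuclear chromosomes, judged by centromere position and chromosome length, all occur in pairs.
Haploid 单倍体: An individual or cell with the gametic chromosome number (n).
Chromosome theory of heredity 遗传的染色体理论: Genes are located on chromosomes in a linear order, and the distance between genes is reflected in how often they are inherited together in the progeny.
Polyploid 多倍体: An individual whose somatic cells contain three or more chromosome sets.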
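The chromosome-theory entry ties gene distance to how often genes are inherited together. In practice, distance is estimated from the recombination frequency among progeny. The sketch below converts a hypothetical recombinant count into map units in two ways: with the simple rule that 1% recombination equals 1 cM, and with Haldane's mapping function, which assumes no crossover interference. The glossary entry itself does not specify a mapping function.

```python
import math


def recombination_frequency(recombinants: int, total: int) -> float:
    """Fraction of progeny that are recombinant."""
    return recombinants / total


def map_distance_simple(rf: float) -> float:
    """1% recombination = 1 cM; adequate only for short intervals."""
    return 100.0 * rf


def map_distance_haldane(rf: float) -> float:
    """Haldane mapping function (no interference), in cM: d = -50 ln(1 - 2r)."""
    if not 0.0 <= rf < 0.5:
        raise ValueError("rf must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * rf)


rf = recombination_frequency(20, 200)  # hypothetical testcross counts
print(map_distance_simple(rf), round(map_distance_haldane(rf), 2))
```

The two estimates diverge as recombination frequency grows, because undetected double crossovers make the raw frequency underestimate distance.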
Medical Genetics Glossary
Acceptor splice site (剪接受体位点): The boundary between the 3′ end of an intron and the 5′ end of the following exon. Also called the 3′ splice site.
Acrocentric (近端着丝粒染色体): A type of chromosome with the centromere near one end. The human acrocentric chromosomes (13, 14, 15, 21, and 22) have satellited short arms that carry genes for ribosomal RNA.
Adverse selection (逆向选择): A term used in the insurance industry to describe the situation in which individuals with private knowledge of having an increased risk for illness, disability, or death buy disproportionately more coverage than those at lower risk. As a result, insurance premiums, which are based on averaging risk across the population, are inadequate to cover future claims.
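Glossary of Genetics Terms (English)
Bacterial genetics
anabolic function mutants 合成代谢功能的突变型
anabolic functions 合成代谢功能: The ability of the wild type to synthesize, on minimal medium, the organic compounds required for growth.
auxotroph 营养缺陷型: A strain in which a mutation in any single gene of the wild type prevents a specific biochemical reaction, blocking the whole anabolic function.
catabolic function mutants 分解代谢功能的突变型
catabolic function 分解代谢功能: The ability of wild-type E. coli to use carbon sources more complex than glucose by converting them into glucose or other simple sugars, and to degrade complex amino acid or fat molecules into acetate or intermediates of the tricarboxylic acid cycle.
resistance mutants 抗性突变型: Bacteria that become resistant to certain phages or antibiotics because of a mutation in some gene; resistant cells can no longer adsorb the phage, or adsorb it less efficiently.
conjugation 接合生殖
F factor F因子: Also called the sex factor or fertility factor; a circular DNA molecule that replicates independently.
loss of the F factor: An F+ bacterium that loses its F factor (for example, after acriflavine treatment) becomes an F− bacterium.
partial diploid 部分二倍体, merozygote 半合子: An F− recipient cell that receives only part of the donor chromosome.
endogenote 内基因子 and exogenote 外基因子: In a merozygote, the recipient's own chromosome and the introduced donor fragment, respectively.
recombination mapping 重组作图: Locating genes on the basis of the recombination frequencies between them.
outside marker 末端(外侧)标记
receptor site 受体部位: The specific region through which an exogenous DNA fragment enters the recipient bacterium via a temporary channel.
competent cell 感受态细胞: A bacterial cell that can take up exogenous DNA and be transformed.
competence factor 感受态因子: An enzyme or protein that promotes transformation.
generalized transduction 普遍性转导: Transduction in which the donor (bacterial) chromosome fragments carried by the phage are completely random, so every gene in the donor genome has the same chance of being transduced into a partial diploid; after exchange and recombination, the different transductants arise at roughly equal frequencies.
cotransduction 共转导(并发转导): The simultaneous transduction of two genes. The higher the cotransduction frequency of two genes, the more tightly they are linked; the lower the frequency, the farther apart they are.
two-factor transduction experiment 双因子转导实验: An experiment that follows the transduction of two genes at a time and uses the pairwise cotransduction frequencies to determine the order of the genes on the chromosome.
lysozyme 溶菌酶
prophage 原噬菌体, provirus 原病毒: A phage genome integrated into the host chromosome.
lysogeny 溶源性: The condition in which a bacterium carries a phage that does not immediately cause lysis; such a bacterium is called a lysogenic bacterium (溶源性细菌, 溶源菌), and the process is called the lysogenic cycle.
lytic pathway 裂解途径: The lytic cycle (裂解周期).
lysogenic pathway 溶源途径: The lysogenic cycle (溶源周期).
conditional lethal mutants 条件致死突变型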
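The two-factor transduction logic above says that higher cotransduction frequency means tighter linkage. For three genes, this yields a simple ordering rule: the least frequently cotransduced pair flanks the third gene. The sketch also converts a frequency into a distance with Wu's formula, C = (1 − d/L)^3, which goes beyond the glossary text. L is the length of DNA a phage can package; the default of 2.0, about 2 minutes of the E. coli map for P1, is an assumed parameter. The frequencies are hypothetical.

```python
from typing import Dict, FrozenSet, List


def order_three_genes(cotrans: Dict[FrozenSet[str], float]) -> List[str]:
    """Order three genes from pairwise cotransduction frequencies."""
    genes = sorted(set().union(*cotrans))
    if len(genes) != 3:
        raise ValueError("expects exactly three genes")
    # The least frequently cotransduced pair are the two outer genes.
    outer = min(cotrans, key=cotrans.get)
    (middle,) = set(genes) - outer
    left, right = sorted(outer)
    return [left, middle, right]


def wu_distance(c: float, phage_length: float = 2.0) -> float:
    """Invert Wu's formula C = (1 - d/L)**3 to get d (same units as L)."""
    return phage_length * (1.0 - c ** (1.0 / 3.0))


# Hypothetical pairwise cotransduction frequencies.
freqs = {
    frozenset({"a", "b"}): 0.6,
    frozenset({"b", "c"}): 0.5,
    frozenset({"a", "c"}): 0.1,
}
print(order_three_genes(freqs))      # ['a', 'b', 'c']
print(round(wu_distance(0.125), 3))  # 1.0
```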
Chinese-English Glossary of Genetics and Breeding Terms
3'untranslated region (3'UTR) 3'非翻译区
5'untranslated region (5'UTR) 5'非翻译区
A chromosome A 染色体
AATAAA 多腺苷酸化信号
aberration 畸变
abiogenesis 非生源说
accessory chromosome 副染色体
accessory nucleus 副核
accessory protein 辅助蛋白
accident variance 偶然变异
Ac-Ds system Ac-Ds 系统
acentric chromosome 无着丝粒染色体
acentric fragment 无着丝粒片段
acentric ring 无着丝粒环
achromatin 非染色质
acquired character 获得性状
acrocentric chromosome 近端着丝粒染色体
acrosyndesis 端部联会
activating transcription factor 转录激活因子
activator 激活剂
activator element 激活单元
activator protein (AP) 激活蛋白
activator-dissociation system Ac-Ds 激活解离系统
active chromatin 活性染色质
active site 活性部位
adaptation 适应
adaptive peak 适应高峰
adaptive surface 适应面
addition 附加物
addition haploid 附加单倍体
addition line 附加系
additive effect 加性效应
additive gene 加性基因
additive genetic variance 加性遗传方差
additive recombination 插入重组
additive resistance 累加抗性
adenosine 腺苷
adenosine diphosphate (ADP) 腺苷二磷酸
adenosine triphosphate (ATP) 腺苷三磷酸
adjacent segregation 相邻分离
A-form DNA A 型DNA
akinetic chromosome 无着丝粒染色体
akinetic fragment 无着丝粒片段
alien addition monosomic 外源附加单体
alien chromosome substitution 外源染色体代换
alien species 外源种
alien-addition cell hybrid 异源附加细胞杂种
alkylating agent 烷化剂
allele 等位基因
allele center 等位基因中心
allele linkage analysis 等位基因连锁分析
allele specific oligonucleotide (ASO) 等位基因特异的寡核苷酸
allelic complement 等位(基因)互补
allelic diversity 等位(基因)多样化
allelic exclusion 等位基因排斥
allelic inactivation 等位(基因)失活
allelic interaction 等位(基因)相互作用
allelic recombination 等位(基因)重组
allelic replacement 等位(基因)置换
allelic series 等位(基因)系列
allelic variation 等位(基因)变异
allelism 等位性
allelotype 等位(基因)型
allohaploid 异源单倍体
allopatric speciation 异域物种形成
alloploidy 异源倍性
allopolyhaploid 异源多倍单倍体
allopolyploid 异源多倍体
allosyndesis 异源联会
allotetraploid 异源四倍体
alloheteroploid 异源异倍体
alternation of generation 世代交替
alternative transcription 可变转录
alternative transcription initiation 可变转录起始
Alu repetitive sequence, Alu family Alu 重复序列,Alu 家族
ambiguous codon 多义密码子
ambisense genome 双义基因组
ambisense RNA 双义RNA
aminoacyl-tRNA binding site 氨酰基tRNA结合位点
aminoacyl-tRNA synthetase 氨酰基tRNA合成酶
amixis 无融合
amorph 无效等位基因
amphipolyploid 双多倍体
amplicon 扩增子
amplification 扩增
amplification primer 扩增引物
analysis of variance 方差分析
anaphase (分裂)后期
anaphase bridge (分裂)后期桥
anchor cell 锚状细胞
androgamete 雄配子
aneuhaploid 非整倍单倍体
aneuploid 非整倍体
animal genetics 动物遗传学
annealing 复性
antibody 抗体
anticoding strand 反编码链
anticodon 反密码子
anticodon arm 反密码子臂
anticodon loop 反密码子环
antiparallel 反向平行
antirepressor 抗阻抑物
antisense RNA 反义RNA
antisense strand 反义链
apogamogony 无融合结实
apogamy 无配子生殖
apomixis 无融合生殖
arm ratio (染色体)臂比
artificial gene 人工基因
artificial selection 人工选择
asexual hybridization 无性杂交
asexual propagation 无性繁殖
asexual reproduction 无性生殖
assortative mating 选型交配
asynapsis 不联会
asynaptic gene 不联会基因
atavism 返祖
atelocentric chromosome 非端着丝粒染色体
attached X chromosome 并连X 染色体
attachment site 附着位点
attenuation 衰减
attenuator 衰减子
autarchic gene 自效基因
auto-alloploid 同源异源体
autoallopolyploid 同源异源多倍体
autobivalent 同源二价体
auto-diploid 同源二倍体;自体融合二倍体
autodiploidization 同源二倍化
autoduplication 自体复制
autogenesis 自然发生
autogenomatic 同源染色体组
autoheteroploidy 同源异倍性
autonomous transposable element 自主转座单元
autonomously replicating sequence (ARS) 自主复制序列
autoparthenogenesis 自发单性生殖
autopolyhaploid 同源多倍单倍体
autopolyploid 同源多倍体
autoradiogram 放射自显影图
autosyndetic pairing 同源配对
autotetraploid 同源四倍体
autozygote 同合子
auxotroph 营养缺陷体
B chromosome B 染色体
B1, first backcross generation 回交第一代
B2, second backcross generation 回交第二代
back mutation 回复突变
backcross 回交
backcross hybrid 回交杂种
backcross parent 回交亲本
backcross ratio 回交比率
background genotype 背景基因型
bacterial artificial chromosome (BAC) 细菌人工染色体
bacterial genetics 细菌遗传学
bacteriophage 噬菌体
balanced lethal 平衡致死
balanced lethal gene 平衡致死基因
balanced linkage 平衡连锁
balanced load 平衡负荷
balanced polymorphism 平衡多态现象
balanced rearrangements 平衡重排
balanced tertiary trisomic 平衡三级三体
balanced translocation 平衡易位
balancing selection 平衡选择
band analysis 谱带分析
banding pattern (染色体)带型
basal transcription apparatus 基础转录装置
base analog 碱基类似物
base analogue 类碱基
base content 碱基含量
base exchange 碱基交换
base pairing mistake 碱基配对错误
base pairing rules 碱基配对法则
base substitution 碱基置换
base transition 碱基转换
base transversion 碱基颠换
base-pair region 碱基配对区
base-pair substitution 碱基配对替换
basic number of chromosome 染色体基数
behavioral genetics 行为遗传学
behavioral isolation 行为隔离
bidirectional replication 双向复制
bimodal distribution 双峰分布
binary fission 二分裂
binding protein 结合蛋白
binding site 结合部位
binucleate phase 双核期
biochemical genetics 生化遗传学
biochemical mutant 生化突变体
biochemical polymorphism 生化多态性
bioethics 生物伦理学
biogenesis 生源说
bioinformatics 生物信息学
biological diversity 生物多样性
biometrical genetics 生物统计遗传学(简称生统遗传学)
bisexual reproduction 两性生殖
bisexuality 两性现象
bivalent 二价体
blending inheritance 混合遗传
blot transfer apparatus 印迹转移装置
blotting membrane 印迹膜
bottleneck effect 瓶颈效应
branch migration 分支迁移
breed variety 品种
breeding 育种,培育;繁殖,生育
breeding by crossing 杂交育种法
breeding by separation 分隔育种法
breeding coefficient 繁殖率
breeding habit 繁殖习性
breeding migration 生殖洄游,繁殖洄游
breeding period 生殖期
breeding place 繁殖地
breeding population 繁殖种群
breeding potential 繁殖能力,育种潜能
breeding range 繁殖幅度
breeding season 繁殖季节
breeding size 繁殖个体数
breeding system 繁殖系统
breeding true 纯育
breeding value 育种值
broad heritability 广义遗传率
bulk selection 集团选择
C0, acentric 无着丝粒的
C1, monocentric 单着丝粒的
C2, dicentric 双着丝粒的
C3, tricentric 三着丝粒的
candidate gene 候选基因
candidate-gene approach 候选基因法
Campbell model 坎贝尔模型
karyotype 染色体组型,核型
catabolite activator protein 分解活化蛋白
catabolite repression 分解代谢产物阻遏
catastrophism 灾变说
cell clone 细胞克隆
cell cycle 细胞周期
cell determination 细胞决定
cell division 细胞分裂
cell division cycle gene (CDC gene) 细胞分裂周期基因
cell division lag 细胞分裂延迟
cell fate 细胞命运
cell fusion 细胞融合
cell genetics 细胞遗传学
cell hybridization 细胞杂交
cell sorter 细胞分类器
cell strain 细胞株
cell-cell communication 细胞间通信
center of variation 变异中心
centimorgan (cM) 厘摩
central dogma 中心法则
central tendency 集中趋势
centromere DNA 着丝粒DNA
centromere interference 着丝粒干扰
centromere 着丝粒
centromeric exchange (CME) 着丝粒交换
centromeric inactivation 着丝粒失活
centromeric sequence (CEN sequence) 着丝粒序列
character divergence 性状趋异
chemical genetics 化学遗传学
chemigenomics 化学基因组学
chiasma centralization 交叉中化
chiasma terminalization 交叉端化
chimera 异源嵌合体
chi-square (χ²) test 卡方检验
chondriogene 线粒体基因
chorionic villus sampling 绒毛膜取样
chromatid aberration 染色单体畸变
chromatid break 染色单体断裂
chromatid bridge 染色单体桥
chromatid interchange 染色单体互换
chromatid interference 染色单体干涉
chromatid tetrad 四分染色单体
chromatid translocation 染色单体易位
chromatin agglutination 染色质凝聚
chromosomal aberration 染色体畸变
chromosomal assignment 染色体定位
chromosomal banding 染色体显带
chromosomal disorder 染色体病
chromosomal elimination 染色体消减
chromosomal inheritance 染色体遗传
chromosomal interference 染色体干扰
chromosomal location 染色体定位
chromosomal locus 染色体位点
chromosomal mutation 染色体突变
chromosomal pattern 染色体型
chromosomal polymorphism 染色体多态性
chromosomal rearrangement 染色体重排
chromosomal reproduction 染色体增殖
chromosomal RNA 染色体RNA
chromosomal shift 染色体变迁,染色体移位
chromosome aberration 染色体畸变
chromosome arm 染色体臂
chromosome banding pattern 染色体带型
chromosome behavior 染色体动态
chromosome blotting 染色体印迹
chromosome breakage 染色体断裂
chromosome bridge 染色体桥
chromosome coiling 染色体螺旋
chromosome condensation 染色体浓缩
chromosome constriction 染色体缢痕
chromosome cycle 染色体周期
chromosome damage 染色体损伤
chromosome deletion 染色体缺失
chromosome disjunction 染色体分离
chromosome doubling 染色体加倍
chromosome duplication 染色体复制
chromosome elimination 染色体丢失
chromosome engineering 染色体工程
chromosome evolution 染色体进化
chromosome exchange 染色体交换
chromosome fusion 染色体融合
chromosome gap 染色体间隙
chromosome hopping 染色体跳移
chromosome interchange 染色体交换
chromosome interference 染色体干涉
chromosome jumping 染色体跳查
chromosome knob 染色体结
chromosome loop 染色体环
chromosome loss 染色体丢失
chromosome map 染色体图
chromosome mapping 染色体作图
chromosome matrix 染色体基质
chromosome mutation 染色体突变
chromosome non-disjunction 染色体不分离
chromosome pairing 染色体配对
chromosome polymorphism 染色体多态性
chromosome puff 染色体疏松
chromosome rearrangement 染色体重排
chromosome reduplication 染色体再加倍
chromosome repeat 染色体重叠
chromosome scaffold 染色体支架
chromosome segregation 染色体分离
chromosome set 染色体组
chromosome stickiness 染色体粘性
chromosome theory of heredity 染色体遗传学说
chromosome theory of inheritance 染色体遗传学说
chromosome thread 染色体丝
chromosome walking 染色体步查
chromosome-mediated gene transfer 染色体中介基因转移
chromosomology 染色体学
ClB method ClB法;性连锁致死突变出现频率检测法
circular DNA 环状DNA
cis conformation 顺式构象
cis dominance 顺式显性
cis-heterogenote 顺式杂基因子
cis-regulatory element 顺式调节元件
cis-trans test 顺反测验
cladogram 进化树
cloning vector 克隆载体
C-meiosis C减数分裂
C-metaphase C 中期
C-mitosis C有丝分裂
code degeneracy 密码简并
coding capacity 编码容量
coding ratio 密码比
coding recognition site 密码识别位置
coding region 编码区
coding sequence 编码序列
coding site 编码位置
coding strand 密码链
coding triplet 编码三联体
codominance 共显性
codon bias 密码子偏倚
codon type 密码子型
coefficient of consanguinity 近亲系数
coefficient of genetic determination 遗传决定系数
coefficient of hybridity 杂种系数
coefficient of inbreeding 近交系数
coefficient of migration 迁移系数
coefficient of relationship 亲缘系数
coefficient of variability 变异系数
coevolution 协同进化
coinducer 协诱导物
cold sensitive mutant 冷敏感突变体
colinearity 共线性
combining ability 配合力
comparative genomics 比较基因组学
competence 感受态
competent cell 感受态细胞
competing groups 竞争类群
competition advantage 竞争优势
competitive exclusion principle 竞争排斥原理
complementary DNA (cDNA) 互补DNA
complementary gene 互补基因
complementation test 互补测验
complete linkage 完全连锁
complete selection 完全选择
complotype 补体单元型
composite transposon 复合转座子
conditional gene 条件基因
conditional lethal 条件致死
conditional mutation 条件突变
consanguinity 近亲
consensus sequence 共有序列
conservative transposition 保守转座
constitutive heterochromatin 组成型异染色质
continuous variation 连续变异
convergent evolution 趋同进化
cooperativity 协同性
coordinately controlled genes 协同控制基因
core promoter element 核心启动子
core sequence 核心序列
co-repressor 协阻抑物
correlation coefficient 相关系数
cosegregation 共分离
cosuppression 共抑制
cotransfection 共转染
cotranscript 共转录物
cotranscriptional processing 共转录过程
cotransduction 共转导
cotransformation 共转化
cotranslational secretion 共翻译分泌
counterselection 反选择
coupling phase 互引相
covalently closed circular DNA (cccDNA) 共价闭合环状DNA
covariation 相关变异
criss-cross inheritance 交叉遗传
cross 杂交
crossability 杂交性
crossbred 杂种
cross-compatibility 杂交亲和性
cross-infertility 杂交不育性
crossing over 交换
crossing-over map 交换图
crossing-over value 交换值
crossover products 交换产物
crossover rates 交换率
crossover reducer 交换抑制因子
crossover suppressor 交换抑制因子
crossover unit 交换单位
crossover value 交换值
crossover-type gamete 交换型配子
C-value paradox C 值悖论
cybrid 胞质杂种
cyclin 细胞周期蛋白
cytidine 胞苷
cytochimera 细胞嵌合体
cytogenetics 细胞遗传学
cytohet 胞质杂合子
cytologic 细胞学的
cytological map 细胞学图
cytoplasm 细胞质
cytoplasmic genome 胞质基因组
cytoplasmic heredity 细胞质遗传
cytoplasmic incompatibility 细胞质不亲和性
cytoplasmic inheritance 细胞质遗传
cytoplasmic male sterility 细胞质雄性不育
cytoplasmic mutation 细胞质突变
cytoplasmic segregation 细胞质分离
cytoskeleton 细胞骨架
Darwin 达尔文
Darwinian fitness 达尔文适合度
Darwinism 达尔文学说
daughter cell 子细胞
daughter chromatid 子染色单体
daughter chromosome 子染色体
deformylase 去甲酰酶
degenerate code 简并密码
degenerate primer 简并引物
degenerate sequence 简并序列
degenerated codon 简并密码子
degeneration 退化
degree of dominance 显性度
delayed inheritance 延迟遗传
deletant 缺失体
deletion 缺失
deletion loop 缺失环
deletion mapping 缺失作图
deletion mutation 缺失突变
denatured DNA 变性DNA
denatured protein 变性蛋白
denaturing gel 变性胶
denaturing gel electrophoresis 变性凝胶电泳
denaturing gradient polyacrylamide gel 变性梯度聚丙烯酰胺凝胶
density gradient centrifugation 密度梯度离心
density gradient separation 密度梯度分离
deoxyribonucleic acid-dependent DNA polymerase 依赖于DNA的DNA聚合酶
derived line 衍生系
derived type 衍生类型
developmental genetics 发育遗传学
developmental pathway 发育途径
dicentric bridge 双着丝粒染色体桥
dicentric chromosome 双着丝粒染色体
differential staining technique 显带技术
differentiation center 分化中心
dihaploid 双单倍体
dihybrid 双因子杂种
dihybrid cross 双因子杂交
dimorphism 二态性
diploidization 二倍化
diploidize 二倍化
diploidized haploid 二倍化的单倍体
direct cross 正交
direct repeat 同向重复(序列)
direct selection 正选择
directed mutagenesis 定向诱变
discontinuous variation 不连续变异
distant hybrid 远缘杂种
distant hybridization 远缘杂交
diversity center 多样性中心
diversity curve 多样性曲线
diversity gene (D gene) D 基因
diversity indices 多样性指数
diversity of species 种的多样性
diversity region (D region) D 区;多变区
DNA alkylation DNA 烷化
DNA amplification DNA 扩增
DNA amplification in vitro DNA 体外扩增
DNA amplification polymorphism DNA 扩增多态性
DNA breakage DNA 断裂
DNA database DNA 数据库
DNA degradation DNA 降解
DNA denaturation DNA 变性
DNA detection DNA 检测
DNA distortion DNA 变形
DNA duplex DNA 双链体
DNA duplicase DNA 复制酶
DNA element DNA 单元
DNA evolution DNA 进化
DNA fingerprint DNA 指纹
DNA fingerprinting DNA 指纹分析
DNA homology DNA 同源性
DNA hybridization DNA 杂交
DNA jumping technique DNA 跳查技术
DNA melting DNA 解链
DNA methylation DNA 甲基化
DNA modification DNA 修饰
DNA modification restriction system DNA 修饰限制系统
DNA nicking DNA 切口形成
DNA oxidation DNA 氧化
DNA packaging DNA 包装
DNA pairing DNA 配对
DNA pitch DNA 螺距
DNA polymorphism DNA 多态性
DNA probe DNA 探针
DNA puff DNA 泡
DNA purification DNA 纯化
DNA recombination DNA 重组
redundant DNA 多余DNA
DNA repair DNA 修复
DNA replication DNA 复制
DNA replication enhancer DNA 复制增强子
DNA replication origin DNA 复制起点
DNA replication site DNA 复制点
DNA sealase DNA 连接酶
DNA sequence analysis DNA 序列分析
DNA sizing gene DNA大小决定基因
DNA strand exchange DNA 链交换
DNA strand separation DNA 链分离
DNA strand transfer protein DNA 链转移蛋白
DNA template DNA 模板
DNA thermal cycler DNA 热循环仪
DNA topoisomerase DNA 拓扑异构酶
DNA transcript DNA 转录物
DNA transposon DNA 转座子
DNA twist DNA 扭曲
DNA typing DNA 分型
DNA untwisting DNA 解旋
DNA unwinding enzyme DNA 解旋酶
DNA unwinding protein DNA 解旋蛋白
DNA-agar technique DNA 琼脂技术
DNase I footprinting DNA 酶I 足迹法
DNase-free reagent 无DNA 酶试剂
DNA-binding domain DNA 结合域
DNA-binding motif DNA 结合基序
DNA-binding protein DNA 结合蛋白
DNA polymerase DNA 聚合酶
DNA-protein complex DNA-蛋白质复合体
DNA-protein interaction DNA-蛋白质相互作用
DNA restriction enzyme DNA 限制酶
DNA-RNA hybrid DNA-RNA 杂交体
DNase-free 不含DNA 酶的
dominance 显性
dominance type 优势型
dominance variance 显性方差
dominant allele 显性等位基因
dominant effect 显性效应
dominant gene 显性基因
dominant gene mutation 显性基因突变
dominant lethal 显性致死
dominant phenotype 显性表型
donor DNA 供体DNA
donor organism 供体生物
dosage compensation 剂量补偿作用
dot blotting 点渍法
double crossing over 双交换
double fertilization 双受精
duplicate genes 重复基因
duplication 重复
duplicon 重复子
dyad 二分体
dynamic selection 动态选择
ecological genetics 生态遗传学
ecological isolation 生态隔离
ecological niche 生态位
ectopic expression 异位表达
ectopic integration 异位整合
effective population size 有效群体大小
embryoid 胚状体
embryonic stem cells (ES cells) 胚胎干细胞
endocrine signal 内分泌信号
endogamy 近亲繁殖
endomitosis 核内有丝分裂
endonuclease 内切核酸酶
endopolyploidy 核内多倍体
environment 环境
environmental variance 环境方差
environmental variation 环境变异
epigenesis 后成说
epigenetic inheritance 后生遗传
epigenetically silenced 后生沉默
episome 附加体
epistasis 上位性
epistatic dominance 显性上位
epistatic gene 上位基因
equal segregation 均等分离
equational division 均等分裂
equilibrium population 平衡群体
expressed sequence tag (EST) 表达序列标签
euchromatin 常染色质
eugenics 优生学
euhaploid 整单倍体
eukaryote 真核生物
eukaryotic chromosome 真核染色体
eukaryotic cell 真核细胞
eukaryotic organism 真核生物
eukaryotic vector 真核载体
euphenics 优型学
euploid 整倍体
evolutional load 进化负荷
evolutionary divergence 进化趋异
evolutionary genetics 进化遗传学
evolutionary rate 进化速率
excision repair 切除修复
exconjugant 接合后体
excretion vector 分泌型载体
exit site 萌发点
exogenote 外基因子
exogenous gene 外源基因
exonuclease 外切核酸酶
expression cloning 表达克隆
expression library 表达文库
expression mutation 表达突变
expression plasmid 表达质粒
expression product 表达产物
expression screening 表达筛选
extinguisher loci 消失基因座,灭绝基因座
extirpated species 绝迹种
extrachromosomal inheritance 染色体外遗传
extra-chromosome 超数染色体,额外染色体
extranuclear inheritance 核外遗传
F1 generation F1代,子一代
F2 generation F2 代,子二代
facultative heterochromatin 兼性异染色质
familial trait 家族性状
family selection 家系选择
feedback suppression 反馈抑制
female gamete 雌配子
fertility factor 致育因子
filial generation 子代
fingerprint 指纹
finite population 有限群体
first division segregation 第一次分裂分离
first division segregation pattern 第一次分裂分离模式
flanking sequence 侧翼序列
flow cytometry 流式细胞术
fluorescence in situ hybridization (FISH) 荧光原位杂交
fluorescent primer 荧光引物
fluorescent probe 荧光探针
formyl methionine (fMet) 甲酰甲硫氨酸
footprinting 足迹法
foreign DNA 外源DNA
forward genetics 正向遗传学
forward mutation 正向突变
forward primer 正向引物
founder effect 建立者效应
four strand double crossing over 四线双交换
full-sib 全同胞
functional genomics 功能基因组学
functional RNA 功能RNA
gain-of-function mutation 功能获得性突变
gamete 配子
gametic 配子的
gametic incompatibility 配子不亲和性
gametic lethal 配子致死
gametic linkage 配子连锁
gametic meiosis 配子减数分裂
gametic ratio 配子分离比
gametoclonal variation 配子无性系变异
gametophyte 配子体
G-band G带;中期染色体带
GC box GC 框
GC tailing GC 加尾
gel electrophoresis 凝胶电泳
gametic sterility 配子不育
gene activation 基因激活
gene activity 基因活性
gene amplification 基因扩增
gene analysis 基因分析
gene arrangement 基因排列
gene balance 基因平衡
gene basis 基因基础
gene batteries 基因群
gene block 基因区段
gene carrier 基因携带者
gene center theory 基因中心学说
gene cluster 基因簇
gene combination 基因重组
gene complex 基因复合体
gene content 基因含量
gene conversion 基因转换
gene distribution 基因分布
gene diversity 基因多样性
gene dosage 基因剂量
gene dosage compensation 基因剂量补偿
gene dosage effect 基因剂量效应
gene duplication 基因重复
gene element 基因元件
gene exchange 基因交流
gene expression 基因表达
gene expression system 基因表达系统
gene family 基因家族
gene fixation 基因固定
gene flow 基因流
gene frequency 基因频率
gene fusion 基因融合
gene inactivation 基因失活
gene inoculation 基因接种
gene interaction 基因相互作用
gene isolation 基因分离
gene knockout 基因敲除
gene knock-out 基因失效法
gene linkage 基因连锁
gene localization 基因定位
gene location 基因位置
gene locus 基因位点
gene magnification 基因扩增
gene manipulation 基因操作
gene map 基因图谱
gene mapping 基因作图
gene multiplication 基因重复
gene mutation 基因突变
gene mutation rate 基因突变频率
gene order 基因次序
gene organization 基因组构
gene pool 基因库
gene position effect 基因位置效应
gene probe 基因探针
gene product 基因产物
gene rearrangement 基因重排
gene reassortment 基因重新配对
gene replication 基因复制
gene repression 基因抑制
gene resortment 基因重配
gene silencing 基因沉默
gene splicing 基因剪接
gene string 基因线
gene structure 基因结构
gene substitute 基因置换
gene substitution 基因置换
gene suppression 基因抑制
gene synthesis 基因合成
gene tagged 基因标签
gene tagging 基因标签
gene targeting 基因导向,基因寻靶
gene transfer 基因转移
gene transfer agent 基因传递因子
gene transfer vector 基因转移载体
gene transposition 基因转座
genealogical classification 系谱分类
genera 属
general transcription factor (GTF) 通用转录因子
generalized transduction 普遍性转导
generation 世代
generative cell 生殖细胞
generative reproduction 有性繁殖
generic coefficient 种属系数
generic cross 属间杂交
generic name 属名
genes in common 共同基因
gene-specific transcription factor 基因特异性转录因子
genetic ablation 基因缺损
genetic advance 遗传进度
genetic algebra 遗传代数
genetic analysis 遗传分析
genetic background 遗传背景
genetic balance 遗传平衡
genetic block 遗传性阻碍
genetic compensation 遗传补偿
genetic complementation 遗传互补
genetic composition 遗传组成
genetic continuity 遗传连续性
genetic control 遗传控制
genetic covariance 遗传协方差
genetic cross 杂交
genetic database 遗传数据库
genetic death 遗传性死亡
genetic deficiency 遗传缺损
genetic deformity 基因变型
genetic determinant 遗传决定因子
genetic dimorphism 遗传二型现象
genetic distance 遗传距离
genetic divergence 遗传趋异
genetic diversity 遗传多样性
genetic dominance 遗传优势
genetic donor 基因供体
genetic drift 遗传漂变
genetic element 遗传因子,遗传成分
genetic engineering 遗传工程
genetic equilibrium 遗传平衡
genetic erosion 遗传冲刷,遗传蚀变
genetic expression 遗传表达
genetic extinction 遗传灭绝
genetic facilitation 遗传促进作用
genetic factor 遗传因子
genetic feedback 遗传反馈
genetic fingerprint 遗传指纹
genetic fingerprinting 遗传指纹分析
genetic fitness 遗传适合度
genetic flexibility 遗传可塑性
genetic gain 遗传获得量
genetic heterogeneity 遗传异质性
genetic homology 遗传同源
genetic immunity 遗传免疫
genetic imprinting 遗传印记
genetic inertia 遗传惰性
genetic information 遗传信息
genetic inoculation 基因接种
genetic instability 遗传不稳定性
genetic interaction 遗传相互作用
genetic isolating factor 遗传隔离因子
genetic isolation 遗传隔离
genetic knock-out experiment 基因失效试验
genetic linkage 遗传连锁
genetic linkage map 遗传连锁图谱
genetic load 遗传负荷
genetic manipulation 遗传操作
genetic map 遗传图谱
genetic mapping 遗传作图
genetic marker 遗传标记
genetic masking 基因组掩饰
genetic material 遗传物质
genetic mobilization 遗传转移
genetic modification 遗传修饰
genetic module 遗传组件
genetic nomenclature 遗传命名法
genetic parameter 遗传参数
genetic polarity 遗传极性
genetic polymorphism 遗传多态性
genetic population 遗传群体
genetic potential 遗传潜力
genetic process 遗传过程
genetic property 遗传特性
genetic ratio 遗传比
genetic reactivation 遗传复活
genetic reassortment 遗传重排
genetic recipient 基因受体
genetic recombination 遗传重组
genetic regulation 遗传调节
genetic relationship 亲缘关系
genetic repair mechanism 遗传修复机制
genetic replication 遗传复制
genetic risk 遗传危险性
genetic screening 遗传筛查
genetic segregation 遗传分离
genetic selection 遗传选择
genetic sex 遗传性别
genetic shift 遗传漂移
genetic stability 遗传稳定性
genetic sterility 遗传性不育
genetic strain 遗传品系
genetic suppression 遗传抑制
genetic switch 遗传开关
genetic system 遗传体系
genetic transcription 遗传转录
genetic transformation 遗传转化
genetic translation 遗传翻译
genetic transmission 遗传传递
genetic typing 遗传分型
genetic unit 遗传单位
genetic value 遗传值
genetic variability 遗传变异性
genetic variance 遗传方差
genetic vulnerability 遗传易损性
genetic “hot spot” 遗传“热点”
genetical marker 遗传标记
genetical non-disjunction 遗传不分离
genetical population 遗传群体
genetically heterogeneous 遗传异质的
genetically modified organism 基因修饰生物
genetics correction 遗传修正
genetics of resistance 抗性遗传
genotype 基因型
genic balance 基因平衡
genome allopolyploid 基因组异质多倍体
genome amplification 基因组扩增
genome evolution 基因组进化
genome mapping 基因组作图
genome project 基因组计划
genome rearrangement 基因组重排
genome sequencing 基因组测序
genomic exclusion 基因组排斥
genomic fingerprinting 基因组指纹分析
genomic footprinting 基因组足迹分析
genomic imprinting 基因组印记
genomic instability 基因组不稳定性
genomic library 基因组文库
genomic walking 基因组步查
genotypic frequency 基因型频率
genotypic ratio 基因型比值
genotypic value 基因型值
genotypic variance 基因型方差
geographic speciation 地理型新种形成
geographical isolation 地理隔离
geographical polymorphism 地理多态现象
germ layer 胚层
germ line 种系
germ nucleus 生殖核
germ plasm 种质
germinal mutation 生殖细胞突变
germ-line gene therapy 种系基因治疗
giant chromosome 巨型染色体
global homology 总体同源性
global regulon 全局调节子
globular protein 球蛋白
group selection 集团选择
growth factor 生长因子
GT-AG rule mRNA剪接识别信号规则
gynandromorphy 雌雄嵌合体
hairpin loop 发夹环
hairpin structure 发夹结构
half life 半寿期
half sib mating 半同胞交配
haplogenotypic 单倍基因型的
haploid 单倍体
haploidization 单倍体化
haplotype 单元型
hypostatic gene 下位基因
Hardy-Weinberg equilibrium 哈迪-温伯格平衡
heat shock gene 热激基因
heat shock protein 热激蛋白
heavy chain 重链
helical structure 螺旋结构
Genomic segmental duplications and tandem duplications
English answer:
Genomic Segmental Duplications and Tandem Duplications

Genomic segmental duplications (SDs) and tandem duplications are two types of structural variant that involve the duplication of DNA segments. Both are common features of the human genome, and both play important roles in genome evolution and function.

Segmental Duplications

Segmental duplications are large (>1 kb) duplications of DNA that are dispersed throughout the genome. They are typically several kilobases to several megabases in size, and they can be either tandemly arranged or inverted. Segmental duplications are often highly homologous, with >95% sequence identity between the duplicated segments.

Segmental duplications are thought to arise from unequal crossing-over events during meiosis. These events can occur when two homologous chromosomes misalign during synapsis, leading to the exchange of genetic material between the chromosomes. If the exchanged segments are large, they can give rise to segmental duplications.

Segmental duplications are a significant source of genetic variation in the human genome. They can contain genes, regulatory elements and other functional elements, so they can alter gene expression, gene regulation and other genomic functions. Segmental duplications have also been implicated in a number of human diseases, including cancer, autism and schizophrenia.

Tandem Duplications

Tandem duplications are duplications of DNA segments that lie adjacent to each other on the same chromosome. They are typically small (1–100 bp), and they can be either direct (in the same orientation) or inverted (in the opposite orientation). Tandem duplications are often highly homologous, with >95% sequence identity between the duplicated segments.

Tandem duplications are thought to arise from a variety of mechanisms, including slipped-strand mispairing during DNA replication, unequal crossing-over events, and gene amplification.
Slipped-strand mispairing occurs when the DNA template strand is misaligned with the newly synthesized strand during DNA replication, which can duplicate a short segment of DNA. Unequal crossing-over events can also produce tandem duplications if the misaligned chromosomes exchange genetic material in a non-reciprocal manner. Gene amplification is a process by which a gene is duplicated multiple times in a tandem array. It can occur through a variety of mechanisms, including unequal crossing-over and retrotransposition.

Tandem duplications are a common feature of the human genome and are found in both coding and non-coding regions. They can affect gene expression and function in several ways. For example, tandem duplications in coding regions can change protein structure and function, while tandem duplications in non-coding regions can affect gene regulation by altering the binding sites for transcription factors and other regulatory proteins. Tandem duplications have also been implicated in a number of human diseases, including cancer, neurodegenerative diseases and developmental disorders.

Chinese answer: Genomic segmental duplications and tandem duplications.
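To make the direct versus inverted distinction concrete, the toy scan below looks for a unit of length `k` that is immediately followed by a copy of itself. It is a minimal sketch under stated assumptions, not a variant caller. It assumes an uppercase A/C/G/T string, a caller-chosen unit length `k`, and exact (100% identity) adjacent copies. The function names `find_tandem_duplications` and `reverse_complement` are hypothetical. Real tandem-duplication detection works from reads or alignments and has to tolerate mismatches.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_tandem_duplications(seq: str, k: int) -> list[tuple[int, int, str]]:
    """Return (start, unit_length, kind) for every position where a unit of
    length k is immediately followed by an exact copy of itself.

    kind is "direct" when the copy has the same orientation and "inverted"
    when it is the reverse complement (opposite orientation)."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * k + 1):
        unit = seq[i:i + k]
        neighbour = seq[i + k:i + 2 * k]
        if neighbour == unit:
            hits.append((i, k, "direct"))
        elif neighbour == reverse_complement(unit):
            hits.append((i, k, "inverted"))
    return hits


if __name__ == "__main__":
    print(find_tandem_duplications("GGATCATCTT", 3))  # [(2, 3, 'direct')]
    print(find_tandem_duplications("AACGTT", 3))      # [(0, 3, 'inverted')]
```

Checking the direct case first means a unit that is its own reverse complement (for example `ACGT`) is reported as a direct copy.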
duplicate content check result
Introduction
In the digital world, duplicate content can have a detrimental impact on the performance and ranking of a website. Duplicate content is any piece of content that appears on the internet in more than one location, either within a single website or across different websites. Search engines such as Google actively work to identify duplicate content, and may penalize sites that duplicate content deceptively, so that they can provide users with the most relevant and unique results. In this article, we will explore the concept of duplicate content, its impact on SEO, and the tools and techniques to check and manage it effectively.

Understanding Duplicate Content
Duplicate content can take various forms: an exact copy of a page, a section of content within a page, or even a few sentences or paragraphs. Duplication can occur unintentionally or through malicious actions such as content scraping. It is important to distinguish duplicate content from syndicated content, which is legitimately shared with permission and proper attribution.

Search engines strive to deliver the most valuable and unique content to users. When multiple pages contain the same or substantially similar content, search engines may have difficulty determining which page is the most relevant and authoritative. As a result, they may show only one version of the content or rank it lower in search engine results pages (SERPs).

Impact of Duplicate Content on SEO
1. Keyword Cannibalization: When multiple pages on a website target the same keywords, they compete with each other for ranking. This dilutes their ranking potential and reduces their visibility in search results.
2. Lower Search Engine Rankings: Search engines may demote or penalize websites that deliberately duplicate content to manipulate rankings. This can result in lower rankings, less organic traffic and, ultimately, lower conversion rates.
3. Wasted Crawl Budget: Search engine crawlers have a limited budget for crawling and indexing a website's pages. Duplicate content consumes part of that budget, leaving fewer resources for valuable pages.
4. Loss of Backlinks: Backlinks play a crucial role in SEO. When duplicate content exists, multiple versions of the same page may receive backlinks. This dilutes backlink authority and can cost valuable inbound links.

Detecting Duplicate Content
Webmasters and SEO professionals can use several tools and techniques to find instances of duplication, assess the severity of the issue and take appropriate measures.
1. Manual Review
A manual review involves visually inspecting website content for duplicated sections or pages. This method suits small websites or specific pages but becomes time-consuming for larger websites.
2. Google Search Console
Google Search Console provides a range of tools to help webmasters monitor their websites' performance. The "Index Coverage" report can identify pages with duplicate content issues. Additionally, the "HTML Improvements" section can flag potential duplicate meta descriptions or title tags.
3. SEO Auditing Tools
Several SEO auditing tools, such as Screaming Frog, SEMrush and Ahrefs, offer features to identify and analyze duplicate content. These tools crawl the website, analyze the content and provide detailed reports highlighting instances of duplication.

Managing Duplicate Content
Once duplicate content has been identified, webmasters must take appropriate action to manage it. The right method depends on the extent of duplication, the website structure and the available resources.
1. Canonicalization
Canonicalization is the process of specifying the preferred version of a web page that search engines should treat as the source of the content. By implementing canonical tags, webmasters can consolidate duplicate pages under a single canonical URL that is marked as the primary version.
2. 301 Redirects
Permanent (301) redirects send users and search engines from one URL to another. They are useful for consolidating duplicate content by redirecting all duplicate URLs to a single preferred URL, which tells search engines which version is the original and authoritative source.
3. Noindex Tag
The noindex meta tag instructs search engines not to index a particular page. Applied to duplicate pages, it keeps them out of search results altogether. This method suits pages that do not need to be indexed or have little value on their own.
4. Content Consolidation
Sometimes it is better to merge fragmented or similar content into a single, more comprehensive page. By merging or rewriting duplicate content, webmasters can give users and search engines one authoritative source of information.

Conclusion
Duplicate content can harm a website's SEO performance and user experience. Webmasters and SEO professionals therefore need to understand the concept, detect instances of duplication and manage them with appropriate measures. By using auditing tools and applying canonicalization, redirects or the noindex tag, webmasters can ensure that their websites offer unique and valuable content to both users and search engines. Regular monitoring and maintenance are essential to sustain a strong online presence and better search engine rankings.
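One common way an auditing step might flag near-duplicate pages is word shingling combined with Jaccard similarity. The sketch below illustrates that idea only. It is an assumption, not a description of how Screaming Frog, SEMrush, Ahrefs or Google actually compare pages. The URLs, page texts, the 3-word shingle size and the 0.8 threshold are all hypothetical choices.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word windows ("shingles") in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union.
    Two empty sets are treated as identical (similarity 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(pages: dict[str, str], threshold: float = 0.8, k: int = 3):
    """Yield (url_a, url_b, score) for every pair of pages whose shingle sets
    have Jaccard similarity >= threshold. All pairs are compared (O(n^2)),
    which is only practical for small sites."""
    sets = {url: shingles(text, k) for url, text in pages.items()}
    urls = sorted(sets)
    for i, a in enumerate(urls):
        for b in urls[i + 1:]:
            score = jaccard(sets[a], sets[b])
            if score >= threshold:
                yield a, b, score


if __name__ == "__main__":
    pages = {  # hypothetical URLs and page texts
        "/a": "red shoes on sale today",
        "/a?ref=x": "Red shoes on sale today!",
        "/b": "blue hats for winter",
    }
    for pair in near_duplicates(pages):
        print(pair)  # ('/a', '/a?ref=x', 1.0)
```

Case and punctuation are normalised away, so a tracking-parameter variant of a page scores 1.0. That makes it a typical candidate for a canonical tag or a 301 redirect.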
English expression for "sci重复率" (SCI repetition rate)
The English expression for "sci重复率" is "repetition rate of scientific literature". It refers to the frequency at which scientific articles or studies are published repeatedly by the same authors or in the same journals within a specific timeframe. Here are 22 bilingual example sentences:
1. 这个研究领域的重复率极高,同一篇论文被多次发表,缺乏进一步的研究创新。
The repetition rate in this research field is alarmingly high, with the same paper published multiple times and little further research innovation.
2. 这个期刊的重复率很低,每篇发表的论文都是全新的研究成果。
The repetition rate of this journal is very low; each published paper reports a completely new research finding.
3. 良好的研究伦理准则对于降低科学文献的重复率至关重要。
Good research ethics guidelines are crucial for reducing the repetition rate in scientific literature.
4. 重复率高又不注明引用来源的行为将被视为学术不端行为。
High repetition rates without citing the original sources will be regarded as academic misconduct.
5. 多次研究表明,高重复率在科学研究中造成了信息过载和资源的浪费。
Multiple studies have shown that high repetition rates in scientific research cause information overload and a waste of resources.
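Past exam questions for the master's-to-PhD transfer examination

2000 Written Examination for the Doctoral Theory Course in Molecular Biology
1. Which types of DNA polymerase can be used for PCR amplification, and what are the characteristics of each?
2. Which types of vector can be used to construct DNA libraries, and what are the characteristics of each?
3. A protein activity factor has been purified from a mammal by biochemical methods. To express this protein in large quantities in E. coli, how would you isolate the gene that encodes it?
4. A new insecticidal gene obtained from a prokaryote has been transferred into a plant. How would you test whether the transgenic plants produce the protein encoded by the insecticidal gene?
5. In 2000, the International Human Genome Sequencing Consortium and the US company Monsanto announced, respectively, the completion of working drafts of the human and rice genomes. What do these two working drafts basically mean (for example, roughly how much of the sequencing had been completed)?
6. Some people liken an organism's whole-genome sequence to a "book from heaven" (an indecipherable text). Using an organism you are familiar with as an example, discuss how you would decipher this "book".
7. What is a "GMO"? What are the current controversies over GMOs, internationally and in China? Some argue that "the ignorance of the public and the news media" is one of the main causes of the GMO controversy; comment on this view.
8. Discuss the relationship between changes in chromatin structure and state and gene expression.
9. What is the significance of the "central dogma" proposed by Crick? What revisions and additions has the development of molecular biology made to the central dogma so far?
10. What is "post-genomics"? Which basic theories and principles of molecular biology does it rely on?

2000 Doctoral Degree Theory Examination in Molecular Biology
1. In recent years there has been great international controversy over the environmental release of transgenic crops and over genetically engineered foods. What are the main issues of concern in these controversies, and what is your view?
2. Describe the main content of comparative genomics and its theoretical and practical significance.
3. Describe the basic content of functional genomics.
4. Briefly describe the methods and strategies for isolating promoters and terminators from the genome and for determining the transcription start and termination sites of a gene.
5. Explain the differences between prokaryotes and eukaryotes, considering only the levels of transcription and translation.
6. Explain the molecular mechanisms that give rise to repetitive sequences in eukaryotic genomes.
7. Describe two PCR-based methods for introducing point mutations into a gene, and explain their principles.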
Proc. R. Soc. B (2005) 272, 277–283
doi:10.1098/rspb.2004.2969
Published online 4 February 2005

Rates and patterns of gene duplication and loss in the human genome

James A. Cotton1,2* and Roderic D. M. Page1
1Division of Environmental and Evolutionary Biology, Institute of Biomedical and Life Sciences, University of Glasgow, Glasgow G12 8QE, UK
2Department of Zoology, The Natural History Museum, Cromwell Road, London SW7 5BD, UK

Gene duplication has certainly played a major role in structuring vertebrate genomes, but the extent and nature of the duplication events involved remain controversial. A recent study identified two major episodes of gene duplication: one episode of putative genome duplication ca. 500 Myr ago and a more recent gene-family expansion attributed to segmental or tandem duplications. We confirm this pattern using methods not reliant on molecular clocks for individual gene families. However, analysis of a simple model of the birth–death process suggests that the apparent recent episode of duplication is an artefact of the birth–death process. We show that a constant-rate birth–death model is appropriate for gene duplication data, allowing us to estimate the rates of gene duplication and loss in the vertebrate genome over the last 200 Myr (0.00115 and 0.00740 Myr⁻¹ lineage⁻¹, respectively). Finally, we show that increasing rates of gene loss reduce the impact of a genome-wide duplication event on the distribution of gene duplications through time.

Keywords: gene duplication; gene loss; gene families; birth–death models; 2R hypothesis

1. INTRODUCTION
Gene duplications are probably the major source of novel genetic material (Ohno 1970; Holland et al. 1994), but there has been relatively little quantitative investigation of the rates at which new genes are generated by gene duplication, or of the rate at which genes are deleted from the genome, beyond the pioneering work of Lynch & Conery (2000, 2003). By contrast, there has been much interest in the pattern of gene duplications in vertebrate
evolution, stemming from the '2R hypothesis' that two rounds of whole-genome duplication occurred early in vertebrate evolution (Ohno 1970; Holland et al. 1994). This hypothesis has proved difficult to test, principally because most of the duplicated copies have subsequently been deleted from the genome (Skrabanek & Wolfe 1998), and because movement of genes complicates map-based approaches (Wolfe & Shields 1997). The arrival of genome-scale sequence data for vertebrates in recent years has prompted a number of investigations of gene duplications in vertebrates (e.g. Gu et al. 2002; McLysaght et al. 2002), allowing better estimates of duplication and loss rates and investigations of the pattern of gene duplication. In particular, Gu et al. (2002) presented data on the age distribution of vertebrate duplications, revealing a pattern suggestive of two major episodes of duplication: one recent, and another corresponding in time to that expected under the 2R hypothesis (although different authors have disagreed about exactly when the '2R' event occurred; Skrabanek & Wolfe 1998).

Gu et al.'s original dataset consisted of 749 human gene family trees suitable for dating gene duplication events during vertebrate evolution, but the only data available from this analysis are the dates of duplications across the entire dataset (X. Gu, personal communication). We have compiled a dataset (Cotton & Page 2002) showing a very similar pattern of gene duplication to that observed by Gu et al.
(figure 1). These two datasets have different strengths and weaknesses: while Gu et al. compiled a larger set of gene families (and duplications), phylogenetic trees are available for all of our gene families. Here, we use these two complementary datasets to investigate three related questions about the interpretation of this pattern.

First, we focus on whether the pattern is real, given concerns about the constancy of molecular clocks. The distributions shown in figure 1 assume a relaxed molecular clock within each gene family, so that ultrametric trees (in which each leaf is the same distance from the root) can be produced, while Gu et al.'s 'nearest neighbour' clock assumes a relaxed clock over a smaller part of each gene family tree. There are theoretical concerns about the rate constancy of molecular clocks (Ayala 1999; Rodriguez-Trelles et al. 2002) and the accuracy of fossil calibrations (Graur & Martin 2004), and it seems likely that molecular dating studies have often overestimated the dates of evolutionary events (Conway Morris 1999). Gene duplications are constrained by the speciation nodes above and below them (figure 2), which gives us independent evidence about the dates of these events. More reliable dates are available for speciation events than for gene duplications, because many genes can be used to estimate the date of a speciation event (Kumar & Hedges 1998; Heckman et al. 2001), while only the few duplicated genes can be used to estimate a duplication date (Li 1997). To test how sensitive the shape of the distribution of duplications is to molecular clock assumptions, we use a method based only on the topology of gene family trees to confirm the reality of the observed pattern.
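The topology-based counting used for the unclustered distribution can be sketched briefly. Each duplication may lie on any of k species-tree branches allowed by its bounding speciation nodes, and it contributes 1/k to each of them. This is a minimal illustration under stated assumptions, not the authors' modified GeneTree code. The branch labels and durations are hypothetical inputs, each duplication is supplied as its list of candidate branches, and the set-cover clustering of Page & Cotton (2002) is not reproduced.

```python
from collections import defaultdict
from fractions import Fraction


def weighted_duplication_counts(duplications):
    """Spread each duplication evenly over the species-tree branches it could
    lie on. `duplications` is an iterable of lists of branch labels; a
    duplication with k candidate branches adds 1/k to each of them, so one
    with a single possible branch adds exactly 1."""
    counts = defaultdict(Fraction)  # Fraction() == 0
    for branches in duplications:
        share = Fraction(1, len(branches))
        for branch in branches:
            counts[branch] += share
    return dict(counts)


def duplications_per_myr(counts, branch_durations):
    """Convert weighted counts to rates by dividing by branch length (Myr)."""
    return {b: float(n) / branch_durations[b] for b, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical branches A, B, C and durations in Myr.
    dups = [["A"], ["A", "B", "C"], ["B", "C"]]
    counts = weighted_duplication_counts(dups)
    print(counts)  # {'A': Fraction(4, 3), 'B': Fraction(5, 6), 'C': Fraction(5, 6)}
    print(sum(counts.values()))  # 3
    print(duplications_per_myr(counts, {"A": 2.0, "B": 5.0, "C": 10.0}))
```

Exact fractions keep the bookkeeping transparent: the weighted counts always sum to the number of duplications, so no duplication is counted more than once.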
Gene duplication represents the birth of new gene lineages, while gene loss represents the death of these lineages, analogous to the processes of speciation and extinction. The major results of this paper use a continuous-time model of this birth–death process (Sanderson 1994) that has previously been used to study speciation and extinction (Yule 1924; Nee et al. 1992, 1994; Kubo & Iwasa 1995). The mathematical models developed to study speciation and extinction as birth–death processes (Nee et al. 1992) are equally applicable to gene duplication and loss, and these models suggest a different interpretation of Gu et al.'s results. Birth–death models show a characteristic shape on plots of the number of extant lineages against time (a lineage-through-time plot; Nee et al. 1992). With no extinction and a constant gene duplication rate, these plots are exponential (and so show a straight line on a log plot). With extinction, the curves show a characteristic 'hollowed-out exponential' shape, increasing rapidly towards the present (an upward-curving line on a log scale; Harvey et al. 1994), because fewer older lineages persist to the present day to be observed on phylogenies of extant lineages. We compare the pattern expected under this simple model with that seen in Gu et al.'s data, which allows us to investigate how gene duplication and loss rates have varied through evolutionary time.

Finally, we use a simulation-based test of model adequacy to investigate whether a constant-rate birth–death model fits Gu et al.'s data. We then use this model to estimate per-lineage rates of gene duplication, which are easier to compare with previous estimates than Gu et al.'s per-genome rate, and to present, to our knowledge, the first explicit estimates of the rate of gene loss in vertebrates.

*Author for correspondence (james.cotton@).
Received 21 September 2004; Accepted 29 September 2004. © 2005 The Royal Society

2. MATERIAL AND METHODS
(a) Dates of gene duplications in the human genome
We
use two different datasets of gene duplications. The larger Gu dataset of dates of gene duplications reconstructed in vertebrate gene families is from Gu et al. (2002). These dates come from 749 vertebrate gene families and include duplications estimated to date from 4660.1 Myr ago to the present day. This older figure is certainly an overestimate, and Gu et al. truncate the distribution they show at 3500 Myr ago. For the smaller (Cotton and Page) dataset, 118 gene families that included members of a selected group of taxonomically diverse vertebrate taxa were identified from the Hovergen database. The vertebrate gene family phylogenies used in this work are available from /~jcotton/vertebrate_data; the selection of these gene families was described in detail in Cotton & Page (2002), and the analysis is detailed below.

(b) Reconstructing gene duplications
Alignments were generated using CLUSTAL W (Thompson et al. 1994) with default settings, and checked by eye. Small sequence fragments that might reduce alignment quality and be difficult to place phylogenetically were removed. A maximum-likelihood estimate of the genetic distances between taxa was then found using TREE-PUZZLE v. 5.0 (Schmidt et al. 2002), with the model selected by the program, amino acid frequencies estimated from the data, and an eight-category approximation to a gamma distribution to model rate heterogeneity between sites. These distances were then used to build a neighbour-joining tree in PAUP v. 4b10. Ultrametric trees were produced from these phylogenies using the non-parametric rate smoothing method (Sanderson 1997) implemented in the r8s software package, v. 1.50, with calibration based on a date of 310 Myr ago for the divergence of mammals and reptiles. All nodes representing the relevant speciation event for this calibration point were constrained to the same age, so a number of gene families had multiple calibration points. Conversely, some gene families had no nodes mapping to that particular
speciation, and were not included in the clock-based data. These ultrametric trees were analysed in a modified version of GENETREE (Page 1998), which produced output listing estimated dates for each node on the species tree, and for duplications mapped onto each branch of the species tree. Dates representing gene duplications that occurred along the path from the root of the species tree to humans (the evolutionary lineage of humans) were used to produce the estimated pattern of gene duplication.

Figure 1. Comparison of the results of (a) our data and (b) data from Gu et al. (2002). Figures are histograms showing the numbers of human-lineage gene duplications dated to occur at different times in vertebrate evolution in the two datasets (x-axis: millions of years before present; y-axis: number of duplications in age class). Roman numerals on (b) locate the two episodes of gene duplication previously identified by Gu et al.

Figure 2. Duplications are constrained by neighbouring speciation nodes. The duplication shown here (open rectangle) occurred before the divergence of Monodelphis and the placental mammals Mus and Homo, but after the divergence of the Chondrichthyes and the teleosts. This duplication could thus have occurred anywhere along the highlighted branch of the species tree. (Taxa in the species tree: Homo, Mus, Bos, Monodelphis, reptiles, amphibians, Latimeria, lungfish, teleosts, Chondrichthyes, cyclostomes.)

(c) Clock-independent distributions
The location of gene duplications is constrained by speciation nodes above and below them on the tree, so it is trivial to locate a set of edges on the species tree where a single duplication could have occurred (figure 2). Gene duplication events can occur at a range of different genomic scales, from a few bases to the entire genome, so duplications on different gene family trees may
thus be the result of the same multiple gene duplication event. To investigate this, we clustered gene duplications from individual gene families into the minimum number of sets that may represent these larger gene duplication episodes, using a set-cover algorithm (Page & Cotton 2002). This clustering can be thought of as the distribution of duplication events if we assume that duplications of any number of genes occur with similar frequency. To examine the history of gene duplications without clustering them into large episodes, we also reconstructed the most probable distribution of duplication events under the assumption that duplications occurred independently. For each branch of the species tree, the most probable number of duplications occurring at that location was found by summing the duplications that could have occurred on that branch, each weighted by the uncertainty in its position. A duplication that definitely occurred at a particular location thus added one to the estimated number of duplications at that location, while a duplication that could have occurred on any of three different branches added one-third to the estimate for each of the three branches. To scale these distributions, the number of gene duplication episodes from the clustering analysis and the ungrouped distribution of duplications were plotted as histograms, with a bar for each branch of the species tree and with the x-axis scaled to represent the length of each branch using molecular date estimates (Kumar & Hedges 1998).

Figure 3. The distribution of gene duplications through human evolution independent of a molecular clock. Distribution (a) is the same distribution shown in figure 1b, truncated and with the x-axis scaled for ease of comparison. The locations of human-lineage duplications on our 118 vertebrate
gene families were either (b) clustered using the duplication clustering algorithm (Page & Cotton 2002), in which case the distribution of duplication episodes is shown, or (c) left unclustered, with the ambiguity in their positions taken into account. The distributions are scaled so that branch lengths in the species tree reflect the dates of cladogenesis events from Kumar & Hedges (1998). Dates were interpolated for events not included in the Kumar and Hedges study. The y-axis scale for (a) is as in figure 1b; for (b) it shows the number of duplication episodes per million years, and for (c) the number of individual duplications per million years (x-axis: millions of years before present).

Figure 4. (a, b) The constant-rate birth–death model fitted by least squares to the lineage-through-time plot derived from duplication ages in the Gu et al. data (y-axis: proportion of extant lineages observed, N_t/N_T; x-axis: millions of years before present). In each plot, the dashed line is the model fitted to the last 200 Myr and the solid line is the model fitted to the entire data. (a) Shows the results for the last 200 Myr in more detail. Note that the solid curve fits less well for older data, because many ancient duplications will not be recorded: their copies will have diverged too far to be grouped in a single gene family. This tends to make the observed curve less steep initially on the lineage-through-time plot. These points do not heavily influence the fitted curve, as there are relatively few duplications this ancient in Gu et al.'s dataset.

(d) Birth–death model
The models of the birth–death process used here are those of Kubo & Iwasa (1995). These models are expressed in terms of numbers of lineages rather than numbers of duplications, so the data shown in figure 1 need to be converted into this form. The conversion is simple: we start with 749 lineages and add
one lineage for each gene duplication event. A graph of these data is known as a lineage-through-time plot. The birth–death model with constant birth and death rates relates N_T (the number of extant lineages) and N_t (the expected number of lineages at time t) by eqn 5 of Kubo & Iwasa (1995):

$$\frac{N_t}{N_T} = \frac{b - c}{b\,e^{(b - c)(T - t)} - c},$$

or, in the special case where the birth and death rates are equal, by their eqn 7c:

$$\ln N_t = \ln N_T - \ln\left[1 + b\,(T - t)\right].$$

Fitting this model to the lineage-through-time plot by least squares gives estimates of the rate of lineage birth (i.e. speciation or gene duplication, b) and the rate of lineage death (i.e. extinction or gene loss, c), under the assumption that b and c remain constant. The extant number of lineages (N_T) is 2488, because Gu et al.'s data start with 749 gene families and include 1739 duplications on these lineages; T is 4660.1 for the entire dataset and 200 for the recent duplication dataset. Model fitting and other procedures were implemented in R (code available from /~jcotton/RatesAndPatterns/).

(e) Testing the fit of the constant-rate model
A parametric bootstrap procedure involved simulating 1000 datasets under a continuous-time constant-rate birth–death model, with birth and death rates as estimated from the original data. Under this process, the time between events is an exponential random variable with mean 1/(b + c), and the probability of an event being a birth or a death is proportional to the respective rate. Simulated and observed data were compared using the deviance

$$D = 2 \sum O \log\left(\frac{O}{E}\right),$$

where O is the observed (or simulated) data and E is the expected value from the constant-rate birth–death model, summed across all data points. If the deviance of the model fit to the observed data falls within the core of the distribution of deviances of the model fit to the simulated data, then the model fits the data adequately (Johnson & Omland 2004).

(f) Non-parametric bootstrap estimates of birth–death parameters
Non-parametric bootstrapping was
performed by sampling, with replacement, from the set of duplications to generate pseudoreplicate duplication histories containing the same number of duplications as the original dataset. These replicates were analysed exactly as described above for the original data, allowing us to construct confidence regions for birth and death rates.

3. RESULTS
(a) The pattern of gene duplications
We have combined topological constraints on the locations of duplications with previously estimated vertebrate speciation dates to find the distribution of duplications, independent of molecular clocks, for the 118 gene families in our dataset (figure 3). These distributions are similar to those in figure 1: the deepest divergence in figure 3 dates to ca. 565 Myr ago (Kumar & Hedges 1998), and the peak to the right represents the possible '2R' event (episode II, figure 1). These data seem to confirm that the pattern of duplications shown in figure 1 is not simply an artefact of the molecular clock assumption, and so demands an explanation.

Figure 5. Results of non-parametric bootstrapping on the birth and death rate estimates: (a–c) are based on parameter estimates from the last 200 Myr of data points in the Gu et al. dataset; (d–f) are based on parameters estimated for the entire Gu et al. dataset (4660 Myr). (a) and (d) are two-dimensional confidence regions of birth rate and death rate estimates from bootstrap samples of last 200 Myr data; (b) and (e) are frequency distributions of birth rate estimates, showing the actual estimate and 95% confidence interval; and (c) and (f) are frequency distributions of death rate estimates, showing the actual estimate and 95% confidence interval.

(b) Estimation of birth and death rates
The best-fitting model for the entire data of Gu et al. suggests a duplication rate of 0.00097 Myr⁻¹ lineage⁻¹ and an extinction
rate of 0.00048 Myr⁻¹ lineage⁻¹. The 95% confidence limits on these estimates from non-parametric bootstrapping are 0.000890–0.00105 and 0.000153–0.000786, respectively (figure 5d–f). For the whole dataset, the deviance was 17.971, outside the range of deviances from 1000 simulated datasets (maximum 9.08). This gives p < 0.001 under the null hypothesis that the observed data come from a constant-rate birth–death process, so the constant-rate model is rejected for the entire data.

Looking only at the Gu et al. data from the last 200 Myr, we get an estimated duplication rate of 0.00115 Myr⁻¹ lineage⁻¹ and a loss rate of 0.00740 Myr⁻¹ lineage⁻¹, with 95% confidence intervals from non-parametric bootstrapping of 0.000902–0.00131 and 0.00409–0.00951, respectively (figure 5a–c). D for these observed data was −0.0260, which lies within the lower tail of the distribution of simulated deviances (range −0.0420 to 0.0289) and is lower than 99.3% of them. This gives a two-tailed p-value of 0.0138 < p < 0.0140 under the null hypothesis that the data for the last 200 Myr come from a constant-rate birth–death process, so this model cannot be rejected at the 1% significance level for this restricted dataset.

4. DISCUSSION
(a) Pattern of gene duplication and loss through time
Our data show a pattern of duplications similar to that reported by Gu et al. (2002) (figure 1), which is not surprising given the broadly similar methods of analysis. Our topology-based method confirms the pattern of gene duplications through time suggested by these clock-based methods. Gu et al. interpreted this pattern as representing two episodes of increased gene duplication (figure 1): one of putative genome duplications occurring ca. 500 Myr ago, and a second, recent increase in the rate of duplication. They interpreted the latter as 'a recent gene family expansion by tandem or segmental duplications', an event that has also been suggested elsewhere (Eichler 2001; Fortna et al. 2004). Our tests of model adequacy show that a constant rate of gene duplication and loss explains the recent pattern of
gene duplications observed over the last 200 Myr, showing that Gu et al.'s episode I (figure 1) does not represent an episode of increased duplication activity. The recent sharp increase in the number of duplications follows the pattern that would be expected if rates of duplication and extinction per lineage were constant, and reflects the fact that a greater proportion of lineages from recent times are still extant in the genome (Harvey et al. 1994). By contrast, comparing the fitted model for the whole data to Gu et al.'s data (figure 4) clearly shows an increase in duplication rate ca. 500 Myr ago that cannot be explained by a constant-rate model, and which seems to represent a genuine episode of increased gene duplication (or reduced gene loss) consistent with the 2R hypothesis.
(b) Rates of gene duplication and loss
One limitation of this approach is that as duplicated genes diverge, it becomes increasingly difficult to detect similarity between them and to align the genes properly. This means that any analysis based on gene family phylogenies will sample older duplications less thoroughly than more recent events. Recent duplications, however, are more numerous, so the model (figure 4) is fitted largely to this part of the curve and is less influenced by the sparse, ancient data. Despite this, a constant-rate birth–death process can be rejected for the data taken as a whole, owing to both this sampling effect and variation in the rate of gene duplication across the data. If rates of duplication and loss have varied considerably through time, it is debatable how meaningful single estimates of these rates are. Restricting the data to the last 200 Myr, we find that a constant-rate model cannot be rejected at the 1% level, so we have used this smaller time interval for estimates of duplication and loss rates.
Figure 6. [Plot not reproduced: panels (a)–(c); x-axis, millions of years since first duplication; left-hand axis, number of duplications in age class; right-hand axis, proportion of extant lineages observed (N_t/N_T).] The results of simulations showing the effects of gene loss on the signal from an ancient genome duplication event. All three panels show constant rates of gene duplication (0.001097 lineage⁻¹ Myr⁻¹) and gene loss, with 749 lineages simulated over 2000 Myr. The loss rate is zero in (a), equal to the duplication rate in (b), and twice this value in (c). The spike from the genome duplication event 500 Myr ago is much less pronounced in (b) and (c), as gene loss has erased many of the lineages duplicated in this event. In real data, there will be error in estimating duplication dates, and the dispersed peak will be harder to identify against background noise. Bar charts show the frequencies of duplications through time (left-hand axis), while lines are a log lineage-through-time plot (right-hand axis).
Our estimates of duplication and loss rates differ markedly from the only previous estimates. Lynch & Conery (2000) suggest rates of duplication of 0.0023 gene⁻¹ Myr⁻¹ for Drosophila, 0.0083 for Saccharomyces and 0.0208 for Caenorhabditis, and 0.0071 for human genes (Lynch & Conery 2001), while a more recent estimate (Lynch & Conery 2003) for the human rate is ca. 0.009 gene⁻¹ Myr⁻¹. Our estimate is thus almost an order of magnitude lower than previous estimates for human genes, and half the lowest value found by Lynch and Conery for any organism. Lynch & Conery (2001, 2003) also estimate half-lives of genes, which (under a constant-rate assumption) can be converted into estimates of loss rates. The estimated half-life of 7.5 Myr for human genes (Lynch & Conery 2003) corresponds to a loss rate of 0.0924 gene losses gene⁻¹ Myr⁻¹ (ln 2 / 7.5 Myr), again around an order of magnitude higher than our estimate. There are several problems with this earlier study, most importantly that it assumes a global
molecular clock, does not test whether the rates of duplication and loss are constant (Long & Thornton 2001) and may include redundant allelic sequences (Zhang et al. 2001), which would tend to inflate the rate estimates. Lynch & Conery also restrict their estimates to duplicate pairs showing less than 1% divergence at silent sites: using their estimate of 2.5 substitutions silent site⁻¹ Byr⁻¹, this is equivalent to discarding duplications over 4 Myr old. While this should not have a significant effect on the estimates, given that duplication and loss have occurred at an approximately constant rate over this time period, it would be expected to reduce the precision of Lynch & Conery's estimates.
The birth–death model we use assumes that duplications and losses in each lineage are independent, and that the rates of duplication and loss stay constant throughout the tree. The effect of temporal rate variation has been investigated (Kubo & Isawa 1995): clearly, duplication and loss rates are inter-related, and particular patterns in the number of extant lineages can be explained by changes in either duplication or loss rates. There has clearly been variation in the rate of gene duplication and/or gene loss during vertebrate evolution, most notably ca. 500 Myr ago. Unfortunately, rates may also vary between lineages, for example if purifying selection makes duplicates more likely to go extinct soon after the event that gave rise to them (Walsh 1995). This violates the assumptions of the model, and may affect the accuracy of estimates from these models in a way that has not yet been investigated.
(c) Inferring genome-scale events
Genome-scale events are difficult to observe on lineage-through-time plots if there has been a high rate of subsequent gene loss. Kubo & Isawa (1995) show that a mass speciation event (equivalent to a large-scale gene duplication) produces a discontinuity in the lineage-through-time plot as the number of lineages suddenly increases. The size of this discontinuity depends upon the extinction
rate: at high extinction rates, the discontinuity will be small and may be difficult to identify against the noisy background of real data, as figure 6 shows. It is even more difficult to detect ancient events of large-scale gene loss: these are visible only as a slight 'kink' where the gradient of a lineage-through-time plot changes (Kubo & Isawa 1995). Good estimates of gene deletion rates will clearly be needed to interpret correctly the peak in duplication rates observed during vertebrate evolution.
5. CONCLUSION
Reconstructing the pattern of gene duplications independently of molecular-clock assumptions confirms the pattern of gene duplication shown by Gu et al. and by our data. Using these data, we can apply a quantitative model of the birth–death process of gene family evolution to estimate rates of gene duplication and gene loss. We show that a constant rate of gene duplication and loss fits the pattern of recent gene family evolution reasonably well, implying that, contrary to Gu et al. (2002), there has been no recent increase in duplication. Duplication and loss rates estimated here are significantly lower than previous estimates (Lynch & Conery 2000), but we confirm the high rate of loss relative to gain of new genes (figure 5a). An estimate of the rate of gene loss is crucial in interpreting the pattern of ancient gene duplication episodes. While the scale of ancient gene duplications in vertebrates is striking, it seems likely that evidence from a number of sources, including tree topology and genetic map information, will be needed to unravel the history of vertebrate genome evolution.
We thank Xun Gu and co-authors for making their data available for this study. Two anonymous reviewers made many helpful comments that greatly improved the final product. This work was supported by a NERC studentship, the Wolfson Foundation and BBSRC grant no. 40/G18385.
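The non-parametric bootstrap described in the Methods works as follows. Duplications are resampled with replacement to build pseudo-replicate histories of the same size, each replicate is analysed exactly as the original data were, and confidence limits are read off the replicate distribution. The Python sketch below illustrates this procedure but is not the authors' code. `bootstrap_ci` is a hypothetical helper, the 95% limits are simple percentiles of the replicate estimates, and the birth–death model fit is replaced by a stand-in estimator passed in as a function.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple


def bootstrap_ci(
    data: Sequence[float],
    estimator: Callable[[Sequence[float]], float],
    n_reps: int = 1000,
    level: float = 0.95,
    seed: int = 1,
) -> Tuple[float, float, float]:
    """Percentile bootstrap (hypothetical helper).

    Each pseudo-replicate draws len(data) items from `data` with replacement,
    so it contains the same number of duplications as the original dataset.
    Returns (estimate on the original data, lower limit, upper limit)."""
    rng = random.Random(seed)
    n = len(data)
    replicates = sorted(estimator(rng.choices(data, k=n)) for _ in range(n_reps))
    tail = (1.0 - level) / 2.0
    lower = replicates[int(tail * (n_reps - 1))]
    upper = replicates[int((1.0 - tail) * (n_reps - 1))]
    return estimator(data), lower, upper


# Hypothetical duplication ages (Myr). The mean age stands in for the
# birth-death likelihood fit, which this sketch does not implement.
ages = [3.0, 12.5, 20.1, 45.0, 60.2, 88.8, 102.4, 130.0, 150.7, 190.3]
estimate, lower, upper = bootstrap_ci(ages, statistics.mean)
print(f"estimate {estimate:.1f}, 95% interval {lower:.1f}-{upper:.1f}")
```

Replacing `statistics.mean` with a function that fits duplication and loss rates to each replicate would give intervals of the kind summarised in figure 5. Percentile limits are only one of several ways to build a bootstrap interval.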
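The goodness-of-fit tests in the Results compare an observed statistic with its distribution across datasets simulated under the fitted constant-rate model. The sketch below shows that comparison, assuming the simulated statistics are already available. `monte_carlo_p` is a hypothetical name, and this section does not state the exact counting convention behind the reported interval 0.0138 < p < 0.0140.

```python
from typing import Sequence, Tuple


def monte_carlo_p(observed: float, simulated: Sequence[float]) -> Tuple[float, float, float]:
    """Return (lower-tail p, upper-tail p, two-tailed p) for `observed`
    against statistics computed on datasets simulated under the null model."""
    n = len(simulated)
    p_lower = sum(s <= observed for s in simulated) / n  # fraction at or below
    p_upper = sum(s >= observed for s in simulated) / n  # fraction at or above
    return p_lower, p_upper, min(1.0, 2.0 * min(p_lower, p_upper))


# Hypothetical null distribution: an observed value lying below 99.3% of 1000
# simulated values gives a lower-tail p of 0.007 and a two-tailed p of 0.014.
null_stats = [float(i) for i in range(1000)]
print(monte_carlo_p(6.5, null_stats))
```

Sometimes the observed statistic lies beyond every simulated value, as with the whole-data deviance of 17.971 against a simulated maximum of 9.08. The tail estimate is then zero, and the appropriate report is a bound, p < 1/n (p < 0.001 for 1000 simulations).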
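The argument that constant rates alone produce an excess of recent duplications rests on lineage survival. Under a linear birth–death process with constant duplication rate λ and loss rate μ, a lineage founded t Myr ago still has at least one descendant copy today with probability (λ − μ) / (λ − μ·exp(−(λ − μ)t)). When λ = μ this becomes 1/(1 + λt). This is the standard result for the linear birth–death process, not a calculation reported in the paper. The sketch below simply evaluates it with the 200 Myr point estimates from the Results (`survival_probability` is a hypothetical name).

```python
import math


def survival_probability(t: float, lam: float, mu: float) -> float:
    """Probability that a lineage founded t Myr ago still has at least one
    descendant copy today, under a linear birth-death process with constant
    duplication rate lam and loss rate mu (per lineage per Myr)."""
    if t < 0:
        raise ValueError("t must be non-negative")
    r = lam - mu
    if r == 0:
        return 1.0 / (1.0 + lam * t)  # critical case lam == mu
    return r / (lam - mu * math.exp(-r * t))


# Point estimates for the last 200 Myr reported in the Results.
LAM, MU = 0.00115, 0.00740
for age in (0, 50, 100, 200):
    print(f"{age:>3} Myr: {survival_probability(age, LAM, MU):.3f}")
```

With these inputs, only about a quarter of lineages founded 200 Myr ago are expected to persist, compared with roughly half of those founded 100 Myr ago. This is why the curve of observed duplications rises steeply towards the present even though the per-lineage rates never change.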
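The Discussion converts Lynch & Conery's human gene half-life into a loss rate. This follows from exponential decay at a constant per-gene loss rate μ: N(t) = N₀ exp(−μt), so μ = ln 2 / t½. The short sketch below (`loss_rate_from_half_life` is a hypothetical helper) reproduces the 0.0924 figure and compares it with the 200 Myr loss-rate estimate of 0.00740.

```python
import math


def loss_rate_from_half_life(half_life_myr: float) -> float:
    """Per-gene loss rate (gene^-1 Myr^-1) implied by a half-life, assuming
    constant-rate exponential decay N(t) = N0 * exp(-mu * t)."""
    if half_life_myr <= 0:
        raise ValueError("half-life must be positive")
    return math.log(2) / half_life_myr


mu_lynch_conery = loss_rate_from_half_life(7.5)  # human half-life of 7.5 Myr
ratio = mu_lynch_conery / 0.00740                # vs. the 200 Myr loss-rate estimate
print(f"{mu_lynch_conery:.4f} gene^-1 Myr^-1, {ratio:.1f}x higher")
```

The ratio comes out at about 12.5, which is what the text means by "around an order of magnitude higher".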
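The simulations summarised in figure 6 can be approximated with a short stochastic sketch. This is not the authors' simulation code, and the function names are hypothetical. It makes three assumptions:

- every lineage evolves independently with constant duplication and loss rates;
- the genome duplication copies every lineage alive at that instant;
- a duplication counts as observable only if both daughter lineages leave surviving copies today, which is the condition for it to appear in a reconstructed gene tree.

```python
import random
from typing import List, Optional, Tuple


def simulate_lineage(age: float, lam: float, mu: float, rng: random.Random,
                     wgd_age: Optional[float], observed: List[float]) -> int:
    """Follow one gene lineage from `age` Myr ago to the present.

    Duplication (rate lam) and loss (rate mu) are per lineage per Myr.
    Ages of duplications whose two daughters both survive are appended to
    `observed`; the return value is the number of copies alive today."""
    total = lam + mu
    next_age = age - (rng.expovariate(total) if total > 0 else float("inf"))
    if wgd_age is not None and age > wgd_age >= next_age:
        # The genome duplication arrives first; exponential waiting times are
        # memoryless, so both daughters can simply start afresh at wgd_age.
        event_age, duplicated = wgd_age, True
    elif next_age <= 0:
        return 1  # no further events before the present
    else:
        event_age, duplicated = next_age, rng.random() < lam / total
    if not duplicated:
        return 0  # gene loss: this lineage goes extinct
    left = simulate_lineage(event_age, lam, mu, rng, wgd_age, observed)
    right = simulate_lineage(event_age, lam, mu, rng, wgd_age, observed)
    if left and right:
        observed.append(event_age)  # visible in a reconstructed gene tree
    return left + right


def simulate_families(n_lineages: int, span: float, lam: float, mu: float,
                      wgd_age: Optional[float] = None,
                      seed: int = 0) -> Tuple[List[float], int]:
    """Simulate independent lineages over `span` Myr; return the ages of the
    observable duplications and the total number of surviving copies."""
    rng = random.Random(seed)
    observed: List[float] = []
    copies = sum(simulate_lineage(span, lam, mu, rng, wgd_age, observed)
                 for _ in range(n_lineages))
    return observed, copies


LAM = 0.001097  # duplication rate used in figure 6 (lineage^-1 Myr^-1)
for mu in (0.0, LAM, 2 * LAM):  # loss rate: zero, equal to, twice duplication
    obs, copies = simulate_families(749, 2000.0, LAM, mu, wgd_age=500.0, seed=42)
    spike = sum(1 for a in obs if a == 500.0)
    print(f"loss rate {mu:.6f}: {spike} of {len(obs)} observable "
          f"duplications date to the genome duplication")
```

As the loss rate rises, the number of observable duplications dated to the genome duplication should fall sharply, mirroring panels (a)–(c). Exact counts depend on the random seed.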
REFERENCES
Ayala, F. J. 1999 Molecular clock mirages. Bioessays 21, 71–75.
Conway Morris, S. 1999 Palaeodiversifications: mass extinctions, 'clocks', and other worlds. Geobios 32, 165–174.
Cotton, J. A. & Page, R. D. M. 2002 Going nuclear: vertebrate phylogeny and gene family evolution reconciled. Proc. R. Soc. B 269, 1555–1561. (doi:10.1098/rspb.2002.2074)
Eichler, E. E. 2001 Recent duplication, domain accretion and the dynamic mutation of the human genome. Trends Genet. 17, 661–669.
Fortna, A. (and 15 others) 2004 Lineage-specific gene duplication and loss in human and great ape evolution. PLoS Biol. 2, 0937–0954.
Graur, D. & Martin, W. 2004 Reading the entrails of chickens: molecular timescales of evolution and the illusion of precision. Trends Genet. 20, 80–86.
Gu, X., Wang, J. & Gu, J. 2002 Age distribution of human gene families shows significant roles of both large- and small-scale duplications in vertebrate evolution. Nature Genet. 31, 205–209.
Harvey, P. H., May, R. M. & Nee, S. 1994 Phylogenies without fossils. Evolution 48, 523–529.
Heckman, D. S., Geiser, D. M., Eidell, B. R., Stauffer, R. L., Kardos, N. L. & Hedges, S. B. 2001 Molecular evidence for the early colonization of land by fungi and plants. Science 293, 1129–1133.
Holland, P. W., Garcia-Fernandez, J., Williams, N. A. & Sidow, A. 1994 Gene duplications and the origins of vertebrate development. Development (Suppl.), 125–133.
Johnson, J. B. & Omland, K. S. 2004 Model selection in ecology and evolution. Trends Ecol. Evol. 19, 101–108.