Identification of complete gene structures in human genomic DNA
Identification and bioinformatic analysis of the mevalonate kinase (EuMK) gene of Eucommia ulmoides
Nonwood Forest Research, Vol. 32, No. 1, Mar. 2014. Received: 2013-07-26. Funding: National Forestry Public Welfare Industry Research Special Project (201004029).
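About the author: Wuyun Tana (b. 1975), female, from Tongliao, Inner Mongolia. Professor and doctoral supervisor, mainly engaged in research on the breeding and cultivation of non-wood forest species. E-mail: tanatanan@ .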
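Wuyun Tana1,2, Wang Lin3a,3b, Ye Shengjing4
(1. Non-timber Forestry Research and Development Center, Chinese Academy of Forestry, Zhengzhou 450003, Henan; 2. Eucommia Engineering Technology Research Center of the State Forestry Administration, Zhengzhou 450003, Henan; 3. Central South University of Forestry and Technology, a. Key Laboratory of Non-wood Forest Breeding and Cultivation of the State Forestry Administration; b. College of Forestry, Changsha 410004, Hunan; 4. Central South Forest Inventory and Planning Institute of the State Forestry Administration, Changsha 410014, Hunan)
Abstract: Mevalonate kinase (MK) is one of the rate-limiting enzymes of the mevalonate (MVA) pathway of terpenoid biosynthesis.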
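To clarify how EuMK regulates the biosynthesis of Eucommia rubber, a comprehensive bioinformatic analysis was carried out on the amino acid sequences encoded by EuMK, covering physicochemical properties, transmembrane domains, hydrophobicity/hydrophilicity, secondary and tertiary protein structure, conserved domains, and the intron and exon structure of the genomic sequences. The EuMK protein sequences all contain three conserved domains related to mevalonate substrate binding and catalysis. Neutral hydrophobic residues make up the majority of the EuMK amino acid composition, and the proteins are stable and hydrophobic. They have no obvious transmembrane structure and lie entirely outside the membrane. Their secondary structure is of the mixed type.
All EuMK genes contain 4 exons and 3 introns. Prediction of the potential functions of the promoter cis-acting elements of the EuMK gene family found a large number of basic cis-acting elements (TATA-box, CAAT-box), together with multiple cis-regulatory elements responsive to light (AE-box, CGT-motif, Box 4, G-box) and to auxin (ERE, ABRE, TCA-element).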
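The hydrophobicity analysis mentioned in the abstract is commonly summarized as a grand average of hydropathy (GRAVY) score and a sliding-window profile on the Kyte-Doolittle scale. The abstract does not say which tool or scale the authors used, so the following is only a minimal sketch under that assumption; the function names and the toy peptide are hypothetical and are not EuMK data.

```python
# Kyte-Doolittle hydropathy scale (one value per standard amino acid).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Mean Kyte-Doolittle value over all residues (positive = hydrophobic)."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in protein) / len(protein)


def hydropathy_profile(protein: str, window: int = 9) -> list[float]:
    """Sliding-window mean hydropathy, one value per full window."""
    protein = protein.upper()
    return [
        gravy(protein[i:i + window])
        for i in range(len(protein) - window + 1)
    ]


if __name__ == "__main__":
    toy_peptide = "MAVILKDE"  # hypothetical example, not an EuMK sequence
    print(round(gravy(toy_peptide), 3))
    print([round(v, 2) for v in hydropathy_profile(toy_peptide, window=3)])
```

A positive GRAVY value points to an overall hydrophobic protein, and long stretches of high window values are what transmembrane predictors look for.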
BCR-ABL genes and leukemic phenotype
BCR/ABL genes and leukemic phenotype: from molecular mechanisms to clinical correlations
Fabrizio Pane*,1, Mariano Intrieri1,2, Concetta Quintarelli1, Barbara Izzo1, Giada Casadei Muccioli1 and Francesco Salvatore1
1CEINGE Biotecnologie Avanzate, and Dipartimento di Biochimica e Biotecnologie Mediche, Facoltà di Medicina, Università di Napoli Federico II, Italy; 2Dipartimento STAT, Facoltà di Scienze MFN, Università degli Studi del Molise - Isernia, Italy
The Philadelphia chromosome (Ph), a minute chromosome that derives from the balanced translocation between chromosomes 9 and 22, was first described in 1960 and was for a long time the only genetic lesion consistently associated with human cancer. This chromosomal translocation results in the fusion between the 5' part of the BCR gene, normally located on chromosome 22, and the 3' part of the ABL gene on chromosome 9, giving rise to a BCR/ABL fusion gene which is transcribed and then translated into a hybrid protein. Three main variants of the BCR/ABL gene have been described that, depending on the length of the BCR sequence included, encode the P190BCR/ABL, P210BCR/ABL, and P230BCR/ABL proteins. These three main variants are associated with distinct clinical types of human leukemias. Herein we review the data on the correlations between the type of BCR/ABL gene and the corresponding leukemic clinical features. Lastly, drawing on experimental data, we provide insight into the different transforming power of the three hybrid BCR/ABL proteins.
Oncogene (2002) 21, 8652–8667. doi:10.1038/sj.onc.1206094
Keywords: Philadelphia chromosome; Ph positive leukemias; molecular pathogenesis; phenotype of leukemia
Introduction
Ph-positive leukemias have been the object of extensive studies to define, at the molecular and cellular level, the complex mechanisms whereby the hybrid BCR/ABL protein may disrupt, in hemopoietic precursors, the maturation and replication processes and the physiological response to cytokines and growth factors, leading, ultimately, to their
neoplastic transformation. Although the expression of the hybrid BCR/ABL protein is presumed to be an early pathogenic event and its elevated tyrosine kinase activity to be a central event in Ph positive leukemias, the intimate role of the BCR/ABL protein in leukemic transformation has not yet been completely elucidated. The knowledge of the role of the BCR/ABL fusion gene in leukemia is further complicated by cytogenetic and epidemiological observations. This hybrid gene was originally described in a chronic type of leukemia affecting the myeloid compartment, chronic myelogenous leukemia (CML), and is present in almost all the CML cases with a classical clinical picture, of which it is now considered the hallmark (Bennett et al., 1994). However, unlike other genetic lesions that are specifically associated with particular types of leukemias (Rowley, 1998), the BCR/ABL hybrid gene is not restricted to CML, but can also be found in a significant proportion of acute lymphoblastic leukemia (ALL) and less frequently in other chronic and acute hemopoietic tumors (Melo, 1996). We described a relatively benign and rather rare form of myeloproliferative disease that we called ‘Neutrophilic-chronic myeloid leukemia’ (CML-N), which is also associated with a BCR/ABL gene and shows a novel position for the breakpoint in the BCR gene, far downstream from the breakpoints already described (Pane et al., 1996b). The identification of this subset of patients carrying the novel type of BCR/ABL gene therefore led to the hypothesis that the structure of this gene and, in particular, the location of the breakpoint on the BCR sequences correlates with the leukemic phenotype (Pane et al., 1996b). A masterly account of these data and their far-reaching implications can be found in Melo (1996). To minimize overlap with the excellent reviews devoted to the intracellular interactions of BCR/ABL proteins that have appeared in recent years (Faderl et al., 1999; Holyoake, 2001) or that are included in this issue of
the journal, we first trace the milestones that have culminated in today's state of the art, and then discuss: (i) the molecular structure of BCR/ABL fusion genes; (ii) the correlation between leukemic clinical features and the position of the breakpoint in the BCR gene; (iii) the clinical characteristics of the CML-N cases described so far; and (iv) the hypothesis that could explain at the molecular level the differential transforming power of the various BCR/ABL fusion proteins.
*Correspondence: F Pane, CEINGE and Dipartimento di Biochimica e Biotecnologie Mediche, Università di Napoli Federico II, Via S. Pansini 5, 80131 Naples, Italy; E-mail: fabpane@unina.it
Milestones
The history of the hybrid BCR/ABL gene started in 1960, when an abnormal, shortened chromosome 22, termed the Philadelphia chromosome, was described in the leukemic cells of a patient affected by chronic myeloid leukemia (Nowell and Hungerford, 1960). Only 13 years later, the use of quinacrine fluorescence and Giemsa banding in chromosome studies made it clear that the Ph chromosome is the result of a translocation of part of chromosome 22 to chromosome 9 (Rowley, 1973). The molecular characterization of this translocation was facilitated when the c-ABL gene, the human homologue of v-ABL, an oncogene originally identified in the Abelson murine leukemia virus, was localized to the long arm of chromosome 9 (Heisterkamp et al., 1982). Two ABL probes, which normally map to chromosome 9, were shown to hybridize to chromosome 22q- (the Ph chromosome) of somatic cell hybrids originating from human CML leukocytes and rodent cells (de Klein et al., 1982), while sequences of the c-sis oncogene, generally found on chromosome 22, were shown to be translocated to chromosome 9 in CML cells (Groffen et al., 1983).
These studies indicated a reciprocal translocation for the Ph chromosome, and suggested that activation of the transforming potential of the c-ABL proto-oncogene was an important consequence of the translocation. The cloning of the BCR/ABL breakpoint was the subsequent milestone. Heisterkamp and colleagues applied the ‘chromosome walking’ technique to show a chromosomal breakpoint within a 14 kb sequence of chromosome 9 homologous to v-ABL in one of three patients with CML (Heisterkamp et al., 1983). However, the chromosome 9 rearrangement was not evident in the remaining two patients, and this prompted Groffen et al. (1984) to direct their attention to chromosome 22 breakpoints. They used the sequences complementary to chromosome 22, previously isolated from the genomic clone of the positive CML patient, as a probe to screen for genomic rearrangements, and found a rearrangement of chromosome 22 in a second CML patient. Notably, this second rearrangement of chromosome 22 mapped very closely (1 kb) to that identified in the first positive CML patient. When the same probe and restriction analysis were applied to a series of 17 leukemic patients, all but two cases (both Ph chromosome negative) showed a chromosome 22 rearrangement clustered within a 5.8 kb Bgl II–Bgl II genomic fragment, which was thus named the ‘bcr’ or breakpoint cluster region (Figure 1) (Groffen et al., 1984). This region was later renamed ‘M-bcr’ (major breakpoint cluster region) to distinguish it from the ‘minor breakpoint region’ (m-bcr), a chromosome 22 fragment located further upstream, which appeared to harbor breakpoints of Ph-positive acute leukemias (Hermans et al., 1987). It was rapidly evident that the involvement of M-bcr was highly specific for CML, as rearrangements of this chromosomal region were consistently found only in this type of leukemia and not in other hemopoietic neoplasias. The identification of an abnormally large (8.0 kb) v-ABL related mRNA in Ph positive cell lines and in CML cells, but not in normal cells and in Ph negative
leukemias (Gale and Canaani, 1984), and the characterization of a large (210 kDa) ABL related phosphoprotein, the P210 (Konopka et al., 1984), reinforced the concept that CML was related to a fusion gene. Importantly, the tyrosine kinase activity of P210 resembled that of the v-ABL protein (Witte et al., 1980; Davis et al., 1985), which was phosphorylated, whereas the 145 kDa c-ABL protein was not phosphorylated in vivo in normal cells (Konopka and Witte, 1985). The definitive proof of the presence of a fusion gene in CML cells came from the studies of Shtivelman et al. (1995), who first cloned the full length cDNA corresponding to the 6.0 kb c-ABL encoding transcript, and used the 5' part of this cDNA to clone the CML-specific transcript from cDNA libraries prepared from both the K562 and EM2 Ph positive cell lines. Sequence analysis of the transcripts isolated from both cell lines revealed that they contain a novel 5' sequence fused in frame with c-ABL exon 2 and, interestingly, a genomic fragment which hybridized to these non-ABL sequences contained the M-bcr region previously identified by Groffen et al. (1984). Taken together, these data suggested that the 8 kb CML-specific mRNA contains BCR sequences joined to c-ABL sequences and is transcribed from a BCR/ABL fusion gene. In addition, the authors suggested that the amino terminal substitution was responsible for the increased tyrosine kinase activity of the ABL-derived sequences. These first but extremely important data opened the way to a large number of studies on the structure of this hybrid gene and on the molecular mechanisms of Ph positive leukemias.
Molecular structure of BCR/ABL genes
At the molecular level, the Ph translocation results in the juxtaposition of the 5' part of the BCR gene to the 3' part of the ABL gene and, depending on chromosomal breakpoint locations, different parts of these two genes may be included in the oncogenic fusion gene (Table 1). The ABL gene encodes a non receptor tyrosine kinase, spans a 230 kb region at band q34 of
chromosome 9 and consists of 11 exons, with two alternative first exons, i.e. exons Ia and Ib (Kurzrock et al., 1988). In the vast majority of the Ph positive patients, breakpoints in the ABL gene appeared to be distributed over a rather large 300 kb fragment of chromosome 9 at band q34, which comprises the 5' end of this gene, and may occur either upstream of the alternative first exon (exon Ib), or downstream of the other first exon (Ia), or more frequently between these two (Melo et al., 1993a). By the effect of the Ph chromosomal translocation, ABL sequences sited downstream (telomeric) of the breakpoint move to the der(22) and are joined to the 5' part of the BCR gene (Figure 1). Sequences of the exons Ia and/or Ib, which may be included in the fusion gene, are spliced out from the primary hybrid transcript. Indeed, they are never detected in the mature BCR/ABL mRNAs, which, with very rare exceptions (see below), contain only the last 10 exons, from exon 2 to exon 11, of the ABL gene (Morris et al., 1991; Melo et al., 1993b).

Figure 1 Breakpoint locations at the BCR and ABL loci. The sporadic breakpoints are indicated by small arrows (see also Table 1)

Table 1 Breakpoints at the BCR and ABL locus in human Philadelphia positive leukemias (see also text for details)
Genomic breakpoint, BCR locus | Genomic breakpoint, ABL locus | No. of cases | mRNA junction | Chimeric protein | Reference
Minor bcr | 5' end of ABL (~300 kb) | Common | e1a2 | 185 kDa | Fainstein et al., 1987; Clark et al., 1987; and this review
Minor bcr | Intron 2 or 5' end of ABL | 4 | e1a3 | 180 kDa | Soekarman et al., 1990; Iwata et al., 1994; Wilson et al., 2000; Mancini et al., 2001
Exon 2 | 5' end of ABL (intron Ib) | 1 | e2-int ABL1b-a2 | 187 kDa | Okamoto et al., 1997
Intron 6 | 5' end of ABL (~300 kb) | 1 | e6a2 | 195 kDa | Hochhaus et al., 1996
Intron 8 | 5' end of ABL (~300 kb) | 4 | Not tested | ? | Saglio et al., 1988; Negrini et al., 1992
Intron 8 | 5' end of ABL (~300 kb) | 2 | e8-int-a2 | 197.5 kDa | How et al., 1999; Martinelli et al., 2002
Intron 10 | 5' end of ABL (~300 kb) | 1 | Not tested | ? | Erikson et al., 1986
Major bcr | 5' end of ABL (~300 kb) | Common | e13a2 and/or e14a2 | 210 kDa | Heisterkamp et al., 1985; Kawasaki et al., 1988 and this review
Major bcr | Intron 2 or 5' end of ABL* | 4 | e13a3 | 203 kDa | Soekarman et al., 1990; van der Plas et al., 1991; Tuszynski et al., 1993; Wilson et al., 2000
Major bcr | Intron 2 or 5' end of ABL* | 3 | e14a3 | | Inukai et al., 1993; Iwata et al., 1994
Intron 15 | Exon 2 (nt 78) | 2 | e15a2 | 210 kDa | Moreno Mdel et al., 2001
Micro bcr | 5' end of ABL (~300 kb) | 24 | e19a2 | 230 kDa | Saglio et al., 1990 and this review

Breakpoints on the BCR gene are usually clustered in three well defined regions (Table 1 and Figure 1). The first, now known as the major breakpoint cluster region (M-bcr), is a 5.8 kb chromosomal region spanning exons 12–16 (originally named b1 to b5) (Heisterkamp et al., 1985; Stam et al., 1985). M-bcr breakpoints are detectable in more than 95% of CML cases (Faderl et al., 1999); however, breakpoints in this region can also be detected in about one third of adult acute lymphoid leukemias (ALLs) with the t(9;22) translocation and in a small fraction of Ph positive childhood ALL cases (Arico et al., 2000; Gleissner et al., 2002). Depending on the position of the breakpoint in this region, the 5' end of the BCR gene comprising either exon 13 (formerly b2) or exon 14 (formerly b3) is joined to the 3' part of the ABL gene, giving rise to a BCR/ABL hybrid gene encoding the chimeric 210 kDa protein (P210BCR/ABL). The corresponding fusion mRNAs show either the b2a2 or the b3a2 type of junction (Kawasaki et al., 1988). Together with Saglio's group, we reported alternative splicing of the primary BCR/ABL transcript, which leads in all CML patients to the production of the P190-encoding mRNA (e1a2 junction) as well (Saglio et al., 1996). This finding was subsequently confirmed by other groups (van Rhee et al., 1996; Lichty et al., 1998; Serrano et al., 2000). In addition, alternative splicing allows the simultaneous expression of both b2a2 and b3a2 types of transcripts (Melo, 1996).
In 70–80% of Ph positive ALLs (Arico et al., 2000; Gleissner et al., 2002), and in rare cases of CML (Melo et al., 1994), breakpoints of chromosome 22 span a 55 kb intronic sequence between the two alternative exons e2' and e2 (Chissoe et
al., 1995), called the minor breakpoint cluster region (m-bcr) (Clark et al., 1987; Fainstein et al., 1987). In these cases only the extreme 5' end of the BCR gene is joined to the 3' sequences of the ABL gene, and although the resulting BCR/ABL fusion gene contains both the e1' and the e2' BCR exons and may contain the ABL alternative first exons, all these exonic sequences are removed by splicing and the hybrid transcript shows a junction between the BCR exon e1 and ABL exon a2. This type of e1a2 transcript is smaller (7.4 kb) than that normally found in CML patients and encodes a 185 kDa chimeric protein (P190BCR/ABL) (Clark et al., 1987; Hermans et al., 1987; Kurzrock et al., 1987).
More recently, Saglio et al. (1990) described the presence of a novel type of BCR breakpoint located at the 3' end of the gene in two cases of chronic myeloid leukemia. The breakpoints of these two patients were both located within intron 19 and, hence, 19 of the 23 BCR exons were included in the resulting fusion mRNA, which shows an in frame e19a2 type junction encoding the largest BCR/ABL chimeric protein, of predicted size 230 kDa. This protein (P230BCR/ABL) was subsequently purified from a Ph positive cell line expressing this type of BCR/ABL fusion gene (Wada et al., 1995). Afterwards, we described three additional patients affected by a Ph positive leukemia carrying the e19a2 type of BCR/ABL gene (Pane et al., 1996a). Interestingly, the leukemic phenotype in these three patients was very close to that of chronic neutrophilic leukemia (CNL) and, in retrospect, the clinical features of the first two cases described by Saglio et al. (1990) might also be considered much closer to CNL than to classical CML. Therefore, we proposed the name μ-BCR (Micro-BCR) for the breakpoints located in intron 19 of the BCR gene, which were associated with a distinct form of mild Ph positive myeloproliferative disease, the neutrophilic-chronic myeloid leukemia (CML-N) (Pane et al., 1996b).
Over the past 15 years, additional types of BCR/ABL genes have been described, mainly as single case reports (Table 1). Eight cases concern patients with a breakpoint located between the minor and major BCR regions, mainly at or around BCR intron 8. The fused mRNA was detected in only three of these patients: an in frame e6a2 junction in one case (Hochhaus et al., 1996) and an e8a2 junction, kept in frame by the inclusion of an intronic sequence, in the other two patients (How et al., 1999; Martinelli et al., 2002). In 11 other patients, the hybrid transcript had an unusual junction between BCR sequences (exons e1, e13 or e14) and ABL exon 3, thus showing that the 174 bp of ABL exon 2, which encode 58 amino acids, the last 17 of which are part of the SH3 domain, were not essential for the transformation of cells (Table 1). In only one of these cases did Southern analysis detect a breakpoint in the 0.6 kb ABL intron 2, whereas in the other two cases restriction analysis excluded the presence of a breakpoint in the second ABL intron; most likely, therefore, the unusual junctions in these latter cases derive from the splicing out of ABL exon 2 from the primary hybrid transcript (van der Plas et al., 1991; Iwata et al., 1994).
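The arithmetic behind the a3 junctions can be checked directly: 174 bp is a whole number of codons (174 / 3 = 58), so skipping ABL exon 2 removes 58 amino acids without shifting the reading frame of the downstream ABL sequence. The sketch below only illustrates that calculation; the exon length comes from the text, while the function names are hypothetical.

```python
ABL_EXON2_BP = 174  # length of ABL exon 2 as stated in the review


def preserves_reading_frame(deleted_bp: int) -> bool:
    """A deletion keeps the downstream frame only if it removes whole codons."""
    return deleted_bp % 3 == 0


def amino_acids_lost(deleted_bp: int) -> int:
    """Number of codons (amino acids) removed by an in-frame deletion."""
    if not preserves_reading_frame(deleted_bp):
        raise ValueError("deletion is not a multiple of 3: frame would shift")
    return deleted_bp // 3


if __name__ == "__main__":
    print(preserves_reading_frame(ABL_EXON2_BP))  # True
    print(amino_acids_lost(ABL_EXON2_BP))         # 58
```

The same check explains why an out-of-frame junction, such as the e8a2 cases above, can only yield a chimeric protein when extra inserted sequence restores a multiple of three.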
Correlation of BCR breakpoint positions with the leukemic phenotype
Clinical variability among the main cluster regions
Table 2 lists the different types of leukemia and the corresponding BCR/ABL breakpoints found in Ph positive patients, and shows the relationship between the amount of BCR sequence included in the hybrid gene and the leukemic phenotype.
In ALL, the Philadelphia chromosome and the corresponding BCR/ABL gene show a different epidemiological distribution in adult and childhood cases (see Table 2). The incidence of this genetic abnormality seems to correlate with age, and it is relatively common in adults, where it is found in up to 35% of cases (Maurer et al., 1991; Westbrook et al., 1992; Annino et al., 1994; Gleissner et al., 2002). Interestingly, among ALL cases, this genetic abnormality is tightly associated with the B lineage, and cases of true T-ALL containing the BCR/ABL gene are very rare. Moreover, most of the Ph positive B-ALL cases express the CD34 and CD10 antigens (Kantarjian et al., 1991; Westbrook et al., 1992; Gleissner et al., 2002). The relative proportion of adult ALL patients with the P210- and P190-encoding genes varies between studies. This variation is largely due to the selection of cases and the low number of patients analysed in some reports. Indeed, two studies, based on retrospective analysis, reported a higher percentage of adult ALL cases with the M-bcr breakpoint (P210+ALL) compared to those with the m-bcr type of breakpoint (P190+ALL) (Annino et al., 1994; Radich et al., 1994). Kantarjian et al. (1991) found the m-bcr type of breakpoint in 12 out of 24 adult ALL patients, the M-bcr in 11 patients, and both the P210- and the P190-encoding mRNA in the remaining patient. Most likely, this latter patient had a hybrid BCR/ABL gene with the M-bcr type of breakpoint, which gave rise to the two types of mature mRNAs by alternative splicing of a primary transcript (Dhingra et al
., 1991; Saglio et al., 1996; van Rhee et al., 1996). In other studies, including a large prospective report on 175 cases of Ph positive adult ALL, the percentage of adult ALL expressing the P190-encoding gene is much higher and accounts for 70–75% of the total number of Ph positive cases (Maurer et al., 1991; Westbrook et al., 1992; Gleissner et al., 2002). On clinical grounds, none of these studies provided evidence of hematological differences between the two types of breakpoints, with the exception of a trend, not statistically significant, toward a longer survival probability at 3 years for the P190+ALL compared to the P210+ALL patients (0.19% vs 0.03%; P = 0.7) found in the 175 Ph positive ALL cases published by Gleissner et al. (2002). It is to be stressed, however, that the dire prognosis of adult ALL patients with the BCR/ABL gene might hide the clinical differences between the P190+ALL and the P210+ALL cases, at least as regards treatment outcome. Notably, when aggressive treatments such as allogeneic stem cell transplant (SCT) were applied to Ph positive ALL, the type of transcript seemed to predict the clinical outcome. Indeed, Radich et al. (1992) showed that PCR analysis detected BCR/ABL positive cells, at least once after SCT, in bone marrow samples of 23 out of 36 Ph positive ALL patients, and that PCR positivity predicted an overt clinical relapse with a relative risk of 5.7 compared to a PCR negative assay. Interestingly, these authors showed that seven out of 10 patients with P190 positive ALL and a post-transplant PCR positive assay relapsed, in contrast to only one out of eight patients with P210 positive ALL and a post-SCT PCR positive assay (Radich et al., 1997).
In childhood ALL cases, the BCR/ABL gene is detectable in a limited proportion (up to 5–6%) of newly diagnosed patients (Suryanarayan et al., 1991; Pui and Evans, 1998). As for the adult patients, the Ph defect is restricted to pre-B ALL (Schrappe et al., 1998; Arico et al
., 2000), and cases of T-ALL with the BCR/ABL gene are very rare (Arico et al., 2000). P210+ALL is less frequent in children than in adults and represents a small minority (10–20%) of all childhood Ph positive ALL cases when tested in large groups of patients (Maurer et al., 1991; Schrappe et al., 1998; Arico et al., 2000). Although no clinical differences have been found between P210+ALL and P190+ALL pediatric patients, two large retrospective studies provided evidence that childhood Ph positive ALL comprises patients with heterogeneous responses to intensive treatments (Schrappe et al., 1998; Arico et al., 2000). In particular, a good response to initial prednisone treatment and a low WBC count predict which patients will benefit from the treatment (Arico et al., 2000; Schrappe et al., 1998). At the moment, however, the biological background of this clinical heterogeneity is not known.
The Ph chromosome is only rarely detected in acute myeloid leukemia (AML), and most of the patients represent myeloid blast crisis of CML following a clinically silent chronic phase. In the few bona fide Ph positive AML cases for which molecular studies were available, the P190-encoding gene, unlike the P210-encoding gene, was always associated with the myelomonocytic phenotype (FAB M4 or M5), thus suggesting a pivotal role of the BCR sequences located downstream from the m-bcr breakpoint in the transformation of myeloid lineage cells (Kurzrock et al., 1987; Preudhomme et al., 1992; Secker-Walker et al., 1992; Alimena et al., 1995).

Table 2 BCR/ABL junctions and leukemic phenotype
BCR breakpoint | Junction in BCR/ABL mRNA | Protein | Disease | Frequency
Minor (m) | e1/a2 | P185 | ALL | 85% of childhood Ph positive ALL; 50–70% of adult Ph positive ALL
Minor (m) | e1/a2 | P185 | CML | Very rare
Minor (m) | e1/a2 | P185 | AML | Very rare
Minor (m) | e1/a2 | P185 | Multiple myeloma | One case
Minor (m) | e1/a2 | P185 | Non Hodgkin Lymphoma | One case
Major (M) | b2/a2 or b3/a2 | P210 | CML | 100% of cases
Major (M) | b2/a2 or b3/a2 | P210 | ALL | 15% of childhood Ph positive ALL; 30–50% of adult Ph positive ALL
Major (M) | b2/a2 or b3/a2 | P210 | AML | Very rare
Micro (μ) | c3/a2 | P230 | CML-N | Rare
The impact of the amount of BCR sequence on the leukemic phenotype is further supported by chronic leukemias predominantly affecting the myeloid lineage. Classical CML usually shows, at least in the initial chronic phase, a prominent but often asymptomatic increase in the peripheral blood WBC count, and in some cases in the platelet count, associated with the presence of immature granulocytic elements. This leukemia is almost invariably caused by the P210-encoding BCR/ABL gene, and is the prototype of a stem cell neoplasia, in which all the hemopoietic lineages derive from a transformed stem cell and carry the Ph chromosome (Fialkow et al., 1977; Jonas et al., 1992; Maguer-Satta et al., 1996). In classical CML, however, only the myeloid and megakaryocytic compartments show a neoplastic expansion, and the granulocytic progenitors have a moderate degree of impairment of their differentiation and maturation capacity, whereas the erythroid, monocytic and B- and T-lymphoid lineages do not reveal any functional damage (Clarkson and Strife, 1993). By contrast, the rare cases of CML with the P190-encoding BCR/ABL gene are characterized by a prominent monocytosis, with a low neutrophil-to-monocyte ratio in both bone marrow and peripheral blood. In addition, these patients show a relatively higher proportion of immature cells in peripheral blood and a lower neutrophil alkaline phosphatase (NAP) score than those affected by P210 CML (Melo et al., 1994; Roumier et al., 1999).
Therefore, as for P190 positive AML, when the BCR/ABL gene with the m-BCR type of breakpoint is expressed in the early myeloid compartment, the monocytic precursors are consistently included in the neoplastic expansion, whereas the M-BCR breakpoint results in the restriction of the expansion to the granulocytic precursors.
The importance of the length of the BCR sequence in the BCR/ABL fusion gene was corroborated by the CML-N patients with the P230-encoding BCR/ABL gene and the μ-BCR type of breakpoint. CML-N was first described by us in 1996 in five patients as a clinical entity characterized by primary chronic, non progressive leukocytosis (Pane et al., 1996b). The original description required the following criteria: (1) moderate neutrophilic leukocytosis, (2) rare circulating immature myeloid cells without a myelocyte peak, (3) excess marrow mature myeloid cells, and (4) absent or minimal splenomegaly. These first cases suggested that CML-N might have a more benign clinical course than classical CML. This was disputed after the report of three new cases of Ph positive leukemia with the μ-BCR type of breakpoint whose clinical features were similar to those usually observed in classical P210-expressing CML, and of a single case of AML with this type of BCR/ABL gene (Briz et al., 1997; Wilson et al., 1997; Haskovec et al., 1998; Kojima et al., 1999).
Twenty-four patients with the P230-encoding BCR/ABL gene have thus far been reported (Verstovsek et al., 2002). Overall, the clinical features of these patients differ from those of classical CML (Table 3): out of the 24 CML-N patients, 18 were females, 17 had no palpable splenomegaly (in one case this finding was not reported), and only three had WBC counts greater than 100 × 10^9/L. The mean platelet count of all the patients was 661 × 10^9/L and thrombocytosis was present in 16 cases, with five patients presenting with a very high platelet count (>1000 × 10^9/L). Notably, in 15 patients the μ-BCR type of BCR/ABL gene was the sole detectable genetic abnormality (Table 3, upper panel), whereas in nine patients the BCR/ABL gene rearrangement was associated with other cytogenetic abnormalities besides the Ph chromosome (Table 3, lower panel), and at least in the case reported by Wada et al. (1995), the associated chromosomal lesion, the isochromosome 17, was proven to precede the t(9;22) translocation. Interestingly, 14 out of the 15 patients belonging to the first group showed a clinical picture typical of what we have termed CML-N (i.e., a Ph-positive chronic neutrophilic leukemia). Indeed, all but one of these patients had no splenomegaly, the mean WBC count in these patients was rather low (34 × 10^9/L), and 12 of the 15 patients are alive and well, in some instances after a prolonged period of observation (Verstovsek et al., 2002). Only one patient of this group had a blastic transformation, while the other two patients died from unrelated causes (Saglio et al., 1990; Pane et al., 1996b), and the median survival probability of this group of patients is projected at 190 months by actuarial analysis. By contrast, in the remaining nine patients, who had additional cytogenetic abnormalities at presentation of disease, clinical and hematological features were more similar to classical CML: the spleen was enlarged in four of these patients, the mean WBC count was significantly higher (92 × 10^9/L), and a transition to the more advanced phase of
disease was observed in four of this group of patients, who showed a median survival probability of 37 months. Interestingly, all these chromosomal abnormalities have been described in acute myeloid leukemia and, among these, both the trisomy of chromosome 8 and the isochromosome 17, which has been found in three patients, may be the sole genetic lesion in acute myeloid leukemias (Schoch et al., 1997; Fioretos et al., 1999; Paulsson et al., 2001). Furthermore, the hybrid AML/ETO gene of the t(8;21) detected in patient #23 has been proven to transform myeloid precursors (Nucifora and Rowley, 1995).
We then investigated the presence of the P230BCR/ABL protein and the number of P230-encoding transcript copies in the bone marrow cells of all the available CML-N patients (Verstovsek et al., 2002). Interestingly, in all the patients analysed, the level of expression of the P230BCR/ABL protein was always very low and below the detection limit of the very sensitive technique used in this study. The presence of the P230BCR/ABL protein was reported in only one patient (#24 of Table 3); however, his leukemic blasts carried two copies of the Ph chromosome (Haskovec et al., 1998). In addition, in all but two patients, we measured, by using both the real time and
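I. Education
1982.9-1986.6, studied at Hunan Agricultural University, bachelor's degree.
1990.9-1991.6, took graduate courses in analytical chemistry at Wuhan University.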
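2003.9-2003.12, senior visiting scholar, National University of Ireland, Cork College.
II. Work experience
1986.7-2002.7, Hunan Agricultural University, teaching assistant, lecturer, associate professor;
2002.8-2004.2, Hunan Agricultural University, professor;
2004.2-present, Nanjing Agricultural University, professor.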
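III. Awards
1. Creation and application of techniques for regulating the soil microbial community of disease-suppressive soils for cash crops, 2019, Ministry of Agriculture and Rural Affairs, 2018-2019 Shennong China Agricultural Science and Technology Award, First Prize, fourth contributor (2019-KJ013-1-R04)
2. Production of Trichoderma solid inoculum and Trichoderma complete-nutrient bio-organic fertilizer from straw and waste animal protein, 2018, Ministry of Education Technological Invention Award, First Prize (2018-148), second contributor
3. A method for preparing Trichoderma solid inoculum by direct fermentation of crop straw with Trichoderma, and the product so prepared, 2018, National Intellectual Property Administration, China Patent Award, Excellence Award, second contributor
4. Process for the zero-pollution harmless treatment and high-value-added resource utilization of livestock and poultry that died of disease, 2017, National Intellectual Property Administration, China Patent Award, Excellence Award, second contributor
5. Organic Fertilizer and Soil Microorganism Innovation Team, 2015, Ministry of Agriculture, China Agricultural Science and Technology Award, Outstanding Innovation Team Award (equivalent to a first prize for research achievements), sixth contributor (TD2015-R-024-06)
6. Research and extension of the mechanisms of action and key industrialization technologies of organic fertilizers, 2015, State Council, National Science and Technology Progress Award, Second Prize, tenth contributor (2015-J-25101-2-08-R10)
7. Microbial organic fertilizer for overcoming the biological obstacles of continuous cropping in soil, and its new production process, 2011, State Council of the People's Republic of China, National Technological Invention Award, Second Prize, fourth contributor (2011-F-251-2-01-R04)
8. An antagonistic bacterium able to control wilt disease of continuously cropped crops and its microbial organic fertilizer, 2010, National Intellectual Property Administration, China Patent Award, Gold Award, third contributor
9. Research and extension of the mechanisms of action and key industrialization technologies of organic fertilizers, 2013, Ministry of Education Science and Technology Progress Award, First Prize (2013-198), tenth contributor
10. Development and industrialization of microbial organic fertilizer products for overcoming soil continuous-cropping obstacles, 2010, Jiangsu Provincial People's Government, Jiangsu Science and Technology Award, First Prize, fourth contributor (2010-1-17-R4)
11. An antagonistic bacterium able to control wilt disease of continuously cropped crops and its microbial fertilizer, 2009, Jiangsu Intellectual Property Office and Jiangsu Department of Finance, Jiangsu Patent Award, Gold Award, third contributor
IV. Teaching
For many consecutive years has taught the core course "Plant Nutrition" (theory) and "Resources and Environment Analysis Experiments Series II" for the undergraduate major in Agricultural Resources and Environment. To enrich the content of the plant nutrition field practice course, has repeatedly taken part in site selection and in drawing up and updating the practice plan.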
Protein Function Prediction
Presenter: Liu Yan
Sample of homologous sequences
MSA and Family features identification
Secondary structure prediction
Fold recognition and secondary structure alignment
Requires an available crystal structure of another member of the same family (more than 25% identical amino acids)
Comparative modeling of the investigated sequence
Check of physical-chemical constraints
Rotamer libraries, energy minimization, molecular dynamics, PDB loop databases
Binding pocket prediction within the best comparative model
mutagenesis and literature data about binding
Molecular based docking and virtual screening of chemical libraries in the proposed binding region
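The template requirement in the pipeline above (more than 25% identical amino acids) can be sketched as a percent-identity check on a pairwise alignment. This is a minimal illustration, not the method used in the slides: it assumes the two sequences are already aligned with `-` as the gap character and counts identity over columns where neither sequence has a gap (other denominators are also in common use). The function names and toy sequences are hypothetical.

```python
IDENTITY_THRESHOLD = 25.0  # percent, as quoted on the slide


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def usable_as_template(aligned_target: str, aligned_template: str) -> bool:
    """True if identity exceeds the threshold assumed for comparative modeling."""
    return percent_identity(aligned_target, aligned_template) > IDENTITY_THRESHOLD


if __name__ == "__main__":
    target = "MKV-LAGD"    # hypothetical aligned target sequence
    template = "MRVQLSGD"  # hypothetical aligned template sequence
    print(round(percent_identity(target, template), 1))
    print(usable_as_template(target, template))
```

In practice the alignment itself would come from the MSA step earlier in the pipeline, and identity is only a first filter before model quality is checked.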
Finding the sites on a protein surface that are related to its biochemical and cellular functions remains an open problem.
• Sequence-based methods allow the identification of a ligand-binding interaction motif.
• Structure-based methods are reliant on homology and require
T2T level, English, genome: reply
Question: "What is a genome and how does it work?"

Introduction:
The genome is the complete set of genetic material, or DNA, present in an organism. It contains all the information necessary for the development, growth, and functioning of that organism. This article explains what a genome is and how it works, highlighting the components involved and their functions.

1. What is a genome?
A genome is the entire DNA sequence present in an organism. It consists of all the genes, non-coding regions, and regulatory elements that determine the characteristics and traits of an individual. Genomes vary in size and complexity depending on the organism. For example, the human genome consists of approximately 3 billion base pairs, while bacteria have much smaller genomes.

2. Structure of a genome:
A genome is composed of DNA, a long double-stranded helix. DNA is made up of four nucleotides: adenine (A), thymine (T), cytosine (C), and guanine (G). These nucleotides form base pairs in which A pairs with T, and C pairs with G. The order of these base pairs forms the genetic code that carries the instructions for building and maintaining an organism.

3. Genes and their functions:
Genes are segments of DNA that contain instructions for making proteins, which are the building blocks of cells. They provide the blueprint for the structure and function of an organism. Genes influence traits such as eye color, height, and susceptibility to diseases. Each gene consists of a specific sequence of nucleotides that encodes a protein or an RNA molecule.

4. Non-coding regions:
In addition to genes, a genome also contains non-coding regions. These regions do not code for proteins but play essential regulatory roles. They control gene expression by determining when and where genes are turned on or off.
Non-coding regions include enhancers, silencers, and promoters, which interact with specific proteins to regulate gene activity.

5. Genome organization:
Genomes are organized into chromosomes, which are long strands of DNA wrapped around structural proteins called histones. Chromosomes are located within the nucleus of eukaryotic cells and are visible during cell division. They ensure the proper distribution of genetic material to daughter cells. Genes and other regulatory elements are arranged linearly along the chromosomes.

6. Replication and transcription:
DNA replication is the process by which a cell creates an exact copy of its genome. It occurs before cell division and ensures that each daughter cell receives an identical set of chromosomes. Transcription is the process of copying the genetic information from DNA into RNA. It serves as an intermediate step in protein synthesis.

7. Translation and protein synthesis:
Translation is the process by which the genetic information carried by RNA is converted into a sequence of amino acids to form a protein. This process occurs in the ribosomes, where transfer RNA molecules bring the amino acids to the ribosome according to the instructions encoded in the RNA. Proteins have numerous essential functions in cells, including enzymatic activity, structural support, and signaling.

8. Genomics and its importance:
Genomics is the study of genomes and their functions. Advances in genomics have revolutionized many areas of biology and medicine. They have enabled the identification of disease-causing genes, the development of personalized medicine, and the understanding of evolutionary relationships between species. Genomic research is continually uncovering new insights into the complexity of life and helping solve biological mysteries.

Conclusion:
The genome is an intricate and essential component of living organisms. Its discovery and study have transformed our understanding of genetics, biology, and human health.
By unraveling the secrets of the genome, scientists have unlocked the potential for diagnosing and treating diseases, developing new agricultural techniques, and expanding our knowledge of the natural world.
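
The base-pairing, transcription, and translation steps described in sections 2, 6, and 7 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a full implementation: the function names are hypothetical, the codon table is deliberately cut down to a handful of codons from the standard genetic code, and the input is assumed to be the coding (sense) strand, so transcription reduces to replacing T with U.

```python
# Minimal sketch of base pairing, transcription, and translation.
# Function names are illustrative; CODON_TABLE is a small subset of the
# standard genetic code, just enough for the toy sequence below.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

CODON_TABLE = {
    "AUG": "M",  # methionine, also the usual start codon
    "GCU": "A", "GCC": "A",
    "UGG": "W",
    "AAA": "K",
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def complement(dna):
    """Return the base-paired partner of each base (A-T, C-G), position by position."""
    return "".join(COMPLEMENT[base] for base in dna.upper())


def transcribe(coding_strand):
    """Copy the coding strand into mRNA by replacing T with U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Read codons three bases at a time; stop at a stop codon or an unknown codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3])
        if amino_acid is None or amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCCTGGAAATAA"
print(complement(dna))             # partner strand, same reading direction
print(transcribe(dna))             # mRNA copy of the coding strand
print(translate(transcribe(dna)))  # one-letter amino acid sequence
```

A complete translator would carry all 64 codons; the truncated table here only keeps the example short.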
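Biological Big Data (Fujian Agriculture and Forestry University): China University MOOC end-of-chapter answers and final exam question bank, 2023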
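1. Translate "contig".
Answer (any of): 跨叠克隆群; 重叠克隆群; 克隆重叠群; 重叠群; 克隆叠连群 (Chinese renderings of "contig", a set of overlapping clones).
2. A nucleotide variation that changes the encoded amino acid is called a __________ mutation; it can be further divided into missense and nonsense mutations.
Answer: nonsynonymous
3. Which tool does bioinformatics mainly use to store, retrieve, and analyze biological information in life-science research? ( )
Answer: the computer
4. "Proteomics" means ( )
Answer: 蛋白质组学 (the study of proteomes)
5. The scientist known as the "father of bioinformatics" is ( )
Answer: 林华安 (Hwa A. Lim)
6. Using the PubMed literature database, find the first author of the paper "Transgenic plants of Petunia hybrida harboring the CYP2E1 gene efficiently remove benzene and toluene pollutants and improve resistance to formaldehyde".
Answer: Zhang D
7. "Bioinformatics" means ( )
Answer: 生物信息学 (bioinformatics)
8. An InDel at a single site in a nucleic acid sequence causes a ________ mutation in the encoded protein.
Answer: frameshift
9. Genome-wide copy number variation (CNV) takes 5 forms; name one: _________
Answer (any of): deletion; tandem duplication; discontinuous duplication; higher-order duplication; complex copy number variation
10. When the Q value is below ___, the corresponding read should be filtered out.
Answer: 30
11. The ____ database provides the most comprehensive and reliable annotation and is known as the "gold standard" of protein sequence data.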
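
Question 10 refers to discarding reads whose quality falls below Q30. A minimal sketch of such a filter follows. It assumes Phred+33-encoded FASTQ quality strings and uses the mean Phred score of a read as the criterion; both choices, and all function names, are assumptions made for illustration, since the question bank does not say how Q is computed per read.

```python
# Sketch of a Q30 read filter. Assumes Phred+33 quality encoding and a
# mean-quality criterion; real pipelines may trim bases or apply per-base rules.

def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into a list of integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def mean_quality(quality):
    """Average Phred score of one read (the quality string must be non-empty)."""
    scores = phred_scores(quality)
    return sum(scores) / len(scores)


def filter_reads(reads, threshold=30):
    """Keep (sequence, quality) pairs whose mean Phred score is at least threshold."""
    return [(seq, qual) for seq, qual in reads if mean_quality(qual) >= threshold]


reads = [
    ("ACGT", "IIII"),  # 'I' decodes to Q40 under Phred+33
    ("ACGT", "5555"),  # '5' decodes to Q20 under Phred+33
]
print(filter_reads(reads))
```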
Activating protein
Expression in a RabGAP yeast mutant of two human homologues, one of which is an oncogene

Christelle Bizimungu,a,* Nancy De Neve,a Arsène Burny,a Stéphane Bach,a,1 Françoise Bontemps,b Daniel Portetelle,a and Micheline Vandenbol a

a Animal and Microbial Biology Unit, Gembloux Agricultural University, B-5030 Gembloux, Belgium
b Laboratory of Physiological Chemistry, Christian de Duve Institute of Cellular Pathology, Université Catholique de Louvain, B-1200 Brussels, Belgium

Received 25 August 2003

Abstract

The yeast proteins Msb3p and Msb4p are two Ypt/Rab-specific GTPase-activating proteins (GAPs) involved in cell growth polarization. Both proteins share with a wide variety of other proteins the highly conserved TBC domain forming the catalytically active RabGAP domain. In particular, Msb3p and Msb4p are similar to the human proteins oncTre210p (the 786-amino-acid product of the human Tre2 oncogene, implicated in Ewing's sarcoma) and RN-tre (a Rab5-GAP controlling endocytosis of the EGFR). To further understand the biochemical function of the Tre2 oncogene, we expressed its cDNA and, as a control, the RN-tre cDNA, in an msb3 msb4 double mutant yeast strain. Complementation data show that RN-tre can, unlike Tre2, replace the function of the MSB3 and MSB4 genes. As two highly conserved amino acids, including the catalytic arginine, are mutated in the oncTre210p TBC domain, we restored these two amino acids and expressed the modified Tre2 cDNA in the yeast mutant.
© 2003 Elsevier Inc. All rights reserved.

Keywords: MSB3/GYP3; MSB4/GYP4; GTPase-activating protein; Yeast; Tre2 oncogene; RN-tre

Ypt/Rab proteins, which belong to the Ras superfamily of small GTPases, are central regulators of interorganellar traffic in eukaryotic organisms. They seem to be involved in all steps of vesicular transport, i.e., vesicle budding from a donor membrane, movement, tethering/docking, and fusion with target membranes [1–4]. Moreover, Ypt/Rabs seem to be coordinators of the different transport steps and of steps between the transport machinery and other
cellular processes [5]. To date, the Ypt/Rab family counts 60 members in mammalian cells (Rab proteins) and 11 members in yeast (Ypt proteins). Most Ypts have one or more mammalian homologues and share similar functions with them, which indicates that the vesicle traffic mechanism has been conserved through evolution [6]. Like other GTPases, Ypt/Rab proteins switch between two functionally distinct conformations, one GDP-bound and one GTP-bound. The latter is the active or "on" form and interacts with downstream effectors [6]. This functional cycle is regulated by guanine nucleotide exchange factors (GEFs), which enhance an endogenous nucleotide exchange activity, and GTPase-activating proteins (GAPs), which accelerate the slow intrinsic GTP hydrolysis [6]. Stimulation of Ypt/Rab GTPase activity seems to occur through an "arginine finger" mechanism, as described for Ras- and Rho-GAPs [7,8]. In this activation mechanism, the GAP exposes an arginine residue at the top of a loop pointing into the catalytic site. This essential arginine neutralizes negative charges created on the β-phosphate group of GTP when the enzyme is in the transition state. In addition, the GAP arginine stabilizes the catalytically essential glutamine of the GTPase [9].

Abbreviations: DMSO, dimethyl sulfoxide; HPLC, high-performance liquid chromatography; GAP, GTPase-activating protein; GEF, guanine nucleotide exchange factor; EGFR, epidermal growth factor receptor; RN-tre, the so-called "protein related to the N-terminus of Tre"; SD, synthetic dextrose minimal medium; TBC, Tre2/BUB2/Cdc16; YPD, yeast extract peptone glucose medium.
* Corresponding author. Fax: +32-81-61-15-55. E-mail address: bizimungu.c@fsagx.ac.be (C. Bizimungu).
1 Present address: Station Biologique de Roscoff, CNRS, 29682 Roscoff Cedex, Bretagne, France.
0006-291X/$ - see front matter © 2003 Elsevier Inc. All rights reserved. doi:10.1016/j.bbrc.2003.09.051
Biochemical and Biophysical Research Communications 310 (2003) 498–504

The Ypt/Rab-GAPs of yeast, Drosophila,
plants, nematodes, and humans all share a conserved domain called the TBC domain [10,11], which is the catalytically active domain of these Ypt/Rab-GAPs [8,10].

The yeast genes MSB3 and MSB4 code for two proteins with very similar sequences (55% amino acid identity over most of their lengths [12]), structurally related to Ypt/Rab-GAPs. Moreover, in vitro biochemical studies have provided evidence that these proteins are potent GAPs for a number of Ypt GTPases [13,14]. Our previous results have revealed that co-deletion of the MSB3 and MSB4 coding regions (i) causes growth inhibition in the presence of caffeine or DMSO, (ii) increases sensitivity to latrunculin-A, (iii) produces a random budding pattern in diploid cells, and (iv) affects the organization of the actin cytoskeleton [15] and causes the GTP level to increase significantly [our unpublished data]. These results are in agreement with the fact that Msb3p and Msb4p are involved in a pathway linking the activation of Cdc42p to polarization of the cytoskeleton [16].

Two particular TBC-containing homologues of Msb3p and Msb4p are the human proteins RN-tre and oncTre210p. The latter is the 786-aa product of the Tre2 oncogene, originally identified in a Ewing's sarcoma cell line [17]. A recent report shows that the oncoprotein may be part of an effector complex for the Rho-GTPases Cdc42p and Rac1 in actin remodeling [18], but its role in malignant transformation remains unknown. Sequence analysis of Tre2 (Fig. 1A) reveals that the oncogene derives from the chimeric fusion of two genes, NY-REN-60, an ancient, highly conserved gene, and TBC1D3, a more recent gene encoding a TBC domain [19]. RN-tre (for "related to the N-terminus of Tre") is an 828-aa protein highly similar to the Tre2 oncogene product (36% identity and 55% similarity over 462 aa) [20]. It has been identified as a GAP for Rab5, which plays a role in controlling EGF-receptor signaling via internalization of this receptor [21].

Given their sequence similarity to Msb3p and Msb4p, we tested the ability of the human
TBC-containing oncTre210p and RN-tre proteins to complement the msb3 msb4 double mutation for all the phenotypes observed. In the present study, we provide evidence that the Tre2 oncogene codes for a non-functional Ypt/Rab-GAP protein.

Fig. 1. Primary structure analysis of oncTre210p and sequence alignment of the TBC domains of oncTre210p, RN-tre, Msb3p, Msb4p, and Gyp1p. (A) The oncoprotein shows a bipartite structure: its N-terminal region contains the TBC domain and its C-terminal part is identical to the N-terminal part of ubiquitin-specific protease NY-REN-60 [19]. (B) Alignment was performed with the ClustalW program [29] (URL:/ clustalw/) and shared motifs (A–F) as defined by Neuwald [10] were aligned manually. Dark gray boxes indicate strictly conserved amino acids in two or more of the proteins; conservative substitutions are boxed in light gray. The conserved arginine in motif B and glutamine in motif C, mutated, respectively, to threonine and arginine in the oncTre210p sequence, are marked by black circles. Positions of the TBC domains were determined in (A) and (B) with the SMART V3.3 program (URL: http://www.smart.embl-heidelberg.de/).

Materials and methods

Strains, growth conditions, and DNA manipulations. The Saccharomyces cerevisiae strains used in this study are listed in Table 1. Gene disruptions were created in the yeast diploid strain FYBL3 [22] (isogenic with S288C). The msb3 msb4 deletion mutant was created in a previous work [15]. Yeast cells were grown either in YPD complete medium or in SD minimal medium supplemented with the required amino acids [23]. Escherichia coli strain XL-1 Blue was used as the recipient for constructing and propagating the plasmids described in this work. It was selectively grown in LB medium containing 100 µg/ml ampicillin.

Standard methods were used for yeast genetics and recombinant DNA manipulations [23]. Restriction enzymes and other DNA modification enzymes were purchased from
Amersham Biosciences and used as recommended by the manufacturer. Plasmid DNA from E. coli was prepared using the High Pure Plasmid Isolation Kit (Roche Diagnostics Belgium). DNA sequences were determined by the dideoxy chain termination method [24] on double-stranded plasmid DNA, using the T7 Sequenase quick-denature plasmid sequencing kit (USB). Custom oligonucleotides were provided by Sigma.

Plasmids. The plasmids used in this study are listed in Table 1. oncTre210 (cDNA encoding the 786-amino-acid product of the Tre2 oncogene), the "GAP" part of the oncTre210 cDNA (i.e., the 5′ part from bp 1 to bp 1494, containing the TBC domain from bp 289 to bp 945), and the cDNA encoding the 828-amino-acid RN-tre protein were cloned in the ARS/CEN yeast expression vector pYX122 downstream from the triose phosphate isomerase (TPI) promoter and upstream from the coding sequence for the HA epitope, to produce C-terminally HA-tagged proteins. TPI is one of the strongest constitutive promoters in yeast, providing expression levels comparable to those of some of the most abundant yeast enzymes. The vectors obtained, respectively pYX-oncTre210, pYX-GAP, and pYX-RN-tre, were checked by DNA sequencing. MSB3 and MSB4 cDNAs were cloned in the yeast expression vector pYX122 in a previous work [15].

Site-directed mutagenesis. For site-directed mutagenesis, oncTre210 and GAP were cloned in the cloning vector pUC18. Replacement of the threonine (T150) codon with the arginine codon and of the arginine (R187) codon with the glutamine codon was achieved by PCR-mediated mutagenesis (GeneTailor Site-Directed Mutagenesis System, Invitrogen) using the following oligonucleotides as primers: T150R, 5′-cacatcgacctggacgtgaggaggactctccgga-3′ and 5′-cctcacgtccaggtcgatgtggtggatgtgt-3′; R187Q, 5′-aacccggaggtgggctactgccaggacctgagcc-3′ and 5′-gcagtagcccacctccgggttatactccgaa-3′. The oncTre210 and GAP-oncTre210 sequences carrying one or both mutations were then subcloned in the pYX122 vector. The vectors obtained, pYX-onc(T150R), pYX-onc(R187Q), pYX-onc
(T150R/R187Q), pYX-GAP(T150R), pYX-GAP(R187Q), and pYX-GAP(T150R/R187Q), were checked by DNA sequencing.

Complementation experiments. For the complementation tests, the homozygous double-mutant diploid strain FyBLT3YN was transformed with pYX122, pYX-Ynl, pYX-Yol, pYX-oncTre210, pYX-GAP, pYX-RN-tre, pYX-onc(T150R), pYX-onc(R187Q), pYX-onc(T150R/R187Q), pYX-GAP(T150R), pYX-GAP(R187Q), or pYX-GAP(T150R/R187Q). Transformants were selected on histidine-free plates. After exponential growth on YPD rich medium, three dilutions (10^5, 10^4, and 10^3 cells per drop) of each strain were plated on YPD, YPD + 10 mM caffeine, and YPD + 4% DMSO. The plates were incubated for 3 days at 30 °C.

Localization of HA-epitope-tagged proteins. The localization of recombinant epitope-tagged proteins was determined as described by Adams et al. [25], after fixing the cells by adding formaldehyde directly to the culture medium (final concentration: 3.7%). For "one-step" detection, a rhodamine-conjugated monoclonal anti-HA mouse antibody was used (purchased from Roche Diagnostics, Belgium) and the stain was detected by fluorescence microscopy with a Nikon Eclipse E800 microscope.

HPLC analysis of nucleotides and statistical analyses. The method used, described previously by the team of Bontemps [26], was adapted to yeast cells. To measure intracellular nucleotides, 10^9 exponentially growing cells were harvested rapidly by centrifugation at 4 °C and the cell pellet was washed twice in 1 ml PBS (0.8% NaCl; 0.02% KCl; 0.15% Na2HPO4·2H2O; and 0.02% KH2PO4) before adding 220 µl of 1 N HClO4. The supernatant obtained after centrifugation was neutralized with 3 M K2CO3. Throughout the experiment, the samples were maintained at 4 °C. Nucleotides were separated by HPLC on a 110 × 4.7 mm Partisphere 5 SAX (Whatman) column by the method of Hartwick and Brown [27], modified by Vincent et al. [26]. UV detection of nucleotides was performed at 254 nm. All results are expressed as means ± SD.

Results

Expression of the RN-tre and Tre2 cDNA sequences in yeast

The structural similarity between
oncTre210p, RN-tre, Msb3p, and Msb4p suggests that the human proteins are functionally related to the yeast Ypt/Rab-GAPs Msb3p and Msb4p. Because it has been shown previously that the msb3 msb4 double-mutant yeast cells display particular phenotypes (see Introduction), we tested the ability of oncTre210p and RN-tre to complement functionally the combined msb3 and msb4 yeast mutations.

Table 1
Saccharomyces cerevisiae strains and plasmids used in this study

Strain | Relevant genotype | Reference
FYBL3 | MATa/MATα ura3Δ851/ura3Δ851 trp1Δ63/+ leu2Δ1/leu2Δ1 his3Δ200/his3Δ200 lys2Δ202/+ | [22]
FyBLT3YN | MATa/MATα ura3Δ851/ura3Δ851 trp1Δ63/trp1Δ63 leu2Δ1/leu2Δ1 his3Δ200/his3Δ200 lys2Δ202/lys2Δ202 msb3-Δ1::TRP1/msb3-Δ1::TRP1 msb4-Δ1::kanR/msb4-Δ1::kanR | [15]

Plasmid | Characteristics | Reference or source
pYX122 | Low-copy (ARS/CEN), HIS3 marker gene, TPI promoter | Ingenius, R&D Systems, Europe
pYX-Ynl | Yeast MSB3 in pYX122 | [15]
pYX-Yol | Yeast MSB4 in pYX122 | [15]
pYX-oncTre210 | Human oncTre210 in pYX122 | See text
pYX-GAP | Human GAP in pYX122 | See text
pYX-RN-tre | Human RN-tre in pYX122 | See text
pYX-onc(T150R) | Human oncTre210 with correct catalytic arginine in pYX122 | See text
pYX-onc(R187Q) | Human oncTre210 with correct stabilizing glutamine in pYX122 | See text
pYX-onc(T150R/R187Q) | Human oncTre210 with both correct catalytic arginine and stabilizing glutamine in pYX122 | See text
pYX-GAP(T150R) | Human GAP with correct catalytic arginine in pYX122 | See text
pYX-GAP(R187Q) | Human GAP with correct stabilizing glutamine in pYX122 | See text
pYX-GAP(T150R/R187Q) | Human GAP with both correct catalytic arginine and stabilizing glutamine in pYX122 | See text

As shown in Fig. 1A, the TBC domain of the oncoprotein (786 aa) is located N-terminally (amino acids 97–315), whereas its C-terminal region (amino acids 500–786) shares similarity to ubiquitin-specific proteases [19,28]. To test the functional contribution of the latter portion, we constructed a C-terminally truncated protein (amino acids 1–499) and
named it GAP because it contains essentially the TBC domain identified as catalytically active in Ypt/Rab-GAPs. The double-mutant strain FyBLT3YN was transformed with a low-copy plasmid bearing an insert encoding HA-tagged RN-tre, oncTre210, or GAP (respectively, the pYX-RN-tre, pYX-oncTre210, or pYX-GAP plasmid).

Prior to the complementation experiment, we examined the expression of the human cDNA sequences in yeast. Immunostaining of the transformants with anti-HA antibody yielded detectable signals for all three proteins, indicative of their correct synthesis in yeast mutants (data not shown). We next examined the phenotypes of the double-mutant yeast strain transformed with each of the plasmids containing one of the three human cDNAs. As controls, we used the diploid wild-type strain transformed with the empty expression vector and the diploid double-mutant strain FyBLT3YN with the empty expression vector or carrying the MSB3 or MSB4 gene.

Previous results have shown that the msb3 msb4 double mutation causes growth inhibition when caffeine or DMSO is added to the culture medium [15]. Here, strains expressing either the complete Tre2 oncogene or its GAP region showed similar growth inhibition. On the other hand, the cell growth defect was suppressed when RN-tre was synthesized (Fig. 2).

Co-deletion of the MSB3 and MSB4 coding regions also affects cell growth polarization: double-mutant cells are more round and less regular in shape than normal and display a random budding pattern [15,16]. Therefore, we observed transformant morphologies under the microscope (Fig. 3). Cells expressing RN-tre again showed a normal ovoid shape with no "multi-budding" pattern, whilst cells synthesizing the oncoprotein or its GAP portion retained the morphological abnormalities of the untransformed double mutant.

Finally, the msb3 msb4 double mutation causes the total GTP level to increase significantly (approximately 3-fold; our results not yet published) in diploid strains. This surprising phenotype was also explored in the
three yeast transformants (Fig. 4). In agreement with the results presented above, we found that the GTP concentration measured in pYX-oncTre210- or pYX-GAP-transformed cells did not differ significantly from that of the double-mutant strain. In contrast, cells expressing RN-tre had a normal GTP level.

Fig. 2. Analysis of the growth behavior of the yeast msb3 msb4 double mutant expressing human oncTre210, GAP, or RN-tre cDNA. After exponential growth on YPD rich medium, three dilutions (10^5, 10^4, and 10^3 cells per drop) of each strain were plated on YPD, YPD + 10 mM caffeine, and YPD + 4% DMSO. The controls were the wild-type (FYBL3) and double-mutant (FyBLT3YN) strains transformed with the empty expression vector pYX122 and the double-mutant strain synthesizing Msb3p (pYX-Ynl) or Msb4p (pYX-Yol). The mutant cells synthesizing oncTre210p, GAP, or RN-tre harbored, respectively, the pYX-oncTre210, pYX-GAP, or pYX-RN-tre vector. The plates were incubated for 3 days at 30 °C.

Fig. 3. Cell morphology of the yeast msb3 msb4 double mutant transformed with a plasmid carrying the human oncTre210, GAP, or RN-tre cDNA. Wild-type and double-mutant cells were grown in YPD rich medium and viewed with a microscope when in the exponential growth phase.

Taken together, these results show that the human RN-tre protein can totally replace the function of the two yeast MSB3 and MSB4 genes, but that the oncoprotein or its GAP region cannot. These results indicate that oncTre210p has no GAP activity when expressed in yeast. The Tre2 oncogene thus seems to code for a non-functional Ypt/Rab-GAP protein.

Two catalytically important residues are mutated in the TBC domain of the oncoprotein

Alignment of oncTre210p with other Ypt/Rab-GAP sequences reveals two conserved residues that are mutated within its TBC domain (Fig. 1B): a threonine replaces the catalytic arginine (residue 150) and a critical glutamine is changed to arginine (residue 187). The catalytic
arginine is essential to GAP activity; its mutation to either alanine or lysine leads to significant or complete inactivation of GAP proteins [7,30,31]. The critical glutamine plays a role in "arginine finger" stabilization. In fact, the known three-dimensional structure of Gyp1p [8] shows that this glutamine (Q378) forms a hydrogen bond with a highly conserved aspartate (D340), which stabilizes the catalytic arginine (R343) via a salt bridge (Fig. 5A). Using the Swiss-PdbViewer program [32], we replaced arginine R343 and glutamine Q378 in the three-dimensional structure with a threonine and an arginine, respectively, as in the oncTre210p sequence. In the resulting model (Fig. 5B), the conserved aspartate appeared no longer to form a salt bridge with the threonine replacing the catalytic arginine (R343T), but to interact via a hydrogen bond with the arginine replacing the stabilizing glutamine (Q378R). These two mutations might thus explain the oncoprotein's loss of activity.

To further explore the functional contribution of these two mutations, we restored one or both conserved residues in the entire oncTre210p sequence and in the GAP partial sequence by site-directed mutagenesis. As our initial complementation experiments had shown that neither the complete oncTre210p nor its GAP part showed any GAP activity when synthesized in an msb3 msb4 double-mutant yeast cell (see above), we tested the ability of the corrected proteins to replace functionally the yeast genes MSB3 and MSB4. The double-mutant strain FyBLT3YN was transformed with a low-copy vector carrying one of the six modified cDNAs encoding HA-tagged proteins (plasmid pYX-onc(T150R), pYX-onc(R187Q), pYX-onc(T150R/R187Q), pYX-GAP(T150R), pYX-GAP(R187Q), or pYX-GAP(T150R/R187Q)). Before examining the phenotypic consequences of the mutations, we checked the expression of the modified cDNAs by immunostaining the transformants. In all cases, a fluorescent signal was detected, demonstrating the synthesis of the human proteins in yeast (data
not shown). Then we observed the phenotypes of the six transformants. All of the FyBLT3YN transformants expressing a "corrected" oncTre210 or GAP, whether one or both critical amino acids were restored, showed the same growth inhibition in the presence of caffeine or DMSO (Fig. 6) and the same morphological anomalies (data not shown) as the untransformed double mutant. In other words, none of the six corrected proteins could phenotypically complement the double mutation. These results suggest that the oncoprotein's loss of activity is not due solely to the presence of the two mutations in its TBC domain.

Fig. 4. Intracellular GTP concentrations in the diploid msb3 msb4 mutant strain synthesizing human oncTre210p, GAP, or RN-tre proteins. Strains transformed with pYX-oncTre210, pYX-GAP, or pYX-RN-tre were grown in YPD. Cell extracts were prepared as described under Materials and methods. Controls are the wild-type and double mutant, both transformed with the empty expression vector, and the double-mutant strain synthesizing yeast Msb3p (pYX-Ynl) or Msb4p (pYX-Yol). The data in each column are means of three independent determinations and the error bars represent standard errors on the means.

Fig. 5. Structural model of the oncoprotein active site. (A) Structure of a part of the Gyp1p active site containing the essential arginine (R343) and glutamine (Q378). This protein is the first Ypt-GAP whose crystal structure has been determined [8]. (B) Model of the oncTre210p active site. The catalytically active arginine is mutated to threonine (R343T) and the conserved glutamine to arginine (Q378R). These two figures were generated with the Swiss-PdbViewer program [32] (URL:/spdbv/).

Discussion

Primary structure analyses of oncTre210p have revealed its similarity to Ypt/Rab-GAP proteins, and notably a high degree of similarity to the yeast proteins Msb3p and Msb4p, two GAPs involved in polarization of the actin cytoskeleton
through a Cdc42p pathway [13,14,16]. Both Msb3p and Msb4p share the catalytically active TBC domain that characterizes Ypt/Rab-GAPs. As shown in the alignment presented in Fig. 1B, oncTre210p also contains the highly conserved TBC domain. So one might expect the human oncoprotein to be a GAP and to perform the same function(s) as these two proteins. Interestingly, this work has shown that this is not the case and suggests that the oncoprotein lacks catalytic activity. As a control, we showed that another human homologue of Msb3p and Msb4p, highly similar to oncTre210p, can totally replace the two yeast Ypt/Rab-GAPs. This protein is RN-tre, a GAP for the Rab5 GTPase involved in EGF-receptor endocytosis. RN-tre also controls a signaling pathway leading to actin cytoskeletal remodeling via its interaction with Eps8, a cellular mediator of EGFR [33].

These two observations raise a question. Why is the oncoprotein inactive? Ypt/Rab-GAPs seem to exert their activity through an "arginine finger" mechanism. In this activation mechanism, a highly conserved arginine is essential to GAP activity [7,8]. For example, its mutation to alanine in RN-tre or in Gyp proteins leads to complete or significant loss of GAP activity [7,21,30,31]. As shown in Fig. 1B, the catalytic arginine is conserved in RN-tre and other Ypt-GAPs, but replaced by a threonine in the oncTre210p TBC domain. In addition, a conserved glutamine involved in "arginine finger" stabilization is mutated to arginine in the oncoprotein sequence. These two mutations are likely to play a role in the loss of oncTre210p GAP activity, but our study shows that these two substitutions alone do not account for the inability of the human oncoprotein to function as a GAP in yeast.

In the Ypt-GAPs Gyp1p, Gyp7p, and Gyp5p, the TBC domain is located C-terminally and a segment of variable length follows it. It has been shown that this segment, or a part of it, is essential to the catalytic activity of the protein [7,31]. In Gyp1p, for example, deletion of the 104 amino
acids downstream from the TBC domain renders the GAP totally inactive [7]. The three-dimensional structure of Gyp1p reveals that this C-terminal segment contains an α-helix, which is probably important for maintaining the tertiary architecture of the GAP. Moreover, the solvent-exposed residues of this α-helix might be involved in binding of the GTPase and might determine selectivity for members of the Ypt/Rab-GTPase family [8]. Perhaps the C-terminal region of the GAP part of oncTre210p is unfit for correct folding or GTPase binding. This hypothesis could be checked by replacing the C-terminal region following the TBC domain in oncTre210p with that of the active GAP RN-tre.

To be active, some proteins require recruitment to a particular cell component, interaction with one or more other proteins or co-factors, correct cellular localization... For example, the Rab5-GAP RN-tre requires association with Grb2 in order to exert its action on Rab5 and EGF-receptor internalization [34]. A next step towards understanding the function of the oncoprotein will be to seek its partner(s) in human cancer cells. An explanation for the lack of oncTre210p activity in yeast might be the absence of suitable partners or co-factors required to render it functional. In addition, characterization of human target protein(s) will probably shed light on elements contributing to the oncoprotein's carcinogenic action and on the pathway where it is involved.

Fig. 6. Analysis of the growth behavior of the yeast msb3 msb4 double mutant expressing modified oncTre210 or GAP cDNA. After exponential growth on YPD rich medium, three dilutions (10^5, 10^4, and 10^3 cells per drop) of each strain were plated on YPD, YPD + 10 mM caffeine, and YPD + 4% DMSO. The controls were the wild-type (FYBL3) and double-mutant (FyBLT3YN) strains transformed with the empty expression vector pYX122. Mutant cells synthesizing a "corrected" oncTre210p or GAP where one or both conserved amino acids are restored harbored one of the following vectors: pYX-onc(T150R), pYX-onc(R187Q), pYX-onc(T150R/R187Q), pYX-GAP(T150R), pYX-GAP(R187Q), or pYX-GAP(T150R/R187Q). The plates were incubated for 3 days at 30 °C.

Acknowledgments

We thank M. Prévot (Animal and Microbial Biology Unit, Gembloux, Belgium) for his excellent technical assistance in yeast genetics. We are grateful to A. Delacauw (Christian de Duve Institute of Cellular Pathology, Brussels, Belgium) for crucial help with the HPLC analyses. We also thank Prof. Di Fiore (European Institute of Oncology, Milan, Italy) for the RN-tre cDNA and Dr. I. Callebaut (Jussieu, Paris, France) for her help in the use of the Swiss-PdbViewer program. This work was supported by the Belgian "Fonds National de la Recherche Scientifique" (FNRS). C. Bizimungu is a recipient of an "aspirant FNRS" fellowship.

References

[1] F. Schimmöller, I. Simon, S.R. Pfeffer, Rab GTPases, directors of vesicle docking, J. Biol. Chem. 273 (1998) 22161–22164.
[2] J.A. Hammer III, X.S. Wu, Rabs grab motor: defining the connections between Rab GTPases and motor proteins, Curr. Opin. Cell. Biol. 14 (2002) 69–75.
[3] M.J. Tuvim, R. Adachi, S. Hoffenberg, B.F. Dickey, Traffic control: Rab GTPases and the regulation of interorganellar transport, News Physiol. Sci. 16 (2001) 56–61.
[4] J.S. Rodman, A. Wandinger-Ness, Rab GTPases coordinate endocytosis, J. Cell Sci. 113 (2000) 183–192.
[5] N. Segev, Ypt and Rab GTPases: insight into functions through novel interactions, Curr. Opin. Cell Biol. 13 (2001) 500–511.
[6] H. Stenmark, V.M. Olkkonen, The Rab GTPase family, Gen. Biol. 2 (2001) 3007.1–3007.7.
[7] S. Albert, E. Will, D. Gallwitz, Identification of the catalytic domains and their functionally critical arginine residues of two yeast GTPase-activating proteins specific for Ypt/Rab transport GTPases, EMBO J. 18 (1999) 6216–6225.
[8] A. Rak, R. Fedorov, K. Alexandrov, S. Albert, R.S. Goody, D. Gallwitz, A.J. Scheidig, Crystal structure of the GAP domain of Gyp1p: first insight into interaction with Ypt/Rab proteins, EMBO J. 19 (2000) 5105–5113.
[9] K. Scheffzek, M.R. Ahmadian, A. Wittinghofer, GTPase-activating proteins: helping hands to complement an active site, TIBS 23 (1998) 257–262.
[10] A.F. Neuwald, A shared domain between a spindle assembly checkpoint protein and Ypt/Rab-specific GTPase-activators, TIBS 22 (1997) 243–244.
[11] S.D. Zhang, J. Kassis, B. Olde, D.M. Mellerick, W.F. Odenwald, Pollux, a novel Drosophila adhesion molecule, belongs to a family of proteins expressed in plants, yeast, nematodes and man, Genes Dev. 10 (1996) 1108–1119.
[12] L. Iwanejko, K.N. Smith, S. Loeillet, A. Nicolas, F. Fabre, Disruption and functional analysis of six ORFs on chromosome XV: YOL117w, YOL115w (TRF4), YOL114c, YOL112w (MSB4), YOL111c and YOL072w, Yeast 15 (1999) 1529–1539.
[13] S. Albert, D. Gallwitz, Two new members of a family of Ypt/Rab GTPase activating proteins, J. Biol. Chem. 274 (1999) 33186–33189.
[14] S. Albert, D. Gallwitz, Msb4p, a protein involved in Cdc42p-dependent organization of the actin cytoskeleton, is a Ypt/Rab-specific GAP, Biol. Chem. 381 (2000) 453–456.
[15] S. Bach, O. Bouchat, D. Portetelle, M. Vandenbol, Co-deletion of the yeast MSB3 and MSB4 coding regions affects bipolar budding and perturbs the organisation of the actin cytoskeleton, Yeast 16 (2000) 1015–1023.
[16] E. Bi, J.B. Chiavetta, H. Chen, G.G. Chen, C.S.M. Chan, J.R. Pringle, Identification of novel, evolutionarily conserved Cdc42p-interacting proteins and of redundant pathways linking Cdc24p and Cdc42p to actin polarization in yeast, Mol. Biol. Cell 11 (2000) 773–793.
[17] T. Nakamura, J. Hillova, R. Mariage-Samson, M. Hill, Molecular cloning of a novel
oncogene by DNA recombination during transfection,Oncogene Res.2(1988)357–370.[18]J.M.Masuda-Robens,S.N.Kutney,H.Qi,M.M.Chou,TheTRE17oncogene encodes a component of a novel effector pathway for Rho GTPases Cdc42and Rac1and stimulates actin remodeling,Mol.Cell.Biol.23(2003)2151–2161.[19]C.A.Paulding,M.Ruovolo, D.A.Haber,The Tre2(USP6)oncogene is a hominoid-specific gene,A 100(2003)2507–2511.[20]B.Matoskova,W.T.Wong,N.Seki,T.Nagase,N.Nomura,K.C.Robbins,P.P.DI Fiore,RN-tre identifies a family of tre-related proteins displaying a novel potential protein binding domain, Oncogene12(1996)2563–2571.[21]nzetti,V.Rybin,M.G.Malabarba,S.Christoforidis,G.Scita,M.Zerial,P.P.Di Fiore,The Eps8protein coordinates EGF receptor signalling through Rac and trafficking through Rab5, Nature408(2000)374–377.[22]C.Fairhead,B.Llorente,F.Denis,M.Soler,B.Dujon,Newvectors for combinatorial deletions in yeast chromosomes and for gap-repair cloning using‘split-marker’recombination,Yeast12 (1996)1439–1457.[23]F.M.Ausubel,R.Brent,R.E.Kingston, D.D.Moore,J.G.Seidman,J.A.Smith,K.Struhl,Current Protocols in Molecular Biology,Wiley,New York,1987.[24]F.Sanger,S.Nicklen, A.R.Coulson,DNA sequencing withchain-terminating inhibitors,A74 (1977)5463–5467.[25]A.Adams,D.E.Gottschling,C.A.Kaiser,T.Stearns,Methods inYeast Genetics,Cold Spring Harbor Laboratory Press,New York,1997.[26]M.F.Vincent,F.Bontemps,G.Van den Berghe,Inhibition ofglycolysis by5-amino-4-imidazolecarboxamide riboside in iso-lated rat hepatocytes,Biochem.J.281(1992)267–272.[27]R.A.Hartwick,P.R.Brown,The performance of microparticlechemically-bonded anion-exchange resins in analysis of nucleo-tides,J.Chromatogr.112(1975)651–662.[28]F.R.Papa,M.Hochstrasser,The yeast DOA4gene encodes adeubiquinating enzyme related to a product of the human tre-2 oncogene,Nature366(1993)313–319.[29]D.Higgins,J.Thompson,T.Gibson,J.D.Thompson, D.G.Higgins,T.J.Gibson,CLUSTAL W:improving the sensitivity of progressive multiple sequence alignment through sequence 
weight-ing,position-specific gap penalties and weight matrix choice, Nucleic Acids Res.22(1994)4673–4680.[30]E.Will,D.Gallwitz,Biochemical characterization of Gyp6p,aYpt/Rab-specific GTPase-activating protein from yeast,J.Biol.Chem.276(2001)12135–12139.[31]A.De Antoni,J.Schmitzov a,H.H.Trepte, D.Gallwitz,S.Albert,Significance of GTP hydrolysis in Ypt1p-regulated endoplasmic reticulum to Golgi transport revealed by the analysis of two novel Ypt1p-GAPs,J.Biol.Chem.277(2002) 41023–41031.[32]N.Guex,M.C.Peitsch,SWISS-MODEL and the Swiss-Pdb-Viewer:an environment for comparative protein modeling, Electrophoresis18(1997)2714–2723.[33]P.P.Di Fiore,G.Scita,Eps8in the midst of GTPases,Int.J.Biochem.Cell Biol.34(2002)1178–1183.[34]L.Martinu, A.Santiago-Walker,Q.Hongwei,M.M.Chou,Endocytosis of epidermal growth factor receptor regulated by Grb2-mediated recruitment of the Rab5GTPase-activating pro-tein RN-tre,J.Biol.Chem.277(2002)50996–51002.504 imungu et al./Biochemical and Biophysical Research Communications310(2003)498–504。
Brucella melitensis BtpA and BtpB Gene Deletion Mutant Vectors

Chinese Journal of Animal Infectious Diseases, 2022, 30(6): 19-26 · Research Article
Received: 2020-05-25
Funding: Development and clinical trial of a novel marker inactivated vaccine against animal brucellosis (20200402054NC); National Major Scientific Research Project of the 13th Five-Year Plan (2016YFD0500900)
About the first author: QIAO Lianjiang, male, master's student, preventive veterinary medicine
Corresponding author: YANG Yanling, E-mail: ********************

Construction of the Brucella melitensis BtpA and BtpB Gene Deletion Mutant Vectors and Bioinformatics Analysis
QIAO Lianjiang, ZHANG Ping, ZHOU Yucheng, YANG Sen, YANG Yanling
(Institute of Special Economic Animal and Plant Sciences, CAAS, Changchun 130000, China)

Abstract: To further study the molecular functions of BtpA and BtpB, effector proteins of the Brucella type IV secretion system (T4SS), this study set out to construct gene deletion mutant vectors for the BtpA and BtpB genes of Brucella standard strain 16M and to make a preliminary prediction of their biological functions. Using the Brucella melitensis genome as the template, primers for the upstream and downstream homologous arms of BtpA and BtpB were designed, and the fragments to be fused were obtained by PCR. The fragments were joined to the linearized vector PBK-CMV-SacB by seamless (In-Fusion) cloning, followed by transformation, screening of positive vectors, PCR identification, and verification by DNA sequencing. BtpA and BtpB were then analyzed with bioinformatics software. PCR identification and sequencing showed that the upstream and downstream fragments of both genes had been ligated into the suicide vector, with fragment sizes and sequences as expected. Bioinformatics analysis showed that the BtpA and BtpB sequences share 99% homology, carry no signal peptide, have secondary structures dominated by α-helices, and show good reactogenicity. These results indicate that the B. melitensis BtpA and BtpB gene deletion mutant vectors were constructed successfully and that the structure and function of the genes were predicted, laying a foundation for further study of the pathogenic mechanism of Brucella.

Key words: Brucella; type IV secretion system (T4SS); In-Fusion cloning; bioinformatics
CLC number: S858.31  Document code: A  Article ID: 1674-6422(2022)06-0019-08

Brucellosis is a serious zoonotic infectious disease caused by Brucella spp. and ranks first among the Class B notifiable infectious diseases in China [1].
Helicobacter pylori HcpC

Clinical and Diagnostic Laboratory Immunology, July 2003, p. 542–545, Vol. 10, No. 4
1071-412X/03/$08.00+0 DOI: 10.1128/CDLI.10.4.542–545.2003
Copyright © 2003, American Society for Microbiology. All Rights Reserved.

Detection of High Titers of Antibody against Helicobacter Cysteine-Rich Proteins A, B, C, and E in Helicobacter pylori-Infected Individuals

Peer R. E. Mittl,1* Lucas Lüthy,1 Christoph Reinhardt,1 and Hellen Joller2
Biochemical Institute, University of Zürich, CH-8057 Zürich,1 and Institute for Clinical Immunology, University of Zürich, 8044 Zürich,2 Switzerland

Received 24 October 2002/Returned for modification 22 January 2003/Accepted 4 April 2003

The family of Helicobacter cysteine-rich proteins (Hcp) constitutes one of the largest protein families that are specific for proteobacteria from the delta/epsilon subgroup. Most of the proteins belonging to this family have so far only been recognized on the genome level. To investigate the expression of Hcp proteins in vivo we analyzed titers of antibody against HcpA (HP0211), HcpB (HP0336), HcpC (HP1098), and HcpE (HP0235) in sera from 30 Helicobacter pylori-positive individuals and in a control group of six H. pylori-negative individuals. Significantly higher titers of antibody were observed for H. pylori-positive individuals (P < 0.00005). The highest and lowest titers were observed for HcpC (Δmean = 1.06) and HcpB (Δmean = 0.333), respectively. There is a clear correlation among anti-HcpA, -HcpC, and -HcpE immunoglobulin G titers in H. pylori-positive individuals (correlation > 0.7), but there is only a weak correlation for HcpB (correlation < 0.4). These results confirm that Hcp proteins are expressed by H. pylori under natural environmental conditions and that these proteins are recognized by the immune system of the host. The observed correlations are in agreement with the expected distribution of Hcp proteins among H. pylori strains. HcpA, HcpC, and HcpE are present in the genomes of strains 26695 and J99, whereas HcpB is absent from most strains. Since Hcp proteins are specific for H. pylori, immunological assays including Hcp proteins might be of value to detect H. pylori infection and perhaps to distinguish among different groups of H. pylori-positive patients.

The availability of complete genome data offers a great opportunity to improve our knowledge of pathogenic microorganisms. The genome sequence confirms the presence of a particular open reading frame (ORF), but whether the corresponding gene product is expressed and whether it has an impact on the pathogenicity of the microorganism often remains elusive. The family of Helicobacter cysteine-rich proteins (Hcp) is one of the largest protein families that is specific for proteobacteria from the delta/epsilon subgroup. This family consists of ORFs HP0160, HP0211, HP0235, HP0336, HP0628, HP1098, HP1117, JHP0318, JHP1437, and CJ0413, which share between 22 and 66% sequence identity on the protein level. In this work we follow the nomenclature that was initiated by Cao and coworkers (3) and has later been expanded over the entire protein family (10).

In Helicobacter pylori strains 26695 (14) and J99 (1), all ORFs besides HP0336, JHP1437, and JHP0318 are conserved. The loci for HP0336 and JHP0318 were investigated in nine H. pylori strains that were isolated from individuals who suffered from gastric carcinoma, duodenal ulcers, and chronic gastritis. HP0336 was found to be absent from all strains, whereas JHP0318 was detected in five strains (4).

The recent crystal structure analysis of HcpB (HP0336) (10) confirms the modular architecture of Hcp proteins that was already predicted from the protein sequence (11). Although it was shown that HcpA (HP0211), HcpB, and HcpD (HP0160) have penicillin-binding activities (9, 10, 11), their in vivo functions are unknown. HcpD was previously isolated from H. pylori membrane fractions with an ampicillin affinity resin (9).
Antibodies against HcpA were raised by using the supernatant of H. pylori cultures, and the gene was subsequently cloned, confirming that the protein was expressed and secreted (3). It was also shown that HcpA induced IFN-γ expression in a mouse splenocyte system (6). HcpC (HP1098) was found to be associated with H. pylori membranes, but a function was not assigned (2, 13). In a comprehensive immunoproteomics study, the antibody responses against H. pylori strain 26695 proteins were analyzed in sera from patients with different clinical manifestations (7). Increased titers of antibody against HcpC were detected primarily in the sera from patients suffering from gastric cancer (7). HcpE (HP0235) has so far only been identified on the genome level. Data on the gene product have not been reported so far.

Using the recombinantly expressed proteins HcpA, HcpB, HcpC, and HcpE, we detected immunoglobulin G (IgG) antibodies against these proteins in the sera of H. pylori-infected individuals, showing that these proteins are expressed under native conditions and that they are recognized by the immune system of the host. Since these proteins are specific for H. pylori, Hcp proteins might be useful targets for the diagnosis of H. pylori infections.

* Corresponding author. Mailing address: Biochemical Institute, University of Zurich, Winterthurer Strasse 190, CH-8057 Zurich, Switzerland. Phone: 41-1-6356559. Fax: 41-1-6356834. E-mail: mittl@bioc.unizh.ch.

MATERIALS AND METHODS

Recombinant expression of HcpA, HcpB, HcpC, and HcpE. Hcp proteins were expressed, refolded, and purified under conditions similar to those reported previously for HcpA and HcpB (10, 11). Briefly, the ORFs HP0211, HP0235, HP0336, and HP1098 were amplified by PCR from the genomic DNA of H. pylori strain 26695 (American Type Culture Collection), and the amplification products were inserted into a pTFT74 expression vector. The PCR also introduced a His6 tag at the C-terminal ends of the proteins. Competent Escherichia coli BL21(DE3) cells were transformed, and the cells were grown at 37°C. Three hours after induction with 1 mM isopropyl-β-D-thiogalactopyranoside, cells were harvested and broken up by passing the resuspended pellet over a French press. Inclusion bodies were collected by centrifugation (15 min, 20,000 × g, 4°C), and the soluble fraction was discarded. The pellet was washed two times with buffer A (0.1 M Tris-HCl, 20 mM EDTA [pH 6.8]) and subsequently with buffer B (0.5 M GdnCl [guanidine hydrochloride] in buffer A). Inclusion bodies were solubilized in buffer C (5 M GdnCl, 0.2 M Tris-HCl, 0.1 M dithiothreitol, 10 mM EDTA [pH 8.0]), and insoluble material was removed by centrifugation. Solubilized inclusion bodies were dialyzed overnight against buffer D (5 M GdnCl, 0.1 M acetic acid). Hcp proteins were refolded by immobilizing the solubilized inclusion bodies on Ni-nitrilotriacetic acid-agarose (Qiagen). The column (5 to 10 ml) was washed with 50 ml of buffer E (5 M GdnCl, 0.1 M Tris [pH 8.0]), and Hcp proteins were refolded by replacing buffer E immediately with buffer F (50 mM Tris-HCl, 150 mM sodium chloride, 5 mM glutathione [pH 8.0]) and washing the column with 50 ml of buffer F at a flow rate of 1 ml/min. The protein was eluted with buffer G (250 mM imidazole, 50 mM Tris-HCl, 150 mM sodium chloride, 5 mM glutathione [pH 7.0]). Protein-containing fractions were pooled and dialyzed against 1,000 ml of buffer H (40 mM sodium acetate, 1 mM EDTA [pH 5.5]). Buffer H was also used for gel-permeation chromatography. After the protein was concentrated in a Centriprep (Millipore), 1 mg of refolded Hcp protein was loaded onto a Superdex 75 HR 10/30 column (Amersham Pharmacia) and eluted as a single peak at a flow rate of 0.5 ml/min.

Investigation of patients. Thirty patients who were serologically positive for H. pylori were selected for this study. Thirteen patients were men (mean age, 49.1 years; range, 19 to 78 years), and 17 patients were women (mean age, 44.5 years; range, 18 to 78 years). The mean age of all patients was 46.8 years (range, 18 to 78 years). Identical serologic analyses were performed on samples from six H. pylori-seronegative patients (3 women and 3 men). The mean age was 50.9 years (range, 15 to 71 years). One sample was taken from each patient. All serologic assays were performed in the laboratory of Clinical Immunology, University Hospital Zurich, which is accredited by the Swiss Federal Office of Metrology and Accreditation, Swiss Accreditation Service. Anti-H. pylori antibodies were measured with a commercially available enzyme immunoassay (Synelisa H. pylori [IgG] Abs; Pharmacia and Upjohn, Freiburg, Germany) by following the guidelines of the manufacturer.

ELISA conditions. Anti-HcpA, -HcpB, -HcpC, and -HcpE IgG antibodies were detected by enzyme immunoassay as follows. The antigens (0.2 µg/ml) (HcpA, HcpB, HcpC, and HcpE) in 0.1 M carbonate-bicarbonate buffer (pH 9.6) were coated overnight at 4°C onto 96-well microtiter plates (Immulon 2; Dynatech Labs, Chantilly, Va.). The plates were blocked with 1% bovine serum albumin in phosphate-buffered saline (PBS) at room temperature for 60 min. Negative and low- and high-positive controls from Pharmacia as well as serum samples were diluted 1:101 in PBS–0.05% Tween 20. After the samples were added, the plates were incubated at room temperature for 60 min. After four washes, horseradish peroxidase-conjugated rabbit anti-human IgG (DAKO Diagnostics A/S, Copenhagen, Denmark) diluted 1:2,500 in PBS–0.05% Tween 20 was added and incubated at room temperature for 60 min. After four washes, o-phenylenediamine hydrochloride solution was added and the color was allowed to develop at room temperature in the dark. The reaction was stopped with 4 M sulfuric acid, and the optical density (OD) was analyzed at a wavelength of 492 nm with a Tecan SLT enzyme-linked immunosorbent assay (ELISA) reader. All ELISAs were done in duplicate. Results were expressed as OD492 values. To assess the variation among subsequent experiments, the sera of six H. pylori-positive individuals were analyzed against HcpA on all ELISA plates. The variation between subsequent ELISAs was between 3 and 8%. For samples where the absorption was saturated, values were truncated at 4.0 OD492 units. For statistical analysis, Student's t test, assuming unequal variances, was applied.

RESULTS

Detection of anti-HcpA, -HcpB, -HcpC, and -HcpE IgG antibodies. Anti-HcpA, -HcpB, -HcpC, and -HcpE IgG antibody titers were analyzed by immobilizing the purified proteins on ELISA plates, incubating the plates with sera obtained from 30 H. pylori-infected and 6 uninfected individuals, and detecting bound IgG antibodies by standard ELISA techniques. Expression and refolding of more Hcp family members was attempted, but so far only HcpA, HcpB, HcpC, and HcpE were obtained in sufficient quantities. As shown in Fig. 1, the recombinant Hcp proteins were pure enough to rule out unspecific binding to impurities that might have been present in all Hcp protein preparations. The mean ODs for the anti-Hcp IgG titers in infected and uninfected individuals are given in Table 1. In the group of H. pylori-positive individuals, the average anti-HcpA, -HcpC, and -HcpE IgG titers were 1.11, 1.35, and 1.61, whereas in the control group, they were 0.306, 0.292, and 0.620, respectively. Although high values for the standard deviations indicated significant variations of IgG titers among individuals, the differences for anti-HcpA, -HcpC, and -HcpE IgG antibodies are significant, with P values below 5 × 10⁻⁴. For HcpB, the difference between the mean titers for infected and uninfected individuals is three times smaller and the P value is 100 times higher than those for HcpA, HcpC, and HcpE.

FIG. 1. Sodium dodecyl sulfate-polyacrylamide gel electrophoresis analysis of recombinant HcpA (lane a), HcpB (lane b), HcpC (lane c), and HcpE (lane d). Ten micrograms of protein was loaded per lane. Higher-molecular-weight impurities represent Hcp dimers that formed during sample preparation.

TABLE 1. Mean OD492 values for Hcp proteins in H. pylori-positive and -negative individuals
Protein (ORF)    OD492 (SD), H. pylori positive    OD492 (SD), H. pylori negative    P value
HcpA (HP0211)    1.108 (1.021)                     0.306 (0.138)                     1.2 × 10⁻⁴
HcpB (HP0336)    1.180 (0.640)                     0.847 (0.362)                     5.2 × 10⁻²
HcpC (HP1098)    1.346 (1.083)                     0.292 (0.110)                     5.5 × 10⁻⁶
HcpE (HP0235)    1.612 (0.999)                     0.620 (0.416)                     4.0 × 10⁻⁴

Correlation between anti-HcpA, -HcpB, -HcpC, and -HcpE IgG titers. As indicated by the values given in Table 2 and the line plots shown in Fig. 2, there is a good correlation between anti-HcpA, -HcpC, and -HcpE IgG titers. The correlations for the pairwise comparisons are all above 0.7, indicating a clear trend. For HcpB, this trend is much weaker, as indicated by correlations below 0.4. No correlation was detected between titers of IgG measured by the commercial serological test to validate the infection state and the titers of anti-Hcp IgG (correlation < 0.15 for anti-HcpA, -HcpB, and -HcpC in the group of H. pylori-positive individuals).

There is substantial variation between titers of anti-Hcp IgG in the group of infected individuals. In some cases the titers of IgG are as low as in the group of H. pylori-negative individuals, but in some cases, the titers are 10 times larger. These fluctuations cannot be attributed to measurement errors but reflect the individual response of the host.

FIG. 2. Line plots to indicate the correlations between HcpA and HcpC (a), HcpA and HcpE (b), and HcpC and HcpE (c).

TABLE 2. Correlation between titers of anti-Hcp IgG
Protein    Correlation with(a):
           HcpA      HcpB      HcpC
HcpB       0.293
HcpC       0.734*    0.357
HcpE       0.733*    0.146     0.710*
(a) Correlations above 0.7 (boldface in the printed table) are marked with an asterisk.

DISCUSSION

The results presented above confirm that anti-HcpA, -HcpC, and -HcpE IgG antibodies exist in the sera of H. pylori-positive individuals, whereas for HcpB, the mean titers of antibody are just slightly increased. The titers and P values for HcpA, HcpC, and HcpE are in the same order of magnitude as those for other pathogenicity factors, such as the H. pylori vacuolating toxin VacA (12). There are two possible reasons for the weaker significance (higher P value) obtained for HcpB. HcpA, HcpC, and HcpE possess N-terminal leader peptides that guide these proteins into the periplasmic space, whereas HcpB has no leader peptide and is therefore assumed to be localized in the cytosol. The secretion of Hcp proteins into the periplasmic space could increase the antigenicity of HcpA, HcpC, and HcpE because it makes interactions with the host more likely. On the other hand, secretion is not a prerequisite for antigenicity. In the sera of H. pylori-positive individuals, significant titers of anti-UreB IgG were detected although UreB lacks a leader peptide (5).

Another reason for the lower titers of anti-HcpB IgG might be the absence of HcpB from the proteome of many H. pylori strains. The ORFs coding for HcpA, HcpC, and HcpE are conserved in the genome sequences of strains 26695 and J99, whereas HP0336, the ORF coding for HcpB, is absent from the genome of strain J99. HP0336 was also found to be absent from the genomes of nine clinical isolates, as shown by Chanto and coworkers (4). In five of nine cases, HP0336 was replaced by JHP0318. Since HcpB seems to be very specific for strain 26695, the absence of HcpB in H. pylori-positive individuals seems to be the reason for its higher P value. The increased titers of anti-HcpB IgG, particularly in some H. pylori-seronegative patients, could be explained by the presence of a cross-reacting antigen in some H. pylori-positive and -negative individuals. In addition, the titers of IgG for HcpA, HcpC, and HcpE show a clear correlation (Table 2), indicating that these proteins are expressed and recognized by the immune system at similar disease states. If HcpB were expressed under the same conditions and its weaker significance were due only to cytosolic expression, the titer of anti-HcpB should also correlate with the titers for HcpA, HcpC, and HcpE. From the lack of correlation for anti-HcpB IgG, it can be concluded that HcpB is missing under conditions where HcpA, HcpC, and HcpE are expressed, either because the corresponding gene is missing in the genome of the particular H. pylori strain or because expression is dispensable under some conditions.

The correlation of titers of anti-HcpA, -HcpC, and -HcpE IgG could also be explained by cross-reactivity of sera, which is supported by the significant sequence identity levels among these antigens and the modular architectures of the Hcp proteins. HcpA, HcpB, HcpC, and HcpE consist of six, four, seven, and nine repeats of a common α/α motif, respectively (10). If the correlation between titers of anti-HcpA, -HcpC, and -HcpE IgG were caused by the cross-reactivity of the sera, HcpB should also be recognized at the same level. Although HcpB shares the same basic structure and a similar degree of sequence identity, no significant correlations between the titers of anti-HcpB and anti-HcpA, -HcpC, and -HcpE IgG were observed. Therefore, it is unlikely that the correlation between the titers of anti-HcpA, -HcpC, and -HcpE IgG is due to the cross-reactivity of the sera. The same argument contradicts the hypothesis that the sera might recognize the artificial His6 tag rather than the H. pylori proteins.

The high standard deviation for titers of anti-Hcp IgG and the lack of correlation between titers of anti-Hcp IgG and infection state measured by the commercial serological test suggest that Hcp proteins are only expressed under very specific conditions. This hypothesis is in agreement with the observation that at least HcpA is not required for survival in vitro because an HcpA deletion mutant was found to grow normally under standard culture conditions (8). However, the high titers of IgG observed in some H. pylori-positive individuals confirm that Hcp proteins are expressed in vivo.

Anti-HcpC antibodies have recently been detected in a comprehensive immunoproteomics study (7). Although only qualitative data from two-dimensional gel analysis but no quantitative data on antibody titers were given, HcpC was found to be preferentially recognized by the sera of patients suffering from gastric cancer. Since we did not distinguish in our study among different clinical manifestations of H. pylori infection, we cannot comment on this observation. However, if it turns out that elevated titers of anti-HcpC IgG are present in gastric cancer patients, immunological tests including Hcp proteins might be suitable assays to distinguish between different groups of H. pylori-positive individuals.

ACKNOWLEDGMENTS

This work was supported by the Hartmann-Müller foundation (Zürich, Switzerland) and by the Swiss National Science Foundation grant no. 3100-063794.001.

REFERENCES

1. Alm, R. A., L. S. Ling, D. T. Moir, B. L. King, E. D. Brown, P. C. Doig, D. R. Smith, B. Noonan, B. C. Guild, B. L. deJonge, G. Carmel, P. J. Tummino, A. Caruso, M. Uria-Nickelsen, ls, C. Ives, R. Gibson, D. Merberg, ls, Q. Jiang, D. E. Taylor, G. F. Vovis, and T. J. Trust. 1999. Genomic-sequence comparison of two unrelated isolates of the human gastric pathogen Helicobacter pylori. Nature 397:176–180.
2. Bumann, D., S. Aksu, M. Wendland, K. Janek, U. Zimny-Arndt, N. Sabarth, T. F. Meyer, and P. R. Jungblut. 2002. Proteome analysis of secreted proteins of the gastric pathogen Helicobacter pylori. Infect. Immun. 70:3396–3403.
3. Cao, P., M. S. McClain, M. H. Forsyth, and T. L. Cover. 1998. Extracellular release of antigenic proteins by Helicobacter pylori. Infect. Immun. 66:2984–2986.
4. Chanto, G., A. Occhialini, N. Gras, R. A. Alm, F. Megraud, and A. Marais. 2002. Identification of strain-specific genes located outside the plasticity zone in nine clinical isolates of Helicobacter pylori. Microbiology 148:3671–3680.
5. Dunn, B. E., G. P. Campbell, G. I. Perez-Perez, and M. J. Blaser. 1990. Purification and characterization of urease from Helicobacter pylori. J. Biol. Chem. 265:9464–9469.
6. Gosciniak, G., A. Przondo-Mordarska, B. Iwanczak, and E. Poniewierka. 2001. Neutralisation of cytotoxic vacuolating activity by serum antibodies of Helicobacter pylori-infected patients. Int. J. Med. Microbiol. 291:27–32.
7. Haas, G., G. Karaali, K. Ebermayer, W. G. Metzger, mer, U. Zimny-Arndt, S. Diescher, U. B. Goebel, K. Vogt, A. B. Roznowski, B. J. Wiedenmann, T. F. Meyer, T. Aebischer, and P. R. Jungblut. 2002. Immunoproteomics of Helicobacter pylori infection and relation to gastric disease. Proteomics 2:313–324.
8. Karita, M., M. L. Etterbeek, M. H. Forsyth, M. K. Tummuru, and M. Blaser. 1997. Characterization of Helicobacter pylori dapE and construction of a conditionally lethal dapE mutant. Infect. Immun. 65:4158–4164.
9. Krishnamurthy, P., M. H. Parlow, J. Schneider, S. Burroughs, C. Wickland, N. B. Vakil, B. E. Dunn, and S. H. Phadnis. 1999. Identification of a novel penicillin-binding protein from Helicobacter pylori. J. Bacteriol. 181:5107–5110.
10. Lüthy, L., M. G. Grütter, and P. R. Mittl. 2002. The crystal structure of Helicobacter pylori cysteine-rich protein B reveals a novel fold for a penicillin-binding protein. J. Biol. Chem. 277:10187–10193.
11. Mittl, P. R. E., L. Lüthy, P. Hunziker, and M. G. Grütter. 2000. The cysteine-rich protein A from Helicobacter pylori is a beta-lactamase. J. Biol. Chem. 275:17693–17699.
12. Perez-Perez, G. I., R. M. Peek, Jr., J. C. Atherton, M. J. Blaser, and T. L. Cover. 1999. Detection of anti-VacA antibody responses in serum and gastric juice samples using type s1/m1 and s2/m2 Helicobacter pylori VacA antigens. b. Immunol. 6:489–493.
13. Sabarth, N., mer, U. Zimny-Arndt, P. R. Jungblut, T. F. Meyer, and D. Bumann. 2002. Identification of surface proteins of Helicobacter pylori by selective biotinylation, affinity purification, and two-dimensional gel electrophoresis. J. Biol. Chem. 277:27896–27902.
14. Tomb, J. F., O. White, A. R. Kerlavage, R. A. Clayton, G. G. Sutton, R. D. Fleischmann, K. A. Ketchum, H. P. Klenk, S. Gill, B. A. Dougherty, K. Nelson, J. Quackenbush, L. Zhou, E. F. Kirkness, S. Peterson, B. Loftus, D. Richardson, R. Dodson, H. G. Khalak, A. Glodek, K. McKenney, L. M. Fitzegerald, N. Lee, M. D. Adams, and J. C. Venter. 1997. The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature 388:539–547.
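The Methods of the Hcp paper state that Student's t test assuming unequal variances (often called Welch's test) was applied. As a rough illustration, the sketch below recomputes the test statistic and the Welch-Satterthwaite degrees of freedom from the summary values in Table 1. It assumes that the parenthesized values in Table 1 are sample standard deviations and that the group sizes are the 30 positive and 6 negative individuals named in the text; the function name `welch_t` is introduced here for illustration only, and the P value itself is left out because it needs a t-distribution CDF that the Python standard library does not provide.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) of Welch's unequal-variance t test from summary statistics."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# (mean positive, SD positive, mean negative, SD negative), copied from Table 1
TABLE_1 = {
    "HcpA": (1.108, 1.021, 0.306, 0.138),
    "HcpB": (1.180, 0.640, 0.847, 0.362),
    "HcpC": (1.346, 1.083, 0.292, 0.110),
    "HcpE": (1.612, 0.999, 0.620, 0.416),
}
N_POSITIVE, N_NEGATIVE = 30, 6  # group sizes stated in the text

for protein, (m_pos, sd_pos, m_neg, sd_neg) in TABLE_1.items():
    t, df = welch_t(m_pos, sd_pos, N_POSITIVE, m_neg, sd_neg, N_NEGATIVE)
    print(f"{protein}: t = {t:.2f}, df = {df:.1f}")
```

Feeding `t` and `df` into a t-distribution survival function (for example from a statistics package) would give the corresponding P value; whether the published values are one- or two-sided is not stated in the source.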
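Explanations of Key Biochemistry Concepts for the Pharmaceutical Industry
1 Key concepts
A
Abundance (of an mRNA): the number of molecules of the mRNA per cell.
Abundant mRNA: a small number of different mRNA species, each present in a large number of copies per cell.
Acceptor splicing site: the boundary between the right end of an intron and the left end of the adjacent exon.
Acentric fragment: a chromosome fragment (generated by breakage) that lacks a centromere and is therefore lost at cell division.
Active site: the restricted region of a protein to which a substrate binds.
Allele: one of the different forms of a gene that can occupy a given locus on a chromosome.
Allelic exclusion: describes the expression, in any particular lymphocyte, of only one allele coding for the expressed immunoglobulin.
Allosteric control: the ability of an interaction at one site of a protein to influence the activity of another site.
Alu-equivalent family: a set of sequences in a mammalian genome that is related to the human Alu family.
Alu family: a set of dispersed, related sequences in the human genome, each about 300 bp long.
Each member has Alu cleavage sites at both ends (hence the name).
α-Amanitin: a bicyclic octapeptide from the poisonous mushroom Amanita phalloides that inhibits transcription by eukaryotic RNA polymerases, especially RNA polymerase II.
Amber codon: the nucleotide triplet UAG, one of the three codons that cause termination of protein synthesis.
Amber mutation: any change in DNA that creates an amber codon at a site previously occupied by a codon representing an amino acid in a protein.
Amber suppressors: mutant genes coding for tRNAs whose anticodons have been altered so that they recognize the UAG codon as well as, or instead of, their previous codons.
Aminoacyl-tRNA: a transfer RNA carrying an amino acid; the covalent linkage is between the carboxyl group of the amino acid and the 3′- or 2′-OH group of the terminal nucleotide of the tRNA.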
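The amber entries above can be made concrete with a small sketch: translation stops at UAG unless a suppressor tRNA reads it as an amino acid. The code builds the standard genetic code table and translates a made-up mRNA with and without suppression. The example sequence and the choice of tyrosine (`"Y"`) as the inserted residue are assumptions for illustration, not taken from the glossary, and the function name `translate` is hypothetical.

```python
# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna, amber_suppressor=None):
    """Translate an mRNA from its first base until the first effective stop codon.

    If amber_suppressor is given (a one-letter amino acid), UAG is read as that
    residue instead of terminating translation.
    """
    dna = mrna.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        codon = dna[pos:pos + 3]
        if codon == "TAG" and amber_suppressor is not None:
            protein.append(amber_suppressor)  # suppressor tRNA reads through UAG
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":  # UAA, UGA, or unsuppressed UAG
            break
        protein.append(residue)
    return "".join(protein)


# Hypothetical message: AUG GCU UAG GCA UAA
EXAMPLE_MRNA = "AUGGCUUAGGCAUAA"
print(translate(EXAMPLE_MRNA))                        # stops at the amber codon
print(translate(EXAMPLE_MRNA, amber_suppressor="Y"))  # reads through UAG, stops at UAA
```

In a real cell suppression is only partial, so both the truncated and the read-through product would be made; the sketch shows the two outcomes separately.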
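Genome Structure (1)
Electron micrograph of nucleosomes
1) Nucleosome structure
Binding of histones to DNA
The basic structural unit of chromatin and chromosomes is the nucleosome;
DNA wraps around the histone core in a left-handed helix for 1.8 turns, 146 bp in total;
this shortens the DNA length about 7-fold.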
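The 7-fold figure on the slide can be checked with back-of-the-envelope arithmetic. The sketch below is only an estimate under stated assumptions: a rise of 0.34 nm per base pair for B-form DNA, a nucleosome repeat of about 200 bp (146 bp of core DNA plus linker), and a nucleosome diameter of about 10 nm. Only the 146 bp value comes from the slide; the other numbers and all names in the code are assumptions introduced here.

```python
RISE_PER_BP_NM = 0.34           # assumed rise per base pair of B-form DNA
CORE_DNA_BP = 146               # from the slide: DNA wrapped around the histone core
REPEAT_BP = 200                 # assumption: core DNA plus linker DNA per nucleosome
NUCLEOSOME_DIAMETER_NM = 10.0   # assumption: approximate nucleosome diameter


def extended_length_nm(base_pairs):
    """Length of the DNA if it were stretched out as a linear B-form duplex."""
    return base_pairs * RISE_PER_BP_NM


def packing_ratio(base_pairs, packed_length_nm):
    """Extended DNA length divided by the length it occupies once packaged."""
    return extended_length_nm(base_pairs) / packed_length_nm


print(f"core DNA, extended: {extended_length_nm(CORE_DNA_BP):.1f} nm")
print(f"packing ratio per repeat: {packing_ratio(REPEAT_BP, NUCLEOSOME_DIAMETER_NM):.1f}")
```

Under these assumptions the ratio comes out close to 7, in line with the slide; different choices for the repeat length or the diameter shift the estimate somewhat.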
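2) Eukaryotic chromosomes are formed by extensive condensation of DNA: three levels of folding
DNA Organization
5. Eukaryotic chromosomes carry several important elements:
Replication origins: spaced roughly evenly every 30-40 kb along each eukaryotic chromosome, generally in noncoding regions.
Centromere: required for the correct segregation of chromosomes after DNA replication. Each chromosome has only one centromere. In most eukaryotes the centromere is larger than 40 kb and consists mainly of repetitive DNA sequences.
Telomere:
III. Structure of the genome
• The distribution and arrangement of the different functional DNA regions within the whole DNA molecule. Gene sequencing
IV. Gene density
1. C-value: the total amount of DNA in the haploid genome of each organism is constant and is called the C-value. The C-value is characteristic of each species.
... proteins other than histones; a collective term for a large and complex class of proteins, such as RNA polymerase and proteins involved in cell division;
• The non-histone proteins also include the high mobility group (HMG) proteins, named for their fast migration in gel electrophoresis. Some of them are abundant in transcriptionally active regions and are thought to take part in transcriptional regulation;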
English Practice
Chronic Cocaine-Induced H3 Acetylation and Transcriptional Activation of CaMKIIα in the Nucleus Accumbens Is Critical for Motivation for Drug Reinforcement
Neuropsychopharmacology (2010) 35, 913–928
The regulation of gene expression in the brain reward regions is known to contribute to the pathogenesis and persistence of drug addiction. Increasing evidence suggests that the regulation of gene transcription is mediated by epigenetic mechanisms that alter the chromatin structure at gene promoters. To better understand the involvement of epigenetic regulation in drug reinforcement, rats were subjected to a cocaine self-administration paradigm. Daily histone deacetylase (HDAC) inhibitor infusions in the shell of the nucleus accumbens (NAc) caused an upward shift in the dose-response curve under a fixed-ratio schedule and increased the break point under a progressive-ratio schedule, indicating enhanced motivation for the self-administered drug. The effect of the HDAC inhibitor is attributed to the increased elevation of histone acetylation induced by chronic, but not acute, cocaine experience. In contrast, neutralizing the chronic cocaine-induced increase in histone modification by the bilateral overexpression of HDAC4 in the NAc shell reduced drug motivation. The association between the motivation for cocaine and the transcriptional activation of addiction-related genes by H3 acetylation in the NAc shell was analyzed. Among the genes activated by chronic cocaine experiences, the expression of CaMKIIα, but not CaMKIIβ, correlated positively with motivation for the drug.
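Genes VIII Glossary
Appendix 1. Glossary
A: adenine.
abortive transduction: the transduced DNA fragment is not integrated into the recipient chromosome; the fragment cannot replicate and is passed to only one of the two daughter cells, so it is transmitted along a single cell line.
acentric chromosome: a chromosome or chromatid that lacks a centromere.
achondroplasia: a human autosomal dominant disorder; the phenotype includes short, thick limbs, a saddle nose, and lumbar lordosis.
acrocentric chromosome: a chromosome whose centromere lies near one end.
active site: the region of a protein's structure that carries its biological activity.
adaptation: in the course of evolution, heritable traits of organisms change so that they survive and reproduce better in a given environment.
adenine: the base that pairs with thymine in DNA.
albino: an autosomal recessive mutation.
The skin and hair of the animal or person are white, mainly because of a mutation in the gene that controls synthesis of tyrosinase in the melanin synthesis pathway.
allele: one of the several different forms of the gene at a locus.
allelic frequencies: the frequency of an allele at a given locus among all individuals of a population.
allelic exclusion: at a heterozygous immunoglobulin locus, only one gene is rearranged and expressed; its allele is not rearranged and remains inactive.
allopolyploidy: a polyploid organism in which one or more chromosome sets originate from a different species.
Ames test: a method, introduced by Bruce Ames in 1970, that uses Salmonella typhimurium together with (rat) liver microsomes to test whether a substance is mutagenic.
amino acids: the basic building blocks of proteins; 20 different amino acids are commonly found in proteins.
aminoacyl-tRNA: a tRNA with the corresponding amino acid attached to its acceptor arm; it carries the amino acid to the ribosome for protein synthesis.
aminoacyl-tRNA synthetase: catalyzes the attachment of a specific amino acid to its corresponding tRNA molecule.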
Decoding the Human Genome: Revealing the Importance of Genetic Information for Health

1. Introduction

1.1 Overview
The decoding of the human genome has been a monumental scientific achievement that has revolutionized our understanding of genetics and its impact on human health. Through this groundbreaking endeavor, scientists have successfully unraveled the complete sequence of DNA in the human body, providing valuable insights into the genetic information that influences our well-being.

1.2 Research Background
From the moment scientists first understood that genes determine our physical characteristics and are responsible for inherited diseases, there has been a persistent curiosity to decipher the entire human genome. This curiosity stems from the belief that unraveling the secrets hidden within our genes could unlock a wealth of knowledge about human health and provide innovative approaches to prevent, diagnose, and treat genetic disorders.

1.3 Objectives
The objective of this article is to explore and highlight the importance of decoding the human genome in revealing vital genetic information for improving health outcomes. By examining various aspects such as understanding gene information, diagnosing and treating genetic diseases, and advancing personalized medicine, we aim to demonstrate how decoding the genome has paved the way for significant advancements in healthcare.

Through this article, we will delve into the significance of the relationship between genes and health, emphasizing how genetic diversity and interactions with our environment play crucial roles in determining disease susceptibility. Additionally, we will discuss how an understanding of these factors can aid in formulating effective prevention and intervention strategies.

Furthermore, we will examine the progress made by initiatives like the Human Genome Project in decoding the genome. We will explore cutting-edge genomic interpretation technologies that have accelerated research efforts and shed light on previously unknown genetic associations with diseases. Additionally, we will uncover how global collaborations have facilitated advancements in genomic research while addressing ethical challenges surrounding data sharing.

In conclusion, this article aims to summarize key findings and insights derived from decoding the human genome. Furthermore, it seeks to provide an outlook on future developments and applications in personalized medicine, highlighting the profound impact these discoveries will have on individual and societal health.

2. The Significance of Genome Decoding

2.1 Revealing genetic information: Genome decoding aims to reveal the genetic information contained in the human genome.
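Bioengineering: The Concept of the Gene
The separation of homologous chromosomes brings about the separation of alleles, which leads to the segregation of traits. Two pairs of non-allelic genes that determine different traits lie on two pairs of non-homologous chromosomes; the separation of homologous chromosomes and the independent assortment of non-homologous chromosomes lead to the free recombination of genes.
A particular gene must further be linked to a particular chromosome, so that the parallel between the behavior of genes and the behavior of chromosomes during cell division is shown to be a relationship in which genes belong to chromosomes.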
peas
Mendel's principles were independently discovered and verified, marking the beginning of modern genetics.
Principle of independent segregation
Principle of independent assortment
take place.
Sutton's work with grasshoppers showed that chromosomes
occur in matched pairs of maternal and paternal chromosomes which
separate during meiosis and may constitute the physical basis of the
truly bounded by the parent's colors.
1.2 Theory of the inheritance of acquired characters
(Inheritance of acquired characters, Lamarck, 1809)
➢ Nature produced successively all the different forms of life on earth.
➢ Environmentally induced behavioral changes lead the way in
Gene Structure and Organization
1.1 Overlapping genes
e.g. ΦX174
Open reading frame (ORF)
Met Arg Ala Ser Ile Lys Met Ile Gly Val Ser Asn Ile Gln
A ATGCGCGCTTCGATAAAAATGATTGGCGTATCCAACATCCAG
B ATGCGCGCTTCGATAAAAATGATTGGCGTATCCAACATCCAG
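The reading-frame idea on this slide can be checked with a short script. The following is only a sketch: it translates the 42-nucleotide sequence shown above with the standard genetic code and lists every ATG triplet as a candidate start codon. The function names are illustrative, and which start codon gene B actually uses cannot be recovered from the slide text, so the second translation is an assumption for demonstration only.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, start=0):
    """Translate dna from position start, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for pos in range(start, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[pos:pos + 3], "X")  # X = unknown codon
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def start_codon_positions(dna):
    """Return the 0-based position of every ATG triplet, in any frame."""
    dna = dna.upper()
    return [i for i in range(len(dna) - 2) if dna[i:i + 3] == "ATG"]


# Sequence copied from the slide (rows A and B are identical there).
SLIDE_SEQ = "ATGCGCGCTTCGATAAAAATGATTGGCGTATCCAACATCCAG"

full_peptide = translate(SLIDE_SEQ)          # read from the first ATG
atg_sites = start_codon_positions(SLIDE_SEQ)
# Hypothetical: a second product starting at the internal, in-frame ATG.
internal_peptide = translate(SLIDE_SEQ, atg_sites[-1])
```

In one-letter code the first translation corresponds to the residue row on the slide (Met Arg Ala Ser Ile Lys Met Ile Gly Val Ser Asn Ile Gln).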
life characteristics) and disease susceptibility (tumors, hypertension), etc.
(3) Secondary structure of DNA (the great double helix)
Antiparallel strands; complementary base pairing; double helix
Complementary base pairing (base match)
Tm value
The temperature at which 50% of the DNA molecules have separated into single strands
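The dependence of Tm on base composition can be illustrated numerically. The sketch below is not taken from the slides: it assumes the empirical Marmur-Doty relation, Tm ≈ 69.3 + 0.41 × (%G+C) in °C, which is usually quoted for long double-stranded DNA in standard saline citrate buffer. The function names are illustrative, and the short example sequences are used only to show the arithmetic.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc_count = sum(1 for base in seq if base in "GC")
    return 100.0 * gc_count / len(seq)


def marmur_doty_tm(seq):
    """Approximate Tm in degrees C (assumed relation: 69.3 + 0.41 * %GC)."""
    return 69.3 + 0.41 * gc_percent(seq)


# Higher G+C content gives a higher predicted melting temperature.
tm_at_rich = marmur_doty_tm("ATATATATGC")   # 20% G+C
tm_gc_rich = marmur_doty_tm("GCGCGCGCAT")   # 80% G+C
```

Under this assumed relation, each additional percentage point of G+C raises the estimated Tm by about 0.41 °C, which is consistent with G·C pairs being more stable than A·T pairs.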
Two stages
I
II
Mild denaturation
GC
AT
Tm value
The Tm of DNA depends on the following factors:
(1) The homogeneity of the DNA
2.
6 Denaturation and renaturation of DNA
Denaturation
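Under conditions such as heating or alkaline pH, the hydrogen bonds between the two strands of the double helix break, the helix unwinds, and the strands form single-stranded random coils. The properties of the DNA change as a result (for example, viscosity decreases, sedimentation rate increases, buoyant density increases, and ultraviolet absorption increases).
Renaturation (annealing)
Once the denaturing conditions are removed, two denatured single strands with complementary bases can re-form a double-stranded structure and regain their original physicochemical properties and biological activity.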
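Molecular Biology: Structure and Function of Genes
By gene product, genes are divided into protein genes and RNA genes. By function, they are divided into structural genes and regulatory genes.
I. Structural genes
1. Concept: A structural gene is a genetic unit that encodes a specific RNA molecule or protein molecule.
2. Features of structural genes:
① Prokaryotic structural genes
The coding sequence is continuous.
② Eukaryotic structural genes
④ Positive regulatory protein binding site: a specific DNA sequence, often found near weak promoters, that is recognized and bound by certain positive regulatory proteins with transcription-activating activity, thereby speeding up the initiation of transcription.
Gene sequence of the Escherichia coli lactose operon
2. Regulatory genes in eukaryotes
① Promoter:
As in prokaryotes, this refers to a special segment, located upstream of the structural gene, that is involved in recognition and binding by RNA polymerase and in the initiation of transcription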
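Part I: Basic Principles of Molecular Biology
Chapter 1: Structure and Function of Genes
A gene is the unit within a nucleic acid molecule that stores and expresses genetic information. In most organisms, genetic information is stored in DNA molecules. In some organisms, such as RNA viruses, RNA serves as the carrier of genetic information.
Chemical structure of DNA
Primary structure: the order of nucleotides in the DNA molecule.
Secondary structure: the double helix formed by two single DNA strands.
Tertiary structure: the superhelix formed when double-stranded DNA twists and coils further.
In prokaryotes, chromosomal DNA is usually a double-stranded circle that forms a complex with proteins and exists in the cell as a nucleoid.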
T.Cech
Master the concepts of gene, structural gene, split gene, exon,
intron, promoter,
and enhancer
Prediction of Complete Gene Structures in Human Genomic DNA

Chris Burge* and Samuel Karlin
Department of Mathematics, Stanford University, Stanford, CA 94305, USA
*Corresponding author

J. Mol. Biol. (1997) 268, 78–94. © 1997 Academic Press Limited

We introduce a general probabilistic model of the gene structure of human genomic sequences which incorporates descriptions of the basic transcriptional, translational and splicing signals, as well as length distributions and compositional features of exons, introns and intergenic regions. Distinct sets of model parameters are derived to account for the many substantial differences in gene density and structure observed in distinct C+G compositional regions of the human genome. In addition, new models of the donor and acceptor splice signals are described which capture potentially important dependencies between signal positions. The model is applied to the problem of gene identification in a computer program, GENSCAN, which identifies complete exon/intron structures of genes in genomic DNA. Novel features of the program include the capacity to predict multiple genes in a sequence, to deal with partial as well as complete genes, and to predict consistent sets of genes occurring on either or both DNA strands. GENSCAN is shown to have substantially higher accuracy than existing methods when tested on standardized sets of human and vertebrate genes, with 75 to 80% of exons identified exactly. The program is also capable of indicating fairly accurately the reliability of each predicted exon. Consistently high levels of accuracy are observed for sequences of differing C+G content and for distinct groups of vertebrates.

Keywords: exon prediction; gene identification; coding sequence; probabilistic model; splice signal

Abbreviations used: Sn, sensitivity; Sp, specificity; CC, correlation coefficient; AC, approximate correlation; ME, missed exons; WE, wrong exons; snRNP, small nuclear ribonucleoprotein particle; snRNA, small nuclear RNA; WMM, weight matrix model; WAM, weight array model; MDD, maximal dependence decomposition.

Introduction

The problem of identifying genes in genomic DNA sequences by computational methods has attracted considerable research attention in recent years. From one point of view, the problem is closely related to the fundamental biochemical issues of specifying the precise sequence determinants of transcription, translation and RNA splicing. On the other hand, with the recent shift in the emphasis of the Human Genome Project from physical mapping to intensive sequencing, the problem has taken on significant practical importance, and computer software for exon prediction is routinely used by genome sequencing laboratories (in conjunction with other methods) to help identify genes in newly sequenced regions.

Many early approaches to the problem focused on prediction of individual functional elements, e.g. promoters, splice sites, coding regions, in isolation (reviewed by Gelfand, 1995). More recently, a number of approaches have been developed which integrate multiple types of information including splice signal sensors, compositional properties of coding and non-coding DNA and in some cases database homology searching in order to predict entire gene structures (sets of spliceable exons) in genomic sequences. Some examples of such programs include: FGENEH (Solovyev et al., 1994), GENMARK (Borodovsky & McIninch, 1993), GeneID (Guigó et al., 1992), Genie (Kulp et al., 1996), GeneParser (Snyder & Stormo, 1995), and GRAIL II (Xu et al., 1994). Fickett (1996) offers an up-to-date introduction to gene finding by computer and points up some of the strengths and weaknesses of currently available methods. Two important limitations noted are that the majority of current algorithms assume that the input sequence contains exactly one complete gene (so that, when presented with a sequence containing a partial gene or multiple genes, the results generally do not make sense); and that accuracy measured by independent control sets may be considerably lower than was originally thought. The issue of the predictive accuracy of such methods has recently been addressed through an exhaustive comparison of available methods using a large set of vertebrate gene sequences (Burset & Guigó, 1996). The authors conclude that the predictive accuracy of all such programs remains rather low, with less than 50% of exons identified exactly by most programs. Thus, development of new methods (and/or improvement of existing methods) continues to be important.

Here, we introduce a general probabilistic model for the (gene) structure of human genomic sequences and describe the application of this model to the problem of gene prediction in a program called GENSCAN. Our goal in designing the genomic sequence model was to capture the general and specific compositional properties of the distinct functional units of a eukaryotic gene: exon, intron, splice site, promoter, etc. Emphasis was placed on those features which are recognized by the general transcriptional, splicing and translational machinery which process most or all protein coding genes, rather than specialized signals related to transcription or (alternative) splicing of particular genes or gene families. Thus, for example, we include the TATA box and cap site which are present in most eukaryotic promoters, but not specialized or tissue-specific transcription factor binding sites such as those bound by MyoD (Lassar et al., 1989). Similarly, we use a general three-periodic (inhomogeneous) fifth-order Markov model of coding regions rather than using specialized models of particular protein motifs or database homology information. As a consequence, predictions made by the program do not depend on presence of a similar gene in the protein sequence databases, but instead provide information which is independent and complementary to that provided by homology-based gene identification methods such as searching the protein databases with BLASTX (Gish & States, 1993). Additionally, the model takes into account many of the often quite substantial differences in gene density and structure (e.g. intron length) that exist between different C+G% compositional regions ("isochores") of the human genome (Bernardi, 1989; Duret et al., 1995).

Our model is similar in its overall architecture to the Generalized Hidden Markov Model approach adopted in the program Genie (Kulp et al., 1996), but differs from most existing programs in several important respects. First, we use an explicitly double-stranded genomic sequence model in which potential genes occurring on both DNA strands are analyzed in simultaneous and integrated fashion. Second, while most existing integrated gene finding programs assume that in each input sequence there is exactly one complete gene, our model treats the general case in which the sequence may contain a partial gene, a complete gene, multiple complete (or partial) genes, or no gene at all. The combination of the double-stranded nature of the model and the capacity to deal with variable numbers of genes should prove particularly useful for analysis of long human genomic contigs, e.g. those of a hundred kilobases or more, which will often contain multiple genes on one or both DNA strands. Third, we introduce a novel method, Maximal Dependence Decomposition, to model functional signals in DNA (or protein) sequences which allows for dependencies between signal positions in a fairly natural and statistically justifiable way. This method is applied to generate a model of the donor splice signal which captures several types of dependencies which may relate to the mechanism of donor splice site recognition in pre-mRNA sequences by U1 small nuclear ribonucleoprotein particle (U1 snRNP) and possible other factors. Finally, we demonstrate that the predictive accuracy of GENSCAN is substantially better than other methods when tested on standardized sets of human and vertebrate genes, and show that the method can be used effectively to predict novel genes in long genomic contigs.

Results

GENSCAN was tested on the Burset/Guigó set of 570 vertebrate multi-exon gene sequences (Burset & Guigó, 1996): the standard measures of predictive accuracy per nucleotide and per exon are shown in Table 1A (see Table legend for details). Comparison of the accuracy data shows that GENSCAN is significantly more accurate at both the nucleotide and the exon level by all measures of accuracy than existing programs which do not use protein sequence homology information (those in the upper portion of Table 1A). At the nucleotide level, substantial improvements are seen in terms of Sensitivity (Sn = 0.93 versus 0.77 for the next best program, FGENEH), Approximate Correlation (AC = 0.91 versus 0.78 for FGENEH) and Correlation Coefficient (CC = 0.92 versus 0.80 for FGENEH). At the exon level, significant improvements are seen across the board, both in terms of Sensitivity (Sn = 0.78 versus 0.61 for FGENEH) and Specificity (Sp = 0.81 versus 0.64 for FGENEH), as well as Missed Exons (ME = 0.09 versus 0.15 for FGENEH) and Wrong Exons (WE = 0.05 versus 0.11 for GRAIL). Surprisingly, GENSCAN was found to be somewhat more accurate by almost all measures than the two programs, GeneID+ and GeneParser3, which make use of protein sequence homology information (Table 1A). Exon-level sensitivity and specificity values were substantially higher for GENSCAN and Wrong Exons substantially lower; only in the category of Missed Exons did GeneID+ do better (0.07 versus 0.09 for GENSCAN). Use of protein sequence homology information in conjunction with GENSCAN predictions is addressed in the Discussion.

Going beyond exons to the level of whole gene structures, we may define the "gene-level accuracy" (GA) for a set of sequences as the proportion of actual genes which are predicted exactly, i.e. all coding exons predicted exactly with no additional predicted exons in the transcription unit (in practice, the annotated GenBank sequence). Gene-level accuracy was 0.43 (243/570) for GENSCAN in the Burset/Guigó set, demonstrating
that it is indeed possible to predict complete multi-exon gene structures with a reasonable degree of success by computer. It should be noted that this proportion almost certainly overstates the true gene-level accuracy of GENSCAN because of the substantial bias in the Burset/Guigó set towards small genes (mean: 5.1 kb) with relatively simple intron-exon structure (mean: 4.6 exons per gene). Nevertheless, GENSCAN was able to correctly reconstruct some highly complex genes, the most dramatic example being the human gastric (H+/K+)-ATPase gene (accession no. J05451), containing 22 coding exons.

The performance of GENSCAN was found to be relatively insensitive to C+G content (Table 1B), with CC values of 0.93, 0.91, 0.92 and 0.90 observed for sequences of <40, 40 to 50, 50 to 60, and >60% C+G, respectively, and similarly homogeneous values for the AC statistic. Nor did accuracy vary substantially for different subgroups of vertebrate species (Table 1B); CC was 0.91 for the rodent subset, 0.94 for primates and 0.93 for a diverse collection of non-mammalian vertebrate sequences.

Table 1. Performance comparison for Burset/Guigó set of 570 vertebrate genes

A. Comparison of GENSCAN with other gene prediction programs

                            Accuracy per nucleotide     Accuracy per exon
Program        Sequences    Sn    Sp    AC    CC        Sn    Sp    Avg.  ME    WE
GENSCAN        570 (8)      0.93  0.93  0.91  0.92      0.78  0.81  0.80  0.09  0.05
FGENEH         569 (22)     0.77  0.88  0.78  0.80      0.61  0.64  0.64  0.15  0.12
GeneID         570 (2)      0.63  0.81  0.67  0.65      0.44  0.46  0.45  0.28  0.24
Genie          570 (0)      0.76  0.77  0.72  n/a       0.55  0.48  0.51  0.17  0.33
GenLang        570 (30)     0.72  0.79  0.69  0.71      0.51  0.52  0.52  0.21  0.22
GeneParser2    562 (0)      0.66  0.79  0.67  0.65      0.35  0.40  0.37  0.34  0.17
GRAIL2         570 (23)     0.72  0.87  0.75  0.76      0.36  0.43  0.40  0.25  0.11
SORFIND        561 (0)      0.71  0.85  0.73  0.72      0.42  0.47  0.45  0.24  0.14
Xpound         570 (28)     0.61  0.87  0.68  0.69      0.15  0.18  0.17  0.33  0.13

GeneID+        478 (1)      0.91  0.91  0.88  0.88      0.73  0.70  0.71  0.07  0.13
GeneParser3    478 (1)      0.86  0.91  0.86  0.85      0.56  0.58  0.57  0.14  0.09

B. GENSCAN accuracy for sequences grouped by C+G content and by organism

                            Accuracy per nucleotide     Accuracy per exon
Subset         Sequences    Sn    Sp    AC    CC        Sn    Sp    Avg.  ME    WE
C+G <40        86 (3)       0.90  0.95  0.90  0.93      0.78  0.87  0.84  0.14  0.05
C+G 40-50      220 (1)      0.94  0.92  0.91  0.91      0.80  0.82  0.82  0.08  0.05
C+G 50-60      208 (4)      0.93  0.93  0.90  0.92      0.75  0.77  0.77  0.08  0.05
C+G >60        56 (0)       0.97  0.89  0.90  0.90      0.76  0.77  0.76  0.07  0.08
Primates       237 (1)      0.96  0.94  0.93  0.94      0.81  0.82  0.82  0.07  0.05
Rodents        191 (4)      0.90  0.93  0.89  0.91      0.75  0.80  0.78  0.11  0.05
Non-mam. Vert. 72 (2)       0.93  0.93  0.90  0.93      0.81  0.85  0.84  0.11  0.06

A, For each sequence in the test set of 570 vertebrate sequences constructed by Burset & Guigó (1996), the forward-strand exons in the optimal GENSCAN parse of the sequence were compared to the annotated exons (GenBank "CDS" key). The standard measures of predictive accuracy per nucleotide and per exon (described below) were calculated for each sequence and averaged over all sequences for which they were defined. Results for all programs except GENSCAN and Genie are from Table 1 of Burset & Guigó (1996); Genie results are from Kulp et al. (1996). Recent versions of Genie have demonstrated substantial improvements in accuracy over that given here (M. G. Reese, personal communication). To calculate accuracy statistics, each nucleotide of a test sequence is classified as predicted positive (PP) if it is in a predicted coding region or predicted negative (PN) otherwise, and also as actual positive (AP) if it is a coding nucleotide according to the annotation, or actual negative (AN) otherwise. These assignments are then compared to calculate the number of true positives, TP = PP ∩ AP (i.e. the number of nucleotides which are both predicted positives and actual positives); false positives, FP = PP ∩ AN; true negatives, TN = PN ∩ AN; and false negatives, FN = PN ∩ AP. The following measures of accuracy are then calculated: Sensitivity, Sn = TP/AP; Specificity, Sp = TP/PP; Correlation Coefficient,

$$\mathrm{CC} = \frac{(TP)(TN) - (FP)(FN)}{\sqrt{(PP)(PN)(AP)(AN)}},$$

and the Approximate Correlation,

$$\mathrm{AC} = \frac{1}{2}\left(\frac{TP}{AP} + \frac{TP}{PP} + \frac{TN}{AN} + \frac{TN}{PN}\right) - 1.$$

The rationale for each of these definitions is discussed by Burset & Guigó (1996). At the exon level, predicted exons (PP) are compared to the actual exons (AP) from the annotation; true positives (TP) is the number of predicted exons which exactly match an actual exon (i.e. both endpoints exactly correct). Exon-level sensitivity (Sn) and specificity (Sp) are then defined using the same formulas as at the nucleotide level, and the average of Sn and Sp is calculated as an overall measure of accuracy in lieu of a correlation measure. Two additional statistics are calculated at the exon level: Missed Exons (ME) is the proportion of true exons not overlapped by any predicted exon, and Wrong Exons (WE) is the proportion of predicted exons not overlapped by any real exon. Under the heading Sequences, the number of sequences (out of 570) effectively analyzed by each program is given, followed by the number of sequences for which no gene was predicted, in parentheses. Performance of the programs which make use of amino acid similarity searches, GeneID+ and GeneParser3, are shown separately at the bottom of the Table: these programs were run only on sequences less than 8 kb in length. B, Results of GENSCAN for different subsets of the Burset/Guigó test set, divided either according to the C+G% composition of the GenBank sequence or by the organism of origin. Classification by organism was based on the GenBank "ORGANISM" key. Primate sequences are mostly of human origin; rodent sequences are mostly from mouse and rat; the non-mammalian vertebrate set contains 22 fish, 17 amphibian, 5 reptilian and 28 avian sequences.

A feature which may prove extremely useful in practical applications of GENSCAN is the "forward-backward" probability, p, which is calculated for each predicted exon as described in Methods. Specifically, of the 2678 exons predicted in the Burset/Guigó set: 917 had p > 0.99 and, of these, 98% were exactly correct; 551 had p ∈ [0.95, 0.99] (92% correct); 263 had p ∈ [0.90, 0.95] (88% correct); 337 had p ∈ [0.75, 0.90] (75% correct); 362 had p ∈ [0.50, 0.75] (54% correct); and 248 had p ∈ [0.00, 0.50], of which 30% were correct. Thus, the forward-backward probability provides a useful guide to the likelihood that a predicted exon is correct and can be used to pinpoint regions of a prediction
which are more certain or less certain. From the data above, about one half of predicted exons have p > 0.95, with the practical consequence that any (predicted) gene with four or more exons will likely have two or more predicted exons with p > 0.95, from which PCR primers could be designed to screen a cDNA library with very high likelihood of success.

Since for GENSCAN, as for most of the other programs tested, there was a certain degree of overlap between the "learning" set and the Burset/Guigó test set, it was important also to test the method on a truly independent test set. For this purpose, in the construction of the learning set L, we removed all genes more than 25% identical at the amino acid level to the genes of the previously published GeneParser test sets (Snyder & Stormo, 1995), as described in Methods. Accuracy statistics for GENSCAN, GeneID, GeneParser2 and GRAIL3 (GRAIL II "assembly" option) on GeneParser test sets I and II are given in Table 2. In this Table, exons correct is the proportion of true exons which were predicted exactly, essentially the same as the exon-level sensitivity statistic of Burset & Guigó (1996). Comparison of the GENSCAN accuracy statistics for the two GeneParser test sets (Table 2) with each other and with those for the Burset/Guigó test set (Table 1) show little difference in predictive accuracy. For example, identical correlation coefficient values of 0.93 were observed in both GeneParser test sets versus 0.92 in the Burset/Guigó test set. Similarly, the proportion of exons correct was 0.79 and 0.76 in GeneParser test sets I and II, as compared to 0.78 for the corresponding value (exon-level sensitivity) in the Burset/Guigó set. Again, performance of the program is quite robust with respect to differences in C+G content; the somewhat larger fluctuations observed in Table 2 undoubtedly relate to the much smaller size of the GeneParser test sets.

Table 2. Performance comparison for GeneParser Test Sets I, II

Program:             GeneID        GRAIL3        GeneParser2   GENSCAN
All sequences        I     II      I     II      I     II      I     II
Correlation (CC)     0.69  0.55    0.83  0.75    0.78  0.80    0.93  0.93
Sensitivity          0.69  0.50    0.83  0.68    0.87  0.82    0.98  0.95
Specificity          0.77  0.75    0.87  0.91    0.76  0.86    0.90  0.94
Exons correct        0.42  0.33    0.52  0.31    0.47  0.46    0.79  0.76
Exons overlapped     0.73  0.64    0.81  0.58    0.87  0.76    0.96  0.91

High C+G             I     II      I     II      I     II      I     II
Correlation (CC)     0.65  0.73    0.88  0.80    0.89  0.71    0.94  0.98
Sensitivity          0.72  0.85    0.87  0.80    0.90  0.65    1.00  0.98
Specificity          0.73  0.73    0.95  0.88    0.93  0.87    0.91  0.98
Exons correct        0.38  0.43    0.67  0.50    0.64  0.57    0.76  0.64
Exons overlapped     0.80  0.86    0.89  0.79    0.96  0.79    1.00  0.93

Medium C+G           I     II      I     II      I     II      I     II
Correlation (CC)     0.67  0.52    0.83  0.75    0.75  0.82    0.93  0.94
Sensitivity          0.65  0.47    0.86  0.68    0.86  0.84    0.97  0.95
Specificity          0.77  0.76    0.84  0.91    0.70  0.87    0.90  0.95
Exons correct        0.37  0.29    0.51  0.32    0.41  0.46    0.79  0.79
Exons overlapped     0.67  0.62    0.83  0.38    0.84  0.79    0.96  0.93

Low C+G              I     II      I     II      I     II      I     II
Correlation (CC)     0.81  0.62    0.62  0.62    0.72  0.67    0.92  0.81
Sensitivity          0.82  0.56    0.51  0.45    0.79  0.71    0.93  0.80
Specificity          0.85  0.71    0.87  0.89    0.75  0.67    0.94  0.84
Exons correct        0.80  0.47    0.25  0.16    0.40  0.37    0.85  0.68
Exons overlapped     0.85  0.63    0.55  0.42    0.85  0.58    0.85  0.74

GENSCAN was run on GeneParser test sets I (28 sequences) and II (34 sequences), described in Snyder & Stormo (1995). Accuracy statistics for programs other than GENSCAN are from Table 1 of Snyder & Stormo (1995). For each program, accuracy statistics for test set I are shown in the left column, for test set II in the right column. Nucleotide-level accuracy statistics Sn, Sp and CC were calculated as described in the legend to Table 1, except that the convention used for averaging the statistics was that of Snyder and Stormo. In this alternative approach, the raw numbers (PP, PN, AP, AN, TP, etc.) from each sequence are summed and the statistics calculated from these total numbers rather than calculating separate statistics for each sequence and then averaging. (For large sequence sets, these two conventions almost always give similar results.) Exon-level accuracy statistics are also calculated in this fashion. Here, exons correct is the proportion of true exons which were predicted exactly (both endpoints correct), essentially the same as exon-level sensitivity. Exons overlapped is the proportion of true exons which were at least overlapped by predicted exons, a less stringent measure of accuracy not requiring exact prediction of splice sites. Each test set was divided into three subsets according to the C+G content of the GenBank sequence: low C+G (<45%), medium C+G (45 to 60%), and high C+G (>60%).

Of course, it might be argued that none of the accuracy results described above are truly indicative of the program's likely performance on long genomic contigs, since all three of the test sets used consist primarily of relatively short sequences containing single genes, whereas contigs currently being generated by genome sequencing laboratories are often tens to hundreds of kilobases in length and may contain several genes on either or both DNA strands. To our knowledge, only one systematic test of a gene prediction program (GRAIL) on long human contigs has so far been reported in the literature (Lopez et al., 1994), and the authors encountered a number of difficulties in carrying out this test, e.g. it was not always clear whether predicted exons not matching the annotation were false positives or might indeed represent real exons which had not been found by the original submitters of the sequence. As a test of the performance of gene prediction programs on a large human contig, we ran GENSCAN and GRAIL II on the recently sequenced CD4 gene region of human chromosome 12p13 (Ansari-Lari et al., 1996), a contig of 117 kb in length in which six genes have been detected and characterized experimentally.

Annotated genes, GENSCAN predicted genes, and GRAIL predicted exons in this sequence are displayed in Figure 1: both programs find most of the known exons in this region, but significant differences between the predictions are observed.
Comparison of the GENSCAN predicted genes (GS1 through GS8) with the annotated (known) genes showed that: GS1 corresponds closely to the CD4 gene (the predicted exon at about 1.5 kb is actually a non-coding exon of CD4); GS2 is identical to one of the alternatively spliced forms of Gene A; GS3 contains several exons from both Gene B and GNB3; GS5 is identical to ISOT, except for the addition of one exon at around 74 kb; and GS6 is identical to TPI, except with a different translation start site. This leaves GS4, GS7 and GS8 as potential false positives, which do not correspond to any annotated gene, of which GS7 and GS8 are overlapped by GRAIL predicted exons.

A BLASTP (Altschul et al., 1990) search of the predicted peptides corresponding to GS4, GS7 and GS8 against the non-redundant protein sequence databases revealed that: GS8 is substantially identical (BLAST score 419, P = 2.6E-57) to mouse 60S ribosomal protein (SwissProt accession no. P47963); GS7 is highly similar (BLAST score 150, P = 2.8E-32) to Caenorhabditis elegans predicted protein C26E6.5 (GenBank accession no. 532806); and GS4 is not similar to any known protein (no BLASTP hit with P < 0.01). Examination of the sequence around GS8 suggests that this is probably a 60S ribosomal protein pseudogene. Predicted gene GS7 might be an expressed gene, but we did not detect any hits against the database of expressed sequence tags (dbEST) to confirm this. However, we did find several ESTs substantially identical to the predicted 3' UTR and exons of GS4 (GenBank accession no. AA070439, W92850, AA055898, R82668, AA070534, W93300 and others), strongly implying that this is indeed an expressed human gene which was missed by the submitters of this sequence (probably because GRAIL did not detect it). Aside from the prediction of this novel gene, this example also illustrates the potential of GENSCAN to predict the number of genes in a sequence fairly well: of the eight genes predicted, seven correspond closely to known or putative genes and only one (GS3) corresponds to a fusion of exons from two known genes.

Discussion

As the focus of the human genome project shifts from mapping to large-scale sequencing, the need for efficient methods for identifying genes in anonymous genomic DNA sequences will increase. Experimental approaches will always be required to prove the exact locations, transcriptional activity and splicing patterns of novel genes, but if computational methods can give accurate and reliable indications of exon locations beforehand, the experimental work involved may often be significantly reduced. We have developed a probabilistic model of human genomic sequences which approximates many of the important structural and compositional features of human genes, and have described the implementation of this model in the GENSCAN program to predict exon/gene locations in genomic sequences. Novel features of the method include: (1) use of distinct, explicit, empirically derived sets of model parameters to capture differences in gene structure and composition between distinct C+G compositional regions (isochores) of the human genome; (2) the capacity to predict multiple genes in a sequence, to deal with partial as well as complete genes, and to predict consistent sets of genes occurring on either or both DNA strands; and (3) new statistical models of donor and acceptor splice sites which capture potentially important dependencies between signal positions. Significant improvements in predictive accuracy have been demonstrated for GENSCAN over existing programs, even those which use protein sequence homology information, and we have shown that the program can be used to detect novel genes even in sequences previously subjected to intensive computational and experimental scrutiny.

In practice, several distinct types of computer programs are often used to analyze a newly sequenced genomic region. The sequence may first be screened for repetitive elements with a program like CENSOR (Jurka et al., 1996). Following this, GENSCAN and/or other gene prediction programs could be run, and the predicted peptide sequences searched against the protein sequence databases with BLASTP (Altschul et al., 1990) to detect possible homologs. If a potential homolog is detected, one might perhaps refine the prediction by submitting the genomic region corresponding to the predicted gene together with the potential protein homolog to the program Procrustes (Gelfand et al., 1996), which uses a "spliced alignment" algorithm to match the genomic sequence to the protein. Even in the absence of a protein homolog, it may be possible to confirm the expression and precise 3' terminus of a predicted gene using the database of Expressed Sequence Tags (Boguski, 1995). Finally, a variety of experimental approaches such as RT-PCR and 3' RACE are typically used (see, e.g., Ansari-Lari et al., 1996) to pinpoint precise exon/intron boundaries and possible alternatively spliced forms. At this stage, computational approaches may also prove useful, e.g. GENSCAN high

Figure 1. A diagram of GenBank sequence HSU47924 (accession no. U47924, length 116,879 bp) is shown with annotated coding exons (from the GenBank CDS features) in black, GENSCAN predicted exons in dark gray, and GRAIL predicted exons in light gray. Exons on the forward strand are shown above the sequence line; on the reverse (complementary) strand, below the sequence line. GRAIL II was run through the email server (grail@): final predicted exons of any quality are shown. Exon sizes and positions are to scale, except for initial, terminal and single-exon genes, which have an added arrow-head or -tail (see key above) which causes them to appear slightly larger than their true size. Since GRAIL does not indicate distinct exon types (initial versus internal versus terminal exons), all GRAIL exons are shown as internal exons. Gene names for the six annotated genes in this region (CD4, Gene A, Gene B, GNB3, ISOT and TPI) are shown on the annotation line, immediately preceding the first coding exon of the gene. The GENSCAN predicted genes are labeled GS1 to GS8 as they occur along the sequence.
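The accuracy measures defined in the legend to Table 1 can be sketched in a few lines of Python. This is an illustrative reading of those definitions, not the evaluation code used in the paper: the function names, the boolean-list representation of coding nucleotides, and the inclusive (start, end) representation of exons are all assumptions made for the example. Statistics with a zero denominator are returned as None, mirroring the legend's note that averages are taken only over sequences for which a measure is defined.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or None when the ratio is undefined."""
    return numerator / denominator if denominator else None


def nucleotide_accuracy(predicted, actual):
    """Per-nucleotide Sn, Sp, CC and AC from equal-length coding flags."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    pp, pn = tp + fp, tn + fn   # predicted positives / negatives
    ap, an = tp + fn, tn + fp   # actual positives / negatives
    cc = _ratio(tp * tn - fp * fn, math.sqrt(pp * pn * ap * an))
    terms = [_ratio(tp, ap), _ratio(tp, pp), _ratio(tn, an), _ratio(tn, pn)]
    ac = None if None in terms else 0.5 * sum(terms) - 1
    return {"Sn": _ratio(tp, ap), "Sp": _ratio(tp, pp), "CC": cc, "AC": ac}


def _overlaps(exon, others):
    """True if exon shares at least one position with any exon in others."""
    return any(exon[0] <= o[1] and o[0] <= exon[1] for o in others)


def exon_accuracy(predicted, actual):
    """Exon-level Sn, Sp, ME and WE; exons are inclusive (start, end) pairs."""
    exact = len(set(predicted) & set(actual))   # both endpoints correct
    missed = sum(1 for e in actual if not _overlaps(e, predicted))
    wrong = sum(1 for e in predicted if not _overlaps(e, actual))
    return {
        "Sn": _ratio(exact, len(actual)),
        "Sp": _ratio(exact, len(predicted)),
        "ME": _ratio(missed, len(actual)),
        "WE": _ratio(wrong, len(predicted)),
    }


# Toy data, invented for illustration only.
pred_flags = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0]
true_flags = [0, 1, 1, 0, 0, 0, 1, 1, 1, 0]
nuc = nucleotide_accuracy(pred_flags, true_flags)

true_exons = [(10, 20), (30, 40), (50, 60)]
pred_exons = [(10, 20), (30, 45), (70, 80)]
exo = exon_accuracy(pred_exons, true_exons)
```

In the toy exon example, the predicted exon (30, 45) overlaps a true exon without matching it exactly, so it lowers Sn and Sp but counts as neither a missed nor a wrong exon. This is why the legend reports ME and WE alongside the exact-match statistics.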