Mining phenotypes and informative genes from gene expression data


Microbial genetics


Mendel
• Born Johann Mendel in 1822
  – Took the name Gregor as a monk
• Mendel was a member of a monastery in what is now the Czech Republic
• Studied physics and botany at the University of Vienna (1851-1853)
• Began first hybridization experiments on the garden pea in 1856
• Research ended in 1868 when promoted to abbot
Major advantages of classical genetic approach
○ Mutants can be isolated and characterized without any a priori understanding of the molecular basis of the function.
Dominant: An allele which is expressed (masks the other).
Recessive: An allele which is present but remains unexpressed (masked).
Homozygous: Both alleles for a trait are the same.
Heterozygous: The organism's alleles for a trait are different.
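The four terms above can be illustrated with a single-gene cross. The following is a minimal sketch, not part of the original slides: the function names and the convention that an uppercase letter marks the dominant allele are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def cross(parent1, parent2):
    """Count offspring genotypes for a one-gene cross (a Punnett square).

    Each parent is a two-letter genotype such as "Aa". By assumption,
    an uppercase letter marks the dominant allele.
    """
    offspring = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sorting puts the uppercase allele first, so "aA" is counted as "Aa".
        offspring["".join(sorted(allele1 + allele2))] += 1
    return offspring


def phenotype(genotype):
    """A single dominant allele is enough to mask the recessive one."""
    return "dominant" if any(a.isupper() for a in genotype) else "recessive"


def zygosity(genotype):
    """Homozygous if both alleles are the same, otherwise heterozygous."""
    return "homozygous" if genotype[0] == genotype[1] else "heterozygous"


genotypes = cross("Aa", "Aa")
print(dict(genotypes))  # {'AA': 1, 'Aa': 2, 'aa': 1}
for genotype, count in genotypes.items():
    print(genotype, count, zygosity(genotype), phenotype(genotype))
```

Crossing two heterozygous parents gives the 1:2:1 genotype ratio, which corresponds to a 3:1 ratio of dominant to recessive phenotypes.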

RIC1 positively regulates auxin signalling and negatively regulates ABA signalling during root growth and development in Arabidopsis


Arabidopsis ROP-interactive CRIB motif-containing protein 1 (RIC1) positively regulates auxin signalling and negatively regulates abscisic acid (ABA) signalling during root development

YUNJUNG CHOI 1, YUREE LEE 1, SOO YOUNG KIM 3, YOUNGSOOK LEE 1,2 & JAE-UNG HWANG 1

1 POSTECH-UZH Global Research Laboratory, Division of Molecular Life Sciences, Pohang University of Science and Technology (POSTECH), Pohang 790-784, Korea, 2 Division of Integrative Bioscience and Biotechnology, POSTECH, Pohang 790-784, Korea and 3 Department of Molecular Biotechnology & Kumho Life Science Laboratory, College of Agriculture and Life Sciences, Chonnam National University, Gwangju 500-757, Korea

ABSTRACT

Auxin and abscisic acid (ABA) modulate numerous aspects of plant development together, mostly in opposite directions, suggesting that extensive crosstalk occurs between the signalling pathways of the two hormones. However, little is known about the nature of this crosstalk. We demonstrate that ROP-interactive CRIB motif-containing protein 1 (RIC1) is involved in the interaction between auxin- and ABA-regulated root growth and lateral root formation. RIC1 expression is highly induced by both hormones, and expressed in the roots of young seedlings. Whereas auxin-responsive gene induction and the effect of auxin on root growth and lateral root formation were suppressed in the ric1 knockout, ABA-responsive gene induction and the effect of ABA on seed germination, root growth and lateral root formation were potentiated. Thus, RIC1 positively regulates auxin responses, but negatively regulates ABA responses. Together, our results suggest that RIC1 is a component of the intricate signalling network that underlies auxin and ABA crosstalk.

Key-words: hormone crosstalk; lateral root; RIC protein; root growth; ROP GTPase.

INTRODUCTION

Auxin and abscisic acid (ABA) are two major plant growth regulators. In general, auxin promotes the growth of vegetative tissues, whereas ABA suppresses proliferation and confers stress resistance. For example, auxin promotes
lateral root initiation, whereas ABA inhibits it. Auxin opens stomata, whereas ABA closes them. Such antagonistic effects of two hormones have been reported to regulate numerous stress responses and developmental and physiological processes in the plant (Gehring, Irving & Parish 1990; Casimiro et al. 2003; Tanaka et al. 2006). The interaction between auxin and ABA seems to be more complex during early seedling development and primary root elongation than later on. Although both auxin and ABA are necessary for early seedling development, exogenously applied ABA, which presumably is applied at a significantly greater concentration than the endogenous hormone, inhibits growth. Primary root elongation is promoted by nanomolar amounts of both auxin and ABA (Gaither, Lutz & Forrence 1975; Mulkey, Kuzmanoff & Evans 1982), but is inhibited by higher concentrations of these hormones (Pilet & Chanson 1981; Mulkey et al. 1982; Eliasson, Bertell & Bolander 1989).

The observation that the two hormones function together to regulate many responses indicates that the signalling pathways that transduce the primary hormonal signals to downstream responses may intersect at specific points and/or involve common players. Indeed, the expression of ABA INSENSITIVE 3 (ABI3) is activated by both auxin and ABA, and ABI3 functions as a positive regulator of ABA-mediated inhibition of seed germination and as a negative regulator of auxin-mediated lateral root formation and ABA-mediated inhibition of primary root growth (Brady et al. 2003; Zhang, Garreton & Chua 2005). Given the vast array of responses of plants to auxin and ABA, one would expect that many such points of crosstalk exist; however, this aspect of auxin and ABA signal transduction remains largely unexplored.

Rho family GTPases act as molecular switches that mediate diverse cellular responses to multiple extracellular signals, including hormones (Bos 2000). ROP (Rho of plants; also called RAC) GTPases represent the sole Rho family of Ras-related G proteins in plants
(Yang 2002), and the model plant Arabidopsis contains 11 ROP GTPases in its genome (Bischoff et al. 1999; Winge et al. 2000; Zheng & Yang 2000). Several studies have reported that ROP GTPases play important roles in auxin- and ABA-related responses (Lemichez et al. 2001; Tao, Cheung & Wu 2002; Zheng et al. 2002; Bloch et al. 2005; Tao et al. 2005). Auxin treatment increases the amount of activated ROP GTPase in tobacco (Tao et al. 2002) and Arabidopsis (Xu et al. 2010; Lin et al. 2012) seedlings. Overexpression of wild-type or constitutively active forms of ROP GTPases stimulates auxin-related phenotypes and auxin-responsive gene expression in Arabidopsis and tobacco (Li et al. 2001; Tao et al. 2002, 2005). Activated ROP GTPases promote the 26S proteasome-dependent degradation of auxin/indole-3-acetic acid (AUX/IAA) proteins in tobacco and Arabidopsis (Tao et al. 2005). ROP GTPase mutations cause defects in auxin-dependent cell expansion (Fu et al. 2005, 2009; Xu et al. 2010). In contrast to the positive role of ROP GTPases in the auxin response, ROP GTPases appear to be negative regulators of ABA responses. ABA treatment reduces the amount of activated ROP GTPase in Arabidopsis suspension cells and seedlings (Lemichez et al. 2001). Expression of constitutively active forms of Arabidopsis ROP2 and ROP6 reduces sensitivity to ABA during seed germination (Li et al. 2001) and stomatal closing (Lemichez et al. 2001; Hwang et al. 2011). The observation that an Arabidopsis mutant that lacks ROP10 expression is hypersensitive to ABA, and that ROP10 expression is suppressed by ABA, suggests the existence of an interesting feedback regulation loop in the ABA signalling pathway (Zheng et al. 2002).

Correspondence: Y. Lee. Fax: +82542792199; e-mail: ylee@postech.ac.kr; J-U. Hwang. Fax: +82542792199; e-mail: thecute@postech.ac.kr
Y.L. and J-U.H. contributed equally to the manuscript.
Plant, Cell and Environment (2013) 36, 945–955. doi:10.1111/pce.12028. © 2012 Blackwell Publishing Ltd

RICs (ROP-interactive CRIB motif-containing proteins) are a
unique group of interacting partners of activated ROP GTPases. RIC proteins interact with multiple ROP GTPases via their conserved CRIB motif, and link ROP proteins to diverse target molecules that bind to their variable domains (Yang 2002). The 11 RIC genes present in Arabidopsis are categorized into four phylogenetic groups (Wu et al. 2001; Gu et al. 2005). However, knowledge on RIC functions is limited; RIC3 and RIC4 have been shown to regulate [Ca2+]cyt and F-actin dynamics during the polar growth of pollen tubes (Wu et al. 2001; Gu et al. 2005). RIC7 is reported to interact with active ROP2 in stomatal guard cells and to suppress light-induced stomatal opening (Jeon et al. 2008). In epidermal cells of the leaf and hypocotyl, RIC1 suppresses anisotropic cell expansion by regulating microtubule (MT) dynamics (Fu et al. 2005, 2009; Xu et al. 2010).

RIC1 is expressed in a broad range of tissues (Wu et al. 2001). However, the function of RIC1 has been analysed mostly in the development of leaf pavement cells (Fu et al. 2005). In this cell type, RIC1 is associated with MTs and regulates their assembly. In the lobe-forming regions of pavement cells, RIC1 is inactivated by active ROP2, which suppresses MT assembly, but promotes fine F-actin assembly and thereby induces outgrowth of the region. In contrast, in the neck-forming regions of pavement cells, RIC1 is activated by active ROP6 and then promotes the assembly of MTs, which limits the expansion of the region and results in the formation of a narrow neck. The cortical MTs in the leaf pavement cells of ric1 mutants are randomly organized, resulting in pavement cells with wider necks. This ROP6-RIC1-MT signalling pathway seems to function in both hypocotyl elongation and leaf epidermal cell development (Fu et al. 2005, 2009). In pollen tubes, however, RIC1 is localized to the apical plasma membrane, where MTs are absent, and overexpression of RIC1 suppresses the depolarized tube growth induced by ROP1 overexpression (Wu et al. 2001). Given its broad expression pattern, RIC1 may
mediate diverse processes in the growth and development of plants, which have yet to be elucidated. In this work, we established that RIC1 positively regulates the auxin effect and negatively regulates the ABA effect during root growth and lateral root development. These results will advance our currently limited understanding of the mode of action of RIC1 protein during regulation of plant development by auxin and ABA.

MATERIALS AND METHODS

Plant materials and growth conditions

Seeds of wild-type, ric1, and ric1/RIC1p:GFP:RIC1 Arabidopsis thaliana plants (ecotype Ws) were surface sterilized, placed at 4°C in the dark for 2 d, and then sown in half-strength Murashige and Skoog (MS) agar medium. Arabidopsis seedlings were grown in a growth chamber with a 16 h light/8 h dark cycle at 22°C.

Isolation of the RIC1 knockout mutants

Seeds of the T-DNA insertion mutant for RIC1 (ric1; FLAG_075E05) were obtained from Institut National de la Recherche Agronomique (INRA)-Versailles Genomic Resource Center (http://www-ijpb.versailles.inra.fr/en/cra/cra_accueil.htm). Reverse transcriptase (RT)-PCR analysis using gene-specific primers confirmed that this is a null mutant. Primer information used for RT-PCR is available in Supporting Information Table S1.

Complementation of ric1 with RIC1p:GFP:RIC1

For the ric1 complementation assay, the RIC1 promoter region (~2 kb) and the RIC1 open reading frame were individually obtained by PCR amplification. These genomic DNA fragments and the GFP coding sequence were sequentially cloned into a pCR®8/GW/TOPO® vector (Invitrogen, Carlsbad, CA, USA), and then transferred into a pMDC100 gateway vector (Curtis & Grossniklaus 2003). The RIC1p:GFP:RIC1 construct was transformed into ric1 plants by the Agrobacterium-mediated floral dipping method (Clough & Bent 1998). The phenotypes of the T3 seedlings of homozygous ric1/RIC1p:GFP:RIC1 lines were observed.

RIC1p:GUS expression assay

The genomic DNA fragment containing the promoter region (~2 kb) and first exon of RIC1 was amplified by PCR and fused to the GUS-coding region of the pMDC164 vector (RIC1p:GUS). The RIC1p:GUS construct was transformed into wild-type Arabidopsis plants using the Agrobacterium-mediated floral dipping method (Clough & Bent 1998). RIC1p:GUS expression was observed in T3 plants from six independently transformed lines. Briefly, the seedlings of RIC1p:GUS were incubated in GUS staining buffer [100 mM Na2HPO4 (pH 7.2), 3 mM potassium ferricyanide, 3 mM potassium ferrocyanide, 10 mM ethylenediaminetetraacetic acid (EDTA), 0.1% Triton X-100, and 2 mM 5-bromo-4-chloro-3-indolyl-β-D-glucuronide (Duchefa, Haarlem, The Netherlands)] at 37°C for 12 h. Chlorophyll was extracted in 70% ethanol solution.

Observation of the cellular localization of GFP:RIC1

Wild-type Arabidopsis plants were stably transformed with a GFP:RIC1 construct under the control of the CaMV 35S promoter. In multiple independently transformed lines of Arabidopsis, the subcellular localization of GFP:RIC1 was observed by using a Zeiss LSM510 Meta laser scanning microscope (Zeiss). To investigate the effects of auxin and ABA on the cellular localization of RIC1, 7-day-old seedlings that stably express GFP:RIC1 were incubated in half-strength MS medium containing 1 µM auxin [naphthalene-1-acetic acid (NAA)] or 10 µM ABA for 1 h.

Quantification of RIC and ABA- or auxin-responsive gene transcript levels

Quantitative real-time RT-PCR (Q-PCR) was used to quantify transcript levels of RIC genes, ABA-responsive genes and auxin-responsive genes. Total RNA was extracted from each sample and then reverse transcribed into cDNA.
Q-PCR was carried out using a Takara TP800 thermal cycler and Takara SYBR RT-PCR Kit (Takara Bio, Kyoto, Japan), following the manufacturer's instructions. Transcript levels of RICs, ABA-responsive genes and auxin-responsive genes were normalized against that of tubulin 8.

Measurement of lateral root formation and primary root growth

To examine the effects of ABA or auxin on lateral root formation and primary root growth, Arabidopsis seedlings were grown for 4 d under a 16 h photoperiod and then transferred to fresh half-strength MS agar plates supplemented with the indicated concentrations of ABA or auxin. After an additional 5–7 d, the net elongation of primary roots was measured and the number of lateral roots was counted using a stereo microscope (Olympus SZX12, Tokyo, Japan).

Seed germination assay

For germination assays, the seeds were placed in the dark at 4°C for 2 d and then sown on MS medium agar plates containing 1% sucrose and 0–1.5 µM ABA. The seeds were incubated under a 16 h photoperiod at 22°C. Germinated seeds (as determined by cotyledon greening or radicle emergence) were scored every 12 h for 6 d. Germination ratio refers to the number of germinated seeds as a proportion of the total number of seeds tested.

RESULTS

RIC1 expression is induced by both auxin and ABA

Members of the ROP small GTPase family are reported to mediate ABA and auxin responses (Li et al. 2001; Zheng et al. 2002; Tao et al. 2005). We hypothesized that RIC proteins might serve as important intermediaries in the ROP-mediated ABA and auxin signalling pathways. If this hypothesis were true, then the level of the RIC proteins may be substantially regulated by ABA and auxin. Thus, we tested the effect of auxin and ABA on the expression of 10 out of the 11 Arabidopsis RIC genes (Fig. 1). Arabidopsis seedlings were grown on half-strength MS medium for 7 d and then treated with 1 µM NAA or 10 µM ABA for 1 h. Total RNA was isolated from these seedlings and RIC gene transcript levels were examined using
quantitative real-time PCR (Q-PCR). Upon NAA and ABA treatment, the transcript level of many RICs (RIC2, 3, 4, 5, 7, 9 and 11) increased, but the increase in RIC1 was the highest. Therefore, we chose to focus our analysis on RIC1.

Figure 1. Expression of RIC genes was induced by auxin and abscisic acid (ABA). Seven-day-old Arabidopsis seedlings were treated without or with 1 µM auxin [naphthalene-1-acetic acid (NAA)] or 10 µM ABA for 1 h, and total RNA was isolated from seedlings for Q-PCR analysis. The transcript levels of RIC genes were normalized against the transcript level of Tubulin 8, which served as the internal control, and are presented as values relative to the untreated control. Data are means ± SEM of three to eight biological replicates. Asterisks indicate values that are statistically significantly different from the untreated control (***P < 0.005; **P < 0.001; *P < 0.05).

Loss of RIC1 expression alters gene induction by auxin and ABA

Auxin and ABA induce the expression of sets of genes, which are known as auxin-responsive and ABA-responsive genes, respectively (Brady et al. 2003; Zhang et al. 2005; Li et al. 2009). To examine the involvement of RIC1 in auxin and ABA signal transduction, we evaluated the effect of RIC1 knockout (ric1) on the expression levels of typical auxin- and ABA-responsive genes. ric1, a T-DNA insertion mutant (Stock No. FLAG_075E05), was obtained from INRA-Versailles Genomic Resource Center (http://www-ijpb.versailles.inra.fr/en/cra/cra_accueil.htm). RT-PCR analysis confirmed that the T-DNA insertion into the fourth exon completely blocked the expression of RIC1 in ric1 (Fig. 2a).

The small auxin-up RNAs (SAURs) encode short transcripts that accumulate rapidly upon auxin treatment (Li et al. 2009). IAA6 and IAA19 are members of INDOLE-3-ACETIC ACID/AUXIN (IAA/AUX) genes and their expressions are induced by auxin (Abel, Nguyen & Theologis 1995; Tatematsu et al. 2004). Using Q-PCR analysis, we compared the transcript levels of SAUR and IAA genes in ric1 seedlings with those in wild-type seedlings (Fig. 2b). Under control conditions without NAA treatment, the transcript levels of these genes in ric1 seedlings were similar to or slightly higher than those in wild-type seedlings (Fig. 2b). In 7-day-old wild-type seedlings, treatment with 1 µM NAA for 1 h induced a two- to fourfold increase in SAUR gene expression (Fig. 2b). Interestingly, however, the induction of SAUR genes by 1 µM NAA was suppressed in ric1 seedlings (Fig. 2b); SAUR9 transcript level increased 3.3 ± 0.1-fold in the wild type, but 1.7 ± 0.1-fold in ric1 (t-test, P < 0.005); SAUR15 transcript level increased 3.9 ± 0.6-fold in the wild type, but 1.8 ± 0.2-fold in ric1 (t-test, P < 0.01); SAUR23 transcript level increased 3.0 ± 0.3-fold in the wild type, but 1.4 ± 0.2-fold in ric1 (t-test, P < 0.005); SAUR62 transcript level increased 2.5 ± 0.2-fold in the wild type but 1.4 ± 0.2-fold in ric1 (t-test, P < 0.001); and SAUR66 transcript level increased 3.9 ± 0.4-fold in the wild type, but 1.7 ± 0.2-fold in ric1 (t-test, P < 0.001). Similarly, induction of IAA6 and IAA19 upon NAA treatment was suppressed by ric1 (Fig. 2b, bottom panel): the transcript level of IAA6 increased 21.9 ± 1.3-fold in the wild type, but only 11.3 ± 0.2-fold in ric1 (P < 0.05), and the transcript level of IAA19 increased 27.1 ± 3.3-fold in the wild type, but only 13.1 ± 1.9-fold in ric1 (P < 0.05). These results indicate that RIC1 is involved in the control of the auxin signalling pathway.

ABI3, ABI5, responsive to ABA 18 (RAB18), and responsive to dehydration 29A (RD29A) and 29B (RD29B) are well-characterized ABA-responsive genes that play critical roles in ABA signalling (Parcy et al. 1994; Finkelstein & Lynch 2000; Lopez-Molina & Chua 2000; Hoth et al. 2002; Kang et al. 2010). To analyse the involvement of RIC1 in ABA signalling, we gauged the effects of ABA on expression of these genes in the roots of wild-type and ric1 seedlings (Fig. 2c). Seven-day-old seedlings
were incubated in half-strength liquid MS medium in the presence or absence of 0.5 µM ABA for 1 h. Under control conditions (i.e. in the absence of ABA), transcript levels of ABI3, ABI5, RD29A, RD29B and RAB18 were slightly higher in ric1 seedlings than in wild-type seedlings, and this difference was further increased after ABA treatment (Fig. 2c). Upon ABA treatment, ric1 seedlings exhibited much higher transcript levels of those five ABA-responsive genes, compared with wild-type seedlings (Fig. 2c); ABI3 transcript level increased 2.4 ± 0.5-fold in the wild type, but 5.5 ± 0.9-fold in ric1 (P < 0.01); ABI5 transcript level increased 5.9 ± 0.4-fold in the wild type, but 12.9 ± 1.9-fold in ric1 (P < 0.005); RD29A transcript level increased 12.3 ± 3.9-fold in the wild type, but 17.4 ± 6.6-fold in ric1 (P < 0.06); RD29B transcript level increased 5.5 ± 1.2-fold in the wild type, but 10.9 ± 0.8-fold in ric1 (P < 0.05); and RAB18 transcript level increased 4.6 ± 1.1-fold in the wild type, but 8.0 ± 1.3-fold in ric1 (P < 0.01).

In summary, RIC1 knockout suppressed the induction of auxin-responsive genes by auxin, but promoted the induction of ABA-responsive genes by ABA. These results suggest that RIC1 exerts opposite regulatory functions in the auxin and ABA signalling pathways.

RIC1 is expressed in the roots of young seedlings

To identify which auxin- and ABA-mediated processes are regulated by RIC1, we first determined the tissue-specific and developmental stage-specific expression of RIC1. The genomic DNA region containing the RIC1 promoter (~2 kb) and the first exon was fused to the GUS-coding region (RIC1p:GUS), and introduced into wild-type plants.
RIC1p:GUS expression was observed in T3 seeds and seedlings from six independently transformed Arabidopsis lines (Fig. 3a,b). In germinating seeds and young seedlings, the RIC1p:GUS signal was evident in roots. In germinating seeds, RIC1p:GUS signal was limited to the embryonic root tip (Fig. 3a, left), and in seedlings at 1–3 d after sowing, RIC1p:GUS expression extended to other parts of the root, including the differentiation zone, root hairs and root–shoot junction (Fig. 3a, right). In the roots of 2-week-old plants, RIC1p:GUS signal was detected in root tips and also in the maturation zone, where lateral roots grow out (Fig. 3b). RIC1p:GUS signal was strongly detected in columella cells of the root tip (Fig. 3b-d). In the maturation zone of the root, cells surrounding emerged lateral roots (Fig. 3b-b) and epidermal cells at the base of lateral roots (Fig. 3b-c) showed clear RIC1p:GUS signals. These RIC1 expression patterns indicate that RIC1 is likely to be involved in the regulation of seed germination, early seedling development and root development.

In addition to being expressed in the roots, RIC1p:GUS was also expressed in the hypocotyls, petioles, and weakly in the leaves of young seedlings (Fig. 3a,b). In Arabidopsis plants of later developmental stages, expression of RIC1p:GUS was weak except in flowers, where RIC1 expression was previously reported (Wu et al. 2001; Fu et al. 2005, 2009; Xu et al. 2010); RIC1p:GUS signal was strongly observed in the anthers and mature pollen grains (Supporting Information Fig. S1).

Figure 2. ric1 knockout mutation altered induction of auxin- and abscisic acid (ABA)-responsive genes. (a) Schematic structure of the RIC1 gene (left). The triangle indicates the T-DNA insertion site in ric1. Exons are represented as boxes and introns as lines. RT-PCR analysis using a RIC1-specific primer set (RT-F and RT-R) shows that ric1 is a true null mutant (right). Tubulin 8 was used as an internal control. (b) Expression of SAUR9, SAUR15, SAUR23, SAUR62, SAUR66, IAA6 and IAA19 in plants treated or not with 1 µM auxin [naphthalene-1-acetic acid (NAA)]. (c) Expression of ABI3, ABI5, RD29A, RD29B and RAB18 in plants treated or not with 0.5 µM ABA. Q-PCR analyses of transcripts of auxin- and ABA-responsive genes were performed using total RNA isolated from the roots of 8-day-old seedlings after 1 h of treatment without or with auxin or ABA. Data were normalized using Tubulin 8 as an internal control, and are presented as values relative to the untreated wild type (WT). Data are means ± SEM of four independent experiments. Asterisks indicate values that are significantly different from those of the WT (***P < 0.005; **P < 0.01; *P < 0.05; #P < 0.06).

Figure 3. RIC1 expression in Arabidopsis plants. (a) One day after sowing (DAS), an embryo exhibited RIC1p::GUS signal at the root tip (left, indicated by arrow). Seed coat was removed after GUS staining for observation. A young seedling that had just germinated but had not yet undergone cotyledon expansion (right; 1–3 DAS) exhibited relatively stronger RIC1p::GUS signal in the root tip and differentiation zone of the root including the root hairs. Bar = 300 µm. (b) A 2-week-old seedling displayed RIC1p::GUS signal in the root tip and maturation zone with emerged lateral roots, and, weakly, in the shoot. (b-a) Bar = 1 cm. (b-b), (b-c) and (b-d) are enlarged images of maturation zone with an emerged lateral root, a lateral root with GUS staining at the base, and a root tip with GUS staining in columella cells.

RIC1 knockout suppresses the effect of auxin on lateral root formation and primary root elongation

As RIC1 is expressed in root (Fig. 3) and RIC1 expression is up-regulated by auxin (Fig. 1), we examined whether auxin-dependent root growth and lateral root formation were affected in the ric1 mutant (Fig. 4). Arabidopsis seedlings were grown on half-strength MS medium for 4 d and then transferred to fresh half-strength MS medium supplemented with various concentrations (0–100 nM) of NAA. After 5 d, the number of lateral roots (including newly emerged ones) was counted. Auxin promoted the formation of lateral roots in different genotypes, including the wild type, ric1, and ric1/RIC1p:GFP:RIC1 (complementation lines, C1 and C2; Fig. 4a); however, this effect was significantly less in ric1 than in the wild type and complementation lines (Fig. 4b). In the absence of exogenous auxin (0 nM NAA), ric1 plants produced more lateral roots than did the wild type and ric1 complementation lines (n = 28, N = 4, P < 0.05; Fig. 4b). However, in the presence of auxin, lateral root number was less in ric1 plants than in the wild type (Fig. 4b, P < 0.05). Whereas 80 and 100 nM NAA increased the number of lateral roots per unit length (cm) of primary root in wild-type seedlings to 734 and 920% of non-treated control values, respectively, the same concentrations of NAA increased this number in ric1 to 466 and 613% of control values, respectively (Fig. 4c). This alteration in lateral root formation in ric1 plants was completely reversed by the expression of RIC1 driven by its native promoter (ric1/RIC1p:GFP:RIC1). In two independent ric1/RIC1p:GFP:RIC1 lines (C1 and C2), lateral root formation was recovered to wild-type levels both in the absence and presence of NAA (Fig. 4b,c).

Figure 4. Auxin responses were reduced in ric1 plants. Arabidopsis seedlings were grown vertically on half-strength Murashige and Skoog (MS) medium for 4 d, transferred to fresh half-strength MS medium supplemented or not with 0–100 nM auxin [naphthalene-1-acetic acid (NAA)], and grown for an additional 5–7 d. (a) Levels of RIC1 transcript in wild-type (WT), ric1, and ric1/RIC1p:GFP:RIC1 (lines C1 and C2) plants. RIC1 expression in seedlings was quantified using Q-PCR and presented as values relative to that of WT. Data are means ± SEM of four independent experiments. (b) Number of lateral roots formed in the absence or presence of NAA (means ± SEM of 28 seedlings from four independent experiments), measured 5 d after transfer to medium supplemented with or without NAA. Asterisks indicate values that are significantly different from those of the WT at P < 0.05. (c) Relative values of lateral root number (mean ± SEM, n = 28). Values in (b) were normalized to the values of non-treated controls. Asterisks indicate values that are significantly different from those of the WT (***P < 0.005). (d) Primary root elongation in the absence or presence of NAA (means ± SEM of 28 seedlings from four independent experiments). The net primary root growth was measured 7 d after transfer to medium supplemented with or without NAA. Asterisks indicate values that are significantly different from those of the WT (**P < 0.05; *P < 0.1). (e) Relative values of net primary root elongation (mean ± SEM, n = 28). Values in (d) were normalized to the values of non-treated controls. Asterisks indicate values that are significantly different from those of the WT (***P < 0.005; *P < 0.05).

The effect of RIC1 knockout on primary root elongation was also examined (Fig. 4d,e). Seven days after transfer to half-strength MS medium supplemented or not with NAA, the net elongation of primary roots was measured. In medium lacking NAA, ric1 mutants had reduced primary root elongation compared with wild-type plants (n = 28, N = 4, P < 0.05); the net primary root elongation of wild-type seedlings was 4.9 ± 0.16 cm, while that of ric1 seedlings was 4.3 ± 0.22 cm (Fig. 4d). NAA (50–100 nM) inhibited primary root elongation in both ric1 and wild-type seedlings (Fig. 4d,e). However, ric1 seedlings were less sensitive than the wild type to NAA (n = 28, N = 4, P < 0.1); whereas 100 nM NAA reduced primary root elongation in wild-type seedlings by 34.4% (to 3.2 ± 0.16 cm), it inhibited that in ric1 seedlings by only 19.4% (to 3.5 ± 0.13 cm). Primary root elongation in the complementation lines (C1 and C2) was similar to that in the wild type, in both the absence and presence of NAA (Fig. 4d,e). These results suggest that RIC1 participates in auxin-regulated lateral root development and primary root elongation.

RIC1 knockout enhances the effect of ABA
on seed germination, lateral root formation and primary root elongation

We then tested whether RIC1 knockout also altered the plant's response to ABA by comparing seed germination and root development in ric1 and wild-type plants treated with ABA (Figs 5 & 6). Seeds were sown on half-strength MS medium after 2 d of stratification. In the presence of 0.5 µM ABA, seed germination, as gauged by cotyledon greening, was delayed to a greater extent in ric1 than in wild-type seeds (Fig. 5a,b); whereas only 25% of ric1 seedlings exhibited cotyledon greening 84 h after sowing, 66% of wild-type seedlings exhibited greening (Fig. 5b). Similar enhanced sensitivity to ABA in inhibition of seed germination was observed when germination rate was analysed based on radicle emergence (Supporting Information Fig. S2). Seed germination rates in the ric1/RIC1p:GFP:RIC1 lines were restored to wild-type levels in the presence of 0.5 µM ABA, confirming that loss of RIC1 expression was responsible for the enhanced suppression of seed germination by ABA (Fig. 5b). The delayed germination of ric1 seeds in the presence of ABA was not due to a defect in seed development, because the germination rate of ric1 seeds in the absence of ABA was not significantly different from that of wild type (Fig. 5c and Supporting Information Fig. S2b).

ABA was reported to inhibit root elongation and lateral root development (Pilet & Chanson 1981). We compared primary root elongation and lateral root number in ric1 and wild-type plants in the presence and absence of exogenous

Figure 5. Inhibition of seed germination by abscisic acid (ABA) was enhanced in the ric1 mutant. (a) Representative photographs showing young seedlings of wild type (WT), ric1, and the two ric1/RIC1p:GFP:RIC1 lines (C1 and C2) in the presence of 0.5 µM ABA (taken 96 h after sowing). (b) Seed germination rate in the presence of 0.5 µM ABA, measured as a percentage of seedlings with green cotyledons at the indicated time points after sowing on half-strength Murashige and Skoog (MS) medium supplemented with ABA. Data are means ± SEM of three independent experiments. (c) Seed germination rate in the absence of ABA. There was no significant difference between genotypes.
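The germination ratio defined in the Methods (germinated seeds as a proportion of all seeds tested, scored at fixed intervals) can be sketched as follows. This is an illustrative sketch only, not the authors' analysis code: the function names and the seed counts are hypothetical, since the excerpt reports percentages but not raw counts.

```python
def germination_ratio(germinated, total):
    """Germinated seeds as a proportion of the total number of seeds tested."""
    if total <= 0:
        raise ValueError("total must be a positive number of seeds")
    if not 0 <= germinated <= total:
        raise ValueError("germinated must lie between 0 and total")
    return germinated / total


def germination_time_course(counts_by_hour, total):
    """Map each scoring time (hours after sowing) to its germination ratio."""
    return {
        hour: germination_ratio(count, total)
        for hour, count in sorted(counts_by_hour.items())
    }


# Hypothetical counts for one plate of 50 seeds, scored every 12 h.
counts = {60: 10, 72: 20, 84: 33, 96: 41}
for hour, ratio in germination_time_course(counts, total=50).items():
    print(f"{hour} h after sowing: {ratio:.0%} germinated")
```

Repeating the calculation per genotype and per independent experiment would give the values that are then averaged and plotted as in Figure 5b.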

First new genus named after Chinese botanist Wu Zhengyi published


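... 10 miRNA sequences match rDNA perfectly, and the sequences of 60 identified piRNAs (Piwi-interacting RNAs, RNAs that interact with Piwi proteins), such as mouse piR-16, also match rDNA perfectly. Together these findings suggest that small RNAs that perfectly match rDNA are very likely a class of non-coding small RNAs with biological functions.

... may have biological functions, as miRNAs and similar molecules do; in addition, many currently known small RNAs match rDNA perfectly, which means that these known small RNAs are srRNAs.

... part can match rDNA perfectly. Further studies in a mouse diabetes model ...

[New species discoveries] First new genus named after Chinese botanist Wu Zhengyi published

According to a report in Science and Technology Daily (科技日报) on 25 February 2013, 征镒麻属 (Zhengyia), a new genus of Urticaceae and the first genus named after the renowned botanist Academician Wu Zhengyi (吴征镒), ... in an internationally authoritative ...

... and a renowned scholar in the field of plant resources research. He made outstanding contributions to the development of modern botany in China and to the protection and use of plant resources, and received the State Preeminent Science and Technology Award for 2007.

... in collaboration with Jishou University and Yunnan Normal University, comprehensively examined multidisciplinary evidence from traditional morphology, micromorphology, chromosomes and molecular systematics, established that a tall herbaceous plant of Urticaceae collected in Shennongjia, Hubei, represents a new genus, and formally named the genus 征镒麻属 and the new species 征镒麻. The study reports for the first time the chromosome number of the new genus and ...

New hynobiid salamander species found in Jilin: the "Jilin clawed salamander" (吉林爪鲵)

According to Guangming Online (光明网) on 9 March 2013, citing Xinwen Wanbao (新闻晚报), several years ...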

A method of mining key genes of diseases and its applications


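Classification number: TP301 Q81
Security level: Public
UDC:
Institution code: 10424

Degree thesis

A method of mining key genes of diseases and its applications

Yang Meixi

Degree applied for: Master
Major: Applied Mathematics
Supervisor: Wang Shudong
Title: Professor
Shandong University of Science and Technology
May 2012

Thesis title: A method of mining key genes of diseases and its applications
Author: Yang Meixi
Date of enrolment: September 2009
Major: Applied Mathematics
Research direction: Bioinformatics and intelligent computing
Supervisor: Wang Shudong
Title: Professor
Thesis submission date: May 2012
Thesis defence date: June 2012
Degree conferral date:

A METHOD OF MINING KEY GENES OF DISEASES AND ITS APPLICATIONS

A Dissertation submitted in fulfillment of the requirements of the degree of
MASTER OF SCIENCE
from
Shandong University of Science and Technology
by
Yang Meixi
Supervisor: Professor Wang Shudong
College of Information Science and Engineering
May 2012

Declaration

This master's thesis, which I submit to Shandong University of Science and Technology, is entirely the result of research I carried out under the guidance of my supervisor, except for the listed references and generally acknowledged literature. The thesis material has not been submitted to any other academic institution for assessment.

Signature of master's candidate:
Date:

AFFIRMATION

I declare that this dissertation, submitted in fulfillment of the requirements for the award of Master of Science in Shandong University of Science and Technology, is wholly my own work unless referenced or acknowledged. The document has not been submitted for qualification at any other academic institute.

Signature:
Date:

Abstract

Identifying the "key" genes of a disease among its many pathogenic genes is of great importance for the diagnosis and treatment of some stubborn diseases and for drug design, and it is an important topic in current bioinformatics research.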

Asthma Immune Phenotypes


Allergic Asthma: Pathways
Generation of Allergic Adaptive Immune Responses
[Figure labels: APC (CD80, CD86, MHC II); T lymphocyte (TCR, CD28); Th1; Th2; IL-4; IL-13; IL-5; B cell; IgE; mast cell; eosinophils]
Severe Asthma
• Definition
• Phenotypes - Pathologic/Clinical
• Therapeutic Options
Inflammation and Remodeling in Asthma
Courtesy of Marilyn Glassberg, MD
Lung Function: Inhibition of IL-13
Corren et al. NEJM 2011; 365:1088
Non-eosinophilic Asthma
Eosinophilic and non-eosinophilic asthma: pathologic comparison
Asthma: Epidemiology
• Between 150-300 million patients worldwide
• 15-25 million in the U.S.
• Most common chronic disease of childhood
• Over 500,000 E.R. visits per year
• 25,000 ICU admissions
• 5-6,000 deaths in U.S.
• On the increase

ch7. Online Mendelian Inheritance in Man (OMIM): A Directory of Human Genes and Genetic Disorders


7. Online Mendelian Inheritance in Man (OMIM): A Directory of Human Genes and Genetic Disorders

Donna Maglott, Joanna S. Amberger, and Ada Hamosh
Created: October 9, 2002

Summary

Online Mendelian Inheritance in Man (OMIM™*) is a timely, authoritative compendium of bibliographic material and observations on inherited disorders and human genes. It is the continuously updated electronic version of Mendelian Inheritance in Man (MIM). MIM was last published in 1998 (1) and is authored and edited by Dr. Victor A. McKusick and a team of science writers, editors, scientists, and physicians at The Johns Hopkins University and around the world (2). Curation of the database and editorial decisions take place at The Johns Hopkins University School of Medicine. OMIM provides authoritative free text overviews of genetic disorders and gene loci that can be used by clinicians, researchers, students, and educators. In addition, OMIM has many rich connections to relevant primary data resources such as bibliographic, sequence, and map information.

Content and Access

OMIM Entries

OMIM comprises descriptive, full-text MIM entries, a tabular Synopsis of the Human Gene Map [/htbin-post/Omim/getmap] that includes the Morbid Map [http://www./htbin-post/Omim/getmorbid], clinical synopses, and mini-MIMs.

OMIM entries are authored and edited by experts in the field and by the OMIM staff, based on information in the published literature. All entries are assigned a unique, stable, six-digit ID number and provide names and symbols used for the disorder and/or gene, a literature-based description, citations, contributor information, and creation and editing dates. Because MIM is derived from the primary literature, the text is replete with citations and links to PubMed.
OMIM authors create entries for each unique gene or Mendelian disorder for which sufficient information exists and do not wittingly create more than one entry for each gene locus.

MIM is organized into autosomal, X-linked, Y-linked, and mitochondrial catalogs, and MIM numbers are assigned sequentially within each catalog (Table 1). The kinds of information that may be included in MIM entries are approved name and symbol (obtained from the HUGO Nomenclature Committee), alternative names and symbols in common use, and a text description of the disease or gene. Many of the longest entries and most new entries in MIM have headings within the text that may include Clinical Features, Inheritance, Population Genetics, Heterogeneity, Genotype/Phenotype Correlations, Cloning, Gene Structure, Gene Function, Mapping, and more. Information on selected disease-causing mutations is contained in the Allelic Variant section of the entry describing the gene.

* Trademark status. OMIM™ and Online Mendelian Inheritance in Man™ are trademarks of The Johns Hopkins University.

Table 1. The OMIM numbering system.

MIM number range (a)    Explanation
100000-199999           Autosomal loci or phenotypes (entries created before May 15, 1994)
200000-299999           Autosomal loci or phenotypes (entries created before May 15, 1994)
300000-399999           X-linked loci or phenotypes
400000-499999           Y-linked loci or phenotypes
500000-599999           Mitochondrial loci or phenotypes
600000-                 Autosomal loci or phenotypes (entries created after May 15, 1994)

(a) MIM numbers are frequently preceded by a symbol. An asterisk (*) before a MIM number indicates that the entry describes a distinct gene or phenotype and that the mode of inheritance of the phenotype has been proved (in the judgment of the authors and editors) and that the phenotype described is not known to be determined by a gene represented by other asterisked entries in MIM.
A number sign (#) before a MIM number describing a phenotype indicates that the phenotype is caused by mutation in a gene represented by another entry and usually in any of two or more genes represented by other entries. The number sign is also used for phenotypes that result from specific chromosomal aberrations, such as Down syndrome, and for contiguous gene syndromes, such as Langer-Giedion syndrome. Whenever a number sign is used, the reason is stated at the outset of the entry. The absence of an asterisk (or other sign) preceding the number indicates that the distinctness of the phenotype as a mendelizing entity or the characterization of the gene in the human is not established.

With the increasing complexity of biological information, OMIM makes a critical contribution by distilling what is known about a gene or disease into a single, searchable entry. The rich text of the OMIM entry, along with the source reference citations, makes it easy to retrieve data of interest. The OMIM entry can then serve as a gateway to other sources of related information via the many curated and computed links within each entry.

The OMIM Gene Map

The OMIM Synopsis of the Human Gene Map [/htbin-post/Omim/getmap] is a tabular listing of the genes and loci represented in MIM ordered pter to qter from chromosome 1–22, X, and Y. The information in the map includes the cytogenetic location, symbol, title, MIM number, method of mapping, comments, associated disorders and their MIM numbers, and the map location of the mouse ortholog.
Links are provided from the cytogenetic location to the human Map Viewer [/mapview], from MIM numbers to OMIM entries, and from mouse map locations to the Mouse Genome Database (MGD [http:///]).

The Synopsis of the human gene map has also been sorted alphabetically by disorder and is referred to as the Morbid Map.

Access

OMIM can be found either by direct query via Entrez, from other data resources within NCBI that connect to OMIM directly (for example, LocusLink or UniGene), or through Entrez cross-indexing (for example, from a PubMed abstract of an article cited in an OMIM entry). OMIM is indexed for retrieval in Entrez using a weighted system so that if the query term(s) appears in the title of a MIM entry, it will appear at the top of the retrieval list. Field restrictions [http://www.ncbi.nlm.nih.gov/entrez/Omim/omimhelp.html#SearchFields] are supported for some types of information, or one can use the Limits page to restrict a search (Figure 1). There are also several format options for viewing a retrieval set that may affect which entries are displayed (Box 1).

Figure 1: The Limits options for searching OMIM in Entrez.

Queries can also be entered in the query box shown on all Entrez database pages, selecting OMIM as the database in the Search option. The Preview/Index, History, Clipboard, or Cubby functions can also be used for OMIM, as for the rest of Entrez.

The Gene Map can be accessed from an individual OMIM entry via the cytogenetic location displayed when appropriate under the entry titles. The Gene Map may also be queried directly [/htbin-post/Omim/getmap]. When queried directly, the first entry that matches the query is shown in the top row of the table, followed by 19 entries ordered by cytogenetic location. The Find Next button can be used to find additional gene map entries that match the query.

Guide to OMIM Pages

Query Bar

OMIM is queried via a standard Entrez query bar.
The mechanics of selecting entries to display, choosing how to display them, and identifying related entries either within Entrez or from external resources also follow the Entrez/LinkOut standard. The display options (Box 1) allow the user to format results in several ways. A useful function particular to OMIM is the option to display Allelic Variants, Clinical Synopsis, or mini-MIM views for a retrieval set.

OMIM Navigation

The OMIM homepage and the search results pages share the same navigational links to the advanced query page (Gene Map [/htbin-post/Omim/getmap], Morbid Map [/htbin-post/Omim/getmorbid]), Help [http://www.ncbi.nlm.nih.gov/entrez/Omim/omimhelp.html] documents, FAQs [/entrez/Omim/omimfaq.html], statistics [/entrez/Omim/dispupdates.html], and related resources [/entrez/Omim/allresources.html].

When viewing the text of an OMIM entry, however, the navigational links serve as an electronic table of contents. The section headings within the entry are listed as in a table of contents, and selecting one moves the display to that section. Within an entry, selecting the MIM# link takes you back to the top of the entry.

OMIM staff actively contribute to the curation of data in LocusLink. Thus, if the MIM number is represented in LocusLink, a reciprocal LocusLink link is provided in the section to the left. Other links provided by the LocusLink collaboration may also be listed in this section, e.g., links to Nomenclature, Reference Sequences, or UniGene clusters that are specific to the subject OMIM entry.

The sequence links in the LocusLink section may be different from the Entrez indexing links available via the Links link at the top right of an entry. The Entrez indexing links result indirectly from the references in the OMIM entry and may include related sequences in other species, for example.
Thus, OMIM pages allow two levels of sequence connection: the specific ones in the left section under LocusLink and the indirect but still informative ones through the Entrez indexing link at the upper right. More information on OMIM link types can be found in the Help [http://www.ncbi./entrez/Omim/omimhelp.html#LinkTypesOverview] documents.

OMIM entries may also contain a link to LinkOut for resources external to NCBI (see Chapter 17). Some of these external resources are curated by OMIM staff, in which case they are displayed by name. Others can be seen either by selecting LinkOut in the Links pull-down menu or by selecting the LinkOut display option in the query bar.

Entrez Links

At the top of any report page, or associated with each entry in the query result page, are the links to related data generated from Entrez (Chapter 15). Here, PubMed links to the PubMed abstracts of the reference citations in the entry. Related Entries are to all other OMIM entries referenced in the subject entry. Nucleotide, Protein, and LinkOut connections are as documented in the previous section.

The OMIM Entry

Each OMIM entry has a unique number and is given a primary title and symbol. This is the title that is displayed in the document retrieval list. Alternative designations are listed below the primary title. Some entries contain information that is related but not synonymous to the primary topic and is not addressed in another entry (e.g., splice variants, phenotypic variants, etc.). This information is set off by the word "included" in the title. The first "included" title is displayed in the document retrieval list. The cytogenetic map location when known is given for each entry. When a disease shows genetic heterogeneity, more than one map location may be given. The "light bulb" icon at the end of text paragraphs links to related articles in PubMed. References within the text are linked to the complete citation at the end of the entry.
There, the PubMed ID is linked to the PubMed abstract.

Some entries contain an Allelic Variants section, which lists noteworthy mutations for the gene. Allelic variants are given a 10-digit number: the 6-digit number of the parent locus, followed by a decimal point and a unique 4-digit variant number. Criteria for inclusion include the first mutation to be discovered, high population frequency, distinctive phenotype, historic significance, unusual mechanism of mutation, unusual pathogenetic mechanism, and distinctive inheritance (e.g., dominant with some mutations, recessive with other mutations in the same gene). Most of the allelic variants represent disease-producing mutations. A few polymorphisms are included, many of which show a statistically significant association with specific common disorders.

FTP

The OMIM data are available for bulk transfer [/entrez/Omim/omimfaq.html#download], but it should be noted that there are restrictions on use.

The OMIM™ database, including the collective data contained therein, is the property of The Johns Hopkins University, which holds the copyright thereto. The OMIM database is made available to the general public subject to certain restrictions. You may use the OMIM database and data obtained from this site for your personal use, for educational or scholarly use, or for research purposes only. The OMIM database may not be copied, distributed, transmitted, duplicated, reduced, or altered in any way for commercial purposes or for the purpose of redistribution without a license from The Johns Hopkins University. Requests for information regarding a license for commercial use or redistribution of the OMIM database may be sent via email to techlicense@.

Legal Statement

OMIM is funded by a contract from the National Library of Medicine and the National Human Genome Research Institute and by licensing fees paid to the Johns Hopkins University by commercial entities for adaptations of the database.
The terms of these licenses are being managed by the Johns Hopkins University in accordance with its conflict of interest policies.

References

1. McKusick VA, et al. Mendelian Inheritance in Man. 12th ed. Baltimore: Johns Hopkins University Press; 1998.
2. Hamosh A. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res 30:52–55; 2002. (PubMed)

Box 1: Display options for viewing OMIM query results.

Title (default)
Details
Clinical Synopsis
Allelic Variants
mini-MIM
ASN.1
LinkOut
Related Entries
Genome Links
Nucleotide Links
Protein Links
PubMed Links
SNP Links
Structure Links
UniSTS Links

Obtaining multiple views of a query result:
1. Enter query term or terms (example: renal failure hypertension).
2. Default display is Titles.
3. Select Clinical Synopsis and click on Display at the left to see the Clinical Synopsis section of all entries that have them.
4. Similarly, select mini-MIM or Allelic Variants.

NOTE: In the same bar, the number of entries to display and the format in which to display them can be configured by use of the Show and Text buttons, respectively.
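The numbering rules described in the chapter (the ranges in Table 1 and the 10-digit allelic variant number made of a 6-digit parent locus number, a decimal point and a 4-digit variant number) can be illustrated with a small Python sketch. This is not an OMIM or Entrez API: the function names `mim_catalog` and `parse_allelic_variant` are hypothetical, the identifier in the usage note is made up, and the upper bound 999999 for the open-ended 600000- range is an assumption based on MIM numbers being six digits long.

```python
import re

# Ranges transcribed from Table 1 of the chapter.
# Assumption: the open-ended "600000-" range stops at 999999 (six digits).
MIM_CATALOGS = [
    (100000, 199999, "Autosomal loci or phenotypes (entries created before May 15, 1994)"),
    (200000, 299999, "Autosomal loci or phenotypes (entries created before May 15, 1994)"),
    (300000, 399999, "X-linked loci or phenotypes"),
    (400000, 499999, "Y-linked loci or phenotypes"),
    (500000, 599999, "Mitochondrial loci or phenotypes"),
    (600000, 999999, "Autosomal loci or phenotypes (entries created after May 15, 1994)"),
]


def mim_catalog(mim_number):
    """Return the Table 1 explanation for a six-digit MIM number."""
    for low, high, label in MIM_CATALOGS:
        if low <= mim_number <= high:
            return label
    raise ValueError(f"{mim_number} is not a six-digit MIM number")


def parse_allelic_variant(variant_id):
    """Split an allelic variant ID into (parent MIM number, variant number)."""
    match = re.fullmatch(r"(\d{6})\.(\d{4})", variant_id)
    if match is None:
        raise ValueError(f"{variant_id!r} is not of the form NNNNNN.NNNN")
    return int(match.group(1)), int(match.group(2))
```

For example, `parse_allelic_variant("123456.0001")` (a made-up identifier) splits into the parent number 123456 and the variant number 1, and `mim_catalog(123456)` then reports the autosomal catalog for entries created before May 15, 1994.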

Illumina microarray copy number variation analysis workflow


Analyzing copy number variations (CNVs) in Illumina microarray data can be a challenging but incredibly informative process. Illumina microarrays are a high-throughput technology widely used in genomics research, and their data can provide information on copy number variation in the genome.

CNVs refer to structural variations in the DNA that involve gains or losses of sections of the genome, and they have been implicated in various human diseases. Illumina microarrays are commonly used to detect and analyze CNVs due to their high resolution and ability to simultaneously assess thousands of genetic markers.

One of the first steps in the analysis of CNVs from Illumina microarray data is the pre-processing of raw intensity signals. This involves normalization of the data to correct for systematic variations in intensities across samples, as well as quality control measures to assess the reliability of the data. The goal is to ensure that the data is of high quality and free from technical artifacts that could impact the accuracy of CNV calling. Pre-processing of the data is crucial to obtaining reliable results in downstream analyses.

After pre-processing, the next step is CNV calling, which involves identifying regions of the genome that exhibit differences in copy number compared to a reference sample. There are various algorithms available for CNV calling from Illumina microarray data, each with its own strengths and limitations. Commonly used algorithms include PennCNV, QuantiSNP, and Nexus Copy Number. These algorithms use statistical models to assess the likelihood of a CNV at specific genomic loci and provide a measure of confidence in the call.

Once CNVs have been called, the next step is to annotate and interpret the results. This involves mapping the identified CNVs to the human genome and determining their potential functional consequences. CNVs can impact gene expression, disrupt gene structures, or alter regulatory regions, so understanding their effects is crucial for linking them to disease phenotypes.
Various bioinformatics tools and databases can assist in the annotation of CNVs and provide insights into their biological significance.

In addition to data analysis, it is essential to validate identified CNVs using independent experimental methods. This can include quantitative PCR, droplet digital PCR, or fluorescence in situ hybridization to confirm the presence and precise boundaries of the CNVs. Validation is critical to ensure the reliability of the findings and eliminate false positives that may arise from bioinformatics analyses. By combining computational analysis with experimental validation, researchers can confidently characterize CNVs and their implications in various diseases.

Overall, analyzing CNVs from Illumina microarray data is a comprehensive and multi-step process that requires a combination of bioinformatics skills, statistical knowledge, and experimental validation. Despite the challenges, the insights gained from studying CNVs can provide valuable information about the genetic basis of diseases and pave the way for precision medicine approaches.
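To make the CNV calling step more concrete, the following Python sketch shows a deliberately simple threshold rule applied to per-probe log R ratio values. It is only a toy illustration of the idea of flagging runs of consecutive probes with shifted intensity; it is not the statistical model used by PennCNV, QuantiSNP or Nexus Copy Number. The function names, the cutoffs (-0.3 and 0.2), the minimum run length and the example values are all assumptions chosen for illustration.

```python
def classify_probe(value, loss_cutoff, gain_cutoff):
    """Label one log R ratio value as 'loss', 'gain' or 'normal'."""
    if value <= loss_cutoff:
        return "loss"
    if value >= gain_cutoff:
        return "gain"
    return "normal"


def call_cnv_segments(lrr, loss_cutoff=-0.3, gain_cutoff=0.2, min_probes=3):
    """Return (start_index, end_index, state) for each run of at least
    `min_probes` consecutive probes that share a non-normal state.
    The cutoffs are illustrative assumptions, not validated defaults."""
    states = [classify_probe(v, loss_cutoff, gain_cutoff) for v in lrr]
    states.append("end")  # sentinel so the final run is flushed
    segments = []
    run_start, run_state = 0, None
    for i, state in enumerate(states):
        if state != run_state:
            if run_state in ("loss", "gain") and i - run_start >= min_probes:
                segments.append((run_start, i - 1, run_state))
            run_start, run_state = i, state
    return segments


# Made-up log R ratios along one chromosome, ordered by position.
example_lrr = [0.01, -0.02, -0.45, -0.50, -0.41, -0.48, 0.03,
               0.00, 0.35, 0.02, 0.31, 0.40, 0.38, -0.01]
segments = call_cnv_segments(example_lrr)
```

With the made-up values above, the isolated high probe at index 8 is ignored because it does not reach the minimum run length, which mirrors the point in the text that calls should be filtered for technical artifacts and later validated experimentally.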

Mining of hub genes affecting sperm motility in testes and epididymides of breeder cocks by WGCNA method


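Jiangsu Journal of Agricultural Sciences (Jiangsu J. of Agr. Sci.), 2023, 39(3): 762-769
http://jsnyxb.jaas.ac.cn
YUAN Jia-ni, ZHAO Yan-hui, SHI Yu-mei, et al. Mining of hub genes affecting sperm motility in testes and epididymides of breeder cocks by WGCNA method [J]. Jiangsu Journal of Agricultural Sciences, 2023, 39(3): 762-769.
doi: 10.3969/j.issn.1000-4440.2023.03.017

Mining of hub genes affecting sperm motility in testes and epididymides of breeder cocks by WGCNA method

YUAN Jia-ni, ZHAO Yan-hui, SHI Yu-mei, NI He-min, GUO Yong, SHENG Xi-hui, QI Xiao-long, WANG Xiang-guo, XING Kai
(Animal Science and Technology College, Beijing University of Agriculture, Beijing 102206, China)

Received: 2022-06-25
About the first author: YUAN Jia-ni (1999-), female, from Jincheng, Shanxi; master's student; research interests: functional genomics and bioinformatics. (E-mail) BUAYjn@163.com
Corresponding author: XING Kai, (E-mail) xk181986@163.com

Abstract: The sperm motility of breeder cocks is crucial for the sustainable development of poultry farming. Co-expression modules and hub genes regulating sperm motility in the testis and epididymis were explored by weighted gene co-expression network analysis (WGCNA), and a regulatory network related to sperm motility in breeder cocks was constructed. Transcriptome sequencing data previously generated by the team from testis and epididymis tissues of breeder cocks with high and low sperm motility were analyzed. The gene co-expression network was constructed by the WGCNA method, and gene modules significantly associated with phenotypic traits were identified. GO functional annotation and KEGG pathway enrichment analysis were performed for the genes of the key modules. Cytoscape software was used to screen the hub genes of each key module and to visualize the co-expression network. The results showed that 14 227 genes were clustered into 11 modules, and the Turquoise, Yellow and Red modules were identified as significantly associated with the phenotype using a coefficient of determination (R²) ≥ 0.6 and P < 0.05 as criteria. Functional analysis of the genes in the three key modules showed that these genes were significantly enriched in nucleotide excision repair, homologous recombination, metabolism of xenobiotics by cytochrome P450, the MAPK signaling pathway, apoptosis and other pathways. The selected IFT family genes, together with HMOX2, CYP4B1, ANG and ITGB2, were hub genes related to sperm motility of breeder cocks and could be used as potential genes for improving sperm motility.
Key words: breeder cocks; sperm motility; weighted gene co-expression network analysis (WGCNA); hub gene
CLC number: S831.2    Document code: A    Article ID: 1000-4440(2023)03-0762-08

Breeder cocks play an important role in poultry production; each cock can fertilize more than 1 000 hatching eggs per year [1]. For the poultry industry, highly fertile breeder cocks improve the economic returns of livestock and poultry production. Semen quality is the most important reproductive trait of breeder cocks; good semen quality improves the efficiency with which cocks are used and accelerates genetic improvement in chickens [2]. Semen quality in cocks mainly comprises semen color, sperm motility and sperm density [3]; sperm motility reflects the viability and fertilizing capacity of sperm and best represents semen quality. Sperm motility is a highly heritable trait [2], so studying its molecular genetic mechanism is an effective way to improve sperm motility in breeder cocks.

Weighted correlation network analysis (WGCNA) is an important method for studying gene function by constructing co-expression networks [4]. WGCNA uses gene co-expression data to partition a large number of genes into a small number of modules; once the modules have been associated with traits, the key modules containing hub genes can be identified and the hub genes screened [5]. WGCNA clusters genes with similar expression patterns [4] and analyzes the relationship between genes and traits, and it has been widely applied in animal breeding. Liu et al. [6] performed WGCNA on sequencing data from 28 cattle sperm samples and found that sperm DNA methylation may affect the reproductive performance of bulls. Xu et al. [7], using Hu sheep testes as material, performed WGCNA on testis sequencing data from sheep of different ages (in months), identified two gene modules highly correlated with testis development, and found that DNAH17, SPATA4, PDGFA, VIM and INHBA are key genes affecting testis size. Robic et al. [8] took CYP11A1 or HSD17B3, genes that affect steroid metabolism in pig testis, as the core and identified the modules containing these two genes in order to find more genes affecting steroid metabolism. No study has yet used WGCNA to identify key genes for sperm motility in breeder cocks. This study uses WGCNA to identify hub genes affecting sperm motility in the testis and epididymis of breeder cocks, in order to further elucidate the molecular mechanism underlying the inheritance of sperm motility.

1 Materials and methods

1.1 Sample collection and processing
This study analyzed transcriptome sequencing data previously generated by the authors' group from testis and epididymis tissues of breeder cocks with different sperm motility [9-10]. Sperm motility was measured beforehand; four cocks each with high and low sperm motility were selected, and testis and epididymis tissues were collected after slaughter. The tissue samples were divided into group G (testis) and group F (epididymis), and then into H (high motility) and L (low motility) groups according to sperm motility. Transcriptome data were aligned and quantified with TopHat2 [11] and HTSeq [12], and genes with an expression level of 0 were removed.

1.2 Construction of the co-expression network
The gene co-expression network was constructed with the WGCNA package [13]. First, the correlation coefficient between every pair of genes was calculated. The pickSoftThreshold function was used to determine the optimal soft threshold, and a soft-thresholding power of 7 was chosen to raise the correlation matrix to a power and build a scale-free adjacency matrix. The adjacency function was used to convert the adjacency matrix into a TOM matrix. Based on the dissimilarity of the TOM matrix, genes were clustered and divided into modules by hierarchical clustering and dynamic tree cutting. Principal component analysis (PCA) was performed on the genes of each module to obtain the module eigengene (ME).

1.3 Selection of target modules
To screen modules related to the phenotype data, the correlation between module eigengenes (ME) and sample phenotypes was calculated. Modules significantly correlated with the sample phenotype were selected using a coefficient of determination (R²) ≥ 0.6 and a significance level of 0.05 as criteria. The phenotype data are shown in Table 1.

Table 1 Phenotypes corresponding to different sample groups

Phenotype    Sperm motility (%)    Tissue
H-G          86.20±1.81            Testis
L-G          28.85±2.25            Testis
H-F          86.20±1.81            Epididymis
L-F          28.85±2.25            Epididymis

Sperm motility measurement: measured with a Weili (伟力) analyzer; for each semen sample, five microscopic fields were randomly selected for analysis and the mean was taken.

1.4 Screening of sperm motility-related genes and network construction
The genes in the target modules were imported into STRING to construct a protein interaction network. The cytoHubba plugin of Cytoscape was used to calculate gene connectivity; the top 15 genes were taken as hub genes, and the network of the top 10 genes was drawn.

1.5 GO and KEGG enrichment analysis of target modules
GO and KEGG enrichment analyses of the genes in the modules related to chicken sperm motility were performed with the clusterProfiler package to identify the biological processes enriched for key genes involved in regulating sperm motility in the testis and epididymis, with a threshold of 0.05.

2 Results and analysis

2.1 WGCNA analysis
After genes with an expression level of 0 were removed from the quantification data, 14 227 genes were used for weighted gene co-expression network analysis. When the soft-thresholding power exceeded 7, the connections between genes in the network followed a scale-free distribution, so power = 7 was chosen to build the scale-free network (Fig. 1). Dynamic tree cutting yielded 11 modules. The Turquoise module, with the largest number of genes, contained 6 753 genes; the Purple module, with the fewest, contained only 49 genes; the grey module represents genes that were not clustered (Fig. 2).

Fig. 1 Screening of power values in weighted gene co-expression network analysis (WGCNA)
Fig. 2 Gene clustering module

2.2 Module correlations and identification of important modules
The association of each gene co-expression module with high- and low-motility testis and epididymis tissue is shown in Fig. 3. Based on the eigengenes of the gene clustering modules, the Yellow module had the highest correlation with high-motility testis, but the correlation was not significant. The Turquoise module was significantly correlated with low-motility testis (R² = 0.6, P = 0.01). The Yellow module was significantly positively correlated with high-motility epididymis (R² = 0.68, P = 0.004). The Red module was significantly positively correlated with low-motility epididymis, with a coefficient of determination of 0.70 (Fig. 3a). Module correlation analysis showed that the Red and Yellow modules were highly correlated, so these two modules can be regarded as specific modules affecting sperm motility (Fig. 3b).

Fig. 3 Correlation of gene co-expression modules
H-G, L-G, H-F and L-F denote high-motility testis, low-motility testis, high-motility epididymis and low-motility epididymis, respectively. a: correlation between gene co-expression modules and traits; b: heat map of module correlations. The legend shows the correlation between modules and testis or epididymis of different sperm motility; red indicates positive correlation, green indicates negative correlation, and a darker color indicates a stronger correlation.

2.3 Network construction and hub gene identification
Fig. 4 shows the networks constructed for the three modules. According to the Cytoscape results, IFT74, TRAF3IP1, NUP153, NUP54, TTC30A, IFT80, NUP155, IFT81, IFT140 and IFT57 are hub genes of the Turquoise module; CYP4B1, LOC421584, APOA4, ENSGALP00000006662, AGXT, SLC51A, GAL9, PLA2G12B, TM6SF2 and FTCD are hub genes of the Yellow module; and MYO1F, CTSS, FES, LOC100857714, RAC2, MYO1G, MPEG1, CCLI10, RSFR and ITGB2 are key genes of the Red module. In addition, HMOX2, CYP4B1 and ANG, which are not drawn in the figure, are also hub genes related to sperm motility of breeder cocks.

Fig. 4 Co-expression network of core genes in key modules
Color depth indicates the level of gene connectivity; a darker color indicates higher connectivity. a: co-expression network of hub genes in the Turquoise module; b: co-expression network of hub genes in the Yellow module; c: co-expression network of hub genes in the Red module.

2.4 Biological function analysis of genes in target modules
The gene enrichment results for the Turquoise module are shown in Fig. 5. The top two significantly enriched terms in the GO categories biological process (BP), cellular component (CC) and molecular function (MF) were, respectively, DNA repair and chromosome organization; RNA polymerase complex and cytoplasm; and catalytic activity acting on a protein and RNA binding (Fig. 5a). KEGG pathway analysis showed that the genes of the Turquoise module were significantly enriched in nucleotide-related pathways such as nucleotide excision repair, ribosome biogenesis and RNA degradation (Fig. 5b).

Fig. 5 Functional enrichment analysis of Turquoise module
a: GO functional annotation of the Turquoise module; b: KEGG enrichment analysis of the Turquoise module. BP: biological process; CC: cellular component; MF: molecular function.

The genes of the Yellow module were significantly enriched in three biological processes (regionalization, anatomical structure morphogenesis, and anterior/posterior pattern specification) and in molecular function terms such as DNA-binding transcription factor activity, transcription regulator activity and sequence-specific DNA binding (Fig. 6a). Metabolism of xenobiotics by cytochrome P450, the MAPK signaling pathway, glutathione metabolism and drug metabolism - cytochrome P450 were the signaling pathways significantly enriched for Yellow module genes (Fig. 6b).

The genes of the Red module were significantly enriched in biological processes such as immune system process, leukocyte cell-cell adhesion, response to bacterium, response to biotic stimulus, positive regulation of T cell activation and lymphocyte activation (Fig. 7a). The signaling pathways mainly involved cytokine-cytokine receptor interaction, the intestinal immune network for IgA production, phagosome, cell adhesion molecules, apoptosis and tight junction (Fig. 7b).

Fig. 6 Functional enrichment analysis of Yellow module
a: GO functional annotation of the Yellow module; b: KEGG enrichment analysis of the Yellow module. BP: biological process; MF: molecular function.

Fig. 7 Functional enrichment analysis of Red module
a: GO functional annotation of the Red module; b: KEGG enrichment analysis of the Red module. BP: biological process; MF: molecular function.

3 Discussion

In this study, WGCNA clustered 14 227 genes into 11 co-expression modules. The ME value of each module was used to correlate the modules with the target trait, and the Turquoise, Yellow and Red modules were identified as the three key target modules for the sperm motility trait. Screening by connectivity in the interaction network then identified the IFT gene family, HMOX2, CYP4B1, ANG and ITGB2 as candidate genes affecting sperm motility.

The genes of the Turquoise module mainly participate in nucleotide excision repair (NER), homologous recombination, RNA degradation and other pathways related to the ribosome. The nucleotide excision repair system is essential during spermatogenesis for removing bulky DNA adducts [14]. Nucleotide excision repair has been found to take part in oxidative stress in rat testis [15]. Analysis of yak and cattle testis transcriptomes showed that differentially expressed long non-coding RNAs (lncRNAs) associated with spermatogenic arrest and male infertility were enriched in the nucleotide excision repair pathway [16], which is similar to the results of this study. Homologous recombination (HR) starts with the generation of programmed DNA double-strand breaks (DSBs), which leads to the exchange of genetic information and to genomic diversity [17]. The homologous recombination pathway of testicular spermatogonial stem cells takes part in the detection and repair of DNA damage [18], so homologous recombination is essential for spermatogenesis in the testis.

IFT family genes and HMOX2 are key genes related to sperm motility in testis tissue. IFT57, IFT74, IFT80, IFT81 and IFT140 of the IFT gene family, together with HMOX2, are hub genes of the Turquoise module and are highly expressed in low-motility testis. Inactivation of IFT172 appears to have no effect on mitosis or meiosis, but it impedes spermatogenesis. IFT gene knockout reduces sperm count and motility in mice by 60%, causing male infertility [19]. Early studies detected the expression of HMOX1 and HMOX2 in reproductive organs such as the human testis [20], suggesting that these genes may be expressed in animal testes. CO derived from HO-1 in Leydig cells regulates spermatogenesis and causes apoptosis of germ cells [21]. HMOX2 affects sperm motility by regulating the production of steroid hormones. IFT family genes and HMOX2 are genes that affect sperm motility in testis tissue.

The genes of the Yellow module may function in xenobiotic metabolism and exogenous drug metabolism. Metabolism of xenobiotics by cytochrome P450, the MAPK signaling pathway and the glutathione metabolism pathway are significantly associated with high-motility epididymis. Xenobiotic metabolism by cytochrome P450 plays an important role in the differentiation of embryonic stem cells into spermatogonial stem cells; a decrease in the relative expression of CYP450 genes within the cytochrome P450 family increases the sperm deformity rate and also affects sperm motility and density [22], which is similar to the results of this study. The MAPK signaling pathway helps maintain the luminal environment required for epididymal storage and sperm maturation by regulating the expression and distribution of tight junction proteins in the epididymis [23]. In addition, the MAPK signaling pathway affects the lactate supply of Sertoli cells and plays a dominant role in regulating the self-renewal of spermatogonial stem cells [24]. Glutathione metabolism is a form of energy metabolism; its metabolite cysteine promotes the synthesis of testicular steroid hormones, which in turn can be converted into testosterone and androstenone, so glutathione metabolism indirectly affects the synthesis of spermatogenesis-related hormones in the testis. The glutathione peroxidase 5 gene (GPx5), in the glutathione metabolism pathway, is strongly expressed in the epididymis; GPx5 regulates the concentration of reactive oxygen species in the epididymis, promotes sperm maturation and maintains sperm integrity [25].

CYP4B1 is a hub gene of the Yellow module and is expressed in high-motility epididymis. CYP4B1 is a mammalian cytochrome P450 monooxygenase that hydroxylates unsaturated fatty acids. Lahnsteiner et al. [26] found that the composition and metabolism of lipids significantly affect sperm motility. In addition, the genes of the Yellow module regulate sperm motility by participating in the metabolic activity of the epididymis.

The genes of the Red module can protect sperm from oxidative stress and have anti-apoptotic and anti-inflammatory functions. Cytokine-cytokine receptor interaction, apoptosis and cell adhesion molecules are pathways associated with the low-motility epididymis trait. Expression of genes in the cytokine-cytokine receptor interaction pathway has been found to lead to epididymitis in males, indicating that this pathway takes part in epididymal inflammation [27]. Exploration of the miRNA and mRNA profiles of fresh and frozen-thawed boar sperm showed that mRNAs differentially expressed between fresh and frozen boar semen were enriched in the cytokine-cytokine receptor interaction pathway [28], indicating that this pathway regulates sperm motility. Apoptosis is the normal, gene-controlled death of cells and maintains the relative stability of the internal environment. Spermatogenesis involves apoptosis of spermatogenic cells, which maintains the numerical balance between Sertoli cells and spermatogenic cells [29]. The expression of genes in the apoptosis pathway is significantly associated with sperm apoptosis and semen quality [30], which is consistent with the results of this study. Expression of adhesion molecules in the epididymis can regulate the inflammatory response in the epididymis [31].

ANG and ITGB1 are genes in the Red module related to sperm motility. ANG participates in the stress response, and its expression increases during inflammation [32]. ANG deletion was found to prevent inflammation-induced changes in the 5'-tsRNA expression profile of sperm and to down-regulate mitochondrial oxidative phosphorylation and translation/ribosome pathways, thereby affecting sperm motility [33]. ITGB1 is one of the genes of the cell adhesion molecule family and participates in cell surface adhesion signaling [34]. Matsuyama et al. [35] found that expression of ITGB1 reduces sperm number and causes dysfunction of Sertoli-germ cell adherens junctions. Azizi et al. [36] found that ITGB1 expression is down-regulated during sperm differentiation. In addition, ANG and ITGB1 are genes related to the inflammatory response of epididymal tissue.

4 Conclusion

Through WGCNA, this study obtained gene sets related to chicken sperm motility and identified three modules significantly associated with sperm motility. The IFT family genes, HMOX2, CYP4B1, ANG, ITGB2 and other genes in the key modules play key roles in regulating chicken sperm motility. The results lay a foundation for elucidating the molecular mechanism by which the testis and epididymis regulate sperm motility in breeder cocks.

References:
[1] LIU Yi-fan. Study on mRNA-miRNA-lncRNA transcriptional regulation of sperm motility traits in chickens based on testis sequencing [D]. Beijing: China Agricultural University, 2018.
[2] XU Xiu-li. Screening of key mRNAs and non-coding RNAs affecting sperm motility in domestic pigeons based on testis transcriptome sequencing [D]. Hangzhou: Zhejiang University, 2021.
[3] XIONG Ting, WU Chong-hua, CHEN Biao, et al. Determination and analysis of semen quality of caged Shanma ducks [J]. Heilongjiang Animal Science and Veterinary Medicine, 2020(14): 53-56.
[4] LANGFELDER P, HORVATH S. WGCNA: an R package for weighted correlation network analysis [J]. BMC Bioinformatics, 2008, 9(1): 559.
[5] CHENG Y, LI L, QIN Z, et al. Identification of castration-resistant prostate cancer-related hub genes using weighted gene co-expression network analysis [J]. J Cell Mol Med, 2020, 24(14): 8006-8017.
[6] LIU S, FANG L, ZHOU Y, et al. Analyses of inter-individual variations of sperm DNA methylation and their potential implications in cattle [J]. BMC Genomics, 2019, 20(1): 888.
[7] XU H, SUN W, PEI S, et al. Identification of key genes related to postnatal testicular development based on transcriptomic data of testis in Hu Sheep [J]. Front Genet, 2021, 12: 773695.
[8] ROBIC A, FARAUT T, FEVE K, et al. Correlation networks provide new insights into the architecture of testicular steroid pathways in pigs [J]. Genes, 2021, 12(4): 519-551.
[9] XING K, CHEN Y, WANG L, et al. Epididymal mRNA and miRNA transcriptome analyses reveal important genes and miRNAs related to sperm motility in roosters [J]. Poult Sci, 2022, 101(1): 101558.
[10] XING K, GAO M J, LI X, et al. An integrated analysis of testis miRNA and mRNA transcriptome reveals important functional miRNA-targets in reproduction traits of roosters [J]. Reproductive Biology, 2020, 20(3): 433-440.
[11] KIM D, PERTEA G, TRAPNELL C, et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions [J]. Genome Biology, 2013, 14(4): R36.
[12] ANDERS S, PYL P T, HUBER W. HTSeq-a Python framework to work with high-throughput sequencing data [J]. Bioinformatics, 2015, 31(2): 166-169.
[13] LANGFELDER P, HORVATH S. WGCNA: an R package for weighted correlation network analysis [J]. BMC Bioinformatics, 2008, 9(1): 559.
[14] GU A H, JI G X, ZHOU Y, et al. Polymorphisms of nucleotide-excision repair genes may contribute to sperm DNA fragmentation and male infertility [J]. Reprod Biomed Online, 2010, 21(5): 602-609.
[15] ZHAO H X, SONG L X, MA N, et al. The dynamic changes of Nrf2 mediated oxidative stress, DNA damage and base excision repair in testis of rats during aging [J]. Experimental Gerontology, 2021, 152: 111460.
[16] WU S X, MIPAM T D, XU C F, et al. Testis transcriptome profiling identified lncRNAs involved in spermatogenic arrest of cattleyak [J]. PLoS One, 2020, 15(2): e0229503.
[17] LIN M, LYU J X, ZHAO D, et al. MRNIP is essential for meiotic progression and spermatogenesis in mice [J]. Biochem Biophys Res Commun, 2021, 550: 127-133.
[18] LE W, QI L, XU C, et al. Preliminary study of the homologous recombination repair pathway in mouse spermatogonial stem cells [J]. Andrology, 2018, 6(3): 488-497.
[19] ZHANG S Y, LIU Y H, HUANG Q, et al. Murine germ cell-specific disruption of Ift172 causes defects in spermiogenesis and male fertility [J]. Reproduction, 2020, 159(4): 409-421.
[20] TRAKSHEL G M, MAINES M D. Detection of two heme oxygenase isoforms in the human testis [J]. Biochem Biophys Res Commun, 1988, 154(1): 285-291.
[21] OZAWA N, GODA N, MAKINO N, et al. Leydig cell-derived heme oxygenase-1 regulates apoptosis of premeiotic germ cells in response to stress [J]. J Clin Invest, 2002, 109(4): 457-467.
[22] FRANASIAK J M, BARNETT R, MOLINARO T A, et al. CYP1A1 3801T>C polymorphism implicated in altered xenobiotic metabolism is not associated with variations in sperm production and function as measured by total motile sperm and fertilization rates with intracytoplasmic sperm injection [J]. Fertility and Sterility, 2016, 106(2): 481-486.
[23] KIM B, BRETON S. The MAPK/ERK-signaling pathway regulates the expression and distribution of tight junction proteins in the mouse proximal epididymis [J]. Biology of Reproduction, 2016, 94(1): 22.
[24] NI F, HAO S, YANG W. Multiple signaling pathways in Sertoli cells: recent findings in spermatogenesis [J]. Cell Death & Disease, 2019, 10(8): 515-541.
[25] YIRUHAN. Spatiotemporal expression of GPx5 in the camel epididymis [D]. Hohhot: Inner Mongolia Agricultural University, 2020.
[26] LAHNSTEINER F, MANSOUR N, CABERLOTTO S. Composition and metabolism of carbohydrates and lipids in Sparus aurata semen and its relation to viability expressed as sperm motility when activated [J]. Comparative Biochemistry and Physiology Part B: Biochemistry and Molecular Biology, 2010, 157(1): 39-45.
[27] SONG X, LIN N H, WANG Y L, et al. Comprehensive transcriptome analysis based on RNA sequencing identifies critical genes for lipopolysaccharide-induced epididymitis in a rat model [J]. Asian J Androl, 2019, 21(6): 605-611.
[28] DAI D H, QAZI I H, RAM M X, et al. Exploration of miRNA and mRNA profiles in fresh and frozen-thawed boar sperm by transcriptome and small RNA sequencing [J]. Int J Mol Sci, 2019, 20(4): 802.
[29] SINGH R, LETAI A, SAROSIEK K. Regulation of apoptosis in health and disease: the balancing act of BCL-2 family proteins [J]. Nat Rev Mol Cell Biol, 2019, 20(3): 175-193.
[30] JI G X, GU A H, HU F, et al. Polymorphisms in cell death pathway genes are associated with altered sperm apoptosis and poor semen quality [J]. Hum Reprod, 2009, 24(10): 2439-2446.
[31] OZTÜRK H, OZTURK H, DOKUCU A I. The role of cell adhesion molecules in ischemic epididymal injury [J]. Int Urol Nephrol, 2007, 39(2): 565-570.
[32] OLSON K A, VERSELIS S J, FETT J W. Angiogenin is regulated in vivo as an acute phase protein [J]. Biochem Biophys Res Commun, 1998, 242(3): 480-483.
[33] ZHANG Y W, REN L, SUN X X, et al. Angiogenin mediates paternal inflammation-induced metabolic disorders in offspring through sperm tsRNAs [J]. Nat Commun, 2021, 12(1): 6673.
[34] MATSUURA N, TAKADA Y. Subclassification, molecular structure, function and ligand in integrin superfamily [J]. Nihon Rinsho Japanese Journal of Clinical Medicine, 1995, 53(7): 1623-1630.
[35] MATSUYAMA T, NIINO N, KIYOSAWA N, et al. Toxicogenomic investigation on rat testicular toxicity elicited by 1,3-dinitrobenzene [J]. Toxicology, 2011, 290(2/3): 169-177.
[36] AZIZI H, NIAZI T A, SKUTELLA T. Successful transplantation of spermatogonial stem cells into the seminiferous tubules of busulfan-treated mice [J]. Reprod Health, 2021, 18(1): 189.

(Responsible editor: XU Yan)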
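The network construction step of the paper (pairwise correlation, raising the correlation to a soft-thresholding power of 7, and deriving a topological overlap matrix, TOM) can be sketched in plain Python as follows. This is a didactic sketch of the standard unsigned WGCNA formulas, not the code used by the authors, who worked with the WGCNA R package; the function names and the four toy expression profiles are made up, and the later steps (hierarchical clustering, dynamic tree cutting and module eigengenes) are not covered here.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjacency(profiles, power=7):
    """Unsigned adjacency a_ij = |cor(i, j)| ** power, with a zero diagonal."""
    n = len(profiles)
    return [[0.0 if i == j else abs(pearson(profiles[i], profiles[j])) ** power
             for j in range(n)] for i in range(n)]


def topological_overlap(adj):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where l_ij is the
    sum over u of a_iu * a_uj and k_i is the connectivity of gene i."""
    n = len(adj)
    k = [sum(row) for row in adj]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # The diagonal of adj is zero, so u == i and u == j add nothing.
                shared = sum(adj[i][u] * adj[u][j] for u in range(n))
                tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom


# Toy profiles (one list per gene, one value per sample): genes 0 and 1 rise
# together, genes 2 and 3 alternate together.
toy_profiles = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [1.1, 2.0, 3.2, 3.9, 5.1, 6.0],
    [2.0, 1.0, 2.0, 1.0, 2.0, 1.0],
    [2.1, 0.9, 2.0, 1.1, 1.9, 1.0],
]
toy_adj = adjacency(toy_profiles, power=7)
toy_tom = topological_overlap(toy_adj)
```

In this toy data the overlap between genes 0 and 1 is much larger than between genes 0 and 2, which is the property that clustering on TOM dissimilarity (1 - TOM) relies on when it groups co-expressed genes into modules.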

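Transcriptome-Based Mining of Genes Related to Lipopeptide Synthesis in Bacillus amyloliquefaciens TF28 under Different Carbon Sources

Received: 2022-11-17. Funding: Heilongjiang Provincial Institutes Basic Applied Technology Research Special Project (2021JBKY002). About the authors: Yan Gengxuan (b. 1996), male, from Harbin, Heilongjiang, assistant researcher, master's degree, mainly engaged in research on plant disease control, (tel.) 187****8190, (e-mail) predawnyan@ ; corresponding author: Xia Haihua (b. 1972), female, from Harbin, Heilongjiang, researcher, master's degree, mainly engaged in research on microbial drug development, (tel.) 133****8598, (e-mail) ******************.

Yan Gengxuan, Wang Xiangxiang, Tian Yuan, et al. Transcriptome-based mining of genes related to lipopeptide synthesis in Bacillus amyloliquefaciens TF28 under different carbon sources [J]. Hubei Agricultural Sciences, 2023, 62(5): 172-178.

Bacillus amyloliquefaciens is a Gram-positive facultative anaerobe that grows well at 28-37 °C and pH 6.5-7.0.

B. amyloliquefaciens grows fast, gives high yields, and is safe and non-pathogenic. It can synthesize a variety of secondary metabolites with antimicrobial activity, including lipopeptides, antimicrobial proteins and polyketides, and is an important host strain in the fermentation industry [1].

Lipopeptides, as

Yan Gengxuan 1,2, Wang Xiangxiang 1, Tian Yuan 1,3, Liu Zhiting 1, Zhang Shumei 1, Xia Haihua 1 (1. Institute of Microbiology, Heilongjiang Academy of Sciences, Harbin 150010; 2. College of Life Sciences, Northeast Forestry University, Harbin 150040; 3. College of Food Science, Northeast Agricultural University, Harbin 150030)

Abstract: Using Bacillus amyloliquefaciens TF28 as the test strain, three experimental groups were set up: a glucose group (control), a fructose group and a xylose group. Differentially expressed genes were identified by transcriptome sequencing and subjected to functional analysis in order to mine genes that regulate lipopeptide synthesis.

In the fructose group, 688 differentially expressed genes were identified, of which 522 were up-regulated and 166 down-regulated; in the xylose group, 855 differentially expressed genes were identified, of which 691 were up-regulated and 164 down-regulated.

Different carbon sources changed the expression levels of the quorum-sensing system and of the global regulators of the two-component systems involved in lipopeptide synthesis in B. amyloliquefaciens TF28, and affected the anabolism of the amino acids and fatty acids required for lipopeptides. The results provide a reference for further study of the biological regulatory mechanism of lipopeptide synthesis.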

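Distribution of ABO and Rh Blood Groups among Voluntary Blood Donors in Jinan

Original article

Cao Dan 1, Zan Xiaoling 2, Zhu Haifeng 1, Tang Jianhua 1, Li Wenchao 1, Li Hongtao 1
1. Shandong Blood Center, Jinan 250014, Shandong; 2. Affiliated Hospital of Shandong University of Traditional Chinese Medicine

[Abstract] Objective: To study the distribution of ABO and Rh blood groups among voluntary blood donors in Jinan, and to analyze whether the ABO and Rh antigen phenotypes and gene frequencies conform to the Hardy-Weinberg equilibrium law.

Methods: ABO and Rh blood groups were determined by blood group serology in 282 413 voluntary donation samples collected by the Shandong Blood Center, and a descriptive analysis was carried out using the methods of blood group population genetics.

Results: From 2015 to 2017 there were 282 413 voluntary donors. The ABO phenotype distribution was B > O > A > AB, and the ABO gene frequencies were r > q > p; expected and observed values conformed to the Hardy-Weinberg equilibrium law. The ABO distribution did not differ significantly between sexes or between Han and ethnic-minority donors (all P > 0.05). There were no statistically significant differences between the ABO composition of Rh(+) donors and that of Rh(-) donors, between Rh(-) phenotypes and ABO phenotypes among Han donors, or between Rh(-) phenotypes and ABO phenotypes among all donors (all P > 0.05).

A total of 1802 Rh(-) donors were confirmed, accounting for 0.64% (including ethnic minorities and repeat donors). The most common phenotypes were ccdee > Ccdee > ccdEe > CCdee > CcdEe > ccdEE > CCdEe, and the cde haplotype had the highest frequency.

Among the donors, 266 498 were Han, of whom 1015 (0.38%) were Rh(-). The distribution of the RhD antigen across the ABO system was B > O > A > AB, and the Rh phenotype distribution conformed to the Hardy-Weinberg equilibrium law.

Conclusion: The distributions of the ABO and Rh blood groups in Jinan conform to the Hardy-Weinberg equilibrium law. The two are uncorrelated and are two distinct, independently inherited blood group systems.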

[Keywords] Hardy-Weinberg equilibrium law; ABO blood group; Rh blood group; gene frequency; blood bank
CLC number: R446.6. Document code: A. Article ID: 1671-5039(2021)04-0416-05

Distribution of ABO blood group and Rh blood group among voluntary blood donors in Ji'nan
CAO Dan 1, ZAN Xiao-ling 2, ZHU Hai-feng 1, TANG Jian-hua 1, LI Wen-chao 1, LI Hong-tao 1
1. Shandong Blood Center, Ji'nan 250014, China; 2. Affiliated Hospital of Shandong University of Traditional Chinese Medicine

[Abstract] Objective: To study the distribution characteristics of the ABO and Rh blood groups in voluntary blood donors in the Ji'nan area, and to analyze whether their ABO and Rh antigen phenotypes and gene frequencies conform to the Hardy-Weinberg equilibrium law. Methods: ABO and Rh blood groups were determined by blood group serology in 282 413 donation samples collected by the Shandong Blood Center, and a chi-square test was carried out using the methods of blood group population genetics. Results: From 2015 to 2017, the ABO phenotype distribution of the 282 413 voluntary blood donors was B > O > A > AB and the ABO gene frequencies were r > q > p; the expected and observed values were in accordance with the Hardy-Weinberg equilibrium law. There were no significant differences in ABO distribution between genders or between Han and ethnic-minority donors (both P > 0.05). There were no significant differences between the ABO distribution of Rh(+) and Rh(-) donors, between Rh(-) phenotypes and ABO phenotypes among Han donors, or between Rh(-) phenotypes and ABO phenotypes overall (all P > 0.05). A total of 1802 Rh(-) cases were confirmed, accounting for 0.64% (including ethnic minorities and repeat donors). The most common phenotypes were ccdee > Ccdee > ccdEe > CCdee > CcdEe > ccdEE > CCdEe, and the cde haplotype had the highest frequency. Of the donors, 266 498 were of Han nationality, and 1015 of these were Rh(-), accounting for 0.38%. The distribution of the RhD antigen in the ABO blood group system was B > O > A > AB. The distribution of Rh phenotypes accorded with the Hardy-Weinberg equilibrium law. Conclusion: The ABO and Rh blood group distributions in Ji'nan conform to the Hardy-Weinberg equilibrium law; there is no correlation between the two, and they are two different, independently inherited blood group systems.

[Keywords] Hardy-Weinberg equilibrium law; ABO blood group; Rh blood group; gene frequency; blood bank

DOI: 10.12183/j.scjpm.2021.0416
Funding: Shandong Medical and Health Science and Technology Development Program (2011HZ086)
About the author: Cao Dan (b. 1968), female, junior college graduate, technologist-in-charge, mainly engaged in blood screening and testing
Corresponding author: Li Hongtao, E-mail: ***************

Blood group is one of the main characteristics of human blood. The genetic traits expressed by blood antigens are highly complex and polymorphic, and blood groups play a vital role in genetics, forensic medicine and clinical medicine, where they are used to study human heredity and for paternity testing and individual identification in forensic practice and clinical transfusion.
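The Hardy-Weinberg check mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract reports no phenotype counts, so the counts below are hypothetical, and the function names are illustrative only. The sketch assumes Bernstein-style estimates of the allele frequencies p (A), q (B) and r (O), rescales them so that they sum to 1 (a simplification of the usual correction), and compares the chi-square statistic with the standard 5% critical value for one degree of freedom (4 phenotype classes, minus 1, minus 2 independently estimated frequencies).

```python
import math

CHI2_CRITICAL_DF1 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05


def abo_allele_frequencies(n_a, n_b, n_ab, n_o):
    """Bernstein-style estimates of p (A), q (B), r (O) from ABO phenotype counts."""
    n = n_a + n_b + n_ab + n_o
    p = 1.0 - math.sqrt((n_b + n_o) / n)
    q = 1.0 - math.sqrt((n_a + n_o) / n)
    r = math.sqrt(n_o / n)
    total = p + q + r  # simplification: rescale so the three estimates sum to 1
    return p / total, q / total, r / total


def hardy_weinberg_chi2(n_a, n_b, n_ab, n_o):
    """Chi-square statistic of observed ABO counts against Hardy-Weinberg expectations."""
    n = n_a + n_b + n_ab + n_o
    p, q, r = abo_allele_frequencies(n_a, n_b, n_ab, n_o)
    expected = {
        "A": n * (p * p + 2 * p * r),
        "B": n * (q * q + 2 * q * r),
        "AB": n * (2 * p * q),
        "O": n * (r * r),
    }
    observed = {"A": n_a, "B": n_b, "AB": n_ab, "O": n_o}
    return sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)


# Hypothetical counts (not taken from the study), ordered B > O > A > AB.
counts = dict(n_a=2800, n_b=3300, n_ab=1000, n_o=2900)
p, q, r = abo_allele_frequencies(**counts)
chi2 = hardy_weinberg_chi2(**counts)
print(f"p={p:.4f} q={q:.4f} r={r:.4f} chi2={chi2:.3f}")
print("consistent with HWE at 5%" if chi2 < CHI2_CRITICAL_DF1 else "deviates from HWE at 5%")
```

A statistic below the critical value means the observed phenotype counts do not deviate significantly from the Hardy-Weinberg expectation.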

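A Practical Guide to Population Genetics Research with Biological Big Data

Population genetics is the field that studies genetic variation within populations and changes in gene frequencies.

With technical progress and the rapid accumulation of biological big data, big data techniques now play an important role in population genetics research.

This guide introduces how to use biological big data techniques for population genetics research and describes some practical methods and tools.

I. Choosing a suitable biological database

Before starting a population genetics study, it is essential to choose a suitable biological database.

The following commonly used resources are worth considering:

1. 1000 Genomes Project: contains genome data from multiple human populations worldwide and gives researchers a broad catalogue of genetic variation.

2. dbGaP (Database of Genotypes and Phenotypes): stores a large amount of human genetic data, including genotype and phenotype information.

3. Ensembl: provides genome, gene and variant information for many species and is a widely used database.

II. Methods for analyzing population genetic data

Population genetics research with biological big data requires command of some common data analysis methods.

Several common methods are:

1. Principal component analysis (PCA): PCA can be used to detect genetic structure among populations. It helps researchers understand the genetic relationships between populations and discover genetic differences among them.

2. Breeding value estimation: breeding value estimation can be used to identify individuals with good breeding potential. By analyzing genotype and phenotype data, the genetic merit of an individual can be predicted, which gives guidance to breeders.

3. Haplotype analysis: haplotype analysis can be used to study genetic variation within a genomic region. By identifying individuals that share similar sequences, the origin of a variant can be inferred and its association with a specific phenotype can be studied.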
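The PCA step listed above can be sketched in a few lines of Python. This is a minimal illustration, assuming NumPy is available and that genotypes are coded as 0/1/2 copies of an allele; the function name `genotype_pca` and the simulated two-population data are hypothetical, not part of any tool named in this guide.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """Project individuals (rows) onto the top principal components of a SNP matrix."""
    centered = genotypes - genotypes.mean(axis=0)        # center each SNP column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]      # coordinates of each individual
    explained = s ** 2 / np.sum(s ** 2)                  # share of variance per component
    return scores, explained[:n_components]


# Hypothetical data: two populations of 20 individuals each, 200 SNPs coded 0/1/2.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.1, 0.9, size=200)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.3, size=200), 0.05, 0.95)
pop_a = rng.binomial(2, freq_a, size=(20, 200))
pop_b = rng.binomial(2, freq_b, size=(20, 200))
genotypes = np.vstack([pop_a, pop_b]).astype(float)

scores, explained = genotype_pca(genotypes)
print("PC1 mean, population A:", scores[:20, 0].mean())
print("PC1 mean, population B:", scores[20:, 0].mean())
print("variance explained:", explained)
```

When allele frequencies differ between the two simulated populations, the individuals of each population tend to fall on opposite sides of the first component, which is the kind of population structure PCA is used to reveal.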

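III. Using biological big data tools

Many open-source tools are available for population genetics research.

Some commonly used ones are:

1. PLINK: a powerful tool for genotype data quality control, association analysis, population structure analysis and more.

2. ADMIXTURE: used for population admixture analysis; it estimates the ancestry composition of each individual from genotype (SNP) data.

3. PopGenome: an R package for population genetic analysis and visualization.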

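Recommendations on Grading the Clinical Significance of Gene Variants in Genetic Testing

Tianjin Medical Journal, June 2021, Vol. 49, No. 6, p. 561
Recommendations and consensus
Medical Genetics Branch of the Tianjin Medical Association; Genetic Counseling Branch of the Tianjin Medical Association

Abstract: Genetic testing reports currently describe mainly the pathogenicity and clinical implications of gene variants. Laboratory staff tend to focus on the nature of the variant itself when writing a report, whereas clinicians are mainly concerned with the clinical situation of the case. This difference in focus often leads clinicians to misread the report. These recommendations propose a clinical grading scheme for gene variants: a genetic testing report should not only classify the pathogenicity of a variant but also grade its clinical significance. It is recommended that the clinical guidance value of a variant be divided into five grades: clear clinical guidance value, potential clinical guidance value, uncertain clinical guidance value, clinical guidance value as an incidental finding, and no obvious clinical guidance value. The scheme stresses the accuracy and completeness of the clinical phenotype and the importance of laboratory-clinic communication, and it advocates standardized phenotype description. This will promote mutual understanding between clinicians and laboratory staff and will help with the interpretation of genetic testing reports and with genetic counseling.

Funding: National Key R&D Program of China (2017YFC1001900, 2020YFC2008100); National Natural Science Foundation of China (81771589); Beijing-Tianjin-Hebei Special Project (19JCZDJC65400); Tianjin Health Industry Key Research Project (16KG166); Tianjin Major Science and Technology Project for the Prevention and Treatment of Major Diseases (18ZXDBSY00170, 18ZXDBSY00230); Tianjin Health Science and Technology Project (ZC20120, KJ20166)

After genetic testing has been completed, phenotypes can be collected again on the basis of the suspected diagnosis suggested by the report, so as to find less prominent phenotypes and further confirm or exclude the diagnosis. In particular, phenotypes should also be collected, as far as possible, from family members who carry the suspected variant, to help evaluate and confirm the genotype-phenotype correlation.

1.2 Databases and websites that assist phenotype description and lookup

1.2.1 The Human Phenotype Ontology (HPO) and Chinese Human Phenotype Ontology (CHPO) standardized phenotype databases. As research on human disease has deepened, researchers have become increasingly aware of the importance of clinical phenotype data, and the joint analysis of genotype and phenotype data has become one direction of research for many diseases. HPO aims to provide a standard vocabulary for describing phenotypic abnormalities in human disease. It currently contains about 11 000 terms and more than 115 000 annotations on hereditary diseases, and also provides a set of annotations for about 4 000 diseases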

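How Biological Big Data Technology Deciphers Patterns of Genetic Diversity

Biodiversity is the richness and variety of life on Earth, and it is the basis for the stability and functioning of ecosystems.

Genetic diversity is one part of biodiversity and describes the genetic differences among individuals of the same species.

The study of genetic diversity is one of the important research directions in ecology and evolutionary biology.

In recent years, with the development of biological big data technology, scientists have become better able to decipher patterns of genetic diversity.

Biological big data technology allows scientists to process and analyze large amounts of biological sequence data efficiently.

Biological sequence data include DNA, RNA and protein sequences.

Sequence data obtained by sequencing provide information about genetic diversity, such as genotypes, the distribution of variant sites, and phylogenetic relationships.

Using big data techniques, scientists can grasp the overall patterns of genetic diversity and go on to study their effects on ecosystems and on the evolution of species.

First, biological big data technology helps researchers identify and compare patterns of genetic diversity.

Different genotypes and variant sites in DNA sequences reflect the genetic relationships among individuals and also reveal population structure and gene flow within a species.

By analyzing large amounts of sequence data, researchers can detect and compare patterns of genetic variation within and between species, and thereby study genetic diversity among individuals and among populations.

Second, biological big data technology can reveal the evolutionary history and phylogenetic relationships of species.

By comparing DNA sequence differences between species, scientists can build phylogenetic trees and learn more about the origin and evolutionary course of species.

The high-throughput sequencing capacity of big data technology and the steady improvement of analysis algorithms are revealing the phylogenetic relationships of more and more species.

This information on evolutionary history and phylogenetic relationships is an important guide for species conservation and ecosystem restoration.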
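The first step in building a tree from sequence differences, as described above, is usually a pairwise distance matrix. The following is a minimal Python sketch, assuming the sequences are already aligned to equal length and using the simple proportion of mismatching sites (the p-distance). The taxon names and sequences are hypothetical, and the tree-building step itself (for example neighbor joining) is not shown.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return mismatches / len(seq_a)


def distance_matrix(sequences):
    """Nested dict of pairwise p-distances for a {name: aligned sequence} mapping."""
    names = list(sequences)
    return {a: {b: p_distance(sequences[a], sequences[b]) for b in names} for a in names}


# Hypothetical aligned sequences for four taxa.
aligned = {
    "taxon_A": "ACGTACGTAC",
    "taxon_B": "ACGTACGTAT",
    "taxon_C": "ACGAACCTAT",
    "taxon_D": "TCGAACCTGT",
}

matrix = distance_matrix(aligned)
for name, row in matrix.items():
    print(name, row)
```

Taxa separated by small distances would be grouped first by a distance-based tree method, while more divergent taxa would join the tree later.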

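In addition, biological big data technology can be used to study the interaction between genetic diversity and environmental factors.

The maintenance of biodiversity is determined jointly by genetic diversity and environmental factors.

By studying large amounts of genetic data, scientists can better understand how environmental change affects genetic diversity and how species adapt to environmental change.

For example, by comparing genetic data from different environments, researchers can learn about the adaptive strategies and genetic adaptability of a species in each environment.

Finally, the rapid development of biological big data technology brings both more possibilities and more challenges for future research.

As sequencing technology keeps improving and biological big data keep accumulating, future studies of genetic diversity will reveal genetic patterns among individuals and populations in finer detail.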

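Building a Large Human Genetic Diversity Database and Its Validation Set

Human genetic diversity refers to the genetic differences that exist among humans.

This diversity results from genetic variation and evolutionary processes, and it is a valuable asset of human populations.

Understanding human genetic diversity is important for research on human origins, evolution and genetic disease.

To better understand and use human genetic diversity, it is crucial to build and validate a large-scale human genetic diversity database.

Building such a database requires collecting a large number of genetic data samples.

These samples can be obtained from individuals of different geographic regions, populations, ages and sexes.

To ensure that the samples are representative and reliable, collection should be extended worldwide and should strictly follow ethical principles and legal requirements.

Genetic differences between populations and ethnic groups should also be taken into account, so that the database is broadly applicable and usable.

Advanced sequencing technology is key to building the database.

High-throughput sequencing now makes the acquisition of large-scale genetic data faster and more efficient.

Several different genetic markers and sequencing methods should be used in order to obtain more complete and accurate genetic information.

In addition, strict quality control is needed to ensure the accuracy and consistency of the sequencing data.

Ensuring data security and privacy is vital while the database is being built.

Measures should be taken to protect the privacy of individuals and the security of the data.

Data that involve sensitive personal information should be de-identified and stored in encrypted form.

Reasonable data access and usage policies should also be drawn up to ensure that the data are used lawfully and appropriately.

Once the database has been built, its validation set must undergo a series of checks and analyses.

First, the data in the database must be quality-checked to ensure their accuracy and credibility.

A genetic diversity analysis is also needed, to understand the genetic differences and similarities among populations and geographic regions.

This will help us better understand the patterns of human origin and migration.

Building a large human genetic diversity database and analyzing its validation set will be of great help to research on human genetics and genetic disease.

By collecting and sequencing samples at scale, we can understand human genetic diversity more fully and provide more accurate and effective guidance for disease prevention and treatment.

Sharing the database and collaborating on it will also advance human genetics research worldwide and promote human health.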

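A Review of Genomic DNA Methylation Research

Contents
Abstract (Chinese)
Abstract (English)
1 Mechanisms of methylation
1.1 Mechanisms of DNA methylation in animals
1.2 Mechanisms of DNA methylation in plants
1.3 The relationship between DNA methylation and histone methylation
1.4 Mechanisms by which methylated DNA represses gene transcription
2 Techniques for studying methylation
2.1 Methylation-sensitive amplification polymorphism
2.2 Direct detection of DNA methylation by single-molecule real-time sequencing
2.3 Sequencing based on single-molecule nanopore technology
3 Applications of methylation
3.1 Applications of DNA methylation in plants
3.2 Applications of DNA methylation in animals
4 Outlook
References

Advances in Genomic DNA Methylation Research

Abstract: DNA methylation is widespread in animal and plant genomes and occurs mainly at CpG and CpNpG sites.

As one of the most important phenomena in epigenetics, DNA methylation plays an important role in regulating the expression of an organism's own genes, in defending against invading foreign DNA, and in silencing transposable elements and transgenes, thereby maintaining genome stability.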

Translation of "phenotype" (English-Chinese)

[Definition]
phenotype n. 表型,表现型;显型

[Phrases]
1. phenotype cloning: 表型克隆
2. Bombay Phenotype: 孟买型; 孟买表现型
3. clinical phenotype: 临床表型; 临床类型
4. The Extended Phenotype: 延伸的表现型; 延伸的表型; 延伸表现型
5. synthetic phenotype: 合成型; 合成表型; 转变为分泌型
6. nuclear phenotype: 核表现型; 遗核表型
7. Lewis phenotype: Lewis表型
8. phenotype value: 表现型值
9. phenotype screening: 表型筛选法; 表型筛选

[Example sentences]
1. Phenotype and niche are exchangeable notions. 表型与小生境是可以互相替换的概念。
2. Their metabolism phenotype was mild or moderate HPA. 生化代谢表型均为轻度或中度HPA。
3. The result is a cell with completely new traits: a new phenotype. 其后果就是形成一个具有全新特性的细胞:也就是一个新的表现型。
4. But whether B27 polymorphism affects disease phenotype has not been reported. 但B27多态性是否影响疾病表型尚未见报道。
5. The object of selection is the phenotype in its surrounding environment. 选择的对象是在周边环境之中的表型。
6. Phenotype differentiation and surface marker of periodontal ligament cell. 牙周韧带细胞的表型分化及其表面标记物。

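DNA Sequences Derived from Ancient Neanderthal Mitochondria Exist in the Chromosomes of Modern Homo sapiens

Zhang Jia; Zhou Cuilan; Xiao Li; Tuo Qinhui; Peng Cuiying; Guo Zifen; Liao Duanfang; Li Kai
Journal: Digital Chinese Medicine (English edition). Year (volume), issue: 2022, 5(3)

Abstract: The successful sequencing of part of the mitochondrial DNA of ancient Neanderthals has, to some extent, ended the debate between the out-of-Africa and multiregional hypotheses of early human origins.

However, because Neanderthal chromosomal genome sequences are lacking, the value of their mitochondrial DNA for paleoanthropology has been limited.

This study introduces a nuclearized mitochondrial omics analysis that treats mitochondrial DNA as a transgene and humans as transgenic organisms.

Aligning the available Neanderthal mitochondrial DNA against the genomic DNA of modern Homo sapiens yielded 40 fragments of high homology.

Five of these fragments are more homologous to Neanderthal mitochondrial DNA than to modern human mitochondrial DNA and contain Neanderthal-specific haplotype sequences.

When the mitochondrial DNA sequences of the different Neanderthal individuals in the database are compared as one group against modern human mitochondrial DNA as another group, the sequence similarity of more than 98% shows that the available Neanderthal mitochondrial sequences are useful data for analyzing Neanderthals, and also that Neanderthals and modern Homo sapiens are close relatives in evolutionary terms.

Pages: 6 (pp. 236-241)
Authors' affiliations: National (Joint) Engineering Research Center for Personalized Diagnosis and Treatment Technology, Hunan University of Chinese Medicine; SNP Institute, School of Life Sciences, University of South China; Hunan Key Laboratory for Quality Evaluation of Bulk Hunan-Produced Medicinal Materials, Hunan University of Chinese Medicine
Language of text: Chinese. CLC classification: R73

Related literature:
1. Searching for DNA evidence of Neanderthals
2. The genome sequence of Neanderthals
3. Ancient DNA research reveals that an ancestor of an early modern European interbred with Neanderthals
4. Ancient DNA research reveals that ancestors of early modern Europeans interbred with Neanderthals
5. Intelligence or cultural expression? A review of research on differences in cultural sophistication between Neanderthals and modern humans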

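Genetic Diversity: Definition, Recent Research Advances and New Concepts

Hu Zhi'ang, Wang Hongxin (Institute of Botany, Chinese Academy of Sciences, Beijing 100093)

Abstract: Genetic diversity is the genetic variation of living things, and it has both a broad and a narrow definition.

Since ancient times, people have selected among the variations of organisms, producing the thousands and tens of thousands of animal, plant and microbial varieties we have today.

Darwin attached extreme importance to this phenomenon, which is commonplace and widespread but was long neglected by scholars, and he summarized it systematically and thought about it deeply.

He linked diversity within species to the diversity and adaptability of species, and supplied a large body of irrefutable evidence for the evolutionary theory that species arise from varieties.

From then on, biology broke fundamentally with theology and became a science.

Research on genetic diversity gave rise to Mendel's laws of inheritance.

As the genetic markers in use moved from the morphological to the cellular, biochemical and molecular levels, genetics developed from classical genetics into molecular genetics.

The concept of the gene has changed greatly.

We are now in the era of molecular biology, and every branch of biology, biodiversity included, has been transformed by the concepts and methods of molecular biology.

Population genetics should update its concept of the gene.

The "Green Revolution", which was based on the use of genetic resources, solved the food crisis but also exposed the danger of genetic uniformity.

It has been shown again and again that the loss of genetic diversity is the internal cause of species extinction.

The global genetic resources movement has grown into a global biodiversity movement.

The use of biodiversity by biotechnology has long since gone beyond the boundaries of species.

Recent studies at home and abroad show that the conservation of ecosystems and of species both depend on molecular genetic research.

We recently proposed the concept of the basic unit of ecosystem function.

We hold that the role of biodiversity in ecosystem function is mainly the diversity of genes.

In short, genetic diversity needs a broad interpretation if it is to be combined with the study of species and ecosystem diversity to form a science of biodiversity.

Keywords: genetic diversity; history; advances; definition; biodiversity science; basic functional unit of ecosystems

1 The definition of genetic diversity

In answering the question "What is biodiversity?", McNeely et al. (1990) defined genetic diversity as "the sum of genetic information, contained in the genes of individual plants, animals and microorganisms on Earth."

In population genetics, however, genetic diversity refers mainly to genetic variation among and within populations of a species, which Shi Liming et al. (1993) called the "narrow" definition. They called McNeely's definition the broad one: taxa above the species level and ecological systems above the population level each include their own genetic diversity.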


Mining Phenotypes and Informative Genes from Gene Expression Data

Chun Tang, Aidong Zhang, Jian Pei
Department of Computer Science and Engineering
State University of New York at Buffalo, Buffalo, NY 14260
chuntang,azhang,jianpei@

ABSTRACT
Mining microarray gene expression data is an important research topic in bioinformatics with broad applications. While most previous studies focus on clustering either genes or samples, it is interesting to ask whether we can partition the complete set of samples into exclusive groups (called phenotypes) and find a set of informative genes that can manifest the phenotype structure. In this paper, we propose a new problem of simultaneously mining phenotypes and informative genes from gene expression data. Some statistics-based metrics are proposed to measure the quality of the mining results. Two interesting algorithms are developed: the heuristic search and the mutual reinforcing adjustment method. We present an extensive performance study on both real-world and synthetic data sets. The mining results from the two proposed methods are clearly better than those from previous methods, and the methods are ready for real-world applications. Between the two, the mutual reinforcing adjustment method is in general more scalable and more effective, and gives mining results of better quality.

Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications - Data Mining

General Terms
Algorithms, Experimentation

Keywords
Phenotype, informative genes, array data, bioinformatics

1. INTRODUCTION
DNA microarray technology enables rapid, large-scale screening for patterns of gene expression. The raw microarray data are transformed into gene expression matrices in which a row represents a gene and a column represents a sample. The numeric value in each cell characterizes the expression level of a specific gene in a particular sample.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGKDD '03, August 24-27, 2003, Washington, DC, USA. Copyright 2003 ACM 1-58113-737-0/03/0008...$5.00.

Effective and efficient analysis techniques are in demand as gene expression data accumulate rapidly. The gene expression matrix can be analyzed in two ways. On the one hand, genes can be clustered based on similar expression patterns [7]. In such a gene-based method, the genes are treated as the objects, while the samples are the attributes. On the other hand, the samples can be partitioned into homogeneous groups. Each group may correspond to some particular macroscopic phenotype, such as a clinical syndrome or a cancer type [8]. Such a sample-based method regards the samples as the objects and the genes as the attributes.

While a gene expression matrix can be analyzed from these orthogonal angles, the gene-based and sample-based methods face very different challenges. The number of genes and the number of samples are very different in a typical gene expression matrix. Usually, we may have tens or hundreds of samples but thousands or tens of thousands of genes. Thus, we may have to adopt very different computational strategies in the two situations.

In this paper, we focus on the sample-based analysis. In particular, we are interested in mining phenotypes and informative genes. Within a gene expression matrix, there are usually several phenotypes of samples related to some diseases or drug effects, such as diseased samples, normal samples or drug-treated samples. Figure 1 shows a gene expression matrix containing two phenotypes of samples: one group of samples is in one phenotype, while the remaining samples are in the other. For the sake of simplicity, the gene expression levels in the matrix are discretized into binary values, i.e., either "on" or "off".

Figure 1: A simplified example of a gene expression matrix.

Biologists are particularly interested in finding the informative genes that manifest the phenotype structure of the samples. For example, some genes in Figure 1 are informative genes, since each of them shows all "on" signals for one phenotype of samples and all "off" for the other phenotype. The other genes provide inconsistent signals for the samples in a phenotype and thus cannot be used to distinguish phenotypes. They are called non-informative genes.

If phenotype information is known, the major task is to select the informative genes that manifest the phenotypes of samples. This can be achieved by supervised analysis methods such as neighborhood analysis [8] and the support vector machine [4].

Although the supervised methods are helpful, the initial identification of phenotypes over samples is usually slow, typically evolving through years of hypothesis-driven research [8]. Therefore, it is natural to ask, "Can we find both the phenotypes and the informative genes automatically at the same time?" In other words, we want to discover the phenotypes of the samples as well as to identify a set of genes that manifests the phenotypes of the samples. For example, in Figure 1, if the phenotypes of the seven samples are unknown, can we correctly separate the samples of one phenotype from those of the other and also output the informative genes?
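The binary "on"/"off" notion of an informative gene used in the Figure 1 discussion can be made concrete with a minimal Python sketch. The matrix below is hypothetical (the entries of the original figure are not reproduced here), and the function name `is_informative` is illustrative: a gene counts as informative for a two-phenotype partition when it is constant within each group and differs between the groups.

```python
def is_informative(gene_row, group_a, group_b):
    """True if the gene is constant within each sample group and differs between groups."""
    values_a = {gene_row[j] for j in group_a}
    values_b = {gene_row[j] for j in group_b}
    return len(values_a) == 1 and len(values_b) == 1 and values_a != values_b


# Hypothetical binary matrix: rows are genes, columns are seven samples (1 = on, 0 = off).
matrix = {
    "gene_1": [1, 1, 1, 0, 0, 0, 0],  # on / off across the two groups
    "gene_2": [0, 0, 0, 1, 1, 1, 1],  # off / on across the two groups
    "gene_3": [1, 0, 1, 1, 0, 0, 1],  # inconsistent within both groups
    "gene_4": [1, 1, 1, 1, 1, 1, 1],  # constant everywhere, so it separates nothing
}
group_a = [0, 1, 2]      # column indices of the first phenotype
group_b = [3, 4, 5, 6]   # column indices of the second phenotype

informative = [name for name, row in matrix.items() if is_informative(row, group_a, group_b)]
print(informative)
```

With real-valued expression levels there is no such crisp test, which is why the paper replaces it with statistical measures of consistency and divergency.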
Mining both the phenotypes and the informative genes at the same time is challenging. First, the values in the data matrices are all real numbers, so there is usually no clear border between informative genes and non-informative ones. Second, there are many genes but only a small number of samples, and there is no existing technique to correctly detect class structures from the samples. Last, most of the genes collected may not be of interest: experience shows that only a small fraction of the genes in a gene expression matrix are informative [8]. In other words, the gene expression matrix is very noisy.

In this paper, we tackle the problem of mining phenotypes and informative genes from gene expression data by developing novel unsupervised learning methods. We claim the following contributions.

• We identify and formulate the problem of simultaneously mining phenotypes and informative genes. The major differences between this novel problem and previous studies on clustering or subspace clustering are elaborated.

• A set of statistical measurements of the quality of phenotypes and informative genes is proposed. They coordinate and balance the sample phenotype discovery and the informative gene selection.

• A heuristic search method and a mutual reinforcing adjustment approach are devised to find phenotypes and informative genes of high quality. In particular, the mutual reinforcing adjustment method dynamically manipulates the relationship between samples and genes while conducting an iterative adjustment to detect the phenotypes and informative genes.

• An extensive experimental evaluation over some real data sets is presented. It shows that our methods are both effective and efficient and outperform the existing methods. The mutual reinforcing method performs even better than the heuristic search.
The remainder of this paper is organized as follows. Section 2 reviews the related work. The quality measurements are proposed in Section 3, and the mining algorithms are developed in Section 4. Section 5 reports the experimental results and Section 6 gives the conclusion.

2. RELATED WORK
Recently, some methods have been proposed to find macroscopic phenotypes from samples [6, 14]. In these approaches, samples are partitioned by conventional clustering methods, such as K-means, self-organizing maps (SOM), hierarchical clustering (HC), or graph-based clustering. However, these traditional clustering techniques cannot handle the heavy noise in gene expression data well. Although some approaches [16] filter out genes before partitioning the samples, the gene filtering processes are non-invertible. The deterministic filtering causes samples to be grouped based on local decisions.

Subspace clustering has been studied extensively [1, 5, 17] to find subsets of objects such that the objects appear as a cluster in a subspace formed by a subset of the attributes. Although the subspace clustering problem may appear similar to the phenotype and informative gene mining problem at first glance, there are two significant and inherent differences between the two. On the one hand, in subspace clustering, the subsets of attributes for the various subspace clusters are different. In phenotype and informative gene mining, however, we want to find a unique set of genes that manifests a partition of all samples. On the other hand, two subspace clusters can share some common objects and attributes, and some objects may not belong to any subspace cluster. In phenotype and informative gene mining, by contrast, every sample must be in a phenotype and the phenotypes are exclusive.

3. QUALITY MEASUREMENTS OF PHENOTYPES AND INFORMATIVE GENES
The phenotypes of samples and the informative genes should satisfy two requirements simultaneously. On the one hand, the expression levels of each informative gene should be similar over the samples within each phenotype. On the other hand, the expression levels of each informative gene should display a clear dissimilarity between each pair of phenotypes. To quantify how well the phenotypes and informative genes meet the two requirements, we introduce the intra-phenotype consistency and inter-phenotype divergency measurements.

Let $S = \{s_1, \ldots, s_m\}$ be a set of samples and $G = \{g_1, \ldots, g_n\}$ be a set of genes. The corresponding gene expression matrix can be represented as $W = \{w_{i,j} \mid 1 \le i \le n, 1 \le j \le m\}$, where $w_{i,j}$ is the expression level value of sample $s_j$ on gene $g_i$. Given a subset of samples $S' \subseteq S$ and a subset of genes $G' \subseteq G$, $W_{S',G'} = \{w_{i,j} \mid s_j \in S', g_i \in G'\}$ is the corresponding sub-matrix w.r.t. $S'$ and $G'$.

Intra-phenotype consistency: The variance of each row measures whether a given gene has consistent expression level values over all samples within the sub-matrix. A small variance value indicates that the gene has consistent values on all the samples. Thus we can measure whether every gene has good consistency on a set of samples by the average of the row variances over the subset of genes. That is, we define the intra-phenotype consistency as:

$$\mathrm{Con}(G', S') = \frac{1}{|G'|} \sum_{g_i \in G'} \frac{\sum_{s_j \in S'} \left(w_{i,j} - \bar{w}_{i,S'}\right)^2}{|S'| - 1},$$

where $\bar{w}_{i,S'}$ is the mean expression level of gene $g_i$ over the samples in $S'$.

Inter-phenotype divergency: The inter-phenotype divergency quantifies how well a subset of genes can distinguish two phenotypes of samples. The inter-phenotype divergency of a set of genes $G'$ on two groups of samples (denoted as $S_1$ and $S_2$) such that $S_1 \subseteq S$, $S_2 \subseteq S$, and $S_1 \cap S_2 = \emptyset$ is defined as

[Equation (1): not recoverable from the source]

The greater the inter-phenotype divergency, the better the genes differentiate the phenotypes.

The quality of phenotypes and informative genes: Suppose the set of samples $S$ is partitioned into $K$ exclusive groups, $S_1, \ldots, S_K$.
Given a set of genes $G'$, the quality measure $\Omega$ quantifies how pure the phenotypes are w.r.t. the genes and how well the genes differentiate the phenotypes.

[Equation (2): not recoverable from the source]

The greater the quality value, the better the phenotypes and the more informative the genes are.

Problem statement. Given a gene expression matrix $W$ with $m$ samples and $n$ genes, and the number of phenotypes $K$, the problem of mining phenotypes and informative genes is to find a partition of the samples into $K$ phenotypes and a subset of genes as informative genes such that the quality measure $\Omega$ is maximized.

4. ALGORITHMS
In this section, we develop two methods. The first is a heuristic searching algorithm that adopts the simulated annealing technique [10]. The second is a novel mutual reinforcing adjustment algorithm that approximates the best solution.

Both algorithms maintain a set of genes as the candidate informative genes and a partition of samples as the candidate phenotypes. The best quality is approached by iteratively adjusting the candidate sets.

Both algorithms maintain two basic elements, a state and the corresponding adjustments. The state of the algorithm describes the following items:

• A partition of the samples, $\{S_1, \ldots, S_K\}$.
• A set of genes $G'$, a subset of all genes.
• The quality $\Omega$ of the state, calculated based on the partition on $G'$.

An adjustment of a state is one of the following:

• For a gene $g \notin G'$, insert $g$ into $G'$.
• For a gene $g \in G'$, remove $g$ from $G'$.
• For a sample $s$ in group $S_a$, move $s$ to group $S_b$, where $a \ne b$.

To measure the effect of an adjustment on a state, we calculate the quality gain of the adjustment as the change in quality, i.e., $\Delta\Omega = \Omega' - \Omega$, where $\Omega$ and $\Omega'$ are the quality of the states before and after the adjustment, respectively.

Now the problem becomes the following: given a starting state, we try to apply a series of adjustments to reach a state such that the accumulated quality gain is maximized. Both algorithms record the best state, in which the highest quality so far is achieved.

4.1 A Heuristic Searching Algorithm
An immediate solution to the problem (shown in Figure 2) is to start from a random state and iteratively conduct adjustments to approach the optimal state. The algorithm has two phases: an initialization phase and an iterative adjusting phase.

In the initialization phase, an initial state is generated randomly and the corresponding quality value $\Omega$ is computed.

In the iterative adjusting phase, during each iteration, genes and samples are examined one by one. The adjustment to a gene or sample is conducted if the quality gain is positive. Otherwise, the adjustment is conducted with a probability $p$ that depends on the quality gain and on $T(i)$, a decreasing simulated annealing function [10] of the iteration number $i$.

The probability function has two components. The first part considers the quality gain in proportion: the more the quality is reduced, the lower the probability that the adjustment is performed. The second part, $T(i)$, is the decreasing simulated annealing function, where $i$ is the iteration number. [The parameter settings used in our implementation are not recoverable from the source.]

Algorithm 1 (Heuristic Searching)
Initialization phase:
  adopt a random initialization and calculate the quality $\Omega$
Iterative adjusting phase:
  1) list a sequence of genes and samples randomly;
     for each gene or sample along the sequence, do
     1.1) if the entity is a gene, compute $\Delta\Omega$ for the possible insert/remove;
          else if the entity is a sample, compute $\Delta\Omega$ for the move with the largest quality gain;
     1.2) if the quality gain is positive, then conduct the adjustment;
          otherwise, conduct the adjustment with probability $p$;
  2) go to 1), until no positive adjustment can be conducted;
  3) output the best state.
Figure 2: The heuristic search algorithm.

The heuristic algorithm is sensitive to the order in which gene and sample adjustments are considered in each iteration. To give every gene or sample a fair chance, all possible adjustments are sorted randomly at the beginning of each iteration.

The termination criterion is that no positive adjustment is conducted in an iteration. Once the iteration stops, the partition of samples and the candidate gene set in the best state are output.

4.2 The Mutual Reinforcing Adjustment Algorithm
In the iteration phase of the heuristic searching algorithm, samples and genes are examined and adjusted with equal chances. However, since the number of samples is far smaller than the number of genes, each sample should play a crucial role during the adjustment process. Because the samples are treated equally with all the genes, they have fewer chances to be adjusted. In addition, because the number of samples is quite small, even one or two noisy or outlier samples may strongly interfere with the quality and the adjustment decisions. The heuristic approach cannot effectively detect or eliminate the influence of noise or outliers in the samples. Thus, we propose a more robust approach called the mutual reinforcing adjustment algorithm. The general idea is to adopt a deterministic, noise-insensitive method to adjust the samples.

The algorithm is shown in Figure 3. Its details are discussed in the following subsections.

Algorithm 2 (Mutual Reinforcing Adjustment)
Start from the candidate gene set $G'$ containing all genes, and do the following iteration:
Iteration phase:
  1) partitioning the matrix
     1.1) group the samples into sample groups;
     1.2) group the genes in $G'$ into gene groups.
  2) identifying the reference partition
     2.1) compute the reference degree for each sample group;
     2.2) select $K$ groups from the sample groups;
     2.3) do the partition adjustment.
  3) adjusting the genes
     3.1) compute $\Omega$ for the reference partition on $G'$;
     3.2) perform the possible adjustment of each gene;
     3.3) update $G'$, go to step 1.
  Until no positive adjustment can be conducted.
Refinement phase:
  find the state with the highest pattern quality;
  do state refinement.
Figure 3: The mutual reinforcing adjustment algorithm.

4.2.1 Partitioning the matrix
The first step is to divide the complete set of samples and the set of candidate informative genes into smaller groups. At the beginning of the first iteration, the set of candidate informative genes contains all the genes.

The algorithm CAST (cluster affinity search technique) [3] is applied to group the genes and the samples, and Pearson's correlation coefficient is chosen to calculate the similarity matrix. (Footnote 1: the correlation coefficient between two vectors $X = (x_1, \ldots, x_k)$ and $Y = (y_1, \ldots, y_k)$ is $\rho_{X,Y} = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}$.) CAST is a method specially designed for grouping gene expression data based on pattern similarities. Entities belonging to the same group should therefore have similar expression patterns, while different groups should have different, well-separated patterns. A small number of outliers are filtered out without being assigned to any group.

One advantage of CAST is that the number of groups does not need to be pre-specified. Instead, a threshold has to be set to approximate the size of each group. In biological applications, experience shows that genes associated with similar functions always involve from several dozen to several hundred entities [8]. Thus, when grouping genes, the threshold should be set so that a majority of groups contain several dozen to several hundred genes. For samples, the threshold can be chosen so that the number of groups maintains a good compromise between the number of clusters and the separation among them.

4.2.2 Reference partition detection
Suppose that, after partitioning the matrix, we have a number of exclusive groups of samples and a number of groups of genes within the set of candidate informative genes.

Among the sample groups, $K$ of them are selected to "represent" the phenotypes of the samples. The set of representatives is called a reference partition. The reference partition is selected among the sample groups that have a small intra-phenotype consistency value (i.e., are highly consistent) and a large inter-phenotype divergency among each other. The purpose of selecting such a reference partition is to approximate the phenotypes as closely as possible and to eliminate the noise interference caused by outlier samples.

A reference degree is defined for each sample group by accumulating its intra-phenotype consistency over the gene groups generated in the last step. This reference degree measures the likelihood of a given sample group being included in the reference partition. That is,

[Equation (3): not recoverable from the source]

A high reference degree indicates that the samples of the group are consistent on most of the genes. Such groups should have a high probability of representing a phenotype of samples.

We first choose the sample group with the highest reference degree. Then, we choose as the second sample group the one that, among the remaining groups, has the lowest intra-phenotype consistency value and the highest inter-phenotype divergency with respect to the first group. When it comes to the third group, the inter-phenotype divergency with respect to both the first and the second group is considered. Here we define a selection criterion for each subsequent sample group by combining its intra-phenotype consistency and its inter-phenotype divergency with respect to the sample groups already selected.

[Equation (4): not recoverable from the source]

We calculate the criterion values for the groups that have not yet been selected. The group with the highest value is selected as the next sample group. When there is a tie, we choose the group with the largest number of samples. In total, $K - 1$ sample groups, along with the first group, are selected to form the reference partition.

The reference partition selected above is based on the partitioning-the-matrix phase. Some samples in other groups that may also be good candidates for the representative partition may be missed.
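The reference partition selection above leans on two quantities, the intra-phenotype consistency and the inter-phenotype divergency. The following minimal Python sketch shows one way they could be computed. It is an illustration under stated assumptions, not the authors' implementation: consistency is taken as the mean sample variance of each gene row (as described in Section 3), while the divergency is assumed here to be the mean absolute difference between the two group means of each gene. The function names and the toy matrix are hypothetical.

```python
from statistics import mean, variance


def consistency(matrix, genes, samples):
    """Mean sample variance of each selected gene over the selected samples."""
    return mean(variance([matrix[g][s] for s in samples]) for g in genes)


def divergency(matrix, genes, group_a, group_b):
    """Assumed form: mean absolute difference between the two group means of each gene."""
    return mean(
        abs(mean(matrix[g][s] for s in group_a) - mean(matrix[g][s] for s in group_b))
        for g in genes
    )


# Hypothetical expression matrix: rows are genes, columns are six samples.
matrix = [
    [1.0, 1.0, 1.0, 5.0, 5.0, 5.0],  # gene 0: flat within each group, far apart between them
    [1.0, 5.0, 1.0, 5.0, 1.0, 5.0],  # gene 1: noisy within each group
]
group_a = [0, 1, 2]
group_b = [3, 4, 5]

for g in range(len(matrix)):
    print(
        f"gene {g}: consistency on group_a = {consistency(matrix, [g], group_a):.3f}, "
        f"divergency = {divergency(matrix, [g], group_a, group_b):.3f}"
    )
```

A gene that is a good candidate informative gene shows a low consistency value on each group together with a high divergency between the groups; a sample group that is a good reference group shows low consistency values on most gene groups.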
Thus,we need to conduct a“partition adjustment”step by adding some other samples that can improve the quality of the reference partition.For each sample which is not yet included in the ref-erence partition,its probability to be added into the reference par-tition is determined as follows.We calculate the quality gains for inserting groups in the reference partition.The group with the highest quality gain is called the matching group of.If the quality gain is positive,then will be inserted into its matching group;oth-erwise,will be inserted into its matching group with a probability .4.2.3Gene adjustmentIn this step,the reference partition derived from the last step is used to guide the gene adjustment.Notice that the quality in the mutual reinforcing method is not computed based on the full parti-tion of samples,but on the reference partition and the current can-didate gene set.Thus the state and the best state maintained by the algorithm are also based on the reference partition.The gene adjustment process is similar to that in the heuristic searching algorithm.All the genes will be examined one by one in a random order.The possible adjustment is to“remove”genes from or to“insert”genes into for genes not in the candidate set.The adjustment of each gene will be conducted if the quality gain is positive,or with a probability if the quality gain is negative.After each iteration,the gene candidate set is changed.The next iteration starts with the new gene candidate set and the complete set of samples.In each iteration,we use the gene can-didate set to improve the reference partition,and use the reference partition to improve the gene candidate set.Therefore,it is a mutu-ally reinforcing process.4.2.4Refinement phaseThe iteration phase terminates when no positive adjustment is conducted in the last iteration.The reference partition correspond-ing to the best state may not cover all the samples.Therefore,a refinement phase is conducted.Wefirst add every sample not 
covered by the reference partition into its matching group. Thus, the refined reference partition becomes a full partition. It is output as the phenotypes of the samples. Then, a gene adjustment phase is conducted. We execute all adjustments with positive quality gain. Then the genes in the candidate set are output as informative genes.

It can be shown that the heuristic searching algorithm and the mutual reinforcing adjustment algorithm have the same time complexity. Compared with the random adjustments of samples in the heuristic searching approach, the mutual reinforcing method has a more deterministic, more robust and more noise-insensitive adjustment method for samples, because it works by improving the reference partition.

5. PERFORMANCE EVALUATION

In this section, we report an extensive performance evaluation of the effectiveness and efficiency of the two proposed methods using various real-world gene expression data sets. The ground truth of the partition, which includes information such as how many samples belong to each class and the class label of each sample, is used only to evaluate the experimental results.

The Rand index [12], a measurement of “agreement” between the ground truth of the macroscopic phenotypes of the samples and the partition results, is adopted to evaluate the effectiveness of the algorithm. The Rand index is between 0 and 1. The higher the index value, the better the algorithm performs.

Table 1 shows the data sets and the results obtained by applying our two algorithms and some previously proposed unsupervised sample clustering algorithms. A recently developed effective subspace clustering algorithm, δ-cluster [17], is also included. As shown, the two methods proposed in this paper consistently achieve clearly better mining results than the previously proposed methods. In J-Express [13], CLUTO and SOTA, samples are partitioned based on the complete set of genes. The mining results of these methods suffer from the generally heavy noise in the gene
expression data sets. Other approaches adopt dimensionality reduction techniques, such as principal component analysis (PCA). However, the principal components in PCA do not necessarily capture the class structure of the data. Therefore the subspace clustering methods may not find the phenotypes and informative genes precisely.

Figure 4 shows the informative genes of the Leukemia-G1 data set detected by the mutual reinforcing adjustment approach. In Figure 4, each column represents a sample, while each row corresponds to one of the genes output as informative. The description and probe for each gene are also listed. Different colors (grey levels in a black-and-white printout) in the matrix indicate different expression levels. Figure 4 shows that the genes at the top distinguish the ALL-AML phenotypes according to an “on-off” pattern, while the remaining genes follow an “off-on” pattern.

We also apply two widely accepted supervised methods, the neighborhood analysis [8] and the statistical regression modeling approach [15], to mine this data set. Each method selects a set of top genes to distinguish the ALL-AML classes, and some genes appear in both top-gene sets.

The number in the “match” column of Figure 4 shows whether the corresponding gene matches either of these two supervised methods. That is, one value means that the gene is among the top genes selected by both supervised methods, while the other means that the gene is selected by one of the two methods. Interestingly, as shown in Figure 4, many of the informative genes identified by the mutual reinforcing adjustment method are also selected by the neighborhood analysis or by the statistical regression modeling approach. This strongly indicates that, even without supervision, the mutual reinforcing method learns well from the real-world data sets.

Table 2 reports the average number of iterations and the response time (in seconds) on the above gene expression data sets. Each algorithm is executed multiple times with different parameters. The algorithms are implemented with the MATLAB package and are executed on a SUN Ultra 80 workstation with a 450 MHz CPU and 256 MB of main memory. The number of iterations is dominated by the simulated annealing function we used. We used a slow simulated annealing function for the sake of the effectiveness of the approaches. Since in reality the number of genes in the human genome is limited, efficiency is not a major concern.

2 Results are based on the hierarchical clustering method.

              Heuristic Searching           Mutual Reinforcing
Data Size     # of iterations   runtime     # of iterations   runtime
966327581026827591161582912511715529120811682315694572652

Table 2: Number of iterations and response time (in seconds) with respect to the matrix size.

6. CONCLUSIONS

In this paper, we identified the novel problem of mining phenotypes and informative genes simultaneously from gene expression data. A set of statistics-based metrics is proposed to coordinate and trade off sample phenotype discovery and informative gene mining. We proposed two mining methods: the heuristic search and the mutual reinforcing adjustment approach. In particular, the mutual reinforcing adjustment approach incorporates deterministic, robust techniques to dynamically manipulate the relationship between samples and genes while conducting an iterative adjustment to detect the phenotypes and informative genes. We demonstrated the performance of the proposed approaches by extensive experiments on various real-world gene expression data sets. The empirical evaluation shows that our approaches are effective and scalable in mining large real-world data sets. The mining results are consistently of good quality.
7. REFERENCES

[1] Agrawal, R., Gehrke, J., Gunopulos, D., and Raghavan, P. Automatic subspace clustering of high dimensional data for data mining applications. In SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, pages 94–105, 1998.
[2] Alon, U., Barkai, N., Notterman, D.A., Gish, K., Ybarra, S., Mack, D., and Levine, A.J. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide array. Proc. Natl. Acad. Sci. USA, Vol. 96(12):6745–6750, June 1999.
[3] Ben-Dor, A., Shamir, R., and Yakhini, Z. Clustering gene expression patterns. Journal of Computational Biology, 6(3/4):281–297, 1999.
[4] Brown, M.P.S., Grundy, W.N., Lin, D., Cristianini, N., Sugnet, C.W., Furey, T.S., Ares, M. Jr., and Haussler, D. Knowledge-based analysis of microarray gene expression data using support vector machines. Proc. Natl. Acad. Sci., 97(1):262–267, January 2000.
[5] Cheng, Y. and Church, G.M. Biclustering of expression data. Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology (ISMB), 8:93–103, 2000.
[6] Dysvik, B. and Jonassen, I. J-Express: exploring gene expression data using Java. Bioinformatics, 17(4):369–370, 2001. Applications Note.
