gene-prediction
0 - Bioinformatics in Aquaculture: Principles and Methods
Bioinformatics in Aquaculture: Principles and Methods
Edited by Zhanjiang (John) Liu
This edition first published 2017. © 2017 John Wiley & Sons Ltd. All rights reserved.
Registered Offices: John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
Editorial Office: 111 River Street, Hoboken, NJ 07030, USA
Library of Congress Cataloging-in-Publication Data
Names: Liu, Zhanjiang, editor.
Title: Bioinformatics in aquaculture: principles and methods / edited by Zhanjiang (John) Liu.
Description: Hoboken, NJ: John Wiley & Sons, 2017. | Includes bibliographical references and index.
Identifiers: LCCN 2016045878 (print) | LCCN 2016057071 (ebook) | ISBN 9781118782354 (cloth: alk. paper) | ISBN 9781118782385 (Adobe PDF) | ISBN 9781118782378 (ePub)
Subjects: LCSH: Bioinformatics. | Aquaculture.
Classification: LCC QH324.2 B5488 2017 (print) | LCC QH324.2 (ebook) | DDC 572/.330285–dc23
LC record available at https:///2016045878
Cover image: Jackfish © wildestanimal/Getty Images, Inc.; Digital DNA strands © deliormanli/iStockphoto; DNA illustration © enot-poloskun/iStockphoto
Cover design: Wiley
Set in 10/12pt WarnockPro by SPi Global, Chennai, India

Contents

About the Editor; List of Contributors; Preface

Part I Bioinformatics Analysis of Genomic Sequences

1 Introduction to Linux and Command Line Tools for Bioinformatics
Shikai Liu and Zhanjiang Liu
Introduction; Overview of Linux; Directories, Files, and Processes; Directory Structure; Filename Conventions; Wildcards; File Permission; Change File Permission; Environment Variables; Global Environment Variable; Local Environment Variable; Setting Environment Variables; Setting the PATH Environment Variable; Basic Linux Commands; List Directory and File; Create Directory and File; Change to a Directory; Manipulate Directory and File; Access File Content; Query File Content; Edit File Content; Redirect Content; Compare File Content; Compress and Archive Files and Directories; Access Remote Files; Check Process and Job; Other Useful Command Lines; quota; df; du; free; zcat; file; find; history; Getting Help; Installing Software Packages; Installing Packages from a Configured Repository; Installing Software from Source Code; Compiling a Package; Accessing a Remote Linux Supercomputer System; Access Remote Linux from Local Linux System; Access Remote Linux from macOS; Access Remote Linux from Microsoft Windows; Demonstration of Command Lines; Further Reading

2 Determining Sequence Identities: BLAST, Phylogenetic Analysis, and Syntenic Analyses
Sen Gao, Zihao Yuan, Ning Li, Jiaren Zhang and Zhanjiang Liu
Introduction; Determining Sequence Identities through BLAST Searches; Web-based BLAST; UNIX-based BLAST; Download the Needed Databases; Setup Databases; Execute BLAST Searches; Parsing the BLAST Results; Determining Sequence Identities through Phylogenetic Analysis; Procedures of Phylogenetic Analysis; Collecting Sequences; Multiple Sequences Alignments; Tree Construction; Reliability Test; Other Software Available for Phylogenetic Analysis; Determining Sequence Identities through Syntenic Analysis; Procedures for Synteny Analysis; References

3 Next-Generation Sequencing Technologies and the Assembly of Short Reads into Reference Genome Sequences
Ning Li, Xiaozhu Wang and Zhanjiang Liu
Introduction; Understanding of DNA Sequencing Technologies; 454 Life Sciences; Illumina Genome Analyzer; SOLiD; Helicos; Ion Torrent; PacBio; Oxford Nanopore; Preprocessing of Sequences; Data Types; Quality Control; Trimming; Error-Correction; Sequence Assembly; Reference-Guided Assembly; De Novo Assembly; Graph; Greedy Assemblers; Overlap-Layout-Consensus (OLC) Assemblers; De Bruijn Graph (DBG) Approach; PacBio Long Reads and Their Applications; Scaffolding; Gap Filling (Gap Closing); Evaluation of Assembly Quality; References

4 Genome Annotation: Determination of the Coding Potential of the Genome
Ruijia Wang, Lisui Bao, Shikai Liu and Zhanjiang Liu
Introduction; Methods Used in Gene Prediction; Homolog-based Methods; Ab initio Methods; Case Study: Genome Annotation Examples: Gene Annotation of Chromosome 1 of Zebrafish using FGENESH and AUGUSTUS; Pipeline Installations; FGENESH; AUGUSTUS; Prepare the Input Files; Gene Prediction Using FGENESH; Parameter Setting in FGENESH; Gene Prediction Using AUGUSTUS; Command-line (Genome file + cDNA file); Output from AUGUSTUS; Discussion; References

5 Analysis of Repetitive Elements in the Genome
Lisui Bao and Zhanjiang Liu
Introduction; Methods Used in Repeat Analysis; Homology-Based Identification of Repeats; De Novo Identification of Repeats; Software for Repeat Identification; Using the Command-line Version of RepeatModeler to Identify Repetitive Elements in Genomic Sequences; Prerequisites; RepeatModeler Installation; Example Run; References

6 Analysis of Duplicated Genes and Multi-Gene Families
Ruijia Wang and Zhanjiang Liu
Introduction; Pipeline Installations; MCL-edge; MCscan; MCscanX; OrthoMCL; Identification of Duplicated Genes and Multi-Member Gene Family; MCL-edge; Prepare the Input Data; Identification of Duplication by MCL; Output Results; MCscan and MCscanX; Prepare the Input Data; Duplication Identification by MCscan Tools; Output Results; OrthoMCL; Prepare the Input Data; Trim the Input Data; Self-Blast; Transfer the Blast Result Format for OrthoMCL; Load the "SimilarSequences.txt" into the Database; Identify Similar Sequence Pairs from the Blast Result; Retrieve Similar Sequence Pairs from Database; MCL-edge Clustering; Format the Clustering Results (Name the Clusters as "ccat1", "ccat2" …); Output Results; Calculate Ka/Ks by ParaAT; Prepare the Input Data; Calculate Ka/Ks; Summary of the Ka/Ks Results; Results; Downstream Analysis; Perspectives; References

7 Dealing with Complex Polyploidy Genomes: Considerations for Assembly and Annotation of the Common Carp Genome
Peng Xu, Jiongtang Li and Xiaowen Sun
Introduction; Properties of the Common Carp Genome; Genome Assembly: Strategies for Reducing Problems Caused by Allotetraploidy; Reduce Genome Complexity: Gynogen as Sequencing Template; Sequencing Strategies; Genome Assembler Comparison and Selection; Quality Control of Assembly; Annotation of Tetraploidy Genome; De Novo Gene Prediction; Sequence-homology-based Prediction; Transcriptome Sequencing; Assessment and Evaluation; Conclusions; References

Part II Bioinformatics Analysis of Transcriptomic Sequences

8 Assembly of RNA-Seq Short Reads into Transcriptome Sequences
Jun Yao, Chen Jiang, Chao Li, Qifan Zeng and Zhanjiang Liu
Introduction; RNA-Seq Procedures; Reference-Guided Transcriptome Assembly; De novo Transcriptome Assembly; Assessment of RNA-Seq Assembly; Conclusions; Acknowledgments; References

9 Analysis of Differentially Expressed Genes and Co-expressed Genes Using RNA-Seq Datasets
Chao Li, Qifan Zeng, Chen Jiang, Jun Yao and Zhanjiang Liu
Introduction; Analysis of Differentially Expressed Genes Using CLC Genomics Workbench; Data Import; Mapping Reads to the Reference; Quantification of Gene Expression Value; Set Up an Experiment; Statistical Analysis for the Identification of DEGs; Analysis of Differentially Expressed Genes Using Trinity; Read Alignment and Abundance Estimation; Using RSEM to Estimate Expression Value; Using eXpress to Estimate Expression Value; Generating Expression Value Matrices; Identifying Differentially Expressed Transcripts; Extracting Differentially Expressed Transcripts; Analysis of Co-Expressed Genes; Network Construction; Module Detection; Relating the Modules to External Information; Computational Challenges; Acknowledgments; References

10 Gene Ontology, Enrichment Analysis, and Pathway Analysis
Tao Zhou, Jun Yao and Zhanjiang Liu
Introduction; GO and the GO Project; GO Terms; Ontology; Biological Process; Molecular Function; Cellular Component; Ontology Structure; GO Slim; Annotation; Electronic Annotation; Literature Annotation; Sequence-Based Annotation; GO Tools; AmiGO 2; GOOSE; Blast2GO; Enrichment Analysis; Main Types of Enrichment Tools; Gene Set and Background; Annotation Sources; Statistical Methods; Recommended Tools; Enrichment Analysis by Using DAVID; Gene Pathway Analysis; Definition of Pathway; Pathway Analysis Approaches; Over-representation Analysis (ORA) Approaches; Functional Class Scoring (FCS) Approaches; Pathway Topology (PT)–based Approaches; Pathway Databases; KEGG; Reactome; PANTHER; Pathway Commons; BioCyc; Pathway Analysis Tools; Ingenuity Pathway Analysis (IPA); KEGG Pathway Mapping; PANTHER; Reactome Pathway Analysis; References

11 Genetic Analysis Using RNA-Seq: Bulk Segregant RNA-Seq
Jun Yao, Ruijia Wang and Zhanjiang Liu
Introduction; BSR-Seq: Basic Considerations; BSR-Seq Procedures; Identification of SNPs; Identification of Significant SNPs; Bulk Frequency Ratio; Combined Bulk Frequency Ratio; Location of Genes with High BFR; Acknowledgments; References

12 Analysis of Long Non-coding RNAs
Ruijia Wang, Lisui Bao, Shikai Liu and Zhanjiang Liu
Introduction; Data Required for the Analysis of lncRNAs; Assembly of RNA-Seq Sequences; Identification of lncRNAs; Length Trimming; Coding Potential Analysis; Coding Potential Calculator (CPC); RNAcode; PhyloCSF; CPAT; Homology Search; Homology Protein Search; Homology Domain Search; ORF Length Trimming (Optional); UTR Region Trimming; Analysis of lncRNA Expression; Analysis and Prediction of lncRNA Functions; Future Perspectives; References

13 Analysis of MicroRNAs and Their Target Genes
Shikai Liu and Zhanjiang Liu
Introduction; miRNA Biogenesis and Function; Tools for miRNA Data Analysis; miRNA Identification; miRNA Hairpin Structure; miRNA Expression Profiling; miRNA Target Prediction; miRNA–mRNA Integrated Analysis; miRNA Analysis Pipelines; miRNA and Target Databases; miRNA Analysis: Computational Identification from Genome Sequences; Installing MapMi; Using MapMi; Using Genomic Sequences from Ensembl; Using Custom Genomic Sequences; Output File; miRNA Analysis: Empirical Identification by Small RNA-Seq; Procedures of Small RNA Deep Sequencing; Workflow of Data Analysis; miRNA Analysis Using miRDeep2; Package Installation; Using miRDeep2; Prediction of miRNA Targets; Package Installation; Using miRanda; Conclusions; References

14 Analysis of Allele-Specific Expression
Yun Li, Ailu Chen and Zhanjiang Liu
Introduction; Genome-wide Approaches for ASE Analysis; Polymorphism-based Approach; NGS-based Assays; Applications of ASE Analysis; Detection of Cis- and Trans-Regulatory Effects; Detection of Parent-of-Origin Effects; Considerations of ASE Analysis by RNA-Seq; Biological Materials; RNA-Seq Library Complexity; Genome Mapping Bias; Removing Problematic SNPs; Step-by-Step Illustration of ASE Analysis by RNA-Seq; Trimming of Sequencing Reads; Aligning the RNA-Seq Reads to the Reference; Obtaining STAR; Build a Genome Index; Mapping Reads to the Reference Genome; The Output Files; SNP Calling; SAMtools; VarScan; Filtering and Normalization; Quantification of ASE; Constructing a Pseudo-genome; Realigning the RNA-Seq Reads to the Pseudo-Genome; Counting the Allele ratios; Downstream Analysis; Evaluating Cis- and Trans-regulatory Changes; Detecting of Parental of Origin Effects; Validating of the ASE Ratio; References

15 Bioinformatics Analysis of Epigenetics
Yanghua He and Jiuzhou Song
Introduction; Mastering Epigenetic Data; DNA Methylation; Bisulfite-based Methods; Enrichment-based Methods; Histone Modifications; Genomic Data Manipulation; R Language; Bioconductor; UCSC Genome Bioinformatics Site; Galaxy; DNA Methylation and Bioinformatics; Demo of Analysis for DNA Methylation; Pre-processing; Mapping; Peak-calling; DMR Identification; Gene Expression Analysis; DMR Annotation and Functional Prediction; Other Ways of Analysis for Methylation; Histone Modifications and Bioinformatics; Histone Modifications; Data Analysis; Perspectives; References

Part III Bioinformatics Mining and Genetic Analysis of DNA Markers

16 Bioinformatics Mining of Microsatellite Markers from Genomic and Transcriptomic Sequences
Shaolin Wang, Yanliang Jiang and Zhanjiang Liu
Introduction; Bioinformatics Mining of Microsatellite Markers; Sequence Resources for Microsatellite Markers; Microsatellite Mining Tools; Msatfinder; MIcroSAtellite Identification Tool (MISA); Msatcommander; Imperfect Microsatellite Extractor (IMEx); QDD; Primer Design for Microsatellite Markers; Primer Design; Selection of Primer Pairs; Conclusions; References

17 SNP Identification from Next-Generation Sequencing Datasets
Qifan Zeng, Luyang Sun, Qiang Fu, Shikai Liu and Zhanjiang Liu
Introduction; SNP Identification and Analysis; Quality Control of Sequencing Data; Alignment of Short Reads to the Reference Sequence; Processing of the Post-alignment File; SNP and Genotype Calling; Filtering SNP Candidates; SNP Annotation; Detailed Protocols of SNP Identification; Quality Control; Short Reads Alignment; Processing of the Post-alignment File; SNP Identification; GATK; SAMtools/BCFtools; Varscan; PoPoolation2; Which Software Should I Choose?; References

18 SNP Array Development, Genotyping, Data Analysis, and Applications
Shikai Liu, Qifan Zeng, Xiaozhu Wang and Zhanjiang Liu
Introduction; Development of High-density SNP Array; Marker Selection; Axiom myDesign Array Plates; SNP Array Design; Principles of Probe Design for the Differentiation of SNPs and Indels; Allele Naming Convention for Affymetrix Axiom Genotyping Platform; DQC; Case Study: the Catfish 250K SNP Array Design; SNP Genotyping: Biochemistry and Workflow; Axiom Genotyping Solution; Biochemistry and Workflow; SNP Genotyping: Analysis of Axiom Genotyping Array Data; Genotyping Workflow; Step 1: Group Samples into Batches; Step 2: Perform a Sample QC; Step 3: Perform First-pass Genotyping of the Samples; Step 4: Perform Sample Filtering Post-genotyping; Step 5: Perform Second-Pass Genotyping; Genotyping Analysis Software; Genotyping Analysis Reference Files; Genotyping Using GTC; Software Installation; Starting GTC; Genotyping with GTC; Genotyping Using APT; Software Installation; Genotyping with APT; SNP Analysis After Genotype Calling; View SNP Cluster Plots; Metrics Used for SNP Post-processing; Perform SNP Filtering Using SNPolisher; Software Installation; Using SNPolisher; Applications of SNP Arrays; Genome-Wide Association Studies; Analysis of Linkage Disequilibrium; Population Structure and Discrimination; Genomic Signatures of Selection and Domestication; Conclusion; Further Readings; References

19 Genotyping by Sequencing and Data Analysis: RAD and 2b-RAD Sequencing
Shi Wang, Jia Lv, Jinzhuang Dou, Qianyun Lu, Lingling Zhang and Zhenmin Bao
Introduction; Methodology Principles; RAD; 2b-RAD; The Experimental Procedure of 2b-RAD; DNA Input; Restriction Enzyme Digestion; Adaptor Ligation; Adaptor Preparation; Ligation Reaction; PCR Amplification and Gel Purification; Barcode Incorporation; Bioinformatics Analysis of RAD and 2b-RAD Data; Overview; Reference-based and De Novo Analytical Approaches; Codominant and Dominant Genotyping; Codominant Genotyping; Dominant Genotyping; Usage Demonstration; Example for Running a Linkage Mapping Analysis; The Benefits and Pitfalls of RAD and 2b-RAD Applications; References

20 Bioinformatics Considerations and Approaches for High-Density Linkage Mapping in Aquaculture
Yun Li, Shikai Liu, Ruijia Wang, Zhenkui Qin and Zhanjiang Liu
Introduction; Basic Concepts; Principles of Genetic Mapping; Linkage Phase; The LOD Score; Mapping Function; Requirements for Genetic Mapping; Polymorphic Markers; Genotyping Platforms; Reference Families; Linkage Mapping Software; Linkage Mapping Process; Data Filtering; Assigning Markers into Linkage Groups; Ordering Markers Within Each Linkage Group; Step-by-Step Illustration of Linkage Mapping; JoinMap; Obtaining JoinMap; Input Data Files; OneMap; MergeMap; Getting MergeMap; Input Files; Output Files; MapChart; Getting MapChart; Pros and Cons of Linkage Mapping Software Packages; References

21 Genomic Selection in Aquaculture Breeding Programs
Mehar S. Khatkar
Introduction; Genomic Selection; Steps in GS; Preparation of Reference Population; Development of Prediction Equations; Validation of Prediction Equations; Computing GBVs of Selection/Test Candidates; Selection and Mating; Models for Genomic Prediction; An Example of Implementation of Genomic Prediction; Some Important Considerations for GS; How Many Animals Need to Be Genotyped?; How Many SNPs Are Enough?; What Is the More Important Factor for Genomic Predictions, the Number of Individuals or the Number of SNPs?; Is Prediction Across Breeds/Population Possible?; Do We Need Knowledge About Genes and Gene Functions?; Does Accuracy Decline Over Generations?; Would Inbreeding Increase by Using GS?; GS in Aquaculture; Acknowledgment; References

22 Quantitative Trait Locus Mapping in Aquaculture Species: Principles and Practice
Alejandro P. Gutierrez and Ross D. Houston
Introduction; Selective Breeding in Aquaculture; QTL Mapping in Aquatic Species; Applications of Genomic Technology; DNA Markers and Genotyping; Microsatellites; SNPs; Genotyping by Sequencing; Linkage Maps; Quantitative Trait Loci (QTL) Mapping; Traits of Importance; Mapping Populations; QTL Mapping Methods; Software for QTL Analysis; QTL Mapping Example; Future Directions and Applications of QTL; References

23 Genome-wide Association Studies of Performance Traits
Xin Geng, Degui Zhi and Zhanjiang Liu
Introduction; Study Population; Samples from Natural Population; Samples from Family-Based Population; Phenotype Design; Power of Association Test and Sample Size; Quality Control Procedures; LD Analysis; Association Test; Genomic Control; PCA; Linear Mixed Models; Transmission Disequilibrium Test and Derivatives; Significance Level for Multiple Testing; Step-by-Step Procedures: A Case Study in Catfish; Description of the Experiment; Phenotyping; Data Input for PLINK; QC; LD-based SNP Pruning; Family-based Association Tests for Quantitative Traits (QFAM); Follow-up Work after GWAS; Validation; Fine Mapping; Functional Confirmation of Implicated Molecular Mechanisms; Pitfalls of GWAS with Aquaculture Species; Comparison of GWAS with Alternative Designs; Linkage-based QTL Mapping; Bulk Segregant Analysis; Conclusions; References

24 Gene Set Analysis of SNP Data from Genome-wide Association Studies
Shikai Liu, Peng Zeng and Zhanjiang Liu
Introduction; GSA in GWAS; Preprocessing Data and Defining the Gene Sets; Formulating a Hypothesis; Constructing Corresponding Statistical Tests; Assessing the Statistical Significance of the Results; Statistical Methods; Single-SNP Analysis; Set-based Tests; LKM Regression Approach; ARTP Method; GSEA-based Approach; Demonstration Using Alzheimer's Disease Neuroimaging Initiative's AD Data; Data Information; Data Analysis: General Strategy; Data Analysis: Protocol and Codes; Standard GWAS Using Plink; Set-based Test Using Plink; LKM Method Using SKAT, an R Package; Adaptive Rank Truncated Product Method Using an R Package, ARTP; Results of Single-SNP Analysis; Results of Set-based SNP Tests; Results of LKM-based Test; Results of ARTP Statistic; Results of GSEA-based SNP-Set Analysis; Comparison of the GSA Methods; Conclusion; References

Part IV Comparative Genome Analysis

25 Comparative Genomics Using CoGe, Hook, Line, and Sinker
Blake Joyce, Asher Baltzell, Matt Bomhoff and Eric Lyons
Introduction; Getting Hooked into the CoGe Platform; Logging in to CoGe; Navigating CoGe Using MyProfile; Privately Sharing Genomes or Experiments in CoGe; Making a Genome or Experiment Publicly Available; Data Management in CoGe; Tracking History and Progress in CoGe; Finding Genomes Already Present in CoGe; Loading Genome(s) into CoGe; Adding Data from the CyVerse Data Store; Uploading from an FTP/HTTPS Site; Uploading Directly to CoGe from Local Storage; Using the NCBI Loader; Organizing Genomes Using Notebooks; Loading Genome Annotation; Casting the Line: Analyses for Comparing Genomes; Running SynMap to Generate Syntenic Dot Plots; Using the Syntenic Path Assembly Algorithm in SynMap; Seek Synteny and Ye Shall SynFind; Visualizing Genome Evolution Using GEVo; CoGe Phylogenetics: Tree Branches and Lines for Something Other than Making a Rod; BLAST at CoGe; Phylogeny.fr: The Trees in France Are Lovely This Time of Year; Sinkers to Cast Further and Deeper: Adding Weight to Genomes with Additional Data Types; Using LoadExperiment to Include Other Data Types; Viewing Genomes and Experiments in GenomeView; SNP Analysis of BAM Alignment Experiment Datasets; Conclusions; References

Part V Bioinformatics Resources, Databases, and Genome Browsers

26 NCBI Resources Useful for Informatics Issues in Aquaculture
Zihao Yuan, Yujia Yang, Shikai Liu and Zhanjiang Liu
Introduction; Popularly Used Databases in NCBI; PubMed; Entrez Nucleotide; Entrez Genome; Sequence Read Archive; dbSNP Database; Gene Expression Omnibus; UniGene; Probe; Conserved Domain Database; Popularly Used Tools in NCBI; BLAST; ORF Finder; Splign; Map Viewer; GEO DataSets Data Analysis Tools; Conserved Domain Search Service (CD-Search); CDART; Submit Data to NCBI; Submission of Nucleotide Sequences Using BankIt; Submission of Short Reads to SRA; Create a BioProject; Create a BioSample Submission; Prepare Sequence Data Files; Enter Metadata on SRA Web Site; Transfer Data Files to SRA; References

27 Resources and Bioinformatics Tools in Ensembl
Yulin Jin, Suxu Tan, Jun Yao and Zhanjiang Liu
Introduction; Ensembl Resources; Genome Browsers; Gene Annotation; Variation Annotation; Gene-based Views; Transcript-based Views; Location-based Views; Variation-based View; Comparative Genomics; Genomic Alignments; Phylogenetic Trees; Gene Tree; Gene Gain/Loss Tree; Orthologs and Paralogs; Ensembl Families; Whole Genome Alignments; Ensembl Regulation; Ensembl Tools; Variant Effect Predictor (VEP); BLAST/BLAT
Single-Cell Transcriptome Sequencing: Maker Genes (English)

Single-cell transcriptomics is a powerful technique that allows researchers to study gene expression patterns at the level of individual cells. This technology has revolutionized our understanding of cellular heterogeneity and can provide valuable insights into many biological processes and diseases. To analyze single-cell transcriptomic data and draw meaningful biological conclusions, accurate and comprehensive gene annotations are essential. This is where the Maker gene prediction tool comes into play.

Maker is a widely used gene annotation pipeline that integrates evidence-based gene prediction methods to identify protein-coding genes in a genome. It combines multiple sources of evidence, such as protein homology, RNA-seq data, and ab initio gene predictions, to generate high-quality gene annotations. In the context of single-cell transcriptomics, Maker can be used to predict genes from the transcriptomic data obtained from individual cells.

One major requirement for using Maker in single-cell transcriptomics is a reference genome. The reference genome serves as the template for gene prediction and provides the genomic context needed for accurate annotation. Its quality is crucial, because errors or gaps in the genome assembly can lead to incorrect gene predictions. The reference genome used for Maker gene prediction should therefore be of high quality and well annotated.

A second requirement is transcriptomic data. Single-cell RNA sequencing (scRNA-seq) is commonly used to generate transcriptomic data from individual cells. These data report the expression level of each gene in each cell and can serve as evidence for gene prediction. Maker can integrate this transcriptomic evidence with other sources of evidence to improve the accuracy of gene annotations.

The computational resources required for running Maker on single-cell transcriptomic data also need to be considered. Single-cell transcriptomics generates large amounts of data, and analyzing them with Maker can be computationally intensive. High-performance computing resources and efficient algorithms are necessary to meet these demands. In addition, the analysis pipeline should be optimized for the particular characteristics of single-cell transcriptomic data, such as high levels of technical noise and low RNA capture efficiency.

Another important consideration is validation of the predicted gene annotations. Although Maker integrates multiple sources of evidence, it is still prone to false positives and false negatives. The predicted annotations should therefore be validated with independent experimental methods, such as qPCR or in situ hybridization. This validation step supports the accuracy and reliability of the annotations and gives confidence in downstream analysis of the single-cell transcriptomic data.

In conclusion, using Maker in single-cell transcriptomics requires a high-quality reference genome, transcriptomic data, computational resources, and validation of the predicted gene annotations. By meeting these requirements, researchers can use Maker to annotate genes in single-cell transcriptomic data and gain insight into cellular heterogeneity and biological processes.
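Summary of Common Data-Processing Methods in Bioinformatics

With the development of high-throughput sequencing, bioinformatics plays an increasingly important role in life-science research. Bioinformatics aims to process, analyze, and interpret biological data so that meaningful knowledge can be extracted from massive amounts of biological information. Many data-processing methods are widely used in this field; some of them are summarized below.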
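1. Sequence Alignment

Sequence alignment is one of the most common data-processing methods in bioinformatics. It is mainly used to compare how similar two or more biological sequences are, and it applies to DNA, RNA, and protein sequences. The core of an alignment method is to find the matching and mismatching positions between the sequences and to compute a similarity score. Commonly used algorithms include Smith-Waterman (local alignment) and Needleman-Wunsch (global alignment).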
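To make the scoring idea concrete, here is a minimal sketch of Needleman-Wunsch global alignment. The scoring values (match +1, mismatch -1, linear gap -2) and the function name `needleman_wunsch` are illustrative assumptions, not parameters taken from the text; real tools use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b; returns (score, aligned_a, aligned_b).

    Scoring values are illustrative assumptions (linear gap penalty).
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("ACGT", "AGT")` aligns `ACGT` against `A-GT`. Smith-Waterman differs mainly in clamping cell scores at zero and tracing back from the highest-scoring cell rather than the corner.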
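2. Genome Assembly

Genome assembly is the process of piecing fragmented DNA sequences back together into a complete genome. Because genomes are very large and complex, recovering a complete genome from sequencing data is a major challenge. The choice of assembly method usually depends on the sequencing technology used; approaches include de Bruijn graph methods, Overlap-Layout-Consensus (OLC) methods, and reference-guided assembly.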
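As a rough illustration of the de Bruijn graph idea, the sketch below splits reads into k-mers, links each (k-1)-mer prefix to its suffix, and extends a contig while the path is unambiguous. The helper names `de_bruijn_edges` and `walk_unambiguous` and the toy reads are hypothetical; a real assembler also handles sequencing errors, reverse complements, and branching paths.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer prefix to the list of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while every node has exactly one distinct successor."""
    contig, node, seen = start, start, {start}
    while True:
        successors = set(graph.get(node, []))
        if len(successors) != 1:
            break  # dead end or branch: stop rather than guess
        node = successors.pop()
        if node in seen:
            break  # cycle: stop to avoid looping forever
        seen.add(node)
        contig += node[-1]
    return contig


# Two overlapping toy reads (hypothetical data) reassembled with k = 3.
reads = ["ACGTT", "GTTCA"]
contig = walk_unambiguous(de_bruijn_edges(reads, 3), "AC")
```

Here the two reads overlap on `GTT`, so the walk starting at `AC` recovers the joined sequence.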
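3. RNA-seq Analysis

RNA-seq analysis is a method for analyzing transcriptome data. It helps researchers understand gene expression and its regulation during transcription. An RNA-seq analysis typically includes quality control of the data, removal of low-quality reads and adapter sequences from the raw reads, alignment to a reference genome, quantification of gene expression, and differential expression analysis.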
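The expression-quantification step can be sketched with transcripts per million (TPM), one common normalization; the text does not say which measure is intended, so TPM is an assumption here, and the gene names and numbers are made up for illustration.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from read counts and gene lengths in base pairs.

    counts and lengths_bp are dicts keyed by gene name (hypothetical inputs).
    """
    # Reads per kilobase corrects for longer genes collecting more reads.
    rates = {g: counts[g] / (lengths_bp[g] / 1000) for g in counts}
    total = sum(rates.values())
    if total == 0:
        return {g: 0.0 for g in counts}
    # Rescale so that the values in one sample sum to one million.
    return {g: rate / total * 1e6 for g, rate in rates.items()}


# Two genes with equal read counts; geneB is twice as long as geneA.
values = tpm({"geneA": 100, "geneB": 100}, {"geneA": 1000, "geneB": 2000})
```

With equal counts, the shorter gene receives twice the TPM of the longer one, which is the length correction at work.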
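4. Protein Structure Prediction

Protein structure prediction infers the three-dimensional structure of a protein from its amino acid sequence. It is essential for understanding protein function and interaction mechanisms. Bioinformatics methods can predict secondary structure, tertiary structure, and protein-protein interactions. Commonly used approaches include template-based alignment, molecular dynamics simulation, and cluster analysis.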
Prediction and Analysis of Biosynthetic Gene Clusters for Microbial Secondary Metabolites
NRPS A-domain specificities are predicted using both the signature sequence method and the support-vector-machine-based method of NRPSPredictor2.
analysis of secondary metabolism gene
nonribosomal peptides (NRP)
bacteriocins
aminocoumarins
butyrolactones
terpenes
beta-lactams
siderophores
indoles
lantibiotics
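Introduction to Secondary Metabolites
Secondary metabolites: substances that a microorganism produces only after it has grown to a certain stage. They have very complex chemical structures and either have no obvious physiological function for the organism or are not required for its growth and reproduction.
Main sources: actinomycetes, fungi, and others.
Example: Zwittermicin A
polyketides (PK), type I
polyketides, type II
polyketides, type III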
Gene prediction by Glimmer3 (prokaryotic data) or GlimmerHMM (eukaryotic data).
Transform the predicted results to EMBL format.
Detection of gene clusters
Prediction and Analysis of Microbial Secondary Metabolites and Their Biosynthetic Gene Clusters
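Sequence Analysis (6): Prediction

Measures of prediction performance:
1. Sensitivity (Sn): of the positive data, the proportion predicted as "true";
2. Specificity (Sp): of the negative data, the proportion predicted as "false";
3. Accuracy (Ac): over the whole data set (positive and negative data together), the overall proportion predicted correctly;
4. Matthews correlation coefficient (MCC): gives a fairer picture of predictive power when the numbers of positive and negative data differ greatly; range [-1, 1].

Example: two classes of data are studied, 100 exons and 120 introns. The prediction results are shown in the table below.

            Positive (exon)    Negative (intron)
True        TP (90)            TN (100)
False       FP (20)            FN (10)

Sn = 90/(90+10) = 90.0%; Sp = 100/(100+20) = 83.3%; Ac = (90+100)/(100+120) = 86.4%; MCC = 73.0%.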
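III. Overview of Prediction Algorithms

1. Position Weight Matrix (PWM)
2. Fisher Discriminant
3. Distance
4. Support Vector Machine (SVM)

1. Position Weight Matrix

Generally suited to sequences of equal length; commonly used to search for transcription factor binding sites (TFBS) or to study sequence motifs.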
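A minimal sketch of how a PWM might be built and used to score a candidate site follows. The log-odds form, the pseudocount of 1, the uniform background of 0.25, and the toy binding sites are all assumptions made for illustration rather than details from the slides.

```python
from math import log2


def position_weight_matrix(seqs, alphabet="ACGT", pseudocount=1.0, background=0.25):
    """Build a log-odds PWM (one dict per position) from equal-length sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    total = len(seqs) + pseudocount * len(alphabet)
    pwm = []
    for pos in range(length):
        column = [s[pos] for s in seqs]
        # Log-odds of the smoothed base frequency against the background frequency.
        pwm.append({
            base: log2(((column.count(base) + pseudocount) / total) / background)
            for base in alphabet
        })
    return pwm


def score_site(pwm, site):
    """Sum the per-position log-odds weights for a candidate site."""
    return sum(column[base] for column, base in zip(pwm, site))


# Hypothetical aligned binding sites.
pwm = position_weight_matrix(["ACG", "ACG", "ACT", "ACG"])
```

A site resembling the training sequences, such as `ACG`, scores higher than an unrelated one such as `TTT`. The equal-length check reflects the restriction stated above.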
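The overtraining (overfitting) problem: almost every prediction model is overtrained to some degree. Symptoms of an overtrained model:
1. A model built from the known data fits only the training data well;
2. it is not suitable for prediction;
3. small changes to the training data have too large an effect on prediction performance;
4. it predicts the training data well but performs poorly on new data.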
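Two key elements needed for prediction:

1. Feature selection for the object under study (parameter selection). For example:
Gene prediction parameters: codon usage frequency, ORF length, and so on;
Protein prediction parameters: amino acid frequencies, n-peptide composition.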
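The two gene-prediction features named above can be computed with a short sketch. It scans only the forward strand, and the function names and the ATG-to-stop definition of an ORF are assumptions for illustration; real gene finders combine many more signals.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_frequencies(cds):
    """Relative frequency of each codon in a coding sequence read in frame 0."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    return {codon: n / len(codons) for codon, n in counts.items()} if codons else {}


def longest_orf_length(seq):
    """Length in nucleotides (stop codon included) of the longest ATG...stop ORF.

    Only the three forward-strand reading frames are scanned.
    """
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


# Toy sequence (hypothetical) containing ATG AAA TAG starting at offset 2.
orf_len = longest_orf_length("CCATGAAATAGCC")
```

Feature vectors such as these would then be passed to one of the classifiers listed earlier (for example a Fisher discriminant or an SVM).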
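Chapter 9: Common Methods for Studying Gene Function

The study of gene function is an important part of biological research. By examining what genes do and how they are expressed, researchers can reveal the key roles genes play in development, growth, and disease. To understand gene function in depth, scientists have developed a range of methods. Some commonly used ones are introduced below.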
1. 基因敲除(Knockout):这是一种研究基因功能的重要方法。
通过CRISPR/Cas9等技术将目标基因的部分或全部序列剔除,使其无法表达,观察敲除后的生物体表现出的表型变化。
这种方法可用于验证基因的功能,发现其在生物体中的作用。
2. 基因突变(Mutation):通过诱发基因突变或筛选已有基因突变体,研究基因的功能和表达方式。
其中,随机突变(例如化学物质诱变)和目标突变方法(例如诱导突变)是常用的策略。
研究基因突变体可以揭示基因对于生物体正常发育和功能的影响。
3. 基因过表达(Gene Overexpression):通过将目标基因插入表达载体并导入生物体,使基因在生物体中过度表达。
观察过度表达基因后生物体的表型变化,可以了解基因过度表达对生物体的影响。
此外,过度表达基因还可用于验证一些基因在特定条件下的功能和路径。
4. 基因沉默(Gene Silencing):通过RNA干扰(RNAi)或转座子的反义RNA,使目标基因的转录或翻译过程受到阻碍。
基因沉默可用来研究基因造成的表型变化以及调控基因的功能。
5. 基因共表达(Gene Co-expression):通过分析大规模基因表达数据,探索基因间的共表达关系。
通过比较共表达基因的功能和通路,可以发现基因的相互关联及其在生物体中的功能。
6. 基因互作(Gene Interaction):通过分析基因间的物理相互作用和遗传相互作用关系,了解基因在调控和相互影响方面的作用。
这种方法对于揭示基因网络调控和疾病发生机制很有帮助。
7. 基因转移(Gene Transfer):将外源基因导入目标细胞或生物体,以研究基因功能和转录调控。
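The co-expression analysis in item 5 is often implemented by correlating expression profiles across samples. The sketch below assumes Pearson correlation and a threshold of 0.9, neither of which is specified in the text; the gene names and expression values are invented for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0                      # a flat profile carries no signal
    return cov / (sd_x * sd_y)


def coexpressed_pairs(expression, threshold=0.9):
    """Return (gene1, gene2, r) for pairs whose correlation reaches threshold."""
    pairs = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if r >= threshold:
            pairs.append((g1, g2, r))
    return pairs


# Hypothetical expression values across four samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [5.0, 1.0, 4.0, 2.0],
}
pairs = coexpressed_pairs(expression)
print([(g1, g2) for g1, g2, _ in pairs])   # [('geneA', 'geneB')]
```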
2020 (Biotechnology Industry) Biotechnology Vocabulary
(生物科技行业)生物技术词汇AA chain 重链zhòngliànaa (amino acid) 氨基酸ānjīsuānab (antibody) 抗体kàngtǐab initio gene prediction 从头开始基因预报cóngtóukāishǐjīyīnyùbàoabiogenesis 无生源论, 自然发生论wúshēngyuánlùn,zìránfāshēnglùnabiotic 非生物的, 无生命的fēishēngwùde,wúshēngmìngd eabiotic stress 非生物应激fēishēngwùyìngjīablation 切除, 摘除qiēchú, zhāichúabortive complex 无效复合物wúxiàofùhéwùabortive transduction 流产转导liúchǎnzhuǎndǎoabrin 红豆碱hóngdòujiǎnabscisic acid 脱落酸tuōluòsuānabsolute configuration 绝对构型juéduìgòuxíng absorbance 光密度, 吸收率guāngmìdù, xīshōul ǜabsorption 吸收作用xīshōuzuòyòngabzyme 抗体酶kàngtǐméiaccelerated maturation 催熟, 人工老熟cuīshú, réngōnglǎoshúaccelerator globulin 促血凝球蛋白cùxiěníngqiúdànbáiaccelerin 促血凝球蛋白cùxiěníngqiúdànbái acceptable daily intake 可接受日摄入量kějiēshòurìshèrùliàngacceptable level of risk 可接受风险程度kějiēshòufēngxiǎnchéngdùacceptor 接纳体jiēnàtǐacceptor control 受体控制shòutǐkòngzhìacceptor junction site 接纳连接位点jiēnàliánjiēwèidiǎnacceptor site 接纳体部位jiēnàtǐbùwèi accession 发作, 侵袭fāzuò, qīnxíaccessory cell 辅助细胞, 佐细胞fǔzhùxìbāo, zuǒxìbāoaccessory chromosome 辅助染色体, 副染色体fǔzhùrǎnsètǐ, fùrǎnsètǐaccident 事故, 意外shìgù, yìwàiaccidental release 事故性排放shìgùxìngpáifàng acclimatization 驯化作用xúnhuàzuòyòng accumulation 积累, 累积jīlěi, lěijīacellular 无细胞的wúxìbāodeacentric 无着丝粒的wúzháosīlìdeacetic acid bacteria 醋酸菌, 乙酸菌cùsuānjùn, y ǐsuānjùnacetolactate synthase 乙酰乳酸合酶yǐxiānrǔsuānhéméiacetone 丙酮bǐngtóngacetyl carnitine 乙酰肉碱yǐxiānròujiǎnacetylcholine 乙酰胆碱yǐxiāndǎnjiǎnacetylcholinesterase acid 乙酰胆碱酯酶酸yǐxiāndǎnjiǎnzhǐméisuānacetyl-CoA 乙酰辅酶A yǐxiānfǔméiAacetyl-CoA carboxylase 乙酰辅酶A羧化酶yǐxiānfǔméi A suōhuàméiacetyl-coenzyme A 乙酰辅酶A yǐxiānfǔméi Aacetylspiramycin 乙酰螺旋霉素yǐxiānluóxuánméisùachromic point 消色点xiāosèdiǎnacid-fast stain 抗酸染色kàngsuānrǎnsèacid mucopolysaccharide 酸性黏多糖suānxìngniánduōtángacid phosphatase 酸性磷酸酶suānxìnglínsuānméiacid protease 酸性蛋白酶suānxìngdànbáiméiacid saccharification process 酸糖化法suāntánghuàfǎAcidic Fibroblast Growth Factor 酸性成纤维细胞生长因子suānxìngchéngxiānwéixìbāoshēngzhǎngyīnzǐacidophilic 嗜酸性的shìsuānxìngdeacidosis 
酸中毒suānzhòngdúacidotropic 向酸性xiàngsuānxìngaclacinomycin 阿克拉霉素ākèlāméisùaconitase 顺乌头酸酶, 乌头酸酶shùnwūtóusuānméi,wūtóusuānméiaconitine 乌头碱wūtóujiǎnaconta 矢牙形石属藻shǐyáxíngshíshǔzǎoacoustic gene transfer 声频基因转移法shēngpínjīyīnzhuǎnyífǎAcquired Immune Deficiency Syndrome (AIDS)获得性免疫缺陷综合征huòdéxìngmiǎnyìquēxiànzōnghézhēngacquired mutation 获得性突变huòdéxìngtūbiànacquired tolerance 获得性耐受huòdéxìngnàishòuacrasin 聚集素jùjísùacridine 吖啶ādìngacridine orange 吖啶橙ādìngchéngacriflavine 吖啶黄ādìnghuángacrolein 丙烯醛bǐngxīquánacrosin 顶体酶dǐngtǐméiacrylamide 丙烯酰胺bǐngxīxiānànacrylamide gel 丙烯酰胺凝胶bǐngxīxiānànníngjiāoacrylic acid 丙烯酸bǐngxīsuānactidione 放线菌酮fàngxiànjùntóngactin 肌动蛋白jīdòngdànbáiactinin 辅肌动蛋白fǔjīdòngdànbáiActinomycetes 放线菌fàngxiànjùnactinomycin 放线菌素fàngxiànjùnsùactinomyosin 肌动球蛋白jīdòngqiúdànbáiactinospectacin 放线壮观素, 奇放线菌素fàngxiànzhuàngguànsù, qífàngxiànjùnsùactivated biofilter 活性生物滤池huóxìngshēngwùlǜchíactivated carbon 活性炭huóxìngtànactivated charcoal 活性炭huóxìngtànactivated sludge process 活性污泥法huóxìngwūnífǎactivation energy 活化能huóhuànéngactivation hormone 激活激素jīhuójīsùactivator 活化物, 激活剂huóhuàwù, jīhuójìactivator protein (AP) 激活蛋白jīhuódànbáiEnglish-Chinese80active biomass 活性生物量huóxìngshēngwùliàngactive center 活性中心huóxìngzhōngxīnactive-enzyme centrifugation 活性-酶离心法huóxìngméilíxīnfǎactive immunity 主动免疫zhǔdòngmiǎnyìactive immunization 主动免疫反应zhǔdòngmiǎnyìfǎnyìngactive ingredient 活性成分, 有效成分huóxìngchéngfèn, yǒuxiàochéngfènactive site 活性部位huóxìngbùwèiactive transport 主动转运zhǔdòngzhuǎnyùnactivin 活化素, 激活素huóhuàsù, jīhuósùactivity coefficient 活性系数huóxìngxìshùactobindin 肌动蛋白结合蛋白jīdòngdànbáijiéhédànbáiactomyosin 肌动球蛋白jīdòngqiúdànbáiacute exposure 急性曝露jíxìngpùlùacute illness 急性病jíxìngbìngacute-phase protein 急相蛋白, 急性期蛋白jíxiāngdànbái, jíxìngqīdànbáiacute toxicity 急性毒性jíxìngdúxìngacute transfection 急性转染jíxìngzhuǎnrǎnacycloguanosine 羟基乙氧甲基鸟嘌呤核苷, 无环鸟苷qiǎngjīyǐyǎngjiǎjīniǎopiàolìnghégān, wúhuánniǎogānacyclovir 羟基乙氧甲基鸟嘌呤核苷, 无环鸟苷qiǎngjīyǐyǎngjiǎjīniǎopiàolìnghégān, wúhuánniǎogānacyl carrier protein 
酰基载体蛋白xiānjīzàitǐdànb áiacylcarnitine transferase 酰基肉碱转移酶xiānjīròujiǎnzhuǎnyíméiadamalysin 坚固蛋白酶jiāngùdànbáiméi adamantane 金刚烷jīngāngwánadaptation 适应shìyìngadaptation traits 适应性性状shìyìngxìngxìngzhuàngadaptin 适配素, 衔接蛋白shìpèisù, xiánjiēdànb áiadaptive enzyme 适应酶shìyìngméi adaptive immunity 获得性免疫, 适应性免疫huòdéxìngmiǎnyì, shìyìngxìngmiǎnyìadaptive radiation 适应辐射shìyìngfúshèadaptive response 适应反应shìyìngfǎnyìng adaptive zone 适应区带shìyìngqūdài adaptor 连接物, 适配器liánjiēwù, shìpèiqìadditive 添加剂, 添加物tiānjiājì, tiānjiāwùadditive effect 加性效应jiāxìngxiàoyìng additive genes 加性基因jiāxìngjīyīnadditive recombination 插入重组chārùchóngz ǔaddress 地址码dìzhǐmǎaddress sequence 地址序列, 地址域dìzhǐxùliè, dìzhǐyùadducin 内收蛋白nèishōudànbáiadenine 腺嘌呤xiànpiàolìngadenosine 腺苷xiàngānadenosine diphosphate 二磷酸腺苷èrlínsuānxi àngānadenosine monophosphate 一磷酸腺苷yīlínsuānxiànadenosine triphosphate 腺苷三磷酸酶xiàngānsānlínsuānméiadenovirus 腺病毒xiànbìngdúadenylate 腺苷酸, 腺嘌呤核苷酸xiàngānsuān, xiànpiàolìnghégānsuānadenylate cyclase 腺苷酸环化酶xiàngānsuānhuánhuàméiadenylic acid 腺苷酸, 腺苷一磷酸xiàngānsuān, xiàngānyīlínsuānadequate intake 足够摄入量zúgòushèrùliàng adhesion molecule 黏附分子niánfùfēnzǐadhesion protein 黏着蛋白, 吸附蛋白niánzháodànbái,xīfùdànbáiadipocyte 脂肪细胞zhīfángxìbāoadipose 动物脂肪dòngwùzhīfángadipsin 脂肪细胞蛋白酶zhīfángxìbāodànbáiméiadjuvant 佐剂zuǒjìadoptive cellular therapy 过继细胞疗法guòjìxìbāoliáofǎadoptive immunization 过继免疫guòjìmiǎnyìadrenal cortical hormone 肾上腺皮质激素shènxiànpízhìjīsùadrenergic receptor 肾上腺素能受体shènshàngxiànsùnéngshòutǐadrenocorticotropic hormone 促肾上腺皮质素cùshènshàngxiànpízhìsùadsorption 吸附xīfùadult stem cell 成人干细胞chéngréngānxìbāoadventitious 不定的, 偶生的bùdìngde, ǒushēngdeadverse effect 反效果, 副作用fǎnxiàoguǒ, fùzuòyòngadverse reaction 逆反应nìfǎnyìngaequorin 水母发光蛋白shuǐmǔfāguāngdànbáiaeration 通风, 通气tōngfēng, tōngqìaeration basin 曝气池pùqìchíaeration process 曝气法pùqìfǎaerobe 需氧生物xūyǎngshēngwùaerobic 好气的, 需氧的hǎoqìdeaerobic bacteria 需氧菌xūyǎngjùnaerobic microbe 好氧微生物hǎoyǎngwēishēngwùaerobic reactor 需氧反应器xūyǎngfǎnyìngqìaerobic respiration 
需氧呼吸xūyǎnghūxīaerobic treatment 好氧处理, 生物接触氧化法hǎoyǎngchǔlǐ, shēngwùjiēchùyǎnghuàfǎaerodynamic 流线型的liúxiànxíngdeaerolysin 气单孢菌溶素qìdānbāojùnróngsùAeromonas 气单胞菌属qìdānbāojùnshǔaerosol 气溶胶qìróngjiāoaerosol particle 气雾粒子qìwùlìzǐaerosol spray 气雾喷射qìwùpēnshèaerosolization 气雾化喷洒qìwùhuàpēnsǎaerosolize(d) 气雾化qìwùhuàaerosolized biological agent 气雾化生物制剂qìwùhuàshēngwùzhìjìaerosolizing 气雾化qìwùhuàafebrile 不发烧的, 无热度的bùfāshāode, wúrèdùdeEnglish-Chinese81affected party 受影响方shòuyǐngxiǎngfāngaffinity 亲和, 亲和性qīnhé, qīnhéxìngaffinity chromatography 亲和层析qīnhécéngxīaffinity tag 亲和标签qīnhébiāoqiānaflatoxin 黄曲霉毒素huángqǔméidúsùagar 冻粉, 琼胶, 琼脂, 洋菜dòngfěn, qióngjiāo,qióngzhī, yángcàiagar-diffusion method 琼脂扩散法qióngzhīkuòsànfǎagarase 琼脂糖酶qióngzhītángméiagarose 琼脂糖qióngzhītángagarose gel electrophoresis 琼脂糖凝胶电泳qióngzhītángníngjiāodiànyǒngage group 年龄群, 年龄组niánlíngqún, niánlíngzuagent 剂, 介质jì, jièzhìagglomeration 附聚作用fùjùzuòyòngagglutination 凝集反应níngjífǎnyìngagglutinin 凝集素níngjísùagglutinogen 凝集原níngjíyuánaggregation 聚集, 群聚jùjí, qúnjùaggressin 攻击素gōngjīsùaging 陈酿, 老化chénniàng, lǎohuàaglycon(e) 糖苷配基tánggānpèijīagnotobiotic culture 未知生物培养wèizhīshēngwùpéiyǎngagonists 激动剂, 兴奋剂jīdòngjì, xīngfènjìagricultural 农业的nóngyèdeagricultural biodiversity 农业生物多样性nóngyèshēngwùduōyàngxìngagricultural biological diversity 农业生物多样性nóngyèshēngwùduōyàngxìngagrobacterium 土壤杆菌tǔrǎnggānjùnagrobacterium tumefaciens 根癌农杆菌, 根癌土壤杆菌gēnáinónggānjùn, gēnáitǔrǎnggānjùnagrobiodiversity 农业生物多样性nóngyèshēngwùduōyàngxìngagrobiotechnology 农业生物技术nóngyèshēngwùjìshùagroecology 农业生态学nóngyèshēngtàixuéagroforestry 农林学nónglínxuéagropine 农杆氨酸nónggānānsuānAIDS (Acquired Immune Deficiency Syndrome)艾滋病àizībìngair 空气kōngqìairborne pathogen 空气病原体, 空气传播的病原体kōngqìbìngyuántǐ, kōngqìchuánbōdebìngyuántǐairborne precaution 预防空气传播yùfángkōngqìchuánbōairbrush 喷枪, 气笔pēnqiāng, qìbǐairplane 飞机fēijīairway 导气管, 气道dǎoqìguǎn, qìdào alamethicin 丙甲菌素bǐngjiǎjùnsùalanine 丙氨酸bǐngānsuānalanine aminotransferase 丙氨酸转氨酶bǐngānsuānzhuǎnānméialarmone 报警素bàojǐngsùalbumin 白蛋白, 
清蛋白báidànbái, qīngdànbái alcalase 枯草杆菌蛋白酶kūcǎogānjùndànbáiméialcohol dehydrogenase 乙醇脱氢酶yǐchúntuōqīngméialcohol oxidase 乙醇氧化酶yǐchúnyǎnghuàméi aldehyde 醛quánaldolase 醛缩酶quánsuōméialdose 醛糖quántángaldosterone 醛固酮, 醛甾酮quángùtóng, quánz āitóngaleurone 糊粉húfěnalgae 海藻, 藻类hǎizǎo, zǎolèialgal toxin 藻毒素zǎodúsùalgicide 杀藻剂shāzǎojìalgin 海藻酸钠hǎizǎosuānnàalginate 海藻酸盐hǎizǎosuānyánalginic acid 海藻酸, 褐藻酸hǎizǎosuān, hézǎosuānalgorithm 公式, 算法gōngshì, suànfǎalien species 外来物种, 异种wàiláiwùzhǒng, yìzhǒngaliesterase 脂族酯酶zhīzúzhǐméialkali 碱, 强碱jiǎn, qiángjiǎnalkaline 碱性的jiǎnxìngdealkaline hydrolysis 碱水解jiǎnshuǐjiěalkaline phosphatase 碱性磷酸酶jiǎnxìnglínsuānméialkaloid 生物碱shēngwùjiǎnalkannin 阿香草红, 紫草红, 紫草素āxiāngcǎohóng,zǐcǎohóng, zǐcǎosùallantoin 尿囊素niàonángsùallele 等位基因děngwèijīyīnallelic exclusion 等位基因排斥děngwèijīyīnpáichìallelopathy 异株克生, 植物间毒素抑制yìzhūkèsh ēng, zhíwùjiāndúsùyìzhìallelozyme 等位基因酶děngwèijīyīnméiallergy 变态反应, 过敏反应biàntàifǎnyìng, guòmǐnfǎnyìngallicin 大蒜素dàsuànsùalloantigen 同种抗原tóngzhǒngkàngyuánallogeneic 同种异体的, 异基因的tóngzhǒngyìtǐde,yìjīyīndeallograft 同种移植, 同种异体移植tóngzhǒngyízhí,tóngzhǒngyìtǐyízhíalloisoleucine 别异亮氨酸biéyìliàngānsuānallolactose 异乳糖yìrǔtángallometry 比速增长, 异速生长bǐsùzēngzhǎng,yìsùshēngzhǎngallomone 异源外激素, 种间信息素yìyuánwàijīsù,zhǒngjiānxìnxīsùallopatric 分布区不重叠的fēnbùqūbùchóngdiédeallopatric speciation 分布区不重叠的物种形成fēnbùqūbùchóngdiédewùzhǒngxíngchéngallopurinol 别嘌呤醇biépiàolìngchúnallosteric enzyme 变构酶biàngòuméiallosteric site 变构部位, 别构部位biàngòubùwèi,biégòubùwèiEnglish-Chinese82allosterism 变构biàngòuallostery 变构性biàngòuxìngallotype 同种异型tóngzhǒngyìxíngallotypic monoclonal antibodies 同种异型抗体tóngzhǒngyìxíngkàngtǐalloxan 阿脲, 四氧嘧啶āniào, sìyǎngmìdìngallozyme 等位基因酶, 异型酶děngwèijīyīnméi,yìxíngméialpha amylase 糊精生成酶hújīngshēngchéngméiAlphavirus 甲病毒属jiǎbìngdúshǔALS gene 抗淋巴细胞血清基因kànglìnbāxìbāoxuèqīngjīyīnalternative medicine 另类医学lìnglèiyīxuéalternative mRNA splicing 可变mRNA 剪接kěbiànmRNA jiǎnjiēalternative splicing 可变剪接, 旁路剪接kěbiànjiǎnjiē,pánglùjiǎnjiēaluminum 铝lǚaluminum hydroxide 
氢氧化铝qīngyǎnghuàlǚaluminum resistance 抗铝性kànglǚxìngaluminum tolerance 耐铝性nàilǚxìngaluminum toxicity 铝毒性lǚdúxìngalveolar macrophage 肺泡巨噬细胞fèipàojùshìxìbāoAlzheimer's disease 阿尔茨海默病, 老年性痴呆āěrcíhǎimòbìng, lǎoniánxìngchīdāiamanitin 鹅膏蕈碱égāoxùnjiǎnamantadine 氨基三环癸烷, 金刚烷胺ānjīsānhuánguǐwán, jīngāngwánànamatoxin 鹅膏毒素, 鹅膏毒蕈肽égāodúsù,égāodúxùntàiamber codon 琥珀密码子hǔpòmìmǎzǐamber mutation 琥珀型突变hǔpòxíngtūbiànamber suppressor 琥珀抑制突变株hǔpòyìzhìtūbiànzhūambient measurement 环境测量huánjìngcèliàngAmerican Type Culture Collection 美国模式培养物保藏所měiguómóshìpéiyǎngwùbǎocángsuǒAmes test 爱姆试验àimǔshìyànamidase 酰胺酶xiānànméiamino acid 氨基酸ānjīsuānamino acid profile 氨基酸分布图ānjīsuānfēnbùtúamino acid sequence 氨基酸序列ānjīsuānxùlièamino purine 氨基嘌呤ānjīpiàolìngaminoacylase 酰化氨基酸水解酶xiānhuàānjīsuānshuǐjiěméiaminocyclopropane carboxylic acid 氨基环丙烷羧酸ānjīhuánbǐngwánsuōsuānaminocyclopropane carboxylic acid synthase 氨基环丙烷羧酸合酶ānjīhuánbǐngwánsuōsuānhéméiaminoethoxyvinylglycine 氨基乙氧基乙烯基甘氨酸ānjīyǐyǎngjīyǐxījīgānānsuānaminoglycoside antibiotic 氨基糖苷抗生素ānjītánggānkàngshēngsùaminopeptidase 氨基肽酶, 氨肽酶ānjītàiméi, āntàiméiaminopterin 癌得宁, 氨蝶呤, 氨基蝶呤áidéníng,āndiélíng, ānjīdiélìngaminopyridines 氨基吡啶ānjībǐdìngaminotransferases 氨基转移酶, 转氨酶ānjīzhuǎnyíméi,zhuǎnānméiamitosis 无丝分裂wúsīfēnlièamorph 无效等位基因wúxiàoděngwèijīyīn amphetamine 安非他明, 苯齐巨林, 苯异丙胺ānfēitāmíng, běnqíjùlín, běnyìbǐngàn amphibolic pathway 两用代谢途径liǎngyòngdàixiètújìngamphiphilic molecules 两亲分子, 两亲性分子liǎngqīnfēnzǐ, liǎngqīnxìngfēnzǐamphitropic virus 兼宿病毒jiānsùbìngdúamphoteric compound 两性化合物liǎngxìnghu àhéwùampicillin 氨苄青霉素ānbiànqīngméisùamplicon 扩增子kuòzēngzǐamplification 放大, 扩增fàngdà, kuòzēng amplified fragment length polymorphism 扩增片段长度多态性kuòzēngpiànduànchángdùduōtàixìng amplify 扩增kuòzēngamplimer 扩增产物kuòzēngchǎnwùamygdalin 苦杏仁苷kǔxìngréngānamylase 淀粉酶diànfěnméiAmylo process 阿明路法, 根霉法āmínglùfǎ, gēnméifǎamyloglucosidase 淀粉葡糖苷酶diànfěnpútánggānméiamyloid precursor protein 淀粉前体蛋白diànfěnqiántǐdànbáiamylopectin 支链淀粉zhīliàndiànfěn amylose 直链淀粉zhíliàndiànfěnanabolic pathway 
合成代谢途径héchéngdàixièt újìnganabolis 合成代谢héchéngdàixièanabolism 同化作用tónghuàzuòyòng anabolite 合成代谢产物héchéngdàixièchǎnwùanaerobe 厌氧微生物yànyǎngwēishēngwùanaerobic 厌氧的yànyǎngdeanaerobic bacteria 厌氧细菌yànyǎngxījūn anaerobic digestion 厌氧消化yànyǎngxiāohuàanaerobic treatment 厌氧处理yànyǎngchǔlǐanal 肛门的gāngméndeanalgesic 止痛药zhǐtòngyàoanalog 类似物lèisìwùanalogous 可比拟的, 类似的, 相似的kěbǐnǐde, lèisìde, xiāngsìdeanalogue 结构类似物jiégòulèisìwùanalysis of variance 方差分析fāngchāfēnxīanalyte 被分析物, 分析物bèifēnxīwù, fēnxīwùanandamide 花生四烯酰乙醇胺huāshēngsìxīxiānyǐchúnànanaphylatoxin 过敏毒素guòmǐndúsùanaphylaxis 过敏反应guòmǐnfǎnyìnganaplasia 退行发育tuìxíngfāyùanaplerotic metabolic pathway 回补代谢途径English-Chinese83huíbǔdàixiètújìnganchorage-dependent cell 贴壁细胞tiēbìxìbāoancrod 蛇毒去纤维酶shédúqùxiānwéiméiandrogen 雄激素xióngjīsùandrosterone 雄甾酮xióngzāitónganemia 贫血症pínxuèzhènganesthesia 麻醉mázuìaneuploidy 非整倍性fēizhěngbèixìnganeurysm 动脉瘤dòngmàiliúangiogenesis 血管生成xuèguǎnshēngchéngangiogenesis factor 血管生成因子xuèguǎnshēngchéngyīnzǐangiogenin 血管生成素xuèguǎnshēngchéngsùangiostatin 血管抑制素xuèguǎnyìzhìsùangiotensin 血管紧张肽xuèguǎnjǐnzhāngtàiangiotensin-converting enzyme 血管紧张肽转化酶xuèguǎnjǐnzhāngtàizhuǎnhuàméiangiotensin-converting enzyme inhibitor 血管紧张肽转化酶抑制剂xuèguǎnjǐnzhāngtàizhuǎnhuàméiyìzhìjìangiotensinase 血管紧张肽酶xuèguǎnjǐnzhāngtàiméiangiotensinogen 血管紧张肽原xuèguǎnjǐnzhāngtàiyuánangiotensinogenase 肾素, 血管紧张肽原酶shènsù,xuèguǎnjǐnzhāngtàiyuánméiangstrom 埃āianimal cell culture 动物细胞培养dòngwùxìbāopéiyǎnganimal cell line 动物细胞系dòngwùxìbāoxìanimal gene bank 动物基因库dòngwùjīyīnkùanimal genetic resources databank 动物遗传资源数据库dòngwùyíchuánzīyuánshùjùkùanimal genome bank 动物基因组库dòngwùjīyīnzǔkùanimal model 动物模型dòngwùmóxínganion 阴离子yīnlízǐanion exchanger 阴离子交换树脂yīnlízǐjiāohuànshùzhīanneal(ing) 退火tuìhuǒannexin 膜联蛋白móliándànbáiannotation 注解zhùjiěanomer 端基异构体, 异头物duānjīyìgòutǐ, yìtóuwùanoxic culture 缺氧培养quēyǎngpéiyǎngantagonism 拮抗jiékàngantagonist 拮抗剂jiékàngjìanterior pituitary gland 垂体前叶腺chuítǐqiányèxiànanthocyanidin 花青素, 花色素huāqīngsù, huāsèsùanthocyanin 
花色素苷huāsèsùgānantiangiogenesis 抗血管生成kàngxuèguǎnshēngchéngantibacterial 抗菌的kàngjùndeantibiosis 拮抗菌现象, 抗生, 抗生作用jiékàngjùnxiànxiàng, kàngshēng,kàngshēngzuòyòngantibiotic 抗生素kàngshēngsùantibiotic resistance 抗生素抗性kàngshēngsùkàngxìngantibiotic therapy 抗生素治疗kàngshēngsùzhìliáoantibiotics 抗生素kàngshēngsùantibody 抗体kàngtǐantibody affinity chromatography 抗体亲和层析kàngtǐqīnhécéngxīantibody array 抗体排列kàngtǐpáilièantibody-mediated immune response 抗体介导免疫反应,抗体介导免疫应答kàngtǐjièdǎomiǎnyìfǎnyìng, kàngtǐjièdǎomiǎnyìyìngdáanticholinesterase 抗胆碱脂酶kàngdǎnjiǎnzhīméianticoagulant 抗凝剂kàngníngjìanticoding strand 反编码链fǎnbiānmǎliànanticodon 反密码子fǎnmìmǎzǐanticomplementary 抗补体的kàngbǔtǐdeanticonvulsant 抗惊厥的药, 抗痉挛的药, 抗惊厥的,抗痉挛的kàngjīngjuédeyào, kàngjìngluándeyào,kàngjīngjuéde, kàngjìngluándeanticooperativity 反协同性fǎnxiétóngxìng anticrop agent 破坏作物剂pòhuàizuòwùjìantidiuretic hormone 抗利尿激素kànglìsuījīsùantidote 解毒剂jiědújìantifolate 抗叶酸剂kàngyèsuānjìantifreeze protein 抗冻蛋白kàngdòngdànbái antifungal 防霉的, 杀真菌的fángméide, shāzhēnjūndeantigen 抗原kàngyuánantigen-antibody complex 抗原抗体复合物kàngyuánkàngtǐfùhéwùantigen-presenting cell 抗原呈递细胞kàngyuánchéngdìxìbāoantigenic determinant 抗原决定簇kàngyuánjuédìngcùantigenic switching 抗原转换kàngyuánzhuǎnhuànantihemophilic factor 抗血友病因子kàngxuèyǒubìngyīnzǐantihemophilic globulin 抗血友病球蛋白kàngxuèyǒubìngqiúdànbáiantihistamine 抗组织胺药物kàngzǔzhīànyàowùanti-idiotype 抗独特型kàngdútèxínganti-idiotype antibody 抗独特型抗体kàngdútèxíngkàngtǐanti-infective 消毒剂xiāodújìanti-infective agent 消毒剂xiāodújìanti-interferon 反干扰素, 抗干扰素fǎngānrǎos ù,kànggānrǎosùanti-messenger DNA 对应信使DNA duìyìngxìnshǐDNAantimetabolite 代谢拮抗物dàixièjiékàngwùantimetabolite 抗代谢物kàngdàixièwùantimicrobial agent 抗微生物剂kàngwēishēngwùjìantimorph 反效等位基因fǎnxiàoděngwèijīyīn antimutator gene 抗突变基因kàngtūbiànjīyīn anti-oncogene 抗癌基因kàngáijīyīn antioxidant 抗氧化剂kàngyǎnghuàjìantiparallel 反平行fǎnpíngxíngantiplatelet 抗血小板kàngxuèxiǎobǎn antiporter 对向运输剂, 反向转运剂duìxiàngyùnshūjì,fǎnxiàngzhuǎnyùnjìEnglish-Chinese84antiproliferative 抗增殖的kàngzēngzhíde antipyretic 退热的, 
退热的药tuìrède, tuìrèdeyàoantisense 反义的fǎnyìdeantisense RNA 反义RNA fǎnyìRNAanti-sense technology 反义技术fǎnyìjìshùantiseptic 防腐剂fángfǔjìantisera 抗血清kàngxuèqīngantitoxin 抗毒素kàngdúsùantiviral 抗病毒的kàngbìngdúdeantivirial agent 抗病毒剂kàngbìngdújìantrin 胃窦素wèidòusùAP (activator protein) 激活蛋白jīhuódànbáiAP endonuclease 脱嘧啶核酸内切酶, 脱嘌呤tuōmìdìnghésuānnèiqièméi, tuōpiàolìngapamin 蜂毒明肽, 蜜蜂神经毒素fēngdúmíngtài,mìfēngshénjīngdúsùaphicidin 杀蚜素shāyásùaphidicolin 芽栖毒素yáqīdúsùapigenin 芹苷元qíngānyuánaplastic anemia 再生障碍性贫血zàishēngzhàngàixìngpínxuèapoenzyme 酶蛋白, 脱辅基酶méidànbái, tuōfǔjīméiapolipoprotein 载脂蛋白zàizhīdànbáiapomixes 无融合生殖wúrónghéshēngzhíapomorphine 阿朴吗啡āpòmǎfēiapoprotein 脱辅基蛋白tuōfǔjīdànbáiapoptosis 细胞凋亡xìbāodiāowángaporepressor 阻遏物蛋白zǔèwùdànbáiapparent viscosity 表观黏度biǎoguānniándùapple domains 苹果样功能域píngguǒyànggōngnéngyùapplicability 适用性, 应用性shìyòngxìng,yìngyòngxìngapplication 应用yìngyòngapplicator 点样器diǎnyàngqìaptitude 诱发适应力yòufāshìyìnglìapurinic acid 脱嘌呤核酸tuōpiàolìnghésuānapyrimidinic acid 脱嘧啶核酸tuōmìdìnghésuānaquaculture 水培shuǐpéiaquaporin 亲水孔蛋白qīnshuǐkǒngdànbáiaqueous 含水的, 水成的, 水的hánshuǐde, shuǐchéngde, shuǐdearachidonic acid 二十碳四烯酸, 花生四烯酸èrshítànsìxīsuān, huāshēngsìxīsuānarachin 花生球蛋白huāshēngqiúdànbáiarchaea 古细菌gǔxījūnarchitectural gene 结构基因jiégòujīyīnarea of release 释放区shìfàngqūarenavirus 沙粒病毒shālìbìngdúarginase 精氨酸酶jīngānsuānméiarginine 精氨酸jīngānsuānarmyworm 粘虫niánchóngaromatic 芳香族的fāngxiāngzúdearrestin 视紫红质抑制蛋白, 抑制蛋白shìzǐhóngzhìyìzhìdànbái, yìzhìdànbáiarrhythmia 心律不齐, 心律失常xīnlǜbùqí,xīnlǜshīchángars element 自主复制序列因子zìzhǔfùzhìxùlièyīnzǐarteriosclerosis 动脉硬化dòngmàiyìnghuàarthralgias 关节痛guānjiétòngarthritis 关节炎guānjiéyánarthropathy 关节病guānjiébìngartificial gene 人工基因réngōngjīyīnartificial insemination 人工授精réngōngshòujīngartificial selection 人工选择réngōngxuǎnzéascites 腹水fùshuǐascorbic acid 抗坏血酸kànghuàixiěsuānasepsis 无菌, 无菌法wújùn, wújùnfǎaseptic 无菌的wújùndeasexual 无性孢子wúxìngbāozǐasexual reproduction 无性生殖wúxìngshēngzhíAsian corn borer 亚洲玉米螟yàzhōuyùmǐmíngAsparagines 天冬酰胺tiāndōngxiānànaspartate 
transaminase 天冬氨酸转氨酶tiāndōngānsuānzhuǎnānméiaspartic acid 天冬氨酸tiāndōngānsuānaspartokinase 天冬氨酸激酶tiāndōngānsuānjīméiaspergillus flavus 黄曲霉huángqǔméiaspergillus fumigatus 烟曲霉yānqǔméiAspergillus niger 黑曲霉hēiqǔméiaspirate 吸出物xīchūwùassay 测定, 检定cèdìng, jiǎndìngassessment 估价, 评估, 同化gūjià, pínggū, tónghuàassimilation 吸收xīshōuastaxanthin 变胞藻黄素, 虾青素biànbāozǎohuángsù,xiāqīngsùasymptomatic seroconversion 无症状血清转变wúzhèngzhuàngxuèqīngzhuǎnbiànatherosclerosis 粥样硬化zhōuyàngyìnghuàatmospheric 常压的, 大气的chángyāde, dàqìdeatomic force microscopy 原子力显微术yuánzǐlìxiǎnwēishùatomic weight 原子量yuánzǐliàngatomizer 超微粉碎机, 喷雾器, 雾化器chāowēifěnsuìjī, pēnwùqì, wùhuàqì。
Gene Cloning Process
Design Process
I. Gene information retrieved for mPPAR alpha
1. Data from NCBI
Mus musculus strain C57BL/6J chromosome 5 genomic contig, GRCm38 C57BL/6J MMCHR5_CTG5
LOCUS       NT_039305 249181 bp DNA linear CON 23-FEB-2012
DEFINITION  Mus musculus strain C57BL/6J chromosome 5 genomic contig, GRCm38 C57BL/6J MMCHR5_CTG5.
ACCESSION   NT_039305 REGION: 21039824..21289004 GPS_000874384
VERSION     NT_039305.8 GI:372099023
DBLINK      Project: 169
KEYWORDS    .
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
            Sciurognathi; Muroidea; Muridae; Murinae; Mus; Mus.
REFERENCE   1 (bases 1 to 249181)
  AUTHORS   Church,D.M., Schneider,V.A., Graves,T., Auger,K., Cunningham,F.,
            Bouk,N., Chen,H.C., Agarwala,R., McLaren,W.M., Ritchie,G.R.,
            Albracht,D., Kremitzki,M., Rock,S., Kotkiewicz,H., Kremitzki,C.,
            Wollam,A., Trani,L., Fulton,L., Fulton,R., Matthews,L.,
            Whitehead,S., Chow,W., Torrance,J., Dunn,M., Harden,G.,
            Threadgold,G., Wood,J., Collins,J., Heath,P., Griffiths,G.,
            Pelan,S., Grafham,D., Eichler,E.E., Weinstock,G., Mardis,E.R.,
            Wilson,R.K., Howe,K., Flicek,P. and Hubbard,T.
  TITLE     Modernizing reference genome assemblies
  JOURNAL   PLoS Biol. 9 (7), E1001091 (2011)
  PUBMED    21750661
REFERENCE   2 (bases 1 to 249181)
  AUTHORS   Church,D.M., Goodstadt,L., Hillier,L.W., Zody,M.C., Goldstein,S.,
            She,X., Bult,C.J., Agarwala,R., Cherry,J.L., DiCuccio,M.,
            Hlavina,W., Kapustin,Y., Meric,P., Maglott,D., Birtle,Z.,
            Marques,A.C., Graves,T., Zhou,S., Teague,B., Potamousis,K.,
            Churas,C., Place,M., Herschleb,J., Runnheim,R., Forrest,D.,
            Amos-Landgraf,J., Schwartz,D.C., Cheng,Z., Lindblad-Toh,K.,
            Eichler,E.E. and Ponting,C.P.
  CONSRTM   Mouse Genome Sequencing Consortium
  TITLE     Lineage-specific biology revealed by a finished genome assembly of
            the mouse
  JOURNAL   PLoS Biol. 7 (5), E1000112 (2009)
  PUBMED    19468303
REFERENCE   3 (sites)
  AUTHORS   Lowe,T.M. and Eddy,S.R.
  TITLE     tRNAscan-SE: a program for improved detection of transfer RNA genes
            in genomic sequence
  JOURNAL   Nucleic Acids Res.
25 (5), 955-964 (1997)PUBMED 9023104REMARK This is the methods paper for tRNAscan-SE.COMMENT REFSEQ INFORMA TION: Features on this sequence have been produced for build 38 version 1 of the NCBI's genome annotation [seedocumentation]. The reference sequence is identical to GL456117.2.On or before Feb 23, 2012 this sequence version replacedgi:149253841, gi:149253858, gi:149253955.Assembly Name: GRCm38 C57BL/6JThe DNA sequence is composed of genomic sequence, primarilyfinished clones that were sequenced as part of the Mouse GenomeProject. PCR products and WGS shotgun sequence have been addedwhere necessary to fill gaps or correct errors. All such additionsare manually curated by GRC staff. For more information see:.FEATURES Location/Qualifierssource 1..249181/organism="Mus musculus"/mol_type="genomic DNA"/strain="C57BL/6J"/db_xref="taxon:10090"/chromosome="5"gene complement(74755..174427)/gene="Ppargc1a"/gene_synonym="A830037N07Rik; ENSMUSG00000079510; Gm11133;PGC-1; Pgc-1alpha; PGC-1v; Pgc1; Pgco1; Ppargc1"/note="Derived by automated computational analysis usinggene prediction method: BestRefseq."/db_xref="GeneID:19017"/db_xref="MGI:1342774"mRNA complement(join(74755..78789,83212..83363,83712..83833,93104..93224,93385..93489,94002..94917,110523..110596,110706..110751,115107..115311,116176..116301,118540..118734,169018..169197,174241..174427))/gene="Ppargc1a"/gene_synonym="A830037N07Rik; ENSMUSG00000079510; Gm11133;PGC-1; Pgc-1alpha; PGC-1v; Pgc1; Pgco1; Ppargc1"/product="peroxisome proliferative activated receptor,gamma, coactivator 1 alpha, transcript variant 1"/note="Derived by automated computational analysis usinggene prediction method: BestRefseq."/transcript_id="NM_008904.2"/db_xref="GI:238018130"/db_xref="GeneID:19017"/db_xref="MGI:1342774"misc_RNA 
complement(join(74755..78789,83212..83363,83712..83833,93104..93224,93385..93489,94002..94917,110523..110627,110706..110751,115107..115311,116176..116301,118540..118734,169018..169197,174241..174427))/gene="Ppargc1a"/gene_synonym="A830037N07Rik; ENSMUSG00000079510; Gm11133;PGC-1; Pgc-1alpha; PGC-1v; Pgc1; Pgco1; Ppargc1"/product="peroxisome proliferative activated receptor,gamma, coactivator 1 alpha, transcript variant 2"/note="Derived by automated computational analysis usinggene prediction method: BestRefseq."/transcript_id="NR_027710.1"/db_xref="GI:238018131"/db_xref="GeneID:19017"/db_xref="MGI:1342774"CDS complement(join(78686..78789,83212..83363,83712..83833,93104..93224,93385..93489,94002..94917,110523..110596,110706..110751,115107..115311,116176..116301,118540..118734,169018..169197,174241..174288))/gene="Ppargc1a"/gene_synonym="A830037N07Rik; ENSMUSG00000079510; Gm11133;PGC-1; Pgc-1alpha; PGC-1v; Pgc1; Pgco1; Ppargc1"/note="Derived by automated computational analysis usinggene prediction method: BestRefseq."/codon_start=1/product="peroxisome proliferator-activated receptor gammacoactivator 1-alpha"/protein_id="NP_032930.1"/db_xref="GI:6679433"/db_xref="CCDS:CCDS19282.1"/db_xref="GeneID:19017"/db_xref="MGI:1342774"ORIGIN1 acacccattc atgataaaag ttttggaaag atcaggaatt caaggcccat acctaaacat61 gataaaagca atctacagca aaccagtagc caacatcaaa gtaaatggag agaagctgga············································································ 249061 ctgacgtggc agagtctgca tgaaagaacc acaagcaagc actttctgaa gcccaggctc249121 aaaaagcaag gacactaact cacactaaac ttgatgatct tccgtgctta ttccttgaaa249181 g//CDS 前后的78661 tgtcattcct cagcctggga acacgttacc tgcgcaagct tctctgagct tccttcagta78721 aactatcaaa atccagagag tcatacttgc tcttggtgga agcagggtca aaatcgtctg78781 agttggtatc tgaaagaaac acaccatgtc aggttagatg caccctttac tgatggcctt78841 taacacaact caagataccg ggttcaatta tagagcagct aatcaacgta cttaaaaaca·············································································· 174121 
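aaagctcccg ctgagccact ttctatgtcc cctcctctcc cctagcccat cccccccccc
174181 ctccaggaat cattgcatct gagagaagct ccggtcctgc aatactcagc cccagctcac
174241 ctctatgtca ctccatacag agtcttggct gcacatgtcc caagccatcc agctcccgaa
2. Data processing
Based on the record above, the bases of the CDS were copied into the SECentral software to produce the map shown in the figure below. The restriction sites of pGEX-4T-3 were then searched for in the target gene, and every site found was marked on the map.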
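The restriction-site search described in step 2 can be sketched in Python. This is an illustration, not the SECentral workflow: the enzyme list assumes that the pGEX-4T-3 multiple cloning site offers BamHI, EcoRI, SmaI, SalI, XhoI and NotI (check this against the vector map), and the insert sequence is a made-up toy. All six recognition sites are palindromic, so scanning one strand is enough.

```python
# Recognition sequences of the enzymes assumed to be in the pGEX-4T-3 MCS.
MCS_ENZYMES = {
    "BamHI": "GGATCC",
    "EcoRI": "GAATTC",
    "SmaI": "CCCGGG",
    "SalI": "GTCGAC",
    "XhoI": "CTCGAG",
    "NotI": "GCGGCCGC",
}


def find_sites(seq, enzymes=MCS_ENZYMES):
    """Return {enzyme: [1-based start positions]} for each recognition site."""
    seq = seq.upper()
    hits = {}
    for name, site in enzymes.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start + 1)          # report 1-based coordinates
            start = seq.find(site, start + 1)    # allow overlapping matches
        hits[name] = positions
    return hits


def non_cutters(seq, enzymes=MCS_ENZYMES):
    """Enzymes with no site in the insert, i.e. candidates for cloning."""
    return [name for name, pos in find_sites(seq, enzymes).items() if not pos]


insert = "atggaattcaaaggatccttt"                 # hypothetical insert
hits = find_sites(insert)
print(hits["EcoRI"], hits["BamHI"])              # [4] [13]
print(non_cutters(insert))                       # ['SmaI', 'SalI', 'XhoI', 'NotI']
```

Enzymes that cut inside the insert are normally avoided when choosing the cloning sites, which is why the sites found are marked on the map.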
MAKER User Guide
Exercise 1. Using MAKER for Genome Annotation

If you are following this guide for your own research project, please make the following modifications:
1. In this exercise, SNAP is used for gene prediction. When you work on your own genome, we recommend that you use Augustus. The instructions for using Augustus are in the appendix.
2. In the exercise, you will use 2 CPU cores. When you work on your own genome, you should use all CPU cores on your machine. When you run the command "/usr/local/mpich/bin/mpiexec -n 2", replace 2 with the number of cores available on your machine.
3. The RepeatModeler and RepeatMasker steps are optional in the exercise, but required when you work on your own genome.

The example here is from a workshop by the Mark Yandell Lab (/ ).

Further reading:
1. Yandell Lab workshop. /MAKER/wiki/index.php/MAKER_Tutorial_for_WGS_Assembly_and_Annotation_Winter_School_2018 .
2. MAKER protocol from the Yandell Lab. It is a good reference. https:///pmc/articles/PMC4286374/
3. Tutorial for training Augustus. https:///simonlab/bioinformatics/programs/augustus/docs/tutorial2015/training.html
4. MAKER control files explained. /MAKER/wiki/index.php/The_MAKER_control_files_explained

Part 1. Prepare working directory.
1. Copy the data file from /shared_data/annotation2019/ into /workdir/$USER and decompress it. You will also copy the MAKER software directory to /workdir/$USER. The MAKER software directory includes a large repeat sequence database, so it is best kept under /workdir, which is on the local hard drive.
    mkdir /workdir/$USER
    mkdir /workdir/$USER/tmp
    cd /workdir/$USER
    cp /shared_data/annotation2019/maker_tutorial.tgz ./
    cp -rH /programs/maker/ ./
    cp -rH /programs/RepeatMasker ./
    tar -zxf maker_tutorial.tgz
    cd maker_tutorial
    ls -1

Part 2. Maker round 1 - Map known genes to the genome
Run everything in "screen".
Round 1 includes two steps:
    Repeat masking;
    Align known transcriptome/protein sequences to the genome.
1. [Optional] Build a custom repeat database.
This step is optional for this exercise: the genome is very small, so it is fine to skip repeat masking. When you work on a real project, you can either download a database from RepBase (https:///repbase/, license required) or build a custom repeat database from your genome sequence. RepeatModeler is software for building custom repeat databases. The commands for building a repeat database are provided here.
    cd example_02_abinitio
    export PATH=/programs/RepeatModeler-2.0:$PATH
    BuildDatabase -name pyu pyu_contig.fasta
    RepeatModeler -pa 4 -database pyu -LTRStruct >& repeatmodeler.log
At the end of the run, you will find a file "pyu-families.fa". This is the file you can supply to "rmlib=" in the control file.
2. Set the environment to run MAKER and create the MAKER control files.
Every step in MAKER is specified by the MAKER control files. The command "maker -CTL" creates three control files: maker_bopts.ctl, maker_exe.ctl, and maker_opts.ctl.
    export PATH=/workdir/$USER/maker/bin:/workdir/$USER/RepeatMasker:/programs/snap:$PATH
    export ZOE=/programs/snap/Zoe
    export LD_LIBRARY_PATH=/programs/boost_1_62_0/lib
    cd /workdir/$USER/maker_tutorial/example_02_abinitio
    maker -CTL
3. Modify the control file maker_opts.ctl.
Open the maker_opts.ctl file in a text editor (e.g. Notepad++ on Windows, BBEdit on Mac, or vi on Linux). Modify the following values. Put the modified file in the same directory, "example_02_abinitio".
    genome=pyu_contig.fasta
    est=pyu_est.fasta
    protein=sp_protein.fasta
    model_org=simple
    rmlib= #fasta file of your repeat sequence from RepeatModeler. Leave blank to skip.
    softmask=1
    est2genome=1
    protein2genome=1
    TMP=/workdir/$USER/tmp #important for big genome, as the default /tmp is too small
The modified maker_opts.ctl file instructs MAKER to do two things.
a) Run RepeatMasker.
The line "model_org=simple" tells RepeatMasker to mask low-complexity sequence (e.g. "AAAAAAAAAAAAA").
The line "rmlib=" sets "rmlib" to null, which tells RepeatMasker not to mask repeat sequences such as transposable elements. If you have a repeat fasta file (e.g. the output from RepeatModeler) that you need to mask, put the fasta file name after "rmlib=".
The line "softmask=1" tells RepeatMasker to do soft-masking, which converts repeats to lower case, instead of hard-masking, which converts repeats to "N". Soft-masking is important so that short repeat sequences within genes can still be annotated as part of a gene.
If you run RepeatMasker separately, as described in https:///darencard/bb1001ac1532dd4225b030cf0cd61ce2 , you should leave rmlib set to null, but set rm_gff to a repeat gff file.
b) Align the transcript sequences from the pyu_est.fasta file and the protein sequences from the sp_protein.fasta file to the genome and infer evidence-supported gene models.
The lines "est2genome=1" and "protein2genome=1" tell MAKER to build gene models directly from these transcript and protein alignments. The two files are used to define evidence-supported gene models.
The lines "est=pyu_est.fasta" and "protein=sp_protein.fasta" specify the fasta file names of the EST and protein sequences. In general, the EST sequence file contains the transcriptome assembled from RNA-seq data. The protein sequence file includes proteins from closely related species or from Swiss-Prot. If you have multiple protein or EST files, separate the file names with ",".
4. [Do it at home] Execute repeat masking and alignments. This step takes an hour. Run it in "screen".
In the command, "mpiexec -n 2" means that you parallelize MAKER using MPI and run two processes at a time. When you work on a real project, the run will take much longer, and you should increase the "-n" setting to the number of cores.
Set the MAKER environment if this is a new session:
    export PATH=/workdir/$USER/maker/bin:/workdir/$USER/RepeatMasker:/programs/snap:$PATH
    export ZOE=/programs/snap/Zoe
    export LD_LIBRARY_PATH=/programs/boost_1_62_0/lib
Execute the commands:
    cd /workdir/$USER/maker_tutorial/example_02_abinitio
    /usr/local/mpich/bin/mpiexec -n 2 maker -base pyu_rnd1 >& log1 &
After it is done, you can check the log1 file. You should see the sentence: Maker is now finished!!!

Part 3. Maker round 2 - Gene prediction using SNAP
1. Train a SNAP gene model.
SNAP is software for ab initio gene prediction from a genome. To predict genes with SNAP, you first train a SNAP model with the alignment results produced in the previous step.
If you skipped the step "4. [Do it at home] Execute repeat masking and alignments", you can copy the result files from this directory: /shared_data/annotation2019/
    cd /workdir/$USER/maker_tutorial/example_02_abinitio
    cp /shared_data/annotation2019/pyu_rnd1.maker.output.tgz ./
    tar xvfz pyu_rnd1.maker.output.tgz
Set the MAKER environment if this is a new session:
    export PATH=/workdir/$USER/maker/bin:/workdir/$USER/RepeatMasker:/programs/snap:$PATH
    export ZOE=/programs/snap/Zoe
    export LD_LIBRARY_PATH=/programs/boost_1_62_0/lib
The following commands convert the MAKER round 1 results to input files for building a SNAP model.
    mkdir snap1
    cd snap1
    gff3_merge -d ../pyu_rnd1.maker.output/pyu_rnd1_master_datastore_index.log
    maker2zff -l 50 -x 0.5 pyu_rnd1.all.gff
The "-l 50 -x 0.5" parameters of the maker2zff command specify that only gene models with an AED score of at most 0.5 and a protein length of at least 50 are used for building models. You will find two new files: genome.ann and genome.dna.
Now you will run the following commands to train SNAP.
The basic steps for training SNAP are to filter the input gene models, capture the genomic sequence immediately surrounding each model locus, and use those captured segments to produce the HMM. You can explore the internal SNAP documentation for more details if you wish.
fathom -categorize 1000 genome.ann genome.dna
fathom -export 1000 -plus uni.ann uni.dna
forge export.ann export.dna
hmm-assembler.pl pyu . > ../pyu1.hmm
mv pyu_rnd1.all.gff ../
cd ..

After this, you will find two new files in the directory example_02_abinitio:
pyu_rnd1.all.gff: A gff file from round 1, which contains the evidence-based genes.
pyu1.hmm: A hidden Markov model trained from the evidence-based genes.

2. Use SNAP to predict genes.
Modify the maker_opts.ctl file that you modified previously. Before doing that, you might want to save a backup copy of maker_opts.ctl for round 1.
cp maker_opts.ctl maker_opts.ctl_backup_rnd1

Now modify the following values in the file maker_opts.ctl:
maker_gff=pyu_rnd1.all.gff
est_pass=1 # use est alignment from round 1
protein_pass=1 # use protein alignment from round 1
rm_pass=1 # use repeats in the gff file
snaphmm=pyu1.hmm
est= # remove est file, do not run EST blast again
protein= # remove protein file, do not run blast again
model_org= # remove repeat mask model, so not running RM again
rmlib= # not running repeat masking again
repeat_protein= # not running repeat masking again
est2genome=0 # do not do EST evidence based gene model
protein2genome=0 # do not do protein based gene model
pred_stats=1 # report AED stats
alt_splice=0 # 0: keep one isoform per gene; 1: identify splicing variants of the same gene
keep_preds=1 # keep genes even without evidence support, set to 0 if no

Run maker with the new control file. This step takes a few minutes. (A real project could take hours to finish.) You will use the option “-base pyu_rnd2” so that the results are written into a new directory "pyu_rnd2".
/usr/local/mpich/bin/mpiexec -n 2 maker -base pyu_rnd2 >& log2 &

Again, make sure the log2 file ends with "Maker is now finished!!!".

Part 4. Maker round 3 - Retrain SNAP model and do another round of SNAP gene prediction

You might need to run two or three rounds of SNAP, so you will repeat Part 3. Make sure you replace snap1 with snap2 so that you do not overwrite the previous round.

1. First train a new SNAP model.
mkdir snap2
cd snap2
gff3_merge -d ../pyu_rnd2.maker.output/pyu_rnd2_master_datastore_index.log
maker2zff -l 50 -x 0.5 pyu_rnd2.all.gff
fathom -categorize 1000 genome.ann genome.dna
fathom -export 1000 -plus uni.ann uni.dna
forge export.ann export.dna
hmm-assembler.pl pyu . > ../pyu2.hmm
mv pyu_rnd2.all.gff ..
cd ..

2. Use SNAP to predict genes.
Modify the maker_opts.ctl file that you modified previously. Before doing that, you might want to save a backup copy of maker_opts.ctl for round 2.
cp maker_opts.ctl maker_opts.ctl_backup_rnd2

Now modify the following values in the file maker_opts.ctl:
maker_gff=pyu_rnd2.all.gff
snaphmm=pyu2.hmm

Run Maker:
/usr/local/mpich/bin/mpiexec -n 2 maker -base pyu_rnd3 >& log3 &

Use the following commands to create the final merged gff file. The “-n” option produces a gff file without genome sequences:
gff3_merge -n -d pyu_rnd3.maker.output/pyu_rnd3_master_datastore_index.log > pyu_rnd3.noseq.gff
fasta_merge -d pyu_rnd3.maker.output/pyu_rnd3_master_datastore_index.log

After this, you will get a new gff3 file, pyu_rnd3.noseq.gff, and protein and transcript fasta files.

3. Generate AED plots.
/programs/maker/AED_cdf_generator.pl -b 0.025 pyu_rnd2.all.gff > AED_rnd2
/programs/maker/AED_cdf_generator.pl -b 0.025 pyu_rnd3.noseq.gff > AED_rnd3

You can use Excel or R to plot the second column of the AED_rnd2 and AED_rnd3 files, using the first column as the X-axis value. The X-axis label is "AED", and the Y-axis label is "Cumulative Fraction of Annotations".

Part 5. Visualize the gff file in IGV

You can load the gff file into IGV or JBrowse, together with RNA-seq read alignment bam files. For instructions on running IGV and loading the annotation gff file, see "part 4" of this document:
/doc/RNA-Seq-2019-exercise1.pdf

Appendix: Training Augustus model

Run Part 1 & 2. In the same screen session, set up the Augustus environment.
cp -r /programs/Augustus-3.3.3/config/ /workdir/$USER/augustus_config
export LD_LIBRARY_PATH=/programs/boost_1_62_0/lib
export AUGUSTUS_CONFIG_PATH=/workdir/$USER/augustus_config/
export LC_ALL=en_US.utf-8
export LANG=en_US.utf-8
export PATH=/programs/augustus/bin:/programs/augustus/scripts:$PATH

The following commands convert the MAKER round 1 results to input files for building an Augustus model.
mkdir augustus1
cd augustus1
gff3_merge -d ../pyu_rnd1.maker.output/pyu_rnd1_master_datastore_index.log

After this step, you will see a new gff file pyu_rnd1.all.gff from round 1.

## filter the gff file, keeping only maker annotation in the filtered gff file
awk '{if ($2=="maker") print }' pyu_rnd1.all.gff > maker_rnd1.gff

## convert the maker gff and fasta file into a GenBank-formatted file named pyu.gb
## we keep 2000 bp up- and down-stream of each gene for training the models
gff2gbSmallDNA.pl maker_rnd1.gff pyu_contig.fasta 2000 pyu.gb

## check the number of genes in the training set
grep -c LOCUS pyu.gb

## train the model
## first create a new Augustus species named pyu
new_species.pl --species=pyu

## initial training
etraining --species=pyu pyu.gb

## the initial model should be in this directory
ls -ort $AUGUSTUS_CONFIG_PATH/species/pyu

## create a smaller test set for evaluation before and after optimization; name the evaluation set pyu.gb.evaluation
randomSplit.pl pyu.gb 200
mv pyu.gb.test pyu.gb.evaluation

# use the first model to predict the genes in the test set, and check the results
augustus --species=pyu pyu.gb.evaluation >& first_evaluate.out
grep -A 22 Evaluation first_evaluate.out

# optimize the model. This step is very time-consuming and could take days. To speed things up, you can create a smaller test set.
# the following step creates test and training sets; the test set has 1000 genes. The test set is split into 24 k-folds for optimization (kfold can be set up to 48, processed with one cpu core per k-fold; kfold must equal the number of cpus).
# training, prediction and evaluation are performed on each bucket in parallel (training on pyu.gb.train plus each bucket, then comparing each bucket with the union of the rest).
# by default there are 5 rounds of optimization. As optimization for a large genome could take days, it is changed to 3 here.
randomSplit.pl pyu.gb 1000
optimize_augustus.pl --species=pyu --kfold=24 --cpus=24 --rounds=3 --onlytrain=pyu.gb.train pyu.gb.test >& log &

# train again after optimization
etraining --species=pyu pyu.gb

# use the optimized model to evaluate again, and check the results
augustus --species=pyu pyu.gb.evaluation >& second_evaluate.out
grep -A 22 Evaluation second_evaluate.out

After these steps, the species model is in the directory /workdir/$USER/augustus_config/species/pyu.

Now modify the following values in the file maker_opts.ctl:
maker_gff=pyu_rnd1.all.gff
est_pass=1 # use est alignment from round 1
protein_pass=1 # use protein alignment from round 1
rm_pass=1 # use repeats in the gff file
augustus_species=pyu # augustus species model you just built
est= # remove est file, do not run EST blast again
protein= # remove protein file, do not run blast again
model_org= # remove repeat mask model, so not running RM again
rmlib= # not running repeat masking again
repeat_protein= # not running repeat masking again
est2genome=0 # do not do EST evidence based gene model
protein2genome=0 # do not do protein based gene model
pred_stats=1 # report AED stats
alt_splice=0 # 0: keep one isoform per gene; 1: identify splicing variants of the same gene
keep_preds=1 # keep genes even without evidence support, set to 0 if no

Run maker with the new augustus model:
/usr/local/mpich/bin/mpiexec -n 2 maker -base pyu_rnd3 >& log3 &

Create gff and fasta output files. Use the following commands to create the final merged gff file. The “-n” option produces a gff file without genome sequences:
gff3_merge -n -d pyu_rnd3.maker.output/pyu_rnd3_master_datastore_index.log > pyu_rnd3.noseq.gff
fasta_merge -d pyu_rnd3.maker.output/pyu_rnd3_master_datastore_index.log

After this, you will get a new gff3 file, pyu_rnd3.noseq.gff, and protein and transcript fasta files. To make the gene names shorter, use the following commands:
maker_map_ids --prefix pyu_ --justify 8 --iterate 1 pyu_rnd3.all.gff > id_map
map_gff_ids id_map pyu_rnd3.all.gff
map_fasta_ids id_map pyu_rnd3.all.maker.proteins.fasta
map_fasta_ids id_map pyu_rnd3.all.maker.transcripts.fasta
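The "Generate AED plots" step above plots a cumulative fraction of annotations against AED. As a rough illustration of what that curve represents, the Python sketch below collects `_AED` values from mRNA lines of a MAKER-style GFF3 and computes the fraction of transcripts at or below each threshold. It is not a reimplementation of AED_cdf_generator.pl; the function names `aed_values` and `aed_cdf` and the demo lines are hypothetical, and it assumes each mRNA feature carries an `_AED=` attribute.

```python
import re

# Matches the _AED attribute but not _eAED (which is preceded by "e", not ";").
AED_RE = re.compile(r"(?:^|;)_AED=([0-9.]+)")

def aed_values(gff_lines):
    """Collect _AED values from mRNA features in MAKER-style GFF3 lines."""
    values = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # the sequence section follows; no more features
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "mRNA":
            continue
        match = AED_RE.search(cols[8])
        if match:
            values.append(float(match.group(1)))
    return values

def aed_cdf(values, bin_size=0.025):
    """Return (threshold, fraction of values <= threshold) pairs from 0 to 1."""
    n_bins = round(1 / bin_size)
    pairs = []
    for i in range(n_bins + 1):
        threshold = round(i * bin_size, 6)
        if values:
            fraction = sum(v <= threshold for v in values) / len(values)
        else:
            fraction = 0.0
        pairs.append((threshold, fraction))
    return pairs

# Hypothetical demo lines, made up for illustration only.
demo = [
    "##gff-version 3",
    "ctg1\tmaker\tgene\t1\t900\t.\t+\t.\tID=g1",
    "ctg1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=g1-mRNA-1;Parent=g1;_AED=0.10;_eAED=0.10",
    "ctg1\tmaker\tmRNA\t2000\t2900\t.\t-\t.\tID=g2-mRNA-1;Parent=g2;_AED=0.40;_eAED=0.35",
    "ctg1\tmaker\tmRNA\t5000\t5900\t.\t+\t.\tID=g3-mRNA-1;Parent=g3;_AED=1.00;_eAED=1.00",
]

for threshold, fraction in aed_cdf(aed_values(demo), bin_size=0.25):
    print(f"{threshold:.3f}\t{fraction:.3f}")
```

A curve that rises quickly at low AED indicates that most gene models agree well with the evidence, which is how the round 2 and round 3 files can be compared.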
Prediction and analysis of microbial secondary-metabolite biosynthetic gene clusters
Analysis tools of antiSMASH
NCBI BLAST+ HMMer 3, Muscle 3 Glimmer 3 FastTree
TreeGraph 2 Indigo-depict PySVG JQuery SVG
Pipeline for genomic analysis of secondary metabolites
PKS PKS PKS (neg.)
PKS PKS PKS
PKS_KS PKS_AT fabH
ene_KS mod_KS t2clf
SMART SMART This study
Yadav et al. (2009) Yadav et al. (2009) This study
Terpene Terpene ……….
gene prediction by Glimmer3 (prokaryotic data) or GlimmerHMM
(eukaryotic data) Transform the predicted results to EMBL format
Detection of gene clusters
nonribosomal peptides (NRP), bacteriocins, aminocoumarins
butyrolactones
terpenes, beta-lactams, siderophores
indoles
lantibiotics
NRPS/PKS domain architecture analysis
Gene structure and gene prediction
25 Exons: 48~354 bp
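Structural features of the human genome
(1) Coding sequence makes up less than 2% of the human nuclear genome, which contains roughly 30,000 different genes, nearly one third of them in multiple copies;
(2) Most structural genes contain intervening sequences; that is, most genes are interrupted genes;
(3) Exons are generally no longer than 800 bp, whereas introns range from 30 bp to tens of kb;
6 Regulatory signals related to transcription
(1) Promoters (2) Enhancers (3) Negative regulatory elements (4) LCRs (locus control regions) (5) Transcription factors (6) Sequences related to transcription termination (7) mRNA splicing
7 Statistical studies of intron-exon structure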
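Specificity (Sp):
$Sp = \dfrac{TP}{TP + FP}$
Sn: the proportion of actual coding nucleotides that are correctly predicted as coding; Sp: the proportion of nucleotides predicted as coding that are actually coding.
As conditional probabilities, where x is the true state of a nucleotide (coding or non-coding), F(x) is the predicted state of that nucleotide, c is the coding state and n is the non-coding state:
$Sn = P\big(F(x) = c \mid x = c\big)$
$Sp = P\big(x = c \mid F(x) = c\big)$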
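As a small worked example of these two measures at the nucleotide level, the Python sketch below counts true positives, false negatives and false positives from a true labeling and a predicted labeling. The function name `sn_sp` and the label strings are hypothetical, and the sensitivity formula Sn = TP / (TP + FN) is the standard definition assumed here, since only the Sp formula appears in the text above.

```python
def sn_sp(actual, predicted):
    """Nucleotide-level sensitivity and specificity.

    actual / predicted are equal-length strings of 'c' (coding) and
    'n' (non-coding), one character per nucleotide.
    Sn = TP / (TP + FN) = P(F(x) = c | x = c)
    Sp = TP / (TP + FP) = P(x = c | F(x) = c)
    """
    if len(actual) != len(predicted):
        raise ValueError("label strings must have the same length")
    pairs = list(zip(actual, predicted))
    tp = sum(a == "c" and p == "c" for a, p in pairs)
    fn = sum(a == "c" and p == "n" for a, p in pairs)
    fp = sum(a == "n" and p == "c" for a, p in pairs)
    sn = tp / (tp + fn) if tp + fn else float("nan")
    sp = tp / (tp + fp) if tp + fp else float("nan")
    return sn, sp

# Six truly coding positions; the prediction starts one base early
# and ends two bases early.
actual = "nnccccccnn"
predicted = "ncccccnnnn"
sn, sp = sn_sp(actual, predicted)
print(f"Sn = {sn:.3f}, Sp = {sp:.3f}")
```

Note that Sp as defined in this context is what many other fields call precision; it says nothing about true negatives.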
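The probability distribution of exon length peaks at 30 to 40 amino acids, and the peak is fairly narrow, whereas most introns are 40 to 125 nucleotides long and their peak is comparatively flat.
Statistical distribution of exon and intron lengths in 10 eukaryotes (Deutsch & Long, 1999)
(Exon length is given in amino acids and intron length in nucleotides; the horizontal axis shows length and the vertical axis shows frequency.)
§6.1.2 Overview of research on eukaryotic gene prediction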
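Expression of miR-302a and Bcl2L11 in folate-deficient mouse embryonic stem cells (ES-D3), and prediction and validation of their targeting relationship
Liang Yan, Cao Dingding2, Li Yuanyuan2, Liu Zhuo2 (1 Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan 250021; 2 Capital Institute of Pediatrics)
Abstract: Objective. To observe changes in the expression of microRNA-302a (miR-302a) and Bcl2L11 in the folate-deficient mouse embryonic stem cell line ES-D3, and to predict and validate the targeting relationship between miR-302a and Bcl2L11.
Methods. ES-D3 cells were cultured in folate-deficient medium and collected at 0, 24, 48 and 72 h of culture (groups A, B, C and D). miR-302a and Bcl2L11 mRNA were measured by real-time fluorescent quantitative PCR, and Bcl2L11 protein (the BimEL, BimL and BimS proteins) was measured by Western blotting.
Bioinformatics software was used to predict the target binding site between Bcl2L11 and miR-302a.
Using genomic DNA from 3T3 cells as the template, 3'UTR amplification primers were designed, and the 3'UTR of the gene was amplified by PCR and cloned into the pmiR-RB-REPORT dual-luciferase reporter vector to construct the wild-type Bcl2L11-3'UTR vector. Mutagenic primers were designed to mutate the target sequence and construct the mutant vector. miR-302a mimics and Negative Control were each co-transfected with the wild-type and mutant Bcl2L11 3'UTR dual-reporter vectors, giving group 1 (wild-type Bcl2L11 3'UTR vector + Negative Control), group 2 (wild-type Bcl2L11 3'UTR vector + miR-302a mimics), group 3 (mutant Bcl2L11 3'UTR vector + Negative Control) and group 4 (mutant Bcl2L11 3'UTR vector + miR-302a mimics). After transfection, the relative fluorescence value of the reporter gene hRluc was measured in each group to verify whether miR-302a and Bcl2L11 interact.
Results. The relative expression of miR-302a in groups A, B, C and D was 1.00±0.20, 0.65±0.32, 0.50±0.22 and 0.43±0.24; compared with group A, miR-302a expression was lower in group D (P<0.05). The relative expression of Bcl2L11 mRNA in groups A, B, C and D was 1.00±0.00, 1.22±0.19, 1.82±0.37 and 3.49±0.45; compared with group A, Bcl2L11 mRNA expression was higher in group D (P<0.05). The relative expression of BimEL in groups A, B, C and D was 0.86±0.02, 0.87±0.03, 0.92±0.02 and 1.10±0.06; compared with group A, BimEL expression was higher in groups C and D (P<0.05). The relative expression of BimL in groups A, B, C and D was 0, 0.09±0.02, 0.16±0.04 and 0.26±0.03; compared with group A, BimL expression was higher in groups B, C and D (P<0.05). The relative expression of BimS in groups A, B, C and D was 0, 0, 0.11±0.04 and 0.18±0.04; compared with group A, BimS expression was higher in groups C and D (P<0.05). miR-302a and Bcl2L11 interact.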
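Analysis of "generatio" in bioinformatics
Next-generation sequencing (NGS) is also called second-generation sequencing or high-throughput sequencing. The name is often used alongside related concepts such as whole-genome sequencing (WGS), which is one application of the technology.
The term is defined relative to first-generation sequencing, which was dominated by the Sanger method. NGS is characterized by high sequencing output, short read length and low cost.
The second-generation technologies usually referred to today mainly include ABI SOLiD sequencing, Roche 454 sequencing, Ion Torrent sequencing from Life, and the HiSeq and MiSeq sequencing technologies from Illumina.
At present the term mainly refers to Illumina sequencing.
Whole-genome sequencing uses a high-throughput sequencing platform to sequence the entire genome of different human individuals or populations and to carry out bioinformatic analysis at the individual or population level. It can comprehensively uncover genetic variation at the DNA level, including large structural variants, and provides important information for screening disease-causing and susceptibility genes, studying pathogenesis and inheritance, and inferring population migration and evolution.
Besides SNPs and indels in the human genome, whole-genome sequencing can also be used to detect copy number variation (CNV), structural variation (SV), fusion genes, viral integration sites and mutations in non-coding regions.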
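Key knowledge from other disciplines in biology
Biology is a branch of natural science that studies the phenomena of life and the laws governing life activities.
Knowledge from other disciplines plays an important role in biological research.
This article introduces key knowledge from physics, chemistry, mathematics, computer science and other disciplines as used in biology.
1. Physics
Physics is one of the foundational natural sciences; it studies the basic laws of matter, energy, space and time.
In biology, physics is mainly applied in the following areas:
(1) Optics: the basis of biological microscopy. Fluorescence microscopes, confocal microscopes and similar instruments all depend on optical principles. Optical imaging techniques such as laser scanning microscopy and optical sectioning play an important role in biological research.
(2) Electromagnetism: electromagnetic phenomena inside and outside organisms, such as bioelectricity and magnetic materials, can be explained with electromagnetism. It has important applications in neurobiology and electrophysiology.
(3) Mechanics: mechanical phenomena in organisms, such as muscle contraction and cell movement, can be analyzed with the principles of mechanics. Mechanics is also widely applied in biomechanics and fluid mechanics.
2. Chemistry
Chemistry is the science of the composition, structure, properties and transformations of matter.
In biology, chemistry is mainly applied in the following areas:
(1) Biological macromolecules: the structure, function and interactions of proteins, nucleic acids, polysaccharides and other macromolecules can be studied with chemical methods, for example enzyme-catalyzed reactions and protein crystallography.
(2) Metabolic pathways: chemical reactions in organisms, such as glycolysis and the tricarboxylic acid cycle, can be explained by chemical principles. Chemistry has important applications in biochemistry and molecular biology.
(3) Medicinal chemistry: chemistry plays an important role in drug design, synthesis and screening. Chemical methods can be used to study the molecular mechanism, efficacy and toxicity of drugs.
3. Mathematics
Mathematics is an abstract science that gives biology powerful analytical tools.
In biology, mathematics is mainly applied in the following areas:
(1) Statistics: data analysis and processing in biological research, such as testing the validity of experimental data and probability calculations in genetics, depend on statistical methods.
(2) Calculus: dynamic processes in biology, such as cell proliferation and changes in population size, can be modeled and analyzed with calculus.
(3) Linear algebra: structural analyses in biology, such as matrix operations on gene expression data and protein structure prediction, can be handled with linear algebra.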
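Features of genes
• (2) Dispersed repeated genes • Some repeated genes are not arranged in tandem but are scattered throughout the genome.
• (3) Organization of gene clusters: histones • In some gene clusters, although the structures of the individual genes differ greatly, their products all take part in the same metabolic process or have similar functions. The histone genes mentioned above belong to this type. Histone genes are among the earliest genes to have been studied.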
5/13/2015
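(3) Mutations caused by the DNA replication machinery
• DNA replication is very accurate, but errors inevitably occur during replication and lead to some mutations.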
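Characteristics of gene mutation
① Universality: widespread among species in nature ② Randomness: can occur at any stage ③ Non-directionality: A→a1 or A→a2 ④ Low frequency: mutation rates in nature are very low, 10^-5 to 10^-8 ⑤ Harmfulness: most mutations are harmful, a few are beneficial
• Sickle cell anemia is caused by a recessive disease gene on an autosome. Patients die in childhood, but carriers of the disease gene have relatively strong resistance to malaria.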
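If a transition or transversion occurs at the first or second base of a codon, a codon that originally encoded one amino acid becomes a codon for a different amino acid, or a codon that encodes no amino acid. The former is called a missense mutation and the latter a nonsense mutation. • In some cases the codon changes and an amino acid in the protein is replaced, yet the function of the protein does not change.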
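The distinction between missense and nonsense mutations can be illustrated with a short Python sketch that compares the amino acid encoded before and after a single-codon change. The function name `classify_codon_change` is hypothetical, the standard genetic code is assumed, and the sketch judges only the encoded amino acid; it cannot tell whether a substitution leaves protein function unchanged.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def classify_codon_change(ref_codon, alt_codon):
    """Classify a codon substitution by its effect on the encoded residue."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"   # same amino acid (or stop) is still encoded
    if alt_aa == "*":
        return "nonsense"     # the new codon is a stop codon
    return "missense"         # a different amino acid is encoded

# GAG -> GTG (Glu -> Val) is the substitution behind sickle cell anemia.
print(classify_codon_change("GAG", "GTG"))
# TGG -> TAG turns a tryptophan codon into a stop codon.
print(classify_codon_change("TGG", "TAG"))
# GAA -> GAG changes the third base only; both encode glutamate.
print(classify_codon_change("GAA", "GAG"))
```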
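• 4. Which two conserved sequences are the most basic structural elements in the promoter region of eukaryotic protein-coding genes? • 5. Which two conserved sequences make up the 5' transcriptional control region of prokaryotic genes?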
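Section 2. Organization of genes
• I. Operon: in bacteria, genes encoding several functionally related enzymes are controlled by a single switch unit during transcription and form a polycistronic mRNA. Such a unit behaves like a "super gene". • It includes structural genes and regulatory genes • Operator • Trans-acting factor • Cis-acting element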
A gene identification model based on signal processing
Wu Fengjie; Guo Feng; Tian Xiaomin; Zhang Huizeng
Journal: Journal of Hangzhou Normal University (Natural Science Edition), 2014, issue 2
Abstract: To obtain accurate and reliable gene identification information, the given DNA sequences are converted to numerical sequences (using the Voss mapping and the Z-curve mapping), and the power spectrum and signal-to-noise ratio (SNR) of each DNA sequence are calculated with the fast Fourier transform (FFT) and the discrete wavelet transform. For gene prediction in different types of organisms, the best prediction threshold is inferred with a bootstrap sampling algorithm, and the performance of the gene prediction method is assessed by specificity, sensitivity and accuracy. The resulting model is reported to be fast and accurate in gene recognition.
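To make the idea in this abstract more concrete, the Python sketch below applies the Voss mapping (four 0/1 indicator sequences, one per base) and measures the power at frequency N/3, where coding regions tend to show a period-3 peak. It is an illustration under stated assumptions, not the authors' implementation: it evaluates a direct discrete Fourier sum at one frequency instead of a full FFT, it defines the SNR as the power at N/3 divided by the mean of the total power spectrum, and the names `voss_indicators`, `power_at` and `period3_snr` are hypothetical. The threshold applied to the SNR is left to the caller.

```python
import cmath

def voss_indicators(seq):
    """Voss mapping: one 0/1 indicator sequence per base."""
    seq = seq.upper()
    return {base: [1 if ch == base else 0 for ch in seq] for base in "ACGT"}

def power_at(signal, k):
    """Squared magnitude of the DFT coefficient X[k] of a real sequence."""
    n = len(signal)
    total = sum(
        x * cmath.exp(-2j * cmath.pi * k * t / n)
        for t, x in enumerate(signal)
    )
    return abs(total) ** 2

def period3_snr(seq):
    """Power at frequency N/3 divided by the mean total power."""
    n = len(seq)
    if n == 0 or n % 3:
        raise ValueError("sequence length must be a positive multiple of 3")
    indicators = voss_indicators(seq)
    peak = sum(power_at(s, n // 3) for s in indicators.values())
    # Parseval: for a 0/1 sequence the mean of |X[k]|^2 over all N
    # frequencies equals the number of ones, so the mean total power is
    # simply the number of A/C/G/T symbols in the sequence.
    mean_power = sum(sum(s) for s in indicators.values())
    return peak / mean_power

print(period3_snr("ATG" * 20))   # strongly period-3 sequence, large SNR
print(period3_snr("A" * 60))     # no period-3 component, SNR near zero
```

For long sequences a real FFT would replace the direct sum; the single-frequency form is used here only to keep the example dependency-free.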
Gene prediction summary

1. Gene prediction
For fungi there are four ab initio prediction programs (GlimmerHMM, SNAP, GeneMark-ES and Augustus) as well as homology-based prediction.
Of the four programs, GeneMark-ES works with a hidden Markov model but needs no reference species or reference sequences because it is self-training. It can be used for a new species when no suitable or closely related sequenced species is available.
Augustus, GlimmerHMM and SNAP all require a reference training set.

Overall pipeline:
perl /nas/MG01/FUNGUS/PGAP/FGAP.pl [options] Genome.fa
Options
--all run all analysis for Fungi
--cutlen cut the scaffolds longer than this
--predict select the method to predict genes: augustus,genemarkes,snap,glimmerhmm or homology
--prepara set the parament for augustus,snap,homology
--repeat set repeat method, defalut: repbase-proteinmasker-trf
--ncRNA set ncRNA type, default: tRNA-rRNA-miRNA-sRNA-snRNA
--rRNA_ref set Reference for rRNA, if null rRNA will be predicted by rRNAmmer
--function set dbs for gene function annotaion, default: nr-swissprot-trembl-cog-kegg-iprscan
--lib set the lib for synteny analysis and gene family analysis, needed
--synteny synteny analysis
--family Gene Family analysi
--species species tree, default, created by lib information
--category category file, default, created by lib information
--cpu set the cpu number to use in parallel, default 20 for qsub and 5 for multi
--run set the parallel type, qsub, or multi, default=qsub
--outdir set the result directory, default="."
--prefix set a prefix name for results
--help output help information to screen

Step-by-step pipeline
Program path: /nas/MG01/FUNGUS/PGAP/gene-prediction/bin/gene-predict.pl
perl gene-predict.pl [options]
--glimmer run glimmer by self training
--genemark run genemark by self training
--shape set the shape of prokaryote DNA, circular,linear,partial, default=partial
--glimmerhmm run glimmerhmm and give a glimmerhmm parameter directory
--snap run snap and give a snap parameter file
--genemarkes run genemarkes by self traning
--augustus run augustus and set species
--homology predict genes based on proteins on a homology species
--genemarkM run genemarkM for mata gene prediction
--metagene run metagene for meta gene prediction
--metageneA run metageneA for meta gene prediction
--cpu set the cpu number to use in parallel, default=3
--run set the parallel type, qsub, or multi, default=qsub
--prefix set gene id prefix
--outdir set the result directory, default="./"
--verbose output running progress information to screen
--help output help information to screen

1.1 GeneMark-ES prediction
The self-training algorithm GeneMark-ES:
a) splits the input sequence at "NN...N" strings
b) runs the gene finder GeneMark.hmm on the contigs
c) maps the predictions back to the original super-contig sequence
As a result, incomplete gene structures can be predicted inside super-contig sequences.
Script: perl ./gene-predict.pl --genemarkes
GeneMark-ES writes its output to ./genemark_hmm.gtf

1.2 Homology prediction
Homology-based prediction locates genes by aligning the genome sequence against a reference protein set. It typically predicts fewer genes, but with high accuracy.
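The PAM50 concept
PAM50 (Prediction Analysis of Microarray 50-gene panel) is a classification system based on gene expression profiles that is used to predict the molecular subtype of breast cancer patients.
The PAM50 gene set contains 50 key genes, which are mainly involved in signaling pathways required for the development and progression of breast cancer.
PAM50 is widely used in breast cancer research and divides breast cancer into four subtypes: Luminal A, Luminal B, HER2-enriched and Basal-like.
- Luminal A: one of the most common breast cancer subtypes, accounting for about 40% of breast cancer patients. It usually has a better prognosis and is relatively sensitive to hormone therapy.
- Luminal B: compared with Luminal A, Luminal B breast cancers usually show higher proliferative activity and a poorer prognosis. Patients with this subtype may need a more aggressive treatment strategy.
- HER2-enriched: tumor cells in this subtype overexpress human epidermal growth factor receptor 2 (HER2) on the cell surface. This subtype usually responds well to HER2-targeted drugs.
- Basal-like: Basal-like breast cancers are usually highly invasive and have a poor prognosis. This subtype is associated with features of basal epithelial cells, including high expression of several basal cell markers.
The PAM50 classification system helps clinicians understand the molecular characteristics of breast cancer patients and guides individualized treatment decisions.
It has important clinical value in breast cancer diagnosis, prognostic assessment and treatment selection.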
Advancing Science with DNA Sequence

Finding the genes in microbial genomes
Natalia Ivanova
MGM Workshop
January 7, 2008

Finding the genes in microbial genomes: features
Sequence features in prokaryotic genomes:
• stable RNA coding genes (rRNAs, tRNAs, RNA component of RNase P, tmRNA)
• protein coding genes (CDSs)
• transcriptional features (mRNAs, operons, promoters, terminators, protein binding sites, DNA bends)
• translational features (RBS, regulatory antisense RNAs, mRNA secondary structures, translational recoding and programmed frameshifts, inteins)
• pseudogenes (tRNA and protein coding genes)
• …

[Figure: well annotated bacterial genome in Artemis genome viewer. Labels: rRNA, tRNA, operon, promoter, terminator, protein-coding gene, CDS, protein-binding site]

Servers for microbial genome annotation
• IMG-ER /er
  Output: stable RNA-encoding genes, CDSs, functional annotations
• RAST /
  Output: stable RNA-encoding genes, CDSs, functional annotations
• REGANOR https://www.cebitec.uni-bielefeld.de/groups/brf/software/reganor/
  Output: stable RNA-encoding genes, CDSs; file in gff format
• RefSeq /genomes/MICROBES/genemark.cgi /genomes/MICROBES/glimmer_3.cgi
  Output: CDSs; file in tbl format
• EasyGene http://www.cbs.dtu.dk/services/EasyGene/
  Output: CDSs; sequence size restriction: <1 MB

Finding stable RNAs - I
Stable RNAs: large RNAs (16S and 23S rRNAs) and small RNAs (5S rRNA, tRNAs, tmRNA, RNase P component, riboswitches)
For small RNAs, statistical models can be generated and used to identify them in newly sequenced genomes.
Large RNAs are found by sequence similarity search (BLASTn) => there is no universally accepted tool; many errors in defining the boundaries
search_for_rnas by Niels Larsen, rRNA database: used by all 3 servers predicting rRNAs

Genome                               Sequencing center   16S rRNA, nt
Synechococcus sp. WH7803             Genoscope           1497, 1464
Synechococcus sp. RCC307             Genoscope           1498
Synechococcus sp. JA-3-3Ab           TIGR                1324
Synechococcus sp. JA-2-3BA(2-13)     TIGR                1323
Synechococcus elongatus PCC 7942     JGI                 1490
Synechococcus 9605                   JGI                 1440
Synechococcus 9311                   UCSD, TIGR          1477

Finding stable RNAs - II
• Small RNAs (also called non-coding or ncRNAs) are found by search against Rfam covariance models using the INFERNAL software suite and the Rfam collection of models; see //Software/Rfam//
• Both Rfam servers provide pre-calculated lists of short ncRNAs; Sanger center also provides a web search facility for short DNA sequences
Other (less popular) tools:
• Pipeline for discovering cis-regulatory ncRNA motifs: /supplements/yzizhen/pipeline/
• RNAz http://www.tbi.univie.ac.at/~wash/RNAz/

Finding protein-coding genes
Reading frames: translations of the nucleotide sequence with an offset of 0, 1 and 2 nucleotides (three possible translations in each direction)
Open reading frame (ORF): reading frame between a start and stop codon

Finding of protein-coding genes as a classification problem
[Figure: ORFs (real genes and non-coding ORFs) are passed to a classifier, which outputs predicted genes and predicted non-coding ORFs]

Evidence-based vs ab initio algorithms
Two major approaches:
• "evidence based" (ORFs with translations homologous to the known proteins are CDSs)
  Advantages: finds the "unusual" genes (e.g. horizontally transferred); relatively low rate of false positive predictions
  Limitations: cannot find "unique" genes; low sensitivity towards short genes; prone to propagation of false positive results of ab initio annotation tools
• ab initio (ORFs with nucleotide composition similar to CDSs are also CDSs)
  Advantages: finds "unique" genes; high sensitivity
  Limitations: often misses "unusual" genes; high rate of false positives

Most popular CDS-finding tools
• CRITICA
• Glimmer family (Glimmer2, Glimmer3, RBS finder)
• GeneMark family (GeneMark-hmm, GeneMarkS)
• EasyGene
Combinations and variations of the above
• REGANOR (CRITICA + Glimmer3 + pre-processing)
• ORNL pipeline (CRITICA + Glimmer3)
• RAST (Glimmer2 + pre- and post-processing)

Features and differences between gene finding tools
• Training set selection (evidence-based vs purely ab initio)
• Statistical model of coding and non-coding regions (codon frequencies, dicodon frequencies, hidden Markov models)
• Statistical model architecture (i.e. which parts of the CDS are explicitly modeled; may include RBS, spacer region, start codon, second codon, internal codons, stop codon, etc.)
• Additional algorithms for refinement of predictions (RBS finder, overlap resolution, estimation of statistical significance)

Examples - I
• CRITICA and EasyGene use evidence-based training sets (BLASTn with counting synonymous/non-synonymous codons in CRITICA, BLASTx in EasyGene)
• Glimmer and GeneMark use ab initio training sets (Glimmer uses long non-overlapping ORFs, GeneMark uses a heuristic model)
• Tools using ab initio training sets run much faster than tools using evidence-based training sets

Examples - II
• CRITICA uses dicodon frequencies to model coding regions
• Glimmer uses interpolated Markov models (IMM) of up to 5th order; GeneMark uses order 2 hmm for coding regions, order 0 hmm for non-coding regions; EasyGene uses order 4 hmm for coding regions, order 0 hmm for non-coding regions
• CRITICA is the least sensitive
• Order of the Markov model will determine the minimal size of the training set => application to metagenomes

Markov model order                       0     1     2      3       4       5
Minimal size of the training set (kb)    1.6   6.4   25.6   102.4   409.6   1638.4

Different gene prediction tools applied to the same genome

Tool       total features   total CDSs   non-pseudo CDSs   pseudo   total rRNA   total tRNA   total misc RNA
manual     7124             7042         6699              343      18           63           1
GeneMark   7059             6974         6974              0        18           62           2
ORNL       7076             6994         6994              0        18           63           1
RAST       5503             5422         5422              0        18           63           0
REGANOR    6420             6339         6339              0        18           63           0
Glimmer3   8218             8218         8218              0        0            0            -

Modifications made by manual curation (number of CDSs, % of CDSs)

Tool       total modifications   too long (truncated)   too short (extended)   false positive (deleted)   missed by automated annotation
GeneMark   1581 (22.4%)          106 (1.5%)             846 (12.0%)            282 (4.0%)                 347 (4.9%)
ORNL       2622 (37.2%)          1152 (16.3%)           300 (4.2%)             560 (7.9%)                 610 (8.6%)
RAST       4261 (60.5%)          2228 (31.6%)           83 (1.1%)              167 (2.3%)                 1783 (25.3%)
REGANOR    2373 (33.6%)          1059 (15.0%)           207 (2.9%)             203 (2.8%)                 904 (12.8%)
Glimmer3   2687 (38.1%)          542 (7.6%)             500 (7.1%)             1408 (19.9%)               237 (3.3%)

Conclusions
• There are several tools for automated annotation of microbial genomes
• These tools identify a limited range of features, and development of tools for identification of operons, promoters, terminators etc. is highly desirable
• But this development requires significant experimental input
• Different automated gene finders have different advantages and limitations; the best strategy is using any of them or a combination, followed by evidence-based manual curation
• => talk on Wednesday by Thanos Lykidis
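The slide on reading frames defines an ORF as a reading frame between a start and a stop codon, with three frames in each direction. A minimal Python sketch of that enumeration is given below as the raw input a gene-finding classifier would then score. It assumes ATG as the only start codon and the three standard stop codons, ignores alternative starts such as GTG and TTG, and uses hypothetical names (`find_orfs`, `orfs_in_strand`); it is not the algorithm of any tool named in the slides.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def orfs_in_strand(seq, min_codons):
    """Return (frame, start, end) for each ATG...stop ORF on one strand.

    Coordinates are 0-based on the given string, end is exclusive and
    includes the stop codon. min_codons counts codons before the stop.
    """
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first start codon opens an ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    found.append((frame, start, i + 3))
                start = None                   # look for the next ORF in this frame
    return found

def find_orfs(seq, min_codons=2):
    """Scan all six reading frames.

    Minus-strand coordinates refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    result = [("+",) + orf for orf in orfs_in_strand(seq, min_codons)]
    minus = reverse_complement(seq)
    result += [("-",) + orf for orf in orfs_in_strand(minus, min_codons)]
    return result

example = "CCATGAAATTTTAGCC"
for strand, frame, start, end in find_orfs(example):
    print(strand, frame, start, end, example[start:end] if strand == "+" else "")
```

In practice a minimum length far larger than two codons would be used, since short ORFs occur frequently by chance in non-coding sequence.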