Introduction to Bioinformatics
2013-6-24
Outline
1. A brief history
2. Main research areas
3. Software and tools
1. A brief history
1946
The United States builds ENIAC, the first fully automatic electronic digital computer.
1955
Frederick Sanger determined the complete amino acid sequence of insulin in 1955, work that earned him his first Nobel Prize in Chemistry in 1958.
1965
The first Atlas of Protein Sequence and Structure contained sequence information on 65 proteins.
Dr. Margaret Oakley Dayhoff (1925-1983) was a pioneer in the use of computers in chemistry and biology, beginning with her PhD thesis project in 1948. Her work was multi-disciplinary, and used her knowledge of chemistry, mathematics, biology and computer science to develop an entirely new field. She is credited today as a founder of the field of Bioinformatics.
1965
First use of molecular sequences for evolutionary studies
One of the founding fathers of the field of molecular evolution
Zuckerkandl, E. and Pauling, L. (1965). "Molecules as documents of evolutionary history." Journal of Theoretical Biology 8(2): 357-366.
1967
Use of molecular sequences to build trees
Fitch, W. M. and Margoliash, E. (1967). "Construction of phylogenetic trees." Science 155(3760): 279-284.
1970
Needleman-Wunsch algorithm
Compares two sequences for similarity over their entire length (global alignment).
Needleman, S. and Wunsch, C. (1970). "A general method applicable to the search for similarities in the amino acid sequence of two proteins." J. Mol. Biol. 48(3): 443-453.
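The global alignment idea can be illustrated with a short dynamic-programming sketch. It assumes a simple linear scoring scheme (match +1, mismatch -1, gap -2) chosen only for illustration, not the scoring used in the 1970 paper; the function name `needleman_wunsch` is likewise hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b).

    The scoring parameters are illustrative assumptions.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned against gaps only

    def sub(i, j):
        # Substitution score for pairing a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

With these assumed scores, `needleman_wunsch("ACGT", "AGT")` aligns ACGT against A-GT: every residue of both sequences takes part in the alignment, which is what "global" means here.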
1974
First secondary structure prediction method
Chou, P. Y. and Fasman, G. D. (1974). "Prediction of protein conformation." Biochemistry 13(2): 222-245.
1981
Smith-Waterman algorithm
Compares two sequences for similarity within local regions (local alignment).
Smith, T. F. and Waterman, M. (1981). "Identification of common molecular subsequences." J. Mol. Biol. 147: 195-197.
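Local alignment differs from global alignment in two ways: cell scores are never allowed to drop below zero, and the traceback starts at the highest-scoring cell and stops at the first zero. The sketch below shows this under assumed scores (match +2, mismatch -1, gap -1); these values and the name `smith_waterman` are illustrative, not taken from the 1981 paper.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Local alignment of a and b; returns (score, aligned_a, aligned_b).

    The scoring parameters are illustrative assumptions.
    """
    n, m = len(a), len(b)
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    def sub(i, j):
        # Substitution score for pairing a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(
                0,                             # start a fresh local alignment
                H[i - 1][j - 1] + sub(i, j),   # pair the two residues
                H[i - 1][j] + gap,             # gap in b
                H[i][j - 1] + gap,             # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero-score cell is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `smith_waterman("TTTACGTTT", "GGACGGG")` reports only the shared ACG segment and ignores the unrelated flanking residues.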
1987
The first approach for an efficient multiple sequence alignment procedure, later implemented in CLUSTAL
Multiple sequence alignment algorithm
Feng, D. and Doolittle, R. F. (1987). "Progressive sequence alignment as a prerequisite to correct phylogenetic trees." J. Mol. Evol. 25: 351-360.
1990
BLAST
A tool for local similarity searches against sequence databases.
Altschul, S. et al. (1990). "Basic local alignment search tool." J. Mol. Biol. 215(3): 403-410.
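1. A brief history: the genome projects
1990
The Human Genome Project (HGP) is launched.
1995
The first bacterial genome is completely sequenced: Haemophilus influenzae.
1996
The first eukaryotic genome is completely sequenced: yeast.
1996
The first archaeal genome (Methanococcus jannaschii) is sequenced.
1997
September: the genome sequence of Escherichia coli K12 is published. The genome is about 4,600 kb and contains about 4,000 genes.
1998
The genome of the first multicellular organism, the nematode C. elegans, is sequenced. The genome is 97 Mbp and contains about 20,000 genes.
2000
March: the genome of the fruit fly Drosophila melanogaster is sequenced.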
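2001
February 15 and 16: draft human genome sequences are published in Nature (February 15) and Science (February 16).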
Next-Generation Sequencing (NGS)
• 2001: Pyrosequencing
• 2007: Solexa
[Figure: timeline of DNA sequencing milestones from 1870 to 2008, plotted against sequencing efficiency (bp/person/year), which rises from 1 to more than 100,000,000. Milestones shown: Miescher discovers DNA; Avery proposes DNA as the 'genetic material'; Watson & Crick, double-helix structure of DNA (1953); Holley sequences yeast tRNA(Ala); Wu sequences λ cohesive-end DNA; Sanger, dideoxy chain termination; Gilbert, chemical degradation; Messing, M13 cloning; Hood et al., partial automation; cycle sequencing, improved sequencing enzymes, and improved fluorescent detection schemes. Source: Messing & Llaca, PNAS (1998).]
The Third-Generation Sequencing
True Single Molecule Sequencing
HeliScope™ Single Molecule Sequencer
http://www.helicosbio.com/Technology/TrueSingleMoleculeSequencing/tabid/64/Default.aspx
http://www.helicosbio.com/Portals/0/Documents/Helicos%20tSMS%20Technology%20Primer.pdf
GOLD
• http://www.genomesonline.org/cgi-bin/GOLD/index.cgi
[Figure: timeline with shrinking intervals: 1865 to 1975 (110 yrs), 1975 to 1990 (15 yrs), 1990 to 2001 (11 yrs)]
Advances in DNA sequencing and computing technology, together with the genome projects, have changed the way biological research is done.
Nature Methods 5, 16-18 (January 2008)
National Center for Biotechnology Information (NCBI)
Science, Vol. 295, 1 March 2002
Bioinformatics is the branch of biology that is concerned with the acquisition, storage, display, and analysis of the information found in nucleic acid and protein sequence data. Computers and bioinformatics software are the tools of the trade.
2. Main research areas
2.1 Sequence analysis
• DNA sequences
  – Gene finding
  – Regulatory sequences
  – Structural motifs
  – Repetitive sequences
  – Junk DNA
• Protein sequences
  – Pattern search
  – Post-translational modification prediction
    • Glycosylation
    • Phosphorylation
  – Subcellular localization
  – Primary structure
    • MW/pI/hydrophobicity
  – Secondary structure
  – Tertiary structure
• Sequence alignment
  – Pairwise alignment
  – Multiple sequence alignment
  – Local sequence alignment
  – Global sequence alignment
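Of the primary-structure properties listed under protein sequences above (MW/pI/hydrophobicity), hydrophobicity is the simplest to compute directly from a sequence. The sketch below averages per-residue hydropathy values to give a GRAVY-style score. It assumes the Kyte-Doolittle hydropathy scale, which the slides do not name, and the function name `gravy` is hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids
# (an assumed choice of scale; the slides do not specify one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Mean hydropathy of a protein sequence given in one-letter codes.

    Positive values suggest an overall hydrophobic protein,
    negative values an overall hydrophilic one.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[residue] for residue in seq) / len(seq)
```

A stretch rich in isoleucine, valine, and leucine scores strongly positive, while one rich in arginine and lysine scores strongly negative. MW and pI follow the same per-residue pattern but need mass and pKa tables that are not reproduced here.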
2.2 Computational evolutionary biology
• Trace the evolution of a large number of organisms by measuring changes in their DNA
• Compare entire genomes, which permits the study of more complex evolutionary events
  – Gene duplication
  – Horizontal gene transfer
  – Prediction of factors important in bacterial speciation
• Build complex computational models of populations
• Reconstruct the now more complex tree of life
2.3 Literature analysis
• Employ computational and statistical linguistics to mine the growing body of published biological literature. For example:
  – Abbreviation recognition: identify the long form and the abbreviation of biological terms
  – Named entity recognition: recognize biological terms such as gene names
  – Protein-protein interaction: identify from text which proteins interact with which
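Abbreviation recognition, the first task in the list above, can be sketched with a very simple heuristic: find a parenthesized short form and check whether the words just before it spell it out by their initials. This is only an illustrative baseline under that assumption, not a method described in the slides, and the name `find_abbreviations` is hypothetical.

```python
import re


def find_abbreviations(text):
    """Return (long_form, abbreviation) pairs of the shape 'long form (ABBR)'.

    Heuristic: the abbreviation has k characters and the k words right
    before the parenthesis start with those characters, in order.
    """
    pairs = []
    # A candidate abbreviation: 2 to 10 alphanumerics, starting with a capital.
    for m in re.finditer(r"\(([A-Z][A-Za-z0-9]{1,9})\)", text):
        abbr = m.group(1)
        # Words that precede the opening parenthesis.
        words = re.findall(r"[A-Za-z][A-Za-z-]*", text[:m.start()])
        candidate = words[-len(abbr):]
        if len(candidate) == len(abbr) and all(
            word[0].upper() == letter.upper()
            for word, letter in zip(candidate, abbr)
        ):
            pairs.append((" ".join(candidate), abbr))
    return pairs
```

The heuristic finds "Human Genome Project (HGP)" but misses long forms that contain function words, such as "National Center for Biotechnology Information (NCBI)", which is one reason practical systems rely on statistical models.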