Bioinformatics
tblastn: protein sequence against translated database (find protein homologs in unannotated nucleotide sequences such as EST or draft genome sequences).
Analysis!
Definition Bioinformatics
an intersecting discipline
----is an interdisciplinary scientific field that develops methods and software tools for storing, retrieving, organizing and analyzing biological data.
Kir2.1
/
Common tools for viewing and modifying protein 3D structures
Tool
Website
Notes
SwissPdbViewer Jmol
MolMol PyMol Rasmol VMD
/spdbv/
/
NCBI BLAST: Understand the Output
Raw score (S) Calculated by summing the scores of the individual aligned positions. The score for each position is taken from a substitution matrix (BLOSUM or PAM).
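The summation described above can be sketched in a few lines of Python. This is only an illustration: the match, mismatch, and gap values are hypothetical stand-ins for a real substitution matrix such as BLOSUM or PAM, and the flat per-column gap penalty is an assumption (the slide does not say how gaps are scored).

```python
def raw_score(aligned_a: str, aligned_b: str,
              match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Sum per-column scores of two already-aligned sequences.

    match/mismatch/gap are hypothetical toy values, not BLOSUM or PAM entries.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            score += gap        # column contains a gap
        elif x == y:
            score += match      # identical residues
        else:
            score += mismatch   # substitution
    return score


# Nine matching columns and one gap column: 9 * 2 - 2
print(raw_score("ACGGTTCACG", "ACGGT-CACG"))
```

With a real protein matrix, the `match`/`mismatch` branch would become a lookup of the residue pair in the matrix.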
Biological Data
Human Genome Project, HGP ----The challenge of huge data
The storage, management and sharing of the data.
tblastx: translated sequence against translated database
bl2seq: compare two nucleotide or protein sequences.
Simple example:
• Sequence1: ACGGTTCACGTTCCA
• Sequence2: ACGGTCAC
• Transcriptomics
– Analysis of gene expression data (the key issue is finding genes that are up- or down-regulated under specific circumstances) (microRNA, lncRNA, circRNA)
• Genetics
– Dissection of genetic basis of disease and other phenotypes.
– GWAS, exome sequencing
– Linkage analysis
……
Sequence Homology, Similarity and Comparison
http://www.mol.biol.ethz.ch/wuthrich/software/molmol/
/
/software/rasmol/
/Research/vmd/
Similarity VS Homology
➢Homologous sequences are generally similar.
➢Similar sequences are not necessarily homologous.
Global Alignment versus Local Alignment
Global alignment
Bit score (S') Derived from the raw score by normalizing it with the statistical parameters of the scoring system.
$S' = (\lambda S - \ln K) / \ln 2$
where λ (lambda) is a constant associated with the scoring matrix and K is a constant that scales for the search space size.
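As a quick numeric check of the bit-score formula, the conversion can be written as a one-line function. The λ and K values below are hypothetical and chosen only for illustration; in practice they depend on the scoring matrix and gap penalties and are reported with the BLAST output.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Convert a raw score S into a bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Hypothetical parameter values, for illustration only
LAMBDA = 0.267
K = 0.041

print(round(bit_score(100, LAMBDA, K), 1))
```

Because λ is positive, a higher raw score always gives a higher bit score; the normalization only rescales and shifts the value.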
[Figure: globin gene tree. Labels: early globin gene; gene duplication; A-chain gene; B-chain gene; frog A, chick A, mouse A; mouse B, chick B, frog B; paralogs]
Similarity
---DNA sequences
---AA sequences
Before alignment:
• Sequence1: ACGGTTCACGTTTCCA
• Sequence2: ACGGTCACG
After alignment (gaps shown as "-"):
• Sequence1: ACGGTTCACGTTTCCA
• Sequence2: ACGGT--CACG
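One common way to put a number on similarity is percent identity over the columns of an alignment. The sketch below assumes the two sequences are already aligned and of equal length, and it divides by the total number of alignment columns; other conventions (for example, dividing by the shorter sequence length) give different values. The input pair is an illustrative example, not taken from the slide.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = len(aligned_a)
    if columns == 0:
        return 0.0
    # A column counts as identical only if neither side is a gap
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / columns


print(percent_identity("ACGGTTCACG", "ACGGT-CACG"))  # 90.0
```

A high percent identity indicates similarity only; whether the sequences are homologous is a separate inference.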
Example of BLAST Input
Example of BLAST Output
----basis of bioinformatics
Homology
Homologous sequences:
----Homologous sequences are orthologous if they are inferred to be descended from the same ancestral sequence separated by a speciation event.
----Divergent evolution
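Chime: a web browser plug-in for viewing PDB-format files directly in web pages.
Chimera: a free molecular modeling and display program that also offers structure alignment, drug screening, and other functions.
ICMBrowser: a 3D molecular browsing tool with sequence alignment display, released free of charge by Molsoft.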
• Interactomics
– Analysis of protein-protein interactions and molecular pathways
gene
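----Homologous sequences are paralogous if they were created by a duplication event within the genome.
----Paralogs are several homologous genes within the same genome (or the genomes of closely related species) that arose horizontally through duplication of an ancestral gene.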
Orthologs vs Paralogs
[Figure: gene tree with two groups labeled "orthologs"]
Bioinformatics
Outline
➢Background ➢Definition ➢Application ➢Resource
Background
Biological Data
Central dogma
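SwissPdbViewer: a tool with a very user-friendly interface for analyzing the structural properties of proteins and comparing active sites or mutation sites.
Jmol: a Java-based 3D viewer, mostly used as an embedded web tool for quickly browsing entries in structure databases.
MolMol: free 3D molecular viewing software for PDB files that can render attractive image files.
PyMol: an open-source-based 3D viewer with many additional plug-ins that extend its functionality.
Rasmol: a well-known 3D viewer with a simple interface; many functions are driven from the command line.
VMD: uses built-in scripting to browse and analyze 3D structures, and can also animate protein structures.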
Application
• Genomics
– Genomic sequencing and mapping (structural genomics) (HGP) (NGS)
– Genome annotation (the key issue is gene finding) (functional genomics) (ORF, non-coding RNA)
– Structure prediction (Sequence -> Structure -> Function)
– Protein network (PPI)
Motif name
Sequences
Structure prediction workflow
Protein structure databases
/pdb/home/home.do
Find the best possible alignment over the entire length of compared sequences.
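A minimal sketch of global alignment by dynamic programming (the Needleman-Wunsch approach) is shown here. The slide does not name an algorithm or scoring scheme, so the match, mismatch, and linear gap values are assumptions chosen for illustration.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for an end-to-end alignment of a and b."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = global_align("ACGGTTCACG", "ACGGTCACG")
print(score)
print(top)
print(bottom)
```

Every residue of both sequences appears in the output, which is what distinguishes a global alignment from a local one.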
Local alignment
1. Find the best aligned subsequences.
2. Used to find conserved regions (domains, motifs, etc.).
3. Used widely in sequence analyses (BLAST: Basic Local Alignment Search Tool).
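Finding the best-aligned subsequences can be sketched with the Smith-Waterman dynamic programming scheme. This is an illustration under assumed scoring values (match, mismatch, and a linear gap penalty); BLAST itself uses a faster heuristic rather than this exhaustive method.

```python
def local_align(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (score, aligned_a, aligned_b) for the best-scoring local alignment."""
    n, m = len(a), len(b)
    # h[i][j] = best score of a local alignment ending at a[i-1], b[j-1]; never below 0
    h = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(0,
                          h[i - 1][j - 1] + sub,
                          h[i - 1][j] + gap,
                          h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    # Trace back from the highest-scoring cell until the score drops to 0
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# The shared region ACGT is recovered even though the flanking sequence differs
print(local_align("TTTTACGTGGGG", "CCCACGTAAA"))
```

Resetting negative scores to zero is what lets the alignment start and end anywhere, so unrelated flanking regions do not drag down a conserved domain or motif.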
– Microarray
– RNA-Seq
• Proteomics
– Analysis of protein sequences (key issue is discovering functional motifs that are conserved across evolution, and using these motifs to functionally classify novel sequences).
– Comparisons of multiple genomes (comparative genomics)
Mus musculus vs Human (90%) Chimpanzee vs Human (99%)
Human and Mouse (1%)
Synteny
Phylogenetic Tree of Life
Orthology and Paralogy
Orthologous genes:
(1) Vertical descent
(2) Two or more genomes
(3) Function conserved
(4) Structure similarity
Ancestral
Paralogous genes:
----as an interdisciplinary field, bioinformatics combines computer science, statistics, mathematics and engineering to study and process biological data.
blastx: query translated in all six reading frames against protein database (finds proteins homologous to those encoded by the nucleotide sequence; often used when the reading frame of the query is unknown or the query contains frameshifts, as in EST sequences).
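To make "all six reading frames" concrete, the sketch below translates a nucleotide sequence in three offsets on the forward strand and three on the reverse complement, using the standard genetic code. The function names and frame labels (+1 to +3, -1 to -3) are illustrative choices, not part of BLAST, and incomplete or ambiguous codons are simply written as X.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; unknown codons become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict:
    """Map frame labels (+1..+3, -1..-3) to the translated protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for label, protein in six_frames("ATGGCCTAA").items():
    print(label, protein)
```

Each of the six translations can then be compared against a protein database, which is why a frameshift in the query only disrupts part of a hit rather than hiding it entirely.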
Global alignment
Local alignment
NCBI BLAST Programs
blastn: nucleotide sequence against nucleotide database
blastp: protein sequence against protein database
Chime
Chimera
ICMBrowser
/products/framework/chime/index.jsp
/chimera/index.html
/icm_browser.html