Bioinformatics Courseware, Lecture 6
• Advanced BLAST
– MegaBLAST is suited to alignments of about 95% identity, within the same or closely related species.
– PSI-BLAST is used to find more target sequences; it lets the user select the sequences used to build the PSSM for the next PSI-BLAST iteration.
– PHI-BLAST limits alignments to those that match a pattern in the query.
Term
motif
• A short conserved region in a DNA or protein sequence that is associated with a distinct function.
• Average length: 10-20, or even shorter.
• In proteins, "motif" refers to a highly conserved part of a domain, but a domain may or may not include motifs within its boundaries.
– a list of the nucleotide/amino acid frequencies, probability values, or weights at each position of a characteristic sequence region
• Hidden Markov Model (HMM)
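The profile representation described above (a frequency at each position) can be sketched in a few lines of Python. This is a minimal illustration only, not the method of any particular tool: the function names `build_profile` and `score` are hypothetical, the four aligned sites are toy data invented for the example, and no pseudocounts or background correction are applied.

```python
from collections import Counter

def build_profile(aligned, alphabet="ACGT"):
    """Return one {residue: frequency} dict per alignment column."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*aligned):          # iterate column by column
        counts = Counter(column)
        profile.append({ch: counts[ch] / len(aligned) for ch in alphabet})
    return profile

def score(profile, candidate):
    """Product of the per-position frequencies of the candidate's residues."""
    result = 1.0
    for position, residue in zip(profile, candidate):
        result *= position.get(residue, 0.0)
    return result

# Toy data (assumption): four gap-free aligned DNA sites of length 8.
sites = ["TGACGTCA", "TGACGTAA", "TGACGCCA", "TGATGTCA"]
profile = build_profile(sites)
print(profile[3])                  # frequencies in the fourth column
print(score(profile, "TGACGTCA"))  # higher means closer to the profile
```

A candidate containing a residue never seen in some column scores zero here, which is why practical profiles usually add pseudocounts.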
Question 1
Which parts of a DNA/protein sequence can form a “functional region”?
Motifs in DNA/RNA
Motifs/domains in protein
• Functional motif/domain
• Cellular localization
• Transmembrane region
• Post-translational modifications
• Sequences that share common motifs do not necessarily share a common ancestor (that is, they are not necessarily homologs).
Term
domain
• Also a conserved sequence region, defined as an independent functional or structural unit. The combination of domains in a single protein determines its overall function.
• Can consist of 40-700 residues; average length: 100 residues.
Previous lecture
• Tips for BLAST searches
– Evaluate the significance of your results
• View the E value, bit score, and coverage region
• Sometimes a reciprocal BLAST is needed
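The relation between the bit score and the E value can be made concrete with the standard BLAST statistic E = m × n × 2^(−S′), where S′ is the bit score and m and n are the query and database lengths. The sketch below is illustrative only: the function name `expected_hits` is hypothetical, and it uses the raw lengths passed in, whereas BLAST itself uses edge-corrected "effective" lengths, so the numbers will not match a real report exactly.

```python
def expected_hits(bit_score, query_len, db_len):
    """E value from a bit score: E = m * n * 2 ** (-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)

# Each extra bit halves the number of hits expected by chance, and
# doubling the database size doubles it.
print(expected_hits(50, 300, 1_000_000_000))
print(expected_hits(51, 300, 1_000_000_000))
```

This also shows why the same alignment gets a larger (less significant) E value when searched against a larger database.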
Generally, we call them sequence motifs or sequence domains.
Outline for this lecture
• Terms for functional regions in sequences
• Representations of functional regions (motifs/domains) in sequences
• Evaluating the correctness of a representation
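A consensus sequence in this sense is a single most likely sequence string (with or without wildcards); it is often used for highly conserved motifs.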
• Regular expression/pattern
– a string with wildcards, made up of a number of constrained choices
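As an illustration of a pattern built from wildcards and constrained choices, the sketch below translates a simple PROSITE-style pattern into a Python regular expression and scans a sequence with it. The helper names `prosite_to_regex` and `find_motif` are hypothetical, and only a subset of the PROSITE syntax is handled (single residues, `x`, `[...]`, `{...}`, and repeat counts such as `x(2)` or `x(2,4)`). The example pattern `N-{P}-[ST]-{P}` is the PROSITE N-glycosylation site pattern; the test sequence is made up.

```python
import re

def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        # Split off an optional repeat count, e.g. "x(2)" or "x(2,4)".
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":                                    # wildcard: any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):  # excluded residues
            core = "[^" + core[1:-1] + "]"
        # "[ST]" (allowed residues) and single letters pass through unchanged.
        if repeat:
            core += "{" + repeat + "}"
        parts.append(core)
    return "".join(parts)

def find_motif(pattern, sequence):
    """Return (1-based position, matched text) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]

print(prosite_to_regex("N-{P}-[ST]-{P}"))
print(find_motif("N-{P}-[ST]-{P}", "MKNGSAANPTL"))
```

The lookahead in `find_motif` is a design choice so that overlapping occurrences are all reported, which a plain `finditer` would skip.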
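• Profile
– Broad sense: refers to all of the representations of characteristic functional regions listed below. A consensus sequence in this sense is a sequence model that predicts the frequencies of sequence characters not yet observed from the frequencies of those already observed. Because the model allows partial matches in the query sequence, it is often used to find distantly related members of a sequence family, increasing the sensitivity of the analysis.
– Narrow sense: a single string with the most likely sequence (with or without wildcards)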
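The narrow-sense consensus (a single most likely string, with or without wildcards) can be derived from a gap-free alignment by taking the most frequent residue in each column. This is a minimal sketch under stated assumptions: the function name `consensus`, the `threshold` parameter, and the use of `N` as the wildcard are choices made for this example, and the input sites are toy data.

```python
from collections import Counter

def consensus(aligned, threshold=0.5, wildcard="N"):
    """Most frequent residue per column; wildcard if it is not frequent enough."""
    result = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        # Keep the residue only if its frequency exceeds the threshold.
        result.append(residue if count / len(aligned) > threshold else wildcard)
    return "".join(result)

# Toy data (assumption): four aligned DNA sites of length 8.
sites = ["TGACGTCA", "TGACGTAA", "TGACGCCA", "TGATGTCA"]
print(consensus(sites))                 # simple majority rule
print(consensus(sites, threshold=0.8))  # stricter: variable columns become N
```

Raising the threshold trades specificity for coverage: more columns become wildcards, so the string matches more sequences.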
www.stke.org/cgi/content/full/OC_sigtrans; 2001/113/re22
Question 2
How can these “functional regions” (motifs/domains) in sequences be represented?
Protein family
• A protein family is a group of evolutionarily related proteins, and is often nearly synonymous with gene family.
• Proteins in a family descend from a common ancestor and typically have similar three-dimensional structures, functions, and significant sequence similarity.
• Proteins that do not share a common ancestor are very unlikely to show statistically significant sequence similarity, making sequence alignment a powerful tool for identifying the members of protein families.
• Main databases that store motifs/domains and their representations
• How to detect/summarize motifs/domains in sequences
Four representations
• Consensus sequence
1. Consensus sequence
CREB transcription factor binding site (DNA)
• Main software for MSA
– ClustalW/X, T-Coffee, MUSCLE…
Lecture 6: Discovery, Summarization, and Prediction of Characteristic Sequences
凌毅 bioinfo_cau@yahoo.com.cn
Pre-Question
• What is a protein family?
• How are protein families detected?
• Let us find the answer in Wikipedia, the free encyclopedia, using a Google search.
Previous lecture II
• Multiple Sequence Alignment
– Definition: three or more sequences are partially or completely aligned. Residues in the same column are taken to be homologous, in both an evolutionary and a structural sense.
– Properties: protein sequences are usually used for MSA; the result is not necessarily a “correct” alignment of the protein family.
– Features: conserved positions, which must be important for maintaining structure/function; conserved regions such as hydrophobicity/hydrophilicity motifs or helix/sheet motifs; consistent patterns of insertions or deletions (indels).
– Uses of MSA: more sensitive in detecting homologs than pairwise alignment; can find functionally/structurally conserved residues or regions in the sequences; …
– Methods for MSA
• Exhaustive or heuristic algorithms: dynamic programming, progressive, iterative, consistency-based, structure-based
• Three stages of constructing an MSA with progressive methods: global pairwise alignment, building the guide tree, and progressively aligning the sequences in the order given by the guide tree
Functional regions in sequences
In this lecture, “functional region” is defined broadly: it refers to subsequences of larger sequences that share some common functionality.
– Use the Request ID to recall a recent BLAST result within 24 h
– View different reports of the BLAST result, such as the taxonomy report, search summary, etc.
– Adjust the BLAST strategy if the results are more/less…