Yeast Protein Interactome Topology Provides Framework for Coordinated-Functionality


Protein-Protein Interactions
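Fluorescence resonance energy transfer (FRET)
Principle: to study protein-protein interactions with FRET, the bait and prey proteins are fused to a donor fluorophore (e.g. ECFP) and an acceptor fluorophore (e.g. EYFP), respectively.
Advantages: detects transient and weak protein-protein interactions; reveals the cellular distribution and the interaction site of the two proteins at the same time.
Disadvantages: the spectra may overlap, which can distort the results.
Bioinformatics methods for studying protein-protein interactions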
Interaction Domain: Structural basis for protein interaction
3. DD domain
4. SH domain
5. PH domain
6. EH domain
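Experimental techniques for protein-protein interactions
Chapter 2
Methods for studying protein-protein interactions
Yeast two-hybrid system (Y2H)
Tandem affinity purification (TAP)
Principle: the conventional TAP tag consists of Protein A, a TEV protease cleavage site, and a calmodulin-binding peptide (CBP). TAP uses two successive affinity purification steps to reduce nonspecific protein binding.
Protein-protein interactions depend on the specific domains that the proteins carry. A typical interaction domain is an independently folding module with its own binding specificity; it can be inserted into a new protein and still bind its target site. Most of these interactions are mediated by the geometric fit between two polypeptide surfaces and by electrostatic forces.
PDZ domain, LIM domain, DD domain, SH domain, PH domain, EH domain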
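Bioinformatics methods
Protein-protein interaction databases
Bioinformatics methods can mine known databases to analyze and compare the function of an uncharacterized protein and the proteins that interact with it.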

Protein Interactions

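Synthetic Genetic Interactions in Yeast
Tong, Boone
Protein interaction networks and protein function prediction
► The study of protein function will be one of the central topics of the post-genomic era. With the rapid development of bioinformatics and the surge in gene expression profiles and protein-protein interaction data, computational prediction and annotation of protein function has become an increasingly effective approach. The most widely used prediction methods currently rely on homologous sequences, genome comparison, phylogenetic profiles, gene expression data, and protein interaction networks. Because network-based prediction can integrate several kinds of data and predict protein function accurately at a global level, it has become a focus of protein function analysis and prediction.
Direct annotation methods
► Direct annotation methods rest on the observation that, in a protein interaction network, two proteins that are close to each other are more likely to share similar functions. There are many ways to use the distance between two proteins in the network to compute and judge their functional similarity:
► Neighborhood counting
► Graph-theoretic methods
► Markov random field methods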
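As an illustration of neighborhood counting, the sketch below ranks candidate functions for an unannotated protein by how often each function appears among its direct interaction partners. It is a minimal sketch under stated assumptions: the function name `predict_function`, the toy network, and the protein and function labels are all hypothetical, and published variants of the method usually add statistical weighting that is omitted here.

```python
from collections import Counter


def predict_function(network, annotations, protein, top_k=1):
    """Rank candidate functions for `protein` by the number of its direct
    neighbours that carry each function (plain neighbourhood counting)."""
    counts = Counter()
    for neighbour in network.get(protein, ()):
        for function in annotations.get(neighbour, ()):
            counts[function] += 1
    # Highest count first; ties are broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Toy interaction network (hypothetical proteins), stored as adjacency sets.
network = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1"},
    "P3": {"P1"},
    "P4": {"P1"},
}
# Known annotations of the neighbours (hypothetical labels).
annotations = {
    "P2": {"DNA repair"},
    "P3": {"DNA repair", "transcription"},
    "P4": {"transport"},
}

result = predict_function(network, annotations, "P1", top_k=2)
print(result)
```

Here two of the three neighbours of P1 are annotated with "DNA repair", so that label ranks first.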
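[Figure: Information Scope. Fields feeding the database (DB): Evolutionary Biology, Biophysics, Genetics, Biochemistry, Clinical Studies, Molecular Biology, Chemistry, Epidemiology, Proteomics, Population Biology]
► ... (DIP), established in 1999 by David Eisenberg's laboratory at UCLA, aims to be a repository of protein-protein interactions that integrates the diverse experimental information on protein interactions into a single, easily searchable database.
► DIP focuses on protein interaction partners, but it now also includes some large protein complexes. Researchers can obtain the data free of charge and search for the interaction partners of a particular protein.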

Protein-Protein Interactions (He Junqi)
✓ In 1990, several groups independently discovered that SH2 domains bind to motifs containing a phosphorylated tyrosine
✓ The general motif bound by SH2 domains: pTyr-x-x-hydrophobic
• The most studied interaction domains in signaling proteins include the SH2, SH3, PTB, WW, EVH1, FHA, PDZ, and PH domains.
• In transcription factors, they include the HLH, leucine zipper, and ankyrin repeat.
Cell surface receptors signal through modular protein interactions
Cell, 2004, 116:191.
SRC
✓ The "Revolution of '76": Src identified as the first proto-oncogene
✓ Cloning of related mammalian genes revealed highly conserved regions of the Src protein:
Quaternary structure
Hemoglobin
α1-yellow; β1-light blue; α2-green; β2-dark blue; heme-red
Some protein-protein interactions are very strong, forming

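Methods for Studying Protein-Protein Interactions
Wang Xinyu
College of Life Sciences
1 Why study protein-protein interactions
As research on living systems shifts from acquiring gene sequence information to studying gene function, a new discipline, proteomics, has emerged. The proteome is a whole that changes dynamically in space and time, and its functions are usually expressed through interactions among proteins or between proteins and nucleic acids. These interactions occur in the life processes of every cell of an organism and intersect to form networks that underlie a series of important physiological activities in the cell. The study of protein-protein interactions (the interactome) has therefore become one of the main topics in proteomics.
3 Yeast two-hybrid technique
3.1 Principle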
DNA-binding and activating functions in a transcription factor may comprise independent domains of the protein.
The two-hybrid technique tests the ability of two proteins to interact by incorporating them into hybrid proteins, where one has a DNA-binding domain and the other has a transcription-activating domain.
Positive clones confirmed on SD/-Ade/-His/-Leu/-Trp/X Information
3.3 Methods for studying membrane protein interactions
--- The ubiquitin-based yeast two-hybrid system (mating-based split-ubiquitin system, "mbSUS")

Protein-Protein Interactions

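Fluorescence resonance energy transfer (FRET)
Principle: to study protein-protein interactions with FRET, the bait and prey proteins are fused to a donor fluorophore (e.g. ECFP) and an acceptor fluorophore (e.g. EYFP), respectively.
Advantages: detects transient and weak protein-protein interactions; reveals the cellular distribution and the interaction site of the two proteins at the same time.
Disadvantages: the spectra may overlap, which can distort the results.
Interaction Domain: Structural basis for protein interaction
1. PDZ domain
2. LIM domain
Co-immunoprecipitation (Co-IP)
Principle: relies on the specific recognition and binding between an antibody and its antigen; affinity purification then isolates the antigen protein together with the monoclonal antibody.
Advantages: preserves the modification and binding state of the proteins, and requires relatively little sample.
Disadvantages: Co-IP has relatively low specificity.
Bimolecular fluorescence complementation (BiFC)
Principle: a fluorescent protein is split into two fragments that are non-functional on their own, and the fragments are expressed as fusions with the bait and prey proteins, respectively.
Advantages: protein-protein interactions can be identified simply and conveniently by observing fluorescence.
Disadvantages: cannot report protein association and dissociation in real time.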
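... within the cutoff. If at least five atoms of two domains lie within 5 Å of each other, the two domains are considered to interact.
Full Atom Contact (FAC) PSIMAP method; Sampled Atom Contact (SAC) PSIMAP
Most accurate
Saves time and search space
Bounding Box Contact (BBC) PSIMAP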
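The contact rule above (at least five atoms within 5 Å) can be sketched as a full-atom check. This is an illustrative sketch only, not the PSIMAP implementation: it assumes that the rule counts atom pairs, that "within" means a distance less than or equal to the cutoff, and that each domain is given as a list of (x, y, z) coordinates in Å. The function names and coordinates are hypothetical.

```python
def count_close_pairs(atoms_a, atoms_b, cutoff=5.0):
    """Count atom pairs (one atom from each domain) whose distance is at
    most `cutoff` angstroms. Compares squared distances to avoid sqrt."""
    cutoff_sq = cutoff * cutoff
    close = 0
    for a in atoms_a:
        for b in atoms_b:
            dist_sq = sum((p - q) ** 2 for p, q in zip(a, b))
            if dist_sq <= cutoff_sq:
                close += 1
    return close


def domains_interact(atoms_a, atoms_b, cutoff=5.0, min_contacts=5):
    """Full-atom contact test: True if enough atom pairs are in contact."""
    return count_close_pairs(atoms_a, atoms_b, cutoff) >= min_contacts


# Hypothetical coordinates in angstroms.
domain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
domain_b = [(0.0, 3.0, 0.0), (1.0, 3.0, 0.0), (50.0, 50.0, 50.0)]
domain_c = [(100.0, 0.0, 0.0)]

print(count_close_pairs(domain_a, domain_b))  # close pairs between A and B
print(domains_interact(domain_a, domain_b))
print(domains_interact(domain_a, domain_c))
```

Checking every atom pair is the most accurate but also the slowest option. The sampled-atom and bounding-box variants named above trade accuracy for a smaller search space.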

Global landscape of protein complexes in the yeast Saccharomyces cerevisiae


Nevan J. Krogan1,2*†, Gerard Cagney1,3*, Haiyuan Yu4, Gouqing Zhong1, Xinghua Guo1, Alexandr Ignatchenko1, Joyce Li1, Shuye Pu5, Nira Datta1, Aaron P. Tikuisis1, Thanuja Punna1, José M. Peregrín-Alvarez5, Michael Shales1, Xin Zhang1, Michael Davey1, Mark D. Robinson1, Alberto Paccanaro4, James E. Bray1, Anthony Sheung1, Bryan Beattie6, Dawn P. Richards6, Veronica Canadien6, Atanas Lalev1, Frank Mena6, Peter Wong1, Andrei Starostine1, Myra M. Canete1, James Vlasblom5, Samuel Wu5, Chris Orsi5, Sean R. Collins7, Shamanta Chandran1, Robin Haw1, Jennifer J. Rilstone1, Kiran Gandi1, Natalie J. Thompson1, Gabe Musso1, Peter St Onge1, Shaun Ghanny1, Mandy m1,2, Gareth Butland1, Amin M. Altaf-Ul8, Shigehiko Kanaya8, Ali Shilatifard9, Erin O’Shea10, Jonathan S. Weissman7, C. James Ingles1,2, Timothy R. Hughes1,2, John Parkinson5, Mark Gerstein4, Shoshana J. Wodak5, Andrew Emili1,2 & Jack F. Greenblatt1,2

Identification of protein–protein interactions often provides insight into protein function, and many cellular processes are performed by stable protein complexes. We used tandem affinity purification to process 4,562 different tagged proteins of the yeast Saccharomyces cerevisiae. Each preparation was analysed by both matrix-assisted laser desorption/ionization–time of flight mass spectrometry and liquid chromatography tandem mass spectrometry to increase coverage and accuracy. Machine learning was used to integrate the mass spectrometry scores and assign probabilities to the protein–protein interactions. Among 4,087 different proteins identified with high confidence by mass spectrometry from 2,357 successful purifications, our core data set (median precision of 0.69) comprises 7,123 protein–protein interactions involving 2,708 proteins. A Markov clustering algorithm organized these interactions into 547 protein complexes averaging 4.9 subunits per complex, about half of them absent from the MIPS database, as well as 429 additional interactions between pairs of complexes. The data (all of
which are available online) will help future studies on individual proteins as well as functional genomics and systems biology.

Elucidation of the budding yeast genome sequence1 initiated a decade of landmark studies addressing key aspects of yeast cell biology on a system-wide level. These included microarray-based analysis of gene expression2, screens for various biochemical activities3,4, identification of protein subcellular locations5,6, and identifying effects of single and pairwise gene disruptions7–10. Other efforts were made to catalogue physical interactions among yeast proteins, primarily using the yeast two-hybrid method11,12 and direct purification via affinity tags13,14; many of these interactions are conserved in other organisms15. Data from the yeast protein–protein interaction studies have been non-overlapping to a surprising degree, a fact explained partly by experimental inaccuracy and partly by indications that no single screen has been comprehensive16.

Proteome-wide purification of protein complexes

Of the various high throughput experimental methods used thus far to identify protein–protein interactions11–14, tandem affinity purification (TAP) of affinity-tagged proteins expressed from their natural chromosomal locations followed by mass spectrometry13,17 has provided the best coverage and accuracy16. To map more completely the yeast protein interaction network (interactome), S. cerevisiae strains were generated with in-frame insertions of TAP tags individually introduced by homologous recombination at the 3′ end of each predicted open reading frame (ORF) (http:// /)18,19. Proteins were purified from 4 L yeast cultures under native conditions, and the identities of the co-purifying proteins (preys) determined in two complementary ways17. Each purified protein preparation was electrophoresed on an SDS polyacrylamide gel, stained with silver, and visible bands removed and identified by trypsin digestion and peptide mass fingerprinting using matrix-assisted laser
desorption/ionization–time of flight (MALDI–TOF) mass spectrometry. In parallel, another aliquot of each purified protein preparation was digested in solution and the peptides were separated and sequenced by data-dependent liquid chromatography tandem mass spectrometry (LC-MS/MS)17,20–22.

ARTICLES

1Banting and Best Department of Medical Research, Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College St, Toronto, Ontario M5S 3E1, Canada. 2Department of Medical Genetics and Microbiology, University of Toronto, 1 Kings College Circle, Toronto, Ontario M5S 1A8, Canada. 3Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland. 4Department of Molecular Biophysics and Biochemistry, 266 Whitney Avenue, Yale University, PO Box 208114, New Haven, Connecticut 06520, USA. 5Hospital for Sick Children, 555 University Avenue, Toronto, Ontario M4K 1X8, Canada. 6Affinium Pharmaceuticals, 100 University Avenue, Toronto, Ontario M5J 1V6, Canada. 7Howard Hughes Medical Institute, Department of Cellular and Molecular Pharmacology, UCSF, Genentech Hall S472C, 600 16th St, San Francisco, California 94143, USA. 8Comparative Genomics Laboratory, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0101, Japan. 9Department of Biochemistry, Saint Louis University School of Medicine, 1402 South Grand Boulevard, St Louis, Missouri 63104, USA. 10Howard Hughes Medical Institute, Department of Molecular and Cellular Biology, Harvard University, 7 Divinity Avenue, Cambridge, Massachusetts 02138, USA. †Present address: Department of Cellular and Molecular Pharmacology, UCSF, San Francisco, California 94143, USA.
*These authors contributed equally to this work.

Because either mass spectrometry method often fails to identify a protein, we used two independent mass spectrometry methods to increase interactome coverage and confidence. Among the attempted purifications of 4,562 different proteins (Supplementary Table S1), including all predicted non-membrane proteins, 2,357 purifications were successful (Supplementary
Table S2) in that at least one protein was identified (in 1,613 cases by MALDI–TOF mass spectrometry and in 2,001 cases by LC-MS/MS; Fig. 1a) that was not present in a control preparation from an untagged strain. In total, 4,087 different yeast proteins were identified as preys with high confidence (≥99%; see Methods) by MALDI–TOF mass spectrometry and/or LC-MS/MS, corresponding to 72% of the predicted yeast proteome (Supplementary Table S3). Smaller proteins with a relative molecular mass (Mr) of <35,000 were less likely to be identified (Fig. 1b), perhaps because they generate fewer peptides suited for identification by mass spectrometry. We were more successful in identifying smaller proteins by LC-MS/MS than by MALDI–TOF mass spectrometry, probably because smaller proteins stain less well with silver or ran off the SDS gels. Our success in protein identification was unrelated to protein essentiality (data not shown) and ranged from 80% for low abundance proteins to over 90% for high abundance proteins (Fig. 1c). Notably, we identified 47% of the proteins not detected by genome-wide western blotting18, indicating that affinity purification followed by mass spectrometry can be more sensitive. Many hypothetical proteins not detected by western blotting18 or our mass spectrometry analyses may not be expressed in our standard cell growth conditions. Although our success rates for identifying proteins were 94% and 89% for nuclear and cytosolic proteins, respectively, and at least 70% in most cellular compartments (Fig. 1d), they were lower (61% and 59%, respectively) for the endoplasmic reticulum and vacuole. However, even though we had not tagged or purified most proteins with transmembrane domains, we identified over 70% of the membrane-associated proteins, perhaps because our extraction and purification buffers contained 0.1% Triton X-100. Our identification success rate was lowest (49%) with proteins for which localization was not established5,6, many of which may not be expressed. We had high success in identifying proteins
involved in all biological processes, as defined by gene ontology (GO) nomenclature, or possessing any broadly defined GO molecular function (Fig. 1e, f). We were less successful (each about 65% success) with transporters and proteins of unknown function; many of the latter may not be expressed.

A high-quality data set of protein–protein interactions

Deciding whether any two proteins interact based on our data must encompass results from two purifications (plus repeat purifications, if performed) and integrate reliability scores from all protein identifications by mass spectrometry. Removed from consideration as likely nonspecific contaminants were 44 preys detected in ≥3% of the purifications and nearly all cytoplasmic ribosomal subunits (Supplementary Table S4). Although the cytosolic ribosomes and pre-ribosomes, as well as some associated translation factors, are not represented in the interaction network and protein complexes we subsequently identified, we previously described the interactome for proteins involved in RNA metabolism and ribosome biogenesis22. We initially generated an ‘intersection data set’ of 2,357 protein–protein interactions based only on proteins identified in at least one purification by both MALDI–TOF mass spectrometry and LC-MS/MS with relatively low thresholds (70%) (Supplementary Table S5). This intersection data set containing 1,210 proteins was of reasonable quality but limited in scope (Fig. 2b). Our second approach added to the intersection data set proteins identified either reciprocally or repeatedly by only a single mass spectrometry method to generate the ‘merged data set’. The merged data set containing 2,186 proteins and 5,496 protein–protein interactions (Supplementary Table S6) had better coverage than the intersection network (Fig. 2b).

Figure 1 | The yeast interactome encompasses a large proportion of the predicted proteome. a, Summary of our screen for protein interactions. PPI, protein–protein interactions. b–f, The proportions of proteins identified in the screen as baits or preys are shown in relation to protein mass (b), expression level (c), intracellular localization (d) and annotated GO molecular function (e) and GO biological process (f).

ARTICLES NATURE | Vol 440 | 30 March 2006

To deal objectively with noise in the raw data and improve precision and recall, we used machine learning algorithms with two rounds of learning. All four classifiers were validated by the hold-out method (66% for training and 33% for testing) and ten-times tenfold cross-validation, which gave similar results. Because our objective was to identify protein complexes, we used the hand-curated protein complexes in the MIPS reference database23 as our training set. Our goal was to assign a probability that each pairwise interaction is true based on experimental reproducibility and mass spectrometry scores from the relevant purifications (see Methods). In the first round of learning, we tested bayesian inference networks and 28 different kinds of decision trees24, settling on bayesian networks and C4.5-based and boosted stump decision trees as providing the most reliable predictions (Fig. 2a). We then improved performance by using the output of the three methods as input for a second round of learning with a stacking algorithm in which logistic regression was the learner25. We used a probability cut-off of 0.273 (average 0.68; median 0.69) to define a ‘core’ data set of 7,123 protein–protein interactions involving 2,708 proteins (Supplementary Table S7) and a cut-off of 0.101 (average 0.42; median 0.27) for an ‘extended’ data set of 14,317 protein–protein interactions involving 3,672 proteins (Supplementary Table S8). The interaction probabilities in Supplementary Tables S7 and S8 are likely to be underestimated because the MIPS
complexes used as a ‘gold standard’ are themselves imperfect26. We subsequently used the core protein–protein interaction data set to define protein complexes (see below), but the extended data set probably contains at least 1,000 correct interactions (as well as many more false interactions) not present in the core data set. The complete set of protein–protein interactions and their associated probabilities (Supplementary Table S9) were used to generate a ROC curve with a performance (area under the curve) of 0.95 (Fig. 2b). Predictive sensitivity (true positive rate) or specificity (false positive rate), or both, are superior for our learned data set than for the intersection and merged data sets, each previous high-throughput study of yeast protein–protein interactions11–14, or a bayesian combination of the data from all these studies27 (Fig. 2b).

Identification of complexes within the interaction network

In the protein interaction network generated by our core data set of 7,123 protein–protein interactions, the average degree (number of interactions per protein) is 5.26 and the distribution of the number of interactions per protein follows an inverse power law (Fig. 2c), indicating scale-free network topology28. These protein–protein interactions could be represented as a weighted graph (not shown) in which individual proteins are nodes and the weight of the arc connecting two nodes is the probability that interaction is correct. Because the 2,357 successful purifications underlying such a graph would represent >50% of the detectably expressed proteome18, we have typically purified multiple subunits of a given complex. To identify highly connected modules within the global protein–protein interaction network, we used the Markov cluster algorithm, which simulates random walks within graphs29. We chose values for the expansion and inflation operators of the Markov cluster procedure that optimized overlap with the hand-curated MIPS complexes23.
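Two points from this passage can be made concrete. Each interaction contributes to the degree of two proteins, so the average degree is 2 × 7,123 / 2,708 ≈ 5.26, as stated. The Markov cluster procedure alternates expansion (taking a power of the column-stochastic matrix, which simulates random walks) with inflation (raising entries to a power and renormalizing, which strengthens strong flows). The sketch below is a minimal pure-Python illustration, not the authors' implementation: the self-loop weight, the inflation value of 2, the fixed iteration count, and the toy edge weights are all assumptions.

```python
def normalize_columns(matrix):
    """Scale every column to sum to 1 (column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(matrix):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(matrix)
    return [[sum(matrix[i][k] * matrix[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]


def inflate(matrix, power):
    """Inflation step: raise entries to `power`, then renormalize columns."""
    return normalize_columns([[x ** power for x in row] for row in matrix])


def markov_cluster(edges, n, inflation=2.0, iterations=50):
    """Cluster an undirected weighted graph with nodes 0..n-1."""
    matrix = [[0.0] * n for _ in range(n)]
    for i, j, weight in edges:
        matrix[i][j] = matrix[j][i] = weight
    for i in range(n):
        matrix[i][i] = 1.0  # self-loops (assumed weight 1.0)
    matrix = normalize_columns(matrix)
    for _ in range(iterations):
        matrix = inflate(expand(matrix), inflation)
    # Each row that still carries flow lists the members of one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if matrix[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Average degree of the core data set: every edge touches two proteins.
average_degree = 2 * 7123 / 2708

# Toy graph (hypothetical weights): two triangles joined by one weak edge.
edges = [(0, 1, 0.9), (0, 2, 0.9), (1, 2, 0.9),
         (3, 4, 0.9), (3, 5, 0.9), (4, 5, 0.9),
         (2, 3, 0.1)]
clusters = markov_cluster(edges, n=6)
print(round(average_degree, 2), clusters)
```

In this toy graph the weak bridge carries little flow, so the two triangles end up as separate modules. Using the interaction probability as the edge weight mirrors the weighted graph described in the text.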
Although the Markov cluster algorithm displays good convergence and robustness, it does not necessarily separate two or more complexes that have shared subunits (for example, RNA polymerases I and III, or chromatin modifying complexes Rpd3C(S) and Rpd3C(L))30,31. The Markov cluster procedure identified 547 distinct (non-overlapping) heteromeric protein complexes (Supplementary Table S10), about half of which are not present in MIPS or two previous high-throughput studies of yeast complexes using affinity purification and mass spectrometry (Fig. 3a). New subunits or interacting proteins were identified for most complexes that had been identified previously (Fig. 3a). Overlap of our Markov-cluster-computed complexes with the MIPS complexes was evaluated (see Supplementary Information) by calculating the total precision (measure of the extent to which proteins belonging to one reference MIPS complex are grouped within one of our complexes, and vice versa) and homogeneity (measure of the extent to which proteins from the same MIPS complex are distributed across our complexes, and vice versa) (Fig. 3b). Both precision and homogeneity were higher for the complexes generated in this study (even for the extended set of protein–protein interactions) than for complexes generated by both previous high-throughput studies of yeast complexes, perhaps because the increased number of successful purifications in this study increased the density of connections within most modules. The average number of different proteins per complex is 4.9, but the distribution (Fig. 3c), which follows an inverse power law, is characterized by a large number of small complexes, most often containing only two to four different polypeptides, and a much smaller number of very large complexes.

Figure 2 | Machine learning generates a core data set of protein–protein interactions. a, Reliability of observed protein–protein interactions was estimated using probabilistic mass spectra database search scores and measures of experimental reproducibility (see Methods), followed by machine learning. b, Precision-sensitivity ROC plot for our protein–protein interaction data set generated by machine learning. Precision/sensitivity values are also shown for the ‘intersection’ and ‘merged’ data sets (see text) and for other large-scale affinity tagging13,14 and two-hybrid11,12 data sets, and a bayesian networks combination of those data sets27, all based on comparison to MIPS complexes. FP, false positive; TP, true positive. c, Plot of the number of nodes against the number of edges per node demonstrates that the core data set protein–protein interaction network has scale-free properties.

Figure 3 | Organization of the yeast protein–protein interaction network into protein complexes. a, Pie charts showing how many of our 547 complexes have the indicated percentages of their subunits appearing in individual MIPS complexes or complexes identified by other affinity-based purification studies13,14. b, Precision and homogeneity (see text) in comparison to MIPS complexes for three large-scale studies. c, The relationship between complex size (number of different subunits) and frequency. d, Graphical representation of the complexes. This Cytoscape/GenePro screenshot displays patterns of evolutionary conservation of complex subunits. Each pie chart represents an individual complex, its relative size indicating the number of proteins in the complex. The thicknesses of the 429 edges connecting complexes are proportional to the number of protein–protein interactions between connected nodes. Complexes lacking connections shown at the bottom of this figure have <2 interactions with any other complex. Sector colours (see panel f) indicate the proportion of subunits sharing significant sequence similarity to various taxonomic groups (see Methods). Insets provide views of two selected complexes (the kinetochore machinery and a previously uncharacterized, highly conserved fructose-1,6-bisphosphatase-degrading complex; see text for details), detailing specific interactions between proteins identified within the complex (purple borders) and with other proteins that interact with at least one member of the complex (blue borders). Colours indicate taxonomic similarity. e, Relationship between protein frequency in the core data set and degree of connectivity or betweenness as a function of conservation. Colours of the bars indicate the evolutionary grouping. f, Colour key indicating the taxonomic groupings (and their phylogenetic relationships). Numbers indicate the total number of ORFs sharing significant sequence similarity with a gene in at least one organism associated with that group and, importantly, not possessing similarity to any gene from more distantly related organisms.

Proteins in the same complex should have similar function and co-localize to the same subcellular compartment. To evaluate this, we calculated the weighted average of the fraction of proteins in each complex that maps to the same localization categories5 (see Supplementary Information). Co-localization was better for the complexes in our study than for previous high-throughput studies but, not unexpectedly, less than that for the curated MIPS complexes (Supplementary Fig. S1). We also evaluated the extent of semantic similarity32 for the GO terms in the ‘biological process’ category for pairs of interacting proteins within our complexes (Supplementary Fig. S2), and found that semantic similarity was lower for our core data set than for the MIPS complexes or the previous study using TAP tags13, but higher than for a study using protein overproduction14. This might be expected if the previous TAP tag study
significantly influenced the semantic classifications in GO.

To analyse and visualize our entire collection of complexes, the highly connected modules identified by Markov clustering for the global core protein–protein interaction network were displayed (b.sickkids.ca) using our GenePro plug-in for the Cytoscape software environment33 (Fig. 3d). Each complex is represented as a pie-chart node, and the complexes are connected by a limited number (429) of high-confidence interactions. Assignment of connecting proteins to a particular module can therefore be arbitrary, and the limited number of connecting proteins could just as well be part of two or more distinct complexes. The size and colour of each section of a pie-chart node can be made to represent the fraction of the proteins in each complex that maps into a given complex from the hand-curated MIPS complexes (Supplementary Fig. S3). Similar displays can be generated when highlighting instead the subcellular localizations or GO biological process functional annotations of proteins in each complex. Furthermore, the protein–protein interaction details of individual complexes can readily be visualized (see Supplementary Information).
Evolutionary conservation of protein complexes

ORFs encoding each protein were placed into nine distinct evolutionary groups (Fig. 3f) based on their taxonomic profiles (see Methods), and the complexes displayed so as to show the evolutionary conservation of their components (Fig. 3d). Insets highlight the kinetochore complex required for chromosome segregation and a novel, highly conserved complex involved in degradation of fructose-1,6-bisphosphatase. Strong co-evolution was evident for components of some large and essential complexes (for example, 19S and 20S proteasomes involved in protein degradation, the exosome involved in RNA metabolism, and the ARP2/3 complex required for the motility and integrity of cortical actin patches). Conversely, the kinetochore complex, the mediator complex required for regulated transcription, and the RSC complex that remodels chromatin have a high proportion of fungi-specific subunits.

Figure 4 | Characterization of three previously unreported protein complexes and Iwr1, a novel RNAPII-interacting factor. a, Identification of three novel complexes by SDS–PAGE, silver staining and mass spectrometry. The same novel complex containing Vid30 was obtained after purification from strains with other tagged subunits (data not shown). b, Identification of Iwr1 (interacts with RNAPII). Tagging and purification of unique RNAPII subunits identified YDL115C (Iwr1) as a novel RNAPII-associated factor (Supplementary Fig. S5a). Purification of Iwr1 is shown here. c, Genetic interactions of Iwr1 with various transcription factors. Lines connect genes with synthetic lethal/sick genetic interactions. d, Microarray analysis on the indicated deletion strains. Pearson correlation coefficients were calculated for the effects on gene expression of each deletion pair and organized by two-dimensional hierarchical clustering. e, Antibody generated against the amino-terminal amino acid sequence (DDDDDDDSFASADGE) of the Drosophila homologue of Iwr1 (CG10528) and a monoclonal antibody (H5) against RNAPII subunit Rpb1 phosphorylated on S5 of the heptapeptide repeat of its carboxy-terminal domain48 were used for co-localization studies on polytene chromosomes as previously described47.

Previous studies have shown that highly connected proteins within a network tend to be more highly conserved17,34, a consequence of either functional constraints or preferential interaction of new proteins with existing highly connected proteins28. For the network as a whole, and consistent with earlier studies, Fig. 3e reveals that the frequency of ORFs with a large number (>10) of connections is proportional to the relative distance of the evolutionary group. ‘Betweenness’ provides a measure of how ‘central’ a protein is in a network, typically calculated as the fraction of shortest paths between node pairs passing through a node of interest. Figure 3e shows that highly conserved proteins tend to have higher values of betweenness.
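The betweenness measure described here can be computed with Brandes' algorithm, which runs one breadth-first search per source node and accumulates shortest-path fractions. The sketch below is an illustration for an unweighted, undirected graph, not the procedure used in the study. The normalization by the number of node pairs excluding the node of interest is one common convention and is an assumption, as are the function name and the toy graph.

```python
from collections import deque


def betweenness(graph):
    """Normalized betweenness for an unweighted, undirected graph given as
    {node: [neighbours]}. Uses Brandes' accumulation over BFS trees."""
    scores = {node: 0.0 for node in graph}
    for source in graph:
        order = []                               # nodes by increasing distance
        predecessors = {node: [] for node in graph}
        paths = dict.fromkeys(graph, 0)          # shortest-path counts
        paths[source] = 1
        distance = dict.fromkeys(graph, -1)
        distance[source] = 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbour in graph[node]:
                if distance[neighbour] < 0:      # first visit
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[node] + 1:
                    paths[neighbour] += paths[node]
                    predecessors[neighbour].append(node)
        dependency = dict.fromkeys(graph, 0.0)
        for node in reversed(order):             # farthest nodes first
            for pred in predecessors[node]:
                share = paths[pred] / paths[node]
                dependency[pred] += share * (1 + dependency[node])
            if node != source:
                scores[node] += dependency[node]
    n = len(graph)
    pairs = (n - 1) * (n - 2) / 2                # node pairs excluding the node
    # Each undirected pair was counted from both endpoints, hence the / 2.
    return {node: (score / 2) / pairs if pairs else 0.0
            for node, score in scores.items()}


# Toy network (hypothetical): one hub protein bridging three partners.
graph = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
}
centrality = betweenness(graph)
print(centrality)
```

Every shortest path between two of the partners passes through the hub, so the hub receives the maximum value and the peripheral proteins receive zero.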
Despite these average network properties, the subunits of some complexes (for example, the kinetochore complex) display a high degree of connectedness despite restriction to hemiascomycetes. These findings suggest caution in extrapolating network properties to the properties of individual complexes. We also investigated the relationship between an ORF’s essentiality and its conservation, degree of connectivity and betweenness (Supplementary Fig. S4). Consistent with previous studies17,35, essential genes tend to be more highly conserved, highly connected and central to the network (as defined by betweenness), presumably reflecting their integrating role.

Examples of new protein complexes and interactions

Among the 275 complexes not in MIPS that we identified, three are shown in Fig. 4a. One contains Tbf1, Vid22 and YGR071C. Tbf1 binds subtelomeric TTAGGG repeats and insulates adjacent genes from telomeric silencing36,37, suggesting that this trimeric complex might be involved in this process. Consistent with this, a hypomorphic DAmP allele10 (3′ untranslated region (UTR) deletion) of the essential TBF1 gene causes a synthetic growth defect when combined with a deletion of VID22 (data not shown), suggesting that Tbf1 and Vid22 have a common function. Vid22 and YGR071C are the only yeast proteins containing BED Zinc-finger domains, thought to mediate DNA binding or protein–protein interactions38, suggesting that each uses its BED domain to interact with Tbf1 or enhance DNA binding by Tbf1. Another novel complex in Fig. 4a contains Vid30 and six other subunits (also see Fig. 3d inset). Five of its subunits (Vid30, Vid28, Vid24, Fyv10, YMR135C) have been genetically linked to proteasome-dependent, catabolite-induced degradation of fructose-1,6-bisphosphatase39, suggesting that the remaining two subunits (YDL176W, YDR255C), hypothetical proteins of hitherto unknown function, are probably involved in the same process. Vid24 was reported to be in a complex with a Mr of approximately 600,000 (ref. 39), similar to the sum of the
apparent Mr values of the subunits of the Vid30-containing complex. The third novel complex contains Rtt109 and Vps75. Because Vps75 is related to nucleosome assembly protein Nap1, and Rtt109 is involved in Ty transposition40, this complex may be involved in chromatin assembly or function.

Our systematic characterization of complexes by TAP and mass spectrometry has often led to the identification of new components of established protein complexes (Fig. 3a)41–43. Figure 4 highlights Iwr1 (YDL115C), which co-purifies with RNA polymerase II (RNAPII) along with general initiation factor TFIIF and transcription elongation factors Spt4/Spt5 and Dst1 (TFIIS) (Figs 4b and 3d (inset); see also Supplementary Fig. S5a). We used synthetic genetic array (SGA) technology9 in a quantified, high-density E-MAP format10 to systematically identify synthetic genetic interactions for iwr1Δ with deletions of the elongation factor gene DST1, the SWR complex that assembles the variant histone Htz1 into chromatin44, an Rpd3-containing histone deacetylase complex (Rpd3(L)) that mediates promoter-specific transcriptional repression30,31, the histone H3K4 methyltransferase complex (COMPASS), the activity of which is linked to elongation by RNAPII45, and other transcription-related genes (Fig. 4c). Moreover, DNA microarray analyses of the effects on gene expression of deletions of IWR1 and other genes involved in transcription by RNAPII, followed by clustering of the genes according to the similarity of their effects on gene expression, revealed that deletion of IWR1 is most similar in its effects on mRNA levels to deletion of RPB4 (Fig. 4d), a subunit of RNAPII with multiple roles in transcription46. We also made use of the fact that Iwr1 is highly conserved (Supplementary Fig. S5b), with a homologue, CG10528, in Drosophila melanogaster. Fig. 4e shows that Drosophila Iwr1 partly co-localizes with phosphorylated, actively transcribing RNAPII on polytene chromosomes, suggesting that Iwr1 is an evolutionarily conserved transcription
factor.

Conclusions

We have described the interactome and protein complexes underlying most of the yeast proteome. Our results comprise 7,123 protein–protein interactions for 2,708 proteins in the core data set. Greater coverage and accuracy were achieved compared with previous high-throughput studies of yeast protein–protein interactions as a consequence of four aspects of our approach: first, unlike a previous study using affinity purification and mass spectrometry14, we avoided potential artefacts caused by protein overproduction; second, we were able to ensure greater data consistency and reproducibility by systematically tagging and purifying both interacting partners for each protein–protein interaction; third, we enhanced coverage and reproducibility, especially for proteins of lower abundance, by using two independent methods of sample preparation and complementary mass spectrometry procedures for protein identification (in effect, up to four spectra were available for statistically evaluating the validity of each PPI); and finally, we used rigorous computational procedures to assign confidence values to our predictions. It is important to note, however, that our data represent a ‘snapshot’ of protein–protein interactions and complexes in a particular yeast strain subjected to particular growth conditions.
Both the quality of the mass spectrometry spectra used for protein identification and the approximate stoichiometry of the interacting protein partners can be evaluated by accessing our publicly available comprehensive database (http://tap.med.utoronto.ca/) that reports gel images, protein identifications, protein–protein interactions and supporting mass spectrometry data (Supplementary Information and Supplementary Fig. S6). Soon to be linked to our database will be thousands of sites of post-translational modification tentatively identified during our LC-MS/MS analyses (manuscript in preparation). The protein interactions and assemblies we identified provide entry points for studies on individual gene products, many of which are evolutionarily conserved, as well as 'systems biology' approaches to cell physiology in yeast and other eukaryotic organisms.

METHODS

Experimental procedures and mass spectrometry. Proteins were tagged, purified and prepared for mass spectrometry as previously described [43]. Gel images, mass spectra and confidence scores for protein identification by mass spectrometry are found in our database (http://tap.med.utoronto.ca/). Confidence scores for protein identification by LC-MS/MS were calculated as described previously [43]. After processing 72 database searches for each spectrum, a score of 1.25, corresponding to 99% confidence (A.P.T. and N.J.K., unpublished data), was used as a cut-off for protein identification by MALDI–TOF mass spectrometry. Synthetic genetic interactions and effects of deletion mutations on gene expression were identified as described previously [30]. Drosophila polytene chromosomes were stained with dIwr1 anti-peptide antibody and H5 monoclonal antibody as previously described [47].

Identification of protein complexes. Details of the methods for identification of protein complexes and calculating their overlaps with various data sets are described in Supplementary Information.

Protein property analysis. We used previously published yeast protein localization
data [5,6], and yeast protein properties were obtained from the SGD and GO databases. Proteins expressed at high, medium or low levels have expression log values of >4, 3–4, or <3, respectively [18].

Phylogenetic analysis. For each S. cerevisiae sequence a BLAST and TBLASTX

ARTICLES, NATURE | Vol 440 | 30 March 2006
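The expression-level bins given under "Protein property analysis" can be sketched as a small classifier. This is only an illustration, assuming the bins are read as >4 (high), 3–4 (medium) and <3 (low); the function name `expression_level` and the decision to treat the boundary values 3 and 4 as medium are assumptions, not something the article specifies.

```python
def expression_level(log_value: float) -> str:
    """Bin an expression log value into 'high', 'medium' or 'low'.

    Assumed thresholds: >4 is high, 3 to 4 is medium, <3 is low.
    Treating exactly 3 and exactly 4 as 'medium' is an assumption.
    """
    if log_value > 4:
        return "high"
    if log_value >= 3:
        return "medium"
    return "low"


# Hypothetical log values, for illustration only
levels = [expression_level(v) for v in (4.5, 3.5, 2.0)]
```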

Tandem affinity chromatography protocol

UNIT 19.20
Strep/FLAG Tandem Affinity Purification (SF-TAP) to Study Protein Interactions

Christian Johannes Gloeckner,1 Karsten Boldt,1,2 and Marius Ueffing1,2
1 Helmholtz Zentrum München, Neuherberg, Germany
2 Technical University of Munich, Munich, Germany

ABSTRACT

In recent years, several methods have been developed to analyze protein-protein interactions under native conditions. One of them, tandem affinity purification (TAP), combines two affinity-purification steps to allow isolation of high-purity protein complexes. This unit presents a methodological workflow based on an SF-TAP tag comprising a doublet Strep-tag II and a FLAG moiety optimized for rapid as well as efficient tandem affinity purification of native proteins and protein complexes in higher eukaryotic cells. Depending on the stringency of purification conditions, SF-TAP allows both the isolation of a single tagged-fusion protein of interest and purification of protein complexes under native conditions. Curr. Protoc. Protein Sci. 57:19.20.1-19.20.19. © 2009 by John Wiley & Sons, Inc.

Keywords: SF-TAP • tandem affinity purification • protein complexes

INTRODUCTION

The analysis of protein-protein interactions under native conditions has been a challenge ever since immunoprecipitation (IP) became a common methodology. Low yields and nonspecific binding of proteins have been associated with IP. On the other hand, IP facilitates targeted analysis of protein interactions with respect to a predefined protein of interest, given that a suitable antibody is available that features monospecificity and selectivity for this protein.

Tandem affinity purification (TAP; UNIT 19.19) can significantly reduce the background caused by nonspecific binding of proteins, as it combines two affinity purifications based on two different affinity matrices (Rigaut et al., 1999). TAP has been widely used to purify protein complexes from different species (Collins and Choudhary, 2008). The TAP technique was originally developed to analyze the yeast protein interactome (Gavin et
al., 2002).

Although the original TAP tag, consisting of a Protein A-tag, a TEV (tobacco etch virus) protease cleavage site, and a calmodulin binding peptide (CBP) tag, has already been successfully used in mammalian cells (Bouwmeester et al., 2004), several features of this first-generation tag remain suboptimal, such as its high molecular mass (21 kDa), the dependency on proteolytic cleavage, and CBP, which may interfere with calcium signaling within eukaryotic cells. This unit presents an alternative TAP protocol for the isolation of protein complexes from higher eukaryotic cells. The Strep/FLAG tandem affinity purification (SF-TAP) tag (Gloeckner et al., 2007) combines a tandem Strep-tag II (Skerra and Schmidt, 2000; Junttila et al., 2005) and a FLAG tag, resulting in a small 4.6-kDa tag. Both moieties have a medium affinity and avidity to their immobilized binding partners. Therefore, the tagged fusion proteins and their binding partners can be recovered under native conditions without the need for time-consuming proteolytic cleavage. In the first step, desthiobiotin is used for elution of the SF-TAP fusion protein from the Strep-Tactin matrix. In the second step, the FLAG octapeptide is used for elution of the SF-TAP fusion protein from the anti-FLAG M2 affinity matrix. An overview of the SF-TAP technique and the tag sequence is shown in Figure 19.20.1. The SF-TAP protocol represents an efficient, fast and straightforward purification of protein complexes from mammalian cells within 2 hr.

Current Protocols in Protein Science 19.20.1-19.20.19, August 2009. Published online August 2009 in Wiley Interscience. DOI: 10.1002/0471140864.ps1920s57. Copyright © 2009 John Wiley & Sons, Inc.

Figure 19.20.1 The Strep/FLAG tandem affinity purification. (A) N- and C-terminal SF-TAP tags (POI, protein of interest). (B) Overview of both purification steps. (1) Purification by the tandem Strep-tag II moiety: binding to Strep-Tactin matrix followed by elution with desthiobiotin. (2) Purification by the FLAG-tag moiety: binding to anti-FLAG M2 affinity matrix followed by elution with FLAG peptide. Abbreviations: sp., specific interactors (shown as gray circles); n.sp., nonspecific proteins (contaminants; shown as white circles).

This unit describes the full workflow, starting with the cell culture work needed for recombinant expression of the SF-TAP fusion proteins, followed by the SF-TAP protocol (see Basic Protocol 1) and ending with mass spectrometric analysis of the samples (see Basic Protocol 4). Special focus is given to the crucial step of sample preparation for mass spectrometry. For the identification of associated proteins following SF-TAP, the volume of the SF-TAP eluates is reduced by ultrafiltration using centrifugal units with a low molecular weight cut-off or by chloroform/methanol precipitation (see Support Protocol 2). The samples are then directly subjected to proteolytic digestion (see Basic Protocol 2) for analysis on a nano liquid chromatography (LC)–coupled electrospray tandem mass spectrometer. For complex samples, which contain many proteins, an alternative procedure for SDS-PAGE pre-fractionation is provided, including a method for sensitive MS-compatible Coomassie protein staining (see Support Protocol 3) followed by in-gel proteolytic digestion (see Basic Protocol 3). By reducing sample complexity, pre-fractionation helps to increase the number of protein identifications on state-of-the-art LC-coupled tandem mass spectrometers. Representative MS-analysis protocols are provided for an Orbitrap mass spectrometer (Thermo Fisher Scientific), a fast and sensitive system allowing high identification rates from SF-TAP purifications even with low amounts of protein in the sample (see Basic Protocol 4). Finally, a strategy for meta analysis of mass spectrometric data sets using the Scaffold software is provided (see Support Protocol 4). It can generally be used for the analysis of large MS/MS data sets. Figure 19.20.2 provides a flowchart of the entire analytical process.

Figure 19.20.2 Flow chart of an SF-TAP approach including MS identification of copurified proteins. This figure connects all protocols presented in this unit.

BASIC PROTOCOL 1
STREP/FLAG TANDEM AFFINITY PURIFICATION (SF-TAP) OF PROTEIN COMPLEXES FROM HEK293 CELLS

A flowchart of the SF-TAP procedure is shown in Figure 19.20.3.

Materials
HEK293 cells (ATCC no. CRL-1573)
Complete DMEM containing 10% FBS (APPENDIX 3C)
SF-TAP vectors with appropriate insert, and empty control plasmid (see Critical Parameters)
Negative control (see annotation to step 3, below)
Transfection reagent of choice (see UNIT 5.10)
Phosphate-buffered saline (PBS; APPENDIX 2E), prewarmed
Lysis buffer (see recipe)
Strep-Tactin Superflow resin (IBA GmbH, cat. no. 2-1206-10)
Tris-buffered saline (TBS; see recipe)
Wash buffer (see recipe)
Desthiobiotin elution buffer: dilute 10× buffer E (IBA GmbH, cat. no. 2-1000-025) 1:10 in H2O (final concentration, 2 mM desthiobiotin)
Anti–FLAG M2 agarose (Sigma-Aldrich)
FLAG elution buffer (see recipe)

14-cm tissue culture plates
Cell scraper
Millex GP 0.22-μm syringe-driven filter units (Millipore)
End-over-end rotator
Microspin columns (GE Healthcare, cat. no. 27-3565-01)
Microcon YM-3 centrifugal filter devices (Millipore)
Additional reagents and equipment for transfection of mammalian cells (UNIT 5.10)

Transfect HEK293 cells

1. Seed HEK293 cells on 14-cm plates at ∼1–2 × 10⁷ cells per dish in complete DMEM medium containing 10% FBS.
The amount of cells used for SF-TAP purification can be varied depending on the expression levels of the bait protein. Usually, four 14-cm dishes, corresponding to a final amount of ∼4 × 10⁸ HEK293 cells, is a good starting point. Strong overexpression of the bait protein usually increases copurification of heat-shock proteins such as HSP70. For in-depth analysis, it is therefore recommended to generate cell lines stably expressing the bait protein. See Support Protocol 1 for a stable transfection method.

2. Grow cells overnight.

3. Transfect cells with the SF-TAP plasmids using a transfection reagent of choice (according to manufacturer's protocols).
HEK293 cells can be easily transfected with lipophilic transfection reagents. The transfection efficiency is usually >80%. For a typical SF-TAP experiment, 1 to 4 μg plasmid per 14-cm dish is used. Depending on the cell type other transfection reagents may be favorable (also see UNIT 5.10).
Although SF-TAP purifications typically exhibit low background caused by nonspecific binding of proteins to the affinity matrix, a suitable negative control should be used in every experiment. Cells transfected with the empty expression vectors may be used in the same amount as for the SF-TAP-tagged bait protein. However, the tag is quite small and expressed at low levels if not fused to a protein. Thus, the untransfected cell line is an acceptable, simple, and inexpensive alternative for a negative control.

Figure 19.20.3 Flow chart for the SF-TAP procedure: 1–4 × 10⁸ HEK293 cells (1–4 confluent 14-cm plates) expressing SF-TAP fusion protein → lysis (15 min, 4°C) → centrifugation (10 min, 10,000 × g), retain supernatant → incubation with 50 μl/plate Strep-Tactin matrix (1 hr) → wash 3 times with 500 μl wash buffer (spin 5 sec, 100 × g) → elution with 500 μl desthiobiotin elution buffer (10 min) → incubation with 25 μl/plate anti-FLAG M2 agarose (1 hr) → wash 3 times with 500 μl wash buffer (spin 5 sec, 100 × g) → elution with 200 μl FLAG elution buffer (10 min) → final eluate → volume reduction → analysis.

4. Let cells grow for 48 hr. If necessary, cells can be starved in DMEM without FBS for 12 hr prior to harvesting.
Starving might be desirable if cell signaling is to be analyzed, especially prior to differential treatment with growth factors, to eliminate effects of serum growth factors.

Lyse cells

5. Remove medium from the plates.

6. Optional: Rinse cells in warm PBS.

7. Scrape off cells in 1 ml lysis buffer per 14-cm plate on ice using a cell scraper, and combine lysates from each experimental condition in a 1.5-ml microcentrifuge tube.

8. Lyse cells by incubating 15 min on ice with mixing by hand from time to time.

9. Pellet cell debris, including nuclei, by centrifuging 10 min at 10,000 × g, 4°C.

10. Clear lysate supernatant by filtration through a 0.22-μm syringe filter.

Perform SF-TAP

11. Wash Strep-Tactin Superflow resin twice, each time with 4 resin volumes TBS and once with 4 resin volumes lysis buffer.

12. Incubate lysates with 50 μl per 14-cm plate of settled Strep-Tactin Superflow resin for 1 hr at 4°C (use an end-over-end rotator to keep the resin evenly distributed).
Note that a maximum of 200 μl settled resin per spin column should not be exceeded. If more than four 14-cm plates (∼4 × 10⁸ HEK293 cells) are used, reduce the volume per plate or use additional spin columns in step 13.

13. Centrifuge for 30 sec at 7000 × g
, 4°C, remove the supernatant until 500 μl remains, and transfer resin to a microspin column.
Snap off bottom closure of the spin column prior to use. The maximum volume of the spin columns is 650 μl. Alternatively, centrifugations for wash and elution steps can be performed at room temperature if no cooled centrifuge is available.

14. Remove remaining supernatant by centrifugation in the spin column for 5 sec at 100 × g, then wash resin three times, each time with 500 μl wash buffer (centrifuge 5 sec at 100 × g each time to remove the supernatant) at 4°C. Replug spin columns with inverted bottom closure prior to adding the elution buffer in step 15.
IMPORTANT NOTE: Do not allow the resin to run dry. Depending on the bait protein, this markedly reduces the yield.

15. Add 500 μl desthiobiotin elution buffer and gently mix the resin by hand for 10 min on ice.

16. Remove the plug of the spin column, transfer the column to a new collection tube, and collect the eluate by centrifuging 10 sec at 2000 × g, 4°C.
If spin columns were closed by the top screw cap during incubation with elution buffer, the cap needs to be removed prior to centrifugation, to allow the pressure to balance out.

17. Wash anti–FLAG M2 agarose resin three times, each time with 4 resin volumes TBS. Suspend resin in TBS and transfer it to microspin columns, then remove the buffer by centrifuging 5 sec at 100 × g.
25 μl settled resin per 14-cm plate will be needed.

18. Transfer eluate from step 16 corresponding to each 14-cm plate to a microspin column containing 25 μl settled anti-FLAG M2 agarose prepared as in step 17.

19. Plug columns, close columns with top screw caps, and incubate for 1 hr at 4°C (on an end-over-end rotator).

20. Wash once with 500 μl wash buffer, and then twice, each time with 500 μl TBS (centrifuge 5 sec at 100 × g each time to remove the supernatant) at 4°C.

21. For elution, incubate with 4 bead volumes (at least 200 μl) FLAG elution buffer for 10 min, keeping the columns plugged and gently mixing the resin several times.

22. After incubation, remove the
plugs and top screws of the spin columns, transfer to new collection tubes, and collect the eluate(s) by centrifugation (10 sec at 2000 × g).

23. Depending on downstream method to be used, either precipitate protein (see Support Protocol 2) or concentrate the eluate by Microcon YM-3 centrifugal filter units according to manufacturer's protocols.

SUPPORT PROTOCOL 1
GENERATION OF HEK293 CLONES STABLY EXPRESSING SF-TAP-TAGGED PROTEINS

In Basic Protocol 1, SF-TAP-tagged proteins are transiently expressed. However, strong overexpression of the bait protein usually increases copurification of heat-shock proteins such as HSP70. For in-depth analysis, it is therefore recommended to generate cell lines stably expressing the bait protein. This protocol presents a quick method for generating stable HEK293 lines.

Materials
HEK293 cells (ATCC no. CRL-1573)
Complete DMEM containing 10% FBS (APPENDIX 3C)
SF-TAP vectors with appropriate insert, and empty control plasmid (see Critical Parameters)
Transfection reagent of choice (see UNIT 5.10)
Phosphate-buffered saline (PBS; APPENDIX 2E)
Complete DMEM medium (APPENDIX 3C)
G418 (PAA Laboratories)
Freezing solution: 90% fetal bovine serum (FBS; Invitrogen)/10% dimethylsulfoxide (DMSO; AR grade)
Lysis buffer (see recipe)
Blocking reagent: 5% (w/v) nonfat dry milk in TBS (see recipe for TBS) containing 0.1% (v/v) Tween 20
Anti-FLAG M2 antibody (Sigma-Aldrich)

10-cm tissue culture dishes
12-well and 6-well tissue culture plates
Centrifuge
2-ml cryovials (Nunc)
Additional reagents and equipment for transfection of mammalian cells (UNIT 5.10), trypsinization and counting of cells (UNIT 5.10), and immunoblotting (UNIT 10.10)

Grow and transfect cells

1. Grow cells in complete DMEM containing 10% FBS.

2. Transfect cells with expression plasmid using a transfection reagent of choice according to the manufacturer's protocols.

3. Change medium after 6 hr.

Select cells

4. After 48 hr, trypsinize and count cells
(APPENDIX 3C) and seed them at low density (1 × 10⁶ cells per 10-cm dish) to allow formation of single colonies upon selection.

5. Add G418 (500 to 1000 μg/ml) for selection of the SF-TAP expression vectors, which are based on pcDNA3.0 and contain a neomycin-resistance gene.

6. Grow the cells under G418 selection for 2 to 4 weeks, changing the medium every second day.

7. Collect single colonies with a 200-μl pipet into 12-well plates.

8. Keep colonies under G418 selection until the cell density is sufficient for expanding them to 6-well dishes (two wells per clone).

Cryopreserve cells

9. Grow cells to >90% confluency and trypsinize (APPENDIX 3C) one well of each clone for generation of cryostocks.

10. Generate cryostocks:
a. Wash cells from one well once by adding 3 ml PBS, centrifuging 5 min at 800 × g, room temperature, and resuspending the pellet in 500 μl freezing solution.
b. Transfer resuspended cells to 2-ml cryovials.
c. Freeze cells slowly: keep cells for 1 hr at −20°C, then overnight at −80°C, followed by storage in a liquid nitrogen tank.
For cultivation and expansion of confirmed clones, thaw the cryostock at 37°C, wash cells once with medium, and plate cells onto 10-cm culture dishes.

Test for expression of bait protein

11. Lyse one well of each clone in 300 μl lysis buffer and test for expression of the bait protein by immunoblotting (UNIT 10.10).
SF-TAP proteins can be detected using the anti-FLAG M2 antibody (Sigma-Aldrich) at a dilution of 1:1000 to 1:5000 in blocking reagent.

SUPPORT PROTOCOL 2
CHLOROFORM/METHANOL PRECIPITATION OF PROTEINS

The chloroform/methanol precipitation method described by Wessel and Flügge (1984) precipitates proteins with high efficiency and yields samples containing low levels of salt contamination.

Materials
SF-TAP eluate (from Basic Protocol 1)
Methanol (AR grade)
Chloroform (AR grade)
2-ml polypropylene sample tubes

1. Transfer 200 μl SF-TAP eluate to a 2-ml sample tube.
All steps
are performed at ambient temperature.

2. Add 0.8 ml of methanol, vortex, and centrifuge for 20 sec at 9000 × g, room temperature.

3. Add 0.2 ml chloroform, vortex, and centrifuge for 20 sec at 9000 × g, room temperature.

4. Add 0.6 ml of deionized water, vortex for 5 sec, and centrifuge for 1 min at 9000 × g, room temperature.

5. Carefully remove and discard the upper layer (aqueous phase).
The protein precipitate (visible as white flocks) is in the interphase.

6. Add 0.6 ml of methanol, vortex, and centrifuge for 2 min at 16,000 × g, room temperature.

7. Carefully remove the supernatant and air dry the pellet.
The pellet can be stored for several months at −80°C.

BASIC PROTOCOL 2
IN-SOLUTION DIGEST OF PROTEINS FOR MASS SPECTROMETRIC ANALYSIS

The in-solution digest described here is a quick and efficient method to digest the SF-TAP eluate after protein precipitation (Support Protocol 2). The use of an MS-compatible surfactant helps to solubilize the precipitated proteins. In order to allow the identification of cysteine-containing peptides, random oxidation is prevented, rather than reverted, by applying a DTT/iodoacetamide treatment prior to digestion, leading to a defined-mass adduct. The digested protein sample can then be directly subjected to analysis on an LC-coupled tandem mass spectrometer.

Materials
Precipitated protein (see Support Protocol 2)
50 mM ammonium bicarbonate (freshly prepared)
RapiGest SF (Waters): prepare 2% (10×) stock solution in deionized water
100 mM DTT (prepare from 500 mM stock solution; store stock up to 6 months at −20°C)
300 mM iodoacetamide (prepare fresh)
50× (0.5 μg/μl) trypsin stock solution (Promega; store at −20°C)
Concentrated (37%) HCl

60°C incubator
Polypropylene inserts (Supelco, cat. no. 24722)
1 to 200 μl gel-loader pipet tips (Sorenson Bioscience, /contact.cfm)

1. Dissolve the protein pellet in 30 μl of 50 mM ammonium bicarbonate by extensive vortexing.

2. Add 3 μl of 10× (2%) RapiGest stock solution (final
concentration, 0.2%).
RapiGest (sodium 3-[(2-methyl-2-undecyl-1,3-dioxolan-4-yl)methoxyl]-1-propanesulfonate) is an acid-labile surfactant that helps to solubilize and denature proteins to make them accessible to proteolytic digestion (Yu et al., 2003).

3. Add 1 μl of 100 mM DTT and vortex.

4. Incubate 10 min at 60°C.

5. Cool the samples to room temperature.

6. Add 1 μl of 300 mM iodoacetamide and vortex.

7. Incubate for 30 min at room temperature.
Samples should be protected from light, since iodoacetamide is light-sensitive.

8. Add 2 μl trypsin stock solution and vortex.

9. Incubate at 37°C overnight.

10. Add 2 μl of concentrated (37%) HCl to hydrolyze the RapiGest.
For hydrolysis of the RapiGest reagent, the pH must be <2.

11. Transfer samples to polypropylene inserts (remove spring).

12. Incubate for 30 min at room temperature.

13. Place inserts in 1.5-ml microcentrifuge tubes and microcentrifuge 10 min at 13,000 × g, room temperature.
One hydrolysis product of the RapiGest reagent is water-immiscible and can be removed by centrifugation. After centrifugation, it is visible as faint film (oleic phase) on top of the aqueous sample phase. The other hydrolysis product is an ionic water-soluble component which does not interfere with reversed phase LC or MS analysis. A white pellet might appear.

14. Carefully recover the solution between the upper oleic phase and the pellet using gel-loader tips.
The sample can now be directly subjected to C18 HPLC separation prior to MS/MS-analysis (LC-MS/MS; Basic Protocol 4). Pre-fractionation (Basic Protocol 3) is optional.

BASIC PROTOCOL 3
PRE-FRACTIONATION VIA SDS-PAGE AND IN-GEL DIGESTION PRIOR TO LC-MS/MS ANALYSIS

Pre-fractionation prior to MS analysis increases the number of peptides which can be analyzed, and therefore the peptide coverage of identified proteins. This benefit is achieved by overcoming the undersampling problem mainly caused by the limited capacity of the
trapping columns used in nano–LC chromatography, or that occurs with high complexity. For these samples, SDS-PAGE pre-fractionation can be used to reduce the complexity. For less complex samples or samples with low protein content, the in-solution digest (Basic Protocol 2) is preferred.

Materials
Protein sample (e.g., from Basic Protocol 1 or Support Protocol 2)
10% NuPAGE gels (Invitrogen)
MOPS running buffer (Invitrogen)
40% and 100% acetonitrile (AR grade; prepare fresh)
5 mM DTT (prepare from 500 mM stock; store stock up to 6 months at −20°C)
25 mM iodoacetamide (prepare fresh)
Digestion solution: dilute 50× trypsin stock solution (0.5 μg/μl, Promega) 1:50 in 50 mM ammonium bicarbonate (freshly prepared)
1% and 0.5% (v/v) trifluoroacetic acid (TFA; prepare fresh from 10% v/v stock)
50% (v/v) acetonitrile/0.5% (v/v) TFA (prepare fresh)
99.5% (v/v) acetonitrile/0.5% (v/v) TFA (prepare fresh)
2% (v/v) acetonitrile/0.5% (v/v) TFA

Concentration units (e.g., Microcon from Millipore)
Scalpel
Polypropylene 96-well microtiter plate: polystyrene material should be avoided since, depending on the product, polymers can be extracted from plastics which produce strong background signals in mass spectrometry
60°C incubator or heating block
Polypropylene 0.5-ml reaction tubes
Microtiter plate shaker (e.g., Vortex mixer equipped with microtiter-plate adaptor)
HPLC sample tubes
Additional reagents and equipment for SDS-PAGE (UNIT 10.1) and colloidal Coomassie blue staining of gels (Support Protocol 3)

Prepare samples

1. Concentrate samples using concentration units (e.g., Microcon).

2. Supplement samples with Laemmli loading buffer (SDS-PAGE loading buffer; UNIT 10.1).
A detailed description of the SDS gel electrophoresis and standard buffers can be found in UNIT 10.1 or in the protocols supplied with the NuPAGE system.

Perform electrophoresis and stain gels

3. Separate samples on 10% NuPAGE gels according to the manufacturer's protocols, using MOPS running buffer.

4. Stop electrophoresis after the gel
front has travelled 1 to 2 cm.

5. Stain gels with colloidal Coomassie blue (see Support Protocol 3).
Avoid strong staining of the bands since it increases the time necessary for destaining.

6. Excise desired gel pieces with a clean scalpel (three to ten slices, depending on the complexity of the sample).

Destain and process gel slices

7. Transfer gel pieces into individual wells of a 96-well plate.

8. Wash by adding 100 μl water to each well and incubating for 30 min.

9. For destaining:
a. Wash twice, each time by incubating the gel slices for 10 min in 100 μl/well of 40% acetonitrile.
b. Wash for 5 min in 100 μl/well of 100% acetonitrile (if gels are still blue, repeat destaining).

10. Add 100 μl of 5 mM DTT, then incubate 15 min at 60°C in an incubator or heating block.

11. Remove DTT solution and cool the plate to room temperature.

12. Add 100 μl per well of freshly prepared 25 mM iodoacetamide, then incubate 30 min in the dark.

13. Wash twice, each time for 10 min with 100 μl/well of 40% acetonitrile.

14. Wash 5 min with 100 μl/well of 100% acetonitrile.

15. Discard supernatant and air dry (or SpeedVac) the gel pieces to complete dryness.

Digest and extract gel slices

16. Add 20 to 30 μl per well of freshly prepared digestion solution (depending on the size of the gel plugs).
Wrap plates in Parafilm to reduce evaporation during the overnight incubation (or use a humidified incubator in step 17).

17. Digest overnight at 37°C.

18. For extraction of the peptides from the gel piece, add 10 μl 1% TFA, then shake 15 min on a Vortex mixer with a microtiter plate adapter.
The peptides are extracted in three steps with increasing acetonitrile concentrations (steps 18 to 23).

19. Transfer liquid (extract 1) to a 0.5-ml polypropylene tube.

20. Add 50 μl 50% acetonitrile/0.5% TFA to the gel piece and shake 15 min on a Vortex mixer with a microtiter plate adapter.

21. Remove the liquid (extract 2) and pool extracts 1 and 2.

22. Add 50 μl 99.5% acetonitrile/0.5% TFA to the gel piece, then shake 15 min on a Vortex mixer with a microtiter plate adapter.

23. Remove the
liquid (extract 3) and pool extract 3 with 1 and 2.

24. Dry samples to complete dryness in a SpeedVac evaporator.

25. Redissolve samples in 50 μl of 2% acetonitrile/0.5% TFA by shaking (e.g., on a Vortex mixer) for 10 to 15 min, then transfer the sample into HPLC sample tubes for LC-MS/MS analysis.

SUPPORT PROTOCOL 3
QUICK MS-COMPATIBLE COLLOIDAL COOMASSIE STAIN OF PROTEINS AFTER SDS-PAGE SEPARATION

The colloidal Coomassie stain (Kang et al., 2002) represents a fast and sensitive MS-compatible protein staining method. In contrast to the classical staining protocol, no intense and time-consuming destaining is needed to visualize protein bands. Therefore, this method is ideal for a quick staining of the protein bands and provides good orientation on how the gel can be fractionated without splitting predominant bands (see Basic Protocol 3).

Materials
Electrophoresed SDS gel containing protein samples of interest (e.g., from Basic Protocol 3)
Colloidal Coomassie staining solution (see recipe)
Destaining solution: 10% (v/v) ethanol/2% (v/v) orthophosphoric acid
Gel staining trays of appropriate size

1. Wash gels twice, each time for 10 min in deionized water in a staining tray.
The SDS must be removed before staining to reduce background signals.

2. Incubate gels for 10 min in colloidal Coomassie staining solution.
The incubation steps are kept short for the staining of gels used for pre-fractionation. The staining can be prolonged up to overnight. The maximum staining will be reached after ∼3 hr incubation in the staining solution.

3. Incubate gels for 10 min in destaining solution.

4. Wash gels twice, each time for 10 min in deionized water.

BASIC PROTOCOL 4
LC-MS/MS ANALYSIS OF DIGESTED SF-TAP SAMPLES

The following protocol describes MS analysis of digested protein samples on an LC-coupled ESI tandem mass spectrometer. The representative MS-analysis protocol is provided for an Orbitrap mass spectrometer (Thermo Fisher Scientific). The Orbitrap system combines fast data acquisition with
high mass accuracy and is therefore ideal for the analysis of SF-TAP samples. Background information on mass spectrometric analysis can be found in UNIT 16.11.

Materials
Digested protein sample, either from in-solution digest (Basic Protocol 2) or in-gel digest (Basic Protocol 3)
Nano HPLC loading buffer: 0.1% formic acid in HPLC-grade water
Nano HPLC buffer A: 2% acetonitrile/0.1% formic acid in HPLC-grade water
Nano HPLC buffer B: 80% acetonitrile/0.1% formic acid in HPLC-grade water

HPLC vials (Dionex)
Nano HPLC system (UltiMate 3000, Dionex) equipped with a trap column (100 μm i.d. × 2 cm, packed with Acclaim PepMap100 C18 resin, 5 μm, 100 Å; Dionex) and an analytical column (75 μm i.d. × 15 cm, packed with Acclaim PepMap100 C18 resin, 3 μm, 100 Å; Dionex)
Mass spectrometer: Orbitrap XL with a nanospray ion source (Thermo Fisher Scientific; also see UNIT 16.11)
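The resin volumes quoted in Basic Protocol 1 scale with the number of 14-cm plates, so they can be worked out before starting. The sketch below is only an illustration of that arithmetic: the function name `sf_tap_volumes`, the choice to add spin columns (rather than reduce the resin volume per plate) above 200 μl of Strep-Tactin resin, and the pooling of all anti-FLAG resin into one elution volume are assumptions, not part of the published protocol.

```python
import math

# Values taken from Basic Protocol 1 (all volumes in microliters)
STREP_TACTIN_UL_PER_PLATE = 50   # settled Strep-Tactin resin per 14-cm plate
ANTI_FLAG_UL_PER_PLATE = 25      # settled anti-FLAG M2 agarose per 14-cm plate
MAX_RESIN_UL_PER_COLUMN = 200    # upper limit of settled resin per spin column
FLAG_ELUTION_BEAD_VOLUMES = 4    # FLAG elution buffer, in bead volumes
MIN_FLAG_ELUTION_UL = 200        # lower limit for the FLAG elution volume


def sf_tap_volumes(n_plates: int) -> dict:
    """Return resin and elution volumes for a given number of 14-cm plates.

    Assumption: when the Strep-Tactin resin exceeds 200 ul, extra spin
    columns are used instead of reducing the resin volume per plate.
    """
    if n_plates < 1:
        raise ValueError("n_plates must be at least 1")
    strep_resin = STREP_TACTIN_UL_PER_PLATE * n_plates
    flag_resin = ANTI_FLAG_UL_PER_PLATE * n_plates
    return {
        "strep_tactin_resin_ul": strep_resin,
        "strep_spin_columns": math.ceil(strep_resin / MAX_RESIN_UL_PER_COLUMN),
        "anti_flag_resin_ul": flag_resin,
        "flag_elution_ul": max(FLAG_ELUTION_BEAD_VOLUMES * flag_resin,
                               MIN_FLAG_ELUTION_UL),
    }


# The suggested starting point of four plates
four_plates = sf_tap_volumes(4)
```

With four plates the Strep-Tactin resin just reaches the 200 μl limit of a single spin column; any larger experiment needs either a second column or less resin per plate.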

Autophagy: transcriptome + proteome

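Autophagy is an important intracellular metabolic process that maintains the stability of the intracellular environment by breaking down and recycling proteins, lipids, and organelles within the cell. It comprises three main steps: recognition and sequestration of cargo, fusion with the lysosome, and degradation.

The transcriptome and the proteome both play important roles in autophagy. The transcriptome is the set of transcripts of all genes in a cell, that is, its overall RNA expression. The proteome is the overall expression of all proteins in a cell.

Transcriptome studies of autophagy focus on changes in the expression of autophagy-related genes. Comparing transcriptome data obtained under autophagy-inducing conditions with data obtained under normal conditions reveals the genes whose expression differs. These include autophagy-related genes (such as the ATG gene family), genes that regulate signaling pathways, and membrane protein genes. Transcriptome studies help us understand how autophagy is regulated and how it changes across physiological and pathological states.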
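Proteome studies of autophagy focus on changes in the expression and modification of autophagy-related proteins. Techniques such as mass spectrometry can identify and quantify the expression levels and modification states of these proteins, including phosphorylation, acetylation, and ubiquitination. The proteins studied include autophagy-related proteins (such as LC3 and Beclin-1), signaling-pathway regulators, and membrane proteins. Proteome studies help us understand the molecular mechanisms of autophagy and how it changes across physiological and pathological states.

In summary, transcriptome and proteome studies of autophagy deepen our understanding of its regulatory and molecular mechanisms and provide an important theoretical basis for treating related diseases and developing drugs.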

Jerusalem artichoke gene sequences

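Jerusalem artichoke (Helianthus tuberosus) is a perennial herbaceous plant whose gene sequences carry a large amount of genetic information. Because its genome is relatively large and complex, the complete sequence has not yet been fully resolved. However, researchers have used high-throughput sequencing to obtain some genomic data for Jerusalem artichoke and have studied several of its genes in depth.

For example, the HtTIP2-2 gene belongs to the tonoplast intrinsic protein subfamily; its product has water-transport activity and is located mainly in the plasma membrane and the surrounding vesicles. Expressing this gene increases the tolerance of Saccharomyces cerevisiae to salt stress, and ectopic expression in Arabidopsis thaliana increases the number of root hairs and confers greater stress tolerance. The HtWRKY gene family has also attracted attention, and some of its members have been shown to play important roles in plant stress resistance.

Note that, because of the complexity and diversity of the Jerusalem artichoke genome, the gene sequence data obtained so far remain limited. Deeper study of these sequences will therefore require more advanced sequencing technologies and analysis methods to uncover more of the plant's genetic information and gene functions.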

Taraxasterol synthase, gene encoding taraxasterol synthase, and preparation and application thereof [invention patent]

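Patent title: Taraxasterol synthase, gene encoding taraxasterol synthase, and preparation and application thereof
Patent type: Invention patent
Inventors: Zhang Yansheng (章焰生), Qiao Weibo (乔玮博), Li Changfu (李长福)
Application number: CN202011625519.0
Application date: 2020-12-30
Publication number: CN112574981A
Publication date: 2021-03-30
Patent content provided by the Intellectual Property Publishing House
Abstract: The invention discloses a taraxasterol synthase, a gene encoding the taraxasterol synthase, and their preparation and application.

The protein with taraxasterol synthase activity is any one of the following: 1) a protein whose amino acid sequence is shown in SEQ ID NO.2; 2) a fusion protein obtained by attaching a tag to the N-terminus and/or C-terminus of the protein shown in SEQ ID NO.2; 3) a protein with the same function obtained by substituting and/or deleting and/or adding one or several amino acid residues in the amino acid sequence shown in SEQ ID NO.2.

The invention clones the full-length sequence of a key taraxasterol synthase gene and uses yeast transformation to show that the protein it encodes is a multifunctional cyclase able to catalyze the conversion of 2,3-oxidosqualene to taraxasterol, thereby providing a necessary gene resource for producing taraxasterol in yeast.

Applicant: Shanghai University
Address: No. 99 Shangda Road, Baoshan District, Shanghai 200444
Country: CN
Agency: Beijing Zhiguagua Intellectual Property Agency Co., Ltd. (北京知呱呱知识产权代理有限公司)
Agent: Du Lijun (杜立军)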
代理人:杜立军

Construction of Protein-Expression Cell Lines
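Constructing a protein-expression cell line means transferring the gene for a target protein into cells so that the cells express that protein.

Construction usually involves the following steps:
1. Choose a suitable host: select the host according to the properties of the target protein and the requirements of the project. Common hosts include Escherichia coli (E. coli), yeast (Saccharomyces cerevisiae), insect cells (Sf9, Sf21), and mammalian cells (HEK293, CHO).

2. Clone the target gene: use molecular biology methods to insert the target gene into a suitable expression vector. Expression vectors usually carry a promoter, a selectable marker, and other regulatory elements that support efficient expression and selection.

3. Transfect the host cells: introduce the finished expression vector into the host cells, typically by chemical transfection, electroporation, or viral transduction. After a period of culture and selection, pick the cell lines that express the target gene.

4. Optimize expression: to raise the expression level, adjust the culture conditions, optimize the expression vector, or introduce helper factors, so as to obtain a high yield of the target protein.

5. Purify the target protein: choose a purification strategy suited to the protein, such as affinity chromatography, ion-exchange chromatography, or gel filtration, and purify the protein expressed by the cell line.

Constructing a protein-expression cell line is a complex process that has to be optimized for the actual experimental needs and the properties of the target protein.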

Molecular Cloning Strategies for High-Level Protein Expression

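With the continued development of biotechnology, protein engineering plays an important role in medicine, agriculture, and industry.

The molecular cloning strategy is a key step in expressing large amounts of a target protein efficiently.

This article describes a commonly used cloning strategy for high-level expression and discusses its advantages and applications.

I. Optimized design of the leader sequence

The leader sequence is a stretch of DNA joined to the coding sequence of the target protein.

Optimizing the leader sequence before cloning can markedly raise the expression level of the target protein.

The design principles include choosing a suitable start codon, avoiding unfavorable secondary structures, and avoiding interference from promoter and transcription termination sites.

II. Choice of host

Choosing a host suited to protein expression is essential for efficient expression of the target protein.

Common hosts include Escherichia coli (E. coli), brewer's yeast (Saccharomyces cerevisiae), and mammalian cells.

E. coli, for example, is one of the most widely used hosts: it grows quickly, is easy to culture, and the expressed protein is easy to purify.

When choosing a host, consider the properties of the target protein and whether the host's expression system can meet the requirements.

III. Choice and construction of the vector

Efficient expression of the target protein requires a suitable expression vector.

Common expression vectors include plasmids and viral vectors.

Plasmids are the most widely used; they are stable and easy to handle.

Viral vectors give higher expression levels but require strict adherence to the relevant safety procedures.

Depending on the expression requirements, vectors with different promoters, selectable marker genes, and regulatory sequences can be chosen.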

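IV. Choice of restriction enzymes and ligases

Restriction digestion is a basic step in molecular cloning: it cuts both the coding sequence of the target protein and the expression vector.

The choice of restriction enzymes is one of the keys to high-level expression.

Commonly used restriction enzymes include EcoRI, BamHI, and XhoI.

The choice of ligase is equally important, because the ligase joins the coding sequence to the expression vector in the correct way.

T4 DNA ligase and DNA assembly enzymes are the ones in common use today.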
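One practical consequence of the enzyme choice is that an enzyme used at the cloning junction should not also cut inside the insert. The sketch below checks an insert for internal recognition sites of the three enzymes named above (EcoRI, GAATTC; BamHI, GGATCC; XhoI, CTCGAG). The function names and the example sequence are illustrative assumptions, not part of the original text.

```python
# Recognition sequences of the restriction enzymes named in the text.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "XhoI": "CTCGAG",
}


def find_sites(sequence):
    """Return {enzyme: [0-based start positions]} for each recognition site."""
    sequence = sequence.upper()
    hits = {}
    for enzyme, site in RECOGNITION_SITES.items():
        positions = []
        start = sequence.find(site)
        while start != -1:
            positions.append(start)
            start = sequence.find(site, start + 1)
        hits[enzyme] = positions
    return hits


def enzymes_without_internal_sites(insert):
    """Enzymes that do not cut inside the insert and so remain usable at its ends."""
    return [enzyme for enzyme, pos in find_sites(insert).items() if not pos]


# Hypothetical insert sequence, for illustration only.
insert = "ATGGAATTCAAAGGATCCTAA"
sites = find_sites(insert)
usable = enzymes_without_internal_sites(insert)
print(sites, usable)
```

All three recognition sequences are palindromic, so scanning one strand is enough here; an enzyme with a non-palindromic site would also need a search on the reverse complement.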

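V. Transformation method

Transformation is the process of introducing the recombinant plasmid into the host.

Common methods include chemical transformation, electroporation, and heat-shock transformation.

The method should be chosen according to the properties of the host and the size of the plasmid.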

Identification of Tissue-Specific Protein Complexes

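Ding Xia; Zhang Xiaofei; Yi Ming

[Abstract] This paper studies the identification of tissue-specific protein complexes. We construct tissue-specific protein networks from protein-protein interaction network data and tissue-specific gene expression data, cluster each network with several representative clustering algorithms, and merge the clustering results with a non-negative matrix factorization model to obtain tissue-specific protein complexes. The results show that clustering performance improves markedly and that the method can identify tissue-specific protein complexes.
[Journal] Journal of Mathematics (数学杂志)
[Year (volume), issue] 2017, 37(5)
[Pages] 8 (pp. 1093-1100)
[Keywords] protein-protein interaction network; complex identification; tissue specificity; non-negative matrix factorization
[Authors] Ding Xia; Zhang Xiaofei; Yi Ming
[Affiliations] School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei 430072; School of Mathematics and Statistics, Central China Normal University, Wuhan, Hubei 430079; College of Science, Huazhong Agricultural University, Wuhan, Hubei 430070
[Language of full text] Chinese
[CLC classification] O212.4; O212.5

In the current post-genomic era, systematically analyzing and comprehensively understanding the relationships between cellular modules and genes is an important problem. With the rapid development of bioinformatics, large-scale high-throughput genomic technologies such as mass-spectrometry-based tandem affinity purification [1,2], yeast two-hybrid [3,4], and protein chips have produced a wealth of large-scale biological networks and made their systematic analysis possible. Proteins rarely act alone; they usually bind together into complexes that carry out biological functions in the organism [5]. Comprehensive study of protein complexes helps reveal the structure of protein-protein interaction (PPI) networks, predict protein function, and elucidate the cellular mechanisms of various diseases [6].

Over more than ten years of rapid development, many methods for detecting functional modules in PPI networks have emerged, based on different clustering mechanisms. These methods, however, focus mainly on static PPI networks and ignore both the dynamic changes in protein function and tissue-specific mechanisms. Fortunately, DNA microarray technology allows the differential expression of thousands of genes to be monitored simultaneously and quantitatively under various experimental conditions, and it provides a great deal of temporal and tissue-specific information [7]. A few algorithms study dynamic networks and detect dynamic complexes, but none yet addresses the detection of tissue-specific complexes.

This paper combines tissue-specific gene expression data with the human PPI network to build a series of tissue-specific protein networks and explores tissue-specific functional modules. The approach is to cluster each tissue-specific network with several methods, assemble the results, and then merge the assembled results with a non-negative matrix factorization model. Experiments show that the method detects protein complexes better than the other clustering methods. Because tissue-specific protein complexes are important for understanding biological function and for identifying biomarkers and functional targets [8], exploring tissue-specific functional modules is well worthwhile.

This section first describes how the tissue-specific protein networks are built and then how tissue-specific complexes are detected.

A tissue-specific protein network is built by combining the PPI network with tissue-specific gene expression data. A given PPI network can be represented as a graph G = (V, E) [9], where V contains |V| = N proteins and E contains |E| edges. G can be written as an adjacency matrix A, with A_ij = 1 if an edge connects proteins i and j and A_ij = 0 otherwise; identifying protein complexes then becomes a node clustering problem. The tissue-specific gene expression data give the expression levels of these N proteins in T tissues and can be represented by an N × T matrix F. The matrices A and F are used to build the tissue-specific networks: if proteins i and j interact (A_ij = 1) and both are significantly expressed in tissue t (F_it > 0 and F_jt > 0), then proteins i and j are linked in tissue t. Applying this rule to all T tissues yields T tissue-specific protein networks.

In this section, each tissue's network is first clustered with the base clustering methods, and the non-negative matrix factorization model is then used to merge similar tissue-specific protein complexes into new complexes. Figure 1 shows the basic workflow of the algorithm.

2.2.1 Base clustering methods

Seven base clustering methods are first applied to each of the T tissue-specific networks to build protein complexes: MCL, MCODE, MINE, ClusterONE, DPClus, SPICi, and CoAch. MCL is a classical algorithm that detects protein complexes by simulating random walks (flow) in the PPI network; it defines an Expansion operation that assigns node probabilities and an Inflation operation that changes the walk probabilities, which together simulate the spreading and contraction of random walks [10,11]. MCODE detects complexes from protein connectivity: it weights every node in the PPI network by its local neighborhood density, selects the highest-weighted node as the seed of an initial cluster, and expands outward from the seed to form the final cluster (protein module) [11,14]. MINE is an agglomerative clustering algorithm similar to MCODE, but it uses a modified vertex weighting strategy and can measure network modularity, both of which help define module boundaries without absorbing borderline nodes into a growing cluster [13]. DPClus clusters by tracking the cluster periphery; it detects complexes using both module density and a newly defined cluster property, CP [11,14]. ClusterONE identifies overlapping protein complexes and relies on overlapping neighborhood expansion [15]. CoAch detects complexes from core-attachment relationships in two stages: the first defines core vertices from the neighborhood graph and detects the core proteins of each complex, and the second attaches the attachment proteins one by one to the complex represented by each core [11,16]. SPICi is an efficient algorithm that seeds clusters from nodes according to their weighted degree; if the support of an unclustered node is high enough and the cluster density does not fall below a user-defined threshold, the node is added to the cluster; otherwise the cluster is output and its nodes are removed from the network [17].

2.2.2 Non-negative matrix factorization model

For each tissue, the seven clustering methods yield seven complex matrices V1, V2, ..., V7, where Vi (i = 1, ..., 7) is an N × Pi matrix, N is the number of proteins, and Pi is the number of complexes identified by the i-th method. In Vi, if proteins Ni, Nj, ..., Nk form the e-th complex (1 ≤ e ≤ Pi), then in column e the entries for those proteins are 1 and all other entries are 0. Concatenating the seven matrices horizontally gives V = [V1, V2, ..., V7], a matrix with N rows and P columns. Building V in this way for every tissue gives T matrices.

Non-negative matrix factorization is then used to merge similar protein complexes. It provides a low-rank approximation of a non-negative matrix and has been widely used for clustering [18,19]. Following the non-negative matrix factorization method of Lee and Seung, the model is V ≈ WH, and the update rules yield the matrices W (N × K) and H (K × P). Only W is studied here; it is normalized by rows, U_ik = W_ik / W_i·, where W_i· is the sum of row i. Given U and a filtering threshold τ, protein Ni is a member of complex Kj if U_ij > τ. The algorithm therefore has two parameters: the number of complexes K and the filtering threshold τ. Because complexes mostly consist of three or more proteins, the identified complexes are filtered, and complexes with fewer than 2 proteins are discarded.

Transcript levels for 83 human tissues and cell lines were obtained from the Affymetrix data set of the BioGPS project [20], and human protein-protein interactions were downloaded from the BioGRID website [21], giving 83 tissue-specific protein networks; details of the data processing and network construction are given in [20]. The 26 tissues with more than 10,000 protein pairs were selected for analysis. These 26 tissues or cell types are: BDCA4+ dendritic cells, bronchial epithelial cells, CD105+ endothelial cells, CD19+ B cells, myeloid cells, hematopoietic stem cells, CD4+ T cells, CD56+ natural killer cells, CD71+ early erythroid precursor cells, CD8+ T cells, cardiac myocytes, colorectal adenocarcinoma, chronic myelogenous leukemia K-562, promyelocytic leukemia lymphocytes (MOLT-4), leukemia HL-60, Burkitt lymphoma (Daudi), Burkitt lymphoma (Raji), pineal gland (day), pineal gland (night), prefrontal cortex, retina, prostate, smooth muscle, thyroid, and whole blood.

To measure the accuracy of the detected complexes, a widely used set of reference complexes was chosen as the gold standard. It is derived from the CORUM database of mammalian protein complexes [22] and contains 324 complexes made up of 2,151 proteins; only complexes with more than 3 proteins are used here. The evaluation criterion is how well the predicted complexes match the known complexes. ACC [23] measures geometric accuracy; in this study it assesses the similarity between predicted and reference complexes. MMR (the Maximum Matching Ratio), proposed by Paccanaro, evaluates whether the predicted complexes meet expectations relative to the reference complexes.

MCL has one parameter that tunes cluster granularity, commonly called the inflation rate; it was varied from 3.0 to 5.0 in steps of 0.2. For MCODE the protein-number parameter was set to 3 and the other parameters were left at their defaults; the same holds for MINE. DPClus has two parameters, the minimum density d and the minimum cluster property cp, set to 0.7 and 0.5 respectively. ClusterONE was run with default parameters. CoAch has one parameter, ω, which filters redundant cores; it was varied from 0.225 to 0.925 in steps of 0.05. SPICi has two parameters, of which the density threshold was varied from 0.1 to 1 in steps of 0.1. For each of the seven algorithms, the parameter value that maximizes the harmonic mean of ACC and MMR was selected.

The proposed algorithm has two parameters, K and τ. K is the number of identified protein complexes; following earlier work, it was varied from 600 to 1600 in steps of 200. τ is the filtering threshold, varied from 0 to 0.9 in steps of 0.1. Running the algorithm on the 26 tissues gives Table 1. Across all tissues, performance is generally good when the number of complexes is between 600 and 2000 and the threshold is 0 or 0.1. For reasons of space, parameter analysis is shown for only four tissues: thyroid, B cells, prefrontal cortex, and T cells (Figure 2).

This section compares the proposed algorithm with the seven other algorithms on the protein networks of the 26 tissues or cell types. For the seven base clustering methods, the harmonic mean of ACC and MMR is taken as the final result for each of the 26 tissues. Table 1 shows that the ACC value of the proposed algorithm is the highest in 24 tissues and the second highest in the other two. Comparing the best value achieved by the seven methods in each tissue with that of the proposed method, the largest improvement is in prostate, at 13%. Compared with each of the seven methods individually, the largest percentage improvements are 51.61%, 33.33%, 39.53%, 122.22%, 27.03%, 25.00%, and 27.91%; Figure 3 shows the improvements in detail. MCODE gives the worst results: in all 26 tissues, non-negative matrix factorization improves on it by more than 30%. MCL is next, with improvements of 8% to 40%. ClusterONE performs best and exceeds the proposed algorithm in two tissues, by 1.96% and 3.08%. These results show that the proposed algorithm has an advantage over the seven other methods.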
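Two steps of the method above are simple enough to sketch directly: the rule that keeps an interaction in tissue t only when both partners are expressed there (A_ij = 1, F_it > 0, F_jt > 0), and the row normalization and thresholding of W that turns the factorization into complexes. The following is a minimal sketch under stated assumptions: the function names, the `min_size` argument, and all data are hypothetical, and the factorization itself (obtaining W from V) is not shown.

```python
def tissue_networks(edges, expression):
    """Build one edge set per tissue.

    edges: iterable of (i, j) protein pairs, i.e. the pairs with A_ij = 1.
    expression: dict mapping protein -> list of T expression values (a row of F).
    An edge is kept in tissue t only if both proteins have F > 0 in that tissue.
    """
    n_tissues = len(next(iter(expression.values())))
    networks = []
    for t in range(n_tissues):
        kept = {
            (i, j)
            for i, j in edges
            if expression[i][t] > 0 and expression[j][t] > 0
        }
        networks.append(kept)
    return networks


def complexes_from_w(w_rows, proteins, tau, min_size=2):
    """Row-normalize W (U_ik = W_ik / sum over k of W_ik) and keep U_ik > tau.

    Complexes with fewer than min_size proteins are discarded.
    """
    n_complexes = len(w_rows[0])
    complexes = [[] for _ in range(n_complexes)]
    for protein, row in zip(proteins, w_rows):
        total = sum(row)
        if total == 0:
            continue  # protein not assigned to any complex
        for k, weight in enumerate(row):
            if weight / total > tau:
                complexes[k].append(protein)
    return [c for c in complexes if len(c) >= min_size]


# Hypothetical toy data: three proteins, two tissues.
edges = [("A", "B"), ("B", "C")]
expression = {"A": [1.0, 0.0], "B": [2.0, 3.0], "C": [0.0, 1.5]}
networks = tissue_networks(edges, expression)

# Hypothetical W with K = 2 complexes.
w_rows = [[0.9, 0.1], [0.5, 0.5], [0.0, 1.0]]
complexes = complexes_from_w(w_rows, ["A", "B", "C"], tau=0.3)
print(networks, complexes)
```

Protein B ends up in both complexes, which illustrates how thresholding the normalized rows allows complexes to overlap.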
Tissue-specific protein complexes are important for understanding biological function and for identifying biomarkers and functional targets, which is the motivation for this work. The same protein binds different proteins in different tissues. For example, transportin 1 (TNPO1) binds CD4, PPP3CA, and TNPO3 in dendritic cells, binds SRP19 and TNPO3 in myeloid cells, and forms a complex with IPO5, IPO7, NUTF2, RAN, and SRP19 in smooth muscle. TNPO1 thus binds different partners in different tissues, while TNPO1 and TNPO3 appear together in the same complex in more than one tissue, which is consistent with what is known of cellular processes. In living systems, proteins bind different partners in different tissues, yet many existing models detect complexes directly in a static PPI network and ignore the spatial specificity of protein complexes. This paper builds tissue-specific protein-protein interaction networks for different tissues, merges the results of other clustering methods with a non-negative matrix factorization model, and obtains good results in identifying tissue-specific protein complexes. The work also has limitations. The results are good under the ACC criterion but still need improvement under the MMR criterion, and only one gold-standard set of complexes was used; future work can compare the methods against several gold-standard sets.

[References]
[1] Aebersold R, Mann M. Mass spectrometry-based proteomics[J]. Nature, 2003, 422(6928): 198-207.
[2] Ho Y, Gruhler A, Heilbut A, et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry[J]. Nature, 2002, 415(6868): 180-183.
[3] Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y. A comprehensive two-hybrid analysis to explore the yeast protein interactome[J]. Proceed. National Acad. Sci. United States America, 2001, 98(8): 4569-4574.
[4] Uetz P, Giot L, Cagney G, Mansfield T A, et al. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae[J]. Nature, 2000, 403(6770): 623-627.
[5] Gavin A C, Bösche M, Krause R, et al. Functional organization of the yeast proteome by systematic analysis of protein complexes[J]. Nature, 2002, 415(6868): 141-147.
[6] Lage K, Karlberg E O, Størling Z M, et al. A human phenome-interactome network of protein complexes implicated in genetic disorders[J]. Nature Biotechnology, 2007, 25(3): 309-316.
[7] Lo K, Raftery A E, Dombek K M, et al. Integrating external biological knowledge in the construction of regulatory networks from time-series expression data[J]. BMC Sys. Bio., 2012, 6(2): 101.
[8] Vasmatzis G, Klee E W, Kube D M, Therneau T M, Kosari F. Quantitating tissue specificity of human genes to facilitate biomarker discovery[J]. Bioinformatics, 2007, 23(11): 1348-1355.
[9] Li D, Li J, Ouyang S, Wang J, Wu S, Wan P, Zhu Y, Xu X, He F. Protein interaction networks of Saccharomyces cerevisiae, Caenorhabditis elegans and Drosophila melanogaster: large-scale organization and robustness[J]. Proteomics, 2006, 6(2): 456-461.
[10] Enright A J, Dongen S V, Ouzounis C A. An efficient algorithm for large-scale detection of protein families[J]. Nucleic Acids Res, 2002, 30(7): 1575-1584.
[11] Ji Junzhong, Liu Zhijun, Liu Hongxin, Liu Chunnian. A survey of functional module detection in protein-protein interaction networks[J]. Acta Automatica Sinica (自动化学报), 2014, 40(4): 577-593.
[12] Bader G D, Hogue C W V. An automated method for finding molecular complexes in large protein interaction networks[J]. BMC Bioinformatics, 2003, 4(1): 2.
[13] Rhrissorrakrai K, Gunsalus K C. MINE: module identification in networks[J]. BMC Bioinformatics, 2011, 12(1): 192.
[14] Altaf-Ul-Amin M, Shinbo Y, Mihara K, Kurokawa K, Kanaya S. Development and implementation of an algorithm for detection of protein complexes in large interaction networks[J]. BMC Bioinformatics, 2006, 7(1): 207.
[15] Nepusz T, Yu H, Paccanaro A. Detecting overlapping protein complexes in protein-protein interaction networks[J]. Nature Methods, 2012, 9(5): 471-472.
[16] Wu M, Li X L, Kwoh C K, Ng C K. A core-attachment based method to detect protein complexes in PPI networks[J]. BMC Bioinformatics, 2009, 10(1): 169.
[17] Jiang P, Singh M. SPICi: a fast clustering algorithm for large biological networks[J]. Bioinformatics, 2010, 26(8): 1105-1111.
[18] Lee D D, Seung H S. Learning the parts of objects by non-negative matrix factorization[J]. Nature, 1999, 401(6755): 788-791.
[19] Ding C, He X F, Simon H D. On the equivalence of nonnegative matrix factorization and spectral clustering[J]. Siam Intern. Confer. Data Min., 2005, 5: 606-610.
[20] Lopes T J, Schaefer M, Shoemaker J, Matsuoka Y, Fontaine J F, Neumann G, Andrade-Navarro M A, Kawaoka Y, Kitano H. Tissue-specific subnetworks and characteristics of publicly available human protein interaction databases[J]. Bioinformatics, 2011, 27(17): 2414-2421.
[21] Chatr-aryamontri A, Breitkreutz B J, Heinicke S, et al. The Biogrid interaction database: 2013 update[J]. Nucleic Acids Research, 2013, 41(2): 816-823.
[22] Havugimana P C, Hart G T, Nepusz T, et al. A census of human soluble protein complexes[J]. Cell, 2012, 150(5): 1068-1081.
[23] Li X, Wu M, Kwoh C K, et al. Computational approaches for detecting protein complexes from protein interaction networks: a survey[J]. BMC Genomics, 2010, 11(4): S3.
[24] Ou-Yang L, Dai D Q, Zhang X F. Protein complex detection via weighted ensemble clustering based on bayesian nonnegative matrix factorization[J]. Plos One, 2013, 8(5): 639-642.
[25] Ou-Yang L, Dai D Q, Li X L, Wu M, Zhang X F, Yang P. Detecting temporal protein complexes from dynamic protein-protein interaction networks[J]. BMC Bioinformatics, 2014, 15(1): 16001-16005.
[26] Zhang X F, Dai D Q, Ou-Yang L, Yan H. Detecting overlapping protein complexes based on a generative model with functional and topological properties[J]. BMC Bioinformatics, 2014, 15(2): 836-842.
[27] Zhang W, Zou X F. A new method for detecting protein complexes based on the three node cliques[J]. IEEE/ACM Trans Comput. Biol. Bioinform, 2015, 12(4): 879-886.
[28] Tu Lilan. A new method for pairwise sequence alignment[J]. Journal of Mathematics (数学杂志), 2006, 26(1): 67-70.

Protease: Protein Classes

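A protease is an enzyme that hydrolyzes proteins.

Proteases can be grouped by the substrate they act on, for example:
1. Pepsin: acts mainly in the stomach, where it breaks down dietary protein.

2. Trypsin: produced by the pancreas and active mainly in the small intestine, where it breaks down dietary protein.

3. Papain: obtained mainly from papaya; breaks down proteins.

4. Collagenase: breaks down collagen; found mainly in certain bacteria and animals.

5. Elastase: breaks down elastin; acts mainly in the lungs, pancreas, and intestine.

There are many other proteases besides these, such as the proteases of thermophilic bacteria.

These proteases play important physiological roles in living organisms, taking part in the digestion, metabolism, and degradation of proteins.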

Proteoglycan Expression
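Contents
1. Overview of proteoglycans
2. The process of proteoglycan expression
3. The importance of proteoglycan expression
4. Research progress on proteoglycan expression
5. Conclusion
Main text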
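[1. Overview of proteoglycans]
A proteoglycan is a macromolecular complex composed mainly of protein and carbohydrate, found widely on cell surfaces and in the extracellular matrix.

Proteoglycans have many important biological functions, including cell signaling, cell adhesion, and cell migration.

[2. The process of proteoglycan expression]
Proteoglycan expression involves two main steps. The first is protein synthesis, in which ribosomes translate mRNA into protein. The second is glycosylation, in which the newly made protein is transported to the Golgi apparatus and sugar groups are added there.

This process requires a variety of glycosyltransferases and may be regulated by many factors.

[3. The importance of proteoglycan expression]
Proteoglycan expression is essential for cell function and for the development of the organism.

First, proteoglycans can serve as cell markers and take part in cell-cell interaction and signaling.

Second, proteoglycans can influence cell structure and function, for example cell adhesion and migration.

Finally, abnormal proteoglycan expression can contribute to a range of diseases, such as tumors and arthritis.

[4. Research progress on proteoglycan expression]
As science and technology advance, research on proteoglycan expression continues to deepen.

Many molecular mechanisms that regulate proteoglycan expression have been identified, such as regulation of glycosyltransferase activity and regulation of sugar donors.

Many studies also explore how to treat disease by modulating proteoglycan expression, for example suppressing tumor growth by inhibiting the expression of tumor-associated proteoglycans.

[5. Conclusion]
In short, proteoglycans are important biological macromolecules with major roles in cell function and in the development of the organism.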


arXiv:q-bio/0505006v3 [q-bio.MN] 24 May 2006

YEAST PROTEIN INTERACTOME TOPOLOGY PROVIDES FRAMEWORK FOR COORDINATED-FUNCTIONALITY

André X. C. N. Valente (1,∗), Michael E. Cusick (2)
(1) Biometry Research Group, National Cancer Institute, National Institutes of Health, Dept. of Health and Human Services, Bethesda, MD 20892, USA
(2) Center for Cancer Systems Biology and Dept. of Cancer Biology, Dana-Farber Cancer Institute, and Dept. of Genetics, Harvard Medical School, Boston, MA 02115, USA
∗ To whom correspondence should be addressed; e-mail: ANDRE@

Summary

The architecture of the network of protein-protein physical interactions in Saccharomyces cerevisiae is exposed through the combination of two complementary theoretical network measures, betweenness centrality and 'Q-modularity'. The yeast interactome is characterized by well-defined topological modules connected via a small number of inter-module protein interactions. Should such topological inter-module connections turn out to constitute a form of functional coordination between the modules, we speculate that this coordination is occurring typically in a pair-wise fashion, rather than by way of high-degree hub proteins responsible for coordinating multiple modules. The unique non-hub-centric hierarchical organization of the interactome is not reproduced by gene duplication-and-divergence stochastic growth models that disregard global selective pressures.

Introduction

The set of all physical protein-protein interactions in a cell, the interactome, presents a foundational picture for Biology, sitting at the lowest level of description at which it is possible to have a holistic view of a cell rather than just an isolated study of its individual components. In this article, we make a small contribution to the ongoing effort to understand the global architecture of this fundamental physical network [1, 2, 3, 4, 5, 6, 7, 8, 9, 10].
The interactome can be represented in an abstract way as a network of nodes connected by links, where nodes stand for proteins and links for direct physical interactions between proteins. In recent years, there has been much interest in applying statistical mechanics to the study of such complex networks [11, 12]. However, the validity of such an approach is always conditional on the fundamental assumptions of statistical mechanics being satisfied. For instance, many statistical measures will not be able to distinguish between a network with an intrinsic hierarchical topology and one without it (Figure 1).

[Figure 1 panels: (a) top level: linear string; lower level: scale-free distribution but otherwise random. (b) Scale-free distribution but otherwise random network.]

Figure 1. Hidden intrinsic hierarchy in networks. In this example sketch, network (a) has two hierarchical levels: it consists of a linear string of nodes at the top level that connect to a lower level set of subgraphs possessing scale-free degree distributions, but that are otherwise random. Network (b) has only one hierarchical level, also with a scale-free degree distribution and random in other respects. Arguably, for many applications the difference between topology (a) and topology (b) is of relevance. Yet, an analysis based on common measures such as degree distribution [11, 12], clustering coefficient as a function of degree [11, 44], or degree correlation measures [61], to name a few, would indicate networks (a) and (b) to be topologically identical. This is due to the much larger number of nodes at the lower hierarchical level statistically overwhelming, and therefore hiding, the top level structure.

Results and Discussion

Our approach sidesteps the limitations alluded to in the preceding paragraph and focuses directly on how the interactome topology relates to two broad biological concepts.
The first of these, hierarchical organization, is in this context the notion that there may exist a hierarchy in the role of proteins [10]. On one hand, there are proteins that perform very specific, local functions, relevant only within the context of a particular biological process. On the other hand, some proteins may possess a global, high level role, perhaps acting as mediators of distinct biological processes. To study the topological hierarchy in the interactome, we use the graph theoretical betweenness centrality measure [13, 14, 15] (Supp. Mat.). Betweenness centrality (denoted 'traffic', henceforth) for a node is the total number of shortest paths (between any two other nodes) in the network that pass through that node. A high traffic value for a protein therefore correlates with that protein being topologically central in the interactome.

The second of these is the concept of biological functional modularity [16]. In the context of proteins, at the extreme, this takes the form of protein machines performing specific functions in a cell [7]. More generally, it consists of an expectation that the density of protein-protein interactions will rise as we zoom into an increasingly functionally related set of proteins. To assess modularity in the interactome topology, we use the 'Q-modularity' measure of Newman [17], which assigns a modularity score, Q, to any given partition of the network into modules. The modularity Q is defined as the difference between the ratio (intra-module edges)/(total edges) for the network in question, and the expected value for this ratio if edges in the network were randomized, subject to every node maintaining its original degree. We use the algorithm of Clauset et al. [18] to find an interactome partition into modules that corresponds to a large Q value.

We now explain how to produce an interactome polar map (Figure 2) by combining the information contained in the modularity and traffic analyses.
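The two measures just defined can be illustrated on a toy network. The sketch below is not the authors' implementation: it assumes the standard fractional betweenness of a Brandes-style breadth-first accumulation (which coincides with a plain count of shortest paths whenever shortest paths are unique) and evaluates Q for a given partition rather than searching for a good one as the Clauset et al. algorithm does. All names and the six-node network are hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Betweenness ('traffic') of every node in an unweighted, undirected graph."""
    traffic = {v: 0.0 for v in adj}
    for source in adj:
        order = []                          # nodes in order of discovery
        preds = {v: [] for v in adj}        # predecessors on shortest paths
        n_paths = {v: 0 for v in adj}       # number of shortest paths from source
        dist = {v: -1 for v in adj}
        n_paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):           # accumulate from the farthest nodes back
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                traffic[w] += dependency[w]
    # Every unordered pair was visited from both endpoints.
    return {v: t / 2 for v, t in traffic.items()}


def modularity(adj, module_of):
    """Q = (intra-module edge fraction) - (its expectation with degrees preserved)."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    intra = sum(1 for u in adj for v in adj[u] if module_of[u] == module_of[v]) / 2
    degree_sum = {}
    for u in adj:
        degree_sum[module_of[u]] = degree_sum.get(module_of[u], 0) + len(adj[u])
    expected = sum((d / (2 * m)) ** 2 for d in degree_sum.values())
    return intra / m - expected


# Hypothetical toy network: two triangles joined by the single edge c-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
module_of = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}

traffic = betweenness(adj)
q = modularity(adj, module_of)
print(traffic, q)
```

In this toy case the two bridge nodes c and d carry all the inter-module traffic while the remaining nodes carry none, which is the kind of contrast between peripheral and connecting proteins that the traffic measure is meant to expose.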
The position of every protein in the map is specified in terms of its radial and angular coordinates. The radial coordinate is a function of its traffic [19]. More precisely, it is proportional to log (max traffic/protein's traffic), where 'max traffic' is the maximum node traffic in the network (Supp. Mat.). A logarithmic scale is used due to the long tail of the traffic distribution [20]. The protein angular coordinates are assigned such that all proteins in the same module fall within the same angular range (Supp. Mat.). This way, an interactome map is created where topologically increasingly central proteins are radially increasingly closer to the center of the map, while angular sectors correspond to topological modules. To determine the circular ordering of the modules in the map, we introduce a Ring Ordering Algorithm that, based on the interactome inter-module connectivity, attempts to place closer to each other, to the extent that it is possible, modules that are more topologically related (Supp. Mat.).

We apply this analysis and discuss its implications in the context of the Saccharomyces cerevisiae interactome. The interactome data set we use is the higher confidence 'filtered yeast interactome' (FYI) [10], consisting of interactions supported either by small-scale screens as reported in the MIPS database [21] or by at least two distinct methods from amongst i) high-throughput yeast two-hybrid experiments [22, 23], ii) computational predictions based on gene co-occurrence [24, 25], gene neighborhood [24, 26] or gene fusion [24, 25], iii) high-throughput affinity purification/mass spectrometric protein complex identification experiments [27, 28] and iv) small-scale or module-scale experimental identification of protein complexes as reported in MIPS [21] (Supp. Mat.). We consider only the giant connected component produced by this data set, a network containing 1741 interactions amongst 741 proteins.
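The coordinate assignment described above can be sketched as follows. The radial rule, log (max traffic / traffic), comes from the text; the angular rule used here (sector width proportional to module size, proteins spread evenly within their sector) is only an assumption, since the actual assignment and the Ring Ordering Algorithm are given in the Supplementary Material. The function name and the traffic values are hypothetical, and every traffic value is assumed to be strictly positive.

```python
import math


def polar_coordinates(traffic, module_of, module_order):
    """Return {protein: (radius, angle)} for a polar interactome map.

    radius = log(max traffic / traffic), so the highest-traffic protein sits
    at the center; each module occupies its own angular sector.
    """
    max_traffic = max(traffic.values())
    total = len(traffic)
    coords = {}
    sector_start = 0.0
    for module in module_order:
        members = sorted(p for p in traffic if module_of[p] == module)
        # Assumption: sector width proportional to the number of proteins.
        width = 2 * math.pi * len(members) / total
        for k, protein in enumerate(members):
            radius = math.log(max_traffic / traffic[protein])
            angle = sector_start + width * (k + 0.5) / len(members)
            coords[protein] = (radius, angle)
        sector_start += width
    return coords


# Hypothetical traffic values and module assignment.
traffic = {"p1": 100.0, "p2": 10.0, "p3": 1.0, "p4": 10.0}
module_of = {"p1": "M1", "p2": "M1", "p3": "M2", "p4": "M2"}
coords = polar_coordinates(traffic, module_of, ["M1", "M2"])
print(coords)
```

The order of `module_order` is where a ring-ordering step would plug in: placing related modules next to each other only changes the sequence in which sectors are laid out.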
Our data set choice reflects a desire to bias the data towards a thorough and accurate coverage of a limited region of the interactome as opposed to a wider, but likely shallower and more error prone, sampling. Note that when defining whether two proteins interact, a binary description is being imposed on what ideally would be characterized in terms of an affinity constant. Experimentally, effectively the aim has been for a cut-off that is high enough to exclude indiscriminate low-affinity interactions, such as those that occur between a general protein and proteasomal, ribosomal, or heat shock proteins, since in principle these are less informative interactions [29, 30]. Hence, such interactions are largely absent from data sets.

With the vast majority of the proteins at the periphery of the map and well-defined modules connected through a handful of more central proteins, the yeast interactome polar map (Figure 2) presents what we term a 'coordinated-functionality' architecture. Next, we discuss how this interactome architecture fits in with the biology of the cell. An examination of the MIPS database functional (biological process) annotation of the proteins [21] demonstrates that the topological modules make very good sense as biological functional modules. On average 88.8% of the proteins in a module share a similar function based on their MIPS classification [21] (Supp. Mat.), confirming earlier studies connecting topology and function [31, 32, 33, 34, 35, 36, 37, 38, 39, 40]. Part of the mismatch between the topological modules and current protein functional annotations will likely vanish, once more complete and accurate interactome data sets become available. However, it is the disparities not due to data set limitations that are truly interesting, for those are instances where the functional modules based on the interactome topology are not the ones we currently assign in functional classification schema.
In view of i) the good overall match found and ii) the foundational role of the interactome in the cell, we propose that the interactome topology represents a fundamental source for the division of proteins into functional groups. As such, its modularity analysis provides a rigorous alternative to the currently subjective functional annotation present in protein databases. In accordance, we name the topological modules found so as to reflect their perceived biological roles (Figure 2).

The average degree of essential [41, 42] proteins in the data set is 5.7, while that of nonessential proteins is 3.9, a difference that is too large to be attributable to chance alone [43] (Supp. Mat.). This difference may indeed be biologically meaningful. Although physically knocking out a gene associated with a low or a high degree protein, say of degree 2 or 10 respectively, may be considered equivalent, from a mathematical network perspective it is not. In one case, it involves deleting 1 node and 2 links, in the other it involves deleting 1 node and 10 links. Note that such an explanation would not involve ascribing any out of the ordinary, higher-level role to hubs, the large degree proteins. Alternatively, the observed higher average degree of essential proteins could still stem from lingering systematic biases in the FYI data set.

While the translation of the interactome modular topology into biological functional modularity is straightforward, this is not the case for the interactome topological hierarchical organization. A fundamental question that the traffic analysis gives rise to is how the interactome topological hierarchical organization phenomenologically expresses itself. For instance, do more hierarchically central proteins in fact perform a higher-level, coordinating role in the cell? At present, these are open biological questions.
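One generic way to ask whether a gap in average degree, such as the one reported above between essential and nonessential proteins, could arise by chance is a permutation test. The sketch below is not the procedure used by the authors (theirs is described in the Supplementary Material); the function name and the degree lists are hypothetical.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """One-sided permutation test for mean(group_a) - mean(group_b).

    Returns the observed difference and the fraction of random relabelings
    whose difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = (sum(pooled[:n_a]) / n_a
                - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if diff >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical degree lists, for illustration only.
essential_degrees = [6, 5, 7, 6]
nonessential_degrees = [4, 3, 4, 5]
observed, p_value = permutation_test(essential_degrees, nonessential_degrees,
                                     n_permutations=2000, seed=1)
print(observed, p_value)
```

A small p-value says only that the labels and degrees are associated; it does not separate a biological effect from the data set biases mentioned in the text.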
Figure 2. Saccharomyces cerevisiae interactome polar map. The map is constructed, in an unsupervised manner, based solely on protein-protein interaction data. The module captions (blue text boxes) were manually chosen, a posteriori, to reflect the biological role of each module. The map suggests a 'coordinated-functionality' architecture for the interactome, arguably an ideal framework for the cell to physically implement the concept of distinct, yet coordinated, biological functional modules. This would be a pair-wise-coordination, as inter-module physical interactions occur in a pair-wise fashion: of the 76 proteins that possess inter-modular connections, only 4 connect their module to more than a single other module (TAF25 in module #21 and SRP1 in module #13 have links to four other modules, while NUP1 in module #13 and CLB2 in module #1 have links to two other modules). The map is based on the higher confidence FYI protein-protein interaction data set [10], consisting of interactions validated either through small-scale experiments or through at least two distinct procedures. The giant connected component, shown here, consists of 741 proteins and 1752 protein-protein interactions.

Comparing proteins of equal degree, we find no significant correlation between a protein's traffic and its essentiality [41, 42] (Supp. Mat.), something not too surprising, as knocking out a key protein that renders an essential functional module inoperative is plausibly more damaging than knocking out a protein that mediates two distinct processes that nonetheless can still function independently. An intriguing, though at the moment still unsupported hypothesis is that, if a protein is disrupted, its traffic level correlates with the likelihood of causing non-lethal side effects in multiple areas of the biology of the cell.
Speculating further, perhaps our representation of the interactome can provide clues as to where those side effects may arise, a matter of critical importance in drug development. A different possibility is that, for some of these module-connecting proteins, interacting with multiple modules is not a sign of a role in coordinating the functionality of the modules, but rather just a result of the protein being independently used in those modules. In opposition to a true functional 'connector', we would call such a protein a 'bolt' (alternatively, 'widget'), in reference to how, analogously, a mechanical bolt can be used in multiple functional modules of a human engineered machine, while playing no role in coordinating their functionality.

Finally, note that a priori the observed 47 inter-module interactions are particularly susceptible to being false positives, because a false-positive interaction between two random unrelated proteins is likely to result in an inter-module, high traffic interaction. However, significantly, 45 out of these 47 inter-module interactions belong to the set of interactions supported by small-scale targeted experiments, and arguably it is not very likely that a false-positive interaction of the type just described would go unnoticed in a targeted experiment and further make it into the peer-reviewed literature. Of the remaining 2 inter-module interactions, one is reported in the Ito et al. [23] and Uetz et al. [22] high-throughput yeast two-hybrid data sets as well as in the MIPS data set of protein complexes identified via small or module scale experiments [21], while the other is reported in the Gavin et al. [27] high-throughput protein complex identification study and again in the Uetz et al. [22] yeast two-hybrid data set.
Out of the 45 supported by small-scale experiments, 4 are also reported in a high-throughput yeast two-hybrid study [22, 23], 3 in a high-throughput protein complex identification study [27, 28] and 2 in the MIPS data set of protein complexes identified via small or module-scale experiments [21]. Whatever biological role central proteins turn out to play, we submit that they call for further experimental investigation, given their unique topological placement in the interactome.

Having hierarchically classified the proteins with the traffic measure, we are now in a position to consider network degree distribution related questions without falling prey to previously noted statistical problems (Figure 1). Of particular interest is how degree changes as one moves hierarchically across the interactome [14] (Figure 3a). Surprisingly, nodes of different degree are rather homogeneously hierarchically spread across the interactome: note the large spread between the green, red and blue curves relative to their small positive slopes; or, for a more quantifiable attribute, how the 10% of nodes with the largest degree in the periphery of the interactome have a significantly larger degree than the average degree at the center of the interactome. Thus, the interactome is not hierarchically stratified by degree. In particular, the interactome has a non-hub-centric hierarchical organization. Further, note that should the inter-module protein interactions indeed represent a form of functional coordination, then this coordination is apparently occurring overwhelmingly in a pair-wise fashion: out of the 76 proteins that possess links to modules outside their own, only 4 connect their home module to more than one other module (TAF25 and SRP1 have links to 4 other modules, NUP1 and CLB2 to 2 other modules). The other 72 proteins connect their home module to a single other module (Supp. Mat.).
It is also noteworthy that, amongst these 76 connecting proteins, the higher degree ones in general belong clearly in their assigned home modules. Specifically, for connecting proteins of degree 4 or higher, let us exclude the 2 interactions that each of these proteins must have by default, one connecting it to its home module and one to a linked module; then 95.3% of the remaining interactions of connecting proteins are with the protein's respective home module (Supp. Mat.). This pair-wise-coordination is in sharp contrast with the picture of a hub protein connecting and mediating multiple modules [11, 43, 44].

The non-hub-centric organization runs contrary to a number of network growth models that have been proposed to explain the topology of the interactome [11, 45, 46, 47, 48]. The models are based on evolution by stochastic gene duplication and divergence [49, 50]. Amongst other reproduced statistics, the models are able to generate the power-law degree distribution observed in early interactome data sets [51]. Since these models do not appeal to evolutionary selection pressures, a major conclusion taken from their success was that natural selection is not required to reproduce the global structure of the interactome; instead, stochastic gene duplication and divergence suffices to give rise to that topology [45, 47]. However, here we report that these gene duplication models lead to hierarchically hub-centric networks. In Figure 3b, we show data pertaining to an interactome built using the model of Pastor-Satorras et al. [45]. By comparison with the same plot for the yeast interactome, this time the network is clearly stratified by degree, with the larger degree nodes concentrated at the hierarchical center of the network. Now it is the average degree at the center of the network that is significantly larger than the average degree of the 10% of nodes with largest degree at the periphery.
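To make the class of growth models discussed above concrete, here is a minimal sketch of a generic duplication-and-divergence process. It is a simplified illustration, not a reproduction of the specific models in [45, 46, 47]: at each step a random protein is duplicated, each inherited interaction is lost with probability `delta`, and new interactions to arbitrary proteins are gained with probability `alpha / N`. The function name, parameter names, and values are assumptions.

```python
import random


def duplication_divergence(n_final, delta, alpha, seed=0):
    """Grow an undirected network by node duplication followed by divergence.

    delta: probability that the copy loses each interaction inherited from its parent.
    alpha: expected number of brand-new interactions gained by the copy
           (each existing node is linked with probability alpha / N).
    Returns an adjacency dict {node: set of neighbors}.
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}                 # seed network: one interacting pair
    while len(adj) < n_final:
        n = len(adj)
        new_node = n
        parent = rng.randrange(n)
        # Divergence: keep each inherited link with probability 1 - delta.
        inherited = {v for v in adj[parent] if rng.random() >= delta}
        # Gain of new, unrelated links.
        gained = {v for v in adj if rng.random() < alpha / n}
        adj[new_node] = set()
        for v in inherited | gained:
            adj[new_node].add(v)
            adj[v].add(new_node)
    return adj


# Hypothetical parameter values, for illustration only.
network = duplication_divergence(n_final=50, delta=0.6, alpha=0.2, seed=3)
degrees = sorted((len(nbrs) for nbrs in network.values()), reverse=True)
print(degrees[:5])
```

Plotting average degree against traffic for networks grown this way, as in Figure 3b, would be the natural next step for comparing such a model with the yeast data.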
The models of Vázquez [46] (a slightly different implementation of the gene duplication and divergence process), and of Wagner [47] (emphasizing a continuous divergence in the form of gain and loss of interactions amongst existing proteins) produce similar hub-centric networks (Supp. Mat.). In summary, there are at least three possible explanations for the non-hub-centric hierarchy we observe in yeast: i) it is a spurious effect associated with the limitations of existing data and the interactome is in fact hub-centric; ii) there is some crucial feature of the gene duplication and divergence mechanics that is not understood and/or is not captured by current models; or iii) the non-hub-centric hierarchy is in fact shaped by natural selection pressures on the global interactome structure. Our study is based on the present-day knowledge of the yeast interactome, which is still rather deficient [52, 53, 54, 55, 56, 57, 58]. We minimized false-positives by using the higher confidence yeast FYI data set as the source for our study. The good correspondence between the topological modular breakdown of the interactome and the known functional annotation of proteins corroborates that false-positives are not an overriding problem in this data set. Nonetheless, we repeated the interactome analysis using a data set of interactions reported in the MIPS database that are validated through small-scale screenings, the most reliable source of data [55] (giant component: 392 proteins, 675 interactions. Supp. Mat.).
The correspondence between topological and functional modules (now on average 92% of the proteins in a module shared a similar function based on their MIPS classification), as well as the reported non-hub-centric hierarchy, were again supported by this small-scale data set (Supp. Mat.). Regarding the limited coverage of our data sets (FYI ≈ 12% coverage, small-scale ≈ 7% coverage, assuming ≈ 6000 proteins in yeast [23]), it is of note that the doubling in size of the network, going from the small-scale to the FYI data set, did not dilute the observed non-hub-centric topology nor the pair-wise inter-module connectivity pattern. In fact, the FYI data set produces an even slightly less hub-centric interactome than the small-scale data set (Supp. Mat.). Likewise, the pair-wise inter-module connectivity pattern is no less present in the FYI than in the small-scale data set (where out of the 47 proteins with inter-module links, 4 connect their home module to more than one other module. Supp. Mat.).

FIGURE 3. A non-hub-centric hierarchical organization. [Plot panels: (a) Yeast Interactome; (b) Stochastic growth model. x-axis: Log (betweenness centrality) [log(traffic)]; y-axis: Average degree; curves: <top 10%>, <all nodes in bin>, <lowest 10%>; bin size = 0.6.] In the yeast interactome (a), the degree distribution does not change greatly as one moves from the periphery to the center of the Figure 2 polar map (i.e., as one moves from a low to a high traffic region). In other words, hubs are not hierarchically central in the yeast interactome. In contrast, gene duplication and divergence interactome stochastic growth models [45, 46, 47] produce hub-centric interactomes, where the average degree markedly increases with traffic and the hierarchical center of the interactome is therefore dominated by hubs. (a) Analysis for the yeast interactome giant connected component, based on the FYI data set (741 proteins, 1752 interactions) [10]. Red curve - the average degree for the set of nodes whose log (traffic) value falls within 0.3 of the log (traffic) value indicated in the x-axis. That is, a log (traffic) bin of size 0.6 is continuously slid along the log (traffic) axis and the average degree for all the nodes that fall within the bin is calculated. The last bin also includes all nodes with a log (traffic) value larger than the range shown in the figure. The bins cover the entire data set, with every bin containing at least 26 nodes. The first bin contains 417 nodes. The last bin contains 37 nodes. Green curve - similar to the red curve, except this time the degree average is done only over the 10% largest degree nodes in the bin. Blue curve - similar to green curve, but this time averaging over the 10% lowest degree nodes in the bin. The average degree in the highest traffic bin (rightmost data point in the red curve) is only 0.42 times the average degree of the 10% largest degree nodes in the lowest traffic bin (leftmost data point in the green curve). (b) Corresponding plots for the giant component of an interactome evolved under the gene duplication and divergence stochastic growth model of Pastor-Satorras et al. [45]. In this case, the giant component contains 759 nodes and 1542 interactions. Every bin contains at least 26 nodes. The first and last bins contain 432 and 40 nodes, respectively. The average degree in the highest traffic bin is now 4.6 times larger than the average degree of the 10% largest degree nodes in the lowest traffic bin. Similar results were achieved under multiple trials, model parameters and gene duplication growth models (Supp. Mat.).
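The sliding-bin analysis described in the Figure 3 caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: betweenness is computed with Brandes' algorithm for unweighted graphs, and the conversion to "traffic" (adding n - 1 so that paths starting or ending at a node are also counted, which keeps the logarithm defined) is a guess at the traffic definition. The function names and the star-shaped toy network are hypothetical.

```python
import math
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)      # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:                       # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                       # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}   # each pair was counted twice

def bin_stats(adj, traffic, center, half_width=0.3, frac=0.1):
    """Mean degree of all, top-frac and bottom-frac nodes in one log(traffic) bin."""
    degs = sorted(len(adj[v]) for v in adj
                  if abs(math.log(traffic[v]) - center) <= half_width)
    if not degs:
        return None
    k = max(1, round(frac * len(degs)))
    mean = lambda xs: sum(xs) / len(xs)
    return mean(degs), mean(degs[-k:]), mean(degs[:k])

# Toy star network (hypothetical): hub 0 connected to leaves 1..4.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
bc = betweenness(star)
# Assumed traffic definition: betweenness plus paths ending at the node.
traffic = {v: bc[v] + len(star) - 1 for v in star}
low_bin = bin_stats(star, traffic, center=math.log(4))
high_bin = bin_stats(star, traffic, center=math.log(10))
print(low_bin, high_bin)
```

The star is the extreme hub-centric case: the only high-traffic node is the hub, so the average degree jumps from the low-traffic bin to the high-traffic bin. Sliding `center` along the log (traffic) axis in small steps would trace out the three curves of the figure.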
The observation that higher degree connecting proteins are in general strongly attached to their home modules is equally confirmed in both data sets (the earlier mentioned 95.3% of home module interactions for the FYI higher degree connecting proteins now becomes 95.7% in the small-scale data set). Still, it is important to bear in mind the limitations of current data sets. For instance, regarding the typical pair-wise coordination, the possibility that this observation is only the result of a high number of false-negatives for inter-module protein interactions cannot be ruled out. Note, for example, how in the map there are no interactions between the translation initiation module and the ribosomal subunit modules, even though such interactions must certainly exist. Ultimately, only the generation of more accurate and comprehensive interactome data sets can unequivocally confirm or disprove some of the results and hypotheses put forward in this article [52, 53]. So far our analysis has focused on the global interactome topology. Now we would also like to highlight its potential as a framework for exploiting the wealth of interactome data. The interactome can form a valuable platform for crystallizing biological thought. We briefly introduce two relevant extensions to our work. First, one may zoom into a module of interest in the interactome and locally repeat the analysis, producing a single module polar map (Figures in Supp. Mat.). Such a local map provides one with a starting point to discuss the biology of the process under study, interpret and design experiments, and generate new biological hypotheses. Second, we note that the entire proteome is rarely, if ever, evenly expressed by the cell [49]. Therefore, perhaps the interactome is best viewed as a potential network at the cell's disposal, with different parts of it being turned on and off to different degrees, as biologically required.
Integrating mRNA expression data with the interactome polar map [59] (Figure 4) permits a proper, unified analysis of this dynamical network. The interactome represents an elementary abstraction of the multitude of complex biochemical interactions taking place in the rich physiological environment of the cell. In the trade-off between simplicity and realism, arguably some facts may be beneficially incorporated in future interactome models: for instance, protein interactions vary within a continuum of binding affinities, and post-translational modifications as well as allosteric interactions effectively change the possible binding partners of a protein, to name a few of the more prominent omissions at present [49]. We end by noting that the organization of a network through a procedure akin to the one used in this paper may also be of relevance to the problem of network motif finding [60], as certain motifs may turn out to occur sparsely overall and yet be statistically significant in specific regions of the network (for example, only in the high traffic central area, or only in a particular module).

Acknowledgments - A. Valente thanks above all R. Fagerstrom, but also A. Sarkar, S. Milstein, P.-O. Vidalain, M. Drezje, J. Han, S. Tee, K.-H. Lin and M. Boxem for comments and support, as well as the entire Vidal lab at Dana-Farber for its hospitality during part of the year. Finally, I am indebted to Howard Stone at Harvard University, under whose orientation I initiated this work, and to Phil Prorok, Gary Gao and Xiaoxia Lin for their unconditional support at a critical stage of this work. This work was supported by DFCI Sponsored Research.

REFERENCES
[1] Eisenberg D, Marcotte EM, Xenarios I, Yeates TO (2000) Protein function in the post-genomic era. Nature 405: 823-826.
[2] Lu H, et al. (2004) The interactome as a tree - an attempt to visualize the protein-protein interaction network in yeast. Nucleic Acids Res 32: 4804-4811.
[3] Vidal M (2001) A biological atlas of functional maps. Cell 104: 333-339.
[4] Bader GD, et al. (2003) Functional genomics and proteomics: charting a multidimensional map of the yeast cell. Trends Cell Biol 13: 344-356.
[5] Sharom JR, Bellows DS, Tyers M (2004) From large networks to small molecules. Curr Opin Chem Biol 8: 81-90.
[6] Tucker CL, Gera JF, Uetz P (2001) Towards an understanding of complex protein networks. Trends Cell Biol 11: 102-106.
[7] Alberts B (1998) The cell as a collection of protein machines: preparing the next generation of molecular biologists. Cell 92: 291-294.
[8] Bork P, et al. (2004) Protein interaction networks from yeast to human. Curr Opin Struc Biol 14: 292-299.
[9] Gavin A-C, Superti-Furga G (2003) Protein complexes and proteome organization from yeast to man. Curr Opin Chem Biol 7: 21-27.
[10] Han JDJ, et al. (2004) Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature 430: 88-93.
[11] Barabási A-L, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nature Rev Genet 112: 101-114.
[12] Newman MEJ (2003) The structure and function of complex networks. SIAM Rev 45: 167-256.
[13] Freeman L (1977) A set of measures of centrality based upon betweenness. Sociometry 40: 35-41.
[14] Joy MP, Brock A, Ingber DE, Huang S (2005) High betweenness proteins in the yeast protein interaction network. J Biomed Biotechnol 2: 96-103.
[15] Hahn MW, Kern AD (2005) Comparative genomics of centrality and essentiality in three eukaryotic protein interaction networks. Mol Biol Evol 22: 803-806.
[16] Hartwell LH, Hopfield JJ, Leibler S, Murray AW (1999) From molecular to modular cell biology. Nature 402: C47-C51.
[17] Newman MEJ (2004) Fast algorithm for detecting community structure in networks. Phys Rev E 69: art. no. 066133.
[18] Clauset A, Newman MEJ, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70: art. no. 066111.
[19] Brandes U (2003) Communicating centrality in policy network drawings. IEEE T Vis Comput Gr 9: 241-253.
[20] Goh K-I, Kahng B, Kim D (2001) Universal behavior of load distribution in scale-free networks. Phys Rev Lett 87: art. no. 278701.
[21] Mewes HW, et al. (2002) MIPS: a database for genomes and protein sequences. Nucleic Acids Res 30: 31-34.
[22] Uetz P, et al. (2000) A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 403: 623-627.
