Molecular phylogeny, long-term evolution, and functional divergence of flavin-containing monooxygenases
Da Cheng Hao · Shi Lin Chen · Jun Mu · Pei Gen Xiao
Received: 26 November 2008 / Accepted: 23 June 2009 / Published online: 5 July 2009 © Springer Science+Business Media B.V. 2009
Abstract Flavin-containing monooxygenases (FMOs) metabolize xenobiotic compounds, many of which are clinically important, as well as endogenous substrates as part of a discrete physiological process. The FMO gene family is conserved and ancient, with representatives present in all phyla so far examined. However, there is a lack of information regarding the long-term evolution and functional divergence of these proteins. This study represents the first attempt to characterize the long-term evolution followed by the members of this family. Our analysis shows that there is extensive silent divergence at the nucleotide level, suggesting that this family has been subject to strong purifying selection at the protein level. Invertebrate FMOs have a polyphyletic origin. The functional divergence of FMOs 1–5 started before the split between amphibians and mammals. The vertebrate FMO5 is more ancestral than the other four FMOs. Moreover, higher levels of codon bias were detected at the N-terminal ends, which can be ascribed to the critical role played by the FAD binding motif in this region. Finally, critical amino acid residues for FMO functional divergence (type I and II) after gene duplication were detected and characterized.
Keywords Flavin-containing monooxygenase · Molecular evolution · Purifying selection · Gene duplication · Functional divergence · Molecular phylogeny
Introduction
Flavin-containing monooxygenase (FMO) oxygenates drugs and xenobiotics containing a "soft-nucleophile", usually nitrogen or sulfur (Cashman and Zhang 2006; Krueger and Williams 2005). FMO, like cytochrome P450 (CYP), is a monooxygenase, utilizing the reducing equivalents of NADPH to reduce one atom of molecular oxygen to water, while the other atom is used to oxidize the substrate. FMO and CYP also exhibit similar tissue and cellular location, molecular weight, and substrate specificity, and both exist as multiple enzymes under developmental control. The mammalian FMO functional gene family is much smaller (five families, each with a single member) than CYP. FMO does not require a reductase to transfer electrons from NADPH, and the catalytic cycles of the two monooxygenases are remarkably different. Another distinction is the lack of induction of FMOs by xenobiotics. FMOs are as important as CYPs as major contributors to oxidative xenobiotic metabolism. In addition, FMOs metabolize specific endogenous substrates as part of a discrete physiological process. FMO and CYP have overlapping substrate specificities, but often yield distinct metabolites with potentially significant toxicological/pharmacological consequences. All five expressed mammalian FMO genes, FMO1 to FMO5, exhibit genetic polymorphisms.
Electronic supplementary material: The online version of this article (doi:10.1007/s10709-009-9382-y) contains supplementary material, which is available to authorized users.
D. C. Hao · J. Mu
Laboratory of Biotechnology, Dalian Jiaotong University, 116028 Dalian, China
D. C. Hao
e-mail: hao@
D. C. Hao · S. L. Chen (✉) · P. G. Xiao
Chinese Academy of Medical Sciences, Peking Union Medical College, 100193 Beijing, China
e-mail: slchen@
Genetica (2009) 137:173–187
DOI 10.1007/s10709-009-9382-y
Eswaramoorthy et al. (2006) analyzed the functional mechanism of FMO from Schizosaccharomyces pombe using the crystal structures of the wild type and of protein-cofactor and protein-substrate complexes. The FMO of S. pombe (447 aa) is composed of two structural domains. Residues 176–291 form a small structural domain (insertion domain, ID), with the remainder of the polypeptide chain forming a larger single domain consisting of the N-terminal region and the C-terminal region. A channel is present between these two domains. A 60-residue-long polypeptide chain segment in a predominantly random coil configuration occurs at the interface between the two domains, where it appears to stabilize the overall domain organization. The structure of the wild-type FMO revealed that the prosthetic group FAD is an integral part of the protein. FMO needs NADPH as a cofactor in addition to the prosthetic group for its catalytic activity. It was proposed that FMOs exist in the cell as a complex with a reduced form of the prosthetic group and the NADPH cofactor, readying them to act on substrates. The 4a-hydroperoxyflavin form of the prosthetic group represents a transient intermediate of the monooxygenation process. The oxygenated and reduced forms of the prosthetic group help stabilize interactions with cofactor and substrate alternately to permit continuous enzyme turnover. Moreover, the X-ray structure of a soluble prokaryotic FMO from Methylophaga sp. strain SK1 has been solved at 2.6-Å resolution and is now the protein of known structure with the highest sequence similarity to human FMOs (Alfieri et al. 2008). The structure, resembling that of S. pombe FMO, possesses a two-domain architecture, with both FAD and NADP+ well defined by the electron density maps. Biochemical analysis shows that the prokaryotic enzyme shares many functional properties with mammalian FMOs, including substrate specificity and the ability to stabilize the hydroperoxyflavin intermediate that is crucial in substrate oxygenation.
The developmental and tissue-specific expression of FMO enzymes has been previously characterized in a number of animal species, including humans, mice, rats, and rabbits (Hines et al. 1994). Zhang and Cashman (2006) used real-time reverse transcription-PCR to systematically quantify the steady-state mRNA levels of FMOs 1–5 in human tissues. A comparison between fetal liver and adult liver showed that FMO1 was the only FMO that was down-regulated; all other FMOs had greater amounts of mRNA in adult liver. FMO5 was the most prominent FMO form detected in fetal liver. The FMO5 mRNA level was nearly as abundant as that of FMO3 in adult liver. Whereas other FMOs displayed a significant, dominant tissue-specific mRNA profile, FMO4 mRNA was observed more broadly at relatively comparable levels in liver, kidney, lung, and small intestine.
The most studied of the five mammalian FMOs is FMO3, for which mutant alleles contribute to the human disease known as trimethylaminuria (TMAU). Affected individuals are unable to catalyze the N-oxidation of dietary-derived trimethylamine (TMA), a substrate of FMO3. As a consequence of this metabolic deficiency, TMA is excreted in the breath, sweat and urine, imparting a bodily odor reminiscent of rotting fish (Mitchell and Smith 2001). A similar phenotype exists in cattle, in which a nonsense mutation in the bovine orthologue causes a fishy off-flavor in cow's milk (Lunden et al. 2002), and off-flavor in pork is associated with the FMO3 polymorphism (Glenn et al. 2007). Honkatukia et al. (2005) reported the mapping of a similar disorder (fishy taint of eggs) and the chicken FMO3 gene to chicken chromosome 8. The only nonsynonymous mutation identified in the chicken FMO3 gene (T329S) changes an evolutionarily highly conserved amino acid and is associated with elevated levels of TMA and fishy taint in the egg yolk. These results support the importance of the evolutionarily conserved motif FATGY of the insertion domain, which has been speculated to be a substrate recognition pocket of FMOs. Allerston et al. (2007) provided evidence that FMO3 has been the subject of balancing selection and identified mutations in the 5′-flanking region (e.g., −3,548, −2,650, and −3,549) and the coding region (E158K and E308G of the insertion domain) that are potential targets for selection. Hao et al. (2006) found that sites 158 and 257 were in significant linkage disequilibrium in both Han Chinese and African-American populations. Interestingly, combinations of certain polymorphic variants can have a prominent effect on FMO3 activity (Park et al. 2002; Lattard et al. 2003), and in some cases give rise to mild or transient forms of TMAU (Zschocke et al. 1999).
Earlier reports of characteristic FMO activities in a range of organisms, including bacteria (Boulton et al. 1974), fungi (Suh et al. 1999), protozoa (Agosin and Ankley 1987), marine invertebrates (Schlenk and Buhler 1989), insects (Naumann et al. 2002), and sharks and teleost fish (Schlenk 1993), indicated that FMO genes are ancient and conserved. The subsequent completion of various genome projects confirmed such universality, with FMO homologs identified in essentially all bacterial, fungal, animal and plant genomes. The diversification of FMO proteins during animal evolution must have been determined by the presence of different structural and functional constraints acting on these proteins. In this study we take advantage of the molecular data currently available for FMO proteins of different taxonomic groups to analyze their long-term evolution and functional divergence within a phylogenetic framework. Particular attention was paid to the relative importance of the functional and structural constraints acting at the protein and nucleotide level.
Materials and methods
A total of 104 nucleotide coding sequences belonging to 34 different species of metazoans were used in our analyses (supplemental datasets S1 and S2). These include 11 FMO1, 12 FMO2, 11 FMO3, 13 FMO4, 21 FMO5, 8 FMO from other vertebrates, 11 FMO from fish, 15 FMO-like from invertebrates, and two FMO from yeasts. Complete sequences retrieved from the GenBank and Ensembl databases were subsequently aligned on the basis of their translated amino acid sequences using the CLUSTAL_W and BIOEDIT programs (Hall 1999) with the default parameters. The alignment of the complete set of sequences consisted of 1969 nucleotide positions (excluding the start and stop codons) corresponding to 675 amino acid sites. The independent alignments for each of the seven FMO lineages are available upon request from the corresponding author and in all cases were checked for errors by visual inspection. The distinction between the N-terminal, the insertion domain (ID), and the C-terminal regions of FMO proteins was established on the basis of the tertiary protein structure of S. pombe and the alignment of these sequences with that of the yeast FMO (see the Fig. 2 legend for details).
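Aligning coding sequences "on the basis of their translated amino acid sequences" amounts to threading each ungapped coding sequence onto its gapped protein alignment row, so that every alignment gap becomes a three-nucleotide gap. The following is a minimal sketch of that step, not the procedure implemented in CLUSTAL_W or BIOEDIT; the function name `thread_codons` and the gap character are assumptions made for illustration.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Project a gapped protein row onto its ungapped coding sequence.

    Hypothetical helper: each residue consumes one codon of `cds`,
    and each protein gap is written as a three-nucleotide gap.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of three")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if n_residues != len(codons):
        raise ValueError("number of codons does not match number of residues")

    codon_iter = iter(codons)
    pieces = []
    for aa in aligned_protein:
        # A gap column in the protein alignment spans three nucleotide columns.
        pieces.append(gap * 3 if aa == gap else next(codon_iter))
    return "".join(pieces)


if __name__ == "__main__":
    # Toy input, not taken from the FMO datasets.
    print(thread_codons("M-K", "ATGAAA"))
```

The sketch assumes the start-to-stop reading frame is intact and that stop codons have already been removed, as described for the alignment above.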
All molecular evolutionary analyses in this work were carried out using the program MEGA4 (Tamura et al. 2007). The extent of nucleotide and amino acid divergence between sequences was estimated by means of the uncorrected differences (p-distance). The best-fit evolutionary model and the gamma shape parameter of among-site rate variation were inferred with ModelTest 3.8 (Posada 2006); the latter was used to calculate the transition/transversion ratio (R). The numbers of synonymous (p_S) and nonsynonymous (p_N) nucleotide differences per site were computed using the modified Nei–Gojobori method (Zhang et al. 1998), providing R in both cases. Distances were estimated using the pairwise-deletion option (which was also used in the protein phylogenetic tree reconstructions) and standard errors were calculated by the bootstrap method with 1,000 replicates. The presence and nature of selection was tested in FMO genes by using the codon-based Z-test for selection, establishing the alternative hypothesis as H1: p_N ≠ p_S and the null hypothesis as H0: p_N = p_S. The Z-statistic and the probability that the null hypothesis is rejected were obtained, and significance levels were indicated as **P (P < 0.001) or ***P (P < 0.01).
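The two quantities described above can be sketched in a few lines: the uncorrected p-distance under pairwise deletion, and a Z-statistic built from synonymous and nonsynonymous estimates and their bootstrap standard errors. This is an illustrative sketch rather than MEGA4's implementation; the function names, the gap characters, the sign convention (positive Z when p_S exceeds p_N), and the normal approximation used for the two-sided probability are all assumptions.

```python
import math


def p_distance(seq_a: str, seq_b: str, gap_chars: str = "-?") -> float:
    """Proportion of differing sites, skipping sites gapped in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue  # pairwise deletion: ignore this site for this pair only
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def z_statistic(p_s: float, se_s: float, p_n: float, se_n: float) -> float:
    """Z = (p_S - p_N) / sqrt(SE_S^2 + SE_N^2); positive when p_S > p_N."""
    return (p_s - p_n) / math.sqrt(se_s ** 2 + se_n ** 2)


def two_sided_p(z: float) -> float:
    """Two-sided probability under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the functions.
    z = z_statistic(p_s=0.50, se_s=0.03, p_n=0.10, se_n=0.04)
    print(p_distance("ACGT-A", "ACCTTA"), z, two_sided_p(z))
```

In practice the standard errors would come from the 1,000 bootstrap replicates mentioned above; here they are passed in directly.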
The presence of selection in the seven main FMO lineages (1–5, fish, and inv) was further studied by testing for deviations from neutrality. The GC content at fourfold degenerate sites was assumed to represent the genomic GC content and was considered as an approximation to the neutral expectation. The influence of selection on certain amino acids was analyzed by determining the correlation between the genomic GC content and the proportion of GC-rich (GAPR) and GC-poor (FYMINK) residues. Under the neutral model, GC-rich and GC-poor amino acids will be positively and negatively correlated with genomic GC content, respectively (Kimura 1983). If the frequency of these amino acids is influenced by selection, no correlation between genomic GC content and amino acid frequency would be expected (Rooney 2003). Correlations were computed for complete sequences and for the N-terminal, ID, and C-terminal segments separately by using the Spearman rank correlation coefficient.
For amino acid sequences, the neighbor-joining (NJ) method (Saitou and Nei 1987) was used to reconstruct the phylogenetic trees. The best model, JTT+G (gamma shape parameter 2.826), was identified by using ProtTest (Abascal et al. 2005). To verify that our results did not depend on this choice, phylogenetic inference analyses were also completed by (1) the reconstruction of a maximum-parsimony (MP) tree (PAUP* 4.0b10; Swofford 2002) using the tree-bisection-reconnection (TBR) branch swapping algorithm with ten replications for the random addition trees option, and (2) the reconstruction of a Bayesian tree (MrBayes 3.1.2; Ronquist and Huelsenbeck 2003) with four Markov chain Monte Carlo chains run for one million generations. For nucleotide sequences, Bayesian analysis and maximum likelihood (ML) methods (GARLI; Zwickl 2006) were used to infer phylogenetic trees. The best model, GTR+I+G, was selected by ModelTest 3.8. Bayesian probabilities were obtained under this model, with four Markov chain Monte Carlo chains run for four million generations, using random trees as the starting point and sampling every 500th generation. To test the reliability of the obtained topologies, the bootstrap probability (BP) and the posterior probability (PP) values were produced for each internal branch, assuming BP ≥ 80% and PP ≥ 95% as statistically significant. S. pombe FMO (Eswaramoorthy et al. 2006) and the FMO from Saccharomyces cerevisiae (Zhang and Robertus 2002) were assigned as outgroups in the reconstructions.
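The core of the neighbor-joining method is a repeated selection step: from a distance matrix, compute the criterion Q(i, j) = (n − 2)·d(i, j) − Σ_k d(i, k) − Σ_k d(j, k), join the pair with the smallest Q, and assign branch lengths to the new node. The sketch below shows a single such step only; it is an illustration of the criterion of Saitou and Nei, not the MEGA4 implementation, and the function name and toy matrix are assumptions.

```python
def nj_pick_pair(d):
    """Return (i, j, length_i, length_j) for one neighbor-joining step.

    `d` is a symmetric distance matrix (list of lists) with n >= 3 taxa.
    """
    n = len(d)
    if n < 3:
        raise ValueError("need at least three taxa")
    row_sums = [sum(row) for row in d]

    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best

    # Branch lengths from each joined taxon to the new internal node.
    length_i = d[i][j] / 2 + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
    length_j = d[i][j] - length_i
    return i, j, length_i, length_j


if __name__ == "__main__":
    # Toy five-taxon matrix, not derived from the FMO data.
    toy = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    print(nj_pick_pair(toy))
```

A full reconstruction would replace the joined pair by the new node, update the matrix, and repeat until three taxa remain; bootstrap support is then obtained by rerunning the procedure on resampled alignments.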
The analysis of the nucleotide variation across different FMO coding regions was performed using a sliding-window approach by estimating the total (π) and the synonymous (π_S) nucleotide diversity (average number of nucleotide differences per site between two sequences) with a window length of 20 bp and a step size of 5 bp (for π) and a window length of 10 bp and a step size of 5 bp (for π_S). The codon usage bias in FMO genes was estimated as the effective number of codons (ENC) (Wright 1990), where the highest value (61) indicates that all synonymous codons are used equally (no bias) and the lowest (20) that only a preferred codon is used in each synonymous class (extreme bias). Both analyses were conducted with the program DnaSP v. 4.10 (Rozas et al. 2003).
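The sliding-window estimate of total nucleotide diversity can be sketched as the mean pairwise proportion of differing sites inside each window. This is a simplified illustration, not DnaSP's algorithm: gapped sites are skipped pair by pair (DnaSP's gap handling may differ), synonymous diversity is not computed, and the function name and return format are assumptions.

```python
from itertools import combinations


def window_diversity(alignment, window=20, step=5, gap="-"):
    """Return a list of (first_site, last_site, diversity) per window.

    Diversity is the average, over all sequence pairs, of the proportion
    of differing sites within the window. Sites are numbered from 1.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    pairs = list(combinations(alignment, 2))
    if not pairs:
        raise ValueError("need at least two sequences")

    results = []
    for start in range(0, length - window + 1, step):
        end = start + window
        per_pair = []
        for a, b in pairs:
            sites = [(x, y) for x, y in zip(a[start:end], b[start:end])
                     if x != gap and y != gap]
            if sites:
                per_pair.append(sum(x != y for x, y in sites) / len(sites))
        value = sum(per_pair) / len(per_pair) if per_pair else float("nan")
        results.append((start + 1, end, value))
    return results


if __name__ == "__main__":
    # Toy alignment, not taken from the FMO datasets.
    toy = ["AAAAAAAAAA", "AAAAATAAAA", "AAAAAAAAAA"]
    for first, last, value in window_diversity(toy, window=5, step=5):
        print(first, last, round(value, 4))
```

With the settings described above, the call would use `window=20, step=5` for the total diversity.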
To better understand the functional evolution of FMO enzymes, we performed functional analyses of the amino acid alignments in the context of the hypothesized phylogenetic tree using the software DIVERGE2 (Gu and Vander Velden 2002; Gu 2006). In particular, we focus on (1) Type-I functional divergence, or site-specific rate shifts, as typically exemplified by amino acid residues highly conserved in a subset of homologous genes but highly variable in a different subset of homologous genes, and (2) Type-II functional divergence, or the shift of cluster-specific amino acid properties, as exemplified by a radical shift of amino acid properties between duplicate genes at a site that is otherwise evolutionarily conserved. We used DIVERGE2 to test the null hypothesis of no changes in site-specific and cluster-specific evolutionary rates among FMO subclades and to predict sites in the alignment having altered functional constraints. For type-I functional divergence, DIVERGE2 measures change in site-specific evolutionary rates using the coefficient of evolutionary functional divergence (θ_λ), where θ_λ = 0 indicates no change and values approaching θ_λ = 1 reflect increasing functional divergence. For type-II functional divergence, a θ_II value significantly higher than zero indicates increasing functional divergence.
Results
Evolution of the FMO protein family
A protein phylogeny was reconstructed from 104 FMO sequences of 34 species belonging to different metazoan phyla (Fig. 1). The five FMO types, the fish FMO, and the invertebrate FMO-like type are well defined by the topology and by the BP and PP values calculated for each internal branch. The different taxonomic groups are also well differentiated with regard to each of the FMO types. While the tree topology shows the presence of a monophyletic origin for FMOs 1–5 and FMOfish proteins, the polyphyletic origin observed for FMOinv is the result of differences between nematodes and insects, giving rise to independent groups in the phylogeny. In particular, two differentiation events (nodes 1 and 10 in the tree) occurred that led first to the FMO lineage from insects and subsequently to the differentiation of the FMO lineage from nematodes. This pattern of differentiation, which was also corroborated by MP and Bayesian analyses, may have important functional implications.
The lineages corresponding to FMOs 1–5 (nodes 2 and 3 in the tree) differentiated later than FMOinv. Among them, FMO1 and FMO3 (P = 0.764 ± 0.047 substitutions per site) were the closest pair, followed by FMO3 and FMO4 (P = 0.891 ± 0.054 substitutions per site), and FMO1 and FMO4 (P = 0.901 ± 0.057 substitutions per site). This is most likely the result of the longer time elapsed since the differentiation of FMOinv. This observation is supported by the protein variation observed within lineages (Fig. 2), which is in agreement with the temporal differentiation frame of the FMO types. The group corresponding to piscine FMO sequences also shows a monophyletic origin and shares the closest common ancestor with FMOs 1–5.
After the differentiation of the six vertebrate FMO lineages, the early diversification of FMO5, which apparently took place at the same point in mammals and amphibians (node A in the tree), was followed by that of FMOs 3 and 4 (node B) and by FMO2 (node C) and FMO1 (node D). The differentiation process is also present in invertebrates, which leads to the appearance of the two invertebrate FMO types (insect FMO and nematode FMO). Moreover, some diversification is also present within the invertebrate FMO-like proteins (Fig. 1).
Nucleotide variation among FMO genes
Because some FMO sequence comparisons between species were close to or had even reached the saturation level, the nucleotide-based tree was of low reliability and we therefore focused on the protein phylogeny (Fig. 1). In the tree that is based on the nucleotide differences per site (data not shown), different FMO types intersperse extensively with each other, implying that the nature of the nucleotide variation in the different FMO lineages is essentially synonymous. The level of silent variation was very similar for five of the six vertebrate lineages (FMO1, p_S = 0.344 ± 0.009; FMO2, p_S = 0.380 ± 0.010; FMO3, p_S = 0.407 ± 0.009; FMO4, p_S = 0.378 ± 0.010; FMOfish, p_S = 0.403 ± 0.014; Fig. 2) and slightly higher in FMO5 (p_S = 0.522 ± 0.007) and the invertebrate FMO-like genes (p_S = 0.519 ± 0.012). When comparing these values with the nonsynonymous differences, we found that p_S was significantly greater than p_N (P < 0.001, Z-test of selection, Fig. 2) in most comparisons.
Although the nucleotide coding sequences of these proteins have diverged extensively through silent substitutions, different FMOs from the same species do not necessarily cluster together in the phylogenies on the basis of their protein sequences (Fig. 1) and nucleotide substitutions (data not shown). In general, the amount of silent variation was relatively high between FMO coding regions. It was noted that genes from the same species are not more closely related to each other than they are to FMO genes belonging to very different species of vertebrates (data not shown). For example, the average synonymous divergence between human FMO1 and FMO2 genes is 0.440 ± 0.031 substitutions/site, which is either higher than or comparable to that observed between human FMO1 and any of the other types in either human or any other vertebrates.
Fig. 1 Phylogenetic relationships among FMO proteins. The reconstruction was carried out by using the JTT+G model and 104 FMO sequences (see supplemental dataset 1). FMO types are indicated on the right near the species names. Numbers for branches indicate BP values of NJ analyses. The differentiation and diversification events are indicated by squares and circles at the nodes in the phylogeny. human: Homo sapiens, chimp: Pan troglodytes, monkey: Macaca mulatta, bushbaby: Otolemur garnettii, opossum: Monodelphis domestica, mouse: Mus musculus, rat: Rattus norvegicus, guinea pig: Cavia porcellus, rabbit: Oryctolagus cuniculus, squirrel: Spermophilus tridecemlineatus, pig: Sus scrofa, cow: Bos taurus, Madagascan hedgehog tenrec: Echinops telfairi, hedgehog: Erinaceus europaeus, platypus: Ornithorhynchus anatinus, zebrafish: Danio rerio, dog: Canis familiaris, fugu: Takifugu rubripes, tetraodon: Tetraodon nigroviridis, medaka: Oryzias latipes, stickleback: Gasterosteus aculeatus, chicken: Gallus gallus, X. tropicalis: Xenopus tropicalis, X. laevis: Xenopus laevis, C. intestinalis: Ciona intestinalis, C. savignyi: Ciona savignyi, fruitfly: Drosophila melanogaster, moth: Tyria jacobaeae, A. aegypti: Aedes aegypti, A. gambiae: Anopheles gambiae, C. elegans: Caenorhabditis elegans, C. briggsae: Caenorhabditis briggsae, S. cerevisiae: Saccharomyces cerevisiae, S. pombe: Schizosaccharomyces pombe
By discriminating between the N-terminal region, the ID, and the C terminus, we observed significantly lower amino acid variation in the N-terminal region (Fig. 2). Conversely, nucleotide variation was roughly the same in terms of silent variation in the N- and C-terminal regions of FMOs 1 and 2, and was higher in the C-terminal region of FMOs 3–5 than in the N-terminal region. In FMOfish and FMOinv, nucleotide variation was higher in the N-terminal region than in the C-terminal region. Nevertheless, the nonsilent variation was significantly lower in the N-terminal domains of the proteins: even the FMO-like proteins from invertebrates did not depart from this trend (Fig. 2). This suggests the presence of the strongest functional constraints in this region, which in turn is the main target of the purifying selection acting on FMO proteins.
The nature of nucleotide variation exhibited by sequences among different species was further analyzed by calculating the nucleotide diversity (π) and the synonymous nucleotide diversity (π_S) across FMO sequences using a sliding-window approach, as shown in Fig. 3. The relative contribution of π_S to π in FMOs 1–5 is evident, as in most cases the overall amount of nucleotide variation was the result of the underlying synonymous variation. While on average the amount of π_S ranged between 0.35 and 0.65 substitutions per site along the five different types of FMO sequences, a slight increase in the value of π in the case of FMO3 and FMO5 can be observed at C-terminal regions. This is most likely due to a relaxation of the structural and functional constraints in these regions of the molecules. The values of π and π_S appear also to be constrained by the presence of a relatively conserved sequence in the N-terminal region of FMO1 (Fig. 3a, arrow), the FMO identifying motif FxGxxxHxxxF in the C-terminal region of FMO2 (Fig. 3b, arrow), and the NADPH binding motif (GxGxxG/A) in the ID of FMO5 (Fig. 3d, arrow), resulting in reduced nucleotide variation in the segment composing these elements. For FMOinv, a large number of indels made it very difficult to discern between different patterns of variation when comparing the different sequences. All ω values were substantially < 1 (Table S1), suggesting a lack of positive selection.
Amino acid frequency and nucleotide composition of FMOs
The presence of selection for certain biased amino acids in the FMO lineages was first analyzed by determining the correlation coefficients between GC content and the frequency of GC-rich and GC-poor amino acids, shown in Fig. 4. In the case of the N-terminal region of FMOs 1, 2, 4, 5, fish, and inv, the frequency of GC-rich amino acids, that of GC-poor amino acids, or both were not correlated with GC content (all P ≥ 0.10). For the ID of FMOs 1, 2, 3, 4, 5, and inv, the frequency of GC-rich amino acids, that of GC-poor amino acids, or both were not correlated with GC content. In contrast, only in the C-terminal region of FMO3 were the frequencies of both GC-rich and GC-poor amino acids uncorrelated with GC content, whereas both frequencies were significantly correlated with GC content in the same region of FMO5 and FMOfish (Table 1). For complete FMO molecules, the only case in which a significant correlation agreed with the predictions of the neutral model was the negative correlation observed between GC content and the frequency of GC-poor residues of FMOs 2, 5, fish, and inv, and the positive correlation between GC content and the frequency of GC-rich residues of FMOs 2, 5, fish, and inv (Table 1).
Fig. 2 Average numbers of amino acid (p_AA) and nucleotide (p_NT) differences per site, and average synonymous (p_S) and nonsynonymous (p_N) differences per site in the seven FMO lineages, discriminating among complete coding regions, N-terminal, ID, and C-terminal domains (within each type, from left to right). p_S > p_N in all comparisons except for FMOfish complete (no significant difference), FMOfish C term (no significant difference), FMOinv complete (no significant difference), and FMOinv ID (p_S < p_N, P < 0.01). Standard errors calculated by the bootstrap method with 1,000 replicates are indicated with bars. FMO1 (N terminus, nucleotide alignment position [nt pos.] 1–444; ID, 445–1,008; C terminus, 1,009–1,609), FMO2 (N terminus, nt pos. 1–444; ID, 445–993; C terminus, 994–1,615), FMO3 (N terminus, nt pos. 1–444; ID, 445–993; C terminus, 994–1,627), FMO4 (N terminus, nt pos. 1–444; ID, 445–996; C terminus, 997–1,706), FMO5 (N terminus, nt pos. 1–453; ID, 454–1,003; C terminus, 1,004–1,637), FMO fish (N terminus, nt pos. 1–450; ID, 451–1,001; C terminus, 1,002–1,694), and FMO-like from invertebrates (N terminus, nt pos. 1–540; ID, 541–1,121; C terminus, 1,122–2,097)
FMO codon usage bias and functional divergence
The presence of functional constraints at the protein level allows for a large extent of silent variation in nucleotide sequences, resulting in a subsequent decrease in codon bias exhibited by FMO genes. As shown in Fig. 5, the overall ENC for FMO genes ranges from 52.558 ± 5.084 (FMO5) to 55.979 ± 2.802 (FMOinv). When discriminating between the different protein domains, the N-terminal region displayed a trend that was slightly more biased than the C-terminal region, with the exception of FMO4. This unexpected observation may be related to the presence of the conserved FAD binding motif (GxGxxG) at the N-terminal segments of FMOs, which is present in all species examined and has been shown to be critical for their correct structure and catalytic function (Eswaramoorthy et al. 2006). The analyses of the codon usage for the glycine residues in this motif showed that GGA (35.26%) is the preferred codon, followed by GGG (23.7%), GGC (22.1%), and GGT (18.9%).
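Percentages of this kind can be obtained by tallying the four glycine codons over the in-frame motif segments of all sequences. The sketch below illustrates that tally; it assumes the caller has already extracted the in-frame nucleotide segment encoding the motif from each gene, and the function name and toy input are hypothetical.

```python
from collections import Counter

GLYCINE_CODONS = ("GGA", "GGC", "GGG", "GGT")


def glycine_codon_usage(segments):
    """Percentage use of each glycine codon across in-frame segments."""
    counts = Counter()
    for segment in segments:
        segment = segment.upper()
        # Step through complete codons only; a trailing partial codon is ignored.
        for i in range(0, len(segment) - 2, 3):
            codon = segment[i:i + 3]
            if codon in GLYCINE_CODONS:
                counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no glycine codons found")
    return {codon: 100.0 * counts[codon] / total for codon in GLYCINE_CODONS}


if __name__ == "__main__":
    # Toy segment (GGA GCT GGA GGC), not taken from an FMO gene.
    print(glycine_codon_usage(["GGAGCTGGAGGC"]))
```

The codon with the largest percentage in the returned dictionary is the preferred one for that set of segments.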
Fig. 3 Total (π, red) and synonymous (π_S, blue) nucleotide diversity (expressed as the average number of nucleotide differences per site) across the coding regions of FMO1 (a), FMO2 (b), FMO3 (c), and FMO5 (d). The diversity values were calculated using a sliding-window approach with a window length of 20 bp and a step size of 5 bp (for π) and a window length of 10 bp and a step size of 5 bp (for π_S) (Color figure online)
In addition to our phylogenetic analyses, we analyzed site-specific (type I) divergence of evolutionary rates to predict sites in the amino acid sequences undergoing divergent functional evolution. These functional analyses were based on pairwise comparisons of seven FMO clades (Fig. 1). The coefficients of evolutionary functional divergence (θ_λ) for each pairwise comparison are presented in Table S2. The type I functional divergence was significant in all comparisons except between FMO2 and FMOfish, FMO2 and FMOinv, and FMOfish and FMOinv. There was significant divergence in site-specific evolutionary rates between FMO4 and all other FMO clades, especially between FMO4 and FMO3 (θ_λ = 0.522 ± 0.060), and between FMO4 and FMO1 (θ_λ = 0.522 ± 0.065). Site-specific analysis of θ_λ revealed a nonrandom distribution of divergent functional constraints along the FMO alignment (Fig. 6a, b). For example, among 550 alignment positions of FMOs 3 and 4, there are 26 amino acid residues corresponding to the cut-off value P(S1|X) > 0.80. Among these 26 critical amino acids, ten are in the N-terminal region, five in the ID, and 11 in the C-terminal region. Figure 6b shows the distributions of the number of predicted critical sites in three regions of FMOs among 18 pairs of cluster comparisons. It is interesting that the number of predicted critical amino acid residues of the C-terminal region was generally higher than that of the N-terminal region, and both regions
Fig. 4 Relationship between GC content of fourfold degenerate sites and the frequencies of GC-rich (GAPR) and GC-poor (FYMINK) amino acid classes in FMO1 (a), FMO3 (b), FMO4 (c), and FMOinv (d), discriminating between the complete proteins, the N-terminal regions, IDs, and the C-terminal regions.