Bacterial Chromosome Replication Origin Sites
Determining the Replication Origin of Bacterial Chromosomes
What is GC Skewing?
If DNA were random strings of letters, you would expect about half of the G's in a genome to be on the leading strand, and the other half on the lagging strand. However, one strand of DNA often has significantly more than its share of G's (thereby causing the other strand to have significantly more than its share of C's). For example, the origin and terminus of replication in a circular chromosome often have unusually even or unusually uneven distribution of G's and C's. The unevenness, or skew, is measured in a "window," or subsequence. By sliding the window along the sequence, unusually even or unusually uneven distributions can be located. GC Skew is calculated as (G - C) / (G + C), where G is the number of G's in the window, and C is the number of C's.
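The windowed calculation described above can be sketched as follows. This is a minimal illustration, not the implementation behind the quoted web page; the function names `gc_skew` and `sliding_gc_skew` are hypothetical, and returning 0.0 for a window with no G or C is an assumption made only to avoid division by zero.

```python
def gc_skew(window: str) -> float:
    """Return (G - C) / (G + C) for one window of sequence."""
    w = window.upper()
    g = w.count("G")
    c = w.count("C")
    # Assumption: a window with no G and no C is reported as 0.0.
    return (g - c) / (g + c) if (g + c) else 0.0


def sliding_gc_skew(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """Slide a fixed-size window along seq and return (start, skew) pairs."""
    results = []
    # Only full-length windows are evaluated; a trailing partial window is skipped.
    for start in range(0, len(seq) - window + 1, step):
        results.append((start, gc_skew(seq[start:start + window])))
    return results


if __name__ == "__main__":
    for start, skew in sliding_gc_skew("GGGGCCCC", window=4, step=2):
        print(start, skew)
```

Smaller windows give a noisier profile and larger windows a smoother one, which is why the window and step sizes are left as parameters.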
Interpreting GC-Skew Graphs
The sample sequence on this web page is approximately 10 kb from the lagging strand of Mycoplasma pneumoniae. The annotated origin of replication is approximately in the center of the sequence. Experiment with different window sizes (from 100 to 3000) and step sizes (from 20 to 200) to see which combination is best for finding the origin of replication. The origin of replication is typically associated with a change in sign of the GC-skew. However, there are usually many such changes in sign, especially for smaller window sizes. Therefore, a second measure, the cumulative skew, is used. The cumulative skew is simply the running sum of the skew values in each window. The origin of replication is associated with the global minimum of the cumulative skew (or global maximum if the lagging strand is analyzed, as in this example).
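The running sum and its extremum can be sketched in a few lines. The helper names (`window_skews`, `cumulative_skew`, `extremum_window`) and the `lagging` flag are hypothetical, and the toy sequences are artificial strings chosen only so the arithmetic is easy to follow; they are not the Mycoplasma pneumoniae sample.

```python
from itertools import accumulate


def window_skews(seq: str, window: int, step: int) -> list[float]:
    """GC skew, (G - C) / (G + C), for each full window along seq."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        g, c = w.count("G"), w.count("C")
        # Assumption: windows without G or C contribute 0.0.
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


def cumulative_skew(skews: list[float]) -> list[float]:
    """Running sum of the per-window skew values."""
    return list(accumulate(skews))


def extremum_window(cumulative: list[float], lagging: bool = False) -> int:
    """Index of the global minimum (or maximum when the lagging strand is analyzed)."""
    pick = max if lagging else min
    return pick(range(len(cumulative)), key=cumulative.__getitem__)


if __name__ == "__main__":
    toy = "C" * 20 + "G" * 20  # artificial: C-rich half followed by G-rich half
    cum = cumulative_skew(window_skews(toy, window=4, step=4))
    print(cum)
    print(extremum_window(cum))
```

The returned value is a window index; multiplying it by the step size gives the approximate start coordinate of the window in which the extremum occurs.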
quote:/gc_skew/gc_skew.html
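With the widespread adoption of genome sequencing, bioinformatic methods for identifying the replication origin of a genome have emerged. These include GC skew, cumulative GC skew, the Z curve, and approaches that combine several methods.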
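GC skew is calculated as (nG - nC)/(nG + nC), where nG and nC are the numbers of G and C in a DNA segment (window) of a given size. At the replication origin and terminus of the chromosome, GC skew shows a marked change from negative to positive and from positive to negative, respectively. The method was established in 1996 by Lobry [1], who analyzed the genomes of three bacteria, Escherichia coli, Bacillus subtilis, and Haemophilus influenzae, and found that nucleotide composition is asymmetric between different regions of their DNA strands: the leading strand contains more G and the lagging strand contains more C. Because a DNA strand switches from being replicated as the lagging strand to being replicated as the leading strand at the origin oriC, GC skew changes from negative to positive at oriC. Other researchers subsequently confirmed this phenomenon in many bacteria, and the method became widely used to identify bacterial chromosomal replication origins. The asymmetry is attributed mainly to differences in selection and mutation pressure between the leading and lagging strands.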
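The sign change described above can be located with a short scan over per-window skew values. This is a minimal sketch: the function name `sign_changes` and the labels `"neg_to_pos"` and `"pos_to_neg"` are hypothetical, and skipping windows whose skew is exactly zero is an assumption. Following the text, a negative-to-positive switch marks a candidate origin and a positive-to-negative switch a candidate terminus.

```python
def sign_changes(skews: list[float]) -> list[tuple[int, str]]:
    """Return (window_index, kind) wherever the skew switches sign.

    The index is that of the first window after the switch.
    Windows with a skew of exactly 0 are ignored (assumption).
    """
    changes = []
    prev = None  # index of the most recent non-zero window
    for i, value in enumerate(skews):
        if value == 0:
            continue
        if prev is not None:
            if skews[prev] < 0 < value:
                changes.append((i, "neg_to_pos"))  # candidate origin
            elif skews[prev] > 0 > value:
                changes.append((i, "pos_to_neg"))  # candidate terminus
        prev = i
    return changes


if __name__ == "__main__":
    print(sign_changes([-0.2, -0.1, 0.3, 0.2, -0.4]))
```

On real genomes this scan usually reports many switches when the window is small, so the candidates still need to be checked against other evidence.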
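Building on GC skew, Grigoriev [1] developed the cumulative skew method. Starting from any position in the DNA sequence, (nG - nC)/(nG + nC) is computed for each window, and the values of adjacent windows are summed successively; the maximum falls at the replication terminus and the minimum at the replication origin. Its advantage is that it works for microorganisms whose GC skew is weak: the point where GC skew changes sign is hard to see in an ordinary GC skew plot but easy to see in the cumulative plot. In addition, the cumulative skew plot is a single "V"-shaped curve rather than the up-and-down fluctuating curve of ordinary GC skew, so it is easier to read [2,3].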
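In some bacteria, the pronounced switch in nucleotide asymmetry does not correspond to the replication origin. To locate the origin more accurately, Mackiewicz et al. [4] used the characteristic features of origin sequences to build a method that combines three lines of evidence: DNA nucleotide asymmetry (a), the distribution of DnaA boxes (b), and the position of the dnaA gene (d). A putative origin is most reliable when the positions given by all three agree. The rationale is that origin sequences are conserved only among closely related species, yet almost all bacterial origins contain several clustered DnaA boxes and an AT-rich region, and the origin often lies near the dnaA gene. Results are sorted into five groups according to which lines of evidence agree: abd, all three give the same position; ab, the extremum of DNA asymmetry lies near the DnaA box cluster; ad, the extremum of DNA asymmetry corresponds to the dnaA gene; bd, the dnaA gene lies near the DnaA box cluster; O, each method gives a different position. The abd group accounted for 55% of the bacteria, so combining these three features gives fairly reliable origin locations.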
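The five-group scheme above can be expressed as a small decision function. This is a sketch under stated assumptions, not the procedure used by Mackiewicz et al.: the names `circular_distance` and `classify_origin_evidence` are hypothetical, each line of evidence is reduced to a single coordinate, and "agree" is taken to mean "within a user-chosen `tolerance` on a circular chromosome", a threshold the text does not specify.

```python
def circular_distance(a: int, b: int, genome_length: int) -> int:
    """Shortest distance between two coordinates on a circular chromosome."""
    d = abs(a - b) % genome_length
    return min(d, genome_length - d)


def classify_origin_evidence(asym_pos: int, box_pos: int, dnaa_pos: int,
                             genome_length: int, tolerance: int) -> str:
    """Assign one of the groups abd, ab, ad, bd, O.

    asym_pos: extremum of DNA asymmetry (a)
    box_pos:  centre of the DnaA box cluster (b)
    dnaa_pos: position of the dnaA gene (d)
    tolerance: maximum distance counted as agreement (assumption)
    """
    ab = circular_distance(asym_pos, box_pos, genome_length) <= tolerance
    ad = circular_distance(asym_pos, dnaa_pos, genome_length) <= tolerance
    bd = circular_distance(box_pos, dnaa_pos, genome_length) <= tolerance
    if ab and ad and bd:
        return "abd"
    # Simplification: when more than one pair agrees but not all three,
    # the first matching pair in the order ab, ad, bd is reported.
    if ab:
        return "ab"
    if ad:
        return "ad"
    if bd:
        return "bd"
    return "O"


if __name__ == "__main__":
    print(classify_origin_evidence(100, 150, 120, genome_length=1000, tolerance=100))
```

Measuring distance around the circle matters because an origin annotated near coordinate 0 can have supporting evidence just before the end of the linearized sequence.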
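The Z curve was developed by Academician Chun-Ting Zhang (张春霆). It is a three-dimensional curve that uniquely represents a DNA sequence, so genome sequences can be studied by geometric means [5]. The Z curve is calculated as follows: x_n = (A_n + G_n) - (C_n + T_n), y_n = (A_n + C_n) - (G_n + T_n), z_n = (A_n + T_n) - (C_n + G_n), with x_n, y_n, z_n ∈ [-N, N] and n = 0, 1, 2, ..., N, where A_n, G_n, C_n, and T_n are the numbers of A, G, C, and T occurring from the first base through the n-th base. By definition A_0 = G_0 = C_0 = T_0 = 0, and N is the length of the sequence. The three coordinates of the Z curve carry different biological meanings: x, y, and z describe the distribution along the genome sequence of purine/pyrimidine (R/Y), amino/keto (M/K), and strong-H bond/weak-H bond bases, respectively. The Z curve also yields the GC-disparity curve (x_n - y_n)/2 and the AT-disparity curve (x_n + y_n)/2. The bacterial chromosomal replication origin and the RY-disparity (x_n), MK-disparity (y_n), GC-disparity,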