DnaSPv5


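Molecular weight of the 5-hydroxymethylcytosine nucleotide

5-Hydroxymethylcytosine (5hmC) is a modified DNA base that is widespread in mammalian cells.

The source gives its molecular weight as about 415.2 g/mol. For comparison, the free base 5-hydroxymethylcytosine (C5H7N3O2) has a molecular weight of about 141.13 g/mol, so the quoted figure presumably refers to a nucleotide derivative or salt form that the source does not specify.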
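The molecular-weight arithmetic can be checked with a few lines of Python. This is a minimal sketch, not part of the source: the elemental formulas for cytosine, 5mC and 5hmC (free bases) and the rounded atomic weights are assumptions added here for illustration.

```python
# Standard atomic weights in g/mol, rounded to three decimals (assumed values).
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def molecular_weight(composition):
    """Sum atomic weights for a composition given as {element: atom count}."""
    return sum(ATOMIC_WEIGHT[element] * count for element, count in composition.items())


# Free-base formulas (assumptions for illustration, not taken from the source).
CYTOSINE = {"C": 4, "H": 5, "N": 3, "O": 1}
FIVE_MC = {"C": 5, "H": 7, "N": 3, "O": 1}   # 5-methylcytosine
FIVE_HMC = {"C": 5, "H": 7, "N": 3, "O": 2}  # 5-hydroxymethylcytosine

for name, comp in [("C", CYTOSINE), ("5mC", FIVE_MC), ("5hmC", FIVE_HMC)]:
    print(f"{name}: {molecular_weight(comp):.2f} g/mol")
```

The free bases all come out well below the 415.2 g/mol quoted above, which is why that figure cannot describe the base alone.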

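5hmC is an intermediate in the active demethylation pathway that leads from 5-methylcytosine (5mC) back to unmodified cytosine (C).

The discovery of 5hmC greatly broadened our understanding of DNA base modifications and their roles in the life sciences.

5hmC is produced when enzymes of the TET family oxidize 5mC; the TET proteins are dioxygenases, not methyltransferases.

The family has three members, Tet1, Tet2 and Tet3. Each can oxidize 5mC to 5hmC, and through this activity they play important roles in gene expression, DNA replication, cell differentiation and other biological processes.

5hmC has been shown to play important roles in embryonic development, formation of the nervous system, tumorigenesis and many other biological processes.

In short, the discovery and study of 5hmC have provided important insights into DNA modification and its biological functions.

In future research, 5hmC may become an important biomarker and offer new approaches to the diagnosis and treatment of disease.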

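Genome-wide 5-hydroxymethylcytosine profiling of DNA and its clinical applications

Technical parameters
Name: 5-hydroxymethylcytosine genome-wide profiling and analysis
Technique: nano-hmC-Seal, from the laboratory of Professor Chuan He at the University of Chicago
DNA input: 1 ng to 1 µg (adjusted within a small range according to the 5hmC content of the species)
Sequencing strategy: NextSeq500/HiSeq X10, PE38/PE150
Data volume: 1.5 G/6 G on average, more than 20 million reads
Turnaround: for small batches (<10 samples), a basic interpretive analysis within 25 working days; for larger batches, the turnaround is negotiated according to the experimental requirements.

Scope of service
Mapping and comparative analysis of 5hmC profiles in DNA from any species or other source; clinical research on all types of tumors, including the search for tumor markers, tissue biopsy, liquid biopsy, post-operative monitoring and medication guidance; experimental design assisted by specialist staff, with customized 5hmC research plans.

Technical background
5-Hydroxymethylcytosine (5hmC) is an important intermediate in the active demethylation of 5-methylcytosine (5mC) and has been called the "sixth DNA base".

In 2009, two papers in Science reported that TET enzymes can oxidize 5mC to 5hmC and that 5hmC has important physiological roles, and 5hmC quickly became a research hotspot.

Over the following years, researchers found that 5hmC, like 5mC, is an important epigenetic modification.

The genomic distribution of 5hmC corresponds closely to the regulation of gene activity and reflects gene expression status more dynamically and sensitively than 5mC.

As early as 2012, a paper in Cell reported a close relationship between 5hmC and melanoma (Lian et al., 2012).

Several influential papers subsequently confirmed 5hmC as a marker in cell development, the nervous system, tumors, cardiovascular disease and other contexts.

As with the epigenetic mark 5mC, it is particularly important to map 5hmC across the whole genome under specific conditions using high-throughput sequencing.

Knowing how 5hmC is distributed across the genome makes it possible to analyze the corresponding regulatory information.

Our 5hmC detection technology is based on the patented method published in Nature Biotechnology in 2011 by Professor Chuan He of the University of Chicago (Song et al., 2011).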

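Types and characteristics of commonly used DNA molecular markers

According to the method used to detect DNA polymorphism, DNA markers fall into four classes:
Class 1: markers based on DNA-DNA hybridization. These mainly include restriction fragment length polymorphism (RFLP), variable number of tandem repeats (VNTR) and single-strand conformation polymorphism RFLP (SSCP-RFLP).
Class 2: markers based on PCR. These mainly include random amplified polymorphic DNA (RAPD), simple sequence repeats (SSR), sequence-tagged sites (STS), expressed sequence tags (EST) and sequence characterized amplified regions (SCAR).
Class 3: markers that combine PCR with restriction digestion. There are two main types: amplified fragment length polymorphism (AFLP) and cleaved amplified polymorphic sequences (CAPS).
Class 4: markers based on single nucleotide polymorphism, chiefly SNP markers.
The characteristics and applications of the commonly used molecular markers are as follows: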

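5hmC and bivalent chromatin

5hmC (5-hydroxymethylcytosine) is a DNA modification, discussed here in connection with bivalent chromatin, that is widespread in human cells and plays an important role in regulating gene expression and determining cell fate.

This article introduces 5hmC in the context of bivalent chromatin, covering its functions, its regulation and its links to disease.

5hmC is a chemical modification of DNA formed when TET dioxygenases oxidize 5-methylcytosine (5mC).

Compared with 5mC, 5hmC is less abundant in the genome, but it is enriched in stem cells and mature neurons, particularly at promoters and enhancers.

Studies indicate that 5hmC takes part in important biological processes such as dedifferentiation, reprogramming and cell division.

In embryonic stem cells, 5hmC levels are high and are closely tied to maintenance of the stem-cell gene expression program.

During differentiation, the decline in 5hmC is associated with gene silencing and cell fate determination.

5hmC is also involved in DNA damage repair and in the regulation of allelic imbalance.

The regulation of 5hmC involves several enzymes and protein factors.

The TET family enzymes are the main regulators of 5hmC production: they oxidize 5mC to 5hmC and can oxidize it further to 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC), which are finally removed by thymine DNA glycosylase and base excision repair.

In addition, the methyl-CpG-binding protein MeCP2 and other transcription factors are linked to 5hmC regulation.

In recent years, abnormal 5hmC levels have been found to be closely associated with the onset and progression of many diseases.

For example, 5hmC plays an important role in tumorigenesis.

Some tumors show persistently altered 5hmC levels (a global loss is the change most often reported), which may contribute to activation of oncogenes and silencing of tumor suppressor genes.

5hmC has also been linked to neurological and psychiatric disorders such as autism, depression and Alzheimer's disease.

In these conditions, aberrant 5hmC modification may cause abnormal expression of key genes and thereby affect neuronal development and function.

In summary, 5hmC plays an important role in gene expression regulation and cell fate determination.

Its regulation involves multiple enzymes and protein factors and is closely associated with the onset and progression of many diseases.

Future research will further explore the specific mechanisms by which 5hmC acts in cell function and disease, providing new ideas and targets for the treatment and prevention of related diseases.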


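News in brief: the rice SPL5 gene may act as a negative regulator of the disease-resistance response, and other items

Author: not given. Source: Seed Science & Technology (《种子科技》), 2015, No. 8

The rice SPL5 gene may act as a negative regulator of the disease-resistance response
Recent research published in the journal Rice shows that rice SPL5 (spotted leaf 5) mutants spontaneously develop necrotic lesions resembling the hypersensitive response (HR) and show markedly enhanced resistance to pathogens. This indicates that SPL5 negatively regulates cell death and the disease-resistance response, but the regulatory pathway mediated by SPL5 and its molecular mechanism remain unclear.

The results suggest a possible model for SPL5 regulation: SPL5 negatively regulates expression of the transcription factor OsWRKY14; OsWRKY14 mediates serotonin (5-hydroxytryptamine) biosynthesis in rice; and the accumulation of serotonin promotes expression of PR genes, thereby activating the rice disease-resistance response.

China has become the world's largest producer and consumer of potatoes
On 29 July, the Ministry of Agriculture held an international symposium in Beijing on potato staple-food products and industry development.

The meeting noted that the prospects for developing potato staple-food products and the associated industry are very broad.

Potato is the world's fourth-largest food crop after rice, wheat and maize, and it is also China's fourth-largest food crop.

In 2014, China's potato planting area and output each accounted for about one quarter of the world total, making China the largest potato producer and consumer. However, owing to varietal characteristics, market demand and other factors, China's annual per-capita potato consumption is only 41.2 kg, far below the level in European and American countries.

Daxing District, Beijing, tightens checks on seed supply channels
Recently, at the peak season for purchasing and selling Chinese cabbage and radish seed, enforcement officers from the Daxing District Seed Management Station inspected key seed production and trading enterprises in the district.

The inspection focused on seed business records, seed sources, and packaging, labels and markings.

During the inspection, officers issued on-the-spot notices requiring rectification within a time limit for loose (unpackaged) seed on sale and ordered it to be removed from shelves and returned. They also required all traders to operate lawfully under the Seed Law of the People's Republic of China and related agricultural laws and regulations, to purchase seed with care, to control seed at its source, to keep fake and substandard seed out of the market, to ensure seed quality and safety, and to protect farmers' interests.

Mianyang launches hybrid rice seed-production insurance
Recently, Mianyang City, Sichuan Province, launched specialty agricultural insurance for hybrid rice seed production across the city.

Based on the principles of benefiting the public and mutual benefit, and taking into account the normal cost outlay over a rice seed-production cycle, the insured amount was set at 2,000 yuan per 667 m², with a premium rate of 5%, giving a premium of 100 yuan per 667 m².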

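DnaSP manuals

Part 1: Brief guide to using PAUP

1. Data input format
Align the set of DNA sequences to be analyzed with Clustal, open the resulting *.aln file in MEGA and convert it to MEGA format (File -> Convert To Mega Format); the converted *.meg file is written to the same directory as the *.aln file. Then use DnaSP to convert the *.meg file to PAUP format (File -> Save/Export Data As, saving in NEXUS File Format).

2. MP analysis
Start PAUP, select the data file, then type the following on the command line, pressing Enter after each line:
outgroup_<outgroup name>
Bootstrap_nreps=1000_keepall
Describetree
Savetrees_from=1_to=1000

3. NJ analysis
Start PAUP, select the data file, then type the following on the command line, pressing Enter after each line:
outgroup_<outgroup name>
Set_criterion=distance
Bootstrap_search=nj_nreps=1000_keepall
contree
Savetrees_from=1_to=1000

4. ML analysis
Start PAUP, select the data file, then type the following on the command line, pressing Enter after each line:
Set_criterion=likelihood
Bootstrap_nreps=100_keepall
contree
Savetrees_from=1_to=100

Note: the outgroup name is the label of the outgroup in the data being analyzed; an underscore "_" stands for a typed space; results are saved as *.tre files in the same folder as the data and can be opened with TreeView.

Part 2: PhyML user guide
1. Build the model with jModelTest and record the output shown in Figure 1.
Figure 1
2. Convert the .nex data exported from DnaSP to txt format (Figure 2).

Here 111 is the number of individuals and 334 is the length of the longest sequence in base pairs (bp).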
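The txt file in step 2 starts with two integers, the number of individuals and the alignment length, which matches a PHYLIP-style header. The sketch below writes aligned sequences in sequential PHYLIP layout. It assumes that this is the target format; the function name `to_phylip` and the sample records are hypothetical, and the figure referred to in the guide is not available to confirm the exact layout.

```python
def to_phylip(records):
    """Render (name, sequence) pairs as sequential PHYLIP text.

    The first line holds the number of sequences and the alignment length;
    each following line holds a name padded or cut to 10 columns and then
    the sequence, so the sequence data starts at column 11.
    """
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    nchar = lengths.pop()
    lines = [f"{len(records)} {nchar}"]
    for name, seq in records:
        lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical two-sequence alignment used only to show the layout.
    print(to_phylip([("seq_1", "ACGT"), ("seq_2", "AC-T")]), end="")
```

For the data set described above, the header line would read `111 334`.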

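DnaSP user guide (translation)

2008-06-12 16:19

1. Open DnaSP. An animated DNA image appears; clicking anywhere on it freezes the animation, and clicking again restarts it. Click the File tab and a blank screen appears.
2. Choose File -> Open Data File and select the file to open (FASTA, MEGA, NBRF/PIR, NEXUS, PHYLIP and other formats). A small window appears describing the data, including the total number of nucleotides and the number of sequences. Click Close; to reopen this window later, choose Display -> Data Info.
3. Choose Display -> View Data to inspect the aligned sequences.
4. To compute nucleotide polymorphism, choose Analysis -> DNA Polymorphism. A dialog lets you choose whether to analyze the full length or a particular region of the sequences (Region to Analyze) and which options to use (Options). To find out which part of the sequence is most polymorphic, select Compute under Sliding Window. If only one sequence is open, a message asks you to select several sequences for analysis.
5. Neutrality test: choose Analysis -> Tajima's Test. The purpose of this test is to determine whether the target DNA sequences have evolved according to the neutral model.

Most molecular differences caused by spontaneous mutation do not affect individual fitness, so the theory infers that gene evolution proceeds mainly through genetic drift.

Tajima's D compares two estimates of variation: one based on the number of segregating sites and one based on the average number of pairwise differences between sequences.

In a population of constant size, if the two estimates are equal, only genetic drift is acting; otherwise, some other force, such as selection, has shaped the sequences in the population.

A positive Tajima's D suggests balancing selection, with some haplotypes having diverged; a negative value suggests purifying (negative) selection. Changes in population size can produce the same signals, so the sign alone is not conclusive.

The Tajima test (Tajima's D) was proposed by the Japanese researcher Fumio Tajima in 1989.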
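The comparison described above can be written out as a short calculation. The following is a sketch of the standard Tajima (1989) formula, not DnaSP's own code; the function name and the choice of inputs (sample size `n`, number of segregating sites `s`, mean number of pairwise differences `k`) are assumptions made for illustration, and no significance test is included.

```python
import math


def tajimas_d(n, s, k):
    """Tajima's D from sample size n, segregating sites s and mean pairwise differences k.

    Returns None when there are no segregating sites, since D is undefined then.
    """
    if n < 4:
        raise ValueError("at least 4 sequences are needed for a non-zero variance")
    if s == 0:
        return None
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Numerator: pairwise-difference estimate minus the segregating-site estimate.
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    # Hypothetical sample: 10 sequences, 10 segregating sites, k = 2.0.
    print(tajimas_d(10, 10, 2.0))
```

When `k` is smaller than `s / a1`, as in the hypothetical call above, the statistic is negative, which corresponds to an excess of rare variants.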

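How to compute nucleotide diversity and haplotype diversity with DnaSP v5 (Librado et al., 2009)
Xiong Rongchuan
Chengdu Institute of Biology, Chinese Academy of Sciences
xiongrongchuan@
Studies in population genetics or phylogeography routinely use nucleotide diversity (π) and haplotype diversity (Hd).

The following briefly describes how to obtain them with DnaSP v5 (Librado et al., 2009).
First, you need an aligned FASTA file, for example taiwan30aln.fas.
Open DnaSP v5.
Open the file "taiwan30aln.fas". A data information box appears, showing the defaults that the program has assigned to your data.
For example, it assumes by default that the data are nuclear genes from an autosome and that the organism is diploid, so these settings often have to be changed to match the actual data. Close the information box and change them under "Data" -> "Format".
After making the changes, run the DNA polymorphism analysis: "Analysis" -> "DNA Polymorphism".
The results include the nucleotide diversity (Pi) and haplotype diversity (Hd) we need, along with much other related information. Have fun!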
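To make the two statistics concrete, here is a minimal Python sketch that computes π and Hd from an aligned FASTA text. It is an illustration under stated assumptions, not a reimplementation of DnaSP: sites containing anything other than A, C, G or T in any sequence are dropped entirely, haplotypes are counted on the remaining sites, and the helper names are hypothetical. DnaSP's handling of gaps and missing data may differ, so small discrepancies are to be expected.

```python
from collections import Counter
from itertools import combinations


def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks).upper()))
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks).upper()))
    return records


def diversity(seqs):
    """Return (pi, hd) for aligned sequences, using only fully resolved sites."""
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    keep = [i for i in range(len(seqs[0])) if all(s[i] in "ACGT" for s in seqs)]
    if not keep:
        raise ValueError("no site is free of gaps or missing data")
    clean = ["".join(s[i] for i in keep) for s in seqs]
    pairs = list(combinations(clean, 2))
    # k: average number of nucleotide differences between two sequences.
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    pi = k / len(keep)
    freqs = [count / n for count in Counter(clean).values()]
    hd = n / (n - 1) * (1 - sum(f * f for f in freqs))
    return pi, hd


if __name__ == "__main__":
    # Hypothetical toy alignment, not the taiwan30aln.fas data set.
    toy = ">s1\nAAAA\n>s2\nAAAT\n>s3\nAATT\n>s4\nATTT\n"
    pi, hd = diversity([seq for _, seq in read_fasta(toy)])
    print(f"pi = {pi:.4f}, Hd = {hd:.4f}")
```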
Reference
Librado P., Rozas J. 2009. DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics, 25: 1451-1452.

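DnaSP user guide

1. Introduction
1.1 Background
1.2 Overview of functions
1.3 System requirements
2. Installation and configuration
2.1 Download and installation
2.2 System configuration
2.3 Importing data
3. Data analysis
3.1 Data preprocessing
3.1.1 Data cleaning
3.1.2 Data format conversion
3.1.3 Data filtering
3.2 Basic statistical analysis
3.2.1 Base composition analysis
3.2.2 Mutation frequency calculation
3.2.3 Population genetic diversity analysis
3.3 Genetic structure analysis
3.3.1 Analysis of molecular variance
3.3.2 Population genetic structure analysis
3.3.3 Inbreeding coefficient calculation
3.4 Edit distance calculation
4. Interpreting results
4.1 Interpreting charts and tables
4.2 Analysis and explanation of results
4.3 Exporting and saving results
5. Frequently asked questions
5.1 Installation and configuration problems
5.2 Data processing problems
5.3 Result interpretation problems
6. Legal terms and notes
6.1 Copyright
Copyright is the legal protection of rights in a creative work, including restrictions on or authorization of copying, publishing, displaying and modifying the work.

6.2 Data privacy
Data privacy is the right of individuals or organizations to have their sensitive information protected while it is collected, stored, processed, transmitted or used.

6.3 Disclaimer
A disclaimer is a statement by which a party declares that it does not accept liability for specified matters in specified circumstances; it is commonly used to reduce legal risk.

Attachments referred to in this document:
- DnaSP installation file
- Example data file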

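Characteristics of DNA replication (molecular biology)

Okazaki fragments
DNA polymerase I has 5'→3' exonuclease activity; it promptly removes the RNA primers synthesized at the start of replication and fills the gaps left by their removal.

dUMP fragments
• How they arise: 1. dUMP can be incorporated into DNA; 2. the cell has two lines of defense that prevent dUMP from being incorporated into DNA; 3. in the second line of defense, uracil-N-glycosylase and AP endonuclease cleave the newly synthesized DNA strand into fragments.

ung (UNG-deficient) mutant
A UNG-deficient mutant cannot synthesize uracil-N-glycosylase. Because incorporated dUMP is less likely to be excised, the dUMP fragments become longer.
Origin of DNA replication
Autoradiography
• At the start of replication, label with low-radioactivity deoxythymidine; the density of developed silver grains in this region will be low.
• Then transfer to medium containing high-radioactivity deoxythymidine; the region synthesized afterwards has a high silver-grain density.
• If replication is unidirectional, the track has high density at one end and low density at the other; if bidirectional, the density is low in the middle and high at both ends.
However, dUTP molecules still escape degradation and are incorporated into DNA with a probability of about 1/1200.
The cell's second line of defense:
An AP (apurinic/apyrimidinic) site, lacking a base, is formed.
AP endonuclease then cleaves the AP site to produce a nick.
dUMP fragments
• dUTP is incorporated into DNA with a probability of 1/1200, so about one dUMP per 1200 nucleotides may be excised; after excision, fragments of 1000-2000 nucleotides result, similar in size to Okazaki fragments.
• dut (dUTPase-deficient) mutant: because dUMP is incorporated more often, the dUMP fragments become shorter.
Jiang Shangrong
Characteristics of DNA replication
1. Semiconservative replication
The heavy-nitrogen (15N) experiment of Meselson and Stahl
DNA is extended in the 5'→3' direction
• The strong repulsion between triphosphate groups makes it hard for a dNTP to approach a 5' end that carries three phosphate groups.
• Steric hindrance at a 5' end carrying three phosphate groups.
• If synthesis ran 3'→5', excising a mismatched nucleotide would leave a 5' end carrying only a monophosphate; two more phosphate groups would have to be added to restore the triphosphate, which would waste energy.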

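Immunoprotective effect of the PC DNA vaccine pVAX-p55-v3

Pneumocystis pneumonia (PCP) is a common complication, and one of the main causes of death, in people with congenital or acquired immunodeficiency. Trimethoprim-sulfamethoxazole (co-trimoxazole) is currently the main drug for preventing and treating PCP, but its use is limited to some extent by toxicity, side effects and drug resistance. Finding safer and more effective means of prevention and treatment has therefore become a focus of PCP research. The p55 antigen is a cell-wall component of Pneumocystis (PC) with a molecular weight of 45-55 kD that can stimulate a protective immune response in the host [1]. Ma et al. [2] found that the p55 antigen gene is polymorphic and cloned the p55-v1 to p55-v4 antigen genes, naming the p55 antigen gene cloned by Smulian et al. [3] p55-v0; comparison showed that p55-v3 and p55-v0 differ in both structure and chromosomal location. Our group previously cloned the p55-v3 and p55-v0 antigen genes, and sequence alignment showed that their homology is only 20.9% [4]. Whether p55-v3 can stimulate a protective immune response in the host during PC infection has not been established. In this study, building on the successful construction of the PC DNA vaccines pVAX-p55-v3 and pVAX-p55-v0 [5], we immunized SD rats, induced PC infection, and compared lung weight/body weight ratios, cyst counts on lung imprint smears and histopathology, in order to provide evidence on the immunoprotective effect of the p55-v3 antigen.

1 Materials and methods

1.1 Main reagents, plasmids and strains
The plasmid extraction kit (maxiprep) was purchased from Omega. Competent E. coli DH5α cells were purchased from Dingguo Biotechnology Co., Ltd. The eukaryotic expression vector pVAX1 was purchased from Invitrogen. pVAX-p55-v3 and pVAX-p55-v0 were constructed in our laboratory.

1.2 Plasmid extraction (maxiprep)
Single colonies of pVAX-p55-v3, pVAX-p55-v0 and pVAX1 were picked and inoculated into LB/Kan medium. Plasmids were extracted according to the kit instructions, and plasmid concentration and purity were measured with a nucleic acid/protein analyzer.

1.3 Immunization of animals
Forty female SD rats were divided into four groups (group A, PBS control; group B, pVAX1; group C, pVAX-p55-v0; group D, pVAX-p55-v3). Each received 100 µg of DNA vaccine by intramuscular injection every 15 days, four times in total. Fifteen days after the last immunization, a PCP animal model was established as described in reference [6] to induce PC infection.

1.4 Evaluation of efficacy
Body weights were recorded for all groups before and after the experiment. At the end of the experiment the rats were sacrificed, the lungs were removed, the trachea and hilar tissue were dissected away, surface moisture was blotted off and the wet weight of the whole lung was measured. The four lung lobes of each rat were cut transversely and imprinted on five numbered glass slides. The imprints were air-dried, fixed in methanol and stained with methenamine silver (GMS); 100 fields were examined in sequence under oil immersion, PC cysts were counted, and the mean number of cysts per field and then per group was calculated. A small piece of tissue from the upper part of the left lung was fixed in 10% formaldehyde, embedded in paraffin, sectioned, and examined after HE and GMS staining. The degree of PC infection was graded by the number of alveoli involved, following the method of Kim et al. [7].

1.5 Statistical analysis
Data are expressed as mean ± standard deviation (x̄ ± s). Differences between groups were analyzed by one-way ANOVA in SPSS 10.0; P < 0.05 was considered statistically significant.

2 Results

2.1 Changes in body weight and lung weight/body weight ratio
Six weeks after dexamethasone injection, body weight had increased in the pVAX-p55-v3 and pVAX-p55-v0 groups and the lung weight (g)/body weight (g) ratio was markedly lower (0.96 ± 0.15 / 0.83 ± 0.14 vs 1.59 ± 0.32 / 1.68 ± 0.38; P < 0.05), whereas the pVAX-p55-v3 and pVAX-p55-v0 groups did not differ significantly from each other (P > 0.05).

2.2 Mean cyst counts on lung imprints
The mean number of cysts per field in the pVAX-p55-v0 and pVAX-p55-v3 groups was markedly lower than in the pVAX1 and PBS control groups (0.95 ± 0.45 and 1.75 ± 0.98 vs 5.76 ± 1.42 and 5.90 ± 1.03; P < 0.05). Figure 1A-D also shows the marked reduction in cysts in the pVAX-p55-v0 and pVAX-p55-v3 groups.

2.3 Lung histopathology

2.3.1 HE staining
In HE-stained lung sections from the pVAX-p55-v0 and pVAX-p55-v3 groups, consolidation was markedly milder than in the controls and exudate in the alveolar spaces was reduced (Figure 1E, F, G, H).

2.3.2 GMS staining
With GMS staining, lung sections from the PBS and pVAX1 control groups (Figure 1I, J) showed PC cysts stained brown-black to black, scattered or clustered in the alveolar spaces or attached to the alveolar walls, with a few cysts also seen in the alveolar interstitium. Cyst numbers were markedly reduced in both experimental groups (Figure 1K, I).

3 Discussion
P55 is a component antigen of the cell wall of Pneumocystis carinii. Smulian et al. [1,3] cloned and characterized the 55 kD antigen gene of Pneumocystis (p55-v0) and showed that active immunization of rats with recombinant P55 antigen stimulates humoral and cellular immune responses. Ma et al. [2] cloned the p55-v1 to p55-v4 genes and found that the 3' ends of p55-v3 and p55-v0 are both highly conserved, but the types and numbers of amino acids in their repeat regions differ; p55-v0 contains an N-linked glycosylation site and an RGD binding site that p55-v3 lacks; and chromosomal hybridization placed p55-v3 on chromosome 4 and p55-v0 on chromosome 2. Our group previously constructed the pVAX-p55-v3 and pVAX-p55-v0 DNA vaccines, and immunization of SD rats showed that p55-v0 confers partial immune protection, in agreement with Smulian et al. [1] and Yi et al. [8]. We also observed that, as with p55-v0, the organism burden was markedly reduced in the p55-v3 group, and that pathological sections from the p55-v3 group showed markedly less edema of the alveolar septa and fewer inflammatory cells. This suggests that p55-v3 likewise provides partial protection, which lays a basis for further study of the mechanism of p55-v3 immune protection and offers a new approach to the prevention and treatment of PCP.

References
[1] Smulian AG, Sullivan DW, Theus SA. Immunization with recombinant Pneumocystis carinii p55 antigen provides partial protection against infection: characterization of epitope recognition associated with immunization [J]. Microbes and Infection / Institut Pasteur, 2000, 2(2): 127-136.
[2] Ma L, Kutty G, Jia Q, et al. Characterization of variants of the gene encoding the p55 antigen in Pneumocystis from rats and mice [J]. J Med Microbiol, 2003, 52(Pt 11): 955-960.
[3] Smulian AG, Theus SA, Denko N, et al. A 55 kDa antigen of Pneumocystis carinii: analysis of the cellular immune response and characterization of the gene [J]. Mol Microbiol, 1993, 7(5): 745-753.
[4] Feng YM, Luo YA, Jiang T. Cloning and sequence comparison of the CDS regions of the Pneumocystis carinii p55-v0 and p55-v3 genes [J]. Chinese Journal of Zoonoses, 2010, 26(3): 235-238, 242. (in Chinese)
[5] Feng YM, Luo YA, Jiang T, et al. Construction and expression of the eukaryotic expression vectors pVAX-p55-v3 and pVAX-p55-v0 of Pneumocystis carinii [J]. Chinese Journal of Immunology, 2010, 26(3): 214-217. (in Chinese)
[6] Zhang F, Lu SQ, Wang FY, et al. Establishment of a low-mortality SD rat model of Pneumocystis carinii [J]. Chinese Journal of Parasitic Disease Control, 2005, 18(2): 99-102, F002. (in Chinese)
[7] Kim CK, Foy JM, Cushion MT, et al. Comparison of histologic and quantitative techniques in evaluation of therapy for experimental Pneumocystis carinii pneumonia [J]. Antimicrob Agents Chemother, 1987, 31(2): 197-201.
[8] Yi LH, Chen JL, Qin YW, et al. Immunogenicity of a DNA vaccine based on a p55 gene fragment of Pneumocystis carinii [J]. Modern Preventive Medicine, 2010, 37(13): 2506-2508. (in Chinese)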

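The IFIT5 gene

Overview
IFIT5 is a protein-coding gene whose full name is interferon-induced protein with tetratricopeptide repeats 5.

The gene plays an important role in immune regulation in humans.

This article describes the structure, function and regulation of IFIT5 and its role in disease.

Structure
IFIT5 is located at q23.31 on human chromosome 10 and consists of 6 exons and 5 introns.

The encoded protein contains several tetratricopeptide repeats (TPRs), which mediate interactions with other proteins and so take part in the regulation of many biological processes.

Function
The IFIT5 protein is an interferon-induced protein that takes part in regulating the cellular immune response, mainly through interactions with other proteins.

Its main functions are as follows:
1. Antiviral activity: IFIT5 expression is induced by interferon after viral infection and can inhibit viral replication and spread.

It interacts with viral RNA and blocks its translation and replication, thereby limiting the survival of the virus inside the cell.

2. Immune regulation: IFIT5 can modulate the expression of many immune-related genes and takes part in regulating the activation, proliferation and differentiation of immune cells.

It interacts with other immunoregulatory proteins to form a complex regulatory network that tunes the strength and duration of the immune response.

3. Regulation of apoptosis: under some conditions IFIT5 can also take part in regulating apoptosis.

It interacts with apoptosis-related proteins and influences the onset and progression of apoptosis.

Regulation
Expression of IFIT5 is controlled by several factors, including viral infection, interferon stimulation and the action of other immunoregulatory molecules.

The mechanisms that regulate IFIT5 expression include the following:
1. Induction by interferon: IFIT5 is a direct target gene of interferon. When a cell is infected by a virus or receives another immune stimulus, interferon is produced and binds to receptors on the cell surface, activating the interferon signaling pathway and ultimately leading to transcription and expression of IFIT5.

2. Regulation by transcription factors: besides interferon, some transcription factors can regulate IFIT5 expression directly or indirectly.

These include NF-κB and STAT1, which can bind the IFIT5 promoter region and promote or repress transcription.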


DnaSP Version 5 Help ContentsRunning DnaSP, press F1 to view the context-sensitive help.What DnaSP can doIntroductionSystem requirementsInput and OutputInput Data Files (FASTA format; MEGA format; NBRF/PIR format; NEXUS format; PHYLIP format; HapMap3 Phased Haplotypes format)Open Multiple Data FilesOpen Unphase/Genotype DataOutputUCSC BrowserDataData MenuDefine Sequence SetsDefine Domain SetsFilter / Remove PositionsInclude / Exclude SequencesAnalysisPolymorphic SitesDNA PolymorphismInDel (Insertion-Deletion) PolymorphismDNA Divergence Between PopulationsConserved DNA RegionsPolymorphism and DivergencePolymorphism and Divergence in Functional RegionsSynonymous and Nonsynonymous SubstitutionsCodon Usage BiasPreferred and Unpreferred Synonymous SubstitutionsGene ConversionGene Flow and Genetic DifferentiationLinkage DisequilibriumRecombinationPopulation Size ChangesFu and Li's (and other) TestsFu and Li's (and other) Tests with an OutgroupHKA; Hudson, Kreitman and Aguadé’s TestMcDonald and Kreitman’s TestTajima's TestOverviewPolymorphism DataPolymorphism/Divergence DataMultiDomain AnalysisGenerateConcatenated Data FileShuttle to: DNA Sliderms (Dick Hudson) Data File FormatPolymorphic/Variable Sites FileHaplotype Data FileTranslate to Protein Data FileReverse Complement Data FilePrepare Submission for EMBL / GenBank DatabasesToolsCoalescent SimulationsHKA test. Direct ModeDiscrete DistributionsTests of Independence: 2 x 2 tableEvolutionary CalculatorMenu CommandsDnaSP user interfaceFile MenuData MenuDisplay MenuAnalysis MenuOverview MenuTools MenuGenerate MenuWindow MenuHelp MenuCitationDistribution Policy and UpdatesAcknowledgementsReferencesWhat DnaSP can do:Abstracts:DnaSP v. 1.0DnaSP v. 2.0DnaSP v. 3.0DnaSP v. 4.0DnaSP v. 5.0DnaSP, DNA sequence polymorphism, is an interactive computer program for the analysis of DNA polymorphism from nucleotide sequence data. 
The program, addressed to molecular population geneticists, calculates several measures of DNA sequence variation within and between populations (with or without the sliding window method) in noncoding, synonymous or nonsynonymous sites; linkage disequilibrium, recombination, gene flow and gene conversion parameters; and some neutrality tests, Fu and Li's, Hudson, Kreitman and Aguadé's, McDonald and Kreitman, and Tajima's tests. DnaSP can also conduct computer simulations based on the coalescent process.What DnaSP can not do:DnaSP can not align sequences. There are some available programs that can do this. For example, you can perform the multiple alignment with CLUSTAL W(Thompson et al. 1994). This program produces an output (multiple aligned sequences in NBRF/PIR format) that can be read by DnaSP.DnaSP can not make phylogenetic inferences or manipulate trees. There are many programs to do this, for example, MacClade (Maddison and Maddison 1992), MEGA (Kumar et al. 1994), PHYLIP (Felsenstein 1993), PAUP (Swofford 1991). Nevertheless, the input file formats used by DnaSP (FASTA, MEGA, NBRF/PIR, NEXUS and PHYLIP format) are also recognized for some of them.DNA sequences can not be edited or manipulated by DnaSP. You can do this by using, for example, MacClade (Maddison and Maddison 1992) or SeqApp / SeqPup programs (Gilbert 1996).DnaSP can not directly analyze diploide genetic information (for instance, SNPs data from diploid genomic regions). If you are using diploid unphase data, you can reconstruct the phase using the Open Unphase/Genotype Data module.IntroductionAbstracts:DnaSP v. 1.0DnaSP v. 2.0DnaSP v. 3.0DnaSP v. 4.0DnaSP v. 5.0Population genetics is a branch of the evolutionary biology that tries to determine the level and distribution of genetic polymorphism in natural populations and also to detect the evolutionary forces (mutation, migration, selection and drift) that could determine the pattern of genetic variation observed in natural populations. 
Ideally, the best way to quantify genetic variation in natural populations should be by comparison of DNA sequences (Kreitman 1983). However, although the methodology for DNA sequencing is available since 1977 (Maxam and Gilbert 1977Sanger et al. 1977), until 1990 the use of DNA sequence data had had little impact on population genetics. This is because the effort (in terms of both money and time) required to obtain DNA sequence data from a relative large number of alleles was substantial.The introduction of the polymerase chain reaction (PCR) (Saiki et al. 1985;1988) which allows direct sequencing of PCR products and avoids, therefore, their cloning, has changed the situation. Undoubtedly this has produced a revolutionary change in population genetics. Although, at present, population studies at the DNA sequence level are still scarce and primarily carried out in Drosophila (for example: McDonald and Kreitman 1991Schaeffer and Miller 1993Rozas and Aguadé 1994), they will certainly increase in the future.The DnaSP (DNA Sequence Polymorphism) is a software addressed to molecular population geneticists and can compute several measures of DNA sequence variation within and between populations in noncoding, in synonymous or in nonsynonymous sites; gene flow, gene conversion (Betrán et al. 1997), recombination and linkage disequilibrium parameters. In addition, DnaSP performs some neutrality tests: the Hudson, Kreitman and Aguadé (1987), the Tajima (1989),McDonald and Kreitman 1991; and the Fu and Li (1993) tests. DnaSP takes advantage of the Microsoft Windows capabilities, so that it can handle a large number of sequences of thousands of nucleotides each on a microcomputer. Furthermore, DnaSP can easily exchange data with other programs, for example, programs to perform multiple sequence alignments, phylogenetic tree analysis, or statistical analysis.System requirementsSee Also:LimitationsDnaSP is written in Visual Basic v. 
6.0 (Microsoft), and it runs on an IBM-compatible PC under 32-bit Windows. The minimum hardware requirements for the program are:a processor based on the Intel Pentium (or higher)32 megabytes of RAM memorya mousea hard disk.DnaSP also requires:Microsoft Windows, versions 95/98/NT/ME/2000/XP/VistaLarge data setsDnaSP has been successfully tested with data files as long as 120 Mbp (for instance, 30 DNA sequences of 4 Mbp each) in a Windows XP/Vista computer with 4 Gb of RAM memory.DnaSP under Linux and MacintoshDnaSP can also run on Apple Macintosh platforms (using VirtualBox, VMWare Fussion, Parallels Desktop or Virtual PC), Linux, and Unix-based operating systems (using VirtualBox, VMWare or Wine).Note: Using emulators, the computation speed of the program will decrease.LimitationsSee Also:System requirementsBoth the number and length of the sequences that can be handled by DnaSP mainly depend on the available memory. Nevertheless, DnaSP is able to use all RAM memory available in a computer, both the conventional and the extended memory. DnaSP can also use virtual memory (it can use the hard disk space as memory, although in this case the computation speed will be much lower than when using RAM). Thus, the program can handle large numbers of sequences of up to thousands nucleotides each.Input data file limitations:Maximum number of nucleotides per sequence: Depends on the available memory (> 3,000,000 nt).Maximum number of sequences: 32767Other limitations:The grid control cannot display more than 16351 rows or 5448 columns. Therefore, for the sliding window option, the maximum number of rows of results is 16351. Hence, the maximum number of polymorphic sites (linkage disequilibrium module) or of sequences (synonymous and nonsynonymous module) that can be analyzed and displayed on the screen is 181 (the total number of pairwise comparisons is: 181*180 / 2 = 16290). 
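The figure of 181 follows from the grid's row limit and the number of pairwise comparisons, n(n-1)/2. A small sketch makes the arithmetic explicit; the function names are hypothetical and the only inputs taken from the text are the 16351-row limit and the pairwise formula.

```python
GRID_MAX_ROWS = 16351  # row limit of the grid control, as quoted in the text


def pairwise_comparisons(n):
    """Number of distinct pairs among n items."""
    return n * (n - 1) // 2


def max_items_for_grid(max_rows=GRID_MAX_ROWS):
    """Largest n whose pairwise comparisons still fit in max_rows rows."""
    n = 2
    while pairwise_comparisons(n + 1) <= max_rows:
        n += 1
    return n


if __name__ == "__main__":
    n = max_items_for_grid()
    print(n, pairwise_comparisons(n), pairwise_comparisons(n + 1))
```

With one more item the pair count exceeds the row limit, which is why the on-screen display stops at 181.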
Although DnaSP will not display the results of these analyses on the screen, the results could be saved in a file.NOTE: These upper limits will be increased (I hope) in following DnaSP versions.Input Data FilesDnaSP can automatically read the following types of data file formats:FASTA,MEGA (Kumar et al. 1994),NBRF/PIR (Sidman et al. 1988),NEXUS (Maddison et al. 1997),PHYLIP (Felsenstein 1993),HapMap3 Phased HaplotypesIn all cases one or more homologous nucleotide sequences should be included in just one file (ASCII file). The sequences must be aligned (i. e. the sequences must have the same length).Nucleotide sequences should be entered using the letters A, T (or U), C or G (in lower case, upper case, or any mixture of lower and upper case).DnaSP allows you to analyze a subset of sites of the data file (this option is useful for the analysis of particular regions of the data file, for example, when analyzing exonic and intronic regions separately), or to carry out analyses in a subset of sequences of the data file (see the Include / Exclude Sequences command).FASTA formatSee Also:Input Data Files FASTA Format ExampleDnaSP can recognize FASTA data file formats (also called Person format). FASTA file format must begin with the symbol '>' in the first line of the file; the sequence name is the first word after that symbol. Additional characters in this line are considered to be comments. The sequence data starts in the second line. Nucleotide data can be written in one or more lines.DnaSP only recognize noninterleaved FASTA data files.Special charactersBlank spaces, Tabs, and Carriage returns are ignored (i. e. they can be used to separate blocks of nucleotides). By default DnaSP uses the following symbols:the hyphen character '-' to specify an alignment gap;the dot character '.' to specify that the nucleotide in this site is identical to that in the same site of the first sequence (i.e. 
identical site or matching symbol);the symbols '?', 'N', 'n' to designate missing data.Sequence nameThe sequence name can be up to 20 characters. Blank spaces and tabs are not allowed (underlines should be used to indicate a blank space).Example of FASTA format>seq_1 [comment -optional-]ATATACGGGGTTA---TTAGA----AAAATGTGTGTGTGTTTTTTTTTTCATGTG >seq_2 [comment -optional-]ATATAC--GGATA---TTACA----AGAATCTATGTCTGCTTTCTTTTTCATGTG >seq_3ATATACGGGGATA---TTATA----AGAATGTGTGTGTGTTTTTTTTTTCATGTG >seq_4ATATACGGGGATA---GTAGT----AAAATGTGTGTGTGTTTTTTTTTTCATGTGMEGA format (Kumar et al. 1994)See Also:Input Data Files Mega Format ExampleReferences:Kumar et al. 1994DnaSP can recognize interleaved and noninterleaved MEGA formats (DnaSP v. 1.0 only recognized noninterleaved MEGA formats). MEGA formats must contain the identifier #MEGA in the first line of the file. The second line must start with the word TITLE: followed by some comments (if any) on the data (comments within the sequences must be contained by a pair of double quotation marks: “comment“). The sequence data starts in the third line. The sequence name is the text after the character '#' until the first Blank space, Tab or Carriage return. The nucleotide sequence is written in one or more lines after the sequence name, until the next sequence name that also starts with the symbol '#' (see the MEGA user manual).Special charactersBlank spaces, Tabs, and Carriage returns are ignored (i. e. they can be used to separate blocks of nucleotides). By default DnaSP uses the following symbols: the hyphen character '-' to specify an alignment gap; the dot character '.' to specify that the nucleotide in this site is identical to that in the same site of the first sequence (i.e. identical site or matching symbol); the symbols '?', 'N', 'n' to designate missing data. Nevertheless, these symbols can be changed in the dialog box that appears when opening a data file.Sequence nameThe sequence name can be up to 20 characters. 
Blank spaces and tabs are not allowed (underlines should be used to indicate a blank space).Example of MEGA format#MEGATITLE: 4 sequences (55 nucleotides). File: EX##N1.MEG#seq_1ATATACGGGGTTA---TTAGA----AAAATGTGTGTGTGTTTTTTTTTTCATGTG #seq_2......--..A........C......G...C.A...C..C...C........... #seq_3..........A........T......G............................ #seq_4..........A.....G...T..................................NBRF/PIR format (Sidman et al. 1988)See Also:Input Data Files NBRF/PIR Format ExampleReferences:Sidman et al. 1988In the NBRF/PIR files, the sequence names are placed immediately after the identifier >DL; . The next line is used for comments. The nucleotide sequence is written in the next line (in one or more lines) and is ended with the symbol '*' . The file must contain nucleotide sequences in a noninterleaved form.Sequence dataBlank spaces, Tabs, and Carriage returns are ignored (i. e. they can be used to separate blocks of nucleotides). The hyphen character '-' must be used to specify an alignment gap. The dot character '.' can be used to specify that the nucleotide in this site is identical to that in the same site of the first sequence. The symbols '?', 'N', 'n' could be used to designate missing data. No other symbols are allowed.Sequence nameThe sequence name can be up to 20 characters. Blank spaces and tabs are not allowed (underlines should be used to indicate a blank space).Example of NBRF/PIR format>DL;seq_1Comment on seq 1 (example file: EX##N1.NBR).ATATACGGGG TTA---TTAG A----AAAAT GTGTGTGTGT TTTTTTTTTC ATGTG* >DL;seq_2Comment: seq 2ATATAC--GG ATA---TTAC A----AGAAT CTATGTCTGC TTTCTTTTTC ATGTG* >DL;seq_3Comment: seq 3ATATACGGGG ATA---TTAT A----AGAAT GTGTGTGTGT TTTTTTTTTC ATGTG* >DL;seq_4Comment: seq 4ATATACGGGG ATA---GTAG T----AAAAT GTGTGTGTGT TTTTTTTTTC ATGTG*NEXUS File format (Maddison et al. 1997)See Also:Input Data Files NEXUS Format Example 1NEXUS Format Example 2References:Maddison et al. 1997DnaSP can read NEXUS file formats. 
These files are standard text files that have been designed (Maddison et al. 1997) to store systematic data. DnaSP can read NEXUS files (both old and new versions, Maddison et al. 1997) containing DNA or RNA sequence data. The file can contain one or more sequences; in the later case, the homologous nucleotide sequences must be aligned (i. e. the sequences must have the same length).Nucleotide sequences should be entered using the letters A, T (or U), C or G (in lower case, upper case, or any mixture of lower and upper case). Blank spaces and Tabs are ignored (i. e. they can be used to separate blocks of nucleotides). Carriage returns are also ignored in non-interleaved file formats.Alignment gap symbolThe symbol used to designate an alignment gap should be indicated by the subcommand GAP:For example, GAP=- indicates that the hyphen character '-' should be used to specify an alignment gap.Default symbol: -Identical site (matching character) symbolThe symbol used to designate that the nucleotide in a site is identical to that in the same site of the first sequence should be indicated by the subcommand MATCHCHAR:For example, MATCHCHAR=.Default symbol: .Missing data symbolThe symbol used to designate missing data should be indicated by the subcommand MISSING:For example, MISSING=?Default symbol: ?Note: the following symbols are not allowed in the subcommands GAP, MISSING, and MATCHCHAR:The white space, and ( ) [ ] { } / \ , ; : = * ' " ` < >(see Maddison et al. 1997).Moreover, these subcommands cannot share the same symbol.Sequence nameThere is no limit for the sequence name length; nevertheless, DnaSP will only display the first 20 characters. Blank spaces and tabs are not allowed (underlines should be used to indicate a blank space).Interleaved formatNEXUS files can contain nucleotide sequences with interleaved and non-interleaved formats. The former format must be indicated by the subcommand INTERLEAVENEXUS blocksNEXUS blocks must end with the command END;. 
DnaSP will read the following NEXUS blocks (see Maddison et al. 1997):DATA, TAXA, CHARACTERS blocks. These blocks contain information about the taxa and the molecular sequence data.SETS block. That block allows the user to store information of groups of sequences, characters, taxa, etc. DnaSP only uses the TaxSet command. This block contains information about groups of sequences.NOTE: See also Define Sequence Sets.CODONS block. This block contains information about the genetic code, and about the regions of the sequence that are noncoding, or protein coding regions.NOTE: See also Assign Coding Regions.CODONUSAGE block. This is a private NEXUS that contains information about the specific table of Preferred and Unpreferred codons that will be used in the Preferred and Unpreferred Synonymous Substitutions analysis. There are 8 predefined tables; nevertheless, the user can define their own table.Subcommands:Pref*: subcommand. Includes the preferred codons.Unknown: subcommand. Includes codons of unknown preference nature.NOTE: See also the Data Menu. See also the NEXUS Format Example 1.DNASP block. This is a private NEXUS block that contains information about:i) the chromosomal location of the DNA region:CHROMOSOMALLOCATION= command. There are 8 predefined chromosomal locations:AutosomeXchromosomeYchromosomeZchromosomeWchromosomeprokaryoticmitochondrialchloroplastii) or the organism’s genomic type:GENOME= command. There are 2 predefined genomic types:DiploidHaploidNOTE: See also Data Menu#NEXUS[This is an example of the new NEXUS file format, NEXUS version 1. This is the version used by MacClade 3.05 or later. File: EX##new1.nex]BEGIN TAXA;DIMENSIONS NTAX=4;TAXLABELSseq_1seq_2seq_3seq_4;END;BEGIN CHARACTERS;DIMENSIONS NCHAR=55;FORMAT DATATYPE=DNA MISSING=? GAP=- MATCHCHAR=. 
INTERLEAVE ;MATRIXseq_1 ATATACGGGGTTA---TTAGA----AAAATGTGTGTGTGTseq_2 ......--..A..---...C.----.G...C.A...C..Cseq_3 ..........A..---...T.----.G.............seq_4 ..........A..---G...T----...............seq_1 TTTTTTTTTCATGTGseq_2 ...C...........seq_3 ...............seq_4 ...............;END;BEGIN SETS;TaxSet Barcelona = 1-2;TaxSet Girona = 3;TaxSet Catalunya = 1-3;TaxSet Outgroup = 4;END;BEGIN CODONS;CODONPOSSET * UNTITLED =N: 1 2 6-26 51-55,1: 3 27-48\3,2: 4 28-49\3,3: 5 29-50\3;CODESET * UNTITLED = Universal: all ;END;BEGIN CODONUSAGE;PREFUNPREFCODONS GENETICCODE=Universal Drosophila_melanogaster =PREF*: UUC UCC UCGUAC UGC CUC CUGCCC CAC CAG CGCAUC ACC AAC AAGAGC GUC GUG GCCGAC GAG GGC;END;BEGIN DNASP;CHROMOSOMALLOCATION= Autosome;GENOME= Diploid;END;#NEXUS[This is an example of the Old NEXUS File Format used by MacClade 3.0 File: EX##old1.nex]BEGIN DATA;DIMENSIONS NTAX=4 NCHAR=55;FORMAT MISSING=? GAP=- DATATYPE=DNA ;MATRIXseq_1 ATATACGGGGTTA---TTAGA----AAAATGTGTGTGTGTTTTTTTTTTCATGTGseq_2 ATATAC--GGATA---TTACA----AGAATCTATGTCTGCTTTCTTTTTCATGTGseq_3 ATATACGGGGATA---TTATA----AGAATGTGTGTGTGTTTTTTTTTTCATGTGseq_4 ATATACGGGGATA---GTAGT----AAAATGTGTGTGTGTTTTTTTTTTCATGTG;END;BEGIN CODONS;CODPOSSET UNTITLED =1: 3 27 30 33 36 39 42 45 48,2: 4 28 31 34 37 40 43 46 49,3: 5 29 32 35 38 41 44 47 50;GENCODE UNIVNUC;END;PHYLIP format (Felsenstein 1993)See Also:Input Data Files PHYLIP Format ExampleReferences:Felsenstein 1993DnaSP can recognize interleaved and noninterleaved PHYLIP formats. PHYLIP formats must contain two integers in the first line of the file: the first number indicates the number of sequences in the data file, while the second indicates the total number of sites. The sequence data starts in the second line. The sequence name can be up to 10 characters. The nucleotide sequence starts immediately (position 11). 
Nucleotide data can be written in one or more lines.In PHYLIP interleaved formats, the sequence name must be indicate only in the first block.Special charactersBlank spaces, Tabs, and Carriage returns are ignored (i. e. they can be used to separate blocks of nucleotides). By default DnaSP uses the following symbols:the hyphen character '-' to specify an alignment gap;the dot character '.' to specify that the nucleotide in this site is identical to that in the same site of the first sequence (i.e. identical site or matching symbol);the symbols '?', 'N', 'n' to designate missing data.Sequence nameThe sequence name can be up to 10 characters. Blank spaces are allowed.Example of PHYLIP format4 55seq_1 ATATACGGGGTTA---TTAGA----AAAATGTGTGTGTGTTTTTTTTTTCATGTG secuencia2ATATAC--GGATA---TTACA----AGAATCTATGTCTGCTTTCTTTTTCATGTG DmelanogasATATACGGGGATA---TTATA----AGAATGTGTGTGTGTTTTTTTTTTCATGTG seq_4 ATATACGGGGATA---GTAGT----AAAATGTGTGTGTGTTTTTTTTTTCATGTGHapMap3 Phased Haplotypes FormatDnaSP can recognize HapMap3 Phased Haplotypes file formats (phased haplotypes generated in the third HapMap phase). HapMap3 Phased Haplotypes format is a space-separated file with phased SNP information (haplotype information).See the HapMap3 Phased Haplotypes Format Example. It contains 3 individuals (in total 6 chromosomes -or haplotypes-) with 9 positions (8 polymorphic and 1 monomorphic).First rowrsID position_b36 NA19028_A NA19028_B NA19031_A NA19031_B NA19035_A NA19035_BThe first row must contain –separated by spaces- two any strings (in the above example rsID and position_b36) followed by the haplotypes IDs (the IDs must end with "_A" or "_B").In the example, NA19035_A and NA19035_B correspond to the two haplotypes IDs from individual NA19035. Following rowsrs28832292 18095260 C T T T T TThe first column is the SNP ID (rs28832292) and the second column the physical position in the reference chromosome (18095260). 
The subsequent columns contain the 6 nucleotide variants (at position 18095260). For instance, the nucleotide variants of chromosomes NA19028_A and NA19028_B at position 18095260 are C and T, respectively.

Special characters
Double spaces and tabs are treated as a single space.
Symbols other than A, C, G, T, U, N, ? or - are not accepted.

Note
DnaSP will export any data file to the HapMap3 format including only polymorphic sites (but also positions with gaps/missing data).

Very Important Note
Since this format might not contain all the monomorphic sites, statistics based on the physical distance, or on the total number of positions (i.e., per-site genetic distances such as π, K, nucleotide divergence, Dxy, Da, etc.) will be incorrect.

Example of HapMap3 Phased Haplotypes format
rsID position_b36 NA19028_A NA19028_B NA19031_A NA19031_B NA19035_A NA19035_B
rs28832292 18095260 C T T T T T
rs28439049 18136371 A A A A A A
rs28505894 18179985 C C T C C C
rs35630207 18206177 C C C A C C
rs28842485 18325726 A A C A A A
rs4633700 18357066 G G C G G G
rs2300680 18398549 G G C G G G
rs28620789 18520261 A A A C A A
rs28841911 18534123 T C T T T C

Open Multiple Data Files
DnaSP can automatically read several data file formats (see Input Data Files).
This module also allows you to analyse multiple files sequentially in a single run (as a batch mode). These data files can contain different numbers of sequences, or different genomic regions.

Analysis
This command can sequentially analyse several data files. It can compute a number of measures of the extent of DNA polymorphism/divergence and can also conduct some common neutrality tests.

Analysis Options
This option allows you to choose between analyzing:

1. DNA Polymorphism
1.1 GC content
- G+Cn, G+C content at noncoding positions.
- G+Cc, G+C content at coding positions.
1.2 Haplotype/Nucleotide Diversity and Divergence
- The number of Segregating Sites, S.
- The total number of mutations, Eta.
- The number of haplotypes, NHap (Nei 1987, p. 259).
- Haplotype (gene) diversity and its sampling variance (Nei 1987).
- Nucleotide diversity, Pi (π) (Nei 1987), and its sampling variance (not implemented yet) (Nei 1987, equation 10.7).
- The average number of nucleotide differences, k (Tajima 1983).
- Nucleotide divergence with Jukes and Cantor, K(JC) (Nei 1987).
- Theta (per gene or per site) from Eta (η) or from S (Watterson 1975; Nei 1987).
- ZnS statistic (Kelly 1997, equation 3).
1.3 Neutrality tests
- Tajima's D (Tajima 1989), and its statistical significance.
- Fu and Li's D* (Fu and Li 1993), and its statistical significance.
- Fu and Li's F* (Fu and Li 1993), and its statistical significance.
- Fu's Fs (Fu 1997).

2. DNA Polymorphism/Divergence
In addition to the DNA Polymorphism statistics (1.1, 1.2 and 1.3), DnaSP will also compute:
- K(JC), average number of substitutions per site (using the Jukes and Cantor correction).
- Fu and Li's D (Fu and Li 1993), and its statistical significance.
- Fu and Li's F (Fu and Li 1993), and its statistical significance.
- Fay and Wu's H (Fay and Wu 2000), and the normalized Fay and Wu's H tests.

Outgroup
This option allows you to select the outgroup DNA sequence (the first or the last sequence) for the DNA Polymorphism/Divergence analysis. The remaining sequences will be considered the ingroup (intraspecific data).

Output
Results are presented in a grid (table). You can save these results to a text file that can be opened in any spreadsheet program (such as Excel).

More information in the specific modules: Codon Usage Bias; DNA Polymorphism; Fu and Li's (and other) Tests; Linkage Disequilibrium; Tajima's Test

Abbreviations:
n.d., not determined (not implemented yet).
n.a., not available.
n.s., not significant.

Open Unphase/Genotype Data Files
References: Stephens et al. 2001; Stephens and Donnelly 2003; Scheet and Stephens 2006; Wang and Xu 2003

DnaSP can automatically read unphased (or genotype) data files (diploid individuals) in FASTA format (see FASTA).
This format is the standard FASTA format, extended with the IUPAC nucleotide ambiguity codes to represent heterozygous sites.

Suppose a data set contains 5 diploid individuals (therefore a total of 10 sequences) with 16 positions each.

      *  *       *
Ind1 TRCAAGACCGGAGGCG
Ind2 .A.C..--........
Ind3 .A..M.......S...
Ind4 .A---.......C...
Ind5 .G..C....-------

For instance, as the second site of Ind1 is heterozygous (R = purine; A and G), Ind1 includes the following two sequences:
Ind1-1 TACAAGACCGGAGGCG
Ind1-2 .G..............

As there is no heterozygous site in Ind2, its two component sequences are:
Ind2-1 TACCAG--CGGAGGCG
Ind2-2 ......--........

This DnaSP module reconstructs the 10 sequences from the 5 individuals. DnaSP can then use the reconstructed data set (10 sequences of 16 nucleotides each) for further analysis.

Haplotype Reconstruction
DnaSP can reconstruct the haplotype phases from unphased data. This haplotype reconstruction is conducted using the algorithms provided by PHASE (Stephens et al. 2001; Stephens and Donnelly 2003), fastPHASE (Scheet and Stephens 2006) and HAPAR (Wang and Xu 2003).

PHASE 2.1 uses a coalescent-based Bayesian method to infer the haplotypes. It can also be used to estimate the recombination rate along the sequences.

fastPHASE 1.1 modifies the PHASE algorithm, taking into account the patterns of linkage disequilibrium and its gradual decline with physical distance.

HAPAR uses a pure parsimony approach to estimate the haplotypes; the optimal solution is the one that requires the fewest haplotypes to resolve the genotypes. For positions that are not completely resolved, the user can choose between replacing the unresolved positions with "N" and assigning the nucleotide variants randomly.

Note:
fastPHASE and HAPAR can handle only diallelic polymorphic positions. Nevertheless, polymorphic positions segregating for three or more variants can be resolved with PHASE.

Very important:
See the PHASE, fastPHASE or HAPAR documentation for more information and details.
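
The expansion of a genotype sequence into its two component sequences, as in the Ind1 and Ind2 example above, can be sketched in Python. This is only an illustrative sketch and not DnaSP's own code: the function names (resolve_dots, split_genotype) are hypothetical, only the two-allele IUPAC codes are handled, and the order of the two alleles at each heterozygous site is chosen alphabetically, which is an assumption made here for simplicity.

```python
# Standard two-allele IUPAC nucleotide ambiguity codes.
IUPAC_HET = {
    "R": ("A", "G"),
    "Y": ("C", "T"),
    "S": ("C", "G"),
    "W": ("A", "T"),
    "K": ("G", "T"),
    "M": ("A", "C"),
}


def resolve_dots(seq, reference):
    """Replace the matching symbol '.' with the reference symbol at that site."""
    return "".join(r if s == "." else s for s, r in zip(seq, reference))


def split_genotype(seq):
    """Split one genotype sequence into two sequences (hypothetical helper).

    Homozygous sites, gaps and missing data are copied to both sequences.
    The allele order at heterozygous sites is arbitrary (alphabetical), so
    the result is NOT phased when there are several heterozygous sites.
    """
    first, second = [], []
    for base in seq.upper():
        a, b = IUPAC_HET.get(base, (base, base))
        first.append(a)
        second.append(b)
    return "".join(first), "".join(second)


# Genotype data of the first three individuals from the example above.
genotypes = {
    "Ind1": "TRCAAGACCGGAGGCG",
    "Ind2": ".A.C..--........",
    "Ind3": ".A..M.......S...",
}
reference = genotypes["Ind1"]

for name, seq in genotypes.items():
    full = resolve_dots(seq, reference)
    seq_1, seq_2 = split_genotype(full)
    print(f"{name}-1 {seq_1}")
    print(f"{name}-2 {seq_2}")
```

For Ind1 and Ind2 this reproduces the sequences listed above (written out in full rather than with the '.' matching symbol). For Ind3, which has two heterozygous sites (M and S), the split is arbitrary: deciding which alleles lie on the same chromosome is precisely the task of the haplotype reconstruction methods (PHASE, fastPHASE, HAPAR).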
