Genome-Wide Association Studies For Common Disease and Complex

合集下载

全基因组关联分析(GWAS)取样策略

全基因组关联分析(GWAS)取样策略

全基因组关联分析(GWAS)取样策略GWAS要想做得好,材料选择是至关重要的一环。

So,小编查阅了上百篇GWAS文献,精心梳理了一套GWAS的取样策略,是不是很贴心呢?赶紧来学习一下吧!一、常见经济作物样本选择对于经济作物来说,一般都有成百上千个品系,其中包括野生种、地方栽培种、驯化种及商业品种。

一般选择多个品系来确保群体遗传多样性。

文献中常见的经济作物的样本收集于全国或者全世界各地。

表1 常见经济作物样本收集二、常见哺乳动物样本选择对于哺乳动物,一般选择雄性个体作为研究对象(除研究产奶、产仔等性状外),并且要求所研究的对象年龄相近。

下表是我们统计的一些已发表的哺乳动物取材案例,供大家参考。

表2 常见哺乳动物样本收集三、常见家禽类样本选择对于家禽而言,一般会选择家系群体(全同胞家系或半同胞家系)。

为了增加分析内容,可以构建多个家系群体进行研究。

此外,尽量使群体所有个体生长环境以及营养程度保持一致,同时家禽的年龄也尽量保持一致,这对表型鉴定的准确性有很大的帮助。

表3 常见家禽类样本收集四、林木类样本选择对于林木类,一般选择同一物种的多个样本,多个样本做到表型丰富。

表4 林木类样本收集五、其他物种样本选择对于原生生物以及昆虫等的取样策略,可以参考表5中已发表的文献。

表5 其他物种样本收集有这么多文献支持,各位看官是不是已经整明白了GWAS该如何取材呢?最后,小编再温馨提示一句,根据文献统计及项目经验,一般来说,GWAS的样本大小要不少于300个才是极好的。

参考文献[1] Jia G, Huang X, Zhi H, et al. A haplotype map of genomic variations and genome-wide association studies of agronomic traits in foxtail millet (Setaria italica)[J]. Nature Genetics, 2013, 45(8):957-61.[2] Zhou L, Wang S B, Jian J, et al. Identification of domestication-related loci associated with flowering time and seed size in soybean with the RAD-seqgenotyping method[J]. Scientific reports, 2015, 5.[3]Zhou Z, Jiang Y, Wang Z, et al. Resequencing 302 wild and cultivated accessions identifies genes related to domesticatio n and improvement in soybean[J]. Nature Biotechnology, 2015, 33(4):408-414.[4] MorrisG P, Ramu P, Deshpande S P, et al. Population genomic and genome-wide association studies of agroclimatic traits in sorghum[J].Proceedings of the National Academy of Sciences, 2013, 110(2): 453-458.[5] Yano K, Yamamoto E, Aya K,et al. Genome-wide association study using whole-genome sequencing rapidly identifies new genes influencing agronomic traits in rice[J]. Nature Genetics, 2016, 48(8).[6] Wang X, Wang H, Liu S, et al. Genetic variation in ZmVPP1 contributes to drought tolerance in maize seedlings[J]. Nature Genetics, 2016.[7] Pryce J E, Bolormaa S, Chamberlain A J, et al. A validated genome-wide association study in 2 dairy cattle breeds for milk production and fertility traits using variable length haplotypes[J]. Journal of dairy science, 2010, 93(7):3331-3345.[8] Hayes B J, Pryce J, Chamberlain A J, et al. Genetic architecture of complex traits and accuracy of genomic prediction:coat colour, milk-fat percentage, and type in Holstein cattle as contrastingmodel traits[J]. PLoS Genet, 2010, 6(9): e1001139.[9] Heaton M P, Clawson M L, Chitko-Mckown C G,et al. Reduced lentivirus susceptibility in sheep with TMEM154 mutations[J].PLoS Genet, 2012, 8(1): e1002467.[10] Tsai K L, Noorai R E, Starr-Moss A N, et al. Genome-wide association studies for multiple diseases of the German Shepherd Dog[J]. Mammalian Genome, 2012, 23(1-2): 203-211.[11] Petersen J L, Mickelson J R, Rendahl A K, et al. Genome-wide analysis reveals selection for important traits in domestic horse breeds[J]. PLoS Genet, 2013,9(1): e1003211.[12] Do D N, Strathe A B, Ostersen T, et al. Genome-wide association study reveals genetic architecture of eating behaviorin pigs and its implications for humans obesity by comparative mapping[J]. PLoS One, 2013, 8(8).[13] Daetwyler H D, Capitan A, Pausch H, et al. Whole-genome sequencing of 234 bulls facilitates mapping of monogenic andcomplex traits in cattle[J]. Nature genetics, 2014, 46(8): 858-865.[14] Wu Y, Fan H, Wang Y, et al. Genome-Wide Association Studies Using Haplotypes and Individual SNPs in Simmental Cattle[J]. PLoS One,2014,9(10): e109330.[15] Parker C C, Gopalakrishnan S, Carbonetto P,et al.Genome-wide association study of behavioral, physiological and gene expression traits in outbred CFW mice[J]. Nature Genetics, 2016.[16] Gu X, Feng C, Ma L, et al. Genome-wide association study of body weight in chicken F2 resource population[J]. PLoS One, 2011, 6(7): e21872.[17] Xie L, Luo C, Zhang C, et al. Genome-wide association study identified a narrow chromosome 1 region associated with chicken growth traits[J]. PLoS One, 2012, 7(2): e30910.[18] Liu R, Sun Y, Zhao G, et al. Genome-Wide Association Study Identifies Loci and Candidate Genes for Body Composition and Meat Quality Traits in Beijing-You Chickens[J]. Plos One, 2012, 8(4):-.[19] Evans L M, Slavov G T, Rodgers-Melnick E, et al. Population genomics of Populus trichocarpa identifies signatures of selection and adaptive trait associations[J]. Nature genetics, 2014.[20] Porth I, Klapšte J, Skyba O,et al. Genome‐wide association mapping for wood characteristics in Populus identifiesan array of candidate single nucleotide polymorphisms[J]. New Phytologist,2013, 200(3): 710-726.[21] Van Tyne D, Park D J, Schaffner S F, et al. Identification and functional validation of the novel antimalarial resistance locus PF10_0355 in Plasmodium falciparum[J]. PLoS Genet, 2011, 7(4): e1001383.[22] Ke C, Zhou Z, Qi W, et al. Genome-wide association study of 12 agronomic traits in peach[J]. Nature Communications,2016, 7:13246.[23] Miotto O, Amato R, Ashley E A, et al. Genetic architecture of artemisinin-resistant Plasmodium falciparum[J]. Naturegenetics, 2015, 47(3): 226-234.[24] Spötter A, Gupta P, Nürnberg G, et al. Development of a 44K SNP assay focussing on the analysis of a varroa‐specific defence behaviour in honey bees (Apis mellifera carnica)[J]. Molecular ecology resources, 2012, 12(2): 323-332.重测序业务线靳姣姣丨文案武苾菲丨编辑。

复杂疾病全基因组关联研究进展——遗传统计分析

复杂疾病全基因组关联研究进展——遗传统计分析
关键词: 全基因组关联研究; 检验效能; 多重检验校正; 人群混杂; 重复
Genome-wide association study on complex diseases: genetic statistical issues
YAN Wei-Li
School of Public Health, Xinjiang Medical University, Urumqi 830054, China
HEREDITAS (Beijing) 200253-9772
DOI: 10.3724/SP.J.1005.2008.00543
综述
复杂疾病全基因组关联研究进展——遗传统计分析
严卫丽
新疆医科大学公共卫生学院, 乌鲁木齐 830054
545
1.1 基于无关个体(Unrelated individual)的关联分析
基于无关个体的研究设计分为病例对照研究设 计(Case-control study)和基于随机人群的关联分析 (Population-based association analysis)两种情况。前 者主要用来研究质量性状(是否患病), 而后者主要 用来研究数量性状。根据研究设计不同和研究表型 的不同, 采用的统计分析方法亦不同。如病例对照 研究设计(质量性状), 比较每个 SNP 的等位基因频 率在病例和对照组中的差别可采用 4 格表的卡方检 验, 计算相对危险度(Odds Ratio, OR 值)及其 95%的 可 信 限 , 进 而 可 以 计 算 归 因 分 数 (Attributable fraction, AF)和归因危险度(Attributable risk, AR)。需 要调整主要的混杂因素, 如年龄、性别等, 则采用 logistic 回归分析, 以研究对象患病状态为因变量, 以基因型和混杂因素作为自变量进行分析。当研究 设计是基于随机人群时 (数量性状), 如研究 SNP 与 某一疾病数量表型的关联时, 如 BMI, 我们比较该 位点 3 种基因型携带者 BMI 水平是否有差别(单因素 方差分析), 当需要调整混杂因素时, 采用协方差分 析或者线性回归方程。

昼夜节律钟基因多态性与儿童注意缺陷多动障碍及睡眠问题的相关分析重点

昼夜节律钟基因多态性与儿童注意缺陷多动障碍及睡眠问题的相关分析重点
attention-deficit/hyperactivity disorder Mental
Disorders(Peki,Lg
University Sixth
Hospital),and
the Key Laboratory
of Mentaf Heahh,Ministry of
Health(Peking University).Beijing JDDl91.China
组中,携带AG/GG基因型较携带AA基因型的患儿注意缺陷分更高『(19.4+4.3)分与(18.7±4.O)分,仁
4.44,P=0.04]。结论CLOCK基因rs6832769与睡眠问题对ADHD患儿的核心症状存在交互作用,尤 其是注意缺陷症状。
【关键词】注意力缺陷障碍伴多动;睡眠;多态性,单核苷酸
China(81571340)
注意缺陷多动障碍(attention—deficit/hyperactivitv disorder,ADHD)是一种儿童期常见的神经发育障 碍性疾病,临床表现主要包括注意力不集中、多动 和冲动,其患病率约为7%l”。ADHD具有高度遗传 性,遗传度约为0.76口,,遗传异质性强,是多种基因 遗传的复杂疾病。多项研究显示,位于昼夜节律钟
Corresponding
author:Qian Qi蛳n,Email:qianqiujin@bjmu.edu.ca
【Abstract】0bjective To explore the association between polymorphisms of clock circadian regulator(CLOCK)gene and attention-deficit/hyperactivity disorder(ADHD),and ADHD symptoms,as well

gwas研究基本概念1

gwas研究基本概念1

gwas研究基本概念1
GWAS(Genome-Wide Association Studies)是一种遗传学研究方法,用于寻找基因组中与特定性状或疾病相关的遗传变异。

它的基本概念如下:
1. 基因组覆盖:GWAS研究需要覆盖整个基因组的遗传变异,以确保不会错过任何与疾病或性状相关的遗传变异。

2. 关联分析:GWAS通过对研究对象的基因组和表型数据进
行关联分析,来发现与疾病或性状相关的遗传变异。

关联分析通常使用单核苷酸多态性(SNPs)作为遗传变异的标记,并
通过比较不同基因型的频率与相关性来确定它们之间的关联。

3. 候选基因和关联区域:GWAS经常会发现一些与疾病或性
状相关的候选基因或关联区域。

候选基因可能与已知的生物学过程或相关基因的功能有关,进一步研究可以解析其在疾病发展中的作用。

4. 多态性和复杂性:GWAS研究揭示了基因多态性和复杂性
在疾病或性状发生发展中的作用。

多个基因通常与一个特定性状或疾病相关,而每个基因的影响可能相对较小。

5. GWAS研究的局限性:GWAS的结果通常需要进一步验证
和功能研究,以确认与疾病相关的候选基因或关联区域,并了解其作用机制。

此外,GWAS主要关注常见变异对疾病的影响,而较罕见的变异可能被忽略。

总之,GWAS是一种通过关联分析来寻找基因组中与性状或疾病相关的遗传变异的方法,它为研究复杂疾病的遗传基础提供了重要的信息。

原发性胆汁性胆管炎遗传易感性的研究现状

原发性胆汁性胆管炎遗传易感性的研究现状

原发性胆汁性胆管炎遗传易感性的研究现状赵春梅,马狄,邰文琳昆明医科大学第二附属医院检验科,昆明 650032通信作者:邰文琳,*********************(ORCID:0000-0002-8278-929X)摘要:原发性胆汁性胆管炎(PBC)是一种以胆管上皮细胞变性坏死为主,好发于中老年女性,具有强烈的遗传倾向性的肝脏自身免疫性疾病。

随着全基因组关联分析(GWAS)的不断发展,PBC的遗传易感性备受关注。

本文阐述了与PBC密切相关的遗传易感基因的研究进展,以期为PBC治疗提供有效靶点。

关键词:原发性胆汁性胆管炎;疾病遗传易感性; HLA抗原基金项目:国家自然科学基金(82060385);昆明医科大学2023年研究生创新基金(2023S321)Current status of research on the genetic susceptibility of primary biliary cholangitisZHAO Chunmei,MA Di,TAI Wenlin.(Department of Clinical Laboratory,The Second Affiliated Hospital of Kunming Medical University, Kunming 650032, China)Corresponding author: TAI Wenlin,*********************(ORCID: 0000-0002-8278-929X)Abstract:Primary biliary cholangitis (PBC)is a liver autoimmune disease with a strong genetic tendency characterized by the degeneration and necrosis of bile duct epithelial cells,and it is often observed in middle-aged and elderly women. With the continuous development of genome-wide association studies,the genetic susceptibility of PBC has attracted more and more attention. This article elaborates on the research advances in the genetic susceptibility genes closely associated with PBC, in order to provide effective targets for the treatment of PBC.Key words:Primary Biliary Cholangitis; Genetic Predisposition to Disease; HLA AntigensResearch funding:National Natural Science Foundation of China (82060385); Postgraduate Innovation Fund of Kunming Medical University in 2023 (2023S321)原发性胆汁性胆管炎(primary biliary cholangitis,PBC)是一种以肝内小胆管破坏为特征的慢性胆汁淤积性自身免疫性肝病[1]。

s1外文文献

s1外文文献

F orPe e r R e v i ewGenome-Wide Association Study of Eight Carcass Traits inJinghai Yellow Chicken using SLAF-seq TechnologyJournal: Poultry ScienceManuscript ID: DraftManuscript Type: Full-Length ArticleKey Words: Carcass traits, chicken, GWAS, SLAF-seqPoultry ScienceF or Pe er Re vi ew Genome-Wide Association Study of Eight Carcass Traits in Jinghai Yellow1 Chicken using SLAF-seq Technology2ABSTRACT3 Carcass traits are the most important yield components in chicken (Gallus gallus ).4 Investigation of its carcass traits will help develop high-yield varieties in chicken. In order to5 identify the single-nucleotide polymorphisms (SNPs) and candidate genes affecting carcass traits,6 a genome-wide study of the association of eight carcass traits was performed in four7 hundred43-week-old Jinghai Yellow chickens. The SNPs that significantly associated with the8 phenotypic traits were identified by simple general linear model (GLM) and compressed mixed9 linear models (MLM). A total of fifteen SNPs were found to be significantly associated with eight 10 traits and 12 functional genes at a threshold of P <1.87E-6in the region of 75.5–76.1Mb chr4. This 11 region had the most significant effects on carcass weight, foot weight, and wing weight. Another 12 84-kb region on GGA3 for eviscerated weight and semi-eviscerated weight was identified. In 13 summary, these identified genes and SNPs will offer essential information for cloning14 yield-related genes in chicken.15Key words16Carcass traits; chicken; GWAS; SLAF-seq17 18Page 1 of 30Poultry ScienceF or Pe er Re vi ew Introduction19The increasing world population has resulted in a growing demand for meat products. To meet 20 this demand, the poultry industry has improved productivity rates mainly through genetic 21 improvement.22Carcass traits as the most important factors in the poultry industry decide the benefits of 23 industrialization. Remarkable advances on carcass traits have been achieved, and many relative 24 genes and quantitative trait loci (QTL) have been found (Abasht et al. 2007; Atzmon et al . 2008; 25 Fang et al . 2010; Tang et al . 2010). Numerous QTL have been mapped for performance and 26 carcass traits in different chromosomes (Hu et al . 2013).In a previous study, Ambo et al . (2009) 27 mapped a QTL for chicken body weight at 35 and 42 days using microsatellite markers in 28 chromosome 4. In another study with the same population, Baron et al . (2011) mapped a QTL for 29 percentage of thighs and drumsticks in the same region of chromosome 4.However, the 30 application of these QTL results in broiler breeding remains impractical because of low mapping 31 precision. These measures are labor intensive and are some degree of aimlessness. Genome-wide 32 association studies (GWAS) are now used to search for single nucleotide proteins (SNPs) and 33 functional genes that affect quantitative traits. A GWAS need not assume that genes or QTLs are 34 associated with specific traits (Hardy et al . 2009). And more, it is used to examine traits and 35 genetic markers (Cho et al . 2009; Liu et al . 2008; McCarthy et al . 2008). However, population 36 size greatly affects GWAS accuracy.37High-throughput sequencing technologies can provide new strategies for sequence-based SNP 38 genotyping. Whole-genome resequencing strategies can be used to genotype large variations39Page 2 of 30Poultry ScienceF or Pe er Re vi ew among samples (Lam et al . 2010; Rubin et al . 2010; Xia et al . 2009), but it remains cost 40 prohibitive in large populations. SLAF-seq is a new reduced representation sequencing technology 41 that uses bioinformatics methods to design a tag development plan and a screen-specific fragment 42 length to achieve mass labeling using high-throughput technologies that can adequatelyidentify 43 target species’ genome-wide information.44This study aimed to identify potential loci and candidate genes affecting carcass traits in 45 43-week-old Jinghai Yellow chickens using the specific-locus amplified fragment sequencing 46 (SLAF-seq) (Sun et al . 2013). In this strategy repetitive sequences can be avoided by using 47 predesigned schemes. And the selected fragment number can be decided for personalized research 48 purposes to maintain the balance between marker density and population size. Reference genome 49 sequences and polymorphism information are not needed when this strategy used. In the present 50 work, 400 chickens from a conservation population of a Chinese local breed (Jinghai Yellow 51 chicken) were used in GWAS. A total of eight carcass traits were measured.52Material and method 53Experimental Animals54The animals used in this study were obtained from the Jinghai Yellow Chicken Breeding Station. 55 Four hundred females of the same batch from the same generation were randomly chosen. All had 56 complete genealogical records and were reared in stair-step caging under the same recommended 57 nutritional and environmental conditions. A total of eight carcass quality traits were measured for 58 the GWAS: carcass weight (CW), foot weight (FW), single wing weight (WW), single breast 59 muscle weight (BMW), single leg muscle weight (LMW), abdominal fat weight (AW), eviscerated60Page 3 of 30Poultry ScienceF or Pe er Re vi ew weight (EW), and semi-eviscerated weight (SEW). After a 12-h fast the chicken were weighed and 61 slaughtered at d 300 by standard commercial procedures and the CW, FW, and WW values were 62 recorded. The adipose tissues surrounding the proven triculus and gizzard along with those located 63 around the cloacae were weighed as AW (Ain et al . 1996; Zhao et al . 2007). EW and SEW were 64 collected. The carcasses were dissected into deboned, skinless thighs and breasts for the 65 assessment of BMW and LMW.66SLAF-seq Technology Scheme Design67SLAF-seq was used to genotype a total of 400 individuals, as previously described (Qi et al .68 2014), with a few modifications. Genomic DNA (≥600 ng) from Jinghai Yellow chickens was 69 extracted from the blood samples using Dzup (Blood) Genomic DNA Isolation Reagent (Sangon 70 Biotech) and diluted to 50–100 µg/µL. DNA was incubated at 37°Cwith T4 DNA ligase (NEB), 71 0.6 U MseI (NewEnglandBiolabs, Hitchin, Herts, UK), A TP (NEB) and MseI adapters. 72 Restriction-ligation reactions were heat-inactivated at 65°C and then digested in an additional 73 reaction with the restriction enzymes HaeIII at 37°C. The PCR reactions contained the diluted 74 restriction-ligation samples, dNTP, Taq DNA polymerase (NEB), and an MseI primer containing a 75 barcode. The PCR products were purified using TaKaRa DNA Fragment Purification Kit Ver.2.0 76 and then pooled. The pooled sample was incubated at 37°C with MseI, T4 DNA ligase, A TP and 77 Solexa adapters. The samples were purified using a Quick Spin column (Qiagen) and then 78 separated on a 2% agarose gel to isolate the 500–800-bp fragments using a Gel Extraction Kit 79 (Qiagen). These fragments were used in a PCR amplification with Phusion Master Mix (NEB) and 80 Solexa. The Phusion PCR settings followed the Illumina sample preparation guide. The samples 81 were gel-purified and the products of appropriate sizes (300–500 bp) were excised and diluted for82Page 4 of 30Poultry ScienceF or P e e r R e v i ew sequencing using an IlluminaHiSeq TM 2000. Sequencing using theIlluminaHiSeq TM 2000 produced 83 primitive reads (double end sequence) that we evaluated and mapped using SOAP 2.20 software 84 (Lietal.2009)toassemblenewlyreferencedgenomes85 (Ensembl:ftp:///pub/release-75/fasta/gallus_gallus/dna/) to ensure that the original 86 sequencing data were effectively obtained. We chose the double-end sequences compared to the 87 only locus of the genome to do SLAF label employer. In line with the comparison error correction 88 result, we chose the group whose average depth of sequencing was not <4 to define the SLAF89 label.90Genotyping and Statistical Analysis91Plink (v1.07) (Purcell et al . 2007) was used to do quality control of the data. The SNPs with 92 low call frequency (<85%) and low minor allele frequency (<5%) were rejected. Finally, 400 93 samples and 90,030 SNPs that were distributed to 30 autosomes and Z chromosome were left for 94 GWAS analysis.95Based on the SNPs, we used ADMIXTURE 1.22 software (Alexander et al . 2009) to calculate 96 the sample’s groupstructure. We assumed that the 400-sample group number (Q value) was 1–15 97 for the cluster analysis and ensured the number of subgroups by the peak ∆Q value positions.98The SNPs that were significantly associated with the phenotypic traits were identified using a 99 TASSEL 3.0 general linear model (GLM, I) and a compressed mixed linear model (MLM, II) 100 (Zhang et al . 2010):101Y = µ + Xα +Qβ +e (1)102Y = µ + Xα +Qβ + Kµ′ +e (2)103Page 5 of 30Poultry ScienceF or Pe er Re vi ew Where Y is the phenotypic value, µ is the fixed effect value vector, X is the genotype, Q is the 104 population structure matrix calculated by the ADMIXTURE program, the proportion of each of 105 the different groups was fitted as a covariate, β is the weight vector of each group, and K is the 106 relative kinship matrix. Xis considered the genotype matrix, while α is the weight vector of each 107 marker and e as the random error. The relative kinship matrix (K) was constructed from 15,719 108 independent SNPs using software SPAGeDi 1.3a (Ou et al . 2009). P values were corrected by 109 Bonferroni (Nicodemus et al . 2005). Here there were 90,030 SNPs and the threshold Bonferroni 110 Pvalue was obtained from the estimated number of independent SNP markers and linkage 111 disequilibrium (LD) blocks. Here the independent SNPs and LD blocks were calculated using the 112 equation r 2>0.4 by Plink v1.07 through all autosomal SNPs and pruned using the indep-pairwise 113 option with a window size of 25 SNPs, a step of five SNPs, and an r 2threshold of 0.4. There were 114 a total of 26,767 SNPs, so the threshold Bonferroni P value of potential significance was 115 3.73E-5(1/26767), and the threshold Bonferroni P value of genome-wide significance was 116 1.87E-6(0.05/26767). Quantile-quantile plots for each trait and Manhattan plots of genome-wide 117 association analyses were produced using software TASSEL 3.0(/).118Results119Analysis of SLAF-seq data and SLAF markers120After SLAF library construction and sequencing, a total of 52.70Gb of raw data, consisting of 121 paired-end reads was obtained with each read being ~80 bp in length after preprocessing. Among 122 them, 86.1% bases were of high-quality, with quality scores of at least 20 (means a quality score 123 of 20, indicating a 1% chance of an error, and thus 99% confidence). In total, 236.07 M reads were124Page 6 of 30Poultry ScienceF or Pe er Re vi ew accuracy paired-ends mapping to chicken reference genome, which paired ends mapping ratio 125 were 71.66%. The numbers of SLAFs in chicken were 103,680, of which the average sequencing 126 depth was5.46 in the chicken. Among these data, 88,135 were polymorphic, giving a 127 polymorphism rate of 85.01%. The number of SLAF markers per chromosome ranged from 1 128 to19,722. The distribution of SLAF in the genome was well proportioned (Figure 1), and we then 129 detected the SNPs among the defined SLAF fragments. After quality control measures, 90,961 130 SNPs distributed among 29 chromosomes (including the Z chromosome) and the mitochondrial 131 genome (Table 1).The average physical distance between two neighboring SNPs was132 approximately 10 kb.133Association between polymorphisms and traits134We made the descriptive statistics of the phenotypic measurements of body composition traits 135 in the 400Jinghai Yellow chickens used for the present GWAS studies are given in Table 2.All 136 non-normal phenotypic data were normalized after Box-Cox or Johnson transformation. The 137 subgroups with a minimum ∆Q peak value were the best. The results indicated that a Q value of 138 10 is the lowest peak value (Figure 2A, B). Based on this result the samples were divided into 10 139 subgroups.140Two statistical methods, compressed mixed linear model (MLM) and generalized linear model 141 (GLM), were implemented to analyze association between SNPs and phenotypes. The results for 142 all SNPs demonstrated to have genome-wide significance (P < 1.87E-6) in GLM lower than the 143 suggested significance (P < 3.73E-5) in MLM. The MLM analysis considers more factors and is 144 stricter than the GLM. Emphasis is placed on the associations revealed by the compressed MLM145Page 7 of 30Poultry ScienceF or Pe er Re vi ew analyses because population structure effect could be controlled and false positives. GLM and 146 MLM help to locate loci that are useful for breeding. This could be reduced with this approach, as 147 shown in Q-Q plots (Figure S1).148Loci and Genes for Body Composition Traits149Carcass weight (CW). One SNP genome-wide significantly associated with CW from GLM that 150 was located in GGA4 1.6kb downstream from the Gallus gallus fibroblast growth factor binding151 protein2 (FGFBP2) gene. The protein encoded by the FGFBP2gene recognizes DNA promoting 152 regions and induce transcription of FGFs. FGFs which induce myoblast proliferation and 153 differentiation of myocytes make important contributions for skeletal muscle development in 154 chickens (Gibby et al . 2009; Felicio et al . 2013). Previous study identified that QTL for body 155 weight, leg length, and leg diameter in GGA2, 4, and 26 in an F2 population of chickens 156 (Ankra-Badu et al . 2010). Another similar study identified a QTL in the GGA4 associated with 157 body weight, carcass weight, breast weight, leg weight, and wing weight (Nassar et al . 2013).Only 158 the locus of the FGFBP2 gene is of suggested significance on MLM analysis.159Foot weight (FW). In the case of FW, one interesting region 75.54–75.67 Mb in length was 160 identified related to FW. There were five significant SNPs of genome-wide significance on GLM 161 analysis. The five SNPs were all clustered in GGA4 within a 0.1-Mb region 162 (75,548,514–75,679,707bp) and located within or 2.1–7.7kb away from three genes, family with 163 sequence similarity 184 name as member B (F AM184B ), Gallus gallus quinoiddihydropteridine 164 reductase (QDPR ), and LIM-domainbinding factor 2(LDB2). On MLM analysis, three of the five 165 SNPs in GGA4 had a genome-wide significant association with FW. They were rs75548810,166Page 8 of 30Poultry ScienceF or Pe er Re vi ew rs75641139, and rs75679707. Rs75548810 is 2.4kb upstream from the F AM184B gene, 167 rs75641139 is 5.7kb upstream from the QDPR gene, and rs75679707 is 7.7kb downstream from 168 the LDB2 gene. Sun also testified that the two genes were the important candidates that influence 169 shank circumference (Sun et al . 2013).Some studies have indicated that the F AM184B gene can 170 influence the daily gain, carcass weight, and ingestion of cattle (Lindholm-Perry et al . 2011). 171 Similar, this gene may have important influence on FW in chicken.172Single wing weight (WW). Four SNPs located on GGA4, GGA18, GGA20, and GGAZ with 173 genome-wide significance for WW were identified by GLM analysis (Table 3). In GGA18,the 174 SNP was located in Gallus gallus zinc finger protein 302 ZNF302). The second SNP was located 175 in GGA20, 1,276bp upstream from uncharacterized protein (PGO2). The third SNP was located in 176 GGAZ, 66.9kb downstream from the SMARCA2 gene. The last SNP in GGA4 was 5.7kb upstream 177 from QDPR . QDPR also had a genome-wide significant association with FW. All of the four SNPs 178 reached genome-wide significance on MLM analysis, and the only difference between the two 179 models was that the P value of MLM was slightly lower than that of GLM.180Single breast muscle weight (BMW) and single leg muscle weight (LMW). Associations 181 identified with the two traits indicate that some SNPs reached suggested significance in GLM 182 butcannot find SNPs reached genome-wide significance on GLM and MLM.183Eviscerated weight (EW). Three genome-wide significant SNPs associated with EW were 184 identified, and all were located in GGA3 on GLM clustered within an 84-kb region. They were 185 located in TULP4, 29kp upstream from the transmembrane protein181 gene (TMEM181). On 186 MLM, the three SNPs were of suggested significance. The TULP4 gene belongs to the tubby187Page 9 of 30Poultry ScienceF or Pe er R e vi ew protein family, which seem to serve as bipartite bridges through their phosphoinositide-binding 188 tubby (Mukhopadhyay et al . 2011). This family has unique amino-terminal functional domains 189 that coordinate multiple signaling pathways, including ciliary G-protein-coupled receptor 190 trafficking and Shh signaling (Mukhopadhyay et al . 2011). Another study also found statistical 191 evidence that TULP4 was a new candidate gene for cleft palate (Vieira et al . 2013). TMEM181 192 belongs to the TMEM family of proteins that encode transmembrane proteins. However, there are 193 no related reports on the TMEM181 gene.194Semi-eviscerated weight (SEW). No SNP reached genome-wide significance on GLM or MLM, 195 yet the three SNPs associated with EW reached genome-wide significance on GLM. This may be 196 because the two traits have some degree of relevance. Another SNP in GGA4 reached the 197 suggested significance level on MLM and was 1,614bp downstream from FGFBP2. This SNP was 198 also associated with CW and they share the same SNP.199Abdominal fat weight (AW).Three SNPs were of genome-wide significance on GGA2 and 200 GGA14 by GLM. One of the two SNPs on GGA2 was 16kb downstream from the Gallus gallus 201 tRNA aspartic acid methyltransferase 1 (TRDMT1) gene, while the other SNP on GGA2 had no 202 annotated genes nearby. The SNP on GGA14 was in an uncharacterized protein (novel gene). On 203 MLM, one SNP on GGA14 and another SNP on GGA5 reached suggested significance. The SNP 204 on GGA5 was in the cell growth regulator with ring finger domain 1(CGRRF1) gene. The 205 Manhattan plots for all the traits with significant SNP are shown in figure 3.206A heatmap (Figure S2) of the eight traits in this analyses show high co line among these traits 207 except AW. These results show that several SNPs were simultaneously associated with several208F or Pe er Re vi ew traits. As shown in Tables 2, the SNPs with genome-wide and suggested significance for eight 209 carcass traits on GLM and MLM analysis.210 Discussion211Genome-wide Association Analysis212The GWAS studies were commonly used to identify economically important production traits in 213 animal studies (Fan et al . 2011; Jiang et al . 2010; Shen et al . 2012). For quantitative traits,214 SLAF-seq technology developed large amounts specific markers with high success rate and low 215 cost (Sun et al . 2013). In the present study, a GWAS analysis between SLAF-SNPs and 216 quantitative traits was evaluated. Here we identified potential loci and candidate genes using 217 SLAF-seq in a conservation population of Jinghai Yellow chicken, the first new chicken varieties 218 authorized by the national commission on livestock and poultry resources in China (Zhao et al . 219 2011; Gu et al . 2011). Regarding statistical methods, we used the TASSEL compressed MLM and 220 GLM to analyze the association between SNPs and phenotypes. The MLM analysis considers 221 more factors and is stricter than GLM. However, MLM analysis may have a certain degree of false 222 negatives, which results in missing some useful SNPs. The two models contrast and complement 223 each other to help us find loci that are really useful for breeding. Most of the traits tested showed 224 considerable ranges between maximal and minimal values (Table 2). It would be expected in a 225 population being maintained for the conservation of genetic diversity. This variability could be 226 increase the power of the GWAS.227Loci and Genes for Traits Related to Body Composition228For CW, one SNP associated the FGFBP2 gene that plays an important role in embryogenesis,229F or Pe er Re vi ew cellular differentiation, and proliferation in chickens. The SNP g.651G>A in FGFBP2 was 230 associated with thawing loss and meat redness (P < 0.05) (Felicio et al. 2013).The present study 231 identified SNPs in FGFBP2 genes located in a QTL region, which corroborates the former reports 232 (Ankra-Badu et al . 2010; Nassar et al . 2013). FGFBP2 gene in the region can influence carcass 233 quality and muscle development.234For FM, the LDB2 and QDPR genes are significantly correlated with Beijing You chicken 235 growth traits (Gu et al . 2011). FW also has a positive correlation to BW, and shank circumference 236 has a direct influence on FW, further suggesting that these two genes are very important candidate 237 genes that influence the Jinghai Yellow chicken FW trait.238For WW, there are four genes ZNF302, PGO2, QDPR, and SMARCA2.ZNF302 belonging to 239 the zinc finger protein family. This protein family is responded to genital malformations 240 (hypospadias) in male Homo sapiens (human) (Gana et al . 2012). QDPR has proven significant 241 correlations with growth, shank circumference, and FW traits as described above. As such, these 242 three genes can serve as new candidate genes for further WW research. Gene 243 Smarca2encodedcomplex ORFs by yielding multiple mRNA variants. This complex alternative 244 splicing of this gene suggest that its functions may be very complex, not just simply inhibiting cell 245 proliferation (Yang et al . 2011).To our knowledge, there have not any reports of the PGO2 gene 246 function until now.247No identified SNP significantly associated with BMW and LMW traits on both GLM and 248 MLM in Jinghai Yellow chicken. These results support that the notion of complexity in the genetic 249 basis underlying BMW and LMW might be influenced by epigenetic factors. For SEW, it shares a250F or Pe er Re vi ew consistent region with EW that has a P value that is slightly higher than 3.73E-5. The result proves 251 the correlation between the two traits and the validity of our result.252As for AW, the two genes are TRDMT1 and CGRRF1. Some studies indicated that TRNM1 253 the smallest mammalian DNA methyltransferase participated in the recognition of DNA damage, 254 DNA recombination, and mutation repair. It also catalyzed DNA methylation at the 5-position of 255 cytosine and is the predominant epigenetic modification in mammals (Subramaniam et al . 2014). 256 In rats, CGRRF1 was bound up with obesity and obesity-associated endometrial cancer. CGRRF1 257 represents a novel, reproducible tissue marker of metformin response in the obese endometrium. 258 Furthermore, CGRRF1 expression may prove clinically useful in the prevention or treatment of 259 endometrial cancer (Zhang et al . 2014).260Relevant SNP association identified by cluster analysis261Heatmap can be useful for performing relevant SNP association analysis. Seven traits in this 262 analyses show obvious association together. The significant results observed in this population 263 showed that 13 genes located in QTL regions. These results can represent possible candidate genes 264 in poultry breeding programs. These results can help to marker-assisted selection for traits of 265 carcass weight, foot weight, single wing weight, single breast muscle weight, single leg muscle 266 weight, abdominal fat weight, eviscerated weight, and semi-eviscerated weight, which these traits 267 are of paramount importance to the poultry industry.268269F or Pe er Re vi ew270 Reference271Abasht, B., and Lamont, S.J. 2007. Genome-wide association analysis reveals cryptic alleles as an272 important factor in heterosis for fatness in chicken F 2 population. Anim Genet . 38(5): 273 491-498.274 Ain Baziz, H., Geraert, P. A., Padilha, J. C. F., and GUILLAUMIN, S. 1996. Chronic heat275 exposure enhances fat deposition and modifies muscle and fat partition in broiler carcasses. 276 Poult Sci . 75(4): 505-13.277 Alexander, D.H., Novembre, J., and Lange, K. 2009. Fast model-based estimation of ancestry in278 unrelated individuals. Genome Res . 19(9): 1655-64.279 Ambo, M., Moura, A.S., Ledur, M.C., Pinto, L.F., Baron, E.E., Ruy, D.C., Nones, K., Campos,280 R.L., Boschiero, C., Burt, D.W., and Coutinho, L.L. 2009. Quantitative trait loci for 281 performance traits in a broiler x layer cross. Anim Genet . 40(2): 200-208.282 Ankra-Badu, G.A., Shriner, D., Le Bihan-Duval, E., Mignon-Grasteau, S., Pitel, F., Beaumont, C.,283 Duclos, M.J., Simon, J., Porter, T.E., Vignal, A., Cogburn, L.A., Allison, D.B., Yi, N., and 284 Aggrey, S.E. 2010. Mapping main, epistatic and sex-specific QTL for body composition in a 285 chicken population divergently selected for low or high growth rate. BMC Genomics. 11: 286 107.287 Atzmon, G., Blum, S., Feldman, M., Cahaner, A., Lavi, U., Hillel, J. 2008. QTLs detected in a288 multigenerational resource chicken population. J Hered . 99(5): 528-38.289F or Pe er Re vi ew Baron, E.E., Moura, A.S., Ledur, M.C., Pinto, L.F., Boschiero, C., Ruy, D.C., Nones, K., Zanella,290 E.L., Rosário, M.F., Burt, D.W., Coutinho, L.L. 2011. QTL for percentage of carcass and 291 carcass parts in a broiler x layer cross. Anim Genet . 42(2): 117-24.292 Cho, Y.S., et al., 2009. A large-scale genome-wide association study of Asian populations293 uncovers genetic factors influencing eight quantitative traits. Nat Genet . 41(5): p. 527-34. 294 Fan, B., Go, M.J., Kim, Y.J et al., 2011. Genome-wide association study identifies Loci for body295 composition and structural soundness traits in pigs. PLoS One . 6(2): p. e14726.296 Fang, M., Nie, Q., Luo, C., Zhang, D., and Zhang, X. 2010. Associations of GHSR gene297 polymorphisms with chicken growth and carcass traits. Mol Biol Rep . 37(1): 423-428. 298 Felicio, A.M., Boschiero, C., Balieiro, J.C., Ledur, M.C., Ferraz, J.B., Moura, A.S., and Coutinho,299 L.L. 2013. Polymorphisms in FGFBP1 and FGFBP2 genes associated with carcass and meat 300 quality traits in chickens. Genet Mol Res . 12(1): 208-222.301 Gana, S., Veggiotti, P., Sciacca, G., Fedeli, C., Bersano, A., Micieli, G., Maghnie, M., Ciccone, R.,302 Rossi, E., Plunkett, K., Bi, W., Sutton, V.R., and Zuffardi, O. 2012. 19q13.11 cryptic deletion: 303 description of two new cases and indication for a role of WTIP haploinsufficiency in 304 hypospadias. Eur J Hum Genet . 20(8): 852-856.305 Gibby, K.A., McDonnell, K., Schmidt, M.O., and Wellstein, A. 2009. A distinct role for secreted306 fibroblast growth factor-binding proteins in development. Proc Natl Acad Sci U S A . 106(21): 307 p. 8585-90.308 Gu, X., Feng, C., Ma, L., Song, C., Wang, Y., Da, Y., Li, H., Chen, K., Ye, S., Ge, C., Hu, X., and309 Li, N. 2011. Genome-wide association study of body weight in chicken F2 resource 310 population. PLoS One . 6(7): e21872.311。

测序 英文医学单词

测序 英文医学单词

测序英文医学单词Title: The Essentials of Sequencing in Medical Sciences.Sequencing, a fundamental component of modern medical research, refers to the determination of the order of nucleotides in a DNA or RNA molecule. This process, often referred to as DNA sequencing, is crucial in various fields of medicine, including genetics, oncology, and infectious disease research. In this article, we delve into the significance of sequencing in medical sciences, its applications, and the latest advancements in this field.The Basics of Sequencing.DNA sequencing involves the analysis of the four nitrogenous bases adenine (A), thymine (T), cytosine (C), and guanine (G) that compose the genetic material of organisms. The order of these bases determines the genetic information encoded within DNA. Sequencing technologieshave evolved significantly over the years, from the earlySanger sequencing method to the modern high-throughput sequencing (HTS) platforms.Applications of Sequencing in Medicine.1. Genetics and Genomics: Sequencing has revolutionized the field of genetics by enabling the identification of genetic variations and diseases caused by mutations. It has been instrumental in the discovery of single-gene disorders, genome-wide association studies (GWAS), and personalized medicine.2. Oncology: Cancer genomics is a rapidly growing field that leverages sequencing technologies to understand the genetic basis of cancer. This information is crucial for developing targeted therapies and personalized treatment plans for cancer patients.3. Infectious Diseases: Sequencing has been key in monitoring the spread and evolution of infectious diseases, such as the COVID-19 pandemic. By analyzing the genomes of pathogens, researchers can track their origin, identifymutations that affect virulence, and develop more effective vaccines and therapeutics.Advancements in Sequencing Technologies.1. Next-Generation Sequencing (NGS): NGS platforms have significantly increased the speed and throughput of sequencing, enabling the analysis of entire genomes in a matter of days or hours. This technology has greatly accelerated research in various medical fields.2. Single-Cell Sequencing: This emerging technology allows researchers to sequence the genomes or transcriptomes of individual cells, providing into insights cellular heterogeneity and complex diseases.3. Long-Read Sequencing: Traditional sequencing methods often produce short reads, limiting the ability to analyze large genomic regions. Long-read sequencing technologies, such as PacBio and Oxford Nanopore, can generate reads several kilobases long, enabling the accurate assembly of complex genomes and the identification of structuralvariations.Challenges and Future Directions.Despite the remarkable progress in sequencing technologies, several challenges remain. Data analysis and interpretation can be complex and require specialized bioinformatics expertise. Additionally, ethical and privacy concerns arise when dealing with personal genetic information.Future directions in sequencing include the further improvement of technology to increase accuracy, reduce costs, and enable sequencing in real-time. There is also a need for more robust data analysis tools and methods to extract meaningful insights from the vast amounts of genetic data generated.In conclusion, sequencing has emerged as a crucial tool in medical sciences, enabling the decoding of life's genetic blueprint. Its applications range from basic research to clinical diagnostics and personalized medicine.With continuous technological advancements, sequencing will play an increasingly important role in medicine, leading to better understanding of human health and disease.。

强迫症,图雷特综合症的遗传结构

强迫症,图雷特综合症的遗传结构

Insights into the genetic architecture of OCD, Tourette syndrome强迫症、图雷特综合症的遗传结构探索An international research consortium led by investigators at Massachusetts General Hospital (MGH) and the University of Chicago has answered several questions about the genetic background of obsessive-compulsive disorder (OCD) and Tourette syndrome (TS), providing the first direct confirmation that both are highly heritable and also revealing major differences between the underlying genetic makeup of the disorders. Their report is being published in the October issue of the open-access journal PLOS Genetics.《PLOS Genetics》(一份开放性的刊物)十月份刊上,刊出一项由麻省综合医院(MGH) 和芝加哥大学的研究人员为首的全球合作化的研究成果。

论文中对“强迫性精神障碍症”(OCD)及“图雷特综合症”(TS)的遗传学结构的一些问题给出了解答。

首次对“这两种疾病均为高度可遗传性”的猜想给出了直接的证据。

揭示了这两种疾病根本的遗传学结构方面的主要差异。

"Both TS and OCD appear to have a genetic architecture of many different genes - perhaps hundreds in each person - acting in concert to cause disease," says Jeremiah Scharf, MD, PhD, of the Psychiatric and Neurodevelopmental Genetics Unit in the MGH Departments of Psychiatry and Neurology, senior corresponding author of the report. "By directly comparing and contrasting both disorders, we found that OCD heritability appears to be concentrated in particular chromosomes - particularly chromosome 15 - while TS heritability is spread across many different chromosomes."“就每个TS症患者和OCD症患者而言,其致病的基因结构存在众多(可达数百种之多)的差异。

蛋白质组学-医学院研究生课程2014-概论

蛋白质组学-医学院研究生课程2014-概论

人类与黑猩猩的基因对比研究

人类基因组有7个区域可能经历了25万年来的 “选择性清洗”,也就是突变基因具有明显竞 争优势。经过数百代繁殖后,突变种变成了种 群里的优势种,相应的突变基因也变成了正常 基因。人类基因组中经过“选择性清洗”的, 就包括与语言相关的基因。

研究人员发现,黑猩猩的Y染色体中有5个基 因已经退化,而人类Y染色体中则没有这种现 象。培格因此表示:“如果说在过去的600万 年中,人类Y染色体有基因遗失的话,这种遗 失程度也是很小的。我认为,我们可以自信地 驳斥Y染色体‘末日’理论。人类Y染色体在 今后的600万年里都不会消失。”
17

18
藏人与汉人的基因组比较

测定了50个藏族人的外显子组。发现了适应高海拔环境的一些候 选关键基因。其中最强的自然选择的信号来自一个叫做内皮PerArnt-Sim结构域蛋白1(endothelial Per-Arnt-Sim (PAS) domain protein 1,EPAS1)的基因。该基因是一个转录因子,与缺氧反 应有关。EPAS1的SNP显示藏族人与汉族人样本有78%的不同。 这是迄今为止在人类基因中发生的最快的等位基因频率的变化。 该SNP与血红蛋白含量的联系证明EPAS1的功能是适应缺氧环境 。因此,通过群体基因组的研究,发现了一个基因与适应高原环 境有关。
16
人类基因组与尼安德特人的比较
The observation that the Neandertal genome appears as closely related to the genome of a Chinese and a Papua New Guinean individual as to the genome of a French individual is particularly surprising as there is, to date, no fossil evidence that Neandertals existed in East Asia or Papua New Guinea. Green et al. thus suggest that gene flow between Neandertals and modern humans occurred prior to the divergence of European and Asian populations. Based on comparative genomic data, as well as a mathematical model of gene flow, the authors further estimate that between 1 and 4% of the genomes of people in Eurasia may be derived from Neandertals.

人类身高的遗传学研究进展

人类身高的遗传学研究进展

Hereditas (Beijing) 2015年8月, 37(8): 741―755 综 述收稿日期: 2015−02−13; 修回日期: 2015−04−20基金项目:国家自然科学基金项目(编号:31260267),新疆大学天山学者特聘教授科研项目(编号:0306407)和新疆生物资源基因工程重点实验室开放课题(编号:XJDX020-2012-040080)资助作者简介: 陈开旭,在读博士研究生,研究方向:群体遗传学。

E-mail: chenkaixu@ 通讯作者:张富春,教授,博士,研究方向:分子生物学。

E-mail: zfcxj@郑秀芬,教授,博士,研究方向:群体遗传学。

E-mail: zhengxf007@DOI: 10.16288/j.yczz.15-082网络出版时间: 2015-6-4 9:25:47URL: /kcms/detail/11.1913.R.20150604.0925.001.html人类身高的遗传学研究进展陈开旭1,王为兰1,张富春1,郑秀芬1,2,31. 新疆大学生命科学与技术学院,新疆生物资源基因工程重点实验室,乌鲁木齐 830046;2. 加拿大西安大略大学病理科,伦敦,N6A5A5;3. 加拿大劳森健康研究所,伦敦,N6A5A5摘要: 人类身高是由环境和遗传因素共同决定的,遗传学研究发现遗传因素对身高差异的影响更大。

身高是典型的多基因遗传性状,科研人员试图运用传统的连锁分析和关联分析寻找和发现对人类身高具有显著影响的常见DNA 序列变异,但进展缓慢。

近年来,随着基因分型和DNA 测序技术的发展,人类身高的遗传学研究取得了很多突破性进展。

全基因组关联分析(GWAS)的应用,发现和证实了上百个与人类身高相关的单核苷酸多态性位点(SNPs),拓展了人们对人类生长和发育的相关遗传学认识,同时也为研究人类其他复杂性状提供了理论依据和借鉴。

本文综述了人类身高的遗传学研究进展,探讨了目前该研究领域所存在的问题和未来发展方向,以期为今后人类身高相关的遗传学研究提供参考和借鉴。

医学信息检索报告

医学信息检索报告

编号:广州医科大学2013-2014年2学期各专业《医学信息检索》公共选修课综合检索报告检索课题名称对心脑血管疾病相关因素和病因的分析报告完成日期 2014.06.02 年级专业姓名学号成绩说明利用所学的医学文献检索知识和检索方法,自拟检索课题,从多方面广泛收集有关资料,完成该课题的综合检索报告。

综合检索报告的选题也可与专业课的老师商量后确定。

一、数据库选择要求:检索范围包括馆藏书目检索、CBM、PubMed、中国学术期刊全文数据库、网络检索工具(搜索引擎)以及你知道的其它网站和数据库等多个数据库,最后对所用数据库进行简要评价。

二、检索年限及其它要求:1.检索年限最好选最近五年或根据课题需要自行调整。

2.“检索词”一项需注明你所使用的检索词的性质,如:主题词/副主题词、关键词等。

3.“检索途径”指通过哪一种辅助索引查找。

4.每个数据库检索完毕后,按要求记录检索表达式和检索结果(检出文献篇数),另外列出6条与课题密切相关的文献题录。

(注:题录要按各数据库的文献记录格式抄录)。

5.检索过程中每一种数据库的检索报告如不够位置书写,可自行增补页数。

6.检索报告用计算机自行打印,请用A4纸。

内文采用宋体5号字。

实习报告将存档,请注意书面整洁,并请自行留底。

7. 综合检索报告将作为期终考试成绩,若缺综合报告,将视作未选修该课程。

检索报告请于6月6日前请交到图书馆六楼医学文献检索教研室(图书馆后座电梯上6楼),电话:81340092。

一、检索(研究)课题概况(一)检索课题简介(300-500字)心脑血管疾病就是心脏血管和脑血管的疾病统称,心脑血管疾病具有“发病率高、致残率高、死亡率高、复发率高,并发症多”即“四高一多”的特点。

由于长时间饮食习惯问题,饮食中脂类过多,醇类过多。

同时又没有合理的运动促进脂类醇类的代谢,导致体内脂类醇类物质逐渐增多,掺杂在血液中,使毛细血管堵塞,随着时间的推移,脂类醇类物质容易和体内游离的矿物质离子结合,形成血栓。

GWAS基因检测

GWAS基因检测

GWAS基因检测GWAS基因检测可以在全基因组范围内进行高通量的大规模筛选,可以发现单基因检测很难发现的遗传变异,检测结果更准确,做到真正的疾病预防:早发现、早预防、早治疗。

国际最新的GWAS全基因组基因检测技术,选取多个疾病密切相关的的基因位点,对SNP进行全面的检测,对易感疾病的判断更敏锐、准确度更高。

GWAS全基因组检测技术具体操作如下:(1)收集对照组和患者样本(组织、血液等),提取DNA(2)进行全基因组单核苷酸多态性(SNP)芯片扫描,获得数据信息(3)采用关联分析,筛选与疾病关联的序列变异的研究该方法试图通过疾病的变异基因和单核苷酸多态性,研究确定疾病发病易感区域和相关基因,寻找疾病的标记物,进行早期诊断和有效的个体化治疗,进行特异性预防措施。

GWAS与以往检测的区别和优点:现在针对我们平台所做的肿瘤套餐介绍如下:我们在GWAS数据库中提取了针对于亚洲人群(主要是日本、韩国、中国、马来西亚等人种)研究的肿瘤相关文献,挑选各个肿瘤的相关疾病位点进行研究。

现针对各个肿瘤介绍其相关位点信息:(1)胃癌基因检测的相关位点:参考文献:1. Zhang H, Jin G, Li H, et al. Genetic variants at 1q22 and 10q23 reproducibly associated with gastric cancer susceptibility in a Chinese population. Carcinogenesis. 2011; 32(6): 848-52.2. Wang N, Zhou R, Wang C, et al. A polymorphism of the interleukin-8 gene and cancer risk: a HuGE review and meta-analysis based on 42 case-control studies. Mol Biol Rep. 2012; 39(3): 2831-41.3. Lu Y, Chen J, Ding Y,et al. Genetic variation of PSCA gene is associated with the risk of both diffuse- and intestinal-type gastric cancer in a Chinese population. Int J Cancer. 2010; 127(9): 2183-9.4. Shi Y, Hu Z, Wu C, et al. A genome-wide association study identifies new susceptibility loci for non-cardia gastric cancer at 3q13.31 and 5p13.1. Nat Genet. 2011; 43(12): 1215-8.(2)、鼻咽癌基因检测的相关位点:(3)、肺癌基因检测的相关位点:参考文献:1. Hu Z, Wu C, Shi Y, et al. A genome-wide association study identifiestwo new lung cancer susceptibility loci at 13q12.12 and 22q12.2 in Han Chinese. Nat Genet. 2011; 43(8): 792-6.2. Ahn MJ, Won HH, Lee J, et al. The 18p11.22 locus is associated with never smoker non-small cell lung cancer susceptibility in Korean populations. Hum Genet. 2012; 131(3): 365-72.3. Yoon KA, Park JH, Han J, et al. A genome-wide association study reveals susceptibility variants for non-small cell lung cancer in the Korean population. Hum Mol Genet.2010; 19(24):4948-54.(4)、甲状腺癌基因检测的相关位点:参考文献:1.Matsuse M, Takahashi M, Mitsutake N, et al. The FOXE1 and NKX2-1 loci are associated with susceptibility to papillary thyroid carcinoma in the Japanese population. J Med Genet. 2011;48(9): 645-8.(5)、前列腺癌基因检测的相关位点:参考文献:1.Takata R, Akamatsu S, Kubo M, Takahashi A, et al. Genome-wide association study identifies five new susceptibility loci for prostate cancer in theJapanese population. Nat Genet. 2010;42(9): 751-4.(6)、子宫内膜癌基因检测的相关位点:参考文献:1. Xu WH, Long JR, Zheng W, et al. Association of the progesterone receptor gene with endometrial cancer risk in a Chinese population. Cancer. 2009; 115(12): 2693-700.2. Long J, Zheng W, Xiang YB et al. Genome-wide association studyidentifies a possible susceptibility locus for endometrial cancer. Cancer Epidemiol Biomarkers Prev. 2012; 21(6): 980-7.(7)、乳腺膜癌基因检测的相关位点:参考文献:1.Long J, Shu XO, Cai Q, et al. Evaluation of breast cancer susceptibility loci in Chinese women. Cancer Epidemiol Biomarkers Prev. 2010; 19(9): 2357-65.2.Kim HC, Lee JY, Sung H, et al. A genome-wide association study identifies a breast cancer risk variant in ERBB4at 2q34: results from the Seoul Breast Cancer Study. Breast Cancer Res.2012; 14(2): R56.3.Long J, Cai Q, Sung H, Shi J, et al. Genome-wide association study in east Asians identifies novel susceptibility loci for breast cancer. PLoS Genet. 2012; 8(2): e1002532.4.Shu XO, Long J, Lu W, et al. Novel genetic markers of breast cancer survival identified by a genome-wide association study. Cancer Res. 2012; 72(5): 1182-9.5.Cai Q, Long J, Lu W, et al. Genome-wide association study identifies breast cancer risk variant at 10q21.2: results from the Asia Breast Cancer Consortium. Hum Mol Genet. 2011; 20(24):4991-9.(8)、结直肠癌基因检测的相关位点:参考文献:1. Xiong F, Wu C, Bi X, et al. Risk of genome-wide association study-identified genetic variants for colorectal cancer in a Chinese population. Cancer Epidemiol Biomarkers Prev 2010 Jul;19(7):1855-61.2. Ho JW, Choi SC, Lee YF, et al. Replication study of SNP associationsfor colorectal cancer in Hong Kong Chinese. Br J Cancer 2011;104: 369-375.以上是我们平台所做的肿瘤套餐基因检测相关位点的信息。

脂蛋白a与冠状动脉粥样硬化疾病的研究进展

脂蛋白a与冠状动脉粥样硬化疾病的研究进展

脂蛋白a 与冠状动脉粥样硬化疾病的研究进展*刘宏1, 黄杨1, 赵莉1, 代巧妹1, 杨婧1, 张雨薇1,张梦迪1, 倪雪妍1, 仲丽丽2△(1黑龙江中医药大学,黑龙江 哈尔滨 150040;2黑龙江中医药大学第一附属医院,黑龙江 哈尔滨 150040)Progress in lipoprotein (a) and coronary atherosclerosisLIU Hong 1, HUANG Yang 1, ZHAO Li 1, DAI Qiaomei 1, YANG Jing 1, ZHANG Yuwei 1,ZHANG Mengdi 1, NI Xueyan 1, ZHONG Lili 2△(1Heilongjiang University of Chinese Medicine , Harbin 150040, China ; 2The First Affiliated Hospital , Heilongjiang Univer⁃sity of Chinese Medicine , Harbin 150040, China. E -mail : zhll 1979@ )[ABSTRACT ] Lipoprotein (a ) [Lp (a )] is a kind of liver -derived lipoprotein , which consists of apolipoprotein (apo ) B100 covalently bound to apoA. Therefore , it has atherothrombotic traits of both apoB (from low -density lipopro‑tein ) and apoA (thrombo -inflammatory aspects ). Recent advances in clinical and genetic research have revealed the cru‑cial role of Lp (a ) in the pathogenesis of cardiovascular diseases. High Lp (a ) concentration has long been linked to in‑creased risk of coronary atherosclerotic disease (CAD ). Although conventional pharmacological therapies have failed tosignificantly reduce Lp (a ) level , emerging new therapeutic strategies using proprotein convertase subtilisin -kexin type 9 inhibitors have shown promising results in effectively lowering Lp (a ). This review summarizes the recent advances in the role of Lp (a ) in the pathogenesis of CAD and the related lipid -lowering drugs.[关键词] 脂蛋白a ;冠状动脉粥样硬化;炎症;遗传;治疗药物[KEY WORDS ] lipoprotein (a ); atherosclerosis ; inflammation ; heredity ; therapeutic drugs[中图分类号] R543.3; R363.2 [文献标志码] Adoi : 10.3969/j.issn.1000-4718.2023.04.019脂蛋白(lipoprotein )作为研究动脉粥样硬化疾病的常客,目前已得到广泛研究。

全基因组关联分析的扩展方法及其在畜禽中应用的研究进展

全基因组关联分析的扩展方法及其在畜禽中应用的研究进展

中国畜牧兽医 2024,51(4):1382-1389C h i n aA n i m a lH u s b a n d r y &V e t e r i n a r y Me d i c i ne 全基因组关联分析的扩展方法及其在畜禽中应用的研究进展谢鑫峰1,王子轶1,钟梓奇1,潘德优1,倪世恒2,肖 倩1(1.海南大学热带农林学院,海口570228;2.海南省畜牧技术推广总站,海口571100)摘 要:全基因组关联分析(GWA S )是一种通过对大规模样本集合进行基因型和表型数据的比较分析,寻找与特定性状相关的遗传变异的方法㊂随着高通量测序技术㊁生物信息学技术和统计学方法的不断发展,一些频率更小的遗传变异或小分子物质能够被更加精准和经济的方式检测㊂基于技术进步衍生出GWA S 的扩展方法,为畜禽精准育种和遗传改良提供了新的思路,其中包括基于拷贝数变异(c o p y nu m b e rv a r i a t i o n ,C N V )㊁结构变异(s t r u c t u r a l v a r i a t i o n ,S V )和串联重复序列(t a n d e mr e p e a t s ,T R )的GWA S 和基于单倍型㊁基因表达和代谢组的GWA S ㊂研究人员期望利用不同分子标记以提供更全面和详细的遗传变异信息来增加GWA S 的解释性和准确性,或通过结合其他类型的数据来进一步解释和深化GWA S 的结果,从而深入研究遗传变异与性状之间联系并确定影响复杂性状的关键基因㊂作者介绍了基于不同分子标记的GWA S 在畜禽研究当中的应用并对其结果进行讨论,分析了不同方法的优势与可行性,为进一步推动GWA S 在畜禽研究中的应用,精准育种和遗传改良提供更多的思路和支持㊂关键词:全基因组关联分析(GWA S);畜禽;分子标记中图分类号:S 813.3文献标识码:AD o i :10.16431/j .c n k i .1671-7236.2024.04.006 开放科学(资源服务)标识码(O S I D ):收稿日期:2023-10-08基金项目:海南省 南海新星 科技创新人才平台项目(N H X X R C X M 202313);国家自然科学基金(32260814);海南省重点研发计划项目(Z D Y F 2024X D N Y 280)联系方式:谢鑫峰,E -m a i l :X I N f e n g 2021@126.c o m ㊂通信作者肖倩,E -m a i l :x i a o qi a n @h a i n a n u .e d u .c n A d v a n c e s i nE x t e n d e dM e t h o d s o fG e n o m e -W i d eA s s o c i a t i o n S t u d i e s a n dT h e i rA p p l i c a t i o n s i nL i v e s t o c ka n dP o u l t r yX I EX i n f e n g 1,WA N GZ i y i 1,Z H O N GZ i q i 1,P A N D e y o u1,N I S h i h e n g 2,XI A O Q i a n 1(1.C o l l e g e o f T r o p i c a lA g r i c u l t u r e a n dF o r e s t r y ,H a i n a nU n i v e r s i t y ,Ha i k o u 570228,C h i n a ;2.H a i n a nP r o v i n c i a lL i v e s t o c kT e c h n o l o g y Ex t e n s i o nS t a t i o n ,H a i k o u 571100,C h i n a )A b s t r a c t :G e n o m e -w i d ea s s o c i a t i o ns t u d i e s (GWA S )i sa m e t h o do fc o m p a r i n gg e n o t y pea n d p h e n o t y p e d a t a o na l a r g e s a m p l e s e t t o i d e n t i f yg e n e t i c v a r i a t i o n s a s s o c i a t e dw i t hs pe c if i c t r a i t s .W i t ht h e c o n t i n u o u s d e v e l o p m e n t o f h igh -t h r o u g h p u t s e q u e n ci n g t e c h n o l o g y,b i o i n f o r m a t i c s t e c h n o l o g y,a n ds t a t i s t i c a lm e t h o d s ,s o m e g e n e t i cv a r i a t i o n so rs m a l lm o l e c u l a rs u b s t a n c e sw i t h l o w e r f r e q u e n c i e sc a nb ed e t e c t e d m o r ea c c u r a t e l y a n de c o n o m i c a l l y.T h ee x t e n s i o n m e t h o do f GWA Sd e r i v e d f r o mt e c h n o l o g i c a l p r o g r e s s p r o v i d e s n e wi d e a s f o r p r e c i s i o nb r e e d i n g an d g e n e t i c i m p r o v e m e n t o f l i v e s t o c ka n d p o u l t r y ,i n c l u d i n g GWA Sb a s e do n c o p y n u m b e r v a r i a t i o n (C N V ),s t r u c t u r a l v a r i a t i o n (S V ),a n dt a n d e m r e p e a t s (T R ),a n d GWA S b a s e d o n h a p l o t y p e s ,ge n e e x p r e s s i o n ,a n d m e t a b o l o m i c s .R e s e a r c h e r sh o p et ou s ed if f e r e n t m o l e c u l a r m a r k e r st o p r o v i d e m o r e c o m p r e h e n s i v e a n dd e t a i l e dg e n e t i c v a r i a t i o n i n f o r m a t i o n t o e nh a n c e t h ei n t e r p r e t a b i l i t y an d a c c u r a c y o fGWA S ,o r f u r t h e r e x p l a i n a n dd e e p e n t h e r e s u l t s o fGWA Sb y c o m b i n i n g o t h e r t y pe s4期谢鑫峰等:全基因组关联分析的扩展方法及其在畜禽中应用的研究进展o f d a t a,i no r d e r t od e e p l y s t u d y t h e r e l a t i o n s h i p b e t w e e n g e n e t i c v a r i a t i o na n d t r a i t s a n d i d e n t i f y k e yg e n e s t h a ta f f e c tc o m p l e xt r a i t s.T h ea u t h o r i n t r o d u c e st h ea p p l i c a t i o no fGWA Sb a s e do n d i f f e r e n tm o l e c u l a r m a r k e r si nl i v e s t o c k a n d p o u l t r y r e s e a r c h a n d d i s c u s s e si t sr e s u l t s.T h e a d v a n t a g e s a n d f e a s i b i l i t y o f d i f f e r e n tm e t h o d s a r e a n a l y z e d,p r o v i d i n g m o r e i d e a s a n d s u p p o r t f o r f u r t h e r p r o m o t i n g t h e a p p l i c a t i o no fGWA S i n l i v e s t o c k a n d p o u l t r y r e s e a r c h,p r e c i s i o nb r e e d i n g, a n d g e n e t i c i m p r o v e m e n t.K e y w o r d s:g e n o m e-w i d e a s s o c i a t i o n s t u d i e s(GWA S);l i v e s t o c ka n d p o u l t r y;m o l e c u l a rm a r k e r s全基因组关联分析(GWA S)是一种基于基因分型和统计学的分析方法,通过测序技术获取物种遗传变异信息,用于研究基因组分子标记与复杂性状(如疾病㊁特定性状等)之间的关系㊂1996年,R i s c h 等[1]首次提出了GWA S的概念,为理解和治疗疾病开辟了新的领域,然而受限于当时的测序技术, GWA S的发展进展缓慢㊂2005年人类基因组测序计划完成了首个人类基因组测序,提供了大量的遗传变异信息,极大地推动了GWA S的发展[2],发现了大量与疾病或特定性状相关的单核苷酸多态性(S N P)位点㊂伴随着主要经济动物的全基因组测序完成,牛[3]㊁家禽[4]㊁猪[5]㊁羊[6]等家养动物的S N P 芯片出现使得GWA S在畜禽研究中广泛应用,在揭示畜禽经济性状或品种特征的遗传机制㊁鉴别复杂疾病抗性相关的基因等方面发挥重要作用,为畜禽品种育种改良提供重要的理论基础㊂值得注意的是,多数畜禽研究中主要是基于S N P分子标记的GWA S分析㊂一方面是当时测序技术与基因分型技术不够成熟;另一方面是早期对人类基因组研究认为S N P是造成表型多样性的主要变异,然而随着研究深入,发现利用基于S N P分子标记的GWA S只能解释复杂性状中部分遗传方差(2%~15%)[7],因此如何解释剩余可遗传变异成为一个主要的问题[8]㊂近年来测序技术与基因分型技术的进一步发展,高通量的测序技术使得获得大量的基因组和转录组数据成为可能[9],使各类遗传变异以及小分子物质(代谢物和长链非编码R N A (l n R N A))的检测变得更加准确和经济㊂这为更多研究团队拓展GWA S分析的方法提供了机会,促进了对复杂性状遗传基础和生物学机制的深入研究㊂目前这些扩展方法在研究人类复杂性状中发挥着重要作用,但在畜禽方面的应用研究仍然较少㊂作者对畜禽研究中使用不同类型分子标记㊁结合其他类型的数据进行GWA S的方法及结果进行讨论,为GWA S在畜禽研究中进一步应用提供帮助㊂1G W A S试验设计及其研究方法1.1G W A S试验设计在畜禽GWA S研究中,常用的试验设计方法包括基于无关个体和基于有亲缘关系群体的设计㊂基于无关个体的研究设计可分为两种情况:针对阈值性状或质量性状的病例-对照研究以及针对数量性状的随机群体关联分析㊂这种设计方法在GWA S 中很常见,通过选择来自不同族群或地理区域的个体以降低个体间的遗传相关性,从而提高检测到真实遗传变异的能力㊂基于有亲缘关系群体的试验设计则是通过选取有亲缘关系的家族成员作为研究对象,利用它们之间的遗传相关性,提高检测到遗传变异的能力㊂具体步骤包括家系招募㊁样本收集㊁家系重建㊁控制群体结构和遗传分析等㊂这种设计方法能够充分利用亲缘关系,增加发现与疾病相关的基因变异的能力㊂1.2G W A S方法在基于个体试验设计和亲缘关系群体试验设计中,都旨在研究基因与性状之间的关联,即可以使用类似的分析方法㊂如涉及质量性状关联分析,通常使用L o g i s t i c回归模型;涉及数量性状,通常使用线性回归模型㊂在L o g i s t i c回归模型中,表型是因变量,基因型是自变量;在线性回归模型中包括一般线性模型和混合线性模型㊂一般线性模型只考虑基因型对表型的直接影响,而不考虑样本之间的遗传相关性㊂然而,在畜禽研究中,亲缘关系的影响无法忽视,因此混合线性模型被广泛应用于畜禽领域的GWA S分析中[10]㊂混合线性模型公式为:Y=Xβ+ Z u+ε,其中Y为表型向量;Xβ为固定效应,如群体结构㊁场效应等;Z u为随机效应与相关矩阵,如添加亲缘关系矩阵用来描述个体之间的相关性从而更准确地估计S N P与表型之间的关联,避免群体分层和亲缘关系对结果的影响;ε是表示未能用自变量解释部分的误差项㊂混合线性模型可以控制种群结构和家系结构对结果的影响,从而减少假阳性的发生,提高检测到真实遗传变异的能力㊂此外,混合线性3831中国畜牧兽医51卷模型还能够适应复杂的遗传模型和多个遗传效应的情况,提供更准确和可靠的研究结果㊂目前已经有许多软件都支持混合线性模型,包括G E MMA㊁G C T A㊁E MMA X和f a s t GWA S软件等㊂2G W A S扩展方法在畜禽中的应用2.1基于拷贝数变异的G W A S拷贝数变异(c o p y n u m b e rv a r i a t i o n,C N V)是哺乳动物基因组遗传变异的重要来源[11],其范围从50b p到几M b不等,主要表现为缺失和重复[12-15]㊂相邻和重叠的C N V区域可以组合成一个大的基因组片段,称为拷贝数变异区(c o p y n u m b e rv a r i a t i o n r e g i o n,C N V R)[16]㊂C N V相较于S N P具有更大的核苷酸序列数量,从而导致更高的突变概率和更显著的潜在影响[17-18],如改变基因结构从而显著影响基因表达和适应性表型,可干扰遗传结构㊁基因调控和表达,并与表型多样性和对局部环境的适应有关[19]㊂越来越多的研究证实,C N V与家畜中具有经济利益的数量性状相关[20-22],这也说明基于C N V 的GWA S研究(C N V-GWA S)可以为家畜育种计划的经济重要性状的遗传控制提供有价值的见解㊂Q i u等[23]的研究旨在通过C N V-GWA S揭示影响猪培育过程中生长和肥满丰度性状的遗传机制㊂使用I l l u m i n aG e n e S e e k50K S N P芯片对6627头杜洛克猪(3303头美国杜洛克,2677头加拿大杜洛克)进行基因分型,采用P e n n C N V软件对杜洛克猪的C N V进行检测,选取具有全覆盖C N V序列的C N V R作为群体的C N V R,共鉴定出了953个非冗余的C N V R,覆盖了猪的常染色体基因组的246.89M b(约占总基因组的10.90%)㊂其中, 802个C N V R出现在美国杜洛克猪中,499个C N V R出现在加拿大杜洛克猪中,有348个C N V R 被这两个种群共享㊂采用G E MMA软件中的混合线性模型进行GWA S分析,发现了35个与生长和肥胖性状显著相关的C N V R㊂F e r n a n d e s等[24]在研究中统计肉鸡在不同年龄阶段(1㊁21㊁35㊁41和42日龄)体重,在42日龄时采集了血样进行D N A提取,使用600K A f f y m e t r i xA x i o m基因分型阵列对约1461只肉鸡进行基因分型后,使用P e n n C N V进行C N V检测,共鉴定出23214个常染色体C N V, 5042个C N V R,覆盖鸡常染色体基因组的12.84%㊂通过混合线性模型的分析发现,C N V片段位于与生长和发育相关的基因附近,如钾内向整流通道亚家族J成员-11(K C N J11)㊁肌源性分化-1(M y o D1)和S R Y转录因子-6(S O X6)㊂这些基因在先前的研究中已被证实与生长和发育过程密切相关㊂为了验证这些显著的C N V片段,作者使用实时荧光定量P C R技术进行了验证,并观察到92.59%符合率,说明这些C N V片段的存在是可靠的㊂这些结果进一步支持了C N V在调控生长和发育过程中的潜在作用㊂2.2基于串联重复序列与结构变异的G W A S串联重复序列(t a n d e mr e p e a t s,T R)是基因组中一种特殊的D N A序列,由相同或相似的短序列单元重复排列而成,分为两种类型:①短T R,由2~ 6个碱基对组成;②可变数目T R,由>7个碱基对的重复单元组成[25]㊂重复单元数目在个体之间和种群之间发生变化,这种变异性是由D N A复制过程中的聚合酶滑移或非同源重组事件所造成的㊂越来越多的研究表明,T R在基因组中广泛存在,并且在基因组演化和功能调控中发挥着重要的作用,可能对基因组的稳定性㊁基因表达的调控以及基因功能的多样性产生影响[26-27]㊂结构变异(s t r u c t u r a l v a r i a t i o n,S V)通常指的是基因组中较大的遗传变异,包括插入㊁删除㊁倒位㊁重复和转座等,大量研究已证实S V广泛存在于物种基因组内,对表型影响广泛[28-29]㊂尽管S V的大小和复杂性使得它们在GWA S中的分析和解读相对较复杂,但S V对复杂性状的贡献程度高于S N P, S V与表型多样性之间的关联更为显著[30-31]㊂研究S V可以加深对群体进化㊁表型多态性和功能基因组的理解,对于阐明驱动复杂性状变异的生物学机制将是特别有益的[32]㊂目前许多研究不仅证实S V 突变基因在人类精神障碍中起着显著的作用[33-35],也被证实与家畜的经济性状相关[36-38]㊂F a l k e r-G i e s k e等[39]通过GWA S揭示蛋鸡羽毛啄食行为(F P)的遗传机制㊂早期研究发现,F P行为与免疫系统㊁生物钟和觅食行为有关㊂此外,m i R N A的生物合成失调㊁γ-氨基丁酸(G A B A)系统的紊乱以及神经发育缺陷被认为是影响F P行为倾向的因素㊂研究人员分析了来自两个鸡品种试验群体的全基因组测序数据,将基于S V和T R进行GWA S得到的结果与基于S N P和插入缺失变异(I n D e l)的全基因关联分析结果数据共同分析F B的遗传机制㊂结合表达定量性状位点分析发现多个影响G A B A相关的基因和信号通路的变异,包括G A B A受体亚单位β-3(G A B R B3)基因下游的一个显著相关的T R,两个靶向作用于G A B A受体基因的微小R N A以及48314期谢鑫峰等:全基因组关联分析的扩展方法及其在畜禽中应用的研究进展G A B A受体聚集的直接调控因子肌萎缩蛋白(D M D)㊂此外还发现转录因子E T S变异体-1(E T V1)与23个基因的差异表达相关,表明E T V1㊁S MA D家族成员-4(S MA D4)和K L F转录因子-14 (K L F14)一起在啄羽鸡的神经发育紊乱中起到重要作用㊂B l a j等[40]针对平均日增重(A D G)㊁背膘厚度(B F T)㊁肌肉纤维直径(M F R)㊁肌肉横断面积(C R C L)等性状选取4个不同猪品种的杂交F2代猪为研究对象,采用全基因组测序来获取变异信息,以S V㊁T R作为分子标记,使用混合线性模型进行GWA S,将基于S V和T R的GWA S结果与基于S N P和I n D e l的GWA S的结果相比较,发现约56.56%的显著变异(S V㊁T R)与之前针对相同性状的S N P和I n D e l的关联研究中的显著位点不在高连锁不平衡(L D)范围内;进一步研究认为,这些未被标记的变异在标准的GWA S研究中可能被忽略,随后对这些未被标记的变异进行研究,确认了许多已知的候选基因,并发现了新的候选基因,如S H3和多肽结合区域蛋白-2(S H A N K2)㊁3-羟基-3-甲基戊二酸裂解酶样蛋白-1(HMG C L L1)㊁D i s h e v e l e d 介导的肌动蛋白重排蛋白-2(D A AM2)和胶原蛋白21α1链(C O L21A1)等㊂此外,研究还发现S V或T R与l n R N A的相关性,表明它们在调控表型性状中的功能重要性㊂这些研究结果为了解S V和T R 的特征以及它们在关联研究中的作用提供了见解,并建议将这些变异纳入未来的基因组范围关联研究中,以深入了解驱动复杂性状变异的生物学机制㊂2.3基于单倍型的G W A S单倍型(h a p l o t y p e)是染色体上一段连续的基因序列变体的组合㊂其在染色体上形成的连续的㊁稳定的㊁几乎不被重组所打断的单倍型区域,称为单倍型块(h a p l o t y p eb l o c k)㊂单倍型块是一组相邻的S N P标记,与数量性状基因座(Q T L)的连锁不平衡程度比单个S N P更高[41-42]㊂基于单倍型的GWA S (h GWA S)是以同一条染色体上多个位点上等位基因的集合为基础,进行疾病或者性状与单倍型的关联分析㊂单倍型分析在标记物鉴定方面具有很高的信息量,相较于基因组中特定区域的单个S N P, h GWA S可能有助于从数据集中提取更多信息[43]㊂大量研究表明,相较于单个标记位点的基因定位方法,h GWA S能够提供更强的检测力[44-45]㊂Z h a n g 等[46]基于鸡60K S N P芯片对475只肉鸡使用h GWA S分析,针对瘦肉型和肥育型两个肉鸡品系的腹脂含量进行了研究,结果发现共有132个单倍型块与腹部脂肪重量显著相关,这些窗口主要位于2㊁4㊁8㊁10和26染色体上㊂在这些相关区域内,包括音速刺猬信号分子(S HH)㊁肢体发育膜蛋白-1 (L M B R1)㊁纤维细胞生长因子-7(F G F7)㊁白细胞介素-16(I L16)㊁外周脂蛋白-1(P L I N1)㊁胰岛素样生长因子-1受体(I G F1R)和溶质载体蛋白家族16成员-1(S L C16A1)共7个候选基因可能在控制腹部脂肪含量中起重要作用㊂此外,还发现3号和10号染色体上的两个区域与睾丸重量显著相关,并鉴定了转录因子-21(T C F21)和T C F12等潜在重要候选基因㊂此外,21号染色体上的T N F受体超家族成员1B(T N F R S F1B)㊁前胶原赖氨酸-2-酮戊二酸-5-双氧酶-1(P L O D1)㊁钠尿素肽-C(N P P C)㊁甲基四氢叶酸还原酶(MT H F R)㊁麝香酸酯基B型受体-2 (E P H B2)和溶质载体家族35成员-A3(S L C35A3)共6个候选基因可能在骨骼发育中起重要作用㊂S r i v a s t a v a等[47]基于芯片数据对887只韩宇牛进行基因分型,通过S N P的数量㊁基因组区域的长度和连锁不平衡3种方法来构建单倍型块㊂通过h GWA S发现了与重要经济性状如胴体特征㊁眼肌面积㊁胴体重量和肉质大理石纹理评分相关的重要单倍型块和基因㊂如位于13号染色体上的磷脂酶Cβ-1(P L C B1)和P L C B4基因,编码磷脂酶,可能在脂质代谢和脂肪生成过程中起重要作用,有助于增加脂肪沉积;11号染色体上的羧酯脂肪酶(C E L),是一种胆盐激活的脂肪酶,负责脂质分解代谢过程㊂综上,基于h GWA S的研究结果能够将单倍型块与已知的功能注释信息关联起来,提供更多关于关联位点的生物学意义和功能预测㊂通过注释分析,可以进一步了解关联位点可能参与的生物过程和功能通路㊂2.4基于基因表达的全基因组关联分析基因组技术的发展使得可以利用新方法将基因型和中间表型(如l n R N A,m R N A表达数据)进行整合,以检测与基因表达水平相关的D N A变异并与复杂性状相关联[48]㊂基于基因表达的GWA S (e GWA S)是一种增强型全基因组关联研究方法,通过把基因的表达量作为数量性状,与基因型数据进行关联分析的方法[49],旨在更全面㊁精确地识别基因与复杂性状之间的关联㊂通过分析基因表达水平与基因型之间的关联,来探究遗传变异对基因表达的调控作用㊂M e s a s等[50]选用355头伊比利亚猪与3个猪品种(长白猪㊁杜洛克猪㊁皮兰特猪)的回交个体为研究对象,通过实时定量P C R测量的肌肉基5831中国畜牧兽医51卷因表达值和所有染色体分布的38426个S N P s的基因型之间进行了e GWA S,鉴定了186个位于猪染色体区域的e S N P s与酰基辅酶A合酶中链家族成员-5(A C S M5)㊁乙酰辅酶A合酶短链家族成员-2 (A C S S2)㊁激活转录因子-3(A T F3)㊁二酰甘油酰基转移酶-2(D G A T2)㊁F o s原癌基因(F O S)和胰岛素生长因子-2(I G F2)基因的表达相关㊂其中,I G F2和A C S M5的两个表达量性状位点(e Q T L s)被归类为顺式作用e Q T L s,表明同一基因的突变会影响其表达水平㊂另外作者还发现10个e Q T L s对基因表达产生了转座调控效应,即这些位点的突变会对其他基因的表达产生影响㊂当针对每个回交独立进行e GWA S时,只观察到3个共同的转座e Q T L s区域,表明不同的调控机制或不同品种之间的等位基因频率存在差异㊂这项研究结果为更好地理解肌肉中脂质代谢基因的功能调控机制提供了新的数据,以进一步了解基因对肌肉功能和代谢的调控㊂W a n g等[51]通过GWA S和e GWA S相结合的方法对雄性麻鸭头部墨绿色相关特征个体差异的遗传机制进行研究㊂GWA S结果显示,有165个显著的S N P s与71个候选基因,其中有4个与雄性麻鸭头部墨绿色特征个体差异相关的基因,包括钙电压门控通道亚单位A l p h a1(C A C N A1I)㊁WD重复结构域-59(WD R59)㊁G蛋白亚单位A l p h a-O1 (G N A O1)和钙电压门控通道辅助亚单位A l p h a2 D e l t a-4(C A C N A2D4)㊂同时e GWA S结果发现3个位于L O C101800026和突触足蛋白-2(S Y N P O2)两个候选基因范围内的S N P s与酪氨酸酶蛋白1 (T Y R P1)基因表达水平存在相关性㊂这些S N P可能是影响雄性麻鸭头部皮肤T Y R P1基因表达水平的重要调控因子㊂研究数据还显示,转录因子MA X互作蛋白-1(M X I1)可能调控T Y R P1基因的表达,从而导致雄性麻鸭头部呈现墨绿色特征的差异㊂这项研究提供了初步的数据,用于进一步分析鸭羽毛颜色的遗传调控机制㊂需要注意的是基因表达受到时间㊁空间和环境因素的影响,因此e GWA S 分析可能需要考虑不同因素与基因表达之间的相互作用,以获取更全面和准确的分析结果㊂2.5基于代谢组学的G W A S代谢物作为基因转录和蛋白质表达的最终产物,对生物体的生长和健康至关重要,被认为是基因型和表型之间联系的桥梁㊂目前在畜禽动物研究中已经发现代谢物与部分经济性状如饲料利用率[52]㊁生长性能[53]和动物健康[54]密切相关㊂代谢组学结合基因组学和转录组学被广泛应用于分析代谢多样性和途径,为研究基因与表型之间的关系提供了强大工具[55-56]㊂基于代谢组学的GWA S(m GWA S)是将代谢物作为表型并与基因型数据进行关联分析的方法,是鉴定代谢表型潜在遗传变异的有力工具,使得了解代谢多样性的遗传基础及其与复杂性状的相关性成为可能㊂L i u等[57]为了研究肉类营养和风味的遗传和生化基础,选择423只北京油鸭ˑ连城鸭的F3代作为研究对象,采集了6周龄鸭的胸肌样本㊂通过对这些样本进行代谢组学分析,得到了3431种代谢物和702种挥发物的数据㊂在m GWA S中,鉴定出了2862个关联信号,并找到了可能调节代谢物和挥发物水平的48个重要候选基因㊂这些研究结果为肌肉代谢的遗传和生化基础提供了新的见解的同时也为改良动物肉类品质提供了理论基础㊂T i a n等[58]使用非靶向代谢组学方法研究鸡的血清代谢物,选取两个不同鸡品系建立的杂交品种为研究对象,采取血清进行全面的代谢组学检测,对检测到的7191种代谢物利用代谢表型进行GWA S,发现在整个鸡基因组中,有10061个显著的S N P s与253种代谢物相关联,并广泛分布于整个基因组㊂许多功能基因影响代谢物的合成㊁代谢和调控,研究结果有利于提高对鸡血清代谢谱和代谢表型的理解,为进一步研究鸡经济性状的机制和定位提供了可靠的基础㊂目前代谢组学已经在动物研究中发挥重要作用,然而一些代谢物的功能尚不清楚,可能影响m GWA S分析结果,因此需要进一步的功能注释和基因组学研究来揭示其潜在的生物学机制等㊂3小结与展望GWA S已广泛应用于畜禽物种的遗传研究中,在揭示重要经济性状相关的基因和遗传变异以及确定与生产性能㊁抗病性㊁品质特征等相关遗传变异等方面发挥重要作用㊂这些发现有助于深入理解畜禽性状的遗传机制,为选择和育种提供了重要的遗传标记㊂传统基于S N P的GWA S存在着一定的局限性,S N P是基因组中最常见的遗传变异,某些S N P 可能与多个表型相关从而导致结果解释的复杂性,从而无法精准定位与特定性状相关的功能基因或位点,通常需要进一步的功能注释和试验验证㊂随着高通量测序技术㊁生物信息学和统计学的发展,研究人员希望利用不同分子标记以提供更全面和详细的遗传变异信息来增加GWA S的解释性和准确性或68314期谢鑫峰等:全基因组关联分析的扩展方法及其在畜禽中应用的研究进展通过结合其他类型的数据来进一步解释和深化GWA S的结果,从而衍生出GWA S的扩展方法为畜禽精准育种和遗传改良提供了新的思路和方法㊂需要注意的是,不同的分析方法都有其局限性,如C N V或S V这类复杂的遗传变异往往需要大规模样本才能捕捉到显著的关联性,不同的检测方法可能导致试验结果不一致;在e GWA S分析中由于动物基因表达在不同时间空间或环境下均可能不同进而会影响e GWA S的分析结果;m GWA S分析结果中一些代谢物的功能尚不清楚,需要进一步的功能注释和基因组学研究来揭示其潜在的生物学机制等㊂总体而言,不同分子标记和研究对象的GWA S 研究各有优势和局限性㊂局限性并不意味着没有价值,相反它们为遗传研究提供了重要的突破口和启示㊂研究人员应该在研究中根据试验目的,如探索特定基因与表型之间的关系或者了解基因功能㊁调控和表达模式,结合实际情况综合利用这些多样化的数据和方法,也可以结合多种方法助于更全面㊁深入地理解基因与表型特征之间的关联㊂未来的GWA S研究将更多地采用全基因组测序数据以获取更多的遗传信息㊂为实现致因突变位点的精细定位㊁动植物的优良育种以及个性化治疗等目标,GWA S仍然需要不断发展和完善㊂除了注重生物信息学技术和生物统计方法的发展,结合其他组学方法进行多组学分析外,整合不同层面的遗传和表型数据,对于理解基因变异对生物学功能和疾病风险的影响机制同样重要,通过对大量已发现的遗传变异进行功能注释和解读,以获取更全面和准确的信息㊂这将有助于畜禽精准育种从而提高畜禽的生产性能㊁适应性和抗病能力,为畜禽产业的健康可持续发展提供更好的支持㊂参考文献(R e f e r e n c e s):[1] R I S C H N,M E R I K A N G A S K.T h ef u t u r eo f g e n e t i cs t u d i e s o f c o m p l e xh u m a nd i s e a s e s[J].S c i e n c e,1996,273(5281):1516-1517.[2] H I R S C HH O R N J N,D A L Y M J.G e n o m e-w i d ea s s o c i a t i o ns t u d i e sf o rc o m m o n d i s e a s e s a n d c o m p l e xt r a i t s[J].N a t u r e R e v i e w sG e n e t i c s,2005,6(2):95-108.[3] M A T U K U M A L L ILK,L A W L E Y CT,S C H N A B E LRD,e t a l.D e v e l o p m e n t a n d c h a r a c t e r i z a t i o n of a h ig hd e n s i t y S N P g e n o t y p i n g a s s a y f o rc a t t l e[J].P L o SO n e,2009,4(4):e5350.[4] G R O E N E N M A,M E G E N SHJ,Z A R EY,e t a l.T h ed e v e l o p m e n t a n dc h a r a c t e r i z a t i o no f a60K S N Pc h i pf o rc h i c k e n[J].B M C G e n o m i c s,2011,12(1):1471-2164.[5] R AMO SA M,C R O O I J MA N SR P,A F F A R A N A,e t a l.D e s i g nof ah ig hd e n s i t y S N P g e n o t y p i n g a s s a yi n t h e p i g u s i n g S N P s i d e n t i f i e da n dc h a r a c t e r i z e db yn e x t g e n e r a t i o n s e q u e n c i n g t e c h n o l o g y[J].P L o SO n e,2009,4(8):e6524.[6] Q I A O X,S U R,WA N G Y,e t a l.G e n o m e-w i d e t a r g e te n r i c h m e n t-a i d e dc h i p d e s i g n:A66K S N P c h i pf o rc a s h m e r e g o a t[J].S c i e n t i f i cR e p o r t s,2017,7(1):8621.[7] M C C A R R O L L S A.E x t e n d i n g g e n o m e-w i d ea s s o c i a t i o n s t u d i e s t o c o p y-n u mb e r v a r i a t i o n[J].H u m a nM o l e c u l a rG e n e t i c s,2008,17(R2):R135-42.[8]王继英,王海霞,迟瑞宾,等.全基因组关联分析在畜禽中的研究进展[J].中国农业科学,2013,46(4):819-829.WA N GJY,WA N G H X,C H IR B,e t a l.R e s e a r c hp r o g r e s s o f g e n o m e-w i d e a s s o c i a t i o n a n a l y s i s i nl i v e s t o c k a n d p o u l t r y[J].S c i e n t i a A g r i c u l t u r a eS i n i c a,2013,46(4):819-829.(i nC h i n e s e) [9] L IY,C H E N L.B i g b i o l o g i c a ld a t a:C h a l l e n g e sa n do p p o r t u n i t i e s[J].G e n o m i c s P r o t e o m i c sB i o i n f o r m a t i c s,2014,12(5):187-189.[10] Z HA N G Z,B U C K L E R E S,C A S S T E V E N S T M,e t a l.S of t w a r e e ng i n e e r i n g th e mi x e d m o d e l f o rg e n o m e-w i d e a s s o c i a t i o n s t u d i e s o n l a r g e s a m p l e s[J].B r i e f i n g s i nB i o i n f o r m a t i c s,2009,10(6):664-675.[11] HU L,Z HA N G L,L IQ,e t a l.G e n o m e-w i d ea n a l y s i so fC N V s i nt h r e e p o p u l a t i o n so fT i b e t a ns h e e p u s i n gw h o l e-g e n o m e r e s e q u e n c i n g[J].F r o n t i e r s i nG e n e t i c s,2022,13:971464.[12] F E U KL,C A R S O N A R,S C H E R E RS W.S t r u c t u r a lv a r i a t i o ni nt h eh u m a n g e n o m e[J].N a t u r eR e v i e w sG e n e t i c s,2006,7(2):85-97.[13] Z A R R E IM,MA C D O N A L DJR,M E R I C O D,e t a l.Ac o p y n u m b e r v a r i a t i o n m a p o f t h e h u m a ng e n o m e[J].N a t u r eR e v i e w sG e n e t i c s,2015,16(3):172-183.[14] MA C D O N A L DJ R,Z I MA N R,Y U E N R K,e t a l.T h e d a t a b a s e o f g e n o m i c v a r i a n t s:Ac u r a t e d c o l l e c t i o no fs t r u c t u r a l v a r i a t i o n i n t h e h u m a n g e n o m e[J].N u c l e i cA c i d sR e s e a r c h,2014,42:D986-92.[15] MA HMO U D M,G O B E T N,C R U Z-DÁV A L O SDI,e t a l.S t r u c t u r a lv a r i a n tc a l l i n g:T h el o n g a n d t h es h o r t o f i t[J].G e n o m e B i o l o g y,2019,20(1):246.[16] R E D O N R,I S H I K AWAS,F I T C H K R,e t a l.G l o b a l7831。

2021年腹膜透析领域进展

2021年腹膜透析领域进展

一、AQP1启动子变异影响腹膜透析的超滤及预后腹膜透析超滤量的控制不如血液透析容易,对于大多数残余肾功能较差的PD患者,超滤显著影响其生活质量及生存率。

Morelle等人今年在《新英格兰医学杂志》上发表的研究,揭示了腹膜透析超滤量个体差异的机制:编码原型水通道蛋白-1(water channels Aquaporin 1,AQP1)的基因变异性影响其活性,进而影响超滤量和生存。

Morelle等使用细胞、小鼠模型和从人体样本进行了研究,收集了7个队列中1,851例腹膜透析患者的临床和遗传数据,发现AQP1启动子常见变异rs2075574与腹膜超滤相关。

rs2075574 TT基因型携带者(10%~16%的患者)的平均净超滤水平低于CC基因型携带者(35%~47%的患者)。

在探索阶段TT基因型携带者与CC基因型携带者的净超滤量分别为506±237 ml和626±283 ml(P=0.007),在验证期则分别为368±603ml和563±641 ml(P=0.003)。

平均随访944天后,898例患者中有139例(15%)死亡,280例(31%)已转为血液透析。

TT携带者发生复合结局(死亡或技术失败)的风险高于CC携带者(HR:1.70;95%CI:1.24~2.33;P=0.001),全因死亡风险也更高(24% vs. 15%,P=0.03)。

在机制研究中,rs2075574危险变异与AQP1启动子活性和水通道蛋白-1表达水平降低,以及与葡萄糖驱动的渗透性水转运减少相关。

在动物模型中使用艾考糊精作为渗透剂后,AQP1 /和AQP1 / 水转运的差异性明显缩小。

这一研究提示我们在选择透析方式时需检测AQP1 rs2075574位点的基因型,可能预测PD治疗失败的风险。

对于AQP1基因型有害的患者,可考虑应用胶体渗透剂。

rs207 5574位点基因的靶向药物具有广阔的研究前景。

全基因组关联分析

全基因组关联分析

全基因组关联分析(Genome-wide association study or GWAS)人类基因包含着百万种序列变异,它们对于疾病的形成或者对患者药物的反应程度有直接或间接的影响.全基因组关联分析是指在人类全基因组范围内找出存在的序列变异,即单核苷酸多态性(SNP),从中筛选出与疾病相关的部分。

此项技术能够一次性对疾病进行轮廓性概览,在全基因组层面上,开展多中心、大样本、反复验证基因与疾病的关联研究,全面揭示疾病发生、发展,以及与治疗相关的遗传基因。

随着人类基因组学的大幅度进步和基因测序的飞速进展,这种最新的研究方式开始大规模应用于筛选与人群复杂疾病和药物特异性相关的序列变异。

进行全基因组关联分析研究时,通过采集某类疾病患者与非患者两类人群的DNA,在基因芯片上读出DNA中的序列变异,然后用生物工程技术进行分析比较。

若某些基因变异在患者人群中非常普遍,则该序列变异是与此种疾病‘相关’的。

有了全基因组关联分析,今后从事疾病诊断,患者对药物的反应程度的研究,可以集中于这些与疾病‘相关’的序列变异,从而显著缩短研究时间,提高研究效率。

全基因组关联分析是研究人类复杂疾病的一项重大突破,其优势在于:1 高通量 --- 一个反应监测成百上千个序列变异;2 不只局限于“候选基因”,基因可以是“未知”的;3无需在研究之前构建任何假设。

2005年,Science杂志报道了第一项具有年龄相关性的黄斑变性全基因组关联分析研究,之后陆续出现有关冠心病、肥胖病、II型糖尿病、甘油三酯、精神分裂症以及相关表型的报道。

由此可见,全基因组关联分析研究作为一种全新的疾病研究方式,自人类基因测序大规模展开以来,就被医学界广泛接受和应用。

截止到2010年12月,世界范围内进行了超过1200项针对200多种疾病的全基因组关联分析研究,找到4000多个‘相关’的序列变异。

在全基因组关联分析研究中,SNP基因芯片(SNP array)扮演了非常重要的角色。

gwas经典文章

gwas经典文章

GWAS (Genome-Wide Association Study) 是一种用于研究人类基因组与复杂疾病之间关联的方法。

以下是一些经典的GWAS文章:1. "Genome-wide association study of 17,000 cases of seven common diseases and 3,000 shared controls"(Nature,2007):这篇文章报道了人类基因组中与七种常见疾病相关的数千个基因变异。

2. "Common variants on chromosome 9p21 are associated with coronary heart disease"(Nature,2007):这篇文章揭示了染色体9p21上的一个基因区域与冠心病之间的关联。

3. "Genome-wide association study identifies novel loci associatedwith waist-hip ratio and provides insights into the pathophysiology of body fat distribution"(Nature Genetics,2013):这篇文章研究了与腰臀比相关的基因变异,并为身体脂肪分布的病理生理学提供了见解。

4. "Meta-analysis identifies new loci associated with body mass index and provides insights into the relationship between waist-hip ratio and height"(Nature Genetics,2013):这篇文章探讨了与身体质量指数和腰臀比相关的基因变异,以及它们与身高的关系。

生物学中的因果性与因果选择

生物学中的因果性与因果选择

圆园21年第2期国际学术动态2020年1月6日,由北京大学哲学系主办的“生物学中的因果性与因果选择”国际研讨会在北京大学如期举办。

本次会议的6位报告人中,2位外宾分别是来自悉尼大学的Andrew Latham 博士和来自澳洲麦考瑞大学的Perrick Bourrat 博士;4位中方报告人分别是来自复旦大学的吴东颖博士(台湾学者)、清华大学的吴小安博士、上海交通大学的周理乾博士和来自台湾大学的陈乐知博士(香港学者)。

对于自然科学如生物学来说,其一大目标是总结和发现自然现象之间的因果关系。

对于因果性究竟是什么,因果关系如何理解和定义,日常所认为的因果关系是什么,以及科学的方法如何发现不同变量间的因果关系等问题,是哲学家、科学哲学家、科学家等始终致力于研究的主题。

在因果的数学建模领域,Judea Pearl 和Peter Spirtes,Clark Glymour and Richard Scheines 两个团队将近二十多年的研究成果———基于因果图模型和结构因果模型进行因果分析———成为因果讨论中的一匹黑马。

最近一、两年,哲学、逻辑学、人工智能、社会科学和认知科学等均试图运用此工具为各自领域提供新的思路。

本次会议邀请了结构因果建模方面的国内专家吴小安和吴东颖博士,生物学哲学家Pierrick Bourrat 和周理乾博士,以及有深厚的传统哲学功底的Andrew Latham 和陈乐知博士,希望来自三个领域的思想碰撞能够为因果建模在生物学中的运用提供充分的理解甚至更为深刻的洞见。

会议由吴东颖博士的报告“Degrees of Causal Contribution ”开场。

不同的原因往往对同一结果有不同程度的贡献,而量化此种贡献度是本报告的主题。

最近国际文献中有不少尝试,将原因的概念理解为充分造成结果的必要元素,且在这样的概念中讨论如何将因果贡献度量化。

目前仍少有将因果贡献度的概念以概率来量化。

吴博士的报告提到两项研究结果:第一结果是利用结构方程模型来量化分析因果贡献度;第二结果是评估第一结果和先前国际研究成果的实用性比较。

2021年基于生物质谱数据鉴定单核苷酸变异的生物信息学方法

2021年基于生物质谱数据鉴定单核苷酸变异的生物信息学方法

基于生物质谱数据鉴定单核苷酸变异的生物信息学方法在肽段鉴定领域,图谱库搜索是一种有望取代序列数据库搜索的鉴定策略,下面是搜集的一篇探究生物质谱数据鉴定单核苷酸变异的,欢迎阅读参考。

单核苷酸变异(singlenucleotidevariations,SNVs)是由DNA序列上单个碱基变异产生的,包括碱基的缺失、插入、转换及颠换等.SNVs是基因组序列变异的主要形式[1],同时也是生物体生理和病理变异的遗传基础[2].从遗传学的角度看,SNVs既可以存在于具有遗传性的生殖细胞中,也可以存在于不具有遗传性的体细胞中.其中,只有位于基因编码区的SNVs能够影响蛋白的编码.位于编码区的SNVs可以分为3类:(ⅰ)同义SNVs,不改变相应的氨基酸种类;(ⅱ)无义SNVs,突变成为终止 ___子,提早结束编码;(ⅲ)非同义SNVs(nonsynonymousSNVs,nsSNVs),改变氨基酸的种类.nsSNVs能够改变蛋白的结构、功能、表达以及亚细胞定位等[3],进而对多种遗传性的特征、疾病以及癌症等产生影响[4~9],如人类耳垢的类型[6]、腋窝的气味[7]、癌症与肿瘤的发生[8]、阿尔茨海默病[9]以及镰刀形红细胞贫血症[10]等.因此,对SNVs展开研究可以揭示出基因与表型多样性和基因与疾病间的关系,并且有可能研发出治疗疾病的新方法.目前,全基因组关联研究(genome-wideassociationstu ___s,GWAS)[11]虽然在基因变异与表型多样性的研究中产出了许多能够用来解释特异性疾病分子途径的结果,但是仍然难以对绝大部分具有复杂特征的分子机制以及SNVs与复杂疾病表型间的关系进行解释[12].在这种情况下,对突变蛋白的研究提供了另一种了解基因型与表型间关联的方法[13].由SNVs引起的单个氨基酸的变异称为单氨基酸变异(singleaminoacidvariations,S ___s),因此S ___s是SNVs在蛋白水平上的表现.对S ___s的研究,有助于了解基因型与表型间的关系,进而从本质上了解基因是怎样在蛋白水平上影响生物体的生命过程的[14].目前,基于串联质谱的鸟枪法蛋白质组学(shotgunproteomics)技术由于其自动化、高通量、高灵敏度和高分辨率等特点,已成为大规模蛋白质研究的主要方法.序列数据库搜索算法由于具有较高的可靠性以及灵敏度而成为当今鸟枪法蛋白质组学中蛋白鉴定的主要生物信息学方法.然而,通常蛋白质数据库在构建时为了减小数据库的冗余程度,往往有意压缩对S ___s信息的收录(如Swiss-Prot数据库[15,16],IPI数据库[17]等),从而使得常用的数据库搜索策略不能有效地鉴定出样本中的氨基酸突变信息.为此,研究人员提出了一系列鉴定突变蛋白的方法,如构建包含有突变信息的蛋白质数据库、构建相似性图谱库等.在基于串联质谱进行S ___s鉴定时,可以采用与蛋白质翻译后修饰(post-translationalmodifications,PTMs)鉴定[18]相同的方法,这是因为肽段的突变和修饰在质谱图中的表现都是质量迁移,如甲硫氨酸(Met)氧化与丙氨酸(Ala)突变为丝氨酸(Ser)在质量上都是增加16Da[19],所以鉴定PTMs的算法和流程通常也能够鉴定S ___s(如Bonanza算法[20]).虽然PTMs和S ___s的质谱鉴定方法非常相似,但由于其上的差别,在实际的鉴定策略中有所不同.(ⅰ)PTMs的种类远比S ___s要多,鉴定PTMs所需的搜索空间一般会比鉴定S ___s 所需的大,在质量控制方面具有更大的挑战;(ⅱ)蛋白水平的S ___s 大部分是从基因组或转录组延续过来的,充分利用SNVs的数据能大大降低搜索空间,从而得到更可靠的结果.因此在计算方法与策略方面,S ___s和PTMs的鉴定具有一定的相似性,也有其独有的特点.本文从序列数据库搜索算法、序列标签搜索算法以及图谱库搜索算法3个大方面,详细地介绍了目前基于生物质谱数据鉴定S ___s 的各种生物信息学方法,并分析了各种突变鉴定方法的不足之处,最后介绍了基于生物质谱的S ___s鉴定研究现状及其发展方向.当前基于生物质谱的S ___s鉴定算法都是由常规鉴定算法改进而来的,因此根据常规串联质谱鉴定算法中对数据库的依赖程度以及使用的数据库种类,可以将基于生物质谱的S ___s鉴定算法分为3大类(表1):(ⅰ)完全依赖序列数据库的搜索算法,即基于序列数据库搜索的氨基酸突变鉴定算法.此算法利用前体离子质量从序列数据库中筛选出候选肽段,然后将候选肽段的理论图谱与目标图谱进行比对,从而鉴定出样品中的突变肽段;(ⅱ)将从头测序算法(denovo)与序列比对结合的算法,即基于序列标签的氨基酸突变鉴定算法.此算法首先通过denovo测序算法推导出目标图谱中的肽序列标签(peptidesequen ___tags,PSTs),然后利用PSTs过滤数据库筛选出候选肽段,最后结合PSTs对理论谱图与目标图谱进行比较打分,从而鉴定出样品中的突变肽段;(ⅲ)依赖于图谱库的搜索算法,即基于图谱库的氨基酸突变鉴定算法.此算法将实验图谱与图谱库中的一致性图谱进行比对,从而鉴定出样品中的突变肽段.这3类方法和策略在实施过程中各有其优劣(表1),相互之间暂无法替代,因此在不同的目的下各有其适用性.1.1基于序列数据库搜索的氨基酸突变鉴定算法基于序列数据库搜索的氨基酸突变鉴定算法,根据不同的数据库构建方法可以细分为3类:(ⅰ)基于穷举法的氨基酸突变鉴定算法,即通过枚举数据库中氨基酸残基的所有可能突变种类进行突变肽段的鉴定;(ⅱ)结合已知氨基酸突变信息对突变肽段进行鉴定,即结合当前变异数据库(如dbSNP数据库[21]、CO ___IC数据库[22]等,表2列举了常用的氨基酸与基因突变数据库)中的变异信息构建数据库进行突变肽段的鉴定;(ⅲ)基于样本特异性的数据库鉴定突变肽段,即结合样本数据中可能存在的突变肽段信息构建数据库进行突变肽段的鉴定.以下将对这3种方式进行逐一详细地说明.(1)基于穷举法的氨基酸突变鉴定算法.在序列数据库搜索中,最早对突变肽段进行鉴定的自动化方法是穷举法,此方法不仅原理简单而且理论上能够鉴定出样品中所有可能的突变肽段.这类算法的大体步骤是:通过穷举法罗列出所有可能的突变肽段序列,然后用常规鉴定方法进行比对打分筛选出最有可能的突变肽段序列.此类算法的代表有SEQUEST-SNP算法[27]和Siprosv2.0算法[18]等.Gatlin等人[27]在2000年,利用改进的SEQUEST算法(SEQUEST-SNP)率先实现了利用自动化的数据库搜索对突变肽段进行鉴定.此方法特点在于动态生成所有可能的核苷酸突变序列,将其翻译成肽段并构建成一个数据库用于对突变肽段的鉴定.此后,通过穷举蛋白序列中所有可能的氨基酸突变进行肽段突变鉴定的方法在Mascot[28]和X!Tandem[29]相继采用.xx年,Hyatt和Pan[18]提出了不受数据库约束的穷举法突变肽段鉴定算法Siprosv2.0,此算法通过肽段产生模块和肽段打分模块实现对CPU和内存效率的优化以应对穷举法产生的大数据库.理论上,穷举法能够鉴定出样品中所有的突变肽段,但肽段中的每一个氨基酸残基都有18种可能的突变,因此利用此方 ___大大增加搜索空间[18,24],延长搜索时间,并且会增加假阳性风险从而降低结果的灵敏度.(2)结合已知氨基酸突变信息对突变氨基酸进行鉴定.为了避免穷举法引起搜索空间过大的问题,一些团队提出结合已知的编码SNVs信息或是与疾病等有关的突变信息构建蛋白质数据库,以减小突变肽段的搜索范围.此类数据库的代表有MSIPI[17]和MS-CanProVar[24]等.xx年,Schandorff等人[17]将一些dbSNP数据库[21]的编码SNP(singlenucleotidepolymorphi ___)以及与IPI(theinternationalproteinindex)数据库中数据有冲突的序列等整合到IPI数据库[30]中构建了质谱友好型的变异数据库MSIPI.其质谱友好型体现在,在保留原始IPI条目完整性的基础上,将后加的肽段序列附加到原有序列中,用不代表任何氨基酸的字母"J"将原始条目与附加肽段区分开来,并且将在原始条目的表头信息中加入附加肽段信息.同年,Bunger等人[31]也利用dbSNP数据库中人类基因变异信息构建变异蛋白质数据库K-SNPdb,并构建相应的常规数据库.然后对分开搜库结果进行比对打分,筛选出高可信的变异肽段.Li等人[24]在xx年基于人类癌症蛋白质变异数据库CanProVar[32]构建了一个MS-CanProVar数据库,此数据库中不仅包含了dbSNP数据库中的编码的SNP信息,还包括了CO ___IC[22]和OMIM[23]等数据库中与癌症相关的体细胞变异信息.除了自定义构建突变数据库以外,氨基酸突变信息也被一些在线平台收录、整合,如Swiss-Var[33],SysPIMP[34]和RAId_DbS[35]等.Swiss-Var ___搜集的是Swiss-Prot数据库[36]中突变肽段的信息,主要为用户提供Swiss-Prot数据库中的突变肽段信息及其与疾病间的关系.SysPIMP主要用于鉴定与人类疾病有关的突变肽段序列,它的数据主要OMIM数据库中等位基因突变信息、蛋白质突变数据库(protei ___utationdatabase,PMD)[37]以及Swiss-Prot数据库中与人类疾病和多态性有关的序列信息.而在RAId_DbS数据库中不仅整合了S ___s与疾病的信息,同时也收录了PTMs与疾病有关的信息.xx年,Mathivanan等人[25]提出的iMASp策略即是利用现有的突变信息对突变肽段进行鉴定.这种策略利用了分步搜索的方法,即是第一次通过常规搜索鉴定出样本中的常规蛋白,第二次利用突变数据库对第一次没有鉴定出的质谱图进行搜索鉴定样品中的突变肽段.相比穷举法,结合已知氨基酸突变信息对突变氨基酸进行鉴定的方法虽然在一定程度上缩小了搜索空间,但在数据库中添加的上万条突变肽段序列绝大部分不会在样品数据集中出现.因此,这种方法并没有十分有效地规避假阳性升高以及鉴定结果灵敏性降低的缺点[14].(3)基于样本特异性的数据库鉴定突变肽段.除了直接利用公共数据库中的突变数据外,利用DNA/RNA等信息提供的样本特异性突变构建的数据库能更好地贴合实际样本数据,提高鉴定效率.目前利用样本特异性鉴定突变肽段的方法有2种:两次搜索数据库的方法以及利用转录组数据构建数据库的方法.两次搜索数据库的方法与iMASp策略中所使用的分步搜索以及Mascot和X!Tandem中的容错搜索相似,不同的地方在于两次搜索数据库中所使用的突变数据库依赖于样本特异性的DAN/RAN信息,而iMASp策略中的突变数据库是整合所有已知的蛋白突变信息,不具有样本特异性;Mascot和X!Tandem则是对第一次搜索所得的蛋白序列进行穷举从而鉴定出突变或修饰肽段.Chernobrovkin等人[38]提出的二次迭代法以及Su等人[39]构建样本特异性突变数据库的策略都是样本特异性的两次搜索方法的代表.另一种方法是利用转录组数据构建样本特异性数据库用于突变肽段的鉴定.相对于利用公共的突变数据库,利用转录组数据构建蛋白质数据库可以由样品转录组数据直接推导样本中可能存在的蛋白及其突变序列并由其构建数据库[40].用此方法构建的数据库所包含的蛋白质信息更加接近样品中真实信息,因此这种无偏性的数据库能高效地鉴定出样品中存在的突变序列[16,41].由于转录组数据十分庞大,在现有的计算能力下要想利用转录组数据构建数据库就必须要对转录组数据进行压缩.xx年,Edwards[16]提出了一个压缩表达序列标签(expressedsequen ___tags,ESTs)数据的策略,实现了利用EST 数据库进行常规化的肽段序列和变异位点的鉴定.此压缩策略的特点在于选用某种方法来表示肽段,确保绝大多数的重复肽段序列被消除,并且不影响肽段序列的鉴定.随着下一代测序(nextgenerationsequencing,NGS)技术的出现,RNA测序(RNA-sequecing,RNA-Seq)的成本越来越低[14],并且克服了EST测序存在的克隆偏性和高花费等缺点[42],因此利用RNA-Seq数据构建样本特异性数据库逐渐受到人们的重视.Wang等人[41]在xx年提出了一个利用RNA-Seq数据构建样本特异性数据库的策略,此策略通过两步来实现:(ⅰ)利用一个经验性的RPKM(readsperkilobasespermillionreads)值排除不表达或低表达基因以减小数据库中的条目;(ⅱ)将由RNA-Seq数据鉴定得来的高可靠性SNVs的相应肽段添加到数据库中,以寻找变异肽段.此后,Wang和Zhang[43]为生成自定义RNA-Seq数据库编写了R程序包customProDB,能够生成含有突变、插入、缺失等变异肽段的RNA-Seq数据库.xx年,Sheynk ___n等人[14]实践了Wang和Zhang[43]的方法,利用Jurkat细胞系的RNA-Seq数据构建一个自定义的变异蛋白质数据库,并成功地应用在Jurkat细胞系的质谱数据突变鉴定中.同年,Woo等人[44]在尽量不影响鉴定结果灵敏性的基础上,将秀丽隐杆线虫(Caenorhabditiselegans)的RNA-Seq数据压缩了近1000倍,并利用此数据库成功地鉴定到了新型蛋白.由于并不是所有的样本都同时拥有蛋白质数据和RNA-Seq数据,因此,Wang和Zhang[43]利用64个大肠癌的RNA-Seq数据构建了一致性蛋白质数据库,并成功地将此数据库应用在蛋白鉴定中.样本特异性的数据库,特别是利用RNA-Seq数据构建的样本数据库不仅能够有效地缩减搜索空间,而且能够鉴定出样品中所有已知类型的蛋白种类以及新型的变异肽段序列.随着计算方法的不断改进,通过RNA-Seq数据对样本进行突变肽段的鉴定方法有望成为常规的突变鉴定方法.(4)基于序列数据库搜索的氨基酸突变鉴定算法的缺点.在鉴定突变肽段的方法中,虽然通过构建含有突变信息的序列数据库鉴定突变肽段的方法是目前被最广泛采用的方法,但它的缺点也是不容忽视的.(ⅰ)除了利用穷举法构建的突变数据库以外,利用其他方法构建的突变数据库对突变信息包含得都不够全面,如公共数据库通常会有意忽略对变异数据的收录,而样本特异性数据库为了减小搜索空间通常也会去除低表达的蛋白质;(ⅱ)序列数据搜索中,当图谱中的碎裂信息不够完整、信噪比较低时,搜索引擎就不能将候选肽段正确地区分开[45],因而会增加假阳性的概率.为了避免序列数据库的上述缺点,提出了其他鉴定突变肽段的方法,如序列标签算法、图谱库搜索算法等.1.2基于序列标签的氨基酸突变鉴定算法相比序列数据库搜索算法利用肽段母离子质量从数据库中筛选候选肽段,序列标签算法利用denovo测序算法推导的PSTs能够更有效地过滤数据库,减少候选肽段的数目以缩小搜索空间,使得更复杂和计算更密集的方法能够应用到对候选肽段的突变打分算法中[45],从而提高了突变鉴定结果的灵敏性并且减少了结果中的假阳性率.下面从序列标签搜索算法与denovo测序算法之间的关系以及当前结合PSTs进行氨基酸突变鉴定的主流工具两个方面对序列标签算法鉴定突变氨基酸进行介绍.(1)序列标签搜索算法与denovo测序算法.相比序列数据库搜索算法,denovo算法在对质谱图进行氨基酸序列推导时不依赖蛋白质数据库,因此它在鉴定氨基酸突变方面有独特的优势[45~47].当前使用denovo测序算法的代表性工具有SHERENGA[48],PEAKS[49~51]以及PepNovo[52]等.这些工具所使用的算法都是通过生成前缀残基质量图谱(prefixresidue ___ssspectra)重构整个图谱进行肽段序列推导的,因此这些算法对质谱图的质量具有较高的要求[45].但通过诱导碰撞解离(collision-indu ___ddissociation,CID)产生的串联图谱中不可避免地含有不完整的碎裂离子系列、噪音离子和精度较差的碎裂离子质量,这使得denovo算法常常产生一些不确定的序列区域,导致denovo算法通常只能准确地推导出肽段序列中的部分序列[46].因此,结合denovo算法鉴定的部分肽段序列进行数据库搜索的序列标签算法应运而生,这种算法不仅可以利用denovo推导出的PSTs作为筛选候选肽段时的过滤指标,有效地减少搜索空间,而且可以通过改变PSTs与候选肽段匹配的打分算法,提高对突变肽段的鉴定效率.(2)结合肽序列标签的氨基酸突变鉴定算法.最早结合PSTs进行数据库搜索的方法是由Mann和Wilm[53]在1994年提出的,此方法不仅能有效地对常规图谱进行鉴定,而且能够鉴定出带有突变或修饰图谱的肽段序列.当前结合肽序列标签对氨基酸突变进行鉴定的算法或程序有GutenTag程序[54],Opensea工具[55],SPIDER程序[56,57],InsPecT搜索引擎[45],DirecTag算法[58]以及MoDa算法[59]等.鉴定突变氨基酸常用的序列标签软件及其网址见表3.GutenTag是由Yates实验室 ___出来的能够自动推导+2电荷母离子串联图谱PSTs用于数据库搜索的算法,其特点是利用碎片离子峰强度经验模型并结合相邻氨基酸和碎片离子的相对质量对肽段碎裂的影响推导PSTs,之后用多个PSTs进行搜库,同时放宽对PST两端质量匹配的限制,从而能够有效地进行突变肽段的鉴定.但由于GutenTag算法没有考虑同源突变或修饰,所以此算法只能对数据库中已存在的突变序列进行鉴定,并且由于在打分方面存在漏洞[55],所以鉴定出来的结果中存在较高的假阳性.在GutenTag算法发表后的第2年,Searle等人[55]首次将序列标签算法的思想应用于非限制翻译后修饰,并提出了基于质量的序列比对算法工具Opensea.Opensea的特点是利用基于质量的宽度优先的算法("breadth-firstsearch"algorithm)鉴定出突变位点或修饰位点.但宽度优先的算法是一种贪婪的匹配算法,并且在Opensea中没有考虑在一个位点上同时存在denovo的测序错误和同源突变的情况,所以它不能保证最终结果的可靠性.SPIDER方法与Opensea工具有相似的序列标签算法思想,但与Opensea不同的是,它能够在一个位置上同时考虑denovo的测序错误和同源突变的情况,并且利用动态规划算法进行比对打分.SPIDER算法已被整合进PEAKS软件中,专门用来对突变肽段和跨物种的同源性肽段进行鉴定.在GutenTag算法推出后,Pevzner实验室迅速推出了InsPecT序列标签算法搜索引擎[45],它是最早实现规模化鉴定翻译后修饰肽段的搜索工具,现在仍然被广泛使用.InsPect搜索引擎推导PSTs的算法的特点在于利用改进的denovo算法推导出PSTs作为过滤器缩小候选肽段的范围,并利用树状快速搜索方法(fasttree-basedsearch)找出与PSTs匹配的候选肽段,用基于动态规划算法(dynamicprogramming)的图谱比对方法鉴定修饰肽段,并在打分算法中考虑肽段的碎裂模式.在推导PSTs时,InsPecT需要构建前缀残基质量图,而DirecTag算法则是直接利用串联图谱的质核比值和峰强度信息对可能的标签进行打分.由于DirecTag只能用来推导PSTs,因此其团队后续 ___了TagRecon算法[47]并将DirecTag,TagRecon 和IDPicker工具[60]整合成鉴定突变和修饰肽段的流程,其大致过程为:(ⅰ)利用DirecTag生成PSTs;(ⅱ)TagRecon利用PSTs对常规数据库进行候选肽段过滤,并且定位数据集中的突变或修饰肽段;(ⅲ)利用IDPicker工具对鉴定结果进行质量控制并且装配成蛋白.此流程算法在xx年由Abraham等人[19]在鉴定胡杨树(Populus)单氨基酸多态性的实验中被成功地使用.目前序列标签算法都依赖于denovo测序构建PSTs,但是由denovo 算法测出的肽片段往往存在部分构建错误的序列[56].MoDa算法[59]在搜索候选肽段时,由于采用序列标签链算法(ta___hainalgorithm)[61],能有效地避免由denovo测序引起的错误匹配.在MoDa算法中,将序列标签算法和动态规划算法结合,同时利用多条序列标签与候选肽段进行比对,找出存在质量差的位点,然后利用基于动态规划算法的图谱比对算法找出最佳的肽段序列.此方法能够大规模地鉴定出存在多个修饰位点或突变位点的肽段.(3)基于序列标签的氨基酸突变鉴定算法面临的问题.基于肽段序列标签的氨基酸突变序列鉴定算法虽然能够有效地利用PSTs过滤数据库,弥补denovo测序算法的测序错误并且提高对突变或修饰肽段鉴定的效率和准确性,但目前已有的PSTs算法仍然存在着许多不足,如在GutenTag算法中没有考虑同源突变或修饰,所以不能鉴定出数据库中不存在的突变序列,而在Opensea软件中没有考虑到突变位点的出现可能是由denovo的测序错误引起的等.但是图谱质量是限制序列标签算法的主要因素,因为低能CID碎裂模式通常很难将质量相同或相近的碎裂离子区分开来,如亮氨酸(Leu)和异亮氨酸(Ile)、赖氨酸(Lys)和谷氨酰胺(Gln)以及苯丙氨酸(Phe)和氧化的甲硫氨酸(Met)等[46].近年来,随着电子转移解离(electrontransferdissociation,ETD)和高能碰撞解离(high-energycollisionindu ___ddissociation,HCD)的出现,越来越多的比CID质谱图质量高的、含有丰富的碎裂离子信息的.高精度质谱图被产出,这些高精度的质谱图能更好地适用于序列标签算法,提高其准确性.1.3基于图谱库搜索的氨基酸突变鉴定算法在肽段鉴定领域,图谱库搜索是一种有望取代序列数据库搜索的鉴定策略[62].相比序列数据库搜索策略,图谱库搜索策略有以下优点:(ⅰ)直接利用图谱库中每一张真实图谱的各种不同的特征信息进行比对,如碎片离子峰的峰强度信息、碎裂模式等,使图谱比对算法具有更高的灵敏性;(ⅱ)能够在一个更小、更精确的搜索空间内进行搜索,可以比序列搜索速度快好几个数量级;(ⅲ)能够轻松地鉴定出图谱库中已存在的变异肽段[63].对于依赖于图谱库搜索的蛋白突变鉴定来讲,目前最大的限制图谱库的覆盖范围,尤其是对突变和修饰肽段图谱的包含[63,64].由于在相似的条件下,肽段的图谱具有可再生性[65]并且相似序列的肽段通常能够产生相似的质谱图[20,66],因此一批利用图谱库中已收录的肽段图谱来扩大图谱库对肽段的覆盖范围,以实现对氨基酸突变进行鉴定的算法或工具应运而生.目前常用的图谱搜索软件及其网址见表4.在蛋白质组学中,图谱库搜索概念早在1998年就由Yates等人[70]率先提出,但由于质谱仪通量不高、生物质谱数据缺乏以及质谱数据的自动化分析方法不完善等[71]原因使得图谱库搜索策略发展缓慢.直到最近10年,随着质谱和计算机技术的快速发展,鉴定出的肽段图谱匹配对(peptidespectrum ___tch,P ___)的数目与日俱增,图谱库搜索策略才逐渐被应用到大规模数据集和数据库中.最近,图谱库搜索策略更是被用于发掘样品中的突变肽段.要用图谱搜索策略来鉴定样品中的突变肽段,就必须要扩大图谱库对突变肽段的覆盖范围.目前用于扩大图谱库覆盖范围的算法有pMatch[63]、半经验算法[72,73]以及Ji等人[66]提出的相似性算法等.pMatch在构建图谱时,利用肽段已知的实验图谱和理论图谱混合构建图谱,用来缓冲由修饰或突变氨基酸残基引起的肽段碎裂模式的变化[64].由Hu等人[72]在xx年提出的半经验方法通过利用图谱库中已收录的P___s构建突变肽段的质谱图以扩大对突变肽段的覆盖范围.这种算法把图谱库中图谱对应的肽段序列替换为相应的突变肽段序列,并将突变肽段的碎裂离子的质核比值替换到图谱中.xx年,Ji等人[66]提出的相似性算法通过利用相似序列肽段的图谱来推断目标肽段的图谱,以达到扩充图谱库的覆盖范围的目的.这种算法的特点是,通过加权K邻近相似算法[66](weightedK-nearestneighbormethod)和支持向量机(supportvector ___chine,SVM)[74],利用与目标肽段序列相似且长度相等的肽段的图谱来精确地预测目标肽段序列的优势碎裂离子(如b,y离子类型以及其中性丢失离子类型等)的峰强度,并且利。

  1. 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
  2. 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
  3. 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。
Genome-Wide Association Studies For Common Disease and Complex Traits
2005.7.12 생물정보학 이은지
Contents





What is Association Study? Why Genome-wide association studies? What has made these studies possible? Markers for genome-wide Association Studies. Allelic spectra of common disease Strategies to increase efficiency Avoiding false positive association Analyzing gene-gene interactions Conclusions and future directions
LD metrices


D :deviation of haplotype frequencies from the equilibrium state D = f(AB)-f(A)f(B) D’: standardization of D r 2 : statistical coefficient determination
The missense approach

Proposed by Botstein and Risch Focused on missense SNPs Limitation
1.
2.
Monogenic disorder highly penerant of causal allele, severe phenotype, severe change in protein function Complex trait alleles have more subtle effects on disease risk and might be more likely to include non-coding regulatory variants with a modest impact on expression


Most of the genome falls into segments of strong LD, within which variants are strongly correlated with each other, and most chromosomes carry one of only a few common haplotype Predict that in regions of genome with strong LD, a selection of evenly spaced SNPs, or those chosen in the basis of their LD with other SNPs (tag SNPs), can provide adequate ‘coverage’ of the region in an association studies
1.
Linkage studies

Achieved only limited success for most common disease.
1.
2.
3. 4. 5. 6.
Low heritability of most complex traits The inability of the standard set of microsetellites Imprecise definition of phenotype Inadequately powered study designs Difficulty of getting large pedigrees Less powerful for identifying genetic variants sease
Candidate-gene resequencing studies


Hypothesis based studies, genes are selected for further study Resequencing the entire gene in patient and controls, searching for a variant or set of variants that is enriches or depleted in disease genes Limitation
D2 r f (a) f ( B) f ( A) f (b)
2

-inversely related to the required sample size for association mapping, give a fixed genetic effect 2 r Relationship between D’ and
1.
Candidate-gene association studies

Limitation
Require prediction the identity of the correct gene or genes, on the basis of biological hypotheses or the location of the candidate within a previously determined region of linkage 2. Inadequate to fully explain the genetic basis of the disease when the fundamental physiological defects of a disease are unknown
Glossary

Linkage Disequilibrium Haplotype Penetrance Heritability Population stratification SNPs LD blocks Tag SNPs
What is Association Study?
Association study

Assess correlations between genetic variants and trait differences on a population scale.
Genome-wide association study

A dense set of SNPs across the genome is genotyped to survey the most common genetic variation for a role in disease or to identify the heritable quantitative traits that are risk factors for disease.

Markers for genome-wide association studies
Selection of Markers
LD-based markers The missense approach A convenience-based approach

LD-based marker selection
1.
Genome-wide association studies

No assumption are made about the genomic location of the causal variants
What has made these studies possible?
Requirement in genome wide association studies

Compare frequency of alleles or genotypes of a particular variant between case & controls Advantage
Cheaper and simpler 2. Powerful means of identifying the common variants that underlie complex traits 3. Indirect LD based method
Linkage vs. Association

Assumption co-inheritance of adjacent DNA variants 1. Linkage
-inherited intact over several generations
1.
Association
-retention of adjacent DNA variants over many generations. -very large linkage studies of unobserved, hypothetical pedigrees. -population stratification, linkage, mutation, natural selection
1.
2.
Laborious, expensive Properly interpreting the results might be challenging, particularly when considering rare noncoding variants.
Association studies
r D f (a) f (B) / f ( A) f (b)
2 '2
Tag SNPs
相关文档
最新文档