Accuracy of Genomic Selection Methods in a Standard
English Essay on Intelligent Technology and Innovation

The rapid advancement of technology has transformed our world in unprecedented ways. Intelligent technology, in particular, has emerged as a driving force behind innovation, revolutionizing industries and shaping the future of our society. From artificial intelligence (AI) to the Internet of Things (IoT), these cutting-edge technologies are redefining the boundaries of what is possible, paving the way for a more connected, efficient, and innovative future.

One of the most significant impacts of intelligent technology has been in the field of artificial intelligence. AI systems, powered by complex algorithms and vast amounts of data, can perform tasks that were once thought to be the exclusive domain of human intelligence. These systems can analyze and interpret data, make decisions, and even learn and adapt over time. The applications of AI are far-reaching, spanning industries such as healthcare, finance, and transportation, and even creative fields such as art and music.

In the healthcare sector, for instance, AI-powered systems are being used to assist in the early detection and diagnosis of diseases. By analyzing medical images, patient data, and genomic information, these systems can identify patterns and anomalies that human clinicians might miss. This not only improves the accuracy of diagnoses but also allows earlier intervention and more personalized treatment plans. In addition, AI-assisted robotic surgical systems are being developed with the aim of performing complex procedures with greater precision and consistency than human surgeons.

Similarly, in the financial industry, AI is transforming the way investment decisions are made. Algorithmic trading systems powered by AI can analyze market data and make investment decisions in real time, in some cases outperforming human traders. These systems can also detect patterns and anomalies in financial data, helping to identify potential risks and opportunities that traditional methods might overlook.

The rise of the Internet of Things has also been a significant driver of innovation in intelligent technology. The IoT refers to the interconnected network of devices, sensors, and systems that can communicate with each other and exchange data. This has led to the development of smart homes, where appliances, lighting, and security systems can be controlled and monitored remotely, improving efficiency and convenience.

In the realm of transportation, the IoT has contributed to the development of autonomous vehicles, which can navigate roads and make decisions without human intervention. These self-driving cars have the potential to reduce accidents, ease traffic congestion, and provide mobility to those who are unable to drive. Additionally, the integration of IoT technology into logistics and supply chain management has led to more efficient and transparent tracking of goods, reducing waste and optimizing delivery times.

Beyond these practical applications, intelligent technology is also transforming the creative industries. AI-powered tools are being used to generate music, art, and even poetry, challenging traditional notions of creativity and authorship. While these AI-generated works may not yet match the depth and nuance of human creations, they are pushing the boundaries of what is possible and opening up new avenues for artistic exploration.

However, the rise of intelligent technology also presents a number of challenges and ethical considerations. As these systems become more sophisticated and autonomous, there are concerns about job displacement, privacy, and the potential for AI to be used for malicious purposes. Policymakers, industry leaders, and the public must work together to address these issues and ensure that the benefits of intelligent technology are realized in a responsible and ethical manner.

Despite these challenges, the future of intelligent technology remains bright. As researchers and innovators continue to push the boundaries of what is possible, we can expect even more transformative advances in the years to come. From improving healthcare outcomes to revolutionizing transportation and enhancing creative expression, intelligent technology has the potential to create a better, more sustainable, and more equitable world for all.

In conclusion, the impact of intelligent technology on innovation is undeniable. As we continue to harness the power of AI, the IoT, and other cutting-edge technologies, we must do so with a keen eye on the ethical implications and a commitment to using these tools to improve the human condition. By embracing the transformative potential of intelligent technology, we can unlock new possibilities and pave the way for a future that is more connected, efficient, and innovative than ever before.
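Application of Genomic Selection in Pig Crossbreeding
Hereditas (Beijing), February 2020, 42(2): 145-152
Received: 2019-08-28; Revised: 2020-02-08
Supported by the National Key Research and Development Program (Nos. 2017YFD0501504, 2016YFD0501308) and the Special Fund for the Industrial Technology System Construction of Modern Agriculture (No. CARS-36)
About the author: Yang Anqi, PhD candidate, research interest: pig genetics and breeding. E-mail: yanganqi90@
Corresponding authors: Chen Bin, Professor, doctoral supervisor, research interest: pig genetics and breeding. E-mail: chenbin7586@
Ran Maoliang, PhD, research interest: pig genetics and breeding. E-mail: ranmaoliang0903@
DOI: 10.16288/j.yczz.19-253
Published online: 2020/2/17 17:00
URI: /kcms/detail/11.1913.R.20200214.1640.001.html
Review
Yang Anqi1,2,3, Chen Bin1,3, Ran Maoliang1,3, Yang Guangmin2, Zeng Cheng2
1. College of Animal Science and Technology, Hunan Agricultural University, Changsha 410128
2. Hunan Meikeda Biological Resources Co., Ltd., Changsha 410331
3. Hunan Provincial Key Laboratory for Genetic Improvement of Livestock and Poultry, Changsha 410128
Abstract: Genomic selection uses information from a large number of markers across the whole genome to estimate each individual's genome-wide breeding value. It can further improve breeding efficiency and accuracy, and it is now widely used in purebred pig breeding. However, studies have shown that existing genomic selection methods perform poorly in pig crossbreeding, with very low prediction accuracy across populations.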
rrBLUP Software: Manual for the Ridge-Regression-Based Genomic Prediction Package
Package 'rrBLUP'
December 10, 2023

Title: Ridge Regression and Other Kernels for Genomic Selection
Version: 4.6.3
Author: Jeffrey Endelman
Maintainer: Jeffrey Endelman <*****************>
Depends: R (>= 4.0)
Imports: stats, graphics, grDevices, parallel
Description: Software for genomic prediction with the RR-BLUP mixed model (Endelman 2011, <doi:10.3835/plantgenome2011.08.0024>). One application is to estimate marker effects by ridge regression; alternatively, BLUPs can be calculated based on an additive relationship matrix or a Gaussian kernel.
License: GPL-3
URL: <https:///software/>
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2023-12-10 17:10:06 UTC

R topics documented:
rrBLUP-package
A.mat
GWAS
kin.blup
kinship.BLUP
mixed.solve

rrBLUP-package    Ridge regression and other kernels for genomic selection

Description
This package has been developed primarily for genomic prediction with mixed models (but it can also do genome-wide association mapping with GWAS). The heart of the package is the function mixed.solve, which is a general-purpose solver for mixed models with a single variance component other than the error. Genomic predictions can be made by estimating marker effects (RR-BLUP) or by estimating line effects (G-BLUP). In Endelman (2011) I made the poor choice of using the letter G to denote the genotype or marker data. To be consistent with Endelman (2011), I have retained this notation in kinship.BLUP. However, that function has now been superseded by kin.blup and A.mat, the latter being a utility for estimating the additive relationship matrix (A) from markers. In these newer functions I adopt the usual convention that G is the genetic covariance (not the marker data), which is also consistent with the notation in Endelman and Jannink (2012). Vignettes illustrating some of the features of this package can be found at https://potatobreeding./software/.

References
Endelman, J.B. 2011. Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome 4:250-255. <doi:10.3835/plantgenome2011.08.0024>
Endelman, J.B., and J.-L. Jannink. 2012. Shrinkage estimation of the realized relationship matrix. G3: Genes, Genomes, Genetics 2:1405-1413. <doi:10.1534/g3.112.004259>

A.mat    Additive relationship matrix

Description
Calculates the realized additive relationship matrix.

Usage
A.mat(X, min.MAF=NULL, max.missing=NULL, impute.method="mean", tol=0.02, n.core=1, shrink=FALSE, return.imputed=FALSE)

Arguments
X: Matrix (\(n \times m\)) of unphased genotypes for \(n\) lines and \(m\) biallelic markers, coded as {-1,0,1}. Fractional (imputed) and missing values (NA) are allowed.
min.MAF: Minimum minor allele frequency. The A matrix is not sensitive to rare alleles, so by default only monomorphic markers are removed.
max.missing: Maximum proportion of missing data; the default removes completely missing markers.
impute.method: There are two options. The default is "mean", which imputes with the mean for each marker. The "EM" option imputes with an EM algorithm (see Details).
tol: Specifies the convergence criterion for the EM algorithm (see Details).
n.core: Specifies the number of cores to use for parallel execution of the EM algorithm.
shrink: Set shrink=FALSE to disable shrinkage estimation. See Details for how to enable shrinkage estimation.
return.imputed: When TRUE, the imputed marker matrix is returned.

Details
At high marker density, the relationship matrix is estimated as \(A = WW'/c\), where \(W_{ik} = X_{ik} + 1 - 2p_k\) and \(p_k\) is the frequency of the 1 allele at marker \(k\). By using a normalization constant of \(c = 2\sum_k p_k(1-p_k)\), the mean of the diagonal elements is \(1 + f\) (Endelman and Jannink 2012).

The EM imputation algorithm is based on the multivariate normal distribution and was designed for use with GBS (genotyping-by-sequencing) markers, which tend to be high density but with lots of missing data. Details are given in Poland et al. (2012). The EM algorithm stops at iteration \(t\) when the RMS error \(n^{-1}\lVert A_t - A_{t-1}\rVert_2 < \mathrm{tol}\).

Shrinkage estimation can improve the accuracy of genome-wide marker-assisted selection, particularly at low marker density (Endelman and Jannink 2012). The shrinkage intensity ranges from 0 (no shrinkage) to 1 (\(A = (1+f)I\)). Two algorithms for estimating the shrinkage intensity are available. The first is the method described in Endelman and Jannink (2012) and is specified by shrink=list(method="EJ"). The second involves designating a random sample of the markers as simulated QTL and then regressing the A matrix based on the QTL against the A matrix based on the remaining markers (Yang et al. 2010; Mueller et al. 2015). The regression method is specified by shrink=list(method="REG",n.qtl=100,n.iter=5), where the parameters n.qtl and n.iter can be varied to adjust the number of simulated QTL and number of iterations, respectively.

The shrinkage and EM-imputation options are designed for opposite scenarios (low vs. high density) and cannot be used simultaneously. When the EM algorithm is used, the imputed alleles can lie outside the interval [-1,1]. Polymorphic markers that do not meet the min.MAF and max.missing criteria are not imputed.

Value
If return.imputed=FALSE, the \(n \times n\) additive relationship matrix is returned. If return.imputed=TRUE, the function returns a list containing
$A: the A matrix
$imputed: the imputed marker matrix

References
Endelman, J.B., and J.-L. Jannink. 2012. Shrinkage estimation of the realized relationship matrix. G3: Genes, Genomes, Genetics 2:1405-1413. <doi:10.1534/g3.112.004259>
Mueller et al. 2015. Shrinkage estimation of the genomic relationship matrix can improve genomic estimated breeding values in the training set. Theor. Appl. Genet. 128:693-703. <doi:10.1007/s00122-015-2464-6>
Poland, J., J. Endelman et al. 2012. Genomic selection in wheat breeding using genotyping-by-sequencing. Plant Genome 5:103-113. <doi:10.3835/plantgenome2012.06.0006>
Yang et al. 2010. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42:565-569. <doi:10.1038/ng.608>

GWAS    Genome-wide association analysis

Description
Performs genome-wide association analysis based on the mixed model (Yu et al. 2006):
\( y = X\beta + Zg + S\tau + \varepsilon \)
where \(\beta\) is a vector of fixed effects that can model both environmental factors and population structure. The variable \(g\) models the genetic background of each line as a random effect with \(\mathrm{Var}[g] = K\sigma_g^2\). The variable \(\tau\) models the additive SNP effect as a fixed effect. The residual variance is \(\mathrm{Var}[\varepsilon] = I\sigma_e^2\).

Usage
GWAS(pheno, geno, fixed=NULL, K=NULL, n.PC=0, min.MAF=0.05, n.core=1, P3D=TRUE, plot=TRUE)

Arguments
pheno: Data frame where the first column is the line name (gid). The remaining columns can be either a phenotype or the levels of a fixed effect. Any column not designated as a fixed effect is assumed to be a phenotype.
geno: Data frame with the marker names in the first column. The second and third columns contain the chromosome and map position (either bp or cM), respectively, which are used only when plot=TRUE to make Manhattan plots. If the markers are unmapped, just use a placeholder for those two columns. Columns 4 and higher contain the marker scores for each line, coded as {-1,0,1} = {aa,Aa,AA}. Fractional (imputed) and missing (NA) values are allowed. The column names must match the line names in the "pheno" data frame.
fixed: An array of strings containing the names of the columns that should be included as (categorical) fixed effects in the mixed model.
K: Kinship matrix for the covariance between lines due to a polygenic effect. If not passed, it is calculated from the markers using A.mat.
n.PC: Number of principal components to include as fixed effects. Default is 0 (equivalent to the K model).
min.MAF: Specifies the minimum minor allele frequency (MAF). If a marker has a MAF less than min.MAF, it is assigned a zero score.
n.core: Setting n.core > 1 will enable parallel execution on a machine with multiple cores (use only at the UNIX command line).
P3D: When P3D=TRUE, variance components are estimated by REML only once, without any markers in the model. When P3D=FALSE, variance components are estimated by REML for each marker separately.
plot: When plot=TRUE, qq and Manhattan plots are generated.

Details
For unbalanced designs where phenotypes come from different environments, the environment mean can be modeled using the fixed option (e.g., fixed="env" if the column in the pheno data frame is called "env"). When principal components are included (P+K model), the loadings are determined from an eigenvalue decomposition of the K matrix.

The terminology "P3D" (population parameters previously determined) was introduced by Zhang et al. (2010). When P3D=FALSE, this function is equivalent to EMMA with REML (Kang et al. 2008). When P3D=TRUE, it is equivalent to EMMAX (Kang et al. 2010). The P3D=TRUE option is faster but can underestimate significance compared to P3D=FALSE.

The dashed line in the Manhattan plots corresponds to an FDR of 0.05 and is calculated using the qvalue package (Storey and Tibshirani 2003). The p-value corresponding to a q-value of 0.05 is determined by interpolation. When there are no q-values less than 0.05, the dashed line is omitted.

Value
Returns a data frame where the first three columns are the marker name, chromosome, and position, and subsequent columns are the marker scores \((-\log_{10} p)\) for the traits.

References
Kang et al. 2008. Efficient control of population structure in model organism association mapping. Genetics 178:1709-1723.
Kang et al. 2010. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42:348-354.
Storey and Tibshirani. 2003. Statistical significance for genome-wide studies. PNAS 100:9440-9445.
Yu et al. 2006. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat. Genet. 38:203-208.
Zhang et al. 2010. Mixed linear model approach adapted for genome-wide association studies. Nat. Genet. 42:355-360.

Examples
#random population of 200 lines with 1000 markers
M <- matrix(rep(0,200*1000),1000,200)
for (i in 1:200) {
  M[,i] <- ifelse(runif(1000)<0.5,-1,1)
}
colnames(M) <- 1:200
#check.names=FALSE keeps the column names equal to the line names in pheno
geno <- data.frame(marker=1:1000,chrom=rep(1,1000),pos=1:1000,M,check.names=FALSE)

QTL <- 100*(1:5) #pick 5 QTL
u <- rep(0,1000) #marker effects
u[QTL] <- 1
g <- as.vector(crossprod(M,u))
h2 <- 0.5
y <- g + rnorm(200,mean=0,sd=sqrt((1-h2)/h2*var(g)))

pheno <- data.frame(line=1:200,y=y)
scores <- GWAS(pheno,geno,plot=FALSE)

kin.blup    Genotypic value prediction based on kinship

Description
Genotypic value prediction by G-BLUP, where the genotypic covariance G can be additive or based on a Gaussian kernel.

Usage
kin.blup(data, geno, pheno, GAUSS=FALSE, K=NULL, fixed=NULL, covariate=NULL, PEV=FALSE, n.core=1, theta.seq=NULL)

Arguments
data: Data frame with columns for the phenotype, the genotype identifier, and any environmental variables.
geno: Character string for the name of the column in the data frame that contains the genotype identifier.
pheno: Character string for the name of the column in the data frame that contains the phenotype.
GAUSS: To model genetic covariance with a Gaussian kernel, set GAUSS=TRUE and pass the Euclidean distance for K (see below).
K: There are three options for specifying kinship: (1) If K=NULL, genotypes are assumed to be independent (\(G = IV_g\)). (2) For breeding value prediction, set GAUSS=FALSE and use an additive relationship matrix for K to create the model \(G = KV_g\). (3) For the Gaussian kernel, set GAUSS=TRUE and pass the Euclidean distance matrix for K to create the model \(G_{ij} = e^{-(K_{ij}/\theta)^2} V_g\).
fixed: An array of strings containing the names of columns that should be included as (categorical) fixed effects in the mixed model.
covariate: An array of strings containing the names of columns that should be included as covariates in the mixed model.
PEV: When PEV=TRUE, the function returns the prediction error variance for the genotypic values (\(PEV_i = \mathrm{Var}[g_i^* - g_i]\)).
n.core: Specifies the number of cores to use for parallel execution of the Gaussian kernel method (use only at the UNIX command line).
theta.seq: The scale parameter for the Gaussian kernel is set by maximizing the restricted log-likelihood over a grid of values. By default, the grid is constructed by dividing the interval (0, max(K)] into 10 points. Passing a numeric array to this variable ("theta sequence") specifies a different set of grid points (e.g., for large problems you might want fewer than 10).

Details
This function is a wrapper for mixed.solve and thus solves mixed models of the form
\( y = X\beta + [Z\;0]\,g + \varepsilon \)
where \(\beta\) is a vector of fixed effects, \(g\) is a vector of random genotypic values with covariance \(G = \mathrm{Var}[g]\), and the residuals follow \(\mathrm{Var}[\varepsilon_i] = R_i\sigma_e^2\), with \(R_i = 1\) by default. The design matrix for the genetic values has been partitioned to illustrate that not all lines need phenotypes (i.e., for genomic selection). Unlike mixed.solve, this function does not return estimates of the fixed effects, only the BLUP solution for the genotypic values. It was designed to replace kinship.BLUP and to relieve the user of having to construct design matrices explicitly. Variance components are estimated by REML, and BLUP values are returned for every entry in K, regardless of whether it has been phenotyped. The rownames of K must match the genotype labels in the data frame for phenotyped lines; missing phenotypes (NA) are simply omitted.

Unlike its predecessor, this function does not handle marker data directly. For breeding value prediction, the user must supply a relationship matrix, which can be calculated from markers with A.mat. For Gaussian kernel predictions, pass the Euclidean distance matrix for K, which can be calculated with dist.

In the terminology of mixed models, both the "fixed" and "covariate" variables are fixed effects (\(\beta\) in the above equation): the former are treated as factors with distinct levels, while the latter are continuous with one coefficient per variable. The population mean is automatically included as a fixed effect.

The prediction error variance (PEV) is the square of the SE of the BLUPs (see mixed.solve) and can be used to estimate the expected accuracy of BLUP predictions according to
\( r_i^2 = 1 - \dfrac{PEV_i}{V_g K_{ii}} \)

Value
The function always returns
$Vg: REML estimate of the genetic variance
$Ve: REML estimate of the error variance
$g: BLUP solution for the genetic values
$resid: residuals
$pred: predicted genetic values, averaged over the fixed effects
If PEV=TRUE, the list also includes
$PEV: prediction error variance for the genetic values
If GAUSS=TRUE, the list also includes
$profile: the log-likelihood profile for the scale parameter in the Gaussian kernel

References
Endelman, J.B. 2011. Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome 4:250-255. <doi:10.3835/plantgenome2011.08.0024>

Examples
#random population of 200 lines with 1000 markers
M <- matrix(rep(0,200*1000),200,1000)
for (i in 1:200) {
  M[i,] <- ifelse(runif(1000)<0.5,-1,1)
}
rownames(M) <- 1:200
A <- A.mat(M)

#random phenotypes
u <- rnorm(1000)
g <- as.vector(crossprod(t(M),u))
h2 <- 0.5 #heritability
y <- g + rnorm(200,mean=0,sd=sqrt((1-h2)/h2*var(g)))
data <- data.frame(y=y,gid=1:200)

#predict breeding values
ans <- kin.blup(data=data,geno="gid",pheno="y",K=A)
accuracy <- cor(g,ans$g)

kinship.BLUP    Genomic prediction by kinship-BLUP (deprecated)

Description
***This function has been superseded by kin.blup; please refer to its help page.

Usage
kinship.BLUP(y, G.train, G.pred=NULL, X=NULL, Z.train=NULL, K.method="RR", n.profile=10, mixed.method="REML", n.core=1)

Arguments
y: Vector (n.obs × 1) of observations. Missing values (NA) are omitted.
G.train: Matrix (n.train × m) of unphased genotypes for the training population: n.train lines with m bi-allelic markers. Genotypes should be coded as {-1,0,1}; fractional (imputed) and missing (NA) alleles are allowed.
G.pred: Matrix (n.pred × m) of unphased genotypes for the prediction population: n.pred lines with m bi-allelic markers. Genotypes should be coded as {-1,0,1}; fractional (imputed) and missing (NA) alleles are allowed.
X: Design matrix (n.obs × p) of fixed effects. If not passed, a vector of 1's is used to model the intercept.
Z.train: 0-1 matrix (n.obs × n.train) relating observations to lines in the training set. If not passed, the identity matrix is used.
K.method: "RR" (default) is ridge regression, for which K is the realized additive relationship matrix computed with A.mat. The option "GAUSS" is a Gaussian kernel (\(K = e^{-D^2/\theta^2}\)) and "EXP" is an exponential kernel (\(K = e^{-D/\theta}\)), where Euclidean distances \(D\) are computed with dist.
n.profile: For K.method="GAUSS" or "EXP", the number of points to use in the log-likelihood profile for the scale parameter \(\theta\).
mixed.method: Either "REML" (default) or "ML".
n.core: Setting n.core > 1 will enable parallel execution of the Gaussian kernel computation (use only at the UNIX command line).

Value
$g.train: BLUP solution for the training set
$g.pred: BLUP solution for the prediction set (when G.pred != NULL)
$beta: ML estimate of fixed effects
For GAUSS or EXP, the function also returns
$profile: log-likelihood profile for the scale parameter

References
Endelman, J.B. 2011. Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome 4:250-255.

Examples
#random population of 200 lines with 1000 markers
G <- matrix(rep(0,200*1000),200,1000)
for (i in 1:200) {
  G[i,] <- ifelse(runif(1000)<0.5,-1,1)
}

#random phenotypes
g <- as.vector(crossprod(t(G),rnorm(1000)))
h2 <- 0.5
y <- g + rnorm(200,mean=0,sd=sqrt((1-h2)/h2*var(g)))

#split in half for training and prediction
train <- 1:100
pred <- 101:200
ans <- kinship.BLUP(y=y[train],G.train=G[train,],G.pred=G[pred,],K.method="GAUSS")

#correlation accuracy
r.gy <- cor(ans$g.pred,y[pred])

mixed.solve    Mixed-model solver

Description
Calculates maximum-likelihood (ML/REML) solutions for mixed models of the form
\( y = X\beta + Zu + \varepsilon \)
where \(\beta\) is a vector of fixed effects and \(u\) is a vector of random effects with \(\mathrm{Var}[u] = K\sigma_u^2\). The residual variance is \(\mathrm{Var}[\varepsilon] = I\sigma_e^2\). This class of mixed models, in which there is a single variance component other than the residual error, has a close relationship with ridge regression (ridge parameter \(\lambda = \sigma_e^2/\sigma_u^2\)).

Usage
mixed.solve(y, Z=NULL, K=NULL, X=NULL, method="REML", bounds=c(1e-09, 1e+09), SE=FALSE, return.Hinv=FALSE)

Arguments
y: Vector (n × 1) of observations. Missing values (NA) are omitted, along with the corresponding rows of X and Z.
Z: Design matrix (n × m) for the random effects. If not passed, assumed to be the identity matrix.
K: Covariance matrix (m × m) for random effects; must be positive semi-definite. If not passed, assumed to be the identity matrix.
X: Design matrix (n × p) for the fixed effects. If not passed, a vector of 1's is used to model the intercept. X must have full column rank (so that \(\beta\) is estimable).
method: Specifies whether the full ("ML") or restricted ("REML") maximum-likelihood method is used.
bounds: Array with two elements specifying the lower and upper bound for the ridge parameter.
SE: If TRUE, standard errors are calculated.
return.Hinv: If TRUE, the function returns the inverse of \(H = ZKZ' + \lambda I\). This is useful for GWAS.

Details
This function can be used to predict marker effects or breeding values (see Examples). The numerical method is based on the spectral decomposition of \(ZKZ'\) and \(SZKZ'S\), where \(S = I - X(X'X)^{-1}X'\) is the projection operator for the nullspace of \(X\) (Kang et al. 2008). This algorithm generates the inverse phenotypic covariance matrix \(V^{-1}\), which can then be used to calculate the BLUE and BLUP solutions for the fixed and random effects, respectively, using standard formulas (Searle et al. 1992):
\( \mathrm{BLUE}(\beta) = \beta^* = (X'V^{-1}X)^{-1}X'V^{-1}y \)
\( \mathrm{BLUP}(u) = u^* = \sigma_u^2 KZ'V^{-1}(y - X\beta^*) \)
The standard errors are calculated as the square root of the diagonal elements of the following matrices (Searle et al. 1992):
\( \mathrm{Var}[\beta^*] = (X'V^{-1}X)^{-1} \)
\( \mathrm{Var}[u^* - u] = K\sigma_u^2 - \sigma_u^4 KZ'V^{-1}ZK + \sigma_u^4 KZ'V^{-1}X\,\mathrm{Var}[\beta^*]\,X'V^{-1}ZK \)
For marker effects where K = I, the function will run faster if K is not passed than if the user passes the identity matrix.

Value
If SE=FALSE, the function returns a list containing
$Vu: estimator for \(\sigma_u^2\)
$Ve: estimator for \(\sigma_e^2\)
$beta: BLUE(\(\beta\))
$u: BLUP(\(u\))
$LL: maximized log-likelihood (full or restricted, depending on method)
If SE=TRUE, the list also contains
$beta.SE: standard error for \(\beta\)
$u.SE: standard error for \(u^* - u\)
If return.Hinv=TRUE, the list also contains
$Hinv: the inverse of H

References
Kang et al. 2008. Efficient control of population structure in model organism association mapping. Genetics 178:1709-1723.
Endelman, J.B. 2011. Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome 4:250-255.
Searle, S.R., G. Casella and C.E. McCulloch. 1992. Variance Components. John Wiley, Hoboken.

Examples
#random population of 200 lines with 1000 markers
M <- matrix(rep(0,200*1000),200,1000)
for (i in 1:200) {
  M[i,] <- ifelse(runif(1000)<0.5,-1,1)
}

#random phenotypes
u <- rnorm(1000)
g <- as.vector(crossprod(t(M),u))
h2 <- 0.5 #heritability
y <- g + rnorm(200,mean=0,sd=sqrt((1-h2)/h2*var(g)))

#predict marker effects
ans <- mixed.solve(y,Z=M) #By default K=I
accuracy <- cor(u,ans$u)

#predict breeding values
ans <- mixed.solve(y,K=A.mat(M))
accuracy <- cor(g,ans$u)
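The A.mat Details define the realized relationship matrix as \(A = WW'/c\), with \(W_{ik} = X_{ik} + 1 - 2p_k\) and \(c = 2\sum_k p_k(1-p_k)\). The sketch below reproduces only that formula in plain Python. It assumes a complete genotype matrix coded {-1,0,1} with no missing values. It does not implement rrBLUP's mean or EM imputation, MAF filtering or shrinkage. The function name `realized_A` is hypothetical and is not part of rrBLUP.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def realized_A(X: Sequence[Sequence[float]]) -> Matrix:
    """Realized additive relationship matrix A = W W' / c (A.mat formula only).

    Sketch assumption: X is an n x m genotype matrix coded {-1, 0, 1}
    with no missing values (no imputation, filtering or shrinkage).
    """
    n, m = len(X), len(X[0])
    # p_k: frequency of the "1" allele; X_ik + 1 counts its copies (0, 1 or 2).
    p = [sum(X[i][k] + 1 for i in range(n)) / (2 * n) for k in range(m)]
    # Centred genotypes W_ik = X_ik + 1 - 2 p_k (a monomorphic marker gives a zero column).
    W = [[X[i][k] + 1 - 2 * p[k] for k in range(m)] for i in range(n)]
    # Normalisation constant c = 2 * sum_k p_k (1 - p_k).
    c = 2 * sum(pk * (1 - pk) for pk in p)
    if c == 0:
        raise ValueError("all markers are monomorphic")
    return [
        [sum(W[i][k] * W[j][k] for k in range(m)) / c for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    X = [[1, -1, 0], [-1, 1, 1], [0, 0, -1]]
    for row in realized_A(X):
        print([round(v, 3) for v in row])
```

The columns of W are centred, so every row of this A sums to zero. In rrBLUP, A.mat first filters and imputes markers and only then applies the same formula.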
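mixed.solve reaches its BLUE and BLUP solutions through a spectral decomposition, using REML-estimated variances. A simpler illustration of the ridge-regression link \(\lambda = \sigma_e^2/\sigma_u^2\) in its Description is to solve Henderson's mixed-model equations directly for a user-supplied λ; this is a swapped-in technique, not rrBLUP's algorithm. The sketch assumes K = I (marker effects, as in `mixed.solve(y, Z=M)`) and an intercept-only X, and it does not estimate variance components. The helper names `solve_linear` and `rr_blup` are hypothetical.

```python
from typing import List, Sequence, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= f * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def rr_blup(y: Sequence[float], Z: Sequence[Sequence[float]],
            lam: float) -> Tuple[float, List[float]]:
    """Intercept (BLUE) and marker effects (BLUP) for y = 1*mu + Z u + e, K = I.

    Henderson's equations with lam = sigma_e^2 / sigma_u^2 treated as known:
        [ n     1'Z          ] [mu]   [1'y]
        [ Z'1   Z'Z + lam*I  ] [u ] = [Z'y]
    """
    n, m = len(y), len(Z[0])
    C = [[0.0] * (m + 1) for _ in range(m + 1)]
    rhs = [0.0] * (m + 1)
    C[0][0] = float(n)
    rhs[0] = float(sum(y))
    for j in range(m):
        C[0][j + 1] = C[j + 1][0] = float(sum(Z[i][j] for i in range(n)))
        rhs[j + 1] = float(sum(Z[i][j] * y[i] for i in range(n)))
        for k in range(m):
            C[j + 1][k + 1] = float(sum(Z[i][j] * Z[i][k] for i in range(n)))
        C[j + 1][j + 1] += lam  # ridge penalty on marker effects only
    sol = solve_linear(C, rhs)
    return sol[0], sol[1:]


if __name__ == "__main__":
    Z = [[1, -1, 0], [0, 1, 1], [-1, 0, 1], [1, 1, -1], [0, -1, -1]]
    y = [2.0, 1.5, -0.5, 3.0, -1.0]
    for lam in (0.1, 1.0, 10.0):
        mu, u = rr_blup(y, Z, lam)
        print(lam, round(mu, 3), [round(v, 3) for v in u])
```

As λ grows, the marker effects shrink towards zero, which is the ridge behaviour the Description refers to. With Z = I the solution reduces to \(\mu = \bar{y}\) and \(u_i = (y_i - \bar{y})/(1 + \lambda)\).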
Chapter 17 Marker-Assisted Selection
Efficient data tracking, management, and integration with phenotypic data
Decision support tools for breeders
– optimal design of selection strategies
– accurate selection of genotypes
Marker technologies provide the potential to understand the underlying causes of epistasis and G×E (genotype-by-environment interaction), which could greatly improve selection efficiency
Paradox of MAS
Target gene linked to a marker (recombination frequency r)
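[Figure: marker band patterns in the parents (marker alleles m/M, target-gene alleles R/S) and their cross (×), the marker band pattern in the F1, and the three marker band patterns in the F2 population]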
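Target-gene genotype among F2 plants of marker genotype mm | Expected frequency | Value at r = 0.05
RR | (1-r)^2 | 0.9025
RS | 2r(1-r) | 0.095
SS | r^2 | 0.0025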
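When r = 0.05, selecting for the target genotype RR on the basis of marker genotype mm gives a probability of error of about 0.10 (1 - 0.9025 = 0.0975).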
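The frequencies in the table follow from treating the two gametes of an mm plant independently. Each gamete carries R with probability 1 - r (parental type) or S with probability r (recombinant). The sketch below assumes the F1 carries m in coupling with R (gametes mR and MS), which is what the table implies. The function names are hypothetical.

```python
from typing import Dict


def target_genotype_probs(r: float) -> Dict[str, float]:
    """Target-gene genotype probabilities for an F2 plant that is mm at the marker.

    Assumption: the F1 carries m in coupling with R (gametes mR and MS), so each
    m-bearing gamete carries R with probability 1 - r and S with probability r.
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination frequency r must lie in [0, 0.5]")
    return {"RR": (1 - r) ** 2, "RS": 2 * r * (1 - r), "SS": r ** 2}


def selection_error(r: float) -> float:
    """Probability that an mm plant selected as RR is not RR: 1 - (1 - r)^2."""
    return 1.0 - target_genotype_probs(r)["RR"]


if __name__ == "__main__":
    for r in (0.01, 0.05, 0.10):
        print(r, target_genotype_probs(r), round(selection_error(r), 4))
```

For r = 0.05 the error is 2r - r² = 0.0975, which is the "about 0.10" quoted above. The error grows roughly linearly with r, so tighter marker linkage pays off directly.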
Definitions
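Advances in Plant Genomic Selection and Its Application in Maize Breeding
Sun Qi; Li Wenlan; Chen Litao; Zhao Meng; Li Wencai; Yu Yanli; Meng Zhaodong
[Abstract] Genomic selection combines a large number of genome-wide single-nucleotide polymorphism (SNP) markers with phenotypic data from a reference population to build a BLUP model that estimates the effect of every marker. The sum of these effects for an individual is its genomic estimated breeding value (GEBV). The breeding values of progeny are then estimated from the same markers alone and used for selection.
This paper reviews recent research, in China and abroad, on the main factors affecting the efficiency of genomic selection: the type and size of the reference population, the modeling method, the type and number of markers, and trait heritability, together with their effects on selection efficiency. It also outlines the application of genomic selection in maize breeding and future prospects.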
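Once marker effects have been estimated in the reference population, scoring candidates is a weighted sum over their genotypes. The sketch below is a minimal illustration under stated assumptions. It assumes the effects were already estimated (for example by ridge regression, as in rrBLUP's RR-BLUP), and that candidates are genotyped with the same markers and the same {-1,0,1} coding. The effect values and the function names `gebv` and `rank_candidates` are hypothetical.

```python
from typing import List, Sequence, Tuple


def gebv(genotypes: Sequence[Sequence[float]], effects: Sequence[float]) -> List[float]:
    """GEBV_i = sum_k x_ik * u_k for each candidate (same marker coding as training)."""
    if any(len(row) != len(effects) for row in genotypes):
        raise ValueError("each genotype row needs one score per marker effect")
    return [sum(x * u for x, u in zip(row, effects)) for row in genotypes]


def rank_candidates(names: Sequence[str], genotypes: Sequence[Sequence[float]],
                    effects: Sequence[float]) -> List[Tuple[str, float]]:
    """Candidates sorted from highest to lowest GEBV; no phenotypes are needed."""
    scores = gebv(genotypes, effects)
    return sorted(zip(names, scores), key=lambda t: t[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical marker effects estimated in a training population.
    u_hat = [0.5, 0.2, -0.1]
    candidates = {"C1": [1, 0, -1], "C2": [-1, 1, 1], "C3": [1, 1, 0]}
    print(rank_candidates(list(candidates), list(candidates.values()), u_hat))
```

The ranking uses genotypes only, which is why genomic selection can act before the candidates themselves are phenotyped.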
Marker-assisted selection (MAS) technology can achieve direct genetic selection, but it must be based on QTL mapping. Genomic selection (GS), the newest MAS method, has many advantages over traditional MAS, above all that QTL mapping is not necessary. In this paper, the factors affecting the prediction accuracy of GS are reviewed, including training population type, prediction model, marker number, population size, population structure, heritability of traits and so on. The application of GS in maize breeding is also introduced, together with the prediction of hybrid performance. Finally, we discuss future research on, and application of, GS in maize breeding.

Journal: Acta Botanica Boreali-Occidentalia Sinica
Year (volume), issue: 2016 (36), no. 6
Pages: 9 (pp. 1269-1277)
Keywords: genomic selection; maize; estimated breeding value
Affiliations: Maize Research Institute, Shandong Academy of Agricultural Sciences, Jinan 250100 (Sun Qi, Li Wenlan, Zhao Meng, Li Wencai, Yu Yanli, Meng Zhaodong); Laiyang Seed Company, Laiyang, Shandong 265200 (Chen Litao)
Text language: Chinese
CLC number: Q789

With the rapid development of molecular biology and genomics, marker-assisted selection (MAS) emerged to meet the needs of the time. MAS is a crop genetic improvement method that combines phenotypic and genotypic information, enabling direct selection on genotype and efficient gene pyramiding[1]. When complex traits controlled by multiple genes need to be improved, however, MAS has two flaws. First, selection in the progeny population relies on quantitative trait locus (QTL) mapping, but QTL mapped in biparental populations do not generalize and cannot be applied accurately in breeding[2]. Second, important traits are controlled by many genes of small effect, and suitable statistical methods and breeding techniques for exploiting such genes to improve complex traits have been lacking[3]. Genomic selection (GS), a newer form of MAS, emerged in response. Meuwissen first proposed the GS breeding strategy.
GS uses a "training population" of individuals that have been both genotyped and phenotyped. A best linear unbiased prediction (BLUP) model is built from the genotypes of the training individuals and their breeding values (the mean performance of their crosses with the same tester). The breeding values of the "candidate population" are then estimated from the BLUP model and their genotypic data alone, without crossing them to a tester or recording their phenotypes[4]. The BLUP model takes the genotypic data of untested individuals and produces genomic estimated breeding values (GEBVs). These GEBVs say nothing about the function of the underlying genes, but they are the ideal selection criterion[5]. Selection on GEBVs can increase gain per unit time over traditional breeding even when both approaches are equally accurate. The reason is that, in principle, the phenotypes of the candidate individuals are not needed for selection, which shortens the breeding cycle[6].

Genomic selection has several advantages over traditional MAS. (1) QTL mapping is not necessary for GS. Genomic selection differs from previous strategies such as linkage and association mapping in that it abandons the objective of mapping the effects of single genes. Instead, it focuses on the efficient estimation of breeding values from a large number of molecular markers, ideally covering the full genome[5]. (2) Genomic selection is more precise, especially for early selection. Genotyping with high-density molecular markers can, in principle, capture the effects of all QTL and explain the genetic variance of most traits, whereas MAS uses only a few markers per trait; genomic selection is therefore more accurate than MAS[7]. (3) Genomic selection can shorten the generation interval, accelerate genetic progress and reduce production costs. Genetic gain under GS is 4%-25% higher than under phenotypic selection, and its cost is 26%-56% lower than that of traditional breeding[8].
(4) For traits of low heritability, GS achieves higher selection efficiency than MAS. (5) The selection criterion of GS is the breeding value, the sum of the genetic effects of all alleles an individual carries. It is judged by the mean performance of the individual's progeny rather than by its own performance, so GS is more accurate[9].

Genomic selection originated in animal breeding. It has been widely used in dairy cattle breeding in the United States, Australia, New Zealand and elsewhere[10-11], and it has also been applied in broiler chicken and pig breeding[12-13]. Applications of GS in plant breeding have developed only in recent years and have focused on simulation studies in maize[14], wheat[15], trees[16], sugar beet[17], barley[18], triticale[19] and other crops. Empirical studies are being carried out by large companies such as Monsanto and Pioneer-DuPont. Mark Sorrells and Jean-Luc Jannink are trying to use GS to speed up variety improvement three- to fourfold. This work is carried out with CIMMYT and covers four areas aimed at improving the yield of maize and wheat[20].

Against this background, the objective of this study is to review the key factors affecting GS in plant breeding. Maize is essential for global food security, and more research on genomic selection in maize has been launched in recent years[21-23]. This paper introduces advances in the application of GS in maize breeding. We then propose future research that should be carried out in maize breeding in China.
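The earlier claim that selection on GEBVs raises gain per unit time, even at equal accuracy, follows from the standard breeder's equation expressed per year: \(\Delta G / L = i\, r\, \sigma_A / L\). Here i is the selection intensity, r the selection accuracy, \(\sigma_A\) the additive genetic standard deviation and L the cycle length in years. The sketch below uses hypothetical input values, not figures from the cited studies, and the function name `gain_per_year` is illustrative.

```python
def gain_per_year(intensity: float, accuracy: float, sigma_a: float,
                  cycle_years: float) -> float:
    """Breeder's equation per unit time: i * r * sigma_A / L."""
    if cycle_years <= 0:
        raise ValueError("cycle length must be positive")
    return intensity * accuracy * sigma_a / cycle_years


if __name__ == "__main__":
    # Hypothetical values: identical intensity, accuracy and additive SD;
    # only the cycle length differs (phenotype-based vs. genomic selection).
    phenotypic = gain_per_year(intensity=1.4, accuracy=0.6, sigma_a=10.0, cycle_years=4.0)
    genomic = gain_per_year(intensity=1.4, accuracy=0.6, sigma_a=10.0, cycle_years=2.0)
    print(phenotypic, genomic, genomic / phenotypic)
```

Halving the cycle length doubles the annual gain when everything else is held fixed. This is the mechanism behind the "same efficiency, higher gain per unit time" argument.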
Factors that affect GS prediction accuracy include the number of markers used to estimate GEBVs[10], trait heritability[7], calibration population size[5], the statistical model[24], the number and type of molecular markers[25-26], linkage disequilibrium[27], effective population size[28], the relationship between the calibration and test set (TS)[29-31], and population structure[32-34].

2.1 Training population of genomic selection
In animal breeding, GS has mostly been discussed in the context of population-wide linkage disequilibrium, where the population might be an entire breed of cattle, pigs, or chickens. The need for high marker densities in GS may be reduced if the candidate population consists of progeny of the training population. In that case, an evenly spaced, low-density subset of the markers genotyped on the training population can be used on the candidates, and genotypes for the full marker set can be inferred by cosegregation[35]. Because plants often produce very large full-sib families (an F2 population derived from a single F1 by selfing is one example), however, there is also a tradition of QTL detection, MAS, and GS within such families[5]. Bernardo compared F2, BC1, and BC2 populations from an adapted × exotic maize cross as training populations in a simulation experiment[14]. The results indicated that genomewide selection should start in the F2 rather than in a backcross population, even when the number of favorable alleles is substantially larger in the adapted parent than in the exotic parent. Compared with natural populations, F2 populations have a simpler genetic basis because they derive from only two inbred lines, so a biparental training population can be smaller than a natural one. Simulation studies have indicated that for three cycles of genomewide selection in an adapted × exotic cross, a population size of NC0 = 144 was generally sufficient[21].
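Several of the factors listed above (calibration population size N, trait heritability h2, and, through the number of independent chromosome segments Me, effective population size) are linked by the deterministic approximation of Daetwyler et al.[28]: r = sqrt(N·h2 / (N·h2 + Me)). The sketch below assumes Me is known and that the markers capture all QTL variance; it illustrates the formula and does not reproduce the simulations cited in this section. The function name and the Me value in the demo are hypothetical.

```python
import math


def expected_gebv_accuracy(n: int, h2: float, m_e: float) -> float:
    """Deterministic accuracy approximation r = sqrt(N*h2 / (N*h2 + M_e)).

    n   -- training (calibration) population size N
    h2  -- heritability of the phenotypes used for training
    m_e -- number of independent chromosome segments (grows with effective
           population size and genome length); treated as a known input
    Assumes the markers fully capture the QTL variance.
    """
    if n <= 0 or m_e <= 0 or not 0.0 < h2 <= 1.0:
        raise ValueError("need n > 0, m_e > 0 and 0 < h2 <= 1")
    nh2 = n * h2
    return math.sqrt(nh2 / (nh2 + m_e))


if __name__ == "__main__":
    m_e = 500.0  # hypothetical number of independent segments
    for n, h2 in [(200, 0.3), (400, 0.3), (400, 0.15), (800, 0.15)]:
        print(n, h2, round(expected_gebv_accuracy(n, h2, m_e), 3))
```

Doubling N while halving h2 leaves N·h2, and therefore the predicted accuracy, unchanged (compare the first and third rows of the demo), which is the compensation between N and h2 discussed in Section 2.3.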
Low-density markers are suitable for F2 populations[22], but F2 training populations have two disadvantages. First, a biparental population requires a separate model to be trained within each cross: the BLUP model is suitable only for selecting progeny of the two parental lines. Second, the F2 generation itself must be evaluated on the phenotypic values of F3 testcrosses, and only in the generations after the F3 can selection rely on the BLUP model alone. F2 training populations are often used for cross-pollinated crops such as maize. Yusheng Zhao et al. based their study on experimental data from six segregating populations from a half-diallel mating design, with 788 testcross progenies from an elite maize breeding program[23]. In the study of Vanessa et al.[36], marker effects estimated in 255 diverse maize hybrids were used to predict grain yield, anthesis date, and anthesis-silking interval within the diversity panel and in testcross progenies of 30 F2-derived lines from each of five populations. Wegenast et al. suggested that genomic selection in plant breeding should be applied not only within a specific biparental cross or within a diverse panel of elite lines, but rather within and among crosses[37]. Natural populations are often used as training populations for self-pollinated crops such as wheat, and also for sugar beet. Würschum et al. used 924 sugar beet lines as a training population; their results suggest that a training population of intensively phenotyped and genotyped diverse lines from a breeding program holds potential for building robust calibration models for genomic selection[17]. Hans et al. assessed the accuracy of GEBVs for rust resistance in 206 hexaploid wheat landraces[15].

2.2 Prediction model of genomic selection
Genomic selection modeling takes advantage of the increasing abundance of molecular markers by modeling many genetic loci with small effects[26,35,38].
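Fitting all markers at once, each with a small shrunken effect, is what RR-BLUP (described below) does. A minimal NumPy sketch, assuming a 0/1/2-coded genotype matrix and a known shrinkage ratio λ = σe²/σu² (in practice λ is usually estimated, e.g. by REML, which is not shown). The function names are illustrative and the simulated data in the demo are made up.

```python
import numpy as np


def fit_rr_blup(X, y, lam):
    """Ridge-regression estimates of all marker effects jointly (RR-BLUP style).

    X   -- n x m genotype matrix coded 0/1/2 (training population)
    y   -- length-n vector of phenotypes or testcross means
    lam -- shrinkage ratio sigma_e^2 / sigma_u^2, assumed known here
    Returns the intercept, the marker means used for centring, and the effects.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    marker_means = X.mean(axis=0)
    Z = X - marker_means          # centring makes the intercept equal the phenotypic mean
    mu = y.mean()
    m = Z.shape[1]
    # Mixed-model equations reduce to (Z'Z + lam*I) u = Z'(y - mu) after centring
    u = np.linalg.solve(Z.T @ Z + lam * np.eye(m), Z.T @ (y - mu))
    return mu, marker_means, u


def predict_gebv(X_new, mu, marker_means, u):
    """GEBV of genotyped, unphenotyped candidates: intercept + sum of marker effects."""
    Z_new = np.asarray(X_new, dtype=float) - marker_means
    return mu + Z_new @ u


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n_train, n_cand, m = 300, 100, 200
    effects = rng.normal(0.0, 0.1, m)          # many small simulated effects
    X = rng.integers(0, 3, size=(n_train + n_cand, m))
    g = X @ effects
    y = g + rng.normal(0.0, g.std(), g.size)   # heritability roughly 0.5
    mu, means, u = fit_rr_blup(X[:n_train], y[:n_train], lam=100.0)  # assumed lambda
    gebv = predict_gebv(X[n_train:], mu, means, u)
    print("accuracy r(GEBV, true g):", round(np.corrcoef(gebv, g[n_train:])[0, 1], 2))
```

Because all markers enter the model together and λ shrinks each effect toward zero, no marker significance test is needed, which is the contrast with MAS drawn in this section.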
Over the last decade, simulation and empirical cross-validation studies in plants have shown that GS is more effective than traditional MAS strategies that use only a subset of markers with significant effects[5-7,39].

Methods for estimating allelic effects include least-squares regression[40], ridge regression BLUP (RR-BLUP), principal component analysis[41-42], and Bayesian regression[43]. In the least-squares approach, chromosome segments or markers associated with the traits are first selected by genome-wide association studies (GWAS), and the effects of these segments are then estimated[44]. RR-BLUP treats segment effects as random effects; marker effects are estimated with a linear mixed model, and an individual's breeding value is the sum of its segment effects[43]. Bayesian methods combine a prior distribution of marker-effect variances with the observed data. Frequently used Bayesian methods include BayesA and BayesB. The main difference between them is that BayesA allows each marker its own variance, whereas BayesB additionally allows the variance of some markers to be zero[45].

Simulation studies show that Bayesian methods give the highest prediction accuracy and least squares the lowest, with RR-BLUP slightly less accurate than BayesA. Even so, RR-BLUP has four advantages over Bayesian methods. First, Bayesian methods are computationally intensive and may require high-performance computing, whereas RR-BLUP has lower computing requirements and runs faster; for example, marker effects can be estimated by RR-BLUP in SAS PROC IML[46]. Second, prediction within families was more accurate with BLUP than with BayesB, and the regression coefficient b of RR-BLUP is closer to 1 than that of BayesA[47]. Habier et al. showed that RR-BLUP is more effective at capturing genetic relationships because it fits more markers into the prediction model[27], whereas BayesB is more effective at capturing LD between markers and QTL.
Third, RR-BLUP is more accurate than other methods when the number of QTL increases or heritability is higher[18]. Fourth, BLUP led to lower inbreeding and a smaller reduction in genetic variance than Bayesian methods and PLS[48]. From the above, we conclude that BLUP methods are better suited than Bayesian regression to plant breeding models.

In addition, machine-learning methods, including support vector machines (SVM), boosting, and random forests (RF), can be used to predict marker effects. Ogutu et al. compared these methods for genomic selection: the correlation between predicted and true breeding values was 0.547 for boosting, 0.497 for SVMs, and 0.483 for RF, indicating better performance for boosting than for SVMs and RF[49].

2.3 Other factors affecting prediction accuracy
In genome-wide selection, prediction accuracy is affected by population size (N), trait heritability (h2), and number of markers (NM)[50]. Simulation studies showed that population structure is also crucial for prediction accuracy in genomic selection[27].

Prediction accuracy increases with marker density, because the number of markers on a genome of given length determines the total genetic information the markers carry. If SSR marker density increases from 0.25 Ne per morgan (Ne, effective population size) to 2 Ne per morgan, prediction accuracy improves from 0.63 to 0.83; if SNP marker density increases from 1 Ne per morgan to 8 Ne per morgan, accuracy improves from 0.69 to 0.86. Even at the highest tested densities of 2 Ne SSR markers per morgan or 8 Ne SNP markers per morgan, accuracy had not reached a plateau[5]. Moreover, the more markers there are, the easier it is to find markers in linkage disequilibrium (LD) with QTL. Emily found that in biparental populations there was no consistent gain in genome-wide prediction accuracy (rmp) from increasing marker density above one marker per 12.5 cM[22]. Zhao et al.
revealed that accuracy nearly reached a plateau at 800 SNPs when the number of markers varied from 100 to 800[23]. Once prediction accuracy reaches a plateau, the genome is sufficiently saturated with markers[28,50]. The number of markers needed for accurate prediction of genotypic values depends on the extent of LD between markers and QTL[4] and on the germplasm under consideration[18].

Different marker types differ in polymorphism information content (PIC). A comparison of SSR and SNP markers found that, for similar accuracies, SNP markers required two to three times the density of SSRs[5].

Simulation studies showed that population size is crucial for prediction accuracy in genomic selection[27]. The results of Emily et al. indicated that prediction accuracy rmp increased with population size N. In a biparental maize population with the highest number of markers (NM = 1 213) and heritability h2 = 0.30, the prediction accuracy for grain yield was rmp = 0.19 with N = 48, rmp = 0.26 with N = 96, and rmp = 0.33 with N = 192[22]. Zhao Yusheng observed a monotonic increase in prediction accuracy for grain yield with increasing population size, without any substantial decrease in the slope[23]. Bernardo's study also indicated that a larger population gives higher prediction accuracy[14], although an F2 population size of NC0 = 144 was generally sufficient[21].

Training population structure is also an important factor affecting the prediction accuracy of genomic selection in multi-parental populations. Methods for assembling the training population include random sampling, unidirectional sampling (selecting the individuals with the highest genotypic values), and bidirectional sampling (selecting the individuals with the highest or lowest genotypic values)[50-51]. Bidirectional selection was shown to be much more powerful than random sampling[52].
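The three sampling schemes just defined can be written down directly. A minimal sketch, assuming the candidates are ranked on a single vector of values (phenotypic or estimated genotypic values; which one is used is an assumption here) and that bidirectional sampling splits the quota as evenly as possible between the two tails. The function name is hypothetical, and the sketch does not reproduce the accuracy comparisons reported in the cited studies.

```python
import numpy as np


def choose_training_set(values, n, scheme, rng=None):
    """Pick n individuals from a candidate pool for the training population.

    values -- per-individual values used for ranking (assumed available)
    scheme -- "random", "unidirectional" (the n highest) or
              "bidirectional" (the highest and lowest, split as evenly as possible)
    Returns the sorted indices of the chosen individuals.
    """
    values = np.asarray(values, dtype=float)
    if not 1 <= n <= values.size:
        raise ValueError("n must be between 1 and the pool size")
    order = np.argsort(values)  # ascending
    if scheme == "random":
        rng = np.random.default_rng() if rng is None else rng
        idx = rng.choice(values.size, size=n, replace=False)
    elif scheme == "unidirectional":
        idx = order[values.size - n:]
    elif scheme == "bidirectional":
        n_top = (n + 1) // 2
        n_bottom = n - n_top
        idx = np.concatenate([order[:n_bottom], order[values.size - n_top:]])
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")
    return np.sort(idx)


if __name__ == "__main__":
    pool = np.array([3.1, 7.4, 5.0, 9.2, 1.8, 6.6, 4.3, 8.0])
    for scheme in ("random", "unidirectional", "bidirectional"):
        print(scheme, choose_training_set(pool, 4, scheme, np.random.default_rng(1)))
```

For the demo pool, the unidirectional scheme keeps indices [1 3 5 7] (the four highest values) and the bidirectional scheme keeps [0 3 4 7] (the two lowest and the two highest).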
Yusheng Zhao observed a substantial loss in the accuracy of predicting genomic breeding values in unidirectionally selected populations and concluded that bidirectional selection is a valuable approach for implementing genomic selection efficiently in applied plant breeding programs[53].

For the same trait within the same population, prediction accuracy (rmp) remains unchanged across combinations of population size (N) and trait heritability (h2) that have the same product N × h2: a decrease in h2 can be compensated by a proportional increase in N (and vice versa) so that rmp is maintained. Thus, traits with initially low h2 can be evaluated with a larger N, or the h2 of a subset of traits can be increased by using additional testing resources. Different traits, however, vary in prediction accuracy even when N, h2, and NM (number of markers) are constant. Yield traits had lower prediction accuracy than other traits despite constant N, h2, and NM, and simulation results indicated that rmp was also lowest for yield even when its h2 was as high as that of other traits. Plant height and lodging were consistently predicted most accurately, followed by flowering time[22]. Empirical evidence and experience on the predictability of different traits are therefore necessary when designing training populations.

3.1 Origin of GS in maize
The key technology underlying GS in maize is hybrid performance prediction with a BLUP model based on marker effects or coefficients of parentage, which was first used to predict single-cross performance in maize hybrid breeding. The BLUP model is built from data on tested hybrids and the marker information of their parents, and the performance of untested hybrids is then predicted from the model and the parents' marker data[54]. Bernardo devoted himself to hybrid prediction by BLUP in maize[55-58]. With RFLP markers, the correlation between predicted and observed performance was 0.688-0.800[54]. BLUP is suitable for hybrid performance prediction because the traits involved have only moderate heritability.
Prediction accuracy based on molecular marker effects is higher than that based on pedigree relationships[58]. With the development of molecular markers, new marker types emerged, and simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) became widely used. Manje Gowda et al. found that the prediction accuracy for flowering time and plant height was above 0.8 with SSR markers in triticale hybrids[19]. Research by Massman et al. indicated that prediction accuracy with SSR markers was 0.8 for grain yield and 0.87 for root lodging[59]. With coefficients of parentage, however, prediction accuracy was only 0.50-0.66 for grain yield and 0.31-0.45 for root lodging[55]. This indicates that molecular markers are more suitable than coefficients of parentage for hybrid performance prediction.

Researchers then found that BLUP could be used not only for hybrid performance prediction but also to estimate the breeding values of individuals within maize populations, so BLUP was applied to select individuals in F2 populations for inbred line development. Hybrid performance prediction thus laid the foundation for genome-wide selection in maize.

3.2 Application of genomic selection in maize
Bernardo's laboratory at the University of Minnesota began studying the application of GS to maize breeding[21] and carried out many simulation and empirical experiments. Piepho in Germany and Robert in Brazil have also studied GS in maize breeding[60-61]. GS in maize breeding has two applications: hybrid performance prediction and inbred line improvement. Bernardo focused on inbred line improvement using GS. A BLUP model trained on a biparental population from two inbred lines applies only to the progeny of those parents. Genomewide selection as proposed in maize involves two steps[21]. First, a segregating maize population is genotyped and evaluated for the testcross performance of F3 families.
Based on the genotypic and phenotypic data, breeding values associated with a large set of markers (e.g., 256 to 512 markers) are calculated for the traits of interest. Significance tests for markers are not used, and the effects of all markers are fitted as random effects in a linear model by best linear unbiased prediction (BLUP). Second, two or three generations of selection based on all markers are conducted in a year-round nursery (e.g., Hawaii or Puerto Rico) or greenhouse. Trait values are predicted as the sum of an individual plant's marker values across all markers, and selection is then based on these genomewide predictions. Following these steps, Emily (2013b) introgressed semidwarf germplasm into a U.S. Corn Belt inbred and found that genomewide selection from Cycle 1 to Cycle 5 either maintained or improved on the gains from phenotypic selection achieved in Cycle 1[62].

Bernardo's results indicated that a useful strategy for rapidly improving an adapted × exotic cross involves 7 to 8 cycles of genomewide selection starting in the F2[14]. Benjamin et al. demonstrated that progressive selfing had a significant, positive impact on genomic selection gains; in particular, selfing to the F8 produced a 72% increase over F2 gains[63]. However, most of the gains were realized by the F5 generation (95% of the F8 gains), and the F8 and DH performed similarly, consistent with previous observations[64].

In Bernardo's research, the training population is a specific biparental population from two parental lines, so the BLUP model applies to the progeny of those two inbred lines. Other GS experiments in maize use multi-parental populations as the training population. The study of Yusheng Zhao was based on experimental data from six segregating populations from a half-diallel mating design.
Because up to three generations per year are feasible in maize, selection gain per unit time is high, and genomic selection consequently holds great promise for maize breeding programs[23]. The results of that study could serve as a genomic prediction model for further breeding of elite maize lines among the six populations. In the study of Vanessa et al., marker effects estimated in 255 diverse maize hybrids were used to predict grain yield, anthesis date, and anthesis-silking interval within the diversity panel and in testcross progenies of 30 F2-derived lines from each of five populations[36]. They discussed potential uses of genomic prediction in maize hybrid breeding, emphasizing the need for (1) a clear definition of the breeding scenario in which genomic prediction should be applied (i.e., prediction among or within populations), (2) a detailed analysis of population structure before performing cross-validation, and (3) larger training sets with strong genetic relationships to the validation set.

GS is only beginning to be implemented, and it will take a long time before it is routinely used in maize breeding. In previous studies, the training population was derived from only a few inbred lines, sometimes only two, so the resulting models could not be used by other breeding programs. Future research should focus on two areas. First, we should work to build generalized prediction models across groups of inbred lines for traits such as yield and quality. These traits are complex and controlled by many genes, and traditional MAS cannot achieve effective selection for them in maize breeding. The 973 Program project “Basic study on breeding of genome-wide selection of yield and quality traits in maize” was launched in 2014. It will systematically analyze the genetic basis of maize yield and quality, build genome-wide selection breeding models, and provide new technology for maize breeding.
Second, abiotic stress, especially drought, seriously reduces maize yield in China. Drought is the foremost factor restricting maize production, often causing 20%-50% yield losses every year in China[65]. Establishing a prediction model for drought tolerance would provide theoretical and technical support for maize breeding. Our research team will therefore carry out research on genomic selection for drought tolerance.

References:
[1] STUBER C W, POLACCO M, SENIOR M L. Synergy of empirical breeding, marker-assisted selection, and genomics to increase crop yield potential[J]. Crop Science, 1999, 39: 1571-1583.
[2] MOOSE S P, MUMM R H. Molecular plant breeding as the foundation for 21st century crop improvement[J]. Plant Physiology, 2008, 147: 969-977.
[3] BERNARDO R. Molecular markers and selection for complex traits in plants: learning from the last 20 years[J]. Crop Science, 2008, 48: 1649-1664.
[4] MEUWISSEN T H, HAYES B J, GODDARD M E. Prediction of total genetic value using genome-wide dense marker maps[J]. Genetics, 2001, 157: 1819-1829.
[5] JANNINK J L, LORENZ A J, IWATA H. Genomic selection in plant breeding: from theory to practice[J]. Briefings in Functional Genomics, 2010, 9(2): 166-177.
[6] HEFFNER E L, JANNINK J L, IWATA H, et al. Genomic selection accuracy for grain quality traits in biparental wheat populations[J]. Crop Science, 2011, 51: 2597-2606.
[7] HEFFNER E L, SORRELLS M E, JANNINK J L. Genomic selection for crop improvement[J]. Crop Science, 2009, 49: 1-12.
[8] MAYOR P J, BERNARDO R. Genomewide selection and marker-assisted recurrent selection in doubled haploid versus F2 populations[J]. Crop Science, 2009, 49: 1719-1725.
[9] MASSMAN J M, JUNG H J G, BERNARDO R. Genomewide selection versus marker-assisted recurrent selection to improve grain yield and stover-quality traits for cellulosic ethanol in maize[J]. Crop Science, 2012, 53(1): 58-66.
[10] SCHAEFFER L R. Strategy for applying genome-wide selection in dairy cattle[J]. Journal of Animal Breeding and Genetics, 2006, 123: 218-223.
[11] GODDARD M E, HAYES B J. Genomic selection[J]. Journal of Animal Breeding and Genetics, 2007, 124: 323-330.
[12] DAETWYLER H D, VILLANUEVA B, BIJMA P. Inbreeding in genome-wide selection[J]. Journal of Animal Breeding and Genetics, 2007, 124: 369-376.
[13] TU L, WOOLLIAMS J A, SIGBJORN L. The accuracy of genomic selection in Norwegian Red cattle assessed by cross validation[J]. Genetics, 2009, 183: 1119-1126.
[14] BERNARDO R. Genomewide selection for rapid introgression of exotic germplasm in maize[J]. Crop Science, 2009, 49: 419-425.
[15] HANS D D, BANSAL U K, BARIANA H S, et al. Genomic prediction for rust resistance in diverse wheat landraces[J]. Theoretical and Applied Genetics, 2014, 127: 1795-1803.
[16] MARIE D, BOUVET J M. Genomic selection in tree breeding: testing accuracy of prediction models including dominance effect[J]. BMC Proceedings, 2011, 5(Suppl 7): 1-2.
[17] WÜRSCHUM T, REIF J C, KRAFT T, et al. Genomic selection in sugar beet breeding populations[J]. BMC Genetics, 2013, 14: 85-92.
[18] ZHONG S Q, DEKKERS J C M, FERNANDO R L, et al. Factors affecting accuracy from genomic selection in populations derived from multiple inbred lines: a barley case study[J]. Genetics, 2009, 182(1): 355-364.
[19] GOWDA M, ZHAO Y S, MAURER H P, et al. Best linear unbiased prediction of triticale hybrid performance[J]. Euphytica, 2013, 191: 223-230.
[20] WU Y S, SHAO J M, ZHOU R Y, et al. Reviews of genome-wide selection for quantitative traits in plants[J]. Southwest China Journal of Agricultural Sciences, 2012, 25(4): 1510-1514. (in Chinese)
[21] BERNARDO R, YU J. Prospects for genome-wide selection for quantitative traits in maize[J]. Crop Science, 2007, 47: 1082-1090.
[22] EMILY C, BERNARDO R. Accuracy of genomewide selection for different traits with constant population size, heritability, and number of markers[J]. Plant Genome, 2013a, 6(1): 1-7.
[23] ZHAO Y S, GOWDA M, LIU W X, et al. Accuracy of genomic selection in European maize elite breeding populations[J]. Theoretical and Applied Genetics, 2012a, 124: 769-776.
[24] HESLOT N, YANG H P, SORRELLS M E, et al. Genomic selection in plant breeding: a comparison of models[J]. Crop Science, 2012, 52: 146-160.
[25] CHEN X, SULLIVAN P F. Single nucleotide polymorphism genotyping: biochemistry, protocol, cost and throughput[J]. The Pharmacogenomics Journal, 2003, 3: 77-96.
[26] POLAND J, RIFE T W. Genotyping-by-sequencing for plant breeding and genetics[J]. The Plant Genome, 2012, 5: 92-102.
[27] HABIER D, FERNANDO R L, DEKKERS J C M. The impact of genetic relationship information on genome-assisted breeding values[J]. Genetics, 2007, 177: 2389-2397.
[28] DAETWYLER H D, VILLANUEVA B, WOOLLIAMS J A. Accuracy of predicting the genetic risk of disease using a genome-wide approach[J]. PLoS One, 2008, 3: e3395.
[29] ALBRECHT T, WIMMER V, AUINGER H J, et al. Genome-based prediction of testcross values in maize[J]. Theoretical and Applied Genetics, 2011, 123: 339-350.
[30] CLARK S, HICKEY J, WERF J. Different models of genetic variation and their effect on genomic evaluation[J]. Genetics Selection Evolution, 2011, 43: 18.
[31] PSZCZOLA M, STRABEL T, MULDER H A, et al. Reliability of direct genomic values for animals with different relationships within and to the reference population[J]. Journal of Dairy Science, 2012, 95: 389-400.
[32] SAATCHI M, MCCLURE M C, MCKAY S D, et al. Accuracies of genomic breeding values in American Angus beef cattle using k-means clustering for cross-validation[J]. Genetics Selection Evolution, 2011, 43: 40.
[33] WINDHAUSEN V S, ATLIN G N, CROSSA J, et al. Effectiveness of genomic prediction of maize hybrid performance in different breeding populations and environments[J]. G3: Genes|Genomes|Genetics, 2012, 2: 1427-1436.
[34] GUO Z, TUCKER D M, BASTEN C J, et al. The impact of population structure on genomic prediction in stratified populations[J]. Theoretical
Resequencing: Genomic Selection (GS)
Genomic selection
Introduction to genomic selection
Meuwissen et al.[1] first proposed the theory of genomic selection (GS) in 2001: individuals with both phenotypes and genotypes are used to predict the genomic estimated breeding values (GEBVs) of animals and plants that have genotypes but no phenotypic records.
References
1. Meuwissen T H, Hayes B J, Goddard M E. Prediction of total genetic value using genome-wide dense marker maps[J]. Genetics, 2001, 157(4): 1819-1829.
2. Haberland A M, Pimentel E C G, Ytournel F, et al. Interplay between heritability, genetic correlation and economic weighting in a selection index with and without genomic information[J]. Journal of Animal Breeding and Genetics, 2013, 130(6): 456-467.
3. Wu X, Lund M S, Sun D, et al. Impact of relationships between test and training animals and among training animals on reliability of genomic prediction[J]. Journal of Animal Breeding and Genetics, 2015, 132(5): 366-375.
4. Goddard M E, Hayes B J. Genomic selection[J]. Journal of Animal Breeding and Genetics, 2007, 124: 323-330.
5. Heffner E L, Sorrells M E, Jannink J L. Genomic selection for crop improvement[J]. Crop Science, 2009, 49(1): 1-12.
Genomic selection 4-11-13
What is the best GEBV model for a particular trait and population? (There are various methods for estimating GEBV.)
Massman et al., 2013
Response to selection by GWS and MARS in maize
Massman et al., 2013
Field evaluation showed that gains for the Stover Index and Yield + Stover Index were 14 to 50% larger (P = 0.05) with GWS than with MARS. “Our results indicate that using all available markers for predicting genotypic value leads to greater gain than using a subset of markers with significant effects.”
Requires dense marker coverage
Includes small-effect genes that are not captured by traditional MAS
Barley OWB SNP map Jesse Poland 2012
Hayes et al., 2010
Challenges (continued)
How often is model retraining needed before LD is decreased by recombination such that accuracy is reduced? Will rare and potentially beneficial alleles be lost, since they are unlikely to have detectable effects? How fast will genetic diversity be reduced by this combination of phenotypic selection and GS? What is the best way to incorporate new genetic diversity into GS populations?
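The first question can be framed with the classical population-genetics result that, under random mating, linkage disequilibrium between a marker and a QTL decays as D_t = D_0(1 - c)^t, where c is the recombination fraction and t the number of generations. A minimal sketch, assuming an infinitely large population with no drift, selection, or mutation (real accuracy decay also depends on those forces); the function names are illustrative.

```python
def ld_after_generations(d0: float, c: float, t: int) -> float:
    """Expected LD coefficient after t generations of random mating: D_0 * (1 - c)**t."""
    if not 0.0 <= c <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    if t < 0:
        raise ValueError("t must be non-negative")
    return d0 * (1.0 - c) ** t


def generations_until(c: float, fraction: float) -> int:
    """Smallest number of generations t with D_t / D_0 <= fraction."""
    if not 0.0 < c <= 0.5:
        raise ValueError("c must lie in (0, 0.5]; with c = 0 LD never decays")
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    t, ratio = 0, 1.0
    while ratio > fraction:
        ratio *= 1.0 - c   # one generation of recombination
        t += 1
    return t


if __name__ == "__main__":
    for c in (0.01, 0.05, 0.1):
        print(f"c = {c}: LD halves after {generations_until(c, 0.5)} generations")
```

For a marker about 1 cM from a QTL (c close to 0.01), LD halves only after about 69 generations, whereas at c = 0.1 it halves in 7, so loosely linked markers lose predictive value much sooner.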
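Methods and applications of genomic mating in animals
Hereditas (Beijing), June 2019, 41(6): 486-493
Received: 2019-02-28; Revised: 2019-05-10
Funding: Supported by Key Project of Scientific Research Plan of Hunan Province (No. 2018NK2081), Key Project of Scientific Research Plan of Changsha City (No. kq1801014), Hundred-Talent Project of Hunan Province and Hunan Innovation Center of Animal Safety Production
About the author: He Jun, Ph.D., associate professor; research field: animal genetics and breeding. E-mail: hejun@
Corresponding author: Wu Xiaolin, professor, doctoral supervisor; research field: animal genetics and breeding. E-mail: nwu@
DOI: 10.16288/j.yczz.19-053
Published online: 2019/5/30 13:11:44
URI: /kcms/detail/11.1913.R.20190530.1311.001.html
Review
He Jun1, Fernando B. Lopes2, Wu Xiaolin1,2,3
1. College of Animal Science and Technology, Hunan Agricultural University, Changsha 410128, China; 2. Department of Animal Sciences, University of Wisconsin, Madison, WI 53706, USA; 3. Department of Bioinformatics and Biostatistics, Neogen Corporation, Lincoln, NE 68504, USA
Abstract: Genomic selection (GS) uses molecular markers covering the whole genome to predict the estimated breeding values of individual animals. It can improve selection accuracy and selection intensity, shorten the generation interval, and enable early and accurate selection, and it has revolutionized animal breeding.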
On the effects of semen characteristics and sperm morphology on semen quality in Duroc boars
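Swine Industry Science, 2016, Vol. 33, No. 6. Genetic Improvement
Jiang Tengfei, Lou Ping'er, Zeng Xinbin, Fan Quanbiao (Fujian Yichun Agricultural Development Co., Ltd., Nanping, Fujian 353000)
Artificial insemination of pigs in China began in the 1950s.
With the widespread adoption of artificial insemination, the requirements placed on boar semen have become increasingly demanding.
In practice, the number of doses that can be prepared from a single collection is determined by the boar's ejaculate volume, its semen quality (e.g., motility and concentration), and the number of sperm required per dose.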
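The dose calculation described in this sentence is simple arithmetic. A minimal sketch, assuming the per-dose requirement is stated in millions of motile sperm and that all inputs are whole numbers (integer arithmetic avoids rounding errors at the final floor step); the function name and the example figures are hypothetical, not taken from the article.

```python
def doses_per_collection(volume_ml: int, conc_million_per_ml: int,
                         motility_pct: int, sperm_per_dose_million: int) -> int:
    """Number of whole AI doses obtainable from one ejaculate.

    Assumes the per-dose requirement is expressed as motile sperm.
    """
    if min(volume_ml, conc_million_per_ml, sperm_per_dose_million) <= 0:
        raise ValueError("volume, concentration and dose size must be positive")
    if not 0 <= motility_pct <= 100:
        raise ValueError("motility must be a percentage")
    motile_million = volume_ml * conc_million_per_ml * motility_pct // 100
    return motile_million // sperm_per_dose_million


if __name__ == "__main__":
    # Hypothetical ejaculates: larger volume vs. higher concentration
    print(doses_per_collection(250, 300, 80, 3000))
    print(doses_per_collection(150, 450, 80, 3000))
```

In the made-up example, a 250 mL ejaculate at 300 × 10^6 sperm/mL yields 20 doses, whereas a 150 mL ejaculate at a higher 450 × 10^6 sperm/mL yields only 18, showing how volume and concentration trade off.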
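A boar that produces a large, high-quality ejaculate in a single collection brings greater returns to the farm.
As terminal sires, Duroc boars are used far more heavily than boars of other breeds, yet their semen quality is often poorer.
This forces AI centers and farms to keep more Duroc boars to meet production needs, which raises production costs.
How to improve the semen quality of Duroc boars is therefore a pressing problem.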
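Boar semen quality is affected by many factors. This article focuses on the effects of semen characteristics and sperm morphology on the semen quality of Duroc boars, providing a theoretical basis for improving the semen yield and quality of Duroc boars.
Abstract: In production practice, the semen quality of Duroc boars declines faster than that of other breeds after stress such as long-distance transport.
This article aims to identify the underlying reason why Duroc semen quality is poorer than that of other breeds by comparing the semen characteristics and sperm morphology of different breeds.
Drawing on previous research results, it also discusses the relationship between sperm characteristics and sperm morphology, and between sperm morphology and sperm motility.
The results show that, compared with Pietrain, Landrace, Large White, and other breeds, Duroc boars have a smaller ejaculate volume, higher semen concentration, larger sperm heads, and shorter sperm tails.
It is hoped that this will encourage more colleagues to carry out dedicated research on this topic.
Keywords: Duroc boar; breed; semen characteristics; sperm morphology; semen quality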
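1 Relationship between semen characteristics and sperm morphology
The shape and size of sperm are related to semen characteristics (such as semen concentration and sperm number), and semen characteristics differ among breeds.
The relationship between semen characteristics and sperm morphology is described below from two perspectives: within breeds and between breeds.
1.1 Within breeds
Kondracki et al.[1] studied the semen characteristics and sperm morphology of Duroc boars in depth, using eight young Duroc boars aged 7-9 months
Table 1 Relationship between semen characteristics and sperm morphology in Duroc boars at different raw semen concentrations
Note: Values in the same row with different lowercase superscripts differ significantly (P < 0.05); values with different uppercase superscripts differ highly significantly (P < 0.01). The same applies below.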
Illumina chip copy number variation analysis pipeline
The Illumina chip copy number variation (CNV) analysis process involves a series of steps designed to detect and interpret variations in the number of copies of specific genomic regions. Starting with the raw sequencing data obtained from the Illumina platform, the first step involves quality control and preprocessing to ensure the accuracy and reliability of the data. This includes steps such as base calling, adapter trimming, and filtering out low-quality reads. Next, the preprocessed data undergo alignment: the sequencing reads are mapped to the reference genome to determine their positions in the genome.
After alignment, an alignment file is generated that records the position of each read on the reference genome.
Subsequently, the aligned data are analyzed to detect CNVs. This involves applying various algorithms and statistical methods to identify regions of the genome that show significant deviations in copy number from the expected or reference state. These methods range from simple read depth-based approaches to more complex machine learning techniques. Once CNV regions have been detected, they are further annotated and analyzed to understand the effects of these variants on genes and phenotypes.
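The read depth-based approach mentioned above can be sketched as follows: count reads in fixed genomic bins for the sample and for a reference, normalize each to its library size, and flag bins whose log2 ratio crosses a threshold. This is a simplified illustration under stated assumptions (pre-computed bin counts, illustrative thresholds); it omits the GC-content correction, mappability filtering, and segmentation steps used by real CNV callers, and the function name is hypothetical.

```python
import numpy as np


def call_cnv_bins(sample_counts, reference_counts, gain=0.3, loss=-0.3, pseudocount=0.5):
    """Flag genomic bins whose normalised read depth departs from a reference.

    sample_counts, reference_counts -- reads per bin (same bins, same order)
    gain, loss -- log2-ratio thresholds (illustrative; a single-copy gain or
                  loss in a pure diploid sample gives about +0.58 / -1)
    Returns the per-bin log2 ratios and a label for each bin.
    """
    s = np.asarray(sample_counts, dtype=float) + pseudocount     # avoid log(0)
    r = np.asarray(reference_counts, dtype=float) + pseudocount
    log2_ratio = np.log2((s / s.sum()) / (r / r.sum()))           # library-size normalisation
    labels = np.where(log2_ratio >= gain, "gain",
                      np.where(log2_ratio <= loss, "loss", "neutral"))
    return log2_ratio, labels


if __name__ == "__main__":
    ratios, labels = call_cnv_bins([100, 100, 150, 50, 100], [100] * 5)
    print(np.round(ratios, 2), labels)
```

In this toy input the third bin (150 vs. 100 reads) is labelled a gain and the fourth (50 vs. 100) a loss; the remaining bins are neutral.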
Sequencing analysis workflow
Sequencing analysis is an essential component of genetic research, providing valuable insights into the composition and function of an organism's genome.
By determining the precise order of nucleotides in a DNA molecule, researchers can identify genetic variations, mutations, and structural changes that impact an organism's traits and susceptibility to diseases.
This information is crucial for understanding the genetic basis of various phenotypes and developing personalized medicine tailored to an individual's unique genetic profile.
The sequencing analysis process typically involves several key steps, starting with sample collection and DNA extraction.
Test report on non-GMO products
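[Introduction] Although transgenic technology has found some applications in agriculture and food, the safety and potential risks of genetically modified (GM) foods have attracted widespread concern.
To meet consumer demand for non-GMO products, food safety regulators and testing agencies strictly test and monitor the foods on the market.
This article examines the testing methods, standards, and related issues for non-GMO products, and shares our understanding of and views on these products.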
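[Main text]
1. Significance and definition of non-GMO products
1.1 Importance of non-GMO products to consumers
In today's society, consumers pay increasing attention to food safety and quality.
As a safe and reliable option, non-GMO products can meet this need.
1.2 Definition of non-GMO products
Non-GMO products are foods or agricultural products that contain no GM ingredients.
Identifying and verifying a product's raw materials and production process makes it possible to ensure that it contains no GM ingredients.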
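2. Testing methods for non-GMO products
2.1 PCR-based detection of GM ingredients
PCR is currently one of the most widely used methods for GM detection.
GM ingredients are detected by amplifying target gene sequences.
2.2 Detection of GM ingredients by mass spectrometry and DNA sequencing
Mass spectrometry can detect proteins expressed from transgenes, and DNA sequencing can detect specific transgenic DNA sequences; both can be used to determine whether a food contains GM ingredients.
2.3 Traditional methods such as histology and phenotyping
Histological and phenotypic methods observe and analyze the morphology and tissue structure of plants or animals and can give a preliminary indication of whether GM ingredients are present in a food.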
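3. Testing standards for non-GMO products
3.1 Internationally recognized GM food labeling regulations
Many countries and regions have implemented GM food labeling regulations that set thresholds and labeling requirements.
3.2 Strict requirements for GM ingredient testing
For non-GMO products, standards require that GM test results comply with the relevant regulations and standards and that the results be accurate and reliable.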
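4. Challenges and measures in testing non-GMO products
4.1 Traceability and commingling of GM products
Traceability and commingling of GM products are among the main challenges in testing non-GMO products.
They can be addressed by establishing sound supply-chain management and traceability systems.
4.2 Development of detection methods and technologies
As GM and detection technologies continue to develop, methods for testing non-GMO products will become more complete and precise.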
Value of T2 mapping in differentiating rectal mucinous adenocarcinoma from non-mucinous adenocarcinoma
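Rectal mucinous adenocarcinoma is an adenocarcinoma in which mucin accounts for more than 50% of the tumor tissue. It is a distinct pathological subtype of rectal cancer, accounting for 5%-20% of all rectal cancers[1-2].
Compared with non-mucinous adenocarcinoma, mucinous adenocarcinoma has a poorer prognosis and responds poorly to neoadjuvant chemoradiotherapy[3-4].
The choice of an individualized treatment plan for rectal cancer is based on accurate imaging assessment.
Therefore, determining the pathological type of rectal cancer before surgery is crucial for choosing treatment.
High-resolution MRI offers high soft-tissue resolution and has been widely used in the preoperative assessment of rectal cancer[5-6], but radiologists' differentiation of mucinous from non-mucinous adenocarcinoma is somewhat subjective.
DWI can quantitatively characterize the complex physiological state of tissue through the diffusion properties of water molecules, which helps differentiate mucinous from non-mucinous adenocarcinoma[7-8].
However, DWI is susceptible to limited spatial resolution and magnetic field inhomogeneity, and reported cut-off values for tissue differentiation vary widely[9].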
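T2 mapping measures the T2 value of tissue from the signal intensities acquired at different echo times.
The T2 value reflects intrinsic tissue properties and, at a given field strength, is independent of the scanner and scan parameters.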
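Under the usual assumption of mono-exponential decay, S(TE) = S0·exp(-TE/T2), so ln S is linear in TE with slope -1/T2. A minimal per-voxel sketch of that fit using log-linear least squares; real T2 mapping pipelines may instead use nonlinear fitting, noise-floor handling, or exclusion of the first echo, and this is not the processing used in the study. The function name is illustrative.

```python
import numpy as np


def fit_t2(echo_times_ms, signals):
    """Estimate T2 (ms) and S0 from multi-echo signals, assuming S(TE) = S0 * exp(-TE / T2).

    Uses a log-linear least-squares fit: ln S = ln S0 - TE / T2.
    Signals must be positive.
    """
    te = np.asarray(echo_times_ms, dtype=float)
    s = np.asarray(signals, dtype=float)
    if te.size < 2 or te.shape != s.shape:
        raise ValueError("need at least two echoes with matching signal values")
    if np.any(s <= 0):
        raise ValueError("signals must be positive for the log transform")
    slope, intercept = np.polyfit(te, np.log(s), 1)
    if slope >= 0:
        raise ValueError("signal does not decay; T2 is undefined")
    return -1.0 / slope, float(np.exp(intercept))


if __name__ == "__main__":
    te = np.array([10.0, 20.0, 30.0, 40.0, 50.0])     # hypothetical echo times (ms)
    noisy = 1000.0 * np.exp(-te / 80.0) * (1 + 0.01 * np.random.default_rng(3).normal(size=te.size))
    print("T2 = %.1f ms, S0 = %.0f" % fit_t2(te, noisy))
```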
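At present, T2 mapping is mostly used in musculoskeletal (joint) and myocardial imaging[10-12], and in recent years it has also been used to evaluate benign and malignant tumors of the body[13-15], but few studies have used it to differentiate the pathological types of rectal cancer.
This study aimed to evaluate the value of T2 mapping for preoperatively differentiating rectal mucinous adenocarcinoma from non-mucinous adenocarcinoma, and to compare it with DWI.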
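1 Materials and methods
1.1 General data
Data were collected prospectively at Qilu Hospital of Shandong University from October 2022 to
DOI: 10.3969/j.issn.1672-0512.2024.03.019
[Funding] Natural Science Foundation of Shandong Province (ZR2019MH049).
[Corresponding author] Wang Fang, Email: ****************.
Wu Guangtai1, Jia Jinzheng2, Xu Xinghua1, Chang Lingyu3, Cheng Lianhua1, Wang Linhong1, Wang Qing1, Wang Fang1
1. Department of Radiology, Qilu Hospital of Shandong University, Jinan, Shandong 250012; 2. Department of Radiology, Pingdu Hospital of Traditional Chinese Medicine, Qingdao, Shandong 266700; 3. Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012
[Abstract] Objective: To explore the value of T2 mapping in the differential diagnosis of rectal mucinous and non-mucinous adenocarcinoma.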
genomic selection in animals
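Genomic selection is a method that uses genomic information to predict the genetic merit of individual animals.
It is based on the analysis of single nucleotide polymorphisms (SNPs) across the animal genome.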
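The steps of genomic selection are as follows:
1. Data collection: genomic and phenotypic data are first collected on individual animals.
Genomic data can be obtained with high-throughput technologies such as whole-genome sequencing or SNP chips.
Phenotypic data can include growth, reproduction, and health traits.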
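2. Genomic analysis: the genomic data are used for genotyping and allele frequency estimation.
Genotypes can be determined by aligning sequencing data to the reference genome sequence.
Allele frequencies are estimated from the frequencies of the genotypes observed in the population.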
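3. Genomic prediction: statistical models built on the genomic and phenotypic data predict the genetic merit of individual animals.
Commonly used models include linear models and Bayesian models.
The predicted genetic values can be used to evaluate the genetic potential of individuals and to choose the best individuals for breeding.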
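4. Selection decisions: selection decisions are made on the basis of the predicted genetic values.
They can target different goals, such as improving a specific trait or increasing overall genetic progress.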
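Steps 2 and 3 can be illustrated with a small sketch, assuming genotypes are coded 0/1/2 as counts of one allele. The allele frequency is then p = f(AA) + f(Aa)/2, i.e. half the mean code, and one common input to a linear prediction model (GBLUP) is the genomic relationship matrix of VanRaden (2008), G = ZZ'/(2Σp(1-p)) with Z = M - 2p. VanRaden's method is not named in the text above; it is used here as a well-known example, and the function names are hypothetical.

```python
import numpy as np


def allele_frequencies(genotypes):
    """Frequency p of the counted allele at each SNP.

    genotypes -- n x m matrix coded 0/1/2 (copies of the counted allele), so
    p = f(AA) + f(Aa) / 2 = mean(code) / 2.
    """
    return np.asarray(genotypes, dtype=float).mean(axis=0) / 2.0


def genomic_relationship_matrix(genotypes):
    """VanRaden (2008) first-method G = ZZ' / (2 * sum p(1 - p)), with Z = M - 2p."""
    M = np.asarray(genotypes, dtype=float)
    p = allele_frequencies(M)
    scale = 2.0 * np.sum(p * (1.0 - p))
    if scale == 0:
        raise ValueError("all SNPs are monomorphic")
    Z = M - 2.0 * p          # centre each SNP column on its expected value
    return Z @ Z.T / scale


if __name__ == "__main__":
    M = np.array([[0, 1, 2], [2, 1, 0], [1, 1, 1], [2, 2, 0]])  # toy genotypes
    print(allele_frequencies(M))
    print(np.round(genomic_relationship_matrix(M), 2))
```

G plays the role of the pedigree relationship matrix in a linear mixed model, so genotyped animals without phenotypes borrow information from their genomic relatives.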
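Genomic selection has many advantages in animal breeding:
1. Higher selection efficiency: compared with traditional selection methods, genomic selection predicts individual genetic values more accurately and therefore improves selection efficiency.
2. Faster genetic progress: genomic selection makes better use of genetic variation across the genome and thus accelerates genetic progress.
3. Selection on multiple traits: genomic selection can select for several traits at once, which supports multi-objective selection.
4. Selection on unphenotyped traits: genomic selection can use genomic data to predict traits that have not been phenotyped, improving the choice of individuals.
In summary, genomic selection is of great practical value in animal breeding: it improves selection efficiency, accelerates genetic progress, and allows selection on multiple traits and on traits that have not been phenotyped.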
Next-generation sequencing workflow
二代基因测序流程Genomic sequencing is a powerful tool that has revolutionized the fields of medicine, agriculture, and evolutionary biology. It allows scientists to decipher the complete genetic code of an organism, providing valuable insight into its health, traits, and evolutionary history. The process involves determining the order of nucleotides within an individual's DNA, which can help identify mutations, disease-causing variants, and unique genetic markers.基因组测序是一种强大的工具,彻底改变了医学、农业和进化生物学领域。
There are two main types of genomic sequencing technologies: first-generation sequencing (Sanger sequencing) and second-generation sequencing (next-generation sequencing, NGS). First-generation sequencing is a labor-intensive and time-consuming process. DNA fragments are cloned or amplified, and each reaction then reads a single template using fluorescently labeled, chain-terminating dideoxynucleotides. In contrast, second-generation sequencing is more efficient and cost-effective, allowing researchers to sequence millions of DNA fragments in parallel.
Precision of Segregation Analysis of Quantitative Traits and Ways to Improve It
Acta Agronomica Sinica, Vol. 27, No. 6, November 2001
ZHANG Yuan-Ming, GAI Jun-Yi, QI Cun-Kou (Soybean Research Institute, Nanjing Agricultural University; National Center for Soybean Improvement, Ministry of Agriculture; Nanjing, Jiangsu 210095, China)
English title: The Precision of Segregating Analysis of Quantitative Trait and Its Improving Methods
Abstract: To improve the precision of mixed genetic analysis of quantitative traits, three kinds of methods were examined. For quantitative traits with low heritability or large experimental error, the three methods were combined into a joint segregation analysis (JSA) using replicated family experiments (RFE) of P1, F1, P2, B1:2, B2:2 and F2:3. The basic theory treats a segregating population as a mixture of normal distributions of major-gene genotypes modified by polygenes and environment.

A genetic analysis of days to first flower in rapeseed (Brassica napus L.) showed three things. First, the Akaike information criterion (AIC) values of genetic models fitted to family means over replications were lower than those from a single replication. Second, adding the P1, F1 and P2 generations improved the precision of the first-order genetic parameter estimates. Third, the unbiased error variance obtained by analysis of variance of the RFE improved the precision of the second-order genetic parameter estimates.
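Key words: Mixed major gene plus polygene inheritance; Replicated family test; Quantitative trait

Building on the work of Elkind et al. (1986, 1990), Mo Huidong et al. (1993) and Jiang Changjian et al. (1995), Wang Jiankang and Gai Junyi proposed a segregation analysis of quantitative traits using individual generations. Other researchers applied it to several traits: wide compatibility and bacterial blight resistance in rice; and, in soybean, resistance to leaf-feeding insects, resistance to the common cutworm, resistance to soybean cyst nematode race 1, and the relative value of upper node number. They proposed corresponding breeding strategies and bred several disease-resistant cultivars.

In application, results were found to differ among segregating generations, which led to a joint analysis combining several segregating generations. Others used it to study wide compatibility, bacterial blight resistance and plant height in rice, and leaf blight resistance in wheat. They also studied dwarf mosaic in maize and, in soybean, resistance to leaf-feeding insects, common cutworm, cyst nematode and mosaic virus. Soybean quality and plant traits were also analyzed: soymilk yield and tofu yield per unit of dry beans, oil content, and the relative position of the node bearing the longest petiole. All of these traits fitted this inheritance model. Application also showed that the methods needed to be extended. Zhang Yuanming and Gai Junyi accordingly extended them with respect to segregating generations, genetic models, and algorithms for maximum likelihood estimation.

[Footnote] Supported by the National 973 Project (G1998010206) and the Applied Basic Research Program of the Chongqing Science and Technology Commission. Zhang Yuanming (born May 1965), male, from Yongchuan, Chongqing; doctoral candidate and associate professor working on quantitative genetics and biometrics; now at the Institute of Industrial Crops, Jiangsu Academy of Agricultural Sciences, Nanjing, Jiangsu 210014. Received 2000-09-29; accepted 2001-01-14.

Application also showed that segregation analysis has low precision for traits with low heritability and large experimental error. Existing methods for plant quantitative traits have been extended, but they are still designed mainly for field data in which the experimental unit is a single plant or a single plant-derived family. However, many important economic traits of crops have low heritability, including quality traits, yield and yield components, and single-plant field data carry large errors. It is therefore necessary to analyze what limits the precision of segregation analysis and to propose ways to improve it. This paper explores that question.

1 Ways to improve the precision of segregation analysis of quantitative traits
1.1 Using family means instead of single-plant observations
Under the basic assumptions of segregation analysis, single-plant observations in the homogeneous populations P1, F1 and P2 follow a single normal distribution $N(\mu, \sigma^2)$. If the mean of $n$ plants is analyzed instead, that mean follows $N(\mu, \sigma^2/n)$. The error variance of mean data is therefore $1/n$ of that of single-plant data.

The same holds for segregating populations. Single-plant observations in an F2 population are a mixture of $k_1$ normal distributions, one per major-gene genotype, modified by polygenes and environment: $x \sim \sum_{i=1}^{k_1} \alpha_i N(\mu_i, \sigma_i^2)$, where $\sigma_i^2 = \sigma^2 +$ the polygenic variance component. The family means $\bar{x}$ of the F2:3 families derived from that F2 ($n$ plants per family) behave analogously. They are a mixture of $k_2$ normal distributions, one per family major-gene genotype, modified by polygenes and environment: $\bar{x} \sim \sum_{i=1}^{k_2} \pi_i N(\mu_i, \sigma_i^2)$. Here $\sigma_i^2 = \sigma^2/n +$ the polygenic and major-gene variance components ($i = 1, \ldots, k_2$). The error variance component of $\bar{x}$ is thus $1/n$ of that of $x$. Segregation analysis of family means, which have smaller error variance, is therefore more precise.

1.2 Using joint multi-generation analysis instead of single-generation analysis
The literature indicates that joint analysis of several generations extracts the most information. It also avoids the contradictions that arise when several segregating generations are analyzed independently. Quantitative genetic theory shows that including the parents and F1 in the joint analysis helps pin down the magnitude of the first-order genetic parameter estimates, which improves their precision. The much larger sample size of a joint analysis is another important reason for its higher precision.

1.3 Using replicated family experiments
Replicated family experiments improve the precision of segregation analysis for two main reasons:
(1) Family trials can be replicated, and more replications reduce error and improve the precision of the genetic analysis.
(2) Analysis of variance of a replicated family experiment gives an unbiased estimate of the experimental error variance. This improves the estimation of the second-order genetic parameters. The effect is stronger when combined with a sets-within-replications randomized block design, in which families are grouped into sets within each replication, because that design reduces experimental error more effectively.

In recent years the methodology has advanced on several fronts:
- from single-plant data to family-mean data;
- from a single segregating generation to the joint analysis of several generations;
- from unreplicated to replicated experiments;
- from one major gene plus polygenes to two major genes plus polygenes;
- from the EM algorithm for parameter estimation to the iterated conditional EM (IECM) algorithm.

Each of these developments addressed only one aspect. This paper summarizes that earlier work. More importantly, it combines the three approaches above into one method for traits with low heritability and large error, in order to improve the precision of their analysis. To this end, it proposes a joint segregation analysis using replicated family experiments of P1, F1, P2, B1:2, B2:2 and F2:3.

2 Joint segregation analysis using P1, F1, P2, B1:2, B2:2 and F2:3 family experiments
2.1 Experimental design
Genetic experiments with families usually use a randomized complete block design. When there are many families, however, the blocks become too large to control error well. Incomplete block designs can then be considered, lattice designs being among the better ones. A lattice design, however, restricts the number of families to square numbers and complicates the adjustment for environmental effects, which limits its use. If the number of families is divisible by the block size and there are 2 to 4 replications, a simple generalized lattice design can be used.

Recently, many researchers have used a sets-within-replications randomized block design. This is also an incomplete block design, and it is particularly suitable when the tested families are a random sample. Each set of families is then assumed to be a representative sample of the population, so sets should not differ significantly. Any significant difference among sets is treated as environmental error and removed.

Suppose the families are randomly divided into $a$ sets of $b$ families each, and each family has $c$ replications. Following the sets-within-replications randomized block design, each replication then contains $ab$ plots. If the plot is the data unit, the model for the quantitative trait $x$ is
$$x_{ijk} = \mu + \alpha_i + \gamma_k + (\alpha\gamma)_{ik} + \beta_{j(i)} + \varepsilon_{ijk},$$
where:
- $\mu$ is the population mean;
- $\alpha_i \sim N(0, \sigma_\alpha^2)$ is the effect of set $i$;
- $\gamma_k$ is the replication effect;
- $(\alpha\gamma)_{ik} \sim N(0, \sigma_{\alpha\gamma}^2)$ is the set × replication interaction;
- $\beta_{j(i)} \sim N(0, \sigma_\beta^2)$ is the effect of family $j$ within set $i$;
- $\varepsilon_{ijk} \sim N(0, \sigma^2)$ is the experimental error.

If individual plants within plots are the data unit, the model is
$$x_{ijkl} = \mu + \alpha_i + \gamma_k + (\alpha\gamma)_{ik} + \beta_{j(i)} + \varepsilon_{ijk} + \delta_{ijkl},$$
where the shared terms have the same meaning as in the plot-level model. Here $\beta_{j(i)}$ is used to test differences among families within sets, and $\delta_{ijkl} \sim N(0, \sigma_\delta^2)$ is the plant-to-plant variation within plots. The analysis is nevertheless carried out on plot means. The error variance provided by the analysis of variance is used to estimate the second-order genetic parameters. If sets differ significantly, the set effect $\alpha_i$ is removed before the analyses below. If families do not differ significantly, the analysis stops here.

2.2 Basic assumptions and genetic models
The basic assumptions of mixed major-gene plus polygene analysis of quantitative traits follow the earlier references. Using the methods of those references, iteration formulas and expressions for the genetic parameters can be derived for 24 genetic models in five classes:
- one major gene (A);
- two major genes (B);
- polygenes (C);
- one major gene plus polygenes (D);
- two major genes plus polygenes (E).

The derivation resembles that in the references. However, the genetic make-up of the component distributions differs here, so the iteration formulas, the number of constraints and the constraint equations differ considerably from those for other populations and experimental designs. The focus of this paper remains on reducing experimental error to improve the precision of genetic analysis, through a mixed-inheritance analysis for replicated family experiments. To save space, only the 24 independent genetic models for the joint segregation analysis of the P1, F1, P2, B1:2, B2:2 and F2:3 generations are listed (Table 1).

Table 1 Genetic models of joint segregation analysis of quantitative traits using P1, F1, P2, B1:2, B2:2 and F2:3 populations

2.3 Joint multi-generation likelihood function and estimation of distribution and genetic parameters
The sample likelihood function for the joint analysis of the P1, F1, P2, B1:2, B2:2 and F2:3 family generations is
$$L(\theta) = \prod_{i} f(x_{1i}; \mu_1, \sigma^2)\prod_{i} f(x_{2i}; \mu_2, \sigma^2)\prod_{i} f(x_{3i}; \mu_3, \sigma^2)\prod_{i}\sum_{k=1}^{k_4} w_{4k}\, f(x_{4i}; \mu_{4k}, \sigma_4^2)\prod_{i}\sum_{k=1}^{k_5} w_{5k}\, f(x_{5i}; \mu_{5k}, \sigma_5^2)\prod_{i}\sum_{k=1}^{k_6} w_{6k}\, f(x_{6i}; \mu_{6k}, \sigma_6^2),$$
where $w_{4k}$, $w_{5k}$ and $w_{6k}$ are the component weights (mixing proportions). $k_4$, $k_5$ and $k_6$ are the numbers of component distributions in the B1:2, B2:2 and F2:3 segregating populations.

The distribution parameters are estimated with the IECM algorithm of reference [13]. Once the best genetic model has been chosen, the first-order genetic parameters follow from the relationship $\theta = AG$. Here $\theta$ is the vector of component means and $G$ is the vector of first-order genetic parameters. By least squares, $\hat{G} = (A'A)^{-1}A'\theta$.

The variance components are obtained as follows:
- The phenotypic variance $\sigma_p^2$ of each population is computed directly from the data.
- The error variance $\sigma^2$ comes from the analysis of variance of the sets-within-replications randomized block design.
- The component variances $\sigma_4^2$, $\sigma_5^2$ and $\sigma_6^2$ consist of polygenic variance components plus the environmental variance component $\sigma^2/n$.
- The major-gene variance of a population equals its phenotypic variance minus the variance of its homozygous major-gene genotype component distributions.
- The polygenic variance equals the variance of the homozygous major-gene genotype component distribution minus the error variance.

Major-gene and polygenic heritabilities are calculated as in references [6-12]. When only one major gene is involved, Bayesian posterior probabilities can also be used to assign families to major-gene genotypes.

3 Application example
Days to first flower in six family generations of the rapeseed cross HSTC14 (P1) × Ningyou 7 (P2) illustrate the analysis of replicated family data. In the field, the 20 parental entries were divided into 10 sets of 2, with 2 replications. The 120 B1:2, B2:2 and F2:3 families were divided into 10 sets of 12 families, with 2 replications, and data were recorded on individual plants within families. The trait was days to first flower, counted with March 1 as day 1, and each plot mean is the mean of 4 plants.

Table 2 gives the frequency distributions of family means, averaged over the 2 replications, for the parents and their progenies. The B1:2 and B2:2 populations are not unimodal, and the F2:3 population is unimodal but skewed. Days to first flower therefore appears to be controlled by major genes.

Table 2 Frequency distributions of days to first flower in six family populations of rapeseed HSTC14 (P1) × Ningyou 7 (P2)

One advantage of mixed-inheritance analysis of replicated family data is that the analysis of variance of the sets-within-replications randomized block design provides an unbiased estimate of the error variance and reduces it. The analysis of variance is shown in Table 3. The unbiased estimate of the error variance is 1.3348. Estimated separately, the error variance was 9.5759 from the parents and F1, and 1.3347, 1.9291 and 0.9406 from the B1:2, B2:2 and F2:3 populations, respectively. The joint estimate from the analysis of variance is therefore preferable.

Table 3 shows no significant difference among sets, so no adjustment for sets was made. The different entries of the two parents did, however, differ significantly. This may be because the error variance in the parental analysis of variance is too small; if the F test used the error variance from the joint analysis of variance, its significance would change.

Table 3 Analysis of variance

To show that major-gene plus polygene analysis of replicated family data is more precise, two schemes were compared:
(1) analysis of the family means over replications;
(2) separate analyses of replication 1 and replication 2.
These three data sets are denoted I, II and III. The IECM algorithm was used to estimate the distribution parameters of the sample likelihood function, the maximum log-likelihood and the AIC values. The AIC values for the three cases are given in Table 4.

Table 4 AIC values of the genetic models

Comparing the AIC values of the same model across I-III in Table 4 shows that, with very few exceptions, the family-mean data (I) give lower AIC values than a single replication (II or III). Under each of I-III the D-1 model has the lowest AIC. With replication 1 alone, however, the AIC values of D-1 and the model printed as "n4" are almost equal, which hampers model selection. Results from family means over replications are therefore better than those from a single replication, and the goodness-of-fit tests show the same pattern. Hence, with a sets-within-replications randomized block design, mixed-inheritance analysis of family means gives better results.

In addition, the unbiased error variance from this design's analysis of variance (1.3348) is lower than the maximum likelihood estimates of the error variance under model D-1 for I-III (1.9060, 2.6883 and 1.6849). This reduces the risk of underestimating the polygenic variance and polygenic heritability. The genetic parameters (Table 6) can be estimated from the distribution parameter estimates obtained with the replicate-mean data (Table 5).

The distribution parameters of mixed genetic models are usually estimated with the EM algorithm. This is impractical for family populations, and the IECM algorithm proposed in reference [13] solves the problem. Table 4 shows that the family-mean data give lower AIC values than single-replication data, so mean data are preferable. Adding the parents and F1 to the joint analysis effectively anchored the first-order genetic parameter estimates. It also indicates that the shift of the progeny mean toward the low-value parent (that is, earlier flowering) is due to a negative dominance effect of the major gene.

Replicated family experiments are an important means of reducing experimental error and improving precision. By analysis-of-variance theory, the analysis of variance of a replicated family experiment gives an unbiased estimate of the error variance. This estimate (1.3348) is smaller than the error variance from the mixed-inheritance analysis. It therefore lowers the probability of underestimating polygenic variance and heritability, improves the precision of the second-order genetic parameters, and avoids large contradictions among genetic models and their parameter estimates.

References
1. Elkind Y, Cahaner A. Theor Appl Genet, 1986, 72: 377-383
2. Elkind Y, Cahaner A. Heredity, 1990, 64: 205-213
3. Mo Huidong. Acta Agronomica Sinica, 1993, 19(1): 1-6
4. Mo Huidong. Acta Agronomica Sinica, 1993, 19(3): 193-200
5. Jiang Changjian, Mo Huidong. Acta Agronomica Sinica, 1995, 21(6): 641-648
6. Gai Junyi, Zhang Yuanming, Wang Jiankang. Genetic System of Quantitative Traits in Plants. Beijing: Science Press, 2001
7. Wang Jiankang, Gai Junyi. Acta Genetica Sinica, 1997, 24(5): 432-440
8. Gai Junyi, Wang Jiankang. Acta Agronomica Sinica, 1998, 24(4): 402-409
9. Gai J Y, Wang J K. Theor Appl Genet, 1998, 97(7): 1162-1168
10. Wang Jiankang, Gai Junyi. Acta Agronomica Sinica, 1998, 24(6): 651-659
11. Zhang Yuanming, Gai Junyi. Acta Genetica Sinica, 2000, 27(7): 634-640
12. Gai Junyi, Zhang Yuanming, Wang Jiankang. Acta Agronomica Sinica, 2000, 26(4): 385-391
13. Zhang Yuanming, Gai Junyi. Acta Agronomica Sinica, 2000, 26(6): 699-706
14. Wu Tianxia, Gai Junyi, Ma Yuhua. Acta Agronomica Sinica, 1995, 21(3): 300-30…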
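Section 2.3 above obtains first-order genetic parameters by least squares, $\hat{G} = (A'A)^{-1}A'\theta$, and Section 3 chooses among models by AIC. The sketch below shows both steps.

An important assumption: the coefficient matrix $A$ for the paper's B1:2, B2:2 and F2:3 family generations is given in Table 1, which is not reproduced here. As a stand-in, the sketch uses the classical generation-mean coefficients of the mean $m$, additive effect $d$ and dominance effect $h$ for a single locus without epistasis, over the P1, F1, P2, F2, B1 and B2 generations. AIC is the standard $-2\ln L + 2k$. All names are hypothetical.

```python
# Hypothetical coefficients of (m, d, h) in each generation mean for a single
# additive-dominance locus without epistasis (classical generation-mean model).
COEFFICIENTS = {
    "P1": (1.0, 1.0, 0.0),
    "F1": (1.0, 0.0, 1.0),
    "P2": (1.0, -1.0, 0.0),
    "F2": (1.0, 0.0, 0.5),
    "B1": (1.0, 0.5, 0.5),
    "B2": (1.0, -0.5, 0.5),
}


def least_squares(a, theta):
    """Return G = (A'A)^-1 A' theta by solving the normal equations (Gauss-Jordan)."""
    k = len(a[0])
    ata = [[sum(row[i] * row[j] for row in a) for j in range(k)] for i in range(k)]
    atb = [sum(row[i] * t for row, t in zip(a, theta)) for i in range(k)]
    aug = [ata[i] + [atb[i]] for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))  # partial pivoting
        aug[c], aug[pivot] = aug[pivot], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[i][k] for i in range(k)]


def aic(max_log_likelihood, n_free_parameters):
    """Akaike information criterion, AIC = -2 ln L + 2k; the smaller value is preferred."""
    return -2.0 * max_log_likelihood + 2.0 * n_free_parameters


def generation_means(m, d, h, generations):
    """Expected generation means theta = A G for G = (m, d, h)."""
    return [sum(c * g for c, g in zip(COEFFICIENTS[name], (m, d, h))) for name in generations]


generations = ["P1", "F1", "P2", "F2", "B1", "B2"]
A = [list(COEFFICIENTS[name]) for name in generations]
theta = generation_means(50.0, 5.0, -2.0, generations)  # synthetic, noise-free means
m_hat, d_hat, h_hat = least_squares(A, theta)
```

With more generation means than parameters, the system is overdetermined. Least squares then spreads any inconsistency across generations, which is why the paper's joint multi-generation analysis helps pin down the first-order estimates.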
Plant polyploid genome assembly workflow
Plant polyploidy refers to the condition in which plants carry more than two sets of chromosomes. This phenomenon is common in the plant kingdom and has been found in many plant species. Polyploidy can occur naturally or can be induced artificially through breeding techniques. Understanding the genomes of polyploid plants is crucial for plant breeding, genetic studies, and crop improvement. However, the assembly of polyploid genomes poses significant challenges because of the multiple homologous chromosomes and repetitive sequences they contain. In this article, we discuss the general workflow and the challenges involved in assembling polyploid plant genomes.

The first step in assembling a polyploid plant genome is the generation of high-quality sequencing data. Next-generation sequencing technologies, such as Illumina sequencing, have revolutionized genomics by providing massive amounts of short-read data at a relatively low cost. These short reads are generated from fragmented DNA and must be assembled into longer contiguous sequences, known as contigs. However, assembling polyploid genomes from short reads alone is difficult because repetitive sequences can cause misassemblies and collapsed repeats.

To overcome this challenge, researchers often combine short-read and long-read sequencing technologies. Long-read platforms, such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), generate much longer reads. These reads can span repetitive regions and provide valuable information for resolving complex genomic structures. They can also be used to scaffold the short-read assembly, improving the contiguity and accuracy of the final genome assembly.

Once the sequencing data are generated, the next step is to preprocess and clean the data to remove low-quality reads, adapter sequences, and other artifacts. This is typically done with specialized tools such as Trimmomatic or Cutadapt. After preprocessing, the data usually undergo error correction to improve the accuracy of the assembly. Read correctors such as QuorUM fix errors in the reads themselves, whereas polishers such as Pilon use aligned reads to correct errors in a draft assembly.

After preprocessing and error correction, the short-read and long-read data are typically assembled separately with different algorithms. Short-read assemblers, such as SPAdes and Velvet, use de Bruijn graph-based algorithms to assemble the short reads into contigs. Long-read assemblers rely on overlaps between reads: Canu follows the overlap-layout-consensus (OLC) paradigm, and Flye uses a repeat-graph approach. The contigs from the short-read and long-read assemblies are then combined with scaffolding algorithms to produce a more complete and accurate assembly.

One of the major challenges in polyploid genome assembly is the presence of homologous chromosomes and repetitive sequences, which can cause collapsed repeats and misassemblies. To address this, researchers often use strategies such as haplotype phasing to separate the homologous chromosomes and resolve complex genomic structures. Haplotype phasing assigns the short or long reads to their respective homologous chromosomes, allowing the individual chromosome sequences to be reconstructed.

Another challenge in polyploid genome assembly is allelic variation and heterozygosity. Polyploid plants can carry multiple copies of each gene, and these copies can differ in sequence. Resolving allelic variation and heterozygosity is important for accurately representing the gene content and genetic diversity of the polyploid genome. Methods such as read mapping and variant calling can be used to identify and resolve allelic variation during assembly.

In conclusion, assembling polyploid plant genomes is a complex task. It requires integrating multiple sequencing technologies, preprocessing steps, and assembly algorithms to produce a high-quality assembly. Homologous chromosomes, repetitive sequences, and allelic variation further complicate the process. However, advances in sequencing technologies and computational algorithms have greatly improved our ability to assemble and analyze polyploid plant genomes. These assemblies provide valuable resources for studying plant evolution, genetic diversity, and crop improvement.
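The de Bruijn graph approach used by short-read assemblers such as SPAdes and Velvet can be illustrated with a toy example. Each distinct k-mer becomes an edge between its (k−1)-mer prefix and suffix, and the contig is spelled out by walking an Eulerian path (Hierholzer's algorithm).

The sketch makes strong simplifying assumptions: error-free reads, a single strand (no reverse complements), no coverage information, and no repeated (k−1)-mers. Real assemblers must also handle sequencing errors, coverage and repeats. In polyploids they must additionally handle the "bubbles" created by homologous copies that differ at a few positions. The function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Distinct k-mers from the reads become edges (k-1)-mer prefix -> (k-1)-mer suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the output deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes the graph has an Eulerian path."""
    in_degree = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_degree[v] += 1
    # Start at the node with one more outgoing than incoming edge, if there is one.
    start = next((u for u in graph if len(graph[u]) - in_degree[u] == 1), next(iter(graph)))
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads, k):
    """Spell the contig by walking the Eulerian path through the (k-1)-mer nodes."""
    nodes = eulerian_path(de_bruijn_graph(reads, k))
    return nodes[0] + "".join(node[-1] for node in nodes[1:])


contig = assemble(["ACGTTG", "GTTGCA", "TGCAGG"], k=4)
```

Breaking reads into k-mers means overlaps never have to be computed explicitly, which is what makes this approach scale to billions of short reads. The price is that any repeat longer than k tangles the graph, and long reads are used to resolve exactly those tangles.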
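Use of Genomic Selection and Breeding Simulation in Parent Selection and Cross Prediction for Pure-Line Crops
The results showed that when the cultivar to be predicted had no observations in any environment, the multi-environment prediction model performed similarly to the single-environment model. When the cultivar had observations in other environments, the multi-environment model predicted much more accurately than the single-environment model. Multi-environment models can therefore exploit correlations between environments to improve the accuracy of trait prediction.
Current GS research focuses mostly on predicting the traits themselves. Cross prediction and parent selection in breeding, especially in pure-line breeding, have received little study. For simplicity, GS models usually ignore epistatic effects. Genotype-by-environment interaction is common in crop breeding, and the genetic correlations of a trait across environments may be used to predict its performance in a specific environment.
This study explored the use of GS and breeding simulation for parent selection and cross prediction in wheat (Triticum aestivum L.). It compared how well different GS models capture epistatic effects across populations and traits. It also compared single-environment prediction with joint multi-environment prediction under different GS models. The main research contents and results are as follows:
1. Simulation study of cross prediction based on genomic selection. Under different genetic architectures, simulation was used to compare how well different genomic selection models predict crosses. It also compared the genetic gain that two cross-prediction methods, cross usefulness and mid-parent value, deliver under different selection intensities.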
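The two cross-prediction criteria compared above can be written out briefly. The mid-parent value is the mean of the parents' GEBVs. The usefulness criterion in Schnell's form is U = μ + i·h·σ_G, where:

- μ is the predicted progeny mean;
- i is the selection intensity for the selected fraction;
- h is the selection accuracy;
- σ_G is the genetic standard deviation among progeny.

The sketch assumes an additive model (progeny mean equals the mid-parent GEBV) and takes σ_G as an input. In the study, such quantities would come from breeding simulation. The crosses and numbers are hypothetical, as are the function names. `statistics.NormalDist` requires Python 3.8 or later.

```python
from statistics import NormalDist


def selection_intensity(p):
    """Standardized selection intensity i for truncation selection of the best fraction p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(1.0 - p)  # truncation point on the standard normal scale
    return NormalDist().pdf(z) / p


def mid_parent(gebv_a, gebv_b):
    """Mid-parent value: the mean of the two parents' GEBVs."""
    return (gebv_a + gebv_b) / 2.0


def usefulness(progeny_mean, progeny_sd, p, accuracy=1.0):
    """Usefulness criterion U = mu + i * h * sigma_G."""
    return progeny_mean + selection_intensity(p) * accuracy * progeny_sd


# Hypothetical crosses: (GEBV of parent 1, GEBV of parent 2, predicted progeny SD).
crosses = {"A x B": (5.2, 4.8, 0.5), "C x D": (4.9, 4.3, 1.5)}
by_mid_parent = max(crosses, key=lambda c: mid_parent(*crosses[c][:2]))
by_usefulness = max(
    crosses,
    key=lambda c: usefulness(mid_parent(*crosses[c][:2]), crosses[c][2], p=0.1),
)
```

The two criteria can disagree. A cross with a slightly lower mid-parent value but much larger progeny variance may yield better selected lines, which is the situation the usefulness criterion is meant to capture.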
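In a few traits and environments, models that included epistatic effects performed on par with, or slightly worse than, additive-only models. Therefore, when GS is applied to breeding pure-line cultivars such as conventional rice and wheat, prediction models should account for epistatic effects wherever possible.
4. Prediction of multi-environment phenotypes in genomic selection. Multi-environment GS was studied using phenotypic trials of a rice RIL population grown at several locations. Cross-validation schemes representing two breeding scenarios were used to compare the prediction accuracy of different models.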
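Cross-validation of this kind can be sketched as plain k-fold validation over individuals. Every individual is predicted by a model trained without its fold, and predictive ability is the Pearson correlation between predicted and observed values.

The sketch does not reproduce the study's two breeding-scenario schemes. Those schemes would change how folds are built: for example, masking a line in all environments versus only in the target environment. The fit/predict pair shown is a hypothetical one-covariate stand-in; in practice a genomic model such as RR-BLUP or GBLUP would be plugged in.

```python
import random


def kfold_indices(n, k, seed=1):
    """Randomly partition indices 0..n-1 into k folds (fixed by the seed)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cross_validate(features, y, k, fit, predict, seed=1):
    """Predict each individual from a model trained without its fold; return (r, predictions)."""
    predictions = [None] * len(y)
    for fold in kfold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([features[i] for i in train], [y[i] for i in train])
        for i in fold:
            predictions[i] = predict(model, features[i])
    return pearson(predictions, y), predictions


# Hypothetical stand-in model: ridge regression on a single covariate.
def ridge_1d_fit(xs, ys, lam=0.0):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return mx, my, sxy / (sum((a - mx) ** 2 for a in xs) + lam)


def ridge_1d_predict(model, x):
    mx, my, slope = model
    return my + slope * (x - mx)


xs = list(range(20))
ys = [3.0 * x + 1.0 for x in xs]
r, preds = cross_validate(xs, ys, 5, ridge_1d_fit, ridge_1d_predict)
```

Fixing the random seed keeps the same partition for every model being compared, so differences in predictive ability reflect the models rather than the fold assignment.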
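Genomic selection (GS) is an emerging molecular breeding method. It builds a model from the genotypic and phenotypic data of a training population and then uses it to predict phenotypes in, and select from, a breeding population that has only genotypic data. Various models have been used to predict GEBVs (genomic estimated breeding values), including ridge regression best linear unbiased prediction (RRBLUP), genomic best linear unbiased prediction (GBLUP), Bayesian models and machine learning models.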
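GBLUP, listed above, replaces the pedigree relationship matrix with a genomic relationship matrix G. A common choice is VanRaden's first method, G = ZZ′ / (2Σp_j(1 − p_j)), where Z holds genotypes coded 0/1/2 centred by twice the allele frequency.

The sketch estimates p_j from the same individuals; base-population frequencies could be substituted if known. It shows only how G is built, not the mixed-model solve that uses it. The function name is hypothetical.

```python
def vanraden_g(genotypes):
    """Genomic relationship matrix G = Z Z' / (2 * sum_j p_j (1 - p_j)).

    genotypes: rows = individuals, columns = SNPs, coded 0/1/2 copies of the
    counted allele. p_j is estimated from these same individuals (an assumption).
    """
    n, m = len(genotypes), len(genotypes[0])
    p = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    z = [[row[j] - 2.0 * p[j] for j in range(m)] for row in genotypes]  # centred genotypes
    scale = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic")
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
            for i in range(n)]


G = vanraden_g([[0, 1, 2], [2, 1, 0], [1, 1, 1]])
```

Scaling by 2Σp(1 − p) puts G on roughly the same scale as the pedigree-based numerator relationship matrix, so GBLUP can use the same mixed-model machinery as pedigree BLUP.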
Detection of Fetal Chromosome 6p25 Deletion Syndrome Using Single Nucleotide Polymorphism Microarray
·Original Article· Chinese Journal of Prenatal Diagnosis (Electronic Version), 2019, Vol. 11, No. 4
Luo Xiaohui, Zeng Yukun, Hu Rong, Lan Feifei (Medical Genetics Center, Guangdong Women and Children Hospital, Guangzhou, Guangdong 511440)
[Abstract] Objective: To explore the genetic cause, clinical phenotype and causative genes of chromosome 6p25 deletion syndrome. A further aim, supported by a review of the literature, was to assess single nucleotide polymorphism microarray (SNP array) testing in the prenatal diagnosis of fetal 6p25 deletion syndrome.
Methods: Amniotic fluid was obtained by amniocentesis. DNA was extracted from the amniotic fluid cells, and the fetal genome was scanned with an SNP array to analyze copy number variations. G-banded karyotyping of the amniotic fluid cells was performed in parallel.
Results: The fetus carried a deletion of about 2.7 Mb at 6p25.3-p25.2 on chromosome 6. The deleted segment contains 15 OMIM genes, including FOXC1, FOXF2, DUSP22, IRF4 and EXOC2, and its loss can cause 6p25 deletion syndrome.
G-banded karyotyping of the amniotic fluid cells showed 46,XN with no obvious abnormality.
Conclusions: Deletion of the 6p25.3-p25.2 region can cause 6p25 deletion syndrome, and the haploinsufficient gene FOXC1 plays a key role in the syndrome.
SNP array testing offers high resolution and good accuracy and can compensate for the limited resolution of karyotyping. It is suitable for the prenatal diagnosis of fetal 6p25 deletion syndrome and gives clinical genetic counselors detailed diagnostic information and a basis for counseling on future pregnancies.
[Keywords] single nucleotide polymorphism microarray; 6p25 deletion syndrome; prenatal diagnosis
[CLC number] R714.53 [Document code] A DOI: 10.13470/j.cnki.cjpd.2019.04.09
Corresponding author: Lan Feifei, E-mail: shuyue88@163.com
Funding: Medical Scientific Research Foundation of Guangdong Province (B2014021)
6p25 deletion syndrome is a rare chromosomal microdeletion syndrome whose causative genes mainly include FOXC1 and FOXF2. Its phenotype involves multiple systems and can include brain malformations, congenital heart defects, developmental delay, anterior segment anomalies of the eye, and hearing loss [1].
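Copy-number analysis of SNP array data, as described in the Methods, can be illustrated in its simplest form. The idea is to look for runs of consecutive probes whose log R ratio (LRR) falls below a threshold, report each run's span, and list the genes it overlaps.

This is only a sketch under stated assumptions. The abstract does not name its analysis software, and production CNV callers also use B-allele frequency and segmentation or hidden-Markov models. The threshold, minimum probe count, probe spacing and gene coordinates here are all hypothetical, as are the function names.

```python
def find_deletions(probes, lrr_threshold=-0.3, min_probes=10):
    """Call candidate deletions as runs of consecutive probes with low log R ratio.

    probes: (position_bp, log_r_ratio) tuples for one chromosome, sorted by position.
    Returns (start_bp, end_bp, n_probes, span_bp) for each run of at least
    min_probes probes whose log R ratio is below lrr_threshold.
    """
    calls, run = [], []
    for pos, lrr in list(probes) + [(None, 0.0)]:  # sentinel closes the final run
        if pos is not None and lrr < lrr_threshold:
            run.append(pos)
            continue
        if len(run) >= min_probes:
            calls.append((run[0], run[-1], len(run), run[-1] - run[0] + 1))
        run = []
    return calls


def genes_in_region(call, genes):
    """Names of genes (name, start_bp, end_bp) overlapping a call's [start, end] interval."""
    start, end = call[0], call[1]
    return [name for name, g_start, g_end in genes if g_start <= end and g_end >= start]


# Synthetic probes every 10 kb with a simulated one-copy loss between 200 and 400 kb.
probes = [(pos, -0.5 if 200_000 <= pos <= 400_000 else 0.0)
          for pos in range(0, 1_000_001, 10_000)]
calls = find_deletions(probes)
hypothetical_genes = [("GENE_A", 150_000, 210_000), ("GENE_B", 500_000, 600_000)]
```

Requiring a minimum number of consecutive probes trades sensitivity for fewer false calls from single noisy probes. The span a given probe count can resolve depends on probe density, which is why arrays resolve far smaller deletions than G-banding.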
INVESTIGATION
Accuracy of Genomic Selection Methods in a Standard Data Set of Loblolly Pine (Pinus taeda L.)
M. F. R. Resende, Jr.,*,†,1 P. Muñoz,‡,†,1 M. D. V. Resende,§,** D. J. Garrick,†† R. L. Fernando,†† J. M. Davis,†,‡‡ E. J. Jokela,† T. A. Martin,† G. F. Peter,†,‡‡ and M. Kirst†,‡‡,2
*Genetics and Genomics Graduate Program, ‡Plant Molecular and Cellular Biology Graduate Program, and †School of Forest Resources and Conservation, University of Florida, Gainesville, Florida 32611, §EMBRAPA Forestry, Estrada da Ribeira, Colombo, PR 83411-000 Brazil, **Department of Forest Engineering, Universidade Federal de Viçosa, Viçosa, MG 36571-000 Brazil, ††Department of Animal Science, Iowa State University, Ames, Iowa 50011, and ‡‡University of Florida Genetics Institute, University of Florida, Gainesville, Florida 32611
ABSTRACT Genomic selection can increase genetic gain per generation through early selection. Genomic selection is expected to be particularly valuable for traits that are costly to phenotype and expressed late in the life cycle of long-lived species. Alternative approaches to genomic selection prediction models may perform differently for traits with distinct genetic properties. Here we present the performance of four original methods of genomic selection that differ in their assumptions about the distribution of marker effects: (i) ridge regression–best linear unbiased prediction (RR–BLUP), (ii) Bayes A, (iii) Bayes Cπ, and (iv) Bayesian LASSO. In addition, a modified RR–BLUP (RR–BLUP B) that utilizes a selected subset of markers was evaluated. The accuracy of these methods was compared across 17 traits with distinct heritabilities and genetic architectures, including growth, development, and disease-resistance properties. The traits were measured in a Pinus taeda (loblolly pine) training population of 951 individuals genotyped with 4853 SNPs. The predictive ability of the methods was evaluated using a 10-fold cross-validation approach and differed only marginally for most method/trait combinations. Interestingly, for fusiform rust
disease-resistance traits, Bayes Cπ, Bayes A, and RR–BLUP B had higher predictive ability than RR–BLUP and Bayesian LASSO. Fusiform rust is controlled by few genes of large effect. A limitation of RR–BLUP is the assumption of equal contribution of all markers to the observed variation. However, RR–BLUP B performed as well as the Bayesian approaches. The genotypic and phenotypic data used in this study are publicly available for comparative analysis of genomic selection prediction models.
PLANT and animal breeders have effectively used phenotypic selection to increase the mean performance in selected populations. For many traits, phenotypic selection is costly and time consuming, especially for traits expressed late in the life cycle of long-lived species. Genome-wide selection (GWS) (Meuwissen et al. 2001) was proposed as an approach to accelerating the breeding cycle. In GWS, trait-specific models predict phenotypes using dense molecular markers from a base population. These predictions are applied to genotypic information in subsequent generations to estimate direct genetic values (DGV).
Several analytical approaches have been proposed for genome-based prediction of genetic values, and these differ with respect to assumptions about the marker effects (de los Campos et al. 2009a; Habier et al. 2011; Meuwissen et al.
2001). For example, ridge regression–best linear unbiased prediction (RR–BLUP) assumes that all marker effects are normally distributed and that these marker effects have identical variance (Meuwissen et al. 2001). In Bayes A, markers are assumed to have different variances, modeled as following a scaled inverse χ² distribution (Meuwissen et al. 2001). The prior in Bayes B (Meuwissen et al. 2001) assumes that the variance of a marker equals zero with probability π. With the complementary probability (1 − π), it follows an inverse χ² distribution with ν degrees of freedom and scale parameter S. The appropriate value of π depends on the genetic architecture of the trait, which motivated an improvement to the Bayes B model, known as Bayes Cπ. In Bayes Cπ, the mixture probability π has a uniform prior distribution (Habier et al. 2011). A drawback of Bayesian methods is the need to define priors. The Bayesian LASSO method circumvents the need for a prior on π and requires less prior information (de los Campos et al. 2009b; Legarra et al. 2011b).
Copyright © 2012 by the Genetics Society of America
doi: 10.1534/genetics.111.137026
Manuscript received November 19, 2011; accepted for publication January 10, 2012
Available freely online through the author-supported open access option.
Supporting information is available online at /content/suppl/2012/01/23/genetics.111.137026.DC1.
1 These authors contributed equally to this work.
2 Corresponding author: University of Florida, Newins-Ziegler Hall Rm. 367, University of Florida, Gainesville, FL 32611. E-mail: mkirst@ufl.edu
Genetics, Vol. 190, 1503–1510, April 2012
Methods for genomic prediction of genetic values may perform differently for different phenotypes (Meuwissen et al. 2001; Usai et al. 2009; Habier et al. 2011), and results may diverge because of differences in genetic architecture among traits (Hayes et al.
2009; Grattapaglia and Resende 2011). Therefore, it is valuable to compare performance among methods with real data and identify those that provide more accurate predictions. Recently, GWS was applied to agricultural crops (Crossa et al. 2010) and trees (Resende et al. 2011). Here we report a comparison of GWS predictive models for an experimental breeding population of the tree species loblolly pine (Pinus taeda L.). The comparison covers 17 traits with different heritabilities and predicted genetic architectures. Genome-wide selection models included RR–BLUP, Bayes A, Bayes Cπ, and the Bayesian LASSO. In addition, we evaluated a modified RR–BLUP method that utilizes a subset of selected markers, RR–BLUP B. We show that, for most traits, there is limited difference among these four original methods in their ability to predict GBV. Bayes Cπ performed better for fusiform rust resistance, a disease-resistance trait shown previously to be controlled in part by major genes. The proposed method RR–BLUP B was similar to or better than Bayes Cπ when a subsample of markers was fitted to the model.
Materials and Methods
Training population and genotypic data
The loblolly pine population used in this analysis is derived from 32 parents representing a wide range of accessions from the Atlantic coastal plain, Florida, and lower Gulf of the United States. Parents were crossed in a circular mating design with additional off-diagonal crosses, resulting in 70 full-sib families with an average of 13.5 individuals per family (Baltunis et al. 2007a). This population is referred to hereafter as CCLONES (comparing clonal lines on experimental sites). A subset of the CCLONES population was genotyped: 951 individuals from 61 families (mean, 15; standard deviation, 2.2). Genotyping used an Illumina Infinium assay (Illumina, San Diego, CA; Eckert et al. 2010) with 7216 SNPs, each representing a unique pine EST contig. A subset of 4853 SNPs were polymorphic in this population and were used in this study. None of the markers were excluded on the basis of
minimum allele frequency. Genotypic data and pedigree information are available in the Supporting Information, File S1 and File S2.
Phenotypic data
The CCLONES population was phenotyped for growth, developmental, and disease-resistance traits in three replicated studies. The first was a field study established using single-tree plots in eight replicates (one ramet of each individual is represented in each replicate) that utilized a resolvable alpha-incomplete block design (Williams et al. 2002). In that field trial, four replicates were grown under a high-intensity and four under a standard silvicultural intensity regime. Stem diameter (DBH, cm), total stem height (HT, cm), and total height to the base of the live crown (HTLC, cm) were measured in all eight replicates at years 6, 6, and 4, respectively.

At year 6, five crown and branch traits were measured only in the high-intensity silvicultural treatment: crown width across the planting beds (CWAC, cm), crown width along the planting beds (CWAL, cm), basal height of the live crown (BLC, cm), branch angle average (BA, degrees), and average branch diameter (BD, cm). Five wood traits were measured only in two repetitions, in the high-intensity culture: tree stiffness (Stiffness, km²/sec²), lignin content (Lignin), latewood percentage at year 4 (LateWood), wood specific gravity (Density), and 5- and 6-carbon sugar content (C5C6) (Baltunis et al.
2007a; Emhart et al. 2007; Li et al. 2007; Sykes et al. 2009).
The second study was a greenhouse disease-resistance screen. The experimental design was a randomized complete block with single-tree plots arranged in an alpha lattice with an incomplete block (tray container). Susceptibility to fusiform rust (Cronartium quercuum Berk. Miyabe ex Shirai f. sp. fusiforme) was assessed as gall volume (Rust_gall_vol) and as presence or absence of rust (Rust_bin) (Kayihan et al. 2005; Kayihan et al. 2010).
Finally, in the third study the rooting ability of cuttings was investigated in an incomplete block design (tray container) with four complete repetitions, in a controlled greenhouse environment. Root number (Rootnum) and presence or absence of roots (Rootnum_bin) were quantified (Baltunis et al. 2005; Baltunis et al. 2007b).
Breeding value prediction
Analyses were carried out using ASReml v. 2 (Gilmour et al. 2006) with the following mixed linear model,
$$y = Xb + Z_1 i + Z_2 a + Z_3 c + Z_4 f + Z_5 d_1 + Z_6 d_2 + e,$$
where:
- $y$ is the phenotypic measure of the trait being analyzed;
- $b$ is a vector of the fixed effects;
- $i$ is a vector of the random incomplete block effects within replication, $\sim N(0, I\sigma^2_{iblk})$;
- $a$ is a vector of random additive effects of clones, $\sim N(0, A\sigma^2_a)$;
- $c$ is a vector of random nonadditive effects of clones, $\sim N(0, I\sigma^2_c)$;
- $f$ is a vector of random family effects, $\sim N(0, I\sigma^2_f)$;
- $d_1$ and $d_2$ are described below;
- $e$ is the vector of random residual effects, $\sim N(0, \mathrm{DIAG}\,\sigma^2_e)$;
- $X$ and $Z_1$ to $Z_6$ are incidence matrices, and $I$, $A$, and DIAG are the identity, numerator relationship, and block diagonal matrices, respectively.

For traits measured in the field study under both high and standard culture intensities, the model also included $d_1$, a vector of the random additive × culture type interaction $\sim N(0, \mathrm{DIAG}\,\sigma^2_{d1})$. It also included $d_2$, a vector of the random family × culture type interaction $\sim N(0, \mathrm{DIAG}\,\sigma^2_{d2})$. Narrow-sense heritability was calculated as the ratio of the additive variance $\sigma^2_a$ to the total or phenotypic variance (e.g., for the field experiment total variance was
$\sigma^2_a + \sigma^2_c + \sigma^2_f + \sigma^2_{d1} + \sigma^2_{d2} + \sigma^2_e$). Prior to use in GWS modeling, the estimated breeding values were deregressed into phenotypes (DP) following the approach described in Garrick et al. (2009), to remove parental average effects from each individual. Breeding values and deregressed phenotypes are available in File S3 and File S4.
Statistical methods
The SNP effects were estimated with five statistical methods: RR–BLUP, Bayes A (Meuwissen et al. 2001), Bayes Cπ (Habier et al. 2011), the Improved Bayesian LASSO (BLASSO) approach proposed by Legarra et al. (2011b), and RR–BLUP B (a modified RR–BLUP). In all cases the genotypic information was fitted using the model
$$DP = 1b + Zm + e,$$
where:
- $DP$ is the vector of phenotypes deregressed from the additive genetic values (Garrick et al. 2009);
- $b$ is the overall mean, fitted as a fixed effect;
- $m$ is the vector of random marker effects;
- $e$ is the vector of random error effects;
- $1$ is a vector of ones;
- $Z$ is the incidence matrix of $m$, constructed from covariates based on the genotypes.

No additional information, such as marker location, polygenic effects, or pedigree, was used in those models.
Once the marker effects were estimated using one of the methods, the predicted DGV of individual $j$ for that method was given by
$$\hat{g}_j = \sum_{i=1}^{n} Z_{ij}\,\hat{m}_i,$$
where $Z_{ij}$ is the genotype covariate (allele code) of the $i$th marker for individual $j$ and $n$ is the total number of markers.
Random regression–best linear unbiased predictor
The RR–BLUP assumed that the SNP effects, $m$, were random (Meuwissen et al. 2001). The variance parameters were assumed to be unknown and were estimated by restricted maximum likelihood (REML), which is equivalent to Bayesian inference using an uninformative, flat prior. The first and second moments for this model are
$$m \sim (0,\ G = I\sigma^2_m), \qquad e \sim (0,\ R = I\sigma^2_e),$$
$$E(y) = 1b, \qquad \operatorname{Var}(y) = V = ZGZ' + R,$$
where $\sigma^2_m$ is the variance common to each marker effect and $\sigma^2_e$ is the residual variance. The mixed model equation for the prediction of $m$ is equivalent to:
$$\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + I\,\dfrac{\sigma^2_e}{\sigma^2_a/h} \end{bmatrix}\begin{bmatrix} \hat{b} \\ \hat{m} \end{bmatrix} = \begin{bmatrix} X'y \\ Z'y \end{bmatrix},$$
where $\sigma^2_a$ refers to the total additive
variance of the trait and $h$, due to standardization of the $Z$ matrix, refers to the total number of markers (Meuwissen et al. 2009). The matrix $Z$ was parameterized and standardized to have a mean of zero and variance of one as previously described (Resende et al. 2010; Resende et al. 2011). The analyses were performed in the software R (available at CRAN, /), and the script is available in File S5.
Bayes A
The Bayes A method proposed by Meuwissen et al. (2001) assumes that the conditional distribution of each effect (given its variance) is normal. The variances are assumed to follow a scaled inverse χ² distribution with degrees of freedom $\nu_a$ and scale parameter $S^2_a$. The unconditional distribution of the marker effects can be shown to follow a t-distribution with mean zero (Sorensen and Gianola 2002). Bayes A differs from RR–BLUP in that each SNP has its own variance. In this study, $\nu_a$ was assigned the value 4, and $S^2_a$ was calculated from the additive variance according to Habier et al. (2011) as
$$S^2_a = \frac{\tilde{\sigma}^2_a(\nu_a - 2)}{\nu_a}, \qquad \tilde{\sigma}^2_a = \frac{\tilde{\sigma}^2_s}{(1-\pi)\sum_{k=1}^{K} 2p_k(1-p_k)},$$
where $p_k$ is the allele frequency of the $k$th SNP, $\tilde{\sigma}^2_a$ is the variance of a given marker, and $\tilde{\sigma}^2_s$ is the additive genetic variance explained by the SNPs.
Bayes Cπ
Bayes Cπ was proposed by Habier et al. (2011). In this method, the SNP effects have a common variance, which follows a scaled inverse χ² prior with parameters $\nu_a$ and $S^2_a$. As a result, the effect of a SNP fitted with probability $(1-\pi)$ follows a mixture of multivariate Student's t-distributions, $t(0, \nu_a, IS^2_a)$, where $\pi$ is the probability of a marker having zero effect. Parameters $\nu_a$ and $S^2_a$ were chosen as described for Bayes A. The parameter $\pi$ is treated as unknown with a uniform(0, 1) prior distribution.
Bayes A and Bayes Cπ were performed using the software GenSel (Fernando and Garrick 2008; available at http:// /bigsgui/), for which an R package is available in File S6. The marker input file was coded as −10, 0, and 10 for marker genotypes 0, 1, and 2, respectively. A total of 50,000 iterations were used, with
the first 2000 excluded as the burn-in.
Bayesian LASSO
The Bayesian LASSO method was performed as proposed by Legarra et al. (2011b), using the same model equation as above for the estimation of the marker effects. However, in this case:
$$p(m \mid \lambda) = \prod_{i=1}^{n} \frac{\lambda}{2}\exp(-\lambda |m_i|), \qquad e \mid \sigma^2_e \sim \mathrm{MVN}(0, I\sigma^2_e), \qquad \operatorname{Var}(m_i) = \frac{2}{\lambda^2},$$
where MVN represents a multivariate normal distribution and $\lambda$ is the "sharpness" parameter. Using a formulation in terms of an augmented hierarchical model, which includes an extra variance component $\tau^2_i$ associated with each marker locus, we have:
$$p(m \mid \tau) \sim N(0, D), \qquad \operatorname{diag}(D) = (\tau^2_1, \ldots, \tau^2_n),$$
$$p(\tau \mid \lambda) = \prod_{i} \frac{\lambda^2}{2}\exp\!\left(-\frac{\lambda^2 \tau^2_i}{2}\right).$$
Therefore, $\operatorname{Var}(m_i \mid \tau_i) = \sigma^2_{m_i} = \tau^2_i$.
The prior distribution for $\sigma^2_e$ was an inverted χ² distribution with 4 degrees of freedom and expectation equal to the value used in regular genetic evaluation for $\sigma^2_e$. Analyses were performed using the software GS3 (Legarra et al. 2011a), available at http://snp.toulouse.inra.fr/ alegarra/. The chain length was 100,000 iterations, with the first 2000 excluded as the burn-in and a thinning interval of 100.
RR–BLUP B
We also evaluated a modified, two-step RR–BLUP method that reduces the number of marker effects estimated. In this case, the DGV for each trait was generated from a reduced subset of markers. To define the size of the subset, the marker effects from the RR–BLUP were ranked in decreasing order of absolute value and grouped in multiples of 10 (10, 20, 30, ..., 4800). Each group was used, with its original effects, to estimate DGV. The subset size, $q$, that maximized the predictive ability was selected as the optimum number of marker effects for subsequent analyses. Next, marker effects were reestimated in a second RR–BLUP, using only the selected $q$ markers within each training partition (see below). The resulting estimated effects were used to predict the merit of the individuals in the validation partition that were not present
in the training partition. This process was repeated for different allocations of the data into training and validation partitions. In each validation, a different subset of markers was selected on the basis of the largest absolute effects within that training partition. Therefore, the only restriction applied to the second analysis concerned $q$, the number of markers to be included in each data set. As a control, the same approach was applied to two additional subsets of markers of the same size: the first contained randomly selected markers and the second contained the markers with the smallest absolute effects.

Validation of the models
Two cross-validation schemes were tested with the RR–BLUP method: 10-fold and leave-one-out. For the 10-fold approach, a random subsampling partition, fixed for all methods, was used (Kohavi 1995). Briefly, the data for each trait were partitioned into two subsets. The first, the training partition, comprised the majority of the individuals (90%) and was used to estimate the marker effects. The second, the validation partition (10%), had its phenotypes predicted from the marker effects estimated in the training set. Random samples of $N = (9/10) \cdot N_T$ individuals were used as training sets, and the remaining individuals were used for validation ($N_T$ is the total number of individuals in the population). The process was repeated 10 times, each time with a different set of individuals as the validation partition, until all individuals had their phenotypes predicted (Legarra et al. 2008; Usai et al. 2009; Verbyla et al. 2010). In the leave-one-out approach, a model was constructed using $N_T - 1$ individuals in the training population and validated on the single individual not used in the training set. This was repeated $N_T$ times, so that each individual in the sample was used once as the validation individual. This scheme maximizes the training population size.

Accuracy of the models
The correlation between
the DGV and the DP was estimated using the software ASReml v.2 (Gilmour et al. 2006) from a bivariate analysis, including the validation groups as fixed effects, since each validation group had DGV estimated from a different prediction equation and might have had a different mean. This correlation represents the predictive ability ($r_{y\hat{y}}$) of GS to predict phenotypes and is theoretically given (Resende et al. 2010) by

$$r_{y\hat{y}} = r_{g\hat{g}}\, h,$$

where $r_{g\hat{g}}$ is the accuracy of GS and $h$ is the square root of the heritability of adjusted phenotypes associated with Mendelian sampling effects, given by

$$h^2_m = \frac{0.5\, n\, \sigma^2_a}{0.5\, n\, \sigma^2_a + \sigma^2_e},$$

where $n$ is the number of ramets used in each study (so that $h = \sqrt{h^2_m}$). To remove the influence of heritability on the predictive ability and thus estimate the accuracy, the following formula was applied:

$$r_{g\hat{g}} = \frac{r_{y\hat{y}}}{h}.$$

In addition, for each method and trait, the slope coefficient of the regression of DP on DGV was calculated as a measure of the bias of the DGV. Unbiased models are expected to have a slope coefficient of 1, whereas values greater than 1 indicate a biased underestimation in the DGV prediction and values smaller than 1 indicate a biased overestimation of the DGV.

Results

Cross-validation method
A comparison of the two cross-validation methods, 10-fold and leave-one-out ($N$-fold), showed that their predictive abilities were not significantly different (Table S1). The largest difference was detected for the trait CWAC, for which the leave-one-out method outperformed 10-fold cross-validation by 0.02 (standard error, 0.03). Likewise, no significant differences in the bias of the regressions (slope) were found between the two methods (Table S2). Thus, the 10-fold approach was selected and used for comparing all methods.

Predictive ability of the methods
Four well-established genome-wide selection methods were compared for 17 traits with heritabilities ranging from 0.07 to 0.45. Overall, the ability to predict phenotype ($r_{y\hat{y}}$) ranged
from 0.17 for Lignin to 0.51 for BA (Table 1). Although the methods differ in their a priori assumptions about marker effects, their predictive abilities were similar: no significant differences were detected for any of the 17 traits. The standard errors for each method and trait are given in Table S3. Bayesian approaches performed slightly better for traits in the disease-resistance category. For Rust_bin, Bayes A and Bayes Cπ had predictive abilities 0.05 higher than RR–BLUP and 0.06 higher than BLASSO. For Rust_gall_vol, Bayes Cπ was 0.05 higher than RR–BLUP and BLASSO. The accuracy ($r_{g\hat{g}}$) of each genome-wide prediction method was also estimated and varied from 0.37 to 0.77 (Table S4). For all methods, the ability to predict phenotypes ($r_{y\hat{y}}$) was linearly correlated with trait heritability. The strongest correlation (0.79) was observed for RR–BLUP (Figure 1). This correlation is expected, because traits with lower heritability have phenotypes that reflect their genetic content less closely and are therefore less predictable through genomic selection.

Bias of the methods
The regression coefficient (slope, $b$) of DP on DGV was calculated as a measure of the bias of each method. A value of $b$ equal to one indicates no bias in the prediction. For all traits, the slopes of all the models were not significantly different from one, indicating no significant bias in the prediction. In addition, no significant differences among the methods were detected (Table S5). Although no evidence of significant bias was detected, the value of $b$ derived from RR–BLUP was slightly higher for all traits (average across traits, 1.18).

Marker subsets and RR–BLUP B
Phenotype prediction was also performed with RR–BLUP by adding increasingly larger marker subsets until all markers were used jointly in the prediction. The predictive ability was plotted against the size of the marker subset (Figure 2). The pattern of prediction accuracy was similar for 13 out of 17 traits (Figure 2A), where differences were
mainly found in the rate at which the correlation reached the asymptote. In these cases, the size of the subset ranged from 820 to 4790 markers. However, a distinct pattern was detected for the disease-resistance-related traits, Density, and CWAL (Figure 2B). In these cases, maximum predictive ability was reached with smaller marker subsets (110–590 markers) and decreased as more markers were added.

Table 1 Predictive ability of genomic selection models using four different methods

Trait category      Trait          h²     RR–BLUP   BLASSO    Bayes A   Bayes Cπ
Growth              HT             0.31   0.39      0.38      0.38      0.38
                    HTLC           0.22   0.45      0.44      0.44      0.44
                    BHLC           0.35   0.49      0.49      0.49      0.49
                    DBH            0.31   0.46      0.46      0.46      0.46
Development         CWAL           0.27   0.38      0.36      0.36      0.36
                    CWAC           0.45   0.48      0.46      0.47      0.47
                    BD             0.15   0.27      0.25      0.27      0.27
                    BA             0.33   0.51      0.51      0.51      0.51
                    Rootnum_bin    0.10   0.28      0.28      0.27      0.28
                    Rootnum        0.07   0.24      0.26      0.25      0.24
Disease resistance  Rust_bin       0.21   0.29      0.28      0.34      0.34
                    Rust_gall_vol  0.12   0.23      0.24      0.28      0.29
Wood quality        Stiffness      0.37   0.43      0.39      0.42      0.42
                    Lignin         0.11   0.17      0.17      0.17      0.17
                    LateWood       0.17   0.24      0.24      0.23      0.24
                    Density        0.09   0.20      0.22      0.23      0.22
                    C5C6           0.14   0.26      0.25      0.25      0.25

h² is the narrow-sense heritability of the trait.

Figure 1 Regression of RR–BLUP predictive ability on narrow-sense heritability for 17 traits (trend line is shown, R² = 0.79).

An additional RR–BLUP was performed using as covariates only the marker subset for which maximum predictive ability was obtained. For traits where a large number of markers (>600) explain the phenotypic variability, RR–BLUP B performed similarly to RR–BLUP and the Bayesian methods (Table S6). However, for traits whose maximum predictive ability was reached with a smaller number of markers (<600; Density, Rust_bin, Rust_gall_vol), RR–BLUP B performed significantly better than RR–BLUP. For example, the predictive ability for Rust_gall_vol was 61% higher using RR–BLUP B (0.37) than using the traditional RR–BLUP (0.23), and it also improved relative to BLASSO (0.24), Bayes A (0.28), and Bayes Cπ (0.29). We also contrasted these results with the
predictive ability obtained with subsets of markers of similar size, but selected either at random or to include the markers with the smallest effects. As expected, for all three traits the predictive ability was larger for the subset selected by RR–BLUP B than for the smallest-effect and randomly selected subsets (Figure 3). The RR–BLUP B subset differed significantly from both the smallest-effect and the random subsets for the rust-resistance-related traits (Rust_bin, Rust_gall_vol), whereas for Density the markers selected by RR–BLUP B differed significantly only from the smallest-effect subset, not from the random subset.

Discussion
We characterized the performance of RR–BLUP/RR–BLUP B, Bayes A, Bayesian LASSO regression, and Bayes Cπ for GWS of growth, developmental, disease-resistance, and biomass-quality traits in a common data set generated from an experimental population of the conifer loblolly pine. In general, the methods evaluated differed only modestly in their predictive ability (defined as the correlation between the DGV and DP).

The suitability of different methods for developing GWS predictive models is expected to be trait dependent, conditional on the genetic architecture of the trait. RR–BLUP differs from the other approaches used in this study in that marker effects are assumed to be normally distributed with the same variance for all markers (Meuwissen et al. 2001). This assumption may be suitable under an infinitesimal model (Fisher 1918), in which traits are determined by an infinite number of unlinked, nonepistatic loci, each with small effect. Not surprisingly, BLUP-based methods have underperformed relative to Bayesian approaches for oligogenic traits. For instance, Verbyla et al. (2009) showed that BLUP-based GWS had lower accuracy than Bayesian methods in predicting fat percentage in a population in which a single gene explains 50% of the genetic variation. Similarly, our observation that Bayes A and Bayes Cπ were more accurate in
predicting fusiform rust-resistance traits than RR–BLUP may reflect a simpler genetic architecture, with a few loci of large effect. Although the causative genes that regulate fusiform rust resistance have not yet been identified, several genetic studies support a role for a few major genes in the variation of this trait. For example, the Fr1 locus confers resistance to specific aeciospore isolates of the fungus (Wilcox et al. 1996), and at least five

Figure 2 Example of the two patterns of predictive ability observed among traits as an increasing number of markers is added to the model. Each marker group is a set of 10 markers. (Left) For DBH, the maximum predictive ability was reached when 380 groups of markers (3800 markers) were included in the model. (Right) For Rust_gall_vol, predictive ability reached a maximum when only 10 groups (100 markers) were added. Lines indicate the predictive ability of RR–BLUP (solid line), Bayes Cπ (dashed line), and RR–BLUP B (dotted line), as reported in Table 1 and Table S6.

Figure 3 Predictive ability for subsets of 310 markers for Rust_bin, 110 markers for Rust_gall_vol, and 240 markers for Density. Subsets were generated by selecting the markers with the lowest absolute effects (light shading), selecting markers at random (medium shading), including all markers (dark shading), and including only the markers with the largest absolute effects (solid).
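The augmented hierarchical form of the Bayesian LASSO prior in the Bayesian LASSO subsection relies on a standard result: if each $\tau^2_i$ is exponentially distributed with rate $\lambda^2/2$ and $m_i \mid \tau^2_i \sim N(0, \tau^2_i)$, then $m_i$ is marginally Laplace with variance $2/\lambda^2$. The Python sketch below checks this numerically by Monte Carlo. It illustrates the prior only and is not the GS3 sampler used in the study; the function names, the value $\lambda = 2$, the number of draws, and the seed are arbitrary assumptions.

```python
import math
import random


def sample_blasso_effect(lam: float, rng: random.Random) -> float:
    """Draw one marker effect from the augmented hierarchical prior.

    tau^2 ~ Exponential(rate = lam^2 / 2), then m | tau^2 ~ N(0, tau^2).
    Marginally, m is Laplace with density (lam / 2) * exp(-lam * |m|).
    """
    tau2 = rng.expovariate(lam ** 2 / 2.0)  # expovariate takes the rate, not the mean
    return rng.gauss(0.0, math.sqrt(tau2))


def empirical_moments(lam: float, n_draws: int = 200_000, seed: int = 1):
    """Sample variance and mean absolute value of n_draws simulated effects."""
    rng = random.Random(seed)
    draws = [sample_blasso_effect(lam, rng) for _ in range(n_draws)]
    mean = sum(draws) / n_draws
    var = sum((d - mean) ** 2 for d in draws) / (n_draws - 1)
    mean_abs = sum(abs(d) for d in draws) / n_draws
    return var, mean_abs


if __name__ == "__main__":
    lam = 2.0  # arbitrary sharpness value chosen for illustration
    var, mean_abs = empirical_moments(lam)
    # Laplace(lam) theory: Var(m) = 2 / lam^2 and E|m| = 1 / lam.
    print(f"Var(m): simulated {var:.3f}, theoretical {2 / lam ** 2:.3f}")
    print(f"E|m|:   simulated {mean_abs:.3f}, theoretical {1 / lam:.3f}")
```

The simulated variance should sit close to $2/\lambda^2$, which is why the marginal prior can be written either as a product of double-exponential densities or as normals with marker-specific variances $\tau^2_i$.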
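The first step of RR–BLUP B described in Materials and Methods, ranking the RR–BLUP marker effects by absolute value and scoring nested subsets in multiples of 10, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: marker effects are taken as already estimated, genotypes are plain lists of numeric codes, predictive ability is a simple Pearson correlation on whatever individuals are passed in (in the study it was assessed in validation partitions), and the second step, re-estimating effects for the selected $q$ markers in a new RR–BLUP, is not shown. All function names, including the toy-data helper, are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def dgv(genotypes: List[List[float]], effects: Sequence[float],
        subset: Sequence[int]) -> List[float]:
    """Direct genomic values computed from the markers in `subset` only."""
    return [sum(row[j] * effects[j] for j in subset) for row in genotypes]


def select_subset_size(genotypes: List[List[float]], effects: Sequence[float],
                       phenotypes: Sequence[float], step: int = 10) -> Tuple[int, float]:
    """Rank markers by |effect| and score nested subsets of size step, 2*step, ...

    Each subset keeps the markers' original effects. Returns (q, r): the
    subset size with the highest DGV-phenotype correlation and that correlation.
    """
    ranked = sorted(range(len(effects)), key=lambda j: abs(effects[j]), reverse=True)
    best_q, best_r = 0, -math.inf
    for q in range(step, len(effects) + 1, step):
        r = pearson(dgv(genotypes, effects, ranked[:q]), phenotypes)
        if r > best_r:
            best_q, best_r = q, r
    return best_q, best_r


def make_toy_data(seed: int = 0, n_ind: int = 50, n_markers: int = 40):
    """Hypothetical toy data: only the first 10 markers affect the phenotype."""
    rng = random.Random(seed)
    genotypes = [[rng.choice([0, 1, 2]) for _ in range(n_markers)] for _ in range(n_ind)]
    effects = [1.0] * 10 + [0.1] * (n_markers - 10)  # stand-in for RR-BLUP estimates
    phenotypes = [float(sum(row[:10])) for row in genotypes]
    return genotypes, effects, phenotypes


if __name__ == "__main__":
    X, m_hat, y = make_toy_data()
    q, r = select_subset_size(X, m_hat, y)
    print(f"optimum subset size q = {q}, correlation = {r:.3f}")
```

In the toy data only the first 10 markers carry signal, so the search should return q = 10; adding the small-effect markers only adds noise, which mirrors the declining curve reported for the rust traits.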
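The fold structure of the two cross-validation schemes in the Validation of the models subsection can be sketched in Python as follows, assuming individuals are indexed 0 to $N_T - 1$. A single seeded shuffle keeps the 10-fold partition fixed across methods, as the text requires. The seed, the round-robin split (which gives fold sizes differing by at most one), the population size in the example, and the function names are illustrative assumptions, not details reported for the study.

```python
import random
from typing import List


def kfold_partitions(n_total: int, k: int = 10, seed: int = 2012) -> List[List[int]]:
    """Shuffle individual indices once and split them into k validation folds.

    A fixed seed reproduces the same partition, so every method can be
    evaluated on identical training/validation splits.
    """
    ids = list(range(n_total))
    random.Random(seed).shuffle(ids)
    # Round-robin assignment: fold sizes differ by at most one individual.
    return [ids[f::k] for f in range(k)]


def leave_one_out(n_total: int) -> List[List[int]]:
    """Each individual forms its own validation set, giving n_total folds."""
    return [[i] for i in range(n_total)]


def training_set(n_total: int, validation: List[int]) -> List[int]:
    """Indices of all individuals not held out in `validation`."""
    held_out = set(validation)
    return [i for i in range(n_total) if i not in held_out]


if __name__ == "__main__":
    n_t = 95  # hypothetical population size
    for fold, val in enumerate(kfold_partitions(n_t)):
        train = training_set(n_t, val)
        print(f"fold {fold}: {len(train)} training, {len(val)} validation")
```

Every individual appears in exactly one validation fold in both schemes; leave-one-out simply sets k equal to $N_T$, which is why it maximizes the training population size.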
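The quantities defined in the Accuracy of the models subsection, $h^2_m$, the accuracy $r_{g\hat{g}} = r_{y\hat{y}}/h$, and the bias slope of DP on DGV, can be computed as sketched below, assuming the predictive ability, variance components, and number of ramets are already available. The sketch uses an ordinary least-squares slope; it does not reproduce the ASReml bivariate analysis with validation groups as fixed effects used in the study. The function names and numeric inputs are hypothetical.

```python
import math
from typing import Sequence


def mendelian_h2(n_ramets: int, var_a: float, var_e: float) -> float:
    """Heritability of adjusted phenotypes tied to Mendelian sampling:
    h2_m = n * 0.5 * var_a / (n * 0.5 * var_a + var_e)."""
    signal = n_ramets * 0.5 * var_a
    return signal / (signal + var_e)


def accuracy(r_yyhat: float, h2_m: float) -> float:
    """Accuracy r_gghat = r_yyhat / h, where h = sqrt(h2_m)."""
    return r_yyhat / math.sqrt(h2_m)


def bias_slope(dgv: Sequence[float], dp: Sequence[float]) -> float:
    """OLS slope b of DP regressed on DGV: cov(DP, DGV) / var(DGV).

    b = 1 indicates no bias; b > 1 indicates underestimation and
    b < 1 overestimation of the DGV.
    """
    n = len(dgv)
    mg, mp = sum(dgv) / n, sum(dp) / n
    cov = sum((g - mg) * (p - mp) for g, p in zip(dgv, dp))
    var = sum((g - mg) ** 2 for g in dgv)
    return cov / var


if __name__ == "__main__":
    h2 = mendelian_h2(n_ramets=4, var_a=1.0, var_e=2.0)  # hypothetical inputs
    print(f"h2_m = {h2:.2f}, accuracy = {accuracy(0.30, h2):.2f}")
    print(f"slope b = {bias_slope([0.1, 0.2, 0.3], [0.2, 0.4, 0.6]):.2f}")
```

With these hypothetical inputs, $h^2_m = 0.5$ and a predictive ability of 0.30 corresponds to an accuracy of about 0.42, illustrating why accuracies exceed predictive abilities when heritability is below one.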