Sparse Covariance Selection via Robust Maximum Likelihood Estimation


Robust STAP Algorithm Based on Sparse Bayesian Learning


To mitigate the effect of model mismatch, references [12-16] analyzed the grid-mismatch phenomenon caused by discretization and proposed space-time dictionary calibration algorithms based on localized search and nonuniform partitioning, while references [17-18] analyzed the mismatch caused by array amplitude-phase errors and proposed estimating the error parameters jointly, which improves the CCM estimation accuracy of SR-STAP algorithms. The case in which both kinds of mismatch are present at the same time, however, still needs further study.

To address this problem, this paper proposes a robust STAP algorithm based on the sparse Bayesian framework [19-20], referred to as RSBL-STAP (robust sparse Bayesian learning based STAP). The RSBL-STAP algorithm first exploits the Kronecker structure of the space-time steering vector to build an error signal model in which array amplitude-phase errors and grid mismatch are present simultaneously; it then uses Bayesian inference and the expectation-maximization (EM) algorithm [21-25] to iteratively estimate the angle-Doppler image, the array error parameters, and the grid-mismatch parameters; finally, the estimated parameters are used to compute an accurate clutter covariance matrix (CCM) and the STAP weight vector. In addition, to reduce the extra computational complexity introduced by the extended model, an adaptive reduced-dimension dictionary design method based on spatial channels is proposed. Simulation experiments confirm the correctness and effectiveness of the proposed algorithm.

Under the on-grid assumption the received clutter snapshot can be written as

    x_c = Σ_{i=1}^{N_s} Σ_{j=1}^{N_d} α_{i,j} v(f_{d,j}, f_{s,i}) + n = V̂ α_c + n,    (5)

where α_c = [α_1, α_2, ..., α_{N_s N_d}]^T ∈ C^{N_s N_d × 1} is the sparse angle-Doppler image to be estimated, whose nonzero entries indicate that clutter is present at the corresponding grid points, and V̂ = [v(f_{d,1}, f_{s,1}), v(f_{d,2}, f_{s,2}), ..., v(f_{d,N_d}, f_{s,N_s})] is the space-time dictionary matrix. The clutter, however, is continuously distributed in the space-time plane, so any discrete partition of the plane inevitably introduces a grid-mismatch effect. To deal with this, the paper follows the strategy of reference [15] and adds an auxiliary atom to every discretized spatial channel f_{s,i} (i = 1, 2, ..., N_s). Defining the augmented quantity accordingly and substituting it into Eq. (1), the actual received signal model can be corrected [17-18].
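As a rough, self-contained illustration of the sparse Bayesian learning step that RSBL-STAP builds on (without the amplitude-phase-error and grid-mismatch extensions, and with a made-up real-valued dictionary standing in for the complex space-time dictionary V̂), the following R sketch runs the standard EM-style SBL updates for the model x = Vα + n with a fixed, known noise variance:

# Minimal sparse Bayesian learning (SBL) sketch: toy dictionary, real-valued, illustrative only.
set.seed(1)
n <- 40                                   # number of measurements
m <- 120                                  # number of dictionary atoms
V <- matrix(rnorm(n * m), n, m)           # toy dictionary (stand-in for the space-time dictionary)
alpha_true <- numeric(m); alpha_true[c(10, 55, 90)] <- c(2, -1.5, 1)
sigma2 <- 0.01                            # assumed known noise variance
x <- V %*% alpha_true + rnorm(n, sd = sqrt(sigma2))   # received snapshot

gam <- rep(1, m)                          # per-atom prior variances (hyperparameters)
for (it in 1:100) {
  # E-step: posterior of alpha given the current hyperparameters
  Sigma <- solve(crossprod(V) / sigma2 + diag(1 / gam))
  mu    <- Sigma %*% crossprod(V, x) / sigma2
  # M-step: EM update of the hyperparameters
  gam_new <- as.numeric(mu)^2 + diag(Sigma)
  if (max(abs(gam_new - gam)) < 1e-8) { gam <- gam_new; break }
  gam <- gam_new
}
alpha_hat <- as.numeric(mu)
head(order(-abs(alpha_hat)), 5)           # the three true atoms should dominate

Each EM pass recomputes the posterior mean and covariance of α and then updates the per-atom prior variances; atoms whose variance shrinks toward zero drop out of the recovered angle-Doppler image.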

Recent Advances in Robust Optimization and Robustness: An Overview


Recent Advances in Robust Optimization and Robustness:An OverviewVirginie Gabrel∗and C´e cile Murat†and Aur´e lie Thiele‡July2012AbstractThis paper provides an overview of developments in robust optimization and robustness published in the aca-demic literature over the pastfive years.1IntroductionThis review focuses on papers identified by Web of Science as having been published since2007(included),be-longing to the area of Operations Research and Management Science,and having‘robust’and‘optimization’in their title.There were exactly100such papers as of June20,2012.We have completed this list by considering 726works indexed by Web of Science that had either robustness(for80of them)or robust(for646)in their title and belonged to the Operations Research and Management Science topic area.We also identified34PhD disserta-tions dated from the lastfive years with‘robust’in their title and belonging to the areas of operations research or management.Among those we have chosen to focus on the works with a primary focus on management science rather than system design or optimal control,which are broadfields that would deserve a review paper of their own, and papers that could be of interest to a large segment of the robust optimization research community.We feel it is important to include PhD dissertations to identify these recent graduates as the new generation trained in robust optimization and robustness analysis,whether they have remained in academia or joined industry.We have also added a few not-yet-published preprints to capture ongoing research efforts.While many additional works would have deserved inclusion,we feel that the works selected give an informative and comprehensive view of the state of robustness and robust optimization to date in the context of operations research and management science.∗Universit´e Paris-Dauphine,LAMSADE,Place du Mar´e chal de Lattre de Tassigny,F-75775Paris Cedex16,France gabrel@lamsade.dauphine.fr Corresponding author†Universit´e Paris-Dauphine,LAMSADE,Place du Mar´e chal de Lattre de Tassigny,F-75775Paris Cedex16,France mu-rat@lamsade.dauphine.fr‡Lehigh University,Industrial and Systems Engineering Department,200W Packer Ave Bethlehem PA18015,USA aure-lie.thiele@2Theory of Robust Optimization and Robustness2.1Definitions and BasicsThe term“robust optimization”has come to encompass several approaches to protecting the decision-maker against parameter ambiguity and stochastic uncertainty.At a high level,the manager must determine what it means for him to have a robust solution:is it a solution whose feasibility must be guaranteed for any realization of the uncertain parameters?or whose objective value must be guaranteed?or whose distance to optimality must be guaranteed? 
The main paradigm relies on worst-case analysis:a solution is evaluated using the realization of the uncertainty that is most unfavorable.The way to compute the worst case is also open to debate:should it use afinite number of scenarios,such as historical data,or continuous,convex uncertainty sets,such as polyhedra or ellipsoids?The answers to these questions will determine the formulation and the type of the robust counterpart.Issues of over-conservatism are paramount in robust optimization,where the uncertain parameter set over which the worst case is computed should be chosen to achieve a trade-off between system performance and protection against uncertainty,i.e.,neither too small nor too large.2.2Static Robust OptimizationIn this framework,the manager must take a decision in the presence of uncertainty and no recourse action will be possible once uncertainty has been realized.It is then necessary to distinguish between two types of uncertainty: uncertainty on the feasibility of the solution and uncertainty on its objective value.Indeed,the decision maker generally has different attitudes with respect to infeasibility and sub-optimality,which justifies analyzing these two settings separately.2.2.1Uncertainty on feasibilityWhen uncertainty affects the feasibility of a solution,robust optimization seeks to obtain a solution that will be feasible for any realization taken by the unknown coefficients;however,complete protection from adverse realiza-tions often comes at the expense of a severe deterioration in the objective.This extreme approach can be justified in some engineering applications of robustness,such as robust control theory,but is less advisable in operations research,where adverse events such as low customer demand do not produce the high-profile repercussions that engineering failures–such as a doomed satellite launch or a destroyed unmanned robot–can have.To make the robust methodology appealing to business practitioners,robust optimization thus focuses on obtaining a solution that will be feasible for any realization taken by the unknown coefficients within a smaller,“realistic”set,called the uncertainty set,which is centered around the nominal values of the uncertain parameters.The goal becomes to optimize the objective,over the set of solutions that are feasible for all coefficient values in the uncertainty set.The specific choice of the set plays an important role in ensuring computational tractability of the robust problem and limiting deterioration of the objective at optimality,and must be thought through carefully by the decision maker.A large branch of robust optimization focuses on worst-case optimization over a convex uncertainty set.The reader is referred to Bertsimas et al.(2011a)and Ben-Tal and Nemirovski(2008)for comprehensive surveys of robust optimization and to Ben-Tal et al.(2009)for a book treatment of the topic.2.2.2Uncertainty on objective valueWhen uncertainty affects the optimality of a solution,robust optimization seeks to obtain a solution that performs well for any realization taken by the unknown coefficients.While a common criterion is to optimize the worst-case objective,some studies have investigated other robustness measures.Roy(2010)proposes a new robustness criterion that holds great appeal for the manager due to its simplicity of use and practical relevance.This framework,called bw-robustness,allows the decision-maker to identify a solution which guarantees an objective value,in a maximization problem,of at least w in all scenarios,and 
maximizes the probability of reaching a target value of b(b>w).Gabrel et al.(2011)extend this criterion from afinite set of scenarios to the case of an uncertainty set modeled using intervals.Kalai et al.(2012)suggest another criterion called lexicographicα-robustness,also defined over afinite set of scenarios for the uncertain parameters,which mitigates the primary role of the worst-case scenario in defining the solution.Thiele(2010)discusses over-conservatism in robust linear optimization with cost uncertainty.Gancarova and Todd(2012)studies the loss in objective value when an inaccurate objective is optimized instead of the true one, and shows that on average this loss is very small,for an arbitrary compact feasible region.In combinatorial optimization,Morrison(2010)develops a framework of robustness based on persistence(of decisions)using the Dempster-Shafer theory as an evidence of robustness and applies it to portfolio tracking and sensor placement.2.2.3DualitySince duality has been shown to play a key role in the tractability of robust optimization(see for instance Bertsimas et al.(2011a)),it is natural to ask how duality and robust optimization are connected.Beck and Ben-Tal(2009) shows that primal worst is equal to dual best.The relationship between robustness and duality is also explored in Gabrel and Murat(2010)when the right-hand sides of the constraints are uncertain and the uncertainty sets are represented using intervals,with a focus on establishing the relationships between linear programs with uncertain right hand sides and linear programs with uncertain objective coefficients using duality theory.This avenue of research is further explored in Gabrel et al.(2010)and Remli(2011).2.3Multi-Stage Decision-MakingMost early work on robust optimization focused on static decision-making:the manager decided at once of the values taken by all decision variables and,if the problem allowed for multiple decision stages as uncertainty was realized,the stages were incorporated by re-solving the multi-stage problem as time went by and implementing only the decisions related to the current stage.As thefield of static robust optimization matured,incorporating–ina tractable manner–the information revealed over time directly into the modeling framework became a major area of research.2.3.1Optimal and Approximate PoliciesA work going in that direction is Bertsimas et al.(2010a),which establishes the optimality of policies affine in the uncertainty for one-dimensional robust optimization problems with convex state costs and linear control costs.Chen et al.(2007)also suggests a tractable approximation for a class of multistage chance-constrained linear program-ming problems,which converts the original formulation into a second-order cone programming problem.Chen and Zhang(2009)propose an extension of the Affinely Adjustable Robust Counterpart framework described in Ben-Tal et al.(2009)and argue that its potential is well beyond what has been in the literature so far.2.3.2Two stagesBecause of the difficulty in incorporating multiple stages in robust optimization,many theoretical works have focused on two stages.Regarding two-stage problems,Thiele et al.(2009)presents a cutting-plane method based on Kelley’s algorithm for solving convex adjustable robust optimization problems,while Terry(2009)provides in addition preliminary results on the conditioning of a robust linear program and of an equivalent second-order cone program.Assavapokee et al.(2008a)and Assavapokee et al.(2008b)develop tractable 
algorithms in the case of robust two-stage problems where the worst-case regret is minimized,in the case of interval-based uncertainty and scenario-based uncertainty,respectively,while Minoux(2011)provides complexity results for the two-stage robust linear problem with right-hand-side uncertainty.2.4Connection with Stochastic OptimizationAn early stream in robust optimization modeled stochastic variables as uncertain parameters belonging to a known uncertainty set,to which robust optimization techniques were then applied.An advantage of this method was to yield approaches to decision-making under uncertainty that were of a level of complexity similar to that of their deterministic counterparts,and did not suffer from the curse of dimensionality that afflicts stochastic and dynamic programming.Researchers are now making renewed efforts to connect the robust optimization and stochastic opti-mization paradigms,for instance quantifying the performance of the robust optimization solution in the stochastic world.The topic of robust optimization in the context of uncertain probability distributions,i.e.,in the stochastic framework itself,is also being revisited.2.4.1Bridging the Robust and Stochastic WorldsBertsimas and Goyal(2010)investigates the performance of static robust solutions in two-stage stochastic and adaptive optimization problems.The authors show that static robust solutions are good-quality solutions to the adaptive problem under a broad set of assumptions.They provide bounds on the ratio of the cost of the optimal static robust solution to the optimal expected cost in the stochastic problem,called the stochasticity gap,and onthe ratio of the cost of the optimal static robust solution to the optimal cost in the two-stage adaptable problem, called the adaptability gap.Chen et al.(2007),mentioned earlier,also provides a robust optimization perspective to stochastic programming.Bertsimas et al.(2011a)investigates the role of geometric properties of uncertainty sets, such as symmetry,in the power offinite adaptability in multistage stochastic and adaptive optimization.Duzgun(2012)bridges descriptions of uncertainty based on stochastic and robust optimization by considering multiple ranges for each uncertain parameter and setting the maximum number of parameters that can fall within each range.The corresponding optimization problem can be reformulated in a tractable manner using the total unimodularity of the feasible set and allows for afiner description of uncertainty while preserving tractability.It also studies the formulations that arise in robust binary optimization with uncertain objective coefficients using the Bernstein approximation to chance constraints described in Ben-Tal et al.(2009),and shows that the robust optimization problems are deterministic problems for modified values of the coefficients.While many results bridging the robust and stochastic worlds focus on giving probabilistic guarantees for the solutions generated by the robust optimization models,Manuja(2008)proposes a formulation for robust linear programming problems that allows the decision-maker to control both the probability and the expected value of constraint violation.Bandi and Bertsimas(2012)propose a new approach to analyze stochastic systems based on robust optimiza-tion.The key idea is to replace the Kolmogorov axioms and the concept of random variables as primitives of probability theory,with uncertainty sets that are derived from some of the asymptotic implications of probability theory like the 
central limit theorem.The authors show that the performance analysis questions become highly structured optimization problems for which there exist efficient algorithms that are capable of solving problems in high dimensions.They also demonstrate that the proposed approach achieves computationally tractable methods for(a)analyzing queueing networks,(b)designing multi-item,multi-bidder auctions with budget constraints,and (c)pricing multi-dimensional options.2.4.2Distributionally Robust OptimizationBen-Tal et al.(2010)considers the optimization of a worst-case expected-value criterion,where the worst case is computed over all probability distributions within a set.The contribution of the work is to define a notion of robustness that allows for different guarantees for different subsets of probability measures.The concept of distributional robustness is also explored in Goh and Sim(2010),with an emphasis on linear and piecewise-linear decision rules to reformulate the original problem in aflexible manner using expected-value terms.Xu et al.(2012) also investigates probabilistic interpretations of robust optimization.A related area of study is worst-case optimization with partial information on the moments of distributions.In particular,Popescu(2007)analyzes robust solutions to a certain class of stochastic optimization problems,using mean-covariance information about the distributions underlying the uncertain parameters.The author connects the problem for a broad class of objective functions to a univariate mean-variance robust objective and,subsequently, to a(deterministic)parametric quadratic programming problem.The reader is referred to Doan(2010)for a moment-based uncertainty model for stochastic optimization prob-lems,which addresses the ambiguity of probability distributions of random parameters with a minimax decision rule,and a comparison with data-driven approaches.Distributionally robust optimization in the context of data-driven problems is the focus of Delage(2009),which uses observed data to define a”well structured”set of dis-tributions that is guaranteed with high probability to contain the distribution from which the samples were drawn. 
Zymler et al.(2012a)develop tractable semidefinite programming(SDP)based approximations for distributionally robust individual and joint chance constraints,assuming that only thefirst-and second-order moments as well as the support of the uncertain parameters are given.Becker(2011)studies the distributionally robust optimization problem with known mean,covariance and support and develops a decomposition method for this family of prob-lems which recursively derives sub-policies along projected dimensions of uncertainty while providing a sequence of bounds on the value of the derived policy.Robust linear optimization using distributional information is further studied in Kang(2008).Further,Delage and Ye(2010)investigates distributional robustness with moment uncertainty.Specifically,uncertainty affects the problem both in terms of the distribution and of its moments.The authors show that the resulting problems can be solved efficiently and prove that the solutions exhibit,with high probability,best worst-case performance over a set of distributions.Bertsimas et al.(2010)proposes a semidefinite optimization model to address minimax two-stage stochastic linear problems with risk aversion,when the distribution of the second-stage random variables belongs to a set of multivariate distributions with knownfirst and second moments.The minimax solutions provide a natural distribu-tion to stress-test stochastic optimization problems under distributional ambiguity.Cromvik and Patriksson(2010a) show that,under certain assumptions,global optima and stationary solutions of stochastic mathematical programs with equilibrium constraints are robust with respect to changes in the underlying probability distribution.Works such as Zhu and Fukushima(2009)and Zymler(2010)also study distributional robustness in the context of specific applications,such as portfolio management.2.5Connection with Risk TheoryBertsimas and Brown(2009)describe how to connect uncertainty sets in robust linear optimization to coherent risk measures,an example of which is Conditional Value-at-Risk.In particular,the authors show the link between polyhedral uncertainty sets of a special structure and a subclass of coherent risk measures called distortion risk measures.Independently,Chen et al.(2007)present an approach for constructing uncertainty sets for robust opti-mization using new deviation measures that capture the asymmetry of the distributions.These deviation measures lead to improved approximations of chance constraints.Dentcheva and Ruszczynski(2010)proposes the concept of robust stochastic dominance and shows its applica-tion to risk-averse optimization.They consider stochastic optimization problems where risk-aversion is expressed by a robust stochastic dominance constraint and develop necessary and sufficient conditions of optimality for such optimization problems in the convex case.In the nonconvex case,they derive necessary conditions of optimality under additional smoothness assumptions of some mappings involved in the problem.2.6Nonlinear OptimizationRobust nonlinear optimization remains much less widely studied to date than its linear counterpart.Bertsimas et al.(2010c)presents a robust optimization approach for unconstrained non-convex problems and problems based on simulations.Such problems arise for instance in the partial differential equations literature and in engineering applications such as nanophotonic design.An appealing feature of the approach is that it does not assume any specific structure for the problem.The 
case of robust nonlinear optimization with constraints is investigated in Bertsimas et al.(2010b)with an application to radiation therapy for cancer treatment.Bertsimas and Nohadani (2010)further explore robust nonconvex optimization in contexts where solutions are not known explicitly,e.g., have to be found using simulation.They present a robust simulated annealing algorithm that improves performance and robustness of the solution.Further,Boni et al.(2008)analyzes problems with uncertain conic quadratic constraints,formulating an approx-imate robust counterpart,and Zhang(2007)provide formulations to nonlinear programming problems that are valid in the neighborhood of the nominal parameters and robust to thefirst order.Hsiung et al.(2008)present tractable approximations to robust geometric programming,by using piecewise-linear convex approximations of each non-linear constraint.Geometric programming is also investigated in Shen et al.(2008),where the robustness is injected at the level of the algorithm and seeks to avoid obtaining infeasible solutions because of the approximations used in the traditional approach.Interval uncertainty-based robust optimization for convex and non-convex quadratic programs are considered in Li et al.(2011).Takeda et al.(2010)studies robustness for uncertain convex quadratic programming problems with ellipsoidal uncertainties and proposes a relaxation technique based on random sampling for robust deviation optimization sserre(2011)considers minimax and robust models of polynomial optimization.A special case of nonlinear problems that are linear in the decision variables but convex in the uncertainty when the worst-case objective is to be maximized is investigated in Kawas and Thiele(2011a).In that setting,exact and tractable robust counterparts can be derived.A special class of nonconvex robust optimization is examined in Kawas and Thiele(2011b).Robust nonconvex optimization is examined in detail in Teo(2007),which presents a method that is applicable to arbitrary objective functions by iteratively moving along descent directions and terminates at a robust local minimum.3Applications of Robust OptimizationWe describe below examples to which robust optimization has been applied.While an appealing feature of robust optimization is that it leads to models that can be solved using off-the-shelf software,it is worth pointing the existence of algebraic modeling tools that facilitate the formulation and subsequent analysis of robust optimization problems on the computer(Goh and Sim,2011).3.1Production,Inventory and Logistics3.1.1Classical logistics problemsThe capacitated vehicle routing problem with demand uncertainty is studied in Sungur et al.(2008),with a more extensive treatment in Sungur(2007),and the robust traveling salesman problem with interval data in Montemanni et al.(2007).Remli and Rekik(2012)considers the problem of combinatorial auctions in transportation services when shipment volumes are uncertain and proposes a two-stage robust formulation solved using a constraint gener-ation algorithm.Zhang(2011)investigates two-stage minimax regret robust uncapacitated lot-sizing problems with demand uncertainty,in particular showing that it is polynomially solvable under the interval uncertain demand set.3.1.2SchedulingGoren and Sabuncuoglu(2008)analyzes robustness and stability measures for scheduling in a single-machine environment subject to machine breakdowns and embeds them in a tabu-search-based scheduling algorithm.Mittal (2011)investigates efficient 
algorithms that give optimal or near-optimal solutions for problems with non-linear objective functions,with a focus on robust scheduling and service operations.Examples considered include parallel machine scheduling problems with the makespan objective,appointment scheduling and assortment optimization problems with logit choice models.Hazir et al.(2010)considers robust scheduling and robustness measures for the discrete time/cost trade-off problem.3.1.3Facility locationAn important question in logistics is not only how to operate a system most efficiently but also how to design it. Baron et al.(2011)applies robust optimization to the problem of locating facilities in a network facing uncertain demand over multiple periods.They consider a multi-periodfixed-charge network location problem for which they find the number of facilities,their location and capacities,the production in each period,and allocation of demand to facilities.The authors show that different models of uncertainty lead to very different solution network topologies, with the model with box uncertainty set opening fewer,larger facilities.?investigate a robust version of the location transportation problem with an uncertain demand using a2-stage formulation.The resulting robust formulation is a convex(nonlinear)program,and the authors apply a cutting plane algorithm to solve the problem exactly.Atamt¨u rk and Zhang(2007)study the networkflow and design problem under uncertainty from a complexity standpoint,with applications to lot-sizing and location-transportation problems,while Bardossy(2011)presents a dual-based local search approach for deterministic,stochastic,and robust variants of the connected facility location problem.The robust capacity expansion problem of networkflows is investigated in Ordonez and Zhao(2007),which provides tractable reformulations under a broad set of assumptions.Mudchanatongsuk et al.(2008)analyze the network design problem under transportation cost and demand uncertainty.They present a tractable approximation when each commodity only has a single origin and destination,and an efficient column generation for networks with path constraints.Atamt¨u rk and Zhang(2007)provides complexity results for the two-stage networkflow anddesign plexity results for the robust networkflow and network design problem are also provided in Minoux(2009)and Minoux(2010).The problem of designing an uncapacitated network in the presence of link failures and a competing mode is investigated in Laporte et al.(2010)in a railway application using a game theoretic perspective.Torres Soto(2009)also takes a comprehensive view of the facility location problem by determining not only the optimal location but also the optimal time for establishing capacitated facilities when demand and cost parameters are time varying.The models are solved using Benders’decomposition or heuristics such as local search and simulated annealing.In addition,the robust networkflow problem is also analyzed in Boyko(2010),which proposes a stochastic formulation of minimum costflow problem aimed atfinding network design andflow assignments subject to uncertain factors,such as network component disruptions/failures when the risk measure is Conditional Value at Risk.Nagurney and Qiang(2009)suggests a relative total cost index for the evaluation of transportation network robustness in the presence of degradable links and alternative travel behavior.Further,the problem of locating a competitive facility in the plane is studied in Blanquero et 
al. (2011) with a robustness criterion. Supply chain design problems are also studied in Pan and Nagi (2010) and Poojari et al. (2008).

3.1.4 Inventory management
The topic of robust multi-stage inventory management has been investigated in detail in Bienstock and Ozbay (2008) through the computation of robust basestock levels and Ben-Tal et al. (2009) through an extension of the Affinely Adjustable Robust Counterpart framework to control inventories under demand uncertainty. See and Sim (2010) studies a multi-period inventory control problem under ambiguous demand for which only mean, support and some measures of deviations are known, using a factor-based model. The parameters of the replenishment policies are obtained using a second-order conic programming problem. Song (2010) considers stochastic inventory control in robust supply chain systems. The work proposes an integrated approach that combines in a single step data fitting and inventory optimization – using histograms directly as the inputs for the optimization model – for the single-item multi-period periodic-review stochastic lot-sizing problem. Operation and planning issues for dynamic supply chain and transportation networks in uncertain environments are considered in Chung (2010), with examples drawn from emergency logistics planning, network design and congestion pricing problems.

3.1.5 Industry-specific applications
Ang et al. (2012) proposes a robust storage assignment approach in unit-load warehouses facing variable supply and uncertain demand in a multi-period setting. The authors assume a factor-based demand model and minimize the worst-case expected total travel in the warehouse with distributional ambiguity of demand. A related problem is considered in Werners and Wuelfing (2010), which optimizes internal transports at a parcel sorting center. Galli (2011) describes the models and algorithms that arise from implementing recoverable robust optimization to train platforming and rolling stock planning, where the concept of recoverable robustness has been defined in

Support Vector Machine Regression Model Based on Fruit Fly Optimization

Zhao Wei (赵伟)
【Journal】Journal of Xi'an University of Posts and Telecommunications (《西安邮电学院学报》)
【Year (Volume), Issue】2015 (000) 004
【Abstract】A support vector machine (SVM) regression model based on fruit fly optimization is presented.

The SVM penalty factor and kernel-function parameter are initialized as a fruit fly swarm; following the principle of the fruit fly optimization algorithm, the swarm forages iteratively under a best-fitness rule to search for the optimal parameters, which are then used to build the model.

The model is applied to the problem of predicting the melting points of organic compounds; the results show a prediction mean squared error of 3.02% and a correlation coefficient of 89.39%.
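As a rough illustration of the parameter search described in the abstract, the R sketch below (using svm() from the e1071 package) runs a simplified fruit-fly-style random search over the SVM cost and RBF kernel gamma on synthetic data. The swarm size, step size, iteration count and data are invented for the example, and the canonical smell-concentration formulation of FOA is replaced by a plain log-scale random flight around the current best position, so this is not the paper's exact procedure.

# Hypothetical toy data: eps-regression of y on three predictors.
library(e1071)
set.seed(2)
x <- matrix(runif(200 * 3), 200, 3)
y <- sin(2 * x[, 1]) + 0.5 * x[, 2]^2 + rnorm(200, sd = 0.1)
train <- 1:150; test <- 151:200

fitness <- function(cost, gamma) {        # smaller held-out MSE = better "smell"
  fit <- svm(x[train, ], y[train], type = "eps-regression",
             kernel = "radial", cost = cost, gamma = gamma)
  mean((predict(fit, x[test, ]) - y[test])^2)
}

pos  <- c(C = 0, g = -2)                  # current swarm location (log10 of cost and gamma)
best <- Inf
for (iter in 1:30) {
  cand <- t(replicate(10, pos + runif(2, -0.5, 0.5)))   # 10 flies take random flights
  mse  <- apply(cand, 1, function(p) fitness(10^p[1], 10^p[2]))
  if (min(mse) < best) {                  # swarm moves to the best-smelling location
    best <- min(mse)
    pos  <- cand[which.min(mse), ]
  }
}
final_model <- svm(x[train, ], y[train], type = "eps-regression",
                   kernel = "radial", cost = 10^pos[1], gamma = 10^pos[2])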

【Pages】4 pages (P70-73)
【Author】Zhao Wei (赵伟)
【Affiliation】Division of Electronic Equipment and Information Management, Shaanxi Institute of Administration (陕西省行政学院), Xi'an, Shaanxi 710068, China
【Language】Chinese
【CLC Number】TP301.8
【Related Literature】
1. An SVM regression model based on fruit fly optimization and its applications [J], Wu Yue
2. Support vector machine regression model based on fruit fly optimization [J], Zhao Wei
3. A random forest regression model based on an improved fruit fly optimization algorithm and its application to wind speed prediction [J], Zhu Changsheng; Li Suihan
4. Fault diagnosis based on a support vector machine optimized by an improved fruit fly optimization algorithm [J], Huang Xiaolu; Zhou Xiangzhen
5. Application of SVM parameter optimization based on the fruit fly optimization algorithm to ship manoeuvring prediction [J], Wang Xuegang; Zou Zaojian

Robust Bayesian LASSO for Variable Selection


Journal of Southwest China Normal University (Natural Science Edition), Vol. 48, No. 8, Aug. 2023. DOI: 10.13718/j.cnki.xsxb.2023.08.005

Robust Bayesian LASSO for Variable Selection
LIANG Yunting, ZHANG Huiguo, HU Xijian
College of Mathematics and System Science, Xinjiang University, Urumqi 830046, China

Abstract: Given that the ubiquitous outliers in real data can distort the parameter estimation and variable selection results of the Bayesian LASSO, a prior specification with heteroscedastic disturbances is introduced to improve its robustness. The posterior distribution of each parameter is derived, and estimates and confidence intervals are obtained by Gibbs sampling. The method exhibits low fitting error and high variable-identification accuracy in numerical simulations, and analyses of a diabetes data set and a plasma beta-carotene level data set show that it balances model simplification against prediction error, realizes robust variable selection and coefficient estimation, and has a good inhibitory effect on outliers and heteroscedastic disturbances that may be present in the data.
Keywords: variable selection; Bayesian LASSO; robustness; outlier; heteroscedasticity

Received 2022-10-23. Supported by the National Natural Science Foundation of China (11961065), the Humanities and Social Sciences Planning Fund of the Ministry of Education (19YJA910007), and the Xinjiang Natural Science Foundation (2019D01C045). First author: Liang Yunting, master's student, whose research concerns Bayesian spatial econometric models.

With the arrival of the information age, big data applications have become increasingly widespread, and heterogeneity problems, which manifest as heteroscedasticity, are unavoidable. When the data contain heteroscedastic errors or outlying points, variable selection results are no longer stable. Existing variable selection methods are either non-Bayesian or Bayesian. Penalty-based variable selection is the mainstream of the non-Bayesian approaches [1-9]; the most common are the LASSO (Least Absolute Shrinkage and Selection Operator) and its refinements, such as the Elastic Net (EN), the adaptive LASSO (ALASSO), the group LASSO, SCAD (Smoothly Clipped Absolute Deviation), MCP (Minimax Concave Penalty), and the least absolute deviation LASSO [7]. Although non-Bayesian methods have achieved good results, they do not provide satisfactory standard-error estimates. Reference [1] showed that when the regression parameters have independent and identical Laplace priors, the LASSO estimate can be interpreted as the posterior mode. Building on this connection and on Bayesian ideas, reference [10] proposed the Bayesian LASSO (BLASSO) and constructed the corresponding fully Bayesian hierarchical model and sampler, and reference [11] showed that, in terms of prediction mean squared error, the Bayesian LASSO performs similarly to, and in some cases better than, the frequentist LASSO. Based on the work of references [10-13], this paper combines the Bayesian LASSO with a heteroscedastic error prior so as to achieve robust variable selection and coefficient estimation, while confidence intervals for all parameters are produced automatically.

1 Hierarchical model
1.1 Gibbs sampler
Consider the linear regression model
    Y = Xβ + ε,  ε ~ N(0, σ²V),    (1)
where Y is the n × 1 response, X is the n × p matrix of explanatory variables, the error ε follows a heteroscedastic multivariate normal distribution, and V = diag(V_1, ..., V_n). The likelihood is
    L(Y | β, σ², V) = (2πσ²)^(-n/2) |V|^(-1/2) exp[ -(Y - Xβ)^T V^(-1) (Y - Xβ) / (2σ²) ].    (2)
Combining the constructions of references [10, 12], the full hierarchical model is
    Y = Xβ + ε,  ε ~ N(0, σ²V),
    β | τ_1², ..., τ_p² ~ N(0, σ² D_τ),  D_τ = diag(τ_1², ..., τ_p²),
    p(τ_1², ..., τ_p²) = Π_{j=1}^{p} (λ²/2) exp(-λ² τ_j² / 2),
    p(σ²) = [γ^α / Γ(α)] (σ²)^(-α-1) exp(-γ/σ²),  α > 0, γ > 0,
    r / V_i ~ i.i.d. χ²(r),  i = 1, ..., n.
Multiplying the likelihood by the prior distributions of the parameters gives the joint posterior
    p(β, σ², V, τ_1², ..., τ_p² | Y, X) ∝ |V|^(-1/2) (2πσ²)^(-n/2) exp[ -(Y - Xβ)^T V^(-1) (Y - Xβ)/(2σ²) ]
        × [γ^α/Γ(α)] (σ²)^(-α-1) exp(-γ/σ²)
        × Π_{j=1}^{p} (2πσ²τ_j²)^(-1/2) exp(-β_j²/(2σ²τ_j²)) (λ²/2) exp(-λ²τ_j²/2)
        × [ (r/2)^(nr/2) / Γ(r/2)^n ] Π_{i=1}^{n} V_i^(-(r+2)/2) exp(-r/(2V_i)).    (3)
From (3), the full conditional distribution of β is multivariate normal with mean B^(-1) X^T V^(-1) Y and covariance σ² B^(-1), where B = X^T V^(-1) X + D_τ^(-1); the full conditional of σ² is inverse gamma with shape parameter n/2 + p/2 + α and scale parameter (Y - Xβ)^T V^(-1) (Y - Xβ)/2 + β^T D_τ^(-1) β/2 + γ; the full conditional of 1/τ_j² is inverse Gaussian with shape parameter λ' = λ² and mean parameter μ' = sqrt(λ²σ²/β_j²); and, following reference [12], the full conditional of V is a chi-square of the form
    (e_i² σ^(-2) + r) / V_i | β, σ², V_{-i}, τ_1², ..., τ_p² ~ χ²(r + 1),
where e_i is the i-th element of e = Y - Xβ and V_{-i} = (V_1, ..., V_{i-1}, V_{i+1}, ..., V_n), i = 1, ..., n. From these full conditional posteriors the Gibbs sampler of the robust Bayesian LASSO is constructed.

Algorithm 1: Gibbs sampler for the robust Bayesian LASSO
Input: Y, X, number of iterations T_draw, burn-in length T_omit, initial values β(0), σ²(0), τ²(0), V(0)
Output: β̂, σ̂², τ̂², V̂
1: k ← 1
2: while k ≤ T_draw
3:   draw β(k) from p(β | Y, X, σ²(k-1), V(k-1), τ²(k-1))
4:   draw τ²(k) from p(τ² | Y, X, β(k), σ²(k-1), V(k-1))
5:   draw σ²(k) from p(σ² | Y, X, β(k), V(k-1), τ²(k))
6:   draw V(k) from p(V | Y, X, β(k), σ²(k), τ²(k))
7:   k ← k + 1
8: end while
9: discard the first T_omit rounds and take the posterior means of the last T_draw - T_omit rounds as the estimates.

1.2 Choice of the hyperparameter
For the hyperparameter λ², the empirical Bayes approach based on marginal maximum likelihood proposed in reference [10] can be used:
1) set k = 0 and initialize λ(0) = p sqrt(σ̂²_WLS) / Σ_{j=1}^{p} |β̂_{WLS,j}|, where σ̂²_WLS and β̂_WLS are weighted least squares estimates whose weights are the reciprocals of the absolute residuals of the ordinary least squares fit;
2) set λ = λ(k) and use the Gibbs sampler above to generate the k-th round of samples from the posterior of β, σ², τ², V;
3) use the k-th round of samples to compute the update λ(k+1) = sqrt( 2p / Σ_{j=1}^{p} E_{λ(k)}[τ_j² | Y] ) and set k = k + 1;
4) repeat steps 2)-3) until the required level of convergence.
Because the empirical Bayes approach requires running the Gibbs sampler many times, its computational cost is very high. Reference [14] proposed a single-step alternative based on stochastic approximation, which obtains the maximum likelihood estimate of the hyperparameter from a single run of the Gibbs sampler and thus greatly reduces the computation. With the transformation λ(k) = e^{s(k)}, the algorithm is:
1) set k = 0 and initialize s(0) = 0, θ(0) = (β(0), σ²(0), τ²(0), V(0));
2) generate θ(k+1) from K_{s(k)}(θ(k), ·), where K_s is the Markov kernel of the Gibbs sampler for the joint posterior p(· | Y, s);
3) set s(k+1) = s(k) + a_k ( 2p - e^{2s(k)} Σ_{j=1}^{p} τ²_{j,(k+1)} ) and k = k + 1;
4) repeat steps 2)-3) for the required number of iterations.
Here {a_k, k ≥ 0} is a non-increasing sequence of positive numbers satisfying lim_{k→∞} a_k = 0, Σ a_k = ∞ and Σ a_k² < ∞.
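The conditional draws of Algorithm 1 can be sketched compactly in R as below. This is a minimal illustration on synthetic data, not the authors' code: it keeps λ, α, γ and r fixed at arbitrary values, skips the hyperparameter update of Section 1.2, and uses rinvgauss() from the statmod package for the inverse-Gaussian draw.

# Minimal Gibbs sampler sketch for the robust Bayesian LASSO (Algorithm 1); illustrative only.
library(statmod)                          # for rinvgauss()
set.seed(3)
n <- 100; p <- 10
X <- matrix(rnorm(n * p), n, p)
beta_true <- c(3, -2, 1.5, rep(0, p - 3))
y <- X %*% beta_true + rt(n, df = 3)      # heavy-tailed errors to mimic outliers

lambda <- 1; alpha <- 2; gamma <- 1; r <- 4   # illustrative fixed hyperparameters
n_iter <- 3000; burn <- 1500
beta <- rep(0, p); sigma2 <- 1; tau2 <- rep(1, p); V <- rep(1, n)
keep <- matrix(NA, n_iter, p)

for (k in 1:n_iter) {
  # beta | . ~ N(B^{-1} X' V^{-1} y, sigma2 * B^{-1}), with B = X' V^{-1} X + D_tau^{-1}
  XtVi <- t(X) / rep(V, each = p)         # X' V^{-1}
  B    <- XtVi %*% X + diag(1 / tau2)
  Binv <- solve(B)
  mu   <- Binv %*% (XtVi %*% y)
  beta <- as.numeric(mu + t(chol(sigma2 * Binv)) %*% rnorm(p))
  # 1/tau_j^2 | . ~ Inverse-Gaussian(mean = sqrt(lambda^2 sigma2 / beta_j^2), shape = lambda^2)
  tau2 <- 1 / rinvgauss(p, mean = sqrt(lambda^2 * sigma2 / beta^2), shape = lambda^2)
  # sigma2 | . ~ Inv-Gamma(n/2 + p/2 + alpha, e'V^{-1}e/2 + beta'D_tau^{-1}beta/2 + gamma)
  e      <- as.numeric(y - X %*% beta)
  scale_ <- sum(e^2 / V) / 2 + sum(beta^2 / tau2) / 2 + gamma
  sigma2 <- 1 / rgamma(1, shape = n / 2 + p / 2 + alpha, rate = scale_)
  # V_i | . : (e_i^2/sigma2 + r) / V_i ~ chi-square(r + 1)
  V <- (e^2 / sigma2 + r) / rchisq(n, df = r + 1)
  keep[k, ] <- beta
}
beta_hat <- colMeans(keep[(burn + 1):n_iter, ])
round(beta_hat, 2)                        # posterior means after burn-in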
2 Numerical simulation
This section evaluates the experimental properties and advantages of the robust Bayesian LASSO under the heteroscedastic error prior. Data are generated from model (1) with X = [ι_n, X'], where ι_n is the n-dimensional vector of ones and X' = [X_1, X_2, ..., X_{p-1}] is drawn from a multivariate normal distribution N(0, Σ) with Σ_{ij} = 0.5^{|i-j|}. To vary the sparsity of the coefficient vector, all simulations use n = 100 and p = 50 with the number of nonzero coefficients q ∈ {10, 20}. To test the adaptiveness of the shrinkage, half of the nonzero coefficients are drawn from N(0, 1) and the other half from N(0, 5), so that half of them are close to zero while the other half vary more widely; the remaining coefficients are set to zero. Each simulation uses 5000 iterations, with the posterior means of the last 2500 draws taken as the estimates, and each setting is repeated 100 times to avoid chance results. To examine robustness to outliers, four different error distributions ε are considered.
Example 1 (heteroscedastic errors): the n observations are assigned to random groups following reference [15], with the number of groups drawn from a uniform distribution U(3, 20). If the number of groups is greater than 10, the variance of all samples in a group is set equal to the number of groups; otherwise the variance is set to the square of the reciprocal of the number of groups. The i-th element of ε is ε_i = σ_i ξ_i, where σ_i is the standard deviation of the i-th observation and the ξ_i are i.i.d. standard normal N(0, 1).
Example 2 (contaminated distribution): ε follows a contaminated distribution in which the first 90% of the entries come from the standard normal distribution and the remaining 10% from the standard Cauchy distribution.
Example 3 (Cauchy distribution): ε follows the standard Cauchy distribution.
Example 4 (Laplace distribution): ε follows the standard Laplace distribution.
Coefficient estimation and variable selection performance are measured by the mean squared error (MSE) and the balanced accuracy rate (BAR). The balanced accuracy rate jointly accounts for correctly selected, falsely selected and missed variables:
    BAR = (1/2) [ TP/(TP + FN) + TN/(TN + FP) ],
where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives. The proposed robust Bayesian LASSO is abbreviated RBLASSO. Table 1 lists the results of several common methods without the heteroscedastic error prior together with RBLASSO; each entry is an average over 100 simulations. Note that variable selection for the Bayesian methods is based on the 95% confidence intervals of the parameters: a parameter whose 95% interval contains 0 is treated as identified as zero. The simulation results show that the proposed method has the best overall performance in most cases, and all of its performance measures are best when the errors are heteroscedastic. As the number of nonzero coefficients q increases, i.e., as the coefficient vector becomes denser, every method tends to estimate slightly worse, because more nonzero parameters must be estimated from the same number of observations. When the errors follow the standard Cauchy distribution (Example 3), the MSE(β̂) of the Bayesian LASSO without the heteroscedastic error prior is much larger than under the other error distributions, whereas RBLASSO retains good coefficient estimation and variable selection ability, and its MSE(β̂) even decreases as q increases, which shows the importance of the heteroscedastic error prior for resisting outliers.

Table 1  Variable selection results of the different models under the four disturbances, based on 100 simulations
                     q = 10                   q = 20
Method               MSE(β̂)      BAR         MSE(β̂)      BAR
Example 1
  BLASSO              0.0788      0.7269      0.1052      0.7294
  LASSO               0.0568      0.7201      0.0878      0.6884
  ALASSO              0.0510      0.7341      0.1038      0.7057
  RBLASSO             0.0148      0.8370      0.0484      0.7933
Example 2
  BLASSO              0.4144      0.7427      0.3432      0.7846
  LASSO               0.0998      0.7151      0.2648      0.6533
  ALASSO              0.1008      0.7647      0.2406      0.7695
  RBLASSO             0.1124      0.7683      0.2724      0.8082
Example 3
  BLASSO             19.8566      0.5829     60.0574      0.5235
  LASSO               0.5384      0.6200      0.4666      0.5561
  ALASSO              0.7062      0.6079      0.7872      0.5521
  RBLASSO             0.6594      0.6369      0.3542      0.5879
Example 4
  BLASSO              0.0304      0.9353      0.0474      0.8276
  LASSO               0.0194      0.7854      0.0450      0.6699
  ALASSO              0.0178      0.8524      0.0352      0.8134
  RBLASSO             0.0302      0.9244      0.0552      0.8132

3 Case studies
3.1 Diabetes data set
The proposed robust Bayesian LASSO is applied to the diabetes data set provided by reference [16], which contains 442 samples and 11 variables. The 10 explanatory variables are age, sex, body mass index (bmi), mean arterial blood pressure (map) and six serum measurements (tc, ldl, hdl, tch, ltg, glu), and the response is a quantitative measure of disease progression one year after baseline. The data used here come from the R package care, and all variables have been standardized to mean 0 and variance 1. To study the robustness of the proposed method, a noise term c equal to three times the standard deviation of the response is added to the response of a randomly chosen 20% of the samples; 70% of the data are randomly assigned to the training set and the remaining 30% to the test set. Prediction mean squared error (MSE) and median absolute prediction error (MAPE) are used as evaluation criteria. Figure 1 shows boxplots of the variables, from which both the explanatory variables and the response are seen to contain outliers. Figure 2 plots the studentized residuals against the hat values, with circle areas proportional to the observations' Cook's distances, vertical reference lines at two and three times the average hat value, and horizontal reference lines at studentized residuals of 0 and 2. Further analysis identifies samples 295 and 305 as outliers and samples 323 and 354 as high-leverage points; with 4/(n - k - 1) as the Cook's distance threshold there are 35 influential points.

Figure 1  Boxplots of the variables in the diabetes data set
Figure 2  Bubble plot of studentized residuals versus hat values, with circle areas proportional to the observations' Cook's distances

The estimation results of the models are given in Table 2, where coefficient estimates shown in bold in the original have confidence intervals containing 0. BLASSO and RBLASSO both exclude the same 7 unimportant variables, whereas LASSO and ALASSO exclude only 4 unimportant variables, and these 4 variables (sex, ldl, tch, glu) are the ones excluded by all four models. In terms of MSE and MAPE, the proposed method has the lowest prediction error. In addition, Figure 3 shows that, compared with BLASSO, the RBLASSO with the heteroscedastic prior has shorter confidence intervals, so its results should be more reliable.

Table 2  Estimation results for the diabetes data set under the different methods
Variable    LS         WLS        BLASSO     RBLASSO    LASSO      ALASSO
age         -0.0026    -0.0949    -0.0661    -0.0491    -0.0831    -0.1119
sex          0.0120    -0.0282     0.0041    -0.0489     0          0
bmi          0.4409     0.4175     0.4159     0.3417     0.4315     0.4428
map          0.2850     0.2513     0.2368     0.1643     0.2525     0.2735
tc          -1.0098    -0.8514    -0.0514    -0.0735    -0.0987    -0.1337
ldl          0.7508     0.5823    -0.0227    -0.0485     0          0
hdl          0.2928     0.2336    -0.0582    -0.0591    -0.0374    -0.0359
tch          0.0067     0.0243     0.0047     0.0288     0          0
ltg          0.7754     0.6881     0.3558     0.3882     0.3903     0.4236
glu         -0.0155    -0.0026     0.0077     0.0186     0          0
MSE        278.7343   273.4827   272.9438   266.5315   274.3907   276.0192
MAPE         0.6614     0.5506     0.6066     0.5481     0.6122     0.6211
(LS = least squares, WLS = weighted least squares.)

Figure 3  Coefficient estimates and corresponding 95% confidence intervals for each variable of the diabetes data set under the different methods

3.2 Plasma beta-carotene level data set
The data set of reference [17] contains 315 patients who had undergone biopsy or resection of non-cancerous lesions of the lung, colon, breast, skin, ovary or uterus within the previous three years; the 273 female patients are taken as the study subjects. The data set has 11 variables. The 10 explanatory variables are age, smoking status (smokstat), Quetelet index (quetelet), vitamin use (vituse), calories consumed per day (calories), grams of fat consumed per day (fat), grams of fiber consumed per day (fiber), number of alcoholic drinks per week (alcohol), cholesterol intake (mg/day, chol) and dietary beta-carotene consumption (mcg/day, betadiet); the response is plasma beta-carotene (ng/ml). All variables are standardized to mean 0 and variance 1; 70% of the data are randomly used as the training set to fit the models and the remaining 30% as the test set, on which the prediction mean squared error (MSE) and median absolute prediction error (MAPE) are computed to assess predictive ability. Figures 4 and 5 show histograms of plasma beta-carotene and of cholesterol intake, both of which contain outliers. Applying the models to these data gives the estimation results in Table 3: BLASSO and RBLASSO both identify quetelet, vituse and betadiet as important variables, whereas LASSO and ALASSO exclude only the calories variable. Although RBLASSO does not have the lowest MAPE, it is very close to the lowest value (that of BLASSO), and its MSE is far lower than that of the other methods, so overall the RBLASSO model has the best predictive ability. Moreover, Figure 6 shows that RBLASSO has clearly shorter confidence intervals than BLASSO and hence higher estimation precision.

Figure 4  Histogram of plasma beta-carotene
Figure 5  Histogram of cholesterol intake

Table 3  Estimation results for the plasma beta-carotene level data set under the different methods
Variable    LS         WLS        BLASSO     RBLASSO    LASSO      ALASSO
age          0.0623     0.0507     0.0486     0.0748     0.0547     0.0641
smokstat    -0.0460    -0.0346    -0.0337    -0.0201    -0.0328    -0.0424
quetelet    -0.2052    -0.1818    -0.1836    -0.1380    -0.1946    -0.2023
vituse      -0.2655    -0.2400    -0.2286    -0.1367    -0.2472    -0.2564
calories    -0.0804    -0.2062    -0.0117    -0.0257     0          0
fat         -0.0514     0.0709    -0.0593    -0.0062    -0.0911    -0.1021
fiber        0.2341     0.2197     0.1691     0.0495     0.1838     0.1992
alcohol      0.1600     0.1044     0.1037     0.0304     0.1289     0.1453
chol        -0.0468    -0.0430    -0.0384    -0.0161    -0.0402    -0.0473
betadiet     0.2360     0.2223     0.2150     0.1526     0.2273     0.2353
MSE         34.9141    28.5922    29.4653    20.3703    32.3853    34.2673
MAPE         0.3466     0.3437     0.3239     0.3269     0.3536     0.3641

Figure 6  Coefficient estimates and corresponding 95% confidence intervals for each variable of the plasma beta-carotene level data set under the different methods

4 Conclusion
By introducing a heteroscedastic error prior into the Bayesian LASSO, this paper proposes a robust Bayesian LASSO model, together with the corresponding Bayesian hierarchical model and Gibbs sampler, thereby improving robustness to outliers and heteroscedastic errors. The numerical simulations and the empirical analyses show that, when outliers or heteroscedastic errors are present, the method achieves a more parsimonious model with lower error, i.e., robust variable selection. Moreover, because the model rests on Bayesian ideas, confidence intervals for the estimates are obtained directly, compensating for the inability of LASSO-type methods to provide good uncertainty assessments.

References:
[1] TIBSHIRANI R. Regression Shrinkage and Selection via the Lasso [J]. Journal of the Royal Statistical Society Series B: Statistical Methodology, 1996, 58(1): 267-288.
[2] ZOU H, HASTIE T. Regularization and Variable Selection via the Elastic Net [J]. Journal of the Royal Statistical Society Series B: Statistical Methodology, 2005, 67(2): 301-320.
[3] ZOU H. The Adaptive Lasso and Its Oracle Properties [J]. Journal of the American Statistical Association, 2006, 101(476): 1418-1429.
[4] YUAN M, LIN Y. Model Selection and Estimation in Regression with Grouped Variables [J]. Journal of the Royal Statistical Society Series B: Statistical Methodology, 2006, 68(1): 49-67.
[5] FAN J Q, LI R Z. Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties [J]. Journal of the American Statistical Association, 2001, 96(456): 1348-1360.
[6] ZHANG C H. Nearly Unbiased Variable Selection under Minimax Concave Penalty [J]. The Annals of Statistics, 2010, 38(2): 894-942.
[7] WANG H S, LI G D, JIANG G H. Robust Regression Shrinkage and Consistent Variable Selection through the LAD-Lasso [J]. Journal of Business & Economic Statistics, 2007, 25(3): 347-355.
[8] WU Y, LIU Y. Variable Selection in Quantile Regression [J]. Statistica Sinica, 2009, 19(2): 801-817.
[9] WANG X Q, JIANG Y L, HUANG M, et al. Robust Variable Selection with Exponential Squared Loss [J]. Journal of the American Statistical Association, 2013, 108(502): 632-643.
[10] PARK T, CASELLA G. The Bayesian Lasso [J]. Journal of the American Statistical Association, 2008, 103(482): 681-686.
[11] KYUNG M, GILL J, GHOSH M, et al. Penalized Regression, Standard Errors, and Bayesian Lassos [J]. Bayesian Analysis, 2010, 5(2): 369-412.
[12] GEWEKE J. Bayesian Treatment of the Independent Student-t Linear Model [J]. Journal of Applied Econometrics, 1993, 8(S1): S19-S40.
[13] LANGE K L, LITTLE R J A, TAYLOR J M G. Robust Statistical Modeling Using the t Distribution [J]. Journal of the American Statistical Association, 1989, 84(408): 881-896.
[14] ATCHADÉ Y F. A Computational Framework for Empirical Bayes Inference [J]. Statistics and Computing, 2011, 21(4): 463-473.
[15] LIN X, LEE L F. GMM Estimation of Spatial Autoregressive Models with Unknown Heteroskedasticity [J]. Journal of Econometrics, 2010, 157(1): 34-52.
[16] EFRON B, HASTIE T, JOHNSTONE I, et al. Least Angle Regression [J]. The Annals of Statistics, 2004, 32(2): 407-499.
[17] NIERENBERG D W, STUKEL T A, BARON J A, et al. Determinants of Plasma Levels of beta-Carotene and Retinol [J]. American Journal of Epidemiology, 1989, 130(3): 511-521.

DOA Estimation of Signals Based on Sparse Reconstruction


Ren Xiaoli; Wang Ji; Wan Qun
【Abstract】An improved direction-of-arrival (DOA) estimation method is proposed from the perspective of sparse signal reconstruction.

Because a minimum redundancy linear array (MRLA) can obtain a large array aperture with a small number of elements, the MRLA is combined with the ℓ1-SVD method to estimate the signal DOAs.

Simulation results over repeated experiments show that the proposed method is effective: compared with the ℓ1-SVD method it can estimate the DOAs of more sources, and it can estimate more source DOAs with fewer array elements, i.e., it can handle an overloaded source scenario.

This paper proposes a modified direction of arrival (DOA) estimation method based on a minimum redundancy linear array (MRLA) from the sparse signal reconstruction perspective. Exploiting the structural feature of the MRLA of obtaining a larger antenna aperture with a smaller number of array sensors, the MRLA is combined with the ℓ1-SVD method to estimate signal DOAs. Simulations demonstrate that the proposed method is effective; compared with the ℓ1-SVD method it can estimate more signal-source DOAs, and it is capable of estimating more DOAs with fewer antenna elements.
【Journal】Computer Engineering and Applications (《计算机工程与应用》)
【Year (Volume), Issue】2015 (000) 001
【Pages】6 pages (P195-199, 217)
【Keywords】direction of arrival (DOA); sparse signal reconstruction; minimum redundancy linear array (MRLA); ℓ1-SVD
【Authors】Ren Xiaoli; Wang Ji; Wan Qun
【Affiliations】College of Information, Guangdong Ocean University, Zhanjiang, Guangdong 524088, China; School of Electronic Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
【Language】Chinese
【CLC Number】TN911.7
1 Introduction
Source localization is one of the main goals of signal processing; with a sensor array it can be converted into DOA estimation.
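To make the array geometry and the sparse-reconstruction idea concrete, the following R sketch builds the steering matrix of a 4-element minimum redundancy linear array (element positions 0, 1, 4, 6 in half-wavelength units), simulates two narrowband sources, reduces the snapshots with an SVD, and recovers the DOAs on a 1-degree grid with a small complex-valued orthogonal matching pursuit. OMP is used here only as a stand-in for the paper's ℓ1-SVD solver, and the scenario (angles, noise level, snapshot count) is made up for illustration.

# Hypothetical MRLA + sparse-grid DOA sketch (OMP stand-in for the l1-SVD solver).
set.seed(4)
pos   <- c(0, 1, 4, 6)                         # MRLA element positions (half-wavelengths)
grid  <- seq(-90, 90, by = 1)                  # angular grid (degrees)
steer <- function(theta) exp(1i * pi * pos * sin(theta * pi / 180))
A     <- sapply(grid, steer)                   # 4 x 181 dictionary of steering vectors

doa_true  <- c(-20, 35)                        # two sources
snapshots <- 200
S <- matrix(complex(real = rnorm(2 * snapshots), imaginary = rnorm(2 * snapshots)) / sqrt(2), 2)
N <- matrix(complex(real = rnorm(4 * snapshots), imaginary = rnorm(4 * snapshots)), 4) * 0.05
X <- sapply(doa_true, steer) %*% S + N         # received data, 4 x snapshots

sv <- svd(X)
y  <- sv$u[, 1] * sv$d[1]                      # dominant left singular vector (crude analogue of the SVD step)

omp <- function(A, y, k) {                     # complex-valued orthogonal matching pursuit
  r <- y; sel <- integer(0)
  for (i in 1:k) {
    corr <- abs(Conj(t(A)) %*% r)              # correlate residual with every atom
    sel  <- union(sel, which.max(corr))
    As   <- A[, sel, drop = FALSE]
    coef <- solve(Conj(t(As)) %*% As, Conj(t(As)) %*% y)
    r    <- y - As %*% coef                    # new residual after least-squares fit
  }
  sel
}
grid[omp(A, y, 2)]                             # estimated DOAs, should be near -20 and 35 degrees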

kosel: Variable Selection by Revisited Knockoffs Procedures (package documentation)


Package‘kosel’October13,2022Title Variable Selection by Revisited Knockoffs ProceduresVersion0.0.1Description Performs variable selection for many types of L1-regularised regressions using the revis-ited knockoffs procedure.This procedure uses a matrix of knockoffs of the covariates indepen-dent from the response variable Y.The idea is to determine if a covariate be-longs to the model depending on whether it enters the model before or after its knock-off.The procedure suits for a wide range of regressions with various types of response vari-ables.Regression models available are exported from the R packages'glmnet'and'ordinal-Net'.Based on the paper linked to via the URL below:Gegout A.,Gueudin A.,Kar-mann C.(2019)<arXiv:1907.03153>.URL https:///pdf/1907.03153.pdfLicense GPL-3Depends R(>=1.1)Encoding UTF-8LazyData trueRoxygenNote6.1.1Imports glmnet,ordinalNetSuggests graphicsNeedsCompilation noAuthor Clemence Karmann[aut,cre],Aurelie Gueudin[aut]Maintainer Clemence Karmann<**************************>Repository CRANDate/Publication2019-07-1810:44:06UTCR topics documented:ko.glm (2)ko.ordinal (3)ko.sel (4)Index812ko.glm ko.glm Statistics of the knockoffs procedure for glmnet regression models.DescriptionReturns the vector of statistics W of the revisited knockoffs procedure for regressions available in the R package glmnet.Most of the parameters come from glmnet().See glmnet documentation for more details.Usageko.glm(x,y,family="gaussian",alpha=1,type.gaussian=ifelse(nvars<500,"covariance","naive"),type.logistic="Newton",type.multinomial="ungrouped",nVal=50,random=FALSE)Argumentsx Input matrix,of dimension nobs x nvars;each row is an observation vector.Can be in sparse matrix format(inherit from class"sparseMatrix"as in packageMatrix;not yet available for family="cox")y Response variable.Quantitative for family="gaussian",or family="poisson"(non-negative counts).For family="binomial"should be either a factor with twolevels,or a two-column matrix of counts or proportions(the second column istreated as the target class;for a factor,the last level in alphabetical order is thetarget class).For family="multinomial",can be a nc>=2level factor,or amatrix with nc columns of counts or proportions.For either"binomial"or"multinomial",if y is presented as a vector,it will be coerced into a factor.Forfamily="cox",y should be a two-column matrix with columns named’time’and’status’.The latter is a binary variable,with’1’indicating death,and’0’indicating right censored.The function Surv()in package survival producessuch a matrix.family Response type:"gaussian","binomial","poisson","multinomial","cox".Not avail-able for"mgaussian".alpha The elasticnet mixing parameter,with0<=alpha<=1.alpha=1is the lasso penalty,and alpha=0the ridge penalty.The default is1.type.gaussian See glmnet documentation.type.logistic See glmnet documentation.type.multinomialSee glmnet documentation.nVal Length of lambda sequence-default is50.random If TRUE,the matrix of knockoffs is different for every run.If FALSE,a seed is used so that the knockoffs are the same.The default is FALSE.ko.ordinal3ValueA vector of dimension nvars corresponding to the statistics W.See Alsoko.selExamples#see ko.selko.ordinal Statistics of the knockoffs procedure for ordinalNet regression models.DescriptionReturns the vector of statistics W of the revisited knockoffs procedure for regressions available in the R package ordinalNet.Most of the parameters come from ordinalNet().See ordinalNet documentation for more 
details.Usageko.ordinal(x,y,family="cumulative",reverse=FALSE,link="logit",alpha=1,parallelTerms=TRUE,nonparallelTerms=FALSE,nVal=100,warn=FALSE,random=FALSE)Argumentsx Covariate matrix,of dimension nobs x nvars;each row is an observation vector.It is recommended that categorical covariates are converted to a set of indicatorvariables with a variable for each category(i.e.no baseline category);otherwisethe choice of baseline category will affect the modelfit.y Response variable.Can be a factor,ordered factor,or a matrix where each row isa multinomial vector of counts.A weightedfit can be obtained using the matrixoption,since the row sums are essentially observation weights.Non-integermatrix entries are allowed.family Specifies the type of model family.Options are"cumulative"for cumulative probability,"sratio"for stopping ratio,"cratio"for continuation ratio,and"acat"for adjacent category.reverse Logical.If TRUE,then the"backward"form of the model isfit,i.e.the model is defined with response categories in reverse order.For example,the reversecumulative model with K+1response categories applies the link function to thecumulative probabilities P(Y>=2),...,P(Y>=K+1),rather then P(Y<=1),...,P(Y<=K).link Specifies the link function.The options supported are logit,probit,complemen-tary log-log,and cauchit.alpha The elastic net mixing parameter,with0<=alpha<=1.alpha=1corresponds to the lasso penalty,and alpha=0corresponds to the ridge penalty.parallelTerms Logical.If TRUE,then parallel coefficient terms will be included in the model.parallelTerms and nonparallelTerms cannot both be FALSE.nonparallelTermsLogical.if TRUE,then nonparallel coefficient terms will be included in themodel.parallelTerms and nonparallelTerms cannot both be FALSE.Defaultis FALSE.nonparallelTerms=TRUE is highly discouraged.nVal Length of lambda sequence-default is100.warn Logical.If TRUE,the following warning message is displayed whenfitting a cu-mulative probability model with nonparallelTerms=TRUE(i.e.nonparallel orsemi-parallel model)."Warning message:For out-of-sample data,the cumula-tive probability model with nonparallelTerms=TRUE may predict cumulativeprobabilities that are not monotone increasing."The warning is displayed bydefault,but the user may wish to disable it.random If TRUE,the matrix of knockoffs is different for every run.If FALSE,a seed is used so that the knockoffs are the same.The default is FALSE.ValueA vector of dimension nvars corresponding to the statistics W.NotenonparallelTerms=TRUE is highly discouraged because the knockoffs procedure does not suit well to this setting.See Alsoko.selExamples#see ko.selko.sel Variable selection with the knockoffs procedure.DescriptionPerforms variable selection from an object(vector of statistics W)returned by ko.glm or ko.ordinal.Usageko.sel(W,print=FALSE,method="stats")ArgumentsW A vector of length nvars corresponding to the statistics W.Object returned by the functions ko.glm or ko.ordinal.print Logical.If TRUE,positive statistics W are displayed in increasing order.If FALSE,nothing is displayed.If method= manual ,print is automaticallyTRUE.method Can be stats , gaps or manual .If stats ,the threshold used is the W-threshold.If gaps ,the threshold used is the gaps-threshold.If manual ,the user can choose its own threshold using the graph of the positive statistics Wsorted in increasing order.ValueA list containing two elements:•threshold A positive real value corresponding to the threshold used.•estimation A binary vector of length nvars corresponding to the 
variable selection: 1*(W >= threshold). 1 indicates that the associated covariate belongs to the estimated model.

References
Gegout-Petit Anne, Gueudin Aurelie, Karmann Clemence (2019). The revisited knockoffs method for variable selection in L1-penalised regressions, arXiv:1907.03153.

See Also
ko.glm, ko.ordinal

Examples
library(graphics)
# linear Gaussian regression
n = 100
p = 20
set.seed(11)
x = matrix(rnorm(n*p), nrow = n, ncol = p)
beta = c(rep(1, 5), rep(0, 15))
y = x %*% beta + rnorm(n)
W = ko.glm(x, y)
ko.sel(W, print = TRUE)

# logistic regression
n = 100
p = 20
set.seed(11)
x = matrix(runif(n*p, -1, 1), nrow = n, ncol = p)
u = runif(n)
beta = c(c(3:1), rep(0, 17))
y = rep(0, n)
a = 1/(1 + exp(0.1 - x %*% beta))
y = 1*(u > a)
W = ko.glm(x, y, family = 'binomial', nVal = 50)
ko.sel(W, print = TRUE)

# cumulative logit regression
n = 100
p = 10
set.seed(11)
x = matrix(runif(n*p), nrow = n, ncol = p)
u = runif(n)
beta = c(3, rep(0, 9))
y = rep(0, n)
a = 1/(1 + exp(0.8 - x %*% beta))
b = 1/(1 + exp(-0.6 - x %*% beta))
y = 1*(u < a) + 2*((u >= a) & (u < b)) + 3*(u >= b)
W = ko.ordinal(x, as.factor(y), nVal = 20)
ko.sel(W, print = TRUE)

# adjacent logit regression
n = 100
p = 10
set.seed(11)
x = matrix(rnorm(n*p), nrow = n, ncol = p)
U = runif(n)
beta = c(5, rep(0, 9))
alpha = c(-2, 1.5)
M = 2
y = rep(0, n)
for(i in 1:n){
  eta = alpha + sum(beta*x[i,])
  u = U[i]
  Prob = rep(1, M + 1)
  for(j in 1:M){
    Prob[j] = exp(sum(eta[j:M]))
  }
  Prob = Prob/sum(Prob)
  C = cumsum(Prob)
  C = c(0, C)
  j = 1
  while((C[j] > u) || (u >= C[j+1])){
    j = j + 1
  }
  y[i] = j
}
W = ko.ordinal(x, as.factor(y), family = 'acat', nVal = 10)
ko.sel(W, method = 'manual')
0.4

# How to use randomness?
n = 100
p = 20
set.seed(11)
x = matrix(rnorm(n*p), nrow = n, ncol = p)
beta = c(5:1, rep(0, 15))
y = x %*% beta + rnorm(n)
Esti = 0
for(i in 1:100){
  W = ko.glm(x, y, random = TRUE)
  Esti = Esti + ko.sel(W, method = 'gaps')$estimation
}
Esti

Index: ko.glm, ko.ordinal, ko.sel

Big Data Fundamentals Training Series: Machine Learning Algorithms (slide deck)


Scan the entire database once and count how often each 1-itemset occurs.

Prune: the 1-itemsets that satisfy the support and confidence thresholds move on to the next round, in which the occurring 2-itemsets are searched for.

Repeat, counting the itemsets at each level, until the itemset size defined beforehand is reached.
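A small base-R sketch of these counting-and-pruning passes, on a made-up set of transactions and with an arbitrary support threshold, looks like this:

# Toy Apriori-style frequent itemset counting (illustrative data and threshold).
transactions <- list(
  c("milk", "bread", "butter"),
  c("bread", "butter"),
  c("milk", "bread"),
  c("milk", "butter"),
  c("bread", "butter", "beer")
)
min_support <- 0.4                               # minimum fraction of transactions

support <- function(itemset) {
  mean(sapply(transactions, function(t) all(itemset %in% t)))
}

# Pass 1: count 1-itemsets and prune by support
items <- unique(unlist(transactions))
freq1 <- items[sapply(items, function(i) support(i) >= min_support)]

# Pass 2: candidate 2-itemsets built from the survivors, then prune again
cand2 <- combn(freq1, 2, simplify = FALSE)
freq2 <- Filter(function(s) support(s) >= min_support, cand2)
freq2                                            # frequent 2-itemsets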
8. Classic algorithms: Expectation Maximization
• Matrix Factorization  ① Principal component analysis  ② Truncated singular value decomposition  ③ Dictionary Learning  ④ Factor Analysis  ⑤ Independent component analysis  ⑥ Non-negative matrix factorization  ⑦ Latent Dirichlet Allocation
Recursively build a binary tree. For regression trees, minimization of the L1 or L2 loss is used as the splitting criterion; for classification trees, minimization of Gini impurity or maximization of information gain is used as the splitting criterion.
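For illustration, both kinds of tree can be grown in R with the rpart package, whose "anova" method splits by minimizing the squared-error (L2) loss and whose "class" method splits on the Gini index by default (information gain can be selected via parms); the built-in mtcars and iris data sets are used purely as examples.

library(rpart)
# Regression tree: squared-error (L2) splitting on a numeric response
reg_tree <- rpart(mpg ~ wt + hp + disp, data = mtcars, method = "anova")
# Classification tree: Gini-impurity splitting on a categorical response
cls_tree <- rpart(Species ~ ., data = iris, method = "class")
printcp(reg_tree)        # complexity table of the regression tree
print(cls_tree)          # splits chosen for the classification tree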
Case study: in August 2017, the longitudinal acceleration data (five measurement points) for the Yutong vehicle and competing models, provided by Cao Shoutian (曹受天) of the experiment centre, were analysed to find the conditions with the greatest discriminating power and thereby understand the differences from the competing models.
5. Classic algorithms: k-means clustering
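A minimal k-means run on synthetic two-dimensional data (the cluster count and data are invented for the example) using base R:

set.seed(5)
pts <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
             matrix(rnorm(100, mean = 4), ncol = 2))   # two synthetic clusters
km <- kmeans(pts, centers = 2, nstart = 20)            # k-means with 20 random restarts
km$centers                                             # estimated cluster centres
table(km$cluster)                                      # cluster sizes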
• Biclustering  ① Spectral Co-Clustering  ② Spectral Biclustering

• Novelty and Outlier Detection  ① One-class SVM  ② Elliptic envelope  ③ Isolation Forest  ④ Local outlier factor

• Regression  ① Ordinary Least Squares  ② Elastic Net  ③ Orthogonal Matching Pursuit  ④ Bayesian Regression  ⑤ Random Sample Consensus  ⑥ Polynomial regression  ⑦ Kernel Ridge Regression  ⑧ Support Vector Regression  ⑨ Stochastic Gradient Descent  ⑩ Nearest Neighbors

hibayes 3.0.1: User Guide for Individual-Level, Summary-Level and Single-Step Bayesian Regression Models


Package‘hibayes’November28,2023Title Individual-Level,Summary-Level and Single-Step BayesianRegression ModelVersion3.0.1Date2023-11-27Description A user-friendly tool tofit Bayesian regression models.It canfit3types of Bayesian mod-els using individual-level,summary-level,and individual plus pedigree-level(single-step)data for both Genomic prediction/selection(GS)and Genome-Wide Associa-tion Study(GW AS),it was designed to estimate joint effects and genetic parameters for a com-plex trait,including:(1)fixed effects and coefficients of covariates,(2)environmental random effects,and its corresponding variance,(3)genetic variance,(4)residual variance,(5)heritability,(6)genomic estimated breeding values(GEBV)for both genotyped and non-genotyped individuals,(7)SNP effect size,(8)phenotype/genetic variance explained(PVE)for single or multiple SNPs,(9)posterior probability of association of the genomic window(WPPA),(10)posterior inclusive probability(PIP).The functions are not limited,we will keep on going in enriching it with more features.References:Meuwissen et al.(2001)<doi:10.1093/genetics/157.4.1819>;Gus-tavo et al.(2013)<doi:10.1534/genetics.112.143313>;Habier et al.(2011)<doi:10.1186/1471-2105-12-186>;Yi et al.(2008)<doi:10.1534/genetics.107.085589>;Zhou et al.(2013)<doi:10.1371/journal.pgen.1003264>;Moser Jones et al.(2019)<doi:10.1038/s41467-019-12653-0>;Hender-son(1976)<doi:10.2307/2529339>;Fernando et al.(2014)<doi:10.1186/1297-9686-46-50>.License GPL-3Maintainer Lilin Yin<**************>URL https:///YinLiLin/hibayesBugReports https:///YinLiLin/hibayes/issuesEncoding UTF-8Imports utils,stats,methods,stringr,CMplot12ibrmDepends R(>=3.3.0),bigmemory,MatrixLinkingTo Rcpp,RcppArmadillo(>=0.9.600.0.0),RcppProgress,BH,bigmemory,MatrixRoxygenNote7.2.3NeedsCompilation yesAuthor Lilin Yin[aut,cre,cph],Haohao Zhang[aut,cph],Xiaolei Liu[aut,cph]Repository CRANDate/Publication2023-11-2813:00:03UTCR topics documented:ibrm (2)ldmat (6)read_plink (8)sbrm (9)ssbrm (12)Index17 ibrm Bayes modelDescriptionBayes linear regression model using individual level datay=Xβ+Rr+Mα+ewhereβis a vector of estimated coefficient for covariates,and r is a vector of environmental random effects.M is a matrix of genotype covariate,αis a vector of estimated marker effect size.e is a vector of residuals.Usageibrm(formula,data=NULL,M=NULL,M.id=NULL,method=c("BayesCpi","BayesA","BayesL","BSLMM","BayesR","BayesB","BayesC", "BayesBpi","BayesRR"),map=NULL,Pi=NULL,fold=NULL,ibrm3 niter=NULL,nburn=NULL,thin=5,windsize=NULL,windnum=NULL,dfvr=NULL,s2vr=NULL,vg=NULL,dfvg=NULL,s2vg=NULL,ve=NULL,dfve=NULL,s2ve=NULL,lambda=0,printfreq=100,seed=666666,threads=4,verbose=TRUE)Argumentsformula a two-sided linear formula object describing both thefixed-effects and random-effects part of the model,with the response on the left of a‘~’operator andthe terms,separated by‘+’operators,on the right.Random-effects terms aredistinguished by vertical bars(1|’)separating expressions for design matricesfrom grouping factors.data the data frame containing the variables named in’formula’,NOTE that thefirst column in’data’should be the individual id.M numeric matrix of genotype with individuals in rows and markers in columns, NAs are not allowed.M.id vector of id for genotyped individuals,NOTE that no need to adjust the order of id to be the same between’data’and’M’,the package will do it automatically.method bayes methods including:"BayesB","BayesA","BayesL","BayesRR","Bayes-Bpi","BayesC","BayesCpi","BayesR","BSLMM".•"BayesRR":Bayes Ridge Regression,all 
SNPs have non-zero effects andshare the same variance,equals to RRBLUP or GBLUP.•"BayesA":all SNPs have non-zero effects,and take different variance whichfollows an inverse chi-square distribution.•"BayesB":only a small proportion of SNPs(1-Pi)have non-zero effects,and take different variance which follows an inverse chi-square distribution.•"BayesBpi":the same with"BayesB",but’Pi’is notfixed.•"BayesC":only a small proportion of SNPs(1-Pi)have non-zero effects,and share the same variance.•"BayesCpi":the same with"BayesC",but’Pi’is notfixed.•"BayesL":BayesLASSO,all SNPs have non-zero effects,and take differentvariance which follows an exponential distribution.4ibrm•"BSLMM":all SNPs have non-zero effects,and take the same variance,buta small proportion of SNPs have additional shared variance.•"BayesR":only a small proportion of SNPs have non-zero effects,and theSNPs are allocated into different groups,each group has the same variance.map(optional,only for GW AS)the map information of genotype,at least3columns are:SNPs,chromosome,physical position.Pi vector,the proportion of zero effect and non-zero effect SNPs,thefirst value must be the proportion of non-effect markers.fold proportion of variance explained for groups of SNPs,the default is c(0,0.0001,0.001,0.01).niter the number of MCMC iteration.nburn the number of iterations to be discarded.thin the number of thinning after burn-in.Note that smaller thinning frequency may have higher accuracy of estimated parameters,but would result in more memoryfor collecting process,on contrary,bigger frequency may have negative effecton accuracy of estimations.windsize window size in bp for GW AS,the default is NULL.windnumfixed number of SNPs in a window for GW AS,if it is specified,’windsize’will be invalid,the default is NULL.dfvr the number of degrees of freedom for the distribution of environmental variance.s2vr scale parameter for the distribution of environmental variance.vg prior value of genetic variance.dfvg the number of degrees of freedom for the distribution of genetic variance.s2vg scale parameter for the distribution of genetic variance.ve prior value of residual variance.dfve the number of degrees of freedom for the distribution of residual variance.s2ve scale parameter for the distribution of residual variance.lambda value of ridge regression for inverting a matrix.printfreq frequency of printing iterative details on console.seed seed for random sample.threads number of threads used for OpenMP.verbose whether to print the iteration information on console.Details•thefixed effects and covariates in’formula’must be in factors and numeric,respectively.if not,please remember to use’as.factor’and’as.numeric’to transform.•the package has the automatical function of taking the intersection and adjusting the order of id between’data’and the genotype’M’,thus thefirst column in’data’should be the individual id.ibrm5•if any one of the options’windsize’and’windnum’is specified,the GW AS results will be returned,and the’map’information must be provided,in which the physical positions should be all in digital values.•the’windsize’or’windnum’option only works for the methods of which the assumption has a proportion of zero effect markers,e.g.,BayesB,BayesBpi,BayesC,BayesCpi,BSLMM,and BayesR.Valuethe function returns a’blrMod’object containing$mu the regression intercept$pi estimated proportion of zero effect and non-zero effect SNPs$beta estimated coefficients for all covariates$r estimated environmental random effects$Vr estimated variance for all 
environmental random effect$Vg estimated genetic variance$Ve estimated residual variance$h2estimated heritability(h2=Vg/(Vr+Vg+Ve))$alpha estimated effect size of all markers$g genomic estimated breeding value$e residuals of the model$pip the frequency for markers to be included in the model during MCMC iteration,known as posterior inclusive probability(PIP)$gwas WPPA is defined to be the window posterior probability of association,it is estimated by counting the number of MCMC samples in whichαis nonzero for at least one SNP in the window$MCMCsamples the collected samples of posterior estimation for all the above parameters across MCMC iterationsReferencesMeuwissen,Theo HE,Ben J.Hayes,and Michael E.Goddard."Prediction of total genetic value using genome-wide dense marker maps."Genetics157.4(2001):1819-1829.de los Campos,G.,Hickey,J.M.,Pong-Wong,R.,Daetwyler,H.D.,and Calus,M.P.(2013).Whole-genome regression and prediction methods applied to plant and animal breeding.Genetics, 193(2),327-345.Habier,David,et al."Extension of the Bayesian alphabet for genomic selection."BMC bioinfor-matics12.1(2011):1-12.Yi,Nengjun,and Shizhong Xu."Bayesian LASSO for quantitative trait loci mapping."Genetics 179.2(2008):1045-1055.Zhou,Xiang,Peter Carbonetto,and Matthew Stephens."Polygenic modeling with Bayesian sparselinear mixed models."PLoS genetics9.2(2013):e1003264.Moser,Gerhard,et al."Simultaneous discovery,estimation and prediction analysis of complex traits using a Bayesian mixture model."PLoS genetics11.4(2015):e1004969.Examples#Load the example data attached in the packagepheno_file_path=system.file("extdata","demo.phe",package="hibayes")pheno=read.table(pheno_file_path,header=TRUE)bfile_path=system.file("extdata","demo",package="hibayes")bin=read_plink(bfile_path,threads=1)fam=bin$famgeno=bin$genomap=bin$map#For GS/GP##no environmental effects:fit=ibrm(T1~1,data=pheno,M=geno,M.id=fam[,2],method="BayesCpi",niter=2000,nburn=1200,thin=5,threads=1)##overview of the returned resultssummary(fit)##add fixed effects or covariates:fit=ibrm(T1~sex+season+day+bwt,data=pheno,M=geno,M.id=fam[,2],method="BayesCpi")##add environmental random effects:fit=ibrm(T1~sex+(1|loc)+(1|dam),data=pheno,M=geno,M.id=fam[,2],method="BayesCpi")#For GWASfit=ibrm(T1~sex+bwt+(1|dam),data=pheno,M=geno,M.id=fam[,2],method="BayesCpi",map=map,windsize=1e6)#get the SD of estimated SNP effects for markerssummary(fit)$alpha#get the SD of estimated breeding valuessummary(fit)$gldmat LD variance-covariance matrix calculationDescriptionTo calculate density or sparse LD variance-covariance matrix with genotype in bigmemory format. Usageldmat(geno,map=NULL,gwas.geno=NULL,gwas.map=NULL,chisq=NULL,ldchr=FALSE,threads=4,verbose=FALSE)Argumentsgeno the reference genotype panel in bigmemory format.map the map information of reference genotype panel,columns are:SNPs,chromo-some,physical position.gwas.geno(optional)the genotype of gwas samples which were used to generate the sum-mary data.gwas.map(optional)the map information of the genotype of gwas samples,columns are: SNPs,chromosome,physical position.chisq chi-squre value for generating sparse matrix,if n*r2<chisq,it would be set to zero.ldchr lpgical,whether to calulate the LD between chromosomes.threads the number of threads used in computation.verbose whether to print the information.ValueFor full ld matrix,it returns a standard R matrix,for sparse matrix,it returns a’dgCMatrix’. 
Examplesbfile_path=system.file("extdata","demo",package="hibayes")data=read_plink(bfile_path)geno=data$genomap=data$mapxx=ldmat(geno,threads=4,verbose=FALSE)#chromosome wide full ld matrix#xx=ldmat(geno,chisq=5,threads=4)#chromosome wide sparse ld matrix#xx=ldmat(geno,map,ldchr=FALSE,threads=4)#chromosome block ld matrix#xx=ldmat(geno,map,ldchr=FALSE,chisq=5,threads=4)#chromosome block+sparse ld matrix8read_plink read_plink data loadDescriptionTo load plink binary dataUsageread_plink(bfile="",maxLine=10000,impute=TRUE,mode=c("A","D"),out=NULL,threads=4)Argumentsbfile character,prefix of Plink binary format data.maxLine number,set the number of lines to read at a time.impute logical,whether to impute missing values in genotype by major alleles.mode"A"or"D",additive effect or dominant effect.out character,path and prefix of outputfilethreads number,the number of used threads for parallel processValuefourfiles will be generated in the directed folder:"xx.desc","xx.bin","xx.id,"xx.map",where’xx’is the prefix of the argument’out’,the memory-mappingfiles can be fast loaded into memory by ’geno=attach.big.matrix("xx.desc")’.Note that hibayes will code the genotype A1A1as2,A1A2 as1,and A2A2as0,where A1is thefirst allele of each marker in".bim"file,therefore the estimated effect size is on A1allele,users should pay attention to it when a process involves marker effect. Examplesbfile_path=system.file("extdata","demo",package="hibayes")data=read_plink(bfile_path,out=tempfile(),mode="A")fam=data$famgeno=data$genomap=data$mapsbrm9 sbrm SBayes modelDescriptionBayes linear regression model using summary level dataUsagesbrm(sumstat,ldm,method=c("BayesB","BayesA","BayesL","BayesRR","BayesBpi","BayesC","BayesCpi", "BayesR","CG"),map=NULL,Pi=NULL,lambda=NULL,fold=NULL,niter=NULL,nburn=NULL,thin=5,windsize=NULL,windnum=NULL,vg=NULL,dfvg=NULL,s2vg=NULL,ve=NULL,dfve=NULL,s2ve=NULL,printfreq=100,seed=666666,threads=4,verbose=TRUE)Argumentssumstat matrix of summary data,details refer to https:///software/gcta/#COJO.ldm dense or sparse matrix,ld for reference panel(m*m,m is the number of SNPs).NOTE that the order of SNPs should be consistent with summary data.method bayes methods including:"BayesB","BayesA","BayesL","BayesRR","Bayes-Bpi","BayesC","BayesCpi","BayesR","CG".•"BayesRR":Bayes Ridge Regression,all SNPs have non-zero effects andshare the same variance,equals to RRBLUP or GBLUP.•"BayesA":all SNPs have non-zero effects,and take different variance whichfollows an inverse chi-square distribution.10sbrm•"BayesB":only a small proportion of SNPs(1-Pi)have non-zero effects,and take different variance which follows an inverse chi-square distribution.•"BayesBpi":the same with"BayesB",but’Pi’is notfixed.•"BayesC":only a small proportion of SNPs(1-Pi)have non-zero effects,and share the same variance.•"BayesCpi":the same with"BayesC",but’Pi’is notfixed.•"BayesL":BayesLASSO,all SNPs have non-zero effects,and take differentvariance which follows an exponential distribution.•"BayesR":only a small proportion of SNPs have non-zero effects,and theSNPs are allocated into different groups,each group has the same variance.•"CG":conjugate gradient algorithm with assigned lambda.map(optional,only for GW AS)the map information of genotype,at least3columns are:SNPs,chromosome,physical position.Pi vector,the proportion of zero effect and non-zero effect SNPs,thefirst value must be the proportion of non-effect markers.lambda value or vector,the ridge regression value for each SNPs.fold percentage of variance explained for groups of 
SNPs,the default is c(0,0.0001,0.001,0.01).niter the number of MCMC iteration.nburn the number of iterations to be discarded.thin the number of thinning after burn-in.Note that smaller thinning frequency may have higher accuracy of estimated parameters,but would result in more memoryfor collecting process,on contrary,bigger frequency may have negative effecton accuracy of estimations.windsize window size in bp for GW AS,the default is1e6.windnumfixed number of SNPs in a window for GW AS,if it is specified,’windsize’will be invalid,the default is NULL.vg prior value of genetic variance.dfvg the number of degrees of freedom for the distribution of genetic variance.s2vg scale parameter for the distribution of genetic variance.ve prior value of residual variance.dfve the number of degrees of freedom for the distribution of residual variance.s2ve scale parameter for the distribution of residual variance.printfreq frequency of collecting the estimated parameters and printing on console.Note that smaller frequency may have higher accuracy of estimated parameters,butwould result in more time and memory for collecting process,on contrary,big-ger frequency may have an negative effect on accuracy of estimations.seed seed for random sample.threads number of threads used for OpenMP.verbose whether to print the iteration information on console.sbrm11Details•if any one of the options’windsize’and’windnum’is specified,the GW AS results will be returned,and the’map’information must be provided,in which the physical positions should be all in digital values.•the’windsize’or’windnum’option only works for the methods of which the assumption has a proportion of zero effect markers,e.g.,BayesB,BayesBpi,BayesC,BayesCpi,BSLMM,and BayesR.Valuethe function returns a’blrMod’object containing$pi estimated proportion of zero effect and non-zero effect SNPs$Vg estimated genetic variance$Ve estimated residual variance$h2estimated heritability(h2=Vg/(Vg+Ve))$alpha estimated effect size of all markers$pip the frequency for markers to be included in the model during MCMC iteration,also known as posterior inclusive probability(PIP)$gwas WPPA is defined to be the window posterior probability of association,it is estimated by counting the number of MCMC samples in whichαis nonzero for at least one SNP in the window$MCMCsamples the collected samples of posterior estimation for all the above parameters across MCMC iterationsReferencesLloyd-Jones,Luke R.,et al."Improved polygenic prediction by Bayesian multiple regression on summary statistics."Nature communications10.1(2019):1-11.Examplesbfile_path=system.file("extdata","demo",package="hibayes")bin=read_plink(bfile_path,threads=1)fam=bin$famgeno=bin$genomap=bin$mapsumstat_path=system.file("extdata","demo.ma",package="hibayes")sumstat=read.table(sumstat_path,header=TRUE)head(sumstat)#computate ld variance covariance matrix##construct genome wide full variance-covariance matrixldm1<-ldmat(geno,threads=4)##construct genome wide sparse variance-covariance matrix#ldm2<-ldmat(geno,chisq=5,threads=4)##construct chromosome wide full variance-covariance matrix#ldm3<-ldmat(geno,map,ldchr=FALSE,threads=4)##construct chromosome wide sparse variance-covariance matrix#ldm4<-ldmat(geno,map,ldchr=FALSE,chisq=5,threads=4)#if the order of SNPs in genotype is not consistent with the order in sumstat file,#prior adjusting is necessary.indx=match(map[,1],sumstat[,1])sumstat=sumstat[indx,]#fit 
modelfit=sbrm(sumstat=sumstat,ldm=ldm1,method="BayesCpi",Pi=c(0.95,0.05),niter=20000,nburn=12000,seed=666666,map=map,windsize=1e6,threads=1)#overview of the returned resultssummary(fit)#get the SD of estimated SNP effects for markerssummary(fit)$alphassbrm Single-step Bayes modelDescriptionSingle-step Bayes linear regression model using individual level data and pedigree informationy=Xβ+Rr+Mα+U +ewhere y is the vector of phenotypic values for both genotyped and non-genotyped individuals,βis a vector of estimated coefficient for covariates,M contains the genotype(M2)for genotyped individuals and the imputed genotype(M1=A12A−122M2)for non-genotyped individuals, is the vector of genotype imputation error,e is a vector of residuals.Usagessbrm(formula,data=NULL,M=NULL,M.id=NULL,pedigree=NULL,method=c("BayesCpi","BayesA","BayesL","BayesR","BayesB","BayesC","BayesBpi", "BayesRR"),map=NULL,Pi=NULL,fold=NULL,niter=NULL,nburn=NULL,thin=5,windsize=NULL,windnum=NULL,maf=0.01,dfvr=NULL,s2vr=NULL,vg=NULL,dfvg=NULL,s2vg=NULL,ve=NULL,dfve=NULL,s2ve=NULL,printfreq=100,seed=666666,threads=4,verbose=TRUE)Argumentsformula a two-sided linear formula object describing both thefixed-effects and random-effects part of the model,with the response on the left of a‘~’operator andthe terms,separated by‘+’operators,on the right.Random-effects terms aredistinguished by vertical bars(1|’)separating expressions for design matricesfrom grouping factors.data the data frame containing the variables named in’formula’,NOTE that thefirst column in’data’should be the individual id.M numeric matrix of genotype with individuals in rows and markers in columns, NAs are not allowed.M.id vector of id for genotype.pedigree matrix of pedigree,3columns limited,the order of columns shoud be"id","sir", "dam".method bayes methods including:"BayesB","BayesA","BayesL","BayesRR","Bayes-Bpi","BayesC","BayesCpi","BayesR".•"BayesRR":Bayes Ridge Regression,all SNPs have non-zero effects andshare the same variance,equals to RRBLUP or GBLUP.•"BayesA":all SNPs have non-zero effects,and take different variance whichfollows an inverse chi-square distribution.•"BayesB":only a small proportion of SNPs(1-Pi)have non-zero effects,and take different variance which follows an inverse chi-square distribution.•"BayesBpi":the same with"BayesB",but’Pi’is notfixed.•"BayesC":only a small proportion of SNPs(1-Pi)have non-zero effects,and share the same variance.•"BayesCpi":the same with"BayesC",but’Pi’is notfixed.•"BayesL":BayesLASSO,all SNPs have non-zero effects,and take differentvariance which follows an exponential distribution.•"BayesR":only a small proportion of SNPs have non-zero effects,and theSNPs are allocated into different groups,each group has the same variance.map(optional,only for GW AS)the map information of genotype,at least3columns are:SNPs,chromosome,physical position.Pi vector,the proportion of zero effect and non-zero effect SNPs,thefirst value must be the proportion of non-effect markers.fold proportion of variance explained for groups of SNPs,the default is c(0,0.0001,0.001,0.01).niter the number of MCMC iteration.nburn the number of iterations to be discarded.thin the number of thinning after burn-in.Note that smaller thinning frequency may have higher accuracy of estimated parameters,but would result in more memoryfor collecting process,on contrary,bigger frequency may have negative effecton accuracy of estimations.windsize window size in bp for GW AS,the default is NULL.windnumfixed number of SNPs in a window for GW AS,if it is 
specified,’windsize’will be invalid,the default is NULL.maf the effects of markers whose MAF is lower than the threshold will not be esti-mated.dfvr the number of degrees of freedom for the distribution of environmental variance.s2vr scale parameter for the distribution of environmental variance.vg prior value of genetic variance.dfvg the number of degrees of freedom for the distribution of genetic variance.s2vg scale parameter for the distribution of genetic variance.ve prior value of residual variance.dfve the number of degrees of freedom for the distribution of residual variance.s2ve scale parameter for the distribution of residual variance.printfreq frequency of printing iterative details on console.seed seed for random sample.threads number of threads used for OpenMP.verbose whether to print the iteration information on console.Valuethe function returns a a’blrMod’object containing$J coefficient for genotype imputation residuals$Veps estimated variance of genotype imputation residuals$epsilon genotype imputation residuals$mu the regression intercept$pi estimated proportion of zero effect and non-zero effect SNPs$beta estimated coefficients for all covariates$r estimated environmental random effects$Vr estimated variance for all environmental random effect$Vg estimated genetic variance$Ve estimated residual variance$h2estimated heritability(h2=Vg/(Vr+Vg+Ve))$g data.frame,thefirst column is the list of individual id,the second column is the genomic esti-mated breeding value for all individuals,including genotyped and non-genotyped.$alpha estimated effect size of all markers$e residuals of the model$pip the frequency for markers to be included in the model during MCMC iteration,also known as posterior inclusive probability(PIP)$gwas WPPA is defined to be the window posterior probability of association,it is estimated by counting the number of MCMC samples in whichαis nonzero for at least one SNP in the window$MCMCsamples the collected samples of posterior estimation for all the above parameters across MCMC iterationsReferencesFernando,Rohan L.,Jack CM Dekkers,and Dorian J.Garrick."A class of Bayesian methods to combine large numbers of genotyped and non-genotyped animals for whole-genome analyses."Genetics Selection Evolution46.1(2014):1-13.Henderson,C.R.:A simple method for computing the inverse of a numerator relationship matrix used in prediction of breeding values.Biometrics32(1),69-83(1976).Examples#Load the example data attached in the packagepheno_file_path=system.file("extdata","demo.phe",package="hibayes")pheno=read.table(pheno_file_path,header=TRUE)bfile_path=system.file("extdata","demo",package="hibayes")bin=read_plink(bfile_path,threads=1)fam=bin$famgeno=bin$genomap=bin$mappedigree_file_path=system.file("extdata","demo.ped",package="hibayes")ped=read.table(pedigree_file_path,header=TRUE)#For GS/GP##no environmental effects:fit=ssbrm(T1~1,data=pheno,M=geno,M.id=fam[,2],pedigree=ped,method="BayesCpi",niter=1000,nburn=600,thin=5,printfreq=100,threads=1) ##overview of the returned resultssummary(fit)##add fixed effects or covariates:fit=ssbrm(T1~sex+bwt,data=pheno,M=geno,M.id=fam[,2],pedigree=ped, method="BayesCpi")##add environmental random effects:fit=ssbrm(T1~(1|loc)+(1|dam),data=pheno,M=geno,M.id=fam[,2],pedigree=ped,method="BayesCpi")#For GWASfit=ssbrm(T1~sex+bwt+(1|dam),data=pheno,M=geno,M.id=fam[,2],pedigree=ped,method="BayesCpi",map=map,windsize=1e6)#get the SD of estimated SNP effects for markerssummary(fit)$alpha#get the SD of estimated breeding valuessummary(fit)$g。

Low-Rank and Sparse Matrix Approximation in Machine Learning

2. The paper must be written by the student personally; any paper found to have been downloaded from the Internet or plagiarized from someone else's work will be treated as cheating, and the course grade will be recorded as 0.
3. The course paper should be printed double-sided on A4 paper. Use simplified SimSun throughout, with the title in small-two bold, heading lines in small-four bold, and body text in small-four. With the School's approval the paper may be written in English, using Times New Roman throughout, with the title in 18 pt bold, heading lines in 14 pt bold, and body text in 12 pt. Use double line spacing (to leave room for the instructor's comments); margins are 3 cm on the left, 2 cm on the right, 2.5 cm on the top, and 2.5 cm on the bottom; other formatting follows the dissertation requirements.
1.2 The l0 regularizer

The l0 regularizer is the most direct and fundamental sparse-learning technique. Unfortunately, it is combinatorial in nature and a non-convex regularizer, which makes it difficult to analyze. Minimizing the l0 norm is an NP-hard problem: in both theory and practice, only algorithms of exponential complexity (in the dimension of the vector) exist. In general, most algorithms can obtain only an approximate solution to the l0 problem. Some instead solve the convex l1 regularizer, which is the convex regularizer closest to l0 (clearly, among the lp regularizers, the smaller p is, the closer one gets to l0 regularization).

Figure 1: Shapes of the lp regularizer for (a) p ≥ 1 and (b) 0 < p < 1.

Other researchers use the following surrogate to approximate l0: ‖x‖₀ ≈ ∑_i log(ε + |x_i|), where ε is a small positive number introduced to avoid the numerical meaninglessness of log 0. However, for problems that require directly optimizing
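As a quick numerical illustration of the surrogate above (a minimal sketch that is not part of the original text; the example vector x and the value of ε are arbitrary choices), the following R snippet compares the exact l0 count, the convex l1 relaxation, and the log-sum approximation:

# Compare the exact l0 "norm", the l1 relaxation, and the log-sum surrogate
# sum_i log(eps + |x_i|) on a small sparse vector.
x   <- c(0, 0, 3.2, 0, -0.7, 0, 0, 1.5)   # sparse vector with 3 non-zero entries
eps <- 1e-6                                # small constant to avoid log(0)

l0     <- sum(x != 0)                      # exact number of non-zero entries
l1     <- sum(abs(x))                      # convex l1 relaxation
logsum <- sum(log(eps + abs(x)))           # non-convex log-sum surrogate

c(l0 = l0, l1 = l1, logsum = logsum)

Relative to the constant baseline n·log(ε), each non-zero entry adds a term that grows only logarithmically with its magnitude, so the surrogate tracks the count of non-zero entries much more closely than the l1 norm does.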
Statistical learning is the mainstream technique in today's machine learning. Statistical learning algorithms in vector spaces are already fairly mature, and in recent years many researchers have turned their attention to matrix spaces. Compared with the vector case, learning techniques over matrix spaces scale poorly: their space and time complexity grow quadratically and cubically, respectively, with the problem size. How to approximate a target matrix so that machine learning techniques become more robust, more accurate, and better suited to large-scale settings has therefore become a very active topic in machine learning. Inspired by techniques such as support vector machines, compressed sensing, and non-negative matrix factorization, and building on sparsity and low-rank assumptions, a series of matrix-based machine learning algorithms has been developed.

Feature selection for k-means clustering stability-DMKD


Data Min Knowl Disc(2014)28:918–960DOI10.1007/s10618-013-0320-3Feature selection for k-means clustering stability: theoretical analysis and an algorithmDimitrios Mavroeidis·Elena MarchioriReceived:31October2011/Accepted:25April2013/Published online:29May2013©The Author(s)2013Abstract Stability of a learning algorithm with respect to small input perturbations is an important property,as it implies that the derived models are robust with respect to the presence of noisy features and/or data samplefluctuations.The qualitative nature of the stability property enhardens the development of practical,stability optimizing, data mining algorithms as several issues naturally arise,such as:how“much”stability is enough,or how can stability be effectively associated with intrinsic data properties. In the context of this work we take into account these issues and explore the effect of stability maximization in the continuous(PCA-based)k-means clustering problem. Our analysis is based on both mathematical optimization and statistical arguments that complement each other and allow for the solid interpretation of the algorithm’s stability properties.Interestingly,we derive that stability maximization naturally introduces a tradeoff between cluster separation and variance,leading to the selection of features that have a high cluster separation index that is not artificially inflated by the features variance.The proposed algorithmic setup is based on a Sparse PCA approach,that selects the features that maximize stability in a greedy fashion.In our study,we also analyze several properties of Sparse PCA relevant to stability that promote Sparse PCA as a viable feature selection mechanism for clustering.The practical relevance of the proposed method is demonstrated in the context of cancer research,where Responsible editor:Dimitrios Gunopulos,Donato Malerba,Michalis Vazirgiannis.D.Mavroeidis(B)IBM Research–Ireland,Mulhuddart,Dublin15,Irelande-mail:dimitrim@E.MarchioriDepartment of Computer Science,Faculty of Sciences,Radboud University,Heyendaalseweg135,6525AJ Nijmegen,The Netherlandse-mail:e.marchiori@cs.ru.nlFeature selection for k-means clustering stability919 we consider the problem of detecting potential tumor biomarkers using microarray gene expression data.The application of our method to a leukemia dataset shows that the tradeoff between cluster separation and variance leads to the selection of features corresponding to important biomarker genes.Some of them have relative low variance and are not detected without the direct optimization of stability in Sparse PCA based k-means.Apart from the qualitative evaluation,we have also verified our approach as a feature selection method for k-means clustering using four cancer research datasets. 
The quantitative empirical results illustrate the practical utility of our framework as a feature selection mechanism for clustering.Keywords Sparse PCA·Stability·Feature selection·Clustering1IntroductionThe stability of a learning algorithm with respect to small input perturbations is gener-ally considered a desired property of learning algorithms,as it ensures that the derived models are robust and are not significantly affected by noisy features or data sample fluctuations.Based on these motivations,the notion of stability has been employed by several popular machine learning paradigms(such as Bagging—Breiman1996)and it has been the central theme in several studies that focus both on the theoretical study of stability and the development of practical stability optimizing algorithms.Stability-optimizing machine learning methods are mostly relevant in applications where the number of features is large compared to the number of instances and also when there is a high amount of noise present.These data characteristics are very common in life-sciences and are encountered in several bio-informatics applications(i.e.on can refer to Sandrine and Jane2003).These data characteristics are also observed in the microarray datasets employed in Sect.7of this paper.Albeit the considerable amount of research that has been devoted to the study of stability,the interplay between clustering stability and feature selection has not been substantially investigated.This is because most feature selection frameworks do not take into account the contribution of the features to the variance of the derived mod-els and solely evaluate the“relevance”of each feature to the target class structure. This may result in suboptimal models since prediction error is,as illustrated by the bias-variance decomposition,affected by both the relevance of each feature(bias)and its contribution to the stability(variance)of the resulting data model.These consid-erations,that are also discussed in Munson and Caruana(2009)motivate the study for practical feature selection algorithms that achieve the right balance between the bias-variance tradeoff and optimize the predictive ability of the resulting models.In the context of this work we undertake this challenge and explore the potentials of performing feature selection with the general purpose of maximizing the stability of the continuous(PCA-based)k-means clustering output.1The proposed analysis is performed at a theoretical,algorithmic and empirical level,using both optimization1With the term continuous k-means clustering problem we refer to the continuous relaxation approach for approximating k-means(Ding and He2004).920 D.Mavroeidis,E.Marchiori and statistical arguments.From the optimization point of view,we demonstrate that stability maximization,naturally leads to a cluster separation versus feature variance trade off that results in the selection of features that have a high cluster separation index that is not artificially inflated by the feature’s variance.This conceptual contribution brings new insights to the theoretical properties of stability,provides practitioners with a clear understanding of the semantics of stability maximization and also allows for the effective interpretation of the success(or possible failure)of the stability based feature selection process.From the statistical perspective,we associate the stability of the clustering results to the statistical significance of the resulting cluster structure.This is an interesting viewpoint as it allows for the 
rigorous statistical evaluation of whether the cluster result is an artifact of chance,or a result of the data structure.The statistical perspective of stability,further enhances our understanding of the effect of feature selection for stability optimization and allows practitioners to directly evaluate whether the level of stability is sufficient for their application needs.From the algorithmic point of view,we propose a Sparse PCA formulation for select-ing the relevant features that maximize the stability of the clustering solution.Sparse PCA presents a natural choice,since the continuous k-means solution is derived by the principal components,i.e.the dominant eigenvectors of the feature covariance matrix (Ding and He2004).In order to approximate the Sparse PCA Objective,we derive an efficient forward greedy algorithm that optimizes a lower bound on the stability max-imizing objective.The proposed algorithmic framework takes after d’Aspremont et al.(2007,2008)where analogous greedy,lower-bound maximizing approaches were proposed for the standard Sparse PCA problem.Within this work we also derive several interesting results that are related to the suitability of Sparse PCA for feature selection in clustering.Specifically,we demon-strate that Sparse PCA can be derived as a continuous relaxation to a feature selection problem that optimizes for a cluster separation index.Moreover,we show that double centering the data before the application of Sparse PCA,leads to a“two-way”stability property.I.e.the stability of the instance-clusters becomes equal to the stability of the feature-clusters.This is an important observation that complements our work with data mining algorithms that utilize the feature clusters in the data mining process.Finally, we propose a novel“two-way”stable Sparse PCA algorithm that relies on a greedy lower bound optimization.These results can be considered as side-contributions to our understanding of Sparse PCA as a feature selection mechanism for clustering.Empirically,we verify the proposed stable Sparse PCA framework in the context of Cancer Research.In our experiments we have employed four publicly available microarray datasets that are related to the identification of certain cancer types.In particular,we consider the problem of detecting potential tumor biomarkers using microarray gene expression data.Application of our method to leukemia gene expres-sion data shows that the tradeoff between cluster separation and variance leads to the selection of features corresponding to important biomarker genes.Some of them have relative low variance and are not detected without the direct optimization of stability.The rest of this paper is organized as follows:Sect.2recalls the relations between PCA and k-means clustering;Sect.3discusses some cases when Sparse PCA will fail as a feature selection mechanism;Sect.4presents the proposed stability objective;Feature selection for k-means clustering stability 921Sect.5describes the algorithmic approach that is adopted;Sect.6recalls the related work;Sect.7presents the empirical evaluation;Sect.8discusses an alternate view of feature selection that is based on statistical significance optimization;finally Sect.9contains the concluding remarks.2Spectral k -means2.1From discrete to continuousK -means clustering is arguably the most popular clustering algorithm among data min-ing practitioners,and albeit its introduction more than 50years ago,it still constitutes an active area of research.The goal of K -means is to find the clustering 
that minimizes the sum of squared distances of the elements of each cluster to the cluster centers.Formally this objective can be stated as:J K = K k =1 i ∈C k ||x i −μk ||2where we consider x i to be the instance vectors,μk the respective cluster centers and C k ,to denote the clusters.The most popular heuristic for approximating J K is the standard Lloyd’s algorithm,that starts with a random initial guess of the cluster center and iteratively converges using an E M -style iterative process to a local optima of the k -means objective.In the context of this work,we focus on a different approximation scheme for the k -means objective that is based on the continuous (spectral)relaxation of the discrete cluster assignment vector (Ding and He 2004).The Spectral relaxation allows us to study the stability of the clustering output using the advanced results of matrix perturbation theory (Stewart and Sun 1990).In order to illustrate spectral k -means,we recall from Ding and He (2004)that the k -means problem can be written in equivalent form as:J K =Trace (X T f c X f c )−12J D where X f c is the input m ×n feature-instance matrix,with centered features (rows),and J D in the 2-cluster case (clusters c 1and c 2with sizes n 1and n 2)is defined as:J D =n 1n 2n 2d (c 1,c 2)n 1n 2−d (c 1,c 1)n 21−d (c 2,c 2)n 22(1)with d (c k ,c l )= i ∈C k ,j ∈C l ||x i −x j ||2.Moreover,in Ding and He (2004)it is demon-strated that J D =2Trace (Q T K −1X T f c X f c Q K −1),where Q K −1is a n ×(K −1)matrix(n =#inst.,K =#clust.)that contains the discrete cluster labels.In the simple case of K =2-way clustering the discrete cluster values of Q 1are defined as follows (Eq.7in Ding and He 2004):Q 1(i )= √n 2/nn 1if i ∈c 1−√n 1/nn 2if i ∈c 2(2)where n 1,n 2,c 1,c 2are as defined as in Eq.1.Based on the afore equations,the minimization of J K is equivalent to the maxi-mization of J D .By applying the continuous relaxation to J D ,the continuous solution is derived by projecting the data to the k −1principal eigenvectors,i.e.the k −1dom-inant eigenvectors of the covariance matrix that correspond to the largest eigenvalues.922 D.Mavroeidis,E.Marchiori Naturally,the spectral solution will contain the continuous values and an extra stepneeds to be applied to discretize the continuous cluster assignments with a popularheuristic being the application of standard Lloyd’s k-means to the reduced principaleigenspace.It can be noticed that the minimization of J K is equivalent to the maximization ofJ D because Trace(X T f c X f c)is a constant that is equal to the(scaled)sum of variances of the available features.2In a feature selection setup this term will not remain constantsince different features may have different variances,unless the data are appropriatelypreprocessed such that they have equal variances.The stability of Spectral k-means can be evaluated using Matrix Perturbation Theory(Stewart and Sun1990).The relevant theorems designate that the stability of thecontinuous Spectral k-means solution depends on the size of the eigengapλk−1−λk between the k−1and the k largest eigenvalues of the relevant matrix with a largereigengap implying improved stability.Thus,a stability optimizing algorithm shouldaim to maximize this eigengap.2.2From continuous(stability)to discrete(stability)The continuous PCA-based k-means,described in the previous section,provides somenecessary insights for understanding how an eigenvector problem can be employed foreffectively approximating the discrete cluster indicators of k-means 
clustering(fromdiscrete to continuous discrete→continuous direction).Based on this analogy we cangain the intuition that the stability of the continuous(PCA-based)k-means will implythat the discrete cluster results are also stable.This general intuition is further supportedby recent research works that have rigorously analysed how stability transfers fromthe continuous spectral solutions to the discrete cluster results(continuous→discretedirection).A thorough analysis of the continuous→discrete direction with respect tostability,can be found in Huang et al.(2008).The stability connection in Huang et al.(2008)is established for spectral clustering with the Laplacian matrix(instead of thecovariance matrix that is used in our context);however the results naturally transfer tothe spectral k-means case since the relevant proof in Huang et al.(2008,Proposition1)does not employ any special properties of the Laplacian matrix.The bound of interest(Proposition1in Huang et al.2008),links the stability of the continuous spectralresults to the discrete mis-clustering rateη.ηis defined in Huang et al.(2008)as theproportion of samples that have different cluster memberships when computed on thetwo different versions of the data,the“clean”input data X and the perturbed datamatrix X=X+E.The bound that establishes the connection can be stated as:η≤|| v1−v1||22whereηdenotes the mis-clustering rate,v1denotes the continuous(PCA-based)solu-tion of the“clean”data X and v1denotes the continuous cluster solution based on X. 2This is because Trace(X TX f c)=Trace(X f c X T f c).f cFeature selection for k-means clustering stability923 In Huang et al.(2008)the required assumptions/hypotheses for this inequality to hold are thoroughly presented and discussed.Based on matrix perturbation theory(Stewart and Sun1990)one can derive an upper bound on the difference between eigenvectors v1and v1that depends on the norm of the perturbation and certain algebraic properties of the covariance matrix. More precisely,if we consider that v1is derived by covariance matrix C and v1by the perturbed covariance matrix C+E,we can employ the bound(illustrated in Mavroeidis and Magdalinos2012)|| v1−v1||2≤4||E||2λ1−λ2,or the bounds analyzed in Stewart and Sun(1990)that also depend on the norm of the perturbation matrix||E|| and the size of the relevant eigengap.In this manner we can derive a bound on the discrete mis-clustering rate that depends on the stability of the continuous results in the form:η≤|| v1−v1||2≤f(E,λ1−λ2)(3)2In the above formula,it is evident that the characterization of the mis-clustering rate, can be achieved through the appropriate definition of the E perturbation matrix.For this purpose,there exist several options that depend on the type of perturbations that we want to model in our data set.I.e.E matrix can be based on a specific noise model(as in Huang et al.2008),a sampling variance approach(as in Mavroeidis and Bingham 2010,2008;Mavroeidis and Vazirgiannis2007;Mavroeidis and Magdalinos2012)or a combination of the two.It can be observed that by employing a sampling variance definition of E matrix we can characterize the statistical significance of the clustering results,i.e.we can effectively compute an upper-bound on the mis-clustering rate with respect to using a different data sample from the same data-generating distribution. 
This means that if we can compute a small mis-clustering bound(rgeλ1−λ2) then we can guarantee that the resulting clustering structure will be characteristic of the data generating distribution and not an artifact of chance.3Sparse PCA as a feature selection mechanism for k-meansBased on the relations between PCA and k-means it is natural to consider Sparse PCA as a feature selection mechanism for k-means clustering.In its basic form,Sparse PCA is defined as a cardinality constrain PCA problem,with the cardinality constraint being imposed on the resulting principal components.In effect,the principal components are constructed based on a subset of the available features,with the general purpose of maximizing the variance in the projected space.Based on the above,and denoting our input covariance matrix as Co v,the Sparse PCA problem can be stated as:max v T(Co v)vs.t.||v||2=1card(v)≤kwhere card denotes the cardinality of a vector.This Sparse PCA formulation,along with certain other alternatives can be found in d’Aspremont et al.(2008).It can be924 D.Mavroeidis,E.Marchiori observed that the Sparse PCA approach to feature selection for k-means clustering promotes features with high(structured)variance as more appropriate to model the underlying cluster structure of the data space.Although this is appropriate under several standard data generating distributions there may be cases where this bias can significantly degrade the quality of the clustering results.In these cases,as we will further on analyze in detail,we need to put in perspective the ability of a feature to separate between the clusters3versus the“unstructured variance contribution”of a feature(that is a feature with high variance that does not separate well between the clusters).In technical terms,the contribution of a feature to the separation between clusters(“structured variance contribution”)is modelled by the dominant eigenvalue of the feature covariance matrix(the fact that the dominant eigenvalue can be used for this purpose is analytically illustrated in the next paragraph)while the“unstructured variance contribution”is measured by the sum of all the eigenvalues of the feature covariance matrix.In order to illustrate the shortcomings of Sparse PCA,we consider a3D problem where only one feature is relevant for discriminating between the two clusters.More precisely we consider a two-way clustering problem where the two clusters(objects) have a discoid shape,i.e.they are characterized by two features with high variance and one feature with low variance.In Fig.1we illustrate these clusters considering x,z to be the“uninformative high variance dimensions”and y to be the“informative low variance dimension”.Dimension y is the informative one since the two discoid clusters (shapes)are placed on top of the other,thus variable y is capable of separating between the two clusters.In our experiment we have varied the cluster distance parameter and our basicfinding is that a“pure”Sparse PCA approach that selects the top two features based solely on the maximum eigenvalue of the covariance matrix,fails to identify the importance of feature y.In fact“pure”Sparse PCA ranks it as third(last)in importance, as the two clusters come closer together and selects the uninformative pair x,z.This can be observed in Fig.1b where we report the dominant eigenvalue(y axis in Fig.1b) for all possible feature pair selections.The reason for this behaviour is that variables x,z have a high contribution to both unstructured and structured variance and 
thus,are able to mislead the“pure”Sparse PCA framework.A stability-based approach can effectively overcome this problem by placing unstructured and structured variance in perspective and can effectively identify the importance of feature y.We should clarify here that we did not employ a specific Sparse PCA algorithm,but we explicitly evaluated all possible feature combinations (i.e.we emulated the behavior of an optimal Sparse PCA algorithm).4Stable Sparse PCA4.1Stability maximizing objective and the cluster separation/variance tradeoffWe will now move on to define the appropriate optimization objective for feature selection that maximizes the stability of the Spectral k-means clustering output.The 3We will refer to this property as“structured variance contribution”of a feature.Feature selection for k-means clustering stability925Fig.1We illustrate here the behaviour of Sparse PCA as a feature selection algorithm for k -means cluster-ing.We can observe that the top two features for Sparse PCA are X,Z (features with high variance but with no contribution to cluster separation),while the only feature that can separate between the two clusters,Y is ranked third.Sparse PCA can successfully identify the Y only when the clusters are moved adequately far apart.a Data visualization and b Sparse PCA featuresproposed formulation is based on the Sparse PCA approach in d’Aspremont et al.(2007,2008),with the appropriate modifications that account for stability maximiza-tion.In order to optimize for stability,we incorporate a term that accounts for the differ-ence between the two largest eigenvalues.Since the aim is to distinguish between the two largest eigenvalues and not between λk −1and λk ,our framework initially con-siders the two-way clustering problem,and extends for k >2-way clustering using a deflation method analytically described in Sect.5.3.In this manner,the proposed framework can select a different subset of features at each sequential step (for each eigenvector)thus possibly identifying different feature subsets for separating between different clusters.This property of our framework will be useful in cases where dif-ferent clusters are well separated by different feature subsets.For facilitating the optimization problem we consider the average difference between the largest eigenvalue with the rest.I.e.1n n i =1(λ1−λi )instead of λ1−λ2.Although this formulation will not directly optimize for the difference between the largest eigenvalues,the objective will have a stronger incentive for minimizing the eigenvalues that are closer to λ1since they will contribute more to the maximization of the average difference.As we will illustrate in this section,the difference between the largest and the consecutive eigenvalues gives rise to a tradeoff between maximiz-ing the distance between clusters and the feature variances.This balancing procedure imposes a variance based threshold on a cluster separation index and utterly selects the features that optimize the harder separation objective.Before we move on to define our objective function we will clarify the nota-tion we will use.We will denote X as our input m ×n (feature-instance)matrix,C n =(I −e n e T n /n ),denotes the standard row (feature)centering matrix,u is a vector of length m ,with u (i )=1if feature i is retained in the final solution.diag (u )is an m ×m diagonal matrix with vector u in its diagonal (i.e.u (i ,i )=0if feature i is removed,otherwise it is equal to 1),and card (u )is equal to the number of non-zero elements of u (i.e.the 
number of features that are selected.It can be observed that the multiplication diag (u )X removes the features that correspond to u (i )=0.Finally926 D.Mavroeidis,E.Marchiori we will denote the column(feature)-centered matrix as X f c=XC n and also x f c(i) as a n×1vector that contains the centered vector-representation of feature i(notice that we represent x f c(i)as a column vector although it corresponds to row i of matrix X f c).Based on the afore notation,the covariance matrix after feature selection(omitting term1/n that does not affect our optimization problem)is defined as:Co v=diag(u)X f c X T f c diag(u)We now define the stability maximizing objective as:Obj=maxu∈{0,1}m1nni=1(λ1(Co v)−λi(Co v))=maxu∈{0,1}mn−1nλ1(Co v)−1nni=2λi(Co v))=maxu∈{0,1}mλ1(Co v)−1nTrace(Co v)(4)Based on the ability to express the k-means objective using the clustering separation index J D(as analyzed in Sect.2),we can derive the afore objective as a continuous relaxation to the following feature selection clustering objective:max u∈{0,1}m n1n2n2d(u)(c1,c2)n1n2−d(u)(c1,c1)n21−d(u)(c2,c2)n22−mi=1u i·var(f i)(5)where d(u)(c i,c j)=k∈c il∈c j(x(u)k−x(u)l)2and x(u)denotes the representation ofan instance after feature selection(i.e.only the selected features are taken into account when computing the respective distances).Moreover,var(f i)denotes the variance of feature i.Notice that the cluster separation index is equivalent to the J D of formula1 after feature selection.The proof of the relationship between the Objectives4,5is mostly based on the derivations made within(Ding and He2004).Based on the afore analysis,we have demonstrated that stability optimization leads to the introduction of a cluster separation versus variance tradeoff in the feature selec-tion process.In this manner the features that are selected will have high cluster sepa-ration value and among features with equal cluster separation abilities the ones with the smaller variance will be selected.The novelty of the proposed objective resides in the fact that it explicitly penalizes high feature variance and it leads to the selection of the feature subset that has high cluster separation index and low variance.Although this seems to contradict a basic rule of thumb in feature selection that considers fea-tures with high variance to be more helpful in separating between clusters(notably,the selection of features with high variance is commonly used as a baseline in the empirical evaluation of feature selection algorithms),our framework can be justified by the view of feature selection as a variance reduction process(as done in Munson and CaruanaFeature selection for k-means clustering stability927 2009).In this conceptual approach,feature selection improves the quality of a learningalgorithm when it achieves the reduction of variance without significantly increasingthe algorithm’s bias.Interestingly,under this paradigm the contribution of each featureto the bias-variance of the output model is more important that the exact identificationof the relevant/non-relevant features(i.e.a relevant feature that contributes highly tothe model’s variance may not be desirable).The stable Sparse PCA objective formulation currently accounts only for the maxi-mization of the stability of the instance-clustering output.We use the term“solely”aswe will demonstrate in the next section that this objective can be extended such thatit simultaneously optimizes the stability of both instance and feature clusters.4.2Two-way stabilitySeveral data mining frameworks 
employ the clustering of the features as an importantcomponent within the general data mining process.One such example is bi-clustering,or co-clustering(Dhillon2001)where one tries to cluster simultaneously the featuresand the instances for identifying the clusters of features that can be used for describingthe instance clusters.In these application contexts the stability of the clustering of thefeatures is of central importance,since an unstable cluster structure could result inspurious feature clusters that are sensitive to noise or data sample variations.Based on these motivations,we will present here the necessary extensions that areneeded such that the proposed stable Sparse PCA objective,optimizes concurrentlyboth for the stability of the instance and feature clusters.We will furtheron refer to thistype of concurrent stability optimization as“two-way”stability.As we illustrate in thefollowing lemma,two-way stability can be achieved by employing double-centering,a popular data processing technique.The double centering procedure centers bothrows and the columns of the data matrix such that they have zero mean.Naturallythe appropriateness of this transformation should be initially assessed,by inspectingwhether the induced semantics are appropriate for the relevant dataset.The semanticsof double-centering can be understood if we consider that our n instances and mfeatures define a bi-partite graph(V n,V m,W)with the edge-weights W(i,j)being the value that feature j has at instance i.Double-centering in this context can beinterpreted as a procedure for transforming the edge weights such that they indicatewhether a positive or negative connection is present between an instance and a feature(and also the strength of this connection).Moreover,double-centering normalizes theeffect of graph nodes by setting to zero the sum of all edge-weights connected to anode.Further discussion on the double-centering preprocessing method can be foundin Cho(2010).Based on double-centering the stability between the instance-clustersand feature clusters becomes equivalent.This effect is demonstrated in the followinglemma whose proof can be found in the appendix.Lemma1Let X be our input m×n feature-instance data matrix.If X is double-centered,then the stability of spectral k-means applied on the instances is equivalent to the stability of spectral k-means applied on the features.。
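The stability objective discussed in Sect. 4.1 above (maximize the dominant eigenvalue of the covariance of the selected features while penalizing their total variance) lends itself to a simple greedy forward search. The following R code is only a minimal sketch of that idea and not the authors' implementation; the function name, the weight w, and the toy data are assumptions made for illustration:

# Greedy forward feature selection trading cluster separation (dominant
# eigenvalue of the covariance of the selected features) against total
# variance (trace), in the spirit of the objective lambda_1(Cov) - (1/n)*trace(Cov).
stable_feature_selection <- function(X, k, w = 1 / nrow(X)) {
  # X: n x m data matrix (instances in rows, features in columns)
  # k: number of features to select; w: weight on the trace (variance) penalty
  Xc <- scale(X, center = TRUE, scale = FALSE)      # center each feature
  selected  <- integer(0)
  remaining <- seq_len(ncol(X))
  score <- function(idx) {
    Cov <- crossprod(Xc[, idx, drop = FALSE])       # t(Xc) %*% Xc on the chosen features
    ev  <- eigen(Cov, symmetric = TRUE, only.values = TRUE)$values
    max(ev) - w * sum(ev)                           # lambda_1 - w * trace
  }
  for (step in seq_len(k)) {
    gains <- vapply(remaining, function(j) score(c(selected, j)), numeric(1))
    best  <- remaining[which.max(gains)]
    selected  <- c(selected, best)
    remaining <- setdiff(remaining, best)
  }
  selected
}

# Toy usage: pick 5 features, then run k-means on the selected columns.
set.seed(1)
X   <- matrix(rnorm(100 * 20), nrow = 100, ncol = 20)
sel <- stable_feature_selection(X, k = 5)
km  <- kmeans(X[, sel], centers = 2, nstart = 10)

The weight w plays the role of the 1/n factor in the averaged eigengap of Objective (4); larger values penalize unstructured variance more heavily, pushing the selection toward features whose separation index is not inflated by raw variance.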

An Introduction to Constructing Bayesian Nonparametric Model Frameworks


An Introduction to Constructing Bayesian Nonparametric Model Frameworks
Authors: 董平, 周东傲, 林嘉宇. Source: 《数字技术与应用》, 2015, Issue 9.

Abstract: In 1973, Ferguson proposed representing the prior with a parametric model whose parameter space is infinite-dimensional, and a large number of methods for constructing Bayesian nonparametric models have emerged since. Building on these different construction methods, Bayesian nonparametric processes have been applied very widely to problems such as regression, clustering, and variable selection. This article gives a brief overview of methods for constructing Bayesian nonparametric models.

Keywords: Bayesian nonparametrics; parametric models; local observation. CLC number: TN918.1; Document code: A; Article ID: 1007-9416(2015)09-0000-00.

Abstract: In 1973, Ferguson proposed a parametric model with an infinite-dimensional parameter space to represent the prior, after which a large number of methods for constructing Bayesian nonparametric models emerged. Based on these different model construction methods, Bayesian nonparametric processes are widely used in regression, clustering, variable selection, and so on. This paper introduces methods for constructing Bayesian nonparametric models. Key words: Bayesian nonparametrics; nonparametric models; local observation.

The basic idea of Bayesian modeling is to use probability theory to represent and handle all forms of uncertainty in a model. This is a very simple yet remarkably effective approach. One only needs to master two rules of probability theory: the sum rule and the product rule.
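Stated concretely (generic notation, not taken from the article itself), the two rules are:

Sum rule: p(x) = ∑_y p(x, y)   (or an integral over y in the continuous case)
Product rule: p(x, y) = p(y | x) p(x)

Combining them gives Bayes' rule, p(y | x) = p(x | y) p(y) / p(x), which underlies the posterior computations in Bayesian nonparametric models.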

A 3D Reconstruction Method for Weak-Texture Images under Frame Loss


缺帧环境下弱纹理图像的三维重建方法王芳;汪伟【摘要】针对传统方法进行缺帧环境下弱纹理图像三维重建时容易受到边缘缺帧点干扰的影响,降低三维重建效果的问题,提出一种基于边缘轮廓角点分割和相邻帧补偿缺帧环境下弱纹理图像三维重建方法.构建缺帧环境下图像采集模型,采用阈值降噪算法对采集的缺帧环境下弱纹理图像进行降噪滤波预处理,对降噪输出图像进行边缘轮廓角点分割,采用相邻帧补偿方法进行弱纹理修复和稳像处理,提高了对图像弱纹理特征的重建能力,实现图像三维重建的改进.仿真结果表明,改进方法能提高图像纹理识别和修复重建能力,输出图像质量得到有效改善.【期刊名称】《西安工程大学学报》【年(卷),期】2016(030)004【总页数】6页(P477-482)【关键词】弱纹理图像;边缘轮廓;相邻帧补偿;图像识别【作者】王芳;汪伟【作者单位】郑州升达经贸管理学院信息工程系,河南郑州451191;河南工程学院计算机学院,河南郑州451191【正文语种】中文【中图分类】TP391随着图像处理技术的发展,对图像进行纹理分割和边缘轮廓特征提取,在图像识别和图像修复等领域具有较好的应用价值.在复杂环境下受采集设备和环境因素影响,导致图像采集的帧丢失,在缺帧环境下,会导致图像纹理细节丢失,对缺帧环境下弱纹理图像进行三维重建是实现图像识别的重要环节,相关算法研究受到人们的极大重视[1-5].传统方法对缺帧环境下弱纹理图像三维重建方法主要有基于CT/MRI/US的缺帧环境下弱纹理图像重构算法,或基于统计形状模型(Statistical Shape Model, SSM)的缺帧环境下弱纹理图像重构方法、基于模板形状匹配的缺帧环境下弱纹理图像重构方法等[6-7],但这些方法在信噪比较低时,降低图像三维重建抗干扰性能[8-14].为此,提出一种基于边缘轮廓角点分割和相邻帧补偿缺帧环境下弱纹理图像三维重建方法.1.1 图像采集模型及图像特征分析文中构建的缺帧环境下图像采集模型如图1所示.根据上述图像采集模型,采用正方形网格立体建模,将背景图像B和当前缺帧环境下弱纹理图像I划分为(W/2)×(H/2)个子块,划分后不规则三角网中的像素灰度值满足采用深度超像素图像融合方法得到图像的边缘纹理区域中的2×2个像素点,假设缺帧环境下弱纹理图像边缘分量,且满足C([a,b],R)的像素特征收敛判决条件,得到缺帧环境下弱纹理图像的边缘网格轮廓标记线集合为采用视点切换运动方程进行图像的稳像处理:其中:x,y,z为纹理规则化的特征分布位置;ψV为视点切换运动的轮廓标记线偏角.计算弱纹理图像轮廓波域的邻域在(x,y)(x∈[0,W-1],y∈[0,H-1])处的像素值,进行图像特征点采集,使用归一化分割模型进行缺帧环境下弱纹理图像特征分割,相邻帧像素点的近似解为通过小波尺度分解,得到缺帧环境下弱纹理图像纹理特征序列子样,利用图像采集的帧误差T,得到图像主分量边缘轮廓信息特征,获取缺帧环境下弱纹理图像信息的采集形式为采用二阶不变矩作为向量量化参数,得到三维成像数据的补偿量度,成像区域特征点的灰度像素为1.2 弱纹理图像降噪预处理文中采用阈值降噪法进行图像降噪滤波,首先利用Harris角点分割,把图像分割为M个子块[15],利用Hessian矩阵,判断成像区域的极值点是否为图像的噪点,Hessian矩阵为假设Ix表示主分量边缘轮廓中含有较大噪点的特征值;Iy表示较小的特征值,通过对图像特征值的匹配和融合,计算α与β的比值γ,即α/β=γ,得到缺帧环境下弱纹理图像的透射率为当时,则深度超像素特征分割点为图像的边缘点.统计各个子块的噪声点数,判断噪声点数是否大于阈值.如果是,则命令N=5;如果否,则N=3,实现阈值降噪滤波,输出的降噪图像为其中:Δx和Δy分别表示缺帧环境下弱纹理图像的梯度水平幅值和竖直幅值;k表示缩放因子;θ表示差异性旋转角度.2.1 问题描述及边缘轮廓角点分割首先对降噪输出的图像进行边缘轮廓角点分割,利用式(12)获取缺帧环境下弱纹理图像噪声强度:其中:J(x)为一个7×7像素滑动窗口内的干扰强度;A为帧丢失的幅值;t(x)为背景噪声干扰下缺帧环境下弱纹理图像的透射率;J(x)t(x)为图像噪点的衰减项;边缘轮廓子空间内图像的帧差损失为计算全部帧图像对应的粗尺度和细尺度,通过Harris角点检测,得到图像角点包络特征方程描述为其中:ρ(x)表示两帧之间的运动位移;exp(-βd(x))是高斯函数的尺度;d(x)是相邻帧的方差.假设Ii(x,y)是小波特征分解细节重构函数,P(x,y)iv和P(x,y)if分别为两帧之间运动的成像区域像素,则尺度不变性约束函数为通过Harris角点特征提取,实现对采集的缺帧环境下弱纹理图像特征分割.2.2 弱纹理图像三维重建实现过程在上述进行缺帧环境下弱纹理图像的边缘轮廓角点分割基础上,采用相邻帧补偿方法实现弱纹理图像三维重建,相邻帧补偿示意图描述如图2所示.实现步骤如下:(1)对缺帧环境下弱纹理图像特征点及参量进行初始化,假设弱纹理图像度量区域间的差异性级数N,在相邻两帧之间求解失真阀值ε;弱纹理图像像素值的训练序列{xj},j=0,1,…,m-1,初始化帧序列n=0,D-1=∞;(2)根据缺帧环境下弱纹理图像的角点特征幅值n={yi},i=1,2,…,N,得到缺帧环境下弱纹理图像角点周围的像素点子集序列{xj},j=0,1,…,m-1和关于n 的目标图像运动参数估计值,利用角点匹配值si={xj:d(xj,yi)≤d(xj,yl)}实现误差补偿,∀l=0,1,…,N,计算缺帧环境下弱纹理图像的融合邻域空间,得到(3)如果(Dn-1-Dn)/Dn≤ε,停止;求得相应时刻的最佳估算值s(k|k),否则继续;(4)在图像边缘点进行中心像素差异值补偿,得到图像三维重建的输出码书(sj)},j=1,2,…,N;在阈值降噪后输出图像中进行边缘轮廓分割与特征提取,使得图像的像素值输出失真最小;(sj),令n=m+1,采用第2帧补偿第1帧图像,以此类推,实现相邻帧补偿和图像的三维重建.为了测试本文算法在缺帧环境下弱纹理图像三维重建中的性能,进行仿真实验.硬件环境:Intel(R) 2.3 GHz CPU,2 GB内存,32位Windows 7系统的PC机.基于Matlab 2010编程平台,根据上述仿真环境和参数设定,得到原始采集的弱纹理图像如图3所示.从图3可见,原始图像采集受边缘缺帧点干扰的影响,导致图像纹理特征较弱,为实现图像三维重建,采用本文方法对缺帧环境下弱纹理图像进行降噪预处理,对降噪输出的图像进行边缘轮廓角点分割,实现角点提取,为了对比性能,采用本文方法和传统的小波检测方法和SUSAN角点方法进行对比,得到结果如图4所示.从图4可见,采用本文方法进行缺帧环境下弱纹理图像角点提取,能有效实现对弱纹理特征的重建和边缘轮廓分割,提高图像的修复和重建能力.最后得到不同算法下进行不同帧图像的三维重建修复处理结果如图5所示.从图5可见,采用本文方法进行缺帧环境下图像三维重建,具有较好的降噪和纹理修复性能,提高对图像成像和识别能力.本文提出一种基于边缘轮廓角点分割和相邻帧补偿缺帧环境下弱纹理图像三维重建方法.仿真结果表明,采用本文方法进行缺帧环境下弱纹理图像的三维重建,提高纹理的修复能力,三维重建后图像成像的质量提高,性能优越于传统方法,展示了较好的应用价值.[2] KARLSSON J,ROWE W,XU L,et al. Fast missing-data IAA with application to notched spectrum SAR[J]. IEEE Transactions on Aerospace Electronic Systems,2014,50(2):959-971.[3] PARK H R,LI Jie. Sparse covariance-based high resolution time delay estimation for spread spectrum signals[J].Electronics Letters,2015,51(2):155-157.[4] GLENTIS G O,JAKOBSSON A,ANGELOPOULOS K. 
Block-recursiveIAA-based spectral estimates with missing samples using data interpolation[C]//International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence,2014,12(10):350-354.[5] 王慧贤,靳惠佳,王娇龙,等. k均值聚类引导的遥感影像多尺度分割优化方法[J]. 测绘学报,2015,44(5):526-532.[6] 宋涛,李鸥,刘广怡. 基于空时多线索融合的超像素运动目标检测方法[J]. 电子与信息学报,2016,38(6):1503-1511.[7] BROX T,MALIK J. Large displacement optical flow:Descriptor matching in variational motion estimation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,2010,33(3):500-513.[8] 李秦,夏选太. 基于Kinect传感器的三维重建算法研究[J]. 电子设计工程,2015,23(17):30-31.[9] 赫高进,熊伟,陈荦,等. 基于MPI的大规模遥感影像金字塔并行构建方法[J]. 地球信息科学学报,2015,17(5):515-522.[10] 李旭超,宋博,甘良志. 改进的迭代算法在图像恢复正则化模型中的应用[J]. 电子学报,2015,43(6):1152-1159.[11] 于海琦,刘真,田全慧. 一种基于RBF神经网络的打印机光谱预测模型[J]. 影像科学与光化学,2015,33(3):238-243.[12] 柏猛,李敏花,吕英俊. 基于对称性分析的棋盘图像角点检测方法[J]. 信息与控制,2015,44(3):276-283.[13] 张子龙,薛静,乔鸿海,等. 基于改进 SURF 算法的交通视频车辆检索方法研究[J]. 西北工业大学学报,2014,32(2):297-301.[14] EVANGELIO R H,PATZOLD M,KELLER I. Adaptively splitted GMM with feedback improvement for the task of background subtraction[J]. IEEE Transactions on Information Forensics and Security,2014,9(5):863-874.[15] MARTINS P,CASEIRO R,BATISTA J. Non-parametric bayesian constrained local models[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus,2014:1797-1804.WANG Fang,WANG Wei.Three dimensional reconstruction method ofweak texture image under the condition of lack of frame[J].Journal of Xi′an Polytechnic University,2016,30(4):477-482.【相关文献】[1] 周镭,单锋,刘鹏,等. 基于供应链的企业信息化评价模型的建立[J]. 西安工程大学学报,2015,29(6):772-779.。

Implementing the Akaike and Schwarz information criteria in Stata spatial models


The Akaike information criterion (AIC) and the Schwarz criterion (BIC) are model-selection criteria commonly used with spatial models in Stata. They help us choose the most suitable model from a set of candidates, improving both predictive accuracy and interpretability.

Let us first review the basic ideas behind the two criteria. The AIC was proposed by Hirotugu Akaike in 1974 and is an information-entropy-based model-selection criterion; it is computed as AIC = -2 ln(L) + 2k, where ln(L) is the maximized log-likelihood of the model and k is the number of model parameters. The BIC was proposed by Gideon Schwarz in 1978 and is derived from a Bayesian argument; it is computed as BIC = -2 ln(L) + k ln(n), where ln(L) is again the maximized log-likelihood, k is the number of parameters, and n is the sample size.

In Stata, built-in commands can report the AIC and BIC for model selection. Taking the spatial lag model as an example, the model can be estimated with the spml command, and adding the aic or bic option yields the corresponding AIC or BIC value. By comparing these values across candidate models, the most suitable model can be selected.

An advantage of AIC- and BIC-based selection is that it helps guard against overfitting, that is, against models that fit the training data very well but perform poorly on new data. Because both criteria penalize the number of parameters, they can effectively limit overfitting; the BIC additionally accounts for the sample size through the ln(n) term, which makes it better behaved when the sample is small.

These criteria also have limitations. They cannot guarantee that the selected model is the true best model; they only help identify a suitable one. Moreover, with a large number of candidate parameters they may lean toward overly simple models, while with few parameters they may lean toward overly complex ones.
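As a minimal, language-agnostic illustration of the two formulas above (plain Python rather than Stata; the log-likelihoods, parameter counts and sample size are made-up numbers), the criteria can be computed and compared directly:

import math

def aic(loglik: float, k: int) -> float:
    # AIC = -2 ln(L) + 2k
    return -2.0 * loglik + 2.0 * k

def bic(loglik: float, k: int, n: int) -> float:
    # BIC = -2 ln(L) + k ln(n)
    return -2.0 * loglik + k * math.log(n)

# Two hypothetical spatial-lag specifications fitted to the same n = 300 observations.
candidates = {"spatial lag, 3 regressors": (-412.7, 5),
              "spatial lag, 6 regressors": (-408.9, 8)}
n = 300
for name, (ll, k) in candidates.items():
    print(f"{name}: AIC = {aic(ll, k):.1f}, BIC = {bic(ll, k, n):.1f}")

The model with the smaller criterion value is preferred; note how the BIC, with its ln(n) weight on k, penalizes the larger specification more heavily than the AIC does.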

Notes on "Regression Shrinkage and Selection via the Lasso"


These notes summarize my reading of Robert Tibshirani's paper "Regression shrinkage and selection via the lasso". They are organized as follows. Section 1 is an introduction that briefly reviews several regression-estimation methods. Section 2 introduces lasso regression, leading from the non-negative garrote to the definition of the lasso. Section 3 derives the lasso parameters, showing that the multiplier λ and the lasso bound t are in one-to-one correspondence. Section 4 describes two refinements of the lasso obtained by modifying its penalty term and discusses their respective characteristics. Section 5 concludes with a summary of the preceding material.

1 Introduction

The ordinary least squares (OLS) estimate is obtained by minimizing the residual sum of squares. OLS has two well-known limitations. The first concerns prediction accuracy: OLS estimates have low bias but high variance, i.e. the fitting error is small but the model generalizes poorly; prediction accuracy can be improved by shrinking some coefficients or setting them to zero, trading a little bias for a lower variance of the predictions. The second concerns interpretation: with a large number of predictors, one often wants to identify a smaller subset that exhibits the strongest effects.

Subset selection improves interpretability by screening variables, but the resulting models are unstable. Because it is a discrete process — a regressor is either retained or dropped from the model — small changes in the data can lead to very different selected models, which can reduce prediction accuracy. Ridge regression shrinks the coefficients continuously and is therefore more stable, but it does not set any coefficient exactly to zero and so does not give easily interpretable models. The lasso can shrink some coefficients and set others to exactly zero, and thus tries to retain the advantages of both subset selection and ridge regression [1]. These notes focus on lasso regression.

2 Lasso regression

2.1 The non-negative garrote. The non-negative garrote proposed by Breiman minimizes

$$\sum_{i=1}^{N}\Big(y_i-\alpha-\sum_j c_j\hat\beta_j^{\,0}x_{ij}\Big)^2 \quad\text{subject to}\quad c_j\ge 0,\ \ \sum_j c_j\le t. \qquad (1)$$

That is, the garrote starts from the OLS estimates $\hat\beta_j^{\,0}$ and shrinks them by non-negative factors $c_j$ whose sum is constrained.
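The behaviour these notes describe — shrinking some coefficients while setting others exactly to zero — is easy to see numerically. The following Python sketch compares OLS with the lasso as implemented in scikit-learn on synthetic data (it is an illustration only, not an example from the paper, and the penalty level alpha = 0.3 is an arbitrary choice):

import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
true_beta = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_beta + rng.normal(scale=1.0, size=n)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.3).fit(X, y)   # alpha plays the role of the multiplier lambda (equivalently the bound t)

print("OLS coefficients:  ", np.round(ols.coef_, 2))
print("Lasso coefficients:", np.round(lasso.coef_, 2))
print("coefficients set exactly to zero by the lasso:", int(np.sum(lasso.coef_ == 0)))

With this moderate penalty, the coefficients whose true values are zero are driven exactly to zero, while the three large coefficients are retained but shrunk — precisely the mix of subset selection and ridge-style shrinkage discussed above.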

Reference material for bioinformatic analysis of pathogen metagenomic sequencing, its preparation method and application [invention patent]


Patent title: Reference material for bioinformatic analysis of pathogen metagenomic sequencing, preparation method and application
Patent type: invention patent
Inventors: 杨启文, 朱盈, 贾沛瑶, 喻玮, 杨斌, 刘慧芳, 韩士瑞
Application number: CN202210392556.4
Filing date: 2022-04-15
Publication number: CN114496085A
Publication date: 2022-05-13
Patent content provided by the Intellectual Property Publishing House.

Abstract: The invention relates to a reference material for the bioinformatic analysis of pathogen metagenomic sequencing, together with its preparation method and application, and belongs to the field of genetic testing. The method comprises the following steps. Abundance-distribution model: collect metagenomic sequencing data from clinical samples and fit a Gaussian regression model with the number of sequenced reads as the independent variable and the relative abundance as the dependent variable. Standardized high-throughput data generation: obtain reference genome sequences and simulate, for each microbial species, high-throughput reads of a preset read length and preset sequencing error rate. Gamma-Poisson model: fit a Gamma-Poisson distribution to the clinical metagenomic sequencing data. Reference preparation: randomly generate a set of simulated per-sample read counts from the Gamma-Poisson model, and randomly draw the same numbers of reads from the standardized high-throughput data, which yields the reference material. A bioinformatic reference material prepared in this way allows a comprehensive assessment of the sensitivity, specificity, recall and accuracy of a bioinformatic analysis pipeline.

Applicants: 中国医学科学院北京协和医院 (Peking Union Medical College Hospital, Chinese Academy of Medical Sciences); 广州微远基因科技有限公司
Address: No. 1 Shuaifuyuan, Wangfujing, Dongcheng District, Beijing 100032
Country: CN
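As a rough illustration of the Gamma-Poisson sampling idea mentioned in the abstract (a generic sketch, not the patented procedure; the species count, dispersion and sequencing depth below are invented values), one can draw per-species expected abundances from a Gamma distribution and observed read counts from a Poisson with those means:

import numpy as np

rng = np.random.default_rng(42)

def simulate_counts(n_species, mean_abundance, dispersion, total_reads):
    # Latent expected abundances from a Gamma distribution ...
    shape = 1.0 / dispersion
    scale = mean_abundance * dispersion
    lam = rng.gamma(shape, scale, size=n_species)
    # ... rescaled to the target sequencing depth, then observed counts from a Poisson
    # (marginally this is a Gamma-Poisson, i.e. negative-binomial, count model).
    lam = lam / lam.sum() * total_reads
    return rng.poisson(lam)

counts = simulate_counts(n_species=50, mean_abundance=10.0, dispersion=0.5, total_reads=1_000_000)
print(counts[:10], int(counts.sum()))

Simulated counts of this kind would then be matched with reads drawn from the standardized sequencing data to assemble a synthetic sample, which is the step the patent describes as reference preparation.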

Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties (2001)


Variable Selection via Nonconcave PenalizedLikelihood and its Oracle Properties Jianqing F an and Runze L iVariable selection is fundamental to high-dimensiona l statistical modeling,including nonparametri c regression.Many approaches in useare stepwise selection procedures,which can be computationally expensive and ignore stochastic errors in the variable selection process.I n this article,penalized likelihood approaches are proposed to handle these kinds of problems.The proposed methods select variablesand estimate coef cients simultaneously.Hence they enable us to construct con dence intervals for estimated parameters.The proposedapproaches are distinguished from others in that the penalty functions are symmetric,nonconcav e on401ˆ5,and have singularities at theorigin to produce sparse solutions.Furthermore,the penalty functions should be bounded by a constant to reduce bias and satisfy certainconditions to yield continuous solutions.A new algorithm is proposed for optimizing penalized likelihood functions.The proposed ideasare widely applicable.They are readily applied to a variety of parametric models such as generalized linear models and robust regressionmodels.They can also be applied easily to nonparametri c modeling by using wavelets and splines.Rates of convergenc e of the proposedpenalized likelihood estimators are established.Furthermore,with proper choice of regularization parameters,we show that the proposedestimators perform as well as the oracle procedure in variable selection;namely,they work as well as if the correct submodel wereknown.Our simulation shows that the newly proposed methods compare favorably with other variable selection techniques.Furthermore,the standard error formulas are tested to be accurate enough for practical applications.KEY WORDS:Hard thresholding;LASSO;Nonnegative garrote;Penalized likelihood;Oracle estimator;SCAD;Soft thresholding.1.INTRODUCTIONVariable selection is an important topic in linear regressionanalysis.I n practice,a large number of predictors usually areintroduced at the initial stage of modeling to attenuate possiblemodeling biases.On the other hand,to enhance predictabil-ity and to select signi cant variables,statisticians usually usestepwise deletion and subset selection.Although they arepractically useful,these selection procedures ignore stochas-tic errors inherited in the stages of variable selections.Hence,their theoretical properties are somewhat hard to understand.Furthermore,the best subset variable selection suffers fromseveral drawbacks,the most severe of which is its lack ofstability as analyzed,for instance,by Breiman(1996).I n anattempt to automatically and simultaneously select variables,we propose a uni ed approach via penalized least squares,retaining good features of both subset selection and ridgeregression.The penalty functions have to be singular at theorigin to produce sparse solutions(many estimated coef cientsare zero),to satisfy certain conditions to produce continuousmodels(for stability of model selection),and to be boundedby a constant to produce nearly unbiased estimates for largecoef cients.The bridge regression proposed in Frank andFriedman(1993)and the least absolute shrinkage and selectionoperator(LASSO)proposed by Tibshirani(1996,1997)aremembers of the penalized least squares,although their asso-ciated Lq penalty functions do not satisfy all of the precedingthree required properties.The penalized least squares idea can be extended naturally to likelihood-based models in various statistical 
contexts.Our approaches are distinguished from traditional methods(usu-ally quadratic penalty)in that the penalty functions are sym-Jianqing Fan is Professor of Statistics,Department of Statistics,Chinese University of Hong Kong and Professor,Department of Statistics,Univer-sity of California,Los Angeles,CA90095.Runze Li is Assistant Professor, Department of Statistics,Pennsylvania State University,University Park,PA 16802-2111.Fan’s research was partially supported by National Science Foundation(NSF)grants DMS-0196041and DMS-9977096and a grant from University of California at Los Angeles.Li’s research was supported by a NSF grant DMS-0102505.The authors thank the associate editor and the ref-erees for constructive comments that substantially improved an earlier draft.metric,convex on401ˆ5(rather than concave for the negative quadratic penalty in the penalized likelihood situation),and possess singularities at the origin.A few penalty functions are discussed.They allow statisticians to select a penalty function to enhance the predictive power of a model and engineers to sharpen noisy images.Optimizing a penalized likelihood is challenging,because the target function is a high-dimensional nonconcave function with singularities.A new and generic algorithm is proposed that yields a uni ed variable selection procedure.A standard error formula for estimated coef cients is obtained by using a sandwich formula.The formula is tested accurately enough for practical purposes,even when the sam-ple size is very moderate.The proposed procedures are com-pared with various other variable selection approaches.The results indicate the favorable performance of the newly pro-posed procedures.Unlike the traditional variable selection procedures,the sampling properties on the penalized likelihood can be estab-lished.We demonstrate how the rates of convergence for the penalized likelihood estimators depend on the regularization parameter.We further show that the penalized likelihood esti-mators perform as well as the oracle procedure in terms of selecting the correct model,when the regularization param-eter is appropriately chosen.In other words,when the true parameters have some zero components,they are estimated as 0with probability tending to1,and the nonzero components are estimated as well as when the correct submodel is known. 
This improves the accuracy for estimating not only the null components,but also the nonnull components.I n short,the penalized likelihood estimators work as well as if the correct submodel were known in advance.The signi cance of this is that the proposed procedures outperform the maximum likeli-hood estimator and perform as well as we hope.This is very analogous to the superef ciency phenomenon in the Hodges example(see Lehmann1983,p.405).©2001American Statistical AssociationJournal of the American Statistical Association December2001,Vol.96,No.456,Theory and Methods1348Fan and Li:Nonconcave Penalized Likelihood1349 The proposed penalized likelihood method can be appliedreadily to high-dimensional nonparametric modeling.Afterapproximating regression functions using splines or wavelets,it remains very critical to select signi cant variables(termsin the expansion)to ef ciently represent unknown functions.I n a series of work by Stone and his collaborators(seeStone,Hansen,Kooperberg,and Truong1997),the traditionalvariable selection approaches were modi ed to select usefulspline subbases.It remains very challenging to understandthe sampling properties of these data-driven variable selec-tion techniques.Penalized likelihood approaches,outlined inWahba(1990),and Green and Silverman(1994)and refer-ences therein,are based on a quadratic penalty.They reducethe variability of estimators via the ridge regression.In waveletapproximations,Donoho and Johnstone(1994a)selected sig-ni cant subbases(terms in the wavelet expansion)via thresh-olding.Our penalized likelihood approach can be applieddirectly to these problems(see Antoniadis and Fan,in press).Because we select variables and estimate parameters simulta-neously,the sampling properties of such a data-driven variableselection method can be established.In Section2,we discuss the relationship between the penal-ized least squares and the subset selection when design matri-ces are orthonormal.In Section3,we extend the penalizedlikelihood approach discussed in Section2to various para-metric regression models,including traditional linear regres-sion models,robust linear regression models,and generalizedlinear models.The asymptotic properties of the penalizedlikelihood estimators are established in Section3.2.Basedon local quadratic approximations,a uni ed iterative algo-rithm for nding penalized likelihood estimators is proposedin Section3.3.The formulas for covariance matrices of theestimated coef cients are also derived in this section.Twodata-driven methods for nding unknown thresholding param-eters are discussed in Section4.Numerical comparisons andsimulation studies also are given in this section.Some discus-sion is given in Section5.Technical proofs are relegated tothe Appendix.2.PENALIZED LEAST SQUARES ANDVARIABLE SELECTIONConsider the linear regression modely D X‚C˜1(2.1)where y is an n 1vector and X is an n d matrix.As inthe traditional linear regression model,we assume that yi ’s areconditionally independent given the design matrix.There are strong connections between the penalized least squares and the variable selection in the linear regression model.To gain more insights about various variable selection procedures,in this section we assume that the columns of X in(2.1)are orthonormal.The least squares estimate is obtained via min-imizing˜yƒX‚˜2,which is equivalent to˜O‚ƒ‚˜2,where O‚D X T y is the ordinary least squares estimate.Denote z D X T y and let O y D XX T y.A form of the penalized least squares is1˜yƒX‚˜2C‹d X j D1p j4—‚j—5D1˜yƒO 
y˜2C1d X j D14z jƒ‚j52C‹d X j D1p j4—‚j—50(2.2)The penalty functions pj4¢5in(2.2)are not necessarily the same for all j.For example,we may wish to keep impor-tant predictors in a parametric model and hence not be will-ing to penalize their corresponding parameters.For simplicity of presentation,we assume that the penalty functions for all coef cients are the same,denoted by p4—¢—5.Furthermore,we denote‹p4—¢—5by p‹4—¢—5,so p4—¢—5may be allowed to depend on‹.Extensions to the case with different thresholding func-tions do not involve any extra dif culties.The minimization problem of(2.2)is equivalent to mini-mizing componentwise.This leads us to consider the penal-ized least squares problem124zƒˆ52C p‹4—ˆ—50(2.3) By taking the hard thresholding penalty function[see Fig.1(a)]p‹4—ˆ—5D‹2ƒ4—ˆ—ƒ‹52I4—ˆ—<‹51(2.4) we obtain the hard thresholding rule(see Antoniadis1997and Fan1997)OˆD zI4—z—>‹53(2.5)see Figure2(a).In other words,the solution to(2.2)is simply zjI4—zj—>‹5,which coincides with the best subset selection,and stepwise deletion and addition for orthonor-mal designs.Note that the hard thresholding penalty func-tion is a smoother penalty function than the entropy penaltyp‹4—ˆ—5D4‹2=25I4—ˆ—D05,which also results in(2.5).The former facilitates computational expedience in other settings.A good penalty function should result in an estimator with three properties.1.Unbiasedness:The resulting estimator is nearly unbi-ased when the true unknown parameter is large to avoid unnecessary modeling bias.2.Sparsity:The resulting estimator is a thresholding rule,which automatically sets small estimated coef cients to zero to reduce model complexity.3.Continuity:The resulting estimator is continuous in dataz to avoid instability in model prediction.We now provide some insights on these requirements.The rst order derivative of(2.3)with respect toˆis sgn4ˆ58—ˆ—C p0‹4—ˆ—59ƒz.It is easy to see that when p0‹4—ˆ—5D 0for large—ˆ—,the resulting estimator is z when—z—is suf ciently large.Thus,when the true parameter—ˆ—is large,the observed value—z—is large with high probability. 
Hence the penalized least squares simply is OˆD z,which is approximately unbiased.Thus,the condition that p0‹4—ˆ—5D01350Journal of the American Statistical Association,December 2001Figure 1.Three Penalty Functions p ‹(ˆ)and Their Quadratic Approximations.The values of ‹are the same as those in Figure 5(c).for large —ˆ—is a suf cient condition for unbiasedness for a large true parameter.I t corresponds to an improper prior dis-tribution in the Bayesian model selection setting.A suf cient condition for the resulting estimator to be a thresholding ruleis that the minimum of the function —ˆ—C p 0‹4—ˆ—5is positive .Figure 3provides further insights into this statement.When—z —<min ˆD 08—ˆ—C p 0‹4—ˆ—59,the derivative of (2.3)is posi-tive for all positive ˆ’s (and is negative for all negative ˆ’s).Therefore,the penalized least squares estimator is 0in thissituation,namely O ˆD 0for —z —<min ˆD 08—ˆ—C p 0‹4—ˆ—59.When —z —>min ˆD 0—ˆ—C p 0‹4—ˆ—5,two crossings may exist as shown in Figure 1;the larger one is a penalized least squares esti-mator.This implies that a suf cient and necessary conditionfor continuity is that the minimum of the function —ˆ—C p 0‹4—ˆ—5is attained at 0.From the foregoing discussion,we conclude that a penalty function satisfying the conditions of sparsity and continuity must be singular at the origin.It is well known that the L 2penalty p ‹4—ˆ—5D ‹—ˆ—2results in a ridge regression.The L 1penalty p ‹4—ˆ—5D ‹—ˆ—yields a soft thresholding ruleO ˆj D sgn 4z j 54—z j —ƒ‹5C 1(2.6)Figure 2.Plot of Thresholding Functions for (a)the Hard,(b)the Soft,and (c)the SCAD Thresholding Functions With ‹D 2and a D 3.7for SCAD.which was proposed by Donoho and Johnstone (1994a).LASSO,proposed by Tibshirani (1996,1997),is the penalized least squares estimate with the L 1penalty in the general least squares and likelihood settings.The L q penalty p ‹4—ˆ—5D ‹—ˆ—q leads to a bridge regression (Frank and Friedman 1993and Fu 1998).The solution is continuous only when q ¶1.How-ever,when q >1,the minimum of —ˆ—C p 0‹4—ˆ—5is zero and hence it does not produce a sparse solution [see Fig.4(a)].The only continuous solution with a thresholding rule in this family is the L 1penalty,but this comes at the price of shifting the resulting estimator by a constant ‹[see Fig.2(b)].2.1Smoothly Clipped Absolute Deviation PenaltyThe L q and the hard thresholding penalty functions do not simultaneously satisfy the mathematical conditions for unbi-asedness,sparsity,and continuity.The continuous differen-tiable penalty function de ned byp 0‹4ˆ5D ‹I4ˆµ‹5C4a‹ƒˆ5C4a ƒ15‹I4ˆ>‹5for some a >2and ˆ>01(2.7)Fan and Li:Nonconcave Penalized Likelihood 1351Figure 3.A Plot of ˆ+p 0‹(ˆ)Against ˆ(ˆ>0).improves the properties of the L 1penalty and the hardthresholding penalty function given by (2.4)[see Fig.1(c)and subsequent discussion].We call this penalty function the smoothly clipped absolute deviation (SCAD)penalty.It corre-sponds to a quadratic spline function with knots at ‹and a‹.This penalty function leaves large values of ˆnot excessively penalized and makes the solution continuous.The resulting solution is given byO ˆD 8><>:sgn 4z54—z —ƒ‹5C 1when —z —µ2‹184a ƒ15z ƒsgn 4z5a‹9=4a ƒ251when 2‹<—z —µa‹1z1when —z —>a‹(2.8)[see Fig.2(c)].This solution is owing to Fan (1997),who gave a brief discussion in the settings of wavelets.In this article,we use it to develop an effective variable selection procedure for a broad class of models,including linear regression models and generalized linear models.For simplicity of presentation,we use 
the acronym SCAD for all procedures using the SCAD penalty.The performance of SCAD is similar to that of rm shrinkage proposed by Gao and Bruce (1997)when design matrices are orthonormal.Figure 4.Plot of p 0‹(ˆ)Functions Over ˆ>0(a)for L q Penalties,(b)the Hard Thresholding Penalty,and (c)the SCAD Penalty.In (a),the heavy line corresponds to L 1,the dash-dot line corresponds to L .5,and the thin line corresponds to L 2penalties.The thresholding rule in (2.8)involves two unknown param-eters ‹and a .I n practice,we could search the best pair 4‹1a5over the two-dimensional grids using some criteria,such as cross-validation and generalized cross-validation (Craven and Wahba 1979).Such an implementation can be computation-ally expensive.To implement tools in Bayesian risk analy-sis,we assume that for given a and ‹,the prior distribution for ˆis a normal distribution with zero mean and variance a‹.We compute the Bayes risk via numerical integration.Figure 5(a)depicts the Bayes risk as a function of a under the squared loss,for the universal thresholding ‹D p 2log 4d5(see Donoho and Johnstone,1994a)with d D 20140160,and 100;and Figure 5(b)is for d D 512,1024,2048,and 4096.From Figure 5,(a)and (b),it can be seen that the Bayesian risks are not very sensitive to the values of a .It can be seen from Figure 5(a)that the Bayes risks achieve their minimums ata 307when the value of d is less than 100.This choice gives pretty good practical performance for various variable selec-tion problems.I ndeed,based on the simulations in Section 4.3,the choice of a D 307works similarly to that chosen by the generalized cross-validation (GCV)method.2.2Performance of Thresholding RulesWe now compare the performance of the four previously stated thresholding rules.Marron,Adak,Johnstone,Neumann,and Patil (1998)applied the tool of risk analysis to under-stand the small sample behavior of the hard and soft thresh-olding rules.The closed forms for the L 2risk functions R4O ˆ1ˆ5D E4O ˆƒˆ52were derived under the Gaussian model Z N 4ˆ1‘25for hard and soft thresholding rules by Donoho and Johnstone (1994b).The risk function of the SCAD thresh-olding rule can be found in Li (2000).To gauge the perfor-mance of the four thresholding rules,Figure 5(c)depicts their L 2risk functions under the Gaussian model Z N 4ˆ115.To make the scale of the thresholding parameters roughly com-parable,we took ‹D 2for the hard thresholding rule and adjusted the values of ‹for the other thresholding rules so that their estimated values are the same when ˆD 3.The SCAD1352Journal of the American Statistical Association,December2001 Figure5.Risk Functions of Proposed Procedures Under the Quadratic Loss.(a)Posterior risk functions of the SCAD under the priorˆN(0,a‹)using the universal thresholding‹D p2log(d)for four different values d:heavy line,d D20;dashed line,d D40;medium line,d D60; thin line,d D100.(b)Risk functions similar to those for(a):heavy line,d D572;dashed line,d D1,024;medium line,d D2,048;thin line, d D4,096.(c)Risk functions of the four different thresholding rules.The heavy,dashed,and solid lines denote minimum SCAD,hard,and soft thresholding rules,respectively.performs favorably compared with the other two thresholdingrules.This also can be understood via their correspondingpenalty functions plotted in Figure1.It is clear that the SCADretains the good mathematical properties of the other twothresholding penalty functions.Hence,it is expected to per-form the best.For general‘2,the picture is the same,exceptit is scaled vertically 
by‘2,and theˆaxis should be replacedbyˆ=‘.3.VARIABLE SELECTION VIA PENALIZEDLIKELIHOODThe methodology in the previous section can be applieddirectly to many other statistical contexts.In this section,weconsider linear regression models,robust linear models,andlikelihood-based generalized linear models.From now on,weassume that the design matrix X D4xij 5is standardized so thateach column has mean0and variance1.3.1Penalized Least Squares and LikelihoodIn the classical linear regression model,the least squares estimate is obtained via minimizing the sum of squared resid-ual errors.Therefore,(2.2)can be extended naturally to the situation in which design matrices are not orthonormal.Simi-lar to(2.2),a form of penalized least squares is1 24yƒX‚5T4yƒX‚5C ndX j D1p‹4—‚j—50(3.1)Minimizing(3.1)with respect to‚leads to a penalized least squares estimator of‚.It is well known that the least squares estimate is not robust. We can consider the outlier-resistant loss functions such asthe L1loss or,more generally,Huber’s–function(see Huber1981).Therefore,instead of minimizing(3.1),we minimizenX i D1–4—y iƒx i‚—5C n d X j D1p‹4—‚j—5(3.2)with respect to‚.This results in a penalized robust estimator for‚.For generalized linear models,statistical inferences are based on underlying likelihood functions.The penalized max-imum likelihood estimator can be used to select signi cant variables.Assume that the data84xi1Yi59are collected inde-pendently.Conditioning on xi1Y i has a density f i4g4x Ti‚51y i5, where g is a known link function.Let`i D log f i denote the conditional log-likelihood of Yi.A form of the penalized like-lihood isnX i D1`ig x Ti‚1y iƒn d X j D1p‹4—‚j—50 Maximizing the penalized likelihood function is equivalent to minimizingƒn X i D1`ig x Ti‚1y i C n d X j D1p‹4—‚j—5(3.3)with respect to‚.To obtain a penalized maximum likelihood estimator of‚,we minimize(3.3)with respect to‚for some thresholding parameter‹.3.2Sampling Properties and Oracle PropertiesIn this section,we establish the asymptotic theory for our nonconcave penalized likelihood estimator.Let‚0D4‚101:::1‚d05TD‚T101‚T20T0Without loss of generality,assume that‚20D0.Let I4‚05be the Fisher information matrix and let I14‚10105be the Fisher information knowing‚20D0.We rst show that there exists a penalized likelihood estimator that converges at the rateOP4nƒ1=2C a n51(3.4)where an D max8p0‹n4—‚j0—52‚j0D09.This implies that for the hard thresholding and SCAD penalty functions,the penalized likelihood estimator is root-n consistent if‹n!0.Further-more,we demonstrate that such a root-n consistent estimatorFan and Li:Nonconcave Penalized Likelihood 1353must satisfy O ‚2D 0and O ‚1is asymptotic normal with covari-ance matrix I ƒ11,if n 1=2‹n !ˆ.This implies that the penal-ized likelihood estimator performs as well as if ‚20D 0were known.In language similar to Donoho and Johnstone (1994a),the resulting estimator performs as well as the oracle estima-tor,which knows in advance that ‚20D 0.The preceding oracle performance is closely related to the superef ciency phenomenon.Consider the simplest linear regression model y D 1n ŒC ˜,where ˜N n 401I n 5.A super-ef cient estimate for Œis…n D(S Y 1if —S Y —¶n ƒ1=41c S Y 1if —S Y —<n ƒ1=41owing to Hodges (see Lehmann 1983,p.405).If we set c to0,then …n coincides with the hard thresholding estimator with the thresholding parameter ‹n D n ƒ1=4.This estimator correctly estimates the parameter at point 0without paying any price for estimating the parameter elsewhere.We now state the result in a fairly general 
setting.To facilitate the presentation,we assume that the penalization is applied to every component of ‚.However,there is no extra dif culty to extend it to the case where some components (e.g.,variance in the linear models)are not penalized.Set V i D 4X i 1Y i 5,i D 11:::1n .Let L4‚5be the log-likelihood function of the observations V 11:::1V n and let Q4‚5be the penalized likelihood function L4‚5ƒn P dj D 1p ‹n 4—‚j —5.We state our theorems here,but their proofs are relegated to the Appendix,where the conditions for the theorems also can be found.Theorem 1.Let V 11:::1V n be independent and identi-cally distributed,each with a density f 4V 1‚5(with respect to a measure Œ)that satis es conditions (A)–(C)in theAppendix.If max 8—p 00‹n4—‚j 0—5—2‚j 0D 09!0,then there exists a local maximizer O ‚of Q4‚5such that ˜O ‚ƒ‚0˜D O P 4n ƒ1=2C a n 5,where a n is given by (3.4).It is clear from Theorem 1that by choosing a proper ‹n ,there exists a root-n consistent penalized likelihood estimator.We now show that this estimator must possess the sparsityproperty O ‚2D 0,which is stated as follows.Lemma 1.Let V 11:::1V n be independent and identically distributed,each with a density f 4V 1‚5that satis es condi-tions (A)–(C)in the Appendix.Assume thatlim inf n !ˆlim inf ˆ!0Cp 0‹n 4ˆ5=‹n >00(3.5)I f ‹n !0and pn‹n !ˆas n !ˆ,then with probabil-ity tending to 1,for any given ‚1satisfying ˜‚1ƒ‚10˜D O P 4n ƒ1=25and any constant C ,Q‚1Dmax˜‚2˜µCn ƒ1=2Q‚1‚2Denote èD diag p 00‹n 4—‚10—51:::1p 00‹n 4—‚s 0—5andbD p 0‹n 4—‚10—5sgn 4‚1051:::1p 0‹n4—‚s 0—5sgn 4‚s 05T1where s is the number of components of ‚10.Theorem 2(Oracle Property).Let V 11:::1V n be inde-pendent and identically distributed,each with a densityf 4V 1‚5satisfying conditions (A)–(C)in Appendix.Assume that the penalty function p ‹n 4—ˆ—5satis es condition (3.5).If ‹n !0and p n‹n !ˆas n !ˆ,then with probability tend-ing to 1,the root-n consistent local maximizers O ‚D O ‚1O ‚2in Theorem 1must satisfy:(a)Sparsity:O ‚2D 0.(b)Asymptotic normality:pn4I 14‚105Cè5O ‚1ƒ‚10C 4I 14‚105C è5ƒ1b !N 01I 14‚105in distribution,where I 14‚105D I 14‚10105,the Fisher informa-tion knowing ‚2D 0.As a consequence,the asymptotic covariance matrix of O ‚1is 1n I 14‚105C èƒ1I 14‚105I 14‚105C èƒ11which approximately equals 41=n5I ƒ114‚105for the threshold-ing penalties discussed in Section 2if ‹n tends to 0.Remark 1.For the hard and SCAD thresholding penalty functions,if ‹n !01a n D 0.Hence,by Theorem 2,when pn‹n !ˆ,their corresponding penalized likelihood esti-mators possess the oracle property and perform as well as the maximum likelihood estimates for estimating ‚1knowing ‚2D 0.However,for the L 1penalty,a n D ‹n .Hence,the root-n consistency requires that ‹n D O P 4n ƒ1=25.On the other hand,the oracle property in Theorem 2requires that pn‹n !ˆ.These two conditions for LASSO cannot be satis ed simul-taneously.Indeed,for the L 1penalty,we conjecture that the oracle property does not hold.However,for L q penalty with q <1,the oracle property continues to hold with suitable choice of ‹n .Now we brie y discuss the regularity conditions (A)–(C)for the generalized linear models (see McCullagh and Nelder 1989).With a canonical link,the condition distribution of Y given X D x belongs to the canonical exponential family,that is,with a density functionf 4y3x 1‚5D c4y5exp y x T ‚ƒb4x T ‚5Clearly,the regularity conditions (A)are satis ed.The Fisher information matrix isI4‚5DE b 004x T ‚5xxT =a4”50Therefore,if E8b 004x T ‚5xx T 9is nite and positive de nite,then condition (B)holds.I f 
for all ‚in some neighborhood of ‚0,—b 4354x T ‚5—µM 04x 5for some function M 04x 5satisfy-ing E ‚08M 04x 5X j X k X l 9<ˆfor all j1k1l ,then condition (C)holds.For general link functions,similar conditions need to guarantee conditions (B)and (C).The mathematical deriva-tion of those conditions does not involve any extra dif culty except more tedious notation.Results in Theorems 1and 2also can be established for the penalized least squares (3.1)and the penalized robust linear regression (3.2)under some mild regularity conditions.See Li (2000)for details.1354Journal of the American Statistical Association,December2001 3.3A New Uni’ed AlgorithmTibshirani(1996)proposed an algorithm for solvingconstrained least squares problems of LASSO,whereasFu(1998)provided a“shooting algorithm”for LASSO.See also LASSO2submitted by Berwin Turlach at Statlib(http:===S=).In this section,we propose a newuni ed algorithm for the minimization problems(3.1),(3.2),and(3.3)via local quadratic approximations.The rst term in(3.1),(3.2),and(3.3)may be regarded as a loss function of‚.Denote it by`4‚5.Then the expressions(3.1),(3.2),and(3.3)can be written in a uni ed form as`4‚5C ndX j D1p‹4—‚j—50(3.6)The L1,hard thresholding,and SCAD penalty functions aresingular at the origin,and they do not have continuous second order derivatives.However,they can be locally approximated by a quadratic function as follows.Suppose that we are givenan initial value‚0that is close to the minimizer of(3.6).If‚j0is very close to0,then set O‚j D0.Otherwise they can belocally approximated by a quadratic functionasp ‹4—‚j—50D p0‹4—‚j—5sgn4‚j5p0‹4—‚j0—5=—‚j0—‚j1when‚j D0.I n other words,p ‹4—‚j—5p‹4—‚j0—5C12p0‹4—‚j0—5=—‚j0—4‚2jƒ‚2j051for‚j‚j00(3.7)Figure1shows the L1,hard thresholding,and SCAD penaltyfunctions,and their approximations on the right-hand side of(3.7)at two different values of‚j0.A drawback of this approx-imation is that once a coef cient is shrunken to zero,it will stay at zero.However,this method signi cantly reduces the computational burden.If`4‚5is the L1loss as in(3.2),then it does not havecontinuous second order partial derivatives with respect to‚. 
However,–4—yƒx T‚—5in(3.2)can be analogously approx-imated by8–4yƒx T‚05=4yƒx T‚05294yƒx T‚52,as long asthe initial value‚0of‚is close to the minimizer.When someof the residuals—yƒx T‚0—are small,this approximation is not very good.See Section3.4for some slight modi cations of this approximation.Now assume that the log-likelihood function is smooth with respect to‚so that its rst two partial derivatives are contin-uous.Thus the rst term in(3.6)can be locally approximated by a quadratic function.Therefore,the minimization problem (3.6)can be reduced to a quadratic minimization problem and the Newton–Raphson algorithm can be used.I ndeed,(3.6)can be locally approximated(except for a constant term)by`4‚05Cï`4‚5T4‚ƒ‚5C124‚ƒ‚5Tï2`4‚54‚ƒ‚5C1n‚Tè‹4‚05‚1(3.8)whereï`4‚5D¡`4‚51ï2`4‚5D¡2`4‚5¡‚¡‚1è‹4‚5D diag p0‹4—‚10—5=—‚10—1:::1p0‹4—‚d0—5=—‚d0—0The quadratic minimization problem(3.8)yields the solution‚1D‚0ƒï2`4‚05C nè‹4‚05ƒ1ï`4‚5C n U‹4‚51(3.9)where U‹4‚5Dè‹4‚5‚.When the algorithm converges,the estimator satis es the condition¡`4O‚05¡‚jC np0‹4—O‚j0—5sgn4O‚j05D01the penalized likelihood equation,for nonzero elements ofO‚0.Speci cally,for the penalized least squares problem(3.1),the solution can be found by iteratively computing the ridgeregression‚1D X T X C nè‹4‚05ƒ17X T y0Similarly we obtain the solution for(3.2)by iterating‚1D X T WX C1nè‹4‚05ƒ1X T Wy1where W D diag8–4—y1ƒx T1‚0—5=4y1ƒx T1‚0521:::1–4—y nƒx Tn‚0—5=4y nƒx T n‚0529.As in the maximum likelihood estimation(MLE)setting,with the good initial value‚,the one-step procedure canbe as ef cient as the fully iterative procedure,namely,thepenalized maximum likelihood estimator,when the Newton–Raphson algorithm is used(see Bickel1975).Now regarding‚4kƒ15as a good initial value at the k th step,the next iterationalso can be regarded as a one-step procedure and hence theresulting estimator still can be as ef cient as the fully itera-tive method(see Robinson1988for the theory on the differ-ence between the MLE and the k-step estimators).Therefore,estimators obtained by the aforementioned algorithm with afew iterations always can be regarded as a one-step estima-tor,which is as ef cient as the fully iterative method.In thissense,we do not have to iterate the foregoing algorithm untilconvergence as long as the initial estimators are good enough.The estimators from the full models can be used as initial esti-mators,as long as they are not overly parameterized.3.4Standard Error FormulaThe standard errors for the estimated parameters can beobtained directly because we are estimating parameters andselecting variables at the same time.Following the conven-tional technique in the likelihood setting,the correspondingsandwich formula can be used as an estimator for the covari-ance of the estimates O‚1,the nonvanishing component of O‚.That is,d c ov4O‚15Dï2`4O‚15C nè‹4O‚15ƒ1d c ovï`4O‚15ï2`4O‚15C nè‹4O‚15ƒ10(3.10)。
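The hard, soft and SCAD thresholding rules quoted above (equations (2.5), (2.6) and (2.8)) are easy to reproduce numerically for the orthonormal-design case. The following Python sketch implements them, using the default a = 3.7 discussed in the text; it is an illustration written for these notes, not code from the paper:

import numpy as np

def hard_threshold(z, lam):
    # Hard rule (2.5): keep z when |z| > lambda, else 0.
    z = np.asarray(z, dtype=float)
    return z * (np.abs(z) > lam)

def soft_threshold(z, lam):
    # Soft rule (2.6): sgn(z) (|z| - lambda)_+.
    z = np.asarray(z, dtype=float)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def scad_threshold(z, lam, a=3.7):
    # SCAD rule (2.8): soft thresholding near zero, a linear interpolation for
    # 2*lambda < |z| <= a*lambda, and the identity (no shrinkage) for large |z|.
    z = np.asarray(z, dtype=float)
    absz = np.abs(z)
    return np.where(absz <= 2 * lam,
                    soft_threshold(z, lam),
                    np.where(absz <= a * lam,
                             ((a - 1) * z - np.sign(z) * a * lam) / (a - 2),
                             z))

z = np.linspace(-6, 6, 13)
print(np.round(scad_threshold(z, lam=2.0), 2))

Plotting the three rules over a grid of z reproduces the qualitative picture described in the paper: the soft rule shifts every surviving estimate by lambda, the hard rule is discontinuous, and SCAD is continuous yet leaves large estimates essentially unbiased.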

Manual for the spcov R package: sparse estimation of a multivariate positive-definite covariance matrix


Package 'spcov' — October 14, 2022

Type: Package
Title: Sparse Estimation of a Covariance Matrix
Version: 1.3
Date: 2022-09-21
Author: Jacob Bien and Rob Tibshirani
Maintainer: Jacob Bien <*************>
Description: Provides a covariance estimator for multivariate normal data that is sparse and positive definite. Implements the majorize-minimize algorithm described in Bien, J., and Tibshirani, R. (2011), "Sparse Estimation of a Covariance Matrix," Biometrika, 98(4), 807-820.
License: GPL-2
LazyLoad: yes
Encoding: UTF-8
RoxygenNote: 7.2.1
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2022-09-23 20:20:02 UTC

R topics documented: spcov-package, GenerateCliquesCovariance, ProxADMM, spcov

spcov-package: Sparse Estimation of a Covariance Matrix

Description: Performs the majorize-minimize algorithm described in Bien, J., and Tibshirani, R. (2011), "Sparse Estimation of a Covariance Matrix," Biometrika, 98(4), 807-820. Generalized gradient descent is used to minimize each majorizer.
Details: Package: spcov; Type: Package; Version: 1.01; Date: 2012-03-04; License: GPL-2; LazyLoad: yes. See the function spcov.
Author(s): Jacob Bien and Rob Tibshirani. Maintainer: Jacob Bien <******************>
References: Bien, J., and Tibshirani, R. (2011), "Sparse Estimation of a Covariance Matrix," Biometrika, 98(4), 807-820.
See Also: spcov

GenerateCliquesCovariance: Generate a block diagonal covariance matrix

Description: This function is included in the package so that it can be used in the example code provided in spcov.
Usage: GenerateCliquesCovariance(ncliques, cliquesize, theta)
Arguments: ncliques — number of blocks; cliquesize — size of each block; theta — magnitude of non-zeros.
Details: This function generates a block diagonal positive definite matrix with randomly-signed, non-zero elements. A shift is added to the diagonal of the matrix so that its condition number equals p, the number of variables.
Value: Sigma — the covariance matrix; A — a symmetric square root of Sigma; shift — how much the eigenvalues were shifted (see Details).
Author(s): Jacob Bien and Rob Tibshirani
References: Bien, J., and Tibshirani, R. (2011), "Sparse Estimation of a Covariance Matrix," accepted for publication in Biometrika.

ProxADMM: Solving the penalized Frobenius problem

Description: This function solves the optimization problem
    Minimize_X (1/2)||X - A||_F^2 + lam||P*X||_1   subject to   X >= del*I.
Usage: ProxADMM(A, del, lam, P, rho = 0.1, tol = 1e-06, maxiters = 100, verb = FALSE)
Arguments: A — a symmetric matrix; del — a non-negative scalar, lower bound on eigenvalues; lam — a non-negative scalar, L1 penalty parameter; P — a matrix with non-negative elements and the dimension of A, allowing for differing L1 penalty parameters; rho — ADMM parameter, can affect the rate of convergence a lot; tol — convergence threshold; maxiters — maximum number of iterations; verb — controls whether to be verbose.
Details: This is the prox function for the generalized gradient descent of Bien & Tibshirani 2011 (see the full reference below), i.e. the R implementation of the algorithm in Appendix 3 of that paper. It uses an ADMM approach to solve the problem stated above. Here, the multiplication between P and X is elementwise, and the inequality in the constraint is a lower bound on the minimum eigenvalue of the matrix X. Note that there are two variables X and Z that are output. Both are estimates of the optimal X; however, Z has exact zeros, whereas X has eigenvalues at least del. Running the ADMM algorithm long enough, these two are guaranteed to converge.
Value: X — estimate of the optimal X; Z — estimate of the optimal X; obj — objective values.
Author(s): Jacob Bien and Rob Tibshirani
References: Bien, J., and Tibshirani, R. (2011), "Sparse Estimation of a Covariance Matrix," Biometrika, 98(4), 807-820.
See Also: spcov
Examples:
    set.seed(1)
    n <- 100
    p <- 200
    # generate a covariance matrix:
    model <- GenerateCliquesCovariance(ncliques = 4, cliquesize = p / 4, 1)
    # generate data matrix with x[i, ] ~ N(0, model$Sigma):
    x <- matrix(rnorm(n * p), ncol = p) %*% model$A
    S <- var(x)
    # compute sparse, positive covariance estimator:
    P <- matrix(1, p, p)
    diag(P) <- 0
    lam <- 0.1
    aa <- ProxADMM(S, 0.01, lam, P)

spcov: Sparse Covariance Estimation

Description: Provides a sparse and positive definite estimate of a covariance matrix. This function performs the majorize-minimize algorithm described in Bien & Tibshirani 2011 (see the full reference below).
Usage: spcov(Sigma, S, lambda, step.size, nesterov = TRUE, n.outer.steps = 10000, n.inner.steps = 10000, tol.outer = 1e-04, thr.inner = 0.01, backtracking = 0.2, trace = 0)
Arguments: Sigma — an initial guess for Sigma (suggestions: S or diag(diag(S))); S — the empirical covariance matrix of the data, which must be positive definite (if it is not, add a small constant to the diagonal); lambda — penalty parameter, either a scalar or a matrix of the same dimension as Sigma (the latter choice should be used to penalize only off-diagonal elements); all elements of lambda must be non-negative; step.size — the step size to use in generalized gradient descent, which affects the speed of the algorithm; nesterov — indicates whether to use Nesterov's modification of generalized gradient descent (default TRUE); n.outer.steps — maximum number of majorize-minimize steps to take (the MM is the outer loop); n.inner.steps — maximum number of generalized gradient steps to take (generalized gradient descent is the inner loop); tol.outer — convergence threshold for the outer (MM) loop, which stops when the drop in objective between steps is less than tol.outer; thr.inner — convergence threshold for the inner (generalized gradient) loop, which stops when the mean absolute change in Sigma is less than thr.inner * mean(abs(S)); backtracking — if FALSE, a fixed step size is used; if numeric and in (0, 1), this is the backtracking parameter that multiplies step.size on each step (usually in the range (0.1, 0.8); default 0.2); trace — controls how verbose the output should be.
Details: This is the R implementation of Algorithm 1 in Bien, J., and Tibshirani, R. (2011), "Sparse Estimation of a Covariance Matrix," Biometrika, 98(4), 807-820. The goal is to approximately minimize (over Sigma) the following non-convex optimization problem:
    minimize   logdet(Sigma) + trace(S Sigma^-1) + ||lambda*Sigma||_1
    subject to Sigma positive definite.
Here, the L1 norm and the matrix multiplication between lambda and Sigma are elementwise. The empirical covariance matrix must be positive definite for the optimization problem to have a bounded objective (see Section 3.3 of the paper); we suggest adding a small constant to the diagonal of S if it is not. Since the above problem is not convex, the returned matrix is not guaranteed to be a global minimum of the problem. In Section 3.2 of the paper, we mention a simple modification of gradient descent due to Nesterov; the argument nesterov controls whether to use this modification (we suggest that it be used). We also strongly recommend using backtracking: this allows the algorithm to begin by taking large steps (the initial size is determined by the argument step.size) and then to gradually reduce the size of the steps. At the start of the algorithm, a lower bound (delta in the paper) on the eigenvalues of the solution is calculated. As shown in Equation (3) of the paper, the prox function for our generalized gradient descent amounts to minimizing (over a matrix X) a problem of the form
    minimize   (1/2)||X - A||_F^2 + ||lambda*X||_1   subject to   X >= delta*I.
This is implemented using an alternating direction method of multipliers approach given in Appendix 3.
Value: Sigma — the sparse covariance estimate; n.iter — a vector giving the number of generalized gradient steps taken on each step of the MM algorithm; obj — a vector giving the objective values after each step of the MM algorithm.
Author(s): Jacob Bien and Rob Tibshirani
References: Bien, J., and Tibshirani, R. (2011), "Sparse Estimation of a Covariance Matrix," Biometrika, 98(4), 807-820.
See Also: ProxADMM
Examples:
    set.seed(1)
    n <- 100
    p <- 20
    # generate a covariance matrix:
    model <- GenerateCliquesCovariance(ncliques = 4, cliquesize = p / 4, 1)
    # generate data matrix with x[i, ] ~ N(0, model$Sigma):
    x <- matrix(rnorm(n * p), ncol = p) %*% model$A
    S <- var(x)
    # compute sparse, positive covariance estimator:
    step.size <- 100
    tol <- 1e-3
    P <- matrix(1, p, p)
    diag(P) <- 0
    lam <- 0.06
    mm <- spcov(Sigma = S, S = S, lambda = lam * P,
                step.size = step.size, n.inner.steps = 200,
                thr.inner = 0, tol.outer = tol, trace = 1)
    sqrt(mean((mm$Sigma - model$Sigma)^2))
    sqrt(mean((S - model$Sigma)^2))
    ## Not run: image(mm$Sigma != 0)

Index (keyword: multivariate): GenerateCliquesCovariance, ProxADMM, spcov, spcov-package.

A differential-diagnosis model for lung cancer versus pulmonary tuberculosis built with machine-learning algorithms and bioinformatics, and its preliminary evaluation


A differential-diagnosis model for lung cancer versus pulmonary tuberculosis built with machine-learning algorithms and bioinformatics, and its preliminary evaluation
夏文俊; 于斐; 胡鹏远; 张晓旭; 张燕; 包亮亮; 毛宏凯; 玛依沙·达肯; 曹明芹
Journal: Shandong Medical Journal
Year (volume), issue: 2023, 63(5)

Objective: To build a differential-diagnosis model for lung cancer versus pulmonary tuberculosis using machine-learning algorithms combined with bioinformatics, and to carry out a preliminary evaluation of its diagnostic accuracy.

Methods: The lung cancer and tuberculosis dataset GSE42834 was screened and downloaded from the GEO database, and the R package limma was used to identify differentially expressed genes (DEGs) between lung cancer and tuberculosis; the DEGs were then analysed for GO biological processes and KEGG pathways. A protein-protein interaction (PPI) network was built with the STRING tool and Cytoscape, hub DEGs for lung cancer versus tuberculosis were screened and verified with t-tests, and the hub DEGs were fed into the R caret package. Leave-one-out cross-validation (LOOCV) combined with eight machine-learning algorithms was used to build the differential-diagnosis models — support vector machine (SVM), adaptive boosting (AdaBoost), C5.0 decision tree (C5.0), random forest (RF), naive Bayes (NB), neural network (NN), linear discriminant analysis (LDA) and logistic regression (LR) — and the optimal parameters of each model were selected. The models were internally validated with the bootstrap method, and accuracy, the Kappa statistic, sensitivity and specificity were used for a preliminary evaluation of diagnostic accuracy.

Results: A total of 325 DEGs were identified in GSE42834, of which 205 were up-regulated and 120 down-regulated. GO analysis showed that the DEGs were mainly enriched in biological processes such as response to virus, defence response to virus and response to interferon-γ; KEGG analysis showed that they were mainly enriched in pathways such as influenza A, Epstein-Barr virus infection, and antigen processing and presentation. The PPI network showed that the ten hub DEGs with the highest connectivity were STAT1, CXCL10, MX1, ISG15, IFIH1, OASL, IFIT3, GBP1, IFI44 and IFIT1; on verification, the expression levels of all ten hub DEGs were lower in lung-cancer patients than in tuberculosis patients (all P < 0.05).
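The study built its classifiers in R with the caret package; the same model-building-with-LOOCV step can be sketched in Python with scikit-learn, as below. This is only an illustration of the workflow — the expression matrix, labels and model settings are placeholders, not the study's data, and only four of the eight algorithms are shown:

import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 10))        # 60 samples x 10 hub-gene expression values (synthetic)
y = rng.integers(0, 2, size=60)      # 0 = tuberculosis, 1 = lung cancer (synthetic labels)

models = {
    "SVM": SVC(),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "LR": LogisticRegression(max_iter=1000),
}
loo = LeaveOneOut()
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=loo).mean()   # LOOCV accuracy
    print(f"{name}: LOOCV accuracy = {acc:.2f}")

On real expression data one would additionally tune hyperparameters within the cross-validation loop and report Kappa, sensitivity and specificity alongside accuracy, as the study does.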

Sparse Covariance Selection via Robust Maximum Likelihood Estimation

Onureena Banerjee*, Alexandre d'Aspremont†, Laurent El Ghaoui‡

February 1, 2008

arXiv:cs/0506023v1 [cs.CE] 8 Jun 2005

* EECS Department, UC Berkeley, Berkeley, CA 94720. onureena@
† ORFE Department, Princeton University, Princeton, NJ 08544. aspremon@
‡ EECS Department, UC Berkeley, Berkeley, CA 94720. elghaoui@

Abstract

We address a problem of covariance selection, where we seek a trade-off between a high likelihood against the number of non-zero elements in the inverse covariance matrix. We solve a maximum likelihood problem with a penalty term given by the sum of absolute values of the elements of the inverse covariance matrix, and allow for imposing bounds on the condition number of the solution. The problem is directly amenable to now standard interior-point algorithms for convex optimization, but remains challenging due to its size. We first give some results on the theoretical computational complexity of the problem, by showing that a recent methodology for non-smooth convex optimization due to Nesterov can be applied to this problem, to greatly improve on the complexity estimate given by interior-point algorithms. We then examine two practical algorithms aimed at solving large-scale, noisy (hence dense) instances: one is based on a block-coordinate descent approach, where columns and rows are updated sequentially, another applies a dual version of Nesterov's method.

1 Introduction

Consider a data set with n variables, drawn from a multivariate Gaussian distribution N(0, Σ), where the covariance matrix Σ is unknown. When the number of variables n is large, estimating the entries of Σ becomes a significant problem.

In this paper we focus on the problem of computing a sparse estimate of the covariance matrix using only a large-scale, a priori dense and noisy sample covariance matrix Σ. Our approach is based on l1-penalized maximum likelihood, and can be interpreted as a "robust maximum likelihood" method, where we assume that the true covariance matrix is within a component-wise bounded perturbation of the sample one, and the estimate is chosen to maximize the worst-case (minimal) likelihood. One of our goals is to provide an efficient algorithm to discover structure, rather than solve problems where the inverse covariance matrix has an already known sparse structure, as in [DRV05].

In 1972, Dempster [Dem72] suggested reducing the number of parameters to be estimated by setting to zero some elements of the inverse covariance matrix Σ^{-1}. This idea, known as covariance selection, can lead to a more robust estimate of Σ if enough entries in its inverse are set to zero. Furthermore, conditional independence properties of the distribution are determined by the locations of zeros in Σ^{-1}. Hence the approach can be used to simultaneously determine a robust estimate of the covariance matrix and, perhaps more importantly, discover structure, namely conditional independence properties, in the underlying graphical model, which is useful information in its own right. Specific applications of covariance selection include speech recognition [Bil99, CG99, Bil00] and gene expression data analysis [DW04, DHJ+04].

Our contributions are as follows: we specify the problem and outline some of its basic properties (section 2); we describe how one can apply a recent methodology for convex optimization due to Nesterov [Nes03], and obtain as a result a computational complexity estimate that has a much better dependence on problem size than interior-point algorithms (section 3); we present two algorithms for solving large dense problems (section 4): a version of Nesterov's method applied to the dual problem, and a block coordinate descent method. In section 5 we present the results of some numerical experiments comparing these two algorithms.

2 Preliminaries

Problem setup. For a given covariance matrix $\Sigma \in \mathbf{S}^n_+$ and real numbers $0 \le \alpha < \beta$, our problem is formulated as

$$p^* := \max_X \;\big\{\, \log\det X - \langle \Sigma, X\rangle - \rho\,\|X\|_1 \;:\; \alpha I_n \preceq X \preceq \beta I_n \,\big\} \qquad (1)$$

with variable $X \in \mathbf{S}^n$, and $\|X\|_1 := \sum_{i,j=1}^n |X_{ij}|$. The parameter $\rho > 0$ controls the size of the penalty, hence the sparsity of the solution. Here $\langle \Sigma, X\rangle = \mathbf{Tr}\,\Sigma X$ denotes the scalar product between the two symmetric matrices $\Sigma$ and $X$.
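For small n, formulation (1) can be prototyped directly with an off-the-shelf convex solver as a sanity check. The sketch below uses CVXPY, which is my choice for illustration only — the paper itself develops specialised first-order and block-coordinate methods for large, dense instances — and the sample covariance, ρ, α and β are toy values:

import cvxpy as cp
import numpy as np

n = 5
rng = np.random.default_rng(0)
A = rng.normal(size=(n, 4 * n))
Sigma = A @ A.T / (4 * n)                      # toy sample covariance matrix
rho, alpha, beta = 0.1, 1e-2, 1e2

X = cp.Variable((n, n), symmetric=True)
objective = cp.Maximize(cp.log_det(X) - cp.trace(Sigma @ X) - rho * cp.sum(cp.abs(X)))
constraints = [X >> alpha * np.eye(n),          # alpha*I <= X
               X << beta * np.eye(n)]           # X <= beta*I (condition-number bound)
prob = cp.Problem(objective, constraints)
prob.solve()

print("optimal value p* ≈", round(prob.value, 4))
print(np.round(X.value, 3))                     # increasing rho drives more entries toward zero

Varying ρ in this toy setup reproduces the qualitative behaviour described above: larger penalties yield sparser estimates of the inverse covariance matrix, at the cost of a lower penalized likelihood.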