cs229-2-Review of Probability Theory
Advances in Prospect Theory: Cumulative Representation of Uncertainty
Expected utility theory reigned for several decades as the dominant normative and descriptive model of decision making under uncertainty, but it has come under serious question in recent years. There is now general agreement that the theory does not provide an adequate description of individual choice: a substantial body of evidence shows that decision makers systematically violate its basic tenets. Many alternative models have been proposed in response to this empirical challenge (for reviews, see Camerer, 1989; Fishburn, 1988; Machina, 1987). Some time ago we presented a model of choice, called prospect theory, which explained the major violations of expected utility theory in choices between risky prospects with a small number of outcomes (Kahneman and Tversky, 1979; Tversky and Kahneman, 1986). The key elements of this theory are (1) a value function that is concave for gains, convex for losses, and steeper for losses than for gains, and (2) a nonlinear transformation of the probability scale, which overweights small probabilities and underweights moderate and high probabilities.
CS229 Notes
CS229 notes, parts one through three. Note: study these together with Zhang Yushi's notes and the Chinese handouts; some concepts in the Chinese handouts are translated inaccurately.
1. For a proof of the properties of the trace, see the appendix of the ML open-course notes.
2. A more intuitive definition of a nonparametric algorithm: the number of parameters grows with the size of the dataset.
3. Likelihood and probability are largely the same quantity; "likelihood" emphasizes that L(θ) is a function of θ, while "probability" emphasizes a function of the data. Hence the saying "probability of the data, likelihood of the parameters" (probability of data, likelihood of parameter).
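A tiny sketch of this distinction (my own illustration, not from the notes): the same Bernoulli formula, read with the data held fixed, becomes a function of θ, i.e. the likelihood.

```python
import numpy as np

data = np.array([1, 0, 1, 1, 0, 1])             # fixed observed coin flips

def likelihood(theta: float) -> float:
    """L(theta) = p(data; theta), viewed as a function of theta."""
    return float(np.prod(theta**data * (1 - theta)**(1 - data)))

thetas = np.linspace(0.01, 0.99, 99)
mle = thetas[np.argmax([likelihood(t) for t in thetas])]
print(mle)                                       # close to the sample mean 4/6
```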
4. The "linear" in generalized linear models refers to assumption 3: the natural parameter η and the input x are linearly related.
5. The benefit of a GLM: if I have a quantity y to predict that takes only the values 0 and 1, then once I choose the Bernoulli distribution, the derivation of the prediction model that follows is automatic.
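As an illustration of how automatic this is (standard GLM algebra, sketched from general knowledge rather than from this handout): writing Bernoulli(φ) in exponential-family form, p(y; φ) = exp(y log(φ/(1−φ)) + log(1−φ)), identifies the natural parameter η = log(φ/(1−φ)); inverting gives φ = 1/(1 + e^(−η)), and assumption 3 (η = θᵀx) then forces the hypothesis h_θ(x) = E[y|x; θ] = 1/(1 + e^(−θᵀx)), which is exactly logistic regression.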
6. In the mathematical derivation in the article "Linear Regression and Least Squares," the matrix derivatives use the numerator layout, consistent with the layout used there.
Also appended, part four, generative learning:
1. argmax: the value of the argument at which a function attains its maximum.
2. Positive semi-definite matrices are a generalization of positive definite matrices. A real symmetric matrix A is called positive semi-definite if the quadratic form xᵀAx is positive semi-definite, i.e., for every nonzero real column vector x we have xᵀAx ≥ 0. (Positive definite requires xᵀAx > 0.)
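A small numpy check of this definition (my own sketch): for a real symmetric matrix, xᵀAx ≥ 0 for all x is equivalent to all eigenvalues being nonnegative.

```python
import numpy as np

def is_psd(A: np.ndarray, tol: float = 1e-10) -> bool:
    """Check x^T A x >= 0 for all x via the eigenvalues of the symmetric matrix A."""
    return bool(np.all(np.linalg.eigvalsh(A) >= -tol))

A = np.array([[2.0, -1.0], [-1.0, 2.0]])   # positive definite (eigenvalues 1 and 3)
B = np.array([[1.0, 2.0], [2.0, 1.0]])     # indefinite (eigenvalues -1 and 3)
print(is_psd(A), is_psd(B))                # True False
```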
3. The meaning of the three Naive Bayes parameters: p(x|y=1) is the proportion of spam emails in which the corresponding word appears, p(x|y=0) is the proportion of normal emails in which it appears, and p(y) is the probability of spam (y follows a Bernoulli distribution). From these, p(y|x) ∝ p(x|y)p(y) predicts the probability that an email is spam given the words it contains.
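A sketch of that prediction step (the three numbers below are invented for illustration): Bayes' rule turns the estimated parameters into the spam posterior.

```python
# p(x|y=1), p(x|y=0): fraction of spam / normal emails containing the word;
# p(y=1): prior probability of spam. All three values are made up.
p_x_given_spam, p_x_given_ham, p_spam = 0.40, 0.02, 0.30

# Bayes' rule: p(y=1|x) = p(x|y=1) p(y=1) / (p(x|y=1) p(y=1) + p(x|y=0) p(y=0))
posterior = (p_x_given_spam * p_spam) / (
    p_x_given_spam * p_spam + p_x_given_ham * (1 - p_spam))
print(posterior)   # about 0.90: an email containing the word is very likely spam
```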
4. In the multinomial event model, p(y) is the probability of the email's class label; the label is drawn first, and then each word is drawn according to its probability. The parameter φ_{k|y=1} means the probability that someone who has decided to send you spam picks the word with index k in the dictionary.
5. For the maximum likelihood estimates of the parameters in the multinomial event model: in the formula for φ_{k|y=1}, m is the number of emails and n_i is the number of words in email i. The numerator is the number of times the given word appears across spam emails; the denominator is the total word count of all spam emails, i.e., the total length of all spam emails unrolled into one dimension.
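A direct numpy transcription of that estimate (my own sketch; the word indices and labels are invented):

```python
import numpy as np

V = 5                                   # dictionary size
emails = [np.array([0, 2, 2, 4]),       # each email = word indices into the dictionary
          np.array([1, 3]),
          np.array([2, 2, 0])]
y = np.array([1, 0, 1])                 # 1 = spam

# phi_{k|y=1}: (# occurrences of word k across spam emails) /
#              (total number of words in all spam emails)
spam_words = np.concatenate([e for e, label in zip(emails, y) if label == 1])
phi_spam = np.bincount(spam_words, minlength=V) / len(spam_words)
print(phi_spam)                         # [2/7, 0, 4/7, 0, 1/7]
```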
CS229 Stanford machine learning course, Supplemental Notes 4: Hoeffding
John Duchi
1 Basic probability bounds
A basic question in probability, statistics, and machine learning is the following: given a random variable Z with expectation E[Z], how likely is Z to be close to its expectation? And more precisely, how close is it likely to be? With that in mind, these notes give a few tools for computing bounds of the form

P(Z ≥ E[Z] + t) and P(Z ≤ E[Z] − t)    (1)
for t ≥ 0. Our first bound is perhaps the most basic of all probability inequalities, and it is known as Markov's inequality. Given its basic-ness, it is perhaps unsurprising that its proof is essentially only one line.

Proposition 1 (Markov's inequality). Let Z ≥ 0 be a non-negative random variable. Then for all t ≥ 0,

P(Z ≥ t) ≤ E[Z] / t.
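For completeness, here is the standard one-line proof the notes allude to (my rendering, not Duchi's exact text): since Z ≥ 0, for any t > 0,

E[Z] ≥ E[Z · 1{Z ≥ t}] ≥ E[t · 1{Z ≥ t}] = t · P(Z ≥ t),

and dividing both sides by t gives the claim.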
A remaining useful life prediction method for two-stage degrading systems considering random effects
(Rocket Force University of Engineering, Xi'an, Shaanxi 710025, China)

Abstract: To address remaining useful life (RUL) prediction for randomly degrading systems whose degradation exhibits two-stage behavior, a two-stage Wiener-process degradation model is built, and random effects are introduced to describe unit-to-unit variability. Based on a time-space transformation and the random character of the degradation value at the change point, an analytical expression for the system lifetime distribution in the sense of first hitting time is derived. An offline identification and online updating algorithm for the model parameters, based on the expectation maximization (EM) algorithm and Bayesian theory, is proposed. Finally, the feasibility of the proposed method is verified on actual degradation data from a liquid coupling device (LCD).

0 Introduction

With the rapid development of modern science and technology, industrial equipment is trending toward integration, intelligence, and complexity; the gain in functionality brings new challenges to research on equipment reliability.

(Received 2018-07-17; revised manuscript received 2018-08-14. Supported by the National Natural Science Foundation of China (61573365). Author: Zhang Peng (b. 1994), Xi'an, Shaanxi; master's student working on fault diagnosis and life prediction.)

[Equation block unrecoverable from the extraction. The recoverable statement: given the degradation state x_k observed at the current time t_k, denote the system's remaining useful life by l_k and its probability density by f_{l_k}(l_k); under the influence of the two random degradation rates, the PDF of the remaining useful life follows from the two-stage Wiener-process degradation model.]
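The paper's closed-form PDF is not recoverable here, but as a rough illustration of the model class it describes, the sketch below simulates two-stage Wiener degradation paths and estimates the RUL distribution by Monte-Carlo first hitting times. The drift values, diffusion, change point, and failure threshold are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def two_stage_path(a=0.2, b=0.8, tau=50.0, sigma=0.3, dt=0.1,
                   threshold=40.0, t_max=500.0):
    """Simulate one two-stage Wiener degradation path X(t), with drift a before
    the change point tau and drift b after it; return its first hitting time
    of the failure threshold (np.inf if not reached by t_max)."""
    t, x = 0.0, 0.0
    while t < t_max:
        drift = a if t < tau else b
        x += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        t += dt
        if x >= threshold:
            return t
    return np.inf

# Monte-Carlo estimate of the remaining-useful-life distribution from t = 0
hits = np.array([two_stage_path() for _ in range(2000)])
print("mean RUL:", hits[np.isfinite(hits)].mean())
```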
cs229 Stanford Machine Learning Notes (1) -- Introduction and the LR Model
Copyright notice: this is an original post by the blogger; please credit the source when reposting.
Preface: when people talk about machine learning, the study material most often recommended is Stanford Andrew Ng's cs229. There are links to related resources.
But good material != good introductory material; Andrew Ng has another course on Coursera that is better suited for getting started. That course has videos, review questions, and programming exercises. Even though the videos have no Chinese subtitles, they are easy to follow from the slides being shown (if only my university courses had been this good, I wouldn't have ended up functionally illiterate after graduating). The most important part is the programming exercises; you can only complete them once you truly understand the material, since they aren't multiple-choice questions you can click through.
That said, the Coursera course leaves out quite a few of the harder topics, so if you find it not filling enough, you can go on to read the cs229 material. These notes mainly follow the cs229 course, but some Coursera content is woven in as well.
After some exposure to machine learning, you will find two subjects to be very important: one is probability and statistics, the other is linear algebra. The data used in machine learning can be viewed as samples in the probability-and-statistics sense, and once the model is built, you will find that what remains is a linear-algebra solving problem. As for study materials, Zhou Zhihua's new book "Machine Learning" (the watermelon book) is out now and is definitely the first choice! Before it, I would have recommended "Machine Learning in Action," which can resolve your confusion about how machine learning actually gets put into practice. Li Hang's "Statistical Learning Methods" can serve as an outline for reference.
Besides the lecture notes, cs229 also has session notes (truly timely help, like charcoal in the snow or a fan in summer: the points in the lecture notes that you feel deserve a deeper look can be found there) and problem sets. If you read them carefully, that is plenty of material.
Linear regression. Examples from everyday life can help in understanding linear regression. Say one day a coworker mentions he bought an apartment; what everyone asks about is where it is, which neighborhood, and how much per square meter, because we know these are the "key pieces of information" (in machine-learning jargon, "features").
Andrew Ng's CS229 math foundations (probability theory): someone has made an online translated version! This is foundational material for the Stanford CS229 machine learning course; the original file can be downloaded[1]. Original authors: Arian Maleki and Tom Do. Translation: Shi Zhenyu[2]. Review, revision, and production: Huang Haiguang[3]. Note: follow the github[4] repository for updates. For the linear algebra translation, see (this article).
CS229 machine learning course review material: probability theory.

Review of probability theory and references. Probability theory is the study of uncertainty. In this course, we will rely on concepts from probability theory to derive machine learning algorithms. These notes attempt to cover the probability-theory basics needed for CS229. The mathematical theory of probability is very sophisticated and involves a branch of analysis called measure theory. In these notes, we give a basic treatment of probability without addressing these finer details.
1. Basic elements of probability. To define a probability on a set, we need a few basic elements:
- Sample space Ω: the set of all outcomes of a random experiment. Here, each outcome ω ∈ Ω can be thought of as a complete description of the state of the real world at the end of the experiment.
- Event space (set of events) F: a set whose elements A ∈ F (called events) are subsets of Ω (i.e., each A ⊆ Ω is a collection of possible outcomes of the experiment). Remark: F must satisfy three conditions: (1) ∅ ∈ F; (2) A ∈ F implies Ω∖A ∈ F; (3) A₁, A₂, ... ∈ F implies ∪ᵢ Aᵢ ∈ F.
- Probability measure: a function P : F → ℝ satisfying the following properties: P(A) ≥ 0 for every A ∈ F; P(Ω) = 1; and if A₁, A₂, ... are disjoint events (i.e., Aᵢ ∩ Aⱼ = ∅ whenever i ≠ j), then P(∪ᵢ Aᵢ) = Σᵢ P(Aᵢ).
These three properties are called the axioms of probability.
Example: consider the experiment of tossing a six-sided die. The sample space is Ω = {1, 2, 3, 4, 5, 6}. The simplest event space is the trivial event space F = {∅, Ω}; another event space is the set of all subsets of Ω. For the first event space, the unique probability measure satisfying the requirements above is given by P(∅) = 0, P(Ω) = 1. For the second event space, a valid probability measure assigns each event A the probability P(A) = |A|/6, where |A| is the number of elements of that event set; for example, P({1, 2, 3, 4}) = 4/6.

Properties:
- If A ⊆ B, then P(A) ≤ P(B).
- P(A ∩ B) ≤ min(P(A), P(B)).
- (Boole's inequality, the union bound): P(A ∪ B) ≤ P(A) + P(B).
- P(Ω ∖ A) = 1 − P(A).
- (Law of total probability): if A₁, ..., A_k are disjoint events whose union is Ω, then the sum of their probabilities is 1.

1.1 Conditional probability and independence. Let A be an event with nonzero probability; the conditional probability of B given A is defined as P(B|A) = P(A ∩ B)/P(A). In other words, P(B|A) measures the probability of event B given that event A has been observed to occur. Two events are called independent if and only if P(A ∩ B) = P(A)P(B) (or equivalently, P(B|A) = P(B)).
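A quick worked instance of these definitions on the die example (my own numbers, not from the notes): take A = {4, 5, 6} and B = {2, 4, 6} (an even roll). Then P(B|A) = P(A ∩ B)/P(A) = (2/6)/(3/6) = 2/3, while P(B) = 1/2; since P(B|A) ≠ P(B), the events A and B are not independent.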
Improved Data-Driven Time-Frequency Analysis and Its Application (thesis)
Classification numbers: N941.3; 926.7
Student ID: 10020030
Security classification: public
Master of Science Dissertation

Improved Data-Driven Time-Frequency Analysis and Its Application

Candidate: Zhang Xueyang
Discipline: Systems Science
Research area: Multivariate information processing
Advisor: Prof. Zhu Jubo

Graduate School of the National University of Defense Technology, November 2012
Improved Data-Driven Time-Frequency Analysis and its Application
Candidate: Zhang Xueyang; Advisor: Prof. Zhu Jubo
A dissertation submitted in partial fulfillment of the requirements for the degree of Master of Science in Systems Science, Graduate School of National University of Defense Technology, Changsha, Hunan, P.R. China, November 2012
Chapter 5: Application of DDTFA to the BeiDou Pseudorange Fluctuation Problem
  5.1 Pseudorange and pseudorange fluctuation
    5.1.1 The principle of pseudorange positioning
    5.1.2 Pseudorange fluctuation
  5.2 Description of the pseudorange fluctuation data
    5.2.1 Basic observation equations of the BeiDou system
    5.2.2 Formulas for pseudorange data analysis
    5.2.3 Data description
  5.3 Data processing and analysis
    5.3.1 Data decomposition and statistical significance testing
    5.3.2 Analysis of period, amplitude, and correlation
    5.3.3 Conclusions
  5.4 Chapter summary
Chapter 6: Summary and Outlook
Acknowledgements
References
Academic achievements during the author's study
  Published academic papers
  Research projects participated in
Machine Learning / Deep Learning Notes (5)
Now, suppose we pick u to correspond to the direction shown in the figure below. The circles denote the projections of the original data onto this line.
grayscale image, and each x_j^{(i)} took a value in {0, 1, . . . , 255} corresponding to the intensity value of pixel j in image i.
Now, having normalized our data, how do we compute the “major axis of variation” u—that is, the direction on which the data approximately lies? One way is to pose this problem as finding the unit vector u so that when the data is projected onto the direction corresponding to u, the variance of the projected data is maximized. Intuitively, the data starts off with some amount of variance/information in it. We would like to choose a direction u so that if we were to approximate the data as lying in the direction/subspace corresponding to u, as much as possible of this variance is still retained.
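A minimal numpy sketch of this maximization (my own illustration, not from the notes): the variance-maximizing unit vector u is the top eigenvector of the empirical covariance of the normalized data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])  # toy data

X = X - X.mean(axis=0)            # normalize to zero mean
Sigma = X.T @ X / len(X)          # empirical covariance (1/m) sum x x^T
eigvals, eigvecs = np.linalg.eigh(Sigma)
u = eigvecs[:, -1]                # top eigenvector = direction of max variance

proj = X @ u                      # projections of the data onto u
print(u, proj.var())              # proj.var() equals the largest eigenvalue
```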
Latest complete Stanford Andrew Ng machine learning lecture notes (CS229)
Andrew Ng
Supervised learning
Let's start by talking about a few examples of supervised learning problems. Suppose we have a dataset giving the living areas and prices of 47 houses from Portland, Oregon:
[Figure: flow diagram of supervised learning. A training set is fed into a learning algorithm, which outputs a hypothesis h; a new input x (living area of house) is fed into h, which outputs the predicted y (predicted price of house).]
When the target variable that we’re trying to predict is continuous, such as in our housing example, we call the learning problem a regression problem. When y can take on only a small number of discrete values (such as if, given the living area, we wanted to predict if a dwelling is a house or an apartment, say), we call it a classification problem.
Living area (feet²) | Price (1000$s)
2104 | 400
1600 | 330
2400 | 369
1416 | 232
3000 | 540
... | ...
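As a small illustration of regression on this kind of data (the code is my own sketch, using the five rows reconstructed above), ordinary least squares fits θ via the normal equations:

```python
import numpy as np

# living areas (feet^2) and prices (1000$s) from the table above
x = np.array([2104, 1600, 2400, 1416, 3000], dtype=float)
y = np.array([400, 330, 369, 232, 540], dtype=float)

X = np.column_stack([np.ones_like(x), x])      # design matrix with intercept
theta = np.linalg.solve(X.T @ X, X.T @ y)      # normal equations: (X^T X) theta = X^T y

h = lambda living_area: theta[0] + theta[1] * living_area
print(theta, h(2000.0))                        # predicted price for a 2000 ft^2 house
```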
Stanford CS229 Machine Learning Notes 12
CS229 Lecture notes
Andrew Ng

Part XIII: Reinforcement Learning and Control

We now begin our study of reinforcement learning and adaptive control. In supervised learning, we saw algorithms that tried to make their outputs mimic the labels y given in the training set. In that setting, the labels gave an unambiguous "right answer" for each of the inputs x. In contrast, for many sequential decision making and control problems, it is very difficult to provide this type of explicit supervision to a learning algorithm. For example, if we have just built a four-legged robot and are trying to program it to walk, then initially we have no idea what the "correct" actions to take are to make it walk, and so do not know how to provide explicit supervision for a learning algorithm to try to mimic.

In the reinforcement learning framework, we will instead provide our algorithms only a reward function, which indicates to the learning agent when it is doing well, and when it is doing poorly. In the four-legged walking example, the reward function might give the robot positive rewards for moving forwards, and negative rewards for either moving backwards or falling over. It will then be the learning algorithm's job to figure out how to choose actions over time so as to obtain large rewards.

Reinforcement learning has been successful in applications as diverse as autonomous helicopter flight, robot legged locomotion, cell-phone network routing, marketing strategy selection, factory control, and efficient web-page indexing. Our study of reinforcement learning will begin with a definition of the Markov decision processes (MDP), which provides the formalism in which RL problems are usually posed.

1 Markov decision processes

A Markov decision process is a tuple (S, A, {P_sa}, γ, R), where:

- S is a set of states. (For example, in autonomous helicopter flight, S might be the set of all possible positions and orientations of the helicopter.)
- A is a set of actions. (For example, the set of all possible directions in which you can push the helicopter's control sticks.)
- P_sa are the state transition probabilities. For each state s ∈ S and action a ∈ A, P_sa is a distribution over the state space. We'll say more about this later, but briefly, P_sa gives the distribution over what states we will transition to if we take action a in state s.
- γ ∈ [0, 1) is called the discount factor.
- R : S × A → ℝ is the reward function. (Rewards are sometimes also written as a function of a state S only, in which case we would have R : S → ℝ.)

The dynamics of an MDP proceeds as follows: We start in some state s_0, and get to choose some action a_0 ∈ A to take in the MDP. As a result of our choice, the state of the MDP randomly transitions to some successor state s_1, drawn according to s_1 ∼ P_{s_0 a_0}. Then, we get to pick another action a_1. As a result of this action, the state transitions again, now to some s_2 ∼ P_{s_1 a_1}. We then pick a_2, and so on. Pictorially, we can represent this process as follows:

s_0 --a_0--> s_1 --a_1--> s_2 --a_2--> s_3 --a_3--> ...

Upon visiting the sequence of states s_0, s_1, ... with actions a_0, a_1, ..., our total payoff is given by

R(s_0, a_0) + γ R(s_1, a_1) + γ² R(s_2, a_2) + ···.

Or, when we are writing rewards as a function of the states only, this becomes

R(s_0) + γ R(s_1) + γ² R(s_2) + ···.

For most of our development, we will use the simpler state-rewards R(s), though the generalization to state-action rewards R(s, a) offers no special difficulties.

Our goal in reinforcement learning is to choose actions over time so as to maximize the expected value of the total payoff:

E[R(s_0) + γ R(s_1) + γ² R(s_2) + ···].

Note that the reward at timestep t is discounted by a factor of γ^t. Thus, to make this expectation large, we would like to accrue positive rewards as soon as possible (and postpone negative rewards as long as possible). In economic applications where R(·) is the amount of money made, γ also has a natural interpretation in terms of the interest rate (where a dollar today is worth more than a dollar tomorrow).

A policy is any function π : S → A mapping from the states to the actions. We say that we are executing some policy π if, whenever we are in state s, we take action a = π(s). We also define the value function for a policy π according to

V^π(s) = E[R(s_0) + γ R(s_1) + γ² R(s_2) + ··· | s_0 = s, π].

V^π(s) is simply the expected sum of discounted rewards upon starting in state s, and taking actions according to π.¹ (¹This notation in which we condition on π isn't technically correct because π isn't a random variable, but this is quite standard in the literature.)

Given a fixed policy π, its value function V^π satisfies the Bellman equations:

V^π(s) = R(s) + γ Σ_{s' ∈ S} P_{sπ(s)}(s') V^π(s').

This says that the expected sum of discounted rewards V^π(s) for starting in s consists of two terms: First, the immediate reward R(s) that we get right away simply for starting in state s, and second, the expected sum of future discounted rewards. Examining the second term in more detail, we see that the summation term above can be rewritten E_{s' ∼ P_{sπ(s)}}[V^π(s')]. This is the expected sum of discounted rewards for starting in state s', where s' is distributed according to P_{sπ(s)}, which is the distribution over where we will end up after taking the first action π(s) in the MDP from state s. Thus, the second term above gives the expected sum of discounted rewards obtained after the first step in the MDP.

Bellman's equations can be used to efficiently solve for V^π. Specifically, in a finite-state MDP (|S| < ∞), we can write down one such equation for V^π(s) for every state s. This gives us a set of |S| linear equations in |S| variables (the unknown V^π(s)'s, one for each state), which can be efficiently solved for the V^π(s)'s.

We also define the optimal value function according to

V*(s) = max_π V^π(s).    (1)

In other words, this is the best possible expected sum of discounted rewards that can be attained using any policy. There is also a version of Bellman's equations for the optimal value function:

V*(s) = R(s) + max_{a ∈ A} γ Σ_{s' ∈ S} P_{sa}(s') V*(s').    (2)

The first term above is the immediate reward as before. The second term is the maximum over all actions a of the expected future sum of discounted rewards we'll get after taking action a. You should make sure you understand this equation and see why it makes sense.

We also define a policy π* : S → A as follows:

π*(s) = arg max_{a ∈ A} Σ_{s' ∈ S} P_{sa}(s') V*(s').    (3)

Note that π*(s) gives the action a that attains the maximum in the "max" in Equation (2).

It is a fact that for every state s and every policy π, we have

V*(s) = V^{π*}(s) ≥ V^π(s).

The first equality says that V^{π*}, the value function for π*, is equal to the optimal value function V* for every state s. Further, the inequality above says that π*'s value is at least as large as the value of any other policy. In other words, π* as defined in Equation (3) is the optimal policy.

Note that π* has the interesting property that it is the optimal policy for all states s. Specifically, it is not the case that if we were starting in some state s then there'd be some optimal policy for that state, and if we were starting in some other state s' then there'd be some other policy that's optimal for s'. Instead, the same policy π* attains the maximum in Equation (1) for all states s. This means that we can use the same policy π* no matter what the initial state of our MDP is.

2 Value iteration and policy iteration

We now describe two efficient algorithms for solving finite-state MDPs. For now, we will consider only MDPs with finite state and action spaces (|S| < ∞, |A| < ∞).

The first algorithm, value iteration, is as follows:

1. For each state s, initialize V(s) := 0.
2. Repeat until convergence {
       For every state, update V(s) := R(s) + max_{a ∈ A} γ Σ_{s'} P_{sa}(s') V(s').
   }

This algorithm can be thought of as repeatedly trying to update the estimated value function using Bellman Equations (2). There are two possible ways of performing the updates in the inner loop of the algorithm. In the first, we can first compute the new values for V(s) for every state s, and then overwrite all the old values with the new values. This is called a synchronous update. In this case, the algorithm can be viewed as implementing a "Bellman backup operator" that takes a current estimate of the value function, and maps it to a new estimate. (See homework problem for details.) Alternatively, we can also perform asynchronous updates. Here, we would loop over the states (in some order), updating the values one at a time.

Under either synchronous or asynchronous updates, it can be shown that value iteration will cause V to converge to V*. Having found V*, we can then use Equation (3) to find the optimal policy.

Apart from value iteration, there is a second standard algorithm for finding an optimal policy for an MDP. The policy iteration algorithm proceeds as follows:

1. Initialize π randomly.
2. Repeat until convergence {
       (a) Let V := V^π.
       (b) For each state s, let π(s) := arg max_{a ∈ A} Σ_{s'} P_{sa}(s') V(s').
   }

Thus, the inner loop repeatedly computes the value function for the current policy, and then updates the policy using the current value function. (The policy π found in step (b) is also called the policy that is greedy with respect to V.) Note that step (a) can be done via solving Bellman's equations as described earlier, which in the case of a fixed policy is just a set of |S| linear equations in |S| variables.

After at most a finite number of iterations of this algorithm, V will converge to V*, and π will converge to π*.

Both value iteration and policy iteration are standard algorithms for solving MDPs, and there isn't currently universal agreement over which algorithm is better. For small MDPs, policy iteration is often very fast and converges with very few iterations. However, for MDPs with large state spaces, solving for V^π explicitly would involve solving a large system of linear equations, and could be difficult. In these problems, value iteration may be preferred. For this reason, in practice value iteration seems to be used more often than policy iteration.
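A compact numpy sketch of value iteration as just described (the 3-state MDP below is invented for illustration; the updates are synchronous):

```python
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)

# P[s, a, s'] = P_sa(s'): toy random transition probabilities, rows sum to 1
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)
R = np.array([0.0, 0.0, 1.0])            # state-only rewards R(s)

V = np.zeros(n_states)
while True:
    # synchronous Bellman backup: V(s) := R(s) + max_a gamma * sum_s' P_sa(s') V(s')
    V_new = R + gamma * (P @ V).max(axis=1)
    if np.abs(V_new - V).max() < 1e-8:
        break
    V = V_new

pi = (P @ V).argmax(axis=1)              # greedy policy, as in equation (3)
print(V, pi)
```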
3 Learning a model for an MDP

So far, we have discussed MDPs and algorithms for MDPs assuming that the state transition probabilities and rewards are known. In many realistic problems, we are not given state transition probabilities and rewards explicitly, but must instead estimate them from data. (Usually, S, A and γ are known.)

For example, suppose that, for the inverted pendulum problem (see problem set 4), we had a number of trials in the MDP, that proceeded as follows:

s_0^{(1)} --a_0^{(1)}--> s_1^{(1)} --a_1^{(1)}--> s_2^{(1)} --a_2^{(1)}--> s_3^{(1)} --a_3^{(1)}--> ...
s_0^{(2)} --a_0^{(2)}--> s_1^{(2)} --a_1^{(2)}--> s_2^{(2)} --a_2^{(2)}--> s_3^{(2)} --a_3^{(2)}--> ...
...

Here, s_i^{(j)} is the state we were at at time i of trial j, and a_i^{(j)} is the corresponding action that was taken from that state. In practice, each of the trials above might be run until the MDP terminates (such as if the pole falls over in the inverted pendulum problem), or it might be run for some large but finite number of timesteps.

Given this "experience" in the MDP consisting of a number of trials, we can then easily derive the maximum likelihood estimates for the state transition probabilities:

P_{sa}(s') = (# times we took action a in state s and got to s') / (# times we took action a in state s).    (4)

Or, if the ratio above is "0/0" (corresponding to the case of never having taken action a in state s before), then we might simply estimate P_{sa}(s') to be 1/|S|. (I.e., estimate P_{sa} to be the uniform distribution over all states.)

Note that, if we gain more experience (observe more trials) in the MDP, there is an efficient way to update our estimated state transition probabilities using the new experience. Specifically, if we keep around the counts for both the numerator and denominator terms of (4), then as we observe more trials, we can simply keep accumulating those counts. Computing the ratio of these counts then gives our estimate of P_{sa}.

Using a similar procedure, if R is unknown, we can also pick our estimate of the expected immediate reward R(s) in state s to be the average reward observed in state s.

Having learned a model for the MDP, we can then use either value iteration or policy iteration to solve the MDP using the estimated transition probabilities and rewards. For example, putting together model learning and value iteration, here is one possible algorithm for learning in an MDP with unknown state transition probabilities:

1. Initialize π randomly.
2. Repeat {
       (a) Execute π in the MDP for some number of trials.
       (b) Using the accumulated experience in the MDP, update our estimates for P_sa (and R, if applicable).
       (c) Apply value iteration with the estimated state transition probabilities and rewards to get a new estimated value function V.
       (d) Update π to be the greedy policy with respect to V.
   }

We note that, for this particular algorithm, there is one simple optimization that can make it run much more quickly. Specifically, in the inner loop of the algorithm where we apply value iteration, if instead of initializing value iteration with V = 0, we initialize it with the solution found during the previous iteration of our algorithm, then that will provide value iteration with a much better initial starting point and make it converge more quickly.
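A count-based sketch of the maximum likelihood estimate (4) (the trial data below is invented for illustration):

```python
import numpy as np

n_states, n_actions = 3, 2
# trials: lists of (state, action, next_state) transitions, made up for this example
trials = [[(0, 1, 1), (1, 0, 2), (2, 1, 0)],
          [(0, 1, 2), (2, 0, 2), (2, 1, 0)]]

counts = np.zeros((n_states, n_actions, n_states))
for trial in trials:
    for s, a, s_next in trial:
        counts[s, a, s_next] += 1

totals = counts.sum(axis=2, keepdims=True)
# rows with no data fall back to the uniform distribution over states, as in the notes
P_hat = np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / n_states)
print(P_hat[0, 1])   # estimated P_{s=0, a=1}(s')
```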
Lec00
• Designed to provide a broad introduction to the theories, algorithms, and applications of machine learning. Topics include:
• Supervised learning (generative/discriminative learning, support vector machines, logistic regression)
• Unsupervised learning (clustering, feature selection, nonparametric Bayesian methods)
• Learning theory (bias/variance tradeoff, model selection, VC theory)
• Probabilistic graphical models (HMM, structure learning)
• Applications
• Project proposal (Week 3/4): (10)
• Final presentation (Week 15/16): (10)
• One report for each group
Group Project
Examples:
Prj1: Disease-related gene selection: select the genes differentially expressed in DNA microarray data from patients and healthy controls.
Prj2: Review rating prediction: predict the rating scores of online hotel reviews.
Prj3: Spam filtering.
Prj4: Network intrusion detection.
Andrew Ng, cs229-notes4
CS229 Lecture notes
Andrew Ng

Part VI: Learning Theory

1 Bias/variance tradeoff

When talking about linear regression, we discussed the problem of whether to fit a "simple" model such as the linear "y = θ₀ + θ₁x," or a more "complex" model such as the polynomial "y = θ₀ + θ₁x + ··· + θ₅x⁵." We saw the following:

[Figure: the same dataset fit with a linear model (left), a quadratic model (middle), and a 5th-order polynomial (right).]

Fitting a 5th order polynomial to the data (rightmost figure) did not result in a good model. Specifically, even though the 5th order polynomial did a very good job predicting y (say, prices of houses) from x (say, living area) for the examples in the training set, we do not expect the model shown to be a good one for predicting the prices of houses not in the training set. In other words, what has been learned from the training set does not generalize well to other houses. The generalization error (which will be made formal shortly) of a hypothesis is its expected error on examples not necessarily in the training set.

Both the models in the leftmost and the rightmost figures above have large generalization error. However, the problems that the two models suffer from are very different. If the relationship between y and x is not linear, then even if we were fitting a linear model to a very large amount of training data, the linear model would still fail to accurately capture the structure in the data. Informally, we define the bias of a model to be the expected generalization error even if we were to fit it to a very (say, infinitely) large training set. Thus, for the problem above, the linear model suffers from large bias, and may underfit (i.e., fail to capture structure exhibited by) the data.

Apart from bias, there's a second component to the generalization error, consisting of the variance of a model fitting procedure. Specifically, when fitting a 5th order polynomial as in the rightmost figure, there is a large risk that we're fitting patterns in the data that happened to be present in our small, finite training set, but that do not reflect the wider pattern of the relationship between x and y. This could be, say, because in the training set we just happened by chance to get a slightly more-expensive-than-average house here, and a slightly less-expensive-than-average house there, and so on. By fitting these "spurious" patterns in the training set, we might again obtain a model with large generalization error. In this case, we say the model has large variance.¹ (¹In these notes, we will not try to formalize the definitions of bias and variance beyond this discussion. While bias and variance are straightforward to define formally for, e.g., linear regression, there have been several proposals for the definitions of bias and variance for classification, and there is as yet no agreement on what is the "right" and/or the most useful formalism.)

Often, there is a tradeoff between bias and variance. If our model is too "simple" and has very few parameters, then it may have large bias (but small variance); if it is too "complex" and has very many parameters, then it may suffer from large variance (but have smaller bias). In the example above, fitting a quadratic function does better than either of the extremes of a first or a fifth order polynomial.

2 Preliminaries

In this set of notes, we begin our foray into learning theory. Apart from being interesting and enlightening in its own right, this discussion will also help us hone our intuitions and derive rules of thumb about how to best apply learning algorithms in different settings. We will also seek to answer a few questions: First, can we make formal the bias/variance tradeoff that was just discussed? This will also eventually lead us to talk about model selection methods, which can, for instance, automatically decide what order polynomial to fit to a training set. Second, in machine learning it's really generalization error that we care about, but most learning algorithms fit their models to the training set. Why should doing well on the training set tell us anything about generalization error? Specifically, can we relate error on the training set to generalization error? Third and finally, are there conditions under which we can actually prove that learning algorithms will work well?

We start with two simple but very useful lemmas.

Lemma. (The union bound). Let A₁, A₂, ..., A_k be k different events (that may not be independent). Then

P(A₁ ∪ ··· ∪ A_k) ≤ P(A₁) + ... + P(A_k).

In probability theory, the union bound is usually stated as an axiom (and thus we won't try to prove it), but it also makes intuitive sense: The probability of any one of k events happening is at most the sum of the probabilities of the k different events.

Lemma. (Hoeffding inequality) Let Z₁, ..., Z_m be m independent and identically distributed (iid) random variables drawn from a Bernoulli(φ) distribution. I.e., P(Z_i = 1) = φ, and P(Z_i = 0) = 1 − φ. Let φ̂ = (1/m) Σ_{i=1}^m Z_i be the mean of these random variables, and let any γ > 0 be fixed. Then

P(|φ − φ̂| > γ) ≤ 2 exp(−2γ²m).

This lemma (which in learning theory is also called the Chernoff bound) says that if we take φ̂ (the average of m Bernoulli(φ) random variables) to be our estimate of φ, then the probability of our being far from the true value is small, so long as m is large. Another way of saying this is that if you have a biased coin whose chance of landing on heads is φ, then if you toss it m times and calculate the fraction of times that it came up heads, that will be a good estimate of φ with high probability (if m is large).
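A quick numerical illustration of the lemma (my own sketch, not part of the notes): estimate φ from m coin flips many times and compare the empirical deviation frequency with the bound 2 exp(−2γ²m).

```python
import numpy as np

rng = np.random.default_rng(0)
phi, m, gamma, trials = 0.3, 500, 0.05, 20000

flips = rng.random((trials, m)) < phi          # Bernoulli(phi) samples
phi_hat = flips.mean(axis=1)                   # estimate of phi in each trial
empirical = (np.abs(phi_hat - phi) > gamma).mean()
bound = 2 * np.exp(-2 * gamma**2 * m)

print(empirical, bound)                        # empirical frequency <= Hoeffding bound
```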
Using just these two lemmas, we will be able to prove some of the deepest and most important results in learning theory.

To simplify our exposition, let's restrict our attention to binary classification in which the labels are y ∈ {0, 1}. Everything we'll say here generalizes to other problems, including regression and multi-class classification.

We assume we are given a training set S = {(x^{(i)}, y^{(i)}); i = 1, ..., m} of size m, where the training examples (x^{(i)}, y^{(i)}) are drawn iid from some probability distribution D. For a hypothesis h, we define the training error (also called the empirical risk or empirical error in learning theory) to be

ε̂(h) = (1/m) Σ_{i=1}^m 1{h(x^{(i)}) ≠ y^{(i)}}.

This is just the fraction of training examples that h misclassifies. When we want to make explicit the dependence of ε̂(h) on the training set S, we may also write this as ε̂_S(h). We also define the generalization error to be

ε(h) = P_{(x,y)∼D}(h(x) ≠ y).

I.e., this is the probability that, if we now draw a new example (x, y) from the distribution D, h will misclassify it.

Note that we have assumed that the training data was drawn from the same distribution D with which we're going to evaluate our hypotheses (in the definition of generalization error). This is sometimes also referred to as one of the PAC assumptions.² (²PAC stands for "probably approximately correct," which is a framework and set of assumptions under which numerous results on learning theory were proved. Of these, the assumption of training and testing on the same distribution, and the assumption of the independently drawn training examples, were the most important.)

Consider the setting of linear classification, and let h_θ(x) = 1{θᵀx ≥ 0}. What's a reasonable way of fitting the parameters θ? One approach is to try to minimize the training error, and pick

θ̂ = arg min_θ ε̂(h_θ).

We call this process empirical risk minimization (ERM), and the resulting hypothesis output by the learning algorithm is ĥ = h_θ̂. We think of ERM as the most "basic" learning algorithm, and it will be this algorithm that we focus on in these notes. (Algorithms such as logistic regression can also be viewed as approximations to empirical risk minimization.)

In our study of learning theory, it will be useful to abstract away from the specific parameterization of hypotheses and from issues such as whether we're using a linear classifier. We define the hypothesis class H used by a learning algorithm to be the set of all classifiers considered by it. For linear classification, H = {h_θ : h_θ(x) = 1{θᵀx ≥ 0}, θ ∈ ℝ^{n+1}} is thus the set of all classifiers over X (the domain of the inputs) where the decision boundary is linear. More broadly, if we were studying, say, neural networks, then we could let H be the set of all classifiers representable by some neural network architecture.

Empirical risk minimization can now be thought of as a minimization over the class of functions H, in which the learning algorithm picks the hypothesis:

ĥ = arg min_{h ∈ H} ε̂(h)

3 The case of finite H

Let's start by considering a learning problem in which we have a finite hypothesis class H = {h₁, ..., h_k} consisting of k hypotheses. Thus, H is just a set of k functions mapping from X to {0, 1}, and empirical risk minimization selects ĥ to be whichever of these k functions has the smallest training error.

We would like to give guarantees on the generalization error of ĥ. Our strategy for doing so will be in two parts: First, we will show that ε̂(h) is a reliable estimate of ε(h) for all h. Second, we will show that this implies an upper-bound on the generalization error of ĥ.

Take any one, fixed, h_i ∈ H. Consider a Bernoulli random variable Z whose distribution is defined as follows. We're going to sample (x, y) ∼ D. Then, we set Z = 1{h_i(x) ≠ y}. I.e., we're going to draw one example, and let Z indicate whether h_i misclassifies it. Similarly, we also define Z_j = 1{h_i(x^{(j)}) ≠ y^{(j)}}. Since our training set was drawn iid from D, Z and the Z_j's have the same distribution.

We see that the misclassification probability on a randomly drawn example, that is, ε(h_i), is exactly the expected value of Z (and Z_j). Moreover, the training error can be written

ε̂(h_i) = (1/m) Σ_{j=1}^m Z_j.

Thus, ε̂(h_i) is exactly the mean of the m random variables Z_j that are drawn iid from a Bernoulli distribution with mean ε(h_i). Hence, we can apply the Hoeffding inequality, and obtain

P(|ε(h_i) − ε̂(h_i)| > γ) ≤ 2 exp(−2γ²m).

This shows that, for our particular h_i, training error will be close to generalization error with high probability, assuming m is large. But we don't just want to guarantee that ε(h_i) will be close to ε̂(h_i) (with high probability) for just one particular h_i. We want to prove that this will be true simultaneously for all h ∈ H. To do so, let A_i denote the event that |ε(h_i) − ε̂(h_i)| > γ. We've already shown that, for any particular A_i, it holds true that P(A_i) ≤ 2 exp(−2γ²m). Thus, using the union bound, we have that

P(∃h ∈ H. |ε(h_i) − ε̂(h_i)| > γ) = P(A₁ ∪ ··· ∪ A_k)
  ≤ Σ_{i=1}^k P(A_i)
  ≤ Σ_{i=1}^k 2 exp(−2γ²m)
  = 2k exp(−2γ²m)

If we subtract both sides from 1, we find that

P(¬∃h ∈ H. |ε(h_i) − ε̂(h_i)| > γ) = P(∀h ∈ H. |ε(h_i) − ε̂(h_i)| ≤ γ)
  ≥ 1 − 2k exp(−2γ²m)

(The "¬" symbol means "not.") So, with probability at least 1 − 2k exp(−2γ²m), we have that ε(h) will be within γ of ε̂(h) for all h ∈ H. This is called a uniform convergence result, because this is a bound that holds simultaneously for all (as opposed to just one) h ∈ H.

In the discussion above, what we did was, for particular values of m and γ, give a bound on the probability that, for some h ∈ H, |ε(h) − ε̂(h)| > γ. There are three quantities of interest here: m, γ, and the probability of error; we can bound either one in terms of the other two.

For instance, we can ask the following question: Given γ and some δ > 0, how large must m be before we can guarantee that with probability at least 1 − δ, training error will be within γ of generalization error? By setting δ = 2k exp(−2γ²m) and solving for m, [you should convince yourself this is the right thing to do!], we find that if

m ≥ (1/(2γ²)) log(2k/δ),

then with probability at least 1 − δ, we have that |ε(h) − ε̂(h)| ≤ γ for all h ∈ H. (Equivalently, this shows that the probability that |ε(h) − ε̂(h)| > γ for some h ∈ H is at most δ.) This bound tells us how many training examples we need in order to make a guarantee. The training set size m that a certain method or algorithm requires in order to achieve a certain level of performance is also called the algorithm's sample complexity.

The key property of the bound above is that the number of training examples needed to make this guarantee is only logarithmic in k, the number of hypotheses in H. This will be important later.
Better and more general arguments exist,but this will be useful for honing our intuitions about the domain.Suppose we have an H that is parameterized by d real numbers.Since we are using a computer to represent real numbers,and IEEE double-precision floating point(double’s in C)uses64bits to represent afloating point num-ber,this means that our learning algorithm,assuming we’re using double-precisionfloating point,is parameterized by64d bits.Thus,our hypothesis class really consists of at most k=264d different hypotheses.From the Corol-lary at the end of the previous section,we thereforefind that,to guarantee ε(ˆh)≤ε(h∗)+2γ,with to hold with probability at least1−δ,it suffices that m≥O 1γ2log264dδ =O dγ2log1δ =Oγ,δ(d).(Theγ,δsubscripts are to indicate that the last big-O is hiding constants that may depend onγand δ.)Thus,the number of training examples needed is at most linear in the parameters of the model.The fact that we relied on64-bitfloating point makes this argument not entirely satisfying,but the conclusion is nonetheless roughly correct:If what we’re going to do is try to minimize training error,then in order to learn9“well”using a hypothesis class that has d parameters,generally we’re goingto need on the order of a linear number of training examples in d.(At this point,it’s worth noting that these results were proved for an al-gorithm that uses empirical risk minimization.Thus,while the linear depen-dence of sample complexity on d does generally hold for most discriminativelearning algorithms that try to minimize training error or some approxima-tion to training error,these conclusions do not always apply as readily todiscriminative learning algorithms.Giving good theoretical guarantees onmany non-ERM learning algorithms is still an area of active research.)The other part of our previous argument that’s slightly unsatisfying isthat it relies on the parameterization of H.Intuitively,this doesn’t seem likeit should matter:We had written the class of linear classifiers as hθ(x)=1{θ0+θ1x1+···θn x n≥0},with n+1parametersθ0,...,θn.But it could also be written h u,v(x)=1{(u20−v20)+(u21−v21)x1+···(u2n−v2n)x n≥0} with2n+2parameters u i,v i.Yet,both of these are just defining the sameH:The set of linear classifiers in n dimensions.To derive a more satisfying argument,lets define a few more things.Given a set S={x(i),...,x(d)}(no relation to the training set)of points x(i)∈X,we say that H shatters S if H can realize any labeling on S.I.e.,if for any set of labels{y(1),...,y(d)},there exists some h∈H so that h(x(i))=y(i)for all i=1,...d.Given a hypothesis class H,we then define its Vapnik-Chervonenkisdimension,written VC(H),to be the size of the largest set that is shatteredby H.(If H can shatter arbitrarily large sets,then VC(H)=∞.)For instance,consider the following set of three points:x2x1Can the set H of linear classifiers in two dimensions(h(x)=1{θ0+θ1x1+θ2x2≥0})can shatter the set above?The answer is yes.Specifically,we10see that,for any of the eight possible labelings of these points,we can find a linear classifier that obtains “zero training error”on them:x x 12x x 12x x 12x x 12x x 12x x 12x x 12x x 12Moreover,it is possible to show that there is no set of 4points that this hypothesis class can shatter.Thus,the largest set that H can shatter is of size 3,and hence VC(H )=3.Note that the VC dimension of H here is 3even though there may be sets of size 3that it cannot shatter.For instance,if we had a set of three points lying in a straight line (left figure),then there is no way to 
find a linear separator for the labeling of the three points shown below (right figure):x x 12x x 12In order words,under the definition of the VC dimension,in order to prove that VC(H )is at least d ,we need to show only that there’s at least one set of size d that H can shatter.The following theorem,due to Vapnik,can then be shown.(This is,many would argue,the most important theorem in all of learning theory.)11 Theorem.Let H be given,and let d=VC(H).Then with probability at least1−δ,we have that for all h∈H,|ε(h)−ˆε(h)|≤O d m log m d+1m log1δ .Thus,with probability at least1−δ,we also have that:ε(ˆh)≤ε(h∗)+O d m log m d+1m log1δ .In other words,if a hypothesis class hasfinite VC dimension,then uniform convergence occurs as m becomes large.As before,this allows us to give a bound onε(h)in terms ofε(h∗).We also have the following corollary: Corollary.For|ε(h)−ˆε(h)|≤γto hold for all h∈H(and henceε(ˆh)≤ε(h∗)+2γ)with probability at least1−δ,it suffices that m=Oγ,δ(d).In other words,the number of training examples needed to learn“well”using H is linear in the VC dimension of H.It turns out that,for“most”hypothesis classes,the VC dimension(assuming a“reasonable”parameter-ization)is also roughly linear in the number of parameters.Putting these together,we conclude that(for an algorithm that tries to minimize training error)the number of training examples needed is usually roughly linear in the number of parameters of H.。
The influence of residual black liquor in pulp on the pollution load of bleaching effluent

Abstract: In recent years, China's discharge standards for papermaking wastewater have become increasingly strict; in particular, the wastewater pollution caused by the chlorine-containing bleaching agents still used by most pulp mills has become a prominent problem and has drawn wide public attention. Regarding the origin of the COD_Cr, BOD5, and AOX generated during pulp bleaching, the residual solid-phase lignin in unbleached pulp was previously considered the main source. In fact, because of pulp characteristics, extraction processes, and equipment limitations, black liquor cannot be completely extracted from the solid-phase fiber; the unseparated black-liquor lignin remains in the fiber, undergoes bleaching together with it, and the organic matter formed by its further degradation ends up in the bleaching effluent, with serious consequences for pollution control and treatment of the wastewater.

Using conventional chemical pulp as the raw material, the cooking black liquor was characterized. The results show that the black liquor is deeply colored, its BOD5/COD_Cr ratio is 0.18, it contains large amounts of organic matter that inhibits microbial growth, and its organic composition is very complex, with aromatic compounds accounting for 76.84% of the relative content, making them the main pollutants of cooking black liquor.

Conventional hypochlorite bleaching was used to study the effects of effective chlorine dosage, time, temperature, and pulp consistency on pulp properties. Building on single-factor experiments, response surface methodology was applied to optimize the bleaching conditions; the optimum conditions were an effective chlorine dosage of 6%, a temperature of 52 °C, and a time of 25 min, giving a bleached pulp brightness of 48.0% ISO, a viscosity of 347.2 mL·g⁻¹, and a bleaching loss of 4.10%.

Taking the black liquor remaining after pulp washing as the object of study, the contribution of liquid-phase organics to the effluent pollution load (COD_Cr, BOD5) during bleaching was investigated for pulps containing different amounts of residual black liquor, i.e., different degrees of washing. When thoroughly washed pulp was bleached at different effective chlorine dosages, the linear relationships between bleaching loss and effluent COD_Cr and BOD5 were highly significant, and the biodegradability of the effluent was poor. For both thoroughly washed pulp and unwashed pulp containing residual black liquor, increasing the effective chlorine dosage gradually increased the post-bleaching brightness, decreased the kappa number, and increased the removal of lignin from the solid-phase fiber; the kappa number and brightness of the pulp showed a certain linear relationship.

Residual black liquor in unwashed pulp strongly affects the effluent pollution load: under the same bleaching conditions, the more residual black liquor in the pulp, the greater its contribution to the total COD_Cr and BOD5 of the effluent after bleaching. The COD_Cr and BOD5 produced by residual black liquor can account for more than 34% and 22% of the totals respectively, with contributions as high as 90%, indicating that residual black liquor in pulp is one of the main pollution sources in the bleaching process.
SCI journal latest partition table (JCR / Chinese Academy of Sciences), 2015 edition
[Table: the 2015 JCR / Chinese Academy of Sciences journal partition list. The extracted columns (sequence number, ISSN, abbreviated journal title, and partition 1-3, with partition thresholds) are interleaved beyond reliable reconstruction; the list covers journals in nanotechnology, materials science, energy, food science, and computer/electrical engineering, e.g. NAT MATER, NAT NANOTECHNOL, NAT BIOTECHNOL, ADV MATER, NANO LETT, ACS NANO, IEEE T PATTERN ANAL.]
Standard English translations of 328 university course titles (economics, accounting, finance, management, and related majors)
41 高级财务会计 Advanced Financial Accounting
42 高级会计 Advanced Accounting
43 高级英语技能 Advanced English Skills
44 高级英语阅读 Advanced English Reading
45 高级语言程序设计 High-Level Language Programming
46 工程经济学与管理会计 Engineering Economics and Management Accounting
47 工程造价管理 Project Cost Management
48 工程制图 Engineering Drawing
49 工业会计 Industrial Accounting
50 工业企业管理 Industrial Enterprise Management
51 工业企业经济活动分析 Analysis of Economic Activities in Industrial Enterprises
52 工业行业技术评估概论 Introduction to Industrial Technology Assessment
53 公共关系 Public Relations
54 公共金融学 Public Finance
55 公关礼仪 Public Relations Etiquette

比较管理学 Comparative Management
比较审计学 Comparative Auditing
毕业实习 Graduation Practice
财务报表分析 Financial Statements Analysis
财务报告环境 Financial Reporting Environment
财务分析 Financial Analysis
财务风险控制 Management of Financial Risks
财务管理 Financial Management
财务会计 Financial Accounting
财务造假对策 Countermeasure for Financial Fraudulence
财政概论 Introduction to Fiscal Science
财政学 Fiscal Science
财政与信贷 Finance and Credit
成本会计 Cost Accounting
程序设计语言 Programme Design Language
促销与广告 Promotion and Advertising
大学生活导论 Guidance of Campus Life
大学英语 College English
建筑工程概论 Introduction to Construction Engineering
建筑项目预算 Construction Project Budgeting
金工实习 Metalworking Practice
金融管理软件 Financial Management Software
金融市场 Financial Markets
金融市场与投资 Financial Markets and Investment
金融学 Finance
金融战略 Financial Strategy
进出口业务 Import and Export Business
经济地理 Economic Geography
经济法概论 Introduction to Economic Law
经济数学基础 Fundamentals of Economic Mathematics
经济文献检索 Economic Literature Retrieval
经济文写作 Economic Writing
经济效益审计 Economic Benefit Auditing
A comprehensive list of SCI journals in information fields such as computer science, electronics, and communications
计算机类SCI分区数据WoS四区(cs.whu)序号刊名1 AEU-INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATIONS2 JOURNAL OF COMMUNICATIONS AND NETWORKS3 International Journal of Network Management4 ETRI JOURNAL5 INTERNATIONAL JOURNAL OF SATELLITE COMMUNICATIONS AND NETWORKING6 Journal of Web Semantics7 R Journal8 Security and Communication Networks9 INTERNATIONAL JOURNAL OF COMMUNICATION SYSTEMS10 QUEUEING SYSTEMS11 INFORMATICA12 Frontiers of Computer Science13 IET Information Security14 Ad Hoc & Sensor Wireless Networks15 JOURNAL OF COMPUTATIONAL BIOLOGY16 Journal on Multimodal User Interfaces17 INFORMATION TECHNOLOGY AND LIBRARIES18 MICROPROCESSORS AND MICROSYSTEMS19 ENGINEERING COMPUTATIONS20 ACM Transactions on Modeling and Computer Simulation21 ACTA INFORMATICA22 CONCURRENT ENGINEERING-RESEARCH AND APPLICATIONS23 INTEGRATION-THE VLSI JOURNAL24 INTERNATIONAL JOURNAL OF COOPERATIVE INFORMATION SYSTEMS25 INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE26 International Journal of Microwave and Wireless Technologies27 COMPUTATIONAL INTELLIGENCE28 JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY29 WIRELESS PERSONAL COMMUNICATIONS30 JOURNAL OF VISUALIZATION31 Radioengineering32 IEEE TECHNOLOGY AND SOCIETY MAGAZINE33 ADVANCED ROBOTICS34 IETE JOURNAL OF RESEARCH35 Semiconductors and Semimetals36 China Communications37 International Journal on Document Analysis and Recognition38 International Journal of Humanoid Robotics39 TRANSPORTATION JOURNAL40 Journal of Signal Processing Systems for Signal Image and Video Technology41 AI EDAM-ARTIFICIAL INTELLIGENCE FOR ENGINEERING DESIGN ANALYSIS AND MANUFAC42 IET Computer Vision43 Journal of the Society for Information Display44 Intelligent Service Robotics45 SIGMOD RECORD46 CONNECTION SCIENCE47 INDUSTRIAL ROBOT-AN INTERNATIONAL JOURNAL48 Elektronika Ir Elektrotechnika49 ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS50 JOURNAL OF ELECTROMAGNETIC WAVES AND APPLICATIONS51 Mobile Information Systems52 Journal of Applied Logic53 Computer Science and Information Systems54 IEICE TRANSACTIONS ON COMMUNICATIONS55 Statistical Analysis and Data Mining56 Computers and Concrete57 AI MAGAZINE58 KYBERNETES59 Journal of Ambient Intelligence and Smart Environments60 ANNALS OF MATHEMATICS AND ARTIFICIAL INTELLIGENCE61 Advances in Mathematics of Communications62 Information Retrieval Journal63 Advances in Computers64 Research in Transportation Economics65 International Journal on Artificial Intelligence Tools66 Natural Computing67 MODELING IDENTIFICATION AND CONTROL68 Intelligent Data Analysis69 Journal of Simulation70 IEEE AEROSPACE AND ELECTRONIC SYSTEMS MAGAZINE71 Journal of Zhejiang University-SCIENCE C-Computers & Electronics72 Computational and Mathematical Organization Theory73 INTERNATIONAL JOURNAL OF APPLIED ELECTROMAGNETICS AND MECHANICS74 INTERNATIONAL JOURNAL OF QUANTUM INFORMATION75 ENGINEERING WITH COMPUTERS76 Journal of Organizational and End User Computing77 New Review of Hypermedia and Multimedia78 JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION79 JOURNAL OF ELECTRONIC IMAGING80 Biologically Inspired Cognitive Architectures81 PRESENCE-TELEOPERATORS AND VIRTUAL ENVIRONMENTS82 INFORMATION PROCESSING LETTERS83 INTERNATIONAL JOURNAL OF RF AND MICROWAVE COMPUTER-AIDED ENGINEERING84 JOURNAL OF COMPUTATIONAL ACOUSTICS85 Language Resources and Evaluation86 MICROELECTRONICS INTERNATIONAL87 ALGORITHMICA88 IET Software89 Current Computer-Aided Drug Design90 MICROWAVE AND OPTICAL TECHNOLOGY LETTERS91 MATHEMATICAL STRUCTURES IN COMPUTER SCIENCE92 
INTERNATIONAL JOURNAL OF ELECTRONICS93 CIRCUIT WORLD94 International Journal of Data Warehousing and Mining95 DISCRETE & COMPUTATIONAL GEOMETRY96 International Arab Journal of Information Technology97 DISCRETE MATHEMATICS AND THEORETICAL COMPUTER SCIENCE98 SIMULATION-TRANSACTIONS OF THE SOCIETY FOR MODELING AND SIMULATION INTERNAT99 INTERNATIONAL JOURNAL OF GAME THEORY100 COMPUTER JOURNAL101 DISCRETE DYNAMICS IN NATURE AND SOCIETY102 Journal of Public Transportation103 Transportation Letters-The International Journal of Transportation Research 104 International Journal of Ad Hoc and Ubiquitous Computing105 THEORETICAL COMPUTER SCIENCE106 JOURNAL OF UNIVERSAL COMPUTER SCIENCE107 Journal of Cellular Automata108 COMPUTER APPLICATIONS IN ENGINEERING EDUCATION109 Journal of Logical and Algebraic Methods in Programming110 ENVIRONMENTAL AND ECOLOGICAL STATISTICS111 FUNDAMENTA INFORMATICAE112 Translator113 JOURNAL OF FUNCTIONAL PROGRAMMING114 JOURNAL OF COMPUTER INFORMATION SYSTEMS115 INTERNATIONAL JOURNAL OF ROBOTICS & AUTOMATION116 CMES-COMPUTER MODELING IN ENGINEERING & SCIENCES117 APPLICABLE ALGEBRA IN ENGINEERING COMMUNICATION AND COMPUTING118 International Journal of Web Services Research119 Logical Methods in Computer Science120 NEW GENERATION COMPUTING121 AI COMMUNICATIONS122 APPLIED ARTIFICIAL INTELLIGENCE123 ANNALS OF PURE AND APPLIED LOGIC124 JOURNAL OF ELECTRONIC TESTING-THEORY AND APPLICATIONS125 RENDICONTI DEL SEMINARIO MATEMATICO DELLA UNIVERSITA DI PADOVA126 THEORY OF COMPUTING SYSTEMS127 INTELLIGENT AUTOMATION AND SOFT COMPUTING128 Advances in Applied Clifford Algebras129 ZEITSCHRIFT FUR ANALYSIS UND IHRE ANWENDUNGEN130 Journal of Nonlinear and Convex Analysis131 JOURNAL OF COMPUTATIONAL MATHEMATICS132 Asian Journal of Communication133 International Journal of Sensor Networks134 Analysis and Mathematical Physics135 IEEE Latin America Transactions136 VIRTUAL REALITY137 Scientific Programming138 Journal of Noncommutative Geometry139 ANALOG INTEGRATED CIRCUITS AND SIGNAL PROCESSING140 Frontiers of Information Technology & Electronic Engineering141 INTERNATIONAL JOURNAL OF NUMERICAL MODELLING-ELECTRONIC NETWORKS DEVICES AN 142 International Communication Gazette143 European Journal of Transport and Infrastructure Research144 Applications of Mathematics145 International Journal of Shipping and Transport Logistics146 JOURNAL OF COMPUTATIONAL ANALYSIS AND APPLICATIONS147 Complex Analysis and Operator Theory148 Journal of Computational and Theoretical Transport149 Malaysian Journal of Computer Science150 DYNAMICAL SYSTEMS-AN INTERNATIONAL JOURNAL151 JOURNAL OF MATHEMATICAL ECONOMICS152 RUSSIAN JOURNAL OF NUMERICAL ANALYSIS AND MATHEMATICAL MODELLING153 Advances in Electrical and Computer Engineering154 Problems of Information Transmission155 Journal of Web Engineering156 JOURNAL OF ORGANIZATIONAL COMPUTING AND ELECTRONIC COMMERCE157 Turkish Journal of Electrical Engineering and Computer Sciences158 JOURNAL OF MICROWAVE POWER AND ELECTROMAGNETIC ENERGY159 LOGIC JOURNAL OF THE IGPL160 STOCHASTIC ANALYSIS AND APPLICATIONS161 ELECTRICAL ENGINEERING162 JOURNAL OF MATHEMATICAL SOCIOLOGY163 Differential and Integral Equations164 ACM Transactions on Asian and Low-Resource Language Information Processing 165 Chinese Journal of Communication166 NETWORK-COMPUTATION IN NEURAL SYSTEMS167 Iranian Journal of Fuzzy Systems168 RAIRO-THEORETICAL INFORMATICS AND APPLICATIONS169 Journal of Systems Science & Complexity170 PROGRAM-ELECTRONIC LIBRARY AND INFORMATION SYSTEMS171 COMPUTATIONAL GEOMETRY-THEORY AND 
APPLICATIONS172 INFINITE DIMENSIONAL ANALYSIS QUANTUM PROBABILITY AND RELATED TOPICS173 Journal of Logic Language and Information174 Annals of Combinatorics175 ELECTRONIC JOURNAL OF COMBINATORICS176 Pacific Journal of Optimization177 Mathematical Control and Related Fields178 Journal of Pseudo-Differential Operators and Applications179 Journal of Systems Engineering and Electronics180 Journal of Electrical Engineering & Technology181 IEEJ Transactions on Electrical and Electronic Engineering182 DESIGN AUTOMATION FOR EMBEDDED SYSTEMS183 ELECTROMAGNETICS184 IET Computers and Digital Techniques185 Journal of Semiconductor Technology and Science186 MINDS AND MACHINES187 CHINESE JOURNAL OF ELECTRONICS188 Econometrics Journal189 Numerical Mathematics-Theory Methods and Applications190 IEICE TRANSACTIONS ON ELECTRONICS191 ACM Journal on Computing and Cultural Heritage192 Journal of Grey System193 Revista Iberoamericana de Automatica e Informatica Industrial194 DIFFERENTIAL GEOMETRY AND ITS APPLICATIONS195 Journal of Nanoelectronics and Optoelectronics196 JOURNAL OF ALGEBRA AND ITS APPLICATIONS197 COMPUTING AND INFORMATICS198 COMPEL-THE INTERNATIONAL JOURNAL FOR COMPUTATION AND MATHEMATICS IN ELECTRI 199 Homology Homotopy and Applications200 JAPAN JOURNAL OF INDUSTRIAL AND APPLIED MATHEMATICS201 JOURNAL OF COMPUTER AND SYSTEMS SCIENCES INTERNATIONAL202 Social Semiotics203 Journal of Electrical Engineering-Elektrotechnicky Casopis204 JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS205 INFORMACIJE MIDEM-JOURNAL OF MICROELECTRONICS ELECTRONIC COMPONENTS AND MAT 206 Annals of Functional Analysis207 Information Technology and Control208 Discrete Optimization209 Continuum-Journal of Media & Cultural Studies210 JOURNAL OF INFORMATION SCIENCE AND ENGINEERING211 Journal of Transportation Safety & Security212 Revista de la Union Matematica Argentina213 International Journal of Wavelets Multiresolution and Information Processin 214 FREQUENZ215 Fixed Point Theory216 JOURNAL OF DATABASE MANAGEMENT217 QUARTERLY JOURNAL OF SPEECH218 INTEGRATED FERROELECTRICS219 Milan Journal of Mathematics220 IEICE Electronics Express221 Computational Methods and Function Theory222 KSII Transactions on Internet and Information Systems223 Journal of Function Spaces224 FUNCTIONAL ANALYSIS AND ITS APPLICATIONS225 Communication Culture & Critique226 Text & Talk227 ACTA MATHEMATICA SINICA-ENGLISH SERIES228 JOURNAL OF COMMUNICATIONS TECHNOLOGY AND ELECTRONICS229 APPLIED COMPUTATIONAL ELECTROMAGNETICS SOCIETY JOURNAL230 Statistics and Its Interface231 COMPUTATIONAL COMPLEXITY232 JOURNAL OF THE KOREAN MATHEMATICAL SOCIETY233 MATHEMATICAL AND COMPUTER MODELLING OF DYNAMICAL SYSTEMS234 Proceedings of the Steklov Institute of Mathematics235 Journal of Mathematics and Music236 Revista Internacional de Metodos Numericos para Calculo y Diseno en Ingenie 237 East Asian Journal on Applied Mathematics238 NATURAL RESOURCE MODELING239 COMPUTER ANIMATION AND VIRTUAL WORLDS240 MATHEMATICAL SOCIAL SCIENCES241 Analele Stiintifice ale Universitatii Ovidius Constanta-Seria Matematica 242 Journal of Mass Media Ethics243 Theory and Applications of Categories244 Mathematics and Financial Economics245 Periodica Mathematica Hungarica246 JAVNOST-THE PUBLIC247 IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS248 COMPUTER MUSIC JOURNAL249 Journal of Numerical Mathematics250 Funkcialaj Ekvacioj-Serio Internacia251 Neural Network World252 INTERNATIONAL JOURNAL OF FOUNDATIONS OF COMPUTER SCIENCE253 Automatika254 KYBERNETIKA255 TOPOLOGY AND ITS APPLICATIONS256 INTERNATIONAL 
JOURNAL OF ELECTRICAL ENGINEERING EDUCATION
257 JOURNAL OF MULTIPLE-VALUED LOGIC AND SOFT COMPUTING
258 Romanian Journal of Information Science and Technology
259 PMM JOURNAL OF APPLIED MATHEMATICS AND MECHANICS
260 COMPUTER SYSTEMS SCIENCE AND ENGINEERING
261 Media International Australia
262 ALGEBRA COLLOQUIUM
263 CMC-Computers Materials & Continua
264 ACM SIGPLAN NOTICES
265 Advances in Difference Equations
266 Iranian Journal of Science and Technology-Transactions of Electrical Engineering
267 Rhetoric Society Quarterly
268 Glasnik Matematicki
269 NARRATIVE INQUIRY
270 Mathematical Communications
271 ARCHIVE FOR HISTORY OF EXACT SCIENCES
272 JOURNAL OF APPLIED COMMUNICATION RESEARCH
273 Bollettino di Storia delle Scienze Matematiche
274 Economic Computation and Economic Cybernetics Studies and Research
275 INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING
276 DYNAMIC SYSTEMS AND APPLICATIONS
277 Mathematical Population Studies
278 University Politehnica of Bucharest Scientific Bulletin-Series A-Applied Mathematics
279 IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES
280 UTILITAS MATHEMATICA
281 HISTORIA MATHEMATICA
282 MICROWAVE JOURNAL
283 CRYPTOLOGIA
284 Applied Mathematics-A Journal of Chinese Universities Series B
285 Acta Mathematicae Applicatae Sinica-English Series
286 PROGRAMMING AND COMPUTER SOFTWARE
287 Ukrainian Mathematical Journal
288 International Journal of Transport Economics
289 JOURNAL OF MEDIA ECONOMICS
290 Electronics and Communications in Japan
291 FUJITSU SCIENTIFIC & TECHNICAL JOURNAL
292 INFOR
293 ELECTRICAL ENGINEERING IN JAPAN
294 African Journalism Studies
295 Tijdschrift voor Communicatiewetenschap
296 Journal of African Media Studies
297 ICGA JOURNAL
298 Pure and Applied Mathematics Quarterly
299 Light & Engineering
300 EPE Journal
301 SOLID STATE TECHNOLOGY
302 Journal of the Institute of Telecommunications Professionals
303 Traitement du Signal
304 ELECTRONICS WORLD
305 Road & Transport Research
306 IEEE Transactions on Cognitive and Developmental Systems
[The remaining columns of this table, citation count (引用次数), impact factor (影响因子), and eigenfactor score (eigenFactorScore), were flattened into an undifferentiated block of numbers during extraction and can no longer be matched to individual journals; they are omitted here.]
Research on the finite-time consensus problem for nonholonomic multi-agent systems based on a second-order dynamic model
In recent years, the consensus problem for multi-agent systems has attracted more and more attention, mainly because multi-agent systems have been widely applied in many fields, for example formation control and flocking. The multi-agent consensus problem means that, under the designed control algorithm, all of the states of the system execute an appropriate consensus rule and finally reach a common state value. Multi-agent systems can generally be divided ...
Finite-time consensus algorithm for multi-agent systems with double-integrator dynamics
WEI Jian-long, QIU Zhi-hui
(School of Electrical Engineering and Automation, Tianjin University, Tianjin 300072, China)
... anti-disturbance ability. Keywords: finite-time consensus; multi-agent system; graph theory
CLC number: TP212; Document code: A; doi: 10.3969/j.issn.1673-095X.2013.03.001
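To make the consensus notion above concrete, here is a minimal simulation sketch. It is not the algorithm of this paper: the protocol (a standard linear double-integrator consensus law), the ring topology, the gain, the step size, and the initial states are all illustrative assumptions.

```python
import numpy as np

# Undirected ring of 4 agents (assumed topology, not from the paper).
A = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])

x = np.array([1.0, -2.0, 3.0, 0.5])   # positions
v = np.array([0.0, 1.0, -1.0, 0.2])   # velocities
gamma, dt = 1.5, 0.01                  # velocity-coupling gain, Euler step

for _ in range(3000):
    # u_i = -sum_j a_ij * ((x_i - x_j) + gamma * (v_i - v_j))
    u = -(A * ((x[:, None] - x[None, :])
               + gamma * (v[:, None] - v[None, :]))).sum(axis=1)
    x, v = x + dt * v, v + dt * u

print(np.round(x, 3))  # positions agree on a common (drifting) value
print(np.round(v, 3))  # velocities agree on the initial average velocity
```

For a connected undirected graph this linear protocol reaches consensus only asymptotically; finite-time protocols such as the one studied in the paper replace the linear coupling with nonlinear terms to force agreement in bounded time.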
cs229-9-Gaussian processes
The marginal densities

$$p(x_A) = \int_{x_B} p(x_A, x_B; \mu, \Sigma)\,dx_B \qquad\qquad p(x_B) = \int_{x_A} p(x_A, x_B; \mu, \Sigma)\,dx_A$$

are Gaussian, and the conditional densities

$$p(x_A \mid x_B) = \frac{p(x_A, x_B; \mu, \Sigma)}{\int_{x_A} p(x_A, x_B; \mu, \Sigma)\,dx_A} \qquad\qquad p(x_B \mid x_A) = \frac{p(x_A, x_B; \mu, \Sigma)}{\int_{x_B} p(x_A, x_B; \mu, \Sigma)\,dx_B}$$

are also Gaussian. A proof of this property is given in Appendix A.2. (See also Appendix A.3 for an easier version of the derivation.)

4. Summation. The sum of independent Gaussian random variables (with the same dimensionality), y ∼ N(µ, Σ) and z ∼ N(µ′, Σ′), is also Gaussian:

$$y + z \sim \mathcal{N}(\mu + \mu', \Sigma + \Sigma').$$
There are actually cases in which we would want to deal with multivariate Gaussian distributions where Σ is positive semidefinite but not positive definite (i.e., Σ is not full rank). In such cases, Σ⁻¹ does not exist, so the definition of the Gaussian density given in (1) does not apply. For instance, see the course lecture notes on "Factor Analysis."
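For concreteness, a small numpy sketch of the marginalization and conditioning facts above, on an assumed 2-D joint Gaussian. The closed-form conditioning expressions used here are the standard results whose proofs the notes defer to the appendices.

```python
import numpy as np

# Assumed 2-D joint Gaussian over (x_A, x_B).
mu = np.array([1.0, 2.0])          # [mu_A, mu_B]
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])     # [[S_AA, S_AB], [S_BA, S_BB]]

# Marginalization: x_A ~ N(mu_A, S_AA); the blocks can simply be read off.
print("marginal of x_A:", mu[0], Sigma[0, 0])

# Conditioning (standard closed forms):
#   mu_{A|B}    = mu_A + S_AB S_BB^{-1} (x_B - mu_B)
#   Sigma_{A|B} = S_AA - S_AB S_BB^{-1} S_BA
x_B = 3.0
mu_cond = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x_B - mu[1])   # 1.8
S_cond = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]         # 1.36

# Monte-Carlo sanity check: keep samples whose x_B lands near 3.
rng = np.random.default_rng(0)
s = rng.multivariate_normal(mu, Sigma, size=500_000)
near = s[np.abs(s[:, 1] - x_B) < 0.02, 0]
print(mu_cond, S_cond, "vs", near.mean(), near.var())
```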
cs229 Stanford University machine learning course: Lecture notes
CS 229 Machine Learning
Andrew Ng
Stanford University

Contents
Note 1: Supervised learning
Note 2: Generative Learning algorithms
Note 3: Support Vector Machines
Note 4: Learning Theory
Note 5: Regularization and model selection
Note 6: The perceptron and large margin classifiers
Note 7a: The k-means clustering algorithm
Note 7b: Mixtures of Gaussians and the EM algorithm
Note 8: The EM algorithm
Note 9: Factor analysis
Note 10: Principal components analysis
Note 11: Independent Components analysis
Note 12: Reinforcement Learning and Control

CS229 Lecture notes
Andrew Ng

Supervised learning

Lets start by talking about a few examples of supervised learning problems. Suppose we have a dataset giving the living areas and prices of 47 houses from Portland, Oregon:

Living area (feet²) | Price (1000$s)
2104 | 400
1600 | 330
2400 | 369
1416 | 232
3000 | 540
... | ...

We can plot this data. [Figure: scatter plot of price against living area.] Given data like this, how can we learn to predict the prices of other houses in Portland, as a function of the size of their living areas?

To establish notation for future use, we'll use x(i) to denote the "input" variables (living area in this example), also called input features, and y(i) to denote the "output" or target variable that we are trying to predict (price). A pair (x(i), y(i)) is called a training example, and the dataset that we'll be using to learn—a list of m training examples {(x(i), y(i)); i = 1, ..., m}—is called a training set. Note that the superscript "(i)" in the notation is simply an index into the training set, and has nothing to do with exponentiation. We will also use X to denote the space of input values, and Y the space of output values. In this example, X = Y = ℝ.

To describe the supervised learning problem slightly more formally, our goal is, given a training set, to learn a function h : X → Y so that h(x) is a "good" predictor for the corresponding value of y. For historical reasons, this function h is called a hypothesis. Seen pictorially, the process is therefore like this: [Figure: a training set is fed to a learning algorithm, which outputs a hypothesis h mapping x (living area of house) to a predicted y (price of house).]

When the target variable that we're trying to predict is continuous, such as in our housing example, we call the learning problem a regression problem. When y can take on only a small number of discrete values (such as if, given the living area, we wanted to predict if a dwelling is a house or an apartment, say), we call it a classification problem.

Part I: Linear Regression

To make our housing example more interesting, lets consider a slightly richer dataset in which we also know the number of bedrooms in each house:

Living area (feet²) | #bedrooms | Price (1000$s)
2104 | 3 | 400
1600 | 3 | 330
2400 | 3 | 369
1416 | 2 | 232
3000 | 4 | 540
... | ... | ...

Here, the x's are two-dimensional vectors in ℝ². For instance, x₁(i) is the living area of the i-th house in the training set, and x₂(i) is its number of bedrooms. (In general, when designing a learning problem, it will be up to you to decide what features to choose, so if you are out in Portland gathering housing data, you might also decide to include other features such as whether each house has a fireplace, the number of bathrooms, and so on. We'll say more about feature selection later, but for now lets take the features as given.)

To perform supervised learning, we must decide how we're going to represent functions/hypotheses h in a computer. As an initial choice, lets say we decide to approximate y as a linear function of x:

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$$

Here, the θᵢ's are the parameters (also called weights) parameterizing the space of linear functions mapping from X to Y. When there is no risk of confusion, we will drop the θ subscript in h_θ(x), and write it more simply as h(x). To simplify our notation, we also introduce the convention of letting x₀ = 1 (this is the intercept term), so that

$$h(x) = \sum_{i=0}^{n} \theta_i x_i = \theta^T x,$$

where on the right-hand side above we are viewing θ and x both as vectors, and here n is the number of input variables (not counting x₀).

Now, given a training set, how do we pick, or learn, the parameters θ? One reasonable method seems to be to make h(x) close to y, at least for the training examples we have. To formalize this, we will define a function that measures, for each value of the θ's, how close the h(x(i))'s are to the corresponding y(i)'s. We define the cost function:

$$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2.$$

If you've seen linear regression before, you may recognize this as the familiar least-squares cost function that gives rise to the ordinary least squares regression model. Whether or not you have seen it previously, lets keep going, and we'll eventually show this to be a special case of a much broader family of algorithms.

1 LMS algorithm

We want to choose θ so as to minimize J(θ). To do so, lets use a search algorithm that starts with some "initial guess" for θ, and that repeatedly changes θ to make J(θ) smaller, until hopefully we converge to a value of θ that minimizes J(θ). Specifically, lets consider the gradient descent algorithm, which starts with some initial θ, and repeatedly performs the update:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta).$$

(This update is simultaneously performed for all values of j = 0, ..., n.) Here, α is called the learning rate. This is a very natural algorithm that repeatedly takes a step in the direction of steepest decrease of J. (We use the notation "a := b" to denote an operation, in a computer program, in which we set the value of a variable a to be equal to the value of b. In other words, this operation overwrites a with the value of b. In contrast, we will write "a = b" when we are asserting a statement of fact, that the value of a is equal to the value of b.)

In order to implement this algorithm, we have to work out what is the partial derivative term on the right hand side. Lets first work it out for the case where we have only one training example (x, y), so that we can neglect the sum in the definition of J. We have:

$$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{\partial}{\partial \theta_j}\,\frac{1}{2}\left(h_\theta(x) - y\right)^2 = 2 \cdot \frac{1}{2}\left(h_\theta(x) - y\right) \cdot \frac{\partial}{\partial \theta_j}\left(h_\theta(x) - y\right) = \left(h_\theta(x) - y\right) \cdot \frac{\partial}{\partial \theta_j}\left(\sum_{i=0}^{n} \theta_i x_i - y\right) = \left(h_\theta(x) - y\right) x_j$$

For a single training example, this gives the update rule:

$$\theta_j := \theta_j + \alpha \left(y^{(i)} - h_\theta(x^{(i)})\right) x_j^{(i)}.$$

The rule is called the LMS update rule (LMS stands for "least mean squares"), and is also known as the Widrow-Hoff learning rule. This rule has several properties that seem natural and intuitive. For instance, the magnitude of the update is proportional to the error term (y(i) − h_θ(x(i))); thus, for instance, if we are encountering a training example on which our prediction nearly matches the actual value of y(i), then we find that there is little need to change the parameters; in contrast, a larger change to the parameters will be made if our prediction h_θ(x(i)) has a large error (i.e., if it is very far from y(i)).

We'd derived the LMS rule for when there was only a single training example. There are two ways to modify this method for a training set of more than one example. The first is to replace it with the following algorithm:

Repeat until convergence {
    θⱼ := θⱼ + α Σᵢ₌₁ᵐ (y(i) − h_θ(x(i))) xⱼ(i)    (for every j).
}

The reader can easily verify that the quantity in the summation in the update rule above is just ∂J(θ)/∂θⱼ (for the original definition of J). So, this is simply gradient descent on the original cost function J. This method looks at every example in the entire training set on every step, and is called batch gradient descent. Note that, while gradient descent can be susceptible to local minima in general, the optimization problem we have posed here for linear regression has only one global, and no other local, optima; thus gradient descent always converges (assuming the learning rate α is not too large) to the global minimum. Indeed, J is a convex quadratic function.
Here is an example of gradient descent as it is run to minimize a quadratic function. [Figure: contours of a quadratic function, together with the trajectory taken by gradient descent, which was initialized at (48, 30); the x's in the figure, joined by straight lines, mark the successive values of θ that gradient descent went through.]

When we run batch gradient descent to fit θ on our previous dataset, to learn to predict housing price as a function of living area, we obtain θ₀ = 71.27, θ₁ = 0.1345. If we plot h_θ(x) as a function of x (area), along with the training data, we obtain the following figure: [Figure: the fitted line over the scatter of training points.] If the number of bedrooms were included as one of the input features as well, we get θ₀ = 89.60, θ₁ = 0.1392, θ₂ = −8.738.

The above results were obtained with batch gradient descent. There is an alternative to batch gradient descent that also works very well. Consider the following algorithm:

Loop {
    for i = 1 to m, {
        θⱼ := θⱼ + α (y(i) − h_θ(x(i))) xⱼ(i)    (for every j).
    }
}

In this algorithm, we repeatedly run through the training set, and each time we encounter a training example, we update the parameters according to the gradient of the error with respect to that single training example only. This algorithm is called stochastic gradient descent (also incremental gradient descent). Whereas batch gradient descent has to scan through the entire training set before taking a single step—a costly operation if m is large—stochastic gradient descent can start making progress right away, and continues to make progress with each example it looks at. Often, stochastic gradient descent gets θ "close" to the minimum much faster than batch gradient descent. (Note however that it may never "converge" to the minimum, and the parameters θ will keep oscillating around the minimum of J(θ); but in practice most of the values near the minimum will be reasonably good approximations to the true minimum. While it is more common to run stochastic gradient descent as we have described it, with a fixed learning rate α, by slowly letting the learning rate α decrease to zero as the algorithm runs, it is also possible to ensure that the parameters will converge to the global minimum rather than merely oscillate around the minimum.) For these reasons, particularly when the training set is large, stochastic gradient descent is often preferred over batch gradient descent.

2 The normal equations

Gradient descent gives one way of minimizing J. Lets discuss a second way of doing so, this time performing the minimization explicitly and without resorting to an iterative algorithm. In this method, we will minimize J by explicitly taking its derivatives with respect to the θⱼ's, and setting them to zero. To enable us to do this without having to write reams of algebra and pages full of matrices of derivatives, lets introduce some notation for doing calculus with matrices.

2.1 Matrix derivatives

For a function f : ℝ^{m×n} → ℝ mapping from m-by-n matrices to the real numbers, we define the derivative of f with respect to A to be:

$$\nabla_A f(A) = \begin{bmatrix} \frac{\partial f}{\partial A_{11}} & \cdots & \frac{\partial f}{\partial A_{1n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial A_{m1}} & \cdots & \frac{\partial f}{\partial A_{mn}} \end{bmatrix}$$

Thus, the gradient ∇_A f(A) is itself an m-by-n matrix, whose (i, j)-element is ∂f/∂A_{ij}. For example, suppose

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$

is a 2-by-2 matrix, and the function f : ℝ^{2×2} → ℝ is given by

$$f(A) = \frac{3}{2} A_{11} + 5 A_{12}^2 + A_{21} A_{22}.$$

Here, A_{ij} denotes the (i, j) entry of the matrix A. We then have

$$\nabla_A f(A) = \begin{bmatrix} \frac{3}{2} & 10 A_{12} \\ A_{22} & A_{21} \end{bmatrix}.$$

We also introduce the trace operator, written "tr." For an n-by-n (square) matrix A, the trace of A is defined to be the sum of its diagonal entries:

$$\operatorname{tr} A = \sum_{i=1}^{n} A_{ii}$$

If a is a real number (i.e., a 1-by-1 matrix), then tr a = a. (If you haven't seen this "operator notation" before, you should think of the trace of A as tr(A), or as application of the "trace" function to the matrix A. It's more commonly written without the parentheses, however.) The trace operator has the property that for two matrices A and B such that AB is square, we have that tr AB = tr BA. (Check this yourself!) As corollaries of this, we also have, e.g.,

tr ABC = tr CAB = tr BCA,
tr ABCD = tr DABC = tr CDAB = tr BCDA.

The following properties of the trace operator are also easily verified. Here, A and B are square matrices, and a is a real number:

tr A = tr Aᵀ
tr(A + B) = tr A + tr B
tr aA = a tr A

We now state without proof some facts of matrix derivatives (we won't need some of these until later this quarter). Equation (4) applies only to non-singular square matrices A, where |A| denotes the determinant of A. We have:

∇_A tr AB = Bᵀ    (1)
∇_{Aᵀ} f(A) = (∇_A f(A))ᵀ    (2)
∇_A tr ABAᵀC = CAB + CᵀABᵀ    (3)
∇_A |A| = |A| (A⁻¹)ᵀ    (4)

To make our matrix notation more concrete, let us now explain in detail the meaning of the first of these equations. Suppose we have some fixed matrix B ∈ ℝ^{n×m}. We can then define a function f : ℝ^{m×n} → ℝ according to f(A) = tr AB. Note that this definition makes sense, because if A ∈ ℝ^{m×n}, then AB is a square matrix, and we can apply the trace operator to it; thus, f does indeed map from ℝ^{m×n} to ℝ. We can then apply our definition of matrix derivatives to find ∇_A f(A), which will itself be an m-by-n matrix. Equation (1) above states that the (i, j) entry of this matrix will be given by the (i, j)-entry of Bᵀ, or equivalently, by B_{ji}.

The proofs of Equations (1-3) are reasonably simple, and are left as an exercise to the reader. Equation (4) can be derived using the adjoint representation of the inverse of a matrix. (If we define A′ to be the matrix whose (i, j) element is (−1)^{i+j} times the determinant of the square matrix resulting from deleting row i and column j from A, then it can be proved that A⁻¹ = (A′)ᵀ/|A|. You can check that this is consistent with the standard way of finding A⁻¹ when A is a 2-by-2 matrix. If you want to see a proof of this more general result, see an intermediate or advanced linear algebra text, such as Charles Curtis, 1991, Linear Algebra, Springer. This shows that A′ = |A| (A⁻¹)ᵀ. Also, the determinant of a matrix can be written |A| = Σⱼ A_{ij} A′_{ij}. Since (A′)_{ij} does not depend on A_{ij}, as can be seen from its definition, this implies that (∂/∂A_{ij})|A| = A′_{ij}. Putting all this together shows the result.)

2.2 Least squares revisited

Armed with the tools of matrix derivatives, let us now proceed to find in closed-form the value of θ that minimizes J(θ). We begin by re-writing J in matrix-vectorial notation. Given a training set, define the design matrix X to be the m-by-n matrix (actually m-by-n+1, if we include the intercept term) that contains the training examples' input values in its rows:

$$X = \begin{bmatrix} — (x^{(1)})^T — \\ — (x^{(2)})^T — \\ \vdots \\ — (x^{(m)})^T — \end{bmatrix}.$$

Also, let ⃗y be the m-dimensional vector containing all the target values from the training set:

$$\vec{y} = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}.$$

Now, since h_θ(x(i)) = (x(i))ᵀθ, we can easily verify that

$$X\theta - \vec{y} = \begin{bmatrix} (x^{(1)})^T\theta \\ \vdots \\ (x^{(m)})^T\theta \end{bmatrix} - \begin{bmatrix} y^{(1)} \\ \vdots \\ y^{(m)} \end{bmatrix} = \begin{bmatrix} h_\theta(x^{(1)}) - y^{(1)} \\ \vdots \\ h_\theta(x^{(m)}) - y^{(m)} \end{bmatrix}.$$

Thus, using the fact that for a vector z, we have that zᵀz = Σᵢ zᵢ²:

$$\frac{1}{2}(X\theta - \vec{y})^T(X\theta - \vec{y}) = \frac{1}{2}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 = J(\theta).$$
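A quick numerical spot-check of this matrix rewriting of J(θ), together with the tr AB = tr BA property used above (an illustrative sketch with random data, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3
X = np.column_stack([np.ones(m), rng.normal(size=(m, n))])   # design matrix
y = rng.normal(size=m)
theta = rng.normal(size=n + 1)

r = X @ theta - y
J_matrix = 0.5 * (r @ r)                                     # (1/2)(X0 - y)^T(X0 - y)
J_sum = 0.5 * sum((X[i] @ theta - y[i]) ** 2 for i in range(m))
print(np.isclose(J_matrix, J_sum))                           # True

A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 3))
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))          # tr AB = tr BA
```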
Finally, to minimize J, lets find its derivatives with respect to θ. Combining Equations (2) and (3), we find that

$$\nabla_{A^T} \operatorname{tr} ABA^TC = B^TA^TC^T + BA^TC \qquad (5)$$

Hence,

$$\begin{aligned}
\nabla_\theta J(\theta) &= \nabla_\theta\,\frac{1}{2}(X\theta - \vec{y})^T(X\theta - \vec{y}) \\
&= \frac{1}{2}\nabla_\theta\left(\theta^TX^TX\theta - \theta^TX^T\vec{y} - \vec{y}^TX\theta + \vec{y}^T\vec{y}\right) \\
&= \frac{1}{2}\nabla_\theta \operatorname{tr}\left(\theta^TX^TX\theta - \theta^TX^T\vec{y} - \vec{y}^TX\theta + \vec{y}^T\vec{y}\right) \\
&= \frac{1}{2}\nabla_\theta\left(\operatorname{tr}\,\theta^TX^TX\theta - 2\operatorname{tr}\,\vec{y}^TX\theta\right) \\
&= \frac{1}{2}\left(X^TX\theta + X^TX\theta - 2X^T\vec{y}\right) \\
&= X^TX\theta - X^T\vec{y}
\end{aligned}$$

In the third step, we used the fact that the trace of a real number is just the real number; the fourth step used the fact that tr A = tr Aᵀ, and the fifth step used Equation (5) with Aᵀ = θ, B = Bᵀ = XᵀX, and C = I, and Equation (1). To minimize J, we set its derivatives to zero, and obtain the normal equations:

$$X^TX\theta = X^T\vec{y}$$

Thus, the value of θ that minimizes J(θ) is given in closed form by the equation

$$\theta = (X^TX)^{-1}X^T\vec{y}.$$

3 Probabilistic interpretation

When faced with a regression problem, why might linear regression, and specifically why might the least-squares cost function J, be a reasonable choice? In this section, we will give a set of probabilistic assumptions, under which least-squares regression is derived as a very natural algorithm.

Let us assume that the target variables and the inputs are related via the equation

$$y^{(i)} = \theta^Tx^{(i)} + \epsilon^{(i)},$$

where ε(i) is an error term that captures either unmodeled effects (such as if there are some features very pertinent to predicting housing price, but that we'd left out of the regression), or random noise. Let us further assume that the ε(i) are distributed IID (independently and identically distributed) according to a Gaussian distribution (also called a Normal distribution) with mean zero and some variance σ². We can write this assumption as "ε(i) ∼ N(0, σ²)." I.e., the density of ε(i) is given by

$$p(\epsilon^{(i)}) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(\epsilon^{(i)})^2}{2\sigma^2}\right).$$

This implies that

$$p(y^{(i)} \mid x^{(i)}; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y^{(i)} - \theta^Tx^{(i)})^2}{2\sigma^2}\right).$$

The notation "p(y(i)|x(i); θ)" indicates that this is the distribution of y(i) given x(i) and parameterized by θ. Note that we should not condition on θ ("p(y(i)|x(i), θ)"), since θ is not a random variable. We can also write the distribution of y(i) as y(i) | x(i); θ ∼ N(θᵀx(i), σ²).

Given X (the design matrix, which contains all the x(i)'s) and θ, what is the distribution of the y(i)'s? The probability of the data is given by p(⃗y|X; θ). This quantity is typically viewed as a function of ⃗y (and perhaps X), for a fixed value of θ. When we wish to explicitly view this as a function of θ, we will instead call it the likelihood function:

$$L(\theta) = L(\theta; X, \vec{y}) = p(\vec{y} \mid X; \theta).$$

Note that by the independence assumption on the ε(i)'s (and hence also the y(i)'s given the x(i)'s), this can also be written

$$L(\theta) = \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y^{(i)} - \theta^Tx^{(i)})^2}{2\sigma^2}\right).$$

Now, given this probabilistic model relating the y(i)'s and the x(i)'s, what is a reasonable way of choosing our best guess of the parameters θ? The principle of maximum likelihood says that we should choose θ so as to make the data as high probability as possible. I.e., we should choose θ to maximize L(θ).

Instead of maximizing L(θ), we can also maximize any strictly increasing function of L(θ). In particular, the derivations will be a bit simpler if we instead maximize the log likelihood ℓ(θ):

$$\ell(\theta) = \log L(\theta) = \log \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y^{(i)} - \theta^Tx^{(i)})^2}{2\sigma^2}\right) = \sum_{i=1}^{m} \log \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y^{(i)} - \theta^Tx^{(i)})^2}{2\sigma^2}\right) = m\log\frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^2}\cdot\frac{1}{2}\sum_{i=1}^{m}\left(y^{(i)} - \theta^Tx^{(i)}\right)^2.$$

Hence, maximizing ℓ(θ) gives the same answer as minimizing

$$\frac{1}{2}\sum_{i=1}^{m}\left(y^{(i)} - \theta^Tx^{(i)}\right)^2,$$

which we recognize to be J(θ), our original least-squares cost function.

To summarize: Under the previous probabilistic assumptions on the data, least-squares regression corresponds to finding the maximum likelihood estimate of θ. This is thus one set of assumptions under which least-squares regression can be justified as a very natural method that's just doing maximum likelihood estimation. (Note however that the probabilistic assumptions are by no means necessary for least-squares to be a perfectly good and rational procedure, and there may—and indeed there are—other natural assumptions that can also be used to justify it.)

Note also that, in our previous discussion, our final choice of θ did not depend on what was σ², and indeed we'd have arrived at the same result even if σ² were unknown. We will use this fact again later, when we talk about the exponential family and generalized linear models.

4 Locally weighted linear regression

Consider the problem of predicting y from x ∈ ℝ. The leftmost figure below shows the result of fitting a y = θ₀ + θ₁x to a dataset. We see that the data doesn't really lie on a straight line, and so the fit is not very good. [Figure: three panels fitting the same dataset with a linear fit, a quadratic fit, and a 5th-order polynomial fit.] Instead, if we had added an extra feature x², and fit y = θ₀ + θ₁x + θ₂x², then we obtain a slightly better fit to the data (see the middle figure). Naively, it might seem that the more features we add, the better. However, there is also a danger in adding too many features: The rightmost figure is the result of fitting a 5-th order polynomial y = Σⱼ₌₀⁵ θⱼxʲ. We see that even though the fitted curve passes through the data perfectly, we would not expect this to be a very good predictor of, say, housing prices (y) for different living areas (x). Without formally defining what these terms mean, we'll say the figure on the left shows an instance of underfitting—in which the data clearly shows structure not captured by the model—and the figure on the right is an example of overfitting. (Later in this class, when we talk about learning theory we'll formalize some of these notions, and also define more carefully just what it means for a hypothesis to be good or bad.)

As discussed previously, and as shown in the example above, the choice of features is important to ensuring good performance of a learning algorithm. (When we talk about model selection, we'll also see algorithms for automatically choosing a good set of features.) In this section, let us talk briefly about the locally weighted linear regression (LWR) algorithm which, assuming there is sufficient training data, makes the choice of features less critical. This treatment will be brief, since you'll get a chance to explore some of the properties of the LWR algorithm yourself in the homework.

In the original linear regression algorithm, to make a prediction at a query point x (i.e., to evaluate h(x)), we would:

1. Fit θ to minimize Σᵢ (y(i) − θᵀx(i))².
2. Output θᵀx.

In contrast, the locally weighted linear regression algorithm does the following:

1. Fit θ to minimize Σᵢ w(i) (y(i) − θᵀx(i))².
2. Output θᵀx.

Here, the w(i)'s are non-negative valued weights. Intuitively, if w(i) is large for a particular value of i, then in picking θ, we'll try hard to make (y(i) − θᵀx(i))² small. If w(i) is small, then the (y(i) − θᵀx(i))² error term will be pretty much ignored in the fit.

A fairly standard choice for the weights is

$$w^{(i)} = \exp\left(-\frac{(x^{(i)} - x)^2}{2\tau^2}\right)$$

(If x is vector-valued, this is generalized to be w(i) = exp(−(x(i) − x)ᵀ(x(i) − x)/(2τ²)), or w(i) = exp(−(x(i) − x)ᵀΣ⁻¹(x(i) − x)/2), for an appropriate choice of τ or Σ.) Note that the weights depend on the particular point x at which we're trying to evaluate x. Moreover, if |x(i) − x| is small, then w(i) is close to 1; and if |x(i) − x| is large, then w(i) is small. Hence, θ is chosen giving a much higher "weight" to the (errors on) training examples close to the query point x. (Note also that while the formula for the weights takes a form that is cosmetically similar to the density of a Gaussian distribution, the w(i)'s do not directly have anything to do with Gaussians, and in particular the w(i) are not random variables, normally distributed or otherwise.)
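A sketch of one way to carry out step 1 above; the notes leave the actual fitting method to the homework, so this minimizes the weighted objective in closed form via the weighted normal equations XᵀWXθ = XᵀWy, on an assumed toy dataset (the bandwidth τ is discussed next).

```python
import numpy as np

def lwr_predict(x_query, X, y, tau):
    """Predict at one query point with locally weighted linear regression.

    Fits theta by solving the weighted normal equations
    X^T W X theta = X^T W y, one standard closed-form way to minimize
    sum_i w_i (y_i - theta^T x_i)^2.
    """
    w = np.exp(-((X[:, 1] - x_query) ** 2) / (2.0 * tau ** 2))
    W = np.diag(w)
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return np.array([1.0, x_query]) @ theta

# Assumed toy dataset: a noisy sine wave that a single line cannot fit.
rng = np.random.default_rng(2)
xs = np.linspace(0.0, 10.0, 60)
ys = np.sin(xs) + 0.1 * rng.normal(size=xs.size)
X = np.column_stack([np.ones_like(xs), xs])

print(lwr_predict(5.0, X, ys, tau=0.5))    # tracks sin(5), about -0.96
print(lwr_predict(5.0, X, ys, tau=50.0))   # huge tau: reduces to ordinary LS
```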
The parameter τ controls how quickly the weight of a training example falls off with distance of its x(i) from the query point x; τ is called the bandwidth parameter, and is also something that you'll get to experiment with in your homework.

Locally weighted linear regression is the first example we're seeing of a non-parametric algorithm. The (unweighted) linear regression algorithm that we saw earlier is known as a parametric learning algorithm, because it has a fixed, finite number of parameters (the θᵢ's), which are fit to the data. Once we've fit the θᵢ's and stored them away, we no longer need to keep the training data around to make future predictions. In contrast, to make predictions using locally weighted linear regression, we need to keep the entire training set around. The term "non-parametric" (roughly) refers to the fact that the amount of stuff we need to keep in order to represent the hypothesis h grows linearly with the size of the training set.

Part II: Classification and logistic regression

Lets now talk about the classification problem. This is just like the regression problem, except that the values y we now want to predict take on only a small number of discrete values. For now, we will focus on the binary classification problem in which y can take on only two values, 0 and 1. (Most of what we say here will also generalize to the multiple-class case.) For instance, if we are trying to build a spam classifier for email, then x(i) may be some features of a piece of email, and y may be 1 if it is a piece of spam mail, and 0 otherwise. 0 is also called the negative class, and 1 the positive class, and they are sometimes also denoted by the symbols "-" and "+." Given x(i), the corresponding y(i) is also called the label for the training example.

5 Logistic regression

We could approach the classification problem ignoring the fact that y is discrete-valued, and use our old linear regression algorithm to try to predict y given x. However, it is easy to construct examples where this method performs very poorly. Intuitively, it also doesn't make sense for h_θ(x) to take values larger than 1 or smaller than 0 when we know that y ∈ {0, 1}.

To fix this, lets change the form for our hypotheses h_θ(x). We will choose

$$h_\theta(x) = g(\theta^Tx) = \frac{1}{1 + e^{-\theta^Tx}},$$

where

$$g(z) = \frac{1}{1 + e^{-z}}$$

is called the logistic function or the sigmoid function. [Figure: a plot of g(z), an S-shaped curve rising from 0 to 1.] Notice that g(z) tends towards 1 as z → ∞, and g(z) tends towards 0 as z → −∞. Moreover, g(z), and hence also h(x), is always bounded between 0 and 1. As before, we are keeping the convention of letting x₀ = 1, so that θᵀx = θ₀ + Σⱼ₌₁ⁿ θⱼxⱼ.

For now, lets take the choice of g as given. Other functions that smoothly increase from 0 to 1 can also be used, but for a couple of reasons that we'll see later (when we talk about GLMs, and when we talk about generative learning algorithms), the choice of the logistic function is a fairly natural one. Before moving on, here's a useful property of the derivative of the sigmoid function, which we write as g′:

$$g'(z) = \frac{d}{dz}\,\frac{1}{1 + e^{-z}} = \frac{1}{(1 + e^{-z})^2}\,e^{-z} = \frac{1}{1 + e^{-z}}\cdot\left(1 - \frac{1}{1 + e^{-z}}\right) = g(z)(1 - g(z)).$$

So, given the logistic regression model, how do we fit θ for it? Following how we saw least squares regression could be derived as the maximum likelihood estimator under a set of assumptions, lets endow our classification model with a set of probabilistic assumptions, and then fit the parameters via maximum likelihood.

Let us assume that

P(y = 1 | x; θ) = h_θ(x)
P(y = 0 | x; θ) = 1 − h_θ(x)

Note that this can be written more compactly as

$$p(y \mid x; \theta) = (h_\theta(x))^y (1 - h_\theta(x))^{1-y}$$

Assuming that the m training examples were generated independently, we can then write down the likelihood of the parameters as

$$L(\theta) = p(\vec{y} \mid X; \theta) = \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta) = \prod_{i=1}^{m} \left(h_\theta(x^{(i)})\right)^{y^{(i)}} \left(1 - h_\theta(x^{(i)})\right)^{1-y^{(i)}}$$

As before, it will be easier to maximize the log likelihood:

$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{m} y^{(i)}\log h(x^{(i)}) + (1 - y^{(i)})\log(1 - h(x^{(i)}))$$

How do we maximize the likelihood? Similar to our derivation in the case of linear regression, we can use gradient ascent. Written in vectorial notation, our updates will therefore be given by θ := θ + α∇_θℓ(θ). (Note the positive rather than negative sign in the update formula, since we're maximizing, rather than minimizing, a function now.) Lets start by working with just one training example (x, y), and take derivatives to derive the stochastic gradient ascent rule:

$$\frac{\partial}{\partial \theta_j}\ell(\theta) = \left(y\,\frac{1}{g(\theta^Tx)} - (1 - y)\,\frac{1}{1 - g(\theta^Tx)}\right)\frac{\partial}{\partial \theta_j}g(\theta^Tx) = \left(y\,\frac{1}{g(\theta^Tx)} - (1 - y)\,\frac{1}{1 - g(\theta^Tx)}\right)g(\theta^Tx)\left(1 - g(\theta^Tx)\right)\frac{\partial}{\partial \theta_j}\theta^Tx = \left(y(1 - g(\theta^Tx)) - (1 - y)g(\theta^Tx)\right)x_j = \left(y - h_\theta(x)\right)x_j$$

Above, we used the fact that g′(z) = g(z)(1 − g(z)). This therefore gives us the stochastic gradient ascent rule

$$\theta_j := \theta_j + \alpha\left(y^{(i)} - h_\theta(x^{(i)})\right)x_j^{(i)}$$

If we compare this to the LMS update rule, we see that it looks identical; but this is not the same algorithm, because h_θ(x(i)) is now defined as a non-linear function of θᵀx(i). Nonetheless, it's a little surprising that we end up with the same update rule for a rather different algorithm and learning problem. Is this coincidence, or is there a deeper reason behind this? We'll answer this when we get to GLM models. (See also the extra credit problem on Q3 of problem set 1.)

6 Digression: The perceptron learning algorithm

We now digress to talk briefly about an algorithm that's of some historical interest, and that we will also return to later when we talk about learning theory. Consider modifying the logistic regression method to "force" it to output values that are either 0 or 1 exactly. To do so, it seems natural to change the definition of g to be the threshold function:

$$g(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ 0 & \text{if } z < 0 \end{cases}$$

If we then let h_θ(x) = g(θᵀx) as before but using this modified definition of g, and if we use the update rule

$$\theta_j := \theta_j + \alpha\left(y^{(i)} - h_\theta(x^{(i)})\right)x_j^{(i)},$$

then we have the perceptron learning algorithm.

In the 1960s, this "perceptron" was argued to be a rough model for how individual neurons in the brain work. Given how simple the algorithm is, it will also provide a starting point for our analysis when we talk about learning theory later in this class. Note however that even though the perceptron may be cosmetically similar to the other algorithms we talked about, it is actually a very different type of algorithm than logistic regression and least squares linear regression; in particular, it is difficult to endow the perceptron's predictions with meaningful probabilistic interpretations, or derive the perceptron as a maximum likelihood estimation algorithm.
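To tie the two sections together, a small numpy sketch (illustrative, with synthetic data; not from the notes) of logistic-regression gradient ascent and the perceptron update:

```python
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic labeled data (assumed for the sketch).
rng = np.random.default_rng(3)
m = 200
X = np.column_stack([np.ones(m), rng.normal(size=(m, 2))])
true_theta = np.array([-0.5, 2.0, -1.0])
y = (rng.random(m) < g(X @ true_theta)).astype(float)   # labels in {0, 1}

# Batch gradient ascent on l(theta): dl/dtheta_j = sum_i (y_i - h(x_i)) x_ij.
theta = np.zeros(3)
for _ in range(5_000):
    theta += 0.1 * (y - g(X @ theta)) @ X / m
print(theta)   # roughly recovers true_theta as m grows

# Perceptron: identical update form, but g is the 0/1 threshold.
theta_p = np.zeros(3)
for _ in range(50):                      # fixed epochs; the data is noisy,
    for i in range(m):                   # so the perceptron need not converge
        h = 1.0 if X[i] @ theta_p >= 0 else 0.0
        theta_p += 0.5 * (y[i] - h) * X[i]
print(theta_p)
```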
Stanford University machine learning courseware: Problem Set 3 solutions
(a) Prove that, with probability at least 1 − δ/2, for all ĥᵢ,

$$\left|\varepsilon(\hat{h}_i) - \hat{\varepsilon}_{S_{cv}}(\hat{h}_i)\right| \le \sqrt{\frac{1}{2\beta m}\log\frac{4k}{\delta}}.$$

Answer: For each ĥᵢ, the empirical error on the cross-validation set, ε̂_{Scv}(ĥᵢ), represents the average of βm random variables with mean ε(ĥᵢ), so by the Hoeffding inequality, for any ĥᵢ,

$$P\left(\left|\varepsilon(\hat{h}_i) - \hat{\varepsilon}_{S_{cv}}(\hat{h}_i)\right| \ge \gamma\right) \le 2\exp(-2\gamma^2 \beta m).$$

As in the class notes, to ensure that this holds for all ĥᵢ, we need to take the union over all k of the ĥᵢ's:

$$P\left(\exists i \text{ s.t. } \left|\varepsilon(\hat{h}_i) - \hat{\varepsilon}_{S_{cv}}(\hat{h}_i)\right| \ge \gamma\right) \le 2k\exp(-2\gamma^2 \beta m).$$
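Numerically, the union bound pins down the γ needed for a given confidence: setting 2k exp(−2γ²βm) = δ/2 and solving for γ recovers exactly the bound in part (a). A short sketch with assumed illustrative values:

```python
import numpy as np

k, beta, m, delta = 10, 0.25, 1000, 0.05   # assumed illustrative values

# Solve 2k exp(-2 gamma^2 beta m) = delta/2 for gamma:
gamma = np.sqrt(np.log(4 * k / delta) / (2 * beta * m))
print(gamma)                                        # about 0.116

# With this gamma, the union bound equals delta/2 exactly:
print(2 * k * np.exp(-2 * gamma ** 2 * beta * m))   # 0.025 = delta/2
```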
CS 229, Public Course Problem Set #3 Solutions: Learning Theory and Unsupervised Learning
1. Uniform convergence and Model Selection. In this problem, we will prove a bound on the error of a simple model selection procedure. Let there be a binary classification problem with labels y ∈ {0, 1}, and let H₁ ⊆ H₂ ⊆ … ⊆ H_k be k different finite hypothesis classes (|Hᵢ| < ∞). Given a dataset S of m iid training examples, we will divide it into a training set S_train consisting of the first (1 − β)m examples, and a hold-out cross-validation set S_cv consisting of the remaining βm examples. Here, β ∈ (0, 1).

Let ĥᵢ = arg min_{h∈Hᵢ} ε̂_{S_train}(h) be the hypothesis in Hᵢ with the lowest training error (on S_train). Thus, ĥᵢ would be the hypothesis returned by training (with empirical risk minimization) using hypothesis class Hᵢ and dataset S_train. Also let h*ᵢ = arg min_{h∈Hᵢ} ε(h) be the hypothesis in Hᵢ with the lowest generalization error.

Suppose that our algorithm first finds all the ĥᵢ's using empirical risk minimization, and then uses the hold-out cross-validation set to select, from {ĥ₁, …, ĥ_k}, the hypothesis with minimum error on S_cv. That is, the algorithm will output

$$\hat{h} = \arg\min_{i=1,\ldots,k} \hat{\varepsilon}_{S_{cv}}(\hat{h}_i).$$
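An illustrative simulation of this procedure on a toy threshold-classification problem; the hypothesis classes and data below are stand-ins invented for the sketch, not part of the problem set.

```python
import numpy as np

rng = np.random.default_rng(4)
m, beta = 2000, 0.25
split = int((1 - beta) * m)               # first (1-beta)m examples -> S_train

# Toy data: y = 1{x > 0.3}, with 10% of the labels flipped.
x = rng.uniform(-1, 1, m)
y = ((x > 0.3).astype(int) + (rng.random(m) < 0.1)) % 2

def err(t, xs, ys):
    return np.mean((xs > t).astype(int) != ys)

results = []
for i in range(1, 8):
    grid = np.linspace(-1, 1, 2 ** i)     # H_i: 2^i threshold classifiers
    t_i = min(grid, key=lambda t: err(t, x[:split], y[:split]))  # ERM on S_train
    results.append((err(t_i, x[split:], y[split:]), t_i))        # score on S_cv

cv_err, t_hat = min(results)              # output: lowest hold-out error
print(t_hat, cv_err)                      # threshold near 0.3, error near 0.1
```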
These three properties are called the Axioms of Probability.

Example: Consider the event of tossing a six-sided die. The sample space is Ω = {1, 2, 3, 4, 5, 6}. We can define different event spaces on this sample space. For example, the simplest event space is the trivial event space F = {∅, Ω}. Another event space is the set of all subsets of Ω. For the first event space, the unique probability measure satisfying the requirements above is given by P(∅) = 0, P(Ω) = 1. For the second event space, one valid probability measure is to assign the probability of each set in the event space to be i/6, where i is the number of elements of that set; for example, P({1, 2, 3, 4}) = 4/6 and P({1, 2, 3}) = 3/6.

Properties:
- If A ⊆ B, then P(A) ≤ P(B).
- P(A ∩ B) ≤ min(P(A), P(B)).
- (Union Bound) P(A ∪ B) ≤ P(A) + P(B).
- P(Ω \ A) = 1 − P(A).
- (Law of Total Probability) If A₁, …, A_k are a set of disjoint events such that ∪ᵢ₌₁ᵏ Aᵢ = Ω, then Σᵢ₌₁ᵏ P(A_k) = 1.
2 Random variables
Consider an experiment in which we flip 10 coins, and we want to know the number of coins that come up heads. Here, the elements of the sample space Ω are 10-length sequences of heads and tails. For example, we might have w₀ = ⟨H, H, T, H, T, H, H, T, T, T⟩ ∈ Ω. However, in practice, we usually do not care about the probability of obtaining any particular sequence of heads and tails. Instead we usually care about real-valued functions of outcomes, such as the number of heads that appear among our 10 tosses, or the length of the longest run of tails. These functions, under some technical conditions, are known as random variables.

More formally, a random variable X is a function X : Ω → ℝ. Typically, we will denote random variables using upper case letters X(ω) or more simply X (where the dependence on the random outcome ω is implied). We will denote the value that a random variable may take on using lower case letters x.

Example: In our experiment above, suppose that X(ω) is the number of heads which occur in the sequence of tosses ω. Given that only 10 coins are tossed, X(ω) can take only a finite number of values, so it is known as a discrete random variable. Here, the probability of the set associated with a random variable X taking on some specific value k is P(X = k) := P({ω : X(ω) = k}).

Example: Suppose that X(ω) is a random variable indicating the amount of time it takes for a radioactive particle to decay. In this case, X(ω) takes on an infinite number of possible values, so it is called a continuous random variable. We denote the probability that X takes on a value between two real constants a and b (where a < b) as P(a ≤ X ≤ b) := P({ω : a ≤ X(ω) ≤ b}).

2.1 Cumulative distribution functions
1 Elements of probability
In order to define a probability on a set we need a few basic elements:

- Sample space Ω: The set of all the outcomes of a random experiment. Here, each outcome ω ∈ Ω can be thought of as a complete description of the state of the real world at the end of the experiment.
- Set of events (or event space) F: A set whose elements A ∈ F (called events) are subsets of Ω (i.e., A ⊆ Ω is a collection of possible outcomes of an experiment).
- Probability measure: A function P : F → ℝ that satisfies the following properties:
  - P(A) ≥ 0, for all A ∈ F
  - P(Ω) = 1
  - If A₁, A₂, … are disjoint events (i.e., Aᵢ ∩ Aⱼ = ∅ whenever i ≠ j), then P(∪ᵢ Aᵢ) = Σᵢ P(Aᵢ).
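A small sketch that checks these axioms, and the derived properties listed earlier, on the fair-die measure from the example above (using exact fractions):

```python
from fractions import Fraction
from itertools import chain, combinations

Omega = frozenset(range(1, 7))

def P(A):                       # fair-die measure: P(A) = |A| / 6
    return Fraction(len(A), 6)

# Axiom 3 on a disjoint decomposition: evens and odds partition Omega.
evens, odds = frozenset({2, 4, 6}), frozenset({1, 3, 5})
assert evens & odds == frozenset()
assert P(evens | odds) == P(evens) + P(odds) == P(Omega) == 1

# Spot-check the derived properties over every pair of events.
events = [frozenset(s) for s in
          chain.from_iterable(combinations(Omega, r) for r in range(7))]
assert all(P(A | B) <= P(A) + P(B) for A in events for B in events)   # union bound
assert all(P(A & B) <= min(P(A), P(B)) for A in events for B in events)
print("axioms and properties hold for the fair-die measure")
```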
Let B be an event with non-zero probability. The conditional probability of any event A given B is defined as

$$P(A \mid B) \triangleq \frac{P(A \cap B)}{P(B)}.$$

In other words, P(A|B) is the probability measure of the event A after observing the occurrence of event B. Two events are called independent if and only if P(A ∩ B) = P(A)P(B) (or equivalently, P(A|B) = P(A)). Therefore, independence is equivalent to saying that observing B does not have any effect on the probability of A.
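A Monte-Carlo sketch of these definitions on the die example (illustrative; the events are chosen to show one independent and one dependent pair):

```python
import numpy as np

rng = np.random.default_rng(5)
die = rng.integers(1, 7, size=1_000_000)   # a million fair-die rolls

A = die >= 5          # {5, 6}
B = die % 2 == 0      # {2, 4, 6}
print((A & B).mean() / B.mean(), A.mean())   # P(A|B) = P(A) = 1/3: independent

C = die >= 4          # {4, 5, 6}
print((C & B).mean() / B.mean(), C.mean())   # P(C|B) = 2/3 but P(C) = 1/2
```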
In order to specify the probability measures used when dealing with random variables, it is often convenient to specify alternative functions (CDFs, PDFs, and PMFs) from which the probability measure governing an experiment immediately follows. In this section and the next two sections, we describe each of these types of functions in turn.

A cumulative distribution function (CDF) is a function F_X : ℝ → [0, 1] which specifies a probability measure as

$$F_X(x) \triangleq P(X \le x). \qquad (1)$$

By using this function one can calculate the probability of any event in F. Figure 1 shows a sample CDF function.

Properties:
- 0 ≤ F_X(x) ≤ 1.
- lim_{x→−∞} F_X(x) = 0.
- lim_{x→∞} F_X(x) = 1.
- If x ≤ y, then F_X(x) ≤ F_X(y).
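Since the figure is not reproduced here, a short sketch of a CDF estimated from samples, assuming a standard normal X:

```python
import numpy as np

rng = np.random.default_rng(6)
samples = rng.normal(size=100_000)     # X ~ N(0, 1), a continuous RV

def F_hat(x):
    """Empirical CDF: the fraction of samples with value <= x."""
    return np.mean(samples <= x)

for x in (-2.0, 0.0, 1.0):
    print(x, F_hat(x))   # about 0.0228, 0.5, 0.8413 for the standard normal

# Probabilities of interval events follow from the CDF:
print(F_hat(1.0) - F_hat(-1.0))   # P(-1 <= X <= 1), about 0.6827
```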