哈工大机器学习历年考试


哈工大08机器学习试题


2008年春硕士研究生机器学习试题(下列各题每个大题10分)

注意:在给出算法时,非标准(自己设计的)部分应给出说明。

特别是自己设置的参数及变量的意义要说明。

1.请看以下的正例和反例序列,它们描述的概念是“两个住在同一个房间中的人”。

每个训练样例描述了一个有序对,每个人由其性别(male、female)、头发颜色(black、brown)、身高(tall、short)以及国籍(US、French)描述。

+ <<male brown tall US>, <female black short US>>
+ <<male brown short French>, <female black short US>>
- <<female brown tall French>, <female black short French>>
+ <<male black tall French>, <female brown tall US>>

考虑在这些实例上定义的假设空间:所有假设以一对4元组表示,其中每个值约束可以为:一个特定值(比如male、tall等)、?(表示接受任意值)和∅(表示拒绝所有值)。

例如,下面的假设 <<male ? tall ?>, <female ? ? French>> 表示了所有这样的序对:第一个人为高个男性(国籍和发色任意),第二个人为法国女性(发色和身高任意)。

1)根据上述提供的训练样例和假设表示,手动执行候选消除算法。

特别是要写出处理了每一个训练样例后变型空间的特殊和一般边界;2)列出最后形成的变型空间中的所有假设。

2.假设一个神经网络有一个隐藏层(有一个隐藏层的神经网络由一个输入层、一个隐藏层、一个输出层组成),写出训练这个神经网络的反向传播算法的步骤。
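
下面给出训练单隐藏层网络的反向传播算法的一个numpy示意(属于自行设计的非标准写法:隐藏单元数、学习率、迭代次数等参数均为自设,仅供参考):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_one_hidden_layer(X, Y, n_hidden=4, lr=0.5, epochs=5000, seed=0):
    """用随机梯度下降训练"输入层-隐藏层-输出层"网络,隐藏层和输出层均为sigmoid单元。"""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], Y.shape[1]
    W1 = rng.normal(0, 0.5, (n_in, n_hidden))   # 输入层 -> 隐藏层权重
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.5, (n_hidden, n_out))  # 隐藏层 -> 输出层权重
    b2 = np.zeros(n_out)
    for _ in range(epochs):
        for x, y in zip(X, Y):
            # 1) 前向传播:逐层计算输出
            h = sigmoid(x @ W1 + b1)
            o = sigmoid(h @ W2 + b2)
            # 2) 反向传播:先算输出单元误差项,再把误差传回隐藏单元
            delta_o = (y - o) * o * (1 - o)         # 输出层误差项
            delta_h = h * (1 - h) * (W2 @ delta_o)  # 隐藏层误差项
            # 3) 沿梯度方向更新各权重
            W2 += lr * np.outer(h, delta_o)
            b2 += lr * delta_o
            W1 += lr * np.outer(x, delta_h)
            b1 += lr * delta_h
    return W1, b1, W2, b2

# 示例:用XOR数据检查训练效果
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)
W1, b1, W2, b2 = train_one_hidden_layer(X, Y)
print(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).round(2))
```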

机器学习基础期末考试试题


一、选择题(每题2分,共20分)

1. 在机器学习中,下列哪个算法属于监督学习算法?A. 决策树 B. K-means C. 遗传算法 D. 随机森林
2. 以下哪个是线性回归的假设条件?A. 特征之间相互独立 B. 特征与目标变量之间存在非线性关系 C. 目标变量的误差项服从正态分布 D. 所有特征都是类别型变量
3. 支持向量机(SVM)的主要目标是什么?A. 找到数据点之间的最大间隔 B. 减少模型的复杂度 C. 增加模型的泛化能力 D. 所有选项都正确
4. 在深度学习中,卷积神经网络(CNN)通常用于处理哪种类型的数据?A. 音频数据 B. 图像数据 C. 文本数据 D. 时间序列数据
5. 交叉验证的主要目的是:A. 减少模型的过拟合 B. 增加模型的复杂度 C. 减少训练集的大小 D. 增加模型的运行时间

二、简答题(每题10分,共30分)

6. 解释什么是过拟合,并给出一个避免过拟合的策略。

7. 描述随机森林算法的基本原理,并简述其相对于决策树的优势。

8. 解释梯度下降算法的工作原理,并说明为什么它在优化问题中如此重要。

三、计算题(每题25分,共50分)

9. 假设你有一个线性回归模型,其目标函数为 \( J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2 \),其中 \( h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 \)。

给定以下数据点:
\[
\begin{align*}
x_1 & : [1, 2, 3] \\
x_2 & : [1, 3, 4] \\
y & : [2, 4, 5]
\end{align*}
\]
请计算该模型的损失函数 \( J(\theta) \)。

10. 给定一个二分类问题的数据集,使用逻辑回归模型进行分类。

如果模型的决策边界是 \( w_1 x_1 + w_2 x_2 - \theta = 0 \),其中\( w_1 = 0.5 \),\( w_2 = -1 \),\( \theta = 0.5 \)。

机器学习考试题目及答案


1. 简述机器学习概念?

Tom Mitchell:"对于某类任务T和性能度量P,如果一个计算机程序在T上以P衡量的性能随着经验E而自我完善,那么我们称这个计算机程序在从经验E学习。"

我们遇到的大部分问题一般包括分类问题与回归问题。

如房价的预测、股价的预测等属于回归问题。

一般的处理过程是:1)获取数据;2)提取最能体现数据的特征;3)利用算法建模;4)将建立的模型用于预测。

如人脸识别系统:首先我们获取到一批人脸照片,对数据进行预处理,然后提取人脸特征,最后用SVM或NN等算法建模。

这样,我们就建立了一个人脸识别系统,当输入一张人脸,我们就知道这张面孔是否在系统中。

这就是机器学习的整个流程,其次还包括寻找最优参数等。
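
按照上面"获取数据—提取特征—建模—预测"的流程,下面用scikit-learn给出一个最小的示意(用自带的手写数字数据代替人脸照片,参数仅为演示假设):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# 1) 获取数据
X, y = load_digits(return_X_y=True)
# 2) 特征预处理(示意:仅做标准化)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
# 3) 利用算法建模(SVM)
model = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)
# 4) 用模型预测并评估
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```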

机器学习主要分为:

监督学习:数据集是有标签的,大部分机器学习模型都属于这一类别,包括线性分类器、支持向量机等等;

无监督学习:跟监督学习相反,数据集是完全没有标签的,主要的依据是相似的样本在数据空间中一般距离是相近的,这样就能通过距离的计算把样本分类,完全不需要label,比如著名的k-means算法就是无监督学习中应用最广泛的算法;

半监督学习:一般针对的问题是数据量超级大但是有标签数据很少、或者标签数据的获取很难很贵的情况,训练的时候有一部分数据是有标签的而另一部分是没有的;

强化学习:一种激励学习的方式,通过激励函数来让模型不断根据遇到的情况做出调整。

2. 循环神经网络的基本原理?

RNN的目的是用来处理序列数据。

在传统的神经网络模型中,是从输入层到隐含层再到输出层,层与层之间是全连接的,每层之间的节点是无连接的。

但是这种普通的神经网络对于很多问题却无能无力。

例如,你要预测句子的下一个单词是什么,一般需要用到前面的单词,因为一个句子中前后单词并不是独立的。

RNN之所以称为循环神经网络,是因为一个序列当前的输出与前面的输出也有关。

具体的表现形式为网络会对前面的信息进行记忆并应用于当前输出的计算中,即隐藏层之间的节点不再无连接而是有连接的,并且隐藏层的输入不仅包括输入层的输出还包括上一时刻隐藏层的输出。
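
上述"隐藏层的输入既包括当前输入、也包括上一时刻隐藏层的输出"这一点,可以用下面的numpy片段示意(权重随机初始化,仅演示前向计算过程):

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 5
W_xh = rng.normal(size=(input_dim, hidden_dim))   # 当前输入 -> 隐藏层
W_hh = rng.normal(size=(hidden_dim, hidden_dim))  # 上一时刻隐藏状态 -> 隐藏层(循环连接)
b_h = np.zeros(hidden_dim)

def rnn_forward(x_seq):
    """逐时刻计算隐藏状态:h_t = tanh(x_t·W_xh + h_{t-1}·W_hh + b)。"""
    h = np.zeros(hidden_dim)      # 初始隐藏状态
    hs = []
    for x_t in x_seq:
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)   # 同时依赖当前输入和"记忆"
        hs.append(h)
    return np.array(hs)

x_seq = rng.normal(size=(4, input_dim))  # 长度为4的输入序列
print(rnn_forward(x_seq).shape)          # (4, 5):每个时刻一个隐藏状态
```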

1_机器学习2023上学期考试题


一、背景介绍

机器学习是一门涉及使计算机具备自我学习能力的人工智能学科。

它通过利用大量的数据和算法,使计算机能够从中学习并改进自身的性能。

机器学习广泛应用于自然语言处理、图像识别、数据挖掘、预测分析等领域。

本次考试将考察机器学习的基本概念、算法和应用。

请认真阅读每个问题,并给出您的答案。

二、问题

1. 什么是机器学习?它的主要任务是什么?
2. 请简要解释无监督学习和有监督学习这两个概念,并举例说明。

3.什么是决策树算法?请说明其原理和应用场景。

4.请解释朴素贝叶斯分类器的原理,并说明其在文本分类任务中的应用。

5.什么是神经网络?请描述神经网络的结构以及反向传播算法的基本原理。

6.请简要介绍深度学习的概念,并说明与传统机器学习的区别。

7.什么是聚类算法?请举例说明一个常用的聚类算法,并简要描述其原理和应用场景。

8.请解释支持向量机(SVM)算法的原理,并说明其在图像识别中的应用。

9.什么是强化学习?请说明其关键概念和基本原理,并描述一个实际应用场景。

10.请简要介绍深度学习中常用的激活函数有哪些,以及它们的特点和应用场景。

三、参考答案

1. 机器学习是一种人工智能的方法论,通过利用大量的数据和算法,使计算机能够自动从中学习并改进性能。

它的主要任务是利用经验数据来训练模型,然后使用这些模型来进行预测、分类、识别等任务。

2.无监督学习是一种不依赖于标签的机器学习方法,它试图从数据中找到隐藏的结构或模式。

常见的无监督学习算法包括聚类和降维。

例如,K-means聚类算法可以将数据集划分成不同的类别。
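
下面是用scikit-learn运行K-means的一个最小示意(数据为随机生成的演示数据):

```python
import numpy as np
from sklearn.cluster import KMeans

# 两簇人工数据(仅作演示)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)   # 两个簇的中心
print(kmeans.labels_[:5])        # 前5个样本被分到的簇
```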

有监督学习则是一种依赖于标签的学习方法,通过将输入和输出的关系映射到一个函数来训练模型。

例如,线性回归是一种有监督学习算法,它可以根据输入的特征来预测输出的值。

3.决策树算法是一种基于树形结构的分类模型,它通过一系列的判断选择来对数据进行分类。

决策树的原理是根据属性取值的不同对数据集进行划分,直到每个子集都属于同一类别或达到停止条件。
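
上述"按属性划分直到子集纯净"的划分依据通常是信息增益,下面给出一个计算熵与信息增益的简单示意(数据划分仅为假设的例子):

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """按类别比例计算熵 H(S) = -sum(p_i * log2(p_i))。"""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(parent_labels, subsets):
    """信息增益 = 父节点熵 - 各子集熵按样本数加权的平均。"""
    n = len(parent_labels)
    weighted = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(parent_labels) - weighted

# 示例:10个样本[7正,3负],按某属性分成 [3+,2-] 和 [4+,1-] 两个子集
parent = ["+"] * 7 + ["-"] * 3
high   = ["+"] * 3 + ["-"] * 2
normal = ["+"] * 4 + ["-"] * 1
print(round(information_gain(parent, [high, normal]), 3))  # 约0.035
```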

2022机器学习专项测试试题及答案


1. 机器学习的流程包括:分析案例、数据获取、________和模型验证这四个过程。()
A. 数据清洗 B. 数据分析 C. 模型训练(正确答案) D. 模型搭建

2. 机器翻译属于下列哪个领域的应用?()
A. 自然语言系统(正确答案) B. 机器学习 C. 专家系统 D. 人类感官模拟

3. 为了解决如何模拟人类的感性思维,例如视觉理解、直觉思维、悟性等,研究者找到的一个重要的信息处理机制是()。
A. 专家系统 B. 人工神经网络(正确答案) C. 模式识别 D. 智能代理

4. 要想让机器具有智能,必须让机器具有知识。因此,在人工智能中有一个研究领域,主要研究计算机如何自动获取知识和技能,实现自我完善,这门研究分支学科叫()。
A. 专家系统 B. 机器学习(正确答案) C. 神经网络 D. 模式识别

5. 如下属于机器学习应用的包括()。
A. 自动计算,通过编程计算 456*457*458*459 的值(正确答案) B. 文字识别,如通过 OCR 快速获得图像中的汉字,保存为文本 C. 语音输入,通过话筒将讲话内容转成文本 D. 麦克风阵列,如利用灵云该技术实现远场语音交互的电视

6. 对于神经网络模型,当样本足够多时,少量输入样本中带有较大的误差甚至个别错误对模型的输入-输出映射关系影响很小,这属于()。
A. 泛化能力 B. 容错能力(正确答案) C. 搜索能力 D. 非线性映射能力

7. 下列选项不属于机器学习研究内容的是()
A. 学习机理 B. 自动控制(正确答案) C. 学习方法 D. 计算机存储系统

8. 机器学习的经典定义是:()
A. 利用技术进步改善系统自身性能 B. 利用技术进步改善人的能力 C. 利用经验改善系统自身的性能(正确答案) D. 利用经验改善人的能力

9. 研究某超市销售记录数据后发现,买啤酒的人很大概率也会购买尿布,这属于数据挖掘的哪类问题()。

机器学习基础知识试题


一、选择题

1. 机器学习的主要目标是什么?A. 让机器能够像人一样思考 B. 让机器能够自动学习 C. 提高计算机的运算速度 D. 使机器具备无限的记忆能力
2. 哪个是监督学习的主要特点?A. 需要标记好的训练数据 B. 无需人工干预 C. 机器能独立学习 D. 只能处理分类问题
3. 以下哪个属于非监督学习?A. 图像分类 B. 垃圾邮件过滤 C. 聚类分析 D. 情感分析
4. 在机器学习中,过拟合指的是什么?A. 模型无法适应新的数据 B. 模型在训练集上表现较好,在测试集上表现较差 C. 模型无法收敛 D. 模型的准确率低
5. 以下哪个是机器学习中常用的性能评估指标?A. 准确率 B. 召回率 C. F1值 D. 所有选项都正确

二、填空题

1. 机器学习是一门研究怎样使计算机能够__________的科学。

2. 监督学习中,训练数据包括__________和__________。

3. __________是一种无监督学习算法,用于将数据分成相似的组或簇。

4. 过拟合是指模型在训练集上过度学习,导致在测试集上_____________。

5. 准确率是用来评估__________模型性能的指标。

三、简答题

1. 请简要解释机器学习中的模型训练过程。

2. 什么是特征工程?为什么它在机器学习中很重要?
3. 请解释交叉验证的概念及其作用。

4. 解释机器学习中的偏差和方差之间的关系。

5. 什么是集成学习?如何应用于机器学习中?

四、应用题

假设你是一个房地产公司的数据科学家,公司希望使用机器学习模型来预测未来一年的房屋价格。

你被要求开发一个模型,基于房屋的相关特征,帮助公司预测房屋的售价。

1. 请列举至少五个可能有用的特征,用于训练模型。

2. 你认为这是分类问题还是回归问题?为什么?
3. 你将如何评估你开发的模型的性能?
4. 请描述你将如何使用交叉验证来提高模型的泛化能力。

5. 除了单一的机器学习模型,你可以考虑使用哪些集成学习方法来提高预测性能?

答案:

一、选择题
1. B  2. A  3. C  4. B  5. D

二、填空题
1. 自动学习  2. 特征、标签  3. 聚类分析  4. 表现较差  5. 分类器

三、简答题
1. 模型训练过程包括选择合适的算法和模型结构、准备训练数据、使用训练数据对模型进行训练、评估模型性能以及根据评估结果调整模型参数。

2023年6月机器学习考试题及答案


考试题目

1. 什么是机器学习?
2. 请简要说明监督学习和无监督学习的区别。

3. 什么是过拟合?如何避免过拟合?
4. 请解释什么是决策树,并列举一些常用的决策树算法。

5. 什么是集成学习?列举两种常见的集成学习方法。

6. 请解释支持向量机(SVM)的工作原理。

7. 什么是深度学习?列举两个常用的深度学习模型。

8. 请简要介绍一下主成分分析(PCA)的原理和应用领域。

9. 什么是聚类分析?请列举一个常用的聚类算法。

10. 请说明机器学习中的特征选择方法。

答案

1. 机器学习是一种人工智能的分支,旨在通过使用算法和统计模型,使计算机能够从数据中学习和改进,而无需明确编程。

它涉及让计算机从经验中自动学习,并利用学习到的知识来进行决策和预测。

3. 过拟合指模型在训练集上表现很好,但在新数据上表现较差的现象。

为了避免过拟合,可以采用以下方法:- 使用正则化技术,如L1正则化和L2正则化,限制模型的复杂度。

- 进行特征选择,排除一些对模型泛化能力影响较大的特征。
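
上面提到的L1/L2正则化,在常用库中分别对应Lasso和Ridge等模型,下面是一个简单示意(数据与正则化系数均为演示假设):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = X[:, 0] * 3.0 + rng.normal(scale=0.1, size=100)  # 只有第1个特征真正有用

ridge = Ridge(alpha=1.0).fit(X, y)   # L2正则化:压缩权重但一般不会压成0
lasso = Lasso(alpha=0.1).fit(X, y)   # L1正则化:会把无关特征的权重压成0
print("Ridge非零系数个数:", int(np.sum(np.abs(ridge.coef_) > 1e-3)))
print("Lasso非零系数个数:", int(np.sum(np.abs(lasso.coef_) > 1e-3)))
```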

4. 决策树是一种基于树结构的分类和回归模型,它代表了对数据进行决策的过程。

常见的决策树算法包括ID3、C4.5和CART。

5. 集成学习是一种使用多个学习器进行组合预测的方法。

常见的集成学习方法包括随机森林和梯度提升树。

6. 支持向量机(SVM)是一种二分类模型,其工作原理是将数据映射到高维空间,在高维空间中找到一个最优超平面来分割不同类别的数据点。

7. 深度学习是一种基于神经网络的机器学习方法,它通过多层次的非线性变换来学习和表示数据。

常见的深度学习模型包括卷积神经网络(CNN)和循环神经网络(RNN)。

8. 主成分分析(PCA)是一种常用的降维技术,它通过线性变换将原始数据映射到低维空间,保留数据集中的主要特征。

主成分分析在数据预处理、图像处理和模式识别等领域有广泛的应用。
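
例如,用scikit-learn把数据降到2维的一个最小示意:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)        # 4维特征
pca = PCA(n_components=2).fit(X)         # 线性变换到2个主成分
X_2d = pca.transform(X)
print(X_2d.shape)                        # (150, 2)
print(pca.explained_variance_ratio_)     # 各主成分保留的方差比例
```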

9. 聚类分析是一种将数据点划分为不同组别或类别的无监督学习方法。

机器学习考试试题


一、选择题(每题 3 分,共 30 分)

1、以下哪种情况不属于机器学习的应用场景?() A 图像识别 B 自然语言处理 C 传统的数值计算 D 预测股票价格
2、在监督学习中,如果预测值与真实值之间的差异较大,通常使用以下哪种方法来衡量模型的性能?() A 准确率 B 召回率 C 均方误差 D F1 值
3、下列哪种算法不是聚类算法?() A KMeans B 决策树 C 层次聚类 D 密度聚类
4、对于一个过拟合的模型,以下哪种方法可以缓解?() A 增加训练数据量 B 减少模型的复杂度 C 增加正则化项 D 以上都是
5、以下关于特征工程的描述,错误的是?() A 特征工程是将原始数据转换为更有意义和有用的特征的过程 B 特征选择是特征工程的一部分 C 特征工程对于机器学习模型的性能影响不大 D 特征缩放可以提高模型的训练效率
6、在深度学习中,以下哪个不是常见的激活函数?() A Sigmoid 函数 B ReLU 函数 C Tanh 函数 D Logistic 函数
7、支持向量机(SVM)主要用于解决什么问题?() A 回归问题 B 分类问题 C 聚类问题 D 降维问题
8、以下哪种优化算法常用于神经网络的训练?() A 随机梯度下降(SGD) B 牛顿法 C 共轭梯度法 D 以上都是
9、下面关于集成学习的说法,错误的是?() A 随机森林是一种集成学习算法 B 集成学习可以提高模型的稳定性和泛化能力 C 集成学习中的个体学习器必须是同一种类型的模型 D 集成学习通过组合多个弱学习器来构建一个强学习器
10、对于一个二分类问题,若混淆矩阵如下:

|          | 预测正例 | 预测反例 |
|----------|----------|----------|
| 实际正例 | 80       | 20       |
| 实际反例 | 10       | 90       |

则该模型的准确率是多少?() A 80% B 90% C 70% D 85%

二、填空题(每题 3 分,共 30 分)

1、机器学习中的有监督学习包括________、________和________等任务。

2、常见的无监督学习算法有________、________和________。

哈工大深圳机器学习复习3


1. What is Machine Learning
A computer program can improve its performance automatically with experience.

2. Learning Definition
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience.
Learning: improving with experience at some task
- Improve over task T
- With respect to performance measure P
- Based on experience E

Example: A checkers learning problem:
T: play checkers
P: percentage of games won in a tournament(锦标赛)
E: opportunity to play against itself

Handwriting recognition learning problem:
T: recognizing and classifying handwritten words within images
P: percent of words correctly classified
E: a database of handwritten words with given classifications

A robot driving learning problem:
T: driving on public four-lane highways using vision sensors
P: average distance traveled before an error (as judged by human overseer)
E: a sequence of images and steering commands recorded while observing a human driver

3. Candidate-Elimination Learning Algorithm
If d is a negative example(先处理S):
- Remove from S any hypothesis that is inconsistent with d
- For each hypothesis g in G that is not consistent with d:
  - remove g from G
  - Add to G all minimal specializations h of g such that h is consistent with d and some member of S is more specific than h
- Remove from G any hypothesis that is less general than another hypothesis in G

Inductive bias of candidate-elimination algorithm: {c ∈ H}
Sky: Sunny, Cloudy, Rainy
AirTemp: Warm, Cold
Humidity: Normal, High
Wind: Strong, Weak
Water: Warm, Cold
Forecast: Same, Change
#distinct instances: 3*2*2*2*2*2 = 96
#distinct concepts: 2^96
#syntactically(语法)distinct hypotheses: 5*4*4*4*4*4 = 5120
#semantically(语义)distinct hypotheses: 1 + 4*3*3*3*3*3 = 973
Example: Candidate Elimination

4. Information Gain is the expected reduction in entropy caused by partitioning the examples according to this attribute.

5. Gradient Descent

Two hypotheses: the patient has cancer (cancer), or the patient does not (¬cancer).
Laboratory test with two possible outcomes: + and -.
Prior knowledge: only 0.008 of the population have this disease.
The test returns a correct positive result in only 98% of the cases in which the disease is present.
The test returns a correct negative result in only 97% of the cases in which the disease is not present.
In summary:
P(cancer)=0.008, P(¬cancer)=0.992
P(+|cancer)=0.98, P(-|cancer)=0.02
P(+|¬cancer)=0.03, P(-|¬cancer)=0.97
Problem: a patient for whom the test returns a positive result — does the patient have cancer or not? The MAP hypothesis can be found using Equation (6.2):
P(+|cancer)P(cancer)=0.0078
P(+|¬cancer)P(¬cancer)=0.0298
hMAP = ¬cancer
P(cancer|+) = 0.0078/(0.0078+0.0298) = 0.21
P(¬cancer|+) = 0.79
The result of Bayesian inference depends strongly on the prior probabilities. In this example the hypotheses are not completely accepted or rejected, but rather become more or less probable as more data is observed.

7. EM

GA(Fitness, Fitness_threshold, p, r, m)
- Initialize population: P = generate p hypotheses at random
- Evaluate: for each h in P, compute Fitness(h)
- While max Fitness(h) < Fitness_threshold, create a new generation PS:
  1. Select: probabilistically select (1-r)p members of P to add to PS. The probability Pr(hi) of selecting hypothesis hi from P is Pr(hi) = Fitness(hi) / ∑j Fitness(hj)
  2. Crossover: probabilistically select r*p/2 pairs of hypotheses from P, according to Pr(hi) given above. For each pair <h1, h2>, produce two offspring by applying the crossover operator. Add all offspring to PS.
  3. Mutate: choose m percent of the members of PS with uniform probability. For each, invert one randomly selected bit in its representation.
  4. Update: P = PS
  5. Evaluate: for each h in P, compute Fitness(h)
- Return the hypothesis from P that has the highest fitness.

9. Fitness Function and Selection
The fitness function defines the criterion for ranking potential hypotheses. If the task is to learn classification rules, then the fitness function typically has a component that scores the classification accuracy (complexity or generality) of the rule over a set of provided training examples.
Probability methods for selecting a hypothesis:
- Fitness proportionate selection (roulette wheel selection): the probability of selecting a hypothesis is given by the ratio of its fitness to the fitness of the other members of the current population
- Tournament selection
- Rank selection

机器学习笔试题汇总


文章目录

树

1、在以下集成学习模型的调参中,哪个算法没有用到学习率learning rate?B
A. XGboost B. 随机森林Random Forest C. LightGBM D. Adaboost
分析:其他三个都是基于梯度的算法,有梯度基本都有学习率,详细的可以去看看他们的更新公式。

2、在集成学习两大类策略中,boosting和bagging如何影响模型的偏差(bias)和方差(variance)?C
A. boosting和bagging均使得方差减小 B. boosting和bagging均使得偏差减小 C. boosting使得偏差减小,bagging使得方差减小 D. boosting使得方差减小,bagging使得偏差减小

3、梯度提升决策树(GBDT)是在工业界和竞赛里最常用的模型之一,Xgboost和Lightgbm均是改进版的GBDT模型。

关于调整参数缓解过拟合,以下说法正确的是:C
1、增大正则化参数 2、减小树数量tree numbers 3、减小子采样比例subsample 4、增大树深度max_depth
A. 1、2、3 B. 1、2、4 C. 1、3、4 D. 2、3、4
分析:树越多越不会过拟合;树的深度,越深代表模型越复杂,越容易过拟合;减小子采样比例subsample,类似神经网络里面的dropout,能缓解过拟合。
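
题目中的正则化参数、子采样比例、树深度等,在常见实现里对应类似下面的超参数(这里以sklearn的GradientBoostingClassifier为例,取值仅为演示假设):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# 通过较浅的树、子采样和较小的学习率来缓解过拟合
gbdt = GradientBoostingClassifier(
    n_estimators=200,     # 树的数量
    learning_rate=0.05,   # 学习率(boosting的步长)
    max_depth=3,          # 树深度:越深越容易过拟合
    subsample=0.8,        # 子采样比例:小于1时起到类似dropout的作用
    random_state=0,
)
print(cross_val_score(gbdt, X, y, cv=5).mean())
```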

2叉和3叉的区别
1、稳定性不一样,二叉树鲁棒性更强
2、3叉高阶组合少了,二叉树表达能力更强
3、男女这种特征做三叉树不好做

xgboost相对于GBDT的改进?LightGBM相对于xgboost的改进?

特征工程

1、特征选择(Feature selection)对于机器学习任务是十分重要的,是解决维度灾难的有效方法。

以下关于特征选择算法的说法不正确的是?D
A. 过滤式方法(Filter)通常是独立地选择特征,这可能会忽略特征组合之间的相关性。

B. 封装式方法(Wrapper)可能所选特征效果好,但是时间复杂度通常非常高。

哈工程计算机试题


哈尔滨工程大学计算机考研笔记与真题

一 填空(每空一分,共14分)

1 数据元素是数据结构的基本单位,数据项是数据的不可分割的最小单位。

2 深度是k的完全二叉树至少有2^(k-1)个结点,至多有2^k-1个结点。

3 哈希表的查找效率主要取决于造表时选取的哈希函数和处理冲突的方法。

4 对100个记录进行折半查找,最多比较7次,最少比较1次。

5 有n个顶点的无向图,最少有0条边,最多有n(n-1)/2条边。

6 AOE网中,从源点到汇点的最长路径上的活动叫做关键活动。

有环的图不能进行拓扑排序。

7 对于堆排序,常用的建堆算法是筛选法,堆的形状是一棵完全二叉树。

二 判断题(每小题1分,共5分)

1 线性表的链式存储结构优于顺序存储结构。

错
2 链表的每个节点中都只包含一个指针。

错,例如双向链表
3 栈和队列都是顺序存储结构的线性结构。

错,链栈
4 若树的度为2,则该树为二叉树。

错
5 若广义表中的每个元素都是原子,则广义表为线性表。

对

三 问答题(每小题4分,共16分)

1 一棵3阶4层(根为第一层,叶子为第四层)的B-树,至少有多少个关键字,至多有多少个关键字?答:7个 26个
2 利用栈求表达式((A-B)-C)-(D-(E-F))的值,运算符栈和操作数栈各必须具有多少项?答:5项 4项
3 以行序为主序存储10阶对称矩阵A,采用下三角的压缩存储方式,若起始地址是d,则A[8][5]的存储地址是多少?答:32+d
4 设哈希表中已存在五个记录(如图一所示)。

哈希函数为H(K)=K MOD 11,用二次探测再散列处理冲突。

请问关键字为94的记录的存储地址是多少?

图一:

| 地址   | 0 | 1  | 2 | 3 | 4 | 5  | 6  | 7  | 8 | 9 | 10 |
|--------|---|----|---|---|---|----|----|----|---|---|----|
| 关键字 |   | 45 |   |   |   | 16 | 39 | 62 |   |   | 76 |

答:存储地址是 2

四 综合题(每小题5分,共35分)

1 给定一组权值{9,6,14,17,2,15,3,16},请构造哈夫曼树,并计算其带权路径长度。

答:带权路径长度186
2 已知二叉树的先序遍历的结果为ABCDEFGHIJ,中序遍历的结果为CBEDAHGIJF,请画出这棵二叉树。

机器学习期末考试试题


# 机器学习期末考试试题

## 一、选择题(每题2分,共20分)

1. 机器学习中的监督学习主要解决的问题类型是:
- A. 回归问题
- B. 分类问题
- C. 聚类问题
- D. 以上都是

2. 下列哪个算法不是用于分类的:
- A. 决策树
- B. 支持向量机
- C. K-means
- D. 逻辑回归

3. 在神经网络中,激活函数的作用是:
- A. 增加计算复杂度
- B. 引入非线性
- C. 减少训练时间
- D. 降低模型的泛化能力

4. 交叉验证的主要目的是:
- A. 加速模型训练
- B. 减少模型过拟合
- C. 增加数据量
- D. 减少计算资源消耗

5. 下列哪个不是深度学习模型:
- A. 卷积神经网络(CNN)
- B. 循环神经网络(RNN)
- C. 随机森林
- D. 长短期记忆网络(LSTM)

## 二、简答题(每题10分,共30分)

1. 请简述机器学习中的过拟合现象及其可能的解决方案。

2. 解释什么是特征工程,并说明其在机器学习中的重要性。

3. 描述一下什么是模型的泛化能力,并举例说明如何评估一个模型的泛化能力。

## 三、计算题(每题15分,共30分)

1. 给定一个线性回归模型 \( y = \beta_0 + \beta_1 x_1 + \epsilon \),其中 \( \epsilon \) 服从均值为0的正态分布。

假设我们有以下数据点:
- \( x_1 = [1, 2, 3, 4, 5] \)
- \( y = [2, 4, 5, 4, 5] \)

请计算最小二乘法估计的参数 \( \beta_0 \) 和 \( \beta_1 \)。
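
一种验证方式:按最小二乘公式 \( \beta_1 = \frac{\sum(x_i-\bar{x})(y_i-\bar{y})}{\sum(x_i-\bar{x})^2} \),\( \beta_0 = \bar{y} - \beta_1\bar{x} \) 直接计算,下面的numpy片段即按此实现:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# 按最小二乘的闭式解计算斜率和截距
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()
print(beta1, beta0)   # 0.6 2.2
```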

2. 假设有一个简单的二分类问题,我们使用逻辑回归模型进行分类。

给定以下数据点和对应的标签:
- 特征:\( [x_1, x_2] = [[2, 1], [3, 0], [1, 1], [4, 1]] \)
- 标签:\( y = [1, 0, 1, 0] \)

请写出逻辑回归的假设函数 \( h(x) \),并计算使用梯度下降法更新参数的一次迭代过程。
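
针对上题,下面用numpy示意假设函数 \( h(x)=\sigma(w \cdot x + b) \) 以及一次批量梯度下降更新(初始参数与学习率均为假设值):

```python
import numpy as np

X = np.array([[2, 1], [3, 0], [1, 1], [4, 1]], dtype=float)
y = np.array([1, 0, 1, 0], dtype=float)

w = np.zeros(2)   # 初始权重(假设从0开始)
b = 0.0
lr = 0.1          # 学习率(假设值)

def h(X, w, b):
    """逻辑回归的假设函数 h(x) = sigmoid(w·x + b)。"""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

# 一次批量梯度下降:对交叉熵损失,梯度为 X^T (h - y) / m
m = len(y)
pred = h(X, w, b)
grad_w = X.T @ (pred - y) / m
grad_b = np.mean(pred - y)
w -= lr * grad_w
b -= lr * grad_b
print(w, b)   # 更新一次后的参数
```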

哈工大机器学习历年考试


1 Give the definitions or your comprehensions of the following terms. (12')
1.1 The inductive learning hypothesis P17
1.2 Overfitting P49
1.4 Consistent learner P148

2 Give brief answers to the following questions. (15')
2.2 If the size of a version space is \( |VS| \), in general what is the smallest number of queries that may be required by a concept learner using the optimal query strategy to perfectly learn the target concept? P27
2.3 In general, decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances; then what expression does the following decision tree correspond to?

3 Give the explanation of inductive bias, and list the inductive bias of the CANDIDATE-ELIMINATION algorithm, decision tree learning (ID3), and the BACKPROPAGATION algorithm. (10')

4 How to solve overfitting in decision tree and neural network? (10')
Solution:
● Decision tree:
◆ 及早停止树增长(stop growing earlier)
◆ 后修剪法(post-pruning)
● Neural Network:
◆ 权值衰减(weight decay)
◆ 验证数据集(validation set)

5 Prove that the LMS weight update rule \( \omega_i \leftarrow \omega_i + \eta (V_{train}(b) - \hat{V}(b)) x_i \) performs a gradient descent to minimize the squared error. In particular, define the squared error E as in the text. Now calculate the derivative of E with respect to the weight \( \omega_i \), assuming that \( \hat{V}(b) \) is a linear function as defined in the text. Gradient descent is achieved by updating each weight in proportion to \( -\frac{\partial E}{\partial \omega_i} \). Therefore, you must show that the LMS training rule alters weights in this proportion for each training example it encounters. \( E \equiv \sum_{\langle b, V_{train}(b) \rangle \in \text{training examples}} (V_{train}(b) - \hat{V}(b))^2 \) (8')
Solution: As \( V_{train}(b) \leftarrow \hat{V}(Successor(b)) \), we can get \( E = \sum (V_{train}(b) - \hat{V}(b))^2 \), so for each training example \( \frac{\partial E}{\partial \omega_i} = -2 (V_{train}(b) - \hat{V}(b)) x_i \). As mentioned in LMS, \( \omega_i \leftarrow \omega_i + \eta (V_{train}(b) - \hat{V}(b)) x_i \), i.e. \( \omega_i \leftarrow \omega_i + \frac{\eta}{2}\left(-\frac{\partial E}{\partial \omega_i}\right) \). Therefore, gradient descent is achieved by updating each weight in proportion to \( -\frac{\partial E}{\partial \omega_i} \); the LMS rule alters weights in this proportion for each training example it encounters.

6 True or false: if decision tree D2 is an elaboration of tree D1, then D1 is more-general-than D2. Assume D1 and D2 are decision trees representing arbitrary boolean functions, and that D2 is an elaboration of D1 if ID3 could extend D1 to D2. If true give a proof; if false, a counter example. (Definition: let \( h_j \) and \( h_k \) be boolean-valued functions defined over X; then \( h_j \) is more_general_than_or_equal_to \( h_k \) (written \( h_j \ge_g h_k \)) if and only if \( (\forall x \in X)[(h_k(x)=1) \rightarrow (h_j(x)=1)] \), and \( h_j >_g h_k \Leftrightarrow (h_j \ge_g h_k) \wedge \neg(h_k \ge_g h_j) \).) (10')
The hypothesis is false. One counter example is A XOR B: if A != B, training examples are all positive, while if A == B, training examples are all negative; then, using ID3 to extend D1, the new tree D2 will be equivalent to D1, i.e., D2 is equal to D1.

7 Design a two-input perceptron that implements the boolean function \( A \wedge \neg B \). Design a two-layer network of perceptrons that implements A XOR B. (10')

8 Suppose that a hypothesis space contains three hypotheses \( h_1, h_2, h_3 \), and the posterior probabilities of these hypotheses given the training data are 0.4, 0.3 and 0.3 respectively. If a new instance x is encountered, which is classified positive by \( h_1 \) but negative by \( h_2 \) and \( h_3 \), then give the result and detailed classification course of the Bayes optimal classifier. (10') P125

9 Suppose S is a collection of training-example days described by attributes including Humidity, which can have the values High or Normal. Assume S is a collection containing 10 examples, [7+,3-]. Of these 10 examples, suppose 3 of the positive and 2 of the negative examples have Humidity = High, and the remainder have Humidity = Normal. Please calculate the information gain due to sorting the original 10 examples by the attribute Humidity. (log2 1=0, log2 2=1, log2 3=1.58, log2 4=2, log2 5=2.32, log2 6=2.58, log2 7=2.8, log2 8=3, log2 9=3.16, log2 10=3.32) (5')
Solution:
(a) Here we denote S=[7+,3-], then \( Entropy([7+,3-]) = -\frac{7}{10}\log_2\frac{7}{10} - \frac{3}{10}\log_2\frac{3}{10} = 0.886 \)
(b) \( Gain(S, Humidity) = Entropy(S) - \sum_{v \in Values(Humidity)} \frac{|S_v|}{|S|} Entropy(S_v) \), Values(Humidity) = {High, Normal}, \( S_{High} = \{ s \in S \mid Humidity(s)=High \} \), \( |S_{High}| = 5 \), \( Entropy(S_{High}) = -\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5} = 0.972 \); \( |S_{Normal}| = 5 \), \( Entropy(S_{Normal}) = -\frac{4}{5}\log_2\frac{4}{5} - \frac{1}{5}\log_2\frac{1}{5} = 0.72 \).
Thus \( Gain(S, Humidity) = 0.886 - (\frac{5}{10} \times 0.972 + \frac{5}{10} \times 0.72) = 0.04 \)

10 Finish the following algorithm. (10')
(1) GRADIENT-DESCENT(training_examples, η)
Each training example is a pair of the form \( \langle \vec{x}, t \rangle \), where \( \vec{x} \) is the vector of input values, and t is the target output value. η is the learning rate (e.g., 0.05).
● Initialize each \( \omega_i \) to some small random value
● Until the termination condition is met, Do
● Initialize each \( \Delta\omega_i \) to zero.
● For each \( \langle \vec{x}, t \rangle \) in training_examples, Do
● Input the instance \( \vec{x} \) to the unit and compute the output o
● For each linear unit weight \( \omega_i \), Do ______
● For each linear unit weight \( \omega_i \), Do ______
(2) FIND-S Algorithm
● Initialize h to the most specific hypothesis in H
● For each positive training instance x
● For each attribute constraint \( a_i \) in h
If ______ Then do nothing
Else replace \( a_i \) in h by the next more general constraint that is satisfied by x
● Output hypothesis h

1. What is the definition of learning problem? (5) Use "a checkers learning problem" as an example to state how to design a learning system. (15)
Answer: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience. (5)
Example: A checkers learning problem:
T: play checkers (1)
P: percentage of games won in a tournament (1)
E: opportunity to play against itself (1)
To design a learning system:
Step 1: Choosing the Training Experience (4)
A checkers learning problem: Task T: playing checkers; Performance measure P: percent of games won in the world tournament; Training experience E: games played against itself.
In order to complete the design of the learning system, we must now choose
1. the exact type of knowledge to be learned
2. a representation for this target knowledge
3. a learning mechanism
Step 2: Choosing the Target Function (4)
1. if b is a final board state that is won, then V(b)=100
2. if b is a final board state that is lost, then V(b)=-100
3. if b is a final board state that is drawn, then V(b)=0
4. if b is not a final state in the game, then V(b)=V(b'), where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game (assuming the opponent plays optimally, as well).
Step 3: Choosing a Representation for the Target Function (4)
x1: the number of black pieces on the board
x2: the number of red pieces on the board
x3: the number of black kings on the board
x4: the number of red kings on the board
x5: the number of black pieces threatened by red (i.e., which can be captured on red's next turn)
x6: the number of red pieces threatened by black.
Thus, our learning program will represent V(b) as a linear function of the form \( V(b) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + w_5 x_5 + w_6 x_6 \), where \( w_0 \) through \( w_6 \) are numerical coefficients, or weights, to be chosen by the learning algorithm.
Learned values for the weights \( w_1 \) through \( w_6 \) will determine the relative importance of the various board features in determining the value of the board, whereas the weight \( w_0 \) will provide an additive constant to the board value.

2. Answer: Find-S & Find-G:
Step 1: Initialize S to the most specific hypothesis in H. (1)
S0: {φ, φ, φ, φ, φ, φ}
Initialize G to the most general hypothesis in H.
G0: {?, ?, ?, ?, ?, ?}
Step 2: The first example is {<Sunny, Warm, Normal, Strong, Warm, Same, +>} (3)
S1: {Sunny, Warm, Normal, Strong, Warm, Same}
G1: {?, ?, ?, ?, ?, ?}
Step 3: The second example is {<Sunny, Warm, High, Strong, Warm, Same, +>} (3)
S2: {Sunny, Warm, ?, Strong, Warm, Same}
G2: {?, ?, ?, ?, ?, ?}
Step 4: The third example is {<Rainy, Cold, High, Strong, Warm, Change, ->} (3)
S3: {Sunny, Warm, ?, Strong, Warm, Same}
G3: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}
Step 5: The fourth example is {<Sunny, Warm, High, Strong, Cool, Change, +>} (3)
S4: {Sunny, Warm, ?, Strong, ?, ?}
G4: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
Finally, all the hypotheses are: (2)
{<Sunny, Warm, ?, Strong, ?, ?>, <Sunny, ?, ?, Strong, ?, ?>, <Sunny, Warm, ?, ?, ?, ?>, <?, Warm, ?, Strong, ?, ?>, <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}

3. Answer:
flog(X) = -X*log2(X) - (1-X)*log2(1-X)
STEP 1: choose the root node:
entropy_all = flog(4/10) = 0.971 (2)
gain_outlook = entropy_all - 0.3*flog(1/3) - 0.3*flog(1) - 0.4*flog(1/2) = 0.296 (1)
gain_temperature = entropy_all - 0.3*flog(1/3) - 0.3*flog(1/3) - 0.4*flog(1/2) = 0.02 (1)
gain_humidity = entropy_all - 0.5*flog(2/5) - 0.5*flog(1/5) = 0.125 (1)
gain_wind = entropy_all - 0.6*flog(5/6) - 0.4*flog(1/4) = 0.256 (1)
Root node is "outlook". (2)
STEP 2: choose the second node:
for sunny (humidity OR temperature):
entropy_sunny = flog(1/3) = 0.918 (1)
sunny_gain_wind = entropy_sunny - (2/3)*flog(0.5) - (1/3)*flog(1) = 0.252 (1)
sunny_gain_humidity = entropy_sunny - (2/3)*flog(1) - (1/3)*flog(1) = 0.918 (1)
sunny_gain_temperature = entropy_sunny - (2/3)*flog(1) - (1/3)*flog(1) = 0.918 (1)
choose humidity or temperature. (1)
for rain (wind):
entropy_rain = flog(1/2) = 1 (1)
rain_gain_wind = entropy_rain - (1/2)*flog(1) - (1/2)*flog(1) = 1 (1)
rain_gain_humidity = entropy_rain - (1/2)*flog(1/2) - (1/2)*flog(1/2) = 0 (1)
rain_gain_temperature = entropy_rain - (1/4)*flog(1) - (3/4)*flog(1/3) = 0.311 (1)
choose wind. (1) (2)

4. Answer:
A: The primitive neural units are: perceptron, linear unit and sigmoid unit. (3)
Perceptron: (2) A perceptron takes a vector of real-valued inputs, calculates a linear combination of these inputs, then outputs a 1 if the result is greater than some threshold and -1 otherwise. More precisely, given inputs \( x_1 \) through \( x_n \), the output \( o(x_1, \ldots, x_n) \) computed by the perceptron is 1 if \( w_0 + w_1 x_1 + \cdots + w_n x_n > 0 \) and -1 otherwise. Sometimes we write the perceptron function as \( o(\vec{x}) = \mathrm{sgn}(\vec{w} \cdot \vec{x}) \).
Linear units: (2) A linear unit is one for which the output o is given by \( o = \vec{w} \cdot \vec{x} \). Thus, a linear unit corresponds to the first stage of a perceptron, without the threshold.
Sigmoid units: (2) The sigmoid unit is illustrated in the picture; like the perceptron, the sigmoid unit first computes a linear combination of its inputs, then applies a threshold to the result. In the case of the sigmoid unit, however, the threshold output is a continuous function of its input. More precisely, the sigmoid unit computes its output o as \( o = \sigma(\vec{w} \cdot \vec{x}) \), where \( \sigma(y) = \frac{1}{1+e^{-y}} \).
B: (因题目有打印错误,所以感知器规则和delta规则均可,给出的是delta规则) Derivation process is: (6) 感知器规则(perceptron learning rule)

5. Answer:
P(no)=5/14, P(yes)=9/14 (1)
P(sunny|no)=3/5 (1)
P(cool|no)=1/5 (1)
P(high|no)=4/5 (1)
P(strong|no)=3/5 (1)
P(no|new instance) = P(no)*P(sunny|no)*P(cool|no)*P(high|no)*P(strong|no) = 5/14*3/5*1/5*4/5*3/5 = 2.057*10^-2 (2)
P(sunny|yes)=2/9 (1)
P(cool|yes)=3/9 (1)
P(high|yes)=3/9 (1)
P(strong|yes)=3/9 (1)
P(yes|new instance) = P(yes)*P(sunny|yes)*P(cool|yes)*P(high|yes)*P(strong|yes) = 9/14*2/9*3/9*3/9*3/9 = 5.291*10^-3 (2)
ANSWER: NO (2)

6. Answer:
INDUCTIVE BIAS: (8) Consider a concept learning algorithm L for the set of instances X. Let c be an arbitrary concept defined over X, and let \( D_c = \{\langle x, c(x) \rangle\} \) be an arbitrary set of training examples of c. Let \( L(x_i, D_c) \) denote the classification assigned to the instance \( x_i \) by L after training on the data \( D_c \). The inductive bias of L is any minimal set of assertions B such that for any target concept c and corresponding training examples \( D_c \): \( (\forall x_i \in X)[(B \wedge x_i \wedge D_c) \vdash L(x_i, D_c)] \)
The futility of bias-free learning: (7) A learner that makes no a priori assumptions regarding the identity of the target concept has no rational basis for classifying any unseen instances. In fact, the only reason that the learner was able to generalize beyond the observed training examples is that it was biased by the inductive bias. Unfortunately, the only instances that will produce a unanimous vote are the previously observed training examples. For all the other instances, taking a vote will be futile: each unobserved instance will be classified positive by precisely half the hypotheses in the version space and will be classified negative by the other half.

1 In the EnjoySport learning task, every example day is represented by 6 attributes. Given that attribute Sky has three possible values, and that AirTemp, Humidity, Wind, Water and Forecast each have two possible values, explain why the size of the hypothesis space is 973. How would the number of possible instances and possible hypotheses increase with the addition of one attribute A that takes on K possible values?

2 Write the algorithm of Candidate_Elimination using version space. Assume G is the set of maximally general hypotheses in hypothesis space H, and S is the set of maximally specific hypotheses.

3 Consider the following set of training examples for EnjoySport:

| Example | Sky | AirTemp | Humidity | Wind | Water | Forecast | EnjoySport |
|---|---|---|---|---|---|---|---|
| 1 | sunny | warm | normal | strong | warm | same | yes |
| 2 | sunny | warm | high | strong | warm | same | yes |
| 3 | rainy | cold | high | strong | warm | change | no |
| 4 | sunny | warm | high | strong | cool | change | yes |
| 5 | sunny | warm | normal | weak | warm | same | no |

(a) What is the Entropy of the collection of training examples with respect to the target function classification?
(b) According to the 5 training examples, compute the decision tree that would be learned by ID3, and show the decision tree. (log2 3=1.585, log2 5=2.322)

4 Give several approaches to avoid overfitting in decision tree learning. How to determine the correct final tree size?

5 Write the BackPropagation algorithm for a feedforward network containing two layers of sigmoid units.

6 Explain the Maximum a posteriori (MAP) hypothesis.
7 Using Naive Bayes Classifier to classify the new instance: <Outlook=sunny, Temperature=cool, Humidity=high, Wind=strong>. Our task is to predict the target value (yes or no) of the target concept PlayTennis for this new instance. The table below provides a set of 14 training examples (D7–D14 shown):

| Day | Outlook | Temperature | Humidity | Wind | PlayTennis |
|---|---|---|---|---|---|
| D7 | Overcast | Cool | Normal | Strong | Yes |
| D8 | Sunny | Mild | High | Weak | No |
| D9 | Sunny | Cool | Normal | Weak | Yes |
| D10 | Rain | Mild | Normal | Weak | Yes |
| D11 | Sunny | Mild | Normal | Strong | Yes |
| D12 | Overcast | Mild | High | Strong | Yes |
| D13 | Overcast | Hot | Normal | Weak | Yes |
| D14 | Rain | Mild | High | Strong | No |

8 Question Eight: The definition of three types of fitness functions in genetic algorithm.

Question one: (举一个例子,比如:导航仪、西洋跳棋)
Question two:
Initialize: G={?,?,?,?,?,?}, S={∅,∅,∅,∅,∅,∅}
Step 1: G={?,?,?,?,?,?}, S={sunny, warm, normal, strong, warm, same}
Step 2: coming one positive instance 2: G={?,?,?,?,?,?}, S={sunny, warm, ?, strong, warm, same}
Step 3: coming one negative instance 3: G={<Sunny,?,?,?,?,?>, <?,warm,?,?,?,?>, <?,?,?,?,?,same>}, S={sunny, warm, ?, strong, warm, same}
Step 4: coming one positive instance 4: S={sunny, warm, ?, strong, ?, ?}, G={<Sunny,?,?,?,?,?>, <?,warm,?,?,?,?>}
Question three:
(a) \( Entropy(S) = -\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5} = 0.971 \)
(b) Gain(S, Sky) = Entropy(S) - [(4/5) Entropy(S_sunny) + (1/5) Entropy(S_rainy)] = 0.322
Gain(S, AirTemp) = Gain(S, Wind) = Gain(S, Sky) = 0.322
Gain(S, Humidity) = Gain(S, Forecast) = 0.02
Gain(S, Water) = 0.171
Choose any feature of AirTemp, Wind and Sky as the top node. The decision tree is as follows: (if Sky is chosen as the top node)
Question Four:
Answer:
Inductive bias: some prior assumption for a target concept made by the learner to have a basis for classifying unseen instances.
Suppose L is a machine learning algorithm and x is a set of training examples. L(xi, Dc) denotes the classification assigned to xi by L after training on the examples Dc. Then the inductive bias is a minimal set of assertions B such that, given an arbitrary target concept C and set of training examples Dc: \( (\forall x_i)[(B \wedge D_c \wedge x_i) \vdash L(x_i, D_c)] \)
C-E: the target concept is contained in the given hypothesis space H, and the training examples are all positive examples.
ID3: a. small trees are preferred over larger trees; b. the trees that place high information gain attributes close to the root are preferred over those that do not.
BP: smooth interpolation between data points.
Question Five:
Answer: In naive Bayes classification, we assume that all attributes are independent given the target value, while a Bayes belief net specifies a set of conditional independence assumptions along with a set of probability distributions.
Question Six: 随机梯度下降算法
Question Seven: 朴素贝叶斯例子
Question Eight: The definition of three types of fitness functions in genetic algorithm
Answer: In order to select one hypothesis according to the fitness function, there are always three methods: roulette wheel selection, tournament selection and rank selection.
Question nine:
Single-point crossover: Crossover mask: 11111100000 or 11111000000 or 11110000000 or 00001111111
Two-point crossover: Offspring: (11001011000, 00101000101)
Uniform crossover: Crossover mask: 10011110011 or 10001110011 or 01111101100 or 10000010011
Point mutation: Any mutation is ok!

1 Solution: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
Example: (point out the T, P, E of the example) A checkers learning problem. A handwriting recognition learning problem. A robot driving learning problem. ……

2 Solution:
S0: {φ, φ, φ, φ, φ, φ}
S1: {Sunny, Warm, Normal, Strong, Warm, Same}
S2: {Sunny, Warm, ?, Strong, Warm, Same}
G0, G1, G2: {?, ?, ?, ?, ?, ?}
S3: {Sunny, Warm, ?, Strong, Warm, Same}
G3: {Sunny, ?, ?, ?, ?, ?} {?, Warm, ?, ?, ?, ?} {?, ?, ?, ?, ?, Same}
S4: {Sunny, Warm, ?, Strong, ?, ?}
G4: {Sunny, ?, ?, ?, ?, ?} {?, Warm, ?, ?, ?, ?}

3 Solution:
(a) Here we denote S=[7+,3-], then \( Entropy([7+,3-]) = -\frac{7}{10}\log_2\frac{7}{10} - \frac{3}{10}\log_2\frac{3}{10} = 0.886 \)
(b) \( Gain(S, Humidity) = Entropy(S) - \sum_{v \in Values(Humidity)} \frac{|S_v|}{|S|} Entropy(S_v) \), Values(Humidity)={High, Normal}, \( |S_{High}| = 5 \), \( Entropy(S_{High}) = -\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5} = 0.972 \), \( |S_{Normal}| = 5 \), \( Entropy(S_{Normal}) = -\frac{4}{5}\log_2\frac{4}{5} - \frac{1}{5}\log_2\frac{1}{5} = 0.72 \). Thus \( Gain(S, Humidity) = 0.886 - (\frac{5}{10} \times 0.972 + \frac{5}{10} \times 0.72) = 0.04 \)

4 In general, inductive inference: some form of prior assumptions regarding the identity of the target concept made by a learner to have a rational basis for classifying unseen instances. Formally:
CANDIDATE-ELIMINATION: the target concept c is contained in the given hypothesis space H.
Decision tree learning (ID3): shorter trees are preferred over larger trees; trees that place high information gain attributes close to the root are preferred over those that do not.
BACKPROPAGATION algorithm: smooth interpolation between data points.

5 Solution: (1) (2)

6

(3) GRADIENT-DESCENT(training_examples, η)
Each training example is a pair of the form \( \langle \vec{x}, t \rangle \), where \( \vec{x} \) is the vector of input values, and t is the target output value. η is the learning rate (e.g., 0.05).
● Initialize each \( \omega_i \) to some small random value
● Until the termination condition is met, Do
● Initialize each \( \Delta\omega_i \) to zero.
● For each \( \langle \vec{x}, t \rangle \) in training_examples, Do
● Input the instance \( \vec{x} \) to the unit and compute the output o
● For each linear unit weight \( \omega_i \), Do \( \omega_i \leftarrow \omega_i + \eta(t - o)x_i \)
a) n+1
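
As a worked illustration of question 7 in the exam above (the perceptron design question), the sketch below checks one possible set of weights numerically; the weights are an assumption, not the only valid answer.

```python
import numpy as np

def perceptron(x, w, b):
    """Threshold unit: output 1 if w·x + b > 0, else -1 (0/1 inputs assumed)."""
    return 1 if np.dot(w, x) + b > 0 else -1

# A AND (NOT B): fires only for (A, B) = (1, 0)
w_and_not = np.array([1.0, -1.0]); b_and_not = -0.5

# A XOR B as a two-layer network: OR and NAND in the first layer, AND at the output
def xor_net(a, b):
    h_or   = perceptron([a, b], np.array([1.0, 1.0]), -0.5)    # A OR B
    h_nand = perceptron([a, b], np.array([-1.0, -1.0]), 1.5)   # NOT(A AND B)
    h = [(h_or + 1) // 2, (h_nand + 1) // 2]                   # map -1/1 back to 0/1
    return perceptron(h, np.array([1.0, 1.0]), -1.5)           # AND of the two

for a in (0, 1):
    for b in (0, 1):
        print((a, b),
              "A AND NOT B ->", perceptron([a, b], w_and_not, b_and_not),
              "| A XOR B ->", xor_net(a, b))
```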

机器学习试题


一、选择题

1. 什么是机器学习?a) 一种人工智能技术 b) 一种自动编程方法 c) 一种人机交互界面 d) 一种传统数据处理方法
2. 以下哪一项不是机器学习的主要任务?a) 分类 b) 回归 c) 聚类 d) 排序
3. 机器学习算法的目标是什么?a) 最大化准确率 b) 最小化计算时间 c) 最小化学习误差 d) 最大化训练数据规模

二、判断题

1. 监督学习是一种有标签数据的学习方法。

2. 无监督学习可以在没有标签的情况下自动学习数据。

3. 决策树是一种无监督学习算法。

三、简答题

1. 请简要解释监督学习和无监督学习的区别。

2. 什么是过拟合问题?如何解决过拟合问题?
3. 请举例说明聚类算法的应用场景。

四、编程题

请使用Python编写一个简单的线性回归模型,基于给定的训练数据进行训练,并对新的数据进行预测。

提示:1. 可以使用第三方机器学习库(如scikit-learn)来实现线性回归模型。

2. 需要将数据集拆分为训练集和测试集,用于模型的训练和评估。

3. 可以使用均方误差(Mean Squared Error)作为模型评估指标。
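
下面给出该编程题的一种参考实现思路(训练数据为随机生成的演示数据,按提示使用scikit-learn并以均方误差评估):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# 生成演示用的训练数据:y ≈ 2x + 1 + 噪声
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 2 * X.ravel() + 1 + rng.normal(scale=1.0, size=200)

# 1) 拆分训练集与测试集
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 2) 训练线性回归模型
model = LinearRegression().fit(X_train, y_train)

# 3) 在测试集上用均方误差评估,并对新数据做预测
print("MSE:", mean_squared_error(y_test, model.predict(X_test)))
print("预测 x=6 时的 y:", model.predict([[6.0]])[0])
```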

五、论述题

请论述支持向量机(SVM)算法的原理和应用场景。

注意:以上题目仅作参考,具体试题内容可能会有所调整。

结语:机器学习是一门涉及统计学、计算机科学和人工智能的交叉学科,通过训练模型从数据中学习规律,并利用学到的模型进行预测和决策。

希望以上试题能够帮助您巩固机器学习的基础知识,进一步探索和应用机器学习的可能性。

祝您学习愉快!

机器学习应用考试 选择题40题 附答案


1. 机器学习的主要目标是:A. 通过数据自动发现规律和模式 B. 手动编写所有程序逻辑 C. 优化硬件性能 D. 提高网络速度 答案:A
2. 以下哪项不是机器学习的类型?A. 监督学习 B. 无监督学习 C. 半监督学习 D. 全监督学习 答案:D
3. 监督学习的主要任务是:A. 分类和回归 B. 聚类 C. 关联规则学习 D. 降维 答案:A
4. 无监督学习的主要任务是:A. 分类和回归 B. 聚类 C. 关联规则学习 D. 降维 答案:B
5. 以下哪项是监督学习的典型应用?A. 图像识别 B. 市场细分 C. 异常检测 D. 推荐系统 答案:A
6. 以下哪项是无监督学习的典型应用?A. 图像识别 B. 市场细分 C. 异常检测 D. 推荐系统 答案:B
7. 以下哪项是半监督学习的典型应用?A. 图像识别 B. 市场细分 C. 异常检测 D. 推荐系统 答案:C
8. 以下哪项是强化学习的典型应用?A. 图像识别 B. 市场细分 C. 异常检测 D. 游戏AI 答案:D
9. 以下哪项是深度学习的典型应用?A. 图像识别 B. 市场细分 C. 异常检测 D. 推荐系统 答案:A
10. 以下哪项是机器学习模型的评估指标?A. 准确率 B. 召回率 C. F1分数 D. 以上都是 答案:D
11. 以下哪项是机器学习模型的过拟合现象?A. 模型在训练数据上表现良好,但在新数据上表现不佳 B. 模型在训练数据上表现不佳,但在新数据上表现良好 C. 模型在训练数据和新数据上表现都良好 D. 模型在训练数据和新数据上表现都不佳 答案:A
12. 以下哪项是机器学习模型的欠拟合现象?A. 模型在训练数据上表现良好,但在新数据上表现不佳 B. 模型在训练数据上表现不佳,但在新数据上表现良好 C. 模型在训练数据和新数据上表现都良好 D. 模型在训练数据和新数据上表现都不佳 答案:D
13. 以下哪项是机器学习模型的正则化方法?A. L1正则化 B. L2正则化 C. dropout D. 以上都是 答案:D
14. 以下哪项是机器学习模型的特征选择方法?A. 过滤法 B. 包装法 C. 嵌入法 D. 以上都是 答案:D
15. 以下哪项是机器学习模型的特征提取方法?A. PCA B. LDA C. t-SNE D. 以上都是 答案:D
16. 以下哪项是机器学习模型的集成学习方法?A. 随机森林 B. 梯度提升机 C. 堆叠法 D. 以上都是 答案:D
17. 以下哪项是机器学习模型的交叉验证方法?A. K折交叉验证 B. 留一法交叉验证 C. 随机划分交叉验证 D. 以上都是 答案:D
18. 以下哪项是机器学习模型的超参数调优方法?A. 网格搜索 B. 随机搜索 C. 贝叶斯优化 D. 以上都是 答案:D
19. 以下哪项是机器学习模型的数据预处理方法?A. 缺失值处理 B. 异常值处理 C. 数据标准化 D. 以上都是 答案:D
20. 以下哪项是机器学习模型的特征工程方法?A. 特征选择 B. 特征提取 C. 特征构建 D. 以上都是 答案:D
21. 以下哪项是机器学习模型的模型选择方法?A. 交叉验证 B. 超参数调优 C. 模型集成 D. 以上都是 答案:D
22. 以下哪项是机器学习模型的模型解释方法?A. 特征重要性分析 B. 局部解释方法 C. 全局解释方法 D. 以上都是 答案:D
23. 以下哪项是机器学习模型的模型部署方法?A. 模型打包 B. 模型服务 C. 模型监控 D. 以上都是 答案:D
24. 以下哪项是机器学习模型的模型维护方法?A. 模型更新 B. 模型回滚 C. 模型备份 D. 以上都是 答案:D
25. 以下哪项是机器学习模型的模型评估方法?A. 准确率 B. 召回率 C. F1分数 D. 以上都是 答案:D
26. 以下哪项是机器学习模型的模型优化方法?A. 正则化 B. 特征选择 C. 超参数调优 D. 以上都是 答案:D
27. 以下哪项是机器学习模型的模型解释方法?A. 特征重要性分析 B. 局部解释方法 C. 全局解释方法 D. 以上都是 答案:D
28. 以下哪项是机器学习模型的模型部署方法?A. 模型打包 B. 模型服务 C. 模型监控 D. 以上都是 答案:D
29. 以下哪项是机器学习模型的模型维护方法?A. 模型更新 B. 模型回滚 C. 模型备份 D. 以上都是 答案:D
30. 以下哪项是机器学习模型的模型评估方法?A. 准确率 B. 召回率 C. F1分数 D. 以上都是 答案:D
31. 以下哪项是机器学习模型的模型优化方法?A. 正则化 B. 特征选择 C. 超参数调优 D. 以上都是 答案:D
32. 以下哪项是机器学习模型的模型解释方法?A. 特征重要性分析 B. 局部解释方法 C. 全局解释方法 D. 以上都是 答案:D
33. 以下哪项是机器学习模型的模型部署方法?A. 模型打包 B. 模型服务 C. 模型监控 D. 以上都是 答案:D
34. 以下哪项是机器学习模型的模型维护方法?A. 模型更新 B. 模型回滚 C. 模型备份 D. 以上都是 答案:D
35. 以下哪项是机器学习模型的模型评估方法?A. 准确率 B. 召回率 C. F1分数 D. 以上都是 答案:D
36. 以下哪项是机器学习模型的模型优化方法?A. 正则化 B. 特征选择 C. 超参数调优 D. 以上都是 答案:D
37. 以下哪项是机器学习模型的模型解释方法?A. 特征重要性分析 B. 局部解释方法 C. 全局解释方法 D. 以上都是 答案:D
38. 以下哪项是机器学习模型的模型部署方法?A. 模型打包 B. 模型服务 C. 模型监控 D. 以上都是 答案:D
39. 以下哪项是机器学习模型的模型维护方法?A. 模型更新 B. 模型回滚 C. 模型备份 D. 以上都是 答案:D
40. 以下哪项是机器学习模型的模型评估方法?A. 准确率 B. 召回率 C. F1分数 D. 以上都是 答案:D

答案汇总:1. A  2. D  3. A  4. B  5. A  6. B  7. C  8. D  9. A  10. D  11. A  12. D  13–40. D

机器学习课程期末考试试题


### 机器学习课程期末考试试题

一、选择题(每题2分,共20分)

1. 在机器学习中,通常所说的"过拟合"是指:
- A. 模型在训练集上表现很好,但在测试集上表现较差
- B. 模型在训练集上表现较差
- C. 模型在训练集和测试集上表现都很差
- D. 模型在训练集上表现一般,但在测试集上表现很好

2. 支持向量机(SVM)的核心思想是:
- A. 找到最佳拟合线
- B. 找到最佳拟合平面
- C. 在特征空间中找到最优的决策边界
- D. 在数据空间中找到最优的决策边界

3. 以下哪个算法是用于聚类分析的?
- A. 逻辑回归
- B. 决策树
- C. K-means
- D. 随机森林

4. 在神经网络中,激活函数的作用是:
- A. 增加模型的复杂度
- B. 引入非线性因素
- C. 减少模型的复杂度
- D. 使模型更容易训练

5. 交叉验证的主要目的是什么?
- A. 减少模型训练时间
- B. 减少模型的过拟合风险
- C. 提高模型的泛化能力
- D. 增加模型的复杂度

二、简答题(每题10分,共30分)

1. 描述机器学习中的"训练集"和"测试集"的区别,并解释为什么在机器学习中需要将数据集分为训练集和测试集。

2. 解释什么是“决策树”,并简述如何使用决策树进行分类。

3. 什么是“梯度下降”算法?它在机器学习中如何应用?三、计算题(每题25分,共50分)1. 假设我们有一个简单的线性回归问题,模型的预测函数为 \( f(x) = wx + b \),其中 \( w \) 是权重,\( b \) 是偏置项。

给定数据集 \( \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\} \),其中 \( y_i = w x_i + b + \epsilon_i \),\( \epsilon_i \) 是噪声项。

请推导最小二乘法的权重 \( w \) 和偏置 \( b \) 的更新公式。
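
一种可能的推导思路:对平方损失分别关于 \( w \) 和 \( b \) 求偏导并令其为零,可得

\[
\begin{align*}
L(w,b) &= \sum_{i=1}^{n}(y_i - w x_i - b)^2 \\
\frac{\partial L}{\partial b} = 0 &\Rightarrow b = \bar{y} - w\,\bar{x} \\
\frac{\partial L}{\partial w} = 0 &\Rightarrow w = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}
\end{align*}
\]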


哈工大强基计划试题


一、在计算机科学与技术领域,以下哪项技术是实现人工智能中机器学习算法的基础?A. 数据库管理系统 B. 计算机网络 C. 数据挖掘与分析 D. 操作系统原理 (C)(答案)

二、在材料科学与工程领域,以下哪种材料因其优异的导电性和热导率而被广泛应用于电子器件中?A. 高分子材料 B. 陶瓷材料 C. 半导体材料 D. 金属材料 (D)(答案)

三、在航空航天工程中,以下哪个参数是衡量飞机飞行效率的重要指标?A. 最大飞行速度 B. 升限 C. 航程与载油量比 D. 爬升率 (C)(注:此选项为简化表述,实际中衡量飞行效率的指标可能更复杂,但此处以航程与载油量比作为代表)(答案)

四、在能源与动力工程中,以下哪种能源转换方式是将化学能直接转换为电能?A. 火力发电 B. 水力发电 C. 燃料电池 D. 风力发电 (C)(答案)

五、在机械工程领域,以下哪种机构能够实现将直线运动转换为旋转运动?A. 蜗轮蜗杆机构 B. 曲柄摇杆机构 C. 凸轮机构(特定设计下) D. 皮带传动机构 (B)(注:虽然凸轮机构在某些特定设计下也能实现此功能,但曲柄摇杆机构是更常见的将直线运动转换为旋转运动的机构)(答案)

六、在土木工程领域,以下哪种结构形式常用于大跨度桥梁的设计?A. 砖混结构 B. 钢筋混凝土结构 C. 悬索结构 D. 拱式结构 (C)(注:悬索结构是大跨度桥梁中常用的一种结构形式,但拱式结构也常用于某些大跨度桥梁的设计,此处以悬索结构为正确答案)(答案)

七、在环境科学与工程领域,以下哪种技术是用于处理大气污染中的颗粒物?A. 生物降解 B. 膜分离 C. 静电除尘 D. 化学沉淀 (C)(答案)

八、在自动化控制领域,以下哪个概念描述了系统根据设定值与实际值之间的偏差进行自动调节的过程?A. 反馈控制 B. 开环控制 C. 顺序控制 D. 逻辑控制 (A)(答案)


1 Give the defi niti ons or your comprehe nsions of the follow ing terms.(12 '1.1 The in ductive lear ning hypothesisP171.2 Overfitti ngP491.4 Con siste nt lear nerP1482 Give brief answers to the following questions.(15 '2.2 If the size of a version space is |VS |, In general what is the smallest number of queries may berequired by a concept learner using optimal query strategy to perfectly learn the target con ceptP272.3 In genaral, decision trees represent a disjunction of conjunctions of constrains on the attributevalues of in sta nse,the n what expressi on does the followi ng decisi on tree corresp onds to3 Give the explaination to inductive bias, and list inductive bias of CANDIDATE-ELIMINATION algorithm, decisi on tree learni ng(ID3), BACKPROPAGATION algorithm.(10 '4 How to solve overfitt ing in decisi on tree and n eural n etwork(10 'Soluti on:Decisi on tree:及早停止树增长(stop growing earlier)后修剪法(post-pruning)Neural Network权值衰减(weight decay) 验证数据集(validation set)A5 Prove that the LMS weight update rule i i(V train (b) V (b))x i performs a gradientdescent to minimize the squared error. In particular, define the squared error E as in the text. NowAcalculate the derivative of E with respect to the weight i, assuming that V (b) is a linear function as defi ned in the text. Gradie nt desce nt is achieved by updat ing each weight i n proport ion Eto --------- . Therefore, you must show that the LMS trai ning rule alters weights in this proporti on iA2for each training example it encounters. ( E (V train (b) V(b)) ) (8'b ,V t r ain (b) training exampleSolution :As Vtrai n(b) \? (Successor(b))we can get E= (V train (b) V(b))2=2(V train(b)伽)中As mentioned in LMS: i i (V train (b) \?(b))X iWe can get i i ( E / w i)Therefore, gradient descent is achievement by updating each weight in proportion to E / w i;LMS rules alters weights in this proportion for each training example it encounters.6 True or false: if decisi on tree D2 is an elaborati on of tree D1, the n D1 is more-ge neral-tha n D2. Assume D1 and D2 are decision trees representing arbitrary boolean funcions, and that D2 is an elaboratio n of D1 if ID3 could exte nd D1 to D2. If true give a proof; if false, a coun ter example. (Definition: Let h j and h k be boolean-valued functions defined over X .then h j ismore_ge neral_tha n_o r_equal_to h k (writte n h j g h k ) If and only if (x X)[(h k(x) 1) (h j(x) 1)] then h j h k (h j g h k )(h k g h j)) (10 'The hypothesis is false.One cou nter example is A XOR B while if A!=B, trai ning examples are all positive, while if A==B, trai ning examples are all n egative, then, usi ng ID3 to exte nd D1, the new tree D2 will be equivale nt to D1, i.e., D2 is equal to D1.7 Design a two-input perceptron that implements the boolean function A B .Design atwo-layer network of perceptrons that implements A XOR B . (10 '8 Suppose that a hypothesis space containing three hypotheses, h!, h2,h3, and the posteriorprobabilities of these typotheses given the training data are 0.4, 0.3 and 0.3 respectively. And if anew instanee x is encountered, which is classified positive by g, but negative by h2andh3,then give the result and detail classification course of Bayes optimal classifier.(10 'P1259 Suppose S is a collection of training-example days described by attributes including Humidity, which can have the values High or Normal. Assume S is a collection containing 10 examples, [7+,3_]. 
6 True or false: if decision tree D2 is an elaboration of tree D1, then D1 is more_general_than D2. Assume D1 and D2 are decision trees representing arbitrary boolean functions, and that D2 is an elaboration of D1 if ID3 could extend D1 to D2. If true give a proof; if false, a counterexample. (Definition: let h_j and h_k be boolean-valued functions defined over X. Then h_j is more_general_than_or_equal_to h_k (written h_j ≥_g h_k) if and only if (∀x ∈ X)[(h_k(x) = 1) → (h_j(x) = 1)]; and h_j >_g h_k if and only if (h_j ≥_g h_k) ∧ ¬(h_k ≥_g h_j).) (10')
The hypothesis is false.
One counterexample is A XOR B: if A != B the training examples are all positive, while if A == B the training examples are all negative; then, using ID3 to extend D1, the new tree D2 will be equivalent to D1, i.e., D2 is equal to D1.

7 Design a two-input perceptron that implements the boolean function A ∧ ¬B. Design a two-layer network of perceptrons that implements A XOR B. (10')

8 Suppose that a hypothesis space contains three hypotheses h1, h2, h3, and the posterior probabilities of these hypotheses given the training data are 0.4, 0.3 and 0.3 respectively. If a new instance x is encountered, which is classified positive by h1 but negative by h2 and h3, give the result and the detailed classification course of the Bayes optimal classifier. (10')
P125
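The exam sheet only cites the textbook page for question 8; the following small sketch of how the Bayes optimal classifier combines the three posteriors is my own illustration, not the official answer key:

posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
votes = {"h1": "+", "h2": "-", "h3": "-"}       # how each hypothesis classifies the new instance x
p_pos = sum(p for h, p in posteriors.items() if votes[h] == "+")   # 0.4
p_neg = sum(p for h, p in posteriors.items() if votes[h] == "-")   # 0.3 + 0.3 = 0.6
print("+" if p_pos > p_neg else "-")            # "-": the Bayes optimal classification is negative

Note that the MAP hypothesis h1 alone would classify x as positive, which is exactly why the two answers can differ.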
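For question 7 above, one possible weight assignment, sketched as runnable code; it assumes inputs in {0, 1} and the threshold form "output 1 if w0 + w1*A + w2*B > 0, else 0", and the particular weights are only one of many valid choices:

def perceptron(w0, w1, w2, a, b):
    """Threshold unit: 1 if w0 + w1*a + w2*b > 0, else 0."""
    return 1 if w0 + w1 * a + w2 * b > 0 else 0

def a_and_not_b(a, b):
    # A AND (NOT B): fires only for A = 1, B = 0, e.g. w0 = -0.5, w1 = 1, w2 = -1.
    return perceptron(-0.5, 1.0, -1.0, a, b)

def a_xor_b(a, b):
    # Two-layer network: the hidden units compute A AND (NOT B) and B AND (NOT A),
    # and the output unit ORs them together (w0 = -0.5, w1 = w2 = 1).
    h1 = perceptron(-0.5, 1.0, -1.0, a, b)
    h2 = perceptron(-0.5, -1.0, 1.0, a, b)
    return perceptron(-0.5, 1.0, 1.0, h1, h2)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, a_and_not_b(a, b), a_xor_b(a, b))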
9 Suppose S is a collection of training-example days described by attributes including Humidity, which can have the values High or Normal. Assume S is a collection containing 10 examples, [7+, 3−]. Of these 10 examples, suppose 3 of the positive and 2 of the negative examples have Humidity = High, and the remainder have Humidity = Normal. Please calculate the information gain due to sorting the original 10 examples by the attribute Humidity. (log2 1 = 0, log2 2 = 1, log2 3 = 1.58, log2 4 = 2, log2 5 = 2.32, log2 6 = 2.58, log2 7 = 2.8, log2 8 = 3, log2 9 = 3.16, log2 10 = 3.32) (5')
Solution:
(a) Entropy(S) = Entropy([7+, 3−]) = −(7/10) log2(7/10) − (3/10) log2(3/10) = 0.886.
(b) Gain(S, Humidity) = Entropy(S) − Σ over v ∈ Values(Humidity) of (|S_v| / |S|) Entropy(S_v), where Values(Humidity) = {High, Normal} and S_v = {s ∈ S | Humidity(s) = v}.
S_High = [3+, 2−], |S_High| = 5: Entropy(S_High) = −(3/5) log2(3/5) − (2/5) log2(2/5) = 0.972.
S_Normal = [4+, 1−], |S_Normal| = 5: Entropy(S_Normal) = −(4/5) log2(4/5) − (1/5) log2(1/5) = 0.72.
Thus Gain(S, Humidity) = 0.886 − (5/10)·0.972 − (5/10)·0.72 = 0.04.

10 Finish the following algorithm. (10')
(1) GRADIENT-DESCENT(training_examples, η)
Each training example is a pair of the form <x, t>, where x is the vector of input values and t is the target output value. η is the learning rate (e.g., 0.05).
Initialize each w_i to some small random value
Until the termination condition is met, Do
  Initialize each Δw_i to zero.
  For each <x, t> in training_examples, Do
    Input the instance x to the unit and compute the output o
    For each linear unit weight w_i, Do
      __________
  For each linear unit weight w_i, Do
      __________
(2) FIND-S Algorithm
Initialize h to the most specific hypothesis in H
For each positive training instance x
  For each attribute constraint a_i in h
    If __________
    Then do nothing
    Else replace a_i in h by the next more general constraint that is satisfied by x
Output hypothesis h
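A minimal executable sketch of the completed FIND-S algorithm of question 10(2), over attribute-vector hypotheses with "?" as the accept-anything value; this is my own illustration and not the blank-filling expected on the answer sheet:

WILDCARD = "?"
EMPTY = "0"   # stands for the "reject everything" constraint in the most specific hypothesis

def find_s(positive_examples, n_attributes):
    """FIND-S: start from the most specific hypothesis and minimally generalize it
    until it is consistent with every positive training example."""
    h = [EMPTY] * n_attributes
    for x in positive_examples:
        for i, value in enumerate(x):
            if h[i] == EMPTY:       # first positive example: adopt its attribute values
                h[i] = value
            elif h[i] != value:     # the constraint a_i is not satisfied by x,
                h[i] = WILDCARD     # so replace it by the next more general constraint
    return h

# EnjoySport-style illustration with two positive examples:
pos = [("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),
       ("Sunny", "Warm", "High", "Strong", "Warm", "Same")]
print(find_s(pos, 6))   # ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']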
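And for question 9, a short sketch that recomputes the entropy and information-gain numbers (assuming the usual two-class definitions; the small differences from 0.886 come from the rounded log table given on the exam):

from math import log2

def entropy(pos, neg):
    """Entropy of a sample containing pos positive and neg negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            result -= p * log2(p)
    return result

def information_gain(s, partition):
    """Gain(S, A); s and the subsets in partition are (pos, neg) pairs."""
    total = sum(s)
    return entropy(*s) - sum((p + n) / total * entropy(p, n) for p, n in partition)

print(round(entropy(7, 3), 3))                                  # 0.881 with exact logs
print(round(information_gain((7, 3), [(3, 2), (4, 1)]), 3))     # 0.035, i.e. about 0.04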
1. What is the definition of learning problem? (5) Use a "checkers learning problem" as an example to state how to design a learning system. (15)
Answer:
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience. (5)
Example:
A checkers learning problem:
T: play checkers (1)
P: percentage of games won in a tournament (1)
E: opportunity to play against itself (1)
To design a learning system:
Step 1: Choosing the Training Experience (4)
A checkers learning problem:
Task T: playing checkers
Performance measure P: percent of games won in the world tournament
Training experience E: games played against itself
In order to complete the design of the learning system, we must now choose
1. the exact type of knowledge to be learned
2. a representation for this target knowledge
3. a learning mechanism
Step 2: Choosing the Target Function (4)
1. if b is a final board state that is won, then V(b) = 100
2. if b is a final board state that is lost, then V(b) = −100
3. if b is a final board state that is drawn, then V(b) = 0
4. if b is not a final state in the game, then V(b) = V(b'), where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game (assuming the opponent plays optimally, as well).
Step 3: Choosing a Representation for the Target Function (4)
x1: the number of black pieces on the board
x2: the number of red pieces on the board
x3: the number of black kings on the board
x4: the number of red kings on the board
x5: the number of black pieces threatened by red (i.e., which can be captured on red's next turn)
x6: the number of red pieces threatened by black
Thus, our learning program will represent V̂(b) as a linear function of the form
V̂(b) = w0 + w1·x1 + w2·x2 + w3·x3 + w4·x4 + w5·x5 + w6·x6
where w0 through w6 are numerical coefficients, or weights, to be chosen by the learning algorithm. Learned values for the weights w1 through w6 will determine the relative importance of the various board features in determining the value of the board, whereas the weight w0 will provide an additive constant to the board value.

2. Answer: Find-S & Find-G:
Step 1: Initialize S to the most specific hypothesis in H. (1)
S0: {<Ø, Ø, Ø, Ø, Ø, Ø>}
Initialize G to the most general hypothesis in H.
G0: {<?, ?, ?, ?, ?, ?>}
Step 2: The first example is {<Sunny, Warm, Normal, Strong, Warm, Same, +>} (3)
S1: {<Sunny, Warm, Normal, Strong, Warm, Same>}
G1: {<?, ?, ?, ?, ?, ?>}
Step 3: The second example is {<Sunny, Warm, High, Strong, Warm, Same, +>} (3)
S2: {<Sunny, Warm, ?, Strong, Warm, Same>}
G2: {<?, ?, ?, ?, ?, ?>}
Step 4: The third example is {<Rainy, Cold, High, Strong, Warm, Change, −>} (3)
S3: {<Sunny, Warm, ?, Strong, Warm, Same>}
G3: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}
Step 5: The fourth example is {<Sunny, Warm, High, Strong, Cool, Change, +>} (3)
S4: {<Sunny, Warm, ?, Strong, ?, ?>}
G4: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
Finally, all the hypotheses in the version space are: (2)
{<Sunny, Warm, ?, Strong, ?, ?>, <Sunny, ?, ?, Strong, ?, ?>, <Sunny, Warm, ?, ?, ?, ?>, <?, Warm, ?, Strong, ?, ?>, <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}

3. Answer:
Flog(X) = −X·log(X) − (1 − X)·log(1 − X)
Step 1: choose the root node:
entropy_all = Flog(4/10) = 0.971 (2)
gain_outlook = entropy_all − 0.3·Flog(1/3) − 0.3·Flog(1) − 0.4·Flog(1/2) = 0.296 (1)
gain_temperature = entropy_all − 0.3·Flog(1/3) − 0.3·Flog(1/3) − 0.4·Flog(1/2) = 0.02 (1)
gain_humidity = entropy_all − 0.5·Flog(2/5) − 0.5·Flog(1/5) = 0.125 (1)
gain_wind = entropy_all − 0.6·Flog(5/6) − 0.4·Flog(1/4) = 0.256 (1)
The root node is "outlook". (2)
Step 2: choose the second node:
For sunny (humidity or temperature):
entropy_sunny = Flog(1/3) = 0.918 (1)
sunny_gain_wind = entropy_sunny − (2/3)·Flog(0.5) − (1/3)·Flog(1) = 0.252 (1)
sunny_gain_humidity = entropy_sunny − (2/3)·Flog(1) − (1/3)·Flog(1) = 0.918 (1)
sunny_gain_temperature = entropy_sunny − (2/3)·Flog(1) − (1/3)·Flog(1) = 0.918 (1)
Choose humidity or temperature. (1)
For rain (wind):
entropy_rain = Flog(1/2) = 1 (1)
rain_gain_wind = entropy_rain − (1/2)·Flog(1) − (1/2)·Flog(1) = 1 (1)
rain_gain_humidity = entropy_rain − (1/2)·Flog(1/2) − (1/2)·Flog(1/2) = 0 (1)
rain_gain_temperature = entropy_rain − (1/4)·Flog(1) − (3/4)·Flog(1/3) = 0.311 (1)

4. Answer:
A: The primitive neural units are: the perceptron, the linear unit and the sigmoid unit. (3)
Perceptron: (2)
A perceptron takes a vector of real-valued inputs, calculates a linear combination of these inputs, then outputs 1 if the result is greater than some threshold and −1 otherwise. More precisely, given inputs x1 through xn, the output o(x1, ..., xn) computed by the perceptron is
o(x1, ..., xn) = 1 if w0 + w1·x1 + ... + wn·xn > 0, and −1 otherwise.
Sometimes we write the perceptron function simply as o(x) = sgn(w·x).
Linear units: (2)
A linear unit is a unit whose output o is given by o = w·x. Thus, a linear unit corresponds to the first stage of a perceptron, without the threshold.
Sigmoid units: (2)
Like the perceptron, the sigmoid unit first computes a linear combination of its inputs, then applies a threshold to the result. In the case of the sigmoid unit, however, the threshold output is a continuous function of its input. More precisely, the sigmoid unit computes its output o as
o = σ(w·x), where σ(y) = 1 / (1 + e^(−y)).
B: (Because the question contains a printing error, either the perceptron learning rule or the delta rule is acceptable; the delta rule is given here.)
Derivation process: (6)
Delta rule: for E(w) = (1/2) Σ_d (t_d − o_d)², ∂E/∂w_i = −Σ_d (t_d − o_d) x_id, so Δw_i = −η ∂E/∂w_i = η Σ_d (t_d − o_d) x_id.
Perceptron learning rule: w_i ← w_i + η (t − o) x_i.

5. Answer:
P(no) = 5/14, P(yes) = 9/14 (1)
P(sunny|no) = 3/5 (1)
P(cool|no) = 1/5 (1)
P(high|no) = 4/5 (1)
P(strong|no) = 3/5 (1)
P(no|new instance) = P(no)·P(sunny|no)·P(cool|no)·P(high|no)·P(strong|no) = (5/14)·(3/5)·(1/5)·(4/5)·(3/5) = 0.02057 = 2.057 × 10^-2 (2)
P(sunny|yes) = 2/9 (1)
P(cool|yes) = 3/9 (1)
P(high|yes) = 3/9 (1)
P(strong|yes) = 3/9 (1)
P(yes|new instance) = P(yes)·P(sunny|yes)·P(cool|yes)·P(high|yes)·P(strong|yes) = (9/14)·(2/9)·(3/9)·(3/9)·(3/9) = 0.005291 = 5.291 × 10^-3 (2)
ANSWER: NO (2)
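A small sketch reproducing the naive Bayes computation of answer 5 above; the counts are exactly the ones quoted in the answer (they correspond to the usual PlayTennis data set), and the code is only an illustration:

def naive_bayes_score(prior, likelihoods):
    """Unnormalized naive Bayes score: P(v) * product_i P(a_i | v)."""
    score = prior
    for p in likelihoods:
        score *= p
    return score

score_no = naive_bayes_score(5/14, [3/5, 1/5, 4/5, 3/5])    # sunny, cool, high, strong given "no"
score_yes = naive_bayes_score(9/14, [2/9, 3/9, 3/9, 3/9])   # the same attribute values given "yes"
print(round(score_no, 5), round(score_yes, 5))              # 0.02057 0.00529
print("no" if score_no > score_yes else "yes")              # "no": PlayTennis is predicted to be false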
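For answer 4, a compact runnable sketch of a single sigmoid unit and of one stochastic-gradient (incremental, delta-rule style) weight update for the squared error; the learning rate, inputs and target here are made-up illustration values:

from math import exp

def sigmoid(y):
    return 1.0 / (1.0 + exp(-y))

def sigmoid_unit(weights, x):
    """o = sigmoid(w0 + w1*x1 + ... + wn*xn); x does not include the constant input 1."""
    net = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return sigmoid(net)

def sgd_step(weights, x, t, lr=0.1):
    """One incremental gradient step for E = 0.5*(t - o)**2.
    For a sigmoid unit, -dE/dw_i = (t - o) * o * (1 - o) * x_i (with x_0 = 1)."""
    o = sigmoid_unit(weights, x)
    delta = (t - o) * o * (1 - o)
    xs = [1.0] + list(x)
    return [w + lr * delta * xi for w, xi in zip(weights, xs)]

w = [0.0, 0.0, 0.0]
for _ in range(200):
    w = sgd_step(w, (1.0, 0.0), 1.0)
print(round(sigmoid_unit(w, (1.0, 0.0)), 3))   # the output has moved from 0.5 toward the target 1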
How to determ in thecorrect final tree size5 Write the BackPropagation algorithm for feedforward network containing two layers of sigmoid units.6 Explai n the Maximum a posteriori(MAP) hypothesis.7 Usi ng Naive Byes Classifier to classify the new in sta nee:<Outlook=s unn y,Temperature=cool,Humidity=high,Wi nd=stro ng> Our task is to predict the target value (yes or no) of the target concept PlayTennis for this new8 Question Eight : The definition of three types of fitness functions in genetic algorithmQuestion one :(举一个例子,比如:导航仪、西洋跳棋)Question two :In itilize: G={,,,,,} S={ ,,,,,}Step 1:G={,,,,,} S={s unny ,warm ,no rmal,str on g,warm,same}Step2: coming one positive in sta nee 2G={,,,,,} S={s unny ,warm,,str on g,warm,same}Step3: coming one n egative in sta nee 3G=<S unny,,,,,> <,warm,,,,> <,,,,,same>S={s unny ,warm,,str on g,warm,same}Step4: coming one positive in sta nee 4S= { sunny ,warm,,str on g,, }G=<Su nn y,,,,,> <,warm,,,,>Question three :(a) Entropy(S)= 一 -丨og(3/5) 一 -】og(2/5)= 0.971(b) Gain(S,sky) = Entropy(S) - (4/5) Entropy(Ssunny) + (1/5) Entropy(Srainny)] = 0.322Gai n( S,AirTemp) = Gai n(S,wi nd) = Gai n(S,sky) =0.322Gai n( S,Humidity) = Gain (S,Forcast) = 0.02Gai n( S,water) = 0.171Choose any feature of AirTemp, wi nd and sky as the top no de.The decisi on tree as follow: (If choose sky as the top no de)Question Four :An swer:In ductive bias: give some proor assumpti on for a target con cept made by the lear ner to have a basis for classify ing un see n in sta nces.Suppose L is a machine learning algorithm and x is a set of training examples. L(xi, Dc) denotes the classification assigned to xi by L after training examples on Dc. Then the inductive bias is a minimal set of assertion B, given an arbitrary target concept C and set of training examples Dc:(眾i E 艾)[(B n Dc「Xi) -| L(xi, Dc)]C_E: the target concept is contained in the given gypothesis space H, and the training examples are all positive examples.ID3: a, small trees are preferred over larger trees.B, the trees that place high information gain attribute close to root are preferred over those that do not.BP:Smooth in terpolati on betee n data poin ts.Question Five :Answer: In na?ve bayes classification, we assump that all attributes are independent given the tatget value, while in bayes belif n et, it specifes a set of con diti onal in depe ndence along with a set of probability distributi on.Question Six :随即梯度下降算法Question Seven :朴素贝叶斯例子Question Eight : The definition of three types of fitness functions in genetic algorithmAn swer:In order to select one hypothese according to fitness function, there are always three methods: roulette wheel selecti on, tour name nt selecti on and rank selectio n.Question nine :Sin gle-po int crossover:Two-po int crossover:Offspri ng:()Uniform crossover:Point mutati on:Any mutati on is ok!1 Solutio n:A computer program is said to lear n from experie nee E with respect to some class of tasks T and performa nee measure P,if its performa nee at tasks in T, as measured by P, improves with experie nee E.Example : (po int out the T,P,E of the example)A checkers lear ning problem.A handwriting recognition learning problemA robot drivi ng lear ning problem.2 Solutio n:S o :{ , , , , , }S 1:{Su nny, Warm, Normal, Stro ng, Warm, Same}S 2:{Su nny, Warm, , Stro ng, Warm, Same}G o , G 1, G 2:{, , , , , }S 3:{ Su nny, Warm, , Stro ng, Warm, Same }G 3:{Sunny, , , , , } U {, Warm, , , , } U {, , , , , Same}S 4:{Su nny, Warm, , Stro ng, , }G 4:{Sunny, , , , , } U {, Warm, , , , }3 Solutio n:4 In gen 
1 Solution:
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
Example: (point out the T, P, E of the example)
A checkers learning problem.
A handwriting recognition learning problem.
A robot driving learning problem.

2 Solution:
S0: {<Ø, Ø, Ø, Ø, Ø, Ø>}
S1: {<Sunny, Warm, Normal, Strong, Warm, Same>}
S2: {<Sunny, Warm, ?, Strong, Warm, Same>}
G0, G1, G2: {<?, ?, ?, ?, ?, ?>}
S3: {<Sunny, Warm, ?, Strong, Warm, Same>}
G3: {<Sunny, ?, ?, ?, ?, ?>} ∪ {<?, Warm, ?, ?, ?, ?>} ∪ {<?, ?, ?, ?, ?, Same>}
S4: {<Sunny, Warm, ?, Strong, ?, ?>}
G4: {<Sunny, ?, ?, ?, ?, ?>} ∪ {<?, Warm, ?, ?, ?, ?>}

3 Solution:

4 In general, inductive inference: some form of prior assumptions regarding the identity of the target concept made by a learner to have a rational basis for classifying unseen instances. Formally:
(∀x_i ∈ X)[(B ∧ D_c ∧ x_i) ⊢ L(x_i, D_c)]
CANDIDATE-ELIMINATION: the target concept c is contained in the given hypothesis space H.
Decision tree learning (ID3): shorter trees are preferred over larger trees. Trees that place high information gain attributes close to the root are preferred over those that do not.
BACKPROPAGATION algorithm: smooth interpolation between data points.

5 Solution:
(1) GRADIENT-DESCENT(training_examples, η)
Each training example is a pair of the form <x, t>, where x is the vector of input values, and t is the target output value. η is the learning rate (e.g., 0.05).
Initialize each w_i to some small random value
Until the termination condition is met, Do
  Initialize each Δw_i to zero.
  For each <x, t> in training_examples, Do
    Input the instance x to the unit and compute the output o
    For each linear unit weight w_i, Do
      Δw_i ← Δw_i + η (t − o) x_i
  For each linear unit weight w_i, Do
      w_i ← w_i + Δw_i

8 Definition: Consider a concept class C defined over a set of instances X of length n and a learner L using hypothesis space H. C is PAC-learnable by L using H if for all c ∈ C, distributions D over X, ε such that 0 < ε < 1/2, and δ such that 0 < δ < 1/2, learner L will with probability at least (1 − δ) output a hypothesis h ∈ H such that error_D(h) ≤ ε, in time that is polynomial in 1/ε, 1/δ, n, and size(c).
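The PAC definition above is usually applied together with the standard sample-complexity bound for a consistent learner in a finite hypothesis space, m ≥ (1/ε)(ln|H| + ln(1/δ)). That bound is not stated in the original text, so the following numeric sketch is only an illustration; it reuses |H| = 973 from the EnjoySport question above:

from math import ceil, log

def sample_complexity(h_size, eps, delta):
    """Number of examples m >= (1/eps) * (ln|H| + ln(1/delta)) that suffice for any
    consistent learner to output, with probability at least 1 - delta, a hypothesis
    whose true error is at most eps."""
    return ceil((log(h_size) + log(1.0 / delta)) / eps)

print(sample_complexity(973, eps=0.1, delta=0.05))   # 99 examples for an EnjoySport-sized hypothesis space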
