Machine Learning Summary, Part 4: Neural Networks


1.17.1. Multi-layer Perceptron


Multi-layer Perceptron (MLP) is a supervised learning algorithm that learns a function $f(\cdot): R^m \rightarrow R^o$ by training on a dataset, where m is the number of dimensions for input and o is the number of dimensions for output. Given a set of features X and a target y, it can learn a non-linear function approximator for either classification or regression. It is different from logistic regression in that, between the input and the output layer, there can be one or more non-linear layers, called hidden layers. Figure 1 shows a one hidden layer MLP with scalar output.
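As an illustration, a minimal scikit-learn sketch of fitting an MLP classifier looks like the following; the toy data and hyperparameters are assumptions for the example, not taken from this text.

    # Minimal sketch: train an MLPClassifier on a tiny toy dataset (assumed values).
    from sklearn.neural_network import MLPClassifier

    X = [[0., 0.], [1., 1.]]   # two samples, two features each
    y = [0, 1]                 # binary target

    # Small hidden layers; the 'lbfgs' solver tends to work well on very small data.
    clf = MLPClassifier(solver='lbfgs', alpha=1e-5,
                        hidden_layer_sizes=(5, 2), random_state=1)
    clf.fit(X, y)
    print(clf.predict([[2., 2.], [-1., -2.]]))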

1.17.6. Complexity


Suppose there are m training samples, n features, k hidden layers each containing h neurons (for simplicity), and o output neurons. The time complexity of backpropagation is $O(m \cdot n \cdot h^k \cdot o \cdot i)$, where i is the number of iterations. Since backpropagation has a high time complexity, it is advisable to start with a smaller number of hidden neurons and few hidden layers for training.
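As a rough illustration of how this bound grows, the sketch below simply evaluates m * n * h**k * o * i for a few configurations; the helper name backprop_cost and all the numbers are assumptions made up for the example.

    # Order-of-magnitude operation count for backpropagation: m * n * h**k * o * i.
    # All values below are assumed for illustration only.
    def backprop_cost(m, n, h, k, o, i):
        return m * n * (h ** k) * o * i

    # 1,000 samples, 20 features, 10 outputs, 200 iterations.
    for k in (1, 2, 3):          # number of hidden layers
        for h in (10, 50):       # neurons per hidden layer
            print(k, h, backprop_cost(1000, 20, h, k, 10, 200))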

For example, scale each attribute of the input vector X to [0, 1] or [-1, +1], or standardize it to have mean 0 and variance 1. Note that you must apply the same scaling to the test set for meaningful results. You can use StandardScaler for standardization.
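A minimal sketch of that pattern; the toy arrays are assumptions for the example.

    # Fit the scaler on the training data only, then apply the same transformation
    # to the test data (the toy arrays below are assumed for the example).
    import numpy as np
    from sklearn.preprocessing import StandardScaler

    X_train = np.array([[0., 10.], [1., 12.], [2., 14.]])
    X_test = np.array([[1.5, 11.]])

    scaler = StandardScaler().fit(X_train)    # learn mean and variance from the training set
    X_train_scaled = scaler.transform(X_train)
    X_test_scaled = scaler.transform(X_test)  # reuse the training-set statistics
    print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))  # roughly 0 and 1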

An alternative and recommended approach is to use StandardScaler in a Pipeline.
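A sketch of that Pipeline approach; the hidden layer size and iteration count are assumptions for the example.

    # Scaling and the estimator combined in one Pipeline, so the scaler is
    # re-fit only on the training folds during cross-validation.
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.neural_network import MLPClassifier

    pipe = make_pipeline(StandardScaler(),
                         MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=1))
    # pipe.fit(X_train, y_train); pipe.score(X_test, y_test)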

●Finding a reasonable regularization parameter alpha is best done using GridSearchCV, usually in the range 10.0 ** -np.arange(1, 7) (a search of this kind is sketched below, after this list).

●Empirically, we observed that L-BFGS converges faster and with better solutions on small datasets. For relatively large datasets, however, Adam is very robust. It usually converges quickly and gives pretty good performance. SGD with momentum or Nesterov's momentum, on the other hand, can perform better than those two algorithms if the learning rate is correctly tuned (the corresponding solver settings are also sketched below).
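First, a sketch of searching the regularization strength alpha with GridSearchCV; the network size and cv setting are assumptions for the example.

    # Search the regularization strength alpha over 1e-1 ... 1e-6 with GridSearchCV.
    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.neural_network import MLPClassifier

    param_grid = {'alpha': 10.0 ** -np.arange(1, 7)}
    search = GridSearchCV(MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=1),
                          param_grid, cv=5)
    # search.fit(X_train, y_train); print(search.best_params_)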
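Second, the three solvers discussed above are selected through the solver parameter of MLPClassifier; the hidden layer sizes and learning-rate values below are assumptions for the example.

    from sklearn.neural_network import MLPClassifier

    # L-BFGS: often a good choice for small datasets.
    clf_lbfgs = MLPClassifier(solver='lbfgs', hidden_layer_sizes=(50,), random_state=1)

    # Adam (the default solver): robust on relatively large datasets.
    clf_adam = MLPClassifier(solver='adam', hidden_layer_sizes=(50,), random_state=1)

    # SGD with Nesterov momentum: can do even better when learning_rate_init is tuned.
    clf_sgd = MLPClassifier(solver='sgd', momentum=0.9, nesterovs_momentum=True,
                            learning_rate_init=0.01, hidden_layer_sizes=(50,), random_state=1)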

1.17.9. More control with warm_start
