Literature Translation: Short-Term Chaotic Time Series Prediction Based on Least-Squares Support Vector Regression
Foreign Literature Translation
Short Term Chaotic Time Series Prediction
using Symmetric LS-SVM Regression
Abstract—In this article, we illustrate the effect of imposing symmetry as prior knowledge at the modelling stage, in the context of chaotic time series prediction. We show that Least-Squares Support Vector Machines with symmetry constraints improve the simulation performance for time series generated from the Lorenz attractor and from multi-scroll attractors. Not only are accurate forecasts obtained, but the forecast horizon over which these predictions remain accurate is also extended.
1. Introduction
In applied nonlinear time series analysis, it is common practice to estimate a nonlinear black-box model in order to produce accurate forecasts from a set of observations. Usually a time series model is estimated from the data available up to time t, and its final assessment is based on its simulation performance from t + 1 onwards. Due to the nature of time series generated by chaotic systems, which not only show nonlinear behavior but also drastic regime changes caused by the local instability of the attractors, this is a very challenging task. For this reason, chaotic time series have been used as benchmarks in several time series competitions.
The modelling of chaotic time series can be improved by exploiting some of their properties. If the true underlying system is symmetric, this information can be imposed on the model as prior knowledge, in which case it is possible to obtain better forecasts than with a general model. In this article, short-term predictions for chaotic time series are generated using Least-Squares Support Vector Machine (LS-SVM) regression. We show that LS-SVM with symmetry constraints can produce accurate predictions. Not only are accurate forecasts obtained, but, compared with the unconstrained LS-SVM formulation, the forecast horizon over which these predictions remain accurate is also extended.
This paper is structured as follows. Section 2 describes the LS-SVM technique for regression and how symmetry can be imposed in a straightforward way. Section 3 describes applications to the x-coordinate of the Lorenz attractor and to data generated by a nonlinear transformation of multi-scroll attractors.
2. LS-SVM with Symmetry Constraints
Least-Squares Support Vector Machines (LS-SVM) is a powerful nonlinear black-box regression method, which builds a linear model in the so-called feature space, where the inputs have been transformed by means of a (possibly infinite-dimensional) nonlinear mapping. The problem is converted to the dual space by means of Mercer's theorem and the use of a positive definite kernel, without computing the mapping explicitly. The LS-SVM formulation solves a linear system in the dual space under a least-squares cost function; the sparseness property can be obtained by, e.g., sequentially pruning the support value spectrum or via a fixed-size subset selection approach. The LS-SVM training procedure involves the selection of a kernel parameter and of the regularization parameter of the cost function, which can be done by, e.g., cross-validation, Bayesian techniques, or others. The inclusion of a symmetry constraint (odd or even) on the nonlinearity within the LS-SVM regression framework can be formulated as follows. Given a sample of N points $\{x_k, y_k\}_{k=1}^{N}$, with input vectors $x_k \in \mathbb{R}^p$ and output values $y_k \in \mathbb{R}$, the goal is to estimate a model of the form

$$y = w^T \varphi(x) + b + e,$$

where $\varphi(\cdot): \mathbb{R}^p \to \mathbb{R}^{n_h}$ is the mapping to a high-dimensional (and possibly infinite-dimensional) feature space, and the residuals $e$ are assumed to be i.i.d. with zero mean and constant (finite) variance. The following optimization problem with a regularized cost function is formulated:
$$\min_{w,b,e} \; \frac{1}{2} w^T w + \gamma \frac{1}{2} \sum_{k=1}^{N} e_k^2$$

$$\text{s.t.} \quad \begin{cases} y_k = w^T \varphi(x_k) + b + e_k, & k = 1, \ldots, N, \\ w^T \varphi(x_k) = a\, w^T \varphi(-x_k), & k = 1, \ldots, N, \end{cases}$$

where $a$ is a given constant which can take the value $-1$ or $1$. The first restriction is the standard model formulation in the LS-SVM framework. The second restriction is a shorthand for the cases where we want to impose the nonlinear function $w^T \varphi(x_k)$ to be even (resp. odd) by using $a = 1$ (resp. $a = -1$). The solution is formalized via the KKT conditions.
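The constrained problem above can be sketched numerically. The conclusions note that the symmetry information can be embedded at the kernel level; the sketch below assumes one plausible realization, a symmetrized kernel $K_s(x,z) = \tfrac{1}{2}(K(x,z) + a\,K(x,-z))$ plugged into the standard LS-SVM dual linear system. The exact dual derivation via the KKT conditions is not reproduced in the text, so this kernel form is an illustrative assumption, not the paper's verbatim solution.

```python
import numpy as np

def rbf(X, Z, sigma):
    # Gaussian RBF kernel matrix: K[i, j] = exp(-||x_i - z_j||^2 / sigma^2)
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma**2)

def lssvm_fit(X, y, gamma, sigma, a=None):
    # Solve the standard LS-SVM dual system
    #   [ 0   1^T         ] [b    ]   [0]
    #   [ 1   K + I/gamma ] [alpha] = [y]
    # If a is +1 / -1, an (assumed) symmetrized kernel
    # K_s(x, z) = (K(x, z) + a * K(x, -z)) / 2 imposes even/odd symmetry.
    N = len(y)
    K = rbf(X, X, sigma)
    if a is not None:
        K = 0.5 * (K + a * rbf(X, -X, sigma))
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(N) / gamma
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[1:], sol[0]          # alpha, b

def lssvm_predict(Xt, X, alpha, b, sigma, a=None):
    K = rbf(Xt, X, sigma)
    if a is not None:
        K = 0.5 * (K + a * rbf(Xt, -X, sigma))
    return K @ alpha + b
```

With a = -1 the learned nonlinearity is odd by construction: for an RBF kernel, $K_s(-x, z) = a\,K_s(x, z)$, so $f(-x) + f(x) = 2b$ for every input, regardless of where the training data lie.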
3. Application to Chaotic Time Series
In this section, the effect of imposing symmetry on the LS-SVM is presented for two cases of chaotic time series. In each example, an RBF kernel is used, and the parameters σ and γ are found by 10-fold cross-validation over the corresponding training sample. The results using the standard LS-SVM are compared to those obtained with the symmetry-constrained LS-SVM (S-LS-SVM) from (2). The examples are defined in such a way that there are not enough training datapoints in every region of the relevant space; thus, it is very difficult for a black-box model to "learn" the symmetry from the available information alone. The examples are compared in terms of the performance on the training sample (cross-validation mean squared error, MSE-CV) and the generalization performance (out-of-sample MSE, MSE-OUT). For each case, a Nonlinear AutoRegressive (NAR) black-box model is formulated:

$$y(t) = g(y(t-1), y(t-2), \ldots, y(t-p)) + e(t),$$

where $g$ is to be identified by LS-SVM and S-LS-SVM. The order $p$ is selected during the cross-validation process as an extra parameter. After each model is estimated, it is used in simulation mode, where future predictions are computed with the estimated model from past predictions:

$$\hat{y}(t) = g(\hat{y}(t-1), \hat{y}(t-2), \ldots, \hat{y}(t-p)).$$
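The simulation mode above can be sketched generically. Here g stands for any fitted one-step predictor (the LS-SVM or S-LS-SVM map); the regressor ordering (most recent value last) is a convention of this sketch, not specified in the paper.

```python
def simulate_nar(g, history, p, n_steps):
    # Iterated (simulation-mode) prediction: after the first step, the
    # model's own outputs are fed back as regressors, so no ground truth
    # beyond the training window is used. `history` must hold at least
    # the last p observed values, most recent last.
    buf = list(history)
    preds = []
    for _ in range(n_steps):
        y_hat = g(buf[-p:])   # one-step-ahead prediction from the last p values
        preds.append(y_hat)
        buf.append(y_hat)     # feed the prediction back
    return preds
```

Because errors compound through the feedback loop, simulation performance over hundreds of steps (as in the Lorenz example below) is a much harder test than one-step-ahead prediction.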
3.1. Lorenz attractor
This example is taken from [1]. The x-coordinate of the Lorenz attractor is used as an example of a time series generated by a dynamical system. A sample of 1000 datapoints is used for training, which corresponds to an unbalanced sample over the evolution of the system, shown in Figure 1 as a time-delay embedding. Figure 2 (top) shows the training sequence (thick line) and the future evolution of the series (test zone). Figure 2 (bottom) shows the simulations obtained from both models on the test zone. Results are presented in Table 1. Clearly the S-LS-SVM can simulate the system for the next 500 timesteps, far beyond the 100 points that can be simulated by the LS-SVM.
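A series of this kind can be reproduced with the classical Lorenz parameters (σ = 10, ρ = 28, β = 8/3); the Lorenz equations are invariant under (x, y, z) → (-x, -y, z), which is the symmetry the S-LS-SVM exploits. The initial condition and sampling step below are illustrative assumptions, since the paper does not state the ones used in [1].

```python
import numpy as np

def lorenz_x(n, dt=0.01, state=(1.0, 1.0, 1.0),
             sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    # Integrate the Lorenz system with a fixed-step RK4 scheme and
    # return n samples of the x-coordinate.
    def f(s):
        x, y, z = s
        return np.array([sigma * (y - x), x * (rho - z), x * y - beta * z])
    s = np.array(state, dtype=float)
    xs = np.empty(n)
    for i in range(n):
        xs[i] = s[0]
        k1 = f(s)
        k2 = f(s + 0.5 * dt * k1)
        k3 = f(s + 0.5 * dt * k2)
        k4 = f(s + dt * k3)
        s = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return xs
```

The resulting series alternates irregularly between the two lobes of the attractor, which is what makes an unbalanced training sample (mostly one lobe) so hard for an unconstrained black-box model.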
3.2. Multi-scroll attractors
This dataset was used for the K.U.Leuven Time Series Prediction Competition. The series was generated by

$$\dot{x} = h(x), \qquad y = W \tanh(V x),$$

where h is the multi-scroll equation, x is the 3-dimensional coordinate vector, and W, V are the interconnection matrices of the nonlinear function (a 3-unit multilayer perceptron, MLP). This MLP function hides the underlying structure of the attractor. A training set of 2,000 points was available for model estimation, shown in Figure 3, and the goal was to predict the next 200 points out of sample. The winner of the competition followed a complete methodology involving local modelling, specialized many-steps-ahead cross-validation tuning of the parameters, and the exploitation of the symmetry properties of the series (which he detected by flipping the series around the time axis).
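The readout y = W tanh(Vx) can be sketched directly. The competition's true W and V are not given in the text, so random matrices stand in here purely for illustration; the point of the sketch is that, since tanh is odd, the observed series is an odd function of the hidden state, so it inherits any sign symmetry of the attractor.

```python
import numpy as np

# Hypothetical interconnection matrices (3 hidden units): the actual
# competition matrices are not disclosed in the paper.
rng = np.random.default_rng(1)
W = rng.standard_normal((1, 3))
V = rng.standard_normal((3, 3))

def observe(x):
    # Scalar MLP readout y = W tanh(V x) that hides the attractor geometry.
    return float(W @ np.tanh(V @ x))
```

Whatever W and V are, observe(-x) = -observe(x), which is the structural reason the odd-symmetry constraint (a = -1) is appropriate for this series.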
Following the winner's approach, both LS-SVM and S-LS-SVM are trained using 10-step-ahead cross-validation for hyperparameter selection. To illustrate the difference between the two models, the out-of-sample MSE is computed considering only the first n simulation points, for n = 20, 50, 100, 200. It is important to emphasize that both models are trained using exactly the same methodology for order and hyperparameter selection; the only difference is the symmetry constraint in the S-LS-SVM case. Results are reported in Table 2. The simulations from both models are shown in Figure 4.
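The horizon-restricted error measure used for Table 2 is straightforward; a minimal sketch (assuming both sequences hold at least max(ns) points):

```python
def mse_first_n(y_true, y_sim, ns=(20, 50, 100, 200)):
    # Out-of-sample MSE restricted to the first n simulated points,
    # one value per horizon n.
    return {n: sum((a - b) ** 2 for a, b in zip(y_true[:n], y_sim[:n])) / n
            for n in ns}
```

Reporting the error at several horizons separates short-term accuracy from the ability to track the series further into the future, which is exactly where the symmetry constraint pays off.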
Figure 1: The training (left) and test (right) series from the x−coordinate of the Lorenz attractor
Figure 2: (Top) The series from the x−coordinate of the Lorenz attractor, part of which is used for training (thick line).
(Bottom) Simulations with LS-SVM (dashed line), S-LS-SVM (thick line) compared to the actual values (thin line).
Figure 3: The training sample (thick line) and future evolution (thin line) of the series from the K.U.Leuven Time Series Competition
Figure 4: Simulations with LS-SVM (dashed line), S-LS-SVM (thick line) compared to the actual values (thin line) for the next 200 points of the K.U.Leuven data.
Table 1: Performance of LS-SVM and S-LS-SVM on the Lorenz data.
Table 2: Performance of LS-SVM and S-LS-SVM on the K.U.Leuven data.
4. Conclusions
For the task of chaotic time series prediction, we have illustrated how to use LS-SVM regression with symmetry constraints to improve the simulation performance for series generated by the Lorenz attractor and by multi-scroll attractors. By adding symmetry constraints to the LS-SVM formulation, it is possible to embed the symmetry information at the kernel level. This translates not only into better predictions over a given time horizon, but also into a larger forecast horizon over which the model can track the time series into the future.