Traffic Flow Combination Prediction Model Based on Modularization

doi: 10.3969/j.issn.1003-3114.2023.04.023
Citation: GU Chao, XIAO Tingting, DING Fei, et al. Traffic Flow Combination Prediction Model Based on Modularization [J]. Radio Communications Technology, 2023, 49(4): 761-772.

GU Chao(1), XIAO Tingting(2), DING Fei(1), ZHOU Qihang(1), ZHAO Zhiyin(1)
(1. School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; 2. School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China)

Abstract: Short-term traffic flow prediction is one of the core capability components of an intelligent transportation system, providing intelligent decision support for urban traffic management, traffic control, and traffic guidance. Targeting the nonlinearity, dynamics, and temporal correlation of traffic flow on road networks, this paper proposes a modular combined prediction model, ICEEMDAN-ISSA-BiGRU. First, an Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN) decomposes the nonlinear traffic flow time series into Intrinsic Mode Functions (IMFs). Second, a Bi-directional Gated Recurrent Unit (BiGRU) mines the temporal correlation features of the traffic flow sequence. Third, the Sparrow Search Algorithm is improved with a dynamic adaptive t-distribution mutation (ISSA) and used to iteratively optimize the weight parameters of the BiGRU network, preventing the short-term predictions from falling into local optima. Finally, prediction performance is evaluated and verified on the public PeMS data set. Experimental results show that the proposed combined model outperforms 10 traditional models: its Mean Absolute Error (MAE) is close to 10.98, its Mean Absolute Percentage Error (MAPE) is close to 10.12%, and its Root Mean Square Error (RMSE) is close to 12.42, and the model generalizes well across different data sets.

Keywords: short-term traffic flow prediction; complete ensemble empirical mode decomposition with adaptive noise; intrinsic mode function; sparrow search algorithm; BiGRU

Received: 2023-03-16

0 Introduction

With urbanization, urban populations and vehicle counts keep increasing and road utilization approaches saturation, especially at the morning and evening rush hours, producing the urban "traffic disease". To address this problem, researchers developed the Intelligent Transportation System (ITS)
[1], which integrates roads, vehicles, drivers, and other resources through computing, communication, sensing, control, and electronics technologies. With the development of big data and artificial intelligence, ITS has begun shifting toward data-driven ITS [2]. Short-term traffic flow prediction is one of its core capability components, supplying the decision basis for traffic management, control, and guidance and intellectual support for smart travel.

Existing traffic flow prediction models fall into two classes: statistical models and machine learning models [3]. Statistical models divide into linear and nonlinear theoretical models. Linear models include the Autoregressive Integrated Moving Average (ARIMA) model and its variants [4-6]. Nonlinear models include K-Nearest Neighbors (KNN) nonparametric regression [7], the Kalman Filter (KF) [8], the Support Vector Machine (SVM) [9], and Artificial Neural Networks (ANN) [10]. As ITS is built out, road sensing — geomagnetic loops, radar detectors, video surveillance, vehicle GPS data — keeps improving and supplies abundant base data for traffic flow prediction. Because these data are multi-source, heterogeneous, and regionally scattered, short-term prediction with traditional linear and nonlinear methods faces technical challenges. Deep learning methods [11-12] suit feature extraction and classification on large traffic data sets and currently draw wide attention. Considering that deep networks mine spatiotemporal features better than single-layer networks, Li Jingyi et al. [13] designed a three-layer Long Short-Term Memory (LSTM) architecture and used a genetic algorithm to optimize the numbers of LSTM layers, Dense layers, hidden-layer neurons, and Dense-layer neurons, better capturing the fluctuation of road-network traffic flow and improving short-term prediction accuracy.

Because the intrinsic dynamics of traffic flow are complex, capturing long-term history with deep networks costs much training time and memory. Researchers therefore decompose the traffic flow time series to simplify the prediction model's structure and extract features comprehensively and effectively. Rilling et al. [14] proposed Empirical Mode Decomposition (EMD), which progressively separates trends and fluctuations at different scales into a series of Intrinsic Mode Functions (IMFs) at different frequencies. In principle, nonlinear and stochastic signals can be decomposed; however, plain EMD is incomplete and suffers from mode mixing and spurious modes. Researchers improved and optimized EMD with good effect. For example, Wei et al. [15] combined EMD with a Back Propagation Neural Network (BPNN), selecting the modes highly correlated with the original data to raise prediction efficiency, with notable performance. Similarly, Chen et al. [16] used Ensemble Empirical Mode Decomposition (EEMD) on the traffic flow series, removing high-frequency modes and predicting the remaining reconstructed modes with LSTM. Since every IMF plays a role in the time series, discarding components too early loses traffic flow feature information. Addressing this, Lu et al. [17] predicted each lane's IMFs after Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) via XGBoost, effectively extracting prior features. Wang et al. [18] combined CEEMDAN with the Least Squares Support Vector Machine (LSSVM), effectively improving accuracy on nonlinear, non-stationary highway flow. Huang et al. [19] clustered the CEEMDAN-decomposed traffic IMFs with K-means and predicted with BiLSTM, effectively reducing the volatility and non-stationarity of the flow series. These methods mitigate mode mixing to a degree, but residual noise and spurious modes remain; moreover, with small training sets or high memory use, the deep variation features of traffic flow are still hard to capture and further study is needed.

Existing research does not jointly treat the nonlinearity, non-stationarity, and temporal correlation of traffic flow series, and residual noise and spurious modes persist. This paper therefore combines three strengths — CEEMDAN's ability to refine the non-stationarity of traffic flow series [20], the fast convergence and strong search ability of the Sparrow Search Algorithm (SSA) [21], and BiGRU's deep mining of temporal correlation [22] — and proposes a combined model in which an Improved CEEMDAN (ICEEMDAN) decomposes the series and an Improved Sparrow Search Algorithm (ISSA) optimizes the BiGRU.

1 CEEMDAN Theory

EMD is a classical adaptive method for nonlinear and non-smooth signals. Traffic data collected under external disturbances contain considerable noise and require signal processing and denoising. EMD decomposes a signal into IMFs, each a waveform with its own amplitude and frequency, so that trends and periodicity at different time scales become visible. However, the modes obtained by EMD suffer mode mixing, which distorts the time-frequency distribution and strips some components of physical meaning. EEMD adds Gaussian white noise to the original signal and resolves mode mixing, but the added white noise cannot be removed after decomposition, completeness degrades, and reconstruction error grows. To overcome these problems, Torres et al. [23] proposed CEEMDAN, which injects adaptive Gaussian white noise at every decomposition stage, improving completeness while reducing reconstruction error and computational cost. CEEMDAN decomposes a traffic flow time series as follows.

Step 1. Add Gaussian white noise to the original traffic flow series $x$ for the $j$-th time:
$x^{(j)} = x + \beta_j v^{(j)}$. (1)

Step 2. Decompose each noisy realization with EMD to obtain the first mode $IMF_1^{(j)}$, average over realizations, and form the first residual:
$\overline{IMF}_1 = \frac{1}{N}\sum_{j=1}^{N} IMF_1^{(j)}$, (2)
$Res_1 = x - \overline{IMF}_1$. (3)

Step 3. From Step 2, the second mode $\overline{IMF}_2$ and second residual $Res_2$ follow:
$\overline{IMF}_2 = \frac{1}{N}\sum_{j=1}^{N} E_1\big(Res_1 + \beta_1 E_1(v^{(j)})\big)$, (4)
$Res_2 = Res_1 - \overline{IMF}_2$. (5)

Step 4. By analogy, the $k$-th residual $Res_k$ and the $(k{+}1)$-th mode are
$Res_k = Res_{k-1} - \overline{IMF}_k$, (6)
$\overline{IMF}_{k+1} = \frac{1}{N}\sum_{j=1}^{N} E_1\big(Res_k + \beta_k E_k(v^{(j)})\big)$. (7)

Step 5. Repeat until the residual can no longer be decomposed, i.e., it is monotone or has at most two extrema; the final residual is
$Res = x - \sum_{k=1}^{K} \overline{IMF}_k$. (8)

Here $x$ is the original traffic flow series, $\overline{IMF}_k$ the $k$-th decomposed IMF, $v^{(j)}$ the standard-normal Gaussian white noise added in realization $j$ ($j = 1, 2, \ldots, N$), $\beta_j$ the signal-to-noise coefficient at each decomposition stage, and $E_k(\cdot)$ the $k$-th IMF produced by the EMD algorithm. Together these components characterize the original signal at different time scales, clearly showing the trend of the original traffic flow series and effectively lowering prediction error.
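To make Steps 1-5 concrete, the sketch below runs a CEEMDAN-style decomposition on a synthetic flow series. It assumes the third-party PyEMD package (installed as EMD-signal); the paper's own experiments used Matlab, and the series, noise scale, and trial count here are illustrative stand-ins, not the paper's settings.

```python
# Minimal CEEMDAN sketch for a synthetic 5-min traffic flow series.
import numpy as np
from PyEMD import CEEMDAN

rng = np.random.default_rng(0)
t = np.arange(2016)                               # one week of 5-min counts
flow = 300 + 80 * np.sin(2 * np.pi * t / 288) + rng.normal(0, 15, t.size)

ceemdan = CEEMDAN(trials=100, epsilon=0.005)      # N realizations, noise factor
imfs = ceemdan.ceemdan(flow)                      # rows: IMF_1 ... IMF_K

print(imfs.shape)
# Reconstruction error in the spirit of Eq. (8): the modes (plus residue)
# should add back up to the original series up to numerical error.
print(np.abs(imfs.sum(axis=0) - flow).max())
```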
2 SSA-BiGRU Short-Term Traffic Flow Prediction Theory

2.1 SSA

SSA is a swarm intelligence optimization algorithm modeled on the foraging and anti-predation behavior of sparrows [24]. Individuals in the population take three roles: producers, foragers, and scouts. Producers have high energy reserves, strong exploration ability, and a wide search space, and they find food-rich foraging regions for the whole population; when a sparrow detects a predator, the producers lead the others to a safe region to avoid attack. With population size $N$ searching for the optimum in a $d$-dimensional space, the producer position update is

$X_{id}^{Iter+1} = \begin{cases} X_{id}^{Iter} \cdot \exp\!\big(\frac{-i}{\alpha \cdot maxIter}\big), & R_2 < ST \\ X_{id}^{Iter} + Q \cdot L, & R_2 \ge ST \end{cases}$ (9)

where $Iter$ is the current iteration, $X_{id}^{Iter+1}$ the position of the $i$-th sparrow in dimension $d$ ($d = 1, 2, \ldots, dim$), $\alpha \in (0, 1]$ a random number, $maxIter$ the maximum number of iterations, $R_2 \in [0, 1]$ and $ST \in [0.5, 1]$ the alarm value and safety threshold, $Q$ a normally distributed random number, and $L$ a $1 \times dim$ row vector with all elements initialized to 1.

Foragers always follow the producers to obtain high-quality food and increase their energy reserves. Some foragers monitor the producers and compete with them for food; when a forager's reserves are low, it leaves the group to forage alone for survival. The forager position update is

$X_{id}^{Iter+1} = \begin{cases} Q \cdot \exp\!\big(\frac{Xw_d^{Iter} - X_{id}^{Iter}}{i^2}\big), & i > n/2 \\ Xb_d^{Iter+1} + \big|X_{id}^{Iter} - Xb_d^{Iter+1}\big| \cdot A^{+} \cdot L, & \text{otherwise} \end{cases}$ (10)

where $Xw_d^{Iter}$ is the current global worst position, $n$ the number of individuals, $Xb_d^{Iter+1}$ the global best position found by the producers, $A$ a $1 \times dim$ row vector whose elements are randomly assigned 1 or $-1$, and $A^{+} = A^{T}(AA^{T})^{-1}$.

Some individuals act as scouts: they detect predator threats and alert the others to avoid danger. In the simulation experiments such individuals are assumed to make up 10%-20% of the population, with randomly assigned initial positions. The scout position update is

$X_{id}^{Iter+1} = \begin{cases} Xb_d^{Iter} + \beta \cdot \big|X_{id}^{Iter} - Xb_d^{Iter}\big|, & f_i \ne f_g \\ X_{id}^{Iter} + K \cdot \Big(\frac{|X_{id}^{Iter} - Xw_d^{Iter}|}{(f_i - f_w) + \varepsilon}\Big), & f_i = f_g \end{cases}$ (11)

where $Xb_d^{Iter}$ is the current global best position; $\beta$ is a step-size control factor, a random number following a normal distribution with mean 0 and variance 1; $K$ is a random number in $[-1, 1]$; $f_i$ is the fitness (objective value) of the current individual; $f_g$ and $f_w$ are the current global best and worst fitness values; and $\varepsilon$ is a very small number avoiding a zero denominator.

2.2 BiGRU

Recurrent neural networks take sequence data as input, recurse along the direction of sequence evolution, and link all recurrent units in a chain [25]. Cho et al. [26] proposed the Gated Recurrent Unit (GRU), which has fewer parameters to converge; in essence it replaces the hidden unit of the RNN with a GRU block and effectively relieves the vanishing-gradient problem caused by the short-term memory of traditional RNNs.

Figure 1: GRU network structure.

The GRU unit computes a reset gate $r_t$ and an update gate $z_t$ from the current input $x_t$ and the previous output $h_{t-1}$:
$r_t = \sigma(w_r \cdot [h_{t-1}, x_t])$, (12)
$z_t = \sigma(w_z \cdot [h_{t-1}, x_t])$, (13)
where $x_t$ is the input at time $t$, $h_{t-1}$ the hidden state of the previous step, $[\,]$ the concatenation acted on by the weight matrices $w_r$ and $w_z$, and $\sigma(\cdot)$ the sigmoid function. The reset memory $r_t \odot h_{t-1}$ is concatenated with $x_t$ and scaled into $[-1, 1]$ by the tanh activation, giving the candidate hidden state
$\tilde{h}_t = \tanh(w_{\tilde{h}} \cdot [r_t \odot h_{t-1}, x_t])$, (14)
where $\tanh(\cdot)$ is the hyperbolic tangent activation. Since forgetting and memorizing happen simultaneously, the update gate $z_t$ serves as the forget gate and $1 - z_t$ as the input gate, and the GRU output is
$h_t = z_t \odot \tilde{h}_t + (1 - z_t) \odot h_{t-1}$, (15)
where $h_t$ is the GRU output at time $t$ and the gate signal $z_t \in [0, 1]$: the closer to 0, the more data are forgotten; the closer to 1, the more are retained.

Because GRU state is passed in one direction only, the network's output uses just the forward information of the time data. This paper adopts a BiGRU model, combining a forward-backward propagation mechanism [27] with the GRU so that it mines more traffic flow sequence information than a unidirectional GRU. The BiGRU combines the forward and backward hidden states $\vec{h}_t$ and $\overleftarrow{h}_t$ linearly into the final traffic flow prediction
$y_t = w_t \vec{h}_t + v_t \overleftarrow{h}_t + b_t$, (16)
where $w_t$ and $v_t$ are the output-layer weight coefficients of the forward and backward GRU networks and $b_t$ is the bias of the two hidden states of the bidirectional unit at time $t$.

Figure 2: BiGRU network structure.
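The GRU recurrence of Eqs. (12)-(15) can be written out directly. The numpy sketch below is a minimal single-step illustration with random stand-in weights (bias terms omitted), not the paper's trained network; a bidirectional layer simply runs this recurrence forward and backward over the sequence and combines the two hidden states as in Eq. (16).

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, W_r, W_z, W_h):
    xh = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    r_t = sigmoid(W_r @ xh)                       # reset gate, Eq. (12)
    z_t = sigmoid(W_z @ xh)                       # update gate, Eq. (13)
    cand = np.concatenate([r_t * h_prev, x_t])    # reset memory, then concat
    h_tilde = np.tanh(W_h @ cand)                 # candidate state, Eq. (14)
    return z_t * h_tilde + (1.0 - z_t) * h_prev   # output, Eq. (15)

rng = np.random.default_rng(1)
d_in, d_h = 1, 8
W_r, W_z, W_h = (rng.normal(0, 0.1, (d_h, d_h + d_in)) for _ in range(3))

h = np.zeros(d_h)
for x in [0.3, 0.5, 0.4]:                         # a toy normalized flow window
    h = gru_step(np.array([x]), h, W_r, W_z, W_h)
print(h.shape)
```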
3 ICEEMDAN-ISSA-BiGRU: Short-Term Traffic Flow Prediction Model and Algorithm

3.1 ICEEMDAN

CEEMDAN still leaves residual noise and spurious modes, so errors accumulate gradually over iterations and affect the training and performance of the prediction model. The ICEEMDAN proposed here makes two improvements: (1) it estimates the local mean of the noise-added signal and defines the difference from the current residual's mean as the principal mode, reducing the residual noise in the decomposed mode components; (2) when extracting the $k$-th mode it replaces the white noise with $E_k(v^{(j)})$, reducing mode overlap. The steps are as follows [24]:

Step 1. Let $x^{(j)} = x + \beta_0 E_1(v^{(j)})$ and compute the first residual
$Res_1 = \langle M(x^{(j)}) \rangle$, (17)
where $M(\cdot)$ is the local mean of a mode component.

Step 2. Compute the first mode
$IMF_1 = x - Res_1$. (18)

Step 3. The second residual is the mean of $Res_1 + \beta_1 E_2(v^{(j)})$, and the second mode is defined as
$IMF_2 = Res_1 - Res_2 = Res_1 - \langle M(Res_1 + \beta_1 E_2(v^{(j)})) \rangle$. (19)

Step 4. Likewise, the $k$-th residual ($k = 3, \ldots, K$) is
$Res_k = \langle M(Res_{k-1} + \beta_{k-1} E_k(v^{(j)})) \rangle$. (20)

Step 5. The $k$-th ICEEMDAN mode finally follows as
$IMF_k = Res_{k-1} - Res_k$. (21)

Step 6. Return to Step 4 to compute the $(k{+}1)$-th residual.

3.2 Improved SSA Optimizing the BiGRU Network

A BiGRU trained by the traditional gradient-descent iteration of network parameters can fall short in prediction accuracy during the descent [28]. This section therefore improves the combined model's accuracy and convergence speed in two ways: (1) the sparrow search algorithm — among the stronger new swarm intelligence algorithms in search ability and iteration speed — selects the parameters of the bidirectional gated recurrent network; (2) SSA itself is improved with a dynamic adaptive t-distribution mutation to relieve its tendency to fall into local optima.

In swarm intelligence optimization, mutation raises population diversity and helps escape local extrema. The Gaussian and Cauchy mutation operators are the common choices; studies show the two distributions have different probability characteristics, the Gaussian operator being better at local exploitation and the Cauchy operator at global exploration [29]. The dynamic adaptive t-distribution proposed here is a typical standard statistical distribution that combines the advantages of both: its shape varies with the degrees-of-freedom parameter $n$ — at $n = 1$ the t-distribution is the Cauchy distribution, and as $n \to \infty$ it approaches the Gaussian. Since the mutation behavior of the t-distribution operator depends on the mutation scale factor and the degrees of freedom [30], this paper injects the iteration count into both. The mutation scale factor and the improved producer position are

$\sigma = \frac{2\,(e - e^{Iter / maxIter})}{e - 1}$, (22)
$x_i' = x_i + x_i \,\sigma_i\, t(Iter)$, (23)

where $e$ is the base of the natural logarithm, $Iter$ the current iteration, $maxIter$ the maximum iteration, $x_i$ and $x_i'$ the positions of the $i$-th sparrow before and after mutation, and $t(Iter)$ a t-distributed variate with $Iter$ degrees of freedom. As Figure 3 illustrates, at the start of iteration the flock must search a wide space: the mutation scale factor $\sigma$ is large and the t-distribution is close to the Cauchy distribution, giving strong global exploration. As iterations accumulate, the t-distributed mutation gradually evolves toward a Gaussian while $\sigma$ shrinks appropriately, so the flock slowly approaches the global optimum. Injecting the iteration count into both the degrees of freedom and the scale factor $\sigma$ gives the dynamic adaptive t-distribution mutation nonlinear adaptive control of the mutation amplitude and ultimately improves the algorithm's global and local exploration.

Figure 3: (a) Probability density curves of the Cauchy, t, and Gaussian distributions; (b) mutation scale factor curve.

The ISSA-optimized BiGRU then proceeds as follows:

Step 1. Initialize the hyperparameters of BiGRU and ISSA; define the parameter set $W_q$ to be optimized for the $q$-th BiGRU traffic flow sub-network, the current iteration $Iter = 1$, the current weight index $h = 1$, and the maximum iteration $maxIter$.

Step 2. Randomly initialize the attributes of the $N$ sparrows, i.e., assign each a random number in $[0, 1]$.

Step 3. Predict the traffic flow mode components with Eqs. (12)-(16), and compute the error (a mean squared error) between predictions and ground truth by Eq. (24) as the ISSA fitness $f_i$, used to correct the $h$-th weight of the $q$-th BiGRU prediction sub-network:
$f_i = \frac{1}{n}\sum_{i=1}^{n} (himf_i^h - ximf_i^h)^2$, (24)
where $ximf_i^h$ is the $i$-th true mode component value of the IMF and $himf_i^h$ the $i$-th predicted traffic component output while training the $h$-th weight.

Step 4. Update the producer positions by Eq. (23), and the forager and scout positions by Eqs. (10) and (11).

Step 5. Update the global best fitness $f_g = f_i$ and the historical best position $Xb_d^{Iter} = x_i'$. If $Iter < maxIter$, set $Iter = Iter + 1$ and return to Step 4; otherwise take the random number carried by the sparrow at the global best position as the optimal value of the $h$-th weight of the $q$-th BiGRU network.

Step 6. If $h$ has not reached the total number of weight parameters, set $h = h + 1$ and return to Step 2; otherwise output the optimal parameter set $W_q^{*}$ of the BiGRU prediction sub-network.
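Eqs. (22)-(23) are easy to exercise in isolation. The sketch below, with illustrative positions and iteration counts, shows how the mutation shifts from Cauchy-like global moves early on to Gaussian-like local moves near the end.

```python
import numpy as np

def adaptive_t_mutation(x, it, max_iter, rng):
    e = np.e
    sigma = 2 * (e - e ** (it / max_iter)) / (e - 1)        # Eq. (22): 2 -> 0
    return x + x * sigma * rng.standard_t(df=max(it, 1), size=x.shape)  # Eq. (23)

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, size=5)                               # a producer position
print(adaptive_t_mutation(x, it=1,  max_iter=100, rng=rng)) # Cauchy-like, global
print(adaptive_t_mutation(x, it=95, max_iter=100, rng=rng)) # Gaussian-like, local
```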
4 Combined Short-Term Traffic Flow Prediction Modeling

Traffic flow data are strongly nonlinear, non-stationary, and temporally correlated, so a single prediction method rarely achieves ideal results. The ICEEMDAN algorithm therefore decomposes the traffic flow time series; with the BiGRU weight parameters as the optimization target, ISSA determines their optimal values; and the resulting ICEEMDAN-ISSA-BiGRU combined model accurately predicts short-term urban traffic flow. Figure 4 shows the model structure.

Figure 4: Structure of the ICEEMDAN-ISSA-BiGRU short-term traffic flow prediction model.

The steps are:

Step 1. Decompose the traffic flow series to be predicted, $X = (x_1, x_2, \ldots, x_n)^T$, by the procedure of Section 3.1 into $m$ traffic flow mode components (IMFs) that reflect the trend and periodicity of the road network.

Step 2. Train the weights of each BiGRU network repeatedly by the procedure of Section 3.2, obtaining the optimal parameter set $W^{*}$ and the best-performing BiGRU prediction sub-models.

Step 3. Use the sub-models trained in Step 2 to predict the mode components of the traffic flow test set, obtaining prediction sequences $H_q = (h_1, h_2, \ldots, h_n)^T$, where $n$ is the total number of traffic flow samples.

Step 4. By Eq. (8), the true traffic flow is the equal-coefficient sum of the mode components and the residual, so the total prediction $Y$ is the sum of the component predictions: $Y = \sum_{q=1}^{m} H_q = (y_1, y_2, \ldots, y_n)^T$.

5 Experiments and Analysis

5.1 Data Selection and Experimental Parameter Settings

This paper uses the PeMS data set [30] obtained from freeway flow detector VDS-1209092 on I-405 N near Irvine, California, selecting samples from May 1-7, 2019, which include traffic flow, speed, and occupancy at 5-min intervals. The data are split in two: (1) data from May 1-6, 2019 train the model; (2) the flow data of May 7, 2019 tune the model parameters. The trained model then predicts the traffic flow of May 8, 2019, and the predictions are compared against the true values of that day to evaluate performance.

Because the input parameters differ in dimension and vary widely, the model's learning rate can be constrained by the data range, slowing convergence and further reducing stability and accuracy. Before input, each data dimension in the data set is therefore normalized against its extrema:
$x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}}$, (25)
where $x_{norm}$ is the normalized value, $x$ the original value, and $x_{min}$ and $x_{max}$ the minimum and maximum of the corresponding data dimension; every value is thereby mapped into $[0, 1]$.

All experiments run in Matlab R2021b on 64-bit Windows 10; the main hardware is a 3.6 GHz CPU and 32 GB of memory. Simulation parameters were set according to earlier studies and experimental tuning (Table 1).

Table 1: Simulation parameter settings. (The table body was lost in extraction.)

5.2 Model Evaluation Metrics

MAE, MAPE, and RMSE serve as evaluation criteria:
$MAE = \frac{1}{n}\sum_{t=1}^{n} |x_t - \hat{x}_t|$, (26)
$MAPE = \frac{100\%}{n}\sum_{t=1}^{n} \Big|\frac{x_t - \hat{x}_t}{x_t}\Big|$, (27)
$RMSE = \sqrt{\frac{1}{n}\sum_{t=1}^{n} (x_t - \hat{x}_t)^2}$, (28)
where $n$ is the number of test samples and $x_t$ and $\hat{x}_t$ are the true and predicted values at time $t$. Lower values of all three criteria indicate better predictions.

5.3 Comparative Analysis of the Improved Sparrow Search Algorithm

Table 2 lists six benchmark test functions applied to SSA and ISSA to verify ISSA's superiority: three unimodal functions $F_1$-$F_3$ (testing local optimization ability) and three multimodal functions $F_4$-$F_6$ (testing global optimization ability). Each test function is run 20 times, and the best, mean, worst, and standard deviation of the SSA and ISSA results are computed; the mean reflects the algorithm's convergence accuracy and the standard deviation its stability.

Table 2: Test functions (dimension 30).
F1(x) = sum_i x_i^2 — unimodal, bounds [-100, 100], optimum 0
F2(x) = sum_i (sum_{j<=i} x_j)^2 — unimodal, [-100, 100], 0
F3(x) = max_i |x_i|, 1 <= i <= n — unimodal, [-100, 100], 0
F4(x) = sum_i -x_i sin(sqrt(|x_i|)) — multimodal, [-500, 500], -12569
F5(x) = (1/4000) sum_i x_i^2 - prod_i cos(x_i / sqrt(i)) + 1 — multimodal, [-600, 600], 0
F6(x) = sum_i [x_i^2 - 10 cos(2*pi*x_i) + 10] — multimodal, [-5.12, 5.12], 0

Table 3 shows that ISSA successfully obtains the optimal solution on all test functions, with better best, mean, worst, and standard-deviation values, demonstrating the accuracy and stability of ISSA optimization. The dynamic adaptive t-distribution mutation thus effectively strengthens local and global search while improving the algorithm's convergence accuracy, stability, and speed.

Table 3: SSA and ISSA test results (best / mean / worst / standard deviation).
F1 — SSA: 4.8801e-35 / 8.2877e-20 / 8.6216e-19 / 2.1765e-19; ISSA: 0 / 2.2508e-196 / 6.2383e-195 / 0
F2 — SSA: 1.3876e-34 / 3.5179e-19 / 3.2669e-18 / 7.9483e-19; ISSA: 0 / 2.7370e-218 / 8.2109e-217 / (lost in extraction)
F3 — SSA: 4.6777e-21 / 2.1774e-12 / 4.1323e-11 / 7.9168e-12; ISSA: 0 / 2.9173e-105 / 4.2388e-104 / 9.8060e-105
F4 — SSA: -11897.79 / -7980.75 / -3539.17 / 2687.95; ISSA: -12653.87 / -12163.34 / -10673.76 / 787.21
F5 — SSA: 0 / 3.7007e-18 / 1.1102e-16 / 1.9929e-17; ISSA: 0 / 0 / 0 / 0
F6 — SSA: 0 / 1.8948e-15 / 5.6843e-14 / 1.0204e-14; ISSA: 0 / (remaining entries lost in extraction)

5.4 Traffic Flow Prediction Evaluation and Analysis

To verify the superiority of the proposed combined model ICEEMDAN-ISSA-BiGRU, ten baseline prediction models are built on the same data set and compared against it. Table 4 lists the typical models: models 1-5 form ablation comparisons with our model, and models 6-10 form cross-model performance comparisons.

Table 4: Typical traffic flow prediction models.
Model 1 — gated recurrent network: GRU
Model 2 — bidirectional gated recurrent unit: BiGRU
Model 3 — BiGRU optimized by SSA: SSA-BiGRU
Model 4 — BiGRU optimized by the improved SSA of this paper: ISSA-BiGRU
Model 5 — CEEMDAN decomposition plus the improved SSA optimizing BiGRU: CEEMDAN-ISSA-BiGRU
Model 6 — autoregressive integrated moving average model: ARIMA
Model 7 — back propagation network: BPNN
Model 8 — long short-term memory recurrent network: LSTM
Model 9 — CEEMDAN decomposition of the flow data, GWO-optimized LSTM: CEEMDAN-GWO-LSTM
Model 10 — CEEMDAN decomposition of the flow data, K-means-optimized BiLSTM: CEEMDAN-K-means-BiLSTM
Ours — the improved CEEMDAN decomposing the flow data, the improved SSA optimizing BiGRU: ICEEMDAN-ISSA-BiGRU

Table 5 compares the traffic flow prediction performance of the different models; to show the errors more intuitively, Figure 5 plots the performance indicators of the typical models as curves. Our model attains the best MAE, MAPE, and RMSE and the highest prediction accuracy.

Table 5: Traffic flow prediction performance (MAE / MAPE% / RMSE).
Model 1: 34.02 / 37.31 / 40.32
Model 2: 29.04 / 28.22 / 36.76
Model 3: 22.56 / 23.75 / 24.55
Model 4: 18.67 / 19.37 / 20.14
Model 5: 14.11 / 15.45 / 15.97
Model 6: 33.74 / 36.25 / 50.66
Model 7: 34.12 / 37.19 / 49.86
Model 8: 34.96 / 38.32 / 50.98
Model 9: 16.23 / 17.37 / 19.34
Model 10: 15.34 / 15.89 / 16.56
Ours: 10.98 / 10.12 / 12.42

(1) From Figure 5(a) and Table 5: through its bidirectionality and gating mechanism, BiGRU captures contextual information and long-term dependencies in the sequence better and obtains richer temporal features; its prediction curve fits the original flow curve more closely than GRU's and accuracy improves markedly, supporting the choice of the bidirectional gated recurrent unit.
(2) From Figure 5(b) and Table 5: the BiGRU whose best network parameters are trained by the improved adaptive sparrow search algorithm instead of gradient descent is more accurate than the SSA-BiGRU model — MAE falls by 3.89, MAPE by 4.38%, and RMSE by 14.41 — verifying that the improved optimizer raises prediction accuracy.
(3) From Figure 5(c) and Table 5: compared with CEEMDAN-ISSA-BiGRU, our model lowers MAE by 3.31, MAPE by 5.33%, and RMSE by 3.55, verifying that the improved CEEMDAN effectively reduces signal noise, improves signal reliability, and lessens the interference and overlap between decomposed mode components, improving decomposition quality and accuracy.
(4) From Figure 5(d) and Table 5: among the single models, prediction accuracies are similar, and the BPNN, which takes spatiotemporal traffic features into account, shows no clear advantage over the ARIMA, LSTM, and GRU models that consider only temporal features. Compared with single models on raw flow data, the decomposition-based combined models improve MAE, MAPE, and RMSE; the SSA-BiGRU model improves MAPE and RMSE over BiGRU by 4.47% and 12.21 respectively.
(5) From Figure 5(e) and Table 5: our combined model compensates for CEEMDAN-GWO-LSTM's and CEEMDAN-K-means-BiLSTM's neglect of non-stationarity and of the residual noise and spurious modes in CEEMDAN, so its prediction performance stands out: relative to the stronger CEEMDAN-K-means-BiLSTM, MAE, MAPE, and RMSE fall by 4.36, 5.77%, and 4.13. The results show that the proposed combined prediction model performs well for traffic flow prediction.

Figure 5: Panels (a) GRU vs. BiGRU prediction curves; (b) combined-model performance before and after the SSA improvement. (The remaining panels were lost in extraction.)
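As a closing illustration of the Section 4 pipeline, the sketch below wires per-component BiGRU sub-networks to a decomposition stand-in and sums their forecasts. It assumes TensorFlow/Keras (the paper used Matlab), substitutes ordinary gradient training for the paper's ISSA weight search, and uses a toy three-component split in place of ICEEMDAN; window length, sizes, and epochs are illustrative.

```python
import numpy as np
import tensorflow as tf

def make_windows(s, L=4):
    X = np.stack([s[i:i + L] for i in range(len(s) - L)])
    return X[..., None], s[L:]                     # shapes (n, L, 1) and (n,)

def build_bigru(L=4):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(L, 1)),
        tf.keras.layers.Bidirectional(tf.keras.layers.GRU(16)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

rng = np.random.default_rng(3)
flow = 300 + 80 * np.sin(2 * np.pi * np.arange(600) / 288) + rng.normal(0, 15, 600)
imfs = [flow * w for w in (0.6, 0.3, 0.1)]         # stand-in for ICEEMDAN output

pred = 0.0
for imf in imfs:                                   # one BiGRU sub-network per IMF
    X, y = make_windows(imf)
    net = build_bigru()
    net.fit(X[:-50], y[:-50], epochs=5, verbose=0)
    pred = pred + net.predict(X[-50:], verbose=0).ravel()

X_full, y_full = make_windows(flow)
print(np.mean(np.abs(pred - y_full[-50:])))        # MAE of the recombined forecast
```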
How to use RandomForestsGLS

The package RandomForestsGLS fits non-linear regression models on dependent data with Generalised Least Squares (GLS) based Random Forest (RF-GLS), detailed in Saha, Basu and Datta (2020), https://arxiv.org/abs/2007.15421.

We will start by loading the RandomForestsGLS R package.

library(RandomForestsGLS)

Next, we discuss how the RandomForestsGLS package can be used for estimation and prediction in a non-linear regression setup under correlated errors in different scenarios.

1. Spatial Data

We consider spatial point-referenced data with the following model:

y_i = m(x_i) + w(s_i) + ε_i,

where y_i and x_i respectively denote the observed response and the covariate corresponding to the i-th observed location s_i, m(x_i) denotes the covariate effect, the spatial random effect w(s) accounts for spatial dependence beyond covariates, and ε_i accounts for independent and identically distributed random Gaussian noise. In this spatial mixture-model setting, the package RandomForestsGLS allows fitting m(.) using RF-GLS. Spatial random effects are modeled using a Gaussian Process, as is the practice. For model fitting, we use the computationally convenient Nearest Neighbor Gaussian Process (NNGP) (Datta, Banerjee, Finley, and Gelfand (2016)). Along with prediction of the covariate effect (mean function) m(.), we also offer kriging-based prediction of spatial responses at new locations.

Illustration

We simulate data from the following model:

y_i = 10 sin(π x_i) + w(s_i) + ε_i;  ε ~ N(0, τ²I), τ² = 0.1;  w ~ exponential GP, σ² = 10, φ = 1.

Here, the mean function is E(Y) = 10 sin(πX); w accounts for the spatial correlation and is generated as a Gaussian process with exponential covariance, spatial variance σ² = 10, and spatial correlation decay φ = 1; ε is the i.i.d. random noise with variance τ² = 0.1, also called the nugget in the spatial literature. For illustration purposes, we simulate with n = 200:

rmvn <- function(n, mu = 0, V = matrix(1)) {
  p <- length(mu)
  if (any(is.na(match(dim(V), p)))) stop("Dimension not right!")
  D <- chol(V)
  t(matrix(rnorm(n * p), ncol = p) %*% D + rep(mu, rep(n, p)))
}

set.seed(5)
n <- 200
coords <- cbind(runif(n, 0, 1), runif(n, 0, 1))
set.seed(2)
x <- as.matrix(runif(n), n, 1)
sigma.sq <- 10
phi <- 1
tau.sq <- 0.1
D <- as.matrix(dist(coords))
R <- exp(-phi * D)
w <- rmvn(1, rep(0, n), sigma.sq * R)
y <- rnorm(n, 10 * sin(pi * x) + w, sqrt(tau.sq))

Model fitting

In the package RandomForestsGLS, the working precision matrices used in the GLS loss are NNGP approximations of precision matrices corresponding to the Matérn covariance function. In order to fit the model, the code requires:

- Coordinates (coords): an n x 2 matrix of 2-dimensional locations.
- Response (y): an n-length vector of responses at the observed coordinates.
- Covariates (X): an n x p matrix of the covariates at the observation coordinates.
- Covariates for estimation (Xtest): an ntest x p matrix of the covariates where we want to estimate the function. Must have identical variables to X. Default is X.
- Minimum size of leaf nodes (nthsize): we recommend not setting this value too small, as that will lead to very deep trees that take a long time to build and can produce unstable estimates. Default value is 20.
- The parameters corresponding to the covariance function (detailed afterwards).

For the details on the choice of other parameters, please refer to the help file of RFGLS_estimate_spatial, which can be accessed with ?RFGLS_estimate_spatial.

Known Covariance Parameters

If the covariance parameters are known, we set param_estimate = FALSE (the default value); the code additionally requires the following:

- Covariance model (cov.model): supported keywords are "exponential", "matern", "spherical", and "gaussian" for
exponential, Matérn, spherical, and Gaussian covariance functions respectively. Default value is "exponential".
- σ² (sigma.sq): the spatial variance. Default value is 1.
- τ² (tau.sq): the nugget. Default value is 0.01.
- φ (phi): the spatial correlation decay parameter. Default value is 5.
- ν (nu): the smoothness parameter corresponding to the Matérn covariance function. Default value is 0.5.

We can fit the model as follows:

set.seed(1)
est_known <- RFGLS_estimate_spatial(coords, y, x, ntree = 50, cov.model = "exponential",
                                    nthsize = 20, sigma.sq = sigma.sq, tau.sq = tau.sq, phi = phi)

The estimate of the function at the covariates Xtest is given in estimation_result$predicted. For interpretation of the rest of the outputs, please see the help file of RFGLS_estimate_spatial. Using covariance models other than the exponential model is in the beta-testing stage.

Unknown Covariance Parameters

If the covariance parameters are not known, we set param_estimate = TRUE; the code additionally requires the covariance model (cov.model) to be used for parameter estimation prior to RF-GLS fitting. We fit the model with unknown covariance parameters as follows:

set.seed(1)
est_unknown <- RFGLS_estimate_spatial(coords, y, x, ntree = 50, cov.model = "exponential",
                                      nthsize = 20, param_estimate = TRUE)

Prediction of mean function

Given a model fitted using RFGLS_estimate_spatial, we can estimate the mean function at new covariate values as follows:

Xtest <- matrix(seq(0, 1, by = 1/10000), 10001, 1)
RFGLS_predict_known <- RFGLS_predict(est_known, Xtest)

Performance comparison

We obtain the Mean Integrated Squared Error (MISE) of the estimate of m from RF-GLS on [0, 1] and compare it with that corresponding to the classical Random Forest (RF) obtained using the package randomForest (with a similar minimum node size, nodesize = 20, as the default nodesize performs worse). We see that our method has a significantly smaller MISE. Additionally, we show that the MISE obtained with unknown parameters in RF-GLS is comparable to the MISE obtained with known covariance parameters.

library(randomForest)
set.seed(1)
RF_est <- randomForest(x, y, nodesize = 20)
RF_predict <- predict(RF_est, Xtest)

# RF MISE
mean((RF_predict - 10 * sin(pi * Xtest))^2)
#> [1] 8.36778

# RF-GLS MISE
mean((RFGLS_predict_known$predicted - 10 * sin(pi * Xtest))^2)
#> [1] 0.150152

RFGLS_predict_unknown <- RFGLS_predict(est_unknown, Xtest)

# RF-GLS unknown MISE
mean((RFGLS_predict_unknown$predicted - 10 * sin(pi * Xtest))^2)
#> [1] 0.1851928

We plot the true m(x) = 10 sin(πx) along with the loess-smoothed versions of the estimates obtained from RF-GLS and RF, which shows that the RF-GLS estimate approximates m(x) better than that corresponding to RF.

rfgls_loess_10 <- loess(RFGLS_predict_known$predicted ~ c(1:length(Xtest)), span = 0.1)
rfgls_smoothed10 <- predict(rfgls_loess_10)
rf_loess_10 <- loess(RF_predict ~ c(1:length(RF_predict)), span = 0.1)
rf_smoothed10 <- predict(rf_loess_10)
xval <- c(10 * sin(pi * Xtest), rf_smoothed10, rfgls_smoothed10)
xval_tag <- c(rep("Truth", length(10 * sin(pi * Xtest))), rep("RF", length(rf_smoothed10)),
              rep("RF-GLS", length(rfgls_smoothed10)))
plot_data <- as.data.frame(xval)
plot_data$Methods <- xval_tag
coval <- c(rep(seq(0, 1, by = 1/10000), 3))
plot_data$Covariate <- coval
library(ggplot2)
ggplot(plot_data, aes(x = Covariate, y = xval, color = Methods)) + geom_point() +
  labs(x = "x") + labs(y = "f(x)")

Prediction of spatial response

Given a model fitted using RFGLS_estimate_spatial, we can predict the spatial response/outcome at new locations provided the covariates at those locations. This approach performs kriging at a new location using the mean-function estimates at the corresponding covariate values. Here we partition the simulated data into training and test sets in a 4:1 ratio. Next we
perform prediction on the test set using a model fitted on the training set.

est_known_short <- RFGLS_estimate_spatial(coords[1:160, ], y[1:160], matrix(x[1:160, ], 160, 1),
                                          ntree = 50, cov.model = "exponential", nthsize = 20,
                                          param_estimate = TRUE)
RFGLS_predict_spatial <- RFGLS_predict_spatial(est_known_short, coords[161:200, ],
                                               matrix(x[161:200, ], 40, 1))
pred_mat <- as.data.frame(cbind(RFGLS_predict_spatial$prediction, y[161:200]))
colnames(pred_mat) <- c("Predicted", "Observed")
ggplot(pred_mat, aes(x = Observed, y = Predicted)) + geom_point() +
  geom_abline(intercept = 0, slope = 1, color = "blue") + ylim(0, 16) + xlim(0, 16)

Misspecification in the covariance model

The following example considers a setting where the parameters are estimated from a misspecified covariance model. We simulate the spatial correlation from a Matérn covariance function with smoothness parameter ν = 1.5. While fitting the RF-GLS, we estimate the covariance parameters using an exponential covariance model (ν = 0.5) and show that the obtained MISE still compares favorably to that of classical RF.

# Data simulation from Matern with nu = 1.5
nu <- 3/2
R1 <- (D * phi)^nu / (2^(nu - 1) * gamma(nu)) * besselK(x = D * phi, nu = nu)
diag(R1) <- 1
set.seed(2)
w <- rmvn(1, rep(0, n), sigma.sq * R1)
y <- rnorm(n, 10 * sin(pi * x) + w, sqrt(tau.sq))

# RF-GLS with exponential covariance
set.seed(3)
est_misspec <- RFGLS_estimate_spatial(coords, y, x, ntree = 50, cov.model = "exponential",
                                      nthsize = 20, param_estimate = TRUE)
RFGLS_predict_misspec <- RFGLS_predict(est_misspec, Xtest)

# RF
set.seed(4)
RF_est <- randomForest(x, y, nodesize = 20)
RF_predict <- predict(RF_est, Xtest)

# RF-GLS MISE
mean((RFGLS_predict_misspec$predicted - 10 * sin(pi * Xtest))^2)
#> [1] 0.1380569

# RF MISE
mean((RF_predict - 10 * sin(pi * Xtest))^2)
#> [1] 2.295639

2. Autoregressive Time Series Data

RF-GLS can also be used for function estimation in a time-series setting under autoregressive errors. We consider time-series data with errors from an AR(q) process as follows:

y_t = m(x_t) + e_t;  e_t = Σ_{i=1}^{q} ρ_i e_{t-i} + η_t,

where y_t and x_t denote the response and the covariate at the t-th time point, e_t is an AR(q) process, η_t denotes the i.i.d. white noise, and (ρ_1, ..., ρ_q) are the model parameters that capture the dependence of e_t on (e_{t-1}, ..., e_{t-q}). In the AR time-series scenario, the package RandomForestsGLS allows fitting m(.) using RF-GLS. RF-GLS exploits the sparsity of the closed-form precision matrix of the AR process for model fitting and prediction of the mean function m(.).

Illustration

Here, we simulate from an AR(1) process as follows:

y = 10 sin(πx) + e;  e_t = ρ e_{t-1} + η_t;  η_t ~ N(0, σ²);  e_1 = η_1;  ρ = 0.9;  σ² = 10.

Here, E(Y) = 10 sin(πX); e, which is an AR(1) process, accounts for the temporal correlation; σ² denotes the variance of the white-noise part of the AR(1) process; and ρ captures the degree of dependence of e_t on e_{t-1}.
For illustration purposes, we simulate with n = 200:

rho <- 0.9
set.seed(1)
b <- rho
s <- sqrt(sigma.sq)
eps <- arima.sim(list(order = c(1, 0, 0), ar = b), n = n, rand.gen = rnorm, sd = s)
y <- c(eps + 10 * sin(pi * x))

Model fitting

In the case of time-series data, the code requires:

- Response (y): an n-length vector of responses at the observed time points.
- Covariates (X): an n x p matrix of the covariates at the observation time points.
- Covariates for estimation (Xtest): an ntest x p matrix of the covariates where we want to estimate the function. Must have identical variables to X. Default is X.
- Minimum size of leaf nodes (nthsize): we recommend not setting this value too small, as that will lead to very deep trees that take a long time to build and can produce unstable estimates. Default value is 20.
- The parameters corresponding to the AR process (detailed afterwards).

For the details on the choice of other parameters, please refer to the help file of RFGLS_estimate_timeseries, which can be accessed with ?RFGLS_estimate_timeseries.

Known AR process parameters

If the AR process parameters are known, we set param_estimate = FALSE (the default value); the code additionally requires lag_params = c(ρ_1, ..., ρ_q). We can fit the model as follows:

set.seed(1)
est_temp_known <- RFGLS_estimate_timeseries(y, x, ntree = 50, lag_params = rho, nthsize = 20)

Unknown AR process parameters

If the AR process parameters are not known, we set param_estimate = TRUE; the code requires the order of the AR process, which is obtained from the length of the lag_params input vector. Hence, to estimate the parameters of an AR(q) process, lag_params should be any vector of length q. Here we fit the model with q = 1:

set.seed(1)
est_temp_unknown <- RFGLS_estimate_timeseries(y, x, ntree = 50, lag_params = rho, nthsize = 20,
                                              param_estimate = TRUE)

Prediction of mean function

This part of the time-series data analysis is identical to that corresponding to the spatial data.

Xtest <- matrix(seq(0, 1, by = 1/10000), 10001, 1)
RFGLS_predict_temp_known <- RFGLS_predict(est_temp_known, Xtest)

Here also, similar to the spatial data scenario, RF-GLS outperforms classical RF in terms of MISE, both with true and with estimated AR process parameters.

library(randomForest)
set.seed(1)
RF_est_temp <- randomForest(x, y, nodesize = 20)
RF_predict_temp <- predict(RF_est_temp, Xtest)

# RF MISE
mean((RF_predict_temp - 10 * sin(pi * Xtest))^2)
#> [1] 7.912517

# RF-GLS MISE
mean((RFGLS_predict_temp_known$predicted - 10 * sin(pi * Xtest))^2)
#> [1] 2.471876

RFGLS_predict_temp_unknown <- RFGLS_predict(est_temp_unknown, Xtest)

# RF-GLS unknown MISE
mean((RFGLS_predict_temp_unknown$predicted - 10 * sin(pi * Xtest))^2)
#> [1] 0.8791857

Misspecification in the AR process order

We consider a scenario where the order of autoregression used for RF-GLS model fitting is misspecified. We simulate the AR errors from an AR(2) process and fit RF-GLS with an AR(1) process.

# Simulation from an AR(2) process
rho1 <- 0.7
rho2 <- 0.2
set.seed(2)
b <- c(rho1, rho2)
s <- sqrt(sigma.sq)
eps <- arima.sim(list(order = c(2, 0, 0), ar = b), n = n, rand.gen = rnorm, sd = s)
y <- c(eps + 10 * sin(pi * x))

# RF-GLS with AR(1)
set.seed(3)
est_misspec_temp <- RFGLS_estimate_timeseries(y, x, ntree = 50, lag_params = 0, nthsize = 20,
                                              param_estimate = TRUE)
RFGLS_predict_misspec_temp <- RFGLS_predict(est_misspec_temp, Xtest)

# RF
set.seed(4)
RF_est_temp <- randomForest(x, y, nodesize = 20)
RF_predict_temp <- predict(RF_est_temp, Xtest)

# RF-GLS MISE
mean((RFGLS_predict_misspec_temp$predicted - 10 * sin(pi * Xtest))^2)
#> [1] 1.723218

# RF MISE
mean((RF_predict_temp - 10 * sin(pi * Xtest))^2)
#> [1] 3.735003

Parallelization

For RFGLS_estimate_spatial, RFGLS_estimate_timeseries, RFGLS_predict, and RFGLS_predict_spatial, one can also take advantage of
parallelization, contingent upon the availability of multiple cores. The argument h in all the functions determines the number of cores to be used. Here we demonstrate an example with h = 2.

# Simulation from the exponential covariance
set.seed(5)
n <- 200
coords <- cbind(runif(n, 0, 1), runif(n, 0, 1))
set.seed(2)
x <- as.matrix(runif(n), n, 1)
sigma.sq <- 10
phi <- 1
tau.sq <- 0.1
nu <- 0.5
D <- as.matrix(dist(coords))
R <- exp(-phi * D)
w <- rmvn(1, rep(0, n), sigma.sq * R)
y <- rnorm(n, 10 * sin(pi * x) + w, sqrt(tau.sq))

# RF-GLS model fitting and prediction with parallel computation
set.seed(1)
est_known_pl <- RFGLS_estimate_spatial(coords, y, x, ntree = 50, cov.model = "exponential",
                                       nthsize = 20, sigma.sq = sigma.sq, tau.sq = tau.sq,
                                       phi = phi, h = 2)
RFGLS_predict_known_pl <- RFGLS_predict(est_known_pl, Xtest, h = 2)

# MISE from a single core
mean((RFGLS_predict_known$predicted - 10 * sin(pi * Xtest))^2)
#> [1] 0.150152

# MISE from parallel computation
mean((RFGLS_predict_known_pl$predicted - 10 * sin(pi * Xtest))^2)
#> [1] 0.150152

For RFGLS_estimate_spatial with a very small data set (n) and a small number of trees (ntree), the communication overhead between the nodes outweighs the benefits of parallel computing, hence it is recommended to parallelize only for moderately large n and/or ntree. It is strongly recommended that the maximum value of h be kept strictly less than the total number of available cores. Parallelization for RFGLS_estimate_timeseries can be handled identically. For RFGLS_predict and RFGLS_predict_spatial, single-core performance is very fast even for large data sets, so unless ntest and ntree are very large, we do not recommend parallelization for those functions.
Application of Time-series Decomposition with Dummy Variables to Cigarette Sales Forecast
LUO Biao, YAN Wei-Wei, WAN Liang
(School of Management, University of Science & Technology of China, Hefei 230026, China)
Time-series decomposition separates a series into trend (Trend, T), seasonal (Seasonal, S), cyclical (Circle, C), and irregular (Irregular, I) components and can forecast a time series well [2]. Existing decomposition methods, however, analyze the seasonal component against the solar calendar only, so they cannot reflect the consumption patterns of Chinese lunar festivals such as Mid-Autumn and Spring Festival and cannot accurately predict festival-driven sales growth [3], even though these festivals themselves strongly affect cigarette sales. To solve this problem, this paper introduces dummy variables into the time-series decomposition method, encoding the traditional lunar festivals that matter most for cigarette sales as dummy variables so as to fit the sales dynamics better and improve forecast accuracy.

1 Review of Monthly Cigarette Sales Forecasting Methods

Compared with annual and quarterly forecasts, monthly cigarette sales forecasting must account for the influence of traditional Chinese lunar festivals on solar-calendar monthly sales. In practice, the festival effect is usually estimated qualitatively and the quantitative forecast is then adjusted by experience [10]; such forecasting is easily swayed by subjective factors and can only serve as an auxiliary tool. Zou Yimin (2008) [11] proposed converting solar-calendar sales data into lunar-calendar data, but this requires programming, must handle leap months and the actual number of days in each month, and involves frequent conversion between the two calendars; it is inconvenient in practice and its accuracy remains low, with a maximum relative error still at 132.63%. This paper instead considers introducing dummy variables into the time-series decomposition method [1], as sketched below.
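A minimal sketch of the dummy-variable idea: regress monthly sales on trend, seasonal terms, and a festival indicator. It assumes the statsmodels package, and the spring_festival flag below is a hypothetical stand-in for flags derived from a real lunar-calendar table; the sales series is synthetic.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
months = np.arange(60)                                # five years of monthly sales
trend = 100 + 0.5 * months
season = 10 * np.sin(2 * np.pi * months / 12)
# Hypothetical flag: lunar new year assumed to fall in calendar months 1-2.
spring_festival = ((months % 12 == 0) | (months % 12 == 1)).astype(float)
sales = trend + season + 25 * spring_festival + rng.normal(0, 3, 60)

X = sm.add_constant(np.column_stack([months,
                                     np.sin(2 * np.pi * months / 12),
                                     np.cos(2 * np.pi * months / 12),
                                     spring_festival]))
fit = sm.OLS(sales, X).fit()
print(fit.params.round(2))   # the last coefficient estimates the festival lift
```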
Comparison of Time-Series Prediction of Freight Transportation Volume in China Based on a CNN-LSTM-Attention Combination Model

Authors: YAN Xuebo (School of Management, Fujian University of Technology, Fuzhou 350118, China), CAO Shixin (School of Transportation, Fujian University of Technology, Fuzhou 350118, China)
Source: Logistics Sci-Tech (物流科技), 2024, No. 14
Abstract: To further improve the prediction accuracy of China's freight volume, this paper introduces an attention mechanism into a combined prediction model based on a convolutional neural network and a long short-term memory network for time-series forecasting of China's freight volume. First, the convolutional neural network extracts the change features of the freight volume data; next, the extracted features form a time series that is fed into the long short-term memory network; finally, attention is focused on the information features output by the LSTM layer of the prediction model, dividing the weight proportions and extracting the key information to realize the freight volume forecast. Time-series forecasts on the national monthly freight volume history, compared against the evaluation indicators of other neural network predictors, show that the CNN-LSTM-Attention model has a smaller prediction error than the other models and relatively good accuracy.

Key words: freight volume; prediction; CNN; LSTM; attention mechanism
CLC number: F259.22. Document code: A. DOI: 10.13714/j.cnki.1002-3100.2024.14.002. Article ID: 1002-3100(2024)14-0005-05.

0 Introduction

In recent years, China's total freight volume has kept growing, but the growth rate has been slowing overall, in large part because freight volume forecasts are not accurate and reasonable enough, which leads to wasted resources [1].
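A minimal Keras sketch of the CNN-LSTM-Attention chain the abstract describes — convolutional feature extraction, a sequence-preserving LSTM, and a learned softmax weighting over time steps. Layer sizes and the 12-month window are illustrative assumptions, not the authors' configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

L = 12                                              # months of history per sample
inp = layers.Input(shape=(L, 1))
x = layers.Conv1D(32, kernel_size=3, padding="causal", activation="relu")(inp)
x = layers.LSTM(32, return_sequences=True)(x)       # keep per-step features
score = layers.Dense(1, activation="tanh")(x)       # attention score per step
weight = layers.Softmax(axis=1)(score)              # attention weights
context = layers.Lambda(
    lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([x, weight])  # weighted sum
out = layers.Dense(1)(context)                      # next-month freight volume

model = tf.keras.Model(inp, out)
model.compile(optimizer="adam", loss="mae")
model.summary()
```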
Application of Pattern Recognition Methods in Time Series Analysis

Abstract: A time series is usually a sequence of observations arranged in time order, sampled at fixed time intervals. Time Series Analysis is a statistical method for processing dynamic data: it makes full use of available methods to process a time series and mine the information useful for studying and solving a problem. Classical time series analysis has produced substantial results in modeling and forecasting, but because time series in practical applications exhibit irregular, chaotic, and other nonlinear characteristics, predicting the full future behavior of a system is nearly impossible, accurate prediction of system behavior is hard to achieve, and an ideal stochastic model of the system is difficult to build. Pattern recognition techniques such as neural networks, genetic algorithms, and wavelet transforms enable effective analysis and processing of non-stationary time series and can predict the behavior of some nonlinear systems, which to some extent compensates for the shortcomings of stochastic time-series analysis [1]. This paper describes and analyzes several common methods of time series analysis, with emphasis on typical applications of pattern recognition methods — neural networks, genetic algorithms, and wavelet transforms — in time series analysis.

Keywords: time series analysis; pattern recognition; application

1 Overview

1.1 Purpose and significance of this work

Time series analysis is a branch of probability theory and mathematical statistics. Grounded in probability and statistics, it analyzes random data sequences (also called dynamic data sequences), builds mathematical models for them — determining the model order and estimating parameters — and further applies the models to forecasting, adaptive control, optimal filtering, and many other areas.
Because univariate time series analysis and forecasting occupy an important position in modern signal processing, economics, agriculture, and other fields, new algorithms, theories, and research methods keep emerging. At present, research on time-series analysis models combined with various artificial intelligence methods is also deepening. Time series analysis is by now a well-developed discipline with a complete set of analytical theory and tools. Traditional time-series techniques focus on stochastic dynamic data, extracting from them the laws governing the evolution of the system that generated the series; the methodology emphasizes the construction of global models and is mainly applied to predicting and controlling system behavior. Time series analysis is mainly used for the following purposes:

a. System description: objectively describing a system by curve fitting from the observed time-series data.
b. System analysis: when observations cover two or more variables, using changes in one time series to explain changes in another, giving deeper insight into the mechanism that generated the given series.
c. Future prediction: fitting the time series with a mathematical model, generally to predict its future values.
d. Decision and control: adjusting input variables according to the time-series model so that the system's evolution stays on the target value, i.e., intervening as soon as a deviation from the target is predicted.
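As a small example of use (c), the sketch below fits an ARIMA model to a synthetic AR(1) series and forecasts future values, assuming the statsmodels package.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
y = np.zeros(200)
for t in range(1, 200):                 # simulate y_t = 0.8 y_{t-1} + noise
    y[t] = 0.8 * y[t - 1] + rng.normal()

res = ARIMA(y, order=(1, 0, 0)).fit()   # order determination + estimation
print(res.params)                       # const, ar.L1, sigma2
print(res.forecast(steps=5))            # predict the next five observations
```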
A New Partial Neural Evolution Network for Stock Prediction

Abstract: Stock prediction has always been a challenging task due to its high complexity and uncertainty. In recent years, with the advancement of artificial intelligence and machine learning techniques, researchers have developed various models to improve the accuracy of stock prediction. In this study, we propose a new approach called the Partial Neural Evolution Network (P-NEN) for stock prediction. P-NEN combines the advantages of neural networks and evolutionary algorithms to enhance predictive performance. Through experiments and comparisons with existing models, we demonstrate the effectiveness of P-NEN in stock prediction.

1. Introduction

Stock prediction plays a vital role in financial decision-making and has attracted significant attention from researchers and investors. Accurate prediction of stock prices has considerable potential for maximizing profits and minimizing risks. In recent years, artificial intelligence and machine learning have shown promising results in various domains, including finance. Applying these techniques to stock prediction has therefore become an active area of research.

2. Methodology

2.1 Neural Networks

Neural networks have demonstrated great success in various fields, including pattern recognition and time series prediction. They are composed of interconnected artificial neurons and can learn complex patterns from historical data. In stock prediction, neural networks can capture the nonlinear relationships between the input features and the target variable.

2.2 Evolutionary Algorithms

Evolutionary algorithms, such as genetic algorithms and particle swarm optimization, are widely used in optimization problems. These algorithms imitate the process of natural evolution and can effectively search for the optimal solution within a large parameter space. In stock prediction, evolutionary algorithms can optimize a neural network's parameters to enhance its performance.

2.3 Partial Neural Evolution Network (P-NEN)

Inspired by the above two techniques, we propose a novel model called P-NEN for stock prediction. P-NEN consists of a neural network architecture and an evolutionary algorithm. The neural network is responsible for capturing the patterns and relationships in the historical stock data, while the evolutionary algorithm optimizes the neural network's parameters. P-NEN incorporates partial connections between neurons to improve the network's efficiency and prevent overfitting.
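The paper gives no implementation details, so the toy sketch below only illustrates the combination described in Section 2.3: a tiny network whose weights are searched by a simple elitist evolution loop, with a fixed random mask standing in for the partial connections. It is an assumption-laden illustration, not the authors' exact P-NEN; the features and target are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))                      # toy indicator features
y = np.tanh(X @ np.array([0.5, -0.3, 0.2, 0.1]))   # toy next-day target

mask = (rng.random((4, 8)) < 0.6)                  # partial input->hidden links

def predict(params, X):
    W1 = params[:32].reshape(4, 8) * mask          # masked (partial) layer
    w2 = params[32:40]
    return np.tanh(X @ W1) @ w2

def mse(params):
    return np.mean((predict(params, X) - y) ** 2)

pop = rng.normal(0, 0.5, size=(30, 40))            # population of weight vectors
for gen in range(200):                             # evolve: select elites, mutate
    fit = np.array([mse(p) for p in pop])
    elite = pop[np.argsort(fit)[:10]]
    children = elite[rng.integers(0, 10, 20)] + rng.normal(0, 0.1, (20, 40))
    pop = np.vstack([elite, children])

print(min(mse(p) for p in pop))                    # best training error found
```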
3. Experimental Results

To evaluate the performance of P-NEN, we conduct experiments on real-world stock data. We compare P-NEN with traditional neural networks, genetic algorithms, and other state-of-the-art models. The evaluation metrics include accuracy, precision, recall, and F1-score. The results demonstrate that P-NEN outperforms the baseline models and achieves higher accuracy in stock prediction.

4. Discussion

The experimental results validate the effectiveness of P-NEN in stock prediction. By combining neural networks and evolutionary algorithms, P-NEN takes advantage of both techniques and achieves improved predictive performance. The partial connections in P-NEN further enhance the model's efficiency and prevent overfitting. However, challenges remain in stock prediction, such as data quality and forecasting market trends during extreme events. Future research should focus on addressing these challenges and further improving the accuracy and reliability of stock prediction models.

5. Conclusion

In this study, we proposed a new approach called the Partial Neural Evolution Network (P-NEN) for stock prediction. P-NEN combines the strengths of neural networks and evolutionary algorithms to enhance predictive performance. Through experiments and comparisons, we demonstrated the effectiveness of P-NEN in stock prediction. We believe that P-NEN has the potential to be applied in real-world financial scenarios and to contribute to more accurate and reliable stock predictions.
Gas Concentration Prediction Model Based on SSA-LSTM

LAN Yongqing(1), QIAO Yuandong(2), CHENG Hongming(1), LEI Lixing(1), LUO Huafeng(1)
(1. School of Coal Engineering, Shanxi Datong University, Datong 037003, China; 2. School of Architecture and Geomatics Engineering, Shanxi Datong University, Datong 037003, China)

Abstract: In order to better capture the time-varying patterns and effective information of gas concentration and achieve precise prediction of gas concentration at coal working faces, a gas concentration prediction model based on SSA-LSTM is proposed, in which the Sparrow Search Algorithm (SSA) optimizes a Long Short-Term Memory (LSTM) network. The model uses mean replacement to handle the missing and abnormal data in the original gas concentration time series, followed by normalization and wavelet threshold denoising. The performance differences between SSA and the Grey Wolf Optimization (GWO) and Particle Swarm Optimization (PSO) algorithms are compared and tested, verifying SSA's advantages in optimization precision, convergence speed, and adaptability. Using SSA's adaptivity, the LSTM hyperparameters — learning rate, number of hidden-layer nodes, and regularization parameters — are optimized in turn to improve global search capability and keep the prediction model out of local optima; the resulting best hyperparameter combination is substituted into the LSTM network model and the prediction results are output. Comparing SSA-LSTM with the LSTM, GWO-LSTM, and PSO-LSTM gas concentration prediction models, the experimental results show that the root mean square error (RMSE) of the SSA-LSTM model is 77.8%, 58.9%, and 69.7% lower than that of LSTM, PSO-LSTM, and GWO-LSTM respectively, and the mean absolute error (MAE) is 83.9%, 37.8%, and 70% lower; the SSA-optimized LSTM prediction model has higher prediction accuracy and robustness than the traditional LSTM model.

Keywords: gas concentration prediction; time-series prediction; deep learning; long short-term memory network; sparrow search algorithm; hyperparameter optimization

CLC number: TD712. Document code: A. Received 2023-09-20; revised 2024-02-20.
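A sketch of the preprocessing chain described in the abstract — mean replacement of missing readings, min-max normalization, and wavelet threshold denoising — assuming the PyWavelets package; the gas series, wavelet choice, and universal-threshold rule below are illustrative, not the paper's settings.

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
gas = 0.5 + 0.1 * np.sin(np.linspace(0, 20, 512)) + rng.normal(0, 0.02, 512)
gas[[17, 230]] = np.nan                              # missing sensor readings

gas[np.isnan(gas)] = np.nanmean(gas)                 # mean replacement
gas = (gas - gas.min()) / (gas.max() - gas.min())    # min-max normalization

coeffs = pywt.wavedec(gas, "db4", level=3)           # wavelet decomposition
thr = np.median(np.abs(coeffs[-1])) / 0.6745 * np.sqrt(2 * np.log(gas.size))
coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
denoised = pywt.waverec(coeffs, "db4")[: gas.size]   # denoised, ready for LSTM

print(denoised.shape)
```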
Ensemble Pruning and Incremental Learning Time Series Prediction Based on Kernel Density Estimation

ZHU Gangliang
(College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016)

Abstract: Time series prediction (TSP) is an important problem in machine learning. This paper proposes an ensemble pruning and incremental learning method for time series prediction based on kernel density estimation (KDE). The algorithm first generates a pool of base learners following the principles of ensemble learning. It then uses the base learner pool to obtain a kernel density estimate over the pool's output values for the sample to be predicted, and prunes the pool with that estimate; the resulting pruned ensemble is used to predict the sample's output. Finally, the algorithm performs incremental learning based on the k-nearest-neighbor set screened for the sample on a dynamic selection set. Experimental results on the data sets IAP, ICS, and MCD show that the proposed time series prediction algorithm improves to a certain degree on currently popular algorithms.

Key words: time series prediction; KDE; incremental learning; dynamic ensemble pruning

Class number: N945.24. DOI: 10.3969/j.issn.1672-9722.2021.04.021. Received 2020-09-13; revised 2020-10-25. Author: ZHU Gangliang, male, master's student; research interests: ensemble learning, incremental learning.

1 Introduction

Time series prediction (TSP) is an important and active research topic in machine learning and data engineering, with indispensable importance in many data mining applications. Generally speaking, time series arise across research fields — for example, economics (stock prices, unemployment rates, and industrial production), epidemiology (infectious disease case rates), medicine (electrocardiograms and electroencephalograms), and meteorology (temperature, wind speed, and rainfall) [1]. Much research concerns stationary rather than non-stationary time series prediction; yet real time series are almost always non-stationary, which limits the application of stationary time-series techniques in practical work and life. Research on non-stationary time series prediction has therefore become important and valuable [2-4].

Over the past decades, neural networks (NN) have attracted great attention from researchers in the time-series field thanks to their theoretical properties of being nonparametric, data-driven, universal approximators of any linear or nonlinear function [5]. As numerous researchers demonstrated the superiority of neural-network-based prediction systems [6], more and more studies began designing time series prediction models on the basis of neural networks [6-11]. However, training the parameters of feed-forward neural networks (FNNs) consumes a large amount of time and induces dependencies between different parameter layers.
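The abstract's procedure can be illustrated in a few lines: build a bagged pool of base learners, estimate the density of the pool's outputs for a query point, and keep the learners whose outputs fall in the high-density region. The sketch below assumes scikit-learn and scipy; the 30%-quantile pruning rule is a simplified reading rather than the paper's exact criterion, and the incremental-learning step is omitted.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.2, 400)

pool = []
for _ in range(50):                                   # ensemble generation (bagging)
    idx = rng.integers(0, 400, 400)
    pool.append(DecisionTreeRegressor(max_depth=4).fit(X[idx], y[idx]))

x_new = np.array([[1.2]])                             # sample to be predicted
outs = np.array([m.predict(x_new)[0] for m in pool])  # pool outputs for x_new
dens = gaussian_kde(outs)(outs)                       # KDE over the outputs
kept = [m for m, d in zip(pool, dens) if d >= np.quantile(dens, 0.3)]

print(len(kept), np.mean([m.predict(x_new)[0] for m in kept]))
```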
Multi-scale Time Series Forecasting

Contents

Chapter 1 Introduction
  1.1 Research background and significance
  1.2 State of research and open problems
  1.3 Research content of this thesis
  1.4 Organization of this thesis
Chapter 2 Fundamentals of Time Series Forecasting
  2.1 Overview of time series forecasting
    2.1.1 The forecasting task
    2.1.2 Supporting techniques
  2.2 Traditional forecasting methods
    2.2.1 Moving averages
    2.2.2 Exponential smoothing
    2.2.3 Autoregressive moving average models
  2.3 Artificial intelligence methods
    2.3.1 Neural networks
    2.3.2 Support vector machines
  2.4 Hybrid methods
  2.5 The scale-space idea
  2.6 Chapter summary
Chapter 3 Multi-scale Numerical Forecasting of Time Series
  3.1 Main workflow
  3.2 Time series preprocessing
    3.2.1 Similarities and differences between Gaussian kernels and wavelet bases as transform kernels
    3.2.2 Preprocessing the data
  3.3 Implementation of numerical forecasting
    3.3.1 The necessity of similar-shape retrieval
基于时频融合卷积神经网络的股票指数预测
㊀第54卷第2期郑州大学学报(理学版)Vol.54No.2㊀2022年3月J.Zhengzhou Univ.(Nat.Sci.Ed.)Mar.2022收稿日期:2021-05-31基金项目:教育部人文社会科学青年基金项目(21YJCZH045);中央高校基本科研业务专项资金项目(JBK2101001)㊂第一作者:姜振宇(1998 ),女,硕士研究生,主要从事数据挖掘与分析研究,E-mail:jiangzhenyu@㊂通信作者:蔡福旭(1972 ),男,副主任医师,主要从事时序数据挖掘和医学统计研究,E-mail:caifuxu20@㊂基于时频融合卷积神经网络的股票指数预测姜振宇1,㊀黄雁勇1,㊀李天瑞2,㊀蔡福旭3(1.西南财经大学统计学院㊀四川成都611130;2.西南交通大学计算机与人工智能学院㊀四川成都611756;3.莆田学院附属医院㊀福建莆田351100)摘要:传统的股票指数预测方法是在含噪声㊁非平稳以及非线性的原始股指序列数据上实施的,这将导致预测精度的下降㊂为了解决这个问题,提出了一种基于时频融合卷积神经网络的股指预测方法㊂首先通过引入变分模态分解(VMD)将原始序列数据分解到频域特征上,使得分解后的股指数据具有低信噪比,同时具有更明显的趋势性和平稳性㊂进一步结合时序卷积神经网络(TCN),构建了时频融合的卷积神经网络模型㊂最后在6个实际数据集上与8个基准方法进行比较,实验结果表明该方法具有更高的预测精度和更好的解释性㊂关键词:股票指数预测;时频融合;变分模态分解;时序卷积网络中图分类号:TP311㊀㊀㊀㊀㊀文献标志码:A㊀㊀㊀㊀㊀文章编号:1671-6841(2022)02-0081-08DOI :10.13705/j.issn.1671-6841.2021225Fusion of Time-frequency-based Convolutional Neural Network inFinancial Time Series ForecastingJIANG Zhenyu 1,HUANG Yanyong 1,LI Tianrui 2,CAI Fuxu 3(1.School of Statistics ,Southwestern University of Finance and Economics ,Chengdu 611130,China ;2.School of Computing and Artificial Intelligence ,Southwest Jiaotong Univeristy ,Chengdu 611756,China ;3.Putian College Affiliated Hospital ,Putian 351100,China )Abstract :The traditional stock index forecasting methods were conducted on the noisy,non-stationary andnon-linear original stock index time series data,which would degrade the prediction accuracy.In order to deal with this issue,a novel stock index prediction method was proposed by incorporating the time-frequen-cy features and the convolutional neural network.Firstly,the original time series data were decomposed in-to time-frequency features by employing the variational mode decomposition (VMD).The decomposed se-ries data had a low signal-to-noise ratio and also stationarity with a clear trend.Then,by combining with temporal convolutional network (TCN ),a fusion of time-frequency-based convolutional neural network model was proposed.Finally,compared with eight baseline methods on six real-world datasets,the experi-mental results showed that our method had higher prediction accuracy and better interpretability.Key words :stock market index prediction;fusion of time-frequency;variational mode decomposition;temporal convolutional network0㊀引言股票是社会生产力发展的产物,顺势而生的股票市场为企业主体进行融资提供了平台,并促进了资金的供求平衡㊂通过股票市场,个人投资者以及金融机构可以进行金融投资以期获得预期收益,市场上的变动密切关系到相关投资者的切身利益㊂同郑州大学学报(理学版)第54卷时,随着经济市场化的进程不断推进,股票已然成为我国国民经济的重要组成部分,与国家的宏观经济息息相关,它的运行情况反映了个体企业的运营境况㊁国民经济的发展态势和宏观经济的健康状况㊂因此,股票市场分析一直以来都备受学术界和业界的广泛关注㊂为了避免对多只股票分别建模的冗余操作,许多金融机构或证券交易所编制了由多只代表性股票汇总得到的股票指数㊂对股票指数的建模不仅可以同时对多只股票进行分析,而且可以观察某个股票市场的整体情况㊂尽管如此,原始的股指数据往往是只包含历史信息的单一时间序列,这使得建模可用的信息有限㊂同时,原始股指序列具有非线性和非平稳性的特点,而且往往还包含大量的噪声和无用信息㊂这使得传统时序预测的方法无法得到较高的预测精度㊂针对这一问题,本文提出了一种基于时频融合卷积神经网络的预测方法㊂首先通过使用变分模态分解(variational mode decomposition, VMD)将原始序列分解为多个不同时间尺度的本征模态子序列(intrinsic mode function,IMF),其中的低频序列较原序列有着更明显的趋势㊂这保证了提取的历史信息更有效,也降低了原序列的噪声和非平稳性㊂然后,利用时序卷积网络(temporal convo-lutional network,TCN)分别对这些时频特征子序列进行建模并预测㊂进一步,通过VMD的逆分解操作,将子序列融合为股指序列的预测结果㊂最后在多个实际数据集上进行了实验,实验结果表明我们的模型比其他一些基准模型具有更高的预测精度㊂本文的主要贡献包括:1)首次将变分模态分解(VMD)方法引入到股指预测模型中,利用VMD将噪声多㊁不平稳㊁非线性的原始股指序列数据分解成多个更具有规律性的模态子序列,降低了直接使用预测模型提取原序列有效信息的难度,提高了预测的精度;2)提出了一种基于时频融合卷积神经网络的股指预测方法㊂将原始股指数据通过VMD分解后得到的多条时频信号数据分别输入到TCN中,然后对输出的结果进行有效融合,得到股指序列的预测值㊂在几个真实数据集上的实验结果表明我们的方法具有更高的预测精度,同时具有更好的解释性㊂1 相关工作时间序列分析具有相对完整的理论体系,发展至今产生了许多经典的预测模型㊂差分整合移动平均自回归模型(autoregressive integrated moving aver-age model,ARIMA)被应用于股价的预测,短期预测取得了较好的结果[1-2]㊂由于金融时间序列具有异方差等特点,自回归条件异方差模型(autoregressive conditional heteroskedasticity,ARCH)及其广义变体(generalized autoregressive conditional 
heteroskedastic-ity,GARCH)也被广泛运用在金融时序分析中㊂魏宇将GARCH运用在沪深300指数上,对其波动率进行了预测[3];Hassan则使用隐马尔科夫(hidden markov model,HMM)模型对股票市场建模,并得到了较为准确的预测结果[4];朱永明结合粗糙集理论,使用不同指标对股市进行建模分析[5]㊂上述单一的传统统计学模型具有形式简单㊁可解释性强等特点,但其模型系数的规模通常不大,函数的形式也相对简单,所以其拟合能力较为有限㊂除了传统统计学模型,机器学习中的一些模型也常用于时间序列预测㊂支持向量回归(support vector regression,SVR)是机器学习模型中经典的回归模型㊂Meesad等使用SVR结合了不同窗口设置方法对股票数据进行建模,验证了这一模型的有效性[6]㊂随着建模的数据量越来越大,涉及的数据类型也更加灵活多变㊂深层神经网络(deep neural network,DNN)在处理大数据问题上具有独特优势,其在金融时间序列预测上的有效性也被证明[7];姚宏亮等结合贝叶斯神经网络和均线滞后特征,提出了DSMA模型,并对模型效果进行了实验证明[8]㊂DNN的发展衍生出不同结构的模型,如循环神经网络(recurrent neural network,RNN)及其变体㊁卷积神经网络(convolutional neural networks,CNN)[9]及其变体㊂杨青等使用长短期记忆(long short-term memory,LSTM)对全球共30个股票指数进行不同期限研究,验证了LSTM的良好表现[10];文献[11]使用粒子群优化算法(particle swarm optimization, PSO)优化的分位数回归神经网络(quantile regres-sion neural network,QRNN)对8个金融数据进行建模,都取得了较好的结果㊂上述模型在解决数据非线性㊁不连续和高维的问题上具有一定的优势,但是它们都是直接对原始单时间序列进行建模,模型精度往往受到噪声㊁非平稳以及有限信息等因素的影响㊂真实世界的原始序列往往包含多重信息,其中还掺杂部分噪声与冗余无用的内容㊂为了解决这种问题,熊志斌结合神经网络,使用ARIMA对美元等三种汇率进行建模,并使用PSO对神经网络进行优化[12];Du通过对ARIMA和BPNN的复合模型证实28㊀第2期姜振宇,等:基于时频融合卷积神经网络的股票指数预测了股票数据的非线性,同时预测的结果更加精确[13]㊂针对金融时间序列的非平稳㊁噪声多等特点,Hsieh等结合Haar小波分解与人工蜂群算法优化的RNN对DJIA等四只股票指数进行建模,验证了模型的预测精度[14];Cao等使用自适应白噪声完整经验模态分解(complete ensemble empirical mode decomposition with adaptive noise,CEEMDAN)与LSTM对SPX等四只股票指数进行建模,并以SVR等模型作为基准方法进行对比实验,验证了所提出模型的优越性[15]㊂上述模型具有一定的效果,然而在实际应用中容易产生模态混叠,同时预测效果不够高㊂综上所述,尽管以上的方法已经取得了一些较好的结果,然而在处理噪声多㊁非线性以及非平稳的股指序列数据时仍然存在一定的局限性㊂本文通过提取股指序列数据的时频特征,同时结合TCN和VMD给出了基于时频融合卷积网络的股指序列预测模型,并通过对比实验及可视化分析进行验证㊂2㊀基础理论介绍本节主要介绍与所提出模型有关的基础理论知识㊂2.1㊀变分模态分解(VMD)VMD是2014年提出的一种完全非递归的变分模式分解方法[16],它与经验模态分解(empiricalmode decomposition,EMD)[17]㊁CEEMDAN[18]等同属于自适应信号分解方法,旨在根据数据自身时间尺度特征进行信号分解㊂同时,VMD避免了模态混叠等问题,并具有更坚实的数学基础㊂在VMD算法中,每个模态都是由调幅-调频信号来表示的,定义为uk(t)=A k(t)cos(Φk(t)),(1)式中:A k(t)是u k(t)的幅值;Φk(t)是相位;u k(t)的瞬时频率为ωk(t)ȡ0㊂在足够长的区间[t-δ,t+δ],δʈ2π/ωk(t)上,可以将模态视为谐波信号㊂先通过希尔伯特变换,获得u k(t)的解析信号的单边频谱;进一步,对于每种模式,与各自估计的中心频率的指数混合,将每个模式的频谱移至 基带 ;最后,通过解析信号的高斯平滑度,即其导数的二范数的平方,来估计各模态的带宽㊂上述步骤产生的约束变分问题为min {u k},{ωk}{ðK k=1 ∂t[(δ(t)+jπt)∗u k(t)]e-jωk t 22}s.t.ðK k=1u k(t)=f(t),(2)其中:f是原始信号;u k(k=1,2, ,K)是所有的模态的集合;ωk(k=1,2, ,K)是对应中心频率的集;j是虚数单位㊂对上式进行递归求解,就可以得到想要的模态分量㊂2.2㊀时序卷积网络(TCN)CNN是被广泛运用的一种神经网络,其核心是通过卷积计算来提取局部的数据特征㊂为了使得模型更加适合序列学习,并避免梯度问题,TCN在CNN的基础上加入了如下特殊结构[19]㊂1)因果卷积:因果卷积在WaveNet中第一次被提出[20],它将层之间的信息传递方向限制成单一方向,这符合时序任务只能获取历史信息的要求㊂2)膨胀卷积:CNN难以处理序列学习问题,这主要是因为它不具备抓取长时依赖信息的能力㊂受限于卷积核大小,CNN需要不断叠加卷积层来获得更长期的感受野,这会导致参数体量庞大等问题㊂膨胀卷积则采用了间隔采样的方法,在层数较少的情况下网络可以获得更大的感受野㊂3)残差连接:神经网络的表达能力一定程度上随着网络层数的增加而提升,但训练难度也随之增加,容易出现梯度消失等问题㊂残差连接的输出被表述成输入与输入的非线性函数的线性求和,这实现了信息的跨层传递㊂同时,若网络通过链式法则进行反向传播,整个输出项的梯度经过多次连乘后不会接近消失㊂3㊀本文的方法上一节我们介绍了VMD和TCN的背景知识,以此为基础,详细介绍基于时频融合卷积神经网络的股指预测模型:首先,介绍模型的整体框架;然后,给出模型学习的具体过程㊂3.1㊀模型框架原始的股指序列数据包含噪声,而且是非平稳和非线性的㊂为此,本文用VMD将原始股指序列数据进行有效分解,得到多条具有时频信息的子序列㊂进而通过结合TCN,构建了基于时频融合的卷积神经网络模型㊂图1是我们所提出模型的框架图,每个模块描述如下㊂1)时频特征获取模块㊂设置VMD分解所需的模态数K,对原始股指序列数据FT进行VMD分解㊂获得原股指序列数据的K个基本模态分量,记为IMF1, ,IMF K,分别代表原序列从高频到低频38郑州大学学报(理学版)第54卷㊀㊀图1㊀所提出模型的框架Figure 1㊀General framework of the proposed model的震荡成分㊂根据VMD 的重构原理,用t 代表时间序列中的时间戳,可得到ðKk =1IMF k(t )=FT (t )㊂(3)㊀㊀使用max-min 归一化,IMF _S k (t )=IMF k (t )-min(IMF k (t ))max(IMF k (t ))-min(IMF k (t )),其中:min(IMF k (t ))㊁max(IMF k (t ))代表的是计算序列中的最小值和最大值,标准化操作后得到IMF _S k ㊂为了进行向前一步预测,本文构造了滑动窗口,并确定时间窗口的长度t =L ㊂按照时间窗口构建输入特征和输出特征,得到X k ㊁Y k ,再按照训练测试比将其划分为X k _train ㊁Y k _train 和X k _test ㊁Y k _test ㊂2)时序卷积神经网络模块㊂使用X k _train ㊁Y k _train对模型进行训练,为了避免过长训练时间,并减少过拟合的风险,引入早停机制㊂针对输入了不同时频特征数据的TCN 网络,本文设定了不同的参数使其更具有针对性㊂通过训练集的参数优化得到网络参数,再使用训练好的模型对X k _test 进行预测,得到预测结果为Yᶄk _test ,对其进行反标准化,得到IMFᶄk _test ㊂3)预测值融合模块㊀根据公式(3),将每个TCN 网络得到的子序列预测进行重构,最终得到的时间序列预测结果为FTᶄtest 
㊂3.2㊀模型学习过程这一部分将着重对模型的学习过程进行介绍㊂分解过程中,单一的股指序列数据被对应分解成K 个模态子序列,优化的目标函数如式(2)所示㊂记原始股指序列数据为FT ɪR T,其对应子序列为IMF 1, ,IMF K ㊂利用二次惩罚项与拉格朗日乘数法将式(2)变成如下的无约束优化问题,L ({IMF k },{ωk },λ)ʒ=αðKk =1 ∂t [(δ(t )+jπt )∗IMF k (t )]e-j ωkt22+ FT (t )-ðKk =1IMF k(t )22+λ(t ),FT (t )-ðK k =1IMF k(t )⓪,其中:α为二次惩罚因子,作用是在有噪声时保证重构的精度;λ(t )是拉格朗日乘子㊂用交替方向乘子法(alternating direction methodof multipliers,ADMM)对IMF n +1k ㊁ωn +1k和λn+1进行更新,优化过程如算法1中的1)~6)㊂预测过程中,TCN 的输入为IMF _S k ㊂膨胀卷积使得网络的感受野更大,针对一维的输入向量C ɪR n 和过滤器f :{0, ,k -1}ңR ,膨胀卷积可以表示为F (t )=(C ㊃f d )(t )=ðk -1i =0f (i )㊃Xt -di,其中:d 是膨胀系数;k 是滤波器大小;t -di 表明了历史信息的方向㊂同时,TCN 模型中还使用残差连接:z (i )为残差块输出;z (i-1)为残差块的输入;m ()为非线性映射,则残差连接为z (i )=m (z (i -1))+z (i -1)㊂㊀㊀将上述的膨胀卷积和残差连接合并记为tcn (),其中给定超参数μk ,需要学习的参数记做θk ㊂对所提出的基于时频融合卷积神经网络使用BP 反向传播进行参数优化,优化方法为梯度下降法㊂假设时间窗口长度为L ,预测期长τ,那么输入㊁输出对数量为N =T -L -τ+1,且X k ɪR N ㊃T ,Y k ɪR N ㊃τ㊂我们将很容易得到,Yᶄk =tcn (X k ;θk )㊂训练过程中,使用均方误差MSE 作为损失函数,记X k _train 的第n 行为x kn ,Y k _train 的第n 行为y kn ,得到训练的目标函数为MSE k =1N ðNn =1(y kn -tcn (x kn ;θk ))2㊂㊀㊀原始序列共分解成K 个子序列,需要同时进行K 个上述优化,算法1给出了这一过程㊂算法1㊀基于时频融合卷积神经网络模型的优化算法输入:原始时间序列FT ;相关参数K ㊁L ㊁T ㊁τ㊁κ;神经网络超参数μk ;早停参数patience ㊂输出:训练期的预测值FTᶄtest ㊂48㊀第2期姜振宇,等:基于时频融合卷积神经网络的股票指数预测初始化:{IMF 1k },{ω1k },λᶄ1,n ѳ0;TCN 中所有的可学习参数θk ㊂1)repeat n ѳn +1㊀㊀㊀ʊ数据预处理2)㊀for k =1ʒK ,执行3)㊀㊀针对ωȡ0,更新IMF k 和ωkIMF n +1k(ω)ѳ11+2α(ω-ωn k )2㊃{FT (ω)-ði <kIMFn +1i (ω)-ði >k IMF n i(ω)+λᶄn(ω)2}ωn +1kѳʏɕ0ωIMF n +1k(ω)2d ωʏɕ0IMFn +1k(ω)2d ω4)End for5)针对ωȡ0,采用对偶上升法λᶄn +1(ω)ѳλᶄn (ω)+κ(FT (ω)-ðkIMFn +1k(ω))6)until:ðkIMFn +1k-IMF nk22IMF n k 22<ε,输出IMF k =IMF n +1k7)for k =1ʒK ,使用min-max 标准化,得到IMF _S k ,构造得到X k _train ㊁Y k _train 和X k _test ㊁Y k _test8)㊀for n =1ʒN 执行㊀㊀ʊ参数优化9)㊀㊀repeat㊀㊀㊀㊀针对(X k _train ,Y k _train )使用BP 最小化MSE k ,更新参数θk10)㊀㊀until 满足早停条件,记录θᶄk ,tcn k11)㊀end forYᶄk _test =tcn k (X k _test ;θk ),反标准化得到IMFᶄk _test ,从而FTᶄtest=ðKk =1IMFᶄk _test12)end for4㊀时频融合卷积神经网络的股指预测实验分析4.1㊀数据集和实验环境本文选取了国内外6只股票指数的每日收盘指数,数据均来自国泰君安数据库CSMAR 的股票市场分析模块(https:ʊ /)㊂选取的指数分别是标准普尔500指数(Standard and Poor 500Index,SPX)㊁纳斯达克综合指数(IXIC)㊁日经指数225(Nikkei 225,N225)㊁阿姆斯特丹泛欧指数(Am-sterdam Stock Exchange,AEX )㊁香港恒生指数(Hang Seng Index,HSI)和上证综合指数(Shanghai Securities Composite Index,SSE ),其范围涵盖了多个国家和地区,兼具代表性和地区特点㊂选择的时间跨度为2010年1月4日 2019年12月4日,由于不同市场上节假日等因素的影响,不同股票数据在序列长度上略有差异㊂实验中使用80%的数据作为训练集,剩下20%作为测试集㊂实验所使用的设备搭载主频为2.1GHz 的CPU,程序编程环境是Anaconda3㊁Python3.7和Tensorflow-CPU㊂4.2㊀对比实验与参数设置为了验证所提出模型的有效性,本文使用统计模型中的ARIMA [1]㊁GARCH [21]㊁HMM [22]机器学习中的SVR [23]和神经网络中的LSTM [10]㊁TCN [24]模型以及C-LSTM [15]㊁C-TCN [25]模型(C 表示CEEM-DAN)作为基准模型,它们都曾被应用于股票指数等时间序列的预测问题㊂引入常用的三种评价指标:均方根误差(RMSE )㊁平均绝对误差(MAE )和平均绝对百分比误差(MAPE )㊂设n 是测试集的长度,三者的基本计算公式分别为MAE =1n ðni =1FTᶄtest (i )-FT test (i ),RMSE =1n ðni =1(FTᶄtest (i )-FT test (i ))2,MAPE =100%n ðn i =1FTᶄtest (i )-FT test (i )FT test (i )㊂㊀㊀我们对模型进行了30次重复实验来避免偶然性,并计算平均指标作为最终对比的准则㊂所有预测模型都采用了网格调参的方法,其中TCN 的主要参数及调节范围如表1所示㊂表1㊀TCN 的参数表Table 1㊀The parameters table of TCN参数取值范围kernel _size2,3,5,10dilations 2,3,4,5nb _filters 10,16,32activationrelu,linear ,sigmoidkernel _initializer VarianceScaling ,orthogonal batch _size 16,32,64,128,256patience10,20,30,40,504.3㊀实验结果分析表2㊁3㊁4分别是9个模型对6个数据集重复30次建模后计算的RMSE ㊁MAE 和MAPE 指标平均值,小括号内为标准差,黑体为最优值㊂SVR 等模型由于重复实验结果不变,因此标准差为0㊂不同模型在三个指标上的表现基本一致,仅考虑指标RMSE ,最优模型为本文提出的模型,其在数据集HSI 上较58郑州大学学报(理学版)第54卷C-TCN 提升34%,在SPX 上较C-LSTM 提升31%㊂实证结果表明基于时频融合的方法有利于提升预测模型的性能:首先,将原始时序分解成相对平稳的时频特征子序列,可以进一步提取出趋势和噪声项;其次,使用TCN 对子序列进行建模,发挥其高效的学习能力,实现了对序列的高精度预测㊂表2㊀对比实验的RMSETable 2㊀RMSE of the comparative experiments股票指数本文C-TCN C-LSTM LSTM TCN SVR ARIMA GARCH HMM SPX 10.31(0.52)15.45(4.22)14.95(1.61)35.61(3.59)28.97(2.99)26.61(0)25.75(0)25.76(0)25.57(0)IXIC 
4.3 Analysis of experimental results
Tables 2, 3 and 4 report the average RMSE, MAE and MAPE of the nine models over 30 repetitions on the six datasets, with standard deviations in parentheses (the best value in each row was set in bold in the original). SVR and the other deterministic models give identical results across repetitions, so their standard deviation is 0. The models behave consistently across the three metrics. Considering RMSE alone, the proposed model is the best, improving on C-TCN by 34% on HSI and on C-LSTM by 31% on SPX. The results show that the time-frequency-fusion approach improves prediction performance: first, decomposing the raw series into relatively stationary time-frequency subseries allows the trend and noise terms to be separated; second, modelling each subseries with a TCN exploits its efficient learning capacity and yields high-accuracy predictions.

Table 2: RMSE of the comparison experiments
Index | Proposed | C-TCN | C-LSTM | LSTM | TCN | SVR | ARIMA | GARCH | HMM
SPX | 10.31 (0.52) | 15.45 (4.22) | 14.95 (1.61) | 35.61 (3.59) | 28.97 (2.99) | 26.61 (0) | 25.75 (0) | 25.76 (0) | 25.57 (0)
IXIC | 39.69 (6.00) | 54.50 (9.98) | 56.14 (11.93) | 123.83 (8.39) | 110.43 (30.53) | 86.86 (0) | 85.79 (0) | 86.00 (0) | 85.98 (0)
N225 | 88.72 (4.77) | 115.17 (4.64) | 108.08 (1.88) | 261.85 (40.27) | 274.99 (53.31) | 229.73 (0) | 230.08 (0) | 229.81 (0) | 229.92 (0)
AEX | 1.84 (0.12) | 1.99 (0.14) | 2.03 (0.14) | 4.54 (0.33) | 5.00 (0.73) | 4.26 (0) | 4.27 (0) | 4.26 (0) | 4.26 (0)
HSI | 108.98 (12.77) | 167.14 (7.66) | 220.90 (86.59) | 331.54 (9.54) | 359.43 (41.13) | 318.41 (0) | 320.04 (0) | 318.35 (0) | 318.27 (0)
SSE | 13.13 (1.22) | 16.57 (1.43) | 18.39 (1.45) | 35.81 (1.51) | 39.18 (4.71) | 34.98 (0) | 34.83 (0) | 34.86 (0) | 35.27 (0)

Table 3: MAE of the comparison experiments
Index | Proposed | C-TCN | C-LSTM | LSTM | TCN | SVR | ARIMA | GARCH | HMM
SPX | 7.38 (0.53) | 11.10 (4.66) | 11.76 (1.70) | 27.61 (3.67) | 21.95 (3.59) | 18.95 (0) | 18.22 (0) | 18.13 (0) | 18.01 (0)
IXIC | 30.07 (6.00) | 39.15 (10.84) | 44.67 (11.97) | 99.68 (10.46) | 88.91 (31.73) | 63.40 (0) | 62.55 (0) | 62.44 (0) | 62.42 (0)
N225 | 66.89 (3.61) | 84.53 (4.87) | 81.71 (1.52) | 196.93 (38.97) | 213.60 (55.23) | 167.93 (0) | 166.84 (0) | 166.22 (0) | 165.99 (0)
AEX | 1.38 (0.10) | 1.51 (0.13) | 1.57 (0.13) | 3.46 (0.35) | 3.91 (0.77) | 3.20 (0) | 3.21 (0) | 3.19 (0) | 3.19 (0)
HSI | 87.41 (10.49) | 130.51 (7.82) | 172.20 (86.51) | 253.94 (9.00) | 279.74 (38.64) | 239.87 (0) | 246.44 (0) | 239.58 (0) | 239.50 (0)
SSE | 10.04 (1.21) | 12.82 (1.42) | 14.27 (1.24) | 26.18 (1.56) | 29.61 (4.95) | 25.13 (0) | 25.42 (0) | 25.15 (0) | 25.50 (0)

Table 4: MAPE (%) of the comparison experiments
Index | Proposed | C-TCN | C-LSTM | LSTM | TCN | SVR | ARIMA | GARCH | HMM
SPX | 0.27 (0.018) | 0.40 (0.171) | 0.42 (0.059) | 0.99 (0.137) | 0.79 (0.132) | 0.68 (0) | 0.66 (0) | 0.66 (0) | 0.65 (0)
IXIC | 0.40 (0.076) | 0.52 (0.144) | 0.59 (0.155) | 1.33 (0.147) | 1.18 (0.403) | 0.68 (0) | 0.83 (0) | 0.84 (0) | 0.83 (0)
N225 | 0.31 (0.016) | 0.37 (0.022) | 0.37 (0.007) | 0.90 (0.178) | 0.97 (0.245) | 0.77 (0) | 0.76 (0) | 0.77 (0) | 0.76 (0)
AEX | 0.25 (0.019) | 0.28 (0.024) | 0.28 (0.024) | 0.64 (0.065) | 0.72 (0.140) | 0.59 (0) | 0.59 (0) | 0.59 (0) | 0.58 (0)
HSI | 0.31 (0.037) | 0.46 (0.029) | 0.62 (0.322) | 0.90 (0.033) | 0.99 (0.135) | 0.85 (0) | 0.87 (0) | 0.85 (0) | 0.84 (0)
SSE | 0.34 (0.039) | 0.44 (0.049) | 0.49 (0.043) | 0.90 (0.052) | 1.01 (0.170) | 0.86 (0) | 0.87 (0) | 0.86 (0) | 0.87 (0)

4.4 Visualizing the model's effectiveness
This subsection takes the SPX closing-index series as an example and visualizes the different stages of the model. The original series is plotted in Figure 2(a). [Figure 2: Sequences before and after VMD.] The series shows an overall long-term trend, which provides some basis for model-based prediction; it is also highly volatile, noisy and nonlinear, so a traditional linear prediction model would struggle to reach high accuracy on it. Neural-network models are better suited to such data, consistent with the experimental results of the previous section. Applying VMD to the original series gives the partial IMF series shown in Figure 2(b). From top to bottom the subseries frequency increases; the lowest-frequency component, IMF1, represents the overall course of the index and is clearly smoother than the raw SPX series. Overall, the low-frequency series in Figure 2(b) show clear trends, with relatively small fluctuations and little noise. For each IMF series, a TCN with time-window length 4 is constructed to predict one step ahead; the per-IMF predictions are then fused into the SPX index prediction shown in Figure 3. [Figure 3: The forecast result of the model.] In Figure 3 the orange line is the prediction of the proposed method, the blue line the true series, and the dashed lines the other baselines. The two solid curves largely coincide, showing that the predictions of the proposed model agree closely with the real data.

5 Conclusions and outlook
This paper proposed a stock-index prediction model based on a time-frequency-fusion convolutional neural network. Six representative stock indices from real markets were selected, eight models commonly used for stock-index prediction served as baselines, and three evaluation metrics were computed for each model's predictions. The experiments show that the proposed model outperforms the baselines, with RMSE improvements of up to 34%. The model can also be extended to other application domains. This study only addresses prediction from the historical information of a single time series; in practice many macro- and micro-indicators, such as exchange rates and interest rates, as well as technical indicators of other indices, have been shown to influence stock-index movements, and future work will consider incorporating such information to enrich the model's input features. In addition, text mining and natural language processing can capture investor and market sentiment from financial news, and taking these into account offers another direction for future work.

References:
[1] ARIYO A A, ADEWUMI A O, AYO C K. Stock price prediction using the ARIMA model[C]// The 16th International Conference on Computer Modelling and Simulation. Piscataway: IEEE Press, 2014: 106-112.
[2] WICHAIDIT S, KITTITORNKUN S. Predicting SET50 stock prices using CARIMA (cross correlation ARIMA)[C]// International Computer Science and Engineering Conference. Piscataway: IEEE Press, 2015: 1-4.
[3] WEI Y. Volatility forecasting models for CSI 300 index futures[J]. Journal of Management Sciences in China, 2010, 13(2): 66-76. (in Chinese)
[4] HASSAN M R, NATH B. Stock market forecasting using hidden Markov model: a new approach[C]// The 5th International Conference on Intelligent Systems Design and Applications. Piscataway: IEEE Press, 2005: 192-196.
[5] ZHU Y M. Study on prediction of stocking market based on rough set theory[J]. Journal of Zhengzhou University (Natural Science Edition), 2009, 41(4): 40-44. (in Chinese)
[6] MEESAD P, RASEL R I. Predicting stock market price using support vector regression[C]// 2013 International Conference on Informatics, Electronics and Vision. Piscataway: IEEE Press, 2013: 1-6.
[7] HEATON J B, POLSON N G, WITTE J H. Deep learning in finance[EB/OL]. (2018-01-04)[2021-05-03]. arXiv: 1602.06561.
[8] YAO H L, AI L K, WANG H, et al. Time series autoregressive stock market forecasting algorithm based on moving average hysteresis[J]. Journal of Zhengzhou University (Natural Science Edition), 2018, 50(3): 60-66. (in Chinese)
[9] LI Y L, YANG Z P, WANG S S, et al. The distribution of cinema population based on convolutional neural network[J]. Journal of Xinyang Normal University (Natural Science Edition), 2020, 33(4): 675-680. (in Chinese)
[10] YANG Q, WANG C W. A study on forecast of global stock indices based on deep LSTM neural network[J]. Statistical Research, 2019, 36(3): 65-77. (in Chinese)
[11] PRADEEPKUMAR D, RAVI V. Forecasting financial time series volatility using particle swarm optimization trained quantile regression neural network[J]. Applied Soft Computing, 2017, 58: 35-52.
[12] XIONG Z B. Research on RMB exchange rate forecasting model based on combining ARIMA with neural networks[J]. The Journal of Quantitative & Technical Economics, 2011, 28(6): 64-76. (in Chinese)
[13] DU Y L. Application and analysis of forecasting stock price index based on combination of ARIMA model and BP neural network[C]// Chinese Control and Decision Conference. Piscataway: IEEE Press, 2018: 2854-2857.
[14] HSIEH T J, HSIAO H F, YEH W C. Forecasting stock markets using wavelet transforms and recurrent neural networks: an integrated system based on artificial bee colony algorithm[J]. Applied Soft Computing, 2011, 11(2): 2510-2525.
[15] CAO J, LI Z, LI J. Financial time series forecasting model based on CEEMDAN and LSTM[J]. Physica A: Statistical Mechanics and its Applications, 2019, 519: 127-139.
[16] DRAGOMIRETSKIY K, ZOSSO D. Variational mode decomposition[J]. IEEE Transactions on Signal Processing, 2014, 62(3): 531-544.
[17] RILLING G, FLANDRIN P, GONCALVES P. On empirical mode decomposition and its algorithms[C]// The 6th IEEE-EURASIP Workshop on Nonlinear Signal and Image Processing. Piscataway: IEEE Press, 2003: 8-11.
[18] TORRES M E, COLOMINAS M A, SCHLOTTHAUER G, et al. A complete ensemble empirical mode decomposition with adaptive noise[C]// 2011 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE Press, 2011: 4144-4147.
[19] CHUNG J, GULCEHRE C, CHO K, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling[EB/OL]. (2014-12-11)[2021-04-15]. arXiv: 1412.3555.
[20] OORD A V D, DIELEMAN S, ZEN H, et al. WaveNet: a generative model for raw audio[EB/OL]. (2016-09-19)[2021-03-28]. arXiv: 1609.03499.
[21] LIN Z. Modelling and forecasting the stock market volatility of SSE composite index using GARCH models[J]. Future Generation Computer Systems, 2018, 79: 960-972.
[22] ZHANG X, LI Y X, WANG S Z, et al. Enhancing stock market prediction with extended coupled hidden Markov model over multi-sourced data[J]. Knowledge and Information Systems, 2019, 61(2): 1071-1090.
[23] LAHMIRI S. Minute-ahead stock price forecasting based on singular spectrum analysis and support vector regression[J]. Applied Mathematics and Computation, 2018, 320: 444-451.
[24] WANG X, WANG Y J, WENG B, et al. Stock2Vec: a hybrid deep learning framework for stock market prediction with representation learning and temporal convolutional network[EB/OL]. (2020-09-29)[2021-05-03]. arXiv: 2010.01197.
[25] WANG J, LUO Y Y, TANG L Y, et al. A new weighted CEEMDAN-based prediction model: an experimental investigation of decomposition and non-decomposition approaches[J]. Knowledge-Based Systems, 2018, 160: 188-199.
Translated Document: Short-Term Chaotic Time Series Prediction Based on Least-Squares Support Vector Regression
Foreign Literature Translation

Short Term Chaotic Time Series Prediction using Symmetric LS-SVM Regression

Abstract. In this article, we illustrate the effect of imposing symmetry as prior knowledge in the modelling stage, within the context of chaotic time series prediction. It is shown that using Least-Squares Support Vector Machines with symmetry constraints improves the simulation performance for time series generated from the Lorenz attractor and from multi-scroll attractors. Not only are accurate forecasts obtained, but the forecast horizon for which these predictions are obtained is also expanded.

1. Introduction
In applied nonlinear time series analysis, it is common practice to estimate a nonlinear black-box model in order to produce accurate forecasts from a set of observations. Usually a time series model is estimated on the data available up to time t, and its final assessment is based on the simulation performance from t + 1 onwards. Due to the nature of time series generated by chaotic systems, where the series not only shows nonlinear behavior but also drastic regime changes due to the local instability of attractors, this is a very challenging task. For this reason, chaotic time series have been used as benchmarks in several time series competitions.

The modelling of chaotic time series can be improved by exploiting some of their properties. If the true underlying system is symmetric, this information can be imposed on the model as prior knowledge, in which case it is possible to obtain better forecasts than those obtained with a general model. In this article, short term predictions for chaotic time series are generated using Least-Squares Support Vector Machine (LS-SVM) regression. We show that LS-SVM with symmetry constraints can produce accurate predictions. Not only are accurate forecasts obtained, but the forecast horizon for which these predictions are obtained is also expanded, compared with the unconstrained LS-SVM formulation.

This paper is structured as follows. Section 2 describes the LS-SVM technique for regression, and how symmetry can be imposed in a straightforward way. Section 3 describes the applications to the x-coordinate of the Lorenz attractor and to data generated by a nonlinear transformation of multi-scroll attractors.

2. LS-SVM with Symmetry Constraints
Least-Squares Support Vector Machines (LS-SVM) is a powerful nonlinear black-box regression method which builds a linear model in the so-called feature space, where the inputs have been transformed by means of a (possibly infinite dimensional) nonlinear mapping. The problem is converted to the dual space by means of Mercer's theorem and the use of a positive definite kernel, without computing the mapping explicitly. The LS-SVM formulation solves a linear system in dual space under a least-squares cost function, where the sparseness property can be obtained by, e.g., sequentially pruning the support value spectrum or via a fixed-size subset selection approach. The LS-SVM training procedure involves the selection of a kernel parameter and the regularization parameter of the cost function, which can be done by, e.g., cross-validation or Bayesian techniques. The inclusion of a symmetry constraint (odd or even) on the nonlinearity within the LS-SVM regression framework can be formulated as follows.
Given a sample of N points {x_k, y_k}, k = 1, ..., N, with input vectors x_k ∈ R^p and output values y_k ∈ R, the goal is to estimate a model of the form

y = wᵀφ(x) + b + e,

where φ(·): R^p → R^{n_h} is the mapping to a high-dimensional (and possibly infinite-dimensional) feature space, and the residuals e are assumed to be i.i.d. with zero mean and constant (and finite) variance. The following optimization problem with a regularized cost function is formulated:

min_{w,b,e}  (1/2) wᵀw + (γ/2) Σ_{k=1}^{N} e_k²
s.t.  y_k = wᵀφ(x_k) + b + e_k,  k = 1, ..., N,
      wᵀφ(x_k) = a wᵀφ(−x_k),  k = 1, ..., N,    (2)

where a is a given constant that can take the value −1 or 1. The first restriction is the standard model formulation in the LS-SVM framework. The second restriction is shorthand for the cases where we want to impose the nonlinear function wᵀφ(x) to be even (resp. odd) by using a = 1 (resp. a = −1). The solution is formalized via the KKT conditions.

3. Application to Chaotic Time Series
In this section, the effects of imposing symmetry on the LS-SVM are presented for two cases of chaotic time series. In each example an RBF kernel is used, and the parameters σ and γ are found by 10-fold cross-validation over the corresponding training sample. The results of the standard LS-SVM are compared to those obtained with the symmetry-constrained LS-SVM (S-LS-SVM) from (2). The examples are defined in such a way that there are not enough training datapoints in every region of the relevant space; thus, it is very difficult for a black-box model to "learn" the symmetry just from the available information. The examples are compared in terms of training performance (cross-validation mean squared error, MSE-CV) and generalization performance (out-of-sample MSE, MSE-OUT). For each case, a Nonlinear AutoRegressive (NAR) black-box model is formulated:

y(t) = g( y(t−1), y(t−2), ..., y(t−p) ) + e(t),

where g is to be identified by LS-SVM and S-LS-SVM. The order p is selected during the cross-validation process as an extra parameter. After the models are estimated, they are used in simulation mode, where future predictions are computed with the estimated model from past predictions:

ŷ(t) = g( ŷ(t−1), ŷ(t−2), ..., ŷ(t−p) ).

3.1. Lorenz attractor
This example is taken from [1]. The x-coordinate of the Lorenz attractor is used as an example of a time series generated by a dynamical system. A sample of 1000 datapoints is used for training, which corresponds to an unbalanced sample over the evolution of the system, shown in Figure 1 as a time-delay embedding. Figure 2 (top) shows the training sequence (thick line) and the future evolution of the series (test zone). Figure 2 (bottom) shows the simulations obtained from both models on the test zone. Results are presented in Table 1. Clearly the S-LS-SVM can simulate the system for the next 500 timesteps, far beyond the 100 points that can be simulated by the LS-SVM.

3.2. Multi-scroll attractors
This dataset was used for the K.U.Leuven Time Series Prediction Competition. The series was generated by

ẋ = h(x),  y = W tanh(V x),

where h is the multi-scroll equation, x is the 3-dimensional coordinate vector, and W, V are the interconnection matrices of the nonlinear function (a 3-unit multilayer perceptron, MLP). This MLP function hides the underlying structure of the attractor. A training set of 2,000 points was available for model estimation, shown in Figure 3, and the goal was to predict the next 200 points out of sample.
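As a reference point for Section 2, the following numpy sketch solves the plain (unconstrained) LS-SVM regression problem in its dual form with an RBF kernel. The data, hyperparameters and function names are illustrative assumptions, and the symmetry constraint itself is not implemented here.

```python
# Unconstrained LS-SVM regression, solved in the dual:
# [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]
import numpy as np

def rbf(A, B, sigma):
    """RBF kernel matrix between row-vector sets A and B (2-D arrays)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    N = len(y)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf(X, X, sigma) + np.eye(N) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[1:], sol[0]          # alpha, b

def lssvm_predict(Xnew, X, alpha, b, sigma=1.0):
    return rbf(Xnew, X, sigma) @ alpha + b

# toy usage: inputs are windows of p lagged values, as in the NAR model
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3)); y = np.sin(X.sum(axis=1))
alpha, b = lssvm_fit(X, y)
print(lssvm_predict(X[:2], X, alpha, b))
```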
The winner of the competition followed a complete methodology involving local modelling, specialized many-steps-ahead cross-validation for parameter tuning, and the exploitation of the symmetry properties of the series (which he did by flipping the series around the time axis). Following the winner's approach, both LS-SVM and S-LS-SVM are trained using 10-step-ahead cross-validation for hyperparameter selection. To illustrate the difference between the two models, the out-of-sample MSE is computed considering only the first n simulation points, where n = 20, 50, 100, 200. It is important to emphasize that both models are trained using exactly the same methodology for order and hyperparameter selection; the only difference is the symmetry constraint in the S-LS-SVM case. Results are reported in Table 2. The simulations from both models are shown in Figure 4.

[Figure 1: The training (left) and test (right) series from the x-coordinate of the Lorenz attractor.]
[Figure 2: (Top) The series from the x-coordinate of the Lorenz attractor, part of which is used for training (thick line). (Bottom) Simulations with LS-SVM (dashed line) and S-LS-SVM (thick line) compared to the actual values (thin line).]
[Figure 3: The training sample (thick line) and future evolution (thin line) of the series from the K.U.Leuven Time Series Competition.]
[Figure 4: Simulations with LS-SVM (dashed line) and S-LS-SVM (thick line) compared to the actual values (thin line) for the next 200 points of the K.U.Leuven data.]
[Table 1: Performance of LS-SVM and S-LS-SVM on the Lorenz data.]
[Table 2: Performance of LS-SVM and S-LS-SVM on the K.U.Leuven data.]

4. Conclusions
For the task of chaotic time series prediction, we have illustrated how to use LS-SVM regression with symmetry constraints to improve the simulation performance for series generated by the Lorenz attractor and by multi-scroll attractors. By adding symmetry constraints to the LS-SVM formulation, it is possible to embed the information about symmetry at the kernel level. This translates not only into better predictions for a given time horizon, but also into a larger forecast horizon over which the model can track the time series into the future.
Stock Price Forecasting Based on the MDT-CNN-CBAM-GRU Model: An Empirical Study
Theory and Practice of Science and Technology, 2022, Vol. 3, No. 6, 81-90. DOI: 10.47297/taposatWSP2633-456914.20220306

Yangwenyuan Deng
Business School, University of New South Wales, Sydney 1466, Australia

ABSTRACT
Recently, more researchers have used artificial neural networks to predict stock prices, which have the character of time series. This paper proposes the MDT-CNN-CBAM-GRU model to forecast the closing price of shares, with three further models set up as comparison experiments. The CSI 300 index and the 5-day moving average (MA 5) are added as new price factors. Daily historical data of China Ping An from 1994 to 2020 are used to train, validate and test the models. The experimental results show that MDT-CNN-CBAM-GRU is optimal and that GRU performs better than LSTM. MDT-CNN-CBAM-GRU can therefore effectively predict the closing price of a stock and serve as a reference for investment decisions.

KEYWORDS
Stock price; Deep learning; Gated Recurrent Unit (GRU); Multi-directional Delayed Embedding (MDT); Convolutional Block Attention Module (CBAM)

1 Introduction
With the development of the Chinese stock market, investors have realized the great significance of stock price prediction [1]. Owing to the volatility and complexity of the stock market, share prediction involves multi-dimensional variables and massive time-series data [2]. Traditional methods have several shortcomings, such as inefficiency, subjectivity, and poor integration of stock information. To resolve these shortcomings, artificial intelligence has been introduced to this area: machine-learning approaches such as deep learning, decision trees and logistic regression have emerged in financial data research [3-5].

Deep learning is a newer branch of machine learning that transforms low-level features into high-level features to simplify the learning task [6]. The CNN-LSTM model is a classic deep-learning model; it has been widely used in different areas because of its better performance and prediction accuracy compared with single models [7-8]. Zhao and Xue show that the CBAM module can improve the performance of CNN-LSTM [9]. Cao et al. innovatively applied multi-directional delayed embedding (MDT) to transform price factors, which contributes to the generalization and time-sensitivity of forecasting results [10].

Building on the CNN-LSTM model, this paper proposes the MDT-CNN-CBAM-GRU model. In the experiments, Jupyter Notebook is the programming platform, and Keras within TensorFlow is used as the neural framework to build the models. The experimental data comprise the share price factors of China Ping An.¹ The experiment verifies the effectiveness of the CBAM and MDT modules, and the performance of GRU is compared with LSTM in terms of time efficiency and prediction error. Three evaluation indexes are used to present the prediction results.

¹ China Ping An Insurance (Group) Co., Ltd. ("Ping An") was founded in Shekou, Shenzhen in 1988. It is the first joint-stock insurance enterprise in China and has developed into an integrated, diversified comprehensive financial service group spanning insurance, banking, investment and other financial businesses.

2 Related Work
Recently, machine learning has become a hot spot in financial areas [11]. Artificial neural networks (ANN) have been shown to be a feasible tool for forecasting complex nonlinear statistics, although the time efficiency of neural networks is low [12]. In addition, vanishing gradients and local optimal solutions hampered the further development of ANN models. Based on the ANN, the recurrent neural network (RNN) was proposed, which memorizes short spans of information from previous stages [13]. In 2014 the gated recurrent unit (GRU) was proposed by Cho et al. as a variant of LSTM [14-15]; LSTM and GRU address the vanishing-gradient issue of RNNs.
LeCun et al. proposed the convolutional neural network in 1998, a feedforward neural network that can also be applied to time-series problems [16-17]. CNN-LSTM is widely used in financial time-series work, and further research has been conducted to improve it.

The first way to improve the model is to build more complex models. Wang et al. state that the CNN-BiSLSTM model has better forecasting accuracy than CNN-LSTM [18]. Kim T and Kim HY show that a CNN-LSTM model combined with stock price features is more effective [19]. Dai et al. proposed a dual-path attention mechanism with VT-LSTM, which improves model accuracy [20].

Price-factor selection and pre-processing is another direction for improving models. Zhang et al. add an industry factor as a model input, which contributes to better prediction results [21]. The research of Kang et al. shows that self-attention inputs yield smaller prediction errors [22]. Yu et al. verified that the amount of training data affects the effectiveness and accuracy of deep-learning models [23].

3 MDT
The traditional data-processing method for deep learning is the sliding-window method [24]. It divides a time series into multiple consecutive subsequences of fixed length along the time step; a two-dimensional time-series matrix is divided into multiple fixed-size sub-matrices that serve as the inputs of the deep-learning model.

Sliding windows fail to consider the correlations of multidimensional time series. To solve this issue, this paper introduces the multi-directional delayed embedding (MDT) tensor-processing technique. Shi et al. combined the MDT method with an ARIMA model and showed that MDT improves model accuracy [25].

The MDT method transforms the daily stock-factor vector x = (x_1, x_2, ..., x_n)ᵀ ∈ R^n into the Hankel matrix M_τ(x) shown in Figure 1. [Figure 1: The transformed Hankel matrix.] The MDT operation can be represented by

M_τ(x) = fold_{(n,τ)}(C x),

where fold_{(n,τ)} is a folding operator that converts the duplicated vector into a τ × (n − τ + 1) matrix. Writing the Hankel matrix as M_τ(x) = (v_1, v_2, ..., v_{n−τ+1}), its i-th column is

v_i = (x_i, x_{i+1}, ..., x_{i+τ−1})ᵀ.

4 CNN-CBAM-GRU
(1) CNN
CNN is widely used in time-series prediction because of its good performance and time savings. A CNN includes pooling layers, which transform the data to reduce the feature dimension:

l_t = tanh(x_t ∗ k_t + b_t),

where l_t is the output of the convolutional layer, x_t the input vector, k_t the weight of the convolution kernel, b_t the kernel bias, and tanh the activation function.

(2) CBAM
Sanghyun et al. introduced the Convolutional Block Attention Module in 2018, a simple and effective module that has been widely used in CNN models [26]. [Figure 2: The schematic diagram of CBAM.] Its process can be summarized as

F_1 = M_c(F) ⊗ F,
F_2 = M_s(F_1) ⊗ F_1,

where F ∈ R^{C×H×W} is the intermediate feature map given as input, M_c ∈ R^{C×1×1} is a 1D channel attention map, M_s ∈ R^{1×H×W} is a 2D spatial attention map, and ⊗ is element-wise multiplication, with the attention values broadcast accordingly.
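Returning briefly to the MDT transform of Section 3, a small numpy sketch of the Hankel construction reads as follows; names and data are illustrative.

```python
# MDT/Hankel construction: a length-n series becomes a tau x (n - tau + 1)
# matrix whose i-th column is (x_i, ..., x_{i+tau-1})^T.
import numpy as np

def mdt(x, tau):
    n = len(x)
    return np.stack([x[i:i + tau] for i in range(n - tau + 1)], axis=1)

x = np.arange(1, 8, dtype=float)   # x_1 .. x_7
H = mdt(x, tau=3)                  # shape (3, 5); H[:, 0] == [1, 2, 3]
print(H)
```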
The channel attention module compresses the spatial dimensions of the input by applying average pooling and max pooling at the same time:

M_c(F) = σ( MLP(AvgPool(F)) + MLP(MaxPool(F)) ) = σ( W_1(W_0(F^c_avg)) + W_1(W_0(F^c_max)) ),

where W_0 ∈ R^{(C/r)×C} and W_1 ∈ R^{C×(C/r)}. The spatial attention module addresses where the informative regions are by aggregating the two pooling operations into two 2D maps:

M_s(F) = σ( f^{7×7}([AvgPool(F); MaxPool(F)]) ) = σ( f^{7×7}([F^s_avg, F^s_max]) ).

(3) GRU
GRU merges the input gate and forget gate into an update gate, improving training efficiency while maintaining model accuracy [27]. GRU has two gate structures, an update gate and a reset gate. [Figure 3: Gated Recurrent Unit.]
1) r_t is the reset gate, which controls how much of the information in the previous hidden layer h_{t−1} is forgotten.
2) The update gate z_t controls the extent to which the information of the previous state is brought into the current candidate state h̃_t.
3) W is a weight matrix, b a bias vector, [h_{t−1}, x_t] the concatenation of the two vectors, and σ and tanh the sigmoid and hyperbolic tangent functions.

The GRU process can be summarized as

z_t = σ(W_z · h_{t−1} + W_z · x_t),
r_t = σ(W_r · h_{t−1} + W_r · x_t),
h̃_t = tanh( W_h̃ · (r_t ⊙ h_{t−1}) + W_h̃ · x_t ),
h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t,

where · denotes matrix multiplication and ⊙ element-wise multiplication.

(4) CNN-CBAM-GRU training and prediction process
1) Standardized inputs: before the MDT step, each data column is standardized with the Z-score

z_i = (x_i − μ)/σ,

where μ is the mean and σ the standard deviation. The normalized data are then transformed into Hankel matrices by MDT.
2) Network initialization: initialize the weights and biases of the CNN-CBAM-GRU layers.
3) CNN layers: the CNN layers extract the key features of the Hankel matrices as inputs for the later layers.
4) CBAM module: the CBAM module further processes the features.
5) GRU layers: the processed data are used by the GRU to predict the closing price.
6) Output layer: fully connected layers use the GRU outputs to compute the model weights.
7) Prediction-result test and loop: judge whether the validation loss decreased after training; return to step 2 until all epochs are finished.
8) Saving the best model: if the validation loss of this epoch is smaller than the previously stored one, save the current model as the best model in the experiment folder.
9) Loading the best model: load the model structure and weights.
10) Prediction and denormalization: use the weights of the best model to predict the closing prices of the test set; the predictions are denormalized and compared with the true values.
11) Experiment result: visualize the result and present the evaluation-index values.

[Figure 4: The process of the model.]
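As a sanity check on the GRU equations above, here is a minimal numpy implementation of one GRU step. The weight layout (separate input and recurrent matrices) is an illustrative assumption, not the paper's exact parameterization.

```python
# One GRU step implementing the update/reset-gate equations above.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    z = sigmoid(Wz @ x_t + Uz @ h_prev)               # update gate
    r = sigmoid(Wr @ x_t + Ur @ h_prev)               # reset gate
    h_tilde = np.tanh(Wh @ x_t + Uh @ (r * h_prev))   # candidate state
    return (1 - z) * h_prev + z * h_tilde             # new hidden state

d_in, d_h = 11, 64                 # 11 price factors, 64 hidden units
rng = np.random.default_rng(1)
W = [rng.standard_normal((d_h, s)) for s in (d_in, d_h) * 3]
h = gru_step(rng.standard_normal(d_in), np.zeros(d_h), *W)
print(h.shape)                     # (64,)
```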
5 Experiments
(1) Experimental environment
A notebook computer equipped with an NVIDIA GeForce GTX 1060 6G and an Intel 8750H runs all experiments. Python 3.9 is the programming language, Anaconda with Jupyter Notebook is used as the programming platform, and Keras built into the TensorFlow package is used to construct the neural-network structure.

(2) Experimental data
China Ping An price factors are the experimental data, and the closing price is the forecasting target. The data contain 6,000 days of price data from 1994 to 2020, downloaded from Baostock. The data are divided into three parts: 80% for the training set and 10% each for the validation and test sets. This paper innovatively takes the CSI 300 index and the 5-day moving average as price factors, giving 11 parameters in total for forecasting the closing price, illustrated by the example record in Table 1.

Table 1: Stock price factors (example record)
Factor | Value
Date | 94-07
Amount | 1.165176e+07
Volume | 1385000
Turn | 0.51547
Index | 3.84893
Open | 0.41541
PeTTM | 12.58321
PbMRQ | 2.855036
PctChg | 3.026634
Ma5 | 0.410159
High | 0.422353
Low | 0.421858

(3) Model implementation
Every model is run independently 15 times to find the optimal weights. Three evaluation indexes are used, the root mean square error (RMSE), the mean absolute error (MAE) and R-square (R²), to evaluate the performance of the different models:

MAE = (1/n) Σ_{i=1}^{n} | ŷ_i − y_i |,
RMSE = sqrt( (1/n) Σ_{i=1}^{n} ( ŷ_i − y_i )² ),
R² = 1 − Σ_{i=1}^{n} ( ŷ_i − y_i )² / Σ_{i=1}^{n} ( ȳ − y_i )²,

where ŷ_i is the prediction of the model and y_i the true value. MAE and RMSE values closer to 0 indicate better performance; an R² closer to 1 indicates higher model accuracy.

(4) Implementation of MDT-CNN-CBAM-GRU
The preset parameters of the MDT-CNN-CBAM-GRU model are listed in Table 2.

Table 2: Model parameters
Parameter | Value
Convolution layer filters | 64
Convolution layer kernel_size | 3
Convolution layer activation | ReLU
MaxPooling2D pool_size | 2
Pooling layer padding | same
Pooling layer activation | ReLU
Dropout layers | 0.2
CBAM_attention reduce axis | 3
GRU layers | 2
kernel_regularizer | L2(0.01)
Hidden units in GRU layer 1 | 128
Hidden units in GRU layer 2 | 64
GRU layer activation | ReLU
Dense layers kernel_initializer | random normal
Dropout layers 2 | 0.25
Learning rate | 0.001
Time_step | 1
Loss function | mean squared error
Batch_size | 64
Optimizer | Adam
Epochs | 200

6 Results
The visual results are presented in Figures 5 to 8, where the starred orange line is the predicted closing price and the blue line the true closing price. [Figure 5: The prediction of CNN-LSTM. Figure 6: The prediction of MDT-CNN-LSTM. Figure 7: The prediction of MDT-CNN-GRU. Figure 8: prediction of the remaining model; caption not recoverable from the extraction.]

The evaluation-index results of the models are presented in Table 3, and the average time per training step in Table 4.

Table 3: Evaluation-index values of different models
Model | RMSE | MAE | R²
CNN-GRU | 0.3220 | 0.2268 | 0.9418
MDT-CNN-LSTM | 0.1598 | 0.1320 | 0.9866
MDT-CNN-GRU | 0.0910 | 0.0794 | 0.9959
MDT-CNN-CBAM-GRU | 0.0890 | 0.0639 | 0.9959

Table 4: Average training time
Model | Time
CNN-GRU | 2 s, 11 ms/step
MDT-CNN-GRU | 1 s, 9 ms/step
MDT-CNN-LSTM | 2 s, 10 ms/step
MDT-CNN-CBAM-GRU | 2 s, 10 ms/step
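For completeness, a short numpy sketch of the three evaluation indexes defined in Section 5(3):

```python
# RMSE, MAE and R-square as used in the comparison tables.
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_pred - y_true))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```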
7 Conclusion
The proposed MDT-CNN-CBAM-GRU has the best forecasting accuracy and satisfactory time efficiency, and can provide a reference for investors in the share market. Compared with LSTM, GRU has better prediction accuracy and faster speed. However, some details remain to be improved in further research:
(1) Given more time, 30 independent training runs per model would be a better choice.
(2) More experiments with GRU should be conducted, since GRU performs better than LSTM.
(3) The generalization of the models needs to be tested in future research by predicting different financial products such as funds, options and other stocks.

About the Author
Yangwenyuan Deng, Master of Commerce in Finance, University of New South Wales; his research field is finance and machine learning.

References
[1] Meng, S., Fang, H. & Yu, D. (2020). Fractal characteristics, multiple bubbles, and jump anomalies in the Chinese stock market. Complexity, 2020: 7176598.
[2] Abu-Mostafa, Y. S. & Atiya, A. F. (1996). Introduction to financial forecasting. Applied Intelligence, 6: 205-213.
[3] Huang, Q. P., Zhou, X., Wei, Y. & Gan, J. Y. (2015). Application of SVM and neural network model in the stock prediction research. Microcomputer and Application, 34: 88-90.
[4] Chen, S., Goo, Y. J. & Shen, Z. D. (2014). A hybrid approach of stepwise regression, logistic regression, support vector machine, and decision tree for forecasting fraudulent financial statements. The Scientific World Journal, 2014: 968712.
[5] Fang, X., Cao, H. Y. & Li, X. D. (2019). Stock trend prediction based on improved random forest algorithm. Journal of Hangzhou Dianzi University, 39: 25-30.
[6] Zhang, Q. Y., Yang, D. M. & Hang, J. T. (2021). Research on stock price prediction combined with deep learning and decomposition algorithm. Computer Engineering and Applications, 57: 56-64.
[7] Luo, X. & Zhang, J. L. (2020). Stock price forecasting based on multi time scale compound depth neural network. Wuhan Finance, 2020: 32-40.
[8] Lu, W., Li, J., Li, Y., Sun, A. & Wang, J. (2020). A CNN-LSTM-based model to forecast stock prices. Complexity, 2020: 6622927.
[9] Zhao, H. R. & Xue, L. (2021). Research on stock forecasting based on LSTM-CNN-CBAM model. Computer Engineering and Applications, 57: 203-207.
[10] Cao, C. F., Luo, Z. N., Xie, J. X. & Li, L. (2022). Stock price prediction based on MDT-CNN-LSTM model. Computer Engineering and Applications, 58: 280-286.
[11] Li, J., Pan, S., Huang, L. & Zhu, X. (2019). A machine learning based method for customer behavior prediction. Tehnicki Vjesnik - Technical Gazette, 26: 1670-1676.
[12] Längkvist, M., Karlsson, L. & Loutfi, A. (2014). A review of unsupervised feature learning and deep learning for time-series modeling. Pattern Recognition Letters, 42: 11-24.
[13] Sherstinsky, A. (2020). Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) network. Physica D: Nonlinear Phenomena, 404: 132306.
[14] Hochreiter, S. & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9: 1735-1780.
[15] Cho, K., Van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H. & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv, 1406: 1078.
[16] LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86: 2278-2324.
[17] Hu, Y. (2018).
Stock market timing model based on convolutional neural network - a case study of Shanghai composite index. Finance & Economy, 4: 71-74.
[18] Wang, H. Y., Wang, J. X., Cao, L. H., Sun, Q. & Wang, J. Y. (2021). A stock closing price prediction model based on CNN-BiSLSTM. Complexity, 2021: 5360828.
[19] Kim, T. & Kim, H. Y. (2019). Forecasting stock prices with a feature fusion LSTM-CNN model using different representations of the same data. PLOS ONE, 14: 0212320.
[20] Dai, Y. R., An, J. X. & Tao, Q. H. (2022). Financial time-series prediction by fusing dual-pathway attention with VT-LSTM. Computer Engineering and Applications, 6: 10.
[21] Zhang, Y. F., Wang, J., Wu, Z. H. & L, Y. F. (2022). Stock movement prediction with dynamic and hierarchical macro information of market. Journal of Computer Applications, 6: 7.
[22] Kang, R. X., Niu, B. N., Li, X. & Miao, Y. X. (2021). Predicting stock prices using LSTM with the self-attention mechanism and multi-source data. Journal of Chinese Computer Systems, 12: 9.
[23] Yu, S. S., Chu, S. W., Chan, Y. K. & Wang, C. M. (2019). Share price trend prediction using CRNN with LSTM structure. Smart Science, 7: 189-197.
[24] Li, X. F., Liang, X. & Zhou, X. P. (2016). An empirical study on manifold learning of sliding window of stock price time series. Chinese Journal of Management Science, 24: 495-503.
[25] Shi, Q., Yin, J. & Cai, J. (2020). Block Hankel tensor ARIMA for multiple short time series forecasting. Proceedings of the AAAI Conference on Artificial Intelligence, 34: 5758-5766.
[26] Woo, S., Park, J., Lee, J. Y. & Kweon, S. (2018). CBAM: convolutional block attention module. Proceedings of the European Conference on Computer Vision (ECCV), 2018: 3-19.
[27] Dang, J. W. & Cong, X. Q. (2021). Research on hybrid stock index forecasting model based on CNN and GRU. Computer Engineering and Applications, 57: 167-174.
Time Series Paper (English): Time Series Project (1)
Abstract: This article collects quarterly data on China's GDP from 1992 to 2010. We use the method of factor decomposition to extract the long-term increasing trend and the seasonality, then use an ARMA model to fit the residuals, analyse the result to obtain the final model, and use it to generate a short-term GDP forecast for China.

Key words: factor decomposition; ARMA model; GDP forecast

1. Introduction
1.1 Background
Since the reform and opening up in 1978, China's economy has been developing rapidly and steadily, and after joining the WTO the pace of development reached a new level. GDP (Gross Domestic Product), the basic statistical indicator of national economic production, can be used to reflect a country's economy; it is the core statistical indicator of the national economy. GDP summarizes the most basic aspects of the macroeconomy: it measures overall national output and income, and can also be used to explore economic fluctuations and cycles. Hence it is of great importance to fit and analyse GDP accurately in order to explore a country's macroeconomic trend. The aim of this article is to build a GDP forecasting model and use it to predict China's future GDP.

1.2 Method
Many methods have been used to analyse economic phenomena; time series analysis is one of the most efficient. A time series is a collection of observations of well-defined data items obtained through repeated measurements over time. Time-series methods use economic theory mainly as a guide to variable selection, and rely on past patterns in the data to predict the future. An observed time series can be decomposed into three components: the trend (long-term direction), the seasonal (systematic, calendar-related movements) and the irregular (unsystematic, short-term fluctuations). When these factors occur, we can use the method of decomposition to extract useful information from the data; we define this as factor decomposition here.

The trend component typically represents the longer-term development of the time series of interest and is often specified as a smooth function of time. The recurring but persistently changing patterns within the years are captured by the seasonal component, which is quite common in economic time series; when it occurs, a seasonal adjustment method should be used. Seasonal adjustment is the process of estimating and then removing from a time series influences that are systematic and calendar-related. Observed data need to be seasonally adjusted, as seasonal effects can conceal both the true underlying movement in the series and certain non-seasonal characteristics of interest to analysts.

The irregular component represents the irregular fluctuations driven by causal factors and is usually defined as the residual. Considering the insufficiency of a purely deterministic decomposition, we should test the residuals: if there is no autocorrelation among them, the information in the time series has been fully recovered by the deterministic decomposition.

If autocorrelation is present, an ARMA model can be used to fit the residuals. ARMA is one of the most common time series models and is used to make precise estimates from short-term data. Its main idea is to represent the series as a combination of several time-related components that can be used to predict future data.
The components in the ARMA model form a set of random variables related to time itself: observed individually they show uncertainty, but combined they show a certain regularity that can be expressed by a statistical model. The ARMA model consists of two parts, an autoregressive (AR) part and a moving average (MA) part, and is usually referred to as the ARMA(p, q) model, where p is the order of the autoregressive part and q is the order of the moving average part.

2. Data Analysis
2.1 Dataset
The data collected are China's historical quarterly GDP from 1992 to 2010. We chose this period rather than 1978-2011, which most other prediction articles prefer, because economic growth in the first 10-15 years was relatively slow compared with the later years (1990 onward), and we wish to remove the interference of the early data. Another reason for using recent data (1992-2010) is that quarterly GDP data before 1992 are hard to obtain, owing to the imperfection of China's statistical system at the end of the 20th century. We treat the historical GDP data as a time series, analyse the past patterns, derive a forecasting model, and use it to predict future GDP.

2.2 Graphical data analysis
Figure 1 shows a plot of the data; there is a significant long-term trend and varying seasonality in the series. The trend appears to be roughly cubic, while the seasonality shows a strong yearly component occurring at lags that are multiples of s = 4. For demonstration, the sample ACF of the data is displayed in Figure 2, which also shows significant seasonality.

[Figure 1: Quarterly China GDP from 1992(1) to 2010(4). Figure 2: Sample ACF of the GDP data.]

3. Time Series Model
3.1 Factor decomposition
After the previous analysis, we build a time series model by factor decomposition: the decomposition extracts the useful information and measures the influence of the trend and the seasonality. Define Y_t as GDP and x as time. The decomposition model is

Y_t = T_t + S_t + ε_t,    (1)
T_t = β₀ + β₁x + β₂x² + β₃x³,    (2)
S_t = α₀ + α₁D₁ + α₂D₂ + α₃D₃.    (3)

We use (2) as the trend model because the pattern in Figure 1 suggests a cubic shape. D₁, D₂ and D₃ in (3) are seasonal dummy variables:

D1 = c(1,0,0,0,1,0,0,0,...)
D2 = c(0,1,0,0,0,1,0,0,...)
D3 = c(0,0,1,0,0,0,1,0,...)

3.11 Data transformation
Significant varying seasonality is observed in Figure 1. Since varying seasonality has negative effects on model fitting, we transform the GDP values to make the seasonality constant. The Box-Cox transformation is part of the family of power transformations, where the data are transformed by a power function while preserving their rank, so we take a Box-Cox transformation of GDP. In the Box-Cox diagram in Figure 3, λ is near the 0.2 mark, so we use GDP^0.2 as the new response variable, denoted Y'_t. Figure 4 shows the GDP plot after transformation; the seasonality is now almost constant.

[Figure 3: Box-Cox log-likelihood plot with 95% interval. Figure 4: Quarterly China GDP after transformation.]
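Before fitting, here is a minimal Python illustration of the decomposition regression (1)-(3), a cubic trend plus quarterly dummies fitted by least squares to the transformed series. The data are synthetic placeholders, since the original series is not reproduced here.

```python
# Cubic trend + quarterly-dummy regression, eq. (1)-(3), via least squares.
import numpy as np

n = 76                                   # 19 years x 4 quarters
x = np.arange(1, n + 1, dtype=float)
q = np.arange(n) % 4                     # quarter index 0..3
D1, D2, D3 = (q == 0).astype(float), (q == 1).astype(float), (q == 2).astype(float)
rng = np.random.default_rng(2)
# synthetic stand-in for the Box-Cox-transformed GDP series
y = 5.9 + 0.09*x - 1.5e-3*x**2 + 1.5e-5*x**3 - 0.5*D1 + 0.1*rng.standard_normal(n)

X = np.column_stack([np.ones(n), x, x**2, x**3, D1, D2, D3])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta                 # passed on to the ARMA stage (Section 3.13)
```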
3.12 Model building
After the transformation the decomposition model is Y'_t = T_t + S_t + ε_t. Fitting the full model with the "R" statistics package gives:

LM1:  Ŷ'_t = 5.976 + 0.09219x − 0.001532x² + 0.00001472x³ − 0.5072D₁ − 0.3763D₂ − 0.3433D₃

t  = (110.322, 16.798, −9.281, 10.442, −15.340, −11.405, −10.418)
p  = (<2e−16, <2e−16, 9.22e−14, 7.62e−16, <2e−16, <2e−16, 8.40e−16)
Multiple R-squared: 0.9934; Adjusted R-squared: 0.9928
F-statistic: 1724; p-value: 2.2e−16

In LM1 the t values of the coefficients are all reasonable and the p values of the t tests are all very small, so every explanatory variable is significant for the fit. The F-statistic is 1724 with a very small p value, so the regression is highly significant. The adjusted R-squared of 0.9928 means that about 99.28% of the variation in the transformed GDP is explained by the model, indicating a very good fit. The chosen model therefore fits the data well, but we still test the residuals.

3.13 Residual analysis
Figure 5 displays a plot of residuals against fitted values, in which a significant cyclical pattern appears. Figure 6 shows the ACF of the residuals, which also gives strong evidence of a cyclical pattern. We suppose this is because the model, although it fits the data well, did not fully capture the seasonality; in other words, the cycle is still caused by seasonality. Hence we should take measures to deal with the cyclical residuals.

[Figure 5: Residuals vs fitted. Figure 6: ACF of the residuals.]

3.14 Residual ARMA model
Since the residuals show a cyclical pattern caused by seasonality, we fit them with an ARMA model, first taking a first-order difference of the residuals at lag s = 4.

[Figure 7: ACF and PACF of the residuals after differencing.]

Figure 7 shows the ACF and PACF of the differenced residuals. The ACF is more reasonable than before differencing, with most values within the bounds. Inspecting the ACF and PACF, one might judge that the ACF cuts off at lag 3 and the PACF tails off at lag 1. To choose a better model we compare several reasonable ARMA candidates, such as AR(1), MA(3), ARMA(1,1), ARMA(1,2) and ARMA(1,3).

[Table 2: AICc values of ARMA(p, q) models, 0 ≤ p ≤ 1, 0 ≤ q ≤ 3. Table 3: BIC values of the same models; the entries were not recoverable from the source.]

By the AICc and BIC criteria we should choose the model with the smallest value. According to Tables 2 and 3, the ARMA(0,1) model has both the smallest AICc and the smallest BIC, so we choose ARMA(0,1) as the final model for the differenced residuals, say R_t = ∇₄ε_t = (1 − B⁴)ε_t. The model is

LM2, MA(1):  R_t = v_t + 0.6582 v_{t−1}.
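The residual-modelling step can be sketched in Python with statsmodels, assuming `residuals` holds the LM1 residuals; the candidate orders mirror those compared above.

```python
# Seasonal-difference the regression residuals at lag 4, then compare
# ARMA(p, q) candidates by information criterion and keep the best.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(3)
residuals = rng.standard_normal(76)      # placeholder for the LM1 residuals

r1 = residuals[4:] - residuals[:-4]      # seasonal difference at lag s = 4
fits = {(p, q): ARIMA(r1, order=(p, 0, q)).fit()
        for p in (0, 1) for q in range(4)}
best = min(fits, key=lambda k: fits[k].bic)   # e.g. (0, 1) for an MA(1)
print(best, fits[best].params)
```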
[Figure 8: Diagnostics for the MA(1) fit on the differenced residuals: standardized residuals, ACF of residuals, normal Q-Q plot of the standardized residuals, and p-values for the Ljung-Box statistic.]

The diagnostics for the model are displayed in Figure 8. Notice the few outliers in the series, as exhibited in the plot of the standardized residuals and their normal Q-Q plot, and some autocorrelation that still remains according to the p-values for the Ljung-Box statistic; but otherwise the model fits well. We therefore use the full model to predict China's future GDP:

Y'_t = 5.976 + 0.09219x − 0.001532x² + 0.00001472x³ − 0.5072D₁ − 0.3763D₂ − 0.3433D₃ + ε_t,
∇₄ε_t = v_t + 0.6582 v_{t−1}.

4. Prediction
Forecasts from the full fitted model for the next 8 quarters are shown in Table 4; by the principles of time series analysis, the model is suited to short-term forecasting. Compared with the actual GDP values of 2011, the predicted values are slightly larger, especially for the fourth quarter of 2011. The predictions are nevertheless reasonable: despite the width of the 95% intervals, the actual values all fall within them. We also predicted China's quarterly GDP values beyond that point.

5. Conclusion
As can be seen, the data show clear seasonality, a long-term upward trend, and a degree of periodicity, all evident both in the plots and in the model we constructed. The long-term trend shows signs of nonlinearity. We used the method of factor decomposition defined in the first part of the article, which has the advantages of being direct and easy to understand; its weak point is the insufficient use of the information in the residuals, which is why we fitted the residuals with an ARMA model that exploits the remaining residual information.

China is now one of the world's largest economies, and the results of the GDP prediction are relevant to macro-control and policy-making for sustaining its economy. The time series method is suitable for short-term prediction, and we predicted the eight quarterly GDP values after 2010.
The predictions we made show that China's economy was developing at a very high speed, which is in line with reality.

Appendix (R code)
> library(MASS)    # provides boxcox()
> library(astsa)   # provides sarima() and sarima.for()
> data=read.table("st5209new.txt",header=T)
> Y=data$GDP
> x=data$time
> x2=x^2
> x3=x^3
> plot(Y,ty="l")
> acf(Y)
> # quarterly dummy variables (19 years x 4 quarters = 76 observations)
> D1=rep(c(1,0,0,0),19)
> D2=rep(c(0,1,0,0),19)
> D3=rep(c(0,0,1,0),19)
> lm1=lm(Y~x+x2+x3+D1+D2+D3)
> boxcox(lm1)
> Y1=Y^0.2
> plot(Y1,ty="l")
> lm2=lm(Y1~x+x2+x3+D1+D2+D3)
> summary(lm2)
> res=lm2$residuals
> plot(res,xlab="data",ylab="residuals")
> abline(h=0)
> acf(res,100)
> pacf(res,100)
> res1=diff(res,lag=4)
> acf(res1,100)
> pacf(res1,100)
> sarima(res1,1,0,0)
> sarima(res1,0,0,1)
> sarima(res1,0,0,2)
> sarima(res1,0,0,3)
> sarima(res1,1,0,1)
> sarima(res1,1,0,2)
> sarima(res1,1,0,3)
> sarima.for(res1,n.ahead=8,0,0,1)
> new=data.frame(x=c(77,78,79,80,81,82,83,84),
+   x2=c(5929,6084,6241,6400,6561,6724,6889,7056),
+   x3=c(456533,474552,493039,512000,531441,551368,571787,592704),
+   D1=c(1,0,0,0,1,0,0,0),D2=c(0,1,0,0,0,1,0,0),D3=c(0,0,1,0,0,0,1,0))
> predict(lm2,new,interval="prediction")   # the original called lm4, which was never defined
Statistics Literature 7
A NOTE ON THE FILTERING FOR SOME TIME SERIES MODELS
By S. Peiris and A. Thavaneswaran
University of Sydney; University of Manitoba

First version received May 2002

Abstract. This paper is concerned with filtering for various types of time series models, including the class of generalized ARCH models and stochastic volatility models. We extend the results of Thavaneswaran and Abraham (1988) for some time series models using martingale estimating functions. Nonlinear filtering for biostatistical time series models with censored observations is also discussed as a special case.

Keywords. Time series; censored observations; filtering; recursive; correlation; biostatistics; Kalman filter; volatility.

1. INTRODUCTION
Censored observations may arise naturally in time series if there is an upper and/or lower limit of detection, e.g. when one is monitoring levels of an airborne contaminant or recording daily bioassays of hormone levels in a patient. When some observations are censored, regression analysis with autoregressive errors has been studied by Zeger and Brookmeyer (1986).

The 'static' models for discrete time, which treat baseline hazard coefficients and covariate parameters as 'fixed effects', are appropriate if the number of intervals is comparatively small. In situations with many intervals, but not enough to apply models for continuous time, such unrestricted modelling and fitting of hazard functions will often lead to nonexistence and divergence of ML estimates, owing to the large number of parameters. This difficulty in real data problems becomes even more apparent if covariate effects are also assumed to be time-varying, as, for example, the effect of a certain therapy in a medical context.

To avoid such problems, one may investigate more parsimonious parameterizations by specifying certain functional forms (e.g. piecewise polynomials) for time-varying coefficients. However, simply imposing such parametric functions can conceal spikes in the hazard function or other unexpected patterns. We describe here another flexible approach, adopting state space techniques. The dynamic models obtained in this way make simultaneous analysis and smoothing of the baseline hazard function and covariate effects possible. Fahrmeir (1994) and Fahrmeir and Tutz (1997) studied parameter estimation for time series models with censored observations, solving many related problems under the assumption of a random walk parameter process driven by Gaussian noise.
Abraham et al. (1998) and Thavaneswaran and Heyde (1999) studied the prediction problem for various classes of time series models following Godambe (1999). In this paper, we study the associated filtering problem via the estimating function approach. Section 2 gives some basic results as three lemmas: the first is the theorem on normal correlation, and the other two are based on the extended version of that theorem for fixed and randomly censored cases. Section 3 gives some examples of popular time series models to show the importance and applicability of the results of Section 2. In Section 4 we consider a case with censored data.

2. SOME BASIC RESULTS
This section presents three lemmas, based on the correlation coefficient and a normal model, which form the basis for some results of the paper.

Lemma 1. Let Y and Θ be two jointly distributed normal random variables with means μ_Y, μ_θ and variances σ_Y², σ_θ², respectively. Then the mean and variance of Θ conditional on Y are

E[Θ | Y] = μ_θ + ρ (σ_θ / σ_Y)(Y − μ_Y)    (1)

and

Var(Θ | Y) = σ_θ² (1 − ρ²),    (2)

where ρ is the correlation between Θ and Y. That is, the conditional probability distribution of Θ given Y = y is normal, with mean linear in y and variance independent of y.

Now consider the probability distribution of a fixed censored variable Y defined by

Y = Y* if Y* > 0,  and  Y = 0 otherwise,

where Y* ~ N(μ, σ²). The probability distribution of Y is a mixture of a discrete and an absolutely continuous probability distribution on R⁺. Specifically, we have

P(Y = 0) = P(Y* ≤ 0) = Φ(−μ/σ),

where Φ(·) is the cumulative distribution function (cdf) of the standard normal distribution. The unconditional and conditional moments of this right-censored variable are important in many applications and can be computed using Lemma 2 below.
!:Note that this generalizes the preceding case since the threshold (which was zero previously)is now given by Z *¼Q *)Y *.For this we haveH ¼H ÃI Y Ã>0¼H ÃI H Ã>Z Ã:Lemma 3reports the unconditional and conditional moments of Q :Lemma 3.The unconditional and conditional expectations of Q are (respectively )given byE H ¼l h U ðl y =r y Þþqr h /ðl y =r y ÞandE ðH j Y Ã>0Þ¼l h þqr h h ðl y =r y Þ:(See the Appendix for the proof.)Next section considers some applications of these lemmas in various time series models in practice.3.SOME EXAMPLESHere we begin with an example on Kalman filtering for a simple state space model and show that the filtered estimates do not even converge.399FILTERING FOR SOME TIME SERIES MODELS ÓBlackwell Publishing Ltd 20043.1.State Space ModelsConsider the simple state space model given byh tþ1¼a h tþb e tþ1y tþ1¼A h tþBe tþ1;where{e t+1}and{e t+1}are independent white noise sequences having mean zeroand variances r2e and r2erespectively.Let F y t be the r-field generated by{y t,y t)1,…}.We give the conditional mean and variance of h t in Lemma4:Lemma 4.The conditional mean of thefiltered estimate of htgiven F y t; m t¼Eðh t j F y tÞ,and its conditional variance c t¼E½ðh tÀm tÞ2j F y t are given by^htþ1¼a^h tþaA c tA2c tþB2r2e½y tþ1ÀA^h t ð3Þc tþ1¼Varðh tþ1À^h tþ1jF y tþ1Þ¼a2c tþb2r2eÀa2A2c2tA2c tþB2r2e:ð4Þ(See the Appendix for the proof.)Note:To obtain the steady state mean square error,set c t+1¼c t¼c which givesc¼a2cþb2ÀðaAÞ2c2 A2cþB2;where r2e ¼r2e¼1.This clearly shows the high volatility of thefiltered estimatein the long run.In most of the applications of Kalmanfiltering,even with normal variables,the linearfilters(non-optimal)given in(3)and(4)have been used as an approximation.3.2.ARCH modelsConsider an ARCH(1)model given byy t¼e t r tð5Þr2t¼h0þh1y2tÀ1;ð6Þwhere{e t}is an independent sequences of random variables having mean zero and variance r2e.We are interested in the recursive estimates of h0and h1.When data successively come in time or the data are unequally spaced due to missing values, errors,etc.,it is natural to look for a recursive estimates for the parameters involved.400S.PEIRIS AND A.THAVANESWARANÓBlackwell Publishing Ltd2004Let h t ¼y 2t Àh T u t À1ðy Þ;where h T ¼(h 0,h 1)and u t )1(y )is a 2·1columnvector satisfying u t À1ðy Þ¼1y 2t À1:The choice of h t can be extended to cover most of the ARCH models considered in the literature for suitably chosen h T and u t )1(y ).Following Thavaneswaran and Abraham (1988),the corresponding optimal estimating function can be written asg 0¼Xn t ¼2a 0t À1ðy 2t Àh T u t À1ðy ÞÞ;where a 0t À1¼½E ð@h t =@h Þ =E ½h 2t j F y t À1 .The resulting optimal estimate based on the first n observations is given by^h n ¼X n t ¼2a 0t À1u T t À1ðy Þ !À1X n t ¼2a 0t À1y t !:ð7ÞWhen the (n +1)th observation becomes available,the estimate based on all the observations is given by^h n þ1¼X n þ1t ¼2a 0t À1u T t À1ðy Þ !À1X n t ¼2a 0t À1y 2t!:Now,this can be written as^hn þ1À^h n ¼K n þ1X n þ1t ¼2a 0t À1y 2t ÀK À1n þ1^h n "#;whereK À1n þ1¼Xn þ1t ¼2a 0t À1u T t À1ðy Þ:Using the relationK À1n þ1¼K À1n þa 0n u T n ðy Þ;we have^h n þ1À^h n ¼K n þ1Xn þ1t ¼2a 0t À1y 2t ÀðK À1n þa 0n À1u T n ðy ÞÞ^h n "#¼K n þ1a 0n y 2n þ1Àa 0n u T n ðy Þ^hn h i ¼K n þ1a 0n y 2n þ1Àu T n ðy Þ^h n h i :Now it is clear that401FILTERING FOR SOME TIME SERIES MODELS ÓBlackwell Publishing Ltd 2004KÀ1nþ1¼KÀ1nþa0n u T nðyÞð8Þ^hnþ1¼^h nþK nþ1a0n½y nþ1Àu T nðyÞ^h n :ð9ÞThe algorithm in(9)gives a new estimate at time n+1as the old estimate at time n plus an 
3.2. ARCH models

Consider an ARCH(1) model given by

$$y_t = \varepsilon_t \sigma_t, \qquad (5)$$
$$\sigma_t^2 = \theta_0 + \theta_1 y_{t-1}^2, \qquad (6)$$

where $\{\varepsilon_t\}$ is an independent sequence of random variables having mean zero and variance $\sigma_\varepsilon^2$. We are interested in recursive estimates of $\theta_0$ and $\theta_1$. When data arrive successively in time, or when the data are unequally spaced due to missing values, errors, etc., it is natural to look for recursive estimates of the parameters involved.

Let $h_t = y_t^2 - \theta^T u_{t-1}(y)$, where $\theta^T = (\theta_0, \theta_1)$ and $u_{t-1}(y)$ is the $2 \times 1$ column vector $u_{t-1}(y) = (1, y_{t-1}^2)^T$. The choice of $h_t$ can be extended to cover most of the ARCH models considered in the literature, for suitably chosen $\theta^T$ and $u_{t-1}(y)$. Following Thavaneswaran and Abraham (1988), the corresponding optimal estimating function can be written as

$$g^* = \sum_{t=2}^{n} a^*_{t-1} \left( y_t^2 - \theta^T u_{t-1}(y) \right),$$

where $a^*_{t-1} = \left[E(\partial h_t / \partial \theta)\right] / E[h_t^2 \mid \mathcal{F}_{t-1}^y]$. The resulting optimal estimate based on the first $n$ observations is given by

$$\hat{\theta}_n = \left( \sum_{t=2}^{n} a^*_{t-1} u_{t-1}^T(y) \right)^{-1} \left( \sum_{t=2}^{n} a^*_{t-1} y_t^2 \right). \qquad (7)$$

When the $(n+1)$th observation becomes available, the estimate based on all the observations is

$$\hat{\theta}_{n+1} = \left( \sum_{t=2}^{n+1} a^*_{t-1} u_{t-1}^T(y) \right)^{-1} \left( \sum_{t=2}^{n+1} a^*_{t-1} y_t^2 \right),$$

which can be written as

$$\hat{\theta}_{n+1} - \hat{\theta}_n = K_{n+1} \left[ \sum_{t=2}^{n+1} a^*_{t-1} y_t^2 - K_{n+1}^{-1} \hat{\theta}_n \right], \qquad K_{n+1}^{-1} = \sum_{t=2}^{n+1} a^*_{t-1} u_{t-1}^T(y).$$

Using the relation $K_{n+1}^{-1} = K_n^{-1} + a^*_n u_n^T(y)$, we have

$$\hat{\theta}_{n+1} - \hat{\theta}_n = K_{n+1}\left[ \sum_{t=2}^{n+1} a^*_{t-1} y_t^2 - \left(K_n^{-1} + a^*_n u_n^T(y)\right)\hat{\theta}_n \right] = K_{n+1} a^*_n \left[ y_{n+1}^2 - u_n^T(y)\hat{\theta}_n \right].$$

It is now clear that

$$K_{n+1}^{-1} = K_n^{-1} + a^*_n u_n^T(y), \qquad (8)$$
$$\hat{\theta}_{n+1} = \hat{\theta}_n + K_{n+1} a^*_n \left[ y_{n+1}^2 - u_n^T(y)\hat{\theta}_n \right]. \qquad (9)$$

The algorithm in (9) gives the new estimate at time $n+1$ as the old estimate at time $n$ plus an adjustment. The adjustment is based on the prediction error $y_{n+1}^2 - E[y_{n+1}^2 \mid \mathcal{F}_n^y]$, since the term $u_n^T(y)\hat{\theta}_n = E[y_{n+1}^2 \mid \mathcal{F}_n^y]$ can be considered an estimated forecast of $y_{n+1}^2$ given $\mathcal{F}_n^y$. Given starting values $\theta_0$ and $K_0$, one can compute the estimate recursively using (8) and (9). This algorithm also extends the model reference adaptive system algorithm proposed by Aase (1983) to a multiparameter case. The algorithm may be interpreted in the Bayesian set-up by considering the state space form

$$y_t^2 = \theta_t^T u_{t-1}(y) + h_t, \qquad \theta_t^T = \theta^T,$$

and assuming that $h_t$ and $\theta^T$ are independently normally distributed; the algorithm obtained here is then the same as the nonlinear version of the Kalman filter. It should be noted that we have not made any distributional assumptions on $\{\varepsilon_t\}$ or on $\theta^T$ to obtain the recursive algorithms (8) and (9). If we solve the recursive relations (8) and (9) with initial values $\theta_0$ and $K_0$, we obtain the following 'off-line' expression for $\hat{\theta}_n$:

$$\hat{\theta}_n = \left( K_0^{-1} + \sum_{t=2}^{n} a^*_{t-1} u_{t-1}^T(y) \right)^{-1} \left( K_0^{-1}\theta_0 + \sum_{t=2}^{n} a^*_{t-1} y_t^2 \right).$$

As $K_0 \to \infty$,

$$\hat{\theta}_n \to \left( \sum_{t=2}^{n} a^*_{t-1} u_{t-1}^T(y) \right)^{-1} \left( \sum_{t=2}^{n} a^*_{t-1} y_t^2 \right), \qquad (10)$$

which is the same as the maximum likelihood estimate when $\{\varepsilon_t\}$ has a normal distribution.

The basic univariate ARCH model has been extended in a number of directions; some extensions are dictated by economic insight and others by (broadly) statistical ideas. The most important generalization of this class includes moving average terms and is called the generalized ARCH (GARCH) model. The simplest example of the GARCH class is GARCH(1,1), given by (Shephard, 1996)

$$y_t = \varepsilon_t \sigma_t, \qquad (11)$$
$$\sigma_t^2 = \theta_0 + \theta_1 y_{t-1}^2 + \beta_1 \sigma_{t-1}^2. \qquad (12)$$

This model can be written as a non-Gaussian linear ARMA model for $y_t^2$:

$$y_t^2 = \theta_0 + \theta_1 y_{t-1}^2 + \beta_1 \sigma_{t-1}^2 + v_t = \theta_0 + (\theta_1 + \beta_1) y_{t-1}^2 + v_t - \beta_1 v_{t-1}, \qquad (13)$$

where $v_t = \sigma_t^2(\varepsilon_t^2 - 1)$ is a martingale difference. Using the martingale estimating functions given in (7), one can easily obtain recursive estimates of the parameters. In practice, initial values of the parameters are obtained by first fitting an ARMA(1,1) model to the non-Gaussian process $y_t^2$. By writing the model in the following state space form for $y_t^2$,

$$y_t^2 = \theta_0 + (\theta_1 + \beta_1) y_{t-1}^2 + v_t - \beta_1 v_{t-1}, \qquad (14)$$
$$v_t = \beta_1 v_{t-1} + e_t, \qquad (15)$$

one can easily obtain recursive estimates of $\sigma_t^2$.
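To make the on-line character of (8)-(9) concrete, here is a minimal sketch for the ARCH(1) case. The weight a*_n is computed under the additional assumption of normal errors, so that E[h_t² | F_{t-1}] = 2σ_t⁴; the paper itself does not require this assumption, and all names and initializations are ours.

```python
import numpy as np

def recursive_arch(y, theta_init=(0.1, 0.1), K0_scale=100.0):
    """On-line estimates of (theta_0, theta_1) for ARCH(1) via (8)-(9).
    Weight a*_n assumes normal errors: E[h_t^2 | F_{t-1}] = 2*sigma_t^4."""
    theta = np.asarray(theta_init, dtype=float)
    K = K0_scale * np.eye(2)                 # diffuse start; cf. eq. (10)
    for t in range(1, len(y)):
        u = np.array([1.0, y[t - 1]**2])     # u_{t-1}(y)
        sigma2 = max(theta @ u, 1e-8)        # fitted conditional variance
        a = u / (2.0 * sigma2**2)            # optimal weight under normality
        K = np.linalg.inv(np.linalg.inv(K) + np.outer(a, u))   # eq. (8)
        theta = theta + K @ a * (y[t]**2 - u @ theta)          # eq. (9)
    return theta

# Check on simulated ARCH(1) data with theta = (0.5, 0.3).
rng = np.random.default_rng(2)
T = 5000
y = np.zeros(T)
for t in range(1, T):
    y[t] = np.sqrt(0.5 + 0.3 * y[t - 1]**2) * rng.normal()
print(recursive_arch(y))   # should lie near (0.5, 0.3)
```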
3.3. Filtering for unobserved ARCH or stochastic volatility models

Consider the class of unobserved ARCH models studied in Shephard (1996). These models are defined by allowing random additive perturbations of the coefficients of ordinary ARCH models. That is, we assume that the process $\{y_t\}$ is given by

$$y_t = \theta_t + \eta_t, \qquad (16)$$
$$\theta_t = \varepsilon_t \sigma_t, \qquad (17)$$
$$\sigma_t^2 = \alpha_0 + \alpha_1 \theta_{t-1}^2, \qquad (18)$$

where $\varepsilon_t$ and $\eta_t$ are mutually independent, normally distributed random variables with variances $1$ and $\sigma_\eta^2$ respectively. Unlike the other ARCH-type models outlined above, for the model (16) it is not easy to evaluate $E(y_t \mid \mathcal{F}_{t-1}^y)$ analytically, because $\theta_{t-1}$ is not $\mathcal{F}_{t-1}^y$-measurable. Hence it is convenient to think of these models as parameter-driven and to classify them as stochastic volatility models. A filtered estimate of $\theta_t$ can be obtained by using the unconditional distribution of $\theta_t$ as $N(0, \alpha_0 + \alpha_1 \theta_{t-1}^2)$ together with the theory of combined estimating functions given in Thavaneswaran and Heyde (1999). Section 4 considers a special case with censored data.

4. CENSORED DATA

Suppose that $y_t$ satisfies an AR(1)-type equation

$$y_t = \theta y_{t-1} + \varepsilon_t,$$

where the $\varepsilon_t$ are i.i.d. (independent and identically distributed) Gaussian variates with mean zero and variance $\sigma^2$. Suppose that we observe right-censored observations, i.e. we observe $y_t$ itself when $y_t \le R_t$ and the value $R_t$ when $y_t > R_t$, where the $R_t$ are known censoring levels (or limits). If the $i$th observation is censored, then we have

$$E_\theta(y_i \mid y_i > R_i, y_{i-1}, \ldots) = \theta y_{i-1} + E_\theta(\varepsilon_i \mid \theta y_{i-1} + \varepsilon_i > R_i).$$

The conditional mean and variance of $y_i$ are given in Theorem 1 below.

Theorem 1. (i) The conditional mean is given by

$$E_\theta(y_i \mid y_i > R_i, y_{i-1}, \ldots) = \theta y_{i-1} + \frac{\phi(\theta y_{i-1} - R_i)}{\Phi(\theta y_{i-1} - R_i)};$$

(ii) the conditional variance is given by

$$\operatorname{Var}(y_i \mid y_i > R_i, y_{i-1}, \ldots) = \sigma^2\left( 1 - (\theta y_{i-1} - R_i)\, h(\theta y_{i-1} - R_i) - h^2(\theta y_{i-1} - R_i) \right).$$

The proof follows from Lemma 2. Note that the conditional moments of the censored observations are nonlinear functions of the observations; hence the usual Kalman filtering algorithm is not applicable.

4.1. Example

We now give an interesting biostatistical time series example of prediction with censored autocorrelated data. Censored observations arise naturally in time series when there is an upper or lower limit of detection, e.g. when one is monitoring levels of an airborne contaminant or recording daily bioassays of hormone levels of patients. Regression analysis with autoregressive errors when some observations are censored was studied in Zeger and Brookmeyer (1986). Let $y_t$ satisfy

$$y_t = \theta y_{t-1} + \varepsilon_t,$$

where the $\varepsilon_t$ are i.i.d. Gaussian variates with mean zero and variance $\sigma^2$. We observe possibly censored data, i.e. we observe $y_t$ itself when $y_t \ge L_t$ and the value $L_t$ when $y_t < L_t$, where the $L_t$ are known limits of detection. If the $i$th observation is censored, then

$$E_\theta(y_t \mid y_t < L_t, y_{t-1}, \ldots) = \theta y_{t-1} + E_\theta(\varepsilon_t \mid \theta y_{t-1} + \varepsilon_t < L_t).$$

It is not difficult to show that

$$E_\theta(y_t \mid y_t < L_t, y_{t-1}, \ldots) = \theta y_{t-1} - \frac{\phi(\theta y_{t-1} - L_t)}{1 - \Phi(\theta y_{t-1} - L_t)}.$$

It is of interest to note that, even if the error distribution is non-normal, the posterior mean (or mean square predictor) based on censored observations is a nonlinear function of the observations. This shows that, even for linear models with censored observations, nonlinear predictors are more appropriate.
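A small sketch of the Theorem 1 predictors, following the paper's convention of writing the argument of φ, Φ, and h without the scale σ (i.e., implicitly σ = 1); both the right-censoring case and the detection-limit case of the example are shown. The function names are ours.

```python
import numpy as np
from scipy.stats import norm

def h(x):
    """h(x) = phi(x) / Phi(x), as in Lemma 2."""
    return norm.pdf(x) / norm.cdf(x)

def censored_ar1_moments(theta, y_prev, R, sigma=1.0):
    """Theorem 1: conditional mean/variance of y_i given y_i > R_i."""
    z = theta * y_prev - R
    mean = theta * y_prev + h(z)
    var = sigma**2 * (1 - z * h(z) - h(z)**2)
    return mean, var

def detection_limit_mean(theta, y_prev, L):
    """Section 4.1: conditional mean of y_t given y_t < L_t."""
    z = theta * y_prev - L
    return theta * y_prev - norm.pdf(z) / (1 - norm.cdf(z))

print(censored_ar1_moments(theta=0.7, y_prev=2.0, R=1.0))
print(detection_limit_mean(theta=0.7, y_prev=0.5, L=1.0))
```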
APPENDIX

Proof of Lemma 2. We first prove (i); statements (ii)-(iv) can be proved similarly. We have

$$EY = \int_0^\infty y \,\frac{1}{\sigma}\phi\!\left(\frac{y-\mu}{\sigma}\right) dy = \sigma\int_{-\mu/\sigma}^{\infty} t\,\phi(t)\,dt + \mu\int_{-\mu/\sigma}^{\infty} \phi(t)\,dt = \sigma\phi\!\left(\frac{\mu}{\sigma}\right) + \mu\Phi\!\left(\frac{\mu}{\sigma}\right) = \sigma\Psi\!\left(\frac{\mu}{\sigma}\right).$$

The proof of (ii) follows from (i) and the fact that, for any event $B$ with $P(B) > 0$, if $E(Y)$ exists then so does $E[Y \mid B]$, and $E[Y \mid B] = E[Y I_B]/P(B)$. The proof of (iii) follows by observing that

$$EY^2 = \int_0^\infty y^2 \,\frac{1}{\sigma}\phi\!\left(\frac{y-\mu}{\sigma}\right) dy,$$

and, using the fact that $d\phi(t) = -t\phi(t)\,dt$,

$$EY^2 = \sigma^2\left[ \Phi\!\left(\frac{\mu}{\sigma}\right) + \frac{\mu}{\sigma}\Psi\!\left(\frac{\mu}{\sigma}\right) \right].$$

The proof of (iv) is similar to that of (ii).

Proof of Lemma 3. It is useful to introduce the conditional expectation

$$E(\Theta^* \mid Y^*) = \mu_\theta + \rho\frac{\sigma_\theta}{\sigma_y}(Y^* - \mu_y).$$

Then we can write

$$E(\Theta) = P(Y^* > 0)\,E(\Theta \mid Y^* > 0) = P(Y^* > 0)\,E(\Theta^* \mid Y^* > 0) = P(Y^* > 0)\left[ E\big(E(\Theta^* \mid Y^*) \mid Y^* > 0\big) + E\big(\Theta^* - E(\Theta^* \mid Y^*) \mid Y^* > 0\big) \right].$$

Using the fact that $\Theta^* - E(\Theta^* \mid Y^*)$ and $Y^*$ are independent, we have

$$E\big(\Theta^* - E(\Theta^* \mid Y^*) \mid Y^* > 0\big) = E\big(\Theta^* - E(\Theta^* \mid Y^*)\big) = E\Theta^* - E\big(E(\Theta^* \mid Y^*)\big) = 0.$$

Hence

$$E\Theta = P(Y^* > 0)\,E\big(E(\Theta^* \mid Y^*) \mid Y^* > 0\big) = P(Y^* > 0)\left[ \mu_\theta - \rho\frac{\sigma_\theta}{\sigma_y}\mu_y + \rho\frac{\sigma_\theta}{\sigma_y}E(Y^* \mid Y^* > 0) \right] = \Phi\!\left(\frac{\mu_y}{\sigma_y}\right)\left[ \mu_\theta + \rho\sigma_\theta h\!\left(\frac{\mu_y}{\sigma_y}\right) \right] = \mu_\theta\Phi\!\left(\frac{\mu_y}{\sigma_y}\right) + \rho\sigma_\theta\phi\!\left(\frac{\mu_y}{\sigma_y}\right),$$

and the first result follows. Similarly,

$$E[\Theta \mid Y^* > 0] = \frac{E[\Theta]}{\Phi(\mu_y/\sigma_y)} = \mu_\theta + \rho\sigma_\theta h\!\left(\frac{\mu_y}{\sigma_y}\right).$$

Proof of Lemma 4. We have

$$\theta_{t+1} - E[\theta_{t+1} \mid \mathcal{F}_t^y] = a\theta_t + b\varepsilon_{t+1} - E[a\theta_t + b\varepsilon_{t+1} \mid \mathcal{F}_t^y] = a(\theta_t - \hat{\theta}_t) + b\varepsilon_{t+1},$$
$$\nu_{t+1} = y_{t+1} - E[y_{t+1} \mid \mathcal{F}_t^y] = A(\theta_t - \hat{\theta}_t) + Be_{t+1}.$$

Now

$$\operatorname{Var}\big(\theta_{t+1} - E[\theta_{t+1} \mid \mathcal{F}_t^y]\big) = a^2 c_t + b^2\sigma_\varepsilon^2,$$
$$\operatorname{Cov}\big(y_{t+1} - E[y_{t+1} \mid \mathcal{F}_t^y],\; \theta_{t+1} - E[\theta_{t+1} \mid \mathcal{F}_t^y]\big) = aAc_t,$$
$$\operatorname{Var}\big(y_{t+1} - E[y_{t+1} \mid \mathcal{F}_t^y]\big) = A^2 c_t + B^2\sigma_e^2.$$

The recursive estimates are then given by the theorem on normal correlation:

$$\hat{\theta}_{t+1} = E[\theta_{t+1} \mid \mathcal{F}_{t+1}^y] = E[\theta_{t+1} \mid \mathcal{F}_t^y] + \frac{aAc_t}{A^2 c_t + B^2\sigma_e^2}\big[y_{t+1} - E(y_{t+1} \mid \mathcal{F}_t^y)\big] = a\hat{\theta}_t + \frac{aAc_t}{A^2 c_t + B^2\sigma_e^2}\big[y_{t+1} - A\hat{\theta}_t\big],$$

$$c_{t+1} = \operatorname{Var}(\theta_{t+1} - \hat{\theta}_{t+1} \mid \mathcal{F}_{t+1}^y) = a^2 c_t + b^2\sigma_\varepsilon^2 - \frac{a^2 A^2 c_t^2}{A^2 c_t + B^2\sigma_e^2}.$$

ACKNOWLEDGEMENTS

The second author acknowledges NSERC support for this work. The paper was completed while the second author was visiting from the Department of Statistics, University of Manitoba, Canada. He also thanks the Statistics research group, School of Mathematics and Statistics, University of Sydney, for support during his visit. We thank the referee and Professor M. B. Priestley for their valuable comments and useful suggestions to improve the quality and readability of the paper.

REFERENCES

Aase, K. K. (1983) Recursive estimation in non-linear time series models of autoregressive type. J. R. Stat. Soc. B 45, 228–37.
Abraham, B., Thavaneswaran, A. and Peiris, S. (1998) On the prediction for nonlinear time series models using estimating functions. In IMS Selected Proceedings of the Symposium on Estimating Functions (eds I. V. Basawa and R. L. Taylor), 32, 259–68.
Fahrmeir, L. (1994) Dynamic modelling and penalized likelihood estimation for discrete time survival data. Biometrika 81, 317–30.
Fahrmeir, L. and Tutz, G. (1997) Multivariate Statistical Modelling Based on Generalized Linear Models. New York: Springer-Verlag.
Godambe, V. P. (1999) Linear Bayes and optimal estimation. Ann. Inst. Statist. Math. 51, 201–15.
Shephard, N. (1996) Statistical aspects of ARCH and stochastic volatility. In Time Series Models in Econometrics, Finance and Other Fields (eds D. R. Cox, D. V. Hinkley and O. E. Barndorff-Nielsen). London: Chapman and Hall, 1–55.
Thavaneswaran, A. and Abraham, B. (1988) Estimation of nonlinear time series models using estimating equations. J. Time Series Anal. 9, 99–108.
Thavaneswaran, A. and Heyde, C. C. (1999) Prediction via estimating functions. J. Statist. Planning and Inf. 77, 89–101.
Zeger, S. L. and Brookmeyer, R. (1986) Regression analysis with censored autocorrelated data. J. Amer. Statist. Assoc. 81, 722–9.
LSTM Combination Prediction Model and Its Application
SOFTWARE, Vol. 41, No. 11, 2020.

0 Introduction

Time series data [1] are sequences of data points arranged in the order in which they occur in time. The interval between observations is usually constant, for example one minute or one hour. By analyzing historical time series, future values can be predicted over long or short horizons with reasonable accuracy, which in turn creates greater benefits for production practice. References [2]-[4] use a single prediction model to forecast traffic flow and similar quantities; although a degree of predictive accuracy is achieved, the results are not ideal, and such single models are also limited.

Funding: National Key R&D Program of China (No. 2018YFC0808306); Hebei Provincial Key R&D Program (19270318D); Hebei Province Engineering Technology Research Center for IoT Monitoring (No. 3142018055); Qinghai Provincial Key Laboratory of IoT (No. 2017-ZJ-Y21). About the first author: ZHU Honggen (1996-), male, from Xiushui, Jiangxi; master's student; research interests: data mining and data validity analysis. Corresponding author: TIAN Liqin (1970-), male, from Dingbian, Shaanxi; Ph.D., professor; research interests: stochastic Petri nets, remote IoT information monitoring, network security evaluation and user behavior authentication, and network performance evaluation and optimization.
LSTM Combination Prediction Model and Its Application. ZHU Honggen (1), TIAN Liqin (1, 2), WEI Jun (3), CHEN Nan (1) (1. School of Computer Science, North China Institute of Science and Technology, Beijing 101601, China; 2. School of Computer Science, Qinghai Normal University, Xining, Qinghai 810000, China; 3. Hebei Environmental Monitoring Center, Shijiazhuang, Hebei 050000, China)

Abstract: In time series prediction, a traditional single model cannot effectively reduce the influence of the random errors in the data on its predictions, which leads to low prediction accuracy. To address this problem, a long short-term memory (LSTM) combination model prediction method is proposed. First, an LSTM model is trained on the training set, and the residual series of the prediction target on the training set is computed; then the residual series and the training data are used to train a second LSTM model, yielding the final prediction model, denoted LSTM-LSTM. Simulation results show that the prediction error of LSTM-LSTM is lower than that of a single LSTM model and of other combination prediction models.
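A minimal sketch of the LSTM-LSTM idea described above, assuming a Keras-style API and univariate sliding windows; the layer sizes, epochs, synthetic data, and the exact way the residuals enter the second stage are our assumptions, not the paper's specification.

```python
import numpy as np
from tensorflow import keras

def make_windows(series, lag):
    X = np.array([series[i:i + lag] for i in range(len(series) - lag)])
    return X[..., None], series[lag:]        # (samples, lag, 1) inputs

def build_lstm(lag):
    model = keras.Sequential([keras.Input(shape=(lag, 1)),
                              keras.layers.LSTM(32),
                              keras.layers.Dense(1)])
    model.compile(optimizer="adam", loss="mse")
    return model

# Synthetic stand-in for the training series.
train_series = np.sin(np.arange(300) / 10.0) + 0.1 * np.random.randn(300)
lag = 12
X, y = make_windows(train_series, lag)

stage1 = build_lstm(lag)                     # first LSTM: predicts the target
stage1.fit(X, y, epochs=50, verbose=0)
residuals = y - stage1.predict(X, verbose=0).ravel()

stage2 = build_lstm(lag)                     # second LSTM: predicts the residual
stage2.fit(X, residuals, epochs=50, verbose=0)

# LSTM-LSTM forecast = stage-1 prediction + predicted residual.
combined = stage1.predict(X, verbose=0).ravel() + stage2.predict(X, verbose=0).ravel()
```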
HIERARCHICAL TIME-SERIES PREDICTION METHOD
Patent title: HIERARCHICAL TIME-SERIES PREDICTION METHOD. Inventors: Davide Burba, Trista Pei-Chun Chen. Application No.: US17132133, filed 2020-12-23. Publication No.: US20220156555A1, published 2022-05-19. Applicants: Inventec (Pudong) Technology Corporation, Shanghai, CN; Inventec Corporation, Taipei, TW.

Abstract: A hierarchical time-series prediction method is adapted to a plurality of reconciled predictions of a plurality of nodes of a hierarchical structure. The plurality of nodes have a plurality of time series respectively, the plurality of reconciled predictions correspond to the plurality of time series, and the plurality of nodes comprises a plurality of bottom nodes. The hierarchical time-series prediction method comprises: generating a plurality of individual predictions corresponding to the plurality of time series respectively by a plurality of predictive models; generating a plurality of bottom-level predictions corresponding to the plurality of bottom nodes according to the plurality of individual predictions and an encoder network; and generating the plurality of reconciled predictions according to the plurality of bottom-level predictions and a decoder associated with the hierarchical structure.
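The patented encoder-decoder reconciliation is not reproduced here, but the coherence requirement it enforces is the standard one in hierarchical forecasting: predictions at parent nodes must equal the sum of their children. A minimal bottom-up sketch with a summing matrix (our simplification, not the patented method):

```python
import numpy as np

# Toy hierarchy: total = A + B, with A = A1 + A2; bottom nodes are (A1, A2, B).
# Rows of S: total, A, B, A1, A2. S maps bottom-level forecasts to all nodes.
S = np.array([[1, 1, 1],
              [1, 1, 0],
              [0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])

bottom_forecasts = np.array([10.0, 12.0, 7.0])   # per-bottom-node model outputs
reconciled = S @ bottom_forecasts                # coherent forecasts, all nodes
print(reconciled)                                # [29. 22. 7. 10. 12.]
```

In the patent, the bottom-level values fed into this aggregation step are produced by an encoder network from all the individual predictions, with the decoder playing roughly the role of S here.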
Short-term Airport Flight Flow Prediction Based on a Prophet and LightGBM Meteorological-Feature Optimization Model
Software Guide, Vol. 22, No. 8, Aug. 2023. Short-term Airport Flight Flow Prediction Based on a Prophet and LightGBM Meteorological-Feature Optimization Model. LI Qing, LUO Jia (Faculty of Big Data and Information Engineering, Guiyang Institute of Humanities and Technology, Guiyang 550025, China)

Abstract: With rapid economic development, flight flow at regional hub airports keeps increasing. When short-term flow becomes too large, congestion easily arises, and control departments respond with flow-control measures, causing large numbers of flight delays and increasing controller workload. To address these problems, a high-precision Prophet-based arrival and departure flow prediction model is proposed. On the time axis, the model fully accounts for seasonality, trend, and holiday effects, fitting each with a different function, and it predicts well on strongly periodic datasets. High-precision takeoff and landing message statistics from Shuangliu Airport between 2019-07-27 and 2019-09-25 were selected; after data cleaning and Z-score standardization, an arrival and departure flow dataset was constructed. Following the shape of the actual flow curve, fixed daily time windows were chosen as changepoints to adjust the Prophet model, and parameters were tuned visually against the plots. Compared with ARIMA- and LSTM-based models, the proposed model improves on the closest competitor, LSTM, by 17.9% in MAE, 52.0% in RMSE, and 50.82% in MAPE. Meanwhile, to overcome the limitations of a single feature in flow prediction, multiple meteorological features were introduced and transformed according to actual flight conditions; the LightGBM algorithm was then used to correct the Prophet predictions. The final results further improve on the original Prophet and correctly reflect the influence of weather on flight flow.
Keywords: Prophet; LightGBM; METAR; short-term airport flight flow prediction; time series. DOI: 10.11907/rjdk.231305. Open Science Identifier (OSID). CLC number: TP391; Document code: A; Article ID: 1672-7800(2023)008-0111-06.

Short-term Airport Flight Flow Prediction Based on Prophet and LightGBM's METAR Features Optimization Model. LI Qing, LUO Jia (Faculty of Big Data and Information Engineering, Guiyang Institute of Humanities and Technology, Guiyang 550025, China). Abstract: With the rapid development of the economy, the flight flow of regional hub airports is increasing. If the short-term flow is too large, congestion is likely to occur, and the control department will take flight flow control measures to resolve the congestion, resulting in a large number of flight delays and an increased controller workload. To address these issues, a high-precision arrival and departure flow prediction model based on Prophet is proposed. This model fully considers seasonality, trend, and holiday effects in the time series, adopts different fitting functions for each, and has good predictive performance on datasets with strong periodicity. High-precision takeoff and landing message statistics between 2019-07-27 and 2019-09-25 from Shuangliu Airport were selected, and the arrival and departure flow dataset was constructed after data cleaning and Z-SCORE standardization. According to the characteristics of the actual traffic curve, fixed daily time periods were selected as changepoints to adjust the Prophet prediction model, and visual parameter tuning was performed against the plots. The results were compared with models based on ARIMA and LSTM: the proposed model improves on the closest-performing LSTM algorithm by 17.9% in MAE, 52.0% in RMSE, and 50.82% in MAPE. At the same time, to address the limitations of a single feature in flow prediction, multiple meteorological feature data were introduced and transformed based on actual flight conditions; combined with the LightGBM algorithm, the Prophet prediction results were optimized and corrected. The final result further improves on the original Prophet and correctly reflects the impact of weather on flight flow. Key words: Prophet; LightGBM; METAR; short-term airport flow prediction; time series.

Received: 2023-03-25. Funding: Natural Science Foundation Project of the Department of Education of Guizhou Province (KY-2021112). About the authors: LI Qing (1993-), male, master's degree, lecturer, Faculty of Big Data and Information Engineering, Guiyang Institute of Humanities and Technology; research interests: information extraction and multimodal models. LUO Jia (1988-), female, master's degree, associate professor, Faculty of Big Data and Information Engineering, Guiyang Institute of Humanities and Technology; research interests: time series prediction and network performance optimization.
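A compressed sketch of the two-stage pipeline described in the abstract, assuming the `prophet` and `lightgbm` Python packages; the synthetic data, column names, and hyper-parameters are placeholders, not the paper's configuration (the paper additionally places changepoints at fixed daily time windows and tunes them visually).

```python
import numpy as np
import pandas as pd
from prophet import Prophet
from lightgbm import LGBMRegressor

# Synthetic stand-ins: ds/y is the flow series, wx holds weather features.
ds = pd.date_range("2019-07-27", periods=800, freq="h")
y = 20 + 10 * np.sin(2 * np.pi * ds.hour / 24) + np.random.randn(len(ds))
df = pd.DataFrame({"ds": ds, "y": y})
wx = pd.DataFrame({"wind": np.random.rand(len(ds)),
                   "visibility": np.random.rand(len(ds))})

m = Prophet(daily_seasonality=True, weekly_seasonality=True,
            changepoint_prior_scale=0.5)       # stand-in for visual tuning
m.fit(df)
base = m.predict(df[["ds"]])["yhat"].to_numpy()

resid = df["y"].to_numpy() - base              # Prophet residuals
corrector = LGBMRegressor(n_estimators=300, learning_rate=0.05)
corrector.fit(wx, resid)                       # weather explains the residual

final = base + corrector.predict(wx)           # weather-corrected forecast
```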
Time Series Prediction Based on an ARIMA and LSTM Hybrid Model
Computer Applications and Software, Vol. 38, No. 2, Feb. 2021. Time Series Prediction Based on an ARIMA and LSTM Hybrid Model. WANG Yingwei, MA Shucai (Institute of Economics, Liaoning University, Shenyang 110036, China). Received 2019-07-24. WANG Yingwei, Ph.D.; main research areas: big data and artificial intelligence. MA Shucai, professor.
Abstract: Because real-world time series usually have both linear and nonlinear characteristics, the traditional ARIMA model often shows limitations in time series modeling. To address this, a hybrid model based on ARIMA and LSTM is proposed for time series prediction. A linear ARIMA model is first applied to predict the time series; a support vector regression (SVR) model then predicts the error series; a deep LSTM model combines the predictions of the ARIMA and SVR models; and a Bayesian optimization algorithm is used to select the hyper-parameters of the deep LSTM model. Experimental results show that, compared with other hybrid models, this model effectively improves prediction accuracy on five different time series.
Keywords: ARIMA model; SVR model; deep LSTM model; Bayesian optimization algorithm; time series prediction. CLC number: TP302.7; Document code: A; DOI: 10.3969/j.issn.1000-386x.2021.02.047.

TIME SERIES FORECASTING BASED ON ARIMA_DLSTM HYBRID MODEL. Wang Yingwei, Ma Shucai (Institute of Economics, Liaoning University, Shenyang 110036, Liaoning, China). Abstract: Because real-world time series usually contain both linear and nonlinear patterns, the traditional ARIMA model has limited performance in time series modeling. In view of this, we propose the ARIMA_DLSTM hybrid model for time series forecasting. A linear ARIMA model was used for time series prediction first, and then support vector regression (SVR) was used for error series prediction. The deep LSTM model was introduced to combine the forecasts of the ARIMA model and the SVR model, and a Bayesian optimization algorithm was adopted to obtain the optimal hyper-parameters of the deep LSTM model. The experimental results on five time series forecasting tasks show that the ARIMA_DLSTM model can effectively improve prediction accuracy compared with other hybrid models. Keywords: ARIMA model; SVR model; deep LSTM model; Bayesian optimization algorithm; time series forecasting.

0 Introduction

Time series forecasting is widely applied in many fields, such as finance, economics, engineering, and aviation, and has become an important research topic in machine learning [1].
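A sketch of the hybrid structure, assuming `statsmodels` and `scikit-learn`; the deep-LSTM combiner and the Bayesian hyper-parameter optimization are omitted and replaced by a simple additive combination, so this illustrates the ARIMA-plus-error-model skeleton only. Data, orders, and lag choices are ours.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from sklearn.svm import SVR

rng = np.random.default_rng(3)
series = 50 + np.cumsum(rng.normal(0, 1, 300))   # synthetic stand-in series
n_test = 20
train, test = series[:-n_test], series[-n_test:]

# Stage 1: linear component via ARIMA.
arima = ARIMA(train, order=(1, 1, 1)).fit()
linear_fc = arima.forecast(steps=n_test)

# Stage 2: SVR predicts the in-sample error series from its own lags.
resid = train[1:] - arima.predict(start=1, end=len(train) - 1)
lag = 5
X = np.array([resid[i:i + lag] for i in range(len(resid) - lag)])
svr = SVR(kernel="rbf").fit(X, resid[lag:])

window = list(resid[-lag:])                      # iterate one-step error forecasts
err_fc = []
for _ in range(n_test):
    e = svr.predict(np.array(window[-lag:])[None, :])[0]
    err_fc.append(e)
    window.append(e)

hybrid_fc = linear_fc + np.array(err_fc)         # additive stand-in for the LSTM combiner
```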
Early Warning of Illegal Tobacco Sales Based on Time Series Prediction and Anomaly Detection
Journal of Guizhou Normal University (Natural Sciences), Vol. 41, No. 3, May 2023. Citation: XIAO X, FENG P C, LIU L N, et al. Early warning of illegal sales of tobacco based on time series prediction and anomaly detection [J]. Journal of Guizhou Normal University (Natural Sciences), 2023, 41(3): 119-124.

Early Warning of Illegal Tobacco Sales Based on Time Series Prediction and Anomaly Detection. XIAO Xiao, FENG Pengcheng, LIU Luni, ZHANG Gaohao, JIANG Jingjing, XIE Gang (corresponding author), YOU Ziyi, LENG Jibing (1. Monopoly Management and Supervision Department, Guiyang Branch of Guizhou Tobacco Company, Guiyang, Guizhou 550002, China; 2. School of Big Data and Computer Science, Guizhou Normal University, Guiyang, Guizhou 550025, China; 3. School of Physics and Electronic Science, Guizhou Normal University, Guiyang, Guizhou 550025, China)

Abstract: To improve the supervision of the tobacco market, a hybrid prediction model based on a deep autoregression network (DARN) and a seasonal autoregressive integrated moving average (SARIMA) model is constructed, using collaborative research with a tobacco monopoly administration and its historical sales data. Anomaly detection is then performed on the basis of the predicted sales volume, and an early-warning model for illegal sales by tobacco merchants is designed. Experiments show that the hybrid prediction model improves prediction error over either single model, and the early-warning model achieves a 50% verification rate on the test set, meeting the basic requirements of market-supervision early warning.

Keywords: time series prediction; anomaly detection; tobacco industry; sales early warning. CLC number: TS4-06; Document code: A; Article ID: 1004-5570(2023)03-0119-06; DOI: 10.16614/j.gznuj.zrb.2023.03.016.

English abstract (as published): In order to improve the level of tobacco market regulation, with the collaborative research and historical sales data of a municipal Tobacco Monopoly Administration, a mixed prediction model based on the deep autoregression network (DARN) and the seasonal autoregressive integrated moving average (SARIMA) models is constructed. Then, based on the forecast sales volume, abnormal detection is carried out to design a pre-warning model for illegal sales behaviors of tobacco merchants. The experimental results show that the prediction error of the mixed model is better than that of the single model. The early-warning model achieves a 50% verification rate on the test set, which meets the basic requirements of early warning in market supervision. Keywords: time series prediction; anomaly detection; the tobacco industry; sales early warning.

Received: 2022-05-23. Funding: Science and Technology Project of Guiyang Branch of Guizhou Tobacco Company (Qian Yan Zhu Ke [2020] No. 3). Corresponding author: XIE Gang (1980-), male, Ph.D., professor; research interest: data mining; E-mail: 48263091@qq.com.

0 Introduction

The tobacco industry has long been an important source of tax revenue in China.
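A minimal sketch of the forecast-then-flag idea, assuming `statsmodels`; the DARN component and the paper's threshold tuning are not reproduced. Here a period is flagged when observed sales deviate from the SARIMA forecast by more than k forecast standard errors; the orders, seasonal period, and data are our placeholders.

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

def flag_anomalies(history, actual, order=(1, 1, 1),
                   seasonal_order=(1, 0, 1, 52), k=3.0):
    """Flag periods whose observed sales deviate from the forecast by > k se."""
    fit = SARIMAX(history, order=order, seasonal_order=seasonal_order).fit(disp=False)
    fc = fit.get_forecast(steps=len(actual))
    return np.abs(actual - fc.predicted_mean) > k * fc.se_mean

# Toy weekly sales series with one injected spike in the evaluation window.
rng = np.random.default_rng(4)
weeks = np.arange(208)
sales = 100 + 10 * np.sin(2 * np.pi * weeks / 52) + rng.normal(0, 2, len(weeks))
history, actual = sales[:-8], sales[-8:].copy()
actual[3] *= 1.5
print(flag_anomalies(history, actual))
```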
A Brief Discussion of the Double Exponential Smoothing Prediction Model

When making exponential smoothing predictions, the choice of the smoothing coefficient α is important. If the time series fluctuates little, α should be taken small, in the range 0.1-0.5; if the series fluctuates strongly, α should be taken larger, for example between 0.6 and 0.8.

2 The double exponential smoothing prediction model

When a time series moves along a straight-line trend, prediction by single exponential smoothing shows a clear lag bias, whereas double exponential smoothing, by building a linear prediction model, can largely eliminate this lag bias. Using the exponentially smoothed series, the parameters $a_t$ and $b_t$ are computed as

$$a_t = 2S_t^{(1)} - S_t^{(2)}, \qquad b_t = \frac{\alpha}{1-\alpha}\left(S_t^{(1)} - S_t^{(2)}\right), \qquad (1)$$

and the prediction model is constructed as

$$\hat{x}_{t+m} = a_t + b_t m, \qquad m = 1, 2, \ldots, \qquad (2)$$

where $\hat{x}_{t+m}$ is the predicted value for period $t+m$. The smoothed values for a nine-year example series are:

Year t    1      2      3      4      5      6      7      8      9
S_t^(1)   11.14  11.12  12.48  15.11  17.13  19.50  21.17  24.29  27.81
S_t^(2)   11.59  11.41  11.83  12.09  14.11  15.31  17.89  19.45  22.79
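A sketch of the method in code. The smoothing recursions $S_t^{(1)} = \alpha x_t + (1-\alpha)S_{t-1}^{(1)}$ and $S_t^{(2)} = \alpha S_t^{(1)} + (1-\alpha)S_{t-1}^{(2)}$ are the standard Brown definitions, which the excerpt assumes but does not restate; the example series is illustrative, not the data behind the table.

```python
def brown_forecast(x, alpha, m=1):
    """Double exponential smoothing forecast a_t + b_t*m (Brown's method)."""
    s1 = s2 = x[0]                            # common initialization S1_0 = S2_0 = x_0
    for xt in x[1:]:
        s1 = alpha * xt + (1 - alpha) * s1    # first smoothing (assumed recursion)
        s2 = alpha * s1 + (1 - alpha) * s2    # second smoothing (assumed recursion)
    a = 2 * s1 - s2                           # eq. (1)
    b = alpha / (1 - alpha) * (s1 - s2)       # eq. (1)
    return a + b * m                          # eq. (2)

series = [11.0, 11.5, 12.6, 15.3, 17.4, 19.7, 21.5, 24.6, 28.1]  # illustrative only
print(brown_forecast(series, alpha=0.6, m=1))
```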
R. N. Yadav et al., Applied Soft Computing 7 (2007) 1157-1163.

1. Introduction
Time series prediction and system identification are key problems of function approximation. Various neural network architectures and learning methodologies have been used in the literature for time series prediction. In this paper, we use a single multiplicative neuron for time series prediction. An artificial neuron is a mathematical model of the biological neuron and approximates its functional capabilities. The major issue in artificial neuron models is the description of single neuron computation and of how the neuron aggregates the input signals applied to it. The McCulloch-Pitts model initiated the use of summing units as the neuron model, while neglecting all possible nonlinear capabilities of the single neuron and the role of dendrites in information processing in the neural system. In the biological neuron, dendrites conduct the synaptic or receptor potentials to the site of impulse initiation, and axons convert these signals into trains of impulses popularly known as spikes [3]. However, these are simplified models, and many instances of dendritic processing have been observed which serve as a principal substrate for information processing within the neuron itself. Dendrites provide a back-propagation medium from the neuron to itself, and the interfaces between axons and dendrites are remodeled throughout life, explaining the formation of long-term memory [4-6]. Multiplication in neurons often occurs in dendritic trees with voltage-dependent membrane conductances [7]. In [8], the authors provide a linear-subspace-based approach to modeling the computational abilities of a neuron. Linearity is believed to be sufficient for capturing the passive, or cable, properties of the dendritic membrane, where synaptic inputs are currents that add. However, synaptic inputs can interact nonlinearly when the synapses are co-localized on patches of dendritic membrane with specific properties. The spatial grouping of the synapses on the dendritic tree is reflected in the computations performed at local branches. An artificial neuron model should therefore be capable of including this inherent nonlinearity in its mode of aggregation. Multiplication, being the most basic of all nonlinearities, has been a natural choice for models trying to include nonlinearity in the artificial neuron. In [1], the authors discuss the relevance of multiplicative operations as a computationally powerful and biologically realistic possibility, for example in synthesizing high-dimensional Gaussian radial basis functions from low-dimensional ones; the role of multiplication is also explained in the computation underlying motion perception and learning, from pairs of individual synapses up to small sets of neurons. The nonlinear capability of a neuron is usually modeled through a stationary nonlinearity introduced after the aggregation; this, however, is not sufficient to capture the possible nonlinear associations among the inputs to a single-neuron system. It has further been proved that Weierstrass's theorem ensures that a network composed of one input layer and one hidden layer of product units can represent any continuous function on a finite interval [2]. A detailed description of the learning methodology in multiplicative neural networks has been provided by the authors in [23,24]. However, with an increasing number of terms in the higher-order expression for the polynomial neuron, it is exceedingly difficult to train a network of such neurons. We consider a simpler model for the polynomial neuron, with a well-defined training procedure based on standard back-propagation: the proposed neuron takes a product of linear terms in each dimension of the input space. Section 2 describes single neuron systems and the related literature. The description of the proposed multiplicative neuron, with its capacity and learning rules, is provided in Section 3. Section 4 discusses applications of the proposed model to time series prediction problems. Section 5 provides the concluding remarks of the paper.
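A minimal sketch of the single multiplicative neuron just described: a product of linear terms in each input dimension, passed through a logistic nonlinearity and trained by gradient descent on squared error. The learning rate, initialization, and toy task are our choices, not the paper's experimental setup.

```python
import numpy as np

def train_multiplicative_neuron(X, d, lr=0.05, epochs=3000, seed=0):
    """Single multiplicative neuron: y = sigmoid(prod_i (w_i*x_i + b_i)),
    trained by gradient descent on squared error."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    w = rng.normal(scale=0.5, size=n)
    b = rng.normal(scale=0.5, size=n) + 1.0   # offset keeps terms away from zero
    for _ in range(epochs):
        for x, target in zip(X, d):
            terms = w * x + b                 # one linear term per input dimension
            net = np.prod(terms)
            out = 1.0 / (1.0 + np.exp(-net))  # logistic output unit
            delta = (out - target) * out * (1.0 - out)
            grad = net / terms                # d(net)/d(term_i); terms assumed nonzero
            w -= lr * delta * grad * x
            b -= lr * delta * grad
    return w, b

# Toy usage: fit a noisy AND-like target with outputs in (0, 1).
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
d = np.array([0.1, 0.1, 0.1, 0.9])
w, b = train_multiplicative_neuron(X, d)
```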