Journal of Nanjing University (Natural Science), 2023, Vol. 59, Issue 3: 363–372. doi: 10.13232/j.cnki.jnju.2023.03.001


Time series prediction with three-way residual error amendment

Yu Fang1,2, Chunhong Jia1, Siqi Wu1, Fan Min1,2

  1. School of Computer Science, Southwest Petroleum University, Chengdu, 610500, China
  2. Lab of Machine Learning, Southwest Petroleum University, Chengdu, 610500, China
  • Received: 2023-02-15 Online: 2023-05-31 Published: 2023-06-09
  • Corresponding author: Fan Min E-mail: minfan@swpu.edu.cn
  • Funding: National Natural Science Foundation of China (62006200); Central Government Guided Local Science and Technology Development Special Project (2021ZYD0003); 2021 Second Batch of Industry-University Cooperative Education Projects (202102211111); Southwest Petroleum University 2021 First-Class Undergraduate Course Cultivation and Construction Project (X2021YLKC035); Southwest Petroleum University Graduate All-English Course Construction Project (2020QY04)

Abstract:

Time series prediction is an important research topic in the era of big data and has broad application prospects. Its main task is to infer the trend of a future period from the development patterns reflected in time series data. However, most prediction models do not fully account for the impact of residual errors and therefore fall short of better prediction results. This paper proposes a fusion time series prediction model with three-way residual error amendment, which effectively bounds the residual error within a certain range and thereby improves prediction accuracy. First, the time series decomposition algorithm STL (Seasonal-Trend decomposition procedure based on Loess) decomposes the time series into a trend component, a seasonal component and a remainder component. Second, a fusion prediction model of the lightweight gradient boosting machine (LightGBM) and the temporal convolutional network (TCN) is designed for the three decomposed components. Finally, based on three-way decision theory, a three-way residual error amendment algorithm is designed to correct the residual errors produced when predicting the remainder component, which in turn corrects the prediction of the whole time series. Experimental results show that the proposed model outperforms the compared models in the vast majority of cases and gives better predictions.

Key words: LightGBM, STL, TCN, time series prediction, three-way decisions
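The first step of the method splits each series additively, so that series = trend + seasonal + remainder at every time step. STL estimates these parts with iterated Loess smoothers; the sketch below deliberately swaps in a simpler classical moving-average decomposition (not STL) only to show the additive split that the later per-component models rely on. The `period` argument and the edge handling (copying the nearest trend value) are assumptions of this sketch, not details from the paper; in practice a library implementation such as the `STL` class in statsmodels would be the usual choice.

```python
from statistics import mean


def classical_decompose(series, period):
    """Split a series into (trend, seasonal, remainder) whose sum is the series.

    Simplified stand-in for STL: the trend is a centred moving average
    (not a Loess smoother) and the seasonal part is the mean detrended
    value at each phase of the cycle.
    """
    n = len(series)
    if period < 2 or n < 2 * period:
        raise ValueError("need period >= 2 and at least two full cycles")
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        if period % 2:  # odd period: plain centred average of `period` points
            trend[t] = mean(window)
        else:  # even period: 2 x period average, half weights on the two ends
            trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    # The moving average is undefined near both ends; copy the nearest value.
    first, last = trend[half], trend[n - half - 1]
    trend = [first] * half + trend[half:n - half] + [last] * half

    detrended = [x - tr for x, tr in zip(series, trend)]
    phase_means = [mean(detrended[p::period]) for p in range(period)]
    offset = mean(phase_means)  # centre the pattern so one cycle sums to zero
    pattern = [m - offset for m in phase_means]
    seasonal = [pattern[t % period] for t in range(n)]
    remainder = [x - tr - s for x, tr, s in zip(series, trend, seasonal)]
    return trend, seasonal, remainder


# Linear trend plus a zero-sum pattern of period 4.
series = [0.5 * t + [1, -1, 2, -2][t % 4] for t in range(24)]
trend, seasonal, remainder = classical_decompose(series, period=4)
```

Each of the three returned lists can then be modelled separately and the forecasts added back together, which is what makes an additive decomposition convenient here.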

CLC number: TP181

Figure 1 Sliding time window processing
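Figure 1 refers to sliding time window processing, which turns a single series into supervised samples that models such as LightGBM and TCN can be trained on. This page does not give the window length or the forecast horizon, so both are plain parameters in this minimal sketch, which assumes each sample uses `window` consecutive past values to predict the value `horizon` steps after them.

```python
def sliding_windows(series, window, horizon=1):
    """Turn a 1-D series into supervised (inputs, target) pairs.

    Each sample uses `window` consecutive past values as features and the
    value `horizon` steps after the end of the window as the target.
    """
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be positive")
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        end = start + window
        X.append(list(series[start:end]))
        y.append(series[end + horizon - 1])
    return X, y


X, y = sliding_windows([1, 2, 3, 4, 5, 6], window=3)
# X holds [1, 2, 3], [2, 3, 4], [3, 4, 5]; y holds 4, 5, 6
```

A tree model would treat each row of `X` as a flat feature vector, while a TCN would read the same rows as short sequences.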

Figure 2 Flowchart of the 3WREA algorithm
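Figure 2 shows the flow of the 3WREA algorithm, but its decision rules are not reproduced on this page. The sketch below is hypothetical and only illustrates the general three-way decision pattern applied to residuals r = y - y_hat: an assumed threshold `beta` places each step in a positive (POS), negative (NEG) or boundary (BND) region according to the previous residual; POS and NEG steps receive the mean residual learned for that region, and the boundary region defers, leaving the prediction unchanged. The function names and the region rule are illustrative assumptions and should not be read as the authors' 3WREA.

```python
from statistics import mean

# Hypothetical three-way residual correction; NOT the paper's 3WREA rules.


def region(prev_residual, beta):
    """Three-way split on the previous residual: POS, NEG or boundary (BND)."""
    if prev_residual > beta:
        return "POS"
    if prev_residual < -beta:
        return "NEG"
    return "BND"


def fit_corrections(residuals, beta):
    """Learn one additive correction per region from past residuals.

    Step t is assigned to a region using r[t-1]; the correction for POS/NEG
    is the mean of r[t] over the steps in that region. BND always gets 0.0,
    i.e. the decision is deferred and the prediction is left as it is.
    """
    buckets = {"POS": [], "NEG": [], "BND": []}
    for prev, cur in zip(residuals, residuals[1:]):
        buckets[region(prev, beta)].append(cur)
    corrections = {key: mean(vals) if vals else 0.0 for key, vals in buckets.items()}
    corrections["BND"] = 0.0
    return corrections


def amend(prediction, prev_residual, corrections, beta):
    """Correct one new prediction according to the region of the last residual."""
    return prediction + corrections[region(prev_residual, beta)]


residuals = [0.5, 0.4, 0.6, -0.4, -0.2, 0.0, 0.1]  # toy validation residuals
corr = fit_corrections(residuals, beta=0.25)
print(amend(1.0, prev_residual=0.3, corrections=corr, beta=0.25))
```

With these toy residuals the POS correction is about 0.2 and the NEG correction about -0.2, so a prediction of 1.0 is nudged up after a large positive residual, nudged down after a large negative one, and left alone inside the boundary band.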

Figure 3 Flowchart of the DT-LGBM-3WREA model
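Figure 3 shows the overall DT-LGBM-3WREA pipeline. Because the three components are additive, per-component forecasts can be summed back into a forecast of the original series. The sketch below assumes, hypothetically, that the model for each component is chosen by validation MAE between candidates such as TCN and LGBM (the comparison that the per-component tables report); the paper's actual fusion rule may differ, and all numbers are toy values.

```python
def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def choose_models(val_truth, val_preds):
    """Hypothetical rule: per component, keep the candidate with the lowest validation MAE.

    val_truth: {component: true values on the validation window}
    val_preds: {component: {model_name: predictions on the same window}}
    """
    return {
        comp: min(cands, key=lambda name: mae(val_truth[comp], cands[name]))
        for comp, cands in val_preds.items()
    }


def recombine(test_preds, choice):
    """Additive recombination: forecast[t] is the sum of the chosen component forecasts at t."""
    chosen = [test_preds[comp][choice[comp]] for comp in choice]
    return [sum(step) for step in zip(*chosen)]


# Toy numbers only, to show the mechanics.
val_truth = {"trend": [1.0, 2.0], "seasonal": [0.5, -0.5], "remainder": [0.1, -0.1]}
val_preds = {
    "trend": {"TCN": [1.1, 2.0], "LGBM": [1.5, 2.5]},
    "seasonal": {"TCN": [0.0, 0.0], "LGBM": [0.4, -0.4]},
    "remainder": {"TCN": [0.3, 0.3], "LGBM": [0.1, 0.0]},
}
choice = choose_models(val_truth, val_preds)
test_preds = {
    "trend": {"TCN": [3.0], "LGBM": [9.0]},
    "seasonal": {"TCN": [9.0], "LGBM": [0.5]},
    "remainder": {"TCN": [9.0], "LGBM": [-0.25]},
}
print(choice, recombine(test_preds, choice))
```

In a pipeline like the one in the figure, a residual-correction step would act on the remainder forecast before this final summation.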

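Table 1 Description of the datasets used in the experiments

| ID | Name | Dataset description | Attributes | Samples |
|---|---|---|---|---|
| 1 | Births | Daily female births in California, 1959 | 2 | 365 |
| 2 | Min. temperature | Daily minimum temperature in Melbourne, 1981–1990 | 2 | 3650 |
| 3 | Ad clicks | Hourly ad clicks, 13–21 September 2017 | 2 | 216 |
| 4 | Parking | Parking counts at locations across Birmingham, October–December 2016 | 4 | 35717 |
| 5 | Stock price | Daily low price, opening price and other fields, 2016–2018 | 6 | 731 |
| 6 | Power load | Daily weather and power load in one region, 2012–2015 | 7 | 1113 |
| 7 | Gas load | Daily weather and gas load in Zhengzhou, 2016–2018 | 9 | 1096 |
| 8 | Traffic flow | Hourly interstate highway traffic volume, 2012 to 4 September 2018 | 9 | 48204 |
| 9 | Chickenpox cases | Weekly chickenpox cases in 20 counties of Hungary, 2005–2015 | 21 | 521 |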

Figure 4 Tuning of selected hyperparameters

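Table 2 Comparison of TCN and LGBM evaluation metrics on the trend component

| Dataset | MAE (TCN) | MAE (LGBM) | RMSE (TCN) | RMSE (LGBM) |
|---|---|---|---|---|
| Stock price | 0.0603 | 0.0666 | 0.0735 | 0.0814 |
| Gas load | 0.0171 | 0.0296 | 0.0228 | 0.0423 |
| Power load | 0.0379 | 0.0453 | 0.0569 | 0.0598 |
| Chickenpox cases | 0.0556 | 0.0567 | 0.0826 | 0.0788 |
| Traffic flow | 0.0416 | 0.0480 | 0.0558 | 0.0593 |
| Parking | 0.0375 | 0.0397 | 0.0461 | 0.0478 |
| Min. temperature | 0.0370 | 0.0394 | 0.0467 | 0.0500 |
| Ad clicks | 0.0547 | 0.0764 | 0.0651 | 0.1030 |
| Births | 0.0625 | 0.0704 | 0.0782 | 0.0852 |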
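The tables score every model with MAE and RMSE. As a reminder of how the two metrics differ, here is a minimal sketch using their standard definitions (the function names are illustrative). RMSE is never smaller than MAE, and because it squares each error it reacts more strongly to occasional large misses; that is one way to read rows such as Traffic flow in Table 4, where LGBM-3WREA lowers MAE but raises RMSE.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: mean of |y - y_hat|."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean of (y - y_hat)^2."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


# One large miss: MAE spreads it evenly, RMSE weights it quadratically.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred))  # 0.5 1.0
```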

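Table 3 Comparison of TCN and LGBM evaluation metrics on the seasonal component

| Dataset | MAE (TCN) | MAE (LGBM) | RMSE (TCN) | RMSE (LGBM) |
|---|---|---|---|---|
| Stock price | 0.2948 | 0.2229 | 0.3015 | 0.2757 |
| Gas load | 0.2955 | 0.3000 | 0.3735 | 0.3802 |
| Power load | 0.1483 | 0.2272 | 0.1615 | 0.2810 |
| Chickenpox cases | 0.3120 | 0.3386 | 0.3639 | 0.4283 |
| Traffic flow | 0.1293 | 0.1166 | 0.1293 | 0.1573 |
| Parking | 0.3854 | 0.4616 | 0.5324 | 0.6093 |
| Min. temperature | 0.1718 | 0.2455 | 0.2073 | 0.2802 |
| Ad clicks | 0.1412 | 0.1137 | 0.2379 | 0.1249 |
| Births | 0.1092 | 0.2147 | 0.1223 | 0.2349 |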

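Table 4 Comparison of TCN, LGBM and LGBM-3WREA evaluation metrics on the remainder component

| Dataset | MAE (TCN) | MAE (LGBM) | MAE (LGBM-3WREA) | RMSE (TCN) | RMSE (LGBM) | RMSE (LGBM-3WREA) |
|---|---|---|---|---|---|---|
| Stock price | 0.2603 | 0.2739 | 0.2221 | 0.2667 | 0.3436 | 0.2373 |
| Gas load | 0.4138 | 0.3764 | 0.2443 | 0.5038 | 0.4953 | 0.3163 |
| Power load | 0.3218 | 0.3575 | 0.3058 | 0.3903 | 0.5045 | 0.3697 |
| Chickenpox cases | 0.3927 | 0.2890 | 0.1668 | 0.4837 | 0.4199 | 0.2528 |
| Traffic flow | 0.2752 | 0.2420 | 0.2278 | 0.3828 | 0.4213 | 0.4537 |
| Parking | 0.2439 | 0.2961 | 0.1960 | 0.3178 | 0.5698 | 0.3066 |
| Min. temperature | 0.5420 | 0.4237 | 0.2056 | 0.6996 | 0.5377 | 0.3162 |
| Ad clicks | 0.5958 | 0.3259 | 0.3466 | 0.7614 | 0.3817 | 0.4961 |
| Births | 0.5343 | 0.4775 | 0.3330 | 0.6565 | 0.6101 | 0.4480 |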
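Table 4 isolates the effect of 3WREA on the remainder component. A small script can compute, per dataset, the relative change in error when going from LGBM to LGBM-3WREA and list the datasets where the error went up. The values are re-typed from Table 4, and the dataset labels are English renderings of the table rows; the helper names are illustrative.

```python
# (LGBM, LGBM-3WREA) pairs re-typed from Table 4, remainder component.
MAE_PAIRS = {
    "Stock price": (0.2739, 0.2221),
    "Gas load": (0.3764, 0.2443),
    "Power load": (0.3575, 0.3058),
    "Chickenpox cases": (0.2890, 0.1668),
    "Traffic flow": (0.2420, 0.2278),
    "Parking": (0.2961, 0.1960),
    "Min. temperature": (0.4237, 0.2056),
    "Ad clicks": (0.3259, 0.3466),
    "Births": (0.4775, 0.3330),
}
RMSE_PAIRS = {
    "Stock price": (0.3436, 0.2373),
    "Gas load": (0.4953, 0.3163),
    "Power load": (0.5045, 0.3697),
    "Chickenpox cases": (0.4199, 0.2528),
    "Traffic flow": (0.4213, 0.4537),
    "Parking": (0.5698, 0.3066),
    "Min. temperature": (0.5377, 0.3162),
    "Ad clicks": (0.3817, 0.4961),
    "Births": (0.6101, 0.4480),
}


def relative_change(before, after):
    """Signed relative change; a negative value means the error went down."""
    return (after - before) / before


def worsened(pairs):
    """Sorted names of datasets where adding 3WREA increased the error."""
    return sorted(name for name, (b, a) in pairs.items() if relative_change(b, a) > 0)


if __name__ == "__main__":
    for name, (b, a) in MAE_PAIRS.items():
        print(f"{name:18s} MAE change {relative_change(b, a):+.1%}")
    print("MAE worse with 3WREA:", worsened(MAE_PAIRS))
    print("RMSE worse with 3WREA:", worsened(RMSE_PAIRS))
```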

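Table 5 MAE of the proposed algorithm and the comparison algorithms on the nine datasets

| Dataset | TCN | LightGBM | CNN | LSTM | DT-LGBM | DT-LGBM-3WREA |
|---|---|---|---|---|---|---|
| Stock price | 0.1444 | 0.1591 | 0.1816 | 0.1768 | 0.1083 | 0.1093 |
| Gas load | 0.0278 | 0.0226 | 0.0262 | 0.0212 | 0.0212 | 0.0211 |
| Power load | 0.0916 | 0.0768 | 0.1563 | 0.1066 | 0.0907 | 0.1161 |
| Chickenpox cases | 0.1698 | 0.2154 | 0.1024 | 0.2065 | 0.1473 | 0.1591 |
| Traffic flow | 0.0524 | 0.0701 | 0.0746 | 0.0584 | 0.0559 | 0.0499 |
| Parking | 0.0488 | 0.0207 | 0.0292 | 0.0386 | 0.0325 | 0.0277 |
| Min. temperature | 0.0677 | 0.0678 | 0.0675 | 0.0670 | 0.0634 | 0.0528 |
| Ad clicks | 0.0656 | 0.0779 | 0.0630 | 0.0595 | 0.0594 | 0.0539 |
| Births | 0.1095 | 0.1075 | 0.1104 | 0.1092 | 0.1396 | 0.1070 |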
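To read Table 5 row by row, a short script can report which model has the lowest MAE on each dataset and tally the results. The numbers are re-typed from Table 5 in its column order, the dataset labels are English renderings, and ties (none occur at the row minima here) would go to the leftmost column.

```python
from collections import Counter

MODELS = ("TCN", "LightGBM", "CNN", "LSTM", "DT-LGBM", "DT-LGBM-3WREA")

# MAE rows re-typed from Table 5, in the column order of MODELS.
TABLE5_MAE = {
    "Stock price": (0.1444, 0.1591, 0.1816, 0.1768, 0.1083, 0.1093),
    "Gas load": (0.0278, 0.0226, 0.0262, 0.0212, 0.0212, 0.0211),
    "Power load": (0.0916, 0.0768, 0.1563, 0.1066, 0.0907, 0.1161),
    "Chickenpox cases": (0.1698, 0.2154, 0.1024, 0.2065, 0.1473, 0.1591),
    "Traffic flow": (0.0524, 0.0701, 0.0746, 0.0584, 0.0559, 0.0499),
    "Parking": (0.0488, 0.0207, 0.0292, 0.0386, 0.0325, 0.0277),
    "Min. temperature": (0.0677, 0.0678, 0.0675, 0.0670, 0.0634, 0.0528),
    "Ad clicks": (0.0656, 0.0779, 0.0630, 0.0595, 0.0594, 0.0539),
    "Births": (0.1095, 0.1075, 0.1104, 0.1092, 0.1396, 0.1070),
}


def best_model(row):
    """Name of the model with the lowest error in one row (leftmost wins ties)."""
    return MODELS[min(range(len(row)), key=row.__getitem__)]


def win_counts(table):
    """How many datasets each model wins on."""
    return dict(Counter(best_model(row) for row in table.values()))


if __name__ == "__main__":
    for name, row in TABLE5_MAE.items():
        print(f"{name:18s} lowest MAE: {best_model(row)}")
    print(win_counts(TABLE5_MAE))
```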

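Table 6 RMSE of the proposed algorithm and the comparison algorithms on the nine datasets

| Dataset | TCN | LightGBM | CNN | LSTM | DT-LGBM | DT-LGBM-3WREA |
|---|---|---|---|---|---|---|
| Stock price | 0.1762 | 0.2073 | 0.2333 | 0.2273 | 0.1316 | 0.1298 |
| Gas load | 0.0333 | 0.0304 | 0.0346 | 0.0286 | 0.0281 | 0.0281 |
| Power load | 0.1188 | 0.1155 | 0.1993 | 0.1418 | 0.1185 | 0.1447 |
| Chickenpox cases | 0.2343 | 0.2771 | 0.1515 | 0.2612 | 0.1685 | 0.1732 |
| Traffic flow | 0.0711 | 0.0905 | 0.0978 | 0.0766 | 0.0684 | 0.0659 |
| Parking | 0.0577 | 0.0269 | 0.0361 | 0.0467 | 0.0469 | 0.0419 |
| Min. temperature | 0.0857 | 0.0858 | 0.0861 | 0.0858 | 0.0788 | 0.0661 |
| Ad clicks | 0.0934 | 0.1075 | 0.0829 | 0.0796 | 0.0751 | 0.0678 |
| Births | 0.1416 | 0.1382 | 0.1382 | 0.1380 | 0.1763 | 0.1233 |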