Journal of Nanjing University (Natural Science), 2023, Vol. 59, Issue 4: 550–560. doi: 10.13232/j.cnki.jnju.2023.04.002


Causal relationship analysis of high-dimensional time series based on quantile factor model

Huiling Liang1,2, Hui Liu1,2, Liwei Liu1,2, Jia Zhao3, Huaijun Ruan3

  1. College of Computer Science and Technology, Shandong University of Finance and Economics, Ji'nan, 250014, China
  2. Key Laboratory of Digital Media Technology of Shandong Province, Shandong University of Finance and Economics, Ji'nan, 250014, China
  3. Institute of Information Technology, Shandong Academy of Agricultural Sciences, Ji'nan, 250000, China
  • Received: 2023-06-13 Online: 2023-07-31 Published: 2023-08-18
  • Contact: Hui Liu E-mail: liuh_lh@sdufe.edu.cn
  • Funding: National Natural Science Foundation of China (62072274); Shandong Province Science and Technology Achievement Transfer and Transformation Project (2021LYXZ021); Taishan Scholar Distinguished Expert Program of Shandong Province (tstp20221137)

Abstract:

Discovering causal relationships between variables from observational data is a key problem in many scientific fields. The traditional Granger causality model suffers from the curse of dimensionality, which makes it difficult to identify causality accurately in high-dimensional time series. In this paper, we propose QFM-CGC, a new Granger causality analysis method based on the quantile factor model, for determining causal relationships in high-dimensional time series. First, QFM-CGC uses the Akaike information criterion (AIC) for model selection, which avoids setting the lag order manually. Then, a quantile factor model is fitted to the conditioning variables of the vector autoregressive (VAR) model to reduce their dimensionality, which reduces the number of coefficients to be estimated, and conditional Granger causality analysis is performed again on the reduced VAR model. Finally, Monte Carlo simulation is used to evaluate how well different methods identify the connectivity structure between the underlying system and the observed time series. The proposed method is compared with benchmark and classical methods on linear simulated systems with different numbers of variables and on two real-world data sets, and the experimental results confirm its effectiveness.

Keywords: high-dimensional time series, quantile factor model, conditional Granger causality analysis, data mining
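The abstract names two steps that are easy to illustrate in isolation: choosing the VAR lag order by AIC, and measuring conditional Granger causality as the log ratio of restricted to unrestricted residual sums of squares. The following is a minimal sketch of those two steps only, assuming NumPy and ordinary least squares; the function names (`lag_matrix`, `select_lag_aic`, `cgc_index`) and the toy system are hypothetical and not taken from the paper. The quantile factor reduction of the conditioning variables is not implemented here; in QFM-CGC the `cond` columns would presumably be replaced by the estimated factors.

```python
import numpy as np


def lag_matrix(data, cols, p, start):
    """Columns hold lags 1..p of each series in `cols`, for rows t = start..T-1."""
    T = data.shape[0]
    return np.column_stack(
        [data[start - k:T - k, j] for j in cols for k in range(1, p + 1)]
    )


def select_lag_aic(data, max_lag):
    """Return the VAR order p minimising AIC(p) = ln det(Sigma_p) + 2*p*K^2/n."""
    T, K = data.shape
    n = T - max_lag  # use the same sample for every candidate order
    Y = data[max_lag:]
    best_p, best_aic = None, np.inf
    for p in range(1, max_lag + 1):
        X = np.column_stack([np.ones(n), lag_matrix(data, range(K), p, max_lag)])
        B, *_ = np.linalg.lstsq(X, Y, rcond=None)
        E = Y - X @ B
        _, logdet = np.linalg.slogdet(E.T @ E / n)
        aic = logdet + 2.0 * p * K * K / n
        if aic < best_aic:
            best_p, best_aic = p, aic
    return best_p


def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)


def cgc_index(data, source, target, cond, p):
    """ln(RSS_restricted / RSS_unrestricted) for source -> target given cond."""
    n = data.shape[0] - p
    y = data[p:, target]
    restricted = np.column_stack(
        [np.ones(n), lag_matrix(data, [target] + list(cond), p, p)]
    )
    unrestricted = np.column_stack([restricted, lag_matrix(data, [source], p, p)])
    return float(np.log(rss(restricted, y) / rss(unrestricted, y)))


# Toy system (hypothetical): x0 drives x1 at lag 2, x2 is independent noise.
rng = np.random.default_rng(0)
T = 1000
noise = rng.standard_normal((T, 3))
data = np.zeros((T, 3))
for t in range(2, T):
    data[t, 0] = 0.5 * data[t - 1, 0] + noise[t, 0]
    data[t, 1] = 0.4 * data[t - 1, 1] + 0.8 * data[t - 2, 0] + noise[t, 1]
    data[t, 2] = 0.3 * data[t - 1, 2] + noise[t, 2]

p = select_lag_aic(data, max_lag=6)
forward = cgc_index(data, source=0, target=1, cond=[2], p=p)
backward = cgc_index(data, source=1, target=0, cond=[2], p=p)
print(p, round(forward, 3), round(backward, 3))
```

In this toy system the index for x0 → x1 should be clearly positive while the index for x1 → x0 stays near zero. A significance test (for example an F-test on the added lags) would be needed to turn the index into a yes/no decision.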

CLC number: TP391

Fig. 1  True causal relationships of VAR5(5) (black indicates a causal relationship)

Fig. 2  Model order selection for VAR5(5)

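Table 1  Frequency of detected causal relationships over 100 Monte Carlo runs of VAR5(5)

Direction   CGC     PMIME   PCA-CGC   mBTS-CGC   QFM-CGC
X1→X2       98%     1%      0         0          0
X1→X3       99%     95%     100%      94%        99%
X2→X1       98%     99%     40%       72%        99%
X2→X3       0       4%      0         0          0
X2→X4       100%    98%     50%       98%        100%
X3→X1       0       8%      0         0          0
X3→X2       0       1%      0         0          0
X3→X5       3%      86%     43%       96%        100%
X4→X1       0       4%      0         0          0
X4→X2       0       1%      0         0          0
X4→X3       0       2%      0         0          0
X4→X5       100%    97%     26%       100%       99%
X5→X1       0       2%      0         0          0
X5→X2       100%    99%     18%       100%       99%
X5→X3       0       1%      0         0          0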
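Each cell above is the share of Monte Carlo runs in which a method flagged a given directed pair. A minimal sketch of that tally follows, assuming each run is summarised as a set of detected (source, target) pairs; the function name `detection_frequency` and the four toy runs are hypothetical and are not the paper's data.

```python
from collections import Counter


def detection_frequency(runs):
    """Percentage of runs in which each directed pair (source, target) was flagged."""
    counts = Counter(pair for detected in runs for pair in set(detected))
    return {pair: 100.0 * c / len(runs) for pair, c in counts.items()}


# Hypothetical outcomes of four Monte Carlo runs (not the paper's data).
runs = [
    {("X1", "X3"), ("X2", "X1")},
    {("X1", "X3")},
    {("X1", "X3"), ("X2", "X1"), ("X2", "X3")},
    {("X1", "X3"), ("X2", "X1")},
]
freq = detection_frequency(runs)
for (src, dst), pct in sorted(freq.items()):
    print(f"{src}->{dst}: {pct:.0f}%")
```

Pairs that are never flagged simply do not appear in the result, so a caller that needs explicit zeros would have to add them for the pairs of interest.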

Fig. 3  True causal relationships of VAR10(5) (black indicates a causal relationship)

Fig. 4  Model order selection for VAR10(5)

Table 2  Frequency of detected causal relationships over 100 Monte Carlo runs of VAR10(5)

Direction   CGC     PMIME   PCA-CGC   mBTS-CGC   QFM-CGC
X1→X2       99%     99%     100%      99%        100%
X1→X6       32%     95%     75%       57%        91%
X1→X9       35%     98%     40%       55%        100%
X2→X4       0       100%    85%       99%        100%
X2→X5       0       1%      0         0          0
X4→X5       88%     99%     0         99%        100%
X5→X1       51%     85%     100%      57%        100%
X5→X3       36%     100%    100%      99%        100%
X5→X4       95%     69%     100%      97%        100%
X5→X8       23%     99%     100%      80%        94%
X6→X7       0       100%    99%       89%        100%
X6→X8       0       0       0         0          8%
X7→X10      100%    100%    100%      99%        100%
X8→X6       100%    96%     100%      99%        100%
X9→X8       0       1%      0         0          0
X10→X8      0       1%      0         0          0
X10→X9      100%    100%    100%      99%        100%

Table 3  Estimated number of factors at different quantiles

Quantile τ   Number of factors
0.01         1
0.05         1
0.10         2
0.25         4
0.50         5
0.75         5
0.90         2
0.95         1
0.99         1
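The factor counts above are reported per quantile level τ. Quantile-based estimators are generally built on the check (pinball) loss of quantile regression, ρ_τ(u) = u(τ − 1{u < 0}). The sketch below shows only that loss and, in the simplest scalar case, that its minimiser is a sample τ-quantile. It does not estimate factors or loadings, and the names `check_loss` and `total_loss` and the toy sample are hypothetical.

```python
def check_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def total_loss(c, sample, tau):
    """Summed check loss when every observation is predicted by the constant c."""
    return sum(check_loss(x - c, tau) for x in sample)


# Toy sample with one large outlier (hypothetical data).
sample = [1.0, 2.0, 3.0, 4.0, 100.0]

# Search over the observed values, which is enough to find a minimiser here.
best_median = min(sample, key=lambda c: total_loss(c, sample, 0.5))
best_upper = min(sample, key=lambda c: total_loss(c, sample, 0.9))
print(best_median, best_upper)
```

At τ = 0.5 the minimiser is the median (3.0), while at τ = 0.9 it moves to the largest observation (100.0). This illustrates why fits at different quantile levels can pick up different structure in the same data.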

Table 4  Comparison of F̂_QFA and F̂_PCA

             Number of elements of F̂^τ_QFA
Quantile τ   1       2       3       4       5
0.01         0.657
0.05         0.733
0.10         0.796   0.871
0.25         0.952   0.932   0.939   0.890
0.50         0.993   0.976   0.964   0.945   0.923
0.75         0.906   0.945   0.943   0.903   0.882
0.90         0.316   0.911
0.95         0.261
0.99         0.266

Table 5  GDP prediction results

Method             Cause variables (No.)                                RMSE      MAPE      SMAPE
CGC                28,64,74,104,116,162                                 2.53558   2.30354   1.06974
mBTS-CGC           4,9,11,16,18,26,36,60,66,86,89,102,138,141,148,203   2.37887   2.01709   1.06333
PCA-CGC            6,70,71,77,141,148                                   2.24680   2.06530   1.03592
PMIME              70,137,161,163                                       1.90992   1.54485   0.95669
QFM-CGC (τ=0.90)   2,4,7,10,21,79,163,171                               1.70379   1.46841   0.93114
QFM-CGC (τ=0.99)   2,4,7,10,21,26,79,160,163,189                        1.80852   1.58768   0.95876
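The three error measures in the table are not defined on this page. The sketch below uses their common textbook forms, as unscaled ratios rather than percentages; this is an assumption, and the paper's exact scaling (SMAPE in particular has several variants) may differ. The function names and the two-point example are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, as a ratio (assumes no actual value is 0)."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    """Symmetric MAPE, variant 2|a - p| / (|a| + |p|), as a ratio."""
    return sum(
        2 * abs(a - p) / (abs(a) + abs(p)) for a, p in zip(actual, predicted)
    ) / len(actual)


# Hypothetical values, not taken from the GDP experiment.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(rmse(actual, predicted), mape(actual, predicted), smape(actual, predicted))
```

Lower values are better for all three measures, so the table ranks methods the same way whichever scaling convention is used, as long as it is applied consistently.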

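Fig. 5  GDP prediction plot for PCA-CGC

Fig. 6  GDP prediction plot for mBTS-CGC

Fig. 7  GDP prediction plot for CGC

Fig. 8  GDP prediction plot for PMIME

Fig. 9  GDP prediction plot for QFM-CGC (τ=0.90)

Fig. 10  GDP prediction plot for QFM-CGC (τ=0.99)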

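Table 6  Index-to-variable mapping for the Beijing AQI and meteorological time series

No.   Variable
1     PM2.5
2     PM10
3     SO2
4     NO2
5     CO
6     O3
7     Temperature
8     Pressure
9     Dew point
10    Rainfall
11    Wind speed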

Table 7  Prediction results for NO2

Method                    Cause variables (No.)   RMSE      MAPE      SMAPE
mBTS-CGC                  5,9,10,11               5.89589   0.49675   0.02104
PCA-CGC                   2,5,9                   3.37085   0.56704   0.02329
PMIME, CGC                6,7,11                  5.37264   0.49821   0.02043
QFM-CGC (τ=0.01, 0.05)    1,2,3,5,7               2.46499   0.42411   0.01817