Journal of Nanjing University (Natural Science) ›› 2017, Vol. 53 ›› Issue (6): 1125–.


A causal discovery algorithm based on maximum and minimum independence

Xie Feng1, Cai Ruichu1*, Chen Wei1, Hao Zhifeng1,2

  • Online: 2017-11-27  Published: 2017-11-27
  • Affiliations: 1. Guangdong University of Technology, Guangzhou, 510006, China; 2. Foshan University, Foshan, 528000, China
  • Funding: NSFC-Guangdong Joint Fund (U1501254); National Natural Science Foundation of China (61472089, 61572143); Natural Science Foundation of Guangdong Province (2014A030306004, 2014A030308008); Science and Technology Planning Project of Guangdong Province (2013B051000076, 2015B010108006, 2015B010131015); Special Support Program of Guangdong Province (2015TQ01X140); Pearl River Science and Technology Nova Program of Guangzhou (201610010101)
  • Received: 2017-08-12
  • *Corresponding author. E-mail: cairuichu@gmail.com


Abstract: The linear non-Gaussian acyclic model (LiNGAM) has the advantage of completely identifying a causal network from observational data alone, without any prior knowledge, and has therefore attracted growing attention from researchers. However, among existing algorithms for solving the LiNGAM model, some are sensitive to initial values and easily fall into local optima, while others suffer from a low recognition rate for exogenous variables. We therefore propose a causal discovery algorithm based on maximum and minimum independence. By introducing an adaptive independence-judgment parameter, the algorithm uses this parameter to find the variable that is independent of the residuals obtained by regressing all the other variables on it; that variable is the exogenous variable. The algorithm avoids both the low recognition rate caused by traditional algorithms' sensitivity to differences in independence values and the failures caused by a fixed independence parameter across different data sets. Applied to both simulated and real networks, it outperforms existing algorithms at every dimensionality tested.

Abstract: The linear non-Gaussian acyclic model (LiNGAM) can completely identify a causal network from observational data alone, without any prior knowledge, which has attracted growing attention from researchers. However, among existing algorithms for solving the LiNGAM model, some are sensitive to initial values and easily fall into local optima, while others suffer from a low recognition rate for exogenous variables. We propose an effective causal discovery algorithm based on maximum and minimum independence. In contrast to previous methods, the algorithm's parameters are obtained through adaptive learning, as follows. First, we formalize the maximum–minimum independence criterion, which is used to find, for each variable, the minimum value that makes that variable independent of its associated regression residuals. Second, from these per-variable minima we select the maximum value for which exactly one variable is independent of its residuals; this value is taken as the independence-judgment parameter. In short, the appropriate parameter value is obtained according to the maximum–minimum independence criterion. Finally, we use this parameter to find the variable that is independent of the residuals obtained by regressing all the other variables on it: that variable is the exogenous variable. In this way, suitable parameters are selected adaptively for each data set rather than fixed in advance, so the exogenous variables found are more precise. Once an exogenous variable is identified, the remaining task is simply to find a complete causal order iteratively. In conclusion, the algorithm avoids both the traditional algorithms' sensitivity to small differences in independence values and the failure of a fixed independence parameter across different data sets. To illustrate its effectiveness, we evaluate our method on simulated networks and real networks. The experimental results show that it consistently outperforms previous approaches under different dimensions.
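The exogenous-variable selection described above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: `independence_score` is a simple tanh-based nonlinear-correlation proxy standing in for a proper kernel independence measure such as HSIC, and the adaptive max–min step is reduced to taking, for each candidate, the worst dependence with its residuals and choosing the candidate whose worst case is smallest.

```python
import numpy as np

def independence_score(x, y):
    """Nonlinear-correlation proxy for dependence between x and y.

    Near zero when x and y are independent; an illustrative stand-in
    for a kernel independence measure such as HSIC.
    """
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    s1 = abs(np.mean(np.tanh(x) * y) - np.mean(np.tanh(x)) * np.mean(y))
    s2 = abs(np.mean(x * np.tanh(y)) - np.mean(x) * np.mean(np.tanh(y)))
    return max(s1, s2)

def find_exogenous(X):
    """Return (index, threshold) for the estimated exogenous variable.

    For each candidate j, regress every other variable on X[:, j] and
    record the worst (largest) dependence between X[:, j] and the
    residuals.  Max-min step: the adaptive threshold is the smallest of
    these per-candidate maxima, and the candidate attaining it -- the
    one independent of all its residuals -- is taken as exogenous.
    """
    n_vars = X.shape[1]
    worst = np.empty(n_vars)
    for j in range(n_vars):
        xj = X[:, j]
        scores = []
        for i in range(n_vars):
            if i == j:
                continue
            b = np.cov(X[:, i], xj)[0, 1] / np.var(xj)  # OLS slope
            resid = X[:, i] - b * xj                    # regression residual
            scores.append(independence_score(xj, resid))
        worst[j] = max(scores)
    return int(worst.argmin()), worst.min()

# Toy LiNGAM data: x1 is exogenous, x2 = 2*x1 + e2, with uniform
# (non-Gaussian) noise of unit variance.
rng = np.random.default_rng(0)
n = 20000
e1 = rng.uniform(-np.sqrt(3), np.sqrt(3), n)
e2 = rng.uniform(-np.sqrt(3), np.sqrt(3), n)
X = np.column_stack([e1, 2 * e1 + e2])
idx, thr = find_exogenous(X)
```

Once an exogenous variable is found, a DirectLiNGAM-style procedure would regress it out of the remaining variables and repeat the search on the residuals, yielding a complete causal order.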

[1] Xu L L,Fan T T,Wu X,et al.A pooling-LiNGAM algorithm for effective connectivity analysis of fMRI data.Frontiers in Computational Neuroscience,2014,8:125.
[2] Ramsey J D,Sanchez-Romero R,Glymour C.Non-Gaussian methods and high-pass filters in the estimation of effective connections.Neuroimage,2014,84:986-1006.
[3] Lai P C,Bessler D A.Price discovery between carbonated soft drink manufacturers and retailers:A disaggregate analysis with PC and LiNGAM algorithms.Journal of Applied Economics,2015,18(1):173-197.
[4] Xu X J.Contemporaneous causal orderings of US corn cash prices through directed acyclic graphs.Empirical Economics,2017,52(2):731-758.
[5] Cai R C,Yuan C,Hao Z F,et al.A causal model for disease pathway discovery.In:Proceedings of the 21st International Conference on Neural Information Processing.Springer Berlin Heidelberg,2014:350-357.
[6] Pearl J.Causality:Models,reasoning and inference.The 2nd Edition.New York:Cambridge University Press,2003,43-51.
[7] Spirtes P,Glymour C,Scheines R.Causation,prediction,and search.London:MIT Press,2000,84-89.
[8] Bollen K A.Structural equations with latent variables.New York:John Wiley & Sons Press,1989,40-79.
[9] Shimizu S,Hoyer P O,Hyvärinen A,et al.A linear non-Gaussian acyclic model for causal discovery.The Journal of Machine Learning Research,2006,7:2003-2030.
[10] Shimizu S.LiNGAM:Non-Gaussian methods for estimating causal structures.Behaviormetrika,2014,41(1):65-98.
[11] Hoyer P O,Hyttinen A.Bayesian discovery of linear acyclic causal models.In:Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence.Quebec,Canada:AUAI Press,2009:240-248.
[12] Shimizu S,Inazumi T,Sogawa Y,et al.DirectLiNGAM:A direct method for learning a linear non-Gaussian structural equation model.The Journal of Machine Learning Research,2011,12:1225-1248.
[13] Hyvrinen A,Smith S M.Pairwise likelihood ratios for estimation of non-Gaussian structural equation models.The Journal of Machine Learning Research,2013,14(1):111-152.
[14] Hoyer P O,Shimizu S,Kerminen A J,et al.Estimation of causal effects using linear non-Gaussian causal models with hidden variables.International Journal of Approximate Reasoning,2008,49(2):362-378.
[15] Entner D,Hoyer P O.Discovering unconfounded causal relationships using linear non-Gaussian models.In:The JSAI International Symposium on Artificial Intelligence.Springer Berlin Heidelberg,2010:181-195.
[16] Tashiro T,Shimizu S,Hyvärinen A,et al.ParceLiNGAM:A causal ordering method robust against latent confounders.Neural Computation,2014,26(1):57-83.
[17] Shimizu S,Bollen K.Bayesian estimation of causal direction in acyclic structural equation models with individual-specific confounder variables and non-Gaussian distributions.The Journal of Machine Learning Research,2014,15(1):2629-2652.
[18] Shimizu S.Learning LiNGAM based on data with more variables than observations.arXiv:1208.4183,2013.
[19] Sogawa Y,Shimizu S,Shimamura T,et al.Estimating exogenous variables in data with more variables than observations.Neural Networks,2011,24(8):875-880.
[20] Cai R C,Zhang Z J,Hao Z F.SADA:A general framework to support robust causation discovery.In:Proceedings of the 30th International Conference on Machine Learning.Atlanta,GA,USA:JMLR Press,2013:208-216.