Journal of Nanjing University (Natural Sciences) ›› 2016, Vol. 52 ›› Issue (2): 313–.


Gaussian Graphical Models parallel estimation via coordinate descent neighborhood selection

Li Xiaoyu1, Zhou Ming2, Yuan Xiaotong1*, Luo Qi1, Liu Qingshan1*

  • Online: 2016-03-27 Published: 2016-03-27
  • About author: 1. Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing University of Information Science & Technology, Nanjing, 210044, China; 2. School of Electronic & Information Engineering, Nanjing University of Information Science & Technology, Nanjing, 210044, China
  • Supported by:
    National Natural Science Foundation of China (61402232, 61532009, 61522308), Natural Science Foundation of Jiangsu Province (BK20141003, BK2012045)
    Received: 2015-11-17
    *Corresponding authors, E-mail: xtyuan1980@gmail.com, qsliu@nuist.edu.cn

Abstract: Many machine learning tasks require investigating the statistical dependence among the features of high-dimensional data. Sparse Gaussian graphical models (GGMs) are an effective approach to this problem and have been widely applied in data mining, bioinformatics, and financial analysis. Because the number of model parameters grows as the square of the data dimensionality, estimating sparse GGMs remains a challenging problem in statistical machine learning, especially in high-dimensional settings. To address this problem, we propose a novel parallel algorithm, based on coordinate descent optimization, for estimating the structure of sparse GGMs. The core idea rests on the fact that estimating the structure of a GGM is equivalent to sparse neighborhood selection for each variable, which can be posed as one Lasso program per variable; we apply coordinate descent to solve each of these neighborhood selection subproblems. By storing the sample matrix in a distributed manner across machines, the subproblems are solved in parallel under the MPI (Message Passing Interface) framework. Experimental results show that the algorithm parallelizes well and significantly reduces running time at almost no cost in structure estimation accuracy.
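The neighborhood selection idea in the abstract can be illustrated with a minimal single-process sketch in Python with NumPy. It is not the authors' implementation: the function names (`sharded_gram`, `lasso_neighbors`, `estimate_graph`), the regularization parameter `lam`, and the choice to combine the shards through the Gram matrix are all assumptions made here, since the abstract does not describe the communication scheme. The list of row blocks stands in for samples stored on different machines, and the sum over blocks stands in for what an MPI all-reduce would compute. The columns are assumed to be centered and non-constant.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator, the closed-form 1-D Lasso solution."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def sharded_gram(shards):
    """Accumulate X^T X / n from row blocks of the sample matrix.

    Each shard stands in for the samples held by one machine; the sum
    plays the role an all-reduce would play in a real MPI program.
    """
    n = sum(block.shape[0] for block in shards)
    return sum(block.T @ block for block in shards) / n


def lasso_neighbors(gram, j, lam, n_sweeps=200, tol=1e-8):
    """Coordinate descent for node j's Lasso: regress X_j on the other columns.

    Minimizes (1/2n)||X_j - X b||^2 + lam * ||b||_1 with b[j] fixed at 0,
    using only entries of the Gram matrix.
    """
    p = gram.shape[0]
    beta = np.zeros(p)
    for _ in range(n_sweeps):
        max_change = 0.0
        for k in range(p):
            if k == j:
                continue
            # Correlation of column k with the partial residual (excluding k).
            rho = gram[k, j] - gram[k] @ beta + gram[k, k] * beta[k]
            new = soft_threshold(rho, lam) / gram[k, k]
            max_change = max(max_change, abs(new - beta[k]))
            beta[k] = new
        if max_change < tol:
            break
    return beta


def estimate_graph(shards, lam):
    """Neighborhood selection: one Lasso per variable, edges joined by OR."""
    gram = sharded_gram(shards)
    p = gram.shape[0]
    support = np.zeros((p, p), dtype=bool)
    for j in range(p):
        support[j] = lasso_neighbors(gram, j, lam) != 0.0
    return support | support.T
```

A call such as `estimate_graph(np.array_split(X, 4), lam=0.2)` returns a symmetric boolean adjacency matrix for a centered sample matrix `X`. Forming the full Gram matrix keeps the sketch short, but it costs memory quadratic in the dimension, so a high-dimensional implementation would more likely exchange per-coordinate partial sums between machines instead.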
