Journal of Nanjing University (Natural Science), 2020, Vol. 56, Issue 4: 561–569. doi: 10.13232/j.cnki.jnju.2020.04.014


Reservoir prediction through cost-sensitive active learning

Min Wang1, Fei Zhao1, Fan Min2

  1. School of Electrical Information, Southwest Petroleum University, Chengdu, 610500, China
  2. Institute for Artificial Intelligence, School of Computer Science, Southwest Petroleum University, Chengdu, 610500, China
  • Received: 2020-04-29  Online: 2020-07-30  Published: 2020-08-06
  • Contact: Fan Min  E-mail: minfanphd@163.com
  • Funding: Sichuan Province Youth Science and Technology Innovation Research Team Project (2019JDTD0017); Industry-University Cooperative Education Program of the Department of Higher Education, Ministry of Education (201801140013)

Abstract:

Traditional reservoir prediction in the oil and gas industry is time-consuming and demands a high level of expertise from researchers. Artificial intelligence methods can make prediction considerably more efficient. However, environmental and equipment factors leave a large number of attribute values missing in oil and gas well data, which greatly reduces the accuracy of reservoir identification. To address the classification difficulty caused by missing attribute values, we propose a cost-sensitive active learning algorithm with unified evaluation and dynamic selection (Active Learning Algorithm with Unified Evaluation and Dynamic Selection, ALES). First, we consider the setting and calculation of the various costs involved: misclassification costs, attribute costs, label costs and sample costs. Second, we use softmax regression to evaluate attribute values and labels in a unified way. Third, we propose an optimal acquisition scheme that combines permutation and greedy strategies to select attribute values and labels dynamically. Experiments were carried out on three real well-logging datasets. Significance tests verify the effectiveness of ALES and its superiority over state-of-the-art supervised cost-sensitive classification algorithms and missing-value imputation algorithms.

Key words: active learning, cost-sensitive, incomplete data, unified evaluation, dynamic selection
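To make the combination of softmax probabilities and misclassification costs concrete, the following is a minimal sketch of a generic cost-sensitive decision rule. It is not the ALES procedure itself: the function names, the class scores and the cost matrix are all hypothetical, and attribute, label and sample costs are left out.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def min_cost_class(probs, cost):
    """Return (class index, expected cost) with the lowest expected cost.

    cost[i][j] is the hypothetical cost of predicting class j
    when the true class is i.
    """
    n = len(probs)
    expected = [sum(probs[i] * cost[i][j] for i in range(n)) for j in range(n)]
    best = min(range(n), key=lambda j: expected[j])
    return best, expected[best]

# Hypothetical two-class case: mislabelling a true class-0 sample costs 4,
# mislabelling a true class-1 sample costs 1.
probs = softmax([0.0, 1.0])            # class 1 is the more probable class
cost = [[0.0, 4.0],
        [1.0, 0.0]]
label, exp_cost = min_cost_class(probs, cost)
print(label, round(exp_cost, 4))
```

With these assumed numbers the rule predicts class 0 even though class 1 is more probable, because an error on a true class-0 sample is four times as expensive.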

CLC number: TP181

Figure 1. Framework of the ALES algorithm

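Table 1. Incomplete information system (* marks a missing attribute value)

| U | c1 | c2 | c3 | c4 |
|---|---|---|---|---|
| x1 | * | 3.5 | 1.4 | 0.2 |
| x2 | * | 3.0 | 1.4 | 0.2 |
| x3 | * | * | * | 0.2 |
| x4 | 4.6 | 3.1 | 1.5 | * |
| x5 | 5.0 | * | 1.4 | * |
| x6 | 7.0 | 3.2 | 4.7 | * |
| x7 | 6.4 | 3.2 | 4.5 | 1.5 |
| x8 | 6.9 | 3.1 | 4.9 | 1.5 |
| x9 | * | 2.3 | 4.0 | * |
| x10 | * | 2.8 | 4.6 | 1.5 |
| x11 | 6.3 | 3.3 | * | 2.5 |
| x12 | * | * | 5.1 | 1.9 |
| x13 | 7.1 | 3.0 | 5.9 | 2.1 |
| x14 | 6.3 | 2.9 | 5.6 | 1.8 |
| x15 | * | 3.0 | 5.8 | 2.2 |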
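As a small illustration of how an incomplete information system such as Table 1 can be handled in code, the sketch below transcribes the table, represents each missing value as `None`, and counts the missing values per attribute. The helper names are hypothetical and the snippet only inspects the data; it does not implement any part of ALES.

```python
# Table 1 transcribed row by row; '*' marks a missing attribute value.
TABLE_1 = """
x1  *   3.5 1.4 0.2
x2  *   3.0 1.4 0.2
x3  *   *   *   0.2
x4  4.6 3.1 1.5 *
x5  5.0 *   1.4 *
x6  7.0 3.2 4.7 *
x7  6.4 3.2 4.5 1.5
x8  6.9 3.1 4.9 1.5
x9  *   2.3 4.0 *
x10 *   2.8 4.6 1.5
x11 6.3 3.3 *   2.5
x12 *   *   5.1 1.9
x13 7.1 3.0 5.9 2.1
x14 6.3 2.9 5.6 1.8
x15 *   3.0 5.8 2.2
"""

def parse_table(text):
    """Return {object: [c1, c2, c3, c4]} with None for missing values."""
    rows = {}
    for line in text.strip().splitlines():
        name, *cells = line.split()
        rows[name] = [None if c == "*" else float(c) for c in cells]
    return rows

def missing_counts(rows):
    """Count the missing values in each attribute column."""
    width = len(next(iter(rows.values())))
    return [sum(r[j] is None for r in rows.values()) for j in range(width)]

def complete_objects(rows):
    """List the objects that have no missing attribute value."""
    return [name for name, r in rows.items() if None not in r]

rows = parse_table(TABLE_1)
counts = missing_counts(rows)
total_cells = sum(len(r) for r in rows.values())
print("missing per attribute:", counts)
print("overall missing rate:", round(sum(counts) / total_cells, 4))
print("complete objects:", complete_objects(rows))
```

For this table, 16 of the 60 cells are missing, and only x7, x8, x13 and x14 are complete.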

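Table 2. Complexity of Algorithm 1

| Lines | Complexity | Description |
|---|---|---|
| Total | O(mn^2) + O(n^2) + O(m'tn) = O(mn^2) | |
| Line 2 | O(mn^2) | Select the initial training set |
| Line 3 | O(n^2) | Train the model θ |
| Lines 6–19 | O(m'tn) | Iteratively select attribute values and labels |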

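Table 3. Dataset information

| No. | Name | Samples | Attributes | Classes |
|---|---|---|---|---|
| 1 | Well_01 | 301 | 11 | 4 |
| 2 | Well_02 | 408 | 7 | 2 |
| 3 | Well_03 | 4149 | 7 | 2 |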

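Table 4. Average cost of ALES and the six baseline algorithms at different missing rates

| Missing rate | Dataset | NB | kNN | J48 | CALF | GESI | BPCA | ALES |
|---|---|---|---|---|---|---|---|---|
| 10% | Well_01 | 0.5343 | 0.2990 | 0.9412 | 0.3578 | 0.4118 | 0.3137 | 0.1485 |
| 10% | Well_02 | 1.0365 | 1.3223 | 1.3688 | 0.9023 | 1.3223 | 0.8837 | 0.7601 |
| 10% | Well_03 | 0.6802 | 0.3230 | 0.2940 | 0.2304 | 0.2473 | 0.3789 | 0.2438 |
| 10% | MeanRank | 5.38 | 4.25 | 5.63 | 3.13 | 4.5 | 3.88 | 1.25 |
| 30% | Well_01 | 0.5245 | 0.6618 | 0.7255 | 0.6294 | 0.6716 | 0.6176 | 0.2490 |
| 30% | Well_02 | 1.2027 | 1.3156 | 1.0963 | 0.8731 | 1.1362 | 0.7973 | 0.7734 |
| 30% | Well_03 | 0.4584 | 0.4550 | 0.3900 | 0.4465 | 0.4001 | 0.4738 | 0.3745 |
| 30% | MeanRank | 4.63 | 5.38 | 4.38 | 3.88 | 4.63 | 4.13 | 1.00 |
| 50% | Well_01 | 0.8775 | 0.8333 | 0.7990 | 0.4480 | 0.8137 | 0.5637 | 0.3814 |
| 50% | Well_02 | 1.5349 | 1.4684 | 1.2292 | 0.9422 | 1.4219 | 0.8704 | 0.7794 |
| 50% | Well_03 | 0.7669 | 0.7525 | 0.7178 | 0.6512 | 0.9347 | 0.7115 | 0.5261 |
| 50% | MeanRank | 6.13 | 5.38 | 4.13 | 2.88 | 5.38 | 3.13 | 1.00 |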

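Table 5. Post-hoc comparison of ALES with the six baseline algorithms at a 50% missing rate

| Comparison | z = (R0 − Ri)/SE | p |
|---|---|---|
| ALES vs. NB | 3.3551 | 0.0008 |
| ALES vs. kNN | 2.8641 | 0.0042 |
| ALES vs. GESI | 2.8641 | 0.0042 |
| ALES vs. J48 | 2.0458 | 0.0408 |
| ALES vs. BPCA | 1.3911 | 0.1642 |
| ALES vs. CALF | 1.2275 | 0.2196 |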
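The z and p columns of Table 5 can be approximately reproduced from the 50% MeanRank row of Table 4. The sketch below assumes the usual mean-rank post-hoc statistic with SE = sqrt(k(k+1)/(6N)) and a two-sided normal p-value. Two inputs are assumptions rather than values stated on this page: the mean ranks are taken to be rounded eighths (6.13 read as 6.125, and so on), and N = 4 is inferred only because it yields the tabulated z values. The table appears to report magnitudes, so the code takes the absolute rank difference.

```python
import math

# Mean ranks at 50% missing, from the MeanRank row of Table 4.
# ASSUMPTION: the two-decimal values are rounded eighths (6.13 -> 6.125, ...).
MEAN_RANKS = {"NB": 6.125, "kNN": 5.375, "J48": 4.125, "CALF": 2.875,
              "GESI": 5.375, "BPCA": 3.125, "ALES": 1.0}

def standard_error(k, n):
    """SE of a mean-rank difference for k algorithms over n data sets."""
    return math.sqrt(k * (k + 1) / (6.0 * n))

def two_sided_p(z):
    """Two-sided p-value of a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def post_hoc(mean_ranks, control, n):
    """Compare every algorithm with the control; return {name: (z, p)}."""
    se = standard_error(len(mean_ranks), n)
    r0 = mean_ranks[control]
    result = {}
    for name, r in mean_ranks.items():
        if name != control:
            z = abs(r0 - r) / se   # magnitude, as Table 5 appears to report
            result[name] = (z, two_sided_p(z))
    return result

# ASSUMPTION: n = 4 is inferred because it reproduces the z values in Table 5.
results = post_hoc(MEAN_RANKS, "ALES", n=4)
for name, (z, p) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"ALES vs. {name}: z = {z:.4f}, p = {p:.4f}")
```

Under these assumptions, and at the 0.05 level without any multiple-comparison correction, the differences to NB, kNN, GESI and J48 are significant while those to BPCA and CALF are not, which matches the p column of Table 5.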

Figure 2. Average cost of ALES and the six baseline algorithms at different missing rates (top to bottom: Well_01, Well_02, Well_03)
