Journal of Nanjing University (Natural Science), 2021, Vol. 57, Issue (3): 356–363. doi: 10.13232/j.cnki.jnju.2021.03.002


Sorption model for dissolved organic matter based on UV-visible spectra and machine learning

He Cui, Kun Liu, Xiaolei Qu

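  1. State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing, 210023, China
  • Received: 2021-01-20  Online: 2021-06-08  Published: 2021-06-08
  • Corresponding author: Xiaolei Qu  E-mail: xiaoleiqu@nju.edu.cn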
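  • Funding:
    National Natural Science Foundation of China (21876075); Key Special Project “Causes and Remediation Technologies of Site Soil Pollution” (2019YFC1804201)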

Abstract (Chinese):

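The sorption of organic compounds to dissolved organic matter (DOM) significantly affects their environmental fate and bioavailability. A random forest model was built to predict the organic carbon-normalized partition coefficient (KOC). Its features were the normalized UV-visible spectra of DOM and the n-octanol-water partition coefficients of the organic compounds. Across DOM from all sources, the random forest model was significantly more accurate than the widely used linear free energy relationship models, but slightly less accurate than the two-phase system model. For DOM derived from soil and peat, the random forest model was significantly more accurate than the other models, indicating good applicability. The feature importances output by the random forest model show that it learned spectral features that reflect DOM molecular weight, degree of humification, and the type of substituents on aromatic rings. Feature selection showed that a small number of high-importance features performs as well as the full spectrum. Even with only two wavelengths, the model remained significantly more accurate than the linear free energy relationship models. The UV-visible spectra of DOM can be measured in situ and in real time. Prediction models based on UV-visible spectra and machine learning could therefore resolve the spatiotemporal patterns of KOC in situ and at high throughput, enabling more accurate and finer-grained risk assessment and management.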

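Key words (Chinese): dissolved organic matter, organic carbon-normalized partition coefficient, prediction model, UV-visible spectroscopy, machine learning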
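The single-parameter LFER (sp-LFER) baseline relates lg KOC to lg KOW alone. Below is a minimal sketch of fitting such a baseline by ordinary least squares. It assumes the usual linear form lg KOC = a·lg KOW + b. The helper names `fit_sp_lfer` and `predict_sp_lfer` are hypothetical, and the example numbers are invented rather than taken from the study.

```python
from statistics import fmean

def fit_sp_lfer(lg_kow, lg_koc):
    """Least-squares fit of lg KOC = a * lg KOW + b (assumed sp-LFER form; hypothetical helper)."""
    if len(lg_kow) != len(lg_koc) or len(lg_kow) < 2:
        raise ValueError("need at least two paired observations")
    mx, my = fmean(lg_kow), fmean(lg_koc)
    sxx = sum((x - mx) ** 2 for x in lg_kow)
    if sxx == 0:
        raise ValueError("lg KOW values must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(lg_kow, lg_koc))
    a = sxy / sxx          # slope
    b = my - a * mx        # intercept
    return a, b

def predict_sp_lfer(a, b, lg_kow):
    """Apply the fitted line to new lg KOW values."""
    return [a * x + b for x in lg_kow]

# Illustrative numbers only, not data from the study
a, b = fit_sp_lfer([2.0, 3.0, 4.0, 5.0], [1.9, 2.7, 3.5, 4.3])
```

A pp-LFER replaces the single lg KOW term with several solute descriptors, as in the Abraham form lg K = c + eE + sS + aA + bB + vV. The same least-squares idea then becomes a multiple linear regression.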

Abstract:

The partitioning of organic pollutants into dissolved organic matter (DOM) significantly influences their environmental fate and bioavailability. A prediction model for the organic carbon-water partition coefficient (KOC) was established with the random forest algorithm. Its features were the normalized UV-visible spectra of DOM and the octanol-water partition coefficient (KOW) of the organic pollutants. The root mean squared error (RMSE) of the random forest model was significantly lower than that of the commonly used linear free energy relationships (LFERs), but slightly higher than that of the two-phase system model. For soil and peat DOM, the random forest model performed best of all the models tested, indicating its broad applicability. The features selected by the random forest model were related to molecular size, the extent of humification, and the substituents on aromatic rings, all of which are important for the sorption of organic compounds. With selected features, the model's RMSE was similar to that obtained with the full normalized spectra. Even with only two features, it remained significantly lower than that of the LFERs. Because UV-visible spectra can be measured in situ, the model enables in-situ, high-throughput, and low-cost prediction of KOC. This can improve understanding of the environmental fate and bioavailability of organic compounds and support accurate, high-resolution risk assessment and management.

Key words: dissolved organic matter, organic carbon-water partition coefficient, predictive model, UV-visible spectroscopy, machine learning
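The abstract names the random forest algorithm but gives no hyperparameters. The following is a minimal, dependency-free sketch of a random forest regressor: bootstrap resampling plus a random feature subset at each split, as in Breiman's formulation. It assumes each sample is a list of normalized absorbances followed by lg KOW, with lg KOC as the target. The function names and every default (50 trees, depth 4, two samples per leaf, p/3 features per split) are illustrative assumptions, not the study's settings.

```python
import random
from statistics import fmean

def _sse(values):
    """Sum of squared deviations from the mean (the regression-tree split cost)."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)

def _grow(X, y, depth, max_depth, min_leaf, n_sub, rng):
    """Grow one regression tree; each split only looks at n_sub random features."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return fmean(y)  # leaf stores the mean target
    best = None
    for j in rng.sample(range(len(X[0])), n_sub):
        for t in sorted({row[j] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None:
        return fmean(y)
    _, j, t, left, right = best

    def child(ids):
        return _grow([X[i] for i in ids], [y[i] for i in ids],
                     depth + 1, max_depth, min_leaf, n_sub, rng)

    return (j, t, child(left), child(right))

def _predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node

def fit_forest(X, y, n_trees=50, max_depth=4, min_leaf=2, max_features=None, seed=0):
    """Bagging plus random feature subsets; defaults are illustrative assumptions."""
    rng = random.Random(seed)
    n_sub = max_features or max(1, len(X[0]) // 3)  # p/3, a common regression default
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap resample
        forest.append(_grow([X[i] for i in boot], [y[i] for i in boot],
                            0, max_depth, min_leaf, n_sub, rng))
    return forest

def predict_forest(forest, x):
    """Average the tree predictions for one feature vector."""
    return fmean([_predict_tree(tree, x) for tree in forest])
```

A production model would more likely use a library implementation such as scikit-learn's `RandomForestRegressor`. The pure-Python version only makes the bagging and random-subspace steps explicit.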

CLC number: X5

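Table 1

Comparison of lg KOC prediction performance across models

Model | Training RMSE | Validation RMSE | Training R² | Validation R² | Out-of-fold RMSE | Out-of-fold R² | Out-of-fold RMSE (soil and peat DOM)
sp-LFER model | 0.318±0.004 | 0.320±0.011 | 0.434±0.018 | 0.419±0.048 | 0.321 | 0.426 | 0.666
pp-LFER model | 0.314±0.004 | 0.319±0.013 | 0.448±0.019 | 0.425±0.049 | 0.319 | 0.432 | 0.652
Two-phase system model | 0.234±0.003 | 0.238±0.010 | 0.694±0.008 | 0.679±0.031 | 0.238 | 0.683 | 0.503
Random forest model | 0.143±0.008 | 0.258±0.021 | 0.886±0.010 | 0.625±0.025 | 0.259 | 0.625 | 0.291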
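Table 1 reports RMSE and R² on training and validation splits and on pooled out-of-fold predictions. The sketch below shows the two metrics and one common way to obtain out-of-fold predictions with k-fold cross-validation. The number of folds, the shuffling, and the helper names are assumptions, since the table does not state how the folds were built. The random forest's training RMSE (0.143) is far below its validation RMSE (0.258). That gap is why the held-out columns, not the training columns, are the fair basis for comparing models.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def out_of_fold_predictions(X, y, fit, predict, k=5, seed=0):
    """Predict every sample with a model that never saw it (k-fold CV; k=5 is an assumption)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    oof = [None] * len(y)
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            oof[i] = predict(model, X[i])
    return oof
```

For example, `rmse(y, out_of_fold_predictions(X, y, fit, predict))` gives a pooled out-of-fold RMSE of the kind in the table's sixth column. Here `fit(X, y)` returns a trained model and `predict(model, x)` returns one prediction; both are hypothetical callables.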

Figure 1

Measured versus predicted values for the sp-LFER model, the pp-LFER model, the two-phase system model, and the random forest model

Figure 2

Heatmap of feature importance for each wavelength of the UV-visible spectrum in the 200–500 nm range
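Figure 2 plots the importance the random forest assigns to each wavelength. The abstract refers to importances output by the random forest model itself. As a model-agnostic stand-in, the sketch below computes permutation importance instead: the rise in RMSE when one feature column is shuffled. This is a different technique, swapped in here for illustration. `predict` is a hypothetical function mapping one feature vector to a predicted lg KOC.

```python
import math
import random

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean rise in RMSE when feature j is shuffled across samples (model-agnostic)."""
    rng = random.Random(seed)

    def rmse_of(rows):
        return math.sqrt(sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y))

    baseline = rmse_of(X)
    importances = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            shuffled = []
            for row, v in zip(X, column):
                new_row = list(row)
                new_row[j] = v
                shuffled.append(new_row)
            total += rmse_of(shuffled) - baseline
        importances.append(total / n_repeats)
    return importances
```

Assuming one feature per measured wavelength, mapping these values back onto the wavelength axis gives a profile comparable in spirit to Figure 2.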

Figure 3

Correlation matrix of A330/A355, A375/A355, and A395/A355 with commonly used UV-visible spectral parameters
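The three ratios in Figure 3 share A355 as the denominator. The sketch computes them from a spectrum stored as a {wavelength in nm: absorbance} mapping. It also computes two widely used parameters, E2/E3 (A250/A365) and SUVA254, and a Pearson correlation for filling one cell of a matrix like Figure 3. The caption does not list which parameters the figure includes, so E2/E3 and SUVA254 are illustrative choices. The SUVA254 helper assumes decadic absorbance and DOC in mg C/L.

```python
import math

def ratio_features(spectrum, ref=355, numerators=(330, 375, 395)):
    """Absorbance ratios A_w / A_ref from a {wavelength_nm: absorbance} dict."""
    return {f"A{w}/A{ref}": spectrum[w] / spectrum[ref] for w in numerators}

def e2_e3(spectrum):
    """E2/E3 = A250 / A365; higher values usually indicate smaller DOM molecules."""
    return spectrum[250] / spectrum[365]

def suva254(spectrum, doc_mg_c_per_l, path_cm=1.0):
    """SUVA254 in L mg C^-1 m^-1: absorbance per metre of path divided by DOC (assumed units)."""
    return spectrum[254] / (path_cm / 100.0) / doc_mg_c_per_l

def pearson(xs, ys):
    """Pearson correlation coefficient, one cell of a correlation matrix."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```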

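Table 2

lg KOC prediction performance of random forest models with different numbers of features

Feature selection | Training RMSE | Validation RMSE | Training R² | Validation R² | Out-of-fold RMSE | Out-of-fold R²
All wavelengths, 200–500 nm | 0.143±0.008 | 0.258±0.021 | 0.886±0.010 | 0.625±0.025 | 0.259 | 0.625
Top 30 | 0.130±0.008 | 0.257±0.022 | 0.906±0.009 | 0.628±0.030 | 0.258 | 0.629
Top 20 | 0.174±0.005 | 0.251±0.021 | 0.831±0.009 | 0.646±0.024 | 0.252 | 0.646
Top 15 | 0.179±0.005 | 0.251±0.023 | 0.821±0.008 | 0.645±0.032 | 0.252 | 0.645
Top 10 | 0.178±0.006 | 0.253±0.019 | 0.823±0.010 | 0.641±0.019 | 0.253 | 0.642
Top 5 | 0.187±0.002 | 0.258±0.019 | 0.804±0.005 | 0.626±0.019 | 0.258 | 0.628
Top 3 | 0.217±0.001 | 0.259±0.012 | 0.736±0.008 | 0.621±0.005 | 0.259 | 0.624
Top 2 | 0.221±0.003 | 0.268±0.019 | 0.727±0.006 | 0.594±0.027 | 0.269 | 0.596
Top 1 | 0.210±0.006 | 0.276±0.036 | 0.754±0.011 | 0.568±0.090 | 0.278 | 0.568
Top 0 | 0.306±0.005 | 0.314±0.013 | 0.306±0.005 | 0.441±0.042 | 0.315 | 0.447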
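Table 2 retrains the model on the N most important wavelengths. Even the Top 2 model's validation RMSE (0.268) stays below that of the sp-LFER (0.320) and pp-LFER (0.319) models in Table 1. Below is a sketch of the selection step. It assumes importances are ranked once from the full-spectrum model and that lg KOW is kept in every subset, which is how the Top 0 row is read here. The function names are hypothetical.

```python
def top_n_features(importances, n):
    """Indices of the n most important features, most important first (ties: lower index)."""
    order = sorted(range(len(importances)), key=lambda j: (-importances[j], j))
    return order[:n]

def select_columns(X, columns):
    """Keep only the chosen feature columns, in the given order."""
    return [[row[j] for j in columns] for row in X]

def feature_subsets(spectral_importances, sizes=(30, 20, 15, 10, 5, 3, 2, 1, 0)):
    """Map each N in Table 2 to the spectral columns kept (lg KOW assumed always kept)."""
    return {n: top_n_features(spectral_importances, n) for n in sizes}
```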