Journal of Nanjing University (Natural Science), 2019, Vol. 55, Issue 1: 73–84. doi: 10.13232/j.cnki.jnju.2019.01.007


A small object semantic segmentation algorithm combined with object detection

Hu Tai, Yang Ming*

  1. School of Computer Science and Technology, Nanjing Normal University, Nanjing, 210023, China
  • Accepted: 2018-12-13 Online: 2019-02-01 Published: 2019-01-26
  • Corresponding author: Yang Ming, E-mail: myang@njnu.edu.cn
  • Funding:
    Key Program of the National Natural Science Foundation of China (61432008), National Natural Science Foundation of China (61876087, 61272222), CERNET Next Generation Internet Technology Innovation Project (NGII20170524)

Abstract: Convolutional neural networks (CNNs) provide more powerful classifiers than traditional classification methods and learn deep features automatically, which has significantly improved the accuracy of image semantic segmentation. However, CNN-based semantic segmentation still faces challenges; for example, the better-performing existing methods have difficulty segmenting small objects in complex scenes. To address this problem, we propose a small object semantic segmentation algorithm combined with object detection. Unlike existing methods, it does not use a single neural network to segment both the small and the larger objects of an image simultaneously. Instead, it separates the small object segmentation task from the segmentation of the complete image. The algorithm first trains an object detection model to obtain small object image patches, then designs a small object segmentation network to segment those patches, and finally uses the patch results to correct the segmentation map of the whole image. The algorithm improves overall performance on semantic segmentation datasets and segments small objects more accurately.

Key words: image semantic segmentation, small object segmentation, convolutional neural networks, object detection

CLC number:

  • TP391
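
The three stages named in the abstract (detect small objects, segment the cropped patches, correct the whole-image segmentation map) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `refine_with_small_objects`, the `detect_small` and `segment_patch` callables standing in for the trained networks, the box format, and in particular the fusion rule (non-background patch labels overwrite the whole-image labels), which the abstract does not specify.

```python
from typing import Callable, List, Tuple

# Hypothetical types: a box is (top, left, bottom, right), bottom/right exclusive.
Box = Tuple[int, int, int, int]
Grid = List[List[int]]


def refine_with_small_objects(
    image: Grid,
    full_seg: Grid,
    detect_small: Callable[[Grid], List[Box]],
    segment_patch: Callable[[Grid], Grid],
    background: int = 0,
) -> Grid:
    """Return a copy of full_seg corrected inside each detected small-object box.

    Assumed fusion rule: a pixel is overwritten only where the patch
    segmentation predicts a non-background label.
    """
    refined = [row[:] for row in full_seg]  # leave the input map untouched
    for top, left, bottom, right in detect_small(image):
        # Stage 1: crop the image patch proposed by the detector.
        patch = [row[left:right] for row in image[top:bottom]]
        # Stage 2: segment the patch with the dedicated small-object model.
        patch_seg = segment_patch(patch)
        # Stage 3: paste the patch labels back into the whole-image map.
        for r, seg_row in enumerate(patch_seg):
            for c, label in enumerate(seg_row):
                if label != background:
                    refined[top + r][left + c] = label
    return refined


# Toy stand-ins for the two trained models (not real networks).
image = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]
full_seg = [[0] * 4 for _ in range(4)]  # whole-image model missed the object


def toy_detector(img: Grid) -> List[Box]:
    return [(1, 1, 3, 3)]


def toy_patch_segmenter(patch: Grid) -> Grid:
    return [[1 if value > 5 else 0 for value in row] for row in patch]


refined = refine_with_small_objects(image, full_seg, toy_detector, toy_patch_segmenter)
```

In this toy run the whole-image map contains no object, and the refinement step fills in label 1 inside the detected box only. A real system would also need to resize patches to the segmentation network's input size and resolve overlapping boxes, neither of which is covered here.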