Journal of Nanjing University (Natural Science Edition), 2023, Vol. 59, Issue (6): 996–1002. doi: 10.13232/j.cnki.jnju.2023.06.009


Multi-modal 3D object detection based on Bird-Eye-View fusion

Duo Qian, Jun Yin

  1. College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
  • Received: 2023-08-10  Online: 2023-11-30  Published: 2023-12-06
  • Corresponding author: Jun Yin  E-mail: junyin@shmtu.edu.cn
  • Supported by:
    Shanghai Pujiang Program (22PJD029)

Abstract:

In 3D object detection, it is difficult to obtain target distance information from image data and target category information from point cloud data. To address this, a method is proposed to convert the image into Bird-Eye-View (BEV) features: the multi-scale image features are flattened along the horizontal dimension, transformed into multi-scale image BEV features through dense transformation layers, and finally reshaped into a global image BEV feature map. On this basis, a multi-modal 3D object detection network based on BEV fusion is proposed, which fuses the image BEV features and the point cloud BEV features by feature concatenation or element-wise addition. Experiments on the KITTI dataset show that the proposed network outperforms other popular 3D object detection methods on vehicle and pedestrian detection.

Key words: 3D object detection, multi-modal fusion, point cloud, Bird-Eye-View, deep learning
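The dense transformation step described in the abstract can be sketched as follows. This is a minimal illustration, assuming a single FPN level with made-up shapes and a random weight matrix standing in for the trained parameters of a dense transformation layer; it only shows how each vertical column of image features (C, H) is mapped onto a BEV ray of depth bins (C, D).

```python
import numpy as np

def dense_transform_to_bev(feat, depth_bins, rng=np.random.default_rng(0)):
    """Sketch of a dense transformation layer: every image column (C, H)
    is mapped by a shared linear layer to a ray of depth bins (C, D).
    The weights here are random stand-ins for trained parameters."""
    C, H, W = feat.shape
    # One learned weight matrix shared across all columns: H -> depth_bins
    W_dense = rng.standard_normal((depth_bins, H)) / np.sqrt(H)
    # Collapse the vertical axis of every column into depth bins:
    # (C, H, W) -> (C, D, W), a polar-style BEV feature map.
    bev = np.einsum('dh,chw->cdw', W_dense, feat)
    return bev

img_feat = np.zeros((64, 12, 50))   # (channels, height, width), one FPN level
bev_feat = dense_transform_to_bev(img_feat, depth_bins=36)
print(bev_feat.shape)               # (64, 36, 50)
```

Repeating this per FPN level and concatenating the outputs along the depth axis would then yield the global image BEV feature map described in the abstract.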

CLC number: U471.1

Figure 1  Network architecture of the proposed model

Figure 2  Feature transformation to the Bird-Eye-View

Figure 3  Multi-scale 2D feature extraction network

Table 1  Proportion of multi-scale features in the Bird-Eye-View

| k | 0 | 1 | 2 | 3 | 4 |
| Sk | 8 | 16 | 32 | 64 | 128 |
| Zk (of 70.4 m) | 36.4 | 18.2 | 9.0 | 4.5 | 2.3 |
| FPN output | P3 | P4 | P5 | P6 | P7 |
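The Zk row tiles the 70.4 m depth range across the five FPN levels, each level covering roughly half the depth of the previous one. A quick check of each level's share of the BEV depth range (the Zk values are copied from Table 1; the code is plain arithmetic):

```python
# Depth extent Zk covered by each FPN level (values from Table 1, in metres)
z_k = {'P3': 36.4, 'P4': 18.2, 'P5': 9.0, 'P6': 4.5, 'P7': 2.3}

total = sum(z_k.values())
assert abs(total - 70.4) < 1e-9     # the levels tile the full 70.4 m range

# Share of the BEV depth range handled by each scale
share = {level: z / total for level, z in z_k.items()}
print({level: round(s, 3) for level, s in share.items()})
# P3 handles about 52% of the range, P7 about 3%
```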

Table 2  Average precision of KITTI vehicle 3D detection results

| Detector | Input | Easy | Moderate | Hard |
| MV3D[12] | Lidar+RGB | 70.71% | 63.44% | 56.02% |
| AVOD-FPN[13] | Lidar+RGB | 81.88% | 71.94% | 66.45% |
| F-pointnet[10] | Lidar+RGB | 82.03% | 71.32% | 62.19% |
| MMF[15] | Lidar+RGB | 85.31% | 75.41% | 66.31% |
| SECOND[4] | Lidar | 82.55% | 70.35% | 66.67% |
| Ours (concatenation) | Lidar+RGB | 85.53% | 72.40% | 70.46% |
| Ours (element-wise addition) | Lidar+RGB | 84.23% | 71.14% | 70.55% |
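The two fusion variants in the "Ours" rows, feature concatenation and element-wise addition, combine the image BEV features and the point cloud BEV features as sketched below. The array shapes are illustrative assumptions, not the network's actual dimensions.

```python
import numpy as np

# Illustrative BEV feature maps (channels, depth, width); shapes are assumed.
img_bev = np.ones((64, 200, 176))
pc_bev = np.full((64, 200, 176), 2.0)

# Variant 1: feature concatenation along the channel axis -> (128, 200, 176).
# Doubles the channel count; a following conv layer must accept 128 channels.
fused_concat = np.concatenate([img_bev, pc_bev], axis=0)

# Variant 2: element-wise addition; channel count unchanged -> (64, 200, 176).
fused_add = img_bev + pc_bev

print(fused_concat.shape, fused_add.shape)
```

Concatenation preserves both modalities' features separately at the cost of wider layers, while addition keeps the network size fixed but mixes the two streams immediately, which matches the small accuracy differences between the two rows.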

Table 3  Average precision of KITTI vehicle BEV detection results

| Detector | Input | Easy | Moderate | Hard |
| MV3D[12] | Lidar+RGB | 86.12% | 76.78% | 68.50% |
| AVOD-FPN[13] | Lidar+RGB | 88.53% | 83.79% | 77.11% |
| F-pointnet[10] | Lidar+RGB | 87.67% | 83.89% | 75.88% |
| MMF[15] | Lidar+RGB | 89.49% | 86.56% | 79.31% |
| SECOND[4] | Lidar | 91.05% | 83.16% | 80.60% |
| Ours (concatenation) | Lidar+RGB | 91.92% | 85.34% | 83.22% |
| Ours (element-wise addition) | Lidar+RGB | 90.27% | 84.47% | 80.18% |

Table 4  Average precision of KITTI vehicle 2D detection results

| Detector | Input | Easy | Moderate | Hard |
| MV3D[12] | Lidar+RGB | 90.56% | 89.45% | 80.16% |
| AVOD-FPN[13] | Lidar+RGB | 89.79% | 87.55% | 80.12% |
| F-pointnet[10] | Lidar+RGB | 90.54% | 89.84% | 81.26% |
| MMF[15] | Lidar+RGB | 91.82% | 89.77% | 87.65% |
| SECOND[4] | Lidar | - | - | - |
| Ours (concatenation) | Lidar+RGB | 95.52% | 89.61% | 87.30% |
| Ours (element-wise addition) | Lidar+RGB | 94.98% | 88.72% | 87.05% |

Table 5  Average precision of KITTI pedestrian 3D detection results

| Detector | Input | Easy | Moderate | Hard |
| AVOD-FPN[13] | Lidar+RGB | 50.80% | 42.81% | 40.88% |
| F-pointnet[10] | Lidar+RGB | 51.17% | 44.56% | 40.33% |
| Ours (concatenation) | Lidar+RGB | 50.10% | 46.67% | 42.35% |
| Ours (element-wise addition) | Lidar+RGB | 50.77% | 45.82% | 40.16% |

Table 6  Average precision of KITTI cyclist 3D detection results

| Detector | Input | Easy | Moderate | Hard |
| AVOD-FPN[13] | Lidar+RGB | 64.00% | 52.18% | 46.61% |
| F-pointnet[10] | Lidar+RGB | 71.88% | 55.59% | 50.11% |
| Ours (concatenation) | Lidar+RGB | 66.56% | 49.88% | 48.72% |
| Ours (element-wise addition) | Lidar+RGB | 65.81% | 48.75% | 46.83% |

Figure 4  Visualization of 3D object detection results

1 Roddick T, Cipolla R. Predicting semantic map representations from images using pyramid occupancy networks∥Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle,WA,USA:IEEE,2020:11135-11144.
2 Philion J, Fidler S. Lift, splat, shoot: Encoding images from arbitrary camera rigs by implicitly unprojecting to 3D∥Proceedings of the 16th European Conference on Computer Vision. Springer Berlin Heidelberg,2020:194-210.
3 Chi X R, Pei W, Zhu Y Y,et al. Fast Stereo-RCNN 3D target detection algorithm. Journal of Chinese Computer Systems,2022,43(10):2157-2161.
4 Yan Y, Mao Y X, Li B. SECOND: Sparsely embedded convolutional detection. Sensors,2018,18(10):3337.
5 Lang A H, Vora S, Caesar H,et al. PointPillars:Fast encoders for object detection from point clouds∥Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach,CA,USA:IEEE,2019:12689-12697.
6 Lu H M, Yang S. Three-dimensional object detection algorithm based on deep neural networks for automatic driving. Journal of Beijing University of Technology,2022,48(6):589-597.
7 Zhang Y Y, Zhang S, Zhang Y,et al. Multi-modality fusion perception and computing in autonomous driving. Journal of Computer Research and Development,2020,57(9):1781-1799.
8 Wang Y D, Tian Y L, Li G Q,et al. 3D object detection based on convolutional neural networks: a survey. Pattern Recognition and Artificial Intelligence,2021,34(12):1103-1119.
9 Qi C R, Liu W, Wu C X,et al. Frustum PointNets for 3D object detection from RGB-D data∥Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City,UT,USA:IEEE,2018:918-927.
10 Wang Z X, Jia K. Frustum ConvNet: Sliding frustums to aggregate local point-wise features for amodal 3D object detection∥Proceedings of 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems. Macau,China:IEEE,2019:1742-1749.
11 Chen X Z, Ma H M, Wan J,et al. Multi-view 3D object detection network for autonomous driving∥Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu,HI,USA:IEEE,2017:6526-6534.
12 Ku J, Mozifian M, Lee J,et al. Joint 3D proposal generation and object detection from view aggregation∥Proceedings of 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems. Madrid,Spain:IEEE,2018:1-8.
13 Lin T Y, Dollár P, Girshick R,et al. Feature pyramid networks for object detection∥Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu,HI,USA:IEEE,2017:936-944.
14 Liu Z J, Tang H T, Amini A,et al. BEVFusion: Multi-task multi-sensor fusion with unified bird's-eye view representation. arXiv preprint,2022.
15 Liang M, Yang B, Chen Y,et al. Multi-task multi-sensor fusion for 3D object detection∥Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach,CA,USA:IEEE,2019:7337-7345.
16 Pang S, Morris D, Radha H. CLOCs: Camera-LiDAR object candidates fusion for 3D object detection∥Proceedings of 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems. Las Vegas,NV,USA:IEEE,2020:10386-10393.
17 He K M, Zhang X Y, Ren S Q,et al. Deep residual learning for image recognition∥Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas,NV,USA:IEEE,2016:770-778.
18 Zhou Y, Tuzel O. VoxelNet: End-to-end learning for point cloud based 3D object detection∥Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City,UT,USA:IEEE,2018:4490-4499.
19 Ren S Q, He K M, Girshick R,et al. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence,2017,39(6):1137-1149.
20 Geiger A, Lenz P, Urtasun R. Are we ready for autonomous driving? The KITTI vision benchmark suite∥Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence,RI,USA:IEEE,2012:3354-3361.