Journal of Nanjing University (Natural Science Edition), 2023, Vol. 59, Issue (6): 996–1002. doi: 10.13232/j.cnki.jnju.2023.06.009


Multi-modal 3D object detection based on Bird-Eye-View fusion

Duo Qian, Jun Yin

  1. College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
  • Received: 2023-08-10  Online: 2023-11-30  Published: 2023-12-06
  • Corresponding author: Jun Yin  E-mail: junyin@shmtu.edu.cn
  • Supported by:
    Shanghai Pujiang Program (22PJD029)

Abstract:

In 3D object detection, it is difficult to obtain target distance information from image data and target category information from point cloud data. To address this, a method is proposed to convert the image into Bird-Eye-View (BEV) features: the multi-scale image features are flattened along the horizontal dimension, transformed into multi-scale image BEV features through dense transformation layers, and finally reshaped into a global image BEV feature map. On this basis, a multi-modal 3D object detection network based on BEV fusion is proposed, which fuses the image BEV features and the point cloud BEV features by feature concatenation or element-wise addition. Experiments on the KITTI dataset show that the proposed network outperforms other popular 3D object detection methods on vehicle and pedestrian detection.

Key words: 3D object detection, multi-modal fusion, point cloud, Bird-Eye-View, deep learning
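The dense transformation step described in the abstract can be sketched as follows. This is a minimal illustration, assuming a single FPN level with made-up shapes and a random weight matrix standing in for the trained parameters of a dense transformation layer; it only shows how each vertical column of image features (C, H) is mapped onto a BEV ray of depth bins (C, D).

```python
import numpy as np

def dense_transform_to_bev(feat, depth_bins, rng=np.random.default_rng(0)):
    """Sketch of a dense transformation layer: every image column (C, H)
    is mapped by a shared linear layer to a ray of depth bins (C, D).
    The weights here are random stand-ins for trained parameters."""
    C, H, W = feat.shape
    # One learned weight matrix shared across all columns: H -> depth_bins
    W_dense = rng.standard_normal((depth_bins, H)) / np.sqrt(H)
    # Collapse the vertical axis of every column into depth bins:
    # (C, H, W) -> (C, D, W), a polar-style BEV feature map.
    bev = np.einsum('dh,chw->cdw', W_dense, feat)
    return bev

img_feat = np.zeros((64, 12, 50))   # (channels, height, width), one FPN level
bev_feat = dense_transform_to_bev(img_feat, depth_bins=36)
print(bev_feat.shape)               # (64, 36, 50)
```

Repeating this per FPN level and concatenating the outputs along the depth axis would then yield the global image BEV feature map described in the abstract.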

CLC number: U471.1

Figure 1  Network architecture of the proposed model

Figure 2  Feature transformation to the Bird-Eye-View

Figure 3  Multi-scale 2D feature extraction network

Table 1  Proportion of multi-scale features in the Bird-Eye-View

| k | 0 | 1 | 2 | 3 | 4 |
| Sk | 8 | 16 | 32 | 64 | 128 |
| Zk (of 70.4 m) | 36.4 | 18.2 | 9.0 | 4.5 | 2.3 |
| FPN output | P3 | P4 | P5 | P6 | P7 |
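The Zk row tiles the 70.4 m depth range across the five FPN levels, each level covering roughly half the depth of the previous one. A quick check of each level's share of the BEV depth range (the Zk values are copied from Table 1; the code is plain arithmetic):

```python
# Depth extent Zk covered by each FPN level (values from Table 1, in metres)
z_k = {'P3': 36.4, 'P4': 18.2, 'P5': 9.0, 'P6': 4.5, 'P7': 2.3}

total = sum(z_k.values())
assert abs(total - 70.4) < 1e-9     # the levels tile the full 70.4 m range

# Share of the BEV depth range handled by each scale
share = {level: z / total for level, z in z_k.items()}
print({level: round(s, 3) for level, s in share.items()})
# P3 handles about 52% of the range, P7 about 3%
```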

Table 2  Average precision of KITTI vehicle 3D detection results

| Detector | Input | Easy | Moderate | Hard |
| MV3D[12] | Lidar+RGB | 70.71% | 63.44% | 56.02% |
| AVOD-FPN[13] | Lidar+RGB | 81.88% | 71.94% | 66.45% |
| F-pointnet[10] | Lidar+RGB | 82.03% | 71.32% | 62.19% |
| MMF[15] | Lidar+RGB | 85.31% | 75.41% | 66.31% |
| SECOND[4] | Lidar | 82.55% | 70.35% | 66.67% |
| Ours (concatenation) | Lidar+RGB | 85.53% | 72.40% | 70.46% |
| Ours (element-wise addition) | Lidar+RGB | 84.23% | 71.14% | 70.55% |
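The two fusion variants in the "Ours" rows, feature concatenation and element-wise addition, combine the image BEV features and the point cloud BEV features as sketched below. The array shapes are illustrative assumptions, not the network's actual dimensions.

```python
import numpy as np

# Illustrative BEV feature maps (channels, depth, width); shapes are assumed.
img_bev = np.ones((64, 200, 176))
pc_bev = np.full((64, 200, 176), 2.0)

# Variant 1: feature concatenation along the channel axis -> (128, 200, 176).
# Doubles the channel count; a following conv layer must accept 128 channels.
fused_concat = np.concatenate([img_bev, pc_bev], axis=0)

# Variant 2: element-wise addition; channel count unchanged -> (64, 200, 176).
fused_add = img_bev + pc_bev

print(fused_concat.shape, fused_add.shape)
```

Concatenation preserves both modalities' features separately at the cost of wider layers, while addition keeps the network size fixed but mixes the two streams immediately, which matches the small accuracy differences between the two rows.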

Table 3  Average precision of KITTI vehicle BEV detection results

| Detector | Input | Easy | Moderate | Hard |
| MV3D[12] | Lidar+RGB | 86.12% | 76.78% | 68.50% |
| AVOD-FPN[13] | Lidar+RGB | 88.53% | 83.79% | 77.11% |
| F-pointnet[10] | Lidar+RGB | 87.67% | 83.89% | 75.88% |
| MMF[15] | Lidar+RGB | 89.49% | 86.56% | 79.31% |
| SECOND[4] | Lidar | 91.05% | 83.16% | 80.60% |
| Ours (concatenation) | Lidar+RGB | 91.92% | 85.34% | 83.22% |
| Ours (element-wise addition) | Lidar+RGB | 90.27% | 84.47% | 80.18% |

Table 4  Average precision of KITTI vehicle 2D detection results

| Detector | Input | Easy | Moderate | Hard |
| MV3D[12] | Lidar+RGB | 90.56% | 89.45% | 80.16% |
| AVOD-FPN[13] | Lidar+RGB | 89.79% | 87.55% | 80.12% |
| F-pointnet[10] | Lidar+RGB | 90.54% | 89.84% | 81.26% |
| MMF[15] | Lidar+RGB | 91.82% | 89.77% | 87.65% |
| SECOND[4] | Lidar | - | - | - |
| Ours (concatenation) | Lidar+RGB | 95.52% | 89.61% | 87.30% |
| Ours (element-wise addition) | Lidar+RGB | 94.98% | 88.72% | 87.05% |

Table 5  Average precision of KITTI pedestrian 3D detection results

| Detector | Input | Easy | Moderate | Hard |
| AVOD-FPN[13] | Lidar+RGB | 50.80% | 42.81% | 40.88% |
| F-pointnet[10] | Lidar+RGB | 51.17% | 44.56% | 40.33% |
| Ours (concatenation) | Lidar+RGB | 50.10% | 46.67% | 42.35% |
| Ours (element-wise addition) | Lidar+RGB | 50.77% | 45.82% | 40.16% |

Table 6  Average precision of KITTI cyclist 3D detection results

| Detector | Input | Easy | Moderate | Hard |
| AVOD-FPN[13] | Lidar+RGB | 64.00% | 52.18% | 46.61% |
| F-pointnet[10] | Lidar+RGB | 71.88% | 55.59% | 50.11% |
| Ours (concatenation) | Lidar+RGB | 66.56% | 49.88% | 48.72% |
| Ours (element-wise addition) | Lidar+RGB | 65.81% | 48.75% | 46.83% |

Figure 4  Visualization of 3D object detection results

1 Roddick T, Cipolla R. Predicting semantic map representations from images using pyramid occupancy networks∥Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle,WA,USA:IEEE,2020:11135-11144.
2 Philion J, Fidler S. Lift, splat, shoot: Encoding images from arbitrary camera rigs by implicitly unprojecting to 3D∥Proceedings of the 16th European Conference on Computer Vision. Springer Berlin Heidelberg,2020:194-210.
3 Chi X R, Pei W, Zhu Y Y,et al. Fast Stereo-RCNN 3D target detection algorithm. Journal of Chinese Computer Systems,2022,43(10):2157-2161.
4 Yan Y, Mao Y X, Li B. SECOND: Sparsely embedded convolutional detection. Sensors,2018,18(10):3337.
5 Lang A H, Vora S, Caesar H,et al. PointPillars:Fast encoders for object detection from point clouds∥Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach,CA,USA:IEEE,2019:12689-12697.
6 Lu H M, Yang S. Three-dimensional object detection algorithm based on deep neural networks for automatic driving. Journal of Beijing University of Technology,2022,48(6):589-597.
7 Zhang Y Y, Zhang S, Zhang Y,et al. Multi-modality fusion perception and computing in autonomous driving. Journal of Computer Research and Development,2020,57(9):1781-1799.
8 Wang Y D, Tian Y L, Li G Q,et al. 3D object detection based on convolutional neural networks: a survey. Pattern Recognition and Artificial Intelligence,2021,34(12):1103-1119.
9 Qi C R, Liu W, Wu C X,et al. Frustum PointNets for 3D object detection from RGB-D data∥Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City,UT,USA:IEEE,2018:918-927.
10 Wang Z X, Jia K. Frustum ConvNet: Sliding frustums to aggregate local point-wise features for amodal 3D object detection∥Proceedings of 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems. Macau,China:IEEE,2019:1742-1749.
11 Chen X Z, Ma H M, Wan J,et al. Multi-view 3D object detection network for autonomous driving∥Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu,HI,USA:IEEE,2017:6526-6534.
12 Ku J, Mozifian M, Lee J,et al. Joint 3D proposal generation and object detection from view aggregation∥Proceedings of 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems. Madrid,Spain:IEEE,2018:1-8.
13 Lin T Y, Dollár P, Girshick R,et al. Feature pyramid networks for object detection∥Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu,HI,USA:IEEE,2017:936-944.
14 Liu Z J, Tang H T, Amini A,et al. BEVFusion: Multi-task multi-sensor fusion with unified bird's-eye view representation. arXiv preprint,2022.
15 Liang M, Yang B, Chen Y,et al. Multi-task multi-sensor fusion for 3D object detection∥Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach,CA,USA:IEEE,2019:7337-7345.
16 Pang S, Morris D, Radha H. CLOCs: Camera-LiDAR object candidates fusion for 3D object detection∥Proceedings of 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems. Las Vegas,NV,USA:IEEE,2020:10386-10393.
17 He K M, Zhang X Y, Ren S Q,et al. Deep residual learning for image recognition∥Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas,NV,USA:IEEE,2016:770-778.
18 Zhou Y, Tuzel O. VoxelNet: End-to-end learning for point cloud based 3D object detection∥Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City,UT,USA:IEEE,2018:4490-4499.
19 Ren S Q, He K M, Girshick R,et al. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence,2017,39(6):1137-1149.
20 Geiger A, Lenz P, Urtasun R. Are we ready for autonomous driving? The KITTI vision benchmark suite∥Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence,RI,USA:IEEE,2012:3354-3361.