Journal of Nanjing University (Natural Science), 2024, Vol. 60, Issue (1): 18–25. doi: 10.13232/j.cnki.jnju.2024.01.003


Transformer based urban triangle mesh semantic segmentation method

Wenjie Zi¹, Qingren Jia¹, Hao Chen¹,², Jun Li¹,², Ning Jing¹

  1. College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China
  2. Key Laboratory of Natural Resources Monitoring and Supervision in Southern Hilly Region, Ministry of Natural Resources, Changsha 410073, China
  • Received: 2023-10-27  Online: 2024-01-30  Published: 2024-01-29
  • Corresponding author: Hao Chen  E-mail: hchen@nudt.edu.cn
  • Funding: National Natural Science Foundation of China (U19A2058); Natural Science Foundation of Hunan Province (2021JJ40667)

Abstract:

Semantic segmentation of urban triangle meshes, which identifies objects of different categories, is an important means of understanding and analyzing three-dimensional city scenes. An urban triangle mesh is a form of three-dimensional geometric data with rich spatial topological relationships and carries a large amount of geometric information. However, existing methods extract features from each type of geometric information separately and then simply fuse them for segmentation. This makes it difficult to exploit the correlations between different types of geometric information and leads to poor segmentation of some object categories. To address these problems, we propose UMeT (Urban Mesh Transformer), a model based on the Transformer self-attention mechanism that consists of a multilayer perceptron (MLP) and a MeshiT (Mesh in Transformer) module. The MLP extracts high-dimensional features, and the MeshiT module computes the correlations between the various types of geometric information, effectively mining the implicit relationships in urban triangle mesh data. Experiments show that UMeT extracts high-dimensional features while preserving the spatial invariance of urban triangle mesh data, which improves semantic segmentation accuracy.

Key words: urban triangle mesh, semantic segmentation, Transformer, mesh, self-attention mechanism

CLC number: TP75

Fig. 1  Urban triangle mesh

Fig. 2  Flowchart of urban triangle mesh semantic segmentation

Fig. 3  Urban triangle mesh (a) and superfacets (b)

Fig. 4  Overview of the UMeT model
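The abstract describes UMeT as an MLP that lifts the input to high-dimensional features, followed by a MeshiT module that uses Transformer self-attention to relate the different kinds of geometric information. The NumPy sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. It assumes, hypothetically, that each mesh face (or superfacet) carries several groups of geometric features, embeds each group into a token with a linear+ReLU layer, applies single-head scaled dot-product self-attention across the tokens of each face, and classifies the pooled result. The class name `UMeTSketch`, the group sizes, `d_model`, and the six output classes are illustrative choices, and the weights are random rather than trained.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class UMeTSketch:
    """Hypothetical UMeT-like classifier: per-group MLP embedding plus self-attention
    across the geometric feature groups of each face. Weights are random, not trained."""

    def __init__(self, group_dims, d_model, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        # One linear+ReLU embedding per geometric feature group (the MLP step).
        self.embed = [rng.normal(0.0, 0.1, (g, d_model)) for g in group_dims]
        # Query/key/value projections for single-head self-attention.
        self.w_q = rng.normal(0.0, 0.1, (d_model, d_model))
        self.w_k = rng.normal(0.0, 0.1, (d_model, d_model))
        self.w_v = rng.normal(0.0, 0.1, (d_model, d_model))
        self.w_out = rng.normal(0.0, 0.1, (d_model, n_classes))
        self.d_model = d_model

    def forward(self, groups):
        # groups[k]: array of shape (n_faces, group_dims[k]).
        # Tokens: (n_faces, n_groups, d_model), one token per feature group.
        tokens = np.stack([np.maximum(x @ w, 0.0) for x, w in zip(groups, self.embed)], axis=1)
        q, k, v = tokens @ self.w_q, tokens @ self.w_k, tokens @ self.w_v
        # Scaled dot-product attention between the feature groups of each face.
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(self.d_model)
        attn = softmax(scores, axis=-1)  # (n_faces, n_groups, n_groups)
        mixed = tokens + attn @ v  # residual connection
        pooled = mixed.mean(axis=1)  # average over groups -> (n_faces, d_model)
        probs = softmax(pooled @ self.w_out, axis=-1)  # per-face class probabilities
        return probs, attn


# Usage with random inputs: 5 faces, three hypothetical feature groups of sizes 3, 1 and 4.
rng = np.random.default_rng(1)
groups = [rng.normal(size=(5, 3)), rng.normal(size=(5, 1)), rng.normal(size=(5, 4))]
model = UMeTSketch(group_dims=[3, 1, 4], d_model=16, n_classes=6)
probs, attn = model.forward(groups)
labels = probs.argmax(axis=1)  # predicted class index per face
print(probs.shape, attn.shape)  # (5, 6) (5, 3, 3)
```

Attending across the feature groups of a single face is one possible reading of "computing the correlations between various types of geometric information"; the paper's MeshiT module may instead attend across neighboring faces or superfacets, which this sketch does not model.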

Table 1  Experimental results of UMeT and the compared models

Model    F1     mIoU   mAcc   R
UMeT     0.805  0.692  0.931  0.745
MRF-RF   0.345  0.273  0.664  0.342
KPConv   0.408  0.273  0.527  0.436
SUM-RF   0.756  0.680  0.931  0.715
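Tables 1 and 3 report F1, mIoU, mAcc and R without defining them on this page. As a reference point, the sketch below computes the usual confusion-matrix versions of these scores, assuming unweighted averaging over classes and assuming that R means mean per-class recall. The page does not define mAcc, so the sketch reports overall accuracy as one common accuracy measure without claiming it matches the table's column. The function name `segmentation_metrics` and the toy counts are hypothetical.

```python
import numpy as np


def segmentation_metrics(conf):
    """Per-class and averaged scores from a confusion matrix.

    conf[i, j] counts elements (e.g. faces) whose true class is i and predicted class is j.
    """
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp  # predicted as class c, actually another class
    fn = conf.sum(axis=1) - tp  # actually class c, predicted as another class
    eps = 1e-12  # avoids division by zero for classes absent from truth and prediction
    f1 = 2 * tp / np.maximum(2 * tp + fp + fn, eps)  # harmonic mean of precision and recall
    iou = tp / np.maximum(tp + fp + fn, eps)
    recall = tp / np.maximum(tp + fn, eps)
    return {
        "per_class_f1": f1,
        "mean_f1": f1.mean(),
        "mIoU": iou.mean(),
        "mean_recall": recall.mean(),  # assumed meaning of the "R" column
        "overall_accuracy": tp.sum() / conf.sum(),
    }


# Toy two-class example (made-up counts, not data from the paper).
m = segmentation_metrics([[8, 2],
                          [1, 9]])
print(m["mIoU"])  # (8/11 + 9/12) / 2, approximately 0.739
```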

Table 2  F1 scores of UMeT and the compared models

Model  Ground  High vegetation  Building  Vehicle
UMeT  0.923  0.928  0.965  0.939  0.631  0.173
MRF-RF  0.054  0.924  0.782  0.004  0.390  0.001
KPConv  0.510  0.544  0.567  0.739  0.511  0.006
SUM-RF  0.915  0.929  0.960  0.937  0.612  0.165

Fig. 5  Semantic segmentation results on urban triangle meshes

Table 3  Ablation results of UMeT

Model    F1     mIoU   mAcc   R
MLP      0.691  0.596  0.903  0.701
UMeT     0.805  0.692  0.931  0.745
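Since the abstract describes UMeT as an MLP plus a MeshiT module, the MLP row in Table 3 appears to be UMeT without MeshiT, so the gap between the two rows can be read as the contribution of that module (assuming the variants differ only there). The short Python snippet below simply computes the absolute and relative gains from the table values; the variable names are illustrative.

```python
# Values copied from Table 3: the MLP-only variant vs. the full UMeT.
metrics = ["F1", "mIoU", "mAcc", "R"]
mlp = [0.691, 0.596, 0.903, 0.701]
umet = [0.805, 0.692, 0.931, 0.745]

gains = {}
for name, base, full in zip(metrics, mlp, umet):
    absolute = full - base      # improvement in score points
    relative = absolute / base  # improvement relative to the MLP-only baseline
    gains[name] = (round(absolute, 3), round(100 * relative, 1))
    print(f"{name}: +{absolute:.3f} ({100 * relative:.1f}% relative)")
```

By this reading, MeshiT adds about 0.114 to F1 and 0.096 to mIoU (roughly 16% relative each), but only about 0.028 to mAcc.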