Transformer based urban triangle mesh semantic segmentation method

doi:10.13232/j.cnki.jnju.2024.01.003

Journal of Nanjing University (Natural Sciences), 2024, Vol. 60, Issue (1): 18–25.

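Wenjie Zi¹, Qingren Jia¹, Hao Chen¹,², Jun Li¹,², Ning Jing¹

1. College of Electronic Science and Technology, National University of Defense Technology, Changsha, 410073, China
2. Key Laboratory of Natural Resources Monitoring and Supervision in Southern Hilly Region, Ministry of Natural Resources, Changsha, 410073, China

Received: 2023-10-27  Publication date: 2024-01-30  Release date: 2024-01-29
Corresponding author: Hao Chen  E-mail: hchen@nudt.edu.cn
Funding: National Natural Science Foundation of China (U19A2058); Natural Science Foundation of Hunan Province (2021JJ40667)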

Abstract:

Semantic segmentation of urban triangle mesh data, which identifies objects of different categories, is an important method for understanding and analyzing three-dimensional city scenes. An urban triangle mesh is three-dimensional geometric data with rich spatial topological relationships and contains a large amount of geometric information. However, existing methods extract features from each type of geometric information separately and then simply fuse them for semantic segmentation. They therefore struggle to exploit the correlations among the geometric information and perform poorly on some object categories. To address these problems, we propose UMeT (Urban Mesh Transformer), a network model based on the Transformer self-attention mechanism that consists of an MLP (Multi-Layer Perceptron) and a MeshiT (Mesh in Transformer) module. The MLP extracts high-dimensional features, while the MeshiT module computes the correlations among the various kinds of geometric information, effectively mining the relationships hidden in urban triangle mesh data. Experiments show that UMeT extracts high-dimensional features while preserving the spatial invariance of urban triangle mesh data, thereby improving the accuracy of semantic segmentation.
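The abstract does not describe the internals of the MeshiT module, so the following is only a generic sketch of single-head scaled dot-product self-attention, the mechanism the Transformer is built on. It assumes, purely for illustration, that each kind of geometric information of a mesh face has already been embedded as one token vector. The token values, the projection matrices `w_q`, `w_k`, `w_v`, and all function names are hypothetical and are not taken from the paper.

```python
import math
import random


def matvec(weights, vector):
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    tokens: list of feature vectors, one per kind of geometric information.
    Returns (outputs, attention), where attention[i][j] is the weight with
    which token i attends to token j.
    """
    queries = [matvec(w_q, t) for t in tokens]
    keys = [matvec(w_k, t) for t in tokens]
    values = [matvec(w_v, t) for t in tokens]
    scale = math.sqrt(len(keys[0]))
    outputs, attention = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(key dimension).
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        attention.append(weights)
        # Each output is the attention-weighted sum of the value vectors.
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs, attention


def random_matrix(rows, cols, rng):
    """Small random projection matrix (stand-in for learned weights)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim = 4
# Hypothetical embedded tokens for one mesh face (assumed, not from the paper).
face_tokens = [
    [0.2, 0.5, 0.1, 0.0],  # e.g. an embedded face centroid
    [0.0, 0.0, 1.0, 0.0],  # e.g. an embedded face normal
    [0.3, 0.3, 0.3, 0.1],  # e.g. another embedded geometric attribute
]
w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))
outputs, attention = self_attention(face_tokens, w_q, w_k, w_v)
for row in attention:
    print([round(a, 3) for a in row])
```

Each row of `attention` sums to 1 and can be read as how much one kind of geometric information draws on the others, which is the sort of cross-feature correlation that separate per-feature extraction followed by simple fusion does not model.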

Key words: urban triangle mesh data, semantic segmentation, Transformer, mesh, self-attention mechanism

CLC number: TP75

Wenjie Zi, Qingren Jia, Hao Chen, Jun Li, Ning Jing. Transformer based urban triangle mesh semantic segmentation method[J]. Journal of Nanjing University (Natural Sciences), 2024, 60(1): 18–25.

Table 1
Model	F1	mIoU	mAcc	R
UMeT	0.805	0.692	0.931	0.745
MRF-RF	0.345	0.273	0.664	0.342
KPConv	0.408	0.273	0.527	0.436
SUM-RF	0.756	0.680	0.931	0.715

Table 2
Model	Ground	High vegetation	Building	Water	Vehicle	Boat
UMeT	0.923	0.928	0.965	0.939	0.631	0.173
MRF-RF	0.054	0.924	0.782	0.004	0.390	0.001
KPConv	0.510	0.544	0.567	0.739	0.511	0.006
SUM-RF	0.915	0.929	0.960	0.937	0.612	0.165

Table 3
Model	F1	mIoU	mAcc	R
MLP	0.691	0.596	0.903	0.701
UMeT	0.805	0.692	0.931	0.745
