Journal of Nanjing University (Natural Science), 2020, Vol. 56, Issue (2): 167–174. doi: 10.13232/j.cnki.jnju.2020.02.002


Real-time switching method of multiple convolutional neural network tasks based on FPGA

Zilong Zhao 1,2, Yiqiang Zhao 1,2, Mao Ye 1,2

  1. School of Microelectronics, Tianjin University, Tianjin 300072, China
  2. Tianjin Key Laboratory of Imaging and Sensing Microelectronic Technology, Tianjin University, Tianjin 300072, China
  • Received: 2020-01-13  Online: 2020-03-30  Published: 2020-04-02
  • Contact: Yiqiang Zhao  E-mail: yq_zhao@tju.edu.cn

Abstract:

Implementing convolutional neural network (CNN) computation on a hardware platform can yield good acceleration and power efficiency. However, because CNN models are large, their computation is complex, and hardware resources are limited, multiple CNN tasks can in practice only be computed serially, which gives the system poor real-time performance when it handles several tasks. To improve the real-time performance of the hardware system, a real-time switching method for multiple CNN tasks is proposed. The CNN is deployed on an FPGA (Field Programmable Gate Array) platform, and the system is divided into modules by function. A "task sequence + control module" structure is adopted, in which the control system computes and switches tasks according to their priority. In the computation module, a configurable convolution unit is reused to reduce resource overhead. A multi-task layer-level switching mechanism is proposed to improve the real-time performance of the system. Verification with a handwritten digit recognition network shows that the configurable design reduces resource overhead by more than 50% for all resources except BRAM (Block Random Access Memory). At a working frequency of 50 MHz, the recognition speed of the FPGA is 4.51 times that of the CPU (Central Processing Unit), and the power consumption ratio relative to the CPU is 2.84 times. With the real-time switching mechanism, the highest-priority task can be responded to up to 57.26 ms earlier, which improves the real-time performance of the serial computing system.

Key words: field programmable gate array, convolutional neural network, multi-task switching, real-time system

CLC number:

  • TN4
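
The priority-driven, layer-level switching described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' FPGA design: the `Task` class, the `schedule` function, the step-based timing (one step per layer) and the "smaller number = higher priority" convention are all hypothetical. It models switching at layer boundaries only; an immediate, mid-layer switch is not modeled.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Layer names of the handwritten digit recognition network, as listed in Table 4.
LAYERS = ["C1", "S2", "C3", "S4", "C5", "FC"]


@dataclass
class Task:
    name: str
    priority: int        # assumption: smaller number = higher priority
    layers: List[str]
    next_layer: int = 0  # index of the next layer to compute


def schedule(arrivals: Dict[int, List[Task]]) -> List[Tuple[str, str]]:
    """Run tasks layer by layer, re-checking priorities at each layer boundary.

    `arrivals` maps a step index (one step = one layer computation) to the
    tasks that arrive before that step. Returns the executed (task, layer) trace.
    """
    ready: list = []  # min-heap of (priority, arrival order, task)
    order = 0         # unique tie-breaker so Task objects are never compared
    trace: List[Tuple[str, str]] = []
    step = 0
    last_arrival = max(arrivals, default=-1)
    while ready or step <= last_arrival:
        # Control module: enqueue tasks that arrived before this step.
        for task in arrivals.get(step, []):
            heapq.heappush(ready, (task.priority, order, task))
            order += 1
        if ready:
            # Compute one layer of the highest-priority pending task.
            prio, seq, task = heapq.heappop(ready)
            trace.append((task.name, task.layers[task.next_layer]))
            task.next_layer += 1
            if task.next_layer < len(task.layers):
                heapq.heappush(ready, (prio, seq, task))  # resume later
        step += 1
    return trace


if __name__ == "__main__":
    low = Task("low", 1, LAYERS)
    high = Task("high", 0, LAYERS)
    for name, layer in schedule({0: [low], 2: [high]}):
        print(name, layer)
```

In the demo, the low-priority task computes C1 and S2, the high-priority task then runs all six layers, and the low-priority task resumes at C3. Keeping the per-task `next_layer` index is what lets an interrupted task continue instead of restarting.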

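Figure 1. Structure of the handwritten digit recognition network

Figure 2. Architecture of the multi-CNN task switching system

Figure 3. CNN task sequence

Figure 4. Flowchart of task switching

Figure 5. Configurable convolution unit

Figure 6. Layer-level switching mechanism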

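Table 1. Resource usage

| Resource | Configurable (count) | Non-configurable (count) | Saving (%) |
|---|---|---|---|
| LUT | 26909 | 54553 | 52.17 |
| FF | 31592 | 64131 | 50.74 |
| BRAM | 90 | 108 | 16.67 |
| DSP | 100 | 218 | 54.13 |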
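
The saving column appears to be the usual relative reduction of the configurable design against the non-configurable one. The symbols $N_{\text{cfg}}$ and $N_{\text{non}}$ are introduced here for illustration only; the worked check uses the FF row.

```latex
\[
\text{saving} = \frac{N_{\text{non}} - N_{\text{cfg}}}{N_{\text{non}}} \times 100\%,
\qquad
\text{FF: } \frac{64131 - 31592}{64131} \times 100\% \approx 50.74\%
\]
```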

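Table 2. Comparison of computing speed between the FPGA and the Intel Core i5-3340

| FPGA precision | FPGA computing time (ms) | Intel Core i5-3340 precision | Intel Core i5-3340 computing time (ms) | Speedup (×) |
|---|---|---|---|---|
| Double-precision floating point | 28.63 | Double-precision floating point | 129.20 | 4.51 |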
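
The speedup is consistent with the ratio of the two computing times. The symbols $t_{\text{CPU}}$ and $t_{\text{FPGA}}$ are illustrative names, not notation from the paper.

```latex
\[
\text{speedup} = \frac{t_{\text{CPU}}}{t_{\text{FPGA}}}
= \frac{129.20\ \text{ms}}{28.63\ \text{ms}} \approx 4.51
\]
```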

Figure 7. Power consumption distribution of the FPGA

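Table 3. Comparison of power consumption between the FPGA and the Intel Core i5-3340

| FPGA working frequency (MHz) | FPGA power (W) | Intel Core i5-3340 working frequency (MHz) | Intel Core i5-3340 power (W) | Power ratio (×) |
|---|---|---|---|---|
| 50 | 4.44 | 2700 | 11.03 | 2.84 |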

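Table 4. Comparison of the real-time and ordinary layer-level switching mechanisms

| Time point (ms) | Real-time: switch point | Real-time: time by which computation can be advanced (ms) | Ordinary: switch point | Ordinary: time by which computation can be advanced (ms) |
|---|---|---|---|---|
| 0 | Immediate switch | 57.26 | Between layers C1 and S2 | 19.84 |
| 9.10 | Immediate switch | 48.16 | Between layers S2 and C3 | 17.96 |
| 13.43 | Immediate switch | 43.83 | Between layers C3 and S4 | 8.97 |
| 22.35 | Immediate switch | 34.91 | Between layers C5 and FC | 0.07 |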
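
For the real-time mechanism, the tabulated values are consistent with a simple linear relation between the time point $t$ at which the high-priority task arrives and the time by which its computation can be advanced. The symbol $\Delta t_{\text{rt}}$ is introduced here for illustration. The constant 57.26 ms also equals twice the 28.63 ms single-inference time in Table 2; reading it as two queued inferences is an assumption, not something this page states.

```latex
\[
\Delta t_{\text{rt}}(t) = 57.26\ \text{ms} - t,
\qquad
\begin{aligned}
57.26 - 9.10 &= 48.16,\\
57.26 - 13.43 &= 43.83,\\
57.26 - 22.35 &= 34.91.
\end{aligned}
\]
```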