Journal of Nanjing University (Natural Science), 2019, Vol. 55, Issue 5: 733-739. doi: 10.13232/j.cnki.jnju.2019.05.004
Abstract:
Neural networks are increasingly deployed on embedded devices. To meet the low-power, low-latency requirements of embedded platforms, the usual solution is to compress the Long Short-Term Memory (LSTM) model and design a dedicated hardware accelerator for it. After pruning and other compression operations, the LSTM network model becomes sparse and irregular, which causes load imbalance across the PE (Processing Element) units. This work uses a sorting-based method to redistribute the weight matrix among the PE units according to fixed rules, and on that basis designs dedicated hardware units for the sparsified model. Experiments on a Xilinx Zynq XCZU9EG-2FFVB1156E development board show that, at the cost of 0.314% additional hardware resources for the PE units, computation speed improves by 2%.
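The sorting-based redistribution described in the abstract can be sketched in software as a longest-processing-time (LPT) greedy assignment: sort the sparse weight rows by nonzero count, then hand each row to the currently least-loaded PE. The function name and the greedy heuristic below are illustrative assumptions for exposition, not the paper's exact hardware scheme:

```python
import heapq

def balance_rows(nnz_per_row, num_pes):
    """Assign sparse-matrix rows to PEs, longest rows first, so that
    each PE performs a near-equal number of nonzero multiplies."""
    # Row indices sorted by nonzero count, descending (the "sorting" step).
    order = sorted(range(len(nnz_per_row)), key=lambda r: -nnz_per_row[r])
    # Min-heap of (accumulated load, pe id): pop gives the least-loaded PE.
    heap = [(0, pe) for pe in range(num_pes)]
    heapq.heapify(heap)
    assignment = {pe: [] for pe in range(num_pes)}
    for r in order:
        load, pe = heapq.heappop(heap)
        assignment[pe].append(r)
        heapq.heappush(heap, (load + nnz_per_row[r], pe))
    return assignment
```

Because long rows are placed first and short rows fill the gaps, the slowest PE finishes only slightly after the fastest one, which is the load-balance property the abstract's 2% speedup relies on.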