Joint resource allocation and power control strategy based on Q-Learning method in cellular D2D network

doi:10.13232/j.cnki.jnju.2018.06.014

Journal of Nanjing University (Natural Sciences), 2018, Vol. 54, Issue 6: 1183–1192.

Wang Qian¹, Nie Xiushan¹, Geng Leilei¹, Yin Yilong²*

1. College of Computer Science and Technology, Shandong University of Finance and Economics, Ji'nan, 250014, China; 2. School of Software, Shandong University, Ji'nan, 250101, China

Accepted: 2018-10-17 Published: 2018-12-01 Online: 2018-12-01
Corresponding author: Yin Yilong, E-mail: ylyin@sdu.edu.cn
Funding:
National Natural Science Foundation of China (61573219, 61671274), Key Research and Development Program of Shandong Province (2017CXGC1504), Natural Science Foundation of Shandong Province (ZR2017MF053), China Postdoctoral Science Foundation General Program (2016M602141), Fostering Project of Dominant Discipline and Talent Team of Shandong Province Higher Education Institutions

Abstract

Device-to-device (D2D) communication lets user equipment in close proximity communicate directly, which raises system throughput and improves spectral and energy efficiency. When D2D links underlay a cellular network and reuse its spectrum, however, they cause severe cross-tier (co-channel) interference between cellular users and D2D users. To mitigate this interference, this paper proposes a joint resource allocation and power control algorithm based on Q-learning. The problem is modeled from a Q-learning perspective: the D2D pairs in the cellular network are treated as multi-agent learners that use only historical states (past throughput and power values) and need no prior knowledge such as exact channel state information (CSI) or the mutual interference between the D2D terminals and the base station. Through Q-learning, the agents build a Q-value table of system throughput and learn a distributed joint optimal policy for channel selection and power control. D2D transmit power is adjusted dynamically so that system throughput is maximized while the quality of service of cellular users is guaranteed. Simulation results show that the proposed algorithm effectively improves system throughput.

Key words: Q-learning, D2D communication, resource allocation, power control
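
The learning loop summarized in the abstract can be illustrated with a minimal tabular Q-learning sketch for a single D2D pair. This is not the authors' implementation: the abstract does not give the exact state, action, or reward definitions, so the sketch assumes that an action is a (channel, power level) pair, that the state is simply the previous action (a stand-in for the historical throughput and power values), and that the reward is the sum throughput of the cellular and D2D links, replaced by a penalty when the cellular SINR drops below a threshold. All constants, link gains, and function names (`reward`, `train`, `greedy_action`) are hypothetical. The multi-agent case would run one such learner per D2D pair.

```python
import math
import random

# All numbers below are illustrative assumptions, not values from the paper.
CHANNELS = 2                       # cellular channels a D2D pair may reuse
POWERS_W = [0.01, 0.05, 0.1]       # discrete D2D transmit power levels (W)
ACTIONS = [(c, p) for c in range(CHANNELS) for p in range(len(POWERS_W))]

NOISE_W = 1e-9                     # receiver noise power
CELL_POWER_W = 0.2                 # cellular user's transmit power
SINR_MIN = 5.0                     # assumed cellular QoS threshold (linear SINR)

# Toy link gains per channel. They live inside the environment only:
# the learner observes rewards, never these values (no CSI needed).
G_D2D = [1e-6, 2e-6]               # D2D transmitter -> D2D receiver
G_D2D_TO_BS = [5e-7, 3e-8]         # D2D transmitter -> base station
G_CELL = [1e-6, 1e-6]              # cellular user -> base station
G_CELL_TO_D2D = [1e-8, 5e-8]       # cellular user -> D2D receiver


def reward(action):
    """Sum throughput (bit/s/Hz), or -1 if the cellular QoS is violated."""
    channel, level = action
    p = POWERS_W[level]
    sinr_cell = CELL_POWER_W * G_CELL[channel] / (NOISE_W + p * G_D2D_TO_BS[channel])
    sinr_d2d = p * G_D2D[channel] / (NOISE_W + CELL_POWER_W * G_CELL_TO_D2D[channel])
    if sinr_cell < SINR_MIN:
        return -1.0
    return math.log2(1 + sinr_cell) + math.log2(1 + sinr_d2d)


def train(steps=20000, alpha=0.5, gamma=0.5, epsilon=0.3, seed=0):
    """Tabular Q-learning; the state is the index of the previous action."""
    rng = random.Random(seed)
    n = len(ACTIONS)
    q = [[0.0] * n for _ in range(n)]
    state = 0
    for _ in range(steps):
        if rng.random() < epsilon:                  # explore a random action
            a = rng.randrange(n)
        else:                                       # exploit the current table
            a = max(range(n), key=lambda i: q[state][i])
        r = reward(ACTIONS[a])
        next_state = a
        # Standard Q-learning update toward r + gamma * max_a' Q(s', a').
        target = r + gamma * max(q[next_state])
        q[state][a] += alpha * (target - q[state][a])
        state = next_state
    return q


def greedy_action(q, state):
    """Best (channel, power level) pair for a given state."""
    best = max(range(len(ACTIONS)), key=lambda i: q[state][i])
    return ACTIONS[best]


if __name__ == "__main__":
    q_table = train()
    for s, prev in enumerate(ACTIONS):
        channel, level = greedy_action(q_table, s)
        print("after", prev, "-> channel", channel, "power (W)", POWERS_W[level])
```

The learning rate, discount factor, and exploration rate are arbitrary choices. Because the toy reward depends only on the current action, the learned policy should settle on the channel and power level with the highest QoS-feasible sum rate; a realistic simulation would add fading, several interacting D2D pairs, and a richer state.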

CLC number: TP391

Wang Qian, Nie Xiushan, Geng Leilei, Yin Yilong. Joint resource allocation and power control strategy based on Q-Learning method in cellular D2D network[J]. Journal of Nanjing University (Natural Sciences), 2018, 54(6): 1183–1192.

