Journal of Nanjing University (Natural Science) ›› 2018, Vol. 54 ›› Issue (6): 1183–1192. doi: 10.13232/j.cnki.jnju.2018.06.014


Joint resource allocation and power control strategy based on Q-Learning method in cellular D2D network

Wang Qian1, Nie Xiushan1, Geng Leilei1, Yin Yilong2*

  1. College of Computer Science and Technology, Shandong University of Finance and Economics, Ji’nan, 250014, China; 2. School of Software, Shandong University, Ji’nan, 250101, China
  • Accepted: 2018-10-17 Online: 2018-12-01 Published: 2018-12-01
  • Contact: Yin Yilong, E-mail: ylyin@sdu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (61573219, 61671274), Shandong Provincial Key Research and Development Program (2017CXGC1504), Natural Science Foundation of Shandong Province (ZR2017MF053), China Postdoctoral Science Foundation General Program (2016M602141), Fostering Project of Dominant Discipline and Talent Team of Shandong Province Higher Education Institutions

Abstract: Device-to-device (D2D) communication lets nearby user devices communicate directly, which raises system throughput and improves spectral and energy efficiency. When D2D links underlay a cellular network and reuse its spectrum, however, they cause severe co-channel (cross-tier) interference between cellular users and D2D links. To reduce this interference, a joint resource allocation and power control strategy based on Q-learning is proposed. The problem is modeled from a Q-learning perspective: the multiple D2D pairs in the cellular network are treated as multi-agent learners that use historical state (past throughput and power values) to learn a distributed joint optimal strategy for channel selection and power control. The learning process does not require prior knowledge such as exact channel state information (CSI) or mutual interference. D2D transmit power is adjusted dynamically so that, while the quality of service of cellular users is guaranteed, system throughput is maximized through D2D power control. Simulation results show that the proposed scheme effectively improves system throughput.

Key words: Q-Learning, D2D communication, resource allocation, power control
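
The learning loop outlined in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the channel gains, power levels, QoS threshold, state encoding, and reward shape below are all hypothetical toy choices, and each D2D pair is simply modeled as an independent tabular Q-learner that picks a joint (channel, power level) action from its own history, without access to the link gains.

```python
import math
import random

# All numbers below are hypothetical toy values, not taken from the paper.
N_CHANNELS = 2                       # cellular channels available for reuse
POWER_LEVELS = [0.01, 0.05, 0.10]    # candidate D2D transmit powers (W)
ACTIONS = [(c, p) for c in range(N_CHANNELS) for p in range(len(POWER_LEVELS))]
CELL_POWER = 0.2                     # cellular user transmit power (W)
NOISE = 1e-3                         # noise power (W)
SINR_MIN = 10.0                      # assumed cellular QoS threshold (linear SINR)
G_DIRECT, G_CROSS = 1.0, 0.1         # toy link gains, hidden from the agents


class D2DAgent:
    """One D2D pair acting as an independent tabular Q-learner."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, rng=None):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = rng or random.Random()
        self.q = {}                  # (state, action) -> Q value

    def act(self, state):
        # Epsilon-greedy choice over joint (channel, power level) actions.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning update.
        best_next = max(self.q.get((next_state, a), 0.0) for a in ACTIONS)
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.alpha * (
            reward + self.gamma * best_next - old)


def step(actions):
    """Toy environment: return one (throughput, reward) pair per D2D pair."""
    results = []
    for i, (ch, p_idx) in enumerate(actions):
        power = POWER_LEVELS[p_idx]
        # Interference from other D2D pairs that chose the same channel.
        others = sum(POWER_LEVELS[q] * G_CROSS
                     for j, (c, q) in enumerate(actions) if j != i and c == ch)
        d2d_sinr = power * G_DIRECT / (NOISE + CELL_POWER * G_CROSS + others)
        # The cellular user on this channel sees every D2D pair that reuses it.
        d2d_interf = sum(POWER_LEVELS[q] * G_CROSS for c, q in actions if c == ch)
        cell_sinr = CELL_POWER * G_DIRECT / (NOISE + d2d_interf)
        rate = math.log2(1.0 + d2d_sinr)
        # Assumed reward: throughput while the cellular QoS holds, else a penalty.
        if cell_sinr >= SINR_MIN:
            reward = rate + math.log2(1.0 + cell_sinr)
        else:
            reward = -1.0
        results.append((rate, reward))
    return results


def observe(rate, p_idx):
    # State built from history only: coarse throughput bucket and last power level.
    return (min(int(rate), 3), p_idx)


def train(n_agents=3, episodes=2000, seed=0):
    rng = random.Random(seed)
    agents = [D2DAgent(rng=rng) for _ in range(n_agents)]
    states = [(0, 0)] * n_agents
    for _ in range(episodes):
        actions = [ag.act(s) for ag, s in zip(agents, states)]
        outcomes = step(actions)
        for i, (ag, (rate, reward)) in enumerate(zip(agents, outcomes)):
            nxt = observe(rate, actions[i][1])
            ag.update(states[i], actions[i], reward, nxt)
            states[i] = nxt
    return agents
```

Calling `train()` returns the trained agents; the greedy policy of each pair can then be read with `agent.act(state)` after setting `agent.epsilon = 0`. The penalty for violating the cellular SINR threshold is one simple way to encode the service-quality constraint in the reward; the paper may handle it differently.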

CLC number:

  • TP391