Journal of Nanjing University (Natural Sciences) ›› 2018, Vol. 54 ›› Issue (3): 604-611.
陈列蕾,方 晖*
Chen Lielei, Fang Hui*
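Abstract: Objective and accurate keywords help electronic databases classify scientific literature and help researchers narrow the scope of a literature search. This paper proposes a method based on TFIDF and Scopus database retrieval for automatically extracting keywords from English scientific papers. All documents contained in the Scopus database serve as the corpus, and the Scopus API is used to run searches within the database automatically. Compared with the traditional approach of manually building and labelling a corpus, the method is more convenient and draws on richer data. It exploits the fact that abstracts carry little redundant information and extracts keywords from the abstract in combination with statistical features of the full text. It also considers and establishes structural feature words of abstracts, introduces statistically derived positional features of phrases and weights them, and extends two types of stop-word lists to filter out interfering words. Experimental results show that the method performs well.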
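The scoring idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that candidate phrases are taken from the abstract, term frequency is counted over the full text, and document frequency comes from a corpus lookup. The `doc_freq` callable is a hypothetical stand-in for a Scopus search that returns the number of matching documents. The stop-word list, the `first_sentence_boost` positional weight, and the smoothed IDF form `log(N / (1 + df))` are likewise assumptions chosen for illustration; the paper's structural feature words are not modelled here.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the paper's two extended lists are not reproduced here.
STOPWORDS = {"the", "of", "a", "an", "and", "to", "in", "is", "for",
             "we", "this", "that", "on", "with"}


def candidate_phrases(text, max_len=3):
    """Yield n-grams (1..max_len words) that do not cross stop words or punctuation."""
    tokens = re.findall(r"[a-z][a-z\-]*|[.,;:!?]", text.lower())
    runs, run = [], []
    for tok in tokens:
        if tok in STOPWORDS or not tok[0].isalpha():
            # A stop word or punctuation mark ends the current run of content words.
            if run:
                runs.append(run)
            run = []
        else:
            run.append(tok)
    if run:
        runs.append(run)
    for words in runs:
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                yield " ".join(words[i:i + n])


def score_keyphrases(abstract, full_text, doc_freq, corpus_size,
                     first_sentence_boost=1.5):
    """Rank abstract phrases by full-text TF x corpus IDF x positional weight.

    doc_freq: hypothetical callable mapping a phrase to the number of corpus
    documents that contain it (a stand-in for a database search count).
    """
    tf = Counter(candidate_phrases(full_text))
    # Assumed positional feature: phrases in the first abstract sentence get a boost.
    boosted = set(candidate_phrases(abstract.split(".")[0]))
    scores = {}
    for phrase in set(candidate_phrases(abstract)):
        idf = math.log(corpus_size / (1 + doc_freq(phrase)))
        weight = first_sentence_boost if phrase in boosted else 1.0
        scores[phrase] = tf[phrase] * idf * weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With this design, a phrase that recurs in the paper but is rare across the corpus ranks high, while a phrase that appears in most corpus documents receives an IDF near zero and drops to the bottom of the ranking.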