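Chinese title: | 基于深度学习的典籍人称代词指代消解研究 |
Name: | |
Student ID: | 2018814048 |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 125500 |
Discipline: | Management - Library and Information Studies |
Student type: | Master's |
Degree: | Master of Library and Information Studies |
University: | Nanjing Agricultural University |
Department: | |
Major: | |
Research area: | Natural language processing |
First supervisor name: | |
First supervisor affiliation: | |
Second supervisor name: | |
Completion date: | 2020-04-30 |
Defense date: | 2020-05-30 |
Foreign-language title: | Research on the anaphoric resolution of personal pronouns in classical books based on deep learning |
Chinese keywords: | |
Foreign-language keywords: | Anaphora resolution ; Personal pronouns ; Classics ; The ancient Chinese |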
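Chinese abstract: |
Over the long course of Chinese cultural history, a vast body of precious classical Chinese texts has been handed down. These texts contain rich historical information and record the remarkable philosophical thought of earlier generations; they laid the foundation of the national culture and are vital to promoting and passing on traditional culture. In the information age, using classical-text information processing technology for deep mining and knowledge discovery in classical Chinese texts, an important carrier of the national culture, is highly significant: it helps promote and pass on traditional culture and also helps strengthen national cultural soft power. A personal pronoun is a pronoun that refers to a person entity in natural language. A complete anaphoric relation consists of the referring expression, the "anaphor", and the content it refers to, the "antecedent". Personal pronouns in classical Chinese serve the same function as those in modern Chinese, but because the two differ in grammar, vocabulary, and other respects, their personal pronouns also differ in inventory size, in singular and plural forms, and in part-of-speech ambiguity. Correctly identifying personal pronouns is therefore important for the deep mining of classical Chinese texts, and the performance of pronoun recognition in turn affects the performance of anaphora resolution. This thesis studies the resolution of intra-sentential personal pronoun anaphora in classical Chinese texts, comparing traditional machine learning and deep learning methods for both personal pronoun recognition and anaphora resolution. The main work of this thesis comprises the following three points: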
|
Foreign-language abstract: |
The long history of Chinese culture has left a vast body of precious classical Chinese texts. These texts contain rich historical information and record the philosophical thought of earlier generations; they form the foundation of the national culture and are crucial to promoting and passing on traditional culture. In the information age, applying classical-Chinese information processing technology to mine these texts in depth and discover knowledge in them is highly significant: it helps promote and pass on traditional culture and also strengthens national cultural soft power. A personal pronoun is a pronoun that refers to a person entity in natural language, and a complete anaphoric relation consists of the referring expression, the "anaphor", and the content it refers to, the "antecedent". Although personal pronouns in classical Chinese serve the same function as those in modern Chinese, ancient and modern Chinese differ in many respects, including grammar and vocabulary. Correctly identifying personal pronouns in classical Chinese therefore plays an important role in the in-depth study of the classics. This thesis examines anaphora resolution in classical Chinese texts and compares traditional machine learning and deep learning methods for personal pronoun recognition and anaphora resolution. The thesis focuses on the following three points:
|
Chinese Library Classification number: | G25 |
Release date: | 2020-06-11 |