A crowd-efficient learning approach for NER based on online encyclopedia

Maolong Li, Zhixu Li, Qiang Yang, Zhigang Chen, Pengpeng Zhao, Lei Zhao

Research output: Contribution to journal › Article › peer-review

Abstract

Named Entity Recognition (NER) is a core task of NLP. State-of-the-art supervised NER models rely heavily on large amounts of high-quality annotated data, which are expensive to obtain. Various approaches have been proposed to reduce this reliance on large training sets, but only with limited effect. In this paper, we propose a crowd-efficient learning approach for supervised NER that makes full use of online encyclopedia pages. In our approach, we first define three criteria (representativeness, informativeness, diversity) to help select a much smaller set of samples for crowd labeling. We then propose a data augmentation method that uses the structured knowledge of the online encyclopedia to generate much more training data and thereby strengthen training. After training the model on the augmented sample set, we re-select new samples for crowd labeling to refine the model. We repeat the training and selection procedure iteratively until the model can no longer be improved or its performance meets our requirement. Our empirical study on several real data collections shows that our approach can reduce manual annotation by 50% while achieving almost the same NER performance as the fully trained model.
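The iterative select, label, augment, and train procedure described in the abstract can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions, not the authors' implementation: every name here (`crowd_efficient_loop`, `score`, `label`, `augment`, `train`, `evaluate`, `batch_size`, `target`, `max_rounds`) is hypothetical, and folding the three selection criteria into a single `score` function is an assumption made for brevity.

```python
from typing import Callable, List, Sequence, Tuple


def crowd_efficient_loop(
    pool: Sequence[str],
    score: Callable[[str, object], float],   # hypothetical: combines the three criteria
    label: Callable[[str], object],          # hypothetical: stands in for crowd labeling
    augment: Callable[[List[Tuple[str, object]]], list],  # hypothetical: encyclopedia-based augmentation
    train: Callable[[list], object],         # hypothetical: trains an NER model
    evaluate: Callable[[object], float],     # hypothetical: held-out performance
    batch_size: int = 2,
    target: float = 0.9,
    max_rounds: int = 10,
):
    """Repeat select -> label -> augment -> train until no gain or target met."""
    labeled: List[Tuple[str, object]] = []
    remaining = list(pool)
    model = None
    best = float("-inf")
    rounds = 0
    while remaining and rounds < max_rounds:
        # Select the highest-scoring unlabeled samples under the current model.
        remaining.sort(key=lambda s: score(s, model), reverse=True)
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        # Send only this small batch for labeling.
        labeled.extend((s, label(s)) for s in batch)
        # Train on the augmented version of everything labeled so far.
        candidate = train(augment(labeled))
        perf = evaluate(candidate)
        rounds += 1
        if perf <= best:
            break  # no further improvement: keep the previous model
        model, best = candidate, perf
        if best >= target:
            break  # performance requirement met
    return model, best, len(labeled)
```

The two `break` conditions mirror the two stopping rules named in the abstract; `max_rounds` is an extra safety bound added for this sketch.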
Original language: English (US)
Journal: World Wide Web
State: Published - Dec 2 2019
