Camel: Content-Aware and Meta-path Augmented Metric Learning for Author Identification

Chuxu Zhang, Chao Huang, Lu Yu, Xiangliang Zhang, Nitesh V. Chawla

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)


Abstract

In this paper, we study the problem of author identification in big scholarly data: effectively ranking potential authors for each anonymous paper using historical data. Most existing de-anonymization approaches predict the relevance score of each paper-author pair via feature engineering, which is not only time- and storage-consuming but may also introduce irrelevant and redundant features or miss important attributes. Representation learning can automate feature generation by learning node embeddings in an academic network to infer the correlation of paper-author pairs. However, the learned embeddings are often general-purpose (independent of the specific task) or based on network structure only (without considering node content). To address these issues and make further progress on the author identification problem, we propose Camel, a content-aware and meta-path augmented metric learning model. Specifically, the directly correlated paper-author pairs are first modeled with distance metric learning by introducing a push loss function. Next, the paper content embedding encoded by a gated recurrent neural network is integrated into the distance loss. Moreover, the historical bibliographic data of papers is used to construct an academic heterogeneous network, on which a meta-path guided walk integrative learning module, based on a task-dependent and content-aware Skipgram model, is designed to formulate the correlations between each paper and its indirect author neighbors, further augmenting the model. Extensive experiments demonstrate that Camel outperforms state-of-the-art baselines, achieving an average improvement of 6.3% over the best baseline method.
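
The abstract describes two of Camel's mechanisms only in prose. The Python sketch below illustrates generic versions of them under stated assumptions; it is not the authors' code. It assumes the push loss is a margin-based hinge over squared Euclidean distances, where a paper's true author should sit at least `margin` closer to the paper than a sampled negative author. The exact form, the margin value, and the distance function are assumptions here. It also shows a meta-path guided random walk on a toy academic graph, plus generic Skipgram-style (center, context) pair extraction from the resulting walks. The function names, the toy graph, and the "APA" (author-paper-author) meta-path are hypothetical illustrations; the GRU content encoder and the embedding training itself are not shown.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def push_loss(paper: Vector, pos_author: Vector, neg_author: Vector,
              margin: float = 1.0) -> float:
    """Hinge 'push' loss (assumed form): zero once the true author is at
    least `margin` closer to the paper than the negative author."""
    return max(0.0, margin + sq_dist(paper, pos_author) - sq_dist(paper, neg_author))


def metapath_walk(start: str, neighbors: Dict[str, List[str]],
                  node_type: Dict[str, str], metapath: str,
                  walk_length: int, rng: random.Random) -> List[str]:
    """Random walk whose node types repeat a symmetric meta-path, e.g. "APA".

    The meta-path must start and end with the start node's type so that the
    pattern (metapath minus its last symbol) can be cycled.
    """
    assert metapath[0] == metapath[-1] == node_type[start]
    pattern = metapath[:-1]  # "APA" -> cycle "AP"; "APVPA" -> cycle "APVP"
    walk = [start]
    while len(walk) < walk_length:
        wanted = pattern[len(walk) % len(pattern)]
        candidates = [n for n in neighbors[walk[-1]] if node_type[n] == wanted]
        if not candidates:  # dead end: no neighbor of the required type
            break
        walk.append(rng.choice(candidates))
    return walk


def context_pairs(walk: List[str], window: int) -> List[Tuple[str, str]]:
    """Generic Skipgram (center, context) pairs within `window` steps."""
    pairs = []
    for i, center in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


if __name__ == "__main__":
    # Hypothetical toy graph: authors a1..a3, papers p1..p2.
    neighbors = {
        "a1": ["p1", "p2"], "a2": ["p1"], "a3": ["p2"],
        "p1": ["a1", "a2"], "p2": ["a1", "a3"],
    }
    node_type = {n: n[0].upper() for n in neighbors}  # "a1" -> "A", "p1" -> "P"
    walk = metapath_walk("a1", neighbors, node_type, "APA", 5, random.Random(0))
    print(walk, context_pairs(walk, window=1)[:4])
    print(push_loss([0.0, 0.0], [1.0, 0.0], [0.5, 0.0]))
```

Constraining walk node types to a meta-path keeps the sampled context semantically meaningful (for example, co-authors reached through a shared paper) instead of mixing arbitrary node types, which is the motivation the abstract gives for using meta-path guided walks.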

Original language: English (US)
Title of host publication: Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18
Publisher: ACM Press
Pages: 709-718
Number of pages: 10
ISBN (Print): 9781450356398
State: Published - 2018

