Scalable Bayesian learning of recurrent neural networks for language modeling

Zhe Gan, Chunyuan Li, Changyou Chen, Yunchen Pu, Qinliang Su, Lawrence Carin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

12 Scopus citations

Abstract

Recurrent neural networks (RNNs) have shown promising performance for language modeling. However, traditional training of RNNs using back-propagation through time often suffers from overfitting. One reason for this is that stochastic optimization (used for large training sets) does not provide good estimates of model uncertainty. This paper leverages recent advances in stochastic gradient Markov Chain Monte Carlo (also appropriate for large training sets) to learn weight uncertainty in RNNs. It yields a principled Bayesian learning algorithm, adding gradient noise during training (enhancing exploration of the model-parameter space) and model averaging when testing. Extensive experiments on various RNN models and across a broad range of applications demonstrate the superiority of the proposed approach relative to stochastic optimization.
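
The two ingredients named in the abstract, gradient noise during training and model averaging at test time, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes stochastic gradient Langevin dynamics (SGLD), one of the simplest stochastic gradient MCMC algorithms, and replaces the RNN with a toy Bayesian linear regression so the example stays self-contained. The function names, the Gaussian prior and likelihood, and all hyperparameter values are assumptions made for illustration.

```python
import numpy as np


def sgld_samples(X, y, n_steps=2000, batch_size=32, step_size=1e-3,
                 prior_precision=1.0, noise_precision=1.0,
                 burn_in=500, seed=0):
    """Draw approximate posterior samples of linear-regression weights with SGLD.

    Assumed toy model (not from the paper):
        theta ~ N(0, I / prior_precision)
        y_i | x_i, theta ~ N(x_i . theta, 1 / noise_precision)
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    samples = []
    for t in range(n_steps):
        # Minibatch drawn without replacement, as in stochastic optimization.
        idx = rng.choice(n, size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]

        # Stochastic gradient of the log posterior: the prior term plus the
        # minibatch likelihood term rescaled to the full data set size.
        grad_log_prior = -prior_precision * theta
        grad_log_lik = noise_precision * (n / batch_size) * (Xb.T @ (yb - Xb @ theta))
        grad = grad_log_prior + grad_log_lik

        # SGLD update: half a gradient step plus Gaussian noise whose
        # variance equals the step size.
        noise = rng.normal(0.0, np.sqrt(step_size), size=d)
        theta = theta + 0.5 * step_size * grad + noise

        # Keep samples only after the burn-in period.
        if t >= burn_in:
            samples.append(theta.copy())
    return np.array(samples)


def predict_by_averaging(samples, X_new):
    """Average the predictions of every sampled weight vector."""
    return (X_new @ samples.T).mean(axis=1)
```

Calling `sgld_samples(X, y)` on a design matrix `X` and target vector `y` returns one weight vector per retained step. The spread of those vectors is a rough picture of weight uncertainty, and `predict_by_averaging` plays the role of test-time model averaging.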
Original language: English (US)
Title of host publication: ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
Publisher: Association for Computational Linguistics (ACL), acl@aclweb.org
Pages: 321-331
Number of pages: 11
ISBN (Print): 9781945626753
State: Published - Jan 1 2017
Externally published: Yes
