[1602.02410] Exploring the Limits of Language Modeling
LSTMs), exploiting small character CNNs, and by sharing our findings in this paper and accompanying code and models (to be released upon publication), we hope to inspire research on large scale Language Modeling, a problem we consider crucial towards language understanding
Abstract: In this work we explore recent advances in Recurrent Neural Networks for
large scale Language Modeling, a task central to language understanding. We
extend current models to deal with two key challenges present in this task:
corpora and vocabulary sizes, and complex, long term structure of language. We
perform an exhaustive study on techniques such as character Convolutional
Neural Networks or Long-Short Term Memory, on the One Billion Word Benchmark.
Our best single model significantly improves state-of-the-art perplexity from
51.3 down to 30.0 (whilst reducing the number of parameters by a factor of 20),
while an ensemble of models sets a new record by improving perplexity from 41.0
down to 23.7. We also release these models for the NLP and ML community to
study and improve upon.
‹Figure 1. A high-level diagram of the models presented in this paper. (a) is a standard LSTM LM. (b) represents an LM where both input and Softmax embeddings have been replaced by a character CNN. In (c) we replace the Softmax by a next character prediction LSTM network. (Introduction)Figure 2. The difference in log probabilities between the best LSTM and KN-5 (higher is better). The words from the holdout set are grouped into 25 buckets of equal size based on their frequencies. (Smaller Models with CNN Softmax)›