[1910.11235] Rethinking Exposure Bias In Language Modeling
Two easy-to-implement strategies, multi-range reinforcing and multi-entropy sampling, help alleviate reward sparseness in RL training and tackle the exposure bias problem.
Abstract: Exposure bias describes the phenomenon that a language model trained under
the teacher-forcing scheme may perform poorly at the inference stage, when its
predictions are conditioned on its own previous predictions, which may never
have appeared in the training corpus. Recently, several generative adversarial
network (GAN) and reinforcement learning (RL) methods have been introduced to
alleviate this problem. Nonetheless, a common issue in RL and GAN training is
the sparsity of reward signals. In this paper, we adopt two simple strategies,
multi-range reinforcing and multi-entropy sampling, to amplify and denoise the
reward signal. Our model improves over competing models with regard to BLEU
scores and road exam, a new metric we designed to measure robustness against
exposure bias in language models.
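
The train/inference mismatch described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the three-sentence corpus, the count-based bigram model, and the helper names `train_bigram`, `free_run`, and `seen_prefixes` are hypothetical and are not the paper's model or method. Under teacher forcing, every context the model sees is a prefix of a training sentence; when the model is run freely, it feeds its own samples back in and can reach prefixes that never occur in the corpus.

```python
import random
from collections import defaultdict


def train_bigram(corpus):
    """Count next-token frequencies for each previous token (toy model)."""
    counts = defaultdict(lambda: defaultdict(int))
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def free_run(counts, rng, max_len=10):
    """Generate a sentence by feeding each sampled token back in as context."""
    prev, out = "<s>", []
    for _ in range(max_len):
        choices = counts[prev]
        nxt = rng.choices(list(choices), weights=list(choices.values()))[0]
        if nxt == "</s>":
            break
        out.append(nxt)
        prev = nxt  # the model's own prediction becomes the next context
    return out


def seen_prefixes(corpus):
    """All prefixes of training sentences: the contexts teacher forcing visits."""
    return {tuple(s[:i]) for s in corpus for i in range(len(s) + 1)}


# Hypothetical toy corpus, chosen only to make the effect visible.
corpus = [["the", "cat", "sat"], ["the", "dog", "ran"], ["a", "cat", "ran"]]
counts = train_bigram(corpus)
prefixes = seen_prefixes(corpus)

rng = random.Random(0)
samples = [free_run(counts, rng) for _ in range(200)]

# Free-running samples that are not a prefix of any training sentence.
unseen = sum(tuple(s) not in prefixes for s in samples)
print(f"{unseen} of {len(samples)} free-running samples leave the training prefixes")
```

In this toy setting the model can produce sequences such as "the cat ran", which is locally plausible under the bigram counts but never appears in the corpus, so any later prediction would be conditioned on a context the model was never trained on.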
‹Figure 1: EMNLP2017 WMT News Road Exam based on prefixes from training and testing datasets [Higher is better]. In each experiment, the data source for the prefixes is used as the reference to calculate BLEU_F4. (Datasets)›