[1412.3555] Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
In order to understand better how a gated unit helps learning and to separate out the contribution of each component, for instance gating units in the LSTM unit or the GRU, of the gating units, more thorough experiments will be required in the future.

Abstract: In this paper we compare different types of recurrent units in recurrent
neural networks (RNNs). Especially, we focus on more sophisticated units that
implement a gating mechanism, such as a long short-term memory (LSTM) unit and
a recently proposed gated recurrent unit (GRU). We evaluate these recurrent
units on the tasks of polyphonic music modeling and speech signal modeling. Our
experiments revealed that these advanced recurrent units are indeed better than
more traditional recurrent units such as tanh units. Also, we found GRU to be
comparable to LSTM.

‹Figure 1: Illustration of (a) LSTM and (b) gated recurrent units. (a) i, f and o are the input, forget and output gates, respectively. c and c̃ denote the memory cell and the new memory cell content. (b) r and z are the reset and update gates, and h and h̃ are the activation and the candidate activation. (Gated Recurrent Neural Networks)Figure 2: Learning curves for training and validation sets of different types of units with respect to (top) the number of iterations and (bottom) the wall clock time. y-axis corresponds to the negativelog likelihood of the model shown in log-scale. (Results and Analysis)

Figure 3: Learning curves for training and validation sets of different types of units with respect to (top) the number of iterations and (bottom) the wall clock time. x-axis is the number of epochs and y-axis corresponds to the negative-log likelihood of the model shown in log-scale. (Results and Analysis)›