Recently hyped ML content linked in one simple page
Sources: reddit/r/{MachineLearning,datasets}, arxiv-sanity, twitter, kaggle/kernels, hackernews, awesome-datasets, sota changes
Made by: Deep Phrase HK Limited
[1910.11218] Promoting the Knowledge of Source Syntax in Transformer NMT Is Not Needed
We propose two techniques for promoting the knowledge of source syntax in the Transformer NMT model via multi-task learning and evaluate them at reasonably large data sizes.
Abstract: The utility of linguistic annotation in neural machine translation seemed to have been established in past papers. The experiments were, however, limited to recurrent sequence-to-sequence architectures and relatively small data settings. We focus on the state-of-the-art Transformer model and use comparably larger corpora. Specifically, we try to promote the knowledge of source-side syntax using multi-task learning, either through simple data manipulation techniques or through a dedicated model component. In particular, we train one of the Transformer attention heads to produce a source-side dependency tree. Overall, our results cast some doubt on the utility of multi-task setups with linguistic information. The data manipulation techniques recommended in previous works prove ineffective in large data settings. The treatment of self-attention as dependencies seems much more promising: it helps in translation and reveals that the Transformer model can very easily grasp the syntactic structure. An important but curious result, however, is that identical gains are obtained by using trivial "linear trees" instead of true dependencies. The gain may thus come not from the added linguistic knowledge but from some simpler regularizing effect induced on the self-attention matrices.
Figures:
Fig. 1. Sample dependency tree, inputs and expected outputs of linguistic secondary tasks. (Simple Alternating Multi-Task)
Fig. 3. Learning curves of the de2cs baseline and dummy secondary tasks over training steps. MT BLEU left, percentage of correct answers for the secondary task right. (Simple Alternating Multi-Task)
Fig. 4. Learning MT BLEU curves of the de2cs baseline and linguistic secondary tasks over training steps (left) and over MT epochs (right). (Training Cost of the Multi-Task)
Fig. 5. Joint dependency parsing and translation model (“DepParse”). (Model Architecture)
Fig. 6. Dummy dependencies with diagonal matrix (the columns represent the heads, the rows are dependents). (Diagonal Parse)
Fig. 7. Histogram of normalized self-attention weights in the encoder. (Diagonal Parse)
Fig. 8. Histogram of self-attention weights in the encoder’s layer 4 when parsing from layer 4. (Self-Attention Patterns in the Encoder)
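The core idea, treating one self-attention head as a dependency parser and optionally replacing true dependencies with the trivial "linear tree" (diagonal) targets of Fig. 6, can be illustrated with a small sketch. This is a minimal, hedged PyTorch illustration under assumed names (linear_tree_heads, dependency_attention_loss, an arbitrary 0.1 loss weight), not the paper's actual implementation or hyperparameters.

```python
# Minimal sketch (an assumption, not the paper's code): supervise one encoder
# self-attention head so that each token's attention distribution points at
# its syntactic head; "linear trees" replace gold heads with the left neighbour.

import torch
import torch.nn.functional as F


def linear_tree_heads(seq_len: int) -> torch.Tensor:
    """Trivial 'linear tree' targets: token i depends on token i-1 (token 0 on itself)."""
    heads = torch.arange(seq_len) - 1
    heads[0] = 0
    return heads  # shape (seq_len,), dtype long


def dependency_attention_loss(attn_logits: torch.Tensor,
                              head_targets: torch.Tensor) -> torch.Tensor:
    """Cross-entropy between one attention head's row-wise scores and the
    (gold or dummy) head index of each token.

    attn_logits:  (batch, seq_len, seq_len) pre-softmax scores of the chosen head
    head_targets: (batch, seq_len) index of the head token for each dependent
    """
    batch, seq_len, _ = attn_logits.shape
    return F.cross_entropy(attn_logits.reshape(batch * seq_len, seq_len),
                           head_targets.reshape(batch * seq_len))


# Usage sketch: add the secondary loss to the ordinary translation loss.
if __name__ == "__main__":
    batch, seq_len = 2, 7
    attn_logits = torch.randn(batch, seq_len, seq_len, requires_grad=True)
    targets = linear_tree_heads(seq_len).unsqueeze(0).expand(batch, -1)
    mt_loss = torch.tensor(0.0)  # stand-in for the usual NMT cross-entropy loss
    loss = mt_loss + 0.1 * dependency_attention_loss(attn_logits, targets)
    loss.backward()
```

Per the abstract, the striking finding is that the diagonal linear-tree targets yield gains comparable to true dependency heads, which points to a regularizing effect on the attention matrices rather than a benefit of the linguistic knowledge itself.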
Related (TF-IDF similarity):
[1805.10850] Inducing Grammars with and for Neural Machine Translation
[1804.08915] Scheduled Multi-Task Learning: From Syntax to Translation
[1805.04237] Neural Machine Translation for Bilingually Scarce Scenarios: A Deep Multi-task Learning Approach
[1704.04675] Graph Convolutional Encoders for Syntax-aware Neural Machine Translation
[1909.02074] Jointly Learning to Align and Translate with Transformer Models
[1808.10267] Multi-Source Syntactic Neural Machine Translation
[1708.00993] Exploiting Linguistic Resources for Neural Machine Translation Using Multi-task Learning
[1811.02278] Off-the-Shelf Unsupervised NMT
[1905.09418] Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
[1908.11782] Latent Part-of-Speech Sequences for Neural Machine Translation