[1810.13327] Cross-Lingual Transfer Learning for Multilingual Task Oriented Dialog
As yet another extension, one could combine the approaches of multilingual CoVe embeddings and monolingual ELMo (or BERT, Devlin et al., 2018) embeddings and jointly train an encoder with a language model and an MT objective, which would potentially combine the benefits of training a model on large monolingual corpora while at the same time aligning the vector spaces of the two languages.
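As a rough illustration of what such joint training could look like, the sketch below sums a next-token language-model loss on monolingual text with a toy translation loss on parallel text, both flowing through one shared encoder. This is a minimal PyTorch sketch under assumed vocabulary/layer sizes and module names; the toy decoder conditioning (mean-pooled source states added to teacher-forced target states) is an illustrative simplification, not the paper's actual setup.

```python
# Sketch: one shared encoder trained jointly with an LM objective (monolingual
# data) and an MT objective (parallel data). All names/sizes are illustrative.
import torch
import torch.nn as nn

VOCAB, EMB, HID = 10_000, 128, 256

class SharedEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.lstm = nn.LSTM(EMB, HID, batch_first=True)

    def forward(self, tokens):                  # tokens: (batch, seq)
        out, _ = self.lstm(self.embed(tokens))  # (batch, seq, HID)
        return out

encoder = SharedEncoder()
lm_head = nn.Linear(HID, VOCAB)                 # next-token prediction head
mt_decoder = nn.LSTM(HID, HID, batch_first=True)
mt_head = nn.Linear(HID, VOCAB)                 # target-language prediction head
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(lm_head.parameters())
    + list(mt_decoder.parameters()) + list(mt_head.parameters()), lr=1e-3)

def lm_step(mono_batch):
    """LM loss on monolingual text: predict token t+1 from encoder state at t."""
    states = encoder(mono_batch[:, :-1])
    logits = lm_head(states)
    return loss_fn(logits.reshape(-1, VOCAB), mono_batch[:, 1:].reshape(-1))

def mt_step(src_batch, tgt_batch):
    """MT loss on parallel text; toy conditioning: mean source state is added
    to every teacher-forced target state (a real system would use attention)."""
    ctx = encoder(src_batch).mean(dim=1, keepdim=True)   # (batch, 1, HID)
    tgt_states = encoder(tgt_batch[:, :-1])              # teacher forcing
    dec_out, _ = mt_decoder(tgt_states + ctx)
    logits = mt_head(dec_out)
    return loss_fn(logits.reshape(-1, VOCAB), tgt_batch[:, 1:].reshape(-1))

# Sum (or alternate) the two objectives so the shared encoder sees both signals.
mono = torch.randint(0, VOCAB, (4, 12))
src, tgt = torch.randint(0, VOCAB, (4, 12)), torch.randint(0, VOCAB, (4, 12))
loss = lm_step(mono) + mt_step(src, tgt)
opt.zero_grad(); loss.backward(); opt.step()
```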
Abstract: One of the first steps in the utterance interpretation pipeline of many
task-oriented conversational AI systems is to identify user intents and the
corresponding slots. Since data collection for machine learning models for this
task is time-consuming, it is desirable to make use of existing data in a
high-resource language to train models in low-resource languages. However,
development of such models has largely been hindered by the lack of
multilingual training data. In this paper, we present a new data set of 57k
annotated utterances in English (43k), Spanish (8.6k) and Thai (5k) across the
domains weather, alarm, and reminder. We use this data set to evaluate three
different cross-lingual transfer methods: (1) translating the training data,
(2) using cross-lingual pre-trained embeddings, and (3) a novel method of using
a multilingual machine translation encoder as contextual word representations.
We find that given several hundred training examples in the target
language, the latter two methods outperform translating the training data.
Further, in very low-resource settings, multilingual contextual word
representations give better results than using cross-lingual static embeddings.
We also compare the cross-lingual methods to using monolingual resources in the
form of contextual ELMo representations and find that given just small amounts
of target language data, this method outperforms all cross-lingual methods,
which highlights the need for more sophisticated cross-lingual methods.
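To make method (3) concrete, the sketch below treats an untrained, frozen bidirectional LSTM as a stand-in for a multilingual MT encoder and uses its per-token hidden states as contextual word representations in place of static embeddings. The class name, sizes, and the freezing choice are assumptions for illustration, not the paper's actual encoder.

```python
# Sketch of method (3): run each utterance through a (pre-trained, frozen)
# multilingual MT encoder and use its per-token hidden states as contextual
# word representations for the downstream NLU model. Stand-in encoder only.
import torch
import torch.nn as nn

class MTEncoder(nn.Module):
    """Stand-in for a multilingual MT encoder (e.g. trained on en-es, en-th)."""
    def __init__(self, vocab=10_000, emb=128, hid=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hid, batch_first=True, bidirectional=True)

    @torch.no_grad()                      # frozen: used only as a featurizer
    def forward(self, tokens):
        out, _ = self.lstm(self.embed(tokens))
        return out                        # (batch, seq, 2*hid) contextual vectors

mt_encoder = MTEncoder().eval()
tokens = torch.randint(0, 10_000, (2, 9))   # toy batch of token ids
contextual = mt_encoder(tokens)             # replaces static word embeddings
print(contextual.shape)                     # torch.Size([2, 9, 512])
```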
Figure 1: Slot and intent model architecture. Word embeddings are passed through a biLSTM layer which is shared across the slot detection and intent prediction tasks. (NLU models)
Figure 2: Results for different training set sizes. The top and the bottom of the error bars correspond to the highest and lowest value of the exact match metric among the 10 runs. (Cross-lingual learning)
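The Figure 1 caption essentially specifies the downstream model: word embeddings (or contextual representations such as the MT-encoder states above) feed a biLSTM shared by both tasks, a per-token head predicts slot tags, and a pooled head predicts the intent. The sketch below is a minimal rendering of that description; layer sizes, mean pooling, and label counts are illustrative assumptions, not values from the paper.

```python
# Sketch of the joint slot/intent model described in the Figure 1 caption:
# shared biLSTM over word embeddings, per-token slot head, pooled intent head.
import torch
import torch.nn as nn

class SlotIntentModel(nn.Module):
    def __init__(self, vocab=10_000, emb=128, hid=256, n_slots=20, n_intents=12):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.bilstm = nn.LSTM(emb, hid, batch_first=True, bidirectional=True)
        self.slot_head = nn.Linear(2 * hid, n_slots)      # one tag per token
        self.intent_head = nn.Linear(2 * hid, n_intents)  # one label per utterance

    def forward(self, tokens):                               # tokens: (batch, seq)
        states, _ = self.bilstm(self.embed(tokens))          # (batch, seq, 2*hid)
        slot_logits = self.slot_head(states)                 # (batch, seq, n_slots)
        intent_logits = self.intent_head(states.mean(dim=1)) # (batch, n_intents)
        return slot_logits, intent_logits

model = SlotIntentModel()
slot_logits, intent_logits = model(torch.randint(0, 10_000, (2, 7)))
print(slot_logits.shape, intent_logits.shape)  # (2, 7, 20) (2, 12)
```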