FoTran 2018 abstracts

Abstracts from FoTran 2018

Some of the abstracts from our workshop on representation learning from multilingual data.

André Martins: Beyond Softmax: Sparsity, Constraints, Latent Structure – All End-to-End Differentiable!

Abstract: Softmax is the most popular transformation for mapping a vector of real numbers onto a probability distribution, and is widely used in the output layers and attention mechanisms of neural machine translation systems. In this talk, I will present differentiable alternatives to softmax that can output _sparse_, _constrained_, and _structured_ probability distributions, usable as either output or hidden layers in neural networks. The starting point is _sparsemax_, which has similar properties to softmax but is sparse, and hence appealing for interpretability. I will show how forward evaluation and gradient back-propagation can be done efficiently, and will derive a “sparsemax loss” that is the counterpart of the cross-entropy loss. I will proceed with constrained versions of softmax and sparsemax, which allow placing upper bounds on the probabilities and are therefore suitable for modeling fertility in neural machine translation. Finally, I will introduce SparseMAP, the structured counterpart of sparsemax, and show how it can be used for sparse structured output prediction and for sparse latent structure models, sidestepping the combinatorial explosion in the number of structures. Experiments in dependency parsing and natural language inference reveal competitive accuracy, improved interpretability, and the ability to capture natural language ambiguities, which is attractive for pipeline systems.

This is joint work with Ramon Astudillo, Julia Kreutzer, Chaitanya Malaviya, Pedro Ferreira, Vlad Niculae, Mathieu Blondel, and Claire Cardie.
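To make the sparsity concrete: sparsemax can be defined as the Euclidean projection of a score vector onto the probability simplex, for which Martins & Astudillo (2016) give a closed form. The following is a minimal NumPy sketch of that closed form; the variable names are mine, not from the talk.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex,
    via the closed-form threshold algorithm (Martins & Astudillo, 2016)."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]            # sort scores in decreasing order
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum    # coordinates that remain positive
    k_z = k[support][-1]                   # size of the support
    tau = (cumsum[k_z - 1] - 1) / k_z      # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)
```

Unlike softmax, which assigns nonzero probability to every coordinate, `sparsemax([3.0, 1.0, -1.0])` puts all of its mass on the first coordinate; this is the sparsity that makes the distribution easy to inspect.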

Bio: André Martins is the Head of Research at Unbabel, a research scientist at Instituto de Telecomunicações, and an invited professor at Instituto Superior Técnico, University of Lisbon. He received his dual-degree PhD in Language Technologies in 2012 from Carnegie Mellon University and Instituto Superior Técnico. His research interests include natural language processing, machine learning, deep learning, and optimization. He received a best paper award at the Annual Meeting of the Association for Computational Linguistics (ACL) for his work on natural language syntax, and an SCS Honorable Mention at CMU for his PhD dissertation. He is one of the co-founders and organizers of the Lisbon Machine Learning Summer School (LxMLS).

Ivan Vulić: Multilingual NLP via Cross-Lingual Word Embeddings

Abstract: In the recent past, word embeddings have proven tremendously useful as features in downstream NLP tasks. The fact that these word vectors can be trained on unlabeled monolingual corpora makes them an inexpensive resource. With their increasing use, there is a growing need for word vectors that work as effectively across multiple languages as they do monolingually. Learning bilingual and multilingual word embeddings is therefore currently an important research topic. These vectors offer an elegant and language-pair-independent way to represent content across different languages in shared cross-lingual embedding spaces, and also enable the integration of knowledge from external resources (e.g., WordNet, dictionaries) into those spaces. In this talk, I will briefly discuss current techniques in cross-lingual word embedding learning, presenting a model typology based on multilingual training data requirements. I will then introduce several illustrative applications of the induced embedding spaces, including bilingual dictionary induction, ad-hoc cross-lingual information retrieval, and cross-lingual transfer for dialogue state tracking.
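One common family of techniques in this typology maps two independently trained monolingual spaces into a shared space using a small seed dictionary: the orthogonal (Procrustes) mapping has a closed-form solution via SVD. The sketch below illustrates that general approach, not any specific system from the talk.

```python
import numpy as np

def procrustes_map(X, Y):
    """Orthogonal matrix W minimizing ||XW - Y||_F, in closed form via SVD.
    Rows of X and Y are monolingual embeddings of seed translation pairs
    (source and target language, respectively)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt
```

Once W is learned from the seed pairs, a bilingual dictionary can be induced by mapping source vectors with W and retrieving nearest neighbors in the target space; constraining W to be orthogonal preserves distances within the source space.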

Kyunghyun Cho: Three recent directions in neural machine translation

Abstract: In this talk, I will describe three research problems I have recently worked on and found worth further discussion and investigation in the context of neural machine translation. First, I will discuss whether the standard autoregressive sequence model could be replaced with a non-autoregressive one and, if so, how this could be done by introducing the idea of iterative refinement for sequence generation. Second, I will introduce a particular meta-learning algorithm, model-agnostic meta-learning (MAML) [Finn et al., 2017], and discuss why it is well suited for multilingual translation, and in particular low-resource translation. Lastly, I will briefly discuss slightly older work on real-time translation. All of these works are highly experimental, but at the same time extremely fun to think about and discuss.
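The core of MAML is a two-level update: adapt to each task with a few inner gradient steps, then update the shared initialization by differentiating through that adaptation. A toy sketch on 1-D linear regression (hypothetical tasks, nothing to do with the translation setting) shows the mechanics, including the second-order term:

```python
import numpy as np

def loss_grad(w, x, y):
    """Squared-error loss and its gradient for the 1-D model y = w * x."""
    return np.mean((w * x - y) ** 2), 2 * np.mean(x * (w * x - y))

def maml_step(w, tasks, inner_lr=0.1, outer_lr=0.05):
    """One meta-update: take a single inner gradient step per task, then
    update the shared initialization w by backpropagating through that step."""
    meta_grad = 0.0
    for x, y in tasks:
        _, g = loss_grad(w, x, y)
        w_adapted = w - inner_lr * g                     # inner, task-specific step
        _, g_adapted = loss_grad(w_adapted, x, y)
        hessian = 2 * np.mean(x ** 2)                    # d2L/dw2 of the quadratic loss
        meta_grad += g_adapted * (1 - inner_lr * hessian)  # chain rule through the inner step
    return w - outer_lr * meta_grad / len(tasks)
```

For two tasks with true slopes 2 and 4, iterating `maml_step` drives `w` toward an initialization from which one inner step does well on either task; in the low-resource translation setting of the talk, the tasks would instead be language pairs.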

Poster presentations

Anisia Katinskaia, Javad Nouri, Roman Yangarber: Revita: a language-learning platform at the intersection of ITS and CALL (presented by Jose María Hoya Quecedo)

Abstract: We present Revita, a Web-based platform for language learning beyond the beginner level. We survey the literature on recent advances in the fields of computer-aided language learning (CALL) and intelligent tutoring systems (ITS). We outline the established desiderata of CALL and ITS and discuss how Revita addresses the majority of these theoretical requirements. Finally, we claim that, to the best of our knowledge, Revita is currently the only freely available, functional platform for learning/tutoring beyond the beginner level that supports multiple languages.