Zero-shot transfer across 93 languages

To accelerate the transfer of natural language processing (NLP) applications to many more languages, we have significantly expanded and enhanced our LASER (Language-Agnostic SEntence Representations) toolkit. LASER’s vector representations of sentences are generic with respect to both the input language and the NLP task. For example, the properties of the multilingual semantic space can be used for paraphrasing a sentence or searching for sentences with similar meaning, either in the same language or in any other of the 93 languages now supported by LASER. Our sentence embeddings are also strong at parallel corpus mining, establishing a new state of the art in the BUCC shared task for three of its four language pairs. They also obtain strong results in multilingual similarity search, even for low-resource languages.
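To make the similarity-search idea concrete, here is a minimal sketch in Python. It assumes sentences have already been mapped to vectors in a shared multilingual space. The three-dimensional vectors below are toy values invented for illustration, not LASER output, and the names `cosine_similarity`, `nearest_sentences`, and `TOY_INDEX` are hypothetical helpers rather than part of the LASER toolkit. Cosine similarity is used here as one common choice of distance; the post does not say which measure LASER itself relies on.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_sentences(query_vec, index, k=2):
    """Return the k (score, text) pairs most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy vectors, made up for this example. In a language-agnostic space,
# an English sentence and its French translation should land close together.
TOY_INDEX = [
    ("The cat sleeps on the sofa.", [0.90, 0.10, 0.00]),
    ("Le chat dort sur le canapé.", [0.88, 0.12, 0.02]),
    ("Stock markets fell sharply today.", [0.05, 0.10, 0.95]),
]

# Hypothetical embedding of a query sentence about a sleeping cat.
query = [0.91, 0.09, 0.01]

results = nearest_sentences(query, TOY_INDEX, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Because the search only compares vectors, the same lookup works whether the query and the indexed sentences share a language or not. With these toy values, the two cat sentences rank above the unrelated one.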

Source: code.fb.com