Publication details

2019, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Pages 3226-3234

Misspelling Oblivious Word Embeddings (04b Conference paper in proceedings volume)

Piktus Aleksandra, Bora Edizel Necati, Bojanowski Piotr, Grave Edouard, Ferreira Rui, Silvestri Fabrizio

In this paper we present a method to learn word embeddings that are resilient to misspellings. Existing word embeddings have limited applicability to malformed texts, which contain a non-negligible amount of out-of-vocabulary words. We propose a method combining FastText with subwords and a supervised task of learning misspelling patterns. In our method, misspellings of each word are embedded close to their correct variants. We train these embeddings on a new dataset we are releasing publicly. Finally, we experimentally show the advantages of this approach on both intrinsic and extrinsic NLP tasks using public test sets.
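The subword idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it only shows why FastText-style character n-grams give a misspelling and its correct form overlapping representations. The helper names `char_ngrams` and `jaccard`, the n-gram range of 3 to 6, and the example words are assumptions made for illustration.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the set of character n-grams of a word.

    The word is wrapped in '<' and '>' boundary markers, in the style of
    FastText subwords. The 3..6 range is an assumption for this sketch.
    """
    token = f"<{word}>"
    return {
        token[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(token) - n + 1)
    }


def jaccard(a, b):
    """Jaccard similarity between two sets of n-grams."""
    return len(a & b) / len(a | b)


# Hypothetical example words: a correct spelling, a misspelling, an unrelated word.
correct = char_ngrams("embedding")
typo = char_ngrams("embeding")
unrelated = char_ngrams("keyboard")

overlap_typo = jaccard(correct, typo)
overlap_unrelated = jaccard(correct, unrelated)

print(f"embedding vs embeding: {overlap_typo:.2f}")
print(f"embedding vs keyboard: {overlap_unrelated:.2f}")
```

The misspelling shares many n-grams with the correct word, while the unrelated word shares none. Shared n-grams alone do not guarantee nearby vectors, which is why the abstract pairs subwords with a supervised task that learns misspelling patterns.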
Research groups: Algorithms and Data Science, Theory of Deep Learning