Data Science PhD course Neural Information Retrieval and NLP - Profs. Fabrizio Silvestri and Nicola Tonellotto | Dipartimento di Ingegneria informatica, automatica e gestionale

Speaker:

Prof. Fabrizio Silvestri (Sapienza Univ. of Rome) and Prof. Nicola Tonellotto (Univ. of Pisa)

DIAG speaker:

Fabrizio Silvestri

Event date:

Saturday, 15 May 2021 - 09:15

Location:

Zoom Meeting https://uniroma1.zoom.us/j/87279867964?pwd=dWpRMk5yRnBqRFVTRzRPd2l3a3BiQT09

Contact:

Stefano Leonardi

PhD Programme in Data Science

Neural Information Retrieval and NLP

Prof. Fabrizio Silvestri (Sapienza Università di Roma) and Prof. Nicola Tonellotto (University of Pisa)

Join Zoom Meeting: https://uniroma1.zoom.us/j/87279867964?pwd=dWpRMk5yRnBqRFVTRzRPd2l3a3BiQT09 (Meeting ID: 872 7986 7964, Passcode: 433023)

20/05 Lesson 1 - Prof. Fabrizio Silvestri

from 9.30 to 13.30 - Intro to PyTorch, Language Models, Implementing Word2Vec in PyTorch

from 15.30 to 17.30 - Practicum

21/05 Lesson 2 - Prof. Fabrizio Silvestri

from 10.30 to 13.30 - Self-attention, Transformers, BERT, and Beyond. HuggingFace Transformers

from 15.30 to 17.30 - Practicum

27/05 Lesson 3 - Prof. Nicola Tonellotto (Univ. of Pisa)

from 9.30 to 13.30 - Intro to Information Retrieval. Classical models and limitations. Neural Models for IR

from 15.30 to 17.30 - Practicum

28/05 Lesson 4 - Prof. Nicola Tonellotto (Univ. of Pisa)

from 9.30 to 13.30 - Neural models for IR

from 15.30 to 17.30 - Practicum

Abstract:

Advances from the natural language processing community have recently sparked a renaissance in the task of ad-hoc search. In particular, large contextualized language modeling techniques, such as BERT, have equipped ranking models with a far deeper understanding of language than previous bag-of-words (BoW) models could offer. Applying these techniques to a new task is tricky, requiring knowledge of deep learning frameworks as well as significant scripting and data munging. In this course, we provide background on classical (e.g., BoW), modern (e.g., Learning to Rank), and contemporary (e.g., doc2query) search ranking and re-ranking techniques. We also introduce students to the Transformer architecture, showing how it is used in foundational aspects of modern large language models (e.g., BERT). Going further, we detail and demonstrate how these techniques can easily be applied experimentally to new search tasks, using a new declarative style of conducting experiments exemplified by the PyTerrier and OpenNIR search toolkits.