Public seminar by Federico Scafoglieri (Evaluation procedure for 4 fixed-term Researcher positions, type A - SC 09/H1 SSD ING-INF/05)
Tuesday, 11 July, 2023 - 16:00
Title: Addressing Model and Instance level Heterogeneities for Data Preparation under Knowledge Representation Principles
Abstract: Data preparation is the process of collecting, aggregating, transforming, and cleaning raw data to prepare it for future processing and analysis. It is a crucial step in all data-intensive applications, such as analytics and machine learning, where low data quality degrades the final result. However, data preparation often encounters challenges stemming from data-model heterogeneity, requiring the integration of sources beyond traditional structured databases. Additionally, data-level issues like entity duplication further complicate the process.
In this seminar, I will delve into approaches rooted in knowledge representation techniques and data integration principles to address these challenges. The discussion will revolve around two key topics:
1) Extending the data integration paradigm known as Ontology-Based Data Access (OBDA) to incorporate semi-structured and unstructured sources (mainly raw text). This extension aims to enhance the integration of diverse data sources, facilitating a more comprehensive data preparation process.
2) Introducing a novel approach based on collective, formal, logical, and reasoning-based methods to tackle the complex problem of entity resolution while also focusing on the query answering task. This approach aims to ensure data accuracy and minimize redundancy by effectively addressing entity duplication.
The main aim of the techniques that I will present is to improve the efficiency and effectiveness of data preparation, ultimately enhancing the overall quality and reliability of data analysis outcomes.
Short Bio: Federico Maria Scafoglieri is a PostDoc Researcher at the Department of Computer, Control, and Management Engineering (DIAG) Antonio Ruberti at Sapienza University of Rome, where he received a Ph.D. in Engineering in Computer Science in 2022. His research focuses mainly on data management, particularly in the areas of data integration and data quality. He has participated in various academic and industrial projects on these topics. During his Ph.D., he was a research scholar at IBM Research Almaden in California, USA. He received the Best Demo Paper award at ISWC 2021.