Data Management (academic year 2022/2023)
For whom is this course. This 6 credits course is for the students of the Master of Science in Engineering of Computer Science (School of Engineering) of the Sapienza Università di Roma. This course is also for students of the Corso di Laurea Magistrale in Ingegneria Gestionale of the same School. The language for both the course and the exam is English. The lectures are held in the second semester (February 2022 - May 2022).
Prerequisites. A good knowledge of the fundamentals of Programming Structures (algorithms and data structures), Programming Languages, Databases (SQL, relational data model, Entity-Relationship data model, conceptual and logical database design), Theoretical Computer Science (computational complexity, computability) is required.
Course goals. The course presents the basic concepts of data management systems. Several major issues related to the theory and the design of data management systems are covered, including NoSQL databases, transaction management, concurrency control, recovery, file and index organizations, query processing.
- February 26, 2024. The exam of the April sessions (only for "fuori corso" or "part-time" students) will be held at 9:00pm on April 10, 2024 in classroom A2 at DIAG, via Ariosto 25. Who will not book for the exam within the deadline (see the INFOSTUD system) will not be accepted at the exam and there will be no exception to this rule.
- M. Lenzerini, Lecture notes (slides to download)
Students can download the course slides by accessing the MOODLE system at this page.
The slides will be available during the lecture period. Please, note that all students of Sapienza can access the MOODLE system by using the user name and the password of the university - R. Ramakrishnan, J. Gehrke. Database Management Systems. McGraw-Hill, 2004
- Students willing to read more about concurrency control can freely download an excellent book from the following site: http://research.microsoft.com/en-us/people/philbe/ccontrol.aspx
If one is looking for a more modern book, which is not free, a good suggestion is:
Gerhard Weikum, Gottfried Vossen, "Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery", The Morgan Kaufmann Series in Data Management Systems.
- Monday: [1:00pm - 3:00pm], via Eudossiana 18 (RM041), classroom 41,
- Wednesday: [05:10pm - 7:00pm], via Eudossiana 18 (RM041), classroom 41,
- Thursday: [11:00am - 1:00pm] via Eudossiana 18 (RM041), classroom 41.
The lectures can also be attended on-line, following this link:
https://uniroma1.zoom.us/j/83220889311?pwd=ZkxrQ3crNFJDVlIwS21jelp4bjRXZ...
The recording of the on-line lectures will be available in the MOODLE page of the course.
Week | Monday (1:00pm - 3:00pm) classroom 41 | Wednesday (5:10pm - 7:00pm) classroom 41 | Thursday (11:00am - 1:00pm) classroom 41 |
01 (Feb 27) | Lectures 1,2 - Course overview - Relational data model - Recap of relational algebra |
|
|
03 (Mar 06) | Lectures 3,4 - Exercises on relational algebra - Recap of SQL - Architecture of a data manager |
Lectures 5,6 - Buffer management - Introduction to transaction management - Serializability |
Lectures 7,8 - Anomalies in concurrency - View serializability |
04 (Mar 13) | Lectures 9,10 - The notion of conflict - Conflict serializability |
Lectures 11,12 - Properties of conflict serializability |
|
05 (Mar 20) | Lectures 13,14 - Exercises on view and conflict serializability |
Lectures 15,16 - Schedules with exclusive locks - 2-phase locking with exclusive locks |
Lectures 17,18 - 2-phase locking with shared and exclusive locks - Deadlock management |
06 (Mar 27) | Lectures 19,20 - Recoverability, cascadeless, strictness, rigorousness, - Concurrency through timestamps - Multiversion concurrency control |
Lectures 21,22 - Concurrency control in SQL - Concurrency control in Postgres |
Lectures 23,24 - Exercises on concurrency control - Recovery manager |
07 (Apr 03) | Lectures 25,26 - Introduction to Data Warehousing - Architectures of Data Warehouses |
Lectures 27,28 - Multidimensional modeling - OLAP |
|
08 (Apr 10) | Lectures 29 - ROLAP vs MOLAP - Conceptual modeling of DWs: the DFM model |
Lectures 30,31 - The star schema - The snowflake schema |
|
09 (Apr 17) | Lectures 32,33 - Logical design of a Data Warehouse - Exercises on Data Warehousing |
Lectures 34,35 - NoSQL databases - The graph data model |
|
10 (Apr 24) | Lectures 36,37 - RDF databases - The SPARQL query language |
Lectures 38,39 (Ing. Roberto Delfino) - Data Warehousing tools - Tableau public - Instructions of the student projects |
|
11 (May 01) | Lectures 40,41 (Ing. Roberto Delfino) - Neo4j |
Lectures 42,43 - Record, page and simple file organizations - External sorting |
|
12 (May 08) | Lectures 44,45 - Indexes and their properties - Sorted indexes |
Lectures 46,47 - Tree-based indexes - Exercises on file organizations |
Lectures 48,49 - Introduction to algorithms for relational operators - One pass algorithms |
13 (May 15) | Lectures 50,51 - Nested loop algorithms |
Lectures 52,53 - Two pass algorithms |
Lectures 54,55 - Index-based algorithms - Parallel algorithms |
14 (May 22) | Lectures 56,57 - Exercises on query evaluation algorithms |
Lectures 58,59 - Data Management in PA (seminar by Ing. Giuseppe Buono, Agenzia delle Entrate) |
Lectures 60,61 - Discussion on projects (Ing. Roberto Delfino) |
14 (May 29) | Lectures 62,63 - Exercises on file organizations and relational operators |
- 1. The structure of a Data Base Management System (DBMS)
- 2. Buffer management: buffer pool, replacement strategies, operations on the buffer
- 3. Concurrency management: The concept of transaction; The notion of serializability; Concurrency management strategies; Concurrency control in SQL and PostgreSQL
- 4. Recovery: Crash management; Classification of failures; Recovery strategies
- 5. NoSQL data models and systems; The case of graph databases; The Neo4j system
- 6. Data warehousing: architectures, multidimensional modeling, OLAP, design methodologies; The DFM model; The logical models for DW: the star schema, the snowflake schema
- 7. Physical structures for data management: Record and page organizations, Simple file organizations, Indexed file organizations
- 8. Query processing: evaluation of relational algebra operators
The student enrolled in the Master of Engineering in Computer Science in an A.Y. before 2021/22 are not obliged to carry out a project and therefore for them the exam can be just a written exam. The exam will be on the topics of the current A.Y. 2022/2023 (programma d'esame dell'A.A. 2022/2023). If they choose not to do the project, they MUST send an email to Prof. Lenzerini WITHIN the deadline for booking at the exam. If they do not do that within the stated deadline, they will have to do the exam with the rules of A.Y. 2021/22 or A.Y. 2022/23 and there will be no exception to this rule. Please, take into account that the written exam will be anyway on the "programma d'esame" of the current A.Y. 2022/2023.
For the student enrolled in the Master of Engineering in Computer Science in A.Y. 2021/22 or A.Y. 2022/23, the exam is composed by two parts: the project and the written exam. The mark obtained at the exam is the sum of the mark obtained at the project presentation (max 6 points) and the mark obtained at the written exam (max 24 points). If the mark for the project is less than 3 or the mark for the written exam is less than 15, the overall exam is considered failed.
Note: the project can be presented at any time, before, or even AFTER the written exam. The exam will be formally registered when both the project presentation and the written exam will be carried out. When both parts of the exam will be successfully completed, the student must wait for the first exam session ("appello") available, and book in INFOSTUD in order to have the exam formally registered.
When the student has completed one of the two parts (either the written exam, or the project), (s)he can complete the other part either in the same academic year, or in the next academic year. After this deadline, the student looses the grade for the completed part, and must carry out both parts.
- Past written exams: you can have a look at the texts of past exams
- To book for the written exam: Please, follow the on-line booking procedure.
- Schedule of written exams (decided by the administration office, not by the professor):
- First written exam: June 2023
- Second written exam: July 2023
- Third written exam: September 2023
- Special exam session (only for "fuori corso" or "part-time" students): October 2023
- Fourth written exam: January 2024
- Fifth written exam: February 2024
- Second special written exam session (only for "fuori corso" and "part-time" students): April 2024
For the student enrolled in the A.Y. 2021/22 or 2022/23, the project is part of the exam, for which they get at most 6 points: indeed, every student enrolled in the A.Y. 2021/22 or 2022/23 must present a project in dedicated sessions with the professor or the tutor. Every project can be carried out by one student or by a group of two students. Every student or group of two students must send an email to the tutor Roberto Delfino (delfino@diag.uniroma1.it) with the topic chosen from the project, and wait for the confirmation. The project is considered completed if its evaluation gets at least 3 points.
Students have two options for carrying out the project: option 1 or option 2. For further information about the two categories and the detailed instructions to complete a project please refer to this document.
The list of possibile topics for projects corresponding to option 1 is as follows (other topics will be added during the course):
- 1. Pick up a relational database systems (i.e., IBM, Oracle, mySQL, MS Server, etc.) and illustrate the concurrency control strategy adopted in the system.
- 2. Pick up a noSQL database systems (i.e., MongoDB, Neo4j, etc.) and illustrate the concurrency control strategy adopted in the system.
- 3. Illustrate the techniques for concurrency control in the context of distributed relational databases.
- 4. Illustrate a concurrency control technique that has not been discussed in detail during the lectures.
- 5. Design and implement a software that emulates a scheduler based on a specific protocol or strategy for concurrency control
- 6. Design and implement a software for visualizing the external multipass sorting algorithm (with varying size of the relation and varying number of frames in the buffer).
- 7. Design a simple query processing algorithm for RDFS.
In order to carry out a project according to option 1, the student or the group must figure out which material to use and prepare both a brief written report on the topic and the slides for a 15 minute presentation, plus a demo, if needed. The discussion about the projects chosen according to option 1 is carried out on-line at https://meet.google.com/hzy-save-oqw.
The list of possible topics for projects corresponding to option 2 is as follows:
- 1. Pick up a relational database system (i.e., IBM, Oracle, mySQL, MS Server, etc.), a noSQL database system (i.e., MongoDB, Neo4j, etc.) and a dataset, to compare their performances w.r.t. the execution of some relevant queries.
- 2. Pick up a noSQL database system (i.e., MongoDB, Neo4j, etc.) and a dataset, showing a relevant usage of them over the selected dataset.
- 3. Select a datasets or a collection of datasets and a Data Warehouse tool, and carry out a Data Warehouse project, from the requirement analysis to the implementation.
- 4. Select a Data Warehousing or Analytics tool (i.e., Google Data Studio, Tableau, etc.) and a dataset, showing some relevant data analysis performed over the selected dataset.
In order to carry out a project according to option 2, the student or the group must figure out which material, tool and dataset to use and prepare a short demo (5 minutes) showing the execution of the chosen task over the dataset and the slides for a 15 minute presentation.
If you want to book an appointment with Roberto Delfino for project presentation or for questions, please send Roberto an email (delfino@diag.uniroma1.it) with subject "[DM]: followed by what you need". Roberto will be pleased of clarify all your doubts.
- Data about the evaluation of the course by students of the previous editions are available in the home pages of the corresponding editions (see below). Data about the 2022-2023 edition will be posted here as soon as they are available.
- Academic year 2021/2022
- Academic year 2020/2021
- Academic year 2019/2020
- Academic year 2018/2019
- Academic year 2017/2018
- Academic year 2016/2017
- Academic year 2015/2016
- Academic year 2014/2015
- Academic year 2013/2014
- Academic year 2012/2013
- Academic year 2011/2012
- Academic year 2010/2011
- Academic year 2009/2010
- Academic year 2008/2009
- Academic year 2007/2008
- Anno accademico 2006/2007 (in Italian)
- Anno accademico 2005/2006 (in Italian)
- Anno accademico 2004/2005 (in Italian)
- Anno accademico 2003/2004 (in Italian)
- Office hours. Tuesday, 5:00 pm at the Meet room at https://meet.google.com/hzy-save-oqw -- please, look at the last minute news for the next office hours