Data Management (academic year 2022/2023)



For whom is this course. This 6 credits course is for the students of the Master of Science in Engineering of Computer Science (School of Engineering) of the Sapienza Università di Roma. This course is also for students of the Corso di Laurea Magistrale in Ingegneria Gestionale of the same School. The language for both the course and the exam is English. The lectures are held in the second semester (February 2022 - May 2022).
Prerequisites. A good knowledge of the fundamentals of Programming Structures (algorithms and data structures), Programming Languages, Databases (SQL, relational data model, Entity-Relationship data model, conceptual and logical database design), Theoretical Computer Science (computational complexity, computability) is required.
Course goals. The course presents the basic concepts of data management systems. Several major issues related to the theory and the design of data management systems are covered, including NoSQL databases, transaction management, concurrency control, recovery, file and index organizations, query processing.


  • News
    • February 26, 2024. The exam of the April sessions (only for "fuori corso" or "part-time" students) will be held at 9:00pm on April 10, 2024 in classroom A2 at DIAG, via Ariosto 25. Who will not book for the exam within the deadline (see the INFOSTUD system) will not be accepted at the exam and there will be no exception to this rule.
  • Tutor: Roberto Delfino (delfino@diag.uniroma1.it)
  • Lecture material
    • M. Lenzerini, Lecture notes (slides to download)
      Students can download the course slides by accessing the MOODLE system at this page.
      The slides will be available during the lecture period. Please, note that all students of Sapienza can access the MOODLE system by using the user name and the password of the university
    • R. Ramakrishnan, J. Gehrke. Database Management Systems. McGraw-Hill, 2004
    • Students willing to read more about concurrency control can freely download an excellent book from the following site: http://research.microsoft.com/en-us/people/philbe/ccontrol.aspx
      If one is looking for a more modern book, which is not free, a good suggestion is:
      Gerhard Weikum, Gottfried Vossen, "Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery", The Morgan Kaufmann Series in Data Management Systems.

  • Lectures
    • Monday: [1:00pm - 3:00pm], via Eudossiana 18 (RM041), classroom 41,
    • Wednesday: [05:10pm - 7:00pm], via Eudossiana 18 (RM041), classroom 41,
    • Thursday: [11:00am - 1:00pm] via Eudossiana 18 (RM041), classroom 41.

    The lectures can also be attended on-line, following this link:
    https://uniroma1.zoom.us/j/83220889311?pwd=ZkxrQ3crNFJDVlIwS21jelp4bjRXZ...

    The recording of the on-line lectures will be available in the MOODLE page of the course.

    Week Monday (1:00pm - 3:00pm) classroom 41 Wednesday (5:10pm - 7:00pm) classroom 41 Thursday (11:00am - 1:00pm) classroom 41
    01 (Feb 27) Lectures 1,2
    - Course overview
    - Relational data model
    - Recap of relational algebra
    -------
    -------
    03 (Mar 06) Lectures 3,4
    - Exercises on relational algebra
    - Recap of SQL
    - Architecture of a data manager
    Lectures 5,6
    - Buffer management
    - Introduction to transaction management
    - Serializability
    Lectures 7,8
    - Anomalies in concurrency
    - View serializability
    04 (Mar 13) Lectures 9,10
    - The notion of conflict
    - Conflict serializability
    -------
    Lectures 11,12
    - Properties of conflict serializability
    05 (Mar 20) Lectures 13,14
    - Exercises on view and conflict serializability
    Lectures 15,16
    - Schedules with exclusive locks
    - 2-phase locking with exclusive locks
    Lectures 17,18
    - 2-phase locking with shared and exclusive locks
    - Deadlock management
    06 (Mar 27) Lectures 19,20
    - Recoverability, cascadeless, strictness, rigorousness,
    - Concurrency through timestamps
    - Multiversion concurrency control
    Lectures 21,22
    - Concurrency control in SQL
    - Concurrency control in Postgres
    Lectures 23,24
    - Exercises on concurrency control
    - Recovery manager
    07 (Apr 03) Lectures 25,26
    - Introduction to Data Warehousing
    - Architectures of Data Warehouses
    Lectures 27,28
    - Multidimensional modeling
    - OLAP
    -------
    08 (Apr 10)
    -------
    Lectures 29
    - ROLAP vs MOLAP
    - Conceptual modeling of DWs: the DFM model
    Lectures 30,31
    - The star schema
    - The snowflake schema
    09 (Apr 17) Lectures 32,33
    - Logical design of a Data Warehouse
    - Exercises on Data Warehousing
    Lectures 34,35
    - NoSQL databases
    - The graph data model
    -------
    10 (Apr 24)
    -------
    Lectures 36,37
    - RDF databases
    - The SPARQL query language
    Lectures 38,39
    (Ing. Roberto Delfino)
    - Data Warehousing tools
    - Tableau public
    - Instructions of the student projects
    11 (May 01)
    -------
    Lectures 40,41
    (Ing. Roberto Delfino)
    - Neo4j
    Lectures 42,43
    - Record, page and simple file organizations
    - External sorting
    12 (May 08) Lectures 44,45
    - Indexes and their properties
    - Sorted indexes
    Lectures 46,47
    - Tree-based indexes
    - Exercises on file organizations
    Lectures 48,49
    - Introduction to algorithms for relational operators
    - One pass algorithms
    13 (May 15) Lectures 50,51
    - Nested loop algorithms
    Lectures 52,53
    - Two pass algorithms
    Lectures 54,55
    - Index-based algorithms
    - Parallel algorithms
    14 (May 22) Lectures 56,57
    - Exercises on query evaluation algorithms
    Lectures 58,59
    - Data Management in PA (seminar by Ing. Giuseppe Buono, Agenzia delle Entrate)
    Lectures 60,61
    - Discussion on projects (Ing. Roberto Delfino)
    14 (May 29)
    -------
    Lectures 62,63
    - Exercises on file organizations and relational operators
    -------

  • Topics covered (programma d'esame)
    • 1. The structure of a Data Base Management System (DBMS)
    • 2. Buffer management: buffer pool, replacement strategies, operations on the buffer
    • 3. Concurrency management: The concept of transaction; The notion of serializability; Concurrency management strategies; Concurrency control in SQL and PostgreSQL
    • 4. Recovery: Crash management; Classification of failures; Recovery strategies
    • 5. NoSQL data models and systems; The case of graph databases; The Neo4j system
    • 6. Data warehousing: architectures, multidimensional modeling, OLAP, design methodologies; The DFM model; The logical models for DW: the star schema, the snowflake schema
    • 7. Physical structures for data management: Record and page organizations, Simple file organizations, Indexed file organizations
    • 8. Query processing: evaluation of relational algebra operators
  • Exam

    The student enrolled in the Master of Engineering in Computer Science in an A.Y. before 2021/22 are not obliged to carry out a project and therefore for them the exam can be just a written exam. The exam will be on the topics of the current A.Y. 2022/2023 (programma d'esame dell'A.A. 2022/2023). If they choose not to do the project, they MUST send an email to Prof. Lenzerini WITHIN the deadline for booking at the exam. If they do not do that within the stated deadline, they will have to do the exam with the rules of A.Y. 2021/22 or A.Y. 2022/23 and there will be no exception to this rule. Please, take into account that the written exam will be anyway on the "programma d'esame" of the current A.Y. 2022/2023.

    For the student enrolled in the Master of Engineering in Computer Science in A.Y. 2021/22 or A.Y. 2022/23, the exam is composed by two parts: the project and the written exam. The mark obtained at the exam is the sum of the mark obtained at the project presentation (max 6 points) and the mark obtained at the written exam (max 24 points). If the mark for the project is less than 3 or the mark for the written exam is less than 15, the overall exam is considered failed.

    Note: the project can be presented at any time, before, or even AFTER the written exam. The exam will be formally registered when both the project presentation and the written exam will be carried out. When both parts of the exam will be successfully completed, the student must wait for the first exam session ("appello") available, and book in INFOSTUD in order to have the exam formally registered.

    When the student has completed one of the two parts (either the written exam, or the project), (s)he can complete the other part either in the same academic year, or in the next academic year. After this deadline, the student looses the grade for the completed part, and must carry out both parts.

    • Past written exams: you can have a look at the texts of past exams
    • To book for the written exam: Please, follow the on-line booking procedure.
    • Schedule of written exams (decided by the administration office, not by the professor):
      • First written exam: June 2023
      • Second written exam: July 2023
      • Third written exam: September 2023
      • Special exam session (only for "fuori corso" or "part-time" students): October 2023
      • Fourth written exam: January 2024
      • Fifth written exam: February 2024
      • Second special written exam session (only for "fuori corso" and "part-time" students): April 2024
  • Projects

    For the student enrolled in the A.Y. 2021/22 or 2022/23, the project is part of the exam, for which they get at most 6 points: indeed, every student enrolled in the A.Y. 2021/22 or 2022/23 must present a project in dedicated sessions with the professor or the tutor. Every project can be carried out by one student or by a group of two students. Every student or group of two students must send an email to the tutor Roberto Delfino (delfino@diag.uniroma1.it) with the topic chosen from the project, and wait for the confirmation. The project is considered completed if its evaluation gets at least 3 points.

    Students have two options for carrying out the project: option 1 or option 2. For further information about the two categories and the detailed instructions to complete a project please refer to this document.

  • The list of possibile topics for projects corresponding to option 1 is as follows (other topics will be added during the course):

    • 1. Pick up a relational database systems (i.e., IBM, Oracle, mySQL, MS Server, etc.) and illustrate the concurrency control strategy adopted in the system.
    • 2. Pick up a noSQL database systems (i.e., MongoDB, Neo4j, etc.) and illustrate the concurrency control strategy adopted in the system.
    • 3. Illustrate the techniques for concurrency control in the context of distributed relational databases.
    • 4. Illustrate a concurrency control technique that has not been discussed in detail during the lectures.
    • 5. Design and implement a software that emulates a scheduler based on a specific protocol or strategy for concurrency control
    • 6. Design and implement a software for visualizing the external multipass sorting algorithm (with varying size of the relation and varying number of frames in the buffer).
    • 7. Design a simple query processing algorithm for RDFS.

    In order to carry out a project according to option 1, the student or the group must figure out which material to use and prepare both a brief written report on the topic and the slides for a 15 minute presentation, plus a demo, if needed. The discussion about the projects chosen according to option 1 is carried out on-line at https://meet.google.com/hzy-save-oqw.

    The list of possible topics for projects corresponding to option 2 is as follows:

    • 1. Pick up a relational database system (i.e., IBM, Oracle, mySQL, MS Server, etc.), a noSQL database system (i.e., MongoDB, Neo4j, etc.) and a dataset, to compare their performances w.r.t. the execution of some relevant queries.
    • 2. Pick up a noSQL database system (i.e., MongoDB, Neo4j, etc.) and a dataset, showing a relevant usage of them over the selected dataset.
    • 3. Select a datasets or a collection of datasets and a Data Warehouse tool, and carry out a Data Warehouse project, from the requirement analysis to the implementation.
    • 4. Select a Data Warehousing or Analytics tool (i.e., Google Data Studio, Tableau, etc.) and a dataset, showing some relevant data analysis performed over the selected dataset.

    In order to carry out a project according to option 2, the student or the group must figure out which material, tool and dataset to use and prepare a short demo (5 minutes) showing the execution of the chosen task over the dataset and the slides for a 15 minute presentation.

    If you want to book an appointment with Roberto Delfino for project presentation or for questions, please send Roberto an email (delfino@diag.uniroma1.it) with subject "[DM]: followed by what you need". Roberto will be pleased of clarify all your doubts.

  • Information about course evaluation by students
    • Data about the evaluation of the course by students of the previous editions are available in the home pages of the corresponding editions (see below). Data about the 2022-2023 edition will be posted here as soon as they are available.
  • Information on past editions of this course