Data Management (academic year 2021/2022)



For whom is this course. This 6 credits course is for the students of the Master of Science in Engineering of Computer Science (School of Engineering) of the Sapienza Università di Roma. This course is also for students of the Corso di Laurea Magistrale in Ingegneria Gestionale of the same School. The language for both the course and the exam is English. The lectures are held in the second semester (February 2022 - May 2022).
Prerequisites. A good knowledge of the fundamentals of Programming Structures (algorithms and data structures), Programming Languages, Databases (SQL, relational data model, Entity-Relationship data model, conceptual and logical database design), Theoretical Computer Science (computational complexity, computability) is required.
Course goals. The course presents the basic concepts of data management systems. Several major issues related to the theory and the design of data management systems are covered, including NoSQL databases, transaction management, concurrency control, recovery, file and index organizations, query processing.


  • News

    • June 22, 2022. For each of the exam session, the students enrolled in the Master of Engineering in Computer Science before A.Y. 2021/22 can ask to do the exam with the topics and the rules (e.g., without project) of A.Y. 2020/21. In order to that, they MUST send an email to Prof. Lenzerini WITHIN the deadline for booking at the exam. If they do not do that within the stated deadline, they will have to do the exam with the topics and the rules of A.Y. 2021/22 and there will be no exception to this rule.
    • April 21, 2022. The exams for the summer are scheduled as follows: June 8, 2002 (9:00am, classroom 41 in San Pietro in Vincoli), July 13, 2002 (9:00am, classroom 41 in San Pietro in Vincoli), September 9, 2002 (9:00am, classroom 41 in San Pietro in Vincoli). The students should book for the exam using INFOSTUD. Only the students that book for the exam within the deadline will be admitted to the exam, and there will be no exception to this rule.

  • Tutor: Riccardo Valentini (valentini@diag.uniroma1.it)
  • Lecture material
    • M. Lenzerini, Lecture notes (slides to download)
      Students can download the course slides by accessing the MOODLE system at this page.
      The slides will be available during the lecture period. Please, note that all students of Sapienza can access the MOODLE system by using the user name and the password of the university
    • R. Ramakrishnan, J. Gehrke. Database Management Systems. McGraw-Hill, 2004
    • Students willing to read more about concurrency control can freely download an excellent book from the following site: http://research.microsoft.com/en-us/people/philbe/ccontrol.aspx
      If one is looking for a more modern book, which is not free, a good suggestion is:
      Gerhard Weikum, Gottfried Vossen, "Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery", The Morgan Kaufmann Series in Data Management Systems.

  • Lectures
    • Wednesday: [5:00pm - 7:00pm], via Eudossiana 18 (RM031), classroom 1,
    • Thursday: [08:00am - 10:00am], via Eudossiana 18 (RM041), classroom 41,
    • Thursday: [10:00am - 12:00pm] via Eudossiana 18 (RM041), classroom 41.

    The lectures can also be attended on-line, following this link:
    https://uniroma1.zoom.us/j/83220889311?pwd=ZkxrQ3crNFJDVlIwS21jelp4bjRXZ...

    The recording of the on-line lectures will be available in the MOODLE page of the course.

    Week Wednesday (5:00pm - 7:00pm) classroom 1 Thursday (8:00am - 10:00am) classroom 41 Thursday (10:00am - 12:00pm) classroom 41
    01 (Feb 21) Lectures 1,2
    - Course overview
    - Relational data model
    - Recap of relational algebra
    -------
    -------
    02 (Feb 28)
    -------
    Lectures 3,4
    - Exercises on relational algebra
    - Recap of SQL
    - Architecture of a data manager
    Lectures 5,6
    - Buffer management
    - Introduction to transaction management
    - Serializability
    03 (Mar 07) Lectures 7,8
    - Anomalies in concurrency
    - View serializability
    Lectures 9,10
    - The notion of conflict
    - Conflict-serializability
    Lectures 11,12
    - Properties of conflict-serializability
    04 (Mar 14) Lectures 13,14
    - Locking policies
    - The 2-phase locking protocol
    Lectures 15,16
    - Properties of 2-phase locking
    -------
    05 (Mar 21) Lectures 17,18
    - Exercises on concurrency control - part 1
    Lectures 19,20
    - The notion of recoverability
    Lectures 21,22
    - Timestamp-based concurrency control
    06 (Mar 28) Lectures 23,24
    - Multiversion concurrency control
    Lectures 25,26
    - Concurrency control in SQL and PostgreSQL
    Lectures 27,28
    - Exercises on concurrency control - part 2
    07 (Apr 04) Lectures 29,30
    - The recovery module
    - Records, pages, files in secondary storage
    Lectures 31,32
    - File organizations
    Lectures 33,34
    - Sorting in secondary storage
    08 (Apr 11) Lectures 35,36
    - File organizations with indexes
    -------
    -------
    09 (Apr 18) Lectures 37,38
    - Exercises on file organizations
    Lectures 39,40
    - Tree-based indexes
    Lectures 41,42
    - Introduction to algorithms for relational operators
    - One-pass algorithms for unary operators
    10 (Apr 25)
    -------

    Lectures 43,44
    - One-pass algorithms for binary operators
    - Nested loop algorithms
    Lectures 45,46
    - Two-pass algorithms based on sorting
    11 (May 02) Lectures 47,48
    - Two-pass algorithms based on hashing
    - Multi-pass algorithms
    - Index-based algorithms
    -------
    12 (May 09) Lectures 49,50
    - Parallel algorithms
    - NoSQL databases
    - Graph-oriented databases
    Lectures 51,52
    - Graph databases
    Lectures 53,54
    - RDF databases
    13 (May 16) Lectures 55,56
    - The Noe4j system (Dott. Riccardo Valentini)
    - Description of the projects (Dott. Riccardo Valentini)
    Lectures 57,58
    - The notion of Data warehousing
    - The architectures of data warehouses
    Lectures 59,60
    - OLAP vs OLTP
    - OLAP operators
    14 (May 23) Lectures 61,62
    - The Dimensional Fact Model
    - Conceptual design of a data warehouse
    Lectures 63,64
    - The Star schema e the Snowflake schema
    - Conceptual design of a data warehouse
    - Logical design of a data warehouse
    Lectures 65,66
    - Tools for OLAP (Dott. Riccardo Valentini)
  • Exercises
    • Exercises on concurrency control - part 1 (see the Moodle page)
    • Exercises on concurrency control - part 2 (see the Moodle page)
    • Exercises on file organizations (see the Moodle page)
    • Exercises on evaluation of relational operators (see the Moodle page)

  • Topics covered
    • 1. The structure of a Data Base Management System (DBMS)
    • 2. Buffer management: buffer pool, replacement strategies, operations on the buffer
    • 3. Concurrency management: The concept of transaction, The notion of serializability, Concurrency management strategies
    • 4. Recovery: Crash management, Classification of failures, Recovery strategies
    • 5. Physical structures for data bases: Record and page organizations, Simple file organizations, Indexed file organizations
    • 6. Query processing: Evaluation of relational algebra operators, logical and physical query plans
    • 7. NoSQL data models and systems: the case of graph databases
    • 8. Data warehousing: architectures, multidimensional model, OLAP, design
  • Projects
    In order to have the exam registered, every student enrolled in the A.Y. 2021/22 must present a project in dedicated sessions with the professor or the tutor. Every project can be carried out by one student or by a group of two students. Every student or group of two students must send an email to the tutor Riccardo Valentini (valentini@diag.uniroma1.it) with the topic chosen from the project, and wait for the confirmation.

    Note: the project can be presented at any time, even AFTER the exam, during the Academic Year (i.e., until the April 2023 session). If a student respects this deadline, the grade obtained at the exam will not be lost. After such deadline, the students must repeat the exam.

    Students have two options for carrying out the project: option 1 or option 2. For further information about the two categories and the detailed instructions to complete a project please refer to this document.

  • The list of possibile topics for projects corresponding to option 1 is as follows (other topics will be added during the course):

    1. Pick up a relational database systems (i.e., IBM, Oracle, mySQL, MS Server, etc.) and illustrate the concurrency control strategy adopted in the system.
    2. Pick up a noSQL database systems (i.e., MongoDB, Neo4j, etc.) and illustrate the concurrency control strategy adopted in the system.
    3. Illustrate the techniques for concurrency control in the context of distributed relational databases.
    4. Design and implement a software for visualizing the external multipass sorting algorithm (with varying size of the relation and varying number of frames in the buffer).

    In order to carry out a project according to option 1, the student or the group must figure out which material to use and prepare both a brief written report on the topic and the slides for a 15 minute presentation.

    The list of possible topics for projects corresponding to option 2 is as follows:
    1. Pick up a relational database system (i.e., IBM, Oracle, mySQL, MS Server, etc.), a noSQL database system (i.e., MongoDB, Neo4j, etc.) and a dataset, to compare their performances w.r.t. the execution of some relevant queries.
    2. Pick up a noSQL database system (i.e., MongoDB, Neo4j, etc.) and a dataset, showing a relevant usage of them over the selected dataset.
    3. Select a Data Warehousing tool (i.e., Google Data Studio, Tableau, etc.) and a dataset, showing some relevant data analysis performed over the selected dataset.

    In order to carry out a project according to option 2, the student or the group must figure out which material, tool and dataset to use and prepare the execution of a task over the dataset (short demo of 3-5 minutes) and the slides for a 15 minute presentation.

    Here are the two links to book an appointment with Riccardo Valentini to discuss about doubts or questions on the projects:

    If you have any further questions, please send Riccardo an email (valentini@diag.uniroma1.it) with subject "[DM]: followed by what you need". Riccardo will be pleased of clarify all your doubts.

  • Exam
    • Important: the students enrolled in the Master of Engineering in Computer Science before A.Y. 2021/22 can ask to do the exam with the topics and the rules (e.g., without project) of A.Y. 2020/21. In order to that, they MUST send an email to Prof. Lenzerini WITHIN the deadline for booking at the exam. If they do not do that within the stated deadline, they will have to do the exam with the topics and the rules of A.Y. 2021/22 and there will be no exception to this rule.
    • Final exam: the final exam is constituted by a written exam, and a possible oral examinations. Students have 2 hours for completing the written exam
    • Past written exams: you can have a look at the texts of past exams
    • To book for the exam: Please, follow the on-line booking procedure.
    • Schedule of exams:
      • First written exam: June 2022
      • Second written exam: July 2022
      • Third written exam: September 2022
      • Special exam session (only for "fuori corso" or "part-time" students): October 2022
      • Fourth written exam: January 2023
      • Fifth written exam: February 2023
      • Second special exam session (only for "fuori corso" and "part-time" students): April 2023
  • Information about course evaluation by students
    • Data about the evaluation of the course by students of the previous editions are available in the home pages of the corresponding editions (see below). Data about the 2021-2022 edition will be posted here as soon as they are available.
  • Information on past editions of this course