Data Management (academic year 2023/2024)



For whom is this course. This 6 credits course is for the students of the Master of Science in Engineering of Computer Science (School of Engineering) of the Sapienza Università di Roma. This course is also for students of the Corso di Laurea Magistrale in Ingegneria Gestionale of the same School. The language for both the course and the exam is English. The lectures are held in the second semester (February 2024 - May 2024).
Prerequisites. A good knowledge of the fundamentals of Programming Structures (algorithms and data structures), Programming Languages, Databases (SQL, relational data model, Entity-Relationship data model, conceptual and logical database design), Theoretical Computer Science (computational complexity, computability) is required.
Course goals. The course presents the basic concepts of data management systems. Several major issues related to the theory and the design of data management systems are covered, including NoSQL databases, transaction management, concurrency control, recovery, file and index organizations, query processing.


  • News
    • May 22, 2024. The written exams of the summer session will be held on June 5 2024 (at 2pm, classroom 204 Marco Polo), July 11 2024 (at 9am, classroom 204 Marco Polo) and September 17 2024 (at 9am, classroom 204 Marco Polo). Who will not book for the exam within the deadline (see the INFOSTUD system) will not be accepted at the exam and there will be no exception to this rule.
    • February 24, 2024. The lectures will be start on Monday, February 26, 2024
  • Tutor: Roberto Delfino (delfino@diag.uniroma1.it)
  • Lecture material
    • M. Lenzerini, Lecture notes (slides to download)
      Students can download the course slides by accessing the MOODLE system at this page.
      The slides will be available during the lecture period. Please, note that all students of Sapienza can access the MOODLE system by using the user name and the password of the university
    • R. Ramakrishnan, J. Gehrke. Database Management Systems. McGraw-Hill, 2004
    • Students willing to read more about concurrency control can freely download an excellent book from the following site: http://research.microsoft.com/en-us/people/philbe/ccontrol.aspx
      If one is looking for a more modern book, which is not free, a good suggestion is:
      Gerhard Weikum, Gottfried Vossen, "Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery", The Morgan Kaufmann Series in Data Management Systems.

  • Lectures
    • Monday: [1:00pm - 3:00pm], via Eudossiana 18 (RM041), classroom 41,
    • Tuesday: [2:00pm - 4:00pm], via Eudossiana 18 (RM041), classroom 41,
    • Thursday: [11:00am - 1:00pm] via Eudossiana 18 (RM041), classroom 41.

    The lectures can also be attended on-line, following this link:
    https://uniroma1.zoom.us/j/83220889311?pwd=ZkxrQ3crNFJDVlIwS21jelp4bjRXZ...

    The recording of the on-line lectures will be available in the MOODLE page of the course.

    Week Monday (1:00pm - 3:00pm) classroom 41 Tuesday (2:00pm - 4:00pm) classroom 41 Thursday (11:00am - 1:00pm) classroom 41
    01 (Feb 26) Lectures 1,2
    - Course overview
    - Relational data model
    - Recap of relational algebra
    Lectures 3,4
    - Exercises on relational algebra
    Lectures 5,6
    - Recap of SQL
    03 (Mar 04) Lectures 7,8
    - The concept of Data Warehouse
    -------
    Lectures 9,10
    - Multidimensional data modeling
    04 (Mar 11) Lectures 11,12
    - OLAP
    - Methodologies for multidimensional modeling: DFM
    -------
    Lectures 13,14
    - Logical design of DFM schemas: the notion of Star schema
    05 (Mar 18)
    -------
    Lectures 15,16
    - The notion of Snowflake schema
    - Exercise on DW design
    - The Tableau system for OLAP
    - Description of projects
    (Dott. Roberto Delfino)
    Lectures 17,18
    - NoSQL data models
    - The graph data model
    06 (Mar 25) Lectures 19,20
    - Regular path queries
    - Storage structures for graph databases
    Lectures 21,22
    - The Neo4j system
    (Dott. Roberto Delfino)
    -------
    07 (Apr 01)
    -------
    -------
    Lectures 23,24
    - RDF databases
    - The SPARQL query language
    08 (Apr 08) Lectures 25,26
    - Semantics of the SPARQL query language
    - Introduction to Knowledge Graphs
    Lectures 27
    - RDFS-based Knowledge Graphs: syntax
    Lectures 28
    - RDFS-based Knowledge Graphs: semantics and inference
    09 (Apr 15) Lectures 29
    - Document databases
    (Dott. Roberto Delfino)
    Lectures 30,31
    - MongoDB
    (Dott. Roberto Delfino)
    Lectures 32,33
    - Transactions and concurrency
    - Serializability
    - View-serializability
    10 (Apr 22) Lectures 34,35
    - Conflict-serializability
    Lectures 36,37
    - The scheduler: objectives and algorithms
    - Concurrency control based on locking
    -------
    11 (Apr 29) Lectures 38,39
    - The 2PL protocol
    - Exercises on concurrency control
    Lectures 40,41
    - The notion of recoverability
    - ACR, strict and rigorous schedules
    Lectures 42,43
    - Multi-version concurrency control
    - Concurrency control in PostgreSQL
    12 (May 06) Lectures 44,45
    - Buffer management
    - Basic file organizations
    Lectures 46,47
    - External sorting
    - Index-based organizations
    Lectures 48,49
    - External sorting
    - Index-based organizations
    13 (May 13) Lectures 50,51
    - Tree-based index-based organizations
    - Exercises on file organizations
    Lectures 52,53
    - Overview of query evaluation
    - Classification of algorithms for relational operators
    - One pass algorithms
    Lectures 54
    - Nested-loop algorithms
    14 (May 20) Lectures 55,56
    - Two-pass algorithms
    - Multi-pass algorithms
    Lectures 57,58
    - Index-based algorithms
    Lectures 59,60
    - Parallel algorithms
    - Exercises on file organizations
    15 (May 27) Lectures 61,62
    - Exercises on file organizations and query evaluation
    Lectures 63,64
    - Exercises on file organizations and query evaluation
    -------

  • Topics covered (programma d'esame)
    • 1. Recap on databases and SQL
    • 2. Data warehousing: architectures, multidimensional modeling, OLAP
    • 3. Data Warehouse design methodologies; The DFM model; The logical models for DW: the star schema, the snowflake schema
    • 4. NoSQL data models and systems: the case of graph databases; The Neo4j system
    • 5. The notion of Knowledge Graph; RDFS Knowledge Graphs:syntax, semantics, inference
    • 6. NoSQL data models and systems: the case of Document databases; The MongoDB system
    • 7. The structure of a Data Management System (DBMS)
    • 8. Buffer management: buffer pool, replacement strategies, operations on the buffer
    • 9. Concurrency management: The concept of transaction; The notion of serializability; Concurrency management strategies; Concurrency control in SQL and PostgreSQL; Recovery: Crash management; Classification of failures; Recovery strategies
    • 10. Physical structures for data management: Record and page organizations, Simple file organizations, Indexed file organizations
    • 11. Query processing: evaluation of relational algebra operators
  • Exam
    • The student enrolled in A.Y. 2023/24 can choose between two options:
      • OPTION 1: doing only the written exam; in this case the written exam is full and the mark is between 18 and 30, as usual.
      • OPTION 2: doing both the written exam and the project. In this case,
        • the written exam is shortened wrt to OPTION 1, the maximum mark is 24 and the minimum mark for passing the exam is 15,

        • for the project, the maximum mark is 8 and the minimum mark for passing the project is 4.

      The procedure for choosing between the two options is easy: if a student chooses option 2, then (s)he must contact the tutor and tell him about the decision of choosing OPTION 2 before booking in INFOSTUD for the written exam, and even before formulating the project proposal; if a student chooses option 1, then (s)he simply has to book for the written exam without having contacted the tutor for the step described above.

      Notice that there is no deadline for choosing OPTION 2; notice also that booking for the exam without having informed the tutor about the decision of opting for OPTION 2 implies that the student has opted for OPTION 1.

    • The student enrolled in an A.Y. before 2021/22 were not obliged to carry out a project and therefore the rules for them are the same as the rules for the student enrolled in A.Y. 2023/24.
    • The student enrolled in A.Y. 2021/22 or A.Y. 2022/23 has two options: (a) to follow completely the rules of the corresponding A.Y. (such rules are described in the page of the 2022/23 edition of the course); (b) to follow completely the rules of A.Y. 2023/24, in which case they have to choose between OPTION 1 and 2 above. Note that option b) is in particular tailored for students who follow the course in the 2023/24 edition.

    Note: the project can be presented at any time, before, or even AFTER the written exam. The exam will be formally registered when both the project presentation and the written exam will be carried out. When both parts of the exam will be successfully completed, the student must wait for the first exam session ("appello") available, and book in INFOSTUD in order to have the exam formally registered.

    The procedure for the proposal, the assignment, the preparation and the discussion of a project is described in this document.

    • Past written exams: you can have a look at the texts of past exams
    • To book for the written exam: Please, follow the on-line booking procedure.
    • Schedule of written exams (decided by the administration office, not by the professor):
      • First written exam: June 2024
      • Second written exam: July 2024
      • Third written exam: September 2024
      • Special exam session (only for "fuori corso" or "part-time" students): October 2024
      • Fourth written exam: January 2025
      • Fifth written exam: February 2025
      • Second special written exam session (only for "fuori corso" and "part-time" students): April 2025
  • Projects

    Information about the projects will be posted soon.

  • Information about course evaluation by students
    • Data about the evaluation of the course by students of the previous editions are available in the home pages of the corresponding editions (see below). Data about the 2022-2023 edition will be posted here as soon as they are available.
  • Information on past editions of this course
  • Office hours. Tuesday, 5:00 pm at the Meet room at https://meet.google.com/hzy-save-oqw -- please, look at the last minute news for the next office hours.