Who this course is for. This 3-credit course is
one of the sections of the course Elective in Software and Services of
the Master of Science in Engineering in Computer Science at Sapienza
Università di Roma.
Prerequisites. A good knowledge of the fundamentals of
Programming Structures, Programming Languages, Databases (SQL,
relational data model, Entity-Relationship data model, conceptual and
logical database design) and Database systems.
Course goals. In one sentence, Big Data is data that exceeds the
processing capacity of conventional database systems. In particular,
Big Data applications deal with huge amounts of data, possibly
collected from a huge number of data sources (volume), in
highly heterogeneous formats (variety), at a very high rate (velocity).
This scenario calls for new technologies to be developed, ranging from
new data storage mechanisms to new computing frameworks. In this course
we will look at several key technologies used in manipulating, storing,
and analyzing big data. In particular, we will study architectures for
data-intensive distributed applications, Data Warehouse solutions, and
NoSQL storage solutions, including RDF and graph databases.
- When: Tuesday, 4:00pm - 6:00pm (occasionally also 6:00pm - 7:00pm),
from February 25 to June 5, 2020
- Where: Classroom A3, DIAG, via Ariosto 25, Roma
- Lectures 1, 2 (February 25)
- Course Introduction; Introduction to Big Data
- Lectures 3, 4 (March 3)
- Graph Databases: Introduction to Graph databases; Graph DBs vs relational DBs; Graph Abstract Data Type and Implementation of Graphs; Querying Graph Databases; Types of Graph Databases
- Lectures 5, 6 (March 17) - remote lecture, video available through http://elearning2.uniroma1.it/
- Property Graph Databases: A Neo4j overview
- Lectures 7, 8 (March 24) - remote lecture, video available through http://elearning2.uniroma1.it/
- Graph Databases: Resource Description Framework (RDF); RDFS
- Lectures 9, 10, 11 (March 31) - remote lecture, video available through http://elearning2.uniroma1.it/
- Graph Databases: RDF storage; SPARQL; Linked Open Data
- Lectures 12, 13, 14 (April 7) - remote lecture, video available through http://elearning2.uniroma1.it/
- Aggregate Data Models: the notion of aggregate; NoSQL data models: Key-value, Document-based and Column-family; a brief note on Data Modeling in NoSQL databases
- Lectures 15, 16 (April 21) - remote lecture, video available through http://elearning2.uniroma1.it/
- Lectures 17, 18 (April 28) - remote lecture, video available through http://elearning2.uniroma1.it/
- Aggregate databases: Distribution Models; Consistency
- Lectures 19, 20, 21 (May 5) - remote lecture, video available through http://elearning2.uniroma1.it/
- Aggregate databases: Map-reduce
- Data Warehousing: introduction; architectures; ETL
- Lectures 22, 23 (May 12) - remote lecture, video available through http://elearning2.uniroma1.it/
- Data Warehousing: multidimensional model; Accessing Data Warehouses: reports, dashboards, OLAP, data mining; ROLAP vs. MOLAP
- Lectures 24, 25, 26 (May 19) - remote lecture, video available through http://elearning2.uniroma1.it/
- Data Warehousing: Conceptual Modeling of DWs and the Dimensional Fact Model; star schema and snowflake schema; views; logical modeling of DWs
- May 26
- Presentations of students' projects
Slides are available at http://elearning2.uniroma1.it/
To access the material, log in to the system with your INFOSTUD
account and select the course on Big Data Management.
- NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence.
Pramod J. Sadalage and Martin Fowler. Addison-Wesley. 2013
- Data Warehouse Design: Modern Principles and Methodologies. Matteo Golfarelli and Stefano Rizzi. McGraw-Hill. 2009.
The exam can be taken in one of two ways:
(1) Development of a small project. Students are strongly encouraged to propose their own project ideas. As a suggestion, they can refer to (and also select from) the following list of tools. A project connected to a tool consists, for example, of studying the logical data model(s) adopted by the tool, the native storage data structure it uses, and the query language it provides, and of highlighting further distinguishing features. A demonstration of the basic use of the tool through one or more examples is also required. Project presentations (possibly with slides) should last around 20 minutes (including the demo).
- Graph database and RDF tools
OrientDB (it has features of both document and graph DBMSs).
ThingSpan (the new product incorporating InfiniteGraph functionalities)
GraphDB (free edition).
- key-value database tools
- document database tools
MarkLogic (Enterprise NoSQL)
- column-family database tools
- DataWarehousing tools
- QlikView (a
proprietary front-end tool for business intelligence. A personal
edition can be downloaded for study purposes. Since it is a front-end
tool, the focus of the student's analysis should be on the mechanisms
provided by the tool for data analytics and for multidimensional
access to data, rather than on data models or storage data structures).
Note: Projects of this kind can be developed individually or
by groups of two students. In the latter case, the
presentation should be split equally into two parts, one given by
each member of the group, and the overall presentation time
can be extended to 30-40 minutes.
The exam will consist of the project presentation, with possible additional questions on the
topics covered by this
section of the Large Scale Data Management Course.
To have a project assigned, students must send an email
indicating the kind of project they would like
to present (please do not start working on a project before you have
it assigned).
(2) Article Presentation
Article presentation consists of preparing a 20-minute presentation on
scientific papers assigned by the lecturer or proposed by students. Send an email to firstname.lastname@example.org to ask for the assignment of papers to study as final work (please do not start studying a paper for the exam presentation before you have it assigned).
Note: Article presentation can be carried out only individually.
Note: Both project and paper presentations will preferably be
held during office hours. Students are, however, required to send an email in advance to fix the exact date and time of their presentation.
Note: These exam details refer only to the
section on Big Data Management of the course "Big Data Management". Once you have passed the exam for this section, Prof. Maurizio Lenzerini, who is responsible for the course for this academic year, will be notified. The exam for the overall course
"Large Scale Data Management" will be officially recorded (verbalizzato) through the INFOSTUD system only once the student has passed the exams for all the sections of the course. For details on this final registration, please refer to the web page of the course "Large Scale Data Management".