BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Date iCal//NONSGML kigkonsult.se iCalcreator 2.20.2//
METHOD:PUBLISH
X-WR-CALNAME;VALUE=TEXT:Eventi DIAG
BEGIN:VTIMEZONE
TZID:Europe/Paris
BEGIN:STANDARD
DTSTART:20191027T030000
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20200329T020000
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:calendar.19542.field_data.0@www.diag.uniroma1.it
DTSTAMP:20260415T191838Z
CREATED:20200227T165838Z
DESCRIPTION:1. Robust Statistics for Data Reduction (March 9th - 10th
  2020\, 09:00-13:00\, DIAG\, Aula B203)\nProf. Alessio Farcomeni
  (Tor Vergata)\n\n2. Dimensionality Reduction in Clustering and
  Streaming (March 16th - 17th 2020\, 09:00-13:00\, DIAG\, Aula B203)\n
 Prof. Chris Schwiegelshohn (Sapienza)\n\nAbstracts\n\n1. Robust
  Statistics for Data Reduction\n\nWe will briefly introduce the main
  principles and ideas in robust statistics\, focusing on trimming
  methods. The working example will be that of estimation of location
  and scatter in multidimensional problems\, together with outlier
  identification.
  We will then discuss some methods for robust clustering based on
  impartial trimming and snipping. A simple robust method for
  dimensionality reduction will finally be discussed. Illustrations will
  be based on the R software and some contributed extension packages.\n\n
 Tentative schedule: Introduction to robust inference. Concepts of:
  masking\, swamping\, breakdown point\, Tukey-Huber contamination\,
  entry-wise contamination. Estimation of location and scatter based on
  the Minimum Covariance Determinant. The fastMCD algorithm. Outlier
  identification. Robust clustering: trimmed k-means\, snipped k-means.
  The tclust and sclust algorithms. Selecting the trimming level and
  number of clusters through the classification trimmed likelihood
  curves. Plug-in methods for dimension reduction. Brief overview of the
  most recent contributions and avenues for further work.\n\nThe course
  will be based on the book: Farcomeni\, A. and Greco\, L. (2015) Robust
  Methods for Data Reduction\, Chapman & Hall/CRC Press.\n\n2.
  Dimensionality Reduction in Clu
 stering and Streaming\n\nFirst day:\nThe curse of dimensionality is a
  common occurrence when working with large data sets. In few dimensions
  (such as the Euclidean plane)\, we can visualize problems very well and
  can often find interesting properties of a data set just by hand. In
  more than three dimensions\, our ability to visualize a problem is
  already severely impaired\, and our intuition from the Euclidean plane
  may lead us completely astray. Moreover\, algorithms often scale
  poorly:\n- Finding nearest neighbors in 2D can be done in nearly linear
  time. In high dimensions\, it becomes very difficult to improve over
  either n^2.\n- Geometric data structures and decompositions become
  hard to implement. Line sweeps\, Voronoi diagrams\, grids\, and nets
  usually scale by at least a factor of 2^d\, where d is the dimension.
  In some cases\, it may be even worse.\n- Many problems that are easy to
  solve in 2D\, such as clustering\, become computationally intractable
  in high dimensions. Often\, exact solutions require running times that
  are exponential in the number of dimensions.\n\nUnfortunately\,
  high-dimensional data sets are not t
 he exception\, but rather the norm in modern data analysis. As such\, much
  of computational data analysis has been devoted to finding ways to redu
 ce the dimension. In this course\, we will study two popular methods\, nam
 ely principal component analysis (PCA) and random projections. Principal c
 omponent analysis originated in statistics\, but is also known under vario
 us other names\, depending on the field (e.g. eigenvector problem\,
  low-rank approximation\, etc.). We will illustrate the method\,
  highlighting the
  problem that is solved and the underlying assumptions of PCA. Next\, we w
 ill see a powerful tool for dimension reduction known as the Johnson-Linde
 nstrauss lemma. The Johnson-Lindenstrauss lemma states that given a point 
 set A in an arbitrary high dimension\, we can transform A into a point set
  A' in dimension O(log |A|)\, while approximately preserving all
  pairwise distances. For both of these methods\, we will see
  applications\, including k-nearest neighbor classification and
  k-means.\n\nSecond day:\nLarge data sets form a sister topic to
  dimension reduction. While the benefits of having a small dimensi
 on are immediately understood\, reducing the size of the data is a compara
 tively recent paradigm. There are many reasons for data compression. Aside
  from data storage and retrieval\, we want to minimize the amount of commu
 nication in distributed computing\, enable online and streaming algorithms
 \, or simply run an accurate (but expensive) algorithm on a smaller datase
 t. A key concept in large-scale data analysis is the coreset. We view
  corese
 ts as a succinct summary of a data set that behaves\, for any candidate so
 lution\, like the original data set. The surprising success story of data 
 compression is that for many problems\, we can construct coresets whose
  size is independent of the number of input points. For example\,
  linear regression in d dimensions admits coresets of size O(d)\, and
  k-means has coresets of size O(k)\, irrespe
 ctive of the number of data points of the original data set. In our course
 \, we will describe the coreset paradigm formally. Moreover\, we will give
  an overview of methods to construct coresets for various problems. Exampl
 es include constructing coresets from random projections\, by analyzing gr
 adients\, or via sampling. We will further highlight a number of applicati
 ons. 
DTSTART;TZID=Europe/Paris:20200309T090000
DTEND;TZID=Europe/Paris:20200317T130000
LAST-MODIFIED:20200227T170118Z
LOCATION:DIAG - Aula B203
SUMMARY:Computational and Statistical Methods of Data Reduction - Data Scie
 nce PhD Course - Prof. Alessio Farcomeni (Univ. of Tor Vergata) - Dr. Chri
 s Schwiegelshohn (DIAG - Sapienza)
URL;TYPE=URI:https://www.diag.uniroma1.it/node/19542
END:VEVENT
END:VCALENDAR