BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Date iCal//NONSGML kigkonsult.se iCalcreator 2.20.2//
METHOD:PUBLISH
X-WR-CALNAME;VALUE=TEXT:Eventi DIAG
BEGIN:VTIMEZONE
TZID:Europe/Paris
BEGIN:STANDARD
DTSTART:20191027T030000
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
RDATE:20201025T030000
TZNAME:CET
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20200329T020000
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:calendar.19896.field_data.0@www.diag.uniroma1.it
DTSTAMP:20260415T021840Z
CREATED:20200511T063258Z
DESCRIPTION:The lectures of the Data Science PhD course on Computational
  and Statistical Methods of Data Reduction will be held this week online
  at meet.google.com/hkg-kgxg-azp with the following program:\n\n
 I. Computational methods: sampling and inferential issues (May 11th-12th
  2020\, 09:00-13:00)\, Prof. Serena Arima (Università del Salento)\n
 II. Dimensionality Reduction in Clustering and Streaming (May 14th-15th
  2020\, 09:00-13:00)\, Prof. Chris Schwiegelshohn (La Sapienza)\n\n
 I. Computational methods: sampling and inferential issues (Prof. Arima)\n
 1. Random number generation algorithms:\n- Acceptance-rejection
  algorithm\;\n- Monte Carlo methods\;\n- Importance sampling\;\n
 - Gibbs sampling\;\n- Antithetic variables.\n
 2. Numerical methods for likelihood inference:\n- EM algorithm\;\n
 - Bootstrap\;\n- Jackknife.\n
 3. Monte Carlo and Markov Chain Monte Carlo.\n\n
 II. Dimensionality Reduction in Clustering and Streaming
  (Prof. Schwiegelshohn)\nFirst day: The curse of dimensionality is a
  common occurrence when working with large data sets. In few dimensions
  (such as the Euclidean plane)\, we can visualize problems very well and
  often find interesting properties of a data set by hand. In more than
  three dimensions\, our ability to visualize a problem is severely
  impaired and our intuition from the Euclidean plane may lead us
  completely astray. Moreover\, algorithms often scale poorly: finding
  nearest neighbors in 2D can be done in nearly linear time\, whereas in
  high dimensions it becomes very difficult to improve on n^2. Geometric
  data structures and decompositions become hard to implement: line
  sweeps\, Voronoi diagrams\, grids\, and nets usually scale by at least
  a factor of 2^d\, where d is the dimension\, and in some cases even
  worse. Many problems that are easy to solve in 2D\, such as
  clustering\, become computationally intractable in high dimensions\;
  exact solutions often require running times that are exponential in
  the number of dimensions. Unfortunately\, high-dimensional data sets
  are not the exception but the norm in modern data analysis. As such\,
  much of computational data analysis has been devoted to finding ways
  to reduce the dimension. In this course\, we will study two popular
  methods\, namely principal component analysis (PCA) and random
  projections. Principal component analysis originated in statistics\,
  but is also known under various other names depending on the field
  (e.g. eigenvector problem\, low-rank approximation\, etc.). We will
  illustrate the method\, highlighting the problem it solves and the
  underlying assumptions of PCA. Next\, we will see a powerful tool for
  dimension reduction known as the Johnson-Lindenstrauss lemma. The
  Johnson-Lindenstrauss lemma states that given a point set A in
  arbitrarily high dimension\, we can transform A into a point set A' in
  dimension O(log |A|) while preserving all pairwise distances. For both
  of these methods\, we will see applications\, including k-nearest
  neighbor classification and k-means.\nSecond day: Large data sets form
  a sister topic to dimension reduction. While the benefits of having a
  small dimension are immediately understood\, reducing the size of the
  data is a comparatively recent paradigm. There are many reasons for
  data compression. Aside from data storage and retrieval\, we want to
  minimize the amount of communication in distributed computing\, enable
  online and streaming algorithms\, or simply run an accurate (but
  expensive) algorithm on a smaller data set. A key concept in
  large-scale data analysis is the coreset. We view a coreset as a
  succinct summary of a data set that behaves\, for any candidate
  solution\, like the original data set. The surprising success story of
  data compression is that for many problems we can construct coresets
  of size independent of the input. For example\, linear regression in d
  dimensions admits coresets of size O(d)\, and k-means has coresets of
  size O(k)\, irrespective of the number of data points in the original
  data set. In our course\, we will describe the coreset paradigm
  formally. Moreover\, we will give an overview of methods to construct
  coresets for various problems\; examples include constructing coresets
  from random projections\, by analyzing gradients\, or via sampling. We
  will further highlight a number of applications.
DTSTART;TZID=Europe/Paris:20200511T083000
DTEND;TZID=Europe/Paris:20200511T083000
LAST-MODIFIED:20200528T094447Z
LOCATION:DIAG - Sapienza
SUMMARY:Data Science PhD course on Computational and Statistical Methods
  of Data Reduction - Prof. Serena Arima and Prof. Chris Schwiegelshohn
URL;TYPE=URI:http://www.diag.uniroma1.it/node/19896
END:VEVENT
END:VCALENDAR
