BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Date iCal//NONSGML kigkonsult.se iCalcreator 2.20.2//
METHOD:PUBLISH
X-WR-CALNAME;VALUE=TEXT:Eventi DIAG
BEGIN:VTIMEZONE
TZID:Europe/Paris
BEGIN:STANDARD
DTSTART:20191027T030000
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20200329T020000
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:calendar.18986.field_data.0@www.diag.uniroma1.it
DTSTAMP:20240420T072043Z
CREATED:20200203T164412Z
DESCRIPTION:Despite their formidable success in recent years\, a fundamenta
 l understanding of deep neural networks (DNNs) is still lacking. Open ques
 tions include the origin of the slowness of the training dynamics\, and th
 e relationship between the dimensionality of parameter space and the
  number of
  training examples\, since DNNs empirically generalize very well even when
  over-parametrized. A popular way to address these issues is to study the 
 topology of the cost function (the loss landscape) and the properties of t
 he algorithm used for training (usually stochastic gradient descent\, SGD)
 . Here\, we use methods and results coming from the physics of disordered
  systems\, in particular glasses and sphere packings. On the one hand\, we
  are able to understand to what extent DNNs resemble widely studied
  physical systems. On the other hand\, we use this knowledge to identify
  properties of the learning dynamics and of the landscape. In particular\,
  through the stu
 dy of time correlation functions in weight space\, we argue that the slow 
 dynamics is not due to barrier crossing\, but rather to an increasingly la
 rge number of null-gradient directions\, and we show that\, at the end of 
 learning\, the system is diffusing at the bottom of the landscape. We also
  find that DNNs exhibit a phase transition between over- and under-paramet
 rized regimes\, where perfect fitting can or cannot be achieved. We show t
 hat in this over-parametrized phase there cannot be spurious local minima.
  In the vicinity of this transition\, properties of the curvature of the
  loss function minima are critical. This kind of knowledge can be used
  both as
  a basis for a more grounded understanding of DNNs and for hands-on requir
 ements such as hyperparameter optimization and model selection.
DTSTART;TZID=Europe/Paris:20200213T103000
DTEND;TZID=Europe/Paris:20200213T103000
LAST-MODIFIED:20200521T211813Z
LOCATION:A7 - DIAG
SUMMARY:MORE@DIAG: Landscape and Training Dynamics of DNNs: lessons from ph
 ysics-inspired methods - Marco Baity Jesi
URL;TYPE=URI:http://www.diag.uniroma1.it/node/18986
END:VEVENT
END:VCALENDAR