BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Date iCal//NONSGML kigkonsult.se iCalcreator 2.20.2//
METHOD:PUBLISH
X-WR-CALNAME;VALUE=TEXT:Eventi DIAG
BEGIN:VTIMEZONE
TZID:Europe/Paris
BEGIN:STANDARD
DTSTART:20191027T030000
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20200329T020000
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:calendar.18986.field_data.0@www.diag.uniroma1.it
DTSTAMP:20230323T213904Z
CREATED:20200203T164412Z
DESCRIPTION:Despite their formidable success in recent years\, a
  fundamental understanding of deep neural networks (DNNs) is still
  lacking. Open questions include the origin of the slowness of the
  training dynamics\, and the relationship between the dimensionality
  of parameter space and the number of training examples\, since DNNs
  empirically generalize very well even when over-parametrized. A
  popular way to address these issues is to study the topology of the
  cost function (the loss landscape) and the properties of the
  algorithm used for training (usually stochastic gradient descent\,
  SGD). Here\, we use methods and results coming from the physics of
  disordered systems\, in particular glasses and sphere packings. On
  the one hand\, we are able to understand to what extent DNNs
  resemble widely studied physical systems. On the other hand\, we use
  this knowledge to identify properties of the learning dynamics and
  of the landscape. In particular\, through the study of time
  correlation functions in weight space\, we argue that the slow
  dynamics is not due to barrier crossing\, but rather to an
  increasingly large number of null-gradient directions\, and we show
  that\, at the end of learning\, the system is diffusing at the
  bottom of the landscape. We also find that DNNs exhibit a phase
  transition between over- and under-parametrized regimes\, where
  perfect fitting can or cannot be achieved. We show that in this
  over-parametrized phase there cannot be spurious local minima. In
  the vicinity of this transition\, properties of the curvature of
  the loss function minima are critical. This kind of knowledge can
  be used both as a basis for a more grounded understanding of DNNs
  and for hands-on requirements such as hyperparameter optimization
  and model selection.
DTSTART;TZID=Europe/Paris:20200213T103000
DTEND;TZID=Europe/Paris:20200213T103000
LAST-MODIFIED:20200521T211813Z
LOCATION:A7 - DIAG
SUMMARY:MORE@DIAG: Landscape and Training Dynamics of DNNs: lessons from ph
 ysics-inspired methods - Marco Baity Jesi
URL;TYPE=URI:http://www.diag.uniroma1.it/node/18986
END:VEVENT
END:VCALENDAR