BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Date iCal//NONSGML kigkonsult.se iCalcreator 2.20.2//
METHOD:PUBLISH
X-WR-CALNAME;VALUE=TEXT:DIAG Events
BEGIN:VTIMEZONE
TZID:Europe/Paris
BEGIN:STANDARD
DTSTART:20251026T030000
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
RDATE:20261025T030000
TZNAME:CET
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20260329T020000
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:calendar.30789.field_data.0@www.diag.uniroma1.it
DTSTAMP:20260518T113717Z
CREATED:20260506T044347Z
DESCRIPTION:Abstract\nSafety Neurons in Large Language Models\n
 Large language models (LLMs) achieve state-of-the-art performance across
  a wide range of t
 asks\, but their widespread deployment raises urgent concerns around secur
 ity\, privacy\, and misuse. Building on recent progress in sparse mechanis
 tic interpretability\, particularly findings from vision models\, this tal
 k examines the hypothesis that a small set of neurons or features may play
  a disproportionate role in safety-aligned behavior in LLMs. We begin by p
 resenting methods for identifying such sparse\, interpretable substructure
 s and evaluating how inference-time manipulation of these components can d
 egrade safety behavior in both white-box and black-box settings. We then e
 xtend this perspective to Mixture-of-Experts (MoE) models\, introducing a 
 training-free\, lightweight\, and architecture-agnostic framework for prob
 ing and stress-testing the safety alignment of modern MoE LLMs during infe
 rence. Finally\, the talk discusses the broader implications and applicati
 ons of “safety features\,” including their role in safety-relevant behavio
 r in code-generation models\, and highlights opportunities for more robust
  alignment and defense.\n\nBio. Stjepan Picek is a full professor at
  the University of Zagreb\, Faculty of Electrical Engineering and
  Computing\, Croati
 a. He also holds an associate professor position at Radboud University\, N
 ijmegen\, and an adjunct professor position at the University of Bergen\, 
 Norway. Before that\, he was an assistant professor at TU Delft and a post
 doctoral researcher at MIT\, USA\, and KU Leuven\, Belgium. Stjepan
  completed a PhD in computer science in 2015 at the University of Zagreb\,
  Croatia and Radboud University\, The Netherlands. In 2024\, he finished a
  PhD in m
 athematics at the University of Paris 8\, France. His research interests i
 nclude security and cryptography\, machine learning\, and evolutionary com
 putation. To date\, Stjepan has given more than 80 invited talks and publi
 shed more than 200 refereed papers. He is a program committee member and r
 eviewer for a number of conferences and journals and a member of several p
 rofessional societies. His work has been featured in the mainstream media 
 and on popular technology blogs. He is a member of ELLIS and a Fellow of t
 he Young Academy of Europe.\n\nStjepan Picek is visiting Sapienza/DIAG in
  the context of the EMAI program to conduct research and teaching
  activities.
DTSTART;TZID=Europe/Paris:20260508T100000
DTEND;TZID=Europe/Paris:20260508T100000
LAST-MODIFIED:20260506T055427Z
LOCATION:Aula A3\, via Ariosto 25\, Roma
SUMMARY:Safety Neurons in Large Language Models - Stjepan Picek
URL;TYPE=URI:https://www.diag.uniroma1.it/node/30789
END:VEVENT
END:VCALENDAR
