Self-Supervised Learning, JEPA, World Models, and AI’s future

Abstract
Current AI architectures remain significantly inferior to humans and animals in their ability to reason, plan, and understand the physical world. This lecture unpacks the rationale for moving beyond supervised and reinforcement learning toward objective-driven AI. We will explore Moravec’s paradox — the observation that perception and sensorimotor skills, which humans and animals acquire with remarkable data efficiency, are precisely the tasks machines find computationally hardest — and the inherent limitations of autoregressive models, whose token-by-token generation leads to hallucinations and compounding errors. The talk focuses on the transition from simple feedforward prediction (‘system 1’) toward world models capable of deliberate reasoning and inference through optimisation (‘system 2’). We will introduce the conceptual framework of Joint-Embedding Predictive Architectures (JEPA) and Energy-Based Models (EBMs) as a unifying mathematical language for self-supervised learning.
- Option A: recursion and optimal control. This path examines the ‘action’ component of the world model. We will discuss the role of recursion as embodied by Recurrent Neural Network (RNN) equations and show how to use backpropagation through time to find optimal sequences of actions via Model-Predictive Control (MPC).
- Option B: latent-variable energy-based models. This path uses a toy architectural example to master the mechanics of EBMs. We will define the relationship between energy and free energy, explore various loss functionals, and demonstrate how inference is performed as a minimisation problem.
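To make Option A concrete, here is a minimal sketch (my own illustration, not code from the lecture) of planning-as-optimisation: a toy 1-D integrator serves as the world model, the gradient of the final cost with respect to each action is derived by hand (backpropagation through time is trivial for this linear recurrence), and gradient descent on the action sequence plays the role of MPC. The dynamics, cost, and all names are assumptions.

```python
# A toy MPC-via-BPTT sketch (illustrative assumption, not the lecture's code).
# World model: a 1-D integrator, x_{t+1} = x_t + u_t.
# Cost: (x_T - goal)^2 + lam * sum(u_t^2).

def rollout(x0, u):
    """Unroll the world-model recurrence over the action sequence."""
    x = x0
    for a in u:
        x = x + a          # one dynamics step
    return x

def plan(x0, goal, horizon=5, lam=0.1, lr=0.05, steps=500):
    """Gradient descent on the actions, i.e. inference through optimisation."""
    u = [0.0] * horizon
    for _ in range(steps):
        x_T = rollout(x0, u)
        # BPTT for this linear system: d x_T / d u_t = 1 for every t,
        # so each action's gradient is 2*(x_T - goal) + 2*lam*u_t.
        g = 2.0 * (x_T - goal)
        u = [a - lr * (g + 2.0 * lam * a) for a in u]
    return u

actions = plan(x0=0.0, goal=1.0)
print(rollout(0.0, actions))   # reaches ≈ 0.98: the action penalty lam
                               # trades off against hitting the goal exactly
```

With a nonlinear (learned) world model, the hand-derived gradient above would be replaced by automatic differentiation through the unrolled recurrence, but the structure of the loop is the same.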
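For Option B, a similarly minimal sketch of a latent-variable EBM (again my own toy, with assumed names, not the lecture's architecture): a decoder maps a scalar latent z onto the unit circle, the energy E(y, z) is the squared reconstruction distance, and inference is the minimisation over z; the zero-temperature free energy is then F(y) = min_z E(y, z).

```python
import math

# A toy latent-variable EBM (illustrative assumption, not the lecture's example).
# Decoder: z -> (cos z, sin z), a point on the unit circle.
# Energy:  E(y, z) = ||y - decoder(z)||^2.

def energy(y, z):
    dx = y[0] - math.cos(z)
    dy = y[1] - math.sin(z)
    return dx * dx + dy * dy

def infer(y, n_grid=1000):
    """Inference as minimisation: return the zero-temperature free energy
    F(y) = min_z E(y, z) and the minimising latent, via grid search
    over z in [0, 2*pi)."""
    zs = [2.0 * math.pi * i / n_grid for i in range(n_grid)]
    z_star = min(zs, key=lambda z: energy(y, z))
    return energy(y, z_star), z_star

F, z_star = infer((2.0, 0.0))
print(F, z_star)   # F ≈ 1.0, z* ≈ 0.0: the closest circle point to (2, 0) is (1, 0)
```

The grid search stands in for the latent minimisation; a finite-temperature free energy would instead soft-minimise, replacing min_z with a log-sum-exp over z.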
Ultimately, this lecture provides a technical roadmap for moving beyond Generative AI toward autonomous and controllable systems. It addresses the ‘why’ by analysing the data inefficiency of current paradigms, the ‘what’ by defining objective-driven world-model architectures, and the ‘how’ by exploring the mathematical foundations of Joint-Embedding Predictive Architectures (JEPA) and Energy-Based Models (EBMs).
Speaker reference: https://atcold.github.io