A progressive path through modern AI — from matrix multiplication to Transformers. Each concept builds on the last. See it, break it, understand it.
The fundamental data structures of deep learning — scalars, vectors, matrices — and the dot product operation that powers every neural network layer.
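A taste of the idea, as a minimal pure-Python sketch (illustrative only, not the lesson's code): the dot product pairs up two equal-length vectors, multiplies, and sums.

```python
# A dot product combines two equal-length vectors into one scalar.
def dot(a, b):
    assert len(a) == len(b)
    return sum(x * y for x, y in zip(a, b))

# Every dense layer output is dot(weights, inputs) plus a bias.
result = dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])  # 1*4 + 2*5 + 3*6 = 32.0
```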
How individual neurons combine weights, inputs, and activation functions to form Multi-Layer Perceptrons — the universal function approximators.
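Sketched in a few lines of Python (the weights here are arbitrary, chosen only for illustration): a neuron is a weighted sum passed through a nonlinearity, and an MLP is just neurons feeding neurons.

```python
import math

def neuron(inputs, weights, bias):
    # Weighted sum of inputs, then a sigmoid activation.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Tiny 2-layer MLP: two hidden neurons feed one output neuron.
def mlp(x):
    h1 = neuron(x, [0.5, -0.6], 0.1)
    h2 = neuron(x, [-0.3, 0.8], 0.0)
    return neuron([h1, h2], [1.2, -0.7], 0.2)
```

With the sigmoid on the output, `mlp` always returns a value strictly between 0 and 1.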
The chain rule applied recursively — how neural networks calculate the gradient of a loss function with respect to every weight, enabling learning.
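The core move, shrunk to one weight (a hypothetical toy model, not the lesson's notation): apply the chain rule to get the gradient of the loss with respect to the weight, then step against it.

```python
# Toy model y_hat = w * x with squared loss L = (y_hat - y)^2.
# Chain rule: dL/dw = dL/dy_hat * dy_hat/dw = 2*(y_hat - y) * x
def grad_w(w, x, y):
    y_hat = w * x
    return 2.0 * (y_hat - y) * x

w, x, y, lr = 0.0, 2.0, 4.0, 0.1
for _ in range(50):
    w -= lr * grad_w(w, x, y)   # gradient descent step
# w converges toward 2.0, since 2.0 * x == y
```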
How spatial feature extraction using shared filter weights gives CNNs their power for images — and the convolution operation that makes it work.
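The sliding-window idea in 1-D (a simplified sketch; deep learning libraries actually compute cross-correlation and call it convolution): one small set of kernel weights is reused at every input position.

```python
# 1-D "valid" convolution: the same kernel weights slide across the input.
def conv1d(signal, kernel):
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# A [1, -1] kernel responds only where neighboring values differ.
edges = conv1d([0, 0, 1, 1, 1, 0], [1, -1])
```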
How recurrent networks process sequences through hidden state, and the vanishing gradient problem that crippled them — and the LSTM solution.
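A minimal recurrent cell (weights are illustrative placeholders): the hidden state `h` is rewritten at every step, carrying a compressed summary of everything seen so far.

```python
import math

# Minimal recurrent cell: hidden state h carries context across steps.
def rnn(inputs, w_x=0.5, w_h=0.9, b=0.0):
    h = 0.0
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
    return h

# Repeated multiplication by w_h (and tanh's sub-1 derivative) is why
# gradients shrink over long sequences; LSTMs add a gated cell state.
final_state = rnn([1.0, 0.0, -1.0])
```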
The Query–Key–Value framework that lets models selectively focus on any part of a sequence — the breakthrough enabling modern language models.
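Scaled dot-product attention for a single query, sketched without any framework (vectors and shapes are made up for the example): score the query against every key, softmax the scores into weights, and blend the values.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

# Scaled dot-product attention for one query over a short sequence.
def attention(query, keys, values):
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)          # how much to focus on each position
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]
```

A query that matches the first key more strongly pulls the output toward the first value.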
Multi-head attention, residual connections, layer normalization, and positional encoding — the architecture behind GPT, BERT, and all modern LLMs.
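Two of those ingredients, sketched together in plain Python (a toy version, not a real Transformer block): layer normalization rescales a vector to zero mean and unit variance, and a residual connection adds a sublayer's output back onto its input.

```python
import math

def layer_norm(x, eps=1e-5):
    # Normalize to zero mean and (near-)unit variance.
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / math.sqrt(var + eps) for v in x]

# Residual connection: add the sublayer's output back to its input,
# so gradients can flow straight through deep stacks, then normalize.
def residual_block(x, sublayer):
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])

out = residual_block([1.0, 2.0, 3.0], lambda v: [0.0 for _ in v])
```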
How text becomes numbers: BPE tokenization, the embedding lookup table, semantic vector spaces, and why embeddings capture meaning geometrically.
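The lookup-table half of the story, in miniature (vocabulary and vector values are invented for the example; real tables are learned during training): each token ID simply indexes a row of the embedding matrix.

```python
# Token IDs index rows of a learned embedding matrix.
vocab = {"the": 0, "cat": 1, "sat": 2}
embedding_table = [
    [0.1, 0.3],   # row 0: "the"
    [0.9, 0.7],   # row 1: "cat"
    [0.4, 0.8],   # row 2: "sat"
]

def embed(tokens):
    # Map each token to its ID, then to its row of the table.
    return [embedding_table[vocab[t]] for t in tokens]

vectors = embed(["the", "cat", "sat"])
```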