All Resources
Important papers I have read, along with GitHub repositories containing code implementations — spanning foundational architectures, reasoning techniques, and model compression.
Papers
Attention is All You Need
Vaswani et al. (2017) - The foundational paper that introduced the Transformer architecture.
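The core operation of the Transformer is scaled dot-product attention, softmax(QKᵀ/√d_k)·V. A minimal NumPy sketch (the toy shapes and random inputs here are illustrative, not from the paper):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (Vaswani et al., 2017)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # weighted sum of values

# Toy example: 3 query positions attending over 4 key/value positions, d_k = 8
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 8): one output vector per query
```

The √d_k scaling keeps the dot products from growing with dimension, which would otherwise push the softmax into regions with vanishing gradients.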
Neural Machine Translation by Jointly Learning to Align and Translate
Bahdanau et al. (2015) - The paper that introduced attention mechanisms to seq2seq models.
Effective Approaches to Attention-based Neural Machine Translation
Luong et al. (2015) - Introduced global and local attention variants for improved efficiency.
Distilling the Knowledge in a Neural Network
Hinton et al. (2015) - Introduces knowledge distillation, showing how a smaller student model can be trained to mimic a larger teacher model using soft probability targets.
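The soft-target idea can be sketched as a cross-entropy between temperature-softened teacher and student distributions. A minimal sketch (the toy logits and the temperature T=2 are illustrative choices, not values from the paper):

```python
import numpy as np

def softmax(z, T=1.0):
    """Softmax with temperature T; higher T softens the distribution."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between softened teacher and student distributions
    (Hinton et al., 2015). The T^2 factor keeps gradient magnitudes
    comparable as T varies, as suggested in the paper."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -T**2 * float(np.sum(p_teacher * np.log(p_student)))

teacher = [5.0, 2.0, 0.5]   # hypothetical confident teacher logits
student = [3.0, 2.5, 1.0]   # hypothetical student logits
print(distillation_loss(student, teacher))
```

In practice this soft-target term is combined with the ordinary cross-entropy on the true labels, weighted by a hyperparameter.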
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Wei et al. (2022) - Demonstrates that prompting LLMs with intermediate reasoning steps significantly improves performance on complex reasoning tasks.
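Chain-of-thought prompting works by putting a worked example with intermediate steps in the prompt, so the model imitates the step-by-step reasoning before giving its final answer. A minimal sketch of such a prompt (the tennis-ball example is the well-known one from the paper; how you send it to a model depends on your API):

```python
# Few-shot chain-of-thought prompt: the exemplar answer spells out the
# intermediate arithmetic, nudging the model to do the same for the
# new question before stating a final answer.
cot_prompt = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?
A:"""

print(cot_prompt)
```

Without the reasoning steps in the exemplar (standard few-shot prompting), models are far more likely to jump straight to a wrong answer on multi-step problems.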
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek (2025) - Shows how reinforcement learning alone can incentivize chain-of-thought reasoning in LLMs, achieving performance comparable to OpenAI o1.