All Resources
Important papers I have read, along with GitHub repositories containing code implementations — spanning foundational architectures, reasoning techniques, and model compression.
Papers
Attention is All You Need
Vaswani et al. (2017) - The foundational paper that introduced the Transformer architecture.
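The core operation of the Transformer is scaled dot-product attention, softmax(QKᵀ/√d_k)·V. A minimal NumPy sketch (the toy shapes and random inputs here are illustrative, not from the paper):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (Vaswani et al., 2017)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # weighted sum of values

# Toy example: 3 query positions attending over 4 key/value positions, d_k = 8
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 8): one output vector per query
```

The √d_k scaling keeps the dot products from growing with dimension, which would otherwise push the softmax into regions with vanishing gradients.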
Neural Machine Translation by Jointly Learning to Align and Translate
Bahdanau et al. (2015) - The paper that introduced attention mechanisms to seq2seq models.
Effective Approaches to Attention-based Neural Machine Translation
Luong et al. (2015) - Introduced global and local attention variants for improved efficiency.
Distilling the Knowledge in a Neural Network
Hinton et al. (2015) - Introduces knowledge distillation, showing how a smaller student model can be trained to mimic a larger teacher model using soft probability targets.
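The soft-target idea can be sketched as a cross-entropy between temperature-softened teacher and student distributions. A minimal sketch (the toy logits and the temperature T=2 are illustrative choices, not values from the paper):

```python
import numpy as np

def softmax(z, T=1.0):
    """Softmax with temperature T; higher T softens the distribution."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between softened teacher and student distributions
    (Hinton et al., 2015). The T^2 factor keeps gradient magnitudes
    comparable as T varies, as suggested in the paper."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -T**2 * float(np.sum(p_teacher * np.log(p_student)))

teacher = [5.0, 2.0, 0.5]   # hypothetical confident teacher logits
student = [3.0, 2.5, 1.0]   # hypothetical student logits
print(distillation_loss(student, teacher))
```

In practice this soft-target term is combined with the ordinary cross-entropy on the true labels, weighted by a hyperparameter.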
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Wei et al. (2022) - Demonstrates that prompting LLMs with intermediate reasoning steps significantly improves performance on complex reasoning tasks.
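Chain-of-thought prompting works by putting a worked example with intermediate steps in the prompt, so the model imitates the step-by-step reasoning before giving its final answer. A minimal sketch of such a prompt (the tennis-ball example is the well-known one from the paper; how you send it to a model depends on your API):

```python
# Few-shot chain-of-thought prompt: the exemplar answer spells out the
# intermediate arithmetic, nudging the model to do the same for the
# new question before stating a final answer.
cot_prompt = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?
A:"""

print(cot_prompt)
```

Without the reasoning steps in the exemplar (standard few-shot prompting), models are far more likely to jump straight to a wrong answer on multi-step problems.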
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek (2025) - Shows how reinforcement learning alone can incentivize chain-of-thought reasoning in LLMs, achieving performance comparable to OpenAI o1.