Max of List
Given a list of numbers, predict the maximum. We have two models of varying complexity, both attention-only transformers trained to 100% accuracy. Your goal: reverse-engineer the algorithm each model has learned.
Puzzle 1a is the easier variant — each number (0–9) gets its own token, and the model is just a single attention layer. A good place to start.
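As a warm-up, the 1a task data is easy to mock up yourself. A minimal sketch (the exact sequence length and format here are assumptions; the starter notebook defines the real ones):

```python
import random

def make_example(length=5, vocab=10, seed=None):
    """Sample a list of single-digit tokens and its max label.

    length and vocab are hypothetical defaults, not the puzzle's
    official settings.
    """
    rng = random.Random(seed)
    tokens = [rng.randrange(vocab) for _ in range(length)]
    return tokens, max(tokens)

tokens, label = make_example(seed=0)
```

Feeding batches like this through the model and watching where the final position attends is a natural first experiment.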
Puzzle 1b is a bit more complex — numbers range from 0–99, but each number is tokenized as two separate digit tokens (tens and ones). The model has two attention layers and must first combine the information from a number's two digits before it can compare numbers.
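The two-digit tokenization can be sketched as follows (function names are mine, not the notebook's; it may also use special separator tokens that this sketch omits):

```python
def tokenize_two_digit(n):
    """Split a number in 0-99 into its (tens, ones) digit tokens."""
    assert 0 <= n < 100
    return [n // 10, n % 10]

def tokenize_list(numbers):
    """Flatten a list of two-digit numbers into a digit-token sequence."""
    tokens = []
    for n in numbers:
        tokens.extend(tokenize_two_digit(n))
    return tokens
```

Note that under this scheme a single-digit number like 7 becomes `[0, 7]`, so every number occupies exactly two positions.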
The starter notebook loads the pre-trained weights from HuggingFace and walks you through basic inference and attention visualization. Open it in Colab, save a copy to your own Drive, and start exploring.
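If you want to poke at attention patterns beyond the provided helpers, the underlying computation is just a softmax over (masked) scaled dot-product scores. A self-contained NumPy sketch with random weights and hypothetical dimensions, not the trained model:

```python
import numpy as np

def attention_pattern(q, k):
    """Causal attention pattern: softmax over scaled dot-product scores.

    q, k: [seq, d_head] query and key vectors for one head.
    Returns a [seq, seq] matrix whose rows sum to 1.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    # Causal mask: each position may only attend to itself and earlier ones.
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores[mask] = -np.inf
    # Numerically stable softmax per query row.
    pattern = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return pattern / pattern.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
q = rng.normal(size=(6, 8))
k = rng.normal(size=(6, 8))
pat = attention_pattern(q, k)
```

Plotting `pat` as a heatmap (e.g. with `plt.imshow`) is the kind of well-labeled visualization worth including in your submission.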
Submit a link to a Colab notebook with your findings. Your notebook should be clean and easy to follow — use markdown cells to explain your reasoning clearly, include well-labeled plots to support your claims, and walk the reader through the algorithm(s) you found. Think of it as a presentation of your results, not scratch work.
A good workflow: do your rough exploration in a working notebook, then create a fresh notebook for your submission where you present things clearly and concisely.
Deadline: April 30, 2026 (anywhere on Earth)