Supervised fine-tuning teaches a model from example outputs. Reinforcement learning (RL) teaches from *rewards* -- the model generates its own outputs, and a reward function scores them. The model then learns to produce outputs that earn higher reward.
In tutorial 04, we wrote a GRPO training loop from scratch: sample completions, grade them, compute advantages, build datums, train. That works, but every new task would repeat the same boilerplate.
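That per-task boilerplate can be sketched end to end. The names below (`sample_completions`, `reward_fn`, and so on) are hypothetical stand-ins for illustration, not the tutorial's actual API; a real loop would sample from the model and call an optimizer step instead of the toy versions here:

```python
import random

def sample_completions(prompt, n):
    # Stand-in for model sampling; real code would query the model being trained.
    return [f"{prompt} -> answer {random.randint(0, 9)}" for _ in range(n)]

def reward_fn(completion):
    # Toy grader: reward 1.0 if the final digit is even, else 0.0.
    return 1.0 if int(completion.split()[-1]) % 2 == 0 else 0.0

def grpo_advantages(rewards):
    # Group-relative advantage: each completion's reward minus the group mean,
    # so above-average completions get positive advantage.
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

def build_datums(prompt, completions, advantages):
    # Pair each completion with its advantage for the training step.
    return [{"prompt": prompt, "completion": c, "advantage": a}
            for c, a in zip(completions, advantages)]

prompt = "What is 2 + 2?"
completions = sample_completions(prompt, 4)   # sample
rewards = [reward_fn(c) for c in completions] # grade
advantages = grpo_advantages(rewards)         # compute advantages
datums = build_datums(prompt, completions, advantages)  # build datums
# A real loop would now pass `datums` to a training step and repeat.
```

Note that the advantages sum to zero within each group: GRPO's baseline is the group mean, so the update pushes probability toward the better-than-average completions and away from the worse ones.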