Prediction and Learning
This part follows the model from final hidden states to training updates. It explains how GPT scores candidate next tokens, measures its prediction error, and updates its weights.
- In 11 Vocabulary Projection — From Vectors to Words, the final vector becomes logits and next-token probabilities.
- In 12 Loss — How the Model Learns, predictions become a scalar training signal through cross-entropy, with perplexity as the exponentiated, easier-to-read form of that loss.
- In 13 Training — Teaching the Model, gradients and weight updates connect the loss back to the model parameters.
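
The three steps listed above can be previewed in one small sketch. It is only an illustration under stated assumptions, not the code used in the chapters: the hidden vector `h`, the projection matrix `W`, the target index, and the learning rate are made-up toy values, and the update is a single step of plain gradient descent on the projection matrix alone rather than on a full model.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward(W, h, target):
    # Vocabulary projection: one logit per vocabulary entry (row of W).
    logits = [sum(w * x for w, x in zip(row, h)) for row in W]
    probs = softmax(logits)
    # Cross-entropy for one position: negative log-probability of the target.
    loss = -math.log(probs[target])
    return probs, loss

def sgd_step(W, h, target, lr):
    # For softmax + cross-entropy, dLoss/dW[k][j] = (probs[k] - onehot[k]) * h[j].
    probs, _ = forward(W, h, target)
    new_W = []
    for k, row in enumerate(W):
        err = probs[k] - (1.0 if k == target else 0.0)
        new_W.append([w - lr * err * x for w, x in zip(row, h)])
    return new_W

# Toy values (assumptions for illustration): 2-dim hidden state, 3-token vocabulary.
h = [1.0, -0.5]
W = [[0.2, 0.1], [-0.3, 0.4], [0.5, -0.2]]
target = 1

probs, loss_before = forward(W, h, target)
perplexity_before = math.exp(loss_before)  # perplexity = exp(cross-entropy)

W = sgd_step(W, h, target, lr=0.5)
_, loss_after = forward(W, h, target)

print("probabilities:", probs)
print("loss before / after one update:", loss_before, loss_after)
print("perplexity before update:", perplexity_before)
```

With a single target token, the perplexity equals the reciprocal of the probability assigned to that token, and the loss after the update is lower than before because the step moves `W` against the gradient.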