
Testing GPT-2 Setup & Test Training

Today, I worked on setting up my GPT-2 model and tested training it with a small sample to ensure that everything works before full training.

GPT-2 Setup

I set up the model by loading it and its pre-trained tokenizer. The model runs on my computer’s CPU because I don’t have a CUDA-capable GPU available at the moment.

[Screenshot: loading GPT-2 and its tokenizer]
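In rough terms, the setup is something like this minimal sketch (assuming the Hugging Face `transformers` library and the base `gpt2` checkpoint):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the pre-trained GPT-2 model and its tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# No CUDA GPU is available, so fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
```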

Then, I created a simple input format and trained the model on a single example as a sanity check.

[Screenshot: single-example input and test run]
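The one-example check is roughly of this shape; the example text below is just a hypothetical placeholder, not my actual format:

```python
# One hand-written example to sanity-check the pipeline before real training
text = "Question: What is 2 + 2? Answer: 4"  # hypothetical example format
inputs = tokenizer(text, return_tensors="pt").to(device)

# A forward pass with labels confirms the loss computes without errors
loss = model(**inputs, labels=inputs["input_ids"]).loss
print(loss.item())
```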

Thankfully, the model produced the expected outcome without any errors, giving me the green light to move on to test training.

Test Training

First, I loaded a small subset (100 examples) of the data to train my GPT-2 model.

[Screenshot: loading the 100-example subset]
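Loading a 100-example subset can be done with the `datasets` library’s split slicing; the file name below is a placeholder, not my actual data:

```python
from datasets import load_dataset

# Load just the first 100 examples for the test run.
# "my_dataset.json" is a placeholder for the actual data file.
dataset = load_dataset("json", data_files="my_dataset.json", split="train[:100]")
print(len(dataset))  # should print 100
```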

Then, I loaded the model and added two special tokens (a padding token and a separator token) to ensure that the formatting is uniform.

[Screenshot: adding the padding and separator tokens]
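Adding the two special tokens is roughly as follows; the `<pad>` and `<sep>` strings are assumed placeholders, and the embedding matrix has to be resized to match the new vocabulary size:

```python
# GPT-2 has no padding or separator token by default, so add them;
# the token strings here are assumed placeholders.
tokenizer.add_special_tokens({"pad_token": "<pad>", "sep_token": "<sep>"})

# Resize the embedding matrix to account for the new tokens
model.resize_token_embeddings(len(tokenizer))
```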

For the training arguments, I set the number of epochs to 1 and the learning rate to 5e-5 (a small, conservative rate) since the model is running on my computer’s CPU.

[Screenshot: training arguments]
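A minimal sketch of those training arguments (the output directory and batch size below are assumptions on my part):

```python
from transformers import TrainingArguments

# Keep the run small since everything is on CPU:
# one epoch and a conservative 5e-5 learning rate.
training_args = TrainingArguments(
    output_dir="gpt2-test-run",        # placeholder output directory
    num_train_epochs=1,
    learning_rate=5e-5,
    per_device_train_batch_size=2,     # assumed small batch size for CPU
)
```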

After creating the trainer, I moved on to testing the output extraction. I used a low temperature for this test so that the model’s output stays focused and predictable, since it was only trained on a small subset of data.

[Screenshot: trainer creation and output extraction test]
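Putting the trainer together and testing generation at a low temperature looks roughly like this sketch; the `text` column name, the train/eval split, the prompt, and the exact temperature value are all assumptions here:

```python
from transformers import Trainer, DataCollatorForLanguageModeling

# Tokenize the small subset loaded earlier (the "text" column name is assumed)
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True)
split = tokenized.train_test_split(test_size=0.1)

# Collator for causal language modelling (no masked-LM objective)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=split["train"],
    eval_dataset=split["test"],
    data_collator=collator,
)
trainer.train()

# Test output extraction with a low temperature so sampling stays focused
inputs = tokenizer("Example prompt", return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.3,                 # low temperature for the test
    pad_token_id=tokenizer.pad_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```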

I also logged this test run to Weights & Biases (wandb) to visualize how my model is training. The downward slopes of both the train and eval loss curves indicate that the model is fitting the training data and generalizing well to unseen data.
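Hooking the run up to wandb is roughly a one-liner (the project name below is a placeholder), plus setting `report_to="wandb"` in the training arguments so the Trainer sends its metrics there:

```python
import wandb

# Start a Weights & Biases run; the Trainer logs train/eval loss to it
# when report_to="wandb" is set in TrainingArguments.
wandb.init(project="gpt2-test-training")   # project name is a placeholder
```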

[W&B charts: train loss and eval loss curves]

That’s it for today; next, I will move on to full training.