LLM Fine-tuning: Day 4
GPT-2 Setup & Test Training
Today, I set up my GPT-2 model and ran a small test training run to make sure everything works before full training.
GPT-2 Setup
I set up the model by loading it along with its pre-trained tokenizer. The model will run on my computer’s CPU because I don’t have a CUDA-capable GPU available at the moment.
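A minimal sketch of what this setup step might look like, assuming the standard Hugging Face transformers API and the base "gpt2" checkpoint:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Fall back to CPU when no CUDA device is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the pre-trained GPT-2 weights and the matching tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device)
print(f"Running on: {device}")
```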
Then, I created a simple input format to train the model with one example.
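A sketch of that single-example check; the example text here is a hypothetical placeholder, and the forward pass with labels simply confirms a training loss can be computed before any generation:

```python
# Hypothetical single training example; the real format may differ
example = "Question: What is the capital of France? Answer: Paris."
inputs = tokenizer(example, return_tensors="pt").to(device)

# Forward pass with labels to make sure a training loss can be computed
outputs = model(**inputs, labels=inputs["input_ids"])
print("loss:", outputs.loss.item())

# Quick generation to confirm the model produces sensible text
generated = model.generate(
    **inputs,
    max_new_tokens=20,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token by default
)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```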
Thankfully, the model produced the expected output without any errors, giving me the green light to move on to test training.
Test Training
First, I loaded a small subset of the data (100 examples) to train my GPT-2 model on.
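Roughly, that loading step could look like this with the datasets library; the data file name is a placeholder, and I hold out a small slice as an evaluation set so eval loss can be tracked later:

```python
from datasets import load_dataset

# Placeholder data file; load only the first 100 examples for the test run
dataset = load_dataset("json", data_files="train_data.jsonl", split="train[:100]")

# Hold out a small slice as an evaluation set so eval loss can be tracked
split = dataset.train_test_split(test_size=0.1)
train_data, eval_data = split["train"], split["test"]
```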
Then, I loaded the model and added two special tokens (padding and separator) to the tokenizer to keep the input formatting uniform.
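Adding the two special tokens and resizing the embedding matrix might look like this; the exact token strings are assumptions on my part:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
# GPT-2 ships with neither a padding nor a separator token, so add both
tokenizer.add_special_tokens({"pad_token": "<|pad|>", "sep_token": "<|sep|>"})

model = GPT2LMHeadModel.from_pretrained("gpt2")
# Grow the embedding matrix so the newly added token ids have vectors
model.resize_token_embeddings(len(tokenizer))
```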
For the training arguments, I set the number of epochs to 1 and the learning rate to 5e-5 (a deliberately conservative rate) since the model is running on my computer’s CPU.
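A sketch of those training arguments; everything beyond the epoch count and the learning rate (output directory, batch size, logging cadence, wandb reporting) is an assumption:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt2-test-run",       # placeholder output directory
    num_train_epochs=1,               # a single pass for this smoke test
    learning_rate=5e-5,
    per_device_train_batch_size=2,    # assumed small batch size for CPU
    logging_steps=10,                 # assumed logging cadence
    eval_strategy="epoch",            # named evaluation_strategy in older releases
    report_to="wandb",                # send metrics to Weights & Biases
)
```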
After creating the trainer, I moved on to testing the output extraction. I used a low temperature for this test so that the output stays focused and nearly deterministic, which makes it easier to sanity-check on such a small subset of data.
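The trainer construction and the generation check could look roughly like this; the prompt format and the train_data/eval_data splits are assumptions carried over from the sketches above, and the datasets are assumed to be tokenized already:

```python
from transformers import Trainer, DataCollatorForLanguageModeling

# Causal language-modeling collator; pads batches with the new pad token
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_data,   # assumed: already tokenized splits
    eval_dataset=eval_data,
    data_collator=collator,
)
trainer.train()

# Output extraction: a low temperature keeps sampling close to greedy decoding
prompt = "Question: What is fine-tuning? <|sep|>"   # hypothetical prompt format
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    temperature=0.3,
    pad_token_id=tokenizer.pad_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```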
I also logged this test run to wandb (Weights & Biases) to visualize how the model is being trained. The downward slope of both the train and eval loss curves indicates that the model is fitting the training data and generalizing to unseen data.
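Enabling that logging only takes initializing a run before training; the project and run names here are placeholders, and with report_to="wandb" set in the training arguments above, the Trainer streams train and eval loss to the same run:

```python
import wandb

# Placeholder project/run names; the Trainer attaches to this run automatically
# because report_to="wandb" is set in the training arguments above.
wandb.init(project="gpt2-finetune", name="test-run-100-examples")
```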
That’s it for today; next, I will move on to full training.