LLM Fine-tuning: Day 5
Full Training and Notebooks (Jupyter & Colab)
Full Training Script Development
Today, I worked on developing a script to train GPT-2 on all of the tokenized data, and did some debugging to expedite the process.
The code itself is largely similar to test-training.py; I just tweaked some details and removed the load_small_subset function. I first set up wandb to track the model's training progress, set up the model, and loaded the full dataset.
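A rough sketch of that setup is below, assuming the Hugging Face transformers and datasets libraries; the wandb project name, run name, and dataset path are placeholders, not the exact values from my script.

```python
import wandb
from datasets import load_from_disk
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Track the run in Weights & Biases (project/run names are placeholders)
wandb.init(project="gpt2-finetune", name="full-training-day5")

# Load the pretrained GPT-2 model and tokenizer
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Load the full tokenized dataset saved in an earlier step (path is a placeholder)
train_dataset = load_from_disk("tokenized_data/full")
```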
Then, I set the training arguments so that the learning rate is 5e-5 (the lowest of the rates I'm trying) and all of the other parameters are kept at low values so that the training process doesn't overload my local CPU.
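Continuing from the setup sketch above, here is a minimal version of those arguments using the Hugging Face Trainer API; apart from the 5e-5 learning rate, the specific values are illustrative assumptions rather than my exact settings.

```python
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling

training_args = TrainingArguments(
    output_dir="gpt2-full-run",         # placeholder output directory
    learning_rate=5e-5,                 # the conservative rate used for the full run
    per_device_train_batch_size=2,      # kept small to avoid overloading the local CPU
    num_train_epochs=1,                 # illustrative; the actual epoch count may differ
    logging_steps=50,
    report_to="wandb",                  # send metrics to the wandb run set up above
)

# Causal LM collator (no masked-language-modeling objective for GPT-2)
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=data_collator,
)
trainer.train()
```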
I kept the sampling temperature the same (0.7) as in the testing trial, since I'm still dealing with significantly less data than industry models.
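Temperature only comes into play when sampling from the model, not during training; a quick sketch of how the 0.7 value would be applied when generating from the model above (prompt and token count are made up for illustration):

```python
# Generate a short sample with the same temperature as the testing trial
prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.7,                     # same value as the earlier testing trial
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```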
I ran the code and everything seemed to work, but it was going to take over 200 hours. I did some testing in JupyterLab to see if I could optimize the training process, but that only brought the estimate down to 170 hours. So instead of using my computer's local CPU, I decided to migrate to Google Colab, which provides a free T4 GPU.
Colab
I first mounted my Google Drive in the Colab notebook, cloned my repository to gain access to my scripts, and installed all the required libraries.
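The setup cells looked roughly like the following, using standard Colab commands; the repository URL and requirements file name are placeholders for my actual repo.

```python
# Mount Google Drive so the notebook can read and write persistent files
from google.colab import drive
drive.mount('/content/drive')

# Clone the project repository and install its dependencies
# (URL and requirements file are placeholders)
!git clone https://github.com/<username>/<repo>.git
%cd <repo>
!pip install -r requirements.txt
```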
After that, I checked the status of the GPU to make sure it was working properly.
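Two standard checks for this, one through the shell and one through PyTorch:

```python
# Show the allocated GPU, driver version, and current memory usage
!nvidia-smi

# Confirm PyTorch can see the GPU
import torch
print(torch.cuda.is_available())        # should print True on a T4 runtime
print(torch.cuda.get_device_name(0))    # e.g. "Tesla T4"
```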
Then, I ran my full training code and it estimated about 10 hours, which was a significant improvement, but still far too long. I checked the GPU memory usage and it was only at 4.9/15.0 GB, so I did some more optimization to fully harness the GPU: I increased the batch size to make each training step more efficient, which brought the estimate down to about 7 hours.
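A sketch of that change, reusing the same TrainingArguments setup as before; the exact batch size shown here is illustrative, not my final value.

```python
# Larger per-device batch size to make better use of the T4's ~15 GB of memory
training_args = TrainingArguments(
    output_dir="gpt2-full-run",
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # increased from the small CPU-friendly value
    logging_steps=50,
    report_to="wandb",
)
```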
I also tweaked the learning rate to 1e-4 and 3e-4 for fine-tuning, but it didn’t make much of a difference.
Moving forward, I will work on further optimization so that I can train my model more efficiently.