LLM Fine-tuning: Day 6
Full Training on Google Colab (T4 GPU)
Today, I continued optimizing my training script to reduce the time it takes to train my GPT-2 model.
First Attempt:
- Reduced the number of epochs from 3 to 2, which by itself cuts training time by roughly a third
- Switched to a higher learning rate (5e-4) for faster convergence
- Reduced warmup steps from 500 to 200
- Increased batch size from 12 to 16
- Increased the number of dataloader workers from 4 to 6
- Reduced how often evaluation and checkpoint saving run (sketched in code below)
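For reference, here is a rough sketch of how those first-attempt settings map onto Hugging Face `TrainingArguments` (assuming that is what my script uses; the output directory and the eval/save step counts are placeholders, not my real values):

```python
from transformers import TrainingArguments

# Sketch of the first-attempt settings; step counts are illustrative only.
training_args = TrainingArguments(
    output_dir="gpt2-finetuned",       # placeholder path
    num_train_epochs=2,                # down from 3
    learning_rate=5e-4,                # higher LR for faster convergence
    warmup_steps=200,                  # down from 500
    per_device_train_batch_size=16,    # up from 12 (later reverted to 12)
    dataloader_num_workers=6,          # up from 4
    eval_strategy="steps",             # `evaluation_strategy` in older transformers releases
    eval_steps=1000,                   # evaluate less often (illustrative value)
    save_steps=1000,                   # save checkpoints less often (illustrative value)
)
```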
I also tried enabling TensorFloat-32, but I noticed that it wasn’t supported on the T4 GPU.
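Enabling TF32 in PyTorch is normally just a pair of backend flags; since the T4 (a Turing GPU) has no TF32 tensor cores, they have no effect there:

```python
import torch

# TF32 only exists on Ampere (e.g. A100) and newer GPUs,
# so on a T4 these flags are effectively no-ops.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
```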
With those changes, I ran the code, but it failed with a CUDA "out of memory" error.
To stay within the memory limit, I reduced the batch size back to 12 but left everything else constant.
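To see how much headroom a batch size of 12 actually leaves on the T4's roughly 16 GB of memory, a quick PyTorch readout after a few training steps is enough (a sketch, assuming a CUDA device is available):

```python
import torch

# Rough readout of GPU memory pressure (assumes PyTorch with CUDA).
gb = 1024 ** 3
print(f"allocated: {torch.cuda.memory_allocated() / gb:.2f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved() / gb:.2f} GiB")
print(f"peak:      {torch.cuda.max_memory_allocated() / gb:.2f} GiB")
```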
In the end, I was able to bring the training time down to around 5 hours while using 90-95% of the T4 GPU.
Moving on, if I can’t bring the time down any further, I will simply train the model as it is.