LLM Fine-tuning: Day 6
Full Training on Google Colab (T4 GPU)
Today, I continued optimizing my training script to reduce the time it takes to train my GPT-2 model.
First Attempt:
- Reduced the number of epochs from 3 to 2, which by itself cuts training time by roughly a third
- Switched to a higher learning rate (5e-4) for faster convergence
- Reduced warmup steps from 500 to 200
- Increased batch size from 12 to 16
- Increased the number of dataloader workers from 4 to 6
- Reduced how often evaluation and checkpoint saving run (sketched in code below)
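For reference, here is a rough sketch of how those first-attempt settings map onto Hugging Face `TrainingArguments` (assuming that is what my script uses; the output directory and the eval/save step counts are placeholders, not my real values):

```python
from transformers import TrainingArguments

# Sketch of the first-attempt settings; step counts are illustrative only.
training_args = TrainingArguments(
    output_dir="gpt2-finetuned",       # placeholder path
    num_train_epochs=2,                # down from 3
    learning_rate=5e-4,                # higher LR for faster convergence
    warmup_steps=200,                  # down from 500
    per_device_train_batch_size=16,    # up from 12 (later reverted to 12)
    dataloader_num_workers=6,          # up from 4
    eval_strategy="steps",             # `evaluation_strategy` in older transformers releases
    eval_steps=1000,                   # evaluate less often (illustrative value)
    save_steps=1000,                   # save checkpoints less often (illustrative value)
)
```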
I also tried enabling TensorFloat-32, but I noticed that it wasn’t supported on the T4 GPU.
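Enabling TF32 in PyTorch is normally just a pair of backend flags; since the T4 (a Turing GPU) has no TF32 tensor cores, they have no effect there:

```python
import torch

# TF32 only exists on Ampere (e.g. A100) and newer GPUs,
# so on a T4 these flags are effectively no-ops.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
```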
With those changes, I ran the code, but it failed with a CUDA "out of memory" error.
To stay within the memory limit, I reduced the batch size back to 12 but left everything else constant.
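To see how much headroom a batch size of 12 actually leaves on the T4's roughly 16 GB of memory, a quick PyTorch readout after a few training steps is enough (a sketch, assuming a CUDA device is available):

```python
import torch

# Rough readout of GPU memory pressure (assumes PyTorch with CUDA).
gb = 1024 ** 3
print(f"allocated: {torch.cuda.memory_allocated() / gb:.2f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved() / gb:.2f} GiB")
print(f"peak:      {torch.cuda.max_memory_allocated() / gb:.2f} GiB")
```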
In the end, I was able to bring the training time down to around 5 hours while using 90-95% of the T4 GPU.
Moving on, if I can’t bring the time down any further, I will simply train the model as it is.