GPU out of memory during forward pass in inceptionV3

moo3030 · February 28, 2024, 2:56pm

I am trying to train an InceptionV3 model with a batch size of 256. The whole dataset does not exceed 5MB and the model is around 45MB, so I have no problem loading the model and the batches of data.
The error I am getting looks like this:

The code crashes in the first epoch, on the first batch, during the first forward pass. Am I doing something wrong, or is this expected behavior? The forward pass takes about 12GB on its own.

This is my code:

(Ignore the CPU memory usage; it is not reported correctly.)

Thanks!

ptrblck · February 28, 2024, 10:23pm

The OOM might be expected, since the forward activations, which are needed to compute the gradients, could take up the majority of the memory, as described here.
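To give a feel for the numbers, here is a rough back-of-the-envelope sketch in plain Python. It assumes 299x299 float32 inputs and the stem layer shapes of the torchvision InceptionV3 implementation; neither is confirmed by the post, and the helper names (`activation_bytes`, `stem_activation_bytes`, `STEM_OUTPUT_SHAPES`) are made up for illustration. It counts only one output tensor per stem layer, so real usage is likely higher.

```python
# Rough estimate of activation memory in the InceptionV3 stem.
# Assumptions (not from the original post): 299x299 inputs, float32
# activations, torchvision-style stem shapes, one stored tensor per layer.

BYTES_PER_FLOAT32 = 4

# (name, channels, height, width) of each stem layer's output
STEM_OUTPUT_SHAPES = [
    ("Conv2d_1a_3x3", 32, 149, 149),
    ("Conv2d_2a_3x3", 32, 147, 147),
    ("Conv2d_2b_3x3", 64, 147, 147),
    ("maxpool1", 64, 73, 73),
    ("Conv2d_3b_1x1", 80, 73, 73),
    ("Conv2d_4a_3x3", 192, 71, 71),
    ("maxpool2", 192, 35, 35),
]


def activation_bytes(batch, channels, height, width,
                     bytes_per_elem=BYTES_PER_FLOAT32):
    """Bytes needed to store one activation tensor of the given shape."""
    return batch * channels * height * width * bytes_per_elem


def stem_activation_bytes(batch):
    """Sum of the stored output tensors for the stem layers listed above."""
    return sum(activation_bytes(batch, c, h, w)
               for _, c, h, w in STEM_OUTPUT_SHAPES)


if __name__ == "__main__":
    batch = 256
    for name, c, h, w in STEM_OUTPUT_SHAPES:
        gib = activation_bytes(batch, c, h, w) / 2**30
        print(f"{name:>14}: {gib:.2f} GiB")
    print(f"{'stem total':>14}: {stem_activation_bytes(batch) / 2**30:.2f} GiB")
```

Under these assumptions the stem alone comes to roughly 4.5 GiB at batch size 256, before any of the Inception blocks are counted. The activation memory scales linearly with the batch size, while the 45MB of parameters stays fixed, which is why the model and data sizes say little about the peak usage.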

moo3030 · February 29, 2024, 10:12am

Thanks, I appreciate your help!