Steady increase in RAM usage over time during training

Hello,

I am training a CNN model with PyTorch on a GPU. At the start, RAM usage is about 25%, but during training the memory consumption increases slowly. After 12 hours, 90% of the memory is consumed and an OOM error is raised. Checking the memory usage shows that 80% of it is consumed by the Paged Pool and Non-Paged Pool.
My PC has a single 16 GB GPU, and the other specifications are as follows:

  1. Batch size: 256
  2. RAM: 32 GB
  3. GPU: 16 GB
  4. Processor: i7 (11th gen)

PyTorch config:

Number of workers: 4
Training device: CUDA
Number of iterations per epoch: ~12,000
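
For reference, a minimal sketch of what a DataLoader with this configuration might look like; the dataset shapes here are hypothetical stand-ins (the real code is only in the attached image), and pin_memory is an assumption, not something stated above:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data; the real dataset/model are in the attached image.
images = torch.randn(1024, 3, 32, 32)
labels = torch.randint(0, 10, (1024,))
dataset = TensorDataset(images, labels)

loader = DataLoader(
    dataset,
    batch_size=256,
    num_workers=4,    # each worker is a separate process holding its own dataset reference
    pin_memory=True,  # assumption: page-locked host buffers for faster host-to-GPU copies
)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```

Note that each of the 4 workers is a separate process, so they add some baseline host-RAM usage on top of the main process.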

Do you use no_grad() when you evaluate your dataset?

Are you running out of host RAM or GPU memory?

I am running out of RAM.


The training code is in the attached image.

@Abhi_rathi

Emmm… I mean: when training is over, after calling model.eval(), you should also use torch.no_grad() to avoid storing gradients during eval or inference. :rofl:
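
For example, a minimal evaluation loop following that advice; the model and data below are hypothetical stand-ins, since the actual code is only in the image:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model and validation data.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.Flatten(), nn.LazyLinear(10)).to(device)
val_loader = [(torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,))) for _ in range(2)]

model.eval()              # disable dropout / use running batch-norm stats
with torch.no_grad():     # no autograd graph is built, so no grad buffers accumulate
    for inputs, targets in val_loader:
        outputs = model(inputs.to(device))
        preds = outputs.argmax(dim=1)
```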

@bfss
I am still facing the same issue. The increase in RAM consumption happens during training: as the iterations within a single epoch progress, the paged-pool and non-paged-pool memory keeps growing. At the start, these pools occupy only a few MBs.
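
Since the training code itself isn't visible here, this is only a guess, but one common cause of host RAM growing with the iteration count is accumulating tensors that are still attached to the autograd graph (for example, collecting raw losses for logging). A minimal sketch of the pitfall and the usual fix:

```python
import torch

w = torch.randn(10, requires_grad=True)   # stand-in for model parameters
history = []
running = 0.0

for step in range(1000):
    loss = ((w * torch.randn(10)) ** 2).mean()   # stand-in for the real loss

    # Leaky pattern: appending the raw tensor keeps every iteration's
    # autograd graph alive, so host RAM grows steadily over the epoch.
    # history.append(loss)

    # Safe pattern: .item() (or .detach()) drops the graph reference.
    running += loss.item()
```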

@Abhi_rathi

Sorry, I have no idea about the memory usage. :rofl:

Generally speaking, PyTorch will use GPU memory.
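
If it helps, one way to confirm whether the growth is in host RAM or GPU memory is to log both periodically during training; a minimal sketch, assuming psutil is installed:

```python
import os
import psutil
import torch

def log_memory(step):
    # GPU memory tracked by PyTorch's CUDA caching allocator
    if torch.cuda.is_available():
        gpu_mb = torch.cuda.memory_allocated() / 1024**2
        print(f"step {step}: GPU allocated {gpu_mb:.0f} MiB")
    # Resident host RAM of this process
    rss_mb = psutil.Process(os.getpid()).memory_info().rss / 1024**2
    print(f"step {step}: host RSS {rss_mb:.0f} MiB")

log_memory(0)  # call this every N iterations inside the training loop
```

If the host RSS climbs while the GPU allocation stays flat, the leak is on the CPU side (data pipeline or Python-side bookkeeping) rather than in the model itself.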