I am training a CNN model using PyTorch on a GPU. At the start, RAM usage is 25%, but during training the memory consumption slowly increases. After 12 hours, 90% of the memory is consumed and an OOM (out-of-memory) error occurs. When I check the memory usage, it shows that 80% of the memory is consumed by the paged pool and the non-paged pool.
My PC has only one GPU (16 GB). The other configuration details are as follows:
Batch size: 256
RAM: 32 GB
GPU: 16 GB
Processor: Intel Core i7 (11th gen)
PyTorch config:
Number of workers: 4
Training device: CUDA
Number of iterations per epoch: ~12,000
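For anyone trying to reproduce this, the setup above roughly corresponds to a loop like the following minimal sketch. The synthetic dataset, the tiny model and the helper names (`make_loader`, `build_model`, `train_one_epoch`) are hypothetical placeholders, not the poster's code. One common source of steadily growing memory worth ruling out is accumulating the loss tensor itself across iterations, which the PyTorch FAQ warns chains autograd history together; the sketch accumulates `loss.item()` instead.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def make_loader(num_samples=1024, batch_size=256, num_workers=4):
    # Hypothetical synthetic data standing in for the real image dataset.
    images = torch.randn(num_samples, 3, 32, 32)
    labels = torch.randint(0, 10, (num_samples,))
    return DataLoader(TensorDataset(images, labels), batch_size=batch_size,
                      shuffle=True, num_workers=num_workers)


def build_model():
    # Hypothetical tiny CNN; replace with the real model.
    return nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(16, 10),
    )


def train_one_epoch(model, loader, optimizer, device):
    model.train()
    criterion = nn.CrossEntropyLoss()
    running_loss = 0.0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        # .item() returns a plain Python float. Writing `running_loss += loss`
        # would instead build a tensor whose autograd history grows every step.
        running_loss += loss.item()
    return running_loss / len(loader)


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = build_model().to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    print(train_one_epoch(model, make_loader(), optimizer, device))
```

The `if __name__ == "__main__":` guard matters on Windows, where DataLoader worker processes are started with `spawn` and re-import the script.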
@bfss
Again, I am facing the same issue. The RAM consumption increases during training: as the iterations within a single epoch progress, the paged pool and non-paged pool memory keeps growing. At the start, these pools occupy only a few MB.
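One way to narrow this down is to log memory every N iterations and check whether the growth is visible inside the Python process at all. The sketch below uses the standard-library `tracemalloc` module; the `MemoryGrowthLog` class and the leaking `fake_training_step` are hypothetical illustrations. Note the assumption: `tracemalloc` only sees Python-level allocations, not the Windows kernel paged/non-paged pools, so if the traced memory stays flat while the pools keep growing, that would suggest the growth happens outside the Python heap.

```python
import tracemalloc


class MemoryGrowthLog:
    """Record Python-heap usage every `every` iterations with tracemalloc."""

    def __init__(self, every=500):
        self.every = every
        self.samples = []  # list of (iteration, traced_bytes)
        self._baseline = None

    def start(self):
        tracemalloc.start()
        self._baseline = tracemalloc.take_snapshot()

    def step(self, iteration):
        if iteration % self.every == 0:
            current, _peak = tracemalloc.get_traced_memory()
            self.samples.append((iteration, current))

    def top_growth(self, limit=5):
        # Source lines whose allocations grew the most since start().
        snapshot = tracemalloc.take_snapshot()
        return snapshot.compare_to(self._baseline, "lineno")[:limit]

    def stop(self):
        tracemalloc.stop()


def demo(iterations=1000, every=100):
    kept = []  # simulates a leak: per-iteration results are never released

    def fake_training_step(i):
        # Hypothetical stand-in for one training iteration that keeps its output.
        kept.append(bytearray(10_000))

    log = MemoryGrowthLog(every=every)
    log.start()
    try:
        for i in range(iterations):
            fake_training_step(i)
            log.step(i)
        growth = log.top_growth(3)
    finally:
        log.stop()
    return log.samples, growth


if __name__ == "__main__":
    samples, growth = demo()
    for iteration, traced in samples:
        print(f"iter {iteration}: {traced / 1e6:.1f} MB traced")
    for stat in growth:
        print(stat)
```

In a real run, `log.step(i)` would be called once per training iteration, and `top_growth()` points at the source lines responsible for any Python-side growth.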