Memory profiling of on-the-fly data augmentation

I would like to compare offline and online data augmentation (DA) in terms of time and memory usage during training.

  • Offline DA happens when the augmentation is applied to the whole dataset once, before training.
  • Online DA happens in the Dataset class, where the augmentation is performed on the fly for each mini-batch (see the sketch after this list).
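
For reference, here is a minimal sketch of the two variants as I understand them (the class names and the assumption that samples fit in memory are just placeholders):

```python
from torch.utils.data import Dataset


class OnlineAugDataset(Dataset):
    """Online DA: the transform runs on every access, i.e. per mini-batch."""
    def __init__(self, images, labels, transform=None):
        self.images = images        # raw samples
        self.labels = labels
        self.transform = transform  # e.g. a torchvision transform pipeline

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        x = self.images[idx]
        if self.transform is not None:
            x = self.transform(x)   # augmentation happens here, on the CPU
        return x, self.labels[idx]


class OfflineAugDataset(Dataset):
    """Offline DA: samples were already augmented before training."""
    def __init__(self, augmented_images, labels):
        self.images = augmented_images  # precomputed, nothing to do per batch
        self.labels = labels

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]
```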

So I am looking into how to memory-profile the PyTorch DataLoader with and without DA.
Then I would like to profile the memory usage of the model during training.
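
For the DataLoader, my current idea is to track the resident set size (RSS) of the training process and its worker subprocesses while iterating over one epoch, e.g. with psutil. This is only a sketch of the approach, and `profile_dataloader` is my own placeholder name:

```python
import os
import time

import psutil
from torch.utils.data import DataLoader


def profile_dataloader(dataset, batch_size=32, num_workers=2):
    """Iterate one epoch and report peak CPU RSS and elapsed time."""
    proc = psutil.Process(os.getpid())
    loader = DataLoader(dataset, batch_size=batch_size,
                        num_workers=num_workers)

    peak_rss = proc.memory_info().rss
    start = time.perf_counter()
    for batch, _ in loader:
        rss = proc.memory_info().rss
        # DataLoader workers are child processes; count their memory too
        for child in proc.children(recursive=True):
            try:
                rss += child.memory_info().rss
            except psutil.NoSuchProcess:
                pass  # a worker exited between listing and reading
        peak_rss = max(peak_rss, rss)
    elapsed = time.perf_counter() - start

    print(f"peak CPU RSS: {peak_rss / 2**20:.1f} MiB, "
          f"epoch time: {elapsed:.2f} s")
```

Running this once on the offline dataset and once on the online dataset (or with `transform=None`) should give the with/without-DA comparison I am after.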

I think DA happens on the CPU and the model is trained on the GPU.
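
On the GPU side, I was planning to use PyTorch's built-in CUDA memory statistics (`torch.cuda.reset_peak_memory_stats` / `torch.cuda.max_memory_allocated`) around a single training step. Again just a sketch, and `profile_training_step` is a placeholder:

```python
import torch


def profile_training_step(model, batch, target, criterion, optimizer,
                          device="cuda"):
    """Report the peak GPU memory of one forward/backward/optimizer step."""
    model.to(device)
    batch, target = batch.to(device), target.to(device)

    # reset the peak counter so we measure only this step
    torch.cuda.reset_peak_memory_stats(device)

    optimizer.zero_grad()
    loss = criterion(model(batch), target)
    loss.backward()
    optimizer.step()

    peak = torch.cuda.max_memory_allocated(device)
    print(f"peak GPU memory: {peak / 2**20:.1f} MiB")
```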

How would you do that?