Changing data drive

I noticed that PyTorch is writing and deleting a lot of data on my main drive, and I couldn't find a way to change the drive the data is written to. Is that even possible?

In my case, each epoch generates and deletes about 8 GB of data, which can really hurt my SSD's lifespan when training runs for hours and there are multiple multi-hundred-epoch runs daily. I'd like to store that data either on a RAM disk (if there are speed benefits) or on an HDD.

By the way: I can't figure out why 8 GB of new data is generated per epoch at all, when the dataset is only about 20 MB (roughly 2,500 images). I do random flips, rotations, and crops, but that can't make the temporary data bigger by a factor of 400, can it?
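As a rough sanity check, the compressed file size on disk says little about the in-memory size: a decoded image is held as an uncompressed tensor of N × C × H × W elements. The sketch below estimates that footprint. The 256×256 resolution and float32 dtype are assumptions for illustration, not details from the post.

```python
def decoded_size_bytes(num_images: int, channels: int, height: int, width: int,
                       bytes_per_element: int = 4) -> int:
    """Uncompressed size of a stack of decoded images (float32 = 4 bytes by default)."""
    return num_images * channels * height * width * bytes_per_element


# Hypothetical numbers: ~2500 RGB images, assumed to be decoded at 256x256 as float32.
size = decoded_size_bytes(2500, 3, 256, 256)
print(f"{size / 1e9:.2f} GB")  # prints "1.97 GB"
```

Under these assumptions the decoded data is roughly 100× the 20 MB on disk. Augmentations such as flips, rotations, and crops are computed in memory, so by themselves they raise RAM usage rather than write files to the drive.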

Are you saving checkpoints to your drive frequently?
Apart from things like the pretrained weights downloaded by torchvision, PyTorch shouldn't store anything on the drive if you have enough RAM.
However, if you are running out of RAM, your OS might swap to your SSD (but again, this is not PyTorch-specific).
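If you do want to keep PyTorch-related files off the system SSD, a couple of locations can be redirected. A minimal sketch, assuming a placeholder HDD path: `TORCH_HOME` controls where `torch.hub` (which torchvision uses for pretrained weights) caches downloads, and Python's `tempfile` module honors `TMPDIR` or `tempfile.tempdir`. Neither affects swap, which is configured at the OS level. The helper name `redirect_scratch_dirs` is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def redirect_scratch_dirs(base: str) -> dict:
    """Hypothetical helper: point the torch.hub cache and Python temp files at `base`."""
    base_path = Path(base)
    torch_home = base_path / "torch_home"
    tmp_dir = base_path / "tmp"
    torch_home.mkdir(parents=True, exist_ok=True)
    tmp_dir.mkdir(parents=True, exist_ok=True)

    # torch.hub (and therefore torchvision's pretrained weights) caches under $TORCH_HOME/hub.
    os.environ["TORCH_HOME"] = str(torch_home)
    # Child processes (e.g. DataLoader workers started later) inherit TMPDIR.
    os.environ["TMPDIR"] = str(tmp_dir)
    # tempfile caches its choice per process, so override it directly as well.
    tempfile.tempdir = str(tmp_dir)
    return {"TORCH_HOME": str(torch_home), "TMPDIR": str(tmp_dir)}
```

You would call something like `redirect_scratch_dirs("/mnt/hdd/scratch")` at the very top of the training script, before `import torch`; the path is only an example and could equally be a RAM disk mount.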

Could you narrow down which files are being saved?
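One way to narrow it down is to list the largest files modified in the last few minutes under a directory while training is running. A minimal stdlib sketch; the function name and the 10-minute default window are arbitrary choices for illustration.

```python
import os
import time


def recently_modified_files(root: str, minutes: float = 10.0, top: int = 20):
    """Return (size_bytes, path) for the largest files under `root` modified in the last `minutes`."""
    cutoff = time.time() - minutes * 60
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except OSError:
                # The file may have been deleted between listing and stat, or be unreadable.
                continue
            if st.st_mtime >= cutoff:
                hits.append((st.st_size, path))
    hits.sort(reverse=True)  # largest first
    return hits[:top]
```

Files that are created and deleted within a short time may be missed, so it helps to run this mid-epoch. Writes to a swap partition won't show up at all, which is itself a hint that the traffic isn't coming from regular files.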

Weird, it seems to be solved after disabling swap.
I don't understand, however, why swap would have been used in the first place when only 15-22 GB of the 32 GB of RAM were in use.

Thanks for your help!

Swap isn't necessarily used only when your RAM is full (at least not on Ubuntu with the default settings).
You can change the kernel's tendency to swap via the swappiness setting, as described here (scroll down to swappiness).

These settings might be OS-dependent, so I'm not sure how other Linux distributions handle them.
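On Linux, both the current swappiness value and the amount of swap in use can be read from `/proc`. A minimal sketch, assuming a Linux system (these paths don't exist on Windows or macOS); the parser takes the text of `/proc/meminfo` as a string so it can also be tried on a sample. Lowering the value is typically done with `sudo sysctl vm.swappiness=10` and made persistent in `/etc/sysctl.conf`; the value 10 is only an example.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style lines ("SwapTotal:  2097148 kB") into integers (kB where given)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def swap_report() -> dict:
    """Read swappiness and swap usage on Linux (raises OSError on other systems)."""
    swappiness = int(Path("/proc/sys/vm/swappiness").read_text().strip())
    mem = parse_meminfo(Path("/proc/meminfo").read_text())
    return {
        "swappiness": swappiness,
        "swap_used_kb": mem.get("SwapTotal", 0) - mem.get("SwapFree", 0),
        "mem_available_kb": mem.get("MemAvailable", 0),
    }
```

A non-zero `swap_used_kb` while `mem_available_kb` is still large matches the behavior described above: the kernel swapping out pages even though RAM isn't full.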