We are using PyTorch 1.10 and CUDA 11.x.
My question relates to the topic of available disk (SSD or HDD) space steadily decreasing during training.
We are training a model on-prem and on GCP. We are observing that the available disk space constantly diminishes during training. We have seen this behavior both with our own algorithm and with a basic algorithm picked up from the internet (which I can provide if required).
Given the previously quoted article, we understand that this could happen because we run out of RAM, which would be understandable with our algorithm but not with the one picked up from the internet.
The writes to disk reach a problematic level with our in-house algorithm, about 4 GB per epoch. That volume is something we will sort out ourselves, but the constant increase in disk usage is something we can't find any explanation for.
We are posting here to check whether this behavior is consistent with PyTorch's expected behavior, and whether anyone else has run into it before.
PS: We have all sorts of logs from `shutil` or `du` commands in different environments (mostly on GCP). We'll provide the relevant files/lines of logs as the discussion progresses.
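For reference, here is a minimal sketch (not our actual training code) of how the per-epoch free-space delta can be measured from Python with the standard-library `shutil.disk_usage`; the path and epoch count are placeholders:

```python
import shutil

def free_disk_gb(path="/"):
    # shutil.disk_usage returns a named tuple (total, used, free) in bytes
    usage = shutil.disk_usage(path)
    return usage.free / 1e9

# Log the free-space delta around each epoch (training loop omitted)
for epoch in range(2):  # placeholder epoch count
    before = free_disk_gb("/")
    # ... run one training epoch here ...
    after = free_disk_gb("/")
    print(f"epoch {epoch}: free before={before:.2f} GB, "
          f"after={after:.2f} GB, delta={before - after:.2f} GB")
```

Logging this delta alongside the epoch number is how we correlate the disk growth with training progress.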