Disk (SSD or HDD) space increasing (or decreasing) during training

Hi,
I am using YOLOv5, and the disk space usage is increasing (and sometimes decreasing) during training. Is this normal? I think I have enough GPU memory (8 GB) and RAM (32 GB), and the dataset is small (a custom dataset).

You could check whether swap is being used or whether you are explicitly storing any data.
Besides these use cases, I'm not aware of anything that would use the disk.
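If you want to verify this, a quick sketch along these lines (using psutil, which is a third-party package, not part of the standard library) would show whether swap or free disk space actually changes while you train:

```python
import shutil

import psutil  # third-party: pip install psutil

# Snapshot swap usage and free disk space; run this before, during,
# and after training to see what is actually changing.
swap = psutil.swap_memory()
disk = shutil.disk_usage("/")  # use the mount point / drive you care about

print(f"swap used: {swap.used / 1e9:.2f} GB ({swap.percent}%)")
print(f"disk free: {disk.free / 1e9:.2f} GB of {disk.total / 1e9:.2f} GB")
```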

Swap is not used (I think). I believe it's because of the data augmentation: during the data augmentation process (before training), the free disk (SSD) space decreases, and after training finishes it increases again.

What library are you using for the data augmentation? It sounds strange that your local SSD would be used, as I don't see why it would be necessary, and it could also decrease performance if you need to write to and read from the disk while transforming the data.

I think I found the reason: it's the number of DataLoader workers.
If I increase num_workers, more disk space is used; if I reduce num_workers, less disk space is used.
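A minimal sketch like the following (a dummy TensorDataset stands in for the custom dataset) can be used to watch free disk space for different num_workers values:

```python
import shutil

import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy data standing in for the real custom dataset
dataset = TensorDataset(torch.randn(1024, 3, 64, 64), torch.zeros(1024))

# Note: on Windows/macOS this loop needs to live behind an
# `if __name__ == "__main__":` guard (see the sketch further below).
for num_workers in (0, 2, 8):
    free_before = shutil.disk_usage("/").free
    loader = DataLoader(dataset, batch_size=32, num_workers=num_workers)
    for batch in loader:  # iterate once so the workers actually spin up
        pass
    free_after = shutil.disk_usage("/").free
    print(f"num_workers={num_workers}: free disk space changed by "
          f"{(free_before - free_after) / 1e6:.1f} MB")
```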

Increasing num_workers will use shared memory for the inter-process communication. Take a look at these docs for more information.
I don't know which OS you are using, but note this section:

Since workers rely on Python multiprocessing, worker launch behavior is different on Windows compared to Unix.

  • On Unix, fork() is the default multiprocessing start method. Using fork(), child workers typically can access the dataset and Python argument functions directly through the cloned address space.
  • On Windows or MacOS, spawn() is the default multiprocessing start method. Using spawn(), another interpreter is launched which runs your main script, followed by the internal worker function that receives the dataset, collate_fn and other arguments through pickle serialization.

I don't know exactly how the Windows implementation is done, but if memory mapping is used, your hard drive could be used to store a copy of the data.
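As a side note, since spawn() re-imports the main script, multi-worker loading on Windows or macOS needs the usual main-guard pattern. A minimal sketch (with a dummy dataset, not your actual YOLOv5 setup):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def main():
    # Dummy dataset; with spawn(), every worker process re-imports this
    # module, so DataLoader creation must live behind the __main__ guard.
    dataset = TensorDataset(torch.randn(256, 3, 32, 32), torch.zeros(256))
    loader = DataLoader(dataset, batch_size=16, num_workers=4)
    for images, labels in loader:
        pass  # the training step would go here


if __name__ == "__main__":
    main()
```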
