My dataset is 20 GB in total. I am using 8 A100 GPUs, each with 40 GB of memory. Why is my inference failing? And what does the “21.09 GiB already allocated” message mean? I see this "already allocated" message on every GPU.
Before initializing DDP, I split all the file paths into 8 parts. So isn't each GPU supposed to work with only 20/8 = 2.5 GB of data?
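
To make the question concrete, here is a minimal sketch of the kind of split described above. It is not the actual code: `shard_paths`, the strided partition, and the file names are all hypothetical, and it assumes the files are roughly equal in size.

```python
def shard_paths(paths, world_size):
    """Split a list of file paths into world_size roughly equal shards.

    Hypothetical helper: shard i takes every world_size-th path starting at i.
    """
    return [paths[rank::world_size] for rank in range(world_size)]


# Hypothetical file names standing in for the real dataset files.
paths = [f"data/part_{i:03d}.bin" for i in range(20)]
world_size = 8  # one process per GPU

shards = shard_paths(paths, world_size)

# Expected data volume per GPU if the 20 GB were spread evenly.
total_gb = 20.0
per_gpu_gb = total_gb / world_size

for rank, shard in enumerate(shards):
    print(f"rank {rank}: {len(shard)} files")
print(f"expected data per GPU: {per_gpu_gb} GB")
```

Each process would then read only `shards[rank]`, which is where the 2.5 GB per GPU estimate comes from.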