To compute the weights, I need the count for each class. In my case, however, the sampler is also used to create the datasets, so I can only get the sample counts after the datasets have been created.
(I use the sampler to get the indices of the images and stack the images sequentially to create samples, then create the datasets and load them using the DataLoader.)
Do you have any suggestions for this? I can attach the code if required.
I don’t think there is anything wrong per se with looping over the dataset to get the class distribution. That said, if it takes a long time and you expect to run your training often, the typical thing is to make it a preprocessing step (just as, e.g., the famous ImageNet mean and std for normalization were computed as part of preprocessing before people simply kept them hardcoded).
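To make the preprocessing idea concrete, here is a minimal sketch that counts the classes once and caches the result in a JSON file so later runs skip the loop. The helper name `class_counts`, the cache file, and the assumption that labels are plain integers are all mine, not something from your code.

```python
import json
import os
from collections import Counter


def class_counts(labels, cache_path):
    """Return {class: count}, reading from cache_path when it already exists."""
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            # JSON object keys are always strings, so convert them back to ints.
            return {int(k): v for k, v in json.load(f).items()}
    # The one slow pass over the labels (assumed to be integer class ids).
    counts = dict(Counter(int(y) for y in labels))
    with open(cache_path, "w") as f:
        json.dump(counts, f)
    return counts
```

If the dataset changes, the cache file has to be deleted by hand; this sketch does not try to detect stale counts.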
Creating a dataloader isn’t that expensive (the expensive work only happens when you iterate over it, and is redone every epoch), so there isn’t anything wrong with having one that is used to collect statistics and then creating a new one with the weighted sampler.
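As a rough sketch of that two-loader pattern: the toy `TensorDataset` below is only a stand-in for your real dataset, and inverse class frequency is just one common choice of weights, so treat both as assumptions. It also assumes every class shows up at least once, since a zero count would give an infinite weight.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy stand-in for the real dataset (assumption): 90 samples of class 0, 10 of class 1.
features = torch.randn(100, 3)
labels = torch.cat([torch.zeros(90, dtype=torch.long), torch.ones(10, dtype=torch.long)])
dataset = TensorDataset(features, labels)

# Pass 1: a plain, unshuffled loader used only to collect the labels in dataset order.
stats_loader = DataLoader(dataset, batch_size=32, shuffle=False)
all_labels = torch.cat([y for _, y in stats_loader])

# Per-class counts, then one weight per sample: the inverse frequency of its class.
counts_per_class = torch.bincount(all_labels)
sample_weights = (1.0 / counts_per_class.float())[all_labels]

# Pass 2: a fresh loader over the same dataset, now drawing through the weighted sampler.
sampler = WeightedRandomSampler(sample_weights, num_samples=len(dataset), replacement=True)
train_loader = DataLoader(dataset, batch_size=32, sampler=sampler)
```

The first loader must not shuffle, otherwise the collected labels no longer line up with the dataset indices that the sampler draws from.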