I’ve seen in the example here that for DataParallel one has to follow these steps (quick sketch below):
- Create the model
- Wrap the model with nn.DataParallel
- Send it to the device
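For reference, this is roughly what I mean by those three steps (a minimal sketch, with nn.Linear standing in for the actual model):

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 2)        # 1. create the model (nn.Linear stands in for the real network)
model = nn.DataParallel(model)  # 2. wrap the model with nn.DataParallel
model = model.to(device)        # 3. send it to the device
```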
This leads to my first confusion: when I tried DistributedDataParallel following the pytorch imagenet example, that example actually has steps 2 and 3 in reverse order, and without that order it throws an error about dense CUDA tensors.
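In other words, the ordering I see in the imagenet example looks roughly like this (my own simplified sketch, not the example verbatim; I'm assuming a launcher such as torchrun sets LOCAL_RANK, whereas the example gets the rank from its own arguments):

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn

# Assumed launcher-provided env vars (e.g. from torchrun); backend is a placeholder.
dist.init_process_group(backend="nccl", init_method="env://")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = nn.Linear(10, 2).cuda(local_rank)    # send the model to its GPU first...
model = nn.parallel.DistributedDataParallel(
    model, device_ids=[local_rank]           # ...then wrap it with DDP
)
```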
My second confusion also comes from the pytorch imagenet example of DistributedDataParallel usage, which uses torch.utils.data.distributed.DistributedSampler to wrap the train_set but not the test set. Why is that?
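To make it concrete, this is the pattern I'm referring to (a toy sketch: the datasets are placeholders and the process group is assumed to already be initialized, as in the example):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Toy stand-ins for the real train/test sets.
train_set = TensorDataset(torch.randn(1000, 10), torch.randint(0, 2, (1000,)))
test_set = TensorDataset(torch.randn(200, 10), torch.randint(0, 2, (200,)))

train_sampler = DistributedSampler(train_set)  # shards the training data across processes
train_loader = DataLoader(train_set, batch_size=32,
                          sampler=train_sampler, shuffle=False)

# No sampler here, so every process iterates over the full test set.
test_loader = DataLoader(test_set, batch_size=32, shuffle=False)
```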
My final confusion arises from the fact that when running DistributedDataParallel I observe the GPU utilization (volatile GPU-util) going up and down like it's the stock market, all the way from 0% to 99%. Can anyone shed some light on this?