Furthermore, the memory seems to be filled with garbage: notice that more than 2100 MB is occupied for just a torch.zeros(1)!
Moreover, I notice that when the processes are set to run on ‘cuda:0’, ‘cuda:1’ also gets memory allocated (roughly 12 MB).
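For reference, here is a minimal sketch of how the tensor's own footprint can be compared with the total occupied on the device. The helper name `report_cuda_memory` is hypothetical, and the sketch assumes the totals quoted above come from an external tool rather than from PyTorch itself. It only reads PyTorch's allocator counters, so it shows how much of the total PyTorch attributes to tensors, not where the rest goes.

```python
import torch


def report_cuda_memory(device_index=0):
    """Allocate a one-element tensor and return PyTorch's own memory counters.

    Returns (allocated_bytes, reserved_bytes) for the given device,
    or None when no CUDA device is available.
    Hypothetical helper, for illustration only.
    """
    if not torch.cuda.is_available():
        return None

    device = torch.device(f"cuda:{device_index}")

    # Same allocation as in the question, kept alive so the counters include it.
    x = torch.zeros(1, device=device)

    # Bytes currently occupied by live tensors on this device.
    allocated = torch.cuda.memory_allocated(device)
    # Bytes held by PyTorch's caching allocator (always >= allocated).
    reserved = torch.cuda.memory_reserved(device)

    del x
    return allocated, reserved


if __name__ == "__main__":
    result = report_cuda_memory(0)
    if result is None:
        print("No CUDA device available")
    else:
        allocated, reserved = result
        print(f"allocated by tensors: {allocated} bytes")
        print(f"reserved by allocator: {reserved} bytes")
```

If both numbers are tiny compared with the >2100 MB total, the bulk of the memory is not tensor data tracked by PyTorch's allocator.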
What exactly is happening here?