Multiprocessing: CUDA tensor memory proliferation on 'cuda:0'

Furthermore, it seems that most of this memory is filled with garbage; note the space occupied (>2100 MB) for just a torch.zeros(1)!

Moreover, I notice that even when the device the processes run on is set to 'cuda:0', about 12 MB of memory also gets allocated on 'cuda:1'.
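
For context, here is a minimal sketch of the kind of setup I am describing; the worker function, process count, and spawn start method are illustrative placeholders rather than my actual code:

```python
import torch
import torch.multiprocessing as mp


def worker(rank: int) -> None:
    # Each worker only allocates a single-element tensor on cuda:0.
    x = torch.zeros(1, device="cuda:0")
    # What PyTorch's caching allocator reports for this process;
    # nvidia-smi shows a much larger per-process footprint.
    print(rank,
          torch.cuda.memory_allocated(0),
          torch.cuda.memory_reserved(0))


if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)
    procs = [mp.Process(target=worker, args=(i,)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Running something like this, nvidia-smi reports far more memory in use on 'cuda:0' than the tensors themselves should need, plus the small allocation on 'cuda:1' mentioned above.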

What exactly is happening here?