hi, I currently have a model that takes a data tensor of shape N x C x H x W and a label tensor of shape N x L. In the forward method, I want to first run a different net for each label and then combine the outputs for the remaining layers. So I currently have code like:
def forward(self, data):
    # output_tensor is pre-allocated with shape (N, out_features)
    for i in range(label_types_num):
        idx = get_label_idx(i)  # all the indices that have label i
        group_data = data[idx, :]
        o = self.seperate_nets[i](group_data)
        output_tensor[idx, :] = o
    # output_tensor is then passed to the other layers...
Just wondering, is there a way to run this for loop in parallel?
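For context, here is a minimal self-contained sketch of the per-label routing described above, assuming a single integer label per sample and flattened inputs; the class name, layer sizes, and num_label_types are made up for illustration:

    import torch
    import torch.nn as nn

    class LabelRoutedNet(nn.Module):
        # hypothetical module mirroring the setup above: one sub-net per label type,
        # outputs scattered back into a single tensor before the shared layers
        def __init__(self, in_features, hidden, num_label_types):
            super().__init__()
            self.seperate_nets = nn.ModuleList(
                nn.Linear(in_features, hidden) for _ in range(num_label_types)
            )
            self.shared = nn.Linear(hidden, 10)

        def forward(self, data, labels):
            # data: (N, in_features) after flattening, labels: (N,) integer label ids
            output_tensor = data.new_empty(data.size(0), self.shared.in_features)
            for i, net in enumerate(self.seperate_nets):
                idx = (labels == i).nonzero(as_tuple=True)[0]  # samples with label i
                if idx.numel() == 0:
                    continue
                output_tensor[idx] = net(data[idx])
            return self.shared(output_tensor)

The loop still runs the sub-nets one after another; the question is whether those per-label forward passes can be executed in parallel.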
hi, I took a look at that snippet, and it seems to me that the parallelism comes from distributing the data onto different GPUs (correct me if I'm wrong). However, I'm looking for a solution that parallelizes the for loop on a single GPU, since each 'data' in my code is already on a single GPU when forward is called.
If you look at the snippet more carefully, you will notice that torch.distributed.launch spawns separate CPU processes. You can leverage this to run a model on multiple GPUs (one per process), but you can do more, such as running several processes on a single GPU. One limitation is that, as far as I can tell, the processes don't share memory, which means that if you launch 10 processes on the same GPU, it will allocate 10x more memory.
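For reference, a rough sketch of what running several processes on one GPU looks like with the launcher; the file name train.py and the layer sizes are just illustrative, and torch.distributed.launch passes a --local_rank argument to each process it spawns:

    # launched with: python -m torch.distributed.launch --nproc_per_node=4 train.py
    import argparse
    import torch
    import torch.distributed as dist

    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=0)  # filled in by the launcher
    args = parser.parse_args()

    # the launcher sets MASTER_ADDR/MASTER_PORT, so the default env:// init works
    dist.init_process_group(backend="gloo")

    # all four processes can target the same GPU, but each holds its own copy of
    # the model, so GPU memory usage grows roughly linearly with the process count
    torch.cuda.set_device(0)
    model = torch.nn.Linear(128, 10).cuda()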