Hi,
I am trying to run over 6000 custom layers, initialized with nn.ModuleList(), on GPUs. I can use 80-120 GPUs on a cluster. What is the best way to distribute the layers across the GPUs? Should I assign them to devices in the init method or in forward? Do the layers run asynchronously when I execute them in a for loop? I don't know how to execute over 6000 layers more efficiently with PyTorch.
I am trying to split my model across different GPUs (a minimal sketch of my current approach is below), but I'm not sure this is the proper way to do it. My goal is to reduce training time.
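For context, here is a single-node sketch of what I mean, with some assumptions on my part: nn.Linear stands in for my custom layer, the layers are independent and all receive the same input, and the device assignment happens round-robin in __init__:

```python
import torch
import torch.nn as nn

class ShardedLayers(nn.Module):
    """Round-robin an nn.ModuleList of layers over the visible GPUs."""

    def __init__(self, num_layers=6000, hidden=128):
        super().__init__()
        num_gpus = torch.cuda.device_count()  # assumes at least one visible GPU
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            # Layer i lives on GPU i % num_gpus (assignment done in __init__).
            device = torch.device(f"cuda:{i % num_gpus}")
            # nn.Linear is just a placeholder for my custom layer.
            self.layers.append(nn.Linear(hidden, hidden).to(device))

    def forward(self, x):
        outputs = []
        for layer in self.layers:
            # Move the input to each layer's device before calling it.
            device = next(layer.parameters()).device
            outputs.append(layer(x.to(device)))
        # Collect the per-layer outputs on one device.
        return torch.stack([o.to("cuda:0") for o in outputs])
```

This forward is the for loop I was asking about: since each layer lives on a different device and CUDA kernel launches are asynchronous with respect to the host, I am hoping the work on different GPUs overlaps, but I don't know if that actually happens here.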