This function is supposed to take an nn.Sequential network and place different layers on different GPUs, depending on a user-specified "strategy":
```python
import argparse

import torch.nn as nn

parser = argparse.ArgumentParser()
parser.add_argument("-multigpu_strategy", default='4,9,14')
params = parser.parse_args()

def setup_multi_gpu(net):
    gpu_splits = params.multigpu_strategy.split(',')
    gpu = 0
    new_net = nn.Sequential()
    for i, layer in enumerate(net):
        if i == 0:
            new_layer = layer.cuda(0)
        else:
            if i in gpu_splits:
                gpu += 1
            new_layer = layer.cuda(gpu)
        new_net.add_module(str(i), new_layer)
    return new_net.cuda()
```
However, the function above doesn't seem to work: the first GPU shows the same memory usage regardless of the "strategy", and the other GPUs only show a few hundred MiB each.
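One detail that may be relevant here (a sketch of the type behavior, not a confirmed diagnosis): `split(',')` yields a list of strings, so testing an integer loop index `i` against `gpu_splits` can never match, which would leave `gpu` stuck at 0:

```python
# split(',') produces strings, so an int index never matches them.
gpu_splits = '4,9,14'.split(',')
print(gpu_splits)        # ['4', '9', '14']
print(4 in gpu_splits)   # False: int 4 is not equal to str '4'
print('4' in gpu_splits) # True when comparing strings

# Converting the splits to ints once up front avoids the mismatch
# (hypothetical fix for illustration):
gpu_splits_int = [int(x) for x in '4,9,14'.split(',')]
print(4 in gpu_splits_int)  # True
```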
The function is meant for conv nets. I did try using nn.DataParallel, but that didn't seem to work, which is why I tried to create a solution with the function above. My input has a batch size of 1, and I can't seem to do net(input) after doing net = nn.DataParallel(net).
What am I doing wrong here?
nn.DataParallel splits the input along the batch dimension and sends one chunk to each replica, so the batch size should be larger than the number of GPUs used.
So I can’t use DataParallel because my code uses a batch size of 1.
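With a batch size of 1 there is nothing for DataParallel to split, so an alternative worth considering is manual model parallelism: keep each segment of the network on its own device and move the activation between devices inside forward(). A minimal sketch (the two-stage split, layer sizes, and device choice are all illustrative assumptions; it falls back to CPU when two GPUs are not available so it stays runnable):

```python
import torch
import torch.nn as nn

class TwoStageNet(nn.Module):
    """Model-parallel sketch: each stage lives on its own device,
    and forward() moves the activation between them."""
    def __init__(self, dev0, dev1):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        self.stage0 = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU()).to(dev0)
        self.stage1 = nn.Sequential(
            nn.Conv2d(8, 4, 3, padding=1), nn.ReLU()).to(dev1)

    def forward(self, x):
        x = self.stage0(x.to(self.dev0))
        # Move the intermediate activation to the second device, then compute.
        return self.stage1(x.to(self.dev1))

# Use two GPUs when available; otherwise fall back to CPU so the sketch runs.
if torch.cuda.device_count() >= 2:
    dev0, dev1 = torch.device('cuda:0'), torch.device('cuda:1')
else:
    dev0 = dev1 = torch.device('cpu')

net = TwoStageNet(dev0, dev1)
out = net(torch.randn(1, 3, 16, 16))  # batch size 1 is fine here
print(out.shape)  # torch.Size([1, 4, 16, 16])
```

Unlike DataParallel, this keeps the single sample intact and spreads the layers (and their memory) across devices instead of the batch.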