nn.Parameter not copied to GPUs in the forward pass when using nn.DataParallel

test.py:

import sys
import torch
import torch.nn as nn


gpus = list(map(int, sys.argv[1].split(',')))


class Net(nn.Module):
    def __init__(self):
        super().__init__()

        self.alpha = nn.ParameterList()

        for i in range(4):
            self.alpha.append(nn.Parameter(1e-3 * torch.randn(i + 2, 5)))

        self.cnn = nn.Conv2d(1, 1, 1, 1, 1)


    def forward(self, x):
        print(self.alpha)
        print(self.cnn)
        return x


if __name__ == '__main__':
    net = Net().cuda()
    if len(gpus) > 1:
        net = nn.DataParallel(net, device_ids=gpus)

    net(torch.rand(4, 5))

When I run python3 test.py 0 (i.e. device_ids=[0]), the output is:

ParameterList(
    (0): Parameter containing: [torch.cuda.FloatTensor of size 2x5 (GPU 0)]
    (1): Parameter containing: [torch.cuda.FloatTensor of size 3x5 (GPU 0)]
    (2): Parameter containing: [torch.cuda.FloatTensor of size 4x5 (GPU 0)]
    (3): Parameter containing: [torch.cuda.FloatTensor of size 5x5 (GPU 0)]
)
Conv2d(1, 1, kernel_size=(1, 1), stride=(1, 1), padding=(1, 1))

However, when I run python3 test.py 0,1 (i.e. device_ids=[0, 1]), the output is:

ParameterList()
Conv2d(1, 1, kernel_size=(1, 1), stride=(1, 1), padding=(1, 1))
ParameterList()
Conv2d(1, 1, kernel_size=(1, 1), stride=(1, 1), padding=(1, 1))

The nn.Conv2d submodule is copied to each GPU in the forward pass (each replica prints once, hence the duplicated output), but the nn.ParameterList shows up empty in every replica.
How can I use and train parameters held in an nn.ParameterList the same way as nn.Module submodules under nn.DataParallel?
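
One workaround I am considering is to register each tensor directly as a named parameter of the module instead of keeping it in an nn.ParameterList, and to look the copies up by attribute name in forward(). Here is a minimal sketch of that idea (the NetWorkaround name and the alpha_%d attribute scheme are mine, and I have not verified this across PyTorch versions):

import torch
import torch.nn as nn


class NetWorkaround(nn.Module):
    def __init__(self):
        super().__init__()
        # Register each parameter directly on the module rather than in an
        # nn.ParameterList, so that DataParallel's replicate() propagates a
        # per-GPU copy to every replica.
        for i in range(4):
            self.register_parameter('alpha_%d' % i,
                                    nn.Parameter(1e-3 * torch.randn(i + 2, 5)))

    def forward(self, x):
        # Look the parameters up by attribute name; in a replica this should
        # resolve to that replica's copy.
        alphas = [getattr(self, 'alpha_%d' % i) for i in range(4)]
        print(alphas)
        return x

Is this the intended pattern, or is there a supported way to make nn.ParameterList work with nn.DataParallel?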