PyTorch 0.4.0: nn.Module.to(device) does not work

ryan_stark · April 28, 2018, 2:50am

I got a RuntimeError: Expected object of type torch.DoubleTensor but found type torch.cuda.FloatTensor for argument #2 ‘weight’ when I try to train a DCGAN. It tells me that my input’s type is torch.cuda.FloatTensor, but my model only accepts torch.DoubleTensor data, so they do not match. The problem is that I have already moved my model to the specified GPU.
main.py

    ...
    cudnn.benchmark = True
    device = torch.device('cuda:3')
    G = Generator().to(device=device)
    D = Discriminator().to(device=device)
    ...

model.py

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.step = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1),
            nn.BatchNorm2d(16, eps=0.001),
            nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1),
            nn.BatchNorm2d(32, eps=0.002),
            nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1),
            nn.BatchNorm2d(64, eps=0.003),
            nn.ReLU(), 
            nn.Conv2d(64, 32, 3, padding=1),
            nn.BatchNorm2d(32, eps=0.003),
            nn.ReLU(), 
            nn.Conv2d(32, 16, 3, padding=1),
            nn.BatchNorm2d(16, eps=0.002),
            nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
            nn.Tanh()
        )

    def forward(self, x):
        return self.step(x)

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1),
            nn.BatchNorm2d(16, eps=0.001),
            nn.LeakyReLU(),
            nn.Conv2d(16, 16, 3, padding=1),
            nn.BatchNorm2d(16, eps=0.001),
            nn.LeakyReLU(),

            nn.MaxPool2d(2, 2),

            nn.Conv2d(16, 32, 3, padding=1),
            nn.BatchNorm2d(32, eps=0.001),
            nn.LeakyReLU(),
            nn.Conv2d(32, 32, 3, padding=1),
            nn.BatchNorm2d(32, eps=0.001),
            nn.LeakyReLU(),

            nn.MaxPool2d(2, 2),

            nn.Conv2d(32, 64, 3, padding=1),
            nn.BatchNorm2d(64, eps=0.001),
            nn.LeakyReLU()
        )
        self.fcn = nn.Sequential(
            nn.Linear(64 * 7 * 7, 512),
            nn.LeakyReLU(),
            nn.Dropout(p=0.4),
            nn.Linear(512, 32),
            nn.LeakyReLU(),
            nn.Dropout(p=0.4),
            nn.Linear(32, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        x = self.convs(x)
        x = x.view(x.size(0), -1)
        x = self.fcn(x)
        return x

nn.Module.to(device) just does not work for me. Did I make a mistake, or is there some other way to do this in PyTorch 0.4.0?

SherlockLiao · April 28, 2018, 4:45am

Maybe you don’t have a device cuda:3?

ryan_stark · April 28, 2018, 6:19am

I’m pretty sure I have…

ptrblck · April 28, 2018, 6:54pm

Could you print torch.cuda.device_count()?
Also, could you run the following code, so we can narrow down the issue:

model = nn.Sequential(nn.Linear(10, 1)).cuda('cuda:3')
print(model[0].weight.type())
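
A minimal sketch of the same check in script form, assuming you want to fall back to the CPU when the requested GPU index is not visible. The helper name `pick_device` is hypothetical and not part of PyTorch; it only combines `torch.cuda.is_available()` and `torch.cuda.device_count()`.

```python
import torch


def pick_device(index):
    """Return cuda:<index> if that GPU is visible to PyTorch, otherwise the CPU.

    Hypothetical helper for illustration only.
    """
    if torch.cuda.is_available() and index < torch.cuda.device_count():
        return torch.device("cuda", index)
    return torch.device("cpu")


# On a machine with at least four GPUs this selects cuda:3; otherwise it selects the CPU.
device = pick_device(3)
print(torch.cuda.device_count(), device)
```

If this prints a CPU device, the index is out of range for the GPUs PyTorch can see.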

ryan_stark · April 29, 2018, 2:06am

Ok… the result looks like this:

>>> import torch
>>> import torch.nn as nn
>>> torch.cuda.device_count()
4
>>> model = nn.Sequential(nn.Linear(10,1)).cuda('cuda:3')
>>> print(model[0].weight.type())
torch.cuda.FloatTensor

rasbt · April 29, 2018, 2:31am

Is it maybe a documentation error, and it meant to say the opposite: your model correctly wants a torch.cuda.FloatTensor but got a torch.DoubleTensor from the training data? It seems that CUDA is at least partly working, either for your model or your training data. Maybe you forgot to also cast your input data and this is a typo in the error message? (A long shot, but maybe worth trying if you haven’t yet.)

E.g., in general, it would be something like this:

D = Discriminator().to(device=device)    

for batch_idx, (features, targets) in enumerate(train_loader):
    
    features = features.to(device)
    targets = targets.to(device)
    probas = D(features)  # the Discriminator above returns a single tensor

EDIT:

Expected object of type torch.DoubleTensor but found type torch.cuda.FloatTensor for argument #2 ‘weight’ when I try to train a DCGAN. It tells me that my input’s type is torch.cuda.FloatTensor,

Oh, wait, couldn’t argument #1 be the input data? As you show in your later comment, the model,

>>> print(model[0].weight.type())
torch.cuda.FloatTensor

seems to be correctly using CUDA, so argument #1 would be the input data that the weight gets matrix-multiplied with, which is probably a torch.DoubleTensor (in case you haven’t called to(device) on that one as well).

ryan_stark · April 29, 2018, 2:48am

Yeah, I have cast my input to CUDA:
main.py

...
for epoch in range(num_epochs):
        for t, x in enumerate(loader):
            optimizerD.zero_grad()
            optimizerG.zero_grad()
            x.requires_grad_().to(device)
            noise_size = x.shape[0]
            noise = torch.randn(noise_size, 1, 28, 28).requires_grad_().to(device)
...

And when I do this:
main.py

...
device = torch.device('cuda:3')
G = Generator().to(device)
D = Discriminator().to(device)
for param in G.parameters():
    print(type(param.data))
...

I always get <class 'torch.Tensor'>. Shouldn’t it be torch.cuda.*?

ryan_stark · April 29, 2018, 2:51am

Is it maybe a documentation error and it meant to say the opposite, like your model correctly wants a torch.cuda.FloatTensor but got a torch.DoubleTensor from the training data.

This makes sense. I have the same feeling: the documentation may be wrong.

rasbt · April 29, 2018, 2:54am

As far as I know, the input data and the model parameters must always have the same type in order to be compatible.

It’s a bit of a gotcha, but type(data) always returns torch.Tensor. You would need to use data.type() to really see what’s going on, i.e., to show whether it’s torch.cuda.FloatTensor or torch.FloatTensor etc.
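
A small sketch of that difference, using a CPU double tensor so that it runs without a GPU (the tensor `x` here is a made-up stand-in, not the data from this thread). The `dtype` and `device` attributes report the same information separately.

```python
import torch

# A double-precision CPU tensor used purely for illustration.
x = torch.zeros(2, 3, dtype=torch.float64)

python_class = type(x)  # the Python class, identical for every dtype and device
type_string = x.type()  # a string naming the concrete tensor type

print(python_class)       # <class 'torch.Tensor'>
print(type_string)        # torch.DoubleTensor
print(x.dtype, x.device)  # torch.float64 cpu
```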

Could you maybe try again checking your input data types before they go into the model?

EDIT:

I don’t know if x.requires_grad_().to(device) is sufficient. Maybe try

x.requires_grad_()
x = x.to(device)

ryan_stark · April 29, 2018, 3:07am

As you said, I double-checked the data types of both the model and the input. The model needs torch.cuda.FloatTensor, but the input is torch.cuda.DoubleTensor. And you are right, x.requires_grad_().to(device) does not work, because x.to(device) isn’t an in-place operation. Thanks for your help!
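
A minimal sketch of that behavior. It assumes a dtype conversion instead of a device move so that it runs without a GPU; `Tensor.to` returns a new tensor in both cases, while `nn.Module.to` changes the module's parameters in place and returns the module itself.

```python
import torch
import torch.nn as nn

x = torch.zeros(4, 10, dtype=torch.float64)

# Tensor.to returns a converted copy; without assignment the result is discarded.
x.to(torch.float32)
dtype_without_assignment = x.dtype  # still torch.float64

# Rebinding the name keeps the converted tensor.
x = x.to(torch.float32)
dtype_with_assignment = x.dtype  # now torch.float32

# nn.Module.to converts the parameters in place and returns the same module object.
model = nn.Linear(10, 1)
returned = model.to(torch.float64)
same_object = returned is model

print(dtype_without_assignment, dtype_with_assignment, same_object, model.weight.dtype)
```

This is why `G = Generator().to(device)` works without any extra step, while the input batch has to be reassigned.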

rasbt · April 29, 2018, 3:09am

Oh nice, so the problem is basically solved then if you do a

x = x.float().to(device)

?

Btw, I think I was initially wrong when I said that there was a typo in the error message. I think it’s indeed correct regarding what’s called “argument #1” and “argument #2”, but yeah, it can easily be confusing.
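
For reference, a rough sketch that reproduces the mismatch and the fix on a small scale. It assumes a single `nn.Linear` layer in place of the DCGAN and a zero tensor in place of a real batch, and it falls back to the CPU when no GPU is available. The exact wording of the error message differs between PyTorch versions.

```python
import torch
import torch.nn as nn

# Fall back to the CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 1).to(device)          # parameters are float32 by default
x = torch.zeros(4, 10, dtype=torch.float64)  # stand-in for a double-precision batch

# A double input combined with float weights raises a RuntimeError about mismatched types.
try:
    model(x.to(device))
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print("RuntimeError:", err)

# Cast to float32, move to the model's device, and keep the result.
x = x.float().to(device)
out = model(x)
print(out.dtype, tuple(out.shape))
```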

ryan_stark · April 29, 2018, 3:23am

Yeah, it worked! The error message is very confusing, though: we always feel that a model should need some input rather than an input needing a model. This is a little counterintuitive. Thanks again for your help!