Input type (torch.cuda.FloatTensor) and bias type (torch.FloatTensor) should be the same

Hi, I’m currently using Blitz to create a 3D Bayesian UNet, and when trying to train the model I get the following error:
Input type (torch.cuda.FloatTensor) and bias type (torch.FloatTensor) should be the same

I have searched through similar topics: I found one for the same error, which didn’t help, and several regarding a similar message about the weights, which were also of no help here (both the model and the data are already moved to the same CUDA device).

The model is defined here: model - Pastebin.com
And the training loop is here: loop - Pastebin.com

The definition of the BayesianConv3d layer can be found here.

Full error trace below.

Any help regarding this would be greatly appreciated.

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-12-d8ddbcf20a6e> in <module>
     19         #print("labels: ", labels.shape)
     20         optimizer.zero_grad()
---> 21         outputs = model(inputs)
     22         #loss = loss_function(outputs, labels)
     23         loss = model.sample_elbo(inputs=inputs,

~/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

~/Desktop/bayesian/BUNet3D.py in forward(self, x)
    115 
    116     def forward(self, x):
--> 117         x1 = self.convd1(x)
    118         x2 = self.convd2(x1)
    119         x3 = self.convd3(x2)

~/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

~/Desktop/bayesian/BUNet3D.py in forward(self, x)
     42         if not self.first:
     43             x = self.maxpool(x)
---> 44         x = self.conv1(x)
     45         x = self.bn1(x)
     46         #x = self.bn1(self.conv1(x))

~/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

~/miniconda3/lib/python3.8/site-packages/blitz/modules/conv_bayesian_layer.py in forward(self, x)
    366         self.log_prior = self.weight_prior_dist.log_prior(w) + b_log_prior
    367 
--> 368         return F.conv3d(input=x,
    369                         weight=w,
    370                         bias=b,

RuntimeError: Input type (torch.cuda.FloatTensor) and bias type (torch.FloatTensor) should be the same

Based on the error message it seems that the error is raised in conv_bayesian_layer.py, so could you check how the bias is initialized and make sure it’s properly pushed to the GPU?
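One quick, generic way to check is to print the device of every registered parameter and buffer after calling model.cuda() (model here stands for your BUNet3D instance):

# list the device of every registered parameter and buffer;
# anything still on the CPU was not registered or moved properly
for name, param in model.named_parameters():
    print("param ", name, param.device)
for name, buf in model.named_buffers():
    print("buffer", name, buf.device)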

I checked the implementation and nothing seems to be wrong, unless I’m missing something.
Both weights and biases are initialized from a normal distribution, and neither is pushed to any specific device.

Here is the part of the code that addresses both weight and bias initialization:

        #our weights
        self.weight_mu = nn.Parameter(torch.Tensor(out_channels, in_channels // groups, *kernel_size).normal_(posterior_mu_init, 0.1))
        self.weight_rho = nn.Parameter(torch.Tensor(out_channels, in_channels // groups, *kernel_size).normal_(posterior_rho_init, 0.1))
        self.weight_sampler = TrainableRandomDistribution(self.weight_mu, self.weight_rho)

        #our biases
        if self.bias:
            self.bias_mu = nn.Parameter(torch.Tensor(out_channels).normal_(posterior_mu_init, 0.1))
            self.bias_rho = nn.Parameter(torch.Tensor(out_channels).normal_(posterior_rho_init, 0.1))
            self.bias_sampler = TrainableRandomDistribution(self.bias_mu, self.bias_rho)
            self.bias_prior_dist = PriorWeightDistribution(self.prior_pi, self.prior_sigma_1, self.prior_sigma_2, dist = self.prior_dist)
        else:
            self.register_buffer('bias_zero', torch.zeros(self.out_channels))

        # Priors (as BBP paper)
        self.weight_prior_dist = PriorWeightDistribution(self.prior_pi, self.prior_sigma_1, self.prior_sigma_2, dist = self.prior_dist)
        self.log_prior = 0
        self.log_variational_posterior = 0

    def forward(self, x):
        #Forward with uncertain weights, fills bias with zeros if layer has no bias
        #Also calculates the complexity cost for this sampling
        if self.freeze:
            return self.forward_frozen(x)

        w = self.weight_sampler.sample()

        if self.bias:
            b = self.bias_sampler.sample()
            b_log_posterior = self.bias_sampler.log_posterior()
            b_log_prior = self.bias_prior_dist.log_prior(b)

        else:
            b = self.bias_zero
            b_log_posterior = 0
            b_log_prior = 0

        self.log_variational_posterior = self.weight_sampler.log_posterior() + b_log_posterior
        self.log_prior = self.weight_prior_dist.log_prior(w) + b_log_prior

        return F.conv3d(input=x,
                        weight=w,
                        bias=b,
                        stride=self.stride,
                        padding=self.padding,
                        dilation=self.dilation,
                        groups=self.groups)

Could you check x.device, w.device, as well as b.device inside the forward before applying F.conv3d?
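E.g. with a few temporary prints added to BayesianConv3d.forward right before the F.conv3d call (a minimal debugging sketch):

# temporary debug prints, to be removed after checking
print("x device: ", x.device)
print("b device: ", b.device)
print("w device: ", w.device)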

x device:  cuda:0
b device:  cpu
w device:  cuda:0

It is indeed on the CPU; however, I have no idea why.
Neither one of them is pushed to any device in the layer class, and the entire model is later pushed to cuda:0 before the training loop.

Would defining a device in the layer class and pushing the bias there solve the issue? Wouldn’t it cause problems when DataParallel is used on the model later?

I don’t know how each of the posted methods is implemented, but would guess that new tensors are created internally without using the device of an already defined and registered parameter or buffer.
bias_zero is properly registered as a buffer and will thus be pushed to the right device.
However, self.bias_sampler.sample() could create a new CPU tensor, which would then cause the issue.
In case you’ve implemented these methods yourself, make sure that new tensors are created on the right device, e.g. via:

device = next(self.parameters()).device
new_tensor = torch.randn(..., device=device)
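
For reference, a minimal sketch of a device-safe sampler following that pattern (my own illustration of the reparameterization trick, not Blitz’s actual TrainableRandomDistribution code):

import torch
import torch.nn as nn

class DeviceSafeSampler(nn.Module):
    # Hypothetical Gaussian variational posterior parameterized by mu and rho,
    # sampled with the reparameterization trick.
    def __init__(self, mu: nn.Parameter, rho: nn.Parameter):
        super().__init__()
        self.mu = mu    # registered parameter, moves with .cuda()/.to()
        self.rho = rho  # registered parameter, moves with .cuda()/.to()

    def sample(self):
        # randn_like allocates the noise on mu's device and dtype, so the
        # sampled tensor follows the module to the GPU automatically,
        # unlike torch.Tensor(shape).normal_(), which stays on the CPU.
        eps = torch.randn_like(self.mu)
        sigma = torch.log1p(torch.exp(self.rho))  # softplus keeps sigma > 0
        return self.mu + sigma * eps

The key difference from a CPU-bound implementation is torch.randn_like(self.mu), which inherits mu’s device instead of allocating a fresh CPU tensor for the noise.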