Input type (torch.cuda.FloatTensor) and bias type (torch.FloatTensor) should be the same

Hi, I’m currently using Blitz to create a 3D Bayesian UNet, and when trying to train the model I get the following error:
Input type (torch.cuda.FloatTensor) and bias type (torch.FloatTensor) should be the same

I have searched through similar topics: I found one for the same error, which didn’t help, and several regarding a similar message about the weights, which were also of no help here (both the model and the data are already moved to the same CUDA device).

The model is defined here: model - Pastebin.com
And the training loop is here: loop - Pastebin.com

The definition of the BayesianConv3d layer can be found here.

Full error trace below.

Any help regarding this would be greatly appreciated.

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-12-d8ddbcf20a6e> in <module>
     19         #print("labels: ", labels.shape)
     20         optimizer.zero_grad()
---> 21         outputs = model(inputs)
     22         #loss = loss_function(outputs, labels)
     23         loss = model.sample_elbo(inputs=inputs,

~/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

~/Desktop/bayesian/BUNet3D.py in forward(self, x)
    115 
    116     def forward(self, x):
--> 117         x1 = self.convd1(x)
    118         x2 = self.convd2(x1)
    119         x3 = self.convd3(x2)

~/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

~/Desktop/bayesian/BUNet3D.py in forward(self, x)
     42         if not self.first:
     43             x = self.maxpool(x)
---> 44         x = self.conv1(x)
     45         x = self.bn1(x)
     46         #x = self.bn1(self.conv1(x))

~/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

~/miniconda3/lib/python3.8/site-packages/blitz/modules/conv_bayesian_layer.py in forward(self, x)
    366         self.log_prior = self.weight_prior_dist.log_prior(w) + b_log_prior
    367 
--> 368         return F.conv3d(input=x,
    369                         weight=w,
    370                         bias=b,

RuntimeError: Input type (torch.cuda.FloatTensor) and bias type (torch.FloatTensor) should be the same

Based on the error message it seems that the error is raised in conv_bayesian_layer.py, so could you check how the bias is initialized and make sure it’s properly pushed to the GPU?
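One quick, generic way to check is to print the device of every registered parameter and buffer after calling model.cuda() (model here stands for your BUNet3D instance):

# list the device of every registered parameter and buffer;
# anything still on the CPU was not registered or moved properly
for name, param in model.named_parameters():
    print("param ", name, param.device)
for name, buf in model.named_buffers():
    print("buffer", name, buf.device)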

I checked the implementation and nothing seems to be wrong, unless I’m missing something.
Both weights and biases are initialized from a normal distribution, and neither is pushed to any specific device.

Here is the part of the code that addresses both weight and bias initialization:

        #our weights
        self.weight_mu = nn.Parameter(torch.Tensor(out_channels, in_channels // groups, *kernel_size).normal_(posterior_mu_init, 0.1))
        self.weight_rho = nn.Parameter(torch.Tensor(out_channels, in_channels // groups, *kernel_size).normal_(posterior_rho_init, 0.1))
        self.weight_sampler = TrainableRandomDistribution(self.weight_mu, self.weight_rho)

        #our biases
        if self.bias:
            self.bias_mu = nn.Parameter(torch.Tensor(out_channels).normal_(posterior_mu_init, 0.1))
            self.bias_rho = nn.Parameter(torch.Tensor(out_channels).normal_(posterior_rho_init, 0.1))
            self.bias_sampler = TrainableRandomDistribution(self.bias_mu, self.bias_rho)
            self.bias_prior_dist = PriorWeightDistribution(self.prior_pi, self.prior_sigma_1, self.prior_sigma_2, dist = self.prior_dist)
        else:
            self.register_buffer('bias_zero', torch.zeros(self.out_channels))

        # Priors (as BBP paper)
        self.weight_prior_dist = PriorWeightDistribution(self.prior_pi, self.prior_sigma_1, self.prior_sigma_2, dist = self.prior_dist)
        self.log_prior = 0
        self.log_variational_posterior = 0

    def forward(self, x):
        #Forward with uncertain weights, fills bias with zeros if layer has no bias
        #Also calculates the complexity cost for this sampling
        if self.freeze:
            return self.forward_frozen(x)

        w = self.weight_sampler.sample()

        if self.bias:
            b = self.bias_sampler.sample()
            b_log_posterior = self.bias_sampler.log_posterior()
            b_log_prior = self.bias_prior_dist.log_prior(b)

        else:
            b = self.bias_zero
            b_log_posterior = 0
            b_log_prior = 0

        self.log_variational_posterior = self.weight_sampler.log_posterior() + b_log_posterior
        self.log_prior = self.weight_prior_dist.log_prior(w) + b_log_prior

        return F.conv3d(input=x,
                        weight=w,
                        bias=b,
                        stride=self.stride,
                        padding=self.padding,
                        dilation=self.dilation,
                        groups=self.groups)

Could you check x.device, w.device, as well as b.device inside the forward before applying F.conv3d?
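E.g. with a few temporary prints added to BayesianConv3d.forward right before the F.conv3d call (a minimal debugging sketch):

# temporary debug prints, to be removed after checking
print("x device: ", x.device)
print("b device: ", b.device)
print("w device: ", w.device)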

x device:  cuda:0
b device:  cpu
w device:  cuda:0

It is indeed on the CPU; however, I have no idea why.
Neither one of them is pushed to any device in the layer class, and the entire model is later pushed to cuda:0 before the training loop.

Would defining a device in the layer class and pushing the bias there solve the issue? Wouldn’t it cause problems when DataParallel is used on the model later?

I don’t know how each of the posted methods is implemented, but would guess that new tensors are created internally without using the device of an already defined and registered parameter or buffer.
bias_zero is properly registered as a buffer and will thus be pushed to the right device.
However, self.bias_sampler.sample() could create a new CPU tensor, which would then cause the issue.
In case you’ve implemented these methods yourself, make sure that new tensors are created on the right device, e.g. via:

device = next(self.parameters()).device
new_tensor = torch.randn(..., device=device)
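
For reference, a minimal sketch of a device-safe sampler following that pattern (my own illustration of the reparameterization trick, not Blitz’s actual TrainableRandomDistribution code):

import torch
import torch.nn as nn

class DeviceSafeSampler(nn.Module):
    # Hypothetical Gaussian variational posterior parameterized by mu and rho,
    # sampled with the reparameterization trick.
    def __init__(self, mu: nn.Parameter, rho: nn.Parameter):
        super().__init__()
        self.mu = mu    # registered parameter, moves with .cuda()/.to()
        self.rho = rho  # registered parameter, moves with .cuda()/.to()

    def sample(self):
        # randn_like allocates the noise on mu's device and dtype, so the
        # sampled tensor follows the module to the GPU automatically,
        # unlike torch.Tensor(shape).normal_(), which stays on the CPU.
        eps = torch.randn_like(self.mu)
        sigma = torch.log1p(torch.exp(self.rho))  # softplus keeps sigma > 0
        return self.mu + sigma * eps

The key difference from a CPU-bound implementation is torch.randn_like(self.mu), which inherits mu’s device instead of allocating a fresh CPU tensor for the noise.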