Gradient and tensor dtype inconsistency

The main code is from here:

I defined an SE attention block as below:

import torch
import torch.nn.functional as F

class SEBlock(torch.nn.Module):
    def __init__(self, in_channels):
        super(SEBlock, self).__init__()
        self.avg_pool = torch.nn.AdaptiveAvgPool2d(1)
        self.fc1 = torch.nn.Conv2d(in_channels, in_channels, 1, bias=False)
        self.fc2 = torch.nn.Conv2d(in_channels, in_channels, 1, bias=False)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        # Squeeze: global average pool to a 1x1 per-channel descriptor
        y = self.avg_pool(x)
        # Excitation: two 1x1 convs with ReLU in between, then sigmoid gating
        y = F.relu(self.fc1(y))
        y = self.fc2(y)
        y = self.sigmoid(y)
        # Rescale the input channel-wise
        return x * y
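
For reference, a quick standalone check of the block (the channel count and input size here are just example values):

block = SEBlock(64)
x = torch.randn(2, 64, 32, 32)   # (batch, channels, height, width)
out = block(x)
print(out.shape, out.dtype)      # same shape and dtype as x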

And here is a simplified section of where the error originates:

class SynthesisLayer(torch.nn.Module):
    def __init__(self,
        ....
        attention = True,               # Whether to use the attention mechanism
    ):
        super().__init__()
        

        # Self attention
        self.attention = attention

        if self.attention:
            # Initialize the SE attention block
            self.se_block = SEBlock(self.out_channels)

    def forward(self, x, w, noise_mode='random', force_fp32=False, update_emas=False):
        ...

        # Execute modulated conv2d.
        dtype = torch.float16 if (self.use_fp16 and not force_fp32 and x.device.type == 'cuda') else torch.float32

        x = modulated_conv2d(x=x.to(dtype), w=self.weight, s=styles,
            padding=self.conv_kernel-1, demodulate=(not self.is_torgb), input_gain=input_gain)
        
        if self.attention:
            # Apply the SEBlock for attention, in the same dtype as x
            x = x.to(dtype)
            self.se_block = self.se_block.to(dtype)
            x = self.se_block(x)

        # ...
        return x

So, it runs fine without the SE block, but when it is enabled I'm getting:

line 319, in training_loop
    param.grad = grad.reshape(param.shape)
    ^^^^^^^^^^
RuntimeError: attempting to assign a gradient with dtype 'float' to a tensor with dtype 'struct c10::Half'. Please ensure that the gradient and the tensor have the same dtype

I can't make sense of this, since I use the same dtype as x and also cast the SE block to that dtype. Any suggestions?

Hi, your issue is due to a dtype mismatch. It seems to me that either your input or part of your model is operating in c10::Half, which is PyTorch's 16-bit half-precision floating-point type, while float has 32 bits, so the parameter and its gradient don't match.
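
As a rough illustration of the mismatch (a minimal sketch, not your training loop): setting a float32 gradient on a half-precision parameter raises this kind of error.

import torch

p = torch.nn.Parameter(torch.zeros(4, dtype=torch.float16))  # fp16 parameter
g = torch.zeros(4, dtype=torch.float32)                       # fp32 gradient
p.grad = g  # RuntimeError: the gradient dtype must match the parameter dtype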

I have attached a link below to the PyTorch site that gives a bit more info about the c10::Half struct:
https://pytorch.org/cppdocs/api/structc10_1_1_half.html#exhale-struct-structc10-1-1-half

I would recommend either ensuring your input is 16-bit before feeding it into the model, or adjusting your model to use the higher-precision type (though the latter might increase training time and memory requirements). Hope this helps!
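
Roughly, the two options look like this (a sketch only; the Conv2d here is a stand-in for your attention block, not your actual code):

import torch

se_block = torch.nn.Conv2d(8, 8, 1)   # stand-in for the attention block
x = torch.randn(1, 8, 16, 16)

if torch.cuda.is_available():
    # Option 1: run the block and its input in half precision (fp16 generally wants CUDA)
    out = se_block.half().cuda()(x.half().cuda())
else:
    # Option 2: keep both the block and the input in float32
    out = se_block.float()(x.float())

print(out.dtype)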

Normally, the output of modulated_conv2d is returned as x. I added an se_block attention layer, and I made sure that its output has the same shape and dtype as the original modulated_conv2d output. At least that is what my print statements say.
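
The checks were roughly along these lines, inside forward (a sketch, not the exact code):

print(x.shape, x.dtype)   # right after modulated_conv2d
x = self.se_block(x.to(dtype))
print(x.shape, x.dtype)   # right after the SE block: same shape and dtype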

Could you post a minimal and executable code snippet reproducing this issue, please?