Gradient and tensor dtype inconsistency

The main code is from here:

I defined an SE attention block as below:

import torch
import torch.nn.functional as F

class SEBlock(torch.nn.Module):
    def __init__(self, in_channels):
        super(SEBlock, self).__init__()
        self.avg_pool = torch.nn.AdaptiveAvgPool2d(1)
        self.fc1 = torch.nn.Conv2d(in_channels, in_channels, 1, bias=False)
        self.fc2 = torch.nn.Conv2d(in_channels, in_channels, 1, bias=False)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        # Squeeze: global average pool to a 1x1 per-channel descriptor
        y = self.avg_pool(x)
        # Excitation: two 1x1 convs with ReLU in between, then sigmoid gating
        y = F.relu(self.fc1(y))
        y = self.fc2(y)
        y = self.sigmoid(y)
        # Rescale the input channel-wise
        return x * y
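
For reference, a quick standalone check of the block (the channel count and input size here are just example values):

block = SEBlock(64)
x = torch.randn(2, 64, 32, 32)   # (batch, channels, height, width)
out = block(x)
print(out.shape, out.dtype)      # same shape and dtype as x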

And here is a simplified section of where the error originates:

class SynthesisLayer(torch.nn.Module):
    def __init__(self,
        ....
        attention = True,               # Whether to use the attention mechanism
    ):
        super().__init__()
        

        # Self attention
        self.attention = attention

        if self.attention:
            # Initialize the SE attention block
            self.se_block = SEBlock(self.out_channels)

    def forward(self, x, w, noise_mode='random', force_fp32=False, update_emas=False):
        ...

        # Execute modulated conv2d.
        dtype = torch.float16 if (self.use_fp16 and not force_fp32 and x.device.type == 'cuda') else torch.float32

        x = modulated_conv2d(x=x.to(dtype), w=self.weight, s=styles,
            padding=self.conv_kernel-1, demodulate=(not self.is_torgb), input_gain=input_gain)
        
        if self.attention:
            # Apply the SEBlock for attention, in the same dtype as x
            x = x.to(dtype)
            self.se_block = self.se_block.to(dtype)
            x = self.se_block(x)

        # ...
        return x

So, it runs fine without the SE block, but when it is enabled I'm getting:

line 319, in training_loop
    param.grad = grad.reshape(param.shape)
    ^^^^^^^^^^
RuntimeError: attempting to assign a gradient with dtype 'float' to a tensor with dtype 'struct c10::Half'. Please ensure that the gradient and the tensor have the same dtype

I can't make sense of this, since I use the same dtype as x and also cast the SE block to that dtype. Any suggestions?

Hi, your issue is due to a dtype mismatch. It seems to me that either your input or part of your model is operating in c10::Half, which is PyTorch's 16-bit half-precision floating-point type, while float has 32 bits, so the parameter and its gradient don't match.
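
As a rough illustration of the mismatch (a minimal sketch, not your training loop): setting a float32 gradient on a half-precision parameter raises this kind of error.

import torch

p = torch.nn.Parameter(torch.zeros(4, dtype=torch.float16))  # fp16 parameter
g = torch.zeros(4, dtype=torch.float32)                       # fp32 gradient
p.grad = g  # RuntimeError: the gradient dtype must match the parameter dtype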

I have attached a link below to the PyTorch site that gives a bit more info about the c10::Half struct:
https://pytorch.org/cppdocs/api/structc10_1_1_half.html#exhale-struct-structc10-1-1-half

I would recommend either ensuring your input is 16-bit before feeding it into the model, or adjusting your model to use the higher-precision type (though the latter might increase training time and memory requirements). Hope this helps!
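
Roughly, the two options look like this (a sketch only; the Conv2d here is a stand-in for your attention block, not your actual code):

import torch

se_block = torch.nn.Conv2d(8, 8, 1)   # stand-in for the attention block
x = torch.randn(1, 8, 16, 16)

if torch.cuda.is_available():
    # Option 1: run the block and its input in half precision (fp16 generally wants CUDA)
    out = se_block.half().cuda()(x.half().cuda())
else:
    # Option 2: keep both the block and the input in float32
    out = se_block.float()(x.float())

print(out.dtype)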

Normally, the output of modulated_conv2d is returned as x. I added an se_block attention layer, and I made sure that its output has the same shape and dtype as the original modulated_conv2d output. At least that is what my print statements say.
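
The checks were roughly along these lines, inside forward (a sketch, not the exact code):

print(x.shape, x.dtype)   # right after modulated_conv2d
x = self.se_block(x.to(dtype))
print(x.shape, x.dtype)   # right after the SE block: same shape and dtype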

Could you post a minimal and executable code snippet reproducing this issue, please?