Strange type error

I want to get the network weights as float16, so I am currently calling .to(torch.float16) on the network; however, I get an error because the biases are still float32.

This is the error:

    return F.conv2d(input, weight, bias, self.stride,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Input type (c10::Half) and bias type (float) should be the same

This is the simplified net:

# %%
class NeuralNetwork(nn.Module):
    """
    A PyTorch neural network for image classification.

    Args:
        channels: Number of channels in the input image.
        height: Height of the input image in pixels.
        width: Width of the input image in pixels.
    """

    def __init__(self, channels: int, height: int, width: int):
        super().__init__()
        self.backbone = Backbone()
        output_dimensions = self.backbone.forward(
            torch.zeros(1, channels, height, width).to(device, dtype=dtype)
        ).shape
        self.head = Head(output_dimensions[-1])

    def forward(self, x):
        x = self.backbone(x)
        x = self.head(x)
        return x

Does this require explicitly passing the dtype to each layer? Is there a better way to achieve this?
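
For reference, here is a minimal, self-contained sketch of the idea: calling .half() (equivalently .to(torch.float16)) on the top-level module casts every registered parameter and buffer, biases included, so no per-layer dtype should be needed; the input just has to be cast to the same dtype. The Sequential modules below are hypothetical stand-ins for Backbone and Head, only there to keep the snippet runnable.

    import torch
    from torch import nn

    # Hypothetical stand-ins for Backbone/Head, just to make the snippet self-contained.
    backbone = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.ReLU(), nn.Flatten())
    head = nn.Linear(8 * 30 * 30, 10)
    model = nn.Sequential(backbone, head)

    # Casting the top-level module converts all registered parameters and buffers,
    # weights and biases alike -- no per-layer dtype argument is needed.
    model = model.half()  # equivalent to model.to(torch.float16)

    x = torch.randn(1, 3, 32, 32).half()  # the input must match the model's dtype
    print(model(x).dtype)  # torch.float16
    # Note: float16 conv/linear support on CPU depends on the PyTorch version;
    # on a CUDA device this runs as expected after moving model and input there.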

Could you post the missing code pieces, making the code minimal and executable, so that we can reproduce and debug the issue?

I can. Just a quick question that could save me the effort: would it be equivalent to train in float32 and only later, right before exporting, convert the model to torch.float16 using model.half()?

I only need the resulting net to be very small and fast for small devices/the web.

I don’t fully understand the question. What would be equivalent to training the model in FP32 first?

Training in FP16

So I see two possibilities:

  1. Train using float32 weights and input data (images), and then export the model in half precision.
  2. Train using float16 weights and float16 input data, and then export it as is.

Currently I will try option 1 because it seems easier, but I originally started with option 2 and was wondering what a more experienced user's opinion would be.
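
For what it's worth, option 1 boils down to something like the sketch below, with a toy model and random data standing in for the real network and dataset; the final export call (ONNX, TorchScript, ...) depends on the deployment target and is only indicated in a comment.

    import torch
    from torch import nn

    # Toy model and random data standing in for the real network and dataset.
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    loss_fn = nn.MSELoss()

    # 1. Train entirely in float32 ...
    for _ in range(100):
        x = torch.randn(8, 16)
        y = torch.randn(8, 4)
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()

    # 2. ... then convert to half precision only at export time.
    model = model.eval().half()
    torch.save(model.state_dict(), "model_fp16.pt")
    # The actual export (torch.onnx.export, torch.jit.trace, ...) would go here,
    # depending on the deployment target.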

The second approach can easily fail and diverge, which is why we recommend using amp for mixed-precision training. Depending on the model, pure float16 training could still work, but it's generally less stable.
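
For reference, the usual amp training loop looks roughly like this; it is a minimal sketch with a toy model and random data, and it assumes a CUDA device is available.

    import torch
    from torch import nn

    device = "cuda"  # float16 autocast is intended for CUDA devices
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    loss_fn = nn.MSELoss()
    scaler = torch.cuda.amp.GradScaler()

    for _ in range(100):
        x = torch.randn(8, 16, device=device)
        y = torch.randn(8, 4, device=device)
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():  # forward pass runs in float16 where it is safe
            loss = loss_fn(model(x), y)
        scaler.scale(loss).backward()    # scale the loss so fp16 gradients don't underflow
        scaler.step(optimizer)
        scaler.update()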


I'd never heard of amp before. Still, it seems unnecessary here.