Gradient becomes NaN

Hello

I implemented a custom layer with the following small `__init__` and forward methods:

    def __init__(self, a, b):
        super().__init__()

        self.a = nn.Parameter(torch.Tensor([a]))
        self.a.requires_grad = False
        self.b = nn.Parameter(torch.Tensor([b]))
        self.b.requires_grad = False

        self.i = 0

    def activate_parameter_learning(self):
        self.a.requires_grad = True
        self.b.requires_grad = True

    def forward(self, input: torch.Tensor, mse_error):
        if(torch.abs(mse_error).item() >= 0.0001):
            weight = 1 - 1 / (1 + self.a * torch.exp(self.b / mse_error))
        else:
            weight = 1
        output = input.mul(weight)
        return output

But after a few backpropagation steps the gradients of a and b become NaN.
Do you have any idea how I can find the step in the backward pass where this happens?
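
For what it's worth, here is a minimal sketch of the kind of check I had in mind, combining torch.autograd.set_detect_anomaly with a per-parameter NaN check after backward(). The model, optimizer, and data below are just stand-ins, not my real setup. Is this the right approach, or is there a better way to localize it?

    import torch
    import torch.nn as nn

    # Enable anomaly detection: backward() will then raise at the operation
    # that produced the NaN and print the forward-pass traceback of that op
    # (debug only, it slows training down noticeably).
    torch.autograd.set_detect_anomaly(True)

    # Stand-in model, loss, and data just to show where the checks go;
    # my real layer and loss are the ones above.
    model = nn.Linear(4, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x = torch.randn(8, 4)
    y = torch.randn(8, 1)

    for step in range(100):
        opt.zero_grad()
        loss = ((model(x) - y) ** 2).mean()
        loss.backward()  # with anomaly detection this should fail at the bad op

        # After backward(), check every parameter gradient so I can see the
        # first step at which a NaN shows up.
        for name, p in model.named_parameters():
            if p.grad is not None and torch.isnan(p.grad).any():
                print(f"step {step}: NaN gradient in {name}")

        opt.step()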

Thx