Batch_norm the derivative for 'running_mean' is not implemented

I’m trying to reproduce the Wide Residual Network 28-2 for a semi-supervised learning article I’m writing, but I’m having trouble using batch_norm.

I keep getting this error:
File "C:\Anaconda3\lib\site-packages\torch\nn\functional.py", line 1708, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: the derivative for 'running_mean' is not implemented

Currently I’m using it like this:
F.batch_norm(z, weight=bnW0, bias=bnB0, running_mean=bnM0, running_var=bnV0, training=training)

where weight, bias, running_mean, and running_var have all been instantiated as:
nn.Parameter((torch.rand(16) - 0.5) * 1e-1)

Is batch_norm currently not working in training mode, or am I just doing something wrong here?

Hi tueboesen,
running_mean and running_var should be registered as buffers, not parameters. These tensors are updated in place inside the function call, so no gradients are available for them; hence the error.
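A minimal sketch of what this looks like in practice (the module name MyBN and the feature count are just illustrative): the learnable affine terms stay nn.Parameters, while the running statistics are registered as buffers via register_buffer, so autograd never tries to differentiate their in-place update inside F.batch_norm.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MyBN(nn.Module):
    def __init__(self, num_features=16):
        super().__init__()
        # Learnable affine parameters: gradients flow through these.
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        # Running statistics: buffers, updated in place by F.batch_norm,
        # never differentiated.
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_var", torch.ones(num_features))

    def forward(self, z):
        return F.batch_norm(
            z,
            running_mean=self.running_mean,
            running_var=self.running_var,
            weight=self.weight,
            bias=self.bias,
            training=self.training,
        )

bn = MyBN(16)
out = bn(torch.randn(8, 16))  # no RuntimeError in training mode
out.sum().backward()          # gradients reach weight and bias only
```

Buffers still move with the module on .to(device) and are saved in the state_dict, which is why register_buffer is preferred over plain tensor attributes for the running statistics.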


Okay thank you for clarifying.

If there is no plan to implement that at some point, I would say the error message should probably be made more informative, because right now it just sounds like batch_norm isn’t working.