How to use v0.1.12 BN implementation in v0.4.0

Hi,
My PyTorch project behaves differently in v0.1.12 and v0.4.0 (discussed here: https://github.com/xingyizhou/pytorch-pose-hg-3d/issues/16 and https://github.com/bearpaw/pytorch-pose/issues/33, linked just for reference; there is no need to look into the project for this topic) and I have firm reasons to believe the problem lies in the BN layer. For debugging purposes, is it possible to locally switch the BN implementation in PyTorch v0.4.0 back to the v0.1.12 version? Or can anyone tell me, or point me to the files with, the exact changes to the BN layer from v0.1.12 to v0.2.0 and later? Thanks very much!
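To make the comparison concrete, this is roughly how the per-layer outputs can be dumped in both versions with forward hooks (a rough sketch written against the 0.4 API; the helper name is my own):

```python
import torch
import torch.nn as nn

def add_bn_logging_hooks(model):
    """Print the max absolute output of every BatchNorm2d layer on each
    forward pass, so the logs of both PyTorch versions can be diffed."""
    def make_hook(name):
        def hook(module, input, output):
            # 0.4-style API; in 0.1.12 use output.data.abs().max() instead
            print('%s: max |output| = %.4f' % (name, output.abs().max().item()))
        return hook
    for name, module in model.named_modules():
        if isinstance(module, nn.BatchNorm2d):
            module.register_forward_hook(make_hook(name))
```

Running the same input through both versions and diffing the printed values should point at the first layer where they diverge.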

In 0.1.12 batch_norm is defined here. The corresponding C function is defined here.
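For reference, the functional entry point can also be called directly, which makes it easy to diff raw outputs between versions (a minimal sketch with made-up tensors, written against the 0.4 API):

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 3, 8, 8)
running_mean = torch.zeros(3)
running_var = torch.ones(3)

# dispatches down to the C kernel mentioned above
out = F.batch_norm(x, running_mean, running_var,
                   weight=None, bias=None,
                   training=False, momentum=0.1, eps=1e-5)
print(out.mean().item(), out.std().item())
```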

I’ve compared it with the BatchNorm implementation of version 0.3.0 (code).
Between these versions there are some minor changes:

  • long was replaced by int64_t
  • THTensor_(resizeAs)(gradInput, input) was moved up a bit

You can find the current implementation of BatchNorm here.
Basically, some conditions were added, since BatchNorm supports track_running_stats from version 0.4.0 onwards.
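As a minimal illustration of the new flag (made-up numbers, 0.4 API): with the default track_running_stats=True the running estimates are used in eval mode, while with track_running_stats=False the batch statistics are always used and no running buffers are kept.

```python
import torch
import torch.nn as nn

x = torch.randn(8, 3, 4, 4) * 10 + 5   # deliberately far from N(0, 1)

bn = nn.BatchNorm2d(3)                  # track_running_stats=True by default
bn.train()
_ = bn(x)                               # updates running_mean / running_var
bn.eval()
out = bn(x)                             # normalizes with the running estimates

bn_free = nn.BatchNorm2d(3, track_running_stats=False)
bn_free.eval()
out_free = bn_free(x)                   # still normalizes with batch statistics
print(bn_free.running_mean)             # None: no running buffers exist
```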

Are you sure the difference is due to the BatchNorm layers?

Thanks for the timely reply! It’s very helpful. My reason for blaming the BN layers is that I observe unreasonably large outputs (>1000) after BN layers when model.eval() is set. Also, models without BN layers do not suffer from the discrepancy between PyTorch versions. I will try to hack the BN implementation and see if it resolves the problem.
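What I have in mind for the hack is roughly a pure-Python replacement like the sketch below, which bypasses the C/cudnn kernels entirely so the computation cannot differ between versions (the class name and the details of the stat updates are my own, not the actual v0.1.12 code):

```python
import torch
import torch.nn as nn

class ManualBatchNorm2d(nn.BatchNorm2d):
    """Pure-Python batch norm for debugging: avoids the C/cudnn
    kernels so the computation is fully visible and version-independent."""

    def forward(self, input):
        c = input.size(1)
        if self.training:
            flat = input.transpose(0, 1).contiguous().view(c, -1)
            mean = flat.mean(1)
            var = flat.var(1, unbiased=False)   # biased var for normalization
            with torch.no_grad():
                n = flat.size(1)
                # running stats use the unbiased variance, like nn.BatchNorm
                self.running_mean.mul_(1 - self.momentum).add_(self.momentum * mean)
                self.running_var.mul_(1 - self.momentum).add_(
                    self.momentum * var * n / (n - 1))
        else:
            mean, var = self.running_mean, self.running_var
        out = (input - mean[None, :, None, None]) / torch.sqrt(
            var[None, :, None, None] + self.eps)
        if self.affine:
            out = out * self.weight[None, :, None, None] + self.bias[None, :, None, None]
        return out
```

Swapping each nn.BatchNorm2d in the model for this module (and copying its weights and buffers) should show whether the native kernels are really to blame.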

Hi ptrblck,
I am wondering: in the default PyTorch setting (when torch.backends.cudnn.enabled is True), is the C code you provided actually what runs as BN, or is it the hidden cudnn library? Thanks!

Under certain conditions it uses cudnn. In 0.4 this is checked here. You could grep for the magic size to find it in other versions.
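An easy way to test this is to disable cudnn and compare the two paths directly (a small sketch; whether the cudnn path is actually taken also depends on the conditions in the check linked above):

```python
import torch
import torch.nn as nn

x = torch.randn(8, 3, 32, 32).cuda()
bn = nn.BatchNorm2d(3).cuda().eval()

torch.backends.cudnn.enabled = True
out_cudnn = bn(x)

torch.backends.cudnn.enabled = False    # falls back to the native THNN kernels
out_native = bn(x)

print((out_cudnn - out_native).abs().max().item())
```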

Best regards

Thomas