The parameter affine in nn.BatchNorm2d() is True when I train the model,
and I need to set affine to False when I test the model.
Is my understanding correct?
No. affine only toggles the learnable gamma and beta transform that you can see in the docs; it is an architectural choice, not a train/test switch. Use module.eval()
to switch the model to evaluation mode.
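To make the distinction concrete, here is a minimal sketch (the layer size and shapes are made up for illustration):

    import torch.nn as nn

    # affine is decided at construction time: it controls whether this
    # BatchNorm2d layer has learnable gamma/beta parameters. It stays
    # the same during training and testing.
    bn = nn.BatchNorm2d(32, affine=True)

    # What changes between training and testing is the module mode:
    bn.train()  # normalize with per-batch statistics, update running stats
    bn.eval()   # normalize with the stored running statistics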
Also, remember that using input Variables created with volatile=True
will make inference much faster and will greatly reduce memory usage.
OK, I understand. Thank you very much!
No, it doesn’t. You only need the input to be volatile
to perform inference efficiently. There is no need to touch the parameters, as volatile=True
takes precedence over all other flags and doesn’t even consider the parameter flags. Just create the input like this: Variable(input, volatile=True)
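For example, a minimal inference sketch in this old (pre-0.4) Variable API, assuming model is your network and the input shape is just a placeholder:

    import torch
    from torch.autograd import Variable

    model.eval()  # switch BatchNorm/Dropout to evaluation behaviour
    # volatile propagates through the whole forward pass, so no autograd
    # graph is built and memory usage stays flat
    x = Variable(torch.randn(1, 3, 224, 224), volatile=True)
    output = model(x)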
No. It’s never recommended to reuse the same Variables between iterations. This will likely lead to graphs growing indefinitely and increasing memory usage. Just recreate them every time, it’s extremely cheap to do.
What I did in the WassersteinGAN code is not an optimal approach; I have to fix that code (will do now).
@Veril note that these are Tensors, not Variables. Modifying tensors as many times as you want is fine: they don’t remember what you did with them. Variables do, and this makes the graphs longer and longer.
It would be a big slowdown, but not necessarily a leak. We free the buffers once you call backward(),
so you’d only be using up CPU memory for the graph nodes.
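Put together, the recommended per-iteration pattern looks like this (loader, model, criterion and optimizer are placeholders, not code from the thread):

    from torch.autograd import Variable

    for data, target in loader:
        # wrap the tensors in fresh Variables every iteration; the wrappers
        # are extremely cheap, and the tensors carry no autograd history
        input_var, target_var = Variable(data), Variable(target)
        loss = criterion(model(input_var), target_var)
        optimizer.zero_grad()
        loss.backward()  # the graph buffers are freed here
        optimizer.step()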
I’ve fixed the WassersteinGAN code via https://github.com/martinarjovsky/WassersteinGAN/commit/e553093d3b2a44a1b6d0c8739a973598af6aa535
@apaszke in my case (old code) I’ve carefully hacked up Variable reuse, but it’s a hack.
By “fix”, I mean I added more bugs.
Fixed now in master.
One problem occurred when I call model.eval(): the output of the network grows exponentially, and the sigmoid at the end of the network in my cost function overflows. When I don’t apply model.eval(), the network does not produce any warning or error, but when I do, it does. Could you please tell me why this problem happens? Here is my network structure:
YoloV2 (
  (path1): ModuleList (
    (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True)
    (2): LeakyReLU (0.1)
    (3): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
    (4): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (5): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
    (6): LeakyReLU (0.1)
    (7): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
    (8): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (9): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
    (10): LeakyReLU (0.1)
    (11): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (12): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
    (13): LeakyReLU (0.1)
    (14): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (15): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
    (16): LeakyReLU (0.1)
    (17): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
    (18): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (19): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
    (20): LeakyReLU (0.1)
    (21): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (22): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
    (23): LeakyReLU (0.1)
    (24): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (25): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
    (26): LeakyReLU (0.1)
    (27): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
    (28): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (29): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
    (30): LeakyReLU (0.1)
    (31): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (32): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
    (33): LeakyReLU (0.1)
    (34): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (35): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
    (36): LeakyReLU (0.1)
    (37): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (38): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
    (39): LeakyReLU (0.1)
    (40): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (41): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
    (42): LeakyReLU (0.1)
  )
  (parallel1): ModuleList (
    (0): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
    (1): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (2): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True)
    (3): LeakyReLU (0.1)
    (4): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (5): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
    (6): LeakyReLU (0.1)
    (7): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (8): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True)
    (9): LeakyReLU (0.1)
    (10): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (11): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
    (12): LeakyReLU (0.1)
    (13): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (14): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True)
    (15): LeakyReLU (0.1)
    (16): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (17): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True)
    (18): LeakyReLU (0.1)
    (19): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (20): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True)
    (21): LeakyReLU (0.1)
  )
  (parallel2): ModuleList (
    (0): Conv2d(512, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
    (2): LeakyReLU (0.1)
    (3): space_to_depth (
    )
  )
  (path2): ModuleList (
    (0): Conv2d(1280, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True)
    (2): LeakyReLU (0.1)
    (3): Conv2d(1024, 425, kernel_size=(1, 1), stride=(1, 1), bias=False)
  )
)
Comprehensive details about my problem are here:
https://github.com/pytorch/pytorch/issues/1725
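One way to narrow this down: since model.eval() only changes the BatchNorm behaviour in this network, check whether the stored running statistics have diverged. A rough diagnostic sketch, with model standing for the network above:

    import torch.nn as nn

    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            # exploding eval-mode outputs often mean running_var collapsed
            # towards zero or running_mean drifted far from batch statistics
            print(m.running_mean.abs().max(), m.running_var.min())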