How to use batch normalization when testing a model

The parameter affine in nn.BatchNorm2d() is True when I train the model, and I need to set affine=False when I test the model.

Is my understanding correct?

No. affine only toggles the learnable gamma and beta transform that you can see in the docs. Use module.eval() to switch the module to evaluation mode.
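
For illustration, a minimal sketch (net stands in for your own model; note that affine stays True in both modes):

import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(16, affine=True),  # affine is unchanged between train and eval
    nn.LeakyReLU(0.1),
)

net.train()  # training mode: BatchNorm normalizes with per-batch statistics
# ... training loop ...

net.eval()   # evaluation mode: BatchNorm normalizes with its running statistics
# ... inference ...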

Also, remember that using input Variables created with volatile=True will make inference much faster and will greatly reduce memory usage.

OK, I understand. Thank you very much!

No, it doesn’t. But you only need the input to be volatile to perform inference efficiently. There is no need to touch the parameters: volatile=True takes precedence over all other flags, and parameter flags aren’t even considered. Just create the input like this: Variable(input, volatile=True)
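
A sketch of an inference step with this API (x is a dummy input; net is the model from the sketch above). Note this is the pre-0.4 autograd API discussed in this thread; in PyTorch 0.4 and later, volatile was removed in favor of the torch.no_grad() context manager.

import torch
from torch.autograd import Variable

net.eval()
x = torch.randn(1, 3, 224, 224)          # dummy input batch
out = net(Variable(x, volatile=True))    # no graph is built, so no history is kept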

No. It’s never recommended to reuse the same Variables between iterations. This will likely lead to graphs growing indefinitely and increasing memory usage. Just recreate them every time, it’s extremely cheap to do.
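
A sketch of the recommended pattern, with loader, criterion, and optimizer as placeholders for your own objects:

from torch.autograd import Variable

def train_epoch(net, loader, criterion, optimizer):
    # Fresh Variables are created each iteration instead of being reused.
    for data, target in loader:
        input_var = Variable(data)    # wrapping a tensor is extremely cheap
        target_var = Variable(target)
        optimizer.zero_grad()
        loss = criterion(net(input_var), target_var)
        loss.backward()               # the graph is freed here
        optimizer.step()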

What I did in the WassersteinGAN code is not an optimal approach; I have to fix that code (will do now).

@Veril note that these are Tensors, not Variables. Modifying tensors as many times as you want is fine; they don’t remember what you did to them. Variables do, and this makes the graphs longer and longer.
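
A sketch of the difference (the names here are purely illustrative):

import torch
from torch.autograd import Variable

buf = torch.zeros(4)
for _ in range(1000):
    buf.copy_(torch.randn(4))   # fine: tensors keep no history

v = Variable(torch.zeros(4), requires_grad=True)
out = v
for _ in range(1000):
    out = out * 2               # each op adds a node, so the graph keeps growing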

It would be a significant slowdown, but not necessarily a leak. We free the buffers once you call backward, so you’d only be using up CPU memory for the graph nodes.

I’ve fixed the WassersteinGAN code via https://github.com/martinarjovsky/WassersteinGAN/commit/e553093d3b2a44a1b6d0c8739a973598af6aa535

@apaszke in my (old code) case I’ve carefully hacked up reusing Variables, but it’s a hack.

By fix, I mean I added more bugs :)
Fixed now in master.

One problem occurred when I called model.eval(): the output of the network grew exponentially, and the sigmoid at the end of the network in my cost function overflowed. When I don’t apply model.eval(), the network does not produce any warning or error, but when I do, it does. Could you please tell me why this problem happens? Here is my network structure:

YoloV2 (
  (path1): ModuleList (
    (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True)
    (2): LeakyReLU (0.1)
    (3): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
    (4): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (5): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
    (6): LeakyReLU (0.1)
    (7): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
    (8): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (9): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
    (10): LeakyReLU (0.1)
    (11): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (12): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
    (13): LeakyReLU (0.1)
    (14): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (15): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
    (16): LeakyReLU (0.1)
    (17): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
    (18): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (19): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
    (20): LeakyReLU (0.1)
    (21): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (22): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
    (23): LeakyReLU (0.1)
    (24): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (25): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
    (26): LeakyReLU (0.1)
    (27): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
    (28): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (29): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
    (30): LeakyReLU (0.1)
    (31): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (32): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
    (33): LeakyReLU (0.1)
    (34): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (35): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
    (36): LeakyReLU (0.1)
    (37): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (38): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
    (39): LeakyReLU (0.1)
    (40): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (41): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
    (42): LeakyReLU (0.1)
  )
  (parallel1): ModuleList (
    (0): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
    (1): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (2): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True)
    (3): LeakyReLU (0.1)
    (4): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (5): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
    (6): LeakyReLU (0.1)
    (7): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (8): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True)
    (9): LeakyReLU (0.1)
    (10): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (11): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
    (12): LeakyReLU (0.1)
    (13): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (14): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True)
    (15): LeakyReLU (0.1)
    (16): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (17): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True)
    (18): LeakyReLU (0.1)
    (19): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (20): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True)
    (21): LeakyReLU (0.1)
  )
  (parallel2): ModuleList (
    (0): Conv2d(512, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
    (2): LeakyReLU (0.1)
    (3): space_to_depth (
    )
  )
  (path2): ModuleList (
    (0): Conv2d(1280, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True)
    (2): LeakyReLU (0.1)
    (3): Conv2d(1024, 425, kernel_size=(1, 1), stride=(1, 1), bias=False)
  )
)

Comprehensive details about my problem are here:
https://github.com/pytorch/pytorch/issues/1725
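
One way to check whether the BatchNorm running statistics themselves have diverged, which would make eval-mode outputs blow up even though train-mode outputs look fine (model is the YoloV2 instance printed above):

import torch.nn as nn

for name, m in model.named_modules():
    if isinstance(m, nn.BatchNorm2d):
        print(name,
              'running_mean max:', m.running_mean.abs().max(),
              'running_var max:', m.running_var.max())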