The parameter affine in nn.BatchNorm2d() is True when I train the model,
and I need to set affine to False when I test the model.
Is my understanding correct?
No. affine only toggles the learnable gamma and beta transform that you can see in the docs; it is an architectural choice, not a train/test switch. Use module.eval()
to switch the model to evaluation mode.
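To make the distinction concrete, here is a minimal sketch (the layer size and shapes are made up for illustration):

    import torch.nn as nn

    # affine is decided at construction time: it controls whether this
    # BatchNorm2d layer has learnable gamma/beta parameters. It stays
    # the same during training and testing.
    bn = nn.BatchNorm2d(32, affine=True)

    # What changes between training and testing is the module mode:
    bn.train()  # normalize with per-batch statistics, update running stats
    bn.eval()   # normalize with the stored running statistics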
Also, remember that using input Variables created with volatile=True
will make inference much faster and will greatly reduce memory usage.
OK, I understand. Thank you very much!
No, it doesn’t. You only need the input to be volatile
to perform inference efficiently. There is no need to touch the parameters, as volatile=True
takes precedence over all other flags and doesn’t even consider the parameter flags. Just create the input like this: Variable(input, volatile=True)
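For example, a minimal inference sketch in this old (pre-0.4) Variable API, assuming model is your network and the input shape is just a placeholder:

    import torch
    from torch.autograd import Variable

    model.eval()  # switch BatchNorm/Dropout to evaluation behaviour
    # volatile propagates through the whole forward pass, so no autograd
    # graph is built and memory usage stays flat
    x = Variable(torch.randn(1, 3, 224, 224), volatile=True)
    output = model(x)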
No. It’s never recommended to reuse the same Variables between iterations. This will likely lead to graphs growing indefinitely and increasing memory usage. Just recreate them every time, it’s extremely cheap to do.
What I did in the WassersteinGAN code is not an optimal approach; I have to fix that code (will do now).
@Veril note that these are Tensors, not Variables. Modifying tensors as many times as you want is fine: they don’t remember what you did with them. Variables do, and this makes the graphs longer and longer.
It would be a big slowdown, but not necessarily a leak. We free the buffers once you call backward(),
so you’d only be using up CPU memory for the graph nodes.
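Put together, the recommended per-iteration pattern looks like this (loader, model, criterion and optimizer are placeholders, not code from the thread):

    from torch.autograd import Variable

    for data, target in loader:
        # wrap the tensors in fresh Variables every iteration; the wrappers
        # are extremely cheap, and the tensors carry no autograd history
        input_var, target_var = Variable(data), Variable(target)
        loss = criterion(model(input_var), target_var)
        optimizer.zero_grad()
        loss.backward()  # the graph buffers are freed here
        optimizer.step()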
I’ve fixed the WassersteinGAN code via https://github.com/martinarjovsky/WassersteinGAN/commit/e553093d3b2a44a1b6d0c8739a973598af6aa535
@apaszke in my case (old code) I’ve carefully hacked up Variable reuse, but it’s a hack.
By “fix”, I mean I added more bugs.
Fixed now in master.
One problem occurred when I call model.eval(): the output of the network grows exponentially, and the sigmoid at the end of the network in my cost function overflows. When I don’t apply model.eval(), the network does not produce any warning or error, but when I do, it does. Could you please tell me why this problem happens? Here is my network structure:
YoloV2 (
  (path1): ModuleList (
    (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True)
    (2): LeakyReLU (0.1)
    (3): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
    (4): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (5): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
    (6): LeakyReLU (0.1)
    (7): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
    (8): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (9): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
    (10): LeakyReLU (0.1)
    (11): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (12): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
    (13): LeakyReLU (0.1)
    (14): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (15): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
    (16): LeakyReLU (0.1)
    (17): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
    (18): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (19): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
    (20): LeakyReLU (0.1)
    (21): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (22): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
    (23): LeakyReLU (0.1)
    (24): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (25): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
    (26): LeakyReLU (0.1)
    (27): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
    (28): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (29): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
    (30): LeakyReLU (0.1)
    (31): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (32): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
    (33): LeakyReLU (0.1)
    (34): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (35): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
    (36): LeakyReLU (0.1)
    (37): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (38): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
    (39): LeakyReLU (0.1)
    (40): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (41): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
    (42): LeakyReLU (0.1)
  )
  (parallel1): ModuleList (
    (0): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
    (1): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (2): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True)
    (3): LeakyReLU (0.1)
    (4): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (5): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
    (6): LeakyReLU (0.1)
    (7): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (8): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True)
    (9): LeakyReLU (0.1)
    (10): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (11): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
    (12): LeakyReLU (0.1)
    (13): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (14): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True)
    (15): LeakyReLU (0.1)
    (16): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (17): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True)
    (18): LeakyReLU (0.1)
    (19): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (20): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True)
    (21): LeakyReLU (0.1)
  )
  (parallel2): ModuleList (
    (0): Conv2d(512, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
    (2): LeakyReLU (0.1)
    (3): space_to_depth (
    )
  )
  (path2): ModuleList (
    (0): Conv2d(1280, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True)
    (2): LeakyReLU (0.1)
    (3): Conv2d(1024, 425, kernel_size=(1, 1), stride=(1, 1), bias=False)
  )
)
Comprehensive details about my problem are here:
https://github.com/pytorch/pytorch/issues/1725
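One way to narrow this down: since model.eval() only changes the BatchNorm behaviour in this network, check whether the stored running statistics have diverged. A rough diagnostic sketch, with model standing for the network above:

    import torch.nn as nn

    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            # exploding eval-mode outputs often mean running_var collapsed
            # towards zero or running_mean drifted far from batch statistics
            print(m.running_mean.abs().max(), m.running_var.min())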