Generator model giving nan during .eval()

Hi,

I am using the following generator model for a project, which is similar to the DCGAN tutorial. The only difference is that I have added a couple of residual blocks at the beginning. In train mode everything works fine and proper results are generated. However, if I set the model to eval mode using .eval(), the model generates NaN outputs.

I have narrowed it down to an issue in the residual block, but I am not sure why the NaNs are being generated. I suspect the issue might be due to the Instance Norm layers. Could someone clarify why this could be happening?

Note: the input to the generator is a Bx40x1x1 tensor, which consists of 1s and 0s.

import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Residual Block with instance normalization."""
    def __init__(self, dim_in, dim_out):
        super(ResidualBlock, self).__init__()
        self.main = nn.Sequential(
            nn.Conv2d(dim_in, dim_out, kernel_size=3, stride=1, padding=1, bias=False),
            nn.InstanceNorm2d(dim_out, affine=True, track_running_stats=True),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim_out, dim_out, kernel_size=3, stride=1, padding=1, bias=False),
            nn.InstanceNorm2d(dim_out, affine=True, track_running_stats=True))

    def forward(self, x):
        return x + self.main(x)

class Generator(nn.Module):
    def __init__(self, in_dim=40, conv_dim=64, out_dim=3):
        super(Generator, self).__init__()
        self.feat_extractor = nn.Sequential(
            # Residual Blocks
            ResidualBlock(in_dim, in_dim),
            ResidualBlock(in_dim, in_dim),
            ResidualBlock(in_dim, in_dim),
            ResidualBlock(in_dim, in_dim),
            ResidualBlock(in_dim, in_dim),
            ResidualBlock(in_dim, in_dim),
            # input is Z, going into a convolution
            nn.ConvTranspose2d(in_channels=in_dim, out_channels=conv_dim*8,
                                kernel_size=4, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(conv_dim * 8),
            nn.ReLU(True),
            # state size. (conv_dim*8) x 4 x 4
            nn.ConvTranspose2d(in_channels=conv_dim*8, out_channels=conv_dim*4,
                                kernel_size=4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(conv_dim * 4),
            nn.ReLU(True),
            # state size. (conv_dim*4) x 8 x 8
            nn.ConvTranspose2d(in_channels=conv_dim*4, out_channels=conv_dim*2,
                                kernel_size=4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(conv_dim * 2),
            nn.ReLU(True),
            # state size. (conv_dim*2) x 16 x 16
            nn.ConvTranspose2d(in_channels=conv_dim*2, out_channels=conv_dim,
                                kernel_size=4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(conv_dim),
            nn.ReLU(True),
        )
        # state size. (conv_dim) x 32 x 32
        self.img_gen = nn.Sequential(
            nn.ConvTranspose2d(in_channels=conv_dim, out_channels=out_dim,
                                kernel_size=4, stride=2, padding=1, bias=False),
            nn.Tanh()
        )
        # state size. (out_dim) x 64 x 64

    def forward(self, input):
        return self.img_gen(self.feat_extractor(input))

Hi,

I have tested your model, both the ResidualBlock separately and the full Generator, and it never generates NaN values.

Here is the code I have tested:


model = Generator()
model.eval()
x = torch.randn(1, 40, 1, 1)
x[x > 0] = 0   # turn the random noise into a tensor of 0s and 1s,
x[x < 0] = 1   # matching the described generator input
o = model(x)
print(o.size())  #torch.Size([1, 3, 64, 64])

Hi,

I think it is not generating the NaN values for you because when you are testing, the model has not been trained yet, and so the Instance Norm layers have not been trained either.

I verified this on my system. Without loading the trained weights, .eval() works fine, but after loading the weights it generates NaN.
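
One way to narrow it down further is to inspect the running statistics right after loading the weights; something along these lines (the checkpoint path is just a placeholder):

import torch
import torch.nn as nn

model = Generator()
# Placeholder path; substitute whatever file the trained weights are saved to.
model.load_state_dict(torch.load("generator.pth", map_location="cpu"))

# These buffers are exactly what .eval() uses instead of per-sample statistics.
for name, module in model.named_modules():
    if isinstance(module, nn.InstanceNorm2d):
        print(name,
              "NaN in running_mean:", torch.isnan(module.running_mean).any().item(),
              "NaN in running_var:", torch.isnan(module.running_var).any().item())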

I thought the problem pertained to the implementation. Have you tried a different system, like Google Colab?

Actually, in a model of mine, the loss values explode in my local environment (an old system), but on Colab the model trains very well.

I think the problem pertains to some gap in my understanding of the instance norm layer. Since the code works perfectly in train mode, i.e. if I run evaluation without calling .eval(), I don't think it is an environment issue.

I am wondering if there is some issue with how instance norm handles input that is predominantly zero in eval mode.
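
A minimal sketch to probe this suspicion, using a single InstanceNorm2d configured like the ones in ResidualBlock and a 1x1 spatial input of 0s and 1s, could look like the following; if running_var comes out NaN or degenerate after the training-mode forward pass, eval mode will reuse it and the output will be NaN as well:

import torch
import torch.nn as nn

# Single norm layer configured like the ones in ResidualBlock.
norm = nn.InstanceNorm2d(40, affine=True, track_running_stats=True)

# 1x1 spatial input of 0s and 1s, like the generator input.
x = (torch.randn(8, 40, 1, 1) > 0).float()

norm.train()
_ = norm(x)                # updates running_mean / running_var
print(norm.running_var)    # inspect the accumulated statistics

norm.eval()
print(norm(x))             # output computed from the running statistics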


Have a look here:


In short: in eval mode, batch norm (and instance norm with track_running_stats=True) does not calculate the statistics of the current batch but uses the accumulated running statistics instead, and that may lead to your issue.
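
The instance norm layers in your ResidualBlock were created with track_running_stats=True, so they behave the same way. A small standalone sketch of the difference (random input, just for illustration):

import torch
import torch.nn as nn

norm = nn.InstanceNorm2d(3, affine=True, track_running_stats=True)
x = torch.randn(4, 3, 8, 8)

norm.train()
out_train = norm(x)   # normalized with the statistics of this very input

norm.eval()
out_eval = norm(x)    # normalized with the accumulated running statistics

# The outputs differ because eval mode ignores the current batch statistics.
print((out_train - out_eval).abs().max())

If the running statistics accumulated during training are degenerate for your kind of input, a common workaround is to create the instance norm layers with track_running_stats=False, so they always normalize with the statistics of the current sample.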