Hello,
I have a network like this:
class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.block1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=16, kernel_size=(1, 9), stride=(1, 2)),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.Conv2d(in_channels=16, out_channels=64, kernel_size=(1, 5), stride=(1, 2)),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 50))
        )
        self.block2 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=(3, 3), stride=1, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=(3, 3), stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=(1, 4)),
            nn.Flatten(),
            nn.Linear(64, 10, bias=True)
        )
        self.output = nn.Sequential(
            nn.Softmax(dim=1)
        )

    def forward(self, x):
        x = self.block1(x)
        x = x.permute((0, 2, 1, 3))
        x = self.block2(x)
        y = self.output(x)
        return y
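To make the role of the permute concrete, here is a minimal sketch that traces shapes through block1 (channel numbers assumed consistent: 16, then 64). The input shape (N, 1, 1, L) and the length L = 20000 are assumptions for illustration, i.e. a raw-waveform-style input with a singleton height dimension:

```python
import torch
import torch.nn as nn

# Stand-in for block1, used only to trace tensor shapes.
block1 = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=(1, 9), stride=(1, 2)),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.Conv2d(16, 64, kernel_size=(1, 5), stride=(1, 2)),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=(1, 50)),
)

x = torch.randn(1, 1, 1, 20000)    # (N, C, H, W), assumed input shape
out = block1(x)
print(out.shape)                   # the 64 learned feature channels sit in dim 1
swapped = out.permute(0, 2, 1, 3)  # swaps channel dim with height dim
print(swapped.shape)               # block2 now sees 1 channel, features as rows
```

The permute is what lets block2 start again from in_channels=1: the 64 feature maps from block1 are reinterpreted as spatial rows.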
This works fine as a full-precision network.
However, when I convert it to an XNOR network, the gradients somehow vanish and the model does not learn at all.
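For context, a common XNOR-net-style binarization uses a sign function with a straight-through estimator (STE); this is a generic sketch, not my actual conversion code, and the clipping window of ±1 is the usual convention:

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a straight-through estimator."""

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # Gradients pass through only where |w| <= 1; outside that
        # window the STE zeroes them, which can kill the signal.
        return grad_out * (w.abs() <= 1).float()
```

If many weights drift outside the clipping window, the STE mask zeroes most of the gradient, which is one common way binarized networks stop learning.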
If I remove the x = x.permute((0, 2, 1, 3)) and change the kernel_size of the conv layers in self.block2, it trains and reaches some accuracy. From this I conclude that the problem is the permute after self.block1.
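To pin down where the gradient dies, one option is to print per-parameter gradient norms after a backward pass. This is a generic diagnostic sketch (the three-layer model here is a hypothetical stand-in, not my network):

```python
import torch
import torch.nn as nn

# Stand-in model; substitute any nn.Module and a real loss.
model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))
x = torch.randn(4, 8)
loss = model(x).sum()
loss.backward()

# Print the gradient norm of every parameter to locate where it vanishes.
for name, p in model.named_parameters():
    print(f"{name}: grad norm = {p.grad.norm().item():.6f}")
```

Running this on the binarized network before and after removing the permute should show which layers' gradients collapse to zero.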
I am wondering if anyone knows what is going on here. Any ideas would be appreciated. @ptrblck, could you please help me in this regard?
I am looking forward to any help.
Kind regards,
Mohaimen