I have a network like this:
```python
import torch
import torch.nn as nn

class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.block1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=16, kernel_size=(1, 9), stride=(1, 2)),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.Conv2d(in_channels=16, out_channels=64, kernel_size=(1, 5), stride=(1, 2)),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 50)),
        )
        self.block2 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=(3, 3), stride=1, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=(3, 3), stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=(1, 4)),
            nn.Flatten(),
            nn.Linear(64, 10, bias=True),
        )
        self.output = nn.Sequential(nn.Softmax(dim=1))

    def forward(self, x):
        x = self.block1(x)
        # swap the channel and height axes so block1's 64 feature maps
        # become the height dimension for the 2D convs in block2
        x = x.permute(0, 2, 1, 3)
        x = self.block2(x)
        y = self.output(x)
        return y
```
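To clarify, the permute moves block1's 64 feature maps into the height dimension, so block2's first conv sees a single-channel 2D map. A quick shape check (the width 30 is just an arbitrary example):

```python
import torch

x = torch.randn(4, 64, 1, 30)  # block1 output: (N, C=64, H=1, W=30)
x = x.permute(0, 2, 1, 3)      # -> (4, 1, 64, 30): feature maps become the height axis
print(x.shape)                 # torch.Size([4, 1, 64, 30])
```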
This works fine as a full-precision network.
However, when I convert it to an XNOR network, the gradients vanish and it does not learn at all.
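For context, the conversion binarizes the conv layers in the style of XNOR-Net: sign of inputs and weights with a per-filter scaling factor, and a straight-through estimator in the backward pass. A minimal sketch of what I mean (simplified; `BinarizeSTE` and `XNORConv2d` are illustrative names, not my exact implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinarizeSTE(torch.autograd.Function):
    """sign() forward, straight-through estimator backward (clipped at |x| <= 1)."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        # pass gradients through only where the input lies in [-1, 1]
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)

class XNORConv2d(nn.Conv2d):
    """Conv2d with binarized inputs and weights plus a per-filter scaling factor."""
    def forward(self, x):
        # alpha: mean absolute value of the real-valued weights, per output filter
        alpha = self.weight.abs().mean(dim=(1, 2, 3), keepdim=True)
        bw = BinarizeSTE.apply(self.weight) * alpha
        bx = BinarizeSTE.apply(x)
        return F.conv2d(bx, bw, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)
```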
If I remove the `x = x.permute(0, 2, 1, 3)` line and change the `kernel_size` of the conv layers in `self.block2` (so the kernels fit the height-1 input that block2 receives without the permute), it trains and reaches some accuracy. From this, I gather that the problem is the permute operation.
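In case it is useful for reproducing: this is a minimal way to inspect where the gradients die, printing per-parameter gradient norms after `loss.backward()` (assuming `model` is the XNOR version):

```python
# after loss.backward(): print per-parameter gradient norms
for name, p in model.named_parameters():
    if p.grad is not None:
        print(f"{name}: grad norm = {p.grad.norm().item():.3e}")
```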
I am wondering if anyone knows what is going on here. Any ideas would be great. @ptrblck, could you please help me with this?
I am looking forward to any help.