I would like to implement a slightly different version of conv2d and use it inside my neural network.
I would like to take additional binary data into account during the convolution. For the sake of clarity, let's consider the first layer of my network. From the input grayscale image, I compute a binary mask where the object is white and the background is black. Then, for the convolution, I consider a fixed-size window filter sliding jointly over the image and the mask. If the center of the considered window belongs to the object (i.e. is white), then only the pixels of the grayscale image whose mask values are white within that window should contribute to the filtering. The same reasoning applies to pixels belonging to the background.
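To make the rule concrete, here is a minimal NumPy sketch of what I expect a single window to compute (the function name and the 3x3 example values are just for illustration):

```python
import numpy as np

def masked_window_response(img_win, mask_win, kernel):
    # The mask value at the window center decides which "side" contributes:
    # center == 1 (object) -> only object pixels; center == 0 -> only background.
    center = mask_win[mask_win.shape[0] // 2, mask_win.shape[1] // 2]
    keep = (mask_win == center)  # pixels on the same side as the center
    return np.sum(img_win * keep * kernel)

img = np.array([[1., 2., 3.],
                [4., 5., 6.],
                [7., 8., 9.]])
mask = np.array([[1, 1, 0],
                 [1, 1, 0],
                 [0, 0, 0]])
kernel = np.ones((3, 3))
# Center of the mask window is 1 (object), so only the four object
# pixels (1, 2, 4, 5) contribute to the response.
print(masked_window_response(img, mask, kernel))  # -> 12.0
```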
Here is the code for my custom layer:
import torch
import torch.nn as nn
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class MyConv2d(nn.Module):
    def __init__(self, n_channels, out_channels, kernel_size, dilation=1, padding=0, stride=1):
        super(MyConv2d, self).__init__()
        self.kernel_size = (kernel_size, kernel_size)
        self.kernal_size_number = kernel_size * kernel_size
        self.out_channels = out_channels
        self.dilation = (dilation, dilation)
        self.padding = (padding, padding)
        self.stride = (stride, stride)
        self.n_channels = n_channels
        self.weights = nn.Parameter(
            torch.Tensor(self.out_channels, self.n_channels, self.kernal_size_number)
        ).data.uniform_(0, 1)

    def forward(self, x, mask):
        width = self.calculateNewWidth(x)
        height = self.calculateNewHeight(x)
        result = torch.zeros(
            [x.shape[0] * self.out_channels, width, height],
            dtype=torch.float32, device=device
        )
        windows_x = self.calculateWindows(x)
        windows_mask = self.calculateWindows(mask)
        # keep only the pixels that are on the same side (object/background)
        # as the center pixel of each mask window
        windows_mask[windows_mask < 1] = -1
        windows_mask_centers = windows_mask[:, :, windows_mask.size(2) // 2].view(
            windows_mask.size(0), windows_mask.size(1), 1
        )
        windows_mask = windows_mask * windows_mask_centers
        windows_mask[windows_mask < 1] = 0
        windows_x_seg = windows_x * windows_mask

        for channel in range(x.shape[1]):
            for i_convNumber in range(self.out_channels):
                xx = torch.matmul(windows_x_seg[channel], self.weights[i_convNumber][channel])
                xx = xx.view(-1, width, height)
                result[i_convNumber * xx.shape[0] : (i_convNumber + 1) * xx.shape[0]] += xx

        result = result.view(x.shape[0], self.out_channels, width, height)
        return result

    def calculateWindows(self, x):
        windows = F.unfold(
            x, kernel_size=self.kernel_size, padding=self.padding,
            dilation=self.dilation, stride=self.stride
        )
        windows = windows.transpose(1, 2).contiguous().view(-1, x.shape[1], self.kernal_size_number)
        windows = windows.transpose(0, 1)
        return windows

    def calculateNewWidth(self, x):
        return (
            (x.shape[2] + 2 * self.padding[0] - self.dilation[0] * (self.kernel_size[0] - 1) - 1)
            // self.stride[0]
        ) + 1

    def calculateNewHeight(self, x):
        return (
            (x.shape[3] + 2 * self.padding[1] - self.dilation[1] * (self.kernel_size[1] - 1) - 1)
            // self.stride[1]
        ) + 1
Then I call MyConv2d from my network. Here is a snippet of my network:
class MyNetwork(nn.Module):
    def __init__(self):
        super(MyNetwork, self).__init__()
        self.conv = MyConv2d(1, 64, 5, stride=2, padding=0)
        # etc

    def forward(self, x, mask):
        x = F.relu(self.conv(x, mask))
        # etc
        return x
First of all, I have a question regarding execution speed: MyConv2d is much slower than nn.Conv2d (because of the double for loop, I guess). Is there a way to speed it up?
Secondly, I have an issue at the very first iteration when I train my network on GPU: once the input has passed through my first custom layer, the output contains NaN values. Do you have any idea why this happens? Is there something wrong with my implementation of MyConv2d?
Last, I recently got a strange error that appeared out of the blue while training my network:
copy_if failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
This error occurs in MyConv2d when it runs into:
windows_mask[windows_mask < 1] = -1
Can you please help me fix this?
Many thanks in advance!