Hello!
I would like to implement a slightly different version of conv2d and use it inside my neural network.
I would like to take an additional binary mask into account during the convolution. For clarity, consider the first layer of my network. From the input grayscale image, I compute a binary mask in which the object is white and the background is black. During the convolution, a fixed-size filter window moves identically across both the image and the mask. If the center of the current window belongs to the object (i.e. is white in the mask), then only those pixels of the grayscale image that are white in the mask within that window should contribute to the filtering. The same reasoning applies to pixels belonging to the background.
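To make the rule concrete, here is a toy example (made-up numbers, not my real data) of what I expect for a single 3×3 window:

```python
import torch

# One 3x3 grayscale window and its binary mask (1 = object, 0 = background).
window = torch.tensor([[1., 2., 3.],
                       [4., 5., 6.],
                       [7., 8., 9.]])
mask = torch.tensor([[1., 0., 1.],
                     [0., 1., 1.],
                     [0., 0., 1.]])
kernel = torch.ones(3, 3)  # dummy filter weights

# The center pixel (value 5) is object, so only object pixels contribute.
center_is_object = mask[1, 1] == 1
gate = mask if center_is_object else 1 - mask
response = (window * gate * kernel).sum()
print(response)  # 1 + 3 + 5 + 6 + 9 = 24
```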
Here is the code for my custom layer:
import torch
import torch.nn as nn
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class MyConv2d(nn.Module):
    def __init__(self, n_channels, out_channels, kernel_size, dilation=1, padding=0, stride=1):
        super(MyConv2d, self).__init__()
        self.kernel_size = (kernel_size, kernel_size)
        self.kernel_size_number = kernel_size * kernel_size
        self.out_channels = out_channels
        self.dilation = (dilation, dilation)
        self.padding = (padding, padding)
        self.stride = (stride, stride)
        self.n_channels = n_channels
        self.weights = nn.Parameter(
            torch.Tensor(self.out_channels, self.n_channels, self.kernel_size_number)
        ).data.uniform_(0, 1)

    def forward(self, x, mask):
        width = self.calculateNewWidth(x)
        height = self.calculateNewHeight(x)
        result = torch.zeros(
            [x.shape[0] * self.out_channels, width, height],
            dtype=torch.float32, device=device,
        )
        windows_x = self.calculateWindows(x)
        # Build the binary gate: a pixel contributes only if it has the
        # same mask value as the center pixel of its window.
        windows_mask = self.calculateWindows(mask)
        windows_mask[windows_mask < 1] = -1  # background -> -1, object -> 1
        windows_mask_centers = windows_mask[:, :, windows_mask.size()[2] // 2].view(
            windows_mask.size()[0], windows_mask.size()[1], 1
        )
        windows_mask = windows_mask * windows_mask_centers  # 1 where pixel matches center
        windows_mask[windows_mask < 1] = 0
        windows_x_seg = windows_x * windows_mask
        for channel in range(x.shape[1]):
            for i_convNumber in range(self.out_channels):
                xx = torch.matmul(windows_x_seg[channel], self.weights[i_convNumber][channel])
                xx = xx.view(-1, width, height)
                result[i_convNumber * xx.shape[0] : (i_convNumber + 1) * xx.shape[0]] += xx
        result = result.view(x.shape[0], self.out_channels, width, height)
        return result

    def calculateWindows(self, x):
        # im2col: (N, C*k*k, L) rearranged to (C, N*L, k*k)
        windows = F.unfold(
            x, kernel_size=self.kernel_size, padding=self.padding,
            dilation=self.dilation, stride=self.stride,
        )
        windows = windows.transpose(1, 2).contiguous().view(-1, x.shape[1], self.kernel_size_number)
        windows = windows.transpose(0, 1)
        return windows

    def calculateNewWidth(self, x):
        return (
            (x.shape[2] + 2 * self.padding[0] - self.dilation[0] * (self.kernel_size[0] - 1) - 1)
            // self.stride[0]
        ) + 1

    def calculateNewHeight(self, x):
        return (
            (x.shape[3] + 2 * self.padding[1] - self.dilation[1] * (self.kernel_size[1] - 1) - 1)
            // self.stride[1]
        ) + 1
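For reference, here is a standalone sanity check of the shapes that F.unfold produces in calculateWindows, using the same kernel size and stride as my first layer (input size 28×28 is just an example):

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 1, 28, 28)                # (N, C, H, W)
cols = F.unfold(x, kernel_size=5, stride=2)  # (N, C*k*k, L)
print(cols.shape)                            # torch.Size([2, 25, 144])

# Rearranged as in calculateWindows: (C, N*L, k*k)
windows = cols.transpose(1, 2).contiguous().view(-1, 1, 25).transpose(0, 1)
print(windows.shape)                         # torch.Size([1, 288, 25])
```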
Then I call MyConv2d from my network. Here is a snippet of it:
class MyNetwork(nn.Module):
    def __init__(self):
        super(MyNetwork, self).__init__()
        self.conv = MyConv2d(1, 64, 5, stride=2, padding=0)
        # etc.

    def forward(self, x, mask):
        x = F.relu(self.conv(x, mask))
        # etc.
        return x
First, a question about execution speed: MyConv2d is much slower than nn.Conv2d (because of the double for loop, I guess). Is there a way to speed it up?
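One idea I sketched (I am not certain it is fully equivalent to my loop version, the tensor names below are just placeholders) is to fold both loops into a single torch.einsum over channels and output filters:

```python
import torch

N, C, O, K, L = 2, 3, 4, 9, 16       # batch, in-ch, out-ch, k*k, positions per image
windows = torch.randn(C, N * L, K)   # masked windows, shaped as in calculateWindows
weights = torch.randn(O, C, K)       # filter weights, shaped as in MyConv2d

# Loop version (what the forward pass currently does):
loop = torch.zeros(O, N * L)
for c in range(C):
    for o in range(O):
        loop[o] += windows[c] @ weights[o, c]

# Single einsum: sum over channel c and kernel position k at once.
fast = torch.einsum('cpk,ock->op', windows, weights)

print(torch.allclose(loop, fast, atol=1e-5))  # True
```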
Secondly, I have an issue at the very first iteration when I train my network on GPU: once the input has passed through my first custom layer, the output contains NaN values. Do you have any idea why this happens? Is there something wrong with my implementation of MyConv2d?
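For what it is worth, this is the generic snippet I use to check where the NaNs first appear (nothing specific to my data; the helper name is my own):

```python
import torch

def nan_report(name, t):
    """Print whether a tensor contains any NaN or Inf values."""
    bad = torch.isnan(t).any() or torch.isinf(t).any()
    print(f"{name}: shape={tuple(t.shape)}, has_nan_or_inf={bool(bad)}")

x = torch.tensor([1.0, float('nan'), 3.0])
nan_report("x", x)  # x: shape=(3,), has_nan_or_inf=True

# torch.autograd.set_detect_anomaly(True) also helps locate the op
# that first produces a NaN during the backward pass.
```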
Last, I recently got a weird error that came out of the blue while training my network:
copy_if failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
This error occurs in MyConv2d on this line:
windows_mask[windows_mask < 1] = -1
Can you please help me fix this?
Many thanks in advance!