Sparse Convolution layer

Hi All,

I want to build a net along the lines of the paper “A HARMONIC STRUCTURE-BASED NEURAL NETWORK MODEL FOR MUSICAL PITCH DETECTION”, which needs a sparse convolutional layer. I couldn’t find an implementation, so I wrote one myself. However, I’m still pretty new to PyTorch, so I’m looking for wisdom on whether I’ve done it properly and how it could be improved.

The motivation for the sparse kernel: convolution works great for image recognition in part because the identity of an object in an image comes from the relationship between adjacent pixels. Ordinary convolution learns about those relationships. However, with a musical sound on a spectrogram, the identity of a musical note comes from a fundamental frequency and its harmonics. So a new kind of convolution is needed that uses a non-contiguous set of pixels for the kernel, chosen so that they can learn about harmonically related frequencies.
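
To make the idea of a non-contiguous kernel concrete, here is a minimal sketch of how the kernel offsets could be chosen. It assumes a log-frequency spectrogram axis with a fixed number of bins per octave (12 here, purely as an example); `harmonic_offsets` and `bins_per_octave` are hypothetical names for illustration, not something taken from the paper or the repo. On such an axis the k-th harmonic sits `bins_per_octave * log2(k)` bins above the fundamental, so the offsets are the same for every note.

```python
import math


def harmonic_offsets(n_harmonics, bins_per_octave=12):
    """Bin offsets of harmonics 1..n_harmonics relative to the fundamental.

    Assumes a log-frequency axis with `bins_per_octave` bins per octave.
    """
    return [round(bins_per_octave * math.log2(k))
            for k in range(1, n_harmonics + 1)]


# Offsets for the fundamental and its first five overtones.
print(harmonic_offsets(6))  # [0, 12, 19, 24, 28, 31]
```

A list like this is the kind of thing that would be passed as `sk_ind` to the layer below.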

I’ll paste the code for the sparse convolutional layer below, but here’s a link to the repo: github / jseales/ sparse_kernel, which also contains a README describing all the data and processes that happen within it.

Also, here’s the code for the rest of the net, and for loading data, etc.: github / jseales/ harmonic_net/. Currently it runs, but doesn’t seem to learn very well. I don’t yet know whether that’s due to problems with the sparse kernel, the hyperparameters, or something else!

Any perspective on the sparse convolution implementation or any other aspect of the project will be most welcome!

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseConv1D(nn.Module):

    def __init__(self, sk_ind, in_channels=1, out_channels=1, device='cpu'):
        super(SparseConv1D, self).__init__()
        self.out_channels = out_channels
        self.in_channels = in_channels
        self.sk_ind = np.array(sk_ind, dtype=int)
        self.sk_len = len(sk_ind)
        # nn.Parameter registers the weights with the module, so that
        # model.parameters() (and hence the optimizer) can see them.
        self.sk_weights = nn.Parameter(
            torch.randn(out_channels, in_channels, self.sk_len,
                        dtype=torch.float, device=device))
        self.device = device

    def unfold_sparse_1D(self, input_tensor):
        # input_tensor has shape (in_channels, input_len).
        # Find the amount of zero padding needed to make the output the same
        # size as the input.
        low_pad = int(max(0 - min(self.sk_ind), 0))
        high_pad = int(max(0, max(self.sk_ind)))
        # Pad in torch rather than numpy so the data stays on its device and
        # gradients can flow back to earlier layers.
        padded = F.pad(input_tensor, (low_pad, high_pad))

        # Construct an array of indices used to build the unfolded tensor
        # via fancy indexing. Broadcasting gives shape (sk_len, input_len);
        # adding low_pad accounts for the zeros prepended above.
        input_len = input_tensor.shape[-1]
        indices = self.sk_ind[:, np.newaxis] + np.arange(input_len) + low_pad
        indices = torch.as_tensor(indices, dtype=torch.long,
                                  device=input_tensor.device)
        # Output has shape (in_channels, sk_len, input_len)
        return padded[:, indices]

    def forward(self, input_tensor):
        # input_tensor has shape (batch_size, in_channels, input_len)
        batch_size = input_tensor.shape[0]
        self.input_len = input_tensor.shape[2]
        output_batch = torch.empty(batch_size, self.out_channels, self.input_len,
                                   dtype=torch.float, device=input_tensor.device)

        for i in range(batch_size):
            # input_tensor[i] has shape (in_channels, input_len)
            unfolded = self.unfold_sparse_1D(input_tensor[i])
            output_batch[i] = torch.mm(
                self.sk_weights.reshape(self.out_channels, self.in_channels * self.sk_len),
                unfolded.reshape(self.in_channels * self.sk_len, self.input_len))
        return output_batch
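
One direction I could imagine for improving this is to drop the per-sample Python loop and do the whole batch at once. The following is only a sketch under my own assumptions, not a tested replacement: `SparseConv1DBatched` is a hypothetical name, the offsets are stored as a buffer so they follow the module across devices, and the weights are an `nn.Parameter` so that `model.parameters()` picks them up. It computes the same zero-padded sparse convolution as the layer above, using `F.pad`, tensor indexing, and `torch.einsum`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseConv1DBatched(nn.Module):
    """Hypothetical batched variant: same sparse kernel, no Python loop."""

    def __init__(self, sk_ind, in_channels=1, out_channels=1):
        super().__init__()
        offsets = torch.as_tensor(sk_ind, dtype=torch.long)
        # Offsets are fixed, not learned, so keep them as a buffer.
        self.register_buffer("sk_ind", offsets)
        self.low_pad = max(-int(offsets.min()), 0)
        self.high_pad = max(int(offsets.max()), 0)
        # nn.Parameter makes the weights visible to model.parameters().
        self.sk_weights = nn.Parameter(
            torch.randn(out_channels, in_channels, len(sk_ind)))

    def forward(self, x):
        # x: (batch, in_channels, input_len)
        input_len = x.shape[-1]
        padded = F.pad(x, (self.low_pad, self.high_pad))
        positions = torch.arange(input_len, device=x.device)
        # indices[k, t] = position in `padded` read by kernel tap k at output t
        indices = self.sk_ind[:, None] + positions[None, :] + self.low_pad
        # unfolded: (batch, in_channels, sk_len, input_len)
        unfolded = padded[:, :, indices]
        # Sum over input channels (i) and kernel taps (k).
        return torch.einsum("oik,bikt->bot", self.sk_weights, unfolded)
```

For example, `SparseConv1DBatched([0, 12, 19, 24], in_channels=1, out_channels=8)` applied to a tensor of shape `(batch, 1, input_len)` should return shape `(batch, 8, input_len)`. Because everything stays in torch, gradients reach both the kernel weights and the input.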