Unlabeled Object Localization


I’m trying to get object localization similar to "Efficient Object Localization Using Convolutional Networks", but with an unlabeled dataset. The features are fairly simple.

So I was thinking I could hand-engineer a few example kernels (or use a GAN to create generalized versions) and use them as the basis for feature extraction, but I’m not sure how to get the location of the maximum response to each kernel. I have a couple of ideas:

  1. Is it possible to have nn.Conv2d return the convolution windows as a tensor? I was thinking I could take the maximally responding windows and place a candidate point at the location of the max value in each window.

  2. Is it possible to differentiate a whole image with autograd? My other thought was that the curvature map of the resulting convolution would yield ~0 curvature around the maximum responses, especially if the resulting tensor is clipped to some range.

  3. Is there an efficient way to stack sliding windows of an image into the batch dimension? I have a classifier trained on these features, so theoretically I could just reorganize the image and then take the most confident outputs of the classifier as the candidate location points. I was trying to use torch.chunk() for this, but that gets awkward quickly.

Thanks for any help.

  1. This is not directly possible, but you can simulate it with Tensor.unfold (or torch.nn.functional.unfold). It works the same way as computing a convolution via matrix multiplication (unfold all the image patches): https://en.wikipedia.org/wiki/Toeplitz_matrix#Discrete_convolution

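  2. Yes, autograd can do this out of the box: create the image tensor with requires_grad=True, then call backward() on a scalar computed from it (or use torch.autograd.grad) to get the gradient with respect to every pixel. For second derivatives such as curvature, pass create_graph=True.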

  3. You can combine unfold with view and transpose to achieve this.
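
For point 1, here is a minimal sketch of the unfold approach, assuming a stride of 1 and no padding. The `image` and `kernels` tensors are random placeholders standing in for real data and hand-made kernels. It extracts every window with torch.nn.functional.unfold, computes the kernel responses as a matrix multiplication, and then reads off the window with the maximum response for each kernel.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
image = torch.randn(1, 3, 32, 32)   # (N, C, H, W), placeholder data
kernels = torch.randn(4, 3, 5, 5)   # (K, C, kH, kW), placeholder kernels

# Every 5x5 window flattened into a column: (N, C*kH*kW, num_windows)
windows = F.unfold(image, kernel_size=5)

# Convolution as a matrix multiplication over the unfolded windows
responses = kernels.view(4, -1) @ windows          # (N, K, num_windows)

out_h = out_w = 32 - 5 + 1                         # output size for stride 1, no padding
response_map = responses.view(1, 4, out_h, out_w)

# Sanity check against the built-in convolution
reference = F.conv2d(image, kernels)
assert torch.allclose(response_map, reference, atol=1e-4)

# Index of the maximally responding window for each kernel
flat_idx = responses.argmax(dim=2)                 # (N, K)
rows = torch.div(flat_idx, out_w, rounding_mode="floor")
cols = flat_idx % out_w                            # (rows, cols) = top-left corner of each window

# The winning windows themselves, reshaped back to (K, C, kH, kW)
best_windows = windows[0][:, flat_idx[0]].t().reshape(4, 3, 5, 5)
```

With a different stride or padding, the window index no longer equals the top-left pixel, so the row and column would need to be scaled and shifted accordingly.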


I’ll leave the function I’m using here in case anyone else finds it useful:

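import torch

def windowed_tensor(tensor, size, stride):
    """Stack sliding windows of a 3D (C, H, W) tensor into the batch dimension.

    Returns a tensor of shape (num_windows, C, size, size), windows in row-major order.
    """
    channels = tensor.size(0)
    # (C, H, W) -> (C, nH, nW, size, size)
    tensor = tensor.unfold(1, size, stride).unfold(2, size, stride)
    # Move the window-grid dimensions in front of the channel dimension
    tensor = tensor.permute(1, 2, 0, 3, 4).contiguous()
    return tensor.view(-1, channels, size, size)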
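
A rough usage sketch, in case it helps: the windows come out in row-major order, so the index of the most confident window can be mapped back to pixel coordinates. The `window_centers` helper and the mean-intensity "classifier" are hypothetical stand-ins for illustration, not part of the original code; the function is repeated in compact form so the snippet runs on its own.

```python
import torch

def windowed_tensor(tensor, size, stride):
    # Same logic as the function above, repeated so this snippet is self-contained
    channels = tensor.size(0)
    tensor = tensor.unfold(1, size, stride).unfold(2, size, stride)
    return tensor.permute(1, 2, 0, 3, 4).contiguous().view(-1, channels, size, size)

def window_centers(indices, image_width, size, stride):
    """Hypothetical helper: map flat window indices to (row, col) pixel centers."""
    windows_per_row = (image_width - size) // stride + 1
    rows = torch.div(indices, windows_per_row, rounding_mode="floor") * stride + size // 2
    cols = (indices % windows_per_row) * stride + size // 2
    return torch.stack([rows, cols], dim=1)

# Synthetic image with one bright square aligned to the window grid
image = torch.zeros(3, 64, 64)
image[:, 40:48, 16:24] = 1.0

size, stride = 8, 8
batch = windowed_tensor(image, size, stride)       # (64, 3, 8, 8)

# Stand-in for a trained classifier: mean intensity used as the "confidence"
scores = batch.mean(dim=(1, 2, 3))
top = scores.topk(1).indices                       # most confident window
centers = window_centers(top, image.size(2), size, stride)
print(centers.tolist())                            # [[44, 20]]
```

With a real classifier, `scores` would be replaced by the model's confidence for each window in `batch`, and `topk` with a larger k would give several candidate points.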