Error for grid_generator in spatial transformer

    result = self.forward(*input, **kwargs)
  File "test.py", line 141, in forward
    x = self.stn(x)
  File "test.py", line 135, in stn
    grid = F.affine_grid(theta, x.size())
  File "/home/sohrab/anaconda3/lib/python3.5/site-packages/torch/nn/functional.py", line 1527, in affine_grid
    return AffineGridGenerator.apply(theta, size)
  File "/home/sohrab/anaconda3/lib/python3.5/site-packages/torch/nn/_functions/vision.py", line 98, in forward
    grid = torch.bmm(base_grid.view(N, H * W, 3), theta.transpose(1, 2))
RuntimeError: invalid argument 2: equal number of batches expected, got 1, 49 at /home/sohrab/pytorch/aten/src/TH/generic/THTensorMath.c:1634

I am trying to feed my data into the network but I keep getting this error. The dimensions of my images are (1, 1, 100, 100).
I'm not sure what to do; any help would be incredibly appreciated.

Could you provide more information about how you’re calling affine_grid? The documentation suggests that the first argument should have size (N, 2, 3) and the second argument should be 4-dimensional with size (N, C, H, W).
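
For reference, a minimal call that satisfies those shapes could look like the sketch below (an identity transform and made-up sizes, just to illustrate the expected shapes; depending on your PyTorch version you may need the Variable wrapper):

import torch
from torch.autograd import Variable
import torch.nn.functional as F

N, C, H, W = 1, 1, 100, 100
# one 2x3 affine matrix per batch element -> theta has shape (N, 2, 3)
theta = Variable(torch.Tensor([[[1, 0, 0], [0, 1, 0]]]))
# size should be a torch.Size of the form (N, C, H, W)
grid = F.affine_grid(theta, torch.Size((N, C, H, W)))  # grid: (N, H, W, 2)

Both N's have to match, which is what the batch check in your error is complaining about.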

So affine_grid is the spatial transformer grid generator in the PyTorch library; the error comes from the vision.py file on line 98, as indicated in the traceback:

def forward(ctx, theta, size):
    assert type(size) == torch.Size
    N, C, H, W = size
    ctx.size = size
    if theta.is_cuda:
        ctx.is_cuda = True
        AffineGridGenerator._enforce_cudnn(theta)
        grid = theta.new(N, H, W, 2)
        theta = theta.contiguous()
        torch._C._cudnn_affine_grid_generator_forward(theta, grid, N, C, H, W)
    else:
        ctx.is_cuda = False
        base_grid = theta.new(N, H, W, 3)
        linear_points = torch.linspace(-1, 1, W) if W > 1 else torch.Tensor([-1])
        base_grid[:, :, :, 0] = torch.ger(torch.ones(H), linear_points).expand_as(base_grid[:, :, :, 0])
        linear_points = torch.linspace(-1, 1, H) if H > 1 else torch.Tensor([-1])
        base_grid[:, :, :, 1] = torch.ger(linear_points, torch.ones(W)).expand_as(base_grid[:, :, :, 1])
        base_grid[:, :, :, 2] = 1
        ctx.base_grid = base_grid
        grid = torch.bmm(base_grid.view(N, H * W, 3), theta.transpose(1, 2))
        grid = grid.view(N, H, W, 2)
    return grid

This is what is contained in that file. I hope this helps; thank you so very much for your help.

What arguments are you passing into F.affine_grid? affine_grid takes two arguments: F.affine_grid(theta, size). The error is suggesting that theta.size()[0] should be equal to 1, but is actually equal to 49 (or the other way around).
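
If it helps, a quick debugging sketch (hypothetical print statements dropped into your stn function right before the failing call) would show which side carries the 49:

print(theta.size())  # should be (N, 2, 3), e.g. torch.Size([1, 2, 3]) for a single image
print(x.size())      # should be (N, C, H, W), e.g. torch.Size([1, 1, 100, 100])
grid = F.affine_grid(theta, x.size())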

I'm passing theta, x.size().
x.size() has the dimensions [1, 1, 100, 100], and theta is created using the following script:

    self.localization = nn.Sequential(
        nn.Conv2d(1, 8, kernel_size=7),
        nn.MaxPool2d(2, stride=2),
        nn.ReLU(True),
        nn.Conv2d(8, 10, kernel_size=5),
        nn.MaxPool2d(2, stride=2),
        nn.ReLU(True)
    )

    # Regressor for the 3 * 2 affine matrix
    self.fc_loc = nn.Sequential(
        nn.Linear(10 * 3 * 3, 32),
        nn.ReLU(True),
        nn.Linear(32, 3 * 2)
    )

    # Initialize the weights/bias with identity transformation
    self.fc_loc[2].weight.data.fill_(0)
    self.fc_loc[2].bias.data = torch.FloatTensor([1, 0, 0, 0, 1, 0])

# Spatial transformer network forward function
def stn(self, x):
    xs = self.localization(x)
    xs = xs.view(-1, 10 * 3 * 3)
    theta = self.fc_loc(xs)
    theta = theta.view(-1, 2, 3)

I obtained the script straight from the PyTorch implementation of the spatial transformer:

http://pytorch.org/tutorials/intermediate/spatial_transformer_tutorial.html

Here is a more complete version of the code:

def __init__(self):
    super(Net, self).__init__()
    self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
    self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
    self.conv2_drop = nn.Dropout2d()
    self.fc1 = nn.Linear(320, 50)
    self.fc2 = nn.Linear(50, 10)

    # Spatial transformer localization-network
    self.localization = nn.Sequential(
        nn.Conv2d(1, 8, kernel_size=7),
        nn.MaxPool2d(2, stride=2),
        nn.ReLU(True),
        nn.Conv2d(8, 10, kernel_size=5),
        nn.MaxPool2d(2, stride=2),
        nn.ReLU(True)
    )

    # Regressor for the 3 * 2 affine matrix
    self.fc_loc = nn.Sequential(
        nn.Linear(10 * 3 * 3, 32),
        nn.ReLU(True),
        nn.Linear(32, 3 * 2)
    )

    # Initialize the weights/bias with identity transformation
    self.fc_loc[2].weight.data.fill_(0)
    self.fc_loc[2].bias.data = torch.FloatTensor([1, 0, 0, 0, 1, 0])

# Spatial transformer network forward function
def stn(self, x):
    xs = self.localization(x)
    xs = xs.view(-1, 10 * 3 * 3)
    theta = self.fc_loc(xs)
    theta = theta.view(-1, 2, 3)

    grid = F.affine_grid(theta, x.size())
    x = F.grid_sample(x, grid)
    return x

What’s the .size() of the thing you’re passing into stn?

Well, my input has the size (1, 1, 100, 100).

Does that information help at all?

I’m not sure what exactly in the tutorial poses a size restriction, but the input image to the network has to be 28 by 28. You could transform your 1x1x100x100 image to 1x1x28x28 and then send it in.
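
That said, as far as I can tell the restriction comes from nn.Linear(10 * 3 * 3, 32) in fc_loc: with a 28x28 input the localization stack ends in a 10x3x3 feature map, but with a 100x100 input it ends in 10x21x21, so xs.view(-1, 10 * 3 * 3) reshapes those 4410 values into 49 rows and theta comes out with batch size 49, which matches the "got 1, 49" in your error. A quick sketch of that arithmetic (conv_out and pool_out are just throwaway helpers, assuming the layer sizes from your code):

def conv_out(size, kernel):   # valid convolution, stride 1
    return size - kernel + 1

def pool_out(size):           # 2x2 max pooling, stride 2
    return size // 2

s = pool_out(conv_out(100, 7))   # 100 -> 94 -> 47
s = pool_out(conv_out(s, 5))     # 47  -> 43 -> 21
print(10 * s * s)                # 4410, and 4410 / 90 = 49 rows after the view

s = pool_out(conv_out(28, 7))    # 28 -> 22 -> 11
s = pool_out(conv_out(s, 5))     # 11 -> 7  -> 3
print(10 * s * s)                # 90, i.e. exactly one row of 10 * 3 * 3 features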

How would I reshape them, through numpy? Reshaping the actual matrix is impossible unless I reshape the actual images; is that what you're referring to?

http://pytorch.org/docs/master/torchvision/transforms.html?highlight=torchvision

Using torchvision, you can convert the tensor to a PIL image with torchvision.transforms.ToPILImage, then resize the image with torchvision.transforms.Resize, and then bring it back to a tensor with torchvision.transforms.ToTensor.
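
A minimal sketch of that pipeline (assuming a single 1x1x100x100 float tensor called img as a stand-in for your input; ToPILImage expects a CxHxW tensor, so the batch dimension is squeezed out and added back afterwards):

import torch
from torchvision import transforms

resize = transforms.Compose([
    transforms.ToPILImage(),      # (C, H, W) tensor -> PIL image
    transforms.Resize((28, 28)),  # 100x100 -> 28x28
    transforms.ToTensor(),        # PIL image -> (C, H, W) tensor
])

img = torch.rand(1, 1, 100, 100)             # placeholder for your input
small = resize(img.squeeze(0)).unsqueeze(0)  # shape (1, 1, 28, 28)

Alternatively, you can put Resize into the transform pipeline of your dataset/DataLoader so the images already arrive at the network as 28x28.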