Error for grid_generator in spatial transformer

    result = self.forward(*input, **kwargs)
  File "test.py", line 141, in forward
    x = self.stn(x)
  File "test.py", line 135, in stn
    grid = F.affine_grid(theta, x.size())
  File "/home/sohrab/anaconda3/lib/python3.5/site-packages/torch/nn/functional.py", line 1527, in affine_grid
    return AffineGridGenerator.apply(theta, size)
  File "/home/sohrab/anaconda3/lib/python3.5/site-packages/torch/nn/_functions/vision.py", line 98, in forward
    grid = torch.bmm(base_grid.view(N, H * W, 3), theta.transpose(1, 2))
RuntimeError: invalid argument 2: equal number of batches expected, got 1, 49 at /home/sohrab/pytorch/aten/src/TH/generic/THTensorMath.c:1634

I am trying to feed my data into the network but I keep getting this error. The dimensions of my images are (1, 1, 100, 100).
I'm not sure what to do; any help would be incredibly appreciated.

Could you provide more information about how you’re calling affine_grid? The documentation suggests that the first argument should have size (N, 2, 3) and the second argument should be 4-dimensional with size (N, C, H, W).
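
For reference, a minimal call that satisfies those shapes could look like the sketch below (an identity transform and made-up sizes, just to illustrate the expected shapes; depending on your PyTorch version you may need the Variable wrapper):

import torch
from torch.autograd import Variable
import torch.nn.functional as F

N, C, H, W = 1, 1, 100, 100
# one 2x3 affine matrix per batch element -> theta has shape (N, 2, 3)
theta = Variable(torch.Tensor([[[1, 0, 0], [0, 1, 0]]]))
# size should be a torch.Size of the form (N, C, H, W)
grid = F.affine_grid(theta, torch.Size((N, C, H, W)))  # grid: (N, H, W, 2)

Both N's have to match, which is what the batch check in your error is complaining about.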

So affine_grid is the spatial transformer grid generator in the PyTorch library; the error comes from the vision.py file on line 98, as indicated in the traceback:

def forward(ctx, theta, size):
    assert type(size) == torch.Size
    N, C, H, W = size
    ctx.size = size
    if theta.is_cuda:
        ctx.is_cuda = True
        AffineGridGenerator._enforce_cudnn(theta)
        grid = theta.new(N, H, W, 2)
        theta = theta.contiguous()
        torch._C._cudnn_affine_grid_generator_forward(theta, grid, N, C, H, W)
    else:
        ctx.is_cuda = False
        base_grid = theta.new(N, H, W, 3)
        linear_points = torch.linspace(-1, 1, W) if W > 1 else torch.Tensor([-1])
        base_grid[:, :, :, 0] = torch.ger(torch.ones(H), linear_points).expand_as(base_grid[:, :, :, 0])
        linear_points = torch.linspace(-1, 1, H) if H > 1 else torch.Tensor([-1])
        base_grid[:, :, :, 1] = torch.ger(linear_points, torch.ones(W)).expand_as(base_grid[:, :, :, 1])
        base_grid[:, :, :, 2] = 1
        ctx.base_grid = base_grid
        grid = torch.bmm(base_grid.view(N, H * W, 3), theta.transpose(1, 2))
        grid = grid.view(N, H, W, 2)
    return grid

This is what is contained in that file. I hope this helps; thank you so very much for your help.

What arguments are you passing into F.affine_grid? affine_grid takes two arguments: F.affine_grid(theta, size). The error is suggesting that theta.size()[0] should be equal to 1, but is actually equal to 49 (or the other way around).
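
If it helps, a quick debugging sketch (hypothetical print statements dropped into your stn function right before the failing call) would show which side carries the 49:

print(theta.size())  # should be (N, 2, 3), e.g. torch.Size([1, 2, 3]) for a single image
print(x.size())      # should be (N, C, H, W), e.g. torch.Size([1, 1, 100, 100])
grid = F.affine_grid(theta, x.size())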

I'm passing theta, x.size().
x.size() has the dimensions [1, 1, 100, 100], and theta is created using the following script:

    self.localization = nn.Sequential(
        nn.Conv2d(1, 8, kernel_size=7),
        nn.MaxPool2d(2, stride=2),
        nn.ReLU(True),
        nn.Conv2d(8, 10, kernel_size=5),
        nn.MaxPool2d(2, stride=2),
        nn.ReLU(True)
    )

    # Regressor for the 3 * 2 affine matrix
    self.fc_loc = nn.Sequential(
        nn.Linear(10 * 3 * 3, 32),
        nn.ReLU(True),
        nn.Linear(32, 3 * 2)
    )

    # Initialize the weights/bias with identity transformation
    self.fc_loc[2].weight.data.fill_(0)
    self.fc_loc[2].bias.data = torch.FloatTensor([1, 0, 0, 0, 1, 0])

# Spatial transformer network forward function
def stn(self, x):
    xs = self.localization(x)
    xs = xs.view(-1, 10 * 3 * 3)
    theta = self.fc_loc(xs)
    theta = theta.view(-1, 2, 3)

I obtained the script straight from the PyTorch implementation of the spatial transformer:

http://pytorch.org/tutorials/intermediate/spatial_transformer_tutorial.html

Here is a more complete version of the code:

def __init__(self):
    super(Net, self).__init__()
    self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
    self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
    self.conv2_drop = nn.Dropout2d()
    self.fc1 = nn.Linear(320, 50)
    self.fc2 = nn.Linear(50, 10)

    # Spatial transformer localization-network
    self.localization = nn.Sequential(
        nn.Conv2d(1, 8, kernel_size=7),
        nn.MaxPool2d(2, stride=2),
        nn.ReLU(True),
        nn.Conv2d(8, 10, kernel_size=5),
        nn.MaxPool2d(2, stride=2),
        nn.ReLU(True)
    )

    # Regressor for the 3 * 2 affine matrix
    self.fc_loc = nn.Sequential(
        nn.Linear(10 * 3 * 3, 32),
        nn.ReLU(True),
        nn.Linear(32, 3 * 2)
    )

    # Initialize the weights/bias with identity transformation
    self.fc_loc[2].weight.data.fill_(0)
    self.fc_loc[2].bias.data = torch.FloatTensor([1, 0, 0, 0, 1, 0])

# Spatial transformer network forward function
def stn(self, x):
    xs = self.localization(x)
    xs = xs.view(-1, 10 * 3 * 3)
    theta = self.fc_loc(xs)
    theta = theta.view(-1, 2, 3)

    grid = F.affine_grid(theta, x.size())
    x = F.grid_sample(x, grid)
    return x

What’s the .size() of the thing you’re passing into stn?

Well, my input has the size (1, 1, 100, 100).

Does that information help at all?

I’m not sure what exactly in the tutorial poses a size restriction, but the input image to the network has to be 28 by 28. You could transform your 1x1x100x100 image to 1x1x28x28 and then send it in.
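
That said, as far as I can tell the restriction comes from nn.Linear(10 * 3 * 3, 32) in fc_loc: with a 28x28 input the localization stack ends in a 10x3x3 feature map, but with a 100x100 input it ends in 10x21x21, so xs.view(-1, 10 * 3 * 3) reshapes those 4410 values into 49 rows and theta comes out with batch size 49, which matches the "got 1, 49" in your error. A quick sketch of that arithmetic (conv_out and pool_out are just throwaway helpers, assuming the layer sizes from your code):

def conv_out(size, kernel):   # valid convolution, stride 1
    return size - kernel + 1

def pool_out(size):           # 2x2 max pooling, stride 2
    return size // 2

s = pool_out(conv_out(100, 7))   # 100 -> 94 -> 47
s = pool_out(conv_out(s, 5))     # 47  -> 43 -> 21
print(10 * s * s)                # 4410, and 4410 / 90 = 49 rows after the view

s = pool_out(conv_out(28, 7))    # 28 -> 22 -> 11
s = pool_out(conv_out(s, 5))     # 11 -> 7  -> 3
print(10 * s * s)                # 90, i.e. exactly one row of 10 * 3 * 3 features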

How would I reshape them, through numpy? Reshaping the actual matrix is impossible unless I reshape the actual images; is that what you're referring to?

http://pytorch.org/docs/master/torchvision/transforms.html?highlight=torchvision

Using torchvision, you can convert the tensor to a PIL image with torchvision.transforms.ToPILImage, then resize the image with torchvision.transforms.Resize, and then bring it back to a tensor with torchvision.transforms.ToTensor.
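
A minimal sketch of that pipeline (assuming a single 1x1x100x100 float tensor called img as a stand-in for your input; ToPILImage expects a CxHxW tensor, so the batch dimension is squeezed out and added back afterwards):

import torch
from torchvision import transforms

resize = transforms.Compose([
    transforms.ToPILImage(),      # (C, H, W) tensor -> PIL image
    transforms.Resize((28, 28)),  # 100x100 -> 28x28
    transforms.ToTensor(),        # PIL image -> (C, H, W) tensor
])

img = torch.rand(1, 1, 100, 100)             # placeholder for your input
small = resize(img.squeeze(0)).unsqueeze(0)  # shape (1, 1, 28, 28)

Alternatively, you can put Resize into the transform pipeline of your dataset/DataLoader so the images already arrive at the network as 28x28.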