PyTorch 1.0 - How to predict single images - MNIST example?

I followed the MNIST example github_pytorch_mnist_example

I kept almost everything the same…

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1 = nn.Linear(4*4*50, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4*4*50)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
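As a side note, the fc1 input size of 4*4*50 follows from the shape trace: 28 → 24 (conv1, 5×5 kernel) → 12 (pool) → 8 (conv2) → 4 (pool), with 50 channels after conv2. A minimal sanity check (the names net and dummy are just for illustration):

net = Net()
dummy = torch.randn(1, 1, 28, 28)  # [batch, channel, height, width]
print(net(dummy).shape)            # torch.Size([1, 10])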

The train and test functions also stayed (mostly) the same; I omitted args and instead only pass log_interval to the train function.
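For reference, the adapted train function might look like this (a sketch following the upstream example, with the args usage replaced by a log_interval parameter):

def train(model, device, train_loader, optimizer, epoch, log_interval):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)  # nll_loss matches the log_softmax output
        loss.backward()
        optimizer.step()
        if batch_idx % log_interval == 0:
            print('Train Epoch: {} [{}/{}]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset), loss.item()))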

Instead of using args, I initialized the needed values by direct assignment:

log_interval = 64
batch_size = 64
test_batch_size = 64

use_cuda = False  # not args.no_cuda and torch.cuda.is_available()
torch.manual_seed(1)  # args.seed
device = torch.device("cuda" if use_cuda else "cpu")
kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}

lr = 0.01
momentum = 0.5

Init the data loaders:

# init the train loader
# ...

# init the test loader
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./data', train=False,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=test_batch_size, shuffle=True, **kwargs
)
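For completeness, the elided train loader mirrors the test loader; in the upstream example it looks roughly like this (note train=True, download=True, and batch_size instead of test_batch_size):

train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=batch_size, shuffle=True, **kwargs
)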

Train the model:

model = Net().to(device)
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)

epochs = 3
for epoch in range(1, epochs + 1):
    train(model, device, train_loader, optimizer, epoch, log_interval)
    test(model, device, test_loader)
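For completeness, a matching sketch of the test function (again following the upstream example, with the args usage removed):

def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.max(1, keepdim=True)[1]  # index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)
    print('Test set: Average loss: {:.4f}, Accuracy: {}/{}'.format(
        test_loss, correct, len(test_loader.dataset)))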

And this works fine so far: the model trains and produces good output.

But this is where the example ends.

My aim is to create an MNIST example from zero to production.

For this, the next thing I need to know is how to predict a single image.
I did not find documentation on that topic.

I tried this (which, as far as I remember, worked in PyTorch 0.4):

single_loaded_img = test_loader.dataset.data[0]
single_loaded_img = single_loaded_img.to(device)
single_loaded_img = Variable(single_loaded_img)
out_predict = model(single_loaded_img)

but this returned:

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [20, 1, 5, 5], but got 2-dimensional input of size [28, 28] instead

I kind of understand that the model expects batched data?
But I have no clue how to transform the image so that the model can use it for a prediction.

I would like the solution, but even more I would like to understand what I have to do and what knowledge I am missing…

You are correct in your assumption about the missing batch dimension.
Even a single sample should contain a batch dimension with a size of 1.
Additionally to this, since you’re dealing with grayscale images (single channel), the channel dimension is also missing.
The input to nn.Conv2d should have the shape [batch_dimension, channel, height, width].

In your case you could call unsqueeze twice to add the missing dimensions, or add these dimensions with a None index:

single_loaded_img = single_loaded_img[None, None]
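Indexing with None adds a new dimension of size 1 at that position; it is equivalent to calling unsqueeze twice and turns a [28, 28] tensor into [1, 1, 28, 28]:

single_loaded_img = single_loaded_img.unsqueeze(0).unsqueeze(0)  # same result as [None, None]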

Also, Variables are deprecated. If you are using a newer version (> 0.3.1), you can just remove the Variable wrapper.

Thanks for the answer.

I updated my code to:

single_loaded_img = test_loader.dataset.data[0]
single_loaded_img = single_loaded_img.to(device)
single_loaded_img = single_loaded_img[None, None]

out_predict = model(single_loaded_img)

this produced the following error:

RuntimeError: _thnn_conv2d_forward is not implemented for type torch.ByteTensor

So I tried to follow this thread (the images in test_loader.dataset.data are stored as uint8, i.e. a ByteTensor, which the convolution cannot handle) and changed my code to:

single_loaded_img = test_loader.dataset.data[0]
single_loaded_img = single_loaded_img.to(device)
single_loaded_img = single_loaded_img[None, None]
single_loaded_img = single_loaded_img.type('torch.DoubleTensor')

out_predict = model(single_loaded_img)

but this returned:

RuntimeError: Expected object of scalar type Double but got scalar type Float for argument #2 'weight'

So I figured, since the model's parameters are float32 by default, I had to do this:

single_loaded_img = test_loader.dataset.data[0]
single_loaded_img = single_loaded_img.to(device)
single_loaded_img = single_loaded_img[None, None]
single_loaded_img = single_loaded_img.type('torch.FloatTensor') # instead of DoubleTensor

out_predict = model(single_loaded_img)

And finally:

print(out_predict)
pred = out_predict.max(1, keepdim=True)[1]
print(pred)

It’s working. The output looks a bit wrapped, tensor([[7]]), but OK.
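As an aside, .float() is a shorter equivalent of .type('torch.FloatTensor'), and for inference it is common practice to put the model into eval mode and disable gradient tracking. A condensed sketch (img is just a shorter name for illustration):

model.eval()
with torch.no_grad():
    img = test_loader.dataset.data[0].to(device)[None, None].float()
    out_predict = model(img)
    pred = out_predict.argmax(dim=1, keepdim=True)  # index of the highest log-probability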

I still wonder, though, because you wrote that the input to nn.Conv2d should have the shape [batch_dimension, channel, height, width]:

Where can I find / see / read this? The docs of nn.Conv2d only say that in_channels is an int:

class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)

and

  • in_channels (int) – Number of channels in the input image

Good to hear it’s working!
You can find the input and output shape information in the docs, in the “Shape” section.
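For example, the “Shape” section of the nn.Conv2d docs lists something like:

Input: (N, C_in, H_in, W_in)
Output: (N, C_out, H_out, W_out)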