Own images formatted like MNIST

I trained a network to recognize characters from the EMNIST dataset, and that works fine.
Now I want to use my own images as input. Does anybody know a good way to do that?

I tried this code but the results are wrong:

import torch
from PIL import Image
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize((28, 28)),
    transforms.ToTensor(),
])

img = Image.open(picture_name)
img_tensor = transform(img)
img_tensor_array = img_tensor.unsqueeze(0)   # add the batch dimension
#pil_image = transforms.ToPILImage(mode='L')(img_tensor)
with torch.no_grad():
    data = img_tensor_array.cuda()
    out = model(data)
print(out.max(1, keepdim=True)[1])      # print the result

Thanks for all useful answers!!!

You have to make sure that the distribution of the test data is similar to that of the training data.

  • First, make sure that the sequence of transformations applied to the train and test data results in the same range of values. For example, it may be that in the training data the background is denoted as 0 and character pixels have non-zero values, but in the test data it’s the opposite.
    Another thing (which does not seem to be the case based on your code): if you apply transforms.Normalize(mean=0.5, std=0.5) to only one of the two sets, then those input tensors will be in the range [-1, 1] while the others stay in the range [0, 1].

  • Also, the images in the test set might have a completely different background, or a different scale of the characters relative to the image size, and so on. Any of these can be the reason why the model does not work well on the test set. So, first visualize some samples from the train and test sets, see if they are distinguishable from each other, and try to make them as similar as possible. For example, you can apply thresholding (https://en.wikipedia.org/wiki/Thresholding_(image_processing)) to remove the background noise if that’s necessary.
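To make the two points above concrete, here is a minimal sketch of how a grayscale tensor could be brought closer to the MNIST convention (dark background, bright strokes). The function name match_mnist_style, the 0.5 threshold, and the "mostly bright means light background" heuristic are assumptions for illustration, not something from your code:

```python
import torch

def match_mnist_style(x: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """x: grayscale image tensor with values in [0, 1] (hypothetical helper)."""
    # Heuristic (assumption): if most pixels are bright, the background is
    # light, so invert to get a dark background with bright strokes.
    if x.mean() > 0.5:
        x = 1.0 - x
    # Simple thresholding: set faint pixels to 0, keep stroke intensities.
    return torch.where(x < threshold, torch.zeros_like(x), x)

# Synthetic example: white page (1.0), dark stroke (0.1), one noisy pixel (0.9)
page = torch.ones(1, 28, 28)
page[0, 10:18, 13:15] = 0.1
page[0, 0, 0] = 0.9
cleaned = match_mnist_style(page)
```

A fixed threshold is the crudest option; for photos with uneven lighting you would probably need something adaptive.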

Thank you very much for your answer. I will try to visualize it with matplotlib!

How does the normalize function work? Because I do not understand how the parameters change the tensors.

Another big problem for me is the dimensions of the images.
Do you know any piece of code to turn an RGB picture (or any other format) into a 1-channel image?

Thanks, tris_b


The normalize function will basically subtract the mean from the input image, and then divide by the std. So, if X is the image tensor, then normalize will return X_norm = (X-mean)/std.

So, for visualization, you have to reverse this operation:

X_rev = X_norm * std + mean
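As a small worked example of the two formulas above (written with plain tensor arithmetic rather than transforms.Normalize, and with toy values that I picked for illustration):

```python
import torch

# Toy "image" with pixel values already in [0, 1]
X = torch.tensor([[0.0, 0.25],
                  [0.5, 1.0]])
mean, std = 0.5, 0.5

# Forward: subtract the mean, divide by the std; [0, 1] maps to [-1, 1]
X_norm = (X - mean) / std

# Reverse: multiply by the std, add the mean back (e.g. before plotting)
X_rev = X_norm * std + mean

print(X_norm)
print(torch.allclose(X_rev, X))
```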

For converting an RGB image to a 1-channel image (monochrome), there are multiple ways:

  • Use only one of the channels in the RGB image, for example the red channel
  • Use the average of the three channels
  • Use a weighted sum of the input channels, with weights [0.2989, 0.5870, 0.1140]
  • Use the torchvision function torchvision.transforms.functional.to_grayscale(img) (if you are dealing with a PIL image, not a tensor)
## img is a PIL image, and X is the corresponding tensor of shape (3, H, W)
import torch

## method 1: take the red channel (result has shape (H, W))
X_gray = X[0, :, :]

## method 2: take the average of all three input channels
X_gray = torch.mean(X, dim=0, keepdim=True)

## method 3: weighted sum of input channels
weights = torch.tensor([0.2989, 0.5870, 0.1140])
X_gray = torch.sum(weights.view(3, 1, 1) * X, dim=0, keepdim=True)

## method 4: using the torchvision function to_grayscale (PIL image in, PIL image out)
from torchvision.transforms import functional as TF
img_gray = TF.to_grayscale(img)
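If it helps, the grayscale step can be combined with the resize and the batch dimension in one function. This is only a sketch: prepare_input is a hypothetical helper name, it assumes X is a float RGB tensor in [0, 1], and it resizes with torch.nn.functional.interpolate instead of transforms.Resize:

```python
import torch
import torch.nn.functional as F

def prepare_input(X: torch.Tensor, size: int = 28) -> torch.Tensor:
    """X: RGB tensor of shape (3, H, W) with float values in [0, 1]."""
    # Weighted sum of the channels (method 3 above) -> shape (1, H, W)
    weights = torch.tensor([0.2989, 0.5870, 0.1140]).view(3, 1, 1)
    gray = torch.sum(weights * X, dim=0, keepdim=True)
    # Add the batch dimension -> shape (1, 1, H, W)
    batch = gray.unsqueeze(0)
    # Resize to size x size with bilinear interpolation
    return F.interpolate(batch, size=(size, size), mode="bilinear",
                         align_corners=False)

X = torch.rand(3, 64, 48)   # random stand-in for a real picture
inp = prepare_input(X)
print(inp.shape)
```

The result has the (batch, channel, height, width) layout that a model trained on 28x28 single-channel images would normally expect; whether the values also need Normalize depends on what was used during training.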

OK, that’s rather easy, but what if I have values between 0 and 255? I plugged in X = 255 and X = 0 and required X_norm to be 1 and 0, respectively. Then I solved the two equations and got 0 for mean and 255 for std. Can that be right?

Thank you very much! The grayscale works perfectly!

I see what you did. If you assign mean=0 and std=255, you are just scaling the range of pixel intensities to be within [0, 1]. That can also be done simply with X_norm = X/255.0.

But also note that if you have a PIL image and use the function torchvision.transforms.functional.to_tensor(), the result will already be in the range [0, 1]. In that case, make sure that you do not divide by 255.0 again; otherwise the pixel values will be really small, within the range [0, 0.003921].
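A quick way to see the effect of dividing twice, using a hand-made uint8 tensor as a stand-in for raw pixel values (the variable names are just for illustration):

```python
import torch

# Raw 8-bit pixel values, as you would get before any scaling
raw = torch.tensor([[0, 128, 255]], dtype=torch.uint8)

# Scaling once gives the [0, 1] range
scaled_once = raw.float() / 255.0

# Scaling a second time by mistake squeezes everything into [0, 1/255]
scaled_twice = scaled_once / 255.0

print(scaled_once.max().item())
print(scaled_twice.max().item())
```

Printing the min and max of the tensor right before it goes into the model is a cheap check that the range is what you expect.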

Got it.
Thank you very much for your time. You helped me a lot!

Sure, I am happy to help!