How to Classify Single Image using Loaded Net

Hi,

I trained, saved, and can load a resnet50 net, but I'm not sure how to feed in a single image for classification using the loaded net.

I’ve tried using the dataloader function, but I think that is more appropriate for testing a group of images that already have labels, whereas a single image that you are trying to classify obviously would not have a label.

If you could point me to the right function to use or to a general methodology to follow, I would greatly appreciate it.

Sorry if this is a bit of a basic question, but for some reason I could not find much online to guide me on this.

Thanks

I use something like this. I think it’s a mashup of code from various tutorials/examples so my apologies if I should be crediting someone…

This uses PIL and torchvision.transforms:

from PIL import Image
from torch.autograd import Variable
import torchvision.transforms as transforms

imsize = 256
# note: transforms.Scale was renamed to transforms.Resize in newer torchvision
loader = transforms.Compose([transforms.Scale(imsize), transforms.ToTensor()])

def image_loader(image_name):
    """load image, returns cuda tensor"""
    image = Image.open(image_name)
    image = loader(image).float()
    image = Variable(image, requires_grad=True)  # requires_grad isn't needed for inference, but is harmless
    image = image.unsqueeze(0)  # this is for VGG, may not be needed for ResNet
    return image.cuda()  # assumes that you're using GPU

image = image_loader(PATH_TO_IMAGE)  # a string path to your image file

your_trained_net(image)

hope that helps

Awesome, thank you so much. That worked.

Hi again,

One more quick question.
So I am able to classify the image using the code you listed above, but my output is the following:

Variable containing:
0
[torch.LongTensor of size 1]

How do I extract the value of this variable and convert it to, say, an int?
I want to use that int to print out the string description of the category it belongs to.
As I understand it, the value in this variable is the index of the class (folder) that it predicts the image belongs to?

Many thanks

You can extract the tensor present in a Variable by accessing the .data field.
Then you can get the value by simply indexing it. Here is an example:

variable = Variable(torch.rand(1))
num = variable.data[0]

Great, but once I have extracted the LongTensor, how do I convert it to an int?

Thanks

Standard casting:

int(variable.data[0])

That gives an error on PyTorch 0.1.11.

I think the prediction is a 2-D tensor, so variable.data[0] is still a LongTensor, not an int. You should use predict.data[0][0] if batch_size is 1, as in the sketch below.
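For example, a minimal sketch (on older PyTorch versions such as 0.1.x, where reductions like torch.max keep the reduced dimension):

# output is the 1 x num_classes Variable returned by the net
_, predict = torch.max(output, 1)  # predict has shape 1 x 1 here
class_index = predict.data[0][0]   # index into the row, then the column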

One last thing: I wanted to confirm this is in fact correct.

As I understand it, the value of the variable mentioned above is the index of the class (folder) that it predicts the image belongs to?
So, that int is the index of the class name (where the names of the classes are in alphabetical order)?

Yes, provided the second argument to torch.max is looking at the correct dimension of the output Variable. Which yours should be, if you're using a pretrained resnet.

int(predicted.data[0])

will cast the index of the maximum value in the tensor contained in the network's output Variable to an integer. Though, in this case, you don't need the cast to int: indexing to a specific location in a LongTensor already returns an int.
Sorry, I know I suggested using that before; I didn't understand what you were asking :no_mouth:
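Putting it together, a minimal sketch (on PyTorch >= 0.2, where reductions drop the reduced dimension; train_dataset here stands for the torchvision.datasets.ImageFolder you trained on, which sorts class folders alphabetically):

output = your_trained_net(image)
_, predicted = torch.max(output.data, 1)
class_index = int(predicted[0])            # index of the predicted class
print(train_dataset.classes[class_index])  # folder / class name at that index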

I keep getting an error "Expected 3D tensor" when trying to classify an image with a model saved by @smth's ImageNet example. What am I doing wrong? The model was trained on GPU with CUDA and saved in the pth.tar format.

# Bunch of imports go here

# Convert image to Variable
def Torchify( aImage ):
    ptLoader = transforms.Compose([transforms.ToTensor()])
    aImage = ptLoader( aImage ).float()
    aImage = Variable( aImage, volatile=True  )
    return aImage.cuda()

# Load model from Checkpoint
print("=> Loading Network")
ptModelAxial = densenet.__dict__['densenet161'](pretrained=False, num_classes=5)
ptModelAxial.classifier = nn.Linear(8832, 5)
ptModelAxial = torch.nn.DataParallel(ptModelAxial).cuda()
dTemp = torch.load("best.pth.tar")
ptModelAxial.load_state_dict(dTemp['state_dict'])
for p in ptModelAxial.parameters():
    p.requires_grad = False
ptModelAxial.eval()

InputImg = skimage.img_as_float(skimage.io.imread(sFileName))
ptModelPreds = ptModelAxial( Torchify(InputImg) )
print( ptModelPreds )

Error:

Traceback (most recent call last):
  File "extract.py", line 298
      ptModelPreds = ptModelAxial( Torchify(InputImg) )
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 61, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 71, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs)
  File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 45, in parallel_apply
    raise output
  File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 25, in _worker
    output = module(*input, **kwargs)
  File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/keyur/kaggle/densenet.py", line 153, in forward
    features = self.features(x)
  File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/modules/container.py", line 64, in forward
    input = module(input)
  File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/modules/conv.py", line 237, in forward
    self.padding, self.dilation, self.groups)
  File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/functional.py", line 39, in conv2d
    return f(input, weight, bias)
RuntimeError: expected 3D tensor

Send in a tensor of shape 1 x C x H x W. Even for a single image, you have to include the mini-batch dimension, as in the sketch below.
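Applied to the Torchify function above, that would be (a minimal sketch):

def Torchify( aImage ):
    ptLoader = transforms.Compose([transforms.ToTensor()])
    aImage = ptLoader( aImage ).float()
    aImage = Variable( aImage, volatile=True )
    aImage = aImage.unsqueeze(0)  # add the mini-batch dimension: C x H x W -> 1 x C x H x W
    return aImage.cuda()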

Thanks for the answer, it worked. But yes, image.unsqueeze(0) is still needed for ResNet; otherwise you might get this error: RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0

So I have a couple of questions here:

  1. I am not sure why I receive an error when I use the line image = loader(image).float(). Do you have any idea? If I don't use it, it works.

     Here is the error that I get:
     RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same

  2. When I do the classification, e.g. resnet = models.resnet18(pretrained=True);
     predicted = resnet(inputVar), and plot the predicted values, the min and max values are -2.3611 and 2.4880, respectively.
     However, shouldn't the min and max values be between 0 and 1?

  1. One of them (model or input) is on the CPU and the other is on the GPU.
     Move both to the GPU using the model = model.cuda() and image = image.cuda() commands.

  2. The outputs we get from pretrained models are unnormalized linear layer outputs (logits);
     you can use F.softmax() to convert them to probabilities, as in the sketch below.
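For example, a minimal sketch (assuming a PyTorch version where F.softmax takes a dim argument):

import torch.nn.functional as F

logits = resnet(inputVar)           # raw, unnormalized class scores (logits)
probs = F.softmax(logits, dim=1)    # now in [0, 1] and summing to 1 per sample
print(probs.min(), probs.max())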

But why do I still get an error? The code is as follows:

loader = transforms.Compose([transforms.Scale(311),
                             transforms.CenterCrop(299),
                             transforms.ToTensor()])

def image_loader(image_name):
    """load image, returns cuda tensor"""
    image = Image.open(image_name)
    image = loader(image).float()
    image = Variable(image, requires_grad=True)
    image = image.unsqueeze(0)  # this is for VGG, may not be needed for ResNet
    return image
    # return image.cuda()  # assumes that you're using GPU

image = image_loader('test.jpg')
print(image.shape)
model = models.inception_v3(pretrained=True)
model(image)

The image shape is torch.Size([1, 3, 299, 299]), but I get this error:
Expected more than 1 value per channel when training, got input size [1, 768, 1, 1]

Could you try to set model.eval() and run it again?
Your model is probably still in training mode, so a BatchNorm layer needs more than one value per channel to calculate the batch statistics.
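That is, a minimal sketch:

model = models.inception_v3(pretrained=True)
model.eval()  # switch BatchNorm (and Dropout) layers to evaluation mode

image = image_loader('test.jpg')
output = model(image)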

Sorry for the late reply, thanks!

I found some weird behavior here (possibly a bug). Any advice?

Common code:

from PIL import Image
from torch.autograd import Variable
import torchvision
import torchvision.transforms as transforms

normalize = transforms.Normalize(
    mean=[0.485, 0.456, 0.406],
    std=[0.229, 0.224, 0.225]
)
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize
    # transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

resnet_model = torchvision.models.resnet50(pretrained=True, num_classes=1000)
resnet_model.eval()

for i in range(4):
    # load and preprocess the i-th test image
    img_pil = Image.open("/home/alejandro/workspace/uav_detection/images/" + str(i + 1) + ".jpg")
    # img_pil.show()
    img_tensor = preprocess(img_pil).float()
    img_tensor = img_tensor.unsqueeze_(0)

    fc_out = resnet_model(Variable(img_tensor))

    output = fc_out.detach().numpy()
    print(output.argmax())

Results with Python 2.7 Anaconda and pytorch 1.0.0a0+37627a1 (built from source):

918
918
918
918

Results with Python 2.7 Anaconda and pytorch 0.4.1.post2 (downloaded with pip):

834
208
285
478