Image classification with libtorch

Hello,

I'm entirely new to PyTorch and am doing the fast.ai course. My end goal is to create simple classification (and segmentation) CNNs and actually use them. At this point I'm experimenting with a very simple CNN and trying to run a classification from C++ to get a minimal happy-flow path, and I am stuck. The code below does produce something: the output tensor holds 10 values, one for each of the 10 output classes, but the predicted class is correct in only about 10% of cases. That is no better than rolling a die.

I have trouble finding a good example or tutorial on how to do this. I suspect that the conversion from image.mpData to the input tensor is not correct. At the moment I don't know for sure what the data should look like. Is it RGB, BGR, or are the three full color planes stacked? Is the first pixel top-left or bottom-left?

Also, is my conversion from int to float correct? Do I need more preprocessing, like an offset of -0.5 or normalisation?

If somebody can point out my mistakes or point me in the right direction, any help is appreciated.

Regards,
Richard

C++ code snippet for doing a classification.
image.mpData:

  • 3 bytes per pixel (1 byte per color)
  • order: RGB, RGB, RGB…
  • first byte is top-left
  • pitch/stride is 96 bytes
  • images are 32x32 pixels

void Load()
{
  std::ifstream is("cifar10.pt", std::ifstream::binary);
  mModule = torch::jit::load(is);
}

int32_t ForwardImage(image)
{
  torch::Tensor tensor_image = torch::from_blob(image.mpData, { 1, 3, image.mHeight, image.mWidth }, torch::kByte).clone();
  tensor_image = ((tensor_image.to(at::kFloat))/255);
  std::vector<torch::jit::IValue> inputs;
  inputs.emplace_back(tensor_image);

  torch::jit::IValue output = self->mModule->forward(inputs);
  torch::Tensor outputTensor = output.toTensor();
  torch::Tensor classification = outputTensor.argmax(1);
  int32_t classificationWinner = classification.item().toInt();
  return classificationWinner;
}

Training the network: a copy of the contents of my Jupyter notebook. Training ends with an error rate of 0.31, which is fine for now.

## Load
imageset = '/ImageSets/cifar-10'
data = ImageDataBunch.from_folder(imageset, valid='test', size=32)
print(data.classes)

## Train
learn = cnn_learner(data, models.resnet34, metrics=error_rate)
learn.fit_one_cycle(5)

## Save model as .pt
learn.export() 
modelTS = learn.model.cpu()
exampleTensor = torch.rand(1, 3, 32, 32)
traced_script_module = torch.jit.trace(modelTS, exampleTensor)
traced_script_module.save("cifar10.pt")

I have it working :)))))

It turned out I had the image data in the wrong order.

image.mpData should not be interleaved (RGB, RGB, RGB…). Instead it should hold the complete red plane first, then the green plane, and finally the blue plane.

E.g., if the image height and width are both 3 (very small, just for this example), the input byte blob should look like:
R0 R1 R2 R3 R4 R5 R6 R7 R8 G0 G1 G2 G3 G4 G5 G6 G7 G8 B0 B1 B2 B3 B4 B5 B6 B7 B8
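
The reordering from interleaved to planar data can be sketched in plain C++, without libtorch. This is only a minimal sketch: it assumes the source buffer is interleaved RGB with a row pitch in bytes, as described in the first post, and the function name InterleavedToPlanar and its parameters are hypothetical, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: converts interleaved RGB (R,G,B,R,G,B,...) into planar
// layout (all R values, then all G values, then all B values).
// `pitch` is the number of bytes per source row; it may be larger than
// width * 3 when rows are padded.
std::vector<std::uint8_t> InterleavedToPlanar(const std::uint8_t* src,
                                              std::size_t width,
                                              std::size_t height,
                                              std::size_t pitch)
{
  assert(src != nullptr);
  assert(pitch >= width * 3);

  const std::size_t planeSize = width * height;
  std::vector<std::uint8_t> planar(planeSize * 3);

  for (std::size_t y = 0; y < height; ++y) {
    const std::uint8_t* row = src + y * pitch;  // start of source row y
    for (std::size_t x = 0; x < width; ++x) {
      const std::size_t pixel = y * width + x;  // index inside one plane
      planar[0 * planeSize + pixel] = row[x * 3 + 0];  // red plane
      planar[1 * planeSize + pixel] = row[x * 3 + 1];  // green plane
      planar[2 * planeSize + pixel] = row[x * 3 + 2];  // blue plane
    }
  }
  return planar;
}
```

A buffer produced this way matches the { 1, 3, image.mHeight, image.mWidth } shape that the code in the first post passes to torch::from_blob.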

Oh, and (0,0) is the top-left of the image. The float values of the image should be in the range 0.0 to 1.0, so dividing by 255 is correct. No other preprocessing is needed for this model.
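
For completeness, the scaling to the 0.0 to 1.0 range can also be sketched in plain C++. Again this is only an illustration: the name ScaleToUnitRange is hypothetical, and the sketch assumes the bytes are already in the planar order described above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: maps byte values 0..255 to floats 0.0..1.0.
// The element order is left untouched, so planar input stays planar.
std::vector<float> ScaleToUnitRange(const std::vector<std::uint8_t>& bytes)
{
  std::vector<float> scaled(bytes.size());
  for (std::size_t i = 0; i < bytes.size(); ++i) {
    scaled[i] = static_cast<float>(bytes[i]) / 255.0f;
  }
  return scaled;
}
```

This mirrors what the `tensor_image.to(at::kFloat) / 255` line in the first post does on the tensor itself, so only one of the two is needed.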

Regards,
Richard