Image rotation detection

I wanted to create a model that would detect whether an image is rotated by 0, 90, 180 or 270 degrees (i.e. classify it into one of 4 classes).

I took the following approach based on https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html :

  1. Took about 400 self-made photos, rescaled each so that its smaller dimension is 256 pixels, took a random 224x224 crop, applied a random rotation (which determines the label) and a random horizontal flip (“flop”, which does not affect the label). The dimensions are then swapped to channel-first order, and the pixel values are divided by 256 (to land in the 0-1 range) and normalized (to be centered around 0).
  2. Used resnet18 as the base model and added an fc layer with 4 outputs (a rough sketch of steps 1 and 2 follows this list).
  3. I am training the network for 25 epochs, which on a CPU is quite time consuming :slight_smile:
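For reference, a rough sketch of what steps 1 and 2 could look like with torchvision; the dataset handling, normalization constants, and helper names here are illustrative assumptions rather than the original code:

```python
import random
from torch import nn
from torchvision import models, transforms
from torchvision.transforms import functional as TF
from PIL import Image

# Step 1: rescale, crop, flip ("flop"), then rotate by a random multiple
# of 90 degrees; the rotation index (0-3) becomes the class label.
augment = transforms.Compose([
    transforms.Resize(256),              # smaller side -> 256 px
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),   # does not change the label
])

to_input = transforms.Compose([
    transforms.ToTensor(),               # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet stats (assumed)
                         std=[0.229, 0.224, 0.225]),
])

def make_sample(path):
    """Load one photo and return an (input tensor, rotation label) pair."""
    img = augment(Image.open(path).convert("RGB"))
    label = random.randint(0, 3)                # 0, 90, 180 or 270 degrees
    img = TF.rotate(img, angle=90 * label)
    return to_input(img), label

# Step 2: resnet18 backbone with a 4-way classification head.
model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 4)
```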

Unfortunately, while the training accuracy climbs slightly above 90%, the validation accuracy does not suggest the model is useful at all. I checked “by hand” how far the model’s output deviates from the expected one, and it seems the model barely depends on the input: the per-class output values are very similar across all validation images (e.g. -0.2, 0.6, -0.4, 0.1).
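A quick way to eyeball this (a sketch; `model` and `val_loader` stand for the trained network and the validation DataLoader):

```python
import torch

# Print the raw 4-class outputs for a few validation images to see
# whether they actually vary with the input.
model.eval()
with torch.no_grad():
    for i, (inputs, labels) in enumerate(val_loader):
        outputs = model(inputs)                        # shape: [batch, 4]
        for logits, label in zip(outputs, labels):
            print(f"label={label.item()}  outputs={[round(v, 2) for v in logits.tolist()]}")
        if i >= 2:   # a few batches are enough to spot near-constant outputs
            break
```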

Therefore I have the following questions:

  1. Do you think the approach makes sense for this scenario?
  2. Do you think I have enough training data?
  3. How can I debug my training/validation process? What can go wrong?

Hi Tomasz!

From your original 400 photos, how many derived photos
did you make? How large is your total dataset, that is, how
many total images do you have, original plus augmented?

How do you split your dataset into training and validation?

Do you purposely include all four rotations of an image in
your dataset (or all eight combinations of rotation and
reflection (“flop”))? (I would.)

Do you make a point of not putting augmented images derived
from the same original image in both the training and validation
datasets? (Note, depending on the details of which augmented
images went where this could arguably make your validation
accuracy come out either better or worse.)

I don’t know anything about resnet, but on the surface, this
seems reasonable.

To me such numbers look like a reasonably strong prediction that
your sample is in the second of your four classes (that is, the
class with the predicted logit of 0.6).

What do such numbers look like for typical training images?

This seems reasonable.

Quite possibly not. How many parameters are in your whole
network? How many are in your added fully-connected layer?

I’m speculating that resnet18 has millions of parameters.
Although you augment your data, you start with 400 original
samples (so maybe 200 original samples in your training set).
I don’t know for certain, but it feels like a lot of parameters
for about 200 samples.
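Counting them is quick to do (a sketch; it assumes the usual torchvision resnet18 where the added head is model.fc):

```python
total = sum(p.numel() for p in model.parameters())
head = sum(p.numel() for p in model.fc.parameters())
print(f"total parameters:    {total:,}")   # resnet18 is roughly 11 million
print(f"fc-layer parameters: {head:,}")    # 512 * 4 + 4 = 2,052 for a 4-way head
```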

Do you further train the whole network, or just the added layer?
If you are further training the whole network, I would suggest
trying just training the added layer. If your validation accuracy
is then in line with your training accuracy (even if neither is
good), that would tend to suggest that you don’t have enough
training data in comparison with the number of parameters
(that you are training) in your network.
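In PyTorch that amounts to something like the following (a sketch, assuming the added 4-way head is model.fc):

```python
from torch import optim

# Freeze the pretrained backbone; leave only the new head trainable.
for param in model.parameters():
    param.requires_grad = False
for param in model.fc.parameters():
    param.requires_grad = True

# Hand the optimizer only the parameters that still require gradients.
optimizer = optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=0.001,
    momentum=0.9,
)
```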

Well, you could try to double-check predictions for some of
both your training and validation images. Hopefully you are
using nearly exactly the same code for calculating your
training and validation accuracy. If you have two copies
of copy-pasted code, you’ve opened up a place for a typo
or other error to creep in.

One quick check would be to pump your training set through
your validation-accuracy code and see if you reproduce your
training accuracy.
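One way to make that check cheap is to route both splits through a single helper, so the accuracy code cannot diverge (a sketch; the loader names are illustrative):

```python
import torch

def accuracy(model, loader):
    """Fraction of correctly classified samples in `loader`."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, labels in loader:
            preds = model(inputs).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total

# Same function for both splits: a big gap points at the data pipeline,
# while an unexpectedly low training number points at the evaluation code.
print("train accuracy:", accuracy(model, train_loader))
print("val accuracy:  ", accuracy(model, val_loader))
```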

Good luck.

K. Frank

  1. For training I have 400 original files times 4 rotations times 2 (flop), which gives about 3200 pictures, times the possible random crops (256x256 -> 224x224 - around 1000?). Since the crops are random, not all of them will actually be used. For validation I have a separate set of 12 pictures - not much, but I think a good start.
  2. Yes all the rotations are included on purpose.
  3. The training and validation datasets are completely independent.
  4. Yes, the numbers do look like the second class is being predicted. The problem is that the output was very, very similar for all validation images - it didn’t seem to depend on the input much.
  5. I train the whole network. I am not sure that training just the last layer would be enough, as a resnet normally used for image classification should be roughly rotation invariant - quite the opposite of what I want to do here.
  6. Good point about pumping the training data through the validation pipeline. Thanks a lot.

While trying to simplify the validation code, I figured out that doing the rescaling externally with ImageMagick and removing the scaling code from my program fixes the issue - my validation accuracy is now in excess of 80%. Now I need to figure out what was wrong with image resizing using skimage.transform.resize…
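One thing worth checking (a guess, not a confirmed cause): skimage.transform.resize converts uint8 input to floating point in the 0-1 range, so dividing by 256 afterwards would squash the values to nearly zero. A minimal check:

```python
import numpy as np
from skimage import transform

img = np.random.randint(0, 256, (384, 512, 3), dtype=np.uint8)  # stand-in photo
resized = transform.resize(img, (256, 341))

print(img.dtype, img.min(), img.max())              # uint8 0 255
print(resized.dtype, resized.min(), resized.max())  # float64, roughly 0.0 .. 1.0

# If the pipeline then divides by 256 again, the inputs end up in
# [0, ~0.004] instead of [0, 1].
```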