Confused about constant loss and metrics

Context

I have been training a network using VGG16 and the simple datasets from Caltech101.

I added a simple regression head to the headless VGG16 and trained the network to output a bounding box around the objects.
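For reference, a minimal sketch of that kind of setup, assuming Keras with ImageNet weights; the layer sizes, the frozen backbone, and the sigmoid output (normalized box coordinates) are my own placeholder choices, not the exact code I used:

```python
import tensorflow as tf

# Headless VGG16 backbone (include_top=False) with a small regression head.
# Layer sizes and the frozen backbone are illustrative assumptions.
backbone = tf.keras.applications.VGG16(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3)
)
backbone.trainable = False  # keep the pretrained features fixed

x = tf.keras.layers.Flatten()(backbone.output)
x = tf.keras.layers.Dense(128, activation="relu")(x)
# Four outputs: normalized (x1, y1, x2, y2) of the bounding box.
box = tf.keras.layers.Dense(4, activation="sigmoid")(x)

model = tf.keras.Model(inputs=backbone.input, outputs=box)
model.compile(optimizer="adam", loss="mse")
```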

For both the “stop signs” and “airplanes” datasets it draws a very accurate bounding box.

The net also works well if I train it for binary classification between “stop signs” and “airplanes”, so I am assuming the code is correct.

Problem

My next step was to download a barcode dataset from Roboflow, clean up the data and run it again.

However, I get a constant accuracy and loss for the evaluation set, and for the training set as well.

I am confused about what the reason for this could be, assuming the code is correct.
It has been frustrating, and I wonder if anyone with more knowledge could offer some advice.

Barcodes, if properly aligned, could likely be handled best with just one layer, or with Conv1d layers. They only vary in the horizontal dimension, whereas CNNs like VGG are trained to detect 2-dimensional features.

Additionally, pooling layers may effectively remove important information from a barcode.

Lastly, if you do use Conv layers, a limited number of them would likely be ideal with barcodes.
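One way to read that 1-D suggestion, as a minimal sketch: collapse the grayscale image into a horizontal intensity profile and run a single Conv1D layer over it. The input size, filter counts, and the (x1, x2) output below are illustrative assumptions, not code from this thread:

```python
import tensorflow as tf

# Sketch of the "one Conv1D layer" idea: average the grayscale image over its
# rows to get a horizontal intensity profile, then apply a single Conv1D layer.
inputs = tf.keras.Input(shape=(256, 512, 1))                  # grayscale image
profile = tf.keras.layers.Lambda(
    lambda img: tf.reduce_mean(img, axis=1))(inputs)          # rows averaged -> (512, 1)
features = tf.keras.layers.Conv1D(16, kernel_size=9, activation="relu")(profile)
x = tf.keras.layers.GlobalMaxPooling1D()(features)
# Predict only the horizontal extent, since the vertical axis was averaged away.
outputs = tf.keras.layers.Dense(2, activation="sigmoid")(x)   # normalized (x1, x2)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
```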


when you say “just” one layer, do you mean no deep network? just a single dense layer?

in this case the barcodes (1D or 2D) are embedded in images though, I am not sure if you are talking about the same thing? I am trying to get the bounding box around it (one per image for now).

but I appreciate your input, I started the process of getting rid of the large NN and writing a custom smaller one.

maybe something like this would be better (rough sketch after the list)?

  1. grey input
  2. conv2d
  3. flatten
  4. regression
  5. output vector (x1, y1, x2, y2)
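Roughly like this, assuming Keras, an arbitrary 256x256 grayscale input, and normalized box coordinates (all placeholder choices on my part):

```python
import tensorflow as tf

# Minimal version of the list above: grayscale input -> conv2d -> flatten ->
# dense regression -> (x1, y1, x2, y2). Sizes and strides are placeholders;
# a strided convolution is used instead of pooling, given the earlier point
# that pooling may discard barcode detail.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(256, 256, 1)),                          # 1. grey input
    tf.keras.layers.Conv2D(16, 3, strides=2, activation="relu"),  # 2. conv2d
    tf.keras.layers.Flatten(),                                    # 3. flatten
    tf.keras.layers.Dense(64, activation="relu"),                 # 4. regression
    tf.keras.layers.Dense(4, activation="sigmoid"),               # 5. (x1, y1, x2, y2), normalized
])
model.compile(optimizer="adam", loss="mse")
```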

If you’re going to use kernels, you should make sure each kernel is at least as wide as each spacing sequence in the barcode.
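For example, something along these lines (the 1 x 31 width is an arbitrary illustration, not a recommendation for any particular symbology or resolution):

```python
import tensorflow as tf

# A kernel much wider than it is tall, so a single filter can span a whole
# bar/space sequence; the right width depends on the image resolution.
wide_conv = tf.keras.layers.Conv2D(filters=16, kernel_size=(1, 31),
                                   padding="same", activation="relu")
```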

Additionally, barcodes have different symbologies, which each reader must be adapted to.

You’ll probably get better results just feeding the model the numbers of the barcode, either tokenized, or as an image with a CNN plus something that tokenizes the barcode convention being used.

But I am not reading it, just finding the position (coordinates in the image). I don’t need the value in this case. I guess tokenizing is mostly for if I wanted the value, right?

VGG16 is just a CNN. To get the bounding box, feeding CNNs with labelled images is normally not enough.

You can use a sliding window, R-CNN, SSD with a VGG backbone, etc.
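For what it’s worth, a sliding-window pass over an existing binary “contains a barcode” classifier could look roughly like this; the window size, stride, and the `classifier` model are placeholders made up for the sketch:

```python
def best_window(image, classifier, win=128, stride=32):
    """Slide a fixed-size window over a 2-D grayscale NumPy image, score each
    crop with a binary classifier, and return the best-scoring box.
    Window size, stride, and the classifier are illustrative placeholders."""
    h, w = image.shape[:2]
    best_score, best_box = -1.0, None
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            crop = image[y:y + win, x:x + win]
            # assumes a Keras-style classifier taking a (1, win, win, 1) batch
            score = float(classifier.predict(crop[None, ..., None], verbose=0)[0, 0])
            if score > best_score:
                best_score, best_box = score, (x, y, x + win, y + win)
    return best_box, best_score
```

A dedicated detector like SSD or Faster R-CNN does this far more efficiently, but the sliding-window version is easy to bolt onto a classifier you already trust.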

I ended up using YOLOv7, which worked great.