Caffe2: resuming training and Predictor issues


(Filip Jakab)

Hello,
I would really appreciate some help. I'm working on a project with a machine-learning component built on the Python version of Caffe2. It runs on Ubuntu 16.04, and I built Caffe2 from source. Everything set up and built without errors. After working through the MNIST and CIFAR tutorials on the Caffe2 Jupyter pages, I now have scripts that create the LMDB, initialize the train, validation and deploy models, and train the train model on my own dataset (only about 2k entries across 3 classes so far). Things already look strange here: with this small dataset, about 1k iterations at batch size 48 and the basic CIFAR-10 CNN structure, it reaches 95% accuracy with a 15% loss.
I can also successfully save the initNet and predictNet of the deploy model. But when I load these .pb files and try to "resume" training on the same dataset, it "learns" from the beginning, i.e. the loss and accuracy start over and improve from their initial values. That save/load part is what I'm struggling with.

Lastly, the Predictor issue from the title: when I load the model's .pb files, pass them to workspace.Predictor and run it on a preprocessed image, the results are not correct, and when I do the same for another image (containing a different class) I get nearly the same results.
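For the resume problem, one possible cause is running the training model's `param_init_net` after the saved weights have been loaded, which re-randomizes them. The sketch below shows an ordering that avoids this. It assumes the saved initNet is a `caffe2_pb2.NetDef` made of fill ops (for example as produced by `mobile_exporter.Export`) whose blob names match the training model's parameters. `resume_training` is a hypothetical helper; `workspace` is meant to be `caffe2.python.workspace` and `train_model` a `ModelHelper` built as in the tutorials. `workspace` is passed in as an argument only so the call order can be checked without Caffe2 installed.

```python
def resume_training(workspace, train_model, saved_init_net, num_iters):
    """Hypothetical helper: continue training from a saved init net.

    Assumes saved_init_net holds fill ops for blobs whose names match the
    parameters of train_model (e.g. conv1_w, conv1_b, ...).
    """
    # 1. Run the training model's own init net first: it creates the DB
    #    reader, the iteration counter and other helper blobs, and fills the
    #    weights with random values.
    workspace.RunNetOnce(train_model.param_init_net)
    # 2. Run the saved init net afterwards so its fill ops overwrite the
    #    random weights with the previously trained values.
    workspace.RunNetOnce(saved_init_net)
    # 3. Instantiate the training net once, then step it as usual.
    workspace.CreateNet(train_model.net, overwrite=True)
    for _ in range(num_iters):
        workspace.RunNet(train_model.net)
```

To get `saved_init_net`, the file can be parsed with `net = caffe2_pb2.NetDef()` followed by `net.ParseFromString(f.read())` on the file opened in `'rb'` mode. A deploy init net normally holds only the weights, so the `iter` counter and any optimizer state still start from zero and the learning-rate schedule restarts; if the weights were restored, though, the loss should start near its previous value rather than from scratch.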

Thanks in advance.

PS: I can also provide any of the source code files if necessary.

EDIT:

I found a mistake in my code: I was defining the validation model with the same name as the train model, so one net may have overwritten the other. I also discovered that the Caffe2 MNIST tutorial is not the same as the one written in Jupyter in the caffe2 tutorials GitHub repo.

EDIT2:

However, even after correcting that error, I still have the issue with the Predictor: it returns the same probabilities for multiple images of different classes.
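One common cause of near-identical Predictor outputs is input that is not preprocessed the same way as during training. In the Caffe2 MNIST tutorial, for example, the uint8 LMDB data is cast to float and scaled by 1/256 inside the training net's input function, while the deploy net starts directly from the `data` blob, so the caller has to do that scaling. The sketch below assumes the same setup here (NCHW layout, float32, 1/256 scaling), which may not match the actual scripts. `preprocess` and `outputs_collapsed` are hypothetical helpers; the second is a quick check of whether the outputs for different images are effectively identical.

```python
import numpy as np


def preprocess(image_hwc, scale=1.0 / 256):
    """Turn one HxWxC uint8 image into a 1xCxHxW float32 batch.

    Assumes (hypothetically) that training applied Cast + Scale(1/256)
    before the first conv layer, so the deploy net expects the caller to
    apply the same scaling.
    """
    img = np.asarray(image_hwc)
    if img.ndim != 3:
        raise ValueError("expected an HxWxC image, got shape %r" % (img.shape,))
    # HWC -> CHW, converted to float32 and scaled like the training input.
    chw = img.astype(np.float32).transpose(2, 0, 1) * scale
    # Add the batch dimension (N = 1) and make the memory contiguous.
    return np.ascontiguousarray(chw[np.newaxis])


def outputs_collapsed(probs, atol=1e-3):
    """True if every row of an (N, num_classes) array is within atol of row 0."""
    probs = np.asarray(probs, dtype=np.float64)
    return bool(np.all(np.abs(probs - probs[0]) <= atol))
```

The resulting batch would then be passed as `workspace.Predictor(init_net, predict_net).run([batch])`, with `init_net` and `predict_net` read from the .pb files as bytes. If `outputs_collapsed` is true while the fed batches clearly differ, the input scaling or the loaded weights are more likely culprits than the images themselves.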