I am struggling to figure out what the problem is.
I built a custom VGG16 model in Python and exported it with torch.jit.trace (without any training).
I then tried to train this model in C++, but the accuracy does not match what I get in Python.
Why?
※ I also found that only this specific model fails to train in C++: another custom model works well, and only the custom VGG16 model shows strange accuracy. I used exactly the same training loop in C++ for both.
gpu enabled
current epoch = 0
average accuracy : 0.264888
average cost : 1.38596
current epoch = 1
average accuracy : 0.268539
average cost : 1.38558
current epoch = 2
average accuracy : 0.267135
average cost : 1.38528
current epoch = 3
Vgg16 Model (Model Structure)
TrainingScript(Include save model)
The C++ repository is private for now, so if you want to see the C++ code as well, I can share the source privately.
torch.jit.trace will use the provided input to record all functions as they were executed and is thus unable to use any data-dependent control flow etc.
In your particular case all dropout layers would be “fixed” which is most likely one of the reasons the training fails using the traced model. torch.jit.script on the other hand should be able to track these operations as well.
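To make the control-flow point concrete, here is a minimal sketch. The `Gate` module is a hypothetical toy, not the VGG16 from the question. It is traced with a positive input and then called with a negative one.

```python
import torch


class Gate(torch.nn.Module):
    """Hypothetical toy module with a data-dependent branch."""

    def forward(self, x):
        if x.sum() > 0:
            return x + 10
        return x - 10


model = Gate()
pos = torch.ones(3)
neg = -torch.ones(3)

# Tracing runs the model once on `pos` and records only the branch that was
# taken. PyTorch emits a TracerWarning about the tensor-to-bool conversion.
traced = torch.jit.trace(model, pos)

# Scripting compiles the Python source, so the `if` statement is preserved.
scripted = torch.jit.script(model)

traced_out = traced(neg)      # replays the recorded `x + 10` path
scripted_out = scripted(neg)  # re-evaluates the condition, takes `x - 10`
print("traced:  ", traced_out)
print("scripted:", scripted_out)
```

The traced module silently follows the path recorded for the example input, while the scripted module evaluates the condition on every call.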
I just figured something out: when I train the model for at least one epoch before saving it, the training failure disappears.
My guess is that, by training for one epoch, torch.jit.trace was able to trace the training process.
Is this guess correct?
Also, is training for one epoch before saving the model likely to cause problems in the future?
If there are no potential issues, I am fine with using this method for now.
No, I don’t think training the model for one epoch should change anything, and I cannot explain why it seems to work now. trace would still keep the layers in their traced “state”, i.e. it would still keep the same dropout mask etc., which would still be concerning.
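One way to check the traced “state” concern yourself: the torch.jit.trace documentation warns that operations which behave differently in training and eval mode keep the mode that was active during tracing. Below is a rough sketch under that assumption. `TinyNet` is a hypothetical stand-in for a model with dropout, and it is traced in eval mode on purpose.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for a model that contains dropout."""

    def __init__(self):
        super().__init__()
        self.drop = nn.Dropout(p=0.5)
        self.fc = nn.Linear(8, 4)

    def forward(self, x):
        return self.fc(self.drop(x))


x = torch.randn(64, 8)
model = TinyNet().eval()            # dropout is disabled while tracing
traced = torch.jit.trace(model, x)

traced.train()                      # ask for training mode afterwards
out_a = traced(x)
out_b = traced(x)

# If dropout were active, the two passes would use different random masks
# and the outputs would differ.
dropout_stuck_off = torch.equal(out_a, out_b)
print("traced dropout still disabled after train():", dropout_stuck_off)
```

If the two outputs match, calling train() on the traced module did not re-enable dropout, so a training loop on that module runs without the regularization the Python model had.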
Hmm, if you’re right, I won’t use dropout in the future. However, the dropout issue aside, the model saved after training for one epoch seems to learn smoothly in C++. I will post related screenshots.
Yes, just replace torch.jit.trace with torch.jit.script in your Python script and save the model afterwards. This section of the tutorial might be interesting to take a look at.
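As a rough sketch of that change (`SmallNet` and the file name are hypothetical placeholders, not the VGG16 script from this thread), scripting, saving, and reloading could look like this:

```python
import os
import tempfile

import torch
from torch import nn


class SmallNet(nn.Module):
    """Hypothetical placeholder for the custom model."""

    def __init__(self):
        super().__init__()
        self.drop = nn.Dropout(p=0.5)
        self.fc = nn.Linear(8, 4)

    def forward(self, x):
        return self.fc(self.drop(x))


# Script instead of trace: no example input is needed.
scripted = torch.jit.script(SmallNet())

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "small_net_scripted.pt")  # hypothetical file name
    scripted.save(path)
    loaded = torch.jit.load(path)

    x = torch.randn(64, 8)

    loaded.eval()   # dropout off: repeated passes should agree
    eval_same = torch.equal(loaded(x), loaded(x))

    loaded.train()  # dropout on: repeated passes draw new random masks
    train_same = torch.equal(loaded(x), loaded(x))

print("eval outputs identical: ", eval_same)
print("train outputs identical:", train_same)
```

On the C++ side the saved file can be loaded with torch::jit::load as before. It is probably worth calling train() on the loaded module before the training loop and eval() before validation so that dropout follows the intended mode.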