I have been importing a Theano-based pre-trained model into Pytorch. I think I have done it right, but the model does not work properly and cannot reproduce the Theano results. Could you please verify my snippets? Is there any difference between the convolution operations in these two frameworks, or anything else?
I’m not sure about native Theano code, but Lasagne flips the filters by default to compute a real convolution (instead of a correlation). Maybe it’s also the default in Theano? I couldn’t figure it out that fast.
To transfer Lasagne weights to Pytorch you have to flip them back. You can have a look at my approach for transferring the weights of the ProgGAN paper.
Important line:
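The flip-and-copy step can be sketched like this (a minimal stand-in, not the actual ProgGAN code; the weight array and layer shapes here are hypothetical):

```python
import numpy as np
import torch
import torch.nn as nn

# Stand-in for weights loaded from a Lasagne model; both frameworks
# store conv weights as (out_channels, in_channels, H, W).
lasagne_w = np.random.randn(16, 3, 3, 3).astype(np.float32)

conv = nn.Conv2d(3, 16, kernel_size=3)
# Flip the kernels in H and W; np.copy is needed because torch.from_numpy
# cannot wrap a NumPy view with negative strides.
conv.weight.data = torch.from_numpy(np.copy(lasagne_w[:, :, ::-1, ::-1]))
```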
@ptrblck, actually my weights are based on Lasagne. I think the most critical part of my code is my convolution definition, i.e. the following code:
My doubt was really about the difference in the convolution operation between these two frameworks. What I have understood from your snippet is that you first flip the Lasagne conv weights and then assign them to the weight of Pytorch's conv. Am I right?
Yeah you are right.
I am flipping the kernels in the W and H dimensions (filter_size in Lasagne) to make sure they are equal to the Lasagne definition.
Add np.copy around the sliced array like in my example. By default the underlying array will be shared, and Pytorch currently does not support negative strides.
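A minimal illustration of why the copy is needed: reversing an axis with `[::-1]` creates a NumPy view with negative strides, which `torch.from_numpy` rejects, while a contiguous copy works fine.

```python
import numpy as np
import torch

a = np.arange(6, dtype=np.float32).reshape(2, 3)
flipped = a[:, ::-1]  # a view with negative strides, no data copied

# torch.from_numpy(flipped) would raise a ValueError here,
# because PyTorch cannot wrap arrays with negative strides.
t = torch.from_numpy(np.copy(flipped))  # contiguous copy: works
```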
Thanks. It worked.
But one more thing to mention: something weird happened when I imported the pre-trained model into Pytorch from Lasagne. Let me describe my problem a little. I would like to fine-tune on my dataset after importing, but the import does not seem to have any effect. During training the accuracy metric stays at chance level, and there is no trace of learning from my data (with transfer learning). However, when I train the network from a random initialization without any transfer learning, the metric improves and shows that something is being learned. Do you have any idea what the problem is?
I’m glad it worked!
Did you compare the outputs of your Lasagne and Pytorch models? They should be approximately the same up to a small tolerance. Just sample a random input with NumPy and call the forward pass on both models.
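A self-contained sketch of such a comparison: here the Lasagne-style true convolution (flipped kernel) is emulated in plain NumPy as a stand-in for the real model, and the Pytorch conv with flipped weights should match it up to float32 tolerance.

```python
import numpy as np
import torch
import torch.nn as nn

np.random.seed(0)
w = np.random.randn(1, 1, 3, 3).astype(np.float32)   # stand-in Lasagne weights
x = np.random.uniform(0, 255, (1, 1, 8, 8)).astype(np.float32)

# Emulate Lasagne's default true convolution (kernel flipped) in NumPy.
def conv2d_flip(img, ker):
    kh, kw = ker.shape
    ker_f = ker[::-1, ::-1]
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1), np.float32)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * ker_f).sum()
    return out

ref = conv2d_flip(x[0, 0], w[0, 0])

# Pytorch computes a cross-correlation, so the transferred kernels are flipped.
conv = nn.Conv2d(1, 1, kernel_size=3, bias=False)
conv.weight.data = torch.from_numpy(np.copy(w[:, :, ::-1, ::-1]))
with torch.no_grad():
    out = conv(torch.from_numpy(x))[0, 0].numpy()

print(np.abs(out - ref).max())  # should be tiny (float32 tolerance)
```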
After you transfer the weights to Pytorch and run a simple prediction, what is the accuracy? Is it random, e.g. ~0.1 for 10 classes?
When you call model.parameters(), do you see all your layers?
Did you try a really low learning rate? Since the network is already pre-trained, the weights should be in a “good” state, so that the lr should be quite low. Could you try lowering the lr by factor 10 or 100 and run it again?
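Lowering the learning rate is a one-liner; a sketch (the model and base rate here are placeholders):

```python
import torch

model = torch.nn.Linear(10, 2)  # stands in for the transferred model
base_lr = 1e-2                  # hypothetical original learning rate

# For fine-tuning a pre-trained network, divide the lr by 10 or 100.
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr / 100)
```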
How similar is your new data set to the pre-trained data? Is it sampled from the same distribution (e.g. both are natural images)?
Hi again,
After flipping the weights of the convolution layers as in your snippet, I followed your suggestion, i.e. comparing the output values. I generated a random image in the range [0, 255] with size 150*220 (the same input range as the Lasagne model) and forwarded it through both networks (the Pytorch and Lasagne models). However, the sum of the absolute errors between the two outputs is huge. I think something strange has happened. The Lasagne model I am trying to transfer is from this GitHub repository. Could you please help me figure out what is wrong with my code above?
Could you post your repository with your Pytorch and Lasagne code so that I could have a look at it?
I suppose you are trying to re-implement this model?
Thanks for the repo. I had a look at both implementations and found an error.
The Lasagne implementation uses flip_filters=False for the conv layers, so flipping is not necessary when transferring the weights.
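With flip_filters=False, Lasagne also computes a cross-correlation (the same operation Pytorch uses), so the kernels can be copied over as-is. A sketch with placeholder shapes:

```python
import numpy as np
import torch
import torch.nn as nn

# Stand-in for weights from a Lasagne conv layer built with flip_filters=False.
lasagne_w = np.random.randn(16, 3, 3, 3).astype(np.float32)

conv = nn.Conv2d(3, 16, kernel_size=3)
# Both frameworks compute a cross-correlation here,
# so the weights are copied directly, without any flipping.
conv.weight.data = torch.from_numpy(lasagne_w)
```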
Sorry for the confusion.
Anyway, I still have some questions regarding your re-implementation. Maybe we could continue this via messages or in Slack?
Sorry for my delayed response! Could you please tell me: did you use my provided sanity-check code and my data folder, or did you prepare another random input yourself and feed it to the pre-trained (Lasagne) network? Currently, my error is about: