Is there any difference between Theano convolution and PyTorch convolution?

Hi,

I have been porting a Theano-based pre-trained model to PyTorch. I think I have done it correctly, but the model does not work properly and cannot reproduce the Theano results. Could you please verify my snippet? Is there any difference in the convolution operation between these two frameworks, or anything else I should watch out for?

def build_architecture(self):
    with open(self.model_url, 'rb') as f:
        data = pickle.load(f, encoding='latin1')
    parameters = data['params']
    extractor = []
    classifier = []
    BN_counter = 1
    BN_flag = False
    classifier_flag = False
    conv3 = False
    for i, p in enumerate(parameters):
        ndim = p.ndim
        if ndim == 4:
            out_channel, in_channel, F, _ = p.shape
            if F == 11:
                extractor += [nn.Conv2d(in_channel, out_channel, F, stride=4, padding=0, bias=False)]
                extractor[-1].weight.data = torch.from_numpy(p)
            elif F == 5:
                extractor += [nn.Conv2d(in_channel, out_channel, F, stride=1, padding=2, bias=False)]
                extractor[-1].weight.data = torch.from_numpy(p)
            elif F == 3:
                extractor += [nn.Conv2d(in_channel, out_channel, F, stride=1, padding=1, bias=False)]
                extractor[-1].weight.data = torch.from_numpy(p)
                conv3 = True
            # extractor += [nn.ReLU()]
        elif ndim == 2:
            in_channel, out_channel = p.shape
            classifier += [nn.Linear(in_channel, out_channel, bias=False)]
            # classifier += [nn.ReLU()]
            classifier_flag = True
        elif ndim == 1:
            if BN_counter == 4:
                BN_flag = True
                BN_counter = 1
            else:
                BN_counter = BN_counter + 1
            if BN_flag:
                in_channel = p.shape[0]
                if not classifier_flag:
                    extractor += [nn.BatchNorm2d(in_channel)]
                    extractor[-1].weight.data = torch.from_numpy(parameters[i-3])
                    extractor[-1].bias.data = torch.from_numpy(parameters[i-2])
                    extractor[-1].running_mean = torch.from_numpy(parameters[i-1])
                    extractor[-1].running_var = torch.from_numpy((1./(parameters[i]**2)) - 1e-4)
                    extractor += [nn.ReLU()]
                    if not conv3:
                        extractor += [nn.MaxPool2d(3, stride=2)]
                else:
                    classifier += [nn.BatchNorm1d(in_channel)]
                    classifier[-1].weight.data = torch.from_numpy(parameters[i-3])
                    classifier[-1].bias.data = torch.from_numpy(parameters[i-2])
                    classifier[-1].running_mean = torch.from_numpy(parameters[i-1])
                    classifier[-1].running_var = torch.from_numpy((1./(parameters[i]**2)) - 1e-4)
                    classifier += [nn.ReLU()]
                BN_flag = False
    return extractor, classifier
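For reference, the `running_var` line above follows from the fact that Lasagne's `BatchNormLayer` stores `inv_std = 1/sqrt(var + eps)` instead of the variance itself, so the variance is recovered as `var = 1/inv_std**2 - eps` (the `1e-4` matches Lasagne's default epsilon, as far as I know). A quick NumPy round-trip check of that formula:

```python
import numpy as np

eps = 1e-4  # assumed to match the epsilon used when the Lasagne model was trained

# pretend these are the true running variances of a BN layer
var = np.random.rand(64) + 0.5

# Lasagne's BatchNormLayer stores inv_std = 1 / sqrt(var + eps)
inv_std = 1.0 / np.sqrt(var + eps)

# recovering the variance, as done in the snippet above
var_recovered = 1.0 / inv_std**2 - eps

print(np.allclose(var, var_recovered))  # close up to floating-point error
```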

I’m not sure about the native Theano code, but Lasagne flips the filters by default to compute a real convolution (instead of a cross-correlation). Maybe it’s also the default in Theano? I couldn’t figure it out that fast. :wink:

To transfer Lasagne weights to PyTorch you have to flip them back. You can have a look at my approach to transferring weights for the ProgGAN paper.
Important line:

conv_layer.conv.weight.data = torch.FloatTensor(np.copy(conv_w.W.get_value()[:, :, ::-1, ::-1]))

Could you try flipping the kernels and run the code again?
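To illustrate the difference with a small self-contained sketch (not the actual model code): PyTorch's `conv2d` computes a cross-correlation, while a mathematical convolution uses the kernel flipped in both spatial dimensions, and flipping twice recovers the original weights, so the transfer is lossless.

```python
import numpy as np
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 8, 8)   # one single-channel 8x8 input
w = torch.randn(1, 1, 3, 3)   # one 3x3 kernel

# nn.Conv2d / F.conv2d compute a cross-correlation with w ...
corr = F.conv2d(x, w)

# ... while a mathematical convolution cross-correlates with the kernel
# flipped in both spatial dimensions (np.copy avoids negative strides,
# which torch.from_numpy does not accept)
w_flipped = torch.from_numpy(np.copy(w.numpy()[:, :, ::-1, ::-1]))

# flipping again recovers the original kernel, so the transfer is lossless
w_back = torch.from_numpy(np.copy(w_flipped.numpy()[:, :, ::-1, ::-1]))
print(torch.equal(w, w_back))                        # True
print(torch.allclose(corr, F.conv2d(x, w_back)))     # True
```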

@ptrblck, actually my weights come from Lasagne. I think the most critical part of my code is the convolution definition, i.e. the following code:

if ndim == 4:
    out_channel, in_channel, F, _ = p.shape
    if F == 11:
        extractor += [nn.Conv2d(in_channel, out_channel, F, stride=4, padding=0, bias=False)]
        extractor[-1].weight.data = torch.from_numpy(p)
    elif F==5:
        extractor += [nn.Conv2d(in_channel, out_channel, F, stride=1, padding=2, bias=False)]
        extractor[-1].weight.data = torch.from_numpy(p)
    elif F==3:
        extractor += [nn.Conv2d(in_channel, out_channel, F, stride=1, padding=1, bias=False)]
        extractor[-1].weight.data = torch.from_numpy(p)
        conv3 = True

My doubt about the problem was indeed the difference in the convolution operation between these two frameworks. What I understand from your snippet is that you first flip the conv weights from Lasagne and then assign them to the weight of PyTorch’s conv. Am I right?

By the way, thanks for your response!

Yeah, you are right.
I am flipping the kernels in the W and H dimensions (`filter_size` in Lasagne) to make sure they match the Lasagne definition.


@ptrblck, Thank you. I will check it and give feedback to you on this topic.

Hi again,
Your suggestion did not work. What I did was:

extractor[-1].weight.data = torch.from_numpy(p[:, :, ::-1, ::-1])

But the suggested command makes an error:

RuntimeError: some of the strides of a given numpy array are negative. This is currently not supported, but will be added in future releases.

Any idea?

Add np.copy around the sliced array like in my example. By default the underlying array is shared, and PyTorch currently does not support negative strides.
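A minimal reproduction of the point about shared memory and negative strides (the shapes here are made up):

```python
import numpy as np
import torch

w = np.random.randn(64, 3, 5, 5).astype(np.float32)

flipped_view = w[:, :, ::-1, ::-1]   # a view with negative spatial strides
print(flipped_view.strides)          # the last two strides are negative

# torch.from_numpy(flipped_view) would fail here, because from_numpy
# shares memory with the array and cannot represent negative strides.
# np.copy materialises a fresh, contiguous array instead:
flipped = np.copy(flipped_view)
t = torch.from_numpy(flipped)
print(t.shape)  # torch.Size([64, 3, 5, 5])
```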

Thanks. It worked.
But one more thing to mention: when I import the pre-trained model into PyTorch from Lasagne, something weird happens. Let me describe my problem a little. I would like to fine-tune on my dataset after importing, but the import does not seem to have any effect. During training, the accuracy of the system stays at chance level, and there is no trace of learning from my data (with transfer learning). However, when I train the network from a random initialization without any transfer learning, the performance metric changes and shows that something is being learned. Do you have any idea what the problem is?

I’m glad it worked! :slight_smile:
Did you compare the outputs of your Lasagne and PyTorch models? They should be approximately the same up to a small tolerance. Just sample a random input with NumPy and call the forward pass on both models.
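A minimal sketch of that comparison (the Lasagne and PyTorch forward calls are only indicated in comments, since they depend on your code; the metric itself is shown on dummy data):

```python
import numpy as np

def sum_abs_error(out_a, out_b):
    """Sum of absolute differences between two model outputs."""
    return float(np.abs(np.asarray(out_a) - np.asarray(out_b)).sum())

# sample one random input; both models must see the identical array
x = np.random.uniform(0, 255, size=(1, 1, 150, 220)).astype(np.float32)

# Lasagne side (assumed API):  ref = lasagne_predict_fn(x)
# PyTorch side:                out = model(torch.from_numpy(x)).detach().numpy()
# Here we just demonstrate the metric itself on dummy outputs:
ref = x * 2.0
out = x * 2.0 + 1e-6
print(sum_abs_error(ref, out) < 1.0)  # a small error means the transfer worked
```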

After you transfer the weights to PyTorch and run a simple prediction, what is the accuracy? Is it random, e.g. ~0.1 for 10 classes?
When you call model.parameters(), do you see all your layers?

Did you try a really low learning rate? Since the network is already pre-trained, the weights should be in a “good” state, so the lr should be quite low. Could you try lowering the lr by a factor of 10 or 100 and running it again?

How similar is your new data set to the pre-trained data? Is it sampled from the same distribution (e.g. both are natural images)?
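For the learning rate point, a minimal sketch of what I mean (the model and the factor of 100 are just placeholders):

```python
import torch
import torch.nn as nn

# placeholder model standing in for the transferred network
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())

base_lr = 1e-2                 # e.g. the lr used when training from scratch
finetune_lr = base_lr / 100    # much smaller: the pre-trained weights are already good

optimizer = torch.optim.SGD(model.parameters(), lr=finetune_lr, momentum=0.9)
print(optimizer.param_groups[0]["lr"])  # a hundredth of the base lr
```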


Hi again,
After flipping the weights of the convolution layers as in your snippet, I followed your suggestion and compared the output values. I generated a random image in the range [0, 255] with size 150×220 (the same input range as the Lasagne model) and forwarded it through both networks (the PyTorch and Lasagne models). However, the sum of absolute errors between the two outputs is huge. I think something strange is going on. The Lasagne model I am trying to transfer is from this GitHub repository. Could you please help me figure out what is wrong with my code above?

How about the transfer of the BatchNorm weights:

extractor += [nn.BatchNorm2d(in_channel)]
extractor[-1].weight.data = torch.from_numpy(parameters[i-3])
extractor[-1].bias.data = torch.from_numpy(parameters[i-2])
extractor[-1].running_mean = torch.from_numpy(parameters[i-1])
extractor[-1].running_var = torch.from_numpy((1./(parameters[i]**2)) - 1e-4)

Is it correct or not?

Could you post your repository with your PyTorch and Lasagne code so that I could have a look at it?
I suppose you are trying to re-implement this model?

Hi, here is my GitHub repository. You should first download the pre-trained model from here and then run the code. By the way, thanks for your reply!

Thanks for the repo. I had a look at both implementations and found an error.
The Lasagne implementation uses flip_filters=False for the conv layers, so flipping is not necessary.
Sorry for the confusion.

Anyway, I still have some questions regarding your re-implementation. Maybe we could continue this via messages or in Slack?

I continued debugging layer by layer, and it seems you have forgotten to load the weights for the fc1 layer.

Add this in line 62 in arch.py:

classifier[-1].weight.data = torch.from_numpy(np.array(p.T))

Now, the sum of absolute errors for random input is approx. 0.3365, which should be fine. :wink:
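For context, the transpose is needed because Lasagne's `DenseLayer` stores `W` as `(num_inputs, num_units)` and computes `x.dot(W)`, while `nn.Linear` stores `weight` as `(out_features, in_features)` and computes `x @ W.T`. A small check of that convention (with made-up sizes):

```python
import numpy as np
import torch
import torch.nn as nn

in_features, out_features = 4, 3

# Lasagne's DenseLayer keeps W as (num_inputs, num_units) and computes x @ W
W_lasagne = np.random.randn(in_features, out_features).astype(np.float32)
x = np.random.randn(2, in_features).astype(np.float32)
ref = x @ W_lasagne

# nn.Linear stores weight as (out_features, in_features) and computes x @ W.T,
# hence the transpose when transferring the Lasagne weights
fc = nn.Linear(in_features, out_features, bias=False)
fc.weight.data = torch.from_numpy(np.array(W_lasagne.T))
out = fc(torch.from_numpy(x)).detach().numpy()

print(np.allclose(ref, out, atol=1e-5))  # True
```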


Wow! Really? Thanks again for your response! I really appreciate your help!

Sorry for my delayed response! Could you please tell me: did you use my provided sanity-check code and my data folder, or did you prepare a random input yourself and feed it to the pre-trained (Lasagne) network? Currently, my error is about:

Sum of Absolute Error: 1.0465652960843954

You are welcome :wink:
I used the sanity_check to load the model, but then I used a random input sampled with NumPy. Tomorrow I can try the provided input.


Thank you, man. I won’t forget this favor :smiley:


I used './data/input.npy' and the checkpoint from './signet.pkl' and got a sum of absolute error of 0.47596. Maybe you are using another checkpoint?
