How to manually load weights (from a .txt file) into Conv2d.weight inside nn.Sequential?

yaluguo · August 11, 2017, 12:46pm

I have extracted some weight and bias data from a pre-trained TensorFlow CNN model and saved it in .txt files.
How can I load this data into a model that is wrapped in nn.Sequential in my PyTorch code, like the one below?

import torch.nn as nn

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # First block: convolution, then ReLU, then 2x2 max pooling
        self.conv1 = nn.Sequential(
            nn.Conv2d(
                in_channels=4,
                out_channels=32,
                kernel_size=8,
                stride=4,
                padding=2,
            ),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
        )

hma · August 11, 2017, 1:34pm

You can access and set the convolution kernel and bias directly:

self.conv1[0].weight.data = pretrained_weight
self.conv1[0].bias.data = pretrained_bias
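
For values stored in text files, the steps could look roughly like the sketch below. It assumes a recent PyTorch version, that each file holds a flat list of numbers (as written by numpy.savetxt), and that the weights are already in PyTorch's (out_channels, in_channels, kH, kW) order. The file names and the random stand-in values are hypothetical. It uses copy_ under torch.no_grad() as an alternative to assigning .data.

```python
import os
import tempfile

import numpy as np
import torch
import torch.nn as nn

conv1 = nn.Sequential(
    nn.Conv2d(in_channels=4, out_channels=32, kernel_size=8, stride=4, padding=2),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
)

# Stand-in for the exported files: hypothetical names, random values.
tmpdir = tempfile.mkdtemp()
weight_path = os.path.join(tmpdir, "conv1_weight.txt")
bias_path = os.path.join(tmpdir, "conv1_bias.txt")
np.savetxt(weight_path, np.random.randn(32 * 4 * 8 * 8))
np.savetxt(bias_path, np.random.randn(32))

# A text file is flat, so restore the shape the layer expects.
weight = np.loadtxt(weight_path, dtype=np.float32).reshape(32, 4, 8, 8)
bias = np.loadtxt(bias_path, dtype=np.float32)

# Copy the values into the existing parameters without recording
# the copy in the autograd graph.
with torch.no_grad():
    conv1[0].weight.copy_(torch.from_numpy(weight))
    conv1[0].bias.copy_(torch.from_numpy(bias))
```

The parameters keep requires_grad=True after the copy, so the layer can still be fine-tuned.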

yaluguo · August 12, 2017, 7:36am

Thanks for your answer! A few things still puzzle me, though.

I did the following:

mycnn = CNN()
print (mycnn.state_dict().keys())

It shows:
['conv1.0.weight', 'conv1.0.bias', 'conv2.0.weight', 'conv2.0.bias', 'conv3.0.weight', 'conv3.0.bias', 'fc1.weight', 'fc1.bias', 'out.weight', 'out.bias']

Then I tried the following:

print (mycnn.conv1[0].bias.data)
print (mycnn.state_dict()['conv1.0.bias'].data)

The outputs are different.

I also checked the gradients:

mycnn.conv1[0].bias.grad is None

mycnn.state_dict()['conv1.0.bias'].grad raises an error:
AttributeError: 'torch.FloatTensor' object has no attribute 'grad'

Can you explain the difference between mycnn.conv1[0].bias and mycnn.state_dict()['conv1.0.bias'] in my PyTorch model?

hma · August 12, 2017, 9:54am

Maybe what is stored in mycnn.state_dict() are just PyTorch tensors, not Variables.
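
This can be checked directly. The sketch below assumes a recent PyTorch release, where Variable and Tensor have been merged (unlike the 2017 version used in this thread), so the state_dict() entry is a plain detached tensor rather than a separate type. TinyCNN is a hypothetical stand-in for the model above.

```python
import torch
import torch.nn as nn


class TinyCNN(nn.Module):  # hypothetical stand-in for the CNN in this thread
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4, padding=2),
            nn.ReLU(),
        )


model = TinyCNN()
param = model.conv1[0].bias                 # nn.Parameter, tracked by autograd
entry = model.state_dict()["conv1.0.bias"]  # detached tensor from state_dict()

# Compare the values, the underlying storage, and the types.
same_values = torch.equal(param.detach(), entry)
shares_memory = param.data_ptr() == entry.data_ptr()
param_is_parameter = isinstance(param, nn.Parameter)
entry_is_parameter = isinstance(entry, nn.Parameter)

print(same_values, shares_memory)
print(param_is_parameter, entry_is_parameter)
print(param.requires_grad, entry.requires_grad)
```

Because the state_dict() entry shares storage with the parameter, writing into it in place also changes the parameter; it just does not take part in gradient tracking.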

yaluguo · August 13, 2017, 1:44pm

I have built exactly the same model in both TF and PyTorch, and I trained it in TF. For some reason, I have to transfer the pretrained weights to PyTorch.

The network is like:

In TF, the Conv2d filter shape is [filter_height, filter_width, in_channels, out_channels], while in PyTorch it is (out_channels, in_channels, kernel_size[0], kernel_size[1]).

So I did the following in TF:

and transferred it to PyTorch like this:

It turns out that the DQN in PyTorch does not work as well as it does in TF!
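
The filter layout difference described above can be handled with a single axis permutation. The following is a minimal sketch using NumPy only; the array is random stand-in data with the shape of the first layer, not the poster's actual weights, and the variable names are hypothetical.

```python
import numpy as np

# Stand-in for a TF kernel:
# [filter_height, filter_width, in_channels, out_channels]
tf_kernel = np.random.randn(8, 8, 4, 32).astype(np.float32)

# Reorder the axes to PyTorch's (out_channels, in_channels, kH, kW).
pt_kernel = np.ascontiguousarray(tf_kernel.transpose(3, 2, 0, 1))

# Spot check: the same filter tap must be reachable under both layouts.
h, w, c_in, c_out = 5, 2, 3, 17
assert tf_kernel[h, w, c_in, c_out] == pt_kernel[c_out, c_in, h, w]

# A plain reshape yields the right shape but scrambles the values.
reshaped = tf_kernel.reshape(32, 4, 8, 8)
print(pt_kernel.shape, reshaped.shape)
print(np.array_equal(pt_kernel, reshaped))
```

A reshape without the transpose is an easy mistake to make, because the resulting array loads without any shape error. Similar care may be needed for the first fully connected layer: assuming the TF model used the default NHWC data format, its feature maps are flattened in a different order than PyTorch's NCHW tensors.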