Transferring weights from Keras to PyTorch

Hi Chirag,
I’m currently going through the same process and am having a similar issue to you.
Looking at your code, I’m just wondering - what is the purpose of the 2d zero padding on the input?

Thanks,
Harri

Hi, I’m trying to convert my Keras weights to PyTorch too, and I have a question about the precision of the PyTorch weights: why do they appear rounded? I thought they were supposed to be identical to the given weights…
Below are my output images…

I’m setting the weights with the following code.

self.conv = nn.Conv2d(input_channel, output_channel, kernel_size, stride=stride, padding=padding)
print("before", self.conv.bias)
# overwrite the randomly initialized weight and bias with the pretrained Keras values
self.conv.weight.data = torch.Tensor(pretrained_weight["conv2d"][0].T)
self.conv.bias.data = torch.Tensor(pretrained_weight["conv2d"][1])
print("after", self.conv.bias)

Most likely there is no rounding involved, but the default print precision is lower in PyTorch than in numpy.
You can change it via: torch.set_printoptions(precision=10) (or any other value) to increase it.
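For example (illustrative values), the stored tensor does not change, only the number of displayed digits:

import torch

w = torch.tensor([0.123456789])
print(w)                               # only a few decimal digits shown by default
torch.set_printoptions(precision=10)
print(w)                               # same stored value, more digits displayed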


@ptrblck Thank you so much! With this I can confirm that my weights are correct!

Sorry for another question within this post.
I have been able to convert a Keras convolution layer to PyTorch.
Now I want to convert a GRU layer from Keras to PyTorch. Is the following weight mapping correct?

keras: kernel → pytorch: weight_ih
keras: recurrent_kernel → pytorch: weight_hh
keras: bias[0] → pytorch: bias_ih
keras: bias[1] → pytorch: bias_hh

I don’t know how Keras stores the parameters internally, and I cannot find any information in their docs. Do you have a reference to check it?
If not, you could create a simple layer in Keras and PyTorch, load the parameters using your current approach, and compare the outputs using a fixed input tensor (such as all ones).
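For instance, a minimal comparison sketch along these lines (the layer sizes are made up, and the actual weight-copying step is left as the part you want to verify):

import numpy as np
import torch
import torch.nn as nn
import tensorflow as tf

x = np.ones((1, 5, 8), dtype=np.float32)    # fixed input: (batch, time, features)

k_gru = tf.keras.layers.GRU(4, return_sequences=True)
k_out = k_gru(x).numpy()                    # calling the layer builds and initializes it

p_gru = nn.GRU(input_size=8, hidden_size=4, batch_first=True)
# ... copy k_gru.get_weights() into p_gru here using the mapping you want to check ...
with torch.no_grad():
    p_out, _ = p_gru(torch.from_numpy(x))

print(np.abs(k_out - p_out.numpy()).max())  # ~0 only if the mapping is correct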

Hi, and sorry for bringing this post back to life. I, too, want to test whether my PyTorch model behaves like a Keras one, so I tried to transfer the weights using the “keras_to_pyt” function by chirag1992m. However, after a forward pass the two models give different results. I am very new to PyTorch and Keras, so I am pretty sure I have a bug somewhere; I just don’t know where to look. I attach the script below (the code provided by chirag1992m at the start of the post works fine, i.e. the outputs of the two models are almost identical).

import random
import numpy as np
import time
import keras
import keras.backend as K
import tensorflow.compat.v1.keras.backend as backend
from keras.models import Sequential
from keras.layers import Dense, Dropout, Conv2D, MaxPooling2D, Activation, Flatten
from keras.optimizers import Adam
from keras.callbacks import TensorBoard
import tensorflow as tf

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchvision.transforms as T

class DQNAgent:
    def __init__(self):

        # Main model
        self.model = self.create_model()

     
    def create_model(self):
        model = Sequential()

        model.add(Conv2D(256, (3, 3), input_shape=(10,10,3)))  
        model.add(Activation('relu'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.2))

        model.add(Conv2D(256, (3, 3)))
        model.add(Activation('relu'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.2))

        model.add(Flatten())  
        model.add(Dense(64))

        model.add(Dense(9, activation='linear'))  
        model.compile(loss="mse", optimizer=Adam(lr=0.001), metrics=['accuracy'])
        return model

  
agent = DQNAgent()



class Flatten(nn.Module):
  def forward(self, x):
    N, C, H, W = x.size()  # read in N, C, H, W
    return x.view(N, -1)   # flatten to (N, C*H*W)


def keras_to_pyt(km, pm):
    weight_dict = dict()
    for layer in km.layers:
        if type(layer) is keras.layers.convolutional.Conv2D:
            weight_dict[layer.get_config()['name'] + '.weight'] = np.transpose(layer.get_weights()[0], (3, 2, 0, 1))
            weight_dict[layer.get_config()['name'] + '.bias'] = layer.get_weights()[1]
        elif type(layer) is keras.layers.Dense:
            weight_dict[layer.get_config()['name'] + '.weight'] = np.transpose(layer.get_weights()[0], (1, 0))
            weight_dict[layer.get_config()['name'] + '.bias'] = layer.get_weights()[1]
    pyt_state_dict = pm.state_dict()

    pyt_state_dict['model.0.weight'] = torch.from_numpy(weight_dict['conv2d.weight'])
    pyt_state_dict['model.0.bias'] = torch.from_numpy(weight_dict['conv2d.bias'])
    pyt_state_dict['model.4.weight'] = torch.from_numpy(weight_dict['conv2d_1.weight'])
    pyt_state_dict['model.4.bias'] = torch.from_numpy(weight_dict['conv2d_1.bias'])
    pyt_state_dict['model.9.weight'] = torch.from_numpy(weight_dict['dense.weight'])
    pyt_state_dict['model.9.bias'] = torch.from_numpy(weight_dict['dense.bias'])
    pyt_state_dict['model.10.weight'] = torch.from_numpy(weight_dict['dense_1.weight'])
    pyt_state_dict['model.10.bias'] = torch.from_numpy(weight_dict['dense_1.bias'])
    
    pm.load_state_dict(pyt_state_dict)
    return pm


class DQN(nn.Module):

    def __init__(self,  outputs):
        super(DQN, self).__init__()
        
        self.model = nn.Sequential(
            nn.Conv2d(3, 256, kernel_size=3, stride=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Dropout(p=0.2),
            nn.Conv2d(256, 256, kernel_size=3, stride=1),
            nn.ReLU(),            
            nn.MaxPool2d(2),
            nn.Dropout(p=0.2),
            Flatten(), 
            nn.Linear(256, 64),
            nn.Linear(64, outputs),
          )          
    
    def forward(self, x):         
        return self.model(x)
    
    
policy_net = DQN(9)

inp = np.random.normal(size=(1, 3, 10, 10)).astype(dtype=np.float32)

inp_pyt = torch.from_numpy(inp.copy()).float()
inp_keras = np.transpose(inp.copy(), (0, 2, 3, 1))

keras_result = agent.model.predict(x=inp_keras, verbose=1)
pyt_res = policy_net(inp_pyt).data.numpy()

print('keras res= ', keras_result)
print('pyt res= ', pyt_res)

The given script prints the following:
keras res= [[ 0.05931383 -0.24000387 0.09934839 0.08949665 -0.02977627 0.06066565
0.37287465 0.09245875 -0.12169667]]
pyt res= [[ 0.13366437 -0.16113546 -0.03684063 0.28739393 0.09155414 0.08537194
-0.14583385 0.22953041 -0.07536763]]

I would appreciate any help, thanks!

Your models use dropout layers, which will randomly drop activations, and I’m not seeing any model.eval() calls to disable them (I don’t know how they would be disabled in Keras/TF), so I would expect to see different outputs.
Could you either remove these layers or make sure they are not used during the testing?
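For reference, a minimal sketch on the PyTorch side (using the variable names from your script):

policy_net.eval()                        # puts dropout layers into eval mode
with torch.no_grad():
    pyt_res = policy_net(inp_pyt).numpy()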

Thanks for the reply, that did it!
The reason I’m doing this is that I am trying to port a deep reinforcement learning algorithm from TensorFlow to PyTorch. It works in TensorFlow, but not in PyTorch: the rewards never increase. Do you think the difference in the dropout layers could be the issue? (It doesn’t seem likely to me; I will try to debug it further.)

No, I don’t think the TF and PyTorch dropout implementations differ in their functionality (they might differ in their implementation), so I would check the optimizers etc. next.

Hello,
Sorry for reviving this again. I am trying to transfer the weights but getting different results in PT. Can someone please help? Thanks in advance.
Cheers,

tf model

model = tf.keras.models.Sequential([
    layers.Reshape((28, 28, 1), input_shape=(28, 28)),
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))

torch model

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        # Define layers
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)
        self.pool = nn.MaxPool2d(kernel_size=2)
        self.fc = nn.Linear(32 * 13 * 13, 10)

    def forward(self, x):
        # Reshape input
        x = x.view(-1, 1, 28, 28)

        # Convolutional layer
        x = self.conv1(x)
        x = nn.functional.relu(x)
        x = self.pool(x)

        # Flatten and fully connected layer
        x = torch.flatten(x, 1)
        x = self.fc(x)
        x = nn.functional.softmax(x, dim=1)

        return x

torch_model = Net()

# Define the loss function and optimizer

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(torch_model.parameters(), lr=0.001)

torch_model.conv1.weight.data = torch.from_numpy(np.transpose(keras_model_weights[0]))
torch_model.conv1.bias.data = torch.from_numpy(np.transpose(keras_model_weights[1]))
torch_model.fc.weight.data = torch.from_numpy(np.transpose(keras_model_weights[2]))
torch_model.fc.bias.data = torch.from_numpy(np.transpose(keras_model_weights[3]))

test_array = x_test[101].reshape(1, 28, 28).astype(np.float32)
print(test_array.shape)

keras_output = model.predict(test_array, verbose=0)
test_torch_tensor = torch.from_numpy(test_array)

torch_model.eval()

with torch.no_grad():
    output_tensor = torch_model(test_torch_tensor)

Print the intermediate activations in both frameworks to check where the difference starts to increase, assuming the inputs are equal.
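A minimal sketch of one way to do that, reusing the names from your snippet (the hook name and layer index are illustrative): a forward hook on the PyTorch side and an intermediate model on the Keras side.

# PyTorch: capture the output of conv1 with a forward hook
acts = {}
def save_act(name):
    def hook(module, inp, out):
        acts[name] = out.detach()
    return hook

torch_model.conv1.register_forward_hook(save_act('conv1'))
with torch.no_grad():
    _ = torch_model(test_torch_tensor)

# Keras: build a sub-model that returns the Conv2D output (layer index 1 here)
intermediate = tf.keras.Model(inputs=model.input, outputs=model.layers[1].output)
k_act = np.transpose(intermediate.predict(test_array), (0, 3, 1, 2))  # NHWC -> NCHW

print(np.abs(k_act - acts['conv1'].numpy()).max())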

Hello @ptrblck,
Thanks a lot for your reply.
I checked that, and Conv2D is where the outputs differ; the Dense layers match perfectly. Can you please tell me what I should look into further?

I am currently using the following code for testing.

Tensorflow Model

keras_conv_model = tf.keras.models.Sequential([
    tf.keras.layers.Reshape((28, 28, 1), input_shape=(28, 28)),
    tf.keras.layers.Conv2D(32, (3, 3),
                           activation='relu',
                           kernel_initializer=tf.keras.initializers.RandomNormal(stddev=0.01),
                           bias_initializer=tf.zeros_initializer(),
                           input_shape=(28, 28, 1))
])

PyTorch Model

class SimpleConvNet(nn.Module):
    def __init__(self):
        super(SimpleConvNet, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu(x)
        return x

Regards,
Swaraj

You might need to permute the weight if the memory layout differs, e.g. via w = w.permute(1, 0, 2, 3).
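For this Keras-to-PyTorch case specifically, a Conv2D kernel is stored as (kH, kW, in_channels, out_channels) in Keras while PyTorch expects (out_channels, in_channels, kH, kW), so the permutation ends up being (3, 2, 0, 1). A small sketch with made-up shapes:

w_keras = np.random.randn(3, 3, 1, 32).astype(np.float32)  # (kH, kW, in, out) as stored by Keras
w_torch = torch.from_numpy(w_keras).permute(3, 2, 0, 1)    # -> (out, in, kH, kW) for PyTorch
print(w_torch.shape)                                       # torch.Size([32, 1, 3, 3])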


Changing the memory layout did the job.

torch_conv_model.conv1.weight.data = torch.from_numpy(keras_weights[0]).permute(3, 2, 0, 1)

Thanks a lot