Hi,

I am re-implementing my TensorFlow models in PyTorch. The problem appears when I test the port by loading the weights of previously trained TensorFlow models: the performance is very different, so something is clearly going wrong.

To debug, I started with a single conv layer: I initialise the conv kernel with the same weights and apply it to the same input. Surprisingly, the PyTorch and TensorFlow implementations give different results.

Here’s the code:

```
# TensorFlow, Python 2.7
import numpy as np
import tensorflow as tf

# Different weights for testing (kernel layout: height, width, in, out)
np.save('weg.npy', 0.008*np.random.randint(1,1000,(3,3,3,6))) # Option1
#np.save('weg.npy', np.random.randint(1,1000,(3,3,3,6))) # Option2
#np.save('weg.npy', 10.12*np.random.randint(1,1000,(3,3,3,6))) # Option3

inputs = tf.Variable(1.5*np.ones((1, 10, 10, 3), dtype=np.float32))  # NHWC
net = tf.contrib.slim.conv2d(inputs, 6, [3, 3], stride=1,
                             weights_initializer=tf.constant_initializer(np.load('weg.npy')),
                             biases_initializer=tf.constant_initializer(0),
                             activation_fn=None)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    x = sess.run(net)
    np.save('tf.npy', x)
    print(x.shape)
```

Since I am using a Python 3.6 environment for PyTorch, I didn't put the two scripts together.

```
# PyTorch, Python 3.6
from collections import OrderedDict

import numpy as np
import torch
import torch.nn as nn

# Prepare weights: TF kernel layout (H, W, in, out) -> PyTorch layout (out, in, H, W)
weights = torch.from_numpy(np.load('weg.npy')).permute((3, 2, 0, 1))
biases = torch.from_numpy(np.zeros(6))
weight_dict = OrderedDict()
weight_dict.update({'0.weight': weights})
weight_dict.update({'0.bias': biases})

inputs = torch.from_numpy(1.5*np.ones((1, 3, 10, 10), dtype=np.float32))  # NCHW
# padding=1 matches the default 'SAME' padding of slim.conv2d for a 3x3 kernel
net = nn.Sequential(nn.Conv2d(3, 6, kernel_size=3, stride=1, padding=1))
net.load_state_dict(weight_dict)

# Compare results
m_ = net(inputs)
m = np.load('tf.npy')
print(np.linalg.norm(m - m_.permute((0, 2, 3, 1)).data.numpy())) # Swap back to NHWC
```
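The two `permute` calls above rely on TensorFlow storing conv kernels as (height, width, in, out) and activations as NHWC, while PyTorch uses (out, in, height, width) and NCHW. Below is a small NumPy-only sketch that checks the index mapping on dummy arrays. The array names are made up for illustration and it does not touch `weg.npy` or either framework.

```python
import numpy as np

# Dummy kernel in TensorFlow layout: (height, width, in_channels, out_channels)
k_tf = np.arange(3 * 3 * 3 * 6).reshape(3, 3, 3, 6)

# Same axis order as permute((3, 2, 0, 1)) in the PyTorch script:
# result is (out_channels, in_channels, height, width)
k_pt = k_tf.transpose(3, 2, 0, 1)
print(k_pt.shape)

# Every element must land at the matching index in the new layout
h, w, i, o = 1, 2, 0, 4
print(k_pt[o, i, h, w] == k_tf[h, w, i, o])

# Activations: NHWC -> NCHW and back again should be a round trip
x_nhwc = np.arange(1 * 10 * 10 * 3).reshape(1, 10, 10, 3)
x_nchw = x_nhwc.transpose(0, 3, 1, 2)
x_back = x_nchw.transpose(0, 2, 3, 1)
print(np.array_equal(x_back, x_nhwc))
```

If this mapping holds, the layout conversion itself should not be the source of the mismatch.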

I found that the different options for initialising the weights (i.e. 'weg.npy') lead to different errors. Note that the only difference between the weight options is the coefficient.

```
Results:
np.random.randint(1,1000,(3,3,3,6)) => 0.0
0.008*np.random.randint(1,1000,(3,3,3,6)) => 0.00031963884
10.12*np.random.randint(1,1000,(3,3,3,6)) => 0.37016743
```
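The numbers above are absolute norms, so they grow with the magnitude of the weights. One way to make the three options comparable is to divide by the norm of the reference output. The sketch below is only an illustration under stated assumptions: `relative_error` is a hypothetical helper, and the two arrays are stand-ins (a float64 sum and the same sum done in float32), not the real `tf.npy` or the PyTorch output.

```python
import numpy as np

def relative_error(reference, candidate):
    """Norm of the difference divided by the norm of the reference."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    return np.linalg.norm(reference - candidate) / np.linalg.norm(reference)

# Stand-in data (assumption): Option 3 weights, and the value each interior
# output pixel takes when the input is a constant 1.5 and the bias is 0.
rng = np.random.RandomState(0)
weights = 10.12 * rng.randint(1, 1000, (3, 3, 3, 6))   # float64, (H, W, in, out)
reference = 1.5 * weights.reshape(-1, 6).sum(axis=0)    # float64 sum over 27 taps

# Same quantity with the weights rounded to float32 and summed in float32
w32 = weights.astype(np.float32).reshape(-1, 6)
candidate = (np.float32(1.5) * w32).sum(axis=0, dtype=np.float32)

rel = relative_error(reference, candidate)
print("absolute:", np.linalg.norm(reference - candidate))
print("relative:", rel)
print("float32 eps:", np.finfo(np.float32).eps)
```

With the real files, `reference` would be `np.load('tf.npy')` and `candidate` the permuted PyTorch output. Comparing the relative error with the float32 machine epsilon would show whether the gap is on the scale of single-precision rounding or something larger.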

Does anyone have any idea why this is happening? I am really confused and hope someone can help. Thanks!