Different outputs for the same network

Could you post your network definition, so that I could run it on my machine please?
EDIT: I will just use your snippet from the first post and assume it’s the complete network. :wink:

import torch
import torch.nn as nn
from torchvision import models


class VGGNet(nn.Module):
    def __init__(self):
        """Select conv1_1 ~ conv5_1 activation maps."""
        super(VGGNet, self).__init__()
        # indices of the layers in self.features whose outputs are kept
        self.select = [15, 22, 29]
        self.features = torch.nn.Sequential(
            # conv1
            torch.nn.Conv2d(3, 64, 3, padding=35),
            torch.nn.ReLU(),
            torch.nn.Conv2d(64, 64, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2, stride=2),
            # conv2
            torch.nn.Conv2d(64, 128, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(128, 128, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2, stride=2),
            # conv3
            torch.nn.Conv2d(128, 256, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(256, 256, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(256, 256, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2, stride=2),
            # conv4
            torch.nn.Conv2d(256, 512, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(512, 512, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(512, 512, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2, stride=2),
            # conv5
            torch.nn.Conv2d(512, 512, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(512, 512, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(512, 512, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2, stride=2)
        )
        self.deconv1 = torch.nn.Sequential(
            torch.nn.ConvTranspose2d(256, 128, 4, stride=2),
            torch.nn.ReLU(),
            torch.nn.ConvTranspose2d(128, 64, 4, stride=2),
            torch.nn.ReLU(),
            torch.nn.ConvTranspose2d(64, 1, 3, padding=0, stride=1),
            torch.nn.ReLU(),
        )
        self.deconv2 = torch.nn.Sequential(
            torch.nn.ConvTranspose2d(512, 256, 4, stride=2),
            torch.nn.ReLU(),
            torch.nn.ConvTranspose2d(256, 128, 4, stride=2),
            torch.nn.ReLU(),
            torch.nn.ConvTranspose2d(128, 64, 4, stride=2),
            torch.nn.ReLU(),
            torch.nn.ConvTranspose2d(64, 1, 3, padding=0, stride=1),
            torch.nn.ReLU(),
        )
        self.deconv3 = torch.nn.Sequential(
            torch.nn.ConvTranspose2d(512, 512, 4, stride=2),
            torch.nn.ReLU(),
            torch.nn.ConvTranspose2d(512, 256, 4, stride=2),
            torch.nn.ReLU(),
            torch.nn.ConvTranspose2d(256, 128, 4, stride=2),
            torch.nn.ReLU(),
            torch.nn.ConvTranspose2d(128, 64, 4, stride=2),
            torch.nn.ReLU(),
            torch.nn.ConvTranspose2d(64, 1, 3, padding=0, stride=1),
            torch.nn.ReLU(),
        )
        self.final_attention_pred = torch.nn.Sequential(
            torch.nn.ConvTranspose2d(9, 1, 3, stride=1, padding=1)
        )
        self._initialize_weights()

    def _initialize_weights(self):
        # initializing weights using ImageNet-trained model from PyTorch
        for i, layer in enumerate(models.vgg16(pretrained=True).features):
            if isinstance(layer, torch.nn.Conv2d):
                self.features[i].weight.data = layer.weight.data
                self.features[i].bias.data = layer.bias.data

    def forward(self, x):
        # return list of feature maps at different sizes
        # (printgradnorm is a backward hook that is not shown in this thread;
        # registering hooks here adds a new hook on every forward call)
        features = []
        for i, layer in enumerate(self.features):
            layer.register_backward_hook(printgradnorm)
            x = layer(x)
            if i in self.select:
                features.append(x)
        for i in self.deconv1:
            i.register_backward_hook(printgradnorm)

        for i in self.deconv2:
            i.register_backward_hook(printgradnorm)

        for i in self.deconv3:
            i.register_backward_hook(printgradnorm)

        self.final_attention_pred[0].register_backward_hook(printgradnorm)

        saliency = []
        m = nn.Sigmoid()
        m1 = nn.Sigmoid()
        m2 = nn.Sigmoid()
        m3 = nn.Sigmoid()

        m.register_backward_hook(printgradnorm)
        # crop each deconv output back to 224x224 and repeat it over 3 channels
        attentionmap1 = self.deconv1(features[0])[:, :, 38:262, 38:262]
        attentionmap1 = attentionmap1.expand(1, 3, 224, 224)
        attentionmap2 = self.deconv2(features[1])[:, :, 38:262, 38:262]
        attentionmap2 = attentionmap2.expand(1, 3, 224, 224)
        attentionmap3 = self.deconv3(features[2])[:, :, 38:262, 38:262]
        attentionmap3 = attentionmap3.expand(1, 3, 224, 224)

        saliency.append(m(attentionmap1))
        display_image(saliency[0].data.cpu())

        saliency.append(m1(attentionmap2))
        #display_image(saliency[0].data.cpu())

        saliency.append(m2(attentionmap3))
        #display_image(saliency[0].data.cpu())

        # concatenate the three 3-channel maps into 9 channels
        output_data = torch.cat(saliency, 1)
        output = m3(self.final_attention_pred(output_data))
        return output

Have fun ^^ You can remove the display calls, or if you want to keep them, here is the function:

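import matplotlib
import matplotlib.pyplot as plt
import numpy as np

def display_image(input):
    # NCHW -> NHWC, then drop the batch dimension
    x = input.permute(0, 2, 3, 1)
    x = x.numpy()
    x = np.squeeze(x, axis=0)
    # single-channel images need a 2D array for imshow
    if x.shape[2] == 1:
        x = np.squeeze(x, axis=2)
    plt.figure()
    plt.imshow(x, cmap=matplotlib.cm.Greys_r)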

This looks fine on my machine:

CPU:

model = VGGNet()
x = Variable(torch.randn(1, 3, 224, 224))
output = model(x)
o1 = output.clone()
output = model(x)
o2 = output.clone()
(o1 - o2).abs().sum()
>> Variable containing:
 0
[torch.FloatTensor of size 1]

GPU:

...
(o1 - o2).abs().sum()
>> Variable containing:
1.00000e-06 *
  7.5102
[torch.cuda.FloatTensor of size 1 (GPU 0)]
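
A difference on the order of 1e-6 summed over the whole output is within float32 rounding noise, so it is usually more useful to compare outputs with a tolerance than to test for exact equality. Here is a minimal sketch, assuming a small stand-in layer instead of the full network; the helper `max_abs_diff` and the tolerance values are illustrative choices, not something from the posts above.

```python
import torch

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two tensors."""
    return (a - b).abs().max().item()

torch.manual_seed(0)
layer = torch.nn.Conv2d(3, 8, 3, padding=1)  # stand-in for the real model
x = torch.randn(1, 3, 32, 32)

with torch.no_grad():
    o1 = layer(x)
    o2 = layer(x)

# The tolerances are assumptions; choose values that suit your value range.
same = torch.allclose(o1, o2, rtol=1e-5, atol=1e-6)
print(max_abs_diff(o1, o2), same)
```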

Mhh, strange. I checked my input and the activations at various stages of the forward pass, and every time I get something different :thinking:
EDIT: the only operations that could cause a problem with my input are:
np.expand_dims
torch.from_numpy
j.permute

Otherwise I don’t see what could introduce randomness in this model :thinking:

@ptrblck Alright, so if I create one network and run it twice with the same input, I get the same output at two different stages of the forward pass. BUT if I recreate the same network, use the same input and compare its outputs to those of the previous network, I don't get the same thing :thinking: (cf. screenshot above)

Edit: if I create two models and check whether their weights are the same using:

for p1, p2 in zip(model1.parameters(), model2.parameters()):
    if p1.data.ne(p2.data).sum() > 0:
        print("nop")
        break
else:
    print("true")

I find that their weights are not the same, which might be the reason for the difference. What is the correct way to copy weights?

You are initializing all Conv2d layers, but skip the ConvTranspose2d layers, which are initialized using the default function.
That is why your models have different weights in the self.deconv modules.

So should I set their weights to 0 to produce the same results every time?

Setting all weights to zero will create an output containing all zeros.
It will be deterministic, but I doubt that’s what you want.
Since you don’t have pre-trained weights for the ConvTranspose layers, you could initialize them once in your first model and copy these weights to all other models.
Do you really need different models with exactly the same weights?
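
Copying the weights from the first model into another one can be done through the state dict. This is a minimal sketch, assuming a small hypothetical `make_deconv` block as a stand-in for the real network:

```python
import torch
import torch.nn as nn

def make_deconv():
    # Hypothetical stand-in for one deconv block of the network above.
    return nn.Sequential(
        nn.ConvTranspose2d(8, 4, 4, stride=2),
        nn.ReLU(),
        nn.ConvTranspose2d(4, 1, 3),
    )

model1 = make_deconv()
model2 = make_deconv()  # gets its own, different random initialization

# Copy every parameter and buffer of model1 into model2.
model2.load_state_dict(model1.state_dict())

identical = all(
    torch.equal(p1, p2)
    for p1, p2 in zip(model1.parameters(), model2.parameters())
)
print(identical)
```

`load_state_dict` copies the values, so the two models stay independent afterwards and training one does not change the other.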

Well, what I really need is that every time I use the model with the same input, my feature maps are the same once the input has been through the conv and deconv blocks.

I think this will work using just one model or am I missing something?

I mean yeah, it will definitely work if I always use the same model (train it, save it, reuse it, etc.).
But if I lose the model, or decide to retrain everything from scratch, I will get completely different results, even though I use the same architecture and the same data.

Ok, what do you think about loading one model, copying the pre-trained weights, initializing all other weights and finally saving this model as your baseline.
Then you would have to rewrite the _initialize_weights function so that it loads all weights from this baseline model.
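
The baseline idea could look roughly like the sketch below. The model factory `make_model`, the helper `load_from_baseline` and the file name `baseline.pth` are hypothetical names used for illustration; a temporary directory stands in for wherever you would really keep the file.

```python
import os
import tempfile

import torch
import torch.nn as nn

def make_model():
    # Hypothetical stand-in for the full network.
    return nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1),
        nn.ReLU(),
        nn.ConvTranspose2d(8, 1, 4, stride=2),
    )

# Assumed location of the baseline file; use a permanent path in practice.
baseline_path = os.path.join(tempfile.mkdtemp(), "baseline.pth")

# Done once: build the model (pre-trained plus randomly initialized
# layers) and save its weights as the baseline.
baseline = make_model()
torch.save(baseline.state_dict(), baseline_path)

def load_from_baseline(path):
    """Create a fresh model and overwrite its weights with the baseline."""
    model = make_model()
    model.load_state_dict(torch.load(path))
    return model

restored = load_from_baseline(baseline_path)
```

Every model created through `load_from_baseline` starts from the same weights, as long as the baseline file is kept.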

That was my idea, yeah. So is there no way to initialize my weights that I could reuse every time, for any new network?

You could try to set the seed with torch.manual_seed(SEED) before initializing the model.
Could you try that? I think you would have to reset it every time before creating the model.
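
As a minimal sketch of the seeding approach, assuming a small hypothetical `build_model` factory instead of the full network: the seed is reset inside the factory, so every call draws the same random initial weights.

```python
import torch
import torch.nn as nn

SEED = 0  # any fixed integer

def build_model(seed=SEED):
    # Reset the RNG right before the layers are created, so the default
    # initialization draws the same random numbers every time.
    torch.manual_seed(seed)
    return nn.Sequential(
        nn.ConvTranspose2d(8, 4, 4, stride=2),
        nn.ReLU(),
        nn.ConvTranspose2d(4, 1, 3),
    )

model_a = build_model()
model_b = build_model()

weights_match = all(
    torch.equal(p, q)
    for p, q in zip(model_a.parameters(), model_b.parameters())
)
print(weights_match)
```

Seeding only once at the top of the script is not enough for this check, because the second model would then consume the random numbers that follow those used by the first one.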

Alright, it does indeed work perfectly.
I never thought about it, as I always used the weights of an ImageNet-pretrained network, but do we always initialize the weights randomly when training a network from scratch?

Yes, weight initialization is one crucial step in training a network from scratch.
PyTorch has a lot of different init functions.
E.g. one popular method for conv layers is xavier_uniform. Depending on your architecture, different weight inits might speed up training or even make it possible at all.

Have a look at the weight init section in CS231n.
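
Applying such an init function to every conv layer could look like the sketch below. It assumes a small stand-in network and a hypothetical helper `init_weights`; setting the biases to zero is an illustrative choice, not a recommendation from the posts above.

```python
import torch
import torch.nn as nn

def init_weights(module):
    # Xavier/Glorot uniform init for conv and transposed conv layers.
    if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)  # assumption: start biases at zero

torch.manual_seed(0)  # makes the random init reproducible
net = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.ConvTranspose2d(8, 1, 4, stride=2),
)

# Module.apply calls init_weights on every submodule and on net itself.
net.apply(init_weights)
```

A Gaussian init can be plugged in the same way by replacing `nn.init.xavier_uniform_` with `nn.init.normal_` and a chosen standard deviation.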

Thank you for those explanations, I will try to find the best way for my problem. The paper I am basing my research on uses a Gaussian distribution but I am not sure it will fit my problem perfectly.
Anyway, thanks a lot for your time and answers :slight_smile:

I recently ran into the same problem. I set the random seed to ensure the same weight initialization and the same input data.
For example, I run my model twice. The first time, I save the input data, the model weights and the output data to numpy arrays. Then I run my model again and save the same data. When I compare the two runs, the weights and the input data are the same because I set the random seed, but strangely the outputs are different. This happens when I run my model on the GPU.
Then I ran the model on the CPU and found that the outputs are the same.
So I wonder why there is a computing error when we use the GPU. Is this a bug in PyTorch or something else?

Hey man,

Some layers contain random operations, so maybe you should look into that. I also had a similar problem with a difference between GPU and CPU, and it turned out to be a layer doing non-deterministic operations.

My model contains BN, convolutional, transposed convolutional and dropout layers. I think the BN, convolutional and transposed convolutional layers use deterministic operations. Also, because I set the random seed, the dropout layer behaves the same across the different experiments.
But I have now solved the problem by adding

torch.backends.cudnn.deterministic = True

Even when I use the GPU, the outputs are the same!!!
But I still do not know why this setting works~
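
A likely explanation: for some operations cuDNN can pick algorithms that accumulate partial results in an order that varies between runs, and since floating-point addition is not associative, the last bits of the result can differ. Setting `torch.backends.cudnn.deterministic = True` asks cuDNN to use only deterministic algorithms, possibly at some cost in speed. Below is a minimal sketch that combines the flag with the seeding discussed earlier; the helper name `make_deterministic` is hypothetical, and disabling `benchmark` is an extra precaution that was not part of the post above.

```python
import torch

def make_deterministic(seed=0):
    """Seed the RNGs and ask cuDNN for deterministic algorithms."""
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
    # Only use deterministic cuDNN algorithms.
    torch.backends.cudnn.deterministic = True
    # Assumption: also turn off the autotuner, which may select different
    # algorithms from run to run.
    torch.backends.cudnn.benchmark = False

make_deterministic(0)
first = torch.rand(3)
make_deterministic(0)
second = torch.rand(3)
print(torch.equal(first, second))
```

Call the helper once at the start of the script, before the model and the data are created.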
