Different outputs for the same network

Hi again,
This may seem like a stupid question, but I get different outputs from the same network every time, even just for the forward pass.
The shape of the network looks like this:

 
self.features = torch.nn.Sequential(
    # conv1
    torch.nn.Conv2d(3, 64, 3, padding=35),
    torch.nn.ReLU(),
    torch.nn.Conv2d(64, 64, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.MaxPool2d(2, stride=2),
    # conv2
    torch.nn.Conv2d(64, 128, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Conv2d(128, 128, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.MaxPool2d(2, stride=2),
    # conv3
    torch.nn.Conv2d(128, 256, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Conv2d(256, 256, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Conv2d(256, 256, 3, padding=1),
    torch.nn.ReLU()
)
self.deconv1 = torch.nn.Sequential(
    torch.nn.ConvTranspose2d(256, 128, 4, stride=2),
    torch.nn.ReLU(),
    torch.nn.ConvTranspose2d(128, 64, 4, stride=2),
    torch.nn.ReLU(),
    torch.nn.ConvTranspose2d(64, 1, 3, padding=0, stride=1)
)

The image goes through the features and deconv blocks and is then cropped (sliced) before going through a sigmoid.
The output is never the same for the same image.
Am I missing something stupid?

Yours,

Justin

Do you run your model on your GPU?
Could you post the difference/error?

I tried on GPU and CPU, same thing.
Here are two different results obtained from the same network and the same input:
[screenshots of the two different outputs]

Could you post your network definition so that I can run it on my machine, please?
EDIT: I will just use your snippet from the first post and assume it’s the complete network. :wink:

import torch
import torch.nn as nn
from torchvision import models

# printgradnorm is a small debugging hook that prints gradient norms and
# display_image is a plotting helper; both are defined elsewhere in my code
# (display_image is posted below).

class VGGNet(nn.Module):
    def __init__(self):
        """Select conv1_1 ~ conv5_1 activation maps."""
        super(VGGNet, self).__init__()
        self.select = [15, 22, 29]
        self.features = torch.nn.Sequential(
            # conv1
            torch.nn.Conv2d(3, 64, 3, padding=35),
            torch.nn.ReLU(),
            torch.nn.Conv2d(64, 64, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2, stride=2),
            # conv2
            torch.nn.Conv2d(64, 128, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(128, 128, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2, stride=2),
            # conv3
            torch.nn.Conv2d(128, 256, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(256, 256, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(256, 256, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2, stride=2),
            # conv4
            torch.nn.Conv2d(256, 512, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(512, 512, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(512, 512, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2, stride=2),
            # conv5
            torch.nn.Conv2d(512, 512, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(512, 512, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(512, 512, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2, stride=2)
        )
        self.deconv1 = torch.nn.Sequential(
            torch.nn.ConvTranspose2d(256, 128, 4, stride=2),
            torch.nn.ReLU(),
            torch.nn.ConvTranspose2d(128, 64, 4, stride=2),
            torch.nn.ReLU(),
            torch.nn.ConvTranspose2d(64, 1, 3, padding=0, stride=1),
            torch.nn.ReLU(),
        )
        self.deconv2 = torch.nn.Sequential(
            torch.nn.ConvTranspose2d(512, 256, 4, stride=2),
            torch.nn.ReLU(),
            torch.nn.ConvTranspose2d(256, 128, 4, stride=2),
            torch.nn.ReLU(),
            torch.nn.ConvTranspose2d(128, 64, 4, stride=2),
            torch.nn.ReLU(),
            torch.nn.ConvTranspose2d(64, 1, 3, padding=0, stride=1),
            torch.nn.ReLU(),
        )
        self.deconv3 = torch.nn.Sequential(
            torch.nn.ConvTranspose2d(512, 512, 4, stride=2),
            torch.nn.ReLU(),
            torch.nn.ConvTranspose2d(512, 256, 4, stride=2),
            torch.nn.ReLU(),
            torch.nn.ConvTranspose2d(256, 128, 4, stride=2),
            torch.nn.ReLU(),
            torch.nn.ConvTranspose2d(128, 64, 4, stride=2),
            torch.nn.ReLU(),
            torch.nn.ConvTranspose2d(64, 1, 3, padding=0, stride=1),
            torch.nn.ReLU(),
        )
        self.final_attention_pred = torch.nn.Sequential(
            torch.nn.ConvTranspose2d(9, 1, 3, stride=1, padding=1)
        )
        self._initialize_weights()

    def _initialize_weights(self):
        # initializing weights using the ImageNet-trained model from PyTorch
        for i, layer in enumerate(models.vgg16(pretrained=True).features):
            if isinstance(layer, torch.nn.Conv2d):
                self.features[i].weight.data = layer.weight.data
                self.features[i].bias.data = layer.bias.data

    def forward(self, x):
        # return a list of feature maps at different sizes
        features = []
        for i, layer in enumerate(self.features):
            layer.register_backward_hook(printgradnorm)
            if i in self.select:
                x = layer(x)
                features.append(x)
            else:
                x = layer(x)
        for i in self.deconv1:
            i.register_backward_hook(printgradnorm)

        for i in self.deconv2:
            i.register_backward_hook(printgradnorm)

        for i in self.deconv3:
            i.register_backward_hook(printgradnorm)

        self.final_attention_pred[0].register_backward_hook(printgradnorm)

        saliency = []
        m = nn.Sigmoid()
        m1 = nn.Sigmoid()
        m2 = nn.Sigmoid()
        m3 = nn.Sigmoid()

        m.register_backward_hook(printgradnorm)
        attentionmap1 = self.deconv1(features[0])[:, :, 38:262, 38:262]
        attentionmap1 = attentionmap1.expand(1, 3, 224, 224)
        attentionmap2 = self.deconv2(features[1])[:, :, 38:262, 38:262]
        attentionmap2 = attentionmap2.expand(1, 3, 224, 224)
        attentionmap3 = self.deconv3(features[2])[:, :, 38:262, 38:262]
        attentionmap3 = attentionmap3.expand(1, 3, 224, 224)

        saliency.append(m(attentionmap1))
        display_image(saliency[0].data.cpu())

        saliency.append(m1(attentionmap2))
        #display_image(saliency[1].data.cpu())

        saliency.append(m2(attentionmap3))
        #display_image(saliency[2].data.cpu())

        output_data = torch.cat(saliency, 1)
        output = m3(self.final_attention_pred(output_data))
        return output

Have fun ^^ You can remove the display calls, or if you want them, here is display_image:

import numpy as np
import matplotlib
import matplotlib.pyplot as plt

def display_image(input):
    # expects a tensor of shape (1, C, H, W) on the CPU
    x = input.permute(0, 2, 3, 1)
    x = x.numpy()
    x = np.squeeze(x, axis=0)
    if x.shape[2] == 1:
        x = np.squeeze(x, axis=2)
    plt.figure()
    plt.imshow(x, cmap=matplotlib.cm.Greys_r)

This looks fine on my machine:

CPU:

model = VGGNet()
x = Variable(torch.randn(1, 3, 224, 224))
output = model(x)
o1 = output.clone()
output = model(x)
o2 = output.clone()
(o1 - o2).abs().sum()
>> Variable containing:
 0
[torch.FloatTensor of size 1]

GPU:

...
(o1 - o2).abs().sum()
>> Variable containing:
1.00000e-06 *
  7.5102
[torch.cuda.FloatTensor of size 1 (GPU 0)]

Hmm, strange. I tested my input at various stages of the forward pass and every time I get something different :thinking:
EDIT: the only things which could lead to a problem on my input side are:
np.expand_dims
torch.from_numpy
j.permute

Otherwise I don’t see what could introduce randomness into this model :thinking:

@ptrblck Alright, so if I take one network with the same input and run it twice, I get the same output at two different stages of the forward pass. BUT if I recreate the same network, use the same input, and compare its outputs to the previous network’s, I don’t get the same thing :thinking: (cf. screenshots above).

Edit: if I create two models and check whether their weights are the same using:

for p1, p2 in zip(model1.parameters(), model2.parameters()):
    if p1.data.ne(p2.data).sum() > 0:
        print("nop")
print("true")

I find that their weights are not the same, which might be the reason for the difference. What is the correct way to copy weights?

You are initializing all Conv2d layers but skipping the ConvTranspose2d layers, which are initialized with the default (random) initialization.
That is why your models have different weights in the self.deconv modules.
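A quick way to see this is to compare the two models’ parameters by name; a minimal sketch (assuming both are instances of your VGGNet):

model1 = VGGNet()
model2 = VGGNet()

# only the randomly initialized parameters should show up here,
# i.e. the deconv* and final_attention_pred layers
for (name1, p1), (name2, p2) in zip(model1.named_parameters(), model2.named_parameters()):
    if p1.data.ne(p2.data).sum() > 0:
        print('differs:', name1)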

So should I set their weights to 0 to produce the same results every time?

Setting all weights to zero will create an output containing all zeros.
It will be deterministic, but I doubt that’s what you want.
Since you don’t have pre-trained weights for the ConvTranspose layers, you could initialize them once in your first model and copy these weights to all other models.
Do you really need different models with exactly the same weights?
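If you do, copying the weights through the state_dict is probably the easiest way; a minimal sketch (assuming both models are VGGNet instances):

model1 = VGGNet()   # initialize this one and treat it as the reference
model2 = VGGNet()   # its deconv blocks start with different random weights

# copy every parameter (and buffer) from model1 into model2
model2.load_state_dict(model1.state_dict())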

Well, what I really need is that every time I use the model with the same input, my feature maps are the same once the input has been through the conv and deconv blocks.

I think this will work using just one model or am I missing something?

I mean, yeah, it will definitely work if I always use the same model (train it, save it, reuse it, etc.).
But if I lose the model, or decide to retrain everything from scratch, I will get completely different results, even though I use the same model shape and the same data.

Ok, what do you think about creating one model, copying the pre-trained weights into it, initializing all other weights, and finally saving this model as your baseline?
Then you would have to rewrite the _initialize_weights function to load all weights from this baseline model.
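Something along these lines, as a sketch (the file name is just a placeholder):

# create the baseline once: pre-trained conv weights + one random init for the rest
baseline = VGGNet()
torch.save(baseline.state_dict(), 'vggnet_baseline.pth')

# later (or after retraining from scratch), start every new model from this baseline
model = VGGNet()
model.load_state_dict(torch.load('vggnet_baseline.pth'))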

That was my idea, yeah. So there is no way to initialize my weights in a way that I can reuse every time for any new network?

You could try to set the seed with torch.manual_seed(SEED) before initializing the model.
Could you try that? I think you would have to reset it every time before creating the model.
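E.g. a minimal sketch:

SEED = 0  # any fixed value will do

torch.manual_seed(SEED)  # reset the RNG right before building the model
model1 = VGGNet()

torch.manual_seed(SEED)  # same seed again -> identical random init
model2 = VGGNet()

# if random ops also run on the GPU, you might additionally need
# torch.cuda.manual_seed_all(SEED)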

Alright, it does indeed work perfectly.
I never thought about it, as I have always used an ImageNet-pretrained network’s weights, but do we always set weights randomly when training a network from scratch?

Yes, weight initialization is one crucial step in training a network from scratch.
PyTorch has a lot of different init functions.
E.g. one popular method for conv layers is xavier_uniform. Depending on your architecture, different weight inits might speed up training or even make it possible.

Have a look at the weight init section in CS231n.
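For example, a small sketch that applies xavier_uniform only to the layers not covered by the pre-trained VGG weights (the init_deconv helper is just for illustration; in recent PyTorch versions the init functions carry a trailing underscore, e.g. nn.init.xavier_uniform_):

def init_deconv(m):
    # Xavier/Glorot uniform init for the transposed convolutions
    if isinstance(m, nn.ConvTranspose2d):
        nn.init.xavier_uniform_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

model = VGGNet()
# apply only to the blocks that do not get the pre-trained VGG weights
model.deconv1.apply(init_deconv)
model.deconv2.apply(init_deconv)
model.deconv3.apply(init_deconv)
model.final_attention_pred.apply(init_deconv)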

Thank you for those explanations, I will try to find the best approach for my problem. The paper I am basing my research on uses a Gaussian distribution, but I am not sure it will fit my problem perfectly.
Anyway, thanks a lot for your time and answers :slight_smile: