PyTorch Forward Propagation

Hi,
This may be a naive question, but I am a beginner in PyTorch and I am unable to figure out how PyTorch does the forward propagation.
The code I am using is shown below:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.cl1 = nn.Linear(5, 4)
        self.cl2 = nn.Linear(4, 2)

    def forward(self, x):
        print(self.cl1.weight, x)
        print(torch.mm(x, self.cl1.weight.T))  # manual matrix multiplication
        x = self.cl1(x)
        print(x)
        x = F.relu(self.cl2(x))
        return x




model = MyModel()
x = torch.ones(1, 5)
output = model(x)

And this is the output I get for the above code.
As you can see, after the forward propagation step the value of x calculated with manual matrix multiplication differs from the value calculated by PyTorch.
Why is this happening? Am I missing something here?

Parameter containing:
tensor([[-0.0975, -0.3880, -0.2666, -0.1913,  0.3015],
        [ 0.0493, -0.3044, -0.3731,  0.2693, -0.3543],
        [ 0.0821, -0.4167, -0.2888,  0.3144,  0.3574],
        [ 0.3467,  0.4166, -0.0122, -0.0539, -0.3886]], requires_grad=True) tensor([[1., 1., 1., 1., 1.]])
tensor([[-0.6419, -0.7133,  0.0484,  0.3086]], grad_fn=<MmBackward>)
tensor([[-0.7729, -1.0955,  0.3968,  0.2085]], grad_fn=<AddmmBackward>)

Hi,

You are missing the bias in your manual computation :wink: You can pass bias=False when you create your Linear layer if you don’t want the biases.
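For reference, a minimal sketch of the manual computation with the bias included (the names `lin` and `manual` here are illustrative, not from the post above):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lin = nn.Linear(5, 4)
x = torch.ones(1, 5)

# nn.Linear computes x @ W.T + b, so the manual version needs the bias too
manual = torch.mm(x, lin.weight.T) + lin.bias
print(torch.allclose(manual, lin(x)))  # True: matches nn.Linear's output
```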

Hi,
Thanks for clearing that up.
I have another question, if you could please help me:
I am trying to apply a k-means quantization technique to the weights of each layer during training.
What I mean is: during the forward propagation, at each layer I want to first use the k-means algorithm to compute new weights, and then use these computed weights and discard the old ones.
The same procedure should apply during the backpropagation step as well.
I am trying to use hooks to implement this functionality, and just to get an idea of how a hook works I have written the code below:

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.cl1 = nn.Linear(5, 4)
        self.cl2 = nn.Linear(4, 2)

    def forward(self, x):
        print(self.cl1.weight, self.cl1.bias)
        x = self.cl1(x)
        print(x)
        x = F.relu(self.cl2(x))
        return x


activation = {}

def get_activation(name):
    def hook(model, input, output):
        activation[name] = output.detach()
        print(model.weight)
        model.weight.data = torch.ones(4, 5)  # overwrite the weights
        print(model.weight)
        # returning a value from a forward hook replaces the layer's output
        output = torch.mm(input[0], model.weight.T)
        return output
    return hook

model = MyModel()
model.cl1.register_forward_hook(get_activation('fc2'))
x = torch.ones(1, 5)
output = model(x)

And the output I am getting is as follows:

Parameter containing:
tensor([[-0.3525,  0.4128,  0.3498,  0.0078, -0.3536],
        [-0.2505, -0.0612, -0.1943, -0.2778, -0.3931],
        [-0.2942,  0.0659,  0.0260,  0.0492, -0.2993],
        [ 0.1888,  0.2567,  0.0124,  0.3343, -0.3156]], requires_grad=True) Parameter containing:
tensor([-0.3972, -0.0496, -0.0852, -0.1048], requires_grad=True)
Parameter containing:
tensor([[-0.3525,  0.4128,  0.3498,  0.0078, -0.3536],
        [-0.2505, -0.0612, -0.1943, -0.2778, -0.3931],
        [-0.2942,  0.0659,  0.0260,  0.0492, -0.2993],
        [ 0.1888,  0.2567,  0.0124,  0.3343, -0.3156]], requires_grad=True)
Parameter containing:
tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]], requires_grad=True)
tensor([[5., 5., 5., 5.]], grad_fn=<MmBackward>)

As you can see from the above output, I am getting the expected values, but the grad_fn of the variable x changes.
What is the best way to overwrite the weights of each layer without affecting the computation graph?
Thanks!

Given that you want these operations to be part of the computational graph, you should not use .data (in general, you should never use it :wink:).

Also, I am a bit confused: why do you modify cl1’s weights after you have already done the forward on it?

Okay, but when I try to directly modify the weights without using .data, it gives me an error.

Oh yes, I see what you mean. Could you please tell me how to overwrite the weights before the forward propagation step?

To do it before the forward I would do the following:

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.cl1 = nn.Linear(5, 4)
        self.cl2 = nn.Linear(4, 2)

        # Move the original weight so that we can recompute it during the forward
        # but still have the original one picked up by .parameters() and the optimizer
        self.cl1.weight_orig = self.cl1.weight
        del self.cl1.weight
        
    def forward(self, x):
        # Recompute the weight attribute based on the original value
        self.cl1.weight = get_new_weight(self.cl1.weight_orig)

        # Do the regular forward
        x = self.cl1(x)
        x = F.relu(self.cl2(x))
        return x
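To make the pattern above concrete, here is a runnable sketch. get_new_weight is a hypothetical stand-in for the k-means step (here just a differentiable scaling); the point is that gradients flow back to weight_orig, which is still what the optimizer sees:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def get_new_weight(w_orig):
    # Hypothetical placeholder for the k-means quantization step.
    # Any differentiable op here keeps w_orig in the computation graph.
    return w_orig * 2.0

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.cl1 = nn.Linear(5, 4)
        self.cl2 = nn.Linear(4, 2)
        # Move the original weight so it can be recomputed each forward
        self.cl1.weight_orig = self.cl1.weight
        del self.cl1.weight

    def forward(self, x):
        # Recompute the weight attribute from the original parameter
        self.cl1.weight = get_new_weight(self.cl1.weight_orig)
        x = self.cl1(x)
        return F.relu(self.cl2(x))

model = MyModel()
out = model(torch.ones(1, 5))
out.sum().backward()
# The gradient reached the original (optimized) weights
print(model.cl1.weight_orig.grad.shape)
```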

Hi,
Thanks for your response!
Your code works and does not affect the computation graph, but I don’t think it is taking the bias into consideration after modifying the weights.
I also tried this procedure during the backpropagation step, and I don’t think I am getting the right results:

def modify_weight(weight):
    # placeholder for the k-means step: overwrite every entry with 1
    with torch.no_grad():
        if weight is not None:
            weight.fill_(1.0)
    return weight

def modify_weight_grad(weight):
    # overwrite every entry of the gradient with 1
    if weight is not None:
        weight.fill_(1.0)
    return weight
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.cl1 = nn.Linear(5, 4)
        self.cl2 = nn.Linear(4, 1)
        # keep the original weights under a different name
        self.cl1.weight_orig = self.cl1.weight
        del self.cl1.weight

    def forward(self, x):
        self.cl1.weight = modify_weight(self.cl1.weight_orig)
        print(self.cl1.bias)
        x = self.cl1(x)
        print(x)
        x = F.relu(self.cl2(x))
        return x



model = MyModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
model.cl1.register_forward_hook(get_activation('fc2'))  # hook defined earlier
x = torch.ones(1, 5)
output = model(x)
output.backward()
model.cl1.weight_orig.grad = model.cl1.weight.grad
print(model.cl1.weight.grad)
model.cl1.weight.grad = modify_weight_grad(model.cl1.weight_orig.grad)
print(model.cl1.weight.grad)
print(model.cl1.weight)
optimizer.step()
print(model.cl1.weight)

And the output I get is:

bias Parameter containing:
tensor([-0.2568,  0.2500,  0.1713, -0.3188], requires_grad=True)
x after forward propagation with new weight tensor([[5., 5., 5., 5.]], grad_fn=<MmBackward>)
Gradient of the cl1 layer tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])
Gradient of cl1 layer after modification tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]])
weight of cl1 layer Parameter containing:
tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]], requires_grad=True)
weight of cl1 layer after taking an optimisation step Parameter containing:
tensor([[0.9900, 0.9900, 0.9900, 0.9900, 0.9900],
        [0.9900, 0.9900, 0.9900, 0.9900, 0.9900],
        [0.9900, 0.9900, 0.9900, 0.9900, 0.9900],
        [0.9900, 0.9900, 0.9900, 0.9900, 0.9900]], requires_grad=True)

As you can see from the above output, the bias is not being taken into account, and the value of the cl1 weights after taking an optimisation step should be zero, if I am not wrong.

Hi,

  • You should apply the same logic to .bias if you want it to work as well.
  • You should NOT modify the original parameters in place. That is why you have weight_orig: those are the original weights that get optimized, while weight is the processed version of weight_orig after your k-means-like op and is what is used in the forward.
  • Keep in mind that you are doing differentiable operations, and that if you override a value, the gradient for the original value before the override is 0.
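A small sketch of that last point (illustrative names; a constant override stands in for any non-differentiable replacement of the values):

```python
import torch

w_orig = torch.ones(3, requires_grad=True)

# Differentiable transformation: the gradient reaches w_orig
w = w_orig * 2.0
w.sum().backward()
print(w_orig.grad)  # tensor([2., 2., 2.])

# Non-differentiable override (e.g. new values written under no_grad):
# the result is just a constant, so nothing can flow back to w_orig
w = w_orig.detach().clone()
w.fill_(1.0)
print(w.sum().requires_grad)  # False: backward() here would raise an error
```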

Hi,
Thanks for your response.

  • I don’t think I was clear about the bias question. What I was saying is that the bias is not considered at all once the weight has been modified (the forward propagation step is not adding the bias once I use the modified weights; it simply gives me input * weight).

  • Correct me if I am wrong, but am I not modifying the gradient value after it has been calculated, so would it matter? I want to work with the modified gradient values.

the forward propagation step is not adding the bias once i use the modified weights

Did you set it to zero somewhere else? Or you passed bias=False?

Correct me if i am wrong but am i not modifying the gradient value after it has been calculated so would it matter?

You get the gradient for what your function computes. That’s all I’m saying. If that’s what you want, then everything is fine :slight_smile:

Hi,

No, I am not setting the bias to zero or passing bias=False anywhere.
If I try to print the bias, it prints correctly, but it is not being used during the forward propagation.
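One thing that may be worth checking (a sketch, not a diagnosis of the exact script): when a forward hook returns a value, that value replaces the layer's output. A hook that recomputes the output as torch.mm(input, weight.T), like the get_activation hook earlier in the thread, therefore drops the bias even though the bias parameter itself is untouched:

```python
import torch
import torch.nn as nn

lin = nn.Linear(5, 4)

def hook(module, inp, out):
    # The returned tensor REPLACES the layer's real output,
    # and this recomputation leaves out the bias term.
    return torch.mm(inp[0], module.weight.T)

lin.register_forward_hook(hook)
x = torch.ones(1, 5)
out = lin(x)
print(torch.allclose(out, torch.mm(x, lin.weight.T)))  # True: the bias is gone
```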