list(child.parameters())


(lambert) #1

I'm new to PyTorch and I'm trying to understand why param[0] does not return the first tensor in the list, but instead returns the first row of each individual tensor. My goal is to freeze some weights of the first layer of the neural network.

for child in Net.children():
    for param in list(child.parameters()):
        print(param[0])

Output:
tensor([ 0.0013, -0.3676,  0.3981,  0.2008,  0.0662])
tensor(0.3942)
tensor([-0.4629, -0.2118,  0.4997, -0.2215])
tensor(0.2328)
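To see the difference between indexing the list of parameters and indexing a single tensor, here is a minimal sketch. It assumes a model with the same layer sizes as the network posted in #3; wrapping the layers in nn.Sequential is an assumption made only to keep the example short.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumed model: same layer sizes as the network posted later in the thread.
model = nn.Sequential(nn.Linear(5, 4), nn.ReLU(), nn.Linear(4, 2))
first_layer = model[0]

params = list(first_layer.parameters())
# params[0] indexes the *list*: the whole weight tensor of shape (4, 5)
print(params[0].shape)  # torch.Size([4, 5])

for param in first_layer.parameters():
    # param[0] indexes the *tensor*: first row of the weight, first value of the bias
    print(param[0].shape)
# torch.Size([5])
# torch.Size([])
```

So param[0] inside the loop is tensor indexing, not list indexing.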

#2

The code returns each Parameter of your model.
What do you mean by “first row of each individual tensor”?
Here is a small example:

import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 6, 3, 1, 1),
    nn.ReLU(),
    nn.Conv2d(6, 12, 3, 1, 1)
)

# child.named_parameters() yields local names such as "weight" and "bias"
for child in model.children():
    for name, param in child.named_parameters():
        print(name, param[0])

> weight tensor([[[...]]])
> bias tensor(...)
...
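If you call named_parameters() on the top-level module instead of on each child, the names are prefixed with the child's position, which makes a single layer easy to pick out. A minimal sketch using the same layers as the example above:

```python
import torch.nn as nn

# Same layers as the example above; only names and shapes are inspected here.
model = nn.Sequential(
    nn.Conv2d(3, 6, 3, 1, 1),
    nn.ReLU(),
    nn.Conv2d(6, 12, 3, 1, 1),
)

# On the top-level module, names are prefixed with the child's index.
# ReLU has no parameters, so index 1 never appears.
for name, param in model.named_parameters():
    print(name, tuple(param.shape))
# 0.weight (6, 3, 3, 3)
# 0.bias (6,)
# 2.weight (12, 6, 3, 3)
# 2.bias (12,)
```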

(lambert) #3

This is my network:

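import torch.nn as nn
import torch.nn.functional as F

class network(nn.Module):
    def __init__(self):
        super(network, self).__init__()
        self.fc1 = nn.Linear(5, 4)
        self.fc2 = nn.Linear(4, 2)

    def forward(self, data):
        data = F.relu(self.fc1(data))  # apply fc1 to the input before the ReLU
        data = self.fc2(data)
        return F.softmax(data, dim=1)  # softmax over the class dimension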

LOOP1 prints all parameters of each layer as expected, whereas LOOP2 seems to print only the first row of each child's parameters. I would like to access only the weights of the first layer, for pruning. What is the easiest way to access the following tensor:

tensor([[-0.1513, -0.2254, -0.2822, -0.2793,  0.1676],
        [ 0.3483, -0.3100,  0.3420,  0.4152,  0.1245],
        [-0.0814,  0.0852, -0.0957, -0.2115,  0.1776],
        [-0.3269, -0.0743,  0.0511,  0.4126, -0.3794]])
# loop 1
for child in Net.children():
    for param in child.parameters():
        print(param)

print("")
print("")

# loop 2
for child in Net.children():
    for name, param in child.named_parameters():
        print(name, param[0])

OUTPUT
Parameter containing:
tensor([[-0.1513, -0.2254, -0.2822, -0.2793,  0.1676],
        [ 0.3483, -0.3100,  0.3420,  0.4152,  0.1245],
        [-0.0814,  0.0852, -0.0957, -0.2115,  0.1776],
        [-0.3269, -0.0743,  0.0511,  0.4126, -0.3794]])
Parameter containing:
tensor([ 0.0853, -0.2218, -0.4387,  0.2383])
Parameter containing:
tensor([[-0.1804,  0.2937, -0.4402, -0.2483],
        [-0.3421,  0.0601,  0.1138,  0.4677]])
Parameter containing:
tensor([ 0.4512, -0.2957])


weight tensor([-0.1513, -0.2254, -0.2822, -0.2793,  0.1676])
bias tensor(1.00000e-02 *
       8.5337)
weight tensor([-0.1804,  0.2937, -0.4402, -0.2483])
bias tensor(0.4512)

#4

Remove the indexing of param in the second loop.
Currently you print the first element of param with param[0]: for a 2D weight that is its first row, and for a 1D bias it is its first value.

The easiest way to get the weights from the first layer would be:

print(Net.fc1.weight)
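Attribute access and a lookup by qualified name return the very same Parameter object, so either can be used for pruning. A minimal sketch, assuming the corrected version of the network from #3 and an instance named Net as in the thread:

```python
import torch.nn as nn
import torch.nn.functional as F

class network(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(5, 4)
        self.fc2 = nn.Linear(4, 2)

    def forward(self, data):
        data = F.relu(self.fc1(data))
        return F.softmax(self.fc2(data), dim=1)

Net = network()

w_attr = Net.fc1.weight                               # attribute access
w_named = dict(Net.named_parameters())["fc1.weight"]  # lookup by qualified name
print(w_attr is w_named)  # True: both refer to the same Parameter
print(w_attr.shape)       # torch.Size([4, 5])
```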

PS: I’ve edited your post, since the code was not formatted.


(lambert) #5

Thanks!
I was able to set some weights to 0 using Net.fc1.weight[i][j] = 0 and retrain the network.


(lambert) #6

Hello, does setting a weight to zero as follows, Net.fc1.weight[i][j] = 0, mean that the gradient won't be computed for that specific weight during backprop?
Thanks


#7

Are you initializing this specific weight with zero or are you constantly setting it to zero?
Manipulating it during training might lead to wrong gradients or errors.
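Setting a weight to zero does not stop autograd from computing its gradient. A minimal sketch, assuming a single nn.Linear(5, 4) layer, plain SGD and random input (all illustrative): the zeroed entry still receives a gradient, and the next optimizer step moves it away from zero.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(5, 4)
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)

# Zero one weight in place; no_grad() is needed because weight is a leaf that requires grad.
with torch.no_grad():
    layer.weight[0, 0] = 0.0

x = torch.randn(8, 5)
optimizer.zero_grad()
layer(x).sum().backward()

# d(sum of outputs) / d weight[0, 0] equals the sum of x[:, 0], which is nonzero here.
print(layer.weight.grad[0, 0])

optimizer.step()
# The weight is now 0 - lr * grad, i.e. no longer zero.
print(layer.weight[0, 0])
```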


(lambert) #8

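I'm manipulating it during training as follows:

import random
import torch

# rows, net and save_weight_indexes come from the rest of my script
with torch.no_grad():
    paramet = list(net.parameters())
    for j in range(200):
        if j not in rows:
            rows.append(j)
            # pick a random connection (i, k) of fc2 and remove it
            i = random.randint(0, 199)
            k = random.randint(0, 199)
            save_weight_indexes((i, k))
            net.fc2.weight[i][k] = 0.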

I used with torch.no_grad() to wrap the loop that sets the weights to zero.

Here is my training code:

freeze = True
for epoch in range(2):
    for batch_idx, (data, target) in enumerate(train_loader):
        # resize data from (batch_size, 1, 28, 28) to (batch_size, 28*28)
        data = data.view(-1, 28*28)
        optimizer.zero_grad()
        net_out = net(data)
        loss = criterion(net_out, target)

        loss.backward()
        remove_random_connections(freeze)
        freeze = False
        optimizer.step()
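One common way to keep removed connections at zero for the whole run is to mask their gradients and re-apply the mask after every optimizer step. Below is a minimal sketch, assuming a hypothetical 784-200-10 MNIST-style model, random stand-in batches and a random 50% mask on the second linear layer; none of these come from the thread. Newer PyTorch versions also provide torch.nn.utils.prune.custom_from_mask for the same purpose.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the thread's model (layer sizes are assumptions).
net = nn.Sequential(nn.Linear(28 * 28, 200), nn.ReLU(), nn.Linear(200, 10))
layer = net[2]
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1, momentum=0.9)

# Build the mask once: 1 keeps a connection, 0 removes it.
mask = torch.ones_like(layer.weight)
mask[torch.rand_like(mask) < 0.5] = 0.0  # drop roughly 50% at random (illustrative)

with torch.no_grad():
    layer.weight.mul_(mask)  # zero the removed connections once

# Zero the gradient of removed connections on every backward pass.
layer.weight.register_hook(lambda grad: grad * mask)

for step in range(5):
    data = torch.randn(32, 28 * 28)  # random stand-in for an MNIST batch
    target = torch.randint(0, 10, (32,))
    optimizer.zero_grad()
    loss = criterion(net(data), target)
    loss.backward()
    optimizer.step()
    # Re-apply the mask in case momentum or weight decay nudged pruned entries.
    with torch.no_grad():
        layer.weight.mul_(mask)

print(bool((layer.weight[mask == 0] == 0).all()))  # True
```

The difference from zeroing the weights once is that the mask is enforced on every iteration, so the optimizer can never bring a removed connection back.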
            

#9

I think @tom already answered this question here, or do you still have open points?
If so, I would suggest continuing in the other thread. :wink: