Parameters of subspace projection are not updated during learning

Hey all (first post, so please bear with me).
I am trying to do prototype-based metric learning using prototorch, a PyTorch extension, where the prototypes live in a subspace.
An nn.Parameter projects both the prototypes and the samples into this subspace; however, it is not updated during training, even though gradients are computed for it.

The network is:

import torch
import torch.nn as nn
# prototorch imports (exact paths may differ between prototorch versions)
from prototorch.functions.distances import euclidean_distance
from prototorch.modules.prototypes import Prototypes1D


class Model(torch.nn.Module):
    def __init__(self, num_classes, init_data, tangent_projection_type="local",
                 prototypes_per_class=2, bottleneck_dim=128):
        super().__init__()
        self.tpt = tangent_projection_type

        # Feature extractor
        self.fe = nn.Sequential(
            nn.Conv2d(1, 32, 3, 1),
            nn.ReLU(),
            nn.Conv2d(32, 64, 3, 1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Dropout(0.25),
            nn.Flatten(),
            nn.Linear(9216, 128),
            nn.ReLU(),
            nn.Dropout(0.5))

        # Initial subspace: right singular vectors of the initial batch (d x d)
        self.subspaces = torch.nn.Parameter(
            self.init_gobal_subspace(init_data).clone().detach())

        # Prototype layer
        self.glvq = Prototypes1D(input_dim=128,
                                 prototypes_per_class=prototypes_per_class,
                                 nclasses=num_classes,
                                 prototype_initializer='zeros')

and the forward pass is:

    def forward(self, x):
        # Feature extraction
        x = self.fe(x)

        # Tangent projection and distance
        x = x @ self.subspaces
        projected_prototypes = self.glvq.prototypes @ self.subspaces
        dis = euclidean_distance(x, projected_prototypes)
        return dis, self.glvq.prototype_labels  # prototype label attribute name assumed

With this prototype-based approach, the distance between samples and the prototypes of their correct class is minimized.

Currently I train it with the following loop:

for epoch in range(n_epochs):
    for batch_idx, (x_train, y_train) in enumerate(train_loader):

        # Compute loss
        distances, plabels = model(x_train)
        loss = criterion([distances, plabels], y_train)

        # Keep a copy of the subspace to check whether it changes
        control = model.subspaces.clone()

        # Take a gradient descent step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Re-orthogonalize the subspace
        model.subspaces = nn.Parameter(orthogonalization(model.subspaces))

Note that the orthogonalization is necessary, but removing it does not change the behaviour.
If I remove the subspace projection from the forward pass, the network learns fine. With it in place, however, the difference between the subspace before and after the optimizer step is always zero:

Epoch: 01/50 Epoch Progress: 1.07 % Loss: 20.11 Subspace Difference: 0.00 
Epoch: 01/50 Epoch Progress: 1.60 % Loss: 21.93 Subspace Difference: 0.00 
Epoch: 01/50 Epoch Progress: 2.13 % Loss: 21.23 Subspace Difference: 0.00 
Epoch: 01/50 Epoch Progress: 2.67 % Loss: 21.22 Subspace Difference: 0.00 
Epoch: 01/50 Epoch Progress: 3.20 % Loss: 20.94 Subspace Difference: 0.00 
Epoch: 01/50 Epoch Progress: 3.73 % Loss: 23.43 Subspace Difference: 0.00 
Epoch: 01/50 Epoch Progress: 4.27 % Loss: 21.33 Subspace Difference: 0.00 
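
(For reference, the logged "Subspace Difference" compares model.subspaces against the control copy taken before the optimizer step, roughly along these lines:)

# rough sketch of the logged metric: subspace before vs. after the optimizer step
subspace_diff = (model.subspaces - control).abs().sum().item()
print(f"Loss: {loss.item():.2f} Subspace Difference: {subspace_diff:.2f}")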

Am I doing something wrong in general? Or is this the wrong way to use nn.Parameter?

I would be very glad if you could help me further.
Thank you very much

Re-creating the Parameter like this (model.subspaces = nn.Parameter(orthogonalization(model.subspaces))) is a bit dangerous, as Parameters always need to be leaf Tensors and the result here might not be one. It also replaces the registered Parameter with a brand-new object that the optimizer never sees, so optimizer.step() keeps updating the old Tensor and model.subspaces itself never changes.
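
A small self-contained sketch of the optimizer issue (using nn.Linear as a stand-in for your model):

import torch
import torch.nn as nn

lin = nn.Linear(2, 2)
opt = torch.optim.SGD(lin.parameters(), lr=0.1)

# Re-wrapping replaces the registered Parameter with a brand-new object ...
lin.weight = nn.Parameter(lin.weight.detach().clone())

# ... which the optimizer does not reference, so step() will never update it.
print(any(lin.weight is p for group in opt.param_groups for p in group["params"]))  # False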

If you just want to update the content of a Parameter without autograd tracking it, you can do:

with torch.no_grad():
    model.subspaces.copy_(orthogonalization(model.subspaces))
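
Applied to your training loop, the re-wrapping line at the end of the inner loop would become something like:

        # Re-orthogonalize in place, keeping the same Parameter object
        # that the optimizer already references
        with torch.no_grad():
            model.subspaces.copy_(orthogonalization(model.subspaces))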

This indeed did it. Thank you very much for your help!

Do you know of any resources where I can learn more about the leaf behaviour?
The nn.Parameter documentation does not mention it.

Hi,

I'm afraid there isn't much. But in short: a leaf is a Tensor with no history, and the .backward() function accumulates gradients in all the leaves that require gradients.
So if a Tensor has history, it is not a leaf and .backward() won't populate its .grad field.
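
A minimal example of the distinction:

import torch

a = torch.randn(3, requires_grad=True)  # created by the user -> leaf
b = a * 2                               # result of an operation -> has history, not a leaf

b.sum().backward()
print(a.is_leaf, b.is_leaf)  # True False
print(a.grad)                # populated by backward()
print(b.grad)                # None: backward() does not populate non-leaf .grad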