Hi all,

I am looking into training a model using the approach described in Supervised Contrastive Learning, which combines a metric-learning loss with a classification loss.

In the paper, they first train the encoder using solely the metric-learning loss. Then they freeze the encoder and train a classification layer using cross-entropy loss. This requires the model to be trained in two stages; however, the authors mention: `Note that in practice the linear classifier can be trained jointly with the encoder and projection networks by blocking gradient propagation from the linear classifier back to the encoder, and achieve roughly the same results without requiring two-stage training.`

Therefore, I would like to ask whether my pseudocode below would work as a way to jointly train the encoder and classification layer as the authors propose.

Some questions:

- Is this implementation with two backward calls correct? Or should I sum the losses and perform a single backward step?
- Should I use two different optimizers, one for each part of the network (encoder and classification layer)?

```
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()  # this was missing
        self.encoder = nn.Linear(100, 10)
        self.lin = nn.Linear(10, 2)

    def training_step(self, batch):
        # MyMetricLoss, CrossEntropy, and optimizer are placeholders
        x, y = batch['x'], batch['y']
        e = self.encoder(x)

        # metric-learning loss: updates the encoder
        metric_loss = MyMetricLoss(e, y)
        metric_loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        # use detach to block gradients from the classification
        # loss from reaching the encoder
        p = self.lin(e.detach())
        classification_loss = CrossEntropy(p, y)
        classification_loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```
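For reference, here is a runnable sketch of the single-backward variant from my first question: summing the two losses and stepping once. Since `detach()` blocks the classification gradients from reaching the encoder, the encoder should receive gradients only from the metric loss either way. The metric loss here is just a stand-in (`e.pow(2).mean()`), not the actual SupCon loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(100, 10)
        self.lin = nn.Linear(10, 2)

model = MyModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 100)
y = torch.randint(0, 2, (8,))

e = model.encoder(x)
# placeholder for the supervised-contrastive loss
metric_loss = e.pow(2).mean()
# detach still blocks classification gradients from the encoder
p = model.lin(e.detach())
classification_loss = F.cross_entropy(p, y)

# sum the losses and call backward once
(metric_loss + classification_loss).backward()
optimizer.step()
optimizer.zero_grad()
```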

Many thanks!