Model not training, gradients are None

otatopeht · September 6, 2021, 8:22am

Hey everyone!
I’m currently fine-tuning a pretrained sentence transformer on in-domain data.

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
)

Below you can find a snippet of the code:


from sentence_transformers import SentenceTransformer
import torch
activation = {}

def hook(name, output):
    activation[name] = output[0].detach()

model = SentenceTransformer('T-Systems-onsite/cross-en-de-roberta-sentence-transformer')
model.train()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
cos_sim = torch.nn.CosineSimilarity(dim=0)
optimizer.zero_grad()

query_prediction = model.encode("a man is cutting up a tomato", convert_to_tensor=True)
positive_prediction = model.encode("a man is slicing a tomato", convert_to_tensor=True)
negative_prediction = model.encode("she's brushing her hair", convert_to_tensor=True)

dist_pos = 1 - cos_sim(query_prediction, positive_prediction)
dist_neg = 1 - cos_sim(query_prediction, negative_prediction)

loss = torch.max(torch.tensor(0.), 0.7 + dist_pos - dist_neg)

if loss != torch.tensor(0.):
    loss.backward()
for p in model.parameters():
    print(p.grad)

for name, layer in model.named_modules():
    layer.register_forward_hook(hook(name,query_prediction))
print(activation)

The loss is being calculated, but the gradients are None. Therefore, the model is not training.
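For comparison, here is a minimal sketch of the same margin loss computed on embeddings that were produced with autograd enabled. It assumes a hypothetical stand-in encoder (a single nn.Linear over random "sentence features") instead of the real SentenceTransformer, so it only shows that the loss itself lets gradients reach the parameters. It also uses torch.clamp in place of torch.max(torch.tensor(0.), ...), which computes the same hinge.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the sentence encoder: one linear layer mapping
# fixed 8-dim "sentence features" to 4-dim embeddings.
encoder = nn.Linear(8, 4)
query_x, pos_x, neg_x = torch.randn(3, 8)

cos_sim = nn.CosineSimilarity(dim=0)
margin = 0.7

# Embeddings computed with autograd enabled (no torch.no_grad() around them)
query = encoder(query_x)
positive = encoder(pos_x)
negative = encoder(neg_x)

dist_pos = 1 - cos_sim(query, positive)
dist_neg = 1 - cos_sim(query, negative)

# Same hinge as torch.max(torch.tensor(0.), margin + dist_pos - dist_neg);
# clamp avoids creating the constant tensor
loss = torch.clamp(margin + dist_pos - dist_neg, min=0.0)

# Calling backward() unconditionally is fine: when the clamp is inactive,
# the parameters receive zero gradients rather than None
loss.backward()

for name, p in encoder.named_parameters():
    print(name, p.grad.norm().item())
```

With this setup every parameter ends up with a populated .grad, which is what you would expect from the real model once its embeddings are attached to the graph.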

Going through the sentence-transformers code, it looks like the encode method runs the forward pass under torch.no_grad(). Might that be the problem?
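As a general illustration of what torch.no_grad() does to a forward pass, here is a small sketch using a hypothetical nn.Linear stand-in (not the actual sentence-transformers code). Outputs computed under no_grad carry no grad_fn, and backward() on them raises an error. Note that the grad_fn values reported later in this thread suggest the encode() outputs in that setup were attached to some graph, so treat this as background, not a diagnosis. If encode() does disable autograd in your installed version, one approach people use for training is to skip encode() and call the model directly on the output of model.tokenize(...), reading the 'sentence_embedding' entry of the returned dict (with the features moved to the model's device); please verify this against the sentence-transformers source you have.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(4, 2)  # hypothetical stand-in for the transformer
x = torch.randn(3, 4)

# Forward pass with autograd disabled, the way an inference helper might run it
with torch.no_grad():
    detached_out = layer(x)

# The same forward pass with autograd enabled
tracked_out = layer(x)

print(detached_out.requires_grad, detached_out.grad_fn)  # False None
print(tracked_out.requires_grad, tracked_out.grad_fn is not None)  # True True

# Backpropagating from the no_grad output fails: nothing was recorded
try:
    detached_out.sum().backward()
except RuntimeError as err:
    print("backward failed:", err)

tracked_out.sum().backward()
print(layer.weight.grad is not None)  # True
```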

Any tips or ideas on why the gradients are None would be much appreciated.

ptrblck · September 6, 2021, 9:16am

Did you verify that loss.backward() was called?
If so, do you see valid .grad_fn attributes for loss, dist_pos, dist_neg, query_prediction, and the other *_prediction tensors?

otatopeht · September 6, 2021, 9:44am

Thank you for replying.
The loss.backward() is in fact called when the batch loss is not 0.
The .grad_fn attributes are:

>  loss <MaximumBackward object at 0x7f20ea3fb3d0>
>  query_prediction <SelectBackward object at 0x7f20ea3fb3d0>
>  positive_prediction <SelectBackward object at 0x7f20ea3fb3d0>
>  negative_prediction <SelectBackward object at 0x7f20ea3fb3d0>
>  dist_neg <RsubBackward1 object at 0x7f20ea3fb3d0>
>  dist_neg <RsubBackward1 object at 0x7f20ea3fb3d0>

ptrblck · September 6, 2021, 9:45am

This means that at least the model output is attached to the computation graph, so you could check the grad_fn attributes of earlier activations to see whether any of them is None.
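One way to inspect the earlier part of the graph without touching the model code is to walk backwards from the output's grad_fn through its next_functions. The sketch below uses a hypothetical two-layer stand-in model; a None entry in next_functions marks an input that autograd is not tracking, and AccumulateGrad nodes wrap the leaf tensors (the parameters), which is also why the parameters themselves have no grad_fn.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical stand-in model
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))
out = model(torch.randn(2, 4)).sum()

def walk(fn, depth=0, seen=None):
    """Recursively print the autograd nodes reachable from fn."""
    if seen is None:
        seen = set()
    if fn is None or fn in seen:  # None: input not tracked by autograd
        return seen
    seen.add(fn)
    # AccumulateGrad nodes expose the leaf tensor (e.g. a parameter) as .variable
    extra = f" -> leaf {tuple(fn.variable.shape)}" if hasattr(fn, "variable") else ""
    print("  " * depth + type(fn).__name__ + extra)
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1, seen)
    return seen

nodes = walk(out.grad_fn)
```

If the walk stops well before reaching the parameters you expect, the activation where it stops is the place to look.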

otatopeht · September 8, 2021, 6:39am

Thanks for the explanation!

How can I do that?

When checking the parameters like so:

for p in model.parameters():
    print(p.grad_fn)

all I get is None.

ptrblck · September 8, 2021, 6:43am

The parameters don’t have any grad_fn, as they are leaf nodes, so you would need to check the forward activations either directly in the forward method, e.g. via:

def forward(self, x):
    x = self.layer(x)
    print(x.grad_fn)
    ...

or via forward hooks.
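A minimal sketch of the forward-hook route, assuming a hypothetical stand-in model: each hook is a closure created per module name, receives (module, input, output) during the forward pass, and records the grad_fn of that module's output. Handles are removed afterwards so the hooks do not stay attached.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical stand-in model; the same pattern applies to any nn.Module
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

grad_fns = {}

def make_hook(name):
    # Return a closure so each registered hook knows its module name
    def hook(module, inputs, output):
        # Some modules return tuples; take the first element in that case
        out = output[0] if isinstance(output, (tuple, list)) else output
        grad_fns[name] = out.grad_fn if torch.is_tensor(out) else None
    return hook

handles = [
    module.register_forward_hook(make_hook(name))
    for name, module in model.named_modules()
    if name  # skip the root container, whose name is ''
]

model(torch.randn(5, 4))  # hooks only fire during a forward pass

for h in handles:
    h.remove()

for name, fn in grad_fns.items():
    print(name, None if fn is None else type(fn).__name__)
```

Any module that reports None here is where the graph gets cut.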

otatopeht · September 8, 2021, 9:29am

Something like this?


def hook(name, output):
    activation[name] = output[0].detach()

query_prediction = model.encode("a man is cutting up a tomato", convert_to_tensor=True)



for name, layer in model.named_modules():
    layer.register_forward_hook(hook(name,query_prediction))
print(activation)

The output tensors all have the same value:

{'': tensor(-0.0094),
 '0': tensor(-0.0094),
 '0.auto_model': tensor(-0.0094),
 '0.auto_model.embeddings': tensor(-0.0094),
 '0.auto_model.embeddings.word_embeddings': tensor(-0.0094),
 ...
 '1': tensor(-0.0094)}
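A likely reason every entry is identical: in the snippet, hook(name, query_prediction) is called immediately inside the loop, before any forward pass. It stores query_prediction[0] (the first element of the already-computed embedding) under every name and returns None, so register_forward_hook receives None rather than a function. The sketch below reproduces that with a hypothetical stand-in model and embedding, then shows a working variant that passes a callable with the (module, input, output) signature, using functools.partial to bind the name.

```python
import functools
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 3), nn.Linear(3, 2))  # hypothetical stand-in
embedding = torch.randn(3)  # hypothetical, already-computed sentence embedding

# Pattern from the post: hook(name, embedding) runs right away, once per module,
# and stores embedding[0] under every name (its return value is None)
activation = {}

def hook(name, output):
    activation[name] = output[0].detach()

for name, layer in model.named_modules():
    hook(name, embedding)

print(activation)  # the same scalar, embedding[0], under every key

# A forward hook must be a callable taking (module, input, output);
# functools.partial binds the name without calling the function
activation_fixed = {}

def named_hook(name, module, inputs, output):
    activation_fixed[name] = output.detach()

handles = [
    layer.register_forward_hook(functools.partial(named_hook, name))
    for name, layer in model.named_children()
]
model(torch.randn(2, 4))  # the hooks fire during this forward pass
for h in handles:
    h.remove()

print({name: tuple(t.shape) for name, t in activation_fixed.items()})
# {'0': (2, 3), '1': (2, 2)}
```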

sprakashdash · February 23, 2024, 4:53pm

I have the same issue, and I get <UnsafeViewBackward0 object at 0x2b2ac389e220> when I access grad_fn. Is this expected?

ptrblck · February 23, 2024, 10:15pm

Yes, UnsafeViewBackward is expected and _unsafe_view is described here.
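UnsafeViewBackward0 is just another autograd node, so seeing it means the tensor is attached to the graph. As a sketch (assuming typical PyTorch behavior, which may differ between versions), a batched 3-D @ 2-D matmul where the 2-D operand requires grad is usually computed as a single mm followed by an internal _unsafe_view back to the batched shape, which is one common place this node shows up. The exact node name is printed rather than asserted because it depends on the PyTorch version.

```python
import torch

torch.manual_seed(0)
x = torch.randn(2, 5, 4)                   # (batch, seq_len, hidden)
w = torch.randn(4, 3, requires_grad=True)  # leaf tensor, like a parameter

out = torch.matmul(x, w)                   # 3-D @ 2-D batched matmul
print(type(out.grad_fn).__name__)          # node name depends on the PyTorch version

# Gradients still flow back to the leaf through this node
out.sum().backward()
print(w.grad.shape)                        # torch.Size([4, 3])
```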