Why are the grads None?

Hello everyone,
I’m trying to fine-tune a pretrained sentence transformer on in-domain data with PyTorch.
The loss is being calculated, but the gradients are None.

The model looks like this:

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
)

And when I try to use forward hooks as follows:

from sentence_transformers import SentenceTransformer

activation = {}

def get_activation(name):
    def hook(model, input, output):
        # store this module's output under the given name
        activation[name] = output[0].detach()
    return hook

model = SentenceTransformer(xxx)
for name, layer in model.named_modules():
    layer.register_forward_hook(get_activation(name))

output = model.encode("test sentence")
for key in activation:
    print(activation[key])

I get this error:

in hook
    activation[name] = output.detach()
AttributeError: 'tuple' object has no attribute 'detach'

Also, when going through the sentence-transformers code, the forward pass inside the encode method seems to be computed under torch.no_grad(); might that be the problem?
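
To illustrate, here is a minimal sketch of what I mean (I'm assuming the model can also be called directly on tokenized features via tokenize() and model(...), returning a dict with 'sentence_embedding'):

import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('T-Systems-onsite/cross-en-de-roberta-sentence-transformer', device='cpu')

# encode() runs the forward pass under torch.no_grad(), so the result carries no graph
no_grad_embedding = model.encode("test sentence", convert_to_tensor=True)
print(no_grad_embedding.requires_grad)  # False

# Tokenizing and calling the model directly keeps the graph (assuming the usual
# tokenize()/__call__ interface that returns a dict with 'sentence_embedding')
features = model.tokenize(["test sentence"])
embedding = model(features)['sentence_embedding']
print(embedding.requires_grad)  # True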

With hooks, the returned output can be a tuple, so you need to take the 0-th element of that tuple to get the Tensor out of it, as you did here.

I think you shouldn’t embed hooks within other functions; the hook should be a standalone function like this:

def hook(module, input, output):
    name = module.__class__.__name__
    activation[name] = output[0].detach()

but you’d need to change the name variable if the name is derived from the particular layer. This can be done via module.__class__.__name__ (check whether the two naming schemes give the same results!).
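
For example, a quick sketch to compare the two naming schemes (assuming you iterate over named_modules() as in your snippet):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('T-Systems-onsite/cross-en-de-roberta-sentence-transformer')

# 'name' from named_modules() is the attribute path; __class__.__name__ is the class name.
# They are generally different, and several modules can share the same class name.
for name, layer in model.named_modules():
    print(repr(name), "->", layer.__class__.__name__)

Note that because many layers share a class name, using __class__.__name__ as a dict key can overwrite entries.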

Thank you for replying!

I changed the code as you suggested and now I get this:

TypeError: hook() missing 2 required positional arguments: 'input' and 'output'

Can you share exactly what you changed, so it’s easier for me to debug? A reproducible example would be great!

Thank you for replying!
The error is fixed.
Here is the code:

from sentence_transformers import SentenceTransformer

def hook(name, output):
    #name = module.__class__.__name__
    activation[name] = output[0].detach()
activation = {}
model = SentenceTransformer('T-Systems-onsite/cross-en-de-roberta-sentence-transformer')
output = model.encode("test sentence", convert_to_tensor=True)
for name, layer in model.named_modules():
    layer.register_forward_hook(hook(name,output))
print(activation)

The output of the intermediate layers is the same.


That’s great that it works!

One comment: hooks have predefined arguments, and the first one should be the module. You can read about it in the docs; just a note in case future hooks don’t work!
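
For future reference, here is a rough sketch of registering hooks with the expected (module, input, output) signature while keeping the name from named_modules(); I'm using functools.partial to bind the name, so treat it as a sketch rather than a drop-in fix:

import torch
from functools import partial
from sentence_transformers import SentenceTransformer

activation = {}

def hook(name, module, input, output):
    # PyTorch calls the hook as hook(module, input, output); 'name' is pre-bound below
    if isinstance(output, tuple):
        output = output[0]
    if torch.is_tensor(output):  # some wrapper modules return dicts; skip those here
        activation[name] = output.detach()

model = SentenceTransformer('T-Systems-onsite/cross-en-de-roberta-sentence-transformer')
for name, layer in model.named_modules():
    # register the callable itself instead of calling it during registration
    layer.register_forward_hook(partial(hook, name))

model.encode("test sentence", convert_to_tensor=True)  # submodule hooks fire during this forward pass
print(list(activation.keys()))

With partial, the name from named_modules() is kept, and the hook only runs when the forward pass actually executes.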


Hello,

That’s great, it works for me as well. I really appreciate the help.
