Why are the grads None?

Hello everyone,
I’m trying to fine-tune a pretrained sentence transformer on in-domain data with PyTorch.
The loss is being calculated, but the gradients are None.

The model looks like this:

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
)

And when I try to use forward hooks as follows:

from sentence_transformers import SentenceTransformer

activation = {}

def get_activation(name):
    def hook(model, input, output):
        # store this module's output under the given name
        activation[name] = output[0].detach()
    return hook

model = SentenceTransformer(xxx)
for name, layer in model.named_modules():
    layer.register_forward_hook(get_activation(name))

output = model.encode("test sentence")
for key in activation:
    print(activation[key])

I get this error:

in hook
    activation[name] = output.detach()
AttributeError: 'tuple' object has no attribute 'detach'

Also, when going through the sentence-transformers code, the forward pass inside the encode method seems to be computed under torch.no_grad(); might that be the problem?
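
To illustrate, here is a minimal sketch of what I mean (I'm assuming the model can also be called directly on tokenized features via tokenize() and model(...), returning a dict with 'sentence_embedding'):

import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('T-Systems-onsite/cross-en-de-roberta-sentence-transformer', device='cpu')

# encode() runs the forward pass under torch.no_grad(), so the result carries no graph
no_grad_embedding = model.encode("test sentence", convert_to_tensor=True)
print(no_grad_embedding.requires_grad)  # False

# Tokenizing and calling the model directly keeps the graph (assuming the usual
# tokenize()/__call__ interface that returns a dict with 'sentence_embedding')
features = model.tokenize(["test sentence"])
embedding = model(features)['sentence_embedding']
print(embedding.requires_grad)  # True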

With hooks, the returned output can be a tuple, so you need to take the 0-th element of that tuple to get the Tensor out of it, as you did here.

I think you shouldn’t embed hooks within other functions; the hook should be a standalone function like this:

def hook(module, input, output):
    name = module.__class__.__name__
    activation[name] = output[0].detach()

but you’d need to change the name variable if the name is derived from the particular layer. This can be done via module.__class__.__name__ (check whether the two naming schemes give the same results!).
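
For example, a quick sketch to compare the two naming schemes (assuming you iterate over named_modules() as in your snippet):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('T-Systems-onsite/cross-en-de-roberta-sentence-transformer')

# 'name' from named_modules() is the attribute path; __class__.__name__ is the class name.
# They are generally different, and several modules can share the same class name.
for name, layer in model.named_modules():
    print(repr(name), "->", layer.__class__.__name__)

Note that because many layers share a class name, using __class__.__name__ as a dict key can overwrite entries.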

Thank you for replying!

I changed the code as you suggested and now I get this:

TypeError: hook() missing 2 required positional arguments: 'input' and 'output'

Can you share exactly what you changed, so it’s easier for me to debug? A reproducible example would be great!

Thank you for replying!
The error is fixed.
Here is the code:

from sentence_transformers import SentenceTransformer

def hook(name, output):
    #name = module.__class__.__name__
    activation[name] = output[0].detach()
activation = {}
model = SentenceTransformer('T-Systems-onsite/cross-en-de-roberta-sentence-transformer')
output = model.encode("test sentence", convert_to_tensor=True)
for name, layer in model.named_modules():
    layer.register_forward_hook(hook(name,output))
print(activation)

The output of the intermediate layers is the same.


That’s great that it works!

One comment: hooks have predefined arguments, and the first one should be the module. You can read about it in the docs; just a note in case future hooks don’t work!
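
For future reference, here is a rough sketch of registering hooks with the expected (module, input, output) signature while keeping the name from named_modules(); I'm using functools.partial to bind the name, so treat it as a sketch rather than a drop-in fix:

import torch
from functools import partial
from sentence_transformers import SentenceTransformer

activation = {}

def hook(name, module, input, output):
    # PyTorch calls the hook as hook(module, input, output); 'name' is pre-bound below
    if isinstance(output, tuple):
        output = output[0]
    if torch.is_tensor(output):  # some wrapper modules return dicts; skip those here
        activation[name] = output.detach()

model = SentenceTransformer('T-Systems-onsite/cross-en-de-roberta-sentence-transformer')
for name, layer in model.named_modules():
    # register the callable itself instead of calling it during registration
    layer.register_forward_hook(partial(hook, name))

model.encode("test sentence", convert_to_tensor=True)  # submodule hooks fire during this forward pass
print(list(activation.keys()))

With partial, the name from named_modules() is kept, and the hook only runs when the forward pass actually executes.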


Hello,

That’s great, it works for me as well. I really appreciate the help.
