Hello everyone,
I’m trying to fine-tune a pretrained sentence transformer on in-domain data with PyTorch.
The loss is computed, but the gradients are None.
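To be concrete, this is roughly how I check the gradients right after calling loss.backward() (a simplified version of my check, not the full training loop):

for name, param in model.named_parameters():
    if param.grad is None:
        print(name, "has no gradient")  # prints for every parameter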
The model looks like this:
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
)
And when I try to use forward hooks as follows:
from sentence_transformers import SentenceTransformer

activation = {}

def get_activation(name):
    def hook(model, input, output):
        activation[name] = output[0].detach()
    return hook

model = SentenceTransformer(xxx)  # model name omitted here
for name, layer in model.named_modules():
    layer.register_forward_hook(get_activation(name))

output = model.encode("test sentence")
for key in activation:
    print(activation[key])
I get this error:
in hook
    activation[name] = output.detach()
AttributeError: 'tuple' object has no attribute 'detach'
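I can silence the AttributeError with a type guard in the hook (a quick workaround I tried; I am not sure it is the right way to handle the tuple outputs):

import torch

def get_activation(name):
    def hook(model, input, output):
        # some submodules return tuples, others plain tensors
        if isinstance(output, tuple):
            activation[name] = output[0].detach()
        elif isinstance(output, torch.Tensor):
            activation[name] = output.detach()
    return hook

But that only hides the error and does not explain the missing gradients.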
Also, when reading through the sentence-transformers source, the forward pass inside the encode method seems to run under torch.no_grad(). Might that be the problem?
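As a sanity check (assuming I am using tokenize and the plain module call correctly), tokenizing the sentence myself and calling the model directly, instead of going through encode, gives an output that still carries gradient information:

features = model.tokenize(["test sentence"])
out = model(features)  # forward pass outside encode's no_grad context
print(out["sentence_embedding"].requires_grad)  # should be True, since nothing detaches it

If that is right, encode running everything under no_grad would explain why my gradients stay None.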