Hello,
I am trying to build a multi-label classifier by fine-tuning BERT. However, the model parameters are not updated after loss.backward() and optimizer.step(), so my model never trains.
I have already checked that all model parameters have requires_grad set to True.
I also walked through the computation graph to make sure that every intermediate variable has requires_grad set to True.
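For reference, here is roughly how I ran those two checks (only a sketch; model and train_loss stand in for my actual objects):

# Check 1: confirm that every parameter has requires_grad=True
for name, param in model.named_parameters():
    if not param.requires_grad:
        print(f"frozen parameter: {name}")

# Check 2: walk the autograd graph backwards from the loss via grad_fn
def walk(fn, depth=0):
    if fn is None:
        return
    print("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1)

walk(train_loss.grad_fn)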
For reference, I am using PyTorch 1.7.0.
Here are my model's layers:
UNERLinearModel(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        ....
        (11): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
    )
    (pooler): BertPooler(
      (dense): Linear(in_features=768, out_features=768, bias=True)
      (activation): Tanh()
    )
  )
  (size_embeddings): Embedding(300, 300)
  (lstm): BiLstmContextuelLayer(
    (layers): ModuleList(
      (0): BidirLSTMLayer(
        (directions): ModuleList(
          (0): LSTMLayer(
            (cell): CustomLSTMCell()
          )
          (1): ReverseLSTMLayer(
            (cell): CustomLSTMCell()
          )
        )
      )
      (1): BidirLSTMLayer(
        (directions): ModuleList(
          (0): LSTMLayer(
            (cell): CustomLSTMCell()
          )
          (1): ReverseLSTMLayer(
            (cell): CustomLSTMCell()
          )
        )
      )
      (2): BidirLSTMLayer(
        (directions): ModuleList(
          (0): LSTMLayer(
            (cell): CustomLSTMCell()
          )
          (1): ReverseLSTMLayer(
            (cell): CustomLSTMCell()
          )
        )
      )
    )
  )
  (entity_classifier): Linear(in_features=1, out_features=3, bias=True)
  (dropout): Dropout(p=0.1, inplace=False)
)
And here is my loss computation, inside my training step:

loss_function = torch.nn.BCEWithLogitsLoss(reduction='none')  # per-element loss, so the mask below applies

# Re-wrap the logits as a new float tensor with requires_grad=True
entity_logits = torch.tensor(entity_logits.float(), requires_grad=True)
entity_types = entity_types.view(-1).float()
entity_masks = entity_masks.view(-1).float()

# Snapshot one LSTM weight before the optimizer step, to compare afterwards
a = self._model.lstm.layers[0].directions[0].cell.W.clone()

train_loss = loss_function(entity_logits, entity_types)
train_loss = (train_loss * entity_masks).sum() / entity_masks.sum()

# Debug only: print every model parameter that should be updated
for name, param in self._model.named_parameters():
    if param.requires_grad:
        print(name, param)

# Backward pass: compute gradients for all model layers
train_loss.backward(retain_graph=True)
torch.nn.utils.clip_grad_norm_(self._model.parameters(), self._max_grad_norm)
self._optimizer.step()
self._scheduler.step()
self._model.zero_grad()

# Snapshot the same weight after the step; this always prints False
b = self._model.lstm.layers[0].directions[0].cell.W.clone()
print(f"parameters update {not torch.equal(a, b)}")
return train_loss.item()
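For comparison, here is a minimal toy version of the same pattern (logits, BCEWithLogitsLoss, backward, step), which I would expect to print True for the parameter-update check, while my real training step above always prints False:

import torch

# Toy sanity check of the same update pattern, independent of my model
torch.manual_seed(0)
toy = torch.nn.Linear(4, 3)
optimizer = torch.optim.SGD(toy.parameters(), lr=0.1)

logits = toy(torch.randn(2, 4))
targets = torch.randint(0, 2, (2, 3)).float()
loss = torch.nn.BCEWithLogitsLoss()(logits, targets)

before = toy.weight.clone()
loss.backward()
optimizer.step()
print("parameters update", not torch.equal(before, toy.weight))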
How can I properly debug this issue?
How can I be sure that the graph is not broken somewhere?
How can I verify that the backward pass executes correctly? For instance, would inspecting each parameter's gradient right after backward(), as in the sketch below, be a reliable check?
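This is the kind of check I have in mind (only a sketch, reusing self._model from my training step):

# Inspect each parameter's gradient right after train_loss.backward()
for name, param in self._model.named_parameters():
    if param.requires_grad:
        if param.grad is None:
            print(f"{name}: grad is None (graph may be broken upstream)")
        else:
            print(f"{name}: grad norm = {param.grad.norm().item():.6f}")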
Please help me with any suggestion; I have been stuck on this for four days.
Thank you very much.