First, here is the error traceback:
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/allennlp/run.py", line 18, in <module>
    main(prog="allennlp")
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/allennlp/commands/__init__.py", line 70, in main
    args.func(args)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/allennlp/commands/train.py", line 101, in train_model_from_args
    args.recover)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/allennlp/commands/train.py", line 131, in train_model_from_file
    return train_model(params, serialization_dir, file_friendly_logging, recover)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/allennlp/commands/train.py", line 321, in train_model
    metrics = trainer.train()
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/allennlp/training/trainer.py", line 749, in train
    train_metrics = self._train_epoch(epoch)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/allennlp/training/trainer.py", line 493, in _train_epoch
    loss.backward()
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: MAGMA getrf : U(4,4) is 0, U is singular at /opt/conda/conda-bld/pytorch_1532584813488/work/aten/src/THC/generic/THCTensorMathMagma.cu:413
I compute the softmax of head_num vectors and get a batch of matrices of shape (batch_size, head_num, encoding_dim), which I call A. I then compute:
S = torch.bmm(A, A.transpose(2,1)) - torch.eye(head_num).cuda()
Finally, I compute the determinant of each S[i] and add it to the final loss:
self_attention_loss = torch.tensor(0, dtype=torch.float).cuda()
for i in range(batch_size):
    self_attention_loss += torch.det(loss_matrix[i])
loss += (self_attention_loss * self._loss_rate / torch.tensor(batch_size, dtype=torch.float).cuda())
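For reference, the computation described above can be sketched as a self-contained CPU example. This is only a sketch under assumptions: the function name `self_attention_det_loss`, the random input, and the shapes (3, 4, 8) are hypothetical, the `self._loss_rate` scaling is left out, and it assumes a recent PyTorch where `torch.det` accepts a batch of square matrices.

```python
import torch


def self_attention_det_loss(A: torch.Tensor) -> torch.Tensor:
    """Mean over the batch of det(A A^T - I).

    A has shape (batch_size, head_num, encoding_dim), e.g. softmax outputs.
    The name and signature are hypothetical, for illustration only.
    """
    head_num = A.shape[1]
    # Build the identity on A's device and dtype instead of calling .cuda().
    eye = torch.eye(head_num, dtype=A.dtype, device=A.device)
    # S has shape (batch_size, head_num, head_num); eye broadcasts over the batch.
    S = torch.bmm(A, A.transpose(2, 1)) - eye
    # Batched determinant, then average over the batch.
    return torch.det(S).mean()


torch.manual_seed(0)
# Hypothetical input: batch_size=3, head_num=4, encoding_dim=8.
A = torch.softmax(torch.randn(3, 4, 8, dtype=torch.float64), dim=-1)
batched = self_attention_det_loss(A)

# The same quantity computed with an explicit per-sample loop, as in the post.
S = torch.bmm(A, A.transpose(2, 1)) - torch.eye(4, dtype=torch.float64)
looped = sum(torch.det(S[i]) for i in range(3)) / 3
```

Both variants should give the same value; the batched form just avoids the Python loop and the hard-coded `.cuda()` calls.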
A small test like the one below seems to work fine:
import torch
a = torch.Tensor([[0,0],[0,0]])
a.requires_grad=True
b = torch.det(a)
print(b)
b.backward()
print(a.grad)
The output is:
tensor(0., grad_fn=<DetBackward>)
tensor([[0., 0.],
        [0., 0.]])
So I don’t know what the cause is. Is it because the determinant is zero?
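One way to test that hypothesis is to look at the smallest singular value of each matrix in the batch just before calling backward. The following is a minimal sketch, not a confirmed diagnosis: the helper name `smallest_singular_values` and the tolerance `1e-6` are hypothetical, and it assumes a recent PyTorch that provides `torch.linalg.svdvals`.

```python
import torch


def smallest_singular_values(S: torch.Tensor) -> torch.Tensor:
    """Smallest singular value of each matrix in a batch of shape (B, n, n).

    Hypothetical helper for illustration only.
    """
    # svdvals returns singular values in descending order, so take the last one.
    return torch.linalg.svdvals(S)[..., -1]


# Toy batch: the identity (non-singular) and the zero matrix (singular).
S = torch.stack([torch.eye(3), torch.zeros(3, 3)])
sigma_min = smallest_singular_values(S)

# Flag matrices whose smallest singular value is below a hypothetical tolerance.
singular = sigma_min < 1e-6
print(singular)
```

If some S[i] in a failing batch is flagged this way, that would at least be consistent with the "U is singular" message in the traceback.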
Thanks a lot, everyone!