I trained a Transformer model for masked word prediction, but the model's output is always the same for every class:
tensor([[[-2.3026, -2.3026, -2.3026, ..., -2.3026, -2.3026, -2.3026],
[-2.3026, -2.3026, -2.3026, ..., -2.3026, -2.3026, -2.3026],
[-2.3026, -2.3026, -2.3026, ..., -2.3026, -2.3026, -2.3026],
...,
[-2.3026, -2.3026, -2.3026, ..., -2.3026, -2.3026, -2.3026],
[-2.3026, -2.3026, -2.3026, ..., -2.3026, -2.3026, -2.3026],
[-2.3026, -2.3026, -2.3026, ..., -2.3026, -2.3026, -2.3026]]],
device='cuda:0', grad_fn=<LogSoftmaxBackward>)
After applying torch.exp():
tensor([[[0.1000, 0.1000, 0.1000, ..., 0.1000, 0.1000, 0.1000],
[0.1000, 0.1000, 0.1000, ..., 0.1000, 0.1000, 0.1000],
[0.1000, 0.1000, 0.1000, ..., 0.1000, 0.1000, 0.1000],
...,
[0.1000, 0.1000, 0.1000, ..., 0.1000, 0.1000, 0.1000],
[0.1000, 0.1000, 0.1000, ..., 0.1000, 0.1000, 0.1000],
[0.1000, 0.1000, 0.1000, ..., 0.1000, 0.1000, 0.1000]]],
device='cuda:0', grad_fn=<ExpBackward>)
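For context, a quick sanity check (not part of my model code, just arithmetic on the printed values): -2.3026 is exactly log(0.1), so every entry of the log-softmax corresponds to a probability of about 0.1, i.e. the model is predicting a uniform distribution over the classes:

```python
import math

# -2.3026 is ln(0.1): the log-probability of a uniform
# distribution over 10 classes
print(math.log(0.1))      # ~ -2.302585
print(math.exp(-2.3026))  # ~ 0.1
```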
What could be wrong here? Thanks in advance.