RuntimeError: matrix and matrix expected in loss.backward()

I am trying to implement Dynamic Coattention Networks.
Calling backward() raises the error below; forward() works fine.
I am using PyTorch 0.3.1, Python 3.6, and CUDA 9.0.
The loss function is NLLLoss. You can find the entire repo here;
dcn.py is the main file.

Traceback (most recent call last):
  File "dcn.py", line 443, in experiment
    if not trainer.train():
  File "/home/vanangamudi/projects/dcn/trainer.py", line 162, in train
    loss.backward()
  File "/home/vanangamudi/env/pytorch35/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/home/vanangamudi/env/pytorch35/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    variables, grad_variables, retain_graph)
RuntimeError: matrix and matrix expected at /pytorch/torch/lib/THC/generic/THCTensorMathBlas.cu:241
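
As far as I can tell, this message appears to come from a check in the CUDA matrix-multiply (addmm) path that both operands are 2-D, so some matrix product in the backward pass is presumably receiving a tensor that is not a matrix. Below is a minimal sketch of a shape check that could be placed around the matrix products in forward() to narrow this down. The helper name `check_mm_shapes`, its label argument, and the example shapes are hypothetical; none of them come from PyTorch or from the repo.

```python
def check_mm_shapes(label, a_shape, b_shape):
    """Hypothetical helper: validate operand shapes for a plain 2-D matrix product.

    `label` is only used to make the error message easier to trace.
    Returns the shape of the product if the operands are compatible.
    """
    a_shape, b_shape = tuple(a_shape), tuple(b_shape)

    # A plain matrix product needs both operands to be 2-D.
    if len(a_shape) != 2 or len(b_shape) != 2:
        raise ValueError(
            "%s: expected two 2-D operands, got %dD and %dD (%s, %s)"
            % (label, len(a_shape), len(b_shape), a_shape, b_shape)
        )

    # The inner dimensions must agree: (n, k) x (k, m) -> (n, m).
    if a_shape[1] != b_shape[0]:
        raise ValueError(
            "%s: inner dimensions differ (%s vs %s)" % (label, a_shape, b_shape)
        )

    return (a_shape[0], b_shape[1])


# Example with made-up shapes: a valid product, then a 3-D operand.
print(check_mm_shapes("ok", (3, 4), (4, 5)))
try:
    check_mm_shapes("batched", (2, 3, 4), (4, 5))
except ValueError as err:
    print(err)
```

In the model, the shapes could be passed as `tuple(x.size())` for each operand just before the multiplication, so the offending call is reported with its label instead of surfacing later inside backward().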

Steps to reproduce:

  1. Clone the repo.
  2. Create a directory named dataset under the cloned repo.
  3. Place the SQuAD train and dev JSON files in the dataset directory.
  4. In dcn.py, set flush = True.
  5. Run: python dcn.py "" train