Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_mm

That actually solved my error. However, I am getting a new error now, which I am not able to resolve. It is as follows:

  File "run_techqa.py", line 623, in <module>
    main()
  File "run_techqa.py", line 617, in main
    model = train(args, train_dataset, model, optimizer, tokenizer, model_evaluator)
  File "run_techqa.py", line 223, in train
    outputs = model(**inputs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/apex/amp/_initialize.py", line 197, in new_fwd
    **applier(kwargs, input_caster))
  File "/content/MyDrive/IBM/TechQA-Base/techqa-master/model_techqa.py", line 113, in forward
    Hq = self.lq(hq)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 1372, in linear
    output = input.matmul(weight.t())
RuntimeError: Expected object of scalar type Float but got scalar type Half for argument #2 'mat2' in call to _th_mm
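
For reference, here is a minimal toy reproduction of the kind of mismatch the trace points at (my own sketch, not code from run_techqa.py; it assumes a GPU and that the model weights have been cast to half precision, e.g. by apex):

    import torch

    # Toy reproduction (assumed setup, not from the project): a float32 input
    # hitting a Linear layer whose weights are float16 fails inside F.linear.
    lin = torch.nn.Linear(1024, 1024).cuda().half()          # weight dtype: torch.float16
    x = torch.rand(2, 3, 1024, dtype=torch.float32).cuda()   # input dtype:  torch.float32
    lin(x)  # RuntimeError: Expected object of scalar type Float but got scalar type Half ...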

I use two helper functions in the forward pass that I had not shared previously; they are:

    # Splits each sequence in the batch into its question and passage parts,
    # using the positions where the token type ids change as segment boundaries.
    def splitting_(self, ids, type_ids):
      ans = []
      for i, lis in enumerate(ids):
        typei = type_ids[i]
        # Record every position where the token type id changes.
        ends = []
        for j in range(len(typei) - 1):
          if typei[j] != typei[j + 1]:
            ends.append(j + 1)
        if len(ends) == 1:
          # Only one boundary: the second segment runs to the end of the sequence.
          qend = ends[0]
          dend = len(typei)
        else:
          # More than one boundary: keep the span between the first and second change.
          qend, dend = ends[0], ends[1]
        qtns = lis[:qend]       # question tokens
        dtns = lis[qend:dend]   # passage/document tokens
        ans.append([qtns, dtns])
      q = [i[0] for i in ans]
      d = [i[1] for i in ans]
      return q, d
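
    # A worked example of splitting_ (hypothetical values, just to illustrate the
    # boundary logic; these are not real tokenizer outputs):
    #   type_ids[i] = [0, 0, 0, 0, 1, 1, 1, 1]  ->  ends = [4], qend = 4, dend = 8
    #   so ids[i][:4] is returned as the question part and ids[i][4:8] as the passage part.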

    # Takes a list of n tensors (one per encoded sentence) and pads them into a
    # single tensor of shape [n, max_length, 1024].
    def _batchify(self, data, align_right=False, include_lengths=False):
      lengths = [x.size(0) for x in data]
      max_length = max(lengths)
      # Pre-allocate the padded output from a random float32 row of size 1024
      # (the hidden size is hard-coded), then overwrite it with the real data.
      tens = torch.rand(1024, dtype=torch.float32)
      out = torch.stack([torch.stack([tens for _ in range(max_length)]) for _ in range(len(data))])
      for i in range(len(data)):
        data_length = data[i].size(0)
        offset = max_length - data_length if align_right else 0
        out[i].narrow(0, offset, data_length).copy_(data[i])
      if include_lengths:
        return out, lengths
      else:
        return out
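
In case it is relevant, here is a standalone sketch that mirrors the two lines of _batchify that build the padded buffer (the 1024 hidden size is hard-coded there; the float16 example inputs below are only an assumption for illustration). The buffer takes its dtype from tens rather than from the tensors passed in:

    import torch

    # Mirrors the buffer construction inside _batchify (standalone sketch, not the
    # actual method). The buffer is float32 because that is how `tens` is created.
    data = [torch.ones(5, 1024, dtype=torch.float16), torch.ones(3, 1024, dtype=torch.float16)]
    max_length = max(x.size(0) for x in data)
    tens = torch.rand(1024, dtype=torch.float32)
    out = torch.stack([torch.stack([tens for _ in range(max_length)]) for _ in range(len(data))])
    out[0].narrow(0, 0, 5).copy_(data[0])   # copy_ silently casts float16 -> float32
    print(out.dtype)                        # torch.float32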

Would really appreciate your help!

Note:
I have already tried hq = hq.float() as well as hq = torch.tensor(hq, dtype=torch.float32).