Gradients aren't CUDA tensors

Hi guys, I'm running into the error "Gradients aren't CUDA tensors".
My model is:
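import torch
import torch.nn as nn
import torch.nn.functional as F

class delawDecoderSelfAttnRnn(nn.Module):
    def __init__(self, max_length=MAX_LENGTH, embedding_length=WORD_EMBEDDING_SIZE,
                 clusters_num=CLUSTERS_NUM):  # placeholder: the end of this signature was lost; the default CLUSTERS_NUM is assumed
        super(delawDecoderSelfAttnRnn, self).__init__()
        self.embedding_length = embedding_length
        self.max_length = max_length
        self.clusters_num = clusters_num
        # Learned weights over the max_length encoder positions
        self.bmm_parameter = nn.Parameter(torch.randn(1, 1, self.max_length), requires_grad=True)
        self.selfAttn = nn.Linear(self.embedding_length * 4, self.clusters_num)

    def forward(self, truth_encoder_outputs, appeal_encoder_output):
        output = F.relu(truth_encoder_outputs)
        # (1, 1, max_length) x (1, max_length, H) -> (1, 1, H)
        output = torch.bmm(self.bmm_parameter, output.unsqueeze(0))
        # Append the flattened appeal encoding along the feature dimension
        output = torch.cat((output, appeal_encoder_output.view(1, 1, -1)), 2)
        output = self.selfAttn(output)
        output = F.log_softmax(output[0], dim=1)
        return output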

After creating a model instance and training it, I get the error "Gradients aren't CUDA tensors". Could anyone tell me whether there is a problem in my model?
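One thing that may be worth checking (an assumption on my part, not a confirmed diagnosis) is whether every parameter and every input tensor lives on the same device, since mixing CPU and CUDA tensors in one graph is a common source of device-related gradient errors. Below is a minimal sketch of such a check. The helper `devices_used` is hypothetical, and a small `nn.Linear` stands in for the real model so the snippet runs on its own.

```python
import torch
import torch.nn as nn

def devices_used(model, *inputs):
    """Collect the device strings of all model parameters and the given tensors."""
    devices = {str(p.device) for p in model.parameters()}
    devices.update(str(t.device) for t in inputs)
    return devices

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(8, 3).to(device)      # stand-in for the real model (assumption)
x = torch.randn(2, 8, device=device)    # stand-in input created on the same device

found = devices_used(model, x)
print(found)  # more than one entry would mean CPU and CUDA tensors are mixed

# Run one backward pass and confirm each gradient sits on its parameter's device.
loss = model(x).sum()
loss.backward()
grads_ok = all(p.grad is not None and p.grad.device == p.device
               for p in model.parameters())
print(grads_ok)
```

If the set holds more than one device, moving the model and all inputs with `.to(device)` before the forward pass would be the first thing to try.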