loss.backward() is very slow!

Hello, when I add some index-lookup operations (just the code below), loss.backward() becomes very slow: it takes 90 seconds, while the whole training epoch took only 3 seconds before. I'm confused about what happens inside loss.backward(). How can I fix or accelerate my code?
Thanks a lot in advance!

        # Allocate on the same device as out_entity_1; creating these on the
        # CPU and moving them to the GPU afterwards forces extra transfers.
        device = out_entity_1.device
        global_entity = torch.zeros((batch_size, hidden_size), device=device)
        global_relation = torch.zeros((batch_size, hidden_size_relation), device=device)
        relation_indices = []
        global_entity_ = torch.zeros((self.num_relation, hidden_size), device=device)
        global_relation_ = torch.zeros((self.num_relation, hidden_size_relation), device=device)

        # Every in-place row assignment inside these Python loops adds its own
        # node to the autograd graph, so backward() replays thousands of tiny ops.
        for relation in range(self.num_relation):
            current_global_indices = torch.LongTensor(list(Corpus_.re2entity[relation])).cuda()
            global_entity_[relation] = out_entity_1[current_global_indices].mean(dim=0)

            current_global_relation_indices = torch.LongTensor(list(Corpus_.re2path[relation])).cuda()
            global_relation_[relation] = out_relation_1[current_global_relation_indices].mean(dim=0)

        for i, triple in enumerate(batch_inputs):
            head, relation, tail = triple
            global_entity[i] = global_entity_[relation]
            global_relation[i] = global_relation_[relation]
            relation_indices.append(relation)
        relation_indices = torch.LongTensor(relation_indices).cuda()
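
One likely cause of the slow backward is the per-element Python loops: each in-place row write is recorded separately by autograd. The second loop in particular can be collapsed into a single indexing (gather) operation, and the first loop can avoid in-place writes by building the per-relation means with torch.stack. A minimal CPU-only sketch, with toy stand-ins for out_entity_1 and Corpus_.re2entity (the shapes and the re2entity mapping here are assumptions for illustration):

```python
import torch

# Hypothetical small setup standing in for the tensors in the post.
num_relation, hidden_size = 4, 8
out_entity_1 = torch.randn(20, hidden_size, requires_grad=True)
re2entity = {r: [r, r + 1, r + 2] for r in range(num_relation)}  # toy mapping

# Per-relation mean, as in the first loop. The index lists have different
# lengths, so a loop remains, but torch.stack builds the result out-of-place
# in one differentiable op instead of many in-place row writes.
global_entity_ = torch.stack([
    out_entity_1[torch.tensor(re2entity[r])].mean(dim=0)
    for r in range(num_relation)
])

# The second loop collapses to a single gather: build the relation index
# for the batch once, then index -- no per-triple Python loop at all.
batch_relations = torch.tensor([0, 2, 1, 3, 2])   # relation of each triple
global_entity = global_entity_[batch_relations]   # shape (batch, hidden_size)
```

Keeping the indices (batch_relations, and the per-relation index tensors) on the GPU from the start also avoids repeated host-to-device copies inside the loop.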