cuBLAS runtime error when running on GPU, but works on CPU

e_c contains invalid values for the torch.gather operation (index 4 is out of bounds for weight_label, whose shape is [4], so the valid indices are 0 to 3). You should get an error pointing to this invalid index at this line:

a = torch.gather(weight_label, 0, e_c.view(-1).cuda())

Unfortunately, some CUDA assert statements were disabled in PyTorch 1.5, so the out-of-bounds access fails silently and the error is only raised at the next CUDA operation, which happens to be a cuBLAS call.
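Until the asserts are back, one way to catch this early is to validate the indices yourself before calling torch.gather. The snippet below is only a sketch: check_gather_index is a hypothetical helper (not part of PyTorch), and the tensor values are made up to mirror the shapes described above. It runs on CPU so the check itself cannot hit the silent CUDA failure.

```python
import torch


def check_gather_index(src, index, dim=0):
    """Hypothetical helper: raise IndexError if any entry of `index`
    falls outside [0, src.size(dim))."""
    size = src.size(dim)
    bad = (index < 0) | (index >= size)
    if bad.any():
        first_bad = index[bad][0].item()
        raise IndexError(
            f"index {first_bad} is out of bounds for dimension {dim} with size {size}"
        )


# Made-up stand-ins for the tensors in the question (assumed values).
weight_label = torch.tensor([1.0, 2.0, 3.0, 4.0])  # shape [4], valid indices 0..3
e_c = torch.tensor([[0, 1], [3, 4]])                # contains the invalid index 4

try:
    check_gather_index(weight_label, e_c.view(-1))
    message = None
except IndexError as err:
    message = str(err)
    print(message)

# With in-range indices the check passes and gather behaves as expected.
good_index = torch.tensor([0, 1, 3])
check_gather_index(weight_label, good_index)
a = torch.gather(weight_label, 0, good_index)
```

Calling the check on e_c.view(-1) right before the gather (and before moving the index to the GPU) should point you at the offending value directly instead of at the later cuBLAS call.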