Hi, I'm having some trouble figuring out why my code (in particular the loss function) runs faster on the CPU (~0.03 s) than on the GPU (~0.2 s).

Is there a way I can fix this?

Here is how I define my loss:

```
from time import perf_counter

import torch

def loss(X, u_r):
    t1 = perf_counter()  ################################# For testing
    X.requires_grad_(True)
    # Per-sample Jacobian of u_r with respect to the inputs
    # (`device` is assumed to be defined globally).
    u_r_X = torch.zeros(X.shape[0], X.shape[1], X.shape[1], device=device)
    for i in range(X.shape[0]):
        u_r_X[i, :, :] = torch.autograd.functional.jacobian(
            u_r, X[i][None, :], create_graph=True)[0, :, 0, :]
    # Strain-energy density for each sample.
    energy_tensor = torch.zeros(X.shape[0], 1, device=device)
    for j in range(X.shape[0]):
        F = torch.eye(u_r_X[j].shape[0], device=device) + u_r_X[j]
        energy_tensor[j] = (0.5 * (torch.sum(F ** 2) - 2.0)
                            - torch.log(torch.det(F))
                            + 50.0 * torch.log(torch.det(F)) ** 2)
    t2 = perf_counter()  ################################# For testing
    print(t2 - t1, ' secs')  ################################# For testing
    return torch.mean(energy_tensor)
```
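Two things I suspect, though I'm not sure: CUDA kernels launch asynchronously, so timing with `perf_counter` alone (without `torch.cuda.synchronize()`) may not measure what I think it does, and the per-sample Python loops launch many tiny kernels, which is exactly where a GPU tends to lose to a CPU. Here is a batched sketch I'm considering, using `torch.func.vmap` and `torch.func.jacrev` (this assumes torch >= 2.0, that `u_r` maps a single `(d,)` point to a `(d,)` displacement, and `d = 2` to match the `- 2.0` constant; the `u_r` below is a made-up stand-in, not my real network):

```
import torch
from torch.func import jacrev, vmap

device = "cuda" if torch.cuda.is_available() else "cpu"

def u_r(x):
    # Hypothetical stand-in for the real displacement field:
    # maps one point of shape (d,) to a displacement of shape (d,).
    return 0.05 * torch.sin(x)

def loss_batched(X, u_r):
    # One vmapped jacrev call replaces the per-sample Python loop,
    # so the GPU sees a few large kernels instead of many tiny ones.
    u_r_X = vmap(jacrev(u_r))(X)                # (N, d, d)
    d = X.shape[1]
    F = torch.eye(d, device=X.device) + u_r_X   # deformation gradient per sample
    log_det = torch.log(torch.linalg.det(F))    # (N,)
    energy = (0.5 * ((F ** 2).sum(dim=(1, 2)) - 2.0)
              - log_det + 50.0 * log_det ** 2)
    return energy.mean()

X = torch.rand(128, 2, device=device)
if device == "cuda":
    torch.cuda.synchronize()  # finish pending GPU work before any timing
print(loss_batched(X, u_r).item())
```

The second loop (building `F` and the energy) vectorizes the same way: `torch.eye` broadcasts against the `(N, d, d)` Jacobian stack and `torch.linalg.det` accepts a batch, so no Python loop is needed there either.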