Help! Why is grad always NaN?

It is a Multiple Linear Regression model,
but the gradient does not seem to work:

theta.requires_grad = True
for epoch in range(epochs):
    y_pred = X_b.mm(theta)
    loss = torch.sqrt((y_pred - y).sum())
    loss.backward()
    print(theta.grad)
#     theta -= lr * theta.grad

tensor([[nan],
        [nan],
        [nan],
        [nan]], device='cuda:0', dtype=torch.float64)

(the same all-NaN gradient is printed on every iteration)


Hi,

What are the values of y and y_pred?
The gradient of sqrt at 0 is infinite, which will produce NaNs.
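
To make that concrete, here is a minimal standalone sketch (not from the original thread) showing what sqrt does at and below zero:

import torch

# Minimal repro of the point above: d/dx sqrt(x) = 1 / (2 * sqrt(x)),
# which diverges as x -> 0.
x = torch.tensor(0.0, requires_grad=True)
torch.sqrt(x).backward()
print(x.grad)  # tensor(inf)

# A negative input is NaN already in the forward pass, and the NaN
# propagates into the gradient. Note that in the posted loop,
# (y_pred - y).sum() can easily be negative.
x = torch.tensor(-1.0, requires_grad=True)
y = torch.sqrt(x)
print(y)       # tensor(nan, ...)
y.backward()
print(x.grad)  # tensor(nan)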
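
If the intent was an RMSE-style loss, one possible fix (a sketch only, assuming X_b, y, theta, lr, and epochs are defined as in the question) is to square the residuals so the argument of sqrt is never negative, add a small epsilon so the gradient stays finite even at a perfect fit, and zero theta.grad each step, since .backward() accumulates gradients and the posted loop never resets them:

for epoch in range(epochs):
    y_pred = X_b.mm(theta)
    # Squared residuals keep the sqrt argument non-negative;
    # the epsilon keeps its gradient finite when the loss hits 0.
    loss = torch.sqrt(((y_pred - y) ** 2).mean() + 1e-12)
    loss.backward()
    with torch.no_grad():
        theta -= lr * theta.grad
        theta.grad.zero_()  # .backward() accumulates otherwise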