fazaghifari
(Ghifari Adam Faza)
March 15, 2023, 3:52pm
Dear all,
I’m currently using PyTorch 1.13.1 on my MacBook M1 Pro.
Long story short, I wanted to reproduce this notebook on my machine: PINN/solve_PDE_NN.ipynb at main · nanditadoloi/PINN · GitHub. Obviously, I had to change the device to "mps". However, if I set device = torch.device("mps"), it returns RuntimeError: derivative for aten::linear_backward is not implemented, but it runs fine if I set the device to "cpu". Does anyone have a similar problem?
Addition: Just found the culprit. In the training step in the given link, two loss functions are combined: mse_u and mse_f. I tried using mse_u only and it works fine on MPS, but when I use mse_f only, the error message appears. I still have no idea why, though.
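For context, here is roughly what the two terms look like in a PINN. This is my own simplified sketch (tiny stand-in network, dummy targets, placeholder residual), not the exact notebook code:

```python
import torch
import torch.nn as nn

device = torch.device("mps")   # the same code runs fine with "cpu"

# Tiny stand-in network, not the notebook's architecture
net = nn.Sequential(nn.Linear(2, 20), nn.Tanh(), nn.Linear(20, 1)).to(device)

x = torch.rand(50, 1, device=device, requires_grad=True)
t = torch.rand(50, 1, device=device, requires_grad=True)
u = net(torch.cat([x, t], dim=1))

# mse_u: plain data-fitting loss (dummy targets here) -- an ordinary backward pass
mse_u = ((u - torch.zeros_like(u)) ** 2).mean()

# mse_f: PDE residual loss, built from derivatives of u w.r.t. the inputs
u_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
u_t = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
f = u_t + u_x                      # placeholder residual, not the notebook's PDE
mse_f = (f ** 2).mean()

# Backpropagating mse_f means differentiating through the recorded grad graph
# (a double backward), which needs the derivative of aten::linear_backward:
mse_f.backward()   # RuntimeError on "mps", works on "cpu"
```

So mse_u only needs an ordinary backward pass, while mse_f needs autograd to differentiate through its own backward, which seems to be exactly the part that is missing on MPS.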
Thank you in advance
I work with PINNs on MPS too. It seems not all gradient-calculation functions have been implemented for the Apple Silicon MPS backend, so we can't calculate the PDE loss in the domain using MPS. For example,
```python
u_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
u_t = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
```

would throw errors.
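A minimal sketch of the same failure, reduced to a single Linear layer (my own example, not from the notebook):

```python
import torch
import torch.nn as nn

lin = nn.Linear(3, 1).to("mps")
x = torch.randn(4, 3, device="mps", requires_grad=True)

y = lin(x)
# The first-order gradient itself goes through; create_graph=True just
# records the backward ops so they can be differentiated again later.
gx = torch.autograd.grad(y.sum(), x, create_graph=True)[0]

# Differentiating through that recorded graph (a double backward) needs the
# derivative of aten::linear_backward, which is not implemented on MPS:
gx.sum().backward()   # RuntimeError: derivative for aten::linear_backward is not implemented
```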
This is a known problem. It seems you can still make it work by using the nightly build (only for linear layers). Please check the following issue (I've also put a rough nightly install command at the end of this post):
(GitHub issue, opened 6 Apr 2023; labels: module: autograd, triaged, module: mps)
### 🐛 Describe the bug
I have a code example that performs fast (inner) updates, similar to MAML, and then computes a meta-loss to backpropagate through the optimization trajectory. When computing gradients in the fast updates, we can set `create_graph=True` to enable second-order derivatives when calling backward on the meta-loss. When using `torch.device("mps")`, it throws an error that the derivative for `aten::linear_backward` is not implemented. It works fine when you set `create_graph=False` in the inner updates, but then it won't compute the higher-order derivatives. I don't get the error when using `torch.device("cpu")` or `torch.device("cuda")`. Here is the code to reproduce the error:
```python
# Imports needed to run this snippet
import torch
import torch.nn as nn
from copy import deepcopy

device = torch.device("mps")
model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 1))
model.to(device)
# Initial fast parameters
fast_params_0 = {n: deepcopy(p) for (n, p) in model.named_parameters()}
# First inner update
x = torch.randn(10, 10, device=device)
y = torch.randn(10, 1, device=device)
logits_0 = torch.func.functional_call(model, fast_params_0, x)
loss = nn.MSELoss()(logits_0, y)
grads_0 = torch.autograd.grad(loss, fast_params_0.values(),
                              create_graph=True,
                              retain_graph=True)
# Compute fast parameters after the first inner update
fast_params_1 = {n: p - 0.1 * g for ((n, p), g) in zip(fast_params_0.items(), list(grads_0))}
# Compute meta-loss and backprop through the optimization trajectory
x = torch.randn(10, 10, device=device)
y = torch.randn(10, 1, device=device)
logits_1 = torch.func.functional_call(model, fast_params_1, x)
meta_loss = nn.MSELoss()(logits_1, y)
meta_loss.backward()
```
And, the error I get:
```
RuntimeError: derivative for aten::linear_backward is not implemented
```
*I get the same error for any layer type.
### Versions
```
[pip3] numpy==1.23.5
[pip3] pytorchcv==0.0.67
[pip3] torch==2.0.0
[pip3] torchaudio==2.0.0
[pip3] torchmetrics==0.11.4
[pip3] torchvision==0.15.0
[conda] numpy 1.23.5 py39h1398885_0
[conda] numpy-base 1.23.5 py39h90707a3_0
[conda] pytorch 2.0.0 py3.9_0 pytorch
[conda] pytorchcv 0.0.67 pypi_0 pypi
[conda] torchaudio 2.0.0 py39_cpu pytorch
[conda] torchmetrics 0.11.4 pypi_0 pypi
[conda] torchvision 0.15.0 py39_cpu pytorch
```
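In case it helps, the usual way to get the nightly build on an M1/M2 Mac is roughly the command below; double-check the current command on pytorch.org, since the index URL can change:

```
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
```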