RuntimeError: element 0 of variables does not require grad and does not have a grad_fn

Yes, this line of code is causing the error, since you are explicitly detaching the tensor from the computation graph and then calling into a third-party library (numpy in this case), so autograd cannot track these operations or compute gradients through them.
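A minimal sketch of the failing pattern (the variable names here are illustrative, not taken from your code): once the data round-trips through numpy, the resulting tensor has no grad_fn, and calling backward() on anything derived from it raises this error.

```python
import torch

x = torch.randn(3, requires_grad=True)

# Round-tripping through numpy severs the autograd history:
y_np = x.detach().numpy() ** 2      # plain numpy array, nothing for autograd to record
y = torch.from_numpy(y_np)          # new leaf tensor, requires_grad=False, no grad_fn

# y.sum().backward()  # -> RuntimeError: element 0 of variables does not require grad ...
```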
You would either have to use PyTorch operations (so autograd can record them) or write a custom autograd.Function that uses numpy operations in its forward and implements the backward pass manually, as described in the Extending PyTorch docs.
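Here is a minimal sketch of the second approach, assuming a simple elementwise square as the numpy operation (NumpySquare is a made-up example, not your actual function); the forward runs in numpy and the backward supplies the analytic gradient so autograd can backpropagate through it:

```python
import torch


class NumpySquare(torch.autograd.Function):
    """Example custom Function: compute x**2 in numpy, define backward manually."""

    @staticmethod
    def forward(ctx, input):
        x = input.detach().cpu().numpy()      # safe to leave autograd here: we provide backward ourselves
        ctx.save_for_backward(input)          # keep what backward needs
        return input.new_tensor(x ** 2)       # wrap the numpy result back into a tensor

    @staticmethod
    def backward(ctx, grad_output):
        (input,) = ctx.saved_tensors
        return grad_output * 2 * input        # d(x^2)/dx = 2x


x = torch.randn(3, requires_grad=True)
out = NumpySquare.apply(x).sum()
out.backward()                                # works now: x.grad == 2 * x
print(torch.allclose(x.grad, 2 * x))
```

The same pattern applies to your own numpy code: move it into forward, save whatever the gradient computation needs via ctx, and return one gradient per input from backward.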