If some input tensors still require gradients, backpropagation will work correctly; freezing all parameters via .requires_grad_(False) just avoids storing gradients in these parameters.
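A minimal sketch of this behavior, using a small dummy model (the layer sizes are just placeholders):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 1))
model[0].requires_grad_(False)  # freeze the first layer

x = torch.randn(2, 4, requires_grad=True)  # input still requires gradients
model(x).sum().backward()  # backprop still works through the frozen layer

print(x.grad is not None)                 # True: gradient flows back to the input
print(model[0].weight.grad is None)       # True: frozen params store no gradient
print(model[1].weight.grad is not None)   # True: trainable params get gradients
```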
How large is the difference?
Did you try to make the results deterministic by following these docs?
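I.e. something like this (the exact set of flags depends on your PyTorch version, so check the linked reproducibility docs):

```python
import os
import torch

# must be set before any CUDA call when using deterministic algorithms
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

torch.manual_seed(0)                       # seed the CPU and CUDA RNGs
torch.use_deterministic_algorithms(True)   # raise an error on nondeterministic ops
torch.backends.cudnn.deterministic = True  # select deterministic cuDNN kernels
torch.backends.cudnn.benchmark = False     # disable nondeterministic autotuning
```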
If the difference is approx. <=1e-5, it might be caused by the limited precision of FP32.
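To check the magnitude, you could compare the two outputs directly (out_a and out_b are placeholders for your actual results):

```python
import torch

# placeholders simulating two runs that differ by float noise
out_a = torch.randn(10)
out_b = out_a + 1e-6 * torch.randn(10)

print((out_a - out_b).abs().max())              # largest absolute mismatch
print(torch.allclose(out_a, out_b, atol=1e-5))  # True if within FP32 noise
```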