Hey everyone, I’ve been trying to write my own NumPy implementations of some of the ops PyTorch handles. While writing an implementation of softmax backward, it ended up looking like this:

```
dx = output * grad
s = dx.sum(axis=dimension, keepdims=True)
temp = grad - s
out = output * temp
```

Here `output` is just the output of softmax itself, `grad` is the incoming gradient, and `dimension` is whatever was supplied as the `dim` argument to softmax initially.

I believe this implementation lines up with what eventually happens in `pytorch/aten/src/ATen/native/SoftMax.cpp`.
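For context, here is the same math wrapped as a self-contained sketch (the function names and the finite setup are my own, not PyTorch's). One handy sanity check: softmax is invariant to adding a constant along `dim`, so the input gradient must sum to roughly zero along that axis:

```python
import numpy as np

def softmax(x, dim=-1):
    # Numerically stable softmax along `dim`.
    z = x - x.max(axis=dim, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=dim, keepdims=True)

def softmax_backward(grad, output, dim=-1):
    # dL/dx = y * (g - sum(y * g)), summed along `dim`.
    s = (output * grad).sum(axis=dim, keepdims=True)
    return output * (grad - s)

x = np.random.randn(4, 5).astype(np.float32)
g = np.random.randn(4, 5).astype(np.float32)
y = softmax(x, dim=-1)
dx = softmax_backward(g, y, dim=-1)
print(np.abs(dx.sum(axis=-1)).max())  # close to 0
```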

However, I couldn’t match my output with the PyTorch-generated one. Eventually I noticed that in `host_softmax_backward()` in `SoftMax.cpp`, the returned output has the same data type as the incoming gradient (let's say float in this case). Inside the function itself, though, some of the intermediate math ends up in double precision, and when the result is assigned to the output it just gets cast back down to float and returned. With that in mind, if I change my attempted implementation of softmax backward to look like this:

```
dx = output * grad
s = dx.sum(axis=dimension, keepdims=True, dtype=np.float64)
temp = grad - s
out = output * temp
out = out.astype(grad.dtype)
```
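To illustrate the difference this makes, here is a small comparison (my own test setup, with a made-up "softmax output" rather than a real one): the same reduction done with pure float32 accumulation versus float64 accumulation followed by a cast back, as in the snippet above. The two results agree to float32 precision but can differ in the last few ulps, which is enough to break an exact bitwise match:

```python
import numpy as np

rng = np.random.default_rng(0)
output = rng.random((2, 1000), dtype=np.float32)
output /= output.sum(axis=-1, keepdims=True)  # normalized like a softmax output
grad = rng.standard_normal((2, 1000)).astype(np.float32)

# Pure float32 accumulation.
s32 = (output * grad).sum(axis=-1, keepdims=True)
out32 = output * (grad - s32)

# float64 accumulation, cast back to the grad dtype at the end.
s64 = (output * grad).sum(axis=-1, keepdims=True, dtype=np.float64)
out64 = (output * (grad - s64)).astype(grad.dtype)

print(out64.dtype)                    # float32, same as grad
print(np.max(np.abs(out32 - out64)))  # tiny, but often nonzero
```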

Then the numbers match up perfectly. So I guess I was wondering: is this whole float_input -> intermediate_calculation_in_double -> float_output switching and casting of data types intentional behaviour?

Thanks!