Why can't I use Tanh/sigmoid?

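It depends on the (chain of) operations, as described here. Tanh and sigmoid compute their gradients from their own outputs, so autograd saves the activation for the backward pass. If the activation is needed for gradient computation, in-place operations that overwrite it are disallowed, and PyTorch will raise an error.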
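As a rough illustration, here is a minimal sketch of the failure. The helper name `backward_after_scaling` and the scaling by 2 are made up for this example; the in-place op stands in for whatever in-place operation follows the activation in your model. Note that the error shows up when `backward()` runs, not at the in-place call itself.

```python
import torch


def backward_after_scaling(inplace: bool) -> bool:
    """Hypothetical helper: True if backward succeeds, False if autograd rejects it."""
    x = torch.randn(4, requires_grad=True)
    y = torch.sigmoid(x)  # sigmoid's backward reuses its output y
    if inplace:
        y.mul_(2)  # overwrites the values autograd saved for sigmoid's backward
    else:
        y = y * 2  # writes to a new tensor; the saved output stays intact
    try:
        y.sum().backward()
        return True
    except RuntimeError:
        # "one of the variables needed for gradient computation has been
        # modified by an inplace operation"
        return False


print(backward_after_scaling(inplace=False))
print(backward_after_scaling(inplace=True))
```

Swapping `torch.sigmoid` for `torch.tanh` should behave the same way, since tanh also derives its gradient from its output. Using the out-of-place form (`y = y * 2`) is the usual way around the error.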