Sort tensors in place

Hello Everyone,

I am running into a CUDA out-of-memory error during a sort operation on a huge tensor.
The tensor shape is [40000, 2048], so I think it would be a very nice feature if the sort could be performed in place, which would make it more memory efficient.
Also, is there any workaround to do the sort in place myself without changing the torch version, i.e., 1.5?
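
To make the question concrete, here is a rough sketch of the kind of workaround I have in mind. It is not a true in-place sort: it sorts a block of rows at a time and copies the result back into the original storage, so the temporary buffers should scale with the chunk size rather than with the whole tensor. It assumes the sort runs along dim=1 (each row independently) and that the sort indices are not needed. The helper name `sort_rows_in_chunks_` and the `chunk_rows` parameter are made up for illustration.

```python
import torch


def sort_rows_in_chunks_(x, chunk_rows=4096, descending=False):
    """Sort every row of a 2-D tensor, overwriting x one block of rows at a time.

    Assumption: sorting is along dim=1, so rows are independent and can be
    processed in separate chunks. Indices are discarded.
    """
    for start in range(0, x.size(0), chunk_rows):
        block = x[start:start + chunk_rows]  # a view into x, no copy
        values, _ = torch.sort(block, dim=1, descending=descending)
        block.copy_(values)                  # write sorted rows back into x
        del values                           # drop the temporary before the next chunk
    return x


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    t = torch.randn(1000, 64, device=device)
    sort_rows_in_chunks_(t, chunk_rows=128)
    # Each row should now be in ascending order.
    print(bool((t[:, 1:] >= t[:, :-1]).all()))
```

If the sort has to run along dim=0 instead, the same idea would apply with chunks taken over columns.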

Best regards.