Reinterpret PyTorch array as a different dtype

wjakob · August 31, 2018, 10:14am

Dear all,

I’m looking for a way to reinterpret a PyTorch tensor as a different dtype of the same element size. NumPy provides this kind of operation, but as far as I can tell PyTorch only provides casts. (Naturally, reinterpreting types only makes sense for tensors without gradients.)

Here is an example of what I mean, using NumPy:

>>> a = np.linspace(0, 1, 5, dtype=np.float32)
>>> a
array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ], dtype=float32)

>>> a.view(np.int32)
array([         0, 1048576000, 1056964608, 1061158912, 1065353216], dtype=int32)

My current workaround involves a PyTorch -> NumPy -> PyTorch roundtrip, which is not very nice/efficient.


>>> a = torch.linspace(0, 1, 5)
>>> torch.from_numpy(a.numpy().view(np.int32))
tensor([         0, 1048576000, 1056964608, 1061158912, 1065353216],
       dtype=torch.int32)

I’d be very grateful for any suggestions on how PyTorch could be convinced to change the dtype for an already-created memory region.

Thanks,
Wenzel

SimonW · August 31, 2018, 5:22pm

I’m curious. How is ndarray.view(dtype) generally used?

ptrblck · August 31, 2018, 5:39pm

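Probably not a general use case, but I’ve seen it used in np.unique: with the axis argument, each row of the reshaped array is viewed as a single element of a structured dtype, so that whole rows can be compared and deduplicated.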
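A minimal sketch of that row-consolidation idea, assuming a 2-D array whose rows should be compared byte-for-byte. NumPy's own np.unique uses a structured dtype with one field per column; this hypothetical helper, `unique_rows`, uses a single opaque np.void field per row instead, which is a common variant of the same trick:

```python
import numpy as np

def unique_rows(a):
    """Return the unique rows of a 2-D array, in order of first occurrence.

    Hypothetical helper: each row is reinterpreted as one np.void element,
    so np.unique compares whole rows. Comparison is byte-wise, so e.g.
    0.0 and -0.0 rows count as different.
    """
    a = np.ascontiguousarray(a)  # view() with a larger itemsize needs a contiguous last axis
    row_dtype = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    row_view = a.view(row_dtype)  # shape (n, 1): one opaque element per row
    _, first_idx = np.unique(row_view, return_index=True)
    return a[np.sort(first_idx)]

a = np.array([[1, 2], [3, 4], [1, 2], [5, 6]], dtype=np.int64)
print(unique_rows(a))
```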

wjakob · August 31, 2018, 9:28pm

These kinds of type reinterpretations are an important building block for implementing special functions. It’s quite common to switch from float to integer, do a few manipulations there (e.g. of the mantissa), and then switch back to floats at the end. In the NumPy world, ndarray.view(dtype) is the operation that enables this. As far as I can tell, PyTorch has no analog, and directly assigning the dtype raises an error:

>>> t = torch.linspace(0, 1, 5)
>>> t.dtype = torch.int32
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: attribute 'dtype' of 'torch._C._TensorBase' objects is not writable

Or do you have any suggestions?
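To make the float → integer → float pattern concrete, here is a minimal NumPy sketch of a frexp-style split done purely with bit manipulation on a reinterpreted view. The function name `frexp_bits` is hypothetical, and it assumes normal, nonzero, finite float32 inputs (zero, subnormals, inf and NaN are not handled):

```python
import numpy as np

def frexp_bits(x):
    """Split float32 values into (mantissa, exponent) like np.frexp.

    Hypothetical helper; only valid for normal, nonzero, finite inputs.
    """
    x = np.array(x, dtype=np.float32, ndmin=1)
    bits = x.view(np.int32)  # reinterpret, no conversion
    # Biased exponent is stored in bits 23..30; frexp's convention
    # puts the mantissa in [0.5, 1), hence the bias of 126.
    exponent = ((bits >> 23) & 0xFF) - 126
    # Keep sign and fraction bits, overwrite the exponent field with 126
    # so the resulting float lies in [0.5, 1) in magnitude.
    mantissa_bits = (bits & ~np.int32(0x7F800000)) | np.int32(126 << 23)
    mantissa = mantissa_bits.view(np.float32)  # reinterpret back to float
    return mantissa, exponent

m, e = frexp_bits([1.0, 0.75, -3.0, 10.0])
print(m, e)
```

The result can be checked against np.frexp, which computes the same decomposition without any reinterpretation.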

SimonW · September 1, 2018, 1:23am

If you are writing a C++ extension function, you can use reinterpret_cast<int32_t*>(tensor.data<float>()). Unfortunately, I don’t think this is supported from Python. Please file a feature request on GitHub. Thanks!

yurib · December 25, 2018, 10:07pm

Hello, all!

I wonder whether a feature request has been filed. Another possible use (for lack of a better option?): the CPU HalfTensor doesn’t support many operations, even ones as basic as index() and index_select(), which are useful for slicing a batch out of a large (regressor) matrix (index_select is also very fast).
So one workaround is to ‘pack’ 2 float16 columns into one float32 column (or even 4 float16 columns into one float64), select the rows required for the next batch, and then ‘unpack’ back to float16.

The numpy()/from_numpy() trick works, but it looks ugly.
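The pack/select/unpack idea can be sketched in NumPy, whose view(dtype) is the operation discussed in this thread. NumPy can index float16 arrays directly, so this only illustrates the reinterpretation mechanics; `select_rows_packed` is a hypothetical name, and the sketch assumes an even number of float16 columns:

```python
import numpy as np

def select_rows_packed(half_matrix, rows):
    """Select rows of a float16 matrix via a float32 'packed' view.

    Hypothetical helper: assumes dtype float16 and an even column count.
    """
    assert half_matrix.dtype == np.float16 and half_matrix.shape[1] % 2 == 0
    # Each pair of adjacent float16 columns becomes one float32 column.
    packed = np.ascontiguousarray(half_matrix).view(np.float32)
    # Row selection only copies the 4-byte elements; their bit patterns
    # are never interpreted as float32 arithmetic values.
    selected = packed[rows]
    # Split each float32 column back into its two float16 columns.
    return selected.view(np.float16)

m = np.arange(24, dtype=np.float16).reshape(4, 6)
print(select_rows_packed(m, [2, 0]))
```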

calvinmccarter · December 20, 2021, 10:49pm

For those curious, this feature has been implemented and merged in pytorch/pytorch Pull Request #47951 on GitHub, “Add tensor.view(dtype)” by zasdfgbnm.

So now you can use tensor.view(dtype=...).
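A minimal sketch of the original NumPy example redone with Tensor.view(dtype), assuming a PyTorch version that includes that PR. It also shows the float → int → float round trip from earlier in the thread, here clearing the IEEE 754 sign bit to compute an absolute value:

```python
import torch

t = torch.linspace(0, 1, 5)   # float32: 0.0, 0.25, 0.5, 0.75, 1.0
bits = t.view(torch.int32)    # same memory, reinterpreted as int32
print(bits)                   # values: 0, 1048576000, 1056964608, 1061158912, 1065353216

# No copy is made: both tensors point at the same storage.
assert bits.data_ptr() == t.data_ptr()

# Round trip: mask off the sign bit in integer space, then view as float again.
x = torch.tensor([-1.5, 2.0, -0.0, 3.25])
abs_x = (x.view(torch.int32) & 0x7FFFFFFF).view(torch.float32)
print(abs_x)
```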