(libtorch) How can I interpret an `at::Tensor` as a different datatype?

How can I interpret an at::Tensor as a different datatype? I’m doing RL, and I need to get quite a few tensors onto the GPU. Before, I could create one long empty float32 tensor in pinned memory, send it to the GPU when needed, and slice it, but this doesn’t work when mixing datatypes.
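
For context, the single-dtype version of that pattern looked roughly like this (a sketch; the sizes and slice boundaries are made up):

auto staging = torch::empty({1024}, torch::dtype(torch::kFloat32).pinned_memory(true));
// ... fill staging on the CPU from other tensors ...
auto staging_gpu = staging.to(torch::kCUDA);
auto obs     = staging_gpu.slice(0, 0, 512);     // slices share staging_gpu's memory
auto rewards = staging_gpu.slice(0, 512, 1024);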

Basically, my question is how can I do this:

auto x = torch::empty({16}, torch::dtype(torch::kByte));
uint8_t *x_ptr = x.data_ptr<uint8_t>();  // kByte tensors expose uint8_t data
// write to x_ptr using other tensors...
auto x_gpu = x.to(torch::kCUDA);
auto a = x_gpu.slice(0, 0, 8)./* interpret as */(torch::kFloat16);
auto b = x_gpu.slice(0, 8, 16)./* interpret as */(torch::kFloat32);

@Nathan_Wood
Correct me if I am wrong.
So after
auto a = x_gpu.slice(0, 0, 8)./* interpret as */(torch::kFloat16);
auto b = x_gpu.slice(0, 8, 16)./* interpret as */(torch::kFloat32);
what you want is for a to be a tensor with four float16 elements and b a tensor with two float32 elements, both pointing into the same memory as x_gpu, correct?

If so, I think you have to implement your own code to treat the tensor as different data types (still pass a byte-type tensor and pass some metadata when you launch your GPU kernel code?). A tensor can only have one data type, and if you use to() to convert a tensor to another data type, that will be a copy.
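
For example (a minimal sketch, reusing the x from your question), to() converts values into a new buffer rather than reinterpreting the existing bytes:

auto y = x.to(torch::kFloat16);
// y is a new {16} tensor of dtype kFloat16: each of the 16 bytes is
// converted to its own float16 value in a freshly allocated buffer.
// It does not view the original 16 bytes as 8 half-precision numbers.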

what you want is for a to be a tensor with four float16 elements and b a tensor with two float32 elements, both pointing into the same memory as x_gpu, correct?

Correct

If so, I think you have to implement your own code to treat the tensor as different data types (still pass a byte-type tensor and pass some metadata when you launch your GPU kernel code?).

There is no custom GPU kernel code. I’m using libtorch because I need access to other C libraries, and it’s more convenient to write the entire thing in C++ than a mix of Python, C, and C++ (since Python extensions require C, afaik).
The first thing that happens after the data comes out would be a cast to float32 (float16 is only used because the replay buffer needs to be large), but it still seems really inconvenient and limiting to have to write an entire kernel just to do the cast, when code to do the cast already exists but cannot be used because the tensor has the wrong datatype.
The other thing that might fix this is something like torch::from_blob that uses the existing GPU memory as its data pointer. Does anything like that exist?

@Nathan_Wood
To my knowledge, we don’t have such a mechanism.

To me this is not a type cast but a memory cast. In your example, you have 16 bytes originally, and you want the first half to act like four float16 values and the second half to act like two float32 values.

from_blob doesn’t support mixed-data-type copies either.

What you can do here is create a and b as two new tensors with the correct types (copying the raw bytes from x_gpu), do your processing, and then write the processed data from a and b back into x_gpu (I assume the output is still a and b, as an example), all using their data pointers, so you control how the bytes are copied (your tensors should be contiguous).
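
Something like this, as a rough sketch: it assumes x_gpu is the contiguous 16-byte kByte tensor from your example and uses plain cudaMemcpy for the byte-level copies (the helper name process_mixed is made up):

#include <torch/torch.h>
#include <cuda_runtime.h>

void process_mixed(torch::Tensor x_gpu) {
  // New tensors with the desired dtypes on the same device as x_gpu.
  auto a = torch::empty({4}, torch::dtype(torch::kFloat16).device(x_gpu.device()));
  auto b = torch::empty({2}, torch::dtype(torch::kFloat32).device(x_gpu.device()));

  uint8_t *src = x_gpu.data_ptr<uint8_t>();
  // First 8 bytes -> four float16 values, last 8 bytes -> two float32 values.
  cudaMemcpy(a.data_ptr(), src,     8, cudaMemcpyDeviceToDevice);
  cudaMemcpy(b.data_ptr(), src + 8, 8, cudaMemcpyDeviceToDevice);

  // ... process a and b ...

  // Write the processed bytes back into x_gpu's buffer.
  cudaMemcpy(src,     a.data_ptr(), 8, cudaMemcpyDeviceToDevice);
  cudaMemcpy(src + 8, b.data_ptr(), 8, cudaMemcpyDeviceToDevice);
}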