I’m attempting to construct a tensor directly on the GPU from a float array. When I generate a random tensor and pass it the same TensorOptions
object, it is created on the GPU successfully, but when I do the same thing with from_blob
, I get an error.
Minimal working example:
#include <iostream>
#include <torch/torch.h>

int main() {
    auto options = torch::TensorOptions().device(torch::kCUDA);
    float temp[12];
    for (int i = 0; i < 12; i++) {
        temp[i] = i;
    }
    torch::Tensor cudaTest = torch::rand({12}, options); // works
    std::cout << cudaTest << std::endl;
    cudaTest = torch::from_blob(temp, {12}, options); // doesn't work
    std::cout << cudaTest << std::endl;
}
Output:
0.9597
0.6517
0.5960
0.4641
0.9111
0.4068
0.5783
0.0447
0.2747
0.7274
0.0227
0.7498
[ CUDAFloatType{12} ]
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: invalid argument (getDeviceFromPtr at ../aten/src/ATen/cuda/CUDADevice.h:13)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6a (0x7fcdfa928aaa in /usr/local/libtorch/lib/libc10.so)
frame #1: <unknown function> + 0x390babf (0x7fcdad161abf in /usr/local/libtorch/lib/libtorch_cuda.so)
frame #2: at::from_blob(void*, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::TensorOptions const&) + 0x823 (0x5580868fcd03 in ./a)
frame #3: torch::from_blob(void*, c10::ArrayRef<long>, c10::TensorOptions const&) + 0x8c (0x5580868fdaac in ./a)
frame #4: main + 0x189 (0x5580868f2e69 in ./a)
frame #5: __libc_start_main + 0xf3 (0x7fcda8e040b3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: _start + 0x2e (0x5580868f398e in ./a)
Aborted (core dumped)
Any idea what the problem could be? I’d rather not construct the tensor in host memory and copy it over with .to(device), since that seems to be a bit slower.