Can't use from_blob to construct tensor on GPU in C++

I’m attempting to construct a tensor directly on the GPU from a float array. When I generate a random tensor and pass it a TensorOptions object, the tensor is created on the GPU as expected, but when I pass the same options to from_blob, it gives me an error.

Minimum working example:

#include <iostream>
#include <torch/torch.h>

int main() {
	auto options = torch::TensorOptions().device(torch::kCUDA);

	float temp[12];
	for (int i = 0; i < 12; i++) {
		temp[i] = i;
	}

	torch::Tensor cudaTest = torch::rand({12}, options); //works 
	std::cout << cudaTest << std::endl;

	cudaTest = torch::from_blob(temp, {12}, options); //doesn't work
	std::cout << cudaTest << std::endl;  
}

Output:

 0.9597
 0.6517
 0.5960
 0.4641
 0.9111
 0.4068
 0.5783
 0.0447
 0.2747
 0.7274
 0.0227
 0.7498
[ CUDAFloatType{12} ]
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: invalid argument (getDeviceFromPtr at ../aten/src/ATen/cuda/CUDADevice.h:13)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6a (0x7fcdfa928aaa in /usr/local/libtorch/lib/libc10.so)
frame #1: <unknown function> + 0x390babf (0x7fcdad161abf in /usr/local/libtorch/lib/libtorch_cuda.so)
frame #2: at::from_blob(void*, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::TensorOptions const&) + 0x823 (0x5580868fcd03 in ./a)
frame #3: torch::from_blob(void*, c10::ArrayRef<long>, c10::TensorOptions const&) + 0x8c (0x5580868fdaac in ./a)
frame #4: main + 0x189 (0x5580868f2e69 in ./a)
frame #5: __libc_start_main + 0xf3 (0x7fcda8e040b3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: _start + 0x2e (0x5580868f398e in ./a)

Aborted (core dumped)

Any idea what the problem could be? I don’t want to construct in regular memory and use .to(device) since it seems to be a bit slower.

@farmersrice
See this issue: https://github.com/pytorch/pytorch/issues/15426. I think our documentation needs an update.
from_blob does not copy anything; it only wraps the pointer you give it. With a CUDA device in the options, it expects that pointer to already be GPU memory, but your temp[] is on the CPU.
I think you have to use .to(device) at this point.