I'm working on a CUDA extension for PyTorch. When I receive `grad_output` in the backward pass and print it, it shows the correct values:
```cpp
std::cout << upstreamGrad << std::endl;
```
Output:
```
1 1 1 1 1 1
[ CUDAFloatType{1,6} ]
```
However, copying the same data to the host through its raw data pointer returns wrong values:
```cpp
std::vector<float> tmp(6);
cudaMemcpy(tmp.data(), upstreamGrad.data_ptr(), 6 * sizeof(float), cudaMemcpyDeviceToHost);
for (int i = 0; i < 6; i++) {
    std::cout << tmp[i] << std::endl;
}
```
Output:
```
1
20
12
0
27
22
```
I was wondering if I’m missing something here.
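The post doesn't show how `upstreamGrad` was produced, so the following is an assumption: a common cause of this symptom is a non-contiguous gradient. For example, the gradient flowing back from a `sum()` is typically a single value expanded to the full shape, which gives every dimension stride 0. Only one float then exists in memory, and a `cudaMemcpy` of 6 floats reads whatever happens to follow it (the first value coming out as 1 is consistent with this). The sketch below is a plain C++ model with no libtorch or CUDA; `View2D`, `naive_read` and `to_dense` are hypothetical names used only to contrast a raw read of consecutive elements with a stride-aware read. In a real extension, the corresponding libtorch calls are `upstreamGrad.is_contiguous()`, `upstreamGrad.strides()` and `upstreamGrad.contiguous()`, the last of which returns a dense tensor whose `data_ptr()` can be copied safely.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a 2-D tensor view: a raw pointer into storage plus
// per-dimension sizes and strides (in elements), the same kind of layout
// metadata a PyTorch tensor carries. Illustration only, not a libtorch type.
struct View2D {
    const float* data;
    int64_t sizes[2];
    int64_t strides[2];

    // Logical element (i, j) is stored at data[i * strides[0] + j * strides[1]].
    float at(int64_t i, int64_t j) const {
        return data[i * strides[0] + j * strides[1]];
    }

    // Row-major contiguity check: walking from the last dimension, every
    // dimension of size > 1 must have a stride equal to the product of the
    // sizes of the dimensions after it.
    bool is_contiguous() const {
        int64_t expected = 1;
        for (int d = 1; d >= 0; --d) {
            if (sizes[d] != 1 && strides[d] != expected) return false;
            expected *= sizes[d];
        }
        return true;
    }

    // Stride-aware copy into a dense row-major buffer, conceptually what
    // calling .contiguous() produces.
    std::vector<float> to_dense() const {
        std::vector<float> out;
        out.reserve(static_cast<std::size_t>(sizes[0] * sizes[1]));
        for (int64_t i = 0; i < sizes[0]; ++i)
            for (int64_t j = 0; j < sizes[1]; ++j)
                out.push_back(at(i, j));
        return out;
    }
};

// Naive read: treats the pointer as addressing sizes[0] * sizes[1]
// consecutive elements, which is what a raw memcpy of data_ptr() assumes.
std::vector<float> naive_read(const View2D& v) {
    int64_t n = v.sizes[0] * v.sizes[1];
    return std::vector<float>(v.data, v.data + n);
}
```

For example, a `View2D` over a buffer whose first element is `1` (the rest standing in for unrelated neighbouring memory), with sizes `{1, 6}` and strides `{0, 0}`, reports `is_contiguous() == false`; `naive_read` returns the first `1` followed by the neighbouring values, while `to_dense()` returns six `1`s.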