Using ``data_ptr<float>`` on network evaluations and passing them to a CUDA kernel does not train the network


I am passing an array of data pointers from network evaluations to a CUDA kernel so that the operation runs efficiently on the GPU. The general flow of my code is as follows:

torch::Tensor* evals_arr = new torch::Tensor[4];
float* evals_host[4];

for (int i = 0; i < 4; i++) {
    evals_arr[i] = networks[i]->forward(inputs[i]); // networks and inputs are std vectors of tensors
    // the evaluation evals_arr[i] is already on the GPU, so the pointer points to device memory
    evals_host[i] = evals_arr[i].data_ptr<float>();
}

// devals has already been cudaMalloc'ed appropriately
cudaMemcpy(devals, evals_host, 4 * sizeof(float*), cudaMemcpyHostToDevice);

I pass devals to my CUDA kernel and modify the values it points to in various ways. But doing this over multiple iterations of optimization does not change the evaluations of the neural networks at all. I tested the same logic in plain C++ loops without CUDA, and there the network evaluations do change. Do I have to use accessors in order to retain the graph of the neural network?

How do I retain PyTorch's computation graph when passing the data pointers of C++ PyTorch tensors to CUDA kernels?

Thank you.

Hello @ptrblck. Do you know if there is a way to retain the computation graph of tensors (such as neural network evaluations) whose data pointers are passed to CUDA kernels?

Thank you.