Hello,
I am trying to efficiently pass an array of pointers to network evaluation data into an operation in CUDA. The general flow of my code is as follows:
torch::Tensor* evals_arr = new torch::Tensor[4];
float **evals_host = new float*[4]; // host-side table that will hold the 4 device pointers
for (int i = 0; i < 4; i++) {
evals_arr[i] = networks[i]->forward(inputs[i]); // networks and inputs are std vectors of tensors
// the evaluation evals_arr[i] is already on the GPU, so the pointer points to a location in the GPU
evals_host[i] = evals_arr[i].data_ptr<float>();
}
// devals has already been cudaMalloced appropriately
cudaMemcpy(devals, evals_host, 4*sizeof(float*), cudaMemcpyHostToDevice);
I pass devals to my CUDA kernel and change its values in some ways. However, doing this over multiple iterations of optimization does not change the evaluations of the neural networks at all. When I tested the same code with plain C++ loops instead of CUDA, the network evaluations did change. Do I have to use accessors in order to retain the graph of the neural network?
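To make the pointer-table pattern concrete, here is a minimal host-only sketch. It is an illustration under assumptions, not my actual code: std::vector<float> buffers stand in for the GPU tensors, and scale_in_place and make_pointer_table are hypothetical names, with the first one standing in for my kernel. No libtorch or CUDA is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds a table of raw pointers, one per buffer (plays the role of evals_host).
// The buffers must outlive the table and must not be resized while it is in use.
std::vector<float*> make_pointer_table(std::vector<std::vector<float>>& buffers) {
    std::vector<float*> table;
    table.reserve(buffers.size());
    for (auto& buffer : buffers) {
        table.push_back(buffer.data());
    }
    return table;
}

// Hypothetical stand-in for the kernel: multiplies every element reachable
// through the pointer table by `factor`, writing through the raw pointers.
void scale_in_place(float** table, std::size_t num_buffers,
                    std::size_t len, float factor) {
    for (std::size_t i = 0; i < num_buffers; ++i) {
        for (std::size_t j = 0; j < len; ++j) {
            table[i][j] *= factor;
        }
    }
}
```

In this sketch, calling scale_in_place(table.data(), table.size(), len, 2.0f) changes the original buffers in place, but only the raw memory is touched; nothing records that the operation happened.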
How do I retain PyTorch’s computation graph when passing C++ PyTorch tensors to CUDA kernels?
Thank you.