Computing Gradients with respect to intermediate activations

@cndn can you help? I see that the topic "Is retain_grad() supported in new C++ API?" is quite close to my issue. Thanks a lot!

I need to accumulate the gradients of the intermediate activations in a list, which I pass into the constructor of a custom hook:

#include <torch/torch.h>
#include <iostream>
#include <vector>

struct GradSaveHook : public torch::autograd::FunctionPreHook {
  // `gradients` is stored as a reference; a by-value member would only fill a
  // private copy of the list instead of the one owned by the caller.
  GradSaveHook(std::vector<std::vector<torch::autograd::Variable>>& g, torch::autograd::Variable* x)
      : v(x), gradients(g) {}
  torch::autograd::variable_list operator()(const torch::autograd::variable_list& grads) override {
    for (const auto& g : grads) {  // print each incoming gradient tensor
      std::cout << "grad:\n" << g << std::endl;
    }
    gradients.push_back(grads);
    return grads;
  }
  torch::autograd::Variable* v;
  std::vector<std::vector<torch::autograd::Variable>>& gradients;
};
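
To make the goal concrete: what I want is roughly what a lambda passed to Tensor::register_hook would do, assuming that method is available in my libtorch version (I am not sure it is). A minimal standalone sketch:

#include <torch/torch.h>
#include <iostream>
#include <vector>

int main() {
  auto x = torch::randn({3}, torch::requires_grad());
  auto y = torch::relu(x * 2);  // intermediate (non-leaf) activation

  std::vector<torch::Tensor> saved;
  y.register_hook([&saved](torch::Tensor grad) {
    saved.push_back(grad);      // keep dLoss/dy when backward reaches y
    return grad;
  });

  y.sum().backward();
  std::cout << saved[0] << std::endl;  // gradient w.r.t. the intermediate y
}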

I then run the following loop:

for (unsigned i = 0; i < net.getNumLayers(); i++) {
  auto w = net.getWeightTensor(i, device);
  auto b = net.getBiasTensor(i, device);

  out = w.matmul(out) + b;
  out = torch::relu(out);
  activations.push_back(out);

  // make_shared constructs the hook itself, so it takes the constructor
  // arguments directly (not a raw `new GradSaveHook(...)` pointer)
  auto hook_ptr = std::make_shared<GradSaveHook>(gradients, &out);
  out.add_hook(hook_ptr);
}
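
An alternative I could imagine, following the retain_grad() topic linked above, would look roughly like this, assuming Tensor::retain_grad() is exposed in the C++ API (which is exactly what I am unsure about):

for (unsigned i = 0; i < net.getNumLayers(); i++) {
  auto w = net.getWeightTensor(i, device);
  auto b = net.getBiasTensor(i, device);

  out = torch::relu(w.matmul(out) + b);
  out.retain_grad();            // ask autograd to keep the grad of this non-leaf tensor
  activations.push_back(out);
}
// after loss.backward(), activations[i].grad() would then hold the gradient
// of the i-th intermediate activation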

Then, I define the loss:

auto loss = out[0];
loss.backward();
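
After backward() I expect to read the saved gradients back out of the list; if I understand correctly, the hooks fire starting from the last layer, so the entries come out in reverse layer order:

// gradients[0] should belong to the last activation that was hooked,
// since backward traverses the graph from the loss towards the inputs
for (std::size_t i = 0; i < gradients.size(); ++i) {
  std::cout << "hook " << i << " saw:" << std::endl;
  for (const auto& g : gradients[i]) {
    std::cout << g << std::endl;
  }
}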