Add a capture hook for torch::autograd::grad()

I’m making a call to torch::autograd::grad(), and I want to store individual gradients rather than accumulating them.
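
For context, here is roughly the kind of call I mean (the model and loss below are just illustrative):

#include <torch/torch.h>

int main() {
  // Illustrative setup; my real model and loss are different.
  torch::nn::Linear model(4, 1);
  auto x = torch::randn({8, 4});
  auto loss = model->forward(x).pow(2).sum();

  // grad() returns the gradients directly (one per input tensor) rather than
  // accumulating them into each parameter's .grad().
  auto grads = torch::autograd::grad(
      /*outputs=*/{loss},
      /*inputs=*/model->parameters());
}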

I tried adding a hook to the model parameters via Tensor::register_hook(), à la TEST(CustomAutogradTest, Hooks), which uses a call to backward() rather than autograd::grad(). The hook only got called once per parameter, with the already-accumulated gradient values.
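
Roughly what I tried (a sketch, not my exact code):

#include <torch/torch.h>
#include <iostream>

int main() {
  torch::nn::Linear model(4, 1);
  auto x = torch::randn({8, 4});
  // In my real model the parameters reach the loss through several paths;
  // this toy loss just stands in for that.
  auto loss = model->forward(x).pow(2).sum() + model->forward(x).mean();

  for (auto& p : model->parameters()) {
    p.register_hook([](torch::Tensor grad) {
      // Fires once per parameter with the already-summed gradient,
      // not once per contribution.
      std::cout << "hook: " << grad.sizes() << "\n";
    });
  }
  loss.backward();
}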

I’ve traced through the code, and it seems like what I need is to add a hook to capture.hooks_ (see the Engine::evaluate_function() excerpt below). To do so, I need access to the current GraphTask, which is created in Engine::execute(), reached via run_backward(). See DistEngine::computeDependencies() for an example of adding to capture.hooks_.

Is there a way to add to capture.hooks_? If not, it seems I would need to replace Engine::execute(), perhaps by subclassing Engine?
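
For concreteness, this is the shape of hook I would want to push onto capture.hooks_, modeled on DistAccumulateGradCaptureHook (RecordingCaptureHook and sink_ are my own names, and I’m assuming the nested GradCaptureHook type from torch/csrc/autograd/engine.h is usable from user code). The missing piece is still getting hold of the GraphTask:

#include <torch/csrc/autograd/engine.h>

#include <vector>

// Sketch of a capture hook that records each captured gradient; only the
// base class comes from engine.h.
struct RecordingCaptureHook
    : torch::autograd::GraphTask::ExecInfo::Capture::GradCaptureHook {
  explicit RecordingCaptureHook(std::vector<at::Tensor>* sink) : sink_(sink) {}

  at::Tensor operator()(const at::Tensor& grad) override {
    sink_->push_back(grad.clone());  // store a copy of what was captured
    return grad;                     // pass the gradient through unchanged
  }

  std::vector<at::Tensor>* sink_;
};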

I feel like, if this were possible, the same method could be used by anyone who wants to compute a Jacobian from C++.
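
For example, today that means something like this row-by-row helper, calling grad() once per output element (jacobian() is just my own sketch, not an existing API):

#include <torch/torch.h>

// Sketch: Jacobian of y with respect to x, one grad() call per output element.
torch::Tensor jacobian(const torch::Tensor& y, const torch::Tensor& x) {
  auto J = torch::zeros({y.numel(), x.numel()});
  for (int64_t i = 0; i < y.numel(); ++i) {
    // One-hot grad_output selects the i-th output element.
    auto grad_output = torch::zeros_like(y).view({-1});
    grad_output[i] = 1;
    auto g = torch::autograd::grad(
        /*outputs=*/{y},
        /*inputs=*/{x},
        /*grad_outputs=*/{grad_output.view_as(y)},
        /*retain_graph=*/true)[0];
    J[i] = g.reshape({-1});
  }
  return J;
}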

void Engine::evaluate_function(
    std::shared_ptr<GraphTask>& graph_task,
    Node* func,
    InputBuffer& inputs,
    const std::shared_ptr<ReadyQueue>& cpu_ready_queue) {
  // If exec_info_ is not empty, we have to instrument the execution
  auto& exec_info_ = graph_task->exec_info_;
  if (!exec_info_.empty()) {
    auto& fn_info = exec_info_.at(func);
    if (auto* capture_vec = fn_info.captures_.get()) {
      // Lock mutex for writing to graph_task->captured_vars_.
      std::lock_guard<std::mutex> lock(graph_task->mutex_);
      for (const auto& capture : *capture_vec) {
        auto& captured_grad = graph_task->captured_vars_[capture.output_idx_];
        captured_grad = inputs[capture.input_idx_];
        for (auto& hook : capture.hooks_) {
          captured_grad = (*hook)(captured_grad);
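          // ^ a hook appended to capture.hooks_ would run here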
        }
      }
    }
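    // ... (rest of Engine::evaluate_function elided)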

I’m now doubtful of this approach: I set a breakpoint on AccumulateGrad::apply() (used for grad_accumulator_) and it only gets called once per tensor per call to backward(), presumably because the per-edge contributions have already been summed in the InputBuffer by the time it runs. So at that point it is effectively a copy rather than an accumulator.