How to concatenate feature vectors from different extractors in-place

Hi everyone,

I’m working on combining feature vectors from multiple extractors. To optimize memory usage and runtime, I tried writing the results in-place into a single preallocated tensor, overwriting the old values on each forward pass instead of allocating a new tensor and copying into it. However, this resulted in a RuntimeError:

RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward

Am I backwarding through the graph a second time by overwriting the tensor's contents? I could make it work by detaching the tensor before overwriting it with the new values.
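To illustrate what I think is happening, here is a minimal standalone repro (a hypothetical sketch, not my actual network): writing in-place into a buffer that still carries the grad_fn from the previous iteration chains the new graph onto the old, already-freed one, and detaching first avoids that.

```python
import torch

buf = torch.zeros(4)                     # persistent buffer, like my preallocated tensor
w = torch.randn(4, requires_grad=True)

# iteration 1: fill the buffer in-place, backward, graph is freed
buf[:] = w * 2
buf.sum().backward()

# iteration 2 without detach: the new in-place write chains onto the
# freed graph of iteration 1, so backward() raises the RuntimeError
raised = False
try:
    buf[:] = w * 3
    buf.sum().backward()
except RuntimeError:
    raised = True

# iteration 2 with detach: the buffer is cut loose from the old graph
# first, so only the fresh write is backpropagated through
buf = buf.detach()
buf[:] = w * 3
buf.sum().backward()
succeeded = True
```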

def forward(self, state, action):
    ...
    # get RL state feature embeddings
    robot_emb_index = 0
    robot_state_index = self._obs_dim
    # works when detaching the tensor; without detaching I get the RuntimeError specified above
    self._robot_state_emb = self._robot_state_emb.detach()
    for feature_extractor in self._robot_state_extractors:
        self._robot_state_emb[:, robot_emb_index:robot_emb_index + feature_extractor._output_size] = \
            feature_extractor(state[:, robot_state_index:robot_state_index + feature_extractor._input_size])
        robot_state_index += feature_extractor._input_size
        robot_emb_index += feature_extractor._output_size

    # works when detaching the tensor; without detaching I get the RuntimeError specified above
    self._critic_input = self._critic_input.detach()
    self._critic_input[:, :self._obs_embedding_size] = encoder_features
    self._critic_input[:, self._obs_embedding_size:self._obs_embedding_size + robot_emb_index] = self._robot_state_emb
    self._critic_input[:, self._obs_embedding_size + robot_emb_index:] = action

    features1 = F.relu(self._h1(self._critic_input))
    features2 = F.relu(self._h2(features1))
    q = self._h3(features2)
    return torch.squeeze(q)

What is the recommended way to do this? I would like to avoid allocating a new tensor on every forward pass with operations like torch.hstack(); it should be as efficient as possible.
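For reference, this is a sketch of the allocate-and-concatenate version I am trying to avoid (with made-up sizes and toy Linear layers standing in for my extractors):

```python
import torch

# toy stand-ins for my feature extractors (made-up input/output sizes)
extractors = [torch.nn.Linear(3, 8), torch.nn.Linear(5, 8)]
state = torch.randn(2, 3 + 5)          # batch of robot states
encoder_features = torch.randn(2, 16)  # stand-in for the encoder output
action = torch.randn(2, 4)

# run each extractor on its slice of the state, then concatenate
# everything into a freshly allocated tensor each call
chunks, idx = [], 0
for fe in extractors:
    chunks.append(fe(state[:, idx:idx + fe.in_features]))
    idx += fe.in_features
critic_input = torch.cat([encoder_features, *chunks, action], dim=1)
```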