Multi-Agent Advantage calculation is leading to in-place gradient error

Hi Aaron!

Starting with the forward-call traceback, look at the last couple of lines
in your own code (those that are then followed by calls into PyTorch
infrastructure).

I would certainly drill down into what self.critic (x) is doing.

[Edit: Some further words of clarification / explanation: As I’ve come to
understand it, anomaly detection’s forward-call traceback flags the
operation in the forward pass whose backward pass is being blocked
by the inplace modification of some tensor required by the backward
pass (rather than flagging the operation that modifies that tensor).

Given that you are using retain_graph = True, I speculate that you
are doing something like:

critic_loss.backward (retain_graph = True)
critic_optimizer.step()   # modifies critic's parameters inplace
...
actor_loss.backward()     # where actor loss depends on critic
actor_optimizer.step()

If so, you will try to backpropagate again through critic, which has had
its parameters modified inplace by its optimizer. Whether or not modifying
a tensor inplace will cause an inplace-modification error depends on
whether that tensor is needed in the backward-pass computation, but it is
likely that at least some of critic’s parameters will be needed in
the backward pass.

If you have identified the tensor that is causing the inplace-modification
error – likely one of critic’s parameters – print out its ._version before
and after calling critic_optimizer.step(). If you can’t identify the tensor
in question, you can still test this theory by commenting out the call to
critic_optimizer.step() and seeing whether this particular
inplace-modification error goes away. (There may be others.)]
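To make this failure mode concrete, here is a minimal, self-contained sketch
of that pattern. (The names critic, critic_loss, actor_loss, and so on are
stand-ins for whatever your actual code has.)

```python
import torch

# a stand-in "critic" -- your actual model will differ
critic = torch.nn.Linear(4, 1)
critic_optimizer = torch.optim.SGD(critic.parameters(), lr=0.1)

x = torch.randn(8, 4, requires_grad=True)
value = critic(x)              # critic.weight is saved for the backward pass

critic_loss = value.pow(2).mean()
critic_loss.backward(retain_graph=True)
critic_optimizer.step()        # modifies critic.weight inplace

actor_loss = value.mean()      # "actor loss" that depends on critic's output

caught = False
try:
    actor_loss.backward()      # backpropagates through critic a second time
except RuntimeError as err:
    caught = True
    print('inplace-modification error:', err)
```

Here the second .backward() needs the saved (unmodified) critic.weight, so
the inplace update performed by critic_optimizer.step() triggers the error.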

Also, look at the inplace-modification error itself. It is telling you that a
FloatTensor of shape [256, 1] is the tensor that is being modified
inplace. Where in your code do you have a tensor of that shape (one that
occurs somewhere in the forward pass)? Look closely at how it is being
used and see if you can spot an inplace modification.
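As a toy, self-contained illustration (not your code) of how inplace
modification of a [256, 1] tensor that is needed in the backward pass
produces exactly this kind of error:

```python
import torch

x = torch.randn(256, 1, requires_grad=True)
y = torch.sigmoid(x)     # sigmoid saves its output y for the backward pass
loss = y.sum()
y += 1.0                 # inplace modification of y (bumps y._version)

caught = False
msg = ''
try:
    loss.backward()      # backward needs the unmodified y
except RuntimeError as err:
    caught = True
    msg = str(err)
    print(msg)           # mentions a FloatTensor of shape [256, 1]
```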

Note that the error message is complaining that it should be of “version 3”
rather than of “version 4.” Print out the tensor’s ._version property at
various strategic places in your code. The inplace modification is occurring
somewhere between ._version values of 3 and 4. You can insert
intermediate ._version print statements to perform a binary search to
locate exactly where the inplace modification is occurring.
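As a self-contained illustration of how the ._version counter behaves:

```python
import torch

t = torch.zeros(256, 1)
v0 = t._version          # 0 for a freshly created tensor
t.add_(1.0)              # inplace op: bumps the version counter
v1 = t._version          # now 1
t = t + 1.0              # out-of-place op: a brand-new tensor
v2 = t._version          # back to 0 (new tensor, fresh counter)
print(v0, v1, v2)
```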

For example, you might try printing out ._version just before and just
after the call to self.critic (x) upon which the forward-call traceback
casts some suspicion.

Note also that you are calling .backward (retain_graph = True).
First, make sure that this is correct logic for your use case. If it is, be
aware that calling optimizer.step() performs an inplace modification
of the parameters being optimized by optimizer. Again, you can check
this by printing out ._version for the problematic tensor before and after
the call to optimizer.step().
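For example, assuming a plain SGD optimizer over a stand-in module, you
would see something like:

```python
import torch

critic = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(critic.parameters(), lr=0.1)

critic(torch.randn(8, 4)).sum().backward()

v_before = critic.weight._version
optimizer.step()                 # inplace update of critic's parameters
v_after = critic.weight._version
print(v_before, v_after)         # v_after > v_before
```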

For some examples that illustrate these inplace-modification debugging
techniques, see this post:

Good luck!

K. Frank