I have implemented a form of option-critic network in which the critic is updated in batches and the actor is updated iteratively. The first update is applied to both once a batch has accumulated. After that, the actor is updated at every step, while the critic is only updated intermittently (when enough new samples have accumulated). The first update works fine, but as soon as I update only the actor I get the error below. The traceback points to this line:
nonspatial_latent_output = torch.unsqueeze(self.latent_nonspatial(nonspatial_input),0)
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [262176, 256]], which is output 0 of TBackward, is at version 2; expected version 1 instead
The only difference from the batch update is that this variable is then squeezed, but from what I gather neither squeeze nor unsqueeze is an in-place operation.
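For reference, here is a minimal sketch that reproduces the same class of error. It is not my actual network: the two-layer `net`, the sizes, and the SGD optimizer are all hypothetical stand-ins. It rests on the assumption that the graph used for the actor update was built before an `optimizer.step()` call, which would modify the layer weights in place, and it also checks that `unsqueeze` itself leaves the tensor version unchanged. Whether this is what happens in my code I can't say for sure.

```python
import torch

torch.manual_seed(0)

# Hypothetical two-layer stand-in for the real network (sizes are illustrative).
net = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.Linear(8, 3))
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

x = torch.randn(2, 4)
hidden = net(x)
version_before_unsqueeze = hidden._version
out = torch.unsqueeze(hidden, 0)  # returns a view; does not modify `hidden`
unsqueeze_was_inplace = hidden._version != version_before_unsqueeze

# First update: backward, then step. The step rewrites the weights in place.
weight_version_before = net[1].weight._version
out.sum().backward(retain_graph=True)
optimizer.step()
weight_version_after = net[1].weight._version

# Second backward through the SAME graph, which saved the pre-step weights.
try:
    out.pow(2).sum().backward()
    stale_graph_failed = False
except RuntimeError as err:
    stale_graph_failed = "inplace operation" in str(err)

# Rebuilding the graph with a fresh forward pass after the step avoids this.
optimizer.zero_grad()
fresh_out = torch.unsqueeze(net(x), 0)
fresh_out.pow(2).sum().backward()
fresh_graph_ok = net[0].weight.grad is not None
```

If this assumption matches my setup, the in-place operation would be the optimizer step rather than squeeze or unsqueeze, and wrapping the update in `torch.autograd.set_detect_anomaly(True)` should help point at the forward call whose saved tensor went stale.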
Any help would be greatly appreciated!!