It looks like you are storing output Variables, and that is the source of the problem: when you store a Variable, you force Python to keep the entire computation graph for that Variable in memory.
You should probably save the underlying tensors instead.
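A minimal sketch of the difference, assuming a recent PyTorch where `.detach()` is the idiomatic way to get the underlying tensor (older code used `.data`):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = (x * 2).sum()

kept = y            # keeps the whole graph alive: y.grad_fn is set
saved = y.detach()  # plain tensor with the same values, no graph attached

print(kept.grad_fn is not None)   # True
print(saved.grad_fn is None)      # True
```

Appending `saved` to a list each step costs only the tensor's storage; appending `kept` also pins every intermediate tensor in its graph.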
When I do a backprop, don't I need that computation graph? I got around this by taking the float values from the Variables, but then my network never actually learned.
EDIT: I just tested this by modifying the PyTorch RL example and saving `m.log_prob(action).data`. It ended up not learning on the backprop and was stuck at an average episode length of 20-21. This leads me to believe I need the computation graph for backprop.
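That matches what I'd expect. A small sketch of why saving the detached value stops learning (hypothetical stand-in for `m.log_prob(action)`):

```python
import torch

w = torch.randn(1, requires_grad=True)
log_prob = w * 3                 # stand-in for m.log_prob(action)

# With the graph intact, backward reaches the parameter:
loss_ok = -log_prob.sum()
loss_ok.backward()
print(w.grad is not None)        # True: gradient flowed back to w

# Saving .data / .detach() severs the graph, so there is nothing
# for backward() to traverse:
w.grad = None
detached = log_prob.detach()
failed = False
try:
    (-detached.sum()).backward()
except RuntimeError:
    failed = True                # no grad_fn, backward raises
print(failed)                    # True
```

The loss built from the detached value is just a number as far as autograd is concerned, so the policy parameters never receive gradients.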
I see.
Well, either you save the `log_prob` of the action together with its computation graph, or you store only the raw states and actions and recompute the `log_prob` from them just before you compute the losses.
So, either you accept ballooning memory, or you accept redoing computations.
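A rough sketch of the second option, assuming a hypothetical `policy` network: keep only graph-free states and actions during the rollout, then rebuild the graph with one forward pass at update time.

```python
import torch
from torch.distributions import Categorical

policy = torch.nn.Linear(4, 2)   # stand-in for your policy network

# Rollout: sample under no_grad so nothing keeps a graph alive.
states, actions = [], []
for _ in range(5):
    s = torch.randn(4)
    with torch.no_grad():
        m = Categorical(logits=policy(s))
        a = m.sample()
    states.append(s)
    actions.append(a)

# Update: one batched forward pass recreates the log_probs with a
# fresh graph, just for the duration of the loss computation.
batch = torch.stack(states)
m = Categorical(logits=policy(batch))
log_probs = m.log_prob(torch.stack(actions))
loss = -log_probs.sum()          # weight by your returns here
loss.backward()
print(policy.weight.grad is not None)   # True
```

You pay one extra forward pass per update, but memory stays flat no matter how long the rollout is.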