Stateful recurrence in constant space


I am quickly running out of memory, although I would have expected (or at least hoped) the subgraph to run in constant space.

My training data consists of a list of images. The number of images can vary from one to a couple thousand. For each training sample the image size is fixed, but between samples it can vary from 50 x 50 to 700 x 700. I want to aggregate/compress these images down to a single one and do a pixel-wise classification. I am currently using this ConvLSTMCell implementation:

What I am currently doing is: loop over the input images for each training sample, feed them into a smallish ResNet, and then feed the output into the ConvLSTMCell. Once the loop completes, I use the final hidden state of the ConvLSTMCell as input to another ResNet component that does the classification (NLLLoss2d).

for X, y in train_loader:
    state = None
    for X_ in X:  # iterate over the images of one training sample
        output = first_stage(Variable(X_).cuda())
        state = aggregate(output, state)  # ConvLSTMCell step

    output = classify(state[0])  # classify from the final hidden state

    loss = criterion(output, Variable(y).cuda())
    loss_ += loss.data[0]  # accumulate the scalar loss


In my previous setup I just stacked the channels of the images and only used a single (but bigger) ResNet. I could easily fit 100 images that way; with the loop above, I can fit three at most.

I suspect my issue is that the computational graphs generated in the loop are taking up my memory. Is it somehow possible to make this run in constant space? Do you have an idea for a different/better setup?
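For reference, the one workaround I have found so far is truncated backpropagation through time: call backward every few steps and then detach the state, so the graph (and therefore memory) is bounded by the window length instead of the sequence length. The trade-off is that gradients no longer flow across window boundaries, so earlier images only influence the recurrence through the forward pass. A minimal sketch with a placeholder `torch.nn.Linear` standing in for the ConvLSTMCell (the `window` size is a made-up hyperparameter):

```python
import torch

torch.manual_seed(0)
cell = torch.nn.Linear(4, 4)   # stand-in for the ConvLSTMCell
state = torch.zeros(1, 4)
window = 2                     # truncation length (hypothetical)

for t in range(6):
    state = torch.tanh(cell(state))
    if (t + 1) % window == 0:
        # Backprop through the last `window` steps only, then cut the
        # graph so it never grows beyond `window` steps.
        state.sum().backward()
        state = state.detach()
```

If you need full gradients through the whole sequence, detaching obviously will not do; gradient checkpointing (`torch.utils.checkpoint`) trades recomputation for memory instead, but that gives sublinear rather than constant memory.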

(In the first network I also have an embedding layer mapping from 378 to 10 channels; my channel counts are generally not bigger than 60.)