Model doesn't fit on a GPU

Hi there,
I have a simple RNN model, that has another pre-trained RNN for an encoder and a pretty simple decoder with attention.
It runs with batch_size=1, but anything bigger than that fails.
I tried to use nn.DataParallel, but it consistently hangs (PyTorch 0.4). (Note: I can never get all GPUs fully free; usually someone else is also running jobs on the cluster.)
I can’t really reduce my model parameters further due to the pre-trained tensor dimensions.
Any advice on what else can be done? Thanks!

How many hidden layers and hidden neurons do you have, and what is your seq_length? I assume the tensors are torch.float64?

Only 1 layer; the hidden size is 4800, and that's the one I can't reduce (I actually tried going down to 2400, same thing). seq_length = 36.
Yes, floats.
The problem is caused by the other pre-trained model: the moment I take it out, I can pass even larger vectors (say, random ones or ones generated by some other Embedding), but I need both models together.
I tried using the pre-trained model to pre-encode everything, but that would take about 2,000 hours, which is not feasible… so they have to sit together.

So you have 64-bit floats, seq_length=36, hidden_size=4800, num_layers=1, batch_size=4 (let's say):
mem_needed = 64 * 36 * 4800 * 1 * 4 = 44,236,800 bit ≈ 5.53 MByte of memory just for the RNN's hidden states. If I did the math right, you should not have any problems… mhhh.
What size are you passing into the other RNN?
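Spelling that arithmetic out (a back-of-the-envelope sketch; the 64-bit-per-element assumption comes from the question above, and this counts only the hidden states, not weights or optimizer buffers):

```python
# Rough memory estimate for the RNN hidden states alone.
# Assumed values from the thread: float64, seq_length=36,
# hidden_size=4800, num_layers=1, batch_size=4.
bits_per_element = 64
seq_length = 36
hidden_size = 4800
num_layers = 1
batch_size = 4

total_bits = bits_per_element * seq_length * hidden_size * num_layers * batch_size
total_megabytes = total_bits / 8 / 1_000_000  # bits -> bytes -> MB

print(total_bits)       # 44236800
print(total_megabytes)  # 5.5296
```

A few megabytes is nothing on a modern GPU, which is why the suspicion shifts to the other model.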

I hear you and agree. This is what's causing me trouble:

    (embedding): Embedding(43860, 620, padding_idx=0)
    (rnn): GRU(620, 2400, batch_first=True, dropout=0.25)
    (embedding): Embedding(43860, 620, padding_idx=0)
    (rnn): GRU(620, 1200, batch_first=True, dropout=0.25, bidirectional=True)

Models 1 & 2 are pre-trained, and even though I set requires_grad = False, the moment I add them nothing works. I can't really reduce anything in those two; they are pretty much a black box. Any advice for a creative workaround?
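One detail worth noting here (my addition, not from the thread): setting requires_grad = False stops the pre-trained weights from being updated, but autograd still stores the frozen models' activations for the backward pass unless you also run them under torch.no_grad(). A minimal sketch, assuming hypothetical module names frozen_encoder and decoder:

```python
import torch

# Hypothetical stand-ins for one pre-trained RNN and the downstream decoder.
frozen_encoder = torch.nn.GRU(620, 2400, batch_first=True)
decoder = torch.nn.Linear(2400, 10)

# Freeze the pre-trained weights (stops parameter updates)...
for p in frozen_encoder.parameters():
    p.requires_grad = False
frozen_encoder.eval()

x = torch.randn(4, 36, 620)  # (batch, seq_length, input_size)

# ...and also skip activation storage for the frozen part:
with torch.no_grad():
    encoded, _ = frozen_encoder(x)

# Gradients now flow only through the decoder's parameters.
out = decoder(encoded)
```

This can cut activation memory substantially when the frozen part is the large one, at the cost of not being able to fine-tune it.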

Okay, I think I see the problem: pretty large embeddings. You could check whether each model fits on your GPU separately, then write their outputs into a .csv file and use those outputs to train your own model.
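If the models do fit one at a time, the pre-encoding pass could look roughly like this (a sketch: pretrained_rnn, the stand-in data, and the output filename are placeholders; I'd also lean towards torch.save over .csv for tensors, to avoid float-to-text round-trips):

```python
import torch

# Placeholder for one of the pre-trained models from the thread.
pretrained_rnn = torch.nn.GRU(620, 1200, batch_first=True, bidirectional=True)
pretrained_rnn.eval()

# Stand-in for a real data loader: 3 batches of (batch, seq_length, input_size).
batches = [torch.randn(4, 36, 620) for _ in range(3)]

encoded_batches = []
with torch.no_grad():  # no graph needed, we only want the outputs
    for batch in batches:
        output, _ = pretrained_rnn(batch)
        encoded_batches.append(output)

# Save once; later, train the downstream model from this file
# without ever loading the pre-trained RNN again.
torch.save(torch.cat(encoded_batches), "encoded_outputs.pt")
```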

Yeah, that was my next approach.
Wouldn't the file-based I/O be a huge bottleneck, though?

You have to write your own Dataset class. I use pandas to load large .csv files; it is fast enough for my use case.
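A minimal version of such a Dataset could look like this (a sketch; the .csv layout, one flattened feature vector plus a label per row, is an assumption, not something specified in the thread):

```python
import pandas as pd
import torch
from torch.utils.data import Dataset


class CsvFeatureDataset(Dataset):
    """Serves pre-computed encoder outputs stored in a .csv file.

    Assumes one sample per row: every column except the last holds
    features, and the last column holds an integer label.
    """

    def __init__(self, csv_path):
        df = pd.read_csv(csv_path)
        self.features = torch.tensor(df.iloc[:, :-1].values, dtype=torch.float32)
        self.labels = torch.tensor(df.iloc[:, -1].values, dtype=torch.long)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.features[idx], self.labels[idx]
```

Wrapped in a torch.utils.data.DataLoader, this reads the file once up front and then serves batches from memory, so the per-step file I/O cost disappears.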
