Hi there,
I have a simple RNN model that uses another pre-trained RNN as an encoder and a fairly simple decoder with attention.
It runs with batch_size=1, but anything bigger than that fails.
I tried to use nn.DataParallel, but it consistently hangs (PyTorch 0.4). (Note: I can never get all GPUs fully free - usually someone else is running jobs on the cluster too.)
I can’t really reduce my model parameters further due to the pre-trained tensor dimensions.
Any advice on what else can be done? Thanks!
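Not an answer to the hang itself, but when the cluster's GPUs are only partially free, a common workaround is to expose only the GPUs you actually have to yourself (a sketch; the device indices and script name are placeholders - pick whatever `nvidia-smi` shows as idle):

```shell
# Expose only GPUs 0 and 1 to the process; nn.DataParallel will then
# replicate the model across just those two devices.
CUDA_VISIBLE_DEVICES=0,1 python train.py
```

Inside the script you can equivalently pass the devices explicitly, e.g. `nn.DataParallel(model, device_ids=[0, 1])`.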
How many hidden_layers, hidden_neurons, and seq_length do you have? I assume the tensors are torch.float64?
Only 1 layer, the hidden size is 4800, that’s the one I can’t reduce (actually I tried to go to 2400 - same thing), seq_length = 36.
Yes, floats.
The problem is caused by the other pre-trained model - the moment I take it out, I can pass even larger vectors (say, random ones or ones generated by some other Embedding), but I need both of them in one model.
I tried to use the pre-trained model to pre-encode everything, but that would take about 2,000 hours, which is not feasible… so they have to sit together.
So you have 64-bit floats, seq_length=36, hidden_size=4800, num_layers=1, batch_size=4 (let's say):
mem_needed = 64 * 36 * 4800 * 1 * 4 = 44,236,800 bit = 5,529,600 byte ≈ 5.53 MB of total memory just for the RNN state.
If I did the math right, you should not have any problems… hmmm.
What size are you passing into the other RNN?
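The arithmetic above can be double-checked with a few lines of Python (this is only the back-of-the-envelope estimate from this post, not an exact PyTorch memory profile):

```python
# Back-of-the-envelope memory estimate for the RNN state.
bits_per_elem = 64    # assuming 64-bit floats
seq_length = 36
hidden_size = 4800
num_layers = 1
batch_size = 4

bits = bits_per_elem * seq_length * hidden_size * num_layers * batch_size
megabytes = bits / 8 / 1e6
print(f"{bits} bit = {megabytes} MB")  # → 44236800 bit = 5.5296 MB
```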
I hear you and agree. This is what’s causing me trouble:
model1(
(embedding): Embedding(43860, 620, padding_idx=0)
(rnn): GRU(620, 2400, batch_first=True, dropout=0.25)
)
model2(
(embedding): Embedding(43860, 620, padding_idx=0)
(rnn): GRU(620, 1200, batch_first=True, dropout=0.25, bidirectional=True)
)
Models 1 & 2 are pre-trained, and even though I set requires_grad = False
, the moment I add them - nothing works. I can’t quite reduce anything in those two - they’re pretty much a black box. Any advice for a creative workaround?
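For reference, fully freezing the pre-trained encoders usually takes two steps: setting requires_grad = False on their parameters (so the optimizer never touches them) and running them under torch.no_grad() in the forward pass (so no activation graph is stored for them, which is where most of the memory goes). A minimal sketch with toy stand-ins for the two GRUs above:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the two pre-trained encoders described above.
model1 = nn.GRU(620, 2400, batch_first=True)
model2 = nn.GRU(620, 1200, batch_first=True, bidirectional=True)

# 1) Freeze the parameters so the optimizer never updates them.
for m in (model1, model2):
    for p in m.parameters():
        p.requires_grad = False
    m.eval()  # also disables dropout in the pre-trained parts

# 2) Run the frozen encoders without building an autograd graph,
#    which saves a large amount of activation memory.
x = torch.randn(4, 36, 620)  # (batch, seq_len, input_size)
with torch.no_grad():
    out1, _ = model1(x)
    out2, _ = model2(x)

print(out1.shape, out2.shape)  # both (4, 36, 2400); model2 is 1200 * 2 directions
```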
Okay, I think I see the problem: pretty large embeddings. You could check whether each model fits on your GPU separately, then write their outputs into a .csv file and use those outputs to train your own model.
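A rough sketch of that export step (the encoder, file name, and the "dataset" of random sequences are all placeholders; the real ones would come from your training code):

```python
import csv
import torch
import torch.nn as nn

# Toy frozen encoder standing in for one of the pre-trained models.
encoder = nn.GRU(620, 8, batch_first=True)
encoder.eval()

# Hypothetical dataset: a few sequences of shape (seq_len, input_size).
sequences = [torch.randn(36, 620) for _ in range(3)]

with open("encoded.csv", "w", newline="") as f:
    writer = csv.writer(f)
    with torch.no_grad():  # no autograd graph needed for pre-encoding
        for seq in sequences:
            _, h = encoder(seq.unsqueeze(0))       # h: (1, 1, hidden)
            writer.writerow(h.squeeze().tolist())  # one row per example
```

Your own model then trains against the .csv instead of re-running the expensive encoders every epoch.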
Yeah, that was my next approach.
Wouldn’t the file-based I/O be a huge bottleneck though?
You have to write your own Dataset
class. I use pandas
to load large .csv files; it is fast enough for my use.
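A minimal Dataset along those lines (the file name and column layout are assumptions). Loading the whole .csv once in __init__ means per-item access is an in-memory lookup, so file I/O only costs you once at startup:

```python
import pandas as pd
import torch
from torch.utils.data import Dataset

class EncodedCSVDataset(Dataset):
    """Serves rows of a pre-encoded .csv file as float tensors."""

    def __init__(self, csv_path):
        # Read the whole file up front; __getitem__ is then cheap.
        self.frame = pd.read_csv(csv_path, header=None)

    def __len__(self):
        return len(self.frame)

    def __getitem__(self, idx):
        row = self.frame.iloc[idx].values
        return torch.tensor(row, dtype=torch.float32)
```

Wrap it in a torch.utils.data.DataLoader to get batching and shuffling for free.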