Memory error while training a variable sequence length LSTM

CUDA out of memory. Tried to allocate 17179869176.57 GiB (GPU 0; 15.90 GiB total capacity; 8.57 GiB already allocated; 6.67 GiB free; 8.58 GiB reserved in total by PyTorch)

I am working with a text dataset of 50 to 60 data points. Each sequence has about 200K tokens on average, and the longest sequence has about 500K tokens. The GPU has about 16 GB of memory, hence the memory error. Any suggestions on how to circumvent this issue?

A sequence of ~500K tokens is quite large, so it won’t fit on the device :wink:
Could you post the complete shape of your inputs?

If this memory footprint is expected and not caused by a bug, then you would have to reduce the input shape and the model significantly.
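For illustration, here is a minimal sketch (not from the thread) of one way to shrink the input: splitting each very long token sequence into fixed-size chunks before batching. MAX_LEN and chunk_sequence are hypothetical names.

MAX_LEN = 4096  # hypothetical cap chosen to fit the GPU

def chunk_sequence(token_ids, max_len=MAX_LEN):
    # Split one long list of token ids into fixed-size chunks,
    # which can then be processed one at a time (e.g. with truncated BPTT).
    return [token_ids[i:i + max_len] for i in range(0, len(token_ids), max_len)]

# A 500K-token document becomes ~123 chunks of 4096 tokens each.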

Yeah :sweat_smile:.

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, vocab_len, input_size):
        super(Model, self).__init__()
        self.embed = nn.Embedding(num_embeddings=vocab_len, embedding_dim=128)
        self.lstm = nn.LSTM(input_size=input_size, hidden_size=1024, num_layers=1)
        self.fc = nn.Linear(in_features=1024, out_features=1)

    def forward(self, x):
        x = self.embed(x)
        o, (ht, ct) = self.lstm(x)
        out = self.fc(ct)
        return torch.sigmoid(out)  # nn.Sigmoid is a module class; apply torch.sigmoid to the tensor instead

This is the model.
The input shape is (2, 560874) [batch_size, seq_len].
Am I making a blunder?

The input tensor itself will just use ~4.2MB and the linear layer is also quite small.
How large is vocab_len, and is input_size=560874?
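As a rough sanity check of the ~4.2MB figure (a back-of-the-envelope sketch, assuming 4-byte elements):

batch_size, seq_len = 2, 560874
input_bytes = batch_size * seq_len * 4       # 4 bytes per element
print(input_bytes / 1024**2)                 # ≈ 4.28 MB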

Yes, the vocab_len is 513.

Thanks for the information.
It seems the nn.LSTM might use too much memory for your device, as it would need ~9GB.
However, the error message itself is wrong and reports a bogus allocation size. We’ve recently fixed a similar issue for convolutions, so I’ll check this one against the latest master build.
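A rough estimate of where the ~9GB comes from (a sketch, assuming float32 weights): with input_size=560874, the input-to-hidden weight matrix of the LSTM alone holds 4 * hidden_size * input_size parameters.

input_size, hidden_size = 560874, 1024
w_ih_params = 4 * hidden_size * input_size   # ≈ 2.3e9 parameters for weight_ih_l0
print(w_ih_params * 4 / 1024**3)             # ≈ 8.56 GiB in float32, before gradients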


I figured it out: the text preprocessing was wrong. Moving on to my next question.

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, emb_sz, hidden_sz, num_c):
        super(Model, self).__init__()
        self.embed = nn.Embedding(num_embeddings=len(TEXT.vocab), embedding_dim=emb_sz)
        self.lstm = nn.LSTM(input_size=emb_sz, hidden_size=hidden_sz, num_layers=1)
        self.fc = nn.Linear(in_features=hidden_sz, out_features=num_c)

    def forward(self, x):
        x = x.permute(1, 0)         # [batch, seq_len] -> [seq_len, batch]
        x = self.embed(x)           # [seq_len, batch, emb_sz]
        o, _ = self.lstm(x)         # [seq_len, batch, hidden_sz]
        o = o[-1]                   # last time step; o[:, -1, :] would pick the last batch element instead
        out = self.fc(o)
        return torch.sigmoid(out)   # F.sigmoid is deprecated

This is my model.
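A quick shape check of that forward pass with standalone layers (hypothetical sizes; this stands in for the Model class, which needs TEXT.vocab from torchtext):

import torch
import torch.nn as nn

vocab_size, emb_sz, hidden_sz, num_c = 513, 128, 1024, 1
embed = nn.Embedding(vocab_size, emb_sz)
lstm = nn.LSTM(input_size=emb_sz, hidden_size=hidden_sz, num_layers=1)
fc = nn.Linear(hidden_sz, num_c)

x = torch.randint(0, vocab_size, (2, 1000))   # [batch_size, seq_len]
x = x.permute(1, 0)                           # [seq_len, batch_size]
o, _ = lstm(embed(x))                         # [seq_len, batch_size, hidden_sz]
out = torch.sigmoid(fc(o[-1]))                # last time step -> [batch_size, num_c]
print(out.shape)                              # torch.Size([2, 1])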

import torch
from tqdm import tqdm
from sklearn.metrics import accuracy_score   # assumed source of accuracy_score

def train(model, loss, opt, num_epochs, train_dl, valid_dl):
    model = model.cuda()
    for i in tqdm(range(num_epochs)):
        print('*' * 5 + f'Training epoch {i+1}' + '*' * 5)
        model.train()
        for (x, y) in train_dl:
            x = x.cuda()                              # Variable is deprecated; plain tensors work
            y = torch.FloatTensor(y).cuda()
            pred = model(x)
            error = loss(pred, y)
            opt.zero_grad()
            error.backward()
            opt.step()
        model.eval()
        for (x1, y1) in valid_dl:
            with torch.no_grad():
                x1 = x1.cuda()
                y1 = y1.cuda()
                pred1 = model(x1)
                error1 = loss(pred1, y1)
                preds = list(map(prediction, pred1))  # `prediction` is a user-defined helper
                accuracy = accuracy_score(y1.cpu(), preds)
                print(f'Valid loss: {error1}\n Accuracy Score: {accuracy}')
        torch.save(model.state_dict(), f'model{i+1}.pth')  # state_dict() must be called, not passed as a method

This is my training loop.
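For context, the loop might be invoked like this (a hypothetical setup; the loss function, optimizer, hidden size, and data iterators are assumptions, not stated in the thread):

import torch.nn as nn
import torch.optim as optim

model = Model(emb_sz=128, hidden_sz=1024, num_c=1)
loss_fn = nn.BCELoss()                          # matches the sigmoid output
opt = optim.Adam(model.parameters(), lr=1e-3)
train(model, loss_fn, opt, num_epochs=10, train_dl=train_dl, valid_dl=valid_dl)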

I am getting the following error after running about 7 epochs:

CUDA out of memory. Tried to allocate 4.43 GiB (GPU 0; 15.90 GiB total capacity; 6.42 GiB already allocated; 4.42 GiB free; 10.83 GiB reserved in total by PyTorch) (malloc at /opt/conda/conda-bld/pytorch_1587428398394/work/c10/cuda/CUDACachingAllocator.cpp:289)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x4e (0x7f14c4533b5e in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)
Any insights would be appreciated…
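Since the allocation only fails after several epochs, one way to narrow it down (a hedged suggestion, not an answer from the thread) is to log the allocator statistics after each epoch and see whether usage grows steadily:

import torch

def log_gpu_memory(epoch):
    # Report how much memory is currently allocated vs. reserved by the caching allocator.
    allocated = torch.cuda.memory_allocated() / 1024**3
    reserved = torch.cuda.memory_reserved() / 1024**3
    print(f'epoch {epoch}: allocated {allocated:.2f} GiB, reserved {reserved:.2f} GiB')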