Memory issue during validation/test

I’m having a problem where my network gets through the first epoch on the training set with no problem, but when it tries to calculate the validation or test accuracy, I get an ‘out of memory’ error.

I modified the LSTMCell function in nn/_functions/rnn.py, with the aim of doing a little clustering of each neuron’s activation against that same neuron’s activations across the rest of the batch.

    def LSTMCell(input, hidden, w_ih, w_hh, count, b_ih=None, b_hh=None):
        hx, cx = hidden
        gates = F.linear(input, w_ih, b_ih) + F.linear(hx, w_hh, b_hh)

        # five gates instead of the usual four: the extra "sensitivity" gate
        # scales the clustering term below; shapes after chunking: (batch, hidden)
        ingate, forgetgate, cellgate, outgate, sensitivitygate = gates.chunk(5, 1)
        batch_size = cellgate.size(0)

        # a: (batch, hidden, 3*batch), broadcast along the new 3rd dimension
        a = F.relu(sensitivitygate)
        a = torch.unsqueeze(a, 2)
        a = a.expand(-1, -1, 3 * batch_size)

        ingate = F.sigmoid(ingate)
        forgetgate = F.sigmoid(forgetgate)
        cellgate = F.tanh(cellgate)
        outgate = F.sigmoid(outgate)
        cy = (forgetgate * cx) + (ingate * cellgate)

        # stack the candidate output tanh(cy), the previous hidden state hx and
        # the cell state cx along the batch dimension: (3*batch, hidden), then
        # stretch the square into a cube of shape (3*batch, hidden, 3*batch)
        position = F.tanh(cy)
        position = torch.cat((position, hx, cx), 0)
        position = torch.unsqueeze(position, 2)
        position = position.expand(-1, -1, 3 * batch_size)

        # pairwise squared distances per neuron; this is the line that runs out of memory
        distance = (position - torch.transpose(position, dim0=0, dim1=2)) ** 2
        distance = 1 / (1 + a * distance[0:batch_size, :, :])

        # distance-weighted average of the activations, per neuron
        weighted_avg = torch.sum(distance * position[0:batch_size, :, :], 2)
        weighted_avg = weighted_avg / torch.sum(distance, 2)

        hy = outgate * weighted_avg
        return hy, cy

The error occurs in the `distance = (position - torch.transpose(position, dim0=0, dim1=2))**2` line.
In this line I subtract a transposed version of the tensor from the original (I added a 3rd dimension and swap it with the batch_size dimension, stretching the square into a cube), to find the distance between each activation and every other activation in that batch, for that neuron.
I can imagine that this line is very memory-hungry, but the network seems to be able to manage it for the training data.
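
To put rough numbers on it: `position` and `distance` are `(3*batch, hidden, 3*batch)` tensors, so the memory for this line grows quadratically with the batch size. A back-of-the-envelope sketch (the helper function and the numbers are purely illustrative: 150 is a guessed hidden size, and 872 happens to be the size of the SST-2 dev split, used here as a hypothetical large batch):

    # hypothetical numbers, purely to illustrate the quadratic scaling
    def distance_tensor_bytes(batch_size, hidden_size, bytes_per_float=4):
        # position/distance are (3*batch, hidden, 3*batch) float32 tensors
        return (3 * batch_size) * hidden_size * (3 * batch_size) * bytes_per_float

    print(distance_tensor_bytes(32, 150) / 1e6)   # ~5.5 MB for a batch of 32
    print(distance_tensor_bytes(872, 150) / 1e9)  # ~4.1 GB for a batch of 872

So a batch a few dozen times larger than the training batch would need a few hundred times the memory for this one intermediate alone, and the subtraction and squaring each allocate a temporary of that size.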

Specific error:

    RuntimeError: cuda runtime error (2) : out of memory at c:\anaconda2\conda-bld\pytorch_1519496000060\work\torch\lib\thc\generic/THCStorage.cu:58

Please help me figure out why I’m getting memory issues when the network gets to the validation set, and how I can resolve this!
It would be enormously appreciated!
Thanks!!
:slight_smile:

Could you post your training and validation code?
Do you use the same batch size for training and validation?

Hi ptrblck,
Thank you so much for your response!

Admittedly, I’m just using a network I found online (‘pytorch-sentiment-classification-master’), so I can’t be entirely sure, but I do believe that `BATCH_SIZE` is the same for both training and validation.

(Worth noting: everything is fine when I don’t change the LSTMCell at all, and other modifications such as adding extra layers cause no issue. It would appear that subtracting a 3-dimensional tensor from a transpose of itself and squaring the result makes this specific modification take up a tremendous amount of memory, but for some reason it only overwhelms the GPU during validation and testing.)
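
To double-check, I guess I could print the shape of the first batch each iterator actually delivers (a quick sketch, assuming the torchtext iterators shown below; with torchtext’s default `batch_first=False`, `batch.text` should be `(seq_len, batch_size)`):

    # quick sketch: peek at the batch dimension each iterator actually delivers
    for batch in train_iter:
        print('train batch:', batch.text.size())  # (seq_len, batch_size)
        break
    for batch in dev_iter:
        print('dev batch:', batch.text.size())
        break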

Below is a more extensive copy of the code for training, validation and testing:

    # data loading
    train, dev, test = data.TabularDataset.splits(path='./SST2/', train='train.tsv',
                                                  validation='dev.tsv', test='test.tsv', format='tsv',
                                                  fields=[('text', text_field), ('label', label_field)])

    # note the batch sizes: batch_size for training, but the entire dev/test
    # split as a single batch for validation and testing
    train_iter, dev_iter, test_iter = data.BucketIterator.splits(
        (train, dev, test),
        batch_sizes=(batch_size, len(dev), len(test)),
        sort_key=lambda x: len(x.text), repeat=False, device=None)

    # main training / evaluation script
    label_field = data.Field(sequential=False)
    train_iter, dev_iter, test_iter = load_sst(text_field, label_field, BATCH_SIZE,
                                               Difficulty=Difficulty, cuda=cuda)
    model = Using(embedding_dim=EMBEDDING_DIM, hidden_dim=HIDDEN_DIM,
                  vocab_size=len(text_field.vocab), use_gpu=cuda,
                  label_size=len(label_field.vocab) - 1, batch_size=BATCH_SIZE)

    for epoch in range(EPOCHS):
        avg_loss, acc = train_epoch_progress(model, train_iter, loss_function,
                                             optimizer, text_field, label_field, epoch)
        tqdm.write('Train: loss %.2f acc %.1f' % (avg_loss, acc * 100))
        dev_acc = evaluate(model, dev_iter, loss_function, 'Dev')
        if dev_acc > best_dev_acc:
            if best_dev_acc > 0:
                os.system('rm ' + out_dir + '/best_model' + '.pth')
            best_dev_acc = dev_acc
            best_model = model
            torch.save(best_model.state_dict(), out_dir + '/best_model' + '.pth')
            # evaluate on test with the best dev performance model
            test_acc = evaluate(best_model, test_iter, loss_function, 'Test')

    test_acc = evaluate(best_model, test_iter, loss_function, 'Final Test')
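
(Looking at that `BucketIterator.splits` call again, `batch_sizes=(batch_size, len(dev), len(test))` looks like it hands the whole dev/test split over as one giant batch. If that turns out to be the problem, I suppose the simplest change would be to reuse the training batch size for all three splits; a sketch of what I mean:)

    # sketch: same (small) batch size for all three splits, rather than
    # one whole-split batch for dev and test
    train_iter, dev_iter, test_iter = data.BucketIterator.splits(
        (train, dev, test),
        batch_sizes=(batch_size, batch_size, batch_size),
        sort_key=lambda x: len(x.text), repeat=False, device=None)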

Thanks again!