Memory issue at validation test

My network gets through the first epoch of training without any problem, but when it tries to calculate the validation or test accuracy, I get an ‘out of memory’ error.

I made a modification to the LSTMCell function in nn/_functions/, with the aim of doing a little clustering on the activations within a single node based on each node’s other activations within a batch.

    def LSTMCell(input, hidden, w_ih, w_hh, count, b_ih=None, b_hh=None):
        hx, cx = hidden
        gates = F.linear(input, w_ih, b_ih) + F.linear(hx, w_hh, b_hh)

        ingate, forgetgate, cellgate, outgate, sensitivitygate = gates.chunk(5, 1)            
        batch_size = cellgate.size(0) 
        a = F.relu(sensitivitygate)                              
        a = torch.unsqueeze(a,2)
        a = a.expand(-1, -1, 3*batch_size)       
        ingate = F.sigmoid(ingate)
        forgetgate = F.sigmoid(forgetgate)          
        cellgate = F.tanh(cellgate)
        outgate = F.sigmoid(outgate)   
        cy = (forgetgate * cx) + (ingate * cellgate)
        position = F.tanh(cy)
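        # Stack the new activations with hx and cx along the batch dimension (3*batch_size rows).
        position = torch.cat((position, hx, cx), 0)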
        position = torch.unsqueeze(position,2)
        position = position.expand(-1,-1, 3*batch_size)
        distance = (position - torch.transpose(position,dim0=0,dim1=2))**2
        distance = 1/(1+a*distance[0:batch_size,:,:])
        weighted_avg = torch.sum((distance*position[0:batch_size,:,:]),2)  
        weighted_avg = weighted_avg/torch.sum(distance,2)

        hy = outgate *  weighted_avg   
        return hy, cy

The error occurs in the distance = (position - torch.transpose(position,dim0=0,dim1=2))**2 line.
In this line I subtract a transposed matrix from the original (I transposed the batch_size dimension, and a 3rd dimension that I added, to stretch the square into a cube), to find the distance between each activation and each other activation in that batch, for that neuron.
I can imagine that this line is very demanding, but the network seems to be able to manage it for the training data.
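
To get a feel for how quickly this line grows, here is a rough back-of-the-envelope sketch. It assumes float32 activations and counts only a single tensor of shape (3*batch_size, hidden, 3*batch_size), which is the shape of `distance` before slicing. The helper name `distance_tensor_bytes`, the hidden size of 150, and the batch sizes are made up for illustration and are not taken from the network above.

```python
def distance_tensor_bytes(batch_size, hidden_dim, bytes_per_element=4):
    """Approximate size in bytes of one (3B, H, 3B) tensor."""
    side = 3 * batch_size  # position is stacked from three (B, H) tensors
    return side * hidden_dim * side * bytes_per_element


# Hypothetical sizes, only to show the trend.
for b in (16, 64, 256, 1024):
    mib = distance_tensor_bytes(b, hidden_dim=150) / 2**20
    print("batch_size=%5d -> about %.1f MiB per tensor" % (b, mib))
```

The size grows with the square of the batch size, so doubling the batch quadruples this one tensor, and the subtraction and squaring may each hold further intermediates of the same shape.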

Specific Error:
RuntimeError: cuda runtime error (2) : out of memory at c:\anaconda2\conda-bld\pytorch_1519496000060\work\torch\lib\thc\generic/``

Please help me figure out why I’m getting memory issues when the network gets to the validation set, and how I can resolve this!
It would be enormously appreciated!

Could you post your training and validation code?
Do you use the same batch size for training and validation?

Hi ptrblck,
Thank you so much for your response!

Admittedly, I just use a network that I found online (‘pytorch-sentiment-classification-master’) so I can’t be entirely sure. But I do believe that Batch_Size is the same for both training and validation.

(It is worth noting that everything works when I leave the LSTMCell unchanged, and I can make other modifications, such as adding extra layers, with no issue. It appears that subtracting a 3-dimensional tensor from a transpose of itself and squaring the result takes up a tremendous amount of memory, but for some reason it only overwhelms the GPU during validation and testing.)

Below is a more extensive copy of the code for training, validation and testing:

    train, dev, test = data.TabularDataset.splits(path='./SST2/', train='train.tsv',
                                                  validation='dev.tsv', test='test.tsv', format='tsv',
                                                  fields=[('text', text_field), ('label', label_field)]) 

    train_iter, dev_iter, test_iter = data.BucketIterator.splits((train, dev, test),
                 batch_sizes=(batch_size, len(dev), len(test)), sort_key=lambda x: len(x.text), repeat=False, device=None)
    label_field = data.Field(sequential=False)
    train_iter, dev_iter, test_iter = load_sst(text_field, label_field, BATCH_SIZE, Difficulty = Difficulty, cuda = cuda)
    model = Using(embedding_dim=EMBEDDING_DIM, hidden_dim=HIDDEN_DIM, vocab_size=len(text_field.vocab),
                  use_gpu=cuda, label_size=len(label_field.vocab)-1, batch_size=BATCH_SIZE)

    for epoch in range(EPOCHS):
        avg_loss, acc = train_epoch_progress(model, train_iter, loss_function, optimizer, text_field, label_field, epoch)
        tqdm.write('Train: loss %.2f acc %.1f' % (avg_loss, acc*100))
        dev_acc = evaluate(model, dev_iter, loss_function, 'Dev')
        if dev_acc > best_dev_acc:
            if best_dev_acc > 0:
                os.system('rm '+ out_dir + '/best_model' + '.pth')
            best_dev_acc = dev_acc
            best_model = model
            torch.save(best_model.state_dict(), out_dir + '/best_model' + '.pth')
            # evaluate on test with the best dev performance model
            test_acc = evaluate(best_model, test_iter, loss_function, 'Test')
    test_acc = evaluate(best_model, test_iter, loss_function, 'Final Test')

Thanks again!
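
One thing that may be worth checking in the iterator setup above: `batch_sizes=(batch_size, len(dev), len(test))` passes one batch size per split, so the dev and test splits would each be served as a single batch containing the whole split. The sketch below only illustrates that mapping in plain Python. `batches_per_split` and the split sizes are hypothetical; this is not torchtext code, and the numbers are not the real SST-2 sizes.

```python
import math


def batches_per_split(split_sizes, batch_sizes):
    """Map each split name to (batch size, number of batches per epoch)."""
    return {
        name: (bs, math.ceil(n / bs))
        for (name, n), bs in zip(split_sizes.items(), batch_sizes)
    }


# Hypothetical split sizes, for illustration only.
split_sizes = {"train": 6000, "dev": 800, "test": 1800}
training_batch_size = 32

# Same pattern as in the post: the training batch size for train,
# and the full split length for dev and test.
plan = batches_per_split(
    split_sizes,
    batch_sizes=(training_batch_size, split_sizes["dev"], split_sizes["test"]),
)
for name, (bs, n_batches) in plan.items():
    print("%-5s batch size %4d, %3d batch(es)" % (name, bs, n_batches))
```

If that is what happens here, the pairwise tensor in the modified cell would be built for a much larger batch during evaluation than during training, which could explain why only validation and testing run out of memory.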