Why is testing a CNN taking a lot of memory?

Hi,

I am training a very simple CNN. Training goes well, but during testing I get a RuntimeError about not having enough memory. I am new to PyTorch. I am posting the testing portion of the code below; any help will be highly appreciated!

import math

import numpy as np
import torch.nn as nn

# net, test_dataloader, and test_dataset are defined in the training portion (not shown)
loss_function = nn.MSELoss()
net.eval()
Variable_store_3 = np.empty((0,1)) 
Variable_store_1 = []  
for epoch in range(1):
    running_loss = 0
    Variable_store_2 = []
    for batch in test_dataloader:
        
        x, y = batch
        outputs = net(x)
        Variable_store_1.extend(outputs)  # for plotting purposes
        Variable_store_2.extend(y)        # for plotting purposes
        outputs_np = outputs.detach().numpy()
        Variable_store_3 = np.append(Variable_store_3, outputs_np, axis=0)
        loss = loss_function(outputs, y)
        running_loss += loss.item() * x.size(0)  # batch size (last batch may be smaller than 128)
        
    final_loss = math.sqrt(running_loss / len(test_dataset))
    print(f"{epoch+1} epoch | testing loss = {final_loss}") 

Could you wrap your testing code in a with torch.no_grad() block?
This would avoid storing the intermediate tensors, which would otherwise be kept to calculate the gradients in the backward pass.

Also, you might want to define separate training and validation functions, if that’s not already the case.
Since Python uses function scoping, some tensors from the training would additionally be freed once the function returns.
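
A minimal sketch of what this split could look like (the function signatures are just for illustration, and the torch.no_grad() suggestion from above is included in the evaluation function):

def train(net, train_dataloader, optimizer, loss_function):
    net.train()
    for x, y in train_dataloader:
        optimizer.zero_grad()
        loss = loss_function(net(x), y)
        loss.backward()
        optimizer.step()
    # tensors created here go out of scope once the function returns

def evaluate(net, test_dataloader, loss_function):
    net.eval()
    running_loss = 0.0
    with torch.no_grad():  # no computation graphs are stored during evaluation
        for x, y in test_dataloader:
            outputs = net(x)
            running_loss += loss_function(outputs, y).item() * x.size(0)
    return running_loss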

How large is your data? I think this happens because you are appending all the outputs to Variable_store_3, and all of this data can’t fit in memory. Also try using torch.no_grad(), which tells PyTorch that you are not saving the gradients for backprop (because you are not training):

for batch in test_dataloader:
    with torch.no_grad():
        x, y = batch
        outputs = net(x)
        Variable_store_1.extend(outputs)  # for plotting purposes
        Variable_store_2.extend(y)        # for plotting purposes
        outputs_np = outputs.detach().numpy()  # detach() is optional inside no_grad()
        Variable_store_3 = np.append(Variable_store_3, outputs_np, axis=0)
        loss = loss_function(outputs, y)
        running_loss += loss.item() * x.size(0)  # batch size (last batch may be smaller)
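
As a side note, np.append copies the entire array on every call, so if the collected predictions themselves grow large, it is usually cheaper to collect the per-batch arrays in a list and concatenate once at the end. A rough sketch of that pattern:

outputs_list = []
with torch.no_grad():
    for x, y in test_dataloader:
        # inside no_grad() the outputs carry no graph, so .numpy() works directly
        outputs_list.append(net(x).cpu().numpy())
all_outputs = np.concatenate(outputs_list, axis=0)  # single concatenation at the end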
        

Thank you very much for replying and for the solution.

As you advised, I have wrapped my testing code in torch.no_grad(). I just did a short training run followed by testing, and it seems the problem is solved! I will now do a longer training and testing run and will post an update here.

Again, thank you very much.

Thank you for your reply and for the solution.

My testing data originally contained 6,080,000 samples.
It was then downsampled to 23,850,
and the resulting tensor had shape torch.Size([23750, 100, 1]).
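
For reference, assuming float32 values, the stored predictions alone are quite small, so the memory blow-up most likely came from the autograd graphs kept alive for every batch before torch.no_grad() was added:

# rough size of a [23750, 100, 1] float32 tensor (4 bytes per element)
print(23750 * 100 * 1 * 4 / 1024**2, "MiB")  # ~9.06 MiB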