Out of memory error after 15 epochs on WideResNet

I am getting a generic out of memory error after 15 epochs. It seems like some variable is accumulating over time and then triggers this error. My program was working fine in an earlier version of PyTorch; I upgraded to version 0.4.0, where the volatile keyword is deprecated.

So I made the following changes to my code:

def trans_func(x):
    with torch.no_grad():
        return (Variable(x.unsqueeze(0)))

which is being used here:

        transform_train = transforms.Compose([
                transforms.ToTensor(),
                transforms.Lambda(lambda x: F.pad(trans_func(x),(4,4,4,4),mode='reflect').data.squeeze()),

which was previously:

        transform_train = transforms.Compose([
                transforms.ToTensor(),
                transforms.Lambda(lambda x: F.pad(Variable(x.unsqueeze(0), requires_grad=False, volatile=True),(4,4,4,4),mode='reflect').data.squeeze()),

But now I am getting a CUDA out of memory error after 15 epochs. I am deleting all temporary variables after each epoch and calling torch.cuda.empty_cache() too, but to no avail. Am I missing something here?
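
For reference, the kind of end-of-epoch cleanup I mean looks roughly like this (a simplified, self-contained sketch; the tiny model and the output/loss names are just placeholders, not my actual training code):

import torch
import torch.nn as nn

model = nn.Linear(10, 2).cuda()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(3):
    for _ in range(5):  # stand-in for the real training loop
        data = torch.randn(8, 10, device='cuda')
        target = torch.randint(0, 2, (8,), dtype=torch.long, device='cuda')
        output = model(data)
        loss = criterion(output, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    del output, loss              # drop references to the last batch's tensors
    torch.cuda.empty_cache()      # release cached, unused blocks back to the GPU driver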

Thanks!

I’m not sure torch.no_grad() is in the right place.
As you can see in the source code, the previous grad mode is restored once the __exit__ method is called.
Since you are returning from inside the with block, gradient tracking will be enabled again (if it was enabled before) as soon as the transformation is done.

Have a look at this small example:

import torch

def trans_func(x):
    print('Before', x.requires_grad)
    with torch.no_grad():
        # no_grad does not change the flag on an already existing tensor ...
        print('After ', x.requires_grad)
        c = x * 2
        # ... but results created inside the block do not require grad
        print('New calculation ', c.requires_grad)
        return c

x = torch.randn(1, 2, 3, requires_grad=True)
print('Grad enabled ', torch.is_grad_enabled())  # True
c = trans_func(x)
print('c.requires_grad ', c.requires_grad)       # False
print('Grad enabled ', torch.is_grad_enabled())  # True again: the context was exited on return

I’m not sure if this is what you would like to have.
Could you explain your use case a bit?

Thanks @ptrblck! I get your point, and I see that I am not doing this correctly. Let me explain myself again.

I have older code which contains this snippet:

 transform_train = transforms.Compose([
                transforms.ToTensor(),
                transforms.Lambda(lambda x: F.pad(Variable(x.unsqueeze(0), requires_grad=False, volatile=True),(4,4,4,4),mode='reflect').data.squeeze()),

The relevant part is this expression:
Variable(x.unsqueeze(0), requires_grad=False, volatile=True)

Now, since volatile is deprecated, I was wondering what changes I should make to the existing code so that it still works. Since it has been replaced by with torch.no_grad(), I was trying that out.

Now, as you rightly pointed out, the way I have written it would not work. So how should I go about implementing this?
Regards,
Nitin

You should wrap your whole evaluation / test code in torch.no_grad().
Have a look at the MNIST example.
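
For illustration, a minimal sketch of what that could look like (the model, test_loader, and device names here are placeholders, not the actual MNIST example code):

import torch

def evaluate(model, test_loader, device):
    model.eval()
    correct = 0
    # No graph is recorded inside this block, so intermediate activations
    # are freed right away and memory does not accumulate across batches.
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            pred = output.max(dim=1)[1]          # index of the highest logit
            correct += (pred == target).sum().item()
    return correct / len(test_loader.dataset)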

Would this work for you?
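
As an aside, if you also want to port the transform line itself, one possible sketch is to drop the Variable wrapper and the volatile flag entirely, since after the tensor/Variable merge in 0.4.0 the tensor produced by ToTensor does not require gradients and F.pad works on it directly (the surrounding Compose here mirrors your snippet, not your full pipeline):

import torch.nn.functional as F
from torchvision import transforms

transform_train = transforms.Compose([
    transforms.ToTensor(),
    # The output of ToTensor has requires_grad=False, so no graph is built here
    # and no Variable wrapper or volatile flag is needed.
    transforms.Lambda(
        lambda x: F.pad(x.unsqueeze(0), (4, 4, 4, 4), mode='reflect').squeeze()),
])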

Thanks @ptrblck! I tried it the way you suggested!
It helped!