How does returning volatile output in forward function make a difference?

Sohil_Newa · February 24, 2018, 4:38am

I am currently experimenting with the FreezeOut code of Andy Brocks.

I am trying to precompute the output of a layer. To remove the layer from training completely, I detach its output and set the volatile flag of the detached output to False. I am able to train the model.

if self.active and not test:
    out = self.conv1(F.relu(self.bn1(x)))
    out = self.conv2(F.relu(self.bn2(out)))
    out = torch.cat((x, out), 1)
    if(self.layer_index < DenseNet.freezeLayerIndex):
        detach = out.detach()
        detach.volatile = False
        Bottleneck.out_saved[self.counter] = detach
        self.counter+=1
        if(self.counter >= self.maxCounter):
            self.counter = 0
            self.active = False

But when I use the trained model to validate on test data, it throws an “argument 0 is not a Variable” error. My test function is below:

def test_fn(x, y):
    output = net(V(x, volatile=True))
    test_loss = F.nll_loss(output, V(y, volatile=True)).data[0]
    # Get the index of the max log-probability as the prediction.
    pred = output.data.max(1)[1].cpu()
    test_error = pred.ne(y).sum()

    return test_loss, test_error

I am not sure what the error is, since the input x that I pass in is a Variable.

ajbrock · February 25, 2018, 1:20pm

Brock* Can you post your full code? It’s not clear exactly what you’re passing from the forward() method, but just based on that snippet it looks like you might be passing a list somehow?

Sohil_Newa · February 25, 2018, 4:21pm

Sorry for the typo. It’s a list where I am storing the output… Here’s the Bottleneck code:

class Bottleneck(nn.Module):

    out_saved = [0] * 900

    def __init__(self, nChannels, growthRate, layer_index, train_size, test_size, batch_sz):
        super(Bottleneck, self).__init__()
        interChannels = 4*growthRate
        self.bn1 = nn.BatchNorm2d(nChannels)
        self.conv1 = nn.Conv2d(nChannels, interChannels, kernel_size=1,
                               bias=False)
        self.bn2 = nn.BatchNorm2d(interChannels)
        self.conv2 = nn.Conv2d(interChannels, growthRate, kernel_size=3,
                               padding=1, bias=False)

        # If the layer is still being trained
        self.active=True

        # The index of this layer relative to the overall net
        self.layer_index=layer_index

        #Change
        self.counter = 0
        self.train_size = train_size
        self.test_size = test_size
        self.batch_sz = batch_sz
        #self.outList = []
        remainder = train_size % batch_sz
        print("remainder = ", remainder)

        self.maxCounter = train_size//batch_sz

        print("self.maxCounter", self.maxCounter)

        if(remainder != 0):
            self.maxCounter += 1

        out_saved = [0] * (self.maxCounter)
        #print(out_saved)

    def forward(self, x):
        test = DenseNet.test
        # If we're not training this layer, set to eval mode so that we use
        # running batchnorm stats (both for time-saving and to avoid updating
        # said stats).
        if not self.active:
            self.eval()
        # While Validation , Return the original Output
        if test:
            #print test

            out = self.conv1(F.relu(self.bn1(x)))
            out = self.conv2(F.relu(self.bn2(out)))
            out = torch.cat((x, out), 1)
            return out

        # If we're  active, return a detached output to prevent backprop.
        if self.active and not test:
            out = self.conv1(F.relu(self.bn1(x)))
            out = self.conv2(F.relu(self.bn2(out)))

            out = torch.cat((x, out), 1)

            # print out
            if(self.layer_index < DenseNet.freezeLayerIndex):
                #Store The detach instead of output as we will be returning that.
                detach = out.detach()
                detach.volatile = False

                #print('Inside ',self.counter)
                Bottleneck.out_saved[self.counter] = detach
                self.counter+=1
                if(self.counter >= self.maxCounter):
                    self.counter = 0
                    self.active = False

            return out

        elif not test:
            detach = Bottleneck.out_saved[self.counter]
            self.counter+=1
            if(self.counter >= self.maxCounter):
                self.counter = 0
            return detach

ajbrock · February 26, 2018, 9:01am

So I see a couple of issues, the first being that on the lines where you say “Bottleneck.out_saved[self.counter]” that’s assigning the detached out value to a list which isn’t pegged to this layer, but as far as I can tell is pegged to a list which is shared across all Bottleneck instances? I think what you meant to do is use “self.out_saved[self.counter]”?
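
A minimal sketch of that difference, in plain Python with no PyTorch involved. The class names `SharedLayer` and `OwnLayer` are hypothetical and exist only for illustration. A list defined in the class body is a single object shared by every instance, while a list assigned to `self` in `__init__` belongs to that instance alone. Note also that a bare `out_saved = ...` inside `__init__` only creates a local variable and leaves the class-level list untouched.

```python
class SharedLayer:
    # Defined in the class body: one list shared by all instances.
    out_saved = [0] * 3

    def save(self, index, value):
        SharedLayer.out_saved[index] = value


class OwnLayer:
    def __init__(self, size):
        # Assigned on self: each instance gets its own list.
        self.out_saved = [0] * size

    def save(self, index, value):
        self.out_saved[index] = value


a, b = SharedLayer(), SharedLayer()
a.save(0, "from a")
b.save(0, "from b")          # overwrites what a stored
shared_result = a.out_saved[0]

c, d = OwnLayer(3), OwnLayer(3)
c.save(0, "from c")
d.save(0, "from d")          # c's list is unaffected
own_result = c.out_saved[0]

print(shared_result, "|", own_result)
```

With the shared list, whichever layer wrote last determines what every layer reads back at that index.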

I also think there might be an issue with your flow control. You assign to out_saved while not testing but then you increment the counter, such that when you arrive at “elif not test” and try to pull detach from that out_saved[self.counter], I’m not sure that you’re grabbing the correct part of the list. It’s hard to tell from that snippet–I also assume you’re manually assigning “layer.test=True” somewhere in the training code, as opposed to using net.eval()?
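
One way to keep the counter logic easy to check is to isolate it from the layer. The sketch below is plain Python and is not the FreezeOut implementation; `OutputCache`, `step`, and `compute` are hypothetical names. It records one output per batch during the first pass and replays them on later passes, reading or writing the slot before advancing the counter. It assumes the batches arrive in the same order every epoch (no shuffling), otherwise a replayed output would not match its input.

```python
class OutputCache:
    """Record one output per batch on the first pass, replay afterwards."""

    def __init__(self, num_batches):
        self.num_batches = num_batches
        self.saved = [None] * num_batches   # None marks "not recorded yet"
        self.counter = 0
        self.recording = True

    def step(self, compute):
        """Return the output for the current batch and advance the counter."""
        index = self.counter
        if self.recording:
            out = compute()
            self.saved[index] = out
        else:
            out = self.saved[index]
        # Advance once per batch, after reading or writing slot `index`.
        self.counter += 1
        if self.counter >= self.num_batches:
            self.counter = 0
            self.recording = False          # later passes only replay
        return out


cache = OutputCache(num_batches=3)
first_epoch = [cache.step(lambda i=i: i * 10) for i in range(3)]
second_epoch = [cache.step(lambda: "never called") for _ in range(3)]
print(first_epoch, second_epoch)
```

Because the second pass never calls `compute`, both epochs yield the same values, which is the behavior a precomputed layer output should have.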

BTW the easiest way to quickly debug this yourself would be to just insert a print statement before the line that throws the error and print the type of the output and its value. If your error says “not a Variable,” then what is the output? If it’s a list or a normal python float that could point to the issues I mention above.
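
As a rough illustration of that kind of check, here is a small helper in plain Python; `describe` is a hypothetical name, and the list below only stands in for `out_saved`. It shows what the printout looks like for a slot that was never overwritten and still holds the integer placeholder it was initialized with.

```python
def describe(name, value):
    """Build a one-line summary of a value's type and contents."""
    return "{}: type={}, value={!r}".format(name, type(value).__name__, value)


# A slot that was never overwritten still holds the placeholder integer.
out_saved = [0] * 4
out_saved[1] = [1.0, 2.0]        # stand-in for a stored layer output

summary_unfilled = describe("out_saved[0]", out_saved[0])
summary_filled = describe("out_saved[1]", out_saved[1])
print(summary_unfilled)
print(summary_filled)
```

Printing the same summary for the value returned by `forward()` just before the failing line would show whether it is a Variable, a list, or a plain number.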

Sohil_Newa · March 2, 2018, 12:59am

Thank you @ajbrock for the reply. My idea behind using a class variable (out_saved) that is shared between Bottleneck layers was to save only the output of the last frozen layer, to make it as memory efficient as possible.
I am trying to directly compare your iteration-wise training to my epoch-wise training with precomputed outputs. As soon as we freeze the layer, the validation error escalates… I will store the output in a file, check whether I am grabbing the correct part of the list, and update you.

I just ran the code again and now it does not throw the error, which surprises me.

Yes, I have the test variable set to True so that the original network is used when calculating the validation error.