I am currently experimenting with Andy Brock's FreezeOut code.
I am trying to precompute the output of a frozen layer. To remove the layer from the computation entirely, I detach out and set volatile = False on the detached copy. With this change I am able to train the model.
if self.active and not test:
    out = self.conv1(F.relu(self.bn1(x)))
    out = self.conv2(F.relu(self.bn2(out)))
    out = torch.cat((x, out), 1)
    if(self.layer_index < DenseNet.freezeLayerIndex):
        detach = out.detach()
        detach.volatile = False
        Bottleneck.out_saved[self.counter] = detach
        self.counter+=1
        if(self.counter >= self.maxCounter):
            self.counter = 0
            self.active = False
But when I use the trained model to validate on test data, it throws an "argument 0 is not a Variable" error. My test function is below:
# Get the index of the max log-probability as the prediction.
pred = output.data.max(1)[1].cpu()
test_error = pred.ne(y).sum()
return test_loss, test_error
I am not sure what the error is, since the input X I pass in is a Variable.
Brock* Can you post your full code? It’s not clear exactly what you’re passing from the forward() method, but just based on that snippet it looks like you might be passing a list somehow?
Sorry for the typo. It’s a list where I am storing the output. Here’s the Bottleneck code:
class Bottleneck(nn.Module):
    out_saved = [0] * 900
    def __init__(self, nChannels, growthRate,layer_index, train_size, test_size, batch_sz):
        super(Bottleneck, self).__init__()
        interChannels = 4*growthRate
        self.bn1 = nn.BatchNorm2d(nChannels)
        self.conv1 = nn.Conv2d(nChannels, interChannels, kernel_size=1,
                               bias=False)
        self.bn2 = nn.BatchNorm2d(interChannels)
        self.conv2 = nn.Conv2d(interChannels, growthRate, kernel_size=3,
                               padding=1, bias=False)
        # If the layer is still being trained
        self.active=True
        # The index of this layer relative to the overall net
        self.layer_index=layer_index
        #Change
        self.counter = 0
        self.train_size = train_size
        self.test_size = test_size
        self.batch_sz = batch_sz
        #self.outList = []
        remainder = train_size % batch_sz
        print("remainder = ", remainder)
        self.maxCounter = train_size//batch_sz
        print("self.maxCounter", self.maxCounter)
        if(remainder != 0):
            self.maxCounter += 1
        out_saved = [0] * (self.maxCounter)
        #print(out_saved)
    def forward(self, x):
        test = DenseNet.test
        # If we're not training this layer, set to eval mode so that we use
        # running batchnorm stats (both for time-saving and to avoid updating
        # said stats).
        if not self.active:
            self.eval()
        # While Validation , Return the original Output
        if test:
            #print test
            out = self.conv1(F.relu(self.bn1(x)))
            out = self.conv2(F.relu(self.bn2(out)))
            out = torch.cat((x, out), 1)
            return out
        # If we're active, return a detached output to prevent backprop.
        if self.active and not test:
            out = self.conv1(F.relu(self.bn1(x)))
            out = self.conv2(F.relu(self.bn2(out)))
            out = torch.cat((x, out), 1)
            # print out
            if(self.layer_index < DenseNet.freezeLayerIndex):
                #Store The detach instead of output as we will be returning that.
                detach = out.detach()
                detach.volatile = False
                #print('Inside ',self.counter)
                Bottleneck.out_saved[self.counter] = detach
                self.counter+=1
                if(self.counter >= self.maxCounter):
                    self.counter = 0
                    self.active = False
            return out
        elif not test:
            detach = Bottleneck.out_saved[self.counter]
            self.counter+=1
            if(self.counter >= self.maxCounter):
                self.counter = 0
            return detach
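As a side note on the batch count: the remainder logic in __init__ amounts to a ceiling division of train_size by batch_sz. Here is a minimal standalone sketch of that arithmetic. The function name batches_per_epoch and the sizes in the usage lines are hypothetical and only there for illustration.

```python
def batches_per_epoch(train_size, batch_sz):
    """Number of batches needed to cover train_size samples (ceiling division)."""
    full, remainder = divmod(train_size, batch_sz)
    # A trailing partial batch still counts as one more batch.
    return full + (1 if remainder else 0)


# Hypothetical sizes, chosen only to show both cases.
print(batches_per_epoch(128, 64))  # exact multiple: no partial batch
print(batches_per_epoch(130, 64))  # two full batches plus one partial batch
```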
So I see a couple of issues, the first being that on the lines where you say “Bottleneck.out_saved[self.counter]” that’s assigning the detached out value to a list which isn’t pegged to this layer, but as far as I can tell is pegged to a list which is shared across all Bottleneck instances? I think what you meant to do is use “self.out_saved[self.counter]”?
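To make the sharing point concrete, here is a small pure-Python sketch (no PyTorch involved) contrasting a class-level list with a per-instance list. The class names SharedCache and PerLayerCache are made up for the illustration and are not part of the FreezeOut code. Note too that a line like out_saved = [0] * (self.maxCounter) inside __init__ only binds a local name, so it resizes neither the class list nor any instance list.

```python
class SharedCache:
    # Class attribute: a single list object shared by every instance.
    out_saved = [0] * 3

    def save(self, index, value):
        # Item assignment mutates the shared class-level list.
        SharedCache.out_saved[index] = value


class PerLayerCache:
    def __init__(self, size):
        # Instance attribute: each layer gets its own list.
        self.out_saved = [0] * size

    def save(self, index, value):
        self.out_saved[index] = value


shared_a, shared_b = SharedCache(), SharedCache()
shared_a.save(0, "layer A output")
shared_b.save(0, "layer B output")  # overwrites what layer A stored

own_a, own_b = PerLayerCache(3), PerLayerCache(3)
own_a.save(0, "layer A output")
own_b.save(0, "layer B output")     # layer A's entry is untouched

print(shared_a.out_saved[0])  # layer B output
print(own_a.out_saved[0])     # layer A output
```

If the shared list is intentional (for example, to keep only one layer's outputs in memory), then only that one layer should ever write to it.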
I also think there might be an issue with your flow control. You assign to out_saved while not testing but then you increment the counter, such that when you arrive at “elif not test” and try to pull detach from that out_saved[self.counter], I’m not sure that you’re grabbing the correct part of the list. It’s hard to tell from that snippet. I also assume you’re manually assigning “layer.test=True” somewhere in the training code, as opposed to using net.eval()?
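One way to reason about the indexing is with a toy model of the counter, sketched below in plain Python. ReplayCounter is a hypothetical stand-in and not the real layer: it caches one string per batch during its last active epoch, then replays by slot. Under these assumptions the replayed slot matches the incoming batch only if batches arrive in the same order every epoch. With a shuffling data loader, the cached output would belong to a different batch.

```python
class ReplayCounter:
    """Toy stand-in for a layer that caches one output per batch, then replays."""

    def __init__(self, batches_per_epoch):
        self.max_counter = batches_per_epoch
        self.counter = 0
        self.active = True
        self.saved = [None] * batches_per_epoch

    def forward(self, batch_id):
        if self.active:
            out = f"out({batch_id})"  # pretend this is the computed output
            self.saved[self.counter] = out
            self.counter += 1
            if self.counter >= self.max_counter:
                self.counter = 0      # wrap so replay starts at slot 0
                self.active = False   # freeze after one full epoch
            return out
        # Frozen: replay whatever was cached in the current slot.
        out = self.saved[self.counter]
        self.counter = (self.counter + 1) % self.max_counter
        return out


layer = ReplayCounter(batches_per_epoch=3)
first_epoch = [layer.forward(b) for b in [0, 1, 2]]  # caching epoch
same_order = [layer.forward(b) for b in [0, 1, 2]]   # replay, same batch order
shuffled = [layer.forward(b) for b in [2, 0, 1]]     # replay, shuffled order

print(same_order)  # matches the batches that were passed in
print(shuffled)    # still slot order, so it no longer matches batches 2, 0, 1
```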
BTW the easiest way to quickly debug this yourself would be to just insert a print statement before the line that throws the error and print the type of the output and its value. If your error says “not a Variable,” then what is the output? If it’s a list or a normal python float that could point to the issues I mention above.
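Something along these lines would do for the print. The helper describe is a hypothetical name and uses only built-ins, so it works on whatever forward() returns. Since out_saved is initialised with integer zeros, a slot that was never written would show up as a plain int, which would be consistent with a "not a Variable" error.

```python
def describe(name, value):
    """Return a one-line summary of an object's type (and length, if it has one)."""
    summary = f"{name}: type={type(value).__name__}"
    try:
        summary += f", len={len(value)}"
    except TypeError:
        pass  # object has no usable length (e.g. an int)
    return summary


# Stand-in values: a list and the integer placeholder from an unfilled slot.
print(describe("output", [0] * 4))  # output: type=list, len=4
print(describe("output", 0))        # output: type=int
```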
Thank you @ajbrock for the reply. My idea behind using a class variable (out_saved) that is shared between Bottleneck layers was to save only the output of the last frozen layer (to make it as memory-efficient as possible).
I am trying to directly compare your iteration-wise training to my epoch-wise training with precomputed outputs. As soon as we freeze a layer, the validation error escalates. I will store the output in a file, check whether I am grabbing the correct part of the list, and update you.
I just ran the code again and now it does not throw the error, which surprises me.
Yes, I have the test variable set to True so that, while calculating the validation error, it uses the original network.