Implementing Squeezenet


I am trying to implement SqueezeNet and train it on CIFAR-10 data. The code runs, but something is wrong: my training-set accuracy never increases, even though the loss curve looks reasonable.

In SqueezeNet, the fire module requires us to concatenate the outputs of a 1x1 convolution and a 3x3 convolution. To achieve this I have used the torch.cat function. Below is the code for the fire module; I want to know if it is right.

import math

import torch
import torch.nn as nn


class fire(nn.Module):
    def __init__(self, inplanes, squeeze_planes, expand_planes):
        super(fire, self).__init__()
        # squeeze layer: 1x1 convolution that reduces the channel count
        self.conv1 = nn.Conv2d(inplanes, squeeze_planes, kernel_size=1, stride=1)
        self.relu1 = nn.ReLU(inplace=True)
        # expand layer: parallel 1x1 and 3x3 convolutions on the squeezed features
        self.conv2 = nn.Conv2d(squeeze_planes, expand_planes, kernel_size=1, stride=1)
        self.conv3 = nn.Conv2d(squeeze_planes, expand_planes, kernel_size=3, stride=1, padding=1)
        self.relu2 = nn.ReLU(inplace=True)

        # using MSR initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        out1 = self.conv2(x)
        out2 = self.conv3(x)
        # concatenate along the channel dimension (dim 1)
        out = torch.cat([out1, out2], 1)
        out = self.relu2(out)
        return out
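
Because the forward pass concatenates the two expand branches along the channel dimension, the module outputs 2 * expand_planes channels, and the next layer's input channel count has to match that. The sketch below only does the bookkeeping in plain Python. The helper names (conv_params, fire_output_channels, fire_params) and the example sizes (96, 16, 64) are illustrative assumptions, not part of the posted code.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a square 2D convolution."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


def fire_output_channels(expand_planes):
    """The 1x1 and 3x3 expand outputs are concatenated along dim 1."""
    return 2 * expand_planes


def fire_params(inplanes, squeeze_planes, expand_planes):
    """Total learnable parameters of the three convolutions in the module."""
    squeeze = conv_params(inplanes, squeeze_planes, 1)
    expand_1x1 = conv_params(squeeze_planes, expand_planes, 1)
    expand_3x3 = conv_params(squeeze_planes, expand_planes, 3)
    return squeeze + expand_1x1 + expand_3x3


# Illustrative sizes (assumed for this example only)
print(fire_output_channels(64))  # -> 128
print(fire_params(96, 16, 64))   # -> 11920
```

If a following layer is built with an inplanes value other than 2 * expand_planes, the mismatch normally shows up as a shape error rather than as silently poor accuracy.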

The model definition looks correct to me, so it’s probably some other bug.


Now I am trying to use the 55-epoch learning-rate schedule that @Soumith uses in his imagenet-multiGPU code, but I am facing a weird issue: it gives me a segfault. When I prepare the optimizer with a static learning rate, it runs fine.

Updated code:

I think the learning rate itself was the issue; it must be doing a division by zero somewhere. I changed the learning rate and now it seems to be working fine.

No, segfaults are never caused by zero division. If you get a small repro script I can fix it.
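
To illustrate the distinction in plain Python: a division by zero raises ZeroDivisionError, an ordinary exception that can be caught, whereas a segfault is a crash in native code that no try/except can intercept. The helper name divide below is hypothetical and exists only for this sketch.

```python
def divide(a, b):
    """Return a / b, or the exception's type name if b is zero."""
    try:
        return a / b
    except ZeroDivisionError as exc:
        # A zero division surfaces as a normal, catchable Python exception.
        return type(exc).__name__


print(divide(6, 3))      # -> 2.0
print(divide(1.0, 0.0))  # -> ZeroDivisionError
```

When narrowing down a real segfault for a repro script, running the script as `python -X faulthandler script.py` makes the interpreter dump the Python traceback at the moment of the crash, which can help locate the offending call.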

The script that I used for training is:

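import matplotlib.pyplot as plt
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable


def paramsforepoch(epoch):
    p = dict()
    # each row: [first epoch, last epoch, learning rate, weight decay]
    regimes = [[1, 18, 1e-3, 5e-4],
               [19, 29, 5e-3, 5e-4],
               [30, 43, 1e-3, 0],
               [44, 52, 5e-4, 0],
               [53, 1e8, 1e-4, 0]]
    for row in regimes:
        if row[0] <= epoch <= row[1]:
            p['learning_rate'] = row[2]
            p['weight_decay'] = row[3]
    return p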

avg_loss = list()
fig1, ax1 = plt.subplots()
fig2, ax2 = plt.subplots()
# train the model
# TODO: Compute training accuracy and test accuracy
# TODO: train it on some data and see if it overfits.
# TODO: train the data on final model

# create a temporary optimizer
optimizer = optim.SGD(net.parameters(), lr=args.learning_rate, momentum=args.momentum, weight_decay=0.0005)

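def adjustlrwd(p):
    # optimizer.state_dict() returns copies of the param groups, so edit
    # optimizer.param_groups directly for the new values to take effect
    for param_group in optimizer.param_groups:
        param_group['lr'] = p['learning_rate']
        param_group['weight_decay'] = p['weight_decay']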

# train the network
def train(epoch):

    # set the optimizer for this epoch (the regimes start at epoch 1)
    if epoch > 0:
        p = paramsforepoch(epoch)
        print("Configuring optimizer with lr={:g} and weight_decay={:g}".format(p['learning_rate'], p['weight_decay']))
        adjustlrwd(p)

    global avg_loss
    correct = 0
    for b_idx, (data, targets) in enumerate(train_loader):
        # trying to overfit a small data
        if b_idx == 100:
            break

        if args.cuda:
            # .cuda() returns a copy, so the result has to be assigned
            data, targets = data.cuda(), targets.cuda()
        # convert the data and targets into Variable form
        data, targets = Variable(data), Variable(targets)

        # train the network
        optimizer.zero_grad()
        scores = net.forward(data)
        loss = F.nll_loss(scores, targets)

        # compute the accuracy
        pred = scores.data.max(1)[1]  # get the index of the max log-probability
        correct += pred.eq(targets.data).cpu().sum().item()

        # backward pass and weight update (not visible in the posted snippet)
        loss.backward()
        optimizer.step()

        if b_idx % args.log_schedule == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, (b_idx+1) * len(data), len(train_loader.dataset),
                100. * (b_idx+1) / 100, loss.item()))
            # also plot the loss, it should go down exponentially at some point

    # now that the epoch is completed plot the accuracy
    # 6400 = 100 batches, assuming a batch size of 64
    accuracy = correct / 6400.0
    print("training accuracy ({:.2f}%)".format(100*accuracy))
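
The regime table in the script above can be checked on its own, without a model or GPU. Below is a minimal sketch of the same lookup that fails loudly when an epoch falls outside every regime. With the posted function, epoch 0 returns an empty dict, and the later p['learning_rate'] lookup would then raise a KeyError. The names REGIMES and params_for_epoch are hypothetical variants introduced for this example; the numeric values are copied from the script.

```python
# (first epoch, last epoch, learning rate, weight decay), copied from the script
REGIMES = [
    (1, 18, 1e-3, 5e-4),
    (19, 29, 5e-3, 5e-4),
    (30, 43, 1e-3, 0.0),
    (44, 52, 5e-4, 0.0),
    (53, 10**8, 1e-4, 0.0),
]


def params_for_epoch(epoch):
    """Return the settings of the regime that contains `epoch`."""
    for first, last, lr, wd in REGIMES:
        if first <= epoch <= last:
            return {'learning_rate': lr, 'weight_decay': wd}
    # No regime matched, e.g. epoch 0 when counting starts at zero.
    raise ValueError("no regime covers epoch {}".format(epoch))


for epoch in (1, 19, 30, 44, 53):
    print(epoch, params_for_epoch(epoch))
```

Calling params_for_epoch(0) raises ValueError, which makes an off-by-one in the epoch counter visible immediately.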