New to pytorch: trouble getting model to train

I’m trying to apply a modified AlexNet to a compressed sensing task.

My training data are grayscale 250x250 input images paired with 10000-element output vectors, stored as jpgs and numpy arrays respectively. Here is the modified AlexNet class:

import torch.nn as nn

class AlexNet(nn.Module):
    def __init__(self,D_out=10000): #added D_out here 
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=11, stride=4, padding=2), #changed to 1 channel input image
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, D_out), #changed to D_out=10000 instead of the original 1000 ImageNet categories
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), 256 * 6 * 6)
        x = self.classifier(x)
        return x
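
As a quick sanity check on the hard-coded view size: a fake 1x250x250 input does come out of features as 256x6x6 (this is just a throwaway snippet, not part of the training code):

import torch

net = AlexNet()
with torch.no_grad():
    feats = net.features(torch.randn(1, 1, 250, 250))  # one fake grayscale 250x250 image
print(feats.shape)  # torch.Size([1, 256, 6, 6]) -> matches the 256 * 6 * 6 in forward()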

I decided to minimize the Euclidean (MSE) loss with Adam:

import torch.optim as optim
from torch.autograd import Variable
from torch.utils.data import DataLoader

model = AlexNet()
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)


k=30 #size of batch
N = 50 #number epochs
train_loader = DataLoader(train_data, batch_size=k, shuffle=True) #data loader
losses = [] #track the losses 
for epoch in range(N): 
    for i,(inputs,targets) in enumerate(train_loader): 
        #prepare batch
        inputs,targets = Variable(inputs), Variable(targets,requires_grad=False)
        
        #zero gradients 
        optimizer.zero_grad()
        
        #calculate model prediction
        outputs = model(inputs)
        
        #calculate loss
        loss = criterion(outputs,targets)
        
        #backpropagate loss 
        loss.backward()
        optimizer.step()

        #examine losses
        print(loss.data[0])
        losses.append(loss.data[0])
      

The loss doesn’t decrease. Any suggestions as to why? Any input is appreciated!

Is your loss not decreasing at all?
If so, you should lower your learning rate, e.g. by dividing by 10 (1e-3, 1e-4, 1e-5).
Also, if that doesn’t help, try to normalize your input, e.g. with transforms.Normalize.
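
Something like this (the values are just placeholders):

import torch.optim as optim
from torchvision import transforms

optimizer = optim.Adam(model.parameters(), lr=1e-4)      # try 1e-3 / 1e-4 / 1e-5 instead of 0.01
normalize = transforms.Normalize(mean=[0.5], std=[0.5])  # single channel; compute the real stats from your data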

As a small piece of advice, you should create a Dataset and DataLoader to handle the data loading, pre-processing, etc.
You can find a tutorial here.
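
A minimal sketch for your setup might look like this, assuming the jpgs sit in one folder and the targets in a single (N, 10000) .npy array (the paths, file layout, and normalization stats are placeholders):

import os

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class CellDataset(Dataset):
    def __init__(self, image_dir, target_file):
        self.image_dir = image_dir
        self.image_names = sorted(os.listdir(image_dir))  # one jpg per sample
        self.targets = np.load(target_file)               # shape (N, 10000)
        self.transform = transforms.Compose([
            transforms.ToTensor(),                        # (1, 250, 250), scaled to [0, 1]
            transforms.Normalize(mean=[0.5], std=[0.5]),  # placeholder stats
        ])

    def __len__(self):
        return len(self.image_names)

    def __getitem__(self, idx):
        img = Image.open(os.path.join(self.image_dir, self.image_names[idx])).convert('L')
        x = self.transform(img)
        y = torch.from_numpy(self.targets[idx]).float()
        return x, y

train_data = CellDataset('train_images/', 'train_targets.npy')  # hypothetical paths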

Thanks, ptrblck. I’m working on a Dataset class right now. The loss does not seem to decrease at all. I will implement transforms.Normalize in the Dataset class.

This is for cell detection on a noisy background. Just to get things rolling, I made fake training images like this:

[example fake training image: im-0954]

The background is always the same, so I’d be surprised if normalizing helped.

So you are using the cat in every image and just randomly masking these “cells”?
Is the output dimension a prediction for each pixel, i.e. whether it’s a “cell” or the background?

Yep, same cat with different masked positions. The output is hoped to be a prediction of a compressed representation of the cell-or-background (sparse) boolean mask you refer to. I’m trying to reproduce this paper. I have the encoding/decoding-by-projection part working; now I just need the learning part.
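
For reference, the encoding step is a projection roughly along these lines (a generic random-projection sketch; the actual sensing matrix construction and scaling from the paper may differ):

import numpy as np

rng = np.random.RandomState(0)
A = rng.randn(10000, 250 * 250).astype(np.float32)  # hypothetical dense Gaussian sensing matrix (large; illustration only)

def compress(mask):
    # mask: (250, 250) boolean cell-or-background mask -> 10000-element target vector
    return A @ mask.reshape(-1).astype(np.float32)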

Ok, I skimmed the paper, but I don’t really understand all aspects.
You feed the output of the model into the “sensing matrix” and somehow get the cell positions?
This part is working already, right?

Does the output of the model have any boundaries, e.g. [-1, 1], so that we could apply a tanh on the last layer?
Since it’s a Linear layer, the values might grow quite big.
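
That would just mean appending a Tanh after the last Linear layer, e.g. (a sketch; only sensible if the targets really are bounded in [-1, 1]):

import torch.nn as nn

model = AlexNet()
# bounded-output variant: tack a Tanh onto the existing classifier
model.classifier = nn.Sequential(*list(model.classifier.children()), nn.Tanh())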

The model output, along with the sensing matrix, constrains an optimization problem that yields the cell positions, and this part works.

Inspecting a few target output vectors, the values are typically between -100 and 100, and I confirmed the decompression scheme fails with normalized vectors, so the tanh won’t help : (

I edited the code above to reflect these changes. The network still doesn’t seem to train.

Any other input is appreciated! Here’s a plot of loss versus iteration:
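
(The plot is just the losses list collected in the training loop, e.g.:)

import matplotlib.pyplot as plt

plt.plot(losses)            # one MSE value per batch
plt.xlabel('iteration')
plt.ylabel('MSE loss')
plt.show()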