Same loss patterns while training Convolutional Autoencoder

Gautam_Venkatraman · November 3, 2018, 5:39am

Hi,

I am a noob and am creating a model in PyTorch for the first time. I am trying to create a convolutional autoencoder. No matter what optimizer or learning rate I use, I get the same loss pattern. The code I am using is:

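import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision.transforms as transforms
import torchvision.transforms.functional as TF
from PIL import Image
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):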
    def __init__(self, image_paths, target_paths, train=True):
        self.image_paths = image_paths
        self.target_paths = target_paths
        
    def transform(self, image, target):
        # Transform to tensor
        resize = transforms.Resize(size=(2350,1650))
        image = resize(image)
        target = resize(target)
        grayscale = transforms.Grayscale(1)
        image = grayscale(image)
        target = grayscale(target)
        image = TF.to_tensor(image)
        target = TF.to_tensor(target)
        return image, target

    def __getitem__(self, index):
        image = Image.open(self.image_paths[index])
        target = Image.open(self.target_paths[index])
        x, y = self.transform(image, target)
        return x, y

    def __len__(self):
        return len(self.image_paths)

traindata = MyDataset(image_paths=train_data, target_paths=target_data, train=True)
# Note: with target_paths=None, __getitem__ raises a TypeError as soon as this dataset is indexed
testdata = MyDataset(image_paths=test_data, target_paths=None, train=False)

train_loader = DataLoader(traindata, batch_size=1, shuffle=True, num_workers=4)
test_loader = DataLoader(testdata, batch_size=1, shuffle=False, num_workers=4)

class ConvolutionalAutoEncoder(nn.Module):
    def __init__(self):
        super(ConvolutionalAutoEncoder, self).__init__()
        self.encoder_block1 = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=1, padding=1),
            nn.Tanh(),
            nn.Conv2d(64, 64, 3, stride=1, padding=1),
            nn.Tanh()
        )
#         self.encoder_block1 = nn.DataParallel(self.encoder_block1)
        self.encoder_block2 = nn.Sequential(
            nn.Conv2d(64, 128, 3, stride=1, padding=1),
            nn.Tanh(),
            nn.Conv2d(128, 128, 3, stride=1, padding=1),
            nn.Tanh()
        )
#         self.encoder_block2 = nn.DataParallel(self.encoder_block2)
        self.encoder_block3 = nn.Sequential(
            nn.Conv2d(128, 128, 3, stride=1, padding=1),
            nn.Tanh(),
            nn.Conv2d(128, 128, 3, stride=1, padding=1),
            nn.Tanh(),
            nn.Conv2d(128, 128, 3, stride=1, padding=1),
            nn.Tanh()
        )
#         self.encoder_block3 = nn.DataParallel(self.encoder_block3)
        self.encoder_block4 = nn.Sequential(
            nn.Conv2d(128, 128, 3, stride=1, padding=1),
            nn.Tanh(),
            nn.Conv2d(128, 128, 3, stride=1, padding=1),
            nn.Tanh(),
            nn.Conv2d(128, 128, 3, stride=1, padding=1),
            nn.Tanh()
        )
#         self.encoder_block4 = nn.DataParallel(self.encoder_block4)
        self.decoder_block4 = nn.Sequential(            
            nn.ConvTranspose2d(128, 128, 3, stride=1, padding=1),
            nn.Tanh(),
            nn.ConvTranspose2d(128, 128, 3, stride=1, padding=1),
            nn.Tanh(),
            nn.ConvTranspose2d(128, 128, 3, stride=1, padding=1),
            nn.Tanh()
        )
#         self.decoder_block4 = nn.DataParallel(self.decoder_block4)
        self.decoder_block3 = nn.Sequential(            
            nn.ConvTranspose2d(128, 128, 3, stride=1, padding=1),
            nn.Tanh(),
            nn.ConvTranspose2d(128, 128, 3, stride=1, padding=1),
            nn.Tanh(),
            nn.ConvTranspose2d(128, 128, 3, stride=1, padding=1),
            nn.Tanh()
        )  
#         self.decoder_block3 = nn.DataParallel(self.decoder_block3)
        self.decoder_block2 = nn.Sequential(            
            nn.ConvTranspose2d(128, 128, 3, stride=1, padding=1),
            nn.Tanh(),
            nn.ConvTranspose2d(128, 128, 3, stride=1, padding=1),
            nn.Tanh()
        )
#         self.decoder_block2 = nn.DataParallel(self.decoder_block2)
        self.decoder_block1 = nn.Sequential(   
            nn.ConvTranspose2d(128, 64, 3, stride=1, padding=1),
            nn.Tanh(),
            nn.ConvTranspose2d(64, 64, 3, stride=1, padding=1),
            nn.Tanh()
         )
#         self.decoder_block1 = nn.DataParallel(self.decoder_block1)
        self.decoder_block0 = nn.Sequential(  
            nn.ConvTranspose2d(64, 1, 3, stride=1, padding=1),
            nn.Sigmoid()
        )
#         self.decoder_block0 = nn.DataParallel(self.decoder_block0)
    def forward(self, x):
        x1 = self.encoder_block1(x)
        x2 = self.encoder_block2(x1)
        x3 = self.encoder_block3(x2)
        x4 = self.encoder_block4(x3)
        
        y4 = self.decoder_block4(x4)
        y3 = self.decoder_block3(y4+x3)
        y2 = self.decoder_block2(y3+x2)
        y1 = self.decoder_block1(y2)
        y0 = self.decoder_block0(y1)
        return y0

device = torch.device("cuda:2" if torch.cuda.is_available() else "cpu")
print(device)

model = ConvolutionalAutoEncoder().to(device)
learning_rate = 0.001
weight_decay = 0.1
momentum = 0.9
# optimizer = optim.SGD(model.parameters(), lr=learning_rate, weight_decay=weight_decay, momentum=momentum)
optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)

params = list(model.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight

num_epochs = 30
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, data in enumerate(train_loader):
        inp, targ = data
        inp = inp.to(device)
        targ = targ.to(device)

        output = model(inp)
        optimizer.zero_grad()
        loss = F.binary_cross_entropy(output, targ)

        loss.backward()
        optimizer.step()
        
        if i % 50 == 0:
            print("Epoch[{}/{}]({}/{}): Loss: {:.4f}".format(epoch+1,num_epochs, i, len(train_loader), loss.item()))

The loss pattern is:

Epoch[1/30](0/207): Loss: 0.9538
Epoch[1/30](50/207): Loss: 0.6362
Epoch[1/30](100/207): Loss: 0.6552
Epoch[1/30](150/207): Loss: 0.5803
Epoch[1/30](200/207): Loss: 0.3776
Epoch[2/30](0/207): Loss: 0.6031
Epoch[2/30](50/207): Loss: 0.4620
Epoch[2/30](100/207): Loss: 0.6617
Epoch[2/30](150/207): Loss: 0.6603
Epoch[2/30](200/207): Loss: 0.5564
Epoch[3/30](0/207): Loss: 0.6142
Epoch[3/30](50/207): Loss: 0.5175
Epoch[3/30](100/207): Loss: 1.0225
Epoch[3/30](150/207): Loss: 0.8263
Epoch[3/30](200/207): Loss: 0.6272
Epoch[4/30](0/207): Loss: 0.5497
Epoch[4/30](50/207): Loss: 0.5409
Epoch[4/30](100/207): Loss: 0.5438

Please help. Also, if possible, please advise on how I can make my model deeper. I keep getting a CUDA out of memory error.

Thanks.

ptrblck · November 3, 2018, 11:28am

You are using F.binary_cross_entropy_with_logits as your loss function (criterion is not used in the training loop). F.binary_cross_entropy_with_logits applies the sigmoid internally, so combining it with an nn.Sigmoid output layer applies the sigmoid twice. Since your model ends in an nn.Sigmoid layer, you should either use F.binary_cross_entropy instead, or remove the nn.Sigmoid layer and keep F.binary_cross_entropy_with_logits.
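To illustrate the two valid pairings, here is a minimal sketch on random tensors (the shapes and values are made up for the demo, not taken from the model above). It shows that sigmoid plus F.binary_cross_entropy gives the same value as F.binary_cross_entropy_with_logits on the raw output, while feeding an already-squashed output into the logits version computes something else:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(2, 1, 4, 4)  # raw model output, before any sigmoid
target = torch.rand(2, 1, 4, 4)   # grayscale targets in [0, 1]

# Option A: sigmoid at the end of the model, plain BCE as the loss
loss_a = F.binary_cross_entropy(torch.sigmoid(logits), target)

# Option B: no sigmoid in the model, BCE-with-logits as the loss
loss_b = F.binary_cross_entropy_with_logits(logits, target)

# Mixing the two: the sigmoid output is squashed a second time inside the loss
loss_mixed = F.binary_cross_entropy_with_logits(torch.sigmoid(logits), target)

print(loss_a.item(), loss_b.item(), loss_mixed.item())
```

Options A and B agree up to floating point error; option B is generally the numerically safer choice.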

Also some minor side notes:

Variables have been deprecated since 0.4.0. If you are using a newer PyTorch version, just remove all Variables.
Usually you call .zero_grad() just once. Currently you are calling it on both the model and the optimizer. I would remove the model call, as the optimizer already holds all model parameters and will zero out all gradients.
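A single training step following both notes could look roughly like this. The tiny model and the random tensors are stand-ins I made up for the sketch, not your actual network or data:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# Hypothetical stand-in model, only here to show the structure of the step
model = nn.Sequential(nn.Conv2d(1, 1, 3, padding=1), nn.Sigmoid())
optimizer = optim.SGD(model.parameters(), lr=0.1)

inp = torch.rand(1, 1, 8, 8)   # plain tensors, no Variable wrapper needed
targ = torch.rand(1, 1, 8, 8)

optimizer.zero_grad()          # one call covers every parameter the optimizer holds
output = model(inp)
loss = F.binary_cross_entropy(output, targ)
loss.backward()
optimizer.step()

# After another zero_grad() the gradients are either zeroed or set to None,
# depending on the PyTorch version
optimizer.zero_grad()
grads_cleared = all(
    p.grad is None or bool((p.grad == 0).all()) for p in model.parameters()
)
print(grads_cleared)
```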

Gautam_Venkatraman · November 5, 2018, 4:59am

Hi,

Thank you for the quick reply.

I made the changes as you said but I am still getting a fluctuating loss pattern.

I have also updated my first post to reflect my latest architecture and code.

I am using PyTorch 0.4.1.post2.

ptrblck · November 5, 2018, 10:27am

The fluctuating loss behavior might come from your hyperparameters, not from a code bug.
Did the model architecture work in the past with your kind of data?
Your model is currently quite deep, so if you started right away with this kind of deep model, the behavior might be expected. I’m usually a fan of reusing a known good architecture or “growing” my model, i.e. I start with quite a shallow model, train it until it converges, and then keep adding layers as long as nothing breaks.
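One way to make the "growing" approach convenient is to build the model from a depth parameter. This is only a sketch: make_autoencoder, depth and channels are hypothetical names, and the layer widths are illustrative rather than a recommendation. It keeps the stride=1, padding=1 layers from your model, so the spatial size is unchanged:

```python
import torch
import torch.nn as nn

def make_autoencoder(depth, channels=16):
    """Build a conv autoencoder with `depth` encoder and `depth` decoder layers.

    Hypothetical helper for experimenting with model depth.
    """
    layers = []
    in_ch = 1
    for _ in range(depth):  # encoder layers
        layers += [nn.Conv2d(in_ch, channels, 3, stride=1, padding=1), nn.Tanh()]
        in_ch = channels
    for _ in range(depth - 1):  # hidden decoder layers
        layers += [nn.ConvTranspose2d(channels, channels, 3, stride=1, padding=1), nn.Tanh()]
    # final decoder layer maps back to one channel in [0, 1]
    layers += [nn.ConvTranspose2d(channels, 1, 3, stride=1, padding=1), nn.Sigmoid()]
    return nn.Sequential(*layers)

shallow = make_autoencoder(depth=1)  # start here and check that it converges
deeper = make_autoencoder(depth=3)   # then increase depth step by step

x = torch.rand(1, 1, 32, 32)
print(shallow(x).shape, deeper(x).shape)
```

Starting with fewer layers and channels also reduces the activation memory, which should help with the out of memory errors.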

Also, it is a good idea to try out a few weight initialization schemes.
Here is some dummy code:

def weight_init(m):
    if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d):
        nn.init.xavier_uniform_(m.weight, gain=nn.init.calculate_gain('tanh'))
        nn.init.zeros_(m.bias)

model.apply(weight_init)

Another approach would be to scale down your data a bit and use just a few samples for training, to see if there are other code bugs.
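A rough sketch of such a sanity check, using a few small synthetic samples in place of your images (the model, the sizes and the reconstruct-the-input target are all assumptions for the demo). If the training code is sound, the loss on this tiny fixed set should drop steadily:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(0)
# Four small synthetic samples stand in for a handful of downscaled images
inputs = torch.rand(4, 1, 16, 16)
targets = inputs.clone()  # assumption: the target is the input itself

model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.Tanh(),
    nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid(),
)
optimizer = optim.Adam(model.parameters(), lr=1e-2)

losses = []
for step in range(200):
    optimizer.zero_grad()
    loss = F.binary_cross_entropy(model(inputs), targets)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print("first: {:.4f}  last: {:.4f}".format(losses[0], losses[-1]))
```

With your real data, something like torch.utils.data.Subset(traindata, range(8)) together with a smaller transforms.Resize should give the same kind of test. Note that with non-binary targets the BCE loss will not go to zero even for a perfect reconstruction, so look at the trend rather than the absolute value.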

Gautam_Venkatraman · November 16, 2018, 5:06am

Hi,

Sorry for the late reply. I was on holiday.

Yeah, I was also thinking that maybe it's because of the model architecture. As long as it's not a code bug, I think I can solve the problem.

Hmm…as for weight initialization, I thought PyTorch used Xavier initialization by default?

Thanks.

ptrblck · November 16, 2018, 10:54am

Well, currently the default is to initialize the weights with kaiming_uniform.
However, you could use the code to calculate the gain based on the non-linearities you’ve used.
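For reference, this is roughly what the gain calculation gives you. The snippet is just a sketch on a single layer with the same shape as your first conv; the bound formula in the comment is the documented definition of xavier_uniform_:

```python
import math
import torch.nn as nn

tanh_gain = nn.init.calculate_gain('tanh')        # 5/3
relu_gain = nn.init.calculate_gain('relu')        # sqrt(2)
sigmoid_gain = nn.init.calculate_gain('sigmoid')  # 1

conv = nn.Conv2d(1, 64, 3, stride=1, padding=1)
nn.init.xavier_uniform_(conv.weight, gain=tanh_gain)

# xavier_uniform_ samples from U(-a, a) with a = gain * sqrt(6 / (fan_in + fan_out))
fan_in = 1 * 3 * 3    # in_channels * kernel area
fan_out = 64 * 3 * 3  # out_channels * kernel area
bound = tanh_gain * math.sqrt(6.0 / (fan_in + fan_out))

print(tanh_gain, relu_gain, sigmoid_gain)
print(conv.weight.abs().max().item(), "<=", bound)
```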