VGG16 always gives the same output values

Hi everyone!
I'm working on a regression problem. I use a slightly modified VGG16 architecture (one extra Conv2d at the beginning and one extra linear layer at the end). Labels and expected outputs are in the range [0, 99]. The problem is that after every training iteration, during validation the network predicts the same value for every input (e.g. output values [12.51, 12.51, 12.51, 12.51, 12.51, 12.51, 12.51, 12.51] for input labels [1, 15, 3, 67, 3, 66, 14, 34]). The predicted value changes with every epoch, but it is always the same for every input.

My modified network:

VGG(
  (features): Sequential(
    (0): Conv2d(256, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (2): ReLU(inplace=True)
    (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (11): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (12): ReLU(inplace=True)
    (13): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (14): ReLU(inplace=True)
    (15): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (16): ReLU(inplace=True)
    (17): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (18): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (19): ReLU(inplace=True)
    (20): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (21): ReLU(inplace=True)
    (22): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (23): ReLU(inplace=True)
    (24): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (25): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (26): ReLU(inplace=True)
    (27): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (28): ReLU(inplace=True)
    (29): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (30): ReLU(inplace=True)
    (31): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
    (7): Linear(in_features=1000, out_features=1, bias=True)
  )
)
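
The model is built roughly like this (a simplified sketch of my create_vgg_16 function, starting from torchvision's stock VGG16; not my exact code):

import torch.nn as nn
from torchvision.models import vgg16

def create_vgg_16():
    model = vgg16()
    # Prepend a Conv2d that maps the 256-channel inputs down to the
    # 3 channels VGG16 expects.
    model.features = nn.Sequential(
        nn.Conv2d(256, 3, kernel_size=3, stride=1, padding=1),
        *model.features,
    )
    # Append a Linear layer that maps the 1000 ImageNet logits to a
    # single regression output.
    model.classifier = nn.Sequential(
        *model.classifier,
        nn.Linear(1000, 1),
    )
    return model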

Any ideas?

Am I right in assuming this is a classification problem you are tackling? If so, you will need 100 neurons in the final linear layer, since your expected labels are in [0, 99].
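
With a stock torchvision VGG16 that change would look roughly like this (a sketch; the index 6 refers to the last layer of the standard classifier):

import torch.nn as nn
from torchvision.models import vgg16

model = vgg16()
num_classes = 100  # labels 0..99
# Replace the final Linear layer so the network emits one logit per class.
model.classifier[6] = nn.Linear(4096, num_classes)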

Actually, I want to do it with regression. The labels happen to be integers, but they don't have to be; they are values like house prices or ages. But even with 100 neurons in the final layer the problem is the same. As a loss function I use mse_loss for regression and cross_entropy for classification (but I would rather do regression). Any ideas why this happens?
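
To be concrete, the loss calls look roughly like this (a simplified sketch with dummy tensors, not my exact code):

import torch
import torch.nn.functional as F

outputs = torch.randn(8, 1)  # regression head: one value per sample
y = torch.tensor([1., 15., 3., 67., 3., 66., 14., 34.])

# Regression: mse_loss expects float targets with the same shape as the
# outputs; with shapes (8, 1) vs (8,) it would broadcast to (8, 8) and
# only emit a warning, so the shapes are matched explicitly.
reg_loss = F.mse_loss(outputs.squeeze(1), y)

# Classification: cross_entropy expects logits of shape (N, num_classes)
# and integer class indices of shape (N,).
logits = torch.randn(8, 100)
cls_loss = F.cross_entropy(logits, y.long())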

I'm not exactly sure why that might happen. However, here is what you could try:

  1. Verify that the loss decreases during training.
  2. Normalize or standardize the data (see the sketch after this list).
  3. Does it give the same value if you feed in the training data as well? If it does, it may be an error in your code, or the network may be getting "stuck" at a point.
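
For point 2, a minimal sketch of standardizing the regression targets (the statistics must come from the training set only; the names and values here are just illustrative):

import torch

# Illustrative training targets; compute mean/std on the training set
# only, then reuse the same statistics for validation and test.
train_targets = torch.tensor([1., 15., 3., 67., 3., 66., 14., 34.])
mean, std = train_targets.mean(), train_targets.std()

def standardize(t):
    return (t - mean) / std

def destandardize(pred):
    # Map model predictions back to the original [0, 99] scale.
    return pred * std + mean

y_norm = standardize(train_targets)  # train against these targets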

Could you post a snippet of the training/validation loop as well?

Basically that's it (I've simplified it, but the logic is unchanged):

import torch
from torch.backends import cudnn
from torch.utils.data import DataLoader
from tqdm import tqdm


def train(train_loader, model, optimizer, device):
    # criterion and loss_monitor are defined globally in my real code
    model.train()
    with tqdm(train_loader) as _tqdm:
        for x, y in _tqdm:
            x = x.to(device)
            y = y.to(device)
            outputs = model(x)
            loss = criterion(outputs, y)
            cur_loss = loss.item()
            sample_num = x.size(0)
            loss_monitor.update(cur_loss, sample_num)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    return loss_monitor.avg
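
(loss_monitor just tracks a running average for logging; backward() has to be called on the per-batch loss tensor itself, since the averaged value is a plain float.)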


def validate(val_loader, model, epoch, device):
    model.eval()
    with torch.no_grad():
        with tqdm(val_loader) as _tqdm:
            for x, y in _tqdm:
                x = x.to(device)
                outputs = model(x)
                # collect the outputs here to calculate MAE afterwards
    # mae is computed from the collected outputs (elided) ... works fine
    return mae


def main():
    start_epoch = 0
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    if device == "cuda:0":  # device is a string here, so compare strings
        cudnn.benchmark = True

    val_dataset = LoadDataset("val")  # my own dataset-loading function - works fine
    val_loader = DataLoader(
        val_dataset, batch_size=24, shuffle=False, num_workers=0
    )
    train_dataset = LoadDataset("train")
    train_loader = DataLoader(
        train_dataset, batch_size=24, shuffle=False, num_workers=0
    )

    model = create_vgg_16()  # my own function that builds a model based on VGG16 - works fine
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    model = model.to(device)

    best_val_mae = 100
    num_epoch = 200

    for epoch in range(start_epoch, num_epoch):

        train_loss = train(train_loader, model, optimizer, device)
        mae = validate(val_loader, model, epoch, device)

        # checkpoint: keep the model with the best validation MAE
        if mae < best_val_mae:
            # save the model here
            best_val_mae = mae

  1. Looks like the loss decreases normally.
  2. I did that.
  3. When I feed training data during validation, the problem is the same :frowning: What do you mean by getting "stuck"?

I used a smaller learning rate and the network no longer gets "stuck", but the output values are very random. Does anyone have any ideas? Does my main loop look fine?