# Getting the same losses every epoch when normalization is applied

Hi, I am working on a depth estimation project using the NYU Depth V2 dataset. The depth images in the dataset are 16-bit, so I convert them to tensors and normalize them with the following code:

```python
import torch
from torchvision import transforms

# transformations shared by the RGB image and the depth map
comm_trans = transforms.Compose([
    transforms.Resize((240, 320)),
    transforms.CenterCrop((228, 304)),
    transforms.RandomHorizontalFlip()
])

# RGB image: to tensor (scaled to [0, 1]), then normalized to [-1, 1]
image_trans = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# depth map: to tensor, cast to float, scale the 16-bit values to [0, 1],
# then normalize to [-1, 1]
depth_trans = transforms.Compose([
    transforms.Resize((64, 80)),
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x.float()),
    transforms.Lambda(lambda x: torch.div(x, 65535.0)),
    transforms.Normalize((0.5,), (0.5,))
])

image = image_trans(comm_trans(image))
depth = depth_trans(comm_trans(depth))
```
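To sanity-check what this pipeline actually produces, one can print the dtype and value range of a transformed sample (a quick sketch; `depth` here is assumed to be a single depth map loaded as a 16-bit PIL image):

```python
# inspect what the depth pipeline outputs for one sample
d = depth_trans(comm_trans(depth))
print(d.dtype, d.min().item(), d.max().item())  # expected range after Normalize: [-1, 1]
```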

However, when training the network, I found that the loss of each batch is exactly the same after the first epoch. In other words, my network is not learning for some reason.

```
Epoch: [1/2] Step [20/885] Loss: 0.8645
Epoch: [1/2] Step [40/885] Loss: 0.6820
Epoch: [1/2] Step [60/885] Loss: 0.4783

Epoch: [2/2] Step [20/885] Loss: 0.9024
Epoch: [2/2] Step [40/885] Loss: 0.6820
Epoch: [2/2] Step [60/885] Loss: 0.4783
```

Then I removed the scaling and normalization for the depth images, and everything worked after that:

```python
transforms.Lambda(lambda x: torch.div(x, 65535.0)),
transforms.Normalize((0.5,), (0.5,))
```
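For reference, the depth pipeline with those two lines dropped, i.e. the version that trains correctly, looks like this:

```python
# depth transform with the scaling and normalization removed
depth_trans = transforms.Compose([
    transforms.Resize((64, 80)),
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x.float())
])
```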

So I am wondering what causes the problem and how I can fix it.

Here are the hyperparameters and training code, in case that helps:

```python
# hyperparameters
batch_size = 32
learning_rate = 0.001
total_epoch = 2
report_rate = 20

# optimizer and loss function
# (Adam shown as a placeholder; the original optimizer line was omitted)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
criterion = losses.BerHuLoss()

for epoch in range(total_epoch):
    running_loss = 0.0
    epoch_loss = 0.0
    for i, (image, depth) in enumerate(loader):
        image = image.to(device)
        depth = depth.to(device)

        # forward pass
        outputs = model(image)
        loss = criterion(outputs, depth)

        # backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        epoch_loss += loss.item()

        # report the running average loss every `report_rate` steps
        if (i + 1) % report_rate == 0:
            print(f'Epoch: [{epoch + 1}/{total_epoch}] '
                  f'Step [{i + 1}/{len(loader)}] Loss: {running_loss / report_rate:.4f}')
            running_loss = 0.0
```
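In case it matters, `losses.BerHuLoss()` is the reverse Huber (BerHu) loss commonly used for depth regression (as in Laina et al. 2016). It should behave roughly like this sketch (my own hypothetical helper, assuming the standard threshold of 20% of the largest absolute error; not the exact implementation I use):

```python
import torch

def berhu_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Reverse Huber (BerHu) loss: L1 for small errors, scaled L2 for large ones."""
    diff = (pred - target).abs()
    # threshold c = 0.2 * max absolute error in the batch (clamped to avoid division by zero)
    c = 0.2 * diff.max().clamp(min=1e-6)
    l2 = (diff ** 2 + c ** 2) / (2 * c)
    return torch.where(diff <= c, diff, l2).mean()
```

The piecewise form is continuous at the threshold c, so the gradient stays well behaved around it.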