VGG with or without batch norm behaves extremely differently

I'm working on a binary segmentation task, with the background as zero and the foreground as one. I chose VGG16 with or without batch normalization as my network backbone; both are pretrained on ImageNet. I use a single image to overfit the network as a sanity check. The result is weird and I can't figure it out.
The version with batch norm is able to produce a segmentation map, but the version without produces an all-zero prediction.
Does that mean the version without batch norm is not transferable?
Why does this happen? I use weighted cross entropy as the loss function and RandomResizedCrop as the data augmentation method. I tried removing the class weights and the augmentation, but it's no better. I'm confused.
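For reference, the loss setup is roughly like this (a minimal sketch; the class weights and tensor shapes below are placeholder values for illustration, not my exact ones):

```python
import torch
import torch.nn as nn

# Placeholder class weights: upweight the foreground class (assumed values).
class_weights = torch.tensor([1.0, 10.0])
criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(1, 2, 224, 224)          # (N, C, H, W) raw per-pixel scores
target = torch.randint(0, 2, (1, 224, 224))   # (N, H, W) binary ground-truth mask
loss = criterion(logits, target)              # weighted per-pixel cross entropy
```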
Network Detail:
My network is simple: the encoder is VGG16, and I use bilinear upsampling, a Conv2d(512, 2) layer, and a Sigmoid to turn the features into a segmentation map directly. This setup has always worked well in this kind of single-image overfitting test.
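Roughly, the model looks like the sketch below (a simplified reconstruction assuming torchvision's VGG16; the class name and exact layer placement are for illustration, not my exact code):

```python
import torch
import torch.nn as nn
from torchvision import models

class VggSeg(nn.Module):
    """Sketch: VGG16 encoder + bilinear upsample + Conv2d(512, 2) + Sigmoid."""
    def __init__(self, batch_norm=True):
        super().__init__()
        vgg = models.vgg16_bn(pretrained=True) if batch_norm else models.vgg16(pretrained=True)
        self.encoder = vgg.features                       # 512 channels at 1/32 resolution
        self.head = nn.Conv2d(512, 2, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feat = self.encoder(x)                            # (N, 512, h/32, w/32)
        feat = nn.functional.interpolate(
            feat, size=(h, w), mode="bilinear", align_corners=False)
        return torch.sigmoid(self.head(feat))             # (N, 2, h, w) segmentation map
```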
The version with batch norm produces a coarse result, but at least it predicts something positive; the version without produces only black.

On the other hand, the loss stays constant for the version without batch norm, while for the version with batch norm it decreases and then plateaus.

Could you try playing around with some hyperparameters (learning rate, weight initialization, etc.)?
Your model without BatchNorm layers might be more sensitive to the input distribution.
Are you normalizing the image?
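For example, something along these lines (assuming torchvision transforms and the standard ImageNet statistics the pretrained VGG weights expect; the crop size is just an example):

```python
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.RandomResizedCrop(224),                # example crop size
    transforms.ToTensor(),                            # scales pixels to [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet statistics used
                         std=[0.229, 0.224, 0.225]),  # for the pretrained weights
])
```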

Did you solve it? I have the same problem using a simpler VGG on MNIST, so the background is black and the digits are white.