Model always outputs the same values

Hi there.

I am currently trying to replicate YOLO’s implementation by following some blogs.
However, the model seems to always produce the same output regardless of the input values.
Any ideas why this might be so?

The output of the model is basically a tensor of shape (batch_size, 7, 7, NUM_CLASSES + BOXES_PER_CLASS * 5).

When I say the output is the same, I mean all the individual tensor values are identical, regardless of the input image I pass to the network.

This happens even before training, and it is the same after training.
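A quick way to confirm this behavior is to feed two different random inputs through the network and compare the outputs. The sketch below uses a small hypothetical stand-in network (the real YOLO model would be substituted in its place):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in network; substitute your own YOLO model here.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)
model.eval()

# Two different random inputs should produce different outputs;
# if max_diff is exactly zero, the input is not reaching the output
# (e.g. an intermediate layer has collapsed to a constant).
with torch.no_grad():
    a = model(torch.randn(1, 3, 64, 64))
    b = model(torch.randn(1, 3, 64, 64))

max_diff = (a - b).abs().max().item()
print(f"max absolute difference between outputs: {max_diff:.6f}")
```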

Sequential(
(0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(18, 18))
(1): ReLU()
(2): MaxPool2d(kernel_size=(2, 2), stride=2, padding=0, dilation=1, ceil_mode=False)
(3): Conv2d(64, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): ReLU()
(5): MaxPool2d(kernel_size=(2, 2), stride=2, padding=0, dilation=1, ceil_mode=False)
(6): Conv2d(192, 128, kernel_size=(1, 1), stride=(1, 1))
(7): ReLU()
(8): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(9): ReLU()
(10): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
(11): ReLU()
(12): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): ReLU()
(14): MaxPool2d(kernel_size=(2, 2), stride=2, padding=0, dilation=1, ceil_mode=False)
(15): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
(16): ReLU()
(17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(18): ReLU()
(19): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
(20): ReLU()
(21): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(22): ReLU()
(23): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
(24): ReLU()
(25): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(26): ReLU()
(27): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
(28): ReLU()
(29): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(30): ReLU()
(31): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
(32): ReLU()
(33): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(34): ReLU()
(35): MaxPool2d(kernel_size=(2, 2), stride=2, padding=0, dilation=1, ceil_mode=False)
(36): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1))
(37): ReLU()
(38): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(39): ReLU()
(40): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1))
(41): ReLU()
(42): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(43): ReLU()
(44): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(45): ReLU()
(46): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(47): ReLU()
(48): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(49): ReLU()
(50): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(51): ReLU()
(52): Reshape()
(53): Linear(in_features=50176, out_features=4096, bias=True)
(54): ReLU()
(55): Linear(in_features=4096, out_features=4410, bias=True)
(56): Sigmoid()
(57): Reshape()
)

This is the string description of the model, in case it helps.
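One common cause of constant outputs in a deep ReLU stack without batch norm is that some layer's activations die (go all-zero), after which only the biases of the later layers reach the output. Forward hooks can locate where that happens; the sketch below uses a small hypothetical stand-in for the conv backbone:

```python
import torch
import torch.nn as nn

# Small hypothetical stand-in for the conv backbone printed above.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
)

# Record the fraction of zero activations after each ReLU. A layer that is
# ~100% zeros makes everything downstream constant w.r.t. the input.
stats = {}

def make_hook(name):
    def hook(module, inputs, output):
        stats[name] = (output == 0).float().mean().item()
    return hook

for i, layer in enumerate(model):
    if isinstance(layer, nn.ReLU):
        layer.register_forward_hook(make_hook(f"relu_{i}"))

with torch.no_grad():
    model(torch.randn(1, 3, 32, 32))

for name, frac in stats.items():
    print(f"{name}: {frac:.0%} zeros")
```

Run on the real model, any ReLU reporting close to 100% zeros is a likely culprit; poor weight initialization or unnormalized input images often push a deep ReLU stack into that regime.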

Difficult to tell without the training loop. But looking at the code, you should use Softmax instead of Sigmoid for the class scores: Sigmoid squashes each value into (0, 1) independently (it is suited to binary classification), whereas Softmax produces a probability distribution over the classes.

I agree with you about not using sigmoid.

I have since changed the code. Coming from Keras, I'm used to putting activations inside the main model. Since the YOLO loss combines multiple objectives (classification and regression), I should not have applied a single sigmoid layer over the whole output. Instead, I have removed the sigmoid and now apply different activations to different parts of the output.
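Applying different activations to different slices of the output could look like the sketch below. The values of NUM_CLASSES and BOXES_PER_CELL are assumptions chosen so that 7 × 7 × (80 + 2 × 5) = 4410 matches the final Linear layer above, and bounding all box values with sigmoid is just one common arrangement (the exact activations vary between YOLO versions):

```python
import torch
import torch.nn.functional as F

NUM_CLASSES = 80      # assumed: 7 * 7 * (80 + 2 * 5) = 4410
BOXES_PER_CELL = 2    # assumed

def split_activations(raw):
    """raw: (batch, 7, 7, NUM_CLASSES + BOXES_PER_CELL * 5) linear outputs."""
    class_logits = raw[..., :NUM_CLASSES]
    box_part = raw[..., NUM_CLASSES:]
    # Softmax over the class slice gives a per-cell class distribution;
    # sigmoid keeps confidences and x/y/w/h in (0, 1).
    class_probs = F.softmax(class_logits, dim=-1)
    box_preds = torch.sigmoid(box_part)
    return class_probs, box_preds

raw = torch.randn(4, 7, 7, NUM_CLASSES + BOXES_PER_CELL * 5)
class_probs, box_preds = split_activations(raw)
```

The loss can then treat the two parts separately, e.g. cross-entropy on the class distribution and squared error on the box predictions.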

I am now using a different model consisting of VGG16 plus some additional layers. I no longer get the same output for different inputs. I will keep investigating why the original YOLO model behaved that way.

And Kushaj, thanks for the quick reply. Really appreciate it.