Model always outputs the same values

Hi there.

I am currently trying to replicate YOLO’s implementation by following some blogs.
However, the model seems to always produce the same output regardless of the input values.
Any ideas why this might be so?

The output of the model is basically a tensor of shape (batch_size, 7, 7, NUM_CLASSES + BOXES_PER_CLASS * 5).

When I say the output is the same, I mean all the individual tensor values are identical, regardless of the input image I pass to the network.

This happens even before training, and it is the same after training.
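A quick way to confirm this behavior is to feed two different random inputs through the network and compare the outputs. The sketch below uses a small hypothetical stand-in network (the real YOLO model would be substituted in its place):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in network; substitute your own YOLO model here.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)
model.eval()

# Two different random inputs should produce different outputs;
# if max_diff is exactly zero, the input is not reaching the output
# (e.g. an intermediate layer has collapsed to a constant).
with torch.no_grad():
    a = model(torch.randn(1, 3, 64, 64))
    b = model(torch.randn(1, 3, 64, 64))

max_diff = (a - b).abs().max().item()
print(f"max absolute difference between outputs: {max_diff:.6f}")
```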

Sequential(
(0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(18, 18))
(1): ReLU()
(2): MaxPool2d(kernel_size=(2, 2), stride=2, padding=0, dilation=1, ceil_mode=False)
(3): Conv2d(64, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): ReLU()
(5): MaxPool2d(kernel_size=(2, 2), stride=2, padding=0, dilation=1, ceil_mode=False)
(6): Conv2d(192, 128, kernel_size=(1, 1), stride=(1, 1))
(7): ReLU()
(8): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(9): ReLU()
(10): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
(11): ReLU()
(12): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): ReLU()
(14): MaxPool2d(kernel_size=(2, 2), stride=2, padding=0, dilation=1, ceil_mode=False)
(15): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
(16): ReLU()
(17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(18): ReLU()
(19): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
(20): ReLU()
(21): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(22): ReLU()
(23): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
(24): ReLU()
(25): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(26): ReLU()
(27): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
(28): ReLU()
(29): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(30): ReLU()
(31): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
(32): ReLU()
(33): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(34): ReLU()
(35): MaxPool2d(kernel_size=(2, 2), stride=2, padding=0, dilation=1, ceil_mode=False)
(36): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1))
(37): ReLU()
(38): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(39): ReLU()
(40): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1))
(41): ReLU()
(42): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(43): ReLU()
(44): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(45): ReLU()
(46): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(47): ReLU()
(48): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(49): ReLU()
(50): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(51): ReLU()
(52): Reshape()
(53): Linear(in_features=50176, out_features=4096, bias=True)
(54): ReLU()
(55): Linear(in_features=4096, out_features=4410, bias=True)
(56): Sigmoid()
(57): Reshape()
)

This is the string description of the model, in case it helps.
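One common cause of constant outputs in a deep ReLU stack without batch norm is that some layer's activations die (go all-zero), after which only the biases of the later layers reach the output. Forward hooks can locate where that happens; the sketch below uses a small hypothetical stand-in for the conv backbone:

```python
import torch
import torch.nn as nn

# Small hypothetical stand-in for the conv backbone printed above.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
)

# Record the fraction of zero activations after each ReLU. A layer that is
# ~100% zeros makes everything downstream constant w.r.t. the input.
stats = {}

def make_hook(name):
    def hook(module, inputs, output):
        stats[name] = (output == 0).float().mean().item()
    return hook

for i, layer in enumerate(model):
    if isinstance(layer, nn.ReLU):
        layer.register_forward_hook(make_hook(f"relu_{i}"))

with torch.no_grad():
    model(torch.randn(1, 3, 32, 32))

for name, frac in stats.items():
    print(f"{name}: {frac:.0%} zeros")
```

Run on the real model, any ReLU reporting close to 100% zeros is a likely culprit; poor weight initialization or unnormalized input images often push a deep ReLU stack into that regime.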

Difficult to tell without the training loop. But looking at the code, you should use Softmax instead of Sigmoid for the class scores: Sigmoid squashes each value into (0, 1) independently (it is suited to binary classification), whereas Softmax produces a probability distribution over the classes.

I agree with you about not using sigmoid.

I have since changed the code. Coming from Keras, I'm used to putting activations inside the main model. Since the YOLO loss combines multiple objectives (classification and regression), I should not have applied a single sigmoid layer over the whole output. Instead, I have removed the sigmoid and now apply different activations to different parts of the output.
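Applying different activations to different slices of the output could look like the sketch below. The values of NUM_CLASSES and BOXES_PER_CELL are assumptions chosen so that 7 × 7 × (80 + 2 × 5) = 4410 matches the final Linear layer above, and bounding all box values with sigmoid is just one common arrangement (the exact activations vary between YOLO versions):

```python
import torch
import torch.nn.functional as F

NUM_CLASSES = 80      # assumed: 7 * 7 * (80 + 2 * 5) = 4410
BOXES_PER_CELL = 2    # assumed

def split_activations(raw):
    """raw: (batch, 7, 7, NUM_CLASSES + BOXES_PER_CELL * 5) linear outputs."""
    class_logits = raw[..., :NUM_CLASSES]
    box_part = raw[..., NUM_CLASSES:]
    # Softmax over the class slice gives a per-cell class distribution;
    # sigmoid keeps confidences and x/y/w/h in (0, 1).
    class_probs = F.softmax(class_logits, dim=-1)
    box_preds = torch.sigmoid(box_part)
    return class_probs, box_preds

raw = torch.randn(4, 7, 7, NUM_CLASSES + BOXES_PER_CELL * 5)
class_probs, box_preds = split_activations(raw)
```

The loss can then treat the two parts separately, e.g. cross-entropy on the class distribution and squared error on the box predictions.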

I am now using a different model consisting of VGG16 plus some additional layers. I no longer get the same output for different inputs. I will keep investigating why the original YOLO model behaved that way.

And Kushaj, thanks for the quick reply. Really appreciate it.