I am building YOLO version 1 by following the research paper. According to the paper, the fully connected part of the model is as follows:
# S = grid size, B = boxes per cell, C = number of classes
return nn.Sequential(
    nn.Flatten(),
    nn.Linear(1024*S*S, 4096),
    nn.Dropout(),
    nn.LeakyReLU(0.1),
    nn.Linear(4096, S*S*(C+B*5))  # no activation on the output
)
The final output is a 7x7x30 tensor, where each 30-element vector holds the predictions for one grid cell of the image, so every value should be either zero or positive.
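As a quick check of that shape, here is a minimal sketch. It assumes the paper's values S=7, B=2, C=20, and it uses a random tensor as a stand-in for the output of the last nn.Linear (the batch size of 4 is arbitrary):

```python
import torch

S, B, C = 7, 2, 20  # grid size, boxes per cell, classes (values from the paper)

# Random stand-in for the flat output of the final nn.Linear layer.
flat = torch.randn(4, S * S * (C + B * 5))

# Reshape to one (C + B*5)-long vector per grid cell.
pred = flat.view(-1, S, S, C + B * 5)
print(pred.shape)  # torch.Size([4, 7, 7, 30])
```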
However, nn.Linear() can produce negative values, which causes an error: the loss function takes a square root, so a negative value results in nan.
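The nan can be reproduced in isolation. This is only a minimal sketch: the tensor `wh` is a hypothetical stand-in for raw predicted values that get square-rooted in the loss, not output from my actual model.

```python
import torch

# Hypothetical raw predictions; a linear layer has no activation,
# so nothing prevents these from being negative.
wh = torch.tensor([0.25, -0.25])

# Square root as it would appear inside the loss.
root = torch.sqrt(wh)

# The positive entry is fine, the negative entry becomes nan.
print(root)
print(torch.isnan(root))
```

Once a single nan appears in the loss, it propagates through the sum and into the gradients, which is why training breaks.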
How can I avoid this error while still sticking to the architecture described in the research paper?