Does batch size have any effect on divergence of the training algorithm?

I want to implement YOLO (You Only Look Once) in PyTorch. I wrote its code and set the batch size to 64, but when I ran the algorithm, the cost always increased. When I set the batch size to 32, the cost decreased over the long term. Could you please tell me whether this is logical or not?

A smaller batch size gives the gradients enough noise to jump out of narrow valleys. That being said, 64 is not that big a size. Are you looking at the loss per image or the total loss (which would be higher for a larger batch size)?
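To illustrate the point about per-image vs. total loss, here is a minimal pure-Python sketch (not the poster's code): a summed loss grows with batch size even when the model is doing equally well per image, while a mean loss stays comparable across batch sizes.

```python
# Illustrative only: per-image losses of 0.5 for every image in the batch.
def total_loss(per_image_losses):
    # Summed over the batch, like reduction="sum" in PyTorch losses.
    return sum(per_image_losses)

def mean_loss(per_image_losses):
    # Averaged over the batch, like reduction="mean" (PyTorch's default).
    return sum(per_image_losses) / len(per_image_losses)

batch32 = [0.5] * 32
batch64 = [0.5] * 64

print(total_loss(batch32), total_loss(batch64))  # 16.0 vs 32.0: sum scales with batch size
print(mean_loss(batch32), mean_loss(batch64))    # 0.5 vs 0.5: mean is comparable
```

So a larger total loss at batch size 64 does not by itself mean the training is worse; comparing the mean loss per image is the fair comparison.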

Thanks for your response @Mika_S! I found that one of my code lines had a problem: it was producing a NaN value. I would like to know, have you ever read the YOLO paper?
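For anyone hitting the same issue: a hypothetical sketch of how a NaN-producing line can be caught early by checking intermediate values (the helper name `check_finite` is mine, not from the thread). In actual PyTorch code, `torch.isnan(tensor).any()` or `torch.autograd.set_detect_anomaly(True)` serve the same purpose.

```python
import math

def check_finite(name, values):
    # Raise as soon as a NaN or infinity appears, naming the offending value,
    # so the bad line is localized instead of silently poisoning the loss.
    for v in values:
        if math.isnan(v) or math.isinf(v):
            raise ValueError(f"{name} contains a non-finite value: {v!r}")

loss_terms = [0.3, 0.7, float("nan")]
try:
    check_finite("loss_terms", loss_terms)
except ValueError as e:
    print(e)  # reports the NaN in loss_terms
```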

I read the YOLO paper about a year back. But shoot me questions and I will try to answer as best as I can :).

I have a question about its cost function implementation, because the reference source code is written in C and I am a newbie in C. Do you have any Python implementation of the cost? I have implemented it myself, but I think the YOLO authors use some undocumented tricks to get the best results. Would it be possible to collaborate on a Python implementation of it?
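As a starting point for that collaboration, here is a hedged pure-Python sketch of the YOLOv1 cost terms from the paper for a single grid cell, assuming one predicted box per cell; the variable names and `cell_loss` helper are illustrative, not taken from the darknet source. `LAMBDA_COORD = 5` and `LAMBDA_NOOBJ = 0.5` are the weights given in the paper.

```python
import math

LAMBDA_COORD = 5.0   # up-weights localization error (paper's lambda_coord)
LAMBDA_NOOBJ = 0.5   # down-weights confidence error in empty cells

def cell_loss(pred, target, has_object):
    """Sum-squared-error terms for one cell.

    pred/target: dicts with keys x, y, w, h, conf, and classes (list of
    class probabilities). All are illustrative placeholders.
    """
    if not has_object:
        # Empty cell: only the no-object confidence term applies,
        # pushing predicted confidence toward zero.
        return LAMBDA_NOOBJ * (pred["conf"] - 0.0) ** 2

    coord = (pred["x"] - target["x"]) ** 2 + (pred["y"] - target["y"]) ** 2
    # Square roots of width/height damp the penalty on large boxes
    # relative to small ones, as in the paper.
    size = (math.sqrt(pred["w"]) - math.sqrt(target["w"])) ** 2 \
         + (math.sqrt(pred["h"]) - math.sqrt(target["h"])) ** 2
    conf = (pred["conf"] - target["conf"]) ** 2
    cls = sum((p - t) ** 2 for p, t in zip(pred["classes"], target["classes"]))
    return LAMBDA_COORD * (coord + size) + conf + cls
```

The full loss also requires picking, per cell, the predictor with the highest IoU against the ground truth as the "responsible" box; that assignment step is what the C code handles and is omitted here.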

I started a post in the Google group of darknet (YOLO's base framework) link, but did not get any answers.

Sorry for the late reply. Unfortunately, I do not have an implementation of the cost in Python.

Have you looked at this: