Is it mandatory for the batch size to be a power of 2 on GPU?

Do PyTorch or CUDA have any specific optimizations for this?

Hi,

No, it is not mandatory.
And powers of 2 are not particularly important either.
Maybe multiples of 32, which is the warp size (the number of threads a streaming multiprocessor executes together)? But even that depends a lot on how the CUDA kernel is implemented and, in general, won’t lead to any significant difference.
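To make the warp-size point concrete, here is a rough sketch of the arithmetic. It assumes a hypothetical kernel that maps one thread to each batch element, which many real kernels do not do, and the helper name `warp_usage` is made up for illustration. It only counts how many thread lanes would sit idle in the last warp; it says nothing about actual runtime.

```python
import math

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def warp_usage(num_threads: int, warp_size: int = WARP_SIZE) -> tuple[int, int]:
    """Return (warps needed, idle thread lanes in the last warp)."""
    warps = math.ceil(num_threads / warp_size)
    idle = warps * warp_size - num_threads
    return warps, idle


# Assumption: one thread per batch element (hypothetical kernel layout).
for batch_size in (32, 64, 96, 100, 128):
    warps, idle = warp_usage(batch_size)
    print(f"batch={batch_size:4d}  warps={warps}  idle_lanes={idle}")
```

Under that assumption, 96 (a multiple of 32 but not a power of 2) leaves no idle lanes, while 100 leaves 28 in its last warp. Whether this shows up in practice depends on the kernel, so the only reliable check is to time your own model with a few batch sizes.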