More specifically, why are the kernel sizes and number of channels in convolutional kernels powers of 2? Why are batch sizes also powers of 2? I vaguely recall an explanation stating that it’s because computers compute faster this way, but can anyone please suggest research papers or articles as to why this is the case? I come from a non-CS background, so please bear with me.
Say you've initially chosen a size of 2^n for something. Decreasing it may not help at all: hardware typically provides 2^k parallel lanes for SIMD (single instruction, multiple data) processing, and any memory you "save" brings its own downsides, since you end up with "unaligned" regions of arbitrary size. Increasing the size to 2^n + 1 instead usually forces allocation of a whole new memory block, so it is inefficient in many cases.
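A minimal sketch of the lane-divisibility point, assuming a hypothetical SIMD width of 8 lanes (the actual width depends on the hardware, e.g. AVX or a GPU warp): a power-of-2 size divides evenly into vector operations, while one extra element forces leftover scalar cleanup work.

```python
LANES = 8  # assumed SIMD lane width; real hardware varies

def simd_iterations(n, lanes=LANES):
    """Return (full_vector_ops, leftover_scalar_ops) for a length-n array."""
    return n // lanes, n % lanes

# A power-of-2 size divides evenly into the lane width: no scalar tail.
print(simd_iterations(64))  # -> (8, 0)

# One element more needs the same 8 vector passes PLUS a cleanup step
# (or a whole extra, mostly wasted vector pass on padded data).
print(simd_iterations(65))  # -> (8, 1)
```

The same divisibility argument applies at every level of the memory hierarchy (cache lines, memory pages, GPU warps), which is why power-of-2 sizes tend to line up nicely across all of them at once.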