How are weights and biases handled in the core, and why?

I’m trying to find the answer to the question asked in
"Proper way to implement biases in Neural Networks"

Is the separation between the weights and biases due to: 1. readability of the implementation, 2. performance considerations, 3. flexibility/modularity, or 4. other reasons?

Please link to the relevant code if possible.


I share the opinion of most people in the thread you linked:

  • It is easier to implement: you just compute x*w + b, with no need to concatenate a column of 1s onto x.
  • The concatenation can hurt performance, since it allocates and copies a new input array on every forward pass.
  • Keeping them separate gives you more flexibility to initialize or regularize weights and biases differently (e.g. applying weight decay to the weights but not the biases).
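To make the first two points concrete, here is a minimal NumPy sketch (not taken from any library's actual code) contrasting the separate-bias form with the augmented-input form that folds the bias into the weight matrix via a column of 1s:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))   # batch of 4 inputs, 3 features
w = rng.standard_normal((3, 2))   # weights: 3 inputs -> 2 outputs
b = rng.standard_normal(2)        # separate bias vector

# Separate-bias form: one matmul plus a broadcasted add.
y_separate = x @ w + b

# Augmented form: append a column of 1s to x and fold b into w
# as an extra row. Mathematically equivalent, but it allocates a
# new (batch, features + 1) array on every forward pass.
x_aug = np.concatenate([x, np.ones((x.shape[0], 1))], axis=1)
w_aug = np.concatenate([w, b[None, :]], axis=0)
y_aug = x_aug @ w_aug

assert np.allclose(y_separate, y_aug)
```

The equivalence check passes, but only the separate-bias form lets you, say, hand `w` and `b` to an optimizer with different regularization settings without slicing them back apart.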