The class definition of LinearFunction here confuses me in one way – is the bias here a vector of size output.shape where all the entries are the same?
Wouldn’t the proper definition (where all of the entries of the bias are free) have grad_bias = grad_output?
Or is there a misunderstanding here on my part … thanks.
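
To make the question concrete, here is a minimal pure-Python sketch of the two cases I have in mind. It is not the tutorial's code: the helper names (forward_shared, forward_free, loss) and the numbers are made up for illustration, and xw just stands in for the input-times-weight product. Because the loss is linear in the bias, a unit-step difference gives the exact gradient, so no autograd is needed.

```python
def forward_shared(xw, bias):
    """Bias of shape (out_features,), broadcast over every batch row."""
    return [[v + bias[j] for j, v in enumerate(row)] for row in xw]


def forward_free(xw, bias):
    """Bias of shape (batch, out_features): one free entry per output."""
    return [[v + bias[i][j] for j, v in enumerate(row)]
            for i, row in enumerate(xw)]


def loss(output, grad_output):
    """Scalar whose gradient with respect to output is exactly grad_output."""
    return sum(g * o
               for g_row, o_row in zip(grad_output, output)
               for g, o in zip(g_row, o_row))


# Hypothetical values: 2 batch rows, 3 output features.
xw = [[1, 2, 3], [4, 5, 6]]                  # stands in for input @ weight.T
grad_output = [[1, 10, 100], [2, 20, 200]]   # arbitrary upstream gradient

# Case 1: one bias vector shared across the batch.
bias = [0, 0, 0]
base = loss(forward_shared(xw, bias), grad_output)
grad_bias_shared = []
for j in range(3):
    bumped = list(bias)
    bumped[j] += 1  # unit step in bias[j]
    grad_bias_shared.append(loss(forward_shared(xw, bumped), grad_output) - base)

# Case 2: every entry of the bias is free.
free_bias = [[0, 0, 0], [0, 0, 0]]
base = loss(forward_free(xw, free_bias), grad_output)
grad_bias_free = []
for i in range(2):
    row = []
    for j in range(3):
        bumped = [list(r) for r in free_bias]
        bumped[i][j] += 1  # unit step in bias[i][j]
        row.append(loss(forward_free(xw, bumped), grad_output) - base)
    grad_bias_free.append(row)

print(grad_bias_shared)  # column sums of grad_output
print(grad_bias_free)    # same entries as grad_output
```

So under these assumptions, a shared bias of length out_features gets the sum of grad_output over the batch dimension, while a fully free bias of size output.shape would get grad_bias = grad_output.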