Book: Deep Learning from Scratch
Jupyter Notebook on GitHub, Code block 55
Why is element-wise multiplication applied to calculate dLdN = dLdS * dSdN, rather than matrix multiplication?
I assume this is done to keep the dimensionality of the remaining derivatives correct, as shown in the shape comment following each derivative below. However, I still don't understand why dLdN is computed with element-wise multiplication while dLdX is computed with np.dot. Or am I missing something important here?
import numpy as np
from numpy import ndarray
# Array_Function (a Callable type alias) and deriv (a numerical derivative
# helper) are defined in earlier cells of the notebook.

def matrix_function_backward_sum_1(X: ndarray,
                                   W: ndarray,
                                   sigma: Array_Function) -> ndarray:
    '''
    Compute derivative of matrix function with a sum with respect to the
    first matrix input
    '''
    assert X.shape[1] == W.shape[0]  # X: (m x n), W: (n x p)

    # matrix multiplication
    N = np.dot(X, W)  # N: (m x p)

    # feeding the output of the matrix multiplication through sigma
    S = sigma(N)  # S: (m x p)

    # sum all the elements
    L = np.sum(S)  # L: a scalar

    # note: I'll refer to the derivatives by their quantities here,
    # unlike the math where we referred to their function names

    # dLdS - just 1s
    dLdS = np.ones_like(S)  # (m x p)

    # dSdN
    dSdN = deriv(sigma, N)  # (m x p)

    # dLdN
    dLdN = dLdS * dSdN  # (m x p) element-wise multiplication?

    # dNdX
    dNdX = np.transpose(W, (1, 0))  # (p x n)

    # dLdX
    dLdX = np.dot(dSdN, dNdX)  # (m x p) x (p x n) = (m x n)

    return dLdX
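To make the shapes I am asking about concrete, here is a minimal sketch of how I am running this block, assuming sigmoid for sigma and a simple central-difference deriv (both are my stand-ins for the notebook's earlier definitions, not the book's exact code):

from typing import Callable
import numpy as np
from numpy import ndarray

Array_Function = Callable[[ndarray], ndarray]  # stand-in type alias

def sigmoid(x: ndarray) -> ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def deriv(func: Array_Function, input_: ndarray, delta: float = 0.001) -> ndarray:
    # central-difference approximation, applied element-wise
    return (func(input_ + delta) - func(input_ - delta)) / (2.0 * delta)

np.random.seed(0)
X = np.random.randn(1, 3)  # (m x n) = (1 x 3)
W = np.random.randn(3, 2)  # (n x p) = (3 x 2)

N = np.dot(X, W)           # (1 x 2)
S = sigmoid(N)             # (1 x 2)

dLdS = np.ones_like(S)     # (1 x 2)
dSdN = deriv(sigmoid, N)   # (1 x 2)

# dLdS and dSdN have the same (m x p) shape and are combined element-wise;
# np.dot(dLdS, dSdN) would raise an error for these shapes.
dLdN = dLdS * dSdN         # (1 x 2)

# dNdX is (p x n) = (2 x 3), so the last step does use a matrix multiplication.
dNdX = np.transpose(W, (1, 0))  # (2 x 3)
dLdX = np.dot(dSdN, dNdX)       # (1 x 2) x (2 x 3) = (1 x 3)

print(dLdN.shape, dLdX.shape)   # (1, 2) (1, 3)

This prints (1, 2) (1, 3), matching the shape comments in the block above.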