Mathematical expression of LogSoftmax

dchatterjee172 · February 23, 2018, 3:58am

While reading the C code for LogSoftmax, I came across these lines:

pytorch/pytorch/blob/4ad7fab16e814dc8077210a127317ec3ac620967/aten/src/THNN/generic/LogSoftMax.c#L55


  real *input_data  = input_data_base  + outer_idx * outer_stride + inner_idx;
  real *output_data = output_data_base + outer_idx * outer_stride + inner_idx;


  real max_input = -THInf;
  for (d = 0; d < LOG_SOFTMAX_CAST_TYPE dim_size; d++)
    max_input = THMax(max_input, input_data[d * dim_stride]);


  accreal logsum = 0;
  for (d = 0; d < LOG_SOFTMAX_CAST_TYPE dim_size; d++)
    logsum += exp(input_data[d * dim_stride] - max_input);
  logsum = max_input + log(logsum);


  for (d = 0; d < LOG_SOFTMAX_CAST_TYPE dim_size; d++)
    output_data[d * dim_stride] = input_data[d * dim_stride] - logsum;
}


THTensor_(free)(input);
}


void THNN_(LogSoftMax_updateGradInput)(
        THNNState *state,

LogSoftmax can be expressed as x_i - log( exp(x).sum() ).
But what is the significance of adding max_input to the log of the sum?

taha · February 23, 2018, 5:27am

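Subtracting max_input from the argument of exp prevents numerical overflow for large inputs: every exponent is then at most 0, so each term lies in (0, 1]. Adding max_input back afterwards leaves the result unchanged, because log( exp(x).sum() ) = max_input + log( exp(x - max_input).sum() ). See LogSumExp on Wikipedia.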
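
To make the effect concrete, here is a minimal sketch in plain Python that mirrors the three loops of the C excerpt. The function names `log_softmax_naive` and `log_softmax_stable` are made up for illustration and are not part of PyTorch. One difference from C worth noting: Python's `math.exp` raises `OverflowError` on overflow, whereas C's `exp` would typically return infinity, which then produces useless outputs.

```python
import math


def log_softmax_naive(xs):
    # Direct formula: x_i - log(sum_j exp(x_j)).
    # exp(x_j) overflows a double once x_j exceeds roughly 709.
    log_sum = math.log(sum(math.exp(x) for x in xs))
    return [x - log_sum for x in xs]


def log_softmax_stable(xs):
    # Same three steps as the C excerpt: find the max, sum the shifted
    # exponentials, then add the max back to the log of that sum.
    max_input = max(xs)
    shifted_sum = sum(math.exp(x - max_input) for x in xs)  # each term is in (0, 1]
    logsum = max_input + math.log(shifted_sum)
    return [x - logsum for x in xs]


if __name__ == "__main__":
    large = [1000.0, 1001.0, 1002.0]
    print(log_softmax_stable(large))  # finite values
    try:
        print(log_softmax_naive(large))
    except OverflowError as err:
        print("naive version failed:", err)
```

For moderate inputs both functions agree up to rounding; they differ only when some `exp(x_j)` would overflow, which is exactly the case the shift guards against.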