With reduction="mean" the average is taken over all elements, but in the other one you are averaging over the batch size. So the denominator when you divide by the batch size yourself is just results.shape[0], whereas with reduction="mean" it is np.prod(results.shape).
Actually I tried that and it is not true. The difference in my example is caused by the weighting scheme: if there were no weight parameter, the two results would be the same. The normalization should actually be weighted by the class weights, e.g.:
weights = torch.Tensor([3, 1, 9, 8]).cuda()
# divide the summed loss by the total weight of the target classes
F.cross_entropy(results, labels, weight=weights, reduction="sum") / sum([weights[k] for k in labels])
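A minimal self-contained sketch (on CPU, with made-up results and labels) that checks this equivalence; both torch.allclose calls should print True:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
results = torch.randn(5, 4)              # hypothetical logits: batch of 5, 4 classes
labels = torch.tensor([0, 2, 1, 3, 2])   # hypothetical targets
weights = torch.tensor([3., 1., 9., 8.])

# with weight=, reduction="mean" divides by the summed weights of the targets
mean_loss = F.cross_entropy(results, labels, weight=weights, reduction="mean")
manual = F.cross_entropy(results, labels, weight=weights, reduction="sum") / weights[labels].sum()
print(torch.allclose(mean_loss, manual))  # True

# without weight=, reduction="mean" reduces to sum / batch size
unweighted = F.cross_entropy(results, labels, reduction="mean")
print(torch.allclose(unweighted, F.cross_entropy(results, labels, reduction="sum") / results.shape[0]))  # True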