UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead

Since upgrading PyTorch (to ‘0.5.0a0+a24163a’), I am getting the following warning:

/usr/lib/python3.7/site-packages/torch/nn/functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))

Is this due to the new PyTorch version or to Python 3.7? Is there anything I need to do, or will it be fixed in newer versions?

Thank you.


The reduction argument was recently introduced in 0.4.1.
See e.g. the docs for nn.CrossEntropyLoss.

It’s currently just a warning that you are using deprecated arguments, but you should switch to reduction instead of reduce or size_average.
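For example, a minimal sketch of the change (the tensor shapes and the size_average=False call are just assumptions to reproduce the reduction='sum' case from your warning):

import torch
import torch.nn.functional as F

output = torch.randn(4, 5, requires_grad=True)
target = torch.randint(0, 5, (4,))

# Deprecated style: emits the UserWarning and is mapped internally to reduction='sum'
loss_old = F.cross_entropy(output, target, size_average=False)

# Preferred style: pass the reduction string directly
loss_new = F.cross_entropy(output, target, reduction='sum')
print(torch.allclose(loss_old, loss_new))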


Thank you very much!

You need to replace the deprecated arguments with the given suggestion.

See here how the warning message is generated:
https://fossies.org/linux/pytorch/torch/nn/_reduction.py

It will also help you understand why the reduction argument makes the other two redundant.
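In short, the two boolean flags collapse into a single reduction string. Here is a rough sketch of that mapping logic (simplified and with an illustrative helper name; see the linked _reduction.py for the actual implementation):

import warnings

def legacy_to_reduction(size_average=None, reduce=None):
    # Unspecified flags historically defaulted to True.
    size_average = True if size_average is None else size_average
    reduce = True if reduce is None else reduce

    if not reduce:
        reduction = 'none'   # keep the per-element losses
    elif size_average:
        reduction = 'mean'   # average over the elements
    else:
        reduction = 'sum'    # sum over the elements

    warnings.warn("size_average and reduce args will be deprecated, "
                  "please use reduction='{}' instead.".format(reduction))
    return reduction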

@ptrblck the docs still say size_average is required in order to use ignore_index in F.cross_entropy. It’s not clear how size_average is replaced by reduction in this case. How are we meant to use ignore_index?

Yeah, the documentation might be a bit misleading, since size_average is still mentioned even though this argument is deprecated and reduction should be used instead.

There are basically three reduction types, i.e. 'none', 'sum', and 'mean'.
ignore_index will be applied as follows:

  1. reduction='none': the loss will not be reduced, so you will get a loss tensor of shape [batch_size]. The entries where target==ignore_index will have a zero loss.
  2. reduction='mean': the reduced loss will be the average of all entries where target!=ignore_index.
  3. reduction='sum': the reduced loss will be the sum of the “raw” losses. Since the samples with ignored targets get a zero loss, the sum does not change whether you filter them out or just sum over all values.

Here is a small code snippet to demonstrate my understanding:

import torch
import torch.nn.functional as F

# Setup
output = torch.randn(10, 10, requires_grad=True)
target = torch.arange(10)

# Case 1: sanity check for plain loss without ignore_index
loss_raw = F.cross_entropy(output, target, reduction='none')

loss_mean = F.cross_entropy(output, target, reduction='mean')
print(loss_raw.mean() == loss_mean)

loss_sum = F.cross_entropy(output, target, reduction='sum')
print(loss_raw.sum() == loss_sum)

# Case 2: ignore_index=0, reduction='mean'
loss_raw_ignore = F.cross_entropy(
    output, target, reduction='none', ignore_index=0)

loss_mean_ignore = F.cross_entropy(
    output, target, reduction='mean', ignore_index=0)
print(loss_mean_ignore == loss_raw_ignore[loss_raw_ignore!=0].mean())

# Check gradients
output.grad = None
loss_mean_ignore.backward()
g0 = output.grad.clone()

output.grad = None
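# retain_graph=True, since the graph of loss_raw_ignore is backwarded again below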
loss_raw_ignore[loss_raw_ignore!=0].mean().backward(retain_graph=True)
g1 = output.grad.clone()
print((g0 == g1).all())

# Case 3: ignore_index=0, reduction='sum'
loss_sum_ignore = F.cross_entropy(
    output, target, reduction='sum', ignore_index=0)
print(loss_sum_ignore == loss_raw_ignore.sum())

# Check gradients
output.grad = None
loss_sum_ignore.backward()
g0 = output.grad.clone()

output.grad = None
loss_raw_ignore.sum().backward(retain_graph=True)
g1 = output.grad.clone()

output.grad = None
loss_raw_ignore[loss_raw_ignore!=0].sum().backward()
g2 = output.grad.clone()
print((g0 == g1).all() and (g0 == g2).all())

Let me know if I’m missing something.


Thank you, that’s great info! 🙂
