Hello, I’m a bit confused about how to accumulate the batch losses to obtain the epoch loss.

Two questions:

Is #1 (see comments below) the correct way to calculate the loss with masks?

Is #2 the correct way to report the epoch loss?

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
for epoch in range(10):
    EPOCH_LOSS = 0.
    for inputs, gt_labels, masks in training_dataloader:
        optimizer.zero_grad()
        outputs = model(inputs)
        #1: Is this the correct way to calculate batch loss? Do I multiply batch_loss with outputs.shape[0] before adding it to epoch_loss?
        batch_loss = (masks * criterion(outputs, gt_labels.float())).mean()
        EPOCH_LOSS += batch_loss
        batch_loss.backward()
        optimizer.step()
    #2: then what do I do here? Do I divide the EPOCH_LOSS with len(training_dataloader)?
    print(f'EPOCH LOSS: {EPOCH_LOSS/len(training_dataloader):.3f}')

BCEWithLogitsLoss returns a float tensor with a single element (the mean over all elements) unless you construct it with reduction='none', in which case it returns one loss per element. Would you explain a bit more about what masks does in your model?

Since batch_loss is a tensor, it is recommended to use EPOCH_LOSS += batch_loss.item() instead of EPOCH_LOSS += batch_loss.

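As far as I know, len(dataloader) is the number of batches, i.e. ceil(len(dataset) / batch_size), or len(dataset) // batch_size if drop_last=True. EPOCH_LOSS / len(dataset) would be correct, provided each batch loss is first multiplied by its batch size (or computed as a sum rather than a mean).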
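To see why the divisor matters, here is a small sketch in plain Python comparing the mean of per-batch means with the per-sample mean. The numbers and the helper name epoch_mean are made up for illustration; they are not from the original post.

```python
def epoch_mean(batch_means, batch_sizes):
    """Per-sample mean loss recovered from per-batch mean losses."""
    # Undo each batch's mean by multiplying with its size, then
    # divide by the total number of samples seen in the epoch.
    total = sum(m * n for m, n in zip(batch_means, batch_sizes))
    return total / sum(batch_sizes)


# Hypothetical epoch: 10 samples, batch_size=4, so the last batch has 2 samples.
batch_means = [0.5, 0.7, 1.2]
batch_sizes = [4, 4, 2]

# Dividing the summed batch means by the number of batches
# gives the small last batch the same weight as the full ones.
naive = sum(batch_means) / len(batch_means)

# Weighting by batch size gives every sample the same weight.
weighted = epoch_mean(batch_means, batch_sizes)

print(f"mean of batch means: {naive:.3f}")
print(f"per-sample mean:     {weighted:.3f}")
```

The two values agree only when every batch has the same size. With padded sequences and a mask, the natural weight is probably the number of valid (unpadded) steps in each batch rather than the batch size.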

So my outputs have shape (14, 10, 128), where 14 is the batch_size, 10 is the seq_len, and 128 is the length of the object vector: an entry is 1 if that element of the sequence belongs to the corresponding object, and 0 otherwise. The mask tells us the true length of each sequence, so its shape is (14, 10). For instance, the first sequence might only have 3 elements (so its true shape would be 3 x 128), and the rest (7 x 128) is just padding.
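Given those shapes, one possible way to compute a masked loss is sketched below. It assumes the mask holds 1 for real steps and 0 for padding; the function name masked_bce_loss and the random test data are hypothetical. The key points are reduction="none" (so there is a per-element loss to mask) and unsqueeze(-1) (so the (14, 10) mask broadcasts over the 128 objects).

```python
import torch
import torch.nn.functional as F


def masked_bce_loss(outputs, gt_labels, masks, pos_weight=None):
    """Mean BCE-with-logits over non-padded steps only.

    outputs, gt_labels: (batch, seq_len, num_objects)
    masks:              (batch, seq_len), 1 = real step, 0 = padding (assumed)
    """
    # Per-element losses, shape (batch, seq_len, num_objects).
    per_element = F.binary_cross_entropy_with_logits(
        outputs, gt_labels.float(), pos_weight=pos_weight, reduction="none"
    )
    # Add a trailing dimension so the mask broadcasts over the objects.
    mask = masks.unsqueeze(-1).to(per_element.dtype)
    masked_sum = (per_element * mask).sum()
    # Count only valid elements: valid steps times objects per step.
    n_valid = mask.sum() * outputs.shape[-1]
    return masked_sum / n_valid.clamp(min=1)


# Random data with the shapes described in the thread.
torch.manual_seed(0)
outputs = torch.randn(14, 10, 128)
gt_labels = torch.randint(0, 2, (14, 10, 128))
lengths = torch.randint(1, 11, (14,))
masks = (torch.arange(10)[None, :] < lengths[:, None]).float()

loss = masked_bce_loss(outputs, gt_labels, masks)
```

Dividing by the number of valid elements, rather than calling .mean() on the masked tensor, keeps the padded zeros from diluting the loss.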

So basically, I should divide it by len(dataloader.dataset)?

Not always. As you can see from the implementation, the length of the DataLoader depends on a few factors.

def __len__(self) -> int:
    if self._dataset_kind == _DatasetKind.Iterable:
        # NOTE [ IterableDataset and __len__ ]
        #
        # For `IterableDataset`, `__len__` could be inaccurate when one naively
        # does multi-processing data loading, since the samples will be duplicated.
        # However, no real use case should be actually using that behavior, so
        # it should count as a user error. We should generally trust user
        # code to do the proper thing (e.g., configure each replica differently
        # in `__iter__`), and give us the correct `__len__` if they choose to
        # implement it (this will still throw if the dataset does not implement
        # a `__len__`).
        #
        # To provide a further warning, we track if `__len__` was called on the
        # `DataLoader`, save the returned value in `self._len_called`, and warn
        # if the iterator ends up yielding more than this number of samples.
        # Cannot statically verify that dataset is Sized
        length = self._IterableDataset_len_called = len(self.dataset)  # type: ignore[assignment, arg-type]
        if self.batch_size is not None:  # IterableDataset doesn't allow custom sampler or batch_sampler
            from math import ceil
            if self.drop_last:
                length = length // self.batch_size
            else:
                length = ceil(length / self.batch_size)
        return length
    else:
        return len(self._index_sampler)

If you have specified the batch_size and drop_last is True: you have to divide by len(dataloader) * batch_size

If you have specified the batch_size and drop_last is False: you have to divide by len(dataset)

If you didn’t specify the batch_size: you have to look at the sampler or batch_sampler
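The first two cases can be checked with a few lines of plain Python that mirror the batch-count arithmetic in the quoted __len__. The helper name batches_and_samples is hypothetical, and the sketch only covers the case where batch_size is set.

```python
import math


def batches_and_samples(dataset_len, batch_size, drop_last):
    """Return (number of batches, number of samples seen) for one epoch."""
    if drop_last:
        # The incomplete final batch is discarded.
        n_batches = dataset_len // batch_size
        n_samples = n_batches * batch_size
    else:
        # The final batch may be smaller, but every sample is seen.
        n_batches = math.ceil(dataset_len / batch_size)
        n_samples = dataset_len
    return n_batches, n_samples


# 10 samples with batch_size=4
print(batches_and_samples(10, 4, drop_last=True))   # (2, 8)
print(batches_and_samples(10, 4, drop_last=False))  # (3, 10)
```

The second value in each tuple is the divisor to use when the accumulated epoch loss is a sum over samples.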

In his case, no, because he selects only the valid data using the mask. If you instead computed the loss on your original outputs and labels, you would need to mask the output.