Labels for padding

Hi!

I need a proper dummy label value for the cross-entropy loss calculation. Could I get some guidance, please?

I have a convolutional network that takes input of shape (number of images in the batch B, channels = 3, width, height). The input is a batch of images drawn from 5 classes.

I must simulate ‘imbalance’, so sometimes I sample only B’ < B images. This means I need to ‘pad’ my input with B − B’ all-zero images so that the network still works.
In the TensorFlow implementation I’m trying to emulate, they use one-hot encoded labels for the B’ ‘real’ images and all-zero vectors for the dummy all-zero images.

The problem is that PyTorch’s cross-entropy loss doesn’t take one-hot targets. The labels of my B’ real images come from [0, 1, 2, 3, 4], so if I label the dummy images 0, it seems to confuse the model and I get very low accuracy (since both the dummy images and some of the actual input get the same label).
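
Roughly what I’m doing, as a minimal sketch (the sizes here are made up for illustration):

```python
import torch

B, B_prime, C, H, W = 8, 5, 3, 84, 84  # made-up sizes for illustration

real_images = torch.randn(B_prime, C, H, W)
real_labels = torch.randint(0, 5, (B_prime,))  # real labels come from [0, 4]

# pad the batch with all-zero images so the batch size stays B
pad_images = torch.zeros(B - B_prime, C, H, W)
images = torch.cat([real_images, pad_images], dim=0)

# naive choice: label the padded images 0, which collides with real class 0
labels = torch.cat([real_labels, torch.zeros(B - B_prime, dtype=torch.long)])
```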

Many Thanks!

I’m not sure if I understand your use case, but are you passing a zeros tensor into your TF implementation? If so, try to do the same in PyTorch by passing a floating point tensor, as newer PyTorch releases accept a one-hot encoded tensor as the target in nn.CrossEntropyLoss.
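
Something like this minimal sketch (the batch size and class count are just examples based on your description):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

logits = torch.randn(4, 5)  # model output for B=4 samples and 5 classes
target = torch.tensor([
    [0., 1., 0., 0., 0.],  # real image, class 1
    [0., 0., 0., 0., 1.],  # real image, class 4
    [0., 0., 0., 0., 0.],  # padded image: an all-zero row adds 0 to the loss
    [0., 0., 0., 0., 0.],  # padded image
])
loss = criterion(logits, target)  # needs PyTorch >= 1.10 for probability targets
```

With the default reduction='mean' the all-zero rows add nothing to the loss but are still counted in the batch average, which I believe matches the TF behavior you describe.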

Dear ptrblck,

Thanks very much for your help! After reading the documentation more closely, I found that ignore_index might be a good way to go about it. I’m trying it out now!
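
A minimal sketch of what I’m trying now (the dummy label -1 is my own choice; any value outside [0, 4] should work):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(ignore_index=-1)

logits = torch.randn(4, 5)  # B=4 samples, 5 classes
# real labels stay in [0, 4]; the padded images get -1 and are skipped by the loss
target = torch.tensor([2, 4, -1, -1])
loss = criterion(logits, target)
```

One thing I noticed in the docs: with reduction='mean' the ignored entries are also excluded from the denominator, so the averaging is slightly different from the TF version, which averages zero-loss rows over the full batch.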

Use case & some more information: I’m still working on the TAML algorithm but getting very bad results (1.3 cross-entropy loss, 0.3 accuracy after 1000 iterations with the same settings as the paper). Naively using 0 as the dummy label (when in fact 0 is a ‘real’ label) is what I’m fixing right now.

The algorithm tries to simulate the imbalanced situation where the number of samples in each classification task differs. Usually, for a 5-way classification task, we get B images from each of the 5 classes. The ‘imbalance’ here is that sometimes we get fewer than B images from some classes (so B is the upper bound). TAML’s goal is to use an encoder that, based on the imbalance, reuses more or less information from previous training ‘experiences’. The way the authors do it in their TF implementation is to use one-hot labels like [0 1 0 0 0] for the real images and [0 0 0 0 0] for the padded images, which keeps the total number of images in each task constant.
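
For reference, here is a minimal sketch of how such targets could be built in PyTorch (my own reconstruction, not the authors’ code; the task sizes are made up):

```python
import torch
import torch.nn.functional as F

num_classes, B, B_prime = 5, 8, 5  # made-up task sizes

real_labels = torch.randint(0, num_classes, (B_prime,))
one_hot = F.one_hot(real_labels, num_classes=num_classes).float()  # (B', 5)

# all-zero rows for the padded images keep the target at a fixed shape (B, 5)
target = torch.cat([one_hot, torch.zeros(B - B_prime, num_classes)], dim=0)
```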