Good morning,
I’m developing an Active Learning method that aims to learn the mean entropy of a dynamic system over its iterations. During the training phase, the method splits each batch into two halves: the first half is composed of labelled observations and the second half of unlabelled observations.
With this setting, the entropy of the labelled observations will be very close to zero, since their labels are already known, whereas the mean entropy of the unlabelled observations will be greater than zero.
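To make the setting concrete, here is a minimal sketch of how such per-sample entropy targets could be built (only an illustration: it assumes the entropy is the Shannon entropy of the classifier’s softmax output and that the labelled half is simply given a target of zero, as the y_true dump further below suggests):
import torch
import torch.nn.functional as F

def entropy_targets(logits, labelled_mask):
    # Shannon entropy of the softmax distribution, one value per sample.
    probs = F.softmax(logits, dim=1)
    entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=1)
    # Labelled observations get a zero target, since their labels are already known.
    return torch.where(labelled_mask, torch.zeros_like(entropy), entropy)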
As backbone I’m using ResNet18, and I’m trying to build a module that takes the richest features from the ResNet18 and uses them to predict the mean entropy. This module is inspired by Learning Loss for Active Learning. I’ve tried different approaches: the same architecture as the paper, an LSTM that generates a second latent space whose embeddings are then combined into a single tensor that outputs a single number, and others; a rough sketch of the paper-style variant is shown below.
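As an illustration of the paper-style variant (the EntropyPredictor name and the layer sizes below are only an example, not my exact code): each intermediate ResNet18 feature map is globally average-pooled, projected with a small FC layer, and the projections are concatenated and regressed to one entropy value per sample.
import torch
import torch.nn as nn

class EntropyPredictor(nn.Module):
    # LossNet-style head: one branch per ResNet18 stage (GAP + FC),
    # concatenated and mapped to a single scalar per sample.
    def __init__(self, feature_channels=(64, 128, 256, 512), hidden_dim=128):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.branches = nn.ModuleList([nn.Linear(c, hidden_dim) for c in feature_channels])
        self.out = nn.Linear(hidden_dim * len(feature_channels), 1)

    def forward(self, features):
        # features: list of the four intermediate ResNet18 feature maps, each [B, C, H, W].
        embs = [torch.relu(branch(self.gap(f).flatten(1)))
                for f, branch in zip(features, self.branches)]
        return self.out(torch.cat(embs, dim=1)).squeeze(1)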
My current loss is the following:
(pred_entr, true_entr), labelled_mask = module_out
# element-wise MSE between predicted and target entropy
# (self.mse_loss_fn is assumed to use reduction='none', so it can be masked below)
entr_loss = weight * self.mse_loss_fn(pred_entr, true_entr.detach())
# cross-entropy is averaged only over the labelled half of the batch
lab_ce_loss = torch.mean(ce_loss[labelled_mask])
# entropy loss averaged separately over the labelled and unlabelled halves
entr_loss = torch.mean(entr_loss[labelled_mask]) + torch.mean(entr_loss[~labelled_mask])
loss = lab_ce_loss + entr_loss
tot_loss_ce += lab_ce_loss.item()
tot_pred_loss += entr_loss.item()
return loss, tot_loss_ce, tot_pred_loss
The problem is that every module I’ve tried predicts entropies close to the overall true mean entropy, so it does not generalize between labelled and unlabelled observations. A clear example can be seen in the following batch:
y_pred tensor([0.4383, 0.4997, 0.5297, 0.5567, 0.5844, 0.5976, 0.6143, 0.6490, 0.6997,
0.7140, 0.6844, 0.7030, 0.6516, 0.6449, 0.6421, 0.6884, 0.6199, 0.6346,
0.6184, 0.5991, 0.6045, 0.6100, 0.6266, 0.6036, 0.6032, 0.6007, 0.6126,
0.6221, 0.6658, 0.6119, 0.6137, 0.5963, 0.6037, 0.6094, 0.6132, 0.6483,
0.6149, 0.6218, 0.6320, 0.6341, 0.6307, 0.6157, 0.6513, 0.6634, 0.6167,
0.5982, 0.5949, 0.5960, 0.6221, 0.6287, 0.6378, 0.6195, 0.6178, 0.5969,
0.6011, 0.5924, 0.6073, 0.6176, 0.6125, 0.6129, 0.6177, 0.6359, 0.6327,
0.6091, 0.6178, 0.6397, 0.6152, 0.6106, 0.6272, 0.6018, 0.6046, 0.6060,
0.5904, 0.5992, 0.6061, 0.6079, 0.6098, 0.6530, 0.6545, 0.6113, 0.6237,
0.6131, 0.6421, 0.6377, 0.6378, 0.6392, 0.6138, 0.6220, 0.6098, 0.6130,
0.6118, 0.6167, 0.5994, 0.6138, 0.6180, 0.6261, 0.6103, 0.6293, 0.6488,
0.6666, 0.6345, 0.6225, 0.6291, 0.6060, 0.5960, 0.6283, 0.6275, 0.6036,
0.6108, 0.6043, 0.6064, 0.6088, 0.6560, 0.6395, 0.6434, 0.6546, 0.7354,
0.6438, 0.6504, 0.6272, 0.6338, 0.6424, 0.6178, 0.6097, 0.5981, 0.6070,
0.6295, 0.6453, 0.6343, 0.6856, 0.6389, 0.6356, 0.6166, 0.6462, 0.6704,
0.6851, 0.7265, 0.6482, 0.6454, 0.6360, 0.6221, 0.6273, 0.6177, 0.6150,
0.6009, 0.6140, 0.6209, 0.6712, 0.6249, 0.6098, 0.5918, 0.5963, 0.6028,
0.6019, 0.6139, 0.6120, 0.5944, 0.5988, 0.6067, 0.6106, 0.6692, 0.6257,
0.6236, 0.6204, 0.6108, 0.6558, 0.6263, 0.6523, 0.6191, 0.6146, 0.6136,
0.6238, 0.7082, 0.6636, 0.6276, 0.6271, 0.6619, 0.6508, 0.7036, 0.6556,
0.6340, 0.6269, 0.6136, 0.5972, 0.6045, 0.6103, 0.6200, 0.6100, 0.6551,
0.6407, 0.6725, 0.6385, 0.6417, 0.6267, 0.6088, 0.6029, 0.6109, 0.6355,
0.6150, 0.6205, 0.6508, 0.7106, 0.6621, 0.7171, 0.6297, 0.6142, 0.6284,
0.6340, 0.6618, 0.6339, 0.6718, 0.6218, 0.6060, 0.6094, 0.6534, 0.6184,
0.6118, 0.6456, 0.6380, 0.6315, 0.6734, 0.6129, 0.6042, 0.5906, 0.5916,
0.6081, 0.6058, 0.5971, 0.6152, 0.5887, 0.6081, 0.6193, 0.6110, 0.6111,
0.6084, 0.6112, 0.6075, 0.6289, 0.6221, 0.6203, 0.6276, 0.6203, 0.6225,
0.6078, 0.6159, 0.6229, 0.6239, 0.6372, 0.6356, 0.6639, 0.6381, 0.6365,
0.5786, 0.5670, 0.5103, 0.4311], device='cuda:0',
grad_fn=<SqueezeBackward0>)
y_true tensor([0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.8243, 0.7065, 0.7531, 0.9800, 0.6469, 0.9742, 0.5436,
1.2911, 0.9685, 0.8204, 0.6792, 0.8359, 0.5485, 1.3371, 0.6207, 1.1042,
1.2025, 0.5920, 0.8958, 0.9368, 0.9335, 1.0049, 0.7750, 0.7027, 0.6027,
0.5827, 0.9244, 1.2802, 0.7299, 0.6033, 0.6144, 0.8079, 1.5958, 1.0761,
1.5286, 1.6727, 0.6934, 0.8077, 0.7565, 1.3664, 1.0515, 0.7356, 0.7373,
0.8182, 0.7980, 1.9802, 0.9082, 0.8766, 1.6740, 1.2815, 0.7533, 1.2258,
1.3465, 1.2098, 1.2518, 1.3732, 1.4641, 1.9443, 1.5875, 1.0296, 1.2009,
1.1288, 0.5325, 1.0344, 0.7282, 1.2411, 1.0779, 0.8402, 0.8045, 1.0907,
0.6501, 0.9170, 1.2371, 0.7123, 1.7213, 0.6947, 0.7398, 0.7431, 0.8899,
0.9332, 1.0017, 1.1099, 0.7878, 0.9999, 0.6468, 0.7280, 0.9293, 1.0704,
0.7032, 0.5618, 0.9512, 1.0677, 0.6812, 1.3088, 0.8975, 0.7438, 0.8411,
0.5919, 0.5831, 0.9483, 1.1647, 0.8835, 1.0689, 0.7739, 0.6113, 1.6369,
0.7503, 0.9204, 0.9799, 1.0268, 1.1436, 1.0930, 1.2871, 0.5525, 0.9084,
1.1533, 0.7452, 1.1973, 0.7013, 1.3293, 1.2243, 1.8311, 0.6449, 0.5948,
0.7128, 0.9290, 0.6618, 0.7316], device='cuda:0')
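Averaging this batch per group makes the collapse evident: the mean prediction is nearly identical for the two halves, while the target means are very different. A minimal sketch of that check (pred_entr, true_entr and labelled_mask are the same tensors used in the loss above):
import torch

def entropy_by_group(pred_entr, true_entr, labelled_mask):
    # Compare mean predicted vs target entropy for the labelled and unlabelled halves.
    with torch.no_grad():
        for name, mask in (('labelled', labelled_mask), ('unlabelled', ~labelled_mask)):
            print(name, 'pred', pred_entr[mask].mean().item(), 'true', true_entr[mask].mean().item())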
This behaviour persists throughout the whole training phase, resulting in poor performance at test time.
However, when trying to overfit a single batch, the model memorizes the entropy pattern within a few epochs.