I’m trying to predict a multilabel, multiclass output from a series of image features (panoramic images passed through ResNet-152, giving a tensor of `BATCH_SIZE x NUM_IMAGES x 36 (NUM_VIEWS) x 2048 (FEATURE_SIZE)`). The output of the model is `BATCH_SIZE x NUM_CLASSES` (~3000 classes). Note that there is a heavy class imbalance. One series of images may contain multiple objects, so I used BCE loss (with logits).
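As an aside on the class imbalance: `BCEWithLogitsLoss` accepts a per-class `pos_weight` tensor, and one common recipe is the negative/positive count ratio per class. A minimal sketch with a hypothetical 4-sample, 3-class multi-hot label matrix (the numbers here are made up for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical multi-hot labels: N samples x C classes
labels = torch.tensor([[1., 0., 1.],
                       [0., 0., 1.],
                       [1., 0., 1.],
                       [0., 0., 1.]])

pos_counts = labels.sum(dim=0)                     # positives per class
neg_counts = labels.shape[0] - pos_counts          # negatives per class
pos_weight = neg_counts / pos_counts.clamp(min=1)  # clamp avoids div-by-zero

# Rare positives now get a proportionally larger loss contribution
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
```

`pos_weight` must have one entry per class (shape `(NUM_CLASSES,)`); it multiplies only the positive term of the loss.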
CV is not my specialty, and the model below may be part of the problem. The assumption is that one particular image in the series contributes to the output vector, which is why I used Conv1d (I wasn’t sure how to process only one set of panoramic images at a time), and the per-image outputs (each of size `NUM_CLASSES`) are max-pooled together.
```python
class LandMarkPredictionModule(nn.Module):
    def __init__(self, num_classes, num_views=36, kernel_size=4):
        super(LandMarkPredictionModule, self).__init__()
        self.num_views = num_views
        self.kernel_size = kernel_size
        self.num_classes = num_classes
        self.conv = torch.nn.Conv1d(in_channels=self.num_views, out_channels=1,
                                    kernel_size=self.kernel_size)
        # How to calculate this value (2045) automatically?
        self.fc = nn.Linear(in_features=2045, out_features=self.num_classes)
        self.act_func = nn.Sigmoid()

    def forward(self, images):
        batch_size, length, num_views, feat_size = images.shape
        reshaped_images = images.reshape(batch_size * length, num_views, feat_size)
        convolved_images = self.conv(reshaped_images).squeeze(1)
        output = self.act_func(self.fc(convolved_images))
        output = output.reshape(batch_size, length, self.num_classes)
        # max-pool over the image sequence; .values keeps the tensor, not the indices
        return output.max(dim=1).values
```
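On the "how to calculate this value automatically" comment: with stride 1, no padding, and no dilation, a Conv1d shrinks the last dimension to `L - kernel_size + 1`, which is where `2048 - 4 + 1 = 2045` comes from. A small helper mirroring the general output-length formula from the PyTorch Conv1d documentation:

```python
def conv1d_out_len(l_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a Conv1d, per the PyTorch shape formula."""
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

print(conv1d_out_len(2048, 4))  # 2045
```

Computing `in_features` this way (instead of hard-coding 2045) keeps the linear layer consistent if `kernel_size` or `FEATURE_SIZE` changes.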
The loss stalls out at ~0.69 (≈ ln 2, the BCE value when every prediction sits at 0.5), even when I try to overfit just one data point. Below is some example code and the data used.
```python
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

pred_module = LandMarkPredictionModule(len(landmark_classes))
criterion = nn.BCEWithLogitsLoss()  # USE BCE LOSS; TODO: manage class imbalances
optimizer = optim.Adam(pred_module.parameters(), lr=1e-3)

NUM_EPOCHS = 100
tst_images = torch.from_numpy(np.stack((image_class_pairs, image_class_pairs), axis=0))
tst_labels = torch.from_numpy(np.stack((image_class_pairs, image_class_pairs), axis=0))

for epoch in range(NUM_EPOCHS):
    optimizer.zero_grad()
    logits = pred_module(tst_images)
    loss = criterion(logits, tst_labels)
    loss.backward()
    print(loss.item())
    optimizer.step()
```
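One diagnostic I know of for a stuck loss is to print per-parameter gradient norms after `backward()`; near-zero norms on early layers point at squashed gradients (for instance, a `Sigmoid` in `forward()` feeding a loss that applies its own sigmoid). A self-contained sketch with a tiny stand-in model (in practice one would iterate over `pred_module.named_parameters()` inside the loop above):

```python
import torch
import torch.nn as nn

# Stand-in model just to demonstrate the check
model = nn.Sequential(nn.Linear(8, 4))
criterion = nn.BCEWithLogitsLoss()

x = torch.randn(2, 8)
y = torch.randint(0, 2, (2, 4)).float()

loss = criterion(model(x), y)
loss.backward()

# Collect gradient norms; tiny values on early layers suggest saturation
grad_norms = {name: p.grad.norm().item() for name, p in model.named_parameters()}
for name, norm in grad_norms.items():
    print(name, norm)
```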
The sizes of the inputs and outputs above are:
I’m in the process of experimenting with the model, but beyond that, what else could cause the loss to stay stuck (data issues, a bug in the code)?