I have a computer vision problem, but it’s probably relevant to other deep learning models.
Sometimes my model gets stuck at a high training loss right from the start and never improves from there. If I re-train the model several times, eventually one of the runs learns. I assume this has to do with bad initialization of the weights, but I was wondering what I should try to address this.
I have a small dataset (only ~100 positive examples). The problem is binary classification.
For reference, here is the model:
import torch.nn as nn
import torch.nn.functional as F


class CNN(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, padding=1)
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, padding=1)
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        # Input to the classifier is input_dim / 4 because of the 2 max-pool layers with kernel_size=2
        final_dim = int(input_dim / 4)
        self.classifier = nn.Sequential(
            nn.Linear(in_features=64 * final_dim * final_dim, out_features=128),
            nn.ReLU(),
            nn.Dropout(),
            nn.Linear(in_features=128, out_features=2),
        )

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = self.pool1(x)
        # Functional dropout needs training=self.training, or it also drops at eval time
        x = F.dropout(x, p=0.25, training=self.training)
        x = F.relu(self.conv3(x))
        x = F.relu(self.conv4(x))
        x = self.pool2(x)
        x = F.dropout(x, p=0.25, training=self.training)
        x = x.reshape(x.shape[0], -1)  # flatten all but the batch dimension
        x = self.classifier(x)
        x = F.softmax(x, dim=-1)
        return x
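Since the question points at initialization, here is a minimal sketch of fixing the random seed and applying an explicit Kaiming (He) initialization, which makes individual runs reproducible so a failing seed can be distinguished from a failing architecture. The `init_weights` helper and the toy model below are illustrative, not part of the original code.

```python
import torch
import torch.nn as nn


def init_weights(module):
    # Kaiming initialization is a common choice for ReLU networks; zero the biases.
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)


torch.manual_seed(0)  # fix the seed so a run that learns can be reproduced

# Hypothetical small model just to demonstrate applying the initializer
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 2),
)
model.apply(init_weights)  # recursively applies init_weights to every submodule
```

`Module.apply` walks the module tree, so the same helper covers both the convolutional and linear layers without listing them by name.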
Data is normalized to mean 0.5 and std 0.5; before normalization the values were in the range [0, 1].
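For concreteness, that normalization is equivalent to `(x - 0.5) / 0.5`, which maps the [0, 1] inputs into [-1, 1]; a sketch on a random tensor (the shape is arbitrary):

```python
import torch

x = torch.rand(3, 32, 32)     # simulated image with values in [0, 1)
x_norm = (x - 0.5) / 0.5      # normalize with mean 0.5, std 0.5 -> values in [-1, 1)
```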