Hi everyone! I’m fairly new to deep learning and I’m trying to build an image classification model. Currently, there are four classes of military aircraft (F15, F16, F18 and F35) with roughly 1300-1500 images each. The dataset was downloaded from here: Military Aircraft Detection Dataset | Kaggle (I only took the images from the ‘crop’ folder). I split each class’s images into an 80% training set, a 10% validation set and a 10% test set.
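For reference, a per-class 80/10/10 split along these lines could be sketched as follows. This is only an illustration: the function name `split_paths`, the seed and the file names are hypothetical and not taken from my actual script.

```python
import random

def split_paths(paths, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle one class's file paths and cut them into train/val/test lists."""
    paths = sorted(paths)               # fixed starting order so the seed is reproducible
    random.Random(seed).shuffle(paths)  # local RNG, leaves the global random state alone
    n_train = int(len(paths) * train_frac)
    n_val = int(len(paths) * val_frac)
    train = paths[:n_train]
    val = paths[n_train:n_train + n_val]
    test = paths[n_train + n_val:]      # remainder, so no image is dropped
    return train, val, test

# Hypothetical file names standing in for one class's cropped images.
f16_paths = [f"F16/img_{i:04d}.jpg" for i in range(1400)]
train, val, test = split_paths(f16_paths)
print(len(train), len(val), len(test))
```

Splitting each class separately keeps the class proportions the same across the three sets.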
The only transforms I applied were Resize (to 128 * 128) and ToTensor. I’m using the following architecture for the model:
from torch import nn

class AircraftVision(nn.Module):
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
        super().__init__()
        # Each block: 3x3 conv (padding=1 keeps H and W), ReLU,
        # 2x2 max-pool (halves H and W), then dropout.
        self.conv_block_1 = nn.Sequential(
            nn.Conv2d(in_channels=input_shape,
                      out_channels=hidden_units,
                      kernel_size=3,
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Dropout(p=0.5)
        )
        self.conv_block_2 = nn.Sequential(
            nn.Conv2d(in_channels=hidden_units,
                      out_channels=output_shape,
                      kernel_size=3,
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Dropout(p=0.5)
        )
        self.conv_block_3 = nn.Sequential(
            nn.Conv2d(in_channels=output_shape,
                      out_channels=128,
                      kernel_size=3,
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Dropout(p=0.5)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            # conv_block_3 emits 128 channels; a 128x128 input pooled three times is 16x16
            nn.Linear(in_features=128*16*16,
                      out_features=256),
            nn.ReLU(),
            # class_names is a list defined elsewhere in my notebook
            nn.Linear(in_features=256, out_features=len(class_names))
        )

    def forward(self, x):
        x = self.conv_block_1(x)
        x = self.conv_block_2(x)
        x = self.conv_block_3(x)
        x = self.classifier(x)
        return x
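As a sanity check on the classifier's input size, the spatial dimensions can be traced by hand through the three blocks. The sketch below uses the standard output-size formula for convolutions and pooling without dilation. The helper names `conv2d_out` and `maxpool2d_out` are made up for this illustration, and it assumes 128 x 128 inputs as produced by the Resize transform.

```python
def conv2d_out(size, kernel_size=3, padding=1, stride=1):
    # Output-size formula for a convolution without dilation.
    return (size + 2 * padding - kernel_size) // stride + 1

def maxpool2d_out(size, kernel_size=2, stride=2):
    # Same formula with zero padding, which is the nn.MaxPool2d default.
    return (size - kernel_size) // stride + 1

side = 128  # input height and width after Resize
for block in range(1, 4):
    side = maxpool2d_out(conv2d_out(side))
    print(f"after conv_block_{block}: {side} x {side}")

last_channels = 128  # out_channels of conv_block_3
flat_features = last_channels * side * side
print(flat_features)
```

The value passed as in_features to the first nn.Linear has to equal this number, otherwise the forward pass fails with a shape mismatch.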
The loss function is CrossEntropyLoss and the optimizer is Adam (learning rate = 0.001, weight decay = 1e-5). The number of epochs is 30.
After several attempts at training the model, I always get a pretty high training accuracy (~95%) and a low training loss. However, the validation loss falls only slightly at first (with the validation accuracy rising), and from around epoch 15 the validation accuracy fluctuates around 60%-65% while the validation loss rises slightly but steadily.
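To pin down where the validation loss bottoms out, the per-epoch history can be scanned for its minimum, together with a patience rule for when to stop. Below is a minimal sketch. The function `best_epoch_and_stop` is hypothetical, and the loss values are invented to mimic the shape described above, they are not measurements from this model.

```python
def best_epoch_and_stop(val_losses, patience=5):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    stop_epoch is the first epoch at which the validation loss has failed to
    improve for `patience` consecutive epochs, or None if that never happens.
    """
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, None

# Invented losses that fall, bottom out, then creep upward.
illustrative = [1.30, 1.15, 1.05, 0.98, 0.95, 0.97, 0.99, 1.02, 1.04, 1.07, 1.10]
print(best_epoch_and_stop(illustrative, patience=3))
```

Keeping the weights from the best epoch, rather than the last one, is one common way to limit the damage from the later rise in validation loss. It does not address why the gap to the training accuracy is so large in the first place.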
As you can see, I already tried adding dropout layers and weight decay. I’ve played around with the learning rate, weight decay, kernel_size and padding in the convolutional layers and a few other minor changes to the architecture.
I have several suspicions why the model is overfitting so much (in no particular order):
- There’s something wrong with my architecture.
- The data I’m using is not good enough, as some of the images are of pretty low quality.
- The training/validation loop I’m using (slightly modified from one of Sebastian Raschka’s books) is wrong. Here it is:
import torch

# loss_fn, optimizer and device are defined at module level in my notebook.
def train_val(model, num_epochs, train_dl, val_dl):
    loss_hist_train = [0] * num_epochs
    accuracy_hist_train = [0] * num_epochs
    loss_hist_valid = [0] * num_epochs
    accuracy_hist_valid = [0] * num_epochs
    for epoch in range(num_epochs):
        model.train()  # enables dropout
        for x_batch, y_batch in train_dl:
            x_batch, y_batch = x_batch.to(device), y_batch.to(device)
            pred = model(x_batch)
            loss = loss_fn(pred, y_batch)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            # Weight the batch-mean loss by the batch size so the epoch value is a per-sample mean
            loss_hist_train[epoch] += loss.item()*y_batch.size(0)
            is_correct = (torch.argmax(pred, dim=1) == y_batch).float()
            # .item() stores plain Python floats instead of tensors on the device
            accuracy_hist_train[epoch] += is_correct.sum().item()
        loss_hist_train[epoch] /= len(train_dl.dataset)
        accuracy_hist_train[epoch] /= len(train_dl.dataset)
        model.eval()  # disables dropout
        with torch.no_grad():
            for x_batch, y_batch in val_dl:
                x_batch, y_batch = x_batch.to(device), y_batch.to(device)
                pred = model(x_batch)
                loss = loss_fn(pred, y_batch)
                loss_hist_valid[epoch] += loss.item()*y_batch.size(0)
                is_correct = (torch.argmax(pred, dim=1) == y_batch).float()
                accuracy_hist_valid[epoch] += is_correct.sum().item()
        loss_hist_valid[epoch] /= len(val_dl.dataset)
        accuracy_hist_valid[epoch] /= len(val_dl.dataset)
        print(f"Epoch: {epoch+1} | Train loss: {loss_hist_train[epoch]:.3f} | Train accuracy: {accuracy_hist_train[epoch]:.3f} | Validation loss: {loss_hist_valid[epoch]:.3f} | Validation accuracy: {accuracy_hist_valid[epoch]:.3f}")
    return loss_hist_train, loss_hist_valid, accuracy_hist_train, accuracy_hist_valid
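The loop multiplies each batch loss by the batch size before dividing by the dataset size. Assuming the loss function averages over the batch (the default reduction for CrossEntropyLoss), this gives a per-sample mean even when the last batch is smaller. A small sketch with made-up batch losses and sizes shows the difference from simply averaging the batch means:

```python
# Hypothetical per-batch mean losses and batch sizes; the last batch is smaller.
batch_losses = [0.9, 0.7, 0.4]
batch_sizes = [32, 32, 8]

# What the loop does: weight each batch mean by its size, divide by the dataset size.
weighted = sum(l * n for l, n in zip(batch_losses, batch_sizes)) / sum(batch_sizes)

# Naive alternative: average the batch means, which over-weights the small last batch.
naive = sum(batch_losses) / len(batch_losses)

print(f"weighted: {weighted:.4f}, naive: {naive:.4f}")
```

So the bookkeeping for the reported loss and accuracy looks consistent to me. If the loop is at fault, I would expect the problem to be elsewhere.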
Sorry if this is a bit too much, but I’ve been working on this model for several days now and this overfitting is really starting to bug me. Does anyone have any advice on how to deal with this issue? Many thanks in advance!