Custom small CNN has better accuracy than the pretrained classifiers

I have a dataset of 300x300 laser-welding images with two classes: good and bad weld seams. I followed the PyTorch fine-tuning tutorial to train an Inception-v3 classifier.

I also built a custom CNN with 3 conv layers and 3 fully connected layers. With fine-tuning, the validation accuracy varies a lot: I see a different maximum accuracy every time I train the model. On top of that, the fine-tuned model is considerably less accurate than my custom CNN. For example, on my synthetic images from a GAN, Inception-v3 reaches 86% accuracy while my custom CNN reaches 94%. On the real data both networks behave similarly, but the custom CNN is still about 2% more accurate.
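To separate random run-to-run variation from a real difference between the two models, it may help to fix the random seeds before each run. This is only a minimal sketch: `seed_everything` is a hypothetical helper name, and fixed seeds do not guarantee bit-identical results on every GPU.

```python
import random

import torch


def seed_everything(seed: int) -> None:
    """Seed the Python and PyTorch random number generators."""
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
    # Ask cuDNN for deterministic kernels; this can slow training down.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


# Re-seeding reproduces the same random draws.
seed_everything(0)
first = torch.rand(3)
seed_everything(0)
second = torch.rand(3)
print(torch.equal(first, second))
```

Training each model with several different seeds and comparing the spread of the best validation accuracy gives a fairer picture than a single run per model.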

I trained with training sets of 200, 500 and 1000 images, split evenly between the classes (for example, 200 images means 100 good and 100 bad). My train_loader also includes a resize transform to 224; in the fine-tuning tutorial the images are resized to 299 automatically for Inception-v3. The validation set, both its size and its contents, is the same for every trial.

Do you know what causes this behavior? Is it because my dataset is so different from the classes the pretrained model was trained on? Shouldn't fine-tuning give better results?

My custom CNN:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.conv3 = nn.Conv2d(16, 24, 5)
        # One stateless pooling layer, reused after every conv layer.
        self.pool = nn.MaxPool2d(2, 2)
        # 13824 = 24 channels * 24 * 24, which assumes 224x224 inputs.
        self.fc1 = nn.Linear(13824, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 2)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        # Flatten to (batch, features) before the fully connected layers.
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        # Return raw logits; nn.CrossEntropyLoss applies log-softmax itself.
        x = self.fc3(x)
        return x


model = Net()
criterion = nn.CrossEntropyLoss()
#optimizer = optim.Adam(model.parameters(), lr=0.001)
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=5e-4)
model.to(device)
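The 13824 input features of fc1 follow from the 224x224 resize. A small sketch of the arithmetic, using the standard output-size formula for unpadded convolutions and pooling (`conv_out` and `flat_features` are helper names made up for this illustration):

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output side length of a conv or pooling layer on a square input."""
    return (size + 2 * padding - kernel) // stride + 1


def flat_features(input_size: int) -> int:
    """Number of features entering fc1 for a square input of the given size."""
    size = input_size
    for _ in range(3):
        size = conv_out(size, kernel=5)            # nn.Conv2d(..., 5)
        size = conv_out(size, kernel=2, stride=2)  # nn.MaxPool2d(2, 2)
    return 24 * size * size  # conv3 has 24 output channels


for side in (224, 299, 300):
    print(side, flat_features(side))
```

With 224x224 inputs this gives 24 * 24 * 24 = 13824, so the network as written cannot take the original 300x300 images or 299x299 inputs without changing fc1.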

It is trained with this loop:

epochs = 15
steps = 0
running_loss = 0
print_every = 10
train_losses, test_losses = [], []
train_acc, test_acc = [], []
for epoch in range(epochs):
    for inputs, labels in trainloader:
        steps += 1
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        # The model returns raw logits, not log-probabilities.
        logits = model(inputs)
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        
        if steps % print_every == 0:
            test_loss = 0
            accuracy = 0
            model.eval()
            with torch.no_grad():
                for inputs, labels in testloader:
                    inputs, labels = inputs.to(device), labels.to(device)
                    logits = model(inputs)
                    batch_loss = criterion(logits, labels)
                    test_loss += batch_loss.item()

                    # The outputs are logits, so the predicted class is the argmax.
                    top_class = logits.argmax(dim=1)
                    equals = top_class == labels
                    accuracy += equals.float().mean().item()

            # running_loss covers the last print_every steps, not a full epoch.
            train_losses.append(running_loss/print_every)
            test_losses.append(test_loss/len(testloader))
            
            #train_acc.append(running_loss/len(trainloader))
            test_acc.append(accuracy/len(testloader))  
            print(f"Epoch {epoch+1}/{epochs}.. "
                  f"Train loss: {running_loss/print_every:.3f}.. "
                  f"Test loss: {test_loss/len(testloader):.3f}.. "
                  f"Test accuracy: {accuracy/len(testloader):.3f}")
            running_loss = 0
            model.train()

How about performance on the real data?

Hi! On the real data both networks show similar behaviour and accuracy, but the custom CNN is about 2% more accurate.

That situation is common; it looks like a little bit of overfitting.
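Whether a gap of a few percent is meaningful also depends on how many validation images there are. A rough sketch using the binomial standard error of an accuracy estimate; the validation size of 200 is a hypothetical value, since the thread does not state the real one.

```python
import math


def accuracy_std_error(accuracy: float, n_samples: int) -> float:
    """Binomial standard error of an accuracy measured on n_samples images."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_samples)


# Hypothetical validation size; replace it with the real number of images.
n_val = 200
for acc in (0.86, 0.94):
    half_width = 1.96 * accuracy_std_error(acc, n_val)  # approx. 95% interval
    print(f"accuracy {acc:.2f} +/- {half_width:.3f} (n={n_val})")
```

If the intervals of the two models overlap, the difference may be within the noise of the validation set rather than a sign that one model is better.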