Cannot improve my accuracy

I cannot change the architecture or the loss function for the NN below so I kinda have to make small improvements here and there and would appreciate all the help.
The NN is a general-purpose NN designed for binary classification.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):   # Could also be nn.Sequential, but then all modules would simply run in order

    def __init__(self, n_x, n_h, n_y):
        super(Net, self).__init__()
        # Each input is a feature vector of dimension 1024.
        self.fc1 = nn.Linear(n_x, n_h)  # Fully connected layer: n_x inputs -> n_h hidden units
        self.fc2 = nn.Linear(n_h, n_y)
        self.dropout = nn.Dropout(p=0.5)

    def forward(self, x):     # Forward propagation; called on every forward pass
        # x is the input that we give to the network.
        x = self.fc1(x) # Pass the input through the first fully connected layer
        x = F.relu(x) # Apply the ReLU activation to the output of the first fc layer
        x = self.fc2(x)
        m = nn.Sigmoid()
        # x = torch.round(m(x))
        x = torch.transpose(m(x), 0, 1)

        # print(x.shape)
        return x
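
import numpy as np
import matplotlib.pyplot as plt
from sklearn.utils import shuffle

def train_net(epochs, batch_size, train_x, train_y, model_size, lr):
  # Assumed shapes: train_x is (n_samples, n_x), train_y is (n_y, n_samples).
  n_x, n_h, n_y = model_size  # assumption: model_size holds the three layer sizes
  model = Net(n_x, n_h, n_y)
  optim = torch.optim.ASGD(model.parameters(), lr=lr, weight_decay=.01)  # lr was hard-coded to 0.005
  loss_function = nn.BCELoss()
  train_losses = []
  accuracy = []
  for epoch in range(epochs):
    # shuffle() needs arrays with the same first dimension, so shuffle the transposed labels.
    train_x, train_y_t = shuffle(train_x, train_y.T)
    train_y = train_y_t.T
    train_loss = []
    count = 0  # correct predictions in this epoch
    for idx in range(0, train_x.shape[0], batch_size):
      batch_x = torch.from_numpy(train_x[idx : idx + batch_size]).float()
      batch_y = torch.from_numpy(train_y[:, idx : idx + batch_size]).float()
      model_output = model(batch_x)
      loss = loss_function(model_output, batch_y)
      # The backward pass and optimizer step were missing from the posted snippet.
      optim.zero_grad()
      loss.backward()
      optim.step()
      train_loss.append(loss.item())
      preds = (model_output > 0.5).float()
      count += (preds == batch_y).sum().item()
    train_losses.append(np.mean(train_loss))
    accuracy.append(count / train_x.shape[0] * 100)
    # Scheduler made it worse
    # scheduler.step(loss.item())
    if epoch % 100 == 1:
      print("Iteration : {}, Training loss: {} ,Accuracy %: {}".format(epoch,np.mean(train_loss),(count/train_x.shape[0])*100))
  plt.plot(train_losses)
  plt.xlabel('epoch')
  plt.title("Learning rate =" + str(lr))
  return model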

I am shuffling the dataset every epoch, but my model is clearly overfitting despite early stopping, shuffling and dropout. I honestly don’t know what else to do or look for. Any suggestions are appreciated.

Can you plot the train/validation curve? Overfitting means your model is doing very well on the training set while not generalizing to the validation set. Plotting a train/validation curve would confirm the claim.

I am not plotting validation yet; I only have a training accuracy of around 100 percent and a test accuracy of .74, but I will plot it.
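One way to get the curve asked for above is to measure loss and accuracy on held-out data once per epoch and store the values next to the training ones. The sketch below is only illustrative: `evaluate` is a hypothetical helper, not part of the original code, and it assumes the model outputs probabilities in the same shape as the labels, as the Net above does.

```python
import torch
import torch.nn as nn

def evaluate(model, x, y, loss_function=None):
    """Return (loss, accuracy in %) of `model` on x, y without updating any weights."""
    if loss_function is None:
        loss_function = nn.BCELoss()
    was_training = model.training
    model.eval()                      # switch off training-only behaviour such as dropout
    with torch.no_grad():             # no gradients are needed for evaluation
        out = model(x)
        loss = loss_function(out, y).item()
        acc = ((out > 0.5).float() == y).float().mean().item() * 100
    model.train(was_training)         # restore the previous mode
    return loss, acc
```

Calling this at the end of each epoch on both the training data and a held-out split gives two lists that can be plotted against the epoch number; a widening gap between them is the usual sign of overfitting.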

What is your dataset size and train/test split? You can try relevant data augmentation techniques to address the overfitting.

You haven’t specified n_h here. Can you check that its value is not too large? If n_h is comparable to n_x, the model may simply memorize the training data and fail to generalize. If the model is overfitting and you don’t have enough data for a validation set, try a smaller n_h. Alternatively, you could use K-fold cross-validation to avoid setting aside a separate validation set. A train/validation loss plot would show you when to stop training to avoid overfitting. Also, you have defined dropout but don’t seem to be using it.
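As a rough illustration of the K-fold suggestion above, the indices for each fold can be generated with plain NumPy. This is a minimal sketch, not code from the thread: `k_fold_indices` is a hypothetical helper, and `sklearn.model_selection.KFold` offers the same functionality if scikit-learn is already in use.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs; every sample lands in exactly one validation fold."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)        # shuffle once, then cut into k folds
    folds = np.array_split(perm, k)          # fold sizes differ by at most one
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

# Example with the 209 training images mentioned in the thread.
for fold, (train_idx, val_idx) in enumerate(k_fold_indices(209, k=5)):
    print(fold, len(train_idx), len(val_idx))
```

Each fold would train a fresh model on `train_idx` and measure it on `val_idx`; averaging the validation scores gives an estimate of generalization without touching the 50 test images.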

I have 209 images for training and 50 for testing. This is the project spec and I can’t change my test size. I can augment, though I am not sure what the most effective way is.

How can I use dropout? I do realize I have defined it, but how do I use it?

Your dataset is very small and makes it quite easy to overfit. Maybe the suggested advice to use data augmentation would help in your case?

Also it seems as if you’re defining nn.Dropout(p=0.5) but not using it during forward?

I am stuck with the size of the dataset. I will be working on augmenting my dataset, but I am not sure how I would do that.

On dropout, how would I use it in forward?

After you apply ReLU you apply the dropout you created in the init. You would use self.dropout(x) after you’ve applied the ReLU. For example with your code:

class Net(nn.Module):
    def __init__(self, n_x, n_h, n_y):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(n_x, n_h)
        self.fc2 = nn.Linear(n_h, n_y)
        self.dropout = nn.Dropout(p=0.5)
        self.sigmoid = nn.Sigmoid()
    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        x = torch.transpose(self.sigmoid(x), 0, 1)

        return x
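
One caveat once dropout is part of forward: `nn.Dropout` only drops units in training mode, so the model should be put in `model.train()` mode for training and `model.eval()` mode before measuring test accuracy. The standalone sketch below shows the difference on a dummy tensor; it uses a bare dropout layer rather than the Net above.

```python
import torch
import torch.nn as nn

dropout = nn.Dropout(p=0.5)
x = torch.ones(1, 1000)

dropout.train()            # training mode: each entry is zeroed with probability p
train_out = dropout(x)     # surviving entries are scaled by 1 / (1 - p) = 2

dropout.eval()             # evaluation mode: dropout passes the input through unchanged
eval_out = dropout(x)

print((train_out == 0).float().mean().item())  # close to 0.5
print(torch.equal(eval_out, x))                # True
```

Leaving the model in training mode while computing test accuracy would keep dropping hidden units at random and make the test numbers look worse than they are.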

Will report back the results ASAP. Currently Loss averages around .7.

I think data augmentation would help a lot in your case. It’s not too difficult to add either, for example you could do something like this:

import torchvision.datasets as datasets
import torchvision.transforms as transforms

my_transforms = transforms.Compose([
    transforms.ColorJitter(brightness=0.5), # random brightness
    transforms.RandomRotation(degrees=45), # max degree 45
    transforms.RandomHorizontalFlip(p=0.5), # flips horizontal with prob 0.5
    transforms.RandomGrayscale(p=0.2), # converts to grayscale with prob 0.2 (keeps channels)
    transforms.ToTensor(), # PIL image -> float tensor; Normalize expects a tensor
    transforms.Normalize(mean=[0.0], std=[1.0]) # MNIST has one channel. Note: this does nothing!
])

my_dataset = datasets.MNIST(root='dataset/', train=True, transform=my_transforms, download=True)

There are a lot more transforms you could use, and you can read more about them in the torchvision documentation. Also, depending on what images you have, certain transformations might not make sense. In our case with the MNIST dataset, for example, RandomHorizontalFlip() or RandomVerticalFlip() would probably not make much sense.

The accuracy improved slightly with dropout implemented, but not by much. Regarding the data augmentation, my data is NumPy vectors. Would I have to load them into tensors first? Sorry, I am not the most ML-savvy and have only begun to learn this stuff. Furthermore, would I append this new “data” to my existing training set?
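For data that is already stored as flattened NumPy vectors, simple augmentations can be done directly in NumPy and the results appended to the training set. The sketch below is one possible approach under stated assumptions: `augment_with_flips` is a hypothetical helper, the original image shape is not given in the thread so it is passed in as `image_shape`, and labels are assumed to have shape (1, n_samples) as in the training loop above. Whether a horizontal flip preserves the label depends on the images.

```python
import numpy as np

def augment_with_flips(train_x, train_y, image_shape):
    """Append horizontally flipped copies of every image to the training set.

    train_x: (n_samples, n_features) array of flattened images
    train_y: (1, n_samples) array of labels
    image_shape: (height, width) or (height, width, channels) of one image (assumed known)
    """
    n = train_x.shape[0]
    images = train_x.reshape((n,) + tuple(image_shape))   # undo the flattening
    flipped = images[:, :, ::-1]                          # reverse the width axis
    flipped_x = flipped.reshape(n, -1)                    # flatten again
    aug_x = np.concatenate([train_x, flipped_x], axis=0)  # originals first, then flips
    aug_y = np.concatenate([train_y, train_y], axis=1)    # flipped copies keep their labels
    return aug_x, aug_y
```

The returned arrays can be passed to the existing training function unchanged, since the conversion to tensors already happens per batch with `torch.from_numpy`.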

Thinking about it, changing the architecture to a Convolutional Neural Network (CNN) might also help it generalize better. The dataset consists of images, where CNNs perform much better. This might be easier to try first, before diving deeper into data augmentation.

I am afraid changing to a CNN is not permitted in this assignment :( . I am working on how to implement data augmentation for my training data.