Mini-batch gradient descent performs poorly

Hey, I’m trying to implement mini-batch gradient descent on the popular Iris dataset, but somehow I can’t get the accuracy of the model above 75-80%. The loss also doesn’t decrease; it’s stuck at around 0.45, even when I set the number of iterations to 10000.
Am I missing something here?

import torch
from torch import nn

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        # 4 input features -> two hidden layers -> 3 class logits
        self.linear_stack = nn.Sequential(
            nn.Linear(4, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 3),
        )

    def forward(self, x):
        logits = self.linear_stack(x)
        return logits

Training loop, batch size = 10. transform_label maps the labels to [0, 1, 2].

lr = 0.01
model = NeuralNetwork()
optim = torch.optim.Adam(model.parameters(), lr=lr)
loss = torch.nn.CrossEntropyLoss()

n_iters = 1000
steps = n_iters // 10  # print progress every 100 epochs
LOSS = []
for epochs in range(n_iters):
    for i, (inputs, labels) in enumerate(train_loader):
        out = model(inputs)
        train_labels = transform_label(labels)
        l = loss(out, train_labels)
        l.backward()
        # update weights
        optim.step()
        optim.zero_grad()
    LOSS.append(l.item())  # records only the last batch loss of each epoch
    if epochs % steps == 0:
        print(f"\n epoch: {int(epochs + steps)}/{n_iters}, loss: {sum(LOSS)/len(LOSS)}")

output:

epoch: 100/1000, loss: 1.0636296272277832
epoch: 400/1000, loss: 0.5142968013338076
epoch: 500/1000, loss: 0.49906910391073867
epoch: 900/1000, loss: 0.4586030915751588
epoch: 1000/1000, loss: 0.4543738731996598

Using your code, I get a loss of 0.069. It will never be exactly 0, since IRIS contains some duplicates, i.e., samples with the same features but different class labels.
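You can verify this yourself with a quick check; here is a small sketch of mine that scans all pairs of samples for identical features with conflicting labels:

import numpy as np
from sklearn.datasets import load_iris

data = load_iris()
X, y = data.data, data.target

# Compare every pair of rows; report identical features with different labels.
for i in range(len(X)):
    for j in range(i + 1, len(X)):
        if np.array_equal(X[i], X[j]) and y[i] != y[j]:
            print(f"rows {i} and {j}: same features, labels {y[i]} vs {y[j]}")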

Are you sure you prepare the dataset and/or the batches correctly? I would guess something is off there. Below is the code I used to create the dataset and loader:

import torch
from torch.utils.data import DataLoader, BatchSampler, RandomSampler
from sklearn.datasets import load_iris

data = load_iris()
X = torch.Tensor(data.data)        # features as float tensor
y = torch.LongTensor(data.target)  # class indices for CrossEntropyLoss

dataset = BaseDataset(X, y)
sampler = BatchSampler(RandomSampler(X), batch_size=10, drop_last=False)
train_loader = DataLoader(dataset, batch_sampler=sampler)
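As a side note, I believe the explicit sampler setup above behaves the same as the simpler built-in shuffling, if you prefer that:

# Equivalent shorthand: DataLoader handles shuffling and batching itself.
train_loader = DataLoader(dataset, batch_size=10, shuffle=True)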

The class BaseDataset is just a basic custom Dataset:

import numpy as np
from torch.utils.data import Dataset

class BaseDataset(Dataset):

    def __init__(self, inputs, targets):
        self.inputs = inputs
        self.targets = targets

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, index):
        # Return only the inputs if no targets are given (e.g., for inference).
        if self.targets is None:
            return np.asarray(self.inputs[index])
        else:
            return np.asarray(self.inputs[index]), np.asarray(self.targets[index])

Thank you for your response, yes indeed I somehow messed up the data loading and preparation. After fixing this issue, I got much better results.
So best practice would be to load the data, split it into x_train/x_test and y_train/y_test, and then wrap them in a dataclass? I hope that’s the correct terminology.

You don’t have to use a Dataset, Sampler, or DataLoader instance. You can do all of this “manually” by iterating over your dataset(s); those classes just make it more convenient. For example, a manual mini-batch loop could look roughly like the sketch below.
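Just a sketch, assuming X and y are the tensors from my earlier post:

import torch

# Manual mini-batching: shuffle the indices once per epoch, then slice.
batch_size = 10
perm = torch.randperm(len(X))
for start in range(0, len(X), batch_size):
    idx = perm[start:start + batch_size]
    inputs, labels = X[idx], y[idx]
    # ... forward pass, loss, backward, optimizer step ...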

The only important things when you prepare your data are that you

  • preserve the connection between the inputs/features and their respective targets/outputs, and
  • ensure the training data and test data have similar distributions (usually achieved by shuffling)

For example, the IRIS dataset is initially sorted by class, so if you pick the first 2/3 as your training data, it will contain only 2 of the classes, while the test data will contain only the third class. That’s obviously not ideal :).
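A common way to avoid this is a shuffled, stratified split. Here is a sketch of mine using sklearn’s train_test_split (which shuffles by default) together with the BaseDataset class from above:

import torch
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

data = load_iris()

# Shuffle and split while keeping the class proportions in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=1/3, stratify=data.target, random_state=42
)

train_dataset = BaseDataset(torch.Tensor(X_train), torch.LongTensor(y_train))
test_dataset = BaseDataset(torch.Tensor(X_test), torch.LongTensor(y_test))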