I am trying to make my NN regression model overfit my data.

I used the following settings:

x is of shape: (12, 132399)

y is of shape: (1, 132399)

The number of training examples m: 132399

The number of features per example: 12

ReLU activation

He initialization

Adam optimization
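One thing worth double-checking given the shapes above: `nn.Linear` expects inputs of shape `(batch, features)`, so an `x` of shape `(12, 132399)` has to be transposed to `(132399, 12)` before it goes into a `DataLoader`. A minimal sketch with random stand-in data (the tensor names and the batch size of 1024 are my assumptions, not taken from the post):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Stand-in arrays with the shapes described above:
# x is (features, m) = (12, 132399), y is (1, 132399).
x = torch.randn(12, 132399)
y = torch.randn(1, 132399)

# nn.Linear expects (batch, features), so transpose to (m, 12) and (m, 1).
x_t = x.T.contiguous()
y_t = y.T.contiguous()

dataset = TensorDataset(x_t, y_t)
trainloader = DataLoader(dataset, batch_size=1024, shuffle=True)

xb, yb = next(iter(trainloader))
print(xb.shape, yb.shape)  # torch.Size([1024, 12]) torch.Size([1024, 1])
```

If the untransposed tensors are fed in, `nn.Linear(12, ...)` will either fail or silently treat each feature row as a sample.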

I have tried different learning rates, numbers of layers, nodes, and epochs, but the model still does not overfit. The best fit I could get for my regression was R^2 = 0.6.

I would appreciate it if you could point out the error in my code:

```python
import torch
import torch.nn as nn
from torch.nn.functional import relu

class NN(nn.Module):
    # Constructor
    def __init__(self, layers):
        super(NN, self).__init__()
        self.hidden = nn.ModuleList()
        for D_in, D_out in zip(layers, layers[1:]):
            linear_transform = nn.Linear(D_in, D_out)
            torch.nn.init.kaiming_uniform_(linear_transform.weight, nonlinearity='relu')
            self.hidden.append(linear_transform)

    # Prediction
    def forward(self, x):
        L = len(self.hidden)
        for l, transform in enumerate(self.hidden):
            # Note: the condition must be `l < L - 1`; with `l < L` it is
            # always true, so ReLU is also applied to the output layer and
            # the model can never predict negative values.
            if l < L - 1:
                x = relu(transform(x))
            else:
                x = transform(x)
        return x
```

```python
def train(model, criterion, trainloader, optimizer, scheduler, epochs=100):
    cost = []
    for epoch in range(epochs):
        total = 0
        for x, y in trainloader:
            optimizer.zero_grad()
            yhat = model(x)
            loss = criterion(yhat, y)
            loss.backward()
            optimizer.step()
            total += loss.item()  # accumulate the loss over the epoch
        scheduler.step()
        cost.append(total)
        print(str(epoch + 1) + ': ' + str(total))
    return cost
```
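A standard way to check whether the training loop itself is at fault is to see if the network can memorize a tiny subset: an overparameterized net should drive the MSE on a few dozen points to nearly zero. A self-contained sketch of that check (the random data and the `nn.Sequential` stand-in for the model above are my assumptions):

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Tiny synthetic subset: if the model cannot memorize 64 points,
# the problem is in the model or data pipeline, not model capacity.
x_small = torch.randn(64, 12)
y_small = torch.randn(64, 1)

model = nn.Sequential(
    nn.Linear(12, 200), nn.ReLU(),
    nn.Linear(200, 200), nn.ReLU(),
    nn.Linear(200, 1),  # no ReLU on the output for regression
)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# Full-batch training on the tiny subset.
for epoch in range(2000):
    optimizer.zero_grad()
    loss = criterion(model(x_small), y_small)
    loss.backward()
    optimizer.step()

print(loss.item())  # should be close to zero if the pipeline is sound
```

If this check passes but the full dataset still plateaus at R^2 = 0.6, the data itself may simply not be more predictable from those 12 features.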

```python
layers = [12, 200, 200, 200, 1]
model = NN(layers)
criterion = nn.MSELoss()
lr = 0.00003
optimizer = optim.Adam(model.parameters(), lr=lr)
milestones = [500, 1000]
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.6, last_epoch=-1)
cost = train(model, criterion, trainloader, optimizer, scheduler, epochs=1000)
```