Model not learning although model parameters are updating

I have been running into this problem recently; after hitting it the first time, I have reproduced it on progressively simpler models and still haven't figured it out, even in the simplest case of a sequential linear model.
The model below is trained on 80000 data points. Each data point is of the form <id, features, output>, where the output is 0/1 and there are 47 features. Testing without any training gives 58269 correct predictions, while after every subsequent epoch the total number of correct predictions is exactly 19710.
I first ran into this with a complex, self-coded RNN model, and I can't find any issue here either, since apart from the changes needed to support my data format, the code is the same as in the PyTorch tutorials. What is the error here?

import numpy as np
import pandas as pd
import torch
import torch.nn.functional as F

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we instantiate two nn.Linear modules and assign them as
        member variables.
        """
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data and we must return
        a Tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Tensors.
        """
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        y_pred = F.sigmoid(y_pred)
        return y_pred

filename = 'Training_dataset_Original.csv'
data = pd.read_csv(filename)
data = np.transpose(np.asarray(data.values))

# N is the number of data points; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = len(data[0]), len(data)-2, 100, 2

# Split the transposed data: row 0 is the id, rows 1-47 are the 47 features,
# and row 48 is the 0/1 label.
xnp = np.transpose(data[1:48]).tolist()
ynp = np.transpose(data[48:49]).tolist()
znp = np.transpose(data[0:1]).tolist()  # ids (not used below)

# Construct our model by instantiating the class defined above
model = TwoLayerNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

# Evaluate the untrained model once (this outer loop runs a single time)
for index in range(1):
    total = 0
    y_pred = model(torch.tensor(xnp))
    for index in range(len(xnp)):
        out = 1
        if (y_pred[index][0] > y_pred[index][1]):
            out = 0
        if (out == ynp[index][0]):
            total += 1

    print('total', total)    

for t in range(100):
    print('t', t)
    for index in range(80000):
        if(index % 10000 == 0):
            print(index)
        x = torch.tensor([xnp[index]], dtype = torch.float)
        y = torch.tensor(ynp[index], dtype = torch.long)
        # Forward pass: Compute predicted y by passing x to the model
        y_pred = model(x)
        # Compute and print loss
        loss = criterion(y_pred, y)
        # print(t, index, loss.item())

        # Zero gradients, perform a backward pass, and update the weights.
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print('testing')
    total = 0
    y_pred = model(torch.tensor(xnp))
    for index in range(len(xnp)):
        out = 1
        if (y_pred[index][0] > y_pred[index][1]):
            out = 0
        if (out == ynp[index][0]):
            total += 1

    print('total', total)

Your criterion nn.CrossEntropyLoss expects raw logits from your model, not probabilities.
Could you remove the F.sigmoid call inside the forward method of your model and try again?
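
For reference, a minimal sketch of that change, assuming the rest of the model stays as posted: nn.CrossEntropyLoss combines log_softmax and NLLLoss internally, so forward should return the raw output of the last linear layer.

    def forward(self, x):
        h_relu = self.linear1(x).clamp(min=0)
        # Return raw logits; CrossEntropyLoss applies log_softmax itself
        return self.linear2(h_relu)

If you want probabilities for inspection, apply softmax outside the loss computation, e.g. probs = F.softmax(model(x), dim=1). If you switch to NLLLoss instead, the model has to return log-probabilities, i.e. F.log_softmax(self.linear2(h_relu), dim=1), since NLLLoss expects log-probabilities as input.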

The initial (untrained, so irrelevant) score now varies and is below 40000 (out of 80000). After training, the result is 60303 after every epoch.
Also, I printed the y_pred emitted for each data point and it always has the value [0.4404, -0.5643], which means it is always predicting 0.
I've tried NLLLoss and a few other loss functions as well; same issue.
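
As a quick sanity check of that reading: with raw logits the predicted class is simply the index of the larger value, so the manual comparison in the test loop is equivalent to an argmax over the outputs. A small sketch (assuming xnp and ynp are defined as above):

logits = torch.tensor([0.4404, -0.5643])
print(torch.argmax(logits).item())  # prints 0, i.e. class 0 is predicted

# The same check over the whole test pass:
# preds = model(torch.tensor(xnp)).argmax(dim=1)
# total = (preds == torch.tensor(ynp, dtype=torch.long).squeeze(1)).sum().item()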