I have been facing this problem recently: after I first hit it, I reproduced it on progressively simpler models, and I still haven't figured it out, even in the simplest case of a sequential linear model.

The model below is trained on 80,000 data points. Each data point has the form <id, features, output>, where output is 0/1 and there are 47 features. Testing before any training gives 58,269 correct predictions, while after every epoch thereafter the total number of correct predictions is exactly 19,710.
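For concreteness, the CSV is assumed to have one id column, 47 feature columns, and one 0/1 label column per row (this layout matches the slicing in the code below). A minimal sketch with synthetic data, just to pin down the shapes:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for Training_dataset_Original.csv:
# column 0 = id, columns 1..47 = features, column 48 = 0/1 label.
rng = np.random.default_rng(0)
n_rows = 5
df = pd.DataFrame(
    np.column_stack([
        np.arange(n_rows),                   # id
        rng.standard_normal((n_rows, 47)),   # 47 features
        rng.integers(0, 2, n_rows),          # binary output
    ])
)
print(df.shape)  # (5, 49): id + 47 features + label
```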

I first ran into this in a complex self-written RNN model, and I can't find any issue here either: apart from the changes needed to support my data input format, the code is the same as in the PyTorch tutorials. What is the error here?

```
import numpy as np
import pandas as pd
import torch
import torch.nn.functional as F


class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we instantiate two nn.Linear modules and assign
        them as member variables.
        """
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data and we must
        return a Tensor of output data. We can use Modules defined in the
        constructor as well as arbitrary operators on Tensors.
        """
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        y_pred = F.sigmoid(y_pred)
        return y_pred


filename = 'Training_dataset_Original.csv'
data = pd.read_csv(filename)
data = np.transpose(np.asarray(data.values))

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = len(data[0]), len(data) - 2, 100, 2

xnp = np.transpose(data[1:48]).tolist()   # features
ynp = np.transpose(data[48:49]).tolist()  # 0/1 labels
znp = np.transpose(data[0:1]).tolist()    # ids

# Construct our model by instantiating the class defined above
model = TwoLayerNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

# Accuracy before any training
total = 0
y_pred = model(torch.tensor(xnp))
for index in range(len(xnp)):
    out = 1
    if y_pred[index][0] > y_pred[index][1]:
        out = 0
    if out == ynp[index][0]:
        total += 1
print('total', total)

for t in range(100):
    print('t', t)
    for index in range(80000):
        if index % 10000 == 0:
            print(index)
        x = torch.tensor([xnp[index]], dtype=torch.float)
        y = torch.tensor(ynp[index], dtype=torch.long)

        # Forward pass: Compute predicted y by passing x to the model
        y_pred = model(x)

        # Compute the loss
        loss = criterion(y_pred, y)

        # Zero gradients, perform a backward pass, and update the weights.
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Accuracy after this epoch
    print('testing')
    total = 0
    y_pred = model(torch.tensor(xnp))
    for index in range(len(xnp)):
        out = 1
        if y_pred[index][0] > y_pred[index][1]:
            out = 0
        if out == ynp[index][0]:
            total += 1
    print('total', total)
```
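Not related to the actual problem, but for what it's worth, the per-row accuracy loop in the test block can be collapsed with `torch.argmax`. A sketch with random tensors standing in for `y_pred` and `ynp`:

```python
import torch

# Stand-ins for the quantities in the question: N x 2 model scores and
# labels shaped like ynp (a list of one-element rows).
y_pred = torch.randn(8, 2)
y_true = torch.randint(0, 2, (8, 1))

# "out = 1 unless column 0 beats column 1" is just an argmax over the columns.
predicted = y_pred.argmax(dim=1)
total = (predicted == y_true.squeeze(1)).sum().item()
print('total', total)
```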