I know this is a common topic and I have done my research, but I can’t figure out the issue, so I’m asking for your help.

I have a simple one-hidden-layer network to predict two balanced classes (0 and 1). This is my setup (an entire epoch is all the batches in trainBatches):

```
import torch
import torch.nn as nn
from torch.autograd import Variable

inputSize = 750
hiddenSize = 50
outputSize = 2
batchSize = 256

model = torch.nn.Sequential(
    torch.nn.Linear(inputSize, hiddenSize),
    torch.nn.ReLU(),
    torch.nn.Linear(hiddenSize, outputSize),
)
lossFn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=lr)

model.train()
for k in trainBatches:  # k is the start index of each batch
    # last column of data holds the labels, the rest are features
    input_ = Variable(torch.FloatTensor(data[k:k+batchSize, :-1]))
    target_ = Variable(torch.FloatTensor(data[k:k+batchSize, -1]))
    output_ = model(input_)
    loss = lossFn(output_, target_.long())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```
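As a sanity check for a loop like this, a common debugging step is to try to overfit a single fixed batch: if the pipeline is wired correctly, the loss should drop well below its initial value. This is just a sketch with the same architecture on synthetic data of my own (random features, with the label derived from the first feature); none of it comes from the original dataset.

```python
import torch

torch.manual_seed(0)
inputSize, hiddenSize, outputSize, batchSize = 750, 50, 2, 64

model = torch.nn.Sequential(
    torch.nn.Linear(inputSize, hiddenSize),
    torch.nn.ReLU(),
    torch.nn.Linear(hiddenSize, outputSize),
)
lossFn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # lr=0.1 is an arbitrary choice

# One fixed batch of random inputs with a learnable rule:
# class is 1 whenever the first feature is positive.
x = torch.randn(batchSize, inputSize)
y = (x[:, 0] > 0).long()

first_loss = None
for step in range(200):
    loss = lossFn(model(x), y)
    if first_loss is None:
        first_loss = loss.item()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# A correctly wired setup should overfit this single batch,
# so the final loss should be far below the initial one.
print(first_loss, loss.item())
```

If the loss refuses to drop even on one memorizable batch, the problem is in the training mechanics rather than the data.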

My input matrix is sparse, but no sparser than in standard one-hot-encoding problems. The target classes are balanced. The output is a pair of raw, unnormalized logits per row:

```
In [9]: model(input_)
Out[9]:
tensor([[ 0.2018, -0.2460],
[ 0.2018, -0.2460],
[ 0.2018, -0.2460],
[ 0.2532, -0.3221],
[ 0.2641, -0.3455],
```
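For reference, this is how I turn those raw logits into predictions when measuring accuracy (the specific logit values below are just the first rows pasted from my output; `CrossEntropyLoss` applies log-softmax internally, so the unnormalized outputs are fed to it directly):

```python
import torch

logits = torch.tensor([[ 0.2018, -0.2460],
                       [ 0.2532, -0.3221],
                       [-0.1000,  0.4000]])

probs = torch.softmax(logits, dim=1)  # only needed if probabilities are wanted
preds = torch.argmax(logits, dim=1)   # predicted class per row
print(preds.tolist())  # [0, 0, 1]
```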

And the target also has the correct structure:

```
In [11]: target_.long()
Out[11]:
tensor([0, 0, 0, 1, 1, 1...])
```

I’ve tried different reductions, learning rates, hidden sizes, and numbers of layers, and the accuracy always stays around 0.5, so something is clearly wrong in the learning stage, independent of the hyperparameters and network structure.
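One check I can think of to rule out the learning stage itself is verifying that gradients actually reach every layer after `backward()`. This is a standalone sketch on random data (not my real dataset), printing the gradient norm of each parameter; all of them should come out nonzero:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(
    torch.nn.Linear(750, 50),
    torch.nn.ReLU(),
    torch.nn.Linear(50, 2),
)
lossFn = torch.nn.CrossEntropyLoss()

x = torch.randn(8, 750)
y = torch.randint(0, 2, (8,))

loss = lossFn(model(x), y)
loss.backward()

# Nonzero gradient norms on every layer mean the backward pass
# reaches all the weights; a zero (or None) here would point at
# a broken graph between the loss and that parameter.
for name, p in model.named_parameters():
    print(name, p.grad.norm().item())
```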

This seems like such a simple example that it should train out of the box. What else can I look into? Are there any mistakes in my setup?

Thanks in advance.