Dear community,

I am working on my first ever NN for binary classification and I have been stuck for two weeks now. It performs very badly and my loss gets stuck at a high level. I have X_train.shape = torch.Size([38201, 129, 39]) and y_train.shape = torch.Size([4927929]). I intend to use 129 rows as a batch (these are the records per patient); 129 is the maximum number of rows, and patients with fewer rows are padded with zero rows. As the data is heavily class-imbalanced I use nn.CrossEntropyLoss with class weights (nn.BCELoss does not take per-class weights).
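
For context, `zeroes` and `ones` in the snippet below are my class counts. A minimal sketch of how they are derived, with a toy label vector standing in for y_train:

```python
import torch
import torch.nn as nn

# toy imbalanced 0/1 label vector standing in for y_train
y = torch.tensor([0] * 90 + [1] * 10)

zeroes = (y == 0).sum().item()  # count of negatives (90 here)
ones = (y == 1).sum().item()    # count of positives (10 here)

# majority class weighted 1.0, minority class by the imbalance ratio
weights = torch.tensor([zeroes / zeroes, zeroes / ones])

criterion = nn.CrossEntropyLoss(weight=weights)
```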

I very much hope that an experienced eye might spot some “silly mistakes” in what I am doing. Thank you in advance.

My network:

```
class Net(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Net, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.activation = nn.Sigmoid()
        self.fc1 = nn.Linear(self.input_size, self.hidden_size)
        self.fc2 = nn.LSTM(self.hidden_size, self.output_size, num_layers=2, batch_first=True)  # returns a tuple

    def forward(self, x):
        h = self.fc1(x)
        h, _ = self.fc2(h)
        h = self.activation(h)
        return h

input_size = X_train.shape[2]        # number of features = 39
hidden_size = round(input_size / 2)  # half of the features, as they are sparse
output_size = 2                      # 2 rather than 1, as the loss function requires this shape
batch_size = 129                     # rows per patient
net = Net(input_size, hidden_size, output_size)
weights = torch.tensor([zeroes/zeroes, zeroes/ones])  # zeroes/ones are the class counts
criterion = nn.CrossEntropyLoss(weight=weights)
optimizer = optim.Adam(net.parameters(), lr=0.0005)
```
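
On the shape comment above: nn.CrossEntropyLoss expects raw logits of shape (N, C) and integer class indices of shape (N,), and it applies log-softmax internally. A small self-contained check with made-up data:

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 2)            # (N, C) raw scores -- no sigmoid/softmax applied
targets = torch.tensor([0, 1, 1, 0])  # (N,) integer class indices, dtype long

loss = nn.CrossEntropyLoss()(logits, targets)

# equivalent by definition: log-softmax followed by NLLLoss
manual = nn.NLLLoss()(torch.log_softmax(logits, dim=1), targets)
```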

TRAINING LOOP:

```
y_train = y_train.long()
y_val = y_val.long()
epochs = 15
train_losses, validation_losses = [], []
for epoch in range(epochs):
    print("\n Epoch [%d] out of %d" % (epoch + 1, epochs))
    running_loss = 0.0
    validation_loss = 0.0
    auc = 0.0
    pr_auc = 0.0
    for phase in ['train', 'validation']:
        if phase == 'train':
            net.train()
        else:
            net.eval()
        if phase == 'train':
            start = 0
            start_y = 0
            for i in range(X_train.shape[0]):
                optimizer.zero_grad()  # zero the gradient buffers so gradients of the previous iteration are not accumulated
                end = start + 1
                X_batch = X_train[start:end]
                start += 1
                # for Cross Entropy Loss the shape of y should be different, thus:
                end_y = start_y + batch_size
                y_batch = y_train[start_y:end_y]
                start_y += batch_size
                # forward + backward + optimize
                outputs = net(X_batch)
                outputs = outputs.view(batch_size, output_size)
                loss = criterion(outputs, y_batch)
                loss.backward()
                optimizer.step()  # does the update
                running_loss += loss.item()
        if phase == 'validation':
            net.eval()
            nan = 0
            with torch.no_grad():
                vx_start, vy_start = 0, 0
                for inputs in range(X_val.shape[0]):
                    vx_end = vx_start + 1
                    vX_batch = X_val[vx_start:vx_end]
                    vx_start += 1
                    # for Cross Entropy Loss the shape of y should be different, thus:
                    vy_end = vy_start + batch_size
                    vy_batch = y_val[vy_start:vy_end]
                    vy_start += batch_size
                    inputs, labels = vX_batch, vy_batch
                    v_output = net(inputs)
                    v_output = v_output.view(batch_size, output_size)
                    v_loss = criterion(v_output, labels)
                    validation_loss += v_loss.item()
    print(f"Training loss: {running_loss/X_train.shape[0]:.3f}.. "
          f"Validation loss: {validation_loss/X_val.shape[0]:.3f}.. ")
    if epoch % 10 == 0:
        out = net(X_val)
        out = out.view(-1, 2)
        out = out[:, 1]
        out = np.where(out > 0.5, 1, 0)
        pr_auc = average_precision_score(y_val, out)
        print(f"PR AUC: {pr_auc:.3f} ")
        # f"Test accuracy: {accuracy/len(testloader):.3f}")
    validation_losses.append(validation_loss/X_val.shape[0])
    train_losses.append(running_loss/X_train.shape[0])
```
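
One note on my PR AUC computation: average_precision_score ranks by score, so passing thresholded 0/1 predictions (as the np.where line above does) gives a different value than passing the continuous scores directly. A small sketch with made-up data:

```python
import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.array([0, 0, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.2, 0.8])  # continuous model scores

ap_scores = average_precision_score(y_true, scores)       # uses the full ranking
ap_binary = average_precision_score(y_true, scores > 0.5) # thresholded first, ranking lost
```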

OUTPUT

Epoch [1] out of 15

Training loss: 0.605… Validation loss: 0.596…

PR AUC: 0.177

Epoch [2] out of 15

Training loss: 0.596… Validation loss: 0.596…

Epoch [3] out of 15

Training loss: 0.596… Validation loss: 0.596…

Epoch [4] out of 15

Training loss: 0.596… Validation loss: 0.596…

Epoch [5] out of 15

Training loss: 0.596… Validation loss: 0.596…

Epoch [6] out of 15

Training loss: 0.596… Validation loss: 0.596…

Epoch [7] out of 15

Training loss: 0.596… Validation loss: 0.596…

Epoch [8] out of 15

Training loss: 0.596… Validation loss: 0.596…

Epoch [9] out of 15

Training loss: 0.596… Validation loss: 0.596…

Epoch [10] out of 15

Training loss: 0.596… Validation loss: 0.596…

Epoch [11] out of 15

Training loss: 0.596… Validation loss: 0.596…

PR AUC: 0.176

Epoch [12] out of 15

Training loss: 0.596… Validation loss: 0.596…

Epoch [13] out of 15

Training loss: 0.596… Validation loss: 0.596…

Epoch [14] out of 15

Training loss: 0.596… Validation loss: 0.596…

Epoch [15] out of 15

Training loss: 0.596… Validation loss: 0.596…

Finished Training

starttime = 2020-09-29 13:12:03.304423

now = 2020-09-29 13:26:43.455491