# Why is my initial loss bigger than expected?

I am trying to perform a simple binary classification with a neural network on the make_moons dataset.

Because of the random initialization, I expect the first predictions to be split evenly between correct and incorrect, i.e. each class gets a probability of about 50%. This would lead to a cross-entropy loss of -ln(0.5) = ln(2) ≈ 0.693, but my initial loss is 1.684.
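As a sanity check on that expected value, here is a minimal sketch in plain Python (the helper name `cross_entropy` is made up for illustration) that evaluates the cross-entropy when both classes receive the same logit, so each gets probability 0.5:

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one sample: -log(softmax(logits)[target])."""
    # Subtract the max logit for numerical stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# Equal logits -> uniform probabilities over the two classes
loss_uniform = cross_entropy([0.0, 0.0], target=1)
print(round(loss_uniform, 3))  # 0.693
```

Random initial weights rarely give exactly equal logits, so some deviation from ln(2) is normal.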

What could it be?

I am using a simple 2-layer PyTorch NN:

``````
import torch.nn as nn


class TorchNet(nn.Module):

    def __init__(self, inp: int, out: int, hid: int, n_layers: int,
                 actf: str = 'relu', recursive: int = None):
        super(TorchNet, self).__init__()

        opt = ['relu', 'sigm', 'tanh']
        err = 'Select a correct activation function from: {}'.format(opt)
        assert actf in opt, err

        self.n_lay = n_layers
        # forward() reads self.recursive, so it has to be set here;
        # None disables the extra passes through the last hidden layer
        self.recursive = recursive

        self.fcInp = nn.Linear(inp, hid, bias=False)
        self.fcHid = nn.ModuleList([nn.Linear(hid, hid, bias=False) for _ in range(self.n_lay)])
        self.fcOut = nn.Linear(hid, out, bias=False)

        if actf == 'relu': self.actf = nn.ReLU(inplace=True)
        if actf == 'sigm': self.actf = nn.Sigmoid()
        if actf == 'tanh': self.actf = nn.Tanh()

    def forward(self, x):

        # Input Layer
        x = self.actf(self.fcInp(x))

        # Hidden Layers
        for l in range(self.n_lay):

            x = self.actf(self.fcHid[l](x))

            # Apply recursivity to the last layer
            if l == self.n_lay - 1 and self.recursive is not None:
                for _ in range(self.recursive):
                    x = self.actf(self.fcHid[l](x))

        # Output Layer (raw logits, as expected by nn.CrossEntropyLoss)
        x = self.fcOut(x)

        return x
``````

Could you check whether all predictions are biased towards one specific class?
I tried your model with some dummy inputs and got a pretty decent loss:

``````
import torch
import torch.nn as nn

# 2 input features, 2 classes, hidden width 2, 2 hidden layers
model = TorchNet(2, 2, 2, 2)
x = torch.randn(100, 2)
target = torch.randint(0, 2, (100,))
criterion = nn.CrossEntropyLoss()

output = model(x)
loss = criterion(output, target)
print(loss)
``````

If your loss is higher, you might want to check your initializations.
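One way to run the class-bias check, sketched with random stand-in logits (the tensor `output` below is an assumption; in practice it would come from `model(x)`), is to count how often each class wins the argmax:

```python
import torch

torch.manual_seed(0)

# Stand-in for model(x): 100 samples, 2 class logits each
output = torch.randn(100, 2)

# Predicted class per sample, then the number of predictions per class
preds = output.argmax(dim=1)
counts = torch.bincount(preds, minlength=2)
print(counts)
```

If nearly all counts fall into one class, the initial predictions are biased, and on a balanced dataset the loss can then sit above ln(2).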

I think you are right: there must be something wrong with the initialization, or maybe something else I cannot figure out. To get more insight, I ran some experiments:

These are the models (3 hidden layers of width 10):
Code of the models is here

``````
modelN = TorchNet('No Activation', inp_dim, n_class, lay_size, n_layers, actf='none', track_stats=True, recursive=0)
modelS = TorchNet('Sigmoid', inp_dim, n_class, lay_size, n_layers, actf='sigm', track_stats=True, recursive=0)
modelT = TorchNet('TanH', inp_dim, n_class, lay_size, n_layers, actf='tanh', track_stats=True, recursive=0)
modelR = TorchNet('ReLU', inp_dim, n_class, lay_size, n_layers, actf='relu', track_stats=True, recursive=0)
``````

The results change between runs, so I guess they are very sensitive to the initialization?
How could I initialize the models properly, to be sure that the problem is somewhere else? I haven't manually coded any specific initialization, so I guess they use the default random one.
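To separate run-to-run variation from other problems, one option is to fix the random seed before building each model, so that the default initialization is reproducible. A minimal sketch using a small stand-in `nn.Sequential` (an assumption, not the `TorchNet` from the post; `build_model` is a hypothetical helper):

```python
import torch
import torch.nn as nn


def build_model(seed: int) -> nn.Module:
    """Build a small network whose default init is fixed by `seed`."""
    torch.manual_seed(seed)
    return nn.Sequential(
        nn.Linear(2, 10, bias=False),
        nn.Tanh(),
        nn.Linear(10, 2, bias=False),
    )


model_a = build_model(seed=0)
model_b = build_model(seed=0)

# Same seed -> identical initial weights
same_init = all(
    torch.equal(p, q)
    for p, q in zip(model_a.parameters(), model_b.parameters())
)
print(same_init)  # True
```

With the seed fixed, any remaining difference between runs has to come from somewhere other than the initial weights, for example from data shuffling.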

I am also leaving the training code here:

``````
import torch


def train_epoch(model, tr_loader, criterion, optimizer, lr, results):

    train_loss = 0
    correct, total = 0, 0

    # Run minibatches from the training dataset
    for i, (X, labels) in enumerate(tr_loader):

        # Reset the gradients accumulated by the previous minibatch
        optimizer.zero_grad()

        # Forward pass
        y_pred = model(X)
        _, preds = torch.max(y_pred.data, 1)

        # Compute loss
        loss = criterion(y_pred, labels)

        # Backward pass
        loss.backward()
        optimizer.step()

        # Collect stats
        train_loss += loss.item()
        model.collect_stats(lr)

        # Accumulate sample and hit counts
        total += y_pred.size(0)
        correct += (preds == labels).sum().item()

    # Mean loss over the i + 1 minibatches.
    # The posted `train_loss / i+1` parses as (train_loss / i) + 1,
    # which inflates the reported loss by about 1.
    lss = round(train_loss / (i + 1), 3)
    acc = round((correct / total) * 100, 2)
    results.train_accy.append(acc)
    results.train_loss.append(lss)
    return lss, acc


# The header of this function was missing from the post;
# the name and signature below are assumed.
def valid_epoch(model, ts_loader, criterion, results):

    valid_loss = 0
    correct, total = 0, 0

    # No gradients are needed for evaluation
    with torch.no_grad():
        for i, (X, labels) in enumerate(ts_loader):

            # Forward pass
            y_pred = model(X)
            _, preds = torch.max(y_pred, 1)

            # Compute loss
            loss = criterion(y_pred, labels)
            valid_loss += loss.item()

            # Accumulate sample and hit counts
            total += y_pred.size(0)
            correct += (preds == labels).sum().item()

    # Same precedence fix as in train_epoch
    lss = round(valid_loss / (i + 1), 3)
    acc = round((correct / total) * 100, 3)
    results.valid_loss.append(lss)
    results.valid_accy.append(acc)
    return lss, acc
``````
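Operator precedence in the loss averaging is worth a close look. In Python, an expression of the form `train_loss / i+1` parses as `(train_loss / i) + 1`, not `train_loss / (i + 1)`. A minimal sketch with made-up per-batch losses near ln(2) (the numbers are illustrative, not taken from the post) shows the size of the effect:

```python
# Hypothetical per-batch losses, all close to ln(2) ~ 0.693
batch_losses = [0.70, 0.69, 0.68, 0.69, 0.70, 0.68, 0.69, 0.70, 0.68, 0.69]

train_loss = sum(batch_losses)
i = len(batch_losses) - 1  # last index produced by enumerate()

as_written = round((train_loss / i+1), 3)   # (train_loss / i) + 1
intended = round(train_loss / (i + 1), 3)   # mean over i + 1 batches

print(as_written, intended)
```

With that expression the reported value is roughly the true mean loss plus 1, which may explain seeing about 1.68 where about 0.69 was expected.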

Then, from the main.py:

``````
models += [modelN, modelS, modelT, modelR]

for model in models:

    r = Results()
    # Keyword arguments: the fourth positional argument of optim.SGD is
    # `dampening`, not `weight_decay`
    optimizer = optim.SGD(model.parameters(), lr=LR, momentum=MOMEMTUM,
                          weight_decay=WEIGHT_DECAY, nesterov=NESTEROV)
    model_no_recursive_params = [model, criterion, optimizer, r]
    train_no_recursive_params = [EPOCHS, LR]
    train(*model_no_recursive_params, *train_no_recursive_params)
    results.append(r)
``````
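Passing hyperparameters to `optim.SGD` positionally is easy to get wrong, because the argument after `momentum` is `dampening` rather than `weight_decay`. A minimal sketch with placeholder values (the constants and the stand-in `nn.Linear` model are assumptions) that inspects what the optimizer actually received:

```python
import torch.nn as nn
import torch.optim as optim

LR, MOMENTUM, WEIGHT_DECAY = 0.01, 0.9, 1e-4  # placeholder values

model = nn.Linear(2, 2, bias=False)  # stand-in model

# Positional: the fourth argument lands in `dampening`
opt_positional = optim.SGD(model.parameters(), LR, MOMENTUM, WEIGHT_DECAY)

# Keyword: each value reaches the intended hyperparameter
opt_keyword = optim.SGD(model.parameters(), lr=LR, momentum=MOMENTUM,
                        weight_decay=WEIGHT_DECAY)

print(opt_positional.defaults['dampening'], opt_positional.defaults['weight_decay'])
print(opt_keyword.defaults['dampening'], opt_keyword.defaults['weight_decay'])
```

Note that Nesterov momentum requires zero dampening, so a positional call with a non-zero fourth argument and `nesterov=True` is rejected by the optimizer.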

``````def weight_init(m):