Hey, I am building a very simple classification model to classify a mixture of Gaussians.

In this case, two bivariate Gaussians: data close to mode one has label 0 and data close to mode two has label 1.

Here is how I generate the training samples:

```
import torch
from torch.distributions.multivariate_normal import MultivariateNormal

# Two tight, well-separated clusters centered at (300, 300) and (200, 200)
m1 = MultivariateNormal(torch.zeros(2) + 300., torch.eye(2) * .01)
m2 = MultivariateNormal(torch.zeros(2) + 200., torch.eye(2) * .01)
x1 = m1.sample((1000,))  # mode 1
x2 = m2.sample((1000,))  # mode 2
c1 = torch.zeros(1000)   # labels for mode 1
c2 = torch.ones(1000)    # labels for mode 2
x = torch.cat([x1, x2], dim=0)
c = torch.cat([c1, c2], dim=0).view(-1, 1)
```

The training samples look like this (scatter plot omitted): two tight, well-separated clusters around (300, 300) and (200, 200).
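
For reference, a minimal sketch of how that plot can be reproduced (matplotlib assumed; this was not part of my original script):

```
import matplotlib.pyplot as plt

# Scatter the two modes in different colors
plt.scatter(x1[:, 0].numpy(), x1[:, 1].numpy(), s=2, label='mode 1 (label 0)')
plt.scatter(x2[:, 0].numpy(), x2[:, 1].numpy(), s=2, label='mode 2 (label 1)')
plt.legend()
plt.show()
```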

Now I construct a simple classifier

```
import torch.nn as nn

class Classifier(nn.Module):
    def __init__(self, num_in_dim=2, num_hidden=100):
        super(Classifier, self).__init__()
        # One hidden layer with ReLU, sigmoid output for binary classification
        self.fc1 = nn.Sequential(
            nn.Linear(num_in_dim, num_hidden),
            nn.ReLU(inplace=True),
            nn.Linear(num_hidden, 1),
            nn.Sigmoid())

    def forward(self, x):
        return self.fc1(x)
```
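
As a quick sanity check (an extra step, not in my original script), a forward pass on the generated data should give sigmoid probabilities of shape (2000, 1):

```
net_check = Classifier()
with torch.no_grad():
    probs = net_check(x)   # values in (0, 1) from the sigmoid
print(probs.shape)         # torch.Size([2000, 1])
```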

Now I set up the training as follows:

```
import torch.optim as optim

net = Classifier()
optimizer = optim.Adam(net.parameters(), lr=1e-3)
criterion = nn.BCELoss()  # binary cross-entropy on the sigmoid outputs
for i in range(100):
    optimizer.zero_grad()
    a = net(x)
    loss = criterion(a, c)
    loss.backward()
    optimizer.step()
    if i % 10 == 0:  # log the loss every 10 iterations
        print(loss.item())
```
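
After training, a small snippet I use to sanity-check the fit (thresholding the sigmoid output at 0.5; again a sketch, not part of the question):

```
with torch.no_grad():
    preds = (net(x) > 0.5).float()             # hard labels from probabilities
    acc = (preds == c).float().mean().item()   # fraction of correct labels
print('train accuracy:', acc)
```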

The weird behaviour I observe is the following (training-loss plot omitted). I train the same network on the same set of batch samples, with only the initial parameters differing; each line in the plot is one initialization of the network. Sometimes the network doesn't learn at all, while other times it works really well. I know there is plenty of optimization theory about local minima and so on, but an example like this seems too trivial for that, and the same thing happens even when I use a linear network.
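
For reference, the comparison across initializations can be reproduced roughly like this (a minimal sketch; the number of runs and the seed values are arbitrary):

```
losses_per_run = []
for seed in range(5):                       # five different initializations
    torch.manual_seed(seed)
    net = Classifier()                      # fresh random parameters
    optimizer = optim.Adam(net.parameters(), lr=1e-3)
    run_losses = []
    for i in range(100):                    # same data, same schedule as above
        optimizer.zero_grad()
        loss = criterion(net(x), c)
        loss.backward()
        optimizer.step()
        run_losses.append(loss.item())
    losses_per_run.append(run_losses)       # one loss curve per initialization
```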