Simple numeric handling DNN

Hi guys,

I'm having a hard time with training and need some help.

I’m trying to make a DNN that simply acts as a one-hot encoder.
So, the input to the DNN is just a normal vector, and the output of the DNN should be the one-hot vector of that input.

I built the DNN from simple linear layers and ReLU, but I’m unable to train my model…
What could be the problem here?

Here is my full code.

import torch
import torch.nn as nn
import torch.optim as optim

class onehot(nn.Module):
    def __init__(self, input_dim):
        super(onehot, self).__init__()
        self.input_dim = input_dim
        self.layers = nn.Sequential(
            nn.Linear(self.input_dim, 32),
            nn.ReLU(inplace=False),
            nn.Linear(32, self.input_dim),
        )
        # initialize every linear layer's weights uniformly
        for m in self.modules():
            if isinstance(m, nn.Linear):
                torch.nn.init.uniform_(m.weight.data)

    def forward(self, x):
        return self.layers(x)

encoder_9 = onehot(9).cuda().train()

criterion = nn.CrossEntropyLoss().cuda()
optimizer = optim.SGD(encoder_9.parameters(), lr=0.001, momentum=0.9)

for epoch in range(10): 
    running_loss = 0.0
    for i, data in enumerate(x_list_c1):
        # get the inputs; data is a list of [inputs, labels]
        inputs = data.cuda()
        labels = y_list_c1[i].cuda().float()

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = encoder_9(inputs)
        logits = nn.functional.softmax(outputs, dim = -1)
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.6f}')
        running_loss = 0.0

print('Finished Training')

training_epoch = 100

My inputs and labels look like this:

and the training result looks like this:

The loss does not decrease… :frowning:

Hi Woosung!

What you have posted doesn’t seem consistent.

CrossEntropyLoss has log_softmax() inside of it, so your call to
softmax() means that softmax() is being applied twice to outputs.

This is an error and could certainly make your model train poorly.
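
For reference, here is a minimal sketch of what I mean, using the names from
your code: pass the raw outputs straight to CrossEntropyLoss and drop the
softmax() call entirely.

outputs = encoder_9(inputs)        # raw logits, no softmax()
loss = criterion(outputs, labels)  # CrossEntropyLoss applies log_softmax() internally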

Furthermore, even if outputs, the prediction of your model, were a
perfect prediction, the extra call to softmax() would make the effective
prediction relatively poor. However, you report loss values on the order of
0.001, which is really quite low, and, based on the code you posted, you can’t
get a loss value anywhere near that low with the incorrect call to softmax().
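
To see this concretely, here is a small, self-contained check (using 9 classes,
as in onehot(9), and integer class labels just for illustration): with the
extra softmax(), even a perfect prediction can’t push the loss below about
log(1 + 8 / e) ≈ 1.37, while the raw logits give a loss near zero.

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
outputs = torch.tensor([[10., 0., 0., 0., 0., 0., 0., 0., 0.]])  # essentially perfect logits for class 0
labels = torch.tensor([0])

print(criterion(outputs, labels).item())                         # ~0.0004
print(criterion(torch.softmax(outputs, dim=-1), labels).item())  # ~1.37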

Try printing out outputs, logits, and loss (as well as labels, just to be
complete), and double-check what’s going on.
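
For instance, something along these lines inside your training loop:

print('outputs:', outputs)
print('logits :', logits)
print('labels :', labels)
print('loss   :', loss.item())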

Aside from this issue, my intuition is that your two-layer model probably
won’t have enough “structure” (whatever that may mean) to do a good
job of performing your desired one-hot computation. (I could be wrong,
though; I’ve never tried it.) Once you get your code working properly, you
might consider experimenting with models that are both wider (larger
internal dimensions) and deeper (more layers).
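
For example, a wider and deeper replacement for self.layers in your onehot
class might look something like this (the hidden size of 128 and the number of
layers are just guesses):

self.layers = nn.Sequential(
    nn.Linear(self.input_dim, 128),
    nn.ReLU(),
    nn.Linear(128, 128),
    nn.ReLU(),
    nn.Linear(128, self.input_dim),
)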

As an aside, please don’t post screenshots of textual information. Doing
so breaks accessibility, searchability, and copy-paste.

Best.

K. Frank