Help for converting sklearn model to PyTorch model

I want to implement a multi-class classifier for categorizing sentences into class labels 0, 1 and 2 using PyTorch. The input to the neural network will be an average of the word embeddings (vectors of 300 dimensions) of all the words that form a sentence. Hence the input size is 300.
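(For reference, this is roughly how I build the sentence vectors; `word_vectors` here is just a placeholder for my embedding lookup, a dict from word to a 300-dim NumPy array.)

import numpy as np

def sentence_embedding(sentence, word_vectors):
    # average the 300-dim word vectors of all words in the sentence
    return np.mean([word_vectors[w] for w in sentence.split()], axis=0)  # shape: (300,)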

This is the sklearn code for the same which I found here: https://github.com/mdvu15/CS488-Senior-Capstone/blob/master/classifierTrain.py

(typing out the code snippet)
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X_train, X_test, y_train, y_test = train_test_split(V, y, test_size=0.25)  # 25% of data set aside for testing

mlp = MLPClassifier(hidden_layer_sizes=(500, 20, 20, 20), max_iter=1000, batch_size=32,
                    warm_start=True, early_stopping=True)  # classifier object

mlp.fit(X_train, y_train)

This is the PyTorch code I have so far:

import numpy as np
import torch

class Linear_Model(torch.nn.Module):

    def __init__(self):
        super().__init__()
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        self.fc1 = torch.nn.Linear(300, 500).to(self.device)
        self.fc2 = torch.nn.Linear(500, 20).to(self.device)
        self.fc3 = torch.nn.Linear(20, 20).to(self.device)
        self.fc4 = torch.nn.Linear(20, 20).to(self.device)
        self.fc5 = torch.nn.Linear(20, 3).to(self.device)

        # activation functions (parameter-free, so no .to(device) needed)
        self.relu = torch.nn.ReLU()
        self.softmax = torch.nn.Softmax(dim=1)

    def forward(self, input, flag):
        fc_out = self.relu(self.fc1(input))
        fc_out = self.relu(self.fc2(fc_out))
        fc_out = self.relu(self.fc3(fc_out))
        fc_out = self.relu(self.fc4(fc_out))
        fc_out = self.softmax(self.fc5(fc_out))

        return fc_out

net = Linear_Model()

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr = 0.001)

# Training

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

for epoch in range(n_epochs):
    train_accuracy = []  # per-batch accuracies for one epoch
    train_losses = []    # per-batch losses for one epoch
    net.train()
    for inputs, labels in trainloader:
        inputs, labels = inputs.to(device), labels.to(device)
        output = net(inputs.float(), 0)
        loss = criterion(output.squeeze(), labels)
        train_losses.append(loss.item())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()  # stochastic gradient descent on mini-batches

        # calculate training accuracy
        pred = torch.argmax(output.squeeze(), dim=1)
        print(pred)
        correct_tensor = pred.eq(labels.view_as(pred))
        correct = np.squeeze(correct_tensor.cpu().numpy())  # .cpu() is a no-op on CPU tensors
        num_correct = np.sum(correct)
        train_accuracy += [num_correct / batch_size]

Is this code correct? If not, could the corrections (and any other suggestions) be pointed out? Thanks!

nn.CrossEntropyLoss expects raw logits, as F.log_softmax will be applied internally, so remove the softmax and rerun the code. :wink:
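I.e. something like this for your model (keeping your signature; only the last line changes):

def forward(self, input, flag):
    fc_out = self.relu(self.fc1(input))
    fc_out = self.relu(self.fc2(fc_out))
    fc_out = self.relu(self.fc3(fc_out))
    fc_out = self.relu(self.fc4(fc_out))
    return self.fc5(fc_out)  # raw logits; nn.CrossEntropyLoss applies log_softmax + NLLLoss internally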


Thank you so much for your reply! I will do that. Although, how do I calculate training accuracy from raw logits? Because the 3 values that are returned can be either positive or negative (I guess).

The outputs will represent logits, so torch.argmax(output, 1) will yield the same prediction as you would get using softmax.
The higher the logit, the higher the probability.
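A quick sketch to see this: softmax is monotonically increasing, so it preserves the ordering of the logits, and argmax picks the same index either way:

import torch

logits = torch.tensor([[ 1.2, -0.3,  0.5],
                       [-2.0,  0.1, -0.7]])
probs = torch.softmax(logits, dim=1)
print(torch.argmax(logits, dim=1))  # tensor([0, 1])
print(torch.argmax(probs, dim=1))   # tensor([0, 1])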

Done. The model is converging much faster now. Thank you so much!