Multithreading slowdown

Hi, when I tried to run two shallow neural networks in two threads within one process, both slowed down about 2x (10 s each) compared to running standalone (5 s). Memory is sufficient, CPU utilization is not saturated when the two jobs run in parallel, and the data they use are dummy data initialized in their own threads. In the same environment I also ran them in two separate processes (a sketch of that variant follows the threaded pseudocode below) and saw the same behavior. What is blocking these jobs? The pseudocode is as follows. Thanks!

import time
from collections import OrderedDict
from threading import Thread

import torch
import torch.nn as nn

inFeatures, hiddenDim, nbClasses = 10, 64, 3  # placeholder sizes
epoch_size = 1000                             # placeholder iteration count

def train():
    # x = ...some random data; dummy tensors stand in for it here
    normed_train_data = torch.randn(256, inFeatures)
    train_target = torch.randn(256, nbClasses)

    model = nn.Sequential(OrderedDict([
        ("hidden_layer_1", nn.Linear(inFeatures, hiddenDim)),
        ("activation_1", nn.ReLU()),
        ("hidden_layer_2", nn.Linear(hiddenDim, hiddenDim)),
        ("activation_2", nn.ReLU()),
        ("output_layer", nn.Linear(hiddenDim, nbClasses)),
    ]))
    optimizer = torch.optim.RMSprop(model.parameters(), lr=0.001)
    criterion = nn.MSELoss()

    start_time = time.time()
    for epoch in range(epoch_size):
        predicted = model(normed_train_data)
        loss = criterion(predicted, train_target)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    end_time = time.time()
    print(end_time - start_time)

threadA = Thread(target=train, args=())
threadB = Thread(target=train, args=())
threadA.start()
threadB.start()
threadA.join()
threadB.join()
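
The two-process run was the same train function launched from separate processes, roughly like this (a minimal sketch assuming multiprocessing.Process; the actual launcher I used may have differed):

from multiprocessing import Process

if __name__ == "__main__":
    # Same train() as above, but each job gets its own interpreter,
    # so the two runs do not share a GIL.
    procA = Process(target=train)
    procB = Process(target=train)
    procA.start()
    procB.start()
    procA.join()
    procB.join()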

Could you provide more code?

Updated… That's basically most of the code. There is some preprocessing of the data (x) using a pandas DataFrame, but I don't think that's the problem since the timing starts after it. Thanks!
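
One variable I haven't ruled out (just a guess on my end) is PyTorch's intra-op thread pool: each op is itself parallelized, so two concurrent trainings could oversubscribe the cores. torch.get_num_threads() / torch.set_num_threads() are the relevant APIs to inspect and pin it; pinning to 1 below is only an illustrative value:

import torch

# How many threads each op uses for intra-op parallelism
print(torch.get_num_threads())

# Pin it so the two jobs don't fight over the same cores (illustrative value)
torch.set_num_threads(1)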