I am currently trying to understand whether the situation I've encountered is normal behaviour or a bug. I run experiments that involve training models in a simulated distributed environment. Without going into unnecessary detail: in each round, clients train on a local train set, evaluate on the local test set, and report the values. The values are then stored in a CSV file together with the models that were tested.
To run a validation check, I fix the seed, load the local model and the local test set, and then perform a test evaluation. What bothers me is that the test values recorded in the CSV file during the simulation are not fully aligned with the test values I obtain afterwards when checking the simulation's validity.
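For context, this is roughly what I mean by "fixing the seed" in the replication script (a sketch; the seed value is arbitrary and whether the CUDA/cuDNN flags matter depends on the setup):

```python
import random
import numpy as np
import torch

SEED = 42  # hypothetical value; I use a fixed constant in both runs

random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)  # seeds the CPU RNG and all CUDA devices

# Flags relevant when running on a GPU with cuDNN:
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
# Stricter option: error out whenever a nondeterministic op is used
# torch.use_deterministic_algorithms(True)
```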
This implies that the same model (with the same weights), tested on the same dataset and with a fixed seed, yields two different results. As the model stabilizes (over N rounds we obtain N different models), the difference between the value reported in the CSV file and the one obtained during replication approaches 0. As an example, I am pasting the log below:
0: Iteration, Abs. Loss Diff.: 1.0015488862991333, Abs. Acc. Diff.: 0.25
1: Iteration, Abs. Loss Diff.: 0.0009263801574705965, Abs. Acc. Diff.: 0.0
2: Iteration, Abs. Loss Diff.: 0.005786736011505145, Abs. Acc. Diff.: 0.0
3: Iteration, Abs. Loss Diff.: 0.003574820756912178, Abs. Acc. Diff.: 0.0
4: Iteration, Abs. Loss Diff.: 0.007152392864227308, Abs. Acc. Diff.: 0.0
5: Iteration, Abs. Loss Diff.: 0.0015836870670318248, Abs. Acc. Diff.: 0.0
6: Iteration, Abs. Loss Diff.: 0.00476664781570435, Abs. Acc. Diff.: 0.0
7: Iteration, Abs. Loss Diff.: 0.003446925878524798, Abs. Acc. Diff.: 0.0
8: Iteration, Abs. Loss Diff.: 0.0017982900142670122, Abs. Acc. Diff.: 0.0
9: Iteration, Abs. Loss Diff.: 0.0006368839740753529, Abs. Acc. Diff.: 0.0
10: Iteration, Abs. Loss Diff.: 0.009332650899887107, Abs. Acc. Diff.: 0.0
11: Iteration, Abs. Loss Diff.: 0.0002723556756972778, Abs. Acc. Diff.: 0.0
12: Iteration, Abs. Loss Diff.: 0.010622120499610865, Abs. Acc. Diff.: 0.0
13: Iteration, Abs. Loss Diff.: 0.004144576042890535, Abs. Acc. Diff.: 0.0
14: Iteration, Abs. Loss Diff.: 0.00525180220603938, Abs. Acc. Diff.: 0.0
15: Iteration, Abs. Loss Diff.: 0.013058926761150391, Abs. Acc. Diff.: 0.0
16: Iteration, Abs. Loss Diff.: 0.008403560966253276, Abs. Acc. Diff.: 0.0
17: Iteration, Abs. Loss Diff.: 0.012890378683805492, Abs. Acc. Diff.: 0.0
18: Iteration, Abs. Loss Diff.: 0.015538938939571367, Abs. Acc. Diff.: 0.0
19: Iteration, Abs. Loss Diff.: 0.03375539824366569, Abs. Acc. Diff.: 0.0
20: Iteration, Abs. Loss Diff.: 0.0018654009699821117, Abs. Acc. Diff.: 0.0
21: Iteration, Abs. Loss Diff.: 0.008243808336555913, Abs. Acc. Diff.: 0.0
22: Iteration, Abs. Loss Diff.: 0.00302100986242293, Abs. Acc. Diff.: 0.0
23: Iteration, Abs. Loss Diff.: 0.004521983098238702, Abs. Acc. Diff.: 0.0
24: Iteration, Abs. Loss Diff.: 0.008875386621803094, Abs. Acc. Diff.: 0.0
...
46: Iteration, Abs. Loss Diff.: 0.046149560796329814, Abs. Acc. Diff.: 0.0
47: Iteration, Abs. Loss Diff.: 0.0725968092895346, Abs. Acc. Diff.: 0.0
48: Iteration, Abs. Loss Diff.: 0.03759608950349502, Abs. Acc. Diff.: 0.0
49: Iteration, Abs. Loss Diff.: 0.05040962719998787, Abs. Acc. Diff.: 0.0
Even though the value is stabilizing, I find this behaviour strange. Can it be due to inherent randomness in some of the PyTorch components? The full code is much too complex to demonstrate fully, but I am also including my testing function.
import numpy as np
import torch
import torch.nn as nn


def test_loop(net: torch.nn.Module,
              testdata: torch.utils.data.DataLoader):
    net.to(device)  # `device` is defined globally elsewhere
    net.eval()
    criterion = nn.CrossEntropyLoss()
    correct = 0
    total = 0
    y_pred = []
    y_true = []
    losses = []
    with torch.no_grad():
        for dic in testdata:
            inputs = dic['image'].to(device)
            targets = dic['label'].to(device)
            outputs = net(inputs)
            ######################
            outputs = outputs.cpu()
            targets = targets.cpu()
            #######################
            total += targets.size(0)
            # CrossEntropyLoss returns the mean loss over the batch;
            # store it as a plain float rather than a tensor
            losses.append(criterion(outputs, targets).item())
            pred = outputs.argmax(dim=1, keepdim=True)
            correct += pred.eq(targets.view_as(pred)).sum().item()
            y_pred.append(pred)
            y_true.append(targets)
    test_loss = np.mean(losses)
    accuracy = correct / total
    y_true = [item.item() for sublist in y_true for item in sublist]
    y_pred = [item.item() for sublist in y_pred for item in sublist]
    ...
    return {
        'test_loss': test_loss,
        'accuracy': accuracy,
        ...
        'false_positive_rate': false_positive_rate
    }
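One thing I noticed while re-reading the function: `np.mean(losses)` averages per-batch mean losses, which differs from the true per-sample mean loss whenever the dataset size is not divisible by the batch size (the smaller last batch gets the same weight as the full batches). So any difference in batching between the simulation and the replication could shift the reported loss slightly. A toy illustration with made-up numbers:

```python
# Made-up per-sample losses for a hypothetical 20-sample test set
# evaluated with batch_size=8, i.e. batches of sizes 8, 8 and 4.
sample_losses = [1.0] * 16 + [2.0] * 4

batch_means = [
    sum(sample_losses[0:8]) / 8,    # 1.0
    sum(sample_losses[8:16]) / 8,   # 1.0
    sum(sample_losses[16:20]) / 4,  # 2.0 (the smaller last batch)
]

mean_of_batch_means = sum(batch_means) / len(batch_means)  # ~1.3333
per_sample_mean = sum(sample_losses) / len(sample_losses)  # 1.2

print(mean_of_batch_means, per_sample_mean)
```

The two quantities agree only when every batch has the same size, which is why I am unsure whether this explains the drift or whether it is purely a numerical/nondeterminism effect.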