Hi, I’ll show what I meant with an example. First, the setup:
import torch
from torch.nn import functional as F
import numpy as np
import random
#####
torch.manual_seed(60)
torch.cuda.manual_seed(60)
np.random.seed(60)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
random.seed(60)
#####
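For context, my understanding of the seeding above: it makes the whole sequence of random draws reproducible, not each individual draw. A minimal sketch of what I mean (using plain torch.rand, just as an illustration):

```python
import torch

torch.manual_seed(60)
x = torch.rand(3)
y = torch.rand(3)
# x and y differ: the seed fixes the *sequence* of draws,
# and each call advances the global generator's state.
print(torch.equal(x, y))  # False
```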
Now let’s do something simple:
pred = torch.FloatTensor([0,1,2])
a = F.gumbel_softmax(pred)
b = F.gumbel_softmax(pred)
print("a:",a)
print("b:",b)
we will get:
a: tensor([0.4957, 0.1406, 0.3637])
b: tensor([0.0653, 0.6350, 0.2996])
Now if I remove the line that computes a:
pred = torch.FloatTensor([0,1,2])
b = F.gumbel_softmax(pred)
print("b:",b)
we will get:
b: tensor([0.4957, 0.1406, 0.3637])
And the pattern continues the same way when I use three calls: removing an earlier call shifts the results of the later ones.
I understand that the result depends on the seed, but it seems impossible that applying gumbel_softmax to a vector should depend on whether I already applied it earlier and stored the result in another variable.
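To double-check what is going on, here is a minimal sketch (my assumption here is that gumbel_softmax draws its noise from torch's global RNG stream, so re-seeding before each call should make the two calls identical):

```python
import torch
from torch.nn import functional as F

pred = torch.FloatTensor([0, 1, 2])

# Re-seed before each call: if gumbel_softmax consumes numbers from
# the global RNG, resetting the state should reproduce the same sample.
torch.manual_seed(60)
a = F.gumbel_softmax(pred)
torch.manual_seed(60)
b = F.gumbel_softmax(pred)
print(torch.equal(a, b))  # True
```

Which would mean each call simply advances the shared RNG state, so removing the first call hands the second call the random numbers the first one would have consumed.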
Moreover, applying gumbel_softmax twice, as follows:
pred = torch.FloatTensor([0,1,2])
a = F.gumbel_softmax(pred)
b = F.gumbel_softmax(pred)
print("a:",a)
print("b:",b)
consistently showed better results across multiple seeds.
Can anyone help me understand this behaviour?