Hi,
I have the following component that needs to do a few things:
- Store some tensors (var1)
- Store some tensors that can be updated with autograd (var2)
- Store something that keeps track of which tensors have been added (var3)
- Count how many times each entry of var2 was used (var4)
The forward pass then computes similarities (according to some metric) between the input and var1, and returns the corresponding top-k entries of var2. I then do some operations on this result.
When I run the code below I hit two problems:
- I get this warning from checkpoint:
  UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
- myThing.var2.grad is None (even after loss.backward())
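For reference, if I understand correctly, the (default, reentrant) checkpoint implementation emits exactly this warning when none of the tensor arguments require grad, and in that case backward never re-enters the checkpointed function, so parameters inside it get no gradients. A minimal sketch of what I mean (the layer and shapes are just placeholders):

import torch
from torch.utils.checkpoint import checkpoint

layer = torch.nn.Linear(10, 10)
x = torch.rand(4, 10)  # leaf tensor, requires_grad=False

out = checkpoint(layer, x)  # warns; layer.weight.grad would stay None

out = checkpoint(layer, x.requires_grad_())  # no warning, grads can reach layer
# or, on PyTorch versions that support the flag:
out = checkpoint(layer, x, use_reentrant=False)

The component itself: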
import torch
import torch.nn as nn
from collections import defaultdict

class MyThing(nn.Module):
    def __init__(self):
        super(MyThing, self).__init__()
        # Plain tensor (not a buffer), compared against the input in forward()
        self.var1 = torch.zeros(0, 10, requires_grad=False)
        # Learnable entries returned by the top-k lookup
        self.var2 = nn.Parameter(torch.rand(0, 5))
        # Tracks which tensors have been added
        # (note: tensors hash by object identity, so equal-valued copies won't match)
        self.var3 = defaultdict(bool)
        # Usage count, one row per entry of var2
        self.var4 = torch.zeros(0, 1, requires_grad=False)

    def add(self, elements, sorter):
        with torch.no_grad():  # I don't want to track gradients while growing the storage
            c = 0
            highest_sorted = sorter.argsort(dim=0, descending=True)
            elements = elements[highest_sorted]
            for element in elements:
                if not self.var3[element]:
                    self.var1 = torch.cat((self.var1, element.unsqueeze(0)), dim=0)
                    to_add = torch.rand(1, 5)
                    # Re-wrap as a Parameter so the new row is learnable
                    self.var2 = nn.Parameter(torch.cat((self.var2, to_add), dim=0))
                    self.var4 = torch.cat((self.var4, torch.zeros(1, 1)), dim=0)
                    c += 1
                    self.var3[element] = True
                    if c >= SOME_MAXIMUM_VALUE:
                        break

    def forward(self, x):
        a = FC_LAYER_1(x)
        b = FC_LAYER_2(self.var1)
        sims = torch.matmul(a, b.t())
        idxs = sims.sort(dim=1, descending=True).indices
        k_highest_sims = smart_sort(sims, idxs)[:, :K]  # gather sims by idxs, keep top K
        c = self.var2[idxs[:, :K]]
        self.var4[idxs[:, :K]] += 1  # bump the usage counters in place
        return k_highest_sims, c
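(For context, smart_sort just reorders sims row-wise according to idxs; if I'm not mistaken it is equivalent to a gather, or to topk in one call:)

# Equivalent ways to get the K highest similarities per row:
k_highest_sims = torch.gather(sims, 1, idxs)[:, :K]
# or directly (values and indices, sorted descending):
k_highest_sims, topk_idxs = sims.topk(K, dim=1)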
Code used for the forward pass outside the component:

import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

x = SOME_TENSORS       # requires_grad is set to True on these beforehand
y = SOME_OTHER_TENSOR  # requires_grad is set to True on these beforehand
myThing = MyThing()
sims, outs = checkpoint(myThing, x)  # needed for memory reasons; the warning fires here
z = FC_LAYER_3(sims)
result1 = FC_LAYER_4(y)
softmaxed_sims = F.softmax(sims, dim=1)
result2 = FC_LAYER_5(outs)
final_result = result1 * (1 - z) + (result2 * softmaxed_sims).sum(dim=1) * z
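One more thing I am unsure about: every call to add() replaces var2 with a brand-new nn.Parameter, so if an optimizer was built before that, it still holds the old tensor and would never update the new rows, even with correct gradients. Roughly (the optimizer setup is not shown in my code above, so this is only illustrative):

import torch
import torch.nn as nn

p = nn.Parameter(torch.rand(2, 5))
opt = torch.optim.SGD([p], lr=0.1)

# Growing the parameter creates a new tensor object...
p = nn.Parameter(torch.cat((p.detach(), torch.rand(1, 5)), dim=0))
# ...which opt knows nothing about, so it has to be rebuilt:
opt = torch.optim.SGD([p], lr=0.1)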
The main problem is that the values in var2 always stay the same, which is consistent with the gradients being None.
Am I doing something wrong?