See the code shown below. PyTorch notified me that I was trying to use a graph that had already been freed, but I could not find why.
Note: this code is trying to do Spectral Normalization (https://arxiv.org/pdf/1802.05957.pdf).
A weird bug indeed. Your problem comes from the fact that it seems you can't create a parameter on the CPU and then move it to the GPU. As a consequence, your parameter (u) is no longer a parameter but becomes a plain variable. When you call zero_grad, u is still a leaf of the graph (requiring no gradient), but it is not recognized by the loop that only handles the module's parameters.
You have to create the parameter from a tensor already on the GPU.
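Here is a minimal sketch to illustrate the point (assuming a CUDA device is available; exact types may differ across PyTorch versions):

import torch
from torch.nn import Parameter

p = Parameter(torch.randn(1, 4))          # leaf Parameter created on the CPU
q = p.cuda()                              # the moved copy is no longer a Parameter
print(type(p).__name__)                   # Parameter
print(type(q).__name__)                   # plain Tensor/Variable, depending on the version

r = Parameter(torch.randn(1, 4).cuda())   # created from a GPU tensor: stays a Parameter
print(type(r).__name__)                   # Parameter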
My solution:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import Parameter

def _l2normalize(v, eps=1e-12):
    # Helper assumed from the original gist: rescale v to unit L2 norm.
    return v / (v.norm() + eps)

class SNLinear(nn.Linear):
    def __init__(self, in_features, out_features, bias=True, Ip=1):
        super(SNLinear, self).__init__(in_features, out_features, bias)
        self.Ip = Ip  # number of power-iteration steps per forward pass
        self.u = None

    def max_singular_value(self):
        W = self.weight.view(self.weight.size(0), -1)
        size = W.size()  # n x m
        if self.u is None:
            # Create the parameter directly from a GPU tensor so it stays a Parameter.
            self.u = Parameter(torch.FloatTensor(1, size[0]).normal_().cuda(),
                               requires_grad=False)  # 1 x n
        _u = self.u
        for _ in range(self.Ip):
            _v = _l2normalize(torch.mm(_u, W))      # 1 x m
            _u = _l2normalize(torch.mm(W, _v.t()))  # n x 1
            _u = _u.view(1, -1)
        sigma = _u.mm(W).mm(_v.t())  # estimate of the largest singular value
        return sigma, _u

    def forward(self, input):
        sigma, _u = self.max_singular_value()
        self.u.data = _u.data        # update the power-iteration buffer in place
        W_bar = self.weight / sigma  # spectrally normalized weight
        return F.linear(input, W_bar, self.bias)
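As a quick sanity check (hypothetical shapes, assuming a CUDA device), the layer can be used as a drop-in replacement for nn.Linear:

layer = SNLinear(64, 32).cuda()
x = torch.randn(8, 64).cuda()
out = layer(x)      # runs one power-iteration step and rescales the weight
loss = out.sum()
loss.backward()     # gradients flow through weight / sigma; u itself requires no grad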
Actually, I just found a solution. Basically, just add .detach() at https://gist.github.com/santisy/5c3b8e15f13c1c1719fabfd105c970df#file-bug_report1-py-L39
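In the context of the snippet above, the idea would look roughly like this (a sketch of where the detach goes, not the exact line from the gist):

_u = self.u.detach()                     # cut u loose from the previous iteration's graph
for _ in range(self.Ip):
    _v = _l2normalize(torch.mm(_u, W))   # the power iteration now builds a fresh graph
    _u = _l2normalize(torch.mm(W, _v.t())).view(1, -1)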
Yes, simply detaching u from the graph also makes sense…