See the code shown below. PyTorch notified me that I was trying to use a graph that had already been freed, but I could not find why.
Note: this code is trying to do Spectral Normalization (https://arxiv.org/pdf/1802.05957.pdf).
A weird bug indeed. Your problem comes from the fact that it seems you can't create a parameter on the CPU and then move it to the GPU. As a consequence, your parameter (u) is no longer a parameter but becomes a plain variable. When you call zero_grad, u is still a leaf of the graph (requiring no gradient), but it is not recognized by the loop that only handles the module's parameters.
You have to create the parameter from a tensor already on the GPU.
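Here is a minimal sketch to illustrate the point (assuming a CUDA device is available; exact types may differ across PyTorch versions):

import torch
from torch.nn import Parameter

p = Parameter(torch.randn(1, 4))          # leaf Parameter created on the CPU
q = p.cuda()                              # the moved copy is no longer a Parameter
print(type(p).__name__)                   # Parameter
print(type(q).__name__)                   # plain Tensor/Variable, depending on the version

r = Parameter(torch.randn(1, 4).cuda())   # created from a GPU tensor: stays a Parameter
print(type(r).__name__)                   # Parameter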
My solution:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import Parameter

def _l2normalize(v, eps=1e-12):
    # Helper assumed from the original gist: rescale v to unit L2 norm.
    return v / (v.norm() + eps)

class SNLinear(nn.Linear):
    def __init__(self, in_features, out_features, bias=True, Ip=1):
        super(SNLinear, self).__init__(in_features, out_features, bias)
        self.Ip = Ip  # number of power-iteration steps per forward pass
        self.u = None

    def max_singular_value(self):
        W = self.weight.view(self.weight.size(0), -1)
        size = W.size()  # n x m
        if self.u is None:
            # Create the parameter directly from a GPU tensor so it stays a Parameter.
            self.u = Parameter(torch.FloatTensor(1, size[0]).normal_().cuda(),
                               requires_grad=False)  # 1 x n
        _u = self.u
        for _ in range(self.Ip):
            _v = _l2normalize(torch.mm(_u, W))      # 1 x m
            _u = _l2normalize(torch.mm(W, _v.t()))  # n x 1
            _u = _u.view(1, -1)
        sigma = _u.mm(W).mm(_v.t())  # estimate of the largest singular value
        return sigma, _u

    def forward(self, input):
        sigma, _u = self.max_singular_value()
        self.u.data = _u.data        # update the power-iteration buffer in place
        W_bar = self.weight / sigma  # spectrally normalized weight
        return F.linear(input, W_bar, self.bias)
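As a quick sanity check (hypothetical shapes, assuming a CUDA device), the layer can be used as a drop-in replacement for nn.Linear:

layer = SNLinear(64, 32).cuda()
x = torch.randn(8, 64).cuda()
out = layer(x)      # runs one power-iteration step and rescales the weight
loss = out.sum()
loss.backward()     # gradients flow through weight / sigma; u itself requires no grad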
Actually, I just found a solution. Basically, just add .detach() at https://gist.github.com/santisy/5c3b8e15f13c1c1719fabfd105c970df#file-bug_report1-py-L39
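In the context of the snippet above, the idea would look roughly like this (a sketch of where the detach goes, not the exact line from the gist):

_u = self.u.detach()                     # cut u loose from the previous iteration's graph
for _ in range(self.Ip):
    _v = _l2normalize(torch.mm(_u, W))   # the power iteration now builds a fresh graph
    _u = _l2normalize(torch.mm(W, _v.t())).view(1, -1)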
Yes, simply detaching u from the graph also makes sense…