How to change the variables needed for gradient computation

Hi,

I want to quantize the values of the activations and weights of each layer to 4-bit or 8-bit. So I used a method that generates the quantized values and replaces the old full-precision values with them. The values are indeed modified, which is what I want.
However, I always get
“RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation”
I know there is some internal mechanism that tracks tensors to guarantee the correctness of the gradient computation. I already tried "with torch.no_grad():" and downgraded PyTorch to version 0.2, but the tracking still exists. Can I ask how to disable this feature so that I can change the values used in the gradient computation freely?
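
Here is a minimal example that reproduces the error (sigmoid is just an illustration; its backward pass reuses its output):

import torch

x = torch.rand(3, requires_grad=True)
y = torch.sigmoid(x)   # autograd saves y for sigmoid's backward pass
with torch.no_grad():
    y.round_()         # the in-place write still bumps y's version counter
y.sum().backward()     # RuntimeError: one of the variables needed for
                       # gradient computation has been modified by an inplace operation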

Many thanks!
Edward

Hello, I wrote an example. Is this what you want?

import torch
import torch.nn as nn

def fake_quantize(data, scale):
    # write the rounded values back into the same tensor; no_grad keeps
    # the rounding itself out of the autograd graph
    with torch.no_grad():
        data[:] = torch.round(data * scale) / scale

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Conv2d(3, 3, 3)
    
    def forward(self, x):
        out = self.body(x)
        print("before quantize:", out)
        fake_quantize(out, torch.rand(1) * 100)
        print("after quantize:", out)
        return out 


net = Net()
y = net(torch.rand(10, 3, 6, 6))  # call the module directly instead of net.forward
loss = y.sum()
loss.backward()
print(y)
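
Note that this example happens to be safe: Conv2d's backward uses its saved input and weight, not its output, so overwriting out in place never trips the version check. If the tensor you modify was saved for backward by the op that produced it (the output of sigmoid, for example), the same code would raise the RuntimeError.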

Many thanks for your reply.

This example is similar to what I have already tried with the torch.no_grad() method; the program still raises “RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation”.

Do you use data[:] = new_data instead of data = new_data?

Thanks for your reply. I used data = new_data when I got the runtime error; I have not tried data[:] = new_data yet. Can I ask why adding [:] makes a difference?

import torch

x = torch.rand(3)
print(id(x))  # 140252375853408
x = torch.rand(3)             # plain assignment rebinds x to a brand-new tensor
print(id(x))  # 140252189803616

x = torch.rand(3)
print(id(x))  # 140252375853408
x[:] = torch.rand(3)          # [:] writes into the existing tensor's storage
print(id(x))  # 140252375853408

See, [:] does not change the memory address; the assignment writes into the existing tensor in place.
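
This matters inside a function like fake_quantize: plain assignment only rebinds the local name, while [:] writes into the caller's tensor. For example:

import torch

def rebind(t):
    t = torch.zeros(3)         # rebinds the local name; the caller's tensor is untouched

def write_inplace(t):
    with torch.no_grad():
        t[:] = torch.zeros(3)  # writes into the caller's tensor storage

x = torch.rand(3)
rebind(x)
print(x)           # still the original random values
write_inplace(x)
print(x)           # tensor([0., 0., 0.])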

Sorry for the late reply. I have tried using [:].
I tried out[0][0][0][0][:], where out is a 4-dimensional tensor (all indices set to 0 just as an example here), but I got this error:
out[0][0][0][0][:] = 1
IndexError: dimension specified as 0 but tensor has no dimensions
So I tried to modify the whole tensor with out[:] = torch.rand(out.size()), but I still get
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation:
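
Two separate things are going on here. First, out[0][0][0][0] indexes away all four dimensions, so what is left is a 0-dim scalar tensor and there is no dimension for [:] to slice; a single-element write like out[0, 0, 0, 0] = 1 is already in place and needs no [:]. Second, even out[:] bumps the tensor's version counter, so if the op that produced out saves its output for backward (sigmoid, tanh, etc., unlike the Conv2d in my example above), backward will still raise the RuntimeError. For fake quantization the usual way out is a straight-through estimator, which replaces the in-place write with an out-of-place one and detaches the rounding from the graph. A minimal sketch (fake_quantize_ste is just an illustrative name):

import torch

def fake_quantize_ste(data, scale):
    # Forward pass sees the rounded (quantized) values; backward pass
    # treats the rounding as identity because the residual is detached.
    q = torch.round(data * scale) / scale
    return data + (q - data).detach()

x = torch.rand(3, requires_grad=True)
y = torch.sigmoid(x)
y = fake_quantize_ste(y, 10.0)  # out of place: sigmoid's saved output is untouched
y.sum().backward()              # no RuntimeError; gradients flow through the rounding
print(x.grad)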