Problem with zeroing gradients

Hi, I have a problem that has been bothering me for a week; any suggestions would be appreciated.

The problem is that when I try to zero out part of the gradients, the corresponding weights are not frozen as expected: they keep moving by a very small amount, on the order of a truncation error.

In my application, I need to zero out part of the gradients under specific conditions. Example code to reproduce the error is sketched below, after the list of zeroing methods.

The code is modified from one of the official PyTorch examples.

I tried the following three ways of zeroing out the gradients, and the error is there in all of them:

model.conv1.weight.grad *= 0
model.conv1.weight.grad.fill_(0)
model.conv1.weight.grad.data.zero_()
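Here is a minimal sketch of the kind of reproduction I mean (an illustrative stand-in with a dummy model and random data, assuming SGD with momentum like the example I started from, not my actual code):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 4, kernel_size=3)
        self.fc = nn.Linear(4 * 26 * 26, 10)

    def forward(self, x):
        return self.fc(F.relu(self.conv1(x)).flatten(1))

torch.manual_seed(0)
model = Net()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

def step(zero_conv1):
    x = torch.randn(8, 1, 28, 28)           # dummy batch
    target = torch.randint(0, 10, (8,))
    optimizer.zero_grad()
    F.cross_entropy(model(x), target).backward()
    if zero_conv1:
        model.conv1.weight.grad.zero_()     # try to freeze conv1
    optimizer.step()

# A few normal steps first (in my application the zeroing only
# kicks in later, under specific conditions).
for _ in range(3):
    step(zero_conv1=False)

before = model.conv1.weight.detach().clone()
for _ in range(5):
    step(zero_conv1=True)

# I would expect 0 here, but I get a small nonzero drift.
print((model.conv1.weight.detach() - before).abs().max().item())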

The error is small, so it does not affect performance much, but I'd like to understand how this happens. Thanks a lot.

If you use momentum or weight decay (from the optimizer), the parameters will change even when the grad has the numerical value 0.
Optimizers do special-case None gradients to mean "do nothing", but that isn't available for only part of a parameter. In your example, though, model.conv1.weight.grad = None would do the trick.
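As a minimal illustration of the mechanism (just a sketch with a single parameter and plain SGD):

import torch

p = torch.nn.Parameter(torch.ones(3))
opt = torch.optim.SGD([p], lr=0.1, momentum=0.9)

# One step with a real gradient populates the momentum buffer.
p.grad = torch.ones(3)
opt.step()                                  # buffer is now 1.0

w = p.detach().clone()
p.grad = torch.zeros(3)                     # numerically zero gradient
opt.step()                                  # update = lr * (0.9 * buffer) != 0
print((p.detach() - w).abs().max().item())  # ~0.09: the parameter still moved

w = p.detach().clone()
p.grad = None                               # None: the optimizer skips this parameter
opt.step()
print((p.detach() - w).abs().max().item())  # exactly 0.0

# Weight decay behaves similarly: the optimizer adds weight_decay * p
# to the gradient, so a numerically zero grad still yields an update.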

Best regards

Thomas

Thanks, Thomas.

Now I understand what is happening.