Was this possible in previous PyTorch releases?

I found some deep dream code:

        act_value = model.forward(X_Variable, end_layer)
        diff_out = object(act_value, guide_features)
        act_value.backward(diff_out)

As far as I can tell, model.forward should not take a second parameter, just the single input tensor.

Can someone explain the third line to me? Since act_value should be the activation of some end_layer, what is diff_out? Should it be a gradient of some kind, since it is passed to backward?

The only thing I liked about this “solution” was the deep-dream-like pattern:

out = model(input)
out.backward()

Is this possible, or do I need a loss function?

The user defines a custom MyResnet, which accepts two input arguments in its forward method, as seen here:

class MyResnet(models.resnet.ResNet):
    def forward(self, x, n_layer=3):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        layers = [self.layer1, self.layer2, self.layer3, self.layer4]

        for i_layer in range(n_layer):
            x = layers[i_layer](x)
        #         x = self.layer2(x)
        #         x = self.layer3(x)
        #         x = self.layer4(x)
        return x
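
The second forward argument just controls how many residual stages are run, so you can stop at an intermediate activation. A minimal usage sketch, assuming the ResNet-50 block configuration and pretrained weight loading on my side (these are assumptions, not part of the original code):

import torch
from torchvision import models

# hypothetical usage of MyResnet as defined above: build it with the
# ResNet-50 configuration and stop the forward pass after layer2
model = MyResnet(models.resnet.Bottleneck, [3, 4, 6, 3])
model.load_state_dict(models.resnet50(pretrained=True).state_dict())
model.eval()

x = torch.randn(1, 3, 224, 224)
act = model(x, n_layer=2)   # runs layer1 and layer2 only
print(act.shape)            # torch.Size([1, 512, 28, 28])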

Yes, diff_out is the gradient.


Thanks for the reply. It is still confusing to me.
How do you explain this part:

        layers = [self.layer1, self.layer2, self.layer3, self.layer4]

        for i_layer in range(n_layer):
            x = layers[i_layer](x)
        #         x = self.layer2(x)
        #         x = self.layer3(x)
        #         x = self.layer4(x)
        return x

I don’t see self.layer2 defined anywhere. If it is not defined, it should be None, right?

And a second question; it is not so obvious to me.

out = model(input)
out.backward(gradient)

I know input has requires_grad=True. When I call backward(), gradients should be calculated for every tensor that requires grad. input is a leaf since it is user-defined. What is the use of the gradient argument?

In the documentation we read:

It should be a tensor of matching type and location, that contains the gradient of the differentiated function w.r.t. self.

What does “w.r.t. self” mean?

MyResnet is derived from torchvision.models.resnet.ResNet, as seen in my link, which defines layer2 as:

print(model.layer2)
Sequential(
  (0): Bottleneck(
    (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (downsample): Sequential(
      (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
      (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (1): Bottleneck(
    (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
  )
  (2): Bottleneck(
    (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
  )
  (3): Bottleneck(
    (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
  )
)

The computed gradients are used to update the trainable parameters of your model.

self refers to the tensor that .backward() is called on.
In your example you are passing the gradient directly to out’s backward. If you do not pass the gradient explicitly to backward, a default gradient of dL/dL = 1. will be used.
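
For a scalar output, the default seed of 1. and an explicit one give the same result. A minimal sketch with a toy tensor:

import torch

# minimal sketch: for a scalar loss, backward() with no argument
# is the same as seeding the backward pass with a gradient of 1.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
loss = (x ** 2).sum()

loss.backward()    # equivalent to loss.backward(torch.tensor(1.))
print(x.grad)      # tensor([2., 4., 6.])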

Practically, you can set the initial gradient at the root this way; it would be 1 otherwise, since out is the “root” node, as it is called. But thinking about this more, could we control this with a lambda?

After some checking: the gradient argument is there to avoid RuntimeError: grad can be implicitly created only for scalar outputs, which happens when we don’t reduce the output to a scalar with something like torch.sum.
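
A small toy example of what I mean (not the deep dream code):

import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = x * 2                            # non-scalar output

# out.backward()                       # RuntimeError: grad can be implicitly
#                                      # created only for scalar outputs

out.backward(torch.ones_like(out))     # option 1: pass the gradient explicitly
print(x.grad)                          # tensor([2., 2., 2.])

# option 2: reduce the output first, e.g. out.sum().backward()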

Now this brings me to the second line; I was not able to create an example where that line works:

diff_out = object(act_value, guide_features)

What I did so far:

import torch

act_value = torch.tensor([1.0, 2.0, 3.0])
guide_features = torch.tensor([3.0, 4.0, 5.0])

diff_out = object(act_value, guide_features)  # TypeError: object() takes no arguments

print(act_value)
print(guide_features)
print(diff_out)  # never reached, the TypeError above is raised first

Re layer2: got it. It is present in torchvision. I didn’t expect torchvision to be 5 years old. Some ResNet hub models I have used just have layers.

Can someone provide a clue about this second line:

diff_out = object(act_value, guide_features)

Was this diff_out gradient creation possible only in Python 2? Any clue how this would work without getting the TypeError above?

I see it now. object is the objective function taken from the program arguments; it shadows Python’s built-in object, which is why calling the built-in directly raises the TypeError.
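
For completeness, here is a sketch of what such an objective function could look like; the name objective_guide and the exact math are my guess, modeled on the guided deep dream objective, not the code from the original repository:

import torch

# hypothetical objective with the same signature as object(act_value, guide_features);
# it returns a tensor shaped like act_value, which is then passed to backward()
def objective_guide(act_value, guide_features):
    # assumes a batch size of 1: flatten both to (channels, positions)
    ch = act_value.shape[1]
    x = act_value.detach().reshape(ch, -1)
    y = guide_features.detach().reshape(ch, -1)
    # for every position in x, pick the guide position with the largest
    # dot product and use its features as the gradient seed
    A = x.t() @ y
    return y[:, A.argmax(dim=1)].reshape(act_value.shape)

# usage mirroring the snippet from the first post:
# act_value = model(X, end_layer)
# diff_out = objective_guide(act_value, guide_features)
# act_value.backward(diff_out)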