```
channel_1, _, KH1, KW1 = conv_w1.shape
conv_1 = F.conv2d(x, conv_w1, bias=conv_b1, padding=2)
relu_1 = F.relu(conv_1)
```

The above was the original model implementation, where `x` is the input tensor.

I instead did this:

```
first_conv = torch.nn.Conv2d(in_channels=input_channel, out_channels=channel_1, kernel_size=(KH1, KW1), padding=2)
first_conv.weight = torch.nn.Parameter(conv_w1)
first_conv.bias = torch.nn.Parameter(conv_b1)
conv_1 = first_conv(x)
```

I think these two are equivalent. However, when I run the training code, only the second version complains that the gradient for `conv_w1` is None. Can someone please explain? Thanks!
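
For reference, here is a minimal sketch that reproduces the behavior in isolation. The shapes, the `.sum()` loss, and the variable names `grad_functional`, `grad_on_original`, and `grad_on_parameter` are assumptions made up for this example and are not part of the actual training code. It suggests that `torch.nn.Parameter(conv_w1)` creates a new leaf tensor, so in the second version the gradient ends up on `first_conv.weight.grad` rather than on `conv_w1.grad`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical shapes, chosen only for illustration.
x = torch.randn(4, 3, 8, 8)
conv_w1 = torch.randn(6, 3, 5, 5, requires_grad=True)
conv_b1 = torch.zeros(6, requires_grad=True)

# Version 1: the functional call puts conv_w1 itself into the autograd graph.
F.relu(F.conv2d(x, conv_w1, bias=conv_b1, padding=2)).sum().backward()
grad_functional = conv_w1.grad.clone()

# Reset the gradients before trying the second version.
conv_w1.grad = None
conv_b1.grad = None

# Version 2: the module version from the question.
channel_1, input_channel, KH1, KW1 = conv_w1.shape
first_conv = torch.nn.Conv2d(in_channels=input_channel, out_channels=channel_1,
                             kernel_size=(KH1, KW1), padding=2)
first_conv.weight = torch.nn.Parameter(conv_w1)  # wraps the values in a NEW tensor
first_conv.bias = torch.nn.Parameter(conv_b1)
F.relu(first_conv(x)).sum().backward()

grad_on_original = conv_w1.grad            # stays None in this sketch
grad_on_parameter = first_conv.weight.grad  # the gradient lands here instead

print(grad_on_original is None)
print(torch.allclose(grad_functional, grad_on_parameter))
```

If this sketch matches the real setup, the training code would need to read the gradient from (and update) `first_conv.weight` and `first_conv.bias`, or keep using `F.conv2d` with `conv_w1` directly.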