Thank you very much for your lightning response!

I am trying to visualize the weights of a ConvNeXt model.

Below is a function from Lucid (TensorFlow) that computes the linearized connection between the weights of two layers.

I was wondering if you know what the PyTorch equivalents of the following calls are:

```
tf.placeholder_with_default(tf.zeros([1,224,224,3]), [None,None, None, 3]),
tf.placeholder("int32", []),
tf.gradients(t_center[n_chan2], [T(layer1)])[0],
grad.eval(...)
```

I have tried `torch.autograd.grad(x, y, torch.ones_like(y))`, but it does not seem to work.
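For reference, a gradient call on a single scalar output, as `tf.gradients(t_center[n_chan2], ...)` does, returns one row of the Jacobian. Seeding the backward pass with all ones instead returns the sum of all rows. Below is a minimal NumPy sketch of that difference. It assumes a purely linear layer with a hypothetical weight matrix `A`, so the backward pass reduces to `v @ A`.

```python
import numpy as np

# Hypothetical linear layer y = A @ x (3 inputs -> 2 outputs)
A = np.arange(6.0).reshape(2, 3)


def vjp(v):
    # Backward pass of y = A @ x seeded with the cotangent v: returns v @ A
    return v @ A


# One-hot seed per output channel (like looping over n_chan2): recovers A row by row
rows = np.stack([vjp(np.eye(2)[i]) for i in range(2)])

# All-ones seed (like an all-ones grad_outputs): collapses the rows into their sum
summed = vjp(np.ones(2))

print(rows)
print(summed)  # [3. 5. 7.]
```

In `torch.autograd.grad(outputs, inputs, grad_outputs)`, the `grad_outputs` argument plays the role of `v` here and must have the same shape as `outputs`.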

```
import functools

import numpy as np
import tensorflow as tf  # TensorFlow 1.x graph API (tf.Session, tf.placeholder)
import lucid.optvis.render as render
from lucid.misc.gradient_override import gradient_override_map

# MaxAsAvgPoolGrad (a custom gradient for MaxPool) is not defined in this
# snippet; it is assumed to be defined elsewhere (e.g. in the same notebook).


@functools.lru_cache(128)
def get_expanded_weights(model, layer1, layer2, W=5):
    """Get the "expanded weights" between two layers.

    Arguments:
      model: model to get expanded weights from
      layer1: earlier layer to expand weights between
      layer2: later layer to expand weights between
      W: spatial width of expanded weights

    Returns:
      Expanded weights as numpy array of shape
      [W, W, layer1 channels, layer2 channels]

    Discussion:
      Sometimes the meaningful weight interactions are between neurons which aren’t
      literally adjacent in a neural network, or where the weights aren’t directly
      represented in a single weight tensor. A few examples:
      * In a residual network, the output of one neuron can pass through the
        additive residual stream and linearly interact with a neuron much later
        in the network.
      * In a separable convolution, weights are stored as two or more factors,
        and need to be expanded to link neurons.
      * In a bottleneck architecture, neurons in the bottleneck may primarily be
        a low-rank projection of neurons from the previous layer.
      * The behavior of an intermediate layer simply doesn’t introduce much
        non-linear behavior, leaving two neurons in non-adjacent layers with a
        significant linear interaction.
      As a result, we often work with “expanded weights” -- that is, the result
      of multiplying adjacent weight matrices, potentially ignoring non-linearities.
      We generally implement expanded weights by taking gradients through our model,
      ignoring or replacing all non-linear operations with the closest linear one.
      These expanded weights have the following properties:
      * If two layers interact linearly, the expanded weights will give the true
        linear map, even if the model doesn’t explicitly represent the weights in a
        single weight matrix.
      * If two layers interact non-linearly, the expanded weights can be seen as
        the expected value of the gradient up to a constant factor, under the
        assumption that all neurons have an equal (and independent) probability of
        firing.
      They also have one additional benefit, which is more of an implementation
      detail: because they’re implemented in terms of gradients, you don’t need to
      know how the weights are represented. For example, in TensorFlow, you don’t
      need to know which variable object represents the weights. This can be a
      significant convenience when you’re working with unfamiliar models!
    """
    # Set up a graph for doing attribution. ReLU gradients pass through unchanged;
    # MaxPool uses the custom MaxAsAvgPoolGrad gradient.
    with tf.Graph().as_default(), tf.Session(), gradient_override_map(
        {"Relu": lambda op, grad: grad, "MaxPool": MaxAsAvgPoolGrad}
    ):
        t_input = tf.placeholder_with_default(tf.zeros([1, 224, 224, 3]), [None, None, None, 3])
        T = render.import_model(model, t_input, t_input)

        # Compute activations; this gives us numpy arrays with the right number of channels
        acts1 = T(layer1).eval()
        acts2 = T(layer2).eval()

        # Compute gradient from center; due to overrides this just multiplies out the weights
        t_offset = (tf.shape(T(layer2))[1] - 1) // 2
        t_center = T(layer2)[0, t_offset, t_offset]
        n_chan2 = tf.placeholder("int32", [])
        t_grad = tf.gradients(t_center[n_chan2], [T(layer1)])[0]

        # One gradient per layer2 channel, each fed a W x W crop of the layer1
        # activations; stacking on the last axis gives [W, W, layer1 ch, layer2 ch]
        arr = np.stack(
            [t_grad.eval({n_chan2: i, T(layer1): acts1[:, 0:W, 0:W]})[0]
             for i in range(acts2.shape[-1])],
            -1,
        )
        return arr
```
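The docstring's claims about expanded weights can be checked on a toy example. Below is a minimal NumPy sketch. Small dense matrices stand in for convolutions (`W1`, `W2`, `Wa`, `Wb`, `K_dw` and `P` are all hypothetical), and central finite differences stand in for `tf.gradients`. It checks four properties:

* With ReLU treated as the identity, the Jacobian equals `W2 @ W1`.
* A residual connection adds the identity to the expanded weights.
* A depthwise-then-pointwise (separable) layer, the pattern a ConvNeXt block uses, expands to `P[o, c] * K_dw[c, t]`.
* Averaging the ReLU Jacobian over all equally likely firing patterns gives `0.5 * W2 @ W1`, the "expected gradient up to a constant factor".

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer stack: layer1 (3 channels) -> hidden (4 units) -> layer2 (2 channels)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))


def jacobian(f, x, eps=1e-6):
    """Central finite differences; stands in for tf.gradients in this sketch."""
    cols = []
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        cols.append((f(x + d) - f(x - d)) / (2 * eps))
    return np.stack(cols, axis=1)  # shape (outputs, inputs)


def linearized_forward(x):
    # ReLU replaced by the identity, mirroring the {"Relu": lambda op, grad: grad} override
    return W2 @ (W1 @ x)


J = jacobian(linearized_forward, np.zeros(3))

# Residual stream: x + Wb @ (Wa @ x); the expanded weights are I + Wb @ Wa
Wa = rng.normal(size=(4, 3))
Wb = rng.normal(size=(3, 4))
J_res = jacobian(lambda x: x + Wb @ (Wa @ x), np.zeros(3))

# Separable layer at one output position: depthwise taps K_dw (3 channels x 5 taps),
# then a pointwise channel mix P (2 x 3). Expanded weights: P[o, c] * K_dw[c, t]
K_dw = rng.normal(size=(3, 5))
P = rng.normal(size=(2, 3))
expanded_sep = np.einsum("oc,ct->oct", P, K_dw)


def sep_forward(x_flat):
    x = x_flat.reshape(3, 5)             # (channels, taps) input window
    depthwise = (K_dw * x).sum(axis=1)   # one value per channel
    return P @ depthwise                 # mix channels


J_sep = jacobian(sep_forward, np.zeros(15)).reshape(2, 3, 5)


def expected_relu_jacobian():
    # Each hidden unit fires (mask 1) or not (mask 0); averaging over all 2**4
    # equally likely patterns models independent firing with probability 0.5
    masks = itertools.product([0.0, 1.0], repeat=W1.shape[0])
    return np.mean([W2 @ np.diag(m) @ W1 for m in masks], axis=0)


print(np.allclose(J, W2 @ W1))                                 # True
print(np.allclose(J_res, np.eye(3) + Wb @ Wa))                 # True
print(np.allclose(J_sep, expanded_sep))                        # True
print(np.allclose(expected_relu_jacobian(), 0.5 * (W2 @ W1)))  # True
```

The firing probability of 0.5 is an arbitrary choice for the sketch. Any probability `p` gives `p * W2 @ W1`, which is the "constant factor" the docstring mentions.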