Unexpected shape when computing gradient of the output with respect to the input

Hi there!
I have a tensor of shape (number_of_rays, number_of_points_per_ray, 3); let’s call it input. input is passed through a model and some processing (all of it differentiable); let’s call this whole process inference. Finally, we get output = inference(input), which has shape (number_of_rays, number_of_points_per_ray, 300), where each “ray” in the output depends only on the same ray of the input, i.e. output[i] depends only on input[i]. This means that for every set of 3 elements in the input, the output has 300 elements, so I would expect to get a gradient with the same shape as the output.

As explained here, I tried grads = torch.autograd.grad(outputs=output, inputs=input, grad_outputs=None)

but the gradient I am getting has shape (number_of_rays, number_of_points_per_ray, 3), which matches the input rather than the output.

Any clue as to what I may be doing wrong?
Thanks in advance

I’m not sure I understand the question, as torch.autograd.grad is meant to compute the gradient with respect to the inputs. From the documentation: “inputs (sequence of Tensor) – Inputs w.r.t. which the gradient will be returned (and not accumulated into .grad).” Therefore the shapes of the returned values should match those of the inputs.

Additionally, I’m not sure how you are able to run torch.autograd.grad without specifying a gradient for the output when it is not a scalar value, as I would assume the output has requires_grad=True if it is produced from an input that has requires_grad=True. Otherwise, you may already have the gradient of the outputs, as that is what you are meant to pass (via grad_outputs) to torch.autograd.grad in order to get the gradients of the inputs.
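
In other words, for a non-scalar output you would typically pass something like torch.ones_like(output) as grad_outputs, and the returned gradient is always shaped like the input. A minimal, self-contained sketch (the Linear layer is just a toy stand-in for your inference, not your actual model):

```python
import torch

num_rays, num_points_per_ray = 4, 8  # toy sizes for illustration

# Toy stand-in for the inference pipeline: maps 3 coordinates to 300 values per sample
model = torch.nn.Linear(3, 300)

input = torch.rand(num_rays, num_points_per_ray, 3, requires_grad=True)
output = model(input)  # shape: (4, 8, 300)

# For a non-scalar output, autograd.grad needs grad_outputs (the vector in the
# vector-Jacobian product); with grad_outputs=None it would raise
# "grad can be implicitly created only for scalar outputs".
(grads,) = torch.autograd.grad(
    outputs=output,
    inputs=input,
    grad_outputs=torch.ones_like(output),
)
print(grads.shape)  # torch.Size([4, 8, 3]) -- same shape as the input, by design
```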

I probably explained it poorly.
I am using a NeRF-like process (which contains a neural network and many other things) that takes an origin point and a direction as input. Let’s call this origin/direction pair a “ray”. For each input ray it:

  1. casts a ray and splits it into the (XYZ) coordinates of a certain number of samples, starting at the origin point and going along the given direction.
  2. computes a feature that represents the estimated density of the space at each sample
  3. decodes this feature into the estimated density at each sample for each frame in the input sequence. My input sequence is 300 frames, so for each input ray the process returns 300 values per sample along the ray.

Now I would like to compute the normal vector, which in other NeRF-like models is usually obtained as the gradient of the density (the output of the process) with respect to the XYZ coordinates. In NeRF-like models for static scenes (a single frame), each sample predicts only one density value, but in my case each sample predicts 300 density values.
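
Something along these lines is what I have in mind (the density_model below is only a toy stand-in for my actual pipeline, and the per-frame loop is just one possible way of handling the extra frame dimension):

```python
import torch

num_rays, num_samples, num_frames = 4, 8, 300  # placeholder sizes

# Toy stand-in for the real pipeline: maps XYZ sample coordinates to one density per frame
density_model = torch.nn.Sequential(
    torch.nn.Linear(3, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, num_frames),
)

xyz = torch.rand(num_rays, num_samples, 3, requires_grad=True)
density = density_model(xyz)  # (num_rays, num_samples, num_frames)

# One gradient per frame: since density[i, j, f] depends only on xyz[i, j, :],
# summing over rays and samples before differentiating still yields the
# per-sample gradient for that frame, shaped (num_rays, num_samples, 3).
per_frame_grads = []
for f in range(num_frames):
    (g,) = torch.autograd.grad(
        outputs=density[..., f].sum(),
        inputs=xyz,
        retain_graph=True,
    )
    per_frame_grads.append(g)

grads = torch.stack(per_frame_grads, dim=-2)  # (num_rays, num_samples, num_frames, 3)
# Usual NeRF convention: normals point against the density gradient
normals = -torch.nn.functional.normalize(grads, dim=-1)
print(normals.shape)  # torch.Size([4, 8, 300, 3])
```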

Did I explain myself better?
Thanks in advance!

The original post is, as you pointed out, very poorly explained, so I am dropping this thread and opening a new one where I will explain it much better.