Hi! I am new to torch.autograd and gradient flow, and I have run into a problem with speeding up gradient calculation.
I have a YOLO net $\mathcal{N}$. It receives a video frame at every timestep and produces detection outputs $\mathbf{d}_{i} = \mathcal{N}(\mathbf{I})$, where $\mathbf{I}$ is the image tensor of the frame and each $\mathbf{d}_{i}$ is a detection output, i.e. [x, y, w, h, ...].
Assume that there are 20 detection boxes after NMS, i.e. len($\mathbf{d}$) = 20. Now I need to calculate $\frac{\partial x_{i}}{\partial \mathbf{I}}$ for i = 1, ..., 20.
Since $\mathbf{d}_{i} = \mathcal{N}(\mathbf{I})$ and $x_{i} = \mathbf{d}_{i}[0]$, $x_{i}$ is part of the computation graph containing $\mathbf{I}$, so $\frac{\partial x_{i}}{\partial \mathbf{I}}$ can be obtained with x_i.backward(retain_graph=True) or torch.autograd.grad(x_i, img_tensor, retain_graph=True) (if I'm not mistaken).
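To check my understanding of the two calls, here is a minimal sketch on a toy problem. The linear map `weight` is only a hypothetical stand-in for the detector, not YOLO. As far as I can tell, both calls give the same gradient; the difference is that backward() accumulates into img_tensor.grad, while torch.autograd.grad returns the gradient and leaves .grad alone.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for the detector: a fixed linear map from a tiny
# "image" to a single box [x, y, w, h]. This is NOT a YOLO model.
weight = torch.randn(4, 3 * 4 * 4)
img_tensor = torch.rand(3, 4, 4, requires_grad=True)

d_i = weight @ img_tensor.flatten()  # one detection, shape (4,)
x_i = d_i[0]                         # its x-coordinate, a scalar

# Option 1: backward() writes (accumulates) the gradient into img_tensor.grad.
x_i.backward(retain_graph=True)
grad_via_backward = img_tensor.grad.clone()

# Option 2: autograd.grad returns the gradient as a tuple and does not
# touch img_tensor.grad.
(grad_via_grad,) = torch.autograd.grad(x_i, img_tensor, retain_graph=True)

same = torch.allclose(grad_via_backward, grad_via_grad)

# Calling backward() a second time adds to .grad instead of replacing it,
# so a loop over boxes would need img_tensor.grad to be reset each time.
x_i.backward(retain_graph=True)
accumulated = torch.allclose(img_tensor.grad, 2 * grad_via_backward)

print(same, accumulated)
```

Because of the accumulation, torch.autograd.grad looks like the more convenient choice when a separate gradient per box is needed.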
Now I calculate 20 gradients in a serial manner.
res = []
for x_i in x_list:
    _grad = torch.autograd.grad(x_i, img_tensor, retain_graph=True)  # backward pass
    res.append(_grad[0])
If the forward pass of $\mathcal{N}$ takes about T seconds (~0.1 s), one backward pass also takes approximately T seconds (~0.15 s).
However, the serial computation will take about 20 x T seconds! That's terrible.
I wonder if there is a parallel way, using the power of the GPU I mean, to bring this below 20 x T, something like
res = torch.autograd.grad(x_list, [img_tensor] * len(x_list))
# res[0] is res[1]
# True
It does not work: every entry of res is the same tensor 🤣
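One thing that might be worth trying is the is_grads_batched argument of torch.autograd.grad, which is available in recent PyTorch releases and is marked experimental in the docs. Below is a minimal sketch comparing it against the serial loop. The small MLP `net` is a hypothetical stand-in for the detector, not YOLO, and I am assuming the x-coordinates can be stacked into a single tensor `x`. Whether this helps on a real detector probably depends on every op in the backward pass supporting vmap, so please treat it as an experiment rather than a confirmed fix.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the detector (NOT YOLO): maps a small image
# to 20 boxes of [x, y, w, h].
net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 8 * 8, 32),
    nn.Tanh(),
    nn.Linear(32, 20 * 4),
)

img_tensor = torch.rand(1, 3, 8, 8, requires_grad=True)
d = net(img_tensor).view(20, 4)  # one row per detection
x = d[:, 0]                      # x-coordinate of every box, shape (20,)

# Serial baseline: one backward pass per box, as in the loop above.
serial = torch.stack(
    [torch.autograd.grad(x_i, img_tensor, retain_graph=True)[0] for x_i in x]
)

# Batched version: row i of the identity matrix selects x_i, so a single
# call returns all 20 gradients stacked along a new leading dimension.
eye = torch.eye(x.numel(), dtype=x.dtype, device=x.device)
(batched,) = torch.autograd.grad(
    x, img_tensor, grad_outputs=eye, is_grads_batched=True
)

print(batched.shape)  # shape (20, 1, 3, 8, 8): one gradient per box
print(torch.allclose(serial, batched, atol=1e-6))
```

The batched result takes 20 times the memory of a single gradient, so for full-resolution frames it may be necessary to process the boxes in chunks.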