Hi

I’m working on a problem involving sensitivity analysis, and I’m hoping to use PyTorch and its built-in operations instead of coding everything from scratch in CUDA.

I have a small example (using an NN, since most people here are familiar with that) where the computations involving `dZ` and `dA` are independent of those for `Z` and `A`:

```
import torch

def sensitive(d_inp, inp, param):
    # forward pass: inp (B, n_in), param (n_out, n_in)
    Z = torch.matmul(inp, param.T)
    # sensitivity pass, independent of Z: d_inp (B, n_dir, n_in)
    dZ = torch.matmul(d_inp, param.T)
    A = torch.tanh(Z)
    # tanh'(Z) = 1 - tanh(Z)^2, broadcast over the n_dir dimension
    dA = torch.unsqueeze(1 - torch.tanh(Z)**2, dim=1) * dZ
    return A, dA
```

I want to parallelise the code so that `Z` and `dZ` are computed in parallel, followed by the parallel evaluation of `A` and `dA`.
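One thing I experimented with (not sure it’s the right approach) is issuing the two independent matmuls on separate CUDA streams, so the scheduler at least has the *option* to overlap them. The `sensitive_streams` name and the CPU fallback are my own; the shapes assumed are `inp` (B, n_in), `d_inp` (B, n_dir, n_in), `param` (n_out, n_in):

```python
import torch

def sensitive_streams(d_inp, inp, param):
    # Sketch: run the two independent matmuls on separate CUDA streams
    # so they may overlap; falls back to sequential execution on CPU.
    if inp.is_cuda:
        s1, s2 = torch.cuda.Stream(), torch.cuda.Stream()
        # both side streams wait for work already queued on the default stream
        s1.wait_stream(torch.cuda.current_stream())
        s2.wait_stream(torch.cuda.current_stream())
        with torch.cuda.stream(s1):
            Z = torch.matmul(inp, param.T)
            A = torch.tanh(Z)
        with torch.cuda.stream(s2):
            dZ = torch.matmul(d_inp, param.T)
        # dA needs results from both streams, so rejoin before computing it
        torch.cuda.current_stream().wait_stream(s1)
        torch.cuda.current_stream().wait_stream(s2)
    else:
        Z = torch.matmul(inp, param.T)
        A = torch.tanh(Z)
        dZ = torch.matmul(d_inp, param.T)
    # tanh'(Z) = 1 - tanh(Z)^2 = 1 - A^2; broadcast over n_dir
    dA = (1 - A**2).unsqueeze(1) * dZ
    return A, dA
```

I’m unsure whether this actually helps in practice, since a single large matmul can already saturate the GPU on its own.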

I searched for a solution to this but couldn’t find anything definitive. Hope someone can help me out here.
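In case it helps frame the question: one fallback I’ve considered is fusing the two matmuls into a single GEMM, since both multiply by `param.T` (the library’s GEMM kernel then parallelises internally). The `sensitive_fused` name and the assumed shapes (`inp` (B, n_in), `d_inp` (B, n_dir, n_in)) are my own:

```python
import torch

def sensitive_fused(d_inp, inp, param):
    # Sketch: stack inp and the flattened d_inp so that Z and dZ
    # come out of one matmul (one kernel launch) instead of two.
    B, n_dir, n_in = d_inp.shape
    stacked = torch.cat([inp, d_inp.reshape(B * n_dir, n_in)], dim=0)
    out = torch.matmul(stacked, param.T)          # single GEMM
    Z = out[:B]
    dZ = out[B:].reshape(B, n_dir, -1)
    A = torch.tanh(Z)
    # tanh'(Z) = 1 - tanh(Z)^2 = 1 - A^2; broadcast over n_dir
    dA = (1 - A**2).unsqueeze(1) * dZ
    return A, dA
```

This avoids explicit parallelism entirely, but I’d still like to know whether there is a more idiomatic PyTorch way.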

Thanks