How to write a customized autograd Function which accepts lists/tuples of tensors as its inputs?

I recently ran into a problem while trying to write a customized autograd Function of my own. My case is quite different from the usual requirements, because the inputs to my Function need to be lists/tuples of tensors.

To be more specific, my problem can be simplified as follows.
The input to my Function is a list of tensors (in the example below I only consider one input list, but in fact the Function may need to take more than one such list).

import torch

x = [torch.randn(4, 5), torch.randn(8, 9)]

And let’s consider the simplest case, a customized ReLU Function.
I would expect the following code to run, but it actually does not.

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # x needs to be a list of tensors
        out = [torch.relu(t) for t in x]
        ctx.save_for_backward(out)   # attempt to save the whole Python list
        return out

    @staticmethod
    def backward(ctx, y):
        # y would be the corresponding list of incoming gradients
        act, = ctx.saved_tensors
        out = [(t1 > 0.0) * t2 for t1, t2 in zip(act, y)]
        return out
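
This is how I would like to call it. It fails as soon as forward runs; as far as I can tell, ctx.save_for_backward rejects the Python list because a list is not a Tensor:

y = MyReLU.apply(x)   # using the list x from above; errors out when saving the list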

I know the actual implementation might differ, but the code above should convey the basic idea. It does not work, however; it seems that the Function machinery (and ctx.save_for_backward) only accepts torch.Tensor arguments, not Python lists of tensors.
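
The closest workaround I have found so far is to unroll the list into separate positional arguments, roughly as in the sketch below (MyReLUUnrolled is just a name I made up for this attempt). This appears to run for a single list, but it does not extend cleanly to my real case with several lists of unknown length:

class MyReLUUnrolled(torch.autograd.Function):
    @staticmethod
    def forward(ctx, *xs):
        # each tensor arrives as its own positional argument
        outs = tuple(torch.relu(t) for t in xs)
        ctx.save_for_backward(*outs)
        return outs

    @staticmethod
    def backward(ctx, *grads):
        # one incoming gradient per output tensor
        acts = ctx.saved_tensors
        return tuple((a > 0.0).to(g.dtype) * g for a, g in zip(acts, grads))

x2 = [torch.randn(4, 5, requires_grad=True), torch.randn(8, 9, requires_grad=True)]
y2 = MyReLUUnrolled.apply(*x2)          # note the unpacking at the call site
sum(t.sum() for t in y2).backward()     # gradients flow back to both input tensors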

I need this because the tensors have distinct sizes and cannot be stacked together. Besides, the code shown above uses a Python for loop to iterate over all the tensors, which should not be necessary: my real operation is written in CUDA C++ and can process the whole list efficiently in one call. Since the list can be very long, iterating over the tensors in Python would be slow and inefficient. That is why the input to the Function needs to be a list of tensors.
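
Just to make the shape constraint concrete: the tensors cannot be batched into one tensor, so a stack-based workaround is off the table (the exact error text may differ between versions):

torch.stack(x)   # raises a RuntimeError because the tensors have different shapes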

Could you offer any suggestions or workarounds for this? Thank you very much!