Custom filter, CUDA, DataParallel problem

I built a custom filter in PyTorch as shown below.

When using only one CPU or GPU, the code works fine, but when the model is wrapped in DataParallel, the code does not run.

Is there any way to make a custom filter work with DataParallel?

Could you post a minimal, executable code snippet which would reproduce the issue so that we could take a look at it, please?
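
In the meantime, one thing that might be worth ruling out. This is only a guess, since the filter code isn't visible here: if the filter kernel is stored as a plain tensor attribute, it is not part of the module state, so it stays on its original device when DataParallel replicates the module. Registering it as a buffer usually avoids that. A minimal sketch of the idea follows; `FixedFilter`, `run_once`, and the 3x3 averaging kernel are made-up placeholders, not your code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FixedFilter(nn.Module):
    """Hypothetical custom filter: a fixed 3x3 averaging kernel per channel."""

    def __init__(self, channels: int):
        super().__init__()
        self.channels = channels
        # Shape (out_channels, in_channels / groups, kH, kW) for a depthwise conv.
        kernel = torch.full((channels, 1, 3, 3), 1.0 / 9.0)
        # A registered buffer is part of the module state, so .to(device)
        # and nn.DataParallel replication carry it to each device.
        self.register_buffer("kernel", kernel)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # groups == channels applies the same 3x3 filter to each channel separately.
        return F.conv2d(x, self.kernel, padding=1, groups=self.channels)


def run_once() -> torch.Tensor:
    # Falls back to CPU so the snippet also runs on a machine without GPUs.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = FixedFilter(channels=3).to(device)
    if torch.cuda.device_count() > 1:
        # Only wrap when there is more than one GPU to split the batch across.
        model = nn.DataParallel(model)
    x = torch.ones(4, 3, 8, 8, device=device)
    return model(x)


if __name__ == "__main__":
    print(run_once().shape)
```

If your filter already uses a buffer or an `nn.Parameter`, then the cause is something else, and the full error message plus a runnable snippet would help narrow it down.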