I found an example of “wrap padding” and modified the code a little so that some dimensions get “wrap padding” while others are padded with zeros.

```python
import torch


def pad_circular_nd2(x: torch.Tensor, pad: int, dim, dim0) -> torch.Tensor:
    """
    :param x: input tensor, e.g. of shape [H, W]
    :param pad: int >= 0, number of elements to pad on each side
    :param dim: the dimension(s) to pad circularly ("wrap padding")
    :param dim0: the dimension(s) to pad with zeros
    :return: the padded tensor
    """
    if isinstance(dim, int):
        dim = [dim]
    if isinstance(dim0, int):
        dim0 = [dim0]

    # circular ("wrap") padding: copy a slice from each end onto the opposite end
    for d in dim:
        if d >= len(x.shape):
            raise IndexError(f"dim {d} out of range")
        idx = tuple(slice(0, None if s != d else pad, 1)
                    for s in range(len(x.shape)))
        x = torch.cat([x, x[idx]], dim=d)
        idx = tuple(slice(None if s != d else -2 * pad,
                          None if s != d else -pad, 1)
                    for s in range(len(x.shape)))
        x = torch.cat([x[idx], x], dim=d)

    # zero padding: concatenate slices of an all-zeros buffer
    x0 = torch.zeros(x.size()).double().cuda()
    for d in dim0:
        if d >= len(x.shape):
            raise IndexError(f"dim {d} out of range")
        idx = tuple(slice(0, None if s != d else pad, 1)
                    for s in range(len(x.shape)))
        x = torch.cat([x, x0[idx]], dim=d)
        idx = tuple(slice(None if s != d else -2 * pad,
                          None if s != d else -pad, 1)
                    for s in range(len(x.shape)))
        x = torch.cat([x0[idx], x], dim=d)
    return x.cuda()
```

However, this “wrap padding” runs on the CPU even though I expect it to run on the GPU, and it makes training much slower. Is there any way to fix this?