I am thinking of defining a new function, say F, whose forward and backward methods can only be computed on the CPU.

So basically my F(x) requires reading out x.data.cpu().numpy() and running some (sophisticated) code on the CPU, say g(x.data.cpu().numpy()). The backward pass for F works similarly (and requires some numpy variables saved from the forward pass).

I am wondering whether I can implement this the same way as in other cases, where the forward and backward passes use existing functions that are computed on GPUs?

I am tempted to just do the following in the forward pass

y = g(x.data.cpu().numpy())
return Variable(torch.Tensor(y))

You have to follow the torch.autograd.Function protocol: forward operates on tensors and backward on Variables (until you use a version where the two are merged). There are good examples floating around; if you are looking for one, I can offer a differentiable implicit function implementation.
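To make the protocol concrete, here is a minimal sketch, assuming a PyTorch version where Variable and Tensor are already merged (0.4 or later) and the static-method style of Function. The names g and g_grad are hypothetical stand-ins for your CPU-only numpy code; I use sin and its derivative only so the result can be checked.

```python
import numpy as np
import torch


def g(x_np):
    # Hypothetical stand-in for the CPU-only forward computation.
    return np.sin(x_np)


def g_grad(x_np, grad_out_np):
    # Hypothetical stand-in for the CPU-only backward computation:
    # the derivative of sin, multiplied by the incoming gradient.
    return np.cos(x_np) * grad_out_np


class F(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # Copy so the saved array does not share memory with x.
        x_np = x.detach().cpu().numpy().copy()
        ctx.x_np = x_np  # numpy data kept for the backward pass
        y_np = g(x_np)
        # Hand the result back on the caller's device and dtype.
        return torch.from_numpy(y_np).to(device=x.device, dtype=x.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        grad_np = grad_output.detach().cpu().numpy()
        grad_in_np = g_grad(ctx.x_np, grad_np)
        return torch.from_numpy(grad_in_np).to(
            device=grad_output.device, dtype=grad_output.dtype
        )


x = torch.linspace(0.0, 1.0, 5, dtype=torch.float64, requires_grad=True)
y = F.apply(x)
y.sum().backward()  # fills x.grad via the numpy backward above
```

Note that the function is called through F.apply(x), not by instantiating F. Wrapping the output in a fresh tensor outside of a Function, as in the snippet from the question, would cut the result off from the autograd graph.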

I would advise either requiring the caller to move the input to the CPU (and raising NotImplementedError if it is on the GPU; that would be my preferred method, given that moving to the CPU has a fairly heavy performance impact because it stops the asynchronous GPU processing PyTorch does), or moving the result back to the device the caller had the input on.
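The two options could look roughly like this. Both helper names are hypothetical and only illustrate the idea; fn stands for whatever CPU-only computation you wrap.

```python
import torch


def require_cpu(x):
    # Option 1: refuse GPU input and let the caller move it explicitly.
    if x.is_cuda:
        raise NotImplementedError("F is only implemented for CPU tensors")
    return x


def run_on_cpu_and_restore(fn, x):
    # Option 2: compute on the CPU, then move the result back to the
    # device the caller's tensor lives on.
    return fn(x.cpu()).to(x.device)


t = torch.zeros(3)
checked = require_cpu(t)
restored = run_on_cpu_and_restore(torch.sin, t)
```

With option 1 the cost of the device transfer stays visible at the call site, which is why I prefer it.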