Data parallel issue with class member tensors

I am trying to run a network across 4 GPUs using the DataParallel module. However, I keep getting:

RuntimeError: arguments are located on different GPUs at /some/location/in/torch/code.cu

The issue seems to arise from the fact that all inputs must be on the first GPU in the device list so that communication can be handled. However, my models have class variables (in my case, a meshgrid of a certain size and a few predefined filters) that are initialized on the GPU in the constructor. The reason is that I want to avoid the overhead of recreating these variables on every forward pass (both the memory allocation and the CPU-GPU transfer). These class variables are Tensors wrapped as Variables (so they can be used in convolution operations). This design appears to be the source of the problem: the variables are initialized on a GPU in my model class, but the model is wrapped in a DataParallel, so I don't know which GPU they end up on once the model is replicated.
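To make the setup concrete, here is a stripped-down sketch of roughly what I'm doing (class name, shapes, and the particular meshgrid/filter are placeholders, not my real code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable


class MyNet(nn.Module):
    def __init__(self, height, width):
        super(MyNet, self).__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)

        # Precompute a meshgrid and a fixed smoothing filter once, directly on
        # the (current) GPU, so they aren't rebuilt on every forward pass.
        grid_y = torch.arange(0, height).float().view(1, 1, height, 1).expand(1, 1, height, width)
        grid_x = torch.arange(0, width).float().view(1, 1, 1, width).expand(1, 1, height, width)
        self.grid = Variable((grid_x + grid_y).cuda())               # plain attribute, not a Parameter/buffer
        self.filt = Variable((torch.ones(1, 1, 3, 3) / 9.0).cuda())

    def forward(self, x):
        x = self.conv(x).mean(1, keepdim=True)
        # Under DataParallel, x may live on GPU 1/2/3 while self.filt and
        # self.grid are still on the GPU they were created on, which is where
        # the "arguments are located on different GPUs" error shows up.
        x = F.conv2d(x, self.filt, padding=1)
        return x + self.grid


model = nn.DataParallel(MyNet(64, 64), device_ids=[0, 1, 2, 3]).cuda()
out = model(Variable(torch.randn(8, 3, 64, 64).cuda()))   # raises the RuntimeError above
```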

The only solution I’ve come up with is to initialize everything on the CPU in the model constructor and add a check in the forward pass that moves them to the GPU on first use (e.g. if not my_var.is_cuda: my_var = my_var.cuda()). Is there a better (more elegant, less hacky) design for this pattern? I’m sure I’m not the only one who has run into this.
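Concretely, the workaround looks something like this (again a simplified sketch with placeholder names):

```python
class MyNetLazy(nn.Module):
    def __init__(self, height, width):
        super(MyNetLazy, self).__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)

        # Same members as before, but left on the CPU in the constructor.
        grid_y = torch.arange(0, height).float().view(1, 1, height, 1).expand(1, 1, height, width)
        grid_x = torch.arange(0, width).float().view(1, 1, 1, width).expand(1, 1, height, width)
        self.grid = Variable(grid_x + grid_y)
        self.filt = Variable(torch.ones(1, 1, 3, 3) / 9.0)

    def forward(self, x):
        # The hack: on the first pass through each replica, move the members
        # onto the same GPU as the incoming batch.
        if x.is_cuda and not self.grid.is_cuda:
            self.grid = self.grid.cuda(x.get_device())
            self.filt = self.filt.cuda(x.get_device())
        x = self.conv(x).mean(1, keepdim=True)
        return F.conv2d(x, self.filt, padding=1) + self.grid
```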