parallel_apply does not work with a dict input

parallel_apply seems to restrict its input to Variables or a (possibly nested) list of Variables. Allowing at least a dictionary of Variables as input would increase flexibility, and a dictionary input already works in the single-GPU case. I created an issue for it.
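To make the request concrete, here is a minimal sketch of how a scatter step could recurse into dictionaries as well as lists and tuples. It is plain Python and does not use PyTorch: `Batch`, `split_batch`, and `scatter_nested` are hypothetical names I made up for illustration, and `Batch` only stands in for a Variable. This is not how the library is implemented, just the behaviour I have in mind.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    """Hypothetical stand-in for a Variable: a batch of rows."""
    rows: list


def split_batch(batch, num_chunks):
    """Split a Batch into num_chunks contiguous pieces along the batch dimension."""
    size = -(-len(batch.rows) // num_chunks)  # ceiling division
    return [Batch(batch.rows[i * size:(i + 1) * size]) for i in range(num_chunks)]


def scatter_nested(obj, num_chunks):
    """Return num_chunks objects shaped like obj, each holding one slice of every Batch."""
    if isinstance(obj, Batch):
        return split_batch(obj, num_chunks)
    if isinstance(obj, (list, tuple)) and obj:
        # Scatter each element, then regroup so chunk i collects the i-th piece of each.
        parts = [scatter_nested(item, num_chunks) for item in obj]
        return [type(obj)(chunk) for chunk in zip(*parts)]
    if isinstance(obj, dict):
        # Same idea as for lists, but the keys are preserved in every chunk.
        parts = {key: scatter_nested(value, num_chunks) for key, value in obj.items()}
        return [{key: parts[key][i] for key in parts} for i in range(num_chunks)]
    # Anything else (flags, scalars, empty sequences) is replicated unchanged.
    return [obj for _ in range(num_chunks)]


# Example: a dict of two batches split across two (hypothetical) devices.
inputs = {"image": Batch([1, 2, 3, 4]), "mask": Batch(["a", "b", "c", "d"])}
per_device = scatter_nested(inputs, 2)
```

With this, each entry of `per_device` is a dict with the same keys as `inputs`, so a module whose forward takes a dictionary could be called on each chunk the same way it is called on a single GPU.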

If you think this bug should be resolved, I would be happy to send a pull request.