Allocate CUDA tensor in subprocess

@apaszke also, how is multi-GPU support being implemented? I’ve seen the Hogwild! example, but it’s CPU-only.

edit:

The child will be in an inconsistent state and it’s UB

What is UB? 🙂