How can I implement an environment run purely on GPU?

I was wondering how I can implement an environment that runs purely on the GPU. Say all the variables in the environment class are torch.Tensors; will they stay on the GPU during run-time?

Take the following environment for example:

class ENV_GPU(object):
    def __init__(self, a=3):
        self.num = torch.zeros((a,), dtype=torch.int8)
    def step(self, action):
        self.num[action] += 1

Calling .to('cuda') will move your ENV_GPU object to the GPU.

Thanks for your answer. However, since there is no method called "to" in ENV_GPU, an AttributeError is raised.

.to() is implemented in nn.Module.
If you derive your class from nn.Module and define step as forward it should work:

class ENV_GPU(nn.Module):
    def __init__(self, a=3):
        super(ENV_GPU, self).__init__()
        num = torch.zeros((a,), dtype=torch.int8)
        self.register_buffer('num', num)        
    def forward(self, action):
        self.num[action] += 1

model = ENV_GPU().to('cuda')
model(0)
print(model.num)
> tensor([1, 0, 0], device='cuda:0', dtype=torch.int8)

Thanks for your answer. May I ask what self.register_buffer('num', num) is for?

The attributes of a module will be moved to the device if they are registered as buffers (i.e. they don't need gradients) or as an nn.Parameter (i.e. they should be updated).
Since you defined num as torch.int8, an nn.Parameter won't work, as only floating point tensors can require gradients.
If you just assign num as self.num = torch.zeros((a,), dtype=torch.int8), it won't be moved to the device.
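To illustrate the difference, here is a minimal sketch (run on the CPU, but the same mechanism applies with .to('cuda')): only the registered buffer shows up in the module's buffers and state_dict, while a plain tensor attribute is invisible to the module and won't follow .to() calls:

```python
import torch
import torch.nn as nn

class ENV_GPU(nn.Module):
    def __init__(self, a=3):
        super(ENV_GPU, self).__init__()
        # Registered buffer: moved by .to()/.cuda() and saved in the state_dict.
        self.register_buffer('num', torch.zeros((a,), dtype=torch.int8))
        # Plain attribute: NOT tracked by the module, stays where it was created.
        self.plain = torch.zeros((a,), dtype=torch.int8)

env = ENV_GPU()
print([name for name, _ in env.named_buffers()])  # ['num']
print('num' in env.state_dict())                  # True
print('plain' in env.state_dict())                # False
```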

OK, I see. Thanks very much for your help.

By the way, what if I need to calculate some intermediate results during every step? How can I make sure that no data is moved between the CPU and GPU? Take the following for example:

    def forward(self, action):
        self.knapsack_num += self.action2shift[action]
        self.knapsack_num = torch.clamp(self.knapsack_num, max=self.knapsack_max)
        reward = torch.zeros((1,), dtype=torch.float)
        if action == self.num_food_types:
            if torch.equal(self.expected_num, self.knapsack_num + self.warehouse_num):
                reward = 100 * torch.ones((1,), dtype=torch.float)
            else:
                reward = -100 * torch.ones((1,), dtype=torch.float)
            return self.knapsack_num, reward, True
        return self.knapsack_num, reward, False

If you would like to create new tensors inside forward, you should pass the device using the device of an already registered tensor:

reward = torch.zeros((1,), dtype=torch.float, device=self.num.device)
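Putting it together, here is a runnable sketch of such an environment. The attribute names follow the snippet above, but the concrete values (shift vectors, expected counts) are assumptions for illustration; every stateful tensor is registered as a buffer, and new tensors are created on the buffers' device so nothing crosses between the CPU and GPU:

```python
import torch
import torch.nn as nn

class KnapsackEnv(nn.Module):
    """Hypothetical sketch; buffer names follow the snippet above."""
    def __init__(self, num_food_types=3, knapsack_max=5):
        super(KnapsackEnv, self).__init__()
        self.num_food_types = num_food_types
        self.knapsack_max = knapsack_max
        # Registered as buffers, so .to('cuda') moves them with the module.
        self.register_buffer('knapsack_num', torch.zeros(num_food_types, dtype=torch.long))
        self.register_buffer('warehouse_num', torch.zeros(num_food_types, dtype=torch.long))
        self.register_buffer('expected_num', torch.ones(num_food_types, dtype=torch.long))
        # One shift vector per action; the "finish" action shifts nothing.
        self.register_buffer('action2shift',
                             torch.cat([torch.eye(num_food_types, dtype=torch.long),
                                        torch.zeros(1, num_food_types, dtype=torch.long)]))

    def forward(self, action):
        self.knapsack_num += self.action2shift[action]
        self.knapsack_num = torch.clamp(self.knapsack_num, max=self.knapsack_max)
        # Create new tensors on the buffers' device, avoiding CPU<->GPU transfers.
        device = self.knapsack_num.device
        reward = torch.zeros((1,), dtype=torch.float, device=device)
        if action == self.num_food_types:
            if torch.equal(self.expected_num, self.knapsack_num + self.warehouse_num):
                reward = 100 * torch.ones((1,), dtype=torch.float, device=device)
            else:
                reward = -100 * torch.ones((1,), dtype=torch.float, device=device)
            return self.knapsack_num, reward, True
        return self.knapsack_num, reward, False

env = KnapsackEnv()  # add .to('cuda') to run everything on the GPU
state, reward, done = env(0)
```

After taking actions 0, 1 and 2 once each, the "finish" action (action 3 here) matches expected_num and yields a reward of 100.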

Wow, I see, this is really convenient. Are "buffers" like "no update required" parameters in an nn.Module?

Yeah, basically they are just tensors registered with the module, so that they will be moved to the host or device and saved in the state_dict in case you would like to serialize your model.
The running estimates of nn.BatchNorm layers are a good example. While they don't need gradients to be updated, they should still be moved with the layer and saved to disk.
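You can see this directly: the BatchNorm running statistics show up as buffers, while the affine scale and shift are parameters:

```python
import torch.nn as nn

bn = nn.BatchNorm1d(4)
print([name for name, _ in bn.named_buffers()])
# ['running_mean', 'running_var', 'num_batches_tracked']
print([name for name, _ in bn.named_parameters()])
# ['weight', 'bias']
```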


Cool! Thanks very much for your help.