Several questions about creating a complex custom layer

I have read http://pytorch.org/docs/master/notes/extending.html many times, but I still have some questions about building custom layers.
I have two questions. The snippet below may not be entirely logical, but it explains what I mean.

import torch
import torch.nn as nn

class CustomLayer(nn.Module):
    def __init__(self, input_features, output_features):
        super(CustomLayer, self).__init__()
        self.input_features = input_features
        # I don't want it in .parameters(); it is just an intermediate variable
        self.input_tensor = torch.randn(input_features, input_features)
        self.weight = nn.Parameter(torch.Tensor(output_features, input_features))
        # Below, some computation of self.input_tensor with self.weight
        self.inter_val = torch.mm(self.input_tensor, self.weight.t())
        ...
  1. Since self.input_tensor is not a Module parameter, it won't be converted when .cuda() is called. Will
    self.input_tensor then simply stay on the CPU? If so, does self.inter_val = torch.mm(self.input_tensor, self.weight.t())
    still work? And does it still work when the model runs on multiple GPUs?
    Note: I don't want self.input_tensor registered as a parameter, because it is quite large (not just input_features^2) and it would consume a lot of space if saved to disk.
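For concreteness, this is the kind of check I mean (a tiny sketch; the sizes are made up):

layer = CustomLayer(4, 2)
layer.cuda()

print(layer.weight.is_cuda)        # the nn.Parameter: presumably True after .cuda()?
print(layer.input_tensor.is_cuda)  # the plain tensor attribute: is this still False?
# and would torch.mm(layer.input_tensor, layer.weight.t()) then fail
# because the two operands live on different devices?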

Since the layer is quite complex, I move the computation into a custom autograd Function.

import torch
import torch.nn as nn
from torch.autograd import Function, Variable

class CustomFunction(Function):
    @staticmethod
    def forward(ctx, input_features, output_features):
        # stash the non-tensor arguments on ctx for backward
        ctx.input_features, ctx.output_features = input_features, output_features
        self.input_tensor = torch.randn(input_features, input_features)
        self.weight = nn.Parameter(torch.Tensor(output_features, input_features))
        self.inter_val = torch.mm(self.input_tensor, self.weight.t())

        # I even need to use some conv computation.
        # For example:
        self.conv = nn.Conv2d(1, 3, 3, 3)
        # I have to wrap the tensor into a Variable, as Conv2d expects a Variable input
        self.inter_conv = Variable(torch.Tensor(1, 1, input_features, input_features))
        self.inter_conv_val = self.conv(self.inter_conv)

        # Then I do something with self.inter_val and self.inter_conv_val
  2. I need to move self.conv to the GPU, because self.inter_conv_val = self.conv(self.inter_conv) is quite time-consuming in my code. Since self.conv is an intermediate variable, it will not be transferred to the GPU when .cuda() is called, according to the doc? Or do I need to move it myself here? That doesn't seem to work either: if I call self.conv.cuda(), it then requires self.inter_conv to be a CUDA tensor too! At the same time, I hope my code can utilize multiple GPUs via nn.parallel.data_parallel; how should I deal with that? I'd appreciate it if someone could give me some help.

For question 1, I think register_buffer(name, tensor) does exactly what you want (see https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/module.py).
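For example, a minimal sketch of what that could look like with the snippet above (the forward body is my guess, not taken from your code):

import torch
import torch.nn as nn
from torch.autograd import Variable

class CustomLayer(nn.Module):
    def __init__(self, input_features, output_features):
        super(CustomLayer, self).__init__()
        # A buffer is not a Parameter: it gets no gradient and does not show up
        # in .parameters(), but it is moved along with the module by .cuda()/.cpu().
        self.register_buffer('input_tensor', torch.randn(input_features, input_features))
        self.weight = nn.Parameter(torch.Tensor(output_features, input_features))

    def forward(self, input):
        # wrap the buffer so it can take part in the Variable computation;
        # both operands are on the same device once .cuda() has been called
        inter_val = torch.mm(Variable(self.input_tensor), self.weight.t())
        return torch.mm(input, inter_val)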

For 2, I think you should build self.conv and self.inter_conv as parameters (or buffers) in the constructor (__init__) and then use them in forward. That way you make sure everything ends up allocated on the right device.
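As a rough sketch of that restructuring (the forward body is guessed from the snippets above, so the shapes and the Conv2d arguments are assumptions):

import torch
import torch.nn as nn
from torch.autograd import Variable

class CustomLayer(nn.Module):
    def __init__(self, input_features, output_features):
        super(CustomLayer, self).__init__()
        self.register_buffer('input_tensor', torch.randn(input_features, input_features))
        self.weight = nn.Parameter(torch.Tensor(output_features, input_features))
        # declared as a submodule, so model.cuda() and data_parallel will move/replicate it
        self.conv = nn.Conv2d(1, 3, 3, 3)

    def forward(self, input):
        inter_val = torch.mm(Variable(self.input_tensor), self.weight.t())
        # reuse the buffer as a 1 x 1 x H x W image for the conv (illustrative only)
        inter_conv = Variable(self.input_tensor).unsqueeze(0).unsqueeze(0)
        inter_conv_val = self.conv(inter_conv)
        # ... combine inter_val, inter_conv_val and input as needed
        return inter_val

Since the conv is a submodule and the helper tensor is a buffer, a single model.cuda() call (or wrapping the model with nn.parallel.data_parallel) takes care of moving/replicating them.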


register_buffer will cause the tensor to be included in the state_dict and therefore saved to disk, which he says he doesn't want. I don't think there is a way to have a tensor or Variable automatically moved to the GPU but not saved to disk.
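One manual workaround (just a sketch, not a built-in mechanism) is to keep the tensor as a plain attribute, so it stays out of the state_dict, and move it lazily inside forward based on where the input lives:

import torch
import torch.nn as nn
from torch.autograd import Variable

class CustomLayer(nn.Module):
    def __init__(self, input_features, output_features):
        super(CustomLayer, self).__init__()
        # plain attribute: excluded from .parameters() and from the state_dict
        self.input_tensor = torch.randn(input_features, input_features)
        self.weight = nn.Parameter(torch.Tensor(output_features, input_features))

    def forward(self, input):
        # move the helper tensor to the input's GPU on first use
        if input.is_cuda and not self.input_tensor.is_cuda:
            self.input_tensor = self.input_tensor.cuda(input.get_device())
        inter_val = torch.mm(Variable(self.input_tensor), self.weight.t())
        return torch.mm(input, inter_val)

This only covers the single-GPU case cleanly; with data_parallel across several devices the replicas would need their own per-device copies.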

I’ll try, thank you for your kind reply.
