I have read http://pytorch.org/docs/master/notes/extending.html many times. However, I still have some questions about building custom layers.
I have two questions. The snippet below may not be quite logical, but it shows what I am trying to express.
import torch
import torch.nn as nn

class CustomLayer(nn.Module):
    def __init__(self, input_features, output_features):
        super(CustomLayer, self).__init__()
        self.input_features = input_features
        # I don't want this in .parameters(); it is just an intermediate variable
        self.input_tensor = torch.randn(input_features, input_features)
        self.weight = nn.Parameter(torch.Tensor(output_features, input_features))
        # Below, some computation of self.input_tensor with self.weight
        self.inter_val = torch.mm(self.input_tensor, self.weight.t())
        ...
- As self.input_tensor is not a parameter of the Module, it won't be converted when .cuda() is called. So self.input_tensor will just stay on the CPU? If so, does self.inter_val = torch.mm(self.input_tensor, self.weight.t()) still work? And does it still work when the model is on multiple GPUs?
Note: I don't want self.input_tensor registered as a parameter, because it is quite large (not just input_features^2) and it would consume a lot of disk space if I saved it.
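One workaround I am considering is register_buffer, which as I understand it makes a tensor move with .cuda() (and get scattered by data_parallel) without turning it into a parameter. Here is a minimal sketch of what I mean; BufferedLayer is just a placeholder name and the forward body is not my real computation:

import torch
import torch.nn as nn

class BufferedLayer(nn.Module):
    def __init__(self, input_features, output_features):
        super(BufferedLayer, self).__init__()
        # A buffer is not returned by .parameters(), but model.cuda()
        # moves it to the GPU together with the parameters.
        self.register_buffer('input_tensor',
                             torch.randn(input_features, input_features))
        self.weight = nn.Parameter(torch.Tensor(output_features, input_features))

    def forward(self, input):
        # Both operands live on the same device after model.cuda()
        inter_val = torch.mm(self.input_tensor, self.weight.t())
        ...

The catch is that buffers are included in state_dict() by default, so this alone does not solve my disk-space concern; I believe newer versions let you pass persistent=False to register_buffer to keep a buffer out of the saved state, but I am not sure.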
Now consider the situation where the layer is quite complex, so I move the computation into a custom autograd Function.
from torch.autograd import Function, Variable

class CustomFunction(Function):
    @staticmethod
    def forward(ctx, input_features, output_features):
        # Non-tensor arguments can be stashed on ctx directly
        ctx.input_features = input_features
        ctx.output_features = output_features
        input_tensor = torch.randn(input_features, input_features)
        weight = nn.Parameter(torch.Tensor(output_features, input_features))
        inter_val = torch.mm(input_tensor, weight.t())
        # I even need to use some conv computation.
        # For example:
        conv = nn.Conv2d(3, 3, 3, 3)
        # I have to wrap the tensor in a Variable, as Conv2d expects a Variable input
        inter_conv = Variable(torch.Tensor(1, 3, input_features, input_features))
        inter_conv_val = conv(inter_conv)
        # Then I do something with inter_val and inter_conv_val
- I need to move conv to the GPU, as inter_conv_val = conv(inter_conv) is quite time-consuming in my code. Since conv is an intermediate variable, it will not be transferred to the GPU when .cuda() is called, according to the doc? Or do I need to convert it myself here? Simply calling conv.cuda() does not seem to work, because it then requires inter_conv to be a CUDA tensor too! At the same time, I hope my code can utilize multiple GPUs via nn.parallel.data_parallel; how should I deal with that? I'd appreciate it if someone could give me some help.
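For what it's worth, the only workaround I can think of for this second question is to create everything inside forward on the same device as an incoming tensor, so that each replica made by nn.parallel.data_parallel builds its intermediates on the GPU where its slice of the input lives. A rough sketch of what I mean (the input argument and all shapes are placeholders, and constructing the Conv2d on every call is surely wasteful), though I am not sure this is the right approach:

import torch
import torch.nn as nn
from torch.autograd import Function, Variable

class CustomFunction(Function):
    @staticmethod
    def forward(ctx, input, output_features):
        input_features = input.size(1)
        # Build intermediates with the same type/device as the input,
        # so each data_parallel replica stays on its own GPU.
        input_tensor = torch.randn(input_features, input_features).type_as(input)
        conv = nn.Conv2d(3, 3, 3, 3)
        if input.is_cuda:
            conv = conv.cuda(input.get_device())
        inter_conv = Variable(input.new(1, 3, input_features, input_features))
        inter_conv_val = conv(inter_conv)
        # ... do something with inter_conv_val ...
        return input

Does that sound right, or is there a cleaner way?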