Hi everyone,
In my case I need to apply many tensors (all of the same size) to many Linear modules, one tensor per module.
These computations are independent of each other and their order doesn't matter.
To use the GPU as efficiently as possible, I wanted to perform all of them with as few separate GPU calls as possible.
So I designed a channel-wise Linear module based on PyTorch's Linear module:
import math
import torch
import torch.nn as nn

# Applies an independent Linear transformation to each channel:
# input (batch, channels, in_features) -> output (batch, channels, out_features)
class multiChannelsLinear(nn.Module):
    __constants__ = ['bias']

    def __init__(self, channels, in_features, out_features, bias=True):
        super(multiChannelsLinear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.channels = channels
        # one (out_features, in_features) weight matrix per channel
        self.weight = nn.Parameter(torch.Tensor(channels, out_features, in_features))
        if bias:
            self.bias = nn.Parameter(torch.Tensor(channels, out_features))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self):
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = nn.init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            nn.init.uniform_(self.bias, -bound, bound)

    def forward(self, input):
        # (batch, channels, in) -> (channels, in, batch)
        input = input.transpose(0, 2).transpose(0, 1)
        # batched matmul: (channels, out, in) @ (channels, in, batch) -> (channels, out, batch)
        output = self.weight.matmul(input)
        # back to (batch, channels, out)
        output = output.transpose(0, 1).transpose(0, 2)
        if self.bias is not None:
            output += self.bias
        return output

    def extra_repr(self):
        return 'channels={}, in_features={}, out_features={}, bias={}'.format(
            self.channels, self.in_features, self.out_features, self.bias is not None
        )
However, this code doesn't seem to produce the same result as using many separate Linear modules, and I can't find where I'm wrong.
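Here is the kind of sanity check I have in mind (a minimal sketch: the sizes and names are only for this example, and it assumes an input of shape (batch, channels, in_features)). It copies the parameters of a list of ordinary Linear modules into the channel-wise module and compares the outputs:

import torch
import torch.nn as nn

batch, channels, in_features, out_features = 4, 3, 5, 7

# one ordinary Linear per channel as the reference
linears = nn.ModuleList([nn.Linear(in_features, out_features) for _ in range(channels)])
mcl = multiChannelsLinear(channels, in_features, out_features)

# copy each Linear's parameters into the corresponding channel
with torch.no_grad():
    for c, lin in enumerate(linears):
        mcl.weight[c].copy_(lin.weight)
        mcl.bias[c].copy_(lin.bias)

x = torch.randn(batch, channels, in_features)
ref = torch.stack([lin(x[:, c]) for c, lin in enumerate(linears)], dim=1)
out = mcl(x)
print(torch.allclose(ref, out, atol=1e-6))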
By the way, why doesn't this kind of module exist in PyTorch?
I think it would make it easier to get good performance when we want to apply the same kind of transformation to each channel independently.
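For what it's worth, as far as I know the same per-channel operation can also be expressed with torch.einsum instead of a custom matmul/transpose dance (a sketch under the same shape assumptions as above; channelwise_linear is just a name I made up for the example):

import torch

def channelwise_linear(x, weight, bias=None):
    # x: (batch, channels, in_features)
    # weight: (channels, out_features, in_features), bias: (channels, out_features)
    out = torch.einsum('bci,coi->bco', x, weight)
    if bias is not None:
        out = out + bias
    return out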