I want to implement a model that has a very large
Linear layer, because the number of features is huge, say
2^32; the actual input will be a sparse tensor.
Normally, when the number of features is small, I can create an ordinary
Linear layer, for example:
self.lin = Linear(in_features, out_features)
But with in_features=2^32, the above
Linear layer won’t work properly, since its dense weight matrix alone is far too large to allocate.
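To make the problem concrete, here is a back-of-envelope calculation (out_features=128 is just a hypothetical value, not my real model's):

```python
# Back-of-envelope: size of the dense weight of Linear(2**32, out_features)
in_features = 2 ** 32
out_features = 128            # hypothetical output width
bytes_per_param = 4           # float32
weight_bytes = in_features * out_features * bytes_per_param
print(weight_bytes / 2 ** 40, "TiB")   # 2.0 TiB for the weight alone
```

So even before gradients and optimizer state, the weight is ~2 TiB.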
So I’m thinking about ideas like:
- Split the huge Linear into multiple small ones, e.g. each with
2^20 features. I looked at
torch.distributed.rpc, but it doesn’t seem to be able to do this.
- Or use a parameter server, but I have no idea how to turn the Linear layer into one.
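For the first idea, here is a toy single-machine sketch of what I mean by splitting (the class name and toy sizes are my own; the real shards would be 2^20 wide, and no RPC yet):

```python
import torch
import torch.nn as nn

class ShardedLinear(nn.Module):
    """Split one huge Linear over the feature dimension into small shards.
    Each nonzero feature of a sparse COO input is routed to its shard and
    the partial outputs are summed, so the full 2**32-wide weight never has
    to exist as one tensor."""

    def __init__(self, in_features, out_features, shard_size):
        super().__init__()
        assert in_features % shard_size == 0
        self.shard_size = shard_size
        self.shards = nn.ModuleList(
            nn.Linear(shard_size, out_features, bias=False)
            for _ in range(in_features // shard_size)
        )
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # x: sparse COO tensor of shape (batch, in_features)
        x = x.coalesce()
        batch_idx, feat_idx = x.indices()     # coordinates of the nonzeros
        vals = x.values()
        out = self.bias.repeat(x.shape[0], 1)
        shard_ids = feat_idx // self.shard_size   # which shard owns each feature
        local_idx = feat_idx % self.shard_size    # column inside that shard
        for s, lin in enumerate(self.shards):
            mask = shard_ids == s
            if not mask.any():
                continue  # no active features fall in this shard
            # pick the weight columns of the active features, scale by values
            cols = lin.weight[:, local_idx[mask]] * vals[mask]  # (out, nnz_s)
            out.index_add_(0, batch_idx[mask], cols.t())
        return out
```

The idea would then be to put each shard on a different worker behind torch.distributed.rpc and only send the routed indices/values over the wire, but I don’t know whether rpc supports this pattern.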
I couldn’t find how to do either of these two ideas, so any advice would be appreciated.