Custom convolution layer for a large number of high-dimensional matrices


I have about 500,000 matrices, each of size 36×500. I wrote my own convolution layer and use it in a two-layer PyTorch network (a convolution layer followed by a linear layer). I expect the run time to be far longer than I can afford. Since I am not allowed to use PyTorch's built-in convolution, or even a library routine such as SciPy's convolve2d (scipy.signal.convolve2d), what would you suggest to make it fast enough?
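One common way to speed up a hand-written convolution without calling nn.Conv2d or F.conv2d is to remove the Python loops over output positions. You build a view of all sliding windows with Tensor.unfold and contract it with the filters in a single torch.einsum call. The sketch below assumes that Tensor.unfold and einsum are allowed under your constraint, that each sample has a single input channel, and that stride is 1 with no padding, with batches shaped (N, 36, 500). The class name SlidingWindowConv2d and its parameters (out_channels, kh, kw) are hypothetical, not taken from the original post. Like PyTorch's own Conv2d, it computes cross-correlation (the kernel is not flipped).

```python
import torch
import torch.nn as nn


class SlidingWindowConv2d(nn.Module):
    """Hypothetical custom 2-D convolution layer (cross-correlation, stride 1,
    no padding, one input channel) built from Tensor.unfold and torch.einsum
    instead of nn.Conv2d / F.conv2d."""

    def __init__(self, out_channels: int, kh: int, kw: int):
        super().__init__()
        self.kh, self.kw = kh, kw
        # One kh x kw filter per output channel (single input channel assumed).
        self.weight = nn.Parameter(torch.randn(out_channels, kh, kw) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, H, W), e.g. a mini-batch of 36 x 500 matrices.
        # unfold(dim, size, step) returns a strided view of sliding windows:
        # (N, H, W) -> (N, H-kh+1, W, kh) -> (N, H-kh+1, W-kw+1, kh, kw)
        patches = x.unfold(1, self.kh, 1).unfold(2, self.kw, 1)
        # out[n, c, i, j] = sum_{k, l} x[n, i+k, j+l] * weight[c, k, l],
        # computed for every window and every filter in one vectorized call.
        out = torch.einsum("nijkl,ckl->ncij", patches, self.weight)
        # Add one bias per output channel (broadcast over batch and positions).
        return out + self.bias.view(1, -1, 1, 1)
```

For example, SlidingWindowConv2d(out_channels=8, kh=3, kw=3) maps a batch of shape (64, 36, 500) to (64, 8, 34, 498), which can then be flattened and passed to nn.Linear. Gradients flow through unfold and einsum automatically, so no custom backward is needed. The trade-off is memory: einsum may materialize the window tensor, which is kh*kw times larger than the input. Feeding the 500,000 matrices in mini-batches (for instance with a DataLoader), ideally on a GPU, keeps peak memory bounded.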

Many thanks,
Best Regards