How to accelerate many matrix multiplications

I have to carry out a lot of matrix multiplications at the same time. This is part of my own neural network.
In MATLAB, I can do things like this:

There I can just use the keyword parfor.
However, in PyTorch things are different.
I implemented the same computation like this:

    def forward(self, x):
        """
        :param x: train or test data with shape [N, D_in, D_in, num, frame]
        """
        # Left-multiply every D_in x D_in slice of x by the matching weight slice self.w[:, :, z]
        for i in range(x.shape[0]):              # samples
            for j in range(x.shape[4]):          # frames
                for z in range(x.shape[3]):      # num
                    x[i, :, :, z, j] = self.w[:, :, z].mm(x[i, :, :, z, j])
        return x
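
To make the problem easier to reproduce, here is a minimal self-contained sketch of the same loop with a simple timer. The tensor sizes, the standalone function name `loop_forward`, and the `clone()` call are assumptions added only for illustration; in my module the weights live in `self.w` with shape [D_in, D_in, num] and `x` is overwritten in place.

```python
import time
import torch

def loop_forward(w, x):
    # w: [D_in, D_in, num], x: [N, D_in, D_in, num, frame]
    # clone() is an addition for this sketch so the caller's tensor is kept intact;
    # the original forward() writes into x directly.
    x = x.clone()
    for i in range(x.shape[0]):              # samples
        for j in range(x.shape[4]):          # frames
            for z in range(x.shape[3]):      # num
                x[i, :, :, z, j] = w[:, :, z].mm(x[i, :, :, z, j])
    return x

torch.manual_seed(0)
N, D_in, num, frame = 4, 8, 5, 6             # assumed sizes, not my real ones
w = torch.randn(D_in, D_in, num)
x = torch.randn(N, D_in, D_in, num, frame)

start = time.perf_counter()
out = loop_forward(w, x)
elapsed = time.perf_counter() - start
print(tuple(out.shape), f"{elapsed:.4f} s for {N * num * frame} separate mm calls")
```

The number of `mm` calls grows as N * num * frame, and each one is a small D_in x D_in product launched from Python.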

It is very slow!
I cannot come up with any idea to accelerate this process.
Could you give me a hand? I would appreciate it!