Make operations in for-loop parallel

Hi! I have recently been working on a module like this: it has 8 different operations, such as conv3x3, conv5x5, skip_connection, etc., and I know that all of the operations produce outputs of the same shape.

During forward, I want to compute the result of each operation and combine them into the final result. As far as I can tell, the for-loop over the operations is the bottleneck of the program. Is there any way to make the operations in the for-loop run in parallel?

A simplified example of my module is below.

import torch.nn as nn

class Module(nn.Module):
    def __init__(self):
        super(Module, self).__init__()
        self.ops = nn.ModuleList()
        for i in range(8):
            # append an operation to self.ops, such as conv3x3
            self.ops.append(...)
    
    def forward(self, x):
        tmp = list()
        for i in range(8):
            tmp.append(self.ops[i](x))
        # func can be any post-processing function; its details are not important
        result = func(tmp)
        return result
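
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same structure. The class name ExampleModule, the specific operations (3x3 conv, 5x5 conv, and nn.Identity standing in for the skip connection), the channel count, the input size, and the use of an element-wise sum in place of func are all assumptions for illustration, not my actual module.

```python
import torch
import torch.nn as nn


class ExampleModule(nn.Module):
    """Hypothetical stand-in for the module above, with 8 same-shape ops."""

    def __init__(self, channels=16):
        super().__init__()
        self.ops = nn.ModuleList()
        for i in range(8):
            # Assumed mix of ops; padding keeps the spatial size unchanged,
            # so every op returns a tensor with the same shape as its input.
            if i % 3 == 0:
                self.ops.append(nn.Conv2d(channels, channels, 3, padding=1))
            elif i % 3 == 1:
                self.ops.append(nn.Conv2d(channels, channels, 5, padding=2))
            else:
                self.ops.append(nn.Identity())  # skip connection

    def forward(self, x):
        tmp = []
        # This is the sequential loop in question: each op sees the same
        # input x, and no op depends on the output of another.
        for op in self.ops:
            tmp.append(op(x))
        # Assumed post-processing: stack the outputs and sum them.
        return torch.stack(tmp, dim=0).sum(dim=0)


x = torch.randn(2, 16, 32, 32)  # assumed input: batch 2, 16 channels, 32x32
module = ExampleModule(channels=16)
out = module(x)
print(out.shape)
```

Since every operation reads the same x and the outputs are only combined at the end, the 8 calls are independent of each other, which is why I am hoping they can be run concurrently.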