How to balance memory and speed

Hi, I’m trying to compute the result of the following function of two tensors in every forward pass. However, the methods I’ve tried are either too slow or produce a tensor too large to fit in memory. Can someone help me?

Details:
Tensor1: Size([15, 24, 4, 120])
Tensor2: Size([5608, 4, 120])
I need result = calculate_average_similarity_score(tensor1, tensor2, sim_dim=-1, avg_dim=-2). The result is of Size([15, 24, 5608]).

Method1:
I unsqueeze and expand both tensor1 and tensor2 so that they both have Size([15, 24, 5608, 4, 120]) as inputs to the function. However, this creates a giant tensor of more than 3.8 GB.
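Concretely, Method 1 looks roughly like this (a sketch using the sizes above; `calculate_average_similarity_score` is the function given further down, so the actual call is commented out — note that unsqueeze/expand only create views, and it is the elementwise ops inside the cosine similarity that materialize the full broadcasted tensor):

```python
import torch

tensor1 = torch.randn(15, 24, 4, 120)
tensor2 = torch.randn(5608, 4, 120)

# Views only, no copy yet:
t1 = tensor1.unsqueeze(2).expand(15, 24, 5608, 4, 120)
t2 = tensor2.unsqueeze(0).unsqueeze(0).expand(15, 24, 5608, 4, 120)

# The cosine similarity then materializes a [15, 24, 5608, 4, 120]
# float32 intermediate (~3.8 GB), which is where memory blows up:
# res = calculate_average_similarity_score(t1, t2, sim_dim=-1, avg_dim=-2)
# res.shape == torch.Size([15, 24, 5608])
```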

Method2:
I tried looping through the two tensors like this, but found it to be super slow:

res = []
for tensor1_dim0 in tensor1:               # [24, 4, 120]
    tensor1_res = []
    for tensor1_dim1 in tensor1_dim0:      # [4, 120]
        tensor2_res = []
        for tensor2_dim0 in tensor2:       # [4, 120]
            sc = calculate_average_similarity_score(tensor1_dim1, tensor2_dim0,
                                                    sim_dim=-1, avg_dim=-2)
            tensor2_res.append(sc)
        tensor1_res.append(torch.stack(tensor2_res))
    res.append(torch.stack(tensor1_res))
output = torch.stack(res)

Method3:
I tried the map method described in https://discuss.pytorch.org/t/giant-tensor-consumes-gpu-memory/142691
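For reference, the idea from that thread, as I understand it, is to process tensor2 in chunks so the broadcasted intermediate stays bounded. A sketch of that (the function name and the chunk_size value are mine, chosen for illustration; it computes the same thing as broadcasting everything at once):

```python
import torch
import torch.nn.functional as F

def chunked_avg_similarity(tensor1, tensor2, chunk_size=256):
    """tensor1: [A, B, C, D], tensor2: [N, C, D] -> result [A, B, N].

    Only an [A, B, chunk_size, C, D]-shaped intermediate is live at a time,
    instead of the full [A, B, N, C, D] tensor.
    """
    t1 = tensor1.unsqueeze(2)                         # [A, B, 1, C, D]
    out = []
    for chunk in tensor2.split(chunk_size):           # [c, C, D]
        t2 = chunk.unsqueeze(0).unsqueeze(0)          # [1, 1, c, C, D]
        sim = F.cosine_similarity(t1, t2, dim=-1)     # [A, B, c, C]
        out.append(sim.mean(dim=-1))                  # average over original dim -2
    return torch.cat(out, dim=-1)                     # [A, B, N]
```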

Here is the function applied to both tensors:

import torch
from torch.nn import CosineSimilarity


def calculate_average_similarity_score(tensor1, tensor2, sim_dim=None, avg_dim=None):
    """
    Calculate the similarity between two tensors.

    The similarity is a CosineSimilarity followed by an arithmetic average.
    E.g.
    t1 = torch.tensor([[[1, 2, 3], [3, 2, 1]], [[1, 2, 3], [3, 2, 1]]],
                      dtype=torch.double)  # 2*2*3 tensor
    t2 = torch.tensor([[[1, 2, 3], [3, 2, 1]], [[1, 2, 1], [1, 2, 1]]],
                      dtype=torch.double)  # 2*2*3 tensor
    With sim_dim=-1 and avg_dim=-2, this first calculates cosine
    similarity along dim -1 and then averages over dim -2 (the original
    dim -2, not the dim after the cosine similarity).
    The result is tensor([1.0000, 0.8729]) because the averages of the
    two pairs of similarity scores are 1.0000 and 0.8729 respectively.

    :param tensor1: input1
    :param tensor2: input2
    :param sim_dim: the dimension along which similarity is calculated.
        This dimension becomes 1 after the calculation. sim_dim has to be
        expressed as a negative integer (for ease of implementation).
    :param avg_dim: the dimension along which an arithmetic average is
        calculated. avg_dim also has to be expressed as a negative integer
        (for ease of implementation).
    :return: a tensor of average scores
    """
    if sim_dim >= 0 or (avg_dim is not None and avg_dim >= 0):
        raise NotImplementedError(
            "calculate_average_similarity_score() is currently implemented "
            "assuming sim_dim and avg_dim are both negative. Change the "
            "implementation if using positive dimensions.")

    cos = CosineSimilarity(dim=sim_dim)
    sc = cos(tensor1, tensor2)
    if avg_dim is not None:
        if sim_dim > avg_dim:  # The sim_dim disappears after the cosine,
            # so avg_dim shifts as well
            avg_dim = avg_dim - sim_dim
        sc = torch.mean(sc, dim=avg_dim)
    return sc
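For what it’s worth, since cosine similarity is just a dot product of L2-normalized vectors, the whole computation can also be written without the giant [15, 24, 5608, 4, 120] intermediate. A sketch (the function name and the fixed 4-dim einsum signature are mine; the largest materialized tensor is the [15, 24, 5608, 4] similarity tensor, roughly 32 MB in float32):

```python
import torch
import torch.nn.functional as F

def avg_similarity_einsum(tensor1, tensor2):
    """tensor1: [A, B, C, D], tensor2: [N, C, D] -> result [A, B, N]."""
    n1 = F.normalize(tensor1, dim=-1)   # unit vectors along the D-dim
    n2 = F.normalize(tensor2, dim=-1)
    # Contract the D-dim (d); keep the C-dim (c) so it can be averaged.
    sim = torch.einsum('abcd,ecd->abec', n1, n2)   # [A, B, N, C]
    return sim.mean(dim=-1)                        # [A, B, N]
```

For exactly-zero vectors this differs slightly from CosineSimilarity (F.normalize and CosineSimilarity guard against division by zero with different eps handling), but for typical nonzero inputs the results agree.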