Pickling PyTorch tensors is much slower than NumPy arrays

I tested pickle on a list of PyTorch tensors and found it was 20-30x slower than on an equivalent list of NumPy arrays. What could be the reason, and how can I make pickling faster for tensors?

The benchmark code is as follows:

import pickle
import time

import numpy as np
import torch as th

# create same size data (2d list, inner list is a list of tensors or arrays)
numpy_array = [[np.random.rand(20) for _ in range(100)] for _ in range(100)]
torch_array = [[th.rand(20) for _ in range(100)] for _ in range(100)]

# measure pickle time cost
def time_pickle(array):
    t = time.time()
    with open("array.pkl", "wb") as fp:
        pickle.dump(array, fp)
    print(f"dump time cost is {time.time() - t}")

    t = time.time()
    with open("array.pkl", "rb") as fp:
        array = pickle.load(fp)
    print(f"load time cost is {time.time() - t}")

>>> time_pickle(torch_array)
dump time cost is 0.35193419456481934
load time cost is 0.3934769630432129

>>> time_pickle(numpy_array)
dump time cost is 0.039617061614990234
load time cost is 0.017045259475708008
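One workaround I am considering (assuming the tensors live on CPU and I can convert back on load) is to pickle NumPy views of the tensors instead of the tensors themselves, since `t.numpy()` shares memory with the tensor and makes no copy. A minimal sketch:

```python
import pickle

import numpy as np
import torch as th

torch_array = [[th.rand(20) for _ in range(100)] for _ in range(100)]

# Convert each CPU tensor to a NumPy view (no data copy), pickle those,
# then rebuild tensors with th.from_numpy on load (also no data copy).
as_numpy = [[t.numpy() for t in row] for row in torch_array]
payload = pickle.dumps(as_numpy)
restored = [[th.from_numpy(a) for a in row] for row in pickle.loads(payload)]

# Round trip preserves the values exactly.
assert th.equal(restored[0][0], torch_array[0][0])
```

Is this conversion the right way to avoid the overhead, or is there something built in (e.g. `torch.save`) that pickles tensors efficiently?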