I tested `pickle` on a list of PyTorch tensors and found it was 20~30x slower than with NumPy arrays. What could be the reason, and how can I make pickling faster for tensors?

The benchmark code is as follows:

```
import numpy as np
import torch as th
import pickle
import time
# create same-sized data: a 2D list whose inner lists hold tensors or arrays
numpy_array = [[np.random.rand(20) for _ in range(100)] for _ in range(100)]
torch_array = [[th.rand(20) for _ in range(100)] for _ in range(100)]
# measure pickle time cost
def time_pickle(array):
    t = time.time()
    with open("array.pkl", "wb") as fp:
        pickle.dump(array, fp)
    print(f"dump time cost is {time.time() - t}")
    t = time.time()
    with open("array.pkl", "rb") as fp:
        array = pickle.load(fp)
    print(f"load time cost is {time.time() - t}")
>>> time_pickle(torch_array)
dump time cost is 0.35193419456481934
load time cost is 0.3934769630432129
>>> time_pickle(numpy_array)
dump time cost is 0.039617061614990234
load time cost is 0.017045259475708008
```
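One workaround I considered (a sketch, not verified against torch internals) is converting each tensor to a NumPy array with `.numpy()` before pickling, then rebuilding tensors with `torch.from_numpy` after loading. The assumption is that the per-tensor overhead lies in torch's pickling machinery, so serializing plain ndarrays should be closer to the NumPy timings:

```python
import pickle
import time

import numpy as np
import torch as th

torch_array = [[th.rand(20) for _ in range(100)] for _ in range(100)]

# Convert each tensor to a NumPy array (shares memory for CPU tensors),
# so pickle serializes plain ndarrays instead of torch tensors.
as_numpy = [[t.numpy() for t in row] for row in torch_array]

t = time.time()
payload = pickle.dumps(as_numpy)
print(f"dump after .numpy() conversion: {time.time() - t:.4f}s")

t = time.time()
# Rebuild tensors from the loaded arrays after unpickling.
restored = [[th.from_numpy(a) for a in row] for row in pickle.loads(payload)]
print(f"load + from_numpy conversion: {time.time() - t:.4f}s")
```

Note that `.numpy()` only works on CPU tensors with no gradient tracking; GPU tensors would need a `.cpu()` call first, which adds its own copy cost.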