So I was comparing the performance of the torch.tensor constructor to the np.array constructor when both are given a nested Python list.

For PyTorch:

```
import time
import numpy as np
import torch

total_time = 0.0
iterations = 10000
# inference_mode only takes effect as a context manager (or decorator)
with torch.inference_mode():
    for _ in range(iterations):
        # Use numpy for RNG, but convert back to a Python list
        # since we're benchmarking the constructor
        data = np.random.normal(0, 1, (1000, 10)).tolist()
        t1 = time.perf_counter()
        thing = torch.tensor(data, dtype=torch.float64)
        total_time += time.perf_counter() - t1
print(total_time / iterations)
```

I get 0.0003917569875717163 s per iteration.

For NumPy:

```
import time
import numpy as np

total_time = 0.0
iterations = 10000
for _ in range(iterations):
    # Same input as before: a nested Python list of floats
    data = np.random.normal(0, 1, (1000, 10)).tolist()
    t1 = time.perf_counter()
    thing = np.array(data, dtype=np.float64)
    total_time += time.perf_counter() - t1
print(total_time / iterations)
```

I get 0.0002772393465042114 s per iteration. A small difference (PyTorch is about 1.4x slower), not too concerning; maybe PyTorch tensors just carry a little bit more overhead.
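One way to double-check a gap like this is to time both constructors on the same input with `timeit` and take the best of several repeats, which is less sensitive to noise than summing `time.time()` deltas. This is only a sketch: the helper name `best_time` and the `number`/`repeat` values are my own choices for illustration, not part of the measurements above.

```python
import timeit

import numpy as np
import torch

# One fixed nested Python list, reused for both constructors
data = np.random.normal(0, 1, (1000, 10)).tolist()


def best_time(fn, number=20, repeat=5):
    """Best per-call time in seconds over `repeat` runs of `number` calls (hypothetical helper)."""
    return min(timeit.repeat(fn, number=number, repeat=repeat)) / number


torch_time = best_time(lambda: torch.tensor(data, dtype=torch.float64))
numpy_time = best_time(lambda: np.array(data, dtype=np.float64))

print(f"torch.tensor: {torch_time:.6f} s per call")
print(f"np.array:     {numpy_time:.6f} s per call")
```

The absolute numbers will depend on the machine and library versions, so only the ratio between the two is worth comparing.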

I remembered that when a PyTorch tensor is constructed from a NumPy array with torch.from_numpy, the tensor shares the array's memory instead of allocating a new block and copying into it. Out of curiosity, I benchmarked constructing a NumPy array from the list and then constructing a tensor out of that.

```
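import time
import numpy as np
import torch

total_time = 0.0
iterations = 10000
# inference_mode only takes effect as a context manager (or decorator)
with torch.inference_mode():
    for _ in range(iterations):
        # Use numpy for RNG, but convert back to a Python list
        # since we're benchmarking the constructor
        data = np.random.normal(0, 1, (1000, 10)).tolist()
        t1 = time.perf_counter()
        # np.array does the list conversion; from_numpy wraps the result without copying
        thing = torch.from_numpy(np.array(data, dtype=np.float64))
        total_time += time.perf_counter() - t1
print(total_time / iterations)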
```

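I get 0.00027820470333099365 s per iteration, essentially the same as plain NumPy. That seems really weird: it suggests the performance difference isn't in the overhead of the Python tensor object, but somewhere in the memory allocation/access layer, i.e. in how torch.tensor turns the nested list into a block of memory. In any case, if torch.tensor construction from a Python list is that much slower, shouldn't torch just call NumPy to create the array in memory first by default?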
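For reference, the memory-sharing behavior that the last benchmark relies on can be checked directly. This is a minimal sketch with variable names of my own choosing; it only shows that `torch.from_numpy` reuses the NumPy buffer while `torch.tensor` makes a copy.

```python
import numpy as np
import torch

arr = np.zeros((2, 3), dtype=np.float64)

shared = torch.from_numpy(arr)  # wraps the existing NumPy buffer, no copy
copied = torch.tensor(arr)      # allocates new memory and copies the data

arr[0, 0] = 42.0                # mutate the NumPy array in place

# The shared tensor sees the change; the copied tensor does not
shares_memory = shared[0, 0].item() == 42.0
is_copy = copied[0, 0].item() == 0.0

# Both objects point at the same underlying address
same_pointer = shared.data_ptr() == arr.ctypes.data

print(shares_memory, is_copy, same_pointer)
```

Note that in the benchmark above, `np.array(data, dtype=np.float64)` still has to allocate and fill a new array from the list; `torch.from_numpy` only avoids a second copy on top of that.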