I have my data (very large `np.array`s) saved on disk as batches with `np.memmap`. I am reading these batches as follows:

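```
import numpy as np

# mode='c' opens each file copy-on-write: in-place changes affect the data
# in memory but are never written back to disk.
x1 = np.memmap('path_to_file1', mode='c')
x2 = np.memmap('path_to_file2', mode='c')
...
```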
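As a small check of what `mode='c'` does, the sketch below writes a tiny file, opens it copy-on-write, modifies it in place, and re-reads the file from disk. The `float32` dtype, the shape and the temporary path are illustrative assumptions; the calls above rely on `np.memmap`'s default `uint8` dtype and a shape inferred from the file size. The in-place operator keeps operating on the same `np.memmap` object, but every page it writes to is copied into private process memory instead of being written back, so transforming a whole batch this way ends up holding an in-RAM copy of that batch.

```python
import os
import tempfile

import numpy as np

# Create a small float32 file to stand in for one batch (illustrative data).
path = os.path.join(tempfile.mkdtemp(), "batch.dat")
original = np.memmap(path, dtype=np.float32, mode="w+", shape=(4,))
original[:] = [1.0, 2.0, 3.0, 4.0]
original.flush()
del original  # close the writable map

# Open copy-on-write and modify in place: only private in-memory pages change.
x = np.memmap(path, dtype=np.float32, mode="c", shape=(4,))
x -= 1.0
in_memory = np.array(x)  # plain ndarray copy of the modified values

# Re-open read-only: the file on disk still holds the original values.
on_disk = np.array(np.memmap(path, dtype=np.float32, mode="r", shape=(4,)))
```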

and combine them with `ConcatDataset`. I would like to apply some **preprocessing to this combined dataset**, but I don’t know how to proceed.
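For context on the combining step: `torch.utils.data.ConcatDataset` does not concatenate the arrays. It keeps a list of running lengths and, on each lookup, uses `bisect` to find which member holds the requested global index. The snippet below is a simplified re-implementation of that mapping for illustration, not the library source; `concat_getitem` is a hypothetical name and the small arrays stand in for the memmaps.

```python
import bisect
from itertools import accumulate

import numpy as np

# Small in-memory arrays standing in for the memmapped batches x1, x2.
x1 = np.arange(3, dtype=np.float32)        # values 0, 1, 2
x2 = np.arange(10, 15, dtype=np.float32)   # values 10 .. 14
batches = [x1, x2]

# Running totals of batch lengths, here [3, 8].
cumulative_sizes = list(accumulate(len(b) for b in batches))


def concat_getitem(idx):
    """Map a global index to (batch, local index) and read that one item."""
    if idx < 0:
        idx += cumulative_sizes[-1]
    batch_idx = bisect.bisect_right(cumulative_sizes, idx)
    local_idx = idx if batch_idx == 0 else idx - cumulative_sizes[batch_idx - 1]
    return batches[batch_idx][local_idx]


print(concat_getitem(2))  # last item of x1
print(concat_getitem(3))  # first item of x2
```

Because each item is fetched from the underlying array only when indexed, any preprocessing applied at that point touches one item at a time rather than the whole batch.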

This troubles me because I don’t want to create copies of the arrays. For example, let’s say that I want to apply standardization. Before converting to a `ConcatDataset`, I calculate the mean (weighted mean) and std (weighted std), then transform the `x`’s, and finally convert them to a `ConcatDataset`.

That is, I am doing the following:

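```
import numpy as np
from torch.utils.data import ConcatDataset

x1 = np.memmap('path_to_file1', mode='c')
x2 = np.memmap('path_to_file2', mode='c')
...
# get_weighted_mean_and_std is a user-defined helper (not shown)
mean, std = get_weighted_mean_and_std([x1, x2, ...])
for x in [x1, x2, ...]:
    x -= mean      # in-place on the memmap
    x *= 1 / std
dataset = ConcatDataset([x1, x2, ...])
```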
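`get_weighted_mean_and_std` is not shown in the post. A hypothetical version, assuming a single scalar mean and a population std (`ddof=0`) pooled over every element of every batch, could read each batch in chunks and merge per-chunk statistics with the pairwise variance-combination formula of Chan et al. That way each batch is weighted by its size, and neither a concatenated array nor a full `float64` copy of any batch is built. The `chunk_size` parameter is an illustrative assumption; per-feature statistics would need a reduction over axis 0 instead.

```python
import numpy as np


def get_weighted_mean_and_std(arrays, chunk_size=1_000_000):
    """Hypothetical helper: pooled mean and population std over all arrays."""
    count, mean, m2 = 0, 0.0, 0.0  # running size, mean, sum of squared deviations
    for x in arrays:
        flat = x.reshape(-1)  # a view for C-contiguous arrays such as memmaps
        for start in range(0, flat.size, chunk_size):
            # Only this chunk is converted to float64.
            chunk = np.asarray(flat[start:start + chunk_size], dtype=np.float64)
            n_b = chunk.size
            mean_b = chunk.mean()
            m2_b = np.square(chunk - mean_b).sum()
            # Merge chunk statistics into the running totals (Chan et al.).
            delta = mean_b - mean
            total = count + n_b
            mean += delta * n_b / total
            m2 += m2_b + delta ** 2 * count * n_b / total
            count = total
    return float(mean), float(np.sqrt(m2 / count))


# Usage with small in-memory arrays of different sizes.
a = np.arange(10, dtype=np.float32)
b = np.linspace(-5.0, 5.0, 7, dtype=np.float32)
mean, std = get_weighted_mean_and_std([a, b], chunk_size=4)
```

The result should match `np.concatenate([a, b]).mean()` and `.std()` up to floating-point error, which is the sense in which each batch is "weighted" by its length.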

The problem is that the in-place assignments `-=` and `*=` appear to return copies instead of views.
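One way to avoid modifying the memmaps at all is to standardize lazily, one item at a time, when the dataset is indexed. The sketch below assumes scalar `mean`/`std` and uses a hypothetical wrapper class `StandardizedArray`; it is an illustration, not an established PyTorch API. `ConcatDataset` only calls `len()` and indexing on its members, so a duck-typed class like this should be accepted, although subclassing `torch.utils.data.Dataset` is the more conventional choice.

```python
import numpy as np


class StandardizedArray:
    """Hypothetical wrapper that standardizes items lazily on access.

    The wrapped array (e.g. a np.memmap) is only read, never modified, so no
    full-size copy of it is created.
    """

    def __init__(self, data, mean, std):
        self.data = data
        self.mean = mean
        self.std = std

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Reads just this item and returns a new, item-sized float32 array.
        item = np.asarray(self.data[idx], dtype=np.float32)
        return (item - self.mean) / self.std


# Small in-memory stand-in for one memmapped batch (3 samples, 2 features).
x = np.arange(6, dtype=np.float32).reshape(3, 2)
ds = StandardizedArray(x, mean=2.5, std=2.0)
sample = ds[1]  # (x[1] - 2.5) / 2.0; x itself is left unchanged

# With PyTorch, the wrappers would then be concatenated, e.g.:
# dataset = ConcatDataset([StandardizedArray(x1, mean, std), ...])
```

With this approach the files could be opened with `mode='r'` instead of `mode='c'`; any accidental in-place write would then raise an error rather than silently copying pages into memory.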