I created a slightly smaller subset of my data, designed so that the positive and negative classes have the same number of samples.

With this subset I now get negative values in my mean tensors (the std values are all close to 1):

```
train mean and std: tensor([0.0050, 0.0225, 0.0250]) tensor([0.9833, 0.9977, 0.9932])
val mean and std: tensor([-0.0584, -0.0225, -0.0385]) tensor([1.0157, 1.0408, 1.0215])
test mean and std: tensor([-0.1491, -0.0664, -0.0715]) tensor([1.0436, 1.0221, 1.0180])
```

Is any part of the code below wrong?

```
import torch


# Get the per-channel mean and std of the train, test and val sets for the data transform
def get_mean_std(loader):
    # VAR[X] = E[X**2] - E[X]**2
    channels_sum, channels_squared_sum, num_batches = 0, 0, 0
    for data, _ in loader:
        # data has shape (batch, channels, height, width); reduce over all dims except channels
        channels_sum += torch.mean(data, dim=[0, 2, 3])
        channels_squared_sum += torch.mean(data**2, dim=[0, 2, 3])
        num_batches += 1
    mean = channels_sum / num_batches
    std = (channels_squared_sum / num_batches - mean**2) ** 0.5
    return mean, std


train_mean, train_std = get_mean_std(dataloaders_dict['train'])
print(train_mean, train_std)
test_mean, test_std = get_mean_std(dataloaders_dict['test'])
print(test_mean, test_std)
val_mean, val_std = get_mean_std(dataloaders_dict['val'])
print(val_mean, val_std)
```

Here are the values I had for the larger version of my dataset:

```
from torchvision import transforms

# input_size is defined elsewhere in the script
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(input_size),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        #transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        transforms.Normalize([0.7031, 0.5487, 0.6750], [0.2115, 0.2581, 0.1952])
    ]),
    'val': transforms.Compose([
        transforms.Resize(input_size),
        transforms.CenterCrop(input_size),
        transforms.ToTensor(),
        #transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        transforms.Normalize([0.7016, 0.5549, 0.6784], [0.2099, 0.2583, 0.1998])
    ]),
    'test': transforms.Compose([
        transforms.Resize(input_size),
        transforms.CenterCrop(input_size),
        transforms.ToTensor(),
        #transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        transforms.Normalize([0.7048, 0.5509, 0.6763], [0.2111, 0.2576, 0.1979])
    ])
}
```
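
One possible reading of the new numbers, stated as an assumption rather than a diagnosis: if the loaders passed to `get_mean_std` already apply a `Normalize` step like the ones in `data_transforms`, the function measures data that has already been normalized. In that case means near 0 (slightly negative or positive) and std values near 1 would be expected. The library-free sketch below uses made-up single-channel values (`pixels` is hypothetical, not taken from my dataset) to illustrate that arithmetic:

```python
import random
import statistics

random.seed(0)

# Hypothetical stand-in for one colour channel after ToTensor(): values in [0, 1].
pixels = [random.betavariate(5, 2) for _ in range(10_000)]

raw_mean = statistics.fmean(pixels)
raw_std = statistics.pstdev(pixels)

# Same per-channel arithmetic that transforms.Normalize applies: (x - mean) / std.
normalized = [(x - raw_mean) / raw_std for x in pixels]

# Statistics measured on the already-normalized values.
norm_mean = statistics.fmean(normalized)
norm_std = statistics.pstdev(normalized)

# A smaller subset of the normalized values, standing in for a val or test split.
subset = random.sample(normalized, 500)
subset_mean = statistics.fmean(subset)
subset_std = statistics.pstdev(subset)

print(f"raw:        mean={raw_mean:.4f} std={raw_std:.4f}")
print(f"normalized: mean={norm_mean:.4f} std={norm_std:.4f}")
print(f"subset:     mean={subset_mean:.4f} std={subset_std:.4f}")
```

If this assumption holds, a simple check would be to compute the statistics with a loader whose transform stops at `ToTensor()`, so the values stay in the range [0, 1].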

I also understand that I should use only the train mean and std for val and test as well, rather than calculating them separately for those sets, since that would leak information into val and test, based on what @ptrblck mentioned in a previous post. But I don't understand how that leak happens. Is there any scientific paper that discusses this phenomenon?
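
For reference, this is how I understand the recommended workflow, as a minimal library-free sketch: the statistics are estimated once from the training split and then reused for every split. The helper names `fit_stats` and `normalize` and the sample values are hypothetical, chosen only for illustration. The last lines show the contrast case, where normalizing val with its own statistics makes the preprocessing depend on the held-out data:

```python
import statistics


def fit_stats(values):
    """Return (mean, std) estimated from the given values."""
    return statistics.fmean(values), statistics.pstdev(values)


def normalize(values, mean, std):
    """Apply (x - mean) / std, the same arithmetic as transforms.Normalize."""
    return [(x - mean) / std for x in values]


# Hypothetical single-channel pixel values for each split.
train = [0.62, 0.70, 0.75, 0.68, 0.81, 0.66]
val = [0.40, 0.45, 0.52, 0.48]

# Statistics are estimated once, from the training split only.
train_mean, train_std = fit_stats(train)

# The same training statistics are applied to every split.
train_norm = normalize(train, train_mean, train_std)
val_norm = normalize(val, train_mean, train_std)

# For contrast: normalizing val with its own statistics.
val_self_norm = normalize(val, *fit_stats(val))

print(statistics.fmean(val_norm))       # below zero: these val values are lower than train
print(statistics.fmean(val_self_norm))  # approximately zero: the difference is erased
```

In this toy case the val values are systematically lower than the train values. With the train statistics that difference stays visible to the model, as it would for unseen data at inference time. With val's own statistics it disappears, which is one way the evaluation data can end up influencing the pipeline.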