Neural network cannot learn this type of data, why?

My network is unable to learn from this type of data:

input_data = [x for x in range(1, 4000)]
label_data = [d ** 3 for d in input_data]  # using power

But the same network learns this data quickly:

input_data = [x for x in range(1, 4000)]
label_data = [d * 3 for d in input_data]  # using simple multiplication

My code is:

import torch
import torch.nn as nn
import torch.nn.functional as F

import matplotlib.pyplot as plt

# network
class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.fc1 = nn.Linear(in_features=1, out_features=50, bias=True)
        self.fc2 = nn.Linear(in_features=50, out_features=1, bias=True)
    
    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
        
net = Network()

class MyData(torch.utils.data.Dataset):
    def __init__(self, data_input, data_output):
        self.data_input = data_input
        self.data_output = data_output
        
    def __len__(self):
        return len(self.data_input)
    
    def __getitem__(self, idx):
        return {
            'x':torch.tensor([self.data_input[idx]], dtype=torch.float32), 
            'y':torch.tensor([self.data_output[idx]], dtype= torch.float32)
        }

optimizer = torch.optim.Adam(net.parameters(), lr=0.01)

input_data = [x for x in range(1, 4000)]
# label_data = [d * 3 for d in input_data]  # simple multiplication
label_data = [d ** 3 for d in input_data]  # using power

batch_size = 100
data = torch.utils.data.DataLoader(dataset=MyData(input_data, label_data), shuffle=True, batch_size=batch_size)
criterion = nn.MSELoss()
x_data, y_data = list(),list()

for e in range(600):
    total_loss = 0
    for i, value in enumerate(data):   
        prediction = net(value['x'])
        loss = criterion(prediction, value['y'])
        total_loss += loss.item() / batch_size  # note: MSELoss already averages over the batch
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
    y_data.append(total_loss / (i + 1))  # parentheses needed: total_loss/i+1 computes (total_loss/i) + 1
    x_data.append(e)
        

net(torch.tensor([2], dtype=torch.float32)).item()

plt.figure(figsize=(16,9))
plt.xlabel('Epoch -->', size=20)
plt.ylabel('Loss -->', size=20)
plt.xticks(size=20)
plt.yticks(size=20)
plt.title('Loss graph')

plt.plot(x_data, y_data, color='green', linewidth=2)
plt.show()

print(total_loss / (i + 1))  # print the last epoch's average loss

The range of the target values is completely different (a max of 63952011999 vs. 11997), which would explain the failure to learn.
Note that the parameters of a neural network are usually initialized with “small” values drawn from a scaled normal or uniform distribution.
Mapping an input value of e.g. 3999 to 63952011999 would thus require large weight values, which might take a while to reach with classical SGD.
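
For illustration, a minimal sketch inspecting PyTorch’s default (Kaiming-uniform) initialization of a layer like fc2 above:

import torch.nn as nn

# With in_features=50 the default init draws weights roughly from
# [-1/sqrt(50), 1/sqrt(50)] ≈ [-0.14, 0.14] -- nowhere near the
# magnitudes needed to map 3999 to 63952011999.
layer = nn.Linear(in_features=50, out_features=1)
print(layer.weight.abs().max())  # ~0.14 at most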


Thanks, I understand it now: mapping e.g. 3999 to 63952011999 would require large weight values, but neural network weights are initialized with small values. Could I convert the large target values to a smaller range (e.g. between 0 and 1) to solve it? Will that work?
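
Something like this is what I have in mind (just a sketch of the min-max scaling):

import torch

target = torch.tensor([d ** 3 for d in range(1, 4000)], dtype=torch.float32)

# Scale the targets into [0, 1] and keep the stats so predictions
# can be mapped back to the original range later.
t_min, t_max = target.min(), target.max()
target_scaled = (target - t_min) / (t_max - t_min)
# pred_original = pred_scaled * (t_max - t_min) + t_min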

I don’t think so, because you have to consider that floating point numbers are quite limited in what they can express, e.g. in the case of a simple sigmoid over large values. If you only train on small values, it may generalize to large values, but I think this is highly unlikely.


Yes, this could work and is sometimes used if the target doesn’t have “optimal” values.
You could normalize the input (which is usually done anyway) and also normalize the target to train the model.
After the training you could unnormalize the predictions using the target stats and calculate the “real loss” based on this new prediction tensor.

Here is a code snippet using your initial dataset, which can successfully learn the data using this approach:

import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt

class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.fc1 = nn.Linear(in_features=1, out_features=50, bias=True)
        self.fc2 = nn.Linear(in_features=50, out_features=1, bias=True)
    
    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
        
model = Network()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

input = torch.tensor([x for x in range(1, 4000)]).float()
target = input ** 3  # same values as [d**3 ...], computed on the tensor directly
input = input.unsqueeze(1)
target = target.unsqueeze(1)

# normalize input
input_mean = torch.mean(input)
input_std = torch.std(input)
input = input - input_mean
input = input / input_std

# normalize target
target_mean = torch.mean(target)
target_std = torch.std(target)
target_norm = target - target_mean
target_norm = target_norm / target_std

for epoch in range(1000):
    optimizer.zero_grad()
    out = model(input)
    loss = criterion(out, target_norm)
    loss.backward()
    optimizer.step()
    print('Epoch {}, loss {}, max. pred val {}'.format(
        epoch, loss.item(), out.max()))
        
with torch.no_grad():
    pred = model(input)
    # unnormalize prediction
    pred = pred * target_std
    pred = pred + target_mean

plt.figure(figsize=(16,9))
plt.xlabel('Sample -->', size=20)
plt.ylabel('Value -->', size=20)
plt.xticks(size=20)
plt.yticks(size=20)
plt.title('Predictions vs. targets')
plt.plot(pred.numpy(), label='prediction')
plt.plot(target.numpy(), label='target')
plt.legend()
plt.show()

If you comment out the normalization, you will see that the loss is huge (~1e20) and thus the gradients will also be very large. While this might be alright at the beginning of training, I think you would have to play around with a learning rate scheduler in order to fit the original dataset properly, so I would recommend taking a look at the normalization approach instead.
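
For reference, attaching a scheduler could look like this (a sketch using torch.optim.lr_scheduler.StepLR; the step size and decay factor are placeholders you would have to tune):

import torch
import torch.nn as nn

model = nn.Linear(1, 1)  # stand-in for the Network above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Decay the learning rate by 10x every 200 epochs (placeholder values).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.1)

for epoch in range(1000):
    # ... forward / backward / optimizer.step() as in the snippet above ...
    scheduler.step()  # advance the schedule once per epoch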


Should I do the normalization in this cell? I am not sure exactly where I should perform it. Also, are image and landmarks both inputs here, or are the landmarks the targets?

import time
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

# Network, train_loader, test_loader and print_overwrite are defined
# elsewhere in the notebook.
network = Network()
network.cuda()    

criterion = nn.MSELoss()
optimizer = optim.Adam(network.parameters(), lr=0.0001)

loss_min = np.inf
num_epochs = 10

start_time = time.time()
for epoch in range(1,num_epochs+1):
    
    loss_train = 0
    loss_test = 0
    running_loss = 0
    
    
    network.train()
    print('size of train loader is: ', len(train_loader))

    # iterate the loader directly; calling next(iter(train_loader)) every step
    # recreates the iterator and keeps fetching just one batch
    for step, batch in enumerate(train_loader, 1):
        
        images, landmarks = batch['image'], batch['landmarks']
        
        images = images.cuda()
        landmarks = landmarks.view(landmarks.size(0),-1).cuda() 
        
        predictions = network(images)
        
        # clear all the gradients before calculating them
        optimizer.zero_grad()
        
        # find the loss for the current step
        loss_train_step = criterion(predictions.float(), landmarks.float())
        
        print("type(loss_train_step) is: ", type(loss_train_step))
        
        print("loss_train_step.dtype is: ",loss_train_step.dtype)
                
        # calculate the gradients
        loss_train_step.backward()
        
        # update the parameters
        optimizer.step()
        
        loss_train += loss_train_step.item()
        running_loss = loss_train/step
        
        print_overwrite(step, len(train_loader), running_loss, 'train')
        
    network.eval() 
    with torch.no_grad():
        
        # iterate the test loader (the original drew batches from train_loader here by mistake)
        for step, batch in enumerate(test_loader, 1):
            
            images, landmarks = batch['image'], batch['landmarks']
            images = images.cuda()
            landmarks = landmarks.view(landmarks.size(0),-1).cuda()
        
            predictions = network(images)

            # find the loss for the current step
            loss_test_step = criterion(predictions, landmarks)

            loss_test += loss_test_step.item()
            running_loss = loss_test/step

            print_overwrite(step, len(test_loader), running_loss, 'Testing')
    
    loss_train /= len(train_loader)
    loss_test /= len(test_loader)
    
    print('\n--------------------------------------------------')
    print('Epoch: {}  Train Loss: {:.4f}  Test Loss: {:.4f}'.format(epoch, loss_train, loss_test))
    print('--------------------------------------------------')
    
    if loss_test < loss_min:
        loss_min = loss_test
        torch.save(network.state_dict(), '../moth_landmarks.pth') 
        print("\nMinimum Test Loss of {:.4f} at epoch {}/{}".format(loss_min, epoch, num_epochs))
        print('Model Saved\n')
     
print('Training Complete')
print("Total Elapsed Time : {} s".format(time.time()-start_time))

I think landmarks are the targets, but I might be wrong. Any light in this direction is really appreciated.
Also, doesn’t PyTorch have an automatic method for normalizing and whitening the data?

landmarks in your code should be the targets.
You could use torchvision.transforms.Normalize to whiten the data or just apply it manually as seen in my code snippet.
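
For example (a sketch; Normalize expects a (C, H, W) tensor, and the stats here are the common ImageNet values as placeholders):

import torch
import torchvision.transforms as transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
img = torch.rand(3, 224, 224)  # dummy image tensor in [0, 1]
img_norm = normalize(img)      # per-channel z-score: (x - mean) / std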


Somewhere in the notebook I have this:

data_transform = transforms.Compose([
        transforms.RandomSizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
    ])

So I am not sure what to set the mean and std to for either of these:


        batch = next(iter(train_loader))
        images, landmarks = batch['image'], batch['landmarks']
        
        images = images.cuda()
    
        landmarks = landmarks.view(landmarks.size(0),-1).cuda() 
        
        images = torchvision.transforms.Normalize(images, mean=[0.485, 0.456, 0.406], 
                                                          std=[0.229, 0.224, 0.225])
        landmarks = torchvision.transforms.Normalize(landmarks)

Also, should I normalize images and landmarks after doing a .cuda on them?

Additionally, how?

This is the error I get:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-88ae116843f0> in <module>
     33 
     34         images = torchvision.transforms.Normalize(images, mean=[0.485, 0.456, 0.406], 
---> 35                                                           std=[0.229, 0.224, 0.225])
     36         landmarks = torchvision.transforms.Normalize(landmarks)
     37 

TypeError: __init__() got multiple values for argument 'mean'

Depending on the shape of landmarks, Normalize might not work.
In your code you are normalizing image tensors with 3 channels (thus the stats contain 3 values), which I don’t think matches your use case.
Calculate the mean and std (or min and max) from all training targets and normalize the landmarks tensors with them manually.
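
For example (a rough sketch; train_dataset stands in for your landmark dataset and each sample is assumed to be a dict holding a 'landmarks' tensor):

import torch

# Stack all training landmarks into one tensor and compute the stats once.
all_landmarks = torch.stack([train_dataset[i]['landmarks'].float()
                             for i in range(len(train_dataset))])
lm_mean = all_landmarks.mean()
lm_std = all_landmarks.std()

# Manual normalization of one sample, and the inverse for later:
landmarks_norm = (train_dataset[0]['landmarks'] - lm_mean) / lm_std
landmarks_back = landmarks_norm * lm_std + lm_mean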


So I am using this code block for transformed_dataset:

transformed_dataset = MothLandmarksDataset(csv_file='moth_gt.csv',
                                           root_dir='.',
                                           transform=transforms.Compose([
                                               Rescale(256),
                                               RandomCrop(224),
                                               ToTensor()
                                           ]))

for i in range(len(transformed_dataset)):
    sample = transformed_dataset[i]

    print(i, sample['image'].size(), sample['landmarks'].size())

    if i == 3:
        break

Does PyTorch have a utility function for data normalization? Is there a simple tutorial you could link me to so I could learn? I am not sure how to normalize my data using transforms.

It depends on what and how you would like to normalize the data.

E.g. image tensors can be normalized using transforms.Normalize, which will calculate their z-score using the passed mean and std. This would change the values of the image tensor, such that they have a zero mean and unit variance (if the right stats are passed).

However, the landmarks presumably represent pixel positions inside the image, if I’m not mistaken.
That is, how would you like to normalize them?
You could divide the values by their max, such that all landmark values are in the range [0, 1], but this of course means that you have to “denormalize” them if you want to plot them onto the image or calculate the “real” loss.
Nevertheless, normalizing the target (and denormalizing afterwards) might help as shown in the linked thread.
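
For example, a minimal sketch of the max-based scaling (the landmark values here are dummy placeholders):

import torch

landmarks = torch.tensor([[12.0, 40.0], [100.0, 180.0]])  # dummy (K, 2) pixel coordinates
lm_max = landmarks.max()  # or the known image size

landmarks_norm = landmarks / lm_max          # values now in [0, 1]
# ... train on landmarks_norm ...
landmarks_denorm = landmarks_norm * lm_max   # back to pixels for plotting or the "real" loss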


Should I normalize across all the examples in the training set or across all the examples in my entire dataset? That is, should I find the mean and std of the images over the training set only, or over all the images in the dataset? Is there an example that shows how to do this from scratch, rather than using an already-known mean and std? The transfer learning tutorial, for example, uses known values for the normalization of images:

transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

I also tried this and got an error:

transformed_dataset = MothLandmarksDataset(csv_file='moth_gt.csv',
                                           root_dir='.',
                                           transform=transforms.Compose([
                                               Rescale(256),
                                               RandomCrop(224),
                                               transforms.Normalize(mean = [ 0.485, 0.456, 0.406 ],
                                                        std = [ 0.229, 0.224, 0.225 ]),
                                               ToTensor()
                                           ]))

for i in range(len(transformed_dataset)):
    sample = transformed_dataset[i]

    print(i, sample['image'].size(), sample['landmarks'].size())

    if i == 3:
        break

The error is:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-81-eb8dc46e0284> in <module>
     10 
     11 for i in range(len(transformed_dataset)):
---> 12     sample = transformed_dataset[i]
     13 
     14     print(i, sample['image'].size(), sample['landmarks'].size())

<ipython-input-48-9d04158922fb> in __getitem__(self, idx)
     30 
     31         if self.transform:
---> 32             sample = self.transform(sample)
     33 
     34         return sample

~/anaconda3/lib/python3.7/site-packages/torchvision/transforms/transforms.py in __call__(self, img)
     59     def __call__(self, img):
     60         for t in self.transforms:
---> 61             img = t(img)
     62         return img
     63 

~/anaconda3/lib/python3.7/site-packages/torchvision/transforms/transforms.py in __call__(self, tensor)
    210             Tensor: Normalized Tensor image.
    211         """
--> 212         return F.normalize(tensor, self.mean, self.std, self.inplace)
    213 
    214     def __repr__(self):

~/anaconda3/lib/python3.7/site-packages/torchvision/transforms/functional.py in normalize(tensor, mean, std, inplace)
    278     """
    279     if not torch.is_tensor(tensor):
--> 280         raise TypeError('tensor should be a torch tensor. Got {}.'.format(type(tensor)))
    281 
    282     if tensor.ndimension() != 3:

TypeError: tensor should be a torch tensor. Got <class 'dict'>.

You should use the training dataset statistics only.
To get the stats for this dataset, you could load the complete dataset into a single tensor (if it fits into your memory) and calculate the stats, or iterate the dataset and calculate the mean and std from the running stats.
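
For example, computing running stats over a loader could look like this (a sketch; loader is assumed to yield (B, C, H, W) float image batches, so adapt the indexing if yours yields dicts):

import torch

channel_sum = torch.zeros(3)
channel_sq_sum = torch.zeros(3)
n_pixels = 0
for images in loader:  # images: (B, C, H, W)
    b, c, h, w = images.shape
    n_pixels += b * h * w
    channel_sum += images.sum(dim=[0, 2, 3])
    channel_sq_sum += (images ** 2).sum(dim=[0, 2, 3])

mean = channel_sum / n_pixels
std = (channel_sq_sum / n_pixels - mean ** 2).sqrt()  # Var = E[x^2] - E[x]^2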

The transform error is raised since a tensor is expected but you’ve passed a dict, so I guess the image tensor might be wrapped inside the dict and you would have to access it first, e.g. with a wrapper like the sketch below.
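
For example, a small wrapper transform could unpack the dict (a sketch matching your dataset’s sample structure; place it after ToTensor() in the Compose, since Normalize expects a tensor):

import torchvision.transforms as transforms

class NormalizeSample:
    """Apply transforms.Normalize to sample['image'] and pass everything else through."""
    def __init__(self, mean, std):
        self.norm = transforms.Normalize(mean=mean, std=std)

    def __call__(self, sample):
        sample['image'] = self.norm(sample['image'])
        return sample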


Hello,
Thanks for the useful information and suggestions.
I have defined a custom loss function that depends on the first and second derivatives of the output of the NN with respect to the input
(like the PDE in https://discuss.pytorch.org/t/first-and-second-derivates-of-the-output-with-respect-to-the-input-inside-a-loss-function/99757).

If I normalize the target and train the model, how can I unnormalize the predictions?
Because the loss function uses the first and second derivatives of the output of the NN w.r.t. the input, I think

pred = model(input) 
pred = pred * target_std 
pred = pred + target_mean

does not work correctly. Could you please advise me on this?
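
To illustrate my concern, here is a sketch (placeholder model and stats) comparing the derivative of the unnormalized output with target_std times the derivative of the normalized output:

import torch
import torch.nn as nn

# If pred = pred_norm * target_std + target_mean, the chain rule gives
# d(pred)/dx = target_std * d(pred_norm)/dx, and second derivatives
# scale by the same constant factor.
model = nn.Sequential(nn.Linear(1, 50), nn.Tanh(), nn.Linear(50, 1))
target_std, target_mean = 2.0, 5.0  # placeholder normalization stats

x = torch.randn(10, 1, requires_grad=True)
pred_norm = model(x)
pred = pred_norm * target_std + target_mean

g = torch.autograd.grad(pred.sum(), x, create_graph=True)[0]
g_norm = torch.autograd.grad(pred_norm.sum(), x, create_graph=True)[0]
print(torch.allclose(g, target_std * g_norm))  # True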