My network fails to learn the following data:
input_data = [x for x in range(1, 4000)]
label_data = [((d**3)) for d in input_data] # using power
However, the same network learns this data quickly:
input_data = [x for x in range(1, 4000)]
label_data = [((d*3)) for d in input_data] # using simple multiplication
My code is:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.data.dataloader as dataloader
import matplotlib.pyplot as plt
# network
class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.fc1 = nn.Linear(in_features=1, out_features=50, bias=True)
        self.fc2 = nn.Linear(in_features=50, out_features=1, bias=True)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

net = Network()

class MyData(torch.utils.data.Dataset):
    def __init__(self, data_input, data_output):
        self.data_input = data_input
        self.data_output = data_output

    def __len__(self):
        return len(self.data_input)

    def __getitem__(self, idx):
        return {
            'x': torch.tensor([self.data_input[idx]], dtype=torch.float32),
            'y': torch.tensor([self.data_output[idx]], dtype=torch.float32)
        }
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
input_data = [x for x in range(1, 4000)]
# label_data = [((d*3)) for d in input_data] # simple multiplication
label_data = [((d**3)) for d in input_data] # using power
batch_size = 100
data = torch.utils.data.DataLoader(dataset=MyData(input_data, label_data), shuffle=True, batch_size=batch_size)
criterion = nn.MSELoss()
x_data, y_data = list(),list()
for e in range(600):
    total_loss = 0
    for i, value in enumerate(data):
        prediction = net(value['x'])
        loss = criterion(prediction, value['y'])
        total_loss += loss.item() / batch_size
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    y_data.append(total_loss / (i + 1))  # average loss of this epoch
    x_data.append(e)
net(torch.tensor([2], dtype=torch.float32)).item()
plt.figure(figsize=(16,9))
plt.xlabel('Epoch -->', size=20)
plt.ylabel('Loss -->', size=20)
plt.xticks(size=20)
plt.yticks(size=20)
plt.title('Loss graph')
plt.plot(x_data, y_data, color='green', linewidth=2)
plt.show()
print(total_loss / (i + 1))  # print the last epoch's loss only
The range of the target values is completely different (max of 63952011999 vs. 11997), which would explain the failure in learning.
Note that the parameters of a neural network are usually initialized with “small” values using a scaled normal or uniform distribution.
Mapping an input value of e.g. 3999 to 63952011999 would thus require large weight values, which might take a while to reach with classical SGD.
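As a rough illustration of this scale mismatch (my own sketch, not from the original post), a freshly initialized copy of the network above predicts values that are many orders of magnitude smaller than the cubic target:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.fc1 = nn.Linear(in_features=1, out_features=50, bias=True)
        self.fc2 = nn.Linear(in_features=50, out_features=1, bias=True)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

net = Network()
x = torch.tensor([[3999.0]])
with torch.no_grad():
    # prediction of the untrained network vs. the cubic target for the same input
    print('initial prediction:', net(x).item())
    print('target:            ', 3999 ** 3)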
Thanks, I understand it now: e.g., mapping 3999 to 63952011999 would require large weight values, but neural network weights are initialized with small values. Could I convert the large target values to a smaller range (e.g., between 0 and 1) to solve this? Will it work then?
I don’t think so, because you have to consider that floating point numbers are quite limited in what they can express; think, for instance, of a simple sigmoid applied to large values. If you only train on small values, it may generalize to large values, but I think this is highly unlikely.
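As a small side note on the floating point limits mentioned above (my own illustration, not from the thread): the largest target, 3999**3 = 63952011999, is not exactly representable in float32, so even a perfect model would be off by the rounding error at that magnitude.
import torch

target64 = torch.tensor(3999.0 ** 3, dtype=torch.float64)
target32 = target64.to(torch.float32)
print('float64 value: ', target64.item())
print('float32 value: ', target32.item())
print('rounding error:', (target64 - target32.double()).item())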
Yes, this could work and is sometimes used if the target doesn’t have “optimal” values.
You could normalize the input (which is usually done anyway) and also normalize the target to train the model.
After the training you could unnormalize the predictions using the target stats and calculate the “real loss” based on this new prediction tensor.
Here is a code snippet using your initial dataset, which can successfully learn the data using this approach:
class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.fc1 = nn.Linear(in_features=1, out_features=50, bias=True)
        self.fc2 = nn.Linear(in_features=50, out_features=1, bias=True)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

model = Network()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

input = torch.tensor([x for x in range(1, 4000)]).float()
target = torch.tensor([((d**3)) for d in input]).float()
input = input.unsqueeze(1)
target = target.unsqueeze(1)

# normalize input
input_mean = torch.mean(input)
input_std = torch.std(input)
input = input - input_mean
input = input / input_std

# normalize target
target_mean = torch.mean(target)
target_std = torch.std(target)
target_norm = target - target_mean
target_norm = target_norm / target_std

for epoch in range(1000):
    optimizer.zero_grad()
    out = model(input)
    loss = criterion(out, target_norm)
    loss.backward()
    optimizer.step()

    print('Epoch {}, loss {}, max. pred val {}'.format(
        epoch, loss.item(), out.max()))
with torch.no_grad():
    pred = model(input)
    # unnormalize the prediction using the target stats
    pred = pred * target_std
    pred = pred + target_mean
plt.figure(figsize=(16,9))
plt.xlabel('Sample index -->', size=20)
plt.ylabel('Target value -->', size=20)
plt.xticks(size=20)
plt.yticks(size=20)
plt.title('Prediction vs. target')
plt.plot(pred.numpy())
plt.plot(target.numpy())
plt.show()
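After unnormalizing the predictions you could also compute the “real” loss in the original target scale, as mentioned above. A minimal continuation of the snippet:
with torch.no_grad():
    # pred was already mapped back to the original target scale above
    real_loss = criterion(pred, target)
print('real loss: {:.4e}'.format(real_loss.item()))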
If you comment the normalization out, you will see that the loss is huge (~1e20) and thus the gradients will also be very large. While this might be alright at the beginning of the training, I think you would have to play around with a learning rate scheduler in order to fit the original dataset properly, so I would recommend taking a look at the normalization approach instead.
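If you did want to experiment with the unnormalized targets, a learning rate scheduler could be attached to the training loop roughly like this (a sketch only, not from the original post; the step_size and gamma values are arbitrary and would need tuning):
# decay the learning rate by a factor of 0.5 every 100 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.5)

for epoch in range(1000):
    optimizer.zero_grad()
    out = model(input)
    loss = criterion(out, target)  # unnormalized target
    loss.backward()
    optimizer.step()
    scheduler.step()  # advance the schedule once per epoch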
Should I do the normalization in this cell? I am not sure exactly where I should perform it. Also, are both image and landmarks inputs here, or are the landmarks the targets?
network = Network()
network.cuda()
criterion = nn.MSELoss()
optimizer = optim.Adam(network.parameters(), lr=0.0001)
loss_min = np.inf
num_epochs = 10
start_time = time.time()
for epoch in range(1, num_epochs + 1):

    loss_train = 0
    loss_test = 0
    running_loss = 0

    network.train()
    print('size of train loader is: ', len(train_loader))

    for step in range(1, len(train_loader) + 1):

        batch = next(iter(train_loader))
        images, landmarks = batch['image'], batch['landmarks']
        images = images.cuda()
        landmarks = landmarks.view(landmarks.size(0), -1).cuda()

        predictions = network(images)

        # clear all the gradients before calculating them
        optimizer.zero_grad()

        # find the loss for the current step
        loss_train_step = criterion(predictions.float(), landmarks.float())

        print("type(loss_train_step) is: ", type(loss_train_step))
        print("loss_train_step.dtype is: ", loss_train_step.dtype)

        # calculate the gradients
        loss_train_step.backward()

        # update the parameters
        optimizer.step()

        loss_train += loss_train_step.item()
        running_loss = loss_train / step

        print_overwrite(step, len(train_loader), running_loss, 'train')

    network.eval()
    with torch.no_grad():

        for step in range(1, len(test_loader) + 1):
            batch = next(iter(test_loader))  # draw the evaluation batch from the test loader
            images, landmarks = batch['image'], batch['landmarks']
            images = images.cuda()
            landmarks = landmarks.view(landmarks.size(0), -1).cuda()

            predictions = network(images)

            # find the loss for the current step
            loss_test_step = criterion(predictions, landmarks)

            loss_test += loss_test_step.item()
            running_loss = loss_test / step

            print_overwrite(step, len(test_loader), running_loss, 'Testing')

    loss_train /= len(train_loader)
    loss_test /= len(test_loader)

    print('\n--------------------------------------------------')
    print('Epoch: {} Train Loss: {:.4f} Test Loss: {:.4f}'.format(epoch, loss_train, loss_test))
    print('--------------------------------------------------')

    if loss_test < loss_min:
        loss_min = loss_test
        torch.save(network.state_dict(), '../moth_landmarks.pth')
        print("\nMinimum Test Loss of {:.4f} at epoch {}/{}".format(loss_min, epoch, num_epochs))
        print('Model Saved\n')

print('Training Complete')
print("Total Elapsed Time : {} s".format(time.time() - start_time))
I think landmarks is the target, but I might be wrong. Any light in this direction is really appreciated.
Also, doesn't PyTorch have an automatic method for normalizing and whitening the data?
landmarks in your code should be the targets.
You could use torchvision.transforms.Normalize to whiten the data or just apply it manually as seen in my code snippet.
Depending on the shape of landmarks, Normalize might not work.
In your code you are normalizing image tensors with 3 channels (thus the stats contain 3 values), which I don't think is your use case.
Calculate the mean and std (or min and max) from all training targets and normalize the landmarks tensors with them manually.
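For illustration, a minimal sketch of this manual approach (train_dataset and the 'landmarks' key are assumed names based on the code in this thread, not part of the original answer):
train_landmarks = torch.stack(
    [sample['landmarks'].view(-1).float() for sample in train_dataset])

landmarks_mean = train_landmarks.mean()
landmarks_std = train_landmarks.std()

def normalize_landmarks(landmarks):
    # scale the landmark values to roughly zero mean / unit variance
    return (landmarks - landmarks_mean) / landmarks_std

def denormalize_landmarks(landmarks_norm):
    # invert the scaling, e.g. before plotting or computing the "real" loss
    return landmarks_norm * landmarks_std + landmarks_mean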
So I am using this code block for transformed_dataset:
transformed_dataset = MothLandmarksDataset(csv_file='moth_gt.csv',
                                           root_dir='.',
                                           transform=transforms.Compose([
                                               Rescale(256),
                                               RandomCrop(224),
                                               ToTensor()
                                           ]))

for i in range(len(transformed_dataset)):
    sample = transformed_dataset[i]
    print(i, sample['image'].size(), sample['landmarks'].size())

    if i == 3:
        break
Does PyTorch have a utility function for doing data normalization? Is there a simple tutorial you could link me to so I could learn? I am not sure how to normalize my data using transforms.
It depends what and how you would like to normalize the data.
E.g. image tensors can be normalized using transforms.Normalize, which will calculate their z-score using the passed mean and std. This would change the values of the image tensor, such that they have a zero mean and unit variance (if the right stats are passed).
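For example, a minimal sketch (img_tensor is a placeholder name, and the ImageNet stats below are used only for illustration):
import torch
from torchvision import transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

img_tensor = torch.rand(3, 224, 224)  # placeholder image tensor with values in [0, 1]
img_norm = normalize(img_tensor)      # per-channel z-score using the passed stats
print(img_norm.mean(), img_norm.std())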
However, the landmarks would represent a pixel position inside the image, if I’m not mistaken.
I.e. how would you like to normalize them?
You could divide the values by their max, such that all landmark values are in the range [0, 1], but this would of course mean that you have to “denormalize” them if you want to plot them into the image and calculate the “real” loss.
Nevertheless, normalizing the target (and denormalizing afterwards) might help as shown in the linked thread.
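To make the max-based idea concrete, a rough sketch (all tensors below are placeholders, not data from this thread):
import torch

# placeholder training landmarks, e.g. pixel coordinates inside a 224x224 crop
train_landmarks = torch.rand(100, 8) * 224.0
landmarks_max = train_landmarks.max()

# normalize: all landmark values end up in the range [0, 1]
landmarks_norm = train_landmarks / landmarks_max

# ... train the model on landmarks_norm ...

# "denormalize" the (placeholder) predictions before plotting them into the image
# or computing the "real" loss in pixel coordinates
pred_norm = torch.rand(100, 8)
pred_pixels = pred_norm * landmarks_max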
Should I normalize across all the examples in the training set or across my entire dataset? That is, should I find the mean and std of the images over the training set images only, or over all the images in the dataset? Is there an example that shows how to do this from scratch, rather than using an already known mean and std? The transfer learning tutorial, for example, uses an already known mean and std for the normalization of images: transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
You should use the training dataset statistics only.
To get the stats for this dataset, you could load the complete dataset in a single tensor (if it fits into your memory) and calculate the stats or iterate the dataset and calculate the mean and std from the running stats.
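A rough sketch of the second option, iterating the DataLoader and accumulating running sums (train_loader and the 'image' key are assumed names based on the code above):
import torch

n_pixels = 0
channel_sum = torch.zeros(3)
channel_sq_sum = torch.zeros(3)

for batch in train_loader:
    images = batch['image'].float()               # assumed shape [batch_size, 3, H, W]
    n_pixels += images.numel() / images.size(1)   # pixels per channel seen so far
    channel_sum += images.sum(dim=[0, 2, 3])
    channel_sq_sum += (images ** 2).sum(dim=[0, 2, 3])

mean = channel_sum / n_pixels
std = (channel_sq_sum / n_pixels - mean ** 2).sqrt()  # std via E[x^2] - E[x]^2
print('train mean:', mean, 'train std:', std)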
The transform error is raised, since a tensor is expected but you’ve passed a dict so I guess the image tensor might be wrapped inside the dict and you would have to access it.
If I normalize the target and train the model, how can I unnormalize the predictions?
Because the loss function uses the first and second derivatives of the output of the NN w.r.t. the input, I think
pred = model(input)
pred = pred * target_std
pred = pred + target_mean
does not work correctly.
Could you please advise me on this?