I am trying to train an autoencoder for a regression task.
The network is simple: two linear layers with batch normalization in the encoder stage, followed by a single linear layer with no activation.
I am using L1 loss, and during training I obtained the following loss log, which seems reasonable given the nature of my data:
Device: cpu
Epoch: 000/005 | Batch 0000/0013 | Cost: 70.2134
Epoch: 000/005 | Batch 0001/0013 | Cost: 69.9557
Epoch: 000/005 | Batch 0002/0013 | Cost: 69.7368
Epoch: 000/005 | Batch 0003/0013 | Cost: 69.4878
Epoch: 000/005 | Batch 0004/0013 | Cost: 69.2337
Epoch: 000/005 | Batch 0005/0013 | Cost: 68.9682
Epoch: 000/005 | Batch 0006/0013 | Cost: 68.7011
Epoch: 000/005 | Batch 0007/0013 | Cost: 68.3944
Epoch: 000/005 | Batch 0008/0013 | Cost: 68.1173
Epoch: 000/005 | Batch 0009/0013 | Cost: 67.8054
Epoch: 000/005 | Batch 0010/0013 | Cost: 67.4841
Epoch: 000/005 | Batch 0011/0013 | Cost: 67.1984
Epoch: 000/005 | Batch 0012/0013 | Cost: 69.7803
Epoch: 001/005 | Batch 0000/0013 | Cost: 66.5010
Epoch: 001/005 | Batch 0001/0013 | Cost: 66.2217
Epoch: 001/005 | Batch 0002/0013 | Cost: 65.9123
Epoch: 001/005 | Batch 0003/0013 | Cost: 65.5267
Epoch: 001/005 | Batch 0004/0013 | Cost: 65.2258
Epoch: 001/005 | Batch 0005/0013 | Cost: 64.8399
Epoch: 001/005 | Batch 0006/0013 | Cost: 64.4806
Epoch: 001/005 | Batch 0007/0013 | Cost: 64.0471
Epoch: 001/005 | Batch 0008/0013 | Cost: 63.7403
Epoch: 001/005 | Batch 0009/0013 | Cost: 63.2445
Epoch: 001/005 | Batch 0010/0013 | Cost: 62.8701
Epoch: 001/005 | Batch 0011/0013 | Cost: 62.4358
Epoch: 001/005 | Batch 0012/0013 | Cost: 64.3226
Epoch: 002/005 | Batch 0000/0013 | Cost: 61.6072
Epoch: 002/005 | Batch 0001/0013 | Cost: 61.1589
Epoch: 002/005 | Batch 0002/0013 | Cost: 60.6826
Epoch: 002/005 | Batch 0003/0013 | Cost: 60.1973
Epoch: 002/005 | Batch 0004/0013 | Cost: 59.7261
Epoch: 002/005 | Batch 0005/0013 | Cost: 59.2015
Epoch: 002/005 | Batch 0006/0013 | Cost: 58.7691
Epoch: 002/005 | Batch 0007/0013 | Cost: 58.2427
Epoch: 002/005 | Batch 0008/0013 | Cost: 57.7692
Epoch: 002/005 | Batch 0009/0013 | Cost: 57.2405
Epoch: 002/005 | Batch 0010/0013 | Cost: 56.5343
Epoch: 002/005 | Batch 0011/0013 | Cost: 56.0606
Epoch: 002/005 | Batch 0012/0013 | Cost: 62.1784
Epoch: 003/005 | Batch 0000/0013 | Cost: 54.8221
Epoch: 003/005 | Batch 0001/0013 | Cost: 54.2681
Epoch: 003/005 | Batch 0002/0013 | Cost: 53.5823
Epoch: 003/005 | Batch 0003/0013 | Cost: 53.0807
Epoch: 003/005 | Batch 0004/0013 | Cost: 52.3433
Epoch: 003/005 | Batch 0005/0013 | Cost: 51.6870
Epoch: 003/005 | Batch 0006/0013 | Cost: 51.1865
Epoch: 003/005 | Batch 0007/0013 | Cost: 50.5368
Epoch: 003/005 | Batch 0008/0013 | Cost: 49.5710
Epoch: 003/005 | Batch 0009/0013 | Cost: 49.1832
Epoch: 003/005 | Batch 0010/0013 | Cost: 48.4606
Epoch: 003/005 | Batch 0011/0013 | Cost: 47.7079
Epoch: 003/005 | Batch 0012/0013 | Cost: 60.0171
Epoch: 004/005 | Batch 0000/0013 | Cost: 46.1114
Epoch: 004/005 | Batch 0001/0013 | Cost: 45.3702
Epoch: 004/005 | Batch 0002/0013 | Cost: 44.6226
Epoch: 004/005 | Batch 0003/0013 | Cost: 43.9987
Epoch: 004/005 | Batch 0004/0013 | Cost: 43.0381
Epoch: 004/005 | Batch 0005/0013 | Cost: 42.2448
Epoch: 004/005 | Batch 0006/0013 | Cost: 41.2923
Epoch: 004/005 | Batch 0007/0013 | Cost: 40.5142
Epoch: 004/005 | Batch 0008/0013 | Cost: 39.8084
Epoch: 004/005 | Batch 0009/0013 | Cost: 38.8818
Epoch: 004/005 | Batch 0010/0013 | Cost: 38.1623
Epoch: 004/005 | Batch 0011/0013 | Cost: 36.9880
Epoch: 004/005 | Batch 0012/0013 | Cost: 51.9905
However, when I also compute the validation loss, the loss ends up in the thousands. I am not sure why this is happening; I am using L1 loss for both training and validation.
Device: cpu
Epoch: 000/005 | Batch 0000/0013 | Cost: 70.2134
Epoch: 000/005 | Batch 0001/0013 | Cost: 69.9557
Epoch: 000/005 | Batch 0002/0013 | Cost: 69.7368
Epoch: 000/005 | Batch 0003/0013 | Cost: 69.4878
Epoch: 000/005 | Batch 0004/0013 | Cost: 69.2337
Epoch: 000/005 | Batch 0005/0013 | Cost: 68.9682
Epoch: 000/005 | Batch 0006/0013 | Cost: 68.7011
Epoch: 000/005 | Batch 0007/0013 | Cost: 68.3944
Epoch: 000/005 | Batch 0008/0013 | Cost: 68.1173
Epoch: 000/005 | Batch 0009/0013 | Cost: 67.8054
Epoch: 000/005 | Batch 0010/0013 | Cost: 67.4841
Epoch: 000/005 | Batch 0011/0013 | Cost: 67.1984
Epoch: 000/005 | Batch 0012/0013 | Cost: 69.7803
Validation Loss Decreased(inf--->220074.531250) Saving The Model
Epoch: 001/005 | Batch 0000/0013 | Cost: 220074.5469
Epoch: 001/005 | Batch 0001/0013 | Cost: 66429.5312
Epoch: 001/005 | Batch 0002/0013 | Cost: 5347.3374
Epoch: 001/005 | Batch 0003/0013 | Cost: 17270.3848
Epoch: 001/005 | Batch 0004/0013 | Cost: 19462.8301
Epoch: 001/005 | Batch 0005/0013 | Cost: 19283.0410
Epoch: 001/005 | Batch 0006/0013 | Cost: 17071.7578
Epoch: 001/005 | Batch 0007/0013 | Cost: 5900.5474
Epoch: 001/005 | Batch 0008/0013 | Cost: 27269.3164
Epoch: 001/005 | Batch 0009/0013 | Cost: 4085.6606
Epoch: 001/005 | Batch 0010/0013 | Cost: 4671.1963
Epoch: 001/005 | Batch 0011/0013 | Cost: 67094.2422
Epoch: 001/005 | Batch 0012/0013 | Cost: 19397.0020
Validation Loss Decreased(220074.531250--->21512.488281) Saving The Model
Epoch: 002/005 | Batch 0000/0013 | Cost: 21512.4863
Epoch: 002/005 | Batch 0001/0013 | Cost: 23520.4238
Epoch: 002/005 | Batch 0002/0013 | Cost: 24536.1270
Epoch: 002/005 | Batch 0003/0013 | Cost: 16122.1826
Epoch: 002/005 | Batch 0004/0013 | Cost: 18098.9043
Epoch: 002/005 | Batch 0005/0013 | Cost: 15330.4785
Epoch: 002/005 | Batch 0006/0013 | Cost: 10604.5000
Epoch: 002/005 | Batch 0007/0013 | Cost: 10036.7793
Epoch: 002/005 | Batch 0008/0013 | Cost: 9192.8408
Epoch: 002/005 | Batch 0009/0013 | Cost: 5178.3882
Epoch: 002/005 | Batch 0010/0013 | Cost: 7630.6143
Epoch: 002/005 | Batch 0011/0013 | Cost: 4561.5088
Epoch: 002/005 | Batch 0012/0013 | Cost: 6063.7090
Validation Loss Decreased(21512.488281--->5502.875977) Saving The Model
Epoch: 003/005 | Batch 0000/0013 | Cost: 5502.8760
Epoch: 003/005 | Batch 0001/0013 | Cost: 5709.9111
Epoch: 003/005 | Batch 0002/0013 | Cost: 6867.4663
Epoch: 003/005 | Batch 0003/0013 | Cost: 5088.6343
Epoch: 003/005 | Batch 0004/0013 | Cost: 5191.8086
Epoch: 003/005 | Batch 0005/0013 | Cost: 4265.2114
Epoch: 003/005 | Batch 0006/0013 | Cost: 4393.9121
Epoch: 003/005 | Batch 0007/0013 | Cost: 4127.4048
Epoch: 003/005 | Batch 0008/0013 | Cost: 3400.7935
Epoch: 003/005 | Batch 0009/0013 | Cost: 3129.0066
Epoch: 003/005 | Batch 0010/0013 | Cost: 3357.7268
Epoch: 003/005 | Batch 0011/0013 | Cost: 3299.5447
Epoch: 003/005 | Batch 0012/0013 | Cost: 2972.0579
Validation Loss Decreased(5502.875977--->2440.738770) Saving The Model
Epoch: 004/005 | Batch 0000/0013 | Cost: 2440.7390
Epoch: 004/005 | Batch 0001/0013 | Cost: 2696.4646
Epoch: 004/005 | Batch 0002/0013 | Cost: 2442.3899
Epoch: 004/005 | Batch 0003/0013 | Cost: 2186.0679
Epoch: 004/005 | Batch 0004/0013 | Cost: 2166.6067
Epoch: 004/005 | Batch 0005/0013 | Cost: 2227.7563
Epoch: 004/005 | Batch 0006/0013 | Cost: 2022.2296
Epoch: 004/005 | Batch 0007/0013 | Cost: 1790.8546
Epoch: 004/005 | Batch 0008/0013 | Cost: 1893.8618
Epoch: 004/005 | Batch 0009/0013 | Cost: 5180.7158
Epoch: 004/005 | Batch 0010/0013 | Cost: 1984.6993
Epoch: 004/005 | Batch 0011/0013 | Cost: 2232.0305
Epoch: 004/005 | Batch 0012/0013 | Cost: 2158.0151
Validation Loss Decreased(2440.738770--->1890.082397) Saving The Model
I was wondering whether there is something wrong with my training code, shown below:
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for epoch in range(num_epochs):
    for batch_idx, (x, y) in enumerate(train_loader):
        features = func(x, y)
        features.to(device)
        # Forward and backward prop
        decoded = model(features.float())
        cost = F.l1_loss(decoded, features.float())
        optimizer.zero_grad()
        cost.backward()
        # Update model parameters
        optimizer.step()
        # Logging
        print('Epoch: %03d/%03d | Batch %04d/%04d | Cost: %.4f'
              % (epoch, num_epochs, batch_idx,
                 len(train_loader), cost))
    model.eval()
    for batch_idx, (x, y) in enumerate(val_loader):
        features_val = func(x, y)
        features.to(device)
        # Decode
        decoded = model(features.float())
        val_cost = F.l1_loss(decoded, features.float())
    if min_val_loss > val_cost:
        print(f'Validation Loss Decreased({min_val_loss:.6f}--->{val_cost:.6f}) \t Saving The Model')
        min_val_loss = val_cost
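Three things in the loop above may be worth checking: `model.eval()` is called before validation but `model.train()` is never called again for the next epoch, `features.to(device)` discards its result, and the validation loop passes `features` rather than `features_val` to the model. Below is a minimal sketch of a loop that avoids these. It rests on several assumptions: the `run_epoch` helper, the small stand-in model, and the random tensors used in place of `func(x, y)` and the real loaders are all hypothetical and not taken from the code above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


def run_epoch(model, loader, device, optimizer=None):
    """One pass over `loader`: trains if an optimizer is given, else evaluates.

    Returns the mean L1 reconstruction loss per sample.
    """
    training = optimizer is not None
    # Batch norm uses batch statistics in train mode and running statistics
    # in eval mode, so the mode is set explicitly at the start of every pass.
    model.train(training)
    total, count = 0.0, 0
    with torch.set_grad_enabled(training):
        for (features,) in loader:
            # .to() returns a new tensor, so the result has to be assigned.
            features = features.to(device).float()
            decoded = model(features)
            cost = F.l1_loss(decoded, features)
            if training:
                optimizer.zero_grad()
                cost.backward()
                optimizer.step()
            total += cost.item() * features.size(0)
            count += features.size(0)
    return total / count


torch.manual_seed(0)
device = torch.device("cpu")

# Hypothetical stand-in for the real autoencoder (assumed layer sizes).
model = nn.Sequential(
    nn.Linear(8, 4), nn.BatchNorm1d(4),
    nn.Linear(4, 4), nn.BatchNorm1d(4),
    nn.Linear(4, 8),
).to(device)

# Random data standing in for the output of func(x, y).
train_loader = DataLoader(TensorDataset(torch.randn(64, 8)), batch_size=16, shuffle=True)
val_loader = DataLoader(TensorDataset(torch.randn(32, 8)), batch_size=16)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

min_val_loss = float("inf")
for epoch in range(3):
    train_loss = run_epoch(model, train_loader, device, optimizer)
    val_loss = run_epoch(model, val_loader, device)  # no optimizer: eval mode
    if val_loss < min_val_loss:
        min_val_loss = val_loss
    print(f"Epoch {epoch:03d} | train {train_loss:.4f} | val {val_loss:.4f}")
```

Averaging the validation loss over all batches, instead of comparing only the last batch's value against `min_val_loss`, is a design choice of this sketch and not something the original loop does.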
Some details about the data:
I have a 256x256 image and 17,000 npy files containing [x,y,z] coordinates.
I flatten the image into a 1D vector, append the x, y, z coordinates to the end of it, and use that as the input to the model.
The objective is then to reconstruct a new feature representation whose last 3 entries give me new values for the x, y and z coordinates.
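The input construction described above can be sketched as follows. This is only an illustration under assumptions: the `build_features` name, the random image, and the example coordinate values are hypothetical and stand in for the real image and the contents of one npy file.

```python
import numpy as np

# Hypothetical stand-ins for the real 256x256 image and one [x, y, z] sample.
rng = np.random.default_rng(0)
image = rng.random((256, 256), dtype=np.float32)
coords = np.array([1.0, 2.0, 3.0], dtype=np.float32)


def build_features(image, coords):
    """Flatten the image to 1D and append the coordinates as the last 3 entries."""
    return np.concatenate([image.reshape(-1), coords])


features = build_features(image, coords)
print(features.shape)   # 256 * 256 + 3 entries
print(features[-3:])    # the slice the reconstruction is read back from
```

With this layout, the reconstructed coordinates would be read from the last 3 entries of the model output in the same way, for example `decoded[..., -3:]`.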