Autoencoder's output isn't coming out to be correct

sparshgarg23 · January 18, 2022, 11:22am

I trained an autoencoder on my dataset. The dataset includes one image and several npy files that contain information about the location of a coordinate in the image.
For example, one entry in the npy file is as follows:
[509, 773, 4.834393501281738]

The input to the autoencoder is the concatenation of the 256x256 image and this coordinate vector, meaning that in my case the input dimensions will be (256x256+3,1).

Model details are shown below
Autoencoder model code snippet 1

import torch
import torch.nn as nn
import torch.nn.functional as F

class AutoEncoder(torch.nn.Module):
    
    def __init__(self,num_features,num_hidden_1,num_hidden_2,num_hidden_3=256):
        super(AutoEncoder,self).__init__()
        #Encoder
        self.linear_1=nn.Linear(num_features,num_hidden_1)
        self.bn_1=nn.BatchNorm1d(num_hidden_1)
        self.linear_11=nn.Linear(num_hidden_1,num_hidden_2)
        self.bn_2=nn.BatchNorm1d(num_hidden_2)
        #Decoder
        self.linear_21=nn.Linear(num_hidden_2,num_hidden_3)
        self.linear_22=nn.Linear(num_hidden_3,num_features)
        self.bn_3=nn.BatchNorm1d(num_hidden_3)
        self.drop=nn.Dropout(p=0.5)
    
    def forward(self,x):
        encoder=F.leaky_relu(self.bn_1(self.linear_1(x)))
        encoder=self.drop(encoder)
        encoder=F.leaky_relu(self.bn_2(self.linear_11(encoder)))
        decoded=F.leaky_relu(self.bn_3(self.linear_21(encoder)))
        decoded=self.linear_22(decoded)
        return decoded

I was able to train the model using L1 loss and got some consistent results. The issue here is that after concatenating the image and the coordinate, I had to cast the result to float.

For evaluation, I read the same 256x256 image and concatenated it with a 3x1 vector created using torch.rand(3,1).
Code snippet 2

img_dir='blender_files/Model_0/image_0/model_0_0.jpg'
img=cv2.resize(cv2.imread(img_dir,0),(256,256))
img=torch.from_numpy(img)                    # line 1
coord=torch.rand(3,1)                        # line 2
img=torch.unsqueeze(torch.flatten(img),1)    # line 3
X=torch.cat((img,coord),dim=0)               # line 4
features=torch.flatten(X.to(device))         # line 5

The problem is that when I evaluate the model as shown below, the last 3 values in the decoded vector (which denote the coordinates) are not correct.
Code snippet 3

model=AutoEncoder(num_features=features.shape[0],num_hidden_1=num_hidden_1,num_hidden_2=num_hidden_2,num_hidden_3=num_hidden_3)
model.load_state_dict(torch.load('blender_files/Model_0/image_0/saved_model_img.pth'))
model.eval()
torch.manual_seed(0)
with torch.no_grad():
    decoded_rep=model(features[None,...])
print(decoded_rep)

I was hoping that the last 3 values would be somewhat similar to the coordinates in the npy file (example coordinate values: [516, 776, 4.815295696258545]).
But instead I am getting

tensor([[1.7252e+08, 1.7989e+08, 1.7386e+08,  ..., 1.8035e+08, 1.8012e+08,
         1.1754e+07]])

Any ideas why this is happening?
I also noticed that when I feed this single image, I have to run lines 1 to 5 and then pass the feature vector to the model as model(features[None,...]). Not doing so gives me the following error:

expected 2D or 3D input (got 1D input)

This didn't happen during training or validation, as the feature size in that case was torch.Size([16, 65539]).
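To illustrate the 1D-input error quoted above, here is a minimal sketch of how nn.BatchNorm1d reacts to a single unbatched vector versus the same vector with a leading batch dimension. The feature size of 8 and the variable names are stand-ins chosen for illustration, not the 65539 features or the names from my actual model.

```python
import torch
import torch.nn as nn

# Hypothetical small feature size; the post uses 65539 features.
num_features = 8
bn = nn.BatchNorm1d(num_features)
bn.eval()  # use running statistics, as after model.eval()

single = torch.rand(num_features)   # 1D tensor, shape [8]
batched = single[None, ...]         # 2D tensor, shape [1, 8]

# A 1D input is rejected because BatchNorm1d expects [batch, features].
try:
    bn(single)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# With the leading batch dimension the same data passes through.
out = bn(batched)

print(error_message)
print(batched.shape, out.shape)
```

During training the DataLoader already supplies the batch dimension (16 in my case), which would explain why the indexing with None is only needed for a single sample.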

During training, however, I simply obtain the npy file coordinate and the image, concatenate them, and feed the result to the model.
Code snippet 4

def func(x,y):
    new_vec=torch.cat((x,y),dim=1)
    return new_vec

for epoch in range(num_epochs):
    model.train()
    for batch_idx,(x,y) in enumerate(train_loader):
        features=func(x,y)
        #Tensor.to() returns a new tensor, so the result is reassigned
        features=features.to(device)
        #Forward and backward prop
        decoded=model(features.float())

Here, x has size [16,65536] and y has size [16,3], so concatenation wasn't an issue.
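As a rough sketch of the shapes involved, the snippet below compares the batched concatenation used in training with the equivalent for one sample. The zero tensors are placeholders for the real image and coordinate data, and the names img, coord and single are only illustrative.

```python
import torch

# Batched case, as produced by the training DataLoader.
x = torch.zeros(16, 256 * 256)        # 16 flattened images
y = torch.zeros(16, 3)                # 16 coordinate triples
features = torch.cat((x, y), dim=1)   # join along the feature axis
print(features.shape)                 # torch.Size([16, 65539])

# Single-sample case: both pieces are 1D, so they are joined along dim=0
# and a batch dimension is added afterwards.
img = torch.zeros(256 * 256)
coord = torch.zeros(3)
single = torch.cat((img, coord), dim=0)[None, ...]
print(single.shape)                   # torch.Size([1, 65539])
```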
But during evaluation, when I executed
Code snippet 5

img_dir='blender_files/Model_0/image_0/model_0_0.jpg'
img=cv2.resize(cv2.imread(img_dir,0),(256,256)).ravel()
coord=torch.rand(3,1)
X=torch.cat((img,coord),dim=0)

This gives me the following error:

expected Tensor as element 0 in argument 0, but got numpy.ndarray

The error was resolved after I executed lines 3-5 in code snippet 2. As such, I am not sure why I have to do this when in training I didn’t have to go through so much.
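A minimal sketch of the conversion step, assuming the image arrives as a uint8 NumPy array (which is what cv2.imread returns). The zero array stands in for the real image so the snippet does not need the file, and cat_error is just an illustrative name.

```python
import numpy as np
import torch

# Stand-in for cv2.resize(cv2.imread(img_dir, 0), (256, 256)).ravel()
img_np = np.zeros(256 * 256, dtype=np.uint8)
coord = torch.rand(3, 1)

# torch.cat only accepts tensors, so mixing in a NumPy array fails.
try:
    torch.cat((img_np, coord), dim=0)
    cat_error = None
except TypeError as exc:
    cat_error = str(exc)
print(cat_error)

# Convert to a float tensor and give it the same [N, 1] layout as coord.
img = torch.from_numpy(img_np).float().unsqueeze(1)   # shape [65536, 1]
X = torch.cat((img, coord), dim=0)                    # shape [65539, 1]
features = X.flatten()                                # shape [65539]
print(features.shape, features.dtype)
```

In training, the DataLoader's default collation presumably turns the NumPy arrays into tensors for me, which may be why this conversion was not needed there.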

TL;DR
1. The autoencoder's final output is not consistent with the actual coordinates present in the npy files.
2. Is my evaluation code correct, and what steps can I take to improve it so that I can run the model on new images?

Enclosed is the decoded output after running training for one epoch

tensor([[ 6.1226,  6.5084,  5.6056,  ...,  5.5449,  5.8391,  5.0617],
        [ 6.1169,  6.4526,  5.6532,  ...,  5.4603,  6.0272,  5.0790],
        [ 6.1019,  6.4337,  5.6353,  ...,  5.4542,  6.0074,  5.0743],
        ...,
        [10.0247, 11.1954, 11.3434,  ..., 11.6905, 11.6435,  9.8233],
        [ 6.1019,  6.4337,  5.6353,  ...,  5.4542,  6.0074,  5.0743],
        [10.0100, 11.2148, 11.3480,  ..., 11.6928, 11.6287,  9.8208]])
I am interested in the last 3 values (11.6928, 11.6287 and 9.8208).