I trained an autoencoder on my dataset. The dataset includes one image and several npy files that contain information about the location of a coordinate in the image.
For example, one entry in the npy file is as follows:
[509, 773, 4.834393501281738]
The input to the autoencoder is the concatenation of the flattened 256x256 image and this coordinate vector, meaning that in my case the input dimensions are (256x256+3, 1), i.e. (65539, 1).
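To make the shapes concrete, here is a minimal sketch of how such an input vector can be assembled. The all-zero image is only a placeholder for the real 256x256 grayscale image, and the names `img`, `coord` and `sample` are illustrative, not taken from my training code.

```python
import torch

# Placeholder for a 256x256 grayscale image (assumption: the real one comes from cv2).
img = torch.zeros(256, 256)
# One coordinate entry, copied from the npy example above.
coord = torch.tensor([509.0, 773.0, 4.834393501281738])

# Flatten the image to 65536 values and append the 3 coordinate values.
sample = torch.cat((img.flatten(), coord), dim=0)
print(sample.shape)  # torch.Size([65539])
```

This produces a flat vector; calling `sample.unsqueeze(1)` would give the (65539, 1) column form.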
Model details are shown below.
Autoencoder model (code snippet 1)
import torch
import torch.nn as nn
import torch.nn.functional as F

class AutoEncoder(torch.nn.Module):
    def __init__(self, num_features, num_hidden_1, num_hidden_2, num_hidden_3=256):
        super(AutoEncoder, self).__init__()
        # Encoder
        self.linear_1 = nn.Linear(num_features, num_hidden_1)
        self.bn_1 = nn.BatchNorm1d(num_hidden_1)
        self.linear_11 = nn.Linear(num_hidden_1, num_hidden_2)
        self.bn_2 = nn.BatchNorm1d(num_hidden_2)
        # Decoder
        self.linear_21 = nn.Linear(num_hidden_2, num_hidden_3)
        self.linear_22 = nn.Linear(num_hidden_3, num_features)
        self.bn_3 = nn.BatchNorm1d(num_hidden_3)
        self.drop = nn.Dropout(p=0.5)

    def forward(self, x):
        # Encoder: two linear layers with batch norm, dropout after the first
        encoder = F.leaky_relu(self.bn_1(self.linear_1(x)))
        encoder = self.drop(encoder)
        encoder = F.leaky_relu(self.bn_2(self.linear_11(encoder)))
        # Decoder: map back to the original feature size
        decoded = F.leaky_relu(self.bn_3(self.linear_21(encoder)))
        decoded = self.linear_22(decoded)
        return decoded
I was able to train the model using L1 loss and got some consistent results. The issue here is that after concatenating the image and the coordinate, I had to cast the result to float.
For evaluation, I read the same 256x256 image and concatenate it with a 3x1 vector generated with torch.rand(3,1).
Code snippet 2
import cv2
import torch

img_dir = 'blender_files/Model_0/image_0/model_0_0.jpg'
img = cv2.resize(cv2.imread(img_dir, 0), (256, 256))  # read as grayscale, resize to 256x256
img = torch.from_numpy(img)                    # line 1
coord = torch.rand(3, 1)                       # line 2
img = torch.unsqueeze(torch.flatten(img), 1)   # line 3
X = torch.cat((img, coord), dim=0)             # line 4
features = torch.flatten(X.to(device))         # line 5
The problem is that when I evaluate the model as shown below, the last 3 values of the decoded vector (which denote the coordinates) do not come out correct.
Code snippet 3
# nn.Linear expects an integer feature count, so use shape[0] rather than the whole shape
model = AutoEncoder(num_features=features.shape[0], num_hidden_1=num_hidden_1,
                    num_hidden_2=num_hidden_2, num_hidden_3=num_hidden_3)
model.load_state_dict(torch.load('blender_files/Model_0/image_0/saved_model_img.pth'))
model.eval()
torch.manual_seed(0)
with torch.no_grad():
    decoded_rep = model(features[None, ...])  # add a batch dimension
print(decoded_rep)
I was hoping that the last 3 values would be somewhat similar to the coordinates in the npy file (example coordinate values: [516, 776, 4.815295696258545]).
But instead I am getting:
tensor([[1.7252e+08, 1.7989e+08, 1.7386e+08, ..., 1.8035e+08, 1.8012e+08, 1.1754e+07]])
Any ideas why this is happening?
I also noticed that when I feed this single image, I have to run lines 1 to 5 of code snippet 2 and then pass the feature vector to the model as model(features[None, ...]). Not doing so gives me the following error:
expected 2D or 3D input (got 1D input)
This didn't happen during training or validation, as the feature size in that case was torch.Size([16, 65539]).
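The error can be reproduced on a much smaller scale. The sketch below uses a tiny stand-in for the first encoder stage (the sizes 8 and 4 are made up for illustration) and shows that a 1D input is rejected by `nn.BatchNorm1d`, while the same sample with a leading batch dimension is accepted.

```python
import torch
import torch.nn as nn

# Tiny stand-in for linear_1 + bn_1 (sizes are illustrative, not the real ones).
layer = nn.Sequential(nn.Linear(8, 4), nn.BatchNorm1d(4))
layer.eval()

single = torch.rand(8)  # one sample, no batch dimension

try:
    layer(single)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Indexing with [None, ...] adds a leading batch dimension: shape (8,) becomes (1, 8).
batched = single[None, ...]
with torch.no_grad():
    out = layer(batched)

print(error_message)
print(out.shape)
```

During training the loader already supplies a batch dimension (16 in my case), which would explain why the error only appears for a single image.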
During training, however, I simply obtain the npy file coordinate and the image, concatenate them, and feed the result to the model.
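Code snippet 4
def func(x, y):
    # Concatenate image pixels and coordinates along the feature dimension
    new_vec = torch.cat((x, y), dim=1)
    return new_vec

for epoch in range(num_epochs):
    model.train()
    for batch_idx, (x, y) in enumerate(train_loader):
        features = func(x, y)
        features = features.to(device)  # Tensor.to() returns a new tensor, so reassign
        # Forward and backward prop
        decoded = model(features.float())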
Here, x has size [16, 65536] and y has size [16, 3], so concatenation wasn’t an issue.
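For reference, the batched concatenation and the float cast can be sketched as follows. The zero tensors stand in for one batch from `train_loader`, and the dtypes (uint8 pixels, float64 coordinates) are assumptions about what the loader returns.

```python
import torch

batch_size = 16
# Stand-ins for one batch (assumed dtypes: uint8 pixels, float64 npy coordinates).
x = torch.zeros(batch_size, 256 * 256, dtype=torch.uint8)
y = torch.zeros(batch_size, 3, dtype=torch.float64)

# Cast both parts to float32 (the default dtype of nn.Linear weights),
# then join them along the feature axis.
features = torch.cat((x.float(), y.float()), dim=1)
print(features.shape, features.dtype)  # torch.Size([16, 65539]) torch.float32
```

Casting each part before concatenating gives the same float32 result as the `features.float()` call in snippet 4.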
But during evaluation, when I executed
Code snippet 5
img_dir = 'blender_files/Model_0/image_0/model_0_0.jpg'
img = cv2.resize(cv2.imread(img_dir, 0), (256, 256)).ravel()
coord = torch.rand(3, 1)
X = torch.cat((img, coord), dim=0)
this would end up giving me the following error
expected Tensor as element 0 in argument 0, but got numpy.ndarray
The error is resolved after I execute lines 3-5 in code snippet 2. As such, I am not sure why I have to do this when in training I didn’t have to go through so much.
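A minimal reproduction of this error, assuming the image is still a NumPy array when it reaches `torch.cat` (which is what cv2 returns), could look like this. The zero array is a placeholder for the flattened image, and `coord` is 1D here so that both parts have the same number of dimensions.

```python
import numpy as np
import torch

# Stand-in for the flattened cv2 image: a NumPy uint8 array of 65536 pixels.
img = np.zeros(256 * 256, dtype=np.uint8)
coord = torch.rand(3)

try:
    torch.cat((img, coord), dim=0)  # a NumPy array is not accepted here
    cat_error = None
except TypeError as exc:
    cat_error = str(exc)

# Converting the array to a tensor first lets torch.cat proceed.
X = torch.cat((torch.from_numpy(img).float(), coord), dim=0)

print(cat_error)
print(X.shape)
```

In training, the DataLoader's default collation presumably converts NumPy arrays to tensors before they reach `torch.cat`, which may be why this step was not needed there.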
1. The autoencoder's final output is not consistent with the actual coordinates present in the npy files. Why?
2. Is my evaluation code correct, and what steps can I take to improve it so that I can run the model on new images?
Enclosed is the decoded output after running training for one epoch:
tensor([[ 6.1226,  6.5084,  5.6056,  ...,  5.5449,  5.8391,  5.0617],
        [ 6.1169,  6.4526,  5.6532,  ...,  5.4603,  6.0272,  5.0790],
        [ 6.1019,  6.4337,  5.6353,  ...,  5.4542,  6.0074,  5.0743],
        ...,
        [10.0247, 11.1954, 11.3434,  ..., 11.6905, 11.6435,  9.8233],
        [ 6.1019,  6.4337,  5.6353,  ...,  5.4542,  6.0074,  5.0743],
        [10.0100, 11.2148, 11.3480,  ..., 11.6928, 11.6287,  9.8208]]
I am interested in the last 3 values (11.6928, 11.6287 and 9.8208).