Autoencoder gives same output for any input

Hi,
I am new to deep learning. My project is about signal transformation: I have to develop an autoencoder that transforms one signal into another. For some reason, once the network is trained, I get the same output for every input. Below are the autoencoder implementation and the training loop.

Both X_train and y_train have the shape [10, 1, 2048]. Please let me know if you have any leads or suggestions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        # encoder part
        self.conv1 = nn.Conv1d(in_channels=1, out_channels=128, kernel_size=8, stride=1, padding=1)
        torch.nn.init.xavier_uniform_(self.conv1.weight)  # initialize the weights with Glorot (Xavier) uniform
        self.conv2 = nn.Conv1d(in_channels=128, out_channels=256, kernel_size=8, stride=1, padding=1)
        torch.nn.init.xavier_uniform_(self.conv2.weight)  # Glorot uniform initialization
        self.conv3 = nn.Conv1d(in_channels=256, out_channels=512, kernel_size=8, stride=1, padding=1)
        torch.nn.init.xavier_uniform_(self.conv3.weight)
        self.conv4 = nn.Conv1d(in_channels=512, out_channels=1024, kernel_size=8, stride=1, padding=1)
        torch.nn.init.xavier_uniform_(self.conv4.weight)

        # decoder part
        self.conv5 = nn.Conv1d(in_channels=1024, out_channels=256, kernel_size=2, stride=1, padding=0)
        torch.nn.init.xavier_uniform_(self.conv5.weight)
        self.conv6 = nn.Conv1d(in_channels=256, out_channels=128, kernel_size=2, stride=1, padding=0)
        torch.nn.init.xavier_uniform_(self.conv6.weight)
        self.conv7 = nn.Conv1d(in_channels=128, out_channels=64, kernel_size=4, stride=1, padding=0)
        torch.nn.init.xavier_uniform_(self.conv7.weight)
        self.conv8 = nn.Conv1d(in_channels=64, out_channels=1, kernel_size=4, stride=1, padding=0)
        torch.nn.init.xavier_uniform_(self.conv8.weight)
        # despite "pool" in their names, these layers upsample
        self.Uppool5 = nn.Upsample(scale_factor=2, mode='nearest')
        self.Uppool6 = nn.Upsample(scale_factor=4, mode='nearest')
        self.Uppool7 = nn.Upsample(scale_factor=4, mode='nearest')
        self.Uppool8 = nn.Upsample(scale_factor=4, mode='nearest')
        # fully connected layers act on the last (time) axis;
        # 2564 is the signal length after Uppool8
        self.L1 = nn.Linear(2564, 2400)
        self.L2 = nn.Linear(2400, 2200)
        self.L3 = nn.Linear(2200, 2048)

    def forward(self, X):
        # encoder part
        X = F.relu(self.conv1(X))
        X = F.max_pool1d(X, 3)
        X = F.relu(self.conv2(X))
        X = F.max_pool1d(X, 3)
        X = F.relu(self.conv3(X))
        X = F.max_pool1d(X, 3)
        X = F.relu(self.conv4(X))
        X = F.max_pool1d(X, 3)

        # decoder part
        X = F.relu(self.conv5(X))
        X = self.Uppool5(X)
        X = F.relu(self.conv6(X))
        X = self.Uppool6(X)
        X = F.relu(self.conv7(X))
        X = self.Uppool7(X)
        X = F.relu(self.conv8(X))
        X = self.Uppool8(X)
        X = F.relu(self.L1(X))
        X = F.relu(self.L2(X))
        X = self.L3(X)
        return X
```
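One way to sanity-check this architecture is to trace the signal length through every layer. The sketch below applies the output-length formula from the `Conv1d`/`MaxPool1d` documentation by hand, so it runs without PyTorch; the helper names `conv1d_len`, `pool_len` and `trace_lengths` are hypothetical. It reproduces why `L1` expects 2564 inputs: the encoder shrinks the 2048 samples to 22 (with 1024 channels), the decoder grows them back to 2564, and only the three `Linear` layers map that length to 2048.

```python
# Signal length after each stage of the AutoEncoder, computed with the
# output-length formulas for Conv1d, max_pool1d and nearest-neighbour Upsample.

def conv1d_len(length, kernel, stride=1, padding=0, dilation=1):
    """Output length of nn.Conv1d (same formula as max pooling)."""
    return (length + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def pool_len(length, kernel):
    """F.max_pool1d(X, kernel): stride defaults to kernel, no padding."""
    return conv1d_len(length, kernel, stride=kernel)


def trace_lengths(n=2048):
    steps = []
    # encoder: Conv1d(kernel_size=8, padding=1) + ReLU + max_pool1d(3), four times
    for name in ("conv1", "conv2", "conv3", "conv4"):
        n = pool_len(conv1d_len(n, kernel=8, padding=1), 3)
        steps.append((name + "+pool", n))
    # decoder: Conv1d(padding=0) + ReLU + Upsample(scale_factor)
    for name, kernel, scale in (("conv5", 2, 2), ("conv6", 2, 4),
                                ("conv7", 4, 4), ("conv8", 4, 4)):
        n = conv1d_len(n, kernel=kernel) * scale
        steps.append((name + "+up", n))
    return steps


for name, n in trace_lengths():
    print(f"{name:10s} -> length {n}")
```

The printed lengths are 681, 225, 73, 22, 42, 164, 644 and 2564.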


```python
epochs = 5
for i in range(epochs):
    # loop over the training batches
    for b, (X_train, y_train) in enumerate(train_loader):
        b += 1
        # print(b)
        y_pred = net(X_train)
        y_pred.require_grad = True
        loss1 = loss(y_pred, y_train)
        # loss = loss.item()
        # update parameters
        optimizer.zero_grad()
        loss1.backward()
        optimizer.step()
        if b % 40 == 0:
            print(b, loss1.item())
```

Try to overfit a small dataset (e.g. just 10 samples) by tuning some hyperparameters, and make sure the model is able to do so. Once this works, you can scale the use case up again.
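A minimal sketch of such an overfitting check, assuming an MSE loss and Adam since the loss and optimizer used in the thread are not shown; `overfit_check` is a hypothetical helper. It trains repeatedly on one fixed batch and reports the first and last loss plus the spread of the predictions across samples. A loss that plateaus together with a spread near zero suggests the model predicts one signal for every input. The usage part runs it on a tiny stand-in model with a learnable target (`y = 0.5 * X`) only to show the call.

```python
import torch
import torch.nn as nn


def overfit_check(net, X, y, steps=500, lr=1e-3):
    """Train on one fixed batch; return (first_loss, last_loss, spread).

    spread is the mean standard deviation of the predictions across the batch
    dimension: a value near zero means (almost) the same output for every input.
    """
    loss_fn = nn.MSELoss()  # assumption: a regression loss such as MSE
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    net.train()
    losses = []
    for _ in range(steps):
        optimizer.zero_grad()
        batch_loss = loss_fn(net(X), y)
        batch_loss.backward()
        optimizer.step()
        losses.append(batch_loss.item())
    net.eval()
    with torch.no_grad():
        spread = net(X).std(dim=0).mean().item()
    return losses[0], losses[-1], spread


# Usage with a tiny stand-in model and data of the thread's shape.
# With the thread's model it would be e.g.
# overfit_check(AutoEncoder(), X_train.float(), y_train.float())
torch.manual_seed(0)
X = torch.randn(10, 1, 2048)
y = 0.5 * X  # a target the stand-in model can actually learn
tiny = nn.Sequential(
    nn.Conv1d(1, 8, kernel_size=9, padding=4), nn.ReLU(),
    nn.Conv1d(8, 1, kernel_size=9, padding=4),
)
first, last, spread = overfit_check(tiny, X, y, steps=200)
print(f"loss {first:.4f} -> {last:.4f}, prediction spread {spread:.4f}")
```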
Also, remove:

```python
y_pred.require_grad=True
```

since `y_pred` should already require gradients (unless it's detached). Besides, `.require_grad` just creates a new, unused attribute on the tensor; you probably meant to manipulate the `.requires_grad` attribute.
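A short sketch of the difference, using a stand-in `nn.Linear` layer in place of the thread's model:

```python
import torch
import torch.nn as nn

net = nn.Linear(4, 2)  # stand-in for the autoencoder
y_pred = net(torch.randn(3, 4))

# The output of a layer with trainable parameters already tracks gradients.
print(y_pred.requires_grad)  # True

# Misspelled attribute: Python simply attaches a new attribute to the tensor.
# Autograd never reads it, so it changes nothing.
y_pred.require_grad = True
print(y_pred.require_grad, y_pred.requires_grad)  # True True

# Even on a detached tensor, the typo does not turn gradient tracking on.
d = y_pred.detach()
d.require_grad = True
print(d.requires_grad)  # False
```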

@ptrblck Thank you for your response. As suggested, I took 10 samples and tried changing the hyperparameters multiple times: different loss functions, learning rates, removing the parameter initialization, and so on. I have also removed require_grad. But I still get the same output for all inputs. I use the code below to get the output and compare it with the target manually by plotting.

```python
index = 38
with torch.no_grad():
    y_pred = net(X_val[index].view(1, 1, 2048).float())
```
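Plotting one prediction at a time makes it hard to see how similar the outputs really are. The sketch below (the helpers `output_spread` and `fraction_active` are hypothetical) batches several inputs and measures how much the predictions differ, and uses a forward hook to check what fraction of a layer's output survives the ReLU that follows it. One possibility worth ruling out, stated as an assumption rather than a diagnosis: `forward` applies `F.relu` after `conv8`, which has a single output channel. If that channel becomes negative everywhere (a "dead" ReLU), `L1` receives only zeros, so the output of `L1` to `L3` depends on their biases alone and is identical for every input. The usage part builds a small stand-in model that reproduces this situation on purpose.

```python
import torch
import torch.nn as nn


def output_spread(net, X):
    """Mean std of the predictions across samples (0 means identical outputs)."""
    net.eval()
    with torch.no_grad():
        return net(X).std(dim=0).mean().item()


def fraction_active(net, layer, X):
    """Fraction of positive values in `layer`'s output, i.e. what a following
    ReLU lets through. 0.0 means that ReLU outputs only zeros."""
    captured = {}

    def hook(module, inputs, output):
        captured["out"] = output.detach()

    handle = layer.register_forward_hook(hook)
    net.eval()
    with torch.no_grad():
        net(X)
    handle.remove()
    return (captured["out"] > 0).float().mean().item()


class DeadTail(nn.Module):
    """Hypothetical stand-in mimicking conv8 -> ReLU -> Linear from the thread."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=3, padding=1)
        self.fc = nn.Linear(64, 64)

    def forward(self, X):
        return self.fc(torch.relu(self.conv(X)))


torch.manual_seed(0)
model = DeadTail()
with torch.no_grad():
    model.conv.bias.fill_(-100.0)  # force every pre-activation below zero
X = torch.randn(8, 1, 64)
print(fraction_active(model, model.conv, X))  # 0.0: the ReLU is dead
print(output_spread(model, X))  # 0.0: same output for every input
```

With the thread's model, the corresponding calls would be `output_spread(net, X_val[:10].float())` and `fraction_active(net, net.conv8, X_val[:10].float())`.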

It’s a bit hard to tell what might be wrong, but since you are not able to overfit a small data sample, I would probably take a look at the model architecture.
Were you using a “known good” model or did you create this architecture as an experiment yourself?
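For comparison, a much smaller fully convolutional encoder-decoder can serve as a baseline when testing whether the data can be fit at all. This is a hypothetical sketch, not a known-good model (channel counts and kernel sizes are arbitrary choices): strided `Conv1d` layers halve the length twice, `ConvTranspose1d` layers restore it exactly to 2048 without any `Linear` layers, and the last layer has no activation so the output can also take negative values.

```python
import torch
import torch.nn as nn


class SmallConvAE(nn.Module):
    """Hypothetical baseline: 2048 -> 1024 -> 512 -> 1024 -> 2048 samples."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, stride=2, padding=4), nn.ReLU(),   # 2048 -> 1024
            nn.Conv1d(16, 32, kernel_size=9, stride=2, padding=4), nn.ReLU(),  # 1024 -> 512
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(32, 16, kernel_size=4, stride=2, padding=1), nn.ReLU(),  # 512 -> 1024
            nn.ConvTranspose1d(16, 1, kernel_size=4, stride=2, padding=1),  # 1024 -> 2048, no ReLU
        )

    def forward(self, X):
        return self.decoder(self.encoder(X))


torch.manual_seed(0)
net = SmallConvAE()
out = net(torch.randn(10, 1, 2048))
print(out.shape)  # torch.Size([10, 1, 2048])
```

Such a baseline can be dropped into the same training loop; if it fits the 10 samples and the larger model does not, that points at the architecture rather than the data pipeline.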