Down/Upsampling 2D tensor

I am working on a regression problem in materials science using residual neural networks. My whole network uses fully connected layers with residual connections, along with batch normalization and activation. The following is the network I have tried to implement, which is from the paper IRNet:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.linear1 = nn.Linear(in_features=86, out_features=1024)
        self.bn1 = nn.BatchNorm1d(num_features=1024)
        self.linear2 = nn.Linear(in_features=1024, out_features=1024)
        self.linear3 = nn.Linear(in_features=1024, out_features=1024)
        self.linear4 = nn.Linear(in_features=1024, out_features=1024)
        self.linear5 = nn.Linear(in_features=1024, out_features=512)
        self.bn2 = nn.BatchNorm1d(num_features=512)
        self.linear6 = nn.Linear(in_features=512, out_features=512)
        self.linear7 = nn.Linear(in_features=512, out_features=512)
        self.linear8 = nn.Linear(in_features=512, out_features=256)
        self.bn3 = nn.BatchNorm1d(num_features=256)
        self.linear9 = nn.Linear(in_features=256, out_features=256)
        self.linear10 = nn.Linear(in_features=256, out_features=256)
        self.linear11 = nn.Linear(in_features=256, out_features=128)
        self.bn4 = nn.BatchNorm1d(num_features=128)
        self.linear12 = nn.Linear(in_features=128, out_features=128)
        self.linear13 = nn.Linear(in_features=128, out_features=128)
        self.linear14 = nn.Linear(in_features=128, out_features=64)
        self.bn5 = nn.BatchNorm1d(num_features=64)
        self.linear15 = nn.Linear(in_features=64, out_features=64)
        self.linear16 = nn.Linear(in_features=64, out_features=32)
        self.bn6 = nn.BatchNorm1d(num_features=32)
        self.linear17 = nn.Linear(in_features=32, out_features=1)

    def forward(self, x):
        x = x.unsqueeze(0)
        residual1 = x
        residual1 = F.interpolate(residual1, size=[32,1024], mode='nearest', align_corners=None)
        x = F.relu(self.bn1(self.linear1(x)))
        x += residual1
        residual2 = x
        x = F.relu(self.bn1(self.linear2(x)))
        x += residual2
        residual3 = x
        x = F.relu(self.bn1(self.linear3(x)))
        x += residual3
        residual4 = x
        x = F.relu(self.bn1(self.linear4(x)))
        x += residual4
        residual5 = x
        residual5 = F.interpolate(residual5, size=[32,512], mode='nearest', align_corners=None)
        x = F.relu(self.bn2(self.linear5(x)))
        x += residual5
        residual6 = x
        x = F.relu(self.bn2(self.linear6(x)))
        x += residual6
        residual7 = x
        x = F.relu(self.bn2(self.linear7(x)))
        x += residual7
        residual8 = x
        residual8 = F.interpolate(residual8, size=[32,256], mode='nearest', align_corners=None)
        x = F.relu(self.bn3(self.linear8(x)))
        x += residual8
        residual9 = x
        x = F.relu(self.bn3(self.linear9(x)))
        x += residual9
        residual10 = x
        x = F.relu(self.bn3(self.linear10(x)))
        x += residual10
        residual11 = x
        residual11 = F.interpolate(residual11, size=[32,128], mode='nearest', align_corners=None)
        x = F.relu(self.bn4(self.linear11(x)))
        x += residual11
        residual12 = x
        x = F.relu(self.bn4(self.linear12(x)))
        x += residual12
        residual13 = x
        x = F.relu(self.bn4(self.linear13(x)))
        x += residual13
        residual14 = x
        residual14 = F.interpolate(residual14, size=[32,64], mode='nearest', align_corners=None)
        x = F.relu(self.bn5(self.linear14(x)))
        x += residual14
        residual15 = x
        x = F.relu(self.bn5(self.linear15(x)))
        x += residual15
        residual16 = x
        residual16 = F.interpolate(residual16, size=[32,32], mode='nearest', align_corners=None)
        x = F.relu(self.bn6(self.linear16(x)))
        x += residual16
        out = self.linear17(x)
        return out

However, I get an error when I call model.train() followed by model.forward(Xtrain), as shown below:

RuntimeError                              Traceback (most recent call last)
<ipython-input-23-a7d2104011f1> in <module>
     16         # in case you wanted a semi-full example
     17         model.train()
---> 18         outputs = model.forward(batch_x_train)
     19         train_loss = criterion(outputs,batch_y_train)

<ipython-input-20-6d0a8f559e2f> in forward(self, x)
     34         residual1 = x
---> 35         residual1 = F.interpolate(residual1, size=[32,1024], mode='nearest', align_corners=None)
     36         x = F.relu(self.bn1(self.linear1(x)))
     37         x += residual1

~/.local/lib/python3.6/site-packages/torch/nn/ in interpolate(input, size, scale_factor, mode, align_corners)
   2509     if input.dim() == 3 and mode == 'nearest':
-> 2510         return torch._C._nn.upsample_nearest1d(input, _output_size(1))
   2511     elif input.dim() == 4 and mode == 'nearest':
   2512         return torch._C._nn.upsample_nearest2d(input, _output_size(2))

RuntimeError: It is expected output_size equals to 1, but got size 2

Can you tell me what is wrong with my code?
In a related PyTorch post, it was suggested that interpolate could be used for this purpose.

Could you print the shape of residual1 before feeding it to F.interpolate, please, so that we could take a look?

The shape of residual1 before feeding it to F.interpolate is
torch.Size([1, 32, 86]).
I checked it using a random tensor created as follows: x = torch.randn(32, 86)

F.interpolate expects the 2nd dimension to be the “channel” dimension: a 3D input is read as (batch, channels, length), and only the last dimension is resized. I’m not sure what your [32, 86] tensor is supposed to represent, but if you want to be able to add it back to the result, interpolate expects the size argument to have a single value, e.g. F.interpolate(residual1, size=[1024]).
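A minimal sketch of how F.interpolate treats the tensor from this thread, assuming the (1, 32, 86) shape reported above: the 32 samples land in the channel slot, only the 86 features are resized, so size takes one value. Note that 'nearest' mode just repeats existing feature values; nothing is learned.

```python
import torch
import torch.nn.functional as F

# A batch of 32 samples with 86 features, as in the question.
x = torch.randn(32, 86)

# interpolate reads a 3D input as (N, C, L); unsqueeze(0) gives N=1, C=32, L=86.
residual = x.unsqueeze(0)
print(residual.shape)  # torch.Size([1, 32, 86])

# Only L is resized, so size must contain exactly one value.
up = F.interpolate(residual, size=[1024], mode='nearest')
print(up.shape)  # torch.Size([1, 32, 1024])

# Two values for a 3D input reproduce the error from the question
# (older releases raise RuntimeError, newer ones ValueError).
try:
    F.interpolate(residual, size=[32, 1024], mode='nearest')
except (RuntimeError, ValueError) as err:
    print(type(err).__name__, err)

# Drop the helper dimension so the shape matches a Linear output of (32, 1024).
up = up.squeeze(0)
print(up.shape)  # torch.Size([32, 1024])
```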


Thank you for the clarification.
Also, is it OK to squeeze/unsqueeze every time I want to add a residual connection between layers with different input sizes, as follows:

        x = x.unsqueeze(0)
        residual1 = x
        residual1 = F.interpolate(residual1, size=[1024], mode='nearest', align_corners=None)
        x = x.squeeze(0)
        residual1 = residual1.squeeze(0)
        x = F.relu(self.bn1(self.linear1(x)))
        x += residual1

Will it affect the data processing or the result?

What do the dimensions represent?
The posted shape should work with your code:

import torch
import torch.nn.functional as F

x = torch.randn([1, 32, 86])
y = F.interpolate(x, size=[1024], mode='nearest', align_corners=None)
print(y.shape)
> torch.Size([1, 32, 1024])

So I’m currently unsure where this error is raised.

In x = torch.randn([1, 32, 86]), the leading 1 is added by the unsqueeze operation, 32 is the batch size, and 86 is the number of features.
Initially, I was using interpolate as follows:

        residual1 = x
        residual1 = F.interpolate(residual1, size=[32,1024], mode='nearest', align_corners=None)
        x = F.relu(self.bn1(self.linear1(x)))
        x += residual1

After the correction by @futscdav, I changed it to
residual1 = F.interpolate(residual1, size=[1024], mode='nearest', align_corners=None)

However, to pass the tensor through the Linear layer, I had to squeeze both x and residual1 again. So my next concern is: if I perform the squeeze/unsqueeze operations several times in forward, will it affect my results?

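It might affect the results of certain layers, yes. For example, nn.Linear always operates on the last dimension, whereas nn.BatchNorm1d interprets a 3D input as (N, C, L) and normalizes over dim 1. Autograd is able to track these operations and will properly calculate the gradients during the backward pass, so you only have to make sure that the shapes are correct.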
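To illustrate which layers care about the extra leading dimension, here is a small sketch using the sizes of the question's first block (86 to 1024). nn.Linear gives the same values for the unsqueezed and plain inputs, while nn.BatchNorm1d(1024) reads a (1, 32, 1024) tensor as having 32 channels and rejects it.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(32, 86)          # (batch, features)
linear1 = nn.Linear(86, 1024)
bn1 = nn.BatchNorm1d(1024)

# nn.Linear only touches the last dimension, so the extra leading 1 is harmless.
out_2d = linear1(x)                            # (32, 1024)
out_3d = linear1(x.unsqueeze(0)).squeeze(0)    # (1, 32, 1024) -> (32, 1024)
print(torch.allclose(out_2d, out_3d, atol=1e-6))  # True

# nn.BatchNorm1d normalizes over dim 1 of a 3D input, which is 32 here, not 1024.
bn_ok = bn1(out_2d)
try:
    bn1(out_2d.unsqueeze(0))
    bn_3d_failed = False
except RuntimeError as err:
    bn_3d_failed = True
    print("BatchNorm1d on (1, 32, 1024):", err)
```

So the squeeze before the BatchNorm layer is what makes the shapes line up; the squeeze itself does not change any values.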

I am not sure where Autograd should go.
My training and evaluation code is as follows:

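for epoch in range(n_epochs):

    # new_x_train / new_y_train are plain torch.Tensors (Variable is no longer needed)
    permutation1 = torch.randperm(new_x_train.size(0))

    model.train()  # BatchNorm uses per-batch statistics
    for i in range(0, new_x_train.size(0), batch_size):

        indices1 = permutation1[i:i + batch_size]
        batch_x_train, batch_y_train = new_x_train[indices1], new_y_train[indices1]

        # optimizer is assumed to be created elsewhere, e.g. torch.optim.Adam(model.parameters())
        optimizer.zero_grad()
        outputs = model(batch_x_train)  # calling the model runs forward()
        train_loss = criterion(outputs, batch_y_train)
        train_loss.backward()  # gradients are computed here
        optimizer.step()

    model.eval()  # BatchNorm uses running statistics
    with torch.no_grad():
        y_pred = model(new_x_val)
        val_loss = criterion(y_pred, new_y_val)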

You don’t need to do anything. As long as you are using PyTorch operations and don’t break the computation graph, Autograd will track all operations under the hood for you. There is no explicit call for it.
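As a small illustration (shapes chosen to match the thread), Autograd records unsqueeze, interpolate and squeeze in the graph like any other tensor operation; the only call needed is backward() on the loss.

```python
import torch
import torch.nn.functional as F

x = torch.randn(32, 86, requires_grad=True)

# Same reshaping/interpolation pattern as in the thread.
r = F.interpolate(x.unsqueeze(0), size=[1024], mode='nearest').squeeze(0)
print(r.grad_fn)  # the last recorded operation (a squeeze backward node)

loss = r.sum()
loss.backward()  # walks back through squeeze, interpolate and unsqueeze

# 'nearest' upsampling copies each input feature to one or more outputs, so each
# gradient counts its copies; per row they add up to the 1024 output positions.
print(x.grad.shape)          # torch.Size([32, 86])
print(x.grad.sum(dim=1)[0])  # tensor(1024.)
```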

Although I have implemented the above structure in PyTorch, the results are nowhere near those reported in the paper. What could be the reason?

There could be various reasons, such as different parameter initialization, a potential bug in the architecture, different preprocessing methods, different hyperparameters for the optimizer etc.
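One candidate for an architecture bug: the posted forward reuses the same BatchNorm1d module for several layers (bn1 for linear1 to linear4, bn2 for linear5 to linear7, and so on), so those layers share running statistics and affine parameters, and its width-changing shortcuts copy features by nearest-neighbour interpolation. The sketch below keeps the question's layer widths but gives every layer its own BatchNorm1d and works on (batch, features) tensors throughout. Using a bias-free nn.Linear projection on the shortcut when the width changes is my assumption (a common ResNet choice), not necessarily what the IRNet paper does.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualFC(nn.Module):
    """Linear -> BatchNorm1d -> ReLU, plus a residual (skip) connection."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.bn = nn.BatchNorm1d(out_features)  # one BatchNorm per layer, not shared
        # Assumption: project the shortcut with a bias-free Linear when the width changes.
        if in_features != out_features:
            self.shortcut = nn.Linear(in_features, out_features, bias=False)
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        # x has shape (batch, in_features); no unsqueeze/squeeze is needed.
        return F.relu(self.bn(self.linear(x))) + self.shortcut(x)


class Net(nn.Module):
    # Widths follow the question: 4x1024, 3x512, 3x256, 3x128, 2x64, 1x32.
    widths = [1024] * 4 + [512] * 3 + [256] * 3 + [128] * 3 + [64] * 2 + [32]

    def __init__(self, in_features=86):
        super().__init__()
        blocks = []
        prev = in_features
        for w in self.widths:
            blocks.append(ResidualFC(prev, w))
            prev = w
        self.blocks = nn.Sequential(*blocks)
        self.head = nn.Linear(prev, 1)  # final regression output, no BN/ReLU

    def forward(self, x):
        return self.head(self.blocks(x))


model = Net()
out = model(torch.randn(32, 86))
print(out.shape)  # torch.Size([32, 1])
```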

It’s hard to tell what exactly doesn’t work. To further debug you could try to overfit a small data sample (e.g. just 10 samples) and make sure your model is able to do so. If your model cannot overfit this small dummy dataset, you might need to play around with the hyperparameters, check the gradient flow etc.
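A minimal sketch of the "overfit a tiny subset" check. The data, model, learning rate and step count are placeholders (random targets and a small fully connected net, not the thread's setup); swap in your own network and 10 real samples. If the training loss does not get close to zero, the model, loss or optimizer wiring is suspect.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins: 10 random samples with 86 features and 1 regression target.
x_small = torch.randn(10, 86)
y_small = torch.randn(10, 1)

# Placeholder model; replace with the network you want to debug.
model = nn.Sequential(
    nn.Linear(86, 256), nn.BatchNorm1d(256), nn.ReLU(),
    nn.Linear(256, 64), nn.BatchNorm1d(64), nn.ReLU(),
    nn.Linear(64, 1),
)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

model.train()
for step in range(1000):
    optimizer.zero_grad()
    loss = criterion(model(x_small), y_small)
    loss.backward()
    optimizer.step()
    if step % 200 == 0:
        print(f"step {step:4d}  loss {loss.item():.6f}")

final_loss = loss.item()
print("final training loss:", final_loss)
```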