I found the reason. The issue occurs when I use the MPS device; on the CPU it works just fine (a CPU reference cell is included at the end for comparison).
# %%
import math
import torch
import torch.nn as nn
# %%
device = "mps"
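# Sanity check, assuming PyTorch >= 1.12 where the MPS backend was introduced:
assert torch.backends.mps.is_available(), "MPS backend is not available"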
# %%
class DCCN(nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels, activation):
        super().__init__()
        layers = []
        num_layers = int(math.log2(256))
        for i in range(num_layers):
            if i == 0:
                in_ch, out_ch = in_channels, hidden_channels
            elif i < num_layers - 1:
                in_ch, out_ch = hidden_channels, hidden_channels
            else:
                in_ch, out_ch = hidden_channels, out_channels
            layers.append(nn.Conv1d(in_ch, out_ch, kernel_size=2, dilation=2**i))
            layers.append(activation())
        self.model = nn.Sequential(*layers)

    def forward(self, input):
        return self.model(input)
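
# Note on sequence lengths (derived from the layer definitions above): each
# Conv1d with kernel_size=2 and dilation 2**i shortens the sequence by 2**i,
# so the eight layers remove sum(2**i for i in range(8)) = 255 steps in total;
# an input of length 259 therefore comes out with length 4.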
class FC(nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        layers = []
        num_layers = 3
        for i in range(num_layers):
            if i == 0:
                in_ch, out_ch = in_channels, hidden_channels
            elif i < num_layers - 1:
                in_ch, out_ch = hidden_channels, hidden_channels
            else:
                in_ch, out_ch = hidden_channels, out_channels
            layers.append(nn.Linear(in_ch, out_ch))
            layers.append(nn.ReLU())
        self.model = nn.Sequential(*layers)

    def forward(self, input):
        return self.model(input)
class Model(nn.Module):
    def __init__(self, activation):
        super().__init__()
        self.dccn_1 = DCCN(3, 8, 2, activation=activation)
        self.dccn_2 = DCCN(3, 8, 2, activation=activation)
        self.dccn_3 = DCCN(3, 8, 2, activation=activation)
        self.dccn_4 = DCCN(4, 8, 2, activation=activation)
        self.fc = FC(4 * 2, 16, 3)

    def forward(self, window):
        dccn_1_out = self.dccn_1(window[:, :3])
        dccn_2_out = self.dccn_2(window[:, 3:6])
        dccn_3_out = self.dccn_3(window[:, 6:9])
        dccn_4_out = self.dccn_4(window[:, 9:])
        dccn_out = torch.cat([dccn_1_out, dccn_2_out, dccn_3_out, dccn_4_out], dim=1).permute(0, 2, 1)
        fc_out = self.fc(dccn_out)
        return fc_out
# %%
X, Y = torch.rand(512, 13, 259).to(device), torch.rand(512, 4, 3).to(device)
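# %% [markdown]
# Shape check (derived from the definitions above): the 13 input channels are
# split 3/3/3/4 across the four DCCN branches, each branch maps length 259 to
# length 4 with 2 output channels, the concatenation gives (512, 8, 4), the
# permute gives (512, 4, 8), and the FC head maps 8 -> 3, so Y_pred has shape
# (512, 4, 3) and matches Y.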
# %% [markdown]
# ### ReLU
# %%
model = Model(activation=nn.ReLU).to(device)
optimizer = torch.optim.Adam(model.parameters())
criterion = torch.nn.L1Loss()
# %%
model.train()
# %%
Y_pred = model(X)
optimizer.zero_grad()
loss = criterion(Y_pred, Y)
print(Y_pred.shape, Y.shape, loss)
loss.backward()
optimizer.step()
# %% [markdown]
# ### Tanh
# %%
model = Model(activation=nn.Tanh).to(device)
optimizer = torch.optim.Adam(model.parameters())
criterion = torch.nn.L1Loss()
# %%
model.train()
# %%
Y_pred = model(X)
optimizer.zero_grad()
loss = criterion(Y_pred, Y)
print(Y_pred.shape, Y.shape, loss)
loss.backward()
optimizer.step()
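# %% [markdown]
# ### CPU reference
# A minimal sketch of the same step on the CPU, included for comparison with
# the MPS runs above (same model and loss; only the device changes).
# %%
model_cpu = Model(activation=nn.Tanh).to("cpu")
optimizer_cpu = torch.optim.Adam(model_cpu.parameters())
criterion_cpu = torch.nn.L1Loss()
model_cpu.train()
# %%
Y_pred_cpu = model_cpu(X.cpu())
optimizer_cpu.zero_grad()
loss_cpu = criterion_cpu(Y_pred_cpu, Y.cpu())
print(Y_pred_cpu.shape, Y.cpu().shape, loss_cpu)
loss_cpu.backward()
optimizer_cpu.step()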