Optimizing diagonal stripe code

Thank you. I have changed the stripe function, and it now returns a proper stripe:

import torch

def stripe(a):
    # Build a strided view whose row n is the diagonal of `a` shifted down by n,
    # i.e. stripe[n, m] = a[n + m, m] (assumes the last dimension is contiguous, l == 1)
    i, j = a.size()
    assert i >= j
    k, l = a.stride()
    return torch.as_strided(a, (i - j, j), (k, k + 1))

a = torch.randn((182, 91)).cuda()

output = stripe(a)
# output.size()
# torch.Size([91, 91])
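
To make it clearer what the view actually selects, here is a small CPU sanity check; the values follow directly from the stride arithmetic above (row n of the output is the diagonal of the input shifted down by n):

b = torch.arange(15).view(5, 3)
s = stripe(b)
# s[0] -> b[0,0], b[1,1], b[2,2]  (values 0, 4, 8)
# s[1] -> b[1,0], b[2,1], b[3,2]  (values 3, 7, 11)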

a = torch.randn((150, 182, 91))
output = list(map(stripe, torch.unbind(a, 0)))
output = torch.stack(output, 0)

# output.size()
# torch.Size([150, 91, 91])
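
As a side note on the "optimizing" part: I think the unbind/map/stack loop could also be expressed as a single strided view over the batch dimension. This is just an untested sketch of the same idea (stripe_batched is my own name for it, and it assumes the usual row-major layout):

def stripe_batched(a):
    # same view as stripe(), but taken directly over a 3-D tensor of shape (batch, i, j)
    b, i, j = a.size()
    assert i >= j
    s0, s1, s2 = a.stride()
    return torch.as_strided(a, (b, i - j, j), (s0, s1, s1 + s2))

# output = stripe_batched(a)  # should also give a (150, 91, 91) result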

Now I am facing an obscure PyTorch 0.4 error after using that stripe function, raised when the model computes the backward pass of the loss:

Traceback (most recent call last):
  File "runner.py", line 305, in <module>
    main()
  File "runner.py", line 249, in main
    loss = model.update(batch)
  File "J:\PyCharmProjects\tac-self-attention\model\rnn.py", line 67, in update
    loss.backward()
  File "J:\Anaconda_Python3_6\envs\cuda2\lib\site-packages\torch\tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "J:\Anaconda_Python3_6\envs\cuda2\lib\site-packages\torch\autograd\__init__.py", line 89, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: Tensor: invalid storage offset at c:\programdata\miniconda3\conda-bld\pytorch_1524546371102\work\aten\src\thc\generic/THCTensor.c:759
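
For context, the striped tensor feeds into the loss roughly like this; it's a simplified stand-in for the real model (the name scores and the sum loss are just placeholders), shown only to make clear where backward() is called:

scores = torch.randn((150, 182, 91), requires_grad=True).cuda()
striped = torch.stack([stripe(t) for t in torch.unbind(scores, 0)], 0)
loss = striped.sum()   # placeholder loss; the real one comes from the RNN model
loss.backward()        # the traceback above is raised at this call in the real run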

This doesn’t occur when using the version with .diag(). Any ideas what causes it?