I’m currently developing an op that operates over a contiguous final dimension. Both the CPU and GPU variants use the same code to pre-shape the input and post-shape the output tensor. Currently only the CPU variant has an issue when reshaping the output: `malloc(): corrupted top size`.
I’ve tried a few combinations of view/reshape, both assigning to a new tensor and mutating in place, but I keep hitting this error. Again, the GPU variant is fine (even with the same code); only the CPU path fails. All the requested shapes look correct with good old print debugging (the requested permute is just {1, 0} on a 2-dim tensor).
void restoreOutputShape(torch::Tensor& output, c10::IntArrayRef inShape, int64_t dim)
{
    // Recover the dim-last shape the kernel produced.
    output = output.reshape(getPermutedShape(inShape, dim));
    // Move `dim` back from the last position to its original one.
    if (dim != (output.ndimension() - 1))
    {
        output = output.permute(getReversePermutation(output.ndimension(), dim));
    }
    output = output.contiguous();
}
I’m open-sourcing it now, even though it’s still WIP, so the code can be inspected.