Broadcasting the depthwise convolution kernel

You could use expand(256, -1, -1, -1) instead of repeat.
Note, however, that you would only save ~9 kB of memory, since:

print(lpf.nelement() * 4 / 1024)
> 9.0
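
For comparison, here is a minimal sketch of both variants. It assumes lpf starts out as a single 3x3 float32 filter of shape (1, 1, 3, 3) that is tiled to 256 channels for a depthwise (groups=256) convolution. The filter values, the names base, lpf_repeat and lpf_expand, and the input size are made up for illustration, since the original code isn't shown here.

```python
import torch
import torch.nn.functional as F

# Hypothetical 3x3 box filter; the real lpf values are not given in the thread.
base = torch.full((1, 1, 3, 3), 1.0 / 9.0)

# repeat() copies the data 256 times; expand() returns a view with stride 0
# along the expanded dimension, so no new memory is allocated for the kernel.
lpf_repeat = base.repeat(256, 1, 1, 1)
lpf_expand = base.expand(256, -1, -1, -1)

# Size of the repeated kernel in kB (256 * 9 float32 values).
repeat_kib = lpf_repeat.nelement() * lpf_repeat.element_size() / 1024
# The expanded view still points at the 9 original float32 values.
expand_bytes = base.nelement() * base.element_size()

# Both kernels behave the same in a depthwise convolution.
x = torch.randn(1, 256, 8, 8)  # assumed input: 256 channels, 8x8 spatial size
out_repeat = F.conv2d(x, lpf_repeat, padding=1, groups=256)
out_expand = F.conv2d(x, lpf_expand, padding=1, groups=256)
same = torch.allclose(out_repeat, out_expand)

print(repeat_kib, expand_bytes, lpf_expand.stride(), same)
```

Keep in mind that the convolution backend may still make a contiguous copy of the expanded weight internally, so the actual saving can be even smaller than the number above suggests.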