CUDA running out of memory despite nvidia-smi saying the opposite

Thanks, but that will not help as

  1. I put the inputs there one by one, and
  2. as I stated, it works perfectly fine for several hundred iterations. Hence I think there must be a memory leak or memory that is not being released correctly.
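
One way to test the leak hypothesis would be to record used device memory after every iteration and check whether it creeps upward. The following is only a minimal sketch: `track_memory`, `grows_steadily`, `step`, and `read_used_bytes` are hypothetical names, not part of any library. `read_used_bytes` stands for whatever call reports used GPU memory in your setup (for example `torch.cuda.memory_allocated`, assuming PyTorch is the framework in use).

```python
from typing import Callable, List


def track_memory(step: Callable[[int], None],
                 read_used_bytes: Callable[[], int],
                 iterations: int) -> List[int]:
    """Run `step` for each iteration and record used memory afterwards.

    `read_used_bytes` is a hypothetical hook: pass in whatever function
    reports used device memory in your environment.
    """
    readings = []
    for i in range(iterations):
        step(i)
        readings.append(read_used_bytes())
    return readings


def grows_steadily(readings: List[int], window: int = 50) -> bool:
    """Return True if the mean of the last `window` readings is larger
    than the mean of the first `window` readings.

    With fewer than 2 * window readings there is not enough data,
    so the function returns False.
    """
    if len(readings) < 2 * window:
        return False
    first = sum(readings[:window]) / window
    last = sum(readings[-window:]) / window
    return last > first
```

If `grows_steadily` returns True over a few hundred iterations, that would be consistent with memory accumulating across iterations rather than with a single oversized input.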