Using volatile=True for prediction uses more GPU memory than volatile=False

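I think measuring GPU memory usage via nvidia-smi might be misleading (and might be the issue). We use a caching allocator and a few tricks, so nvidia-smi's numbers can't be taken at face value: memory freed by tensors stays cached for reuse instead of being returned to the driver, so nvidia-smi shows what the allocator has reserved, not what is actually in use.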
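To illustrate why the two numbers diverge, here is a toy model of a caching allocator in plain Python. It is a deliberate simplification and not PyTorch's actual allocator (no block splitting, size rounding, or streams). The class and method names are hypothetical. `reserved` stands in for what nvidia-smi would see, and `allocated` for what live tensors hold.

```python
class ToyCachingAllocator:
    """Hypothetical toy model: freed blocks go to a cache, not back to the driver."""

    def __init__(self):
        self.reserved = 0      # bytes obtained from the driver (roughly what nvidia-smi sees)
        self.allocated = 0     # bytes held by live blocks
        self.free_blocks = []  # cached block sizes available for reuse

    def malloc(self, size):
        # Reuse the smallest cached block that fits, if there is one.
        fits = [b for b in self.free_blocks if b >= size]
        if fits:
            block = min(fits)
            self.free_blocks.remove(block)
        else:
            block = size
            self.reserved += block  # only a cache miss asks the driver for more memory
        self.allocated += block
        return block

    def free(self, block):
        # The block stays in the cache, so `reserved` does not shrink.
        self.allocated -= block
        self.free_blocks.append(block)

    def empty_cache(self):
        # Hand all cached blocks back to the driver.
        self.reserved -= sum(self.free_blocks)
        self.free_blocks.clear()


alloc = ToyCachingAllocator()
a = alloc.malloc(100)
b = alloc.malloc(200)
alloc.free(a)
alloc.free(b)
print(alloc.allocated, alloc.reserved)  # 0 300
c = alloc.malloc(150)  # served from the cached 200-byte block, no new driver request
print(alloc.allocated, alloc.reserved)  # 200 300
```

After both frees nothing is in use, yet the "nvidia-smi view" still reports 300 bytes. In later PyTorch releases the two quantities can be read directly with `torch.cuda.memory_allocated()` and `torch.cuda.memory_reserved()` (`memory_cached()` in older versions), which is a more reliable way to compare `volatile=True` against `volatile=False` than nvidia-smi.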