CUDA: Out of memory error when using multi-gpu

@ptrblck Do you mean more memory compared to the other GPUs, or more memory compared to when only 1 GPU is used?
When I run my code with 1 GPU and batch size 16, it works. But when I run the same code with the same batch size on 2 GPUs (each with equal memory), I get an out-of-memory error, and it is raised on GPU 1 rather than GPU 0, which is strange because my default device is GPU 0.
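For context, here is a minimal sketch of the kind of setup I mean. I'm assuming the usual `nn.DataParallel` wrapping, and the toy model and random tensors below just stand in for my actual model and data:

```python
import torch
import torch.nn as nn

# Toy model standing in for my real network; default device is cuda:0.
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10))
device = torch.device("cuda:0")
model.to(device)

if torch.cuda.device_count() > 1:
    # DataParallel splits each batch of 16 across GPU 0 and GPU 1,
    # while the parameters live on GPU 0 and outputs are gathered there.
    model = nn.DataParallel(model)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

inputs = torch.randn(16, 1024, device=device)      # batch size 16
targets = torch.randint(0, 10, (16,), device=device)

outputs = model(inputs)    # with 2 GPUs, the OOM is reported on GPU 1 here
loss = criterion(outputs, targets)
loss.backward()
optimizer.step()
```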
My issue looks similar to one discussed here: