Unusually large memory usage for Conv2d with batch_size 1

PyTorch version 0.4.0, Python version 2.7, cuda-8.0, and cudnn-8.0-v6.

It also happens with PyTorch version 0.4.1.

import torch.nn as nn
import torch
import numpy as np

extract_local_feature = nn.Conv2d(

input = torch.Tensor(np.random.rand(1,256, 112, 112)).cuda()
import ipdb;

out = extract_local_feature(input)


The line out = extract_local_feature(input) gives me 3381MiB / 12196MiB memory usage.
This issue only happens with batch_size = 1.

Could someone help me fix this?
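For scale, the input tensor itself is small. The following is a quick back-of-the-envelope check, assuming the tensor is stored as float32 (4 bytes per element), which is the default dtype of torch.Tensor. The variable names are only illustrative.

```python
# Size of the input tensor from the snippet above: shape (1, 256, 112, 112).
# Assumption: float32 storage, i.e. 4 bytes per element.
shape = (1, 256, 112, 112)
bytes_per_element = 4

num_elements = 1
for dim in shape:
    num_elements *= dim

input_bytes = num_elements * bytes_per_element
input_mib = input_bytes / (1024.0 * 1024.0)

print("elements: %d" % num_elements)    # elements: 3211264
print("size: %.2f MiB" % input_mib)     # size: 12.25 MiB
```

So the input accounts for roughly 12 MiB of the 3381MiB reported; the rest has to come from somewhere else (weights, output, temporary buffers, cached memory, or the CUDA context itself).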

Could you try your code again with torch.from_numpy instead of directly wrapping the numpy array in a torch.Tensor?
Passing a NumPy array directly to the torch.Tensor constructor is not the recommended approach.
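A minimal sketch of that suggestion, assuming the same input shape as in the original snippet. Note that np.random.rand returns float64 and torch.from_numpy keeps that dtype, so the sketch casts to float32 to match the default dtype of a Conv2d layer's weights. The variable names arr and x are only illustrative.

```python
import numpy as np
import torch

# np.random.rand produces a float64 array.
arr = np.random.rand(1, 256, 112, 112)

# from_numpy keeps the dtype (float64), so cast to float32 explicitly.
x = torch.from_numpy(arr).float()

# Move to the GPU only if one is present, so the sketch also runs on CPU.
if torch.cuda.is_available():
    x = x.cuda()

print(x.dtype, tuple(x.shape))
```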

Thanks. It shows the same memory usage. (3381MiB)

I am also facing the same issue and can reproduce it with the provided code snippet. Could we get some help, please?

UPDATE: Running torch.cuda.empty_cache() after the operation reduces the memory usage for batch_size 1 to a reasonable amount. As OP said, the issue does not occur when running this snippet with batch size > 1.
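To check this on your own machine, here is a rough sketch that compares the memory held by live tensors with the memory held by PyTorch's caching allocator, before and after torch.cuda.empty_cache(). The Conv2d arguments are an assumption (the original snippet is cut off after `nn.Conv2d(`), and the helper name measure is hypothetical. Numbers from nvidia-smi will be higher than these, because nvidia-smi also counts the CUDA context.

```python
import torch
import torch.nn as nn


def measure():
    """Return memory readings in MiB, or None when no GPU is available."""
    if not torch.cuda.is_available():
        return None

    # Newer releases call this memory_reserved; 0.4.x calls it memory_cached.
    reserved_fn = getattr(torch.cuda, "memory_reserved", None) or torch.cuda.memory_cached
    mib = 1024.0 * 1024.0

    # ASSUMPTION: these layer arguments are placeholders, not the OP's values.
    conv = nn.Conv2d(256, 256, kernel_size=3, padding=1).cuda()
    x = torch.rand(1, 256, 112, 112).cuda()

    out = conv(x)
    torch.cuda.synchronize()  # make sure the kernel has actually finished

    readings = {
        "allocated_after_forward": torch.cuda.memory_allocated() / mib,
        "reserved_after_forward": reserved_fn() / mib,
    }

    # Hand unused cached blocks back to the driver; live tensors are kept.
    torch.cuda.empty_cache()
    readings["allocated_after_empty_cache"] = torch.cuda.memory_allocated() / mib
    readings["reserved_after_empty_cache"] = reserved_fn() / mib
    return readings


result = measure()
if result is None:
    print("No CUDA device available; nothing to measure.")
else:
    for name in sorted(result):
        print("%s: %.1f MiB" % (name, result[name]))
```

If the reserved figure drops sharply after empty_cache() while the allocated figure stays about the same, the extra memory was cached by the allocator and not held by live tensors, which would be consistent with the update above.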