Altering the weight parameters directly causes GPU OOM

I am experimenting with adding (and eventually removing) neurons in a network using the function below, which directly alters the weight parameters of a layer.

def add_neuron(self):
    with torch.no_grad():
        # Append a zero-initialised row to fc1's weight (one new output neuron) and extend its bias
        self.fc1.weight = nn.Parameter(torch.vstack([self.fc1.weight, torch.zeros([1, input_size])]))
        self.fc1.bias = nn.Parameter(torch.hstack([self.fc1.bias, torch.zeros(1)]))
        self.fc1.out_features += 1
        # Append a zero-initialised column to fc2's weight so it accepts the extra input
        self.fc2.weight = nn.Parameter(torch.hstack([self.fc2.weight, torch.zeros([num_classes, 1])]))
        self.fc2.in_features += 1

When I do this it quickly runs out of memory and crashes, even though the weight matrices are small. GPU memory usage grows far beyond what the weights should need, as shown in the summary below.

Any ideas what is causing this?
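For reference, the summary below is the output of torch.cuda.memory_summary(). A minimal sketch of how to print it, assuming a CUDA device 0 is available:

    import torch

    # Print the allocator statistics for device 0
    print(torch.cuda.memory_summary(device=0))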

|===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 1            |        cudaMalloc retries: 2         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |    6926 MB |    6926 MB |    9953 MB |    3027 MB |
|       from large pool |    6543 MB |    6543 MB |    6543 MB |       0 MB |
|       from small pool |     383 MB |     384 MB |    3410 MB |    3027 MB |
|---------------------------------------------------------------------------|
| Active memory         |    6926 MB |    6926 MB |    9953 MB |    3027 MB |
|       from large pool |    6543 MB |    6543 MB |    6543 MB |       0 MB |
|       from small pool |     383 MB |     384 MB |    3410 MB |    3027 MB |
|---------------------------------------------------------------------------|
| GPU reserved memory   |    7484 MB |    7484 MB |    8248 MB |  782336 KB |
|       from large pool |    7100 MB |    7100 MB |    7100 MB |       0 KB |
|       from small pool |     384 MB |     386 MB |    1148 MB |  782336 KB |
|---------------------------------------------------------------------------|
| Non-releasable memory |  570482 KB |  570678 KB |   10171 MB |    9614 MB |
|       from large pool |  569488 KB |  569488 KB |    5977 MB |    5421 MB |
|       from small pool |     993 KB |   47461 KB |    4194 MB |    4193 MB |
|---------------------------------------------------------------------------|
| Allocations           |    8310    |    8317    |   62207    |   53897    |
|       from large pool |    2351    |    2351    |    2351    |       0    |
|       from small pool |    5959    |    5966    |   59856    |   53897    |
|---------------------------------------------------------------------------|
| Active allocs         |    8310    |    8317    |   62207    |   53897    |
|       from large pool |    2351    |    2351    |    2351    |       0    |
|       from small pool |    5959    |    5966    |   59856    |   53897    |
|---------------------------------------------------------------------------|
| GPU reserved segments |     547    |     547    |     929    |     382    |
|       from large pool |     355    |     355    |     355    |       0    |
|       from small pool |     192    |     193    |     574    |     382    |
|---------------------------------------------------------------------------|
| Non-releasable allocs |     849    |     850    |   43588    |   42739    |
|       from large pool |     237    |     237    |     355    |     118    |
|       from small pool |     612    |     614    |   43233    |   42621    |
|---------------------------------------------------------------------------|
| Oversize allocations  |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Oversize GPU segments |       0    |       0    |       0    |       0    |
|===========================================================================|

I could just preallocate a larger layer and use a mask (roughly as sketched below), I guess, but I don't understand why the direct approach doesn't work.
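A rough, untested sketch of what I mean by the mask approach; MaskedLinear, max_out and active are made-up names, and a fixed upper bound on the number of neurons is assumed:

    import torch
    import torch.nn as nn

    class MaskedLinear(nn.Module):
        # Preallocate max_out output neurons and mask off the inactive ones
        def __init__(self, in_features, max_out, active):
            super().__init__()
            self.linear = nn.Linear(in_features, max_out)
            # Boolean mask over output neurons, stored as a buffer so it follows .to(device)
            mask = torch.zeros(max_out, dtype=torch.bool)
            mask[:active] = True
            self.register_buffer("mask", mask)

        def add_neuron(self):
            # "Grow" by activating the next preallocated neuron; no reallocation happens
            idx = int(self.mask.sum())
            if idx < self.mask.numel():
                self.mask[idx] = True

        def forward(self, x):
            # Zero the contribution of inactive neurons
            return self.linear(x) * self.mask.to(x.dtype)

The obvious downside is that the full max_out x in_features weight is allocated (and updated by the optimizer) from the start.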

I figured out a solution: requires_grad needs to be turned off on the old parameters before rebuilding them, otherwise memory keeps growing.

    with torch.no_grad():
        # Disable grad tracking on the old parameters before concatenating
        self.fc1.bias.requires_grad = False
        self.fc1.weight.requires_grad = False
        self.fc2.weight.requires_grad = False

        # Rebuild the parameters with one extra neuron, as before
        self.fc1.weight = nn.Parameter(torch.vstack([self.fc1.weight, torch.zeros([1, input_size])]))
        self.fc1.bias = nn.Parameter(torch.hstack([self.fc1.bias, torch.zeros(1)]))
        self.fc1.out_features += 1
        self.fc2.weight = nn.Parameter(torch.hstack([self.fc2.weight, torch.zeros([num_classes, 1])]))
        self.fc2.in_features += 1

        # Re-enable grad tracking on the new parameters
        self.fc1.bias.requires_grad = True
        self.fc1.weight.requires_grad = True
        self.fc2.weight.requires_grad = True
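
A quick sanity check after calling it (a sketch; `model` is a hypothetical instance of a network with fc1/fc2 built as above):

    model.add_neuron()
    print(model.fc1.weight.shape)          # one extra output row on fc1
    print(model.fc2.weight.shape)          # one extra input column on fc2
    print(model.fc1.weight.requires_grad)  # True again after the reassignment

One thing to keep in mind: if an optimizer was created before resizing, it still holds references to the old Parameter objects, so it would need to be rebuilt to train (and free) the replaced weights.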