Model predictions get slower after each batch

I have trained a neural network model that generates embeddings for the input data. Training went fine.

Now I want to generate embeddings for the training dataset using the trained model. My current implementation iterates over a dataloader for the training dataset, passes each batch to the model, and appends the resulting embeddings to a numpy array.

import numpy as np
import torch
from tqdm import tqdm

model.eval()
FX = np.zeros((1, 300))   # embedding size is 300
print('\nGenerating Embeddings..')

with torch.no_grad():

    for bi, data in tqdm(enumerate(dataloader), total=len(dataloader)):

        ids            = data["ids"]
        mask           = data["mask"]
        token_type_ids = data["token_type_ids"]
        target         = data["targets"]        # This is a one-hot label row
        target         = target.squeeze(1)

        ids            = ids.to(config.device, dtype=torch.long)
        mask           = mask.to(config.device, dtype=torch.long)
        token_type_ids = token_type_ids.to(config.device, dtype=torch.long)

        output         = model(ids, mask, token_type_ids)

        FX             = np.append(FX, output.detach().cpu().numpy(), 0)

The problem I'm facing is that the loop gets slower with every batch, and the slowdown comes solely from storing the model output in FX. I am already detaching the model output from the computation graph and converting it to a numpy array, yet GPU memory keeps rising and each batch takes longer than the previous one.

When I comment out FX = np.append(FX, output.detach().cpu().numpy(), 0), the loop no longer slows down, so that line is clearly the cause.

How can I store the embeddings from the model so that the loop doesn't keep slowing down?

@theGuyWithBlackTie This looks more like a NumPy problem than a PyTorch problem.

e.g.

import time
import numpy as np

X = None
batch_size = 1024
embedding_size = 5000
for i in range(100):
    start_time = time.time()
    if X is None:
        X = np.random.rand(batch_size, embedding_size)
    else:
        # np.append builds and returns a new, larger array every time
        X = np.append(X, np.random.rand(batch_size, embedding_size), 0)
    if i % 10 == 0:
        print("Shape is {0} at time {1}".format(X.shape, time.time() - start_time))
Shape is (82944, 5000) at time 2.2030396461486816
Shape is (83968, 5000) at time 1.750520944595337
Shape is (84992, 5000) at time 1.9004521369934082
Shape is (86016, 5000) at time 2.32991361618042
Shape is (87040, 5000) at time 2.1326563358306885
Shape is (88064, 5000) at time 2.3492512702941895
Shape is (89088, 5000) at time 2.04399037361145
Shape is (90112, 5000) at time 1.9996109008789062
Shape is (91136, 5000) at time 1.681365728378296
Shape is (92160, 5000) at time 2.7774124145507812
Shape is (93184, 5000) at time 29.736555099487305
Shape is (94208, 5000) at time 60.32349133491516
Shape is (95232, 5000) at time 93.5995888710022
Shape is (96256, 5000) at time 68.51450300216675
Shape is (97280, 5000) at time 101.32002997398376
Shape is (98304, 5000) at time 26.734136819839478
Shape is (99328, 5000) at time 27.648850679397583
Shape is (100352, 5000) at time 93.45121455192566
Shape is (101376, 5000) at time 27.221306562423706
Shape is (102400, 5000) at time 99.91760969161987
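
The reason is that np.append never grows an array in place: every call allocates a fresh array and copies both the existing data and the new batch into it, so the cost of each append grows with everything accumulated so far. A tiny illustration of the copying behaviour (the variable names here are just for the example):

import numpy as np

a = np.zeros((4, 3))
b = np.append(a, np.ones((2, 3)), axis=0)
print(b.shape)      # (6, 3)
print(b is a)       # False: append returned a brand-new array
a[0, 0] = 42.0
print(b[0, 0])      # 0.0: b holds its own copy of the data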

A possible solution is to collect the per-batch arrays in a Python list and stack them in one go:

import numpy as np
batch_size=1024
embedding_size=5000
%time data = [np.random.rand(batch_size, embedding_size) for x in range(100)]
%time np.stack(data).shape
CPU times: user 3.84 s, sys: 944 ms, total: 4.79 s
Wall time: 7.07 s
CPU times: user 944 ms, sys: 2.66 s, total: 3.6 s
Wall time: 1min 11s

(100, 1024, 5000)
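
Applied to the loop from the question, a minimal sketch could look like the following (it assumes the same model, dataloader and config objects as above, and uses np.concatenate instead of np.stack so FX keeps the flat (num_samples, 300) shape even if the last batch is smaller):

import numpy as np
import torch
from tqdm import tqdm

model.eval()
all_embeddings = []                                   # one numpy array per batch

with torch.no_grad():
    for bi, data in tqdm(enumerate(dataloader), total=len(dataloader)):
        ids            = data["ids"].to(config.device, dtype=torch.long)
        mask           = data["mask"].to(config.device, dtype=torch.long)
        token_type_ids = data["token_type_ids"].to(config.device, dtype=torch.long)

        output = model(ids, mask, token_type_ids)
        all_embeddings.append(output.cpu().numpy())   # list append is cheap; no array copy

FX = np.concatenate(all_embeddings, axis=0)           # single copy at the end, shape (N, 300)

Appending to the list only stores a reference to each batch's array, so the per-batch cost stays constant, and the one unavoidable copy happens once at the end.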

Thanks.
It solved my problem.