Dealing with very large saved features


My model is a transformer and I have a single GPU, so naturally I have CUDA memory constraints. I’ve saved Faster RCNN features in TSV format, and the file is 40 GB. When training starts, all the features are loaded into CPU RAM and I run out of memory. I’ve set num_workers=1, by the way. Another option is to obtain features from Faster RCNN in real time, but the transformer and faster_rcnn can’t both fit in my GPU memory. Is there an efficient way of obtaining features from the TSV file?
My code looks like:

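import base64
import csv

import numpy as np
from tqdm import tqdm

# FIELDNAMES (the list of TSV column names) is defined elsewhere.

def load_tsv(fname):  # the function header was not in my snippet; this name is a placeholder
    data = []
    with open(fname) as f:
        reader = csv.DictReader(f, FIELDNAMES, delimiter="\t")
        for i, item in tqdm(enumerate(reader)):
            # Convert the integer metadata columns.
            for key in ['img_h', 'img_w', 'num_boxes']:
                item[key] = int(item[key])
            boxes = item['num_boxes']
            # (column, shape, dtype) for each base64-encoded array.
            decode_config = [
                ('objects_id', (boxes, ), np.int64),
                ('objects_conf', (boxes, ), np.float32),
                ('attrs_id', (boxes, ), np.int64),
                ('attrs_conf', (boxes, ), np.float32),
                ('boxes', (boxes, 4), np.float32),
                ('features', (boxes, -1), np.float32),
            ]
            for key, shape, dtype in decode_config:
                item[key] = np.frombuffer(base64.b64decode(item[key]), dtype=dtype)
                item[key] = item[key].reshape(shape)
            # Every decoded row is kept, so the whole file ends up in RAM.
            data.append(item)
    return data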
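
One direction that might avoid holding the whole file in RAM is to scan the TSV once, record the byte offset of every row, and then read and decode a single row on demand. The sketch below is only an illustration under assumptions: the class name `LazyTSVDataset` and the `decode` callback are hypothetical, and it assumes each record sits on exactly one line with no tabs or newlines inside a field (which holds for base64 text). The per-row decoding from the code above would go into `decode`.

```python
class LazyTSVDataset:
    """Hypothetical map-style dataset: keeps only row offsets in memory."""

    def __init__(self, fname, fieldnames, decode=None):
        self.fname = fname
        self.fieldnames = fieldnames
        self.decode = decode      # optional per-row function (assumption)
        self.offsets = []         # byte offset of the start of each row
        self._f = None            # opened lazily, one handle per process
        with open(fname, "rb") as f:
            pos = f.tell()
            line = f.readline()
            while line:
                if line.strip():  # skip blank lines
                    self.offsets.append(pos)
                pos = f.tell()
                line = f.readline()

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, i):
        if self._f is None:
            self._f = open(self.fname, "rb")
        # Jump straight to row i and read just that line.
        self._f.seek(self.offsets[i])
        line = self._f.readline().decode("utf-8").rstrip("\r\n")
        item = dict(zip(self.fieldnames, line.split("\t")))
        return self.decode(item) if self.decode else item
```

Because the object implements `__len__` and `__getitem__`, it should be usable as a map-style dataset with `torch.utils.data.DataLoader`, so only the rows of the current batch are decoded at any time. The initial scan still reads the full 40 GB once, so saving `offsets` to disk after the first run may be worthwhile.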