OpenCLIP: Visual Encoder Isolation, Memory Management

I am working with pretrained models from OpenCLIP.

Once I load a CLIP model, does doing this free the rest of its components (e.g. the text encoder) from GPU memory?

import open_clip

model, _, _ = open_clip.create_model_and_transforms('ViT-B-32')  # model name only as an example
model = model.visual
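
For reference, here is a rough sketch of how the effect could be measured with torch.cuda.memory_allocated() (the CUDA device and the 'ViT-B-32' config are just assumptions for illustration):

import gc
import torch
import open_clip

device = 'cuda'
model, _, _ = open_clip.create_model_and_transforms('ViT-B-32')
model = model.to(device)
print(torch.cuda.memory_allocated(device))  # full model: visual tower + text tower

model = model.visual   # rebinding the name drops the only reference to the full model
gc.collect()           # make sure anything unreferenced (including reference cycles) is collected
print(torch.cuda.memory_allocated(device))  # live tensors should now be just the visual tower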

Should I instead be doing something like the following:

import gc

import torch
import open_clip

full_model, _, _ = open_clip.create_model_and_transforms('ViT-B-32')  # model name only as an example

# Isolate the visual component
visual_model = full_model.visual

# Create a new state dict with only the visual component
visual_state_dict = {k: v for k, v in full_model.state_dict().items() if k.startswith('visual.')}

# Create a new model with only the visual component
isolated_model = torch.nn.Module()
isolated_model.visual = visual_model

# Load the visual state dict into the new model
isolated_model.load_state_dict(visual_state_dict, strict=False)

# Delete references to the full model and unused variables
del full_model, visual_model, visual_state_dict

# Force garbage collection
gc.collect()

# Clear CUDA cache if using GPU
if torch.cuda.is_available():
    torch.cuda.empty_cache()
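
As a quick smoke test that the isolated module still works after the full model is deleted, something like this (the 224x224 input size is an assumption matching the ViT-B-32 default; real inputs would go through the preprocess transform):

# Dummy batch instead of a preprocessed image, just to exercise the forward pass
dummy = torch.randn(1, 3, 224, 224)

isolated_model.eval()
with torch.no_grad():
    features = isolated_model.visual(dummy)  # only the visual tower runs
print(features.shape)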

Is there a better way to do this?

I did the following and it worked: my code now runs, whereas it didn't before. Any insights on torch memory management that might be useful here?

import copy

full_model, _, _ = open_clip.create_model_and_transforms('ViT-B-32')  # model name only as an example
visual_model = copy.deepcopy(full_model.visual)
del full_model
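
For completeness, here is the fuller version of that pattern with the memory-management calls added and a check on what actually ends up allocated; the device handling and the 'ViT-B-32' model name are assumptions on my side:

import copy
import gc

import torch
import open_clip

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Build on CPU, copy out the visual tower, then drop the full model
# before anything is moved to the GPU.
full_model, _, _ = open_clip.create_model_and_transforms('ViT-B-32')
visual_model = copy.deepcopy(full_model.visual)
del full_model
gc.collect()  # reclaim the CPU-side full model right away

visual_model = visual_model.to(device).eval()

if device == 'cuda':
    torch.cuda.empty_cache()                    # return cached, unused blocks to the driver
    print(torch.cuda.memory_allocated(device))  # live tensors: just the visual tower
    print(torch.cuda.memory_reserved(device))   # what the caching allocator holds in total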