Method for better utilization of GPU memory for K-means clustering

I have implemented the K-means clustering algorithm on a GPU using PyTorch. I have a Tesla K80 GPU (11 GB of memory).

I have used the following methods to increase the number of data points and clusters I can handle.

  • Explicitly deleting variables once they go out of scope, which releases GPU memory that is no longer in use.

  • Using half-precision floating point. (A rough sketch of both ideas is shown after this list.)
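
For illustration, the two ideas above look roughly like this (a minimal sketch only; the tensor names and sizes are placeholders, not my actual code):

```python
import torch

device = torch.device("cuda")

# Half-precision storage halves the memory needed for the data and centroids.
points = torch.randn(1_000_000, 2, device=device, dtype=torch.float16)
centroids = torch.randn(500, 2, device=device, dtype=torch.float16)

# One assignment step: pairwise distances, then nearest centroid.
# Upcast to float32 for the distance computation to avoid fp16 precision issues.
distances = torch.cdist(points.float(), centroids.float())
labels = distances.argmin(dim=1)

# Drop the large intermediate as soon as it is no longer needed so the caching
# allocator can reuse its memory; empty_cache() additionally returns cached
# blocks to the driver.
del distances
torch.cuda.empty_cache()
```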

Still, I was only able to cluster a maximum of 3 million data points into 500 clusters. You can find my code below.

Is there a better method supported by PyTorch to use GPU memory mostly for computation while the data is streamed from the CPU to the GPU? That way, while one part of the data (a matrix) is being transferred to the GPU, computation can run on another part, and the whole dataset never has to be resident on the GPU at once. I would then be able to cluster more data points into more clusters. Please let me know if this is possible.
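
Roughly, the kind of overlap I have in mind looks like the sketch below: the assignment step is processed in chunks using pinned host memory, `non_blocking=True` copies, and a separate CUDA stream, so the transfer of the next chunk can overlap with the distance computation on the current one. This is only an illustration; `chunk_size`, `prefetch`, and all tensor names are placeholders, not my actual code.

```python
import torch

device = torch.device("cuda")
copy_stream = torch.cuda.Stream()

n_points, n_dims, n_clusters, chunk_size = 2_000_000, 2, 500, 200_000
points_cpu = torch.randn(n_points, n_dims).pin_memory()  # pinned memory enables async copies
centroids = torch.randn(n_clusters, n_dims, device=device)
labels = torch.empty(n_points, dtype=torch.long)

def prefetch(start):
    # Copy one chunk host -> device on the side stream without blocking compute.
    end = min(start + chunk_size, n_points)
    with torch.cuda.stream(copy_stream):
        return points_cpu[start:end].to(device, non_blocking=True)

next_chunk = prefetch(0)
for start in range(0, n_points, chunk_size):
    # Make sure the prefetched chunk has arrived, then claim it on the compute stream.
    torch.cuda.current_stream().wait_stream(copy_stream)
    chunk = next_chunk
    chunk.record_stream(torch.cuda.current_stream())

    # Start copying the following chunk while we compute on the current one.
    if start + chunk_size < n_points:
        next_chunk = prefetch(start + chunk_size)

    # Nearest-centroid assignment for the resident chunk only.
    end = min(start + chunk_size, n_points)
    labels[start:end] = torch.cdist(chunk, centroids).argmin(dim=1).cpu()
```

This only covers the assignment step; the centroid update could be accumulated chunk by chunk inside the same loop so the full dataset still never needs to sit on the GPU.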

Any help on this issue is appreciated.


Check out this GitHub repo.

You can find the documentation here.

On a GPU (in Google Colab), clustering 10 million 2D samples into 3 clusters takes about 25 seconds.