How can I get the data from a DataLoader so I can do clustering?

I am a beginner. I want to cluster my image dataset. How can I get the data out of the DataLoader so that I can pass it to KMeans?

import torch
from torchvision import datasets, transforms

tfms = transforms.Compose([
    transforms.Resize((sz, sz)),  # PIL Image
    transforms.ToTensor(),  # Tensor
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

train_ds = datasets.ImageFolder(trn_dir, transform=tfms)
train_dl = torch.utils.data.DataLoader(train_ds, batch_size=batch_size,
                                       shuffle=True, num_workers=0)

from sklearn.cluster import KMeans
num_class = 2
kmeans = KMeans(n_clusters=num_class, random_state=0).fit(???)
center = kmeans.cluster_centers_

The fit() method expects a NumPy array holding the complete dataset, with shape (n_samples, n_features), so you would need to collect all samples from the DataLoader (flattening each image to a 1-D vector) before feeding them to KMeans().fit().
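Something like the following might work as a starting point for collecting everything first. It is only a sketch: demo_dl is a made-up stand-in for your train_dl (random tensors instead of real images), and collect_features is a hypothetical helper name, not part of PyTorch or scikit-learn.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset
from sklearn.cluster import KMeans

# Hypothetical stand-in for train_dl: 40 random "images" of shape 3x8x8.
torch.manual_seed(0)
fake_images = torch.randn(40, 3, 8, 8)
fake_labels = torch.zeros(40, dtype=torch.long)
demo_dl = DataLoader(TensorDataset(fake_images, fake_labels),
                     batch_size=16, shuffle=False)


def collect_features(loader):
    """Iterate once over the loader and return an (N, C*H*W) NumPy array."""
    chunks = []
    for images, _ in loader:  # ImageFolder also yields (image, label) pairs
        # Flatten each image: (B, C, H, W) -> (B, C*H*W)
        chunks.append(images.reshape(images.size(0), -1).numpy())
    return np.concatenate(chunks, axis=0)


features = collect_features(demo_dl)
# n_init is set explicitly so the behaviour does not depend on the sklearn version.
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(features)
centers = kmeans.cluster_centers_  # one row per cluster, C*H*W columns
```

With your own loader you would call collect_features(train_dl) instead. Keep in mind that the whole array has to fit in memory, and that raw pixels are often a weak feature for clustering images.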
Alternatively, if you would rather feed one batch at a time, you could use MiniBatchKMeans and call its partial_fit method on each batch.
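A rough sketch of the batch-wise variant could look like this. Again, demo_dl is an invented stand-in for your train_dl filled with random tensors, so treat the shapes and sizes as assumptions.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from sklearn.cluster import MiniBatchKMeans

# Hypothetical stand-in for train_dl: 40 random "images" of shape 3x8x8.
torch.manual_seed(0)
demo_ds = TensorDataset(torch.randn(40, 3, 8, 8),
                        torch.zeros(40, dtype=torch.long))
demo_dl = DataLoader(demo_ds, batch_size=16, shuffle=True)

mbk = MiniBatchKMeans(n_clusters=2, random_state=0, n_init=3)

for images, _ in demo_dl:
    # Flatten each image: (B, C, H, W) -> (B, C*H*W)
    batch = images.reshape(images.size(0), -1).numpy()
    # Update the centroids using only this batch.
    mbk.partial_fit(batch)

centers = mbk.cluster_centers_

# Cluster assignments for one batch, using the fitted centroids.
sample_images, _ = next(iter(demo_dl))
sample_labels = mbk.predict(sample_images.reshape(sample_images.size(0), -1).numpy())
```

Only one batch is held in memory at a time this way. Each batch should contain at least n_clusters samples, since the centroids are initialised from the first batch.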
