SCAN for clustering in NLP

sans_sense · July 18, 2021, 2:50pm

For text clustering have found technique introduced in SCAN paper for image classification very useful. I have used BERT for pretext and SCANloss for classification. Have wrote findings in my paper. Has anyone else experimented with SCAN for text clustering. Unlike k-means we get balanced clusters and a confidence value instead of distance. Distance within a cluster turns a bit subjective, while confidence learned by the system incorporates it already. Everything is implemented in pytorch, this post is a bit general, sorry if it is the wrong forum.