Weights of `CrossEntropyLoss` and biasing toward the rare event

I have a VAE that takes molecular graphs as input and is trained to generate molecular graphs as output. The output is a vector of scores over the different classes for each node and each edge. The edge classes are the bond types [single, double, triple, or no edge], and the node classes are the atom types. To train this model, I compare these scores with target labels using torch.nn.CrossEntropyLoss. Since the number of single bonds in the training dataset is much higher than the number of double and triple bonds, I am trying to use weights in the loss function. I set these weights as a normalized vector with w_class = (1 / total number of occurrences of that class). My expectation is that when a class occurs rarely in the dataset, the algorithm is penalized more for mispredicting it, and hence my model trains better.
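For concreteness, here is a minimal sketch of the weighting scheme just described; the class counts and the toy batch of edge scores are hypothetical, not my real data:

```python
import torch
import torch.nn as nn

# Hypothetical edge-class counts from a training set,
# in the order [single, double, triple, no edge]
class_counts = torch.tensor([100.0, 10.0, 5.0, 200.0])

# Inverse-frequency weights, normalized to sum to 1
weights = 1.0 / class_counts
weights = weights / weights.sum()

criterion = nn.CrossEntropyLoss(weight=weights)

# Toy batch: scores for 6 edges over the 4 classes, plus target labels
logits = torch.randn(6, 4)
targets = torch.tensor([0, 0, 3, 1, 3, 0])
loss = criterion(logits, targets)
```

With `weight=` set, `CrossEntropyLoss` scales each sample's negative log-likelihood by the weight of its target class before averaging.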

In reality, instead of better training, this biases the output toward the rare classes: when I generate outputs from latent-space samples, the occurrence of the rare classes increases significantly. For instance, if my dataset had molecular graphs with the edge-class counts [single=100, double=10, triple=5, no edge=200], then when generating samples from the z space I get molecular graphs whose edges are mostly of class 'double'.
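To quantify how lopsided the weights become with these example counts, here is a small calculation (plain Python, using the counts quoted above):

```python
# Edge-class counts from the example above
counts = {"single": 100, "double": 10, "triple": 5, "no edge": 200}

# Normalized inverse-frequency weights, as in the scheme described earlier
total = sum(1.0 / c for c in counts.values())
weights = {name: (1.0 / c) / total for name, c in counts.items()}

# The two rare bond classes end up carrying ~95% of the total loss weight,
# so the gradient signal is dominated by 'double' and 'triple' edges
rare_share = weights["double"] + weights["triple"]
```

So each individual 'triple' or 'double' misprediction costs far more than a 'single' or 'no edge' one, which is consistent with the generator drifting toward those rare classes.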