Should my weight vector be calculated from the overall distribution, or from the distribution of the current batch at the specific iteration where I call the loss function?
You should calculate the weights from the class distribution of your whole training set, not from the current batch.
How should the weighting vector look? If I have 100 examples and the distribution looks like this 1: 10, 2: 10, 3: 10, 4: 50, 5: 20, would I want: [1/10 1/10 1/10 1/50 1/20] or [10 10 10 50 20], and why?
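You should be weighting in the inverse ratio. By that, I mean classes with more examples should get a smaller weight, because you want to make sure the loss also goes down for the rarer classes. So of your two options, [1/10 1/10 1/10 1/50 1/20] has the right shape, while [10 10 10 50 20] would push the loss even further toward the classes that are already common. For example, for classes A and B with 90 and 10 samples respectively, the weights I would use are 0.1 for A and 0.9 for B.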
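As a minimal sketch of the idea above, the weights can be computed once from the training labels. The helper name `inverse_frequency_weights` is hypothetical, and normalizing the weights so they sum to 1 is an assumption made here so that the two-class case reproduces the 0.1 / 0.9 figures; an unnormalized 1/count vector expresses the same relative weighting.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Return {class: weight}, with weight proportional to 1 / class count.

    Hypothetical helper: weights are normalized to sum to 1 (an assumption,
    not a requirement of any particular loss function).
    """
    counts = Counter(labels)
    classes = sorted(counts)
    inverse = [1.0 / counts[c] for c in classes]
    total = sum(inverse)
    return {c: w / total for c, w in zip(classes, inverse)}

# Two-class case from the answer: 90 samples of A, 10 of B.
ab_weights = inverse_frequency_weights(["A"] * 90 + ["B"] * 10)

# Five-class case from the question: counts 10, 10, 10, 50, 20.
train_labels = [1] * 10 + [2] * 10 + [3] * 10 + [4] * 50 + [5] * 20
class_weights = inverse_frequency_weights(train_labels)

# Ordered by class id, ready to pass as a per-class weight vector.
weight_vector = [class_weights[c] for c in sorted(class_weights)]
```

The resulting `weight_vector` stays fixed for the whole run, so it is computed once up front and reused at every iteration rather than recomputed per batch. If your framework's loss accepts a per-class weight argument, this is the vector you would hand to it.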