Using PyTorch to Build Relative Histograms of Text Data

Hello all,

I am entirely new to PyTorch, but I am familiar with machine learning from a conceptual perspective. I would like to use PyTorch for a text classification program. Basically, I have lists of words from several documents. I would like to turn these lists into a relative histogram for each document, where each histogram entry is the number of times a given word appears in that document. Since I would like to use each histogram entry as a dimension, a word that appears in other documents but not in the document a histogram belongs to should still get an entry, with a count of 0.
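To make the target concrete, here is a minimal sketch of the kind of histogram I mean. The toy documents and the names `docs`, `vocab`, `word_to_index`, and `count_histogram` are made up for illustration, and I am assuming that `torch.bincount` with `minlength` is a reasonable way to get the zero entries; I am not sure it is the recommended approach.

```python
import torch

# Hypothetical toy data: each document is already a list of words.
docs = {
    "doc_a": ["cat", "dog", "cat"],
    "doc_b": ["dog", "fish"],
}

# One shared vocabulary across all documents, so every histogram
# has the same dimensions in the same order.
vocab = sorted({word for words in docs.values() for word in words})
word_to_index = {word: i for i, word in enumerate(vocab)}


def count_histogram(words):
    """Return a 1-D tensor of word counts aligned with `vocab`."""
    ids = torch.tensor([word_to_index[w] for w in words], dtype=torch.long)
    # minlength pads with zeros for vocabulary words missing from this document.
    return torch.bincount(ids, minlength=len(vocab))


histograms = {name: count_histogram(words) for name, words in docs.items()}

# Dividing by the document length turns raw counts into relative frequencies.
relative = {name: h.float() / h.sum() for name, h in histograms.items()}
```

Here the word-to-index mapping is plain Python, and only the counting step uses PyTorch.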

From there, I would like to use Manhattan distance to compare test documents. How would I go about getting started with this? I don’t expect this to be spoon-fed to me, but I have found the documentation confusing enough that I am not sure where to begin. Can I even build the histograms using PyTorch, or do I have to build them on my own and then pass them off to PyTorch?
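For the comparison step, this is roughly what I picture, as a sketch under my own assumptions. The tensors `train` and `test` are made-up count histograms over a shared three-word vocabulary, and `manhattan` is a hypothetical helper. As far as I can tell, `torch.cdist` with `p=1` computes the same L1 distance for all pairs at once.

```python
import torch

# Hypothetical count histograms over a shared 3-word vocabulary.
train = torch.tensor([[2.0, 1.0, 0.0],
                      [0.0, 1.0, 1.0]])
test = torch.tensor([[1.0, 1.0, 0.0]])


def manhattan(a, b):
    """Manhattan (L1) distance: sum of absolute differences per dimension."""
    return (a - b).abs().sum(dim=-1)


# Distance between one test document and one training document.
d_single = manhattan(test[0], train[0])

# All pairwise L1 distances at once; result has shape (num_test, num_train).
d_all = torch.cdist(test, train, p=1)

# Index of the closest training document for each test document.
nearest = d_all.argmin(dim=1)
```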

Thank you in advance.

Update: Based on this post, it sounds like torch.histc can be used to do what I want. Is that correct?
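
From what I can tell, torch.histc bins numeric values rather than words, so the words would first have to be mapped to integer indices. Below is a sketch of how I think that would look, assuming one unit-width bin per vocabulary index. The indices and `vocab_size` are made up. `torch.bincount` appears to give the same counts directly for integer indices, so it may be the more natural fit.

```python
import torch

# Hypothetical word indices for one document over a 4-word vocabulary.
vocab_size = 4
ids = torch.tensor([0, 2, 2, 3])

# histc works on floating-point values; with min=0, max=vocab_size and
# bins=vocab_size, each bin has width 1 and index i lands in bin i.
counts_histc = torch.histc(ids.float(), bins=vocab_size, min=0, max=vocab_size)

# bincount counts integer indices directly; minlength keeps zero entries.
counts_bincount = torch.bincount(ids, minlength=vocab_size)
```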