Learning from label proportions

Can someone please provide a simple implementation of this based on simple tensors and not large image data? I cannot understand how this works when looking at their code.