Superb! Should I apply both norm_landmarks and torch.div by the max value of the raw landmarks?
norm_landmarks = transforms.Normalize(0.4949, 0.2165)
landmarks = landmarks.unsqueeze(0)
landmarks = norm_landmarks(landmarks)
# 743 is the max value in the raw landmarks
landmarks = torch.div(landmarks, 743.)
predictions = network(images)
With the above code, I still end up getting NaNs after the first step.
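For reference, here is a minimal sketch of how the two scalings could be ordered and checked. It rests on an assumption not stated above: that the 0.4949 / 0.2165 statistics were computed on landmarks already divided by 743, so the division would have to come first. The sample tensor and the `scale_landmarks` helper are hypothetical, and plain tensor arithmetic is used in place of transforms.Normalize to keep the example self-contained.

```python
import torch

# Hypothetical stand-in for one sample's raw landmarks (pixel units, max 743).
raw_landmarks = torch.tensor([[120.0, 743.0], [371.5, 10.0]])

def scale_landmarks(raw, max_value=743.0, mean=0.4949, std=0.2165):
    # Assumption: mean and std were computed on landmarks already
    # divided by max_value, so the division is applied first.
    scaled = raw / max_value        # values now lie in [0, 1]
    return (scaled - mean) / std    # then standardize with the given stats

landmarks = scale_landmarks(raw_landmarks)

# Sanity check before the forward pass and loss: targets hold no NaN or inf.
all_finite = bool(torch.isfinite(landmarks).all())
print(all_finite, landmarks.min().item(), landmarks.max().item())
```

If the targets are finite and in a small range like this, the NaNs would more likely come from somewhere else, such as the images, the learning rate, or the loss, so the same torch.isfinite check could be repeated on images and predictions.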