Maybe it is a stupid question, but I still don’t understand why the output of LSTMCell only consists of hx and cx. Should I add another feed-forward layer to compute o(t) based on hx(t-1) and x(t)?
hx is the output. You can try that with nn.LSTM and compare hx to the output at the last time step.
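
Here is a minimal sketch of that comparison, assuming a single-layer, unidirectional nn.LSTM with the default batch_first=False. The sizes and the seed are arbitrary choices for illustration, not anything from the question.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

# Assumed toy sizes: 4 input features, 3 hidden units, one layer.
lstm = nn.LSTM(input_size=4, hidden_size=3, num_layers=1)
x = torch.randn(5, 2, 4)  # (seq_len, batch, input_size)

output, (hx, cx) = lstm(x)

# output holds the hidden state h(t) for every time step: shape (5, 2, 3).
# hx holds the hidden state after the last time step: shape (1, 2, 3).
last_matches = torch.allclose(output[-1], hx[0])
print(last_matches)
```

With more than one layer or with bidirectional=True the indexing is different, so this check only covers the simple case.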

But o(t), c(t) and h(t) are computed differently.
Is it OK to regard h(t) directly as o(t)?

Well, h already has o applied: h(t) = o(t) * tanh(c(t)). Keep in mind that o is the output gate, not the output.
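
To see that h already has the output gate applied, you can recompute one nn.LSTMCell step by hand and compare. This is only a sketch: the sizes, seed and variable names are made up for illustration, and it relies on the gate order given in the PyTorch docs (input, forget, cell, output) for the stacked weights.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

# Assumed toy sizes: 4 input features, 3 hidden units, batch of 2.
cell = nn.LSTMCell(input_size=4, hidden_size=3)
x = torch.randn(2, 4)
h_prev = torch.randn(2, 3)
c_prev = torch.randn(2, 3)

with torch.no_grad():
    h_next, c_next = cell(x, (h_prev, c_prev))

    # All four gate pre-activations are stacked along dim 1.
    gates = (x @ cell.weight_ih.T + cell.bias_ih
             + h_prev @ cell.weight_hh.T + cell.bias_hh)
    i, f, g, o = gates.chunk(4, dim=1)  # order: input, forget, cell, output
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)

    c_manual = f * c_prev + i * g        # new cell state c(t)
    h_manual = o * torch.tanh(c_manual)  # output gate applied here gives h(t)

h_matches = torch.allclose(h_next, h_manual, atol=1e-5)
c_matches = torch.allclose(c_next, c_manual, atol=1e-5)
print(h_matches, c_matches)
```

So o(t) is only an intermediate value inside the cell. If you need a prediction in a different size or range, put a separate layer (e.g. nn.Linear) on top of h(t); that is a projection of the output, not a replacement for the output gate.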