I have a question about how to write optimization code.
I want to write a conditional random field on my own, so I need to optimize its parameters for the data likelihood using PyTorch gradients.
A sketch of the loss is below; assume theta in the code is the parameter I want to optimize:

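```python
likelihood_loss = 0  # becomes a scalar tensor after the first addition
for features in dataset:
    # theta requires gradients, so every operation on it is recorded
    feature = theta.matmul(features)
    # some_function is any differentiable function of feature (hence of theta);
    # summing it over all data gives the loss
    likelihood_loss = likelihood_loss + some_function(feature)
likelihood_loss.backward()
```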

Now I want to backpropagate the summed likelihood loss to optimize the parameter theta.

My question is: I don’t know how to define the loss function, or how to initialize each variable so that the gradient is computed correctly.
For example, which ones should be defined as Variable, which as torch.Tensor, and which as lists?

A loss function is just a Python function. You can define one as simply as:

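```python
def my_loss(pred, actual):
    # Any differentiable expression works; reduce it to a scalar
    # (here the mean squared error) so that loss.backward() can be called
    return ((pred - actual) ** 2).mean()
```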
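For instance, here is a minimal sketch, assuming a squared-error loss and a single hypothetical scalar weight w, showing that a loss written as an ordinary Python function works with autograd as long as it returns a scalar tensor:

```python
import torch

def my_loss(pred, actual):
    # Mean squared error written as a plain Python function
    return ((pred - actual) ** 2).mean()

# Hypothetical data: the prediction depends on the trainable scalar w
w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0])
actual = torch.tensor([1.0, 2.0, 3.0])

pred = w * x
loss = my_loss(pred, actual)
loss.backward()        # autograd differentiates through my_loss

print(loss.item())     # 14/3, about 4.6667
print(w.grad.item())   # 28/3, about 9.3333
```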

Forward step

Start from your input, which should be PyTorch tensors (if it comes in some other form, convert it to PyTorch tensors). Then apply all the operations you want, in sequence. Finally compute the loss, which is just a function of the result.
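A minimal sketch of the forward step, assuming the input arrives as a nested Python list and using a hypothetical 2x1 weight tensor with tanh as the operations:

```python
import torch

# Input in another form (a nested Python list), converted to a float tensor
raw_input = [[1.0, 2.0], [3.0, 4.0]]
x = torch.as_tensor(raw_input, dtype=torch.float32)

# Hypothetical parameter that autograd should track
weight = torch.ones(2, 1, requires_grad=True)

# Forward step: any sequence of differentiable tensor operations
out = torch.tanh(x.matmul(weight))    # shape (2, 1)
loss = out.sum()                      # the loss is just a function of the output

print(out.shape, loss.requires_grad)  # torch.Size([2, 1]) True
```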

Backward step

To compute the gradients, call loss.backward(); the gradients are then stored in the .grad attribute of every tensor created with requires_grad=True. If you only want to update a specific tensor, set requires_grad=True on that tensor and leave it False (the default) for all other tensors.
Then update your parameter with the optimization algorithm of your choice.
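To illustrate the requires_grad point, a small sketch with two hypothetical tensors, only one of which asks for gradients:

```python
import torch

a = torch.tensor([1.0, 2.0], requires_grad=True)  # the tensor we want to update
b = torch.tensor([3.0, 4.0])                      # requires_grad is False by default

loss = (a * b).sum()
loss.backward()

print(a.grad)  # tensor([3., 4.]), i.e. d(loss)/da = b
print(b.grad)  # None: no gradient is computed for b
```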

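```python
import torch

def my_optimization(param, alpha=0.01):
    # Gradient stored by loss.backward()
    grad = param.grad
    # SGD step, done in place and outside autograd so param stays a leaf tensor
    with torch.no_grad():
        param -= alpha * grad
    # Clear the gradient so the next backward() does not accumulate into it
    param.grad.zero_()
```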
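As a sketch of how a manual update relates to PyTorch's built-in optimizer, the following compares one in-place SGD step with one torch.optim.SGD step on a hypothetical quadratic loss; both should land on the same values:

```python
import torch

def loss_fn(theta):
    # Hypothetical quadratic loss with its minimum at theta = 3
    return ((theta - 3.0) ** 2).sum()

alpha = 0.1

# Manual SGD step, as in my_optimization
theta_manual = torch.tensor([0.0, 1.0], requires_grad=True)
loss_fn(theta_manual).backward()
with torch.no_grad():
    theta_manual -= alpha * theta_manual.grad
theta_manual.grad.zero_()

# The same step with the built-in optimizer
theta_opt = torch.tensor([0.0, 1.0], requires_grad=True)
optimizer = torch.optim.SGD([theta_opt], lr=alpha)
optimizer.zero_grad()
loss_fn(theta_opt).backward()
optimizer.step()

# Gradient is 2 * (theta - 3) = [-6, -4], so both now hold [0.6, 1.4]
print(theta_manual.detach(), theta_opt.detach())
```

In a real training loop the optimizer version is usually preferable, since swapping SGD for another algorithm only changes one line.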

I know this part; my problem is how to initialize the variables theta and likelihood here.
For example, should it be likelihood = [], likelihood = torch.Tensor(), likelihood = Variable(...), or likelihood = Parameter(...)?
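One way to set this up, as a sketch: theta is a plain tensor created with requires_grad=True (nn.Parameter would also work), the data are plain tensors, and the loss needs no special initialization. Variable is not needed, since it was merged into Tensor in PyTorch 0.4. The dataset and some_function below are hypothetical stand-ins:

```python
import torch

torch.manual_seed(0)
num_features = 3

# theta: the only tensor that needs gradients
theta = torch.randn(num_features, requires_grad=True)

# Hypothetical dataset: plain feature tensors (no gradients needed)
dataset = [torch.randn(num_features) for _ in range(5)]

def some_function(feature):
    # Hypothetical stand-in for the per-example likelihood term
    return feature ** 2

# Option 1: start from a Python number; it becomes a tensor after the first addition
likelihood_loss = 0.0
for features in dataset:
    feature = theta.matmul(features)  # scalar tensor that depends on theta
    likelihood_loss = likelihood_loss + some_function(feature)

# Option 2: collect the terms in a list, then stack and sum them
terms = [some_function(theta.matmul(f)) for f in dataset]
likelihood_loss_2 = torch.stack(terms).sum()

likelihood_loss.backward()  # fills theta.grad
print(theta.grad)
```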

Do you want to build your own custom loss function? PyTorch already provides standard loss functions in torch.nn; Google is your friend.

Tensors are the values that flow along the computation path of your network (the linear-algebra mappings and the non-linear functions between them); when a tensor is a parameter, it must be differentiable. Variable is a legacy wrapper: since PyTorch 0.4 it has been merged into Tensor, so you no longer need it, and inputs such as feature vectors can simply be plain tensors.

If so, and “features” means a set of parameters, then you can use nn.Parameter or nn.ParameterList in your custom layer.
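A minimal sketch of such a custom layer, assuming a hypothetical log-linear scorer whose only parameter is theta:

```python
import torch
import torch.nn as nn

class LogLinearScorer(nn.Module):
    """Hypothetical custom layer that stores theta as an nn.Parameter."""

    def __init__(self, num_features):
        super().__init__()
        # nn.Parameter sets requires_grad=True and registers theta with the module,
        # so model.parameters() (and therefore any optimizer) can find it
        self.theta = nn.Parameter(torch.zeros(num_features))

    def forward(self, features):
        # features: (batch, num_features) -> scores: (batch,)
        return features.matmul(self.theta)

model = LogLinearScorer(num_features=4)
scores = model(torch.ones(2, 4))
print([name for name, _ in model.named_parameters()])  # ['theta']
print(scores.shape)                                    # torch.Size([2])
```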

If not, you can use not only the standard loss functions but also standard layers such as convolutional and fully connected layers, which have their own parameters.

You do not need to compute the gradients for the optimizer yourself, because PyTorch supports automatic differentiation (autograd): back-propagation is constructed automatically for your network.
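For example, autograd applies the chain rule for you. A small sketch comparing its result with a hand-derived derivative for the hypothetical function sin(theta^2):

```python
import torch

theta = torch.tensor(1.5, requires_grad=True)
y = torch.sin(theta ** 2)     # composition sin(u) with u = theta^2
y.backward()                  # autograd applies the chain rule

# The same derivative by hand: dy/dtheta = cos(theta^2) * 2 * theta
with torch.no_grad():
    manual = torch.cos(theta ** 2) * 2 * theta

print(theta.grad.item(), manual.item())  # the two values agree
```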

1. You mean tensors get gradients automatically? Then why is there an option requires_grad = …?

2. I need to optimize theta here, so should I initialize it as a tensor, or can I initialize it as a Parameter? Since likelihood is the loss and needs a gradient, should I also initialize it as a tensor? Features are extracted from the input data; do I need to change them to Variable too?

In PyTorch, what autograd differentiates is the computation path through your network, built from linear-algebra operations; a tensor by itself is just a container (a rank-2 tensor is a matrix, a rank-1 tensor is a vector). Parameters must of course be differentiated so that they can be updated. You can look up the chain rule online for the details.
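To make the roles concrete, a sketch with a hypothetical batch of input features and a parameter theta: inputs stay plain tensors, the parameter is a leaf that receives .grad, and anything computed from it (such as the loss) is tracked automatically without any special initialization:

```python
import torch
import torch.nn as nn

features = torch.randn(4, 3)              # input data: a plain tensor
theta = nn.Parameter(torch.zeros(3))      # parameter: requires_grad=True automatically
scores = features.matmul(theta)           # intermediate result computed from theta
loss = scores.sum()

print(features.requires_grad)              # False: inputs do not need gradients
print(theta.requires_grad, theta.is_leaf)  # True True: a leaf that will receive .grad
print(loss.requires_grad, loss.is_leaf)    # True False: tracked, but not a leaf
loss.backward()
print(theta.grad.shape)                    # torch.Size([3])
```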

Before answering, I would like to confirm why you need to optimize “theta”. What is “theta”, and what do you mean by “features”?

I have a model with parameter theta, and I need to minimize the likelihood loss by learning theta. So I use gradient descent, which needs the gradient; instead of computing it analytically, I use PyTorch here. You can treat features as the input data X.
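A minimal sketch of that workflow, assuming a logistic model as a hypothetical stand-in for the CRF likelihood (a CRF would only change the nll function) and randomly generated data: autograd supplies the gradient of the negative log-likelihood, and SGD updates theta.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical data: 20 examples with 3 features and binary labels
X = torch.randn(20, 3)
true_theta = torch.tensor([1.0, -2.0, 0.5])
y = (X.matmul(true_theta) > 0).float()

theta = torch.zeros(3, requires_grad=True)
optimizer = torch.optim.SGD([theta], lr=0.1)

def nll(theta):
    # Mean negative log-likelihood of p(y=1 | x) = sigmoid(x . theta)
    return F.binary_cross_entropy_with_logits(X.matmul(theta), y)

initial_loss = nll(theta).item()   # log(2) at theta = 0
for step in range(200):
    optimizer.zero_grad()
    loss = nll(theta)
    loss.backward()                # gradient from autograd, no analytic derivation
    optimizer.step()
final_loss = nll(theta).item()
print(initial_loss, final_loss)
```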

I think you are confusing parameters and functions of a Python script with PyTorch concepts. A parameter is a weight or a bias, and must be differentiable. But in your first post, “theta” looks like a function name or something else, because of this line:

```python
feature = theta.matmul(features)
```

You can do something like this:

```python
self.fc = nn.Linear(RowSize, ClmSize)  # fully connected layer; its weight is a rank-2 tensor
```

Put this in your model definition, then pass the result through a non-linear function, for example:

```python
pre_activation = self.fc(x)  # x is the input
post_activation = torch.relu(pre_activation)
```

After this, a loss function takes the output and computes the loss value, for example:

```python
loss = criterion(post_activation, label)
optimizer.zero_grad()  # clear gradients left over from the previous step
loss.backward()
optimizer.step()
```
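Putting the pieces of this answer together, a minimal runnable sketch, assuming a hypothetical regression task (random inputs and targets), nn.MSELoss as the criterion, and plain SGD as the optimizer; Net and its sizes are illustrative names, not from the original:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class Net(nn.Module):
    def __init__(self, in_size, out_size):
        super().__init__()
        # Fully connected layer; its weight has shape (out_size, in_size)
        self.fc = nn.Linear(in_size, out_size)

    def forward(self, x):
        pre_activation = self.fc(x)
        return torch.relu(pre_activation)

# Hypothetical data: 32 examples, 4 input features, 2 targets in [0, 1)
x = torch.randn(32, 4)
label = torch.rand(32, 2)

model = Net(4, 2)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for epoch in range(50):
    optimizer.zero_grad()                     # clear gradients from the previous step
    post_activation = model(x)
    loss = criterion(post_activation, label)
    loss.backward()                           # gradients for fc.weight and fc.bias
    optimizer.step()                          # update the parameters
    losses.append(loss.item())
print(losses[0], losses[-1])
```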