whether to average the loss over the observations.
Performs a back-propagation step through the criterion, with respect to the given input.
input data
target
gradient corresponding to input data
Deep copy this criterion
Takes an input object and computes the corresponding loss of the criterion, compared with the target.
input data
target
the loss of criterion
Computes the gradient of the criterion with respect to its own input. The result is returned in gradInput, and the gradInput state variable is updated accordingly.
input data
target data / labels
gradient of input
Computes the loss from the input and the target using the objective function. This function returns the result, which is also stored in the output field.
input of the criterion
target or labels
the loss of the criterion
A criterion that computes the negative policy gradient given a multinomial distribution and the sampled action and reward.
The input to this criterion should be a 2-D tensor representing a batch of multinomial distributions. The target should also be a 2-D tensor with the same size as the input, representing the sampled action and the reward/advantage: the index of the non-zero element in each vector is the sampled action, and the non-zero element itself is the reward. If the action space is large, you should consider using SparseTensor for the target.
The loss computed is simply the standard policy gradient,
loss = - 1/n * sum(R_{n} dot_product log(P_{n}))
where R_{n} is the reward vector, and P_{n} is the input distribution.
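The formula above can be illustrated with a minimal pure-Python sketch. This is not the library's implementation, which operates on tensors. The function names `pg_loss` and `pg_grad_input` and the `size_average` flag are hypothetical names chosen for illustration, and the gradient expression is derived here from the loss formula rather than taken from the source.

```python
import math


def pg_loss(probs, target, size_average=True):
    """Negative policy-gradient loss for a batch (illustrative sketch).

    probs:  list of rows, each a multinomial distribution over actions.
    target: list of rows with the same shape; each row is zero except at
            the sampled action, where it holds the reward (or advantage).
    """
    total = 0.0
    for p_row, r_row in zip(probs, target):
        for p, r in zip(p_row, r_row):
            if r != 0.0:  # only the sampled action contributes
                total += r * math.log(p)
    # Assumption: averaging divides by the batch size n.
    n = len(probs) if size_average else 1
    return -total / n


def pg_grad_input(probs, target, size_average=True):
    """Gradient of the loss above with respect to probs: -R / (P * n)."""
    n = len(probs) if size_average else 1
    return [
        [-r / (p * n) if r != 0.0 else 0.0 for p, r in zip(p_row, r_row)]
        for p_row, r_row in zip(probs, target)
    ]


# Two samples, three actions. Sample 0 took action 1 with reward 2.0,
# sample 1 took action 2 with reward -1.0.
probs = [[0.5, 0.25, 0.25], [0.1, 0.8, 0.1]]
target = [[0.0, 2.0, 0.0], [0.0, 0.0, -1.0]]

loss = pg_loss(probs, target)
grad = pg_grad_input(probs, target)
```

Because each target row has a single non-zero entry, the dot product reduces to the reward times the log-probability of the sampled action. This is also why a sparse target representation is attractive when the action space is large.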