PGCriterion

The Criterion to compute the negative policy gradient given a multinomial distribution and the sampled action and reward.

The input to this criterion should be a 2-D tensor representing a batch of multinomial distributions. The target should be a 2-D tensor of the same size as the input, encoding the sampled action and its reward/advantage: in each row, the index of the non-zero element is the sampled action, and the value of that element is the reward. If the action space is large, consider using a SparseTensor for the target.
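The target encoding described above can be sketched in plain Scala. This is an illustration only: nested arrays stand in for the library's tensor type, and the names PGTargetSketch and encodeTarget are hypothetical, not part of the API.

```scala
// Illustrative sketch only: plain arrays stand in for the 2-D target tensor.
object PGTargetSketch {
  /** Builds one target row per sample: all zeros, except that the entry at
    * the sampled action's index holds that sample's reward/advantage.
    */
  def encodeTarget(
      actions: Array[Int],
      rewards: Array[Double],
      numActions: Int): Array[Array[Double]] = {
    require(actions.length == rewards.length, "one reward per sampled action")
    actions.zip(rewards).map { case (action, reward) =>
      require(action >= 0 && action < numActions, s"action $action out of range")
      val row = Array.fill(numActions)(0.0)
      row(action) = reward
      row
    }
  }
}
```

For example, PGTargetSketch.encodeTarget(Array(1, 0), Array(2.0, -1.0), 3) yields the rows (0, 2, 0) and (-1, 0, 0). Note that with this encoding a reward of exactly zero leaves the whole row at zero.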

The loss computed is simply the standard policy gradient,

loss = - 1/N * sum_{n=1..N}(R_{n} dot_product log(P_{n}))

where N is the number of samples in the batch, R_{n} is the reward vector of the n-th sample, and P_{n} is the corresponding input distribution.
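The formula above can be checked by hand with a small plain-Scala sketch. It is not the library's implementation: it uses nested arrays instead of tensors, the names PGLossSketch and pgLoss are hypothetical, and it assumes that the averaging factor in front of the sum is applied only when sizeAverage is true.

```scala
// Illustrative sketch only: computes the policy gradient loss on plain arrays.
object PGLossSketch {
  /** Returns -scale * sum over samples of (R_n dot log(P_n)), where scale is
    * 1 / (number of samples) if sizeAverage is true, and 1 otherwise.
    */
  def pgLoss(
      probs: Array[Array[Double]],
      rewards: Array[Array[Double]],
      sizeAverage: Boolean = false): Double = {
    require(probs.length == rewards.length, "input and target must have the same size")
    val total = probs.zip(rewards).map { case (p, r) =>
      require(p.length == r.length, "input and target must have the same size")
      // Skip zero-reward entries so that 0 * log(0) never produces NaN.
      p.zip(r).collect { case (pi, ri) if ri != 0.0 => ri * math.log(pi) }.sum
    }.sum
    val scale = if (sizeAverage && probs.nonEmpty) 1.0 / probs.length else 1.0
    -scale * total
  }
}
```

With probs = ((0.5, 0.5), (0.25, 0.75)) and rewards = ((1, 0), (0, 2)), the unaveraged loss is -(log(0.5) + 2 * log(0.75)); with sizeAverage set to true it is half of that.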

Annotations: @SerialVersionUID( 76404060368920472L )

Linear Supertypes

TensorCriterion[T], AbstractCriterion[Tensor[T], Tensor[T], T], Serializable, Serializable, AnyRef, Any

Instance Constructors

new PGCriterion(sizeAverage: Boolean = false)(implicit arg0: ClassTag[T], ev: TensorNumeric[T])

sizeAverage
whether to average the loss over all observations.

Value Members

final def !=(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def !=(arg0: Any): Boolean

Definition Classes
Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def ==(arg0: Any): Boolean

Definition Classes
Any
final def asInstanceOf[T0]: T0

Definition Classes
Any
def backward(input: Tensor[T], target: Tensor[T]): Tensor[T]

Performs a back-propagation step through the criterion, with respect to the given input.
input
input data
target
target
returns
gradient corresponding to input data

Definition Classes
AbstractCriterion
def canEqual(other: Any): Boolean

Definition Classes
AbstractCriterion
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
def cloneCriterion(): AbstractCriterion[Tensor[T], Tensor[T], T]

Deep copy this criterion
returns
a deep copied criterion

Definition Classes
AbstractCriterion
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(other: Any): Boolean

Definition Classes
AbstractCriterion → AnyRef → Any
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
def forward(input: Tensor[T], target: Tensor[T]): T

Takes an input object, and computes the corresponding loss of the criterion, compared with target.
input
input data
target
target
returns
the loss of criterion

Definition Classes
AbstractCriterion
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
var gradInput: Tensor[T]

Definition Classes
AbstractCriterion
def hashCode(): Int

Definition Classes
AbstractCriterion → AnyRef → Any
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
var output: T

Definition Classes
AbstractCriterion
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
AnyRef → Any
def updateGradInput(input: Tensor[T], target: Tensor[T]): Tensor[T]

Computes the gradient of the criterion with respect to its own input. The result is returned in gradInput, and the gradInput state variable is updated accordingly.
input
input data
target
target data / labels
returns
gradient of input

Definition Classes
PGCriterion → AbstractCriterion
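Differentiating the policy gradient loss with respect to each input probability gives -R / P at the sampled action and zero elsewhere. The sketch below follows that derivation on plain arrays. It is an assumption drawn from the loss formula, not the library's actual code path, and the names PGGradSketch and pgGrad are hypothetical.

```scala
// Illustrative sketch only: gradient of the policy gradient loss w.r.t. the input.
object PGGradSketch {
  /** Returns d(loss)/d(P) element-wise: -scale * R / P where R is non-zero,
    * and 0 elsewhere. scale is 1 / (number of samples) if sizeAverage is true.
    */
  def pgGrad(
      probs: Array[Array[Double]],
      rewards: Array[Array[Double]],
      sizeAverage: Boolean = false): Array[Array[Double]] = {
    require(probs.length == rewards.length, "input and target must have the same size")
    val scale = if (sizeAverage && probs.nonEmpty) 1.0 / probs.length else 1.0
    probs.zip(rewards).map { case (p, r) =>
      require(p.length == r.length, "input and target must have the same size")
      p.zip(r).map { case (pi, ri) =>
        // Entries with zero reward do not contribute to the loss.
        if (ri == 0.0) 0.0 else -scale * ri / pi
      }
    }
  }
}
```

For a single sample with probabilities (0.5, 0.5) and target (0, 2), this gives the gradient (0, -4): a positive reward pushes the probability of the sampled action up when the loss is minimized.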
def updateOutput(input: Tensor[T], target: Tensor[T]): T

Computes the loss using input and objective function. This function returns the result, which is also stored in the output field.
input
input of the criterion
target
target or labels
returns
the loss of the criterion

Definition Classes
PGCriterion → AbstractCriterion
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )

class PGCriterion[T] extends TensorCriterion[T]
