com.intel.analytics.bigdl.nn

PGCriterion

class PGCriterion[T] extends TensorCriterion[T]

A criterion that computes the negative policy gradient given a multinomial distribution, the sampled action, and the reward.

The input to this criterion should be a 2-D tensor representing a batch of multinomial distributions. The target should be a 2-D tensor of the same size as the input, encoding the sampled action and its reward/advantage: in each row, the index of the non-zero element identifies the sampled action, and the value of that element is the reward. If the action space is large, consider using a SparseTensor for the target.

The loss computed is simply the standard policy gradient:

loss = -(1/N) * sum_{n=1}^{N} (R_{n} · log(P_{n}))

where R_{n} is the reward vector and P_{n} is the input distribution of the n-th sample, · denotes the dot product, and N is the number of samples in the batch.
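
As a rough illustration of the loss formula above, the following dependency-free Scala sketch evaluates the loss and its gradient with respect to the input on plain arrays. It is not the BigDL implementation: the object name `PolicyGradientLossSketch` and its methods `loss` and `gradInput` are hypothetical, and the sketch assumes that the 1/n factor is the batch size and is always applied.

```scala
// Hypothetical, illustration-only sketch; does not use or mirror BigDL internals.
object PolicyGradientLossSketch {

  // Loss as written in the formula: -(1/N) * sum over samples of (R_n . log(P_n)),
  // where N is the batch size (an assumption about the 1/n factor).
  def loss(probs: Array[Array[Double]], rewards: Array[Array[Double]]): Double = {
    require(probs.length == rewards.length, "input and target need the same batch size")
    val perSample = probs.zip(rewards).map { case (p, r) =>
      require(p.length == r.length, "input and target need the same number of actions")
      // Only non-zero target entries (the sampled actions) contribute to the dot product.
      p.zip(r).collect { case (pi, ri) if ri != 0.0 => ri * math.log(pi) }.sum
    }
    -perSample.sum / probs.length
  }

  // Element-wise derivative of that loss with respect to the input: -(1/N) * R / P.
  def gradInput(probs: Array[Array[Double]], rewards: Array[Array[Double]]): Array[Array[Double]] = {
    val n = probs.length
    probs.zip(rewards).map { case (p, r) =>
      p.zip(r).map { case (pi, ri) => if (ri == 0.0) 0.0 else -ri / (pi * n) }
    }
  }

  def main(args: Array[String]): Unit = {
    // Batch of two distributions over three actions.
    val probs = Array(Array(0.2, 0.5, 0.3), Array(0.1, 0.1, 0.8))
    // Row 0: action 1 sampled with reward 1.0; row 1: action 2 sampled with reward -2.0.
    val rewards = Array(Array(0.0, 1.0, 0.0), Array(0.0, 0.0, -2.0))
    println(loss(probs, rewards))
    println(gradInput(probs, rewards).map(_.mkString(", ")).mkString("\n"))
  }
}
```

Each row of `rewards` follows the target encoding described above: a single non-zero entry whose position marks the sampled action and whose value is the reward.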

Annotations
@SerialVersionUID()
Linear Supertypes
TensorCriterion[T], AbstractCriterion[Tensor[T], Tensor[T], T], Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new PGCriterion(sizeAverage: Boolean = false)(implicit arg0: ClassTag[T], ev: TensorNumeric[T])

    sizeAverage

    whether to average the loss over the observations.

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def backward(input: Tensor[T], target: Tensor[T]): Tensor[T]

    Performs a back-propagation step through the criterion, with respect to the given input.

    input

    input data

    target

    target

    returns

    gradient corresponding to input data

    Definition Classes
    AbstractCriterion
  6. def canEqual(other: Any): Boolean

    Definition Classes
    AbstractCriterion
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. def cloneCriterion(): AbstractCriterion[Tensor[T], Tensor[T], T]

    Deep copy this criterion

    returns

    a deep copied criterion

    Definition Classes
    AbstractCriterion
  9. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  10. def equals(other: Any): Boolean

    Definition Classes
    AbstractCriterion → AnyRef → Any
  11. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  12. def forward(input: Tensor[T], target: Tensor[T]): T

    Takes an input object, and computes the corresponding loss of the criterion, compared with target.

    input

    input data

    target

    target

    returns

    the loss of criterion

    Definition Classes
    AbstractCriterion
  13. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  14. var gradInput: Tensor[T]

    Definition Classes
    AbstractCriterion
  15. def hashCode(): Int

    Definition Classes
    AbstractCriterion → AnyRef → Any
  16. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  17. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  18. final def notify(): Unit

    Definition Classes
    AnyRef
  19. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  20. var output: T

    Definition Classes
    AbstractCriterion
  21. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  22. def toString(): String

    Definition Classes
    AnyRef → Any
  23. def updateGradInput(input: Tensor[T], target: Tensor[T]): Tensor[T]

    Computing the gradient of the criterion with respect to its own input. This is returned in gradInput. Also, the gradInput state variable is updated accordingly.

    input

    input data

    target

    target data / labels

    returns

    gradient of input

    Definition Classes
    PGCriterion → AbstractCriterion
  24. def updateOutput(input: Tensor[T], target: Tensor[T]): T

    Computes the loss using input and objective function. This function returns the result which is stored in the output field.

    input

    input of the criterion

    target

    target or labels

    returns

    the loss of the criterion

    Definition Classes
    PGCriterion → AbstractCriterion
  25. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  26. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  27. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
