dataset

Type Members

trait AbstractDataSet[D, DataSequence] extends AnyRef

A set of data which is used in the model optimization process. The dataset can be accessed as a randomly ordered sequence of samples. In the training process, the data sequence is an endless, looped sequence, while in the validation process it is a sequence of limited length. Use the data() method to get the data sequence.
The order of the data is not fixed. It can be changed by the shuffle() method.
A dataset can be created from an RDD, an array, a folder, etc. The DataSet object provides many factory methods.
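The difference between the looped training sequence and the finite validation sequence can be illustrated with a minimal, self-contained sketch. ToyDataSet and its members are hypothetical stand-ins written for this illustration, not the library's classes, and the data(train: Boolean) and shuffle(seed) signatures are assumptions of the sketch.

```scala
import scala.util.Random

// Hypothetical, simplified stand-in for a local dataset (not the library's class).
class ToyDataSet[T](private var items: Vector[T]) {
  // Reorder the samples; later passes over the data see the new order.
  def shuffle(seed: Long): Unit = {
    items = new Random(seed).shuffle(items)
  }

  // train = true: endless, looped sequence; train = false: one finite pass.
  def data(train: Boolean): Iterator[T] =
    if (train) Iterator.continually(items).flatMap(_.iterator)
    else items.iterator

  def size: Int = items.length
}

object ToyDataSetDemo {
  def main(args: Array[String]): Unit = {
    val ds = new ToyDataSet(Vector(1, 2, 3))
    println(ds.data(train = false).toList)         // List(1, 2, 3)
    println(ds.data(train = true).take(7).toList)  // List(1, 2, 3, 1, 2, 3, 1)
  }
}
```

Because the training iterator never ends, a caller has to bound it explicitly (here with take), whereas the validation iterator stops after one pass.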
D
Data type
DataSequence
Represent a sequence of data
class ArraySample[T] extends Sample[T]

A kind of Sample that uses only one array.
case class ByteRecord(data: Array[Byte], label: Float) extends Product with Serializable

A byte array and a label. It can contain anything.
class CachedDistriDataSet[T] extends DistributedDataSet[T]

Wrap an RDD as a DataSet.
class ChainedTransformer[A, B, C] extends Transformer[A, C]

A transformer that chains two transformers together. The output type of the first transformer must be the same as the input type of the second transformer.
A
input type of the first transformer
B
output type of the first transformer, as well as the input type of the last transformer
C
output type of the last transformer
class DefaultPadding extends PaddingStrategy
trait DistributedDataSet[T] extends AbstractDataSet[T, RDD[T]]

Represents distributed data. An RDD is used to go through all the data.
case class FixedLength(fixedLength: Array[Int]) extends PaddingStrategy with Product with Serializable

Set the first dimension's length to a fixed length.
fixedLength
fixed length
class Identity[A] extends Transformer[A, A]

Pass the input through to the output unchanged.
abstract class Image extends Serializable

Represent an image
trait Label[T] extends AnyRef

Represent a label
class LocalArrayDataSet[T] extends LocalDataSet[T]

Wrap an array as a DataSet.
trait LocalDataSet[T] extends AbstractDataSet[T, Iterator[T]]

Manage some 'local' data, e.g. data in files or memory. An iterator is used to go through the data.
class LocalImagePath extends AnyRef

Represent a local file path of an image file
case class LocalSeqFilePath(path: Path) extends Product with Serializable

Represent a local file path of a hadoop sequence file
trait MiniBatch[T] extends Serializable

An interface for MiniBatch. A MiniBatch contains a few samples.
T
Numeric type
case class PaddingLongest(paddingLength: Array[Int]) extends PaddingStrategy with Product with Serializable

Add a constant length to the longest feature in the first dimension.
case class PaddingParam[T](paddingTensor: Option[Array[Tensor[T]]] = None, paddingStrategy: PaddingStrategy = new DefaultPadding)(implicit evidence$14: ClassTag[T]) extends Serializable with Product

Feature padding parameters for MiniBatch.
To construct a mini-batch, all samples' features and labels in the mini-batch must have the same size. If the sizes differ, they are padded to the same size.
By default, the first dimension is padded with zeros to the longest size in the MiniBatch. To specify the padding values, set paddingTensor; to specify the padding length, use PaddingLongest or FixedLength.
For example, if your feature size is n*m*k, paddingTensor should be a 2D tensor of size m*k. If your feature is 1D, provide a one-element 1D tensor.
For example, suppose we have 3 Samples and convert them into a MiniBatch:
Sample1's feature is a 2*3 tensor {1, 2, 3, 4, 5, 6}
Sample2's feature is a 1*3 tensor {7, 8, 9}
Sample3's feature is a 3*3 tensor {10, 11, 12, 13, 14, 15, 16, 17, 18}
With the paddingTensor {-1, -2, -3} and FixedLength(Array(4)), the MiniBatch will be a 3*4*3 tensor:
{1, 2, 3, 4, 5, 6, -1, -2, -3, -1, -2, -3,
7, 8, 9, -1, -2, -3, -1, -2, -3, -1, -2, -3,
10, 11, 12, 13, 14, 15, 16, 17, 18, -1, -2, -3}
T
numeric type
paddingTensor
padding tensors for the first dimension (by default None, meaning zero padding).
paddingStrategy
See PaddingLongest, FixedLength
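The padding arithmetic in the example above can be reproduced with a small, self-contained sketch. It models each feature as a vector of rows rather than a Tensor; PaddingSketch, padToFixedLength and padLongest are hypothetical helpers written for this illustration, not library functions, and the check that the fixed length is at least as long as every sample is an assumption of the sketch.

```scala
// Hypothetical helpers for illustration only; features are modelled as
// vectors of rows instead of Tensors.
object PaddingSketch {
  type Row = Vector[Int]

  // FixedLength-style padding: extend every sample to exactly `fixedLength` rows.
  def padToFixedLength(samples: Seq[Vector[Row]], paddingRow: Row,
                       fixedLength: Int): Seq[Vector[Row]] =
    samples.map { rows =>
      require(rows.length <= fixedLength, "sample is longer than the fixed length")
      rows ++ Vector.fill(fixedLength - rows.length)(paddingRow)
    }

  // PaddingLongest-style padding: target is the longest sample plus `extra` rows.
  def padLongest(samples: Seq[Vector[Row]], paddingRow: Row,
                 extra: Int): Seq[Vector[Row]] =
    padToFixedLength(samples, paddingRow, samples.map(_.length).max + extra)

  def main(args: Array[String]): Unit = {
    val sample1 = Vector(Vector(1, 2, 3), Vector(4, 5, 6))
    val sample2 = Vector(Vector(7, 8, 9))
    val sample3 = Vector(Vector(10, 11, 12), Vector(13, 14, 15), Vector(16, 17, 18))
    val batch = padToFixedLength(Seq(sample1, sample2, sample3), Vector(-1, -2, -3), 4)
    // Prints one line per sample, each with 4 * 3 = 12 values.
    batch.foreach(s => println(s.flatten.mkString(", ")))
  }
}
```

Calling padLongest with extra = 0 corresponds to the default behaviour described above (pad to the longest sample), except that the padding values are taken from paddingRow instead of zeros.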
abstract class PaddingStrategy extends Serializable
abstract class Sample[T] extends Serializable

Class that represents the features and labels of a data sample.
T
numeric type
class SampleToMiniBatch[T] extends Transformer[Sample[T], MiniBatch[T]]

Convert a sequence of Sample to a sequence of MiniBatch through function toMiniBatch.
abstract class Sentence[T] extends Serializable

Represent a sentence
class SparseMiniBatch[T] extends ArrayTensorMiniBatch[T]

SparseMiniBatch is a MiniBatch type for TensorSample. It can handle the SparseTensors in a TensorSample.
T
Numeric type
class TensorSample[T] extends Sample[T]

A kind of Sample that holds both DenseTensor and SparseTensor as features.
T
numeric type
trait Transformer[A, B] extends Serializable

Transform a data stream of type A to type B. It is usually used in the data preprocessing stage. Different transformers can be composed into a pipeline. For example, given transformer1 from A to B, transformer2 from B to C, and transformer3 from C to D, you can compose them into a bigger transformer from A to D by transformer1 -> transformer2 -> transformer3.
The purpose of a transformer is code reuse. Many deep learning applications share common data preprocessing steps. Users need not rewrite them every time and can reuse the work of others.
A Transformer can be used with an RDD (rdd.mapPartitions), an iterator, or a DataSet.
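The composition described above can be sketched with a simplified, self-contained model. SimpleTransformer, SimpleChained and the three steps below are hypothetical stand-ins written for this illustration; they assume a transformer is an iterator-to-iterator function and that -> builds a chained transformer, and they omit everything else the real trait may provide.

```scala
// Simplified stand-ins for illustration; not the library's definitions.
trait SimpleTransformer[A, B] extends Serializable { self =>
  def apply(prev: Iterator[A]): Iterator[B]

  // Chain with a transformer whose input type equals this one's output type.
  def ->[C](next: SimpleTransformer[B, C]): SimpleTransformer[A, C] =
    new SimpleChained(self, next)
}

class SimpleChained[A, B, C](first: SimpleTransformer[A, B],
                             last: SimpleTransformer[B, C])
    extends SimpleTransformer[A, C] {
  override def apply(prev: Iterator[A]): Iterator[C] = last(first(prev))
}

object TransformerSketch {
  // Three hypothetical preprocessing steps: String -> Int -> Double -> String.
  val parse = new SimpleTransformer[String, Int] {
    def apply(prev: Iterator[String]): Iterator[Int] = prev.map(_.trim.toInt)
  }
  val halve = new SimpleTransformer[Int, Double] {
    def apply(prev: Iterator[Int]): Iterator[Double] = prev.map(_ * 0.5)
  }
  val render = new SimpleTransformer[Double, String] {
    def apply(prev: Iterator[Double]): Iterator[String] = prev.map(_.toString)
  }

  // The compiler rejects the chain if adjacent types do not line up.
  val pipeline: SimpleTransformer[String, String] = parse -> halve -> render

  def main(args: Array[String]): Unit =
    println(pipeline(Iterator(" 5", "12 ", "30")).toList) // List(2.5, 6.0, 15.0)
}
```

Because the pipeline is just an iterator-to-iterator function, the same object can be applied to a local iterator as shown, and a function of that shape is also what Spark's rdd.mapPartitions expects.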
class SampleToBatch[T] extends Transformer[Sample[T], MiniBatch[T]]

Convert a sequence of single-feature and single-label Sample to a sequence of MiniBatch, optionally padding all the features (or labels) in the mini-batch to the same length.

Annotations
@deprecated
Deprecated
(Since version 0.2.0) Use SampleToMiniBatch instead

Value Members

object ArraySample extends Serializable
object DataSet

Commonly used DataSet builders.
object Identity extends Serializable
object MiniBatch extends Serializable
object Sample extends Serializable
object SampleToBatch extends Serializable

Convert a sequence of Sample to a sequence of MiniBatch, optionally padding all the features (or labels) in the mini-batch to the same length
object SampleToMiniBatch extends Serializable
object SparseMiniBatch extends Serializable
object TensorSample extends Serializable
object Utils
package datamining
package image
package segmentation
package text
