Package com.intel.analytics.bigdl.dataset.text

package text

Type Members

  1. class Dictionary extends Serializable

    Class that helps build a dictionary either from tokenized text or from a saved dictionary.

  2. class LabeledSentence[T] extends Sentence[T]

    Represents a sentence together with its label sequence (its _data and _label fields).

  3. class LabeledSentenceToSample[T] extends Transformer[LabeledSentence[T], Sample[T]]

    If oneHot = true: transforms labeled sentences into one-hot format samples, e.g.

    sentence._data: [0, 2, 3], sentence._label: [2, 3, 1], vocabLength: 4

    > input: [[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

    > target: [3, 4, 2]

    Otherwise (oneHot = false), the model uses a LookupTable for word embedding, e.g.

    > input: [1, 2, 3]

    > label: [2, 3, 4]

    The input is an iterator of LabeledSentence objects; the output is an iterator of Sample objects.

  4. class SentenceBiPadding extends Transformer[String, String]

    Pads each sentence with a start token and an end token: x => ["start", x, "end"]

  5. class SentenceSplitter extends Transformer[String, Array[String]]

    Takes a sequence of strings and cuts each one into sentences. The sentenceDetector is an API from OpenNLP. If sentFile is None, the default sentence delimiter is a period.

  6. class SentenceTokenizer extends Transformer[String, Array[String]]

    Transformer that tokenizes text: each input String (for example a sentence or a document) becomes an Array[String] of tokens.

  7. class TextToLabeledSentence[T] extends Transformer[Array[String], LabeledSentence[T]]

    Transforms a tokenized sentence into a LabeledSentence, e.g. ["I", "love", "Intel"] => [0, 1, 2], which gives data: [0, 1] and label: [1, 2].

    The input Array[String] should be a tokenized sentence, e.g. "I love Intel" => ["I", "love", "Intel"].
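
To illustrate how a Dictionary-style vocabulary can be built from tokenized text, here is a minimal Scala sketch. It is not BigDL's implementation: the object name DictionarySketch, the most-frequent-first ordering, and the reserved "<unk>" token for unknown words are all assumptions made for illustration.

```scala
// Hypothetical sketch; NOT BigDL's Dictionary implementation.
object DictionarySketch {
  /** Builds a word -> index map from tokenized sentences, keeping the
    * (vocabSize - 1) most frequent words and reserving the last index
    * for an unknown-word token (an assumption of this sketch). */
  def build(sentences: Seq[Array[String]], vocabSize: Int,
            unknown: String = "<unk>"): Map[String, Int] = {
    require(vocabSize >= 1, "vocabSize must be at least 1")
    // Count how often each word occurs across all sentences.
    val counts = sentences.flatMap(_.toSeq)
      .groupBy(identity)
      .map { case (w, ws) => (w, ws.size) }
    // Sort by descending frequency, breaking ties alphabetically for determinism.
    val kept = counts.toSeq
      .sortBy { case (w, c) => (-c, w) }
      .take(vocabSize - 1)
      .map(_._1)
    (kept :+ unknown).zipWithIndex.toMap
  }

  /** Looks up a word, falling back to the unknown-word index. */
  def indexOf(dict: Map[String, Int], word: String, unknown: String = "<unk>"): Int =
    dict.getOrElse(word, dict(unknown))
}
```

With vocabSize = 4 and the two sentences "I love Intel" and "I love BigDL", the sketch keeps "I", "love", and "BigDL" (ties broken alphabetically), so "Intel" falls back to the unknown-word index.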
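
The one-hot example given for LabeledSentenceToSample can be reproduced with plain arrays. This sketch assumes 0-based word indices in the sentence and mirrors the +1 shift visible in the documented target ([2, 3, 1] becomes [3, 4, 2]). OneHotSketch and its methods are hypothetical names, not BigDL APIs, and the sketch does not build BigDL Sample or Tensor objects.

```scala
// Hypothetical sketch of the documented one-hot encoding; it uses plain
// arrays rather than BigDL's Sample or Tensor types.
object OneHotSketch {
  /** One-hot encodes 0-based word indices into rows of width vocabLength. */
  def oneHot(data: Array[Int], vocabLength: Int): Array[Array[Int]] =
    data.map { idx =>
      require(idx >= 0 && idx < vocabLength, s"index $idx out of range")
      val row = Array.fill(vocabLength)(0)
      row(idx) = 1
      row
    }

  /** Shifts 0-based indices to 1-based ones, as the documented targets show. */
  def toOneBased(indices: Array[Int]): Array[Int] = indices.map(_ + 1)
}
```

Calling OneHotSketch.oneHot(Array(0, 2, 3), 4) yields the three rows listed under input above, and toOneBased(Array(2, 3, 1)) yields [3, 4, 2].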
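
Because SentenceBiPadding maps a String to a String, the padding x => ["start", x, "end"] amounts to joining the tokens into one string. A minimal sketch, assuming space separation and the literal placeholders "start" and "end" (BigDL's actual start and end tokens may differ); BiPaddingSketch is a hypothetical name.

```scala
// Hypothetical sketch; the default start/end tokens are placeholders,
// not necessarily the tokens BigDL uses.
object BiPaddingSketch {
  /** Wraps each sentence with start and end tokens, separated by single spaces. */
  def pad(sentences: Iterator[String],
          start: String = "start", end: String = "end"): Iterator[String] =
    sentences.map(x => s"$start $x $end")
}
```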
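
When sentFile is None, SentenceSplitter falls back to the period as its sentence delimiter. The sketch below imitates only that fallback; it does not call OpenNLP, and dropping the period and trimming surrounding whitespace are assumptions of the sketch. PeriodSplitterSketch is a hypothetical name.

```scala
// Hypothetical sketch of the period-delimited fallback only; it does not
// use OpenNLP's sentence detector.
object PeriodSplitterSketch {
  /** Splits each input text on '.', trimming whitespace and dropping empty pieces. */
  def split(texts: Iterator[String]): Iterator[Array[String]] =
    texts.map(_.split('.').map(_.trim).filter(_.nonEmpty))
}
```

A naive period split also breaks on abbreviations such as "e.g.", which is one reason a trained sentence detector is preferable when a model file is available.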
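
The exact tokenization rules of SentenceTokenizer are not documented here, so the following is only a hypothetical regex-based stand-in with the same String to Array[String] shape. It treats runs of letters, digits, and apostrophes as words and emits any other non-space character (such as punctuation) as its own token; TokenizerSketch is a hypothetical name.

```scala
// Hypothetical regex tokenizer; NOT the tokenizer BigDL uses.
object TokenizerSketch {
  // A token is a run of letters/digits/apostrophes, or one other non-space character.
  private val Token = """[\p{L}\p{N}']+|[^\s\p{L}\p{N}']""".r

  /** Tokenizes each input string into an array of tokens. */
  def tokenize(texts: Iterator[String]): Iterator[Array[String]] =
    texts.map(t => Token.findAllIn(t).toArray)
}
```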
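
The TextToLabeledSentence example pairs each word with the word that follows it: the indices [0, 1, 2] yield data [0, 1] and label [1, 2]. A minimal sketch of that shift, assuming the dictionary is a plain Map[String, Int]; LabeledSentenceSketch and TextToLabeledSentenceSketch are hypothetical stand-ins for BigDL's classes, and the two-token minimum is this sketch's own check.

```scala
// Hypothetical stand-in for BigDL's LabeledSentence.
final case class LabeledSentenceSketch(data: Array[Int], label: Array[Int])

object TextToLabeledSentenceSketch {
  /** Maps tokens to indices, then uses each word's successor as its label. */
  def apply(tokens: Array[String], dict: Map[String, Int]): LabeledSentenceSketch = {
    require(tokens.length >= 2, "need at least two tokens to form (data, label)")
    val indices = tokens.map(dict)
    // data drops the last index; label drops the first (next-word targets).
    LabeledSentenceSketch(indices.init, indices.tail)
  }
}
```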

Value Members

  1. object Dictionary extends Serializable

  2. object LabeledSentenceToSample extends Serializable

  3. object SentenceBiPadding extends Serializable

  4. object SentenceSplitter extends Serializable

  5. object SentenceTokenizer extends Serializable

  6. object TextToLabeledSentence extends Serializable

  7. package utils
