Class that helps build a dictionary, either from tokenized text or from a saved dictionary.
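The two construction paths can be illustrated with a minimal sketch. This is not the library's class: `DictionarySketch`, `buildVocab`, `serialize`, and `deserialize` are hypothetical names, and the "word index" line format is an assumption made for illustration.

```scala
object DictionarySketch {
  // Hypothetical helper: assigns each distinct word an index
  // in order of first appearance in the tokenized text.
  def buildVocab(sentences: Seq[Seq[String]]): Map[String, Int] =
    sentences.flatten.distinct.zipWithIndex.toMap

  // Assumed save format: one "word index" pair per line, ordered by index.
  def serialize(vocab: Map[String, Int]): Seq[String] =
    vocab.toSeq.sortBy(_._2).map { case (word, index) => s"$word $index" }

  // Rebuilds the vocabulary from lines written by serialize.
  def deserialize(lines: Seq[String]): Map[String, Int] =
    lines.map { line =>
      val parts = line.split(" ")
      parts(0) -> parts(1).toInt
    }.toMap
}
```

Building from tokens and reloading from the saved lines should give the same mapping, which is the property a saved dictionary needs.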
Represents a sentence.
Transforms labeled sentences into samples; if oneHot = true, the samples are in one-hot format.
Wraps a sentence in start and end markers: x => ["start", x, "end"]
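A minimal sketch of this mapping, assuming x is a single sentence string and that the markers are the literal strings "start" and "end" shown above. `PaddingSketch` and `pad` are hypothetical names, not the library's API.

```scala
object PaddingSketch {
  // Wraps one sentence in start and end markers.
  // The default marker strings are taken from the notation above.
  def pad(x: String, start: String = "start", end: String = "end"): Seq[String] =
    Seq(start, x, end)

  // Applies the padding lazily to a stream of sentences.
  def padAll(sentences: Iterator[String]): Iterator[Seq[String]] =
    sentences.map(s => pad(s))
}
```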
Takes a sequence of strings and splits it into sentences. The sentenceDetector is an API from OpenNLP. If sentFile is None, the default sentence delimiter is the period.
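The default behavior (no sentFile, period as delimiter) can be sketched without OpenNLP. `SplitterSketch` and `split` are hypothetical names; trimming whitespace and dropping empty pieces are assumptions of this sketch, not documented behavior.

```scala
object SplitterSketch {
  // Splits every input string on '.', trims surrounding whitespace,
  // and drops pieces that end up empty.
  def split(texts: Seq[String]): Seq[String] =
    texts.flatMap(_.split('.').toSeq).map(_.trim).filter(_.nonEmpty)
}
```

A plain period split breaks on abbreviations and decimals, which is presumably why a trained OpenNLP sentence detector is used when a model file is supplied.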
Transformer that tokenizes a Document (article) into a Seq[Seq[String]]
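To illustrate the output shape only, here is a sketch that assumes period-delimited sentences and whitespace-delimited tokens. `TokenizerSketch` and `tokenize` are hypothetical names, and the actual tokenizer may apply different rules.

```scala
object TokenizerSketch {
  // Outer Seq: one entry per sentence. Inner Seq: the tokens of that sentence.
  def tokenize(document: String): Seq[Seq[String]] =
    document.split('.').toSeq
      .map(_.trim)
      .filter(_.nonEmpty)
      .map(_.split("\\s+").toSeq)
}
```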
Transforms a tokenized sentence into a LabeledSentence, e.g. ["I", "love", "Intel"] => [0, 1, 2], which gives data: [0, 1] and label: [1, 2].
The input Array[String] should be a tokenized sentence, e.g. "I love Intel" => ["I", "love", "Intel"].
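The data/label split in the example is a one-position shift: the label at each step is the next word's index. A minimal sketch follows, with `Labeled` and `toLabeled` as hypothetical stand-ins for the library's types and a plain `Map` standing in for the dictionary.

```scala
object LabeledSentenceSketch {
  // Hypothetical stand-in for LabeledSentence.
  final case class Labeled(data: Seq[Int], label: Seq[Int])

  // Looks up each token's index, then shifts by one position:
  // data drops the last index, label drops the first.
  // Throws NoSuchElementException if a token is missing from vocab.
  def toLabeled(tokens: Seq[String], vocab: Map[String, Int]): Labeled = {
    val ids = tokens.map(vocab)
    Labeled(ids.dropRight(1), ids.drop(1))
  }
}
```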
if oneHot = true: Transform labeled sentences to one-hot format samples, e.g.
    sentence._data: [0, 2, 3]
    sentence._label: [2, 3, 1]
    vocabLength: 4
    => input: [[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
       target: [3, 4, 2]
else: The model will use LookupTable for word embedding.
    => input: [1, 2, 3]
       label: [2, 3, 4]
The input is an iterator of LabeledSentence class.
The output is an iterator of Sample class.
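The one-hot branch can be sketched as follows. `OneHotSketch`, `oneHot`, and `toTarget` are hypothetical names, and the "+ 1" shift on targets is inferred from the example (label [2, 3, 1] becomes target [3, 4, 2]), not taken from the implementation.

```scala
object OneHotSketch {
  // Each word index becomes a row of length vocabLength
  // with a single 1 at that index.
  def oneHot(data: Seq[Int], vocabLength: Int): Seq[Seq[Int]] =
    data.map(i => Seq.tabulate(vocabLength)(j => if (i == j) 1 else 0))

  // Assumption inferred from the example: targets are 1-based,
  // so every 0-based label index is shifted up by one.
  def toTarget(label: Seq[Int]): Seq[Int] = label.map(_ + 1)
}
```

One-hot rows grow with the vocabulary size, whereas the LookupTable branch keeps each word as a single index, which is the usual reason to prefer it for large vocabularies.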