Return the array of all discarded words.
Select the words with the top-k frequencies and discard the remaining words. Return the number of discarded words.
Return the index that encodes a word. If the word does not exist in the dictionary, return the dictionary length as the default index.
The number of words in the vocabulary.
Return the word at the given index. If the index is out of bounds, return a random word from the discarded word list. If the discarded word list is empty, return a random word from the existing dictionary.
Print the word-to-index dictionary.
Print the dictionary of discarded words.
Save the dictionary and the discarded words to the saveFolder directory.
Return the array of all selected words.
Encode a word as its index in the dictionary.
Class that helps build a dictionary, either from tokenized text or from a saved dictionary.
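
The lookup and fallback rules described above can be illustrated with a minimal Python sketch. It is not the original implementation. The class name `Dictionary`, the method names (`get_index`, `get_word`, `vocab_size`, `discard_size`) and the `seed` parameter are all hypothetical, chosen only for illustration. The sketch assumes that frequency ties are broken by first appearance, and it leaves out saving and loading.

```python
import random
from collections import Counter


class Dictionary:
    """Hypothetical sketch of the described behaviour, not the original class."""

    def __init__(self, sentences, vocab_size, seed=None):
        # Count word frequencies over tokenized sentences (lists of words).
        freq = Counter(word for sentence in sentences for word in sentence)
        ranked = [word for word, _ in freq.most_common()]
        self._vocabulary = ranked[:vocab_size]   # selected top-k words
        self._discarded = ranked[vocab_size:]    # all remaining words
        self._word2index = {w: i for i, w in enumerate(self._vocabulary)}
        self._rng = random.Random(seed)          # assumption: seedable for reproducibility

    def vocab_size(self):
        return len(self._vocabulary)

    def discard_size(self):
        return len(self._discarded)

    def get_index(self, word):
        # Unknown words map to the dictionary length (the default index).
        return self._word2index.get(word, len(self._vocabulary))

    def get_word(self, index):
        if 0 <= index < len(self._vocabulary):
            return self._vocabulary[index]
        # Out of bounds: pick a random discarded word, or a random
        # known word when nothing was discarded.
        pool = self._discarded or self._vocabulary
        return self._rng.choice(pool)


sentences = [["the", "cat", "sat"], ["the", "cat", "ran"], ["the", "dog"]]
d = Dictionary(sentences, vocab_size=2)
print(d.get_index("the"))   # 0
print(d.get_index("dog"))   # 2, the dictionary length, because "dog" was discarded
print(d.get_word(1))        # cat
print(d.get_word(5))        # one of the discarded words, chosen at random
```

Because the default index equals the vocabulary size, all unknown words share a single extra slot, so an encoder built on top of this would need `vocab_size() + 1` distinct indices. In this sketch, `get_word` raises `IndexError` if both word lists are empty.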