Go through the whole data set to gather some meta info for the tokens.
Go through the whole data set to gather some meta info for the tokens. Tokens would be discarded if the frequency ranking is less then maxWordsNum
Return a text classification model with the specific num of class
Load the pre-trained word2Vec
Load the pre-trained word2Vec
A map from word to vector
Load the pre-trained word2Vec
Load the pre-trained word2Vec
A map from word to vector
Create train and val RDDs from input
Start to train the text classification model
Train the text classification model with train and val RDDs
This example use a (pre-trained GloVe embedding) to convert word to vector, and uses it to train a text classification model on the 20 Newsgroup dataset with 20 different categories. This model can achieve around 90% accuracy after 2 epochs training.