bigdl.dataset package

Submodules

bigdl.dataset.base module

class bigdl.dataset.base.Progbar(target, width=30, verbose=1, interval=0.01)[source]

Bases: object

add(n, values=[])[source]
update(current, values=[], force=False)[source]
Parameters:
  • current – index of current step
  • values – list of tuples (name, value_for_last_step). The progress bar will display averages for these values.
  • force – force visual progress update
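Example (a minimal usage sketch; the loop and the “loss” metric are illustrative, not part of the API):

    from bigdl.dataset.base import Progbar

    bar = Progbar(target=100)          # 100 steps in total
    for step in range(1, 101):
        loss = 1.0 / step              # placeholder value for illustration
        # update() moves the bar to an absolute position; each (name, value)
        # tuple is averaged over the steps seen so far and shown next to the bar.
        bar.update(step, values=[("loss", loss)])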
bigdl.dataset.base.display_table(rows, positions)[source]
bigdl.dataset.base.maybe_download(filename, work_directory, source_url)[source]
bigdl.dataset.base.urlretrieve(url, filename, reporthook=None, data=None)[source]
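Example for maybe_download (a minimal sketch; the file name, directory, and URL are placeholders, and it assumes the function returns the local path of the cached file):

    from bigdl.dataset.base import maybe_download

    # Download the archive only if it is not already present in work_directory.
    local_path = maybe_download(
        filename="train-images-idx3-ubyte.gz",
        work_directory="/tmp/mnist",
        source_url="http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz")
    print(local_path)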

bigdl.dataset.dataset module

class bigdl.dataset.dataset.DataSet(jvalue=None, image_frame=None, bigdl_type='float')[source]

Bases: bigdl.util.common.JavaValue

get_image_frame()[source]
classmethod image_frame(image_frame, bigdl_type='float')[source]
transform(transformer)[source]
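Example (a minimal sketch; the ImageFrame source and the Resize transformer are assumed to come from bigdl.transform.vision.image, and the image path is a placeholder):

    from bigdl.dataset.dataset import DataSet
    from bigdl.transform.vision.image import ImageFrame, Resize

    # Wrap an ImageFrame in a DataSet, apply a vision transformer,
    # and read the transformed ImageFrame back.
    frame = ImageFrame.read("/tmp/images")
    data_set = DataSet.image_frame(frame)
    resized = data_set.transform(Resize(256, 256))
    result = resized.get_image_frame()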

bigdl.dataset.mnist module

bigdl.dataset.mnist.extract_images(f)[source]

Extract the images into a 4D uint8 numpy array [index, y, x, depth].

Param: f: A file object that can be passed into a gzip reader.
Returns: data: A 4D uint8 numpy array [index, y, x, depth].
Raise: ValueError: If the bytestream does not start with 2051.
bigdl.dataset.mnist.extract_labels(f)[source]
bigdl.dataset.mnist.load_data(location='/tmp/mnist')[source]
bigdl.dataset.mnist.read_data_sets(train_dir, data_type='train')[source]

Parse or download MNIST data if train_dir is empty.

Param: train_dir: The directory storing the MNIST data
Param: data_type: Reading training set or testing set. It can be either “train” or “test”
Returns:
(ndarray, ndarray) representing (features, labels)
features is a 4D uint8 numpy array [index, y, x, depth] representing each pixel valued from 0 to 255.
labels is a 1D uint8 numpy array representing the label valued from 0 to 9.
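Example (a minimal sketch; the directory is a placeholder):

    from bigdl.dataset import mnist

    # Download (if necessary) and parse the MNIST training set.
    train_images, train_labels = mnist.read_data_sets("/tmp/mnist", "train")
    print(train_images.shape)   # (60000, 28, 28, 1), uint8 pixels in [0, 255]
    print(train_labels.shape)   # (60000,), uint8 labels in [0, 9]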

bigdl.dataset.movielens module

bigdl.dataset.movielens.get_id_pairs(data_dir)[source]
bigdl.dataset.movielens.get_id_ratings(data_dir)[source]
bigdl.dataset.movielens.read_data_sets(data_dir)[source]

Parse or download MovieLens 1M data if data_dir is empty.

Parameters: data_dir – The directory storing the movielens data
Returns: a 2D numpy array with user index and item index in each row
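Example (a minimal sketch; the directory is a placeholder):

    from bigdl.dataset import movielens

    # Download (if necessary) and parse the MovieLens 1M ratings.
    pairs = movielens.read_data_sets("/tmp/movielens")
    # Per the description above, each row holds a user index and an item index.
    print(pairs.shape)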

bigdl.dataset.news20 module

bigdl.dataset.news20.download_glove_w2v(dest_dir)[source]
bigdl.dataset.news20.download_news20(dest_dir)[source]
bigdl.dataset.news20.get_glove_w2v(source_dir='./data/news20/', dim=100)[source]

Parse or download the pre-trained GloVe word2vec if source_dir is empty.

Parameters:
  • source_dir – The directory storing the pre-trained word2vec
  • dim – The dimension of a vector
Returns: A dict mapping from word to vector

bigdl.dataset.news20.get_news20(source_dir='./data/news20/')[source]

Parse or download news20 if source_dir is empty.

Parameters: source_dir – The directory storing news data.
Returns: A list of (tokens, label)
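Example (a minimal sketch; the directory is a placeholder):

    from bigdl.dataset import news20

    # Download (if necessary) and parse the 20 Newsgroups corpus.
    samples = news20.get_news20("/tmp/news20/")          # list of (tokens, label)
    # Download (if necessary) and parse the pre-trained GloVe vectors.
    w2v = news20.get_glove_w2v("/tmp/news20/", dim=100)  # dict: word -> vector
    print(len(samples), len(w2v))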

bigdl.dataset.sentence module

bigdl.dataset.sentence.read_localfile(fileName)[source]
bigdl.dataset.sentence.sentence_tokenizer(sentences)[source]
bigdl.dataset.sentence.sentences_bipadding(sent)[source]
bigdl.dataset.sentence.sentences_split(line)[source]

bigdl.dataset.transformer module

bigdl.dataset.transformer.normalizer(data, mean, std)[source]

Normalize features by standard deviation. data is an ndarray.
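Example (a minimal sketch; it assumes the conventional (data - mean) / std normalization implied by the description above):

    import numpy as np
    from bigdl.dataset import transformer

    data = np.array([0.0, 127.5, 255.0])
    normalized = transformer.normalizer(data, mean=127.5, std=127.5)
    print(normalized)   # roughly [-1.0, 0.0, 1.0] under the assumed formula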

Module contents