bigdl.dataset package

Submodules

bigdl.dataset.base module

class bigdl.dataset.base.Progbar(target, width=30, verbose=1, interval=0.01)[source]

Bases: object

add(n, values=[])[source]
update(current, values=[], force=False)[source]
Parameters:
  • current – index of current step
  • values – list of tuples (name, value_for_last_step). The progress bar will display averages for these values.
  • force – force visual progress update
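Example (a minimal usage sketch; the loop and the “loss” metric are illustrative, not part of the API):

    from bigdl.dataset.base import Progbar

    bar = Progbar(target=100)          # 100 steps in total
    for step in range(1, 101):
        loss = 1.0 / step              # placeholder value for illustration
        # update() moves the bar to an absolute position; each (name, value)
        # tuple is averaged over the steps seen so far and shown next to the bar.
        bar.update(step, values=[("loss", loss)])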
bigdl.dataset.base.display_table(rows, positions)[source]
bigdl.dataset.base.maybe_download(filename, work_directory, source_url)[source]
bigdl.dataset.base.urlretrieve(url, filename, reporthook=None, data=None)[source]
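Example for maybe_download (a minimal sketch; the file name, directory, and URL are placeholders, and it assumes the function returns the local path of the cached file):

    from bigdl.dataset.base import maybe_download

    # Download the archive only if it is not already present in work_directory.
    local_path = maybe_download(
        filename="train-images-idx3-ubyte.gz",
        work_directory="/tmp/mnist",
        source_url="http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz")
    print(local_path)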

bigdl.dataset.dataset module

class bigdl.dataset.dataset.DataSet(jvalue=None, image_frame=None, bigdl_type='float')[source]

Bases: bigdl.util.common.JavaValue

get_image_frame()[source]
classmethod image_frame(image_frame, bigdl_type='float')[source]
transform(transformer)[source]
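Example (a minimal sketch; the ImageFrame source and the Resize transformer are assumed to come from bigdl.transform.vision.image, and the image path is a placeholder):

    from bigdl.dataset.dataset import DataSet
    from bigdl.transform.vision.image import ImageFrame, Resize

    # Wrap an ImageFrame in a DataSet, apply a vision transformer,
    # and read the transformed ImageFrame back.
    frame = ImageFrame.read("/tmp/images")
    data_set = DataSet.image_frame(frame)
    resized = data_set.transform(Resize(256, 256))
    result = resized.get_image_frame()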

bigdl.dataset.mnist module

bigdl.dataset.mnist.extract_images(f)[source]

Extract the images into a 4D uint8 numpy array [index, y, x, depth].

Param: f: A file object that can be passed into a gzip reader.
Returns: data: A 4D uint8 numpy array [index, y, x, depth].
Raise: ValueError: If the bytestream does not start with 2051.
bigdl.dataset.mnist.extract_labels(f)[source]
bigdl.dataset.mnist.load_data(location='/tmp/mnist')[source]
bigdl.dataset.mnist.read_data_sets(train_dir, data_type='train')[source]

Parse or download MNIST data if train_dir is empty.

Param: train_dir: The directory storing the MNIST data
Param: data_type: Reading training set or testing set. It can be either “train” or “test”
Returns:
(ndarray, ndarray) representing (features, labels)
features is a 4D uint8 numpy array [index, y, x, depth] representing each pixel valued from 0 to 255.
labels is a 1D uint8 numpy array representing the label valued from 0 to 9.
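Example (a minimal sketch; the directory is a placeholder):

    from bigdl.dataset import mnist

    # Download (if necessary) and parse the MNIST training set.
    train_images, train_labels = mnist.read_data_sets("/tmp/mnist", "train")
    print(train_images.shape)   # (60000, 28, 28, 1), uint8 pixels in [0, 255]
    print(train_labels.shape)   # (60000,), uint8 labels in [0, 9]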

bigdl.dataset.movielens module

bigdl.dataset.movielens.get_id_pairs(data_dir)[source]
bigdl.dataset.movielens.get_id_ratings(data_dir)[source]
bigdl.dataset.movielens.read_data_sets(data_dir)[source]

Parse or download MovieLens 1M data if data_dir is empty.

Parameters: data_dir – The directory storing the movielens data
Returns: a 2D numpy array with user index and item index in each row
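Example (a minimal sketch; the directory is a placeholder):

    from bigdl.dataset import movielens

    # Download (if necessary) and parse the MovieLens 1M ratings.
    pairs = movielens.read_data_sets("/tmp/movielens")
    # Per the description above, each row holds a user index and an item index.
    print(pairs.shape)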

bigdl.dataset.news20 module

bigdl.dataset.news20.download_glove_w2v(dest_dir)[source]
bigdl.dataset.news20.download_news20(dest_dir)[source]
bigdl.dataset.news20.get_glove_w2v(source_dir='./data/news20/', dim=100)[source]

Parse or download the pre-trained GloVe word2vec if source_dir is empty.

Parameters:
  • source_dir – The directory storing the pre-trained word2vec
  • dim – The dimension of a vector
Returns: A dict mapping from word to vector

bigdl.dataset.news20.get_news20(source_dir='./data/news20/')[source]

Parse or download news20 if source_dir is empty.

Parameters: source_dir – The directory storing news data.
Returns: A list of (tokens, label)
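Example (a minimal sketch; the directory is a placeholder):

    from bigdl.dataset import news20

    # Download (if necessary) and parse the 20 Newsgroups corpus.
    samples = news20.get_news20("/tmp/news20/")          # list of (tokens, label)
    # Download (if necessary) and parse the pre-trained GloVe vectors.
    w2v = news20.get_glove_w2v("/tmp/news20/", dim=100)  # dict: word -> vector
    print(len(samples), len(w2v))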

bigdl.dataset.sentence module

bigdl.dataset.sentence.read_localfile(fileName)[source]
bigdl.dataset.sentence.sentence_tokenizer(sentences)[source]
bigdl.dataset.sentence.sentences_bipadding(sent)[source]
bigdl.dataset.sentence.sentences_split(line)[source]

bigdl.dataset.transformer module

bigdl.dataset.transformer.normalizer(data, mean, std)[source]

Normalize features by standard deviation. data is an ndarray.
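Example (a minimal sketch; it assumes the conventional (data - mean) / std normalization implied by the description above):

    import numpy as np
    from bigdl.dataset import transformer

    data = np.array([0.0, 127.5, 255.0])
    normalized = transformer.normalizer(data, mean=127.5, std=127.5)
    print(normalized)   # roughly [-1.0, 0.0, 1.0] under the assumed formula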

Module contents