BigDL Project

What is BigDL?

BigDL is a distributed deep learning library for Apache Spark. With BigDL, users can write their deep learning applications as standard Spark programs, which can run directly on top of existing Spark or Hadoop clusters. To make it easy to build Spark and BigDL applications, the high-level Analytics Zoo is provided for end-to-end analytics + AI pipelines.

Rich deep learning support. Modeled after Torch, BigDL provides comprehensive support for deep learning, including numeric computing (via Tensor) and high-level neural networks. In addition, users can load pre-trained Caffe, Torch, or Keras models into Spark programs using BigDL.

Extremely high performance. To achieve high performance, BigDL uses Intel MKL and multi-threaded programming in each Spark task. Consequently, it is orders of magnitude faster than out-of-the-box open source Caffe, Torch, or TensorFlow on a single-node Xeon (i.e., comparable with a mainstream GPU).

Efficient scale-out. BigDL can efficiently scale out to perform data analytics at "Big Data scale" by leveraging Apache Spark (a lightning-fast distributed data processing framework), as well as efficient implementations of synchronous SGD and all-reduce communications on Spark.
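To make the idea of synchronous SGD with an all-reduce step more concrete, here is a minimal single-process sketch in plain Python. It is not BigDL code and does not use Spark: the "workers" are simply list shards, and every function name (make_data, local_gradient, all_reduce_mean, sync_sgd) is hypothetical and chosen only for illustration. It assumes a toy linear model fitted to noise-free data.

```python
import random


def make_data(n, seed=0):
    """Generate noise-free points on the line y = 3.0 * x + 0.5 (toy data)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 3.0 * x + 0.5))
    return data


def local_gradient(w, b, shard):
    """Mean gradient of 0.5 * (w*x + b - y)**2 over one worker's shard."""
    gw = gb = 0.0
    for x, y in shard:
        err = w * x + b - y
        gw += err * x
        gb += err
    n = len(shard)
    return gw / n, gb / n


def all_reduce_mean(grads):
    """Average the per-worker gradients.

    In a real cluster every worker would end up holding this same result;
    here it is just a plain in-memory reduction.
    """
    k = len(grads)
    return tuple(sum(g[i] for g in grads) / k for i in range(len(grads[0])))


def sync_sgd(shards, lr=0.5, steps=200):
    """Synchronous SGD: all workers use the same weights in every step."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        # On a cluster, each shard's gradient would be computed in parallel.
        grads = [local_gradient(w, b, shard) for shard in shards]
        gw, gb = all_reduce_mean(grads)
        # Every worker applies the identical update, so replicas stay in sync.
        w -= lr * gw
        b -= lr * gb
    return w, b


data = make_data(400)
shards = [data[i::4] for i in range(4)]  # four equally sized simulated workers
w, b = sync_sgd(shards)
print(w, b)
```

Because the shards are equally sized, averaging the per-shard gradients gives the same update as a full-batch gradient step, which is the property that keeps all model replicas identical in a synchronous scheme.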

Why BigDL?

You may want to write your deep learning programs using BigDL if:

You want to analyze a large amount of data on the same Big Data (Hadoop/Spark) cluster where the data are stored (in, say, HDFS, HBase, Hive, etc.).

You want to add deep learning functionalities (either training or prediction) to your Big Data (Spark) programs and/or workflow.

You want to leverage existing Hadoop/Spark clusters to run your deep learning applications, which can then be dynamically shared with other workloads (e.g., ETL, data warehouse, feature engineering, classical machine learning, graph analytics, etc.).

Getting Help

For a technical overview of BigDL, please refer to the BigDL white paper.

You can check out the Getting Started page for a quick overview of how to use BigDL, and the BigDL Tutorials project for step-by-step deep learning tutorials on BigDL (using Python).

You can join the BigDL Google Group (or subscribe to the Mail List) for more questions and discussions on BigDL.

You can post bug reports and feature requests at the Issue Page.

You may refer to Analytics Zoo for high-level pipeline APIs, built-in deep learning models, reference use cases, etc. on Spark and BigDL.
