Without pip install

Precondition

First, obtain the BigDL libraries. Refer to Install from pre-built or Install from source code for details.

Remark

Only Python 2.7, Python 3.5 and Python 3.6 are supported for now.
Note that Python 3.6 is only compatible with Spark 1.6.4, 2.0.3, 2.1.1 and 2.2.0. See this issue for more discussion.
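
To check the current interpreter against the versions listed in this remark, a minimal sketch such as the following may help. The function name and the constant are illustrative, not part of BigDL, and the check covers only the Python version, not the Spark pairing mentioned for Python 3.6.

```python
import sys

# Versions listed as supported in the remark above.
SUPPORTED_PYTHON_VERSIONS = {(2, 7), (3, 5), (3, 6)}


def is_supported_python(version_info=None):
    """Return True if the (major, minor) pair is one of the listed versions."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in SUPPORTED_PYTHON_VERSIONS


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    status = "listed as supported" if is_supported_python() else "not listed as supported"
    print("Python {}.{} is {}".format(major, minor, status))
```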

Set Environment Variables

Set BIGDL_HOME and SPARK_HOME

If you download BigDL from the Release Page:

export SPARK_HOME=/path/to/spark    # folder where you extracted the Spark package
export BIGDL_HOME=/path/to/bigdl    # folder where you extracted the BigDL package

If you build BigDL yourself:

export SPARK_HOME=/path/to/spark              # folder where you extracted the Spark package
export BIGDL_HOME=/path/to/bigdl-source/dist  # dist folder generated by the build, at the top level of the source folder
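
Before launching anything, it can be useful to confirm that both variables are set and point to real directories. The helper below is a hypothetical sketch (the name check_home_dirs is not a BigDL API); it only inspects the environment and does not validate the contents of either folder.

```python
import os


def check_home_dirs(environ=None, names=("SPARK_HOME", "BIGDL_HOME")):
    """Return a dict mapping each problematic variable name to a description.

    An empty dict means every variable is set and points to an existing directory.
    """
    if environ is None:
        environ = os.environ
    problems = {}
    for name in names:
        value = environ.get(name)
        if not value:
            problems[name] = "not set"
        elif not os.path.isdir(value):
            problems[name] = "not a directory: " + value
    return problems


if __name__ == "__main__":
    found = check_home_dirs()
    if not found:
        print("SPARK_HOME and BIGDL_HOME look fine")
    for name, problem in found.items():
        print("{}: {}".format(name, problem))
```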

Update spark-bigdl.conf (Optional)

If you keep customized Spark properties in a separate file (one you would normally pass to spark-submit or pyspark with the --properties-file option), add those properties to ${BIGDL_HOME}/conf/spark-bigdl.conf.
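
One way to do this merge is sketched below. It assumes the usual properties-file layout (one key per line, separated from its value by whitespace, "=" or ":", with "#" comments), and it keeps the existing BigDL value when a key appears in both files; that policy is a choice made for this sketch, not documented BigDL behavior. The function names are hypothetical, and reading and writing the actual files is left to the caller.

```python
import re


def property_key(line):
    """Return the key of a properties line, or None for blank lines and comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    # The key ends at the first '=', ':' or whitespace character.
    return re.split(r"[=:\s]", stripped, maxsplit=1)[0]


def merge_properties(custom_text, bigdl_text):
    """Append custom properties to the BigDL conf text, skipping keys already present."""
    existing = {property_key(line) for line in bigdl_text.splitlines()}
    merged = bigdl_text.rstrip("\n").splitlines()
    for line in custom_text.splitlines():
        key = property_key(line)
        if key is not None and key not in existing:
            merged.append(line.strip())
            existing.add(key)
    return "\n".join(merged) + "\n"
```

Review the result before overwriting spark-bigdl.conf, since a skipped key means your custom value was not applied.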

Run with pyspark

${BIGDL_HOME}/bin/pyspark-with-bigdl.sh --master local[*]

Example code to verify if BigDL can run successfully
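
A minimal verification sketch follows. It assumes the BigDL 0.x Python API (create_spark_conf and init_engine from bigdl.util.common, Linear from bigdl.nn.layer); the wrapper function verify_bigdl and its return strings are illustrative only. Imports are done inside the function so that a missing package is reported rather than raised.

```python
def verify_bigdl():
    """Try to start BigDL and build one layer; return a short status string."""
    try:
        # Imported lazily so that a missing package is reported, not raised.
        from pyspark import SparkContext
        from bigdl.util.common import create_spark_conf, init_engine
        from bigdl.nn.layer import Linear
    except ImportError as exc:
        return "import failed: {}".format(exc)

    # Reuse the shell's SparkContext if one exists, otherwise create a local one.
    SparkContext.getOrCreate(conf=create_spark_conf().setMaster("local[*]"))
    init_engine()   # prepare the BigDL execution environment
    Linear(2, 3)    # building a layer exercises the Python-to-JVM bridge
    return "ok"


if __name__ == "__main__":
    print(verify_bigdl())
```

An "import failed" result usually points to the python-api.zip path issues described in the FAQ.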

Run with spark-submit

A BigDL Python program runs as a standard pyspark program, which requires all Python dependencies (e.g., NumPy) used by the program to be installed on each node in the Spark cluster. You can try running the BigDL lenet Python example as follows:

${BIGDL_HOME}/bin/spark-submit-with-bigdl.sh --master local[4] lenet5.py
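
Because every node needs the program's Python dependencies, a quick check like the one below can be run on each node before submitting. This is a sketch for Python 3 using only the standard library; missing_modules is a hypothetical helper, and it handles top-level module names only.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # NumPy is the dependency named in the text; extend the list for your program.
    missing = missing_modules(["numpy"])
    print("missing:", ", ".join(missing) if missing else "none")
```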

Run with Jupyter

With the full Python API support in BigDL, users can use BigDL together with powerful notebooks (such as Jupyter notebook) in a distributed fashion across the cluster, combining Python libraries, Spark SQL / dataframes and MLlib, deep learning models in BigDL, as well as interactive visualization tools.

Install all the necessary libraries on the local node where you will run Jupyter, e.g.,

sudo apt install python
sudo apt install python-pip
sudo pip install numpy scipy pandas scikit-learn matplotlib seaborn wordcloud

Launch the Jupyter notebook as follows:

${BIGDL_HOME}/bin/jupyter-with-bigdl.sh --master local[*]

After successfully launching Jupyter, you will be able to navigate to the notebook dashboard using your browser. You can find the exact URL in the console output when you started Jupyter; by default, the dashboard URL is http://your_node:8888/
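
If the dashboard does not load in the browser, a plain TCP check can tell you whether anything is listening on the expected port. The sketch below uses only the standard library; dashboard_reachable is a hypothetical helper, 8888 is the default port mentioned above, and a successful connection shows only that the port is open, not that Jupyter is the process behind it.

```python
import socket


def dashboard_reachable(host, port=8888, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    print("reachable" if dashboard_reachable("localhost") else "not reachable")
```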

Example code to verify if BigDL can run successfully

BigDL Configuration

Please check this page

FAQ

ImportError on from bigdl.nn.layer import *
- Check that the --py-files option points to python-api.zip: --py-files ${PYTHON_API_ZIP_PATH}
- Check that PYTHONPATH includes python-api.zip: export PYTHONPATH=${PYTHON_API_ZIP_PATH}:$PYTHONPATH
Python in worker has a different version 2.7 than that in driver 3.4
- export PYSPARK_PYTHON=/usr/local/bin/python3.4 (this path must be valid on every worker node)
- export PYSPARK_DRIVER_PYTHON=/usr/local/bin/python3.4 (this path must be valid on the driver node)
TypeError: 'JavaPackage' object is not callable
- Check that every path in the launch script is valid, especially the paths that end with .jar.
java.lang.NoSuchMethodError: XXX
- Check that the Spark versions match, e.g. that you are not running Spark 2.x while the underlying BigDL was compiled against Spark 1.6.
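
For the worker/driver Python version mismatch above, an alternative to exporting PYSPARK_PYTHON in the shell is to set it from the driver program before the SparkContext is created. The sketch below assumes the driver's interpreter path also exists on every worker node; pin_worker_python is a hypothetical helper. PYSPARK_DRIVER_PYTHON is read by the launch scripts, so it still has to be exported before starting pyspark or spark-submit.

```python
import os
import sys


def pin_worker_python(interpreter=None, environ=None):
    """Set PYSPARK_PYTHON so that workers use the given interpreter.

    Defaults to the interpreter running the driver. Call this before the
    SparkContext is created; later changes are not picked up.
    """
    if interpreter is None:
        interpreter = sys.executable
    if environ is None:
        environ = os.environ
    environ["PYSPARK_PYTHON"] = interpreter
    return interpreter
```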
If you want to redirect Spark logs to a file and keep only BigDL logs on the console, call the following API before you train your model:

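from bigdl.util.common import redire_spark_logs, show_bigdl_info_logs

# Redirect Spark logs to a file; replace the placeholder with your own path.
redire_spark_logs(log_path="/path/to/bigdl.log")
# Keep BigDL INFO logs on the console.
show_bigdl_info_logs()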