Without pip


Install without pip

Remark:

  • Only Python 2.7, Python 3.5 and Python 3.6 are supported for now.
  • Python 3.6 is only compatible with Spark 1.6.4, 2.0.3, 2.1.1 and 2.2.0. See this issue for more discussion.
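The constraints in the remark can be checked before installing anything. The following is a minimal sketch, not part of BigDL: the function name `is_supported` and both constants are hypothetical, and the Spark list is a literal reading of the versions named in the remark.

```python
# Literal encoding of the remark above; these names are illustrative,
# not an official BigDL API.
SUPPORTED_PYTHON = {(2, 7), (3, 5), (3, 6)}
SPARK_FOR_PY36 = {"1.6.4", "2.0.3", "2.1.1", "2.2.0"}


def is_supported(python_version, spark_version):
    """Return True if the Python (major, minor) pair and the Spark
    version string satisfy the constraints stated in the remark."""
    py = tuple(python_version[:2])
    if py not in SUPPORTED_PYTHON:
        return False
    # Only Python 3.6 carries an extra restriction on the Spark version.
    if py == (3, 6):
        return spark_version in SPARK_FOR_PY36
    return True
```

For the running interpreter, pass `sys.version_info` as the first argument, for example `is_supported(sys.version_info, "2.2.0")`.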

  1. Download Spark

  2. Download a BigDL release or nightly build from the Release Page, or build the BigDL package from source.

  3. Install Python dependencies:

    • BigDL only depends on NumPy for now.
    • For Spark standalone cluster:
      • If you're running in cluster mode, you need to install the Python dependencies on the client and on every worker node.
      • Install NumPy on Ubuntu: sudo apt-get install python-numpy (for Python 3, the package is python3-numpy)
    • For Yarn cluster:
      • You can run BigDL Python programs on YARN clusters without changing the cluster (e.g., there is no need to pre-install the Python dependencies). First package all the required Python dependencies into a virtual environment on the local node (where you will run the spark-submit command), then use spark-submit to run the BigDL Python program on the YARN cluster with that virtual environment. See Packing-dependencies for more details.
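Whether the dependencies are installed on each node or packaged into a virtual environment, it helps to confirm that the interpreter you plan to use can actually import them. The following is a minimal sketch of such a check; `check_dependencies` is a hypothetical helper, not something BigDL ships, and it defaults to NumPy because that is the only dependency named above.

```python
import importlib


def check_dependencies(modules=("numpy",)):
    """Map each module name to its version string, to "unknown" if the
    module has no __version__ attribute, or to None if the import fails."""
    found = {}
    for name in modules:
        try:
            mod = importlib.import_module(name)
        except ImportError:
            # The module is not installed for this interpreter.
            found[name] = None
        else:
            found[name] = getattr(mod, "__version__", "unknown")
    return found
```

Run `check_dependencies()` with the same Python interpreter that Spark will use on that node (or the one inside the packaged virtual environment); a value of `None` means the module still needs to be installed there.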