Run


Set Environment Variables

Set BIGDL_HOME and SPARK_HOME:

If you downloaded a pre-built BigDL package:

export SPARK_HOME=folder path where you extract the spark package
export BIGDL_HOME=folder path where you extract the bigdl package

If you built BigDL from source:

export SPARK_HOME=folder path where you extract the spark package
export BIGDL_HOME=the dist folder generated by the build process, which is under the top level of the source folder

Use Interactive Spark Shell

You can try BigDL easily using the Spark interactive shell. Run the command below to start the Spark shell with BigDL support:

${BIGDL_HOME}/bin/spark-shell-with-bigdl.sh --master local[*]

You will see a welcome message like the one below:

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_79)
Spark context available as sc.
scala> 

To use BigDL, you should first initialize the engine as shown below.

scala> import com.intel.analytics.bigdl.utils.Engine
scala> Engine.init

Once the engine is successfully initialized, you'll be able to use the BigDL APIs. For instance, to experiment with the Tensor APIs in BigDL, try the code below:

scala> import com.intel.analytics.bigdl.tensor.Tensor
import com.intel.analytics.bigdl.tensor.Tensor

scala> Tensor[Double](2,2).fill(1.0)
res9: com.intel.analytics.bigdl.tensor.Tensor[Double] =
1.0     1.0
1.0     1.0
[com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2]
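
Tensor also supports common element-wise arithmetic. Below is a minimal sketch of such an operation (it assumes the element-wise + operator on Tensor, used the same way as fill above; this is illustrative, not exhaustive):

scala> val a = Tensor[Double](2,2).fill(1.0)
scala> val b = Tensor[Double](2,2).fill(2.0)
scala> a + b   // element-wise addition; returns a new 2x2 tensor filled with 3.0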

Run as a Spark Program

You can run a BigDL program, e.g., the VGG training, as a standard Spark program (running in either local mode or cluster mode) as follows:

  1. Download the CIFAR-10 data from here. Remember to choose the binary version.
  2. Run the following command:
  # Spark local mode
  spark-submit --master local[core_number] --class com.intel.analytics.bigdl.models.vgg.Train \
  dist/lib/bigdl-VERSION-jar-with-dependencies.jar \
  -f path_to_your_cifar_folder \
  -b batch_size

  # Spark standalone mode
  spark-submit --master spark://... --executor-cores cores_per_executor \
  --total-executor-cores total_cores_for_the_job \
  --class com.intel.analytics.bigdl.models.vgg.Train \
  dist/lib/bigdl-VERSION-jar-with-dependencies.jar \
  -f path_to_your_cifar_folder \
  -b batch_size

  # Spark yarn client mode
  spark-submit --master yarn --deploy-mode client \
  --executor-cores cores_per_executor \
  --num-executors executors_number \
  --class com.intel.analytics.bigdl.models.vgg.Train \
  dist/lib/bigdl-VERSION-jar-with-dependencies.jar \
  -f path_to_your_cifar_folder \
  -b batch_size

  # Spark yarn cluster mode
  spark-submit --master yarn --deploy-mode cluster \
  --executor-cores cores_per_executor \
  --num-executors executors_number \
  --class com.intel.analytics.bigdl.models.vgg.Train \
  dist/lib/bigdl-VERSION-jar-with-dependencies.jar \
  -f path_to_your_cifar_folder \
  -b batch_size

The parameters used in the above commands are:

-f: the folder where you put the CIFAR-10 data
-b: the mini-batch size

If you run your own program, remember to create the SparkContext and initialize the engine before calling other BigDL APIs, as shown below.

 // Scala code example
 import com.intel.analytics.bigdl.utils.Engine
 import org.apache.spark.SparkContext

 // Engine.createSparkConf adds the Spark properties that BigDL requires
 val conf = Engine.createSparkConf()
 val sc = new SparkContext(conf)
 Engine.init

Run as a Local Java/Scala Program

You can also run a BigDL program as a local Java/Scala program.

To run a BigDL model as a local Java/Scala program, set the Java property bigdl.localMode to true. To specify how many cores to use for training/testing/prediction, set the Java property bigdl.coreNumber to the desired core number. You can either call System.setProperty("bigdl.localMode", "true") and System.setProperty("bigdl.coreNumber", core_number) in the Java/Scala code, or pass -Dbigdl.localMode=true and -Dbigdl.coreNumber=core_number when running the program.
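
For example, here is a minimal sketch that sets both properties programmatically before initializing the engine (the property names come from the paragraph above; the core count of 4 is only an illustration):

 // Run BigDL in local mode, without a SparkContext
 System.setProperty("bigdl.localMode", "true")
 System.setProperty("bigdl.coreNumber", "4") // illustrative value; use your machine's core count

 import com.intel.analytics.bigdl.utils.Engine
 Engine.init // must be called after the properties are set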

You need a full jar package to run a local program; the distributed jar excludes the Spark dependency classes. To get the full jar package, you need to build from the source code (please refer to the Build page). After the build, you can find bigdl-VERSION-jar-with-dependencies-and-spark.jar under spark/dl/target/ in the source folder.

For example, you may run the LeNet model as a local Scala/Java program as follows:

1. First, download the MNIST data from here. Unzip all the files and put them in one folder (e.g. mnist).

2. Run the command below to train LeNet as a local Java/Scala program:

scala -cp spark/dl/target/bigdl-VERSION-jar-with-dependencies-and-spark.jar \
com.intel.analytics.bigdl.example.lenetLocal.Train \
-f path_to_mnist_folder \
-c core_number \
-b batch_size \
--checkpoint ./model

In the above command:

-f: the folder where you put the MNIST data
-c: the number of cores to use
-b: the mini-batch size
--checkpoint: the folder in which to cache the trained model

3. The above command caches the trained model in the path specified by --checkpoint. Run the command below to validate with the trained model:

scala -cp spark/dl/target/bigdl-VERSION-jar-with-dependencies-and-spark.jar \
com.intel.analytics.bigdl.example.lenetLocal.Test \
-f path_to_mnist_folder \
--model ./model/model.iteration \
-c core_number \
-b batch_size

In the above command:

-f: the folder where you put the MNIST data
--model: the path to the trained model cached in the previous step
-c: the number of cores to use
-b: the mini-batch size

4. Run the command below to predict with the trained model:

scala -cp spark/dl/target/bigdl-VERSION-jar-with-dependencies-and-spark.jar \
com.intel.analytics.bigdl.example.lenetLocal.Predict \
-f path_to_mnist_folder \
-c core_number \
--model ./model/model.iteration

In the above command:

-f: the folder where you put the MNIST data
-c: the number of cores to use
--model: the path to the trained model cached in the training step


For Windows Users

Some BigDL functions depend on the Hadoop library, which requires winutils.exe to be installed on your machine. If you see the error "Could not locate executable null\bin\winutils.exe", see the known issues page.