Set Environment Variables

To achieve high performance, BigDL uses Intel MKL and multi-threaded programming; therefore, you need to first set the environment variables by sourcing the provided script, as follows:

$ source PATH_To_BigDL/scripts/bigdl.sh
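A quick way to confirm that sourcing the script took effect in the current shell is to check that the expected variables are exported. The sketch below is a minimal helper, not part of BigDL: the function name `check_env` is hypothetical, and the variable names in the usage comment are assumptions about what `bigdl.sh` exports, so read the script for the actual list.

```shell
# Print every named variable that is not exported in the current environment.
# Exit status: 0 if all are present, 1 if at least one is missing.
check_env() (
  missing=0
  for name in "$@"; do
    # printenv only sees exported variables, which is what child processes inherit
    if ! printenv "$name" >/dev/null; then
      echo "not set: $name"
      missing=1
    fi
  done
  exit "$missing"
)

# Hypothetical usage (variable names are assumptions, see bigdl.sh):
#   source PATH_To_BigDL/scripts/bigdl.sh
#   check_env OMP_NUM_THREADS KMP_BLOCKTIME && echo "environment looks ready"
```

Because the script must be sourced rather than executed, run the check in the same shell session that will later launch Spark.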

Use Interactive Spark Shell

First, set the environment variables as described in Set Environment Variables.

Then you can try BigDL using the interactive Spark shell. Run the command below to start the Spark shell with BigDL support:

$ ${SPARK_HOME}/bin/spark-shell --properties-file dist/conf/spark-bigdl.conf \
  --jars bigdl-VERSION-jar-with-dependencies.jar

You will see a welcome message like the following:

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_79)
Spark context available as sc.
scala> 

To use BigDL, first initialize the engine as shown below.

scala> import com.intel.analytics.bigdl.utils.Engine
scala> Engine.init

Once the engine is successfully initialized, you can start using the BigDL APIs. For instance, to experiment with the Tensor APIs in BigDL, try the code below:

scala> import com.intel.analytics.bigdl.tensor.Tensor
import com.intel.analytics.bigdl.tensor.Tensor

scala> Tensor[Double](2,2).fill(1.0)
res9: com.intel.analytics.bigdl.tensor.Tensor[Double] =
1.0     1.0
1.0     1.0
[com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2]

Run as a Spark Program

First, set the environment variables as described in Set Environment Variables.

Then you can run a BigDL program, e.g., the VGG training, as a standard Spark program (running in either local mode or cluster mode) as follows:

1. Download the CIFAR-10 data set. Remember to choose the binary version.

2. Run the VGG training.

  # Spark local mode
  spark-submit --master local[core_number] --class com.intel.analytics.bigdl.models.vgg.Train \
  ${BIGDL_HOME}/dist/lib/bigdl-VERSION-jar-with-dependencies.jar \
  -f path_to_your_cifar_folder \
  -b batch_size

  # Spark standalone mode
  spark-submit --master spark://... --executor-cores cores_per_executor \
  --total-executor-cores total_cores_for_the_job \
  --class com.intel.analytics.bigdl.models.vgg.Train \
  dist/lib/bigdl-VERSION-jar-with-dependencies.jar \
  -f path_to_your_cifar_folder \
  -b batch_size

  # Spark yarn mode
  spark-submit --master yarn --deploy-mode client \
  --executor-cores cores_per_executor \
  --num-executors executors_number \
  --class com.intel.analytics.bigdl.models.vgg.Train \
  dist/lib/bigdl-VERSION-jar-with-dependencies.jar \
  -f path_to_your_cifar_folder \
  -b batch_size
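The three invocations above share the class, jar, and program arguments and differ only in the master-specific options. A minimal sketch of a wrapper that assembles the command line for each mode follows. The function name `vgg_submit_cmd` and its argument order are hypothetical, and it only prints the command instead of running it, so the result can be inspected before submitting.

```shell
# Print (do not run) the spark-submit command line for the VGG training job.
# Arguments: mode, jar, CIFAR-10 folder, batch size, then mode-specific values:
#   local      <core_number>
#   standalone <master_url> <cores_per_executor> <total_cores_for_the_job>
#   yarn       <cores_per_executor> <executors_number>
vgg_submit_cmd() (
  mode=$1 jar=$2 folder=$3 batch=$4
  shift 4
  case "$mode" in
    local)      opts="--master local[$1]" ;;
    standalone) opts="--master $1 --executor-cores $2 --total-executor-cores $3" ;;
    yarn)       opts="--master yarn --deploy-mode client --executor-cores $1 --num-executors $2" ;;
    *)          echo "unknown mode: $mode" >&2; exit 1 ;;
  esac
  echo "spark-submit $opts --class com.intel.analytics.bigdl.models.vgg.Train $jar -f $folder -b $batch"
)
```

For example, `vgg_submit_cmd local dist/lib/bigdl-VERSION-jar-with-dependencies.jar path_to_your_cifar_folder batch_size core_number` prints the local-mode command with the placeholders still in place.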

The parameters used in the above commands are:

  -f: the folder containing the CIFAR-10 data (binary version).
  -b: the mini-batch size.

If you run your own program, remember to create a SparkContext and initialize the engine before calling any other BigDL APIs, as shown below.

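 // Scala code example
 import com.intel.analytics.bigdl.utils.Engine
 import org.apache.spark.SparkContext

 val conf = Engine.createSparkConf()  // SparkConf carrying the settings BigDL needs
 val sc = new SparkContext(conf)
 Engine.init  // must run before any other BigDL API call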