After pip install
Precondition
Use an Interactive Shell
- Type
python
in the command line to start a REPL. - Only Python 2.7, Python 3.5 and Python 3.6 are supported for now.
- Note that Python 3.6 is only compatible with Spark 1.6.4, 2.0.3, 2.1.1 and 2.2.0. See this issue for more discussion.
Run as a local program
- If the type of input data is ndarray instead of RDD or DataFrame, the model would be trained or validated in local mode.
- Check LeNet for more details.
# X_train, Y_train, X_test are all ndarray and the first dimension is the sample number.
local_optimizer = Optimizer.create(
model=model_definition,
training_set=(X_train, Y_train))
local_optimizer.predict(X_test)
local_optimizer.predict_class(X_test)
Use Jupyter Notebook
- Just start jupyter notebook as you normally do, e.g.
jupyter notebook --notebook-dir=./ --ip=* --no-browser
Example code to verify if BigDL can run successfully
from bigdl.util.common import *
from pyspark import SparkContext
from bigdl.nn.layer import *
import bigdl.version
# create sparkcontext with bigdl configuration
sc = SparkContext.getOrCreate(conf=create_spark_conf().setMaster("local[*]"))
init_engine() # prepare the bigdl environment
bigdl.version.__version__ # Get the current BigDL version
linear = Linear(2, 3) # Try to create a Linear layer
BigDL Configurations
- Increase memory
export SPARK_DRIVER_MEMORY=20g
- Add extra jars or python packages
Set the environment variables BIGDL_JARS
and BIGDL_PACKAGES
BEFORE creating SparkContext
:
export BIGDL_JARS=...
export BIGDL_PACKAGES=...
- Redirect logs
If you want to redirect spark logs to file and keep BigDL logs in console only, call the following API before you train your model:
from bigdl.util.common import *
# by default redirected to `bigdl.log` under the current workspace
redire_spark_logs(log_path="bigdl.log")
show_bigdl_info_logs()