- Currently, BigDL uses synchronous mini-batch SGD for model training. The mini-batch size is expected to be a multiple of the total number of cores used in the job.
- You may observe very poor performance when running BigDL for Spark 2.0 with Java 7; it is highly recommended to use Java 8 when building and running BigDL for Spark 2.0.
- On Spark 2.0, please use the default Java serializer instead of Kryo because of Kryo Issue 341. The issue has been fixed in Kryo 4.0, but Spark 2.0 uses Kryo 3.0.3. Spark 1.5 and 1.6 do not have this problem.
- On CentOS 6 and 7, please increase the max user processes limit (the value reported by `ulimit -u`) to a larger value (e.g., 514585); otherwise, you may see errors like "unable to create new native thread".
- Currently, BigDL loads all the training and validation data into memory during training. You may encounter errors if it runs out of memory.
- If the program hangs after `Save model...` on Mesos, check `spark.driver.memory` and increase its value. For example, VGG on Cifar10 may need 20G+.
- If you see the error `can't find executor core number` on Mesos, pass the executor cores through `--conf spark.executor.cores=xxx`.
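
As a minimal sketch of the mini-batch size rule above, the following rounds a requested batch size up to the next multiple of the total core count. The variable names and values (`nodes`, `cores_per_node`, `requested_batch`) are hypothetical, and it assumes the total cores used by the job equal nodes times cores per node; adjust this to however your job is actually sized.

```shell
#!/bin/sh
# Hypothetical cluster shape; substitute your own values.
nodes=4
cores_per_node=8
requested_batch=100

# Assumption: total cores used in the job = nodes * cores per node.
total_cores=$((nodes * cores_per_node))

# Round the requested batch size up to the next multiple of total_cores
# using integer arithmetic.
batch_size=$(( (requested_batch + total_cores - 1) / total_cores * total_cores ))

echo "total cores: $total_cores, batch size: $batch_size"
```

Rounding up rather than down is only one possible choice; rounding down also gives a valid multiple as long as the result is not zero.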
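
The Spark-related settings mentioned in the list above could be combined on one `spark-submit` command line roughly as follows. This is a sketch, not a recommended configuration: the memory and core values are illustrative assumptions, and the script only prints the command so that you can append your own class, jar, and arguments. `--driver-memory` is the standard `spark-submit` option that sets `spark.driver.memory`.

```shell
#!/bin/sh
# Illustrative values only; tune them for your own job and cluster.
driver_memory=20g    # the list above suggests 20G+ for VGG on Cifar10
executor_cores=4     # hypothetical; use the core count of your executors

# Collect the options as positional parameters so they can be inspected.
set -- \
  --conf spark.serializer=org.apache.spark.serializer.JavaSerializer \
  --driver-memory "$driver_memory" \
  --conf spark.executor.cores="$executor_cores"

# Print the command instead of running it.
echo spark-submit "$@"
```

The serializer line is only relevant on Spark 2.0, and the Java serializer is already Spark's default, so it matters mainly if Kryo has been enabled elsewhere in your configuration.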