Quantization Support


Introduction

Quantization is a technique that substitutes low-precision calculations for floating-point calculations. It improves inference performance and reduces model size by up to 4x.
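
To build intuition, here is an illustrative Scala sketch of symmetric linear quantization to 8-bit integers. It is a generic example of the idea, not BigDL's internal implementation; the quantize/dequantize helpers are hypothetical.

// Symmetric linear quantization: map the largest magnitude to 127.
// Assumes the input contains at least one non-zero value.
def quantize(values: Array[Float]): (Array[Byte], Float) = {
  val scale = values.map(math.abs).max / 127f
  (values.map(v => math.round(v / scale).toByte), scale)
}

// Dequantization recovers approximate float values from the bytes.
def dequantize(quantized: Array[Byte], scale: Float): Array[Float] =
  quantized.map(_ * scale)

Each value is stored in one byte instead of four, which is where the up-to-4x size reduction comes from.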

Quantize the pretrained model

BigDL provides a command-line tool for converting a pretrained model (BigDL, Caffe, Torch or TensorFlow) to a quantized model by passing the parameter --quantize true.

#!/bin/bash

set -x

# Locate the BigDL jar that bundles all dependencies.
BIGDL_HOME=${bigdl_folder}/dist
BIGDL_JAR_NAME=$(ls "${BIGDL_HOME}/lib/" | grep jar-with-dependencies.jar)
BIGDL_JAR="${BIGDL_HOME}/lib/${BIGDL_JAR_NAME}"
# Quote the wildcard so that java, not the shell, expands the classpath.
SPARK_JAR='/opt/spark/jars/*'
JAR="${BIGDL_JAR}:${SPARK_JAR}"
CLASS=com.intel.analytics.bigdl.utils.ConvertModel

FROM=caffe
TO=bigdl
MODEL=bvlc_alexnet.caffemodel

# Convert the pretrained Caffe model to a quantized BigDL model.
java -cp "${JAR}" ${CLASS} --from ${FROM} --to ${TO} \
    --input ${MODEL} --output ${MODEL%%.caffemodel}.bigdlmodel \
    --prototxt ${PWD}/deploy.prototxt --quantize true

ConvertModel supports converting pretrained models of different types to bigdlmodel; it also supports converting a bigdlmodel back to other types. Its help output is:

Usage: Convert models between different dl frameworks [options]

  --from <value>
        What's the type origin model bigdl,caffe,torch,tensorflow?
  --to <value>
        What's the type of model you want bigdl,caffe,torch?
  --input <value>
        Where's the origin model file?
  --output <value>
        Where's the bigdl model file to save?
  --prototxt <value>
        Where's the caffe deploy prototxt?
  --quantize <value>
        Do you want to quantize the model? Only works when "--to" is bigdl; you can only perform inference using the new quantized model.
  --tf_inputs <value>
        Inputs for Tensorflow
  --tf_outputs <value>
        Outputs for Tensorflow
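
After conversion, the quantized bigdlmodel can be loaded and used for inference from Scala. Below is a minimal sketch assuming the output file produced by the script above; the random input tensor is a placeholder for real, preprocessed image data.

import com.intel.analytics.bigdl.nn.Module
import com.intel.analytics.bigdl.numeric.NumericFloat
import com.intel.analytics.bigdl.tensor.Tensor

// Load the quantized model written by ConvertModel.
val model = Module.loadModule[Float]("bvlc_alexnet.bigdlmodel")

// bvlc_alexnet expects 227x227 RGB inputs; batch size 1 here.
val input = Tensor(1, 3, 227, 227).rand()
val output = model.forward(input)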

Quantize the model in code

You can call the quantize() method to quantize a model. It deep-copies the original model and generates a new one; you can only perform inference with the new quantized model.

val model = xxx                        // load or define your model here
val quantizedModel = model.quantize()  // returns a quantized deep copy
quantizedModel.forward(inputTensor)    // run inference with the quantized model

There is also a Python API that mirrors the Scala version.

model = xxx                           # load or define your model here
quantizedModel = model.quantize()     # returns a quantized deep copy
quantizedModel.forward(input_tensor)  # run inference with the quantized model
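
Putting it together, here is a minimal self-contained Scala sketch of quantizing a model in code; the toy Linear/ReLU network and random input are placeholders for a real pretrained model and data.

import com.intel.analytics.bigdl.nn.{Linear, ReLU, Sequential}
import com.intel.analytics.bigdl.numeric.NumericFloat
import com.intel.analytics.bigdl.tensor.Tensor

// A small float model standing in for a real pretrained one.
val model = Sequential()
  .add(Linear(10, 32))
  .add(ReLU())
  .add(Linear(32, 2))

// quantize() deep-copies the model; the original stays float.
val quantizedModel = model.quantize()

// The quantized copy is inference-only.
val input = Tensor(4, 10).rand()
val output = quantizedModel.forward(input)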