# TFLite

# Overview

TensorFlow Lite (TFLite) is a library that helps to

  • reduce model size (less memory, faster download)
  • fuse operations (faster inference)

It takes a regular TensorFlow SavedModel (protocol buffers) and compresses it into a much lighter format based on FlatBuffers, which should lead to faster loading into memory.

# How to use

TFLite conversion:

```python
import tensorflow as tf

# saved_model_path points to an exported TensorFlow SavedModel directory
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_path)
tflite_model = converter.convert()

with open("converted_model.tflite", "wb") as f:
    f.write(tflite_model)
```
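A converted model can be run from Python with TFLite's `tf.lite.Interpreter`. The sketch below is hedged: to stay self-contained it converts a tiny one-layer Keras model in memory; with a real model you would pass `model_path="converted_model.tflite"` instead of `model_content`.

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in model so the example runs end to end (assumption: any
# real SavedModel/Keras model would be used here instead).
inputs = tf.keras.Input(shape=(3,))
outputs = tf.keras.layers.Dense(2)(inputs)
model = tf.keras.Model(inputs, outputs)
tflite_model = tf.lite.TFLiteConverter.from_keras_model(model).convert()

# Load the FlatBuffer and allocate tensors once, then reuse for inference
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

x = np.zeros(inp["shape"], dtype=inp["dtype"])  # dummy input batch
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()
prediction = interpreter.get_tensor(out["index"])
print(prediction.shape)
```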

Usage in the browser (via TensorFlow.js):

```javascript
import * as tf from '@tensorflow/tfjs';

const model = await tf.loadLayersModel('https://example.com/tfjs/model.json');
const image = tf.browser.fromPixels(webcamElement); // fromPixels now lives under tf.browser
const prediction = model.predict(image);
```

# Details on size reduction

  1. Remove / fuse operations

Multiplications and additions can be rearranged to be more efficient: 3×a + 4×a + 5×a ⇒ (3 + 4 + 5)×a.

Operations like batch normalization can be fused into the preceding layer.
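The batch-norm fusion can be sketched with plain NumPy (assumption: the layer is reduced to a scalar `y = w*x + b` for clarity; the same algebra applies per-channel to a convolution). The four batch-norm parameters are folded into the layer's weight and bias once, offline, so only one multiply-add remains at inference time:

```python
import numpy as np

w, b = 2.0, 0.5                  # layer weight and bias
gamma, beta = 1.5, 0.1           # learned batch-norm scale / shift
mean, var, eps = 0.3, 4.0, 1e-5  # moving statistics

x = np.array([0.0, 1.0, -2.0])

# Unfused: the layer followed by a separate batch-norm op
y_unfused = gamma * ((w * x + b) - mean) / np.sqrt(var + eps) + beta

# Fused: fold gamma / sqrt(var + eps) into w and b ahead of time
scale = gamma / np.sqrt(var + eps)
w_f, b_f = w * scale, (b - mean) * scale + beta
y_fused = w_f * x + b_f

assert np.allclose(y_unfused, y_fused)
```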

(Figures: unoptimized vs. optimized graph)

The visualizations above were created with Netron, using the MobileNet frozen buffer and the TFLite download from TF Hub.

  2. Quantization

By default most models use 32-bit floats; converting the weights to 16-bit floats roughly halves the model size.
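With `TFLiteConverter` this is requested via the optimization flags below (a sketch: `saved_model_path` is a placeholder for your own SavedModel directory, as in the conversion snippet earlier; no test output is shown since the fragment needs an existing model):

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_path)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]  # store weights as float16
tflite_fp16_model = converter.convert()
```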

Reference: https://www.tensorflow.org/lite/performance/model_optimization

Post-training symmetric quantization: say the weights range from -1.5 to 0.8. The bytes -127, 0, and +127 then correspond to the floats -1.5, 0.0, and +1.5, respectively (0.0 always maps to 0 with symmetric quantization). Byte values +68 to +127 are never used, since they would map to floats greater than +0.8.
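The mapping above can be checked with a few lines of plain Python (a sketch of the symmetric scheme only, using the example's -1.5 to 0.8 range; this is not the TFLite implementation):

```python
def quantize(x, max_abs=1.5):
    """Map a float to a signed integer in [-127, 127], symmetric around 0."""
    scale = max_abs / 127.0          # float size of one integer step
    return round(x / scale)

def dequantize(q, max_abs=1.5):
    """Recover the approximate float a quantized integer stands for."""
    return q * (max_abs / 127.0)

print(quantize(-1.5), quantize(0.0), quantize(0.8))  # -127 0 68
```

`quantize(0.8)` lands at 68, which is why bytes above it go unused; dequantizing 68 gives back roughly 0.803, illustrating the small rounding error quantization introduces.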

Some hardware, such as the Edge TPU, only supports integer quantization.
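Full-integer quantization is requested with the converter flags below (a sketch: `saved_model_path` is a placeholder as before, and `representative_data_gen` is a hypothetical generator yielding a few typical input batches so the converter can calibrate activation ranges; no output is shown since the fragment needs an existing model):

```python
import tensorflow as tf

def representative_data_gen():
    # assumption: calibration_samples is a small set of typical inputs
    for x in calibration_samples:
        yield [x]

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_path)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8   # make even the I/O tensors integer
converter.inference_output_type = tf.int8
tflite_int8_model = converter.convert()
```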

# References

Géron, A. (2019). *Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems* (2nd ed.). O'Reilly.