Preparing your model for inference

A well-trained model is a good achievement but you might want to tune a few things to make it production-ready!

Model optimization

This section is meant to help you perform inference with optimized versions of your model.


NOTE: We support half-precision inference for PyTorch and TensorFlow models only on GPU devices.

Half-precision (or FP16) is a binary floating-point format that occupies 16 bits in computer memory.


  • Faster inference

  • Less memory usage

import tensorflow as tf
from tensorflow.keras import mixed_precision
predictor = ocr_predictor(reco_arch="crnn_mobilenet_v3_small", det_arch="linknet_resnet34", pretrained=True)

Export to ONNX

ONNX (Open Neural Network Exchange) is an open and interoperable format for representing and exchanging machine learning models. It defines a common format for representing models, including the network structure, layer types, parameters, and metadata.

import tensorflow as tf
from doctr.models import vitstr_small
from doctr.models.utils import export_model_to_onnx

batch_size = 16
input_shape = (3, 32, 128)
model = vitstr_small(pretrained=True, exportable=True)
dummy_input = [tf.TensorSpec([batch_size, input_shape], tf.float32, name="input")]
model_path, output = export_model_to_onnx(model, model_name="vitstr.onnx", dummy_input=dummy_input)

Using your ONNX exported model in docTR

Coming soon