doctr.models#

doctr.models.classification#

doctr.models.classification.vgg16_bn_r(pretrained: bool = False, **kwargs: Any) VGG[source]#

VGG-16 architecture as described in “Very Deep Convolutional Networks for Large-Scale Image Recognition”, modified by adding batch normalization, rectangular pooling and a simpler classification head.

>>> import tensorflow as tf
>>> from doctr.models import vgg16_bn_r
>>> model = vgg16_bn_r(pretrained=False)
>>> input_tensor = tf.random.uniform(shape=[1, 512, 512, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

Returns

VGG feature extractor

doctr.models.classification.resnet18(pretrained: bool = False, **kwargs: Any) ResNet[source]#

ResNet-18 architecture as described in “Deep Residual Learning for Image Recognition”.

>>> import tensorflow as tf
>>> from doctr.models import resnet18
>>> model = resnet18(pretrained=False)
>>> input_tensor = tf.random.uniform(shape=[1, 512, 512, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
Parameters

pretrained – boolean, True if model is pretrained

Returns

A classification model

doctr.models.classification.resnet34(pretrained: bool = False, **kwargs: Any) ResNet[source]#

ResNet-34 architecture as described in “Deep Residual Learning for Image Recognition”.

>>> import tensorflow as tf
>>> from doctr.models import resnet34
>>> model = resnet34(pretrained=False)
>>> input_tensor = tf.random.uniform(shape=[1, 512, 512, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
Parameters

pretrained – boolean, True if model is pretrained

Returns

A classification model

doctr.models.classification.resnet50(pretrained: bool = False, **kwargs: Any) ResNet[source]#

ResNet-50 architecture as described in “Deep Residual Learning for Image Recognition”.

>>> import tensorflow as tf
>>> from doctr.models import resnet50
>>> model = resnet50(pretrained=False)
>>> input_tensor = tf.random.uniform(shape=[1, 512, 512, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
Parameters

pretrained – boolean, True if model is pretrained

Returns

A classification model

doctr.models.classification.resnet31(pretrained: bool = False, **kwargs: Any) ResNet[source]#

ResNet-31 architecture with rectangular pooling windows as described in “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”. Downsizing: (H, W) -> (H/8, W/4)

>>> import tensorflow as tf
>>> from doctr.models import resnet31
>>> model = resnet31(pretrained=False)
>>> input_tensor = tf.random.uniform(shape=[1, 512, 512, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
Parameters

pretrained – boolean, True if model is pretrained

Returns

A classification model
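
As a quick sanity check on the downsizing note above, the lines below only apply the stated (H, W) -> (H/8, W/4) ratio to a 512 x 512 input; they are a sketch of the expected feature-map spatial dimensions, not an inspection of the model’s actual output.

>>> h, w = 512, 512
>>> (h // 8, w // 4)  # expected feature map spatial dims per the stated ratio
(64, 128)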

doctr.models.classification.mobilenet_v3_small(pretrained: bool = False, **kwargs: Any) MobileNetV3[source]#

MobileNetV3-Small architecture as described in “Searching for MobileNetV3”.

>>> import tensorflow as tf
>>> from doctr.models import mobilenet_v3_small
>>> model = mobilenet_v3_small(pretrained=False)
>>> input_tensor = tf.random.uniform(shape=[1, 512, 512, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
Parameters

pretrained – boolean, True if model is pretrained

Returns

a keras.Model

doctr.models.classification.mobilenet_v3_large(pretrained: bool = False, **kwargs: Any) MobileNetV3[source]#

MobileNetV3-Large architecture as described in “Searching for MobileNetV3”.

>>> import tensorflow as tf
>>> from doctr.models import mobilenet_v3_large
>>> model = mobilenet_v3_large(pretrained=False)
>>> input_tensor = tf.random.uniform(shape=[1, 512, 512, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
Parameters

pretrained – boolean, True if model is pretrained

Returns

a keras.Model

doctr.models.classification.mobilenet_v3_small_r(pretrained: bool = False, **kwargs: Any) MobileNetV3[source]#

MobileNetV3-Small architecture as described in “Searching for MobileNetV3”, with rectangular pooling.

>>> import tensorflow as tf
>>> from doctr.models import mobilenet_v3_small_r
>>> model = mobilenet_v3_small_r(pretrained=False)
>>> input_tensor = tf.random.uniform(shape=[1, 512, 512, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
Parameters

pretrained – boolean, True if model is pretrained

Returns

a keras.Model

doctr.models.classification.mobilenet_v3_large_r(pretrained: bool = False, **kwargs: Any) MobileNetV3[source]#

MobileNetV3-Large architecture as described in “Searching for MobileNetV3”, with rectangular pooling.

>>> import tensorflow as tf
>>> from doctr.models import mobilenet_v3_large_r
>>> model = mobilenet_v3_large_r(pretrained=False)
>>> input_tensor = tf.random.uniform(shape=[1, 512, 512, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
Parameters

pretrained – boolean, True if model is pretrained

Returns

a keras.Model

doctr.models.classification.mobilenet_v3_small_orientation(pretrained: bool = False, **kwargs: Any) MobileNetV3[source]#

MobileNetV3-Small architecture as described in “Searching for MobileNetV3”.

>>> import tensorflow as tf
>>> from doctr.models import mobilenet_v3_small_orientation
>>> model = mobilenet_v3_small_orientation(pretrained=False)
>>> input_tensor = tf.random.uniform(shape=[1, 512, 512, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
Parameters

pretrained – boolean, True if model is pretrained

Returns

a keras.Model

doctr.models.classification.magc_resnet31(pretrained: bool = False, **kwargs: Any) ResNet[source]#

ResNet-31 architecture with Multi-Aspect Global Context Attention as described in “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”.

>>> import tensorflow as tf
>>> from doctr.models import magc_resnet31
>>> model = magc_resnet31(pretrained=False)
>>> input_tensor = tf.random.uniform(shape=[1, 224, 224, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
Parameters

pretrained – boolean, True if model is pretrained

Returns

A feature extractor model

doctr.models.classification.crop_orientation_predictor(arch: str = 'mobilenet_v3_small_orientation', pretrained: bool = False, **kwargs: Any) CropOrientationPredictor[source]#

Orientation classification architecture.

>>> import numpy as np
>>> from doctr.models import crop_orientation_predictor
>>> model = crop_orientation_predictor(arch='mobilenet_v3_small_orientation', pretrained=True)
>>> input_crop = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> out = model([input_crop])
Parameters
  • arch – name of the architecture to use (e.g. ‘mobilenet_v3_small_orientation’)

  • pretrained – If True, returns a model pre-trained on our recognition crops dataset

Returns

CropOrientationPredictor
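
For reference, a variant of the example above that reads crops from disk with doctr.io.DocumentFile rather than generating random data; the image path is a placeholder, and the exact content of the returned list (one orientation prediction per crop) may differ between doctr versions.

>>> from doctr.io import DocumentFile
>>> from doctr.models import crop_orientation_predictor
>>> model = crop_orientation_predictor(pretrained=True)
>>> crops = DocumentFile.from_images("path/to/word_crop.jpg")  # placeholder path
>>> out = model(crops)  # one orientation prediction per crop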

doctr.models.detection#

doctr.models.detection.linknet_resnet18(pretrained: bool = False, **kwargs: Any) LinkNet[source]#

LinkNet as described in “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.

>>> import tensorflow as tf
>>> from doctr.models import linknet_resnet18
>>> model = linknet_resnet18(pretrained=True)
>>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
Parameters

pretrained (bool) – If True, returns a model pre-trained on our text detection dataset

Returns

text detection architecture

doctr.models.detection.linknet_resnet34(pretrained: bool = False, **kwargs: Any) LinkNet[source]#

LinkNet as described in “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.

>>> import tensorflow as tf
>>> from doctr.models import linknet_resnet34
>>> model = linknet_resnet34(pretrained=True)
>>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
Parameters

pretrained (bool) – If True, returns a model pre-trained on our text detection dataset

Returns

text detection architecture

doctr.models.detection.linknet_resnet50(pretrained: bool = False, **kwargs: Any) LinkNet[source]#

LinkNet as described in “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.

>>> import tensorflow as tf
>>> from doctr.models import linknet_resnet50
>>> model = linknet_resnet50(pretrained=True)
>>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
Parameters

pretrained (bool) – If True, returns a model pre-trained on our text detection dataset

Returns

text detection architecture

doctr.models.detection.db_resnet50(pretrained: bool = False, **kwargs: Any) DBNet[source]#

DBNet as described in “Real-time Scene Text Detection with Differentiable Binarization”, using a ResNet-50 backbone.

>>> import tensorflow as tf
>>> from doctr.models import db_resnet50
>>> model = db_resnet50(pretrained=True)
>>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
Parameters

pretrained (bool) – If True, returns a model pre-trained on our text detection dataset

Returns

text detection architecture

doctr.models.detection.db_mobilenet_v3_large(pretrained: bool = False, **kwargs: Any) DBNet[source]#

DBNet as described in “Real-time Scene Text Detection with Differentiable Binarization”, using a MobileNet V3 Large backbone.

>>> import tensorflow as tf
>>> from doctr.models import db_mobilenet_v3_large
>>> model = db_mobilenet_v3_large(pretrained=True)
>>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
Parameters

pretrained (bool) – If True, returns a model pre-trained on our text detection dataset

Returns

text detection architecture

doctr.models.detection.detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, assume_straight_pages: bool = True, **kwargs: Any) DetectionPredictor[source]#

Text detection architecture.

>>> import numpy as np
>>> from doctr.models import detection_predictor
>>> model = detection_predictor(arch='db_resnet50', pretrained=True)
>>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> out = model([input_page])
Parameters
  • arch – name of the architecture to use (e.g. ‘db_resnet50’)

  • pretrained – If True, returns a model pre-trained on our text detection dataset

  • assume_straight_pages – If True, fit straight boxes to the page

Returns

Detection predictor
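
A minimal sketch of running the predictor on an actual document loaded with doctr.io.DocumentFile (the path is a placeholder); the output contains one localization result per page, whose exact structure may vary between doctr versions.

>>> from doctr.io import DocumentFile
>>> from doctr.models import detection_predictor
>>> model = detection_predictor(arch='db_resnet50', pretrained=True)
>>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")  # placeholder path
>>> out = model(doc)  # one localization result per page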

doctr.models.recognition#

doctr.models.recognition.crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) CRNN[source]#

CRNN with a VGG-16 backbone as described in “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.

>>> import tensorflow as tf
>>> from doctr.models import crnn_vgg16_bn
>>> model = crnn_vgg16_bn(pretrained=True)
>>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
Parameters

pretrained (bool) – If True, returns a model pre-trained on our text recognition dataset

Returns

text recognition architecture

doctr.models.recognition.crnn_mobilenet_v3_small(pretrained: bool = False, **kwargs: Any) CRNN[source]#

CRNN with a MobileNet V3 Small backbone as described in “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.

>>> import tensorflow as tf
>>> from doctr.models import crnn_mobilenet_v3_small
>>> model = crnn_mobilenet_v3_small(pretrained=True)
>>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
Parameters

pretrained (bool) – If True, returns a model pre-trained on our text recognition dataset

Returns

text recognition architecture

doctr.models.recognition.crnn_mobilenet_v3_large(pretrained: bool = False, **kwargs: Any) CRNN[source]#

CRNN with a MobileNet V3 Large backbone as described in “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.

>>> import tensorflow as tf
>>> from doctr.models import crnn_mobilenet_v3_large
>>> model = crnn_mobilenet_v3_large(pretrained=True)
>>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
Parameters

pretrained (bool) – If True, returns a model pre-trained on our text recognition dataset

Returns

text recognition architecture

doctr.models.recognition.sar_resnet31(pretrained: bool = False, **kwargs: Any) SAR[source]#

SAR with a ResNet-31 feature extractor as described in “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”.

>>> import tensorflow as tf
>>> from doctr.models import sar_resnet31
>>> model = sar_resnet31(pretrained=False)
>>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
Parameters

pretrained (bool) – If True, returns a model pre-trained on our text recognition dataset

Returns

text recognition architecture

doctr.models.recognition.master(pretrained: bool = False, **kwargs: Any) MASTER[source]#

MASTER as described in “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition” (https://arxiv.org/pdf/1910.02562.pdf).

>>> import tensorflow as tf
>>> from doctr.models import master
>>> model = master(pretrained=False)
>>> input_tensor = tf.random.uniform(shape=[1, 48, 160, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
Parameters

pretrained (bool) – If True, returns a model pre-trained on our text recognition dataset

Returns

text recognition architecture

doctr.models.recognition.recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) RecognitionPredictor[source]#

Text recognition architecture.

>>> import numpy as np
>>> from doctr.models import recognition_predictor
>>> model = recognition_predictor(pretrained=True)
>>> input_page = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)
>>> out = model([input_page])
Parameters
  • arch – name of the architecture to use (e.g. ‘crnn_vgg16_bn’)

  • pretrained – If True, returns a model pre-trained on our text recognition dataset

Returns

Recognition predictor
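
A minimal sketch of running the predictor over several crops at once; the crop shapes below are arbitrary, and the exact format of each prediction (typically a text string with a confidence score) may differ between doctr versions.

>>> import numpy as np
>>> from doctr.models import recognition_predictor
>>> model = recognition_predictor(arch='crnn_vgg16_bn', pretrained=True)
>>> crops = [(255 * np.random.rand(32, 128, 3)).astype(np.uint8) for _ in range(2)]
>>> out = model(crops)  # one prediction per crop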

doctr.models.zoo#

doctr.models.ocr_predictor(det_arch: str = 'db_resnet50', reco_arch: str = 'crnn_vgg16_bn', pretrained: bool = False, assume_straight_pages: bool = True, preserve_aspect_ratio: bool = False, symmetric_pad: bool = True, export_as_straight_boxes: bool = False, **kwargs: Any) OCRPredictor[source]#

End-to-end OCR architecture using one model for localization, and another for text recognition.

>>> import numpy as np
>>> from doctr.models import ocr_predictor
>>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
>>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> out = model([input_page])
Parameters
  • det_arch – name of the detection architecture to use (e.g. ‘db_resnet50’, ‘db_mobilenet_v3_large’)

  • reco_arch – name of the recognition architecture to use (e.g. ‘crnn_vgg16_bn’, ‘sar_resnet31’)

  • pretrained – If True, returns a model pre-trained on our OCR dataset

  • assume_straight_pages – if True, speeds up the inference by assuming you only pass straight pages without rotated textual elements.

  • preserve_aspect_ratio – If True, pad the input document image to preserve the aspect ratio before running the detection model on it.

  • symmetric_pad – if True, pad the image symmetrically instead of padding at the bottom-right.

  • export_as_straight_boxes – when assume_straight_pages is set to False, export final predictions (potentially rotated) as straight bounding boxes.

  • kwargs – keyword args of OCRPredictor

Returns

OCR predictor
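
A minimal end-to-end sketch on a real document loaded with doctr.io.DocumentFile (the path is a placeholder), assuming the returned Document object exposes an export() method as in recent doctr releases.

>>> from doctr.io import DocumentFile
>>> from doctr.models import ocr_predictor
>>> model = ocr_predictor(pretrained=True)
>>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")  # placeholder path
>>> result = model(doc)
>>> export = result.export()  # nested dict of pages, blocks, lines and words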