doctr.models
doctr.models.classification
- doctr.models.classification.vgg16_bn_r(pretrained: bool = False, **kwargs: Any) → VGG
VGG-16 architecture as described in “Very Deep Convolutional Networks for Large-Scale Image Recognition”, modified by adding batch normalization, rectangular pooling and a simpler classification head.
>>> import tensorflow as tf
>>> from doctr.models import vgg16_bn_r
>>> model = vgg16_bn_r(pretrained=False)
>>> input_tensor = tf.random.uniform(shape=[1, 512, 512, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
- Parameters
pretrained (bool) – If True, returns a model pre-trained on ImageNet
- Returns
VGG feature extractor
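"Rectangular pooling" refers to pooling windows that are not square, so the feature map shrinks faster in height than in width and keeps more horizontal resolution, which suits text read left to right. The sketch below is a minimal pure-Python illustration of non-overlapping max pooling with a configurable window; the helper name `max_pool2d` and the (2, 1) window are illustrative assumptions, not docTR's internal implementation.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def max_pool2d(x: Matrix, window: Tuple[int, int]) -> Matrix:
    """Non-overlapping max pooling (stride == window) over a 2D grid.

    Trailing rows/columns that do not fill a full window are dropped,
    similar to 'valid' padding.
    """
    ph, pw = window
    out_h, out_w = len(x) // ph, len(x[0]) // pw
    return [
        [
            max(x[i * ph + di][j * pw + dj] for di in range(ph) for dj in range(pw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# A 4 x 4 feature map holding the values 0..15
fmap = [[float(4 * r + c) for c in range(4)] for r in range(4)]

square = max_pool2d(fmap, (2, 2))       # halves both height and width
rectangular = max_pool2d(fmap, (2, 1))  # halves height, keeps width
print(len(square), len(square[0]))            # 2 2
print(len(rectangular), len(rectangular[0]))  # 2 4
```

With a (2, 1) window the four columns survive, so a long word keeps one feature column per input column while its height is compressed.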
- doctr.models.classification.resnet18(pretrained: bool = False, **kwargs: Any) → ResNet
ResNet-18 architecture as described in “Deep Residual Learning for Image Recognition”.
>>> import tensorflow as tf
>>> from doctr.models import resnet18
>>> model = resnet18(pretrained=False)
>>> input_tensor = tf.random.uniform(shape=[1, 512, 512, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
- Parameters
pretrained (bool) – If True, returns a model with pre-trained weights
- Returns
A classification model
- doctr.models.classification.resnet34(pretrained: bool = False, **kwargs: Any) → ResNet
ResNet-34 architecture as described in “Deep Residual Learning for Image Recognition”.
>>> import tensorflow as tf
>>> from doctr.models import resnet34
>>> model = resnet34(pretrained=False)
>>> input_tensor = tf.random.uniform(shape=[1, 512, 512, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
- Parameters
pretrained (bool) – If True, returns a model with pre-trained weights
- Returns
A classification model
- doctr.models.classification.resnet50(pretrained: bool = False, **kwargs: Any) → ResNet
ResNet-50 architecture as described in “Deep Residual Learning for Image Recognition”.
>>> import tensorflow as tf
>>> from doctr.models import resnet50
>>> model = resnet50(pretrained=False)
>>> input_tensor = tf.random.uniform(shape=[1, 512, 512, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
- Parameters
pretrained (bool) – If True, returns a model with pre-trained weights
- Returns
A classification model
- doctr.models.classification.resnet31(pretrained: bool = False, **kwargs: Any) → ResNet
ResNet-31 architecture with rectangular pooling windows as described in “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”. Downsizing: (H, W) → (H/8, W/4).
>>> import tensorflow as tf
>>> from doctr.models import resnet31
>>> model = resnet31(pretrained=False)
>>> input_tensor = tf.random.uniform(shape=[1, 512, 512, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
- Parameters
pretrained (bool) – If True, returns a model with pre-trained weights
- Returns
A classification model
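The downsizing rule (H, W) → (H/8, W/4) means the ResNet-31 feature map keeps twice as much horizontal resolution as vertical resolution. The small helper below just applies that rule to compute the expected spatial size of the output; `resnet31_feature_shape` is a hypothetical name, and flooring for sizes that are not multiples of 8 or 4 is an assumption.

```python
from typing import Tuple


def resnet31_feature_shape(height: int, width: int) -> Tuple[int, int]:
    """Spatial size of the feature map under (H, W) -> (H/8, W/4).

    Assumes floor division when the input is not a multiple of the factor.
    """
    return height // 8, width // 4


# A typical text-crop size, and the 512 x 512 input used in the doctest example
print(resnet31_feature_shape(32, 128))   # (4, 32)
print(resnet31_feature_shape(512, 512))  # (64, 128)
```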
- doctr.models.classification.mobilenet_v3_small(pretrained: bool = False, **kwargs: Any) → MobileNetV3
MobileNetV3-Small architecture as described in “Searching for MobileNetV3”.
>>> import tensorflow as tf
>>> from doctr.models import mobilenet_v3_small
>>> model = mobilenet_v3_small(pretrained=False)
>>> input_tensor = tf.random.uniform(shape=[1, 512, 512, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
- Parameters
pretrained (bool) – If True, returns a model with pre-trained weights
- Returns
A keras.Model
- doctr.models.classification.mobilenet_v3_large(pretrained: bool = False, **kwargs: Any) → MobileNetV3
MobileNetV3-Large architecture as described in “Searching for MobileNetV3”.
>>> import tensorflow as tf
>>> from doctr.models import mobilenet_v3_large
>>> model = mobilenet_v3_large(pretrained=False)
>>> input_tensor = tf.random.uniform(shape=[1, 512, 512, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
- Parameters
pretrained (bool) – If True, returns a model with pre-trained weights
- Returns
A keras.Model
- doctr.models.classification.mobilenet_v3_small_r(pretrained: bool = False, **kwargs: Any) → MobileNetV3
MobileNetV3-Small architecture as described in “Searching for MobileNetV3”, with rectangular pooling.
>>> import tensorflow as tf
>>> from doctr.models import mobilenet_v3_small_r
>>> model = mobilenet_v3_small_r(pretrained=False)
>>> input_tensor = tf.random.uniform(shape=[1, 512, 512, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
- Parameters
pretrained (bool) – If True, returns a model with pre-trained weights
- Returns
A keras.Model
- doctr.models.classification.mobilenet_v3_large_r(pretrained: bool = False, **kwargs: Any) → MobileNetV3
MobileNetV3-Large architecture as described in “Searching for MobileNetV3”, with rectangular pooling.
>>> import tensorflow as tf
>>> from doctr.models import mobilenet_v3_large_r
>>> model = mobilenet_v3_large_r(pretrained=False)
>>> input_tensor = tf.random.uniform(shape=[1, 512, 512, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
- Parameters
pretrained (bool) – If True, returns a model with pre-trained weights
- Returns
A keras.Model
- doctr.models.classification.mobilenet_v3_small_orientation(pretrained: bool = False, **kwargs: Any) → MobileNetV3
MobileNetV3-Small architecture as described in “Searching for MobileNetV3”, used as the default architecture of crop_orientation_predictor.
>>> import tensorflow as tf
>>> from doctr.models import mobilenet_v3_small_orientation
>>> model = mobilenet_v3_small_orientation(pretrained=False)
>>> input_tensor = tf.random.uniform(shape=[1, 512, 512, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
- Parameters
pretrained (bool) – If True, returns a model with pre-trained weights
- Returns
A keras.Model
- doctr.models.classification.magc_resnet31(pretrained: bool = False, **kwargs: Any) → ResNet
ResNet-31 architecture with Multi-Aspect Global Context Attention as described in “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”.
>>> import tensorflow as tf
>>> from doctr.models import magc_resnet31
>>> model = magc_resnet31(pretrained=False)
>>> input_tensor = tf.random.uniform(shape=[1, 224, 224, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
- Parameters
pretrained (bool) – If True, returns a model with pre-trained weights
- Returns
A feature extractor model
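Global context attention pools a feature map into one context value per channel, using softmax attention weights computed over all spatial positions. In the multi-aspect variant, the channels are split into several groups (heads), each computing its own attention map. The pure-Python sketch below covers only this context-modeling step. It omits the transform and fusion branch of a full global context block and uses fixed toy weights. The names `multi_aspect_context` and `key_weights` are hypothetical and do not come from docTR.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multi_aspect_context(
    x: List[List[float]],      # C channels, each a flattened list of H*W positions
    key_weights: List[float],  # toy 1x1-conv weight per channel (hypothetical)
    num_heads: int,
) -> List[float]:
    """Return one attention-pooled context value per channel."""
    channels = len(x)
    assert channels % num_heads == 0, "channels must split evenly into heads"
    group = channels // num_heads
    positions = len(x[0])
    context: List[float] = []
    for head in range(num_heads):
        idx = range(head * group, (head + 1) * group)
        # Attention logit per position, computed from this head's channels only
        logits = [sum(key_weights[c] * x[c][j] for c in idx) for j in range(positions)]
        attn = softmax(logits)  # weights over all spatial positions sum to 1
        # Attention-weighted sum over positions, for each channel of the head
        context.extend(sum(a * x[c][j] for j, a in enumerate(attn)) for c in idx)
    return context


features = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5], [3.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
print(multi_aspect_context(features, key_weights=[1.0, 0.0, 0.0, 1.0], num_heads=2))
```

In this toy input, the second head sees constant logits, so it falls back to a plain average over positions. The first head concentrates its attention on the position with the highest activation.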
- doctr.models.classification.crop_orientation_predictor(arch: str = 'mobilenet_v3_small_orientation', pretrained: bool = False, **kwargs: Any) → CropOrientationPredictor
Orientation classification architecture.
>>> import numpy as np
>>> from doctr.models import crop_orientation_predictor
>>> model = crop_orientation_predictor(arch='mobilenet_v3_small_orientation', pretrained=True)
>>> input_crop = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> out = model([input_crop])
- Parameters
arch – name of the architecture to use (e.g. ‘mobilenet_v3_small_orientation’)
pretrained – If True, returns a model pre-trained on our recognition crops dataset
- Returns
CropOrientationPredictor
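An orientation classifier predicts one rotation class per crop. A typical next step is to rotate the crop back upright before recognition. The pure-Python sketch below assumes four classes mapped to 0, 90, 180 and 270 degrees of counter-clockwise rotation. That mapping (`ANGLES`) and the helper `correct_orientation` are hypothetical, so check the model's own configuration for the real class order. Images are single-channel grids here for brevity.

```python
from typing import List, Sequence

Image = List[List[int]]  # H x W grid of pixel values (single channel for brevity)

# Hypothetical class-index -> angle mapping (degrees of counter-clockwise rotation)
ANGLES = [0, 90, 180, 270]


def rotate_clockwise(img: Image) -> Image:
    """Rotate a 2D grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def correct_orientation(img: Image, scores: Sequence[float]) -> Image:
    """Undo the predicted counter-clockwise rotation by rotating clockwise."""
    predicted = max(range(len(scores)), key=lambda i: scores[i])  # argmax
    for _ in range(ANGLES[predicted] // 90):
        img = rotate_clockwise(img)
    return img


upright = [[1, 2, 3], [4, 5, 6]]
# The same crop rotated 90 degrees counter-clockwise
rotated_ccw = [[3, 6], [2, 5], [1, 4]]
print(correct_orientation(rotated_ccw, scores=[0.05, 0.9, 0.03, 0.02]))  # [[1, 2, 3], [4, 5, 6]]
```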
doctr.models.detection
- doctr.models.detection.linknet_resnet18(pretrained: bool = False, **kwargs: Any) → LinkNet
LinkNet as described in “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
>>> import tensorflow as tf
>>> from doctr.models import linknet_resnet18
>>> model = linknet_resnet18(pretrained=True)
>>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
- Parameters
pretrained (bool) – If True, returns a model pre-trained on our text detection dataset
- Returns
text detection architecture
- doctr.models.detection.linknet_resnet34(pretrained: bool = False, **kwargs: Any) → LinkNet
LinkNet as described in “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
>>> import tensorflow as tf
>>> from doctr.models import linknet_resnet34
>>> model = linknet_resnet34(pretrained=True)
>>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
- Parameters
pretrained (bool) – If True, returns a model pre-trained on our text detection dataset
- Returns
text detection architecture
- doctr.models.detection.linknet_resnet50(pretrained: bool = False, **kwargs: Any) → LinkNet
LinkNet as described in “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
>>> import tensorflow as tf
>>> from doctr.models import linknet_resnet50
>>> model = linknet_resnet50(pretrained=True)
>>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
- Parameters
pretrained (bool) – If True, returns a model pre-trained on our text detection dataset
- Returns
text detection architecture
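LinkNet is a semantic segmentation network. Used for text detection, it produces a per-pixel text probability map, and boxes come from a post-processing step. The simplified sketch below thresholds the map and returns the bounding box of each 4-connected region. The 0.3 threshold and the helper name `probability_map_to_boxes` are illustrative assumptions, and docTR's own postprocessor may work differently.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax), inclusive pixel indices


def probability_map_to_boxes(prob: List[List[float]], threshold: float = 0.3) -> List[Box]:
    """Binarize a probability map and return one box per 4-connected text region."""
    h, w = len(prob), len(prob[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or prob[y][x] < threshold:
                continue
            # Breadth-first search over the connected component starting at (x, y)
            queue = deque([(x, y)])
            seen[y][x] = True
            xmin = xmax = x
            ymin = ymax = y
            while queue:
                cx, cy = queue.popleft()
                xmin, xmax = min(xmin, cx), max(xmax, cx)
                ymin, ymax = min(ymin, cy), max(ymax, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    inside = 0 <= nx < w and 0 <= ny < h
                    if inside and not seen[ny][nx] and prob[ny][nx] >= threshold:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((xmin, ymin, xmax, ymax))
    return boxes


prob_map = [
    [0.9, 0.8, 0.0, 0.0, 0.0],
    [0.7, 0.9, 0.0, 0.6, 0.7],
    [0.0, 0.0, 0.0, 0.8, 0.9],
]
print(probability_map_to_boxes(prob_map))  # [(0, 0, 1, 1), (3, 1, 4, 2)]
```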
- doctr.models.detection.db_resnet50(pretrained: bool = False, **kwargs: Any) → DBNet
DBNet as described in “Real-time Scene Text Detection with Differentiable Binarization”, using a ResNet-50 backbone.
>>> import tensorflow as tf
>>> from doctr.models import db_resnet50
>>> model = db_resnet50(pretrained=True)
>>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
- Parameters
pretrained (bool) – If True, returns a model pre-trained on our text detection dataset
- Returns
text detection architecture
- doctr.models.detection.db_mobilenet_v3_large(pretrained: bool = False, **kwargs: Any) → DBNet
DBNet as described in “Real-time Scene Text Detection with Differentiable Binarization”, using a MobileNetV3-Large backbone.
>>> import tensorflow as tf
>>> from doctr.models import db_mobilenet_v3_large
>>> model = db_mobilenet_v3_large(pretrained=True)
>>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
- Parameters
pretrained (bool) – If True, returns a model pre-trained on our text detection dataset
- Returns
text detection architecture
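DBNet makes the binarization step trainable by replacing the hard threshold with a steep sigmoid. Given a probability map P and a learned threshold map T, the paper computes the approximate binary map B = 1 / (1 + exp(-k (P - T))) with an amplifying factor k, set to 50 in the paper. The sketch below applies that formula elementwise to toy maps. The helper name `differentiable_binarization` is illustrative.

```python
import math
from typing import List

Grid = List[List[float]]


def differentiable_binarization(prob: Grid, thresh: Grid, k: float = 50.0) -> Grid:
    """Approximate binary map B = 1 / (1 + exp(-k * (P - T))), applied elementwise."""
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob, thresh)
    ]


P = [[0.9, 0.2], [0.5, 0.55]]  # toy probability map
T = [[0.3, 0.3], [0.5, 0.3]]   # toy threshold map
B = differentiable_binarization(P, T)
print([[round(v, 3) for v in row] for row in B])  # [[1.0, 0.007], [0.5, 1.0]]
```

Pixels well above their local threshold go to about 1, pixels below go to about 0, and a pixel exactly at the threshold gets 0.5. The function stays smooth, so gradients reach both maps during training.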
- doctr.models.detection.detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, assume_straight_pages: bool = True, **kwargs: Any) → DetectionPredictor
Text detection architecture.
>>> import numpy as np
>>> from doctr.models import detection_predictor
>>> model = detection_predictor(arch='db_resnet50', pretrained=True)
>>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> out = model([input_page])
- Parameters
arch – name of the architecture to use (e.g. ‘db_resnet50’)
pretrained – If True, returns a model pre-trained on our text detection dataset
assume_straight_pages – If True, fit straight boxes to the page
- Returns
Detection predictor
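When pages may contain rotated text, a detection is better described by a four-point polygon than by an axis-aligned box. Recovering a straight box from such a polygon comes down to taking its extreme coordinates. The sketch below illustrates only that geometry. The helper name `polygon_to_straight_box` and the use of relative (0 to 1) coordinates are assumptions, and this is not docTR's code path.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_to_straight_box(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Smallest axis-aligned box (xmin, ymin, xmax, ymax) enclosing a polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


# A slightly rotated word, as four corner points in relative coordinates
rotated_word = [(0.10, 0.22), (0.40, 0.20), (0.41, 0.26), (0.11, 0.28)]
print(polygon_to_straight_box(rotated_word))  # (0.1, 0.2, 0.41, 0.28)
```

The enclosing box of a rotated word is larger than the word itself, which is why straight boxes are faster to handle but less tight on rotated text.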
doctr.models.recognition
- doctr.models.recognition.crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) → CRNN
CRNN with a VGG-16 backbone as described in “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
>>> import tensorflow as tf
>>> from doctr.models import crnn_vgg16_bn
>>> model = crnn_vgg16_bn(pretrained=True)
>>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
- Parameters
pretrained (bool) – If True, returns a model pre-trained on our text recognition dataset
- Returns
text recognition architecture
- doctr.models.recognition.crnn_mobilenet_v3_small(pretrained: bool = False, **kwargs: Any) → CRNN
CRNN with a MobileNetV3-Small backbone as described in “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
>>> import tensorflow as tf
>>> from doctr.models import crnn_mobilenet_v3_small
>>> model = crnn_mobilenet_v3_small(pretrained=True)
>>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
- Parameters
pretrained (bool) – If True, returns a model pre-trained on our text recognition dataset
- Returns
text recognition architecture
- doctr.models.recognition.crnn_mobilenet_v3_large(pretrained: bool = False, **kwargs: Any) → CRNN
CRNN with a MobileNetV3-Large backbone as described in “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
>>> import tensorflow as tf
>>> from doctr.models import crnn_mobilenet_v3_large
>>> model = crnn_mobilenet_v3_large(pretrained=True)
>>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
- Parameters
pretrained (bool) – If True, returns a model pre-trained on our text recognition dataset
- Returns
text recognition architecture
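CRNN is trained with Connectionist Temporal Classification (CTC). The model emits one class distribution per horizontal time step, including a special "blank" class. The text is recovered by merging consecutive repeated symbols and then dropping the blanks. The sketch below shows this best-path (greedy) decoding on toy scores. The vocabulary `VOCAB` and the blank index 0 are assumptions and do not reflect docTR's actual vocabulary layout.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0       # assumed index of the CTC blank class
VOCAB = "-abc"  # hypothetical vocabulary; position 0 is a placeholder for the blank


def ctc_greedy_decode(step_scores: Sequence[Sequence[float]]) -> str:
    """Best-path CTC decoding: argmax per step, merge repeats, remove blanks."""
    best_path = [max(range(len(s)), key=lambda i: s[i]) for s in step_scores]
    collapsed = [k for k, _ in groupby(best_path)]  # merge consecutive duplicates
    return "".join(VOCAB[k] for k in collapsed if k != BLANK)


# Per-step scores over (blank, a, b, c); the argmax path is: a a blank a b b
scores: List[List[float]] = [
    [0.1, 0.8, 0.05, 0.05],
    [0.2, 0.7, 0.05, 0.05],
    [0.9, 0.05, 0.03, 0.02],
    [0.1, 0.6, 0.2, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.3, 0.1, 0.5, 0.1],
]
print(ctc_greedy_decode(scores))  # aab
```

The blank at the third step is what allows the doubled letter "aa" to survive the merge step. Without it, the two runs of "a" would collapse into one.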
- doctr.models.recognition.sar_resnet31(pretrained: bool = False, **kwargs: Any) → SAR
SAR with a ResNet-31 feature extractor as described in “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”.
>>> import tensorflow as tf
>>> from doctr.models import sar_resnet31
>>> model = sar_resnet31(pretrained=False)
>>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
- Parameters
pretrained (bool) – If True, returns a model pre-trained on our text recognition dataset
- Returns
text recognition architecture
- doctr.models.recognition.master(pretrained: bool = False, **kwargs: Any) → MASTER
MASTER as described in “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition” (https://arxiv.org/pdf/1910.02562.pdf).
>>> import tensorflow as tf
>>> from doctr.models import master
>>> model = master(pretrained=False)
>>> input_tensor = tf.random.uniform(shape=[1, 48, 160, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_tensor)
- Parameters
pretrained (bool) – If True, returns a model pre-trained on our text recognition dataset
- Returns
text recognition architecture
- doctr.models.recognition.recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) → RecognitionPredictor
Text recognition architecture.
>>> import numpy as np
>>> from doctr.models import recognition_predictor
>>> model = recognition_predictor(pretrained=True)
>>> input_crop = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)
>>> out = model([input_crop])
- Parameters
arch – name of the architecture to use (e.g. ‘crnn_vgg16_bn’)
pretrained – If True, returns a model pre-trained on our text recognition dataset
- Returns
Recognition predictor
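Recognition models take a fixed input size (the CRNN doctests use 32 x 128 crops), but word crops come in arbitrary sizes. A common approach is to scale the crop by its most constraining side, so the aspect ratio is kept, and then pad the remainder. The sketch below computes only that resize-and-pad geometry. The helper `fit_crop`, the 32 x 128 target and the bottom/right padding are assumptions and do not document how `recognition_predictor` preprocesses crops internally.

```python
from typing import Tuple


def fit_crop(height: int, width: int, target_h: int = 32, target_w: int = 128) -> Tuple[int, int, int, int]:
    """Return (new_h, new_w, pad_bottom, pad_right) to fit a crop into target_h x target_w.

    Scales by the most constraining side so the aspect ratio is preserved,
    then pads the leftover space at the bottom and on the right.
    """
    scale = min(target_h / height, target_w / width)
    new_h = max(1, round(height * scale))
    new_w = max(1, round(width * scale))
    return new_h, new_w, target_h - new_h, target_w - new_w


print(fit_crop(20, 40))   # (32, 64, 0, 64): short word, half of the width is padding
print(fit_crop(50, 400))  # (16, 128, 16, 0): long word, limited by the width
```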
doctr.models.zoo
- doctr.models.ocr_predictor(det_arch: str = 'db_resnet50', reco_arch: str = 'crnn_vgg16_bn', pretrained: bool = False, assume_straight_pages: bool = True, preserve_aspect_ratio: bool = False, symmetric_pad: bool = True, export_as_straight_boxes: bool = False, **kwargs: Any) → OCRPredictor
End-to-end OCR architecture using one model for localization, and another for text recognition.
>>> import numpy as np
>>> from doctr.models import ocr_predictor
>>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
>>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> out = model([input_page])
- Parameters
det_arch – name of the detection architecture to use (e.g. ‘db_resnet50’, ‘db_mobilenet_v3_large’)
reco_arch – name of the recognition architecture to use (e.g. ‘crnn_vgg16_bn’, ‘sar_resnet31’)
pretrained – If True, returns a model pre-trained on our OCR dataset
assume_straight_pages – If True, speeds up inference by assuming you only pass straight pages without rotated textual elements.
preserve_aspect_ratio – If True, pad the input document image to preserve the aspect ratio before running the detection model on it.
symmetric_pad – If True, pad the image symmetrically instead of padding at the bottom-right.
export_as_straight_boxes – When assume_straight_pages is set to False, export final predictions (potentially rotated) as straight bounding boxes.
kwargs – keyword arguments passed to OCRPredictor
- Returns
OCR predictor
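preserve_aspect_ratio and symmetric_pad describe how a page is padded before it is resized for the detection model. The sketch below computes the padding needed to reach a target aspect ratio. With `symmetric=True` the padding is split evenly on both sides, and otherwise it all goes to the bottom-right. The helper `padding_for_aspect_ratio` and the square (1.0) target ratio are assumptions, and docTR's own resizing code may differ, for example in rounding.

```python
from typing import Tuple

Padding = Tuple[int, int, int, int]  # (top, bottom, left, right) in pixels


def padding_for_aspect_ratio(
    height: int, width: int, target_ratio: float = 1.0, symmetric: bool = True
) -> Padding:
    """Pad an image so that width / height == target_ratio without distorting it."""
    if width / height < target_ratio:
        # Too narrow: add columns
        extra_w, extra_h = round(height * target_ratio) - width, 0
    else:
        # Too wide (or already matching): add rows
        extra_w, extra_h = 0, round(width / target_ratio) - height
    if symmetric:
        top, left = extra_h // 2, extra_w // 2
    else:
        top, left = 0, 0  # all padding goes to the bottom-right
    return top, extra_h - top, left, extra_w - left


# The 600 x 800 page from the ocr_predictor doctest, padded to a square
print(padding_for_aspect_ratio(600, 800, symmetric=True))   # (100, 100, 0, 0)
print(padding_for_aspect_ratio(600, 800, symmetric=False))  # (0, 200, 0, 0)
```

Symmetric padding keeps the page content centered, while bottom-right padding leaves the top-left origin of the content unchanged.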