docTR: Document Text Recognition
State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
docTR provides an easy and powerful way to extract valuable information from your documents:
For automation: seamlessly process documents for Natural Language Understanding tasks. docTR provides OCR predictors that parse textual information (localizing and identifying each word) from your documents.
For research: quickly compare the speed and performance of your own architectures with state-of-the-art models on public datasets.
Main Features
Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
User-friendly: 3 lines of code to load a document and extract text with a predictor
State-of-the-art performance on public document datasets, comparable with Google Vision and AWS Textract
Optimized for inference speed on both CPU & GPU
Light package, minimal dependencies
Actively maintained by Mindee
Easy integration (available templates for browser demo & API deployment)
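The "load a document and extract text" workflow mentioned above can be sketched as follows. This is a minimal illustration, assuming docTR and one of its backends are installed; the wrapper name `extract_text` and its `pdf_path` argument are hypothetical helpers introduced here, while `DocumentFile.from_pdf`, `ocr_predictor`, and `render` come from the docTR package.

```python
def extract_text(pdf_path: str) -> str:
    """Run a pretrained docTR OCR predictor on a PDF and return its text."""
    # Imported lazily so this module still loads where docTR is not installed.
    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor

    # Two-stage predictor: text detection followed by text recognition.
    model = ocr_predictor(pretrained=True)
    # Read the PDF into a list of page images.
    doc = DocumentFile.from_pdf(pdf_path)
    # Run OCR; the result is a structured document object.
    result = model(doc)
    # Flatten the structured result into a plain-text string.
    return result.render()
```

Calling `extract_text("path/to/your/doc.pdf")` downloads pretrained weights on first use, so the first call may take noticeably longer than later ones.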
Model zoo
Text detection models
Text recognition models
SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
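As a rough illustration of how one of the recognition models above might be selected, the sketch below maps each family to an architecture identifier and passes it to `ocr_predictor`. The identifiers in `RECO_ARCHS`, the default `det_arch`, and the helper `build_predictor` are assumptions for illustration; the names available depend on the installed docTR version, so check them against your installation.

```python
# Assumed mapping from the model families listed above to docTR
# architecture identifiers; verify against your installed version.
RECO_ARCHS = {
    "SAR": "sar_resnet31",
    "MASTER": "master",
    "ViTSTR": "vitstr_small",
    "PARSeq": "parseq",
}


def build_predictor(reco_family: str, det_arch: str = "db_resnet50"):
    """Build an OCR predictor using the chosen recognition model family."""
    if reco_family not in RECO_ARCHS:
        raise KeyError(f"Unknown recognition family: {reco_family!r}")
    # Imported lazily so the mapping can be inspected without docTR installed.
    from doctr.models import ocr_predictor

    return ocr_predictor(
        det_arch=det_arch,
        reco_arch=RECO_ARCHS[reco_family],
        pretrained=True,
    )
```

For example, `build_predictor("PARSeq")` would pair the assumed default detector with the PARSeq recognizer.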
Supported datasets
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
SROIE from ICDAR 2019.
IIIT-5k from CVIT.
Street View Text from “End-to-End Scene Text Recognition”.
SynthText from Visual Geometry Group.
SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
IC03 from ICDAR 2003.
IC13 from ICDAR 2013.
IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
IIITHWS from “Generating Synthetic Data for Text Recognition”.
WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
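Loading one of the datasets above might look like the following sketch, which uses FUNSD as the example. It assumes docTR is installed and that the dataset can be downloaded; the wrapper `load_funsd` is a hypothetical helper, and the `train` and `download` arguments should be checked against the installed version.

```python
def load_funsd(train: bool = True):
    """Return the FUNSD dataset split, downloading it if necessary."""
    # Imported lazily so this module still loads where docTR is not installed.
    from doctr.datasets import FUNSD

    # download=True fetches the archive on first use.
    return FUNSD(train=train, download=True)
```

A typical use is `ds = load_funsd()` followed by `img, target = ds[0]` to inspect one sample and its annotations.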