Quickstart
This page shows how to get OCR results from a document in a few lines of code. For more details, see Choosing the right model.
Load a document
docTR can read PDFs, images, and web pages:
from doctr.io import DocumentFile
# From a PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# From one or more images
doc = DocumentFile.from_images("path/to/your/img.jpg")
doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jpg"])
# From a URL (requires the html extra: pip install "python-doctr[html]")
doc = DocumentFile.from_url("https://www.example.com")
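When the input type is only known at run time, the three loaders above can be selected from the source string. The following is a minimal sketch, not part of docTR: `pick_loader` and `load_document` are hypothetical helper names, and the rule used (URL scheme first, then a `.pdf` suffix, otherwise image) is an assumption that may not suit every input.

```python
from pathlib import Path


def pick_loader(source: str) -> str:
    """Return the name of the DocumentFile constructor to use for `source`.

    Hypothetical helper: the dispatch rule below is an assumption,
    not docTR behaviour.
    """
    # Anything with an HTTP(S) scheme is treated as a web page,
    # even if the URL happens to end in ".pdf".
    if source.startswith(("http://", "https://")):
        return "from_url"
    # Local files ending in .pdf (any letter case) go to the PDF reader.
    if Path(source).suffix.lower() == ".pdf":
        return "from_pdf"
    # Everything else is assumed to be an image file.
    return "from_images"


def load_document(source: str):
    """Load `source` with the constructor chosen by pick_loader."""
    # Imported lazily so pick_loader can be used without docTR installed.
    from doctr.io import DocumentFile

    return getattr(DocumentFile, pick_loader(source))(source)
```

Calling `load_document("path/to/your/doc.pdf")` is then equivalent to calling `DocumentFile.from_pdf` on the same path.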
Run OCR
from doctr.io import DocumentFile
from doctr.models import ocr_predictor
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
model = ocr_predictor(pretrained=True)
result = model(doc)
The predictor uses db_resnet50 for text detection and crnn_vgg16_bn for text recognition by default.
You can choose any combination of supported architectures.
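As an illustration of choosing architectures, the sketch below passes the detection and recognition names explicitly through the `det_arch` and `reco_arch` arguments of `ocr_predictor`. The wrapper `build_predictor` is a hypothetical helper added for this example, and its defaults simply repeat the pair named above.

```python
def build_predictor(det_arch: str = "db_resnet50",
                    reco_arch: str = "crnn_vgg16_bn"):
    """Return a pretrained OCR predictor for one detection/recognition pair.

    Hypothetical wrapper around doctr.models.ocr_predictor.
    """
    # Imported lazily so this helper can be defined without docTR installed.
    from doctr.models import ocr_predictor

    return ocr_predictor(det_arch=det_arch, reco_arch=reco_arch, pretrained=True)


# Example usage (downloads pretrained weights on first call):
# model = build_predictor("db_mobilenet_v3_large", "crnn_mobilenet_v3_small")
# result = model(doc)
```

The architecture names in the commented usage are examples only; the model guide lists the names available in your installed version.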
Inspect the output
The result is a Document object.
Render as plain text:
print(result.render())
Export as a nested dictionary (JSON-serialisable):
import json
print(json.dumps(result.export(), indent=2))
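The exported dictionary can also be walked directly, for example to list words with their confidence scores. The sketch below assumes the nesting pages → blocks → lines → words, with `value` and `confidence` keys on each word and `page_idx` on each page; verify this against your own `result.export()` output. The `sample` dictionary is hand-written illustration data, not real OCR output, and `iter_words` is a hypothetical helper.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_words(export: Dict[str, Any]) -> Iterator[Tuple[int, str, float]]:
    """Yield (page index, word text, confidence) for every word in an export.

    Assumes the layout pages -> blocks -> lines -> words.
    """
    for page in export.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                for word in line.get("words", []):
                    yield page.get("page_idx", -1), word["value"], word["confidence"]


# Hand-written stand-in for result.export(), trimmed to the keys used above.
sample = {
    "pages": [
        {
            "page_idx": 0,
            "blocks": [
                {
                    "lines": [
                        {
                            "words": [
                                {"value": "Hello", "confidence": 0.99},
                                {"value": "wor1d", "confidence": 0.42},
                            ]
                        }
                    ]
                }
            ],
        }
    ]
}

# Collect words the recognizer was unsure about (the threshold is arbitrary).
low_confidence = [text for _, text, conf in iter_words(sample) if conf < 0.5]
print(low_confidence)
```

With real output, replace `sample` with `result.export()`.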
Visualize on screen (requires the viz extra: pip install "python-doctr[viz]"):
result.pages[0].show()
Multi-page PDF end-to-end example
The following snippet processes every page of a PDF and collects the plain-text output:
import json
from doctr.io import DocumentFile
from doctr.models import ocr_predictor
model = ocr_predictor(pretrained=True)
doc = DocumentFile.from_pdf("path/to/multi_page.pdf")
result = model(doc)
# Plain-text — one string per page
for page_idx, page in enumerate(result.pages):
    print(f"--- Page {page_idx + 1} ---")
    print(page.render())
# Structured output — JSON-serialisable dict
output = result.export()
with open("ocr_output.json", "w") as f:
    json.dump(output, f, indent=2)
Common pitfalls
Note
Visualization requires the viz extra (installs matplotlib and mplcursors): pip install "python-doctr[viz]". Calls to result.show() or result.pages[0].show() raise a ModuleNotFoundError without it.
HTML input requires the html extra: pip install "python-doctr[html]".
Image format: pass file paths or NumPy uint8 arrays shaped (H, W, C) in RGB order. Grayscale arrays must be converted to 3-channel before use.
Pretrained weights are downloaded on first use and cached locally, so later calls skip the download.
PDF pages are returned as images: DocumentFile.from_pdf returns one NumPy array per page, so result.pages[i] corresponds to the i-th PDF page (counting from 0).
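The grayscale requirement above can be met with plain NumPy. The following is a minimal sketch: `to_rgb_uint8` is a hypothetical helper, and it copies the single channel three times, which is one reasonable way to obtain a 3-channel array. It rejects non-uint8 input rather than guessing a value range.

```python
import numpy as np


def to_rgb_uint8(image: np.ndarray) -> np.ndarray:
    """Return `image` as a uint8 array shaped (H, W, 3).

    Accepts (H, W), (H, W, 1) or (H, W, 3) uint8 arrays.
    """
    if image.dtype != np.uint8:
        raise TypeError(f"expected uint8, got {image.dtype}")
    # Promote a 2-D grayscale array to (H, W, 1).
    if image.ndim == 2:
        image = image[..., np.newaxis]
    if image.ndim != 3 or image.shape[-1] not in (1, 3):
        raise ValueError(f"unsupported shape {image.shape}")
    # Copy the single channel three times to get (H, W, 3).
    if image.shape[-1] == 1:
        image = np.repeat(image, 3, axis=-1)
    return image


gray = np.arange(6, dtype=np.uint8).reshape(2, 3)
rgb = to_rgb_uint8(gray)
print(rgb.shape)
```

The converted array can then be passed to the predictor as a one-element list, for example `model([rgb])`.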
Next steps
Choosing the right model - full predictor guide, architecture benchmarks, GPU usage.
Train your own model - train and load your own models.
Share your model with the community - share your trained models on Hugging Face Hub.