doctr.utils¶
This module gathers non-core features that complement the rest of the package.
Visualization¶
Easy-to-use functions to make sense of your model’s predictions.
- doctr.utils.visualization.visualize_page(page: dict[str, Any], image: ndarray, words_only: bool = True, display_artefacts: bool = True, scale: float = 10, interactive: bool = True, add_labels: bool = True, **kwargs: Any) → Figure [source]¶
Visualize a full page with predicted blocks, lines and words
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from doctr.utils.visualization import visualize_page
>>> from doctr.models import ocr_db_crnn
>>> model = ocr_db_crnn(pretrained=True)
>>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> out = model([[input_page]])
>>> visualize_page(out[0].pages[0].export(), input_page)
>>> plt.show()
- Parameters:
page – the exported Page of a Document
image – numpy array of the page; its shape must match page['dimensions']
words_only – whether only words should be displayed
display_artefacts – whether artefacts should be displayed
scale – figsize of the largest side of the figure
interactive – whether the plot should be interactive
add_labels – for static plots, adds text labels on top of the bounding boxes
**kwargs – keyword arguments for the polygon patch
- Returns:
the matplotlib figure
Reconstitution¶
- doctr.utils.reconstitution.synthesize_page(page: dict[str, Any], draw_proba: bool = False, font_family: str | None = None, smoothing_factor: float = 0.95, min_font_size: int = 8, max_font_size: int = 50) → ndarray [source]¶
Draw the content of the element page (OCR response) on a blank page.
- Parameters:
page – exported Page object to represent
draw_proba – if True, draw words in colors to represent confidence. Blue: p=1, red: p=0
font_family – family of the font
smoothing_factor – factor to smooth the font size
min_font_size – minimum font size
max_font_size – maximum font size
- Returns:
the synthesized page
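Following the pattern of the visualize_page example above, a minimal usage sketch (the random page merely stands in for a real document):
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from doctr.models import ocr_db_crnn
>>> from doctr.utils.reconstitution import synthesize_page
>>> model = ocr_db_crnn(pretrained=True)
>>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> out = model([[input_page]])
>>> synthesized = synthesize_page(out[0].pages[0].export(), draw_proba=True)
>>> plt.imshow(synthesized)
>>> plt.show()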
Task evaluation¶
Implementations of task-specific metrics to easily assess your model's performance.
- class doctr.utils.metrics.TextMatch[source]¶
Implements text match metric (word-level accuracy) for recognition task.
The raw aggregated metric is computed as follows:
$$\forall X, Y \in \mathcal{W}^N, \quad TextMatch(X, Y) = \frac{1}{N} \sum_{i=1}^N f_{Y_i}(X_i)$$

with the indicator function $f_a$ defined as:

$$\forall a, x \in \mathcal{W}, \quad f_a(x) = \begin{cases} 1 & \text{if } x = a \\ 0 & \text{otherwise} \end{cases}$$

where $\mathcal{W}$ is the set of all possible character sequences and $N$ is a strictly positive integer.

>>> from doctr.utils import TextMatch
>>> metric = TextMatch()
>>> metric.update(['Hello', 'world'], ['hello', 'world'])
>>> metric.summary()
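As a plain-Python illustration of the formula above (not the library's implementation), the raw metric reduces to the fraction of predictions that exactly match their ground truth:
>>> def raw_text_match(preds, gts):  # illustrative helper, not part of doctr
...     # f_a(x) = 1 iff x == a, so the metric is the share of exact matches
...     return sum(p == g for p, g in zip(preds, gts)) / len(gts)
>>> raw_text_match(['Hello', 'world'], ['hello', 'world'])
0.5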
- class doctr.utils.metrics.LocalizationConfusion(iou_thresh: float = 0.5, use_polygons: bool = False)[source]¶
Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
$$\forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M,$$
$$Recall(X, Y) = \frac{1}{N} \sum_{i=1}^N g_X(Y_i)$$
$$Precision(X, Y) = \frac{1}{M} \sum_{i=1}^M g_Y(X_i)$$
$$meanIoU(X, Y) = \frac{1}{M} \sum_{i=1}^M \max_{j \in [1, N]} IoU(X_i, Y_j)$$

with the function $IoU(x, y)$ being the Intersection over Union between bounding boxes $x$ and $y$ (a plain-Python sketch of this function follows the parameter list below), and, for a set of boxes $Z$, the matching function $g_Z$ defined as:

$$\forall b \in \mathcal{B}, \quad g_Z(b) = \begin{cases} 1 & \text{if } b \text{ has been matched with an element of } Z \text{ with } IoU \geq 0.5 \\ 0 & \text{otherwise} \end{cases}$$

where $\mathcal{B}$ is the set of possible bounding boxes, $N$ (number of ground truths) and $M$ (number of predictions) are strictly positive integers.

>>> import numpy as np
>>> from doctr.utils import LocalizationConfusion
>>> metric = LocalizationConfusion(iou_thresh=0.5)
>>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
>>> metric.summary()
- Parameters:
iou_thresh – minimum IoU to consider a pair of prediction and ground truth as a match
use_polygons – if set to True, predictions and targets are expected to be in rotated format
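For reference, a plain-Python sketch of the axis-aligned IoU used in these formulas (illustrative only, not doctr's implementation). On the example above, the first prediction scores 0.49 against the ground truth, just under the 0.5 threshold:
>>> def iou(a, b):  # illustrative helper, not part of doctr
...     # boxes as (xmin, ymin, xmax, ymax)
...     inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
...     inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
...     inter = inter_w * inter_h
...     union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
...     return inter / union
>>> iou((0, 0, 100, 100), (0, 0, 70, 70))
0.49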
- class doctr.utils.metrics.OCRMetric(iou_thresh: float = 0.5, use_polygons: bool = False)[source]¶
Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
$$\forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N, \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M,$$
$$Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum_{i=1}^N h_{\hat{B},\hat{L}}(B_i, L_i)$$
$$Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i)$$
$$meanIoU(B, \hat{B}) = \frac{1}{M} \sum_{i=1}^M \max_{j \in [1, N]} IoU(\hat{B}_i, B_j)$$

with the function $IoU(x, y)$ being the Intersection over Union between bounding boxes $x$ and $y$, and, for box-label pairs $(Z, T)$, the matching function $h_{Z,T}$ defined as:

$$\forall (b, l) \in \mathcal{B} \times \mathcal{L}, \quad h_{Z,T}(b, l) = \begin{cases} 1 & \text{if } b \text{ has been matched with some } Z_j \text{ with } IoU \geq 0.5 \text{ and } l = T_j \\ 0 & \text{otherwise} \end{cases}$$

where $\mathcal{B}$ is the set of possible bounding boxes, $\mathcal{L}$ is the set of possible character sequences, $N$ (number of ground truths) and $M$ (number of predictions) are strictly positive integers.

>>> import numpy as np
>>> from doctr.utils import OCRMetric
>>> metric = OCRMetric(iou_thresh=0.5)
>>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
...               ['hello'], ['hello', 'world'])
>>> metric.summary()
- Parameters:
iou_thresh – minimum IoU to consider a pair of prediction and ground truth as a match
use_polygons – if set to True, predictions and targets are expected to be in rotated format
- update(gt_boxes: ndarray, pred_boxes: ndarray, gt_labels: list[str], pred_labels: list[str]) → None [source]¶
Updates the metric
- Parameters:
gt_boxes – a set of relative bounding boxes, of shape (N, 4), or (N, 5) if they are rotated
pred_boxes – a set of relative bounding boxes, of shape (M, 4), or (M, 5) if they are rotated
gt_labels – a list of N string labels
pred_labels – a list of M string labels
- class doctr.utils.metrics.DetectionMetric(iou_thresh: float = 0.5, use_polygons: bool = False)[source]¶
Implements an object detection metric.
The aggregated metrics are computed as follows:
$$\forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N, \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M,$$
$$Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum_{i=1}^N h_{\hat{B},\hat{C}}(B_i, C_i)$$
$$Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i)$$
$$meanIoU(B, \hat{B}) = \frac{1}{M} \sum_{i=1}^M \max_{j \in [1, N]} IoU(\hat{B}_i, B_j)$$

with the function $IoU(x, y)$ being the Intersection over Union between bounding boxes $x$ and $y$, and, for box-class pairs $(Z, T)$, the matching function $h_{Z,T}$ defined as:

$$\forall (b, c) \in \mathcal{B} \times \mathcal{C}, \quad h_{Z,T}(b, c) = \begin{cases} 1 & \text{if } b \text{ has been matched with some } Z_j \text{ with } IoU \geq 0.5 \text{ and } c = T_j \\ 0 & \text{otherwise} \end{cases}$$

where $\mathcal{B}$ is the set of possible bounding boxes, $\mathcal{C}$ is the set of possible class indices, $N$ (number of ground truths) and $M$ (number of predictions) are strictly positive integers.

>>> import numpy as np
>>> from doctr.utils import DetectionMetric
>>> metric = DetectionMetric(iou_thresh=0.5)
>>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
...               np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
>>> metric.summary()
- Parameters:
iou_thresh – minimum IoU to consider a pair of prediction and ground truth as a match
use_polygons – if set to True, predictions and targets are expected to be in rotated format
- update(gt_boxes: ndarray, pred_boxes: ndarray, gt_labels: ndarray, pred_labels: ndarray) → None [source]¶
Updates the metric
- Parameters:
gt_boxes – a set of relative bounding boxes, of shape (N, 4), or (N, 5) if they are rotated
pred_boxes – a set of relative bounding boxes, of shape (M, 4), or (M, 5) if they are rotated
gt_labels – an array of class indices of shape (N,)
pred_labels – an array of class indices of shape (M,)
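Since every metric above accumulates state across calls to update, a typical evaluation loop (sketched below with a hypothetical samples iterable) updates once per page and reads the aggregate at the end:
>>> from doctr.utils import DetectionMetric
>>> metric = DetectionMetric(iou_thresh=0.5)
>>> # `samples` is a hypothetical iterable of (gt_boxes, gt_labels, pred_boxes, pred_labels)
>>> for gt_boxes, gt_labels, pred_boxes, pred_labels in samples:
...     metric.update(gt_boxes, pred_boxes, gt_labels, pred_labels)
>>> metric.summary()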