doctr.utils

This module gathers non-core features that complement the rest of the package.

Visualization

Easy-to-use functions to make sense of your model’s predictions.

doctr.utils.visualization.visualize_page(page: Dict[str, Any], image: ndarray, words_only: bool = True, display_artefacts: bool = True, scale: float = 10, interactive: bool = True, add_labels: bool = True, **kwargs: Any) → Figure[source]

Visualize a full page with predicted blocks, lines and words

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from doctr.utils.visualization import visualize_page
>>> from doctr.models import ocr_db_crnn
>>> model = ocr_db_crnn(pretrained=True)
>>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> out = model([[input_page]])
>>> visualize_page(out[0].pages[0].export(), input_page)
>>> plt.show()
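The figure is meant to be explored: with interactive=True, the rendered elements can typically be inspected on hover, while interactive=False combined with add_labels=True produces a static plot with the text drawn on top of each bounding box instead.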

Args:

page: the exported Page of a Document
image: np array of the page, which needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
display_artefacts: whether artefacts should be displayed
scale: figsize of the largest window side
interactive: whether the plot should be interactive
add_labels: for a static plot, adds text labels on top of the bounding boxes
**kwargs: keyword arguments for the polygon patch

Returns:

the matplotlib figure

doctr.utils.visualization.synthesize_page(page: Dict[str, Any], draw_proba: bool = False, font_family: str | None = None) → ndarray[source]

Draw the content of the page element (an OCR response) on a blank page.

Args:

page: exported Page object to represent
draw_proba: if True, draw words in colors that represent confidence (blue: p=1, red: p=0)
font_family: family of the font

Returns:

the synthesized page
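A minimal usage sketch, mirroring the visualize_page example above (the ocr_db_crnn model is only an illustrative choice):

>>> import numpy as np
>>> from doctr.models import ocr_db_crnn
>>> from doctr.utils.visualization import synthesize_page
>>> model = ocr_db_crnn(pretrained=True)
>>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> out = model([[input_page]])
>>> # render the OCR output on a blank page, coloring words by confidence
>>> synth = synthesize_page(out[0].pages[0].export(), draw_proba=True)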

Task evaluation

Implementations of task-specific metrics to easily assess your model's performance.

class doctr.utils.metrics.TextMatch[source]

Implements a text match metric (word-level accuracy) for the recognition task.

The raw aggregated metric is computed as follows:

\[\forall X, Y \in \mathcal{W}^N, TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)\]

with the indicator function \(f_{a}\) defined as:

\[\begin{split}\forall a, x \in \mathcal{W}, f_a(x) = \left\{ \begin{array}{ll} 1 & \mbox{if } x = a \\ 0 & \mbox{otherwise.} \end{array} \right.\end{split}\]

where \(\mathcal{W}\) is the set of all possible character sequences and \(N\) is a strictly positive integer.

>>> from doctr.utils import TextMatch
>>> metric = TextMatch()
>>> metric.update(['Hello', 'world'], ['hello', 'world'])
>>> metric.summary()
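In this example, only 'world' matches its ground truth exactly, so the raw score should be 0.5, while the lower-case scores should reach 1.0 since 'Hello' and 'hello' differ only by case.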
update(gt: List[str], pred: List[str]) → None[source]

Update the state of the metric with new predictions

Args:

gt: list of ground-truth character sequences
pred: list of predicted character sequences

summary() → Dict[str, float][source]

Computes the aggregated metrics

Returns:

a dictionary with the exact match score for the raw data, its lower-case counterpart, its unidecode counterpart and its lower-case unidecode counterpart

class doctr.utils.metrics.LocalizationConfusion(iou_thresh: float = 0.5, use_polygons: bool = False, mask_shape: Tuple[int, int] = (1024, 1024), use_broadcasting: bool = True)[source]

Implements common confusion metrics and mean IoU for localization evaluation.

The aggregated metrics are computed as follows:

\[\begin{split}\forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{Y}(X_i) \\ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)\end{split}\]

with the function \(IoU(x, y)\) being the Intersection over Union between bounding boxes \(x\) and \(y\), and, for any tuple of boxes \(Z\), the indicator function \(g_{Z}\) defined as:

\[\begin{split}\forall y \in \mathcal{B}, g_Z(y) = \left\{ \begin{array}{ll} 1 & \mbox{if } y\mbox{ has been assigned to any }(Z_i)_i\mbox{ with an }IoU \geq 0.5 \\ 0 & \mbox{otherwise.} \end{array} \right.\end{split}\]

where \(\mathcal{B}\) is the set of possible bounding boxes, \(N\) (number of ground truths) and \(M\) (number of predictions) are strictly positive integers.
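For intuition, here is a minimal NumPy sketch of the pairwise IoU computation underlying these formulas, for axis-aligned (xmin, ymin, xmax, ymax) boxes; it illustrates the definition and is not doctr's internal implementation:

>>> import numpy as np
>>> def box_iou(gts, preds):
...     # pairwise intersection rectangle, shape (N, M)
...     xmin = np.maximum(gts[:, None, 0], preds[None, :, 0])
...     ymin = np.maximum(gts[:, None, 1], preds[None, :, 1])
...     xmax = np.minimum(gts[:, None, 2], preds[None, :, 2])
...     ymax = np.minimum(gts[:, None, 3], preds[None, :, 3])
...     inter = (xmax - xmin).clip(0) * (ymax - ymin).clip(0)
...     gt_areas = (gts[:, 2] - gts[:, 0]) * (gts[:, 3] - gts[:, 1])
...     pred_areas = (preds[:, 2] - preds[:, 0]) * (preds[:, 3] - preds[:, 1])
...     # union = area(gt) + area(pred) - intersection
...     return inter / (gt_areas[:, None] + pred_areas[None, :] - inter)
>>> box_iou(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70]]))
array([[0.49]])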

>>> import numpy as np
>>> from doctr.utils import LocalizationConfusion
>>> metric = LocalizationConfusion(iou_thresh=0.5)
>>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
>>> metric.summary()
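With these values, the first prediction overlaps the ground truth with an IoU of 4900 / 10000 = 0.49 while the second does not overlap it at all, so no pair clears the 0.5 threshold: recall and precision should both be 0, and the mean IoU roughly 0.245.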

Args:

iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
use_polygons: if set to True, predictions and targets are expected to be in rotated format
mask_shape: if use_polygons is True, the spatial shape of the input images
use_broadcasting: if use_polygons is True, use broadcasting for the IoU computation at the cost of extra memory

update(gts: ndarray, preds: ndarray) → None[source]

Updates the metric

Args:

gts: a set of relative bounding boxes, of shape (N, 4), or (N, 5) if they are rotated
preds: a set of relative bounding boxes, of shape (M, 4), or (M, 5) if they are rotated

summary() → Tuple[float | None, float | None, float | None][source]

Computes the aggregated metrics

Returns:

a tuple with the recall, precision and meanIoU scores
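Each score is Optional, which suggests a value of None whenever the score is undefined for the accumulated data (for instance before any call to update).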

class doctr.utils.metrics.OCRMetric(iou_thresh: float = 0.5, use_polygons: bool = False, mask_shape: Tuple[int, int] = (1024, 1024), use_broadcasting: bool = True)[source]

Implements an end-to-end OCR metric.

The aggregated metrics are computed as follows:

\[\begin{split}\forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N, \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)\end{split}\]

with the function \(IoU(x, y)\) being the Intersection over Union between bounding boxes \(x\) and \(y\), and the function \(h_{B, L}\) defined as:

\[\begin{split}\forall (b, l) \in \mathcal{B} \times \mathcal{L}, h_{B,L}(b, l) = \left\{ \begin{array}{ll} 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\ 0 & \mbox{otherwise.} \end{array} \right.\end{split}\]

where \(\mathcal{B}\) is the set of possible bounding boxes, \(\mathcal{L}\) is the set of possible character sequences, \(N\) (number of ground truths) and \(M\) (number of predictions) are strictly positive integers.

>>> import numpy as np
>>> from doctr.utils import OCRMetric
>>> metric = OCRMetric(iou_thresh=0.5)
>>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
>>>               ['hello'], ['hello', 'world'])
>>> metric.summary()
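The geometry here is the same as in the localization example above: the best IoU (0.49) falls below the 0.5 threshold, so no boxes are matched, and recall and precision should be 0 for every string comparison with a mean IoU of roughly 0.245, even though the transcription 'hello' agrees.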

Args:

iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
use_polygons: if set to True, predictions and targets are expected to be in rotated format
mask_shape: if use_polygons is True, the spatial shape of the input images
use_broadcasting: if use_polygons is True, use broadcasting for the IoU computation at the cost of extra memory

update(gt_boxes: ndarray, pred_boxes: ndarray, gt_labels: List[str], pred_labels: List[str]) → None[source]

Updates the metric

Args:

gt_boxes: a set of relative bounding boxes, of shape (N, 4), or (N, 5) if they are rotated
pred_boxes: a set of relative bounding boxes, of shape (M, 4), or (M, 5) if they are rotated
gt_labels: a list of N string labels
pred_labels: a list of M string labels

summary() → Tuple[Dict[str, float | None], Dict[str, float | None], float | None][source]

Computes the aggregated metrics

Returns:

a tuple with the recall & precision for each string comparison and the mean IoU
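Presumably each dictionary is keyed by string comparison mode (raw, lower-case, unidecode and lower-case unidecode), mirroring the variants reported by TextMatch.summary().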

class doctr.utils.metrics.DetectionMetric(iou_thresh: float = 0.5, use_polygons: bool = False, mask_shape: Tuple[int, int] = (1024, 1024), use_broadcasting: bool = True)[source]

Implements an object detection metric.

The aggregated metrics are computed as follows:

\[\begin{split}\forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N, \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)\end{split}\]

with the function \(IoU(x, y)\) being the Intersection over Union between bounding boxes \(x\) and \(y\), and the function \(h_{B, C}\) defined as:

\[\begin{split}\forall (b, c) \in \mathcal{B} \times \mathcal{C}, h_{B,C}(b, c) = \left\{ \begin{array}{ll} 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\ 0 & \mbox{otherwise.} \end{array} \right.\end{split}\]

where \(\mathcal{B}\) is the set of possible bounding boxes, \(\mathcal{C}\) is the set of possible class indices, \(N\) (number of ground truths) and \(M\) (number of predictions) are strictly positive integers.

>>> import numpy as np
>>> from doctr.utils import DetectionMetric
>>> metric = DetectionMetric(iou_thresh=0.5)
>>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
>>>               np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
>>> metric.summary()
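Here the single ground truth box carries class index 0 while the predictions carry classes 0 and 1: following the formulas above, a prediction only counts towards recall and precision when it both clears the IoU threshold and has the class index of its assigned ground truth.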

Args:

iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
use_polygons: if set to True, predictions and targets are expected to be in rotated format
mask_shape: if use_polygons is True, the spatial shape of the input images
use_broadcasting: if use_polygons is True, use broadcasting for the IoU computation at the cost of extra memory

update(gt_boxes: ndarray, pred_boxes: ndarray, gt_labels: ndarray, pred_labels: ndarray) → None[source]

Updates the metric

Args:

gt_boxes: a set of relative bounding boxes, of shape (N, 4), or (N, 5) if they are rotated
pred_boxes: a set of relative bounding boxes, of shape (M, 4), or (M, 5) if they are rotated
gt_labels: an array of class indices, of shape (N,)
pred_labels: an array of class indices, of shape (M,)

summary() → Tuple[float | None, float | None, float | None][source]

Computes the aggregated metrics

Returns:

a tuple with the overall recall, precision and mean IoU scores