doctr.utils
This module groups together non-core features that complement the rest of the package.
Visualization
Easy-to-use functions to make sense of your model’s predictions.
- doctr.utils.visualization.visualize_page(page: Dict[str, Any], image: ndarray, words_only: bool = True, display_artefacts: bool = True, scale: float = 10, interactive: bool = True, add_labels: bool = True, **kwargs: Any) -> Figure
Visualize a full page with predicted blocks, lines and words
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from doctr.utils.visualization import visualize_page
>>> from doctr.models import ocr_db_crnn
>>> model = ocr_db_crnn(pretrained=True)
>>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> out = model([[input_page]])
>>> visualize_page(out[0].pages[0].export(), input_page)
>>> plt.show()
Args:
    page: the exported Page of a Document
    image: NumPy array of the page; its shape must match page['dimensions']
    words_only: whether only words should be displayed
    display_artefacts: whether artefacts should be displayed
    scale: figure size of the largest window side
    interactive: whether the plot should be interactive
    add_labels: for a static plot, adds text labels on top of each bounding box
    **kwargs: keyword arguments for the polygon patch
Returns:
    the matplotlib figure
Task evaluation
Implementations of task-specific metrics to easily assess your model's performance.
- class doctr.utils.metrics.TextMatch
Implements text match metric (word-level accuracy) for recognition task.
The raw aggregated metric is computed as follows:
\[\forall X, Y \in \mathcal{W}^N, TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)\]
with the indicator function \(f_{a}\) defined as:
\[\begin{split}\forall a, x \in \mathcal{W}, f_a(x) = \left\{ \begin{array}{ll} 1 & \mbox{if } x = a \\ 0 & \mbox{otherwise.} \end{array} \right.\end{split}\]
where \(\mathcal{W}\) is the set of all possible character sequences and \(N\) is a strictly positive integer.
>>> from doctr.utils.metrics import TextMatch
>>> metric = TextMatch()
>>> metric.update(['Hello', 'world'], ['hello', 'world'])
>>> metric.summary()
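The raw formula above can be illustrated without the library. The following is a minimal sketch, not the library's implementation: `text_match` is a hypothetical helper name, and it only reproduces the case-sensitive word-level accuracy defined by the formula, not whatever `summary()` returns.

```python
from typing import Sequence


def text_match(gt_words: Sequence[str], pred_words: Sequence[str]) -> float:
    """Fraction of predictions that exactly equal their ground-truth word."""
    if len(gt_words) != len(pred_words):
        raise ValueError("both sequences must have the same length N")
    if len(gt_words) == 0:
        raise ValueError("N must be strictly positive")
    # f_{Y_i}(X_i) is 1 on an exact, case-sensitive match and 0 otherwise
    hits = sum(1 for gt, pred in zip(gt_words, pred_words) if gt == pred)
    return hits / len(gt_words)


# Same inputs as the doctest above: 'Hello' != 'hello', 'world' == 'world'
score = text_match(["Hello", "world"], ["hello", "world"])
print(score)  # 0.5
```

With these inputs only one of the two words matches exactly, so the sketch yields 0.5.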
- class doctr.utils.metrics.LocalizationConfusion(iou_thresh: float = 0.5, use_polygons: bool = False)
Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
\[\begin{split}\forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^N g_{X}(Y_i) \\ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)\end{split}\]
Recall and precision share the same count of matched ground truths and differ only in their normalization (\(N\) and \(M\) respectively). Here, \(IoU(x, y)\) is the Intersection over Union between bounding boxes \(x\) and \(y\), and the function \(g_{X}\) is defined as:
\[\begin{split}\forall y \in \mathcal{B}, g_X(y) = \left\{ \begin{array}{ll} 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\ 0 & \mbox{otherwise.} \end{array} \right.\end{split}\]
where \(\mathcal{B}\) is the set of possible bounding boxes, and \(N\) (number of ground truths) and \(M\) (number of predictions) are strictly positive integers.
>>> import numpy as np
>>> from doctr.utils.metrics import LocalizationConfusion
>>> metric = LocalizationConfusion(iou_thresh=0.5)
>>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
>>> metric.summary()
Args:
    iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
    use_polygons: if set to True, predictions and targets are expected in rotated format
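To make the recall, precision and mean IoU definitions concrete, here is a pure-Python sketch for straight (xmin, ymin, xmax, ymax) boxes. The names `box_iou` and `localization_summary` are hypothetical, and the greedy one-to-one assignment is a simplifying assumption made for this illustration; the library may assign pairs differently.

```python
from typing import List, Sequence, Tuple

Box = Sequence[float]  # (xmin, ymin, xmax, ymax)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def localization_summary(
    gts: List[Box], preds: List[Box], iou_thresh: float = 0.5
) -> Tuple[float, float, float]:
    """Return (recall, precision, mean IoU) for non-empty box lists."""
    # iou[i][j]: IoU between ground truth i and prediction j
    iou = [[box_iou(g, p) for p in preds] for g in gts]
    # Greedy one-to-one assignment, best pairs first (assumption of this sketch)
    pairs = sorted(
        ((iou[i][j], i, j) for i in range(len(gts)) for j in range(len(preds))),
        reverse=True,
    )
    used_gts, used_preds, matches = set(), set(), 0
    for value, i, j in pairs:
        if value < iou_thresh:
            break  # remaining pairs are all below the threshold
        if i not in used_gts and j not in used_preds:
            used_gts.add(i)
            used_preds.add(j)
            matches += 1
    # Mean over predictions of their best IoU with any ground truth
    best = [max(iou[i][j] for i in range(len(gts))) for j in range(len(preds))]
    return matches / len(gts), matches / len(preds), sum(best) / len(preds)


# Same boxes as the doctest above
recall, precision, mean_iou = localization_summary(
    [[0, 0, 100, 100]],
    [[0, 0, 70, 70], [110, 95, 200, 150]],
)
print(recall, precision, mean_iou)
```

In this example the first prediction overlaps the ground truth with an IoU of 0.49, just below the 0.5 threshold, and the second does not overlap at all. The sketch therefore reports no match (recall and precision of 0) and a mean IoU of 0.245.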
- class doctr.utils.metrics.OCRMetric(iou_thresh: float = 0.5, use_polygons: bool = False)
Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
\[\begin{split}\forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N, \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)\end{split}\]
Recall and precision share the same count of correctly matched predictions and differ only in their normalization (\(N\) and \(M\) respectively). Here, \(IoU(x, y)\) is the Intersection over Union between bounding boxes \(x\) and \(y\), and the function \(h_{B, L}\) is defined as:
\[\begin{split}\forall (b, l) \in \mathcal{B} \times \mathcal{L}, h_{B,L}(b, l) = \left\{ \begin{array}{ll} 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\ 0 & \mbox{otherwise.} \end{array} \right.\end{split}\]
where \(\mathcal{B}\) is the set of possible bounding boxes, \(\mathcal{L}\) is the set of possible character sequences, and \(N\) (number of ground truths) and \(M\) (number of predictions) are strictly positive integers.
>>> import numpy as np
>>> from doctr.utils.metrics import OCRMetric
>>> metric = OCRMetric(iou_thresh=0.5)
>>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
...               ['hello'], ['hello', 'world'])
>>> metric.summary()
Args:
    iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
    use_polygons: if set to True, predictions and targets are expected in rotated format
- update(gt_boxes: ndarray, pred_boxes: ndarray, gt_labels: List[str], pred_labels: List[str]) -> None
Updates the metric
Args:
    gt_boxes: a set of relative bounding boxes of shape (N, 4), or (N, 5) if they are rotated
    pred_boxes: a set of relative bounding boxes of shape (M, 4), or (M, 5) if they are rotated
    gt_labels: a list of N string labels
    pred_labels: a list of M string labels
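The end-to-end criterion, where a prediction counts only if it is both well localized and correctly transcribed, can be sketched in pure Python as follows. `box_iou` and `end_to_end_summary` are hypothetical names, only straight (N, 4) boxes are handled, and the greedy assignment is an assumption made for this illustration rather than the library's actual matching procedure.

```python
from typing import List, Sequence, Tuple

Box = Sequence[float]  # relative (xmin, ymin, xmax, ymax)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def end_to_end_summary(
    gt_boxes: List[Box],
    pred_boxes: List[Box],
    gt_labels: List[str],
    pred_labels: List[str],
    iou_thresh: float = 0.5,
) -> Tuple[float, float]:
    """Return (recall, precision), counting only pairs with identical text."""
    candidates = sorted(
        (
            (box_iou(g, p), i, j)
            for i, g in enumerate(gt_boxes)
            for j, p in enumerate(pred_boxes)
        ),
        reverse=True,
    )
    used_gts, used_preds, correct = set(), set(), 0
    for value, i, j in candidates:
        if value < iou_thresh:
            break  # remaining pairs are all below the threshold
        if i in used_gts or j in used_preds:
            continue
        used_gts.add(i)
        used_preds.add(j)
        # h_{B,L} is 1 only when the assigned pair also carries the same string
        correct += int(gt_labels[i] == pred_labels[j])
    return correct / len(gt_boxes), correct / len(pred_boxes)


recall, precision = end_to_end_summary(
    gt_boxes=[[0.0, 0.0, 0.5, 0.5], [0.6, 0.0, 1.0, 0.25]],
    pred_boxes=[[0.0, 0.0, 0.45, 0.5], [0.62, 0.0, 1.0, 0.25], [0.1, 0.8, 0.3, 0.9]],
    gt_labels=["hello", "world"],
    pred_labels=["hello", "word", "extra"],
)
print(recall, precision)
```

Here the first two predictions are both localized well enough, but only the first one carries the right text, and the third prediction matches nothing. One of two ground truths and one of three predictions are therefore counted as correct.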
- class doctr.utils.metrics.DetectionMetric(iou_thresh: float = 0.5, use_polygons: bool = False)
Implements an object detection metric.
The aggregated metrics are computed as follows:
\[\begin{split}\forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N, \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)\end{split}\]
Recall and precision share the same count of correctly matched predictions and differ only in their normalization (\(N\) and \(M\) respectively). Here, \(IoU(x, y)\) is the Intersection over Union between bounding boxes \(x\) and \(y\), and the function \(h_{B, C}\) is defined as:
\[\begin{split}\forall (b, c) \in \mathcal{B} \times \mathcal{C}, h_{B,C}(b, c) = \left\{ \begin{array}{ll} 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\ 0 & \mbox{otherwise.} \end{array} \right.\end{split}\]
where \(\mathcal{B}\) is the set of possible bounding boxes, \(\mathcal{C}\) is the set of possible class indices, and \(N\) (number of ground truths) and \(M\) (number of predictions) are strictly positive integers.
>>> import numpy as np
>>> from doctr.utils.metrics import DetectionMetric
>>> metric = DetectionMetric(iou_thresh=0.5)
>>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
...               np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
>>> metric.summary()
Args:
    iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
    use_polygons: if set to True, predictions and targets are expected in rotated format
- update(gt_boxes: ndarray, pred_boxes: ndarray, gt_labels: ndarray, pred_labels: ndarray) -> None
Updates the metric
Args:
    gt_boxes: a set of relative bounding boxes of shape (N, 4), or (N, 5) if they are rotated
    pred_boxes: a set of relative bounding boxes of shape (M, 4), or (M, 5) if they are rotated
    gt_labels: an array of class indices of shape (N,)
    pred_labels: an array of class indices of shape (M,)
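Since `update` expects relative bounding boxes, annotations stored in pixel coordinates may need to be rescaled first. The sketch below assumes that "relative" means coordinates normalized to the [0, 1] range by the image width and height, and that boxes are straight (xmin, ymin, xmax, ymax) boxes. `to_relative` is a hypothetical helper, not part of the library.

```python
from typing import List, Sequence


def to_relative(
    boxes: Sequence[Sequence[float]], height: int, width: int
) -> List[List[float]]:
    """Scale absolute (xmin, ymin, xmax, ymax) pixel boxes to the [0, 1] range."""
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be strictly positive")
    # x coordinates are divided by the width, y coordinates by the height
    return [
        [xmin / width, ymin / height, xmax / width, ymax / height]
        for xmin, ymin, xmax, ymax in boxes
    ]


# Two boxes on a hypothetical 600 x 800 (height x width) page
rel = to_relative([[0, 0, 400, 300], [200, 150, 800, 600]], height=600, width=800)
print(rel)  # [[0.0, 0.0, 0.5, 0.5], [0.25, 0.25, 1.0, 1.0]]
```

The resulting nested list can be converted with `np.asarray` to obtain an array of shape (N, 4).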