doctr.transforms

Data transformations are part of both the training and inference procedures. Drawing inspiration from the design of torchvision, we express transformations as composable modules.

Supported transformations

Here are all the transformations available in docTR:

class doctr.transforms.Resize(output_size: int | Tuple[int, int], method: str = 'bilinear', preserve_aspect_ratio: bool = False, symmetric_pad: bool = False)

Resizes a tensor to a target size

>>> import tensorflow as tf
>>> from doctr.transforms import Resize
>>> transfo = Resize((32, 32))
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))

Args:

output_size: expected output size
method: interpolation method
preserve_aspect_ratio: if True, preserve aspect ratio and pad the rest with zeros
symmetric_pad: if True while preserving aspect ratio, the padding will be done symmetrically
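The sketch below makes `preserve_aspect_ratio` and `symmetric_pad` concrete. It covers only the size arithmetic, in pure Python. `compute_resize_and_pad` is a hypothetical helper, not part of docTR. It assumes the image is scaled to fit inside `output_size`, and that without `symmetric_pad` all zero-padding goes to the bottom and right.

```python
from typing import Tuple


def compute_resize_and_pad(
    image_size: Tuple[int, int],
    output_size: Tuple[int, int],
    symmetric_pad: bool = False,
) -> Tuple[Tuple[int, int], Tuple[int, int, int, int]]:
    """Hypothetical helper: scaled (h, w) and (top, bottom, left, right) zero-padding."""
    h, w = image_size
    target_h, target_w = output_size
    # Largest scale factor that keeps the whole image inside the target box
    scale = min(target_h / h, target_w / w)
    new_h, new_w = round(h * scale), round(w * scale)
    pad_h, pad_w = target_h - new_h, target_w - new_w
    if symmetric_pad:
        # Split the padding (almost) evenly between both sides
        top, left = pad_h // 2, pad_w // 2
    else:
        # Assumption: pad only at the bottom and on the right
        top, left = 0, 0
    return (new_h, new_w), (top, pad_h - top, left, pad_w - left)


print(compute_resize_and_pad((64, 128), (32, 32)))  # ((16, 32), (0, 16, 0, 0))
print(compute_resize_and_pad((64, 128), (32, 32), symmetric_pad=True))  # ((16, 32), (8, 8, 0, 0))
```

A 64×128 image resized to (32, 32) keeps its 1:2 aspect ratio as 16×32, which leaves 16 rows of zero-padding to distribute.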

class doctr.transforms.Normalize(mean: Tuple[float, float, float], std: Tuple[float, float, float])

Normalize a tensor to a Gaussian distribution for each channel

>>> import tensorflow as tf
>>> from doctr.transforms import Normalize
>>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))

Args:

mean: average value per channel
std: standard deviation per channel
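Per channel, normalization computes `(x - mean[c]) / std[c]`. The minimal pure-Python sketch below applies this to a single RGB pixel. `normalize_pixel` is a hypothetical name used only for illustration.

```python
from typing import Sequence, Tuple


def normalize_pixel(
    pixel: Sequence[float], mean: Sequence[float], std: Sequence[float]
) -> Tuple[float, ...]:
    # Standardize each channel independently: (x - mean) / std
    return tuple((x - m) / s for x, m, s in zip(pixel, mean, std))


# A pixel exactly one standard deviation above the mean on every channel
out = normalize_pixel((0.714, 0.68, 0.631), mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
```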

class doctr.transforms.LambdaTransformation(fn: Callable[[Tensor], Tensor])

Apply a user-defined function to a tensor

>>> import tensorflow as tf
>>> from doctr.transforms import LambdaTransformation
>>> transfo = LambdaTransformation(lambda x: x / 255.)
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))

Args:

fn: the function to be applied to the input tensor

class doctr.transforms.ToGray(num_output_channels: int = 1)

Convert an RGB tensor (batch of images or image) to a grayscale tensor with num_output_channels channels (1 by default)

>>> import tensorflow as tf
>>> from doctr.transforms import ToGray
>>> transfo = ToGray()
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
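Grayscale conversion is a weighted sum of the R, G and B channels. The gray value can then be repeated to obtain `num_output_channels` channels. The pure-Python sketch below assumes the BT.601-style luma weights (0.2989, 0.5870, 0.1140). `to_gray_pixel` and `LUMA_WEIGHTS` are hypothetical names, not docTR code.

```python
from typing import Tuple

# Assumed BT.601-style luma weights for R, G and B
LUMA_WEIGHTS = (0.2989, 0.5870, 0.1140)


def to_gray_pixel(rgb: Tuple[float, float, float], num_output_channels: int = 1) -> Tuple[float, ...]:
    # Weighted sum of the three color channels
    gray = sum(w * c for w, c in zip(LUMA_WEIGHTS, rgb))
    # Repeat the single gray value to reach the requested number of channels
    return (gray,) * num_output_channels


single = to_gray_pixel((1.0, 0.0, 0.0))
triple = to_gray_pixel((1.0, 0.0, 0.0), num_output_channels=3)
```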

class doctr.transforms.ColorInversion(min_val: float = 0.5)

Applies the following transformation to a tensor (image or batch of images): convert to grayscale, colorize (shift 0-values randomly), and then invert colors

>>> import tensorflow as tf
>>> from doctr.transforms import ColorInversion
>>> transfo = ColorInversion(min_val=0.6)
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))

Args:

min_val: range [min_val, 1] to colorize RGB pixels
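Here is a per-pixel sketch of the three steps above for float images in [0, 1]. It reads "colorize" as multiplying the gray value by a random factor drawn from [min_val, 1] for each channel. That reading is an assumption, and docTR's exact sampling may differ. `invert_colors_pixel` is hypothetical.

```python
import random
from typing import Optional, Tuple

LUMA_WEIGHTS = (0.2989, 0.5870, 0.1140)  # assumed grayscale weights


def invert_colors_pixel(
    rgb: Tuple[float, float, float], min_val: float = 0.5, rng: Optional[random.Random] = None
) -> Tuple[float, float, float]:
    if rng is None:
        rng = random.Random()
    # 1. Convert to grayscale
    gray = sum(w * c for w, c in zip(LUMA_WEIGHTS, rgb))
    # 2. Colorize (assumed): scale the gray value by a random factor in [min_val, 1] per channel
    shift = [rng.uniform(min_val, 1.0) for _ in range(3)]
    # 3. Invert the colors (float image in [0, 1] assumed)
    r, g, b = (1.0 - gray * s for s in shift)
    return r, g, b


light = invert_colors_pixel((0.9, 0.9, 0.9), min_val=0.6, rng=random.Random(0))
dark = invert_colors_pixel((0.0, 0.0, 0.0), min_val=0.6, rng=random.Random(0))
```

With this reading, a light background becomes a dark, randomly tinted one, while black text turns white.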

class doctr.transforms.RandomBrightness(max_delta: float = 0.3)

Randomly adjust brightness of a tensor (batch of images or image) by adding a delta to all pixels

>>> import tensorflow as tf
>>> from doctr.transforms import RandomBrightness
>>> transfo = RandomBrightness()
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))

Args:

max_delta: the offset added to each pixel is randomly picked in [-max_delta, max_delta]
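In the sketch below, the image is a flat list of float pixels, and one random offset is shared by all of them. `random_brightness` is a hypothetical helper. Its clipping to [0, 1] is an assumption and not documented docTR behavior.

```python
import random
from typing import List, Optional


def random_brightness(
    pixels: List[float], max_delta: float = 0.3, rng: Optional[random.Random] = None
) -> List[float]:
    if rng is None:
        rng = random.Random()
    # A single offset in [-max_delta, max_delta] is shared by all pixels
    delta = rng.uniform(-max_delta, max_delta)
    # Clipping to [0, 1] is an assumption of this sketch
    return [min(max(p + delta, 0.0), 1.0) for p in pixels]


out = random_brightness([0.4, 0.5, 0.6], rng=random.Random(0))
```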

class doctr.transforms.RandomContrast(delta: float = 0.3)

Randomly adjust contrast of a tensor (batch of images or image) by adjusting each pixel: (img - mean) * contrast_factor + mean.

>>> import tensorflow as tf
>>> from doctr.transforms import RandomContrast
>>> transfo = RandomContrast()
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))

Args:

delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce contrast if factor<1)
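Here is the formula `(img - mean) * contrast_factor + mean` written out for the pixel values of a single channel. `random_contrast` is a hypothetical helper that samples the factor from [1-delta, 1+delta].

```python
import random
from typing import List, Optional


def random_contrast(
    channel: List[float], delta: float = 0.3, rng: Optional[random.Random] = None
) -> List[float]:
    if rng is None:
        rng = random.Random()
    factor = rng.uniform(1 - delta, 1 + delta)
    mean = sum(channel) / len(channel)
    # factor > 1 pushes pixels away from the mean, factor < 1 pulls them towards it
    return [(p - mean) * factor + mean for p in channel]


out = random_contrast([0.2, 0.5, 0.8], rng=random.Random(1))
```

The mean is left unchanged; only the spread around it is scaled.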

class doctr.transforms.RandomSaturation(delta: float = 0.5)

Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and scaling the saturation by a random factor.

>>> import tensorflow as tf
>>> from doctr.transforms import RandomSaturation
>>> transfo = RandomSaturation()
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))

Args:

delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce saturation if factor<1)
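Using Python's standard `colorsys` module, the HSV round trip looks like this for a single RGB pixel in [0, 1]. `random_saturation` is hypothetical. Clamping the scaled saturation to [0, 1] is an assumption of this sketch.

```python
import colorsys
import random
from typing import Optional, Tuple


def random_saturation(
    rgb: Tuple[float, float, float], delta: float = 0.5, rng: Optional[random.Random] = None
) -> Tuple[float, float, float]:
    if rng is None:
        rng = random.Random()
    factor = rng.uniform(1 - delta, 1 + delta)
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    # Scale the saturation, keeping it inside the valid [0, 1] range
    s = min(max(s * factor, 0.0), 1.0)
    return colorsys.hsv_to_rgb(h, s, v)


out = random_saturation((0.8, 0.4, 0.2), rng=random.Random(0))
```

Gray pixels have zero saturation, so they are left untouched. The brightest channel (the HSV value) is preserved.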

class doctr.transforms.RandomHue(max_delta: float = 0.3)

Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta

>>> import tensorflow as tf
>>> from doctr.transforms import RandomHue
>>> transfo = RandomHue()
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))

Args:

max_delta: the offset added to the hue channel is randomly picked in [-max_delta, max_delta]
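Hue is circular: it is expressed here as a fraction of the color wheel in [0, 1), so a shifted hue wraps around. The sketch below uses the standard `colorsys` module. `shift_hue` and `random_hue` are hypothetical helpers.

```python
import colorsys
import random
from typing import Optional, Tuple


def shift_hue(rgb: Tuple[float, float, float], delta: float) -> Tuple[float, float, float]:
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    # Wrap around the color wheel after adding the offset
    return colorsys.hsv_to_rgb((h + delta) % 1.0, s, v)


def random_hue(
    rgb: Tuple[float, float, float], max_delta: float = 0.3, rng: Optional[random.Random] = None
) -> Tuple[float, float, float]:
    if rng is None:
        rng = random.Random()
    return shift_hue(rgb, rng.uniform(-max_delta, max_delta))


green = shift_hue((1.0, 0.0, 0.0), 1 / 3)  # a third of a turn moves red to green
```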

class doctr.transforms.RandomGamma(min_gamma: float = 0.5, max_gamma: float = 1.5, min_gain: float = 0.8, max_gain: float = 1.2)

Randomly perform gamma correction on a tensor (batch of images or image)

>>> import tensorflow as tf
>>> from doctr.transforms import RandomGamma
>>> transfo = RandomGamma()
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))

Args:

min_gamma: non-negative real number, lower bound for gamma
max_gamma: non-negative real number, upper bound for gamma
min_gain: lower bound for constant multiplier
max_gain: upper bound for constant multiplier
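This sketch assumes the usual gamma-correction formula `out = gain * x ** gamma`, with gamma and gain drawn from the ranges above. `random_gamma` is a hypothetical helper.

```python
import random
from typing import List, Optional


def random_gamma(
    pixels: List[float],
    min_gamma: float = 0.5,
    max_gamma: float = 1.5,
    min_gain: float = 0.8,
    max_gain: float = 1.2,
    rng: Optional[random.Random] = None,
) -> List[float]:
    if rng is None:
        rng = random.Random()
    gamma = rng.uniform(min_gamma, max_gamma)
    gain = rng.uniform(min_gain, max_gain)
    # gamma > 1 darkens mid-tones, gamma < 1 brightens them; gain rescales the result
    return [gain * p ** gamma for p in pixels]


# Degenerate ranges make the result deterministic: gamma = 2, gain = 1
out = random_gamma([0.0, 0.5, 1.0], min_gamma=2.0, max_gamma=2.0, min_gain=1.0, max_gain=1.0)
```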

class doctr.transforms.RandomJpegQuality(min_quality: int = 60, max_quality: int = 100)

Randomly adjust the JPEG quality of a 3-dimensional RGB image

>>> import tensorflow as tf
>>> from doctr.transforms import RandomJpegQuality
>>> transfo = RandomJpegQuality()
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))

Args:

min_quality: int in [0, 100]
max_quality: int in [0, 100]

class doctr.transforms.RandomRotate(max_angle: float = 5.0, expand: bool = False)

Randomly rotate a tensor image and its boxes

Rotation illustration: https://doctr-static.mindee.com/models?id=v0.4.0/rotation_illustration.png&src=0

Args:

max_angle: maximum angle for rotation, in degrees. Angles will be uniformly picked in [-max_angle, max_angle]
expand: whether the image should be padded before the rotation
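Rotating "an image and its boxes" means applying the same rotation to the box corners, around the image center. The sketch below does this and also computes the canvas size an expanded (padded) rotation would need. Both `rotate_points` and `expanded_size` are hypothetical helpers. The sign convention (positive angles turn counter-clockwise on screen) is an assumption, and docTR's own conversion of rotated corners back into boxes is not shown.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def rotate_points(points: List[Point], angle_deg: float, center: Point) -> List[Point]:
    """Rotate pixel coordinates around center; positive angles turn counter-clockwise on screen."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = center
    rotated = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        # The image y axis points downwards, hence the signs on the sin terms
        rotated.append((cx + dx * cos_t + dy * sin_t, cy - dx * sin_t + dy * cos_t))
    return rotated


def expanded_size(height: int, width: int, angle_deg: float) -> Tuple[int, int]:
    """Canvas (height, width) needed to hold the whole rotated image (the expand case)."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = abs(math.cos(theta)), abs(math.sin(theta))
    return round(height * cos_t + width * sin_t), round(height * sin_t + width * cos_t)


# Angle drawn uniformly in [-max_angle, max_angle], as described in the Args above
max_angle = 5.0
angle = random.uniform(-max_angle, max_angle)
corners = [(10.0, 10.0), (30.0, 10.0), (30.0, 20.0), (10.0, 20.0)]
rotated = rotate_points(corners, angle, center=(32.0, 32.0))
```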

class doctr.transforms.RandomCrop(scale: Tuple[float, float] = (0.08, 1.0), ratio: Tuple[float, float] = (0.75, 1.33))

Randomly crop a tensor image and its boxes

Args:

scale: tuple of floats, relative (min_area, max_area) of the crop
ratio: tuple of floats, relative (min_ratio, max_ratio) where ratio = h/w
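One way to turn `scale` and `ratio` into a crop is to draw a relative area and an h/w ratio, then solve `h * w = area` and `h / w = ratio`. The sketch below does this in relative coordinates. `sample_crop` is hypothetical. Clamping to the image and the uniform placement are assumptions, and how docTR filters boxes that fall outside the crop is not covered.

```python
import math
import random
from typing import Optional, Tuple


def sample_crop(
    scale: Tuple[float, float] = (0.08, 1.0),
    ratio: Tuple[float, float] = (0.75, 1.33),
    rng: Optional[random.Random] = None,
) -> Tuple[float, float, float, float]:
    """Return a relative crop (xmin, ymin, xmax, ymax) inside [0, 1]."""
    if rng is None:
        rng = random.Random()
    area = rng.uniform(*scale)  # fraction of the image area to keep
    aspect = rng.uniform(*ratio)  # h / w of the crop, in relative units
    # Solve h * w = area and h / w = aspect, then clamp to the image
    h = min(math.sqrt(area * aspect), 1.0)
    w = min(math.sqrt(area / aspect), 1.0)
    # Place the crop uniformly so that it stays inside the image
    ymin = rng.uniform(0.0, 1.0 - h)
    xmin = rng.uniform(0.0, 1.0 - w)
    return xmin, ymin, xmin + w, ymin + h


crop = sample_crop(rng=random.Random(0))
```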

class doctr.transforms.GaussianBlur(kernel_shape: int | Iterable[int], std: Tuple[float, float])

Randomly blur a 3-dimensional RGB image with a Gaussian kernel whose standard deviation is picked at random between the min and max values given by std

>>> import tensorflow as tf
>>> from doctr.transforms import GaussianBlur
>>> transfo = GaussianBlur(3, (.1, 5))
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))

Args:

kernel_shape: size of the blurring kernel
std: min and max value of the standard deviation
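A 2-D Gaussian blur is separable: it can be applied as a 1-D pass along rows followed by one along columns. The pure-Python sketch below builds the normalized 1-D kernel and applies it to a single row. `gaussian_kernel_1d` and `blur_row` are hypothetical. Repeating edge pixels at the borders is an assumption, not a documented docTR choice.

```python
import math
import random
from typing import List


def gaussian_kernel_1d(size: int, sigma: float) -> List[float]:
    """Normalized 1-D Gaussian weights centered on the middle tap."""
    half = (size - 1) / 2
    weights = [math.exp(-((i - half) ** 2) / (2 * sigma**2)) for i in range(size)]
    total = sum(weights)
    # Normalizing to a sum of 1 keeps the overall brightness unchanged
    return [w / total for w in weights]


def blur_row(row: List[float], kernel: List[float]) -> List[float]:
    """Convolve one row of pixels with the kernel, repeating edge pixels at the borders."""
    half = len(kernel) // 2
    out = []
    for x in range(len(row)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = min(max(x + k - half, 0), len(row) - 1)
            acc += weight * row[idx]
        out.append(acc)
    return out


# Standard deviation drawn from the std range, as in GaussianBlur(3, (.1, 5))
sigma = random.uniform(0.1, 5.0)
blurred = blur_row([0.0, 0.0, 1.0, 0.0, 0.0], gaussian_kernel_1d(3, sigma))
```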

class doctr.transforms.ChannelShuffle

Randomly shuffle the channel order of a given image
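No example is given for this transformation, so here is the idea in pure Python. One random permutation of the channel indices is drawn and applied to every pixel, so colors change consistently across the image. `channel_shuffle` is a hypothetical helper operating on a list of pixel tuples.

```python
import random
from typing import List, Optional, Tuple


def channel_shuffle(
    image: List[Tuple[int, ...]], rng: Optional[random.Random] = None
) -> List[Tuple[int, ...]]:
    if rng is None:
        rng = random.Random()
    num_channels = len(image[0])
    # Draw a random permutation of the channel indices
    order = rng.sample(range(num_channels), k=num_channels)
    # The same permutation is applied to every pixel of the image
    return [tuple(pixel[i] for i in order) for pixel in image]


out = channel_shuffle([(1, 2, 3), (4, 5, 6)], rng=random.Random(0))
```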

class doctr.transforms.GaussianNoise(mean: float = 0.0, std: float = 1.0)

Add Gaussian noise to the input tensor

>>> import tensorflow as tf
>>> from doctr.transforms import GaussianNoise
>>> transfo = GaussianNoise(0., 1.)
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))

Args:

mean: mean of the Gaussian distribution
std: standard deviation of the Gaussian distribution
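The sketch below treats the image as a flat list of float pixels and adds an independent sample from a normal distribution with the given mean and std to each one. `gaussian_noise` is hypothetical. Clipping to [0, 1] is an assumption, and docTR's exact noise sampling may differ from this sketch.

```python
import random
from typing import List, Optional


def gaussian_noise(
    pixels: List[float], mean: float = 0.0, std: float = 1.0, rng: Optional[random.Random] = None
) -> List[float]:
    if rng is None:
        rng = random.Random()
    # Independent noise sample for every pixel; clipping to [0, 1] is assumed
    return [min(max(p + rng.gauss(mean, std), 0.0), 1.0) for p in pixels]


noisy = gaussian_noise([0.2, 0.5, 0.7], mean=0.0, std=0.1, rng=random.Random(0))
```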

class doctr.transforms.RandomHorizontalFlip(p: float)

Randomly flip the input tensor/np.ndarray and its target horizontally

>>> import numpy as np
>>> import tensorflow as tf
>>> from doctr.transforms import RandomHorizontalFlip
>>> transfo = RandomHorizontalFlip(p=0.5)
>>> image = tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1)
>>> target = {
...     "boxes": np.array([[0.1, 0.1, 0.4, 0.5]], dtype=np.float32),
...     "labels": np.ones(1, dtype=np.int64)
... }
>>> out = transfo(image, target)

Args:

p: probability of applying the horizontal flip
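With relative (xmin, ymin, xmax, ymax) boxes like those in the example above, a horizontal flip maps x to 1 - x. This also swaps the roles of xmin and xmax. The pure-Python sketch below flips the pixel rows and the boxes together with probability p. `hflip_boxes` and `random_hflip` are hypothetical helpers, and relative box coordinates are an assumption.

```python
import random
from typing import List, Optional, Sequence, Tuple


def hflip_boxes(boxes: Sequence[Sequence[float]]) -> List[List[float]]:
    """Mirror relative (xmin, ymin, xmax, ymax) boxes across the vertical center line."""
    # x -> 1 - x swaps the roles of xmin and xmax; y coordinates are unchanged
    return [[1.0 - xmax, ymin, 1.0 - xmin, ymax] for xmin, ymin, xmax, ymax in boxes]


def random_hflip(
    rows: List[List[float]],
    boxes: Sequence[Sequence[float]],
    p: float,
    rng: Optional[random.Random] = None,
) -> Tuple[List[List[float]], List[List[float]]]:
    if rng is None:
        rng = random.Random()
    if rng.random() < p:
        # Reverse every pixel row and mirror the boxes consistently
        return [row[::-1] for row in rows], hflip_boxes(boxes)
    return [list(row) for row in rows], [list(b) for b in boxes]


flipped = hflip_boxes([[0.1, 0.1, 0.4, 0.5]])
```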

class doctr.transforms.RandomShadow(opacity_range: Tuple[float, float] | None = None)

Add a random shadow to the input image

>>> import tensorflow as tf
>>> from doctr.transforms import RandomShadow
>>> transfo = RandomShadow((0., 1.))
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))

Args:

opacity_range: minimum and maximum opacity of the shadow

Composing transformations

It is common to require several transformations to be performed consecutively.

class doctr.transforms.Compose(transforms: List[Callable[[Any], Any]])

Wrapper that applies a list of transformations sequentially

>>> import tensorflow as tf
>>> from doctr.transforms import Compose, Resize
>>> transfos = Compose([Resize((32, 32))])
>>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))

Args:

transforms: list of transformation modules
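Sequential application boils down to feeding each transformation's output into the next one, so order matters. `MiniCompose` is a hypothetical re-implementation of that idea for single-argument callables. It is not docTR's class.

```python
from typing import Any, Callable, List


class MiniCompose:
    """Hypothetical sketch: apply callables one after the other."""

    def __init__(self, transforms: List[Callable[[Any], Any]]) -> None:
        self.transforms = transforms

    def __call__(self, x: Any) -> Any:
        for t in self.transforms:
            # The output of one transformation is the input of the next
            x = t(x)
        return x


pipeline = MiniCompose([lambda x: x * 2, lambda x: x + 1])
print(pipeline(3))  # 7
```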

class doctr.transforms.OneOf(transforms: List[Callable[[Any], Any]])

Randomly apply one of the input transformations

>>> import tensorflow as tf
>>> from doctr.transforms import OneOf, RandomGamma, RandomJpegQuality
>>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))

Args:

transforms: list of transformations, only one of which will be picked
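Each call picks a single transformation and applies only that one. `MiniOneOf` is a hypothetical sketch that assumes a uniform choice among the candidates.

```python
import random
from typing import Any, Callable, List, Optional


class MiniOneOf:
    """Hypothetical sketch: apply one randomly chosen transformation per call."""

    def __init__(self, transforms: List[Callable[[Any], Any]], rng: Optional[random.Random] = None) -> None:
        self.transforms = transforms
        self.rng = rng if rng is not None else random.Random()

    def __call__(self, x: Any) -> Any:
        # Uniform choice among the candidates (an assumption of this sketch)
        return self.rng.choice(self.transforms)(x)


transfo = MiniOneOf([lambda x: x + 1, lambda x: x * 10], rng=random.Random(0))
results = {transfo(2) for _ in range(50)}
```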

class doctr.transforms.RandomApply(transform: Callable[[Any], Any], p: float = 0.5)

Apply the input transformation with probability p

>>> import tensorflow as tf
>>> from doctr.transforms import RandomApply, RandomGamma
>>> transfo = RandomApply(RandomGamma(), p=.5)
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))

Args:

transform: transformation to apply
p: probability to apply the transformation
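Applying a transformation "with probability p" amounts to comparing a uniform draw in [0, 1) with p. `MiniRandomApply` is a hypothetical sketch of that logic. With p=1 it always applies the transformation, and with p=0 it never does.

```python
import random
from typing import Any, Callable, Optional


class MiniRandomApply:
    """Hypothetical sketch: apply a transformation with probability p."""

    def __init__(
        self, transform: Callable[[Any], Any], p: float = 0.5, rng: Optional[random.Random] = None
    ) -> None:
        self.transform = transform
        self.p = p
        self.rng = rng if rng is not None else random.Random()

    def __call__(self, x: Any) -> Any:
        # random() is uniform in [0, 1), so this branch is taken with probability p
        if self.rng.random() < self.p:
            return self.transform(x)
        # Otherwise the input passes through unchanged
        return x


always = MiniRandomApply(lambda x: x + 1, p=1.0)
never = MiniRandomApply(lambda x: x + 1, p=0.0)
```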