doctr.transforms

Data transformations are part of both the training and inference procedures. Drawing inspiration from the design of torchvision, we express transformations as composable modules.

Supported transformations

Here are all transformations that are available through docTR:

class doctr.transforms.Resize(output_size: Union[int, Tuple[int, int]], method: str = 'bilinear', preserve_aspect_ratio: bool = False, symmetric_pad: bool = False)

Resizes a tensor to a target size

>>> import tensorflow as tf
>>> from doctr.transforms import Resize
>>> transfo = Resize((32, 32))
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
Parameters
  • output_size – expected output size

  • method – interpolation method

  • preserve_aspect_ratio – if True, preserve aspect ratio and pad the rest with zeros

  • symmetric_pad – if True while preserving aspect ratio, the padding will be done symmetrically
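
The interaction between preserve_aspect_ratio and symmetric_pad can be illustrated with a small size computation. This is a minimal sketch, not docTR's implementation: the helper name `fit_with_padding` is hypothetical, and placing the image in the top-left corner when symmetric_pad is False is an assumption.

```python
from typing import Tuple


def fit_with_padding(
    in_hw: Tuple[int, int],
    out_hw: Tuple[int, int],
    symmetric_pad: bool = False,
) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return ((new_h, new_w), (pad_top, pad_left)) for an aspect-preserving resize.

    Hypothetical helper for illustration only.
    """
    in_h, in_w = in_hw
    out_h, out_w = out_hw
    # Largest scale at which the image still fits inside the target canvas
    scale = min(out_h / in_h, out_w / in_w)
    new_h = min(out_h, max(1, round(in_h * scale)))
    new_w = min(out_w, max(1, round(in_w * scale)))
    if symmetric_pad:
        # Split the leftover space evenly between both sides
        pad_top, pad_left = (out_h - new_h) // 2, (out_w - new_w) // 2
    else:
        # Assumption: the image stays in the top-left corner, zeros fill the rest
        pad_top, pad_left = 0, 0
    return (new_h, new_w), (pad_top, pad_left)


print(fit_with_padding((64, 128), (32, 32)))                      # ((16, 32), (0, 0))
print(fit_with_padding((64, 128), (32, 32), symmetric_pad=True))  # ((16, 32), (8, 0))
```

A 64x128 image resized into a 32x32 canvas keeps its 1:2 ratio as 16x32, and the remaining 16 rows are zero padding.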

class doctr.transforms.Normalize(mean: Tuple[float, float, float], std: Tuple[float, float, float])

Normalize a tensor channel-wise by subtracting the mean and dividing by the standard deviation of each channel

>>> import tensorflow as tf
>>> from doctr.transforms import Normalize
>>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
Parameters
  • mean – average value per channel

  • std – standard deviation per channel
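
The arithmetic behind a per-channel normalization can be sketched with NumPy. This is an illustrative sketch on a channels-last array, assuming the usual (x - mean) / std form; the function name `normalize` is hypothetical and this is not docTR's code.

```python
import numpy as np


def normalize(img: np.ndarray, mean, std) -> np.ndarray:
    """Channel-wise (x - mean) / std on a channels-last array (hypothetical helper)."""
    mean = np.asarray(mean, dtype=img.dtype)
    std = np.asarray(std, dtype=img.dtype)
    # Broadcasting aligns the length-3 vectors with the trailing channel axis
    return (img - mean) / std


batch = np.random.default_rng(0).uniform(size=(8, 64, 64, 3)).astype(np.float32)
out = normalize(batch, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
print(out.shape)
```

Because mean and std have one entry per channel, the same call works for a single image of shape (H, W, 3) and for a batch of shape (N, H, W, 3).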

class doctr.transforms.LambdaTransformation(fn: Callable[[Tensor], Tensor])

Apply a user-defined function to the input tensor

>>> import tensorflow as tf
>>> from doctr.transforms import LambdaTransformation
>>> transfo = LambdaTransformation(lambda x: x/ 255.)
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
Parameters

fn – the function to be applied to the input tensor

class doctr.transforms.ToGray(num_output_channels: int = 1)

Convert an RGB tensor (batch of images or image) to a grayscale tensor with num_output_channels channels (1 by default)

>>> import tensorflow as tf
>>> from doctr.transforms import ToGray
>>> transfo = ToGray()
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
class doctr.transforms.ColorInversion(min_val: float = 0.5)

Applies the following transformation to a tensor (image or batch of images): convert to grayscale, colorize (shift 0-values randomly), and then invert colors

>>> import tensorflow as tf
>>> from doctr.transforms import ColorInversion
>>> transfo = ColorInversion(min_val=0.6)
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
Parameters

min_val – range [min_val, 1] to colorize RGB pixels

class doctr.transforms.RandomBrightness(max_delta: float = 0.3)

Randomly adjust brightness of a tensor (batch of images or image) by adding a delta to all pixels

>>> import tensorflow as tf
>>> from doctr.transforms import RandomBrightness
>>> transfo = RandomBrightness()
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
Parameters
  • max_delta – offset to add to each pixel is randomly picked in [-max_delta, max_delta]

class doctr.transforms.RandomContrast(delta: float = 0.3)

Randomly adjust contrast of a tensor (batch of images or image) by adjusting each pixel: (img - mean) * contrast_factor + mean.

>>> import tensorflow as tf
>>> from doctr.transforms import RandomContrast
>>> transfo = RandomContrast()
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
Parameters

delta – multiplicative factor is picked in [1-delta, 1+delta] (reduce contrast if factor<1)
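
The formula (img - mean) * contrast_factor + mean can be written out with NumPy. This is a sketch under stated assumptions: the mean is taken per image and per channel over height and width, the factor is drawn uniformly, and `adjust_contrast` is a hypothetical name rather than a docTR function.

```python
import numpy as np


def adjust_contrast(img: np.ndarray, factor: float) -> np.ndarray:
    """Apply (img - mean) * factor + mean on a channels-last image or batch."""
    # Assumption: the mean is computed per image and per channel, over H and W
    mean = img.mean(axis=(-3, -2), keepdims=True)
    return (img - mean) * factor + mean


rng = np.random.default_rng(0)
delta = 0.3
# Assumption: the factor is drawn uniformly in [1 - delta, 1 + delta]
factor = rng.uniform(1 - delta, 1 + delta)
img = rng.uniform(size=(64, 64, 3))
out = adjust_contrast(img, factor)
print(factor, out.shape)
```

The per-channel mean is unchanged by the operation; only the spread of pixel values around it is scaled, shrinking when the factor is below 1 and growing when it is above 1.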

class doctr.transforms.RandomSaturation(delta: float = 0.5)

Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and scaling the saturation by a factor.

>>> import tensorflow as tf
>>> from doctr.transforms import RandomSaturation
>>> transfo = RandomSaturation()
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
Parameters

delta – multiplicative factor is picked in [1-delta, 1+delta] (reduce saturation if factor<1)

class doctr.transforms.RandomHue(max_delta: float = 0.3)

Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta

>>> import tensorflow as tf
>>> from doctr.transforms import RandomHue
>>> transfo = RandomHue()
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
Parameters

max_delta – offset to add to each pixel is randomly picked in [-max_delta, max_delta]

class doctr.transforms.RandomGamma(min_gamma: float = 0.5, max_gamma: float = 1.5, min_gain: float = 0.8, max_gain: float = 1.2)

Randomly performs gamma correction on a tensor (batch of images or image)

>>> import tensorflow as tf
>>> from doctr.transforms import RandomGamma
>>> transfo = RandomGamma()
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
Parameters
  • min_gamma – non-negative real number, lower bound for gamma param

  • max_gamma – non-negative real number, upper bound for gamma

  • min_gain – lower bound for constant multiplier

  • max_gain – upper bound for constant multiplier
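
How the four bounds combine can be illustrated with the standard gamma-correction formula out = gain * img ** gamma. This is a minimal NumPy sketch, assuming both values are drawn uniformly within their bounds; `random_gamma` is a hypothetical name, not a docTR function.

```python
import numpy as np


def random_gamma(
    img: np.ndarray,
    rng: np.random.Generator,
    min_gamma: float = 0.5,
    max_gamma: float = 1.5,
    min_gain: float = 0.8,
    max_gain: float = 1.2,
) -> np.ndarray:
    """Gamma correction out = gain * img ** gamma with randomly drawn parameters."""
    # Assumption: gamma and gain are drawn uniformly and independently
    gamma = rng.uniform(min_gamma, max_gamma)
    gain = rng.uniform(min_gain, max_gain)
    return gain * np.power(img, gamma)


rng = np.random.default_rng(0)
img = rng.uniform(size=(64, 64, 3))
out = random_gamma(img, rng)
print(out.min(), out.max())
```

For inputs in [0, 1], a gamma below 1 brightens mid-tones and a gamma above 1 darkens them, while the gain rescales the whole image.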

class doctr.transforms.RandomJpegQuality(min_quality: int = 60, max_quality: int = 100)

Randomly adjust the JPEG quality of a 3-dimensional RGB image

>>> import tensorflow as tf
>>> from doctr.transforms import RandomJpegQuality
>>> transfo = RandomJpegQuality()
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
Parameters
  • min_quality – lower bound of the JPEG quality, an int in [0, 100]

  • max_quality – upper bound of the JPEG quality, an int in [0, 100]

class doctr.transforms.RandomRotate(max_angle: float = 5.0, expand: bool = False)

Randomly rotate a tensor image and its boxes

https://github.com/mindee/doctr/releases/download/v0.4.0/rotation_illustration.png
Parameters
  • max_angle – maximum angle for rotation, in degrees. Angles will be uniformly picked in [-max_angle, max_angle]

  • expand – whether the image should be padded before the rotation

class doctr.transforms.RandomCrop(scale: Tuple[float, float] = (0.08, 1.0), ratio: Tuple[float, float] = (0.75, 1.33))

Randomly crop a tensor image and its boxes

Parameters
  • scale – tuple of floats, relative (min_area, max_area) of the crop

  • ratio – tuple of floats, relative (min_ratio, max_ratio) where ratio = h/w
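
One way to turn scale and ratio into a crop window is sketched below in relative coordinates. This is a sketch under assumptions, not docTR's sampling procedure: the area and ratio are drawn uniformly and independently, the crop is clamped to the image, and `sample_crop` is a hypothetical helper.

```python
import math
import random
from typing import Optional, Tuple


def sample_crop(
    scale: Tuple[float, float] = (0.08, 1.0),
    ratio: Tuple[float, float] = (0.75, 1.33),
    rng: Optional[random.Random] = None,
) -> Tuple[float, float, float, float]:
    """Return a relative crop box (xmin, ymin, xmax, ymax). Hypothetical helper."""
    rng = rng or random.Random()
    area = rng.uniform(*scale)  # relative area of the crop
    r = rng.uniform(*ratio)     # h / w of the crop
    # area = h * w and r = h / w  =>  h = sqrt(area * r), w = sqrt(area / r)
    h = min(1.0, math.sqrt(area * r))
    w = min(1.0, math.sqrt(area / r))
    # Pick the top-left corner so that the crop stays inside the image
    ymin = rng.uniform(0.0, 1.0 - h)
    xmin = rng.uniform(0.0, 1.0 - w)
    return xmin, ymin, xmin + w, ymin + h


print(sample_crop(rng=random.Random(0)))
```

Boxes attached to the image would then need to be clipped to this window and re-expressed relative to it.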

class doctr.transforms.GaussianBlur(kernel_shape: Union[int, Iterable[int]], std: Tuple[float, float])

Apply a Gaussian blur to a 3-dimensional RGB image, with a standard deviation randomly picked between the bounds given in std

>>> import tensorflow as tf
>>> from doctr.transforms import GaussianBlur
>>> transfo = GaussianBlur(3, (.1, 5))
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
Parameters
  • kernel_shape – size of the blurring kernel

  • std – min and max value of the standard deviation

class doctr.transforms.ChannelShuffle

Randomly shuffle channel order of a given image
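
The effect of a channel shuffle can be illustrated with NumPy on a channels-last image. This is a minimal sketch of the idea, not docTR's implementation; `channel_shuffle` is a hypothetical name.

```python
import numpy as np


def channel_shuffle(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Reorder the last (channel) axis with a random permutation."""
    perm = rng.permutation(img.shape[-1])
    # Fancy indexing on the channel axis keeps every pixel's values, reordered
    return img[..., perm]


rng = np.random.default_rng(0)
img = rng.uniform(size=(64, 64, 3))
out = channel_shuffle(img, rng)
print(out.shape)
```

The spatial content is untouched; only the order of the channels changes.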

class doctr.transforms.GaussianNoise(mean: float = 0.0, std: float = 1.0)

Adds Gaussian Noise to the input tensor

>>> import tensorflow as tf
>>> from doctr.transforms import GaussianNoise
>>> transfo = GaussianNoise(0., 1.)
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
Parameters
  • mean – mean of the gaussian distribution

  • std – std of the gaussian distribution

class doctr.transforms.RandomHorizontalFlip(p: float)

Horizontally flips the input tensor/np.ndarray, and its target, with probability p

>>> import numpy as np
>>> import tensorflow as tf
>>> from doctr.transforms import RandomHorizontalFlip
>>> transfo = RandomHorizontalFlip(p=0.5)
>>> image = tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1)
>>> target = {
...     "boxes": np.array([[0.1, 0.1, 0.4, 0.5]], dtype=np.float32),
...     "labels": np.ones(1, dtype=np.int64),
... }
>>> out = transfo(image, target)
Parameters

p – probability of applying the horizontal flip
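
What a horizontal flip does to the image and its boxes can be sketched with NumPy. This sketch assumes boxes are stored in relative (xmin, ymin, xmax, ymax) coordinates, as the values in the example suggest; `horizontal_flip` is a hypothetical helper, not docTR's code.

```python
import numpy as np


def horizontal_flip(img: np.ndarray, boxes: np.ndarray):
    """Flip an (H, W, C) image and its relative (xmin, ymin, xmax, ymax) boxes."""
    flipped_img = img[:, ::-1]  # reverse the width axis
    flipped_boxes = boxes.copy()
    # x' = 1 - x; xmin and xmax swap roles so that xmin' <= xmax' still holds
    flipped_boxes[:, 0] = 1 - boxes[:, 2]
    flipped_boxes[:, 2] = 1 - boxes[:, 0]
    return flipped_img, flipped_boxes


img = np.random.default_rng(0).uniform(size=(64, 64, 3))
boxes = np.array([[0.1, 0.1, 0.4, 0.5]], dtype=np.float32)
flipped_img, flipped_boxes = horizontal_flip(img, boxes)
print(flipped_boxes)
```

Under this convention the box [0.1, 0.1, 0.4, 0.5] maps to approximately [0.6, 0.1, 0.9, 0.5]: the vertical coordinates are unchanged and the horizontal ones are mirrored.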

class doctr.transforms.RandomShadow(opacity_range: Optional[Tuple[float, float]] = None)

Adds random shade to the input image

>>> import tensorflow as tf
>>> from doctr.transforms import RandomShadow
>>> transfo = RandomShadow((0., 1.))
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
Parameters

opacity_range – minimum and maximum opacity of the shade

Composing transformations

It is common to require several transformations to be performed consecutively.

class doctr.transforms.Compose(transforms: List[Callable[[Any], Any]])

Implements a wrapper that will apply transformations sequentially

>>> import tensorflow as tf
>>> from doctr.transforms import Compose, Resize
>>> transfos = Compose([Resize((32, 32))])
>>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
Parameters

transforms – list of transformation modules
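
The sequential behaviour can be sketched in a few lines of plain Python. The class name `ComposeSketch` is hypothetical and only mirrors the idea of calling each transformation on the output of the previous one; it is not docTR's implementation.

```python
from typing import Any, Callable, List


class ComposeSketch:
    """Apply a list of callables one after the other (illustrative only)."""

    def __init__(self, transforms: List[Callable[[Any], Any]]) -> None:
        self.transforms = transforms

    def __call__(self, x: Any) -> Any:
        for transform in self.transforms:
            # The output of one transformation is the input of the next
            x = transform(x)
        return x


pipeline = ComposeSketch([lambda x: x + 1, lambda x: x * 2])
print(pipeline(3))  # (3 + 1) * 2 = 8
```

Order matters: swapping the two callables above would give 3 * 2 + 1 = 7 instead.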

class doctr.transforms.OneOf(transforms: List[Callable[[Any], Any]])

Randomly apply one of the input transformations

>>> import tensorflow as tf
>>> from doctr.transforms import OneOf, RandomGamma, RandomJpegQuality
>>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
Parameters

transforms – list of transformations, only one of which will be picked
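
The selection logic can be sketched in plain Python. The class `OneOfSketch` and its `seed` argument are hypothetical, and picking uniformly among the candidates is an assumption rather than a statement about docTR's implementation.

```python
import random
from typing import Any, Callable, List, Optional


class OneOfSketch:
    """Pick one callable at random on every call and apply it (illustrative only)."""

    def __init__(self, transforms: List[Callable[[Any], Any]], seed: Optional[int] = None) -> None:
        self.transforms = transforms
        self.rng = random.Random(seed)

    def __call__(self, x: Any) -> Any:
        # Assumption: every candidate has the same probability of being chosen
        transform = self.rng.choice(self.transforms)
        return transform(x)


one_of = OneOfSketch([lambda x: x + 1, lambda x: x * 2], seed=0)
print([one_of(3) for _ in range(5)])
```

Each call returns either 4 or 6 for the input 3, never a combination of both transformations.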

class doctr.transforms.RandomApply(transform: Callable[[Any], Any], p: float = 0.5)

Apply the input transformation with probability p

>>> import tensorflow as tf
>>> from doctr.transforms import RandomApply, RandomGamma
>>> transfo = RandomApply(RandomGamma(), p=.5)
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
Parameters
  • transform – transformation to apply

  • p – probability to apply
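
The probabilistic gate can be sketched in plain Python. The class `RandomApplySketch` and its `seed` argument are hypothetical, and returning the input unchanged when the transformation is skipped is an assumption.

```python
import random
from typing import Any, Callable, Optional


class RandomApplySketch:
    """Apply a callable with probability p, else return the input (illustrative only)."""

    def __init__(self, transform: Callable[[Any], Any], p: float = 0.5, seed: Optional[int] = None) -> None:
        self.transform = transform
        self.p = p
        self.rng = random.Random(seed)

    def __call__(self, x: Any) -> Any:
        # random() is uniform in [0, 1), so this branch is taken with probability p
        if self.rng.random() < self.p:
            return self.transform(x)
        return x


maybe_increment = RandomApplySketch(lambda x: x + 1, p=0.5, seed=0)
print([maybe_increment(3) for _ in range(5)])
```

With p=0 the input always passes through untouched, and with p=1 the transformation is always applied.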