doctr.transforms#
Data transformations are part of both training and inference procedures. Drawing inspiration from the design of torchvision, we express transformations as composable modules.
Supported transformations#
Here are all transformations that are available through docTR:
- class doctr.transforms.Resize(output_size: int | Tuple[int, int], method: str = 'bilinear', preserve_aspect_ratio: bool = False, symmetric_pad: bool = False)[source]#
Resizes a tensor to a target size
>>> import tensorflow as tf
>>> from doctr.transforms import Resize
>>> transfo = Resize((32, 32))
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
Args:#
output_size: expected output size
method: interpolation method
preserve_aspect_ratio: if True, preserve aspect ratio and pad the rest with zeros
symmetric_pad: if True while preserving aspect ratio, the padding will be done symmetrically
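To illustrate how preserve_aspect_ratio and symmetric_pad interact, here is a minimal sketch of the size arithmetic, independent of docTR. The helper `resized_shape_and_padding` is hypothetical. It assumes the image is scaled to fit inside the target and the remainder is zero-padded, either entirely at the bottom/right or split evenly on both sides. docTR's actual implementation may round or place the padding differently.

```python
def resized_shape_and_padding(in_hw, out_hw, symmetric_pad=False):
    """Return the aspect-preserving size and the (top, bottom, left, right) padding.

    Hypothetical helper for illustration only, not part of docTR.
    """
    in_h, in_w = in_hw
    out_h, out_w = out_hw
    # Largest scale at which the image still fits inside the target
    scale = min(out_h / in_h, out_w / in_w)
    new_h = min(out_h, max(1, round(in_h * scale)))
    new_w = min(out_w, max(1, round(in_w * scale)))
    # Whatever is left over has to be filled with zeros
    pad_h, pad_w = out_h - new_h, out_w - new_w
    if symmetric_pad:
        top, left = pad_h // 2, pad_w // 2
    else:
        top, left = 0, 0
    return (new_h, new_w), (top, pad_h - top, left, pad_w - left)


size, padding = resized_shape_and_padding((64, 128), (32, 32), symmetric_pad=True)
```

With a 64x128 input and a 32x32 target, the image is scaled by 0.25 along both axes, and the 16 missing rows are split above and below the image when symmetric_pad is enabled.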
- class doctr.transforms.Normalize(mean: Tuple[float, float, float], std: Tuple[float, float, float])[source]#
Normalize a tensor to a Gaussian distribution for each channel
>>> import tensorflow as tf
>>> from doctr.transforms import Normalize
>>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
Args:#
mean: average value per channel
std: standard deviation per channel
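The arithmetic behind this kind of normalization can be sketched in NumPy. The function `normalize` below is a hypothetical stand-in, not docTR code, and it assumes a channels-last layout so that the per-channel statistics broadcast over the last axis.

```python
import numpy as np


def normalize(img, mean, std):
    """Channel-wise (img - mean) / std for a channels-last array (illustrative only)."""
    mean = np.asarray(mean, dtype=img.dtype)
    std = np.asarray(std, dtype=img.dtype)
    # Broadcasting applies each channel's statistics to every pixel
    return (img - mean) / std


batch = np.random.default_rng(0).random((8, 64, 64, 3)).astype(np.float32)
out = normalize(batch, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
```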
- class doctr.transforms.LambdaTransformation(fn: Callable[[Tensor], Tensor])[source]#
Applies a user-defined function to the input tensor
>>> import tensorflow as tf
>>> from doctr.transforms import LambdaTransformation
>>> transfo = LambdaTransformation(lambda x: x / 255.)
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
Args:#
fn: the function to be applied to the input tensor
- class doctr.transforms.ToGray(num_output_channels: int = 1)[source]#
Convert an RGB tensor (batch of images or image) to a grayscale tensor with num_output_channels channels
>>> import tensorflow as tf
>>> from doctr.transforms import ToGray
>>> transfo = ToGray()
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- class doctr.transforms.ColorInversion(min_val: float = 0.5)[source]#
Applies the following transformation to a tensor (image or batch of images): convert to grayscale, colorize (shift 0-values randomly), and then invert colors
>>> import tensorflow as tf
>>> from doctr.transforms import ColorInversion
>>> transfo = ColorInversion(min_val=0.6)
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))

>>> import torch
>>> from doctr.transforms import ColorInversion
>>> transfo = ColorInversion(min_val=0.6)
>>> out = transfo(torch.rand(8, 64, 64, 3))
Args:#
min_val: range [min_val, 1] to colorize RGB pixels
- class doctr.transforms.RandomBrightness(max_delta: float = 0.3)[source]#
Randomly adjust brightness of a tensor (batch of images or image) by adding a delta to all pixels
>>> import tensorflow as tf
>>> from doctr.transforms import RandomBrightness
>>> transfo = RandomBrightness()
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
Args:#
max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- class doctr.transforms.RandomContrast(delta: float = 0.3)[source]#
Randomly adjust contrast of a tensor (batch of images or image) by adjusting each pixel: (img - mean) * contrast_factor + mean.
>>> import tensorflow as tf
>>> from doctr.transforms import RandomContrast
>>> transfo = RandomContrast()
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
Args:#
delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce contrast if factor<1)
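The contrast formula above, (img - mean) * contrast_factor + mean, can be checked with a short NumPy sketch. The functions `adjust_contrast` and `random_contrast` are hypothetical names used for illustration. The sketch assumes a channels-last image and a mean taken per channel over the spatial dimensions, which may not match docTR's backend exactly.

```python
import numpy as np


def adjust_contrast(img, factor):
    """Apply (img - mean) * factor + mean with a per-channel spatial mean."""
    # Assumes the last three axes are (height, width, channels)
    mean = img.mean(axis=(-3, -2), keepdims=True)
    return (img - mean) * factor + mean


def random_contrast(img, delta=0.3, rng=None):
    """Pick the factor uniformly in [1 - delta, 1 + delta], then adjust."""
    if rng is None:
        rng = np.random.default_rng()
    factor = rng.uniform(1 - delta, 1 + delta)
    return adjust_contrast(img, factor)
```

A factor below 1 pulls every pixel toward the channel mean, which reduces contrast, while the mean itself is left unchanged.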
- class doctr.transforms.RandomSaturation(delta: float = 0.5)[source]#
Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and scaling saturation by a factor.
>>> import tensorflow as tf
>>> from doctr.transforms import RandomSaturation
>>> transfo = RandomSaturation()
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
Args:#
delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce saturation if factor<1)
- class doctr.transforms.RandomHue(max_delta: float = 0.3)[source]#
Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
>>> import tensorflow as tf
>>> from doctr.transforms import RandomHue
>>> transfo = RandomHue()
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
Args:#
max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- class doctr.transforms.RandomGamma(min_gamma: float = 0.5, max_gamma: float = 1.5, min_gain: float = 0.8, max_gain: float = 1.2)[source]#
Randomly performs gamma correction for a tensor (batch of images or image)
>>> import tensorflow as tf
>>> from doctr.transforms import RandomGamma
>>> transfo = RandomGamma()
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
Args:#
min_gamma: non-negative real number, lower bound for gamma
max_gamma: non-negative real number, upper bound for gamma
min_gain: lower bound for constant multiplier
max_gain: upper bound for constant multiplier
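As a rough illustration of how the four bounds are used, the sketch below applies the usual gamma-correction formula out = gain * img ** gamma with both parameters drawn uniformly from their ranges. The function `random_gamma` is hypothetical, and the assumption that docTR follows this exact formula and sampling scheme is not confirmed by this page.

```python
import numpy as np


def random_gamma(img, min_gamma=0.5, max_gamma=1.5, min_gain=0.8, max_gain=1.2, rng=None):
    """Return gain * img ** gamma with randomly drawn gamma and gain (illustrative only)."""
    if rng is None:
        rng = np.random.default_rng()
    gamma = rng.uniform(min_gamma, max_gamma)
    gain = rng.uniform(min_gain, max_gain)
    # The sampled parameters are returned as well so the result can be inspected
    return gain * np.power(img, gamma), gamma, gain


img = np.random.default_rng(0).random((64, 64, 3))
out, gamma, gain = random_gamma(img, rng=np.random.default_rng(1))
```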
- class doctr.transforms.RandomJpegQuality(min_quality: int = 60, max_quality: int = 100)[source]#
Randomly adjust JPEG quality of a 3-dimensional RGB image
>>> import tensorflow as tf
>>> from doctr.transforms import RandomJpegQuality
>>> transfo = RandomJpegQuality()
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
Args:#
min_quality: int in [0, 100], lower bound for the JPEG quality
max_quality: int in [0, 100], upper bound for the JPEG quality
- class doctr.transforms.RandomRotate(max_angle: float = 5.0, expand: bool = False)[source]#
Randomly rotate a tensor image and its boxes
Args:#
max_angle: maximum angle for rotation, in degrees. Angles will be uniformly picked in [-max_angle, max_angle]
expand: whether the image should be padded before the rotation
- class doctr.transforms.RandomCrop(scale: Tuple[float, float] = (0.08, 1.0), ratio: Tuple[float, float] = (0.75, 1.33))[source]#
Randomly crop a tensor image and its boxes
Args:#
scale: tuple of floats, relative (min_area, max_area) of the crop
ratio: tuple of floats, relative (min_ratio, max_ratio) where ratio = h/w
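One way to turn scale and ratio into a crop box is sketched below with the standard library only. The helper `sample_crop` is hypothetical. It assumes that the relative area and the h/w ratio are each drawn uniformly, and that the crop is expressed in relative (xmin, ymin, xmax, ymax) coordinates. docTR's own sampling may differ.

```python
import math
import random


def sample_crop(scale=(0.08, 1.0), ratio=(0.75, 1.33), rng=random):
    """Sample a relative crop box (xmin, ymin, xmax, ymax); illustrative only."""
    area = rng.uniform(*scale)  # relative area of the crop
    r = rng.uniform(*ratio)     # r = h / w
    # Solve h * w = area and h / w = r, then clip so the crop fits in the image
    h = min(1.0, math.sqrt(area * r))
    w = min(1.0, math.sqrt(area / r))
    # Pick a top-left corner that keeps the crop inside the image
    y = rng.uniform(0.0, 1.0 - h)
    x = rng.uniform(0.0, 1.0 - w)
    return x, y, x + w, y + h


box = sample_crop(rng=random.Random(0))
```

Because of the clipping step, a crop that hits the image border no longer has exactly the sampled area and ratio.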
- class doctr.transforms.GaussianBlur(kernel_shape: int | Iterable[int], std: Tuple[float, float])[source]#
Applies Gaussian blur to a 3-dimensional RGB image, with a standard deviation randomly picked between the given bounds
>>> import tensorflow as tf
>>> from doctr.transforms import GaussianBlur
>>> transfo = GaussianBlur(3, (.1, 5))
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
Args:#
kernel_shape: size of the blurring kernel
std: min and max value of the standard deviation
- class doctr.transforms.GaussianNoise(mean: float = 0.0, std: float = 1.0)[source]#
Adds Gaussian noise to the input tensor
>>> import tensorflow as tf
>>> from doctr.transforms import GaussianNoise
>>> transfo = GaussianNoise(0., 1.)
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
Args:#
mean: mean of the Gaussian distribution
std: std of the Gaussian distribution
- class doctr.transforms.RandomHorizontalFlip(p: float)[source]#
Randomly flips the input tensor/np.ndarray horizontally
>>> import numpy as np
>>> import tensorflow as tf
>>> from doctr.transforms import RandomHorizontalFlip
>>> transfo = RandomHorizontalFlip(p=0.5)
>>> image = tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1)
>>> target = {
...     "boxes": np.array([[0.1, 0.1, 0.4, 0.5]], dtype=np.float32),
...     "labels": np.ones(1, dtype=np.int64),
... }
>>> out = transfo(image, target)
Args:#
p : probability of Horizontal Flip
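Flipping an image horizontally also requires mirroring its boxes. The NumPy sketch below shows the coordinate update, assuming boxes are stored as relative (xmin, ymin, xmax, ymax) values as in the example target above. The function `hflip` is a hypothetical helper, not the docTR implementation.

```python
import numpy as np


def hflip(image, boxes):
    """Flip an (H, W, C) image and its relative (xmin, ymin, xmax, ymax) boxes."""
    flipped_img = image[:, ::-1]
    flipped_boxes = boxes.copy()
    # The old right edge becomes the new left edge, and vice versa
    flipped_boxes[:, 0] = 1 - boxes[:, 2]
    flipped_boxes[:, 2] = 1 - boxes[:, 0]
    return flipped_img, flipped_boxes


image = np.random.default_rng(0).random((64, 64, 3)).astype(np.float32)
boxes = np.array([[0.1, 0.1, 0.4, 0.5]], dtype=np.float32)
flipped_image, flipped_boxes = hflip(image, boxes)
```

Vertical coordinates are untouched, and applying the flip twice returns the original image and boxes.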
- class doctr.transforms.RandomShadow(opacity_range: Tuple[float, float] | None = None)[source]#
Adds random shade to the input image
>>> import tensorflow as tf
>>> from doctr.transforms import RandomShadow
>>> transfo = RandomShadow((0., 1.))
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
Args:#
opacity_range: minimum and maximum opacity of the shade
Composing transformations#
It is common to require several transformations to be performed consecutively.
- class doctr.transforms.Compose(transforms: List[Callable[[Any], Any]])[source]#
Implements a wrapper that will apply transformations sequentially
>>> import tensorflow as tf
>>> from doctr.transforms import Compose, Resize
>>> transfos = Compose([Resize((32, 32))])
>>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
Args:#
transforms: list of transformation modules
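The sequential behaviour can be pictured with a few lines of plain Python. The class `SimpleCompose` is a hypothetical stand-in written for this illustration, not docTR's `Compose`. It simply feeds the output of each callable into the next one.

```python
from typing import Any, Callable, List


class SimpleCompose:
    """Apply a list of callables one after another (illustrative sketch)."""

    def __init__(self, transforms: List[Callable[[Any], Any]]) -> None:
        self.transforms = transforms

    def __call__(self, x: Any) -> Any:
        for transform in self.transforms:
            # The output of one step is the input of the next
            x = transform(x)
        return x


pipeline = SimpleCompose([lambda x: x + 1, lambda x: x * 2])
result = pipeline(3)  # (3 + 1) * 2 = 8
```

Order matters: swapping the two steps would compute 3 * 2 + 1 instead.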
- class doctr.transforms.OneOf(transforms: List[Callable[[Any], Any]])[source]#
Randomly apply one of the input transformations
>>> import tensorflow as tf
>>> from doctr.transforms import OneOf, RandomGamma, RandomJpegQuality
>>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))

>>> import torch
>>> from doctr.transforms import OneOf, RandomGamma, RandomJpegQuality
>>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
>>> out = transfo(torch.rand(1, 64, 64, 3))
Args:#
transforms: list of transformations, only one of which will be picked
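A minimal sketch of the selection logic, using only the standard library, could look as follows. `SimpleOneOf` is a hypothetical class for illustration, and the uniform choice among candidates is an assumption rather than something this page states.

```python
import random


class SimpleOneOf:
    """Pick one transformation at random on every call (illustrative sketch)."""

    def __init__(self, transforms, rng=None):
        self.transforms = transforms
        self.rng = rng if rng is not None else random.Random()

    def __call__(self, x):
        # Assumption: each candidate is equally likely to be chosen
        transform = self.rng.choice(self.transforms)
        return transform(x)


one_of = SimpleOneOf([lambda x: "a", lambda x: "b"], rng=random.Random(0))
picks = [one_of(None) for _ in range(200)]
```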
- class doctr.transforms.RandomApply(transform: Callable[[Any], Any], p: float = 0.5)[source]#
Apply the input transformation with probability p
>>> import tensorflow as tf
>>> from doctr.transforms import RandomApply, RandomGamma
>>> transfo = RandomApply(RandomGamma(), p=.5)
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))

>>> import torch
>>> from doctr.transforms import RandomApply, RandomGamma
>>> transfo = RandomApply(RandomGamma(), p=.5)
>>> out = transfo(torch.rand(1, 64, 64, 3))
Args:#
transform: transformation to apply
p: probability to apply
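The probabilistic wrapper can be sketched in plain Python as well. `SimpleRandomApply` is a hypothetical class written for this illustration. It draws a uniform number in [0, 1) and applies the wrapped callable only when the draw falls below p, returning the input unchanged otherwise.

```python
import random


class SimpleRandomApply:
    """Apply a transformation with probability p (illustrative sketch)."""

    def __init__(self, transform, p=0.5, rng=None):
        self.transform = transform
        self.p = p
        self.rng = rng if rng is not None else random.Random()

    def __call__(self, x):
        # random() lies in [0, 1), so p=1 always applies and p=0 never does
        if self.rng.random() < self.p:
            return self.transform(x)
        return x


always = SimpleRandomApply(lambda x: x * 2, p=1.0)
never = SimpleRandomApply(lambda x: x * 2, p=0.0)
```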