doctr.transforms
Data transformations are part of both the training and inference procedures. Drawing inspiration from the design of torchvision, we express transformations as composable modules.
Supported transformations
Here are all transformations that are available through docTR:
- class doctr.transforms.Resize(output_size: int | Tuple[int, int], method: str = 'bilinear', preserve_aspect_ratio: bool = False, symmetric_pad: bool = False)
Resizes a tensor to a target size
>>> import tensorflow as tf
>>> from doctr.transforms import Resize
>>> transfo = Resize((32, 32))
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
Args:
output_size: expected output size
method: interpolation method
preserve_aspect_ratio: if True, preserve aspect ratio and pad the rest with zeros
symmetric_pad: if True while preserving aspect ratio, the padding will be done symmetrically
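To make the interaction between preserve_aspect_ratio and symmetric_pad concrete, here is a minimal pure-Python sketch of the size and padding arithmetic. The helper name resize_geometry is hypothetical and not part of docTR, and the choice to pad only at the bottom/right when symmetric_pad is False is an assumption made for illustration.

```python
from typing import Tuple


def resize_geometry(
    in_size: Tuple[int, int],
    out_size: Tuple[int, int],
    symmetric_pad: bool = False,
) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return ((new_h, new_w), (pad_top, pad_left)) for an aspect-preserving resize.

    Hypothetical helper for illustration only; not docTR's implementation.
    """
    in_h, in_w = in_size
    out_h, out_w = out_size
    # Largest scale at which the resized image still fits inside the target
    scale = min(out_h / in_h, out_w / in_w)
    new_h = min(out_h, round(in_h * scale))
    new_w = min(out_w, round(in_w * scale))
    # Assumption: leftover space is zero-padded, split evenly when symmetric,
    # otherwise placed entirely at the bottom/right (offsets of 0)
    pad_top = (out_h - new_h) // 2 if symmetric_pad else 0
    pad_left = (out_w - new_w) // 2 if symmetric_pad else 0
    return (new_h, new_w), (pad_top, pad_left)


# A 64x128 image resized into a 32x32 target
print(resize_geometry((64, 128), (32, 32)))                      # ((16, 32), (0, 0))
print(resize_geometry((64, 128), (32, 32), symmetric_pad=True))  # ((16, 32), (8, 0))
```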
- class doctr.transforms.Normalize(mean: Tuple[float, float, float], std: Tuple[float, float, float])
Normalize a tensor to a Gaussian distribution for each channel
>>> import tensorflow as tf
>>> from doctr.transforms import Normalize
>>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
Args:
mean: average value per channel
std: standard deviation per channel
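As a rough illustration of per-channel normalization, the sketch below applies the usual (value - mean) / std formula to a single RGB pixel in pure Python. The function normalize_pixel is a hypothetical name, and the formula is the standard convention rather than something confirmed from docTR's source.

```python
from typing import List, Sequence


def normalize_pixel(
    pixel: Sequence[float], mean: Sequence[float], std: Sequence[float]
) -> List[float]:
    """Normalize each channel independently: (value - mean) / std.

    Hypothetical helper for illustration only.
    """
    return [(v - m) / s for v, m, s in zip(pixel, mean, std)]


mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)

# A pixel equal to the channel means maps to zero in every channel
print(normalize_pixel(mean, mean, std))  # [0.0, 0.0, 0.0]
```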
- class doctr.transforms.LambdaTransformation(fn: Callable[[Tensor], Tensor])
Applies a user-defined function to the input tensor
>>> import tensorflow as tf
>>> from doctr.transforms import LambdaTransformation
>>> transfo = LambdaTransformation(lambda x: x / 255.)
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
Args:
fn: the function to be applied to the input tensor
- class doctr.transforms.ToGray(num_output_channels: int = 1)
Convert an RGB tensor (batch of images or image) to a grayscale tensor with num_output_channels channels
>>> import tensorflow as tf
>>> from doctr.transforms import ToGray
>>> transfo = ToGray()
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- class doctr.transforms.ColorInversion(min_val: float = 0.5)
Applies the following transformation to a tensor (image or batch of images): convert to grayscale, colorize (shift 0-values randomly), and then invert colors
>>> import tensorflow as tf
>>> from doctr.transforms import ColorInversion
>>> transfo = ColorInversion(min_val=0.6)
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
>>> import torch
>>> from doctr.transforms import ColorInversion
>>> transfo = ColorInversion(min_val=0.6)
>>> out = transfo(torch.rand(8, 64, 64, 3))
Args:
min_val: range [min_val, 1] to colorize RGB pixels
- class doctr.transforms.RandomBrightness(max_delta: float = 0.3)
Randomly adjust brightness of a tensor (batch of images or image) by adding a delta to all pixels
>>> import tensorflow as tf
>>> from doctr.transforms import RandomBrightness
>>> transfo = RandomBrightness()
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
Args:
max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- class doctr.transforms.RandomContrast(delta: float = 0.3)
Randomly adjust contrast of a tensor (batch of images or image) by adjusting each pixel: (img - mean) * contrast_factor + mean.
>>> import tensorflow as tf
>>> from doctr.transforms import RandomContrast
>>> transfo = RandomContrast()
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
Args:
delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce contrast if factor<1)
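The contrast formula above can be sketched in pure Python on a flat list of pixel values. The names adjust_contrast and random_contrast are hypothetical, and taking the mean over all supplied values (rather than per channel) is a simplifying assumption.

```python
import random
from typing import List, Optional


def adjust_contrast(pixels: List[float], factor: float) -> List[float]:
    """Apply (pixel - mean) * factor + mean to every value.

    Assumption: the mean is taken over all values passed in.
    """
    mean = sum(pixels) / len(pixels)
    return [(p - mean) * factor + mean for p in pixels]


def random_contrast(
    pixels: List[float], delta: float = 0.3, rng: Optional[random.Random] = None
) -> List[float]:
    """Pick the factor uniformly in [1 - delta, 1 + delta], then adjust."""
    rng = rng or random.Random()
    factor = rng.uniform(1 - delta, 1 + delta)
    return adjust_contrast(pixels, factor)


pixels = [0.2, 0.4, 0.6, 0.8]
# A factor below 1 pulls values toward the mean, a factor above 1 pushes them away
print(adjust_contrast(pixels, 0.5))
print(random_contrast(pixels, delta=0.3, rng=random.Random(0)))
```

Note that the mean of the values is unchanged by the adjustment; only their spread around it is scaled.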
- class doctr.transforms.RandomSaturation(delta: float = 0.5)
Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and scaling saturation by a factor.
>>> import tensorflow as tf
>>> from doctr.transforms import RandomSaturation
>>> transfo = RandomSaturation()
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
Args:
delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce saturation if factor<1)
- class doctr.transforms.RandomHue(max_delta: float = 0.3)
Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
>>> import tensorflow as tf
>>> from doctr.transforms import RandomHue
>>> transfo = RandomHue()
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
Args:
max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- class doctr.transforms.RandomGamma(min_gamma: float = 0.5, max_gamma: float = 1.5, min_gain: float = 0.8, max_gain: float = 1.2)
Randomly performs gamma correction on a tensor (batch of images or image)
>>> import tensorflow as tf
>>> from doctr.transforms import RandomGamma
>>> transfo = RandomGamma()
>>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
Args:
min_gamma: non-negative real number, lower bound for gamma
max_gamma: non-negative real number, upper bound for gamma
min_gain: lower bound for constant multiplier
max_gain: upper bound for constant multiplier
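For readers unfamiliar with gamma correction, the sketch below shows how the gamma and gain parameters might combine, assuming the common convention out = gain * in ** gamma (the one used by tf.image.adjust_gamma). The helper names are hypothetical, and sampling both parameters uniformly within their bounds is an assumption.

```python
import random
from typing import Optional, Tuple


def adjust_gamma(value: float, gamma: float, gain: float = 1.0) -> float:
    """Gamma-correct a single value in [0, 1]: gain * value ** gamma."""
    return gain * value ** gamma


def sample_gamma_params(
    min_gamma: float = 0.5,
    max_gamma: float = 1.5,
    min_gain: float = 0.8,
    max_gain: float = 1.2,
    rng: Optional[random.Random] = None,
) -> Tuple[float, float]:
    """Assumption: gamma and gain are drawn uniformly within their bounds."""
    rng = rng or random.Random()
    return rng.uniform(min_gamma, max_gamma), rng.uniform(min_gain, max_gain)


# gamma < 1 brightens mid-tones, gamma > 1 darkens them
print(adjust_gamma(0.25, gamma=0.5))  # 0.5
gamma, gain = sample_gamma_params(rng=random.Random(0))
print(adjust_gamma(0.25, gamma, gain))
```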
- class doctr.transforms.RandomJpegQuality(min_quality: int = 60, max_quality: int = 100)
Randomly adjust JPEG quality of a 3-dimensional RGB image
>>> import tensorflow as tf
>>> from doctr.transforms import RandomJpegQuality
>>> transfo = RandomJpegQuality()
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
Args:
min_quality: lower bound of the JPEG quality, an int in [0, 100]
max_quality: upper bound of the JPEG quality, an int in [0, 100]
- class doctr.transforms.RandomRotate(max_angle: float = 5.0, expand: bool = False)
Randomly rotate a tensor image and its boxes
Args:
max_angle: maximum angle for rotation, in degrees. Angles will be uniformly picked in [-max_angle, max_angle]
expand: whether the image should be padded before the rotation
- class doctr.transforms.RandomCrop(scale: Tuple[float, float] = (0.08, 1.0), ratio: Tuple[float, float] = (0.75, 1.33))
Randomly crop a tensor image and its boxes
Args:
scale: tuple of floats, relative (min_area, max_area) of the crop
ratio: tuple of floats, relative (min_ratio, max_ratio) where ratio = h/w
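One way to read the scale and ratio arguments together is sketched below: draw a relative area and an aspect ratio h/w, then solve for the crop height and width. The function sample_crop_size is hypothetical, and both the uniform sampling and the clipping to the image bounds are assumptions rather than docTR's confirmed behaviour.

```python
import math
import random
from typing import Optional, Tuple


def sample_crop_size(
    scale: Tuple[float, float] = (0.08, 1.0),
    ratio: Tuple[float, float] = (0.75, 1.33),
    rng: Optional[random.Random] = None,
) -> Tuple[float, float]:
    """Return a relative (crop_h, crop_w), each in (0, 1].

    Hypothetical helper for illustration only.
    """
    rng = rng or random.Random()
    area = rng.uniform(*scale)    # fraction of the image area to keep
    aspect = rng.uniform(*ratio)  # h / w of the crop
    # Solve crop_h * crop_w = area and crop_h / crop_w = aspect
    crop_h = math.sqrt(area * aspect)
    crop_w = math.sqrt(area / aspect)
    # Assumption: a side that would exceed the image is clipped to it
    return min(crop_h, 1.0), min(crop_w, 1.0)


# A quarter of the area with a square ratio gives half of each side
print(sample_crop_size(scale=(0.25, 0.25), ratio=(1.0, 1.0)))  # (0.5, 0.5)
print(sample_crop_size(rng=random.Random(0)))
```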
- class doctr.transforms.GaussianBlur(kernel_shape: int | Iterable[int], std: Tuple[float, float])
Apply Gaussian blur to a 3-dimensional RGB image, with the standard deviation picked within the given range
>>> import tensorflow as tf
>>> from doctr.transforms import GaussianBlur
>>> transfo = GaussianBlur(3, (.1, 5))
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
Args:
kernel_shape: size of the blurring kernel
std: min and max value of the standard deviation
- class doctr.transforms.GaussianNoise(mean: float = 0.0, std: float = 1.0)
Adds Gaussian noise to the input tensor
>>> import tensorflow as tf
>>> from doctr.transforms import GaussianNoise
>>> transfo = GaussianNoise(0., 1.)
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
Args:
mean: mean of the Gaussian distribution
std: standard deviation of the Gaussian distribution
- class doctr.transforms.RandomHorizontalFlip(p: float)
Adds random horizontal flip to the input tensor/np.ndarray
>>> import numpy as np
>>> import tensorflow as tf
>>> from doctr.transforms import RandomHorizontalFlip
>>> transfo = RandomHorizontalFlip(p=0.5)
>>> image = tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1)
>>> target = np.array([[0.1, 0.1, 0.4, 0.5]], dtype=np.float32)
>>> out = transfo(image, target)
Args:
p: probability of horizontal flip
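Since the flip takes a target alongside the image, the boxes have to be mirrored too. The sketch below shows that arithmetic in pure Python, assuming boxes are given as relative (xmin, ymin, xmax, ymax) coordinates; that format and the helper names hflip_image and hflip_boxes are assumptions for illustration, not docTR's API.

```python
from typing import List


def hflip_image(rows: List[List[float]]) -> List[List[float]]:
    """Mirror an image, given as a list of pixel rows, left to right."""
    return [row[::-1] for row in rows]


def hflip_boxes(boxes: List[List[float]]) -> List[List[float]]:
    """Mirror relative (xmin, ymin, xmax, ymax) boxes around the vertical axis.

    The old right edge becomes the new left edge, so the x-coordinates swap
    roles: new_xmin = 1 - xmax and new_xmax = 1 - xmin. The y-coordinates
    are unchanged.
    """
    return [[1 - xmax, ymin, 1 - xmin, ymax] for xmin, ymin, xmax, ymax in boxes]


image = [[0.0, 0.5, 1.0]]
target = [[0.1, 0.1, 0.4, 0.5]]
print(hflip_image(image))   # [[1.0, 0.5, 0.0]]
print(hflip_boxes(target))  # box now spans roughly x in [0.6, 0.9]
```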
- class doctr.transforms.RandomShadow(opacity_range: Tuple[float, float] | None = None)
Adds random shade to the input image
>>> import tensorflow as tf
>>> from doctr.transforms import RandomShadow
>>> transfo = RandomShadow((0., 1.))
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
Args:
opacity_range: minimum and maximum opacity of the shade
- class doctr.transforms.RandomResize(scale_range: Tuple[float, float] = (0.3, 0.9), preserve_aspect_ratio: bool | float = False, symmetric_pad: bool | float = False, p: float = 0.5)
Randomly resize the input image and align corresponding targets
>>> import tensorflow as tf
>>> from doctr.transforms import RandomResize
>>> transfo = RandomResize((0.3, 0.9), preserve_aspect_ratio=True, symmetric_pad=True, p=0.5)
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
Args:
scale_range: range of the resizing factor for width and height (independently)
preserve_aspect_ratio: whether to preserve the aspect ratio of the image; given a float value, the aspect ratio will be preserved with this probability
symmetric_pad: whether to symmetrically pad the image; given a float value, the symmetric padding will be applied with this probability
p: probability to apply the transformation
Composing transformations
It is common to require several transformations to be performed consecutively.
- class doctr.transforms.Compose(transforms: List[Callable[[Any], Any]])
Implements a wrapper that will apply transformations sequentially
>>> import tensorflow as tf
>>> from doctr.transforms import Compose, Resize
>>> transfos = Compose([Resize((32, 32))])
>>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
Args:
transforms: list of transformation modules
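The sequential behaviour can be pictured with a few lines of plain Python. SimpleCompose below is a hypothetical stand-in written for illustration, not docTR's class; it only assumes that each transformation is a callable taking and returning a single value.

```python
from typing import Any, Callable, List


class SimpleCompose:
    """Apply each callable in order, feeding every output into the next one.

    Hypothetical stand-in for illustration only.
    """

    def __init__(self, transforms: List[Callable[[Any], Any]]) -> None:
        self.transforms = transforms

    def __call__(self, x: Any) -> Any:
        for transform in self.transforms:
            x = transform(x)
        return x


# Order matters: add one first, then double
pipeline = SimpleCompose([lambda x: x + 1, lambda x: x * 2])
result = pipeline(3)
print(result)  # 8
```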
- class doctr.transforms.OneOf(transforms: List[Callable[[Any], Any]])
Randomly apply one of the input transformations
>>> import tensorflow as tf
>>> from doctr.transforms import OneOf, RandomJpegQuality, RandomGamma
>>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
>>> import torch
>>> from doctr.transforms import OneOf, RandomJpegQuality, RandomGamma
>>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
>>> out = transfo(torch.rand(1, 64, 64, 3))
Args:
transforms: list of transformations, only one of which will be picked
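A minimal sketch of the selection logic, in plain Python: SimpleOneOf is a hypothetical stand-in rather than docTR's implementation, and it assumes each candidate is picked with equal probability.

```python
import random
from typing import Any, Callable, List, Optional


class SimpleOneOf:
    """Pick exactly one callable per call and apply it to the input.

    Hypothetical stand-in; uniform selection is an assumption.
    """

    def __init__(
        self,
        transforms: List[Callable[[Any], Any]],
        rng: Optional[random.Random] = None,
    ) -> None:
        self.transforms = transforms
        self.rng = rng or random.Random()

    def __call__(self, x: Any) -> Any:
        chosen = self.rng.choice(self.transforms)
        return chosen(x)


one_of = SimpleOneOf([lambda x: x + 1, lambda x: x * 10], rng=random.Random(0))
# Every output is either 3 + 1 or 3 * 10, never both applied together
outputs = [one_of(3) for _ in range(50)]
print(sorted(set(outputs)))
```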
- class doctr.transforms.RandomApply(transform: Callable[[Any], Any], p: float = 0.5)
Apply the input transformation with probability p
>>> import tensorflow as tf
>>> from doctr.transforms import RandomApply, RandomGamma
>>> transfo = RandomApply(RandomGamma(), p=.5)
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
>>> import torch
>>> from doctr.transforms import RandomApply, RandomGamma
>>> transfo = RandomApply(RandomGamma(), p=.5)
>>> out = transfo(torch.rand(1, 64, 64, 3))
Args:
transform: transformation to apply
p: probability to apply
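The probabilistic gate can be sketched as follows. SimpleRandomApply is a hypothetical stand-in written for illustration; it assumes the input is returned unchanged whenever the transformation is skipped.

```python
import random
from typing import Any, Callable, Optional


class SimpleRandomApply:
    """Apply `transform` with probability p, otherwise return the input as-is.

    Hypothetical stand-in for illustration only.
    """

    def __init__(
        self,
        transform: Callable[[Any], Any],
        p: float = 0.5,
        rng: Optional[random.Random] = None,
    ) -> None:
        self.transform = transform
        self.p = p
        self.rng = rng or random.Random()

    def __call__(self, x: Any) -> Any:
        # random() returns a float in [0, 1), so p=1.0 always applies
        # the transform and p=0.0 never does
        if self.rng.random() < self.p:
            return self.transform(x)
        return x


always = SimpleRandomApply(lambda x: x * 2, p=1.0)
never = SimpleRandomApply(lambda x: x * 2, p=0.0)
print(always(21), never(21))  # 42 21
```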