Input

class Base64Input(base64_string, filename)

Base64-encoded text input.

Parameters:
  • base64_string (str) –

  • filename (str) –
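
Conceptually, a Base64 input is a string that has to be decoded back into raw bytes before it can be treated as a file. The following is a minimal sketch of that step using only the standard library. `decode_base64_input` is a hypothetical helper written for illustration and is not part of this module.

```python
import base64
import io
from typing import Tuple


def decode_base64_input(base64_string: str, filename: str) -> Tuple[str, io.BytesIO]:
    """Hypothetical helper: decode a Base64 string into an in-memory binary file."""
    # validate=True rejects characters outside the Base64 alphabet
    # instead of silently discarding them.
    raw_bytes = base64.b64decode(base64_string, validate=True)
    return filename, io.BytesIO(raw_bytes)


# Round trip: encode some bytes, then decode them as an input would be.
encoded = base64.b64encode(b"%PDF-1.7 example").decode("ascii")
name, file_object = decode_base64_input(encoded, "example.pdf")
```

The filename is carried alongside the decoded bytes because a Base64 string has no name of its own.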

class BytesInput(raw_bytes, filename)

Raw bytes input.

Parameters:
  • raw_bytes (bytes) –

  • filename (str) –

class FileInput(file)

A binary file input.

Parameters:

file (BinaryIO) –

class InputType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

The input type, for internal use.

BASE64 = 'base64'
BYTES = 'bytes'
FILE = 'file'
PATH = 'path'
URL = 'url'

class LocalInputSource(input_type)

Base class for all input sources coming from the local machine.

Parameters:

input_type (InputType) –

close()

Close the file object.

Return type:

None

count_doc_pages()

Count the pages in the PDF.

Return type:

int

Returns:

the number of pages.

fix_pdf(maximum_offset=500)

Fix a potentially broken PDF file.

WARNING: this feature alters the data of the enqueued file by removing unnecessary headers.

Reads the bytes of a PDF file until a proper PDF tag is encountered, or until the maximum offset has been reached. If a tag denoting a PDF file is found, deletes all bytes before it.

Parameters:

maximum_offset (int, default: 500) – maximum byte offset where superfluous headers will be removed. Cannot be less than 0.

Return type:

None
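
The header-stripping behavior described for fix_pdf can be sketched on a plain bytes object. This is an illustration only, not the library's implementation. It assumes the "proper PDF tag" is the standard `%PDF-` marker and that the tag must start at or before maximum_offset. `strip_leading_bytes` is a hypothetical name.

```python
PDF_TAG = b"%PDF-"  # assumption: the marker that denotes the start of a PDF


def strip_leading_bytes(data: bytes, maximum_offset: int = 500) -> bytes:
    """Hypothetical helper: drop any bytes that precede the PDF tag."""
    if maximum_offset < 0:
        raise ValueError("maximum_offset cannot be less than 0")
    # Limit the search so that the tag may start no later than maximum_offset
    # (this boundary handling is an assumption).
    position = data.find(PDF_TAG, 0, maximum_offset + len(PDF_TAG))
    if position == -1:
        # No tag within the allowed window: leave the data untouched.
        return data
    return data[position:]
```

Unlike this sketch, which returns a new bytes object, fix_pdf returns None and alters the file held by the input source, as the warning above states.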

is_pdf()

Check if the file is a PDF.

Return type:

bool

Returns:

True if the file is a PDF.

is_pdf_empty()

Check if the PDF is empty.

Return type:

bool

Returns:

True if the PDF is empty.

merge_pdf_pages(page_numbers)

Create a new PDF from the given pages and assign it to file_object.

Parameters:

page_numbers (set) – Set of page numbers from the original PDF to include in the merged PDF.

Return type:

None

Returns:

None

process_pdf(behavior, on_min_pages, page_indexes)

Run any required processing on a PDF file.

Return type:

None

Parameters:
  • behavior (str) –

  • on_min_pages (int) –

  • page_indexes (Sequence) –

read_contents(close_file)

Read the contents of the input file.

Parameters:

close_file (bool) – whether to close the file after reading

Return type:

Tuple[str, bytes]

Returns:

a Tuple with the file name and binary data

file_mimetype: str
file_object: BinaryIO
filename: str
filepath: Optional[str]
input_type: InputType

class LocalResponse(input_file)

Local response loaded from a file.

Parameters:

input_file (Union[BinaryIO, str, Path, bytes]) –

get_hmac_signature(secret_key)

Returns the HMAC signature of the local response, computed with the provided secret key.

Parameters:

secret_key (Union[str, bytes, bytearray]) – Secret key, given as a string, bytes, or bytearray.

Returns:

The HMAC signature of the local response.

is_valid_hmac_signature(secret_key, signature)

Checks if the HMAC signature of the local response is valid.

Parameters:
  • secret_key (Union[str, bytes, bytearray]) – Secret key, given as a string, bytes, or bytearray.

  • signature (str) – HMAC signature, given as a string.

Returns:

True if the HMAC signature is valid.
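
For readers unfamiliar with HMAC verification, here is a minimal sketch of the two operations above using the standard library. The hash algorithm (SHA-256), the hexadecimal encoding of the signature, and the helper names are assumptions made for illustration. This reference does not state which the library uses.

```python
import hashlib
import hmac
from typing import Union

SecretKey = Union[str, bytes, bytearray]


def compute_signature(secret_key: SecretKey, payload: bytes) -> str:
    """Hypothetical helper: hex HMAC of the payload (SHA-256 is an assumption)."""
    # hmac.new() requires a bytes-like key, so encode strings first.
    key = secret_key.encode("utf-8") if isinstance(secret_key, str) else bytes(secret_key)
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def is_valid_signature(secret_key: SecretKey, payload: bytes, signature: str) -> bool:
    """Hypothetical helper: compare the expected and received signatures."""
    expected = compute_signature(secret_key, payload)
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature matched.
    return hmac.compare_digest(expected, signature)


payload = b'{"document": "example"}'
signature = compute_signature("my-secret", payload)
```

The same key produces the same signature whether it is passed as str, bytes, or bytearray, provided the string is encoded the same way.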

property as_dict: Dict[str, Any]

Returns the dictionary representation of the file.

Returns:

A json-like dictionary.

class PageOptions(page_indexes, operation='KEEP_ONLY', on_min_pages=0)

Options to pass to the parse method for cutting multipage documents.

Parameters:
  • page_indexes (Sequence[int]) –

  • operation (str) –

  • on_min_pages (int) –

on_min_pages: int

Apply the operation only if the document has at least this many pages.

Default: 0 (apply on all documents)

operation: str

Operation to apply on the document, given the page_indexes specified:

  • KEEP_ONLY - keep only the specified pages, and remove all others.

  • REMOVE - remove the specified pages, and keep all others.

page_indexes: Sequence[int]

Zero-based list of page indexes. A negative index can be used, indicating an offset from the end of the document.

[0, -1] represents the first and last pages of the document.
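
The interaction between page_indexes, operation, and on_min_pages can be sketched as a pure function over page positions. This illustrates the documented semantics and is not the library's code. `select_pages` is a hypothetical helper, and two behaviors are assumptions: out-of-range indexes are ignored, and the result is returned in document order.

```python
from typing import List, Sequence


def select_pages(
    page_count: int,
    page_indexes: Sequence[int],
    operation: str = "KEEP_ONLY",
    on_min_pages: int = 0,
) -> List[int]:
    """Hypothetical helper: return the zero-based pages that remain."""
    all_pages = list(range(page_count))
    # The operation only applies to documents with at least on_min_pages pages.
    if page_count < on_min_pages:
        return all_pages
    resolved = set()
    for index in page_indexes:
        # A negative index is an offset from the end of the document.
        position = index + page_count if index < 0 else index
        # Assumption: indexes outside the document are silently skipped.
        if 0 <= position < page_count:
            resolved.add(position)
    if operation == "KEEP_ONLY":
        return [page for page in all_pages if page in resolved]
    if operation == "REMOVE":
        return [page for page in all_pages if page not in resolved]
    raise ValueError(f"Unknown operation: {operation}")
```

For a five-page document, `select_pages(5, [0, -1])` keeps pages 0 and 4, while the same indexes with `operation="REMOVE"` leave pages 1, 2, and 3.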

class PathInput(filepath)

A local path input.

Parameters:

filepath (Optional[str]) –

class UrlInputSource(url)

A local or remote URL input.

Parameters:

url (str) –

as_local_input_source(filename=None, username=None, password=None, token=None, headers=None, max_redirects=3)

Convert the URL content to a BytesInput object.

Parameters:
  • filename (Optional[str], default: None) – Optional filename for the BytesInput.

  • username (Optional[str], default: None) – Optional username for authentication.

  • password (Optional[str], default: None) – Optional password for authentication.

  • token (Optional[str], default: None) – Optional token for authentication.

  • headers (Optional[dict], default: None) – Optional additional headers for the request.

  • max_redirects (int, default: 3) – Maximum number of redirects to follow.

Return type:

BytesInput

Returns:

A BytesInput object containing the file content.
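
The authentication parameters above typically end up as HTTP request headers. The sketch below shows one plausible mapping. It is an assumption, since this reference does not describe how the library applies them: a token is sent as a Bearer credential, a username and password pair as HTTP Basic credentials, and the token takes precedence when both are given. `build_request_headers` is a hypothetical helper.

```python
import base64
from typing import Dict, Optional


def build_request_headers(
    username: Optional[str] = None,
    password: Optional[str] = None,
    token: Optional[str] = None,
    headers: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: merge extra headers with an Authorization header."""
    # Copy so the caller's dictionary is never modified.
    request_headers = dict(headers or {})
    if token:
        # Assumption: tokens are sent as Bearer credentials.
        request_headers["Authorization"] = f"Bearer {token}"
    elif username and password:
        # HTTP Basic authentication: base64("username:password").
        credentials = f"{username}:{password}".encode("utf-8")
        encoded = base64.b64encode(credentials).decode("ascii")
        request_headers["Authorization"] = f"Basic {encoded}"
    return request_headers
```

Calling `build_request_headers(token="abc", headers={"Accept": "application/pdf"})` returns both the Accept header and an Authorization header.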

save_to_file(filepath, filename=None, username=None, password=None, token=None, headers=None, max_redirects=3)

Save the content of the URL to a file.

Parameters:
  • filepath (Union[Path, str]) – Path to save the content to.

  • filename (Optional[str], default: None) – Optional filename to give to the file.

  • username (Optional[str], default: None) – Optional username for authentication.

  • password (Optional[str], default: None) – Optional password for authentication.

  • token (Optional[str], default: None) – Optional token for authentication.

  • headers (Optional[dict], default: None) – Optional additional headers for the request.

  • max_redirects (int, default: 3) – Maximum number of redirects to follow.

Return type:

Path

Returns:

The path to the saved file.

url: str

The Uniform Resource Locator.

class WorkflowOptions(alias=None, priority=None, full_text=False, public_url=None)

Options to pass to a workflow execution.

Parameters:
  • alias (Optional[str]) –

  • priority (Optional[ExecutionPriority]) –

  • full_text (bool) –

  • public_url (Optional[str]) –

alias: Optional[str]

Alias for the document.

full_text: bool

Whether to include the full OCR text response in compatible APIs.

priority: Optional[ExecutionPriority]

Priority of the document.

public_url: Optional[str]

A unique, encrypted URL for accessing the document validation interface without requiring authentication.