Mindee Input

class Base64Input(base64_string, filename)

Input from a file given as a Base64-encoded string.

Parameters:
  • base64_string (str)

  • filename (str)
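Base64Input takes the file content as a Base64 string plus a filename. The stdlib-only sketch below shows roughly what such an input amounts to: the string decoded into an in-memory binary stream. `decode_base64_input` is a hypothetical helper, not part of the Mindee API, and the strict validation is an assumption about desirable behavior rather than a description of the library.

```python
import base64
import io
from typing import BinaryIO, Tuple


def decode_base64_input(base64_string: str, filename: str) -> Tuple[str, BinaryIO]:
    """Hypothetical helper: turn a Base64 payload into (filename, binary stream)."""
    # validate=True rejects characters outside the Base64 alphabet
    # instead of silently discarding them.
    raw = base64.b64decode(base64_string, validate=True)
    return filename, io.BytesIO(raw)


encoded = base64.b64encode(b"%PDF-1.7 minimal").decode("ascii")
name, stream = decode_base64_input(encoded, "invoice.pdf")
print(name, stream.read())  # invoice.pdf b'%PDF-1.7 minimal'
```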

apply_page_options(page_options)

Apply cut and merge options on multipage documents.

Return type:

None

Parameters:

page_options (PageOptions)

close()

Allow explicit closing for users not using a context manager.
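close() exists for callers who do not use a `with` statement. The sketch below shows the general pattern with a hypothetical `SketchInput` class (not a Mindee class): `__exit__` delegates to the same `close()` that manual callers invoke, so both paths release the underlying file object. Whether the Mindee input classes implement `__enter__`/`__exit__` exactly this way is an assumption.

```python
import io
from typing import BinaryIO


class SketchInput:
    """Hypothetical stand-in for an input source that owns a file object."""

    def __init__(self, file_object: BinaryIO) -> None:
        self.file_object = file_object

    def close(self) -> None:
        # Explicit cleanup path for callers not using a context manager.
        self.file_object.close()

    def __enter__(self) -> "SketchInput":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # The context-manager path reuses close(), so both paths behave the same.
        self.close()


with SketchInput(io.BytesIO(b"data")) as managed:
    pass
print(managed.file_object.closed)  # True

manual = SketchInput(io.BytesIO(b"data"))
manual.close()
print(manual.file_object.closed)  # True
```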

compress(quality=85, max_width=None, max_height=None, force_source_text=False, disable_source_text=True)

Compresses the file object, either as a PDF or an image.

Parameters:
  • quality (int, default: 85) – Quality of the compression. For images, this is the JPEG quality. For PDFs, this affects image quality within the PDF.

  • max_width (int | None, default: None) – Maximum width for image resizing. Ignored for PDFs.

  • max_height (int | None, default: None) – Maximum height for image resizing. Ignored for PDFs.

  • force_source_text (bool, default: False) – For PDFs, whether to force compression even if source text is present.

  • disable_source_text (bool, default: True) – For PDFs, whether to disable source text during compression.

Return type:

None

fix_pdf(maximum_offset=500)

Fix a potentially broken PDF file.

WARNING: this method alters the data of the enqueued file by removing superfluous bytes that precede the PDF header.

Reads the bytes of the file until a valid PDF header tag is found or the maximum offset is reached. If such a tag is found, all bytes before it are deleted.

Parameters:

maximum_offset (int, default: 500) – Maximum byte offset up to which superfluous leading bytes are removed. Must not be negative.

Return type:

None
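The header-stripping behavior of fix_pdf() can be sketched with the standard library alone. This assumes the "PDF tag" is the standard `%PDF` file header and that a missing tag is an error; `strip_leading_garbage` is a hypothetical name, and the real method modifies the input in place rather than returning a new stream.

```python
import io
from typing import BinaryIO

PDF_MARKER = b"%PDF"  # assumption: the PDF tag is the standard %PDF header


def strip_leading_garbage(file_object: BinaryIO, maximum_offset: int = 500) -> BinaryIO:
    """Hypothetical sketch: return a new stream without the bytes preceding %PDF."""
    if maximum_offset < 0:
        raise ValueError("maximum_offset cannot be less than 0")
    file_object.seek(0)
    # Only search the first maximum_offset bytes, plus room for the marker itself.
    head = file_object.read(maximum_offset + len(PDF_MARKER))
    position = head.find(PDF_MARKER)
    if position == -1:
        raise ValueError("no PDF header found within maximum_offset bytes")
    # Keep everything from the marker onward.
    file_object.seek(position)
    return io.BytesIO(file_object.read())


broken = io.BytesIO(b"\r\nGARBAGE%PDF-1.4\n...")
fixed = strip_leading_garbage(broken)
print(fixed.read(8))  # b'%PDF-1.4'
```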

has_source_text()

If the file is a PDF, checks if it has source text.

Return type:

bool

Returns:

True if the file is a PDF and has source text. False otherwise.

is_pdf()

Check whether the file is a PDF.

Return type:

bool

Returns:

True if the file is a PDF.
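The reference does not say how is_pdf() decides. One common approach, shown here as an assumption rather than the library's actual logic, is to check the file's leading signature bytes; the library may instead rely on a MIME type such as the file_mimetype attribute. `looks_like_pdf` is a hypothetical helper.

```python
import io
from typing import BinaryIO


def looks_like_pdf(file_object: BinaryIO) -> bool:
    """Hypothetical check: does the stream start with the %PDF- signature?"""
    position = file_object.tell()
    try:
        file_object.seek(0)
        return file_object.read(5) == b"%PDF-"
    finally:
        # Restore the caller's read position so the check has no side effects.
        file_object.seek(position)


print(looks_like_pdf(io.BytesIO(b"%PDF-1.7\n")))  # True
print(looks_like_pdf(io.BytesIO(b"\x89PNG\r\n")))  # False
```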

is_pdf_empty()

Check if the PDF is empty.

Return type:

bool

Returns:

True if the PDF is empty.

merge_pdf_pages(page_numbers)

Create a new PDF from the given pages and set it as file_object.

Parameters:

page_numbers (set) – Set of page numbers from the original PDF to include in the new PDF.

Return type:

None

Returns:

None

process_pdf(behavior, on_min_pages, page_indexes)

Run any required processing on a PDF file.

Return type:

None

Parameters:
  • behavior (str)

  • on_min_pages (int)

  • page_indexes (Sequence[int])

read_contents(close_file)

Read the contents of the input file.

Parameters:

close_file (bool) – Whether to close the file after reading.

Return type:

tuple[str, bytes]

Returns:

A tuple containing the file name and the binary data.
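A minimal sketch of the read_contents() contract for a (filename, stream) pair follows. Rewinding the stream before reading, and again afterwards when it stays open, are assumptions made so that repeated calls behave predictably; `read_contents_sketch` is hypothetical.

```python
import io
from typing import BinaryIO, Tuple


def read_contents_sketch(filename: str, file_object: BinaryIO, close_file: bool) -> Tuple[str, bytes]:
    """Hypothetical equivalent of read_contents() for a (filename, stream) pair."""
    file_object.seek(0)  # read from the start, regardless of earlier reads
    data = file_object.read()
    if close_file:
        file_object.close()
    else:
        file_object.seek(0)  # leave the stream rewound for later reuse
    return filename, data


stream = io.BytesIO(b"hello")
print(read_contents_sketch("a.txt", stream, close_file=False))  # ('a.txt', b'hello')
print(stream.closed)  # False
```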

file_mimetype: str
file_object: BinaryIO
filename: str
filepath: str | None
page_count: int
class BytesInput(raw_bytes, filename)

Raw bytes input.

Parameters:
  • raw_bytes (bytes)

  • filename (str)

apply_page_options(page_options)

Apply cut and merge options on multipage documents.

Return type:

None

Parameters:

page_options (PageOptions)

close()

Allow explicit closing for users not using a context manager.

compress(quality=85, max_width=None, max_height=None, force_source_text=False, disable_source_text=True)

Compresses the file object, either as a PDF or an image.

Parameters:
  • quality (int, default: 85) – Quality of the compression. For images, this is the JPEG quality. For PDFs, this affects image quality within the PDF.

  • max_width (int | None, default: None) – Maximum width for image resizing. Ignored for PDFs.

  • max_height (int | None, default: None) – Maximum height for image resizing. Ignored for PDFs.

  • force_source_text (bool, default: False) – For PDFs, whether to force compression even if source text is present.

  • disable_source_text (bool, default: True) – For PDFs, whether to disable source text during compression.

Return type:

None

fix_pdf(maximum_offset=500)

Fix a potentially broken PDF file.

WARNING: this method alters the data of the enqueued file by removing superfluous bytes that precede the PDF header.

Reads the bytes of the file until a valid PDF header tag is found or the maximum offset is reached. If such a tag is found, all bytes before it are deleted.

Parameters:

maximum_offset (int, default: 500) – Maximum byte offset up to which superfluous leading bytes are removed. Must not be negative.

Return type:

None

has_source_text()

If the file is a PDF, checks if it has source text.

Return type:

bool

Returns:

True if the file is a PDF and has source text. False otherwise.

is_pdf()

Check whether the file is a PDF.

Return type:

bool

Returns:

True if the file is a PDF.

is_pdf_empty()

Check if the PDF is empty.

Return type:

bool

Returns:

True if the PDF is empty.

merge_pdf_pages(page_numbers)

Create a new PDF from the given pages and set it as file_object.

Parameters:

page_numbers (set) – Set of page numbers from the original PDF to include in the new PDF.

Return type:

None

Returns:

None

process_pdf(behavior, on_min_pages, page_indexes)

Run any required processing on a PDF file.

Return type:

None

Parameters:
  • behavior (str)

  • on_min_pages (int)

  • page_indexes (Sequence[int])

read_contents(close_file)

Read the contents of the input file.

Parameters:

close_file (bool) – Whether to close the file after reading.

Return type:

tuple[str, bytes]

Returns:

A tuple containing the file name and the binary data.

file_mimetype: str
file_object: BinaryIO
filename: str
filepath: str | None
page_count: int
class FileInput(file)

A binary file input.

Parameters:

file (BinaryIO | IO[bytes])

apply_page_options(page_options)

Apply cut and merge options on multipage documents.

Return type:

None

Parameters:

page_options (PageOptions)

close()

Allow explicit closing for users not using a context manager.

compress(quality=85, max_width=None, max_height=None, force_source_text=False, disable_source_text=True)

Compresses the file object, either as a PDF or an image.

Parameters:
  • quality (int, default: 85) – Quality of the compression. For images, this is the JPEG quality. For PDFs, this affects image quality within the PDF.

  • max_width (int | None, default: None) – Maximum width for image resizing. Ignored for PDFs.

  • max_height (int | None, default: None) – Maximum height for image resizing. Ignored for PDFs.

  • force_source_text (bool, default: False) – For PDFs, whether to force compression even if source text is present.

  • disable_source_text (bool, default: True) – For PDFs, whether to disable source text during compression.

Return type:

None

fix_pdf(maximum_offset=500)

Fix a potentially broken PDF file.

WARNING: this method alters the data of the enqueued file by removing superfluous bytes that precede the PDF header.

Reads the bytes of the file until a valid PDF header tag is found or the maximum offset is reached. If such a tag is found, all bytes before it are deleted.

Parameters:

maximum_offset (int, default: 500) – Maximum byte offset up to which superfluous leading bytes are removed. Must not be negative.

Return type:

None

has_source_text()

If the file is a PDF, checks if it has source text.

Return type:

bool

Returns:

True if the file is a PDF and has source text. False otherwise.

is_pdf()

Check whether the file is a PDF.

Return type:

bool

Returns:

True if the file is a PDF.

is_pdf_empty()

Check if the PDF is empty.

Return type:

bool

Returns:

True if the PDF is empty.

merge_pdf_pages(page_numbers)

Create a new PDF from the given pages and set it as file_object.

Parameters:

page_numbers (set) – Set of page numbers from the original PDF to include in the new PDF.

Return type:

None

Returns:

None

process_pdf(behavior, on_min_pages, page_indexes)

Run any required processing on a PDF file.

Return type:

None

Parameters:
  • behavior (str)

  • on_min_pages (int)

  • page_indexes (Sequence[int])

read_contents(close_file)

Read the contents of the input file.

Parameters:

close_file (bool) – Whether to close the file after reading.

Return type:

tuple[str, bytes]

Returns:

A tuple containing the file name and the binary data.

file_mimetype: str
file_object: BinaryIO
filename: str
filepath: str | None
page_count: int
class LocalInputSource

Base class for all input sources coming from the local machine.

apply_page_options(page_options)

Apply cut and merge options on multipage documents.

Return type:

None

Parameters:

page_options (PageOptions)

close()

Allow explicit closing for users not using a context manager.

compress(quality=85, max_width=None, max_height=None, force_source_text=False, disable_source_text=True)

Compresses the file object, either as a PDF or an image.

Parameters:
  • quality (int, default: 85) – Quality of the compression. For images, this is the JPEG quality. For PDFs, this affects image quality within the PDF.

  • max_width (int | None, default: None) – Maximum width for image resizing. Ignored for PDFs.

  • max_height (int | None, default: None) – Maximum height for image resizing. Ignored for PDFs.

  • force_source_text (bool, default: False) – For PDFs, whether to force compression even if source text is present.

  • disable_source_text (bool, default: True) – For PDFs, whether to disable source text during compression.

Return type:

None

fix_pdf(maximum_offset=500)

Fix a potentially broken PDF file.

WARNING: this method alters the data of the enqueued file by removing superfluous bytes that precede the PDF header.

Reads the bytes of the file until a valid PDF header tag is found or the maximum offset is reached. If such a tag is found, all bytes before it are deleted.

Parameters:

maximum_offset (int, default: 500) – Maximum byte offset up to which superfluous leading bytes are removed. Must not be negative.

Return type:

None

has_source_text()

If the file is a PDF, checks if it has source text.

Return type:

bool

Returns:

True if the file is a PDF and has source text. False otherwise.

is_pdf()

Check whether the file is a PDF.

Return type:

bool

Returns:

True if the file is a PDF.

is_pdf_empty()

Check if the PDF is empty.

Return type:

bool

Returns:

True if the PDF is empty.

merge_pdf_pages(page_numbers)

Create a new PDF from the given pages and set it as file_object.

Parameters:

page_numbers (set) – Set of page numbers from the original PDF to include in the new PDF.

Return type:

None

Returns:

None

process_pdf(behavior, on_min_pages, page_indexes)

Run any required processing on a PDF file.

Return type:

None

Parameters:
  • behavior (str)

  • on_min_pages (int)

  • page_indexes (Sequence[int])

read_contents(close_file)

Read the contents of the input file.

Parameters:

close_file (bool) – Whether to close the file after reading.

Return type:

tuple[str, bytes]

Returns:

A tuple containing the file name and the binary data.

file_mimetype: str
file_object: BinaryIO
filename: str
filepath: str | None
page_count: int
class LocalResponse(input_file)

Local response loaded from a file.

Parameters:

input_file (BinaryIO | str | Path | bytes)

deserialize_response(response_class)

Load a local inference.

Typically used to load the payload of a V2 webhook callback.

Return type:

TypeVar(ResponseT, bound=CommonResponse)

Parameters:

response_class (type[ResponseT])

get_hmac_signature(secret_key)

Compute the HMAC signature of the local response using the provided secret key.

Parameters:

secret_key (str | bytes | bytearray) – Secret key, as a string, bytes, or bytearray.

Returns:

The HMAC signature of the local response.

is_valid_hmac_signature(secret_key, signature)

Check whether the HMAC signature of the local response is valid.

Parameters:
  • secret_key (str | bytes | bytearray) – Secret key, as a string, bytes, or bytearray.

  • signature (str) – HMAC signature, given as a string.

Returns:

True if the HMAC signature is valid.
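The reference does not state the hash algorithm or the signature encoding. The sketch below assumes HMAC-SHA256 over the raw response body with a lowercase hex digest, a common webhook convention; check Mindee's webhook documentation before relying on it. The helper names are hypothetical. Comparison uses `hmac.compare_digest` so that validation runs in constant time.

```python
import hashlib
import hmac
from typing import Union

KeyLike = Union[str, bytes, bytearray]


def _as_bytes(secret_key: KeyLike) -> bytes:
    # Strings are encoded as UTF-8 (assumption); bytes-like keys are used as-is.
    return secret_key.encode("utf-8") if isinstance(secret_key, str) else bytes(secret_key)


def hmac_signature(body: bytes, secret_key: KeyLike) -> str:
    """Hex-encoded HMAC-SHA256 of the response body (assumed algorithm)."""
    return hmac.new(_as_bytes(secret_key), body, hashlib.sha256).hexdigest()


def is_valid_signature(body: bytes, secret_key: KeyLike, signature: str) -> bool:
    # compare_digest avoids leaking timing information about matching prefixes.
    return hmac.compare_digest(hmac_signature(body, secret_key), signature)


body = b'{"job": {"id": "abc"}}'
sig = hmac_signature(body, "my-secret")
print(is_valid_signature(body, "my-secret", sig))  # True
print(is_valid_signature(body, "wrong", sig))  # False
```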

ResponseT = ~ResponseT
property as_dict: dict[str, Any]

Returns the dictionary representation of the file.

Returns:

A JSON-like dictionary.
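LocalResponse accepts a file object, a path (as str or Path), or raw bytes, and as_dict exposes the parsed JSON. The stdlib sketch below illustrates that normalization. Treating every str as a file path is an assumption (the library may also accept raw JSON strings), and `load_local_json` is a hypothetical name.

```python
import io
import json
from pathlib import Path
from typing import Any, BinaryIO, Dict, Union

InputFile = Union[BinaryIO, str, Path, bytes]


def load_local_json(input_file: InputFile) -> Dict[str, Any]:
    """Hypothetical sketch: normalize the accepted input types, then parse JSON."""
    if isinstance(input_file, bytes):
        raw = input_file
    elif isinstance(input_file, (str, Path)):
        # Assumption: strings are treated as file paths.
        raw = Path(input_file).read_bytes()
    else:
        input_file.seek(0)
        raw = input_file.read()
    # json.loads accepts bytes directly and detects UTF-8/16/32 encodings.
    return json.loads(raw)


print(load_local_json(b'{"api_request": {"status": "success"}}'))
# {'api_request': {'status': 'success'}}
print(load_local_json(io.BytesIO(b'{"a": 1}')))  # {'a': 1}
```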

class PageOptions(page_indexes, operation='KEEP_ONLY', on_min_pages=0)

Options to pass to the parse method for cutting multipage documents.

Parameters:
  • page_indexes (Sequence[int])

  • operation (str)

  • on_min_pages (int)

count(value, /)

Return number of occurrences of value.

index(value, start=0, stop=9223372036854775807, /)

Return first index of value.

Raises ValueError if the value is not present.

on_min_pages: int

Apply the operation only if the document has at least this many pages.

Default: 0 (apply to all documents).

operation: str

Operation to apply on the document, given the page_indexes specified:

  • KEEP_ONLY - keep only the specified pages, and remove all others.

  • REMOVE - remove the specified pages, and keep all others.

page_indexes: Sequence[int]

Zero-based list of page indexes. A negative index can be used, indicating an offset from the end of the document.

[0, -1] represents the first and last pages of the document.
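To make the page_indexes, operation, and on_min_pages semantics concrete, here is a stdlib-only sketch that computes which zero-based pages would remain. It mirrors the descriptions above but is not Mindee's implementation: `select_pages` is hypothetical, and silently ignoring out-of-range indexes is an assumption (the library may raise instead).

```python
from typing import List, Sequence


def select_pages(
    page_count: int,
    page_indexes: Sequence[int],
    operation: str = "KEEP_ONLY",
    on_min_pages: int = 0,
) -> List[int]:
    """Return the zero-based page indexes left after applying the options."""
    all_pages = list(range(page_count))
    # Documents shorter than on_min_pages are left untouched.
    if page_count < on_min_pages:
        return all_pages
    # Normalize negative indexes (offsets from the end), skipping out-of-range ones.
    targets = {i % page_count for i in page_indexes if -page_count <= i < page_count}
    if operation == "KEEP_ONLY":
        return sorted(targets)
    if operation == "REMOVE":
        return [p for p in all_pages if p not in targets]
    raise ValueError(f"unknown operation: {operation!r}")


print(select_pages(5, [0, -1]))  # [0, 4]
print(select_pages(5, [0, -1], operation="REMOVE"))  # [1, 2, 3]
print(select_pages(2, [0], on_min_pages=3))  # [0, 1]
```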

class PathInput(filepath)

A local path input.

Parameters:

filepath (str | None)
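PathInput only needs a path; its filename and file_mimetype attributes presumably derive from it. The sketch below derives both with pathlib and mimetypes. Whether the library uses `mimetypes.guess_type` specifically is an assumption, and `describe_path` is a hypothetical helper.

```python
import mimetypes
from pathlib import Path
from typing import Optional, Tuple, Union


def describe_path(filepath: Union[str, Path]) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: derive (filename, guessed MIME type) from a path."""
    path = Path(filepath)
    # guess_type works on the extension only; it returns None when unknown.
    mimetype, _encoding = mimetypes.guess_type(path.name)
    return path.name, mimetype


print(describe_path("/tmp/scans/invoice.pdf"))  # ('invoice.pdf', 'application/pdf')
```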

apply_page_options(page_options)

Apply cut and merge options on multipage documents.

Return type:

None

Parameters:

page_options (PageOptions)

close()

Allow explicit closing for users not using a context manager.

compress(quality=85, max_width=None, max_height=None, force_source_text=False, disable_source_text=True)

Compresses the file object, either as a PDF or an image.

Parameters:
  • quality (int, default: 85) – Quality of the compression. For images, this is the JPEG quality. For PDFs, this affects image quality within the PDF.

  • max_width (int | None, default: None) – Maximum width for image resizing. Ignored for PDFs.

  • max_height (int | None, default: None) – Maximum height for image resizing. Ignored for PDFs.

  • force_source_text (bool, default: False) – For PDFs, whether to force compression even if source text is present.

  • disable_source_text (bool, default: True) – For PDFs, whether to disable source text during compression.

Return type:

None

fix_pdf(maximum_offset=500)

Fix a potentially broken PDF file.

WARNING: this method alters the data of the enqueued file by removing superfluous bytes that precede the PDF header.

Reads the bytes of the file until a valid PDF header tag is found or the maximum offset is reached. If such a tag is found, all bytes before it are deleted.

Parameters:

maximum_offset (int, default: 500) – Maximum byte offset up to which superfluous leading bytes are removed. Must not be negative.

Return type:

None

has_source_text()

If the file is a PDF, checks if it has source text.

Return type:

bool

Returns:

True if the file is a PDF and has source text. False otherwise.

is_pdf()

Check whether the file is a PDF.

Return type:

bool

Returns:

True if the file is a PDF.

is_pdf_empty()

Check if the PDF is empty.

Return type:

bool

Returns:

True if the PDF is empty.

merge_pdf_pages(page_numbers)

Create a new PDF from the given pages and set it as file_object.

Parameters:

page_numbers (set) – Set of page numbers from the original PDF to include in the new PDF.

Return type:

None

Returns:

None

process_pdf(behavior, on_min_pages, page_indexes)

Run any required processing on a PDF file.

Return type:

None

Parameters:
  • behavior (str)

  • on_min_pages (int)

  • page_indexes (Sequence[int])

read_contents(close_file)

Read the contents of the input file.

Parameters:

close_file (bool) – Whether to close the file after reading.

Return type:

tuple[str, bytes]

Returns:

A tuple containing the file name and the binary data.

file_mimetype: str
file_object: BinaryIO
filename: str
filepath: str | None
page_count: int
class URLInputSource(url)

A local or remote URL input.

Parameters:

url (str)

as_local_input_source(filename=None, username=None, password=None, token=None, headers=None, max_redirects=3)

Convert the URL content to a BytesInput object.

Parameters:
  • filename (str | None, default: None) – Optional filename for the BytesInput.

  • username (str | None, default: None) – Optional username for authentication.

  • password (str | None, default: None) – Optional password for authentication.

  • token (str | None, default: None) – Optional token for authentication.

  • headers (dict | None, default: None) – Optional additional headers for the request.

  • max_redirects (int, default: 3) – Maximum number of redirects to follow.

Return type:

BytesInput

Returns:

A BytesInput object containing the file content.
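as_local_input_source() accepts either username/password or a token, plus extra headers. The sketch below shows one plausible way to combine them into request headers (HTTP Basic for username/password, Bearer for a token). The precedence of the token over Basic credentials is an assumption, `build_auth_headers` is hypothetical, and the download itself (including redirect handling) is not shown.

```python
import base64
from typing import Dict, Optional


def build_auth_headers(
    username: Optional[str] = None,
    password: Optional[str] = None,
    token: Optional[str] = None,
    headers: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical: combine caller headers with Basic or Bearer credentials."""
    result = dict(headers or {})  # copy so the caller's dict is not mutated
    if token:
        result["Authorization"] = f"Bearer {token}"
    elif username is not None and password is not None:
        credentials = f"{username}:{password}".encode("utf-8")
        result["Authorization"] = "Basic " + base64.b64encode(credentials).decode("ascii")
    return result


print(build_auth_headers(username="user", password="pass"))
# {'Authorization': 'Basic dXNlcjpwYXNz'}
```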

save_to_file(filepath, filename=None, username=None, password=None, token=None, headers=None, max_redirects=3)

Save the content of the URL to a file.

Parameters:
  • filepath (Path | str) – Path to save the content to.

  • filename (str | None, default: None) – Optional filename to give to the file.

  • username (str | None, default: None) – Optional username for authentication.

  • password (str | None, default: None) – Optional password for authentication.

  • token (str | None, default: None) – Optional token for authentication.

  • headers (dict | None, default: None) – Optional additional headers for the request.

  • max_redirects (int, default: 3) – Maximum number of redirects to follow.

Return type:

Path

Returns:

The path to the saved file.
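save_to_file() returns the Path it wrote to. The sketch below covers only the path resolution, assuming filepath is a directory and that an omitted filename falls back to the last segment of the URL path. Both the fallback rule and `resolve_target_path` are hypothetical.

```python
from pathlib import Path
from typing import Optional, Union
from urllib.parse import urlparse


def resolve_target_path(url: str, filepath: Union[Path, str], filename: Optional[str] = None) -> Path:
    """Hypothetical: compute where downloaded content would be written."""
    if filename is None:
        # Fall back to the last segment of the URL path (query string ignored).
        filename = Path(urlparse(url).path).name or "downloaded_file"
    return Path(filepath) / filename


target = resolve_target_path("https://example.com/docs/invoice.pdf?v=2", "/tmp/out")
print(target.as_posix())  # /tmp/out/invoice.pdf
```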

url: str

The Uniform Resource Locator.