Input
- class Base64Input(base64_string, filename)
Base64-encoded text input.
- Parameters:
base64_string (str) –
filename (str) –
- class BytesInput(raw_bytes, filename)
Raw bytes input.
- Parameters:
raw_bytes (bytes) –
filename (str) –
- class FileInput(file)
A binary file input.
- Parameters:
file (BinaryIO) –
- class InputType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
The input type, for internal use.
- BASE64 = 'base64'
- BYTES = 'bytes'
- FILE = 'file'
- PATH = 'path'
- URL = 'url'
- class LocalInputSource(input_type)
Base class for all input sources coming from the local machine.
- Parameters:
input_type (InputType) –
- close()
Close the file object.
- Return type:
None
- count_doc_pages()
Count the pages in the PDF.
- Return type:
int
- Returns:
the number of pages.
- fix_pdf(maximum_offset=500)
Fix a potentially broken PDF file.
WARNING: this feature alters the data of the enqueued file by removing unnecessary headers.
Reads the bytes of a PDF file until a proper PDF tag is encountered, or until the maximum offset has been reached. If a tag denoting a PDF file is found, deletes all bytes before it.
- Parameters:
maximum_offset (int, default: 500) – maximum byte offset where superfluous headers will be removed. Cannot be less than 0.
- Return type:
None
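The header-stripping behavior described for fix_pdf can be illustrated with a minimal standalone sketch. It is not the library's implementation: the helper name `strip_leading_bytes` is hypothetical, and it assumes the "proper PDF tag" is the standard `%PDF-` marker.

```python
def strip_leading_bytes(data: bytes, maximum_offset: int = 500) -> bytes:
    """Drop any bytes that precede the first ``%PDF-`` marker.

    Hypothetical helper: the marker must start at or before
    ``maximum_offset``; otherwise the input is returned unchanged.
    """
    if maximum_offset < 0:
        raise ValueError("maximum_offset cannot be less than 0")
    marker = b"%PDF-"  # assumption: the tag that denotes a PDF file
    # bytes.find() only matches inside data[0:end], so extend the window
    # by the marker length to allow a marker starting at maximum_offset.
    position = data.find(marker, 0, maximum_offset + len(marker))
    if position <= 0:
        # 0: nothing to strip; -1: no marker found within the window.
        return data
    return data[position:]


cleaned = strip_leading_bytes(b"\x00junk%PDF-1.7\n...")
```

Because the leading bytes are discarded, the result differs from the original file, which is the reason for the warning above.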
- is_pdf()
- Return type:
bool
- Returns:
True if the file is a PDF.
- is_pdf_empty()
Check if the PDF is empty.
- Return type:
bool
- Returns:
True if the PDF is empty.
- merge_pdf_pages(page_numbers)
Create a new PDF from the given pages and set it to file_object.
- Parameters:
page_numbers (set) – Page numbers from the original PDF to merge into the new one.
- Return type:
None
- Returns:
None
- process_pdf(behavior, on_min_pages, page_indexes)
Run any required processing on a PDF file.
- Return type:
None
- Parameters:
behavior (str) –
on_min_pages (int) –
page_indexes (Sequence) –
- read_contents(close_file)
Read the contents of the input file.
- Parameters:
close_file (bool) – whether to close the file after reading
- Return type:
Tuple[str, bytes]
- Returns:
a Tuple with the file name and binary data
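As a rough illustration of the read_contents contract, the sketch below reads a binary file object and returns the file name together with its bytes. It is a standalone function, not the method itself, and rewinding the file before reading is an assumption.

```python
import io
from typing import BinaryIO, Tuple


def read_contents(file_object: BinaryIO, filename: str, close_file: bool) -> Tuple[str, bytes]:
    """Return (filename, data), optionally closing the file afterwards."""
    # Assumption: rewind first so the whole file is returned on every call.
    file_object.seek(0)
    data = file_object.read()
    if close_file:
        file_object.close()
    return filename, data


buffer = io.BytesIO(b"%PDF-1.7 example")
name, payload = read_contents(buffer, "example.pdf", close_file=False)
```

Passing `close_file=False` keeps the file object usable for later calls, such as a second read.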
- file_mimetype: str
- file_object: BinaryIO
- filename: str
- filepath: Optional[str]
- class LocalResponse(input_file)
Local response loaded from a file.
- Parameters:
input_file (Union[BinaryIO, str, Path, bytes]) –
- get_hmac_signature(secret_key)
Returns the hmac signature of the local response, from the secret key provided.
- Parameters:
secret_key (Union[str, bytes, bytearray]) – Secret key, either a string or a byte/byte array.
- Returns:
The hmac signature of the local response.
- is_valid_hmac_signature(secret_key, signature)
Checks if the hmac signature of the local response is valid.
- Parameters:
secret_key (Union[str, bytes, bytearray]) – Secret key, either a string or a byte/byte array.
signature (str) – HMAC signature, given as a string.
- Returns:
True if the HMAC signature is valid.
- property as_dict: Dict[str, Any]
Returns the dictionary representation of the file.
- Returns:
A json-like dictionary.
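The signature methods of LocalResponse can be approximated with the standard library, as in the sketch below. The hash algorithm and digest encoding are not stated in this reference, so SHA-256 with a hexadecimal digest is an assumption. The functions are standalone illustrations that take the raw payload explicitly, not the class methods themselves.

```python
import hashlib
import hmac
from typing import Union

SecretKey = Union[str, bytes, bytearray]


def get_hmac_signature(payload: bytes, secret_key: SecretKey) -> str:
    # hmac.new() does not accept str keys, so normalize the key to bytes.
    key = secret_key.encode("utf-8") if isinstance(secret_key, str) else bytes(secret_key)
    # Assumption: SHA-256 with a hexadecimal digest.
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def is_valid_hmac_signature(payload: bytes, secret_key: SecretKey, signature: str) -> bool:
    # compare_digest is designed to resist timing attacks.
    return hmac.compare_digest(get_hmac_signature(payload, secret_key), signature)


payload = b'{"document": "example"}'
signature = get_hmac_signature(payload, "my-secret")
```

A signature computed with one key type validates with the equivalent key in another type, since every key is converted to bytes before hashing.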
- class PageOptions(page_indexes, operation='KEEP_ONLY', on_min_pages=0)
Options to pass to the parse method for cutting multipage documents.
- Parameters:
page_indexes (Sequence[int]) –
operation (str) –
on_min_pages (int) –
- on_min_pages: int
Apply the operation only if the document has at least this many pages.
Default: 0 (apply on all documents)
- operation: str
Operation to apply on the document, given the page_indexes specified:
KEEP_ONLY - keep only the specified pages, and remove all others.
REMOVE - remove the specified pages, and keep all others.
- page_indexes: Sequence[int]
Zero-based list of page indexes. A negative index can be used, indicating an offset from the end of the document.
[0, -1] represents the first and last pages of the document.
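The interaction between page_indexes, operation, and on_min_pages can be sketched as a small standalone function. The name `select_pages` is hypothetical, and silently skipping out-of-range indexes is an assumption, not documented behavior.

```python
from typing import List, Sequence


def select_pages(page_count: int, page_indexes: Sequence[int],
                 operation: str = "KEEP_ONLY", on_min_pages: int = 0) -> List[int]:
    """Return the zero-based indexes of the pages left after the operation."""
    all_pages = list(range(page_count))
    # Documents with fewer than on_min_pages pages are left untouched.
    if page_count < on_min_pages:
        return all_pages
    # Resolve negative indexes from the end of the document.
    # Assumption: out-of-range and duplicate indexes are skipped.
    resolved: List[int] = []
    for index in page_indexes:
        actual = index + page_count if index < 0 else index
        if 0 <= actual < page_count and actual not in resolved:
            resolved.append(actual)
    if operation == "KEEP_ONLY":
        return resolved
    if operation == "REMOVE":
        return [page for page in all_pages if page not in resolved]
    raise ValueError(f"unknown operation: {operation}")


kept = select_pages(5, [0, -1])
remaining = select_pages(5, [0, -1], operation="REMOVE")
```

For a five-page document, `[0, -1]` resolves to pages 0 and 4, so KEEP_ONLY keeps those two pages and REMOVE leaves pages 1 to 3.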
- class PathInput(filepath)
A local path input.
- Parameters:
filepath (Optional[str]) –
- class UrlInputSource(url)
A local or remote URL input.
- Parameters:
url (str) –
- as_local_input_source(filename=None, username=None, password=None, token=None, headers=None, max_redirects=3)
Convert the URL content to a BytesInput object.
- Parameters:
filename (Optional[str], default: None) – Optional filename for the BytesInput.
username (Optional[str], default: None) – Optional username for authentication.
password (Optional[str], default: None) – Optional password for authentication.
token (Optional[str], default: None) – Optional token for authentication.
headers (Optional[dict], default: None) – Optional additional headers for the request.
max_redirects (int, default: 3) – Maximum number of redirects to follow.
- Return type:
BytesInput
- Returns:
A BytesInput object containing the file content.
- save_to_file(filepath, filename=None, username=None, password=None, token=None, headers=None, max_redirects=3)
Save the content of the URL to a file.
- Parameters:
filepath (Union[Path, str]) – Path to save the content to.
filename (Optional[str], default: None) – Optional filename to give to the file.
username (Optional[str], default: None) – Optional username for authentication.
password (Optional[str], default: None) – Optional password for authentication.
token (Optional[str], default: None) – Optional token for authentication.
headers (Optional[dict], default: None) – Optional additional headers for the request.
max_redirects (int, default: 3) – Maximum number of redirects to follow.
- Return type:
Path
- Returns:
The path to the saved file.
- url: str
The Uniform Resource Locator.
- class WorkflowOptions(alias=None, priority=None, full_text=False, public_url=None)
Options to pass to a workflow execution.
- Parameters:
alias (Optional[str]) –
priority (Optional[ExecutionPriority]) –
full_text (bool) –
public_url (Optional[str]) –
- alias: Optional[str]
Alias for the document.
- full_text: bool
Whether to include the full OCR text response in compatible APIs.
- priority: Optional[ExecutionPriority]
Priority of the document.
- public_url: Optional[str]
A unique, encrypted URL for accessing the document validation interface without requiring authentication.