Input
- class Base64Input(base64_string, filename)
Base64-encoded text input.
- Parameters:
base64_string (str) –
filename (str) –
-
file_mimetype:
str
-
file_object:
BinaryIO
-
filename:
str
-
filepath:
Optional[str]
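As a rough illustration of how a base64 string can become a binary file object like `file_object`, here is a minimal sketch using only the standard library. The helper name `base64_to_file_object` is hypothetical and is not part of the documented API; the class's actual internals may differ.

```python
import base64
import io
from typing import BinaryIO


def base64_to_file_object(base64_string: str) -> BinaryIO:
    """Decode a base64 string into an in-memory binary stream.

    Hypothetical helper for illustration only.
    """
    raw_bytes = base64.standard_b64decode(base64_string)
    return io.BytesIO(raw_bytes)


# Round trip: encode some bytes, then decode them back into a stream.
encoded = base64.b64encode(b"%PDF-1.4 example").decode("ascii")
stream = base64_to_file_object(encoded)
print(stream.read())
```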
- class BytesInput(raw_bytes, filename)
Raw bytes input.
- Parameters:
raw_bytes (bytes) –
filename (str) –
-
file_mimetype:
str
-
file_object:
BinaryIO
-
filename:
str
-
filepath:
Optional[str]
- class FileInput(file)
A binary file input.
- Parameters:
file (BinaryIO) –
-
file_mimetype:
str
-
file_object:
BinaryIO
-
filename:
str
-
filepath:
Optional[str]
- class InputType(value)
The input type, for internal use.
- BASE64 = 'base64'
- BYTES = 'bytes'
- FILE = 'file'
- PATH = 'path'
- URL = 'url'
- class LocalInputSource(input_type)
Base class for all input sources coming from the local machine.
- Parameters:
input_type (InputType) –
- close()
Close the file object.
- Return type:
None
- count_doc_pages()
Count the pages in the PDF.
- Return type:
int
- Returns:
the number of pages.
- fix_pdf(maximum_offset=500)
Fix a potentially broken PDF file.
WARNING: this feature alters the data of the enqueued file by removing unnecessary headers.
Reads the bytes of a PDF file until a proper PDF tag is encountered, or until the maximum offset has been reached. If a tag denoting a PDF file is found, deletes all bytes before it.
- Parameters:
maximum_offset (int, default: 500) – maximum byte offset where superfluous headers will be removed. Cannot be less than 0.
- Return type:
None
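The header-stripping idea described for fix_pdf can be sketched as follows. This is not the library's implementation: the function name `strip_pdf_preamble` is hypothetical, and it assumes the "PDF tag" is the standard `%PDF-` file marker and that the marker must start at or before `maximum_offset`.

```python
PDF_MARKER = b"%PDF-"  # assumption: the tag searched for is the standard PDF header


def strip_pdf_preamble(data: bytes, maximum_offset: int = 500) -> bytes:
    """Drop any bytes that precede the PDF marker (hypothetical helper)."""
    if maximum_offset < 0:
        raise ValueError("maximum_offset cannot be less than 0")
    # Only accept a marker that starts at or before maximum_offset.
    position = data.find(PDF_MARKER, 0, maximum_offset + len(PDF_MARKER))
    if position == -1:
        # No marker within range: leave the data untouched.
        return data
    return data[position:]


print(strip_pdf_preamble(b"junk%PDF-1.7 body"))
```

Returning the data unchanged when no marker is found is a design choice of this sketch; a stricter variant could raise an error instead.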
- is_pdf()
- Return type:
bool
- Returns:
True if the file is a PDF.
- is_pdf_empty()
Check if the PDF is empty.
- Return type:
bool
- Returns:
True if the PDF is empty.
- merge_pdf_pages(page_numbers)
Create a new PDF from the given pages and set it as file_object.
- Parameters:
page_numbers (set) – Set of page numbers from the original PDF to use for merging.
- Return type:
None
- Returns:
None
- process_pdf(behavior, on_min_pages, page_indexes)
Run any required processing on a PDF file.
- Return type:
None
- Parameters:
behavior (str) –
on_min_pages (int) –
page_indexes (Sequence) –
- read_contents(close_file)
Read the contents of the input file.
- Parameters:
close_file (bool) – whether to close the file after reading
- Return type:
Tuple[str, bytes]
- Returns:
a tuple with the file name and the binary data
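The contract described for read_contents (return the file name with the raw bytes, optionally closing the file) might look like the following standalone sketch. It is written as a free function with hypothetical parameters `file_object` and `filename`, and rewinding the stream before reading is an assumption of the sketch rather than documented behavior.

```python
import io
from typing import BinaryIO, Tuple


def read_contents(file_object: BinaryIO, filename: str, close_file: bool) -> Tuple[str, bytes]:
    """Return (filename, data) and optionally close the stream (illustrative only)."""
    file_object.seek(0)  # assumption: always read from the start of the stream
    data = file_object.read()
    if close_file:
        file_object.close()
    return filename, data


stream = io.BytesIO(b"abc")
result = read_contents(stream, "sample.pdf", close_file=True)
print(result, stream.closed)
```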
-
file_mimetype:
str
-
file_object:
BinaryIO
-
filename:
str
-
filepath:
Optional[str]
- class LocalResponse(input_file)
Local response loaded from a file.
- Parameters:
input_file (Union[BinaryIO, str, Path, bytes]) –
- get_hmac_signature(secret_key)
Returns the HMAC signature of the local response, computed from the provided secret key.
- Parameters:
secret_key (Union[str, bytes, bytearray]) – Secret key, either a string or a bytes/bytearray object.
- Returns:
The HMAC signature of the local response.
- is_valid_hmac_signature(secret_key, signature)
Checks if the HMAC signature of the local response is valid.
- Parameters:
secret_key (Union[str, bytes, bytearray]) – Secret key, either a string or a bytes/bytearray object.
signature (str) – HMAC signature, given as a string.
- Returns:
True if the HMAC signature is valid.
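For readers unfamiliar with HMAC verification, the two methods above can be approximated with the standard library as below. This is a sketch under assumptions: the digest algorithm (SHA-256), the hexadecimal encoding of the signature, and the function names are not stated in this reference and are chosen for illustration only.

```python
import hashlib
import hmac
from typing import Union

SecretKey = Union[str, bytes, bytearray]


def compute_hmac_signature(payload: bytes, secret_key: SecretKey) -> str:
    """Compute a hex HMAC of the payload (assumes SHA-256; hypothetical helper)."""
    key = secret_key.encode("utf-8") if isinstance(secret_key, str) else bytes(secret_key)
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def is_valid_hmac_signature(payload: bytes, secret_key: SecretKey, signature: str) -> bool:
    """Compare the expected and received signatures in constant time."""
    expected = compute_hmac_signature(payload, secret_key)
    return hmac.compare_digest(expected, signature)


payload = b'{"document": "example"}'
signature = compute_hmac_signature(payload, "my-secret")
print(is_valid_hmac_signature(payload, "my-secret", signature))
```

Using `hmac.compare_digest` rather than `==` avoids leaking timing information when comparing signatures.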
- property as_dict: Dict[str, Any]
Returns the dictionary representation of the file.
- Returns:
A json-like dictionary.
- class PageOptions(page_indexes, operation='KEEP_ONLY', on_min_pages=0)
Options to pass to the parse method for cutting multipage documents.
- Parameters:
page_indexes (Sequence[int]) –
operation (str) –
on_min_pages (int) –
-
on_min_pages:
int
Apply the operation only if the document has at least this many pages.
Default: 0 (apply on all documents)
-
operation:
str
Operation to apply on the document, given the page_indexes specified:
KEEP_ONLY - keep only the specified pages, and remove all others.
REMOVE - remove the specified pages, and keep all others.
-
page_indexes:
Sequence[int]
Zero-based list of page indexes. A negative index can be used, indicating an offset from the end of the document.
[0, -1] represents the first and last pages of the document.
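To make the interaction of page_indexes, operation and on_min_pages concrete, here is a small sketch of how the resulting page selection could be computed. The function `select_pages` is hypothetical, and silently ignoring out-of-range or duplicate indexes is an assumption of this sketch, not documented behavior.

```python
from typing import List, Sequence


def select_pages(
    page_count: int,
    page_indexes: Sequence[int],
    operation: str = "KEEP_ONLY",
    on_min_pages: int = 0,
) -> List[int]:
    """Return the zero-based indexes of the pages that remain (illustrative only)."""
    all_pages = list(range(page_count))
    if page_count < on_min_pages:
        # Document is too short: the operation is not applied.
        return all_pages
    resolved: List[int] = []
    for index in page_indexes:
        # Negative indexes count from the end of the document.
        actual = index + page_count if index < 0 else index
        if 0 <= actual < page_count and actual not in resolved:
            resolved.append(actual)
    if operation == "KEEP_ONLY":
        return resolved
    if operation == "REMOVE":
        return [page for page in all_pages if page not in resolved]
    raise ValueError(f"Unknown operation: {operation}")


print(select_pages(5, [0, -1]))            # [0, 4]
print(select_pages(5, [0, -1], "REMOVE"))  # [1, 2, 3]
```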