Mindee Client

Client

class Client(api_key='')

Mindee API Client.

See: https://developers.mindee.com/docs/

Parameters:

api_key (str, default: '') – Your Mindee API key.

create_endpoint(endpoint_name, account_name='mindee', version=None)

Add a custom endpoint, created using the Mindee API Builder.

Parameters:
  • endpoint_name (str) – The “API name” field in the “Settings” page of the API Builder

  • account_name (str, default: 'mindee') – Your organization’s username on the API Builder

  • version (Optional[str], default: None) – If set, locks the version of the model to use. If not set, the latest version of the model is used.

Return type:

Endpoint
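As a minimal sketch of how a custom endpoint might be created and then used for a prediction: the helper name parse_custom is hypothetical, and client is assumed to be a Client instance (any object exposing create_endpoint() and parse() with the signatures documented on this page works). The product class to pair with a custom endpoint is left as a parameter because it depends on your installed version.

```python
def parse_custom(client, product_class, input_source,
                 endpoint_name, account_name, version=None):
    """Create a custom endpoint and parse a document with it.

    Hypothetical helper: only create_endpoint() and parse() come from
    the documented Client API.
    """
    if not endpoint_name:
        raise ValueError("endpoint_name must not be empty")
    # version=None leaves the model version unlocked (latest version);
    # passing a string locks the model to that version.
    endpoint = client.create_endpoint(
        endpoint_name=endpoint_name,
        account_name=account_name,
        version=version,
    )
    # Custom endpoints have to be passed explicitly to the prediction call.
    return client.parse(product_class, input_source, endpoint=endpoint)
```

The same endpoint object can also be passed to enqueue(), enqueue_and_parse(), parse_queued() and send_feedback() through their endpoint parameter.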

enqueue(product_class, input_source, include_words=False, close_file=True, page_options=None, cropper=False, endpoint=None, full_text=False)

Enqueues a document to an asynchronous endpoint.

Parameters:
  • product_class (Type[Inference]) – The document class to use. The response object will be instantiated based on this parameter.

  • input_source (Union[LocalInputSource, UrlInputSource]) – The document/source file to use. Has to be created beforehand.

  • include_words (bool, default: False) – Whether to include the full text for each page. This performs a full OCR operation on the server and will increase response time.

  • close_file (bool, default: True) – Whether to close() the file after parsing it. Set to False if you need to access the file after this operation.

  • page_options (Optional[PageOptions], default: None) – If set, remove pages from the document as specified. This is done before sending the file to the server. It is useful to avoid page limitations.

  • cropper (bool, default: False) – Whether to include cropper results for each page. This performs a cropping operation on the server and will increase response time.

  • endpoint (Optional[Endpoint], default: None) – For custom endpoints, an endpoint has to be given.

  • full_text (bool, default: False) – Whether to include the full OCR text response in compatible APIs.

Return type:

AsyncPredictResponse

enqueue_and_parse(product_class, input_source, include_words=False, close_file=True, page_options=None, cropper=False, endpoint=None, initial_delay_sec=2, delay_sec=1.5, max_retries=30, full_text=False)

Enqueues to an asynchronous endpoint and automatically polls for a response.

Parameters:
  • product_class (Type[Inference]) – The document class to use. The response object will be instantiated based on this parameter.

  • input_source (Union[LocalInputSource, UrlInputSource]) – The document/source file to use. Has to be created beforehand.

  • include_words (bool, default: False) – Whether to include the full text for each page. This performs a full OCR operation on the server and will increase response time.

  • close_file (bool, default: True) – Whether to close() the file after parsing it. Set to False if you need to access the file after this operation.

  • page_options (Optional[PageOptions], default: None) – If set, remove pages from the document as specified. This is done before sending the file to the server. It is useful to avoid page limitations.

  • cropper (bool, default: False) – Whether to include cropper results for each page. This performs a cropping operation on the server and will increase response time.

  • endpoint (Optional[Endpoint], default: None) – For custom endpoints, an endpoint has to be given.

  • initial_delay_sec (float, default: 2) – Delay before the first polling attempt, in seconds. This should not be shorter than 1 second.

  • delay_sec (float, default: 1.5) – Delay between polling attempts, in seconds. This should not be shorter than 1 second.

  • max_retries (int, default: 30) – Maximum number of polling attempts.

  • full_text (bool, default: False) – Whether to include the full OCR text response in compatible APIs.

Return type:

AsyncPredictResponse
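To illustrate how the three polling parameters interact, here is a generic polling loop. It is not the library's implementation: fetch is a hypothetical callable that returns None while the job is still processing, and the sketch assumes initial_delay_sec is applied once before the first attempt and delay_sec between consecutive attempts.

```python
import time


def poll_until_done(fetch, initial_delay_sec=2.0, delay_sec=1.5,
                    max_retries=30, sleep=time.sleep):
    """Generic polling loop (illustrative only).

    `fetch` is a hypothetical callable returning a result, or None while
    the job is still processing. `sleep` is injectable for testing.
    """
    if initial_delay_sec < 1 or delay_sec < 1:
        raise ValueError("delays should not be shorter than 1 second")
    sleep(initial_delay_sec)  # wait once before the first attempt
    for attempt in range(1, max_retries + 1):
        result = fetch()
        if result is not None:
            return result
        if attempt < max_retries:
            sleep(delay_sec)  # wait between consecutive attempts
    raise TimeoutError(f"no result after {max_retries} polling attempts")
```

Under these assumptions, the default values give a worst-case sleep time of 2 + 29 × 1.5 = 45.5 seconds, not counting the time spent in the requests themselves.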

execute_workflow(input_source, workflow_id, options=None, page_options=None)

Send the document to a workflow execution.

Parameters:
  • input_source (Union[LocalInputSource, UrlInputSource]) – The document/source file to use. Has to be created beforehand.

  • workflow_id (str) – ID of the workflow.

  • page_options (Optional[PageOptions], default: None) – If set, remove pages from the document as specified. This is done before sending the file to the server. It is useful to avoid page limitations.

  • options (Optional[WorkflowOptions], default: None) – Options for the workflow.

Return type:

WorkflowResponse

load_prediction(product_class, local_response)

Load a prediction.

Parameters:
  • product_class (Type[Inference]) – Class of the product to use.

  • local_response (LocalResponse) – Local response to load.

Return type:

Union[AsyncPredictResponse, PredictResponse]

Returns:

A valid prediction.

parse(product_class, input_source, include_words=False, close_file=True, page_options=None, cropper=False, endpoint=None, full_text=False)

Call prediction API on the document and parse the results.

Parameters:
  • product_class (Type[Inference]) – The document class to use. The response object will be instantiated based on this parameter.

  • input_source (Union[LocalInputSource, UrlInputSource]) – The document/source file to use. Has to be created beforehand.

  • include_words (bool, default: False) – Whether to include the full text for each page. This performs a full OCR operation on the server and will increase response time. Only available on financial document APIs.

  • close_file (bool, default: True) – Whether to close() the file after parsing it. Set to False if you need to access the file after this operation.

  • page_options (Optional[PageOptions], default: None) – If set, remove pages from the document as specified. This is done before sending the file to the server. It is useful to avoid page limitations.

  • cropper (bool, default: False) – Whether to include cropper results for each page. This performs a cropping operation on the server and will increase response time.

  • endpoint (Optional[Endpoint], default: None) – For custom endpoints, an endpoint has to be given.

  • full_text (bool, default: False) – Whether to include the full OCR text response in compatible APIs.

Return type:

PredictResponse
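A synchronous prediction might look like the sketch below. The helper names are hypothetical. parse_invoice_example assumes the installed mindee package exposes Client and product.InvoiceV4 at the top level and that the response carries a document attribute; check these against your version.

```python
def parse_document(client, product_class, path, **options):
    """Load a local file and run a synchronous prediction (hypothetical helper)."""
    input_source = client.source_from_path(path)
    # Extra keyword options (include_words, cropper, full_text, ...) are
    # forwarded unchanged to Client.parse().
    return client.parse(product_class, input_source, **options)


def parse_invoice_example(api_key, path):
    """Real-library usage; needs the mindee package and network access."""
    # Imported lazily so that defining this function has no side effects.
    from mindee import Client, product

    client = Client(api_key=api_key)
    # Assumption: product.InvoiceV4 is available in the installed version.
    response = parse_document(client, product.InvoiceV4, path)
    return response.document
```

Keeping the client as an argument of parse_document makes it easy to reuse one Client across many files and to substitute a stand-in object in tests.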

parse_queued(product_class, queue_id, endpoint=None)

Parses a queued document.

Parameters:
  • product_class (Type[Inference]) – The document class to use. The response object will be instantiated based on this parameter.

  • queue_id (str) – queue_id received from the API.

  • endpoint (Optional[Endpoint], default: None) – For custom endpoints, an endpoint has to be given.

Return type:

AsyncPredictResponse
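When more control is needed than enqueue_and_parse() offers, enqueue() and parse_queued() can be combined by hand. The sketch below uses hypothetical helper names and assumes that the AsyncPredictResponse exposes the queue ID as job.id, the processing state as job.status (with the value "completed" when finished), and the result as document.

```python
def enqueue_document(client, product_class, input_source):
    """Enqueue a document and return the queue ID to poll later."""
    queued = client.enqueue(product_class, input_source)
    # Assumption: the queue ID is exposed as job.id on the response.
    return queued.job.id


def fetch_queued(client, product_class, queue_id):
    """Ask once for the result; return the document, or None if not ready."""
    response = client.parse_queued(product_class, queue_id)
    # Assumption: job.status reads "completed" once the document is available.
    if response.job.status == "completed":
        return response.document
    return None
```

fetch_queued performs a single request, so the caller decides how long to wait between calls and when to give up.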

send_feedback(product_class, document_id, feedback, endpoint=None)

Send feedback for a document.

Parameters:
  • product_class (Type[Inference]) – The document class to use. The response object will be instantiated based on this parameter.

  • document_id (str) – The ID of the document to send feedback for.

  • feedback (Dict[str, Any]) – Feedback to send.

  • endpoint (Optional[Endpoint], default: None) – For custom endpoints, an endpoint has to be given.

Return type:

FeedbackResponse

source_from_b64string(input_string, filename, fix_pdf=False)

Load a document from a base64 encoded string.

Parameters:
  • input_string (str) – Input to parse as base64 string

  • filename (str) – The name of the file (without the path)

  • fix_pdf (bool, default: False) – Whether to attempt fixing PDF files before sending. Setting this to True can modify the data sent to Mindee.

Return type:

Base64Input

source_from_bytes(input_bytes, filename, fix_pdf=False)

Load a document from raw bytes.

Parameters:
  • input_bytes (bytes) – Raw byte input

  • filename (str) – The name of the file (without the path)

  • fix_pdf (bool, default: False) – Whether to attempt fixing PDF files before sending. Setting this to True can modify the data sent to Mindee.

Return type:

BytesInput
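The in-memory loaders, source_from_b64string() and source_from_bytes(), can be fed from data that is already in memory. In this sketch the helper names are hypothetical; only the two Client methods come from this page. The filename is reduced to its last component because the parameter expects a name without the path.

```python
import base64
from pathlib import Path


def to_bytes_source(client, data, filename):
    """Wrap raw bytes with source_from_bytes()."""
    # Strip any directory part: the filename must not include the path.
    return client.source_from_bytes(data, Path(filename).name)


def to_b64_source(client, data, filename):
    """Base64-encode raw bytes and wrap them with source_from_b64string()."""
    # b64encode returns bytes; the loader expects a str.
    encoded = base64.b64encode(data).decode("ascii")
    return client.source_from_b64string(encoded, Path(filename).name)
```

For a file on disk, Path(path).read_bytes() would supply the data argument, although source_from_path() is the more direct choice in that case.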

source_from_file(input_file, fix_pdf=False)

Load a document from a normal Python file object/handle.

Parameters:
  • input_file (BinaryIO) – Input file handle

  • fix_pdf (bool, default: False) – Whether to attempt fixing PDF files before sending. Setting this to True can modify the data sent to Mindee.

Return type:

FileInput

source_from_path(input_path, fix_pdf=False)

Load a document from an absolute path, given as a string or a Path object.

Parameters:
  • input_path (Union[Path, str]) – Path of file to open

  • fix_pdf (bool, default: False) – Whether to attempt fixing PDF files before sending. Setting this to True can modify the data sent to Mindee.

Return type:

PathInput

source_from_url(url)

Load a document from a URL.

Parameters:

url (str) – URL of the document to load

Return type:

UrlInputSource
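A URL can be checked locally before it is handed to source_from_url(), which avoids a failed request later. The helper name is hypothetical, and the HTTPS-only rule is an assumption about what the service accepts; relax the check if your installed version documents otherwise.

```python
from urllib.parse import urlparse


def source_from_checked_url(client, url):
    """Validate a URL locally before calling source_from_url()."""
    parsed = urlparse(url)
    # Assumption: only absolute HTTPS URLs are accepted.
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"expected an absolute https:// URL, got {url!r}")
    return client.source_from_url(url)
```

The returned UrlInputSource can then be passed as input_source to parse(), enqueue() or enqueue_and_parse().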