Extraction Parameters

class ExtractionParameters(model_id, alias=None, webhook_ids=None, polling_options=None, close_file=True, rag=None, raw_text=None, polygon=None, confidence=None, text_context=None, data_schema=None)

Inference parameters to set when sending a file.

Parameters:

model_id (str)
alias (str | None)
webhook_ids (list[str] | None)
polling_options (PollingOptions | None)
close_file (bool)
rag (bool | None)
raw_text (bool | None)
polygon (bool | None)
confidence (bool | None)
text_context (str | None)
data_schema (DataSchema | str | dict | None)

classmethod get_enqueue_slug()

Getter for the enqueue slug.

Return type: str

get_form_data()

Return the parameters as a config dictionary.

Return type: dict[str, str | list[str]]
Returns: A dict of parameters.
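This page does not document how get_form_data() encodes each value. The following is a rough sketch only. It assumes that unset (None) options are omitted, that booleans become the strings "true"/"false", that webhook_ids stays a list, and that a dict data_schema is JSON-encoded. ExtractionParamsSketch and to_form_data are hypothetical stand-ins that cover a subset of the attributes. They are not the SDK's own code.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional, Union


@dataclass
class ExtractionParamsSketch:
    """Hypothetical stand-in mirroring a subset of ExtractionParameters."""

    model_id: str
    alias: Optional[str] = None
    webhook_ids: Optional[list[str]] = None
    rag: Optional[bool] = None
    raw_text: Optional[bool] = None
    polygon: Optional[bool] = None
    confidence: Optional[bool] = None
    text_context: Optional[str] = None
    data_schema: Optional[Union[str, dict]] = None


def to_form_data(params: ExtractionParamsSketch) -> dict[str, Union[str, list[str]]]:
    """Flatten parameters into form fields (assumed encoding, see lead-in)."""
    form: dict[str, Union[str, list[str]]] = {}
    for f in fields(params):
        value = getattr(params, f.name)
        if value is None:
            continue  # assumption: unset options are left out entirely
        if isinstance(value, bool):
            form[f.name] = "true" if value else "false"
        elif isinstance(value, list):
            form[f.name] = [str(v) for v in value]
        elif isinstance(value, dict):
            form[f.name] = json.dumps(value)  # assumption: dict schema sent as JSON text
        else:
            form[f.name] = str(value)
    return form


params = ExtractionParamsSketch(model_id="my-model-id", rag=True, webhook_ids=["wh-1"])
print(to_form_data(params))
# {'model_id': 'my-model-id', 'webhook_ids': ['wh-1'], 'rag': 'true'}
```

Leaving unset options out, rather than sending empty strings, lets the server fall back to the model's own defaults. This behavior is an assumption and has not been confirmed for the real SDK.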

alias: str | None = None: Use an alias to link the file to your own DB. If empty, no alias will be used.

close_file: bool = True: Whether to close the file after product.

confidence: bool | None = None: Boost the precision and accuracy of all extractions. Calculate confidence scores for all fields, and fill their confidence attribute.

data_schema: DataSchema | str | dict | None = None: Dynamic changes to the data schema of the model for this inference. Not recommended, for specific use only.

model_id: str: ID of the model, required.

polling_options: PollingOptions | None = None: Options for polling. Set only if having timeout issues.

polygon: bool | None = None: Calculate bounding box polygons for all fields, and fill their locations attribute.

rag: bool | None = None: Enhance extraction accuracy with Retrieval-Augmented Generation.

raw_text: bool | None = None: Extract the full text content from the document as strings, and fill the raw_text attribute.

text_context: str | None = None: Additional text context used by the model during inference. Not recommended, for specific use only.

webhook_ids: list[str] | None = None: IDs of webhooks to propagate the API response to.

Data Schema

class DataSchema(replace=None)

Modify the Data Schema.

Parameters: replace (DataSchemaReplace | dict | str | None)

replace: DataSchemaReplace | dict | str | None = None: If set, completely replaces the data schema of the model.

Data Schema Field

class DataSchemaField(title, name, is_array, type, classification_values=None, unique_values=None, description=None, guidelines=None, nested_fields=None)

A field in the data schema.

Parameters:

title (str)
name (str)
is_array (bool)
type (str)
classification_values (list[str] | None)
unique_values (bool | None)
description (str | None)
guidelines (str | None)
nested_fields (dict | None)

classification_values: list[str] | None = None: Allowed values when type is classification. Leave empty for other types.

description: str | None = None: Detailed description of what this field represents.

guidelines: str | None = None: Optional extraction guidelines.

is_array: bool: Whether this field can contain multiple values.

name: str: Name of the field in the data schema.

nested_fields: dict | None = None: Subfields when type is nested_object. Leave empty for other types.

title: str: Display name for the field, also impacts inference results.

type: str: Data type of the field.

unique_values: bool | None = None: Whether to remove duplicate values in the array. Only applicable if is_array is True.

Data Schema Replace

class DataSchemaReplace(fields)

The structure to completely replace the data schema of the model.

Parameters: fields (list[DataSchemaField | dict])
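DataSchemaReplace accepts its fields either as DataSchemaField objects or as plain dicts. Below is a minimal sketch that normalizes such a mixed list into dicts. It uses a hypothetical FieldSketch dataclass that mirrors only part of DataSchemaField. The "number" and "date" type names are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Optional, Union


@dataclass
class FieldSketch:
    """Hypothetical stand-in for DataSchemaField (subset of its parameters)."""

    title: str
    name: str
    is_array: bool
    type: str
    description: Optional[str] = None


def normalize_fields(fields: list[Union[FieldSketch, dict]]) -> list[dict]:
    """Convert a mixed list of field objects and dicts into plain dicts."""
    normalized = []
    for field in fields:
        if isinstance(field, FieldSketch):
            # Drop unset optional attributes so both forms compare equal.
            normalized.append({k: v for k, v in asdict(field).items() if v is not None})
        else:
            normalized.append(dict(field))  # shallow copy of a dict-form field
    return normalized


mixed = [
    FieldSketch("Total", "total", False, "number"),
    {"title": "Date", "name": "date", "is_array": False, "type": "date"},
]
print(normalize_fields(mixed))
```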

String Data Class

class StringDataClass

Base class for dataclasses that can be serialized to JSON.
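This page describes StringDataClass only as a base class for JSON-serializable dataclasses. The sketch below shows one way such a base could work. It assumes that the base relies on dataclasses.asdict and json.dumps and drops unset (None) values. JsonDataclassSketch, SchemaFieldSketch and ReplaceSketch are hypothetical names, not the SDK's classes.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass
from typing import Optional


def _drop_none(items: list) -> dict:
    """dict_factory for asdict: skip attributes left as None."""
    return {key: value for key, value in items if value is not None}


class JsonDataclassSketch:
    """Hypothetical base class: @dataclass subclasses print as JSON."""

    def to_dict(self) -> dict:
        if not is_dataclass(self):
            raise TypeError(f"{type(self).__name__} must be a dataclass")
        # asdict recurses into nested dataclasses and lists, applying
        # _drop_none to every dataclass it converts.
        return asdict(self, dict_factory=_drop_none)

    def __str__(self) -> str:
        return json.dumps(self.to_dict())


@dataclass
class SchemaFieldSketch(JsonDataclassSketch):
    name: str
    type: str
    description: Optional[str] = None


@dataclass
class ReplaceSketch(JsonDataclassSketch):
    fields: list[SchemaFieldSketch]


print(ReplaceSketch(fields=[SchemaFieldSketch("total", "number")]))
# {"fields": [{"name": "total", "type": "number"}]}
```

Putting the serialization in a shared base class means every schema class gets the same JSON rendering without repeating it in each class.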