Extraction Parameters
- class ExtractionParameters(model_id, alias=None, webhook_ids=None, polling_options=None, close_file=True, rag=None, raw_text=None, polygon=None, confidence=None, text_context=None, data_schema=None)
Inference parameters to set when sending a file.
- Parameters:
model_id (str)
alias (str | None)
webhook_ids (list[str] | None)
polling_options (PollingOptions | None)
close_file (bool)
rag (bool | None)
raw_text (bool | None)
polygon (bool | None)
confidence (bool | None)
text_context (str | None)
data_schema (DataSchema | str | dict | None)
- classmethod get_enqueue_slug()
Getter for the enqueue slug.
- Return type:
str
- get_form_data()
Return the parameters as a config dictionary.
- Return type:
dict[str, str | list[str]]
- Returns:
A dict of parameters.
- alias: str | None = None
Use an alias to link the file to your own DB. If empty, no alias will be used.
- close_file: bool = True
Whether to close the file after processing.
- confidence: bool | None = None
Boost the precision and accuracy of all extractions. Calculate confidence scores for all fields, and fill their confidence attribute.
- data_schema: DataSchema | str | dict | None = None
Dynamic changes to the data schema of the model for this inference. Not recommended; for specific use cases only.
- model_id: str
ID of the model, required.
- polling_options: PollingOptions | None = None
Options for polling. Set only if you are having timeout issues.
- polygon: bool | None = None
Calculate bounding box polygons for all fields, and fill their locations attribute.
- rag: bool | None = None
Enhance extraction accuracy with Retrieval-Augmented Generation.
- raw_text: bool | None = None
Extract the full text content from the document as strings, and fill the raw_text attribute.
- text_context: str | None = None
Additional text context used by the model during inference. Not recommended; for specific use cases only.
- webhook_ids: list[str] | None = None
IDs of webhooks to propagate the API response to.
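get_form_data() is documented as returning a flat dict[str, str | list[str]]. The following is a minimal, hypothetical sketch of how such a flattening could work. ExtractionParamsSketch is a stand-in mirroring a subset of the documented fields, not the library's class, and both the "true"/"false" encoding of booleans and the omission of unset (None) options are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Union

FormValue = Union[str, list[str]]


@dataclass
class ExtractionParamsSketch:
    """Hypothetical stand-in mirroring a subset of ExtractionParameters."""

    model_id: str
    alias: Optional[str] = None
    webhook_ids: Optional[list[str]] = None
    rag: Optional[bool] = None
    raw_text: Optional[bool] = None
    polygon: Optional[bool] = None
    confidence: Optional[bool] = None
    text_context: Optional[str] = None

    def get_form_data(self) -> dict[str, FormValue]:
        # model_id is required, so it is always included.
        data: dict[str, FormValue] = {"model_id": self.model_id}
        if self.alias:
            data["alias"] = self.alias
        if self.webhook_ids:
            data["webhook_ids"] = list(self.webhook_ids)
        # Assumption: optional booleans left as None are omitted entirely,
        # and set ones are sent as lowercase strings.
        for name in ("rag", "raw_text", "polygon", "confidence"):
            value = getattr(self, name)
            if value is not None:
                data[name] = "true" if value else "false"
        if self.text_context:
            data["text_context"] = self.text_context
        return data


params = ExtractionParamsSketch(
    model_id="my-model-id", alias="invoice-42", rag=True, confidence=False
)
print(params.get_form_data())
# → {'model_id': 'my-model-id', 'alias': 'invoice-42', 'rag': 'true', 'confidence': 'false'}
```

Omitting None rather than sending "false" keeps an explicit opt-out distinguishable from "not specified", which matters when a server-side default may differ from False.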
Data Schema
- class DataSchema(replace=None)
Modify the Data Schema.
- Parameters:
replace (DataSchemaReplace | dict | str | None)
- replace: DataSchemaReplace | dict | str | None = None
If set, completely replaces the data schema of the model.
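Because replace accepts several shapes, a caller may want to normalize it before inspecting it. This is a hypothetical helper, not part of the library; it assumes a str value is a JSON document describing the schema, and it leaves out the DataSchemaReplace case, which only the real library class can handle.

```python
import json
from typing import Any, Optional, Union


def normalize_replace(replace: Optional[Union[dict, str]]) -> Optional[dict[str, Any]]:
    """Hypothetical helper: coerce a DataSchema.replace value into a dict."""
    if replace is None:
        # Nothing set: the model's own data schema is left untouched.
        return None
    if isinstance(replace, str):
        # Assumption: a str value is a JSON object describing the schema.
        parsed = json.loads(replace)
        if not isinstance(parsed, dict):
            raise ValueError("replace must decode to a JSON object")
        return parsed
    if isinstance(replace, dict):
        return replace
    raise TypeError(f"unsupported replace type: {type(replace).__name__}")
```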
Data Schema Field
- class DataSchemaField(title, name, is_array, type, classification_values=None, unique_values=None, description=None, guidelines=None, nested_fields=None)
A field in the data schema.
- Parameters:
title (str)
name (str)
is_array (bool)
type (str)
classification_values (list[str] | None)
unique_values (bool | None)
description (str | None)
guidelines (str | None)
nested_fields (dict | None)
- classification_values: list[str] | None = None
Allowed values when type is classification. Leave empty for other types.
- description: str | None = None
Detailed description of what this field represents.
- guidelines: str | None = None
Optional extraction guidelines.
- is_array: bool
Whether this field can contain multiple values.
- name: str
Name of the field in the data schema.
- nested_fields: dict | None = None
Subfields when type is nested_object. Leave empty for other types.
- title: str
Display name for the field; it also affects inference results.
- type: str
Data type of the field.
- unique_values: bool | None = None
Whether to remove duplicate values in the array. Only applicable if is_array is True.
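The attribute notes above describe which options apply to which field types. The sketch below turns those notes into checks on a hypothetical stand-in dataclass, FieldSketch. Raising ValueError is an assumption made for illustration; the documentation only says to leave these options empty, and does not say the library rejects them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldSketch:
    """Hypothetical stand-in for DataSchemaField that enforces the documented rules."""

    title: str
    name: str
    is_array: bool
    type: str
    classification_values: Optional[list[str]] = None
    unique_values: Optional[bool] = None
    description: Optional[str] = None
    guidelines: Optional[str] = None
    nested_fields: Optional[dict] = None

    def __post_init__(self) -> None:
        # classification_values only apply when type is "classification".
        if self.classification_values and self.type != "classification":
            raise ValueError("classification_values requires type 'classification'")
        # nested_fields only apply when type is "nested_object".
        if self.nested_fields and self.type != "nested_object":
            raise ValueError("nested_fields requires type 'nested_object'")
        # unique_values deduplicates array values, so it needs is_array=True.
        if self.unique_values and not self.is_array:
            raise ValueError("unique_values is only applicable when is_array is True")
```

For example, FieldSketch(title="Tags", name="tags", is_array=False, type="string", unique_values=True) raises, while the same call with is_array=True succeeds. The "string" type name here is itself an assumption.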
Data Schema Replace
- class DataSchemaReplace(fields)
The structure to completely replace the data schema of the model.
- Parameters:
fields (list[DataSchemaField | dict])
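Since fields accepts dicts as well as DataSchemaField objects, a replacement schema can be written as plain data. The sketch below is illustrative only. The field names, the "string" type name, and the assumption that a dict passed as DataSchema(replace=...) mirrors DataSchemaReplace (a top-level fields list) are all hypothetical.

```python
import json

# Hypothetical replacement schema built from plain dicts. Keys follow the
# documented DataSchemaField parameters; the "string" type name is an assumption.
schema_replace = {
    "fields": [
        {
            "title": "Invoice Number",
            "name": "invoice_number",
            "is_array": False,
            "type": "string",
            "description": "Identifier printed on the invoice.",
        },
        {
            "title": "Document Type",
            "name": "document_type",
            "is_array": False,
            "type": "classification",
            "classification_values": ["invoice", "credit_note"],
            "guidelines": "Use credit_note only if the document says so.",
        },
    ]
}

# Sanity check: every field carries the four parameters that have no default.
REQUIRED_KEYS = {"title", "name", "is_array", "type"}
for schema_field in schema_replace["fields"]:
    missing = REQUIRED_KEYS - set(schema_field)
    if missing:
        raise ValueError(f"{schema_field.get('name')}: missing {sorted(missing)}")

# The same schema serialized as a JSON string.
payload = json.dumps(schema_replace)
```

Per the DataSchema signature, replace accepts either a dict or a str, so schema_replace or payload would be the candidate values here.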
String Data Class
- class StringDataClass
Base class for dataclasses that can be serialized to JSON.
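One plausible way to build such a base class is to override __str__ with a JSON dump of the dataclass fields. This is a hypothetical sketch; StringDataClassSketch and Example are illustrative names, and the library's actual implementation may differ.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class StringDataClassSketch:
    """Hypothetical base: subclasses render as JSON when converted to str."""

    def __str__(self) -> str:
        # asdict() recurses into nested dataclasses, lists and dicts.
        return json.dumps(asdict(self))


@dataclass
class Example(StringDataClassSketch):
    name: str
    is_array: bool


print(Example(name="total", is_array=False))
# → {"name": "total", "is_array": false}
```

Defining only __str__ leaves the generated __repr__ intact, so debugging output still shows the Python form while str() yields JSON.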