Extraction Parameters

class ExtractionParameters(model_id, alias=None, webhook_ids=None, polling_options=None, close_file=True, rag=None, raw_text=None, polygon=None, confidence=None, text_context=None, data_schema=None)

Inference parameters to set when sending a file.

Parameters:
  • model_id (str)

  • alias (str | None)

  • webhook_ids (list[str] | None)

  • polling_options (PollingOptions | None)

  • close_file (bool)

  • rag (bool | None)

  • raw_text (bool | None)

  • polygon (bool | None)

  • confidence (bool | None)

  • text_context (str | None)

  • data_schema (DataSchema | str | dict | None)

classmethod get_enqueue_slug()

Getter for the enqueue slug.

Return type:

str

get_form_data()

Return the parameters as a config dictionary.

Return type:

dict[str, str | list[str]]

Returns:

A dict of parameters.
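
The return type above suggests that every option is flattened to either a string or a list of strings. As a rough illustration only, the sketch below shows one way such a flattening could work. The function to_form_data, its key names, and the lowercase-string encoding of booleans are all assumptions made for this example; they are not taken from the library and the real get_form_data() may behave differently.

```python
from typing import Dict, List, Optional, Union

FormData = Dict[str, Union[str, List[str]]]


def to_form_data(
    model_id: str,
    alias: Optional[str] = None,
    webhook_ids: Optional[List[str]] = None,
    raw_text: Optional[bool] = None,
) -> FormData:
    """Hypothetical flattening; the real get_form_data() may differ."""
    data: FormData = {"model_id": model_id}
    if alias:
        # An empty or missing alias is simply left out.
        data["alias"] = alias
    if webhook_ids:
        data["webhook_ids"] = list(webhook_ids)
    if raw_text is not None:
        # Assumption: booleans are sent as lowercase strings.
        data["raw_text"] = str(raw_text).lower()
    return data


# "MY_MODEL_ID", "wh-1" and "wh-2" are placeholder values.
form = to_form_data("MY_MODEL_ID", webhook_ids=["wh-1", "wh-2"], raw_text=True)
print(form)
```

Options left at None are omitted from the result, so only explicitly set values would be sent.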

alias: str | None = None

Use an alias to link the file to your own DB. If empty, no alias will be used.

close_file: bool = True

Whether to close the file after it has been processed.

confidence: bool | None = None

Boost the precision and accuracy of all extractions. Calculate confidence scores for all fields, and fill their confidence attribute.

data_schema: DataSchema | str | dict | None = None

Dynamic changes to the data schema of the model for this inference. Not recommended; intended for specific use cases only.

model_id: str

ID of the model, required.

polling_options: PollingOptions | None = None

Options for polling. Set only if you are experiencing timeout issues.

polygon: bool | None = None

Calculate bounding box polygons for all fields, and fill their locations attribute.

rag: bool | None = None

Enhance extraction accuracy with Retrieval-Augmented Generation.

raw_text: bool | None = None

Extract the full text content from the document as strings, and fill the raw_text attribute.

text_context: str | None = None

Additional text context used by the model during inference. Not recommended; intended for specific use cases only.

webhook_ids: list[str] | None = None

IDs of webhooks to propagate the API response to.
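
A minimal sketch of how these options fit together. Because the import path is not shown on this page, the example defines a local stand-in dataclass with the same field names and defaults as the documented signature (polling_options and data_schema are left out for brevity). ExtractionParametersSketch is not the library's class, and the model ID and alias are placeholder values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExtractionParametersSketch:
    """Local stand-in mirroring the documented fields; not the library class."""

    model_id: str
    alias: Optional[str] = None
    webhook_ids: Optional[List[str]] = None
    close_file: bool = True
    rag: Optional[bool] = None
    raw_text: Optional[bool] = None
    polygon: Optional[bool] = None
    confidence: Optional[bool] = None
    text_context: Optional[str] = None


params = ExtractionParametersSketch(
    model_id="MY_MODEL_ID",  # placeholder, required
    alias="invoice-2024-0001",  # hypothetical key linking to your own DB
    raw_text=True,  # request the full text content
    polygon=True,  # request bounding box polygons
)

# Only model_id is required; every other option keeps its default.
explicitly_set = {k: v for k, v in vars(params).items() if v is not None}
print(explicitly_set)
```

Leaving an optional flag at None, rather than setting it to False, keeps the distinction between "not specified" and "explicitly disabled".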

Data Schema

class DataSchema(replace=None)

Modify the Data Schema.

Parameters:
  • replace (DataSchemaReplace | dict | str | None)

replace: DataSchemaReplace | dict | str | None = None

If set, completely replaces the data schema of the model.

Data Schema Field

class DataSchemaField(title, name, is_array, type, classification_values=None, unique_values=None, description=None, guidelines=None, nested_fields=None)

A field in the data schema.

Parameters:
  • title (str)

  • name (str)

  • is_array (bool)

  • type (str)

  • classification_values (list[str] | None)

  • unique_values (bool | None)

  • description (str | None)

  • guidelines (str | None)

  • nested_fields (dict | None)

classification_values: list[str] | None = None

Allowed values when type is classification. Leave empty for other types.

description: str | None = None

Detailed description of what this field represents.

guidelines: str | None = None

Optional extraction guidelines.

is_array: bool

Whether this field can contain multiple values.

name: str

Name of the field in the data schema.

nested_fields: dict | None = None

Subfields when type is nested_object. Leave empty for other types.

title: str

Display name for the field. It also affects inference results.

type: str

Data type of the field.

unique_values: bool | None = None

Whether to remove duplicate values in the array. Only applicable if is_array is True.
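
Several of the attributes above only make sense in combination: classification_values with type classification, nested_fields with type nested_object, and unique_values with is_array set to True. The helper below is a hypothetical pre-flight check written for this example; check_field is not part of the library, and the type name "string" is an assumption since this page only names classification and nested_object.

```python
from typing import List, Optional


def check_field(
    field_type: str,
    is_array: bool,
    classification_values: Optional[List[str]] = None,
    unique_values: Optional[bool] = None,
    nested_fields: Optional[dict] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the options are consistent."""
    problems = []
    if classification_values and field_type != "classification":
        problems.append("classification_values requires type 'classification'")
    if nested_fields and field_type != "nested_object":
        problems.append("nested_fields requires type 'nested_object'")
    if unique_values and not is_array:
        problems.append("unique_values requires is_array=True")
    return problems


# Consistent: allowed values supplied for a classification field.
ok = check_field("classification", False, classification_values=["invoice", "receipt"])

# Inconsistent: de-duplication requested on a field that is not an array.
bad = check_field("string", False, unique_values=True)  # "string" is an assumed type name

print(ok)
print(bad)
```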

Data Schema Replace

class DataSchemaReplace(fields)

The structure to completely replace the data schema of the model.

Parameters:
  • fields (list[DataSchemaField | dict])
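
Since fields accepts plain dicts and data_schema accepts a dict or a string, a full replacement can be sketched without importing any class. The example below assumes that the dict keys mirror the attribute names documented for DataSchemaField, and that the string form is the JSON encoding of the same structure; neither assumption is confirmed on this page. The field names and the type name "number" are illustrative.

```python
import json

# Each dict uses the attribute names documented for DataSchemaField.
total_field = {
    "title": "Total Amount",
    "name": "total_amount",
    "is_array": False,
    "type": "number",  # assumed type name
    "description": "Total amount due, taxes included.",
}
category_field = {
    "title": "Category",
    "name": "category",
    "is_array": False,
    "type": "classification",
    "classification_values": ["invoice", "receipt"],
}

# Mirrors DataSchema(replace=DataSchemaReplace(fields=[...])) as nested dicts.
data_schema = {"replace": {"fields": [total_field, category_field]}}

# Assumed string form: the same structure encoded as JSON.
data_schema_json = json.dumps(data_schema)
print(data_schema_json)
```

Either value could then be passed as the data_schema option, keeping in mind that a replacement overrides the model's whole schema for that inference.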

String Data Class

class StringDataClass

Base class for dataclasses that can be serialized to JSON.
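
To illustrate the general idea of a dataclass base that serializes to JSON, here is a small sketch built on the standard library. JsonSerializable, its to_json method, and FieldSketch are hypothetical names invented for this example; the actual methods exposed by StringDataClass are not listed on this page.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class JsonSerializable:
    """Hypothetical stand-in for a JSON-serializable dataclass base."""

    def to_json(self) -> str:
        # Drop None values so unset optional fields are left out.
        payload = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(payload)


@dataclass
class FieldSketch(JsonSerializable):
    """Illustrative subclass with two required fields and one optional field."""

    name: str
    title: str
    description: Optional[str] = None


serialized = FieldSketch(name="total_amount", title="Total Amount").to_json()
print(serialized)
```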