V1 PDF

Multi-Receipts Extractor

extract_receipts(input_source, inference)

Extracts individual receipts from multi-receipts documents.

Parameters:

input_source (LocalInputSource) – Local Input Source to extract sub-receipts from.
inference (Inference) – Results of the inference.

Return type:

list[ExtractedImage]

Returns:

Individual extracted receipts as a list of ExtractedMultiReceiptsImage.

class PDFExtractor(local_input)

V1-specific PDF extractor.

cut_pages(page_indexes)

Create a new PDF from pages and save it into a buffer.

Parameters: page_indexes (list) – List of page numbers from the original PDF to merge into the new PDF.
Return type: BinaryIO
Returns: The buffer containing the new PDF.

extract_documents(page_indexes)

Extracts complete PDFs from the document.

Parameters: page_indexes (list[list[int]]) – List of sub-lists of pages to keep.
Return type: list[ExtractedPDF]
Returns: A list of extracted PDFs.
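
As a rough illustration of the list[list[int]] shape that page_indexes takes here, the sketch below splits a page range into fixed-size groups. The helper chunk_pages is hypothetical and not part of the library, and zero-based page indexes are an assumption.

```python
def chunk_pages(page_count: int, group_size: int) -> list[list[int]]:
    """Split page indexes 0..page_count-1 into consecutive groups.

    Hypothetical helper for illustration only; assumes zero-based indexes.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    pages = list(range(page_count))
    # Slice the flat list of indexes into sub-lists of at most group_size pages.
    return [pages[i:i + group_size] for i in range(0, page_count, group_size)]


groups = chunk_pages(5, 2)
print(groups)  # [[0, 1], [2, 3], [4]]
```

Each inner list would then describe one output document, in the order given.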

extract_invoices(page_indexes, strict=False)

Extracts invoices from the document as complete PDFs, given either a list of pages or a list of page groups.

Parameters:

page_indexes (list[InvoiceSplitterV1InvoicePageGroup | list[int]]) – List of page groups or sub-lists of pages to keep.
strict (bool, default: False) – Whether to trust confidence scores of 0.5 and above.

Return type:

list[ExtractedPDF]

Returns:

A list of extracted invoices.
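
The description above does not spell out how strict applies the 0.5 threshold. One plausible reading, sketched below purely as an assumption, is that a page group with a confidence below 0.5 is not trusted as its own invoice and is merged into the preceding group. PageGroup and merge_low_confidence are hypothetical stand-ins, not library types, and the library's actual behavior may differ.

```python
from dataclasses import dataclass


@dataclass
class PageGroup:
    """Hypothetical stand-in for a page group carrying a confidence score."""
    page_indexes: list[int]
    confidence: float


def merge_low_confidence(groups: list[PageGroup], threshold: float = 0.5) -> list[list[int]]:
    """Merge groups scoring below the threshold into the previous group.

    Assumed behavior for illustration only. A score equal to the
    threshold counts as trusted.
    """
    merged: list[list[int]] = []
    for group in groups:
        if group.confidence >= threshold or not merged:
            # Trusted group (or nothing to merge into yet): start a new document.
            merged.append(list(group.page_indexes))
        else:
            # Untrusted group: attach its pages to the previous document.
            merged[-1].extend(group.page_indexes)
    return merged


example = [PageGroup([0, 1], 0.9), PageGroup([2], 0.3), PageGroup([3, 4], 0.5)]
print(merge_low_confidence(example))  # [[0, 1, 2], [3, 4]]
```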

extract_sub_documents(page_indexes)

Extract the sub-documents from the main PDF, based on the given list of page indexes.

Parameters: page_indexes (list[list[int]]) – List of lists of numbers, representing page indexes.
Return type: list[ExtractedPDF]
Returns: A list of created PDFs.

get_page_count()

Get the number of pages in the PDF file.
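
The page count can be used to check page indexes before passing them to the extraction methods. The sketch below assumes zero-based indexes. find_invalid_pages is a hypothetical helper, and page_count stands in for the value returned by get_page_count().

```python
def find_invalid_pages(page_groups: list[list[int]], page_count: int) -> list[int]:
    """Return the sorted, de-duplicated indexes outside 0..page_count-1.

    Hypothetical helper for illustration only; assumes zero-based indexes.
    """
    return sorted(
        {page for group in page_groups for page in group if not 0 <= page < page_count}
    )


page_count = 4  # stand-in for the value returned by get_page_count()
print(find_invalid_pages([[0, 1], [3, 4], [-1]], page_count))  # [-1, 4]
```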