V1 PDF
Multi-Receipts Extractor
- extract_receipts(input_source, inference)
Extracts individual receipts from multi-receipts documents.
- Parameters:
input_source (LocalInputSource) – Local input source to extract sub-receipts from.
inference (Inference) – Results of the inference.
- Return type:
list[ExtractedImage]
- Returns:
Individual extracted receipts as an array of ExtractedMultiReceiptsImage.
PDF Extractor
- class PDFExtractor(local_input)
V1-specific PDF extractor.
- Parameters:
local_input (LocalInputSource)
- cut_pages(page_indexes)
Create a new PDF from pages and save it into a buffer.
- Parameters:
page_indexes (list) – List of page numbers from the original PDF to merge into the new PDF.
- Return type:
BinaryIO
- Returns:
The buffer containing the new PDF.
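Since cut_pages returns a BinaryIO buffer rather than a file on disk, the caller decides how to persist it. The following is a minimal sketch of one way to do that, using only the standard library. The helper name save_buffer is hypothetical and not part of the library, and rewinding the buffer before reading is a defensive assumption, since the position of the returned buffer is not documented here.

```python
from pathlib import Path
from typing import BinaryIO


def save_buffer(buffer: BinaryIO, destination: Path) -> int:
    """Write a binary buffer to disk and return the number of bytes written.

    Hypothetical helper: it works with any seekable binary buffer, such as
    the one described as the return value of cut_pages.
    """
    # Assumption: the buffer may not be positioned at its start, so rewind it.
    buffer.seek(0)
    data = buffer.read()
    destination.write_bytes(data)
    return len(data)
```

A call such as save_buffer(buffer, Path("subset.pdf")) would then store the new PDF under a file name of the caller's choosing.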
- extract_documents(page_indexes)
Extracts complete PDFs from the document.
- Parameters:
page_indexes (list[list[int]]) – List of sub-lists of pages to keep.
- Return type:
list[ExtractedPDF]
- Returns:
A list of extracted invoices.
- extract_invoices(page_indexes, strict=False)
Extracts invoices as complete PDFs from the document, using either a list of pages or a list of page groups.
- Parameters:
page_indexes (list[InvoiceSplitterV1InvoicePageGroup | list[int]]) – List of sub-lists of pages to keep.
strict (bool, default: False) – Whether to trust confidence scores of 0.5 and above or not.
- Return type:
list[ExtractedPDF]
- Returns:
A list of extracted invoices.
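The description of strict leaves the exact grouping behavior open. The sketch below shows one possible reading of a 0.5 confidence threshold: a page group at or above the threshold starts a new invoice, while a group below it is folded into the preceding invoice. This is an illustration only and not the library's actual logic. PageGroup is a hypothetical stand-in for InvoiceSplitterV1InvoicePageGroup, and its page_indexes and confidence attributes are assumptions.

```python
from dataclasses import dataclass


@dataclass
class PageGroup:
    """Hypothetical stand-in for a page group with a confidence score."""
    page_indexes: list[int]
    confidence: float


def merge_low_confidence_groups(
    groups: list[PageGroup], threshold: float = 0.5
) -> list[list[int]]:
    """Fold groups below the threshold into the group that precedes them."""
    merged: list[list[int]] = []
    for group in groups:
        if group.confidence >= threshold or not merged:
            # Trusted split point (or the very first group): start a new invoice.
            merged.append(list(group.page_indexes))
        else:
            # Untrusted split: keep these pages with the previous invoice.
            merged[-1].extend(group.page_indexes)
    return merged
```

The result is a plain list[list[int]], which matches the shape that the list-based form of page_indexes expects.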
- extract_sub_documents(page_indexes)
Extracts the sub-documents from the main PDF, based on the given list of page indexes.
- Parameters:
page_indexes (list[list[int]]) – List of lists of numbers, representing page indexes.
- Return type:
list[ExtractedPDF]
- Returns:
A list of created PDFs.
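The page_indexes argument is a list of lists, with one inner list per sub-document. As a minimal sketch, the helper below builds such a structure by splitting a document into fixed-size chunks. The name chunk_page_indexes is hypothetical, and the use of zero-based indexes is an assumption that should be checked against the library.

```python
def chunk_page_indexes(page_count: int, pages_per_document: int) -> list[list[int]]:
    """Split page indexes 0..page_count-1 into consecutive fixed-size groups.

    Assumption: page indexes are zero-based. The last group may be shorter
    when page_count is not a multiple of pages_per_document.
    """
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    if pages_per_document < 1:
        raise ValueError("pages_per_document must be at least 1")
    return [
        list(range(start, min(start + pages_per_document, page_count)))
        for start in range(0, page_count, pages_per_document)
    ]
```

For a five-page document split into pairs, the helper yields three groups, the last of which holds a single page.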
- get_page_count()
Get the number of pages in the PDF file.
- Return type:
int
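The page count can be used to check page indexes before handing them to the extraction methods. The following sketch assumes zero-based indexes, which is not stated in this reference, and the helper name check_page_indexes is hypothetical.

```python
def check_page_indexes(page_indexes: list[int], page_count: int) -> list[int]:
    """Return a copy of page_indexes, or raise if any index is out of range.

    Assumption: valid indexes run from 0 to page_count - 1.
    """
    invalid = [index for index in page_indexes if not 0 <= index < page_count]
    if invalid:
        raise ValueError(
            f"Page indexes out of range for a {page_count}-page document: {invalid}"
        )
    return list(page_indexes)
```

Here page_count would be the value returned by get_page_count().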