PdfExtractor
extends PdfExtractor
PDF extraction class.
Table of Contents
Properties
- $pageCount : int
Methods
- __construct() : mixed
- extractInvoices() : array<string|int, ExtractedPdf>
- Extracts invoices as complete PDFs from the document.
- extractSubDocuments() : array<string|int, ExtractedPdf>
- Extracts sub-documents from the source document using a list of page indexes.
- getFileName() : string
Properties
$pageCount
public
int $pageCount
Number of pages in the file.
Methods
__construct()
public
__construct(LocalInputSource $localInput) : mixed
Parameters
- $localInput : LocalInputSource
Local input; accepts all compatible formats.
extractInvoices()
Extracts invoices as complete PDFs from the document.
public
extractInvoices(array<string|int, array<string|int, int>>|InvoiceSplitterV1InvoicePageGroups $pageIndexes[, bool $strict = false ]) : array<string|int, ExtractedPdf>
Parameters
- $pageIndexes : array<string|int, array<string|int, int>>|InvoiceSplitterV1InvoicePageGroups
List of sub-lists of pages to keep.
- $strict : bool = false
Whether to trust confidence scores or not.
Return values
array<string|int, ExtractedPdf> —a list of extracted invoices
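When $pageIndexes is passed as a plain array, it is a list of page-index lists. The sketch below shows one way to check such a list against $pageCount before calling extractInvoices(). The helper filterValidPageGroups() is hypothetical and not part of the class, and zero-based page indexes are an assumption that this reference does not state.

```php
<?php
declare(strict_types=1);

/**
 * Keeps only the page groups whose indexes all fall within [0, $pageCount).
 * Hypothetical helper; assumes zero-based page indexes.
 *
 * @param array<int, array<int, int>> $pageGroups
 * @return array<int, array<int, int>>
 */
function filterValidPageGroups(array $pageGroups, int $pageCount): array
{
    $valid = [];
    foreach ($pageGroups as $group) {
        if ($group === []) {
            continue; // Skip empty groups.
        }
        $inRange = true;
        foreach ($group as $index) {
            if (!is_int($index) || $index < 0 || $index >= $pageCount) {
                $inRange = false;
                break;
            }
        }
        if ($inRange) {
            $valid[] = $group;
        }
    }
    return $valid;
}

// Example input: three candidate invoices in a 4-page document.
$groups = [[0, 1], [2], [7, 8]];
$checked = filterValidPageGroups($groups, 4);

// Hypothetical call, given an already constructed PdfExtractor in $extractor:
// $invoices = $extractor->extractInvoices($checked);
```

Dropping out-of-range groups silently is only one possible policy; throwing an exception may suit callers that treat a bad index as a bug.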
extractSubDocuments()
Extracts sub-documents from the source document using a list of page indexes.
public
extractSubDocuments(array<string|int, array<string|int, int>>|InvoiceSplitterV1InvoicePageGroups $pageIndexes) : array<string|int, ExtractedPdf>
Parameters
- $pageIndexes : array<string|int, array<string|int, int>>|InvoiceSplitterV1InvoicePageGroups
List of sub-lists of pages to keep.
Return values
array<string|int, ExtractedPdf> —list of extracted documents
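As an illustration of the $pageIndexes shape, the following sketch builds consecutive groups of a fixed size from a page count, which could then be passed to extractSubDocuments(). The helper chunkPageIndexes() is hypothetical, and zero-based page indexes are assumed rather than confirmed by this reference.

```php
<?php
declare(strict_types=1);

/**
 * Builds consecutive page-index groups of at most $groupSize pages each.
 * Hypothetical helper; assumes zero-based page indexes.
 *
 * @return array<int, array<int, int>>
 */
function chunkPageIndexes(int $pageCount, int $groupSize): array
{
    if ($groupSize < 1) {
        throw new InvalidArgumentException('Group size must be at least 1.');
    }
    if ($pageCount < 1) {
        return []; // Nothing to split.
    }
    // range() yields 0..$pageCount-1; array_chunk() splits it into sub-lists.
    return array_chunk(range(0, $pageCount - 1), $groupSize);
}

// Hypothetical usage, given an already constructed PdfExtractor in $extractor:
// $subDocuments = $extractor->extractSubDocuments(
//     chunkPageIndexes($extractor->pageCount, 2)
// );
```

A group size of 1 yields one sub-document per page; the last group may be shorter than the others when the page count is not a multiple of the group size.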
getFileName()
public
getFileName() : string
Return values
string —name of the file