PdfExtractor
PDF extraction class.
Table of Contents
Properties
- $fileName : string
- Name of the file.
- $pdfBytes : string
- Byte representation of the file.
Methods
- __construct() : mixed
- extractInvoices() : array<string|int, ExtractedPdf>
- Extracts invoices as complete PDFs from the document.
- extractSubDocuments() : array<string|int, ExtractedPdf>
- Extracts sub-documents from the source document using list of page indexes.
- getFileName() : string
- getPageCount() : int
- Wrapper for the PDF GetPageCount() call.
Properties
$fileName
private string $fileName
Name of the file.
$pdfBytes
private string $pdfBytes
Byte representation of the file.
Methods
__construct()
public __construct(LocalInputSource $localInput) : mixed
Parameters
- $localInput : LocalInputSource
- Local input; accepts all compatible formats.
extractInvoices()
Extracts invoices as complete PDFs from the document.
public extractInvoices(array<string|int, mixed>|InvoiceSplitterV1InvoicePageGroups $pageIndexes[, bool $strict = false ]) : array<string|int, ExtractedPdf>
Parameters
- $pageIndexes : array<string|int, mixed>|InvoiceSplitterV1InvoicePageGroups
- List of sub-lists of pages to keep.
- $strict : bool = false
- Whether to trust confidence scores or not.
Return values
array<string|int, ExtractedPdf> —a list of extracted invoices
extractSubDocuments()
Extracts sub-documents from the source document using list of page indexes.
public extractSubDocuments(array<string|int, mixed>|InvoiceSplitterV1InvoicePageGroups $pageIndexes) : array<string|int, ExtractedPdf>
Parameters
- $pageIndexes : array<string|int, mixed>|InvoiceSplitterV1InvoicePageGroups
- List of sub-lists of pages to keep.
Return values
array<string|int, ExtractedPdf> —list of extracted documents
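The "list of sub-lists of pages" shape accepted by $pageIndexes can be illustrated without the library. The sketch below is not the class's implementation: groupPages() is a hypothetical helper, plain strings stand in for PDF pages, and zero-based indexes are an assumption that this page does not confirm.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, not part of PdfExtractor: groups pages according to
 * a list of sub-lists of page indexes. Indexes are ASSUMED to be zero-based.
 *
 * @param array<int, string>          $pages       Stand-ins for the pages of a source document.
 * @param array<int, array<int, int>> $pageIndexes List of sub-lists of pages to keep.
 *
 * @return array<int, array<int, string>> One group of pages per sub-list.
 */
function groupPages(array $pages, array $pageIndexes): array
{
    $groups = [];
    foreach ($pageIndexes as $indexes) {
        $group = [];
        foreach ($indexes as $index) {
            // Reject anything that does not point at an existing page.
            if (!is_int($index) || $index < 0 || $index >= count($pages)) {
                throw new OutOfRangeException(
                    'Page index out of range: ' . var_export($index, true)
                );
            }
            $group[] = $pages[$index];
        }
        $groups[] = $group;
    }
    return $groups;
}

// A four-page document split into three sub-documents.
$pages = ['page 0', 'page 1', 'page 2', 'page 3'];
$groups = groupPages($pages, [[0, 1], [2], [3]]);

echo count($groups), PHP_EOL; // prints 3
```

Each sub-list yields one output group, in the same way that each sub-list passed to extractSubDocuments() is documented to yield one extracted document.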
getFileName()
public
getFileName() : string
Return values
string —name of the file
getPageCount()
Wrapper for the PDF GetPageCount() call.
public getPageCount() : int
Return values
int —The number of pages in the file.