PdfExtractor
PDF extraction class.
Table of Contents
Properties
Methods
- __construct() : mixed
- extractInvoices() : array<string|int, mixed>
  Extracts invoices as complete PDFs from the document.
- extractSubDocuments() : array<string|int, mixed>
  Extracts sub-documents from the source document using a list of page indexes.
- getFileName() : string
- getPageCount() : int
  Wrapper for the PDF object's GetPageCount() method.
Properties
$fileName
private string $fileName
Name of the file.
$pdfBytes
private string $pdfBytes
Bytes representation of a file.
Methods
__construct()
public __construct(LocalInputSource $localInput) : mixed
Parameters
- $localInput : LocalInputSource
  Local input source; accepts all compatible formats.
extractInvoices()
Extracts invoices as complete PDFs from the document.
public extractInvoices(array<string|int, mixed> $pageIndexes[, bool $strict = false ]) : array<string|int, mixed>
Parameters
- $pageIndexes : array<string|int, mixed>
  List of sub-lists of pages to keep.
- $strict : bool = false
  Whether to trust only confidence scores of 1.0.
Return values
array<string|int, mixed>: A list of extracted invoices.
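The strict flag distinguishes page groups whose confidence score is exactly 1.0 from all others. The source does not say how lower-confidence groups are handled, so the following sketch only illustrates that distinction and does not reproduce the extractor's logic. The group shape (an array with pageIndexes and confidence keys) and the helper partitionByConfidence() are hypothetical.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of PdfExtractor): splits page groups into
 * those trusted under a given policy and the rest.
 *
 * @param array<int, array{pageIndexes: array<int, int>, confidence: float}> $groups
 * @return array{trusted: array<int, array<int, int>>, untrusted: array<int, array<int, int>>}
 */
function partitionByConfidence(array $groups, bool $strict = false): array
{
    $result = ['trusted' => [], 'untrusted' => []];
    foreach ($groups as $group) {
        // In this sketch, non-strict mode accepts every group regardless of score;
        // strict mode accepts only a confidence of exactly 1.0.
        $isTrusted = !$strict || (float) $group['confidence'] === 1.0;
        $result[$isTrusted ? 'trusted' : 'untrusted'][] = $group['pageIndexes'];
    }
    return $result;
}

// Two assumed page groups with made-up confidence scores.
$groups = [
    ['pageIndexes' => [0, 1], 'confidence' => 1.0],
    ['pageIndexes' => [2], 'confidence' => 0.6],
];
$lenient = partitionByConfidence($groups);
$strictSplit = partitionByConfidence($groups, true);
```

Under these assumptions, the lenient call keeps both groups as trusted, while the strict call keeps only the first.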
extractSubDocuments()
Extracts sub-documents from the source document using a list of page indexes.
public extractSubDocuments(array<string|int, mixed> $pageIndexes) : array<string|int, mixed>
Parameters
- $pageIndexes : array<string|int, mixed>
  List of sub-lists of pages to keep.
Return values
array<string|int, mixed>: List of extracted documents.
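The $pageIndexes argument is a list of sub-lists, each naming the pages that make up one extracted document. The sketch below shows that shape together with a pre-flight check against a page count such as the one getPageCount() returns. The helper validatePageGroups() is hypothetical and not part of the class, and zero-based page indexes are an assumption of this sketch.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of PdfExtractor): checks that every group is
 * a non-empty list and every page index is an integer in [0, $pageCount - 1].
 * Zero-based indexing is an assumption of this sketch.
 *
 * @param array<int, mixed> $pageIndexes List of sub-lists of page indexes.
 * @return array<int, string> Problems found; empty when the input is usable.
 */
function validatePageGroups(array $pageIndexes, int $pageCount): array
{
    $problems = [];
    foreach ($pageIndexes as $groupNumber => $group) {
        if (!is_array($group) || $group === []) {
            $problems[] = "Group $groupNumber must be a non-empty list of page indexes.";
            continue;
        }
        foreach ($group as $pageIndex) {
            if (!is_int($pageIndex) || $pageIndex < 0 || $pageIndex >= $pageCount) {
                $problems[] = "Group $groupNumber has an out-of-range page index: "
                    . var_export($pageIndex, true);
            }
        }
    }
    return $problems;
}

// Three sub-documents from a five-page file: pages 0-1, page 2, pages 3-4.
$pageIndexes = [[0, 1], [2], [3, 4]];
$problems = validatePageGroups($pageIndexes, 5);
```

An empty $problems array means the groups could be passed on. Any entry describes a group that should be fixed first.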
getFileName()
public getFileName() : string
Return values
string: Name of the file.
getPageCount()
Wrapper for the PDF object's GetPageCount() method.
public getPageCount() : int
Return values
int: The number of pages in the file.