java.lang.Object
- com.mindee.extraction.PDFExtractor

public class PDFExtractor
extends Object

PDF extraction class.

Constructor Summary

Constructor	Description
`PDFExtractor(LocalInputSource source)`	Init from a `LocalInputSource`.
`PDFExtractor(String filePath)`	Init from a path.

Method Summary

Modifier and Type	Method	Description
`static BufferedImage`	`byteArrayToBufferedImage(byte[] byteArray)`	Converts a byte array to a buffered image.
`List<ExtractedPDF>`	`extractInvoices(List<InvoiceSplitterV1InvoicePageGroup> pageIndexes)`	Extracts invoices from the given page groups (from an invoice-splitter prediction).
`List<ExtractedPDF>`	`extractInvoices(List<InvoiceSplitterV1InvoicePageGroup> pageIndexes, boolean strict)`	Extracts invoices from the given page groups (from an invoice-splitter prediction), optionally following the confidence scores strictly.
`List<ExtractedPDF>`	`extractSubDocuments(List<List<Integer>> pageIndexes)`	Given a list of page-index lists, extracts the corresponding documents.
`int`	`getPageCount()`	Returns the number of pages in the file.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - PDFExtractor
```
public PDFExtractor(String filePath)
             throws IOException
```
    Init from a path.
    
    Parameters:
    
    filePath - Path to the file.
    
    Throws:
    
    IOException - Thrown if the file can't be accessed.
  - PDFExtractor
```
public PDFExtractor(LocalInputSource source)
             throws IOException
```
    Init from a LocalInputSource.
    
    Parameters:
    
    source - The local source.
    
    Throws:
    
    IOException - Thrown if the file can't be accessed.
- Method Detail
  - getPageCount
```
public int getPageCount()
```
    Returns:
    
    The number of pages in the file.
  - byteArrayToBufferedImage
```
public static BufferedImage byteArrayToBufferedImage(byte[] byteArray)
                                              throws IOException
```
    Converts a byte array to a buffered image.
    
    Parameters:
    
    byteArray - Raw byte array holding the encoded image.
    
    Returns:
    
    A `BufferedImage` decoded from the byte array.
    
    Throws:
    
    IOException - Thrown if the image data can't be read.
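
The conversion described above can be approximated with the JDK alone. The following is a minimal sketch, not the library's implementation: the class `ImageBytesSketch` and its methods `decode` and `encodePng` are hypothetical names, and it assumes the byte array holds an encoded image (PNG, JPEG, and so on) that `ImageIO` can read.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper, JDK only; it does not call the Mindee library.
final class ImageBytesSketch {

  // Decodes encoded image bytes into a BufferedImage.
  // ImageIO.read returns null when no registered reader recognizes the data;
  // this sketch chooses to report that case as an IOException.
  static BufferedImage decode(byte[] byteArray) throws IOException {
    try (ByteArrayInputStream in = new ByteArrayInputStream(byteArray)) {
      BufferedImage image = ImageIO.read(in);
      if (image == null) {
        throw new IOException("Unsupported or corrupt image data");
      }
      return image;
    }
  }

  // Encodes an image as PNG bytes; used here only to produce sample input.
  static byte[] encodePng(BufferedImage image) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    ImageIO.write(image, "png", out);
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    BufferedImage original = new BufferedImage(2, 3, BufferedImage.TYPE_INT_RGB);
    BufferedImage decoded = decode(encodePng(original));
    System.out.println(decoded.getWidth() + "x" + decoded.getHeight()); // prints 2x3
  }
}
```

With the library on the classpath, the static method `PDFExtractor.byteArrayToBufferedImage(bytes)` would take the place of `decode` in this sketch.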
  - extractSubDocuments
```
public List<ExtractedPDF> extractSubDocuments(List<List<Integer>> pageIndexes)
                                       throws IOException
```
    Given a list of page-index lists, extracts the corresponding documents.
    
    Parameters:
    
    pageIndexes - List of page-index lists, one inner list per document to extract.
    
    Returns:
    
    A list of extracted files.
    
    Throws:
    
    IOException - Thrown if the file can't be accessed.
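
The `List<List<Integer>>` argument can be built by hand. The sketch below shows one way to split a file into fixed-size groups of pages. The class `PageGroupsSketch` and its method `chunk` are hypothetical, and the sketch assumes zero-based page indexes, which this page does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, JDK only; it does not call the Mindee library.
final class PageGroupsSketch {

  // Splits pages 0..pageCount-1 (zero-based indexing is an assumption)
  // into consecutive groups of at most pagesPerDocument pages each.
  static List<List<Integer>> chunk(int pageCount, int pagesPerDocument) {
    if (pageCount < 0 || pagesPerDocument <= 0) {
      throw new IllegalArgumentException("pageCount must be >= 0 and pagesPerDocument > 0");
    }
    List<List<Integer>> groups = new ArrayList<>();
    for (int start = 0; start < pageCount; start += pagesPerDocument) {
      int end = Math.min(start + pagesPerDocument, pageCount);
      List<Integer> group = new ArrayList<>();
      for (int page = start; page < end; page++) {
        group.add(page);
      }
      groups.add(group);
    }
    return groups;
  }

  public static void main(String[] args) {
    // With the library on the classpath, the result would be passed on roughly as:
    //   extractor.extractSubDocuments(chunk(extractor.getPageCount(), 2));
    System.out.println(chunk(5, 2)); // prints [[0, 1], [2, 3], [4]]
  }
}
```

Each inner list describes one output document, so a five-page file split two pages at a time yields three groups.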
  - extractInvoices
```
public List<ExtractedPDF> extractInvoices(List<InvoiceSplitterV1InvoicePageGroup> pageIndexes)
                                   throws IOException
```
    Extracts invoices from the given page groups (from an invoice-splitter prediction).
    
    Parameters:
    
    pageIndexes - List of invoice page groups from an invoice-splitter prediction.
    
    Returns:
    
    A list of extracted files.
    
    Throws:
    
    IOException - Thrown if the file can't be accessed.
  - extractInvoices
```
public List<ExtractedPDF> extractInvoices(List<InvoiceSplitterV1InvoicePageGroup> pageIndexes,
                                          boolean strict)
                                   throws IOException
```
    Extracts invoices from the given page groups (from an invoice-splitter prediction).
    
    Parameters:
    
    pageIndexes - List of invoice page groups from an invoice-splitter prediction.
    
    strict - Whether the extraction should strictly follow the confidence scores.
    
    Returns:
    
    A list of extracted files.
    
    Throws:
    
    IOException - Thrown if the file can't be accessed.
