java.lang.Object
- com.mindee.pdf.BasePDFExtractor

Direct Known Subclasses:

PDFExtractor
```
public class BasePDFExtractor
extends Object
```
PDF extraction class.

Field Summary

Fields
Modifier and Type	Field	Description
`protected String`	`filename`
`protected org.apache.pdfbox.pdmodel.PDDocument`	`sourcePdf`

Constructor Summary

Constructors
Constructor Description

BasePDFExtractor(LocalInputSource source)
Initializes the extractor from a LocalInputSource.

Method Summary

Modifier and Type	Method	Description
`ExtractedPDFs`	`extractMultipleDocuments(List<List<Integer>> pageIndexes)`	Given a list of page-index lists, extracts one document per inner list.
`ExtractedPDF`	`extractSingleDocument(List<Integer> pageIndexes, boolean closeOriginal)`
`protected String`	`makeFilename(List<Integer> pageNumbers)`	Builds a readable filename for the split document.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

sourcePdf

protected final org.apache.pdfbox.pdmodel.PDDocument sourcePdf

filename
```
protected final String filename
```

Constructor Detail
- BasePDFExtractor
```
public BasePDFExtractor(LocalInputSource source)
                 throws IOException
```
  Initializes the extractor from a LocalInputSource.
  
  Parameters:
  
  source - The local source.
  
  Throws:
  
  IOException - If the file can't be accessed.

Method Detail

extractSingleDocument

public ExtractedPDF extractSingleDocument(List<Integer> pageIndexes,
                                          boolean closeOriginal)
                                   throws IOException

Throws:

IOException

extractMultipleDocuments
```
public ExtractedPDFs extractMultipleDocuments(List<List<Integer>> pageIndexes)
                                       throws IOException
```
Given a list of page-index lists, extracts one document per inner list.

Parameters:

pageIndexes - List of page-index lists; each inner list selects the pages of one document to extract.

Returns:

A list of extracted files.

Throws:

IOException - If the file can't be accessed.
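The nested `List<List<Integer>>` argument can be awkward to build by hand. The following is a minimal sketch, not part of the library, of one way to produce it: the helper `PageGroups.chunk` is a hypothetical name, and it assumes page indexes are zero-based, which this page does not state. The call to `extractMultipleDocuments` is shown only as a comment because it needs the Mindee library and an open source document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of com.mindee.pdf): builds page-index groups.
class PageGroups {

    /**
     * Splits the indexes 0..pageCount-1 into consecutive groups of at most
     * groupSize pages. Zero-based indexing is an assumption of this sketch.
     */
    static List<List<Integer>> chunk(int pageCount, int groupSize) {
        if (pageCount < 0 || groupSize <= 0) {
            throw new IllegalArgumentException("pageCount must be >= 0 and groupSize > 0");
        }
        List<List<Integer>> groups = new ArrayList<>();
        for (int start = 0; start < pageCount; start += groupSize) {
            int end = Math.min(start + groupSize, pageCount);
            List<Integer> group = new ArrayList<>();
            for (int page = start; page < end; page++) {
                group.add(page);
            }
            groups.add(group);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Five pages split two at a time.
        List<List<Integer>> pageIndexes = chunk(5, 2);
        System.out.println(pageIndexes); // prints [[0, 1], [2, 3], [4]]

        // With the Mindee library on the classpath, the groups would be passed as:
        // ExtractedPDFs extracted = extractor.extractMultipleDocuments(pageIndexes);
    }
}
```

Each inner list corresponds to one extracted document, so the example above would request three documents.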

makeFilename

protected String makeFilename(List<Integer> pageNumbers)

Builds a readable filename for the split document.
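This page does not document the naming scheme that `makeFilename` produces. As an illustration only, the sketch below shows one plausible scheme that combines the original file's stem with the first and last page numbers. The class `SplitFilenames`, the method `sketchFilename`, and the `stem_FFF-LLL.pdf` pattern are all assumptions made for this example, not the library's behavior.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical illustration; the real makeFilename may use a different pattern.
class SplitFilenames {

    /**
     * Builds "stem_FFF-LLL.pdf" from the original filename and the first and
     * last entries of pageNumbers, zero-padded to three digits.
     */
    static String sketchFilename(String originalName, List<Integer> pageNumbers) {
        if (pageNumbers.isEmpty()) {
            throw new IllegalArgumentException("pageNumbers must not be empty");
        }
        // Drop the extension, if any, to get the stem.
        int dot = originalName.lastIndexOf('.');
        String stem = dot > 0 ? originalName.substring(0, dot) : originalName;
        int first = pageNumbers.get(0);
        int last = pageNumbers.get(pageNumbers.size() - 1);
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%s_%03d-%03d.pdf", stem, first, last);
    }

    public static void main(String[] args) {
        System.out.println(sketchFilename("invoice.pdf", List.of(1, 2, 3)));
        // prints invoice_001-003.pdf
    }
}
```

A subclass that needs a different convention could override the protected `makeFilename` method, since it is not declared final on this page.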

