Package com.mindee.pdf
Class BasePDFExtractor
java.lang.Object
  com.mindee.pdf.BasePDFExtractor
Direct Known Subclasses: PDFExtractor
public class BasePDFExtractor extends Object
PDF extraction class.
-
Constructor Summary
Constructor | Description
BasePDFExtractor(LocalInputSource source) | Init from a LocalInputSource.
-
Method Summary
Modifier and Type | Method | Description
ExtractedPDF | extractSinglePage(List<Integer> pageNumbers, boolean closeOriginal) |
ExtractedPDFs | extractSubDocuments(List<List<Integer>> pageIndexes) | Given a list of page indexes, extracts the corresponding documents.
protected String | makeFilename(List<Integer> pageNumbers) | Make a nice filename for the split.
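The summary does not say what a "nice" filename looks like. The following is only a sketch of one plausible scheme, not the library's implementation. The class `SplitFilenameSketch`, the extra `filename` parameter, the zero-based page numbers, and the `_001-003` suffix format are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not part of com.mindee.pdf.
final class SplitFilenameSketch {

    // Builds "<stem>_<first>-<last><ext>" from the original filename and
    // the pages in the split. Page numbers are ASSUMED to be zero-based,
    // so 1 is added to make the name human-readable.
    static String makeFilename(String filename, List<Integer> pageNumbers) {
        if (pageNumbers.isEmpty()) {
            throw new IllegalArgumentException("pageNumbers must not be empty");
        }
        int dot = filename.lastIndexOf('.');
        // Fall back to ".pdf" when the original name has no extension.
        String stem = dot > 0 ? filename.substring(0, dot) : filename;
        String ext = dot > 0 ? filename.substring(dot) : ".pdf";
        int first = pageNumbers.get(0) + 1;
        int last = pageNumbers.get(pageNumbers.size() - 1) + 1;
        return String.format(Locale.ROOT, "%s_%03d-%03d%s", stem, first, last, ext);
    }

    public static void main(String[] args) {
        System.out.println(makeFilename("invoice.pdf", List.of(0, 1, 2)));
    }
}
```

Zero-padding the page range keeps the split files in page order when sorted by name, which is the main reason for this particular format.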
-
Field Detail
-
sourcePdf
protected final org.apache.pdfbox.pdmodel.PDDocument sourcePdf
-
filename
protected final String filename
-
Constructor Detail
-
BasePDFExtractor
public BasePDFExtractor(LocalInputSource source) throws IOException
Init from a LocalInputSource.
Parameters:
source - The local source.
Throws:
IOException - if the file can't be accessed.
-
Method Detail
-
extractSinglePage
public ExtractedPDF extractSinglePage(List<Integer> pageNumbers, boolean closeOriginal) throws IOException
Throws:
IOException
-
extractSubDocuments
public ExtractedPDFs extractSubDocuments(List<List<Integer>> pageIndexes) throws IOException
Given a list of page indexes, extracts the corresponding documents.
Parameters:
pageIndexes - List of page index lists, one inner list per document to extract.
Returns:
A list of extracted files.
Throws:
IOException - if the file can't be accessed.
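Because `pageIndexes` is a `List<List<Integer>>`, callers have to build one inner list per sub-document. A minimal sketch of doing so with fixed-size groups follows. The helper `chunkPages` and the class `PageGroupsSketch` are hypothetical, and the indexes are assumed to be zero-based, which this page does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of com.mindee.pdf.
final class PageGroupsSketch {

    // Splits the indexes 0..pageCount-1 (ASSUMED zero-based) into
    // consecutive groups of at most chunkSize pages each.
    static List<List<Integer>> chunkPages(int pageCount, int chunkSize) {
        if (pageCount < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException(
                "pageCount must be >= 0 and chunkSize must be > 0");
        }
        List<List<Integer>> groups = new ArrayList<>();
        for (int start = 0; start < pageCount; start += chunkSize) {
            int end = Math.min(start + chunkSize, pageCount);
            List<Integer> group = new ArrayList<>();
            for (int page = start; page < end; page++) {
                group.add(page);
            }
            groups.add(group);
        }
        return groups;
    }

    public static void main(String[] args) {
        // A 5-page file split into 2-page documents.
        System.out.println(chunkPages(5, 2)); // prints [[0, 1], [2, 3], [4]]
    }
}
```

The resulting list could then be passed to `extractSubDocuments(pageIndexes)` on an extractor built from a `LocalInputSource`. Both the constructor and the method declare `IOException`, so the call site needs to handle or propagate it.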
-