Class BasePDFExtractor

  • Direct Known Subclasses:
    PDFExtractor

    public class BasePDFExtractor
    extends Object
    Base class for PDF extraction.
    • Field Detail

      • sourcePdf

        protected final org.apache.pdfbox.pdmodel.PDDocument sourcePdf
      • filename

        protected final String filename
    • Method Detail

      • extractSubDocuments

        public ExtractedPDFs extractSubDocuments(List<List<Integer>> pageIndexes)
                                          throws IOException
        Given a list of page-index lists, extracts the corresponding sub-documents.
        Parameters:
        pageIndexes - List of page-index lists, one inner list per sub-document to extract.
        Returns:
        A list of extracted files.
        Throws:
        IOException - Thrown if the file can't be accessed.
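
        The nested pageIndexes argument can be assembled as in the sketch below. PageIndexGroups and range are hypothetical helpers, not part of this API, and the sketch assumes zero-based indexes with one inner list per sub-document; neither is confirmed on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the documented API: builds the nested
// list shape that extractSubDocuments takes as its argument.
class PageIndexGroups {

    // Expands the inclusive range [first, last] into a list of page indexes.
    // Returns an empty list when first > last.
    static List<Integer> range(int first, int last) {
        List<Integer> pages = new ArrayList<>();
        for (int i = first; i <= last; i++) {
            pages.add(i);
        }
        return pages;
    }

    public static void main(String[] args) {
        // Assumption: indexes are zero-based and each inner list
        // describes one sub-document.
        List<List<Integer>> pageIndexes = new ArrayList<>();
        pageIndexes.add(range(0, 1)); // first sub-document: pages 0 and 1
        pageIndexes.add(range(2, 4)); // second sub-document: pages 2, 3 and 4
        System.out.println(pageIndexes);

        // With an extractor instance in scope (construction is not shown
        // on this page), the call would then be:
        // ExtractedPDFs result = extractor.extractSubDocuments(pageIndexes);
    }
}
```

        Because the method declares IOException, a caller has to catch it or declare it in turn.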
      • makeFilename

        protected String makeFilename(List<Integer> pageNumbers)
        Makes a readable filename for the split covering the given page numbers.
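
        The naming scheme itself is not documented on this page. As an illustration only, a method of this kind could combine the source filename with the page numbers, as in the sketch below. SplitFilenames, the two-argument signature and the "_pages_" pattern are all assumptions, not the library's actual behavior.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration, not the library's implementation.
class SplitFilenames {

    // Assumed scheme: "<base>_pages_<n1>-<n2>-...<extension>".
    // The source filename is passed explicitly here; in the documented
    // class it would presumably come from the protected filename field.
    static String makeFilename(String filename, List<Integer> pageNumbers) {
        int dot = filename.lastIndexOf('.');
        // Split off the extension, if there is one.
        String base = dot > 0 ? filename.substring(0, dot) : filename;
        String extension = dot > 0 ? filename.substring(dot) : "";
        // Join the page numbers with hyphens.
        String pages = pageNumbers.stream()
                .map(String::valueOf)
                .collect(Collectors.joining("-"));
        return base + "_pages_" + pages + extension;
    }

    public static void main(String[] args) {
        System.out.println(makeFilename("invoice.pdf", List.of(1, 2, 3)));
    }
}
```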