Class PDFUtils


  • public final class PDFUtils
    extends Object
    Utilities for working with PDFs.
    • Method Detail

      • getNumberOfPages

        public static int getNumberOfPages(LocalInputSource inputSource)
                                    throws IOException
        Get the number of pages in the PDF.
        Parameters:
        inputSource - The PDF file.
        Throws:
        IOException
      • mergePdfPages

        public static byte[] mergePdfPages(File file,
                                           List<Integer> pageNumbers)
                                    throws IOException
        Merge the specified pages of a PDF into a single document.
        Parameters:
        file - The PDF file.
        pageNumbers - List of page numbers to merge together.
        Throws:
        IOException
      • mergePdfPages

        public static byte[] mergePdfPages(org.apache.pdfbox.pdmodel.PDDocument document,
                                           List<Integer> pageNumbers)
                                    throws IOException
        Throws:
        IOException
      • mergePdfPages

        public static byte[] mergePdfPages(org.apache.pdfbox.pdmodel.PDDocument document,
                                           List<Integer> pageNumbers,
                                           boolean closeOriginal)
                                    throws IOException
        Throws:
        IOException
      • pdfToImages

        public static List<PdfPageImage> pdfToImages(String filePath)
                                              throws IOException
        Render all pages of a PDF as images. Every page is rendered and returned in a single list, so converting PDFs with hundreds of pages may exhaust the Java heap (OutOfMemoryError: Java heap space).
        Parameters:
        filePath - The path to the PDF file.
        Returns:
        List of all pages as images.
        Throws:
        IOException
      • pdfToImages

        public static List<PdfPageImage> pdfToImages(LocalInputSource source)
                                              throws IOException
        Render all pages of a PDF as images. Every page is rendered and returned in a single list, so converting PDFs with hundreds of pages may exhaust the Java heap (OutOfMemoryError: Java heap space).
        Parameters:
        source - The PDF file.
        Returns:
        List of all pages as images.
        Throws:
        IOException
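        The heap-space warning for pdfToImages follows from simple arithmetic: each rendered page is held as an uncompressed bitmap, so memory grows linearly with the page count and with the square of the rendering resolution. The sketch below estimates that footprint. It is a hypothetical helper (RenderMemoryEstimate is not part of this class), it assumes 4 bytes per pixel (as with a TYPE_INT_RGB BufferedImage), and the DPI is a caller-supplied assumption, since the resolution pdfToImages actually uses is not documented here.

```java
/**
 * Hypothetical helper: rough heap estimate for keeping rendered PDF pages in memory at once.
 * Assumes 4 bytes per pixel and ignores per-object overhead, so real usage will be somewhat higher.
 */
final class RenderMemoryEstimate {
    private static final double POINTS_PER_INCH = 72.0; // a PDF point is 1/72 inch
    private static final int BYTES_PER_PIXEL = 4;        // one int per pixel, e.g. TYPE_INT_RGB

    private RenderMemoryEstimate() {}

    /** Pixel length of a page edge given in points, rendered at the given DPI (rounded down). */
    static long pixels(double points, int dpi) {
        return (long) Math.floor(points * dpi / POINTS_PER_INCH);
    }

    /** Approximate bytes needed to hold pageCount same-size pages rendered at the given DPI. */
    static long estimateBytes(double widthPt, double heightPt, int dpi, int pageCount) {
        long perPage = pixels(widthPt, dpi) * pixels(heightPt, dpi) * BYTES_PER_PIXEL;
        return perPage * pageCount;
    }
}
```

        For example, 300 US Letter pages (612 x 792 points) rendered at 150 DPI are 1275 x 1650 pixels each, about 8.4 MB per page and roughly 2.5 GB in total, which is more than many default JVM heaps allow.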
      • pdfPageToImage

        public static PdfPageImage pdfPageToImage(String filePath,
                                                  int pageNumber)
                                           throws IOException
        Render a single page of a PDF as an image. This is mainly intended for PDFs with hundreds of pages, which may be too large to render all at once with pdfToImages. To render only some pages of a PDF, use mergePdfPages and then pdfToImages.
        Parameters:
        filePath - The path to the PDF file.
        pageNumber - The page number to render; the first page is 1.
        Returns:
        The page as an image.
        Throws:
        IOException
      • pdfPageToImage

        public static PdfPageImage pdfPageToImage(LocalInputSource source,
                                                  int pageNumber)
                                           throws IOException
        Render a single page of a PDF as an image. This is mainly intended for PDFs with hundreds of pages, which may be too large to render all at once with pdfToImages. To render only some pages of a PDF, use mergePdfPages and then pdfToImages.
        Parameters:
        source - The PDF file.
        pageNumber - The page number to render; the first page is 1.
        Returns:
        The page as an image.
        Throws:
        IOException
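        The usual pattern for large PDFs is to render one page, handle it, and let it go before rendering the next, so that only one rendered page needs to be reachable at a time. The following is a minimal, library-independent sketch of that loop; PageByPage, PageRenderer and PageHandler are hypothetical names introduced for illustration, and page numbers start at 1 to match pdfPageToImage.

```java
import java.io.IOException;

/**
 * Hypothetical helper: renders and handles pages one at a time. Unless the handler keeps a
 * reference, each rendered page becomes unreachable before the next one is rendered.
 */
final class PageByPage {
    /** Renders a single page; page numbers start at 1. */
    @FunctionalInterface
    interface PageRenderer<T> {
        T render(int pageNumber) throws IOException;
    }

    /** Consumes one rendered page, e.g. by writing it to disk or sending it elsewhere. */
    @FunctionalInterface
    interface PageHandler<T> {
        void handle(int pageNumber, T page) throws IOException;
    }

    private PageByPage() {}

    /** Renders pages 1..pageCount in order and returns the number of pages handled. */
    static <T> int forEachPage(int pageCount, PageRenderer<T> renderer, PageHandler<T> handler)
            throws IOException {
        if (pageCount < 0) {
            throw new IllegalArgumentException("pageCount must be >= 0");
        }
        for (int pageNumber = 1; pageNumber <= pageCount; pageNumber++) {
            T page = renderer.render(pageNumber);
            handler.handle(pageNumber, page);
            // 'page' goes out of scope here, so it can be garbage-collected before the next render.
        }
        return pageCount;
    }
}
```

        With this class, the page count would presumably come from PDFUtils.getNumberOfPages(source) and the renderer could be n -> PDFUtils.pdfPageToImage(source, n), both using the signatures documented above; PageRenderer declares IOException so such a lambda compiles.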
      • documentToBytes

        public static byte[] documentToBytes(org.apache.pdfbox.pdmodel.PDDocument document)
                                      throws IOException
        Throws:
        IOException
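        documentToBytes and mergePdfPages both return a PDF as a byte array. A cheap smoke test for such output is to check the file header, since a PDF file begins with the characters "%PDF-" followed by a version number. The sketch below is a hypothetical helper (PdfBytesCheck is not part of this class); it only inspects the header and does not validate the rest of the document.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: checks whether a byte array starts with the "%PDF-" file header. */
final class PdfBytesCheck {
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    private PdfBytesCheck() {}

    /** Returns true if the bytes are non-null and begin with "%PDF-". */
    static boolean hasPdfHeader(byte[] bytes) {
        return bytes != null
                && bytes.length >= PDF_MAGIC.length
                && Arrays.equals(Arrays.copyOf(bytes, PDF_MAGIC.length), PDF_MAGIC);
    }
}
```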
      • extractAndAddText

        public static void extractAndAddText(org.apache.pdfbox.pdmodel.PDDocument inputDoc,
                                             org.apache.pdfbox.pdmodel.PDPageContentStream contentStream,
                                             int pageIndex,
                                             boolean disableSourceText)
                                      throws IOException
        Throws:
        IOException
      • addImageToPage

        public static void addImageToPage(org.apache.pdfbox.pdmodel.PDPageContentStream contentStream,
                                          org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject pdImage,
                                          org.apache.pdfbox.pdmodel.common.PDRectangle pageSize)
                                   throws IOException
        Throws:
        IOException
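        addImageToPage takes an image and a page size, but this page does not document how the image is positioned or scaled. One common approach, shown here only as an assumption, is to scale the image to fit inside the page while preserving its aspect ratio and to center it. FitToPage is a hypothetical helper; all values are in PDF points, with the origin at the lower-left corner of the page.

```java
/**
 * Hypothetical helper: computes a centered, aspect-ratio-preserving placement
 * of an image inside a page. All values are in PDF points (origin at lower left).
 */
final class FitToPage {
    /** Where to draw the image: lower-left corner (x, y) and drawn width/height. */
    static final class Placement {
        final float x;
        final float y;
        final float width;
        final float height;

        Placement(float x, float y, float width, float height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    private FitToPage() {}

    static Placement fit(float imageWidth, float imageHeight, float pageWidth, float pageHeight) {
        if (imageWidth <= 0 || imageHeight <= 0) {
            throw new IllegalArgumentException("image dimensions must be positive");
        }
        // Use the smaller scale factor so the whole image fits on the page.
        float scale = Math.min(pageWidth / imageWidth, pageHeight / imageHeight);
        float width = imageWidth * scale;
        float height = imageHeight * scale;
        // Center the scaled image; leftover space is split evenly on both sides.
        return new Placement((pageWidth - width) / 2f, (pageHeight - height) / 2f, width, height);
    }
}
```

        With PDFBox, the inputs could come from pdImage.getWidth(), pdImage.getHeight(), pageSize.getWidth() and pageSize.getHeight(), and the result could be passed to contentStream.drawImage(pdImage, p.x, p.y, p.width, p.height).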