Module: Mindee::PDF::PDFCompressor
Defined in: lib/mindee/pdf/pdf_compressor.rb
Overview
Module that compresses PDF files by rasterizing each page into a compressed image.
Class Method Summary
- .compress_pdf(pdf_data, quality: 85, force_source_text_compression: false, disable_source_text: true) ⇒ Object
  Compresses each page of a provided PDF stream.
- .create_output_pdf(pages, disable_source_text, pdf_data) ⇒ Origami::PDF
  Creates the output PDF with processed pages.
- .inject_text(pdf_data, pages) ⇒ Object
  Extracts text from a source-text PDF and injects it into a newly created one.
- .process_pdf_page(page_stream, page_index, image_quality, media_box) ⇒ Origami::Page
  Takes a page stream, rasterizes it into a JPEG image, and places the result on a new Origami PDF page.
- .process_pdf_pages(pdf, quality) ⇒ Array<Origami::Page>
  Processes all pages in the PDF.
Class Method Details
.compress_pdf(pdf_data, quality: 85, force_source_text_compression: false, disable_source_text: true) ⇒ Object
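Compresses each page of a provided PDF stream. If source text is detected in the input and force_source_text_compression is false, a warning is printed and the original pdf_data is returned unchanged. Otherwise, the method returns a StringIO containing the compressed PDF.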
# File 'lib/mindee/pdf/pdf_compressor.rb', line 14

def self.compress_pdf(pdf_data, quality: 85, force_source_text_compression: false, disable_source_text: true)
  if PDFTools.source_text?(pdf_data)
    if force_source_text_compression
      if disable_source_text
        puts "\e[33m[WARNING] Re-writing PDF source-text is an EXPERIMENTAL feature.\e[0m"
      else
        puts "\e[33m[WARNING] Source-file contains text, but disable_source_text flag is ignored. " \
             "Resulting file will not contain any embedded text.\e[0m"
      end
    else
      puts "\e[33m[WARNING] Source-text detected in input PDF. Aborting operation.\e[0m"
      return pdf_data
    end
  end

  pdf_data.rewind
  pdf = Origami::PDF.read(pdf_data)
  pages = process_pdf_pages(pdf, quality)

  output_pdf = create_output_pdf(pages, disable_source_text, pdf_data)

  output_stream = StringIO.new
  output_pdf.save(output_stream)
  output_stream
end
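The branching at the top of compress_pdf decides whether anything is compressed at all, and create_output_pdf later decides whether text is re-injected. The following is a minimal, standalone sketch of that decision logic only; `compression_action` and its symbol return values are hypothetical names introduced for illustration and are not part of the Mindee API.

```ruby
# Hypothetical helper (not part of Mindee): reproduces the decisions made by
# compress_pdf and create_output_pdf without reading or writing any PDF.
def compression_action(has_source_text:, force: false, disable_source_text: true)
  # Source text found and compression not forced: the input is returned as-is.
  return :abort if has_source_text && !force

  # create_output_pdf only calls inject_text when disable_source_text is false.
  disable_source_text ? :compress_without_text : :compress_with_text
end

p compression_action(has_source_text: true)                                        # prints :abort
p compression_action(has_source_text: true, force: true)                           # prints :compress_without_text
p compression_action(has_source_text: false, disable_source_text: false)           # prints :compress_with_text
```

With the default arguments, a PDF that already contains text is returned untouched rather than rasterized.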
.create_output_pdf(pages, disable_source_text, pdf_data) ⇒ Origami::PDF
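Creates the output PDF from the processed pages. The pages are reordered to compensate for Origami's page ordering, the original text is injected unless disable_source_text is set, and the pages are appended to a new Origami::PDF.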
# File 'lib/mindee/pdf/pdf_compressor.rb', line 55

def self.create_output_pdf(pages, disable_source_text, pdf_data)
  output_pdf = Origami::PDF.new
  # NOTE: Page order and XObject handling require adjustment due to origami adding the last page first.
  pages.rotate!(1) if pages.count >= 2

  inject_text(pdf_data, pages) unless disable_source_text

  pages.each { |page| output_pdf.append_page(page) }

  output_pdf
end
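The reordering step relies on Ruby's Array#rotate!, which moves the first element to the end of the array. This sketch applies it to placeholder labels (strings standing in for Origami::Page objects) to show the resulting order; the `reorder_pages` wrapper is a hypothetical name.

```ruby
# Hypothetical wrapper around the reordering line in create_output_pdf.
# Strings stand in for Origami::Page objects.
def reorder_pages(pages)
  pages = pages.dup # work on a copy; create_output_pdf itself mutates its argument
  pages.rotate!(1) if pages.count >= 2
  pages
end

p reorder_pages(%w[page1 page2 page3]) # prints ["page2", "page3", "page1"]
p reorder_pages(%w[page1])             # prints ["page1"]
```

For zero or one page, rotate! would already leave the array unchanged, so the count guard is defensive rather than required.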
.inject_text(pdf_data, pages) ⇒ Object
Extracts text from a source-text PDF and injects it into the pages of a newly created one.
# File 'lib/mindee/pdf/pdf_compressor.rb', line 70

def self.inject_text(pdf_data, pages)
  reader = PDFReader::Reader.new(pdf_data)

  reader.pages.each_with_index do |original_page, index|
    break if index >= pages.length

    receiver = PDFReader::Reader::PageTextReceiver.new
    original_page.walk(receiver)

    receiver.runs.each do |text_run|
      x = text_run.origin.x
      y = text_run.origin.y
      text = text_run.text
      font_size = text_run.font_size

      content_stream = Origami::Stream.new
      content_stream.dictionary[:Filter] = :FlateDecode
      content_stream.data = "BT\n/F1 #{font_size} Tf\n#{x} #{y} Td\n(#{text}) Tj\nET\n"

      pages[index].Contents.data += content_stream.data
    end
  end
end
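Each text run is written as a small content-stream fragment: BT and ET delimit a text object, Tf selects the font and size, Td positions the text, and Tj shows a literal string. Under the PDF specification, a backslash or an unbalanced parenthesis inside a literal string must be escaped; the interpolation in inject_text does not do this, so a run such as `a(b` would leave the string unterminated. The sketch below builds the same operator sequence with escaping added. `escape_pdf_literal` and `text_show_operators` are hypothetical helpers; the `F1` font name is taken from the code above.

```ruby
# Hypothetical helpers (not part of Mindee) for building a text-showing fragment.

# Escape the characters that are special inside a PDF literal string: \ ( )
def escape_pdf_literal(text)
  text.gsub(/[\\()]/) { |ch| "\\#{ch}" }
end

# Same operator sequence as inject_text: begin text, set font, move, show, end.
def text_show_operators(text, x, y, font_size, font: "F1")
  "BT\n/#{font} #{font_size} Tf\n#{x} #{y} Td\n(#{escape_pdf_literal(text)}) Tj\nET\n"
end

print text_show_operators("Total (EUR)", 72, 700, 12)
```

This sketch does not address character encoding: non-ASCII text would also need to match the encoding of the selected font, and the font name must exist in the page's font resources for a viewer to render it.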
.process_pdf_page(page_stream, page_index, image_quality, media_box) ⇒ Origami::Page
Takes a page stream, rasterizes it into a JPEG image, and places the result on a new Origami PDF page.
# File 'lib/mindee/pdf/pdf_compressor.rb', line 100

def self.process_pdf_page(page_stream, page_index, image_quality, media_box)
  new_page = Origami::Page.new
  compressed_image = Mindee::Image::ImageUtils.pdf_to_magick_image(page_stream, image_quality)
  width, height = Mindee::Image::ImageUtils.calculate_dimensions_from_media_box(compressed_image, media_box)

  compressed_xobject = PDF::PDFTools.create_xobject(compressed_image)
  PDF::PDFTools.set_xobject_properties(compressed_xobject, compressed_image)

  xobject_name = "X#{page_index + 1}"
  PDF::PDFTools.add_content_to_page(new_page, xobject_name, width, height)
  new_page.add_xobject(compressed_xobject, xobject_name)
  PDF::PDFTools.set_page_dimensions(new_page, width, height)

  new_page
end
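In PDF, an image XObject is drawn into a 1×1 unit square, so painting it over a whole page means concatenating a transformation matrix (cm) that scales it to the page size before invoking Do. The body of PDFTools.add_content_to_page is not shown on this page, so the following is only a sketch of one standard way to build such a fragment, assuming a MediaBox given as [llx, lly, urx, ury]. `image_page_content` is a hypothetical name; the `X#{page_index + 1}` naming follows the code above.

```ruby
# Hypothetical helper (not part of Mindee): builds the XObject name and a content
# stream that paints that image over the full MediaBox.
def image_page_content(page_index, media_box)
  llx, lly, urx, ury = media_box
  width = urx - llx
  height = ury - lly
  name = "X#{page_index + 1}" # same naming scheme as process_pdf_page

  # q/Q save and restore the graphics state; "w 0 0 h x y cm" scales the unit
  # square to the page and moves it to the box origin; Do paints the XObject.
  content = "q\n#{width} 0 0 #{height} #{llx} #{lly} cm\n/#{name} Do\nQ\n"
  [name, content]
end

name, content = image_page_content(0, [0, 0, 612, 792])
puts name # prints X1
print content
```

The named XObject must also be registered in the page's /Resources /XObject dictionary, which is what new_page.add_xobject is used for in the method above.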
.process_pdf_pages(pdf, quality) ⇒ Array<Origami::Page>
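Processes all pages in the PDF, passing each page's stream, index, and MediaBox to process_pdf_page, and returns the resulting pages.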
# File 'lib/mindee/pdf/pdf_compressor.rb', line 44

def self.process_pdf_pages(pdf, quality)
  pdf.pages.map.with_index do |page, index|
    process_pdf_page(Mindee::PDF::PdfProcessor.get_page(pdf, index), index, quality, page[:MediaBox])
  end
end
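process_pdf_pages pairs every page with its zero-based index by chaining Array#map (called without a block, so it returns an Enumerator) with Enumerator#with_index; each page therefore keeps its position and MediaBox. The sketch below reproduces that pattern with plain hashes standing in for Origami pages and a hypothetical `describe_page` in place of process_pdf_page.

```ruby
# Plain hashes stand in for Origami pages; only the :MediaBox key is used.
pages = [
  { MediaBox: [0, 0, 612, 792] }, # US Letter, in points
  { MediaBox: [0, 0, 595, 842] }  # A4, in points (rounded)
]

# Hypothetical stand-in for process_pdf_page: reports the index and page size it receives.
def describe_page(index, media_box)
  "page #{index + 1}: #{media_box[2] - media_box[0]}x#{media_box[3] - media_box[1]} pt"
end

results = pages.map.with_index do |page, index|
  describe_page(index, page[:MediaBox])
end

puts results
```

As in the original method, the output array has one entry per input page, in the same order.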