Module: Mindee::PDF::PDFTools
- Defined in:
- lib/mindee/pdf/pdf_tools.rb
Overview
Collection of miscellaneous PDF operations,as well as some monkey-patching for Origami.
Class Method Summary collapse
-
.add_content_to_page(page, xobject_name, width, height) ⇒ Object
Adds a content stream to the specified PDF page to display an image XObject.
-
.create_xobject(image) ⇒ Origami::Graphics::ImageXObject
Creates an image XObject from the provided image.
-
.determine_colorspace(image) ⇒ Symbol
Determines the colorspace for an image based on its metadata.
-
.determine_filter(image) ⇒ Symbol
Determines the appropriate filter for an image based on its properties.
-
.process_image_xobject(image_data, image_quality, width, height) ⇒ Origami::Graphics::ImageXObject
Processes an image into an image XObject for PDF embedding.
-
.set_page_dimensions(page, width, height) ⇒ Object
Sets the dimensions for the specified PDF page.
-
.set_xobject_properties(xobject, image) ⇒ Object
Sets properties on the provided image XObject based on image metadata.
-
.source_text?(pdf_data) ⇒ bool
Checks whether the file has source_text.
-
.stream_has_text?(stream) ⇒ bool
Checks a PDFs stream content for text operators See opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf page 243-251.
Instance Method Summary collapse
-
#to_io_stream(params = {}) ⇒ StringIO
Converts the current PDF document into a binary-encoded StringIO stream.
Class Method Details
.add_content_to_page(page, xobject_name, width, height) ⇒ Object
Adds a content stream to the specified PDF page to display an image XObject.
138 139 140 141 142 |
# File 'lib/mindee/pdf/pdf_tools.rb', line 138 def self.add_content_to_page(page, xobject_name, width, height) content = "q\n#{width} 0 0 #{height} 0 0 cm\n/#{xobject_name} Do\nQ\n" content_stream = Origami::Stream.new(content) page.Contents = content_stream end |
.create_xobject(image) ⇒ Origami::Graphics::ImageXObject
Creates an image XObject from the provided image.
Converts the given image to a binary stream using Mindee’s image utilities, then creates an Origami::Graphics::ImageXObject with a JPEG filter.
89 90 91 92 |
# File 'lib/mindee/pdf/pdf_tools.rb', line 89 def self.create_xobject(image) image_io = Mindee::Image::ImageUtils.image_to_stringio(image) Origami::Graphics::ImageXObject.from_image_file(image_io, 'jpg') end |
.determine_colorspace(image) ⇒ Symbol
Determines the colorspace for an image based on its metadata.
123 124 125 126 127 128 129 130 |
# File 'lib/mindee/pdf/pdf_tools.rb', line 123 def self.determine_colorspace(image) colorspace = image.data['colorspace'] case colorspace when 'CMYK' then :DeviceCMYK when 'Gray', 'PseudoClass Gray' then :DeviceGray else :DeviceRGB end end |
.determine_filter(image) ⇒ Symbol
Determines the appropriate filter for an image based on its properties.
110 111 112 113 114 115 116 117 |
# File 'lib/mindee/pdf/pdf_tools.rb', line 110 def self.determine_filter(image) filter = image.data['properties']['filter'] case filter when %r{Zip}i then :FlateDecode when %r{LZW}i then :LZWDecode else :DCTDecode end end |
.process_image_xobject(image_data, image_quality, width, height) ⇒ Origami::Graphics::ImageXObject
Processes an image into an image XObject for PDF embedding.
161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 |
# File 'lib/mindee/pdf/pdf_tools.rb', line 161 def self.process_image_xobject(image_data, image_quality, width, height) compressed_data = Image::ImageCompressor.compress_image( image_data, quality: image_quality, max_width: width, max_height: height ) new_image = Origami::Graphics::ImageXObject.new new_image.data = compressed_data new_image.Width = width new_image.Height = height new_image.ColorSpace = :DeviceRGB new_image.BitsPerComponent = 8 new_image end |
.set_page_dimensions(page, width, height) ⇒ Object
Sets the dimensions for the specified PDF page.
149 150 151 152 |
# File 'lib/mindee/pdf/pdf_tools.rb', line 149 def self.set_page_dimensions(page, width, height) page[:MediaBox] = [0, 0, width, height] page[:CropBox] = [0, 0, width, height] end |
.set_xobject_properties(xobject, image) ⇒ Object
Sets properties on the provided image XObject based on image metadata.
98 99 100 101 102 103 104 |
# File 'lib/mindee/pdf/pdf_tools.rb', line 98 def self.set_xobject_properties(xobject, image) xobject.dictionary[:BitsPerComponent] = 8 xobject.dictionary[:Filter] = determine_filter(image) xobject.dictionary[:Width] = image[:width] xobject.dictionary[:Height] = image[:height] xobject.dictionary[:ColorSpace] = determine_colorspace(image) end |
.source_text?(pdf_data) ⇒ bool
Checks whether the file has source_text. Sends false if the file isn’t a PDF.
57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 |
# File 'lib/mindee/pdf/pdf_tools.rb', line 57 def self.source_text?(pdf_data) begin pdf_data.rewind pdf = Origami::PDF.read(pdf_data) pdf.each_page do |page| next unless page[:Contents] contents = page[:Contents].solve contents = [contents] unless contents.is_a?(Origami::Array) contents.each do |stream_ref| stream = stream_ref.solve return true if stream_has_text?(stream) end end false end false rescue Origami::InvalidPDFError false end |
.stream_has_text?(stream) ⇒ bool
Checks a PDFs stream content for text operators See opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf page 243-251.
46 47 48 49 50 51 52 |
# File 'lib/mindee/pdf/pdf_tools.rb', line 46 def self.stream_has_text?(stream) data = stream.data return false if data.nil? || data.empty? text_operators = ['Tc', 'Tw', 'Th', 'TL', 'Tf', 'Tk', 'Tr', 'Tm', 'T*', 'Tj', 'TJ', "'", '"'] text_operators.any? { |op| data.include?(op) } end |
Instance Method Details
#to_io_stream(params = {}) ⇒ StringIO
Converts the current PDF document into a binary-encoded StringIO stream.
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 |
# File 'lib/mindee/pdf/pdf_tools.rb', line 16 def to_io_stream(params = {}) = { delinearize: true, recompile: true, decrypt: false, noindent: nil, } .update(params) if frozen? # incompatible flags with frozen doc (signed) [:recompile] = nil [:rebuild_xrefs] = nil [:noindent] = nil [:obfuscate] = false end load_all_objects unless @loaded intents_as_pdfa1 if [:intent].to_s =~ %r{pdf[/-]?A1?/i} delinearize! if [:delinearize] && linearized? compile() if [:recompile] io_stream = StringIO.new(output()) io_stream.set_encoding Encoding::BINARY io_stream end |