Module: Mindee::Image::ImageExtractor
Defined in: lib/mindee/image/image_extractor.rb
Overview
Image Extraction wrapper class.
Class Method Summary
- .attach_image_as_new_file(input_buffer, format: 'jpg') ⇒ Origami::PDF
  Attaches an image as a new page in a PdfDocument object.
- .create_extracted_image(buffer, file_name, page_id, element_id) ⇒ Object
  Generates an ExtractedImage.
- .extract_images_from_polygons(input_source, pdf_stream, page_id, polygons) ⇒ Array<Mindee::Image::ExtractedImage>
  Extracts images from their positions on a file (as polygons).
- .extract_multiple_images_from_source(input_source, page_id, polygons) ⇒ Array<Mindee::Image::ExtractedImage>
  Extracts multiple images from a given local input source.
- .load_input_source_pdf_page_as_stringio(input_file, page_id) ⇒ StringIO
  Loads a single page from an image file or a PDF document.
Class Method Details
.attach_image_as_new_file(input_buffer, format: 'jpg') ⇒ Origami::PDF
Attaches an image as a new page in a PdfDocument object.
    # File 'lib/mindee/image/image_extractor.rb', line 19

    def self.attach_image_as_new_file(input_buffer, format: 'jpg')
      magick_image = MiniMagick::Image.read(input_buffer)
      # NOTE: some jpeg images get rendered as three different versions of themselves per output if the format isn't
      # converted.
      magick_image.format(format)
      original_density = magick_image.resolution
      scale_factor = original_density[0].to_f / 4.166666 # No clue why the resolution needs to be reduced for
      # the pdf otherwise the resulting image shrinks.
      magick_image.format('pdf', 0, { density: scale_factor.to_s })
      Origami::PDF.read(StringIO.new(magick_image.to_blob))
    end
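The divisor 4.166666 used above is numerically close to 300 / 72. The source does not explain the constant, so reading it as a "300 dpi to 72 dpi" ratio is an assumption rather than a documented fact. A minimal sketch of the arithmetic alone, with no MiniMagick calls, where `pdf_density_for` is a hypothetical helper name that does not exist in the gem:

```ruby
# Hypothetical helper (not part of the Mindee gem): reproduces only the
# density arithmetic from attach_image_as_new_file.
SCALE_DIVISOR = 4.166666

def pdf_density_for(horizontal_density)
  # Same expression as in the method: horizontal resolution / 4.166666.
  horizontal_density.to_f / SCALE_DIVISOR
end

# Print the density that would be passed to the PDF conversion
# for a few common input resolutions.
[72, 150, 300].each do |dpi|
  puts format('%3d dpi -> density %.3f', dpi, pdf_density_for(dpi))
end
```

Only the first element of the resolution pair (the horizontal density) feeds into the result, which matches the `original_density[0]` access in the method.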
.create_extracted_image(buffer, file_name, page_id, element_id) ⇒ Object
Generates an ExtractedImage.
    # File 'lib/mindee/image/image_extractor.rb', line 95

    def self.create_extracted_image(buffer, file_name, page_id, element_id)
      buffer.rewind
      ExtractedImage.new(
        Mindee::Input::Source::BytesInputSource.new(buffer.read.to_s, file_name),
        page_id,
        element_id
      )
    end
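The `buffer.rewind` call matters because a buffer that has just been written to has its cursor at the end, so reading it without rewinding yields an empty string. A small sketch using only the standard library `StringIO` (the payload string is a placeholder, not real image data):

```ruby
require 'stringio'

buffer = StringIO.new
buffer.write('image bytes') # placeholder payload

# After writing, the cursor sits at the end of the buffer,
# so a plain read has nothing left to return.
at_end = buffer.read

# Rewinding moves the cursor back to offset 0, so the whole content is read.
buffer.rewind
after_rewind = buffer.read

puts at_end.inspect
puts after_rewind.inspect
```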
.extract_images_from_polygons(input_source, pdf_stream, page_id, polygons) ⇒ Array<Mindee::Image::ExtractedImage>
Extracts images from their positions on a file (as polygons).
    # File 'lib/mindee/image/image_extractor.rb', line 52

    def self.extract_images_from_polygons(input_source, pdf_stream, page_id, polygons)
      extracted_elements = []

      polygons.each_with_index do |polygon, element_id|
        polygon = ImageUtils.normalize_polygon(polygon)
        page_content = ImageUtils.read_page_content(pdf_stream)

        min_max_x = Geometry.get_min_max_x([
          polygon.top_left,
          polygon.bottom_right,
          polygon.top_right,
          polygon.bottom_left,
        ])
        min_max_y = Geometry.get_min_max_y([
          polygon.top_left,
          polygon.bottom_right,
          polygon.top_right,
          polygon.bottom_left,
        ])
        file_extension = ImageUtils.determine_file_extension(input_source)
        cropped_image = ImageUtils.crop_image(page_content, min_max_x, min_max_y)
        if file_extension == 'pdf'
          cropped_image.format('jpg')
        else
          cropped_image.format(file_extension.to_s)
        end

        buffer = StringIO.new
        ImageUtils.write_image_to_buffer(cropped_image, buffer)
        file_name = "#{input_source.filename}_page#{page_id}-#{element_id}.#{file_extension}"

        extracted_elements << create_extracted_image(buffer, file_name, page_id, element_id)
      end

      extracted_elements
    end
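The method reduces each polygon to an axis-aligned bounding box (minimum and maximum x and y over the four corners) before cropping, then derives a file name from the source name, page id and element index. The sketch below illustrates those two steps in plain Ruby. `SketchPoint`, `bounding_box` and `extracted_file_name` are hypothetical names for illustration; the gem's own `Geometry` and `ImageUtils` helpers may return differently shaped values.

```ruby
# Hypothetical stand-in for a polygon corner; not the gem's point class.
SketchPoint = Struct.new(:x, :y)

# Axis-aligned bounding box over a list of corner points.
def bounding_box(corners)
  min_x, max_x = corners.map(&:x).minmax
  min_y, max_y = corners.map(&:y).minmax
  { min_x: min_x, max_x: max_x, min_y: min_y, max_y: max_y }
end

# Same interpolation pattern as the file_name line in the method.
def extracted_file_name(source_name, page_id, element_id, extension)
  "#{source_name}_page#{page_id}-#{element_id}.#{extension}"
end

# A slightly skewed quadrilateral in relative coordinates.
corners = [
  SketchPoint.new(0.10, 0.20), # top left
  SketchPoint.new(0.60, 0.25), # top right
  SketchPoint.new(0.58, 0.70), # bottom right
  SketchPoint.new(0.12, 0.68), # bottom left
]

puts bounding_box(corners).inspect
puts extracted_file_name('invoice.pdf', 1, 0, 'pdf')
```

Note that in the method above the generated name reuses `file_extension` unchanged, so a crop taken from a PDF source is converted with `format('jpg')` while its file name still ends in `.pdf`.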
.extract_multiple_images_from_source(input_source, page_id, polygons) ⇒ Array<Mindee::Image::ExtractedImage>
Extracts multiple images from a given local input source, one for each polygon in polygons.
    # File 'lib/mindee/image/image_extractor.rb', line 38

    def self.extract_multiple_images_from_source(input_source, page_id, polygons)
      new_stream = load_input_source_pdf_page_as_stringio(input_source, page_id)
      new_stream.seek(0)

      extract_images_from_polygons(input_source, new_stream, page_id, polygons)
    end
.load_input_source_pdf_page_as_stringio(input_file, page_id) ⇒ StringIO
Loads a single page from an image file or a PDF document.
    # File 'lib/mindee/image/image_extractor.rb', line 109

    def self.load_input_source_pdf_page_as_stringio(input_file, page_id)
      input_file.io_stream.rewind
      if input_file.pdf?
        Mindee::PDF::PDFProcessor.get_page(Origami::PDF.read(input_file.io_stream), page_id)
      else
        input_file.io_stream
      end
    end
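The control flow here is: rewind the source stream, then either extract one page (for PDFs) or hand back the stream itself (for images). The sketch below mirrors only that branching. `FakeSource` and `load_page_stream` are hypothetical stand-ins, and the page extraction is delegated to a caller-supplied block instead of the gem's PDF processor.

```ruby
require 'stringio'

# Hypothetical stand-in for a local input source; the real class lives in the gem.
FakeSource = Struct.new(:io_stream, :is_pdf) do
  def pdf?
    is_pdf
  end
end

# Mirrors the control flow only: rewind, then either extract one page
# (delegated to the block) or return the stream untouched.
def load_page_stream(source, page_id)
  source.io_stream.rewind
  if source.pdf?
    yield(source.io_stream, page_id)
  else
    source.io_stream
  end
end

image = FakeSource.new(StringIO.new('raw image bytes'), false)
image.io_stream.read # move the cursor to the end on purpose

# For a non-PDF source the block is never called.
stream = load_page_stream(image, 0) { |_io, _page| raise 'not reached for images' }
puts stream.read
```

In the image branch the returned object is the source's own stream rather than a copy, so the caller and the source share one cursor position.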