Module: Mindee::Image::ImageExtractor
Defined in: lib/mindee/image/image_extractor.rb
Overview
Image Extraction wrapper class.
Class Method Summary
- .attach_image_as_new_file(input_buffer, format: 'jpg') ⇒ Origami::PDF
  Converts an image into a one-page PDF document and returns it as an Origami::PDF.
- .create_extracted_image(buffer, file_name, page_id, element_id) ⇒ Object
  Generates an ExtractedImage.
- .extract_images_from_polygons(input_source, pdf_stream, page_id, polygons) ⇒ Array<Mindee::Image::ExtractedImage>
  Extracts images from their positions on a file (as polygons).
- .extract_multiple_images_from_source(input_source, page_id, polygons) ⇒ Array<Mindee::Image::ExtractedImage>
  Extracts multiple images from a given local input source.
- .load_input_source_pdf_page_as_stringio(input_file, page_id) ⇒ StringIO
  Loads a single page from an image file or a PDF document.
Class Method Details
.attach_image_as_new_file(input_buffer, format: 'jpg') ⇒ Origami::PDF
Converts an image into a one-page PDF document and returns it as an Origami::PDF.
    # File 'lib/mindee/image/image_extractor.rb', line 19
    def self.attach_image_as_new_file(input_buffer, format: 'jpg')
      magick_image = MiniMagick::Image.read(input_buffer)
      # NOTE: some jpeg images get rendered as three different versions of themselves per output if the format isn't
      # converted.
      magick_image.format(format)
      original_density = magick_image.resolution
      scale_factor = original_density[0].to_f / 4.166666 # No clue why the resolution needs to be reduced for
      # the pdf otherwise the resulting image shrinks.
      magick_image.format('pdf', 0, { density: scale_factor.to_s })
      Origami::PDF.read(StringIO.new(magick_image.to_blob))
    end
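The divisor 4.166666 in this method is very close to 300 / 72, so a 300 DPI source ends up at roughly 72, the number of PDF points per inch. Whether that is the intended reasoning is an assumption; the source comment itself says the reason is unknown. A minimal sketch of the arithmetic alone follows, using a hypothetical helper named `pdf_density` and a plain array in place of the MiniMagick resolution:

```ruby
# Hypothetical helper mirroring only the arithmetic in attach_image_as_new_file.
# `resolution` stands in for the [x_density, y_density] pair read from the image.
PDF_DENSITY_DIVISOR = 4.166666

def pdf_density(resolution)
  # Only the horizontal density is used, as in the original method.
  resolution[0].to_f / PDF_DENSITY_DIVISOR
end

puts pdf_density([300, 300]).round(3) # a 300 DPI image
puts pdf_density([72, 72]).round(3)   # a 72 DPI image
```

Because only the first element of the resolution is read, an image with different horizontal and vertical densities would be scaled from its horizontal value alone.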
.create_extracted_image(buffer, file_name, page_id, element_id) ⇒ Object
Generates an ExtractedImage.
    # File 'lib/mindee/image/image_extractor.rb', line 95
    def self.create_extracted_image(buffer, file_name, page_id, element_id)
      buffer.rewind
      ExtractedImage.new(
        Mindee::Input::Source::BytesInputSource.new(buffer.read.to_s, file_name),
        page_id,
        element_id
      )
    end
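The `buffer.rewind` call matters because a buffer that has just been written to has its cursor at the end, so an immediate `read` would return nothing. A small sketch with a plain `StringIO` shows the difference; `read_from_start` is a hypothetical helper and not part of the library:

```ruby
require 'stringio'

# Hypothetical helper: returns the full contents of a buffer,
# regardless of where its cursor currently sits.
def read_from_start(buffer)
  buffer.rewind
  buffer.read.to_s
end

buffer = StringIO.new
buffer.write('fake image bytes')      # writing leaves the cursor at the end

puts buffer.read.inspect              # nothing left to read from this position
puts read_from_start(buffer).inspect  # the full contents after rewinding
```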
.extract_images_from_polygons(input_source, pdf_stream, page_id, polygons) ⇒ Array<Mindee::Image::ExtractedImage>
Extracts images from their positions on a file (as polygons).
    # File 'lib/mindee/image/image_extractor.rb', line 52
    def self.extract_images_from_polygons(input_source, pdf_stream, page_id, polygons)
      extracted_elements = []

      polygons.each_with_index do |polygon, element_id|
        polygon = ImageUtils.normalize_polygon(polygon)
        page_content = ImageUtils.read_page_content(pdf_stream)

        min_max_x = Geometry.get_min_max_x([
                                             polygon.top_left,
                                             polygon.bottom_right,
                                             polygon.top_right,
                                             polygon.bottom_left,
                                           ])
        min_max_y = Geometry.get_min_max_y([
                                             polygon.top_left,
                                             polygon.bottom_right,
                                             polygon.top_right,
                                             polygon.bottom_left,
                                           ])
        file_extension = ImageUtils.determine_file_extension(input_source)
        cropped_image = ImageUtils.crop_image(page_content, min_max_x, min_max_y)
        if file_extension == 'pdf'
          cropped_image.format('jpg')
        else
          cropped_image.format(file_extension.to_s)
        end

        buffer = StringIO.new
        ImageUtils.write_image_to_buffer(cropped_image, buffer)
        file_name = "#{input_source.filename}_page#{page_id}-#{element_id}.#{file_extension}"

        extracted_elements << create_extracted_image(buffer, file_name, page_id, element_id)
      end

      extracted_elements
    end
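Each polygon is reduced to its axis-aligned bounding box (the minimum and maximum x and y over its four corners) before cropping. The sketch below illustrates that step with stand-in types. `Point`, `Quad`, `MinMax`, and `crop_rectangle` are hypothetical names, not the library's `Geometry` or `ImageUtils` API, and the scaling step assumes corner coordinates are relative values between 0.0 and 1.0:

```ruby
# Stand-in types; the real polygon and Geometry classes live in the Mindee gem
# and may differ from this sketch.
Point = Struct.new(:x, :y)
Quad = Struct.new(:top_left, :top_right, :bottom_right, :bottom_left)
MinMax = Struct.new(:min, :max)

def min_max_x(points)
  xs = points.map(&:x)
  MinMax.new(xs.min, xs.max)
end

def min_max_y(points)
  ys = points.map(&:y)
  MinMax.new(ys.min, ys.max)
end

# Assumption: coordinates are relative (0.0 to 1.0) and are scaled by the
# page size in pixels to obtain the crop rectangle.
def crop_rectangle(quad, page_width, page_height)
  corners = quad.to_a
  x = min_max_x(corners)
  y = min_max_y(corners)
  {
    left: (x.min * page_width).round,
    top: (y.min * page_height).round,
    width: ((x.max - x.min) * page_width).round,
    height: ((y.max - y.min) * page_height).round,
  }
end

quad = Quad.new(Point.new(0.1, 0.2), Point.new(0.5, 0.25),
                Point.new(0.55, 0.6), Point.new(0.12, 0.58))
p crop_rectangle(quad, 1000, 2000)
```

A bounding box keeps the crop rectangular even when the detected polygon is skewed, at the cost of including some pixels outside the polygon itself.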
.extract_multiple_images_from_source(input_source, page_id, polygons) ⇒ Array<Mindee::Image::ExtractedImage>
Extracts multiple images from a given local input source, one for each polygon in polygons.
    # File 'lib/mindee/image/image_extractor.rb', line 38
    def self.extract_multiple_images_from_source(input_source, page_id, polygons)
      new_stream = load_input_source_pdf_page_as_stringio(input_source, page_id)
      new_stream.seek(0)

      extract_images_from_polygons(input_source, new_stream, page_id, polygons)
    end
.load_input_source_pdf_page_as_stringio(input_file, page_id) ⇒ StringIO
Loads a single page from an image file or a PDF document.
    # File 'lib/mindee/image/image_extractor.rb', line 109
    def self.load_input_source_pdf_page_as_stringio(input_file, page_id)
      input_file.io_stream.rewind
      if input_file.pdf?
        Mindee::PDF::PDFProcessor.get_page(Origami::PDF.read(input_file.io_stream), page_id)
      else
        input_file.io_stream
      end
    end
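The branching in this method can be sketched without the PDF dependencies. In the example below, `FakeSource`, `load_page_stream`, and the `page_reader` lambda are hypothetical stand-ins; in particular, `page_reader` only takes the place of the PDF page extraction step and does not reproduce it:

```ruby
require 'stringio'

# Hypothetical stand-in for an input source; the real class is part of the Mindee gem.
FakeSource = Struct.new(:io_stream, :is_pdf) do
  def pdf?
    is_pdf
  end
end

# Mirrors the branching only. `page_reader` is a placeholder for the PDF page
# extraction step, which this sketch does not implement.
def load_page_stream(source, page_id, page_reader)
  source.io_stream.rewind
  if source.pdf?
    page_reader.call(source.io_stream, page_id)
  else
    source.io_stream
  end
end

image = FakeSource.new(StringIO.new('image bytes'), false)
image.io_stream.read # move the cursor to the end on purpose
stream = load_page_stream(image, 0, ->(_io, _id) { raise 'not used for images' })

puts stream.equal?(image.io_stream) # the source's own stream, not a copy
puts stream.pos                     # cursor position after the rewind
```

For non-PDF input the caller receives the source's own stream object, so reading from the returned value also moves the source's cursor.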