Class: Mindee::Client

Inherits:
Object
Defined in:
lib/mindee/client.rb

Overview

Mindee API Client. See: developers.mindee.com/docs

Constructor Details

#initialize(api_key: '') ⇒ Client

Returns a new instance of Client.

Parameters:

  • api_key (String) (defaults to: '')


13
14
15
# File 'lib/mindee/client.rb', line 13

def initialize(api_key: '')
  @api_key = api_key
end
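
Since the constructor only stores the key it is given, the caller decides where the key comes from. A minimal sketch of one way to do that, assuming the key is kept in an environment variable; the `resolve_api_key` helper and the `MINDEE_API_KEY` variable name are illustrative assumptions, not something this page documents:

```ruby
# Hypothetical helper: prefer an explicit key, otherwise fall back to the
# environment. MINDEE_API_KEY is an assumed variable name.
def resolve_api_key(explicit_key = nil, env = ENV)
  key = explicit_key.to_s.strip
  key = env.fetch('MINDEE_API_KEY', '').strip if key.empty?
  raise ArgumentError, 'no Mindee API key configured' if key.empty?

  key
end
```

The result would then be passed to the constructor as `Mindee::Client.new(api_key: resolve_api_key)`.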

Instance Method Details

#create_endpoint(endpoint_name: '', account_name: '', version: '') ⇒ Mindee::HTTP::Endpoint

Creates a custom endpoint with the given values. Not needed for standard (off-the-shelf) endpoints.

Parameters:

  • endpoint_name (String) (defaults to: '')

    For custom endpoints, the “API name” field in the “Settings” page of the API Builder. Do not set for standard (off the shelf) endpoints.

  • account_name (String) (defaults to: '')

    For custom endpoints, your account or organization username on the API Builder. This is normally not required unless you have a custom endpoint which has the same name as a standard (off the shelf) endpoint.

  • version (String) (defaults to: '')

    For custom endpoints, the version of the product.

Returns:

  • (Mindee::HTTP::Endpoint)

# File 'lib/mindee/client.rb', line 238

def create_endpoint(endpoint_name: '', account_name: '', version: '')
  initialize_endpoint(Mindee::Product::Custom::CustomV1, endpoint_name: endpoint_name, account_name: account_name,
                                                         version: version)
end

#enqueue(input_source, product_class, endpoint: nil, all_words: false, close_file: true, page_options: nil, cropper: false) ⇒ Mindee::Parsing::Common::ApiResponse

Enqueue a document for async parsing.

Parameters:

  • input_source (Mindee::Input::Source::LocalInputSource, Mindee::Input::Source::UrlInputSource)
  • product_class (Mindee::Product)

    class of the product

  • endpoint (HTTP::Endpoint, nil) (defaults to: nil)

    Endpoint of the API. Doesn’t need to be set in the case of OTS APIs.

  • all_words (Boolean) (defaults to: false)

    Whether to extract all the words on each page. This performs a full OCR operation on the server and will increase response time.

  • close_file (Boolean) (defaults to: true)

    Whether to close() the file after parsing it. Set to false if you need to access the file after this operation.

  • page_options (Hash, nil) (defaults to: nil)

    Page cutting/merge options:

    • :page_indexes Zero-based list of page indexes.

    • :operation Operation to apply on the document, given the page_indexes specified:

      • :KEEP_ONLY - keep only the specified pages, and remove all others.

      • :REMOVE - remove the specified pages, and keep all others.

    • :on_min_pages Apply the operation only if document has at least this many pages.

  • cropper (Boolean) (defaults to: false)

    Whether to include cropper results for each page. This performs a cropping operation on the server and will increase response time.

Returns:

  • (Mindee::Parsing::Common::ApiResponse)

# File 'lib/mindee/client.rb', line 86

def enqueue(
  input_source,
  product_class,
  endpoint: nil,
  all_words: false,
  close_file: true,
  page_options: nil,
  cropper: false
)
  if input_source.is_a?(Mindee::Input::Source::LocalInputSource) && !page_options.nil? && input_source.pdf?
    input_source.process_pdf(page_options)
  end
  endpoint = initialize_endpoint(product_class) if endpoint.nil?
  prediction, raw_http = endpoint.predict_async(input_source, all_words, close_file, cropper)
  Mindee::Parsing::Common::ApiResponse.new(product_class,
                                           prediction, raw_http)
end
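
The page_options keys described above can be illustrated with a small sketch. The hash uses only the documented keys; the `remaining_pages` helper is hypothetical and only models the documented semantics on a list of page indexes. It is not the library's `process_pdf` implementation, and it assumes non-negative indexes.

```ruby
# A page_options hash built from the documented keys.
page_options = {
  page_indexes: [0, 1],   # zero-based page indexes
  operation: :KEEP_ONLY,  # or :REMOVE
  on_min_pages: 3         # skip the operation for shorter documents
}

# Hypothetical helper: which page indexes would remain after applying
# the options to a document with `page_count` pages.
def remaining_pages(page_count, options)
  all_pages = (0...page_count).to_a
  # Documents below the page threshold are left untouched.
  return all_pages if page_count < options.fetch(:on_min_pages, 0)

  selected = options.fetch(:page_indexes) & all_pages
  case options.fetch(:operation)
  when :KEEP_ONLY then selected
  when :REMOVE then all_pages - selected
  else raise ArgumentError, "unknown operation: #{options[:operation].inspect}"
  end
end

puts remaining_pages(5, page_options).inspect
```

A hash like this would be passed as `page_options: page_options` to `enqueue`, `enqueue_and_parse` or `parse`. Per the method body above, it only takes effect for local PDF inputs.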

#enqueue_and_parse(input_source, product_class, endpoint: nil, all_words: false, close_file: true, page_options: nil, cropper: false, initial_delay_sec: 4, delay_sec: 2, max_retries: 60) ⇒ Mindee::Parsing::Common::ApiResponse

Enqueue a document for async parsing, then poll automatically until the result is available or the retry limit is reached.

Parameters:

  • input_source (Mindee::Input::Source::LocalInputSource, Mindee::Input::Source::UrlInputSource)
  • product_class (Mindee::Product)

    class of the product

  • endpoint (HTTP::Endpoint, nil) (defaults to: nil)

    Endpoint of the API. Doesn’t need to be set in the case of OTS APIs.

  • all_words (Boolean) (defaults to: false)

    Whether to extract all the words on each page. This performs a full OCR operation on the server and will increase response time.

  • close_file (Boolean) (defaults to: true)

    Whether to close() the file after parsing it. Set to false if you need to access the file after this operation.

  • page_options (Hash, nil) (defaults to: nil)

    Page cutting/merge options:

    • :page_indexes Zero-based list of page indexes.

    • :operation Operation to apply on the document, given the page_indexes specified:

      • :KEEP_ONLY - keep only the specified pages, and remove all others.

      • :REMOVE - remove the specified pages, and keep all others.

    • :on_min_pages Apply the operation only if document has at least this many pages.

  • cropper (Boolean, nil) (defaults to: false)

    Whether to include cropper results for each page. This performs a cropping operation on the server and will increase response time.

  • initial_delay_sec (Integer, Float) (defaults to: 4)

    Delay, in seconds, before the first polling attempt.

  • delay_sec (Integer, Float) (defaults to: 2)

    Delay, in seconds, between polling attempts.

  • max_retries (Integer) (defaults to: 60)

    Maximum number of polling attempts.

Returns:

  • (Mindee::Parsing::Common::ApiResponse)

# File 'lib/mindee/client.rb', line 145

def enqueue_and_parse(
  input_source,
  product_class,
  endpoint: nil,
  all_words: false,
  close_file: true,
  page_options: nil,
  cropper: false,
  initial_delay_sec: 4,
  delay_sec: 2,
  max_retries: 60
)
  enqueue_res = enqueue(
    input_source,
    product_class,
    endpoint: endpoint,
    all_words: all_words,
    close_file: close_file,
    page_options: page_options,
    cropper: cropper
  )
  sleep(initial_delay_sec)
  polling_attempts = 1
  job_id = enqueue_res.job.id
  queue_res = parse_queued(job_id, product_class, endpoint: endpoint)
  while queue_res.job.status != Mindee::Parsing::Common::JobStatus::COMPLETED && polling_attempts < max_retries
    sleep(delay_sec)
    queue_res = parse_queued(job_id, product_class, endpoint: endpoint)
    polling_attempts += 1
  end
  if queue_res.job.status != Mindee::Parsing::Common::JobStatus::COMPLETED
    elapsed = initial_delay_sec + (polling_attempts * delay_sec)
    raise "Asynchronous parsing request timed out after #{elapsed} seconds (#{polling_attempts} tries)"
  end

  queue_res
end
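
The control flow of the polling loop above can be sketched in isolation. This is a simplified stand-in rather than the library's code: `poll_until_done`, `max_sleep_seconds` and the `:done` result key are hypothetical names, the block stands in for the call to `parse_queued`, and the `sleeper` argument exists only so the loop can be exercised without real waiting.

```ruby
# Hypothetical stand-in for the polling loop: one initial delay, a first
# attempt, then further attempts separated by delay_sec until the block
# reports completion or max_retries attempts have been made.
def poll_until_done(initial_delay_sec: 4, delay_sec: 2, max_retries: 60, sleeper: Kernel.method(:sleep))
  sleeper.call(initial_delay_sec)
  attempts = 1
  result = yield
  while !result[:done] && attempts < max_retries
    sleeper.call(delay_sec)
    result = yield
    attempts += 1
  end
  raise "timed out after #{attempts} tries" unless result[:done]

  result
end

# Longest total time spent sleeping before a timeout, excluding request time:
# one initial delay plus a delay before every attempt after the first.
def max_sleep_seconds(initial_delay_sec: 4, delay_sec: 2, max_retries: 60)
  initial_delay_sec + ((max_retries - 1) * delay_sec)
end
```

With the default values this gives 4 + 59 × 2 = 122 seconds of waiting before the timeout. By this reading of the loop, the elapsed time in the error message above multiplies `delay_sec` by the number of attempts rather than the number of sleeps between them, so it would report one extra `delay_sec`.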

#parse(input_source, product_class, endpoint: nil, all_words: false, close_file: true, page_options: nil, cropper: false) ⇒ Mindee::Parsing::Common::ApiResponse

Call the prediction API on a document and parse the results.

Parameters:

  • input_source (Mindee::Input::Source::LocalInputSource, Mindee::Input::Source::UrlInputSource)
  • product_class (Mindee::Product)

    class of the product

  • endpoint (HTTP::Endpoint, nil) (defaults to: nil)

    Endpoint of the API. Doesn’t need to be set in the case of OTS APIs.

  • all_words (Boolean) (defaults to: false)

    Whether to include the full text for each page. This performs a full OCR operation on the server and will increase response time.

  • close_file (Boolean) (defaults to: true)

    Whether to close() the file after parsing it. Set to false if you need to access the file after this operation.

  • page_options (Hash, nil) (defaults to: nil)

    Page cutting/merge options:

    • :page_indexes Zero-based list of page indexes.

    • :operation Operation to apply on the document, given the page_indexes specified:

      • :KEEP_ONLY - keep only the specified pages, and remove all others.

      • :REMOVE - remove the specified pages, and keep all others.

    • :on_min_pages Apply the operation only if document has at least this many pages.

  • cropper (Boolean) (defaults to: false)

    Whether to include cropper results for each page. This performs a cropping operation on the server and will increase response time.

Returns:

  • (Mindee::Parsing::Common::ApiResponse)

# File 'lib/mindee/client.rb', line 43

def parse(
  input_source,
  product_class,
  endpoint: nil,
  all_words: false,
  close_file: true,
  page_options: nil,
  cropper: false
)
  if input_source.is_a?(Mindee::Input::Source::LocalInputSource) && !page_options.nil? && input_source.pdf?
    input_source.process_pdf(page_options)
  end
  endpoint = initialize_endpoint(product_class) if endpoint.nil?
  prediction, raw_http = endpoint.predict(input_source, all_words, close_file, cropper)
  Mindee::Parsing::Common::ApiResponse.new(product_class, prediction, raw_http)
end

#parse_queued(job_id, product_class, endpoint: nil) ⇒ Mindee::Parsing::Common::ApiResponse

Parses a queued document.

Parameters:

  • job_id (String)

    ID of the job (queue) to poll.

  • product_class (Mindee::Product)

    class of the product

  • endpoint (HTTP::Endpoint, nil) (defaults to: nil)

    Endpoint of the API. Doesn’t need to be set in the case of OTS APIs.

Returns:

  • (Mindee::Parsing::Common::ApiResponse)

# File 'lib/mindee/client.rb', line 112

def parse_queued(
  job_id,
  product_class,
  endpoint: nil
)
  endpoint = initialize_endpoint(product_class) if endpoint.nil?
  prediction, raw_http = endpoint.parse_async(job_id)
  Mindee::Parsing::Common::ApiResponse.new(product_class, prediction, raw_http)
end

#source_from_b64string(base64_string, filename, fix_pdf: false) ⇒ Mindee::Input::Source::Base64InputSource

Load a document from a base64 encoded string.

Parameters:

  • base64_string (String)

    Input to parse as base64 string

  • filename (String)

    The name of the file (without the path)

  • fix_pdf (Boolean) (defaults to: false)

    If true, attempts to repair a broken PDF.

Returns:

  • (Mindee::Input::Source::Base64InputSource)

# File 'lib/mindee/client.rb', line 207

def source_from_b64string(base64_string, filename, fix_pdf: false)
  Input::Source::Base64InputSource.new(base64_string, filename, fix_pdf: fix_pdf)
end
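
A base64 string suitable for this method can be produced with Ruby's standard library. The sketch below is an illustration only: `encode_document` is a hypothetical helper, and the choice of strict encoding (no line breaks) is an assumption, since this page does not say which base64 variants are accepted.

```ruby
require 'base64'

# Hypothetical helper: encode raw document bytes as a single-line
# base64 string (strict encoding adds no newlines).
def encode_document(raw_bytes)
  Base64.strict_encode64(raw_bytes)
end

encoded = encode_document("%PDF-1.4\n".b)
puts encoded
```

The encoded string would then be passed as `client.source_from_b64string(encoded, 'invoice.pdf')`, where `client` is a `Mindee::Client` instance and the filename is a placeholder.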

#source_from_bytes(input_bytes, filename, fix_pdf: false) ⇒ Mindee::Input::Source::BytesInputSource

Load a document from raw bytes.

Parameters:

  • input_bytes (String)

    Encoding::BINARY byte input

  • filename (String)

    The name of the file (without the path)

  • fix_pdf (Boolean) (defaults to: false)

    If true, attempts to repair a broken PDF.

Returns:

  • (Mindee::Input::Source::BytesInputSource)

# File 'lib/mindee/client.rb', line 198

def source_from_bytes(input_bytes, filename, fix_pdf: false)
  Input::Source::BytesInputSource.new(input_bytes, filename, fix_pdf: fix_pdf)
end
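
Because input_bytes is expected to be a binary-encoded String, the file should be read in binary mode. A minimal sketch using only the standard library; `read_document_bytes` is a hypothetical helper and the temporary file merely stands in for a real document:

```ruby
require 'tempfile'

# Hypothetical helper: read a file as Encoding::BINARY bytes and return
# them together with the bare filename (no directory part).
def read_document_bytes(path)
  [File.binread(path), File.basename(path)]
end

Tempfile.create(['sample', '.pdf']) do |file|
  file.binmode
  file.write("%PDF-1.4\n")
  file.flush
  bytes, filename = read_document_bytes(file.path)
  puts "#{filename}: #{bytes.bytesize} bytes, #{bytes.encoding}"
end
```

The pair would then be passed as `client.source_from_bytes(bytes, filename)`, where `client` is a `Mindee::Client` instance.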

#source_from_file(input_file, filename, fix_pdf: false) ⇒ Mindee::Input::Source::FileInputSource

Load a document from a normal Ruby File.

Parameters:

  • input_file (File)

    Input file handle

  • filename (String)

    The name of the file (without the path)

  • fix_pdf (Boolean) (defaults to: false)

    If true, attempts to repair a broken PDF.

Returns:

  • (Mindee::Input::Source::FileInputSource)

# File 'lib/mindee/client.rb', line 216

def source_from_file(input_file, filename, fix_pdf: false)
  Input::Source::FileInputSource.new(input_file, filename, fix_pdf: fix_pdf)
end

#source_from_path(input_path, fix_pdf: false) ⇒ Mindee::Input::Source::PathInputSource

Load a document from an absolute path, as a string.

Parameters:

  • input_path (String)

    Path of file to open

  • fix_pdf (Boolean) (defaults to: false)

    If true, attempts to repair a broken PDF.

Returns:

  • (Mindee::Input::Source::PathInputSource)

# File 'lib/mindee/client.rb', line 189

def source_from_path(input_path, fix_pdf: false)
  Input::Source::PathInputSource.new(input_path, fix_pdf: fix_pdf)
end
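
Since the description above asks for an absolute path, a relative path can be expanded first. A minimal sketch; `absolute_input_path` is a hypothetical helper, and whether the library itself resolves relative paths is not stated on this page:

```ruby
# Hypothetical helper: expand a possibly relative path against a base
# directory (the current working directory by default).
def absolute_input_path(path, base_dir = Dir.pwd)
  File.expand_path(path, base_dir)
end

puts absolute_input_path('docs/invoice.pdf')
```

The result would then be passed as `client.source_from_path(absolute_input_path('docs/invoice.pdf'))`, where `client` is a `Mindee::Client` instance and the path is a placeholder.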

#source_from_url(url) ⇒ Mindee::Input::Source::UrlInputSource

Load a document from a secure remote source (HTTPS).

Parameters:

  • url (String)

    URL of the file

Returns:

  • (Mindee::Input::Source::UrlInputSource)

# File 'lib/mindee/client.rb', line 223

def source_from_url(url)
  Input::Source::UrlInputSource.new(url)
end
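
As the description above calls for an HTTPS source, a caller may want to check the scheme before building the input. A minimal sketch with the standard `uri` library; `https_url?` is a hypothetical helper, and this page does not say whether `UrlInputSource` performs its own validation:

```ruby
require 'uri'

# Hypothetical helper: true only for well-formed URLs with the https scheme.
def https_url?(url)
  URI.parse(url).is_a?(URI::HTTPS)
rescue URI::InvalidURIError
  false
end

puts https_url?('https://example.com/invoice.pdf')
```

A URL that passes the check would then be given to `client.source_from_url(url)`, where `client` is a `Mindee::Client` instance.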