Class: Mindee::Client

Inherits:
Object
Defined in:
lib/mindee/client.rb

Overview

Mindee API Client. See: developers.mindee.com/docs

Constructor Details

#initialize(api_key: '') ⇒ Client

Returns a new instance of Client.

Parameters:

  • api_key (String) (defaults to: '')


# File 'lib/mindee/client.rb', line 13

def initialize(api_key: '')
  @api_key = api_key
end

Instance Method Details

#create_endpoint(endpoint_name: '', account_name: '', version: '') ⇒ Mindee::HTTP::Endpoint

Creates a custom endpoint with the given values. Use it only for custom endpoints; it is not needed for standard (off the shelf) endpoints.

Parameters:

  • endpoint_name (String) (defaults to: '')

    For custom endpoints, the “API name” field in the “Settings” page of the API Builder. Do not set for standard (off the shelf) endpoints.

  • account_name (String) (defaults to: '')

    For custom endpoints, your account or organization username on the API Builder. This is normally not required unless you have a custom endpoint which has the same name as a standard (off the shelf) endpoint.

  • version (String) (defaults to: '')

    For custom endpoints, the version of the product.

Returns:

  • (Mindee::HTTP::Endpoint)

# File 'lib/mindee/client.rb', line 302

def create_endpoint(endpoint_name: '', account_name: '', version: '')
  initialize_endpoint(Mindee::Product::Custom::CustomV1, endpoint_name: endpoint_name, account_name: account_name,
                                                         version: version)
end

#enqueue(input_source, product_class, endpoint: nil, all_words: false, full_text: false, close_file: true, page_options: nil, cropper: false) ⇒ Mindee::Parsing::Common::ApiResponse

Enqueue a document for asynchronous parsing.

The endpoint parameter doesn't need to be set in the case of OTS (off-the-shelf) APIs.

Parameters:

  • input_source (Mindee::Input::Source::LocalInputSource, Mindee::Input::Source::UrlInputSource)
  • product_class (Mindee::Inference)

    Class of the product.

  • endpoint (HTTP::Endpoint, nil) (defaults to: nil)

    Endpoint of the API.

  • all_words (Boolean) (defaults to: false)

    Whether to extract all the words on each page. This performs a full OCR operation on the server and will increase response time.

  • full_text (Boolean) (defaults to: false)

    Whether to include the full OCR text response in compatible APIs. This performs a full OCR operation on the server and may increase response time.

  • close_file (Boolean) (defaults to: true)

    Whether to close() the file after parsing it. Set to false if you need to access the file after this operation.

  • page_options (Hash, nil) (defaults to: nil)

    Page cutting/merge options:

    • :page_indexes Zero-based list of page indexes.

    • :operation Operation to apply on the document, given the :page_indexes specified:

      • :KEEP_ONLY - keep only the specified pages, and remove all others.

      • :REMOVE - remove the specified pages, and keep all others.

    • :on_min_pages Apply the operation only if the document has at least this many pages.

  • cropper (Boolean) (defaults to: false)

    Whether to include cropper results for each page. This performs a cropping operation on the server and will increase response time.

Returns:

  • (Mindee::Parsing::Common::ApiResponse)

# File 'lib/mindee/client.rb', line 93

def enqueue(
  input_source,
  product_class,
  endpoint: nil,
  all_words: false,
  full_text: false,
  close_file: true,
  page_options: nil,
  cropper: false
)
  if input_source.is_a?(Mindee::Input::Source::LocalInputSource) && !page_options.nil? && input_source.pdf?
    input_source.process_pdf(page_options)
  end
  endpoint = initialize_endpoint(product_class) if endpoint.nil?
  prediction, raw_http = endpoint.predict_async(input_source, all_words, full_text, close_file, cropper)
  Mindee::Parsing::Common::ApiResponse.new(product_class,
                                           prediction, raw_http)
end
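The page_options hash described above is easiest to picture as a page filter. The sketch below is a pure-Ruby model of that selection logic, not the client's implementation: the client applies the options to the PDF itself through process_pdf, and per the source above only for local PDF inputs. The helper name selected_pages, treating a missing :on_min_pages as 0, and silently dropping out-of-range indexes are assumptions made for illustration.

```ruby
# Hypothetical model of the documented page_options semantics.
# Returns the zero-based indexes of the pages that would remain.
def selected_pages(page_count, page_options)
  all_pages = (0...page_count).to_a

  # The operation is applied only if the document has enough pages
  # (assumption: a missing :on_min_pages means "always apply").
  return all_pages if page_count < page_options.fetch(:on_min_pages, 0)

  # Assumption: indexes outside the document are ignored.
  targets = page_options.fetch(:page_indexes).select { |i| i >= 0 && i < page_count }

  case page_options.fetch(:operation)
  when :KEEP_ONLY then all_pages & targets # keep only the listed pages
  when :REMOVE then all_pages - targets    # drop the listed pages
  else raise ArgumentError, "unknown operation: #{page_options[:operation].inspect}"
  end
end

keep_first_and_last = { page_indexes: [0, 4], operation: :KEEP_ONLY, on_min_pages: 2 }
selected_pages(5, keep_first_and_last)                           # => [0, 4]
selected_pages(5, keep_first_and_last.merge(operation: :REMOVE)) # => [1, 2, 3]
selected_pages(1, keep_first_and_last)                           # => [0] (too short, left unchanged)
```

The same hash shape is what you would pass as page_options: to parse, enqueue or enqueue_and_parse.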

#enqueue_and_parse(input_source, product_class, endpoint: nil, all_words: false, full_text: false, close_file: true, page_options: nil, cropper: false, initial_delay_sec: 2, delay_sec: 1.5, max_retries: 80) ⇒ Mindee::Parsing::Common::ApiResponse

Enqueue a document for asynchronous parsing and automatically poll for its result.

Parameters:

  • input_source (Mindee::Input::Source::LocalInputSource, Mindee::Input::Source::UrlInputSource)
  • product_class (Mindee::Inference)

    Class of the product.

  • endpoint (HTTP::Endpoint, nil) (defaults to: nil)

    Endpoint of the API. Doesn’t need to be set in the case of OTS APIs.

  • all_words (Boolean) (defaults to: false)

    Whether to extract all the words on each page. This performs a full OCR operation on the server and will increase response time.

  • full_text (Boolean) (defaults to: false)

    Whether to include the full OCR text response in compatible APIs. This performs a full OCR operation on the server and may increase response time.

  • close_file (Boolean) (defaults to: true)

    Whether to close() the file after parsing it. Set to false if you need to access the file after this operation.

  • page_options (Hash, nil) (defaults to: nil)

    Page cutting/merge options:

    • :page_indexes Zero-based list of page indexes.

    • :operation Operation to apply on the document, given the :page_indexes specified:

      • :KEEP_ONLY - keep only the specified pages, and remove all others.

      • :REMOVE - remove the specified pages, and keep all others.

    • :on_min_pages Apply the operation only if the document has at least this many pages.

  • cropper (Boolean, nil) (defaults to: false)

    Whether to include cropper results for each page. This performs a cropping operation on the server and will increase response time.

  • initial_delay_sec (Integer, Float) (defaults to: 2)

    Delay, in seconds, before the first polling attempt.

  • delay_sec (Integer, Float) (defaults to: 1.5)

    Delay, in seconds, between polling attempts.

  • max_retries (Integer) (defaults to: 80)

    Maximum number of polling attempts.

Returns:

  • (Mindee::Parsing::Common::ApiResponse)

# File 'lib/mindee/client.rb', line 156

def enqueue_and_parse(
  input_source,
  product_class,
  endpoint: nil,
  all_words: false,
  full_text: false,
  close_file: true,
  page_options: nil,
  cropper: false,
  initial_delay_sec: 2,
  delay_sec: 1.5,
  max_retries: 80
)
  enqueue_res = enqueue(
    input_source,
    product_class,
    endpoint: endpoint,
    all_words: all_words,
    full_text: full_text,
    close_file: close_file,
    page_options: page_options,
    cropper: cropper
  )
  sleep(initial_delay_sec)
  polling_attempts = 1
  job_id = enqueue_res.job.id
  queue_res = parse_queued(job_id, product_class, endpoint: endpoint)
  while queue_res.job.status != Mindee::Parsing::Common::JobStatus::COMPLETED && polling_attempts < max_retries
    sleep(delay_sec)
    queue_res = parse_queued(job_id, product_class, endpoint: endpoint)
    polling_attempts += 1
  end
  if queue_res.job.status != Mindee::Parsing::Common::JobStatus::COMPLETED
    elapsed = initial_delay_sec + ((polling_attempts - 1) * delay_sec)
    raise "Asynchronous parsing request timed out after #{elapsed} seconds (#{polling_attempts} tries)"
  end

  queue_res
end
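The polling loop above can be modelled without any network access. In this sketch the block stands in for parse_queued (returning true once the job is COMPLETED) and sleeper replaces Kernel#sleep so the timing can be checked; poll_job, sleeper and the block are hypothetical names, not part of the client. The first poll happens after initial_delay_sec and each further poll after delay_sec, so with the defaults the worst-case sleeping time is 2 + 79 * 1.5 = 120.5 seconds, not counting HTTP round trips.

```ruby
# Hypothetical model of the enqueue_and_parse polling loop.
# The block plays the role of parse_queued and returns true when the job is done;
# `sleeper` defaults to Kernel#sleep but can be stubbed out.
def poll_job(initial_delay_sec: 2, delay_sec: 1.5, max_retries: 80, sleeper: ->(seconds) { sleep(seconds) })
  sleeper.call(initial_delay_sec)
  attempts = 1
  done = yield

  # Keep polling until the job is done or max_retries polls have been made.
  until done || attempts >= max_retries
    sleeper.call(delay_sec)
    done = yield
    attempts += 1
  end

  # Only sleeping time is counted: the initial delay plus one delay per extra poll.
  waited = initial_delay_sec + ((attempts - 1) * delay_sec)
  raise "Timed out after #{waited} seconds (#{attempts} tries)" unless done

  attempts
end

# Simulated job that completes on the third poll; nothing actually sleeps.
slept = []
calls = 0
attempts = poll_job(sleeper: ->(seconds) { slept << seconds }) { (calls += 1) >= 3 }
puts attempts # prints 3
p slept       # prints [2, 1.5, 1.5]
```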

#execute_workflow(input_source, workflow_id, document_alias: nil, priority: nil, full_text: false, public_url: nil, page_options: nil) ⇒ Mindee::Parsing::Common::WorkflowResponse

Sends a document to a workflow.

Parameters:

  • input_source (Mindee::Input::Source::LocalInputSource, Mindee::Input::Source::UrlInputSource)
  • workflow_id (String)

    ID of the workflow.

  • document_alias (String, nil) (defaults to: nil)

    Alias to give to the document.

  • priority (Symbol, nil) (defaults to: nil)

    Priority to give to the document.

  • full_text (Boolean) (defaults to: false)

    Whether to include the full OCR text response in compatible APIs. This performs a full OCR operation on the server and may increase response time.

  • public_url (String, nil) (defaults to: nil)

    A unique, encrypted URL for accessing the document validation interface without requiring authentication.

  • page_options (Hash, nil) (defaults to: nil)

    Page cutting/merge options:

    • :page_indexes Zero-based list of page indexes.

    • :operation Operation to apply on the document, given the :page_indexes specified:

      • :KEEP_ONLY - keep only the specified pages, and remove all others.

      • :REMOVE - remove the specified pages, and keep all others.

    • :on_min_pages Apply the operation only if the document has at least this many pages.

Returns:

  • (Mindee::Parsing::Common::WorkflowResponse)

# File 'lib/mindee/client.rb', line 218

def execute_workflow(
  input_source,
  workflow_id,
  document_alias: nil,
  priority: nil,
  full_text: false,
  public_url: nil,
  page_options: nil
)
  if input_source.is_a?(Mindee::Input::Source::LocalInputSource) && !page_options.nil? && input_source.pdf?
    input_source.process_pdf(page_options)
  end

  workflow_endpoint = Mindee::HTTP::WorkflowEndpoint.new(workflow_id, api_key: @api_key)
  prediction, raw_http = workflow_endpoint.execute_workflow(input_source, full_text, document_alias, priority,
                                                            public_url)
  Mindee::Parsing::Common::WorkflowResponse.new(Product::Generated::GeneratedV1,
                                                prediction, raw_http)
end

#load_prediction(product_class, local_response) ⇒ Mindee::Parsing::Common::ApiResponse

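Load a prediction from a local response.

Parameters:

  • product_class (Mindee::Inference)

    Class of the product.

  • local_response

    Local response to load. Its as_hash value is used as the prediction.

Returns:

  • (Mindee::Parsing::Common::ApiResponse)

Raises:

  • (RuntimeError) 'No prediction found in local response.' if building the response raises a KeyError.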

# File 'lib/mindee/client.rb', line 243

def load_prediction(product_class, local_response)
  Mindee::Parsing::Common::ApiResponse.new(product_class, local_response.as_hash, local_response.as_hash.to_json)
rescue KeyError
  raise 'No prediction found in local response.'
end
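load_prediction relies on a common Ruby pattern: build the response from a Hash and translate a KeyError into a clearer error. The sketch below shows that pattern on its own with the standard json library. The helper name prediction_hash_from_json and the assumption that the prediction lives under a top-level 'document' key are illustrative only; the real lookup happens inside ApiResponse, which is not shown on this page.

```ruby
require 'json'

# Hypothetical helper: parses a saved API payload and returns the prediction Hash.
# A missing key (KeyError from Hash#fetch) is re-raised as a RuntimeError with a
# clearer message, mirroring the rescue clause in load_prediction.
def prediction_hash_from_json(json_string)
  JSON.parse(json_string).fetch('document') # assumption: prediction stored under 'document'
rescue KeyError
  raise 'No prediction found in local response.'
end

saved = '{"document": {"id": "abc"}}'
prediction_hash_from_json(saved)['id'] # => "abc"
```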

#parse(input_source, product_class, endpoint: nil, all_words: false, full_text: false, close_file: true, page_options: nil, cropper: false) ⇒ Mindee::Parsing::Common::ApiResponse

Call the prediction API on a document and parse the results.

The endpoint parameter doesn't need to be set in the case of OTS (off-the-shelf) APIs.

Parameters:

  • input_source (Mindee::Input::Source::LocalInputSource, Mindee::Input::Source::UrlInputSource)
  • product_class (Mindee::Inference)

    Class of the product.

  • endpoint (HTTP::Endpoint, nil) (defaults to: nil)

    Endpoint of the API.

  • all_words (Boolean) (defaults to: false)

    Whether to extract all the words on each page. This performs a full OCR operation on the server and will increase response time.

  • full_text (Boolean) (defaults to: false)

    Whether to include the full OCR text response in compatible APIs. This performs a full OCR operation on the server and may increase response time.

  • close_file (Boolean) (defaults to: true)

    Whether to close() the file after parsing it. Set to false if you need to access the file after this operation.

  • page_options (Hash, nil) (defaults to: nil)

    Page cutting/merge options:

    • :page_indexes Zero-based list of page indexes.

    • :operation Operation to apply on the document, given the :page_indexes specified:

      • :KEEP_ONLY - keep only the specified pages, and remove all others.

      • :REMOVE - remove the specified pages, and keep all others.

    • :on_min_pages Apply the operation only if the document has at least this many pages.

  • cropper (Boolean) (defaults to: false)

    Whether to include cropper results for each page. This performs a cropping operation on the server and will increase response time.

Returns:

  • (Mindee::Parsing::Common::ApiResponse)

# File 'lib/mindee/client.rb', line 46

def parse(
  input_source,
  product_class,
  endpoint: nil,
  all_words: false,
  full_text: false,
  close_file: true,
  page_options: nil,
  cropper: false
)
  if input_source.is_a?(Mindee::Input::Source::LocalInputSource) && !page_options.nil? && input_source.pdf?
    input_source.process_pdf(page_options)
  end
  endpoint = initialize_endpoint(product_class) if endpoint.nil?
  prediction, raw_http = endpoint.predict(input_source, all_words, full_text, close_file, cropper)
  Mindee::Parsing::Common::ApiResponse.new(product_class, prediction, raw_http)
end

#parse_queued(job_id, product_class, endpoint: nil) ⇒ Mindee::Parsing::Common::ApiResponse

Parses a queued document.

The endpoint parameter doesn't need to be set in the case of OTS (off-the-shelf) APIs.

Parameters:

  • job_id (String)

    ID of the job (queue) to poll.

  • product_class (Mindee::Inference)

    Class of the product.

  • endpoint (HTTP::Endpoint, nil) (defaults to: nil)

    Endpoint of the API.

Returns:

  • (Mindee::Parsing::Common::ApiResponse)

# File 'lib/mindee/client.rb', line 120

def parse_queued(
  job_id,
  product_class,
  endpoint: nil
)
  endpoint = initialize_endpoint(product_class) if endpoint.nil?
  prediction, raw_http = endpoint.parse_async(job_id)
  Mindee::Parsing::Common::ApiResponse.new(product_class, prediction, raw_http)
end

#source_from_b64string(base64_string, filename, fix_pdf: false) ⇒ Mindee::Input::Source::Base64InputSource

Load a document from a base64 encoded string.

Parameters:

  • base64_string (String)

    Input to parse, as a base64-encoded string.

  • filename (String)

    The name of the file (without the path).

  • fix_pdf (Boolean) (defaults to: false)

    If true, attempts to fix a broken PDF.

Returns:

  • (Mindee::Input::Source::Base64InputSource)

# File 'lib/mindee/client.rb', line 271

def source_from_b64string(base64_string, filename, fix_pdf: false)
  Input::Source::Base64InputSource.new(base64_string, filename, fix_pdf: fix_pdf)
end
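If the document is available as raw bytes, a strict (single-line) base64 string for source_from_b64string can be produced with core Ruby: Array#pack('m0') encodes and String#unpack1('m0') decodes, which is what Base64.strict_encode64 and Base64.strict_decode64 do internally. The helper names, sample bytes and file name below are placeholders; whether the client also accepts line-wrapped base64 is not stated here.

```ruby
# Encode raw bytes as strict base64 (no line breaks).
def to_strict_base64(bytes)
  [bytes].pack('m0')
end

# Decode strict base64 back to binary bytes; raises ArgumentError on invalid input.
def from_strict_base64(encoded)
  encoded.unpack1('m0')
end

sample = "%PDF-1.7\n".b # placeholder; real code might use File.binread('invoice.pdf')
encoded = to_strict_base64(sample)
# encoded could then be passed as: client.source_from_b64string(encoded, 'invoice.pdf')
```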

#source_from_bytes(input_bytes, filename, fix_pdf: false) ⇒ Mindee::Input::Source::BytesInputSource

Load a document from raw bytes.

Parameters:

  • input_bytes (String)

    Byte input, as a String with Encoding::BINARY encoding.

  • filename (String)

    The name of the file (without the path).

  • fix_pdf (Boolean) (defaults to: false)

    If true, attempts to fix a broken PDF.

Returns:

  • (Mindee::Input::Source::BytesInputSource)

# File 'lib/mindee/client.rb', line 262

def source_from_bytes(input_bytes, filename, fix_pdf: false)
  Input::Source::BytesInputSource.new(input_bytes, filename, fix_pdf: fix_pdf)
end
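source_from_bytes expects a String tagged with Encoding::BINARY (an alias of ASCII-8BIT). File.binread returns such a String, and String#b gives a binary-tagged copy of an in-memory String. The sketch below checks both; the helper name binary? is hypothetical and the temporary file stands in for a real document.

```ruby
require 'tempfile'

# Returns true when the String is tagged as binary, as source_from_bytes expects.
def binary?(bytes)
  bytes.encoding == Encoding::BINARY
end

# String#b returns a copy re-tagged as binary without changing any bytes.
in_memory = 'plain text'.b

# File.binread reads in binary mode, so its result is already binary-tagged.
from_disk = Tempfile.create(['sample', '.pdf']) do |file|
  file.binmode
  file.write("%PDF-1.7\n")
  file.flush
  File.binread(file.path)
end
# Either value could then be passed as: client.source_from_bytes(from_disk, 'sample.pdf')
```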

#source_from_file(input_file, filename, fix_pdf: false) ⇒ Mindee::Input::Source::FileInputSource

Load a document from a normal Ruby File.

Parameters:

  • input_file (File)

    Input file handle

  • filename (String)

    The name of the file (without the path).

  • fix_pdf (Boolean) (defaults to: false)

    If true, attempts to fix a broken PDF.

Returns:

  • (Mindee::Input::Source::FileInputSource)

# File 'lib/mindee/client.rb', line 280

def source_from_file(input_file, filename, fix_pdf: false)
  Input::Source::FileInputSource.new(input_file, filename, fix_pdf: fix_pdf)
end
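source_from_file takes an already open File plus the bare file name. A minimal sketch, assuming the document sits at a local path: open it in binary mode ('rb') and derive the name with File.basename, which drops the directory part. The helper name open_for_upload and the example path are hypothetical. Because parse closes the file by default (close_file: true), pass close_file: false if you still need the handle afterwards.

```ruby
# Hypothetical helper: returns a binary-mode File handle and the file name
# without its directory, matching the two arguments of source_from_file.
def open_for_upload(path)
  [File.open(path, 'rb'), File.basename(path)]
end

# File.basename keeps only the last path component.
name = File.basename('/data/invoices/march.pdf') # => "march.pdf"

# Usage with a real file (hypothetical path and client):
#   file, name = open_for_upload('/data/invoices/march.pdf')
#   source = client.source_from_file(file, name)
```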

#source_from_path(input_path, fix_pdf: false) ⇒ Mindee::Input::Source::PathInputSource

Load a document from an absolute path, as a string.

Parameters:

  • input_path (String)

    Path of the file to open.

  • fix_pdf (Boolean) (defaults to: false)

    If true, attempts to fix a broken PDF.

Returns:

  • (Mindee::Input::Source::PathInputSource)

# File 'lib/mindee/client.rb', line 253

def source_from_path(input_path, fix_pdf: false)
  Input::Source::PathInputSource.new(input_path, fix_pdf: fix_pdf)
end
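source_from_path is documented as taking an absolute path. If a script works with paths relative to some directory, File.expand_path can turn them into absolute ones first. The helper name absolute_input_path and the directories below are illustrative.

```ruby
# Resolves `path` against `base_dir` (the current working directory by default)
# and returns an absolute path string; ".." segments are collapsed.
def absolute_input_path(path, base_dir = Dir.pwd)
  File.expand_path(path, base_dir)
end

absolute_input_path('invoices/march.pdf', '/home/me') # => "/home/me/invoices/march.pdf"
# Then, for example: client.source_from_path(absolute_input_path('invoices/march.pdf'))
```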

#source_from_url(url) ⇒ Mindee::Input::Source::UrlInputSource

Load a document from a secure remote source (HTTPS).

Parameters:

  • url (String)

    URL of the file.

Returns:

  • (Mindee::Input::Source::UrlInputSource)

# File 'lib/mindee/client.rb', line 287

def source_from_url(url)
  Input::Source::UrlInputSource.new(url)
end
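source_from_url is documented as loading from a secure remote source (HTTPS). A caller can check that requirement before building the source; the sketch below does so with the standard uri library. The helper name https_url? is hypothetical, and whether UrlInputSource performs its own validation is not shown on this page.

```ruby
require 'uri'

# Hypothetical pre-check: true only for well-formed https:// URLs with a host.
def https_url?(url)
  uri = URI.parse(url)
  uri.is_a?(URI::HTTPS) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end

https_url?('https://example.com/invoice.pdf') # => true
https_url?('http://example.com/invoice.pdf')  # => false
# Then, for example: client.source_from_url('https://example.com/invoice.pdf')
```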