Class: Mindee::Client

Inherits:
Object
Defined in:
lib/mindee/client.rb

Overview

Mindee API Client. See: developers.mindee.com/docs

Constructor Details

#initialize(api_key: '') ⇒ Client

Returns a new instance of Client.

Parameters:

  • api_key (String) (defaults to: '')


# File 'lib/mindee/client.rb', line 110

def initialize(api_key: '')
  @api_key = api_key
end

Instance Method Details

#create_endpoint(endpoint_name: '', account_name: '', version: '') ⇒ Mindee::HTTP::Endpoint

Creates a custom endpoint with the given values. Do not set for standard (off the shelf) endpoints.

Parameters:

  • endpoint_name (String) (defaults to: '')

    For custom endpoints, the “API name” field in the “Settings” page of the API Builder. Do not set for standard (off the shelf) endpoints.

  • account_name (String) (defaults to: '')

    For custom endpoints, your account or organization username on the API Builder. This is normally not required unless you have a custom endpoint which has the same name as a standard (off the shelf) endpoint.

  • version (String) (defaults to: '')

    For custom endpoints, the version of the product.

Returns:

  • (Mindee::HTTP::Endpoint)

# File 'lib/mindee/client.rb', line 410

def create_endpoint(endpoint_name: '', account_name: '', version: '')
  initialize_endpoint(
    Mindee::Product::Universal::Universal,
    endpoint_name: endpoint_name,
    account_name: account_name,
    version: version
  )
end

#enqueue(input_source, product_class, endpoint: nil, options: {}) ⇒ Mindee::Parsing::Common::ApiResponse

Enqueues a document for asynchronous parsing.

Parameters:

  • input_source (Mindee::Input::Source::LocalInputSource, Mindee::Input::Source::URLInputSource)

    The source of the input document (local file or URL).

  • product_class (Mindee::Inference)

    The class of the product.

  • options (Hash) (defaults to: {})

    A hash of options to configure the enqueue behavior. Possible keys:

    • :endpoint [HTTP::Endpoint, nil] Endpoint of the API. Doesn’t need to be set in the case of OTS APIs.
    • :all_words [bool] Whether to extract all the words on each page. This performs a full OCR operation on the server and will increase response time.
    • :full_text [bool] Whether to include the full OCR text response in compatible APIs. This performs a full OCR operation on the server and may increase response time.
    • :close_file [bool] Whether to close() the file after parsing it. Set to false if you need to access the file after this operation.
    • :page_options [Hash, nil] Page cutting/merge options:
      • :page_indexes [Array<Integer>] Zero-based list of page indexes.
      • :operation [Symbol] Operation to apply on the document, given the page_indexes specified:
        • :KEEP_ONLY - keep only the specified pages, and remove all others.
        • :REMOVE - remove the specified pages, and keep all others.
      • :on_min_pages [Integer] Apply the operation only if the document has at least this many pages.
    • :cropper [bool] Whether to include cropper results for each page. This performs a cropping operation on the server and will increase response time.

  • endpoint (Mindee::HTTP::Endpoint) (defaults to: nil)

    Endpoint of the API.

Returns:

  • (Mindee::Parsing::Common::ApiResponse)

# File 'lib/mindee/client.rb', line 212

def enqueue(input_source, product_class, endpoint: nil, options: {})
  opts = normalize_parse_options(options)
  endpoint ||= initialize_endpoint(product_class)
  logger.debug("Enqueueing document as '#{endpoint.url_root}'")

  prediction, raw_http = endpoint.predict_async(
    input_source,
    opts.all_words,
    opts.full_text,
    opts.close_file,
    opts.cropper
  )
  Mindee::Parsing::Common::ApiResponse.new(product_class, prediction, raw_http)
end
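
The :page_options keys documented for this method can be read as a page-selection rule. The sketch below models that rule on plain page indexes. select_pages is a hypothetical helper written for illustration only; it is not part of the Mindee library, and the library's own handling (for example of out-of-range indexes) may differ.

```ruby
# Hypothetical helper: models the documented page_options semantics on plain
# page indexes. It is not part of the Mindee library.
def select_pages(page_count, page_indexes:, operation:, on_min_pages: 0)
  all_pages = (0...page_count).to_a
  # The operation only applies when the document has at least on_min_pages pages.
  return all_pages if page_count < on_min_pages

  case operation
  when :KEEP_ONLY then all_pages & page_indexes # keep the listed pages only
  when :REMOVE    then all_pages - page_indexes # drop the listed pages
  else raise ArgumentError, "unknown operation: #{operation.inspect}"
  end
end

# An options hash shaped like the one documented for this method.
options = {
  page_options: { page_indexes: [0, 1], operation: :KEEP_ONLY, on_min_pages: 3 }
}

# For a five-page document, only the first two pages would be kept.
kept = select_pages(5, **options[:page_options])
```

With the same indexes and :REMOVE, the sketch keeps pages 2 to 4 instead. A document shorter than on_min_pages is left untouched.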

#enqueue_and_parse(input_source, product_class, endpoint, options) ⇒ Mindee::Parsing::Common::ApiResponse

Enqueues a document for asynchronous parsing, then polls until the result is available or the retry limit is reached.

Parameters:

  • input_source (Mindee::Input::Source::LocalInputSource, Mindee::Input::Source::URLInputSource)

    The source of the input document (local file or URL).

  • product_class (Mindee::Inference)

    The class of the product.

  • options (Hash)

    A hash of options to configure the parsing behavior. Possible keys:

    • :endpoint [HTTP::Endpoint, nil] Endpoint of the API. Doesn’t need to be set in the case of OTS APIs.
    • :all_words [bool] Whether to extract all the words on each page. This performs a full OCR operation on the server and will increase response time.
    • :full_text [bool] Whether to include the full OCR text response in compatible APIs. This performs a full OCR operation on the server and may increase response time.
    • :close_file [bool] Whether to close() the file after parsing it. Set to false if you need to access the file after this operation.
    • :page_options [Hash, nil] Page cutting/merge options:
      • :page_indexes [Array<Integer>] Zero-based list of page indexes.
      • :operation [Symbol] Operation to apply on the document, given the page_indexes specified:
        • :KEEP_ONLY - keep only the specified pages, and remove all others.
        • :REMOVE - remove the specified pages, and keep all others.
      • :on_min_pages [Integer] Apply the operation only if the document has at least this many pages.
    • :cropper [bool, nil] Whether to include cropper results for each page. This performs a cropping operation on the server and will increase response time.
    • :initial_delay_sec [Numeric] Initial delay before polling. Defaults to 2.
    • :delay_sec [Numeric] Delay between polling attempts. Defaults to 1.5.
    • :max_retries [Integer] Maximum number of retries. Defaults to 80.

  • endpoint (Mindee::HTTP::Endpoint)

    Endpoint of the API.

Returns:

  • (Mindee::Parsing::Common::ApiResponse)

# File 'lib/mindee/client.rb', line 269

def enqueue_and_parse(input_source, product_class, endpoint, options)
  validate_async_params(options.initial_delay_sec, options.delay_sec, options.max_retries)
  enqueue_res = enqueue(input_source, product_class, endpoint: endpoint, options: options)
  job = enqueue_res.job or raise Errors::MindeeAPIError, 'Expected job to be present'
  job_id = job.id

  sleep(options.initial_delay_sec)
  polling_attempts = 1
  logger.debug("Successfully enqueued document with job id: '#{job_id}'")
  queue_res = parse_queued(job_id, product_class, endpoint: endpoint)
  queue_res_job = queue_res.job or raise Errors::MindeeAPIError, 'Expected job to be present'
  valid_statuses = [
    Mindee::Parsing::Common::JobStatus::WAITING,
    Mindee::Parsing::Common::JobStatus::PROCESSING,
  ]
  # @type var valid_statuses: Array[(:waiting | :processing | :completed | :failed)]
  while valid_statuses.include?(queue_res_job.status) && polling_attempts < options.max_retries
    logger.debug("Polling server for parsing result with job id: '#{job_id}'. Attempt #{polling_attempts}")
    sleep(options.delay_sec)
    queue_res = parse_queued(job_id, product_class, endpoint: endpoint)
    queue_res_job = queue_res.job or raise Errors::MindeeAPIError, 'Expected job to be present'
    polling_attempts += 1
  end

  if queue_res_job.status != Mindee::Parsing::Common::JobStatus::COMPLETED
    elapsed = options.initial_delay_sec + (polling_attempts * options.delay_sec.to_f)
    raise Errors::MindeeAPIError,
          "Asynchronous parsing request timed out after #{elapsed} seconds (#{polling_attempts} tries)"
  end

  queue_res
end
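
To show how initial_delay_sec, delay_sec and max_retries interact, here is a minimal standalone sketch of the same polling pattern. It does not call the Mindee API. poll_job, PollTimeoutError and the injectable sleeper are hypothetical names introduced for illustration, and the status symbols follow the type comment in the source above.

```ruby
# Hypothetical error class for this sketch; the library itself raises
# Errors::MindeeAPIError.
class PollTimeoutError < StandardError; end

# fetch_status: callable returning :waiting, :processing, :completed or :failed.
# sleeper: injectable so the loop can be exercised without real delays.
def poll_job(fetch_status, initial_delay_sec: 2, delay_sec: 1.5, max_retries: 80,
             sleeper: ->(seconds) { sleep(seconds) })
  sleeper.call(initial_delay_sec)
  attempts = 1
  status = fetch_status.call

  # Keep polling while the job is still pending and attempts remain.
  while %i[waiting processing].include?(status) && attempts < max_retries
    sleeper.call(delay_sec)
    status = fetch_status.call
    attempts += 1
  end

  # As in the source above, any final status other than :completed is reported
  # as a timeout.
  unless status == :completed
    elapsed = initial_delay_sec + (attempts * delay_sec.to_f)
    raise PollTimeoutError, "timed out after #{elapsed} seconds (#{attempts} tries)"
  end

  attempts
end

# Simulated job that completes on the third status check.
statuses = %i[waiting processing completed]
delays = []
attempts = poll_job(-> { statuses.shift }, sleeper: ->(seconds) { delays << seconds })
```

With the documented defaults, the elapsed time reported when all 80 attempts are used is 2 + 80 × 1.5 = 122 seconds.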

#execute_workflow(input_source, workflow_id, options: {}) ⇒ Mindee::Parsing::Common::WorkflowResponse

Sends a document to a workflow.

Accepts options either as a Hash or as a WorkflowOptions struct.

Parameters:

  • input_source (Mindee::Input::Source::LocalInputSource, Mindee::Input::Source::URLInputSource)
  • workflow_id (String)
  • options (Hash, WorkflowOptions) (defaults to: {})

    Options to configure workflow behavior. Possible keys:

    • document_alias [String, nil] Alias to give to the document.
    • priority [Symbol, nil] Priority to give to the document.
    • full_text [bool] Whether to include the full OCR text response in compatible APIs.
    • rag [bool, nil] Whether to enable Retrieval-Augmented Generation.
    • public_url [String, nil] A unique, encrypted URL for accessing the document validation interface without requiring authentication.
    • page_options [Hash, nil] Page cutting/merge options:
      • :page_indexes Zero-based list of page indexes.
      • :operation Operation to apply on the document, given the page_indexes specified:
        • :KEEP_ONLY - keep only the specified pages, and remove all others.
        • :REMOVE - remove the specified pages, and keep all others.
      • :on_min_pages Apply the operation only if the document has at least this many pages.

Returns:

  • (Mindee::Parsing::Common::WorkflowResponse)

# File 'lib/mindee/client.rb', line 323

def execute_workflow(input_source, workflow_id, options: {})
  opts = options.is_a?(WorkflowOptions) ? options : WorkflowOptions.new(params: options)
  if opts.respond_to?(:page_options) && input_source.is_a?(Input::Source::LocalInputSource)
    process_pdf_if_required(input_source, opts)
  end

  workflow_endpoint = Mindee::HTTP::WorkflowEndpoint.new(workflow_id, api_key: @api_key)
  logger.debug("Sending document to workflow '#{workflow_id}'")

  prediction, raw_http = workflow_endpoint.execute_workflow(
    input_source,
    opts
  )

  Mindee::Parsing::Common::WorkflowResponse.new(Product::Universal::Universal, prediction, raw_http)
end
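
The first line of the method accepts either a Hash or an options object. That pattern can be sketched on its own as follows. DemoWorkflowOptions and normalize_options are hypothetical stand-ins, and the default values chosen here are assumptions rather than the library's documented defaults.

```ruby
# Hypothetical stand-in for an options struct such as WorkflowOptions.
# The defaults below are assumptions made for this sketch.
class DemoWorkflowOptions
  attr_reader :document_alias, :priority, :full_text, :rag

  def initialize(params: {})
    @document_alias = params.fetch(:document_alias, nil)
    @priority = params.fetch(:priority, nil)
    @full_text = params.fetch(:full_text, false)
    @rag = params.fetch(:rag, nil)
  end
end

# Pass an existing options object through unchanged; wrap a Hash otherwise.
def normalize_options(options)
  options.is_a?(DemoWorkflowOptions) ? options : DemoWorkflowOptions.new(params: options)
end

from_hash = normalize_options({ document_alias: 'invoice-42', full_text: true })
from_struct = normalize_options(from_hash)
```

Normalizing once at the top of the method lets the rest of the code read options through accessors, whichever form the caller supplied.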

#load_prediction(product_class, local_response) ⇒ Mindee::Parsing::Common::ApiResponse

Load a prediction.

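Parameters:

  • product_class (Mindee::Inference)

    The class of the product.

  • local_response (LocalResponse)

    The local response to load. Must not be nil.

Returns:

  • (Mindee::Parsing::Common::ApiResponse)
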
# File 'lib/mindee/client.rb', line 346

def load_prediction(product_class, local_response)
  raise Errors::MindeeAPIError, 'Expected LocalResponse to not be nil.' if local_response.nil?

  response_hash = local_response.as_hash || {}
  raise Errors::MindeeAPIError, 'Expected LocalResponse#as_hash to return a hash.' if response_hash.nil?

  Mindee::Parsing::Common::ApiResponse.new(product_class, response_hash, response_hash.to_json)
rescue KeyError, Errors::MindeeAPIError
  raise Errors::MindeeInputError, 'No prediction found in local response.'
end

#parse(input_source, product_class, endpoint: nil, options: {}, enqueue: true) ⇒ Mindee::Parsing::Common::ApiResponse

Enqueue a document for parsing and automatically try to retrieve it if needed.

Accepts options either as a Hash or as a ParseOptions struct.

Parameters:

  • input_source (Mindee::Input::Source::LocalInputSource, Mindee::Input::Source::URLInputSource)
  • product_class (Mindee::Inference)

    The class of the product.

  • endpoint (Mindee::HTTP::Endpoint, nil) (defaults to: nil)

    Endpoint of the API.

  • options (Hash) (defaults to: {})

    A hash of options to configure the parsing behavior. Possible keys:

    • :all_words [bool] Whether to extract all the words on each page. This performs a full OCR operation on the server and will increase response time.
    • :full_text [bool] Whether to include the full OCR text response in compatible APIs. This performs a full OCR operation on the server and may increase response time.
    • :close_file [bool] Whether to close() the file after parsing it. Set to false if you need to access the file after this operation.
    • :page_options [Hash, nil] Page cutting/merge options:
      • :page_indexes [Array<Integer>] Zero-based list of page indexes.
      • :operation [Symbol] Operation to apply on the document, given the page_indexes specified:
        • :KEEP_ONLY - keep only the specified pages, and remove all others.
        • :REMOVE - remove the specified pages, and keep all others.
      • :on_min_pages [Integer] Apply the operation only if the document has at least this many pages.
    • :cropper [bool, nil] Whether to include cropper results for each page. This performs a cropping operation on the server and will increase response time.
    • :initial_delay_sec [Numeric] Initial delay before polling. Defaults to 2.
    • :delay_sec [Numeric] Delay between polling attempts. Defaults to 1.5.
    • :max_retries [Integer] Maximum number of retries. Defaults to 80.

  • enqueue (bool) (defaults to: true)

    Whether to enqueue the file.

Returns:

  • (Mindee::Parsing::Common::ApiResponse)

# File 'lib/mindee/client.rb', line 141

def parse(input_source, product_class, endpoint: nil, options: {}, enqueue: true)
  opts = normalize_parse_options(options)
  process_pdf_if_required(input_source, opts) if input_source.is_a?(Input::Source::LocalInputSource)
  endpoint ||= initialize_endpoint(product_class)

  if enqueue && product_class.has_async
    enqueue_and_parse(input_source, product_class, endpoint, opts)
  else
    parse_sync(input_source, product_class, endpoint, opts)
  end
end
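
The branch at the end of the method decides between asynchronous and synchronous parsing. The sketch below isolates that decision using stand-in classes. AsyncCapableProduct, SyncOnlyProduct and parse_mode are hypothetical names; only the has_async check comes from the source above.

```ruby
# Stand-in product classes. Real product classes expose has_async, as used in
# the source above; these two exist only for the sketch.
class AsyncCapableProduct
  def self.has_async
    true
  end
end

class SyncOnlyProduct
  def self.has_async
    false
  end
end

# Returns the name of the path that parse would take for the given inputs.
def parse_mode(product_class, enqueue: true)
  enqueue && product_class.has_async ? :enqueue_and_parse : :parse_sync
end

default_mode = parse_mode(AsyncCapableProduct)                 # async path
forced_sync  = parse_mode(AsyncCapableProduct, enqueue: false) # caller opts out
sync_only    = parse_mode(SyncOnlyProduct)                     # no async support
```

In other words, enqueue: true is a request rather than a guarantee: a product without asynchronous support is still parsed synchronously.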

#parse_queued(job_id, product_class, endpoint: nil) ⇒ Mindee::Parsing::Common::ApiResponse

Parses a queued document.

Parameters:

  • job_id (String)

    ID of the job (queue) to poll from.

  • product_class (Mindee::Inference)

    The class of the product.

  • endpoint (HTTP::Endpoint, nil) (defaults to: nil)

    Endpoint of the API. Doesn’t need to be set in the case of OTS APIs.

Returns:

  • (Mindee::Parsing::Common::ApiResponse)

# File 'lib/mindee/client.rb', line 235

def parse_queued(job_id, product_class, endpoint: nil)
  endpoint = initialize_endpoint(product_class) if endpoint.nil?
  logger.debug("Fetching queued document as '#{endpoint.url_root}'")
  prediction, raw_http = endpoint.parse_async(job_id)
  Mindee::Parsing::Common::ApiResponse.new(product_class, prediction, raw_http)
end

#source_from_b64string(base64_string, filename, repair_pdf: false) ⇒ Mindee::Input::Source::Base64InputSource

Load a document from a base64 encoded string.

Parameters:

  • base64_string (String)

    Input to parse as base64 string

  • filename (String)

    The name of the file (without the path)

  • repair_pdf (bool) (defaults to: false)

    Attempts to repair a broken PDF if true.

Returns:

  • (Mindee::Input::Source::Base64InputSource)

# File 'lib/mindee/client.rb', line 379

def source_from_b64string(base64_string, filename, repair_pdf: false)
  Input::Source::Base64InputSource.new(base64_string, filename, repair_pdf: repair_pdf)
end
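
As an illustration of preparing the base64_string argument, the sketch below encodes raw bytes with Ruby's built-in Array#pack. The sample bytes and the client variable in the trailing comment are assumptions. Whether the library expects strict base64 without line breaks is not stated here, so treat the 'm0' choice as a reasonable default rather than a requirement.

```ruby
# Stand-in for the raw bytes of a document; a real caller would read a file.
pdf_bytes = '%PDF-1.4 example bytes'.b

# 'm0' encodes as base64 with no line breaks (RFC 4648).
base64_string = [pdf_bytes].pack('m0')

# Decoding gives back the original bytes.
decoded = base64_string.unpack1('m0')

# Assuming `client` is a Mindee::Client instance:
# source = client.source_from_b64string(base64_string, 'document.pdf')
```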

#source_from_bytes(input_bytes, filename, repair_pdf: false) ⇒ Mindee::Input::Source::BytesInputSource

Load a document from raw bytes.

Parameters:

  • input_bytes (String)

    Encoding::BINARY byte input

  • filename (String)

    The name of the file (without the path)

  • repair_pdf (bool) (defaults to: false)

    Attempts to repair a broken PDF if true.

Returns:

  • (Mindee::Input::Source::BytesInputSource)

# File 'lib/mindee/client.rb', line 370

def source_from_bytes(input_bytes, filename, repair_pdf: false)
  Input::Source::BytesInputSource.new(input_bytes, filename, repair_pdf: repair_pdf)
end
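
The input_bytes parameter is described as Encoding::BINARY input. One way to obtain such a string is File.binread, sketched below with a temporary file so the example runs on its own. The sample bytes and the client variable in the trailing comment are assumptions.

```ruby
require 'tempfile'

# Write a few sample bytes to a temporary file, then read them back in binary
# mode. The block's return value is handed back by Tempfile.create.
input_bytes, filename = Tempfile.create(['sample', '.pdf']) do |file|
  file.binmode
  file.write([0x25, 0x50, 0x44, 0x46, 0xFF].pack('C*'))
  file.flush

  # File.binread returns a String tagged Encoding::BINARY (ASCII-8BIT).
  [File.binread(file.path), File.basename(file.path)]
end

# Assuming `client` is a Mindee::Client instance:
# source = client.source_from_bytes(input_bytes, filename)
```

File.basename drops the directory part, matching the documented expectation that filename is given without the path.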

#source_from_file(input_file, filename, repair_pdf: false) ⇒ Mindee::Input::Source::FileInputSource

Load a document from a normal Ruby File.

Parameters:

  • input_file (File)

    Input file handle

  • filename (String)

    The name of the file (without the path)

  • repair_pdf (bool) (defaults to: false)

    Attempts to repair a broken PDF if true.

Returns:

  • (Mindee::Input::Source::FileInputSource)

# File 'lib/mindee/client.rb', line 388

def source_from_file(input_file, filename, repair_pdf: false)
  Input::Source::FileInputSource.new(input_file, filename, repair_pdf: repair_pdf)
end

#source_from_path(input_path, repair_pdf: false) ⇒ Mindee::Input::Source::PathInputSource

Load a document from an absolute path, as a string.

Parameters:

  • input_path (String)

    Path of file to open

  • repair_pdf (bool) (defaults to: false)

    Attempts to repair a broken PDF if true.

Returns:

  • (Mindee::Input::Source::PathInputSource)

# File 'lib/mindee/client.rb', line 361

def source_from_path(input_path, repair_pdf: false)
  Input::Source::PathInputSource.new(input_path, repair_pdf: repair_pdf)
end
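
Since the description asks for an absolute path, a caller holding a relative path can resolve it first. The sketch below uses File.expand_path from Ruby's core library. The directory and file names are arbitrary examples, and the client variable in the trailing comment is an assumption.

```ruby
require 'pathname'

# Resolve a relative path against a base directory.
# '/data/invoices' is an arbitrary example directory.
input_path = File.expand_path('march/invoice.pdf', '/data/invoices')

# Pathname#absolute? confirms the result no longer depends on the working directory.
absolute = Pathname.new(input_path).absolute?

# Assuming `client` is a Mindee::Client instance:
# source = client.source_from_path(input_path)
```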

#source_from_url(url) ⇒ Mindee::Input::Source::URLInputSource

Load a document from a secure remote source (HTTPS).

Parameters:

  • url (String)

    URL of the file

Returns:

  • (Mindee::Input::Source::URLInputSource)

# File 'lib/mindee/client.rb', line 395

def source_from_url(url)
  Input::Source::URLInputSource.new(url)
end
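
Because the method is documented as loading from a secure (HTTPS) source, a caller may want to check the scheme before building the input source. The sketch below does that with Ruby's standard uri library. https_url? is a hypothetical helper, not part of the Mindee library, and any validation the library performs itself is not shown in this section.

```ruby
require 'uri'

# Hypothetical pre-check: true only for well-formed URLs with the https scheme.
def https_url?(url)
  URI.parse(url).is_a?(URI::HTTPS)
rescue URI::InvalidURIError
  false
end

secure   = https_url?('https://example.com/invoice.pdf')
insecure = https_url?('http://example.com/invoice.pdf')
invalid  = https_url?('not a url')

# Assuming `client` is a Mindee::Client instance:
# source = client.source_from_url('https://example.com/invoice.pdf') if secure
```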