# OCR Multipage

> Optical Character Recognition (OCR), also called text recognition or text extraction, extracts text from documents. This feature applies it to multi-page PDFs.

export const TechArticleSchema = ({title, description, path, articleSection, about, proficiencyLevel = "Beginner", dependencies, keywords = [], datePublished, dateModified, image, inLanguage = "en"}) => {
  const baseUrl = "https://www.edenai.co/docs";
  const canonicalUrl = `${baseUrl}/${path}`.replace(/\/+$/, "");
  const ogParams = new URLSearchParams({
    division: articleSection || "",
    title: title || "",
    description: description || ""
  });
  const resolvedImage = image || `https://edenai.mintlify.app/_mintlify/api/og?${ogParams.toString()}`;
  const data = {
    "@context": "https://schema.org",
    "@type": "TechArticle",
    "@id": `${canonicalUrl}#techarticle`,
    mainEntityOfPage: {
      "@type": "WebPage",
      "@id": canonicalUrl
    },
    headline: title,
    name: title,
    description: description,
    url: canonicalUrl,
    inLanguage: inLanguage,
    isPartOf: {
      "@type": "WebSite",
      name: "Eden AI Documentation",
      url: baseUrl
    },
    author: [{
      "@type": "Organization",
      name: "Eden AI",
      url: "https://www.edenai.co/"
    }],
    publisher: {
      "@type": "Organization",
      name: "Eden AI",
      url: "https://www.edenai.co/",
      logo: {
        "@type": "ImageObject",
        url: "https://www.edenai.co/assets/logo.png"
      }
    }
  };
  if (articleSection) data.articleSection = articleSection;
  if (about) data.about = {
    "@type": "Thing",
    name: about
  };
  if (proficiencyLevel) data.proficiencyLevel = proficiencyLevel;
  if (dependencies) data.dependencies = dependencies;
  if (keywords && keywords.length) data.keywords = keywords;
  if (datePublished) data.datePublished = datePublished;
  if (dateModified) data.dateModified = dateModified;
  data.image = Array.isArray(resolvedImage) ? resolvedImage : [resolvedImage];
  const json = JSON.stringify(data);
  const schemaId = `techarticle-${canonicalUrl}`;
  React.useEffect(() => {
    if (typeof document === "undefined") return;
    document.querySelectorAll(`script[data-schema-id="${schemaId}"]`).forEach(n => n.remove());
    const script = document.createElement("script");
    script.type = "application/ld+json";
    script.dataset.schemaId = schemaId;
    script.textContent = json;
    document.head.appendChild(script);
    return () => script.remove();
  }, [json, schemaId]);
  return null;
};

<TechArticleSchema title={`OCR Multipage`} description={`Optical Character Recognition (OCR), also called text recognition or text extraction, extracts text from documents. This feature applies it to multi-page PDFs.`} path="v3/expert-models/features/ocr/ocr-async" articleSection="OCR Features" about={`OCR API`} proficiencyLevel="Intermediate" keywords={[`Eden AI`, `AI API`, `OCR`, `document parsing`]} datePublished="2026-05-06T00:00:00Z" dateModified="2026-05-07T00:00:00Z" />

## Endpoint

`POST /v3/universal-ai/async` (async)

Model string pattern: `ocr/ocr_async/{provider}[/{model}]`
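
The pattern can also be assembled in code. A minimal sketch, assuming a hypothetical helper named `build_model_string` (it is not part of any Eden AI SDK); the optional `{model}` segment is appended only when one is given:

```python
from typing import Optional


def build_model_string(provider: str, model: Optional[str] = None) -> str:
    """Build 'ocr/ocr_async/{provider}[/{model}]' (hypothetical helper)."""
    if not provider:
        raise ValueError("provider is required")
    parts = ["ocr", "ocr_async", provider]
    # The model segment is optional in the pattern, so add it only if present.
    if model:
        parts.append(model)
    return "/".join(parts)


print(build_model_string("amazon"))  # ocr/ocr_async/amazon
```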

## Input

| Field | Type        | Required | Description                                                               |
| ----- | ----------- | -------- | ------------------------------------------------------------------------- |
| file  | file\_input | Yes      | ID of a PDF or image file uploaded via `/v3/upload`, or a direct file URL |

## Output

| Field                         | Type           | Required | Description                             |
| ----------------------------- | -------------- | -------- | --------------------------------------- |
| raw\_text                     | string         | Yes      | Full text extracted from the document   |
| **pages**                     | array\[object] | No       | List of pages                           |
|     **lines**                 | array\[object] | No       | List of lines                           |
|         text                  | string         | Yes      | Text detected in the line               |
|         **words**             | array\[object] | No       | List of words                           |
|             text              | string         | Yes      | Text detected in the word               |
|             **bounding\_box** | object         | Yes      | Bounding box of the word                |
|                 left          | float          | Yes      | Left coordinate of the bounding box     |
|                 top           | float          | Yes      | Top coordinate of the bounding box      |
|                 width         | float          | Yes      | Width of the bounding box               |
|                 height        | float          | Yes      | Height of the bounding box              |
|             confidence        | float          | Yes      | Confidence score of the word            |
|         **bounding\_box**     | object         | No       | Bounding box of the line; may be null   |
|             left              | float          | Yes      | Left coordinate of the bounding box     |
|             top               | float          | Yes      | Top coordinate of the bounding box      |
|             width             | float          | Yes      | Width of the bounding box               |
|             height            | float          | Yes      | Height of the bounding box              |
|         confidence            | float          | Yes      | Confidence of the line                  |
| number\_of\_pages             | int            | Yes      | Number of pages in the document         |
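
The nesting in the table above (pages, then lines, then words) can be walked with three loops. The sketch below assumes `output` is a dict shaped like that table, wherever it sits in the job response; `iter_words` is a hypothetical helper and `sample_output` is hand-written illustrative data, not real provider output:

```python
from typing import Any, Dict, Iterator, Tuple


def iter_words(output: Dict[str, Any]) -> Iterator[Tuple[int, str, float]]:
    """Yield (page_number, word_text, confidence) for every word in the output."""
    # pages, lines, and words are all optional, so fall back to empty lists.
    for page_number, page in enumerate(output.get("pages") or [], start=1):
        for line in page.get("lines") or []:
            for word in line.get("words") or []:
                yield page_number, word["text"], word["confidence"]


# Illustrative sample only; the coordinate values are made up.
sample_output = {
    "raw_text": "Hello world",
    "pages": [
        {
            "lines": [
                {
                    "text": "Hello world",
                    "words": [
                        {
                            "text": "Hello",
                            "bounding_box": {"left": 0.1, "top": 0.1, "width": 0.2, "height": 0.05},
                            "confidence": 0.99,
                        },
                        {
                            "text": "world",
                            "bounding_box": {"left": 0.35, "top": 0.1, "width": 0.2, "height": 0.05},
                            "confidence": 0.85,
                        },
                    ],
                    "bounding_box": None,  # the line-level box is optional
                    "confidence": 0.92,
                }
            ]
        }
    ],
    "number_of_pages": 1,
}

# Flag words below an arbitrary confidence threshold for manual review.
low_confidence = [text for _, text, conf in iter_words(sample_output) if conf < 0.9]
print(low_confidence)
```

Because `pages`, `lines`, and `words` are not required fields, the helper treats a missing or null list as empty rather than raising.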

## Available Providers

| Provider  | Model String              | Price                 |
| --------- | ------------------------- | --------------------- |
| amazon    | `ocr/ocr_async/amazon`    | \$1.5 per 1,000 pages |
| microsoft | `ocr/ocr_async/microsoft` | \$10 per 1,000 pages  |
| mistral   | `ocr/ocr_async/mistral`   | \$1 per 1,000 pages   |
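
For budgeting, the per-1,000-page prices translate into a simple estimate. This is a rough sketch that assumes pricing scales linearly with page count, with no minimum charge or rounding; `estimate_cost` is a hypothetical helper and actual billing may differ:

```python
from decimal import Decimal

# USD prices per 1,000 pages, copied from the table above.
PRICE_PER_1000_PAGES = {
    "amazon": Decimal("1.5"),
    "microsoft": Decimal("10"),
    "mistral": Decimal("1"),
}


def estimate_cost(provider: str, pages: int) -> Decimal:
    """Estimate the USD cost for `pages` pages, assuming linear per-page pricing."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return PRICE_PER_1000_PAGES[provider] * pages / 1000


for provider in PRICE_PER_1000_PAGES:
    print(provider, estimate_cost(provider, 250))
```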

## Quick Start

> This is an **async** feature. The initial response returns a job ID. Poll `GET /v3/universal-ai/async/{job_id}` until the job completes.

<CodeGroup>
  ```python Python theme={null}
  import requests

  url = "https://api.edenai.run/v3/universal-ai/async"
  headers = {
      "Authorization": "Bearer YOUR_API_KEY",
      "Content-Type": "application/json"
  }

  payload = {
      "model": "ocr/ocr_async/amazon",
      "input": {
          "file": "YOUR_FILE_UUID_OR_URL"
      }
  }

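  # Submit the job; the initial response returns the job ID used for polling.
  response = requests.post(url, headers=headers, json=payload, timeout=30)
  response.raise_for_status()  # fail early on HTTP errors
  print(response.json())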
  ```

  ```bash cURL theme={null}
  curl -X POST https://api.edenai.run/v3/universal-ai/async \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "ocr/ocr_async/amazon",
      "input": {"file": "YOUR_FILE_UUID_OR_URL"}
    }'
  ```
</CodeGroup>
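
The polling step could look roughly like the sketch below. It uses only the standard library so it runs without extra dependencies. Several details are assumptions rather than documented behavior: the `status` field name, the terminal values in `TERMINAL_STATUSES`, the polling interval, and the helper names `fetch_job` and `poll_job`. Check the job response schema before relying on them.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

BASE_URL = "https://api.edenai.run/v3/universal-ai/async"

# ASSUMPTION: the job response exposes a "status" field with these terminal
# values. Neither the field name nor the values are confirmed by this page.
TERMINAL_STATUSES = {"success", "failed"}


def fetch_job(job_id: str, api_key: str) -> Dict[str, Any]:
    """GET /v3/universal-ai/async/{job_id} and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{BASE_URL}/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def poll_job(
    fetch: Callable[[], Dict[str, Any]],
    interval_seconds: float = 2.0,
    max_attempts: int = 60,
) -> Dict[str, Any]:
    """Call `fetch` until the job reports a terminal status or attempts run out."""
    for attempt in range(max_attempts):
        job = fetch()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        # Sleep between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError(f"job did not finish after {max_attempts} attempts")
```

A call might then read `result = poll_job(lambda: fetch_job("YOUR_JOB_ID", "YOUR_API_KEY"))`. Passing the fetch step in as a callable keeps the retry loop independent of the HTTP client, which also makes it easy to exercise with canned responses.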
