# Fetch API Reference

> Complete reference for the Fetch API endpoint

## Endpoint

```
POST https://api.fetch.tinyfish.ai
```

All requests require an `X-API-Key` header. See [Authentication](/authentication).

***

## Request

```json theme={null}
{
  "urls": ["https://example.com"],
  "format": "html",
  "links": false,
  "image_links": false,
  "ttl": 3600,
  "per_url_timeout_ms": 45000
}
```

### Parameters

<ParamField body="urls" type="string[]" required>
  URLs to fetch and extract. Maximum 10 URLs per request. All URLs must use `http` or `https`. Private IP addresses, localhost, and cloud metadata endpoints are rejected.
</ParamField>

<ParamField body="format" type="string" default="markdown">
  Output format for the `text` field in each result. One of:

  * `html`: semantic HTML
  * `markdown`: clean Markdown, recommended for LLMs (default)
  * `json`: structured document tree
</ParamField>

<ParamField body="links" type="boolean">
  When `true`, include all `<a href>` URLs found on the page in the `links` field.
</ParamField>

<ParamField body="image_links" type="boolean">
  When `true`, include all `<img src>` URLs found on the page in the `image_links` field.
</ParamField>

<ParamField body="ttl" type="integer">
  Cache freshness tolerance in seconds.

  * Omit `ttl` to accept any cached entry.
  * Set `ttl` to `0` to force a live fetch.
  * Set `ttl` to a positive integer to accept cached entries younger than that many seconds.
</ParamField>

<ParamField body="per_url_timeout_ms" type="integer">
  Per-URL wall-clock timeout budget in milliseconds. Must be between `1` and `110000`. If a URL exceeds this budget, that URL returns a `timeout` error in `errors[]` while other URLs in the same request can still complete.
</ParamField>

***

## Response

```json theme={null}
{
  "results": [...],
  "errors": [...]
}
```

### `results[]`

One entry per successfully fetched URL.

<ResponseField name="url" type="string">
  The original requested URL.
</ResponseField>

<ResponseField name="final_url" type="string">
  The URL after any redirects. May differ from `url`.
</ResponseField>

<ResponseField name="title" type="string | null">
  Page title, preferring `og:title` over `<title>`. `null` if not found.
</ResponseField>

<ResponseField name="description" type="string | null">
  Meta description, preferring `og:description` over `<meta name="description">`. `null` if not found.
</ResponseField>

<ResponseField name="language" type="string | null">
  Detected page language (e.g. `"en"`). `null` if undetectable.
</ResponseField>

<ResponseField name="author" type="string | null">
  Author from meta tags. `null` if not found.
</ResponseField>

<ResponseField name="published_date" type="string | null">
  Publication date, if detectable. `null` if not found.
</ResponseField>

<ResponseField name="text" type="string | object">
  Extracted page content. Format depends on the `format` request parameter:

  * `string` when `format` is `"html"` or `"markdown"`
  * `object` (document tree) when `format` is `"json"`
</ResponseField>

<ResponseField name="links" type="string[]">
  All `<a href>` URLs on the page, resolved to absolute URLs. Only present when `links: true` was requested.
</ResponseField>

<ResponseField name="image_links" type="string[]">
  All `<img src>` URLs on the page, resolved to absolute URLs. Only present when `image_links: true` was requested.
</ResponseField>

<ResponseField name="latency_ms" type="number | null">
  Time to fetch and extract this URL, in milliseconds. `null` if unavailable.
</ResponseField>

<ResponseField name="format" type="string">
  Output format used for the `text` field. Echoes the request `format` parameter (`"markdown"`, `"html"`, or `"json"`).
</ResponseField>

<Note>
  Fields that could not be extracted (`title`, `description`, `language`, `author`, `published_date`) are returned as `null`.
</Note>

### `errors[]`

One entry per URL that could not be fetched. Always present, may be empty. Per-URL failures do not affect the rest of the batch.

<ResponseField name="url" type="string">
  The URL that failed.
</ResponseField>

<ResponseField name="error" type="string">
  Structured error code identifying the failure type. One of: `target_http_error`, `page_not_found`, `target_unreachable`, `timeout`, `bot_blocked`, `empty_content`, `invalid_url`, `invalid_redirect_url`, `proxy_error`.
</ResponseField>

<ResponseField name="status" type="number (optional)">
  Upstream HTTP status code. Present when `error` is `target_http_error` or `page_not_found`.
</ResponseField>

## SDK Methods

<CodeGroup>
  ```python Python theme={null}
  from tinyfish import TinyFish

  client = TinyFish()

  result = client.fetch.get_contents(
      urls=["https://www.tinyfish.ai/"],
      format="markdown",
  )

  for page in result.results:
      print(page.title, "→", page.text[:100])
  ```

  ```typescript TypeScript theme={null}
  import { TinyFish } from "@tiny-fish/sdk";

  const client = new TinyFish();

  const result = await client.fetch.getContents({
    urls: ["https://www.tinyfish.ai/"],
    format: "markdown",
  });

  result.results.forEach((page) =>
    console.log(page.title, "→", page.text.slice(0, 100))
  );
  ```
</CodeGroup>

***

## Error Codes

HTTP-level errors apply to the entire request.

| Status | Meaning                                                                         |
| ------ | ------------------------------------------------------------------------------- |
| `400`  | Invalid request: missing `urls`, too many URLs (max 10), or bad parameter value |
| `401`  | Missing or invalid API key                                                      |
| `429`  | Rate limit exceeded                                                             |
| `500`  | Internal server error                                                           |

Per-URL errors appear in `errors[]` alongside a `200` response. The `error` field is one of these codes:

| Error code             | `status` field                       | Meaning                                                                 |
| ---------------------- | ------------------------------------ | ----------------------------------------------------------------------- |
| `target_http_error`    | HTTP status code (e.g. `403`, `500`) | Target server returned a non-2xx HTTP response other than 404/410       |
| `page_not_found`       | `404` or `410`                       | Target URL does not exist (HTTP 404 Not Found or 410 Gone)              |
| `target_unreachable`   | -                                    | Connection refused, TLS failure, DNS failure, or other network error    |
| `timeout`              | -                                    | Page did not finish loading within the request deadline                 |
| `bot_blocked`          | -                                    | Site returned a bot-protection challenge (Cloudflare, Incapsula)        |
| `empty_content`        | -                                    | Browser returned HTML but no extractable text found                     |
| `invalid_url`          | -                                    | URL rejected before fetch (private IP, invalid scheme, disallowed host) |
| `invalid_redirect_url` | -                                    | Redirect target rejected before fetch (private IP or disallowed host)   |
| `proxy_error`          | -                                    | Proxy tunnel failed; site may be reachable directly                     |

<Note>
  Per-URL fetch failures are **not** HTTP errors. They appear as entries in `errors[]` alongside a `200` response.
</Note>

<Note>
  Each URL has a **110-second backend timeout**. If the page doesn't respond within 110 seconds, that URL returns a `timeout` error in `errors[]` while the rest of the batch continues. Requests are also subject to a **120-second CDN ceiling** for the full batch. Set your client-side timeout to at least **150 seconds** to receive CDN timeout errors cleanly.
</Note>

***

## Supported Content Types

| Content Type      | Behavior                                                          |
| ----------------- | ----------------------------------------------------------------- |
| HTML              | Full text extraction with formatting                              |
| PDF               | Text content extracted                                            |
| JSON              | Raw JSON returned as text                                         |
| Plain text        | Full text returned                                                |
| Images (PNG, JPG) | Not supported; returns an error indicating no extractable content |

***

## Usage Endpoint

Retrieve a paginated history of your fetch operations.

```
GET https://api.fetch.tinyfish.ai/usage
```

All requests require an `X-API-Key` header. See [Authentication](/authentication).
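As a rough illustration of how a usage request might be assembled without the SDK, the sketch below builds (but does not send) a `GET` request with the `X-API-Key` header, using the query parameters documented for this endpoint. The names `build_usage_request` and `USAGE_URL` are hypothetical helpers written for this example and are not part of any TinyFish SDK; only the Python standard library is used.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from this page; the constant name is illustrative only.
USAGE_URL = "https://api.fetch.tinyfish.ai/usage"


def build_usage_request(api_key, *, start_after=None, end_before=None,
                        status=None, limit=100, page=1):
    """Build (but do not send) a GET request for the usage endpoint.

    Hypothetical helper: validates the documented ranges locally so that
    obviously bad values fail before any network call is made.
    """
    if status is not None and status not in ("completed", "failed"):
        raise ValueError("status must be 'completed' or 'failed'")
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")

    params = {"limit": limit, "page": page}
    # Optional filters are only included when the caller provides them.
    for name, value in (("start_after", start_after),
                        ("end_before", end_before),
                        ("status", status)):
        if value is not None:
            params[name] = value

    url = f"{USAGE_URL}?{urlencode(params)}"
    return Request(url, headers={"X-API-Key": api_key}, method="GET")


if __name__ == "__main__":
    req = build_usage_request(
        "YOUR_API_KEY",
        start_after="2026-01-01T00:00:00Z",
        status="failed",
        limit=50,
    )
    print(req.get_method(), req.full_url)
```

The returned object can be passed to `urllib.request.urlopen` to perform the call. Keeping construction separate from sending makes the query string easy to inspect and test offline.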
### Query Parameters

<ParamField query="start_after" type="string (ISO 8601)">
  Filter results created after this timestamp. Example: `2026-01-01T00:00:00Z`
</ParamField>

<ParamField query="end_before" type="string (ISO 8601)">
  Filter results created before this timestamp. Example: `2026-02-01T00:00:00Z`
</ParamField>

<ParamField query="status" type="string">
  Filter by result status. One of: `completed`, `failed`.
</ParamField>

<ParamField query="limit" type="integer" default="100">
  Maximum number of items per page. Range: 1-1000.
</ParamField>

<ParamField query="page" type="integer" default="1">
  Page number for pagination.
</ParamField>

### Response

```json theme={null}
{
  "items": [...],
  "total": 42,
  "limit": 100,
  "page": 1,
  "total_pages": 1,
  "has_more": false
}
```

### `items[]`

<ResponseField name="id" type="string">
  Unique identifier for the fetch result.
</ResponseField>

<ResponseField name="url" type="string">
  The original requested URL.
</ResponseField>

<ResponseField name="final_url" type="string">
  The URL after any redirects.
</ResponseField>

<ResponseField name="title" type="string | null">
  Page title, if detected.
</ResponseField>

<ResponseField name="description" type="string | null">
  Meta description, if detected.
</ResponseField>

<ResponseField name="language" type="string | null">
  Detected page language (e.g. `"en"`).
</ResponseField>

<ResponseField name="author" type="string | null">
  Page author, if detected.
</ResponseField>

<ResponseField name="published_date" type="string | null">
  Published date, if detected.
</ResponseField>

<ResponseField name="format" type="string">
  The format used for extraction: `markdown`, `html`, or `json`.
</ResponseField>

<ResponseField name="status" type="string">
  Result status: `completed` or `failed`.
</ResponseField>

<ResponseField name="request_origin" type="string">
  Where the request originated: `api`, `cli`, `python-sdk`, `js-sdk`, `mcp`, etc.
</ResponseField>

<ResponseField name="request_id" type="string | null">
  The request ID that grouped this URL with others in a batch.
</ResponseField>

<ResponseField name="text_length" type="integer | null">
  Length of the extracted text content in characters. The full text is not included in usage responses.
</ResponseField>

<ResponseField name="links_count" type="integer">
  Number of links found on the page.
</ResponseField>

<ResponseField name="image_links_count" type="integer">
  Number of image links found on the page.
</ResponseField>

<ResponseField name="latency_ms" type="number | null">
  Time taken to fetch and extract the page, in milliseconds.
</ResponseField>

<ResponseField name="created_at" type="string (ISO 8601)">
  Timestamp when the fetch was executed.
</ResponseField>

<ResponseField name="error" type="string | null">
  Error message if the fetch failed. `null` for successful fetches.
</ResponseField>

### Error Codes

| Status | Meaning                    |
| ------ | -------------------------- |
| `400`  | Invalid query parameters   |
| `401`  | Missing or invalid API key |
| `500`  | Internal server error      |

***

## Rate Limits

Limits apply per API key, measured in URLs per minute across all requests.

| Plan          | URLs / minute |
| ------------- | ------------- |
| Free          | 150           |
| Pay As You Go | 150           |
| Starter       | 300           |
| Pro           | 600           |

When the limit is exceeded, the API returns `HTTP 429`.

***

## Billing

Fetch does not use credits.

***

## Related

<CardGroup cols={2}>
  <Card title="Fetch Overview" icon="bolt" href="/fetch-api">
    First request, response shape, and product routing
  </Card>

  <Card title="Authentication" icon="key" href="/authentication">
    API key setup and troubleshooting
  </Card>

  <Card title="Error Codes" icon="triangle-exclamation" href="/error-codes">
    Full list of API error codes
  </Card>
</CardGroup>