Use Case

Better LLM answers start with better input

Turn PDF documents into token-efficient Markdown that preserves structure. Get better answers from GPT, Claude, and open-source models by feeding them properly formatted source material.

Get 50 Free Pages

No credit card required

Raw PDF text vs. structured Markdown

What you feed the model determines what you get back. Real examples from actual document types.

Tables

An invoice line-item table. Raw extraction scrambles rows and columns, so the LLM has to guess which price belongs to which item and often guesses wrong.

Raw extraction
Item   Description     Unit Price
Widget A   Industrial widget   $24.99
per unit   Widget B   Premium
widget with coating  $42.50 per
unit   Shipping  Flat rate
$15.00   Total      $1,299.49

ParseBridge output
| Item      | Description                 | Unit Price    |
|-----------|-----------------------------|---------------|
| Widget A  | Industrial widget           | $24.99        |
| Widget B  | Premium widget with coating | $42.50        |
| Shipping  | Flat rate                   | $15.00        |
| **Total** |                             | **$1,299.49** |
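
Once the table arrives as real Markdown, the row and column relationships can be recovered with ordinary code as well as by the model. The sketch below is illustrative only: `parseMarkdownTable` is a hypothetical helper, not part of ParseBridge, and it assumes a simple pipe table with no escaped `|` characters inside cells.

```typescript
// Hypothetical helper (not a ParseBridge API): turn a simple pipe-delimited
// Markdown table into one object per data row, keyed by column header.
type Row = Record<string, string>;

function parseMarkdownTable(table: string): Row[] {
  const lines = table
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("|"));
  if (lines.length < 2) return [];

  // Split one table line into trimmed cell values, dropping the outer pipes.
  const cells = (line: string): string[] =>
    line
      .replace(/^\|/, "")
      .replace(/\|$/, "")
      .split("|")
      .map((cell) => cell.trim());

  const headers = cells(lines[0]);
  // lines[1] is the |---|---| separator row, so data starts at index 2.
  return lines.slice(2).map((line) => {
    const values = cells(line);
    const row: Row = {};
    headers.forEach((header, i) => {
      row[header] = values[i] ?? "";
    });
    return row;
  });
}

const invoice = `
| Item     | Description                 | Unit Price |
|----------|-----------------------------|------------|
| Widget A | Industrial widget           | $24.99     |
| Widget B | Premium widget with coating | $42.50     |
`;

const rows = parseMarkdownTable(invoice);
console.log(rows[1]["Unit Price"]); // "$42.50"
```

The same lookup is not possible on the raw extraction, where "Premium" and "widget with coating" land on different lines from the price they describe.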

Document structure

A service contract. Without headings, the model has no sense of what section it is reading or how to navigate the document.

Raw extraction
SERVICE AGREEMENT This Agreement is entered
into as of January 15, 2024. DEFINITIONS
"Service Provider" means Acme Corp. "Client"
means the undersigned party. PAYMENT TERMS
Payment is due within 30 days of invoice date.
Late payments accrue interest at 1.5% per month.
TERMINATION Either party may terminate with 90
days written notice.

ParseBridge output
# Service Agreement

This Agreement is entered into as of January 15, 2024.

## Definitions

- **"Service Provider"** means Acme Corp.
- **"Client"** means the undersigned party.

## Payment Terms

Payment is due within 30 days of invoice date.
Late payments accrue interest at 1.5% per month.

## Termination

Either party may terminate with 90 days written notice.
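
Headings also make the document addressable from code. As a minimal sketch, the hypothetical `splitByHeadings` helper below (an illustration, not a ParseBridge function) cuts Markdown into sections so that only the relevant one needs to go into a prompt. It assumes ATX-style `#` headings and does not account for `#` lines inside code blocks.

```typescript
// Hypothetical helper (not a ParseBridge API): split Markdown into sections
// at ATX headings ("#", "##", ...). Text before the first heading is ignored.
interface Section {
  level: number; // 1 for "#", 2 for "##", and so on
  title: string;
  body: string;
}

function splitByHeadings(markdown: string): Section[] {
  const sections: Section[] = [];
  let current: Section | null = null;

  for (const line of markdown.split("\n")) {
    const match = /^(#{1,6})\s+(.*)$/.exec(line);
    if (match) {
      // A new heading closes the previous section and opens the next one.
      if (current) sections.push(current);
      current = { level: match[1].length, title: match[2].trim(), body: "" };
    } else if (current) {
      current.body += line + "\n";
    }
  }
  if (current) sections.push(current);
  return sections.map((s) => ({ ...s, body: s.body.trim() }));
}

const contract = `# Service Agreement

This Agreement is entered into as of January 15, 2024.

## Payment Terms

Payment is due within 30 days of invoice date.

## Termination

Either party may terminate with 90 days written notice.`;

const sections = splitByHeadings(contract);
const payment = sections.find((s) => s.title === "Payment Terms");
console.log(payment?.body); // "Payment is due within 30 days of invoice date."
```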

Fewer tokens, more content

Raw extraction wastes tokens on whitespace artifacts, repeated headers and footers, and rendering noise. With clean Markdown, more of your context window goes to actual document content, so you can fit more pages into a single prompt or spend fewer tokens per query.
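
To see how a token budget turns into pages per prompt, here is a rough sketch. It assumes about 4 characters per token, a common rule of thumb for English text. Actual counts depend on the model's tokenizer, so use the tokenizer itself when precision matters. The function names and numbers are hypothetical.

```typescript
// Rough heuristic only: about 4 characters per token for English text.
// Real counts depend on the model's tokenizer.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Take pages in order until the next one would exceed the budget, after
// reserving room for the instructions and the model's answer.
function pagesThatFit(
  pages: string[],
  contextWindow: number,
  reservedTokens: number,
): string[] {
  const budget = contextWindow - reservedTokens;
  const selected: string[] = [];
  let used = 0;
  for (const page of pages) {
    const cost = estimateTokens(page);
    if (used + cost > budget) break;
    selected.push(page);
    used += cost;
  }
  return selected;
}

// Three placeholder pages of 400 characters each, roughly 100 tokens apiece.
const pages = ["a".repeat(400), "b".repeat(400), "c".repeat(400)];
const fit = pagesThatFit(pages, 350, 100);
console.log(fit.length); // 2
```

Every token saved per page raises the number of pages that pass this check.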

What you can build

Document Q&A

Ask questions about contracts, reports, or manuals and get answers that reference specific sections and table rows.

Summarization

Generate summaries that respect document hierarchy instead of flattening everything into a single blob.

Data extraction

Pull structured data from invoices, financial statements, and forms with tables that come through as actual tables.

Multi-document analysis

Compare clauses across contracts or reconcile data across reports in a single prompt.
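
For multi-document prompts, it helps to label each document so the model can say which one an answer came from. The sketch below shows one way to do that. `buildComparisonPrompt`, the `ParsedDocument` shape, and the `<document>` wrapper are assumptions made for illustration, not a ParseBridge or model-vendor requirement.

```typescript
// Hypothetical prompt builder: wraps each parsed document in a labeled block
// so answers can cite their source by name.
interface ParsedDocument {
  name: string; // label shown to the model, for example a filename
  markdown: string; // Markdown produced by the parsing step
}

function buildComparisonPrompt(question: string, docs: ParsedDocument[]): string {
  const parts = docs.map(
    (doc, i) =>
      `<document index="${i + 1}" name="${doc.name}">\n${doc.markdown.trim()}\n</document>`,
  );
  return `${question}\n\n${parts.join("\n\n")}`;
}

const prompt = buildComparisonPrompt(
  "Compare the termination clauses in these contracts.",
  [
    {
      name: "contract-a.pdf",
      markdown: "## Termination\n\nEither party may terminate with 90 days written notice.",
    },
    {
      name: "contract-b.pdf",
      markdown: "## Termination\n\nEither party may terminate with 30 days written notice.",
    },
  ],
);
console.log(prompt);
```

The resulting string can be used as the `content` of a single user message.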

PDF to LLM in two API calls

Parse the document with ParseBridge, then inject the Markdown into your prompt. This works with any model that accepts text.

import OpenAI from "openai";

// 1. Parse the PDF into structured Markdown
const res = await fetch("https://api.parsebridge.com/v1/parse/url", {
  method: "POST",
  headers: {
    Authorization: "Bearer pb_your_api_key",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    url: "https://example.com/contract.pdf",
  }),
});
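// Stop early on HTTP errors instead of passing undefined Markdown to the model
if (!res.ok) {
  throw new Error(`ParseBridge request failed: ${res.status} ${res.statusText}`);
}
const { markdown } = await res.json();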

// 2. Feed the Markdown into your LLM prompt
const openai = new OpenAI();
const completion = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [
    {
      role: "user",
      content: `Extract all payment terms and deadlines from this contract:\n\n${markdown}`,
    },
  ],
});

console.log(completion.choices[0].message.content);

Try It Free

50 free pages, no credit card required. View the API docs