Downloads

The BDDF schema defines all supported fields and their data types. You can download it and the example files below:
  • Latest BDDF JSON Schema - defines all supported fields and data types - download
  • Expert Interview Example - sample BDDF file for expert-interview content - download
  • Report Example - sample BDDF file for report-type content - download

Tips & Tricks

A few tips to make your content easier to process, both for humans reading it and for machines working behind the scenes:

Be mindful of paragraphs

Well-structured paragraphs improve readability and also give NLP (Natural Language Processing) systems more context to work with.
  • For text/plain and text/markdown, we recommend separating paragraphs with two newlines (\n\n).
  • For application/html, use proper <p> tags.
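As an illustration of the paragraph advice above, here is a minimal Python sketch that normalizes raw plain text before it goes into a content block. The helper names (split_paragraphs, to_plain_text, to_html) are hypothetical and not part of BDDF. The sketch assumes plain-text input in which single line breaks are only soft wraps; it is not suitable for markdown, where single newlines can be significant.

```python
import re
from html import escape


def split_paragraphs(text: str) -> list[str]:
    """Split raw plain text into paragraphs on blank lines."""
    # Normalize Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # One or more blank (or whitespace-only) lines separate paragraphs.
    parts = re.split(r"\n\s*\n", text)
    # Collapse runs of whitespace inside a paragraph (assumes soft wraps).
    return [" ".join(part.split()) for part in parts if part.strip()]


def to_plain_text(text: str) -> str:
    """Value for a text/plain block: paragraphs separated by two newlines."""
    return "\n\n".join(split_paragraphs(text))


def to_html(text: str) -> str:
    """Value for an application/html block: one <p> element per paragraph."""
    return "".join(f"<p>{escape(p)}</p>" for p in split_paragraphs(text))


raw = "First paragraph\r\nwraps here.\n\n\n\nSecond   paragraph."
plain_value = to_plain_text(raw)
html_value = to_html(raw)
```

The same paragraph list feeds both outputs, so the plain-text and HTML variants of a document stay aligned paragraph by paragraph.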

Keep the title simple

The contentBlock for the title should usually be a short, clear string; plain text works better here than formatted markup.

Use the right content_type for complex content

If you need to express rich structures like tables, lists, or headings, don’t force them into plain text. Use either:
  • application/html — if you want precise control with tags, or
  • text/markdown — if you prefer lightweight formatting.

When to split the body into multiple content blocks

Technically, you could put an entire document into a single contentBlock. However, there are scenarios in which splitting the body into multiple content blocks is recommended:
  • When the document has sections
    • Break the body into one contentBlock per section.
    • Use the section field to identify where each block belongs.
  • When the document is paginated
    • For paged formats (like PDFs), create one contentBlock per page.
    • Use the pages field to indicate which page(s) the content came from.
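
The two splitting rules above could be sketched in Python roughly as follows. This is an illustration under assumptions, not a reference implementation: the function names are hypothetical, the section field is assumed to hold a plain string, and both section and pages are assumed to sit directly on each content block. Check the BDDF JSON Schema for the authoritative field types and placement.

```python
import json


def blocks_from_sections(sections, content_type="text/plain"):
    """Build one content block per (section_name, text) pair."""
    return [
        {
            "content_type": content_type,
            "value": text,
            "role": "NORMAL",
            "section": name,  # assumption: section is a plain string
        }
        for name, text in sections
        if text.strip()  # skip empty sections
    ]


def blocks_from_pages(pages, content_type="text/markdown"):
    """Build one content block per page from a {page_number: text} mapping."""
    return [
        {
            "content_type": content_type,
            "value": text,
            "role": "NORMAL",
            "pages": [number],  # one page per block
        }
        for number, text in sorted(pages.items())
        if text.strip()  # skip blank pages
    ]


section_blocks = blocks_from_sections([
    ("About Employer", "We build battery recycling systems."),
    ("Skills Required", "Python, SQL."),
])
page_blocks = blocks_from_pages({2: "Second page text.", 1: "First page text."})

print(json.dumps({"body": page_blocks}, indent=2))
```

Sorting by page number keeps the blocks in reading order even when the pages arrive out of order.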

Imagine you have a job posting saved as an HTML web page. You can send it as a single content block, as long as there’s no metadata that applies only to specific parts. However, if you can identify sections like “About Employer,” “Skills Required,” “Responsibilities,” or “Salary Information,” please split the content accordingly and include the relevant metadata for each section in the designated fields.

Similarly, if you have a collection of PDF filings that you’re converting to markdown, don’t send the entire markdown document as one piece; break it down by pages and add the appropriate page information in the designated fields.

Here are a few examples:
  • Text is already formatted in markdown
  • No section or page metadata
  • No change in content_type
"content": {
  "title": {
    "content_type": "text/plain",
    "value": "Vireon Labs Secures $22M Funding",
    "role": "HEADING"
  },
  "body": {
    "content_type": "text/markdown",
    "value": "**Vireon Labs Secures $22M Funding**\n\n*September 16, 2025* — Clean-tech startup **Vireon Labs** raised **$22 million** in Series A funding led by **BlueRock Ventures**. The company aims to advance its AI-powered battery recycling technology.\n\nVireon Labs plans to expand pilot programs in the U.S. and Europe, aiming to significantly reduce battery waste.",
    "role": "NORMAL"
  }
}
  • The table of contents uses Markdown formatting, while the rest of the text is plain text, so a separate content block is needed.
"content": {
  "title": {
    "content_type": "text/plain",
    "value": "Fashion Inc. – Fiscal Year Ended December 31, 2024",
    "role": "HEADING"
  },
  "body": [
    {
      "content_type": "text/markdown",
      "value": "### Table of Contents\n\n1. Introduction  \n2. Strategic Outlook  \n3. Leadership Overview  \n4. Financial Highlights",
      "role": "NORMAL"
    },
    {
      "content_type": "text/plain",
      "value": "Fashion Inc., a global apparel and lifestyle brand, continued to expand its market presence in 2024 through digital transformation, sustainable sourcing, and entry into new retail partnerships. This filing provides an overview of company performance, management decisions, and future expectations.",
      "role": "NORMAL"
    }
  ]
}
  • As a general rule, start a new content block at the beginning of each new page.
  • Tables are an exception to this rule: they should remain in a single block even if they span multiple pages.
Don’t do this:
"pages": [
  7,
  8,
  9,
  10,
  11,
  12,
  13
]