Downloads
The BDDF schema defines all supported fields and their data types. You can download it and the example files below:
- Latest BDDF JSON Schema - defines all supported fields and data types - download
- Expert Interview Example - sample BDDF file for expert-interview content - download
- Report Example - sample BDDF file for report-type content - download
Tips & Tricks
A few tips to make your content easier to process, both for humans reading it and for machines working behind the scenes:
Be mindful of paragraphs
Well-structured paragraphs improve readability and also give NLP (Natural Language Processing) systems more context to work with.
- For text/plain and text/markdown, we recommend separating paragraphs with two newlines (\n\n).
- For application/html, use proper <p> tags.
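The paragraph recommendation above can be sketched in a few lines of Python. This is a minimal illustration, not part of the BDDF tooling: the helper names `normalize_paragraphs` and `paragraphs_to_html` are hypothetical, and the sketch assumes paragraphs in the source text are already separated by at least one blank line.

```python
import html
import re


def normalize_paragraphs(text: str) -> str:
    """Separate paragraphs with exactly two newlines (text/plain, text/markdown)."""
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Any run of blank (or whitespace-only) lines counts as one paragraph break.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = [p.strip() for p in paragraphs if p.strip()]
    return "\n\n".join(cleaned)


def paragraphs_to_html(text: str) -> str:
    """Wrap each paragraph in <p> tags, escaping HTML special characters."""
    paragraphs = normalize_paragraphs(text).split("\n\n")
    return "".join(f"<p>{html.escape(p)}</p>" for p in paragraphs if p)
```

Single newlines inside a paragraph are left untouched, so hard-wrapped text keeps its line breaks; only the breaks between paragraphs are normalized.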
The contentBlock for the title should usually be a short, clear string. Plain text works better here than formatted markup.
Use the right content_type for complex content
If you need to express rich structures like tables, lists, or headings, don’t force them into plain text. Use either:
- application/html — if you want precise control with tags, or
- text/markdown — if you prefer lightweight formatting.
In most cases the body fits in a single contentBlock. However, there are scenarios in which splitting the body into multiple content blocks is recommended:
- When the document has sections
  - Break the body into one contentBlock per section.
  - Use the section field to identify where each block belongs.
- When the document is paginated
  - For paged formats (like PDFs), create one contentBlock per page.
  - Use the pages field to indicate which page(s) the content came from.
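As a rough sketch of the two splitting strategies above, the helpers below build one block per page or per section. The field names `content_type`, `section`, and `pages` come from this guide; the `content` key, the list-of-integers shape for `pages`, and the function names are assumptions made for illustration, so check the schema for the exact structure.

```python
from typing import Dict, List


def blocks_per_page(page_texts: List[str], content_type: str = "text/plain") -> List[Dict]:
    """Build one content block per non-empty page, numbering pages from 1."""
    blocks = []
    for page_number, text in enumerate(page_texts, start=1):
        if not text.strip():
            continue  # skip blank pages rather than emit empty blocks
        blocks.append({
            "content": text.strip(),        # hypothetical key name
            "content_type": content_type,
            "pages": [page_number],         # assumed shape: list of page numbers
        })
    return blocks


def blocks_per_section(sections: Dict[str, str], content_type: str = "text/plain") -> List[Dict]:
    """Build one content block per section, in the order the sections are given."""
    return [
        {"content": body.strip(), "content_type": content_type, "section": title}
        for title, body in sections.items()
        if body.strip()
    ]
```

For example, `blocks_per_page(["Page one text.", "", "Page three text."])` yields two blocks, because the blank second page is skipped while the original page numbers are preserved.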
Simple but correct example
- Text is already formatted in markdown
- No section or page metadata
- No change in content_type
Different content types mixed - separate content blocks expected
- The table of contents uses Markdown formatting, while the rest of the text is plain text, so a separate content block is needed.
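One way to apply this rule programmatically is to start a new block whenever the content type changes. The sketch below assumes the document has already been broken into `(content_type, text)` segments in reading order; the function name `group_by_content_type` and the `content` key are hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Tuple


def group_by_content_type(segments: List[Tuple[str, str]]) -> List[Dict]:
    """Merge consecutive segments that share a content type into one block each."""
    blocks = []
    # groupby only groups adjacent items, so reading order is preserved.
    for content_type, run in groupby(segments, key=lambda seg: seg[0]):
        texts = [text for _, text in run]
        blocks.append({
            "content_type": content_type,
            "content": "\n\n".join(texts),  # hypothetical key name
        })
    return blocks
```

A Markdown table of contents followed by two plain-text paragraphs would therefore produce two blocks: one `text/markdown` block and one `text/plain` block holding both paragraphs.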
Avoid spanning multiple pages ❌
- As a general rule, start a new content block at the beginning of each new page.
- Tables are an exception to this rule: they should remain in a single block even if they span multiple pages.
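A simple pre-submission check for this rule might look like the sketch below. It is illustrative only: it assumes each block is a dictionary whose `pages` value is a list of page numbers and whose text lives under a hypothetical `content` key, and it leaves table detection to a caller-supplied predicate, since how tables are recognized depends on your content.

```python
from typing import Callable, Dict, List


def multi_page_violations(blocks: List[Dict], is_table: Callable[[Dict], bool]) -> List[int]:
    """Return indexes of non-table blocks that list more than one page."""
    return [
        index
        for index, block in enumerate(blocks)
        if len(set(block.get("pages", []))) > 1 and not is_table(block)
    ]


def looks_like_html_table(block: Dict) -> bool:
    """Crude heuristic (an assumption): an HTML block containing a <table> tag."""
    return (
        block.get("content_type") == "application/html"
        and "<table" in block.get("content", "").lower()
    )
```

Calling `multi_page_violations(blocks, looks_like_html_table)` returns the positions of blocks that should be split at a page boundary; an empty list means every multi-page block was recognized as a table.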
Quick Links
- Onboarding Overview - typical onboarding flow
- Quick Start Guide - step-by-step guide to your first BDDF file
- Bigdata Document Format - in-depth schema definition with examples

