You can upload your own files, such as PDFs, TXT documents, and other textual formats, to Bigdata.com. Once uploaded, each file is automatically enriched (content extraction, structuring, and annotation) and indexed, making it available to the Search and Research Agent endpoints.
The script below uploads multiple files to Bigdata using the REST API: it reads a list of file paths, uploads each file (POST → PUT to a presigned URL → poll until enrichment completes), and writes the results to a CSV.
If your browser displays the Python source instead of downloading it, press Ctrl+S (or Cmd+S on macOS) to save the file.
Setup
- Create a virtual environment (recommended):

  python -m venv .venv
  source .venv/bin/activate    # Linux/macOS
  # or: .venv\Scripts\activate # Windows

- Install dependencies:

  pip install -r requirements.txt

- Configure environment: copy .env to a new file if needed, then edit it and set your Bigdata API key:

  cp .env .env.local
  # Edit .env or .env.local and set BIGDATA_API_KEY=your-api-key
The script loads variables from .env in the script directory. You can also set BIGDATA_API_KEY (and optionally BIGDATA_API_BASE_URL) in your shell.
For general environment setup, see Prerequisites.
Usage
Run the script with these parameters:
| Parameter | Description |
|---|---|
| workdir | Directory that contains your files and where the log and result CSV will be written. |
| upload_txt_filename | Name of a text file inside workdir that lists files to upload (one path per line; paths are relative to workdir unless absolute). |
| max_concurrency | Number of files to upload in parallel (e.g. 5). |
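The max_concurrency parameter maps naturally onto a thread pool. A minimal sketch of running uploads in parallel, where upload_fn stands in for the real per-file upload routine (a hypothetical callable, not the script's actual function):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def upload_all(paths, upload_fn, max_concurrency=5):
    """Run upload_fn over paths, at most max_concurrency at a time."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        futures = {pool.submit(upload_fn, p): p for p in paths}
        for fut in as_completed(futures):
            path = futures[fut]
            try:
                results[path] = fut.result()
            except Exception as exc:  # record the failure instead of aborting the batch
                results[path] = f"UPLOAD_ERROR: {exc}"
    return results
```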
Create a list file (e.g. file_list.txt) in workdir with one filename or path per line:
report.pdf
data/other_doc.PDF
Example: run the script
From the batch_file_upload directory, with BIGDATA_API_KEY set in .env:
cd /path/to/bigdata-api-resources/how_to_guides/batch_file_upload
pip install -r requirements.txt
# Edit .env and set BIGDATA_API_KEY=your-api-key
# Run (paths in the list file are relative to workdir)
python batch_file_upload.py \
workdir=/home/you/Documents/PDFsamples \
upload_txt_filename=file_list.txt \
max_concurrency=5
Or set the API key in the shell and run from any directory:
export BIGDATA_API_KEY=your-api-key
python /path/to/batch_file_upload/batch_file_upload.py \
workdir=/home/you/Documents/PDFsamples \
upload_txt_filename=file_list.txt \
max_concurrency=5
The script will:
- Write a log file in workdir (e.g. bigdata_processing_20260312_120000.log).
- Write a result CSV in workdir (e.g. uploaded_file_ids_20260312_120000.csv) with columns: file_id, upload_status, file_path.
Example uploaded_file_ids_20260312_120000.csv:
file_id,upload_status,file_path
4C303FEB0B384EEB882FAF927D4F1961,UPLOAD_DONE,report.pdf
3BDBA5EBA34A4A65817954E3559476BB,UPLOAD_DONE,data/other_doc.PDF
F6FCC64ABAD64D52AC8A6864AE5F7C40,UPLOAD_ERROR,another.pdf
Environment variables
| Variable | Required | Default | Description |
|---|---|---|---|
| BIGDATA_API_KEY | Yes | — | Your Bigdata API key. |
| BIGDATA_API_BASE_URL | No | https://api.bigdata.com | API base URL. |
| BIGDATA_RATE_LIMIT_PER_MINUTE | No | 500 | Max requests per minute (should match your WAF limit). |
| BIGDATA_RATE_LIMIT_SAFETY_MARGIN | No | 20 | Margin kept under the limit (actual cap = limit − margin). |
| BIGDATA_POLL_INTERVAL_SEC | No | 10 | Seconds between status polls while waiting for completion. |
| BIGDATA_UPLOAD_MAX_RETRIES | No | 5 | Max retries per file on 429/5xx responses. |
Variables are loaded from .env in the script folder; you can override them in the shell.
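The relationship between the two rate-limit variables (actual cap = limit minus safety margin) can be sketched as a simple sliding-window limiter. This illustrates the arithmetic under those assumptions; it is not the script's actual implementation:

```python
import os
import threading
import time


class RateLimiter:
    """At most `cap` calls per 60-second window, where
    cap = BIGDATA_RATE_LIMIT_PER_MINUTE - BIGDATA_RATE_LIMIT_SAFETY_MARGIN."""

    def __init__(self, limit=None, margin=None):
        if limit is None:
            limit = int(os.environ.get("BIGDATA_RATE_LIMIT_PER_MINUTE", "500"))
        if margin is None:
            margin = int(os.environ.get("BIGDATA_RATE_LIMIT_SAFETY_MARGIN", "20"))
        self.cap = max(1, limit - margin)  # never drop below one call per window
        self._lock = threading.Lock()
        self._stamps = []  # monotonic timestamps of recent calls

    def acquire(self):
        """Block until a call is allowed under the cap, then record it."""
        while True:
            with self._lock:
                now = time.monotonic()
                self._stamps = [t for t in self._stamps if now - t < 60.0]
                if len(self._stamps) < self.cap:
                    self._stamps.append(now)
                    return
                wait = 60.0 - (now - self._stamps[0])
            time.sleep(wait)
```

With the defaults above, the effective cap is 500 − 20 = 480 requests per minute.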