You can upload your own files, such as PDFs, TXT documents, and other textual formats, to Bigdata.com. Once uploaded, each file is automatically enriched (content extraction, structuring, and annotation) and indexed, making it available to the Search and Research Agent endpoints. The script below uploads multiple files to Bigdata using the REST API: it reads a list of file paths, uploads each file (a POST to register the upload, a PUT to a presigned URL, then polling until enrichment completes), and writes the results to a CSV.
If your browser displays the Python source instead of downloading it, press Ctrl+S (or Cmd+S on Mac) after the file opens.
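The POST → presigned PUT → poll flow described above can be sketched roughly as follows. Note that the endpoint paths, JSON field names, and helper name here are illustrative assumptions, not the documented Bigdata API; the actual script handles this for you:

```python
import os
import time

import requests

API_BASE = os.environ.get("BIGDATA_API_BASE_URL", "https://api.bigdata.com")
HEADERS = {"Authorization": f"Bearer {os.environ.get('BIGDATA_API_KEY', '')}"}

def upload_one(path: str, poll_interval: int = 10) -> str:
    """Upload a single file and wait for enrichment to finish.

    All endpoint paths and field names below are placeholders.
    """
    # 1) POST: register the file and obtain a file id plus a presigned URL.
    resp = requests.post(f"{API_BASE}/files", headers=HEADERS,
                         json={"filename": os.path.basename(path)})
    resp.raise_for_status()
    info = resp.json()
    file_id, upload_url = info["file_id"], info["upload_url"]

    # 2) PUT: send the raw file bytes to the presigned URL.
    with open(path, "rb") as f:
        requests.put(upload_url, data=f).raise_for_status()

    # 3) Poll: wait until enrichment reports a terminal status.
    while True:
        status = requests.get(f"{API_BASE}/files/{file_id}",
                              headers=HEADERS).json()["status"]
        if status in ("UPLOAD_DONE", "UPLOAD_ERROR"):
            return status
        time.sleep(poll_interval)
```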

Setup

  1. Create a virtual environment (recommended)
    python -m venv .venv
    source .venv/bin/activate   # Linux/macOS
    # or:  .venv\Scripts\activate   # Windows
    
  2. Install dependencies
    pip install -r requirements.txt
    
  3. Configure the environment. Copy .env to a new file if needed, then edit it and set your Bigdata API key:
    cp .env .env.local
    # Edit .env or .env.local and set BIGDATA_API_KEY=your-api-key
    
    The script loads variables from .env in the script directory. You can also set BIGDATA_API_KEY (and optionally BIGDATA_API_BASE_URL) in your shell. For general environment setup, see Prerequisites.

Usage

Run the script with these parameters:
  Parameter            Description
  workdir              Directory that contains your files and where the log and result CSV will be written.
  upload_txt_filename  Name of a text file inside workdir that lists files to upload (one path per line; paths are relative to workdir unless absolute).
  max_concurrency      Number of files to upload in parallel (e.g. 5).
Create a list file (e.g. file_list.txt) in workdir with one filename or path per line:
report.pdf
data/other_doc.PDF
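The path rules above (relative entries joined to workdir, absolute entries kept as-is, blank lines skipped) can be sketched like this; the helper name is illustrative, not from the script itself:

```python
from pathlib import Path

def read_upload_list(workdir: str, list_filename: str) -> list[Path]:
    """Resolve the entries of the list file: blank lines are skipped,
    absolute paths are kept as-is, and relative paths are joined to
    workdir. (Helper name is illustrative, not from the script.)"""
    base = Path(workdir)
    resolved = []
    for line in (base / list_filename).read_text().splitlines():
        entry = line.strip()
        if entry:
            p = Path(entry)
            resolved.append(p if p.is_absolute() else base / p)
    return resolved
```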

Example: run the script

From the batch_file_upload directory, with BIGDATA_API_KEY set in .env:
cd /path/to/bigdata-api-resources/how_to_guides/batch_file_upload
pip install -r requirements.txt
# Edit .env and set BIGDATA_API_KEY=your-api-key

# Run (paths in the list file are relative to workdir)
python batch_file_upload.py \
  workdir=/home/you/Documents/PDFsamples \
  upload_txt_filename=file_list.txt \
  max_concurrency=5
Or set the API key in the shell and run from any directory:
export BIGDATA_API_KEY=your-api-key
python /path/to/batch_file_upload/batch_file_upload.py \
  workdir=/home/you/Documents/PDFsamples \
  upload_txt_filename=file_list.txt \
  max_concurrency=5
The script will:
  1. Write a log file in workdir (e.g. bigdata_processing_20260312_120000.log).
  2. Write a result CSV in workdir (e.g. uploaded_file_ids_20260312_120000.csv) with columns: file_id, upload_status, file_path.
Example uploaded_file_ids_20260312_120000.csv:
file_id,upload_status,file_path
4C303FEB0B384EEB882FAF927D4F1961,UPLOAD_DONE,report.pdf
3BDBA5EBA34A4A65817954E3559476BB,UPLOAD_DONE,data/other_doc.PDF
F6FCC64ABAD64D52AC8A6864AE5F7C40,UPLOAD_ERROR,another.pdf
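A downstream step might read this CSV to, for example, collect the failed rows for a retry. A minimal sketch (the helper name is ours, not part of the script):

```python
import csv

def split_results(csv_path: str):
    """Read the result CSV (columns: file_id, upload_status, file_path)
    and separate successful uploads from failed ones."""
    done, failed = [], []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            (done if row["upload_status"] == "UPLOAD_DONE" else failed).append(row)
    return done, failed
```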

Environment variables

Variable                          Required  Default                  Description
BIGDATA_API_KEY                   Yes       (none)                   Your Bigdata API key.
BIGDATA_API_BASE_URL              No        https://api.bigdata.com  API base URL.
BIGDATA_RATE_LIMIT_PER_MINUTE     No        500                      Max requests per minute (should match your WAF).
BIGDATA_RATE_LIMIT_SAFETY_MARGIN  No        20                       Margin under the limit (actual cap = limit − margin).
BIGDATA_POLL_INTERVAL_SEC         No        10                       Seconds between status polls while waiting for completion.
BIGDATA_UPLOAD_MAX_RETRIES        No        5                        Max retries per file on 429/5xx.
Variables are loaded from .env in the script folder; you can override them in the shell.
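Putting the rate-limit and retry variables together: the effective request cap is the limit minus the safety margin, and transient 429/5xx responses are retried with backoff. A transport-agnostic sketch under those assumptions (the helper name and backoff schedule are ours, not necessarily the script's):

```python
import os
import random
import time

RATE_LIMIT = int(os.environ.get("BIGDATA_RATE_LIMIT_PER_MINUTE", "500"))
SAFETY_MARGIN = int(os.environ.get("BIGDATA_RATE_LIMIT_SAFETY_MARGIN", "20"))
MAX_RETRIES = int(os.environ.get("BIGDATA_UPLOAD_MAX_RETRIES", "5"))

# Effective cap: stay a safety margin under the configured limit.
EFFECTIVE_RPM = RATE_LIMIT - SAFETY_MARGIN   # e.g. 500 - 20 = 480 req/min
MIN_INTERVAL = 60.0 / EFFECTIVE_RPM          # seconds between request starts

def call_with_retries(send, max_retries=MAX_RETRIES, sleep=time.sleep):
    """Invoke send() -> HTTP status code; retry on 429/5xx with exponential
    backoff plus jitter, up to max_retries extra attempts. `send` would
    wrap the actual HTTP call in a real client."""
    for attempt in range(max_retries + 1):
        status = send()
        if status != 429 and status < 500:
            return status          # success or non-retryable client error
        if attempt < max_retries:
            sleep(min(60, 2 ** attempt) + random.random())
    return status                  # exhausted retries; return last status
```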