Bigdata lets you query external data, and also upload your own content for search and analysis.
Private & Secure: No LLM training on your data
The following method allows you to upload files from disk using the Python SDK:
from bigdata_client import Bigdata

# Create the SDK client
bigdata = Bigdata()

# Upload a file from your local disk
file = bigdata.uploads.upload_from_disk('path/to/file')
The file object returned is a bigdata_client.models.uploads.File object, which contains:
- id: The unique identifier of the file.
- name: The name of the file. It is set to the name of the original file on disk.
- status: The status of the file. Check bigdata_client.file_status.FileStatus for the list of possible statuses.
- uploaded_at: The datetime when the file was uploaded, according to the server.
- raw_size: The size of the file in bytes.
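For instance, you can inspect these attributes directly on the returned object:

print(file.id)           # unique identifier of the file
print(file.name)         # name of the original file on disk
print(file.status)       # a bigdata_client.file_status.FileStatus value
print(file.uploaded_at)  # server-side upload datetime
print(file.raw_size)     # size in bytes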
Besides the path, the upload_from_disk() method also accepts the following optional parameters:
- provider_document_id: Allows you to assign a specific ID to your document, which will be available as provider_document_id in the metadata node of the annotated.json. This is useful if you want to correlate your own IDs with the ones provided by Bigdata.
- provider_date_utc: Allows you to assign a specific timestamp (a string in YYYY-MM-DD hh:mm:ss format or a datetime) to your document. This sets the document's published date, allowing us to better assign a reporting date to detected events.
- primary_entity: You can specify a “Primary Entity” to boost entity recognition in your document. When a primary entity is set for a document, it increases the chances of detecting events even when the entity is not explicitly mentioned. Setting a primary entity is optional, and you can use either a name or the corresponding rp_entity_id.
- skip_metadata: If True, the file is uploaded but its metadata is not retrieved. Recommended for bulk uploads. Defaults to False.
file = bigdata.uploads.upload_from_disk(
    'path/to/file',
    provider_document_id='my_document_id',
    provider_date_utc='2022-01-01 12:00:00',
    primary_entity='Apple Inc.',
    skip_metadata=True,
)
Once a file is uploaded, the service will analyze it and index it to the Vector Database, so it becomes available for tagging, sharing, and querying with both the Chat and Search services.
Please use the following method to wait for the file to be fully processed:
file.wait_for_completion()
If your company has disabled the indexing step because you are only interested in the analytics, you can use the following method to wait for the file to be fully analyzed:
file.wait_for_analysis_complete()
By default, these methods wait up to 2400 seconds (40 minutes) for the file to be processed. If you want to customize how long you are willing to wait, you can pass a timeout parameter (in seconds) to the method. Once the timeout is reached, the method raises a TimeoutError exception:

file.wait_for_completion(timeout=300)
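If you prefer to handle the timeout yourself instead of letting the exception propagate, you can catch it (a minimal sketch):

try:
    file.wait_for_completion(timeout=300)
except TimeoutError:
    print(f"File {file.id} was not processed within 5 minutes")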
The platform must format the original file with extra metadata before processing it, and the maximum file size after that normalization is 10 MB.
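Because the normalization adds metadata on top of your original content, a client-side size check can only approximate the limit. Still, a rough pre-check can save you a failed upload (a sketch; the 10 MB figure is the post-normalization cap, so treat this as an optimistic bound):

import os

MAX_SIZE_BYTES = 10 * 1024 * 1024  # 10 MB post-normalization cap

size = os.path.getsize('path/to/file')
if size >= MAX_SIZE_BYTES:
    print(f"File is already {size} bytes before normalization; it will likely exceed the limit")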
Tag uploaded files
You can modify file tags using the add_tags(), remove_tags(), and set_tags() methods of the File object. The file object may come from the list(), get(), or upload_from_disk() methods.
Add Tag
To add a tag to a file, use the add_tags() method. You can add a
single tag or a list of tags.
file = bigdata.uploads.get("4DC8AF5500AD4EB0A360D0C7BD6F9286")
print(file.tags)
>>> []
file.add_tags(["New Tag"])
print(file.tags)
>>> ["New Tag"]
file.add_tags(["New Tag 2", "New Tag 3"])
print(file.tags)
>>> ["New Tag", "New Tag 2", "New Tag 3"]
Remove Tag
To remove a tag from a file, use the remove_tags() method. You can
remove a single tag or a list of tags.
file.remove_tags(["New Tag"])
print(file.tags)
>>> ["New Tag 2", "New Tag 3"]
# To remove all tags from a file
file.remove_tags(file.tags)
print(file.tags)
>>> []
Set Tags
To replace all tags with new ones, use the set_tags() method. This operation is permanent and replaces all existing tags.
file.set_tags(["Final Tag"])
print(file.tags)
>>> ["Final Tag"]
file.set_tags(["New Final Tag 1", "New Final Tag 2"])
print(file.tags)
>>> ["New Final Tag 1", "New Final Tag 2"]
You can find all of the tags used across your own files and list them with the list_my_tags() method.
bigdata.uploads.list_my_tags()
>>> ["New Final Tag 1", "New Final Tag 2"]
Files shared with you can also have their own tags. To find and list all of these tags, use the list_tags_shared_with_me() method.
bigdata.uploads.list_tags_shared_with_me()
>>> ["Tag set by another user", "Another tag set by another user"]
Working with your files
To list all the files that have been uploaded to the server, you can use
the list() method:
files = bigdata.uploads.list()
for file in files:
    print(file)
In case you have many files, you must iterate over the paginated results:

import itertools

for n in itertools.count(start=1):
    files = bigdata.uploads.list(page_number=n)
    if not files:
        break
    do_stuff_with_files(files)  # your own processing function
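If you paginate like this in several places, you can wrap the loop in a small generator (a sketch; iter_all_uploads is a hypothetical helper, not part of the SDK):

import itertools

def iter_all_uploads(uploads):
    """Hypothetical helper that yields every file across all pages."""
    for n in itertools.count(start=1):
        page = uploads.list(page_number=n)
        if not page:
            break
        yield from page

for file in iter_all_uploads(bigdata.uploads):
    print(file)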
The output contains the ID, file size, upload date, and name of each file:
C48410DA1AEE439ABAA0619F272B67F4 123 Jan 1 2021 My First Document.pdf
BE61DA39E0F540A599E958BBEB9BA3D5 1K Feb 10 2023 Document_2.txt
687A8B473E654416A0C19CD79EE77413 120K Jul 31 2024 Document-3.docx
F1345B07DDE145CAB30C08CC01B393D6 1.2M Dec 31 2024 Another file.docx
3A56AC4B2BCB42FEA7B0AF062FE78534 1.1G Apr 10 2024 The last file.pdf
Additionally, you can get a file by its ID:
file = bigdata.uploads.get("<document_id>")
print(file)
# C48410DA1AEE439ABAA0619F272B67F4 123 Jan 1 2021 My First Document.pdf
Once your files are processed, you can download 3 different versions of
the file:
- The original file, by calling the
download_original() method of the
file object.
- The annotated version of the file, by calling the
download_annotated() method of the file object. This is a JSON file
containing the text together with the detections made by the system.
- The analytics version of the file, by calling the
download_analytics() method of the file object. This is a JSON file
containing the analytics created by the system.
file.download_original('path/to/save/original_file')
file.download_annotated('path/to/save/annotated_file.json')
file.download_analytics('path/to/save/analytics_file.json')
Additionally, you can get the annotations and analytics directly as a Python dictionary by calling the corresponding get_<file_type>_dict() method:
annotations = file.get_annotated_dict()
print(annotations)
analytics = file.get_analytics_dict()
print(analytics)
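As noted in the upload parameters above, a provider_document_id set at upload time is available in the metadata node of the annotated JSON. The exact schema is not reproduced in this guide, so treat the lookup below as an assumption:

annotations = file.get_annotated_dict()
# Assumed layout: a top-level 'metadata' node containing provider_document_id
print(annotations.get('metadata', {}).get('provider_document_id'))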
Sharing Private Content
You can share your private content with other members of your
organization. This allows your colleagues to find the documents you
share in their search results. To share a document, use the
share_with_company method. For example:
file = bigdata.uploads.get("<document_id>")
file.share_with_company() # Option 1. Operating on the object in memory
bigdata.uploads.share_with_company("<document_id>") # Option 2. Operating on the file ID
After sharing, the company_shared_permission attribute of the file object will be set to SharePermission.READ.
To unshare a file, use the unshare_with_company method:
file = bigdata.uploads.get("<document_id>")
file.unshare_with_company() # Option 1. Operating on the object in memory
bigdata.uploads.unshare_with_company("<document_id>") # Option 2. Operating on the file ID
To list all the files that have been shared with you, use the list_shared() method:
files = bigdata.uploads.list_shared()
for file in files:
    print(file)
In case you have many files, you must iterate over the paginated results:

import itertools

for n in itertools.count(start=1):
    files = bigdata.uploads.list_shared(page_number=n)
    if not files:
        break
    print(files)
The same operations to download the original version of the file, the
annotated structure, and its analytics are also available for the shared
files.
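For example, to download the original version of a file that was shared with you (assuming at least one shared file exists):

shared_files = bigdata.uploads.list_shared()
if shared_files:
    shared_files[0].download_original('path/to/save/shared_original')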
Deleting uploaded files
To delete a file, use the delete() method of the file object, where the object may come from the list(), get(), or upload_from_disk() methods:
import itertools

# Collect all of your uploaded files, page by page
files = []
for n in itertools.count(start=1):
    files_in_page = bigdata.uploads.list(page_number=n)
    if not files_in_page:
        break
    files.extend(files_in_page)

# Show a numbered listing and ask which file to delete
for i, file in enumerate(files):
    print(f"{i} {file}")
print(f"Enter the file row number to delete: [0 - {len(files) - 1}]")
row = int(input())

if 0 <= row < len(files):
    file = files[row]
    file.delete()
    # The file is now deleted and bigdata.uploads.get() will raise an
    # exception since the file does not exist anymore
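If you want to confirm the deletion, note that this guide does not specify which exception class get() raises for a missing file, so the sketch below catches a broad Exception:

try:
    bigdata.uploads.get("<document_id>")
except Exception:  # the exact exception type is not documented here
    print("File no longer exists")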
Note that deleting a file is a permanent operation and cannot be undone.
Another way to delete a file, if you know the ID, is to use the delete() method of the Uploads object. This avoids the need to get the file object first:
bigdata.uploads.delete("<document_id>")
Only files that are in the COMPLETED or FAILED status can be deleted. Attempting to delete a file that is still being processed will raise an exception. To avoid this, you can use the wait_for_completion() method:

file = bigdata.uploads.upload_from_disk('path/to/file')
# Wait for the file to be processed
file.wait_for_completion()
file.delete()