Query filters

Entity

A filter that matches an entity by its EntityID. Use the methods provided in the Knowledge Graph to identify the entities, topics, or sources of interest, and use the returned IDs to build queries. Example:

curl -X POST 'https://api.bigdata.com/v1/search' \
  -H 'Content-Type: application/json' \
  -H 'X-API-KEY: <your-api-key>' \
  --data '{
    "query": {
      "filters": {
        "entity": {
          "any_of": [
            "228D42",
            "D8442A"
          ]
        }
      },
      "max_chunks": 10
    }
  }'
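The same request can be sent from Python with only the standard library. This is a minimal sketch that assumes the endpoint and headers from the curl example above. The helper names `entity_query` and `build_search_request` are hypothetical and not part of the Bigdata SDK:

```python
import json
import urllib.request

SEARCH_URL = "https://api.bigdata.com/v1/search"


def entity_query(entity_ids: list, max_chunks: int = 10) -> dict:
    """Search body matching chunks that mention any of the given entity IDs."""
    return {
        "query": {
            "filters": {"entity": {"any_of": list(entity_ids)}},
            "max_chunks": max_chunks,
        }
    }


def build_search_request(body: dict, api_key: str) -> urllib.request.Request:
    """Wrap a search body in a POST request with the same headers as the curl example."""
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="POST",
    )


request = build_search_request(entity_query(["228D42", "D8442A"]), "<your-api-key>")

# Sending the request needs a valid API key and network access:
# with urllib.request.urlopen(request) as response:
#     results = json.load(response)
```

The body builder is kept separate from the transport, so you can reuse the same request helper for the other filters on this page.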

Similarity

The Similarity filter computes the embedding of the provided sentence and searches for the closest nodes in the proprietary Bigdata Vector Database. In the REST API, the sentence goes in the query's text field. The following example searches for chunks closely related to the sentence “Tariffs impacting US companies”:

curl -X POST 'https://api.bigdata.com/v1/search' \
  -H 'Content-Type: application/json' \
  -H 'X-API-KEY: <your-api-key>' \
  --data '{
  "query": {
    "text": "Tariffs impacting US companies",
    "filters": {},
    "max_chunks": 10
  }
}'
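Here is a minimal Python sketch of the same body. `similarity_query` is a hypothetical helper. The curl example places text next to an (empty) filters object, so this sketch also accepts other filters, such as entity, in the same request. Treat that combination as an assumption.

```python
import json
from typing import Optional


def similarity_query(sentence: str, filters: Optional[dict] = None, max_chunks: int = 10) -> dict:
    """Search body whose "text" field carries the sentence to embed and match."""
    if not sentence.strip():
        raise ValueError("the similarity sentence must not be empty")
    return {
        "query": {
            "text": sentence,
            # An empty object means no additional filters, as in the curl example.
            "filters": dict(filters or {}),
            "max_chunks": max_chunks,
        }
    }


body = similarity_query("Tariffs impacting US companies")
print(json.dumps(body, indent=2))

# Narrowing the same sentence to one entity (combination assumed, see above):
narrowed = similarity_query(
    "Tariffs impacting US companies",
    filters={"entity": {"any_of": ["228D42"]}},
)
```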

When using the Similarity filter, you can also apply a second ranking phase to improve precision. See Rerank search for details.

For Python SDK users:
- We advise using at most one Similarity filter per query. To search for several sentences, create several queries, each with one Similarity filter.
- The AND operator (&) is supported, but the returned chunks must then be closely related to every sentence in the Similarity filters.
- The OR operator (|) is not supported. Instead, create multiple queries with one Similarity filter each and combine their results.
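The one-query-per-sentence pattern also applies when you build REST requests yourself. Below is a minimal sketch that builds one body per sentence and then takes a client-side union of the results. The helper names are hypothetical, and the result shape (a `chunk_id` field) is only a placeholder. Pass whatever key identifies a chunk in the responses you actually receive.

```python
from collections.abc import Callable, Hashable, Iterable


def one_query_per_sentence(sentences: Iterable[str], max_chunks: int = 10) -> list:
    """One search body per sentence, instead of OR-ing several Similarity filters."""
    return [
        {"query": {"text": sentence, "filters": {}, "max_chunks": max_chunks}}
        for sentence in sentences
    ]


def merge_results(result_lists: Iterable[list], key: Callable[[object], Hashable]) -> list:
    """Union of several result lists, keeping the first occurrence of each key in order."""
    seen = set()
    merged = []
    for results in result_lists:
        for item in results:
            item_key = key(item)
            if item_key not in seen:
                seen.add(item_key)
                merged.append(item)
    return merged


bodies = one_query_per_sentence(["Tariffs impacting US companies", "Export controls on chips"])

# Hypothetical results from running each body; "chunk_id" is a placeholder field name.
results_a = [{"chunk_id": "a1"}, {"chunk_id": "b2"}]
results_b = [{"chunk_id": "b2"}, {"chunk_id": "c3"}]
combined = merge_results([results_a, results_b], key=lambda chunk: chunk["chunk_id"])
```

Deduplicating by key keeps a chunk that matches both sentences from appearing twice in the combined list.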

Keyword

You can enrich the query criteria with positive or negative Keyword filters. Keywords are matched against the document title or the chunk text. For instance, the following query retrieves chunks that mention “Announcement” and “2024” but not “2023”, in either the chunk text or the document’s title:

curl -X POST 'https://api.bigdata.com/v1/search' \
  -H 'Content-Type: application/json' \
  -H 'X-API-KEY: <your-api-key>' \
  --data '{
  "query": {
    "filters": {
      "keyword": {
        "all_of": [
          "Announcement",
          "2024"
        ],
        "none_of": [
          "2023"
        ]
      }
    },
    "max_chunks": 10
  }
}'
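Below is a minimal Python sketch that builds this body and rejects a term that is both required and excluded. `keyword_query` is a hypothetical helper. The guard compares lowercased terms, which is an assumption about case handling, and it cannot catch stemmed variants. The sketch also omits an empty list instead of sending it, which is assumed to be acceptable.

```python
def keyword_query(all_of=(), none_of=(), max_chunks=10):
    """Search body with positive (all_of) and negative (none_of) keyword lists."""
    # Guard against contradictory requests; comparing lowercased terms is an
    # assumption about case handling, and stemmed variants are not detected.
    conflicts = {term.lower() for term in all_of} & {term.lower() for term in none_of}
    if conflicts:
        raise ValueError(f"terms both required and excluded: {sorted(conflicts)}")
    keyword = {}
    if all_of:
        keyword["all_of"] = list(all_of)
    if none_of:
        keyword["none_of"] = list(none_of)
    if not keyword:
        raise ValueError("provide at least one keyword")
    return {"query": {"filters": {"keyword": keyword}, "max_chunks": max_chunks}}


body = keyword_query(all_of=["Announcement", "2024"], none_of=["2023"])
```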

Keyword matching uses stemming, so the search also matches related word forms. For example, searching for “resignation” also matches results containing the word “resignations”.

Topic

Bigdata detects topics in unstructured data, so you can filter by them and find the text where those events were identified. The Knowledge Graph defines about 2,400 topics. The best way to get a list of topics relevant to your search is the Co-mentions > Connected Topics method. You can also explore them on the Knowledge Graph > Find Topics page, or email support@bigdata.com to request the whole taxonomy. Once you have the list of topic IDs you want to monitor, add them to the search as a filter. Example:

curl -X POST 'https://api.bigdata.com/v1/search' \
  -H 'Content-Type: application/json' \
  -H 'X-API-KEY: <your-api-key>' \
  --data '{
  "query": {
    "filters": {
      "topic": {
        "any_of": [ 
          "business,labor-issues,executive-appointment,",
          "business,labor-issues,executive-resignation,",
          "business,labor-issues,executive-retirement,"
        ]
      }
    },
    "max_chunks": 10
  }
}'
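Topic IDs gathered from several Connected Topics calls often overlap. Below is a minimal Python sketch that deduplicates them while preserving order before building the filter. `topic_query` is a hypothetical helper, and the IDs are copied from the curl example above.

```python
def topic_query(topic_ids, max_chunks=10):
    """Search body matching chunks tagged with any of the given topic IDs."""
    # dict.fromkeys drops duplicates while keeping the first-seen order.
    unique_ids = list(dict.fromkeys(topic_ids))
    if not unique_ids:
        raise ValueError("provide at least one topic ID")
    return {"query": {"filters": {"topic": {"any_of": unique_ids}}, "max_chunks": max_chunks}}


# e.g. IDs collected from several Co-mentions > Connected Topics calls, with a repeat
collected = [
    "business,labor-issues,executive-appointment,",
    "business,labor-issues,executive-resignation,",
    "business,labor-issues,executive-appointment,",
    "business,labor-issues,executive-retirement,",
]
body = topic_query(collected)
```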

Source

Bigdata’s ecosystem comprises high-quality content sources, including web content, premium news, press wires, call transcripts, and regulatory filings. You can restrict your search to a list of trusted sources to minimize noise and surface novel information in your results. The following example restricts results to one source and one entity:

curl -X POST 'https://api.bigdata.com/v1/search' \
  -H 'Content-Type: application/json' \
  -H 'X-API-KEY: <your-api-key>' \
  --data '{
  "query": {
    "filters": {
      "source": {
        "mode": "INCLUDE",
        "values": [
          "E54C73"
        ]
      },
      "entity": {
        "any_of": [
          "228D42"
        ]
      }
    },
    "max_chunks": 10
  }
}'
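Below is a minimal Python sketch of the same body, with the source filter built by a small helper that validates the mode. Both helper names are hypothetical. Only "INCLUDE" appears in the documented example, so treating "EXCLUDE" as its counterpart is an assumption.

```python
# Only "INCLUDE" appears in the documented example; "EXCLUDE" is an assumption.
SOURCE_MODES = {"INCLUDE", "EXCLUDE"}


def source_filter(source_ids, mode="INCLUDE"):
    """The "source" filter object: a mode plus the list of source IDs."""
    mode = mode.upper()
    if mode not in SOURCE_MODES:
        raise ValueError(f"unknown source mode: {mode}")
    return {"mode": mode, "values": list(source_ids)}


def source_and_entity_query(source_ids, entity_ids, max_chunks=10):
    """Restrict results to the given sources and to chunks mentioning the given entities."""
    return {
        "query": {
            "filters": {
                "source": source_filter(source_ids),
                "entity": {"any_of": list(entity_ids)},
            },
            "max_chunks": max_chunks,
        }
    }


body = source_and_entity_query(["E54C73"], ["228D42"])
```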

SentimentRange

With Sentiment Ranges, you can filter document chunks by a sentiment score range between -1.00 and +1.00. The score reflects the sentiment of each chunk based on the language used in its sentences: a score closer to -1.00 indicates negative sentiment, while a score closer to +1.00 indicates positive sentiment.

The REST API supports three values: positive, neutral, and negative. The Python SDK supports numerical sentiment ranges directly. Example:

curl -X POST 'https://api.bigdata.com/v1/search' \
  -H 'Content-Type: application/json' \
  -H 'X-API-KEY: <your-api-key>' \
  --data '{
  "query": {
    "filters": {
      "entity": {
        "any_of": [
          "228D42",
          "D8442A"
        ]
      },
      "sentiment": {
        "values": [
          "negative"
        ]
      }
    },
    "max_chunks": 10
  }
}'
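The text above capitalizes the sentiment values, but the example sends lowercase "negative". Below is a small Python sketch that normalizes labels to the lowercase form used in the example and drops duplicates. `sentiment_filter` is a hypothetical helper. The claim that lowercase is the expected wire format is an assumption based only on that example.

```python
SENTIMENT_VALUES = ("positive", "neutral", "negative")


def sentiment_filter(*labels):
    """The "sentiment" filter object, accepting labels in any letter case."""
    values = []
    for label in labels:
        value = label.strip().lower()
        if value not in SENTIMENT_VALUES:
            raise ValueError(f"unsupported sentiment: {label!r}")
        if value not in values:  # skip duplicates such as "Negative" and "negative"
            values.append(value)
    return {"values": values}


body = {
    "query": {
        "filters": {
            "entity": {"any_of": ["228D42", "D8442A"]},
            "sentiment": sentiment_filter("Negative"),
        },
        "max_chunks": 10,
    }
}
```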

Document

The Document filter is only supported in the Python SDK; a RESTful endpoint to download the entire document will be released soon.

By providing one or more document IDs, you can retrieve all the chunks within those documents, or only the chunks that also meet the other criteria of your query. Example:

from bigdata_client import Bigdata
from bigdata_client.query import Entity, Document

bigdata = Bigdata()

MICROSOFT = "228D42"

query = Entity(MICROSOFT) & Document("0B4EE52A6A611A8326D7EA3E8DC075E3","9C67269CD8747E33DDEE94554A13E6EC")

search = bigdata.search.new(query)
documents = search.run(2)
print(documents)

Transcript

You can filter by a transcript subtype. The possible values are:

ANALYST_INVESTOR_SHAREHOLDER_MEETING: Analyst, Investor and Shareholder Meeting.
CONFERENCE_CALL: General Conference Call. (Coming soon)
GENERAL_PRESENTATION: General Presentation.
EARNINGS_CALL: Earnings Call.
EARNINGS_RELEASE: Earnings Release. (Coming soon)
GUIDANCE_CALL: Guidance Call.
SALES_REVENUE_CALL: Sales and Revenue Call.
SALES_REVENUE_RELEASE: Sales and Revenue Release. (Coming soon)
SPECIAL_SITUATION_MA: Special Situation, M&A and Other.
SHAREHOLDERS_MEETING: Shareholders Meeting. (Coming soon)
MANAGEMENT_PLAN_ANNOUNCEMENT: Management Plan Announcement. (Coming soon)
INVESTOR_CONFERENCE_CALL: Investor Conference Call. (Coming soon)

Example:

curl -X POST 'https://api.bigdata.com/v1/search' \
  -H 'Content-Type: application/json' \
  -H 'X-API-KEY: <your-api-key>' \
  --data '{
  "query": {
    "filters": {
      "document_type": {
        "mode": "INCLUDE",
        "values": [
          {
            "type": "TRANSCRIPT",
            "subtypes": ["EARNINGS_CALL"]
          }
        ]
      }
    },
    "max_chunks": 10
  }
}'
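Several subtypes in the list above are marked as coming soon. Below is a minimal Python sketch that checks requested subtypes against that list before building the document_type filter. `transcript_filter` is a hypothetical helper, and the two sets mirror the list on this page, so update them as subtypes become available.

```python
# Mirrors the subtype list on this page; update as "coming soon" subtypes ship.
AVAILABLE_SUBTYPES = {
    "ANALYST_INVESTOR_SHAREHOLDER_MEETING",
    "GENERAL_PRESENTATION",
    "EARNINGS_CALL",
    "GUIDANCE_CALL",
    "SALES_REVENUE_CALL",
    "SPECIAL_SITUATION_MA",
}
COMING_SOON_SUBTYPES = {
    "CONFERENCE_CALL",
    "EARNINGS_RELEASE",
    "SALES_REVENUE_RELEASE",
    "SHAREHOLDERS_MEETING",
    "MANAGEMENT_PLAN_ANNOUNCEMENT",
    "INVESTOR_CONFERENCE_CALL",
}


def transcript_filter(*subtypes):
    """The "document_type" filter for transcripts, rejecting unknown or pending subtypes."""
    if not subtypes:
        raise ValueError("provide at least one transcript subtype")
    for subtype in subtypes:
        if subtype in COMING_SOON_SUBTYPES:
            raise ValueError(f"{subtype} is listed as coming soon")
        if subtype not in AVAILABLE_SUBTYPES:
            raise ValueError(f"unknown transcript subtype: {subtype}")
    return {
        "document_type": {
            "mode": "INCLUDE",
            "values": [{"type": "TRANSCRIPT", "subtypes": list(subtypes)}],
        }
    }


filters = transcript_filter("EARNINGS_CALL")
```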

SectionMetadata is not yet supported in the API.

SectionMetadata: this filter lets you query specific sections inside transcript documents. Each DocumentChunk belongs to one or more sections, always within the following hierarchical structure:
- QA: question and answer section. This section can be broken down into:
  - QUESTION: a question asked to a speaker during the session.
  - ANSWER: an answer from a speaker of the event.
- MANAGEMENT_DISCUSSION: Management Discussion section.

Example:

from bigdata_client import Bigdata
from bigdata_client.query import Entity, TranscriptTypes, SectionMetadata

bigdata = Bigdata()

MICROSOFT = "228D42"

query = Entity(MICROSOFT) & TranscriptTypes.EARNINGS_CALL & SectionMetadata.MANAGEMENT_DISCUSSION

search = bigdata.search.new(query)
documents = search.run(2)
print(documents)

Filing

You can also query a specific Filing subtype. The possible values are:

SEC_10_K: Annual report filing regarding a company’s financial performance submitted to the Securities and Exchange Commission (SEC).
SEC_10_Q: Quarterly report filing regarding a company’s financial performance submitted to the SEC.
SEC_8_K: Report filed whenever a significant corporate event triggers a disclosure to the SEC.
SEC_20_F: Annual report filing for non-U.S. and non-Canadian companies that have securities trading in the U.S.
SEC_S_1: Registration statement filed by companies that wish to go public in the U.S.
SEC_S_3: Simplified registration statement used by companies that already report to the SEC when they wish to raise capital.
SEC_6_K: Report of foreign private issuer pursuant to rules 13a-16 and 15d-16.

Example:

curl -X POST 'https://api.bigdata.com/v1/search' \
  -H 'Content-Type: application/json' \
  -H 'X-API-KEY: <your-api-key>' \
  --data '{
  "query": {
    "filters": {
      "document_type": {
        "mode": "INCLUDE",
        "values": [
          {
            "type": "FILING",
            "subtypes": ["SEC_10_K"]
          }
        ]
      }
    },
    "max_chunks": 10
  }
}'
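Filings are usually referred to by their form names (10-K, S-1, and so on). Below is a minimal Python sketch that converts a form name into the subtype values listed above and builds the matching filter. The conversion rule is inferred from the naming pattern of this list, not from an official mapping. `filing_subtype` and `filing_filter` are hypothetical helpers.

```python
# The documented filing subtypes.
FILING_SUBTYPES = {"SEC_10_K", "SEC_10_Q", "SEC_8_K", "SEC_20_F", "SEC_S_1", "SEC_S_3", "SEC_6_K"}


def filing_subtype(form: str) -> str:
    """Map a form name such as "10-K" or "s-1" to a subtype such as "SEC_10_K"."""
    # Naming rule inferred from the list above: "SEC_" + upper-cased form, "-" -> "_".
    subtype = "SEC_" + form.strip().upper().replace("-", "_")
    if subtype not in FILING_SUBTYPES:
        raise ValueError(f"unsupported filing form: {form!r}")
    return subtype


def filing_filter(*forms):
    """The "document_type" filter for the given filing forms."""
    return {
        "document_type": {
            "mode": "INCLUDE",
            "values": [{"type": "FILING", "subtypes": [filing_subtype(f) for f in forms]}],
        }
    }


filters = filing_filter("10-K", "10-Q")
```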

When querying Transcripts or Filings, it is helpful to narrow your search using the reporting details, such as fiscal year and quarter, as described in the next section.

Reporting details

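Reporting details let you specify the reporting period and the reporting company. In the REST API, they are passed in the reporting_periods and reporting_entities fields.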

FiscalYear: Integer representing the annual reporting period.
FiscalQuarter: Integer representing the fiscal quarter covered.
ReportingEntity: Allows searching by the reporting company.

Example:

curl -X POST 'https://api.bigdata.com/v1/search' \
  -H 'Content-Type: application/json' \
  -H 'X-API-KEY: <your-api-key>' \
  --data '{
  "query": {
    "filters": {
      "document_type": {
          "mode": "INCLUDE",
          "values": [
            {
               "type": "TRANSCRIPT",
               "subtypes": ["EARNINGS_CALL"]
            }
          ]
      },
      "reporting_entities": [
        "228D42"
      ],
      "reporting_periods": [
        {
          "fiscal_year": 2024,
          "fiscal_quarter": 2
        }
      ]
    },
    "max_chunks": 10
  }
}'
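Below is a minimal Python sketch that builds the reporting periods for several quarters of one fiscal year and validates the quarter numbers. The helper names are hypothetical. reporting_periods is a list, but the example sends a single period, so sending several periods in one request is an assumption.

```python
def reporting_periods(fiscal_year, quarters):
    """One {"fiscal_year", "fiscal_quarter"} entry per requested quarter."""
    periods = []
    for quarter in quarters:
        if quarter not in (1, 2, 3, 4):
            raise ValueError(f"fiscal quarter must be 1-4, got {quarter}")
        periods.append({"fiscal_year": fiscal_year, "fiscal_quarter": quarter})
    return periods


def earnings_calls_query(entity_id, periods, max_chunks=10):
    """Earnings-call chunks reported by one company for the given periods."""
    return {
        "query": {
            "filters": {
                "document_type": {
                    "mode": "INCLUDE",
                    "values": [{"type": "TRANSCRIPT", "subtypes": ["EARNINGS_CALL"]}],
                },
                "reporting_entities": [entity_id],
                "reporting_periods": periods,
            },
            "max_chunks": max_chunks,
        }
    }


# Microsoft (228D42) earnings calls for the first half of fiscal 2024
body = earnings_calls_query("228D42", reporting_periods(2024, [1, 2]))
```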

FileTag

The FileTag filter is not yet supported in the API.

You can also add one or more tags to your query to filter by private documents that include those tags. Example:

from bigdata_client import Bigdata
from bigdata_client.query import Entity, FileTag

bigdata = Bigdata()

MICROSOFT = "228D42"

query = (
    Entity(MICROSOFT)
    & FileTag("tag_1", "tag_2")
)

search = bigdata.search.new(query)
documents = search.run(2)
print(documents)

Query operators (SDK related)

API requests can contain multiple filters; the SDK uses query operators to combine them. You can combine query filters with the & (AND), | (OR), and ~ (NOT) operators.

from bigdata_client import Bigdata
from bigdata_client.query import Entity, Keyword, Topic, Similarity

bigdata = Bigdata()

TESLA = "DD3BB1"
APPLE = "D8442A"
GOOGLE = "D8C3A1"

tech_companies = Entity(TESLA) | Entity(APPLE) | Entity(GOOGLE)
keywords = Similarity("executive appointment") | Keyword("CEO resignation")
topics = (
    Topic("business,labor-issues,executive-appointment,,")
    | Topic("business,labor-issues,executive-resignation,,")
    | Topic("business,labor-issues,executive-retirement,,")
)
query = tech_companies & (keywords | topics)

search = bigdata.search.new(query)

for result in search.limit_documents(2):
    print(result)

This is sufficient for most use cases, but sometimes the query is built from an external list of entities, keywords, topics, etc. For example, given a list of entity IDs, you could do:

from bigdata_client import Bigdata
from bigdata_client.query import Entity

bigdata = Bigdata()

entity_ids = read_entity_ids_from_file()  # Just for explanation purposes 
entities = [Entity(eid) for eid in entity_ids]
query = None
for entity in entities:
    if query is None:
        query = entity
    else:
        query = query | entity
search = bigdata.search.new(query)

documents = search.run(2)
print(documents)

This is a bit cumbersome, so the SDK provides two helper functions: All and Any. All combines a list of entities, keywords, topics, etc. with the AND operator, and Any combines them with the OR operator. With Any, the previous example can be rewritten as:

from bigdata_client import Bigdata
from bigdata_client.query import Entity, Any

bigdata = Bigdata()

entity_ids = read_entity_ids_from_file()  # Just for explanation purposes 
entities = [Entity(eid) for eid in entity_ids]
query = Any(entities)
search = bigdata.search.new(query)
documents = search.run(2)
print(documents)
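The operators above belong to the SDK. The REST examples on this page express multiple filters by placing them side by side in one filters object (see Source and Reporting details), which reads as all of them having to match. This page does not cover whether the REST API supports OR or NOT across different filters. Below is a minimal Python sketch that assembles such an object. `combine_filters` and `search_body` are hypothetical helpers, and the rule that each filter name appears only once is an assumption.

```python
def combine_filters(*parts):
    """Merge single-filter dicts into one "filters" object, rejecting repeated filter names."""
    combined = {}
    for part in parts:
        for name, value in part.items():
            if name in combined:
                raise ValueError(f"filter {name!r} given more than once")
            combined[name] = value
    return combined


def search_body(filters, max_chunks=10, text=None):
    """Wrap a filters object (and optional similarity text) in a search request body."""
    query = {"filters": filters, "max_chunks": max_chunks}
    if text is not None:
        query["text"] = text
    return {"query": query}


filters = combine_filters(
    {"entity": {"any_of": ["DD3BB1", "D8442A", "D8C3A1"]}},
    {"topic": {"any_of": ["business,labor-issues,executive-appointment,"]}},
)
body = search_body(filters, text="executive appointment")
```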

Document Version

Document Version is not yet supported in the API.

You can search by document version. Example:

from bigdata_client import Bigdata
from bigdata_client.query import DocumentVersion

bigdata = Bigdata()

VERSION = "RAW"

query = DocumentVersion(VERSION)
# Search for DocumentVersion
search = bigdata.search.new(query)

documents = search.run(2)
print(documents)

See class DocumentVersion for further details.

Watchlist

To retrieve insights about any of the entities in a Watchlist, add all of its entities to the query with the Any operator.

Watchlists are not yet supported in the API.

from bigdata_client import Bigdata
from bigdata_client.query import Any

bigdata = Bigdata()

MY_WATCHLIST_ID = "c2356958-48f6-4380-bb1f-c588656fb2c0"

watchlist = bigdata.watchlists.get(MY_WATCHLIST_ID)
companies = bigdata.knowledge_graph.get_entities(watchlist.items)

query = Any(companies)
search = bigdata.search.new(query)

documents = search.run(2)
for doc in documents:
    print(doc)

Check out the Watchlist management page for more information on how to create and manage Watchlists.
