Introduction
In this use case, we explore the effect of the cross encoder in action. In the first step, a similarity search rapidly scans hundreds of millions of documents to surface a list of candidate matches. This method gathers candidates efficiently, but not all of them are necessarily aligned with the query’s intent. We call these the baseline search results. We then run the same query with cross encoder re-ranking enabled, which filters out documents that do not meet the relevance threshold.
Step 0: Prerequisites
We need to import the Bigdata client library with the supporting modules.
Step 1: Initialization
First, we initialize the Bigdata class, which is used to interact with the Bigdata API. Authentication also happens here, by loading the BIGDATA_USERNAME and BIGDATA_PASSWORD environment variables. If they are not set, you can either set them or pass the username and password arguments to the Bigdata class.
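The credential lookup described in this step can be sketched in plain Python. This is a minimal illustration, not the client's actual implementation: the helper name resolve_credentials is hypothetical, and the import path bigdata_client in the trailing comment is an assumption about how the library is packaged.

```python
import os


def resolve_credentials(username=None, password=None, env=None):
    """Return (username, password), preferring explicit arguments over
    the BIGDATA_USERNAME / BIGDATA_PASSWORD environment variables.

    Hypothetical helper for illustration; not part of the Bigdata client.
    """
    env = os.environ if env is None else env
    username = username or env.get("BIGDATA_USERNAME")
    password = password or env.get("BIGDATA_PASSWORD")
    if not username or not password:
        raise ValueError(
            "Set BIGDATA_USERNAME and BIGDATA_PASSWORD, or pass "
            "username and password explicitly."
        )
    return username, password


# Usage (assumed import path, kept as a comment so the sketch runs
# without the client installed):
#   from bigdata_client import Bigdata
#   username, password = resolve_credentials()
#   bigdata = Bigdata(username=username, password=password)
```

Failing early with a clear message when neither source provides credentials tends to be easier to debug than an authentication error raised later by the API.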
Step 2: Define Helper Functions
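As a concrete illustration for this step, a print helper might look like the sketch below. The attribute names it reads (headline, timestamp, chunks, and each chunk's text) are assumptions about the shape of a search result, and format_results and print_results are hypothetical names. The sample document is a made-up stand-in, not real API output.

```python
from types import SimpleNamespace


def format_results(documents, max_chars=200):
    """Build a readable, numbered listing of documents and their chunks.

    Assumes each document exposes .headline, .timestamp and .chunks,
    and each chunk exposes .text (assumed attribute names).
    """
    lines = []
    for rank, doc in enumerate(documents, start=1):
        lines.append(f"{rank}. {doc.headline} ({doc.timestamp})")
        for chunk in doc.chunks:
            # Collapse runs of whitespace and newlines into single spaces.
            text = " ".join(chunk.text.split())
            if len(text) > max_chars:
                text = text[:max_chars].rstrip() + "..."
            lines.append(f"   - {text}")
    return "\n".join(lines)


def print_results(documents, max_chars=200):
    """Print the listing produced by format_results."""
    print(format_results(documents, max_chars))


# Made-up stand-in for a search result, used only to exercise the helper.
sample = [
    SimpleNamespace(
        headline="Deficit outlook",
        timestamp="2025-01-15",
        chunks=[SimpleNamespace(text="Tax cuts   could widen\nthe deficit.")],
    )
]
print_results(sample)
```

Separating formatting from printing keeps the helper easy to reuse, for example to write the same listing to a file when comparing two result sets.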
We define a helper function that prints the search results in a readable format.
Step 3: Define Search Query
As an example, we explore the potential impact of President Trump’s proposed tax cuts, referred to as the “Trump 2.0” tax cuts, on the federal deficit. We use a Similarity query to search for documents that are similar to the query string.
Step 4: Phase 1 - Baseline Search
We first run a search with the query string, without cross encoder re-ranking. This gives us a baseline to compare against once re-ranking is enabled.
Step 4: Phase 2 - Search with Re-Ranking
The rerank_threshold argument applies cross encoder re-ranking between the query and the initial search results. Documents whose re-ranking relevance score falls below this threshold are filtered out, which improves the relevance of the final search results.
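The two phases differ only in whether rerank_threshold is supplied, so both can go through one small wrapper. The sketch below is written against assumptions: that a search is created with client.search.new(query, ...) and executed with run(limit), and that rerank_threshold is passed when the search is created. The name run_search, the import paths in the comments, and the value 0.1 are illustrative, not taken from the original.

```python
def run_search(client, query, limit=10, rerank_threshold=None):
    """Run one search through an already-initialized client.

    With rerank_threshold=None this is the baseline search (Phase 1);
    with a value it requests cross encoder re-ranking (Phase 2).
    Assumes client.search.new(...) returns an object with run(limit).
    """
    kwargs = {}
    if rerank_threshold is not None:
        # Only send the argument when re-ranking is actually wanted.
        kwargs["rerank_threshold"] = rerank_threshold
    search = client.search.new(query, **kwargs)
    return search.run(limit)


# Usage (assumed import paths, kept as comments so the sketch runs
# without the client installed):
#   from bigdata_client import Bigdata
#   from bigdata_client.query import Similarity
#   bigdata = Bigdata()
#   query = Similarity("your query text")
#   baseline = run_search(bigdata, query)
#   reranked = run_search(bigdata, query, rerank_threshold=0.1)
```

Running both calls with the same query and limit makes the comparison fair: any difference between the two result lists then comes from re-ranking alone.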
Conclusion
The results are re-ranked to prioritize matches that are highly relevant to the query intent, so the most pertinent results appear at the top. For example, results related to adjacent topics (e.g., “tariffs”) are deprioritized in favor of those directly aligned with the search query. Any results falling below the rerank_threshold are filtered out, improving overall relevance.
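The effect of the threshold can be pictured with a toy example. This is a conceptual sketch only: the titles and scores are invented for illustration, real scores come from the cross encoder, and rerank_and_filter is a hypothetical name rather than part of the API.

```python
def rerank_and_filter(scored_results, rerank_threshold):
    """Drop results scoring below the threshold, then sort the rest
    from most to least relevant.

    scored_results is a list of (title, relevance_score) pairs.
    """
    kept = [
        (title, score)
        for title, score in scored_results
        if score >= rerank_threshold
    ]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Invented scores, for illustration only.
candidates = [
    ("Tariff schedule update", 0.05),
    ("Tax cuts and the federal deficit", 0.92),
    ("Deficit projections under new tax plan", 0.71),
]
top = rerank_and_filter(candidates, rerank_threshold=0.2)
for title, score in top:
    print(f"{score:.2f}  {title}")
```

Here the tariff item is dropped because its score is below the threshold, while the two items that address the deficit directly are kept and ordered by score.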
For more details, please refer to the official Bigdata.com API documentation.
Happy Searching! 🚀