ThematicScreener

Screen a universe of companies for exposure to a given theme using LLM-powered analysis.

Parameters

llm_model (str): LLM to use for text processing and analysis, given as "provider::model" (e.g., "openai::gpt-4o-mini").
main_theme (str): The main theme to screen for. Sub-themes are generated from it.
companies (List[Company]): Companies to analyze.
start_date (str): Start date for searching relevant documents (YYYY-MM-DD).
end_date (str): End date for searching relevant documents (YYYY-MM-DD).
document_type (DocumentType): Type of documents to search (NEWS, FILINGS, or TRANSCRIPTS).
fiscal_year (int, optional): Fiscal year to analyze.
sources (List[str], optional): Document sources to restrict search results to.
rerank_threshold (float, optional): Minimum reranker relevance score; search results scoring below it are discarded.
focus (str, optional): Additional focus used to guide sub-theme generation.
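Several of these parameters have a fixed shape: llm_model is a "provider::model" string, and the two dates are YYYY-MM-DD strings. The sketch below is a hypothetical pre-flight check you could run before constructing the screener. validate_screener_args is not part of bigdata_research_tools, and the [0, 1] range it enforces for rerank_threshold is an assumption.

```python
from datetime import datetime
from typing import Optional, Tuple


def validate_screener_args(
    llm_model: str,
    start_date: str,
    end_date: str,
    rerank_threshold: Optional[float] = None,
) -> Tuple[str, str]:
    """Hypothetical pre-flight check; returns (provider, model) on success."""
    # llm_model is expected to look like "provider::model".
    provider, sep, model = llm_model.partition("::")
    if not sep or not provider or not model:
        raise ValueError(f"llm_model must be 'provider::model', got {llm_model!r}")

    # strptime raises ValueError if a string does not parse as year-month-day.
    start = datetime.strptime(start_date, "%Y-%m-%d").date()
    end = datetime.strptime(end_date, "%Y-%m-%d").date()
    if start > end:
        raise ValueError("start_date must not be after end_date")

    # Assumption: reranker scores lie in [0, 1], so the threshold must too.
    if rerank_threshold is not None and not 0.0 <= rerank_threshold <= 1.0:
        raise ValueError("rerank_threshold must be between 0 and 1")

    return provider, model


# Values taken from the constructor example in this section.
print(validate_screener_args("openai::gpt-4o-mini", "2024-01-01", "2024-06-30", 0.7))
```

Failing early on a malformed model string or a reversed date range avoids spending search and LLM calls on a run that cannot succeed.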

Returns

ThematicScreener instance.

Example

from bigdata_research_tools.screener import ThematicScreener
from bigdata_client.models.entities import Company
from bigdata_client.models.search import DocumentType

companies = [Company(name="Apple Inc.", ticker="AAPL"), Company(name="Microsoft Corp.", ticker="MSFT")]

screener = ThematicScreener(
    llm_model="openai::gpt-4o-mini",
    main_theme="AI",
    companies=companies,
    start_date="2024-01-01",
    end_date="2024-06-30",
    document_type=DocumentType.NEWS,
    rerank_threshold=0.7
)

screen_companies

Screen companies for thematic exposure and generate labeled results, company/industry summaries, and motivations.

Parameters

document_limit (int, optional): Maximum number of documents retrieved per query (default: 10).
batch_size (int, optional): Number of entities per batch (default: 10).
frequency (str, optional): Frequency used to split the date range into windows: 'Y', 'M', 'W', or 'D', optionally prefixed with a multiplier (default: '3M', i.e. three-month windows).
word_range (Tuple[int, int], optional): Minimum and maximum word count for the generated motivations (default: (50, 100)).
export_path (str, optional): File path for exporting the results to an Excel workbook.
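To make frequency and batch_size concrete, here is a minimal sketch of how a date range and a company list could be cut into query units. It assumes frequency follows a "<n><unit>" convention (so '3M' means three-month windows), that windows are contiguous and non-overlapping, and that batches are consecutive slices of the company list. split_date_range and batched are hypothetical helpers, not the library's internals.

```python
import re
from datetime import date, timedelta
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def add_months(d: date, months: int) -> date:
    """Shift d forward by whole months, clamping the day to the target month's length."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    first_of_next = date(year + month // 12, month % 12 + 1, 1)
    last_day = (first_of_next - timedelta(days=1)).day
    return date(year, month, min(d.day, last_day))


def split_date_range(start_date: str, end_date: str, frequency: str = "3M") -> List[Tuple[str, str]]:
    """Hypothetical: split [start_date, end_date] into contiguous (start, end) windows."""
    match = re.fullmatch(r"(\d*)([YMWD])", frequency)
    if match is None:
        raise ValueError(f"unsupported frequency {frequency!r}")
    n, unit = int(match.group(1) or 1), match.group(2)
    if n < 1:
        raise ValueError("frequency multiplier must be at least 1")
    start, end = date.fromisoformat(start_date), date.fromisoformat(end_date)

    def boundary(i: int) -> date:
        # Start of the i-th window, always measured from the original start date
        # so that month-end clamping does not drift from window to window.
        if unit == "Y":
            return add_months(start, 12 * n * i)
        if unit == "M":
            return add_months(start, n * i)
        if unit == "W":
            return start + timedelta(weeks=n * i)
        return start + timedelta(days=n * i)

    windows = []
    i = 0
    while boundary(i) <= end:
        # Each window ends the day before the next one starts, capped at end_date.
        window_end = min(boundary(i + 1) - timedelta(days=1), end)
        windows.append((boundary(i).isoformat(), window_end.isoformat()))
        i += 1
    return windows


def batched(items: Sequence[T], batch_size: int = 10) -> List[Sequence[T]]:
    """Hypothetical: chunk items into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[k:k + batch_size] for k in range(0, len(items), batch_size)]


print(split_date_range("2024-01-01", "2024-06-30", "3M"))
print([len(b) for b in batched(list(range(25)), 10)])
```

Under these assumptions, the example dates with the default '3M' produce two windows (January to March and April to June), and 25 companies with batch_size=10 produce batches of 10, 10, and 5.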

Returns

A dict with the following keys:
- df_labeled: DataFrame with labeled search results.
- df_company: DataFrame with company-level output.
- df_industry: DataFrame with industry-level output.
- theme_tree: ThemeTree object used for screening.
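The returned theme_tree holds the sub-themes generated from main_theme. As a rough mental model only, a theme tree can be pictured as nested nodes whose leaves are the most specific sub-themes. ThemeNode below is a hypothetical stand-in, not the library's ThemeTree class, whose attributes and methods may differ; the sub-theme labels are illustrative, not real LLM output.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ThemeNode:
    """Hypothetical stand-in for a theme tree node; not the library's ThemeTree."""

    label: str
    children: List["ThemeNode"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        # A node without children is itself a leaf (a most-specific sub-theme).
        if not self.children:
            return [self.label]
        result: List[str] = []
        for child in self.children:
            result.extend(child.leaves())
        return result


tree = ThemeNode("AI", [
    ThemeNode("AI Hardware", [ThemeNode("GPUs"), ThemeNode("AI Accelerators")]),
    ThemeNode("Generative AI Software"),
])
print(tree.leaves())
```

Collecting the leaves gives the flat list of sub-themes that search results could be labeled against.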

Example

results = screener.screen_companies(
    document_limit=10,
    batch_size=10,
    frequency="3M",
    word_range=(50, 100),
    export_path="output/thematic_screening.xlsx"
)

df_labeled = results["df_labeled"]
df_company = results["df_company"]
df_industry = results["df_industry"]
theme_tree = results["theme_tree"]
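df_company summarizes labeled results at the company level. The sketch below illustrates one plausible roll-up: counting labeled rows per company and sub-theme, then ranking companies by total exposure. It assumes a hypothetical row schema with "company" and "label" keys and an "unassigned" label for rows unrelated to any sub-theme; the actual columns and scoring of df_labeled and df_company may differ.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Mapping


def company_exposure(
    rows: Iterable[Mapping[str, str]], unassigned: str = "unassigned"
) -> Dict[str, Dict[str, int]]:
    """Count labeled rows per company and sub-theme (hypothetical row schema)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        # Rows labeled as unrelated to any sub-theme are skipped.
        if row["label"] == unassigned:
            continue
        counts[row["company"]][row["label"]] += 1
    return {company: dict(counter) for company, counter in counts.items()}


def rank_companies(exposure: Dict[str, Dict[str, int]]) -> List[str]:
    """Order companies by total labeled rows, highest first."""
    return sorted(exposure, key=lambda c: sum(exposure[c].values()), reverse=True)


rows = [
    {"company": "AAPL", "label": "GPUs"},
    {"company": "AAPL", "label": "Generative AI Software"},
    {"company": "MSFT", "label": "Generative AI Software"},
    {"company": "MSFT", "label": "unassigned"},
    {"company": "AAPL", "label": "GPUs"},
]
exposure = company_exposure(rows)
print(exposure)
print(rank_companies(exposure))
```

Raw counts favor companies with heavy news coverage; normalizing by the number of documents retrieved per company is one alternative if that bias matters.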
