
Tutorial: Building a real-time financial news agent

This tutorial shows how agentic AI is reshaping finance: we build an autonomous, adaptive system that goes beyond a chatbot to deliver smarter decision-making, risk analysis, and personalized portfolio monitoring.
In this project, we're building a modular LLM-powered agent that keeps track of the latest news for any set of public companies in your portfolio. This agent acts like an automated research assistant: for each company, it continuously scans the web for fresh headlines, uses a language model to extract key details into structured, machine-readable JSON, and filters out duplicates or outdated stories. For every headline, the agent then analyzes sentiment, generates a succinct multi-sentence summary, and provides a direct explanation of why the story matters for the business or its industry. The end result is a well-organized, date-stamped news digest per company, clearly separated into positive and negative trends, with actionable summaries—ready for easy review or reporting.
Here’s the code for our agent:
import time
import json
import re
from datetime import datetime, timedelta
from openai import OpenAI
import weave; weave.init('finance_agents')


# --------- CONFIGURATION ----------
PORTFOLIO = [
    "Apple Inc",
    "Microsoft Corporation",
    "Tesla Inc",
    "Amazon.com Inc",
    "Alphabet Inc"
]
DAYS_BACK = 10
MAX_RESULTS_PER_COMPANY = 8

client = OpenAI()

def date_list(days_back):
    """Return the last `days_back` calendar dates as YYYY-MM-DD strings."""
    return [(datetime.now() - timedelta(days=i)).strftime("%Y-%m-%d") for i in range(days_back)]

LAST_N_DATES = set(date_list(DAYS_BACK))

def extract_json(response_text):
    """Coerce a model response into a list of story dicts, tolerating markdown fences."""
    if isinstance(response_text, list):
        return response_text
    if isinstance(response_text, dict):
        return [response_text]
    # Prefer a ```json fenced block, fall back to any fenced block, then raw text.
    match = re.search(r"```json\s*(.*?)\s*```", response_text, re.DOTALL)
    if not match:
        match = re.search(r"```(.*?)```", response_text, re.DOTALL)
    if match:
        json_str = match.group(1)
    else:
        json_str = response_text.strip()
    try:
        obj = json.loads(json_str)
        if isinstance(obj, dict):
            return [obj]
        if isinstance(obj, list):
            return obj
        print(f"Unrecognized result type after json.loads: {type(obj)}")
        return []
    except Exception as load_err:
        print("Error parsing JSON from extract_json:", load_err)
        print("Raw attempted string:\n", json_str[:500])
        return []

@weave.op
def fetch_news(company):
    prompt = (
        f"Find up to {MAX_RESULTS_PER_COMPANY} of the most important English news stories about '{company}' "
        f"from the last {DAYS_BACK} days. Strictly output as a JSON array in a markdown code block like this: "
        "[{\"title\": \"...\", \"url\": \"...\", \"date_published\": \"YYYY-MM-DD\", \"snippet\": \"...\"}] "
        "If there are none, return []."
    )
    try:
        response = client.responses.create(
            model="gpt-4.1",
            tools=[{"type": "web_search_preview"}],
            input=prompt,
        )
        output = response.output_text.strip()
        stories = extract_json(output)
        if not isinstance(stories, list):
            print(f"Unexpected format for {company}: {output}")
            return []
        return stories
    except Exception as e:
        print(f"Error fetching news for {company}: {e}")
        return []


@weave.op
def classify_and_summarize(title, snippet, date):
    """
    Asks the model for polarity, a rich summary, and a significance/impact statement, then returns all.
    """
    prompt = (
        "Given the following news story, respond ONLY with a JSON object (NOT markdown or code block): "
        "{"
        "\"sentiment\": \"positive\" or \"negative\", "
        "\"summary\": \"detailed, multi-sentence summary (about 10 sentences)\", "
        "\"implications\": \"Succinctly explain why this news matters for the company or industry (1-2 sentences)\", "
        "\"date\": \"YYYY-MM-DD\""
        "}. "
        "Do not output anything except valid JSON. "
        f"Title: {title}\nSnippet: {snippet}\nDate: {date}"
    )
    resp = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,
    )
    text = resp.choices[0].message.content.strip()
    try:
        result = json.loads(text)
        return (
            result.get("sentiment", "unknown"),
            result.get("summary", ""),
            result.get("implications", ""),
            result.get("date", date),
        )
    except Exception:
        # Fallback: salvage each field with a regex if the model returned invalid JSON.
        sent = re.search(r'"sentiment"\s*:\s*"(\w+)"', text)
        summ = re.search(r'"summary"\s*:\s*"([^"]+)"', text)
        impact = re.search(r'"implications"\s*:\s*"([^"]+)"', text)
        date_f = re.search(r'"date"\s*:\s*"([^"]+)"', text)
        return (
            sent.group(1) if sent else "unknown",
            summ.group(1) if summ else "",
            impact.group(1) if impact else "",
            date_f.group(1) if date_f else date,
        )

# --------- MAIN SCRIPT -----------
all_results = {}

for company in PORTFOLIO:
    print(f"\n--- Fetching news for {company} (last {DAYS_BACK} days) ---")
    stories = fetch_news(company)
    seen_urls = set()
    recent_stories = []
    for story in stories:
        url = story.get('url', '')
        date_pub = story.get('date_published', '')[:10]
        # Keep only deduplicated stories published inside the lookback window.
        if url and url not in seen_urls and date_pub in LAST_N_DATES:
            recent_stories.append(story)
            seen_urls.add(url)
        if len(recent_stories) >= MAX_RESULTS_PER_COMPANY:
            break
    all_results[company] = recent_stories
    time.sleep(2)

print("\n============================")
print(f"📰 NEWS PORTFOLIO DIGEST: Last {DAYS_BACK} days\n")
for company in PORTFOLIO:
    print(f"\n{'='*40}\n{company}\n{'='*40}")
    stories = all_results.get(company, [])
    if not stories:
        print("No recent news found.")
        continue
    # Bucket each company's stories by publication date.
    stories_by_date = {d: [] for d in LAST_N_DATES}
    for entry in stories:
        date_pub = entry.get('date_published', '')[:10]
        if date_pub in LAST_N_DATES:
            stories_by_date[date_pub].append(entry)
    for dt in sorted(LAST_N_DATES, reverse=True):
        dt_stories = stories_by_date.get(dt, [])
        if not dt_stories:
            continue
        print(f"\n--- {dt} ---")
        positive, negative = [], []
        for entry in dt_stories:
            title = entry.get('title', '')
            snippet = entry.get('snippet', '')
            url = entry.get('url', '')
            sentiment, summary, implications, date_clean = classify_and_summarize(title, snippet, dt)
            # RICH, DETAILED OUTPUT:
            display = (
                f"* [{title}]({url}) ({date_clean})\n"
                f" - Sentiment: {sentiment.upper()}\n"
                f" - Full snippet: {snippet}\n"
                f" - Detailed summary: {summary}\n"
                f" - Why it matters: {implications}"
            )
            if sentiment.lower() == "positive":
                positive.append(display)
            else:
                negative.append(display)
            time.sleep(1.2)
        if positive:
            print("\n🟢 POSITIVE NEWS:")
            print('\n\n'.join(positive))
        if negative:
            print("\n🔴 NEGATIVE NEWS:")
            print('\n\n'.join(negative))
print("\n--- END OF PORTFOLIO SUMMARY ---")
First, you define your portfolio by listing the companies you want to track. The script then loops over each company, asking an LLM (with web access) to find the most important news stories from the last several days. The model is prompted to return a structured JSON array for each story—containing the headline, link, snippet, and publication date. The output is filtered for duplicates and only includes articles from the time window you specify.
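For reference, a single story record in that array has the shape sketched below. The field names come from the fetch_news prompt; the values here are placeholders, not real headlines:
# One element of the JSON array fetch_news asks the model to produce.
# All values are illustrative placeholders.
example_story = {
    "title": "Example Corp announces new product line",
    "url": "https://example.com/news/example-corp",
    "date_published": "2025-06-05",
    "snippet": "Example Corp unveiled a new product line aimed at enterprise customers..."
}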
For every recent story it finds, the agent runs a second model call. This time, it asks for the sentiment of the story (positive or negative), a detailed multi-sentence summary, and a concise explanation of why this news might matter to investors or the company itself. This information is also returned in strict JSON. The script includes some simple logic to recover data if the model ever returns malformed responses, so the pipeline doesn’t break on odd output.
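If you want to sanity-check this step in isolation, you can call the function directly. A minimal sketch with made-up inputs (the actual output depends on the model):
# Illustrative standalone call; the title, snippet, and date are made up.
sentiment, summary, implications, date_clean = classify_and_summarize(
    "Example Corp beats quarterly revenue estimates",
    "Example Corp reported revenue ahead of analyst expectations on strong cloud demand.",
    "2025-06-05",
)
print(sentiment)  # "positive" or "negative" ("unknown" if parsing failed)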
All this is done for each company in your portfolio, with short pauses between calls to stay under API rate limits. Once the results are collected, the script prints everything neatly, sorted by company and date and grouped by sentiment. For each article, you'll see the headline, publication date, original snippet, the agent-generated summary, and the impact explanation. The whole pipeline is easily configurable: you can change the date range, the number of headlines, or swap out companies as needed, as shown below.
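For example, to widen the lookback window or track different companies, you only need to touch the configuration block at the top of the script (the values below are arbitrary examples):
# Example reconfiguration; values chosen purely for illustration.
PORTFOLIO = ["NVIDIA Corporation", "Meta Platforms Inc"]
DAYS_BACK = 30               # scan the last 30 days instead of 10
MAX_RESULTS_PER_COMPANY = 5  # fewer headlines per company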
Because the workflow is instrumented with W&B Weave (the weave.init call plus the @weave.op decorators), you get full visibility: every prompt, response, and summary is automatically logged and inspectable, making it easy to monitor, debug, or evolve your news agent over time. Here's a screenshot inside Weave of our agent running:


This simple script, which combines LLMs with web search, quickly evolved into a powerful tool for investors: an agent that delivers actionable intelligence in a clean, auditable format. The value isn't in any single part, but in how the pieces work together: lightweight tools, stitched into a pipeline that now feels like a real-time research analyst, always on, always filtering signal from noise.