FinSim v1

Commit 98138631 authored Apr 08, 2026 by Fares
44 changed files
--- a/investment_engine/.env
+++ b/investment_engine/.env
+GROQ_API_KEY=<redacted>
+SERPAPI_API_KEY=<redacted>
--- a/investment_engine/.python-version
+++ b/investment_engine/.python-version
+3.12
--- a/investment_engine/HOW_TO_RUN.txt
+++ b/investment_engine/HOW_TO_RUN.txt
+================================================================================
+                        FinSim — HOW TO RUN (Complete Guide)
+================================================================================
+
+Last Updated: March 31, 2026
+
+
+================================================================================
+  WHAT WAS INSTALLED ON YOUR MACHINE
+================================================================================
+
+  1. UV Package Manager (v0.11.2)
+     - Location: C:\Users\fmfmf\.local\bin\uv.exe
+     - UV is a modern Python package manager (replaces pip/venv/virtualenv)
+
+  2. Python 3.12.13 (installed via UV)
+     - Managed by UV, no need to install separately
+
+  3. Virtual Environment + 67 Python packages
+     - Location: investment_engine\.venv\
+     - All dependencies (FastAPI, Polars, yfinance, LangChain, Groq, etc.)
+
+  4. LangGraph was added as a missing dependency to pyproject.toml
+
+
+================================================================================
+  QUICK START — Run FinSim
+================================================================================
+
+  Open PowerShell (or Windows Terminal) and run:
+
+    cd C:\Users\fmfmf\Desktop\FinSim\FinSim\investment_engine
+    $env:Path = "C:\Users\fmfmf\.local\bin;$env:Path"
+    uv run python main.py
+
+  Then open your browser to:
+
+    http://localhost:8000
+
+  That's it! The UI has two panels:
+    - LEFT:  "Generate Historical Scenarios" — fetches stock data, runs AI, saves to DB
+    - RIGHT: "Interactive Advisor Bot" — chat with the AI investment advisor
+
+
+================================================================================
+  STOPPING THE SERVER
+================================================================================
+
+  Press Ctrl+C in the terminal where the server is running.
+
+
+================================================================================
+  DETAILED STEP-BY-STEP (First Time Setup)
+================================================================================
+
+  If you ever need to set this up on a fresh machine again:
+
+  Step 1: Install UV (one-time)
+  ────────────────────────────
+    Open PowerShell and run:
+      Set-ExecutionPolicy RemoteSigned -Scope CurrentUser -Force
+      irm https://astral.sh/uv/install.ps1 | iex
+
+  Step 2: Add UV to PATH (every new terminal session)
+  ────────────────────────────────────────────────────
+      $env:Path = "C:\Users\fmfmf\.local\bin;$env:Path"
+
+    TIP: To make this permanent, add C:\Users\fmfmf\.local\bin to your
+    Windows System PATH via:
+      Settings > System > About > Advanced system settings >
+      Environment Variables > Path > Edit > New
+
+  Step 3: Install Python (one-time)
+  ─────────────────────────────────
+      uv python install 3.12
+
+  Step 4: Navigate to project and sync dependencies
+  ──────────────────────────────────────────────────
+      cd C:\Users\fmfmf\Desktop\FinSim\FinSim\investment_engine
+      uv sync
+
+  Step 5: Run the application
+  ───────────────────────────
+      uv run python main.py
+
+
+================================================================================
+  DATABASE INFO (Already Connected & Working)
+================================================================================
+
+  Host:     scenariodb.caprover.al-arcade.com
+  Port:     3306
+  User:     root
+  Password: (set MYSQL_PASSWORD in .env; do not record it in documentation)
+  Database: mcq_app
+  Table:    scenarios (141 scenarios currently in DB)
+
+  Other tables in the DB: quiz_attempts, quiz_scenarios, quizzes,
+                          user_scenario_history, users
+
+  The database connection is configured in:
+    investment_engine\config.py  (default values)
+
+  Connection is tested automatically on app startup (init_db).
+
+
+================================================================================
+  API KEYS (Already Configured in .env)
+================================================================================
+
+  File: investment_engine\.env
+
+  Contains:
+    GROQ_API_KEY     — For the Llama-3.3-70b AI (scenario generation + chat)
+    SERPAPI_API_KEY   — For real-time web search in the chat bot
+
+  If keys expire, replace them in the .env file. No code changes needed.
+
+
+================================================================================
+  API ENDPOINTS
+================================================================================
+
+  GET  /          → Serves the Web UI (index.html)
+  POST /generate  → Generates scenarios (body: JSON with stock_symbol, etc.)
+  POST /chat      → Chat with the advisor bot (body: JSON with message)
+
+  Example curl for generate (Command Prompt syntax; ^ is the cmd.exe line continuation):
+    curl -X POST http://localhost:8000/generate ^
+      -H "Content-Type: application/json" ^
+      -d "{\"stock_symbol\":\"AAPL\",\"zscore_window\":100,\"zscore_trigger_min\":-2.5,\"zscore_trigger_max\":2.5}"
+
+  Example curl for chat (Command Prompt syntax; ^ is the cmd.exe line continuation):
+    curl -X POST http://localhost:8000/chat ^
+      -H "Content-Type: application/json" ^
+      -d "{\"session_id\":\"my_session\",\"message\":\"Give me a scenario\"}"
+
+
+================================================================================
+  PROJECT FILE STRUCTURE
+================================================================================
+
+  investment_engine\
+  ├── main.py                  Entry point (starts Uvicorn server on port 8000)
+  ├── app.py                   FastAPI app with /generate and /chat routes
+  ├── config.py                Environment config (DB creds, API keys, defaults)
+  ├── models.py                Pydantic data models (request/response contracts)
+  ├── .env                     Secret keys (GROQ_API_KEY, SERPAPI_API_KEY)
+  ├── pyproject.toml           Project definition + dependencies
+  ├── uv.lock                  Locked dependency versions
+  ├── services\
+  │   ├── zscore_engine.py     Polars Z-Score calculation + yfinance data fetch
+  │   ├── scenario_gen.py      Groq LLM prompt → structured MCQ generation
+  │   ├── database.py          MySQL connection pool, insert, random fetch
+  │   └── chat_agent.py        LangGraph ReAct agent (advisor bot)
+  └── static\
+      └── index.html           Frontend UI (dark theme, two-panel layout)
+
+
+================================================================================
+  TROUBLESHOOTING
+================================================================================
+
+  Problem: "uv is not recognized"
+  Solution: Run this first in your terminal:
+    $env:Path = "C:\Users\fmfmf\.local\bin;$env:Path"
+
+  Problem: "Python was not found"
+  Solution: Don't run "python" directly. Always use "uv run python ..."
+    UV manages its own Python installation.
+
+  Problem: "GROQ_API_KEY is not set"
+  Solution: Make sure the .env file exists in the investment_engine folder
+    and contains your key: GROQ_API_KEY=gsk_...
+
+  Problem: "Could not initialize DB on startup"
+  Solution: Check your internet connection. The MySQL database is remote
+    (hosted on CapRover). If the host is down, the app will still start
+    but DB features won't work.
+
+  Problem: Port 8000 already in use
+  Solution: Either stop the other process using port 8000, or edit main.py
+    and change the port number in: uvicorn.run("app:app", host="0.0.0.0", port=8000, reload=True)
+
+  Problem: "No events found for any requested symbols"
+  Solution: yfinance may have rate-limited you. Wait a minute and try again,
+    or try a different stock symbol.
+
+  Problem: Generation takes too long
+  Solution: The scenario generation pipeline does 3 things sequentially:
+    1. Downloads 5 years of stock data (yfinance) — ~5 seconds
+    2. Calculates Z-Scores with Polars — instant
+    3. Calls Groq LLM to generate scenarios — ~10-20 seconds
+    Total: ~15-30 seconds is normal.
+
+
+================================================================================
+  USEFUL COMMANDS
+================================================================================
+
+  Start the server:
+    uv run python main.py
+
+  Add a new dependency:
+    uv add <package-name>
+
+  Update all dependencies:
+    uv sync --upgrade
+
+  Run a one-off Python script:
+    uv run python <script.py>
+
+  Check installed packages:
+    uv pip list
+
+
+================================================================================
--- a/investment_engine/LangGraph_Explanation.txt
+++ b/investment_engine/LangGraph_Explanation.txt
--- a/investment_engine/PROJECT_OVERVIEW.txt
+++ b/investment_engine/PROJECT_OVERVIEW.txt
+================================================================================
+                           FinSim
+          AI-Powered Investment Simulation & Education Platform
+================================================================================
+
+
+WHAT IS FINSIM?
+───────────────────────────────────────────────────────────────────────
+
+FinSim is a full-stack web application that teaches people how to make
+smarter investment decisions — using real market data, artificial
+intelligence, and interactive simulations.
+
+Think of it as a personal investment training ground: you can chat with
+an AI advisor, test your knowledge with scenario-based quizzes, predict
+stock movements with a multi-agent AI engine, and even ask "what if I
+had invested in Tesla 5 years ago?" and get the real math.
+
+The platform is built for students, aspiring investors, and anyone who
+wants to understand how financial markets work — without risking real
+money.
+
+
+CORE FEATURES
+───────────────────────────────────────────────────────────────────────
+
+1. USER AUTHENTICATION & DASHBOARD
+   - Secure login/registration system
+   - Personal dashboard showing quiz scores, accuracy stats, and
+     recent performance
+   - Tracks your learning progress over time
+
+2. AI INVESTMENT ADVISOR (Chat)
+   - A conversational AI chatbot powered by Groq's LLaMA 3.3 (70B)
+   - Has access to live market data (real stock prices via yfinance)
+     and web search (SerpAPI) for current news
+   - Can answer questions about stocks, bonds, ETFs, portfolio
+     strategy, and financial concepts
+   - Can generate practice MCQ scenarios on demand
+
+3. SCENARIO GENERATOR (Z-Score Pipeline)
+   - Fetches 5 years of real historical stock prices from Yahoo Finance
+   - Uses statistical analysis (Z-Scores via Polars) to detect
+     significant market events and anomalies
+   - Feeds those events into an AI model that generates realistic
+     investment scenario questions with 4 multiple-choice answers
+   - Each scenario includes a best answer with rationale and 3 wrong
+     answers with explanations for why they're wrong
+   - All scenarios are stored in a MySQL database
+
+4. PRACTICE MCQ QUIZ SYSTEM
+   - Timed quizzes with real-world investment scenarios
+   - Configurable: choose number of questions and difficulty
+   - Instant feedback after each answer with detailed explanations
+   - Scores are tracked and contribute to your dashboard stats
+
+5. LEADERBOARD
+   - Competitive ranking across all users
+   - Shows total score, quizzes taken, and average performance
+   - Encourages engagement through friendly competition
+
+6. AKINATOR 2.0 — Multi-Agent Investment Prediction Engine
+   This is the flagship feature. A sophisticated AI system built with
+   LangGraph (a graph-based AI orchestration framework) that runs
+   9 interconnected processing nodes:
+
+   - ROUTER: Classifies user queries and determines the processing path
+   - WHAT-IF ENGINE: Analyzes hypothetical past investments using real
+     historical data ("What if I invested $10K in Apple 3 years ago?")
+   - ANALYST HUB: A ReAct agent that runs 5 expert tools in parallel:
+       * Market Data Analyst (live prices, PE ratios, returns)
+       * News & Sentiment Analyst (current headlines and market mood)
+       * Risk Assessment Analyst (volatility, Sharpe ratio, drawdown)
+       * Portfolio Strategy Advisor (allocation recommendations)
+       * NewsAPI Headlines (dedicated news fetching)
+   - NEWS SENTIMENT SCORER: Calculates a sentiment score (0-100) from
+     real headlines using keyword-based analysis
+   - CONFIDENCE SCORER: Rates the prediction's reliability (0-95%)
+     based on data completeness and source alignment
+   - SELF-CORRECTION (CRITIQUE): Reviews the prediction against the
+     news sentiment — if the AI says "buy" but the news is bearish,
+     it flags the contradiction with a correction note
+   - JIT EDUCATION: Scans the response for financial jargon (P/E Ratio,
+     Volatility, Sharpe Ratio, etc.) and provides plain-English
+     definitions so beginners can learn as they read
+   - INVESTMENT MEMO: Compiles the entire analysis into a professional
+     summary document
+   - FORMAT RESPONSE: Assembles everything into the final output
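The NEWS SENTIMENT SCORER and SELF-CORRECTION nodes described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the keyword lists, the 40/60 cut-offs, and the function names `sentiment_score` and `critique` are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical keyword lists; the real scorer's vocabulary is not shown here.
BULLISH = {"surge", "beat", "record", "upgrade", "growth", "rally"}
BEARISH = {"plunge", "miss", "lawsuit", "downgrade", "recall", "selloff"}


def sentiment_score(headlines: list[str]) -> int:
    """Map headlines to a 0-100 score; 50 means neutral or no signal."""
    bullish = bearish = 0
    for headline in headlines:
        # Normalise each word: strip surrounding punctuation, lowercase.
        words = {w.strip(".,!?:;\"'").lower() for w in headline.split()}
        bullish += len(words & BULLISH)
        bearish += len(words & BEARISH)
    total = bullish + bearish
    if total == 0:
        return 50
    return round(100 * bullish / total)


def critique(recommendation: str, score: int) -> Optional[str]:
    """Return a correction note when the call and the news mood disagree."""
    rec = recommendation.lower()
    if rec == "buy" and score < 40:      # assumed bearish cut-off
        return f"Recommendation is 'buy' but news sentiment is bearish ({score}/100)."
    if rec == "sell" and score > 60:     # assumed bullish cut-off
        return f"Recommendation is 'sell' but news sentiment is bullish ({score}/100)."
    return None
```

A keyword count like this is cheap and transparent, which suits an educational tool, but it cannot read negation or sarcasm, so a flagged contradiction is a prompt for review rather than a verdict.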
+
+   Additionally, Akinator 2.0 features a "Panel Discussion Mode" where
+   10 distinct investor personas (risk manager, quant, aggressive trader,
+   value investor, macro economist, technical analyst, institutional
+   banker, crypto enthusiast, behavioral psychologist, ESG advocate)
+   debate the investment using real data before making a consensus
+   recommendation.
+
+
+TECHNOLOGY STACK
+───────────────────────────────────────────────────────────────────────
+
+Backend:
+  - Python 3.12
+  - FastAPI (web framework and REST API)
+  - LangGraph (graph-based AI workflow orchestration)
+  - LangChain (LLM integration and tool calling)
+  - Groq API with LLaMA 3.3 70B Versatile (large language model)
+  - yfinance (real-time and historical stock market data)
+  - SerpAPI (live web search and news)
+  - Polars (high-performance data processing, written in Rust)
+  - MySQL (remote database for scenario storage)
+  - bcrypt (password hashing)
+
+Frontend:
+  - Vanilla HTML, CSS, JavaScript (no frameworks)
+  - marked.js (markdown rendering in chat)
+  - Responsive dark-mode UI with glassmorphism design
+  - Mobile-friendly with sidebar navigation
+
+Infrastructure:
+  - UV package manager (reproducible Python environments)
+  - CapRover (remote MySQL hosting)
+
+
+HOW IT ALL CONNECTS
+───────────────────────────────────────────────────────────────────────
+
+                    ┌─────────────────────┐
+                    │    User Browser     │
+                    │  (HTML/CSS/JS UI)   │
+                    └────────┬────────────┘
+                             │ HTTP
+                             ▼
+                    ┌─────────────────────┐
+                    │   FastAPI Server    │
+                    │      (app.py)       │
+                    └────────┬────────────┘
+                             │
+          ┌──────────────────┼──────────────────┐
+          │                  │                  │
+          ▼                  ▼                  ▼
+   ┌─────────────┐  ┌──────────────┐  ┌──────────────┐
+   │  AI Advisor  │  │  Akinator    │  │  Scenario    │
+   │ (chat_agent) │  │  2.0 Graph   │  │  Generator   │
+   │  ReAct Agent │  │  (9 nodes)   │  │  (Z-Scores)  │
+   └──────┬──────┘  └──────┬───────┘  └──────┬───────┘
+          │                │                  │
+          ▼                ▼                  ▼
+   ┌─────────────────────────────────────────────────┐
+   │              External Services                   │
+   │  Groq LLM  |  yfinance  |  SerpAPI  |  MySQL   │
+   └─────────────────────────────────────────────────┘
+
+
+WHAT MAKES THIS PROJECT SPECIAL
+───────────────────────────────────────────────────────────────────────
+
+1. Real Data, Not Simulations
+   Unlike most educational tools that use fake numbers, FinSim pulls
+   live stock prices, real news headlines, and actual historical data.
+   Every prediction and scenario is grounded in reality.
+
+2. Multi-Agent AI Architecture
+   The Akinator 2.0 doesn't just call one AI model — it orchestrates
+   multiple specialized "agents" (market analyst, risk assessor, news
+   scanner, strategy advisor) that each gather different data, then
+   synthesizes their findings into a unified recommendation.
+
+3. Self-Correcting AI
+   The system reviews its own predictions against current news
+   sentiment. If there's a contradiction, it flags it automatically.
+   This teaches users that even AI predictions need critical thinking.
+
+4. Learning While Using
+   The JIT Education system detects financial jargon in AI responses
+   and explains terms in plain English. Users learn new financial
+   vocabulary naturally as they interact with the system.
+
+5. Graph-Based AI Orchestration
+   Built with LangGraph, the Akinator uses a directed graph where
+   data flows through 9 processing nodes with conditional branching.
+   Directed-graph orchestration of this kind is a common pattern in
+   production agent systems.
+
+6. Full-Stack, Production-Quality
+   User authentication, database persistence, responsive mobile UI,
+   leaderboards, error handling, rate limit management — this isn't
+   a prototype, it's a complete application.
+
+
+================================================================================
+  Built with Python, FastAPI, LangGraph, LangChain, Groq AI, and love.
+================================================================================
--- a/investment_engine/README.md
+++ b/investment_engine/README.md
--- a/investment_engine/__pycache__/app.cpython-312.pyc
+++ b/investment_engine/__pycache__/app.cpython-312.pyc
--- a/investment_engine/__pycache__/config.cpython-312.pyc
+++ b/investment_engine/__pycache__/config.cpython-312.pyc
--- a/investment_engine/__pycache__/models.cpython-312.pyc
+++ b/investment_engine/__pycache__/models.cpython-312.pyc
--- a/investment_engine/app.py
+++ b/investment_engine/app.py
--- a/investment_engine/chroma_data/3fdeae09-15b0-4d9a-bacd-de89598b5415/data_level0.bin
+++ b/investment_engine/chroma_data/3fdeae09-15b0-4d9a-bacd-de89598b5415/data_level0.bin
--- a/investment_engine/chroma_data/3fdeae09-15b0-4d9a-bacd-de89598b5415/header.bin
+++ b/investment_engine/chroma_data/3fdeae09-15b0-4d9a-bacd-de89598b5415/header.bin
--- a/investment_engine/chroma_data/3fdeae09-15b0-4d9a-bacd-de89598b5415/length.bin
+++ b/investment_engine/chroma_data/3fdeae09-15b0-4d9a-bacd-de89598b5415/length.bin
--- a/investment_engine/chroma_data/3fdeae09-15b0-4d9a-bacd-de89598b5415/link_lists.bin
+++ b/investment_engine/chroma_data/3fdeae09-15b0-4d9a-bacd-de89598b5415/link_lists.bin
--- a/investment_engine/chroma_data/7dbd1834-e235-47a7-bcf0-7f3bbc160e76/data_level0.bin
+++ b/investment_engine/chroma_data/7dbd1834-e235-47a7-bcf0-7f3bbc160e76/data_level0.bin
--- a/investment_engine/chroma_data/7dbd1834-e235-47a7-bcf0-7f3bbc160e76/header.bin
+++ b/investment_engine/chroma_data/7dbd1834-e235-47a7-bcf0-7f3bbc160e76/header.bin
--- a/investment_engine/chroma_data/7dbd1834-e235-47a7-bcf0-7f3bbc160e76/length.bin
+++ b/investment_engine/chroma_data/7dbd1834-e235-47a7-bcf0-7f3bbc160e76/length.bin
--- a/investment_engine/chroma_data/7dbd1834-e235-47a7-bcf0-7f3bbc160e76/link_lists.bin
+++ b/investment_engine/chroma_data/7dbd1834-e235-47a7-bcf0-7f3bbc160e76/link_lists.bin
--- a/investment_engine/chroma_data/b22fd19f-7bb9-4630-97ca-009a721acc18/data_level0.bin
+++ b/investment_engine/chroma_data/b22fd19f-7bb9-4630-97ca-009a721acc18/data_level0.bin
--- a/investment_engine/chroma_data/b22fd19f-7bb9-4630-97ca-009a721acc18/header.bin
+++ b/investment_engine/chroma_data/b22fd19f-7bb9-4630-97ca-009a721acc18/header.bin
--- a/investment_engine/chroma_data/b22fd19f-7bb9-4630-97ca-009a721acc18/length.bin
+++ b/investment_engine/chroma_data/b22fd19f-7bb9-4630-97ca-009a721acc18/length.bin
--- a/investment_engine/chroma_data/b22fd19f-7bb9-4630-97ca-009a721acc18/link_lists.bin
+++ b/investment_engine/chroma_data/b22fd19f-7bb9-4630-97ca-009a721acc18/link_lists.bin
--- a/investment_engine/chroma_data/chroma.sqlite3
+++ b/investment_engine/chroma_data/chroma.sqlite3
--- a/investment_engine/config.py
+++ b/investment_engine/config.py
+"""
+Configuration — central environment variables definitions using pydantic-settings
+"""
+
+from pydantic_settings import BaseSettings
+from functools import lru_cache
+
+
+class Settings(BaseSettings):
+    # ── Groq API ────────────────────────────────────────────
+    GROQ_API_KEY: str = ""
+
+    # Model used for scenario generation and chat
+    GROQ_MODEL: str = "llama-3.3-70b-versatile"
+    
+    # ── SerpAPI ─────────────────────────────────────────────
+    SERPAPI_API_KEY: str = ""
+
+    # ── MySQL (Al-Arcade Remote DB) ───────────────────────
+    MYSQL_HOST: str = "scenariodb.caprover.al-arcade.com"
+    MYSQL_PORT: int = 3306
+    MYSQL_USER: str = "root"
+    MYSQL_PASSWORD: str = ""  # set via MYSQL_PASSWORD in .env; never commit credentials
+    MYSQL_DATABASE: str = "mcq_app"
+
+    # ── Defaults for Z-Score ──────────────────────────────
+    DEFAULT_ZSCORE_WINDOW: int = 100
+    DEFAULT_ZSCORE_TRIGGER_MIN: float = -2.5
+    DEFAULT_ZSCORE_TRIGGER_MAX: float = 2.5
+
+    class Config:
+        env_file = ".env"
+        env_file_encoding = "utf-8"
+
+
+@lru_cache()
+def get_settings() -> Settings:
+    return Settings()
--- a/investment_engine/debug_scenario.py
+++ b/investment_engine/debug_scenario.py
+
+import os
+import sys
+from datetime import datetime
+
+from langchain_groq import ChatGroq
+from langchain_core.messages import HumanMessage
+from langgraph.prebuilt import create_react_agent
+
+# Make the project root importable so `config` and `services` resolve
+# when the script is run from the investment_engine directory.
+sys.path.append(os.getcwd())
+
+from config import get_settings
+from services.chat_agent import SerpApi_Search, mcq_scenarios
+
+def test_agent_scenario():
+    settings = get_settings()
+    llm = ChatGroq(
+        api_key=settings.GROQ_API_KEY,
+        model_name=settings.GROQ_MODEL,
+        temperature=0.7,
+    )
+    
+    today = datetime.now().strftime("%Y-%m-%d")
+    system_prompt_str = f"""Role: Expert Investment Advisor AI for markets, strategy, and portfolio education.
+
+1. INVESTMENT ADVICE & REAL-TIME DATA
+Tool: Use SerpApi_Search for all news, current prices (e.g., Gold in Egypt), and market data.
+Date: Today is {today}.
+
+2. MCQ GENERATION (Practice)
+**SCENARIO PRESENTATION (UI/UX):**
+When a scenario is retrieved, present it with high readability using this exact structure:
+---
+### 📊 Investment Case Study
+> [Insert a concise, professional paragraph describing the situation.]
+
+**Key Market Data:**
+* 💵 **Initial Capital:** [Value]
+* 📈 **Asset Class:** [Type]
+* ⏱️ **Time Horizon:** [Duration]
+* ⚠️ **Risk Level:** [Rating]
+
+**Select the Best Course of Action:**
+* **A)** [Option A text]
+* **B)** [Option B text]
+* **C)** [Option C text]
+* **D)** [Option D text]
+---
+*Instruction: Wait for the user's letter (A-D) before providing the rationale.*
+
+3. MCQ SCENARIO QUESTIONS (Database)
+Tool: Use mcq_scenarios."""
+    
+    tools = [SerpApi_Search, mcq_scenarios]
+    agent_executor = create_react_agent(llm, tools, prompt=system_prompt_str)
+    
+    user_message = "provide a scenario"
+    print(f"User: {user_message}")
+    
+    response = agent_executor.invoke({
+        "messages": [HumanMessage(content=user_message)]
+    })
+    
+    messages = response["messages"]
+    for i, msg in enumerate(messages):
+        print(f"\n--- Message {i} ({type(msg).__name__}) ---")
+        try:
+            print(f"Content: {msg.content}")
+        except UnicodeEncodeError:
+            print(f"Content (UTF-8 bytes): {msg.content.encode('utf-8')}")
+            
+        if hasattr(msg, 'tool_calls'):
+            print(f"Tool Calls: {msg.tool_calls}")
+
+    last_msg = messages[-1]
+    try:
+        print(f"\nFinal Answer: '{last_msg.content}'")
+    except UnicodeEncodeError:
+        print(f"\nFinal Answer (UTF-8 bytes): '{last_msg.content.encode('utf-8')}'")
+
+if __name__ == "__main__":
+    test_agent_scenario()
--- a/investment_engine/main.py
+++ b/investment_engine/main.py
+import uvicorn
+from app import app
+from config import get_settings
+
+def start():
+    """Start the FastAPI application."""
+    print("Starting FinSim...")
+    settings = get_settings()
+    if not settings.GROQ_API_KEY:
+        print("WARNING: GROQ_API_KEY is not set in the .env file.")
+        
+    uvicorn.run("app:app", host="0.0.0.0", port=8000, reload=True)
+
+if __name__ == "__main__":
+    start()
--- a/investment_engine/models.py
+++ b/investment_engine/models.py
+"""
+Pydantic models — the contract between every layer of the app.
+"""
+
+from __future__ import annotations
+from pydantic import BaseModel, Field
+from typing import Optional, List
+
+
+# ═══════════════════════════════════════════════════════════════
+# REQUEST MODELS
+# ═══════════════════════════════════════════════════════════════
+
+class GenerateRequest(BaseModel):
+    stock_symbol: str = Field("", examples=["AAPL"])
+    zscore_window: int = Field(default=100, ge=5)
+    zscore_trigger_min: float = Field(default=-2.5)
+    zscore_trigger_max: float = Field(default=2.5)
+    knowledge_base_id: Optional[str] = None
+
+
+class ChatRequest(BaseModel):
+    session_id: str = Field(default="default")
+    message: str
+    knowledge_base_id: Optional[str] = None
+
+
+# ═══════════════════════════════════════════════════════════════
+# Z-SCORE MODELS
+# ═══════════════════════════════════════════════════════════════
+
+class ZScoreEvent(BaseModel):
+    date: str
+    price: float
+    z_score: float
+    event_type: str          # "major" | "normal"
+    context: str
+    direction: str           # "decline" | "rally"
+
+
+class ZScoreResult(BaseModel):
+    events: List[ZScoreEvent]
+    total_events: int
+    window_size: int
+    data_points: int
+
+
+# ═══════════════════════════════════════════════════════════════
+# SCENARIO MODELS
+# ═══════════════════════════════════════════════════════════════
+
+class AnswerOption(BaseModel):
+    answer: str
+    explanation: str
+
+
+class BestAnswer(BaseModel):
+    answer: str
+    rationale: str
+
+
+class GivensTable(BaseModel):
+    date: Optional[str] = None
+    stock_symbol: Optional[str] = None
+    price: Optional[float] = None
+    z_score: Optional[float] = None
+    event_type: Optional[str] = None
+    market_conditions: Optional[str] = None
+    context: Optional[str] = None
+
+    class Config:
+        extra = "allow"          # AI may add extra fields
+
+
+class Scenario(BaseModel):
+    id: str
+    title: str
+    short_description: str = Field(alias="shortDescription", default="")
+    givens_table: GivensTable = Field(alias="givensTable")
+    scenario_paragraph: str = Field(alias="scenarioParagraph", default="")
+    best_answer: BestAnswer = Field(alias="bestAnswer", default_factory=lambda: BestAnswer(answer="Unknown", rationale="Unknown"))
+    
+    # Default to exactly 3 placeholder options when the LLM omits otherAnswers,
+    # so DB code that indexes the three wrong answers does not raise IndexError
+    other_answers: list[AnswerOption] = Field(
+        alias="otherAnswers", 
+        default_factory=lambda: [
+            AnswerOption(answer="Unknown A", explanation="TBD"),
+            AnswerOption(answer="Unknown B", explanation="TBD"),
+            AnswerOption(answer="Unknown C", explanation="TBD")
+        ]
+    )
+    
+    event_type: Optional[str] = None
+    risk_level: str = Field(alias="riskLevel", description="Low, Medium, or High", default="Medium")
+
+    class Config:
+        populate_by_name = True
+
+
+class ScenarioGenerationResult(BaseModel):
+    scenarios: list[Scenario]
+    total_possible_scenarios: int = Field(alias="totalPossibleScenarios")
+
+    class Config:
+        populate_by_name = True
+
+
+# ═══════════════════════════════════════════════════════════════
+# CHAT MODELS
+# ═══════════════════════════════════════════════════════════════
+
+class ChatResponse(BaseModel):
+    session_id: str
+    reply: str
--- a/investment_engine/project_description.txt
+++ b/investment_engine/project_description.txt
+# FinSim Investment Engine - Comprehensive Documentation & Setup Guide
+
+## 1. Complete Step-by-Step Setup & Execution Guide
+
+**Prerequisites:**
+- **Python 3.12** or higher installed on your Windows system.
+- The **`uv`** package manager (highly recommended as the project uses a `uv.lock` file).
+
+### Step 1: Open Terminal and Navigate to the Project Directory
+Open PowerShell or Command Prompt and navigate to the project folder:
+```powershell
+cd C:\Users\Fares\OneDrive\Desktop\FinSim\investment_engine
+```
+
+### Step 2: Install Dependencies
+The project ships a `uv.lock` file, so use `uv` to build the environment exactly as locked.
+
+If you don't have `uv` installed globally in Python, install it first:
+```powershell
+pip install uv
+```
+
+Now, sync the dependencies. This command automatically creates a `.venv` virtual environment in the folder and strictly installs everything in `uv.lock` (like FastAPI, LangChain, Polars):
+```powershell
+uv sync
+```
+*(If you are avoiding `uv` for any reason, you can manually use standard pip instead: `python -m venv .venv`, then `.\.venv\Scripts\activate`, then `pip install -e .`)*
+
+### Step 3: Configure Environment Variables
+The application needs secure API keys to talk to the AI and Search platforms.
+1. Make sure you are in `C:\Users\Fares\OneDrive\Desktop\FinSim\investment_engine`.
+2. Create a new text file named exactly `.env` (with a dot at the start).
+3. Open `.env` in Notepad or VSCode and paste the following, replacing the placeholders with your actual keys:
+
+```env
+GROQ_API_KEY="your_groq_api_key_here"
+SERPAPI_API_KEY="your_serpapi_api_key_here"
+```
+*(Note: Database credentials for the remote CapRover MySQL instance have default values in `config.py`, so the app runs without DB keys here. Hardcoded credentials are a security risk; to override them, add the matching `MYSQL_*` variables to `.env`.)*
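To make the override behaviour concrete, here is a small stdlib-only sketch of the lookup order that `pydantic-settings` documents: process environment first, then the `.env` file, then the class default. It is not the library's implementation, and `parse_dotenv` and `resolve` are hypothetical helper names used only for illustration.

```python
def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, as in GROQ_API_KEY="..."
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve(name: str, defaults: dict, dotenv: dict, environ: dict) -> str:
    """Return the first value found: environment, then .env, then default."""
    for source in (environ, dotenv, defaults):
        if name in source:
            return source[name]
    raise KeyError(name)
```

Under this order, a `MYSQL_PASSWORD` line in `.env` replaces the default in `config.py`, and a variable exported in the shell replaces both.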
+
+### Step 4: Run the Application
+Start the FastAPI server. Because we used `uv`, we can use `uv run` to automatically use the virtual environment without needing to activate it manually.
+
+```powershell
+uv run python main.py
+```
+
+*Output should look like this:*
+```text
+Starting FinSim...
+INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
+INFO:     Started reloader process [...]
+INFO:     Started server process [...]
+INFO:     Waiting for application startup.
+INFO:     Application startup complete.
+```
+
+### Step 5: Access the Web Interface
+1. Open your web browser (Chrome, Edge, etc.).
+2. Go to: [http://localhost:8000](http://localhost:8000)
+3. You will see the FinSim UI dashboard! You can generate historical scenarios on the left panel, and start chatting with the interactive AI on the right.
+
+---
+
+## 2. Granular File Descriptions
+
+### Core Application Layer
+- **`app.py`**
+  - **Purpose:** The central nervous system of the FastAPI app.
+  - **Details:** Mounts the `static/` folder to serve the UI on `/`. It defines the two main POST endpoints: `/generate` (which sequentially calls the Z-score engine, AI scenario generator, and Database insert functions) and `/chat` (which talks to the interactive agent). It also initializes the remote Database table on startup if it isn't there.
+- **`main.py`**
+  - **Purpose:** The immediate execution point.
+  - **Details:** Calls `uvicorn.run("app:app", host="0.0.0.0", port=8000, reload=True)`. It checks `config.py` upfront to warn you in the terminal if you forgot to set your `GROQ_API_KEY`.
+- **`config.py`**
+  - **Purpose:** Environment and configuration management.
+  - **Details:** Uses `pydantic-settings`. Automatically loads the `.env` file. Defines all default values such as the Groq model name (`llama-3.3-70b-versatile`), remote MySQL server/credentials for CapRover, and the default math triggers for the Z-score logic (like a 100-day window).
+- **`models.py`**
+  - **Purpose:** The strict data types (Pydantic).
+  - **Details:** Enforces rigid shapes for all data flowing through the app. It holds models for HTTP requests (`GenerateRequest`), internal Z-Score calculations (`ZScoreEvent`), and highly nested JSON structures that the LLM is forced to output (`Scenario`, `ScenarioGenerationResult`).
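
The startup check described for `main.py` can be sketched roughly as follows. This is a minimal illustration, not the project's code: it reads a plain mapping (or `os.environ`) with the standard library instead of going through `config.py`, and the names `REQUIRED_KEYS` and `missing_keys` are hypothetical. Treating `SERPAPI_API_KEY` as required, and treating values that start with `your_` as unset placeholders, are both assumptions based on the `.env` template above.

```python
import os

# Assumption: both keys from the .env template are treated as required.
REQUIRED_KEYS = ("GROQ_API_KEY", "SERPAPI_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required keys that are unset, blank, or still placeholders."""
    env = os.environ if env is None else env
    missing = []
    for key in required:
        # Tolerate surrounding whitespace and the double quotes used in the template.
        value = (env.get(key) or "").strip().strip('"')
        if not value or value.startswith("your_"):
            missing.append(key)
    return missing


if __name__ == "__main__":
    for key in missing_keys():
        print(f"WARNING: {key} is not set; add it to your .env file.")
```

Passing a dictionary instead of relying on `os.environ` keeps the check easy to exercise without touching the real environment.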
+
+### Business Logic (`services/` Directory)
+- **`services/zscore_engine.py`**
+  - **Purpose:** The high-speed quantitative volatility analyzer.
+  - **Details:** Connects to Yahoo Finance (`yfinance`) to pull 5 years of daily stock prices. Uses `polars` (a fast DataFrame library written in Rust) to calculate rolling means, standard deviations, and the resulting Z-scores. Keeps only the data points whose Z-score reaches or exceeds the trigger thresholds. It then matches each event's year and month against a hardcoded `KNOWN_EVENTS` table (e.g. the 2008 Lehman Brothers collapse) to attach real historical context to the data points before returning them.
+- **`services/scenario_gen.py`**
+  - **Purpose:** Connects to Groq AI to generate MCQs.
+  - **Details:** Takes the mathematical events found by `zscore_engine.py` and feeds them to the `llama-3.3-70b-versatile` model via LangChain. A massive system prompt forces the LLM to output pure JSON mapping exactly to the components required by the `Scenario` Pydantic model (Title, paragraph narrative, a best answer with rationale, and 3 decoy answers).
+- **`services/database.py`**
+  - **Purpose:** MySQL persistence layer.
+  - **Details:** Sets up connection pooling to the `scenariodb.caprover.al-arcade.com` server. Includes SQL statements for `init_db()` (table creation) and `insert_scenario()` to log AI-generated MCQs robustly. Exports `get_random_scenario()` specifically for the chatbot to grab quiz questions.
+- **`services/chat_agent.py`**
+  - **Purpose:** The interactive LangChain ReAct (Reasoning and Acting) bot.
+  - **Details:** Creates a conversational agent loop. It gives the AI tools: `@tool SerpApi_Search` for live web lookups (prices/news), and `@tool mcq_scenarios` to fetch DB questions. Maintains temporary session history in a dictionary `_sessions`, ensuring the bot remembers the last 20 messages per user. Complex extraction logic is included to pull the final response string from LangChain's diverse message structures.
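
To make the rolling Z-score step from `services/zscore_engine.py` concrete, here is a small sketch of the same arithmetic. It deliberately uses the standard library's `statistics` module instead of Polars, so it illustrates the calculation rather than the project's implementation. The function names `rolling_zscores` and `significant` are hypothetical; the defaults (100-day window, thresholds of -2.5 and 2.5) mirror the defaults of `identify_events`. It assumes `window >= 2` and uses the sample standard deviation.

```python
from statistics import mean, stdev


def rolling_zscores(closes, window=100):
    """Return (index, z) pairs for every point that has a full trailing window."""
    out = []
    for i in range(window - 1, len(closes)):
        # Trailing window that ends at, and includes, the current close.
        win = closes[i - window + 1 : i + 1]
        sd = stdev(win)  # sample standard deviation
        if sd == 0:
            continue  # flat window: the Z-score is undefined
        out.append((i, (closes[i] - mean(win)) / sd))
    return out


def significant(zscores, trigger_min=-2.5, trigger_max=2.5):
    """Keep only the points at or beyond either trigger threshold."""
    return [(i, z) for i, z in zscores if z <= trigger_min or z >= trigger_max]


if __name__ == "__main__":
    prices = [100.0, 101.0] * 4 + [100.0, 120.0]
    print(significant(rolling_zscores(prices, window=10)))
```

In the sample run, only the final jump to 120.0 is far enough from its 10-day mean to count as an event.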
+
+### Frontend (`static/` Directory)
+- **`static/index.html`**
+  - **Purpose:** The user-facing dashboard.
+  - **Details:** A clean, zero-dependency HTML file styled completely with CSS Variables (dark theme). It contains a form matching `models.GenerateRequest` on the left that fires JavaScript `fetch('/generate')` requests. On the right, it implements a scrollable chat UI that tracks session variables and POSTs arrays of strings to `fetch('/chat')`.
+
+### Dev Tools & Meta Files
+- **`pyproject.toml`**
+  - **Purpose:** Python application package definitions.
+  - **Details:** Specifies that the project requires Python >= 3.12 and declares the packages it needs (fastapi, langchain, yfinance, etc.).
+- **`uv.lock`**
+  - **Purpose:** The reproducible dependencies file.
+  - **Details:** Auto-generated by `uv`, it locks the exact hashes and versions of every library tree so developers sharing the project experience zero environment issues.
+- **`.python-version`**
+  - **Purpose:** A tiny text file (just says `3.12`) telling version managers like `pyenv` or `uv` to use Python 3.12 by default here.
+- **`debug_scenario.py`**
+  - **Purpose:** Terminal debugging.
+  - **Details:** A manual script to test the LangChain chat agent loop in isolation inside the terminal, skipping the FastAPI and HTML layer entirely. Great for diagnosing AI tool-calling prompt issues.
+- **`test_extraction_mock.py`**
+  - **Purpose:** Unit testing for parsing LangChain AI formats.
+  - **Details:** LangChain AI messages can come back as plain strings, lists of dicts, or nested objects. This file mocks such responses and runs them through the parsing algorithm copied from `chat_agent.py` to assert that it extracts plain text in every case without crashing.
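
The message shapes listed for `test_extraction_mock.py` (plain strings, lists of dicts, nested objects) suggest a recursive flattening approach. The sketch below is one possible way to write such a helper and is not the algorithm from `chat_agent.py`, which is not shown here. The function name `extract_text` is hypothetical, and the assumption that text lives under a `"text"` or `"content"` key (or a `.content` attribute) is mine.

```python
def extract_text(content):
    """Flatten string, list, dict, or object-with-.content shapes into plain text."""
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    if isinstance(content, dict):
        # Assumed convention: the text lives under "text" or "content".
        for key in ("text", "content"):
            if key in content:
                return extract_text(content[key])
        return ""
    if isinstance(content, (list, tuple)):
        # Concatenate the text of every part, skipping empty ones.
        parts = [extract_text(item) for item in content]
        return "".join(part for part in parts if part)
    # Fall back to a .content attribute, as message objects typically expose.
    inner = getattr(content, "content", None)
    if inner is not None:
        return extract_text(inner)
    return str(content)


if __name__ == "__main__":
    print(extract_text([{"type": "text", "text": "Hello, "}, {"type": "text", "text": "world"}]))
```

Because every branch returns a string, callers never need to special-case the shape they received.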
--- a/investment_engine/pyproject.toml
+++ b/investment_engine/pyproject.toml
+[project]
+name = "investment-engine"
+version = "0.1.0"
+description = "Add your description here"
+readme = "README.md"
+requires-python = ">=3.12"
+dependencies = [
+    "bcrypt>=5.0.0",
+    "beautifulsoup4>=4.14.3",
+    "chromadb>=1.5.5",
+    "fastapi>=0.135.1",
+    "langchain>=0.3.0",
+    "langchain-chroma>=1.1.0",
+    "langchain-community>=0.4.1",
+    "langchain-core>=0.3.0",
+    "langchain-groq>=0.2.0",
+    "langchain-huggingface>=1.2.1",
+    "langgraph>=1.1.1",
+    "mysql-connector-python>=8.0.0",
+    "numpy>=1.24",
+    "polars>=1.38.1",
+    "pyarrow>=23.0.1",
+    "pydantic>=2.12.5",
+    "pydantic-settings>=2.13.1",
+    "pypdf>=6.9.2",
+    "python-dotenv>=1.2.2",
+    "python-multipart>=0.0.22",
+    "sentence-transformers>=5.3.0",
+    "uvicorn>=0.41.0",
+    "yfinance>=1.2.0",
+]
--- a/investment_engine/services/__pycache__/akinator.cpython-312.pyc
+++ b/investment_engine/services/__pycache__/akinator.cpython-312.pyc
--- a/investment_engine/services/__pycache__/chat_agent.cpython-312.pyc
+++ b/investment_engine/services/__pycache__/chat_agent.cpython-312.pyc
--- a/investment_engine/services/__pycache__/database.cpython-312.pyc
+++ b/investment_engine/services/__pycache__/database.cpython-312.pyc
--- a/investment_engine/services/__pycache__/rag_engine.cpython-312.pyc
+++ b/investment_engine/services/__pycache__/rag_engine.cpython-312.pyc
--- a/investment_engine/services/__pycache__/scenario_gen.cpython-312.pyc
+++ b/investment_engine/services/__pycache__/scenario_gen.cpython-312.pyc
--- a/investment_engine/services/__pycache__/zscore_engine.cpython-312.pyc
+++ b/investment_engine/services/__pycache__/zscore_engine.cpython-312.pyc
--- a/investment_engine/services/akinator.py
+++ b/investment_engine/services/akinator.py
--- a/investment_engine/services/chat_agent.py
+++ b/investment_engine/services/chat_agent.py
--- a/investment_engine/services/database.py
+++ b/investment_engine/services/database.py
+"""
+Database layer — MySQL operations connecting directly to CapRover instance.
+"""
+
+import json
+import mysql.connector
+from mysql.connector import pooling
+from contextlib import contextmanager
+
+from config import get_settings
+from models import Scenario, ScenarioGenerationResult
+
+
+_pool: pooling.MySQLConnectionPool | None = None
+
+
+def _get_pool() -> pooling.MySQLConnectionPool:
+    global _pool
+    if _pool is None:
+        s = get_settings()
+        _pool = pooling.MySQLConnectionPool(
+            pool_name="scenario_pool",
+            pool_size=5,
+            host=s.MYSQL_HOST,
+            port=s.MYSQL_PORT,
+            user=s.MYSQL_USER,
+            password=s.MYSQL_PASSWORD,
+            database=s.MYSQL_DATABASE,
+        )
+    return _pool
+
+
+@contextmanager
+def get_connection():
+    conn = _get_pool().get_connection()
+    try:
+        yield conn
+        conn.commit()
+    except Exception:
+        conn.rollback()
+        raise
+    finally:
+        conn.close()
+
+
+def init_db():
+    """Create the scenarios table if it doesn't exist."""
+    ddl = """
+    CREATE TABLE IF NOT EXISTS scenarios (
+        id                  VARCHAR(20) PRIMARY KEY,
+        title               TEXT NOT NULL,
+        short_description   TEXT,
+        givens_table        JSON,
+        scenario_paragraph  TEXT,
+        best_answer         TEXT,
+        best_answer_rationale TEXT,
+        other_option1       TEXT,
+        other_option1_exp   TEXT,
+        other_option2       TEXT,
+        other_option2_exp   TEXT,
+        other_option3       TEXT,
+        other_option3_exp   TEXT,
+        event_type          VARCHAR(20) DEFAULT 'normal',
+        difficulty          VARCHAR(20) DEFAULT 'medium',
+        category            VARCHAR(100) DEFAULT 'General',
+        risk_level          VARCHAR(20) DEFAULT 'Medium',
+        created_at          TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
+    """
+    with get_connection() as conn:
+        cursor = conn.cursor()
+        cursor.execute(ddl)
+
+
+def insert_scenario(s: Scenario):
+    """Insert one scenario — parameterized, avoiding SQL injection."""
+    sql = """
+        INSERT INTO scenarios (
+            id, title, short_description, givens_table,
+            scenario_paragraph, best_answer, best_answer_rationale,
+            other_option1, other_option1_exp,
+            other_option2, other_option2_exp,
+            other_option3, other_option3_exp,
+            event_type, difficulty, category, risk_level
+        ) VALUES (
+            %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s
+        )
+        ON DUPLICATE KEY UPDATE
+            title = VALUES(title),
+            short_description = VALUES(short_description),
+            givens_table = VALUES(givens_table),
+            scenario_paragraph = VALUES(scenario_paragraph),
+            risk_level = VALUES(risk_level)
+    """
+
+    # Pad with empty options if the LLM returns fewer than 3
+    others = s.other_answers + [
+        type("Obj", (), {"answer": "", "explanation": ""})()
+    ] * 3
+
+    ctx = (s.givens_table.context or "").lower() if s.givens_table else ""
+    category = "Financial Crisis" if any(
+        kw in ctx for kw in ["crisis", "crash", "covid", "pandemic", "collapse"]
+    ) else "General"
+
+    params = (
+        s.id,
+        s.title,
+        s.short_description,
+        json.dumps(s.givens_table.model_dump() if s.givens_table else {}),
+        s.scenario_paragraph,
+        s.best_answer.answer,
+        s.best_answer.rationale,
+        others[0].answer,
+        others[0].explanation,
+        others[1].answer,
+        others[1].explanation,
+        others[2].answer,
+        others[2].explanation,
+        s.event_type or "normal",
+        "medium",
+        category,
+        s.risk_level
+    )
+
+    with get_connection() as conn:
+        cursor = conn.cursor()
+        cursor.execute(sql, params)
+
+
+def insert_all_scenarios(result: ScenarioGenerationResult) -> int:
+    """Insert every scenario into the CapRover database, one row at a time. Returns count inserted."""
+    count = 0
+    for s in result.scenarios:
+        insert_scenario(s)
+        count += 1
+    return count
+
+
+def get_random_scenario() -> dict | None:
+    """Pull one random scenario for the Interactive Advisor Bot."""
+    sql = """
+        SELECT id, title, short_description, givens_table,
+               scenario_paragraph, best_answer, best_answer_rationale,
+               other_option1, other_option1_exp,
+               other_option2, other_option2_exp,
+               other_option3, other_option3_exp,
+               event_type, difficulty, category, risk_level
+        FROM scenarios
+        ORDER BY RAND()
+        LIMIT 1
+    """
+    with get_connection() as conn:
+        cursor = conn.cursor(dictionary=True)
+        cursor.execute(sql)
+        row = cursor.fetchone()
+    return row
--- a/investment_engine/services/rag_engine.py
+++ b/investment_engine/services/rag_engine.py
+"""
+RAG Engine — Handles document parsing, chunking, and ChromaDB vector storage.
+"""
+
+import os
+import shutil
+import uuid
+import tempfile
+from pathlib import Path
+
+from langchain_huggingface import HuggingFaceEmbeddings
+from langchain_chroma import Chroma
+from langchain_community.document_loaders import PyPDFLoader, TextLoader, WebBaseLoader
+from langchain_text_splitters import RecursiveCharacterTextSplitter
+
+# We store the ChromaDB locally
+DB_DIR = os.path.join(os.path.dirname(os.path.dirname(__file__)), "chroma_data")
+os.makedirs(DB_DIR, exist_ok=True)
+
+_embeddings = None
+
+def get_embeddings():
+    global _embeddings
+    if not _embeddings:
+        _embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
+    return _embeddings
+
+def get_kb_collection(kb_id: str) -> Chroma:
+    """Gets the Chroma vectorstore for a specific knowledge base (collection)."""
+    return Chroma(
+        collection_name=kb_id,
+        embedding_function=get_embeddings(),
+        persist_directory=DB_DIR
+    )
+
+def list_knowledge_bases():
+    """Lists all available knowledge bases by inspecting the Chroma directory or client."""
+    # Since Chroma 0.4+, we interact with the persistent client directly to list collections
+    import chromadb
+    client = chromadb.PersistentClient(path=DB_DIR)
+    collections = client.list_collections()
+    
+    # Each collection is a kb. Return list of dicts.
+    return [{"id": c.name, "name": c.name} for c in collections]
+
+def ingest_document(kb_id: str, file_path_or_url: str, doc_type: str):
+    """
+    Ingests a document or URL into the specified knowledge base.
+    """
+    if doc_type == "pdf":
+        loader = PyPDFLoader(file_path_or_url)
+        docs = loader.load()
+    elif doc_type == "txt":
+        loader = TextLoader(file_path_or_url, encoding="utf-8")
+        docs = loader.load()
+    elif doc_type == "json":
+        import json
+        from langchain_core.documents import Document
+        with open(file_path_or_url, "r", encoding="utf-8") as f:
+            try:
+                data = json.load(f)
+                text_content = json.dumps(data, indent=2)
+            except json.JSONDecodeError:
+                f.seek(0)
+                text_content = f.read()
+        docs = [Document(page_content=text_content, metadata={"source": file_path_or_url})]
+    elif doc_type == "docx":
+        from langchain_community.document_loaders import Docx2txtLoader
+        loader = Docx2txtLoader(file_path_or_url)
+        docs = loader.load()
+    elif doc_type == "url":
+        loader = WebBaseLoader(file_path_or_url)
+        docs = loader.load()
+    else:
+        raise ValueError(f"Unsupported document type: {doc_type}")
+    
+    # Chunking
+    text_splitter = RecursiveCharacterTextSplitter(
+        chunk_size=1000,
+        chunk_overlap=200,
+        separators=["\n\n", "\n", " ", ""]
+    )
+    splits = text_splitter.split_documents(docs)
+    
+    # Store in ChromaDB
+    vectorstore = get_kb_collection(kb_id)
+    vectorstore.add_documents(documents=splits)
+    
+    return len(splits)
+
+def search_kb(kb_id: str, query: str, top_k: int = 3) -> str:
+    """
+    Searches the knowledge base and returns a formatted string of the top context chunks.
+    """
+    try:
+        vectorstore = get_kb_collection(kb_id)
+        docs = vectorstore.similarity_search(query, k=top_k)
+        if not docs:
+            return "No relevant information found in the knowledge base."
+        
+        results = []
+        for i, doc in enumerate(docs):
+            source = doc.metadata.get("source", "Unknown Source")
+            page = doc.metadata.get("page")
+            # PyPDFLoader pages are 0-indexed, so test for presence rather than truthiness
+            page_info = f" (Page {page})" if page not in (None, "") else ""
+            results.append(f"--- Doc {i+1} | Source: {source}{page_info} ---\n{doc.page_content}")
+            
+        return "\n\n".join(results)
+    except Exception as e:
+        print(f"Error searching KB {kb_id}: {e}")
+        return f"Error retrieving from Knowledge Base: {e}"
--- a/investment_engine/services/scenario_gen.py
+++ b/investment_engine/services/scenario_gen.py
+"""
+Scenario Generator — Talks to Groq using LangChain and strict Pydantic parsing.
+"""
+
+import json
+import uuid
+from typing import Optional
+from langchain_groq import ChatGroq
+from config import get_settings
+from models import ZScoreResult, ScenarioGenerationResult
+
+SYSTEM_PROMPT = """You are a Financial Scenario Generator AI.
+Your task is to create high-quality, pedagogical financial scenarios using historical market data and Z-score analysis.
+
+GENERATE EXACTLY 5 DIVERSE SCENARIOS from the provided events. 
+
+You MUST return ONLY valid JSON matching this exact structure:
+{
+  "totalPossibleScenarios": 5,
+  "scenarios": [
+    {
+      "id": "10-character string (e.g. SCEN-00A3X)",
+      "title": "Clear, descriptive title",
+      "shortDescription": "1-2 sentence overview",
+      "givensTable": {
+        "date": "...",
+        "stockSymbol": "...",
+        "price": 0.0,
+        "zScore": 0.0,
+        "marketConditions": "...",
+        "eventType": "...",
+        "context": "..."
+      },
+      "scenarioParagraph": "Detailed narrative describing the market situation... Clearly mention whether this is a MAJOR CRISIS EVENT or NORMAL MARKET VOLATILITY.",
+      "bestAnswer": {
+        "answer": "...",
+        "rationale": "..."
+      },
+      "otherAnswers": [
+        { "answer": "...", "explanation": "..." },
+        { "answer": "...", "explanation": "..." },
+        { "answer": "...", "explanation": "..." }
+      ],
+      "riskLevel": "Low | Medium | High"
+    }
+  ]
+}
+
+Event-Type Classification Rules:
+- "major" → context contains crisis keywords (COVID, Crash, Financial Crisis, pandemic, collapse, 9/11) OR |Z-score| >= 3.0
+- "normal" → everything else
+
+Risk Classification Rules:
+- "High" → Major market crashes, high volatility during crises.
+- "Medium" → Notable daily volatility or uncertainty.
+- "Low" → Slight corrections or rallies in otherwise stable periods.
+
+Return ONLY valid JSON. No markdown fences, no commentary, no additional text outside the JSON object.
+"""
+
+def _build_user_prompt(
+    stock_symbol: str,
+    zscore_result: ZScoreResult,
+) -> str:
+    events_data = [e.model_dump() for e in zscore_result.events]
+    return (
+        f"Stock Symbol: {stock_symbol}\n"
+        f"Total Events Available: {zscore_result.total_events}\n"
+        f"Window Size: {zscore_result.window_size}\n"
+        f"Data Points: {zscore_result.data_points}\n\n"
+        f"Events Data:\n{json.dumps(events_data, indent=2)}"
+    )
+
+def generate_scenarios(stock_symbol: str, zscore_result: ZScoreResult, knowledge_base_id: Optional[str] = None) -> ScenarioGenerationResult:
+    """
+    Call Groq API → parse structured JSON → return validated Pydantic model.
+    """
+    settings = get_settings()
+    
+    # Initialize Groq client
+    llm = ChatGroq(
+        api_key=settings.GROQ_API_KEY,
+        model_name=settings.GROQ_MODEL,
+        temperature=0.7,
+        max_tokens=8000
+    )
+
+    user_prompt = _build_user_prompt(stock_symbol, zscore_result)
+    
+    system_message = SYSTEM_PROMPT
+    if knowledge_base_id:
+        from services.rag_engine import search_kb
+        style_context = search_kb(knowledge_base_id, "financial investment scenario examples style guide writing", top_k=2)
+        system_message += f"\n\n--- RAG STYLE EXAMPLES & CONTEXT ---\nPlease mimic the tone, structure, or terminology from this retrieved knowledge base if applicable:\n{style_context}\n------------------------------------"
+    
+    # Send message to Groq
+    messages = [
+        ("system", system_message),
+        ("human", user_prompt),
+    ]
+    
+    response = llm.invoke(messages)
+    raw_text = response.content.strip()
+
+    # Parse & Validate
+    try:
+        data = json.loads(raw_text)
+    except json.JSONDecodeError as e:
+        # Fallback if markdown fences sneaked in
+        import re
+        match = re.search(r"```(?:json)?\s*(.*?)\s*```", raw_text, re.DOTALL)
+        if match:
+            data = json.loads(match.group(1))
+        else:
+            raise ValueError(f"Groq returned invalid JSON: {e}\n\n{raw_text[:500]}")
+
+    result = ScenarioGenerationResult.model_validate(data)
+
+    # Enrich event_type and context safely
+    for s in result.scenarios:
+        # Enforce globally unique IDs to prevent database collisions
+        s.id = f"SC-{uuid.uuid4().hex[:10].upper()}"
+        
+        if s.givens_table:
+            if s.givens_table.event_type:
+                s.event_type = s.givens_table.event_type
+            else:
+                s.givens_table.event_type = "normal"
+                s.event_type = "normal"
+                
+            if not s.givens_table.context:
+                s.givens_table.context = "General market activity"
+        else:
+            s.event_type = "normal"
+
+    return result
--- a/investment_engine/services/zscore_engine.py
+++ b/investment_engine/services/zscore_engine.py
+"""
+Z-Score Engine — Polars Implementation.
+Replaces the Pandas engine with high-performance lazy execution.
+"""
+
+import yfinance as yf
+import polars as pl
+from datetime import datetime
+from models import ZScoreEvent, ZScoreResult
+
+# Known major historical events for enriched context
+KNOWN_EVENTS = {
+    (2020, 2): "COVID-19 Market Crash — Global pandemic triggers historic sell-off",
+    (2020, 3): "COVID-19 Market Crash — Peak pandemic fear and lockdowns",
+    (2008, 8): "2008 Financial Crisis — Lehman Brothers collapse begins",
+    (2008, 9): "2008 Financial Crisis — Full-blown credit market freeze",
+    (2008, 10): "2008 Financial Crisis — Global contagion and bank bailouts",
+    (2008, 11): "2008 Financial Crisis — Continued deleveraging",
+    (2001, 8): "Dot-com Bubble Aftermath — Tech sector implosion",
+    (2001, 9): "9/11 Attacks — Markets shut down then crash on reopening",
+    (2022, 5): "2022 Bear Market — Fed rate hikes crush growth stocks",
+    (2022, 6): "2022 Bear Market — Inflation fears peak",
+    (2020, 10): "Pre-Election Volatility — Uncertainty ahead of US elections",
+    (2011, 7): "US Debt Ceiling Crisis — S&P downgrades US credit rating",
+    (2011, 8): "US Debt Ceiling Crisis — Market turmoil continues",
+    (2018, 12): "Fed Tightening Scare — December 2018 sell-off",
+    (2015, 8): "China Devaluation Shock — Yuan devaluation rattles global markets",
+}
+
+def _classify_event(z: float, dt: datetime) -> tuple[str, str]:
+    """Classify an event as major/normal and generate context."""
+    year = dt.year
+    month = dt.month
+    abs_z = abs(z)
+
+    known = KNOWN_EVENTS.get((year, month))
+
+    if known or abs_z >= 3.0:
+        event_type = "major"
+        if known:
+            context = known
+        elif z < 0:
+            context = "Significant market decline — potential crisis event"
+        else:
+            context = "Significant market rally — unusual positive movement"
+    else:
+        event_type = "normal"
+        if z < 0:
+            context = "Notable market decline — day-to-day volatility"
+        else:
+            context = "Notable market increase — day-to-day volatility"
+
+    return event_type, context
+
+def identify_events(symbol: str, window: int = 100, trigger_min: float = -2.5, trigger_max: float = 2.5) -> ZScoreResult:
+    """
+    Downloads data with yfinance and calculates rolling Z-scores using Polars.
+    Accepts empty symbol to aggregate a basket of top stocks.
+    Returns filtered ZScoreResult.
+    """
+    import random
+    
+    symbols_to_fetch = [symbol] if symbol else ["SPY", "QQQ", "AAPL", "MSFT", "TSLA", "AMZN", "NVDA", "META"]
+    if not symbol:
+        # Pick 3 random stocks + SPY to avoid massive token blowout
+        symbols_to_fetch = ["SPY"] + random.sample([s for s in symbols_to_fetch if s != "SPY"], 3)
+    
+    all_events = []
+    total_data_points = 0
+    
+    for current_symbol in symbols_to_fetch:
+        try:
+            # 1. Fetch data
+            data = yf.download(current_symbol, period="5y", progress=False)
+            if data.empty:
+                continue
+                
+            # 2. Reset index to get Date as a column, handle multi-layer columns from yf
+            df_pd = data.reset_index()
+            
+            # Flatten columns if yfinance returns a MultiIndex (tuple column names)
+            
+            new_cols = []
+            for col in df_pd.columns:
+                if isinstance(col, tuple):
+                    # Filter out empty strings from the tuple and join
+                    parts = [str(c) for c in col if c]
+                    new_cols.append('_'.join(parts).strip('_'))
+                else:
+                    new_cols.append(str(col))
+            df_pd.columns = new_cols
+            
+            # Ensure we have Date and Close
+            date_col = next((c for c in df_pd.columns if 'date' in c.lower()), None)
+            close_col = next((c for c in df_pd.columns if 'close' in c.lower() and ('_' not in c or current_symbol.lower() in c.lower() or 'close' == c.lower())), None)
+            
+            if not close_col: # Fallback to just the first column that has 'close'
+                close_col = next((c for c in df_pd.columns if 'close' in c.lower()), None)
+            
+            if not date_col or not close_col:
+                print(f"Skipping {current_symbol}: Could not cleanly identify Date/Close columns.")
+                continue
+                
+            df_pd = df_pd[[date_col, close_col]].rename(columns={date_col: "Date", close_col: "Close"})
+    
+            # Drop NaN
+            df_pd = df_pd.dropna()
+            if len(df_pd) < window:
+                 continue
+    
+            # 3. Process with Polars
+            df = pl.from_pandas(df_pd)
+            
+            # Ensure correct types
+            df = df.with_columns([
+                pl.col("Date").cast(pl.Datetime)
+            ])
+            
+            # 4. Calculate Z-scores with lazy execution
+            q = (
+                df.lazy()
+                .with_columns([
+                    pl.col("Close").rolling_mean(window_size=window).alias("Mean"),
+                    pl.col("Close").rolling_std(window_size=window).alias("Std")
+                ])
+                .with_columns([
+                    ((pl.col("Close") - pl.col("Mean")) / pl.col("Std")).alias("Z_Score")
+                ])
+                .drop_nulls() 
+            )
+            
+            # Force computation
+            processed_df = q.collect()
+            total_data_points += len(processed_df)
+            
+            # Filter significant events
+            events_df = processed_df.filter(
+                (pl.col("Z_Score") <= trigger_min) | (pl.col("Z_Score") >= trigger_max)
+            )
+            
+            # Convert to Pydantic models
+            for row in events_df.to_dicts():
+                z = round(float(row["Z_Score"]), 3)
+                dt = row["Date"]
+                event_type, context = _classify_event(z, dt)
+                
+                # Append stock symbol to context for mixed baskets
+                if not symbol:
+                    context = f"[{current_symbol}] " + context
+                    
+                all_events.append(ZScoreEvent(
+                    date=dt.strftime("%Y-%m-%d"),
+                    price=round(float(row["Close"]), 2),
+                    z_score=z,
+                    event_type=event_type,
+                    context=context,
+                    direction="decline" if z < 0 else "rally"
+                ))
+                
+        except Exception as e:
+            print(f"Warning: Z-Score calculation failed for {current_symbol}: {str(e)}")
+            continue
+
+    if not all_events:
+        raise RuntimeError("No events found for any requested symbols.")
+        
+    # Sort events by absolute Z-score (largest first) and keep the top 150 to keep LLM context size healthy
+    all_events = sorted(all_events, key=lambda x: abs(x.z_score), reverse=True)[:150]
+
+    return ZScoreResult(
+        events=all_events,
+        total_events=len(all_events),
+        window_size=window,
+        data_points=total_data_points
+    )
--- a/investment_engine/static/index.html
+++ b/investment_engine/static/index.html
--- a/investment_engine/test_extraction_mock.py
+++ b/investment_engine/test_extraction_mock.py
--- a/investment_engine/uv.lock
+++ b/investment_engine/uv.lock