Commit 98138631 authored by Fares

FinSim v1

GROQ_API_KEY=your_groq_api_key_here
SERPAPI_API_KEY=your_serpapi_api_key_here
================================================================================
FinSim — HOW TO RUN (Complete Guide)
================================================================================
Last Updated: March 31, 2026
================================================================================
WHAT WAS INSTALLED ON YOUR MACHINE
================================================================================
1. UV Package Manager (v0.11.2)
- Location: C:\Users\fmfmf\.local\bin\uv.exe
- UV is a modern Python package manager (replaces pip/venv/virtualenv)
2. Python 3.12.13 (installed via UV)
- Managed by UV, no need to install separately
3. Virtual Environment + 67 Python packages
- Location: investment_engine\.venv\
- All dependencies (FastAPI, Polars, yfinance, LangChain, Groq, etc.)
4. LangGraph was added as a missing dependency to pyproject.toml
================================================================================
QUICK START — Run FinSim
================================================================================
Open PowerShell (or Windows Terminal) and run:
cd C:\Users\fmfmf\Desktop\FinSim\FinSim\investment_engine
$env:Path = "C:\Users\fmfmf\.local\bin;$env:Path"
uv run python main.py
Then open your browser to:
http://localhost:8000
That's it! The UI has two panels:
- LEFT: "Generate Historical Scenarios" — fetches stock data, runs AI, saves to DB
- RIGHT: "Interactive Advisor Bot" — chat with the AI investment advisor
================================================================================
STOPPING THE SERVER
================================================================================
Press Ctrl+C in the terminal where the server is running.
================================================================================
DETAILED STEP-BY-STEP (First Time Setup)
================================================================================
If you ever need to set this up on a fresh machine again:
Step 1: Install UV (one-time)
────────────────────────────
Open PowerShell and run:
Set-ExecutionPolicy RemoteSigned -Scope CurrentUser -Force
irm https://astral.sh/uv/install.ps1 | iex
Step 2: Add UV to PATH (every new terminal session)
────────────────────────────────────────────────────
$env:Path = "C:\Users\fmfmf\.local\bin;$env:Path"
TIP: To make this permanent, add C:\Users\fmfmf\.local\bin to your
Windows System PATH via:
Settings > System > About > Advanced system settings >
Environment Variables > Path > Edit > New
Step 3: Install Python (one-time)
─────────────────────────────────
uv python install 3.12
Step 4: Navigate to project and sync dependencies
──────────────────────────────────────────────────
cd C:\Users\fmfmf\Desktop\FinSim\FinSim\investment_engine
uv sync
Step 5: Run the application
───────────────────────────
uv run python main.py
================================================================================
DATABASE INFO (Already Connected & Working)
================================================================================
Host: scenariodb.caprover.al-arcade.com
Port: 3306
User: root
Password: (see MYSQL_PASSWORD in investment_engine\config.py)
Database: mcq_app
Table: scenarios (141 scenarios currently in DB)
Other tables in the DB: quiz_attempts, quiz_scenarios, quizzes,
user_scenario_history, users
The database connection is configured in:
investment_engine\config.py (default values)
Connection is tested automatically on app startup (init_db).
================================================================================
API KEYS (Already Configured in .env)
================================================================================
File: investment_engine\.env
Contains:
GROQ_API_KEY — For the Llama-3.3-70b AI (scenario generation + chat)
SERPAPI_API_KEY — For real-time web search in the chat bot
If keys expire, replace them in the .env file. No code changes needed.
================================================================================
API ENDPOINTS
================================================================================
GET / → Serves the Web UI (index.html)
POST /generate → Generates scenarios (body: JSON with stock_symbol, etc.)
POST /chat → Chat with the advisor bot (body: JSON with message)
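Example curl for generate (Command Prompt syntax; ^ is the cmd.exe
line-continuation character and does not work in PowerShell):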
curl -X POST http://localhost:8000/generate ^
-H "Content-Type: application/json" ^
-d "{\"stock_symbol\":\"AAPL\",\"zscore_window\":100,\"zscore_trigger_min\":-2.5,\"zscore_trigger_max\":2.5}"
Example curl for chat (Command Prompt syntax):
curl -X POST http://localhost:8000/chat ^
-H "Content-Type: application/json" ^
-d "{\"session_id\":\"my_session\",\"message\":\"Give me a scenario\"}"
================================================================================
PROJECT FILE STRUCTURE
================================================================================
investment_engine\
├── main.py                Entry point (starts Uvicorn server on port 8000)
├── app.py                 FastAPI app with /generate and /chat routes
├── config.py              Environment config (DB creds, API keys, defaults)
├── models.py              Pydantic data models (request/response contracts)
├── .env                   Secret keys (GROQ_API_KEY, SERPAPI_API_KEY)
├── pyproject.toml         Project definition + dependencies
├── uv.lock                Locked dependency versions
├── services\
│   ├── zscore_engine.py   Polars Z-Score calculation + yfinance data fetch
│   ├── scenario_gen.py    Groq LLM prompt → structured MCQ generation
│   ├── database.py        MySQL connection pool, insert, random fetch
│   └── chat_agent.py      LangGraph ReAct agent (advisor bot)
└── static\
    └── index.html         Frontend UI (dark theme, two-panel layout)
================================================================================
TROUBLESHOOTING
================================================================================
Problem: "uv is not recognized"
Solution: Run this first in your terminal:
$env:Path = "C:\Users\fmfmf\.local\bin;$env:Path"
Problem: "Python was not found"
Solution: Don't run "python" directly. Always use "uv run python ..."
UV manages its own Python installation.
Problem: "GROQ_API_KEY is not set"
Solution: Make sure the .env file exists in the investment_engine folder
and contains your key: GROQ_API_KEY=gsk_...
Problem: "Could not initialize DB on startup"
Solution: Check your internet connection. The MySQL database is remote
(hosted on CapRover). If the host is down, the app will still start
but DB features won't work.
Problem: Port 8000 already in use
Solution: Either stop the other process using port 8000, or edit main.py
and change the port number in:
uvicorn.run("app:app", host="0.0.0.0", port=8000, reload=True)
Problem: "No events found for any requested symbols"
Solution: yfinance may have rate-limited you. Wait a minute and try again,
or try a different stock symbol.
Problem: Generation takes too long
Solution: The scenario generation pipeline does 3 things sequentially:
1. Downloads 5 years of stock data (yfinance) — ~5 seconds
2. Calculates Z-Scores with Polars — instant
3. Calls Groq LLM to generate scenarios — ~10-20 seconds
Total: ~15-30 seconds is normal.
================================================================================
USEFUL COMMANDS
================================================================================
Start the server:
uv run python main.py
Add a new dependency:
uv add <package-name>
Update all dependencies:
uv sync --upgrade
Run a one-off Python script:
uv run python <script.py>
Check installed packages:
uv pip list
================================================================================
================================================================================
FinSim
AI-Powered Investment Simulation & Education Platform
================================================================================
WHAT IS FINSIM?
───────────────────────────────────────────────────────────────────────
FinSim is a full-stack web application that teaches people how to make
smarter investment decisions — using real market data, artificial
intelligence, and interactive simulations.
Think of it as a personal investment training ground: you can chat with
an AI advisor, test your knowledge with scenario-based quizzes, predict
stock movements with a multi-agent AI engine, and even ask "what if I
had invested in Tesla 5 years ago?" and get the real math.
The platform is built for students, aspiring investors, and anyone who
wants to understand how financial markets work — without risking real
money.
CORE FEATURES
───────────────────────────────────────────────────────────────────────
1. USER AUTHENTICATION & DASHBOARD
- Secure login/registration system
- Personal dashboard showing quiz scores, accuracy stats, and
recent performance
- Tracks your learning progress over time
2. AI INVESTMENT ADVISOR (Chat)
- A conversational AI chatbot powered by Groq's LLaMA 3.3 (70B)
- Has access to live market data (real stock prices via yfinance)
and web search (SerpAPI) for current news
- Can answer questions about stocks, bonds, ETFs, portfolio
strategy, and financial concepts
- Can generate practice MCQ scenarios on demand
3. SCENARIO GENERATOR (Z-Score Pipeline)
- Fetches 5 years of real historical stock prices from Yahoo Finance
- Uses statistical analysis (Z-Scores via Polars) to detect
significant market events and anomalies
- Feeds those events into an AI model that generates realistic
investment scenario questions with 4 multiple-choice answers
- Each scenario includes a best answer with rationale and 3 wrong
answers with explanations for why they're wrong
- All scenarios are stored in a MySQL database
4. PRACTICE MCQ QUIZ SYSTEM
- Timed quizzes with real-world investment scenarios
- Configurable: choose number of questions and difficulty
- Instant feedback after each answer with detailed explanations
- Scores are tracked and contribute to your dashboard stats
5. LEADERBOARD
- Competitive ranking across all users
- Shows total score, quizzes taken, and average performance
- Encourages engagement through friendly competition
6. AKINATOR 2.0 — Multi-Agent Investment Prediction Engine
This is the flagship feature. A sophisticated AI system built with
LangGraph (a graph-based AI orchestration framework) that runs
9 interconnected processing nodes:
- ROUTER: Classifies user queries and determines the processing path
- WHAT-IF ENGINE: Analyzes hypothetical past investments using real
historical data ("What if I invested $10K in Apple 3 years ago?")
- ANALYST HUB: A ReAct agent that runs 5 expert tools in parallel:
* Market Data Analyst (live prices, PE ratios, returns)
* News & Sentiment Analyst (current headlines and market mood)
* Risk Assessment Analyst (volatility, Sharpe ratio, drawdown)
* Portfolio Strategy Advisor (allocation recommendations)
* NewsAPI Headlines (dedicated news fetching)
- NEWS SENTIMENT SCORER: Calculates a sentiment score (0-100) from
real headlines using keyword-based analysis
- CONFIDENCE SCORER: Rates the prediction's reliability (0-95%)
based on data completeness and source alignment
- SELF-CORRECTION (CRITIQUE): Reviews the prediction against the
news sentiment — if the AI says "buy" but the news is bearish,
it flags the contradiction with a correction note
- JIT EDUCATION: Scans the response for financial jargon (P/E Ratio,
Volatility, Sharpe Ratio, etc.) and provides plain-English
definitions so beginners can learn as they read
- INVESTMENT MEMO: Compiles the entire analysis into a professional
summary document
- FORMAT RESPONSE: Assembles everything into the final output
Additionally, Akinator 2.0 features a "Panel Discussion Mode" where
10 distinct investor personas (risk manager, quant, aggressive trader,
value investor, macro economist, technical analyst, institutional
banker, crypto enthusiast, behavioral psychologist, ESG advocate)
debate the investment using real data before making a consensus
recommendation.
TECHNOLOGY STACK
───────────────────────────────────────────────────────────────────────
Backend:
- Python 3.12
- FastAPI (web framework and REST API)
- LangGraph (graph-based AI workflow orchestration)
- LangChain (LLM integration and tool calling)
- Groq API with LLaMA 3.3 70B Versatile (large language model)
- yfinance (real-time and historical stock market data)
- SerpAPI (live web search and news)
- Polars (high-performance data processing, written in Rust)
- MySQL (remote database for scenario storage)
- bcrypt (password hashing)
Frontend:
- Vanilla HTML, CSS, JavaScript (no frameworks)
- marked.js (markdown rendering in chat)
- Responsive dark-mode UI with glassmorphism design
- Mobile-friendly with sidebar navigation
Infrastructure:
- UV package manager (reproducible Python environments)
- CapRover (remote MySQL hosting)
HOW IT ALL CONNECTS
───────────────────────────────────────────────────────────────────────
                 ┌─────────────────────┐
                 │    User Browser     │
                 │  (HTML/CSS/JS UI)   │
                 └──────────┬──────────┘
                            │ HTTP
                 ┌─────────────────────┐
                 │   FastAPI Server    │
                 │      (app.py)       │
                 └──────────┬──────────┘
         ┌──────────────────┼──────────────────┐
         │                  │                  │
         ▼                  ▼                  ▼
  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐
  │  AI Advisor  │   │   Akinator   │   │   Scenario   │
  │ (chat_agent) │   │  2.0 Graph   │   │  Generator   │
  │ ReAct Agent  │   │  (9 nodes)   │   │  (Z-Scores)  │
  └──────┬───────┘   └──────┬───────┘   └──────┬───────┘
         │                  │                  │
         ▼                  ▼                  ▼
  ┌────────────────────────────────────────────────────┐
  │                 External Services                  │
  │       Groq LLM | yfinance | SerpAPI | MySQL        │
  └────────────────────────────────────────────────────┘
WHAT MAKES THIS PROJECT SPECIAL
───────────────────────────────────────────────────────────────────────
1. Real Data, Not Simulations
Unlike most educational tools that use fake numbers, FinSim pulls
live stock prices, real news headlines, and actual historical data.
Every prediction and scenario is grounded in reality.
2. Multi-Agent AI Architecture
The Akinator 2.0 doesn't just call one AI model — it orchestrates
multiple specialized "agents" (market analyst, risk assessor, news
scanner, strategy advisor) that each gather different data, then
synthesizes their findings into a unified recommendation.
3. Self-Correcting AI
The system reviews its own predictions against current news
sentiment. If there's a contradiction, it flags it automatically.
This teaches users that even AI predictions need critical thinking.
4. Learning While Using
The JIT Education system detects financial jargon in AI responses
and explains terms in plain English. Users learn new financial
vocabulary naturally as they interact with the system.
5. Graph-Based AI Orchestration
Built with LangGraph, the Akinator uses a directed graph where
data flows through 9 processing nodes with conditional branching.
Directed-graph orchestration of this kind is a common pattern in
production AI systems.
6. Full-Stack, Production-Quality
User authentication, database persistence, responsive mobile UI,
leaderboards, error handling, rate limit management — this isn't
a prototype, it's a complete application.
================================================================================
Built with Python, FastAPI, LangGraph, LangChain, Groq AI, and love.
================================================================================
"""
Configuration — central environment variables definitions using pydantic-settings
"""
from pydantic_settings import BaseSettings
from functools import lru_cache


class Settings(BaseSettings):
    # ── Groq API ────────────────────────────────────────────
    GROQ_API_KEY: str = ""
    # Model used for scenario generation and chat
    GROQ_MODEL: str = "llama-3.3-70b-versatile"

    # ── SerpAPI ─────────────────────────────────────────────
    SERPAPI_API_KEY: str = ""

    # ── MySQL (Al-Arcade Remote DB) ───────────────────────
    MYSQL_HOST: str = "scenariodb.caprover.al-arcade.com"
    MYSQL_PORT: int = 3306
    MYSQL_USER: str = "root"
    MYSQL_PASSWORD: str = "Alarcade123#"
    MYSQL_DATABASE: str = "mcq_app"

    # ── Defaults for Z-Score ──────────────────────────────
    DEFAULT_ZSCORE_WINDOW: int = 100
    DEFAULT_ZSCORE_TRIGGER_MIN: float = -2.5
    DEFAULT_ZSCORE_TRIGGER_MAX: float = 2.5

    class Config:
        env_file = ".env"
        env_file_encoding = "utf-8"


@lru_cache()
def get_settings() -> Settings:
    return Settings()
import os
import sys
import json
from datetime import datetime

from langchain_groq import ChatGroq
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage, ToolMessage
from langgraph.prebuilt import create_react_agent
from langchain_core.tools import tool

# Set the path so it can import from services
sys.path.append(os.getcwd())

from config import get_settings
from services.chat_agent import SerpApi_Search, mcq_scenarios


def test_agent_scenario():
    settings = get_settings()
    llm = ChatGroq(
        api_key=settings.GROQ_API_KEY,
        model_name=settings.GROQ_MODEL,
        temperature=0.7,
    )
    today = datetime.now().strftime("%Y-%m-%d")
    system_prompt_str = f"""Role: Expert Investment Advisor AI for markets, strategy, and portfolio education.
1. INVESTMENT ADVICE & REAL-TIME DATA
Tool: Use SerpApi_Search for all news, current prices (e.g., Gold in Egypt), and market data.
Date: Today is {today}.
2. MCQ GENERATION (Practice)
**. SCENARIO PRESENTATION (UI/UX):**
When a scenario is retrieved, present it with high readability using this exact structure:
---
### 📊 Investment Case Study
> [Insert a concise, professional paragraph describing the situation.]
**Key Market Data:**
* 💵 **Initial Capital:** [Value]
* 📈 **Asset Class:** [Type]
* ⏱️ **Time Horizon:** [Duration]
* ⚠️ **Risk Level:** [Rating]
**Select the Best Course of Action:**
* **A)** [Option A text]
* **B)** [Option B text]
* **C)** [Option C text]
* **D)** [Option D text]
---
*Instruction: Wait for the user's letter (A-D) before providing the rationale.*
3. MCQ SCENARIO QUESTIONS (Database)
Tool: Use mcq_scenarios."""
    tools = [SerpApi_Search, mcq_scenarios]
    agent_executor = create_react_agent(llm, tools, prompt=system_prompt_str)

    user_message = "provide a scenario"
    print(f"User: {user_message}")
    response = agent_executor.invoke({
        "messages": [HumanMessage(content=user_message)]
    })

    messages = response["messages"]
    for i, msg in enumerate(messages):
        print(f"\n--- Message {i} ({type(msg).__name__}) ---")
        try:
            print(f"Content: {msg.content}")
        except UnicodeEncodeError:
            # Fall back to raw bytes when the console cannot encode emoji
            print(f"Content (UTF-8 bytes): {msg.content.encode('utf-8')}")
        if hasattr(msg, 'tool_calls'):
            print(f"Tool Calls: {msg.tool_calls}")

    last_msg = messages[-1]
    try:
        print(f"\nFinal Answer: '{last_msg.content}'")
    except UnicodeEncodeError:
        print(f"\nFinal Answer (UTF-8 bytes): '{last_msg.content.encode('utf-8')}'")


if __name__ == "__main__":
    test_agent_scenario()
import uvicorn
from app import app
from config import get_settings


def start():
    """Start the FastAPI application."""
    print("Starting FinSim...")
    settings = get_settings()
    if not settings.GROQ_API_KEY:
        print("WARNING: GROQ_API_KEY is not set in the .env file.")
    uvicorn.run("app:app", host="0.0.0.0", port=8000, reload=True)


if __name__ == "__main__":
    start()
"""
Pydantic models — the contract between every layer of the app.
"""
from __future__ import annotations
from pydantic import BaseModel, Field
from typing import Optional, List


# ═══════════════════════════════════════════════════════════════
# REQUEST MODELS
# ═══════════════════════════════════════════════════════════════
class GenerateRequest(BaseModel):
    stock_symbol: str = Field("", examples=["AAPL"])
    zscore_window: int = Field(default=100, ge=5)
    zscore_trigger_min: float = Field(default=-2.5)
    zscore_trigger_max: float = Field(default=2.5)
    knowledge_base_id: Optional[str] = None


class ChatRequest(BaseModel):
    session_id: str = Field(default="default")
    message: str
    knowledge_base_id: Optional[str] = None


# ═══════════════════════════════════════════════════════════════
# Z-SCORE MODELS
# ═══════════════════════════════════════════════════════════════
class ZScoreEvent(BaseModel):
    date: str
    price: float
    z_score: float
    event_type: str  # "major" | "normal"
    context: str
    direction: str  # "decline" | "rally"


class ZScoreResult(BaseModel):
    events: List[ZScoreEvent]
    total_events: int
    window_size: int
    data_points: int


# ═══════════════════════════════════════════════════════════════
# SCENARIO MODELS
# ═══════════════════════════════════════════════════════════════
class AnswerOption(BaseModel):
    answer: str
    explanation: str


class BestAnswer(BaseModel):
    answer: str
    rationale: str


class GivensTable(BaseModel):
    date: Optional[str] = None
    stock_symbol: Optional[str] = None
    price: Optional[float] = None
    z_score: Optional[float] = None
    event_type: Optional[str] = None
    market_conditions: Optional[str] = None
    context: Optional[str] = None

    class Config:
        extra = "allow"  # AI may add extra fields


class Scenario(BaseModel):
    id: str
    title: str
    short_description: str = Field(alias="shortDescription", default="")
    givens_table: GivensTable = Field(alias="givensTable")
    scenario_paragraph: str = Field(alias="scenarioParagraph", default="")
    best_answer: BestAnswer = Field(alias="bestAnswer", default_factory=lambda: BestAnswer(answer="Unknown", rationale="Unknown"))
    # Strictly padding exactly 3 elements to prevent IndexError during DB integration
    other_answers: list[AnswerOption] = Field(
        alias="otherAnswers",
        default_factory=lambda: [
            AnswerOption(answer="Unknown A", explanation="TBD"),
            AnswerOption(answer="Unknown B", explanation="TBD"),
            AnswerOption(answer="Unknown C", explanation="TBD")
        ]
    )
    event_type: Optional[str] = None
    risk_level: str = Field(alias="riskLevel", description="Low, Medium, or High", default="Medium")

    class Config:
        populate_by_name = True


class ScenarioGenerationResult(BaseModel):
    scenarios: list[Scenario]
    total_possible_scenarios: int = Field(alias="totalPossibleScenarios")

    class Config:
        populate_by_name = True


# ═══════════════════════════════════════════════════════════════
# CHAT MODELS
# ═══════════════════════════════════════════════════════════════
class ChatResponse(BaseModel):
    session_id: str
    reply: str
# FinSim Investment Engine - Comprehensive Documentation & Setup Guide
## 1. Complete Step-by-Step Setup & Execution Guide
**Prerequisites:**
- **Python 3.12** or higher installed on your Windows system.
- The **`uv`** package manager (highly recommended as the project uses a `uv.lock` file).
### Step 1: Open Terminal and Navigate to the Project Directory
Open PowerShell or Command Prompt and navigate to the project folder:
```powershell
cd C:\Users\Fares\OneDrive\Desktop\FinSim\investment_engine
```
### Step 2: Install Dependencies
Since the project is managed with `uv`, we will use it to create the environment and install the dependencies.
If you don't have `uv` installed globally in Python, install it first:
```powershell
pip install uv
```
Now, sync the dependencies. This command automatically creates a `.venv` virtual environment in the folder and strictly installs everything in `uv.lock` (like FastAPI, LangChain, Polars):
```powershell
uv sync
```
*(If you are avoiding `uv` for any reason, you can manually use standard pip instead: `python -m venv .venv`, then `.\.venv\Scripts\activate`, then `pip install -e .`)*
### Step 3: Configure Environment Variables
The application needs secure API keys to talk to the AI and Search platforms.
1. Make sure you are in `C:\Users\Fares\OneDrive\Desktop\FinSim\investment_engine`.
2. Create a new text file named exactly `.env` (with a dot at the start).
3. Open `.env` in Notepad or VSCode and paste the following, replacing the placeholders with your actual keys:
```env
GROQ_API_KEY="your_groq_api_key_here"
SERPAPI_API_KEY="your_serpapi_api_key_here"
```
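*(Note: Database credentials for the remote CapRover MySQL instance already have default values in `config.py`, so you do not need to add DB keys here unless you want to override them, for example by setting `MYSQL_HOST` or `MYSQL_PASSWORD` in `.env`. Keep in mind that defaults hardcoded in `config.py` are visible to anyone with access to the repository.)*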
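
To make the override behaviour concrete, here is a minimal standard-library sketch of the precedence that `pydantic-settings` applies: a real environment variable wins over a `.env` entry, which wins over the class default. `SketchSettings`, `parse_dotenv`, and `load_settings` are hypothetical names used only for illustration; the project itself relies on `BaseSettings` in `config.py`, not on this code.

```python
import os
from dataclasses import dataclass, fields


def parse_dotenv(text):
    """Parse KEY=VALUE lines, skipping blanks and # comments; strips optional quotes."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


@dataclass
class SketchSettings:
    # Hypothetical subset of the fields defined in config.py
    GROQ_API_KEY: str = ""
    MYSQL_HOST: str = "scenariodb.caprover.al-arcade.com"
    MYSQL_PORT: int = 3306


def load_settings(dotenv_text, environ=None):
    """Resolve each field as: environment variable > .env entry > default."""
    environ = os.environ if environ is None else environ
    dotenv = parse_dotenv(dotenv_text)
    overrides = {}
    for field in fields(SketchSettings):
        raw = environ.get(field.name, dotenv.get(field.name))
        if raw is not None:
            # Coerce the raw string to the type of the default (str or int here)
            overrides[field.name] = type(field.default)(raw)
    return SketchSettings(**overrides)
```

For instance, `load_settings('MYSQL_PORT=3307', environ={})` would pick up the port from the `.env` text while leaving `MYSQL_HOST` at its default.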
### Step 4: Run the Application
Start the FastAPI server. Because we used `uv`, we can use `uv run` to automatically use the virtual environment without needing to activate it manually.
```powershell
uv run python main.py
```
*Output should look like this:*
```text
Starting FinSim...
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO: Started reloader process [...]
INFO: Started server process [...]
INFO: Waiting for application startup.
INFO: Application startup complete.
```
### Step 5: Access the Web Interface
1. Open your web browser (Chrome, Edge, etc.).
2. Go to: [http://localhost:8000](http://localhost:8000)
3. You will see the FinSim UI dashboard! You can generate historical scenarios on the left panel, and start chatting with the interactive AI on the right.
---
## 2. Granular File Descriptions
### Core Application Layer
- **`app.py`**
- **Purpose:** The central nervous system of the FastAPI app.
- **Details:** Mounts the `static/` folder to serve the UI on `/`. It defines the two main POST endpoints: `/generate` (which sequentially calls the Z-score engine, AI scenario generator, and Database insert functions) and `/chat` (which talks to the interactive agent). It also initializes the remote Database table on startup if it isn't there.
- **`main.py`**
- **Purpose:** The immediate execution point.
- **Details:** Calls `uvicorn.run("app:app", host="0.0.0.0", port=8000, reload=True)`. It reads the settings from `config.py` upfront to warn you in the terminal if you forgot to set your `GROQ_API_KEY`.
- **`config.py`**
- **Purpose:** Environment and configuration management.
- **Details:** Uses `pydantic-settings`. Automatically loads the `.env` file. Defines all default values such as the Groq model name (`llama-3.3-70b-versatile`), remote MySQL server/credentials for CapRover, and the default math triggers for the Z-score logic (like a 100-day window).
- **`models.py`**
- **Purpose:** The strict data types (Pydantic).
- **Details:** Enforces rigid shapes for all data flowing through the app. It holds models for HTTP requests (`GenerateRequest`), internal Z-Score calculations (`ZScoreEvent`), and highly nested JSON structures that the LLM is forced to output (`Scenario`, `ScenarioGenerationResult`).
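
As a rough illustration of the request contracts described above, the following sketch builds the JSON bodies for `/generate` and `/chat` using only the standard library. The field names and defaults mirror `GenerateRequest` and `ChatRequest` in `models.py`; the helper names, `BASE_URL`, and the assumption that both endpoints answer with JSON are illustrative and not taken from `app.py`.

```python
import json
from urllib import request

# Assumption: the server was started locally with `uv run python main.py`
BASE_URL = "http://localhost:8000"


def build_generate_payload(stock_symbol, zscore_window=100,
                           zscore_trigger_min=-2.5, zscore_trigger_max=2.5):
    """Body for POST /generate, mirroring the GenerateRequest fields."""
    return {
        "stock_symbol": stock_symbol,
        "zscore_window": zscore_window,
        "zscore_trigger_min": zscore_trigger_min,
        "zscore_trigger_max": zscore_trigger_max,
    }


def build_chat_payload(message, session_id="default"):
    """Body for POST /chat, mirroring the ChatRequest fields."""
    return {"session_id": session_id, "message": message}


def post_json(path, payload, timeout=60):
    """POST a JSON body to the running server and decode a JSON reply."""
    req = request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `post_json("/chat", build_chat_payload("Give me a scenario", session_id="my_session"))` should return the fields of `ChatResponse` (`session_id` and `reply`), assuming the route returns that model unchanged.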
### Business Logic (`services/` Directory)
- **`services/zscore_engine.py`**
- **Purpose:** The high-speed quantitative volatility analyzer.
- **Details:** Connects to Yahoo Finance (`yfinance`) to pull 5 years of daily stock prices. Uses `polars` (a fast DataFrame library written in Rust) to calculate rolling means, standard deviations, and final Z-scores. Keeps only the data points whose Z-score falls outside the trigger thresholds. It then matches those dates against a hardcoded list of `KNOWN_EVENTS` (e.g. 2008 Lehman Brothers collapse) to inject real historical context into the data points before returning them.
- **`services/scenario_gen.py`**
- **Purpose:** Connects to Groq AI to generate MCQs.
- **Details:** Takes the mathematical events found by `zscore_engine.py` and feeds them to the `llama-3.3-70b-versatile` model via LangChain. A massive system prompt forces the LLM to output pure JSON mapping exactly to the components required by the `Scenario` Pydantic model (Title, paragraph narrative, a best answer with rationale, and 3 decoy answers).
- **`services/database.py`**
- **Purpose:** MySQL persistence layer.
- **Details:** Sets up connection pooling to the `scenariodb.caprover.al-arcade.com` server. Includes SQL statements for `init_db()` (table creation) and `insert_scenario()` to log AI-generated MCQs robustly. Exports `get_random_scenario()` specifically for the chatbot to grab quiz questions.
- **`services/chat_agent.py`**
- **Purpose:** The interactive LangGraph ReAct (Reasoning and Acting) bot.
- **Details:** Creates a conversational agent loop. It gives the AI tools: `@tool SerpApi_Search` for live web lookups (prices/news), and `@tool mcq_scenarios` to fetch DB questions. Maintains temporary session history in a dictionary `_sessions`, ensuring the bot remembers the last 20 messages per user. Complex extraction logic is included to pull the final response string from LangChain's diverse message structures.
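
The rolling Z-score step described for `services/zscore_engine.py` can be sketched in plain Python. This is not the Polars implementation used by the project: it swaps in the standard-library `statistics` module, and it assumes that the trailing window includes the current day and that the sample standard deviation is used. The function names and the shape of the returned dictionaries are hypothetical.

```python
from statistics import mean, stdev


def rolling_zscores(prices, window):
    """Return (index, z) pairs for every price that has a full trailing window."""
    results = []
    for i in range(window - 1, len(prices)):
        chunk = prices[i - window + 1 : i + 1]
        sigma = stdev(chunk)  # sample standard deviation of the window
        if sigma == 0:
            continue  # flat window: the Z-score is undefined, so skip it
        results.append((i, (prices[i] - mean(chunk)) / sigma))
    return results


def trigger_events(prices, window=100, trigger_min=-2.5, trigger_max=2.5):
    """Keep only points whose Z-score falls outside [trigger_min, trigger_max]."""
    events = []
    for i, z in rolling_zscores(prices, window):
        if z <= trigger_min or z >= trigger_max:
            events.append({
                "index": i,
                "price": prices[i],
                "z_score": round(z, 2),
                "direction": "decline" if z < 0 else "rally",
            })
    return events
```

The defaults match `DEFAULT_ZSCORE_WINDOW`, `DEFAULT_ZSCORE_TRIGGER_MIN`, and `DEFAULT_ZSCORE_TRIGGER_MAX` in `config.py`. Note that a short window limits how extreme a Z-score can get, which is one reason a window as large as 100 days pairs sensibly with thresholds of ±2.5.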
### Frontend (`static/` Directory)
- **`static/index.html`**
- **Purpose:** The user-facing dashboard.
- **Details:** A clean, zero-dependency HTML file styled completely with CSS Variables (dark theme). It contains a form matching `models.GenerateRequest` on the left that fires JavaScript `fetch('/generate')` requests. On the right, it implements a scrollable chat UI that tracks the session ID and POSTs a JSON body (`session_id`, `message`) via `fetch('/chat')`.
### Dev Tools & Meta Files
- **`pyproject.toml`**
- **Purpose:** Python application package definitions.
- **Details:** Specifies that this requires Python >= 3.12 and strictly declares what packages the project needs (fastapi, langchain, yfinance, etc).
- **`uv.lock`**
- **Purpose:** The reproducible dependencies file.
- **Details:** Auto-generated by `uv`, it locks the exact hashes and versions of every library tree so developers sharing the project experience zero environment issues.
- **`.python-version`**
- **Purpose:** A tiny text file (just says `3.12`) telling version managers like `pyenv` or `uv` to use Python 3.12 by default here.
- **`debug_scenario.py`**
- **Purpose:** Terminal debugging.
- **Details:** A manual script to test the LangChain chat agent loop in isolation inside the terminal, skipping the FastAPI and HTML layer entirely. Great for diagnosing AI tool-calling prompt issues.
- **`test_extraction_mock.py`**
- **Purpose:** Unit testing for parsing LangChain AI formats.
- **Details:** LangChain AI messages can randomly return as plain strings, lists of dicts, or nested objects. This mocks fake responses and runs them through the parsing algorithm copied from `chat_agent.py` to assert it successfully extracts plain text in all scenarios without crashing.
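
A minimal sketch of the kind of extraction that `test_extraction_mock.py` exercises is shown below. It is not the algorithm from `chat_agent.py`, whose source is not included here; `extract_text` is a hypothetical helper, and the assumption that text blocks carry a `"text"` key is an illustration of one common content shape.

```python
def extract_text(content):
    """Best-effort plain-text extraction from a message content value."""
    if isinstance(content, str):
        return content
    if isinstance(content, dict):
        # Assumption: text blocks look like {"type": "text", "text": "..."}
        return str(content.get("text", ""))
    if isinstance(content, (list, tuple)):
        parts = [extract_text(item) for item in content]
        return "".join(part for part in parts if part)
    # Objects exposing a .content attribute are unwrapped recursively
    inner = getattr(content, "content", None)
    if inner is not None:
        return extract_text(inner)
    return ""
```

Returning an empty string for unknown shapes, rather than raising, keeps the chat endpoint from crashing on an unexpected message format; the caller can then decide whether an empty reply should be treated as an error.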
[project]
name = "investment-engine"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
"bcrypt>=5.0.0",
"beautifulsoup4>=4.14.3",
"chromadb>=1.5.5",
"fastapi>=0.135.1",
"langchain>=0.3.0",
"langchain-chroma>=1.1.0",
"langchain-community>=0.4.1",
"langchain-core>=0.3.0",
"langchain-groq>=0.2.0",
"langchain-huggingface>=1.2.1",
"langgraph>=1.1.1",
"mysql-connector-python>=8.0.0",
"numpy>=1.24",
"polars>=1.38.1",
"pyarrow>=23.0.1",
"pydantic>=2.12.5",
"pydantic-settings>=2.13.1",
"pypdf>=6.9.2",
"python-dotenv>=1.2.2",
"python-multipart>=0.0.22",
"sentence-transformers>=5.3.0",
"uvicorn>=0.41.0",
"yfinance>=1.2.0",
]
"""
Database layer — MySQL operations connecting directly to CapRover instance.
"""
import json
import mysql.connector
from mysql.connector import pooling
from contextlib import contextmanager
from config import get_settings
from models import Scenario, ScenarioGenerationResult

_pool: pooling.MySQLConnectionPool | None = None


def _get_pool() -> pooling.MySQLConnectionPool:
    global _pool
    if _pool is None:
        s = get_settings()
        _pool = pooling.MySQLConnectionPool(
            pool_name="scenario_pool",
            pool_size=5,
            host=s.MYSQL_HOST,
            port=s.MYSQL_PORT,
            user=s.MYSQL_USER,
            password=s.MYSQL_PASSWORD,
            database=s.MYSQL_DATABASE,
        )
    return _pool


@contextmanager
def get_connection():
    conn = _get_pool().get_connection()
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()


def init_db():
    """Create the scenarios table if it doesn't exist."""
    ddl = """
    CREATE TABLE IF NOT EXISTS scenarios (
        id VARCHAR(20) PRIMARY KEY,
        title TEXT NOT NULL,
        short_description TEXT,
        givens_table JSON,
        scenario_paragraph TEXT,
        best_answer TEXT,
        best_answer_rationale TEXT,
        other_option1 TEXT,
        other_option1_exp TEXT,
        other_option2 TEXT,
        other_option2_exp TEXT,
        other_option3 TEXT,
        other_option3_exp TEXT,
        event_type VARCHAR(20) DEFAULT 'normal',
        difficulty VARCHAR(20) DEFAULT 'medium',
        category VARCHAR(100) DEFAULT 'General',
        risk_level VARCHAR(20) DEFAULT 'Medium',
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
    """
    with get_connection() as conn:
        cursor = conn.cursor()
        cursor.execute(ddl)


def insert_scenario(s: Scenario):
    """Insert one scenario with a parameterized query (avoids SQL injection)."""
    sql = """
        INSERT INTO scenarios (
            id, title, short_description, givens_table,
            scenario_paragraph, best_answer, best_answer_rationale,
            other_option1, other_option1_exp,
            other_option2, other_option2_exp,
            other_option3, other_option3_exp,
            event_type, difficulty, category, risk_level
        ) VALUES (
            %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s
        )
        ON DUPLICATE KEY UPDATE
            title = VALUES(title),
            short_description = VALUES(short_description),
            givens_table = VALUES(givens_table),
            scenario_paragraph = VALUES(scenario_paragraph),
            risk_level = VALUES(risk_level)
    """
    # Pad with empty placeholders if the LLM returns fewer than 3 alternatives
    others = s.other_answers + [
        type("Obj", (), {"answer": "", "explanation": ""})()
    ] * 3
    # Derive the category from crisis keywords in the scenario context
    ctx = (s.givens_table.context or "").lower() if s.givens_table else ""
    category = "Financial Crisis" if any(
        kw in ctx for kw in ["crisis", "crash", "covid", "pandemic", "collapse"]
    ) else "General"
    params = (
        s.id,
        s.title,
        s.short_description,
        json.dumps(s.givens_table.model_dump() if s.givens_table else {}),
        s.scenario_paragraph,
        s.best_answer.answer,
        s.best_answer.rationale,
        others[0].answer,
        others[0].explanation,
        others[1].answer,
        others[1].explanation,
        others[2].answer,
        others[2].explanation,
        s.event_type or "normal",
        "medium",  # difficulty is not generated yet, so it is fixed here
        category,
        s.risk_level,
    )
    with get_connection() as conn:
        cursor = conn.cursor()
        cursor.execute(sql, params)
def insert_all_scenarios(result: ScenarioGenerationResult) -> int:
    """Insert every scenario into the CapRover database, one transaction each.

    Returns the number of scenarios written (inserted or updated).
    """
    count = 0
    for s in result.scenarios:
        insert_scenario(s)
        count += 1
    return count
def get_random_scenario() -> dict | None:
    """Pull one random scenario for the Interactive Advisor Bot."""
    sql = """
        SELECT id, title, short_description, givens_table,
               scenario_paragraph, best_answer, best_answer_rationale,
               other_option1, other_option1_exp,
               other_option2, other_option2_exp,
               other_option3, other_option3_exp,
               event_type, difficulty, category, risk_level
        FROM scenarios
        ORDER BY RAND()
        LIMIT 1
    """
    with get_connection() as conn:
        # dictionary=True returns the row as a dict keyed by column name
        cursor = conn.cursor(dictionary=True)
        cursor.execute(sql)
        row = cursor.fetchone()
    return row
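The commit-or-rollback pattern used by get_connection() can be tried without a MySQL server. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in; the `transaction` helper, the two-column table, and the sample rows are hypothetical and not part of FinSim. Unlike get_connection(), the sketch does not close the connection, because closing an in-memory SQLite database would discard it.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    """Hypothetical helper: commit if the block succeeds, roll back if it raises."""
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise


# In-memory database standing in for the MySQL instance (assumption)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scenarios (id TEXT PRIMARY KEY, title TEXT)")
conn.commit()

# Successful block: the row is committed
with transaction(conn) as c:
    c.execute("INSERT INTO scenarios VALUES (?, ?)", ("SC-1", "Kept"))

# Failing block: the insert is rolled back and the error is re-raised
try:
    with transaction(conn) as c:
        c.execute("INSERT INTO scenarios VALUES (?, ?)", ("SC-2", "Discarded"))
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

ids = [row[0] for row in conn.execute("SELECT id FROM scenarios ORDER BY id")]
print(ids)
```

Only the first row should survive, which is the behavior insert_scenario() relies on when a statement fails partway through.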
"""
RAG Engine — Handles document parsing, chunking, and ChromaDB vector storage.
"""
import os
import shutil
import uuid
import tempfile
from pathlib import Path
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma
from langchain_community.document_loaders import PyPDFLoader, TextLoader, WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
# We store the ChromaDB locally
DB_DIR = os.path.join(os.path.dirname(os.path.dirname(__file__)), "chroma_data")
os.makedirs(DB_DIR, exist_ok=True)
_embeddings = None
def get_embeddings():
    """Load the embedding model once and cache it for later calls."""
    global _embeddings
    if not _embeddings:
        _embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
    return _embeddings


def get_kb_collection(kb_id: str) -> Chroma:
    """Gets the Chroma vectorstore for a specific knowledge base (collection)."""
    return Chroma(
        collection_name=kb_id,
        embedding_function=get_embeddings(),
        persist_directory=DB_DIR
    )


def list_knowledge_bases():
    """Lists all available knowledge bases by inspecting the Chroma directory or client."""
    # Since Chroma 0.4+, we interact with the persistent client directly to list collections
    import chromadb
    client = chromadb.PersistentClient(path=DB_DIR)
    collections = client.list_collections()
    # Each collection is a kb. Return list of dicts.
    return [{"id": c.name, "name": c.name} for c in collections]
def ingest_document(kb_id: str, file_path_or_url: str, doc_type: str):
    """
    Ingests a document or URL into the specified knowledge base.
    Returns the number of chunks stored.
    """
    if doc_type == "pdf":
        loader = PyPDFLoader(file_path_or_url)
        docs = loader.load()
    elif doc_type == "txt":
        loader = TextLoader(file_path_or_url, encoding="utf-8")
        docs = loader.load()
    elif doc_type == "json":
        import json
        from langchain_core.documents import Document
        with open(file_path_or_url, "r", encoding="utf-8") as f:
            try:
                # Re-serialize valid JSON with indentation so it chunks on line breaks
                data = json.load(f)
                text_content = json.dumps(data, indent=2)
            except json.JSONDecodeError:
                # Fall back to the raw text if the file is not valid JSON
                f.seek(0)
                text_content = f.read()
        docs = [Document(page_content=text_content, metadata={"source": file_path_or_url})]
    elif doc_type == "docx":
        from langchain_community.document_loaders import Docx2txtLoader
        loader = Docx2txtLoader(file_path_or_url)
        docs = loader.load()
    elif doc_type == "url":
        loader = WebBaseLoader(file_path_or_url)
        docs = loader.load()
    else:
        raise ValueError(f"Unsupported document type: {doc_type}")
    # Chunking
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        separators=["\n\n", "\n", " ", ""]
    )
    splits = text_splitter.split_documents(docs)
    # Store in ChromaDB
    vectorstore = get_kb_collection(kb_id)
    vectorstore.add_documents(documents=splits)
    return len(splits)
def search_kb(kb_id: str, query: str, top_k: int = 3) -> str:
    """
    Searches the knowledge base and returns a formatted string of the top context chunks.
    """
    try:
        vectorstore = get_kb_collection(kb_id)
        docs = vectorstore.similarity_search(query, k=top_k)
        if not docs:
            return "No relevant information found in the knowledge base."
        results = []
        for i, doc in enumerate(docs):
            source = doc.metadata.get("source", "Unknown Source")
            # Compare against None so that a page index of 0 is still reported
            page = doc.metadata.get("page")
            page_info = f" (Page {page})" if page is not None else ""
            results.append(f"--- Doc {i+1} | Source: {source}{page_info} ---\n{doc.page_content}")
        return "\n\n".join(results)
    except Exception as e:
        print(f"Error searching KB {kb_id}: {e}")
        return f"Error retrieving from Knowledge Base: {e}"
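To see what chunk_size=1000 and chunk_overlap=200 mean in ingest_document(), here is a simplified sketch of fixed-size chunking with overlap. It is not the algorithm RecursiveCharacterTextSplitter uses (that splitter prefers to break at the listed separators); the `chunk_text` function and the sample text are hypothetical and only illustrate how consecutive chunks share their boundary characters.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Hypothetical helper: slide a fixed-size window over the text.

    Each chunk starts (chunk_size - chunk_overlap) characters after the
    previous one, so neighbouring chunks share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once the remaining tail is already covered by the previous chunk
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


# 2600 characters of sample text (digits repeating 0..9)
sample = "".join(str(i % 10) for i in range(2600))
chunks = chunk_text(sample)
print(len(chunks), [len(c) for c in chunks])
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing some text twice.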
"""
Scenario Generator: talks to Groq through LangChain and validates the response with strict Pydantic parsing.
"""
import json
import uuid
from typing import Optional
from langchain_groq import ChatGroq
from config import get_settings
from models import ZScoreResult, ScenarioGenerationResult
SYSTEM_PROMPT = """You are a Financial Scenario Generator AI.
Your task is to create high-quality, pedagogical financial scenarios using historical market data and Z-score analysis.
GENERATE EXACTLY 5 DIVERSE SCENARIOS from the provided events.
You MUST return ONLY valid JSON matching this exact structure:
{
"totalPossibleScenarios": 5,
"scenarios": [
{
"id": "10-character string (e.g. SCEN-00A3X)",
"title": "Clear, descriptive title",
"shortDescription": "1-2 sentence overview",
"givensTable": {
"date": "...",
"stockSymbol": "...",
"price": 0.0,
"zScore": 0.0,
"marketConditions": "...",
"eventType": "...",
"context": "..."
},
"scenarioParagraph": "Detailed narrative describing the market situation... Clearly mention whether this is a MAJOR CRISIS EVENT or NORMAL MARKET VOLATILITY.",
"bestAnswer": {
"answer": "...",
"rationale": "..."
},
"otherAnswers": [
{ "answer": "...", "explanation": "..." },
{ "answer": "...", "explanation": "..." },
{ "answer": "...", "explanation": "..." }
],
"riskLevel": "Low | Medium | High"
}
]
}
Event-Type Classification Rules:
- "major" → context contains crisis keywords (COVID, Crash, Financial Crisis, pandemic, collapse, 9/11) OR |Z-score| >= 3.0
- "normal" → everything else
Risk Classification Rules:
- "High" → Major market crashes, high volatility during crises.
- "Medium" → Notable daily volatility or uncertainty.
- "Low" → Slight corrections or rallies in otherwise stable periods.
Return ONLY valid JSON. No markdown fences, no commentary, no additional text outside the JSON object.
"""
def _build_user_prompt(
    stock_symbol: str,
    zscore_result: ZScoreResult,
) -> str:
    """Serialize the Z-score events and summary statistics into the user prompt."""
    events_data = [e.model_dump() for e in zscore_result.events]
    return (
        f"Stock Symbol: {stock_symbol}\n"
        f"Total Events Available: {zscore_result.total_events}\n"
        f"Window Size: {zscore_result.window_size}\n"
        f"Data Points: {zscore_result.data_points}\n\n"
        f"Events Data:\n{json.dumps(events_data, indent=2)}"
    )
def generate_scenarios(
    stock_symbol: str,
    zscore_result: ZScoreResult,
    knowledge_base_id: Optional[str] = None,
) -> ScenarioGenerationResult:
    """
    Call Groq API → parse structured JSON → return validated Pydantic model.
    """
    settings = get_settings()
    # Initialize Groq client
    llm = ChatGroq(
        api_key=settings.GROQ_API_KEY,
        model_name=settings.GROQ_MODEL,
        temperature=0.7,
        max_tokens=8000
    )
    user_prompt = _build_user_prompt(stock_symbol, zscore_result)
    system_message = SYSTEM_PROMPT
    if knowledge_base_id:
        # Optionally append retrieved style examples to the system prompt
        from services.rag_engine import search_kb
        style_context = search_kb(knowledge_base_id, "financial investment scenario examples style guide writing", top_k=2)
        system_message += (
            "\n\n--- RAG STYLE EXAMPLES & CONTEXT ---\n"
            "Please mimic the tone, structure, or terminology from this "
            "retrieved knowledge base if applicable:\n"
            f"{style_context}\n"
            "------------------------------------"
        )
    # Send message to Groq
    messages = [
        ("system", system_message),
        ("human", user_prompt),
    ]
    response = llm.invoke(messages)
    raw_text = response.content.strip()
    # Parse & Validate
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as e:
        # Fallback if markdown fences sneaked in; "(?:json)?" also accepts
        # a bare fence without the "json" language tag
        import re
        match = re.search(r"```(?:json)?\s*(.*?)\s*```", raw_text, re.DOTALL)
        if match:
            data = json.loads(match.group(1))
        else:
            raise ValueError(f"Groq returned invalid JSON: {e}\n\n{raw_text[:500]}") from e
    result = ScenarioGenerationResult.model_validate(data)
    # Enrich event_type and context safely
    for s in result.scenarios:
        # Enforce globally unique IDs to prevent database collisions
        s.id = f"SC-{uuid.uuid4().hex[:10].upper()}"
        if s.givens_table:
            if s.givens_table.event_type:
                s.event_type = s.givens_table.event_type
            else:
                s.givens_table.event_type = "normal"
                s.event_type = "normal"
            if not s.givens_table.context:
                s.givens_table.context = "General market activity"
        else:
            s.event_type = "normal"
    return result
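The parse step in generate_scenarios() tries plain JSON first and only then looks for a markdown fence. The sketch below isolates that fallback so it can be run without a Groq key; `parse_llm_json` and the sample replies are hypothetical, and the fence marker is built with string multiplication only to keep three literal backticks out of the example.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to avoid a literal fence here
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def parse_llm_json(raw_text: str) -> dict:
    """Hypothetical helper: parse a model reply that should be a JSON object."""
    text = raw_text.strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # Fall back to the contents of the first fenced block, if any
        match = FENCE_RE.search(text)
        if match is None:
            raise ValueError(f"Reply is not valid JSON: {err}") from err
        return json.loads(match.group(1))


plain = '{"totalPossibleScenarios": 5, "scenarios": []}'
tagged = "Here you go:\n" + FENCE + "json\n" + plain + "\n" + FENCE
bare = FENCE + "\n" + plain + "\n" + FENCE

for reply in (plain, tagged, bare):
    print(parse_llm_json(reply)["totalPossibleScenarios"])
```

A reply with neither valid JSON nor a fenced block raises ValueError, which matches how the generator reports an unusable response.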
"""
Z-Score Engine — Polars Implementation.
Replaces the Pandas engine with high-performance lazy execution.
"""
import yfinance as yf
import polars as pl
from datetime import datetime
from models import ZScoreEvent, ZScoreResult
# Known major historical events for enriched context
KNOWN_EVENTS = {
(2020, 2): "COVID-19 Market Crash — Global pandemic triggers historic sell-off",
(2020, 3): "COVID-19 Market Crash — Peak pandemic fear and lockdowns",
(2008, 8): "2008 Financial Crisis — Lehman Brothers collapse begins",
(2008, 9): "2008 Financial Crisis — Full-blown credit market freeze",
(2008, 10): "2008 Financial Crisis — Global contagion and bank bailouts",
(2008, 11): "2008 Financial Crisis — Continued deleveraging",
(2001, 8): "Dot-com Bubble Aftermath — Tech sector implosion",
(2001, 9): "9/11 Attacks — Markets shut down then crash on reopening",
(2022, 5): "2022 Bear Market — Fed rate hikes crush growth stocks",
(2022, 6): "2022 Bear Market — Inflation fears peak",
(2020, 10): "Pre-Election Volatility — Uncertainty ahead of US elections",
(2011, 7): "US Debt Ceiling Crisis — S&P downgrades US credit rating",
(2011, 8): "US Debt Ceiling Crisis — Market turmoil continues",
(2018, 12): "Fed Tightening Scare — December 2018 sell-off",
(2015, 8): "China Devaluation Shock — Yuan devaluation rattles global markets",
}
def _classify_event(z: float, dt: datetime) -> tuple[str, str]:
    """Classify an event as major/normal and generate context."""
    year = dt.year
    month = dt.month
    abs_z = abs(z)
    # A known historical month or an extreme Z-score marks a major event
    known = KNOWN_EVENTS.get((year, month))
    if known or abs_z >= 3.0:
        event_type = "major"
        if known:
            context = known
        elif z < 0:
            context = "Significant market decline — potential crisis event"
        else:
            context = "Significant market rally — unusual positive movement"
    else:
        event_type = "normal"
        if z < 0:
            context = "Notable market decline — day-to-day volatility"
        else:
            context = "Notable market increase — day-to-day volatility"
    return event_type, context
def identify_events(symbol: str, window: int = 100, trigger_min: float = -2.5, trigger_max: float = 2.5) -> ZScoreResult:
    """
    Downloads data with yfinance and calculates rolling Z-scores using Polars.
    Accepts empty symbol to aggregate a basket of top stocks.
    Returns filtered ZScoreResult.
    """
    import random
    symbols_to_fetch = [symbol] if symbol else ["SPY", "QQQ", "AAPL", "MSFT", "TSLA", "AMZN", "NVDA", "META"]
    if not symbol:
        # Pick 3 random stocks + SPY to avoid massive token blowout
        symbols_to_fetch = ["SPY"] + random.sample([s for s in symbols_to_fetch if s != "SPY"], 3)
    all_events = []
    total_data_points = 0
    for current_symbol in symbols_to_fetch:
        try:
            # 1. Fetch data
            data = yf.download(current_symbol, period="5y", progress=False)
            if data.empty:
                continue
            # 2. Reset index to get Date as a column, handle multi-layer columns from yf
            df_pd = data.reset_index()
            # Flatten columns if yfinance returns multi-index (tuple) column names
            new_cols = []
            for col in df_pd.columns:
                if isinstance(col, tuple):
                    # Filter out empty strings from the tuple and join
                    parts = [str(c) for c in col if c]
                    new_cols.append('_'.join(parts).strip('_'))
                else:
                    new_cols.append(str(col))
            df_pd.columns = new_cols
            # Ensure we have Date and Close
            date_col = next((c for c in df_pd.columns if 'date' in c.lower()), None)
            close_col = next(
                (
                    c for c in df_pd.columns
                    if 'close' in c.lower()
                    and ('_' not in c or current_symbol.lower() in c.lower() or 'close' == c.lower())
                ),
                None,
            )
            if not close_col:  # Fallback to just the first column that has 'close'
                close_col = next((c for c in df_pd.columns if 'close' in c.lower()), None)
            if not date_col or not close_col:
                print(f"Skipping {current_symbol}: Could not cleanly identify Date/Close columns.")
                continue
            df_pd = df_pd[[date_col, close_col]].rename(columns={date_col: "Date", close_col: "Close"})
            # Drop NaN
            df_pd = df_pd.dropna()
            if len(df_pd) < window:
                continue
            # 3. Process with Polars
            df = pl.from_pandas(df_pd)
            # Ensure correct types
            df = df.with_columns([
                pl.col("Date").cast(pl.Datetime)
            ])
            # 4. Calculate Z-scores with lazy execution
            q = (
                df.lazy()
                .with_columns([
                    pl.col("Close").rolling_mean(window_size=window).alias("Mean"),
                    pl.col("Close").rolling_std(window_size=window).alias("Std")
                ])
                .with_columns([
                    ((pl.col("Close") - pl.col("Mean")) / pl.col("Std")).alias("Z_Score")
                ])
                .drop_nulls()
            )
            # Force computation
            processed_df = q.collect()
            total_data_points += len(processed_df)
            # Filter significant events
            events_df = processed_df.filter(
                (pl.col("Z_Score") <= trigger_min) | (pl.col("Z_Score") >= trigger_max)
            )
            # Convert to Pydantic models
            for row in events_df.to_dicts():
                z = round(float(row["Z_Score"]), 3)
                dt = row["Date"]
                event_type, context = _classify_event(z, dt)
                # Append stock symbol to context for mixed baskets
                if not symbol:
                    context = f"[{current_symbol}] " + context
                all_events.append(ZScoreEvent(
                    date=dt.strftime("%Y-%m-%d"),
                    price=round(float(row["Close"]), 2),
                    z_score=z,
                    event_type=event_type,
                    context=context,
                    direction="decline" if z < 0 else "rally"
                ))
        except Exception as e:
            print(f"Warning: Z-Score calculation failed for {current_symbol}: {str(e)}")
            continue
    if not all_events:
        raise RuntimeError("No events found for any requested symbols.")
    # Keep the 150 events with the largest |Z-score| to keep LLM context size healthy
    all_events = sorted(all_events, key=lambda x: abs(x.z_score), reverse=True)[:150]
    return ZScoreResult(
        events=all_events,
        total_events=len(all_events),
        window_size=window,
        data_points=total_data_points
    )
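The rolling Z-score that identify_events() computes is z = (Close - rolling mean) / rolling standard deviation. The following is a minimal pure-Python sketch of that calculation, assuming Polars' defaults of a trailing window that includes the current row and a sample standard deviation (ddof=1). The `rolling_z_scores` function, the toy prices, and the demo thresholds of ±1.5 are hypothetical; the engine itself defaults to a 100-day window and ±2.5.

```python
from statistics import mean, stdev


def rolling_z_scores(closes: list[float], window: int) -> list[tuple[int, float]]:
    """Hypothetical helper: return (index, z) for every full trailing window."""
    scores = []
    for i in range(window - 1, len(closes)):
        win = closes[i - window + 1 : i + 1]  # trailing window ending at row i
        sd = stdev(win)                        # sample standard deviation (ddof=1)
        if sd == 0:
            continue                           # flat window: z is undefined, skip it
        scores.append((i, (closes[i] - mean(win)) / sd))
    return scores


# Toy closing prices: flat, then a jump on the last day
closes = [10.0, 10.0, 10.0, 10.0, 20.0]
scores = rolling_z_scores(closes, window=5)

# Demo thresholds (assumption); keep rows outside the band, like the engine's filter
trigger_min, trigger_max = -1.5, 1.5
events = [(i, z) for i, z in scores if z <= trigger_min or z >= trigger_max]
print(events)
```

With a window of n rows the sample Z-score cannot exceed (n - 1) / sqrt(n) in magnitude, which is why the toy example uses a lower threshold than the engine's 2.5.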