From 3a424d15a57e5822c74dd0f2fdafcea7856ea3ec Mon Sep 17 00:00:00 2001
From: bnair123 <47283308+bnair123@users.noreply.github.com>
Date: Wed, 24 Dec 2025 22:03:18 +0400
Subject: [PATCH] Add project context and documentation for Music Analyser

This document outlines the vision, technical decisions, current architecture, and future roadmap for the Music Analyser project. It serves as a guide for future AI agents or developers.
---
 Context.md | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 Context.md

diff --git a/Context.md b/Context.md
new file mode 100644
index 0000000..1004a07
--- /dev/null
+++ b/Context.md
@@ -0,0 +1,114 @@
+# Music Analyser - Project Context & Documentation
+
+This document serves as a comprehensive guide to the **Music Analyser** project. It outlines the vision, technical decisions, current architecture, and future roadmap. **Use this document to provide context to future AI agents or developers.**
+
+## 1. Project Vision
+The goal of this project is to build a personal analytics dashboard that:
+1. **Regularly queries** the Spotify API (24/7) to collect a complete history of listening habits.
+2. Stores this data locally (or in a private database) to ensure ownership and completeness.
+3. Provides **rich analysis** and visualizations (similar to "Spotify Wrapped" but on-demand and more detailed).
+4. Integrates **AI (Google Gemini)** to provide qualitative insights, summaries, and trend analysis (e.g., "You started the week with high-energy pop but shifted to lo-fi study beats by Friday").
+
+## 2. Roadmap & Phases
+
+### Phase 1: Foundation & Data Collection (Current Status: ✅ COMPLETED)
+- **Goal:** reliable data ingestion and storage.
+- **Deliverables:**
+    - FastAPI Backend.
+    - SQLite Database (with SQLAlchemy).
+    - Spotify OAuth logic (Refresh Token flow).
+    - Background Worker for 24/7 polling.
+    - Docker containerization + GitHub Actions (Multi-arch build).
+
+### Phase 2: Visualization (Next Step)
+- **Goal:** View the raw data.
+- **Deliverables:**
+    - Frontend (React + Vite).
+    - Basic Data Table / List View of listening history.
+    - Basic filtering (by date, artist).
+
+### Phase 3: Analysis & AI
+- **Goal:** Deep insights.
+- **Deliverables:**
+    - Advanced charts/graphs.
+    - AI Integration (Gemini 2.5/3 Flash) to generate text summaries of listening trends.
+    - Email reports (optional).
+
+## 3. Technical Architecture
+
+### Backend
+- **Language:** Python 3.11+
+- **Framework:** FastAPI (High performance, easy to use).
+- **Dependencies:** `httpx` (Async HTTP), `sqlalchemy` (ORM), `pydantic` (Validation).
+
+### Database
+- **Current:** SQLite (`music.db`).
+    - *Decision:* Chosen for simplicity in Phase 1.
+- **Future path:** The code uses SQLAlchemy, so migrating to **PostgreSQL** (e.g., Supabase) only requires changing the connection string in `database.py`.
+
+### Database Schema
+1. **`Track` Table:**
+    - Stores unique tracks.
+    - Columns: `id` (Spotify ID), `name`, `artist`, `album`, `duration_ms`, `metadata_json` (Stores the *entire* raw Spotify JSON response for future-proofing).
+2. **`PlayHistory` Table:**
+    - Stores the instances of listening.
+    - Columns: `id`, `track_id` (FK), `played_at` (Timestamp), `context_uri`.
+
+### Authentication Strategy
+- **Challenge:** The background worker runs headless (no user to click "Login").
+- **Solution:** We use the **Authorization Code Flow with Refresh Tokens**.
+    1. User runs the local helper script (`backend/scripts/get_refresh_token.py`) once.
+    2. This generates a long-lived `SPOTIFY_REFRESH_TOKEN`.
+    3. The backend uses this token to automatically request new short-lived Access Tokens whenever needed.
+
+### Background Worker Logic
+- **File:** `backend/run_worker.py` -> `backend/app/ingest.py`
+- **Process:**
+    1. Worker wakes up every 60 seconds.
+    2. Calls Spotify `recently-played` endpoint (limit 50).
+    3. Iterates through tracks.
+    4. **Deduplication:** Checks `(track_id, played_at)` against the DB. If it exists, skip. If not, insert.
+    5. **Metadata:** If the track is new to the system, it saves the full metadata immediately.
+
+### AI Integration
+- **Model:** Google Gemini (Target: 2.5 Flash or 3 Flash).
+- **Status:** Service class exists (`AIService`) but is not yet fully wired into the daily workflow.
+
+### Deployment
+- **Docker:** Multi-stage build (python-slim).
+- **CI/CD:** GitHub Actions workflow (`docker-publish.yml`).
+    - Builds for `linux/amd64` and `linux/arm64`.
+    - Pushes to GitHub Container Registry (ghcr.io).
+
+## 4. How to Run
+
+### Prerequisites
+- Spotify Client ID & Secret.
+- Google Gemini API Key.
+- Docker (optional).
+
+### Local Development
+1. **Setup Env:**
+    ```bash
+    cp backend/.env.example backend/.env
+    # Fill in details
+    ```
+2. **Install:**
+    ```bash
+    cd backend
+    pip install -r requirements.txt
+    ```
+3. **Run API:**
+    ```bash
+    uvicorn app.main:app --reload
+    ```
+4. **Run Worker:**
+    ```bash
+    python run_worker.py
+    ```
+
+### Docker
+```bash
+docker build -t music-analyser-backend ./backend
+docker run --env-file backend/.env music-analyser-backend
+```
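
The deduplication and metadata steps listed under Background Worker Logic can be sketched as follows. This is a minimal illustration, not the code in `backend/app/ingest.py`: it uses the standard-library `sqlite3` module instead of the project's SQLAlchemy models so that it runs on its own, and the table definitions, the `UNIQUE (track_id, played_at)` constraint, and the function name `ingest_recent_plays` are assumptions made for the example. The input is assumed to be the `items` list of a `recently-played` response, reduced to the fields the schema needs.

```python
import json
import sqlite3

# Hypothetical schema mirroring the Track / PlayHistory columns described in Context.md.
SCHEMA = """
CREATE TABLE IF NOT EXISTS track (
    id TEXT PRIMARY KEY,
    name TEXT,
    artist TEXT,
    album TEXT,
    duration_ms INTEGER,
    metadata_json TEXT
);
CREATE TABLE IF NOT EXISTS play_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    track_id TEXT NOT NULL REFERENCES track(id),
    played_at TEXT NOT NULL,
    context_uri TEXT,
    UNIQUE (track_id, played_at)
);
"""


def ingest_recent_plays(conn, items):
    """Insert unseen plays from a recently-played style payload.

    Returns the number of play_history rows that were added.
    """
    inserted = 0
    for item in items:
        track = item["track"]
        # Metadata step: store the full track JSON only if the track is new.
        conn.execute(
            "INSERT OR IGNORE INTO track VALUES (?, ?, ?, ?, ?, ?)",
            (
                track["id"],
                track["name"],
                ", ".join(artist["name"] for artist in track["artists"]),
                track["album"]["name"],
                track["duration_ms"],
                json.dumps(track),
            ),
        )
        # Deduplication step: skip the play if (track_id, played_at) already exists.
        exists = conn.execute(
            "SELECT 1 FROM play_history WHERE track_id = ? AND played_at = ?",
            (track["id"], item["played_at"]),
        ).fetchone()
        if exists:
            continue
        # The playback context can be null, so fall back to an empty dict.
        context = item.get("context") or {}
        conn.execute(
            "INSERT INTO play_history (track_id, played_at, context_uri) "
            "VALUES (?, ?, ?)",
            (track["id"], item["played_at"], context.get("uri")),
        )
        inserted += 1
    conn.commit()
    return inserted
```

To try it, open a connection with `sqlite3.connect(":memory:")`, run `conn.executescript(SCHEMA)`, and pass the same list of items twice: the second call should add nothing. Because the worker polls every 60 seconds with a limit of 50, consecutive responses overlap heavily, so most items on each pass are expected to take the skip branch.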