Add skip tracking, compressed heatmap, listening log, docs, tests, and OpenAI support

Major changes:
- Add skip tracking: poll currently-playing every 15s, detect skips (<30s listened)
- Add listening-log and sessions API endpoints
- Fix ReccoBeats client to extract spotify_id from href response
- Compress heatmap from 24 hours to 6 x 4-hour blocks
- Add OpenAI support in narrative service (use max_completion_tokens for new models)
- Add ListeningLog component with timeline and list views
- Update all frontend components to use real data (album art, play counts)
- Add docker-compose external network (dockernet) support
- Add comprehensive documentation (API, DATA_MODEL, ARCHITECTURE, FRONTEND)
- Add unit tests for ingest and API endpoints
2026-02-25 11:46:07 +00:00 · 2025-12-30 00:15:01 +04:00
parent faee830545
commit 887e78bf47
26 changed files with 1942 additions and 662 deletions
--- a/docs/API.md
+++ b/docs/API.md
@@ -0,0 +1,125 @@
+# API Documentation
+
+The MusicAnalyser Backend is built with FastAPI. It provides endpoints for data ingestion, listening history retrieval, and AI-powered analysis.
+
+## Base URL
+Default local development: `http://localhost:8000`
+Docker environment: Proxied via Nginx at `http://localhost:8991/api`
+
+---
+
+## Endpoints
+
+### 1. Root / Health Check
+- **URL**: `/`
+- **Method**: `GET`
+- **Response**:
+  ```json
+  {
+    "status": "ok",
+    "message": "Music Analyser API is running"
+  }
+  ```
+
+### 2. Get Recent History
+Returns a flat list of recently played tracks.
+- **URL**: `/history`
+- **Method**: `GET`
+- **Query Parameters**:
+  - `limit` (int, default=50): Number of items to return.
+- **Response**: List of PlayHistory objects with nested Track data.
+
+### 3. Get Tracks
+Returns a list of unique tracks in the database.
+- **URL**: `/tracks`
+- **Method**: `GET`
+- **Query Parameters**:
+  - `limit` (int, default=50): Number of tracks to return.
+
+### 4. Trigger Spotify Ingestion
+Manually triggers a background task to poll Spotify for recently played tracks.
+- **URL**: `/trigger-ingest`
+- **Method**: `POST`
+- **Response**:
+  ```json
+  {
+    "status": "Ingestion started in background"
+  }
+  ```
+
+### 5. Trigger Analysis Pipeline
+Runs the full stats calculation and AI narrative generation for a specific timeframe.
+- **URL**: `/trigger-analysis`
+- **Method**: `POST`
+- **Query Parameters**:
+  - `days` (int, default=30): Number of past days to analyze.
+  - `model_name` (str): LLM model to use.
+- **Response**:
+  ```json
+  {
+    "status": "success",
+    "snapshot_id": 1,
+    "period": { "start": "...", "end": "..." },
+    "metrics": { ... },
+    "narrative": { ... }
+  }
+  ```
+
+### 6. Get Analysis Snapshots
+Retrieves previously saved analysis reports.
+- **URL**: `/snapshots`
+- **Method**: `GET`
+- **Query Parameters**:
+  - `limit` (int, default=10): Number of snapshots to return.
+
+### 7. Detailed Listening Log
+Returns a refined listening log with skip detection and listening duration calculations.
+- **URL**: `/listening-log`
+- **Method**: `GET`
+- **Query Parameters**:
+  - `days` (int, 1-365, default=7): Timeframe.
+  - `limit` (int, 1-1000, default=200): Max plays to return.
+- **Response**:
+  ```json
+  {
+    "plays": [
+      {
+        "id": 123,
+        "track_name": "Song Name",
+        "artist": "Artist Name",
+        "played_at": "ISO-TIMESTAMP",
+        "listened_ms": 180000,
+        "skipped": false,
+        "image": "..."
+      }
+    ],
+    "period": { "start": "...", "end": "..." }
+  }
+  ```
+
+### 8. Session Statistics
+Groups plays into listening sessions (Marathon, Standard, Micro).
+- **URL**: `/sessions`
+- **Method**: `GET`
+- **Query Parameters**:
+  - `days` (int, 1-365, default=7): Timeframe.
+- **Response**:
+  ```json
+  {
+    "sessions": [
+      {
+        "start_time": "...",
+        "end_time": "...",
+        "duration_minutes": 45,
+        "track_count": 12,
+        "type": "Standard"
+      }
+    ],
+    "summary": {
+      "count": 10,
+      "avg_minutes": 35,
+      "micro_rate": 0.1,
+      "marathon_rate": 0.05
+    }
+  }
+  ```
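
Reviewer sketch for the `/sessions` endpoint documented above (not part of this commit). It shows one way plays could be grouped into sessions and summarised in the documented response shape. The gap and classification thresholds (`SESSION_GAP`, `MICRO_MAX_TRACKS`, `MARATHON_MIN_MINUTES`) and the function name `group_sessions` are assumptions, since the docs name the session types but not their rules.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds: the documentation does not state the real values.
SESSION_GAP = timedelta(minutes=20)   # a longer pause starts a new session
MICRO_MAX_TRACKS = 3                  # fewer tracks than this counts as "Micro"
MARATHON_MIN_MINUTES = 120            # at least this long counts as "Marathon"


def group_sessions(played_at):
    """Group play timestamps into sessions and build a /sessions-style payload."""
    groups, current = [], []
    for ts in sorted(played_at):
        if current and ts - current[-1] > SESSION_GAP:
            groups.append(current)
            current = []
        current.append(ts)
    if current:
        groups.append(current)

    sessions = []
    for plays in groups:
        # Duration is measured from the first to the last play timestamp,
        # so the length of the final track is ignored in this sketch.
        minutes = (plays[-1] - plays[0]).total_seconds() / 60
        if minutes >= MARATHON_MIN_MINUTES:
            kind = "Marathon"
        elif len(plays) < MICRO_MAX_TRACKS:
            kind = "Micro"
        else:
            kind = "Standard"
        sessions.append({
            "start_time": plays[0].isoformat(),
            "end_time": plays[-1].isoformat(),
            "duration_minutes": round(minutes),
            "track_count": len(plays),
            "type": kind,
        })

    count = len(sessions)
    micro = sum(1 for s in sessions if s["type"] == "Micro")
    marathon = sum(1 for s in sessions if s["type"] == "Marathon")
    total_minutes = sum(s["duration_minutes"] for s in sessions)
    summary = {
        "count": count,
        "avg_minutes": total_minutes / count if count else 0,
        "micro_rate": micro / count if count else 0.0,
        "marathon_rate": marathon / count if count else 0.0,
    }
    return {"sessions": sessions, "summary": summary}
```

Keeping the thresholds as module-level constants makes it easy to check them against whatever `StatsService` actually uses.
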
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -0,0 +1,43 @@
+# Architecture Overview
+
+MusicAnalyser is a full-stack personal analytics platform designed to collect, store, and analyze music listening habits using the Spotify API and Google Gemini AI.
+
+## System Components
+
+### 1. Backend (FastAPI)
+- **API Layer**: Handles requests from the frontend, manages the database, and triggers analysis.
+- **Database**: SQLite used for local storage of listening history, track metadata, and AI snapshots.
+- **ORM**: SQLAlchemy manages the data models and relationships.
+- **Services**:
+  - `SpotifyClient`: Handles OAuth2 flow and API requests.
+  - `StatsService`: Computes complex metrics (heatmaps, sessions, top tracks, hipster scores).
+  - `NarrativeService`: Interfaces with an LLM (Google Gemini or OpenAI) to generate text-based insights.
+  - `IngestService`: Manages the logic of fetching and deduplicating Spotify "recently played" data.
+
+### 2. Background Worker
+- A standalone Python script (`run_worker.py`) that polls the Spotify API every 60 seconds.
+- Ensures a continuous record of listening history even when the dashboard is not open.
+
+### 3. Frontend (React)
+- **Framework**: Vite + React.
+- **Styling**: Tailwind CSS for a modern, dark-themed dashboard.
+- **Visualizations**: Recharts for radar charts and heatmaps; Framer Motion for animations.
+- **State**: Managed via standard React hooks (`useState`, `useEffect`) and local storage for caching.
+
+### 4. External Integrations
+- **Spotify API**: Primary data source for tracks, artists, and listening history.
+- **ReccoBeats API**: Used for fetching audio features (BPM, Energy, Mood) for tracks.
+- **Genius API**: Used for fetching song lyrics to provide deep content analysis.
+- **Google Gemini**: Large Language Model used to "roast" the user's taste and generate personas.
+
+## Data Flow
+
+1. **Ingestion**: `Background Worker` → `Spotify API` → `Database (PlayHistory)`.
+2. **Enrichment**: `Ingest Logic` → `ReccoBeats/Genius/Spotify` → `Database (Track/Artist)`.
+3. **Analysis**: `Frontend` → `Backend API` → `StatsService` → `NarrativeService (Gemini)` → `Database (Snapshot)`.
+4. **Visualization**: `Frontend` ← `Backend API` ← `Database (Snapshot/Log)`.
+
+## Deployment
+- **Containerization**: Both Backend and Frontend are containerized using Docker.
+- **Docker Compose**: Orchestrates the backend (including worker) and frontend (Nginx proxy) services.
+- **CI/CD**: GitHub Actions builds multi-arch images (amd64/arm64) and pushes to GHCR.
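
Reviewer sketch for the deduplication step that ARCHITECTURE.md attributes to `IngestService` (not part of this commit). Because consecutive polls of Spotify's "recently played" endpoint overlap, each play needs a stable key. The sketch assumes the key is the pair of track ID and `played_at` timestamp, and uses an in-memory set (`seen`) as a stand-in for a database lookup. The function name `new_plays` is hypothetical.

```python
def new_plays(items, seen):
    """Return (track_id, played_at) keys from `items` that are not yet in `seen`.

    `items` follows the shape of Spotify's recently-played response items:
    each has a "played_at" string and a nested "track" object with an "id".
    `seen` is mutated so that later polls skip plays that were already stored.
    """
    fresh = []
    for item in items:
        key = (item["track"]["id"], item["played_at"])
        if key in seen:
            continue  # already ingested by an earlier poll
        seen.add(key)
        fresh.append(key)
    return fresh


# Two overlapping polls: the second repeats one play and adds a new one.
seen_keys = set()
poll_1 = [{"track": {"id": "t1"}, "played_at": "2025-01-01T12:00:00Z"}]
poll_2 = poll_1 + [{"track": {"id": "t2"}, "played_at": "2025-01-01T12:04:00Z"}]
first = new_plays(poll_1, seen_keys)
second = new_plays(poll_2, seen_keys)
```

In a real service the same guarantee would more likely come from a uniqueness check against `play_history` than from a process-local set.
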
--- a/docs/DATA_MODEL.md
+++ b/docs/DATA_MODEL.md
@@ -0,0 +1,89 @@
+# Data Model Documentation
+
+This document describes the database schema for the MusicAnalyser project. The project uses SQLite with SQLAlchemy as the ORM.
+
+## Entity Relationship Diagram Overview
+
+- **Artist** (Many-to-Many) **Track**
+- **Track** (One-to-Many) **PlayHistory**
+- **AnalysisSnapshot** (Independent)
+
+---
+
+## Tables
+
+### `artists`
+Stores unique artists retrieved from Spotify.
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `id` | String | Spotify ID (Primary Key) |
+| `name` | String | Artist name |
+| `genres` | JSON | List of genre strings |
+| `image_url` | String | URL to artist profile image |
+
+### `tracks`
+Stores unique tracks retrieved from Spotify, enriched with audio features and lyrics.
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `id` | String | Spotify ID (Primary Key) |
+| `name` | String | Track name |
+| `artist` | String | Display string for artists (e.g., "Artist A, Artist B") |
+| `album` | String | Album name |
+| `image_url` | String | URL to album art |
+| `duration_ms` | Integer | Track duration in milliseconds |
+| `popularity` | Integer | Spotify popularity score (0-100) |
+| `raw_data` | JSON | Full raw response from Spotify API for future-proofing |
+| `danceability` | Float | Audio feature: Danceability (0.0 to 1.0) |
+| `energy` | Float | Audio feature: Energy (0.0 to 1.0) |
+| `key` | Integer | Audio feature: Key |
+| `loudness` | Float | Audio feature: Loudness in dB |
+| `mode` | Integer | Audio feature: Mode (0 for Minor, 1 for Major) |
+| `speechiness` | Float | Audio feature: Speechiness (0.0 to 1.0) |
+| `acousticness` | Float | Audio feature: Acousticness (0.0 to 1.0) |
+| `instrumentalness` | Float | Audio feature: Instrumentalness (0.0 to 1.0) |
+| `liveness` | Float | Audio feature: Liveness (0.0 to 1.0) |
+| `valence` | Float | Audio feature: Valence (0.0 to 1.0) |
+| `tempo` | Float | Audio feature: Tempo in BPM |
+| `time_signature` | Integer | Audio feature: Time signature |
+| `lyrics` | Text | Full lyrics retrieved from Genius |
+| `lyrics_summary` | String | AI-generated summary of lyrics |
+| `genre_tags` | String | Combined genre tags for the track |
+| `created_at` | DateTime | Timestamp of record creation |
+| `updated_at` | DateTime | Timestamp of last update |
+
+### `play_history`
+Stores individual listening instances.
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `id` | Integer | Primary Key (Auto-increment) |
+| `track_id` | String | Foreign Key to `tracks.id` |
+| `played_at` | DateTime | Timestamp when the track was played |
+| `context_uri` | String | Spotify context URI (e.g., playlist or album URI) |
+| `listened_ms` | Integer | Computed duration the track was actually heard |
+| `skipped` | Boolean | Whether the track was likely skipped |
+| `source` | String | Ingestion source (e.g., "spotify_recently_played") |
+
+### `analysis_snapshots`
+Stores periodic analysis results generated by the AI service.
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `id` | Integer | Primary Key |
+| `date` | DateTime | When the analysis was performed |
+| `period_start` | DateTime | Start of the analyzed period |
+| `period_end` | DateTime | End of the analyzed period |
+| `period_label` | String | Label for the period (e.g., "last_30_days") |
+| `metrics_payload` | JSON | Computed statistics used as input for the AI |
+| `narrative_report` | JSON | AI-generated narrative and persona |
+| `model_used` | String | LLM model identifier (e.g., "gemini-1.5-flash") |
+
+### `track_artists` (Association Table)
+Facilitates the many-to-many relationship between tracks and artists.
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `track_id` | String | Foreign Key to `tracks.id` |
+| `artist_id` | String | Foreign Key to `artists.id` |
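
Reviewer sketch for the `listened_ms` and `skipped` columns of `play_history` (not part of this commit). The 30-second skip threshold comes from the commit message. Everything else is an assumption: `played_at` is treated as the start of playback, listening time is estimated as the gap to the next play capped at the track's duration, and the last play in the list is assumed to have been heard in full. The function name `annotate_plays` is hypothetical.

```python
from datetime import datetime

SKIP_THRESHOLD_MS = 30_000  # commit message: under 30 s listened counts as a skip


def annotate_plays(plays):
    """Add 'listened_ms' and 'skipped' to plays sorted by ascending 'played_at'.

    Each play is a dict with 'played_at' (datetime) and 'duration_ms' (int).
    """
    annotated = []
    for i, play in enumerate(plays):
        listened = play["duration_ms"]
        if i + 1 < len(plays):
            # Assumption: the next play's start bounds how long this one ran.
            gap = plays[i + 1]["played_at"] - play["played_at"]
            listened = min(listened, int(gap.total_seconds() * 1000))
        annotated.append({
            **play,
            "listened_ms": listened,
            "skipped": listened < SKIP_THRESHOLD_MS,
        })
    return annotated


example = annotate_plays([
    {"played_at": datetime(2025, 1, 1, 12, 0, 0), "duration_ms": 200_000},
    {"played_at": datetime(2025, 1, 1, 12, 0, 10), "duration_ms": 180_000},
])
```

Here the first track is cut off after 10 seconds and flagged as skipped, while the second keeps its full duration because nothing follows it.
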
--- a/docs/FRONTEND.md
+++ b/docs/FRONTEND.md
@@ -0,0 +1,61 @@
+# Frontend Documentation
+
+The frontend is a React application built with Vite and Tailwind CSS. It uses Ant Design for some UI components and Recharts for data visualization.
+
+## Main Components
+
+### `Dashboard.jsx`
+The primary layout component that manages data fetching and state.
+- **Features**:
+  - Handles API calls to `/snapshots` and `/trigger-analysis`.
+  - Implements local storage caching to reduce API load.
+  - Displays a global loading state during analysis.
+  - Contains the main header with a refresh trigger.
+
+### `NarrativeSection.jsx`
+Displays the AI-generated qualitative analysis.
+- **Props**:
+  - `narrative`: Object containing `persona`, `vibe_check_short`, and `roast`.
+  - `vibe`: Object containing audio features used to generate dynamic tags.
+- **Purpose**: Gives the user an "identity" based on their music taste (e.g., "THE MELANCHOLIC ARCHITECT").
+
+### `StatsGrid.jsx`
+A grid of high-level metric cards.
+- **Props**:
+  - `metrics`: The `metrics_payload` from a snapshot.
+- **Displays**:
+  - **Minutes Listened**: Total listening time converted to days.
+  - **Obsession**: The #1 most played track with album art background.
+  - **Unique Artists**: Count of different artists encountered.
+  - **Hipster Score**: A percentage indicating how obscure the user's taste is.
+
+### `VibeRadar.jsx`
+Visualizes the "Sonic DNA" of the user.
+- **Props**:
+  - `vibe`: Audio feature averages (acousticness, danceability, energy, etc.).
+- **Visuals**:
+  - **Radar Chart**: Shows the balance of audio features.
+  - **Mood Clusters**: Floating bubbles representing "Party", "Focus", and "Chill" percentages.
+  - **Whiplash Meter**: Shows volatility in tempo, energy, and valence between consecutive tracks.
+
+### `TopRotation.jsx`
+A horizontal scrolling list of the most played tracks.
+- **Props**:
+  - `volume`: Object containing `top_tracks` array.
+- **Purpose**: Quick view of recent favorites.
+
+### `HeatMap.jsx`
+Visualizes when the user listens to music.
+- **Props**:
+  - `timeHabits`: Compressed heatmap data (7x6 grid for days/time blocks).
+  - `sessions`: List of recent listening sessions.
+- **Visuals**:
+  - **Grid**: Days of the week vs. Time blocks (12am, 4am, etc.).
+  - **Session Timeline**: Vertical list of recent listening bouts with session type (Marathon vs. Micro).
+
+### `ListeningLog.jsx`
+A detailed view of individual plays.
+- **Features**:
+  - **Timeline View**: Visualizes listening sessions across the day for the last 7 days.
+  - **List View**: A table of individual plays with skip status detection.
+  - **Timeframe Filter**: Toggle between 24h, 7d, 14d, and 30d views.
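
Reviewer sketch for the compressed heatmap that `HeatMap.jsx` consumes (not part of this commit). The commit message describes going from 24 hourly columns to 6 blocks of 4 hours, giving the 7x6 grid mentioned above. The sketch assumes the input is a 7x24 grid of play counts and that each block is the sum of its four hours. The function name `compress_heatmap` is hypothetical, and it is written in Python on the assumption that the compression happens in the backend.

```python
HOURS_PER_BLOCK = 4  # 24 hours / 6 blocks


def compress_heatmap(hourly):
    """Collapse a 7x24 grid of play counts into a 7x6 grid of 4-hour blocks."""
    if len(hourly) != 7 or any(len(row) != 24 for row in hourly):
        raise ValueError("expected 7 rows (days) of 24 hourly counts")
    return [
        [sum(row[start:start + HOURS_PER_BLOCK]) for start in range(0, 24, HOURS_PER_BLOCK)]
        for row in hourly
    ]


# One play in every hour of every day gives four plays per block.
uniform = [[1] * 24 for _ in range(7)]
blocks = compress_heatmap(uniform)
```

Block 0 then covers 12am to 4am, block 1 covers 4am to 8am, and so on, matching the time-block labels listed for the grid.
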