Add skip tracking, compressed heatmap, listening log, docs, tests, and OpenAI support

Major changes:
- Add skip tracking: poll currently-playing every 15s, detect skips (<30s listened)
- Add listening-log and sessions API endpoints
- Fix ReccoBeats client to extract spotify_id from href response
- Compress heatmap from 24 hours to 6 x 4-hour blocks
- Add OpenAI support in narrative service (use max_completion_tokens for new models)
- Add ListeningLog component with timeline and list views
- Update all frontend components to use real data (album art, play counts)
- Add docker-compose external network (dockernet) support
- Add comprehensive documentation (API, DATA_MODEL, ARCHITECTURE, FRONTEND)
- Add unit tests for ingest and API endpoints
bnair123
2025-12-30 00:15:01 +04:00
parent faee830545
commit 887e78bf47
26 changed files with 1942 additions and 662 deletions

docs/API.md (new file, 125 lines)

@@ -0,0 +1,125 @@
# API Documentation
The MusicAnalyser Backend is built with FastAPI. It provides endpoints for data ingestion, listening history retrieval, and AI-powered analysis.
## Base URL
Default local development: `http://localhost:8000`
Docker environment: Proxied via Nginx at `http://localhost:8991/api`
---
## Endpoints
### 1. Root / Health Check
- **URL**: `/`
- **Method**: `GET`
- **Response**:
```json
{
  "status": "ok",
  "message": "Music Analyser API is running"
}
```
### 2. Get Recent History
Returns a flat list of recently played tracks.
- **URL**: `/history`
- **Method**: `GET`
- **Query Parameters**:
  - `limit` (int, default=50): Number of items to return.
- **Response**: List of PlayHistory objects with nested Track data.
### 3. Get Tracks
Returns a list of unique tracks in the database.
- **URL**: `/tracks`
- **Method**: `GET`
- **Query Parameters**:
  - `limit` (int, default=50): Number of tracks to return.
### 4. Trigger Spotify Ingestion
Manually triggers a background task to poll Spotify for recently played tracks.
- **URL**: `/trigger-ingest`
- **Method**: `POST`
- **Response**:
```json
{
  "status": "Ingestion started in background"
}
```
### 5. Trigger Analysis Pipeline
Runs the full stats calculation and AI narrative generation for a specific timeframe.
- **URL**: `/trigger-analysis`
- **Method**: `POST`
- **Query Parameters**:
  - `days` (int, default=30): Number of past days to analyze.
  - `model_name` (str): LLM model to use.
- **Response**:
```json
{
  "status": "success",
  "snapshot_id": 1,
  "period": { "start": "...", "end": "..." },
  "metrics": { ... },
  "narrative": { ... }
}
```
### 6. Get Analysis Snapshots
Retrieves previously saved analysis reports.
- **URL**: `/snapshots`
- **Method**: `GET`
- **Query Parameters**:
  - `limit` (int, default=10): Number of snapshots to return.
### 7. Detailed Listening Log
Returns a refined listening log with skip detection and listening duration calculations.
- **URL**: `/listening-log`
- **Method**: `GET`
- **Query Parameters**:
  - `days` (int, 1-365, default=7): Number of past days to include.
  - `limit` (int, 1-1000, default=200): Maximum number of plays to return.
- **Response**:
```json
{
  "plays": [
    {
      "id": 123,
      "track_name": "Song Name",
      "artist": "Artist Name",
      "played_at": "ISO-TIMESTAMP",
      "listened_ms": 180000,
      "skipped": false,
      "image": "..."
    }
  ],
  "period": { "start": "...", "end": "..." }
}
```
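A minimal client-side sketch of how the documented ranges for `days` and `limit` could be checked before calling this endpoint. The helper name `listening_log_url` is hypothetical, and the base URL is the local development default listed above; the backend's own validation is not shown in this document.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # local development default from "Base URL"


def listening_log_url(days: int = 7, limit: int = 200, base_url: str = BASE_URL) -> str:
    """Build a /listening-log URL, rejecting values outside the documented ranges."""
    if not 1 <= days <= 365:
        raise ValueError("days must be between 1 and 365")
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    query = urlencode({"days": days, "limit": limit})
    return f"{base_url}/listening-log?{query}"


print(listening_log_url(days=30, limit=500))
```

In the Docker setup the same helper could be called with `base_url="http://localhost:8991/api"`.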
### 8. Session Statistics
Groups plays into listening sessions (Marathon, Standard, Micro).
- **URL**: `/sessions`
- **Method**: `GET`
- **Query Parameters**:
  - `days` (int, 1-365, default=7): Number of past days to include.
- **Response**:
```json
{
  "sessions": [
    {
      "start_time": "...",
      "end_time": "...",
      "duration_minutes": 45,
      "track_count": 12,
      "type": "Standard"
    }
  ],
  "summary": {
    "count": 10,
    "avg_minutes": 35,
    "micro_rate": 0.1,
    "marathon_rate": 0.05
  }
}
```
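One way the session grouping could work is sketched below. The gap and duration thresholds are assumptions chosen for illustration, as is treating `played_at` as the start of a play; the actual rules live in the backend and are not specified in this document.

```python
from datetime import datetime, timedelta

# Assumed thresholds for illustration only (not taken from the project).
SESSION_GAP = timedelta(minutes=20)  # a longer pause starts a new session
MICRO_MAX_MINUTES = 10               # shorter sessions count as "Micro"
MARATHON_MIN_MINUTES = 120           # longer sessions count as "Marathon"


def group_sessions(plays):
    """plays: list of (played_at: datetime, listened_ms: int) in any order."""
    sessions = []
    for played_at, listened_ms in sorted(plays):
        end = played_at + timedelta(milliseconds=listened_ms)
        if sessions and played_at - sessions[-1]["end_time"] <= SESSION_GAP:
            current = sessions[-1]
            current["end_time"] = max(current["end_time"], end)
            current["track_count"] += 1
        else:
            sessions.append({"start_time": played_at, "end_time": end, "track_count": 1})
    for s in sessions:
        minutes = (s["end_time"] - s["start_time"]).total_seconds() / 60
        s["duration_minutes"] = round(minutes)
        if minutes < MICRO_MAX_MINUTES:
            s["type"] = "Micro"
        elif minutes >= MARATHON_MIN_MINUTES:
            s["type"] = "Marathon"
        else:
            s["type"] = "Standard"
    return sessions


def summarize(sessions):
    """Build the "summary" object from a list of classified sessions."""
    count = len(sessions)
    if count == 0:
        return {"count": 0, "avg_minutes": 0, "micro_rate": 0.0, "marathon_rate": 0.0}
    return {
        "count": count,
        "avg_minutes": sum(s["duration_minutes"] for s in sessions) / count,
        "micro_rate": sum(s["type"] == "Micro" for s in sessions) / count,
        "marathon_rate": sum(s["type"] == "Marathon" for s in sessions) / count,
    }
```

The rates are returned as fractions of all sessions, which matches the shape of the example response (`0.1`, `0.05`).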

docs/ARCHITECTURE.md (new file, 43 lines)

@@ -0,0 +1,43 @@
# Architecture Overview
MusicAnalyser is a full-stack personal analytics platform designed to collect, store, and analyze music listening habits using the Spotify API and Google Gemini AI.
## System Components
### 1. Backend (FastAPI)
- **API Layer**: Handles requests from the frontend, manages the database, and triggers analysis.
- **Database**: SQLite used for local storage of listening history, track metadata, and AI snapshots.
- **ORM**: SQLAlchemy manages the data models and relationships.
- **Services**:
  - `SpotifyClient`: Handles OAuth2 flow and API requests.
  - `StatsService`: Computes complex metrics (heatmaps, sessions, top tracks, hipster scores).
  - `NarrativeService`: Interfaces with Google Gemini to generate text-based insights.
  - `IngestService`: Manages the logic of fetching and deduplicating Spotify "recently played" data.
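The deduplication step in `IngestService` could look roughly like the sketch below. It assumes a play is uniquely identified by the pair (`track_id`, `played_at`); the function name `new_plays` and the dict shape are hypothetical, not the service's actual interface.

```python
def new_plays(fetched, existing_keys):
    """Return the items in `fetched` whose (track_id, played_at) pair is not stored yet.

    fetched: list of dicts with "track_id" and "played_at" keys (assumed shape).
    existing_keys: iterable of (track_id, played_at) pairs already in the database.
    """
    seen = set(existing_keys)
    fresh = []
    for item in fetched:
        key = (item["track_id"], item["played_at"])
        if key not in seen:
            seen.add(key)  # also guards against duplicates within one fetch
            fresh.append(item)
    return fresh
```

Because repeated polls of "recently played" return overlapping windows, a check like this keeps repeated ingestion idempotent.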
### 2. Background Worker
- A standalone Python script (`run_worker.py`) that polls the Spotify API every 60 seconds.
- Ensures a continuous record of listening history even when the dashboard is not open.
### 3. Frontend (React)
- **Framework**: Vite + React.
- **Styling**: Tailwind CSS for a modern, dark-themed dashboard.
- **Visualizations**: Recharts for radar charts and heatmaps; Framer Motion for animations.
- **State**: Managed via standard React hooks (`useState`, `useEffect`) and local storage for caching.
### 4. External Integrations
- **Spotify API**: Primary data source for tracks, artists, and listening history.
- **ReccoBeats API**: Used for fetching audio features (BPM, Energy, Mood) for tracks.
- **Genius API**: Used for fetching song lyrics to provide deep content analysis.
- **Google Gemini**: Large Language Model used to "roast" the user's taste and generate personas.
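The commit message notes that the ReccoBeats client now extracts `spotify_id` from the `href` in the response. A minimal sketch of that extraction follows, assuming the `href` is a Spotify track URL of the form `https://open.spotify.com/track/<id>`; the exact response shape is not shown in this document, and the function name is hypothetical.

```python
from urllib.parse import urlparse


def spotify_id_from_href(href):
    """Return the track ID from a Spotify track URL, or None if it does not match."""
    if not href:
        return None
    # Split the URL path into non-empty segments, e.g. ["track", "<id>"].
    parts = [p for p in urlparse(href).path.split("/") if p]
    if len(parts) >= 2 and parts[-2] == "track":
        return parts[-1]
    return None


# "abc123" is a placeholder, not a real Spotify ID.
print(spotify_id_from_href("https://open.spotify.com/track/abc123?si=xyz"))
```

Parsing the path rather than splitting the raw string keeps query parameters such as `?si=...` out of the extracted ID.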
## Data Flow
1. **Ingestion**: `Background Worker` → `Spotify API` → `Database (PlayHistory)`.
2. **Enrichment**: `Ingest Logic` → `ReccoBeats/Genius/Spotify` → `Database (Track/Artist)`.
3. **Analysis**: `Frontend` → `Backend API` → `StatsService` → `NarrativeService (Gemini)` → `Database (Snapshot)`.
4. **Visualization**: `Frontend` → `Backend API` → `Database (Snapshot/Log)`.
## Deployment
- **Containerization**: Both Backend and Frontend are containerized using Docker.
- **Docker Compose**: Orchestrates the backend (including worker) and frontend (Nginx proxy) services.
- **CI/CD**: GitHub Actions builds multi-arch images (amd64/arm64) and pushes to GHCR.

docs/DATA_MODEL.md (new file, 89 lines)

@@ -0,0 +1,89 @@
# Data Model Documentation
This document describes the database schema for the MusicAnalyser project. The project uses SQLite with SQLAlchemy as the ORM.
## Entity Relationship Diagram Overview
- **Artist** (Many-to-Many) **Track**
- **Track** (One-to-Many) **PlayHistory**
- **AnalysisSnapshot** (Independent)
---
## Tables
### `artists`
Stores unique artists retrieved from Spotify.
| Field | Type | Description |
|-------|------|-------------|
| `id` | String | Spotify ID (Primary Key) |
| `name` | String | Artist name |
| `genres` | JSON | List of genre strings |
| `image_url` | String | URL to artist profile image |
### `tracks`
Stores unique tracks retrieved from Spotify, enriched with audio features and lyrics.
| Field | Type | Description |
|-------|------|-------------|
| `id` | String | Spotify ID (Primary Key) |
| `name` | String | Track name |
| `artist` | String | Display string for artists (e.g., "Artist A, Artist B") |
| `album` | String | Album name |
| `image_url` | String | URL to album art |
| `duration_ms` | Integer | Track duration in milliseconds |
| `popularity` | Integer | Spotify popularity score (0-100) |
| `raw_data` | JSON | Full raw response from Spotify API for future-proofing |
| `danceability` | Float | Audio feature: Danceability (0.0 to 1.0) |
| `energy` | Float | Audio feature: Energy (0.0 to 1.0) |
| `key` | Integer | Audio feature: Key |
| `loudness` | Float | Audio feature: Loudness in dB |
| `mode` | Integer | Audio feature: Mode (0 for Minor, 1 for Major) |
| `speechiness` | Float | Audio feature: Speechiness (0.0 to 1.0) |
| `acousticness` | Float | Audio feature: Acousticness (0.0 to 1.0) |
| `instrumentalness` | Float | Audio feature: Instrumentalness (0.0 to 1.0) |
| `liveness` | Float | Audio feature: Liveness (0.0 to 1.0) |
| `valence` | Float | Audio feature: Valence (0.0 to 1.0) |
| `tempo` | Float | Audio feature: Tempo in BPM |
| `time_signature` | Integer | Audio feature: Time signature |
| `lyrics` | Text | Full lyrics retrieved from Genius |
| `lyrics_summary` | String | AI-generated summary of lyrics |
| `genre_tags` | String | Combined genre tags for the track |
| `created_at` | DateTime | Timestamp of record creation |
| `updated_at` | DateTime | Timestamp of last update |
### `play_history`
Stores individual listening instances.
| Field | Type | Description |
|-------|------|-------------|
| `id` | Integer | Primary Key (Auto-increment) |
| `track_id` | String | Foreign Key to `tracks.id` |
| `played_at` | DateTime | Timestamp when the track was played |
| `context_uri` | String | Spotify context URI (e.g., playlist or album URI) |
| `listened_ms` | Integer | Computed duration the track was actually heard |
| `skipped` | Boolean | Whether the track was likely skipped |
| `source` | String | Ingestion source (e.g., "spotify_recently_played") |
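The `listened_ms` and `skipped` fields could be derived from the currently-playing polls mentioned in the commit message (every 15s, with under 30s listened counting as a skip). The sketch below is one possible approach, assuming each poll yields a `(track_id, progress_ms)` sample and that the highest observed progress approximates listening time; the function name and sample shape are hypothetical.

```python
SKIP_THRESHOLD_MS = 30_000  # "<30s listened" from the commit message


def summarize_polls(polls):
    """Collapse chronological (track_id, progress_ms) samples into plays.

    Each run of consecutive samples for the same track becomes one play whose
    listened_ms is the highest progress observed (an approximation).
    """
    plays = []
    for track_id, progress_ms in polls:
        if plays and plays[-1]["track_id"] == track_id:
            plays[-1]["listened_ms"] = max(plays[-1]["listened_ms"], progress_ms)
        else:
            plays.append({"track_id": track_id, "listened_ms": progress_ms})
    for play in plays:
        play["skipped"] = play["listened_ms"] < SKIP_THRESHOLD_MS
    return plays
```

This simplification has known gaps: the last play may still be in progress when it is evaluated, seeking inflates the progress value, and a track played twice in a row is merged into one play.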
### `analysis_snapshots`
Stores periodic analysis results generated by the AI service.
| Field | Type | Description |
|-------|------|-------------|
| `id` | Integer | Primary Key |
| `date` | DateTime | When the analysis was performed |
| `period_start` | DateTime | Start of the analyzed period |
| `period_end` | DateTime | End of the analyzed period |
| `period_label` | String | Label for the period (e.g., "last_30_days") |
| `metrics_payload` | JSON | Computed statistics used as input for the AI |
| `narrative_report` | JSON | AI-generated narrative and persona |
| `model_used` | String | LLM model identifier (e.g., "gemini-1.5-flash") |
### `track_artists` (Association Table)
Facilitates the many-to-many relationship between tracks and artists.
| Field | Type | Description |
|-------|------|-------------|
| `track_id` | String | Foreign Key to `tracks.id` |
| `artist_id` | String | Foreign Key to `artists.id` |
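To illustrate how the association table resolves the many-to-many relationship, here is a small sketch using Python's built-in `sqlite3` module instead of the project's SQLAlchemy models. Only a subset of columns is included, and the composite primary key on `track_artists` is an assumption, not something stated in the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artists (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE tracks (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE track_artists (
    track_id TEXT REFERENCES tracks(id),
    artist_id TEXT REFERENCES artists(id),
    PRIMARY KEY (track_id, artist_id)  -- assumed, prevents duplicate links
);
""")
# Placeholder rows for illustration.
conn.executemany("INSERT INTO artists VALUES (?, ?)", [("a1", "Artist A"), ("a2", "Artist B")])
conn.execute("INSERT INTO tracks VALUES (?, ?)", ("t1", "Song Name"))
conn.executemany("INSERT INTO track_artists VALUES (?, ?)", [("t1", "a1"), ("t1", "a2")])


def artists_for_track(track_id):
    """Return the names of all artists linked to a track, sorted by name."""
    rows = conn.execute(
        "SELECT a.name FROM artists a "
        "JOIN track_artists ta ON ta.artist_id = a.id "
        "WHERE ta.track_id = ? ORDER BY a.name",
        (track_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(", ".join(artists_for_track("t1")))
```

A join like this is also how a display string such as the `artist` column in `tracks` ("Artist A, Artist B") could be rebuilt from the normalized tables.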

docs/FRONTEND.md (new file, 61 lines)

@@ -0,0 +1,61 @@
# Frontend Documentation
The frontend is a React application built with Vite and Tailwind CSS. It uses Ant Design for some UI components and Recharts for data visualization.
## Main Components
### `Dashboard.jsx`
The primary layout component that manages data fetching and state.
- **Features**:
  - Handles API calls to `/snapshots` and `/trigger-analysis`.
  - Implements local storage caching to reduce API load.
  - Displays a global loading state during analysis.
  - Contains the main header with a refresh trigger.
### `NarrativeSection.jsx`
Displays the AI-generated qualitative analysis.
- **Props**:
  - `narrative`: Object containing `persona`, `vibe_check_short`, and `roast`.
  - `vibe`: Object containing audio features used to generate dynamic tags.
- **Purpose**: Gives the user an "identity" based on their music taste (e.g., "THE MELANCHOLIC ARCHITECT").
### `StatsGrid.jsx`
A grid of high-level metric cards.
- **Props**:
  - `metrics`: The `metrics_payload` from a snapshot.
- **Displays**:
  - **Minutes Listened**: Total listening time converted to days.
  - **Obsession**: The #1 most played track with album art background.
  - **Unique Artists**: Count of different artists encountered.
  - **Hipster Score**: A percentage indicating how obscure the user's taste is.
### `VibeRadar.jsx`
Visualizes the "Sonic DNA" of the user.
- **Props**:
  - `vibe`: Audio feature averages (acousticness, danceability, energy, etc.).
- **Visuals**:
  - **Radar Chart**: Shows the balance of audio features.
  - **Mood Clusters**: Floating bubbles representing "Party", "Focus", and "Chill" percentages.
  - **Whiplash Meter**: Shows volatility in tempo, energy, and valence between consecutive tracks.
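The volatility behind the Whiplash Meter is computed on the backend and its formula is not given here. One plausible reading, shown as a sketch under that assumption, is the mean absolute change in a feature between consecutive tracks; the function name `whiplash` is hypothetical.

```python
def whiplash(values):
    """Mean absolute change between consecutive values; 0.0 for fewer than two."""
    if len(values) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return sum(diffs) / len(diffs)


# Tempo (BPM) of three consecutive tracks: jumps of 40 and 20 average to 30.
print(whiplash([100, 140, 120]))
```

The same function could be applied separately to tempo, energy, and valence, giving one value per feature.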
### `TopRotation.jsx`
A horizontal scrolling list of the most played tracks.
- **Props**:
  - `volume`: Object containing `top_tracks` array.
- **Purpose**: Quick view of recent favorites.
### `HeatMap.jsx`
Visualizes when the user listens to music.
- **Props**:
  - `timeHabits`: Compressed heatmap data (7x6 grid for days/time blocks).
  - `sessions`: List of recent listening sessions.
- **Visuals**:
  - **Grid**: Days of the week vs. Time blocks (12am, 4am, etc.).
  - **Session Timeline**: Vertical list of recent listening bouts with session type (Marathon vs. Micro).
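The 7x6 grid comes from compressing 24 hourly buckets into six 4-hour blocks, as described in the commit message. A minimal sketch of that compression follows, assuming the blocks are formed by summing play counts (averaging would be an equally valid choice); the function name is hypothetical.

```python
BLOCK_HOURS = 4  # 24 hours / 6 blocks


def compress_heatmap(hourly):
    """Compress 7 rows (days) x 24 columns (hours) of counts into 7 x 6 blocks."""
    compressed = []
    for day in hourly:
        if len(day) != 24:
            raise ValueError("each day needs 24 hourly values")
        # Sum each run of 4 consecutive hours: 0-3, 4-7, ..., 20-23.
        compressed.append([sum(day[h:h + BLOCK_HOURS]) for h in range(0, 24, BLOCK_HOURS)])
    return compressed
```

With this layout, column 0 covers 12am to 4am, column 1 covers 4am to 8am, and so on, matching the time-block labels in the grid.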
### `ListeningLog.jsx`
A detailed view of individual plays.
- **Features**:
  - **Timeline View**: Visualizes listening sessions across the day for the last 7 days.
  - **List View**: A table of individual plays with skip status detection.
  - **Timeframe Filter**: Toggle between 24h, 7d, 14d, and 30d views.