# Technical Documentation: Stats & Narrative Services

## Overview

This document details the implementation of the core analysis engine (`StatsService`) and the AI narration layer (`NarrativeService`). These services transform raw Spotify listening data into computable metrics and human-readable insights.

## 1. StatsService (`backend/app/services/stats_service.py`)

The `StatsService` is a deterministic calculation engine. It takes a time range (`period_start` to `period_end`) and aggregates `PlayHistory` records.

### Core Architecture

- **Input:** SQLAlchemy Session, Start Datetime, End Datetime.
- **Output:** A structured JSON dictionary containing discrete analysis blocks (Volume, Time, Sessions, Vibe, etc.).
- **Optimization:** Uses `joinedload` to eagerly fetch `Track` and `Artist` relations, preventing N+1 query performance issues during iteration.

### Metric Logic

#### A. Volume & Consumption

- **Top Tracks/Artists:** Aggregated by ID, not name, to handle artist renames or duplicates.
- **Concentration Metrics:**
  - **HHI (Herfindahl–Hirschman Index):** Measures concentration of plays. `SUM(share^2)`, where `share` is each item's fraction of total plays. Close to 0 = diverse, close to 1 = repetitive.
  - **Gini Coefficient:** Measures inequality of play distribution.
  - **Genre Entropy:** `-SUM(p * log(p))` for genre probabilities. Higher = more diverse genre consumption.
- **Artists:** Parsed from the `Track.artists` relationship (Many-to-Many) rather than the flat string, ensuring accurate counts for collaborations (e.g., a track by "Drake, Future" counts for both artists).

#### B. Time & Habits

- **Part of Day:** Fixed buckets:
  - Morning: 06:00 - 12:00
  - Afternoon: 12:00 - 18:00
  - Evening: 18:00 - 23:59
  - Night: 00:00 - 06:00
- **Streaks:** Calculates consecutive days with at least one play.
- **Active Days:** Count of unique dates with activity.

#### C. Session Analytics

- **Session Definition:** A sequence of plays where the gap between any two consecutive tracks is ≤ 20 minutes. A gap > 20 minutes starts a new session.
- **Energy Arcs:** Compares the `energy` feature of the first and last track in a session.
  - Rising: Delta > +0.1
  - Falling: Delta < -0.1
  - Flat: Otherwise

#### D. The "Vibe" (Audio Features)

- **Aggregation:** Calculates Mean, Standard Deviation, and Percentiles (P10, P50/Median, P90) for all Spotify audio features (Energy, Valence, Danceability, etc.).
- **Whiplash Score:** Measures the "volatility" of a listening session. Calculated as the average absolute difference in a feature (Tempo, Energy, Valence) between consecutive tracks.
  - High Whiplash (> 15-20 for BPM) = Chaotic playlist shuffling.
  - Low Whiplash = Smooth transitions.
- **Profiles:**
  - **Mood Quadrant:** (Avg Valence, Avg Energy) coordinates.
  - **Texture:** Acousticness vs. Instrumentalness.

#### E. Context & Behavior

- **Context URI:** Parsed to determine source (Playlist vs. Album vs. Artist).
- **Context Switching:** Percentage of track transitions where the `context_uri` changes. A high rate means the user is frequently jumping between playlists or albums.

#### F. Lifecycle & Discovery

- **Discovery:** Tracks played in the current period that were *never* played before `period_start`.
- **Obsession:** Tracks with ≥ 5 plays in the current period.
- **Skip Detection (Boredom Skips):**
  - Logic: `(next_start - current_start) < (current_duration - 10s)`
  - Only counts if the listening time was > 30s (to filter accidental clicks).
  - Proxy for "User got bored and hit next."

---

## 2. NarrativeService (`backend/app/services/narrative_service.py`)

The `NarrativeService` acts as an interpreter. It feeds a shaped version of the JSON from `StatsService` into Google's Gemini LLM to generate text.

### Payload Shaping

To ensure reliability and manage token costs, the service **does not** send the full, raw `StatsService` output. It pre-processes the stats:

- Truncates top lists to Top 5.
- Removes raw transition arrays.
- Simplifies nested structures.
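
The first two shaping steps above could look roughly like the sketch below. It is a minimal illustration under assumptions, not the service's actual code: the function name `shape_payload`, the `top_` key prefix, and the `transitions` key are hypothetical stand-ins for whatever the real stats payload uses. The third step (simplifying nested structures) is not covered.

```python
from typing import Any

TOP_N = 5  # matches the "Top 5" truncation described above
# Hypothetical key names for raw transition arrays; the real payload keys may differ.
RAW_ARRAY_KEYS = {"transitions"}


def shape_payload(stats: dict[str, Any]) -> dict[str, Any]:
    """Return a smaller version of the stats dict without mutating the input."""
    shaped: dict[str, Any] = {}
    for key, value in stats.items():
        if key in RAW_ARRAY_KEYS:
            continue  # drop raw transition arrays entirely
        if isinstance(value, dict):
            shaped[key] = shape_payload(value)  # recurse into nested blocks
        elif isinstance(value, list) and key.startswith("top_"):
            shaped[key] = value[:TOP_N]  # keep only the first five entries
        else:
            shaped[key] = value
    return shaped


# Made-up sample data, only to exercise the function.
example_stats = {
    "volume": {
        "total_plays": 120,
        "top_tracks": [{"name": f"Track {i}", "plays": 20 - i} for i in range(10)],
    },
    "sessions": {"count": 8, "transitions": [[0.4, 0.6], [0.6, 0.3]]},
}
shaped = shape_payload(example_stats)
print(len(shaped["volume"]["top_tracks"]))  # 5
print("transitions" in shaped["sessions"])  # False
```

Building a new dictionary instead of editing in place leaves the full stats available for storage while only the reduced version goes to the LLM.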

### LLM Prompt Engineering

The system uses a strict persona ("Witty Music Critic") and enforces specific constraints:

- **Output:** Strict JSON.
- **Safety:** Explicitly forbidden from making mental health diagnoses (e.g., no "You seem depressed").
- **Content:** Must reference specific numbers from the input stats (e.g., "Your 85% Mainstream Score...").

### Output Schema

The LLM returns a JSON object with:

- `vibe_check`: 2-3 paragraph summary.
- `patterns`: List of specific observations.
- `persona`: A creative 2-3 word label (e.g., "The Genre Chameleon").
- `roast`: A playful critique.
- `era_insight`: Commentary on the user's "Musical Age" (weighted avg release year).

## 3. Data Models (`backend/app/models.py`)

- **Track:** Stores static metadata and audio features. `raw_data` stores the full Spotify JSON for future-proofing.
- **Artist:** Normalized artist entities. Linked to tracks via `track_artists` table.
- **PlayHistory:** The timeseries ledger. Links `Track` to a timestamp and context.
- **AnalysisSnapshot:** Stores the final output of these services.
  - `metrics_payload`: The JSON output of `StatsService`.
  - `narrative_report`: The JSON output of `NarrativeService`.
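
Since the LLM is asked for strict JSON with a fixed set of fields, a caller might check the response before storing it in `narrative_report`. The sketch below is one possible way to do that and rests on assumptions: the function `parse_narrative`, the use of `ValueError`, and the expected types (`patterns` as a list, every other field as a string) are illustrative and not confirmed by the service code. Only the field names come from the Output Schema.

```python
import json

# Field names come from the Output Schema; the expected types are assumptions.
EXPECTED_FIELDS = {
    "vibe_check": str,
    "patterns": list,
    "persona": str,
    "roast": str,
    "era_insight": str,
}


def parse_narrative(raw_text: str) -> dict:
    """Parse the LLM response and check that every expected field is present."""
    try:
        report = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(report, dict):
        raise ValueError("response must be a JSON object")
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in report:
            raise ValueError(f"missing field: {field}")
        if not isinstance(report[field], expected_type):
            raise ValueError(f"field {field!r} should be {expected_type.__name__}")
    return report


# Made-up sample response, only to exercise the function.
sample = json.dumps({
    "vibe_check": "A short summary.",
    "patterns": ["Most plays happen in the evening."],
    "persona": "The Genre Chameleon",
    "roast": "A playful critique.",
    "era_insight": "Commentary on release years.",
})
report = parse_narrative(sample)
print(report["persona"])  # The Genre Chameleon
```

Failing early on a malformed response keeps incomplete reports out of `AnalysisSnapshot`; whether to retry the LLM call or store a fallback is a separate design decision.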