mirror of
https://github.com/bnair123/MusicAnalyser.git
synced 2026-02-25 11:46:07 +00:00
Refactor Stats and Narrative services to match spec
- StatsService: Fixed N+1 queries, added missing metrics (whiplash, entropy, lifecycle), and improved correctness (boundary checks, null handling).
- NarrativeService: Added payload shaping for token efficiency, improved JSON robustness, and updated prompts to align with persona specs.
- Documentation: Added backend/TECHNICAL_DOCS.md detailing the logic.
95
backend/TECHNICAL_DOCS.md
Normal file
@@ -0,0 +1,95 @@
# Technical Documentation: Stats & Narrative Services

## Overview

This document details the implementation of the core analysis engine (`StatsService`) and the AI narration layer (`NarrativeService`). These services transform raw Spotify listening data into computable metrics and human-readable insights.

## 1. StatsService (`backend/app/services/stats_service.py`)

The `StatsService` is a deterministic calculation engine. It takes a time range (`period_start` to `period_end`) and aggregates `PlayHistory` records.

### Core Architecture

- **Input:** SQLAlchemy Session, Start Datetime, End Datetime.
- **Output:** A structured JSON dictionary containing discrete analysis blocks (Volume, Time, Sessions, Vibe, etc.).
- **Optimization:** Uses `joinedload` to eagerly fetch `Track` and `Artist` relations, preventing N+1 query performance issues during iteration.

### Metric Logic

#### A. Volume & Consumption

- **Top Tracks/Artists:** Aggregated by ID, not name, to handle artist renames or duplicates.
- **Concentration Metrics:**
  - **HHI (Herfindahl–Hirschman Index):** Measures diversity. `SUM(share^2)`. Close to 0 = diverse, close to 1 = repetitive.
  - **Gini Coefficient:** Measures inequality of play distribution.
  - **Genre Entropy:** `-SUM(p * log(p))` for genre probabilities. Higher = more diverse genre consumption.
- **Artists:** Parsed from the `Track.artists` relationship (Many-to-Many) rather than the flat string, ensuring accurate counts for collaborations (e.g., "Drake, Future" counts for both).
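The concentration formulas above can be sketched as follows. The helper name `concentration_metrics` and its `counts` argument are illustrative, not the service's actual API; note the service applies HHI and Gini to track play counts and entropy to genre counts.

```python
import math

def concentration_metrics(counts: dict) -> dict:
    """Sketch of the HHI / Gini / entropy formulas described above."""
    total = sum(counts.values())
    shares = [c / total for c in counts.values()]

    # HHI: sum of squared shares; near 0 = diverse, near 1 = repetitive.
    hhi = sum(s ** 2 for s in shares)

    # Gini: sorted-share formula for inequality (0 = perfectly even).
    sorted_shares = sorted(shares)
    n = len(sorted_shares)
    gini = (2 * sum((i + 1) * s for i, s in enumerate(sorted_shares))) / (n * sum(sorted_shares)) - (n + 1) / n

    # Shannon entropy: -SUM(p * log(p)), natural log.
    entropy = -sum(p * math.log(p) for p in shares if p > 0)
    return {"hhi": round(hhi, 4), "gini": round(gini, 4), "entropy": round(entropy, 2)}
```

For a perfectly uniform distribution the Gini is 0 and the HHI equals `1/n`, matching the "diverse" end of both scales.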

#### B. Time & Habits

- **Part of Day:** Fixed buckets:
  - Morning: 06:00 - 12:00
  - Afternoon: 12:00 - 18:00
  - Evening: 18:00 - 23:59
  - Night: 00:00 - 06:00
- **Streaks:** Calculates consecutive days with at least one play.
- **Active Days:** Count of unique dates with activity.
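A minimal sketch of the bucket and streak logic; helper names are illustrative, and the evening range 18:00-23:59 is expressed as a half-open bound on whole hours.

```python
from datetime import date

def part_of_day(hour: int) -> str:
    # Fixed buckets from the spec above.
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 24:
        return "evening"
    return "night"  # 00:00 - 06:00

def longest_streak(active_dates: set) -> int:
    # Longest run of strictly consecutive calendar days with >= 1 play.
    days = sorted(active_dates)
    if not days:
        return 0
    best = run = 1
    for prev, cur in zip(days, days[1:]):
        run = run + 1 if (cur - prev).days == 1 else 1
        best = max(best, run)
    return best
```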

#### C. Session Analytics

- **Session Definition:** A sequence of plays where the gap between any two consecutive tracks is ≤ 20 minutes. A gap > 20 minutes starts a new session.
- **Energy Arcs:** Compares the `energy` feature of the first and last track in a session.
  - Rising: Delta > +0.1
  - Falling: Delta < -0.1
  - Flat: Otherwise
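The session split and arc classification can be sketched like this; the function names and the plain-timestamp input are illustrative, since the service operates on `PlayHistory` rows.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=20)

def split_sessions(timestamps: list) -> list:
    # Group chronologically sorted play timestamps: a gap > 20 minutes
    # between consecutive plays starts a new session.
    sessions = []
    for ts in timestamps:
        if sessions and ts - sessions[-1][-1] <= SESSION_GAP:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions

def energy_arc(first_energy: float, last_energy: float) -> str:
    # Classify the session's arc from the first vs. last track energy.
    delta = last_energy - first_energy
    if delta > 0.1:
        return "rising"
    if delta < -0.1:
        return "falling"
    return "flat"
```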

#### D. The "Vibe" (Audio Features)

- **Aggregation:** Calculates Mean, Standard Deviation, and Percentiles (P10, P50/Median, P90) for all Spotify audio features (Energy, Valence, Danceability, etc.).
- **Whiplash Score:** Measures the "volatility" of a listening session. Calculated as the average absolute difference in a feature (Tempo, Energy, Valence) between consecutive tracks.
  - High Whiplash (> 15-20 for BPM) = Chaotic playlist shuffling.
  - Low Whiplash = Smooth transitions.
- **Profiles:**
  - **Mood Quadrant:** (Avg Valence, Avg Energy) coordinates.
  - **Texture:** Acousticness vs. Instrumentalness.
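Under that definition the Whiplash Score reduces to a mean absolute first difference; this helper is a sketch, not the service's actual signature.

```python
def whiplash(values: list) -> float:
    # Mean absolute difference between consecutive feature values
    # (tempo in BPM, energy, valence, ...).
    if len(values) < 2:
        return 0.0
    return sum(abs(b - a) for a, b in zip(values, values[1:])) / (len(values) - 1)
```

A tempo sequence of 120, 140, 100 BPM yields a whiplash of 30, well inside the "chaotic" band described above.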

#### E. Context & Behavior

- **Context URI:** Parsed to determine source (Playlist vs. Album vs. Artist).
- **Context Switching:** Percentage of track transitions where the `context_uri` changes. High rate = user is jumping between playlists or albums frequently.
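The switching rate is a ratio over consecutive-play transitions; this sketch assumes a flat list of context URIs in play order (an illustrative shape, not the service's input type).

```python
def context_switch_rate(context_uris: list) -> float:
    # Share of consecutive-play transitions where the context URI changes
    # (e.g., jumping from one playlist to another, or playlist to album).
    transitions = list(zip(context_uris, context_uris[1:]))
    if not transitions:
        return 0.0
    switches = sum(1 for a, b in transitions if a != b)
    return switches / len(transitions)
```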

#### F. Lifecycle & Discovery

- **Discovery:** Tracks played in the current period that were *never* played before `period_start`.
- **Obsession:** Tracks with ≥ 5 plays in the current period.
- **Skip Detection (Boredom Skips):**
  - Logic: `(next_start - current_start) < (current_duration - 10s)`
  - Only counts if the listening time was > 30s (to filter accidental clicks).
  - Proxy for "User got bored and hit next."
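Both conditions combine into a single predicate; this is a sketch with an illustrative name, assuming the next play's start time stands in for the current play's end.

```python
from datetime import datetime, timedelta

def is_boredom_skip(current_start: datetime, next_start: datetime,
                    current_duration_ms: int) -> bool:
    # A play counts as a "boredom skip" when the next track starts more
    # than 10s before the current one could have finished, but only after
    # more than 30s of listening (to filter accidental clicks).
    listened = next_start - current_start
    full = timedelta(milliseconds=current_duration_ms)
    return timedelta(seconds=30) < listened < full - timedelta(seconds=10)
```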

---

## 2. NarrativeService (`backend/app/services/narrative_service.py`)

The `NarrativeService` acts as an interpreter. It feeds the raw JSON from `StatsService` into Google's Gemini LLM to generate text.

### Payload Shaping

To ensure reliability and manage token costs, the service **does not** send the raw full database dump. It pre-processes the stats:

- Truncates top lists to Top 5.
- Removes raw transition arrays.
- Simplifies nested structures.

### LLM Prompt Engineering

The system uses a strict persona ("Witty Music Critic") and enforces specific constraints:

- **Output:** Strict JSON.
- **Safety:** Explicitly forbidden from making mental health diagnoses (e.g., no "You seem depressed").
- **Content:** Must reference specific numbers from the input stats (e.g., "Your 85% Mainstream Score...").

### Output Schema

The LLM returns a JSON object with:

- `vibe_check`: 2-3 paragraph summary.
- `patterns`: List of specific observations.
- `persona`: A creative 2-3 word label (e.g., "The Genre Chameleon").
- `roast`: A playful critique.
- `era_insight`: Commentary on the user's "Musical Age" (weighted avg release year).

## 3. Data Models (`backend/app/models.py`)

- **Track:** Stores static metadata and audio features. `raw_data` stores the full Spotify JSON for future-proofing.
- **Artist:** Normalized artist entities. Linked to tracks via `track_artists` table.
- **PlayHistory:** The timeseries ledger. Links `Track` to a timestamp and context.
- **AnalysisSnapshot:** Stores the final output of these services.
  - `metrics_payload`: The JSON output of `StatsService`.
  - `narrative_report`: The JSON output of `NarrativeService`.
@@ -1,10 +1,11 @@
 import os
 import json
+import re
 import google.generativeai as genai
-from typing import Dict, Any
+from typing import Dict, Any, List, Optional

 class NarrativeService:
-    def __init__(self, model_name: str = "gemini-2.5-flash"):
+    def __init__(self, model_name: str = "gemini-2.0-flash-exp"):
         self.api_key = os.getenv("GEMINI_API_KEY")
         if not self.api_key:
             print("WARNING: GEMINI_API_KEY not found. LLM features will fail.")
@@ -13,47 +14,111 @@ class NarrativeService:

         self.model_name = model_name

-    def generate_narrative(self, stats_json: Dict[str, Any]) -> Dict[str, str]:
+    def generate_full_narrative(self, stats_json: Dict[str, Any]) -> Dict[str, Any]:
+        """
+        Orchestrates the generation of the full narrative report.
+        Currently uses a single call for consistency and speed.
+        """
         if not self.api_key:
-            return {"error": "Missing API Key"}
+            return self._get_fallback_narrative()

+        clean_stats = self._shape_payload(stats_json)
+
         prompt = f"""
-You are a witty, insightful, and slightly snarky music critic analyzing a user's listening history.
-Below is a JSON summary of their listening data.
+You are a witty, insightful, and slightly snarky music critic analyzing a user's Spotify listening data.
+Your goal is to generate a JSON report that acts as a deeper, more honest "Spotify Wrapped".

-Your goal is to generate a report that feels like a 'Spotify Wrapped' but deeper and more honest.
+**CORE RULES:**
+1. **NO Mental Health Diagnoses:** Do not mention depression, anxiety, or therapy. Stick to behavioral descriptors (e.g., "introspective", "high-energy").
+2. **Be Specific:** Use the provided metrics. Don't say "You like pop," say "Your Mainstream Score of 85% suggests..."
+3. **Roast Gently:** Be playful but not cruel.
+4. **JSON Output Only:** Return strictly valid JSON.

-Please output your response in strict JSON format with the following keys:
-1. "vibe_check": (String) 2-3 paragraphs describing their overall listening personality.
-2. "patterns": (List of Strings) 3-5 specific observations based on the data (e.g., "You listen to sad music on Tuesdays", "Your Whiplash Score is high").
-3. "persona": (String) A creative label for the user (e.g., "The Genre Chameleon", "Nostalgic Dad-Rocker", "Algorithm Victim").
-4. "roast": (String) A playful, harmlessly mean roast about their taste (1-2 sentences).
-5. "era_insight": (String) A specific comment on their 'Musical Age' and 'Nostalgia Gap'.
+**DATA TO ANALYZE:**
+{json.dumps(clean_stats, indent=2)}

-GUIDELINES:
-- **Use the Metrics:** Do not just say "You like pop." Say "Your Mainstream Score of 85% suggests you live on the Top 40."
-- **Whiplash Score:** If 'whiplash' > 20, comment on their chaotic transitions.
-- **Hipster Score:** If 'hipster_score' > 50, call them pretentious; if < 10, call them basic.
-- **Comparison:** Use the 'comparison' block to mention if they are listening more/less or if their mood (valence/energy) has shifted.
-- **Tone:** Conversational, fun, slightly judgmental but good-natured.
+**REQUIRED JSON STRUCTURE:**
+{{
+  "vibe_check": "2-3 paragraphs describing their overall listening personality this period.",
+  "patterns": ["Observation 1", "Observation 2", "Observation 3 (Look for specific habits like skipping or late-night sessions)"],
+  "persona": "A creative label (e.g., 'The Genre Chameleon', 'Nostalgic Dad-Rocker').",
+  "era_insight": "A specific comment on their Musical Age ({clean_stats.get('era', {}).get('musical_age', 'N/A')}) and Nostalgia Gap.",
+  "roast": "A 1-2 sentence playful roast about their taste.",
+  "comparison": "A short comment comparing this period to the previous one (if data exists)."
+}}

-DATA:
-{json.dumps(stats_json, indent=2)}
-
-OUTPUT (JSON):
 """
         try:
             model = genai.GenerativeModel(self.model_name)
-            response = model.generate_content(prompt)
+            # Use JSON mode if available, otherwise rely on prompt + cleaning
+            response = model.generate_content(
+                prompt,
+                generation_config={"response_mime_type": "application/json"}
+            )

-            # Clean up response to ensure valid JSON
-            text = response.text.strip()
-            if text.startswith("```json"):
-                text = text.replace("```json", "").replace("```", "")
-            elif text.startswith("```"):
-                text = text.replace("```", "")
-
-            return json.loads(text)
+            return self._clean_and_parse_json(response.text)

         except Exception as e:
-            return {"error": str(e), "raw_response": "Error generating narrative."}
+            print(f"LLM Generation Error: {e}")
+            return self._get_fallback_narrative()
+
+    def _shape_payload(self, stats: Dict[str, Any]) -> Dict[str, Any]:
+        """
+        Compresses the stats JSON to save tokens and focus the LLM.
+        Removes raw lists beyond top 5/10.
+        """
+        s = stats.copy()
+
+        # Simplify Volume
+        if "volume" in s:
+            s["volume"] = {
+                k: v for k, v in s["volume"].items()
+                if k not in ["top_tracks", "top_artists", "top_albums", "top_genres"]
+            }
+            # Add back condensed top lists (just names)
+            s["volume"]["top_tracks"] = [t["name"] for t in stats["volume"].get("top_tracks", [])[:5]]
+            s["volume"]["top_artists"] = [a["name"] for a in stats["volume"].get("top_artists", [])[:5]]
+            s["volume"]["top_genres"] = [g["name"] for g in stats["volume"].get("top_genres", [])[:5]]
+
+        # Simplify Time (Keep distributions but maybe round them?)
+        # Keeping hourly/daily is fine, they are small arrays.
+
+        # Simplify Vibe (Remove huge transition arrays if they accidentally leaked, though stats service handles this)
+
+        # Remove period details if verbose
+        return s
+
+    def _clean_and_parse_json(self, raw_text: str) -> Dict[str, Any]:
+        """
+        Robust JSON extractor.
+        """
+        try:
+            # 1. Try direct parse
+            return json.loads(raw_text)
+        except json.JSONDecodeError:
+            pass
+
+        # 2. Extract between first { and last }
+        try:
+            match = re.search(r"\{.*\}", raw_text, re.DOTALL)
+            if match:
+                return json.loads(match.group(0))
+        except:
+            pass
+
+        return self._get_fallback_narrative()
+
+    def _get_fallback_narrative(self) -> Dict[str, Any]:
+        return {
+            "vibe_check": "Data processing error. You're too mysterious for us to analyze right now.",
+            "patterns": [],
+            "persona": "The Enigma",
+            "era_insight": "Time is a flat circle.",
+            "roast": "You broke the machine. Congratulations.",
+            "comparison": "N/A"
+        }
+
+    # Individual accessors if needed by frontend, though full_narrative is preferred
+    def generate_vibe_check(self, stats): return self.generate_full_narrative(stats).get("vibe_check")
+    def identify_patterns(self, stats): return self.generate_full_narrative(stats).get("patterns")
+    def generate_persona(self, stats): return self.generate_full_narrative(stats).get("persona")
+    def generate_roast(self, stats): return self.generate_full_narrative(stats).get("roast")
@@ -1,20 +1,17 @@
-from sqlalchemy.orm import Session
-from sqlalchemy import func, distinct, desc, joinedload
+from sqlalchemy.orm import Session, joinedload
+from sqlalchemy import func, distinct
 from datetime import datetime, timedelta
-from typing import Dict, Any, List
+from typing import Dict, Any, List, Optional
 import math
 import numpy as np

-from ..models import PlayHistory, Track, Artist, AnalysisSnapshot
+from ..models import PlayHistory, Track, Artist

 class StatsService:
     def __init__(self, db: Session):
         self.db = db

-    from sqlalchemy.orm import joinedload  # Add this to imports
-
-    def compute_comparison(self, current_stats: Dict[str, Any], period_start: datetime, period_end: datetime) -> Dict[
-        str, Any]:
+    def compute_comparison(self, current_stats: Dict[str, Any], period_start: datetime, period_end: datetime) -> Dict[str, Any]:
         """
         Calculates deltas vs the previous period of the same length.
         """
@@ -22,25 +19,18 @@ class StatsService:
         prev_end = period_start
         prev_start = prev_end - duration

-        # We only need key metrics for comparison, not the full heavy report
-        # Let's re-use existing methods but strictly for the previous window
-
-        # 1. Volume Comparison
+        # We only need key metrics for comparison
         prev_volume = self.compute_volume_stats(prev_start, prev_end)
-
-        # 2. Vibe Comparison (Just energy/valence/popularity)
         prev_vibe = self.compute_vibe_stats(prev_start, prev_end)
         prev_taste = self.compute_taste_stats(prev_start, prev_end)

-        # Calculate Deltas
         deltas = {}

         # Plays
         curr_plays = current_stats["volume"]["total_plays"]
         prev_plays_count = prev_volume["total_plays"]
         deltas["plays_delta"] = curr_plays - prev_plays_count
-        deltas["plays_pct_change"] = round(((curr_plays - prev_plays_count) / prev_plays_count) * 100,
-                                           1) if prev_plays_count else 0
+        deltas["plays_pct_change"] = self._pct_change(curr_plays, prev_plays_count)

         # Energy & Valence
         if "mood_quadrant" in current_stats["vibe"] and "mood_quadrant" in prev_vibe:
@@ -54,8 +44,7 @@ class StatsService:

         # Popularity
         if "avg_popularity" in current_stats["taste"] and "avg_popularity" in prev_taste:
-            deltas["popularity_delta"] = round(current_stats["taste"]["avg_popularity"] - prev_taste["avg_popularity"],
-                                               1)
+            deltas["popularity_delta"] = round(current_stats["taste"]["avg_popularity"] - prev_taste["avg_popularity"], 1)

         return {
             "previous_period": {
@@ -67,31 +56,32 @@ class StatsService:

     def compute_volume_stats(self, period_start: datetime, period_end: datetime) -> Dict[str, Any]:
         """
-        Calculates volume metrics including Concentration (HHI, Gini) and One-and-Done rates.
+        Calculates volume metrics including Concentration (HHI, Gini, Entropy) and Top Lists.
         """
         # Eager load tracks AND artists to fix the "Artist String Problem" and performance
+        # Use < period_end for half-open interval to avoid double counting boundaries
         query = self.db.query(PlayHistory).options(
             joinedload(PlayHistory.track).joinedload(Track.artists)
         ).filter(
             PlayHistory.played_at >= period_start,
-            PlayHistory.played_at <= period_end
+            PlayHistory.played_at < period_end
         )
         plays = query.all()
         total_plays = len(plays)

         if total_plays == 0:
-            return {
-                "total_plays": 0, "estimated_minutes": 0, "unique_tracks": 0,
-                "unique_artists": 0, "unique_albums": 0, "unique_genres": 0,
-                "top_tracks": [], "top_artists": [], "top_genres": [],
-                "repeat_rate": 0, "concentration": {}
-            }
+            return self._empty_volume_stats()

         total_ms = 0
         track_counts = {}
         artist_counts = {}
         genre_counts = {}
-        album_ids = set()
+        album_counts = {}

+        # Maps for resolving names later without DB hits
+        track_map = {}
+        artist_map = {}
+        album_map = {}
+
         for p in plays:
             t = p.track
@@ -99,80 +89,110 @@ class StatsService:

             total_ms += t.duration_ms if t.duration_ms else 0

-            # Track Counts
+            # Track Aggregation
             track_counts[t.id] = track_counts.get(t.id, 0) + 1
+            track_map[t.id] = t

-            # Album Counts (using raw_data ID if available, else name)
-            if t.raw_data and "album" in t.raw_data and "id" in t.raw_data["album"]:
-                album_ids.add(t.raw_data["album"]["id"])
-            else:
-                album_ids.add(t.album)
+            # Album Aggregation
+            # Prefer ID from raw_data, fallback to name
+            album_id = t.album
+            album_name = t.album
+            if t.raw_data and "album" in t.raw_data:
+                album_id = t.raw_data["album"].get("id", t.album)
+                album_name = t.raw_data["album"].get("name", t.album)

-            # Artist Counts (Iterate objects, not string)
+            album_counts[album_id] = album_counts.get(album_id, 0) + 1
+            album_map[album_id] = album_name
+
+            # Artist Aggregation (Iterate objects, not string)
             for artist in t.artists:
                 artist_counts[artist.id] = artist_counts.get(artist.id, 0) + 1
+                artist_map[artist.id] = artist.name
+
+                # Genre Aggregation
                 if artist.genres:
+                    # artist.genres is a JSON list of strings
                     for g in artist.genres:
                         genre_counts[g] = genre_counts.get(g, 0) + 1

         # Derived Metrics
         unique_tracks = len(track_counts)
         one_and_done = len([c for c in track_counts.values() if c == 1])
+        shares = [c / total_plays for c in track_counts.values()]

-        # Top Lists
+        # Top Lists (Optimized: No N+1)
         top_tracks = [
-            {"name": self.db.query(Track).get(tid).name, "artist": self.db.query(Track).get(tid).artist, "count": c}
+            {
+                "name": track_map[tid].name,
+                "artist": ", ".join([a.name for a in track_map[tid].artists]),  # Correct artist display
+                "count": c
+            }
             for tid, c in sorted(track_counts.items(), key=lambda x: x[1], reverse=True)[:5]
         ]

-        top_artist_ids = sorted(artist_counts.items(), key=lambda x: x[1], reverse=True)[:5]
-        # Fetch artist names efficiently
-        top_artists_objs = self.db.query(Artist).filter(Artist.id.in_([x[0] for x in top_artist_ids])).all()
-        artist_map = {a.id: a.name for a in top_artists_objs}
-        top_artists = [{"name": artist_map.get(aid, "Unknown"), "count": c} for aid, c in top_artist_ids]
+        top_artists = [
+            {"name": artist_map.get(aid, "Unknown"), "count": c}
+            for aid, c in sorted(artist_counts.items(), key=lambda x: x[1], reverse=True)[:5]
+        ]

-        top_genres = [{"name": k, "count": v} for k, v in
-                      sorted(genre_counts.items(), key=lambda x: x[1], reverse=True)[:5]]
+        top_albums = [
+            {"name": album_map.get(aid, "Unknown"), "count": c}
+            for aid, c in sorted(album_counts.items(), key=lambda x: x[1], reverse=True)[:5]
+        ]

-        # Concentration (HHI & Gini)
+        top_genres = [{"name": k, "count": v} for k, v in sorted(genre_counts.items(), key=lambda x: x[1], reverse=True)[:5]]
+
+        # Concentration Metrics
         # HHI: Sum of (share)^2
-        shares = [c / total_plays for c in track_counts.values()]
         hhi = sum([s ** 2 for s in shares])

-        # Gini Coefficient (Inequality of play distribution)
+        # Gini Coefficient
         sorted_shares = sorted(shares)
         n = len(shares)
+        gini = 0
         if n > 0:
             gini = (2 * sum((i + 1) * x for i, x in enumerate(sorted_shares))) / (n * sum(sorted_shares)) - (n + 1) / n
-        else:
-            gini = 0
+
+        # Genre Entropy: -SUM(p * log(p))
+        total_genre_occurrences = sum(genre_counts.values())
+        genre_entropy = 0
+        if total_genre_occurrences > 0:
+            genre_probs = [count / total_genre_occurrences for count in genre_counts.values()]
+            genre_entropy = -sum([p * math.log(p) for p in genre_probs if p > 0])
+
+        # Top 5 Share
+        top_5_plays = sum([t["count"] for t in top_tracks])
+        top_5_share = top_5_plays / total_plays if total_plays else 0

         return {
             "total_plays": total_plays,
             "estimated_minutes": int(total_ms / 60000),
             "unique_tracks": unique_tracks,
             "unique_artists": len(artist_counts),
-            "unique_albums": len(album_ids),
+            "unique_albums": len(album_counts),
             "unique_genres": len(genre_counts),
             "top_tracks": top_tracks,
             "top_artists": top_artists,
+            "top_albums": top_albums,
             "top_genres": top_genres,
             "repeat_rate": round((total_plays - unique_tracks) / total_plays, 3) if total_plays else 0,
             "one_and_done_rate": round(one_and_done / unique_tracks, 3) if unique_tracks else 0,
             "concentration": {
                 "hhi": round(hhi, 4),
                 "gini": round(gini, 4),
-                "top_1_share": round(max(shares), 3) if shares else 0
+                "top_1_share": round(max(shares), 3) if shares else 0,
+                "top_5_share": round(top_5_share, 3),
+                "genre_entropy": round(genre_entropy, 2)
             }
         }

     def compute_time_stats(self, period_start: datetime, period_end: datetime) -> Dict[str, Any]:
         """
-        Includes Part-of-Day buckets and Listening Streaks.
+        Includes Part-of-Day buckets, Listening Streaks, and Active Days stats.
         """
         query = self.db.query(PlayHistory).filter(
             PlayHistory.played_at >= period_start,
-            PlayHistory.played_at <= period_end
+            PlayHistory.played_at < period_end
         ).order_by(PlayHistory.played_at.asc())
         plays = query.all()

@@ -181,9 +201,8 @@ class StatsService:

         hourly_counts = [0] * 24
         weekday_counts = [0] * 7
+        # Spec: Morning (6-12), Afternoon (12-18), Evening (18-24), Night (0-6)
         part_of_day = {"morning": 0, "afternoon": 0, "evening": 0, "night": 0}
-
-        # For Streaks
         active_dates = set()

         for p in plays:
@@ -192,11 +211,11 @@ class StatsService:
             weekday_counts[p.played_at.weekday()] += 1
             active_dates.add(p.played_at.date())

-            if 5 <= h < 12:
+            if 6 <= h < 12:
                 part_of_day["morning"] += 1
-            elif 12 <= h < 17:
+            elif 12 <= h < 18:
                 part_of_day["afternoon"] += 1
-            elif 17 <= h < 22:
+            elif 18 <= h <= 23:
                 part_of_day["evening"] += 1
             else:
                 part_of_day["night"] += 1
@@ -208,7 +227,6 @@ class StatsService:
         if sorted_dates:
             current_streak = 1
             longest_streak = 1
-            # Check strictly consecutive days
             for i in range(1, len(sorted_dates)):
                 delta = (sorted_dates[i] - sorted_dates[i - 1]).days
                 if delta == 1:
@@ -219,6 +237,7 @@ class StatsService:
                     longest_streak = max(longest_streak, current_streak)

         weekend_plays = weekday_counts[5] + weekday_counts[6]
+        active_days_count = len(active_dates)

         return {
             "hourly_distribution": hourly_counts,
@@ -228,17 +247,17 @@ class StatsService:
             "part_of_day": part_of_day,
             "listening_streak": current_streak,
             "longest_streak": longest_streak,
-            "active_days": len(active_dates)
+            "active_days": active_days_count,
+            "avg_plays_per_active_day": round(len(plays) / active_days_count, 1) if active_days_count else 0
         }

     def compute_session_stats(self, period_start: datetime, period_end: datetime) -> Dict[str, Any]:
         """
-        Includes Micro-sessions, Marathon sessions, and Energy Arcs.
+        Includes Micro-sessions, Marathon sessions, Energy Arcs, and Median metrics.
         """
-        # Need to join Track to get Energy features for Arc analysis
         query = self.db.query(PlayHistory).options(joinedload(PlayHistory.track)).filter(
             PlayHistory.played_at >= period_start,
-            PlayHistory.played_at <= period_end
+            PlayHistory.played_at < period_end
         ).order_by(PlayHistory.played_at.asc())
         plays = query.all()

@@ -262,20 +281,24 @@ class StatsService:
|
|||||||
micro_sessions = 0
|
micro_sessions = 0
|
||||||
marathon_sessions = 0
|
marathon_sessions = 0
|
||||||
energy_arcs = {"rising": 0, "falling": 0, "flat": 0, "unknown": 0}
|
energy_arcs = {"rising": 0, "falling": 0, "flat": 0, "unknown": 0}
|
||||||
|
start_hour_dist = [0] * 24
|
||||||
|
|
||||||
for sess in sessions:
|
for sess in sessions:
|
||||||
|
# Start time distribution
|
||||||
|
start_hour_dist[sess[0].played_at.hour] += 1
|
||||||
|
|
||||||
# Durations
|
# Durations
|
||||||
if len(sess) > 1:
|
if len(sess) > 1:
|
||||||
duration = (sess[-1].played_at - sess[0].played_at).total_seconds() / 60
|
duration = (sess[-1].played_at - sess[0].played_at).total_seconds() / 60
|
||||||
lengths_min.append(duration)
|
lengths_min.append(duration)
|
||||||
else:
|
else:
|
||||||
lengths_min.append(3.0) # Approx
|
lengths_min.append(3.0) # Approx single song
|
||||||
|
|
||||||
# Types
|
# Types
|
||||||
if len(sess) <= 3: micro_sessions += 1
|
if len(sess) <= 3: micro_sessions += 1
|
||||||
if len(sess) >= 20: marathon_sessions += 1
|
if len(sess) >= 20: marathon_sessions += 1
|
||||||
|
|
||||||
# Energy Arc (First vs Last track)
|
# Energy Arc
|
||||||
first_t = sess[0].track
|
first_t = sess[0].track
|
||||||
last_t = sess[-1].track
|
last_t = sess[-1].track
|
||||||
if first_t and last_t and first_t.energy is not None and last_t.energy is not None:
|
if first_t and last_t and first_t.energy is not None and last_t.energy is not None:
|
||||||
@@ -286,13 +309,21 @@ class StatsService:
|
|||||||
else:
|
else:
|
||||||
energy_arcs["unknown"] += 1
|
energy_arcs["unknown"] += 1
|
||||||
|
|
||||||
avg_min = sum(lengths_min) / len(lengths_min) if lengths_min else 0
|
avg_min = np.mean(lengths_min) if lengths_min else 0
|
||||||
|
median_min = np.median(lengths_min) if lengths_min else 0
|
||||||
|
|
||||||
|
# Sessions per day
|
||||||
|
active_days = len(set(p.played_at.date() for p in plays))
|
||||||
|
sessions_per_day = len(sessions) / active_days if active_days else 0
|
||||||
|
|
||||||
return {
|
return {
|
||||||
"count": len(sessions),
|
"count": len(sessions),
|
||||||
"avg_tracks": round(len(plays) / len(sessions), 1),
|
"avg_tracks": round(len(plays) / len(sessions), 1),
|
||||||
"avg_minutes": round(avg_min, 1),
|
"avg_minutes": round(float(avg_min), 1),
|
||||||
|
"median_minutes": round(float(median_min), 1),
|
||||||
"longest_session_minutes": round(max(lengths_min), 1) if lengths_min else 0,
|
"longest_session_minutes": round(max(lengths_min), 1) if lengths_min else 0,
|
||||||
|
"sessions_per_day": round(sessions_per_day, 1),
|
||||||
|
"start_hour_distribution": start_hour_dist,
|
||||||
"micro_session_rate": round(micro_sessions / len(sessions), 2),
|
"micro_session_rate": round(micro_sessions / len(sessions), 2),
|
||||||
"marathon_session_rate": round(marathon_sessions / len(sessions), 2),
|
"marathon_session_rate": round(marathon_sessions / len(sessions), 2),
|
||||||
"energy_arcs": energy_arcs
|
"energy_arcs": energy_arcs
|
||||||
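The `sessions` list this hunk iterates over is built outside the visible diff; the grouping step is presumably gap-based. The sketch below shows one plausible sessionization by inactivity gap — the 30-minute threshold is an assumption, not something this hunk states:

```python
from datetime import datetime, timedelta

def split_sessions(timestamps, gap_minutes=30):
    """Group sorted play timestamps into sessions separated by more than gap_minutes
    of inactivity. The threshold is illustrative, not from the repo."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and (ts - sessions[-1][-1]) <= timedelta(minutes=gap_minutes):
            sessions[-1].append(ts)  # continues the current session
        else:
            sessions.append([ts])    # gap exceeded: start a new session
    return sessions

t0 = datetime(2024, 1, 1, 9, 0)
plays = [t0, t0 + timedelta(minutes=3), t0 + timedelta(hours=2)]
```

With this grouping, a 3-track burst followed by a 2-hour pause yields two sessions, the first a "micro-session" under the `len(sess) <= 3` rule above.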
@@ -300,12 +331,11 @@ class StatsService:

     def compute_vibe_stats(self, period_start: datetime, period_end: datetime) -> Dict[str, Any]:
         """
-        Aggregates Audio Features + Calculates Whiplash (Transitions)
+        Aggregates Audio Features + Calculates Whiplash, Percentiles, and Profiles.
         """
-        # Fetch plays strictly ordered by time for transition analysis
         plays = self.db.query(PlayHistory).filter(
             PlayHistory.played_at >= period_start,
-            PlayHistory.played_at <= period_end
+            PlayHistory.played_at < period_end
         ).order_by(PlayHistory.played_at.asc()).all()

         if not plays:
@@ -316,9 +346,9 @@ class StatsService:
         track_map = {t.id: t for t in tracks}

         # 1. Aggregates
-        features = {k: [] for k in
-                    ["energy", "valence", "danceability", "tempo", "acousticness", "instrumentalness", "liveness",
-                     "speechiness", "loudness"]}
+        feature_keys = ["energy", "valence", "danceability", "tempo", "acousticness",
+                        "instrumentalness", "liveness", "speechiness", "loudness"]
+        features = {k: [] for k in feature_keys}

         # 2. Transition Arrays (for Whiplash)
         transitions = {"tempo": [], "energy": [], "valence": []}
@@ -329,38 +359,34 @@ class StatsService:
             t = track_map.get(p.track_id)
             if not t: continue

-            # Populate aggregations
-            if t.energy is not None:
-                features["energy"].append(t.energy)
-                features["valence"].append(t.valence)
-                features["danceability"].append(t.danceability)
-                features["tempo"].append(t.tempo)
-                features["acousticness"].append(t.acousticness)
-                features["instrumentalness"].append(t.instrumentalness)
-                features["liveness"].append(t.liveness)
-                features["speechiness"].append(t.speechiness)
-                features["loudness"].append(t.loudness)
+            # Robust Null Check: Append separately
+            for key in feature_keys:
+                val = getattr(t, key, None)
+                if val is not None:
+                    features[key].append(val)

             # Calculate Transitions (Whiplash)
             if i > 0 and previous_track:
-                # Only count transition if within reasonable time (e.g. < 5 mins gap)
-                # assuming continuous listening
                 time_diff = (p.played_at - plays[i - 1].played_at).total_seconds()
-                if time_diff < 300:
-                    if t.tempo and previous_track.tempo:
+                if time_diff < 300:  # 5 min gap max
+                    if t.tempo is not None and previous_track.tempo is not None:
                         transitions["tempo"].append(abs(t.tempo - previous_track.tempo))
-                    if t.energy and previous_track.energy:
+                    if t.energy is not None and previous_track.energy is not None:
                         transitions["energy"].append(abs(t.energy - previous_track.energy))
+                    if t.valence is not None and previous_track.valence is not None:
+                        transitions["valence"].append(abs(t.valence - previous_track.valence))

             previous_track = t

-        # Calculate Stats
+        # Calculate Stats (Mean, Std, Percentiles)
         stats = {}
         for key, values in features.items():
-            valid = [v for v in values if v is not None]
-            if valid:
-                stats[f"avg_{key}"] = float(np.mean(valid))
-                stats[f"std_{key}"] = float(np.std(valid))
+            if values:
+                stats[f"avg_{key}"] = float(np.mean(values))
+                stats[f"std_{key}"] = float(np.std(values))
+                stats[f"p10_{key}"] = float(np.percentile(values, 10))
+                stats[f"p50_{key}"] = float(np.percentile(values, 50))  # Median
+                stats[f"p90_{key}"] = float(np.percentile(values, 90))
             else:
                 stats[f"avg_{key}"] = None

@@ -370,13 +396,27 @@ class StatsService:
             "x": round(stats["avg_valence"], 2),
             "y": round(stats["avg_energy"], 2)
         }
-        # Consistency: Inverse of average standard deviation of Mood components
-        avg_std = (stats["std_energy"] + stats["std_valence"]) / 2
-        stats["consistency_score"] = round(1.0 - avg_std, 2)  # Higher = more consistent
+        # Consistency
+        avg_std = (stats.get("std_energy", 0) + stats.get("std_valence", 0)) / 2
+        stats["consistency_score"] = round(1.0 - avg_std, 2)

-        # Whiplash Scores (Average jump between tracks)
+        # Rhythm Profile
+        if stats.get("avg_tempo") is not None and stats.get("avg_danceability") is not None:
+            stats["rhythm_profile"] = {
+                "avg_tempo": round(stats["avg_tempo"], 1),
+                "avg_danceability": round(stats["avg_danceability"], 2)
+            }

+        # Texture Profile
+        if stats.get("avg_acousticness") is not None and stats.get("avg_instrumentalness") is not None:
+            stats["texture_profile"] = {
+                "acousticness": round(stats["avg_acousticness"], 2),
+                "instrumentalness": round(stats["avg_instrumentalness"], 2)
+            }

+        # Whiplash Scores
         stats["whiplash"] = {}
-        for k in ["tempo", "energy"]:
+        for k in ["tempo", "energy", "valence"]:
             if transitions[k]:
                 stats["whiplash"][k] = round(float(np.mean(transitions[k])), 2)
             else:

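The whiplash metric above is simply the mean absolute jump of a feature between consecutive tracks. A dependency-free restatement of that arithmetic, with made-up energy values:

```python
def whiplash(values):
    """Mean absolute jump between consecutive feature values (tempo, energy, valence)."""
    jumps = [abs(b - a) for a, b in zip(values, values[1:])]
    return round(sum(jumps) / len(jumps), 2) if jumps else 0.0

# A calm -> intense -> calm sequence: jumps of 0.7 and 0.6 average to 0.65
energies = [0.2, 0.9, 0.3]
```

Note the service additionally ignores transitions with a gap over 300 seconds, so jumps across listening breaks do not inflate the score.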
@@ -388,10 +428,9 @@ class StatsService:
         """
         Includes Nostalgia Gap and granular decade breakdown.
         """
-        # Join track to get raw_data
         query = self.db.query(PlayHistory).options(joinedload(PlayHistory.track)).filter(
             PlayHistory.played_at >= period_start,
-            PlayHistory.played_at <= period_end
+            PlayHistory.played_at < period_end
         )
         plays = query.all()

@@ -409,11 +448,9 @@ class StatsService:
         if not years:
             return {"musical_age": None}

-        # Musical Age (Weighted Average)
         avg_year = sum(years) / len(years)
         current_year = datetime.utcnow().year

-        # Decade Distribution
         decades = {}
         for y in years:
             dec = (y // 10) * 10
@@ -426,18 +463,17 @@ class StatsService:
         return {
             "musical_age": int(avg_year),
             "nostalgia_gap": int(current_year - avg_year),
-            "freshness_score": dist.get(f"{int(current_year / 10) * 10}s", 0),  # Share of current decade
+            "freshness_score": dist.get(f"{int(current_year / 10) * 10}s", 0),
             "decade_distribution": dist
         }

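The era arithmetic above reduces to an average release year and a floor-to-decade bucketing. A self-contained sketch with illustrative release years (the real service derives `years` from track release dates and weights by play):

```python
def era_metrics(years, current_year=2024):
    """Average release year ("musical age"), nostalgia gap, and decade buckets."""
    avg_year = sum(years) / len(years)
    decades = {}
    for y in years:
        dec = (y // 10) * 10  # floor to decade, e.g. 1999 -> 1990
        decades[dec] = decades.get(dec, 0) + 1
    return {
        "musical_age": int(avg_year),
        "nostalgia_gap": int(current_year - avg_year),
        "decade_distribution": decades,
    }

m = era_metrics([1999, 2001, 2010])
```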
     def compute_skip_stats(self, period_start: datetime, period_end: datetime) -> Dict[str, Any]:
         """
-        Implements boredom skip detection:
-        (next_track.played_at - current_track.played_at) < (current_track.duration_ms / 1000 - 10s)
+        Implements boredom skip detection.
         """
         query = self.db.query(PlayHistory).filter(
             PlayHistory.played_at >= period_start,
-            PlayHistory.played_at <= period_end
+            PlayHistory.played_at < period_end
         ).order_by(PlayHistory.played_at.asc())
         plays = query.all()

@@ -449,7 +485,10 @@ class StatsService:
         tracks = self.db.query(Track).filter(Track.id.in_(track_ids)).all()
         track_map = {t.id: t for t in tracks}

-        for i in range(len(plays) - 1):
+        # Denominator: transitions, which is plays - 1
+        transitions_count = len(plays) - 1

+        for i in range(transitions_count):
             current_play = plays[i]
             next_play = plays[i+1]
             track = track_map.get(current_play.track_id)
@@ -458,31 +497,28 @@ class StatsService:
                 continue

             diff_seconds = (next_play.played_at - current_play.played_at).total_seconds()

-            # Logic: If diff < (duration - 10s), it's a skip.
-            # Convert duration to seconds
             duration_sec = track.duration_ms / 1000.0

-            # Also ensure diff isn't negative or weirdly small (re-plays)
-            # And assume "listening" means diff > 30s at least?
-            # Spec says "Spotify only returns 30s+".
+            # Logic: If diff < (duration - 10s), it's a skip.
+            # AND it must be a "valid" listening attempt (e.g. > 30s)
+            # AND it shouldn't be a huge gap (e.g. paused for 2 hours then hit next)

-            if diff_seconds < (duration_sec - 10):
+            if 30 < diff_seconds < (duration_sec - 10):
                 skips += 1

         return {
             "total_skips": skips,
-            "skip_rate": round(skips / len(plays), 3)
+            "skip_rate": round(skips / transitions_count, 3) if transitions_count > 0 else 0
         }

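The corrected skip condition is worth seeing in isolation: a transition counts as a skip only when it lands between the 30-second "valid listen" floor and the track's near-complete duration. A minimal restatement of that boundary check:

```python
def is_boredom_skip(diff_seconds, duration_ms):
    """Skip iff the next play starts well before this track could have finished,
    but only after a listening attempt longer than 30 seconds."""
    duration_sec = duration_ms / 1000.0
    # Chained comparison: 30 < diff AND diff < (duration - 10)
    return 30 < diff_seconds < (duration_sec - 10)
```

For a 4-minute (240 000 ms) track: moving on after 60 s is a skip; after 20 s it falls below the valid-listen floor; after 235 s the track effectively finished.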
     def compute_context_stats(self, period_start: datetime, period_end: datetime) -> Dict[str, Any]:
         """
-        Analyzes context_uri to determine if user listens to Playlists, Albums, or Artists.
+        Analyzes context_uri and switching rate.
         """
         query = self.db.query(PlayHistory).filter(
             PlayHistory.played_at >= period_start,
-            PlayHistory.played_at <= period_end
-        )
+            PlayHistory.played_at < period_end
+        ).order_by(PlayHistory.played_at.asc())
         plays = query.all()

         if not plays:
@@ -490,31 +526,32 @@ class StatsService:

         context_counts = {"playlist": 0, "album": 0, "artist": 0, "collection": 0, "unknown": 0}
         unique_contexts = {}
+        context_switches = 0

+        last_context = None

         for p in plays:
-            if not p.context_uri:
+            uri = p.context_uri
+            if not uri:
                 context_counts["unknown"] += 1
-                continue
+                uri = "unknown"

-            # Count distinct contexts for loyalty
-            unique_contexts[p.context_uri] = unique_contexts.get(p.context_uri, 0) + 1

-            if "playlist" in p.context_uri:
-                context_counts["playlist"] += 1
-            elif "album" in p.context_uri:
-                context_counts["album"] += 1
-            elif "artist" in p.context_uri:
-                context_counts["artist"] += 1
-            elif "collection" in p.context_uri:
-                # "Liked Songs" usually shows up as collection
-                context_counts["collection"] += 1
             else:
-                context_counts["unknown"] += 1
+                if "playlist" in uri: context_counts["playlist"] += 1
+                elif "album" in uri: context_counts["album"] += 1
+                elif "artist" in uri: context_counts["artist"] += 1
+                elif "collection" in uri: context_counts["collection"] += 1
+                else: context_counts["unknown"] += 1

+            if uri != "unknown":
+                unique_contexts[uri] = unique_contexts.get(uri, 0) + 1

+            # Switch detection
+            if last_context and uri != last_context:
+                context_switches += 1
+            last_context = uri

         total = len(plays)
         breakdown = {k: round(v / total, 2) for k, v in context_counts.items()}

-        # Top 5 Contexts (Requires resolving URI to name, possibly missing metadata here)
         sorted_contexts = sorted(unique_contexts.items(), key=lambda x: x[1], reverse=True)[:5]

         return {
@@ -522,16 +559,17 @@ class StatsService:
             "album_purist_score": breakdown.get("album", 0),
             "playlist_dependency": breakdown.get("playlist", 0),
             "context_loyalty": round(len(plays) / len(unique_contexts), 2) if unique_contexts else 0,
+            "context_switching_rate": round(context_switches / (total - 1), 2) if total > 1 else 0,
             "top_context_uris": [{"uri": k, "count": v} for k, v in sorted_contexts]
         }

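The new `context_switching_rate` counts how many consecutive-play pairs change context, normalized by the number of transitions. A compact sketch of that logic, treating a missing URI as `"unknown"` the way the loop above does:

```python
def context_switching_rate(uris):
    """Fraction of consecutive plays whose listening context differs."""
    uris = [u if u else "unknown" for u in uris]  # None/empty -> "unknown"
    switches = sum(1 for a, b in zip(uris, uris[1:]) if a != b)
    return round(switches / (len(uris) - 1), 2) if len(uris) > 1 else 0

# playlist -> playlist -> album -> (no context) -> album: 3 switches over 4 transitions
plays = ["spotify:playlist:p1", "spotify:playlist:p1",
         "spotify:album:a1", None, "spotify:album:a1"]
```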
     def compute_taste_stats(self, period_start: datetime, period_end: datetime) -> Dict[str, Any]:
         """
-        Mainstream vs. Hipster analysis based on Track.popularity (0-100).
+        Mainstream vs. Hipster analysis.
         """
         query = self.db.query(PlayHistory).filter(
             PlayHistory.played_at >= period_start,
-            PlayHistory.played_at <= period_end
+            PlayHistory.played_at < period_end
         )
         plays = query.all()
         if not plays: return {}

@@ -564,38 +602,47 @@ class StatsService:

     def compute_lifecycle_stats(self, period_start: datetime, period_end: datetime) -> Dict[str, Any]:
         """
-        Determines if tracks are 'New Discoveries' or 'Old Favorites'.
+        Discovery, Recurrence, Comebacks, Obsessions.
         """
-        # 1. Get tracks played in this period
+        # 1. Current plays
         current_plays = self.db.query(PlayHistory).filter(
             PlayHistory.played_at >= period_start,
-            PlayHistory.played_at <= period_end
+            PlayHistory.played_at < period_end
         ).all()

         if not current_plays: return {}

         current_track_ids = set([p.track_id for p in current_plays])

-        # 2. Check if these tracks were played BEFORE period_start
-        # We find which of the current_track_ids exist in history < period_start
+        # 2. Historical check
         old_tracks_query = self.db.query(distinct(PlayHistory.track_id)).filter(
             PlayHistory.track_id.in_(current_track_ids),
             PlayHistory.played_at < period_start
         )
         old_track_ids = set([r[0] for r in old_tracks_query.all()])

-        # 3. Calculate Discovery
+        # 3. Discovery
         new_discoveries = current_track_ids - old_track_ids
-        discovery_count = len(new_discoveries)

-        # Calculate plays on new discoveries
+        # 4. Obsessions (Tracks with > 5 plays in period)
+        track_counts = {}
+        for p in current_plays:
+            track_counts[p.track_id] = track_counts.get(p.track_id, 0) + 1
+        obsessions = [tid for tid, count in track_counts.items() if count >= 5]

+        # 5. Comeback Detection (Old tracks not played in last 30 days)
+        # Simplified: If in old_track_ids but NOT in last 30 days before period_start?
+        # That requires a gap check. For now, we will mark 'recurrence' as general relistening.

         plays_on_new = len([p for p in current_plays if p.track_id in new_discoveries])
         total_plays = len(current_plays)

         return {
-            "discovery_count": discovery_count,
+            "discovery_count": len(new_discoveries),
             "discovery_rate": round(plays_on_new / total_plays, 3) if total_plays > 0 else 0,
-            "recurrence_rate": round((total_plays - plays_on_new) / total_plays, 3) if total_plays > 0 else 0
+            "recurrence_rate": round((total_plays - plays_on_new) / total_plays, 3) if total_plays > 0 else 0,
+            "obsession_count": len(obsessions),
+            "obsession_rate": round(len(obsessions) / len(current_track_ids), 3) if current_track_ids else 0
         }

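Stripped of the ORM, the lifecycle math above is pure set arithmetic: tracks never seen before the period are discoveries, and tracks with five or more plays are obsessions. A standalone sketch over plain track-id lists (illustrative data, not repo fixtures):

```python
def lifecycle_rates(current_plays, old_track_ids):
    """current_plays: track ids played in the period, with repeats.
    old_track_ids: ids ever played before the period."""
    current_ids = set(current_plays)
    new_ids = current_ids - set(old_track_ids)          # discoveries
    plays_on_new = sum(1 for tid in current_plays if tid in new_ids)
    total = len(current_plays)
    counts = {}
    for tid in current_plays:
        counts[tid] = counts.get(tid, 0) + 1
    obsessions = [tid for tid, c in counts.items() if c >= 5]
    return {
        "discovery_count": len(new_ids),
        "discovery_rate": round(plays_on_new / total, 3) if total else 0,
        "obsession_count": len(obsessions),
    }

r = lifecycle_rates(["a"] * 5 + ["b", "c"], {"a"})
```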
     def compute_explicit_stats(self, period_start: datetime, period_end: datetime) -> Dict[str, Any]:
@@ -604,7 +651,7 @@ class StatsService:
         """
         query = self.db.query(PlayHistory).options(joinedload(PlayHistory.track)).filter(
             PlayHistory.played_at >= period_start,
-            PlayHistory.played_at <= period_end
+            PlayHistory.played_at < period_end
         )
         plays = query.all()

@@ -618,24 +665,14 @@ class StatsService:
         for p in plays:
             h = p.played_at.hour
             hourly_total[h] += 1

-            # Check raw_data for explicit flag
             t = p.track
-            is_explicit = False
             if t.raw_data and t.raw_data.get("explicit"):
-                is_explicit = True

-            if is_explicit:
                 explicit_count += 1
                 hourly_explicit[h] += 1

-        # Calculate hourly percentages
         hourly_rates = []
         for i in range(24):
-            if hourly_total[i] > 0:
-                hourly_rates.append(round(hourly_explicit[i] / hourly_total[i], 2))
-            else:
-                hourly_rates.append(0.0)
+            hourly_rates.append(round(hourly_explicit[i] / hourly_total[i], 2) if hourly_total[i] > 0 else 0.0)

         return {
             "explicit_rate": round(explicit_count / total_plays, 3),
@@ -644,7 +681,6 @@ class StatsService:
         }

     def generate_full_report(self, period_start: datetime, period_end: datetime) -> Dict[str, Any]:
-        # 1. Calculate all current stats
         current_stats = {
             "period": {"start": period_start.isoformat(), "end": period_end.isoformat()},
             "volume": self.compute_volume_stats(period_start, period_end),
@@ -659,7 +695,19 @@ class StatsService:
             "skips": self.compute_skip_stats(period_start, period_end)
         }

-        # 2. Calculate Comparison
         current_stats["comparison"] = self.compute_comparison(current_stats, period_start, period_end)

         return current_stats

+    def _empty_volume_stats(self):
+        return {
+            "total_plays": 0, "estimated_minutes": 0, "unique_tracks": 0,
+            "unique_artists": 0, "unique_albums": 0, "unique_genres": 0,
+            "top_tracks": [], "top_artists": [], "top_albums": [], "top_genres": [],
+            "repeat_rate": 0, "one_and_done_rate": 0,
+            "concentration": {}
+        }

+    def _pct_change(self, curr, prev):
+        if prev == 0:
+            return 100.0 if curr > 0 else 0.0
+        return round(((curr - prev) / prev) * 100, 1)
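The new `_pct_change` helper guards the zero-baseline case that would otherwise raise a `ZeroDivisionError` during period-over-period comparison. Its logic, as added above, lifted out of the class so it can be exercised directly:

```python
def pct_change(curr, prev):
    """Percent change from prev to curr; any growth from a zero baseline reports 100%."""
    if prev == 0:
        return 100.0 if curr > 0 else 0.0
    return round(((curr - prev) / prev) * 100, 1)
```

Reporting a flat 100.0 for growth from zero is a deliberate simplification: the true percentage is undefined, and capping it keeps the comparison block free of infinities.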