4 Commits

Author SHA1 Message Date
google-labs-jules[bot]
0ca9893c68 Implement Phase 2 Frontend and Phase 3 Data Enrichment
- Initialize React+Vite Frontend with Ant Design Dashboard.
- Implement Data Enrichment: ReccoBeats (Audio Features) and Spotify (Genres).
- Update Database Schema via Alembic Migrations.
- Add Docker support (Dockerfile, docker-compose.yml).
- Update README with hosting instructions.
2025-12-24 21:34:36 +00:00
bnair123
3a424d15a5 Add project context and documentation for Music Analyser
This document outlines the vision, technical decisions, current architecture, and future roadmap for the Music Analyser project. It serves as a guide for future AI agents or developers.
2025-12-24 22:03:18 +04:00
bnair123
4ca4c7befd Enhance Docker publish workflow with metadata and caching
Added environment variables for registry and image name. Updated Docker build and push steps to include metadata extraction and caching.
2025-12-24 21:54:04 +04:00
bnair123
b502e95652 Merge pull request #1 from bnair123/setup-initial-backend-8149240771439055261
Initial Backend Setup
2025-12-24 21:30:32 +04:00
17 changed files with 757 additions and 65 deletions

.github/workflows/docker-publish.yml

@@ -6,9 +6,17 @@ on:
   pull_request:
     branches: [ "main" ]
 
+env:
+  REGISTRY: ghcr.io
+  IMAGE_NAME: ${{ github.repository }}
+
 jobs:
   build:
     runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      packages: write
     steps:
       - uses: actions/checkout@v3
@@ -18,9 +26,32 @@ jobs:
       - name: Set up Docker Buildx
         uses: docker/setup-buildx-action@v3
 
+      - name: Log in to the Container registry
+        uses: docker/login-action@v3
+        with:
+          registry: ${{ env.REGISTRY }}
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Extract metadata (tags, labels) for Docker
+        id: meta
+        uses: docker/metadata-action@v5
+        with:
+          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
+          tags: |
+            type=ref,event=branch
+            type=ref,event=pr
+            type=semver,pattern={{version}}
+            type=sha
+            latest
+
       - name: Build and push
         uses: docker/build-push-action@v5
         with:
           context: ./backend
-          push: false
-          tags: user/app:latest
+          push: true
+          tags: ${{ steps.meta.outputs.tags }}
+          labels: ${{ steps.meta.outputs.labels }}
+          platforms: linux/amd64,linux/arm64
+          cache-from: type=gha
+          cache-to: type=gha,mode=max

Context.md (new file, 114 lines)

@@ -0,0 +1,114 @@
# Music Analyser - Project Context & Documentation
This document serves as a comprehensive guide to the **Music Analyser** project. It outlines the vision, technical decisions, current architecture, and future roadmap. **Use this document to provide context to future AI agents or developers.**
## 1. Project Vision
The goal of this project is to build a personal analytics dashboard that:
1. **Regularly queries** the Spotify API (24/7) to collect a complete history of listening habits.
2. Stores this data locally (or in a private database) to ensure ownership and completeness.
3. Provides **rich analysis** and visualizations (similar to "Spotify Wrapped" but on-demand and more detailed).
4. Integrates **AI (Google Gemini)** to provide qualitative insights, summaries, and trend analysis (e.g., "You started the week with high-energy pop but shifted to lo-fi study beats by Friday").
## 2. Roadmap & Phases
### Phase 1: Foundation & Data Collection (Current Status: ✅ COMPLETED)
- **Goal:** Reliable data ingestion and storage.
- **Deliverables:**
- FastAPI Backend.
- SQLite Database (with SQLAlchemy).
- Spotify OAuth logic (Refresh Token flow).
- Background Worker for 24/7 polling.
- Docker containerization + GitHub Actions (Multi-arch build).
### Phase 2: Visualization (Next Step)
- **Goal:** View the raw data.
- **Deliverables:**
- Frontend (React + Vite).
- Basic Data Table / List View of listening history.
- Basic filtering (by date, artist).
### Phase 3: Analysis & AI
- **Goal:** Deep insights.
- **Deliverables:**
- Advanced charts/graphs.
- AI Integration (Gemini 2.5/3 Flash) to generate text summaries of listening trends.
- Email reports (optional).
## 3. Technical Architecture
### Backend
- **Language:** Python 3.11+
- **Framework:** FastAPI (High performance, easy to use).
- **Dependencies:** `httpx` (Async HTTP), `sqlalchemy` (ORM), `pydantic` (Validation).
### Database
- **Current:** SQLite (`music.db`).
- *Decision:* Chosen for simplicity in Phase 1.
- **Future path:** The code uses SQLAlchemy, so migrating to **PostgreSQL** (e.g., Supabase) only requires changing the connection string in `database.py`.
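As a sketch of that migration path (assuming `database.py` builds its engine from a single URL string), only the URL handed to `create_engine` would change:

```python
from sqlalchemy import create_engine

# Current Phase 1 setup: a local SQLite file, as in database.py today.
engine = create_engine("sqlite:///./music.db")

# Future: point the same ORM code at PostgreSQL (e.g. Supabase).
# Requires a driver such as psycopg2; URL shape shown for illustration only.
# engine = create_engine("postgresql+psycopg2://user:pass@host:5432/music")

print(engine.url.drivername)  # sqlite
```

Because all queries go through the ORM, none of the model or query code needs to change with the URL.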
### Database Schema
1. **`Track` Table:**
- Stores unique tracks.
- Columns: `id` (Spotify ID), `name`, `artist`, `album`, `duration_ms`, `metadata_json` (Stores the *entire* raw Spotify JSON response for future-proofing).
2. **`PlayHistory` Table:**
- Stores the instances of listening.
- Columns: `id`, `track_id` (FK), `played_at` (Timestamp), `context_uri`.
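A minimal declarative sketch of the two tables above (column names follow the descriptions here; the shipped `app/models.py` may differ in detail, e.g. in the name of the raw-JSON column):

```python
from sqlalchemy import JSON, Column, DateTime, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class Track(Base):
    __tablename__ = "tracks"
    id = Column(String, primary_key=True)  # Spotify track ID
    name = Column(String)
    artist = Column(String)
    album = Column(String)
    duration_ms = Column(Integer)
    metadata_json = Column(JSON)           # entire raw Spotify response

class PlayHistory(Base):
    __tablename__ = "play_history"
    id = Column(Integer, primary_key=True)
    track_id = Column(String, ForeignKey("tracks.id"))
    played_at = Column(DateTime, index=True)  # indexed: most queries filter by time
    context_uri = Column(String)
    track = relationship("Track")
```

Splitting tracks from plays keeps the hot `play_history` table narrow while the bulky raw JSON lives once per track.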
### Authentication Strategy
- **Challenge:** The background worker runs headless (no user to click "Login").
- **Solution:** We use the **Authorization Code Flow with Refresh Tokens**.
1. User runs the local helper script (`backend/scripts/get_refresh_token.py`) once.
2. This generates a long-lived `SPOTIFY_REFRESH_TOKEN`.
3. The backend uses this token to automatically request new short-lived Access Tokens whenever needed.
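The refresh exchange in step 3 can be sketched as follows (a standalone illustration of Spotify's refresh-token grant; the function names here are illustrative, not the repo's — the project's `SpotifyClient` wraps the same flow):

```python
import base64
import os

SPOTIFY_TOKEN_URL = "https://accounts.spotify.com/api/token"

def build_refresh_request(client_id: str, client_secret: str, refresh_token: str):
    """Build headers/body for Spotify's refresh_token grant (Basic auth + form body)."""
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    headers = {"Authorization": f"Basic {basic}"}
    data = {"grant_type": "refresh_token", "refresh_token": refresh_token}
    return headers, data

async def fetch_access_token() -> str:
    import httpx  # project dependency (backend/requirements.txt)

    headers, data = build_refresh_request(
        os.environ["SPOTIFY_CLIENT_ID"],
        os.environ["SPOTIFY_CLIENT_SECRET"],
        os.environ["SPOTIFY_REFRESH_TOKEN"],
    )
    async with httpx.AsyncClient() as client:
        resp = await client.post(SPOTIFY_TOKEN_URL, headers=headers, data=data)
        resp.raise_for_status()
        return resp.json()["access_token"]  # short-lived (~1 hour)
```

Because the refresh token never expires in normal use, the worker can run headless indefinitely after the one-time helper-script run.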
### Background Worker Logic
- **File:** `backend/run_worker.py` -> `backend/app/ingest.py`
- **Process:**
1. Worker wakes up every 60 seconds.
2. Calls Spotify `recently-played` endpoint (limit 50).
3. Iterates through tracks.
4. **Deduplication:** Checks `(track_id, played_at)` against the DB. If it exists, skip. If not, insert.
5. **Metadata:** If the track is new to the system, it saves the full metadata immediately.
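The dedup check in step 4 can be sketched with the stdlib `sqlite3` module (the real worker does the same check through SQLAlchemy; table and column names follow the schema above, and the helper name `record_play` is illustrative):

```python
import sqlite3

def record_play(conn: sqlite3.Connection, track_id: str, played_at: str) -> bool:
    """Insert a play unless the (track_id, played_at) pair already exists.

    Returns True if a new row was inserted, False if it was a duplicate.
    """
    cur = conn.execute(
        "SELECT 1 FROM play_history WHERE track_id = ? AND played_at = ?",
        (track_id, played_at),
    )
    if cur.fetchone() is not None:
        return False  # already recorded: skip
    conn.execute(
        "INSERT INTO play_history (track_id, played_at) VALUES (?, ?)",
        (track_id, played_at),
    )
    conn.commit()
    return True

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE play_history (id INTEGER PRIMARY KEY, track_id TEXT, played_at TEXT)"
)
print(record_play(conn, "abc123", "2025-12-24T21:00:00Z"))  # True (new play)
print(record_play(conn, "abc123", "2025-12-24T21:00:00Z"))  # False (duplicate)
```

This skip-or-insert decision is what makes ingestion idempotent: re-processing the same 50-item `recently-played` page every 60 seconds is harmless.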
### AI Integration
- **Model:** Google Gemini (Target: 2.5 Flash or 3 Flash).
- **Status:** Service class exists (`AIService`) but is not yet fully wired into the daily workflow.
### Deployment
- **Docker:** Multi-stage build (python-slim).
- **CI/CD:** GitHub Actions workflow (`docker-publish.yml`).
- Builds for `linux/amd64` and `linux/arm64`.
- Pushes to GitHub Container Registry (ghcr.io).
## 4. How to Run
### Prerequisites
- Spotify Client ID & Secret.
- Google Gemini API Key.
- Docker (optional).
### Local Development
1. **Setup Env:**
```bash
cp backend/.env.example backend/.env
# Fill in details
```
2. **Install:**
```bash
cd backend
pip install -r requirements.txt
```
3. **Run API:**
```bash
uvicorn app.main:app --reload
```
4. **Run Worker:**
```bash
python run_worker.py
```
### Docker
```bash
docker build -t music-analyser-backend ./backend
docker run --env-file backend/.env music-analyser-backend
```

README.md

@@ -1,27 +1,37 @@
 # Music Analyser
-A personal analytics dashboard for your music listening habits, powered by Python, FastAPI, and Google Gemini AI.
+A personal analytics dashboard for your music listening habits, powered by Python, FastAPI, React, and Google Gemini AI.
 
 ## Project Structure
 - `backend/`: FastAPI backend for data ingestion and API.
-  - `app/ingest.py`: Background worker that polls Spotify.
-  - `app/services/`: Logic for Spotify and Gemini APIs.
+  - `app/ingest.py`: Background worker that polls Spotify and enriches data via ReccoBeats.
+  - `app/services/`: Logic for Spotify, ReccoBeats, and Gemini APIs.
   - `app/models.py`: Database schema (Tracks, PlayHistory).
-- `frontend/`: (Coming Soon) React/Vite frontend.
+- `frontend/`: React + Vite frontend for visualizing the dashboard.
+- `docker-compose.yml`: For easy deployment.
 
-## Getting Started
+## Features
+- **Continuous Ingestion**: Polls Spotify every 60 seconds to record your listening history.
+- **Data Enrichment**: Automatically fetches **Genres** (via Spotify) and **Audio Features** (Energy, BPM, Mood via ReccoBeats).
+- **Dashboard**: A responsive UI to view your history and stats.
+- **AI Ready**: Database schema and environment prepared for Gemini AI integration.
 
-### Prerequisites
-- Docker & Docker Compose (optional, for containerization)
-- Python 3.11+ (for local dev)
-- A Spotify Developer App (Client ID & Secret)
-- A Google Gemini API Key
+## Hosting Guide (Docker)
+This application is designed to run via Docker Compose.
 
-### 1. Setup Environment Variables
-Create a `.env` file in the `backend/` directory:
+### 1. Prerequisites
+- Docker & Docker Compose installed.
+- **Spotify Developer Credentials** (Client ID & Secret).
+- **Spotify Refresh Token** (Run `backend/scripts/get_refresh_token.py` locally to generate this).
+- **Google Gemini API Key**.
+
+### 2. Deployment
+1. **Clone the repository**.
+2. **Create a `.env` file** in the root directory (or use environment variables directly):
 
 ```bash
 SPOTIFY_CLIENT_ID="your_client_id"
@@ -30,43 +40,48 @@ SPOTIFY_REFRESH_TOKEN="your_refresh_token"
 GEMINI_API_KEY="your_gemini_key"
 ```
 
-To get the `SPOTIFY_REFRESH_TOKEN`, run the helper script:
+3. **Run with Docker Compose**:
 ```bash
-python backend/scripts/get_refresh_token.py
+docker-compose up -d --build
 ```
 
-### 2. Run Locally
+This will:
+- Build and start the **Backend** (port 8000).
+- Build and start the **Frontend** (port 8991).
+- Create a **Persistent Volume** at `/opt/mySpotify` (mapped to the container's database) to ensure **no data loss** during updates.
 
-Install dependencies:
+4. **Access the Dashboard**:
+   Open your browser to `http://localhost:8991` (or your server IP).
+
+### 3. Data Persistence & Updates
+- **Data**: All data is stored in `music.db` inside the container, which is mounted to `/opt/mySpotify/music.db` on your host machine.
+- **Migrations**: The project uses **Alembic** for database migrations. When you update the container image in the future, the backend will automatically apply any schema changes without deleting your data.
+
+### 4. Pulling from Registry (Alternative)
+If you prefer to pull the pre-built image instead of building locally:
+```bash
+docker pull ghcr.io/bnair123/musicanalyser:latest
+```
+(Note: You still need to mount the volume and pass environment variables as shown in `docker-compose.yml`).
+
+## Local Development
+1. **Backend**:
 ```bash
 cd backend
 pip install -r requirements.txt
+python run_worker.py  # Starts ingestion
+uvicorn app.main:app --reload  # Starts API
 ```
 
-Run the server:
+2. **Frontend**:
 ```bash
-uvicorn app.main:app --reload
+cd frontend
+npm install
+npm run dev
 ```
-
-The API will be available at `http://localhost:8000`.
-
-### 3. Run Ingestion (Manually)
-You can trigger the ingestion process via the API:
-```bash
-curl -X POST http://localhost:8000/trigger-ingest
-```
-Or run the ingestion logic directly via python shell (see `app/ingest.py`).
-
-### 4. Docker Build
-To build the image locally:
-```bash
-docker build -t music-analyser-backend ./backend
-```

backend/alembic.ini (new file, 147 lines)

@@ -0,0 +1,147 @@
# A generic, single database configuration.
[alembic]
# path to migration scripts.
# this is typically a path given in POSIX (e.g. forward slashes)
# format, relative to the token %(here)s which refers to the location of this
# ini file
script_location = %(here)s/alembic
# template used to generate migration file names; The default value is %%(rev)s_%%(slug)s
# Uncomment the line below if you want the files to be prepended with date and time
# see https://alembic.sqlalchemy.org/en/latest/tutorial.html#editing-the-ini-file
# for all available tokens
# file_template = %%(year)d_%%(month).2d_%%(day).2d_%%(hour).2d%%(minute).2d-%%(rev)s_%%(slug)s
# sys.path path, will be prepended to sys.path if present.
# defaults to the current working directory. for multiple paths, the path separator
# is defined by "path_separator" below.
prepend_sys_path = .
# timezone to use when rendering the date within the migration file
# as well as the filename.
# If specified, requires the tzdata library which can be installed by adding
# `alembic[tz]` to the pip requirements.
# string value is passed to ZoneInfo()
# leave blank for localtime
# timezone =
# max length of characters to apply to the "slug" field
# truncate_slug_length = 40
# set to 'true' to run the environment during
# the 'revision' command, regardless of autogenerate
# revision_environment = false
# set to 'true' to allow .pyc and .pyo files without
# a source .py file to be detected as revisions in the
# versions/ directory
# sourceless = false
# version location specification; This defaults
# to <script_location>/versions. When using multiple version
# directories, initial revisions must be specified with --version-path.
# The path separator used here should be the separator specified by "path_separator"
# below.
# version_locations = %(here)s/bar:%(here)s/bat:%(here)s/alembic/versions
# path_separator; This indicates what character is used to split lists of file
# paths, including version_locations and prepend_sys_path within configparser
# files such as alembic.ini.
# The default rendered in new alembic.ini files is "os", which uses os.pathsep
# to provide os-dependent path splitting.
#
# Note that in order to support legacy alembic.ini files, this default does NOT
# take place if path_separator is not present in alembic.ini. If this
# option is omitted entirely, fallback logic is as follows:
#
# 1. Parsing of the version_locations option falls back to using the legacy
# "version_path_separator" key, which if absent then falls back to the legacy
# behavior of splitting on spaces and/or commas.
# 2. Parsing of the prepend_sys_path option falls back to the legacy
# behavior of splitting on spaces, commas, or colons.
#
# Valid values for path_separator are:
#
# path_separator = :
# path_separator = ;
# path_separator = space
# path_separator = newline
#
# Use os.pathsep. Default configuration used for new projects.
path_separator = os
# set to 'true' to search source files recursively
# in each "version_locations" directory
# new in Alembic version 1.10
# recursive_version_locations = false
# the output encoding used when revision files
# are written from script.py.mako
# output_encoding = utf-8
# database URL. This is consumed by the user-maintained env.py script only.
# other means of configuring database URLs may be customized within the env.py
# file.
sqlalchemy.url = driver://user:pass@localhost/dbname
[post_write_hooks]
# post_write_hooks defines scripts or Python functions that are run
# on newly generated revision scripts. See the documentation for further
# detail and examples
# format using "black" - use the console_scripts runner, against the "black" entrypoint
# hooks = black
# black.type = console_scripts
# black.entrypoint = black
# black.options = -l 79 REVISION_SCRIPT_FILENAME
# lint with attempts to fix using "ruff" - use the module runner, against the "ruff" module
# hooks = ruff
# ruff.type = module
# ruff.module = ruff
# ruff.options = check --fix REVISION_SCRIPT_FILENAME
# Alternatively, use the exec runner to execute a binary found on your PATH
# hooks = ruff
# ruff.type = exec
# ruff.executable = ruff
# ruff.options = check --fix REVISION_SCRIPT_FILENAME
# Logging configuration. This is also consumed by the user-maintained
# env.py script only.
[loggers]
keys = root,sqlalchemy,alembic
[handlers]
keys = console
[formatters]
keys = generic
[logger_root]
level = WARNING
handlers = console
qualname =
[logger_sqlalchemy]
level = WARNING
handlers =
qualname = sqlalchemy.engine
[logger_alembic]
level = INFO
handlers =
qualname = alembic
[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic
[formatter_generic]
format = %(levelname)-5.5s [%(name)s] %(message)s
datefmt = %H:%M:%S

backend/alembic/README (new file, 1 line)

@@ -0,0 +1 @@
Generic single-database configuration.

backend/alembic/env.py (new file, 87 lines)

@@ -0,0 +1,87 @@
from logging.config import fileConfig
import os
import sys

from sqlalchemy import engine_from_config
from sqlalchemy import pool

from alembic import context

# Add app to path to import models
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from app.database import Base
from app.models import *  # Import models to register them

# this is the Alembic Config object, which provides
# access to the values within the .ini file in use.
config = context.config

# Interpret the config file for Python logging.
# This line sets up loggers basically.
if config.config_file_name is not None:
    fileConfig(config.config_file_name)

# add your model's MetaData object here
# for 'autogenerate' support
target_metadata = Base.metadata

# other values from the config, defined by the needs of env.py,
# can be acquired:
# my_important_option = config.get_main_option("my_important_option")
# ... etc.

# Override sqlalchemy.url with our app's URL
config.set_main_option("sqlalchemy.url", "sqlite:///./music.db")


def run_migrations_offline() -> None:
    """Run migrations in 'offline' mode.

    This configures the context with just a URL
    and not an Engine, though an Engine is acceptable
    here as well. By skipping the Engine creation
    we don't even need a DBAPI to be available.

    Calls to context.execute() here emit the given string to the
    script output.
    """
    url = config.get_main_option("sqlalchemy.url")
    context.configure(
        url=url,
        target_metadata=target_metadata,
        literal_binds=True,
        dialect_opts={"paramstyle": "named"},
    )

    with context.begin_transaction():
        context.run_migrations()


def run_migrations_online() -> None:
    """Run migrations in 'online' mode.

    In this scenario we need to create an Engine
    and associate a connection with the context.
    """
    connectable = engine_from_config(
        config.get_section(config.config_ini_section, {}),
        prefix="sqlalchemy.",
        poolclass=pool.NullPool,
    )

    with connectable.connect() as connection:
        context.configure(
            connection=connection, target_metadata=target_metadata
        )

        with context.begin_transaction():
            context.run_migrations()


if context.is_offline_mode():
    run_migrations_offline()
else:
    run_migrations_online()

backend/alembic/script.py.mako (new file, 28 lines)

@@ -0,0 +1,28 @@
"""${message}
Revision ID: ${up_revision}
Revises: ${down_revision | comma,n}
Create Date: ${create_date}
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
${imports if imports else ""}
# revision identifiers, used by Alembic.
revision: str = ${repr(up_revision)}
down_revision: Union[str, Sequence[str], None] = ${repr(down_revision)}
branch_labels: Union[str, Sequence[str], None] = ${repr(branch_labels)}
depends_on: Union[str, Sequence[str], None] = ${repr(depends_on)}
def upgrade() -> None:
"""Upgrade schema."""
${upgrades if upgrades else "pass"}
def downgrade() -> None:
"""Downgrade schema."""
${downgrades if downgrades else "pass"}

backend/alembic/versions/707387fe1be2_initial_schema_complete.py (new file, 73 lines)

@@ -0,0 +1,73 @@
"""Initial Schema Complete
Revision ID: 707387fe1be2
Revises:
Create Date: 2025-12-24 21:23:43.744292
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = '707387fe1be2'
down_revision: Union[str, Sequence[str], None] = None
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
"""Upgrade schema."""
# ### commands auto generated by Alembic - please adjust! ###
op.create_table('tracks',
sa.Column('id', sa.String(), nullable=False),
sa.Column('name', sa.String(), nullable=True),
sa.Column('artist', sa.String(), nullable=True),
sa.Column('album', sa.String(), nullable=True),
sa.Column('duration_ms', sa.Integer(), nullable=True),
sa.Column('popularity', sa.Integer(), nullable=True),
sa.Column('raw_data', sa.JSON(), nullable=True),
sa.Column('danceability', sa.Float(), nullable=True),
sa.Column('energy', sa.Float(), nullable=True),
sa.Column('key', sa.Integer(), nullable=True),
sa.Column('loudness', sa.Float(), nullable=True),
sa.Column('mode', sa.Integer(), nullable=True),
sa.Column('speechiness', sa.Float(), nullable=True),
sa.Column('acousticness', sa.Float(), nullable=True),
sa.Column('instrumentalness', sa.Float(), nullable=True),
sa.Column('liveness', sa.Float(), nullable=True),
sa.Column('valence', sa.Float(), nullable=True),
sa.Column('tempo', sa.Float(), nullable=True),
sa.Column('time_signature', sa.Integer(), nullable=True),
sa.Column('genres', sa.JSON(), nullable=True),
sa.Column('lyrics_summary', sa.String(), nullable=True),
sa.Column('genre_tags', sa.String(), nullable=True),
sa.Column('created_at', sa.DateTime(), nullable=True),
sa.Column('updated_at', sa.DateTime(), nullable=True),
sa.PrimaryKeyConstraint('id')
)
op.create_index(op.f('ix_tracks_id'), 'tracks', ['id'], unique=False)
op.create_table('play_history',
sa.Column('id', sa.Integer(), nullable=False),
sa.Column('track_id', sa.String(), nullable=True),
sa.Column('played_at', sa.DateTime(), nullable=True),
sa.Column('context_uri', sa.String(), nullable=True),
sa.ForeignKeyConstraint(['track_id'], ['tracks.id'], ),
sa.PrimaryKeyConstraint('id')
)
op.create_index(op.f('ix_play_history_id'), 'play_history', ['id'], unique=False)
op.create_index(op.f('ix_play_history_played_at'), 'play_history', ['played_at'], unique=False)
# ### end Alembic commands ###
def downgrade() -> None:
"""Downgrade schema."""
# ### commands auto generated by Alembic - please adjust! ###
op.drop_index(op.f('ix_play_history_played_at'), table_name='play_history')
op.drop_index(op.f('ix_play_history_id'), table_name='play_history')
op.drop_table('play_history')
op.drop_index(op.f('ix_tracks_id'), table_name='tracks')
op.drop_table('tracks')
# ### end Alembic commands ###

backend/app/ingest.py

@@ -5,6 +5,7 @@ from sqlalchemy.orm import Session
 from .models import Track, PlayHistory
 from .database import SessionLocal
 from .services.spotify_client import SpotifyClient
+from .services.reccobeats_client import ReccoBeatsClient
 from dateutil import parser
 
 # Initialize Spotify Client (env vars will be populated later)
@@ -15,10 +16,93 @@ def get_spotify_client():
         refresh_token=os.getenv("SPOTIFY_REFRESH_TOKEN"),
     )
 
+def get_reccobeats_client():
+    return ReccoBeatsClient()
+
+async def enrich_tracks(db: Session, spotify_client: SpotifyClient, recco_client: ReccoBeatsClient):
+    """
+    Finds tracks missing genres (Spotify) or audio features (ReccoBeats) and enriches them.
+    """
+    # 1. Enrich Audio Features (via ReccoBeats)
+    tracks_missing_features = db.query(Track).filter(Track.danceability == None).limit(50).all()
+    print(f"DEBUG: Found {len(tracks_missing_features)} tracks missing audio features.")
+    if tracks_missing_features:
+        print(f"Enriching {len(tracks_missing_features)} tracks with audio features (ReccoBeats)...")
+        ids = [t.id for t in tracks_missing_features]
+        features_list = await recco_client.get_audio_features(ids)
+
+        features_map = {}
+        for f in features_list:
+            if "href" in f and "track/" in f["href"]:
+                tid = f["href"].split("track/")[1].split("?")[0]
+                features_map[tid] = f
+
+        updated_count = 0
+        for track in tracks_missing_features:
+            data = features_map.get(track.id)
+            if data:
+                track.danceability = data.get("danceability")
+                track.energy = data.get("energy")
+                track.key = data.get("key")
+                track.loudness = data.get("loudness")
+                track.mode = data.get("mode")
+                track.speechiness = data.get("speechiness")
+                track.acousticness = data.get("acousticness")
+                track.instrumentalness = data.get("instrumentalness")
+                track.liveness = data.get("liveness")
+                track.valence = data.get("valence")
+                track.tempo = data.get("tempo")
+                updated_count += 1
+        print(f"Updated {updated_count} tracks with audio features.")
+        db.commit()
+
+    # 2. Enrich Genres (via Spotify Artists)
+    tracks_missing_genres = db.query(Track).filter(Track.genres == None).limit(50).all()
+    if tracks_missing_genres:
+        print(f"Enriching {len(tracks_missing_genres)} tracks with genres (Spotify)...")
+        artist_ids = set()
+        track_artist_map = {}
+        for t in tracks_missing_genres:
+            if t.raw_data and "artists" in t.raw_data:
+                a_ids = [a["id"] for a in t.raw_data["artists"]]
+                artist_ids.update(a_ids)
+                track_artist_map[t.id] = a_ids
+
+        artist_ids_list = list(artist_ids)
+        artist_genre_map = {}
+        for i in range(0, len(artist_ids_list), 50):
+            chunk = artist_ids_list[i:i+50]
+            artists_data = await spotify_client.get_artists(chunk)
+            for a_data in artists_data:
+                if a_data:
+                    artist_genre_map[a_data["id"]] = a_data.get("genres", [])
+
+        for t in tracks_missing_genres:
+            a_ids = track_artist_map.get(t.id, [])
+            combined_genres = set()
+            for a_id in a_ids:
+                genres = artist_genre_map.get(a_id, [])
+                combined_genres.update(genres)
+            t.genres = list(combined_genres)
+        db.commit()
+
 async def ingest_recently_played(db: Session):
-    client = get_spotify_client()
+    spotify_client = get_spotify_client()
+    recco_client = get_reccobeats_client()
     try:
-        items = await client.get_recently_played(limit=50)
+        items = await spotify_client.get_recently_played(limit=50)
     except Exception as e:
         print(f"Error connecting to Spotify: {e}")
         return
@@ -30,7 +114,6 @@ async def ingest_recently_played(db: Session):
         played_at_str = item["played_at"]
         played_at = parser.isoparse(played_at_str)
 
-        # 1. Check if track exists, if not create it
         track_id = track_data["id"]
         track = db.query(Track).filter(Track.id == track_id).first()
@@ -46,10 +129,8 @@ async def ingest_recently_played(db: Session):
                 raw_data=track_data
             )
             db.add(track)
-            db.commit() # Commit immediately so ID exists for foreign key
+            db.commit()
 
-        # 2. Check if this specific play instance exists
-        # We assume (track_id, played_at) is unique enough
         exists = db.query(PlayHistory).filter(
             PlayHistory.track_id == track_id,
             PlayHistory.played_at == played_at
@@ -66,9 +147,13 @@ async def ingest_recently_played(db: Session):
 
     db.commit()
 
+    # Enrich
+    await enrich_tracks(db, spotify_client, recco_client)
+
 async def run_worker():
     """Simulates a background worker loop."""
     db = SessionLocal()
     try:
         while True:
             print("Worker: Polling Spotify...")

backend/app/models.py

@@ -1,4 +1,4 @@
-from sqlalchemy import Column, Integer, String, DateTime, JSON, ForeignKey, Boolean
+from sqlalchemy import Column, Integer, String, DateTime, JSON, ForeignKey, Float
 from sqlalchemy.orm import relationship
 from datetime import datetime
 from .database import Base
@@ -16,6 +16,24 @@ class Track(Base):
     # Store raw full JSON response for future-proofing analysis
     raw_data = Column(JSON, nullable=True)
 
+    # Enriched Data (Phase 3 Prep)
+    # Audio Features
+    danceability = Column(Float, nullable=True)
+    energy = Column(Float, nullable=True)
+    key = Column(Integer, nullable=True)
+    loudness = Column(Float, nullable=True)
+    mode = Column(Integer, nullable=True)
+    speechiness = Column(Float, nullable=True)
+    acousticness = Column(Float, nullable=True)
+    instrumentalness = Column(Float, nullable=True)
+    liveness = Column(Float, nullable=True)
+    valence = Column(Float, nullable=True)
+    tempo = Column(Float, nullable=True)
+    time_signature = Column(Integer, nullable=True)
+
+    # Genres (stored as JSON list of strings)
+    genres = Column(JSON, nullable=True)
+
     # AI Analysis fields
     lyrics_summary = Column(String, nullable=True)
     genre_tags = Column(String, nullable=True)  # JSON list stored as string or just raw JSON

backend/app/services/reccobeats_client.py (new file, 18 lines)

@@ -0,0 +1,18 @@
import httpx
from typing import List, Dict, Any

RECCOBEATS_API_URL = "https://api.reccobeats.com/v1/audio-features"


class ReccoBeatsClient:
    async def get_audio_features(self, spotify_ids: List[str]) -> List[Dict[str, Any]]:
        if not spotify_ids:
            return []
        ids_param = ",".join(spotify_ids)
        async with httpx.AsyncClient() as client:
            try:
                response = await client.get(RECCOBEATS_API_URL, params={"ids": ids_param})
                if response.status_code != 200:
                    return []
                return response.json().get("content", [])
            except Exception:
                return []

backend/app/services/spotify_client.py

@@ -3,6 +3,7 @@ import base64
 import time
 import httpx
 from fastapi import HTTPException
+from typing import List, Dict, Any
 
 SPOTIFY_TOKEN_URL = "https://accounts.spotify.com/api/token"
 SPOTIFY_API_BASE = "https://api.spotify.com/v1"
@@ -68,3 +69,26 @@ class SpotifyClient:
         if response.status_code != 200:
             return None
         return response.json()
+
+    async def get_artists(self, artist_ids: List[str]) -> List[Dict[str, Any]]:
+        """
+        Fetches artist details (including genres) for a list of artist IDs.
+        Spotify allows up to 50 IDs per request.
+        """
+        if not artist_ids:
+            return []
+        token = await self.get_access_token()
+        ids_param = ",".join(artist_ids)
+        async with httpx.AsyncClient() as client:
+            response = await client.get(
+                f"{SPOTIFY_API_BASE}/artists",
+                params={"ids": ids_param},
+                headers={"Authorization": f"Bearer {token}"},
+            )
+            if response.status_code != 200:
+                print(f"Error fetching artists: {response.text}")
+                return []
+            return response.json().get("artists", [])

backend/requirements.txt

@@ -9,3 +9,4 @@ google-generativeai==0.3.2
 tenacity==8.2.3
 python-dateutil==2.9.0.post0
 requests==2.31.0
+alembic==1.13.1


docker-compose.yml (new file, 26 lines)

@@ -0,0 +1,26 @@
version: '3.8'

services:
  backend:
    build:
      context: ./backend
    image: ghcr.io/bnair123/musicanalyser:latest
    container_name: music-analyser-backend
    restart: unless-stopped
    volumes:
      - /opt/mySpotify/music.db:/app/music.db
    environment:
      - SPOTIFY_CLIENT_ID=${SPOTIFY_CLIENT_ID}
      - SPOTIFY_CLIENT_SECRET=${SPOTIFY_CLIENT_SECRET}
      - SPOTIFY_REFRESH_TOKEN=${SPOTIFY_REFRESH_TOKEN}
      - GEMINI_API_KEY=${GEMINI_API_KEY}
    ports:
      - '8000:8000'

  frontend:
    build:
      context: ./frontend
    container_name: music-analyser-frontend
    restart: unless-stopped
    ports:
      - '8991:80'
    depends_on:
      - backend

frontend/Dockerfile (new file, 11 lines)

@@ -0,0 +1,11 @@
FROM node:18-alpine as build
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build

FROM nginx:alpine
COPY --from=build /app/dist /usr/share/nginx/html
COPY nginx.conf /etc/nginx/conf.d/default.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

frontend/nginx.conf (new file, 13 lines)

@@ -0,0 +1,13 @@
server {
    listen 80;

    location / {
        root /usr/share/nginx/html;
        index index.html index.htm;
        try_files $uri $uri/ /index.html;
    }

    location /api/ {
        proxy_pass http://backend:8000/;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}