4 Commits

Author SHA1 Message Date
google-labs-jules[bot]
0ca9893c68 Implement Phase 2 Frontend and Phase 3 Data Enrichment
- Initialize React+Vite Frontend with Ant Design Dashboard.
- Implement Data Enrichment: ReccoBeats (Audio Features) and Spotify (Genres).
- Update Database Schema via Alembic Migrations.
- Add Docker support (Dockerfile, docker-compose.yml).
- Update README with hosting instructions.
2025-12-24 21:34:36 +00:00
bnair123
3a424d15a5 Add project context and documentation for Music Analyser
This document outlines the vision, technical decisions, current architecture, and future roadmap for the Music Analyser project. It serves as a guide for future AI agents or developers.
2025-12-24 22:03:18 +04:00
bnair123
4ca4c7befd Enhance Docker publish workflow with metadata and caching
Added environment variables for registry and image name. Updated Docker build and push steps to include metadata extraction and caching.
2025-12-24 21:54:04 +04:00
bnair123
b502e95652 Merge pull request #1 from bnair123/setup-initial-backend-8149240771439055261
Initial Backend Setup
2025-12-24 21:30:32 +04:00
17 changed files with 757 additions and 65 deletions

.github/workflows/docker-publish.yml

@@ -6,9 +6,17 @@ on:
   pull_request:
     branches: [ "main" ]
 
+env:
+  REGISTRY: ghcr.io
+  IMAGE_NAME: ${{ github.repository }}
+
 jobs:
   build:
     runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      packages: write
     steps:
       - uses: actions/checkout@v3
@@ -18,9 +26,32 @@ jobs:
       - name: Set up Docker Buildx
         uses: docker/setup-buildx-action@v3
 
+      - name: Log in to the Container registry
+        uses: docker/login-action@v3
+        with:
+          registry: ${{ env.REGISTRY }}
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Extract metadata (tags, labels) for Docker
+        id: meta
+        uses: docker/metadata-action@v5
+        with:
+          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
+          tags: |
+            type=ref,event=branch
+            type=ref,event=pr
+            type=semver,pattern={{version}}
+            type=sha
+            latest
+
       - name: Build and push
         uses: docker/build-push-action@v5
         with:
           context: ./backend
-          push: false
-          tags: user/app:latest
+          push: true
+          tags: ${{ steps.meta.outputs.tags }}
+          labels: ${{ steps.meta.outputs.labels }}
+          platforms: linux/amd64,linux/arm64
+          cache-from: type=gha
+          cache-to: type=gha,mode=max

Context.md (new file, 114 lines)

@@ -0,0 +1,114 @@
# Music Analyser - Project Context & Documentation
This document serves as a comprehensive guide to the **Music Analyser** project. It outlines the vision, technical decisions, current architecture, and future roadmap. **Use this document to provide context to future AI agents or developers.**
## 1. Project Vision
The goal of this project is to build a personal analytics dashboard that:
1. **Regularly queries** the Spotify API (24/7) to collect a complete history of listening habits.
2. Stores this data locally (or in a private database) to ensure ownership and completeness.
3. Provides **rich analysis** and visualizations (similar to "Spotify Wrapped" but on-demand and more detailed).
4. Integrates **AI (Google Gemini)** to provide qualitative insights, summaries, and trend analysis (e.g., "You started the week with high-energy pop but shifted to lo-fi study beats by Friday").
## 2. Roadmap & Phases
### Phase 1: Foundation & Data Collection (Current Status: ✅ COMPLETED)
- **Goal:** Reliable data ingestion and storage.
- **Deliverables:**
- FastAPI Backend.
- SQLite Database (with SQLAlchemy).
- Spotify OAuth logic (Refresh Token flow).
- Background Worker for 24/7 polling.
- Docker containerization + GitHub Actions (Multi-arch build).
### Phase 2: Visualization (Next Step)
- **Goal:** View the raw data.
- **Deliverables:**
- Frontend (React + Vite).
- Basic Data Table / List View of listening history.
- Basic filtering (by date, artist).
### Phase 3: Analysis & AI
- **Goal:** Deep insights.
- **Deliverables:**
- Advanced charts/graphs.
- AI Integration (Gemini 2.5/3 Flash) to generate text summaries of listening trends.
- Email reports (optional).
## 3. Technical Architecture
### Backend
- **Language:** Python 3.11+
- **Framework:** FastAPI (High performance, easy to use).
- **Dependencies:** `httpx` (Async HTTP), `sqlalchemy` (ORM), `pydantic` (Validation).
### Database
- **Current:** SQLite (`music.db`).
- *Decision:* Chosen for simplicity in Phase 1.
- **Future path:** The code uses SQLAlchemy, so migrating to **PostgreSQL** (e.g., Supabase) only requires changing the connection string in `database.py`.
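As a sketch of that migration path (assuming `database.py` builds its engine from a single URL string), only the URL handed to `create_engine` would change:

```python
from sqlalchemy import create_engine

# Current Phase 1 setup: a local SQLite file, as in database.py today.
engine = create_engine("sqlite:///./music.db")

# Future: point the same ORM code at PostgreSQL (e.g. Supabase).
# Requires a driver such as psycopg2; URL shape shown for illustration only.
# engine = create_engine("postgresql+psycopg2://user:pass@host:5432/music")

print(engine.url.drivername)  # sqlite
```

Because all queries go through the ORM, none of the model or query code needs to change with the URL.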
### Database Schema
1. **`Track` Table:**
- Stores unique tracks.
- Columns: `id` (Spotify ID), `name`, `artist`, `album`, `duration_ms`, `metadata_json` (Stores the *entire* raw Spotify JSON response for future-proofing).
2. **`PlayHistory` Table:**
- Stores the instances of listening.
- Columns: `id`, `track_id` (FK), `played_at` (Timestamp), `context_uri`.
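A minimal declarative sketch of the two tables above (column names follow the descriptions here; the shipped `app/models.py` may differ in detail, e.g. in the name of the raw-JSON column):

```python
from sqlalchemy import JSON, Column, DateTime, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class Track(Base):
    __tablename__ = "tracks"
    id = Column(String, primary_key=True)  # Spotify track ID
    name = Column(String)
    artist = Column(String)
    album = Column(String)
    duration_ms = Column(Integer)
    metadata_json = Column(JSON)           # entire raw Spotify response

class PlayHistory(Base):
    __tablename__ = "play_history"
    id = Column(Integer, primary_key=True)
    track_id = Column(String, ForeignKey("tracks.id"))
    played_at = Column(DateTime, index=True)  # indexed: most queries filter by time
    context_uri = Column(String)
    track = relationship("Track")
```

Splitting tracks from plays keeps the hot `play_history` table narrow while the bulky raw JSON lives once per track.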
### Authentication Strategy
- **Challenge:** The background worker runs headless (no user to click "Login").
- **Solution:** We use the **Authorization Code Flow with Refresh Tokens**.
1. User runs the local helper script (`backend/scripts/get_refresh_token.py`) once.
2. This generates a long-lived `SPOTIFY_REFRESH_TOKEN`.
3. The backend uses this token to automatically request new short-lived Access Tokens whenever needed.
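The refresh exchange in step 3 can be sketched as follows (a standalone illustration of Spotify's refresh-token grant; the function names here are illustrative, not the repo's — the project's `SpotifyClient` wraps the same flow):

```python
import base64
import os

SPOTIFY_TOKEN_URL = "https://accounts.spotify.com/api/token"

def build_refresh_request(client_id: str, client_secret: str, refresh_token: str):
    """Build headers/body for Spotify's refresh_token grant (Basic auth + form body)."""
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    headers = {"Authorization": f"Basic {basic}"}
    data = {"grant_type": "refresh_token", "refresh_token": refresh_token}
    return headers, data

async def fetch_access_token() -> str:
    import httpx  # project dependency (backend/requirements.txt)

    headers, data = build_refresh_request(
        os.environ["SPOTIFY_CLIENT_ID"],
        os.environ["SPOTIFY_CLIENT_SECRET"],
        os.environ["SPOTIFY_REFRESH_TOKEN"],
    )
    async with httpx.AsyncClient() as client:
        resp = await client.post(SPOTIFY_TOKEN_URL, headers=headers, data=data)
        resp.raise_for_status()
        return resp.json()["access_token"]  # short-lived (~1 hour)
```

Because the refresh token never expires in normal use, the worker can run headless indefinitely after the one-time helper-script run.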
### Background Worker Logic
- **File:** `backend/run_worker.py` -> `backend/app/ingest.py`
- **Process:**
1. Worker wakes up every 60 seconds.
2. Calls Spotify `recently-played` endpoint (limit 50).
3. Iterates through tracks.
4. **Deduplication:** Checks `(track_id, played_at)` against the DB. If it exists, skip. If not, insert.
5. **Metadata:** If the track is new to the system, it saves the full metadata immediately.
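The dedup check in step 4 can be sketched with the stdlib `sqlite3` module (the real worker does the same check through SQLAlchemy; table and column names follow the schema above, and the helper name `record_play` is illustrative):

```python
import sqlite3

def record_play(conn: sqlite3.Connection, track_id: str, played_at: str) -> bool:
    """Insert a play unless the (track_id, played_at) pair already exists.

    Returns True if a new row was inserted, False if it was a duplicate.
    """
    cur = conn.execute(
        "SELECT 1 FROM play_history WHERE track_id = ? AND played_at = ?",
        (track_id, played_at),
    )
    if cur.fetchone() is not None:
        return False  # already recorded: skip
    conn.execute(
        "INSERT INTO play_history (track_id, played_at) VALUES (?, ?)",
        (track_id, played_at),
    )
    conn.commit()
    return True

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE play_history (id INTEGER PRIMARY KEY, track_id TEXT, played_at TEXT)"
)
print(record_play(conn, "abc123", "2025-12-24T21:00:00Z"))  # True (new play)
print(record_play(conn, "abc123", "2025-12-24T21:00:00Z"))  # False (duplicate)
```

This skip-or-insert decision is what makes ingestion idempotent: re-processing the same 50-item `recently-played` page every 60 seconds is harmless.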
### AI Integration
- **Model:** Google Gemini (Target: 2.5 Flash or 3 Flash).
- **Status:** Service class exists (`AIService`) but is not yet fully wired into the daily workflow.
### Deployment
- **Docker:** Multi-stage build (python-slim).
- **CI/CD:** GitHub Actions workflow (`docker-publish.yml`).
- Builds for `linux/amd64` and `linux/arm64`.
- Pushes to GitHub Container Registry (ghcr.io).
## 4. How to Run
### Prerequisites
- Spotify Client ID & Secret.
- Google Gemini API Key.
- Docker (optional).
### Local Development
1. **Setup Env:**
```bash
cp backend/.env.example backend/.env
# Fill in details
```
2. **Install:**
```bash
cd backend
pip install -r requirements.txt
```
3. **Run API:**
```bash
uvicorn app.main:app --reload
```
4. **Run Worker:**
```bash
python run_worker.py
```
### Docker
```bash
docker build -t music-analyser-backend ./backend
docker run --env-file backend/.env music-analyser-backend
```

README.md

@@ -1,27 +1,37 @@
 # Music Analyser
-A personal analytics dashboard for your music listening habits, powered by Python, FastAPI, and Google Gemini AI.
+A personal analytics dashboard for your music listening habits, powered by Python, FastAPI, React, and Google Gemini AI.
 
 ## Project Structure
 - `backend/`: FastAPI backend for data ingestion and API.
-  - `app/ingest.py`: Background worker that polls Spotify.
-  - `app/services/`: Logic for Spotify and Gemini APIs.
+  - `app/ingest.py`: Background worker that polls Spotify and enriches data via ReccoBeats.
+  - `app/services/`: Logic for Spotify, ReccoBeats, and Gemini APIs.
   - `app/models.py`: Database schema (Tracks, PlayHistory).
-- `frontend/`: (Coming Soon) React/Vite frontend.
+- `frontend/`: React + Vite frontend for visualizing the dashboard.
+- `docker-compose.yml`: For easy deployment.
 
-## Getting Started
+## Features
+- **Continuous Ingestion**: Polls Spotify every 60 seconds to record your listening history.
+- **Data Enrichment**: Automatically fetches **Genres** (via Spotify) and **Audio Features** (Energy, BPM, Mood via ReccoBeats).
+- **Dashboard**: A responsive UI to view your history and stats.
+- **AI Ready**: Database schema and environment prepared for Gemini AI integration.
 
-### Prerequisites
-- Docker & Docker Compose (optional, for containerization)
-- Python 3.11+ (for local dev)
-- A Spotify Developer App (Client ID & Secret)
-- A Google Gemini API Key
+## Hosting Guide (Docker)
+This application is designed to run via Docker Compose.
 
-### 1. Setup Environment Variables
-Create a `.env` file in the `backend/` directory:
+### 1. Prerequisites
+- Docker & Docker Compose installed.
+- **Spotify Developer Credentials** (Client ID & Secret).
+- **Spotify Refresh Token** (Run `backend/scripts/get_refresh_token.py` locally to generate this).
+- **Google Gemini API Key**.
+
+### 2. Deployment
+1. **Clone the repository**.
+2. **Create a `.env` file** in the root directory (or use environment variables directly):
 
 ```bash
 SPOTIFY_CLIENT_ID="your_client_id"
@@ -30,43 +40,48 @@ SPOTIFY_REFRESH_TOKEN="your_refresh_token"
 GEMINI_API_KEY="your_gemini_key"
 ```
 
-To get the `SPOTIFY_REFRESH_TOKEN`, run the helper script:
+3. **Run with Docker Compose**:
 ```bash
-python backend/scripts/get_refresh_token.py
+docker-compose up -d --build
 ```
 
-### 2. Run Locally
+This will:
+- Build and start the **Backend** (port 8000).
+- Build and start the **Frontend** (port 8991).
+- Create a **Persistent Volume** at `/opt/mySpotify` (mapped to the container's database) to ensure **no data loss** during updates.
 
-Install dependencies:
+4. **Access the Dashboard**:
+   Open your browser to `http://localhost:8991` (or your server IP).
+
+### 3. Data Persistence & Updates
+- **Data**: All data is stored in `music.db` inside the container, which is mounted to `/opt/mySpotify/music.db` on your host machine.
+- **Migrations**: The project uses **Alembic** for database migrations. When you update the container image in the future, the backend will automatically apply any schema changes without deleting your data.
+
+### 4. Pulling from Registry (Alternative)
+If you prefer to pull the pre-built image instead of building locally:
+```bash
+docker pull ghcr.io/bnair123/musicanalyser:latest
+```
+(Note: You still need to mount the volume and pass environment variables as shown in `docker-compose.yml`).
+
+## Local Development
+1. **Backend**:
 ```bash
 cd backend
 pip install -r requirements.txt
+python run_worker.py  # Starts ingestion
+uvicorn app.main:app --reload  # Starts API
 ```
 
-Run the server:
+2. **Frontend**:
 ```bash
-uvicorn app.main:app --reload
+cd frontend
+npm install
+npm run dev
 ```
-
-The API will be available at `http://localhost:8000`.
-
-### 3. Run Ingestion (Manually)
-You can trigger the ingestion process via the API:
-```bash
-curl -X POST http://localhost:8000/trigger-ingest
-```
-Or run the ingestion logic directly via python shell (see `app/ingest.py`).
-
-### 4. Docker Build
-To build the image locally:
-```bash
-docker build -t music-analyser-backend ./backend
-```

backend/alembic.ini (new file, 147 lines)

@@ -0,0 +1,147 @@
# A generic, single database configuration.
[alembic]
# path to migration scripts.
# this is typically a path given in POSIX (e.g. forward slashes)
# format, relative to the token %(here)s which refers to the location of this
# ini file
script_location = %(here)s/alembic
# template used to generate migration file names; The default value is %%(rev)s_%%(slug)s
# Uncomment the line below if you want the files to be prepended with date and time
# see https://alembic.sqlalchemy.org/en/latest/tutorial.html#editing-the-ini-file
# for all available tokens
# file_template = %%(year)d_%%(month).2d_%%(day).2d_%%(hour).2d%%(minute).2d-%%(rev)s_%%(slug)s
# sys.path path, will be prepended to sys.path if present.
# defaults to the current working directory. for multiple paths, the path separator
# is defined by "path_separator" below.
prepend_sys_path = .
# timezone to use when rendering the date within the migration file
# as well as the filename.
# If specified, requires the tzdata library which can be installed by adding
# `alembic[tz]` to the pip requirements.
# string value is passed to ZoneInfo()
# leave blank for localtime
# timezone =
# max length of characters to apply to the "slug" field
# truncate_slug_length = 40
# set to 'true' to run the environment during
# the 'revision' command, regardless of autogenerate
# revision_environment = false
# set to 'true' to allow .pyc and .pyo files without
# a source .py file to be detected as revisions in the
# versions/ directory
# sourceless = false
# version location specification; This defaults
# to <script_location>/versions. When using multiple version
# directories, initial revisions must be specified with --version-path.
# The path separator used here should be the separator specified by "path_separator"
# below.
# version_locations = %(here)s/bar:%(here)s/bat:%(here)s/alembic/versions
# path_separator; This indicates what character is used to split lists of file
# paths, including version_locations and prepend_sys_path within configparser
# files such as alembic.ini.
# The default rendered in new alembic.ini files is "os", which uses os.pathsep
# to provide os-dependent path splitting.
#
# Note that in order to support legacy alembic.ini files, this default does NOT
# take place if path_separator is not present in alembic.ini. If this
# option is omitted entirely, fallback logic is as follows:
#
# 1. Parsing of the version_locations option falls back to using the legacy
# "version_path_separator" key, which if absent then falls back to the legacy
# behavior of splitting on spaces and/or commas.
# 2. Parsing of the prepend_sys_path option falls back to the legacy
# behavior of splitting on spaces, commas, or colons.
#
# Valid values for path_separator are:
#
# path_separator = :
# path_separator = ;
# path_separator = space
# path_separator = newline
#
# Use os.pathsep. Default configuration used for new projects.
path_separator = os
# set to 'true' to search source files recursively
# in each "version_locations" directory
# new in Alembic version 1.10
# recursive_version_locations = false
# the output encoding used when revision files
# are written from script.py.mako
# output_encoding = utf-8
# database URL. This is consumed by the user-maintained env.py script only.
# other means of configuring database URLs may be customized within the env.py
# file.
sqlalchemy.url = driver://user:pass@localhost/dbname
[post_write_hooks]
# post_write_hooks defines scripts or Python functions that are run
# on newly generated revision scripts. See the documentation for further
# detail and examples
# format using "black" - use the console_scripts runner, against the "black" entrypoint
# hooks = black
# black.type = console_scripts
# black.entrypoint = black
# black.options = -l 79 REVISION_SCRIPT_FILENAME
# lint with attempts to fix using "ruff" - use the module runner, against the "ruff" module
# hooks = ruff
# ruff.type = module
# ruff.module = ruff
# ruff.options = check --fix REVISION_SCRIPT_FILENAME
# Alternatively, use the exec runner to execute a binary found on your PATH
# hooks = ruff
# ruff.type = exec
# ruff.executable = ruff
# ruff.options = check --fix REVISION_SCRIPT_FILENAME
# Logging configuration. This is also consumed by the user-maintained
# env.py script only.
[loggers]
keys = root,sqlalchemy,alembic
[handlers]
keys = console
[formatters]
keys = generic
[logger_root]
level = WARNING
handlers = console
qualname =
[logger_sqlalchemy]
level = WARNING
handlers =
qualname = sqlalchemy.engine
[logger_alembic]
level = INFO
handlers =
qualname = alembic
[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic
[formatter_generic]
format = %(levelname)-5.5s [%(name)s] %(message)s
datefmt = %H:%M:%S

backend/alembic/README (new file, 1 line)

@@ -0,0 +1 @@
Generic single-database configuration.

backend/alembic/env.py (new file, 87 lines)

@@ -0,0 +1,87 @@
from logging.config import fileConfig
import os
import sys

from sqlalchemy import engine_from_config
from sqlalchemy import pool

from alembic import context

# Add app to path to import models
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from app.database import Base
from app.models import *  # Import models to register them

# this is the Alembic Config object, which provides
# access to the values within the .ini file in use.
config = context.config

# Interpret the config file for Python logging.
# This line sets up loggers basically.
if config.config_file_name is not None:
    fileConfig(config.config_file_name)

# add your model's MetaData object here
# for 'autogenerate' support
target_metadata = Base.metadata

# other values from the config, defined by the needs of env.py,
# can be acquired:
# my_important_option = config.get_main_option("my_important_option")
# ... etc.

# Override sqlalchemy.url with our app's URL
config.set_main_option("sqlalchemy.url", "sqlite:///./music.db")


def run_migrations_offline() -> None:
    """Run migrations in 'offline' mode.

    This configures the context with just a URL
    and not an Engine, though an Engine is acceptable
    here as well. By skipping the Engine creation
    we don't even need a DBAPI to be available.

    Calls to context.execute() here emit the given string to the
    script output.
    """
    url = config.get_main_option("sqlalchemy.url")
    context.configure(
        url=url,
        target_metadata=target_metadata,
        literal_binds=True,
        dialect_opts={"paramstyle": "named"},
    )

    with context.begin_transaction():
        context.run_migrations()


def run_migrations_online() -> None:
    """Run migrations in 'online' mode.

    In this scenario we need to create an Engine
    and associate a connection with the context.
    """
    connectable = engine_from_config(
        config.get_section(config.config_ini_section, {}),
        prefix="sqlalchemy.",
        poolclass=pool.NullPool,
    )

    with connectable.connect() as connection:
        context.configure(
            connection=connection, target_metadata=target_metadata
        )

        with context.begin_transaction():
            context.run_migrations()


if context.is_offline_mode():
    run_migrations_offline()
else:
    run_migrations_online()

backend/alembic/script.py.mako (new file, 28 lines)

@@ -0,0 +1,28 @@
"""${message}
Revision ID: ${up_revision}
Revises: ${down_revision | comma,n}
Create Date: ${create_date}
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
${imports if imports else ""}
# revision identifiers, used by Alembic.
revision: str = ${repr(up_revision)}
down_revision: Union[str, Sequence[str], None] = ${repr(down_revision)}
branch_labels: Union[str, Sequence[str], None] = ${repr(branch_labels)}
depends_on: Union[str, Sequence[str], None] = ${repr(depends_on)}
def upgrade() -> None:
"""Upgrade schema."""
${upgrades if upgrades else "pass"}
def downgrade() -> None:
"""Downgrade schema."""
${downgrades if downgrades else "pass"}

backend/alembic/versions/707387fe1be2_initial_schema_complete.py (new file, 73 lines)

@@ -0,0 +1,73 @@
"""Initial Schema Complete
Revision ID: 707387fe1be2
Revises:
Create Date: 2025-12-24 21:23:43.744292
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = '707387fe1be2'
down_revision: Union[str, Sequence[str], None] = None
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
"""Upgrade schema."""
# ### commands auto generated by Alembic - please adjust! ###
op.create_table('tracks',
sa.Column('id', sa.String(), nullable=False),
sa.Column('name', sa.String(), nullable=True),
sa.Column('artist', sa.String(), nullable=True),
sa.Column('album', sa.String(), nullable=True),
sa.Column('duration_ms', sa.Integer(), nullable=True),
sa.Column('popularity', sa.Integer(), nullable=True),
sa.Column('raw_data', sa.JSON(), nullable=True),
sa.Column('danceability', sa.Float(), nullable=True),
sa.Column('energy', sa.Float(), nullable=True),
sa.Column('key', sa.Integer(), nullable=True),
sa.Column('loudness', sa.Float(), nullable=True),
sa.Column('mode', sa.Integer(), nullable=True),
sa.Column('speechiness', sa.Float(), nullable=True),
sa.Column('acousticness', sa.Float(), nullable=True),
sa.Column('instrumentalness', sa.Float(), nullable=True),
sa.Column('liveness', sa.Float(), nullable=True),
sa.Column('valence', sa.Float(), nullable=True),
sa.Column('tempo', sa.Float(), nullable=True),
sa.Column('time_signature', sa.Integer(), nullable=True),
sa.Column('genres', sa.JSON(), nullable=True),
sa.Column('lyrics_summary', sa.String(), nullable=True),
sa.Column('genre_tags', sa.String(), nullable=True),
sa.Column('created_at', sa.DateTime(), nullable=True),
sa.Column('updated_at', sa.DateTime(), nullable=True),
sa.PrimaryKeyConstraint('id')
)
op.create_index(op.f('ix_tracks_id'), 'tracks', ['id'], unique=False)
op.create_table('play_history',
sa.Column('id', sa.Integer(), nullable=False),
sa.Column('track_id', sa.String(), nullable=True),
sa.Column('played_at', sa.DateTime(), nullable=True),
sa.Column('context_uri', sa.String(), nullable=True),
sa.ForeignKeyConstraint(['track_id'], ['tracks.id'], ),
sa.PrimaryKeyConstraint('id')
)
op.create_index(op.f('ix_play_history_id'), 'play_history', ['id'], unique=False)
op.create_index(op.f('ix_play_history_played_at'), 'play_history', ['played_at'], unique=False)
# ### end Alembic commands ###
def downgrade() -> None:
"""Downgrade schema."""
# ### commands auto generated by Alembic - please adjust! ###
op.drop_index(op.f('ix_play_history_played_at'), table_name='play_history')
op.drop_index(op.f('ix_play_history_id'), table_name='play_history')
op.drop_table('play_history')
op.drop_index(op.f('ix_tracks_id'), table_name='tracks')
op.drop_table('tracks')
# ### end Alembic commands ###

backend/app/ingest.py

@@ -5,6 +5,7 @@ from sqlalchemy.orm import Session
 from .models import Track, PlayHistory
 from .database import SessionLocal
 from .services.spotify_client import SpotifyClient
+from .services.reccobeats_client import ReccoBeatsClient
 from dateutil import parser
 
 # Initialize Spotify Client (env vars will be populated later)
@@ -15,10 +16,93 @@ def get_spotify_client():
         refresh_token=os.getenv("SPOTIFY_REFRESH_TOKEN"),
     )
 
+def get_reccobeats_client():
+    return ReccoBeatsClient()
+
+async def enrich_tracks(db: Session, spotify_client: SpotifyClient, recco_client: ReccoBeatsClient):
+    """
+    Finds tracks missing genres (Spotify) or audio features (ReccoBeats) and enriches them.
+    """
+    # 1. Enrich Audio Features (via ReccoBeats)
+    tracks_missing_features = db.query(Track).filter(Track.danceability == None).limit(50).all()
+    print(f"DEBUG: Found {len(tracks_missing_features)} tracks missing audio features.")
+    if tracks_missing_features:
+        print(f"Enriching {len(tracks_missing_features)} tracks with audio features (ReccoBeats)...")
+        ids = [t.id for t in tracks_missing_features]
+        features_list = await recco_client.get_audio_features(ids)
+
+        features_map = {}
+        for f in features_list:
+            if "href" in f and "track/" in f["href"]:
+                tid = f["href"].split("track/")[1].split("?")[0]
+                features_map[tid] = f
+
+        updated_count = 0
+        for track in tracks_missing_features:
+            data = features_map.get(track.id)
+            if data:
+                track.danceability = data.get("danceability")
+                track.energy = data.get("energy")
+                track.key = data.get("key")
+                track.loudness = data.get("loudness")
+                track.mode = data.get("mode")
+                track.speechiness = data.get("speechiness")
+                track.acousticness = data.get("acousticness")
+                track.instrumentalness = data.get("instrumentalness")
+                track.liveness = data.get("liveness")
+                track.valence = data.get("valence")
+                track.tempo = data.get("tempo")
+                updated_count += 1
+        print(f"Updated {updated_count} tracks with audio features.")
+        db.commit()
+
+    # 2. Enrich Genres (via Spotify Artists)
+    tracks_missing_genres = db.query(Track).filter(Track.genres == None).limit(50).all()
+    if tracks_missing_genres:
+        print(f"Enriching {len(tracks_missing_genres)} tracks with genres (Spotify)...")
+        artist_ids = set()
+        track_artist_map = {}
+        for t in tracks_missing_genres:
+            if t.raw_data and "artists" in t.raw_data:
+                a_ids = [a["id"] for a in t.raw_data["artists"]]
+                artist_ids.update(a_ids)
+                track_artist_map[t.id] = a_ids
+
+        artist_ids_list = list(artist_ids)
+        artist_genre_map = {}
+        for i in range(0, len(artist_ids_list), 50):
+            chunk = artist_ids_list[i:i+50]
+            artists_data = await spotify_client.get_artists(chunk)
+            for a_data in artists_data:
+                if a_data:
+                    artist_genre_map[a_data["id"]] = a_data.get("genres", [])
+
+        for t in tracks_missing_genres:
+            a_ids = track_artist_map.get(t.id, [])
+            combined_genres = set()
+            for a_id in a_ids:
+                genres = artist_genre_map.get(a_id, [])
+                combined_genres.update(genres)
+            t.genres = list(combined_genres)
+        db.commit()
+
 async def ingest_recently_played(db: Session):
-    client = get_spotify_client()
+    spotify_client = get_spotify_client()
+    recco_client = get_reccobeats_client()
     try:
-        items = await client.get_recently_played(limit=50)
+        items = await spotify_client.get_recently_played(limit=50)
     except Exception as e:
         print(f"Error connecting to Spotify: {e}")
         return
@@ -30,7 +114,6 @@ async def ingest_recently_played(db: Session):
         played_at_str = item["played_at"]
         played_at = parser.isoparse(played_at_str)
 
-        # 1. Check if track exists, if not create it
         track_id = track_data["id"]
         track = db.query(Track).filter(Track.id == track_id).first()
@@ -46,10 +129,8 @@ async def ingest_recently_played(db: Session):
                 raw_data=track_data
             )
             db.add(track)
-            db.commit() # Commit immediately so ID exists for foreign key
+            db.commit()
 
-        # 2. Check if this specific play instance exists
-        # We assume (track_id, played_at) is unique enough
         exists = db.query(PlayHistory).filter(
             PlayHistory.track_id == track_id,
             PlayHistory.played_at == played_at
@@ -66,9 +147,13 @@ async def ingest_recently_played(db: Session):
 
     db.commit()
 
+    # Enrich
+    await enrich_tracks(db, spotify_client, recco_client)
+
 async def run_worker():
     """Simulates a background worker loop."""
     db = SessionLocal()
     try:
         while True:
             print("Worker: Polling Spotify...")

backend/app/models.py

@@ -1,4 +1,4 @@
-from sqlalchemy import Column, Integer, String, DateTime, JSON, ForeignKey, Boolean
+from sqlalchemy import Column, Integer, String, DateTime, JSON, ForeignKey, Float
 from sqlalchemy.orm import relationship
 from datetime import datetime
 from .database import Base
@@ -16,6 +16,24 @@ class Track(Base):
     # Store raw full JSON response for future-proofing analysis
     raw_data = Column(JSON, nullable=True)
 
+    # Enriched Data (Phase 3 Prep)
+    # Audio Features
+    danceability = Column(Float, nullable=True)
+    energy = Column(Float, nullable=True)
+    key = Column(Integer, nullable=True)
+    loudness = Column(Float, nullable=True)
+    mode = Column(Integer, nullable=True)
+    speechiness = Column(Float, nullable=True)
+    acousticness = Column(Float, nullable=True)
+    instrumentalness = Column(Float, nullable=True)
+    liveness = Column(Float, nullable=True)
+    valence = Column(Float, nullable=True)
+    tempo = Column(Float, nullable=True)
+    time_signature = Column(Integer, nullable=True)
+
+    # Genres (stored as JSON list of strings)
+    genres = Column(JSON, nullable=True)
+
     # AI Analysis fields
     lyrics_summary = Column(String, nullable=True)
     genre_tags = Column(String, nullable=True)  # JSON list stored as string or just raw JSON

backend/app/services/reccobeats_client.py (new file, 18 lines)

@@ -0,0 +1,18 @@
import httpx
from typing import List, Dict, Any

RECCOBEATS_API_URL = "https://api.reccobeats.com/v1/audio-features"


class ReccoBeatsClient:
    async def get_audio_features(self, spotify_ids: List[str]) -> List[Dict[str, Any]]:
        if not spotify_ids:
            return []
        ids_param = ",".join(spotify_ids)
        async with httpx.AsyncClient() as client:
            try:
                response = await client.get(RECCOBEATS_API_URL, params={"ids": ids_param})
                if response.status_code != 200:
                    return []
                return response.json().get("content", [])
            except Exception:
                return []

backend/app/services/spotify_client.py

@@ -3,6 +3,7 @@ import base64
 import time
 import httpx
 from fastapi import HTTPException
+from typing import List, Dict, Any
 
 SPOTIFY_TOKEN_URL = "https://accounts.spotify.com/api/token"
 SPOTIFY_API_BASE = "https://api.spotify.com/v1"
@@ -68,3 +69,26 @@ class SpotifyClient:
         if response.status_code != 200:
             return None
         return response.json()
+
+    async def get_artists(self, artist_ids: List[str]) -> List[Dict[str, Any]]:
+        """
+        Fetches artist details (including genres) for a list of artist IDs.
+        Spotify allows up to 50 IDs per request.
+        """
+        if not artist_ids:
+            return []
+        token = await self.get_access_token()
+        ids_param = ",".join(artist_ids)
+        async with httpx.AsyncClient() as client:
+            response = await client.get(
+                f"{SPOTIFY_API_BASE}/artists",
+                params={"ids": ids_param},
+                headers={"Authorization": f"Bearer {token}"},
+            )
+            if response.status_code != 200:
+                print(f"Error fetching artists: {response.text}")
+                return []
+            return response.json().get("artists", [])

backend/requirements.txt

@@ -9,3 +9,4 @@ google-generativeai==0.3.2
 tenacity==8.2.3
 python-dateutil==2.9.0.post0
 requests==2.31.0
+alembic==1.13.1


docker-compose.yml (new file, 26 lines)

@@ -0,0 +1,26 @@
version: '3.8'

services:
  backend:
    build:
      context: ./backend
    image: ghcr.io/bnair123/musicanalyser:latest
    container_name: music-analyser-backend
    restart: unless-stopped
    volumes:
      - /opt/mySpotify/music.db:/app/music.db
    environment:
      - SPOTIFY_CLIENT_ID=${SPOTIFY_CLIENT_ID}
      - SPOTIFY_CLIENT_SECRET=${SPOTIFY_CLIENT_SECRET}
      - SPOTIFY_REFRESH_TOKEN=${SPOTIFY_REFRESH_TOKEN}
      - GEMINI_API_KEY=${GEMINI_API_KEY}
    ports:
      - '8000:8000'

  frontend:
    build:
      context: ./frontend
    container_name: music-analyser-frontend
    restart: unless-stopped
    ports:
      - '8991:80'
    depends_on:
      - backend

frontend/Dockerfile (new file, 11 lines)

@@ -0,0 +1,11 @@
FROM node:18-alpine as build
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build

FROM nginx:alpine
COPY --from=build /app/dist /usr/share/nginx/html
COPY nginx.conf /etc/nginx/conf.d/default.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

frontend/nginx.conf (new file, 13 lines)

@@ -0,0 +1,13 @@
server {
    listen 80;

    location / {
        root /usr/share/nginx/html;
        index index.html index.htm;
        try_files $uri $uri/ /index.html;
    }

    location /api/ {
        proxy_pass http://backend:8000/;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}