# Database Documentation

## PostgreSQL Connection Details

| Property | Value |
|----------|-------|
| Host | `100.91.248.114` |
| Port | `5433` |
| User | `bnair` |
| Password | `Bharath2002` |
| Database | `music_db` |
| Data Location (on server) | `/opt/DB/MusicDB/pgdata` |

### Connection String

```
postgresql://bnair:Bharath2002@100.91.248.114:5433/music_db
```

## Schema Overview

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│ artists         │     │ track_artists    │     │ tracks          │
├─────────────────┤     ├──────────────────┤     ├─────────────────┤
│ id (PK)         │◄────┤ artist_id (FK)   │     │ id (PK)         │
│ name            │     │ track_id (FK)    │────►│ reccobeats_id   │
│ genres (JSON)   │     └──────────────────┘     │ name            │
│ image_url       │                              │ artist          │
└─────────────────┘                              │ album           │
                                                 │ image_url       │
                                                 │ duration_ms     │
                                                 │ popularity      │
                                                 │ raw_data (JSON) │
                                                 │ danceability    │
                                                 │ energy          │
                                                 │ key             │
                                                 │ ... (audio)     │
                                                 │ genres (JSON)   │
                                                 │ lyrics          │
                                                 │ created_at      │
                                                 │ updated_at      │
                                                 └─────────────────┘
                                                          │
                                                          │
                                                          ▼
┌─────────────────────┐                          ┌─────────────────┐
│ analysis_snapshots  │                          │ play_history    │
├─────────────────────┤                          ├─────────────────┤
│ id (PK)             │                          │ id (PK)         │
│ date                │                          │ track_id (FK)   │
│ period_start        │                          │ played_at       │
│ period_end          │                          │ context_uri     │
│ period_label        │                          │ listened_ms     │
│ metrics_payload     │                          │ skipped         │
│ narrative_report    │                          │ source          │
│ model_used          │                          └─────────────────┘
│ playlist_theme      │
│ ... (playlist)      │
│ playlist_composition│
└─────────────────────┘

┌─────────────────────┐
│ playlist_config     │
├─────────────────────┤
│ key (PK)            │
│ spotify_id          │
│ last_updated        │
│ current_theme       │
│ description         │
│ composition (JSON)  │
└─────────────────────┘
```

## Tables

### `tracks`

Central entity storing Spotify track metadata and enriched audio features.
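The `key` and `mode` columns in the table that follows store integer codes rather than readable names. Below is a minimal decoding sketch. It assumes the ReccoBeats values follow Spotify's pitch-class convention: 0 = C through 11 = B, with -1 meaning no key was detected. `describe_key` and `PITCH_CLASSES` are hypothetical helpers for illustration and are not part of the codebase.

```python
from typing import Optional

# Hypothetical lookup table: index 0 = C ... 11 = B (Spotify's pitch-class
# convention, assumed to apply to the ReccoBeats values stored in `tracks.key`).
PITCH_CLASSES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
                 "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]


def describe_key(key: Optional[int], mode: Optional[int]) -> str:
    """Turn the integer key/mode columns into a label such as 'A minor'."""
    # NULL (features never fetched) or -1 (no key detected) -> unknown
    if key is None or not 0 <= key <= 11:
        return "unknown"
    name = PITCH_CLASSES[key]
    if mode == 1:
        return f"{name} major"
    if mode == 0:
        return f"{name} minor"
    return name  # mode missing: report the pitch class only


print(describe_key(9, 0))   # A minor
print(describe_key(0, 1))   # C major
print(describe_key(-1, 1))  # unknown
```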
| Column | Type | Description |
|--------|------|-------------|
| `id` | VARCHAR | Spotify track ID (primary key) |
| `reccobeats_id` | VARCHAR | ReccoBeats UUID for audio features |
| `name` | VARCHAR | Track title |
| `artist` | VARCHAR | Display artist string (e.g., "Drake, Future") |
| `album` | VARCHAR | Album name |
| `image_url` | VARCHAR | Album art URL |
| `duration_ms` | INTEGER | Track duration in milliseconds |
| `popularity` | INTEGER | Spotify popularity score (0-100) |
| `raw_data` | JSON | Full Spotify API response |
| `danceability` | FLOAT | Audio feature (0.0-1.0) |
| `energy` | FLOAT | Audio feature (0.0-1.0) |
| `key` | INTEGER | Musical key (0-11) |
| `loudness` | FLOAT | Audio feature (dB) |
| `mode` | INTEGER | Major (1) or minor (0) |
| `speechiness` | FLOAT | Audio feature (0.0-1.0) |
| `acousticness` | FLOAT | Audio feature (0.0-1.0) |
| `instrumentalness` | FLOAT | Audio feature (0.0-1.0) |
| `liveness` | FLOAT | Audio feature (0.0-1.0) |
| `valence` | FLOAT | Audio feature (0.0-1.0) |
| `tempo` | FLOAT | BPM |
| `time_signature` | INTEGER | Beats per bar |
| `genres` | JSON | Genre tags (deprecated; use `Artist.genres`) |
| `lyrics` | TEXT | Full lyrics from Genius |
| `lyrics_summary` | VARCHAR | AI-generated summary |
| `genre_tags` | VARCHAR | AI-generated tags |
| `created_at` | TIMESTAMP | Record creation time |
| `updated_at` | TIMESTAMP | Last update time |

### `artists`

Artist entities with genre information.

| Column | Type | Description |
|--------|------|-------------|
| `id` | VARCHAR | Spotify artist ID (primary key) |
| `name` | VARCHAR | Artist name |
| `genres` | JSON | List of genre strings |
| `image_url` | VARCHAR | Artist profile image URL |

### `track_artists`

Many-to-many relationship between tracks and artists.
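The sketch below shows how the join table resolves a multi-artist track into individual `artists` rows. An example is the "Drake, Future" display string held in `tracks.artist`. The sketch is self-contained and uses an in-memory SQLite database as a stand-in for PostgreSQL. Several details are assumptions made for illustration: the trimmed column set, the composite primary key, the sample rows, and the `artists_for_track` helper. With psycopg2 against the real database, the placeholder would be `%s` instead of `?`.

```python
import sqlite3

# In-memory SQLite stand-in with only the columns needed for the join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tracks (id TEXT PRIMARY KEY, name TEXT, artist TEXT);
    CREATE TABLE artists (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE track_artists (
        track_id  TEXT REFERENCES tracks(id),
        artist_id TEXT REFERENCES artists(id),
        PRIMARY KEY (track_id, artist_id)  -- assumed composite key
    );
""")

# Hypothetical sample data: one track credited to two artists.
conn.execute("INSERT INTO tracks VALUES ('t1', 'Example Song', 'Artist A, Artist B')")
conn.executemany("INSERT INTO artists VALUES (?, ?)",
                 [("a1", "Artist A"), ("a2", "Artist B")])
conn.executemany("INSERT INTO track_artists VALUES (?, ?)",
                 [("t1", "a1"), ("t1", "a2")])


def artists_for_track(conn, track_id):
    """Return the names of all artists linked to a track, sorted by name."""
    rows = conn.execute(
        """
        SELECT a.name
        FROM track_artists ta
        JOIN artists a ON a.id = ta.artist_id
        WHERE ta.track_id = ?
        ORDER BY a.name
        """,
        (track_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(artists_for_track(conn, "t1"))  # ['Artist A', 'Artist B']
```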
| Column | Type | Description |
|--------|------|-------------|
| `track_id` | VARCHAR | Foreign key to `tracks.id` |
| `artist_id` | VARCHAR | Foreign key to `artists.id` |

### `play_history`

Immutable log of listening events.

| Column | Type | Description |
|--------|------|-------------|
| `id` | INTEGER | Auto-increment primary key |
| `track_id` | VARCHAR | Foreign key to `tracks.id` |
| `played_at` | TIMESTAMP | When the track was played |
| `context_uri` | VARCHAR | Spotify context (playlist, album, etc.) |
| `listened_ms` | INTEGER | Time actually listened, in milliseconds |
| `skipped` | BOOLEAN | Whether the track was skipped |
| `source` | VARCHAR | Source of the play event |

### `analysis_snapshots`

Stores computed statistics and AI-generated narratives.

| Column | Type | Description |
|--------|------|-------------|
| `id` | INTEGER | Auto-increment primary key |
| `date` | TIMESTAMP | When the analysis was run |
| `period_start` | TIMESTAMP | Start of the analyzed period |
| `period_end` | TIMESTAMP | End of the analyzed period |
| `period_label` | VARCHAR | Period label (e.g., `last_30_days`) |
| `metrics_payload` | JSON | StatsService output |
| `narrative_report` | JSON | NarrativeService output |
| `model_used` | VARCHAR | LLM model name |
| `playlist_theme` | VARCHAR | AI-generated theme name |
| `playlist_theme_reasoning` | TEXT | AI explanation for the theme |
| `six_hour_playlist_id` | VARCHAR | Spotify ID of the six-hour playlist |
| `daily_playlist_id` | VARCHAR | Spotify ID of the daily playlist |
| `playlist_composition` | JSON | Track list at snapshot time |

### `playlist_config`

Configuration for managed Spotify playlists.
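Each managed playlist has one row identified by `key`, so updates fit naturally as an upsert. The sketch below uses SQLite (3.24 or newer) as a stand-in. PostgreSQL accepts the same `INSERT ... ON CONFLICT (key) DO UPDATE` form, with `%s` placeholders under psycopg2. Several pieces are assumptions, not the application's actual code: the `save_playlist_config` helper, the sample values, and storing `composition` as a JSON list of track IDs.

```python
import json
import sqlite3
from datetime import datetime, timezone

# SQLite stand-in for the PostgreSQL table; JSON is stored as serialized text here.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE playlist_config (
        "key"         TEXT PRIMARY KEY,
        spotify_id    TEXT,
        last_updated  TEXT,
        current_theme TEXT,
        description   TEXT,
        composition   TEXT
    )
""")

# Insert a new row, or overwrite every column of the existing row with the same key.
UPSERT = """
    INSERT INTO playlist_config
        ("key", spotify_id, last_updated, current_theme, description, composition)
    VALUES (?, ?, ?, ?, ?, ?)
    ON CONFLICT ("key") DO UPDATE SET
        spotify_id    = excluded.spotify_id,
        last_updated  = excluded.last_updated,
        current_theme = excluded.current_theme,
        description   = excluded.description,
        composition   = excluded.composition
"""


def save_playlist_config(conn, key, spotify_id, theme, description, track_ids):
    """Hypothetical helper: store the current state of one managed playlist."""
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(UPSERT, (key, spotify_id, now, theme, description, json.dumps(track_ids)))
    conn.commit()


# Two saves for the same key leave a single, updated row.
save_playlist_config(conn, "six_hour", "playlist-123", "Late Night", "first run", ["t1", "t2"])
save_playlist_config(conn, "six_hour", "playlist-123", "Morning", "second run", ["t3"])
row = conn.execute(
    'SELECT current_theme, composition FROM playlist_config WHERE "key" = ?', ("six_hour",)
).fetchone()
print(row[0], json.loads(row[1]))  # Morning ['t3']
```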
| Column | Type | Description |
|--------|------|-------------|
| `key` | VARCHAR | Config key (primary key, e.g., "six_hour") |
| `spotify_id` | VARCHAR | Spotify playlist ID |
| `last_updated` | TIMESTAMP | Last update time |
| `current_theme` | VARCHAR | Current playlist theme |
| `description` | VARCHAR | Playlist description |
| `composition` | JSON | Current track list |

## Schema Modifications (Alembic)

All schema changes MUST go through Alembic migrations.

### Creating a New Migration

```bash
cd backend
source venv/bin/activate

# Auto-generate a migration from model changes
alembic revision --autogenerate -m "description_of_change"

# Or create an empty migration for manual SQL
alembic revision -m "description_of_change"
```

### Applying Migrations

```bash
# Apply all pending migrations
alembic upgrade head

# Upgrade to a specific revision
alembic upgrade <revision>

# Roll back one migration
alembic downgrade -1

# Roll back to a specific revision
alembic downgrade <revision>
```

### Migration Best Practices

1. **Test locally first** - Always run migrations against a dev database before production.
2. **Back up before migrating** - `pg_dump -h 100.91.248.114 -p 5433 -U bnair music_db > backup.sql`
3. **One change per migration** - Keep migrations atomic.
4. **Include rollback logic** - Implement the `downgrade()` function.
5. **Review autogenerated migrations** - Autogenerate does not detect every change. For example, a renamed table or column appears as a drop plus an add, so check the generated script before applying it.

### Example Migration

```python
# alembic/versions/xxxx_add_new_column.py
from alembic import op
import sqlalchemy as sa

# Revision identifiers used by Alembic (placeholders)
revision = 'xxxx'
down_revision = 'yyyy'
branch_labels = None
depends_on = None


def upgrade():
    # Nullable, so existing rows need no default value
    op.add_column('tracks', sa.Column('new_column', sa.String(), nullable=True))


def downgrade():
    # Exactly reverse upgrade()
    op.drop_column('tracks', 'new_column')
```

## Direct Database Access

### Using psql

```bash
psql -h 100.91.248.114 -p 5433 -U bnair -d music_db
```

### Using Python

```python
import psycopg2

conn = psycopg2.connect(
    host='100.91.248.114',
    port=5433,
    user='bnair',
    password='Bharath2002',
    dbname='music_db',
)
try:
    with conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM tracks")
        print(cur.fetchone()[0])
finally:
    conn.close()
```

### Common Queries

```sql
-- Recent plays
SELECT t.name, t.artist, ph.played_at
FROM play_history ph
JOIN tracks t ON ph.track_id = t.id
ORDER BY ph.played_at DESC
LIMIT 10;

-- Top tracks by play count
SELECT t.name, t.artist, COUNT(*) AS plays
FROM play_history ph
JOIN tracks t ON ph.track_id = t.id
GROUP BY t.id, t.name, t.artist
ORDER BY plays DESC
LIMIT 10;

-- Genre distribution (unnests each artist's JSON genre array)
SELECT genre, COUNT(*) AS artist_count
FROM artists, jsonb_array_elements_text(genres::jsonb) AS genre
GROUP BY genre
ORDER BY artist_count DESC;
```
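The genre-distribution query unnests each artist's `genres` array and counts how often each genre appears. The same aggregation can be done client-side, for instance as a quick cross-check on the values returned by `SELECT genres FROM artists`. Here is a sketch of that. `genre_distribution` is a hypothetical helper. It accepts both raw JSON strings and already-decoded lists, since psycopg2 typically decodes `json` columns into Python objects.

```python
import json
from collections import Counter


def genre_distribution(genre_values):
    """Count genre occurrences across artists, most common first."""
    counts = Counter()
    for value in genre_values:
        if value is None:
            # NULL genres produce no rows in the SQL version either
            continue
        # Accept either a decoded list or a raw JSON string
        genres = json.loads(value) if isinstance(value, str) else value
        counts.update(genres)
    return counts.most_common()


# Hypothetical `artists.genres` values
rows = ['["hip hop", "rap"]', ["rap"], None, []]
print(genre_distribution(rows))  # [('rap', 2), ('hip hop', 1)]
```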