feat: migrate to PostgreSQL and enhance playlist curation

- Migrate database from SQLite to PostgreSQL (100.91.248.114:5433)
- Fix playlist curation to use actual top tracks instead of AI name matching
- Add /playlists/history endpoint for historical playlist viewing
- Add Playlist Archives section to frontend with expandable history
- Add playlist-modify-* scopes to Spotify OAuth for playlist creation
- Rewrite Genius client to use official API (fixes 403 scraping blocks)
- Ensure playlists are created on Spotify before curation attempts
- Add DATABASE.md documentation for PostgreSQL schema
- Add migrations for PlaylistConfig and composition storage
This commit is contained in:
bnair123
2025-12-30 22:24:56 +04:00
parent 26b4895695
commit 272148c5bf
19 changed files with 1130 additions and 145 deletions

271
docs/DATABASE.md Normal file
View File

@@ -0,0 +1,271 @@
# Database Documentation
## PostgreSQL Connection Details
| Property | Value |
|----------|-------|
| Host | `100.91.248.114` |
| Port | `5433` |
| User | `bnair` |
| Password | `Bharath2002` |
| Database | `music_db` |
| Data Location (on server) | `/opt/DB/MusicDB/pgdata` |
### Connection String
```
postgresql://bnair:Bharath2002@100.91.248.114:5433/music_db
```
## Schema Overview
```
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ artists │ │ track_artists │ │ tracks │
├─────────────────┤ ├──────────────────┤ ├─────────────────┤
│ id (PK) │◄────┤ artist_id (FK) │ │ id (PK) │
│ name │ │ track_id (FK) │────►│ reccobeats_id │
│ genres (JSON) │ └──────────────────┘ │ name │
│ image_url │ │ artist │
└─────────────────┘ │ album │
│ image_url │
│ duration_ms │
│ popularity │
│ raw_data (JSON) │
│ danceability │
│ energy │
│ key │
│ ... (audio) │
│ genres (JSON) │
│ lyrics │
│ created_at │
│ updated_at │
└─────────────────┘
┌─────────────────────┐ ┌─────────────────┐
│ analysis_snapshots │ │ play_history │
├─────────────────────┤ ├─────────────────┤
│ id (PK) │ │ id (PK) │
│ date │ │ track_id (FK) │
│ period_start │ │ played_at │
│ period_end │ │ context_uri │
│ period_label │ │ listened_ms │
│ metrics_payload │ │ skipped │
│ narrative_report │ │ source │
│ model_used │ └─────────────────┘
│ playlist_theme │
│ ... (playlist) │
│ playlist_composition│
└─────────────────────┘
┌─────────────────────┐
│ playlist_config │
├─────────────────────┤
│ key (PK) │
│ spotify_id │
│ last_updated │
│ current_theme │
│ description │
│ composition (JSON) │
└─────────────────────┘
```
## Tables
### `tracks`
Central entity storing Spotify track metadata and enriched audio features.
| Column | Type | Description |
|--------|------|-------------|
| `id` | VARCHAR | Spotify track ID (primary key) |
| `reccobeats_id` | VARCHAR | ReccoBeats UUID for audio features |
| `name` | VARCHAR | Track title |
| `artist` | VARCHAR | Display artist string (e.g., "Drake, Future") |
| `album` | VARCHAR | Album name |
| `image_url` | VARCHAR | Album art URL |
| `duration_ms` | INTEGER | Track duration in milliseconds |
| `popularity` | INTEGER | Spotify popularity score (0-100) |
| `raw_data` | JSON | Full Spotify API response |
| `danceability` | FLOAT | Audio feature (0.0-1.0) |
| `energy` | FLOAT | Audio feature (0.0-1.0) |
| `key` | INTEGER | Musical key (0-11) |
| `loudness` | FLOAT | Audio feature (dB) |
| `mode` | INTEGER | Major (1) or minor (0) |
| `speechiness` | FLOAT | Audio feature (0.0-1.0) |
| `acousticness` | FLOAT | Audio feature (0.0-1.0) |
| `instrumentalness` | FLOAT | Audio feature (0.0-1.0) |
| `liveness` | FLOAT | Audio feature (0.0-1.0) |
| `valence` | FLOAT | Audio feature (0.0-1.0) |
| `tempo` | FLOAT | BPM |
| `time_signature` | INTEGER | Beats per bar |
| `genres` | JSON | Genre tags (deprecated, use Artist.genres) |
| `lyrics` | TEXT | Full lyrics from Genius |
| `lyrics_summary` | VARCHAR | AI-generated summary |
| `genre_tags` | VARCHAR | AI-generated tags |
| `created_at` | TIMESTAMP | Record creation time |
| `updated_at` | TIMESTAMP | Last update time |
### `artists`
Artist entities with genre information.
| Column | Type | Description |
|--------|------|-------------|
| `id` | VARCHAR | Spotify artist ID (primary key) |
| `name` | VARCHAR | Artist name |
| `genres` | JSON | List of genre strings |
| `image_url` | VARCHAR | Artist profile image URL |
### `track_artists`
Many-to-many relationship between tracks and artists.
| Column | Type | Description |
|--------|------|-------------|
| `track_id` | VARCHAR | Foreign key to tracks.id |
| `artist_id` | VARCHAR | Foreign key to artists.id |
### `play_history`
Immutable log of listening events.
| Column | Type | Description |
|--------|------|-------------|
| `id` | INTEGER | Auto-increment primary key |
| `track_id` | VARCHAR | Foreign key to tracks.id |
| `played_at` | TIMESTAMP | When the track was played |
| `context_uri` | VARCHAR | Spotify context (playlist, album, etc.) |
| `listened_ms` | INTEGER | Duration actually listened |
| `skipped` | BOOLEAN | Whether track was skipped |
| `source` | VARCHAR | Source of the play event |
### `analysis_snapshots`
Stores computed statistics and AI-generated narratives.
| Column | Type | Description |
|--------|------|-------------|
| `id` | INTEGER | Auto-increment primary key |
| `date` | TIMESTAMP | When analysis was run |
| `period_start` | TIMESTAMP | Analysis period start |
| `period_end` | TIMESTAMP | Analysis period end |
| `period_label` | VARCHAR | Label (e.g., "last_30_days") |
| `metrics_payload` | JSON | StatsService output |
| `narrative_report` | JSON | NarrativeService output |
| `model_used` | VARCHAR | LLM model name |
| `playlist_theme` | VARCHAR | AI-generated theme name |
| `playlist_theme_reasoning` | TEXT | AI explanation for theme |
| `six_hour_playlist_id` | VARCHAR | Spotify playlist ID |
| `daily_playlist_id` | VARCHAR | Spotify playlist ID |
| `playlist_composition` | JSON | Track list at snapshot time |
### `playlist_config`
Configuration for managed Spotify playlists.
| Column | Type | Description |
|--------|------|-------------|
| `key` | VARCHAR | Config key (primary key, e.g., "six_hour") |
| `spotify_id` | VARCHAR | Spotify playlist ID |
| `last_updated` | TIMESTAMP | Last update time |
| `current_theme` | VARCHAR | Current playlist theme |
| `description` | VARCHAR | Playlist description |
| `composition` | JSON | Current track list |
## Schema Modifications (Alembic)
All schema changes MUST go through Alembic migrations.
### Creating a New Migration
```bash
cd backend
source venv/bin/activate
# Auto-generate migration from model changes
alembic revision --autogenerate -m "description_of_change"
# Or create empty migration for manual SQL
alembic revision -m "description_of_change"
```
### Applying Migrations
```bash
# Apply all pending migrations
alembic upgrade head
# Apply specific migration
alembic upgrade <revision_id>
# Rollback one migration
alembic downgrade -1
# Rollback to specific revision
alembic downgrade <revision_id>
```
### Migration Best Practices
1. **Test locally first** - Always test migrations on a dev database
2. **Backup before migrating** - `pg_dump -h 100.91.248.114 -p 5433 -U bnair music_db > backup.sql`
3. **One change per migration** - Keep migrations atomic
4. **Include rollback logic** - Implement `downgrade()` function
5. **Review autogenerated migrations** - They may miss nuances
### Example Migration
```python
# alembic/versions/xxxx_add_new_column.py
from alembic import op
import sqlalchemy as sa
revision = 'xxxx'
down_revision = 'yyyy'
def upgrade():
op.add_column('tracks', sa.Column('new_column', sa.String(), nullable=True))
def downgrade():
op.drop_column('tracks', 'new_column')
```
## Direct Database Access
### Using psql
```bash
psql -h 100.91.248.114 -p 5433 -U bnair -d music_db
```
### Using Python
```python
import psycopg2
conn = psycopg2.connect(
host='100.91.248.114',
port=5433,
user='bnair',
password='Bharath2002',
dbname='music_db'
)
```
### Common Queries
```sql
-- Recent plays
SELECT t.name, t.artist, ph.played_at
FROM play_history ph
JOIN tracks t ON ph.track_id = t.id
ORDER BY ph.played_at DESC
LIMIT 10;
-- Top tracks by play count
SELECT t.name, t.artist, COUNT(*) as plays
FROM play_history ph
JOIN tracks t ON ph.track_id = t.id
GROUP BY t.id, t.name, t.artist
ORDER BY plays DESC
LIMIT 10;
-- Genre distribution
SELECT genre, COUNT(*)
FROM artists, jsonb_array_elements_text(genres::jsonb) AS genre
GROUP BY genre
ORDER BY count DESC;
```