podcast-mindmap/Dockerfile at cb5978132c6b9c54b20973148a7b23f6781a3b0c - podcast-mindmap - Toppyr Code

tobias/podcast-mindmap

Dotty Dotter b0649cea49 Phase 1+2: FastAPI-Backend, SQLite, Embeddings, Semantische Suche

Phase 1:
- FastAPI-Backend (backend/app.py) mit REST-API
- SQLite-Datenbank für Podcasts, Episoden, Absätze, Zitate
- Auto-Import aus mindmap_data.json + srt_index.json beim Start
- Webapp als SPA: API-first mit Static-File-Fallback
- Audio als gemountetes Volume statt im Docker-Image
- Docker-Compose mit Traefik-Labels

Phase 2:
- Qwen text-embedding-v3 via DashScope (1024-dim Vektoren)
- Embedding aller Transkript-Absätze (728 für NEU DENKEN)
- Semantische Suche: /api/semantic-search?q=...
- Similarity-API: /api/similar/{podcast}/{episode}/{paragraph}
- Cosine-Similarity auf normalisierten Vektoren, <100ms
- Findet thematisch verwandte Stellen über Episoden hinweg,
  auch bei komplett unterschiedlicher Wortwahl

Vorbereitet für Multi-Podcast (#10): Datenstruktur unterstützt
mehrere Podcasts, Cross-Podcast-Similarity ist ein Parameter.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-20 10:24:53 +02:00

18 lines

343 B

Docker

Raw Blame History

 FROM python:3.12-slim
 WORKDIR /app
 # Install dependencies
 COPY backend/requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt
 # Copy backend code
 COPY backend/ .
 # Copy webapp as static files
 COPY webapp/index.html webapp/d3.v7.min.js /static/
 EXPOSE 8000
 CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]