2 Commits

Author SHA1 Message Date
Dotty Dotter
ee0218b5af Refactor wahlprogramme/embeddings/analyzer for multi-state (#5)
Atomic refactor of the three modules that previously hardcoded NRW
behaviour. After this commit, every analysis path consults the central
BUNDESLAENDER registry for governing fractions, parliament name, and
state metadata.

wahlprogramme.py
- WAHLPROGRAMME is now nested {bundesland: {partei: meta}}; NRW data
  hoisted unchanged under the "NRW" key.
- New WAHLPROGRAMM_KONTEXT_FILES dict maps a state to its overview
  markdown file (currently only NRW).
- find_relevant_quotes(text, fraktionen, bundesland): bundesland is
  now a required positional argument. Governing fractions for the
  requested state are merged with the submitting fractions before lookup.
- Helpers get_wahlprogramm() and parteien_mit_wahlprogramm() expose
  the new shape to other modules.
- ValueError on unknown bundesland (no silent fallback).
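
A minimal sketch of the nested shape and the strict lookup described
above. The metadata fields and the helper signatures are assumptions
made for illustration; the commit only names the helpers, not their
parameters.

```python
# Hypothetical sample data: the real WAHLPROGRAMME entries are not shown here.
WAHLPROGRAMME = {
    "NRW": {
        "CDU": {"titel": "placeholder title", "jahr": 2022},
        "GRÜNE": {"titel": "placeholder title", "jahr": 2022},
    },
}


def _require_bundesland(bundesland):
    """Raise instead of silently falling back to a default state."""
    if bundesland not in WAHLPROGRAMME:
        raise ValueError(f"Unknown bundesland: {bundesland!r}")


def get_wahlprogramm(bundesland, partei):
    """Return the metadata for one party, or None if no programme is registered.

    The (bundesland, partei) signature is an assumption.
    """
    _require_bundesland(bundesland)
    return WAHLPROGRAMME[bundesland].get(partei)


def parteien_mit_wahlprogramm(bundesland):
    """List the parties that have a programme registered for this state."""
    _require_bundesland(bundesland)
    return list(WAHLPROGRAMME[bundesland])
```

Raising on an unknown key keeps a misspelled or not yet configured
state from quietly being analysed against NRW data.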

embeddings.py
- Schema migration in init_embeddings_db: adds a `bundesland` column
  to the chunks table when missing, plus an index, and backfills
  existing rows from the PROGRAMME registry. Grundsatzprogramme
  (federal level) keep bundesland NULL by design.
- find_relevant_chunks accepts a bundesland filter that matches state
  rows OR NULL — so federal Grundsatzprogramme remain visible to every
  analysis.
- get_relevant_quotes_for_antrag(text, fraktionen, bundesland, …) —
  bundesland required, governing fractions read from BUNDESLAENDER
  instead of hardcoded ["CDU","GRÜNE"]. Order-preserving dedup
  replaces the previous set-based merge.
- index_programm now writes the bundesland column on insert.
- Dropped the hardcoded "Wahlprogramm NRW 2022" label in
  format_quotes_for_prompt — bundesland context is implicit in the
  surrounding prompt block.
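
The migration, the state-OR-NULL filter and the order-preserving merge
can be sketched roughly as follows. This uses the synchronous stdlib
sqlite3 module for brevity; the index name, the `text` column, the
helper names and the merge order are assumptions, and the backfill
step is omitted.

```python
import sqlite3


def ensure_bundesland_column(conn):
    """Add chunks.bundesland plus an index if they are missing (idempotent)."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(chunks)")}
    if "bundesland" not in columns:
        conn.execute("ALTER TABLE chunks ADD COLUMN bundesland TEXT")
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_chunks_bundesland ON chunks(bundesland)"
    )


def chunks_for(conn, bundesland):
    """Return chunks of the given state plus federal rows (bundesland NULL)."""
    rows = conn.execute(
        "SELECT text FROM chunks "
        "WHERE bundesland = ? OR bundesland IS NULL ORDER BY id",
        (bundesland,),
    )
    return [row[0] for row in rows]


def merge_fraktionen(einreicher, regierung):
    """Merge two lists, dropping duplicates while keeping first-seen order."""
    return list(dict.fromkeys([*einreicher, *regierung]))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT)")
ensure_bundesland_column(conn)
ensure_bundesland_column(conn)  # second call is a no-op
conn.executemany(
    "INSERT INTO chunks (text, bundesland) VALUES (?, ?)",
    [("nrw chunk", "NRW"), ("federal chunk", None), ("berlin chunk", "BE")],
)
```

Unlike a set-based merge, `dict.fromkeys` keeps the first-seen order,
so the list of fractions passed to the lookup is deterministic.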

analyzer.py
- get_bundesland_context reads parlament_name, regierungsfraktionen,
  landtagsfraktionen and the optional WAHLPROGRAMM_KONTEXT_FILES entry
  from the central registry. Raises ValueError on an unknown OR
  inactive bundesland, which removes the silent NRW fallback that
  previously masked configuration gaps.
- The Antragsteller-detection heuristic now iterates
  BUNDESLAENDER[bundesland].landtagsfraktionen instead of
  WAHLPROGRAMME.keys(), so we recognise parties for which we don't
  yet have a Wahlprogramm PDF.
- Both quote lookups (semantic + keyword fallback) now receive the
  bundesland.
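
A rough sketch of the strict context lookup and the registry-driven
Antragsteller detection. The `Bundesland` dataclass, the `aktiv` flag,
the "BE" key, the registry values and the substring heuristic are all
illustrative assumptions; the commit does not show the real ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bundesland:
    parlament_name: str
    regierungsfraktionen: tuple
    landtagsfraktionen: tuple
    aktiv: bool = True  # hypothetical name for the "active" marker


# Illustrative registry entries, not the project's actual data.
BUNDESLAENDER = {
    "NRW": Bundesland(
        "Landtag Nordrhein-Westfalen",
        ("CDU", "GRÜNE"),
        ("CDU", "SPD", "GRÜNE", "FDP", "AfD"),
    ),
    "BE": Bundesland("Abgeordnetenhaus von Berlin", (), (), aktiv=False),
}


def get_bundesland_context(bundesland):
    """Return the registry entry, refusing unknown or inactive states."""
    entry = BUNDESLAENDER.get(bundesland)
    if entry is None:
        raise ValueError(f"Unknown bundesland: {bundesland!r}")
    if not entry.aktiv:
        raise ValueError(f"Bundesland not active: {bundesland!r}")
    return entry


def detect_antragsteller(text, bundesland):
    """Naive stand-in heuristic: report every fraction named in the text."""
    ctx = get_bundesland_context(bundesland)
    return [f for f in ctx.landtagsfraktionen if f in text]
```

Because the loop runs over the registry's landtagsfraktionen, a
fraction is recognised even when no Wahlprogramm is registered for it.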

Resolves issue #5. Foundation for #2 (LSA), #3 (Berlin), #4 (MV).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 18:48:11 +02:00
Dotty Dotter
63de3ca20d Initial commit: GWÖ-Antragsprüfer v1.0
Features:
- GWÖ-Matrix 2.0 analysis for NRW Landtag motions
- Improvement suggestions in redline format (original/suggestion/rationale)
- Assessment of fidelity to election and party programmes
- Landtag search via the OPAL API
- Tag cloud with multi-select filter
- Party filter with average scores
- PDF report generation
- Security headers (CSP, X-Frame-Options, etc.)
- Persistent SQLite DB via Docker volumes

Tech Stack:
- FastAPI + Jinja2
- Qwen LLM via DashScope API
- SQLite + aiosqlite
- WeasyPrint for PDF
- Docker Compose with Traefik
2026-03-28 22:30:24 +01:00