gwoe-antragspruefer/app
Dotty Dotter ee0218b5af Refactor wahlprogramme/embeddings/analyzer for multi-state (#5)
Atomic refactor of the three modules that previously hardcoded NRW
behaviour. After this commit, every analysis path consults the central
BUNDESLAENDER registry for governing fractions, parliament name, and
state metadata.

wahlprogramme.py
- WAHLPROGRAMME is now nested {bundesland: {partei: meta}}; NRW data
  hoisted unchanged under the "NRW" key.
- New WAHLPROGRAMM_KONTEXT_FILES dict maps a state to its overview
  markdown file (currently only NRW).
- find_relevant_quotes(text, fraktionen, bundesland) — bundesland is
  now a required positional. Governing fractions for the requested
  state are merged with the submitting fractions before lookup.
- Helpers get_wahlprogramm() and parteien_mit_wahlprogramm() expose
  the new shape to other modules.
- ValueError on unknown bundesland (no silent fallback).

embeddings.py
- Schema migration in init_embeddings_db: adds a `bundesland` column
  to the chunks table when missing, plus an index, and backfills
  existing rows from the PROGRAMME registry. Grundsatzprogramme
  (federal level) keep bundesland NULL by design.
- find_relevant_chunks accepts a bundesland filter that matches state
  rows OR NULL — so federal Grundsatzprogramme remain visible to every
  analysis.
- get_relevant_quotes_for_antrag(text, fraktionen, bundesland, …) —
  bundesland required, governing fractions read from BUNDESLAENDER
  instead of hardcoded ["CDU","GRÜNE"]. Order-preserving dedup
  replaces the previous set-based merge.
- index_programm now writes the bundesland column on insert.
- Dropped the hardcoded "Wahlprogramm NRW 2022" label in
  format_quotes_for_prompt — bundesland context is implicit in the
  surrounding prompt block.

analyzer.py
- get_bundesland_context reads parlament_name, regierungsfraktionen,
  landtagsfraktionen and the optional WAHLPROGRAMM_KONTEXT_FILES entry
  from the central registry. Throws ValueError on unknown OR inactive
  bundesland — kills the silent NRW fallback that previously masked
  configuration gaps.
- The Antragsteller-detection heuristic now iterates
  BUNDESLAENDER[bundesland].landtagsfraktionen instead of
  WAHLPROGRAMME.keys(), so we recognise parties for which we don't
  yet have a Wahlprogramm PDF.
- Both quote lookups (semantic + keyword fallback) now receive the
  bundesland.

Resolves issue #5. Foundation for #2 (LSA), #3 (Berlin), #4 (MV).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 18:48:11 +02:00
..
kontext Initial commit: GWÖ-Antragsprüfer v1.0 2026-03-28 22:30:24 +01:00
routers Initial commit: GWÖ-Antragsprüfer v1.0 2026-03-28 22:30:24 +01:00
static/referenzen Initial commit: GWÖ-Antragsprüfer v1.0 2026-03-28 22:30:24 +01:00
templates Fix responsive layout for mobile viewports (#6) 2026-04-07 13:48:55 +02:00
__init__.py Initial commit: GWÖ-Antragsprüfer v1.0 2026-03-28 22:30:24 +01:00
analyzer.py Refactor wahlprogramme/embeddings/analyzer for multi-state (#5) 2026-04-07 18:48:11 +02:00
bundeslaender.py Add central bundeslaender.py module with all 16 states (#7) 2026-04-07 14:17:54 +02:00
config.py Initial commit: GWÖ-Antragsprüfer v1.0 2026-03-28 22:30:24 +01:00
database.py Initial commit: GWÖ-Antragsprüfer v1.0 2026-03-28 22:30:24 +01:00
embeddings.py Refactor wahlprogramme/embeddings/analyzer for multi-state (#5) 2026-04-07 18:48:11 +02:00
main.py Add central bundeslaender.py module with all 16 states (#7) 2026-04-07 14:17:54 +02:00
models.py Initial commit: GWÖ-Antragsprüfer v1.0 2026-03-28 22:30:24 +01:00
parlamente.py Initial commit: GWÖ-Antragsprüfer v1.0 2026-03-28 22:30:24 +01:00
report.py Initial commit: GWÖ-Antragsprüfer v1.0 2026-03-28 22:30:24 +01:00
wahlprogramme.py Refactor wahlprogramme/embeddings/analyzer for multi-state (#5) 2026-04-07 18:48:11 +02:00