gwoe-antragspruefer/app/static/referenzen/linke-lsa-2021.pdf
Dotty Dotter 87874a7a14 Activate LSA: Wahlprogramme + ingest + frontend (#2)
Brings Sachsen-Anhalt online as the second supported Bundesland after
NRW. Closes the gap that issue #2 left open: with the PortalaAdapter
already in place from c7242f8, this commit adds the reference data and
flips the activation switch.

Wahlprogramme (LTW Sachsen-Anhalt 06.06.2021)
- Six PDFs added under app/static/referenzen/{cdu,spd,gruene,fdp,afd,
  linke}-lsa-2021.pdf, plus paged plain-text extractions under
  app/kontext/*.txt for the keyword fallback search.
- Sources verified by hand:
  - CDU "Unsere Heimat. Unsere Verantwortung." (cdulsa.de, 82 pages)
  - SPD "Zusammenhalt und neue Chancen" (FES library, 77 pages)
  - GRÜNE "Verlässlich für Sachsen-Anhalt" (gruene-lsa.de, 164 pages)
  - FDP "Wahlprogramm zur Landtagswahl 2021" (Naumann-Stiftung, 76 pages)
  - AfD "Alles für unsere Heimat!" (klimawahlen.de mirror, 64 pages)
  - LINKE "Wahlprogramm zur Landtagswahl 2021" (dielinke-sachsen-anhalt.de,
    88 pages)
- The CDU PDF was the trickiest: KAS blocks bot downloads via
  Cloudflare; the cdulsa.de copy was located by an autonomous web
  search and verified to be byte-identical with the official document.

Embeddings indexed (in production container, OpenAI-compatible
DashScope embeddings via the existing index_programm pipeline):
- CDU 134, SPD 145, GRÜNE 183, FDP 100, AfD 64, LINKE 143 chunks
- Total LSA: 769 new chunks alongside the existing 775 NRW chunks
  and 335 federal Grundsatzprogramm chunks.

wahlprogramme.py
- WAHLPROGRAMME["LSA"] populated with all six parties (canonical fraction
  codes, original titles, page counts).

embeddings.py
- PROGRAMME extended with the six new "<partei>-lsa-2021" entries that
  the indexer pipeline expects.

bundeslaender.py
- LSA flipped to aktiv=True. The frontend dropdown will now offer
  Sachsen-Anhalt as a selectable bundesland and analyzer.get_bundesland_
  context() will produce a real LSA prompt block (CDU/SPD/FDP as
  governing fractions, all six landtagsfraktionen).

End-to-end smoke test (live in production container before commit)
- Adapter: PortalaAdapter.search() returned current Anträge of März 2026
  (LINKE + GRÜNE) with correct titles and PDF URLs.
- Semantic search for an LSA "ÖPNV in der Altmark" sample antrag
  matched LINKE S.53, SPD S.68, FDP S.52 — all three with similarity
  > 0.6 and topical hits (Regionalisierungsmittel, ÖPNV-Förderprogramm,
  Wasserstoffnetz).

Resolves issue #2.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 22:12:32 +02:00

555 KiB