Adds a clean-room PortalaAdapter that talks to the eUI/portala framework
behind PADOKA (Landtag Sachsen-Anhalt). Same engine powers Berlin's
PARDOK; the same adapter will serve issue #3 once activated for BE.
Reverse-engineering notes
- The "PADOKA = StarWeb" assumption from issue #1 / dokukratie's st.yml
is outdated. The Sachsen-Anhalt portal was migrated to the same
eUI/portala SPA framework Berlin uses. The legacy starweb URL returns
503; the new entry point is /portal/browse.tt.html.
- Search workflow is two-stage:
1. POST /portal/browse.tt.json with a JSON action body containing an
Elasticsearch-style query tree under search.json. Returns a
report_id plus hit count.
2. POST /portal/report.tt.html with {report_id, start, chunksize}
returns the HTML hit list. Each record carries a Perl Data::Dumper
block in a <pre> tag with the canonical metadata.
- The query schema (sources, search.lines, search.json tree, report
block) is taken from dokukratie/scrapers/portala.query.json (GPL-3.0)
— only structure/selectors are reused, no Python code is ported.
- DB id is "lsa.lissh"; the server validates this and rejects unknown
interfaces with an explicit errormsg.
- PDFs live under /files/drs/wp{N}/drs/d{nr}{xxx}.pdf and are served
directly without any session cookie.
What the adapter does
- search() builds a date-window query (last ~24 months) for "Antrag"
document type and returns the most recent hits. The user's free-text
query is applied as a client-side title/Urheber filter (no fulltext
search server-side yet — see "Limitations" below).
- Hits are parsed from the Perl record dumps in the report HTML:
- WEV06.main → title (Perl \x{xx} hex escapes decoded)
- WEV32.5 → relative PDF path
- WEV32.main → "Antrag <Urheber> <DD.MM.YYYY> Drucksache <b>X/YYYY</b>"
- Fraktion strings are normalised to canonical codes (CDU, SPD, GRÜNE,
FDP, AfD, LINKE, Landesregierung).
- get_document() looks up a single Drucksache by re-running the search.
- download_text() fetches the PDF and extracts text via PyMuPDF.
- bundeslaender.py: LSA's doku_system corrected from "StarWeb" to
"PARDOK", anmerkung updated with the migration story.
Limitations (deliberate, MVP)
- No server-side full-text search. The portala framework's sf index
names for LSA full-text content are not yet known; tree mutations
with sf=alAB return 0 hits. Client-side filter is "good enough" for
the next ~24 months of Anträge (≈few hundred per WP).
- LSA is still aktiv=False in bundeslaender.py — the adapter is dormant
in production until issue #2's wahlprogramm ingest and frontend
activation land.
Verified live against padoka.landtag.sachsen-anhalt.de:
- search(query="", limit=5) returned 5 current Anträge from März 2026
(LINKE + GRÜNE) with correct dates, fractions, titles and PDF URLs.
- download_text("8/6790") returned 5051 chars of real Antragstext
("ICE-Halt für Salzwedel dauerhaft erhalten").
Refs #2.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Atomic refactor of the three modules that previously hardcoded NRW
behaviour. After this commit, every analysis path consults the central
BUNDESLAENDER registry for governing fractions, parliament name, and
state metadata.
wahlprogramme.py
- WAHLPROGRAMME is now nested {bundesland: {partei: meta}}; NRW data
hoisted unchanged under the "NRW" key.
- New WAHLPROGRAMM_KONTEXT_FILES dict maps a state to its overview
markdown file (currently only NRW).
- find_relevant_quotes(text, fraktionen, bundesland) — bundesland is
now a required positional. Governing fractions for the requested
state are merged with the submitting fractions before lookup.
- Helpers get_wahlprogramm() and parteien_mit_wahlprogramm() expose
the new shape to other modules.
- ValueError on unknown bundesland (no silent fallback).
embeddings.py
- Schema migration in init_embeddings_db: adds a `bundesland` column
to the chunks table when missing, plus an index, and backfills
existing rows from the PROGRAMME registry. Grundsatzprogramme
(federal level) keep bundesland NULL by design.
- find_relevant_chunks accepts a bundesland filter that matches state
rows OR NULL — so federal Grundsatzprogramme remain visible to every
analysis.
- get_relevant_quotes_for_antrag(text, fraktionen, bundesland, …) —
bundesland required, governing fractions read from BUNDESLAENDER
instead of hardcoded ["CDU","GRÜNE"]. Order-preserving dedup
replaces the previous set-based merge.
- index_programm now writes the bundesland column on insert.
- Dropped the hardcoded "Wahlprogramm NRW 2022" label in
format_quotes_for_prompt — bundesland context is implicit in the
surrounding prompt block.
analyzer.py
- get_bundesland_context reads parlament_name, regierungsfraktionen,
landtagsfraktionen and the optional WAHLPROGRAMM_KONTEXT_FILES entry
from the central registry. Throws ValueError on unknown OR inactive
bundesland — kills the silent NRW fallback that previously masked
configuration gaps.
- The Antragsteller-detection heuristic now iterates
BUNDESLAENDER[bundesland].landtagsfraktionen instead of
WAHLPROGRAMME.keys(), so we recognise parties for which we don't
yet have a Wahlprogramm PDF.
- Both quote lookups (semantic + keyword fallback) now receive the
bundesland.
Resolves issue #5. Foundation for #2 (LSA), #3 (Berlin), #4 (MV).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces app/bundeslaender.py as the single source of truth for all
bundesland-specific data (parliament name, current legislative period,
upcoming elections, governing coalition, doku system, base URLs,
drucksache format, dokukratie scraper code, active flag, optional
remarks). Data reflects April 2026 state.
main.py::index() and /api/bundeslaender now derive their lists from
this module instead of hardcoding. Frontend dropdown now shows all 16
bundesländer (15 disabled with "(bald)" suffix); previously the
landing template showed only 4. NRW remains the only "aktiv" entry.
API behaviour change worth noting: the /api/bundeslaender endpoint
previously emitted code "ST" for Sachsen-Anhalt; it now emits "LSA"
to match the politically dominant abbreviation. No functional impact
because non-NRW bundesländer were inactive in both versions.
Foundation for #5 and #2; deliberately a no-op for NRW so it can ship
and rollback independently.
Resolves issue #7.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Body becomes a flex column so the header takes its natural height and the
main container fills the rest via flex:1 — replaces the brittle
calc(100vh - 70px) that assumed a 70px header and broke as soon as the
header wrapped on mobile. Adds 100dvh fallback for iOS Safari address
bar quirks.
Mobile breakpoint (≤900px) reworked: list scrolls internally via
list-content max-height:50vh, detail-panel uses overflow:visible so the
whole document scrolls naturally instead of nesting scrollers. Tapping
an item auto-scrolls to the detail panel and a new "← Zur Liste" button
(mobile-only) jumps back. Adds a tighter ≤600px breakpoint that hides
the subtitle, collapses the matrix grid to one column and shrinks the
matrix table for phone screens.
Resolves issue #6.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>