gwoe-antragspruefer/app
Dotty Dotter c7242f8413 Add PortalaAdapter for PADOKA / Sachsen-Anhalt (#2)
Adds a clean-room PortalaAdapter that talks to the eUI/portala framework
behind PADOKA (Landtag Sachsen-Anhalt). Same engine powers Berlin's
PARDOK; the same adapter will serve issue #3 once activated for BE.

Reverse-engineering notes
- The "PADOKA = StarWeb" assumption from issue #1 / dokukratie's st.yml
  is outdated. The Sachsen-Anhalt portal has been migrated to the same
  eUI/portala SPA framework that Berlin uses. The legacy StarWeb URL
  returns HTTP 503; the new entry point is /portal/browse.tt.html.
- Search workflow is two-stage:
  1. POST /portal/browse.tt.json with a JSON action body containing an
     Elasticsearch-style query tree under search.json. Returns a
     report_id plus hit count.
  2. POST /portal/report.tt.html with {report_id, start, chunksize}.
     Returns the HTML hit list. Each record carries a Perl Data::Dumper
     block in a <pre> tag with the canonical metadata.
- The query schema (sources, search.lines, search.json tree, report
  block) is taken from dokukratie/scrapers/portala.query.json (GPL-3.0)
  — only structure/selectors are reused, no Python code is ported.
- DB id is "lsa.lissh"; the server validates this and rejects unknown
  interfaces with an explicit errormsg.
- PDFs live under /files/drs/wp{N}/drs/d{nr}{xxx}.pdf and are served
  directly without any session cookie.
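
A minimal sketch of the two-stage flow described above, not the adapter's
actual code. Assumptions: run_search, Transport and BASE_URL are
hypothetical names, the stage-1 reply is JSON with a literal "report_id"
key, and the stage-2 body is sent as JSON. The action body is passed in
as an opaque dict because its exact shape is not spelled out here.

```python
import json
from typing import Callable

# Host taken from the "Verified live" note; the two paths are the ones listed above.
BASE_URL = "https://padoka.landtag.sachsen-anhalt.de"

# Hypothetical transport: (url, request_body) -> response text. Injecting it
# keeps the sketch independent of any particular HTTP library.
Transport = Callable[[str, bytes], str]


def run_search(action_body: dict, post: Transport,
               start: int = 0, chunksize: int = 50) -> str:
    """Run both stages and return the raw HTML hit list."""
    # Stage 1: submit the query tree and read the report_id from the reply.
    reply = json.loads(post(f"{BASE_URL}/portal/browse.tt.json",
                            json.dumps(action_body).encode("utf-8")))
    report_id = reply.get("report_id")  # assumed key name
    if not report_id:
        # Covers rejected requests, e.g. an unknown interface / DB id.
        raise RuntimeError(f"browse.tt.json returned no report_id: {reply!r}")

    # Stage 2: fetch one chunk of the stored report. JSON encoding of this
    # body is an assumption; the real endpoint may expect another encoding.
    page = {"report_id": report_id, "start": start, "chunksize": chunksize}
    return post(f"{BASE_URL}/portal/report.tt.html",
                json.dumps(page).encode("utf-8"))
```

Passing the transport in as a callable lets the flow be exercised with a
fake in tests, without touching the live portal.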

What the adapter does
- search() builds a date-window query (last ~24 months) for the "Antrag"
  document type and returns the most recent hits. The user's free-text
  query is applied as a client-side title/Urheber filter, because there
  is no server-side full-text search yet (see "Limitations" below).
- Hits are parsed from the Perl record dumps in the report HTML:
  - WEV06.main → title (Perl \x{xx} hex escapes decoded)
  - WEV32.5   → relative PDF path
  - WEV32.main → "Antrag <Urheber> <DD.MM.YYYY> Drucksache <b>X/YYYY</b>"
- Fraktion strings are normalised to canonical codes (CDU, SPD, GRÜNE,
  FDP, AfD, LINKE, Landesregierung).
- get_document() looks up a single Drucksache by re-running the search.
- download_text() fetches the PDF and extracts text via PyMuPDF.
- bundeslaender.py: LSA's doku_system corrected from "StarWeb" to
  "PARDOK", anmerkung updated with the migration story.

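The field handling listed above could look roughly like this. It is a
sketch under assumptions, not the adapter's code: the function names, the
regexes and the substring table for Fraktion codes are all hypothetical,
and the sample strings in the usage note are made up for illustration.

```python
import re

# Perl-style \x{..} hex escapes as they appear in the Data::Dumper output.
_HEX_ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]+)\}")

# Shape from the commit message:
# "Antrag <Urheber> <DD.MM.YYYY> Drucksache <b>X/YYYY</b>"
_WEV32_MAIN = re.compile(
    r"^Antrag\s+(?P<urheber>.+?)\s+(?P<date>\d{2}\.\d{2}\.\d{4})\s+"
    r"Drucksache\s+<b>(?P<drucksache>\d+/\d+)</b>"
)

# Hypothetical lookup: canonical code -> lowercase substrings that imply it.
_CANONICAL = [
    ("CDU", ("cdu",)),
    ("SPD", ("spd",)),
    ("GRÜNE", ("grüne", "gruene")),
    ("FDP", ("fdp",)),
    ("AfD", ("afd",)),
    ("LINKE", ("linke",)),
    ("Landesregierung", ("landesregierung",)),
]


def decode_perl_escapes(text: str) -> str:
    """Replace every \\x{..} escape with the character it encodes."""
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def normalise_fraktion(raw: str) -> list[str]:
    """Return the canonical codes found in a raw Urheber string."""
    lowered = raw.lower()
    return [code for code, needles in _CANONICAL
            if any(needle in lowered for needle in needles)]


def parse_wev32_main(value: str) -> dict:
    """Split a WEV32.main string into Urheber, ISO date and Drucksache."""
    match = _WEV32_MAIN.search(decode_perl_escapes(value))
    if match is None:
        raise ValueError(f"unexpected WEV32.main format: {value!r}")
    day, month, year = match.group("date").split(".")
    return {
        "urheber": match.group("urheber"),
        "fraktionen": normalise_fraktion(match.group("urheber")),
        "date": f"{year}-{month}-{day}",
        "drucksache": match.group("drucksache"),
    }
```

For example, decode_perl_escapes(r"f\x{fc}r") yields "für", and
parse_wev32_main("Antrag Fraktion DIE LINKE 01.02.2026 Drucksache <b>8/1234</b>")
yields drucksache "8/1234" with fraktionen ["LINKE"].
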
Limitations (deliberate, MVP)
- No server-side full-text search. The portala framework's sf index
  names for LSA full-text content are not yet known; tree mutations
  with sf=alAB return 0 hits. The client-side filter is "good enough"
  for the ~24-month search window of Anträge (roughly a few hundred
  per WP).
- LSA is still aktiv=False in bundeslaender.py — the adapter is dormant
  in production until issue #2's wahlprogramm ingest and frontend
  activation land.

Verified live against padoka.landtag.sachsen-anhalt.de:
- search(query="", limit=5) returned 5 current Anträge from March 2026
  (LINKE + GRÜNE) with correct dates, Fraktionen, titles and PDF URLs.
- download_text("8/6790") returned 5051 chars of real Antragstext
  ("ICE-Halt für Salzwedel dauerhaft erhalten").

Refs #2.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 21:50:23 +02:00
kontext            Initial commit: GWÖ-Antragsprüfer v1.0                           2026-03-28 22:30:24 +01:00
routers            Initial commit: GWÖ-Antragsprüfer v1.0                           2026-03-28 22:30:24 +01:00
static/referenzen  Initial commit: GWÖ-Antragsprüfer v1.0                           2026-03-28 22:30:24 +01:00
templates          Fix responsive layout for mobile viewports (#6)                  2026-04-07 13:48:55 +02:00
__init__.py        Initial commit: GWÖ-Antragsprüfer v1.0                           2026-03-28 22:30:24 +01:00
analyzer.py        Refactor wahlprogramme/embeddings/analyzer for multi-state (#5)  2026-04-07 18:48:11 +02:00
bundeslaender.py   Add PortalaAdapter for PADOKA / Sachsen-Anhalt (#2)              2026-04-07 21:50:23 +02:00
config.py          Initial commit: GWÖ-Antragsprüfer v1.0                           2026-03-28 22:30:24 +01:00
database.py        Initial commit: GWÖ-Antragsprüfer v1.0                           2026-03-28 22:30:24 +01:00
embeddings.py      Refactor wahlprogramme/embeddings/analyzer for multi-state (#5)  2026-04-07 18:48:11 +02:00
main.py            Add central bundeslaender.py module with all 16 states (#7)      2026-04-07 14:17:54 +02:00
models.py          Initial commit: GWÖ-Antragsprüfer v1.0                           2026-03-28 22:30:24 +01:00
parlamente.py      Add PortalaAdapter for PADOKA / Sachsen-Anhalt (#2)              2026-04-07 21:50:23 +02:00
report.py          Initial commit: GWÖ-Antragsprüfer v1.0                           2026-03-28 22:30:24 +01:00
wahlprogramme.py   Refactor wahlprogramme/embeddings/analyzer for multi-state (#5)  2026-04-07 18:48:11 +02:00