gwoe-antragspruefer

Author	SHA1	Message	Date
Dotty Dotter	80e16df288	Append original Antrag-PDF to GWÖ-Report (#9)

Extends generate_pdf_report() with a best-effort second stage that appends the original Antrag PDF to the freshly rendered GWÖ-Report so the analysis and its source document live in the same file.

Pipeline
1. WeasyPrint renders the report PDF as before.
2. _append_original_antrag() then:
   - Skips silently if assessment.link is empty or non-HTTP (manual uploads / pasted text leave nothing to fetch).
   - Downloads the original PDF via httpx (30s timeout, follow redirects, custom user agent).
   - Validates the response is actually a PDF (Content-Length not relied on; the magic bytes %PDF- are checked).
   - Adds a single A4 separator page that says "Original-Antrag", repeats the Drucksachen-ID and title, and either confirms the append or shows the failure reason (HTTP code, network error, parse error) plus the source URL.
   - Appends the downloaded PDF via PyMuPDF doc.insert_pdf().
   - Saves to a sibling .tmp file and atomically replaces the original (PyMuPDF refuses non-incremental save into the same file).

Edge cases handled
- No link / pasted-text upload → no append, no divider, original report unchanged.
- Download error / 404 / non-PDF response → divider page with explicit error message and source URL, report still ships.
- PDF parse error → divider page without appended content, error logged.
- Hard failure during save → fall back to the original WeasyPrint PDF.

Verified live in production container against drucksache 8/6645 (Untrending Frauenhass, BÜNDNIS 90/DIE GRÜNEN LSA):
- Report 4 pages + 1 divider + 3 pages original = 8 pages total
- Divider correctly placed at index 4
- Page 5 starts with "(Ausgegeben am 24.02.2026) … Drucksache 8/6645 … Antrag — Fraktion BÜNDNIS 90/DIE GRÜNEN — Untrending Frauenhass …"
- Negative test with a synthetic 404 link: 5 pages total, divider at index 4 with "Original-PDF konnte nicht angehängt werden. Grund: HTTP 404".

Resolves #9.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-07 23:15:05 +02:00
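The validation and save steps described in this commit can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: the helper names `looks_like_pdf` and `replace_atomically` are hypothetical, and the check assumes the `%PDF-` marker sits at the very start of the payload.

```python
import os
from pathlib import Path

PDF_MAGIC = b"%PDF-"


def looks_like_pdf(payload: bytes) -> bool:
    """Return True if the payload starts with the PDF magic bytes."""
    # Assumption: the marker is at offset 0; headers are not consulted at all.
    return payload.startswith(PDF_MAGIC)


def replace_atomically(target: Path, new_bytes: bytes) -> None:
    """Write new_bytes to a sibling .tmp file, then swap it over target."""
    tmp = target.with_suffix(target.suffix + ".tmp")
    try:
        tmp.write_bytes(new_bytes)
        # os.replace is atomic when both paths live on the same filesystem.
        os.replace(tmp, target)
    finally:
        # If the write or swap failed, do not leave the temp file behind.
        if tmp.exists():
            tmp.unlink()
```

If the swap raises, the target file is left untouched, which matches the commit's fallback of shipping the original WeasyPrint PDF.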
Dotty Dotter	f1867d463c	Bundesland filter & transparency: stringent split + visible source (#8)

Brings the Bundesland-Dropdown from a cosmetic header widget to a real filter that propagates through every layer (Listing, internal search, statistics, party/tag filters, upload mode), and at the same time makes the source parliament visible in every place where assessments from multiple bundesländer can be mixed.

Backend
- database.get_all_assessments(bundesland=None): new optional filter, "ALL" treated as None.
- database.search_assessments: bug fix. The previous `if bundesland:` branch incorrectly added a `WHERE bundesland='ALL'` clause; it is now guarded with `bundesland and bundesland != "ALL"`.
- main.list_assessments: accepts the ?bundesland= query param and includes the bundesland field in the response so the frontend can render badges.
- main.get_single_assessment: also includes bundesland in the response so the detail header can show the source parliament.
- main.search_landtag: early HTTP 400 when bundesland is missing or "ALL"; the live Landtag adapter cannot serve a synthetic Bundesweit request.
- main.index() and main.list_bundeslaender: synthetic "🌍 Bundesweit" entry prepended to the bundesländer list (kept out of bundeslaender.py on purpose, since ALL is not a real state). Both endpoints additionally expose a parlament_names map so the frontend can render the source parliament without an extra round-trip.

Report (PDF + HTML)
- generate_html_report / generate_pdf_report: new optional bundesland parameter. When set, the report header carries the parliament name ("Landtag von Sachsen-Anhalt", "Landtag Nordrhein-Westfalen", …) beside the title. Three call sites updated: run_analysis, run_drucksache_analysis, download_assessment_pdf.

Frontend (templates/index.html)
- Header dropdown gets the synthetic ALL entry as first option; initial currentBundesland is now 'ALL' (was 'NRW').
- localStorage persistence: changeBundesland writes, DOMContentLoaded reads and validates against the visible options.
- changeBundesland resets the score / party / tag filter state, syncs the upload-mode bundesland select, disables the Landtag-Suche button + tooltip when ALL, and toggles a data-mode attribute on .list-content (used by CSS to show/hide the per-item bundesland badge).
- loadAssessments now sends ?bundesland=… so the API does the actual filtering. updateStats renders an additional per-bundesland average block (Ø NRW: x · Ø LSA: y) when in ALL mode and the loaded list spans more than one bundesland.
- renderList prepends a small "bl-badge" beside the Drucksachen-Nummer. Hidden in single-bundesland mode via CSS selector to avoid clutter.
- showDetail header now shows the parliament name as its own line (.detail-parlament).
- searchLandtag has an early-out alert if currentBundesland === 'ALL', saving a network round-trip.
- Upload-Mode bundesland select now starts with a "— Bundesland wählen —" placeholder (no auto-default), and startAnalysis validates that a concrete bundesland was chosen.

CSS
- .bl-badge plus the .list-content[data-mode="single"] hide rule.
- .detail-parlament for the detail header line.
- .header-parlament for the PDF report header line.

Resolves #8.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-07 23:00:39 +02:00
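The guard from the search_assessments bug fix can be illustrated with a small sketch. The table layout, the `build_filter` and `list_assessments` helpers, and the sample rows are assumptions made for this example; only the condition `bundesland and bundesland != "ALL"` is taken from the commit message.

```python
import sqlite3
from typing import Optional


def build_filter(bundesland: Optional[str]) -> tuple[str, list[str]]:
    """Return a WHERE fragment and its parameters for the bundesland filter."""
    # "ALL" is a synthetic UI value, not a stored state code, so it must
    # never end up in the SQL; None and "" likewise mean "no filter".
    if bundesland and bundesland != "ALL":
        return " WHERE bundesland = ?", [bundesland]
    return "", []


def list_assessments(conn: sqlite3.Connection, bundesland: Optional[str] = None):
    """Fetch assessments, optionally restricted to one bundesland."""
    clause, params = build_filter(bundesland)
    sql = "SELECT drucksache, bundesland FROM assessments" + clause + " ORDER BY drucksache"
    return conn.execute(sql, params).fetchall()


# Hypothetical in-memory table, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assessments (drucksache TEXT, bundesland TEXT)")
conn.executemany(
    "INSERT INTO assessments VALUES (?, ?)",
    [("18/1234", "NRW"), ("8/6645", "LSA")],
)
```

With this shape, `list_assessments(conn, "ALL")` and `list_assessments(conn)` both return every row, while a concrete code such as `"LSA"` narrows the result.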
Dotty Dotter	87874a7a14	Activate LSA: Wahlprogramme + ingest + frontend (#2)

Brings Sachsen-Anhalt online as the second supported Bundesland after NRW. Closes the gap that issue #2 left open: with the PortalaAdapter already in place from `c7242f8`, this commit adds the reference data and flips the activation switch.

Wahlprogramme (LTW Sachsen-Anhalt 06.06.2021)
- Six PDFs added under app/static/referenzen/{cdu,spd,gruene,fdp,afd,linke}-lsa-2021.pdf, plus paged plain-text extractions under app/kontext/*.txt for the keyword fallback search.
- Sources verified by hand:
  - CDU "Unsere Heimat. Unsere Verantwortung." (cdulsa.de, 82 pages)
  - SPD "Zusammenhalt und neue Chancen" (FES library, 77 pages)
  - GRÜNE "Verlässlich für Sachsen-Anhalt" (gruene-lsa.de, 164 pages)
  - FDP "Wahlprogramm zur Landtagswahl 2021" (Naumann-Stiftung, 76 pages)
  - AfD "Alles für unsere Heimat!" (klimawahlen.de mirror, 64 pages)
  - LINKE "Wahlprogramm zur Landtagswahl 2021" (dielinke-sachsen-anhalt.de, 88 pages)
- The CDU PDF was the trickiest: KAS blocks bot downloads via Cloudflare; the cdulsa.de copy was located by an autonomous web search and verified to be byte-identical with the official document.

Embeddings indexed (in production container, OpenAI-compatible DashScope embeddings via the existing index_programm pipeline):
- CDU 134, SPD 145, GRÜNE 183, FDP 100, AfD 64, LINKE 143 chunks
- Total LSA: 769 new chunks alongside the existing 775 NRW chunks and 335 federal Grundsatzprogramm chunks.

wahlprogramme.py
- WAHLPROGRAMME["LSA"] populated with all six parties (canonical fraction codes, original titles, page counts).

embeddings.py
- PROGRAMME extended with the six new "<partei>-lsa-2021" entries that the indexer pipeline expects.

bundeslaender.py
- LSA flipped to aktiv=True. The frontend dropdown will now offer Sachsen-Anhalt as a selectable bundesland and analyzer.get_bundesland_context() will produce a real LSA prompt block (CDU/SPD/FDP as governing fractions, all six landtagsfraktionen).

End-to-end smoke test (live in production container before commit)
- Adapter: PortalaAdapter.search() returned current Anträge of März 2026 (LINKE + GRÜNE) with correct titles and PDF URLs.
- Semantic search for an LSA "ÖPNV in der Altmark" sample antrag matched LINKE S.53, SPD S.68, FDP S.52, all three with similarity > 0.6 and topical hits (Regionalisierungsmittel, ÖPNV-Förderprogramm, Wasserstoffnetz).

Resolves issue #2.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-07 22:12:32 +02:00
Dotty Dotter	c7242f8413	Add PortalaAdapter for PADOKA / Sachsen-Anhalt (#2)

Adds a clean-room PortalaAdapter that talks to the eUI/portala framework behind PADOKA (Landtag Sachsen-Anhalt). Same engine powers Berlin's PARDOK; the same adapter will serve issue #3 once activated for BE.

Reverse-engineering notes
- The "PADOKA = StarWeb" assumption from issue #1 / dokukratie's st.yml is outdated. The Sachsen-Anhalt portal was migrated to the same eUI/portala SPA framework Berlin uses. The legacy starweb URL returns 503; the new entry point is /portal/browse.tt.html.
- Search workflow is two-stage:
  1. POST /portal/browse.tt.json with a JSON action body containing an Elasticsearch-style query tree under search.json. Returns a report_id plus hit count.
  2. POST /portal/report.tt.html with {report_id, start, chunksize} returns the HTML hit list. Each record carries a Perl Data::Dumper block in a <pre> tag with the canonical metadata.
- The query schema (sources, search.lines, search.json tree, report block) is taken from dokukratie/scrapers/portala.query.json (GPL-3.0); only structure/selectors are reused, no Python code is ported.
- DB id is "lsa.lissh"; the server validates this and rejects unknown interfaces with an explicit errormsg.
- PDFs live under /files/drs/wp{N}/drs/d{nr}{xxx}.pdf and are served directly without any session cookie.

What the adapter does
- search() builds a date-window query (last ~24 months) for "Antrag" document type and returns the most recent hits. The user's free-text query is applied as a client-side title/Urheber filter (no fulltext search server-side yet, see "Limitations" below).
- Hits are parsed from the Perl record dumps in the report HTML:
  - WEV06.main → title (Perl \x{xx} hex escapes decoded)
  - WEV32.5 → relative PDF path
  - WEV32.main → "Antrag <Urheber> <DD.MM.YYYY> Drucksache <b>X/YYYY</b>"
- Fraktion strings are normalised to canonical codes (CDU, SPD, GRÜNE, FDP, AfD, LINKE, Landesregierung).
- get_document() looks up a single Drucksache by re-running the search.
- download_text() fetches the PDF and extracts text via PyMuPDF.
- bundeslaender.py: LSA's doku_system corrected from "StarWeb" to "PARDOK", anmerkung updated with the migration story.

Limitations (deliberate, MVP)
- No server-side full-text search. The portala framework's sf index names for LSA full-text content are not yet known; tree mutations with sf=alAB return 0 hits. Client-side filter is "good enough" for the next ~24 months of Anträge (≈few hundred per WP).
- LSA is still aktiv=False in bundeslaender.py, so the adapter is dormant in production until issue #2's wahlprogramm ingest and frontend activation land.

Verified live against padoka.landtag.sachsen-anhalt.de:
- search(query="", limit=5) returned 5 current Anträge from März 2026 (LINKE + GRÜNE) with correct dates, fractions, titles and PDF URLs.
- download_text("8/6790") returned 5051 chars of real Antragstext ("ICE-Halt für Salzwedel dauerhaft erhalten").

Refs #2.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-07 21:50:23 +02:00
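The record parsing mentioned above (decoding Perl `\x{xx}` escapes and splitting the WEV32.main line) might look something like this. It is a sketch under assumptions: the function names and the sample strings are invented for illustration, and the real adapter may handle more variants of the line format.

```python
import re
from typing import Optional

# Matches Perl-style hex escapes such as \x{d6} or \x{20ac}.
_PERL_HEX = re.compile(r"\\x\{([0-9A-Fa-f]+)\}")

# Shape taken from the commit message:
# "Antrag <Urheber> <DD.MM.YYYY> Drucksache <b>X/YYYY</b>"
_WEV32_MAIN = re.compile(
    r"Antrag\s+(?P<urheber>.+?)\s+(?P<datum>\d{2}\.\d{2}\.\d{4})\s+"
    r"Drucksache\s+<b>(?P<nummer>\d+/\d+)</b>"
)


def decode_perl_hex(text: str) -> str:
    """Replace every \\x{..} escape with the code point it encodes."""
    return _PERL_HEX.sub(lambda m: chr(int(m.group(1), 16)), text)


def parse_wev32_main(line: str) -> Optional[dict]:
    """Extract Urheber, date and Drucksache number, or None if no match."""
    match = _WEV32_MAIN.search(line)
    return match.groupdict() if match else None


# Made-up sample inputs (not copied from the live portal).
title = decode_perl_hex(r"\x{d6}PNV in der Altmark")
record = parse_wev32_main("Antrag DIE LINKE 12.03.2026 Drucksache <b>8/6790</b>")
```

Here `title` becomes "ÖPNV in der Altmark", and `record` holds the three named fields as strings; normalising the Urheber to a canonical fraction code would be a separate step.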
Dotty Dotter	ee0218b5af	Refactor wahlprogramme/embeddings/analyzer for multi-state (#5)

Atomic refactor of the three modules that previously hardcoded NRW behaviour. After this commit, every analysis path consults the central BUNDESLAENDER registry for governing fractions, parliament name, and state metadata.

wahlprogramme.py
- WAHLPROGRAMME is now nested {bundesland: {partei: meta}}; NRW data hoisted unchanged under the "NRW" key.
- New WAHLPROGRAMM_KONTEXT_FILES dict maps a state to its overview markdown file (currently only NRW).
- find_relevant_quotes(text, fraktionen, bundesland): bundesland is now a required positional. Governing fractions for the requested state are merged with the submitting fractions before lookup.
- Helpers get_wahlprogramm() and parteien_mit_wahlprogramm() expose the new shape to other modules.
- ValueError on unknown bundesland (no silent fallback).

embeddings.py
- Schema migration in init_embeddings_db: adds a `bundesland` column to the chunks table when missing, plus an index, and backfills existing rows from the PROGRAMME registry. Grundsatzprogramme (federal level) keep bundesland NULL by design.
- find_relevant_chunks accepts a bundesland filter that matches state rows OR NULL, so federal Grundsatzprogramme remain visible to every analysis.
- get_relevant_quotes_for_antrag(text, fraktionen, bundesland, …): bundesland required, governing fractions read from BUNDESLAENDER instead of hardcoded ["CDU","GRÜNE"]. Order-preserving dedup replaces the previous set-based merge.
- index_programm now writes the bundesland column on insert.
- Dropped the hardcoded "Wahlprogramm NRW 2022" label in format_quotes_for_prompt; bundesland context is implicit in the surrounding prompt block.

analyzer.py
- get_bundesland_context reads parlament_name, regierungsfraktionen, landtagsfraktionen and the optional WAHLPROGRAMM_KONTEXT_FILES entry from the central registry. Throws ValueError on unknown OR inactive bundesland, which kills the silent NRW fallback that previously masked configuration gaps.
- The Antragsteller-detection heuristic now iterates BUNDESLAENDER[bundesland].landtagsfraktionen instead of WAHLPROGRAMME.keys(), so we recognise parties for which we don't yet have a Wahlprogramm PDF.
- Both quote lookups (semantic + keyword fallback) now receive the bundesland.

Resolves issue #5. Foundation for #2 (LSA), #3 (Berlin), #4 (MV).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-07 18:48:11 +02:00
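Two of the embeddings.py changes lend themselves to a short sketch: the "state rows OR NULL" filter and the order-preserving dedup. Everything below is illustrative rather than the project's code. The two-column table, the helper names, the "cdu-nrw-2022" and "cdu-grundsatz" identifiers, and the choice to list submitting fractions before governing ones are assumptions.

```python
import sqlite3


def merge_fraktionen(submitting: list[str], governing: list[str]) -> list[str]:
    """Merge two lists, dropping duplicates while keeping first-seen order."""
    # dict preserves insertion order, unlike the set-based merge it replaces.
    return list(dict.fromkeys([*submitting, *governing]))


def programmes_for(conn: sqlite3.Connection, bundesland: str) -> list[str]:
    """Return programme ids visible to an analysis for the given state."""
    rows = conn.execute(
        # NULL marks federal Grundsatzprogramme, which every state may see.
        "SELECT programm FROM chunks "
        "WHERE bundesland = ? OR bundesland IS NULL "
        "ORDER BY programm",
        (bundesland,),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical miniature of the chunks table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (programm TEXT, bundesland TEXT)")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?)",
    [("cdu-nrw-2022", "NRW"), ("cdu-lsa-2021", "LSA"), ("cdu-grundsatz", None)],
)
```

A plain `bundesland = ?` comparison would silently drop the NULL rows, since SQL comparisons against NULL are never true; the explicit `IS NULL` branch is what keeps the federal programmes in every result.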
Dotty Dotter	ac18743ff2	Add central bundeslaender.py module with all 16 states (#7)

Introduces app/bundeslaender.py as the single source of truth for all bundesland-specific data (parliament name, current legislative period, upcoming elections, governing coalition, doku system, base URLs, drucksache format, dokukratie scraper code, active flag, optional remarks). Data reflects April 2026 state.

main.py::index() and /api/bundeslaender now derive their lists from this module instead of hardcoding. Frontend dropdown now shows all 16 bundesländer (15 disabled with "(bald)" suffix); previously the landing template showed only 4. NRW remains the only "aktiv" entry.

API behaviour change worth noting: the /api/bundeslaender endpoint previously emitted code "ST" for Sachsen-Anhalt; it now emits "LSA" to match the politically dominant abbreviation. No functional impact because non-NRW bundesländer were inactive in both versions.

Foundation for #5 and #2; deliberately a no-op for NRW so it can ship and rollback independently.

Resolves issue #7.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-07 14:17:54 +02:00
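As a rough picture of what such a registry and the derived dropdown could look like, consider the sketch below. The `Bundesland` dataclass, its reduced field set, and `dropdown_options` are hypothetical; the real module carries many more fields, and only the "(bald)" suffix, the `aktiv` flag and the parliament names come from the commit messages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bundesland:
    """Reduced, illustrative registry entry (the real one has more fields)."""
    code: str
    name: str
    parlament_name: str
    aktiv: bool = False


# Two of the 16 entries, reflecting the state as of this commit.
BUNDESLAENDER = {
    "NRW": Bundesland("NRW", "Nordrhein-Westfalen", "Landtag Nordrhein-Westfalen", aktiv=True),
    "LSA": Bundesland("LSA", "Sachsen-Anhalt", "Landtag von Sachsen-Anhalt"),
}


def dropdown_options() -> list[dict]:
    """Derive dropdown entries; inactive states are disabled and marked '(bald)'."""
    return [
        {
            "code": land.code,
            "label": land.name if land.aktiv else f"{land.name} (bald)",
            "disabled": not land.aktiv,
        }
        for land in BUNDESLAENDER.values()
    ]
```

Deriving the list from one registry means that activating a state later is a single flag change rather than an edit in several templates and endpoints.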
Dotty Dotter	bcd532be89	Fix responsive layout for mobile viewports (#6)

Body becomes a flex column so the header takes its natural height and the main container fills the rest via flex:1. This replaces the brittle calc(100vh - 70px) that assumed a 70px header and broke as soon as the header wrapped on mobile. Adds 100dvh fallback for iOS Safari address bar quirks.

Mobile breakpoint (≤900px) reworked: list scrolls internally via list-content max-height:50vh, detail-panel uses overflow:visible so the whole document scrolls naturally instead of nesting scrollers. Tapping an item auto-scrolls to the detail panel and a new "← Zur Liste" button (mobile-only) jumps back.

Adds a tighter ≤600px breakpoint that hides the subtitle, collapses the matrix grid to one column and shrinks the matrix table for phone screens.

Resolves issue #6.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-07 13:48:55 +02:00
Dotty Dotter	63de3ca20d	Initial commit: GWÖ-Antragsprüfer v1.0

Features:
- GWÖ-Matrix 2.0 analysis for NRW Landtag motions (Anträge)
- Improvement suggestions in redline format (original/proposal/rationale)
- Assessment of fidelity to the election manifesto (Wahlprogramm) and party programme
- Landtag search via the OPAL API
- Tag cloud with multi-select filter
- Party filter with average scores
- PDF report generation
- Security headers (CSP, X-Frame-Options, etc.)
- Persistent SQLite DB via Docker volumes

Tech stack:
- FastAPI + Jinja2
- Qwen LLM via DashScope API
- SQLite + aiosqlite
- WeasyPrint for PDF
- Docker Compose with Traefik	2026-03-28 22:30:24 +01:00


58 Commits