fix: scope citation binding by party (cross-party misattribution stopped)

Bug: the AfD Parteiprogramm block contained a citation with quelle "CDU Grundsatzprogramm 2024, S. 33" (DRS 21/4939).

Cause: reconstruct_zitate merged all chunks of all parties into a single pool. When the LLM emitted a text under AfD Parteiprogramm that happened to appear in the CDU program as well, the code matched the CDU chunk and overwrote quelle/url with the CDU values.

Fix: match strictly against chunks_by_party[fraktion][kind], with a fallback to the same party's other category (e.g. the AfD has only a Grundsatzprogramm in the index, no Wahlprogramm). If nothing matches within the own party → drop the citation instead of keeping a foreign quelle. Better 0 citations than one misattributed citation.

Plus v3 UI:
- News box moved from the very end to above "Neu analysieren"
- News list cut to 1 item + 9-line clamp via -webkit-line-clamp

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
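The match order described in the message above can be sketched in isolation. This is a minimal illustration, not the code from app/embeddings.py: `find_chunk` is a hypothetical stand-in for `find_chunk_for_text` that does a plain substring match, `bind_zitat` is a made-up helper name, chunks are assumed to carry a ready-made `quelle` field, and the pool contents are placeholder data.

```python
from typing import Optional

Chunk = dict[str, str]
Pools = dict[str, dict[str, list[Chunk]]]


def _norm(text: str) -> str:
    """Collapse whitespace and lowercase so trivial differences do not block a match."""
    return " ".join(text.split()).lower()


def find_chunk(text: str, chunks: list[Chunk]) -> Optional[Chunk]:
    """Hypothetical stand-in for find_chunk_for_text: plain substring match."""
    needle = _norm(text)
    if not needle:
        return None
    for chunk in chunks:
        if needle in _norm(chunk.get("text", "")):
            return chunk
    return None


def bind_zitat(zitat: dict, fraktion: str, kind: str, pools: Pools) -> Optional[dict]:
    """Return the Zitat bound to a chunk of its own party, or None to drop it."""
    party_pool = pools.get(fraktion, {})
    other = "parteiprogramm" if kind == "wahlprogramm" else "wahlprogramm"
    # Own category first, then the same party's other category.
    # Pools of other parties are never consulted.
    for category in (kind, other):
        matched = find_chunk(zitat.get("text", ""), party_pool.get(category, []))
        if matched is not None:
            return {**zitat, "quelle": matched["quelle"], "verified": True}
    return None


# Placeholder data (assumption): two parties, each with one Parteiprogramm chunk.
pools: Pools = {
    "Partei A": {
        "wahlprogramm": [],
        "parteiprogramm": [
            {"text": "Placeholder sentence one.", "quelle": "A Grundsatzprogramm"}
        ],
    },
    "Partei B": {
        "wahlprogramm": [],
        "parteiprogramm": [
            {"text": "Placeholder sentence two.", "quelle": "B Grundsatzprogramm"}
        ],
    },
}

# Emitted under A's Wahlprogramm, found in A's Parteiprogramm: kept, quelle corrected.
kept = bind_zitat({"text": "placeholder sentence one"}, "Partei A", "wahlprogramm", pools)

# Text exists only in B's pool: dropped rather than bound to B's quelle.
dropped = bind_zitat({"text": "placeholder sentence two"}, "Partei A", "parteiprogramm", pools)
```

In this sketch a caller would skip every Zitat for which `bind_zitat` returns `None`, which mirrors the "better 0 citations" rule.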
2026-05-07 11:51:03 +02:00 · 2026-05-07 11:51:03 +02:00 · 535c2f15e4
commit 535c2f15e4
parent 1ef5578e02
3 changed files with 108 additions and 35 deletions
--- a/app/embeddings.py
+++ b/app/embeddings.py
@@ -901,46 +901,94 @@ def find_chunk_for_text(text: str, chunks: list[dict]) -> Optional[dict]:
 def reconstruct_zitate(data: dict, semantic_quotes: dict) -> dict:
    """Verify and reconstruct LLM-emitted zitate against retrieved chunks.

-    For each Zitat:
-    * **verified** (substring/4-word-anchor match): overwrite quelle/url
-      with canonical chunk values, set ``verified: true``.
-    * **unverified** (no match found): keep the Zitat but set
-      ``verified: false``. The UI shows it with a different style so the
-      user knows it's an LLM-Paraphrase, not a wörtliches Zitat.
+    Matching is strictly **party-scoped**: a Zitat in the AfD block may only
+    be matched against AfD chunks, never against CDU/SPD chunks. Otherwise a
+    text that happens to read identically in another party's program lands in
+    the wrong block with a foreign ``quelle`` (cross-party misattribution).

-    This replaces the old drop-on-no-match behavior (ADR 0001 Option B)
-    with a more honest approach: paraphrased citations are still valuable
-    context, they just need to be marked as such.
+    Match order per Zitat:
+    1. Party + exact program category (e.g. AfD Parteiprogramm chunks for
+       a Zitat in the AfD Parteiprogramm block) → ``verified: true`` with
+       the canonical ``quelle``/``url`` from the chunk.
+    2. Party + other program category (e.g. the index holds only the AfD
+       Grundsatz-/Parteiprogramm, but the LLM emitted the text in the
+       Wahlprogramm block) → ``verified: true`` with corrected ``quelle``;
+       the block stays as set by the LLM.
+    3. No match within the own party → **drop the Zitat**. Better 0
+       Zitate than one attributed to the wrong party. Previously such Zitate
+       were kept as ``verified: false`` with the LLM's quelle, which led
+       e.g. to CDU quellen in AfD blocks (#175-bug).
    """
    if not semantic_quotes:
        return data

-    all_chunks: list[dict] = []
-    for d in semantic_quotes.values():
-        all_chunks.extend(d.get("wahlprogramm", []))
-        all_chunks.extend(d.get("parteiprogramm", []))
-    if not all_chunks:
+    # Build one pool per party. Lookup goes direct and via normalize_partei,
+    # so that aliases ("BÜNDNIS 90/DIE GRÜNEN" ↔ "GRÜNE") match both ways.
+    chunks_by_party: dict[str, dict[str, list]] = {}
+    for partei, d in (semantic_quotes or {}).items():
+        chunks_by_party[partei] = {
+            "wahlprogramm":   list(d.get("wahlprogramm",   []) or []),
+            "parteiprogramm": list(d.get("parteiprogramm", []) or []),
+        }
+    if not chunks_by_party:
        return data

+    try:
+        from .parteien import normalize_partei
+    except Exception:
+        normalize_partei = lambda x: x  # noqa: E731
+
+    def _pool_for(fraktion: str) -> dict[str, list]:
+        # Try the name directly, then normalized. If neither hits: empty pool.
+        if fraktion in chunks_by_party:
+            return chunks_by_party[fraktion]
+        norm = normalize_partei(fraktion) or fraktion
+        if norm in chunks_by_party:
+            return chunks_by_party[norm]
+        # Reverse lookup: the keys of `chunks_by_party` may themselves be
+        # un-normalized aliases, so compare their normalized form to `norm`.
+        for key, val in chunks_by_party.items():
+            if normalize_partei(key) == norm:
+                return val
+        return {"wahlprogramm": [], "parteiprogramm": []}
+
    for fs in data.get("wahlprogrammScores", []) or []:
+        partei_name = fs.get("fraktion", "")
+        partei_pool = _pool_for(partei_name)
+
        for kind in ("wahlprogramm", "parteiprogramm"):
            blk = fs.get(kind) or {}
            zitate = blk.get("zitate") or []
+            allowed = partei_pool.get(kind) or []
+            cross_kind = "parteiprogramm" if kind == "wahlprogramm" else "wahlprogramm"
+            fallback = partei_pool.get(cross_kind) or []
+
            cleaned = []
            for z in zitate:
-                text = z.get("text", "")
-                matched = find_chunk_for_text(text, all_chunks)
-                if matched is not None:
+                text = z.get("text", "") or ""
+
+                # 1. Strict match in (party, own program category)
+                matched = find_chunk_for_text(text, allowed) if allowed else None
+                if matched is None and fallback:
+                    # 2. Fallback: same party, other program category
+                    matched = find_chunk_for_text(text, fallback)
+
+                if matched is None:
+                    # 3. No match within the own party → drop.
+                    logger.warning(
+                        "Zitat verworfen (kein Partei-Match): fraktion=%r "
+                        "kind=%r text=%r llm_quelle=%r",
+                        partei_name, kind, text[:80], z.get("quelle"),
+                    )
+                    continue
+
                z["quelle"] = _chunk_source_label(matched)
                url = _chunk_pdf_url(matched)
                if url:
                    z["url"] = url
                z["verified"] = True
-                else:
-                    # No match: keep the Zitat but mark it as unverified.
-                    # The LLM-emitted quelle/url stays (best effort).
-                    z["verified"] = False
                cleaned.append(z)
+
            blk["zitate"] = cleaned
    return data

--- a/app/static/v3/v3.css
+++ b/app/static/v3/v3.css
@@ -496,6 +496,31 @@
  margin: 0 0 10px;
 }

+/* One news item, max 9 lines: hide the remaining news items and
+   shorten the description text via line-clamp.
+
+   The news JS from v2 renders items into #ad-news-list as child nodes.
+   Heuristic: every direct child = one news item. We show only the
+   first and clamp it to 9 lines via -webkit-line-clamp. */
+#ad-news-list > * {
+  display: none;
+}
+#ad-news-list > *:first-child {
+  display: block;
+  display: -webkit-box;
+  -webkit-line-clamp: 9;
+  -webkit-box-orient: vertical;
+  line-clamp: 9;
+  overflow: hidden;
+}
+/* Do not clamp the loading hint */
+#ad-news-list > .v3-loading:first-child {
+  display: block;
+  -webkit-line-clamp: none;
+  line-clamp: none;
+  overflow: visible;
+}
+
 .v3-comments .v3-comment-label {
  font-family: var(--font-mono);
  font-size: 10px;
--- a/app/templates/v3/screens/antrag_detail.html
+++ b/app/templates/v3/screens/antrag_detail.html
@@ -388,6 +388,15 @@
      <a href="/antrag/{{ antrag.drucksache }}">Permalink</a>
    </div>

+    {# News box (filled via JS): one news item, max 9 lines #}
+    <div id="ad-news-box" class="v3-news-box" style="display:none;">
+      <h3 class="v3-h3">Aktuelle News passend zu diesem Antrag</h3>
+      <p class="v3-news-meta">Embedding-Match aus den letzten 90 Tagen. Quelle: Tagesschau-API + Bundestag-RSS.</p>
+      <div id="ad-news-list">
+        <div class="v3-loading">Lade …</div>
+      </div>
+    </div>
+
    {# Neu analysieren #}
    <div class="v3-rest-block">
      <button id="v2-reanalyze-btn" onclick="v2DetailReAnalyze(this)" class="v3-action-btn v3-action-muted">
@@ -403,15 +412,6 @@
      </div>
    </div>

-    {# News box (filled via JS) #}
-    <div id="ad-news-box" class="v3-news-box" style="display:none;">
-      <h3 class="v3-h3">Aktuelle News passend zu diesem Antrag</h3>
-      <p class="v3-news-meta">Embedding-Match aus den letzten 90 Tagen. Quelle: Tagesschau-API + Bundestag-RSS.</p>
-      <div id="ad-news-list">
-        <div class="v3-loading">Lade …</div>
-      </div>
-    </div>
-
    {# 9i Kommentare #}
    <div class="v3-rest-block v3-comments">
      <h3 class="v3-h3">Kommentare</h3>