fix: scope citation binding per party (stops cross-party misattribution)

Bug: the AfD Parteiprogramm block contained a Zitat whose quelle was "CDU Grundsatzprogramm 2024, S. 33" (DRS 21/4939).

Cause: reconstruct_zitate mixed the chunks of all parties into a single pool. When the LLM emitted text under the AfD Parteiprogramm that happened to also appear in the CDU programme, the code matched the CDU chunk and overwrote quelle/url with the CDU values.

Fix: match strictly against chunks_by_party[fraktion][kind], falling back to the same party's other category (e.g. the AfD has only a Grundsatzprogramm and no Wahlprogramm in the index). If there is no match within the party's own chunks, the Zitat is dropped instead of keeping a foreign quelle. Better zero quotes than one misattributed quote.

Plus v3 UI:
- News box moved from the very bottom to above "Neu analysieren"
- News list cut to 1 item, with a 9-line clamp via -webkit-line-clamp

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 1ef5578e02

commit 535c2f15e4
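The bug and the fix can be reproduced in a few lines. This is a sketch, not the project's code: the chunk shape (`text`/`quelle` keys), the substring-only stand-in `naive_find_chunk`, and the parties "A"/"B" with their texts are all hypothetical, and party-name normalization (`_pool_for`) is left out. It only contrasts one pooled lookup (old behaviour) with the party-scoped order described above: own kind, then the party's other kind, otherwise no match.

```python
from typing import Optional

# Hypothetical chunk shape for this sketch: {"text": ..., "quelle": ...}.
Chunk = dict
KINDS = ("wahlprogramm", "parteiprogramm")


def naive_find_chunk(text: str, chunks: list[Chunk]) -> Optional[Chunk]:
    """Stand-in for find_chunk_for_text: case/whitespace-insensitive substring test."""
    needle = " ".join(text.lower().split())
    if not needle:
        return None
    for chunk in chunks:
        if needle in " ".join(chunk["text"].lower().split()):
            return chunk
    return None


def match_pooled(text: str, semantic_quotes: dict) -> Optional[Chunk]:
    """Old behaviour: every party's chunks share one pool."""
    pool = [c for per_party in semantic_quotes.values() for kind in KINDS
            for c in per_party.get(kind, [])]
    return naive_find_chunk(text, pool)


def match_scoped(text: str, semantic_quotes: dict, fraktion: str, kind: str) -> Optional[Chunk]:
    """New order: own party + own kind, then own party + other kind, else None (drop)."""
    pool = semantic_quotes.get(fraktion, {})
    cross_kind = "parteiprogramm" if kind == "wahlprogramm" else "wahlprogramm"
    matched = naive_find_chunk(text, pool.get(kind, []))
    if matched is None:
        matched = naive_find_chunk(text, pool.get(cross_kind, []))
    return matched


# Two hypothetical parties; neither has Wahlprogramm chunks in this index.
SAMPLE = {
    "B": {"parteiprogramm": [{"text": "We want a shared wording here.", "quelle": "Party B programme, p. 1"}]},
    "A": {"parteiprogramm": [{"text": "Only party A says this.", "quelle": "Party A programme, p. 7"}]},
}

if __name__ == "__main__":
    # A quote emitted in party A's Parteiprogramm block:
    print(match_pooled("shared wording", SAMPLE)["quelle"])               # party B's source
    print(match_scoped("shared wording", SAMPLE, "A", "parteiprogramm"))  # None, so dropped
    print(match_scoped("only party A says", SAMPLE, "A", "wahlprogramm")["quelle"])  # via the other kind
```

With the pooled lookup the quote from party A's block resolves to party B's chunk; the scoped lookup returns `None`, which `reconstruct_zitate` now turns into a dropped Zitat plus a warning log.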
```diff
@@ -901,46 +901,94 @@ def find_chunk_for_text(text: str, chunks: list[dict]) -> Optional[dict]:
 def reconstruct_zitate(data: dict, semantic_quotes: dict) -> dict:
     """Verify and reconstruct LLM-emitted zitate against retrieved chunks.
 
-    For each Zitat:
-    * **verified** (substring/4-word-anchor match): overwrite quelle/url
-      with canonical chunk values, set ``verified: true``.
-    * **unverified** (no match found): keep the Zitat but set
-      ``verified: false``. The UI shows it with a different style so the
-      user knows it's an LLM paraphrase, not a verbatim quote.
+    Matching is strictly **party-scoped**: a Zitat in the AfD block may
+    only be matched against AfD chunks, never against CDU/SPD chunks.
+    Otherwise a coincidentally identical text from another party's programme
+    lands in the wrong block with a foreign ``quelle`` (cross-party misattribution).
 
-    This replaces the old drop-on-no-match behavior (ADR 0001 Option B)
-    with a more honest approach: paraphrased citations are still valuable
-    context, they just need to be marked as such.
+    Match order per Zitat:
+    1. Party + exact programme category (e.g. AfD Parteiprogramm chunks
+       for a Zitat in the AfD Parteiprogramm block) → ``verified: true`` with
+       the canonical ``quelle``/``url`` from the chunk.
+    2. Party + other programme category (e.g. the AfD has only a
+       Grundsatz-/Parteiprogramm in the index, but the LLM emitted the text
+       in the Wahlprogramm block) → ``verified: true`` with a corrected
+       ``quelle``; the block stays as the LLM set it.
+    3. No match within the party's own chunks → **drop the Zitat**. Better
+       zero quotes than one attributed to the wrong party. Previously such
+       quotes were kept as ``verified: false`` with the LLM's quelle, which
+       led e.g. to CDU sources in AfD blocks (#175 bug).
     """
     if not semantic_quotes:
         return data
 
-    all_chunks: list[dict] = []
-    for d in semantic_quotes.values():
-        all_chunks.extend(d.get("wahlprogramm", []))
-        all_chunks.extend(d.get("parteiprogramm", []))
-    if not all_chunks:
+    # Build one pool per party. Lookup works directly and via normalize_partei,
+    # so aliases ("BÜNDNIS 90/DIE GRÜNEN" ↔ "GRÜNE") match in both directions.
+    chunks_by_party: dict[str, dict[str, list]] = {}
+    for partei, d in (semantic_quotes or {}).items():
+        chunks_by_party[partei] = {
+            "wahlprogramm": list(d.get("wahlprogramm", []) or []),
+            "parteiprogramm": list(d.get("parteiprogramm", []) or []),
+        }
+    if not chunks_by_party:
         return data
 
+    try:
+        from .parteien import normalize_partei
+    except Exception:
+        normalize_partei = lambda x: x  # noqa: E731
+
+    def _pool_for(fraktion: str) -> dict[str, list]:
+        # Try the name directly, then normalized; if neither matches, empty pool.
+        if fraktion in chunks_by_party:
+            return chunks_by_party[fraktion]
+        norm = normalize_partei(fraktion) or fraktion
+        if norm in chunks_by_party:
+            return chunks_by_party[norm]
+        # Reverse lookup: chunks_by_party may be keyed by another spelling
+        # of the same party, so compare the normalized keys against `norm`.
+        for key, val in chunks_by_party.items():
+            if normalize_partei(key) == norm:
+                return val
+        return {"wahlprogramm": [], "parteiprogramm": []}
+
     for fs in data.get("wahlprogrammScores", []) or []:
+        partei_name = fs.get("fraktion", "")
+        partei_pool = _pool_for(partei_name)
+
         for kind in ("wahlprogramm", "parteiprogramm"):
             blk = fs.get(kind) or {}
             zitate = blk.get("zitate") or []
+            allowed = partei_pool.get(kind) or []
+            cross_kind = "parteiprogramm" if kind == "wahlprogramm" else "wahlprogramm"
+            fallback = partei_pool.get(cross_kind) or []
+
             cleaned = []
             for z in zitate:
-                text = z.get("text", "")
-                matched = find_chunk_for_text(text, all_chunks)
-                if matched is not None:
-                    z["quelle"] = _chunk_source_label(matched)
-                    url = _chunk_pdf_url(matched)
-                    if url:
-                        z["url"] = url
-                    z["verified"] = True
-                else:
-                    # No match: keep the Zitat but mark it as unverified.
-                    # The LLM-emitted quelle/url stays (best effort).
-                    z["verified"] = False
+                text = z.get("text", "") or ""
+
+                # 1. Strict match within (party, own programme)
+                matched = find_chunk_for_text(text, allowed) if allowed else None
+                if matched is None and fallback:
+                    # 2. Fallback: same party, other programme category
+                    matched = find_chunk_for_text(text, fallback)
+
+                if matched is None:
+                    # 3. No match within the party's own chunks: drop it.
+                    logger.warning(
+                        "Zitat verworfen (kein Partei-Match): fraktion=%r "
+                        "kind=%r text=%r llm_quelle=%r",
+                        partei_name, kind, text[:80], z.get("quelle"),
+                    )
+                    continue
+
+                z["quelle"] = _chunk_source_label(matched)
+                url = _chunk_pdf_url(matched)
+                if url:
+                    z["url"] = url
+                z["verified"] = True
                 cleaned.append(z)
 
             blk["zitate"] = cleaned
     return data
+
```
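`_pool_for` tries three lookups before giving up. A minimal standalone version follows; it assumes a hypothetical one-entry alias table in place of the real `normalize_partei` from `.parteien`, whose mapping this commit does not show.

```python
from typing import Optional

# Hypothetical alias table standing in for .parteien.normalize_partei.
_ALIASES = {"BÜNDNIS 90/DIE GRÜNEN": "GRÜNE"}

EMPTY_POOL = {"wahlprogramm": [], "parteiprogramm": []}


def normalize_partei(name: str) -> Optional[str]:
    """Map a known alias to its canonical name; unknown names pass through."""
    cleaned = name.strip()
    return _ALIASES.get(cleaned, cleaned) or None


def pool_for(fraktion: str, chunks_by_party: dict[str, dict[str, list]]) -> dict[str, list]:
    """Same order as _pool_for: direct key, normalized key, normalized-key scan."""
    if fraktion in chunks_by_party:
        return chunks_by_party[fraktion]
    norm = normalize_partei(fraktion) or fraktion
    if norm in chunks_by_party:
        return chunks_by_party[norm]
    # Stored keys may be a different alias of the same party.
    for key, val in chunks_by_party.items():
        if normalize_partei(key) == norm:
            return val
    # Unknown party: an empty pool, so every Zitat in its blocks is dropped.
    return {kind: list(v) for kind, v in EMPTY_POOL.items()}


pools = {"BÜNDNIS 90/DIE GRÜNEN": {"wahlprogramm": ["w1"], "parteiprogramm": []}}
print(pool_for("GRÜNE", pools)["wahlprogramm"])  # found via the reverse scan
```

Because an unresolvable name yields an empty pool rather than an error, a misspelled `fraktion` silently drops every Zitat in its blocks; the per-quote `logger.warning` is the only trace.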
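The removed docstring describes verification as a "substring/4-word-anchor match", but the body of `find_chunk_for_text` is not part of this diff. One plausible reading, sketched here as a hypothetical `find_chunk_anchor`: try the whole normalized text as a substring, otherwise fall back to its first four words. Under that assumption it also shows why a pooled index was risky, since a generic four-word opening can occur in several parties' programmes.

```python
import re
from typing import Optional


def _normalize(s: str) -> str:
    """Lowercase, turn punctuation into spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", s.lower()).split())


def find_chunk_anchor(text: str, chunks: list[dict], n: int = 4) -> Optional[dict]:
    """Hypothetical matcher: full substring first, else an n-word opening anchor.

    Both checks pad with spaces so that only whole words match.
    """
    needle = _normalize(text)
    if not needle:
        return None
    haystacks = [(f" {_normalize(c['text'])} ", c) for c in chunks]
    for hay, chunk in haystacks:
        if f" {needle} " in hay:
            return chunk
    words = needle.split()
    if len(words) < n:
        return None  # too short to form an anchor
    anchor = f" {' '.join(words[:n])} "
    for hay, chunk in haystacks:
        if anchor in hay:
            return chunk
    return None


chunks = [
    {"text": "We want lower taxes for families.", "quelle": "Party B programme"},
    {"text": "We want lower energy prices.", "quelle": "Party A programme"},
]
# A paraphrase that shares only its opening four words with party B's chunk.
print(find_chunk_anchor("We want lower taxes and less bureaucracy", chunks)["quelle"])
```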
```diff
@@ -496,6 +496,31 @@
   margin: 0 0 10px;
 }
 
+/* One news item, max 9 lines: hide the remaining news items and
+   shorten the description text via line-clamp.
+
+   The news JS from v2 renders items into #ad-news-list as child nodes.
+   Heuristic: each direct child = one news item. We show only the
+   first one and clamp it to 9 lines via -webkit-line-clamp. */
+#ad-news-list > * {
+  display: none;
+}
+#ad-news-list > *:first-child {
+  display: block;
+  display: -webkit-box;
+  -webkit-line-clamp: 9;
+  -webkit-box-orient: vertical;
+  line-clamp: 9;
+  overflow: hidden;
+}
+/* Do not clamp the loading placeholder */
+#ad-news-list > .v3-loading:first-child {
+  display: block;
+  -webkit-line-clamp: none;
+  line-clamp: none;
+  overflow: visible;
+}
+
 .v3-comments .v3-comment-label {
   font-family: var(--font-mono);
   font-size: 10px;
```
```diff
@@ -388,6 +388,15 @@
   <a href="/antrag/{{ antrag.drucksache }}">Permalink</a>
 </div>
 
+{# News box (filled by JS): one news item, max 9 lines #}
+<div id="ad-news-box" class="v3-news-box" style="display:none;">
+  <h3 class="v3-h3">Aktuelle News passend zu diesem Antrag</h3>
+  <p class="v3-news-meta">Embedding-Match aus den letzten 90 Tagen. Quelle: Tagesschau-API + Bundestag-RSS.</p>
+  <div id="ad-news-list">
+    <div class="v3-loading">Lade …</div>
+  </div>
+</div>
+
 {# Neu analysieren #}
 <div class="v3-rest-block">
   <button id="v2-reanalyze-btn" onclick="v2DetailReAnalyze(this)" class="v3-action-btn v3-action-muted">
```
```diff
@@ -403,15 +412,6 @@
   </div>
 </div>
 
-{# News box (filled by JS) #}
-<div id="ad-news-box" class="v3-news-box" style="display:none;">
-  <h3 class="v3-h3">Aktuelle News passend zu diesem Antrag</h3>
-  <p class="v3-news-meta">Embedding-Match aus den letzten 90 Tagen. Quelle: Tagesschau-API + Bundestag-RSS.</p>
-  <div id="ad-news-list">
-    <div class="v3-loading">Lade …</div>
-  </div>
-</div>
-
 {# 9i Kommentare #}
 <div class="v3-rest-block v3-comments">
   <h3 class="v3-h3">Kommentare</h3>
```