format_quotes_for_prompt previously rendered each retrieved chunk as just "S. X: text", giving the LLM no way to know which Bundesland (federal state) or Wahlprogramm (election programme) the passage came from. As a result, even when the embedding search correctly returned MV-only chunks, the LLM hallucinated familiar source labels from its training data (typically "FDP NRW Wahlprogramm 2022, S. 75"), because that was its strongest prior for budget/transparency policy citations.

Fix: prepend the fully qualified PROGRAMME[programm_id]["name"] to each quote and explicitly instruct the model to use these labels verbatim.

Discovered while smoke-testing MV after indexing the new MV+BE programmes: embedding retrieval was clean (chunks with similarity ~0.6, all from fdp-mv-2021); only the prompt serialisation was lossy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

| | | |
|---|---|---|
| kontext | ||
| routers | ||
| static/referenzen | ||
| templates | ||
| __init__.py | ||
| analyzer.py | ||
| bundeslaender.py | ||
| config.py | ||
| database.py | ||
| embeddings.py | ||
| main.py | ||
| models.py | ||
| parlamente.py | ||
| report.py | ||
| wahlprogramme.py | ||
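
The fix described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the contents of `PROGRAMME`, the quote keys (`programm_id`, `seite`, `text`), the bracketed label layout, and the wording of `CITATION_INSTRUCTION` are all assumptions made for the example. Only the function name and the `PROGRAMME[programm_id]["name"]` lookup come from the commit message.

```python
# Illustrative stand-in for the project's PROGRAMME registry; the real
# entries and their exact names are not shown in the commit message.
PROGRAMME = {
    "fdp-mv-2021": {"name": "FDP Mecklenburg-Vorpommern Wahlprogramm 2021"},
}

# Hypothetical instruction text; the commit only states that the model is
# told to use the labels verbatim.
CITATION_INSTRUCTION = (
    "Cite sources using exactly the labels shown in front of each quote. "
    "Do not invent or substitute other programme names."
)


def format_quotes_for_prompt(quotes):
    """Serialise retrieved chunks with a fully qualified source label.

    Each quote is assumed to be a dict with the keys "programm_id",
    "seite" (page number) and "text".
    """
    lines = [CITATION_INSTRUCTION, ""]
    for quote in quotes:
        # Fall back to the raw id when the registry has no entry for it.
        entry = PROGRAMME.get(quote["programm_id"], {})
        label = entry.get("name", quote["programm_id"])
        lines.append(f'[{label}, S. {quote["seite"]}] {quote["text"]}')
    return "\n".join(lines)


example = format_quotes_for_prompt(
    [{"programm_id": "fdp-mv-2021", "seite": 12, "text": "Beispieltext."}]
)
print(example)
```

Carrying the programme name on every quote, rather than once per prompt, keeps the label attached to the text even when chunks from several programmes are mixed in one retrieval result.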