691 lines
20 KiB
Markdown
691 lines
20 KiB
Markdown
|
|
# GWÖ-Wahlprüfsteine Auswertung — Nachbau-Anleitung
|
|||
|
|
|
|||
|
|
Diese Anleitung erklärt, wie du das Tool für eigene Wahlprüfstein-Auswertungen nachbauen oder anpassen kannst.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Was macht dieses Tool?
|
|||
|
|
|
|||
|
|
1. **Scrapen:** Lädt Wahlprüfstein-Antworten von einer Webseite
|
|||
|
|
2. **Analysieren:** Bewertet jede Antwort automatisch nach GWÖ-Kriterien (mit KI)
|
|||
|
|
3. **Aggregieren:** Fasst Ergebnisse nach Parteien zusammen
|
|||
|
|
4. **Berichten:** Erstellt Markdown-Reports mit Zitaten und Empfehlungen
|
|||
|
|
|
|||
|
|
**Ergebnis:** Objektive, reproduzierbare Auswertung von Wahlprüfsteinen aus GWÖ-Perspektive.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Voraussetzungen
|
|||
|
|
|
|||
|
|
### Software
|
|||
|
|
- **Python 3.11+** (macOS/Linux: vorinstalliert, Windows: python.org)
|
|||
|
|
- **Git** (optional, für Versionierung)
|
|||
|
|
|
|||
|
|
### API-Zugang
|
|||
|
|
Du brauchst einen LLM-API-Zugang. Optionen:
|
|||
|
|
|
|||
|
|
| Anbieter | Modell | Kosten | Anmerkung |
|
|||
|
|
|----------|--------|--------|-----------|
|
|||
|
|
| **Alibaba DashScope** | qwen-plus | ~$0.001/Call | Empfohlen, günstig |
|
|||
|
|
| **OpenAI** | gpt-4o-mini | ~$0.003/Call | Teurer, aber bekannt |
|
|||
|
|
| **Anthropic** | claude-3-haiku | ~$0.002/Call | Gute Alternative |
|
|||
|
|
|
|||
|
|
**Für DashScope (empfohlen):**
|
|||
|
|
1. Account erstellen: https://dashscope.console.aliyun.com
|
|||
|
|
2. API-Key generieren
|
|||
|
|
3. Key sicher speichern (siehe unten)
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Installation
|
|||
|
|
|
|||
|
|
### 1. Projektordner erstellen
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
mkdir -p ~/projekte/wahlpruefsteine-auswertung
|
|||
|
|
cd ~/projekte/wahlpruefsteine-auswertung
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 2. Python Virtual Environment
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
python3 -m venv .venv
|
|||
|
|
source .venv/bin/activate # Linux/macOS
|
|||
|
|
# oder: .venv\Scripts\activate # Windows
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 3. Abhängigkeiten installieren
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
pip install beautifulsoup4 requests openai
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 4. API-Key konfigurieren
|
|||
|
|
|
|||
|
|
**Option A: Umgebungsvariable (einfach)**
|
|||
|
|
```bash
|
|||
|
|
export DASHSCOPE_API_KEY="sk-xxx..."
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**Option B: macOS Keychain (sicher)**
|
|||
|
|
```bash
|
|||
|
|
security add-generic-password -s qwen-api -a $USER -w "sk-xxx..."
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**Option C: .env-Datei**
|
|||
|
|
```bash
|
|||
|
|
echo "DASHSCOPE_API_KEY=sk-xxx..." > .env
|
|||
|
|
pip install python-dotenv
|
|||
|
|
# Dann in Python: from dotenv import load_dotenv; load_dotenv()
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Dateien erstellen
|
|||
|
|
|
|||
|
|
Erstelle diese 4 Dateien im Projektordner:
|
|||
|
|
|
|||
|
|
### 1. `schema.sql` — Datenbankstruktur
|
|||
|
|
|
|||
|
|
```sql
|
|||
|
|
-- Kandidat:innen
|
|||
|
|
CREATE TABLE IF NOT EXISTS kandidaten (
|
|||
|
|
id INTEGER PRIMARY KEY,
|
|||
|
|
vorname TEXT NOT NULL,
|
|||
|
|
nachname TEXT NOT NULL,
|
|||
|
|
plz TEXT,
|
|||
|
|
kommune TEXT NOT NULL,
|
|||
|
|
landkreis TEXT,
|
|||
|
|
partei_raw TEXT NOT NULL,
|
|||
|
|
partei_normalisiert TEXT NOT NULL,
|
|||
|
|
ist_waehlergemeinschaft BOOLEAN DEFAULT FALSE,
|
|||
|
|
pdf_url TEXT,
|
|||
|
|
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
|
|||
|
|
);
|
|||
|
|
|
|||
|
|
-- Die 6 Fragen (anpassen an deine Wahlprüfsteine!)
|
|||
|
|
CREATE TABLE IF NOT EXISTS fragen (
|
|||
|
|
id INTEGER PRIMARY KEY,
|
|||
|
|
nummer INTEGER UNIQUE NOT NULL,
|
|||
|
|
kurztext TEXT NOT NULL,
|
|||
|
|
volltext TEXT NOT NULL
|
|||
|
|
);
|
|||
|
|
|
|||
|
|
-- Rohantworten
|
|||
|
|
CREATE TABLE IF NOT EXISTS antworten_raw (
|
|||
|
|
id INTEGER PRIMARY KEY,
|
|||
|
|
kandidat_id INTEGER REFERENCES kandidaten(id) ON DELETE CASCADE,
|
|||
|
|
frage_id INTEGER REFERENCES fragen(id) ON DELETE CASCADE,
|
|||
|
|
antwort_kurz TEXT,
|
|||
|
|
antwort_erlaeuterung TEXT,
|
|||
|
|
UNIQUE(kandidat_id, frage_id)
|
|||
|
|
);
|
|||
|
|
|
|||
|
|
-- LLM-Bewertungen
|
|||
|
|
CREATE TABLE IF NOT EXISTS bewertungen (
|
|||
|
|
id INTEGER PRIMARY KEY,
|
|||
|
|
antwort_id INTEGER REFERENCES antworten_raw(id) ON DELETE CASCADE UNIQUE,
|
|||
|
|
substanz_score INTEGER CHECK(substanz_score >= 0 AND substanz_score <= 3),
|
|||
|
|
umfang TEXT CHECK(umfang IN ('keine', 'kurz', 'mittel', 'ausführlich')),
|
|||
|
|
wortanzahl INTEGER,
|
|||
|
|
gwoe_score REAL CHECK(gwoe_score >= 0 AND gwoe_score <= 10),
|
|||
|
|
gwoe_begruendung TEXT,
|
|||
|
|
matrix_felder TEXT, -- JSON-Array
|
|||
|
|
staerken TEXT, -- JSON-Array
|
|||
|
|
schwaechen TEXT, -- JSON-Array
|
|||
|
|
bewertet_am DATETIME DEFAULT CURRENT_TIMESTAMP,
|
|||
|
|
model TEXT DEFAULT 'qwen-plus'
|
|||
|
|
);
|
|||
|
|
|
|||
|
|
-- Stammdaten: Fragen (ANPASSEN!)
|
|||
|
|
INSERT OR IGNORE INTO fragen (nummer, kurztext, volltext) VALUES
|
|||
|
|
(1, 'Leitlinien', 'Werden Sie sich für Maßnahmen einsetzen, welche die Werte und Themen der GWÖ in Leitlinien und Strategien Ihrer Kommune integrieren?'),
|
|||
|
|
(2, 'Anreize', 'Werden Sie sich für Anreize einsetzen, um Unternehmen zu unterstützen, gemeinwohl-orientierter zu wirtschaften?'),
|
|||
|
|
(3, 'Vergabe', 'Werden Sie sich dafür einsetzen, dass öffentliche Aufträge bevorzugt an Unternehmen mit Gemeinwohl-Bilanz vergeben werden?'),
|
|||
|
|
(4, 'Information', 'Möchten Sie Bürger:innen regelmäßig über die Gemeinwohl-Auswirkungen kommunaler Entwicklungen informieren?'),
|
|||
|
|
(5, 'Mitentscheidung', 'Möchten Sie Bürger:innen stärker in kommunale Entscheidungsprozesse einbinden?'),
|
|||
|
|
(6, 'Bekanntheit', 'Möchten Sie die GWÖ in Ihrer Kommune und auf höheren Ebenen bekannter machen?');
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 2. `scraper.py` — Daten laden
|
|||
|
|
|
|||
|
|
```python
|
|||
|
|
#!/usr/bin/env python3
|
|||
|
|
"""
|
|||
|
|
Wahlprüfsteine Scraper — ANPASSEN AN DEINE DATENQUELLE!
|
|||
|
|
"""
|
|||
|
|
|
|||
|
|
import re
|
|||
|
|
import sqlite3
|
|||
|
|
from pathlib import Path
|
|||
|
|
from bs4 import BeautifulSoup
|
|||
|
|
import requests
|
|||
|
|
|
|||
|
|
# Partei-Normalisierung (ANPASSEN für dein Bundesland!)
|
|||
|
|
PARTEI_MAPPING = {
|
|||
|
|
r'bündnis\s*90\s*/?\s*die\s*grünen?': 'Grüne',
|
|||
|
|
r'grüne': 'Grüne',
|
|||
|
|
r'freie\s*wähler': 'Freie Wähler',
|
|||
|
|
r'^csu$': 'CSU',
|
|||
|
|
r'^cdu$': 'CDU',
|
|||
|
|
r'^spd$': 'SPD',
|
|||
|
|
r'^fdp$': 'FDP',
|
|||
|
|
r'^ödp': 'ÖDP',
|
|||
|
|
r'die\s*linke': 'Linke',
|
|||
|
|
r'^afd$': 'AfD',
|
|||
|
|
}
|
|||
|
|
|
|||
|
|
def normalize_partei(raw: str) -> tuple[str, bool]:
|
|||
|
|
"""Normalisiert Parteinamen. Returns: (partei, ist_wählergemeinschaft)"""
|
|||
|
|
raw_lower = raw.lower().strip()
|
|||
|
|
|
|||
|
|
for pattern, normalized in PARTEI_MAPPING.items():
|
|||
|
|
if re.search(pattern, raw_lower, re.IGNORECASE):
|
|||
|
|
return normalized, False
|
|||
|
|
|
|||
|
|
return 'Wählergemeinschaft', True
|
|||
|
|
|
|||
|
|
def parse_html(html_content: str) -> list[dict]:
|
|||
|
|
"""
|
|||
|
|
Parst HTML und extrahiert Kandidaten + Antworten.
|
|||
|
|
|
|||
|
|
WICHTIG: Diese Funktion musst du an deine HTML-Struktur anpassen!
|
|||
|
|
"""
|
|||
|
|
soup = BeautifulSoup(html_content, 'html.parser')
|
|||
|
|
kandidaten = []
|
|||
|
|
|
|||
|
|
# BEISPIEL: Tabelle mit Kandidaten parsen
|
|||
|
|
# ANPASSEN an deine HTML-Struktur!
|
|||
|
|
for table in soup.find_all('table'):
|
|||
|
|
rows = table.find_all('tr')
|
|||
|
|
|
|||
|
|
for row in rows[1:]: # Header überspringen
|
|||
|
|
cells = row.find_all('td')
|
|||
|
|
if len(cells) < 7:
|
|||
|
|
continue
|
|||
|
|
|
|||
|
|
try:
|
|||
|
|
kandidaten.append({
|
|||
|
|
'plz': cells[0].get_text(strip=True),
|
|||
|
|
'kommune': cells[1].get_text(strip=True),
|
|||
|
|
'landkreis': cells[2].get_text(strip=True),
|
|||
|
|
'vorname': cells[3].get_text(strip=True),
|
|||
|
|
'nachname': cells[4].get_text(strip=True),
|
|||
|
|
'partei_raw': cells[5].get_text(strip=True),
|
|||
|
|
'pdf_url': cells[6].find('a')['href'] if cells[6].find('a') else None,
|
|||
|
|
'antworten': {} # Frage-Nr -> (ja_nein, erläuterung)
|
|||
|
|
})
|
|||
|
|
|
|||
|
|
# Antworten extrahieren (6 Fragen × 2 Spalten)
|
|||
|
|
for i in range(6):
|
|||
|
|
ja_nein = cells[7 + i*2].get_text(strip=True) if 7 + i*2 < len(cells) else None
|
|||
|
|
erlaeuterung = cells[8 + i*2].get_text(strip=True) if 8 + i*2 < len(cells) else None
|
|||
|
|
kandidaten[-1]['antworten'][i+1] = (ja_nein, erlaeuterung)
|
|||
|
|
|
|||
|
|
except (IndexError, KeyError):
|
|||
|
|
continue
|
|||
|
|
|
|||
|
|
return kandidaten
|
|||
|
|
|
|||
|
|
def init_db(db_path: Path) -> sqlite3.Connection:
|
|||
|
|
"""Initialisiert die Datenbank."""
|
|||
|
|
conn = sqlite3.connect(db_path)
|
|||
|
|
conn.row_factory = sqlite3.Row
|
|||
|
|
|
|||
|
|
schema_path = Path(__file__).parent / 'schema.sql'
|
|||
|
|
with open(schema_path) as f:
|
|||
|
|
conn.executescript(f.read())
|
|||
|
|
|
|||
|
|
conn.commit()
|
|||
|
|
return conn
|
|||
|
|
|
|||
|
|
def save_to_db(conn: sqlite3.Connection, kandidaten: list[dict]) -> int:
|
|||
|
|
"""Speichert Kandidaten in der Datenbank."""
|
|||
|
|
cursor = conn.cursor()
|
|||
|
|
|
|||
|
|
for k in kandidaten:
|
|||
|
|
partei_norm, ist_wg = normalize_partei(k['partei_raw'])
|
|||
|
|
|
|||
|
|
cursor.execute("""
|
|||
|
|
INSERT OR REPLACE INTO kandidaten
|
|||
|
|
(vorname, nachname, plz, kommune, landkreis, partei_raw,
|
|||
|
|
partei_normalisiert, ist_waehlergemeinschaft, pdf_url)
|
|||
|
|
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
|
|||
|
|
""", (k['vorname'], k['nachname'], k['plz'], k['kommune'],
|
|||
|
|
k['landkreis'], k['partei_raw'], partei_norm, ist_wg, k['pdf_url']))
|
|||
|
|
|
|||
|
|
kandidat_id = cursor.lastrowid
|
|||
|
|
|
|||
|
|
for frage_nr, (ja_nein, erlaeuterung) in k['antworten'].items():
|
|||
|
|
cursor.execute("""
|
|||
|
|
INSERT OR REPLACE INTO antworten_raw
|
|||
|
|
(kandidat_id, frage_id, antwort_kurz, antwort_erlaeuterung)
|
|||
|
|
VALUES (?, ?, ?, ?)
|
|||
|
|
""", (kandidat_id, frage_nr, ja_nein, erlaeuterung))
|
|||
|
|
|
|||
|
|
conn.commit()
|
|||
|
|
return len(kandidaten)
|
|||
|
|
|
|||
|
|
def main():
|
|||
|
|
import argparse
|
|||
|
|
|
|||
|
|
parser = argparse.ArgumentParser(description='Wahlprüfsteine Scraper')
|
|||
|
|
parser.add_argument('--url', required=True, help='URL der Wahlprüfsteine-Seite')
|
|||
|
|
parser.add_argument('--db', default='wahlpruefsteine.db', help='Datenbankpfad')
|
|||
|
|
|
|||
|
|
args = parser.parse_args()
|
|||
|
|
|
|||
|
|
print(f"Lade: {args.url}")
|
|||
|
|
response = requests.get(args.url, timeout=30)
|
|||
|
|
kandidaten = parse_html(response.text)
|
|||
|
|
|
|||
|
|
print(f"Gefunden: {len(kandidaten)} Kandidat:innen")
|
|||
|
|
|
|||
|
|
conn = init_db(Path(args.db))
|
|||
|
|
count = save_to_db(conn, kandidaten)
|
|||
|
|
conn.close()
|
|||
|
|
|
|||
|
|
print(f"Gespeichert: {count} in {args.db}")
|
|||
|
|
|
|||
|
|
if __name__ == '__main__':
|
|||
|
|
main()
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 3. `analyzer.py` — KI-Bewertung
|
|||
|
|
|
|||
|
|
```python
|
|||
|
|
#!/usr/bin/env python3
|
|||
|
|
"""
|
|||
|
|
GWÖ-Bewertung mit LLM
|
|||
|
|
"""
|
|||
|
|
|
|||
|
|
import sqlite3
|
|||
|
|
import json
|
|||
|
|
import os
|
|||
|
|
import time
|
|||
|
|
from pathlib import Path
|
|||
|
|
from openai import OpenAI
|
|||
|
|
|
|||
|
|
# GWÖ-Kontext für das LLM
|
|||
|
|
GWOE_CONTEXT = """
|
|||
|
|
## GWÖ-Matrix 2.0 für Gemeinden
|
|||
|
|
|
|||
|
|
### Die 5 Werte
|
|||
|
|
1. **Menschenwürde** — Grundrechte, Rechtsstaatlichkeit
|
|||
|
|
2. **Solidarität** — Gemeinnutz, Mehrwert für Gemeinschaft
|
|||
|
|
3. **Ökologische Nachhaltigkeit** — Klimaschutz, Ressourcenschonung
|
|||
|
|
4. **Soziale Gerechtigkeit** — Daseinsvorsorge, gerechte Verteilung
|
|||
|
|
5. **Transparenz & Mitbestimmung** — Bürgerbeteiligung, Demokratie
|
|||
|
|
|
|||
|
|
### Berührungsgruppen
|
|||
|
|
- A: Lieferant:innen (Beschaffung)
|
|||
|
|
- B: Finanzpartner:innen (Haushalt)
|
|||
|
|
- C: Verwaltung (Mitarbeitende)
|
|||
|
|
- D: Bürger:innen (Daseinsvorsorge) — WICHTIGSTE für Kommunalpolitik!
|
|||
|
|
- E: Gesellschaft (überregional)
|
|||
|
|
|
|||
|
|
### Bewertungsskala
|
|||
|
|
- 7-10: Vorbildlich (innovative Maßnahmen)
|
|||
|
|
- 4-6: Erfahren (erkennbare Verbesserungen)
|
|||
|
|
- 2-3: Fortgeschritten (erste Maßnahmen)
|
|||
|
|
- 0-1: Basislinie
|
|||
|
|
"""
|
|||
|
|
|
|||
|
|
SYSTEM_PROMPT = f"""Du bist ein GWÖ-Experte und analysierst Wahlprüfstein-Antworten.
|
|||
|
|
|
|||
|
|
{GWOE_CONTEXT}
|
|||
|
|
|
|||
|
|
Bewerte die Antwort und gib NUR valides JSON zurück:
|
|||
|
|
|
|||
|
|
{{
|
|||
|
|
"substanz_score": 0-3, // 0=keine Antwort, 1=ausweichend, 2=substanziell, 3=umfassend
|
|||
|
|
"umfang": "keine|kurz|mittel|ausführlich",
|
|||
|
|
"gwoe_score": 0.0-10.0,
|
|||
|
|
"gwoe_begruendung": "2-3 Sätze",
|
|||
|
|
"matrix_felder": ["D5", "C2"], // Berührte Felder
|
|||
|
|
"staerken": ["Punkt 1"],
|
|||
|
|
"schwaechen": ["Punkt 1"]
|
|||
|
|
}}"""
|
|||
|
|
|
|||
|
|
def get_api_client():
|
|||
|
|
"""Erstellt API-Client (ANPASSEN für deinen Anbieter!)"""
|
|||
|
|
|
|||
|
|
# Option 1: DashScope (Qwen)
|
|||
|
|
api_key = os.environ.get('DASHSCOPE_API_KEY')
|
|||
|
|
if api_key:
|
|||
|
|
return OpenAI(
|
|||
|
|
api_key=api_key,
|
|||
|
|
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
|
|||
|
|
), "qwen-plus"
|
|||
|
|
|
|||
|
|
# Option 2: OpenAI
|
|||
|
|
api_key = os.environ.get('OPENAI_API_KEY')
|
|||
|
|
if api_key:
|
|||
|
|
return OpenAI(api_key=api_key), "gpt-4o-mini"
|
|||
|
|
|
|||
|
|
raise ValueError("Kein API-Key gefunden! Setze DASHSCOPE_API_KEY oder OPENAI_API_KEY")
|
|||
|
|
|
|||
|
|
def analyze_answer(client, model: str, kandidat: str, frage: str, antwort: str) -> dict:
|
|||
|
|
"""Analysiert eine einzelne Antwort."""
|
|||
|
|
|
|||
|
|
user_prompt = f"""## Frage
|
|||
|
|
{frage}
|
|||
|
|
|
|||
|
|
## Antwort von {kandidat}
|
|||
|
|
{antwort if antwort else "(keine Antwort)"}
|
|||
|
|
|
|||
|
|
Bewerte nach GWÖ-Kriterien."""
|
|||
|
|
|
|||
|
|
response = client.chat.completions.create(
|
|||
|
|
model=model,
|
|||
|
|
messages=[
|
|||
|
|
{"role": "system", "content": SYSTEM_PROMPT},
|
|||
|
|
{"role": "user", "content": user_prompt}
|
|||
|
|
],
|
|||
|
|
temperature=0.3,
|
|||
|
|
response_format={"type": "json_object"}
|
|||
|
|
)
|
|||
|
|
|
|||
|
|
return json.loads(response.choices[0].message.content)
|
|||
|
|
|
|||
|
|
def main():
|
|||
|
|
import argparse
|
|||
|
|
|
|||
|
|
parser = argparse.ArgumentParser(description='GWÖ-Analyzer')
|
|||
|
|
parser.add_argument('--db', default='wahlpruefsteine.db')
|
|||
|
|
parser.add_argument('--limit', type=int, help='Max. Anzahl Bewertungen')
|
|||
|
|
|
|||
|
|
args = parser.parse_args()
|
|||
|
|
|
|||
|
|
client, model = get_api_client()
|
|||
|
|
print(f"Verwende Modell: {model}")
|
|||
|
|
|
|||
|
|
conn = sqlite3.connect(args.db)
|
|||
|
|
conn.row_factory = sqlite3.Row
|
|||
|
|
|
|||
|
|
# Unbewertete Antworten holen
|
|||
|
|
query = """
|
|||
|
|
SELECT ar.id, k.vorname || ' ' || k.nachname as kandidat,
|
|||
|
|
f.volltext as frage, ar.antwort_erlaeuterung as antwort
|
|||
|
|
FROM antworten_raw ar
|
|||
|
|
JOIN kandidaten k ON ar.kandidat_id = k.id
|
|||
|
|
JOIN fragen f ON ar.frage_id = f.id
|
|||
|
|
LEFT JOIN bewertungen b ON ar.id = b.antwort_id
|
|||
|
|
WHERE b.id IS NULL
|
|||
|
|
"""
|
|||
|
|
if args.limit:
|
|||
|
|
query += f" LIMIT {args.limit}"
|
|||
|
|
|
|||
|
|
antworten = conn.execute(query).fetchall()
|
|||
|
|
print(f"Zu bewerten: {len(antworten)}")
|
|||
|
|
|
|||
|
|
for i, row in enumerate(antworten, 1):
|
|||
|
|
print(f"[{i}/{len(antworten)}] {row['kandidat']}...", end=" ", flush=True)
|
|||
|
|
|
|||
|
|
try:
|
|||
|
|
result = analyze_answer(client, model, row['kandidat'], row['frage'], row['antwort'])
|
|||
|
|
|
|||
|
|
conn.execute("""
|
|||
|
|
INSERT INTO bewertungen
|
|||
|
|
(antwort_id, substanz_score, umfang, gwoe_score, gwoe_begruendung,
|
|||
|
|
matrix_felder, staerken, schwaechen, model)
|
|||
|
|
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
|
|||
|
|
""", (
|
|||
|
|
row['id'],
|
|||
|
|
result.get('substanz_score'),
|
|||
|
|
result.get('umfang'),
|
|||
|
|
result.get('gwoe_score'),
|
|||
|
|
result.get('gwoe_begruendung'),
|
|||
|
|
json.dumps(result.get('matrix_felder', [])),
|
|||
|
|
json.dumps(result.get('staerken', [])),
|
|||
|
|
json.dumps(result.get('schwaechen', [])),
|
|||
|
|
model
|
|||
|
|
))
|
|||
|
|
conn.commit()
|
|||
|
|
|
|||
|
|
print(f"GWÖ: {result.get('gwoe_score', '?')}")
|
|||
|
|
|
|||
|
|
except Exception as e:
|
|||
|
|
print(f"FEHLER: {e}")
|
|||
|
|
|
|||
|
|
time.sleep(0.5) # Rate Limiting
|
|||
|
|
|
|||
|
|
conn.close()
|
|||
|
|
print("Fertig!")
|
|||
|
|
|
|||
|
|
if __name__ == '__main__':
|
|||
|
|
main()
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 4. `aggregator.py` — Reports erstellen
|
|||
|
|
|
|||
|
|
```python
|
|||
|
|
#!/usr/bin/env python3
|
|||
|
|
"""
|
|||
|
|
Aggregation und Report-Generierung
|
|||
|
|
"""
|
|||
|
|
|
|||
|
|
import sqlite3
|
|||
|
|
import json
|
|||
|
|
from pathlib import Path
|
|||
|
|
from datetime import datetime
|
|||
|
|
|
|||
|
|
def get_partei_stats(conn) -> list[dict]:
|
|||
|
|
"""Statistiken pro Partei."""
|
|||
|
|
cursor = conn.cursor()
|
|||
|
|
|
|||
|
|
cursor.execute("""
|
|||
|
|
SELECT
|
|||
|
|
k.partei_normalisiert AS partei,
|
|||
|
|
COUNT(DISTINCT k.id) AS kandidaten,
|
|||
|
|
COUNT(b.id) AS antworten,
|
|||
|
|
ROUND(AVG(b.substanz_score), 2) AS avg_substanz,
|
|||
|
|
ROUND(AVG(b.gwoe_score), 2) AS avg_gwoe
|
|||
|
|
FROM kandidaten k
|
|||
|
|
LEFT JOIN antworten_raw ar ON k.id = ar.kandidat_id
|
|||
|
|
LEFT JOIN bewertungen b ON ar.id = b.antwort_id
|
|||
|
|
WHERE b.id IS NOT NULL
|
|||
|
|
GROUP BY k.partei_normalisiert
|
|||
|
|
ORDER BY avg_gwoe DESC
|
|||
|
|
""")
|
|||
|
|
|
|||
|
|
return [dict(row) for row in cursor.fetchall()]
|
|||
|
|
|
|||
|
|
def get_top_quotes(conn, partei: str, limit: int = 3) -> list[dict]:
|
|||
|
|
"""Beste Zitate einer Partei."""
|
|||
|
|
cursor = conn.cursor()
|
|||
|
|
|
|||
|
|
cursor.execute("""
|
|||
|
|
SELECT
|
|||
|
|
k.vorname || ' ' || k.nachname AS name,
|
|||
|
|
k.kommune,
|
|||
|
|
f.kurztext AS frage,
|
|||
|
|
ar.antwort_erlaeuterung AS zitat,
|
|||
|
|
b.gwoe_score
|
|||
|
|
FROM bewertungen b
|
|||
|
|
JOIN antworten_raw ar ON b.antwort_id = ar.id
|
|||
|
|
JOIN kandidaten k ON ar.kandidat_id = k.id
|
|||
|
|
JOIN fragen f ON ar.frage_id = f.id
|
|||
|
|
WHERE k.partei_normalisiert = ?
|
|||
|
|
AND ar.antwort_erlaeuterung IS NOT NULL
|
|||
|
|
AND LENGTH(ar.antwort_erlaeuterung) > 50
|
|||
|
|
ORDER BY b.gwoe_score DESC
|
|||
|
|
LIMIT ?
|
|||
|
|
""", (partei, limit))
|
|||
|
|
|
|||
|
|
return [dict(row) for row in cursor.fetchall()]
|
|||
|
|
|
|||
|
|
def generate_report(stats: list[dict], conn) -> str:
|
|||
|
|
"""Generiert Markdown-Report."""
|
|||
|
|
|
|||
|
|
lines = [
|
|||
|
|
"# GWÖ-Wahlprüfsteine Auswertung",
|
|||
|
|
"",
|
|||
|
|
f"*Stand: {datetime.now().strftime('%d.%m.%Y %H:%M')}*",
|
|||
|
|
"",
|
|||
|
|
"## Zusammenfassung",
|
|||
|
|
"",
|
|||
|
|
"| Partei | Kandidat:innen | Ø GWÖ | Ø Substanz |",
|
|||
|
|
"|--------|---------------|-------|------------|",
|
|||
|
|
]
|
|||
|
|
|
|||
|
|
for s in stats:
|
|||
|
|
lines.append(f"| **{s['partei']}** | {s['kandidaten']} | {s['avg_gwoe']:.1f} | {s['avg_substanz']:.1f}/3 |")
|
|||
|
|
|
|||
|
|
lines.extend(["", "---", "", "## Details", ""])
|
|||
|
|
|
|||
|
|
for s in stats:
|
|||
|
|
lines.append(f"### {s['partei']} (Ø {s['avg_gwoe']:.1f})")
|
|||
|
|
lines.append("")
|
|||
|
|
|
|||
|
|
quotes = get_top_quotes(conn, s['partei'])
|
|||
|
|
if quotes:
|
|||
|
|
lines.append("**Beste Antworten:**")
|
|||
|
|
for q in quotes:
|
|||
|
|
zitat = q['zitat'][:200] + "..." if len(q['zitat']) > 200 else q['zitat']
|
|||
|
|
lines.append(f"> *\"{zitat}\"*")
|
|||
|
|
lines.append(f"> — {q['name']} ({q['kommune']}), GWÖ: {q['gwoe_score']:.1f}")
|
|||
|
|
lines.append("")
|
|||
|
|
|
|||
|
|
lines.append("---")
|
|||
|
|
lines.append("")
|
|||
|
|
|
|||
|
|
return "\n".join(lines)
|
|||
|
|
|
|||
|
|
def main():
|
|||
|
|
import argparse
|
|||
|
|
|
|||
|
|
parser = argparse.ArgumentParser()
|
|||
|
|
parser.add_argument('--db', default='wahlpruefsteine.db')
|
|||
|
|
parser.add_argument('--output', default='report.md')
|
|||
|
|
|
|||
|
|
args = parser.parse_args()
|
|||
|
|
|
|||
|
|
conn = sqlite3.connect(args.db)
|
|||
|
|
conn.row_factory = sqlite3.Row
|
|||
|
|
|
|||
|
|
stats = get_partei_stats(conn)
|
|||
|
|
report = generate_report(stats, conn)
|
|||
|
|
|
|||
|
|
Path(args.output).write_text(report)
|
|||
|
|
print(f"Report erstellt: {args.output}")
|
|||
|
|
|
|||
|
|
conn.close()
|
|||
|
|
|
|||
|
|
if __name__ == '__main__':
|
|||
|
|
main()
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Ausführung
|
|||
|
|
|
|||
|
|
### Vollständiger Workflow
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
# 1. Daten laden
|
|||
|
|
python3 scraper.py --url "https://example.org/wahlpruefsteine"
|
|||
|
|
|
|||
|
|
# 2. KI-Bewertung (dauert je nach Anzahl)
|
|||
|
|
python3 analyzer.py
|
|||
|
|
|
|||
|
|
# 3. Report generieren
|
|||
|
|
python3 aggregator.py --output auswertung.md
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### Status prüfen
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
sqlite3 wahlpruefsteine.db "
|
|||
|
|
SELECT
|
|||
|
|
(SELECT COUNT(*) FROM kandidaten) as kandidaten,
|
|||
|
|
(SELECT COUNT(*) FROM antworten_raw) as antworten,
|
|||
|
|
(SELECT COUNT(*) FROM bewertungen) as bewertet;
|
|||
|
|
"
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Anpassungen für andere Datenquellen
|
|||
|
|
|
|||
|
|
### Andere HTML-Struktur
|
|||
|
|
|
|||
|
|
Passe `parse_html()` in `scraper.py` an:
|
|||
|
|
|
|||
|
|
```python
|
|||
|
|
def parse_html(html_content: str) -> list[dict]:
|
|||
|
|
soup = BeautifulSoup(html_content, 'html.parser')
|
|||
|
|
|
|||
|
|
# DEINE LOGIK HIER
|
|||
|
|
# z.B. für andere Tabellenstruktur, div-basierte Layouts, etc.
|
|||
|
|
|
|||
|
|
return kandidaten
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### Andere Fragen
|
|||
|
|
|
|||
|
|
Passe die `INSERT`-Statements in `schema.sql` an:
|
|||
|
|
|
|||
|
|
```sql
|
|||
|
|
INSERT OR IGNORE INTO fragen (nummer, kurztext, volltext) VALUES
|
|||
|
|
(1, 'Deine Frage 1', 'Vollständiger Text...'),
|
|||
|
|
(2, 'Deine Frage 2', 'Vollständiger Text...'),
|
|||
|
|
-- usw.
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### Andere Parteien/Region
|
|||
|
|
|
|||
|
|
Passe `PARTEI_MAPPING` in `scraper.py` an:
|
|||
|
|
|
|||
|
|
```python
|
|||
|
|
PARTEI_MAPPING = {
|
|||
|
|
r'^cdu$': 'CDU', # statt CSU für andere Bundesländer
|
|||
|
|
# deine Parteien...
|
|||
|
|
}
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### Andere LLM-Anbieter
|
|||
|
|
|
|||
|
|
Passe `get_api_client()` in `analyzer.py` an:
|
|||
|
|
|
|||
|
|
```python
|
|||
|
|
# Für Anthropic Claude:
|
|||
|
|
from anthropic import Anthropic
|
|||
|
|
client = Anthropic(api_key=os.environ['ANTHROPIC_API_KEY'])
|
|||
|
|
|
|||
|
|
# Für lokales Ollama:
|
|||
|
|
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Kosten-Übersicht
|
|||
|
|
|
|||
|
|
| Anbieter | Modell | ~Kosten/100 Antworten |
|
|||
|
|
|----------|--------|----------------------|
|
|||
|
|
| DashScope | qwen-plus | $0.06 |
|
|||
|
|
| DashScope | qwen-turbo | $0.02 |
|
|||
|
|
| OpenAI | gpt-4o-mini | $0.30 |
|
|||
|
|
| Anthropic | claude-3-haiku | $0.20 |
|
|||
|
|
| Ollama | lokal | $0.00 |
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Tipps
|
|||
|
|
|
|||
|
|
1. **Erst testen:** Starte mit `--limit 5` beim Analyzer
|
|||
|
|
2. **HTML speichern:** `curl URL > data.html` für Offline-Entwicklung
|
|||
|
|
3. **Logs prüfen:** Bei Fehlern einzelne Antworten manuell checken
|
|||
|
|
4. **Schwellen anpassen:** Die GWÖ-Schwellen (5 für "empfohlen") ggf. senken
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Lizenz
|
|||
|
|
|
|||
|
|
MIT — Frei verwendbar, auch kommerziell.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
*Anleitung erstellt: 29.03.2026*
|
|||
|
|
*Fragen? → GWÖ-Community oder ECOnGOOD*
|