gwoe-antragspruefer/tests/conftest.py

"""Shared pytest fixtures and path setup.

Stubs heavy optional dependencies (``fitz``/PyMuPDF, ``bs4``/BeautifulSoup,
``openai``) so the test suite can run without the full prod requirements
installed. The tests in this directory are pure unit tests over parser
logic and prompt formatters — they neither parse PDFs nor make HTTP
calls, so the stubs are inert placeholders that satisfy the import
machinery but never get exercised.

If a test ever does need real PyMuPDF or httpx integration, mark the test
with ``@pytest.mark.integration`` and skip it by default.
"""
import sys
import types
from pathlib import Path

# Make the `app` package importable when pytest is run from the webapp/ root.
ROOT = Path(__file__).resolve().parent.parent
sys.path.insert(0, str(ROOT))


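def _stub(name: str, **attrs) -> None:
    """Register an empty placeholder module under ``name``.

    Does nothing if a module with that name is already in ``sys.modules``.
    """
    if name in sys.modules:
        return
    mod = types.ModuleType(name)
    for k, v in attrs.items():
        setattr(mod, k, v)
    sys.modules[name] = mod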


_stub("fitz")  # PyMuPDF — used for PDF parsing, not in unit tests
_stub("bs4", BeautifulSoup=lambda *a, **kw: None)  # only needed by NRWAdapter live calls
_stub("openai", OpenAI=lambda **kw: None)  # only needed by embeddings live calls


# pydantic_settings is a small but external dep that's not in the test
# environment. Stub it with a minimal BaseSettings shim so app.config can
# import without crashing — the tests don't actually read settings values.
class _BaseSettingsShim:
    model_config: dict = {}

    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)


def _settings_config_dict(**kwargs):
    return kwargs


_stub("pydantic_settings", BaseSettings=_BaseSettingsShim, SettingsConfigDict=_settings_config_dict)
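The stubbing approach used in this file can be tried in isolation. The following is a minimal sketch, not part of the test suite: `stub_module` mirrors `_stub` but returns a flag for illustration, and `heavy_dep_demo` and `Client` are made-up names that stand in for a real optional dependency.

```python
import sys
import types


def stub_module(name: str, **attrs) -> bool:
    """Register a placeholder module unless `name` is already loaded.

    Returns True if a placeholder was installed, False otherwise.
    """
    if name in sys.modules:
        return False
    mod = types.ModuleType(name)
    for key, value in attrs.items():
        setattr(mod, key, value)
    sys.modules[name] = mod
    return True


# "heavy_dep_demo" is a hypothetical module name used only for this sketch.
installed = stub_module("heavy_dep_demo", Client=lambda **kw: None)

# The import system checks sys.modules first, so this resolves to the
# placeholder and nothing is loaded from disk.
import heavy_dep_demo

client = heavy_dep_demo.Client(api_key="unused")  # inert: returns None

# A second call is a no-op because the name is already registered.
installed_again = stub_module("heavy_dep_demo")
```

Because the check only looks at `sys.modules`, a package that is installed but not yet imported is also replaced by the placeholder. That is harmless for pure unit tests, but worth keeping in mind if integration tests are added later.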
Add pytest suite + fix two regex bugs uncovered by it (#46)

First tests for the codebase. 77 tests, 0.08s runtime. They cover the three bug classes from the April 2026 adapter session and have already uncovered two further bugs in production code.

## Setup

- requirements-dev.txt with pytest + pytest-asyncio
- pytest.ini with asyncio_mode=auto
- tests/conftest.py stubs fitz/bs4/openai/pydantic_settings so the suite runs without the full prod requirements set (pure unit tests, no PDF parsing, no HTTP)

## Tests

- tests/test_parlamente.py (33 tests)
  * PortalaAdapter._parse_hit_list_cards: doctype/doctype_full NameError regression from 1cb030a, plus Title/Drucksache/Fraktion/Datum/PDF extraction against a BE card fixture
  * PortalaAdapter._parse_hit_list_dump: against an LSA Perl dump fixture, including hex-escape decoding (\x{fc} → ü)
  * PortalaAdapter._parse_hit_list_html: auto-detection between card and dump format
  * PortalaAdapter._normalize_fraktion: canonical Fraktion codes, including F.D.P. with dots, BÜNDNIS 90, DIE LINKE, BSW
  * ParLDokAdapter._hit_to_drucksache: JSON hit → Drucksache mapping, including /navpanes stripping, MdL with party in parentheses, Landesregierung detection
  * ParLDokAdapter._fulltext_id: bundle.js mirroring (deferred, but documented)
  * ADAPTERS registry sanity check
- tests/test_embeddings.py (11 tests)
  * _chunk_source_label: program name + page (hallucination-bug regression from 1b5fd96)
  * format_quotes_for_prompt: every chunk must contain the program name, the strict-citation hint must be in the output, no NRW hallucinations for MV/BE chunk sets
- tests/test_wahlprogramme.py (14 tests)
  * Registry structure (jahr int, seiten int, .pdf extension)
  * File existence: every registered PDF must be present in static/referenzen/, which would immediately catch typos in the 22 indexed programs
  * embeddings.PROGRAMME consistency cross-check
- tests/test_bundeslaender.py (15 tests)
  * Sanity checks over the 16-state registry
  * #48 classification regression: TH=ParlDok, HB=StarWeb, SN=Eigensystem
  * Election dates are plausible (between 2026 and 2035)
- tests/test_analyzer.py (4 tests)
  * Markdown code-block stripping from the JSON retry loop

## Bugs found while writing the tests

Two production bugs in the _normalize_fraktion helpers were uncovered immediately by the new tests and fixed in the same commit:

1. PortalaAdapter._normalize_fraktion did not match "F.D.P." (with dots, as in historical SH/HB Drucksachen) because the regex \bFDP\b is too strict. Fix: \bF\.?\s*D\.?\s*P\.?\b, analogous to ParLDokAdapter.
2. ParLDokAdapter._normalize_fraktion (and PortalaAdapter as well) did not match "Ministerium der Finanzen" as Landesregierung, because \bMINISTER\b also requires a word boundary after MINISTER. In MINISTERIUM it is followed by IUM, so there is no word boundary. Fix: \bMINISTER without the trailing \b.

Both bugs would have wrongly left Fraktion fields empty for Drucksachen of the Bremische Bürgerschaft (FDP lists) and for Landesregierung Drucksachen in MV/LSA, which is exactly the "fraktionen=[]" finding from the MV smoke test in #4.

Phase 0 of roadmap issue #49.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 23:26:06 +02:00
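
The two word-boundary bugs described above can be reproduced with a few lines of `re`. This is an illustrative sketch only: the constant names are hypothetical, the relaxed FDP pattern assumes optional whitespace (`\s*`) between the letters, and the input is assumed to be upper-cased before matching, which may differ from what the adapters actually do.

```python
import re

# Strict pattern: only matches the three letters written without dots.
FDP_STRICT = re.compile(r"\bFDP\b")
# Relaxed pattern (assumed form): optional dot and whitespace after each letter.
FDP_RELAXED = re.compile(r"\bF\.?\s*D\.?\s*P\.?\b")

# A trailing \b demands a non-word character (or end of text) after MINISTER.
MINISTER_STRICT = re.compile(r"\bMINISTER\b")
# Without the trailing \b the pattern acts as a prefix match.
MINISTER_PREFIX = re.compile(r"\bMINISTER")

dotted = "ANTRAG DER FRAKTION DER F.D.P. UND DER SPD"
ministry = "Ministerium der Finanzen".upper()

fdp_strict_hit = FDP_STRICT.search(dotted)            # no match
fdp_relaxed_hit = FDP_RELAXED.search(dotted)          # matches the dotted form
minister_strict_hit = MINISTER_STRICT.search(ministry)  # no match: "I" follows
minister_prefix_hit = MINISTER_PREFIX.search(ministry)  # matches the prefix
```

The relaxed FDP pattern still matches the plain spelling `FDP`, so it accepts a superset of what the strict pattern accepted.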
