evalsig.integrations
Optional adapters for the systems people already use. Each submodule imports its target lazily so missing dependencies do not block the package install.
pytest_plugin
Exposes an evalsig_gate fixture that lets you treat a release gate
as a test.
Enable in your conftest.py:
Use in a test body:
def test_no_regression(evalsig_gate):
    a = evalsig_gate.load("baseline.json", format="runframe")
    b = evalsig_gate.load("candidate.json", format="runframe")
    evalsig_gate.assert_no_regression(
        a, b,
        metric="accuracy",
        min_delta=0.005,
        alpha=0.05,
        power=0.80,
        cluster="passage_id",
        method="auto",
    )
On REJECT or INCONCLUSIVE, the fixture raises AssertionError with the
full Markdown report embedded, so pytest's failure message tells the
developer exactly what to do next.
load() accepts format="runframe" | "lm_eval" | "inspect" | "helm" |
"parquet" and forwards any extra kwargs to the matching reader.
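The kwargs forwarding described above can be pictured as a dispatch table keyed by format name. The sketch below is illustrative only: the reader functions, their return values, and the task keyword are hypothetical stand-ins, not evalsig's actual readers.

```python
from typing import Any, Callable, Dict


# Hypothetical stand-in readers; each just records what it was called with.
def _read_runframe(path: str, **kwargs: Any) -> dict:
    return {"reader": "runframe", "path": path, "options": kwargs}


def _read_lm_eval(path: str, **kwargs: Any) -> dict:
    return {"reader": "lm_eval", "path": path, "options": kwargs}


# Map each supported format name to the reader that handles it.
_READERS: Dict[str, Callable[..., dict]] = {
    "runframe": _read_runframe,
    "lm_eval": _read_lm_eval,
}


def load(path: str, format: str = "runframe", **reader_kwargs: Any) -> dict:
    """Look up the reader for `format` and pass extra kwargs straight through."""
    try:
        reader = _READERS[format]
    except KeyError:
        raise ValueError(
            f"unknown format {format!r}; expected one of {sorted(_READERS)}"
        ) from None
    return reader(path, **reader_kwargs)
```

With this shape, a reader-specific option such as load("results.json", format="lm_eval", task="arc") reaches the lm_eval reader untouched, and an unsupported format fails early with a readable error.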
github_action
The Python entry point the published GitHub Action calls. Reads inputs
from environment variables (the actions/core convention), runs the gate,
writes a Markdown summary to $GITHUB_STEP_SUMMARY, emits structured
outputs to $GITHUB_OUTPUT, and exits with the gate's exit code.
You normally do not call this directly. Use the action:
- uses: vtensor/evalsig@v0.1
  with:
    baseline: baseline.json
    candidate: candidate.json
    metric: accuracy
    min_delta: '0.005'
If you want to script it yourself:
INPUT_BASELINE=baseline.json \
INPUT_CANDIDATE=candidate.json \
INPUT_METRIC=accuracy \
INPUT_MIN_DELTA=0.005 \
INPUT_ALPHA=0.05 \
INPUT_POWER=0.80 \
python -m evalsig.integrations.github_action
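The environment-variable protocol described in this section can be sketched in a few lines of Python. This is not the module's actual source: run_gate_stub, the verdict output name, and the exit code are hypothetical placeholders. Only the INPUT_<NAME> variables and the GITHUB_OUTPUT and GITHUB_STEP_SUMMARY file conventions come from GitHub Actions itself.

```python
import os
from typing import Tuple


def read_input(name: str) -> str:
    """Return an action input; the runner exposes input `name` as INPUT_<NAME>."""
    return os.environ[f"INPUT_{name.upper()}"]


def append_to_env_file(env_var: str, text: str) -> None:
    """Append text to the file whose path is stored in env_var (no-op if unset)."""
    path = os.environ.get(env_var)
    if path:
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(text + "\n")


def run_gate_stub(baseline: str, candidate: str, metric: str,
                  min_delta: float) -> Tuple[str, int]:
    """Hypothetical placeholder for the real gate; always reports INCONCLUSIVE."""
    return "INCONCLUSIVE", 1  # the verdict-to-exit-code mapping is an assumption


def main() -> int:
    # Inputs arrive as strings, so numeric ones must be parsed explicitly.
    verdict, exit_code = run_gate_stub(
        read_input("baseline"),
        read_input("candidate"),
        read_input("metric"),
        float(read_input("min_delta")),
    )
    # The step summary is free-form Markdown shown on the workflow run page.
    append_to_env_file("GITHUB_STEP_SUMMARY", f"### evalsig gate: {verdict}")
    # Step outputs are written as name=value lines.
    append_to_env_file("GITHUB_OUTPUT", f"verdict={verdict}")
    return exit_code
```

A real entry point would finish with sys.exit(main()) so that the workflow step fails whenever the gate returns a non-zero code.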
braintrust
Publishes a comparison result to Braintrust as an experiment record.
Imports the braintrust SDK lazily; raises IntegrationError if it
isn't installed.
from evalsig.integrations.braintrust import publish_comparison
publish_comparison(
    baseline, candidate,
    project="my-eval-suite",
    experiment="release-2026-05-16",
    min_delta=0.005,
)
Install with pip install evalsig[braintrust] (when the extra is
available) or pip install braintrust separately.
Writing your own integration
Every integration follows the same pattern:
- Import evalsig primitives at module top level.
- Import the target system's SDK inside the function that needs it.
- Wrap the call in a try/except ImportError that raises IntegrationError with an install hint.
This keeps the package import side-effect-free and lets users pip
install evalsig without dragging in every possible target SDK.
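The three steps above can be sketched as follows. This is a minimal illustration, not code from the package: hypothetical_target_sdk and its publish function are invented names, and IntegrationError is defined locally as a stand-in so the snippet runs on its own. In a real integration it would be imported from evalsig at module top level.

```python
# Step 1: evalsig primitives belong at module top level.
# Local stand-in so this sketch is self-contained (assumption: the real
# IntegrationError is an ordinary exception class).
class IntegrationError(RuntimeError):
    """Raised when an optional integration cannot be used."""


def publish_result(result: dict) -> None:
    """Send a result to a (hypothetical) external system."""
    # Step 2: import the target SDK only when the function is actually called.
    try:
        import hypothetical_target_sdk as sdk  # invented module name
    except ImportError as exc:
        # Step 3: turn the raw ImportError into an actionable message.
        raise IntegrationError(
            "hypothetical_target_sdk is not installed; "
            "try: pip install hypothetical_target_sdk"
        ) from exc
    sdk.publish(result)  # hypothetical SDK call
```

Importing a module written this way never touches the target SDK, so only callers of publish_result see the error, and the message tells them what to install.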
See also
- Usage: integrations for the workflow-level view.
- Scenarios: CI release gate for a full GitHub Actions example.