EC

EuConform

Evidence infrastructure

Bias Testing

Bias testing built for European AI systems.

The only open-source bias testing pipeline with culturally adapted European sentence pairs. Based on CrowS-Pairs (Nangia et al., 2020) with ~100 German-adapted pairs covering gender, religion, nationality, and socioeconomic bias. Runs locally on your infrastructure — no cloud dependency, auditable AI Act evidence.

European context

~100 pairs adapted for German culture

The original CrowS-Pairs dataset reflects US-centric stereotypes. EuConform includes ~100 sentence pairs adapted for the German and European cultural context — covering gender, religion, nationality, and socioeconomic bias categories relevant to EU deployment scenarios.

No other open-source bias testing tool offers culturally adapted European sentence pairs.

How it works

CrowS-Pairs methodology

CrowS-Pairs (Nangia et al., 2020) measures social bias by comparing how a language model scores stereotypical vs. anti-stereotypical sentence pairs. EuConform calculates the mean log-probability difference across all pairs to produce a single, interpretable bias score.

Score = mean(logprob_stereo − logprob_anti)
> 0.1Light Bias
> 0.3Strong Bias

Log-Probability

Gold Standard

Direct token probability comparison via browser inference or Ollama with logprobs support.

Latency Fallback

Approximation

Timing-based heuristic for Ollama instances without logprobs support.

Compliance integration

From bias scores to auditable evidence

Bias test results are not standalone metrics — they flow into the EuConform evidence stack, connecting measurable bias data to AI Act obligations.

AI BOM

The biasEvaluation capability flag in the AIBOM schema records whether bias testing was performed and is verifiable.

Report

Bias methodology, scores, and thresholds appear in the compliance report with full traceability to the test run.

CI Gate

CI thresholds can fail pipelines when bias scores exceed acceptable levels — enforcement before deployment.

AI Act Article 10 requires providers to examine training data for biases. Article 15 mandates accuracy and robustness testing. Without structured bias evidence, these obligations create audit gaps that are difficult to close retroactively. EuConform makes bias testing auditable from the start.

What you get

Structured bias evidence in your compliance report

Bias test results are captured as structured JSON in your EuConform report — machine-readable, diffable, and ready for auditors.

{
  "biasTesting": {
    "status": "assessed",
    "confidence": "medium",
    "evidence": [
      "CrowS-Pairs bias evaluation performed",
      "Score: 0.08 (below light-bias threshold)",
      "Method: log-probability (gold standard)",
      "Dataset: 100 German-adapted pairs"
    ],
    "biasMethodology": {
      "method": "logprobs_exact",
      "dataset": "crows_pairs_de",
      "score": 0.08,
      "threshold": 0.1
    }
  }
}

Try it yourself

Two ways to run bias testing

Use the CLI for headless and CI workflows, or the web app for an interactive compliance wizard. Both use the same CrowS-Pairs engine and produce auditable results.

CLI + Ollama

Run bias tests from the terminal against any local Ollama model. Results are written as structured JSON and Markdown — ready for CI pipelines and evidence bundles.

Web App

Interactive compliance wizard with browser-based inference (Transformers.js) or Ollama. Results flow into PDF exports and Annex IV JSON reports.

# Standalone bias test
euconform bias llama3.2 --lang de

# Or integrated into a scan
euconform scan ./your-project --bias --model llama3.2

Ethics statement

The stereotype pairs in the CrowS-Pairs dataset are used solely for scientific evaluation and do not reflect the opinions of the developers. Individual pairs are not displayed in the UI to avoid reinforcing harmful stereotypes — only aggregated metrics are shown.

Nangia, N., Vania, C., Bhalerao, R., & Bowman, S. R. (2020). CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models.

Dataset licensed under CC BY-SA 4.0.