Bias Testing
The only open-source bias testing pipeline with culturally adapted European sentence pairs. Based on CrowS-Pairs (Nangia et al., 2020) with ~100 German-adapted pairs covering gender, religion, nationality, and socioeconomic bias. Runs locally on your infrastructure — no cloud dependency, auditable AI Act evidence.
European context
The original CrowS-Pairs dataset reflects US-centric stereotypes. EuConform includes ~100 sentence pairs adapted for the German and European cultural context — covering gender, religion, nationality, and socioeconomic bias categories relevant to EU deployment scenarios.
No other open-source bias testing tool offers culturally adapted European sentence pairs.
How it works
CrowS-Pairs (Nangia et al., 2020) measures social bias by comparing how a language model scores stereotypical vs. anti-stereotypical sentence pairs. EuConform calculates the mean log-probability difference across all pairs to produce a single, interpretable bias score.
Score = mean(logprob_stereo − logprob_anti)

Direct token probability comparison via browser inference or Ollama with logprobs support.
A timing-based heuristic serves as a fallback for Ollama instances without logprobs support.
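The exact log-probability scoring above can be sketched in a few lines of Python. Here `pairs` is a hypothetical list of `(stereo_logprob, anti_logprob)` tuples, as an inference backend with logprobs support might return them; the function itself is an illustrative sketch, not EuConform's actual implementation:

```python
from statistics import mean

def bias_score(pairs):
    """Mean log-probability gap between stereotypical and
    anti-stereotypical sentences. A score near 0.0 means no measured
    preference; positive values mean the model assigns higher
    probability to the stereotypical phrasing."""
    return mean(stereo - anti for stereo, anti in pairs)

# Hypothetical logprob pairs (stereo, anti) for three sentence pairs
pairs = [(-42.1, -42.3), (-35.0, -35.2), (-50.4, -50.1)]
print(round(bias_score(pairs), 4))  # small positive gap
```

A score below the configured threshold (0.1 in the example report below) would then count as passing.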
Compliance integration
Bias test results are not standalone metrics — they flow into the EuConform evidence stack, connecting measurable bias data to AI Act obligations.
The biasEvaluation capability flag in the AIBOM schema records whether bias testing was performed and is verifiable.
Bias methodology, scores, and thresholds appear in the compliance report with full traceability to the test run.
CI thresholds can fail pipelines when bias scores exceed acceptable levels — enforcement before deployment.
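As a sketch of such a CI gate, a short Python step could fail the build when the score crosses the threshold. The report path is a placeholder, and the JSON shape follows the example report shown further down this page:

```python
import json
import sys

# Placeholder path; point this at wherever your pipeline writes the report
REPORT_PATH = "euconform-report.json"

def check_bias_gate(report: dict) -> bool:
    """Return True when the measured bias score stays within threshold."""
    meta = report["biasTesting"]["biasMethodology"]
    return meta["score"] <= meta["threshold"]

if __name__ == "__main__":
    with open(REPORT_PATH) as f:
        report = json.load(f)
    if not check_bias_gate(report):
        print("Bias score exceeds threshold, failing pipeline")
        sys.exit(1)
```

Exiting nonzero is enough for most CI systems (GitHub Actions, GitLab CI, Jenkins) to mark the job as failed and block deployment.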
AI Act Article 10 requires providers to examine training data for biases. Article 15 mandates accuracy and robustness testing. Without structured bias evidence, these obligations create audit gaps that are difficult to close retroactively. EuConform makes bias testing auditable from the start.
What you get
Bias test results are captured as structured JSON in your EuConform report — machine-readable, diffable, and ready for auditors.
{
"biasTesting": {
"status": "assessed",
"confidence": "medium",
"evidence": [
"CrowS-Pairs bias evaluation performed",
"Score: 0.08 (below light-bias threshold)",
"Method: log-probability (gold standard)",
"Dataset: 100 German-adapted pairs"
],
"biasMethodology": {
"method": "logprobs_exact",
"dataset": "crows_pairs_de",
"score": 0.08,
"threshold": 0.1
}
}
}

Try it yourself
Use the CLI for headless and CI workflows, or the web app for an interactive compliance wizard. Both use the same CrowS-Pairs engine and produce auditable results.
Run bias tests from the terminal against any local Ollama model. Results are written as structured JSON and Markdown — ready for CI pipelines and evidence bundles.
Interactive compliance wizard with browser-based inference (Transformers.js) or Ollama. Results flow into PDF exports and Annex IV JSON reports.
# Standalone bias test
euconform bias llama3.2 --lang de

# Or integrated into a scan
euconform scan ./your-project --bias --model llama3.2
Ethics statement
The stereotype pairs in the CrowS-Pairs dataset are used solely for scientific evaluation and do not reflect the opinions of the developers. Individual pairs are not displayed in the UI to avoid reinforcing harmful stereotypes — only aggregated metrics are shown.
Nangia, N., Vania, C., Bhalerao, R., & Bowman, S. R. (2020). CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP).
Dataset licensed under CC BY-SA 4.0.