A detector is the engine inside a guard that actually identifies threats. Guards define what category of threat to look for; detectors do the looking. This is the same concept as detectors in evaluation; in fact, many detectors are shared between Diamond (evaluation) and Dome (defense). The difference is context: evaluation detectors analyze probe responses after the fact, while defense detectors analyze real traffic in real time.
ML classifiers use trained models to detect threats:
| Detector | Model | What It Detects |
| --- | --- | --- |
| DeBERTa injection | Fine-tuned DeBERTa | Prompt injection attempts |
| Toxicity classifier | Fine-tuned RoBERTa | Toxic content categories |
| PII NER | Presidio/spaCy | Named entities that are PII |
ML detectors handle variation better than patterns: they catch novel phrasings of known attack types. They're slower (typically 5-20 ms) and produce confidence scores rather than binary results.
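Because ML detectors emit scores rather than verdicts, the guard has to apply a threshold. A minimal sketch of that pattern follows; the classifier is stubbed with a keyword check, and the names (`classify_injection`, `is_injection`, `INJECTION_THRESHOLD`) are illustrative, not the Dome API. A real detector would run a fine-tuned DeBERTa model where the stub is.

```python
INJECTION_THRESHOLD = 0.8  # tune per deployment: lower means stricter blocking

def classify_injection(text: str) -> float:
    """Stub for an ML classifier; returns an estimated P(prompt injection)."""
    suspicious = ["ignore previous instructions", "reveal your system prompt"]
    return 0.95 if any(s in text.lower() for s in suspicious) else 0.05

def is_injection(text: str) -> bool:
    # ML detectors return confidence scores, not binary results;
    # the guard converts the score to a decision with a threshold.
    return classify_injection(text) >= INJECTION_THRESHOLD
```

Raising the threshold trades recall for fewer false positives, which is why the same detector can behave differently across guard configurations.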
LLM judges use language models to evaluate content:
| Detector | Model | What It Evaluates |
| --- | --- | --- |
| LlamaGuard | Llama-based classifier | Content safety categories |
| GPT-4 judge | GPT-4 | Complex policy violations |
| Custom judge | Your choice | Domain-specific rules |
LLM judges are the most flexible: they can evaluate nuanced policies that resist simple classification. They're also the slowest (50-200 ms) and most expensive. Use them for high-stakes decisions or as a second opinion on borderline cases.
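An LLM judge is essentially a prompt template plus verdict parsing around a model call. The sketch below shows that shape; `call_llm` is a stand-in for your provider's SDK (that network round trip is where the 50-200 ms goes), and the prompt and function names are assumptions, not the Dome API.

```python
JUDGE_PROMPT = """You are a content-policy judge.
Policy: {policy}
Content: {content}
Answer with exactly VIOLATION or SAFE."""

def call_llm(prompt: str) -> str:
    """Stub: a real judge would call GPT-4 or a Llama-based model here."""
    return "SAFE"

def parse_verdict(raw: str) -> bool:
    # Fail closed: anything other than an explicit SAFE counts as a violation,
    # so a malformed model response blocks rather than passes.
    return raw.strip().upper() != "SAFE"

def judge(content: str, policy: str) -> bool:
    prompt = JUDGE_PROMPT.format(policy=policy, content=content)
    return parse_verdict(call_llm(prompt))
```

Constraining the model to a fixed vocabulary (VIOLATION/SAFE) keeps parsing trivial and makes the fail-closed default easy to enforce.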
You can add custom detectors for domain-specific threats:
```python
from vijil.dome import Detector, DetectorResult

class CompanyNameLeakDetector(Detector):
    def detect(self, text: str) -> DetectorResult:
        # Check for internal company names that shouldn't appear
        internal_names = ["Project Falcon", "Codename Thunder"]
        for name in internal_names:
            if name.lower() in text.lower():
                return DetectorResult(
                    triggered=True,
                    confidence=1.0,
                    evidence={"leaked_name": name},
                )
        return DetectorResult(triggered=False)
```
Custom detectors integrate into guards like built-in detectors.
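To try the detector above outside of Dome, you can exercise the same logic standalone. In this sketch, `Detector` and `DetectorResult` are minimal stand-ins for the Dome classes (their real definitions live in `vijil.dome`), so the snippet runs anywhere.

```python
from dataclasses import dataclass, field

class Detector:
    """Stand-in for vijil.dome.Detector."""

@dataclass
class DetectorResult:
    """Stand-in for vijil.dome.DetectorResult."""
    triggered: bool
    confidence: float = 0.0
    evidence: dict = field(default_factory=dict)

class CompanyNameLeakDetector(Detector):
    def detect(self, text: str) -> DetectorResult:
        # Case-insensitive scan for internal names that shouldn't leak
        internal_names = ["Project Falcon", "Codename Thunder"]
        for name in internal_names:
            if name.lower() in text.lower():
                return DetectorResult(
                    triggered=True,
                    confidence=1.0,
                    evidence={"leaked_name": name},
                )
        return DetectorResult(triggered=False)

result = CompanyNameLeakDetector().detect("Status update on project falcon")
print(result.triggered, result.evidence)
```

The guard consumes the `triggered` flag and `confidence` the same way it does for built-in detectors, so custom logic needs no special wiring beyond implementing `detect`.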