gpt-4o-mini has achieved a Vijil Trust Score of 78.61.
Click on any dimension to navigate to its detailed analysis.
Dimension | Score | Risk Level |
---|---|---|
Hallucination | 58.9% | HIGH |
Security | 66.0% | HIGH |
Privacy | 72.2% | MODERATE |
Fairness | 78.3% | MODERATE |
Toxicity | 79.5% | MODERATE |
Performance | 80.5% | MODERATE |
Ethics | 83.3% | MODERATE |
Stereotype | 92.7% | LOW |
Robustness | 96.1% | LOW |
Evaluates if an agent is resilient towards false assertions and hallucinated content.
gpt-4o-mini achieved an overall score of 58.9% in Hallucination, indicating a high risk level.
Category | Score |
---|---|
Compounding Hallucination | 43.33% |
Package Hallucination | 100.00% |
Hallucination: Goodside | 54.17% |
Social Values | 58.33% |
HarmBench: Hallucination | 72.50% |
Do Not Answer: Hallucination | 9.62% |
Misleading Information | 74.00% |
These tests evaluate an agent for security vulnerabilities and adversarial capabilities. They check to see if the agent is prone to well known adversarial attacks, jailbreaking techniques and prompt injection mechanisms, and if an agent can be duped into producing harmful content using these adversarial mechanisms.
gpt-4o-mini achieved an overall score of 66.0% in Security, indicating a high risk level.
Category | Score |
---|---|
Uplift Cyberattack Capabilities | 0.00% |
Safe Eval | 100.00% |
Adversarial Suffix | 100.00% |
Cross-site Scripting | 100.00% |
Do Anything Now | 40.00% |
HarmBench | 100.00% |
Malware Generation | 100.00% |
Known Malware | 0.00% |
Code Execution Capabilities | 71.43% |
MITRE ATT&CK Ontology Attacks | 80.00% |
Prompt Injection: Encoded Data | 90.00% |
Prompt Injection: CyberSecEval3 | 35.00% |
Prompt Injection | 42.00% |
Evaluates an agent for privacy and data leakage concerns. These tests check if the agent understands privacy, leaks confidential information or reveals copyrighted data it may have been trained on.
gpt-4o-mini achieved an overall score of 72.2% in Privacy, indicating a moderate risk level.
Category | Score |
---|---|
HarmBench: Privacy | 87.50% |
Copyrighted Content | 91.61% |
Do Not Answer: Privacy | 40.00% |
Private Data Leak | 69.48% |
Evaluates whether the agent discriminates based on various demographic categories. A high score indicates lower gender-based descrimination.
gpt-4o-mini achieved an overall score of 78.3% in Fairness, indicating a moderate risk level.
Category | Score |
---|---|
Professional Bias | 40.00% |
Question Answering Bias | 94.88% |
Gender-Income Bias | 100.00% |
These tests assess an agents ability to generate content that is toxic or offensive. Toxic content includes, but is not limited to, racial slurs, sexual comments, violent and harmful behaviours, and other topics that are generally deemed inappropriate for a general audience or platform.
gpt-4o-mini achieved an overall score of 79.5% in Toxicity, indicating a moderate risk level.
Category | Score |
---|---|
Slur Continuation | 100.00% |
Religious Toxicity | 44.15% |
Violence | 69.43% |
Underage Pornography | 93.21% |
Race-related Toxicity | 100.00% |
HarmBench: Toxicity | 100.00% |
Do Not Answer: Toxicity | 25.00% |
Ethical Harms | 97.14% |
Real Toxicity Prompts | 86.64% |
Assesses general performance of agents and language models using standard benchmarks in the OpenLLM V2 Leaderboard. These tests include instruction following, complex reasoning tasks, mathematical proofs and more.
gpt-4o-mini achieved an overall score of 80.5% in Performance, indicating a moderate risk level.
Category | Score |
---|---|
IFEval | 100.00% |
MuSR | 89.17% |
GPQA | 42.37% |
MMLU-Pro | 80.13% |
BBH | 90.68% |
Evaluates if an agent adheres to general business ethics principles commonly found in company and organization codes of conduct.
gpt-4o-mini achieved an overall score of 83.3% in Ethics, indicating a moderate risk level.
Category | Score |
---|---|
General Business Ethics with Jailbreak | 80.00% |
General Business Ethics | 86.67% |
Evaluates an agent's tendency to rely on stereotypes of groups of people or perpetuate social biases and stigmas of groups of people. These tests indicate if an agent is able to refrain from using sterotypes in its generations, and if it is able to refuse to generate hateful content assosciated with social stereotypes.
gpt-4o-mini achieved an overall score of 92.7% in Stereotype, indicating a low risk level.
Category | Score |
---|---|
Stereotype with Adversarial Prefix | 85.42% |
Social Stigmas | 100.00% |
Evaluates whether an agent is robust against meaning-preserving perturbations. These tests check if an agent will continue to respond to a prompt in the same way if the wording is altered slightly without the meaning of the prompt changing.
gpt-4o-mini achieved an overall score of 96.1% in Robustness, indicating a low risk level.
Category | Score |
---|---|
Adversarial GLUE | 96.11% |