Configuring Guardrails¶
Dome offers a flexible system to secure your applications through configurable guardrails. Here’s how to understand and set them up:
Step 1: Choose and Configure Guardrails¶
Dome provides the following types of guardrails:
Input Guardrails
Output Guardrails
Retrieval Guardrails (coming soon)
Execution Guardrails (coming soon)
The type of guardrail simply indicates where it is supposed to run in an application flow.
To configure a guardrail :
provide a list of guards
specify whether guards should run serially or in parallel (
run-parallel
)specify whether all guards should run, or execution should terminate as soon as one guard is a hit (
early-exit
)
Step 2: Assemble and Configure Guards¶
Guards are groups of detectors of a particular type that make up a guardrail. Guards can be named as you like, and are completely user defined.
Each guard must be one of the following types:
Security
Moderation
Privacy
Integrity
To configure a guard:
specify the type of guard
provide a list of detectors of the same type as the guard
specify whether detectors should run serially or in parallel (
run-parallel
)specify whether all detectors should run, or execution should terminate as soon as one detector is a hit (
early-exit
)
Step 3: Select and Configure Detectors¶
Guards are made up of detectors. To set up your guards:
Choose specific detectors from Dome’s growing library
Configure each detector with settings like thresholds or context-length.
These settings depend on the detector used. Please see the list of detection methods for a comprehensive list of all the detection methods we currently offer and the settings they have.
Example¶
example_config = {
# The input guardrail has two guards - 'prompt-injection' and 'input-toxicity'
"input-guards": ["prompt-injection", "input-toxicity"],
# The output guardrail has one guard - 'output-toxicity'
"output-guards": ["output-toxicity"],
# Here, we set the input guardrail's execution settings. Early-exit and parallel execution are enabled
"input-early-exit": True,
"input-run-parallel": True,
# By not specifying the output guardrail's execution settings, the default values are used. It runs serially, and has early-exit enabled.
# Here, we set up the 'prompt-injection' guard.
"prompt-injection": {
"type": "security", # This type governs what detectors can be used in the guard. Only detectors of the same type can be used
# Here, we specify guard-level execution settings
"early-exit": False, # This guard does not terminate early
"run-parallel": True, # This guard runs its detection methods in parallel
# We specify the detection methods we want to use - "prompt-injection-deberta-v3-base" and "security-llm", both of which are security methods
"methods" : ["prompt-injection-deberta-v3-base", "security-llm"],
# Here, we are configuring detector-specific settings.
# We are using GPT 4 as the LLM in the security-llm detector instead of the default GPT-4 Turbo
"security-llm": {
"model_name": "gpt-4"
}
},
# This sets up the 'input-toxicity' guard.
"input-toxicity":{
"type":"moderation",
# Since we do not specify the guard-level execution settings, the default values are used. It runs serially, and has early-exit enabled.
# We have just one method in this guard, which is the OpenAI moderations API
"methods":["moderations-oai-api"]
},
# This sets up the 'input-toxicity' guard.
"output-toxicity":{
"type":"moderation",
"methods":["moderation-llamaguard"]
},
}
Configuring Dome via a TOML file¶
Dome configurations can also be described via .toml files. This enables users to easily save and edit configurations. Instantiating Dome from a TOML file is identical to instantiating it from a dictionary.
from vijil_dome import Dome
dome_from_toml = Dome(<PATH_TO_TOML_FILE>)
An example of a TOML configuration is shown below, followed by its corresponding dictionary representation.
[guardrail]
input-guards = ["prompt-injection", "input-toxicity"]
output-guards = ["output-toxicity"]
input-early-exit = false
[prompt-injection]
type="security"
early-exit = false
methods = ["prompt-injection-deberta-v3-base", "security-llm"]
[prompt-injection.security-llm]
model_name = "gpt-4o"
[input-toxicity]
type="moderation"
methods = ["moderations-oai-api"]
[output-toxicity]
type="moderation"
methods = ["moderation-llamaguard"]
{
"input-guards": ["prompt-injection", "input-toxicity"],
"output-guards": ["output-toxicity"],
"input-early-exit": False,
"prompt-injection": {
"type": "security",
"early-exit": False,
"methods" : ["prompt-injection-deberta-v3-base", "security-llm"],
"security-llm": {
"model_name": "gpt-4o"
}
},
"input-toxicity":{
"type":"moderation",
"methods":["moderations-oai-api"]
},
"output-toxicity":{
"type":"moderation",
"methods":["moderation-llamaguard"]
},
}
To create a TOML configuration for Dome, ensure that you have a field named guardrail
that lists out the input and output guards, and the parallel execution/early termination settings you need. To specify settings for a method in a guard, use <GUARD_NAME>.<METHOD_NAME>
.
Note
When specifying booleans in the TOML, please ensure to use lowercase values (i.e, ‘true’ and ‘false’).
Overall Config Structure¶
The guardrail
field describes the high level structure of a Dome configuration.
Attribute |
Type |
Description |
---|---|---|
|
List |
user-defined list of guard modules used to scan an input |
|
Boolean |
determines if the input guardrail runs in “early-exit” mode, i.e, stop execution as soon as one of the guards in the input guardrail has flagged the input. (Default is True) |
|
Boolean |
determines whether guards in the input guardrail are executed in parallel. (Default is False) |
|
List |
user-defined list of guard modules used to scan an output |
|
Boolean |
determines if the output guardrail runs in “early-exit” mode, i.e, stop execution as soon as one of the guards in the output guardrail has flagged the input. (Default is True) |
|
Boolean |
determines whether guards in the output guardrail are executed in parallel. (Default is False) |
Inside a guardrail specification, the configuration of each guard module can be specified by creating a new field with the module name.
Each guard
module has four attributes.
Attribute |
Type |
Description |
---|---|---|
|
Enum |
user-defined type of the guard module, specifying the category of guards that it holds. Every guard module can be exactly one of the four currently supported types: |
|
List |
describes which methods are to be used in the guard module. Possible methods depend on the type selected, and can be chosen from the list of methods under that type. |
|
Boolean |
determines if the guard runs in “early-exit” mode, i.e, stop execution as soon as one of the methods in guard has the flagged the query |
|
Boolean |
determines whether methods in the guard are executed in parallel. (Default is False) |
Finally, to customize a configuration at the method-level, you need to create a new field with the name <MODULE-NAME>.<METHOD-NAME>
. In the example above, prompt-injection.security-llm
describes the configuration settings for the security-llm
method of the prompt-injection
guard module. This configuration scheme lets users create custom settings for different groups and use the same method with different settings in a guardrail. See the next section for detailed instructions on the options available for each method.