Guardrails scan agent inputs and outputs to enforce safety and policy boundaries. Idun Agent Platform provides 15 built-in guardrail types powered by Guardrails AI, applied at the input position, output position, or both.

How guardrails work

Guardrails run at two positions in the agent request lifecycle:
  • Input guardrails validate user messages before the agent processes them. If any input guardrail fails, the request is blocked immediately and the agent never sees the message.
  • Output guardrails validate agent responses before they are returned to the user. They run after agent processing completes, so they add latency to the response.
You can configure multiple guardrails at each position. All guardrails at a given position are checked, and any single failure blocks the request or response.
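For example, an agent can carry guardrails at both positions at once. A minimal sketch in the config.yaml format used later on this page (the guard choices and threshold value are illustrative):

guardrails:
  input:
    - config_id: "detect_pii"
      pii_entities: ["EMAIL_ADDRESS"]
  output:
    - config_id: "toxic_language"
      threshold: 0.5

Here a request is blocked before the agent runs if it contains an email address, and a response is blocked after the agent runs if it scores above the toxicity threshold.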

Configuration

1

Browse the guardrail catalog

Navigate to Guardrails in the sidebar. The page groups guards by category: Content Safety, Identity & Security, Enterprise, and Context & Quality. Each guard type shows as a card with a "+" button.
2

Create a guardrail

Click + on the guard type you want (e.g., Ban List, Detect PII, Toxic Language). Fill in the configuration form. A Guardrails AI API key is required. Get one from hub.guardrailsai.com.
3

Attach to an agent

The guardrail appears as a card showing “Used by N agents.” Attach it to agents from the agent detail Overview tab.
Some guards are marked “Soon” and not yet available: Code Scanner, Jailbreak, Prompt Injection, Model Armor, Custom LLM, and RAG Hallucination.
Guardrails require the GUARDRAILS_API_KEY environment variable. Get your API key from Guardrails AI. Set it in your .env file or in the service environment for Manager deployments.
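As a sketch, setting the variable for a local shell session looks like this (the key value is a placeholder; in a .env file the same line works with or without the export keyword, depending on your dotenv loader):

```shell
# Set the Guardrails AI API key for the current session.
# In a .env file: GUARDRAILS_API_KEY=your-key-here
export GUARDRAILS_API_KEY="your-key-here"

# Confirm the variable is visible to child processes.
echo "$GUARDRAILS_API_KEY"
```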

Available guardrail types

All 15 guardrail types and their key parameters:
  • ban_list: Block specific words or phrases. Key parameters: banned_words (list of strings)
  • detect_pii: Detect personally identifiable information (emails, phone numbers, addresses). Key parameters: pii_entities (list of PII types)
  • nsfw_text: Block sexually explicit or violent content. Key parameters: threshold (0.0 to 1.0)
  • toxic_language: Detect toxic or offensive language. Key parameters: threshold (0.0 to 1.0)
  • detect_jailbreak: Identify attempts to bypass safety guidelines. Key parameters: threshold (0.0 to 1.0)
  • prompt_injection: Detect prompt injection attacks. Key parameters: threshold (0.0 to 1.0)
  • competition_check: Block mentions of competitor names or products. Key parameters: competitors (list of strings)
  • bias_check: Detect biased language. Key parameters: threshold (0.0 to 1.0)
  • correct_language: Verify text is written in expected languages. Key parameters: expected_languages (ISO codes, e.g. ["en", "fr"])
  • restrict_to_topic: Keep conversation within defined subject areas. Key parameters: topics (list of allowed topics)
  • gibberish_text: Filter nonsensical or incoherent output. Key parameters: threshold (0.0 to 1.0)
  • rag_hallucination: Detect hallucinated content in RAG responses. Key parameters: threshold (0.0 to 1.0)
  • code_scanner: Validate code blocks for allowed programming languages. Key parameters: allowed_languages (list of language names)
  • model_armor: Google Cloud Model Armor integration. Key parameters: project_id, location, template_id
  • custom_llm: Define custom validation rules using an LLM. Key parameters: model, prompt
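As a sketch, the list-valued parameters above appear as YAML lists in the config-file format shown later on this page (the topic and language values here are illustrative):

guardrails:
  input:
    - config_id: "restrict_to_topic"
      topics: ["billing", "account support"]
    - config_id: "correct_language"
      expected_languages: ["en", "fr"]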

Adding guardrails through the Manager UI

You can also configure guardrails when creating or editing an agent in the Agent Manager:
1

Open agent creation or edit

Navigate to your agent in the Manager UI and go to the guardrails step.
2

Select guardrail type

Choose a guardrail type from the dropdown menu (for example, “Ban List” or “PII Detector”).
3

Configure parameters

Fill in the guard-specific parameters. For ban list, enter the words to block. For PII detection, select which PII entity types to detect.
4

Set position

Choose whether the guardrail applies to input, output, or both.
5

Save and wait for installation

Save the agent configuration. The engine downloads and initializes the guardrail validators from Guardrails AI. Wait for installation to complete before testing.

Adding guardrails through config file

For standalone deployments using local config, add guardrails directly to your config.yaml:
config.yaml
guardrails:
  input:
    - config_id: "ban_list"
      banned_words: ["competitor-product", "internal-codename"]
    - config_id: "detect_pii"
      pii_entities: ["EMAIL_ADDRESS", "PHONE_NUMBER"]
Each guardrail entry supports an optional reject_message field to customize the error message returned when the guardrail triggers:
guardrails:
  input:
    - config_id: "ban_list"
      banned_words: ["blocked-term"]
      reject_message: "Your message contains a restricted term."
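Output guardrails use the same entry format under the output key. A sketch, assuming reject_message behaves identically at both positions (the threshold value is illustrative):

guardrails:
  output:
    - config_id: "toxic_language"
      threshold: 0.5
      reject_message: "The response was blocked by a content filter."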

Testing guardrails

After configuring guardrails, verify they work as expected by sending test requests through the API.
curl -X POST http://localhost:8008/v1/agents/{agent_id}/query \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer {api_key}" \
  -d '{"message": "My email is john.doe@example.com and phone is 555-0123"}'
When a guardrail blocks a request, the response includes the guardrail field identifying which guard triggered and a detail message explaining why.
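A blocked request might return a payload along these lines. The exact shape and status code are illustrative; only the guardrail and detail fields are documented above:

{
  "guardrail": "detect_pii",
  "detail": "PII detected in input: EMAIL_ADDRESS, PHONE_NUMBER"
}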

Best practices

  • Layer multiple guardrails at the input position for defense in depth. Combine ban lists with PII detection and jailbreak prevention.
  • Use output guardrails sparingly since they add latency. Reserve them for critical checks like hallucination detection or gibberish filtering.
  • Set thresholds conservatively at first (higher values = stricter), then lower them if you see too many false positives.
  • Test with realistic inputs before production. Send messages that should trigger each guardrail and verify legitimate content passes through.
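The first two recommendations can be sketched as a single configuration: several cheap checks layered at the input position, and one reserved check at the output position (guard choices and threshold are illustrative):

guardrails:
  input:
    - config_id: "ban_list"
      banned_words: ["internal-codename"]
    - config_id: "detect_pii"
      pii_entities: ["EMAIL_ADDRESS", "PHONE_NUMBER"]
  output:
    - config_id: "gibberish_text"
      threshold: 0.8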

Last modified on March 22, 2026