Already have a cluster and AWS resources? Jump to Step 2: IAM and EKS Add-ons and then Step 3: Helm Values and Secrets.

Prerequisites

Before you start, make sure you have the following tools installed and configured:
| Requirement | Details |
| --- | --- |
| AWS CLI | Configured and working — `aws sts get-caller-identity` should succeed |
| eksctl | ≥ 0.160 |
| kubectl | ≥ 1.28 (keep client within one minor version of the server) |
| helm | ≥ 3.8 |
| Domain + DNS | A registered domain with a Route 53 hosted zone (or another DNS provider) |
| Container images | Access to Vijil’s ECR, or vijil-console and vijil-console-frontend built and pushed to your own registry |
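The version floors above can be checked with a small script. This is a hedged sketch: version_ge relies on GNU sort -V, and the eksctl version-extraction grep is best-effort and may need adjusting for your tool's output format.

```shell
# Compare dotted version strings using sort -V: succeeds when HAVE >= WANT
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Example: check eksctl against the 0.160 floor (skip silently if not installed)
if command -v eksctl >/dev/null 2>&1; then
  have=$(eksctl version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n1)
  if version_ge "${have:-0}" 0.160; then
    echo "eksctl $have OK"
  else
    echo "eksctl too old: ${have:-unknown} (need >= 0.160)"
  fi
fi
```

The same pattern works for kubectl and helm; only the version-extraction command changes per tool.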

Architecture Overview

Diagram: Vijil Console architecture showing VPC, subnets, NLB, EKS, and RDS. All application workloads and the database live in private subnets. Only a single Network Load Balancer (NLB), fronting the unified nginx router, sits in the public subnets; both console.* and console-api.* resolve to this one NLB, and nginx splits frontend and API traffic internally. EKS nodes reach S3 and ECR via the NAT Gateway.

Step 1: AWS Infrastructure

VPC

Create a VPC with two public and two private subnets across two Availability Zones. The subnet tags are required. EKS uses them to discover subnets, and the AWS Load Balancer Controller uses them to place NLBs correctly.
export AWS_REGION=us-west-2
export VPC_NAME=vijil-prod

# Create VPC
VPC_ID=$(aws ec2 create-vpc \
  --cidr-block 10.0.0.0/16 \
  --region $AWS_REGION \
  --query 'Vpc.VpcId' --output text)

aws ec2 create-tags --resources $VPC_ID \
  --tags Key=Name,Value=$VPC_NAME

# Enable DNS hostnames (required for EKS)
aws ec2 modify-vpc-attribute --vpc-id $VPC_ID --enable-dns-hostnames

# Internet Gateway
IGW_ID=$(aws ec2 create-internet-gateway \
  --query 'InternetGateway.InternetGatewayId' --output text)
aws ec2 attach-internet-gateway --vpc-id $VPC_ID --internet-gateway-id $IGW_ID
Public subnets: the NLB goes here. Each must carry the kubernetes.io/role/elb=1 tag:
# AZ suffixes a/b are appended to $AWS_REGION below; adjust if those AZs are unavailable
PUBLIC_SUBNET_1=$(aws ec2 create-subnet \
  --vpc-id $VPC_ID --cidr-block 10.0.1.0/24 \
  --availability-zone ${AWS_REGION}a \
  --query 'Subnet.SubnetId' --output text)

PUBLIC_SUBNET_2=$(aws ec2 create-subnet \
  --vpc-id $VPC_ID --cidr-block 10.0.2.0/24 \
  --availability-zone ${AWS_REGION}b \
  --query 'Subnet.SubnetId' --output text)

for SUBNET in $PUBLIC_SUBNET_1 $PUBLIC_SUBNET_2; do
  aws ec2 create-tags --resources $SUBNET --tags \
    Key=Name,Value="$VPC_NAME-public" \
    Key=kubernetes.io/role/elb,Value=1
  aws ec2 modify-subnet-attribute --subnet-id $SUBNET \
    --map-public-ip-on-launch
done

# Route table: public subnets → Internet Gateway
PUBLIC_RT=$(aws ec2 create-route-table --vpc-id $VPC_ID \
  --query 'RouteTable.RouteTableId' --output text)
aws ec2 create-route --route-table-id $PUBLIC_RT \
  --destination-cidr-block 0.0.0.0/0 --gateway-id $IGW_ID
for SUBNET in $PUBLIC_SUBNET_1 $PUBLIC_SUBNET_2; do
  aws ec2 associate-route-table --subnet-id $SUBNET --route-table-id $PUBLIC_RT
done
Private subnets: EKS nodes and RDS go here. Each must carry the kubernetes.io/role/internal-elb=1 tag:
PRIVATE_SUBNET_1=$(aws ec2 create-subnet \
  --vpc-id $VPC_ID --cidr-block 10.0.3.0/24 \
  --availability-zone ${AWS_REGION}a \
  --query 'Subnet.SubnetId' --output text)

PRIVATE_SUBNET_2=$(aws ec2 create-subnet \
  --vpc-id $VPC_ID --cidr-block 10.0.4.0/24 \
  --availability-zone ${AWS_REGION}b \
  --query 'Subnet.SubnetId' --output text)

for SUBNET in $PRIVATE_SUBNET_1 $PRIVATE_SUBNET_2; do
  aws ec2 create-tags --resources $SUBNET --tags \
    Key=Name,Value="$VPC_NAME-private" \
    Key=kubernetes.io/role/internal-elb,Value=1
done

# NAT Gateway in the first public subnet (lets private nodes reach ECR and S3)
EIP=$(aws ec2 allocate-address --domain vpc \
  --query 'AllocationId' --output text)
NAT_GW=$(aws ec2 create-nat-gateway \
  --subnet-id $PUBLIC_SUBNET_1 --allocation-id $EIP \
  --query 'NatGateway.NatGatewayId' --output text)
echo "Waiting for NAT Gateway..."
aws ec2 wait nat-gateway-available --nat-gateway-ids $NAT_GW

# Route table: private subnets → NAT Gateway
PRIVATE_RT=$(aws ec2 create-route-table --vpc-id $VPC_ID \
  --query 'RouteTable.RouteTableId' --output text)
aws ec2 create-route --route-table-id $PRIVATE_RT \
  --destination-cidr-block 0.0.0.0/0 --nat-gateway-id $NAT_GW
for SUBNET in $PRIVATE_SUBNET_1 $PRIVATE_SUBNET_2; do
  aws ec2 associate-route-table --subnet-id $SUBNET --route-table-id $PRIVATE_RT
done
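Before creating the subnets, a quick local sanity check (pure bash, no AWS calls; a sketch, not part of the official setup) can confirm each /24 actually falls inside the VPC CIDR:

```shell
# Convert dotted-quad IPv4 to a 32-bit integer
ip2int() { local IFS=.; set -- $1; echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 )); }

# in_cidr SUBNET_CIDR VPC_CIDR: succeed if the subnet is fully inside the VPC range
in_cidr() {
  local sub=${1%/*} sbits=${1#*/} net=${2%/*} nbits=${2#*/}
  [ "$sbits" -ge "$nbits" ] || return 1                        # subnet must be narrower
  local mask=$(( (0xFFFFFFFF << (32 - nbits)) & 0xFFFFFFFF ))
  [ $(( $(ip2int "$sub") & mask )) -eq $(( $(ip2int "$net") & mask )) ]
}

for cidr in 10.0.1.0/24 10.0.2.0/24 10.0.3.0/24 10.0.4.0/24; do
  in_cidr "$cidr" 10.0.0.0/16 && echo "OK  $cidr" || echo "BAD $cidr (outside 10.0.0.0/16)"
done
```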

EKS Cluster

export CLUSTER_NAME=vijil-prod
export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)

eksctl create cluster \
  --name $CLUSTER_NAME \
  --region $AWS_REGION \
  --version 1.30 \
  --vpc-private-subnets=$PRIVATE_SUBNET_1,$PRIVATE_SUBNET_2 \
  --vpc-public-subnets=$PUBLIC_SUBNET_1,$PUBLIC_SUBNET_2 \
  --with-oidc \
  --nodegroup-name standard-workers \
  --node-type m5.large \
  --nodes 2 \
  --nodes-min 2 \
  --nodes-max 4 \
  --managed

# Verify
kubectl get nodes
--with-oidc is required: it enables the OIDC provider used by the EBS CSI driver’s IRSA configuration. Do not omit it.
Node sizing: m5.large is a reasonable starting point. If pods are OOMKilled or evicted for resources, scale up to r5.xlarge (what the dev environment currently uses).

RDS PostgreSQL

# Security group: allow port 5432 from EKS nodes only
RDS_SG=$(aws ec2 create-security-group \
  --group-name vijil-rds-sg \
  --description "RDS access from EKS nodes" \
  --vpc-id $VPC_ID \
  --query 'GroupId' --output text)

EKS_NODE_SG=$(aws eks describe-cluster --name $CLUSTER_NAME \
  --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId' --output text)

aws ec2 authorize-security-group-ingress \
  --group-id $RDS_SG \
  --protocol tcp --port 5432 \
  --source-group $EKS_NODE_SG

# Subnet group using private subnets
aws rds create-db-subnet-group \
  --db-subnet-group-name vijil-prod-subnets \
  --db-subnet-group-description "Vijil Console RDS subnets" \
  --subnet-ids $PRIVATE_SUBNET_1 $PRIVATE_SUBNET_2

# Custom parameter group with SSL disabled
# PostgreSQL 15+ defaults to rds.force_ssl=1; this lets the app connect without SSL
# for in-VPC-only traffic. Remove this if your app connection string uses SSL.
aws rds create-db-parameter-group \
  --db-parameter-group-name vijil-pg15-no-ssl \
  --db-parameter-group-family postgres15 \
  --description "Vijil Console RDS — SSL disabled for in-VPC clients"

aws rds modify-db-parameter-group \
  --db-parameter-group-name vijil-pg15-no-ssl \
  --parameters "ParameterName=rds.force_ssl,ParameterValue=0,ApplyMethod=pending-reboot"

# Create RDS instance
aws rds create-db-instance \
  --db-instance-identifier vijil-prod-pg \
  --db-instance-class db.t3.medium \
  --engine postgres \
  --engine-version 15 \
  --master-username postgres \
  --master-user-password YOUR_STRONG_PASSWORD \
  --db-name postgres \
  --allocated-storage 20 \
  --storage-type gp3 \
  --no-publicly-accessible \
  --vpc-security-group-ids $RDS_SG \
  --db-subnet-group-name vijil-prod-subnets \
  --db-parameter-group-name vijil-pg15-no-ssl \
  --backup-retention-period 7

# Wait for it to be available (takes a few minutes), then get the endpoint
aws rds wait db-instance-available --db-instance-identifier vijil-prod-pg
RDS_ENDPOINT=$(aws rds describe-db-instances \
  --db-instance-identifier vijil-prod-pg \
  --query 'DBInstances[0].Endpoint.Address' --output text)
echo "RDS endpoint: $RDS_ENDPOINT"
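The instance is not publicly accessible, so reachability must be tested from inside the VPC (for example, from an EKS node or a debug pod). A minimal TCP probe using bash’s /dev/tcp redirection — probe is a hypothetical helper name, not part of the official setup:

```shell
# probe HOST PORT: print OPEN if a TCP connection succeeds within 3s, else CLOSED
probe() {
  if timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo OPEN
  else
    echo CLOSED
  fi
}

# Run from inside the VPC; expect OPEN once the security group rule is in place
if [ -n "${RDS_ENDPOINT:-}" ]; then
  probe "$RDS_ENDPOINT" 5432
fi
```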

S3 Bucket

export S3_BUCKET=vijil-console-data-prod

aws s3api create-bucket \
  --bucket $S3_BUCKET \
  --region $AWS_REGION \
  --create-bucket-configuration LocationConstraint=$AWS_REGION

# Block all public access
aws s3api put-public-access-block \
  --bucket $S3_BUCKET \
  --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

ACM Certificate

Request a wildcard certificate; a single cert covers both the console. and console-api. subdomains:
CERT_ARN=$(aws acm request-certificate \
  --domain-name "*.yourdomain.com" \
  --validation-method DNS \
  --region $AWS_REGION \
  --query 'CertificateArn' --output text)

echo "Certificate ARN: $CERT_ARN"
# Add the CNAME record from the ACM console to Route 53 to validate:
# aws acm describe-certificate --certificate-arn $CERT_ARN
Wait for ISSUED status before proceeding:
aws acm wait certificate-validated --certificate-arn $CERT_ARN

Step 2: IAM and EKS Add-ons

IAM Policy for S3

Create a scoped S3 policy for your app data bucket. If you’re deploying to multiple environments, use distinct policy names (e.g. VijilConsoleS3Access for prod, VijilConsoleS3AccessStaging for staging) to avoid conflicts.
export S3_BUCKET=vijil-console-data-prod  # adjust per environment

cat > /tmp/vijil-s3-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::${S3_BUCKET}",
        "arn:aws:s3:::${S3_BUCKET}/*"
      ]
    }
  ]
}
EOF

S3_POLICY_ARN=$(aws iam create-policy \
  --policy-name VijilConsoleS3Access \
  --policy-document file:///tmp/vijil-s3-policy.json \
  --query 'Policy.Arn' --output text)

echo "S3 policy ARN: $S3_POLICY_ARN"

S3 Access for Pods

There are two supported approaches: EKS Pod Identity (Option A) and the node IAM role (Option B).

Option A: EKS Pod Identity (recommended)

EKS Pod Identity is recommended: it is what the current dev environment uses, and it scopes credentials to specific service accounts rather than to all pods on a node. Pod Identity is the modern replacement for IRSA. Pods receive AWS credentials from http://169.254.170.23/v1/credentials, injected automatically by the Pod Identity Agent; the AWS SDK picks this up with no code changes.
# Step 1: Install the EKS Pod Identity Agent add-on
aws eks create-addon \
  --cluster-name $CLUSTER_NAME \
  --addon-name eks-pod-identity-agent \
  --region $AWS_REGION

kubectl -n kube-system rollout status daemonset/eks-pod-identity-agent

# Step 2: Create an IAM role for Pod Identity to assume
cat > /tmp/pod-identity-trust.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "pods.eks.amazonaws.com" },
      "Action": ["sts:AssumeRole", "sts:TagSession"]
    }
  ]
}
EOF

S3_ROLE_ARN=$(aws iam create-role \
  --role-name VijilConsoleS3Role \
  --assume-role-policy-document file:///tmp/pod-identity-trust.json \
  --query 'Role.Arn' --output text)

aws iam attach-role-policy \
  --role-name VijilConsoleS3Role \
  --policy-arn $S3_POLICY_ARN

# Step 3: Associate the role with the service accounts that need AWS access
# `default` covers most app pods; the other two are dedicated SAs created by the chart
for SA in default diamond-service redteam-service; do
  aws eks create-pod-identity-association \
    --cluster-name $CLUSTER_NAME \
    --namespace vijil-console \
    --service-account $SA \
    --role-arn $S3_ROLE_ARN \
    --region $AWS_REGION
done

echo "Pod Identity associations created for vijil-console/default, diamond-service, and redteam-service."

Option B: Node IAM Role (simpler, broader scope)

Attach the S3 policy directly to the EKS node group role. All pods on every node in the cluster inherit these permissions; this is simpler to set up but less isolated.
NODE_ROLE=$(aws eks describe-nodegroup \
  --cluster-name $CLUSTER_NAME \
  --nodegroup-name standard-workers \
  --query 'nodegroup.nodeRole' --output text | awk -F/ '{print $NF}')

aws iam attach-role-policy \
  --role-name $NODE_ROLE \
  --policy-arn $S3_POLICY_ARN

EBS CSI Driver

The telemetry stack (Grafana, Loki, Mimir, Tempo) uses PersistentVolumeClaims backed by EBS. The driver must be installed before helm install. Install the driver via Helm first (this creates the ebs-csi-controller-sa service account that the IRSA script needs), then run the IRSA setup script:
# Step 1: Install the driver
helm repo add aws-ebs-csi-driver https://kubernetes-sigs.github.io/aws-ebs-csi-driver
helm repo update
helm install aws-ebs-csi-driver aws-ebs-csi-driver/aws-ebs-csi-driver \
  --namespace kube-system

# Step 2: Configure IRSA (from the repo root)
export CLUSTER_NAME=$CLUSTER_NAME
export REGION=$AWS_REGION
export ACCOUNT_ID=$ACCOUNT_ID
./scripts/setup-ebs-csi-driver.sh

# Step 3: Restart the controller to pick up the IAM role
kubectl rollout restart deployment ebs-csi-controller -n kube-system

# Verify both controller replicas are running (expect 5/5)
kubectl get pods -n kube-system -l app=ebs-csi-controller
Alternatively, use the EKS managed add-on: aws eks create-addon --cluster-name $CLUSTER_NAME --addon-name aws-ebs-csi-driver
The EBS CSI driver deploys two controller replicas across AZs; verify both show 5/5 Running before proceeding.
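The readiness check can be scripted instead of eyeballed. A small awk helper (assuming the standard kubectl get pods column layout) fails whenever any pod’s READY column is not m/m:

```shell
# all_ready: read `kubectl get pods` output on stdin; fail if any pod is not m/m ready
all_ready() {
  awk 'NR > 1 {
         split($2, r, "/")
         if (r[1] != r[2]) { print "NOT READY: " $1; bad = 1 }
       }
       END { exit bad }'
}

if command -v kubectl >/dev/null 2>&1; then
  kubectl get pods -n kube-system -l app=ebs-csi-controller | all_ready \
    && echo "ebs-csi-controller: all pods ready"
fi
```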

ECR Pull Access

Option A: Vijil’s ECR (cross-account pull)

Vijil’s images live in account 266735823956 (region us-west-2). Two steps are required: one run by Vijil, one by you.
Step 1 (Vijil side, run in account 266735823956): grant your account pull access on the ECR registry.
# Run in Vijil account (266735823956)
export AWS_REGION=us-west-2
CUSTOMER_ACCOUNT_ID=YOUR_ACCOUNT_ID

cat > /tmp/ecr-registry-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCustomerPull",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::${CUSTOMER_ACCOUNT_ID}:root"
      },
      "Action": [
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "ecr:BatchCheckLayerAvailability"
      ],
      "Resource": "arn:aws:ecr:${AWS_REGION}:266735823956:repository/*"
    }
  ]
}
EOF

aws ecr put-registry-policy --policy-text file:///tmp/ecr-registry-policy.json --region $AWS_REGION
To allow multiple customer accounts, add additional ARNs to Principal.AWS as an array and re-run put-registry-policy once.
Step 2 (customer side): give the EKS node role permission to pull from Vijil’s ECR.
NODE_ROLE=$(aws eks describe-nodegroup \
  --cluster-name $CLUSTER_NAME \
  --nodegroup-name standard-workers \
  --query 'nodegroup.nodeRole' --output text | awk -F/ '{print $NF}')

aws iam attach-role-policy \
  --role-name $NODE_ROLE \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryPullOnly

cat > /tmp/vijil-ecr-pull-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ecr:GetAuthorizationToken",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecr:BatchGetImage",
        "ecr:GetDownloadUrlForLayer"
      ],
      "Resource": "arn:aws:ecr:us-west-2:266735823956:repository/*"
    }
  ]
}
EOF

aws iam put-role-policy \
  --role-name $NODE_ROLE \
  --policy-name VijilECRPull \
  --policy-document file:///tmp/vijil-ecr-pull-policy.json
After a minute, run a test pod that pulls from Vijil’s ECR:
kubectl run ecr-test \
  --image=266735823956.dkr.ecr.us-west-2.amazonaws.com/vijil-console:latest \
  --restart=Never --rm -it -- /bin/sh -c "echo ok"
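If you script pulls or image overrides across environments, a tiny helper (hypothetical, not part of the chart) keeps ECR image URIs consistent:

```shell
# ecr_uri ACCOUNT REGION REPO[:TAG] -> full image URI
ecr_uri() { echo "$1.dkr.ecr.$2.amazonaws.com/$3"; }

ecr_uri 266735823956 us-west-2 vijil-console:latest
```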

Option B: Your own registry

Build and push images to your own registry, then override *.image.repository in your values file (see Step 3).

Bedrock AgentCore (Diamond / custom harness)

The Diamond evaluation page and custom harness workflows call the Bedrock AgentCore API. Attach a scoped policy to the same IAM principal your Console pods use for AWS access. If you used Pod Identity, attach it to the Pod Identity role. If you used the node IAM role, attach it to that.
# If using Pod Identity, set BEDROCK_ROLE to your Pod Identity role name (e.g. VijilConsoleS3Role)
# If using node IAM, resolve the node role:
BEDROCK_ROLE=${BEDROCK_ROLE:-$(aws eks describe-nodegroup \
  --cluster-name $CLUSTER_NAME \
  --nodegroup-name standard-workers \
  --query 'nodegroup.nodeRole' --output text | awk -F/ '{print $NF}')}

export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)

cat > /tmp/vijil-bedrock-agentcore-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BedrockAgentCoreControl",
      "Effect": "Allow",
      "Action": [
        "bedrock-agentcore:ListAgentRuntimes",
        "bedrock-agentcore-control:ListAgentRuntimes"
      ],
      "Resource": "arn:aws:bedrock-agentcore:${AWS_REGION}:${ACCOUNT_ID}:runtime/*"
    },
    {
      "Sid": "BedrockAgentCoreData",
      "Effect": "Allow",
      "Action": [
        "bedrock-agentcore:InvokeAgentRuntime",
        "bedrock-agentcore:GetSessionStatus",
        "bedrock-agentcore:StopSession",
        "bedrock-agentcore:GetAgentCard"
      ],
      "Resource": "arn:aws:bedrock-agentcore:${AWS_REGION}:${ACCOUNT_ID}:runtime/*"
    }
  ]
}
EOF

BEDROCK_POLICY_ARN=$(aws iam create-policy \
  --policy-name VijilBedrockAgentCoreAccess \
  --policy-document file:///tmp/vijil-bedrock-agentcore-policy.json \
  --query 'Policy.Arn' --output text)

aws iam attach-role-policy \
  --role-name $BEDROCK_ROLE \
  --policy-arn $BEDROCK_POLICY_ARN
The policy above includes bedrock-agentcore:GetAgentCard so the Console can fetch the agent card when creating custom harnesses. If you created the policy before this was added, add a new policy version that includes bedrock-agentcore:GetAgentCard in the BedrockAgentCoreData statement and run aws iam create-policy-version --policy-arn <policy-arn> --policy-document file://policy.json --set-as-default.

Staging / Non-Dev: Custom Harness AgentCore Runtime

In dev, a custom harness AgentCore runtime already exists in the account and is looked up by name. In any non-dev account (staging, customer), that runtime doesn’t exist — you need to create one and point the Console at it via CUSTOM_HARNESS_AGENT_RUNTIME_ARN. Prerequisite: Vijil’s dev ECR (266735823956) must allow your account to pull images — see Step 1 under ECR Pull Access (Option A) above.

Step 1: Create the execution role

The runtime runs the harness container under an IAM role that Bedrock AgentCore assumes. Create a role with (a) a trust policy allowing bedrock-agentcore.amazonaws.com, and (b) permissions for ECR pull (your account + Vijil’s 266735823956), S3 for your app bucket, and CloudWatch Logs. The snippet derives STAGING_ACCOUNT_ID from your current credentials (e.g. 565393042914 for staging). Run in the staging account:
export AWS_PROFILE=your-staging-profile
export AWS_REGION=us-west-2
export STAGING_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)

cat > /tmp/agentcore-trust.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AssumeRolePolicy",
      "Effect": "Allow",
      "Principal": { "Service": "bedrock-agentcore.amazonaws.com" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "aws:SourceAccount": "${STAGING_ACCOUNT_ID}" },
        "ArnLike": { "aws:SourceArn": "arn:aws:bedrock-agentcore:${AWS_REGION}:${STAGING_ACCOUNT_ID}:*" }
      }
    }
  ]
}
EOF

aws iam create-role \
  --role-name vijil_staging_harness_agent_execution_role \
  --assume-role-policy-document file:///tmp/agentcore-trust.json \
  --description "Execution role for Bedrock AgentCore custom harness runtime"
Then attach permissions for ECR (your account + Vijil’s 266735823956), S3, CloudWatch Logs, X-Ray, and Bedrock model invocation. Replace vijil-console-data-staging with your app bucket name:
cat > /tmp/agentcore-permissions.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ECRTokenAccess",
      "Effect": "Allow",
      "Action": [ "ecr:GetAuthorizationToken" ],
      "Resource": "*"
    },
    {
      "Sid": "ECRPull",
      "Effect": "Allow",
      "Action": [ "ecr:BatchGetImage", "ecr:GetDownloadUrlForLayer" ],
      "Resource": [
        "arn:aws:ecr:${AWS_REGION}:${STAGING_ACCOUNT_ID}:repository/*",
        "arn:aws:ecr:${AWS_REGION}:266735823956:repository/*"
      ]
    },
    {
      "Sid": "S3HarnessBucket",
      "Effect": "Allow",
      "Action": [ "s3:GetObject", "s3:PutObject", "s3:ListBucket", "s3:DeleteObject" ],
      "Resource": [
        "arn:aws:s3:::vijil-console-data-staging",
        "arn:aws:s3:::vijil-console-data-staging/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [ "logs:DescribeLogStreams", "logs:CreateLogGroup" ],
      "Resource": [ "arn:aws:logs:${AWS_REGION}:${STAGING_ACCOUNT_ID}:log-group:/aws/bedrock-agentcore/runtimes/*" ]
    },
    {
      "Effect": "Allow",
      "Action": [ "logs:DescribeLogGroups" ],
      "Resource": [ "arn:aws:logs:${AWS_REGION}:${STAGING_ACCOUNT_ID}:log-group:*" ]
    },
    {
      "Effect": "Allow",
      "Action": [ "logs:CreateLogStream", "logs:PutLogEvents" ],
      "Resource": [ "arn:aws:logs:${AWS_REGION}:${STAGING_ACCOUNT_ID}:log-group:/aws/bedrock-agentcore/runtimes/*:log-stream:*" ]
    },
    {
      "Effect": "Allow",
      "Action": [ "xray:PutTraceSegments", "xray:PutTelemetryRecords", "xray:GetSamplingRules", "xray:GetSamplingTargets" ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "cloudwatch:PutMetricData",
      "Resource": "*",
      "Condition": { "StringEquals": { "cloudwatch:namespace": "bedrock-agentcore" } }
    },
    {
      "Sid": "BedrockModelInvocation",
      "Effect": "Allow",
      "Action": [ "bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream" ],
      "Resource": [
        "arn:aws:bedrock:*::foundation-model/*",
        "arn:aws:bedrock:${AWS_REGION}:${STAGING_ACCOUNT_ID}:*"
      ]
    }
  ]
}
EOF

aws iam put-role-policy \
  --role-name vijil_staging_harness_agent_execution_role \
  --policy-name StagingAgentCoreExecutionPolicy \
  --policy-document file:///tmp/agentcore-permissions.json

ROLE_ARN=$(aws iam get-role \
  --role-name vijil_staging_harness_agent_execution_role \
  --query 'Role.Arn' --output text)
echo "Role ARN: $ROLE_ARN"

Step 2: Create the AgentCore runtime

Use the same VPC subnets and security group as your EKS. Container image: Vijil’s harness agent in dev ECR (vijil-harness-agent-agentcore). Environment variables should match dev except the bucket; the following snippet uses the same set as dev with the staging bucket.
# Replace with your own subnet IDs and EKS node security group
SUBNET_IDS='["subnet-xxxxxxxx","subnet-yyyyyyyy"]'
SG_ID="sg-xxxxxxxxxxxxxxxxx"
CONTAINER_URI="266735823956.dkr.ecr.us-west-2.amazonaws.com/vijil-harness-agent-agentcore:latest"

# Set GOOGLE_API_KEY to your actual key — do not commit it
ENV_VARS='{"AGENT_OBSERVABILITY_ENABLED":"true","GOOGLE_API_KEY":"YOUR_GOOGLE_API_KEY","VIJIL_HARNESS_MODEL":"gemini-2.5-flash","VIJIL_HARNESS_OUTPUTS_BUCKET":"vijil-console-data-staging","VIJIL_HARNESS_OUTPUT_BACKEND":"aws","VIJIL_HARNESS_RPM":"500"}'

aws bedrock-agentcore-control create-agent-runtime \
  --agent-runtime-name vijil_staging_harness_agent \
  --agent-runtime-artifact "{\"containerConfiguration\":{\"containerUri\":\"$CONTAINER_URI\"}}" \
  --role-arn "$ROLE_ARN" \
  --network-configuration "{\"networkMode\":\"VPC\",\"networkModeConfig\":{\"securityGroups\":[\"$SG_ID\"],\"subnets\":$SUBNET_IDS}}" \
  --protocol-configuration '{"serverProtocol":"A2A"}' \
  --lifecycle-configuration '{"idleRuntimeSessionTimeout":1800,"maxLifetime":28800}' \
  --environment-variables "$ENV_VARS" \
  --region "$AWS_REGION" \
  --profile "$AWS_PROFILE"
The command returns agentRuntimeArn and agentRuntimeId. Set commonEnv.CUSTOM_HARNESS_AGENT_RUNTIME_ARN in your values (e.g. my-values.yaml) to the returned agentRuntimeArn.
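You can capture the ARN directly by adding --query 'agentRuntimeArn' --output text to the create call. If you saved the raw JSON response instead, extracting the field looks like this (the sample response values below are illustrative, not real):

```shell
# Sample create-agent-runtime response (illustrative values)
RESPONSE='{"agentRuntimeArn":"arn:aws:bedrock-agentcore:us-west-2:111122223333:runtime/abc123","agentRuntimeId":"abc123","status":"CREATING"}'

# Pull out the ARN to plug into commonEnv.CUSTOM_HARNESS_AGENT_RUNTIME_ARN
RUNTIME_ARN=$(printf '%s' "$RESPONSE" \
  | python3 -c 'import json, sys; print(json.load(sys.stdin)["agentRuntimeArn"])')
echo "CUSTOM_HARNESS_AGENT_RUNTIME_ARN: $RUNTIME_ARN"
```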

Diamond Artifacts S3 Access (staging / customer accounts)

The Diamond evaluation engine downloads detector configs from a vijil-artifacts-bucket that lives in Vijil’s dev account. In any non-dev account, your EKS nodes need cross-account read access to that bucket. There are two parts:
  1. Identity policy on the staging EKS nodegroup role (staging account)
  2. Bucket policy on vijil-artifacts-bucket (dev account) that trusts the staging role

Step 1: Identity policy on your node role

In the customer (staging) account, attach an inline policy to the EKS nodegroup role so it can list and read/write the artifacts bucket in the Vijil dev account. Replace CLUSTER_NAME and NODEGROUP_NAME as needed if you do not already know the node role name.
export AWS_PROFILE=your-staging-profile
export AWS_REGION=us-west-2

# Resolve node role (if you don't have it already)
CLUSTER_NAME="vijil-staging-eks"
NODEGROUP_NAME="standard-workers"

NODE_ROLE=$(aws eks describe-nodegroup \
  --cluster-name "$CLUSTER_NAME" \
  --nodegroup-name "$NODEGROUP_NAME" \
  --query 'nodegroup.nodeRole' --output text | awk -F/ '{print $NF}')

cat > /tmp/staging-diamond-artifacts-s3.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListArtifactsBucket",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": "arn:aws:s3:::vijil-artifacts-bucket"
    },
    {
      "Sid": "ObjectsInArtifactsBucket",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::vijil-artifacts-bucket/*"
    }
  ]
}
EOF

aws iam put-role-policy \
  --role-name "$NODE_ROLE" \
  --policy-name StagingDiamondArtifactsS3Access \
  --policy-document file:///tmp/staging-diamond-artifacts-s3.json
This does not grant access by itself; the Vijil-side bucket account must also trust this role.

Step 2: Bucket policy on Vijil’s side

In the Vijil-side dev account that owns vijil-artifacts-bucket, add or update the bucket policy so it trusts the customer staging nodegroup role. Replace YOUR_ACCOUNT_ID and YOUR_NODE_ROLE_NAME in the Principal ARN below with your values (for example, account 565393042914 and role eksctl-vijil-staging-eks-nodegroup-NodeInstanceRole-a9C7smGsfmFa).
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowStagingEksNodegroupAccess",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::YOUR_ACCOUNT_ID:role/YOUR_NODE_ROLE_NAME"
        ]
      },
      "Action": [
        "s3:DeleteObject*",
        "s3:GetBucket*",
        "s3:List*",
        "s3:PutBucketPolicy",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::vijil-artifacts-bucket",
        "arn:aws:s3:::vijil-artifacts-bucket/*"
      ]
    }
  ]
}

This policy does not make the bucket public; it only allows the specific principals listed in `Principal.AWS`. After this policy is in place and the customer staging nodegroup role has the identity policy above, Diamond evaluations in the customer staging cluster can download detector configs and write results to `vijil-artifacts-bucket` just like the Vijil dev environment.

Step 3: Helm Values and Secrets

Work from the chart directory:

cd helm_charts/vijil-console

3.1 Secrets file

cp values/secrets/example.yaml values/secrets/secrets.yaml
Edit values/secrets/secrets.yaml:
secrets:
  vijil-console:
    POSTGRES_PASSWORD: "the-rds-password-you-set-in-step-1.3"
    SECRET_KEY: "$(openssl rand -hex 32)"  # generate once and store securely
    GROQ_API_KEY: "your-groq-api-key"

  vijil-diamond:
    GROQ_API_KEY: "your-groq-api-key"
Never commit secrets.yaml — add it to .gitignore.
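Note that the $(openssl rand -hex 32) shown above is a reminder; Helm does not expand command substitutions inside YAML. Generate the key once and paste the literal value:

```shell
# Generate a 32-byte (64 hex char) SECRET_KEY once; store the literal in secrets.yaml
SECRET_KEY=$(openssl rand -hex 32)
echo "SECRET_KEY length: ${#SECRET_KEY}"
```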

3.2 Environment values file

Create my-values.yaml. Setting commonEnv here is the simplest way to configure the chart — set values once and the chart derives the redundant env vars for you:
commonEnv:
  # Database (RDS endpoint from Step 1.3)
  POSTGRES_HOST: "vijil-prod-pg.xxxx.us-west-2.rds.amazonaws.com"

  # S3
  S3_BUCKET_NAME: "vijil-console-data-prod"
  AWS_REGION: "us-west-2"

  # Custom harness bucket (can be the same bucket)
  CUSTOM_HARNESS_BUCKET_NAME: "vijil-console-data-prod"

  # Frontend and API URLs — both must be reachable from the internet
  # because the React SPA calls the API from the user's browser
  VITE_API_PREFIX: "https://console-api.yourdomain.com"
  API_HOST: "https://console-api.yourdomain.com"
  API_DOMAIN_FOR_CSP: "console-api.yourdomain.com"
  CORS_ORIGINS: "https://console.yourdomain.com,https://console-api.yourdomain.com"

  # Custom harness / Diamond (optional): default CUSTOM_HARNESS_AGENT_NAME is "vijil_dev_harness_agent", which is looked up in this account. In non-dev (e.g. staging), set CUSTOM_HARNESS_AGENT_RUNTIME_ARN to your AgentCore runtime ARN, or create a runtime with that name.
  # CUSTOM_HARNESS_AGENT_RUNTIME_ARN: "arn:aws:bedrock-agentcore:us-west-2:ACCOUNT:runtime/RUNTIME_ID"

# Single nginx NLB — internet-facing (public subnets)
# NOTE: internal: false is required for users outside the VPC to reach both the UI and API.
# The default values.yaml has internal: true (private-only). Override it here.
nginx:
  service:
    aws:
      nlb:
        internal: false                                  # ← must be false for public access
        subnets: "subnet-public-1-id,subnet-public-2-id"  # public subnet IDs from Step 1.1
      tls:
        enabled: true
        certificateArn: "arn:aws:acm:us-west-2:YOUR_ACCOUNT_ID:certificate/YOUR_CERT_ID"
Why the nginx NLB must be internal: false for public deployments: the React frontend is a single-page application that runs in the user’s browser. VITE_API_PREFIX is baked into the frontend at build time, and the browser makes API calls directly to that URL. If the single nginx NLB is internal-only, browser requests will fail even if DNS resolves.

Step 4: Helm Install

# From helm_charts/vijil-console/
helm install vijil-console . \
  --namespace vijil-console \
  --create-namespace \
  -f values.yaml \
  -f values/secrets/secrets.yaml \
  -f my-values.yaml \
  --set telemetry.enabled=true

# Watch rollout
kubectl get pods -n vijil-console -w
Wait for all pods to reach Running. Then get the nginx NLB hostname:
kubectl get svc -n vijil-console vijil-console-nginx \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}{"\n"}'
If EXTERNAL-IP shows <pending>, wait a moment and re-run.

Step 4.5: Deploy Darwin (Evolution Engine)

Darwin is the evolution engine. It runs in its own namespace (vijil-darwin) and is deployed separately from the vijil-console Helm chart. The Console chart routes /evolution/... API traffic to service-darwin.vijil-darwin.svc.cluster.local. Console starts fine without Darwin, but any /evolution calls will return 502 or 504 until Darwin is running.
# From the vijil-darwin repo root
helm upgrade vijil-darwin helm_charts/vijil-darwin \
  --install \
  --namespace vijil-darwin \
  --create-namespace \
  --values helm_charts/vijil-darwin/values.yaml \
  --values helm_charts/vijil-darwin/my-values.yaml \
  --set image.tag=<IMAGE_TAG>
Verify Darwin is healthy:
kubectl get pods -n vijil-darwin

# Health check via port-forward
kubectl port-forward -n vijil-darwin svc/service-darwin 8099:80 &
curl http://localhost:8099/health        # expect: 200
curl http://localhost:8099/health/ready  # expect: 200

Step 5: DNS

Create Route 53 records after the NLB is provisioned. Both console.* and console-api.* point to the same nginx NLB hostname.
HOSTED_ZONE_ID=$(aws route53 list-hosted-zones-by-name \
  --dns-name yourdomain.com \
  --query 'HostedZones[0].Id' --output text | awk -F/ '{print $3}')

NGINX_NLB=$(kubectl get svc vijil-console-nginx -n vijil-console \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

# Alias hosted zone ID for NLBs in us-west-2
NLB_HOSTED_ZONE_ID=Z18D5FSROUN65G

cat > /tmp/dns-records.json <<EOF
{
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "console.yourdomain.com",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "$NLB_HOSTED_ZONE_ID",
          "DNSName": "$NGINX_NLB",
          "EvaluateTargetHealth": true
        }
      }
    },
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "console-api.yourdomain.com",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "$NLB_HOSTED_ZONE_ID",
          "DNSName": "$NGINX_NLB",
          "EvaluateTargetHealth": true
        }
      }
    }
  ]
}
EOF

aws route53 change-resource-record-sets \
  --hosted-zone-id $HOSTED_ZONE_ID \
  --change-batch file:///tmp/dns-records.json
If you prefer CNAME records instead of alias records, replace AliasTarget with "Type": "CNAME" and "ResourceRecords": [{"Value": "$NGINX_NLB"}] — no NLB_HOSTED_ZONE_ID needed.
The NLB alias hosted zone ID differs by region — see the full list in the AWS docs.
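Before calling change-resource-record-sets, a quick local check can confirm the change batch is well-formed (validate_batch is a hypothetical helper name):

```shell
# validate_batch FILE: print each record name; fail on malformed JSON or non-UPSERT actions
validate_batch() {
  python3 - "$1" <<'PY'
import json, sys
batch = json.load(open(sys.argv[1]))
for change in batch["Changes"]:
    assert change["Action"] == "UPSERT", change
    print(change["ResourceRecordSet"]["Name"])
PY
}

if [ -f /tmp/dns-records.json ]; then
  validate_batch /tmp/dns-records.json
fi
```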

Step 6: Database Migrations

Migrations run automatically as Helm pre-install and pre-upgrade hooks — no manual action needed on a normal install. Verify they completed:
kubectl get jobs -n vijil-console
# Expect: vijil-console-migrate-teams and vijil-console-migrate-agent-environment
# both show COMPLETIONS: 1/1
If a hook failed, run the migration manually after pods are in Running state:
# Teams service
kubectl exec -n vijil-console \
  $(kubectl get pod -n vijil-console -l app=teams -o jsonpath='{.items[0].metadata.name}') -- \
  bash -c "cd /vijil-console && python -m alembic -c src/service_teams/alembic.ini upgrade head"

# Agent-environment service
kubectl exec -n vijil-console \
  $(kubectl get pod -n vijil-console -l app=agent-environment -o jsonpath='{.items[0].metadata.name}') -- \
  bash -c "cd /vijil-console && python -m alembic -c src/service_agent_environment/alembic.ini upgrade head"

Step 7: Bootstrap

Run from the repo root (not the chart directory):
export BOOTSTRAP_USER_EMAIL=admin@yourdomain.com
export BOOTSTRAP_USER_PASSWORD=your-secure-admin-password
export BOOTSTRAP_USER_NAME="Admin"
export TEAMS_SERVICE_URL=https://console-api.yourdomain.com

poetry run python scripts/bootstrap_teams.py
Optionally seed default content:
# Predefined agents (Groq, OpenAI, etc.)
poetry run python scripts/seed_agents.py

# System preset personas (professional + adversarial)
poetry run python scripts/seed_persona_presets.py

# Demographic dimensions for bias testing
poetry run python scripts/seed_demographics.py

# Compliance policy presets (GDPR, CCPA, OWASP, etc.)
poetry run python scripts/seed_policy_presets.py
For staging environments with in-cluster sample agents, set these env vars before running seed_agents.py to point seeded agents at the in-cluster sample agent services:
export SAMPLE_AGENTS_USE_EKS_URLS=1
export SAMPLE_AGENTS_EKS_NAMESPACE=vijil-sample-agents  # default
export SAMPLE_AGENTS_DUMMY_API_KEY=dummy
Then run the seed script (or use --update to refresh existing agents’ URLs to http://<service>.<namespace>.svc.cluster.local/v1):
poetry run python scripts/seed_agents.py
# Or to update existing agents' URLs without re-seeding:
poetry run python scripts/seed_agents.py --update

Step 8: Verify

# All pods running
kubectl get pods -n vijil-console
kubectl get pods -n vijil-telemetry
kubectl get pods -n vijil-darwin

# API health checks — the gateway exposes these paths (no root /healthz)
curl -f https://console-api.yourdomain.com/teams/healthz
curl -f https://console-api.yourdomain.com/evaluations/healthz
curl -f https://console-api.yourdomain.com/console/healthz

# Frontend loads
curl -f -o /dev/null -w "%{http_code}" https://console.yourdomain.com/
# Expect: 200

# Run smoke tests
make helm-smoketest
Access the UI at https://console.yourdomain.com and log in with the credentials you set in bootstrap.
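The spot checks above can be looped; the domain passed to health_urls is a placeholder for your real domain:

```shell
# Build the gateway health URLs for a domain, then curl each one
health_urls() {
  local domain=$1
  for svc in teams evaluations console; do
    echo "https://console-api.$domain/$svc/healthz"
  done
}

for url in $(health_urls yourdomain.com); do
  curl -sf --max-time 5 -o /dev/null "$url" && echo "OK   $url" || echo "FAIL $url"
done
```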
Last modified on April 21, 2026