# Configuration Reference

## Values

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| nameOverride | string | `""` | Unique identifier of the SuperSONIC instance (equal to the release name by default) |
| serverLoadMetric | string | `""` | Metric used by both the KEDA autoscaler and Envoy's Prometheus-based rate limiter. The default metric (inference queue latency) is defined in `templates/_helpers.tpl`; see the example below |
| serverLoadThreshold | int | `100` | Threshold for `serverLoadMetric` |
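
A minimal `values.yaml` override pointing both the autoscaler and the rate limiter at a custom server-load metric; the metric name below is hypothetical and must exist in your Prometheus instance:

```yaml
# Hypothetical metric name; the real default (inference queue latency)
# is defined in templates/_helpers.tpl
serverLoadMetric: "my_server_load_metric"
serverLoadThreshold: 100
```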

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| triton.replicas | int | `1` | Number of Triton server instances (if autoscaling is disabled) |
| triton.image | string | `"nvcr.io/nvidia/tritonserver:24.12-py3-min"` | Docker image for the Triton server |
| triton.command | list | `["/bin/sh","-c"]` | Command and arguments to run in the Triton container |
| triton.args[0] | string | `"/opt/tritonserver/bin/tritonserver \\\n--model-repository=/tmp/ \\\n--log-verbose=0 \\\n--exit-timeout-secs=60\n"` | |
| triton.resources | object | `{"limits":{"cpu":1,"memory":"2G"},"requests":{"cpu":1,"memory":"2G"}}` | Resource limits and requests for each Triton instance; add a GPU request here if needed (see the example below) |
| triton.affinity | object | `{}` | Affinity rules for Triton pods; another way to request GPUs |
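
A sketch of the two ways to request GPUs mentioned above: adding `nvidia.com/gpu` to `triton.resources` (requests and limits must match for extended resources), or steering pods toward GPU nodes via affinity. The node label in the affinity rule is cluster-specific and hypothetical:

```yaml
triton:
  resources:
    requests:
      cpu: 1
      memory: 2G
      nvidia.com/gpu: 1
    limits:
      cpu: 1
      memory: 2G
      nvidia.com/gpu: 1   # GPU requests and limits must be equal
  # Alternative: request GPU nodes via affinity
  # (node label below is cluster-specific / hypothetical)
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: nvidia.com/gpu.present
                operator: In
                values:
                  - "true"
```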

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| triton.modelRepository | object | `{"enabled":false,"mountPath":""}` | Model repository configuration |
| triton.modelRepository.mountPath | string | `""` | Model repository mount path |
| triton.service.labels | object | `{}` | |
| triton.service.annotations | object | `{}` | |
| triton.service.ports | list | `[{"name":"http","port":8000,"protocol":"TCP","targetPort":8000},{"name":"grpc","port":8001,"protocol":"TCP","targetPort":8001},{"name":"metrics","port":8002,"protocol":"TCP","targetPort":8002}]` | Ports for communication with Triton servers |
| triton.readinessProbe | object | `{"command":["/bin/sh","-c","curl -sf http://localhost:8000/v2/health/ready > /dev/null && [ ! -f /tmp/shutdown ]"],"failureThreshold":10,"initialDelaySeconds":10,"periodSeconds":10,"reset":false,"successThreshold":1,"timeoutSeconds":5}` | Custom readiness probe configuration |
| triton.readinessProbe.reset | bool | `false` | If true, resets settings to Kubernetes defaults (other readinessProbe settings will be ignored; see the example below) |
| triton.startupProbe | object | `{"failureThreshold":12,"httpGet":{"path":"/v2/health/ready","port":"http"},"initialDelaySeconds":0,"periodSeconds":10,"reset":false}` | Custom startup probe configuration |
| triton.startupProbe.reset | bool | `false` | If true, resets settings to Kubernetes defaults (other startupProbe settings will be ignored) |
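
Two probe-tuning sketches based on the settings above: `reset: true` discards the chart's probe configuration in favor of Kubernetes defaults, while a higher `failureThreshold` gives slow-loading models more startup time:

```yaml
triton:
  readinessProbe:
    reset: true            # fall back to Kubernetes defaults; other readinessProbe settings are ignored
  startupProbe:
    failureThreshold: 24   # allow up to 24 x 10 s = 4 min for model loading
    periodSeconds: 10
```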

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| envoy.enabled | bool | `true` | Enable Envoy Proxy |
| envoy.replicas | int | `1` | Number of Envoy Proxy pods in the Deployment |
| envoy.image | string | `"envoyproxy/envoy:v1.30-latest"` | Envoy Proxy Docker image |
| envoy.args | list | `["--config-path","/etc/envoy/envoy.yaml","--log-level","info","--log-path","/dev/stdout"]` | Arguments for Envoy |
| envoy.resources | object | `{"limits":{"cpu":2,"memory":"4G"},"requests":{"cpu":1,"memory":"2G"}}` | Resource requests and limits for Envoy Proxy. Note: an Envoy Proxy with too many connections may run out of CPU |
| envoy.service.type | string | `"ClusterIP"` | This is the client-facing endpoint. To connect to it from outside the cluster, either enable the ingress or use `type: LoadBalancer` (see the example below) |
| envoy.service.ports | list | `[{"name":"grpc","port":8001,"targetPort":8001},{"name":"admin","port":9901,"targetPort":9901}]` | Envoy Service ports |
| envoy.ingress | object | `{"annotations":{},"enabled":false,"hostName":"","ingressClassName":""}` | Ingress configuration for Envoy |
| envoy.grpc_route_timeout | string | `"0s"` | Timeout for the gRPC route in Envoy; disabled by default (`0s`) to prevent Envoy from closing connections too early |
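
To reach the client-facing gRPC endpoint from outside the cluster, either switch the Service type or enable the ingress; the host name and ingress class below are placeholders:

```yaml
envoy:
  # Option 1: expose the Service directly
  service:
    type: LoadBalancer
  # Option 2: route through an ingress controller
  ingress:
    enabled: true
    hostName: supersonic.example.com   # placeholder
    ingressClassName: nginx            # placeholder
```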

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| envoy.rate_limiter.listener_level | object | `{"enabled":false,"fill_interval":"12s","max_tokens":5,"tokens_per_fill":1}` | This rate limiter explicitly controls the number of client connections to the Envoy Proxy (see the example below) |
| envoy.rate_limiter.listener_level.enabled | bool | `false` | Enable the listener-level rate limiter |
| envoy.rate_limiter.listener_level.max_tokens | int | `5` | Maximum number of simultaneous connections to the Envoy Proxy. Each new connection takes a "token" from the "bucket", which initially contains `max_tokens` tokens |
| envoy.rate_limiter.listener_level.tokens_per_fill | int | `1` | `tokens_per_fill` tokens are added to the "bucket" every `fill_interval`, allowing new connections to be established |
| envoy.rate_limiter.listener_level.fill_interval | string | `"12s"` | For example, adding a new token every 12 seconds allows 5 new connections every minute |
| envoy.rate_limiter.prometheus_based | object | `{"enabled":false,"luaConfig":"cfg/envoy-filter.lua"}` | This rate limiter rejects new connections based on a metric extracted from Prometheus (e.g. inference queue latency). The metric is taken from the `serverLoadMetric` parameter, and the threshold is set by `serverLoadThreshold`; these are the same parameters used by the KEDA autoscaler |
| envoy.rate_limiter.prometheus_based.enabled | bool | `false` | Enable the Prometheus-based rate limiter |
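
The listener-level limiter is a token bucket: the bucket starts with `max_tokens` tokens, each new connection consumes one, and `tokens_per_fill` tokens are restored every `fill_interval`. With the values below, bursts of up to 5 connections are accepted, and the steady-state rate is one connection per 12 seconds, i.e. 5 per minute:

```yaml
envoy:
  rate_limiter:
    listener_level:
      enabled: true
      max_tokens: 5        # burst size: up to 5 simultaneous new connections
      tokens_per_fill: 1   # tokens restored per interval
      fill_interval: 12s   # 1 token / 12 s = 5 new connections per minute
```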

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| envoy.loadBalancerPolicy | string | `"LEAST_REQUEST"` | Envoy load balancer policy. Options: `ROUND_ROBIN`, `LEAST_REQUEST`, `RING_HASH`, `RANDOM`, `MAGLEV` |
| envoy.auth.enabled | bool | `false` | Enable authentication in the Envoy Proxy (see the example below) |
| envoy.auth.jwt_issuer | string | `""` | |
| envoy.auth.jwt_remote_jwks_uri | string | `""` | |
| envoy.auth.audiences | list | `[]` | |
| envoy.auth.url | string | `""` | |
| envoy.auth.port | int | `443` | |
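
A sketch enabling JWT authentication; the issuer, JWKS endpoint, and audience below are placeholders for your identity provider, and the exact role of `url`/`port` is an assumption (presumably the host serving the JWKS):

```yaml
envoy:
  auth:
    enabled: true
    jwt_issuer: https://auth.example.com/                                 # placeholder
    jwt_remote_jwks_uri: https://auth.example.com/.well-known/jwks.json   # placeholder
    audiences:
      - supersonic-client     # placeholder
    url: auth.example.com     # placeholder; assumed JWKS host
    port: 443
```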

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| autoscaler.enabled | bool | `false` | Enable autoscaling (requires Prometheus to also be enabled). Autoscaling is based on the metric set in `serverLoadMetric`; new Triton servers are spawned when the metric exceeds the threshold set in `serverLoadThreshold` (see the example below) |
| autoscaler.minReplicaCount | int | `1` | Minimum number of Triton servers. Warning: if set to 0 and the desired Prometheus metric is empty, the first server will never start |
| autoscaler.maxReplicaCount | int | `2` | Maximum number of Triton servers |
| autoscaler.zeroIdleReplicas | bool | `false` | If set to true, the server releases all GPUs when idle. Be careful: if the scaling metric is extracted from the Triton servers themselves, it becomes unavailable at zero replicas, and scaling from 0 to 1 will never happen |
| autoscaler.scaleUp.stabilizationWindowSeconds | int | `60` | |
| autoscaler.scaleUp.periodSeconds | int | `60` | |
| autoscaler.scaleUp.stepsize | int | `1` | |
| autoscaler.scaleDown.stabilizationWindowSeconds | int | `600` | |
| autoscaler.scaleDown.periodSeconds | int | `120` | |
| autoscaler.scaleDown.stepsize | int | `1` | |
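
Putting the autoscaler settings together: with the values below, KEDA scales between 1 and 8 Triton servers, adding at most one server per minute and removing at most one every two minutes after a 10-minute cooldown:

```yaml
autoscaler:
  enabled: true        # requires Prometheus (bundled or external) to be enabled
  minReplicaCount: 1   # keep > 0 unless the scaling metric exists at zero replicas
  maxReplicaCount: 8
  scaleUp:
    stabilizationWindowSeconds: 60
    periodSeconds: 60
    stepsize: 1
  scaleDown:
    stabilizationWindowSeconds: 600
    periodSeconds: 120
    stepsize: 1
```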

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| nodeSelector | object | `{}` | Node selector for all pods (Triton and Envoy); see the example below |
| tolerations | list | `[]` | Tolerations for all pods (Triton and Envoy) |
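
A sketch pinning all pods (Triton and Envoy) to GPU nodes and tolerating a GPU taint; the label and taint keys are cluster-specific placeholders:

```yaml
nodeSelector:
  gpu-node: "true"        # placeholder node label
tolerations:
  - key: nvidia.com/gpu   # placeholder taint key
    operator: Exists
    effect: NoSchedule
```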

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| prometheus.external.enabled | bool | `false` | Use an external Prometheus instance. If true, Prometheus parameters outside of `prometheus.external` are ignored (see the example below) |
| prometheus.external.url | string | `""` | External Prometheus server URL |
| prometheus.external.port | int | `443` | External Prometheus server port number |
| prometheus.external.scheme | string | `"https"` | Whether the external Prometheus endpoint is exposed via http or https |
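
Pointing SuperSONIC at an existing Prometheus instance; the URL is a placeholder:

```yaml
prometheus:
  external:
    enabled: true                 # Prometheus parameters outside prometheus.external are then ignored
    url: prometheus.example.com   # placeholder
    port: 443
    scheme: https
```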

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| prometheus.enabled | bool | `false` | Enable or disable the bundled Prometheus deployment (see the example below) |
| prometheus.server.useExistingClusterRoleName | string | `"supersonic-prometheus-role"` | |
| prometheus.server.releaseNamespace | bool | `true` | |
| prometheus.server.persistentVolume.enabled | bool | `false` | |
| prometheus.server.resources.requests.cpu | string | `"500m"` | |
| prometheus.server.resources.requests.memory | string | `"512Mi"` | |
| prometheus.server.resources.limits.cpu | int | `1` | |
| prometheus.server.resources.limits.memory | string | `"1Gi"` | |
| prometheus.server.retention | string | `"15d"` | |
| prometheus.server.global.scrape_interval | string | `"5s"` | |
| prometheus.server.global.evaluation_interval | string | `"5s"` | |
| prometheus.server.service.enabled | bool | `true` | |
| prometheus.server.service.servicePort | int | `9090` | |
| prometheus.server.configMapOverrideName | string | `"prometheus-config"` | |
| prometheus.server.ingress | object | `{"annotations":{},"enabled":false,"hosts":[],"ingressClassName":"","tls":[{"hosts":[]}]}` | Ingress configuration for Prometheus |
| prometheus.serviceAccounts.server.create | bool | `false` | |
| prometheus.serviceAccounts.server.name | string | `"supersonic-prometheus-sa"` | |
| prometheus.rbac.create | bool | `false` | |
| prometheus.alertmanager.enabled | bool | `false` | |
| prometheus.pushgateway.enabled | bool | `false` | |
| prometheus.kube-state-metrics.enabled | bool | `false` | |
| prometheus.prometheus-node-exporter.enabled | bool | `false` | |
| prometheus.prometheus-pushgateway.enabled | bool | `false` | |
| prometheus.configmapReload.prometheus.enabled | bool | `false` | |
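
Alternatively, a sketch enabling the bundled Prometheus deployment with longer retention and persistent storage; these keys are passed through to the Prometheus subchart:

```yaml
prometheus:
  enabled: true
  server:
    retention: 30d
    persistentVolume:
      enabled: true
    global:
      scrape_interval: 5s
      evaluation_interval: 5s
```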

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| grafana.enabled | bool | `false` | Enable Grafana (see the example below) |
| grafana.adminUser | string | `"admin"` | |
| grafana.adminPassword | string | `"admin"` | |
| grafana.persistence.enabled | bool | `false` | |
| grafana.rbac.create | bool | `false` | |
| grafana.serviceAccount.create | bool | `false` | |
| grafana.datasources | object | `{"datasources.yaml":{"apiVersion":1,"datasources":[{"access":"proxy","isDefault":true,"jsonData":{"timeInterval":"5s","tlsSkipVerify":true},"name":"prometheus","type":"prometheus","url":"http://supersonic-prometheus-server:9090"}]}}` | Grafana datasources configuration |
| grafana.dashboardProviders."dashboardproviders.yaml".apiVersion | int | `1` | |
| grafana.dashboardProviders."dashboardproviders.yaml".providers[0].name | string | `"default"` | |
| grafana.dashboardProviders."dashboardproviders.yaml".providers[0].orgId | int | `1` | |
| grafana.dashboardProviders."dashboardproviders.yaml".providers[0].folder | string | `""` | |
| grafana.dashboardProviders."dashboardproviders.yaml".providers[0].type | string | `"file"` | |
| grafana.dashboardProviders."dashboardproviders.yaml".providers[0].disableDeletion | bool | `false` | |
| grafana.dashboardProviders."dashboardproviders.yaml".providers[0].editable | bool | `true` | |
| grafana.dashboardProviders."dashboardproviders.yaml".providers[0].options.path | string | `"/var/lib/grafana/dashboards/default"` | |
| grafana.dashboardsConfigMaps.default | string | `"supersonic-grafana-default-dashboard"` | |
| grafana."grafana.ini".auth.disable_login_form | bool | `true` | |
| grafana."grafana.ini"."auth.anonymous".enabled | bool | `true` | |
| grafana."grafana.ini"."auth.anonymous".org_role | string | `"Admin"` | |
| grafana."grafana.ini".dashboards.default_home_dashboard_path | string | `"/var/lib/grafana/dashboards/default/default.json"` | |
| grafana."grafana.ini".server.root_url | string | `""` | |
| grafana.resources.limits.cpu | int | `1` | |
| grafana.resources.limits.memory | string | `"1Gi"` | |
| grafana.resources.requests.cpu | string | `"100m"` | |
| grafana.resources.requests.memory | string | `"128Mi"` | |
| grafana.service.type | string | `"ClusterIP"` | |
| grafana.service.port | int | `80` | |
| grafana.service.targetPort | int | `3000` | |
| grafana.ingress | object | `{"annotations":{},"enabled":false,"hosts":[],"ingressClassName":"","path":"/","pathType":"ImplementationSpecific","tls":[{"hosts":[]}]}` | Ingress configuration for Grafana |
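
Finally, a sketch enabling the bundled Grafana behind an ingress; the host name and ingress class are placeholders, and the default `admin`/`admin` credentials should be overridden in production:

```yaml
grafana:
  enabled: true
  adminUser: admin
  adminPassword: change-me    # placeholder; do not keep the chart default
  ingress:
    enabled: true
    ingressClassName: nginx   # placeholder
    hosts:
      - grafana.example.com   # placeholder
```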