Configuration Reference
Values
Key | Type | Default | Description |
---|---|---|---|
nameOverride | string | | Unique identifier of the SuperSONIC instance (equal to the release name by default) |
serverLoadMetric | string | | Metric used by both the KEDA autoscaler and Envoy’s Prometheus-based rate limiter. The default metric (inference queue latency) is defined in templates/_helpers.tpl |
serverLoadThreshold | int | | Threshold for the metric set in serverLoadMetric |
triton.replicas | int | | Number of Triton server instances (if autoscaling is disabled) |
triton.image | string | | Docker image for the Triton server |
triton.command | list | | Command and arguments to run in the Triton container |
triton.args[0] | string | | |
triton.resources | object | | Resource limits and requests for each Triton instance. GPU requests can be added here. |
triton.affinity | object | | Affinity rules for Triton pods; another way to request GPUs |
triton.modelRepository | object | | Model repository configuration |
triton.modelRepository.mountPath | string | | Model repository mount path |
triton.service.labels | object | | |
triton.service.annotations | object | | |
triton.service.ports | list | | Ports for communication with Triton servers |
triton.readinessProbe | object | | Custom readiness probe configuration |
triton.readinessProbe.reset | bool | | If true, resets the probe to Kubernetes defaults (other readinessProbe settings are ignored) |
triton.startupProbe | object | | Custom startup probe configuration |
triton.startupProbe.reset | bool | | If true, resets the probe to Kubernetes defaults (other startupProbe settings are ignored) |
envoy.enabled | bool | | Enable Envoy Proxy |
envoy.replicas | int | | Number of Envoy Proxy pods in the Deployment |
envoy.image | string | | Envoy Proxy Docker image |
envoy.args | list | | Arguments for Envoy |
envoy.resources | object | | Resource requests and limits for Envoy Proxy. Note: an Envoy Proxy with too many connections might run out of CPU. |
envoy.service.type | string | | Service type of the client-facing endpoint. To connect to it, either enable ingress or use type: LoadBalancer. |
envoy.service.ports | list | | Envoy Service ports |
envoy.ingress | object | | Ingress configuration for Envoy |
envoy.grpc_route_timeout | string | | Timeout for the gRPC route in Envoy. Disabled by default (0s), which prevents Envoy from closing connections too early. |
envoy.rate_limiter.listener_level | object | | This rate limiter explicitly controls the number of client connections to the Envoy Proxy. |
envoy.rate_limiter.listener_level.enabled | bool | | Enable the listener-level rate limiter |
envoy.rate_limiter.listener_level.max_tokens | int | | Maximum number of simultaneous connections to the Envoy Proxy. Each new connection takes a “token” from the “bucket”, which initially contains max_tokens tokens. |
envoy.rate_limiter.listener_level.tokens_per_fill | int | | |
envoy.rate_limiter.listener_level.fill_interval | string | | For example, adding a new token every 12 seconds allows 5 new connections every minute. |
envoy.rate_limiter.prometheus_based | object | | This rate limiter rejects new connections based on a metric extracted from Prometheus (e.g. inference queue latency). The metric is taken from the serverLoadMetric parameter. |
envoy.rate_limiter.prometheus_based.enabled | bool | | Enable the Prometheus-based rate limiter |
envoy.loadBalancerPolicy | string | | Envoy load balancer policy. Options: ROUND_ROBIN, LEAST_REQUEST, RING_HASH, RANDOM, MAGLEV |
envoy.auth.enabled | bool | | Enable authentication in Envoy Proxy |
envoy.auth.jwt_issuer | string | | |
envoy.auth.jwt_remote_jwks_uri | string | | |
envoy.auth.audiences | list | | |
envoy.auth.url | string | | |
envoy.auth.port | int | | |
autoscaler.enabled | bool | | Enable autoscaling (requires Prometheus to also be enabled). Autoscaling is based on the metric set in the serverLoadMetric parameter. |
autoscaler.minReplicaCount | int | | Minimum number of Triton servers. Warning: if the minimum is 0 and the desired Prometheus metric is empty, the first server will never start. |
autoscaler.maxReplicaCount | int | | Maximum number of Triton servers |
autoscaler.zeroIdleReplicas | bool | | If true, the server releases all GPUs when idle. Caution: if the scaling metric is extracted from the Triton servers themselves, it is unavailable at zero replicas, so scaling from 0 to 1 never happens. |
autoscaler.scaleUp.stabilizationWindowSeconds | int | | |
autoscaler.scaleUp.periodSeconds | int | | |
autoscaler.scaleUp.stepsize | int | | |
autoscaler.scaleDown.stabilizationWindowSeconds | int | | |
autoscaler.scaleDown.periodSeconds | int | | |
autoscaler.scaleDown.stepsize | int | | |
nodeSelector | object | | Node selector for all pods (Triton and Envoy) |
tolerations | list | | Tolerations for all pods (Triton and Envoy) |
prometheus.external.enabled | bool | | Use an external Prometheus instance. If true, Prometheus parameters outside of prometheus.external are ignored. |
prometheus.external.url | string | | External Prometheus server URL |
prometheus.external.port | int | | External Prometheus server port number |
prometheus.external.scheme | string | | Whether the external Prometheus endpoint is exposed over http or https |
prometheus.enabled | bool | | Enable or disable the custom Prometheus deployment |
prometheus.server.useExistingClusterRoleName | string | | |
prometheus.server.releaseNamespace | bool | | |
prometheus.server.persistentVolume.enabled | bool | | |
prometheus.server.resources.requests.cpu | string | | |
prometheus.server.resources.requests.memory | string | | |
prometheus.server.resources.limits.cpu | int | | |
prometheus.server.resources.limits.memory | string | | |
prometheus.server.retention | string | | |
prometheus.server.global.scrape_interval | string | | |
prometheus.server.global.evaluation_interval | string | | |
prometheus.server.service.enabled | bool | | |
prometheus.server.service.servicePort | int | | |
prometheus.server.configMapOverrideName | string | | |
prometheus.server.ingress | object | | Ingress configuration for Prometheus |
prometheus.serviceAccounts.server.create | bool | | |
prometheus.serviceAccounts.server.name | string | | |
prometheus.rbac.create | bool | | |
prometheus.alertmanager.enabled | bool | | |
prometheus.pushgateway.enabled | bool | | |
prometheus.kube-state-metrics.enabled | bool | | |
prometheus.prometheus-node-exporter.enabled | bool | | |
prometheus.prometheus-pushgateway.enabled | bool | | |
prometheus.configmapReload.prometheus.enabled | bool | | |
grafana.enabled | bool | | Enable Grafana |
grafana.adminUser | string | | |
grafana.adminPassword | string | | |
grafana.persistence.enabled | bool | | |
grafana.rbac.create | bool | | |
grafana.serviceAccount.create | bool | | |
grafana.datasources | object | | Grafana datasources configuration |
grafana.dashboardProviders."dashboardproviders.yaml".apiVersion | int | | |
grafana.dashboardProviders."dashboardproviders.yaml".providers[0].name | string | | |
grafana.dashboardProviders."dashboardproviders.yaml".providers[0].orgId | int | | |
grafana.dashboardProviders."dashboardproviders.yaml".providers[0].folder | string | | |
grafana.dashboardProviders."dashboardproviders.yaml".providers[0].type | string | | |
grafana.dashboardProviders."dashboardproviders.yaml".providers[0].disableDeletion | bool | | |
grafana.dashboardProviders."dashboardproviders.yaml".providers[0].editable | bool | | |
grafana.dashboardProviders."dashboardproviders.yaml".providers[0].options.path | string | | |
grafana.dashboardsConfigMaps.default | string | | |
grafana."grafana.ini".auth.disable_login_form | bool | | |
grafana."grafana.ini"."auth.anonymous".enabled | bool | | |
grafana."grafana.ini"."auth.anonymous".org_role | string | | |
grafana."grafana.ini".dashboards.default_home_dashboard_path | string | | |
grafana."grafana.ini".server.root_url | string | | |
grafana.resources.limits.cpu | int | | |
grafana.resources.limits.memory | string | | |
grafana.resources.requests.cpu | string | | |
grafana.resources.requests.memory | string | | |
grafana.service.type | string | | |
grafana.service.port | int | | |
grafana.service.targetPort | int | | |
grafana.ingress | object | | Ingress configuration for Grafana |
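
To show how the dotted keys in the table map onto a nested values file, here is a minimal sketch of an override file. The file name and every value in it are illustrative assumptions, not the chart's defaults. The `12s` duration format is assumed from the `0s` notation used for envoy.grpc_route_timeout, and `nvidia.com/gpu` assumes a cluster where the NVIDIA device plugin exposes that resource name.

```yaml
# values-override.yaml (hypothetical file name)
# All values are illustrative assumptions, not chart defaults.

triton:
  resources:
    limits:
      nvidia.com/gpu: 1      # assumed GPU resource name; depends on the cluster
    requests:
      nvidia.com/gpu: 1

envoy:
  enabled: true
  service:
    type: LoadBalancer       # makes the client-facing endpoint reachable without ingress
  rate_limiter:
    listener_level:
      enabled: true
      max_tokens: 5          # bucket capacity: at most 5 simultaneous connections
      tokens_per_fill: 1     # one token added per fill interval
      fill_interval: 12s     # 1 token / 12 s = 5 new connections per minute
    prometheus_based:
      enabled: true          # rejects connections based on serverLoadMetric

prometheus:
  enabled: true              # the autoscaler requires Prometheus

autoscaler:
  enabled: true
  minReplicaCount: 1         # non-zero, so the first server can always start
  maxReplicaCount: 4
```

A file like this would typically be passed to Helm with the `-f` flag, for example `helm upgrade --install <release> <chart> -f values-override.yaml`, where `<release>` and `<chart>` are placeholders.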