API Reference¶

Packages¶

inference.networking.x-k8s.io/v1alpha2

inference.networking.x-k8s.io/v1alpha2¶

Package v1alpha2 contains API Schema definitions for the inference.networking.x-k8s.io API group.

Resource Types¶

InferenceModelRewrite
InferenceObjective

Group¶

Underlying type: string

Group refers to a Kubernetes Group. It must either be an empty string or an RFC 1123 subdomain.

This validation is based on the corresponding Kubernetes validation: https://github.com/kubernetes/apimachinery/blob/02cfb53916346d085a6c6c7c66f882e3c6b0eca6/pkg/util/validation/validation.go#L208

Valid values include:

"" - empty string implies core Kubernetes API group
"gateway.networking.k8s.io"
"foo.example.com"

Invalid values include:

"example.com/bar" - "/" is an invalid character

Validation: - MaxLength: 253 - Pattern: ^$|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$

Appears in: - PoolObjectReference
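
The Group validation rules above can be checked locally before a manifest is submitted. The following is a minimal sketch, not the API server's implementation: the pattern and length limit are copied from the Validation line, while the helper name `is_valid_group` is hypothetical.

```python
import re

# Pattern and length limit copied from the Validation line above.
GROUP_PATTERN = re.compile(
    r"^$|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$"
)
GROUP_MAX_LENGTH = 253


def is_valid_group(group: str) -> bool:
    """Hypothetical helper: True if `group` passes the length and pattern checks."""
    # fullmatch is used because Python's `$` would otherwise also accept
    # a string that ends with a trailing newline.
    return (
        len(group) <= GROUP_MAX_LENGTH
        and GROUP_PATTERN.fullmatch(group) is not None
    )
```

Calling `is_valid_group("example.com/bar")` returns `False` because "/" is not in the allowed character set, while the empty string is accepted and stands for the core API group.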

InferenceModelRewrite¶

InferenceModelRewrite is the Schema for the InferenceModelRewrite API.

Field	Description	Default	Validation
`apiVersion` string	`inference.networking.x-k8s.io/v1alpha2`
`kind` string	`InferenceModelRewrite`
`metadata` ObjectMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`spec` InferenceModelRewriteSpec
`status` InferenceModelRewriteStatus

InferenceModelRewriteRule¶

InferenceModelRewriteRule defines the match criteria and corresponding action. For details on how precedence is determined across multiple rules and InferenceModelRewrite resources, see the "Precedence and Conflict Resolution" section in InferenceModelRewriteSpec.

Appears in: - InferenceModelRewriteSpec

Field	Description	Default	Validation
`matches` Match array
`targets` TargetModel array			MinItems: 1

InferenceModelRewriteSpec¶

InferenceModelRewriteSpec defines the desired state of InferenceModelRewrite.

Appears in: - InferenceModelRewrite

Field	Description	Default	Validation
`poolRef` PoolObjectReference	PoolRef is a reference to the inference pool.		Required: {}
`rules` InferenceModelRewriteRule array

InferenceModelRewriteStatus¶

InferenceModelRewriteStatus defines the observed state of InferenceModelRewrite.

Appears in: - InferenceModelRewrite

Field	Description	Default	Validation
`conditions` Condition array	Conditions track the state of the InferenceModelRewrite. Known condition types are: * "Accepted"	[map[lastTransitionTime:1970-01-01T00:00:00Z message:Waiting for controller reason:Pending status:Unknown type:Accepted]]	MaxItems: 8

InferenceObjective¶

InferenceObjective is the Schema for the InferenceObjectives API.

Field	Description	Default	Validation
`apiVersion` string	`inference.networking.x-k8s.io/v1alpha2`
`kind` string	`InferenceObjective`
`metadata` ObjectMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`spec` InferenceObjectiveSpec
`status` InferenceObjectiveStatus

InferenceObjectiveSpec¶

InferenceObjectiveSpec represents the desired state of a specific model use case. This resource is managed by the "Inference Workload Owner" persona.

The Inference Workload Owner persona is someone who trains, verifies, and leverages a large language model from a model frontend, drives the lifecycle and rollout of new versions of those models, and defines the specific performance and latency goals for the model. These workloads are expected to operate within an InferencePool, sharing compute capacity with other InferenceObjectives, as defined by the Inference Platform Admin.

Appears in: - InferenceObjective

Field	Description	Default	Validation
`priority` integer	Priority defines how important it is to serve the request compared to other requests in the same pool. The higher the value, the more critical the request; negative values are allowed. No default value is set for this field, allowing for future additions of new fields that may be mutually exclusive ('one of') with this field. However, implementations that consume this field (such as the Endpoint Picker) will treat an unset value as '0'. Priority is used in flow control, primarily in the event of resource scarcity (when requests need to be queued). All requests will be queued, and flow control will always allow requests of higher priority to be served first. Fairness is only enforced and tracked between requests of the same priority. Example: requests with a Priority of 10 will always be served before requests with a Priority of 0 (the value used if Priority is unset or no InferenceObjective is specified). Similarly, requests with a Priority of -10 will always be served after requests with a Priority of 0.
`poolRef` PoolObjectReference	PoolRef is a reference to the inference pool; the pool must exist in the same namespace.		Required: {}
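
The ordering semantics of `priority` can be illustrated with a small sketch. This is not the Endpoint Picker's flow-control algorithm; it only shows that higher values are served first and that an unset value behaves like 0. The function names and request IDs are hypothetical, and arrival order within one priority level is an assumption standing in for the fairness handling that a real implementation performs.

```python
from typing import Optional


def effective_priority(priority: Optional[int]) -> int:
    """Treat an unset priority (None) as 0, as consumers of the field do."""
    return 0 if priority is None else priority


def serve_order(queued: list[tuple[str, Optional[int]]]) -> list[str]:
    """Return request IDs ordered from first served to last served.

    Python's sort is stable, even with reverse=True, so requests with
    equal priority keep their arrival order (an assumption made here
    for illustration only).
    """
    ranked = sorted(
        queued,
        key=lambda item: effective_priority(item[1]),
        reverse=True,
    )
    return [request_id for request_id, _ in ranked]
```

For example, a queue holding priorities -10, unset, 10, and 0 (in that arrival order) is served as 10 first, then the unset and 0 requests, then -10.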

InferenceObjectiveStatus¶

InferenceObjectiveStatus defines the observed state of InferenceObjective.

Appears in: - InferenceObjective

Field	Description	Default	Validation
`conditions` Condition array	Conditions track the state of the InferenceObjective. Known condition types are: * "Accepted"	[map[lastTransitionTime:1970-01-01T00:00:00Z message:Waiting for controller reason:Pending status:Unknown type:Ready]]	MaxItems: 8

Kind¶

Underlying type: string

Kind refers to a Kubernetes Kind.

Valid values include:

"Service"
"HTTPRoute"

Invalid values include:

"invalid/kind" - "/" is an invalid character

Validation: - MaxLength: 63 - MinLength: 1 - Pattern: ^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$

Appears in: - PoolObjectReference

Match¶

Match defines the criteria for matching the LLM requests.

Appears in: - InferenceModelRewriteRule

Field	Description	Default	Validation
`model` ModelMatch	Model specifies the criteria for matching the 'model' field within the JSON request body.

MatchValidationType¶

Underlying type: string

MatchValidationType specifies the type of string matching to use.

Validation: - Enum: [Exact]

Appears in: - ModelMatch

Field	Description
`Exact`	MatchExact indicates that the model name must match exactly.

ModelMatch¶

ModelMatch defines how to match against the model name in the request body.

Appears in: - Match

Field	Description	Default	Validation
`type` MatchValidationType	Type specifies the kind of string matching to use. Supported value is "Exact". Defaults to "Exact".	Exact	Enum: [Exact]
`value` string	Value is the model name string to match against.		MinLength: 1
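
As a rough illustration of how a ModelMatch could be evaluated against a request, the sketch below compares the `model` field of a JSON body with `value`. It is not the extension's implementation: the function name `model_matches` and the sample model names are hypothetical, and treating "Exact" as a case-sensitive string comparison is an assumption.

```python
import json


def model_matches(request_body: str, match: dict) -> bool:
    """Hypothetical helper: apply a ModelMatch to a JSON request body."""
    # "Exact" is the documented default and the only supported type.
    match_type = match.get("type", "Exact")
    if match_type != "Exact":
        raise ValueError(f"unsupported match type: {match_type!r}")
    body = json.loads(request_body)
    # Assumption: "Exact" means a case-sensitive string comparison.
    return isinstance(body, dict) and body.get("model") == match["value"]
```

With the body `{"model": "llama-3", "prompt": "hi"}`, a match of `{"value": "llama-3"}` succeeds, while `{"value": "Llama-3"}` does not under the case-sensitivity assumption.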

ObjectName¶

Underlying type: string

ObjectName refers to the name of a Kubernetes object. Object names can have a variety of forms, including RFC 1123 subdomains, RFC 1123 labels, or RFC 1035 labels.

Validation: - MaxLength: 253 - MinLength: 1

Appears in: - PoolObjectReference

PoolObjectReference¶

PoolObjectReference identifies an API object within the namespace of the referrer.

Appears in: - InferenceModelRewriteSpec - InferenceObjectiveSpec

Field	Description	Default	Validation
`group` Group	Group is the group of the referent.	inference.networking.k8s.io	MaxLength: 253 Pattern: `^$|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`
`kind` Kind	Kind is the kind of the referent. For example "InferencePool".	InferencePool	MaxLength: 63 MinLength: 1 Pattern: `^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$`
`name` ObjectName	Name is the name of the referent.		MaxLength: 253 MinLength: 1 Required: {}
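
The Default and Required columns above imply that a reference only needs a `name`; `group` and `kind` fall back to their defaults. The sketch below mimics that defaulting on a plain dictionary. The helper `with_defaults` and the pool name used in the usage note are hypothetical, and the group and kind pattern checks are deliberately left out to keep the example short.

```python
DEFAULT_GROUP = "inference.networking.k8s.io"
DEFAULT_KIND = "InferencePool"


def with_defaults(pool_ref: dict) -> dict:
    """Hypothetical helper mirroring the Default and Required columns."""
    name = pool_ref.get("name", "")
    if not 1 <= len(name) <= 253:
        raise ValueError("name is required and must be 1 to 253 characters long")
    return {
        # An explicitly empty group is preserved; only a missing key is defaulted.
        "group": pool_ref.get("group", DEFAULT_GROUP),
        "kind": pool_ref.get("kind", DEFAULT_KIND),
        "name": name,
    }
```

Passing `{"name": "my-pool"}` yields a reference to an `InferencePool` named `my-pool` in the `inference.networking.k8s.io` group, while an input without `name` raises `ValueError`.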

TargetModel¶

TargetModel defines a weighted model destination for traffic distribution.

Appears in: - InferenceModelRewriteRule

Field	Description	Default	Validation
`weight` integer	Weight determines the proportion of requests forwarded to this model when multiple target models are specified. The proportion is computed as weight/(sum of all weights in this TargetModels list). For non-zero values, the actual proportion may differ from the exact value by some epsilon, depending on the precision an implementation supports. Weight is not a percentage, and the sum of weights does not need to equal 100. If a weight is set for any targetModel, it must be set for all targetModels. Conversely, weights may be omitted as long as no targetModel specifies a weight.		Maximum: 1e+06 Minimum: 1
`modelRewrite` string
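
The weight rules for TargetModel can be worked through with a short sketch that computes weight/(sum of all weights) as exact fractions. The function name `traffic_shares` and the model names are hypothetical. Two points are assumptions rather than documented behavior: an equal split when no target sets a weight, and unique `modelRewrite` values within one rule.

```python
from fractions import Fraction
from typing import Optional

MIN_WEIGHT = 1
MAX_WEIGHT = 1_000_000  # "Maximum: 1e+06" in the table above


def traffic_shares(targets: list[tuple[str, Optional[int]]]) -> dict[str, Fraction]:
    """Map each (modelRewrite, weight) pair to its share of requests."""
    if not targets:
        raise ValueError("at least one target is required (MinItems: 1)")
    weights = [weight for _, weight in targets]
    if all(weight is None for weight in weights):
        # Assumption: with no weights at all, traffic is split equally.
        return {name: Fraction(1, len(targets)) for name, _ in targets}
    if any(weight is None for weight in weights):
        raise ValueError("weight must be set on all targets or on none")
    for weight in weights:
        if not MIN_WEIGHT <= weight <= MAX_WEIGHT:
            raise ValueError(f"weight {weight} is outside [{MIN_WEIGHT}, {MAX_WEIGHT}]")
    total = sum(weights)
    return {name: Fraction(weight, total) for name, weight in targets}
```

Weights of 1 and 3 give shares of 1/4 and 3/4, which shows that the weights do not need to sum to 100. A list that sets a weight on only some targets is rejected.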