API Reference¶
Packages¶
inference.networking.x-k8s.io/v1alpha2¶
Package v1alpha2 contains API Schema definitions for the inference.networking.x-k8s.io API group.
Resource Types¶
Group¶
Underlying type: string
Group refers to a Kubernetes Group. It must be either an empty string or an RFC 1123 subdomain.
This validation is based on the corresponding Kubernetes validation: https://github.com/kubernetes/apimachinery/blob/02cfb53916346d085a6c6c7c66f882e3c6b0eca6/pkg/util/validation/validation.go#L208
Valid values include:
- "" - empty string implies core Kubernetes API group
- "gateway.networking.k8s.io"
- "foo.example.com"
Invalid values include:
- "example.com/bar" - "/" is an invalid character
Validation:
- MaxLength: 253
- Pattern: ^$|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$
Appears in: - PoolObjectReference
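
The Group validation rules can be checked client-side before a manifest is submitted. The following is a minimal Python sketch, not the API server's implementation: it reuses the MaxLength and Pattern values listed above, and the names `GROUP_PATTERN`, `GROUP_MAX_LENGTH`, and `is_valid_group` are illustrative only.

```python
import re

# Pattern and length limit copied from the Validation list above.
GROUP_PATTERN = re.compile(
    r"^$|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$"
)
GROUP_MAX_LENGTH = 253


def is_valid_group(group: str) -> bool:
    """Return True if `group` satisfies the documented MaxLength and Pattern."""
    if len(group) > GROUP_MAX_LENGTH:
        return False
    # fullmatch requires the whole string to match, so a trailing newline fails.
    return GROUP_PATTERN.fullmatch(group) is not None


if __name__ == "__main__":
    for candidate in ["", "gateway.networking.k8s.io", "foo.example.com", "example.com/bar"]:
        print(repr(candidate), is_valid_group(candidate))
```

The empty string is accepted because the pattern explicitly allows it, which corresponds to the core Kubernetes API group.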
InferenceModelRewrite¶
InferenceModelRewrite is the Schema for the InferenceModelRewrite API.
| Field | Description | Default | Validation |
|---|---|---|---|
| apiVersion string | inference.networking.x-k8s.io/v1alpha2 | | |
| kind string | InferenceModelRewrite | | |
| metadata ObjectMeta | Refer to Kubernetes API documentation for fields of metadata. | | |
| spec InferenceModelRewriteSpec | | | |
| status InferenceModelRewriteStatus | | | |
InferenceModelRewriteRule¶
InferenceModelRewriteRule defines the match criteria and corresponding action. For details on how precedence is determined across multiple rules and InferenceModelRewrite resources, see the "Precedence and Conflict Resolution" section in InferenceModelRewriteSpec.
Appears in: - InferenceModelRewriteSpec
| Field | Description | Default | Validation |
|---|---|---|---|
| matches Match array | | | |
| targets TargetModel array | | | MinItems: 1 |
InferenceModelRewriteSpec¶
InferenceModelRewriteSpec defines the desired state of InferenceModelRewrite.
Appears in: - InferenceModelRewrite
| Field | Description | Default | Validation |
|---|---|---|---|
| poolRef PoolObjectReference | PoolRef is a reference to the inference pool. | | Required: {} |
| rules InferenceModelRewriteRule array | | | |
InferenceModelRewriteStatus¶
InferenceModelRewriteStatus defines the observed state of InferenceModelRewrite.
Appears in: - InferenceModelRewrite
| Field | Description | Default | Validation |
|---|---|---|---|
| conditions Condition array | Conditions track the state of the InferenceModelRewrite. Known condition types are: "Accepted". | [map[lastTransitionTime:1970-01-01T00:00:00Z message:Waiting for controller reason:Pending status:Unknown type:Accepted]] | MaxItems: 8 |
InferenceObjective¶
InferenceObjective is the Schema for the InferenceObjectives API.
| Field | Description | Default | Validation |
|---|---|---|---|
| apiVersion string | inference.networking.x-k8s.io/v1alpha2 | | |
| kind string | InferenceObjective | | |
| metadata ObjectMeta | Refer to Kubernetes API documentation for fields of metadata. | | |
| spec InferenceObjectiveSpec | | | |
| status InferenceObjectiveStatus | | | |
InferenceObjectiveSpec¶
InferenceObjectiveSpec represents the desired state of a specific model use case. This resource is managed by the "Inference Workload Owner" persona.
The Inference Workload Owner persona is someone who trains, verifies, and leverages a large language model from a model frontend, drives the lifecycle and rollout of new versions of those models, and defines the specific performance and latency goals for the model. These workloads are expected to operate within an InferencePool, sharing compute capacity with other InferenceObjectives, as defined by the Inference Platform Admin.
Appears in: - InferenceObjective
| Field | Description | Default | Validation |
|---|---|---|---|
| priority integer | Priority defines how important it is to serve the request compared to other requests in the same pool. The higher the value, the more critical the request; negative values are allowed. No default value is set for this field, which leaves room for future fields that may form a 'one of' with it. However, implementations that consume this field (such as the Endpoint Picker) treat an unset value as 0. Priority is used in flow control, primarily when resources are scarce and requests need to be queued. All requests will be queued, and flow control will always allow requests of higher priority to be served first. Fairness is only enforced and tracked between requests of the same priority. Example: requests with a Priority of 10 will always be served before requests with a Priority of 0 (the value used if Priority is unset or no InferenceObjective is specified). Similarly, requests with a Priority of -10 will always be served after requests with a Priority of 0. | | |
| poolRef PoolObjectReference | PoolRef is a reference to the inference pool; the pool must exist in the same namespace. | | Required: {} |
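
The ordering that the priority description implies can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Endpoint Picker's flow-control code: `QueuedRequest`, `effective_priority`, and `serving_order` are hypothetical names, and keeping arrival order within one priority level is an assumption of this sketch (the description only says that fairness is tracked within a priority level).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueuedRequest:
    """Hypothetical stand-in for a queued inference request."""
    request_id: str
    priority: Optional[int] = None  # None models an unset Priority


def effective_priority(request: QueuedRequest) -> int:
    """Treat an unset Priority as 0, as the field description states."""
    return 0 if request.priority is None else request.priority


def serving_order(queue: List[QueuedRequest]) -> List[QueuedRequest]:
    """Order requests so that higher priorities are served first.

    sorted() is stable, so requests with equal priority keep their
    arrival order (an assumption of this sketch).
    """
    return sorted(queue, key=lambda request: -effective_priority(request))


if __name__ == "__main__":
    queue = [
        QueuedRequest("a"),                # unset, treated as 0
        QueuedRequest("b", priority=10),
        QueuedRequest("c", priority=-10),
        QueuedRequest("d", priority=0),
    ]
    print([request.request_id for request in serving_order(queue)])
```

With this queue, request `b` is served first and `c` last, while `a` and `d` share priority 0 and keep their relative order.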
InferenceObjectiveStatus¶
InferenceObjectiveStatus defines the observed state of InferenceObjective.
Appears in: - InferenceObjective
| Field | Description | Default | Validation |
|---|---|---|---|
| conditions Condition array | Conditions track the state of the InferenceObjective. Known condition types are: "Accepted". | [map[lastTransitionTime:1970-01-01T00:00:00Z message:Waiting for controller reason:Pending status:Unknown type:Ready]] | MaxItems: 8 |
Kind¶
Underlying type: string
Kind refers to a Kubernetes Kind.
Valid values include:
- "Service"
- "HTTPRoute"
Invalid values include:
- "invalid/kind" - "/" is an invalid character
Validation:
- MaxLength: 63
- MinLength: 1
- Pattern: ^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$
Appears in: - PoolObjectReference
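
Kind differs from Group in that it requires at least one character, allows uppercase letters, and has no dot-separated segments. A brief Python sketch of the documented constraints follows; `KIND_PATTERN` and `is_valid_kind` are illustrative names, and the check is a client-side approximation rather than the API server's own validation.

```python
import re

# Pattern and length limits copied from the Validation list above.
KIND_PATTERN = re.compile(r"^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$")


def is_valid_kind(kind: str) -> bool:
    """Return True if `kind` satisfies MinLength, MaxLength, and Pattern."""
    return 1 <= len(kind) <= 63 and KIND_PATTERN.fullmatch(kind) is not None
```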
Match¶
Match defines the criteria for matching the LLM requests.
Appears in: - InferenceModelRewriteRule
| Field | Description | Default | Validation |
|---|---|---|---|
| model ModelMatch | Model specifies the criteria for matching the 'model' field within the JSON request body. | | |
MatchValidationType¶
Underlying type: string
MatchValidationType specifies the type of string matching to use.
Validation: - Enum: [Exact]
Appears in: - ModelMatch
| Field | Description |
|---|---|
| Exact | MatchExact indicates that the model name must match exactly. |
ModelMatch¶
ModelMatch defines how to match against the model name in the request body.
Appears in: - Match
| Field | Description | Default | Validation |
|---|---|---|---|
| type MatchValidationType | Type specifies the kind of string matching to use. Supported value is "Exact". Defaults to "Exact". | Exact | Enum: [Exact] |
| value string | Value is the model name string to match against. | | MinLength: 1 |
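
To make the matching rule concrete, the sketch below applies a ModelMatch to a parsed JSON request body. It assumes the body is already decoded into a Python dict; the `ModelMatch` dataclass and the `model_matches` function are hypothetical helpers that mirror the documented fields, not part of any published client library.

```python
from dataclasses import dataclass


@dataclass
class ModelMatch:
    """Mirrors the ModelMatch fields documented above."""
    value: str
    type: str = "Exact"  # documented default and only supported value


def model_matches(match: ModelMatch, request_body: dict) -> bool:
    """Return True if the request body's 'model' field satisfies `match`."""
    if match.type != "Exact":
        raise ValueError(f"unsupported match type: {match.type!r}")
    # Exact matching: the model name must be identical, including case.
    return request_body.get("model") == match.value
```

A body without a `model` field never matches, because `dict.get` returns `None` and `value` must be at least one character long.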
ObjectName¶
Underlying type: string
ObjectName refers to the name of a Kubernetes object. Object names can have a variety of forms, including RFC 1123 subdomains, RFC 1123 labels, or RFC 1035 labels.
Validation: - MaxLength: 253 - MinLength: 1
Appears in: - PoolObjectReference
PoolObjectReference¶
PoolObjectReference identifies an API object within the namespace of the referrer.
Appears in: - InferenceModelRewriteSpec - InferenceObjectiveSpec
| Field | Description | Default | Validation |
|---|---|---|---|
| group Group | Group is the group of the referent. | inference.networking.k8s.io | MaxLength: 253 Pattern: ^$\|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$ |
| kind Kind | Kind is the kind of the referent. For example "InferencePool". | InferencePool | MaxLength: 63 MinLength: 1 Pattern: ^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$ |
| name ObjectName | Name is the name of the referent. | | MaxLength: 253 MinLength: 1 Required: {} |
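
Because only `name` is required, a reference that omits `group` and `kind` resolves to the defaults in the table above. The Python sketch below shows that resolution on a plain dict; `with_pool_ref_defaults` is a hypothetical helper, and in a real cluster the API server applies these defaults itself.

```python
# Defaults copied from the Default column above.
DEFAULT_GROUP = "inference.networking.k8s.io"
DEFAULT_KIND = "InferencePool"


def with_pool_ref_defaults(pool_ref: dict) -> dict:
    """Return a copy of `pool_ref` with the documented defaults filled in."""
    name = pool_ref.get("name", "")
    if not 1 <= len(name) <= 253:
        raise ValueError("poolRef.name is required and must be 1 to 253 characters")
    resolved = {"group": DEFAULT_GROUP, "kind": DEFAULT_KIND}
    # Explicit values win, including an explicit empty group (the core API group).
    resolved.update(pool_ref)
    return resolved


if __name__ == "__main__":
    print(with_pool_ref_defaults({"name": "my-pool"}))
```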
TargetModel¶
TargetModel defines a weighted model destination for traffic distribution.
Appears in: - InferenceModelRewriteRule
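
The description of the `weight` field defines each target's share of traffic as weight/(sum of all weights). A small Python sketch of that arithmetic follows. `traffic_shares` is an illustrative name, the range check reuses the documented Minimum and Maximum, and the sketch covers only the case where every target sets a weight; it makes no assumption about how traffic is split when all weights are omitted.

```python
from typing import List


def traffic_shares(weights: List[int]) -> List[float]:
    """Return each target's share of traffic as weight / sum(weights)."""
    if not weights:
        raise ValueError("at least one target is required")
    for weight in weights:
        if not 1 <= weight <= 1_000_000:
            raise ValueError(f"weight {weight} is outside the documented range 1..1000000")
    total = sum(weights)
    return [weight / total for weight in weights]


if __name__ == "__main__":
    # Weights are not percentages: 3 and 1 yield a 75% / 25% split.
    print(traffic_shares([3, 1]))
```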
| Field | Description | Default | Validation |
|---|---|---|---|
| weight integer | (The following comment is copied from the original targetModel.) Weight determines the proportion of requests forwarded to this model when multiple target models are specified. This is computed as weight/(sum of all weights in this TargetModels list). For non-zero values, there may be some epsilon from the exact proportion defined here depending on the precision an implementation supports. Weight is not a percentage and the sum of weights does not need to equal 100. If a weight is set for any targetModel, it must be set for all targetModels. Conversely, weights are optional, as long as none of the targetModels specify a weight. | | Maximum: 1000000 Minimum: 1 |
modelRewrite string |