The SageMakerEndpointInput interface defines the input parameters for the SageMakerEndpoint class: the endpoint name, the options for the SageMaker client, the content handler, and optional keyword arguments for the model and the endpoint.

interface SageMakerEndpointInput {
    cache?: boolean | BaseCache<Generation[]>;
    callbackManager?: CallbackManager;
    callbacks?: Callbacks;
    clientOptions: SageMakerRuntimeClientConfig;
    concurrency?: number;
    contentHandler: SageMakerLLMContentHandler;
    endpointKwargs?: Record<string, unknown>;
    endpointName: string;
    maxConcurrency?: number;
    maxRetries?: number;
    metadata?: Record<string, unknown>;
    modelKwargs?: Record<string, unknown>;
    onFailedAttempt?: FailedAttemptHandler;
    streaming?: boolean;
    tags?: string[];
    verbose?: boolean;
}

Hierarchy

  • BaseLLMParams
    • SageMakerEndpointInput

Properties

cache?: boolean | BaseCache<Generation[]>
callbackManager?: CallbackManager

Deprecated: use callbacks instead.

callbacks?: Callbacks
clientOptions: SageMakerRuntimeClientConfig

Options passed to the SageMaker client.

concurrency?: number

Deprecated: use maxConcurrency instead.

contentHandler: SageMakerLLMContentHandler

The content handler class that provides input and output transform functions to convert formats between the LLM and the endpoint.
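As an illustration of the input and output transforms described above, here is a minimal sketch. The ContentHandlerSketch interface is a hypothetical stand-in defined locally so the example is self-contained; the actual SageMakerLLMContentHandler type may differ. The JSON payload shape (inputs, parameters, generated_text) is also an assumption about one particular model and should be adapted to the deployed endpoint.

```typescript
// Hypothetical stand-in for the handler shape; not the library's own type.
interface ContentHandlerSketch {
  contentType: string; // MIME type of the request body sent to the endpoint
  accepts: string; // MIME type expected in the endpoint response
  transformInput(
    prompt: string,
    modelKwargs: Record<string, unknown>,
  ): Promise<Uint8Array>;
  transformOutput(output: Uint8Array): Promise<string>;
}

// Assumes the endpoint accepts {"inputs": ..., "parameters": ...} and
// returns [{"generated_text": ...}]. Adjust to the model's real format.
const jsonHandler: ContentHandlerSketch = {
  contentType: "application/json",
  accepts: "application/json",

  // LLM prompt -> request bytes for the endpoint
  async transformInput(prompt, modelKwargs) {
    const body = JSON.stringify({ inputs: prompt, parameters: modelKwargs });
    return new TextEncoder().encode(body);
  },

  // Endpoint response bytes -> generated text for the LLM
  async transformOutput(output) {
    const parsed = JSON.parse(new TextDecoder().decode(output));
    return parsed[0].generated_text;
  },
};
```

The two transforms are the only model-specific part of the setup, so swapping models usually means changing only this object.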

endpointKwargs?: Record<string, unknown>

Optional attributes passed to the InvokeEndpointCommand.

endpointName: string

The name of the endpoint from the deployed SageMaker model. Must be unique within an AWS Region.

maxConcurrency?: number

The maximum number of concurrent calls that can be made. Defaults to Infinity, which means no limit.
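To make the effect of a concurrency cap concrete, the following sketch runs a list of async tasks with at most maxConcurrency in flight at once. The runWithLimit helper is hypothetical and is not how the library implements the option; it only illustrates the behavior, including the Infinity default meaning no limit.

```typescript
// Hypothetical helper: runs tasks with at most `maxConcurrency` in flight.
// Results are returned in the same order as the input tasks.
async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  maxConcurrency: number = Infinity,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0; // index of the next task to start

  // Each worker pulls the next unstarted task until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }

  // With Infinity, every task gets its own worker, i.e. no limit.
  const workerCount = Math.min(maxConcurrency, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```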

maxRetries?: number

The maximum number of retries that can be made for a single call, with an exponential backoff between each attempt. Defaults to 6.
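The retry behavior can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: retryWithBackoff and baseDelayMs are hypothetical names, and the doubling schedule starting at one second is an assumed policy. Only the default of 6 retries comes from the description above.

```typescript
// Sleep helper used between attempts.
const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls `fn` once, then retries up to `maxRetries` more times on failure,
// doubling the wait after each failed attempt.
// `baseDelayMs` is a hypothetical parameter, not part of SageMakerEndpointInput.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 6,
  baseDelayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Wait only if another attempt will follow.
      if (attempt < maxRetries) {
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  // All attempts failed: surface the last error to the caller.
  throw lastError;
}
```

With maxRetries set to 6, a call that keeps failing is attempted seven times in total: the initial call plus six retries.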

metadata?: Record<string, unknown>
modelKwargs?: Record<string, unknown>

Keyword arguments to pass to the model.

onFailedAttempt?: FailedAttemptHandler

Custom handler to handle failed attempts. Takes the originally thrown error object as input, and should itself throw an error if the input error is not retryable.
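A handler following this contract might look like the sketch below. The rule it applies (treat HTTP 4xx statuses other than 429 as non-retryable) and the status field on the error object are assumptions made for illustration; real errors from the SageMaker client may carry this information differently.

```typescript
// Hypothetical handler: rethrows errors that should not be retried and
// returns normally for everything else, which lets the retry proceed.
function onFailedAttempt(error: unknown): void {
  // Assumed error shape: an optional numeric `status` field.
  const status = (error as { status?: number } | null)?.status;

  const isClientError =
    status !== undefined && status >= 400 && status < 500 && status !== 429;

  if (isClientError) {
    // Not retryable: throwing stops further attempts.
    throw error;
  }
  // Retryable (throttling, server errors, unknown): fall through.
}
```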

streaming?: boolean
tags?: string[]
verbose?: boolean