Judge implementation that handles evaluation functionality and conversation management.

According to the AIEval spec, judges are AI Configs with mode: "judge" that evaluate other AI Configs using structured output.

Hierarchy

  • Judge

Constructors

Properties

_aiConfig: LDAIJudgeConfig
_logger?: LDLogger
_runner: Runner
_sampleRate: number = 1.0

Accessors

  • get sampleRate(): number
  • The default sampling rate set at construction. evaluate and evaluateMessages use it when no per-call rate is supplied.

    Returns number

Methods

  • Builds the evaluation input string passed to the runner.

    Combines the original prompt and the response into a single, well-known format the judge model is expected to evaluate.

    Parameters

    • input: string
    • output: string

    Returns string

  • Gets the evaluation metric key from the judge AI config. Treats empty strings and whitespace-only strings as invalid.

    Returns undefined | string

    The evaluation metric key, or undefined if not available
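The validation rule above can be sketched as follows. This is a minimal illustration, not the SDK's code: the `JudgeConfigLike` interface, its `evaluationMetricKey` field, and the function name are hypothetical stand-ins, and returning the key untrimmed is an assumption.

```typescript
// Hypothetical stand-in for the judge config; the real LDAIJudgeConfig
// field name and shape may differ.
interface JudgeConfigLike {
  evaluationMetricKey?: string;
}

// Returns the metric key, or undefined when it is missing, empty,
// or contains only whitespace.
function getEvaluationMetricKey(config: JudgeConfigLike): string | undefined {
  const key = config.evaluationMetricKey;
  if (typeof key !== 'string' || key.trim().length === 0) {
    return undefined;
  }
  // Assumption: the key is returned as supplied, without trimming.
  return key;
}
```

Checking `trim().length` rather than truthiness is what rejects whitespace-only strings such as `"   "`.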

  • Parses the structured evaluation response. Expects top-level {score, reasoning}. Returns score and reasoning, or undefined if parsing fails.

    Parameters

    • data: undefined | Record<string, unknown>

    Returns undefined | {
        reasoning: string;
        score: number;
    }
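A parser with this signature might look roughly like the sketch below. The function name and the finite-number check are assumptions; the source only states that a top-level {score, reasoning} is expected and that undefined is returned when parsing fails.

```typescript
type EvaluationResult = { score: number; reasoning: string };

// Hypothetical name; mirrors the documented parameter and return types.
function parseEvaluationResponse(
  data: Record<string, unknown> | undefined,
): EvaluationResult | undefined {
  if (!data) {
    return undefined;
  }
  const { score, reasoning } = data;
  // Assumption: a usable score is a finite number (rejects NaN and Infinity).
  if (typeof score !== 'number' || !Number.isFinite(score)) {
    return undefined;
  }
  if (typeof reasoning !== 'string') {
    return undefined;
  }
  return { score, reasoning };
}
```

Returning undefined instead of throwing lets the caller treat a malformed model response as a failed evaluation rather than an exception.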

  • Evaluates an AI response using the judge's configuration.

    Parameters

    • input: string

      The input prompt or question that was provided to the AI

    • output: string

      The AI-generated response to be evaluated

    • Optional samplingRate: number

      Sampling rate (0-1) that determines whether the evaluation is processed. When omitted, the Judge's constructor-default rate is used. An explicit 0 overrides the default; only undefined falls through to it.

    Returns Promise<LDJudgeResult>

    Promise that resolves to evaluation results
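The fall-through rule for the sampling rate can be illustrated with the nullish coalescing operator. Both helper names below are hypothetical, and the comparison used to decide whether an evaluation runs is an assumption; the source only specifies that an explicit 0 overrides the default.

```typescript
// Picks the per-call rate when one is given, otherwise the constructor default.
// `??` falls through only on undefined or null, so an explicit 0 is preserved
// (using `||` here would wrongly replace 0 with the default).
function resolveSamplingRate(perCall: number | undefined, defaultRate: number): number {
  return perCall ?? defaultRate;
}

// Assumption: the evaluation runs when a uniform draw in [0, 1) is below the rate.
// The random source is injectable so the decision can be tested deterministically.
function shouldEvaluate(rate: number, random: () => number = Math.random): boolean {
  return random() < rate;
}
```

Under this sketch a rate of 0 never evaluates and a rate of 1 always does, because `Math.random()` returns values in the range [0, 1).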

  • Evaluates an AI response from chat messages and a runner result.

    Each message is rendered as <role>: <content> so the judge model can distinguish speakers in the message history. Messages are joined with a single newline.

    Parameters

    • messages: LDMessage[]

      Array of messages representing the conversation history

    • response: RunnerResult

      The runner result containing the AI-generated content to evaluate

    • Optional samplingRatio: number

      Sampling ratio (0-1). When omitted, the Judge's constructor-default rate is used.

    Returns Promise<LDJudgeResult>

    Promise that resolves to evaluation results
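The message rendering described for this method can be sketched as below. `MessageLike` is a simplified stand-in for LDMessage and `renderMessages` is a hypothetical name; only the `<role>: <content>` format and the single-newline join come from the description above.

```typescript
// Simplified stand-in for LDMessage; the real type may carry more fields.
interface MessageLike {
  role: string;
  content: string;
}

// Renders each message as "<role>: <content>" and joins them with one newline.
function renderMessages(messages: MessageLike[]): string {
  return messages.map((m) => `${m.role}: ${m.content}`).join('\n');
}
```

For a two-message history of a user greeting and an assistant reply, this yields two lines, one per speaker, with no trailing newline.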
