
Arrays and primitive types are not supported in structured output (native or otherwise) #5521

Open
kzu opened this issue Oct 15, 2024 · 3 comments · May be fixed by #5522

Comments

@kzu
Contributor

kzu commented Oct 15, 2024

If you need an array (or list) of typed values from completion rather than a single object, things fail whether you're using native json schema support or not. These two tests fail, for example:

    [ConditionalFact]
    public virtual async Task CompleteAsync_StructuredOutputArray()
    {
        SkipIfNotEnabled();

        var response = await _chatClient.CompleteAsync<Person[]>("""
            Who are described in the following sentence?
            Jimbo Smith is a 35-year-old software developer from Cardiff, Wales.
            Josh Simpson is a 25-year-old software developer from Newport, Wales.
            """,
            useNativeJsonSchema: _chatClient.Metadata.ProviderName == "openai");

        Assert.Equal(2, response.Result.Length);
        Assert.Contains(response.Result, x => x.FullName == "Jimbo Smith");
        Assert.Contains(response.Result, x => x.FullName == "Josh Simpson");
    }

    [ConditionalFact]
    public virtual async Task CompleteAsync_StructuredOutputPrimitive()
    {
        SkipIfNotEnabled();

        var response = await _chatClient.CompleteAsync<int>("""
            As of today (October 2024), Jimbo Smith is a 35-year-old software developer from Cardiff, Wales.
            Which year was he born in?
            """,
            useNativeJsonSchema: _chatClient.Metadata.ProviderName == "openai");

        Assert.Equal(1989, response.Result);
    }

    [ConditionalFact]
    public virtual async Task CompleteAsync_StructuredOutputEnum()
    {
        SkipIfNotEnabled();

        var response = await _chatClient.CompleteAsync<Architecture>("""
            I'm using a Macbook Pro with an M2 chip. What architecture am I using?
            """,
            useNativeJsonSchema: _chatClient.Metadata.ProviderName == "openai");

        Assert.Equal(Architecture.Arm64, response.Result);
    }

    [ConditionalFact]
    public virtual async Task CompleteAsync_StructuredOutputString()
    {
        SkipIfNotEnabled();

        var response = await _chatClient.CompleteAsync<string>("""
            The software developer, Jimbo Smith, is a 35-year-old software developer from Cardiff, Wales.
            What's his full name?
            """,
            useNativeJsonSchema: _chatClient.Metadata.ProviderName == "openai");

        Assert.Equal("Jimbo Smith", response.Result);
    }

    [ConditionalFact]
    public virtual async Task CompleteAsync_StructuredOutputBool()
    {
        SkipIfNotEnabled();

        var response = await _chatClient.CompleteAsync<bool>("""
            The software developer, Jimbo Smith, is a 35-year-old software developer from Cardiff, Wales.
            Is he a medical doctor?
            """,
            useNativeJsonSchema: _chatClient.Metadata.ProviderName == "openai");

        Assert.False(response.Result);
    }

Except for the first test, the others don't even compile, due to a constraint that T must be a class. There is really no need for that constraint, and it seriously restricts the usefulness of the typed API, forcing the creation of intermediate types just to hold those values (e.g. a superfluous Payload&lt;T&gt;).

The native JSON schema support in OpenAI does require the top-level schema type to be an object (not an array or anything else). But this could be solved by having the API itself provide the wrapping object automatically, so that users fall more easily into the proverbial pit of success.
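For comparison, this is the kind of intermediate wrapper the current design forces callers to write by hand (a sketch; `Person`, `PersonListWrapper`, and their members are hypothetical, and the call assumes the `CompleteAsync<T>` API shape shown in the tests above):

```csharp
// Hypothetical hand-written wrapper, needed today only because the schema
// root must be an object and T is constrained to a class.
public record Person(string FullName, int AgeInYears);
public record PersonListWrapper(Person[] People);

// Usage sketch: request the wrapper type, then unwrap manually.
var response = await chatClient.CompleteAsync<PersonListWrapper>("""
    Who are described in the following sentence?
    Jimbo Smith is a 35-year-old software developer from Cardiff, Wales.
    Josh Simpson is a 25-year-old software developer from Newport, Wales.
    """);
Person[] people = response.Result.People;
```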

kzu added a commit to kzu/extensions that referenced this issue Oct 15, 2024
Structured outputs in OpenAI require the top-level schema to be an `object`. By detecting this situation and always wrapping in a `Payload<T>(T Data)` record, we significantly improve the developer experience by making the API transparent to that limitation (which might even be a temporary one?).

The approach works for both OpenAI and Azure Inference without native structured outputs. To signal a wrapped result to the `ChatCompletion<T>`, we use the `AdditionalProperties` dictionary with a non-standard `$wrapped` property, following the typical convention for JSON properties that are not intended for end-user consumption (like `$schema`).
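Based only on the commit description above, the wrapping might be sketched roughly like this (the record name and the `$wrapped` marker come from the message; the member name and unwrapping logic are assumptions):

```csharp
// Per the commit description: when T is not an object type, generate the
// request schema for a wrapper record instead, so the top-level schema is
// always an object.
internal sealed record Payload<T>(T Data);

// Response side (sketch): a non-standard "$wrapped" entry in
// AdditionalProperties signals that the deserialized Payload<T>.Data,
// rather than the raw result, should be surfaced via ChatCompletion<T>.
```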

Fixes dotnet#5521
kzu linked a pull request Oct 15, 2024 that will close this issue
@stephentoub
Member

cc: @SteveSandersonMS

@SteveSandersonMS
Member

SteveSandersonMS commented Oct 16, 2024

The native JSON schema support in OpenAI does require the top-level schema type to be an object (not an array or anything else). But this could be solved by having the API itself provide the wrapping object automatically, so that users fall more easily into the proverbial pit of success.

We could potentially do that, but it has other risks:

  1. The more this library takes responsibility for controlling the behavior, the more it becomes responsible for cases where it doesn't work or behaves inconsistently across LLMs.
  2. It's harder for developers to work out what's going on when looking at telemetry or debugging, since they don't just see their own object shapes, but rather something else provided by the library.

Based on this we've taken the decision (so far at least - it could change) to limit how much magic this library does around structured output. This means:

  • For native structured output, try to pass through the functionality given, not extend it
  • For non-native structured output, limit ourselves to an autogenerated prompt that just says "Output JSON in this schema: {schema}", without giving any further hints about how to write the output, because anything we write might have side effects on some LLMs. For example, even writing this instruction in English makes the LLM more likely to respond in English, even if the developer/user wants the response in another language.

As such it's up to the developer to augment this behavior further if they want. For scalar outputs, they should provide their own wrapping object. And if the non-native prompt isn't sufficient, they should add their own further prompting.
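Under the recommendation above, a caller wanting a scalar result would write something like this today (a sketch; `YearOfBirth` is a hypothetical user-defined record, and the call assumes the `CompleteAsync<T>` API shape from the tests in the issue):

```csharp
// Hypothetical wrapper for a scalar output, defined by the caller so the
// schema root is an object.
public record YearOfBirth(int Year);

var response = await chatClient.CompleteAsync<YearOfBirth>("""
    As of today (October 2024), Jimbo Smith is a 35-year-old software developer from Cardiff, Wales.
    Which year was he born in?
    """);
int year = response.Result.Year;
```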

It's a tricky area and we'll only know if the design is right once people start using this at scale in hundreds of different app scenarios. If we go on to collect evidence from further feedback that a particular change is warranted, we'll certainly be open to making that change. Hope that seems reasonable!

@kzu
Contributor Author

kzu commented Oct 16, 2024

Hm. This is one of the areas that starts to feel like old EF, where the LINQ you got was actually a VERY leaky abstraction that didn't hold in a ton of scenarios and required knowledge of what worked and what didn't. Here you see a T, but a large number of instantiations just won't work. That means lots of frustration and unnecessary wrappers for everyone. I view it as VERY low-hanging fruit, and something future OpenAI LLMs will quite likely solve natively too, since it just looks like an oversight (or a bug).

I don't see how automatically wrapping (AND passing that as the schema you tell the LLM to use, either natively or via your existing prompt) can produce vastly different results from the existing mechanism. My PR shows that you pass essentially the same schema (wrapped at the top level) in both cases, so the delta in behavior would be the same as it can be today.

Sure, this can end up in a package with a single extension method CompleteAsyncForAny<T>, say, so it's not as if it's a showstopper...
