Force Azure OpenAI to respond with JSON data

Photo by Pixabay: https://www.pexels.com/photo/brown-wooden-mallet-near-brown-chicken-egg-40721/

On Aug 7, 2024, Microsoft announced that Azure OpenAI now supports JSON mode. This mode forces OpenAI to respond only with JSON data. You could get roughly the same result earlier through prompting, but the response didn’t always contain only valid JSON: there could be additional text before and after the JSON data. Now you can force OpenAI to stay strictly in JSON format.

Supported Models

JSON mode is currently supported in these models:

  • gpt-35-turbo (1106)
  • gpt-35-turbo (0125)
  • gpt-4 (1106-Preview)
  • gpt-4 (0125-Preview)

API support arrived in API version 2023-12-01-preview, so the feature itself is a bit older than the announcement.

Regular JSON queries

If we don’t force JSON mode and ask OpenAI the question “Which athlete won the most medals in paraolympics at 2020?”, we receive answers like these:

{
  "response": {
    "name": "Diede de Groot",
    "country": "Netherlands",
    "medals": {
      "gold": 2
    },
    "sport": "Wheelchair Tennis"
  }
}

That was the result of the first run, and it is pure JSON. But as the second answer shows, the response does not always contain only JSON.

At the Tokyo 2020 Paralympic Games, Chinese swimmer Zheng Tao was one of the top medalists. He won a total of five gold medals, all in swimming events. However, it is important to check the latest and specific records to determine if he was the athlete with the most medals, as there could be other athletes who also performed exceptionally. Please verify from official sources for the most accurate and up-to-date count.

Here is the JSON representation of Zheng Tao's medal count at the Tokyo 2020 Paralympic Games:

```json
{
  "athlete": "Zheng Tao",
  "country": "China",
  "sport": "Swimming",
  "medals": {
    "gold": 5,
    "silver": 0,
    "bronze": 0,
    "total": 5
  },
  "paralympics": "Tokyo 2020"
}
```

The answer format changed quite drastically, and so did the athlete’s name. I couldn’t find the real answer on the internet, but I think it could be related to swimming. Anyway, let’s see how to force JSON mode and how the answers change.
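To make the difference between the two answers concrete, here is a minimal sketch that checks whether a response is pure JSON. It uses only System.Text.Json; the helper name IsPureJsonObject and the sample strings are my own illustrations, not part of any SDK or of the real responses.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: returns true only when the whole string parses as one JSON object.
static bool IsPureJsonObject(string text)
{
    try
    {
        using JsonDocument doc = JsonDocument.Parse(text);
        return doc.RootElement.ValueKind == JsonValueKind.Object;
    }
    catch (JsonException)
    {
        // Text before or after the JSON makes the parser throw.
        return false;
    }
}

// Shortened, illustrative versions of the two answer styles shown above.
string pure = "{ \"response\": { \"name\": \"Diede de Groot\" } }";
string wrapped = "Here is the JSON representation of the medal count:\n{ \"athlete\": \"Zheng Tao\" }";

Console.WriteLine(IsPureJsonObject(pure));    // True
Console.WriteLine(IsPureJsonObject(wrapped)); // False
```

A check like this is one way to see why the second answer would break code that passes the response straight to a JSON parser.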

Force JSON Mode

Forcing JSON mode is done by setting the response format to ChatResponseFormat.JsonObject. The word JSON also has to appear in the prompt; here it is in the system prompt. If you set the response format but don’t mention JSON in the prompt, you will receive a bad request error from OpenAI.

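    // Namespaces used: Azure, Azure.AI.OpenAI, OpenAI.Chat.
    // GetEnvironmentVariable is assumed to be a helper available in this scope
    // (for example System.Environment.GetEnvironmentVariable).
    string endpoint = GetEnvironmentVariable("OpenAi_Endpoint");
    string key = GetEnvironmentVariable("OpenAi_ApiKey");

    AzureKeyCredential credential = new(key);
    AzureOpenAIClient azureClient = new(new Uri(endpoint), credential);
    ChatClient chatClient = azureClient.GetChatClient(GetEnvironmentVariable("OpenAi_ModelId"));

    var messages = new List<ChatMessage>
    {
        // The system prompt must mention JSON, otherwise the request is rejected.
        ChatMessage.CreateSystemMessage("You are a helpful assistant designed to output JSON."),
        ChatMessage.CreateUserMessage("Which athlete won the most medals in paraolympics at 2020?"),
    };

    // Turn on JSON mode. Newer releases of the OpenAI .NET library expose this
    // as ChatResponseFormat.CreateJsonObjectFormat().
    var options = new ChatCompletionOptions { ResponseFormat = ChatResponseFormat.JsonObject };

    var response = await chatClient.CompleteChatAsync(messages, options);

    // Log each content part of the completion.
    foreach (var part in response.Value.Content)
    {
        _logger.LogInformation(part.Text);
    }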

Now if we run this code three times, we will notice that every response is pure JSON, although the shape of the object can still vary between runs:

First run:
{
  "response": "The athlete who won the most medals at the Tokyo 2020 Paralympic Games was Jessica Long from the United States. She won a total of 6 medals in swimming, including 4 gold, 1 silver, and 1 bronze."
}

Second run:
{
  "response": {
    "most_medals_won_by_athlete_at_2020_Paralympics": "Jessica Long",
    "total_medals_won": 6,
    "medals_breakdown": {
      "gold": 2,
      "silver": 3,
      "bronze": 1
    },
    "sport": "Swimming",
    "country": "USA"
  }
}

Third run:
{
  "response": "The athlete who won the most medals at the 2020 Tokyo Paralympics was Jessica Long from the United States. She is a swimmer who has won a total of six medals at these Games, comprising 3 gold, 2 silver, and 1 bronze."
}

I didn’t give OpenAI any exact schema to return, but you can add that to the system prompt as well. For example, the prompt “You are a helpful assistant designed to output JSON. Return response in response: {result} schema.” will return the response in {"result":"…"} format.
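Once the prompt pins the answer to a result property, reading it back could look roughly like the sketch below. It assumes the response text is already available as a string; ReadResult and the sample payload are hypothetical names and data of mine. The property is checked defensively because JSON mode only promises valid JSON, not a particular schema.

```csharp
#nullable enable
using System;
using System.Text.Json;

// Hypothetical helper: returns the "result" string, or null when the
// property is missing or is not a string.
static string? ReadResult(string json)
{
    using JsonDocument doc = JsonDocument.Parse(json);
    JsonElement root = doc.RootElement;

    if (root.ValueKind == JsonValueKind.Object &&
        root.TryGetProperty("result", out JsonElement result) &&
        result.ValueKind == JsonValueKind.String)
    {
        return result.GetString();
    }

    return null;
}

// Made-up sample payload in the {"result":"..."} shape.
string sample = "{\"result\":\"example answer text\"}";
Console.WriteLine(ReadResult(sample) ?? "(no result property)");
```

Returning null instead of throwing keeps the caller in control of what to do when the model drifts from the requested shape, for example retrying the request.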

Summary

Forced JSON mode is a nice little trick to make sure that OpenAI always returns a valid JSON response without any preamble. I think we will see other options in the near future, and personally I am waiting for RAG-oriented options such as forced SQL or graph query responses.

If you want to learn more about this feature, check out the official Microsoft documentation.
