DevOps: Leverage AI to Diagnose and Resolve CI/CD Pipeline Errors

Photo by Google DeepMind: https://www.pexels.com/photo/ai-graphic-design-25626520/

CI/CD pipelines are a critical part of modern software development, but diagnosing errors when they arise can be difficult and time-consuming. In this post, we’ll look at how to use Azure OpenAI to help explain and troubleshoot common CI/CD errors. By applying AI to understand error messages and identify root causes, you can make it easier to debug errors.

Prerequisites

Before we start, we need to set up a few things in Azure:

  • Create a new Consumption-plan Azure Function app (with Application Insights).
  • Create an Azure OpenAI service and a gpt-4 model deployment. Check this page for more info about that.
  • Ensure you have sufficient permissions to set up service hooks in Azure DevOps.

Architecture

Our architecture starts from an Azure DevOps build pipeline. The pipeline has an active service hook that notifies our Azure Function app when a build fails. The function app then sends the error message to Azure OpenAI and retrieves a possible solution for the error. Finally, the function app posts the solution to a Slack channel so that our SREs can see it.

Setup Service Hook

Let's start by setting up a new service hook. You can configure service hooks from the Azure DevOps project settings page. Use the Web Hooks action.

Set the service hook to trigger when a build completes and the build status is Failed.

Next, enter the function app URL including its token. You can copy this from the Azure Portal once the function app is deployed. For now you can use a dummy value, or the real app name if you already know it. Just remember that the URL must contain api/(functionname) and the token (the function key, passed as the code query parameter). You don’t need any other authentication method.

One last note about service hooks: if your function app returns an error code to Azure DevOps, the service hook can go into “enabled restricted” mode. This can lead to a situation where the service hook is not triggered properly. You can re-enable the hook by clicking the … button next to the consumer text on that service hook's row. The context menu has disable and enable options.

Azure Function

Now our service hook is set up, and it should call our function app when a build fails. At this point you can make a sample (test) build fail, run the pipeline, and verify on the service hook page that the hook actually fires. The service hook page has a handy “history” feature for viewing past requests the hook has made. From this view we can easily see what kind of payload the hook sends to our function.

For our function app we will use two NuGet packages: Azure.AI.OpenAI and Slack.Webhooks. We could also use the Microsoft.TeamFoundationServer.Client library to deserialize the payload from Azure DevOps, but its Build class has no detailedMessage property, so we could not easily access the build error message with it. For this reason I decided to write a simple class for deserializing the payload:

namespace ExplainBuildFailure
{
    public class BuildFailedEvent
    {
        public DetailedMessage detailedMessage { get; set; }
        public Resource resource { get; set; }
    }

    public class DetailedMessage
    {
        public string text { get; set; }
        public string html { get; set; }
        public string markdown { get; set; }
    }

    public class Resource
    {
        public string result { get; set; }
    }
}
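To check that these classes pick up the fields we care about, here is a minimal sketch that deserializes a heavily trimmed, hypothetical payload. The JSON below is an assumption made for illustration (a real service hook request carries many more fields), and the classes are repeated in compact form only to keep the snippet self-contained. Because the property names match the JSON keys exactly, the default case-sensitive matching of System.Text.Json works without extra options.

```csharp
using System;
using System.Text.Json;

// Hypothetical, trimmed payload; copy a real one from the service hook history view.
const string samplePayload = """
{
  "eventType": "build.complete",
  "detailedMessage": {
    "text": "Build 20240101.1 failed\r\n- Process 'msbuild.exe' exited with code '1'.\r\n"
  },
  "resource": { "result": "failed" }
}
""";

// Unknown JSON properties (such as eventType) are ignored by default.
var build = JsonSerializer.Deserialize<BuildFailedEvent>(samplePayload);

Console.WriteLine(build?.resource?.result);
Console.WriteLine(build?.detailedMessage?.text);

// Compact copies of the classes from the article.
public class BuildFailedEvent
{
    public DetailedMessage detailedMessage { get; set; }
    public Resource resource { get; set; }
}

public class DetailedMessage
{
    public string text { get; set; }
    public string html { get; set; }
    public string markdown { get; set; }
}

public class Resource
{
    public string result { get; set; }
}
```

If either value prints as empty for a real payload, compare the key names and nesting in the history view against the class properties.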

The actual function app implementation is rather simple. The only catch is that the error message starts with some extra text, which we need to strip.

using Azure;
using Azure.AI.OpenAI;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Logging;
using OpenAI.Chat;
using Slack.Webhooks;
using System;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

namespace ExplainBuildFailure
{
    public class ExplainBuildFailure
    {
        private readonly ILogger<ExplainBuildFailure> _logger;
        private readonly IConfiguration _configuration;

        public ExplainBuildFailure(ILogger<ExplainBuildFailure> logger, IConfiguration configuration)
        {
            _logger = logger;
            _configuration = configuration;
        }

        [Function("ExplainBuildFailure")]
        public async Task<IActionResult> Run([HttpTrigger(AuthorizationLevel.Function, "get", "post")] HttpRequest req)
        {
            try
            {
                var requestBody = await new StreamReader(req.Body).ReadToEndAsync();
                _logger.LogInformation($"Received request with data {requestBody}");

                var build = JsonSerializer.Deserialize<BuildFailedEvent>(requestBody);
                if (build is null)
                {
                    _logger.LogError("Could not deserialize body data as BuildFailedEvent.");
                    return new BadRequestObjectResult("Invalid build data");
                }

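                // Null-safe, case-insensitive check so a missing resource does not throw.
                if (!string.Equals(build.resource?.result, "failed", StringComparison.OrdinalIgnoreCase))
                {
                    _logger.LogError($"Invalid build result {build.resource?.result}. Call only for failed builds.");
                    return new BadRequestObjectResult("Call only for failed builds");
                }

                var errormessage = build.detailedMessage?.text ?? string.Empty;
                // Remove everything up to and including the first '-' character.
                // If there is no '-', IndexOf returns -1 and the whole text is kept.
                var index = errormessage.IndexOf('-');
                errormessage = errormessage[(index + 1)..].Trim();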

                string endpoint = GetEnvironmentVariable("OpenAi_Endpoint");
                string key = GetEnvironmentVariable("OpenAi_ApiKey");

                AzureKeyCredential credential = new(key);
                AzureOpenAIClient azureClient = new(new Uri(endpoint), credential);
                ChatClient chatClient = azureClient.GetChatClient(GetEnvironmentVariable("OpenAi_ModelId"));

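                // Await the async overload so the function does not block a thread on the network call.
                ChatCompletion completion = await chatClient.CompleteChatAsync(
                  [
                    new SystemChatMessage("You are an assistant that helps resolve issues in CI/CD pipelines built in Azure DevOps."),
                    new UserChatMessage(errormessage)
                  ],
                  new ChatCompletionOptions()
                  {
                      Temperature = 0.7f,
                      // Caps the reply length. Note: newer releases of the OpenAI package
                      // appear to have renamed this property to MaxOutputTokenCount.
                      MaxTokens = 800,
                      FrequencyPenalty = 0,
                      PresencePenalty = 0,
                  }
                );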

                // Post completion to Slack
                var slackClient = new SlackClient(GetEnvironmentVariable("Slack_Url"));
                await slackClient.PostAsync(new SlackMessage
                {
                    Channel = GetEnvironmentVariable("Slack_Channel"),
                    Text = completion.Content[0].Text
                });
                
                return new OkObjectResult("Success");
            }
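            catch (Exception ex)
            {
                // Log the full exception so the failure is visible in Application Insights.
                _logger.LogError(ex, "Failed to explain build failure.");
                return new BadRequestObjectResult(ex.Message);
            }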
        }

        private string GetEnvironmentVariable(string name)
        {
            return Environment.GetEnvironmentVariable(name, EnvironmentVariableTarget.Process) ?? string.Empty;
        }
    }
}
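The function above strips everything up to the first '-' character, which can cut in the wrong place if the leading summary text itself contains a hyphen (for example in a build name). As a possible alternative, here is a small sketch that keeps only the lines starting with "- ". It assumes the detailedMessage text is a summary line followed by one error per line, each prefixed with "- ", as in the sample payload shown later in this post. BuildErrorParser and ExtractErrors are hypothetical names, not part of any library.

```csharp
#nullable enable
using System;
using System.Linq;

// Made-up input that mimics the assumed layout: summary line, then "- " entries.
var text = "Build 20240101.1 failed\r\n- Error A\r\n- Error B\r\n- Error A\r\n";

foreach (var error in BuildErrorParser.ExtractErrors(text))
{
    Console.WriteLine(error); // prints "Error A", then "Error B"
}

public static class BuildErrorParser
{
    // Returns the distinct error entries, without the "- " prefix.
    public static string[] ExtractErrors(string? detailedText)
    {
        if (string.IsNullOrWhiteSpace(detailedText))
        {
            return Array.Empty<string>();
        }

        return detailedText
            .Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Trim())
            // Only a leading "- " marks an entry; hyphens inside paths are left alone.
            .Where(line => line.StartsWith("- ", StringComparison.Ordinal))
            .Select(line => line[2..].Trim())
            .Distinct()
            .ToArray();
    }
}
```

In the function you could then join the entries with string.Join(Environment.NewLine, ...) and send the result as the user message. Removing duplicates also keeps the prompt shorter when the same error repeats.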

Let's go through what we have here. First we deserialize the payload into our BuildFailedEvent class for easier use. We also have some guard clauses at the beginning of the function to protect our app from invalid API calls.

Then we extract the error message from the detailedMessage property and pass it to Azure OpenAI. This code sample uses the OpenAi_ApiKey, OpenAi_ModelId and OpenAi_Endpoint environment variables, where OpenAi_ModelId holds the name of your model deployment. For development you can set these in the Values section of the local.settings.json file. I don’t know why Functions doesn’t support secrets.json yet, but that’s how it is for now. Remember to also set these as Function App environment variables in the Azure Portal!
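For local development, a local.settings.json along these lines should work. Treat it as a sketch: every value in angle brackets is a placeholder you need to replace, the storage and worker runtime entries assume a default isolated-worker .NET project, and the Slack entries are used later in the function.

```json
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "FUNCTIONS_WORKER_RUNTIME": "dotnet-isolated",
    "OpenAi_Endpoint": "https://<your-resource-name>.openai.azure.com/",
    "OpenAi_ApiKey": "<your-api-key>",
    "OpenAi_ModelId": "<your-deployment-name>",
    "Slack_Url": "<your-incoming-webhook-url>",
    "Slack_Channel": "#<your-channel>"
  }
}
```

This file contains secrets, so keep it out of source control.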

After receiving a possible solution from Azure OpenAI, we post it to a Slack channel using the SlackClient class from the Slack.Webhooks package. For this you need to set the Slack_Url and Slack_Channel environment variables. Create a new Slack app with incoming webhooks enabled for this purpose. You can easily replace the Slack integration with a Teams integration or whatever you prefer. You can also return the completion.Content[0].Text value from the function and chain the function into a Logic App or something similar.

Next, publish the function app from Visual Studio (right-click the project), or create a build pipeline for it.

Remember to set the correct function app URL in the Azure DevOps service hook after you have deployed the app.

Testing That Everything Works

Now let's test that everything works. I attached the service hook to one of my .NET sample builds, which I know will fail. Running the pipeline shows that the build fails, and the service hook history shows what data is sent to the Azure Function.

The build failed with a somewhat obscure .NET error…

Our function picks up this error message from the service hook payload and passes it to Azure OpenAI:

Microsoft.PackageDependencyResolution.targets (266): C:\Program Files\dotnet\sdk\8.0.303\Sdks\Microsoft.NET.Sdk\targets\Microsoft.PackageDependencyResolution.targets(266,5): Error NETSDK1004: Assets file ‘D:\a\1\s\SampleWebApi\obj\project.assets.json’ not found. Run a NuGet package restore to generate this file.\r\n- Microsoft.PackageDependencyResolution.targets (266): C:\Program Files\dotnet\sdk\8.0.303\Sdks\Microsoft.NET.Sdk\targets\Microsoft.PackageDependencyResolution.targets(266,5): Error NETSDK1004: Assets file ‘D:\a\1\s\SampleTestProject\obj\project.assets.json’ not found. Run a NuGet package restore to generate this file.\r\n- Process ‘msbuild.exe’ exited with code ‘1’.\r\n

Azure OpenAI returned these possible solutions for the error:

For this test I simply tried to apply the suggested solutions without thinking too hard myself. First I added a dotnet restore step, as the AI suggested, and guess what: the build succeeded after that!

Debugging

If you encounter any problems with this sample code, here are some tips to help with debugging:

  • If you have problems with Azure DevOps or payload handling, check the Azure DevOps service hook page for more info. You can copy the payload from the history view and send it to the Azure Function with Postman (or Bruno). Run the function app in your local dev environment and post the data to your localhost URL.
  • If you receive Azure OpenAI errors, check that you have a deployment in Azure AI Studio for your Azure OpenAI account. Verify that the deployment name, URL and API key are correct in the environment variables.
  • Try to get the pieces working one by one. Verify that the service hook is working (for example, use webhook.site to debug it). Verify that the function app is working, and finally verify that Azure OpenAI is working correctly. Then glue everything together.
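If you would rather replay a saved payload from code than from Postman, a small console sketch like the following could do it. The file name payload.json and the PayloadReplayer helper are assumptions made for illustration. The URL assumes the default port used by Azure Functions Core Tools, where function keys are not enforced for local runs.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Text;

// Hypothetical defaults: a payload saved from the service hook history view,
// and the default local URL of the function.
var payloadPath = args.Length > 0 ? args[0] : "payload.json";
var functionUrl = args.Length > 1 ? args[1] : "http://localhost:7071/api/ExplainBuildFailure";

if (!File.Exists(payloadPath))
{
    Console.WriteLine($"Payload file '{payloadPath}' not found. Save one from the history view first.");
    return;
}

using var client = new HttpClient();
using var request = PayloadReplayer.BuildRequest(File.ReadAllText(payloadPath), functionUrl);
using var response = await client.SendAsync(request);

// Print the status code and body so guard-clause rejections are easy to spot.
Console.WriteLine($"{(int)response.StatusCode}: {await response.Content.ReadAsStringAsync()}");

public static class PayloadReplayer
{
    // Builds a JSON POST request similar to the one the service hook sends.
    public static HttpRequestMessage BuildRequest(string json, string url) =>
        new HttpRequestMessage(HttpMethod.Post, url)
        {
            Content = new StringContent(json, Encoding.UTF8, "application/json")
        };
}
```

A 400 response with "Call only for failed builds" means the request reached the function but was rejected by the guard clause.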

Summary

Bringing AI into our DevOps pipelines can help us solve problems faster. Pipeline problems are usually caused by simple things like missing permissions, wrong paths or missing pipeline steps. As my test case shows, you can get genuinely useful information from AI and use it to fix pipelines faster. Doing this automatically and posting the possible solutions to Slack or Teams gives DevOps teams this information without any extra effort.