Part II: Microsoft Fabric GraphQL Performance

This blog post is the second part of my Fabric GraphQL testing series. You can find the first part HERE.

I have been using the Fabric SQL Analytics Endpoint to build solutions where Fabric data is used outside of Fabric. You can quite easily integrate with the endpoint and fetch data from a .NET app by using the MSAL authentication library (only Service Principal access is supported at the moment). However, SQL Endpoint performance hasn't been very consistent, and that can cause issues if you want to build customer-facing apps that integrate directly with Fabric.

Test Data for Benchmarking

I wanted to benchmark the Fabric GraphQL API to see how it performs. For the test I set up three Delta tables that try to mimic real-life use cases: the first holds 500,000 rows, the second 1 million rows and the third 10 million rows. Each table has random data in 8 columns: two integer columns, two boolean columns, two date columns and two string columns. I used the following script to generate the test data:

# Generate 500k rows of test data with 8 columns: two integers, two booleans, two dates and two strings.
# Runs in a Fabric notebook, where the `spark` session is predefined.
# The table is written in append mode, so every run adds another 500k-row batch to the target table.
import string
from datetime import datetime, timedelta
from random import randint, choice

from pyspark.sql import Row

def random_date(start, end):
    """Return a random datetime between start and end (one-second resolution)."""
    return start + timedelta(
        seconds=randint(0, int((end - start).total_seconds()))
    )

def random_string(length=10):
    """Return a random lowercase string of the given length."""
    return ''.join(choice(string.ascii_lowercase) for _ in range(length))

start_date = datetime.strptime('1/1/2008 1:30 PM', '%m/%d/%Y %I:%M %p')
end_date = datetime.strptime('1/1/2025 4:50 AM', '%m/%d/%Y %I:%M %p')

data = []
for _ in range(500000):
    # Positional Row fields get the default column names _1 ... _8
    data.append(Row(
        randint(0, 1000),
        randint(0, 1000),
        choice([True, False]),
        choice([True, False]),
        random_date(start_date, end_date),
        random_date(start_date, end_date),
        random_string(),
        random_string()
    ))

# The schema (column names and types) is inferred from the rows
df = spark.createDataFrame(data)
df.write.mode("append").option("mergeSchema", "true").saveAsTable("test_data_10M")

One thing to note is that the test data differs between tables: the first 500k rows of the 1M table have different values than the 500k-row table. For queries with a row limit, this can lead to slightly different timings, because one table might contain matching rows earlier than another. To address this, I also added a test with COUNT(*), so that Fabric has to go through all rows in the table (or use whatever mechanism it has for counting rows).

The test data looks like this:

The column names (_1 to _8) were generated automatically, since they make no difference to the performance benchmarks.

I wanted to use the SQL Endpoint as a reference point to see how much overhead the GraphQL API adds. For the reference run I used the query SELECT COUNT(*) FROM [test_data_500k] WHERE [_5] >= '2020-01-01T00:00:00Z' AND [_5] <= '2020-01-02T00:00:00Z'. For benchmarking I used a .NET 9.0 console application with the BenchmarkDotNet library. I don't download any actual data from Fabric during the test: I just run the query, and the benchmark iteration ends when the response arrives from the server.

These are the results of the SQL Endpoint run with 2 CUs.

Full list of details:
Mean = 145.186 ms, StdErr = 0.768 ms (0.53%), N = 26, StdDev = 3.916 ms
Min = 136.702 ms, Q1 = 142.807 ms, Median = 145.057 ms, Q3 = 147.288 ms, Max = 153.817 ms
IQR = 4.481 ms, LowerFence = 136.085 ms, UpperFence = 154.010 ms
ConfidenceInterval = [142.325 ms; 148.046 ms] (CI 99.9%), Margin = 2.861 ms (1.97% of Mean)
Skewness = 0.02, Kurtosis = 2.73, MValue = 2
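The summary statistics above are tied together by a few standard formulas. The Python sketch below recomputes the derived values from the reported Mean, StdDev, N, Q1 and Q3. The Student's t critical value is hard-coded from a t-table, which is my assumption about how the 99.9% interval is built; BenchmarkDotNet's internals may differ in detail, but these formulas reproduce the reported numbers to within rounding.

```python
import math

# Values copied from the BenchmarkDotNet summary above (SQL Endpoint, 2 CUs)
mean, stddev, n = 145.186, 3.916, 26
q1, q3 = 142.807, 147.288

# Standard error of the mean: StdDev / sqrt(N)
std_err = stddev / math.sqrt(n)

# Interquartile range and the fences 1.5 * IQR outside the quartiles
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# 99.9% confidence interval with Student's t for N - 1 = 25 degrees of freedom
# (two-sided); assumed critical value from a t-table
t_crit = 3.725
margin = t_crit * std_err
ci_low, ci_high = mean - margin, mean + margin

print(f"StdErr = {std_err:.3f} ms")
print(f"IQR = {iqr:.3f} ms, fences = [{lower_fence:.3f}; {upper_fence:.3f}] ms")
print(f"CI 99.9% = [{ci_low:.3f}; {ci_high:.3f}] ms, "
      f"Margin = {margin:.3f} ms ({margin / mean:.2%} of Mean)")
```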

The mean is around 145 ms, which is really good. This test was run on a Sunday, when there is generally less traffic in Fabric, but I don't think traffic affects query performance much. It does seem to affect Spark session startup times a lot, though.

GraphQL Performance

For GraphQL I wanted to run the benchmark against all three test tables. To compare against the SQL Analytics Endpoint, I did the first run with 2 CUs. This is the GraphQL query I used:

query {
  countAnswers(
    filter: {
      _5_gte: "2020-01-01T00:00:00Z",
      _6_lte: "2020-01-02T00:00:00Z"
    }
  )
}
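Over the wire, a GraphQL call is an HTTP POST whose JSON body carries the query text, with a bearer token in the Authorization header; the .NET console app later in this post sends the same shape. Here is a minimal standard-library Python sketch that only builds the request. The endpoint URL and token are hypothetical placeholders, and `build_graphql_request` is an illustrative helper, not part of any Fabric SDK.

```python
import json
import urllib.request

# Hypothetical placeholders: use your own GraphQL endpoint URL and an access
# token acquired for the https://api.fabric.microsoft.com/.default scope.
ENDPOINT = "https://example.invalid/graphql"
TOKEN = "<access-token>"

COUNT_QUERY = """
query {
  countAnswers(
    filter: {
      _5_gte: "2020-01-01T00:00:00Z",
      _6_lte: "2020-01-02T00:00:00Z"
    }
  )
}
"""

def build_graphql_request(endpoint, token, query, variables=None):
    """Build (but do not send) a GraphQL POST request with a JSON body."""
    payload = {"query": query}
    if variables is not None:
        payload["variables"] = variables
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_graphql_request(ENDPOINT, TOKEN, COUNT_QUERY)
# Sending it would be urllib.request.urlopen(req), which returns once the
# status line and headers arrive; the body is read separately with .read().
```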

And here are the results:

It is a bit confusing that the row count of the test tables does not affect query performance, but I didn't have time to dig deeper into how the Delta Lake format handles the count query. Compared to the SQL Endpoint, GraphQL seems to be 28.6 ms slower. I think a large part of the difference is caused by the .NET HTTP client library versus the SQL client. In other words, from a .NET app GraphQL is approximately 20% slower than the SQL Endpoint.
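The 20% figure follows from the two numbers in the text: the SQL Endpoint mean from the reference run and the 28.6 ms difference. The GraphQL mean below is derived from those two values, not read from the results chart.

```python
# Mean of the SQL Endpoint reference run (2 CUs), from the summary above
sql_mean_ms = 145.186
# Measured difference between the GraphQL count query and the SQL Endpoint
graphql_extra_ms = 28.6

graphql_mean_ms = sql_mean_ms + graphql_extra_ms
relative_overhead = graphql_extra_ms / sql_mean_ms

print(f"GraphQL mean ≈ {graphql_mean_ms:.1f} ms")
print(f"Relative overhead ≈ {relative_overhead:.1%}")
```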

I also ran the test with 4 CUs to see how the capacity unit count affects storage performance.

As the results show, the CU count does not affect storage speed: all measurements are within the margin of error.

GraphQL Query Performance

I also wanted to see how data queries affect performance. I did two runs, one with 2 CUs and one with 4 CUs, using the following query:

{
  test_data_500ks(filter: { _1: { eq: 104 } }, first: 50) {
    items {
      _1
      _2
    }
  }
}

This query returns the first 50 rows that have the value 104 in the _1 column, and returns the values of the _1 and _2 columns. I set the limit to 50 rows because the Fabric GraphQL API returns at most 100 rows per response. If you want to receive more rows, you have to use GraphQL pagination.
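Reading more than one page means repeating the query with a cursor. The sketch below assumes the generated Fabric schema exposes `endCursor` and `hasNextPage` next to `items` and accepts an `after` argument of type String; that is my understanding of Fabric's cursor-based pagination, so check your own generated schema. `fetch_all` and the `post` callable (which should send the JSON body and return the decoded response) are hypothetical names; a fake `post` stands in for the network here.

```python
from typing import Any, Callable, Dict, List, Optional

# Assumed schema: items / endCursor / hasNextPage, with an `after` cursor argument
PAGE_QUERY = """
query ($after: String) {
  test_data_500ks(filter: { _1: { eq: 104 } }, first: 100, after: $after) {
    items { _1 _2 }
    endCursor
    hasNextPage
  }
}
"""

def fetch_all(post: Callable[[Dict[str, Any]], Dict[str, Any]],
              max_pages: int = 1000) -> List[Dict[str, Any]]:
    """Follow endCursor/hasNextPage until the last page (or max_pages) is reached."""
    rows: List[Dict[str, Any]] = []
    after: Optional[str] = None
    for _ in range(max_pages):
        response = post({"query": PAGE_QUERY, "variables": {"after": after}})
        page = response["data"]["test_data_500ks"]
        rows.extend(page["items"])
        if not page["hasNextPage"]:
            break
        after = page["endCursor"]
    return rows

# Fake transport returning three pages of three rows, to show the loop offline
def fake_post(body):
    pages = {None: ("c1", True), "c1": ("c2", True), "c2": (None, False)}
    cursor, has_next = pages[body["variables"]["after"]]
    items = [{"_1": 104, "_2": i} for i in range(3)]
    return {"data": {"test_data_500ks": {
        "items": items, "endCursor": cursor, "hasNextPage": has_next}}}

print(len(fetch_all(fake_post)))  # 9
```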

2 CU Run

These are the results of the simple query with 2 CUs:

Performance is at a good level. Again, for some reason the bigger table performs better than the smaller one. The test data might explain this: the 500k-row table might have matching rows later in the data than the 1M table. Overall, performance is still very good.
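To get a feel for how much row order can matter for a `first: 50` query, here is a small simulation (a sketch, not part of my benchmark code). It assumes, as in the generation script, that `_1` is uniform over 0 to 1000 (1,001 values), so each row matches `_1 = 104` with probability 1/1001 and a sequential scan needs about 50 × 1001 = 50,050 rows on average to find 50 matches. It also assumes rows are scanned in insertion order, which the engine does not guarantee; `rows_scanned_for_matches` is a hypothetical helper, and different seeds stand in for tables with different random data.

```python
import random

def rows_scanned_for_matches(n_rows, target=104, needed=50, seed=None):
    """Simulate scanning n_rows random _1 values (uniform 0..1000) in order.

    Returns how many rows were read before `needed` rows with _1 == target
    were found, or None if the table does not contain that many matches.
    """
    rng = random.Random(seed)
    found = 0
    for position in range(1, n_rows + 1):
        if rng.randint(0, 1000) == target:
            found += 1
            if found == needed:
                return position
    return None

# Expected scan length = needed / P(match) = 50 * 1001 rows
expected = 50 * 1001
print(f"expected rows scanned: {expected}")

for seed in range(3):
    print(seed, rows_scanned_for_matches(500_000, seed=seed))
```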

4 CU Run

These are the results of the simple query run with 4 CUs:

As expected, the number of CUs does not affect query performance.

But

As you might have expected, there is a but. These performance tests show how good the API is when you call it multiple times in a row, but as seen in this picture, this is what the runtime of the first call looks like:

Note the 17 seconds. The first call can be quite slow (my tests do not measure the time used for authentication). I don't know the reason for this, but the first call is always much slower. I suspect it could be related to caching in Fabric. The measurement above was done with the ColdStart strategy, so BenchmarkDotNet did not try to optimize the first call away.

As I wanted to dig a bit deeper into this, I wrote a simple .NET application that uses a Stopwatch to measure how long the first call to the API takes.

As seen here, the MSAL login takes 864 ms to complete, and this time the first call took 1.6 seconds. In my measurements the first call usually takes around 3 to 4 seconds, and subsequent calls around 400 ms. Part of the first-call slowness in this manual test likely comes from one-time .NET overhead, such as HTTP client initialization and serialization.

This is the console app code that I used:

using System.Diagnostics;
using System.Text;
using System.Text.Json;
using Microsoft.Identity.Client;

// Configuration is a settings class (not shown) holding ClientId, Secret, TenantId and FabricGraphqlEndpoint
var timer = new Stopwatch();

timer.Start();
// Acquire a token for the Service Principal with the client credentials flow
var app = ConfidentialClientApplicationBuilder.Create(Configuration.ClientId)
    .WithClientSecret(Configuration.Secret)
    .WithAuthority(new Uri($"https://login.microsoftonline.com/{Configuration.TenantId}"))
    .Build();
var authenticationResult = await app.AcquireTokenForClient(["https://api.fabric.microsoft.com/.default"]).ExecuteAsync();
HttpClient client = new();
client.DefaultRequestHeaders.Add("Authorization", $"Bearer {authenticationResult.AccessToken}");

Console.WriteLine($"Time taken to login: {timer.ElapsedMilliseconds} ms");

// GraphQL request body: { "query": "..." }
var query = new
{
    query = "{ test_data_500ks(filter: { _1: { eq: 104 } }, first: 50) { items { _1, _2 } } }"
};

var content = new StringContent(JsonSerializer.Serialize(query), Encoding.UTF8, "application/json");

// Time only the HTTP round trip to the GraphQL endpoint
timer.Restart();
HttpResponseMessage response = await client.PostAsync(Configuration.FabricGraphqlEndpoint, content);
response.EnsureSuccessStatusCode();

Console.WriteLine($"Time taken to query 500k rows: {timer.ElapsedMilliseconds} ms");
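For a Python client, the same cold-versus-warm measurement can be sketched as a small harness that times each call individually, so the first call is reported separately instead of being averaged in. `time_calls` is a hypothetical helper, and `call` stands in for whatever performs the authenticated POST; here a fake function simulates a slow first call.

```python
import time
from statistics import median
from typing import Callable, List, Tuple

def time_calls(call: Callable[[], object], repeats: int = 5) -> Tuple[float, List[float]]:
    """Time the first (cold) call separately from the following (warm) calls.

    Returns (first_call_ms, [warm_call_ms, ...]).
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        timings.append((time.perf_counter() - start) * 1000)
    return timings[0], timings[1:]

# Hypothetical stand-in for an API call: slow the first time, fast afterwards
state = {"calls": 0}

def fake_api_call():
    state["calls"] += 1
    time.sleep(0.2 if state["calls"] == 1 else 0.001)

first_ms, warm_ms = time_calls(fake_api_call)
print(f"first call: {first_ms:.0f} ms, warm median: {median(warm_ms):.0f} ms")
```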

I wanted to look a bit more closely at the first-call slowness, so I added some wait time between API calls. The results indicated that after 60 seconds the API call gets slower, and near the 10-minute mark it is back at cold-start level. I didn't test the exact time it takes to get back to cold-start level, but as the picture shows, the cold start adds a lot of latency if you want to build a customer-facing app.
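An experiment like this can be sketched as a loop that warms the API once and then, for each wait time, sleeps, calls again and records the latency. `probe_idle_gaps` and the gap schedule are illustrative assumptions, not the exact steps I ran; `call` is again a stand-in for the authenticated request. Each measured call warms the API again, so every gap is measured from the previous call, and a real run with this schedule waits about 18 minutes in total. The dry run below uses a fake sleep so it finishes instantly.

```python
import time
from typing import Callable, Dict, Iterable

def probe_idle_gaps(call: Callable[[], object],
                    gaps_s: Iterable[float],
                    sleep: Callable[[float], None] = time.sleep) -> Dict[float, float]:
    """Warm the API, then for each gap: wait, make one call, record latency in ms."""
    call()  # warm-up call so the first measurement is not a plain cold start
    results = {}
    for gap in gaps_s:
        sleep(gap)
        start = time.perf_counter()
        call()
        results[gap] = (time.perf_counter() - start) * 1000
    return results

# Example schedule (assumed values): immediately, then after 1, 2, 5 and 10 minutes
GAPS = [0, 60, 120, 300, 600]

# Dry run with a fake sleep and a no-op call, just to show the shape of the result
slept = []
results = probe_idle_gaps(lambda: None, GAPS, sleep=slept.append)
print(sorted(results))
```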

Summary

Let's wrap up. GraphQL performance in Fabric is surprisingly good. Neither the amount of data nor the number of CUs affected query speed much. GraphQL and the SQL Analytics Endpoint have comparable query speeds (GraphQL was roughly 20% slower in my tests), which indicates that GraphQL does not add much overhead. However, Fabric has one problem: the first SQL Endpoint query or GraphQL API call takes much longer to complete than subsequent ones. I suspect that Fabric caches the caller's authentication and some other state for API calls, and if you don't use the API often enough, your session is evicted from the cache.