AI Toolkit for Visual Studio Code is a relatively new toolset that was previously known as Windows AI Studio. The toolkit is an extension that allows developers to run SLMs (Small Language Models) on their local machine, or to use Azure AI Studio LLM models from the cloud. Since you can easily use the Azure AI Studio models without an extension like this, I find the local models the more interesting part of this tool.
In this post I’m going to load the Phi-3-mini-128K-instruct-onnx model onto my own computer with this extension and then build a simple Semantic Kernel app to test it out. The extension has a built-in model playground and provides all the tools to test out models quickly. You can do the same thing in Azure AI Studio, but if you want to run small models there (like ones from Hugging Face), you need to set up a VM in Azure. With this extension you can run them on your own computer without worrying about the costs of those VMs.
I’m going to use my own computer to run the models in this post. My machine has an AMD Ryzen 9 5900X CPU and an NVIDIA GeForce RTX 2060 Super GPU. At this point I should also point out that you do need an NVIDIA GPU to run models locally with this tool. If you don’t have a suitable GPU, you can use the cloud models to avoid this problem.
Quick Start
To begin, simply install the extension from the VS Code marketplace.
The tool has only around 13K downloads, so I don’t think it is widely known yet. I hope this post will boost its install rate!
The toolkit has lots of features, which we can list by typing ai toolkit into the VS Code command palette. There are commands like Install Conda in Linux… I don’t know why, but it’s in there.
The more interesting part is the left side menu that appears after installing the extension. From this menu we can download models and load them directly into the built-in playground. As with almost all SLMs, Phi-3-mini does not know Finnish (or any language other than English) very well, even though it promises to. I asked it to translate “How may I help you with it” and it returned an answer that is not proper Finnish. The individual words are, but the sentence as a whole does not make any sense. A correct answer would be “Miten voin auttaa sinua”. Anyway, this problem is not related to this tool; it is just how SLMs currently behave.
Use Local Models from an App
The AI Toolkit comes with a local REST API web server that uses the OpenAI chat completions compatible format. This lets us easily test our applications locally against the endpoint http://127.0.0.1:5272/v1/chat/completions, which also means we can use these local models from Semantic Kernel with its OpenAI connector.
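If you want to sanity-check the endpoint before wiring up Semantic Kernel, here is a minimal sketch that posts a raw OpenAI-style chat completion request with plain HttpClient. The model name here is just an example; use whatever model you have loaded into the playground:

using System.Net.Http;
using System.Text;

// Minimal sketch: call the AI Toolkit local endpoint directly.
// The model name is an example; replace it with the model loaded in your playground.
using var client = new HttpClient();
var body = """
{
    "model": "Phi-3-mini-128k-cpu-int4-rtn-block-32-onnx",
    "messages": [ { "role": "user", "content": "Hello, who are you?" } ]
}
""";
var response = await client.PostAsync(
    "http://127.0.0.1:5272/v1/chat/completions",
    new StringContent(body, Encoding.UTF8, "application/json"));
Console.WriteLine(await response.Content.ReadAsStringAsync());

If this returns a normal chat completions JSON payload, you know the server and model are up before any Semantic Kernel code enters the picture.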
In this sample app I’m using the Microsoft.SemanticKernel.Core and Microsoft.SemanticKernel.Connectors.OpenAI NuGet packages. The app is a simple console app that uses the local model to answer questions. The ModelId parameter needs to be set to whatever is loaded into the AI Toolkit playground (the name of the model exactly as shown). The URL is http://127.0.0.1:5272 and the apiKey can be left empty, because we are running against localhost.
I also added a simple stopwatch to this sample code to see how long it takes to answer questions; on my computer it took around six seconds per question. The time varied from three seconds to ten, but I think the average was near six. I didn’t spend any time optimizing this, so you could get better performance by tuning your setup a little bit.
using Microsoft.SemanticKernel;

var kernel = Kernel.CreateBuilder()
    .AddOpenAIChatCompletion(
        modelId: "Phi-3-mini-128k-cpu-int4-rtn-block-32-onnx", // name of the model loaded in the playground
        endpoint: new Uri("http://127.0.0.1:5272"),            // AI Toolkit local REST API
        apiKey: string.Empty)                                  // no key needed on localhost
    .Build();

Console.WriteLine("Hello and welcome to use SLM local models. Ask me any question and I will try to answer it.");
var question = Console.ReadLine() ?? string.Empty;

// Time how long the local model takes to answer.
var watch = System.Diagnostics.Stopwatch.StartNew();
var response = await kernel.InvokePromptAsync(question);
Console.WriteLine(response + $" : It took {watch.ElapsedMilliseconds} ms. to answer the question.");
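If you’d rather have a back-and-forth chat than a single prompt, the same kernel can hand out its chat completion service. Here is a small sketch of what that could look like, building on the kernel created above:

using Microsoft.SemanticKernel.ChatCompletion;

// Sketch: multi-turn chat with the same local model, reusing the kernel from above.
var chat = kernel.GetRequiredService<IChatCompletionService>();
var history = new ChatHistory();
history.AddUserMessage(question);
var reply = await chat.GetChatMessageContentAsync(history);
history.AddAssistantMessage(reply.Content ?? string.Empty);
Console.WriteLine(reply);

Because the ChatHistory object carries every earlier message along, follow-up questions get answered in context instead of being treated as isolated prompts.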
Visual Studio Code shows a log of the questions and other activity, so you can verify from there that your code really is using the local model.
Summary
There are many tools out there for running local models, and AI Toolkit for Visual Studio Code is just one of them. You can use the almighty Ollama (Jussi has a great post about that, which you can find here), or you can use LM Studio as I pointed out in my previous blog post, or you can go with a tool like AI Toolkit.
I don’t think there is much of a difference between these tools. They can all load models, run them locally, and show some logs about what is happening behind the scenes. What else do you really need? If you want more control over things, then I would recommend LM Studio, as it has a good GUI and lots of features under the bonnet.
For AI Toolkit, I like the concept of Projects, as it creates a bunch of helpful script files that you can use to quickly set up environments. But other than that, it is just another tool in the arsenal.