
Develop LLM Solutions In Local Environment

Photo by David Bartus: https://www.pexels.com/photo/black-audio-mixer-690779/

Sometimes you want to develop or experiment with LLM APIs without submitting your data to Azure OpenAI or any other hosted LLM service. Instead, you simply want to run a basic LLM locally to test whether your idea works. Jussi has written some informative blog posts about Ollama, which is indeed a great tool for this situation, but it might not be the best…

LM Studio

LM Studio is a desktop application for running local LLMs on your computer. It is not open source, but it is free of charge for personal use. When you start the app, you will first see a rather chaotic home screen. Don’t worry about it too much; just focus on the search field. Type the name of the LLM model you would like to use in your testing. For example, I recently used Llama 3 8B, which you can download by typing the name into the search box, clicking Go, and finally clicking Download on the results page.

The home screen can feel a bit confusing at the beginning

If you just want to try out the model, use the Chat feature from the left menu. Load the model you want to use and start chatting. You can change the default system prompt on the right side.

Chat is a nice tool for playing with different models

Develop Solutions

For us developers, the more interesting part is the local server mode. You can start a local HTTP server with the selected model by loading the model and then clicking the Start Server button. This starts an HTTP server on your local machine, which you can access at http://localhost:1234/v1/… addresses. LM Studio provides three different endpoints for different use cases:

  • v1/chat/completions for a chat-like API that accepts a message history, so you can keep conversational context across turns (see the sketch after this list)
  • v1/completions for simple Q&A-style prompts without worrying about memory etc.
  • v1/embeddings to generate search vectors from text data.
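
To illustrate the difference between the first two, here is a minimal sketch of a request body for v1/chat/completions. It assumes the OpenAI-style message schema that LM Studio mimics; the model name and message contents are just placeholder examples.

// Sketch of a v1/chat/completions request body (OpenAI-style schema).
// The model name must match whatever model is loaded in LM Studio;
// the message contents here are placeholder examples.
var chatBody = new
{
    model = "lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF",
    messages = new object[]
    {
        new { role = "system", content = "You are a helpful assistant." },
        new { role = "user", content = "Classify the sentiment of: I love local LLMs." }
    },
    temperature = 0.2,
    stream = false
};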

When the server is running, we can easily access it with HttpClient in C#. We just need to provide the model, the prompt, and some basic settings like temperature in the request. You don’t have to worry about authentication because the HTTP server runs on your local machine.

using System.Net.Http.Json; // needed for JsonContent.Create

using var client = new HttpClient();

// Example prompt; in practice this would come from your application.
var prompts = "Classify the sentiment of this sentence: I love local LLMs.";

var request = new HttpRequestMessage
{
    Method = HttpMethod.Post,
    RequestUri = new Uri("http://localhost:1234/v1/completions"),
    Content = JsonContent.Create(
        new
        {
            model = "lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF", // Name of the model which is loaded
            prompt = prompts,
            temperature = 0.2, // 0.5 is the default temperature, but I find 0.2 more suitable for my needs
            max_tokens = -1,   // -1 removes the token limit in LM Studio
            stream = false     // return the whole completion at once instead of streaming
        })
};

var response = await client.SendAsync(request);
if (response.IsSuccessStatusCode)
{
    var responseContent = await response.Content.ReadAsStringAsync();
    return responseContent; // do something with the response from LM Studio
}
else
{
    throw new HttpRequestException($"Response status code does not indicate success: {(int)response.StatusCode} ({response.StatusCode}).");
}
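
If the response follows the OpenAI-style completions schema (which LM Studio mimics), the generated text should sit under choices[0].text. As a sketch of how you might pull it out with System.Text.Json, assuming the responseContent string from above:

using System.Text.Json;

// Parse an OpenAI-style completions response: {"choices":[{"text":"..."}], ...}
using var doc = JsonDocument.Parse(responseContent);
var completionText = doc.RootElement
    .GetProperty("choices")[0]
    .GetProperty("text")
    .GetString();
Console.WriteLine(completionText);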

The best aspect of LM Studio is that its web server API closely resembles the Azure OpenAI API. This means that once you’re done experimenting, you can easily transition your solution to Azure.
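
As a rough sketch of what that transition can look like: you mainly swap the local address for your Azure OpenAI endpoint and add an api-key header, while the request body stays much the same. The resource name, deployment name, and API version below are placeholders, not real values.

// Hypothetical Azure OpenAI endpoint; replace resource, deployment and
// api-version with your own values. The deployment determines the model,
// so no model field is needed in the body.
var azureRequest = new HttpRequestMessage
{
    Method = HttpMethod.Post,
    RequestUri = new Uri(
        "https://my-resource.openai.azure.com/openai/deployments/my-deployment/chat/completions?api-version=2024-02-01"),
    Content = JsonContent.Create(new
    {
        messages = new object[] { new { role = "user", content = "Hello from Azure" } },
        temperature = 0.2
    })
};

// Unlike the local LM Studio server, Azure OpenAI requires authentication.
azureRequest.Headers.Add("api-key", "<your-api-key>");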

Summary

I personally use LM Studio mostly to test different models and to find out whether an LLM can do what I want it to do, for example whether it can classify a given text or extract data from text as JSON. The tool itself is easy to set up and use. LM Studio provides a nice server log when the completion APIs are invoked, so you can monitor how the response is built by the model. I like how quickly you can evaluate different prompts in chat mode and store the chat sessions as files. Overall, the tool feels like Postman in the good old days, before it was bloated with mandatory sign-ins and workspaces. The UI is simple enough, and you get things done quickly.
