Self-Hosted AI Agents with LocalAI, LocalAGI, and LocalRecall
LocalAI is now integrated with Matrix on eom.dev! You can chat with open-source AI models by mentioning @LocalAI in Matrix rooms.
Simple Matrix Bot
This integration was originally powered by Baibot, an open source project created by etke.cc.
I had tried deploying a couple of other bots for this purpose before finding Baibot. Unfortunately, they were difficult to configure in a Kubernetes environment, produced mysterious error messages that were hard to diagnose, or did not support encryption.
I found Baibot easy to set up, well documented, and full-featured. I will be interested to explore other projects from this organization.
We’re a group of individuals from Europe that have been active and recognized members of the Matrix community for ~7 years.
The etke.cc service was founded in 2021 by Nikita Chernyi, based on Slavi Pantaleev’s free-software work: the matrix-docker-ansible-deploy Ansible playbook - the most popular and sane way to deploy Matrix on your own infrastructure. [1]
That said, Baibot only interfaces with the LocalAI API; much more powerful options are available through LocalAGI connectors. Once I realized this, I removed Baibot after only a short time, saving its deployment definition in the ansible-role-localai repository for reference.
AI Agents
Agents are effectively workflows that combine AI output with certain actions, such as performing searches, using APIs, or executing code. An agent may loop between consulting the LLM and taking actions several times before producing its final output.
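As a rough illustration of that loop (not LocalAGI's actual implementation), here is a minimal Python sketch that alternates between LocalAI's OpenAI-compatible chat endpoint and a placeholder search action until the model stops requesting actions. The URL, the placeholder action, and the "SEARCH:" convention are all assumptions made for the example.

```python
import requests

LOCALAI_URL = "http://localhost:8080/v1/chat/completions"  # assumed LocalAI address

def call_llm(messages):
    """Ask the local model for its next step via LocalAI's OpenAI-compatible API."""
    resp = requests.post(LOCALAI_URL, json={"model": "llama3-8b-instruct", "messages": messages})
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

def search_web(query):
    """Placeholder action; a real agent might call DuckDuckGo or another API here."""
    return f"(search results for: {query})"

def run_agent(user_message, max_steps=5):
    """Loop between consulting the LLM and running actions until a final answer emerges."""
    messages = [{"role": "user", "content": user_message}]
    reply = ""
    for _ in range(max_steps):
        reply = call_llm(messages)
        # Toy convention for this sketch: the model requests an action with "SEARCH: <query>".
        if reply.startswith("SEARCH:"):
            result = search_web(reply.removeprefix("SEARCH:").strip())
            messages.append({"role": "assistant", "content": reply})
            messages.append({"role": "user", "content": f"Search results:\n{result}"})
            continue
        return reply  # no action requested, treat this as the final answer
    return reply
```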
LocalAGI
Agents are defined through LocalAGI.
Connectors
Connectors define the user interface for an agent. The Matrix integration with LocalAI on eom.dev is now powered by a LocalAGI connector. In some ways, I actually preferred Baibot: it replied in threads, formatted messages properly, and supported E2E encryption. The LocalAGI Matrix connector lacks these features; however, the agentic workflows are far more interesting, so I will have to hope for an update or write one myself. There are additional connectors for email, IRC, Twitter, and more. For my purposes, I would be interested in a Mastodon connector; unfortunately, one does not yet exist.
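To give a feel for what a connector does, here is a rough sketch of a Matrix-to-agent bridge using the matrix-nio library. This is not the LocalAGI connector's actual code: the homeserver, bot account, credentials, and agent endpoint are all placeholders.

```python
import asyncio
import requests
from nio import AsyncClient, RoomMessageText

HOMESERVER = "https://matrix.example.org"      # placeholder homeserver
BOT_USER = "@localai:example.org"              # placeholder bot account
AGENT_URL = "http://localhost:3000/api/chat"   # hypothetical agent endpoint, not LocalAGI's real API

async def main():
    client = AsyncClient(HOMESERVER, BOT_USER)
    await client.login("bot-password")  # placeholder credential

    async def on_message(room, event):
        # Only respond when the bot is mentioned, mirroring the @LocalAI behaviour.
        if "@LocalAI" not in event.body:
            return
        reply = requests.post(AGENT_URL, json={"message": event.body}).json().get("response", "")
        await client.room_send(
            room.room_id,
            message_type="m.room.message",
            content={"msgtype": "m.text", "body": reply},
        )

    client.add_event_callback(on_message, RoomMessageText)
    await client.sync_forever(timeout=30000)

asyncio.run(main())
```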
Prompts
There are several different prompts that can be configured for the agent. The system prompt seems to be the most impactful, as it is sent alongside each chat completion; however, identity guidance and long-term goals are available as well. In truth, I’m not certain what these do just yet.
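Because LocalAI exposes an OpenAI-compatible API, the effect of the system prompt is easy to picture: it is prepended to the message list of each completion request. A minimal sketch, assuming LocalAI on its default port; how LocalAGI folds the identity guidance and long-term goals into the context is not shown here.

```python
import requests

# The system prompt travels with every completion request.
system_prompt = "You are a helpful assistant for the eom.dev platform."

response = requests.post(
    "http://localhost:8080/v1/chat/completions",  # assumed LocalAI address
    json={
        "model": "llama3-8b-instruct",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": "What services run on this platform?"},
        ],
    },
)
print(response.json()["choices"][0]["message"]["content"])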
Retrieval Augmented Generation
Retrieval augmented generation (RAG) is a technique for providing additional contextual information to large language models as a way of improving their output. Search results from Wikipedia, Google, or a local knowledge base may be injected into the user’s prompt to provide the model with up-to-date and relevant information to inform its response.
LocalAGI utilizes some RAG techniques on its own through the use of actions and MCP; however, the more advanced RAG functionality using vector databases and word embeddings is implemented in LocalRecall.
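A bare-bones sketch of the injection step, with a hard-coded placeholder standing in for a real retrieval backend such as a search action, an MCP tool, or LocalRecall:

```python
import requests

LOCALAI_URL = "http://localhost:8080/v1/chat/completions"  # assumed LocalAI address

def retrieve_context(question):
    """Stand-in retrieval step; in practice this would be a search action, an MCP
    tool, or a similarity query against LocalRecall's vector store."""
    return "eom.dev hosts Matrix, Discourse, Gitea, and Grafana."  # placeholder snippet

def rag_answer(question):
    context = retrieve_context(question)
    # Inject the retrieved text into the prompt so the model can ground its answer.
    prompt = (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    resp = requests.post(LOCALAI_URL, json={
        "model": "llama3-8b-instruct",
        "messages": [{"role": "user", "content": prompt}],
    })
    return resp.json()["choices"][0]["message"]["content"]

print(rag_answer("What services does the platform provide?"))
```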
Actions
An action is something the agent is able to do. They are basically scripts that the agent can execute under certain circumstances. These actions can be anything from searching DuckDuckGo to executing trades on the Kraken cryptocurrency exchange. Anything that we can define programmatically can be used as an action in the workflow.
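As a conceptual example (not LocalAGI's actual action format), an action can be thought of as a named function plus a description the model can see. Here is a sketch of a DuckDuckGo search action built on DuckDuckGo's Instant Answer API:

```python
import requests

def duckduckgo_search(query: str) -> str:
    """Action body: query DuckDuckGo's Instant Answer API and return a short summary."""
    resp = requests.get(
        "https://api.duckduckgo.com/",
        params={"q": query, "format": "json", "no_html": 1},
    )
    data = resp.json()
    return data.get("AbstractText") or "No instant answer found."

# Conceptual registration: a name and description the model can see, plus the
# callable the agent runs when it picks this action.
ACTIONS = {
    "search_duckduckgo": {
        "description": "Search the web via DuckDuckGo and return a short summary.",
        "run": duckduckgo_search,
    },
}

print(ACTIONS["search_duckduckgo"]["run"]("Model Context Protocol"))
```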
MCP Servers
Model Context Protocol (MCP) servers are an intermediary layer between agents and web services, providing a standardized interface for models to use those services’ APIs.
MCP (Model Context Protocol) is an open-source standard for connecting AI applications to external systems. [2]
I have not set up any MCP connections for my agent just yet, though I have been browsing some of the available ones for my existing services such as Discourse, Grafana, and Gitea.
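For reference, MCP is JSON-RPC 2.0 under the hood. A heavily simplified sketch of listing a server's tools over the stdio transport might look like the following; the server command is a placeholder, and the initialize handshake a real client performs first is omitted.

```python
import json
import subprocess

# "some-mcp-server" is a placeholder command; real MCP servers exist for services
# like Grafana and Gitea.
proc = subprocess.Popen(
    ["some-mcp-server"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

def rpc(method, params=None, msg_id=1):
    """Send one JSON-RPC 2.0 request over stdio and read one response line."""
    request = {"jsonrpc": "2.0", "id": msg_id, "method": method, "params": params or {}}
    proc.stdin.write(json.dumps(request) + "\n")
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())

# Ask the server which tools it exposes; the agent decides when to call them.
print(rpc("tools/list"))
```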
LocalRecall
LocalRecall handles memory for the AI agent through the use of vector databases and word embeddings.
In short, sources from the knowledge base are transformed into vectors using the embeddings endpoint of LocalAI. These vectors are then stored in a vector database that comes packaged with LocalRecall and is queried for additional context on chat completions. I am using the llama3.2-1b-instruct model, also known as bert-embeddings, for embeddings; llama3-8b-instruct as my base LLM; and chromem for the vector database.
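A toy version of that pipeline, using LocalAI's OpenAI-compatible /v1/embeddings endpoint and a small in-memory list standing in for chromem; the URL and the sample documents are made up for the example.

```python
import requests

EMBEDDINGS_URL = "http://localhost:8080/v1/embeddings"  # assumed LocalAI address

def embed(text):
    """Turn text into a vector via LocalAI's OpenAI-compatible embeddings endpoint."""
    resp = requests.post(EMBEDDINGS_URL, json={"model": "bert-embeddings", "input": text})
    resp.raise_for_status()
    return resp.json()["data"][0]["embedding"]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm

# Toy in-memory index; LocalRecall does the equivalent with chromem.
documents = ["Gitea hosts the git repositories.", "Grafana shows platform metrics."]
index = [(doc, embed(doc)) for doc in documents]

query_vector = embed("Where is the source code hosted?")
best_match = max(index, key=lambda item: cosine_similarity(item[1], query_vector))
print(best_match[0])  # the snippet most relevant to the question
```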
Sources can be local files, web pages, git repositories, sitemaps, etc., and are stored in a collection that maps to the agent name from LocalAGI. When the agent is created and access to the knowledge base is enabled, the collection is created automatically through the LocalRecall API, and sources can be added afterwards.
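A sketch of that flow against the LocalRecall REST API; the endpoint paths and payload fields here are my assumptions and should be checked against the LocalRecall documentation.

```python
import requests

LOCALRECALL_URL = "http://localhost:8080"  # assumed LocalRecall address
COLLECTION = "my-agent"                    # should match the LocalAGI agent name

# NOTE: the endpoint paths and payload fields below are assumptions about the
# LocalRecall REST API; verify them against the project's documentation.
requests.post(f"{LOCALRECALL_URL}/api/collections", json={"name": COLLECTION})

# Register an external source (e.g. a web page) with the collection.
requests.post(
    f"{LOCALRECALL_URL}/api/collections/{COLLECTION}/sources",
    json={"url": "https://example.org/some-page"},
)

# Query the collection for context to inject into a chat completion.
results = requests.post(
    f"{LOCALRECALL_URL}/api/collections/{COLLECTION}/search",
    json={"query": "How is LocalAI deployed?", "max_results": 3},
)
print(results.json())
```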
Results
The end result of this effort is a functioning chatbot in the Matrix room that is aware of threads on Discourse and Gitea. It can answer questions about the platform, remember its chat history, and learn and improve over time. At the time of writing, this is still a prototype. It honestly doesn’t produce the greatest responses just yet; however, with some tweaking of the models, prompts, and other features, I am hoping this can become quite a useful part of this platform.