Querying local documents, powered by LLM
The purpose of this package is to offer an advanced question-answering (RAG) system, configured through a simple YAML file, that enables interaction with a collection of local documents. Compared to a basic LLM-based RAG pipeline, special attention is given to improving individual components of the system - better document parsing, hybrid search, HyDE, chat history, deep linking, re-ranking, customizable embeddings, and more. The package is designed to work with custom Large Language Models (LLMs), whether from OpenAI or installed locally.
Interaction with the package is supported through the built-in frontend, or by exposing an MCP server, allowing clients like Cursor, Windsurf or VSCode GH Copilot to interact with the RAG system.
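For example, a minimal SSE MCP server exposing the query pipeline as a single tool might look like the sketch below, using the official `mcp` Python SDK. The server name, tool name, and `answer_query` stand-in are hypothetical, not the package's actual entry points.

```python
# Minimal sketch of an SSE MCP server exposing a RAG query tool,
# using the official `mcp` Python SDK. `answer_query` is a hypothetical
# stand-in for the package's actual retrieval pipeline.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("local-docs-rag")

def answer_query(question: str) -> str:
    # Placeholder for parsing, retrieval, re-ranking and generation.
    return f"(answer for: {question})"

@mcp.tool()
def query_documents(question: str) -> str:
    """Answer a question from the local document index."""
    return answer_query(question)

if __name__ == "__main__":
    mcp.run(transport="sse")  # serve over SSE for MCP clients
```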
- Fast, incremental parsing and embedding of medium-size document bases (tested on up to a few gigabytes of markdown and PDFs).
- Supported document formats:
  - `.md` - Divides files based on logical components such as headings, subheadings, and code blocks. Supports additional features like cleaning image links, adding custom metadata, and more.
  - `.pdf` - MuPDF-based parser.
  - `.docx` - custom parser, supports nested tables.
  - Other common formats are handled by the `Unstructured` pre-processor (see the sketch after this list).
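For formats without a built-in parser, pre-processing can be sketched roughly as follows with the `unstructured` library (the file name is illustrative):

```python
# Sketch: extracting text from an arbitrary document format with the
# `unstructured` library (the file name is illustrative).
from unstructured.partition.auto import partition

elements = partition(filename="quarterly_report.docx")  # format auto-detected
text = "\n\n".join(str(el) for el in elements)
```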
- Allows interaction with embedded documents, internally supporting the following models and methods (including locally hosted):
  - Interoperability with LiteLLM + Ollama via the OpenAI API, supporting hundreds of different models (see Model configuration for LiteLLM); a call sketch follows this list.
- SSE MCP Server enabling interaction with popular MCP clients.
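As a rough sketch of the LiteLLM + Ollama route (the model name and endpoint are assumptions, not the package's defaults):

```python
# Sketch: calling a locally hosted Ollama model through LiteLLM's
# OpenAI-compatible interface. Model name and endpoint are assumptions.
import litellm

response = litellm.completion(
    model="ollama/llama3",              # any Ollama-served model
    api_base="http://localhost:11434",  # default Ollama endpoint
    messages=[{"role": "user", "content": "Summarize the indexed docs."}],
)
print(response.choices[0].message.content)
```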
- Generates dense embeddings from a folder of documents and stores them in a vector database (ChromaDB). Supported embedding models include `multilingual-e5-base` and `instructor-large`; see the sketch below.
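A minimal sketch of the dense-embedding step, with illustrative chunk and collection names (E5-family models expect a "passage: " prefix on documents and "query: " on queries):

```python
# Sketch: embedding document chunks with sentence-transformers and
# storing them in ChromaDB. All names are illustrative.
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-base")
client = chromadb.PersistentClient(path="./index")
collection = client.get_or_create_collection("docs")

chunks = ["First document chunk...", "Second document chunk..."]
# E5 models expect a "passage: " prefix on documents, "query: " on queries.
embeddings = model.encode([f"passage: {c}" for c in chunks]).tolist()
collection.add(ids=[f"chunk-{i}" for i in range(len(chunks))],
               documents=chunks, embeddings=embeddings)
```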
- Generates sparse embeddings using SPLADE (https://github.com/naver/splade) to enable hybrid search (sparse + dense); a sketch follows.
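The SPLADE expansion itself can be sketched with `transformers`; the checkpoint below is one public SPLADE variant, and the pooling follows the SPLADE paper (max over token positions of log(1 + ReLU(logits))):

```python
# Sketch: computing a SPLADE sparse vector for one text with transformers.
# The checkpoint is one public SPLADE variant, not necessarily the one
# this package uses.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

name = "naver/splade-cocondenser-ensembledistil"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)

def splade_vector(text: str) -> dict[str, float]:
    tokens = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**tokens).logits                  # (1, seq_len, vocab)
    # SPLADE pooling: max over positions of log(1 + ReLU(logits)).
    weights, _ = torch.max(
        torch.log1p(torch.relu(logits)) * tokens["attention_mask"].unsqueeze(-1),
        dim=1,
    )                                                    # (1, vocab)
    vec = weights.squeeze(0)
    nz = vec.nonzero().squeeze(-1)
    return {tokenizer.decode([i]): float(vec[i]) for i in nz.tolist()}
```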
- Ability to update the embeddings incrementally, without needing to re-index the entire document base; one common approach is sketched below.
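A generic way to make re-indexing incremental - a sketch of one approach, not necessarily the package's exact mechanism - is to hash file contents and re-embed only files whose hash changed:

```python
# Sketch: hash-based change detection so only new or modified files are
# re-parsed and re-embedded. Generic approach, not necessarily this
# package's exact mechanism.
import hashlib
import json
from pathlib import Path

MANIFEST = Path("index_manifest.json")

def file_hash(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def changed_files(doc_root: Path) -> list[Path]:
    old = json.loads(MANIFEST.read_text()) if MANIFEST.exists() else {}
    new, changed = {}, []
    for path in doc_root.rglob("*.md"):  # markdown only, for brevity
        digest = file_hash(path)
        new[str(path)] = digest
        if old.get(str(path)) != digest:
            changed.append(path)         # new or modified -> re-embed
    MANIFEST.write_text(json.dumps(new))
    return changed
```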
- Support for table parsing via the open-source gmft (https://github.com/conjuncts/gmft) or Azure Document Intelligence.
- Optional support for image parsing using the Gemini API.
Supports the "Retrieve and Re-rank" strategy for semantic search, see here.
ms-marco-MiniLM
cross-encoder, more modern bge-reranker
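A minimal "Retrieve and Re-rank" sketch with sentence-transformers' `CrossEncoder` (the query and candidates are illustrative):

```python
# Sketch: "Retrieve and Re-rank" - score retrieved candidates against
# the query with a cross-encoder, then sort by relevance.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How do I rotate the API key?"
candidates = ["Keys can be rotated in the settings page...",
              "The parser splits markdown by headings..."]
scores = reranker.predict([(query, doc) for doc in candidates])
reranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]
```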
- Supports HyDE (Hypothetical Document Embeddings), as sketched below.
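HyDE replaces the query embedding with the embedding of a hypothetical LLM-generated answer. A generic sketch, where `llm` and `vector_search` are hypothetical stand-ins for the configured model and the vector-store query:

```python
# Sketch of HyDE: embed a hypothetical LLM-written answer instead of the
# raw question. `llm` and `vector_search` are hypothetical stand-ins.
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("intfloat/multilingual-e5-base")

def hyde_retrieve(question: str, llm, vector_search, k: int = 5):
    # 1. Generate a plausible (possibly wrong) answer to the question.
    hypothetical = llm(f"Write a short passage answering: {question}")
    # 2. Embed the hypothetical passage, not the question itself.
    vec = embedder.encode(f"passage: {hypothetical}")
    # 3. Retrieve real chunks nearest to that embedding.
    return vector_search(vec, k=k)
```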
- Supports multi-querying, inspired by RAG Fusion (https://towardsdatascience.com/forget-rag-the-future-is-rag-fusion-1147298d8ad1); a fusion sketch follows.
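A generic sketch of multi-querying with reciprocal rank fusion, the merging step used by RAG Fusion (`llm` and `search` are hypothetical stand-ins):

```python
# Sketch of multi-querying with reciprocal rank fusion (RRF), as in
# RAG Fusion. `llm` and `search` are hypothetical stand-ins.
from collections import defaultdict

def rag_fusion(question: str, llm, search, n_variants: int = 4, k: int = 60):
    # 1. Ask the LLM for several paraphrases of the original question.
    prompt = f"Write {n_variants} different search queries for: {question}"
    queries = [question] + llm(prompt).splitlines()[:n_variants]
    # 2. Run each query and fuse the ranked result lists with RRF.
    scores = defaultdict(float)
    for q in queries:
        for rank, doc_id in enumerate(search(q)):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```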
- Supports optional chat history with question contextualization, as sketched below.
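Contextualization typically rewrites a follow-up question into a standalone query before retrieval. A generic sketch, with `llm` as a hypothetical stand-in for the configured model:

```python
# Sketch: contextualize a follow-up question against the chat history so
# retrieval sees a standalone query. `llm` is a hypothetical stand-in.
def contextualize(question: str, history: list[tuple[str, str]], llm) -> str:
    if not history:
        return question
    transcript = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in history)
    prompt = (
        "Given the conversation below, rewrite the final question so it "
        "can be understood without the conversation.\n\n"
        f"{transcript}\n\nFinal question: {question}\n\nStandalone question:"
    )
    return llm(prompt)
```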