RAGFlow Vector Stores

LiteLLM supports creating and managing RAGFlow datasets for document processing and knowledge base management.

Property | Details
-------- | -------
Description | RAGFlow datasets enable document processing, chunking, and knowledge base management for RAG applications.
Provider Route on LiteLLM | ragflow in the litellm vector_store_registry
Provider Doc | RAGFlow API Documentation ↗
Supported Operations | Dataset Management (Create, List, Update, Delete)
Search/Retrieval | ❌ Not supported (management only)

Quick Start

LiteLLM Python SDK

Example using LiteLLM Python SDK
import os
import litellm

# Set RAGFlow credentials
os.environ["RAGFLOW_API_KEY"] = "your-ragflow-api-key"
os.environ["RAGFLOW_API_BASE"] = "http://localhost:9380" # Optional, defaults to localhost:9380

# Create a RAGFlow dataset
response = litellm.vector_stores.create(
    name="my-dataset",
    custom_llm_provider="ragflow",
    metadata={
        "description": "My knowledge base dataset",
        "embedding_model": "BAAI/bge-large-zh-v1.5@BAAI",
        "chunk_method": "naive"
    }
)

print(f"Created dataset ID: {response.id}")
print(f"Dataset name: {response.name}")

LiteLLM Proxy

1. Configure your vector_store_registry

model_list:
  - model_name: gpt-4o-mini
    litellm_params:
      model: gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY

vector_store_registry:
  - vector_store_name: "ragflow-knowledge-base"
    litellm_params:
      vector_store_id: "your-dataset-id"
      custom_llm_provider: "ragflow"
      api_key: os.environ/RAGFLOW_API_KEY
      api_base: os.environ/RAGFLOW_API_BASE  # Optional
    vector_store_description: "RAGFlow dataset for knowledge base"
    vector_store_metadata:
      source: "Company documentation"

2. Create a dataset via Proxy

curl http://localhost:4000/v1/vector_stores \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LITELLM_API_KEY" \
  -d '{
    "name": "my-ragflow-dataset",
    "custom_llm_provider": "ragflow",
    "metadata": {
      "description": "Test dataset",
      "chunk_method": "naive"
    }
  }'
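
For reference, here is the same request from Python using requests; this sketch assumes the proxy runs on localhost:4000 and that LITELLM_API_KEY holds a valid proxy key, as in the curl example above:

import os
import requests

# Create a RAGFlow dataset through the LiteLLM proxy's /v1/vector_stores endpoint
response = requests.post(
    "http://localhost:4000/v1/vector_stores",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ['LITELLM_API_KEY']}",
    },
    json={
        "name": "my-ragflow-dataset",
        "custom_llm_provider": "ragflow",
        "metadata": {"description": "Test dataset", "chunk_method": "naive"},
    },
)
response.raise_for_status()
print(response.json())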

Configuration

Environment Variables

RAGFlow vector stores support configuration via environment variables:

  • RAGFLOW_API_KEY - Your RAGFlow API key (required)
  • RAGFLOW_API_BASE - RAGFlow API base URL (optional, defaults to http://localhost:9380)

Parameters

You can also pass these via litellm_params:

  • api_key - RAGFlow API key (overrides RAGFLOW_API_KEY env var)
  • api_base - RAGFlow API base URL (overrides RAGFLOW_API_BASE env var)
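
For example, a minimal sketch passing credentials per-call instead of via environment variables, assuming litellm.vector_stores.create accepts api_key and api_base keyword arguments like other LiteLLM calls:

import litellm

# Per-call credentials take precedence over RAGFLOW_API_KEY / RAGFLOW_API_BASE
response = litellm.vector_stores.create(
    name="my-dataset",
    custom_llm_provider="ragflow",
    api_key="your-ragflow-api-key",           # overrides RAGFLOW_API_KEY
    api_base="http://ragflow.internal:9380",  # hypothetical host; overrides RAGFLOW_API_BASE
)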

Dataset Creation Options

Basic Dataset Creation

response = litellm.vector_stores.create(
    name="basic-dataset",
    custom_llm_provider="ragflow"
)

Dataset with Chunk Method

RAGFlow supports various chunk methods for different document types:

response = litellm.vector_stores.create(
    name="general-dataset",
    custom_llm_provider="ragflow",
    metadata={
        "chunk_method": "naive",
        "parser_config": {
            "chunk_token_num": 512,
            "delimiter": "\n",
            "html4excel": False,
            "layout_recognize": "DeepDOC"
        }
    }
)

Dataset with Ingestion Pipeline

Instead of using a chunk method, you can use an ingestion pipeline:

response = litellm.vector_stores.create(
    name="pipeline-dataset",
    custom_llm_provider="ragflow",
    metadata={
        "parse_type": 2,  # Number of parsers in your pipeline
        "pipeline_id": "d0bebe30ae2211f0970942010a8e0005"  # 32-character hex ID
    }
)

Note: chunk_method and pipeline_id are mutually exclusive. Use one or the other.
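
Per the error handling rules described below, supplying both should be rejected client-side with a ValueError; a minimal sketch:

import litellm

try:
    litellm.vector_stores.create(
        name="conflicting-dataset",  # hypothetical name
        custom_llm_provider="ragflow",
        metadata={
            "chunk_method": "naive",
            "pipeline_id": "d0bebe30ae2211f0970942010a8e0005",  # example ID from above
        },
    )
except ValueError as e:
    # chunk_method and pipeline_id are mutually exclusive
    print(f"Rejected as expected: {e}")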

Advanced Parser Configuration

response = litellm.vector_stores.create(
    name="advanced-dataset",
    custom_llm_provider="ragflow",
    metadata={
        "chunk_method": "naive",
        "description": "Advanced dataset with custom parser config",
        "embedding_model": "BAAI/bge-large-zh-v1.5@BAAI",
        "permission": "me",  # or "team"
        "parser_config": {
            "chunk_token_num": 1024,
            "delimiter": "\n!?;。;!?",
            "html4excel": True,
            "layout_recognize": "DeepDOC",
            "auto_keywords": 5,
            "auto_questions": 3,
            "task_page_size": 12,
            "raptor": {
                "use_raptor": True
            },
            "graphrag": {
                "use_graphrag": False
            }
        }
    }
)

Supported Chunk Methods

RAGFlow supports the following chunk methods:

  • naive - General purpose (default)
  • book - For book documents
  • email - For email documents
  • laws - For legal documents
  • manual - Manual chunking
  • one - Single chunk
  • paper - For academic papers
  • picture - For image documents
  • presentation - For presentation documents
  • qa - Q&A format
  • table - For table documents
  • tag - Tag-based chunking
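
For example, a sketch creating a dataset for Q&A-formatted documents; relative to the earlier examples only the chunk_method value changes (the dataset name is hypothetical):

import litellm

# Use the qa chunk method for documents in Q&A format
response = litellm.vector_stores.create(
    name="faq-dataset",
    custom_llm_provider="ragflow",
    metadata={"chunk_method": "qa"},
)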

RAGFlow-Specific Parameters

All RAGFlow-specific parameters should be passed via the metadata field:

Parameter | Type | Description
--------- | ---- | -----------
avatar | string | Base64 encoding of the avatar (max 65535 chars)
description | string | Brief description of the dataset (max 65535 chars)
embedding_model | string | Embedding model name (e.g., "BAAI/bge-large-zh-v1.5@BAAI")
permission | string | Access permission: "me" (default) or "team"
chunk_method | string | Chunking method (see supported methods above)
parser_config | object | Parser configuration (varies by chunk_method)
parse_type | int | Number of parsers in pipeline (required with pipeline_id)
pipeline_id | string | 32-character hex pipeline ID (required with parse_type)

Error Handling

RAGFlow returns error responses in the following format:

{
  "code": 101,
  "message": "Dataset name 'my-dataset' already exists"
}

LiteLLM automatically maps these to appropriate exceptions:

  • code != 0 → Raises exception with the error message
  • Missing required fields → Raises ValueError
  • Mutually exclusive parameters → Raises ValueError
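
In practice you can wrap calls accordingly; a minimal sketch (the exact exception class raised for a non-zero RAGFlow code isn't specified here, so this catches broadly):

import litellm

try:
    litellm.vector_stores.create(
        name="my-dataset",  # assume a dataset with this name already exists
        custom_llm_provider="ragflow",
    )
except ValueError as e:
    # Missing required fields or mutually exclusive parameters
    print(f"Invalid request: {e}")
except Exception as e:
    # A non-zero RAGFlow `code` surfaces as an exception carrying the message
    print(f"RAGFlow error: {e}")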

Limitations

  • Search/Retrieval: RAGFlow vector stores support dataset management only. Search operations are not supported and will raise NotImplementedError.
  • List/Update/Delete: These operations are not yet implemented through the standard vector store API. Use RAGFlow's native API endpoints directly (see the sketch below).
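
For instance, a sketch of listing datasets with requests against RAGFlow's native HTTP API; the GET /api/v1/datasets path is taken from RAGFlow's API reference and may differ by version:

import os
import requests

# List datasets directly via RAGFlow's native API
api_base = os.environ.get("RAGFLOW_API_BASE", "http://localhost:9380")
response = requests.get(
    f"{api_base}/api/v1/datasets",
    headers={"Authorization": f"Bearer {os.environ['RAGFLOW_API_KEY']}"},
)
response.raise_for_status()
print(response.json())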

Further Reading

  • Vector Stores