
effGen

Added Apr 24, 2026
Category: Agent & Tooling
License: Open Source
Tags: Python · Workflow Automation · Multi-Agent System · FastAPI · RAG · AI Agents · Agent Framework · Agent & Tooling · Model & Inference Framework · Automation, Workflow & RPA · Knowledge Management, Retrieval & RAG · Protocol, API & Integration

An open-source agent framework optimized for Small Language Models (SLMs), featuring efficient local deployment, multi-agent DAG orchestration, RAG pipelines, and a production-grade API server, all without depending on cloud LLM APIs.

effGen is an agent framework purpose-built for Small Language Models (SLMs). Through core mechanisms including context compression (70-80% reduction), intelligent task decomposition, five-factor complexity routing, and speculative execution, it significantly improves SLM performance in agent tasks. The paper reports an 11.2% success rate improvement on 1.5B models, outperforming LangChain, AutoGen, and Smolagents across 13 benchmarks.
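The source names "five-factor complexity routing" but does not enumerate the factors. The sketch below illustrates the general idea, scoring a task on several normalized factors and routing simple tasks straight to the SLM while decomposing complex ones; the factor names, weights, and threshold are all illustrative assumptions, not effGen's actual implementation.

```python
# Hypothetical sketch of complexity-based routing; factor names and weights
# are illustrative, not effGen's actual five factors.

def complexity_score(task: str) -> float:
    """Score a task on five illustrative factors, each normalized to [0, 1]."""
    words = task.split()
    factors = {
        "length": min(len(words) / 100, 1.0),        # longer prompts -> harder
        "steps": min(task.count(" then ") / 3, 1.0), # explicit multi-step cues
        "tools": min(sum(k in task.lower() for k in ("search", "compute", "fetch")) / 3, 1.0),
        "ambiguity": min(task.count("?") / 2, 1.0),  # open-ended questions
        "numeracy": min(sum(w.isdigit() for w in words) / 5, 1.0),
    }
    weights = {"length": 0.2, "steps": 0.3, "tools": 0.2,
               "ambiguity": 0.15, "numeracy": 0.15}
    return sum(weights[f] * v for f, v in factors.items())

def route(task: str, threshold: float = 0.35) -> str:
    """Send simple tasks straight to the SLM; decompose complex ones first."""
    return "decompose" if complexity_score(task) >= threshold else "direct"
```

The point of routing is to spend the decomposition and speculative-execution machinery only on tasks that need it, which matters most when the underlying model is a 1.5B SLM.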

The framework supports 7 inference backends: MLX (Apple Silicon native Metal GPU acceleration), MLX-VLM (vision-language models), vLLM (NVIDIA GPU), Transformers (general CPU/GPU), Cloud API (OpenAI/Anthropic/Gemini), GGUF/AWQ/GPTQ quantization formats, and Cerebras (in development). A unified load_model interface abstracts backend differences.
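A unified loader typically works by dispatching on a backend name to a registered backend-specific loader. The registry pattern below is a self-contained sketch of that idea; the decorator, registry, and return values are assumptions for illustration, not effGen's actual `load_model` internals.

```python
# Illustrative sketch of a unified load-model interface dispatching to named
# backends; registry and loader names are assumptions, not effGen's API.
from typing import Callable, Dict

_BACKENDS: Dict[str, Callable[[str], str]] = {}

def register_backend(name: str):
    """Decorator registering a loader function under a backend name."""
    def wrap(fn: Callable[[str], str]):
        _BACKENDS[name] = fn
        return fn
    return wrap

@register_backend("mlx")
def _load_mlx(model_id: str) -> str:
    return f"mlx::{model_id}"   # a real loader would call into MLX here

@register_backend("transformers")
def _load_hf(model_id: str) -> str:
    return f"hf::{model_id}"    # a real loader would call from_pretrained(...)

def load_model(model_id: str, backend: str = "transformers") -> str:
    """Single entry point: callers never touch backend-specific code."""
    try:
        return _BACKENDS[backend](model_id)
    except KeyError:
        raise ValueError(f"unknown backend {backend!r}; available: {sorted(_BACKENDS)}")
```

This shape makes adding an eighth backend (e.g. the in-development Cerebras adapter) a matter of registering one new loader rather than changing call sites.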

The agent loop follows the ReAct paradigm. Version 0.2.0 provides 31 built-in tools with native function calling support (Qwen, Llama, Mistral) and hybrid mode, featuring JSON/Pydantic structured output validation. Multi-agent orchestration uses a DAG workflow engine supporting automatic sub-agent generation, parallel execution, shared memory, and A2A/MCP/ACP protocols.
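A DAG workflow engine of the kind described above executes nodes in dependency order, running all nodes whose dependencies are satisfied in the same wave (the parallelizable set) and passing results through shared memory. The node/edge representation below is a minimal self-contained sketch, not effGen's engine.

```python
# Minimal sketch of DAG workflow execution with shared memory; nodes whose
# dependencies are all done run in the same wave (parallelizable).
from typing import Callable, Dict, List, Tuple

def run_dag(nodes: Dict[str, Callable[[dict], object]],
            edges: List[Tuple[str, str]]) -> dict:
    deps = {n: set() for n in nodes}
    for src, dst in edges:
        deps[dst].add(src)
    memory: dict = {}   # shared memory visible to every node
    done: set = set()
    while len(done) < len(nodes):
        wave = [n for n in nodes if n not in done and deps[n] <= done]
        if not wave:
            raise ValueError("cycle detected in workflow DAG")
        for n in wave:                  # each wave could run in parallel
            memory[n] = nodes[n](memory)
        done.update(wave)
    return memory

# Example: two independent sub-agents feed a combining step.
result = run_dag(
    nodes={
        "fetch_a": lambda m: 2,
        "fetch_b": lambda m: 3,
        "combine": lambda m: m["fetch_a"] + m["fetch_b"],
    },
    edges=[("fetch_a", "combine"), ("fetch_b", "combine")],
)
```

In the real framework each node would be a (possibly auto-generated) sub-agent rather than a lambda, but the scheduling logic is the same.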

The RAG pipeline covers PDF/DOCX/HTML/Markdown document ingestion, semantic + BM25 hybrid retrieval, reranking, and inline citations, with FAISS and Chroma as vector store backends. Security features include PII detection, prompt injection blocking, toxicity filtering, tool permission control, and Docker sandbox isolation, activatable via get_guardrail_preset("strict").
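The source does not say how effGen fuses the semantic and BM25 rankings, so the sketch below shows reciprocal rank fusion (RRF), a common choice for hybrid retrieval: each document scores the sum of 1/(k + rank) over the ranked lists it appears in, which rewards documents that rank well under either signal.

```python
# Sketch of hybrid-retrieval fusion via reciprocal rank fusion (RRF);
# shown as one common fusion method, not necessarily the one effGen uses.
from typing import Dict, List

def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked doc-id lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc3", "doc1", "doc2"]      # lexical (BM25) ranking
semantic_hits = ["doc1", "doc2", "doc4"]  # embedding-based ranking
fused = rrf_fuse([bm25_hits, semantic_hits])
```

Here `doc1` wins because it ranks highly in both lists, even though neither retriever put it unambiguously first.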

For production deployment, a built-in OpenAI-compatible /v1/chat/completions API Server provides request queuing, agent pooling, multi-tenancy, and API key management. The framework includes 270 test cases across an 11-model × 10-agent compatibility matrix, with OpenTelemetry tracing and Prometheus metrics integration. In addition to the Python library, a TypeScript client SDK and Conda build recipe are provided.
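Because the server exposes the standard /v1/chat/completions route, any OpenAI-compatible client can talk to it. The helper below builds the request by hand so the wire format is visible; the field names follow the public OpenAI chat-completions schema, while the localhost port and API key are placeholder assumptions.

```python
# Build a chat-completions request for the OpenAI-compatible server.
# URL port and API key are placeholders; the body follows the OpenAI schema.
import json

def chat_request(model: str, user_message: str, api_key: str) -> dict:
    return {
        "url": "http://localhost:8000/v1/chat/completions",  # assumed local port
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": user_message}],
        }),
    }

req = chat_request("Qwen/Qwen2.5-1.5B-Instruct", "Hello", api_key="sk-local")
```

The same payload works with the official `openai` Python SDK by pointing its `base_url` at the local server, which is the usual way to reuse existing tooling against a self-hosted endpoint.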

Installation:

pip install effgen                # Basic
pip install "effgen[mlx]"         # Apple Silicon (quote extras for zsh)
pip install "effgen[vllm]"        # NVIDIA GPU
pip install "effgen[all]"         # Full

Quick Start:

from effgen import Agent, load_model
from effgen.core.agent import AgentConfig
from effgen.tools.builtin import Calculator, PythonREPL

# Load a 1.5B SLM in 4-bit quantization via the unified backend interface
model = load_model("Qwen/Qwen2.5-1.5B-Instruct", quantization="4bit")

# Configure an agent with two built-in tools
config = AgentConfig(name="math_agent", model=model, tools=[Calculator(), PythonREPL()])
agent = Agent(config=config)

# The ReAct loop decides when to invoke tools and returns a structured result
result = agent.run("What is 24344 * 334?")
print(result.output)

Unconfirmed items: The arXiv paper states MIT License while the GitHub repo shows Apache License 2.0 (repo takes precedence); the official website shows 14 built-in tools vs. 31 in v0.2.0 README (website may be outdated); Cerebras adapter is at skeleton stage; specific benchmark names and scores across the 13 benchmarks require consulting the full paper.
