No-code LLM platform for launching APIs and ETL pipelines that structure unstructured documents, positioned as the data layer for agentic workflows. Features Prompt Studio for visual prompt engineering, one-click deployment, and LLMChallenge verification, which the project claims brings extraction accuracy close to 100%.
Overview#
Unstract is an enterprise-focused no-code LLM platform for extracting structured data from unstructured documents. The core component, Prompt Studio, provides a visual prompt engineering environment with multi-LLM real-time comparison, fill-rate monitoring, and cost evaluation. Users can design extraction logic through drag-and-drop configuration and publish with one click as REST APIs, ETL pipelines, MCP Servers, or n8n nodes.
Problems Solved#
- Traditional IDP and OCR solutions have limited accuracy with complex documents
- Hand-written document parsing scripts have high maintenance costs and poor generalization
- Direct LLM calls risk hallucinations, which makes them difficult to deploy in high-compliance scenarios
- Enterprise historical documents cannot be loaded efficiently into data warehouses or lakes
- Agent and RAG applications lack reliable structured data supply layers
Core Capabilities#
Prompt Studio#
- Visual development environment designed for document data extraction
- Multi-LLM side-by-side output comparison and cost evaluation
- Real-time validation and feedback for rapid iteration
- Fill-rate monitoring to quantify prompt quality
- One-click extraction API launch
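To make the fill-rate idea concrete, here is a minimal sketch of how such a metric could be computed over a batch of extraction results. It is not Prompt Studio's implementation: the `is_filled` and `fill_rate` helpers, the field names, and the rule for what counts as "empty" are all assumptions chosen for illustration.

```python
from typing import Any, Mapping, Sequence


def is_filled(value: Any) -> bool:
    """Treat None, blank strings, and empty containers as unfilled (assumed rule)."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    if isinstance(value, (list, dict, tuple, set)):
        return len(value) > 0
    return True  # numbers (including 0) and booleans count as filled


def fill_rate(results: Sequence[Mapping[str, Any]],
              fields: Sequence[str]) -> dict[str, float]:
    """Fraction of documents in which each field came back non-empty."""
    if not results:
        return {field: 0.0 for field in fields}
    return {
        field: sum(is_filled(doc.get(field)) for doc in results) / len(results)
        for field in fields
    }


# Hypothetical extraction output for two invoices.
extractions = [
    {"invoice_no": "INV-001", "total": 120.5, "due_date": ""},
    {"invoice_no": "INV-002", "total": None, "due_date": "2024-05-01"},
]
rates = fill_rate(extractions, ["invoice_no", "total", "due_date"])
print(rates)
```

A field whose rate stays low across a representative sample is a hint that its prompt needs rework, which is the kind of signal a fill-rate monitor is meant to surface.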
Deployment & Integration Modes#
| Mode | Target Users | Features |
|---|---|---|
| API Deployments | Dev/Business teams | One-click REST API generation |
| ETL Pipelines | Data engineering teams | Batch processing to data warehouses |
| MCP Servers | Agent/LLM developers | Structured data extraction exposed over MCP |
| n8n Nodes | Low-code/Ops teams | Drag-and-drop node invocation |
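As an illustration of the API deployment mode, the sketch below builds (but does not send) a multipart POST request that uploads one document to a deployed extraction endpoint. The endpoint URL, the `files` form-field name, and the bearer-token header are assumptions for illustration; copy the actual values from your own deployment before using anything like this.

```python
import uuid
from urllib.request import Request


def build_extraction_request(endpoint_url: str, api_key: str,
                             filename: str, content: bytes) -> Request:
    """Build a multipart/form-data POST carrying a single document."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        # "files" is an assumed form-field name, not confirmed by this page.
        f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return Request(endpoint_url, data=body, headers=headers, method="POST")


req = build_extraction_request(
    # Hypothetical endpoint; replace with the URL shown for your deployment.
    "https://unstract.example.com/deployment/api/my-org/invoice-extractor/",
    "YOUR_API_KEY",
    "invoice.pdf",
    b"%PDF-1.7 placeholder bytes",
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; the builder is kept separate so the request can be inspected or logged first.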
Enterprise Features#
- LLMChallenge: Dual-LLM cross-validation intended to catch and suppress hallucinated values
- SinglePass Extraction: Up to 8x token reduction
- SummarizedExtraction: Up to 6x token savings
- Human-In-The-Loop: Side-by-side comparison with source highlighting
- SSO Support: Enterprise unified authentication
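The dual-LLM cross-validation idea behind LLMChallenge can be sketched as follows. This is a simplified illustration rather than Unstract's actual implementation: the two models are stand-in functions, and the agreement rule (case- and whitespace-insensitive equality, with disagreements returned as `None`) is an assumption.

```python
from typing import Callable, Dict, Optional

# An extractor maps document text to field values (None = not found).
Extractor = Callable[[str], Dict[str, Optional[str]]]


def normalize(value: Optional[str]) -> Optional[str]:
    """Collapse whitespace and case so trivial differences do not count."""
    return None if value is None else " ".join(str(value).split()).casefold()


def cross_validate(document: str, extractor: Extractor,
                   challenger: Extractor) -> Dict[str, Optional[str]]:
    """Keep a field only when both models agree on it; otherwise return None."""
    primary = extractor(document)
    second = challenger(document)
    accepted: Dict[str, Optional[str]] = {}
    for field, value in primary.items():
        agrees = value is not None and normalize(value) == normalize(second.get(field))
        accepted[field] = value if agrees else None
    return accepted


# Stand-ins for two different LLM backends (hypothetical outputs).
def model_a(_doc: str) -> Dict[str, Optional[str]]:
    return {"vendor": "Acme Corp", "total": "1,250.00"}


def model_b(_doc: str) -> Dict[str, Optional[str]]:
    return {"vendor": "acme  corp", "total": "1,520.00"}


result = cross_validate("...invoice text...", model_a, model_b)
print(result)
```

Returning `None` on disagreement trades recall for precision: a missing value can be routed to human review, whereas a confidently wrong value would pass through unnoticed.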
Supported Formats#
Word Processing (DOCX/DOC/ODT), Presentations (PPTX/PPT/ODP), Spreadsheets (XLSX/XLS/ODS), Documents (PDF/TXT/CSV/JSON), Images (BMP/GIF/JPEG/PNG/TIFF/WEBP)
Architecture#
Design Principles#
- No-Code First: Business users need no programming
- Zero Trust Security: In-memory processing, container isolation
- Scalable Microservices: From development to enterprise deployment
Four-Layer Architecture#
| Layer | Responsibility |
|---|---|
| External Integrations | AI and data service integration |
| Application | Core platform, business logic, workflow coordination |
| Persistence | PostgreSQL+pgvector, Redis, RabbitMQ, MinIO |
| Tool Execution | Independent container execution, auto-cleanup |
Core Services#
| Service | Tech Stack | Responsibility |
|---|---|---|
| Frontend | React 18 + Ant Design | Prompt Studio & Workflow Studio SPA |
| Backend | Django 4.2 + DRF | Public API, multi-tenant management, auth |
| Platform Service | Flask | Tool gateway, connector authentication |
| Prompt Service | Flask | LLM unified interface, LlamaIndex integration |
| Runner | Python + Docker API | Container lifecycle management |
| X2Text Service | Flask | Document format conversion |
Data Architecture#
- PostgreSQL 14+ with pgvector (multi-tenant schema, vector storage)
- RabbitMQ + Celery (async task queue)
- Redis (session, cache, rate limiting)
- MinIO (S3-compatible object storage)
Ecosystem#
LLM Providers: OpenAI, Azure OpenAI, Anthropic, Google VertexAI/Gemini, Bedrock, Ollama, Mistral AI
Vector Databases: Qdrant, Weaviate, Pinecone, Milvus, PostgreSQL pgvector
Text Extractors: LLMWhisperer V2, Unstructured.io, LlamaParse (LlamaIndex)
ETL Sources: AWS S3, MinIO, GCS, Azure Blob, Google Drive, Dropbox, SFTP
ETL Targets: Snowflake, Redshift, BigQuery, PostgreSQL, MySQL, SQL Server, Oracle
Installation#
Requirements#
- Memory: 8GB RAM (minimum)
- OS: Linux or macOS
- Dependencies: Docker, Docker Compose, Git
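Before running the installer, it can help to confirm the command-line dependencies are on the `PATH`. The following is a small pre-flight sketch using only the Python standard library; the `missing_tools` helper is hypothetical and not part of Unstract. It checks for the `docker` and `git` executables only, since Docker Compose v2 ships as a `docker compose` plugin and cannot be found by name this way.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(tools: Iterable[str],
                  which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools(["docker", "git"])
print("Missing tools:", ", ".join(missing) if missing else "none")
```

The `which` parameter exists only so the lookup can be swapped out in tests; in normal use the default `shutil.which` is sufficient.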
Quick Start#
```bash
# Clone the repository and start the platform
git clone https://github.com/Zipstack/unstract.git
cd unstract
./run-platform.sh

# Then open http://frontend.unstract.localhost in a browser
# Default credentials: unstract/unstract
```
Three-Step Workflow#
1. Prompt Studio: Design extraction logic for specific document types
2. Connect Sources & Targets: Configure data sources and warehouses
3. Deploy: Choose API, ETL pipeline, or Q&A application
Typical Use Cases#
- Financial document processing (bank statements, invoices, contracts)
- Government and insurance form automation
- Long document understanding in complex business processes
- ETL of unstructured data ahead of loading into data warehouses or lakes
- Structured data supply layer for Agent/LLM applications