A framework for building semantic layers, context graphs, and decision intelligence systems with explainability and provenance.
Overview
Semantica is an open-source semantic layer and context graph framework designed to bridge the "semantic gap" and address "black box" opacity in AI systems. It provides an end-to-end pipeline from source documents and data files (PDF, DOCX, HTML, JSON, CSV, Excel, PPTX) to structured knowledge graphs, integrating named entity recognition (NER), relation extraction, ontology generation, and vector retrieval.
Core Capabilities
Semantics & Knowledge
- Context Graphs — Structured knowledge representation with entity relationships and semantic context
- Decision Tracking — Full decision lifecycle management with precedent search and causal analysis
- KG Algorithms — Built-in centrality analysis, community detection, Node2Vec/DeepWalk embeddings
- Provenance Tracking — W3C PROV-O compliant data lineage tracking across 17 modules
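
To make the centrality analysis mentioned above concrete, here is a minimal degree-centrality sketch in plain Python. It does not use Semantica's API: the function name `degree_centrality` and the edge-list input format are assumptions made for this illustration only.

```python
def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected edge list.

    Hypothetical helper for illustration; not part of Semantica.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set())
        neighbors.setdefault(b, set())
        if a != b:  # ignore self-loops when counting degree
            neighbors[a].add(b)
            neighbors[b].add(a)

    n = len(neighbors)
    if n <= 1:
        return {node: 0.0 for node in neighbors}
    # Normalize by the maximum possible degree in a simple graph.
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


edges = [
    ("FastAPI", "React"),
    ("FastAPI", "Starlette"),
    ("FastAPI", "Pydantic"),
]
scores = degree_centrality(edges)
most_central = max(scores, key=scores.get)
```

In this toy graph, `FastAPI` touches every other node, so it receives the highest score. A production graph library would typically offer several centrality measures beyond plain degree.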
Data Processing
- Universal Ingestion — Support for PDF, DOCX, HTML, JSON, CSV, Excel, PPTX and database streaming
- Entity/Relation Extraction — NER, relation extraction, event detection with LLM enhancement
- Ontology Generation — 6-stage LLM pipeline for automatic OWL ontology generation with HermiT/Pellet validation
- Custom Ontology Import — Support for OWL, RDF, Turtle, JSON-LD formats
Governance & Quality
- Conflict Detection — Semantic conflict detection and resolution
- Deduplication — Jaro-Winkler similarity-based entity deduplication
- Change Management — Enterprise-grade version control with SHA-256 integrity verification
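
For readers unfamiliar with Jaro-Winkler similarity, the following is a self-contained sketch of the metric and a naive deduplication pass built on it. It is not Semantica's implementation: the function names, the 0.9 threshold, and the greedy keep-first strategy are assumptions chosen for this example.

```python
def jaro(s1, s2):
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters match only if they are equal and close enough in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2

    return (
        matches / len1 + matches / len2 + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1, max_prefix=4):
    """Jaro similarity boosted for a shared prefix of up to max_prefix chars."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


def deduplicate(names, threshold=0.9):
    """Greedy pass: keep a name only if it is not close to one already kept."""
    kept = []
    for name in names:
        if not any(
            jaro_winkler(name.lower(), other.lower()) >= threshold
            for other in kept
        ):
            kept.append(name)
    return kept


unique = deduplicate(["Acme Corp", "Acme Corp.", "Globex"])
```

The greedy pass is quadratic in the number of names, so it only suits small inputs. Larger entity sets usually need blocking or indexing before pairwise comparison.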
AI Enhancement
- GraphRAG — Knowledge graph-enhanced retrieval with multi-hop reasoning and semantic reranking
- Unified LLM Interface — Support for Groq, OpenAI, HuggingFace, LiteLLM (100+ LLMs)
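
The multi-hop part of graph-enhanced retrieval can be sketched as a bounded breadth-first traversal over triples. The example below is a generic illustration and not Semantica's GraphRAG implementation: the `multi_hop_context` function, its parameters, and the sample triples are all hypothetical.

```python
from collections import deque


def multi_hop_context(triples, seeds, max_hops=2):
    """Collect triples reachable within max_hops of any seed entity.

    Hypothetical helper for illustration; edges are followed in both directions.
    """
    adjacency = {}
    for triple in triples:
        subject, _, obj = triple
        adjacency.setdefault(subject, []).append(triple)
        adjacency.setdefault(obj, []).append(triple)

    visited = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    seen_triples = set()
    collected = []

    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for triple in adjacency.get(node, []):
            if triple not in seen_triples:
                seen_triples.add(triple)
                collected.append(triple)
            subject, _, obj = triple
            neighbor = obj if subject == node else subject
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return collected


triples = [
    ("React", "used_with", "FastAPI"),
    ("FastAPI", "built_on", "Starlette"),
    ("Starlette", "implements", "ASGI"),
]
context_triples = multi_hop_context(triples, seeds=["React"], max_hops=2)
```

With a two-hop limit, the traversal starting at `React` gathers the first two triples and stops before reaching `ASGI`. In a retrieval pipeline, the collected triples would typically be reranked and serialized into the LLM prompt.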
Architecture
Three-layer design:
- Input Layer — Governed data ingestion with Docling, OCR, APIs
- Semantic Layer — Trust & reasoning engine for NER, relation extraction, ontology induction, deduplication, conflict detection
- Output Layer — Auditable knowledge assets: Knowledge Graphs, OWL Ontologies, Vector Embeddings
Storage Backend Support
- Vector Store: FAISS, PostgreSQL/pgvector, Weaviate, Qdrant, Milvus, Pinecone, InMemory
- Graph Store: Neo4j, FalkorDB, Amazon Neptune, Apache AGE
- Triplet Store: Blazegraph, Jena, RDF4J
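
To show what a vector store backend does conceptually, here is a toy in-memory store that ranks items by cosine similarity. It is a sketch under stated assumptions and is unrelated to Semantica's `VectorStore` class or its InMemory backend: `ToyVectorStore` and its `add` and `search` methods are hypothetical names.

```python
import math


def _cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical brute-force store for illustration only."""

    def __init__(self, dimension):
        self.dimension = dimension
        self._items = []  # list of (item_id, vector) pairs

    def add(self, item_id, vector):
        if len(vector) != self.dimension:
            raise ValueError(
                f"expected {self.dimension} dimensions, got {len(vector)}"
            )
        self._items.append((item_id, list(vector)))

    def search(self, query, top_k=3):
        """Return up to top_k (item_id, similarity) pairs, best match first."""
        if len(query) != self.dimension:
            raise ValueError(
                f"expected {self.dimension} dimensions, got {len(query)}"
            )
        scored = [(item_id, _cosine(query, vec)) for item_id, vec in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


store = ToyVectorStore(dimension=2)
store.add("a", [1.0, 0.0])
store.add("b", [0.0, 1.0])
store.add("c", [1.0, 1.0])
top_ids = [item_id for item_id, _ in store.search([1.0, 0.1], top_k=2)]
```

Brute-force search like this scans every stored vector. Backends such as FAISS or pgvector exist largely to avoid that scan through approximate or indexed search.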
Installation
# PyPI (Recommended)
pip install semantica
# With all optional dependencies
pip install "semantica[all]"
# Development install from source
git clone https://github.com/Hawksight-AI/semantica.git
cd semantica
pip install -e ".[all]"
Quick Start
from semantica.context import AgentContext, ContextGraph
from semantica.vector_store import VectorStore

# Initialize the vector store, context graph, and agent context
vs = VectorStore(backend="faiss", dimension=768)
kg = ContextGraph(advanced_analytics=True)
context = AgentContext(
    vector_store=vs,
    knowledge_graph=kg,
    decision_tracking=True,
    advanced_analytics=True,
    kg_algorithms=True,
)

# Store memory and auto-build context graph
memory_id = context.store(
    "User is working on a React project with FastAPI",
    conversation_id="session_1",
)

# Record decision
decision_id = context.graph_builder.add_decision(
    category="technology_choice",
    scenario="Framework selection for web API",
    reasoning="React ecosystem with FastAPI provides best performance",
    outcome="selected_fastapi",
    confidence=0.92,
)
Typical Use Cases
- 🏥 Healthcare: Clinical decision support, drug interaction analysis, medical literature reasoning
- 💰 Finance: Fraud detection, regulatory compliance (SOX, GDPR, MiFID II), credit risk assessment
- ⚖️ Legal: Evidence-based legal research, contract analysis, regulatory change tracking
- 🔒 Cybersecurity: Threat attribution, incident response, security audit trails
- 🏛️ Government & Defense: AI system governance, policy decisions, defense intelligence
- 🚗 Autonomous Systems: Vehicle decision logging, robotics safety
Design Principles
- Opt-In Design — Provenance disabled by default, zero breaking changes
- Modular Architecture — Each module can be used independently
- Production Ready — Comprehensive error handling and scalability design