Multi-agent AI Teaching Assistant training system that learns from limited course materials. It provides an end-to-end pipeline — dataset generation, QLoRA fine-tuning, and RAG augmentation — to produce domain-specific TA models with full local deployment support.
InternTA is a multi-agent automated AI Teaching Assistant training system designed for courses with scarce teaching materials, validated in synthetic biology education. The system operates through three collaborative agents:
- Dataset Agent: extracts exercises, terms, and concepts from Excel course materials and generates OpenAI conversation-format training data with explicit reasoning paths and guided answering strategies for thought-provoking questions; outputs `training.json` / `validation.json`.
- Training Agent: fine-tunes the DeepSeek-R1-Distill-Qwen-7B base model efficiently via PEFT + QLoRA (4-bit quantization), with a built-in LLM Judge for automated training-plan generation and hyperparameter tuning; supports both basic SFT (`train.sh`) and advanced Agent-driven training (`traino.sh`) modes.
- RAG Agent: performs structured processing and semantic retrieval over course materials at inference time, injecting relevant knowledge fragments into generation to compensate for fine-grained knowledge gaps in the fine-tuned model.
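The conversation format produced by the Dataset Agent can be sketched as follows. This is a minimal illustration only: the system-prompt wording, the reasoning/guidance phrasing, and the exact JSON layout are assumptions; the real records are emitted by `data/generate_data.py`.

```python
import json

# Build one OpenAI conversation-format training record with an explicit
# reasoning path and a guided (not spoon-fed) answer. All prompt wording
# here is an illustrative assumption, not the pipeline's actual template.
def make_record(question: str, reasoning: str, guided_answer: str) -> dict:
    return {
        "messages": [
            {"role": "system", "content": "You are a synthetic biology teaching assistant."},
            {"role": "user", "content": question},
            {"role": "assistant", "content": f"Reasoning: {reasoning}\n\nGuidance: {guided_answer}"},
        ]
    }

records = [make_record(
    "Why are promoters rated by strength in a parts library?",
    "Promoter strength sets the transcription initiation rate, which tunes expression levels.",
    "Consider what limits how often RNA polymerase can initiate at each promoter.",
)]

# Write in the pipeline's output naming (training.json / validation.json).
with open("training.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```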
## End-to-End Data Flow
```
Excel raw data → data/generate_data.py → training.json / validation.json
        ↓
train_agent.py or sft_internTA2.py (QLoRA fine-tuning)
        ↓
merge.py (merge LoRA adapter)
        ↓
api.py (FastAPI) + app.py (Streamlit) → user access
```
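At the inference end of this flow, the RAG Agent injects retrieved course fragments into the prompt. Since the actual vector database and embedding model are unspecified (see Unconfirmed Information below), the sketch stands in with a naive token-overlap retriever purely to show the data flow, not the real implementation:

```python
# Illustrative retrieval-augmentation sketch. The real RAG Agent's vector
# store and embedding model are undocumented; token overlap is a stand-in.
def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    q_tokens = set(query.lower().split())
    # Rank documents by how many query tokens they share (stable sort keeps
    # original order on ties).
    scored = sorted(
        corpus,
        key=lambda doc: len(q_tokens & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    # Inject the top fragments ahead of the question, as the RAG Agent does
    # during generation.
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}"

corpus = [
    "A promoter is a DNA sequence where RNA polymerase binds to start transcription",
    "Plasmids are circular DNA molecules used as cloning vectors",
    "A ribosome binding site controls translation initiation strength",
]
print(build_prompt("What does a promoter do in transcription", corpus))
```

A production retriever would replace the overlap score with embedding similarity, but the prompt-assembly step stays the same shape.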
## Deployment & Integration
- Full local deployment on an 8GB+ VRAM GPU, preventing data leakage
- OpenAI-compatible `/v1/chat/completions` endpoint with Bearer token authentication
- Dual entry points: Streamlit web interface (default port 8080) + FastAPI API service
- `Dockerfile.web` and `docker-compose.web.yml` exist; usage details TBD
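Assuming the standard OpenAI chat-completions contract, a client request for this endpoint can be assembled as below. The base URL, port, token value, and served model name are all placeholders; only the path and Bearer scheme come from the description above.

```python
import json

BASE_URL = "http://localhost:8000"  # assumption: FastAPI service address
API_TOKEN = "your-token-here"       # placeholder Bearer token

def build_chat_request(user_message: str) -> tuple[str, dict, bytes]:
    # Assemble URL, auth headers, and JSON body for the OpenAI-compatible
    # endpoint exposed by api.py.
    url = f"{BASE_URL}/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {API_TOKEN}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": "internTA",  # assumption: served model name
        "messages": [{"role": "user", "content": user_message}],
    }).encode("utf-8")
    return url, headers, body

url, headers, body = build_chat_request("Explain what a ribosome binding site does.")
# Send with any HTTP client, e.g.:
# urllib.request.urlopen(urllib.request.Request(url, data=body, headers=headers))
```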
## Quick Start
```sh
git clone https://github.com/kongfoo-ai/internTA
cd internTA
pip install -r requirements.txt
sh run.sh
```
## Unconfirmed Information
- Associated paper: not directly referenced in the README; may exist but is unlisted
- Online demo: "E. Copi (Education)" is mentioned, but no specific URL is provided
- Model weights/checkpoints: Not published in the repository
- Quantitative evaluation results: Described as strong but no specific metrics disclosed
- RAG implementation details: Vector database/embedding model not specified
- LLM Judge specifics: Evaluation criteria and dependent model not detailed