JarvisArt is a multi-modal large language model (MLLM)-driven agent for intelligent photo retouching. It liberates human creativity by understanding user intent, mimicking professional artist reasoning, and coordinating over 200 tools in Adobe Lightroom.
One-Minute Overview
JarvisArt, accepted to NeurIPS 2025, is an intelligent photo retouching agent that controls 200+ professional tools through natural language. It allows users to perform professional-level photo editing by simply conversing with an AI agent, eliminating the need for expertise in complex editing software.
Core Value: Transforms complex professional photo editing into natural language interactions, dramatically lowering the barrier to professional retouching.
Getting Started
Installation Difficulty: Medium. Requires basic Python and machine learning knowledge, but the project provides a complete Gradio demo and an online demo.
# Gradio Demo setup
# For specific steps, please refer to the Gradio Demo section in the README
Is this suitable for my scenario?
- ✅ Professional photographers/retouchers: Automate complex retouching workflows to improve efficiency
- ✅ Photography enthusiasts: Achieve high-quality image adjustments without professional editing skills
- ✅ Image researchers: Useful for research in image processing and editing algorithms
- ❌ Commercial applications: Prohibited by the project license unless the authors grant explicit written permission
Core Capabilities
1. Multi-granularity Retouching Control
- Supports editing goals at various levels, from scene-level adjustments to region-specific refinements
Actual Value: Users can control the editing scope flexibly, balancing global optimization against local adjustments
2. Natural Language Interaction
- Perform intuitive, free-form edits through text prompts and bounding boxes
Actual Value: Transforms professional retouching knowledge into natural language descriptions, lowering usage barriers
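As a rough illustration of what a text prompt plus an optional bounding box could look like in code, here is a minimal Python sketch. The `EditRequest` class, its field names, and the normalized `(x_min, y_min, x_max, y_max)` box convention are assumptions made for this example, not JarvisArt's actual input format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical box convention: (x_min, y_min, x_max, y_max), normalized to [0, 1].
BBox = Tuple[float, float, float, float]


@dataclass(frozen=True)
class EditRequest:
    """Hypothetical request: a free-form prompt and an optional region."""

    prompt: str
    region: Optional[BBox] = None  # None means a global (scene-level) edit

    def __post_init__(self) -> None:
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.region is not None:
            x0, y0, x1, y1 = self.region
            # The box must lie inside the image and have positive area.
            if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
                raise ValueError(f"invalid bounding box: {self.region}")

    @property
    def scope(self) -> str:
        """Return 'global' for whole-image edits, 'local' for region edits."""
        return "global" if self.region is None else "local"

    def to_pixels(self, width: int, height: int) -> Optional[Tuple[int, int, int, int]]:
        """Convert the normalized box to pixel coordinates for a given image size."""
        if self.region is None:
            return None
        x0, y0, x1, y1 = self.region
        return (round(x0 * width), round(y0 * height),
                round(x1 * width), round(y1 * height))


request = EditRequest("brighten the face", region=(0.25, 0.1, 0.75, 0.5))
print(request.scope)                # local
print(request.to_pixels(400, 200))  # (100, 20, 300, 100)
```

Normalized coordinates keep the request independent of image resolution, which is one reasonable design choice among several.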
3. Professional Tool Coordination
- Coordinates over 200 professional tools in Adobe Lightroom to execute retouching tasks
Actual Value: Access to professional-grade editing capabilities without needing to master complex Lightroom operations
4. Innovative Training Framework
- Employs a two-stage training framework: Chain-of-Thought supervised fine-tuning, followed by Group Relative Policy Optimization for Retouching (GRPO-R)
Actual Value: Trains the model toward professional-level reasoning and decision-making
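The "group relative" idea can be illustrated with the advantage computation used in standard GRPO: several candidate outputs are sampled for the same input, each is scored, and every score is standardized against its own group. The sketch below shows only that step. The function name and the reward values are made up for illustration, and the retouching-specific rewards and other details of GRPO-R are not described here, so this should not be read as the project's training code.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize rewards within one group of samples for the same input.

    Each advantage is (reward - group mean) / (group std + eps).
    Population standard deviation is used here; implementations differ
    on this detail.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative rewards for three candidate edits of the same photo.
advantages = group_relative_advantages([1.0, 2.0, 3.0])
# The candidate scoring below the group mean gets a negative advantage,
# the one above the mean gets a positive advantage.
print(advantages)
```

Because the baseline is the group's own mean, this step needs no separate learned value function; when every candidate in a group receives the same reward, all advantages are zero.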
5. Multi-scenario Adaptation
- Supports various application scenarios including global and local retouching
Actual Value: Meets different editing needs, from overall style adjustments to local detail refinements
Technical Stack & Integration
Development Languages: Python (specific dependencies are not listed here; check the repository)
Key Dependencies: Multi-modal LLM frameworks, Adobe Lightroom integration protocols
Integration Method: Protocol. The project provides an Agent-to-Lightroom Protocol for integration with Adobe Lightroom
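To make the idea of an agent-to-editor protocol concrete, the sketch below serializes an ordered list of tool operations into a single JSON message and reads it back. The actual Agent-to-Lightroom Protocol message format is not described here, so the `build_message` and `parse_message` functions, the `image`, `operations`, `tool`, and `value` field names, and the tool names themselves are all hypothetical.

```python
import json
from typing import Any, Dict, List


def build_message(image_path: str, operations: List[Dict[str, Any]]) -> str:
    """Serialize an ordered list of tool operations into one JSON message.

    Hypothetical format: every operation needs a 'tool' name and a 'value'.
    """
    for op in operations:
        if "tool" not in op or "value" not in op:
            raise ValueError(f"operation needs 'tool' and 'value': {op}")
    payload = {"image": image_path, "operations": operations}
    return json.dumps(payload, sort_keys=True)


def parse_message(message: str) -> List[Dict[str, Any]]:
    """Recover the ordered operation list on the receiving side."""
    return json.loads(message)["operations"]


# Illustrative tool names, not real Lightroom parameter identifiers.
message = build_message(
    "photo.jpg",
    [{"tool": "exposure", "value": 0.3}, {"tool": "contrast", "value": 10}],
)
for op in parse_message(message):
    print(op["tool"], op["value"])
```

Keeping the operations in a list preserves the order in which the agent intends them to be applied.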
Maintenance Status
- Development Activity: Very active, with continuous releases from June to December 2025
- Recent Updates: Released MMArt-Bench dataset and training scripts in December 2025
- Community Response: The project runs WeChat discussion groups to collect user feedback
Commercial & Licensing
License: Apache License 2.0 (modified version)
- ❌ Commercial Use: Prohibited without explicit written permission from the authors
- ✅ Modification: Allowed (under Apache 2.0 terms)
- ⚠️ Restrictions: Any commercial application requires explicit written permission from the authors
Documentation & Learning Resources
- Documentation Quality: Comprehensive
- Official Documentation: https://github.com/LYL1015/JarvisArt
- Example Code: Complete (inference code, training scripts, data scripts, evaluation code)
- Tutorial Resources: Gradio Demo, online demo, Agent-to-Lightroom Protocol documentation, training guide