JarvisArt

JarvisArt is a multi-modal large language model (MLLM)-driven agent for intelligent photo retouching. It frees human creativity by understanding user intent, emulating the reasoning of professional artists, and coordinating over 200 tools in Adobe Lightroom.

One-Minute Overview

JarvisArt, accepted to NeurIPS 2025, is an intelligent photo retouching agent that controls 200+ professional tools through natural language. It allows users to perform professional-level photo editing by simply conversing with an AI agent, eliminating the need for expertise in complex editing software.

Core Value: Transforms complex professional photo editing into natural language interactions, dramatically lowering the barrier to professional retouching.

Getting Started

Installation Difficulty: Medium. Requires basic Python and machine learning knowledge; a complete Gradio demo and an online demo are provided.

# Gradio Demo setup
# For specific steps, please refer to the Gradio Demo section in the README

Is this suitable for my scenario?

✅ Professional photographers/retouchers: Automate complex retouching workflows to improve efficiency

✅ Photography enthusiasts: Achieve high-quality image adjustments without professional editing skills

✅ Image researchers: Useful for research in image processing and editing algorithms

❌ Commercial applications: Explicitly prohibited by the project license

Core Capabilities

1. Multi-granularity Retouching Control

Supports editing goals at multiple levels of granularity, from scene-level adjustments to region-specific refinements.
Actual Value: Users control the scope of each edit, combining global optimization with targeted local adjustments.
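To make the difference between scene-level and region-specific scope concrete, here is a minimal Python sketch that applies an exposure change either to the whole frame or only inside a bounding box. It is not JarvisArt's implementation: the grayscale list-of-lists image, the `apply_exposure` function, and the half-open `(x0, y0, x1, y1)` box convention are hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]        # hypothetical: grayscale luminance values in [0.0, 1.0]
Box = Tuple[int, int, int, int]  # hypothetical: (x0, y0, x1, y1), half-open pixel bounds


def apply_exposure(image: Image, stops: float, box: Optional[Box] = None) -> Image:
    """Scale luminance by 2**stops, either globally or only inside `box`."""
    gain = 2.0 ** stops
    height = len(image)
    width = len(image[0]) if height else 0
    # No box means a scene-level (global) edit covering the whole frame.
    x0, y0, x1, y1 = box if box is not None else (0, 0, width, height)
    result = [row[:] for row in image]  # copy so the input image is left untouched
    for y in range(max(0, y0), min(height, y1)):
        for x in range(max(0, x0), min(width, x1)):
            # Clip at 1.0 to mimic highlights saturating.
            result[y][x] = min(1.0, image[y][x] * gain)
    return result
```

Calling `apply_exposure(img, 1.0)` brightens every pixel by one stop, while `apply_exposure(img, 1.0, box=(0, 0, 2, 1))` leaves everything outside the box unchanged.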

2. Natural Language Interaction

Performs intuitive, free-form edits specified through text prompts and bounding boxes.
Actual Value: Turns professional retouching knowledge into natural language descriptions, lowering the barrier to use.
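One way to picture a single interaction turn is a text prompt paired with an optional region. The sketch below assumes a hypothetical `EditRequest` type whose box is given as fractions of the image size; it validates the box and converts it to pixel bounds. JarvisArt's actual input format may differ.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class EditRequest:
    """Hypothetical shape of one user turn: free-form text plus an optional region."""
    prompt: str
    # Box as fractions of image size: (left, top, right, bottom), each in [0, 1].
    box: Optional[Tuple[float, float, float, float]] = None

    def __post_init__(self) -> None:
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.box is not None:
            left, top, right, bottom = self.box
            if not (0.0 <= left < right <= 1.0 and 0.0 <= top < bottom <= 1.0):
                raise ValueError(f"invalid normalized box: {self.box}")

    def pixel_box(self, width: int, height: int) -> Optional[Tuple[int, int, int, int]]:
        """Convert the normalized box to integer pixel bounds for a given image size."""
        if self.box is None:
            return None  # no region: treat the request as a global edit
        left, top, right, bottom = self.box
        return (round(left * width), round(top * height),
                round(right * width), round(bottom * height))
```

Normalized coordinates keep a request independent of image resolution; conversion to pixels happens only once the target image size is known.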

3. Professional Tool Coordination

Coordinates over 200 professional tools in Adobe Lightroom to carry out retouching tasks.
Actual Value: Provides professional-grade editing capabilities without requiring users to master Lightroom's complex operations.
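The agent's side of tool coordination can be pictured as turning model output into bounded parameter changes. This is a minimal sketch under assumptions: the JSON list-of-calls format, the `plan_to_settings` helper, and the six-entry `TOOL_RANGES` table are illustrative only and cover a tiny fraction of the 200+ tools JarvisArt coordinates.

```python
import json
from typing import Dict, Tuple

# Hypothetical tool subset, named after Lightroom Basic-panel sliders and using
# those sliders' value ranges. Not JarvisArt's real tool list or naming.
TOOL_RANGES: Dict[str, Tuple[float, float]] = {
    "Exposure": (-5.0, 5.0),
    "Contrast": (-100.0, 100.0),
    "Highlights": (-100.0, 100.0),
    "Shadows": (-100.0, 100.0),
    "Vibrance": (-100.0, 100.0),
    "Saturation": (-100.0, 100.0),
}


def plan_to_settings(model_output: str) -> Dict[str, float]:
    """Parse a JSON list of {"tool": ..., "value": ...} calls into clamped settings.

    Unknown tool names are rejected; a later call to the same tool overrides an earlier one.
    """
    settings: Dict[str, float] = {}
    for call in json.loads(model_output):
        name, value = call["tool"], float(call["value"])
        if name not in TOOL_RANGES:
            raise KeyError(f"unknown tool: {name}")
        low, high = TOOL_RANGES[name]
        settings[name] = min(high, max(low, value))  # keep the value within the slider range
    return settings
```

For example, `plan_to_settings('[{"tool": "Exposure", "value": 0.7}, {"tool": "Shadows", "value": 140}]')` returns `{"Exposure": 0.7, "Shadows": 100.0}`, with the out-of-range Shadows value clamped. Clamping keeps a slightly overshooting plan usable, while rejecting unknown names catches tools the model invented.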

4. Innovative Training Framework

Employs a two-stage training framework: Chain-of-Thought supervised fine-tuning, followed by Group Relative Policy Optimization for Retouching (GRPO-R).
Actual Value: Aims to give the model professional-level reasoning and decision-making capabilities.
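The "group relative" part of GRPO refers to how advantages are computed: several responses are sampled for the same input, and each reward is normalized against the mean and standard deviation of its own group, A_i = (r_i - mean(r)) / std(r), so no separate critic model is needed. The Python sketch below shows only that standard GRPO step; the retouching-specific reward design that distinguishes GRPO-R is not described here and is not reproduced.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: normalize each reward against its own sample group.

    `rewards` holds the scalar rewards of G responses sampled for the same prompt.
    The group mean acts as the baseline, so no learned value function is required.
    `eps` avoids division by zero when all rewards in the group are equal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]
```

For rewards `[1.0, 2.0, 3.0]` the middle response gets an advantage of 0, the best a positive one, and the worst a negative one. If every reward in a group is equal, all advantages are 0 and the group provides no preference signal.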

5. Multi-scenario Adaptation

Supports a range of application scenarios, including global and local retouching.
Actual Value: Meets different editing needs, from overall style adjustments to local detail refinements.

Technical Stack & Integration

Development Language: Python (consult the repository for the exact dependency list)

Key Dependencies: Multi-modal LLM frameworks, Adobe Lightroom integration protocols

Integration Method: Protocol. JarvisArt provides an Agent-to-Lightroom Protocol for seamless integration with Adobe Lightroom.
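As a rough illustration of what a protocol between an agent and Lightroom has to handle, the sketch below defines a versioned JSON message carrying slider settings for one photo, with an encode/decode round trip. Everything here (`AdjustRequest`, its fields, `PROTOCOL_VERSION`) is hypothetical and is not the actual Agent-to-Lightroom Protocol; the protocol documentation in the repository describes the real message format.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Dict

PROTOCOL_VERSION = "0.1"  # hypothetical version string, not the real protocol's


@dataclass
class AdjustRequest:
    """Hypothetical agent-to-Lightroom message carrying slider settings for one photo."""
    photo_path: str
    settings: Dict[str, float]
    request_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    version: str = PROTOCOL_VERSION


def encode(msg: AdjustRequest) -> str:
    """Serialize a message to JSON with stable key order."""
    return json.dumps(asdict(msg), sort_keys=True)


def decode(raw: str) -> AdjustRequest:
    """Parse a JSON message, rejecting versions this side does not understand."""
    data = json.loads(raw)
    if data.get("version") != PROTOCOL_VERSION:
        raise ValueError(f"unsupported protocol version: {data.get('version')}")
    return AdjustRequest(**data)
```

The version check lets the receiving side refuse messages from an incompatible sender, and `request_id` lets replies be matched to the requests that produced them.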

Maintenance Status

Development Activity: Very active, with continuous releases from June to December 2025
Recent Updates: Released the MMArt-Bench dataset and training scripts in December 2025
Community Response: WeChat discussion groups are available for collecting user feedback

Commercial & Licensing

License: Apache License 2.0 (modified version)

❌ Commercial Use: Prohibited (explicitly forbidden)
✅ Modification: Allowed (under Apache 2.0 terms)
⚠️ Restrictions: Any commercial application requires explicit written permission from the authors

Documentation & Learning Resources

Documentation Quality: Comprehensive
Official Documentation: https://github.com/LYL1015/JarvisArt
Example Code: Complete (inference code, training scripts, data scripts, evaluation code)
Tutorial Resources: Gradio Demo, online demo, Agent-to-Lightroom Protocol documentation, training guide