A high-performance, end-to-end observability data pipeline built with Rust.
## Overview
Vector is a high-performance observability data pipeline built in Rust, distributed as a single static binary with no external dependencies. It operates seamlessly as both an Agent (daemon/sidecar) and an Aggregator, covering the entire data lifecycle from ingestion to delivery. Logs support is stable, Metrics is in Beta, and Traces is marked as "coming soon".
## Core Capabilities
- Performance & Reliability: Built in Rust for memory safety and multi-core concurrency; official benchmarks show throughput significantly outperforming Fluent Bit, Fluentd, Logstash, and Filebeat; disk-buffered persistence ensures zero data loss on crashes/restarts; rigorous correctness testing for file rotation, truncation, JSON wrapping, and process signal handling
- Data & Transformation: Unified Log and Metric data models; built-in Vector Remap Language (VRL) for complex parsing, manipulation, decoration, and PII redaction (e.g., the `redact` function); Enrichment Tables (in-memory or file-based) for data correlation
- Deployment & Topology: Single binary with no runtime dependencies; supports three deployment topologies: Distributed (agent-based), Centralized (aggregator-based), and Stream-based
- Configuration & Operations: YAML, TOML, and JSON formats with ytt/Jsonnet/CUE template tool compatibility; multi-file config merging and environment variable injection (e.g., `${DATADOG_API_KEY}`); built-in unit testing within config files to validate transform logic
- Ecosystem & Compatibility: Vendor-neutral with 47+ sources, 17+ transforms, and 61+ sinks; native `kubernetes_logs` source for deep K8s integration; multi-destination routing (e.g., simultaneous delivery to Elasticsearch for querying and S3 for archiving)
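As an illustrative sketch of PII redaction with the `redact` function (the transform id, input id, and SSN-style regex are assumptions, not from the source), a `remap` transform could scrub matching substrings anywhere in the event:

```yaml
transforms:
  scrub_pii:
    type: "remap"
    inputs: ["in"]   # assumed upstream source id
    source: |
      # Replace any SSN-style pattern (e.g. 123-45-6789) with [REDACTED]
      # across all fields of the event
      . = redact(., filters: [r'\d{3}-\d{2}-\d{4}'])
```

`redact` also accepts named filters (such as built-in PII patterns) in addition to raw regexes; consult the VRL function reference for the exact set available in your version.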
## Architecture Highlights
- Pipeline architecture: `Sources → Transforms → Sinks`, with explicit data-flow dependencies declared via `inputs` fields, forming a DAG
- Core built with Rust and Cargo; configuration schema validation extensively uses CUE (34.9% of the codebase)
- Protocol Buffers definitions in the `proto/` directory, with native gRPC support
- Four-layer testing: `benches/` (performance benchmarks), `regression/` (regression tests), `testing/` (integration tests), `tests/` (unit tests)
- Multiple Docker image variants (debian, distroless-libc, distroless-static, alpine); Tilt integration for local dev orchestration
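To make the `inputs`-based DAG wiring concrete, here is a hypothetical topology sketch (all ids and the sample rate are invented for illustration) in which one source flows through a parse step and a sampling step before reaching a sink:

```yaml
sources:
  app_logs:
    type: "stdin"

transforms:
  parse:
    type: "remap"
    inputs: ["app_logs"]          # edge: app_logs -> parse
    source: ". = parse_json!(.message)"
  sample:
    type: "sample"
    inputs: ["parse"]             # edge: parse -> sample
    rate: 10                      # keep roughly 1 in 10 events

sinks:
  out:
    type: "console"
    inputs: ["sample"]            # edge: sample -> out
    encoding:
      codec: "json"
```

Because every component names its upstream inputs explicitly, Vector can validate the graph (e.g., reject cycles) before starting the pipeline.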
## Typical Use Cases
| Scenario | Description |
|---|---|
| Cost reduction | Data sampling, compression, routing to low-cost storage (e.g., S3) |
| Vendor migration | Seamless switching between observability vendors without workflow disruption |
| Data quality | VRL-powered parsing and enrichment for improved analyzability |
| Agent consolidation | Replace multiple agents (Filebeat + Logstash + Metricbeat) with one tool |
| Kubernetes log collection | Native `kubernetes_logs` source for K8s environments |
| Audit & compliance | PII redaction before routing to backends |
| Multi-destination routing | Same source to Elasticsearch (short-term) and S3 (long-term archival) |
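The multi-destination pattern in the last row can be sketched as two sinks reading from the same input (the source id, endpoint, bucket name, and region below are placeholders, not from the source):

```yaml
sinks:
  search:
    type: "elasticsearch"
    inputs: ["app_logs"]            # assumed source id
    endpoints: ["http://localhost:9200"]
  archive:
    type: "aws_s3"
    inputs: ["app_logs"]            # same events, second destination
    bucket: "my-log-archive"        # placeholder bucket name
    region: "us-east-1"
    compression: "gzip"
    encoding:
      codec: "json"
```

Each sink receives its own copy of the event stream, so short-term search and long-term archival proceed independently.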
## Installation & Quick Start
Script install (recommended):
```shell
curl --proto '=https' --tlsv1.2 -sSfL https://sh.vector.dev | bash
```
Docker install:
```shell
docker pull timberio/vector:0.55.0-debian
```
Minimal example:
Create `vector.yaml`:
```yaml
sources:
  in:
    type: "stdin"

sinks:
  out:
    inputs:
      - "in"
    type: "console"
    encoding:
      codec: "text"
```
Run:
```shell
echo 'Hello world!' | vector --config vector.yaml
```
## Configuration Essentials
```yaml
# Global options
data_dir: "/var/lib/vector"

api:
  enabled: false

sources:
  <id>:
    type: <source_type>

transforms:
  <id>:
    type: <transform_type>
    inputs: ["<source_or_transform_id>"]

sinks:
  <id>:
    type: <sink_type>
    inputs: ["<source_or_transform_id>"]
```
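Two of the operational features mentioned earlier, environment variable injection and in-config unit tests, can be sketched together as follows (the `datadog_logs` sink, a remap transform named `parse` that JSON-parses `.message`, and all field values are illustrative assumptions):

```yaml
sinks:
  datadog:
    type: "datadog_logs"
    inputs: ["parse"]
    default_api_key: "${DATADOG_API_KEY}"   # injected from the environment at load time

tests:
  - name: "parse extracts severity"
    inputs:
      - insert_at: "parse"                  # feed a synthetic event into the transform
        type: "log"
        log_fields:
          message: '{"severity":"info"}'
    outputs:
      - extract_from: "parse"               # assert on the transform's output
        conditions:
          - type: "vrl"
            source: '.severity == "info"'
```

Such tests run with the `vector test` subcommand against the config file, without starting the full pipeline.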
VRL transform example:
```yaml
transforms:
  remap_syslog:
    type: "remap"
    inputs: ["generate_syslog"]
    source: |
      structured = parse_syslog!(.message)
      . = merge(., structured)
```
## Additional Notes
- Latest version: v0.55.0
- Primary languages: Rust (62.4%), CUE (34.9%)
- Supported platforms: Linux, macOS, Windows (x86_64, ARM64/v7)
- Licensed under MPL-2.0
- Maintained by Datadog Community Open Source Engineering team
- Officially cites its largest user as processing over 500 TB/day; enterprise users include Atlassian, T-Mobile, and Comcast (no public reference links provided)