
S3 Streamable Backup: Direct-to-Cloud Backups for Manticore Search

Since we introduced the backup tool in Manticore Search 6, backing up your data has become significantly easier. But we kept hearing the same question: "What about cloud storage?" Today, we're excited to announce that manticore-backup now supports S3-compatible storage with streaming uploads — no intermediate files, no local disk space headaches, just direct-to-cloud backups.

The Problem with Traditional Backups

When you're running Manticore Search in production, your datasets can grow quickly. Backing up to local storage has its limitations:

  • Disk space constraints: You need free space equal to your backup size on the same machine
  • Manual transfer steps: Backup locally, then upload to cloud storage
  • Time overhead: The copy-then-upload dance doubles your backup window
  • Complexity: Scripting reliable uploads with resume capability, encryption, and error handling

Streamable S3 Backup: How It Works

The new S3 storage support streams your backup data directly to S3-compatible storage. Here's what happens under the hood:

  1. No intermediate files: Data streams from Manticore straight to S3
  2. Automatic multipart uploads: Large files are automatically chunked and uploaded in parallel
  3. Built-in encryption: SSE-S3 encryption is enabled by default for AWS S3 (configurable for other providers)
  4. Compression support: Optional zstd compression reduces transfer time and storage costs
  5. Manifest-based restore: No s3:ListBucket permission required for restores

Supported Storage Providers

We've tested with AWS S3, MinIO, and Cloudflare R2, but the implementation uses the standard AWS SDK for PHP, so any storage that speaks the S3 API should work.

Usage

Using S3 backup is as simple as changing your destination path:

CLI

# Set your credentials
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_REGION=us-east-1

# Backup to S3
manticore-backup --config=/etc/manticore/manticore.conf --backup-dir=s3://my-bucket/manticore-backups

# With custom endpoint (MinIO, Wasabi, etc.)
export AWS_ENDPOINT_URL=https://minio.example.com
manticore-backup --config=/etc/manticore/manticore.conf --backup-dir=s3://my-bucket/backups

Environment Variables

Variable                 Description
AWS_ACCESS_KEY_ID        Your S3 access key
AWS_SECRET_ACCESS_KEY    Your S3 secret key
AWS_REGION               S3 region (e.g., us-east-1)
AWS_ENDPOINT_URL         Custom endpoint for S3-compatible storage
AWS_S3_ENCRYPTION        Set to 0 to disable SSE-S3 encryption (for MinIO/custom endpoints)

Performance Considerations

S3 streaming backup performance depends primarily on your network bandwidth and the S3 provider's upload speeds. Unlike local disk backups where you're limited by disk I/O, S3 backups are network-bound. The key advantage is eliminating the "write locally, then upload" overhead — data streams directly from Manticore to S3 without touching the local filesystem.

For optimal performance:

  • Ensure adequate upload bandwidth to your S3 endpoint
  • Consider using compression (--compress) to reduce data transfer
  • Multipart uploads are automatic for files over 5MB, improving reliability for large datasets
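As a rough sanity check on the network-bound claim above, you can estimate the backup window from dataset size and uplink speed. The figures below are illustrative assumptions, not benchmarks:

```python
# Back-of-the-envelope estimate of a streaming backup window.
dataset_gb = 100      # uncompressed backup size, GB (assumption)
uplink_gbps = 1.0     # sustained upload bandwidth, Gbit/s (assumption)

# Time to stream directly to S3, in minutes.
stream_minutes = dataset_gb * 8 / uplink_gbps / 60

# A copy-then-upload workflow pays a local write pass first, so the
# backup window is roughly doubled (ignoring disk speed differences).
copy_then_upload_minutes = 2 * stream_minutes

print(f"direct stream:    {stream_minutes:.1f} min")
print(f"copy then upload: {copy_then_upload_minutes:.1f} min")
```

With these assumptions, streaming finishes in about 13 minutes where copy-then-upload needs roughly twice that.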

Restore from S3

Restoring works seamlessly too. The tool downloads files to a temporary directory first, then performs the restore:

# List available backups
manticore-backup --backup-dir=s3://my-bucket/manticore-backups --list

# Restore a specific backup
manticore-backup --config=/etc/manticore/manticore.conf --backup-dir=s3://my-bucket/manticore-backups --restore=backup-20250115120000

Required S3 Permissions

For backup:

  • s3:PutObject
  • s3:PutObjectAcl (if using ACLs)

For listing backups:

  • s3:ListBucket

For restore:

  • s3:GetObject

Note: While listing backups requires s3:ListBucket, restoring a specific backup does not. If you know the backup folder name (e.g., backup-20250115120000), you can restore directly using --restore with just s3:GetObject permission. The manifest file tracks all backup contents, so no directory listing is needed.
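For AWS, the permissions above can be expressed as an IAM policy along these lines (a sketch; the bucket name my-bucket and the manticore-backups/ prefix are placeholders for your own values):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ManticoreBackupWrite",
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::my-bucket/manticore-backups/*"
    },
    {
      "Sid": "ManticoreBackupList",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::my-bucket"
    },
    {
      "Sid": "ManticoreBackupRestore",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::my-bucket/manticore-backups/*"
    }
  ]
}
```

If you only ever restore known backup names, you can drop the ListBucket statement entirely.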

Use Cases

Cloud-Native Deployments

Running Manticore in Kubernetes or Docker? S3 backup fits naturally into cloud-native workflows:

# Kubernetes CronJob example
apiVersion: batch/v1
kind: CronJob
metadata:
  name: manticore-backup
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: manticoresearch/manticore:latest
            command:
            - manticore-backup
            - --config=/etc/manticore/manticore.conf
            - --backup-dir=s3://my-backup-bucket/manticore
            env:
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: s3-credentials
                  key: access-key
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: s3-credentials
                  key: secret-key
          restartPolicy: OnFailure

Disaster Recovery

Store backups in a different region or even a different cloud provider:

# Primary backup to local S3-compatible storage
export AWS_ENDPOINT_URL=https://minio.internal.company.com
manticore-backup --backup-dir=s3://backups-primary/manticore

# Secondary backup to AWS S3 for DR
unset AWS_ENDPOINT_URL
export AWS_REGION=eu-west-1
manticore-backup --backup-dir=s3://company-dr-backups/manticore

Reducing Local Storage Requirements

For large datasets, local backup storage can be expensive. With S3 streaming:

  • No need to provision large backup volumes
  • Pay only for the S3 storage you use
  • Lifecycle policies can automatically move old backups to cheaper storage classes
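As an example of the last point, an S3 lifecycle configuration roughly like the following (a sketch; the prefix and day counts are placeholder values) moves aging backups to cheaper classes and eventually expires them:

```json
{
  "Rules": [
    {
      "ID": "age-out-manticore-backups",
      "Status": "Enabled",
      "Filter": { "Prefix": "manticore-backups/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
```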

Technical Details

Streaming Architecture

The S3 storage implementation uses a streaming approach:

  1. File-by-file streaming: Each table file is read and uploaded as a stream
  2. Automatic multipart: Files over 5MB automatically use multipart upload for reliability
  3. Compression on-the-fly: If enabled, zstd compression happens during the stream
  4. Checksum verification: Each file is checksummed to ensure integrity
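The tool itself is written in PHP, but the chunk-and-checksum idea behind steps 1, 2, and 4 can be sketched in a few lines of Python (the part size and helper names here are illustrative, not the tool's actual internals):

```python
import hashlib
import io

PART_SIZE = 5 * 1024 * 1024  # illustrative multipart part size (5MB)

def stream_parts(fileobj, part_size=PART_SIZE):
    """Yield (part_number, chunk) pairs without buffering the whole file."""
    part_number = 1
    while True:
        chunk = fileobj.read(part_size)
        if not chunk:
            break
        yield part_number, chunk
        part_number += 1

def upload_stream(fileobj, upload_part):
    """Feed each part to an uploader callback; return a whole-file checksum."""
    digest = hashlib.sha256()
    for part_number, chunk in stream_parts(fileobj):
        digest.update(chunk)             # checksum computed on the fly
        upload_part(part_number, chunk)  # e.g. an S3 UploadPart call
    return digest.hexdigest()

# Demo with an in-memory "file": 12MB splits into three parts of at most 5MB.
data = b"x" * (12 * 1024 * 1024)
parts = []
checksum = upload_stream(io.BytesIO(data), lambda n, c: parts.append((n, len(c))))
print(len(parts), checksum == hashlib.sha256(data).hexdigest())
```

Because each chunk is hashed and handed off as it is read, peak memory stays at one part size regardless of how large the table file is.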

Storage Interface

The S3 support is built on a new StorageInterface that abstracts storage operations. This means:

  • Local filesystem and S3 share the same code path
  • Future storage backends (GCS, Azure Blob) can be added easily
  • Consistent behavior regardless of storage type
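The actual StorageInterface is PHP, but the shape of the abstraction can be sketched in Python (method names here are illustrative, not the real interface):

```python
import os
import shutil
import tempfile
from abc import ABC, abstractmethod

class Storage(ABC):
    """Minimal storage abstraction: local disk and S3 share one code path."""

    @abstractmethod
    def put(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, path: str) -> bytes: ...

class LocalStorage(Storage):
    def __init__(self, root: str):
        self.root = root

    def put(self, path: str, data: bytes) -> None:
        full = os.path.join(self.root, path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "wb") as f:
            f.write(data)

    def get(self, path: str) -> bytes:
        with open(os.path.join(self.root, path), "rb") as f:
            return f.read()

# An S3Storage class would implement the same two methods with
# PutObject/GetObject calls; backup code never knows which backend it got.
root = tempfile.mkdtemp()
store: Storage = LocalStorage(root)
store.put("backup-20250115120000/manifest.json", b"{}")
roundtrip = store.get("backup-20250115120000/manifest.json")
shutil.rmtree(root)
print(roundtrip)
```

Adding a GCS or Azure Blob backend then means implementing one small class rather than touching the backup logic.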

Migration from Local Backups

Already using local backups? Migration is straightforward:

  1. Set up your S3 credentials
  2. Change --backup-dir from /local/path to s3://bucket/path
  3. That's it! The same commands work exactly the same way

Your existing local backups remain accessible, and you can gradually transition to S3 or maintain both for redundancy.

Conclusion

S3 streamable backup brings Manticore Search backup capabilities to the cloud era. Whether you're running in a cloud-native environment, need cross-region disaster recovery, or simply want to reduce local storage overhead, direct-to-S3 streaming makes backups simpler and more efficient.

The feature is available now in manticore-backup. Check out the documentation for more details, and let us know what you think!


Ready to try it? Install Manticore Search and start backing up to S3 today. Questions or feedback? Join us on Slack or GitHub.

Install Manticore Search