# S3 Streamable Backup: Direct-to-Cloud Backups for Manticore Search

Since we introduced the [backup tool](/blog/new-backup-and-recovery-approaches/) in Manticore Search 6, backing up your data has become significantly easier. But we kept hearing the same question: *"What about cloud storage?"* Today, we're excited to announce that **manticore-backup** now supports **S3-compatible storage** with streaming uploads — no intermediate files, no local disk space headaches, just direct-to-cloud backups.

## The Problem with Traditional Backups

When you're running Manticore Search in production, your datasets can grow quickly. Backing up to local storage has its limitations:

- **Disk space constraints**: You need free space at least equal to your backup size on the same machine
- **Manual transfer steps**: Backup locally, then upload to cloud storage
- **Time overhead**: The copy-then-upload dance doubles your backup window
- **Complexity**: You have to script reliable uploads yourself, including resume capability, encryption, and error handling

## Streamable S3 Backup: How It Works

The new S3 storage support streams your backup data **directly** to S3-compatible storage. Here's what happens under the hood:

1. **No intermediate files**: Data streams from Manticore straight to S3
2. **Automatic multipart uploads**: Large files are automatically chunked and uploaded in parallel
3. **Built-in encryption**: SSE-S3 encryption is enabled by default for AWS S3 (configurable for other providers)
4. **Compression support**: Optional zstd compression reduces transfer time and storage costs
5. **Manifest-based restore**: No `s3:ListBucket` permission required for restores

### Supported Storage Providers

We've tested with **AWS S3**, **MinIO**, and **Cloudflare R2**. The implementation uses the standard AWS SDK for PHP, so any other storage that speaks the S3 API should work as well.

## Usage

Using S3 backup is as simple as changing your destination path:

### CLI

```bash
# Set your credentials
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_REGION=us-east-1

# Backup to S3
manticore-backup --config=/etc/manticore/manticore.conf --backup-dir=s3://my-bucket/manticore-backups

# With custom endpoint (MinIO, Wasabi, etc.)
export AWS_ENDPOINT_URL=https://minio.example.com
manticore-backup --config=/etc/manticore/manticore.conf --backup-dir=s3://my-bucket/backups
```

### Environment Variables

| Variable | Description |
|----------|-------------|
| `AWS_ACCESS_KEY_ID` | Your S3 access key |
| `AWS_SECRET_ACCESS_KEY` | Your S3 secret key |
| `AWS_REGION` | S3 region (e.g., `us-east-1`) |
| `AWS_ENDPOINT_URL` | Custom endpoint for S3-compatible storage |
| `AWS_S3_ENCRYPTION` | Set to `0` to disable SSE-S3 encryption (for MinIO/custom endpoints) |
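
If you wrap the tool in a script, a small preflight check can catch a missing variable before the backup starts. The following is a minimal sketch, not part of manticore-backup: `check_s3_env` is a hypothetical helper, and treating these three variables as mandatory is an assumption based on the CLI example above (`AWS_ENDPOINT_URL` and `AWS_S3_ENCRYPTION` are left optional).

```bash
#!/usr/bin/env bash
# check_s3_env (hypothetical helper)
# Prints "ok" when all assumed-mandatory variables are set,
# otherwise prints the missing names and returns 1.
check_s3_env() {
  local missing=()
  local name
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION; do
    # ${!name:-} is bash indirect expansion: the value of the variable named by $name
    if [ -z "${!name:-}" ]; then
      missing+=("$name")
    fi
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    echo "missing: ${missing[*]}"
    return 1
  fi
  echo "ok"
}
```

In a wrapper script you might call `check_s3_env || exit 1` right before invoking `manticore-backup`.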

## Performance Considerations

S3 streaming backup performance depends primarily on your network bandwidth and the S3 provider's upload speeds. Unlike local disk backups, which are limited by disk I/O, S3 backups are network-bound. The key advantage is eliminating the "write locally, then upload" overhead: data is read from the table files and streamed directly to S3, with no intermediate backup copy written to the local filesystem.

For optimal performance:
- Ensure adequate upload bandwidth to your S3 endpoint
- Consider using compression (`--compress`) to reduce data transfer
- Multipart uploads are automatic for files over 5 MB, improving reliability for large datasets
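
Since the backup is network-bound, a rough estimate of the backup window is simple arithmetic. The sketch below is illustrative only: both helper names are hypothetical, the estimate ignores protocol overhead and provider throttling, and the 5 MiB part size is an assumption borrowed from the multipart threshold above rather than a documented setting of the tool.

```bash
#!/usr/bin/env bash
# Back-of-the-envelope estimates for a network-bound backup.
# Both helpers are hypothetical and use integer arithmetic only.

# estimate_upload_seconds SIZE_BYTES LINK_MBIT_PER_S
# Ideal transfer time: bits to send divided by bits per second.
estimate_upload_seconds() {
  local size_bytes="$1" mbit="$2"
  echo $(( size_bytes * 8 / (mbit * 1000000) ))
}

# multipart_parts SIZE_BYTES PART_SIZE_BYTES
# Number of parts needed for one file, rounded up.
multipart_parts() {
  local size_bytes="$1" part_bytes="$2"
  echo $(( (size_bytes + part_bytes - 1) / part_bytes ))
}

# Example: 100 GB of table files over a 1 Gbit/s uplink
estimate_upload_seconds 100000000000 1000   # prints 800
# Example: a 1 GiB file split into 5 MiB parts
multipart_parts 1073741824 5242880          # prints 205
```

S3 itself caps a multipart upload at 10,000 parts with a 5 MiB minimum part size, so 5 MiB parts only cover files up to roughly 48.8 GiB; anything larger needs bigger parts.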

## Restore from S3

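Restoring is just as simple, but it is not streamed: the tool first downloads the backup files to a temporary directory, then performs the restore, so make sure the machine has enough free disk space for the downloaded backup: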

```bash
# List available backups
manticore-backup --backup-dir=s3://my-bucket/manticore-backups --list

# Restore a specific backup
manticore-backup --config=/etc/manticore/manticore.conf --backup-dir=s3://my-bucket/manticore-backups --restore=backup-20250115120000
```

### Required S3 Permissions

**For backup:**
- `s3:PutObject`
- `s3:PutObjectAcl` (if using ACLs)

**For listing backups:**
- `s3:ListBucket`

**For restore:**
- `s3:GetObject`

**Note:** While listing backups requires `s3:ListBucket`, restoring a specific backup does not. If you know the backup folder name (e.g., `backup-20250115120000`), you can restore directly using `--restore` with just `s3:GetObject` permission. The manifest file tracks all backup contents, so no directory listing is needed.
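
On AWS, the permissions above map onto an IAM policy with two statements: object-level actions apply to `arn:aws:s3:::bucket/*`, while `s3:ListBucket` applies to the bucket ARN itself. The helper below is a hedged sketch that prints such a policy; `print_backup_policy` and the `Sid` values are hypothetical names, `s3:PutObjectAcl` is left out on the assumption that ACLs are not in use, and other providers may use a different permission model.

```bash
#!/usr/bin/env bash
# print_backup_policy BUCKET (hypothetical helper)
# Emits an IAM policy document granting the permissions listed above.
print_backup_policy() {
  local bucket="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BackupAndRestoreObjects",
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject"],
      "Resource": "arn:aws:s3:::${bucket}/*"
    },
    {
      "Sid": "ListBackups",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::${bucket}"
    }
  ]
}
EOF
}
```

For example, `print_backup_policy my-bucket > policy.json` writes a policy for the bucket used in the examples. For a restore-only role that always passes an explicit `--restore` name, the `ListBackups` statement and `s3:PutObject` could be dropped.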

## Use Cases

### Cloud-Native Deployments

Running Manticore in Kubernetes or Docker? S3 backup fits naturally into cloud-native workflows:

```yaml
# Kubernetes CronJob example
apiVersion: batch/v1
kind: CronJob
metadata:
  name: manticore-backup
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: manticoresearch/manticore:latest
            command:
            - manticore-backup
            - --config=/etc/manticore/manticore.conf
            - --backup-dir=s3://my-backup-bucket/manticore
            env:
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: s3-credentials
                  key: access-key
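            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: s3-credentials
                  key: secret-key
            # Region, as set in the CLI example; adjust to your bucket's region
            - name: AWS_REGION
              value: us-east-1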
          restartPolicy: OnFailure
```

### Disaster Recovery

Store backups in a different region or even a different cloud provider:

```bash
# Primary backup to local S3-compatible storage
export AWS_ENDPOINT_URL=https://minio.internal.company.com
manticore-backup --backup-dir=s3://backups-primary/manticore

# Secondary backup to AWS S3 for DR (export that account's credentials first if they differ)
unset AWS_ENDPOINT_URL
export AWS_REGION=eu-west-1
manticore-backup --backup-dir=s3://company-dr-backups/manticore
```

### Reducing Local Storage Requirements

For large datasets, local backup storage can be expensive. With S3 streaming:

- No need to provision large backup volumes
- Pay only for the S3 storage you use
- Lifecycle policies can automatically move old backups to cheaper storage classes
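
As an illustration of the lifecycle point, the sketch below writes an S3 lifecycle configuration that tiers objects under a backup prefix into cheaper storage classes as they age. `write_lifecycle_rules` is a hypothetical helper, and the rule ID, day counts, and storage classes are example values rather than recommendations.

```bash
#!/usr/bin/env bash
# write_lifecycle_rules PREFIX OUTFILE (hypothetical helper)
# Writes a lifecycle configuration that moves objects under PREFIX
# to cheaper storage classes after 30 and 90 days.
write_lifecycle_rules() {
  local prefix="$1" outfile="$2"
  cat > "$outfile" <<EOF
{
  "Rules": [
    {
      "ID": "tier-old-manticore-backups",
      "Status": "Enabled",
      "Filter": { "Prefix": "${prefix}" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
EOF
}

# Apply with the AWS CLI (not run here):
#   write_lifecycle_rules "manticore-backups/" lifecycle.json
#   aws s3api put-bucket-lifecycle-configuration \
#     --bucket my-bucket \
#     --lifecycle-configuration file://lifecycle.json
```

Keep in mind that objects moved to `GLACIER` must be restored on the S3 side before they can be read again, so a backup archived this way cannot be downloaded for a restore until that step has completed.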

## Technical Details

### Streaming Architecture

The S3 storage implementation uses a streaming approach:

1. **File-by-file streaming**: Each table file is read and uploaded as a stream
2. **Automatic multipart**: Files over 5 MB automatically use multipart upload for reliability
3. **Compression on-the-fly**: If enabled, zstd compression happens during the stream
4. **Checksum verification**: Each file is checksummed to ensure integrity
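
To make the streaming idea concrete, here is a rough shell analogue of steps 1 and 3. It is not how manticore-backup is implemented (the tool uses the AWS SDK for PHP); it substitutes the `aws` and `zstd` command-line tools, the function names and the `COMPRESS` switch are hypothetical, and the checksum step is left out.

```bash
#!/usr/bin/env bash
set -o pipefail  # a failed compressor should fail the whole pipeline

# upload_stream DEST_URL
# Sends stdin to S3. The AWS CLI treats "-" as "read from stdin".
upload_stream() {
  aws s3 cp - "$1"
}

# stream_file SRC DEST_URL
# Streams one file to S3; with COMPRESS=1 it is zstd-compressed in flight,
# so no compressed copy is ever written to local disk.
stream_file() {
  local src="$1" dest="$2"
  if [ "${COMPRESS:-0}" = "1" ]; then
    zstd -q -c -- "$src" | upload_stream "${dest}.zst"
  else
    upload_stream "$dest" < "$src"
  fi
}

# stream_tables SRC_DIR DEST_PREFIX
# Walks a directory and streams every regular file in it, one at a time.
stream_tables() {
  local dir="$1" prefix="$2" f
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    stream_file "$f" "${prefix}/$(basename "$f")" || return 1
  done
}
```

With the AWS CLI specifically, streams larger than 50 GB also need the `--expected-size` option so the CLI can pick a suitable part size.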

### Storage Interface

The S3 support is built on a new `StorageInterface` that abstracts storage operations. This means:

- Local filesystem and S3 share the same code path
- Future storage backends (GCS, Azure Blob) can be added easily
- Consistent behavior regardless of storage type
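
One way to picture the abstraction is a dispatch on the destination string: everything after that point talks to "a storage" without caring which one. The sketch below only illustrates that idea in shell; the function names are hypothetical, and the real `StorageInterface` is PHP code whose methods are not shown in this post.

```bash
#!/usr/bin/env bash
# storage_backend_for DEST (hypothetical helper)
# Prints which backend a --backup-dir value would select in this sketch.
storage_backend_for() {
  case "$1" in
    s3://*) echo "s3" ;;
    *)      echo "local" ;;
  esac
}

# parse_s3_url URL (hypothetical helper)
# Splits s3://bucket/prefix into "bucket prefix" (prefix may be empty).
parse_s3_url() {
  local rest="${1#s3://}"      # strip the scheme
  local bucket="${rest%%/*}"   # everything up to the first slash
  local prefix=""
  case "$rest" in
    */*) prefix="${rest#*/}" ;;  # everything after the first slash
  esac
  echo "$bucket $prefix"
}
```

For instance, `storage_backend_for s3://my-bucket/manticore-backups` selects the S3 backend, and `parse_s3_url` splits the same value into the bucket `my-bucket` and the prefix `manticore-backups`.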

## Migration from Local Backups

Already using local backups? Migration is straightforward:

1. Set up your S3 credentials
2. Change `--backup-dir` from `/local/path` to `s3://bucket/path`
3. That's it! The same commands work unchanged

Your existing local backups remain accessible, and you can gradually transition to S3 or maintain both for redundancy.

## Conclusion

S3 streamable backup brings Manticore Search backup capabilities to the cloud era. Whether you're running in a cloud-native environment, need cross-region disaster recovery, or simply want to reduce local storage overhead, direct-to-S3 streaming makes backups simpler and more efficient.

The feature is available now in manticore-backup. Check out the [documentation](https://manual.manticoresearch.com/Securing_and_compacting_a_table/Backup_and_restore#S3-storage-support) for more details, and let us know what you think!

---

**Ready to try it?** [Install Manticore Search](/install/) and start backing up to S3 today. Questions or feedback? Join us on [Slack](https://slack.manticoresearch.com/) or [GitHub](https://github.com/manticoresoftware/manticoresearch-backup).
