Migrating large, complex datasets from Hugging Face to Couchbase just got easier. Whether you prefer working from the command line or within your favorite IDE, our new Hugging Face to Couchbase migration toolkit, which pairs the Couchbase VS Code Extension with the cbmigrate CLI, simplifies and streamlines the process.
Introducing the Hugging Face-to-Couchbase migration tools
Our toolkit for migrating Hugging Face datasets to Couchbase includes two core components, optimized for different development workflows:
- Couchbase VS Code Extension
- CLI Tool (cbmigrate hugging-face)
Whether you’re a command-line purist or an IDE enthusiast, we have you covered!
1. Couchbase VS Code extension
Prefer a graphical interface within VS Code? Our Couchbase extension enables seamless dataset migration directly from your IDE.
Key features
- Integrated migration: Perform dataset migrations within VS Code, streamlining your workflow.
- User-friendly interface: Use a graphical UI to configure and monitor data migration processes effortlessly.
For more details, refer to the Couchbase VS Code plugin repository.
2. Command-line tool: cbmigrate hugging-face
For developers who prefer the terminal, the hugging-face subcommand of cbmigrate provides a powerful and efficient way to migrate Hugging Face datasets to Couchbase.
Key features
- Easy dataset exploration: List configurations, splits, and fields within Hugging Face datasets before committing to a migration.
- Flexible migration: Supports streaming, batch processing, and custom document ID generation.
- Security & privacy: Supports authentication for private datasets and secure Couchbase connections.
Usage
List dataset configurations
cbmigrate hugging-face list-configs --path <DATASET_PATH_OR_NAME>
List dataset splits
cbmigrate hugging-face list-splits --path <DATASET_PATH_OR_NAME>
List dataset fields
cbmigrate hugging-face list-fields --path <DATASET_PATH_OR_NAME>
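The three listing commands above are typically run one after another before a migration, so it can be convenient to chain them. The following is a minimal sketch, not part of cbmigrate itself: the `explore_dataset` function and the `CBMIGRATE` variable are hypothetical names introduced here, and only the subcommands and the `--path` flag shown above are used.

```shell
# Sketch: run the three listing subcommands in order for one dataset.
# CBMIGRATE is a hypothetical override (not read by cbmigrate) that lets the
# binary be swapped out, for example with `echo` to preview the commands.
CBMIGRATE="${CBMIGRATE:-cbmigrate}"

explore_dataset() {
  # $1: dataset path or name, passed through to --path unchanged.
  if [ "$#" -ne 1 ]; then
    echo "usage: explore_dataset DATASET_PATH_OR_NAME" >&2
    return 2
  fi
  for sub in list-configs list-splits list-fields; do
    echo "== ${sub} =="
    # Stop at the first failing subcommand.
    "$CBMIGRATE" hugging-face "$sub" --path "$1" || return 1
  done
}

# Usage (requires cbmigrate on PATH):
#   explore_dataset glue
```

Setting `CBMIGRATE=echo` before calling the function prints the commands instead of running them, which is a cheap way to check the arguments first.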
Migrate dataset to Couchbase
cbmigrate hugging-face migrate --path <DATASET_PATH_OR_NAME> --id-fields <FIELD1,FIELD2,...> \
  --cb-url couchbase://<HOST> --cb-username <USERNAME> --cb-password <PASSWORD> \
  --cb-bucket <BUCKET_NAME> --cb-scope <SCOPE_NAME> --cb-collection <COLLECTION_NAME>
Example: simple migration of a public dataset
cbmigrate hugging-face migrate --path glue --split train --id-fields idx \
  --cb-url couchbase://localhost --cb-username Administrator --cb-password password \
  --cb-bucket sample_bucket --cb-scope sample_scope --cb-collection sample_collection
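The example above types the Couchbase password directly on the command line, where it ends up in shell history. One possible way around that is a small wrapper that takes the password from the environment. This is a sketch under stated assumptions: `migrate_dataset`, `CBMIGRATE`, `CB_URL`, `CB_USERNAME`, and `CB_PASSWORD` are hypothetical names chosen for this example and are not read by cbmigrate itself. Only flags that appear in the commands above are used.

```shell
# Sketch: wrap the migrate subcommand so the password comes from the
# environment instead of being typed inline.
# All variable names here are hypothetical; cbmigrate does not read them.
CBMIGRATE="${CBMIGRATE:-cbmigrate}"

migrate_dataset() {
  # Arguments: dataset path, split, ID fields, bucket, scope, collection.
  if [ "$#" -ne 6 ]; then
    echo "usage: migrate_dataset PATH SPLIT ID_FIELDS BUCKET SCOPE COLLECTION" >&2
    return 2
  fi
  # Refuse to run without a password rather than passing an empty one.
  if [ -z "${CB_PASSWORD:-}" ]; then
    echo "CB_PASSWORD is not set" >&2
    return 1
  fi
  "$CBMIGRATE" hugging-face migrate \
    --path "$1" --split "$2" --id-fields "$3" \
    --cb-url "${CB_URL:-couchbase://localhost}" \
    --cb-username "${CB_USERNAME:-Administrator}" \
    --cb-password "$CB_PASSWORD" \
    --cb-bucket "$4" --cb-scope "$5" --cb-collection "$6"
}

# Usage (requires cbmigrate on PATH and CB_PASSWORD exported beforehand):
#   migrate_dataset glue train idx sample_bucket sample_scope sample_collection
```

The wrapper keeps the password out of shell history and script files, but it is still handed to cbmigrate as a command-line argument, so it can remain visible in the process list while the migration runs.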
For more information, visit the cbmigrate GitHub repository.