I’m architecting a new Couchbase database that will run on Capella w/Sync Gateway for multiclient synchronization. I would appreciate advice on document structure design, specifically regarding the performance implications of nested vs flat documents.
Context
- Infrastructure: Couchbase Capella with Sync Gateway
- Access Pattern: Multiple clients reading/writing through app endpoints
- Data Visibility: Mix of public data (all users) and user-specific channels
- Load Profile:
- Reads: Burst of requests on app load, then <1 request/second
- Writes: Infrequent, just a few during app lifecycle
- Constraints: No strict latency or storage requirements
Question
I’m debating between two document structures and want to understand the performance trade-offs. For the example doc we are frequently filtering/sorting by timestamp fields and therefore indexing the individual timestamp fields. Metadata fields are not indexed or commonly accessed by themselves/outside of fetching the entire document.
Option 1: Flat Structure
{
"id": "28d54c37-18ra-4f39-8a41”,
“displayTitle: "Robot or human?",
"canonicalTopicId”: "6e124bcb-7caa-4fr66-8cdf",
"created": "2025-06-21T20:27:27.298Z",
"customTitle": null,
"lastEdited": "2025-06-21T20:27:27.298Z",
"lastInteraction": "2025-06-21T20:27:27.298Z",
"metadataOriginalURL": "https://www.walmart.com/search?q=Sunglasses&sort=price_low",
"metadataRemoteVideoURL": null,
"metadataTitle": "Robot or human?",
"metadataURL": "https://www.walmart.com/blocked?url=...",
"user_id": "foo",
"previewURL": "https://cdn.watercap.com/links_screenshots/loading.png",
"tagIds": ["0a990dda-f0ec-4534b-84c9"],
"url": "https://www.walmart.com/blocked?url=..."
}
Option 2: Nested Structure
{
"id": "28d54c37-18ra-4f39-8a41",
"canonicalTopicId”: "6e124bcb-7caa-4fr66-8cdf",
"customTitle": null,
"displayTitle": "Robot or human?",
“user_id”: "foo",
"url": "https://www.walmart.com/blocked?url=...",
"previewURL": "https://cdn.watercap.com/links_screenshots/loading.png",
"tagIds": ["0a990dda-f0ec-4534b-84c9"],
"timeStamps": {
"lastEdited": "2025-06-21T20:27:27.298Z",
"lastInteraction": "2025-06-21T20:27:27.298Z",
"created": "2025-06-21T20:27:27.298Z"
},
"metadata": {
"metadataOriginalURL": "https://www.walmart.com/search?q=Sunglasses&sort=price_low",
"metadataRemoteVideoURL": null,
"metadataTitle": "Robot or human?",
"metadataURL": "https://www.walmart.com/blocked?url=..."
}
}
Specific Questions
-
What are the hidden performance costs of nesting in terms of:
- Query speed (especially with indexed nested fields)
- Document size overhead
- Index building/parsing performance
-
Is there a general rule of thumb for when nesting makes sense vs when to keep things flat? I prefer cleaner structure but not at significant performance cost.
-
For future proofing, I want to avoid restructuring if I need to add indexes to currently non-indexed fields (like metadata). Does nesting create limitations here?
I understand some overhead (bytes, not megabytes) is acceptable. I’m looking for the right balance between clean architecture and performance.
Thank you