Couchbase Data Structure Design: Flat vs Nested Documents?

I’m architecting a new Couchbase database that will run on Capella w/Sync Gateway for multiclient synchronization. I would appreciate advice on document structure design, specifically regarding the performance implications of nested vs flat documents.

Context

  • Infrastructure: Couchbase Capella with Sync Gateway
  • Access Pattern: Multiple clients reading/writing through app endpoints
  • Data Visibility: Mix of public data (all users) and user-specific channels
  • Load Profile:
    • Reads: Burst of requests on app load, then <1 request/second
    • Writes: Infrequent, just a few during app lifecycle
  • Constraints: No strict latency or storage requirements

Question
I’m debating between two document structures and want to understand the performance trade-offs. For the example doc we are frequently filtering/sorting by timestamp fields and therefore indexing the individual timestamp fields. Metadata fields are not indexed or commonly accessed by themselves/outside of fetching the entire document.

Option 1: Flat Structure

{
  "id": "28d54c37-18ra-4f39-8a41”,
  “displayTitle: "Robot or human?",
  "canonicalTopicId”: "6e124bcb-7caa-4fr66-8cdf",
  "created": "2025-06-21T20:27:27.298Z",
  "customTitle": null,
  "lastEdited": "2025-06-21T20:27:27.298Z",
  "lastInteraction": "2025-06-21T20:27:27.298Z",
  "metadataOriginalURL": "https://www.walmart.com/search?q=Sunglasses&sort=price_low",
  "metadataRemoteVideoURL": null,
  "metadataTitle": "Robot or human?",
  "metadataURL": "https://www.walmart.com/blocked?url=...",
  "user_id": "foo",
  "previewURL": "https://cdn.watercap.com/links_screenshots/loading.png",
  "tagIds": ["0a990dda-f0ec-4534b-84c9"],
  "url": "https://www.walmart.com/blocked?url=..."
}

Option 2: Nested Structure

{
  "id": "28d54c37-18ra-4f39-8a41",
  "canonicalTopicId”: "6e124bcb-7caa-4fr66-8cdf",
  "customTitle": null,
  "displayTitle": "Robot or human?",
  “user_id”: "foo",
  "url": "https://www.walmart.com/blocked?url=...",
  "previewURL": "https://cdn.watercap.com/links_screenshots/loading.png",
  "tagIds": ["0a990dda-f0ec-4534b-84c9"],
  "timeStamps": {
    "lastEdited": "2025-06-21T20:27:27.298Z",
    "lastInteraction": "2025-06-21T20:27:27.298Z",
    "created": "2025-06-21T20:27:27.298Z"
  },
  "metadata": {
    "metadataOriginalURL": "https://www.walmart.com/search?q=Sunglasses&sort=price_low",
    "metadataRemoteVideoURL": null,
    "metadataTitle": "Robot or human?",
    "metadataURL": "https://www.walmart.com/blocked?url=..."
  }
}

Specific Questions

  1. What are the hidden performance costs of nesting in terms of:

    • Query speed (especially with indexed nested fields)
    • Document size overhead
    • Index building/parsing performance
  2. Is there a general rule of thumb for when nesting makes sense vs when to keep things flat? I prefer cleaner structure but not at significant performance cost.

  3. For future proofing, I want to avoid restructuring if I need to add indexes to currently non-indexed fields (like metadata). Does nesting create limitations here?

I understand some overhead (bytes, not megabytes) is acceptable. I’m looking for the right balance between clean architecture and performance.

Thank you

1 Like

Mostly just the extra bytes. Is there any value in grouping the timestamps? Do you always want to have to refer to the lastEdited timestamp as timeStamps.lastEdited?

Indexing depends on the indexes you are using, and indexes depends on how you want to query. It doesn’t matter to the indexer if the index is on timeStamps.lastEdited or lastEdited. Likewise metadata.metadataTitle vs metadataTitle. It’s not likely that indexing on all of metadata would be useful - because you probably want to be able to query on individual attributes

Beyond that, there are the same advantages as any structured data (xml, json etc).
The one difference I see is that with the nested form, you could get all the metadata with ‘metadata’ (or metadata.*), which could be useful if additional metadata attributres were added.

1 Like

There would be hardly any performance difference between option 1 and option 2.
@mreiche explained the details already :slight_smile:

You would want to decide the nesting based on how logically the fields are connected, for cleaner readability and accessibility of those fields in your code.
In your case, the ‘metadata’ fields look like they could be nested together (they could avoid the redundant metadata prefix in their names, though :slight_smile: ). The ‘timestamp’ fields…not really.

1 Like