Data Modeling / Querying Nested Document References

Hello,
I am having trouble figuring out how to model my data for the most efficient and easy query-ability

I have two types of documents, Election and Candidate. The Election documents will have an internal ‘id’ field and a ‘candidates’ array which will be an array of internal ids of Candidate documents stored in the same bucket. The Candidate documents will have an internal ‘id’ field (the one referenced in the Elections’ candidates array) and a ‘name’ field. Both documents will also contain a ‘doc_type’ field.

Example Election Doc

{
'doc_type': 'election',
id: 'e1',
candidates: ['c1' , 'c2']
}

Example Candidate Doc

{
'doc_type': 'candidate',
id: 'c1'
name: 'George Washington'
}

Right now the keys I’m using for the documents have the following format: doc_type:id
So an example of a candidate key would be ‘candidate:c1’.
This means that the document key would be ‘candidate:c1’ but the document id would be ‘c1’.

I need my application to allow a PUT/ of Election objects and Candidate objects with predefined ids.
Right now when a user hits the route PUT /elections/e1 with the following body,

{
candidates: ['c1' , 'c2']
}

I take that id, assign it to the doc, create the key as defined above, and put that doc in the db. So my resulting doc and key look like:

Key: election:e1
{
'doc_type': 'election',
id: 'e1',
candidates: ['c1' , 'c2']
}

I also allow PUT of Candidates with similar process described above, but the problem is that my Candidate keys are therefore different than the candidate ids in the Elections’ ‘candidates’ arrays (i.e. candidate:c1 vs c1). This is problematic because I want the key structure to be abstracted away from the user, so as to allow them to use a ‘candidates’ array of ids, but I also want to be able to efficiently query the full Candidate objects referenced by the ids in that Elections ‘candidates’ array, and there could potentially be thousands of candidates for any given election. In this post N1QL Join on Ref Ids I was trying to use N1QL to perform a query that would do something like that but as @geraldss said “N1QL JOIN / NEST currently require that one document be able to produce the primary key of the other document.” So since my candidate ids are different than my candidate keys I’m kind of stuck. So my question has two parts:

  1. Does my data modeling technique seem to be ok or should I change how I handle keys vs ids?

  2. How do I efficiently do the querying given my key vs id problem?

Hi, you can keep your documents as they are. Here is the updated query.

SELECT OBJECT_REMOVE(e, “contests”).*, c AS contests
FROM main e
NEST main c ON KEYS ARRAY “contest:” || TO_STR© FOR c IN e.contests END
WHERE e.doc_type = ‘election’ AND e.id = ‘e1’
;

@geraldss Wow nice, that works, thanks! But I’m still wondering if this data model is a good idea? If I didn’t have N1QL, would I have to:

  1. GET election
  2. Loop through contests array in application, appending ‘contest:’ to the ids
  3. Do a getMulti for all those keys
    Seems pretty inefficient?

Well, you do have N1QL :slight_smile:

The data model is fine. One consideration is the distribution of candidates per election. If almost all elections have only a few candidates, the data model is fine. If many elections have hundreds or thousands of candidates, you may want to just have the candidate have a reference to the election id. In fact, you should do that whether or not you keep a candidates array in the election document.

If an election object has a large number of candidates, then the election objects are larger, and updating them is more expensive (including adding or removing candidate id’s).

Awesome ok thanks so much for all the help today @geraldss

1 Like