Large Documents + Slowness

I'm seeing some slowness in my queries and I think it's to do with the way N1QL works. I have my CBQ instance running locally and am connecting to a remote cluster. If I select a doc which is small, the response is immediate. A large doc takes several seconds, even if I select only a small part of it.

Am I correct in assuming that if you use select object.subsection.thing, the CBQ engine gets the full object from the cluster and then does the subselection in the N1QL engine itself? My thinking is that the slowness comes from the engine pulling back the full document and then parsing it in CBQ-Engine.

What would you recommend in terms of deploying CBQ-Engine for best performance? On one node of the cluster? Can we load balance in some way?

2 Answers

Yes, you are correct about how N1QL works in the developer previews. It fetches the whole document. Changing this will be a significant change because it impacts subsystems outside of the query engine (e.g. the data engine). We're aware of the issue and can provide more specific info when ready.
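To make the fetch-then-project behaviour concrete, here is a minimal Python sketch of the idea. It is a toy model, not Couchbase code: the in-memory `store`, the document keys "large" and "small", the `payload` padding field, and the helper names are all assumptions made up for illustration. It only shows that the bytes moved depend on the document size, while the result depends on the one field selected.

```python
import json


def project(doc, path):
    """Walk a dotted path such as 'entity.clientid' through nested dicts."""
    value = doc
    for key in path.split("."):
        value = value[key]
    return value


def fetch_then_project(store, doc_id, path):
    """Toy model: the whole serialized document is fetched, then one field is kept."""
    raw = store[doc_id]                # full document body, as bytes
    bytes_transferred = len(raw)       # what would cross the network
    doc = json.loads(raw)
    result = {path.split(".")[-1]: project(doc, path)}
    return result, bytes_transferred


def make_doc(object_id, client_id, padding_bytes):
    """Build a hypothetical document padded to roughly the given size."""
    doc = {
        "entity": {
            "objectid": object_id,
            "clientid": client_id,
            "payload": "x" * padding_bytes,  # filler standing in for the rest of the doc
        }
    }
    return json.dumps(doc).encode("utf-8")


# Hypothetical store keyed by made-up document ids.
store = {
    "large": make_doc(552910680, 552901576, 2_200_000),
    "small": make_doc(552394365, 552901576, 30_000),
}

for doc_id in ("large", "small"):
    result, transferred = fetch_then_project(store, doc_id, "entity.clientid")
    print(doc_id, result, "bytes fetched:", transferred,
          "bytes in result:", len(json.dumps(result)))
```

In this model the result is the same few bytes in both cases, but the large document costs far more to fetch, which matches the behaviour described in the question.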

The query engine will live in cluster nodes (and can right now), but because we're a distributed system, the query engine and data node can always be on separate nodes for a particular document fetch.
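The point about the query engine and the data sitting on different nodes can be sketched with a toy placement model. This is an illustration only: the node names, the `owner_node` function, and the use of a plain CRC32 hash modulo the node count are assumptions, not Couchbase's actual key-to-node mapping. It shows that even when the query engine runs on a cluster node, only a fraction of document fetches are local to that node.

```python
import zlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical three-node cluster
QUERY_NODE = "node-a"                   # hypothetical node running the query engine


def owner_node(doc_key, nodes=NODES):
    """Toy placement: hash the key and pick a node (not the real mapping)."""
    return nodes[zlib.crc32(doc_key.encode("utf-8")) % len(nodes)]


def fetch_kind(doc_key, query_node=QUERY_NODE):
    """Return 'local' if the document lives on the query node, else 'remote'."""
    return "local" if owner_node(doc_key) == query_node else "remote"


keys = [f"doc-{i}" for i in range(1000)]  # made-up document keys
remote = sum(fetch_kind(k) == "remote" for k in keys)
print(f"{remote} of {len(keys)} fetches would go to another node")
```

Under these assumptions, a large document that happens to live on another node still has to be copied across the cluster network before the query engine can read a single field from it.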

Further to my last post, here are the queries.

Large (2.2 MB document)
select buc.entity.clientid from buc where buc.entity.objectid = 552910680
Creates a huge network spike on the local machine where CBQ is running, even for a small response.

Small (30 KB document)
select buc.entity.clientid from buc where buc.entity.objectid = 552394365
Creates no discernible network spike.

The field being selected is one small number, and the query yields this response in both instances:

{
  "resultset": [
    {
      "clientid": 552901576
    }
  ]
}

Hence I guess all the JSON is returned from Couchbase to CBQ for processing. If this is the case, I'm wondering whether there are plans for N1QL to live 'in' the cluster. It would make sense from an IO perspective.

As to whether or not the roadmap contains moving N1QL's engine 'inside' the cluster, I am unsure, but I think it's important to remember that these early stages are just Developer Previews and optimisations will be ongoing throughout the product's lifecycle.

Thanks to both Robin and Gerald for responding. I am a little confused about the plan for where N1QL will live long term, but I am more interested in performance and will not try to second-guess the topology of a sophisticated distributed system.

We moved CBQ onto one of our nodes and are now seeing better retrieval times for a large document, I guess because the data is just moving within the cluster. However, we do see an occasional spike, which I guess is due to it going to another node in the cluster to get the data.

I understand consistency of response is a design goal of the project, and this seems to fall outside of that core requirement. What are the plans to ensure that a given response from N1QL is always consistent in terms of time taken?

pmckenna,

Over time N1QL, like any other query language, will be tightly integrated with the core server and will also likely run in parallel on multiple nodes. As you know, we are still in dev preview and are looking for feedback on the core language itself. Performance is extremely important to us at Couchbase, and that will include the performance of N1QL when we productize it and integrate it with the server. Several different optimizations will be needed, and we are actively investigating them.