Query View with include_docs option

gruzovator · December 16, 2014, 6:52am

Hi.

The Python SDK (v1.2.4) shows what looks like a memory leak when querying a view with “include_docs” set to True.

I have about 20M docs in my Couchbase bucket (one server, 30GB of RAM for the bucket) and a view that contains about 15M of them.
I’m trying to iterate over the view index to do some aggregation, using the “include_docs” option to get the docs. But the Python script’s memory usage grows constantly (1G, 2G, 3G, etc.).

Code:

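import couchbase

def run(server, bucket):
    # Stream the view rows and have the SDK fetch each row's document.
    cb = couchbase.Couchbase.connect(host=server, bucket=bucket, timeout=100)
    for item in cb.query('myhouse', 'houses', streaming=True, include_docs=True):
        pass  # aggregation would go here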

def main():
    # .... parsing arguments
    run(args.server, args.bucket)

(Couchbase Server version: 2.5)

gruzovator · December 16, 2014, 7:05am

I’ve tried switching off the include_docs option and getting the docs with a cb.get() call instead; the problem remains.
Then I tried using a separate connection object for getting the docs, and everything is OK.

So there is some interference between the view query and the get requests when they use the same connection object.

mnunberg · December 16, 2014, 7:13am

There is some interference there, but nothing I can think of that would affect memory usage in particular. One thing to note, however, especially if you have many documents to fetch: adding more pending operations (like include_docs, or fetching documents while iterating) will actually prolong the “network wait” and allow more data from the server to be buffered, which counters, to some extent, the effect of streaming=True. To see why, consider what happens when the library does I/O:

In a normal view request, the library issues an HTTP request and the HTTP results stream in from the server. Once some rows have arrived, the Python extension asks the library to “break” from performing I/O.
When you use include_docs, you actually tell the library to resume all I/O; this includes fetching pending data in the HTTP stream (which you aren’t yet prepared to handle anyway).

gruzovator · December 16, 2014, 7:57am

Thanks for the reply.

Could you tell me what strategy I should use to fetch many documents via a view query? Separate connections for querying and fetching?

mnunberg · December 16, 2014, 8:13am

I guess that would make sense for your particular use case. Normally I’d say to just use include_docs, but maintaining two connections would indeed reduce the cross-chatter between the view query and the document fetches.
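
For anyone landing on this thread later, here is a minimal sketch of the two-connection approach discussed above. It is an illustration rather than an official recipe. It assumes the 1.2.x Python SDK, where view rows expose a `docid` attribute and `get_multi()` returns a dict-like mapping of key to result. The helper name `iter_docs` and the `batch_size` parameter are made up for this example. One connection is used only for the streaming view query (without include_docs), and a second one only for fetching documents in batches.

```python
from itertools import islice


def iter_docs(query_conn, fetch_conn, design, view, batch_size=1000):
    """Yield (row, document value) pairs for every row in a view.

    query_conn is used only for the streaming view query and fetch_conn
    only for key-value fetches, so the two kinds of I/O never share a
    connection. iter_docs and batch_size are hypothetical names.
    """
    rows = iter(query_conn.query(design, view, streaming=True))
    while True:
        # Pull a bounded number of rows so only one batch is held at a time.
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        # De-duplicate: a single document may emit more than one view row.
        keys = list({row.docid for row in batch})
        # Note: a document deleted after indexing may make this call raise.
        results = fetch_conn.get_multi(keys)
        for row in batch:
            yield row, results[row.docid].value


def run(server, bucket):
    # Imported here so iter_docs can be exercised without the SDK installed.
    from couchbase import Couchbase

    query_conn = Couchbase.connect(host=server, bucket=bucket, timeout=100)
    fetch_conn = Couchbase.connect(host=server, bucket=bucket, timeout=100)
    for row, doc in iter_docs(query_conn, fetch_conn, 'myhouse', 'houses'):
        pass  # aggregation would go here
```

Batching the fetches keeps the number of round trips down compared with one `get()` per row, while the batch size caps how many rows and documents are held in memory at once. Whether this fully avoids the memory growth described above would need to be checked against your own data.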
