4.1.0-EE vs 4.1.1-EE: indexer too slow

Comparing 4.1.1-EE (Version: 4.1.1-5914 Enterprise Edition (build-5914)) with 4.1.0-EE (Version: 4.1.0-5005 Enterprise Edition (build-5005)) on a 3-node cluster (the same application code is used). On both clusters the design docs (for one bucket only) have {"updateMinChanges":1,"replicaUpdateMinChanges":1} set, plus "updateInterval":1000 globally. The indexed bucket has 2 design docs (the first with 1 view, the second with 2 views). Total document count is less than 1000, and the insertion rate is less than 10 docs per second. Value eviction mode is used.
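For readers who want to reproduce the setup described above, here is a minimal sketch of what those settings might look like as data. The view name `by_type` and its map function are made up for illustration; the helper names are hypothetical too. It assumes the per-design-doc values live in the design document's `options` object and that the global interval is sent as a form field to the cluster's view update settings, so check this against the documentation for your server version.

```python
import json
from urllib.parse import urlencode


def build_design_doc(views, update_min_changes=1, replica_update_min_changes=1):
    """Return a design document body with per-design-doc auto-update options.

    `views` maps a view name to its JavaScript map function source.
    """
    return {
        "views": {name: {"map": map_fn} for name, map_fn in views.items()},
        "options": {
            "updateMinChanges": update_min_changes,
            "replicaUpdateMinChanges": replica_update_min_changes,
        },
    }


def build_global_settings(update_interval_ms=1000):
    """Return the form-encoded body for the global update interval (milliseconds)."""
    return urlencode({"updateInterval": update_interval_ms})


# Hypothetical view, only here to make the example complete.
ddoc = build_design_doc({"by_type": "function (doc, meta) { emit(doc.type, null); }"})
print(json.dumps(ddoc, indent=2))
print(build_global_settings())
```

With these values the automatic updater should consider a design doc for re-indexing after a single change, checking once per second.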

Problem: 4.1.0-EE indexes normally, while 4.1.1-EE is just unable to index fast enough, so the application tests show errors: the output shows that view calls do not return all the results they should (compared to the same tests on 4.1.0). Even just sitting and watching the “Views” page of the web interface, I see that “indexing …%” is slooooow on 4.1.1, while on 4.1.0 I almost never see “indexing %” during the application tests.

Neither error.log nor indexer.log contains any error messages.

Assumption: the 4.1.1-EE indexer is broken. IMHO, seriously broken.
Has anyone seen the same results?

UPDATE: further testing with different hardware configurations shows that the real cause is that the 4.1.1 indexer probably consumes more memory, or uses a completely different algorithm, than the 4.1.0 indexer; so it’s not a bug.

UPDATE2: IMHO, something went terribly wrong with 4.1.1’s indexer. 4.1.0 could fit 2 buckets within 1GB RAM and the indexer worked very fast. 4.1.1 shows periodic indexing slowdowns even with 2GB RAM. Only the “minimal official requirement” of 4GB makes the indexer work well. Well, “officialism rules”, ok, no problem, that’s real life.

P.S. But maybe there are some developers who would like to ask themselves: “what have we really done with the indexer service between 4.1.0 and 4.1.1?”…

UPDATE3: varying the number of CPUs from 1 to 4 also speeds up the indexer, but the problem is completely solved only with 4 vCPU + 4GB RAM. So sad :frowning: 1 vCPU + 1GB RAM was enough for 4.1.0.

Thanks @grep, apologies for the issue.

We’re not seeing the issue in our internal testing on views. I should add that we do our testing on the minimal HW spec specified here. So the smaller footprint you have may be the reason you are noticing the issue and we are not.

@vmx may also be able to comment on what else has changed in the core indexing in Map Reduce Views between 4.1.0 and 4.1.1.
thanks
-cihan

Hi @grep,

Could you please upload your logs? For instructions please see http://www.couchbase.com/wiki/display/couchbase/Working+with+the+Couchbase+Technical+Support+Team#WorkingwiththeCouchbaseTechnicalSupportTeam-CouchbaseServerLogs

If that doesn’t work for you, it would be great if you could zip the whole logs from both servers and upload them somewhere so that I can have a look.

Cheers,
Volker

OK, but this will take a while, because the “deploy-setup-test” cycle for different configurations is time-consuming.

Probably 2 sets of logs, 4.1.1-EE-[1+1] and 4.1.0-EE-[1+1], are enough: 4+4 does not have that problem.

Yes sure, one log collection each for the same config is enough.

  1. Can one log group overwrite another (I mean, given the instruction “Please leave the ‘Upload to host:’ option as the default s3.amazonaws.com/cb-customers”)?
  2. Should I increase the indexer log level (“Setting->Indexer log level”)?

@grep: It’s timestamped, so things won’t be overwritten.
The “indexer log level” doesn’t matter here as you use views and not GSI.

Done. Client name = grep

  1. 4.1.0-EE[1+1] with perfectly passed application tests (tests begin after flush @~7:52)
  2. 4.1.1-EE[1+1] with failed same application tests (begin @~8:17)

Same tests, but in the second case the indexer is unable to index fast enough for the view to return the full list of results to the test (STALE=OK is used inside the test code).
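Since the `stale` setting matters a lot here, a small sketch of how it appears in a view request may help. It builds a view query URL for the view REST endpoint on port 8092; the host, bucket, design doc and view names are placeholders, and `view_url` is a hypothetical helper, not part of any SDK. With `stale=ok` the query returns whatever is already in the index and relies on the automatic updater, while `stale=false` asks the server to update the index before answering.

```python
from urllib.parse import quote, urlencode

# Values accepted by the view query "stale" parameter.
VALID_STALE = ("ok", "update_after", "false")


def view_url(host, bucket, ddoc, view, stale="ok", port=8092):
    """Build a view query URL with an explicit `stale` parameter."""
    if stale not in VALID_STALE:
        raise ValueError("stale must be one of {}".format(VALID_STALE))
    path = "/{}/_design/{}/_view/{}".format(
        quote(bucket, safe=""), quote(ddoc, safe=""), quote(view, safe="")
    )
    return "http://{}:{}{}?{}".format(host, port, path, urlencode({"stale": stale}))


# Placeholder names, for illustration only.
print(view_url("localhost", "tests", "dd1", "by_type", stale="ok"))
print(view_url("localhost", "tests", "dd1", "by_type", stale="false"))
```

Running the same tests once with `stale=false` could help separate "the index is behind" from "the index is wrong", at the cost of slower queries.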

I don’t want to publish direct links to the logs here. Can you access the collected logs by “Client name”, or is there another way I could send you these links?

@grep, I got the logs, I’ll have a look.

@grep, there certainly is an issue. I’ll have a look. This may take a while.

Does your test contain any sensitive data? If not, it would be great if I could somehow get access to the code; it would make it way easier for us to reproduce the issue in-house.

Well, “there certainly is an issue” sounds good, because it means that the last day, fully loaded with testing different configurations, was not wasted :slight_smile:

About “if I could somehow get access to the code”: it’s complicated, but I see a possible solution. I can try to reproduce the “problematic data input” separately. But I think this is out of scope for this topic, so can you send me a private message with your e-mail so we can discuss it further?

@grep, I’m able to reproduce it locally. It was easier than expected.

Cool!
Is it a bug? Or just a misconfiguration of the 4.1.1-EE “default params” (whatever they are)?
UPDATE: Well, thinking deeper about my last “testing day” and previous days:

@vmx,
can you please provide any additional info about this issue? Is it a bug? Does a temporary workaround exist?

@grep, I’m on it. We’ll see.

@grep: I’ve opened a Jira issue, please follow that one for further progress on the issue: https://issues.couchbase.com/browse/MB-19503

@vmx,
great job!
Thanks.

@vmx,
there is something strange with view results on 4.1.1-EE: even if I pause long enough to let all indexing delays settle (for testing purposes it is 50 seconds now), it seems that from time to time (it may be the 3rd, 4th, 5th … N-th test cycle) the view does not return all results. Deeper checks are needed (probably this is a mistake in my tests), but I would like to clarify: can https://issues.couchbase.com/browse/MB-19503, theoretically, cause a view to return an incomplete set of results (i.e. not everything that was upserted and should be indexed)?
[UPDATE] No, it’s more likely a problem with my tests.
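For test code like this, a fixed pause can be swapped for polling with a deadline, which reports how many rows were missing when it gives up. This is only a sketch of that general testing technique, not anything specific to Couchbase: `wait_for_rows` and `fetch_row_count` are hypothetical names, and `fetch_row_count` stands for whatever call your tests use to query the view and count rows.

```python
import time


def wait_for_rows(fetch_row_count, expected, timeout_s=50.0, interval_s=0.5,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll until the view returns at least `expected` rows or the timeout expires.

    `fetch_row_count` is a caller-supplied function that queries the view
    and returns the number of rows it currently yields.
    """
    deadline = clock() + timeout_s
    while True:
        count = fetch_row_count()
        if count >= expected:
            return count
        if clock() >= deadline:
            raise TimeoutError(
                "view returned {} of {} expected rows".format(count, expected)
            )
        sleep(interval_s)


# Simulated view that catches up over three polls (illustration only).
simulated_counts = iter([3, 7, 10])
final_count = wait_for_rows(lambda: next(simulated_counts), expected=10, interval_s=0)
print(final_count)
```

If the deadline passes, the error message shows the shortfall, which makes it easier to tell a slow index from a test that upserted fewer documents than it thought.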

@cihangirb, @vmx
could you please explain one thing for me: I see that https://issues.couchbase.com/browse/MB-19503 is marked “Fix Version/s: watson”. Does that mean there will be no fix for this bug in the 4.1.X releases (4.1.2 etc.)? And if so, is there a way to ask the respected developers who are (I hope) going to find a solution for this bug to also include the 4.1.X branch in “Fix Version/s”?

@grep, that’s right, “Watson” is the current target. We might backport a fix but there’s no guarantee (we can’t backport all fixes to all releases). Best is if you leave a comment on the issue (your forum credentials should work on Jira as well), so that everyone is aware that you’d like to have a fix for 4.1.x.