Indexing reporting 100% for an hour+, but less than 10% actually indexed

Wed, 01/30/2013 - 17:50
cculler
Offline
Joined: 01/24/2013
Groups: None

A new view/index has been reporting "100%" for over an hour, yet the data is clearly nowhere near fully indexed.

CB Version 2.0, Win2008R2, 16GB, 8-core, 3.4GHz, single node.
1.9M docs, ~800MB.

While trying to find an answer to my problem, I found the following CB issue that sounds exactly like what I have: http://www.couchbase.com/issues/browse/MB-6640
However, MB-6640 is marked as a duplicate, and closed. Although there is another issue (CBD-74) referenced within MB-6640, I cannot open CBD-74. (Permissions?) I'm hoping the answer to my issue resides in there.

I can see my server is still busy grinding away at the index, despite the "100%" report. I can get a rough idea of the progress by querying my view. Right now I'm getting about 10% of the expected count, suggesting that the index creation is about 10% done.

Is this normal? If yes, how do I get a real index progress report?
Why can't I access CBD-74, yet I can access other issues?

Wed, 01/30/2013 - 22:26
cculler

UPDATE: after 5 hours 20 minutes, the index is 66% complete by my estimation, and the index file on disk is at ~60GB. I hope this is my doing, and not CB. I really had high hopes for CB to handle this specific aggregate.
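
A rough way to turn an estimate like this into a time-to-completion is linear extrapolation. The sketch below assumes the index builds at a constant rate, which may well not hold, and the helper name is made up for illustration.

```javascript
// Hypothetical helper: linear extrapolation of the remaining index build time.
// Assumes a constant indexing rate, which may not hold in practice.
function estimateRemainingMinutes(elapsedMinutes, fractionDone) {
  if (fractionDone <= 0 || fractionDone > 1) {
    throw new RangeError("fractionDone must be in (0, 1]");
  }
  const totalMinutes = elapsedMinutes / fractionDone;
  return totalMinutes - elapsedMinutes;
}

// Figures from this post: 5 h 20 min elapsed, roughly 66% done.
const elapsed = 5 * 60 + 20; // 320 minutes
const remaining = estimateRemainingMinutes(elapsed, 0.66);
console.log(Math.round(remaining) + " minutes remaining"); // about 165
```

Under that assumption the whole build would take a little over 8 hours.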

Thu, 01/31/2013 - 00:04
dipti
Offline
Joined: 11/02/2011
Groups:

I'm assuming you are creating the index after inserting the data.

Can you share more details on what your views look like?

Have you optimized them per the guidance here: http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-views-writi... ?

Also, how many design docs do you have?

Thu, 01/31/2013 - 07:57
cculler

Answers to your questions:

I had not seen the optimization guide. It appears I followed most of it, though with some clear departures. The relationship between view values and the reduce function was not what I expected after reading the material on how to create a view/reduce.

I have a single design doc with a single view. I inserted 50,000 test docs before writing and experimenting with the view and reduce. Once I had good results I published the view, waited for indexing to complete (a minute or so), and verified I had correct results. Then I bumped the test data up to 2,000,000 docs (40 copies of the original test data with unique docIds) and started the clock. This morning I verified the view is producing correct results. I also see that the size on disk has shrunk back to 1.5GB. I'm not sure how big it eventually got, but I know it was up to 60GB at one point.

After reading the optimization guide, and doing some more thinking, it occurs to me I might have had faster results by emitting a 1 for the value, and then using a simple _sum. I'm not sure.

Here is an example doc and the code (field names have been changed, so don't fret over the implied model too much). Theoretically the attributes array can have up to 1000 entries, but typically it has fewer than 20. The reduce outputs pairs of attributes and a count of the number of animals on record that have both attributes. There may be 100 kennels represented, and (eventually) when I query the view I'll want to limit the output to a single kennel at a time.

I have lots of ideas on how to compact the doc schema to save space (like brief names, and separate arrays of simple ints for the attributeIds and auditIds), but I don't think that is relevant to the indexing performance issue.

{
   "type": "animalAttribute",
   "kennel": "Seattle",
   "animalId": 51820,
   "deceased": false,
   "attributes": [
       {
           "attributeId": 2,
           "auditId": 1286783
       },
       {
           "attributeId": 7058,
           "auditId": 1300999
       },
       {
           "attributeId": 7068,
           "auditId": 1286783
       },
       {
           "attributeId": 7069,
           "auditId": 1286783
       },
       {
           "attributeId": 7074,
           "auditId": 1302483
       },
       {
           "attributeId": 7077,
           "auditId": 1327085
       },
       {
           "attributeId": 7089,
           "auditId": 1459734
       },
       {
           "attributeId": 7090,
           "auditId": 1538100
       },
       {
           "attributeId": 7091,
           "auditId": 1606227
       }
   ]
}
 
function (doc, meta) {
  if (doc.type == "animalAttribute" && doc.deceased == false){
    for(var col=0; col<doc.attributes.length; col++){
      for(var row=0; row<doc.attributes.length; row++){
        if (col <= row){
          emit([doc.kennel, doc.attributes[col].attributeId, doc.attributes[row].attributeId], null);
        }
      }
    }
  }
}
 
 
function (key, values, rereduce){
  if (rereduce){
    return sum(values);
  }
  else{
    return values.length;
  }
}
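
To see how many rows a single document contributes to the index, the map function above can be run locally against a stub emit. This is only a sketch for counting rows: the stub, the `makeDoc` and `emitsFor` helpers, and the generated attribute ids are all invented here, and nothing below talks to a real server.

```javascript
// Local stand-in for the view engine: collect emitted rows in an array.
const emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}

// Same logic as the map function above, given a name so it can be called.
function map(doc, meta) {
  if (doc.type == "animalAttribute" && doc.deceased == false) {
    for (let col = 0; col < doc.attributes.length; col++) {
      for (let row = 0; row < doc.attributes.length; row++) {
        if (col <= row) {
          emit([doc.kennel, doc.attributes[col].attributeId,
                doc.attributes[row].attributeId], null);
        }
      }
    }
  }
}

// Build a doc shaped like the example, with n attributes (ids are made up).
function makeDoc(n) {
  const attributes = [];
  for (let i = 0; i < n; i++) {
    attributes.push({ attributeId: 7000 + i, auditId: 1 });
  }
  return { type: "animalAttribute", kennel: "Seattle", animalId: 1,
           deceased: false, attributes: attributes };
}

// Number of rows one document with n attributes emits.
function emitsFor(n) {
  emitted.length = 0;
  map(makeDoc(n), {});
  return emitted.length;
}

console.log(emitsFor(9));  // 45, the size of the 9-attribute example doc
console.log(emitsFor(20)); // 210
```

The count follows n(n+1)/2, so a document at the theoretical limit of 1000 attributes would emit 500,500 rows on its own.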

Example output:

{"rows":[
{"key":["Seattle",2,2],"value":1429},
{"key":["Seattle",2,7058],"value":1412},
{"key":["Seattle",2,7059],"value":4},
{"key":["Seattle",2,7062],"value":34},
{"key":["Seattle",2,7064],"value":16},
{"key":["Seattle",2,7067],"value":4},
{"key":["Seattle",2,7068],"value":1391},
{"key":["Seattle",2,7069],"value":1391},
{"key":["Seattle",2,7073],"value":1},
{"key":["Seattle",2,7074],"value":1395}
]
}
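
Because the kennel is the first element of the key, output like the above can be limited to one kennel with a startkey/endkey range. The sketch below only builds the query URL. The host, bucket, design doc, and view names are placeholders, and port 8092 is assumed to be the node's view port.

```javascript
// All names in this URL (host, bucket, design doc, view) are placeholders.
const base =
  "http://localhost:8092/mybucket/_design/animals/_view/attribute_pairs";

function kennelQueryUrl(kennel) {
  const params = new URLSearchParams({
    // Array keys compare element by element, and objects sort after
    // numbers and strings, so [kennel] .. [kennel, {}] brackets every
    // key whose first element is this kennel.
    startkey: JSON.stringify([kennel]),
    endkey: JSON.stringify([kennel, {}]),
    // One reduced row per [kennel, attributeId, attributeId] key.
    group_level: "3"
  });
  return base + "?" + params.toString();
}

console.log(kennelQueryUrl("Seattle"));
```

A lower group_level would roll the counts up, for example group_level=2 gives one row per kennel and first attribute.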

Thu, 01/31/2013 - 08:31
cculler

UPDATE: _sum of 1's does appear to work, and so does _count (of nulls), as one might expect. The strange thing is, I know I tried _count yesterday and got bad results. Unfortunately I don't have a copy of that view definition for reference.
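
The equivalence can be checked locally with a small simulation. This is a sketch, not the engine's implementation: `runReduce` is an invented driver that reduces in chunks and then rereduces the partial results, `sumOfOnes` is a stand-in for _sum, and `naiveCount` is included only to show what happens without a rereduce branch.

```javascript
// Helper available inside the view engine; re-created here for a local run.
function sum(values) {
  return values.reduce((a, b) => a + b, 0);
}

// The custom reduce posted earlier in this thread.
function customCount(key, values, rereduce) {
  return rereduce ? sum(values) : values.length;
}

// Local stand-in for _sum over emitted 1's (not the engine's implementation).
function sumOfOnes(key, values, rereduce) {
  return sum(values);
}

// A count with no rereduce branch, shown only to illustrate the failure mode.
function naiveCount(key, values, rereduce) {
  return values.length;
}

// Hypothetical driver: reduce in chunks, then rereduce the partial results.
function runReduce(reduceFn, values, chunkSize) {
  const partials = [];
  for (let i = 0; i < values.length; i += chunkSize) {
    partials.push(reduceFn(null, values.slice(i, i + chunkSize), false));
  }
  return reduceFn(null, partials, true);
}

const rowCount = 1429; // value of the first row in the example output
const viaCustom = runReduce(customCount, new Array(rowCount).fill(null), 100);
const viaSum = runReduce(sumOfOnes, new Array(rowCount).fill(1), 100);
const viaNaive = runReduce(naiveCount, new Array(rowCount).fill(null), 100);
console.log(viaCustom, viaSum, viaNaive); // 1429 1429 15
```

The naive version returns the number of partial results (15 chunks) instead of the number of rows, which is why the rereduce branch has to sum.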

Retesting with the above change will require flushing my data and restarting, so I'll hold off until I know you have no interest in my current index/data.

Fri, 02/01/2013 - 13:04
cculler

Retested with a simple _count reduce, running timed tests while updating 0.5% of the data. Here's what I did:

- flushed data.
- changed reduce to use _count.
- inserted the 1.9M test docs.
- waited until next morning (all was quiet/complete, and returning correct query results).
- started a timer.
- overwrote 10k existing docs (approx 1/2 of 1% of the docs, or ~5MB out of ~1GB).
- observed the following:
  - time to send the data to the cluster was almost instant.
  - disk queue drain rate remained in single digits for nearly a minute, then went to ZERO for ~20 seconds, and then resumed in double digits (~50 items/sec), where it remained.
  - indexing progress bar appeared after 0:03:30.
  - at 0:04:54, indexing reported 100% and the disk queue reached 0, but CPU remained high and query results were not correct.
  - at 0:06:10, the indexing progress bar was finally hidden, CPU returned to normal, and query results were correct.
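
The three indexing timestamps above can be turned into phase durations. The sketch below is plain arithmetic on the figures in this post, and the helper name is made up.

```javascript
// Parse an "h:mm:ss" stopwatch reading into seconds.
function toSeconds(stamp) {
  const [h, m, s] = stamp.split(":").map(Number);
  return h * 3600 + m * 60 + s;
}

const barShown = toSeconds("0:03:30");    // progress bar appeared
const reported100 = toSeconds("0:04:54"); // UI reported 100%
const barHidden = toSeconds("0:06:10");   // results finally correct

const indexingTotal = barHidden - barShown; // 160 seconds
const stuckAt100 = barHidden - reported100; // 76 seconds

console.log(stuckAt100 + " of " + indexingTotal + " seconds at 100%");
```

So the indexer showed "100%" for nearly half of the time it was visibly working.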

Repeated runs of the same test showed much better drain rates, but indexing is still very slow, and even the improved drain rate (~150 items/sec) is far slower than it ought to be. The average time to index that ~1 second of update activity is approximately 0:03:30. Extrapolating from these numbers, updating 1% of a bucket that consumes 8GB per node, with a single map/reduce view, would take about an hour to index. Does this sound right to anybody? I really, really want to be wrong here.
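
For reference, the extrapolation works out as follows. It assumes indexing time scales linearly with the amount of data updated, which is a guess rather than anything measured, and it treats 1GB as 1000MB.

```javascript
// Figures from this post; linear scaling is an assumption, not a measurement.
const testUpdateMb = 5;              // ~5 MB overwritten in the timed test
const testIndexSeconds = 3 * 60 + 30; // ~0:03:30 to index that update

const bucketMb = 8 * 1000;           // bucket consuming 8 GB per node
const updatedPercent = 1;            // updating 1% of it
const projectedUpdateMb = bucketMb * updatedPercent / 100; // 80 MB

const projectedSeconds =
  (projectedUpdateMb / testUpdateMb) * testIndexSeconds;
const projectedMinutes = projectedSeconds / 60;

console.log(projectedMinutes + " minutes"); // 56 minutes
```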
