Data inconsistencies

Tue, 10/05/2010 - 07:04
Nathan

Hello,

First of all we're impressed by the work done by Membase.

We have a test setup of 5 Membase servers (each with 24 GB of RAM) running Ubuntu 10.04 Server 64-bit. The servers run Membase version 1.6.0beta4; the Cluster State ID is 05A-010-15.

As a test, I used a very simple PHP script to insert 1 million values into Membase and then, in a second stage, to retrieve them:
[PHP]
<?php
error_reporting(E_ALL);
ini_set('display_errors', 1);
ini_set('max_execution_time', 36000);
ini_set('memory_limit', '2048M');

$keyName = 'settest-1';
$maxValueLength = 10;
$total = 1000000;

$memcache = new Memcached('test');
$memcache->addServers(array(
    array('127.0.0.1', 11211, 1), // host, port, weight
));

/**
 * Set: store $total random values of 2 to $maxValueLength printable characters.
 */
for ($index = 0; $index < $total; $index++)
{
    $valLength = mt_rand(2, $maxValueLength);
    $value = '';
    for ($valIndex = 0; $valIndex < $valLength; $valIndex++)
    {
        $value .= chr(mt_rand(35, 126));
    }
    $memcache->set($keyName.'-'.$index, $value);
}

/**
 * Get test: read every key back and record the ones reported as not found.
 */
$errors = array();
for ($index = 0; $index < $total; $index++)
{
    $r = $memcache->get($keyName.'-'.$index);
    if ($memcache->getResultCode() == Memcached::RES_NOTFOUND)
    {
        $errors[] = $keyName.'-'.$index;
    }
}
var_dump($errors);
[/PHP]

Normally I would expect this to produce no errors, but that is not the case: about 5% of the keys come back as misses, which seems too high.

I have run the client both on a separate server with a moxi proxy in front and directly on one of the Membase servers. Both gave similar results.

Thank you,
Best regards,
Nathan Bijnens

Tue, 10/05/2010 - 07:33
Perry Krug

Thanks for the feedback Nathan!

Were you checking for any errors when setting the data? The only reason I can think of for some data being missing is that it failed to be set. Following that logic, the most common cause would be that the bucket temporarily did not have enough memory to accept one of the writes; if your code does not retry, that piece of data would be lost.
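
The check-and-retry idea above could be sketched roughly as follows. This is only an illustration, not the way any particular client library does it: `setWithRetry`, the retry count and the delay are made-up names and values, and `$client` is assumed to behave like PHP's `Memcached` class (`set()` returning a bool, `getResultCode()` reporting the last result).

```php
<?php
/**
 * Store a value and retry a few times if the write is rejected.
 * $client is assumed to behave like PHP's Memcached class.
 * The retry count and delay are illustrative assumptions, not tuned values.
 */
function setWithRetry($client, string $key, string $value, int $maxAttempts = 3, int $delayMicros = 100000): array
{
    $lastCode = null;
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        if ($client->set($key, $value)) {
            return ['ok' => true, 'attempts' => $attempt, 'code' => null];
        }
        // Remember why the last attempt failed so the caller can log it.
        $lastCode = $client->getResultCode();
        if ($attempt < $maxAttempts) {
            usleep($delayMicros);
        }
    }
    return ['ok' => false, 'attempts' => $maxAttempts, 'code' => $lastCode];
}
```

In the test script, the plain `$memcache->set(...)` call could be swapped for `setWithRetry($memcache, ...)`, collecting every key whose result has `'ok' => false` so that failed writes can be compared with the later misses.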

We should be able to see this if you send over the output of 'stats'. You should be able to telnet to any of the servers (through Moxi on port 11211 or even your localhost) and type 'stats'. Please paste the output here and I will look for any signs of those errors.
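
If reading the raw 'stats' output by eye gets tedious, something along these lines could pull out the counters of interest. It is a sketch under assumptions: `parseStats` and `nonZeroCounters` are hypothetical helper names, and the input is assumed to be the plain text the server prints (lines of `STAT <name> <value>` ending with `END`).

```php
<?php
/**
 * Parse the text returned by the memcached "stats" command into an
 * associative array, e.g. "STAT ep_oom_errors 0" => ['ep_oom_errors' => '0'].
 * Values stay strings because some stats are not numeric (e.g. "339 s").
 */
function parseStats(string $raw): array
{
    $stats = [];
    foreach (preg_split('/\r\n|\n|\r/', $raw) as $line) {
        $line = trim($line);
        if ($line === 'END') {
            break;
        }
        if (preg_match('/^STAT\s+(\S+)\s+(.*)$/', $line, $m)) {
            $stats[$m[1]] = $m[2];
        }
    }
    return $stats;
}

/** Return only the counters from $names that are present and non-zero. */
function nonZeroCounters(array $stats, array $names): array
{
    $found = [];
    foreach ($names as $name) {
        if (isset($stats[$name]) && $stats[$name] !== '0') {
            $found[$name] = $stats[$name];
        }
    }
    return $found;
}
```

For example, `nonZeroCounters(parseStats($raw), ['ep_oom_errors', 'ep_item_commit_failed', 'ep_item_flush_failed'])` would list whichever of those error counters have moved off zero.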

Also, what is the configuration of your bucket?

Thanks!

Perry

__________________

Forum support is great for free but sometimes you need a guaranteed response time and dedicated resources for your questions or issues.
Consider purchasing enterprise-level support from Membase: http://www.membase.com/products-and-services/overview
Call or email "sales -at- membase -dot- com" today!

Tue, 10/05/2010 - 07:42
Nathan

Thank you for looking at this.

Bucket configuration:
[CODE]
Bucket Name: default
Bucket Type: Membase
Access Control: Password
Replication: 2 copies
Blocking Behavior: do not wait for replication to complete; send 2 copies asynchronously
[/CODE]
Stats output:
[CODE]
STAT delete_misses 0
STAT ep_io_num_write 960664781
STAT rejected_conns 0
STAT connection_structures 343
STAT limit_maxbytes 67108864
STAT decr_hits 0
STAT ep_pending_ops_max_duration 339 s
STAT ep_flush_duration_total 83668
STAT ep_item_flush_expired 0
STAT ep_too_young 0
STAT curr_connections 287
STAT rusage_system 70732.031250
STAT ep_io_write_bytes 25707064212
STAT ep_total_cache_size 9483784303
STAT ep_storage_age 0
STAT ep_flush_duration_highwat 2245
STAT ep_flush_duration 438
STAT cas_misses 0
STAT ep_flusher_todo 50972
STAT ep_pending_ops 0
STAT tap_mutation_received 4044977401
STAT mem_used 13855290505
STAT tap_mutation_sent 7867295821
STAT ep_warmup_oom 0
STAT ep_vbucket_del 8921
STAT get_misses 66728
STAT ep_num_value_ejects 0
STAT ep_queue_size 101715897
STAT bytes_read 270277859321
STAT get_hits 27850204
STAT tap_vbucket_set_received 750
STAT decr_misses 0
STAT ep_commit_num 138676
STAT rusage_user 377397.437500
STAT bucket_conns 266
STAT ep_num_non_resident 0
STAT ep_tap_keepalive 0
STAT ep_oom_errors 0
STAT ep_too_old 23405936
STAT cmd_flush 0
STAT ep_max_txn_size 250000
STAT ep_version 1.6.0beta4
STAT uptime 238937
STAT ep_data_age_highwat 17389
STAT ep_queue_age_cap 4500
STAT incr_hits 0
STAT time 1286289676
STAT ep_warmup_dups 404555360
STAT ep_total_persisted 973265348
STAT daemon_connections 25
STAT ep_flusher_state running
STAT pointer_size 64
STAT version 1.4.4_292_gc61961b
STAT ep_max_data_size 80908124160
STAT ep_commit_time_total 37670
STAT ep_warmup_time 1680
STAT ep_item_commit_failed 0
STAT total_connections 108903
STAT curr_items 54141621
STAT ep_data_age 7161
STAT delete_hits 0
STAT ep_storage_type featured
STAT curr_items_tot 82197625
STAT ep_total_enqueued 1085829748
STAT ep_mem_low_wat 48544874495
STAT ep_kv_size 9483784303
STAT ep_vbucket_del_fail 0
STAT ep_min_data_age 0
STAT ep_io_num_read 416008662
STAT ep_warmed_up 415382689
STAT ep_item_flush_failed 0
STAT cas_hits 0
STAT ep_warmup true
STAT ep_dbname /var/opt/NorthScale/1.6.0beta4/data/ns_1/default
STAT ep_commit_time 10
STAT auth_errors 0
STAT ep_bg_fetched 0
STAT ep_storage_age_highwat 18658
STAT threads 20
STAT pid 18315
STAT auth_cmds 108843
STAT cas_badval 0
STAT cmd_set 100563723
STAT ep_io_read_bytes 9616817626
STAT cmd_get 27916932
STAT ep_expired 0
STAT tap_vbucket_set_sent 1014
STAT conn_yields 143073096
STAT ep_warmup_thread complete
STAT ep_flush_preempts 112431
STAT tap_connect_received 13787
STAT ep_num_eject_failures 0
STAT bytes_written 10454310427
STAT libevent 1.4.13-stable
STAT ep_num_pager_runs 0
STAT ep_mem_high_wat 60681093120
STAT ep_dbinit 3
STAT incr_misses 0
STAT ep_pending_ops_total 622490
STAT ep_pending_ops_max 78729
STAT ep_overhead 4371506202
END
[/CODE]

Tue, 10/05/2010 - 07:54
Perry Krug

Hmm, very odd. This is the stat that I expected to see incremented: STAT ep_oom_errors 0

I don't see any failures overall. One curious thing is that you are setting 1 million items, but the DB currently has 54141621 items in it. There's also a discrepancy between curr_items and curr_items_tot (which includes items that have been deleted but not yet removed from the system), so it appears that many items were put into the cache at one point but are no longer accessible (they would return a miss if you tried to get them).
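
To put a number on the discrepancy mentioned above, the two counters from the posted stats can simply be subtracted. A tiny sketch (the helper name `itemGap` is made up for illustration; it does nothing beyond the subtraction and says nothing about why the gap exists):

```php
<?php
/**
 * Difference between the total item count (curr_items_tot) and the
 * item count reported as current (curr_items).
 */
function itemGap(int $currItemsTot, int $currItems): int
{
    return $currItemsTot - $currItems;
}

// Values copied from the stats output posted earlier in this thread.
echo itemGap(82197625, 54141621), "\n"; // prints 28056004
```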

Can you send a "flush_all" to the server, run your script through once and see whether you still encounter any misses? If so, please send over the output of the stats command again.

Perry

Tue, 10/05/2010 - 07:59
Nathan

That's correct. There are currently 54M items in the DB. I'll try again after a flush.

Thank you!

Nathan

Tue, 10/05/2010 - 08:01
Nathan

Here are the results for stats just after the flush:
[CODE]
flush_all
OK
stats
STAT delete_misses 0
STAT ep_io_num_write 960664807
STAT rejected_conns 0
STAT connection_structures 343
STAT limit_maxbytes 67108864
STAT decr_hits 0
STAT ep_pending_ops_max_duration 49 s
STAT ep_flush_duration_total 83674
STAT ep_item_flush_expired 0
STAT ep_too_young 0
STAT curr_connections 288
STAT rusage_system 70920.875000
STAT ep_io_write_bytes 25707064212
STAT ep_total_cache_size 3236475484
STAT ep_storage_age 0
STAT ep_flush_duration_highwat 2245
STAT ep_flush_duration 442
STAT cas_misses 0
STAT ep_flusher_todo 50972
STAT ep_pending_ops 0
STAT tap_mutation_received 4044977401
STAT mem_used 7607981734
STAT tap_mutation_sent 7867295821
STAT ep_warmup_oom 0
STAT ep_vbucket_del 8947
STAT get_misses 66728
STAT ep_num_value_ejects 0
STAT ep_queue_size 101715899
STAT bytes_read 270278070064
STAT get_hits 27850204
STAT tap_vbucket_set_received 750
STAT decr_misses 0
STAT ep_commit_num 139853
STAT rusage_user 378463.781250
STAT bucket_conns 267
STAT ep_num_non_resident 0
STAT ep_tap_keepalive 0
STAT ep_oom_errors 0
STAT ep_too_old 23405936
STAT cmd_flush 5
STAT ep_max_txn_size 250000
STAT ep_version 1.6.0beta4
STAT uptime 240119
STAT ep_data_age_highwat 17389
STAT ep_queue_age_cap 4500
STAT incr_hits 0
STAT time 1286290858
STAT ep_warmup_dups 404555360
STAT ep_total_persisted 973265348
STAT daemon_connections 25
STAT ep_flusher_state running
STAT pointer_size 64
STAT version 1.4.4_292_gc61961b
STAT ep_max_data_size 80908124160
STAT ep_commit_time_total 37674
STAT ep_warmup_time 1680
STAT ep_item_commit_failed 0
STAT total_connections 108904
STAT curr_items 0
STAT ep_data_age 7161
STAT delete_hits 0
STAT ep_storage_type featured
STAT curr_items_tot 28056004
STAT ep_total_enqueued 1085829753
STAT ep_mem_low_wat 48544874495
STAT ep_kv_size 3236475484
STAT ep_vbucket_del_fail 0
STAT ep_min_data_age 0
STAT ep_io_num_read 416008662
STAT ep_warmed_up 415382689
STAT ep_item_flush_failed 0
STAT cas_hits 0
STAT ep_warmup true
STAT ep_dbname /opt/NorthScale/1.6.0beta4/data/ns_1/default
STAT ep_commit_time 8
STAT auth_errors 0
STAT ep_bg_fetched 0
STAT ep_storage_age_highwat 18658
STAT tap_flush_sent 2
STAT threads 20
STAT pid 23935
STAT auth_cmds 108844
STAT cas_badval 0
STAT cmd_set 100563723
STAT ep_io_read_bytes 9616817626
STAT cmd_get 27916932
STAT ep_expired 0
STAT tap_vbucket_set_sent 1014
STAT conn_yields 143073096
STAT ep_warmup_thread complete
STAT ep_flush_preempts 113607
STAT tap_connect_received 13787
STAT ep_num_eject_failures 0
STAT bytes_written 10504782258
STAT libevent 1.4.13-stable
STAT ep_num_pager_runs 0
STAT ep_mem_high_wat 60681093120
STAT ep_dbinit 3
STAT incr_misses 0
STAT ep_pending_ops_total 622490
STAT ep_pending_ops_max 78729
STAT ep_overhead 4371506250
END
[/CODE]

Tue, 10/05/2010 - 08:20
Nathan

OK, I have run it again.

Something very strange: I get exactly the same misses (the same keys). Maybe it's a client-side bug; I'll run some further tests against a standard memcached server to see whether it has the same problem.
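
To confirm that the misses really are the same keys from run to run, the `$errors` arrays of two runs can be compared. A minimal sketch, assuming each run's missed keys were saved as a plain array of strings (`compareMisses` is a hypothetical helper name):

```php
<?php
/**
 * Compare the missed keys of two runs.
 * Returns the keys missed in both runs and the keys missed in only one.
 */
function compareMisses(array $runA, array $runB): array
{
    return [
        'both'  => array_values(array_intersect($runA, $runB)),
        'onlyA' => array_values(array_diff($runA, $runB)),
        'onlyB' => array_values(array_diff($runB, $runA)),
    ];
}
```

If 'onlyA' and 'onlyB' both come back empty, the two runs missed exactly the same set of keys.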

[CODE]
STAT delete_misses 0
STAT ep_io_num_write 971398399
STAT rejected_conns 0
STAT connection_structures 357
STAT limit_maxbytes 67108864
STAT decr_hits 0
STAT ep_pending_ops_max_duration 339 s
STAT ep_flush_duration_total 83669
STAT ep_item_flush_expired 0
STAT ep_too_young 0
STAT curr_connections 291
STAT rusage_system 71401.414062
STAT ep_io_write_bytes 25954177798
STAT ep_total_cache_size 333414234
STAT ep_storage_age 1
STAT ep_flush_duration_highwat 1897
STAT ep_flush_duration 0
STAT tap_flush_received 3
STAT cas_misses 0
STAT ep_flusher_todo 50972
STAT ep_pending_ops 0
STAT tap_mutation_received 4065569772
STAT mem_used 4835009461
STAT tap_mutation_sent 7896943293
STAT ep_warmup_oom 0
STAT ep_vbucket_del 10344
STAT get_misses 66728
STAT ep_num_value_ejects 0
STAT ep_queue_size 104460928
STAT bytes_read 271661045613
STAT get_hits 28667433
STAT tap_vbucket_set_received 632
STAT decr_misses 0
STAT ep_commit_num 141435
STAT rusage_user 380993.218750
STAT bucket_conns 270
STAT ep_num_non_resident 0
STAT ep_tap_keepalive 0
STAT ep_oom_errors 0
STAT ep_too_old 23405936
STAT cmd_flush 12
STAT ep_max_txn_size 250000
STAT ep_version 1.6.0beta4
STAT uptime 241246
STAT ep_data_age_highwat 17038
STAT ep_queue_age_cap 4500
STAT incr_hits 0
STAT time 1286291985
STAT ep_warmup_dups 404394994
STAT ep_total_persisted 983918622
STAT daemon_connections 25
STAT ep_flusher_state running
STAT pointer_size 64
STAT version 1.4.4_292_gc61961b
STAT ep_max_data_size 80908124160
STAT ep_commit_time_total 37530
STAT ep_warmup_time 1664
STAT ep_item_commit_failed 0
STAT total_connections 117213
STAT curr_items 934639
STAT ep_data_age 10
STAT delete_hits 0
STAT ep_storage_type featured
STAT curr_items_tot 2901398
STAT ep_total_enqueued 1099316267
STAT ep_mem_low_wat 48544874495
STAT ep_kv_size 333414234
STAT ep_vbucket_del_fail 0
STAT ep_min_data_age 0
STAT ep_io_num_read 410656932
STAT ep_warmed_up 410031041
STAT ep_item_flush_failed 121676
STAT cas_hits 0
STAT ep_warmup true
STAT ep_dbname /var/opt/NorthScale/1.6.0beta4/data/ns_1/default
STAT ep_commit_time 0
STAT auth_errors 0
STAT ep_bg_fetched 0
STAT ep_storage_age_highwat 18309
STAT tap_flush_sent 31
STAT threads 20
STAT pid 18315
STAT auth_cmds 117153
STAT cas_badval 0
STAT cmd_set 101987849
STAT ep_io_read_bytes 9482751143
STAT cmd_get 28734161
STAT ep_expired 0
STAT tap_vbucket_set_sent 1220
STAT conn_yields 143909365
STAT ep_warmup_thread complete
STAT ep_flush_preempts 113733
STAT tap_connect_received 17996
STAT ep_num_eject_failures 0
STAT bytes_written 10556910062
STAT libevent 1.4.13-stable
STAT ep_num_pager_runs 0
STAT ep_mem_high_wat 60681093120
STAT ep_dbinit 3
STAT incr_misses 0
STAT ep_pending_ops_total 622490
STAT ep_pending_ops_max 78729
STAT ep_overhead 4501595227
END
[/CODE]

Thanks for your help!

Best regards,
Nathan

Tue, 10/05/2010 - 09:35
Perry Krug

Thanks, Nathan. Looking at your latest stats, the DB only has STAT curr_items 934639, which is 65,361 fewer than the 1 million you set, so that would explain why you can't get them all out.

The server doesn't seem to be reporting or recording any errors, so I would think that the problem is on your client side. The code is very simple, though, which confuses me.

You might want to try starting with a lower number and then ramping up. If it still fails at a few hundred keys, it will be easier to diagnose.
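
The ramp-up suggested above might look something like this. It is a rough sketch rather than a finished test harness: `rampUp` and the size list are made up for illustration, and `$client` is assumed to offer `set($key, $value)` and `get($key)` in the style of PHP's `Memcached` class, with `get()` returning false on a miss.

```php
<?php
/**
 * Run the set-then-get test at increasing sizes and stop at the first size
 * that shows a problem. Returns null if every size passes, otherwise the
 * size together with the keys whose set failed and the keys that missed.
 */
function rampUp($client, string $prefix, array $sizes): ?array
{
    foreach ($sizes as $size) {
        $failedSets = [];
        $misses = [];
        for ($i = 0; $i < $size; $i++) {
            if (!$client->set("$prefix-$i", "v$i")) {
                $failedSets[] = "$prefix-$i";
            }
        }
        for ($i = 0; $i < $size; $i++) {
            if ($client->get("$prefix-$i") === false) {
                $misses[] = "$prefix-$i";
            }
        }
        if ($failedSets || $misses) {
            return ['size' => $size, 'failedSets' => $failedSets, 'misses' => $misses];
        }
    }
    return null;
}
```

A call such as `rampUp($memcache, 'settest-1', [100, 1000, 10000, 100000])` would report the smallest of those sizes at which a set fails or a key goes missing, which gives a much shorter list of keys to inspect by hand.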

Let me know what I can do to help.

Perry

Wed, 10/06/2010 - 01:36
Nathan

Good morning,

I've tried again with a few of the offending keys, this time directly via telnet to one of the Membase servers (no moxi proxy).

[CODE]
flush_all
OK
set test 1 0 11
Hello world
STORED

set settest-1-1 1 0 11
Hello world
SERVER_ERROR proxy write to downstream

set settest-1-0 1 0 11
Hello world
STORED
[/CODE]

I don't really know why the "SERVER_ERROR proxy write to downstream" response comes up.
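
For repeating the telnet test above over many keys, the request can be built and the one-line reply classified in code. The sketch below only covers the memcached text-protocol `set` command as used in the session above; `buildSetCommand` and `classifyReply` are hypothetical helper names, and the socket handling is left out.

```php
<?php
/**
 * Build a memcached text-protocol "set" request:
 *   set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
 */
function buildSetCommand(string $key, int $flags, int $exptime, string $value): string
{
    return sprintf("set %s %d %d %d\r\n%s\r\n", $key, $flags, $exptime, strlen($value), $value);
}

/** Classify the one-line reply to a storage command. */
function classifyReply(string $line): string
{
    $line = rtrim($line, "\r\n");
    if ($line === 'STORED') {
        return 'stored';
    }
    if ($line === 'NOT_STORED') {
        return 'not_stored';
    }
    if (strncmp($line, 'SERVER_ERROR', 12) === 0) {
        return 'server_error';
    }
    if (strncmp($line, 'CLIENT_ERROR', 12) === 0) {
        return 'client_error';
    }
    if ($line === 'ERROR') {
        return 'error';
    }
    return 'unknown';
}

// Same request as typed by hand in the telnet session above.
echo buildSetCommand('settest-1-1', 1, 0, 'Hello world');
```

The string could be written to a connection opened with `fsockopen()` and the reply read back with `fgets()`, so each offending key gets a 'stored' or 'server_error' verdict without retyping the commands.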

I have also tested my code against a normal memcached server (no cluster), without any errors.

I also tested it against a completely new, single-instance Membase server, again without any errors. I'll try again with a new cluster setup.

Thank you for your help so far.

Best regards,
Nathan

Wed, 10/06/2010 - 10:04
Perry Krug

Nathan, looking into this now.

Perry

Thu, 10/07/2010 - 00:46
Nathan

I've tried again with a newly configured cluster (this time 2 servers instead of 5). The problem has disappeared; however, it might return. I'll keep you posted.

Thanks for your help,

Nathan
