Rebalance couldn't succeed

Fri, 12/10/2010 - 15:16
azreal
Offline
Joined: 12/10/2010
Groups: None

Hi,

I created a small cluster of three servers using the latest version of Membase Server.

Each server has the same hardware: 2 GB of memory and a single 2 GHz core.

Test: I was able to write 10 GB of data (100,000 files of 10 KB) into the cluster. I assigned 80% of memory as the quota and wrote only to a single Membase bucket.

After that, I added a new server with the same configuration and tried to rebalance. The rebalance never completed, and the UI showed that some of the data (about 1%) was lost.

Question: I was only able to rebalance after removing the fourth server. I'm wondering whether I have done something wrong. Do I need more memory to rebalance? Please advise.

 

Thanks,

Bin

Fri, 12/10/2010 - 15:59
bhawana@membase
Offline
Joined: 10/29/2010
Groups: None

Hi Bin,

 

You most likely need more memory. If you give me the following details I can explain how you should size your cluster for everything to work smoothly:

 

size of keys

size of values

number of keys

number of replicas

 

Thanks

Bhawana

 

__________________

Forum support is great for free but sometimes you need a guaranteed response time and dedicated resources for your questions or issues.
Consider purchasing enterprise-level support from Membase: http://www.membase.com/products-and-services/overview
Call or email "sales -at- membase -dot- com" today!

Fri, 12/10/2010 - 16:06
azreal
Offline
Joined: 12/10/2010
Groups: None

size of keys: less than 16 bytes

size of values: 10kb

number of keys: 100K

number of replicas: 1

 

Thanks

Mon, 12/13/2010 - 11:05
bhawana@membase
Offline
Joined: 10/29/2010
Groups: None

 

azreal,

You wrote about 2 GB of data, which is not much for Membase, provided that other programs have enough memory to run and the system is not paging too much. What operating system are you on? How much memory does your operating system recommend?

Can you please generate diagnostics and send them to bhawana-at-membase-dot-com:

/opt/membase/bin/browse_logs > /tmp/nslogs.txt (make sure to zip this output before you send it)

and

/opt/membase/1.6.1rc2/bin/ns_server/collect_info

Thanks

Wed, 12/15/2010 - 11:27
azreal
Offline
Joined: 12/10/2010
Groups: None

Hi Bhawana,

I think one of the numbers I gave you is wrong: the value size is actually 100 KB, so the total data size is 10 GB.

I'm running CentOS 5 64-bit. I don't know what the recommended memory size is. Apart from the usual system processes, Membase is the only thing running on the box.

The output file is very big, and I don't know the best way to send it to you.

 

Thanks,

Bin

Wed, 12/15/2010 - 17:05
bhawana@membase
Offline
Joined: 10/29/2010
Groups: None

You have 20 GB of data and 3 * 2 = 6 GB of memory, which is a problem. Membase is suited to applications whose working set fits almost entirely in memory. If you have 20 GB of active data, you will have to size your cluster accordingly.
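
The arithmetic behind these figures can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate, not an official Membase sizing formula: the function names are made up for illustration, the inputs are the figures reported in this thread (100,000 keys of under 16 bytes, 100 KB values, 1 replica, three 2 GB nodes with an 80% quota), and per-item metadata overhead is ignored.

```python
KB = 10**3
GB = 10**9


def dataset_bytes(num_keys, key_bytes, value_bytes, replicas):
    """Raw size of all copies of the data (active copy plus replicas).

    Assumption: per-item metadata overhead is ignored.
    """
    return num_keys * (key_bytes + value_bytes) * (1 + replicas)


def cluster_quota_bytes(nodes, ram_bytes_per_node, quota_percent):
    """Memory available to the bucket across the whole cluster."""
    return nodes * ram_bytes_per_node * quota_percent // 100


# Figures taken from the thread; 16 bytes is the stated upper bound on key size.
data = dataset_bytes(num_keys=100_000, key_bytes=16,
                     value_bytes=100 * KB, replicas=1)
quota = cluster_quota_bytes(nodes=3, ram_bytes_per_node=2 * GB,
                            quota_percent=80)

print(f"data incl. replicas: {data / GB:.1f} GB")
print(f"cluster quota:       {quota / GB:.1f} GB")
print(f"data is {data / quota:.1f}x the quota")
```

With these inputs the data, including replicas, comes to about 20 GB against a quota of under 5 GB, so only a fraction of it can be resident in memory at any time.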

If you are still in a development/staging environment, you should try our new release candidate, available here: http://forums.membase.org/thread/membase-server-164-rc1-available

Since this is a release candidate, it is not meant for production use, but any new or ongoing development should definitely use it to get the latest and greatest.

Thanks

Bhawana

Fri, 12/17/2010 - 05:03
ajperella
Offline
Joined: 12/17/2010
Groups: None

Hi, I'm coming into this discussion late. I signed up to look at Membase as a current Redis and Cassandra user.

I listened to the FLOSS podcast on Membase and heard that Membase should be able to store vastly more data on disk (or other layers) than it holds in RAM. In fact, it suggested that you can assign certain storage options based on how frequently you want to access the data. Given this, I am surprised that the poster above has trouble adding a node to such a system.

My potential use case would definitely need the ability to persist much more data than is ever needed in RAM at once.

Can you advise?

Regards,

Andrew

 

Fri, 12/17/2010 - 08:47
azreal
Offline
Joined: 12/10/2010
Groups: None

Hi,

I tried the latest version and it works better.

At the same time, I upgraded each box to 16 GB of memory and a 500 GB disk. I wrote 120 GB of data into the three-box cluster, added a new box, and rebalanced, and it worked. Now my question is about how long the rebalance takes: it took about 4 hours or more to finish. In the real case, the data could be 10 times larger. Do you have any ideas for speeding up the rebalance? I suspect it has something to do with the number of keys, which is about 10 million for 120 GB of data.

 

Thanks,

Bin

Fri, 12/17/2010 - 15:17
bhawana@membase
Offline
Joined: 10/29/2010
Groups: None

Bin,

Thanks for trying out the release candidate. We are working on an optimization to speed up rebalance when most of the data is on disk.

Bhawana
 

Mon, 12/20/2010 - 09:03
ajperella
Offline
Joined: 12/17/2010
Groups: None

Hi Bhawana, could you please comment on my post above? It would really help to know more about this use case.

Best Regards,

Andrew

Mon, 12/20/2010 - 10:54
bhawana@membase
Offline
Joined: 10/29/2010
Groups: None

Andrew,

Membase can store a vast amount of data; the only requirement is that the working set (the data your application actively uses) should be almost entirely in memory. The user azreal may have some other problem in his setup. I have asked him to send me his log files.

 

Bhawana

Mon, 12/20/2010 - 16:19
ajperella
Offline
Joined: 12/17/2010
Groups: None

Thanks, Bhawana, that makes perfect sense. Thanks for the clarification.

Regards,

Andrew

Mon, 12/20/2010 - 17:03
azreal
Offline
Joined: 12/10/2010
Groups: None

Hi Bhawana,

I think I have a new problem. I shut down my cluster last week and restarted it just now, and I can no longer write data to it or read data from it.

In the web interface, I saw the item count as 2835837 instead of 10 million. Does it count only the items in memory? I'm pretty sure all four servers in my cluster are healthy and that the rebalance I did last week was successful.

The stats show the following:

STAT delete_misses 0
STAT ep_io_num_write 0
STAT ep_store_max_concurrency 40
STAT rejected_conns 0
STAT connection_structures 51
STAT ep_db_strategy multiDB
STAT ep_num_eject_replicas 0
STAT limit_maxbytes 67108864
STAT decr_hits 0
STAT ep_pending_ops_max_duration 0
STAT ep_flush_duration_total 0
STAT ep_item_flush_expired 0
STAT ep_num_not_my_vbuckets 0
STAT ep_too_young 0
STAT curr_connections 51
STAT rusage_system 583.413330
STAT ep_io_write_bytes 0
STAT ep_total_cache_size 105310423107
STAT ep_storage_age 0
STAT ep_flush_duration_highwat 0
STAT ep_flush_duration 0
STAT cas_misses 0
STAT ep_flusher_todo 0
STAT ep_pending_ops 0
STAT ep_db_cleaner_status complete
STAT mem_used 27658551571
STAT ep_dbshards 16
STAT ep_warmup_oom 0
STAT ep_vbucket_del 0
STAT get_misses 0
STAT ep_num_value_ejects 7773418
STAT ep_queue_size 0
STAT ep_total_del_items 0
STAT bytes_read 205008
STAT get_hits 0
STAT decr_misses 0
STAT ep_commit_num 0
STAT rusage_user 392.088379
STAT bucket_conns 11
STAT ep_num_non_resident 7773418
STAT ep_store_max_readwrite 4
STAT ep_tap_keepalive 0
STAT ep_oom_errors 0
STAT ep_too_old 0
STAT cmd_flush 0
STAT ep_max_txn_size 40000
STAT ep_version 1.6.4r_91_g655ddef
STAT uptime 5399
STAT ep_data_age_highwat 0
STAT ep_queue_age_cap 3600
STAT incr_hits 0
STAT time 1292889682
STAT ep_warmup_dups 0
STAT ep_total_persisted 0
STAT daemon_connections 40
STAT ep_flusher_state running
STAT pointer_size 64
STAT version 1.4.4_358_gb2307d8
STAT ep_max_data_size 56623104000
STAT ep_tap_bg_fetched 0
STAT ep_tmp_oom_errors 0
STAT ep_commit_time_total 0
STAT ep_warmup_time 4156135713
STAT ep_item_commit_failed 0
STAT total_connections 51
STAT curr_items 0
STAT ep_total_new_items 0
STAT ep_data_age 0
STAT delete_hits 0
STAT ep_storage_type featured
STAT curr_items_tot 10391589
STAT ep_total_enqueued 0
STAT ep_mem_low_wat 33973862400
STAT ep_kv_size 27576243107
STAT ep_vbucket_del_fail 0
STAT ep_min_data_age 0
STAT ep_io_num_read 10391589
STAT ep_warmed_up 10391589
STAT ep_item_flush_failed 0
STAT cas_hits 0
STAT ep_warmup true
STAT ep_dbname /mnt/membase/1.6.4r/data/ns_1/default
STAT ep_num_expiry_pager_runs 1
STAT ep_commit_time 0
STAT auth_errors 0
STAT ep_store_max_readers 36
STAT ep_bg_fetched 0
STAT ep_storage_age_highwat 0
STAT threads 16
STAT pid 2472
STAT auth_cmds 7
STAT cas_badval 0
STAT cmd_set 0
STAT ep_io_read_bytes 104339101331
STAT cmd_get 0
STAT ep_expired 0
STAT conn_yields 0
STAT ep_warmup_thread complete
STAT ep_flush_preempts 0
STAT ep_num_eject_failures 0
STAT bytes_written 35960779
STAT libevent 1.4.13-stable
STAT ep_num_pager_runs 0
STAT ep_mem_high_wat 42467328000
STAT ep_dbinit 2
STAT incr_misses 0
STAT ep_pending_ops_total 0
STAT ep_item_begin_failed 0
STAT ep_pending_ops_max 0
STAT ep_overhead 82308464
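
One way to read the dump above is to compare the total item count with the number of items whose values are not currently in memory. The following is a minimal sketch, assuming that `curr_items_tot` counts all items and that `ep_num_non_resident` counts items whose values have been ejected from memory; the thread does not confirm either reading. The helper names are hypothetical, and only a few of the stats above are copied in.

```python
def parse_stats(text):
    """Turn 'STAT name value' lines into a dict mapping name -> value string."""
    stats = {}
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def resident_summary(stats):
    """Return (resident item count, resident fraction of all items).

    Assumption: resident items = curr_items_tot - ep_num_non_resident.
    """
    total = int(stats["curr_items_tot"])
    non_resident = int(stats["ep_num_non_resident"])
    resident = total - non_resident
    return resident, resident / total


# A few lines copied from the stats output above.
SAMPLE = """\
STAT curr_items_tot 10391589
STAT ep_num_non_resident 7773418
STAT mem_used 27658551571
STAT ep_mem_high_wat 42467328000
"""

stats = parse_stats(SAMPLE)
resident, ratio = resident_summary(stats)
print(f"resident items: {resident} ({ratio:.1%} of total)")
print("mem_used below high watermark:",
      int(stats["mem_used"]) < int(stats["ep_mem_high_wat"]))
```

Under that assumption, roughly a quarter of the items are resident. That is in the same range as the item count shown in the web interface, although the thread does not confirm how the UI computes its figure.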

Could you suggest what I should do?

The error I got from the set is: MemCached: while expecting 'STORED', got unexpected response 'SERVER_ERROR proxy write to downstream'.

Thanks,

Bin


 

Mon, 12/20/2010 - 18:02
bhawana@membase
Offline
Joined: 10/29/2010
Groups: None

Bin,

You will have to send me the log file. Please zip it before you send it.

/opt/membase/bin/browse_logs > /tmp/nslogs.txt

Thanks

bhawana

Tue, 12/21/2010 - 11:52
azreal
Offline
Joined: 12/10/2010
Groups: None

Already sent, if you didn't receive it, let me know.

Tue, 12/21/2010 - 12:35
bhawana@membase
Offline
Joined: 10/29/2010
Groups: None

Bin,

Can you please paste the full text of the error that is printed (including the IP address/hostname and the port number)?

 

Thanks

Bhawana

Tue, 12/21/2010 - 13:00
azreal
Offline
Joined: 12/10/2010
Groups: None

Bhawana, I don't think the error text contains an IP address, hostname, or port number.

Right now, I'm using the Python memcache module to test.

This is what I did:

client = memcache.Client(['hostname:11211'], debug=1)

This is my /etc/hosts file:

192.168.1.102    cas01
192.168.1.103    cas02
192.168.1.104    cas03
192.168.1.105    cas05

I'm not sure whether this is what you want.


 

Tue, 12/21/2010 - 17:53
perry
Offline
Joined: 10/11/2010
Groups:

Following up here. Looking at the logs, it seems that the Membase servers were still "warming up" (i.e., reading their data from disk). In the current version, the servers are unable to respond to requests until warm-up has completed.

 

I created a wiki page to explain this and how to monitor it (http://wiki.membase.org/display/membase/Monitoring+Membase).
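
As a rough illustration of this kind of monitoring (not taken from the wiki page), the sketch below polls a node with the memcached text-protocol `stats` command and waits until `ep_warmup_thread` reports `complete`, the value shown in the stats dump earlier in this thread. The function names, the retry settings, and the default port are assumptions. Port 11211 is simply the one used by the client earlier in the thread, and whether stats can be fetched on that port while a node is still warming up is not confirmed here.

```python
import socket
import time


def parse_stats(raw):
    """Turn 'STAT name value' lines into a dict mapping name -> value string."""
    stats = {}
    for line in raw.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def fetch_stats(host, port, timeout=5.0):
    """Send the text-protocol 'stats' command and parse the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats\r\n")
        buf = b""
        # The reply is a series of STAT lines terminated by END.
        while not buf.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            buf += chunk
    return parse_stats(buf.decode("ascii", errors="replace"))


def warmup_complete(stats):
    """True once the warm-up thread reports 'complete'."""
    return stats.get("ep_warmup_thread") == "complete"


def wait_for_warmup(host, port=11211, attempts=60, delay=5.0, fetch=fetch_stats):
    """Poll until warm-up completes; return False if attempts run out."""
    for _ in range(attempts):
        try:
            if warmup_complete(fetch(host, port)):
                return True
        except OSError:
            pass  # node not reachable or not answering yet; retry
        time.sleep(delay)
    return False
```

A client could call `wait_for_warmup("cas01")` for each node before sending writes, instead of failing on the first `SERVER_ERROR` response.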

 

Hopefully that takes care of this. Let us know if there are any other outstanding issues.

 

Perry

