[MB-7149] cbbackup loops infinitely Created: 10/Nov/12 Updated: 18/May/13 |
|
| Status: | Reopened |
| Project: | Couchbase Server |
| Component/s: | tools |
| Affects Version/s: | 2.0-beta-2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Mike Wiederhold | Assignee: | Bin Cui |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | 2.0-release-notes | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Linux | ||
| Description |
|
I'm trying to back up my cluster via cbbackup. I start the backup with:
/opt/couchbase/bin/cbbackup couchbase://Administrator:password@mymachine:8091 /backups/couchbase_backup_test
The backup appears to work fine and a progress bar appears, but it then exceeds 100% progress and never stops! Example:
^Cinterrupted.###############################] 210.8% (26141950/12403784 msgs)
(I pressed Ctrl-C to kill it after 200%, because it seemed this couldn't possibly work. Note that both the percentage and the number of messages are off.) I only have ~12 million items in the bucket, but the backup went right past that limit. Help? |
| Comments |
| Comment by Steve Yen [ 12/Nov/12 ] |
|
There are a couple of things going on here that probably need documentation:
- cbbackup first contacts the server to get the number of items. But if the cluster is changing (there are item mutations), that number is only an estimate.
- cbbackup then uses the TAP protocol to perform the backup. Under some conditions (when not all item values are resident in memory), the TAP protocol might send duplicate messages. That's why cbbackup reports "msgs" rather than "items" in the numerator of its progress, while the denominator is in "items". That can lead to >100% in some cases. Reaching >200% is somewhat unexpected, but it depends on the situation and on what Couchbase Server is doing when generating the TAP stream. |
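To make the arithmetic behind the >100% figure concrete, here is a minimal Python sketch (not cbbackup's code) of progress computed as msgs over an item estimate. The `duplicate_every` knob is a hypothetical stand-in for whatever makes the TAP stream resend values; it only shows how duplicates push the ratio past 100%.

```python
def progress_percent(msgs_sent: int, estimated_items: int) -> float:
    """Progress as msgs (numerator) over an item estimate (denominator).

    The estimate is taken once, before the transfer starts, so nothing
    caps the ratio at 100% if the stream sends more msgs than items.
    """
    if estimated_items <= 0:
        raise ValueError("estimated_items must be positive")
    return 100.0 * msgs_sent / estimated_items


def simulate_tap_stream(num_items: int, duplicate_every: int) -> int:
    """Count msgs for a stream that sends every item once and resends
    every `duplicate_every`-th item one extra time (0 disables resends).

    This duplication pattern is a hypothetical illustration only.
    """
    msgs = 0
    for i in range(num_items):
        msgs += 1  # the item itself
        if duplicate_every and i % duplicate_every == 0:
            msgs += 1  # a duplicate msg for the same item
    return msgs


if __name__ == "__main__":
    # Numbers from the original report: 26141950 msgs vs 12403784 items.
    print(f"{progress_percent(26141950, 12403784):.1f}%")  # 210.8%
    # 10 items with every 2nd one resent: 15 msgs, i.e. 150% "progress".
    msgs = simulate_tap_stream(10, 2)
    print(msgs, f"{progress_percent(msgs, 10):.1f}%")
```

The first line reproduces the 210.8% from the report, which is consistent with the numerator and denominator simply counting different things.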
| Comment by Steve Yen [ 12/Nov/12 ] |
| The 2.0.2 filter isn't correct at the moment. Putting this into 2.0.1 for now, as it will be revisited. |
| Comment by Michael L [ 27/Nov/12 ] |
|
(I am the original poster)
Changing the command above to use an IP address rather than a hostname seems to have fixed the problem. My backups now run to 100% and then complete as expected. As for the root cause: I don't believe it has anything to do with the cluster changing, since I first encountered this when trying to back up an essentially idle cluster. |
| Comment by Bin Cui [ 06/Dec/12 ] |
| Verified on a multi-node cluster that cbtransfer gets the total item count correctly. Failed to reproduce the bug on an idle cluster. |
| Comment by Mike Wiederhold [ 06/Dec/12 ] |
| I asked the user on the forums for more information to reproduce this issue. I will post the information here if and when he responds. |
| Comment by Farshid Ghods [ 10/Dec/12 ] |
| deferring to 2.1 per bug scrub meeting ( Dipti & Farshid -December 7th ) |
| Comment by Paul Janssen [ 04/Jan/13 ] |
|
I have the same problem.
Observed progress of over 32000%. The expected number of msgs to save is far less than the number actually saved, and a restore loads the same number of msgs as were actually saved. This affects backup and restore time as well as disk space. |
| Comment by Paul Janssen [ 04/Jan/13 ] |
|
Version info: 2.0.0 community edition (build-1723)
|
| Comment by Paul Janssen [ 04/Jan/13 ] |
| Using ip-address (local,external) or hostname (localhost) does not make any difference, issue remains. |
| Comment by Maria McDuff [ 25/Mar/13 ] |
|
bug scrub: Bin -- have you had a chance to take a look? pls update. thanks. |
| Comment by Bin Cui [ 25/Mar/13 ] |
| Cannot reproduce it in house. |
| Comment by Maria McDuff [ 01/Apr/13 ] |
| per bug scrub: abhinav -- can you please repro in latest 2.0.2 build? thanks. |
| Comment by Abhinav Dangeti [ 01/Apr/13 ] |
|
Cannot reproduce on 2.0.2-749-rel.
- 3 nodes, 2 buckets
[root@orange-11601 ~]# /opt/couchbase/bin/cbbackup couchbase://Administrator:password@localhost:8091 ~/backup
[####################] 100.0% (8557766/8558766 msgs)
bucket: default, msgs transferred...
      :      total |       last |   per sec
batch :      10657 |      10657 |      14.7
byte  : 1131794345 | 1131794345 | 1562011.5
msg   :    8558766 |    8558766 |   11812.1
[####################] 100.0% (2024739/2024739 msgs)
bucket: saslbucket, msgs transferred...
      :      total |       last |   per sec
batch :      14775 |      14775 |      86.0
byte  : 1367390279 | 1367390279 | 7955075.3
msg   :   10583505 |   10583505 |   61571.7
done |
| Comment by Maria McDuff [ 09/Apr/13 ] |
| not reproducible. |
| Comment by Bin Cui [ 17/May/13 ] |
|
First, the error itself is harmless: the tool tried to transfer design docs and the source cluster doesn't contain any. Since 2.0.2, customers can specify the --data-only option for the cbtransfer/cbbackup/cbrestore tools. But we still don't know the root cause of the big difference between the initial number of msgs to be sent and the final number of msgs transferred. |
| Comment by Bin Cui [ 17/May/13 ] |
|
One possible explanation for the deviant number:
1. The estimated number is the total number of active items.
2. The actual number of msgs transferred = total(tap_mutations + tap_delete).
For the above customer case, where 2 million items were deleted, we transfer not only the current active items but also the deleted items. Again, more msgs will be transferred if any key has repeated sets/deletions. |
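Bin's explanation can be checked with a small model. This Python sketch is a simplification, not the TAP implementation: it assumes every set and every delete in a key's history becomes one msg, which is an upper bound, since a real stream may collapse repeated ops on the same key. The estimate, by contrast, only counts keys that are still alive.

```python
from typing import Iterable, Tuple


def estimate_vs_msgs(ops: Iterable[Tuple[str, str]]) -> Tuple[int, int]:
    """Return (estimated_items, msgs) for a history of ("set"|"delete", key) ops.

    estimated_items: keys still active at the end (what the estimate counts).
    msgs: one msg per mutation or deletion (tap_mutations + tap_delete),
          under the simplifying assumption that nothing is deduplicated.
    """
    active = set()
    msgs = 0
    for op, key in ops:
        if op == "set":
            active.add(key)
        elif op == "delete":
            active.discard(key)
        else:
            raise ValueError(f"unknown op: {op!r}")
        msgs += 1  # every set or delete is counted as one msg
    return len(active), msgs


if __name__ == "__main__":
    history = [("set", "a"), ("set", "b"), ("delete", "a"),
               ("set", "c"), ("set", "c")]
    # 2 live keys (b, c) but 5 msgs, so progress would read 250%.
    print(estimate_vs_msgs(history))
```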
| Comment by Perry Krug [ 18/May/13 ] |
| Reopening for visibility. Whether or not the tool is doing the "right" thing, there is still a major impact to the user in terms of disk space used, time taken, confidence in the backup, etc. |
[MB-8214] bgfetcher performance regression Created: 08/May/13 Updated: 18/May/13 |
|
| Status: | In Progress |
| Project: | Couchbase Server |
| Component/s: | couchbase-bucket |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Jin Lim | Assignee: | Abhinav Dangeti |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
* The bgfetcher condition variable for a woken task is not being set correctly (hasWokenTask).
* We also need to move any woken task to the readyQueue immediately instead of re-pushing it to the futureQueue. |
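The second fix can be pictured with a toy two-queue scheduler in Python. The class and method names are hypothetical and do not mirror ep-engine's C++ code; `has_ready` plays the role of a "has woken task" check. The point is only that `wake()` moves the task straight to the ready queue instead of re-inserting it into the time-ordered future queue, where it may keep waiting until the dispatcher next promotes due tasks.

```python
import heapq
import itertools
from collections import deque


class ToyScheduler:
    """Hypothetical two-queue scheduler: `future` is a heap ordered by wake
    time, `ready` is a FIFO of tasks that can run now. Not ep-engine code."""

    def __init__(self):
        self.future = []                  # heap of (wake_time, seq, task)
        self.ready = deque()
        self._seq = itertools.count()     # tie-breaker so tasks are never compared

    def schedule(self, task, wake_time):
        heapq.heappush(self.future, (wake_time, next(self._seq), task))

    def wake(self, task):
        """Move a sleeping task straight to the ready queue, rather than
        pushing it back into `future` to wait for the next scan."""
        for i, (_, _, queued) in enumerate(self.future):
            if queued == task:
                self.future.pop(i)
                heapq.heapify(self.future)  # restore the heap invariant
                self.ready.append(task)
                return True
        return False

    def has_ready(self, now):
        """Promote tasks whose wake time has passed, then report readiness."""
        while self.future and self.future[0][0] <= now:
            _, _, task = heapq.heappop(self.future)
            self.ready.append(task)
        return bool(self.ready)

    def next_task(self, now):
        return self.ready.popleft() if self.has_ready(now) else None


if __name__ == "__main__":
    sched = ToyScheduler()
    sched.schedule("flusher", wake_time=50)
    sched.schedule("bgfetch", wake_time=100)
    sched.wake("bgfetch")          # a fetch request arrived early
    print(sched.next_task(now=0))  # bgfetch runs now, not at t=100
```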
| Comments |
| Comment by Maria McDuff [ 08/May/13 ] |
| per bug triage, upgrading to blocker. |
| Comment by Jin Lim [ 10/May/13 ] |
|
* Two toy builds, MRW28 & MRW30, have shown that the bgfetcher performance regression is now fixed.
* Both implemented the two fixes mentioned in the description above, plus different ways of binding worker threads to incoming data access requests.
* QE (Abhinav) has already started the 18-hour longevity litmus test.
* Jin to drop MRW30 to Perf (Ronnie) for the full cycle of performance tests.
* Will mark this as fixed after QE and Perf validation. |
| Comment by Maria McDuff [ 10/May/13 ] |
| abhinav, pls provide test result of the toy build. |
| Comment by Maria McDuff [ 10/May/13 ] |
|
Jin, abhinav will update this bug (with his test result) tonight when the run completes (~8pm onwards). if all looks good, you can merge the fix (as agreed in today's bug mtg) then he can launch a new run with the official build this weekend. |
| Comment by Abhinav Dangeti [ 11/May/13 ] |
|
Litmus run for 2.0.1-170 vs MRW28-toy (4 readers + 1 writer):
Mixed-litmus: http://qa.hq.northscale.net/job/litmuses-graph-loop/178/
Mixed-dgm-litmus: http://qa.hq.northscale.net/job/litmuses-graph-loop/179/
Read-litmus: http://qa.hq.northscale.net/job/litmuses-graph-loop/180/
Write-litmus: http://qa.hq.northscale.net/job/litmuses-graph-loop/181/
Reb-in-litmus: http://qa.hq.northscale.net/job/litmuses-graph-loop/182/ -- Reb-in-time: 673s vs 845s
Reb-out-litmus: http://qa.hq.northscale.net/job/litmuses-graph-loop/183/ -- Reb-out-time: 663s vs 1091s
Reb-swap-litmus: http://qa.hq.northscale.net/job/litmuses-graph-loop/184/ -- Reb-swap-time: 453s vs 640s
* Rebalance times are longer.
* Set/get latencies regressed a little on the other runs.
Waiting for results of 2.0.1-170 vs MRW30-toy (4 readers + 2 writers) from Ronnie. |
| Comment by Maria McDuff [ 13/May/13 ] |
| Wayne to post results from Ronnie's run. |
| Comment by Wayne Siu [ 13/May/13 ] |
|
Performance test numbers (2.0.1-170 vs MRW30-toy):
Reb-large-2 (Reb-in): 1299s vs 4865s (-275%)
Reb-large-2-out (Reb-out): 971s vs 5728s (-490%) |
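The percentages can be re-derived as the relative increase in rebalance time over the 2.0.1-170 baseline; the ticket writes them with a minus sign to mark a regression. A small Python helper, which also applies the same formula to the MRW28 litmus times in Abhinav's earlier comment:

```python
def slowdown_percent(baseline_s: float, candidate_s: float) -> float:
    """Relative increase in wall time versus the baseline, in percent."""
    if baseline_s <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (candidate_s - baseline_s) / baseline_s


# (baseline 2.0.1-170, toy build) rebalance times in seconds, from this ticket.
RUNS = {
    "Reb-large-2 (Reb-in), MRW30": (1299, 4865),
    "Reb-large-2-out (Reb-out), MRW30": (971, 5728),
    "Reb-in-time, MRW28": (673, 845),
    "Reb-out-time, MRW28": (663, 1091),
    "Reb-swap-time, MRW28": (453, 640),
}

if __name__ == "__main__":
    for name, (base, toy) in RUNS.items():
        print(f"{name}: {slowdown_percent(base, toy):.1f}% slower")
```

This gives roughly 275% and 490% for the MRW30 runs, in line with the figures above, and about 26%, 65% and 41% for the MRW28 litmus runs.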
| Comment by Jin Lim [ 13/May/13 ] |
| This is due to the same regression from rebalance. We will update from that end. |
| Comment by Jin Lim [ 13/May/13 ] |
|
|
| Comment by Maria McDuff [ 14/May/13 ] |
| per bug triage, keeping this bug open since it is more about read I/O starvation. |
| Comment by Jin Lim [ 15/May/13 ] |
| Once QE validates build 803 and gives us a GO, we will have the perf team resume the performance test. Thanks. |
| Comment by Maria McDuff [ 16/May/13 ] |
| abhinav will post the test result tonight. test is still running. |
| Comment by Abhinav Dangeti [ 18/May/13 ] |
|
Test1's results from the following spreadsheet: https://docs.google.com/spreadsheet/ccc?key=0Ap_3tfZFLHzcdE16WnFyb09ZcE1CckZQMnN1eWRldFE#gid=0 Status: green |
[MB-8292] Memory leak? on one of the buckets on destination cluster (single node) with source cluster on v_iber workload Created: 15/May/13 Updated: 18/May/13 |
|
| Status: | Open |
| Project: | Couchbase Server |
| Component/s: | couchbase-bucket |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Abhinav Dangeti | Assignee: | Mike Wiederhold |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
couchbase-server-2.0.2-800.x86_64
source cluster: 3 nodes
destination cluster: 1 node
source buckets: AbRegNums, MsgsCalls, RevAB, UserInfo
destination buckets: AbRegNums, RevAB, UserInfo
XDC replications on the common buckets |
||
| Operating System: | Centos 64-bit |
| Description |
|
- Viber workload on all 4 buckets, with greater load on RevAB than on the rest.
- After about half a day's run time:
  - At some point, replication broke for RevAB, as that bucket on the destination ran out of memory.
  - Resident ratios are still 100% for all buckets on the source; on the destination they are 100% for AbRegNums and UserInfo, but 0% for RevAB.
  - Mem used is at 3G, with the high water mark at about 2.7G; no temp OOMs were noticed, however.
  - cbstats on the destination node:
    ep_diskqueue_memory: 0
    ep_mem_high_wat: 2738041651
    ep_mem_low_wat: 2415919104
    ep_mem_tracker_enabled: true
    ep_meta_data_memory: 549618696
    ep_mutation_mem_threshold: 95
    ep_warmup_min_memory_threshold: 100
    mem_used: 3060163976
    vb_active_ht_memory: 50790400
    vb_active_itm_memory: 610687928
    vb_active_meta_data_memory: 549618696
    vb_active_perc_mem_resident: 0
    vb_active_queue_memory: 0
    vb_pending_ht_memory: 0
    vb_pending_itm_memory: 0
    vb_pending_meta_data_memory: 0
    vb_pending_perc_mem_resident: 0
    vb_pending_queue_memory: 0
Attached cbcollect_info for source (10.3.4.27): https://s3.amazonaws.com/bugdb/MB--/10_3_4_27.zip , and destination (10.3.4.30): https://s3.amazonaws.com/bugdb/MB--/10_3_4_30.zip |
| Comments |
| Comment by Mike Wiederhold [ 16/May/13 ] |
| Just for bug distribution |
| Comment by Maria McDuff [ 16/May/13 ] |
| ketaki --- can you specify the impact of this issue? Is this a stats-only memory bug? Is the system still operational (compaction, set/get working OK or not)? Can you provide more details of what impact this issue is having on the cluster? |
| Comment by Ketaki Gangal [ 17/May/13 ] |
|
The memory stats show the correct usage (i.e. the bucket is using more memory than allocated). The bucket shows 0 percent resident items. The bucket was created with 3G and used memory is 3G, which is well above the high water mark (2.7G). Further inspection shows 0.5G of fragmentation and 0.5G of metadata using up memory; there is no accounting for where the rest of the memory is being used. It looks like an issue with freeing up memory and not a stats-only issue. Checked this with the "free" system command as well. |
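Ketaki's comparison can be redone directly from the cbstats dump in the description. The Python sketch below parses `name: value` lines and expresses the counters as a share of the bucket quota; the 3 GiB quota is an assumption taken from "created w/ 3G". On those numbers mem_used is about 307 MiB over ep_mem_high_wat, the two watermarks come out at 85% and 75% of 3 GiB, and mem_used sits at about 95% of the quota, the same figure as ep_mutation_mem_threshold (whether that is related is not established here).

```python
GIB = 1024 ** 3


def parse_cbstats(text: str) -> dict:
    """Parse 'name: value' lines into a dict, converting integers where possible."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines without a colon
        name, value = name.strip(), value.strip()
        try:
            stats[name] = int(value)
        except ValueError:
            stats[name] = value  # non-numeric values such as "true"
    return stats


def memory_summary(stats: dict, quota_bytes: int) -> dict:
    """Compare mem_used with the watermarks, all relative to the bucket quota."""
    used = stats["mem_used"]
    return {
        "over_high_wat_bytes": used - stats["ep_mem_high_wat"],
        "used_pct": 100.0 * used / quota_bytes,
        "high_wat_pct": 100.0 * stats["ep_mem_high_wat"] / quota_bytes,
        "low_wat_pct": 100.0 * stats["ep_mem_low_wat"] / quota_bytes,
    }


# A subset of the cbstats output from the description.
SAMPLE = """\
ep_mem_high_wat: 2738041651
ep_mem_low_wat: 2415919104
ep_mem_tracker_enabled: true
ep_mutation_mem_threshold: 95
mem_used: 3060163976
"""

if __name__ == "__main__":
    summary = memory_summary(parse_cbstats(SAMPLE), quota_bytes=3 * GIB)  # assumed quota
    for key, value in summary.items():
        print(f"{key}: {value:.1f}" if isinstance(value, float) else f"{key}: {value}")
```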
| Comment by Maria McDuff [ 18/May/13 ] |
| mike, can you pls take a look at this issue? thanks. |
[MB-8294] Destination node with XDCR (heavy DGM) goes into pending. Created: 16/May/13 Updated: 18/May/13 |
|
| Status: | Open |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Deepkaran Salooja | Assignee: | Deepkaran Salooja |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
2.0.2-800-rel
<manifest> <remote name="couchbase" fetch="git://github.com/couchbase/"/> <remote name="membase" fetch="git://github.com/membase/"/> <remote name="apache" fetch="git://github.com/apache/"/> <remote name="erlang" fetch="git://github.com/erlang/"/> <default remote="couchbase" revision="master"/> <project name="tlm" path="tlm" revision="f30cd57af02e51eafa0b6d5fb71176c2a46a2cf9"> <copyfile src="Makefile.top" dest="Makefile"/> </project> <project name="bucket_engine" path="bucket_engine" revision="2a797a8d97f421587cce728f2e6aa2cd42c8fa26"/> <project name="ep-engine" path="ep-engine" revision="02714d36d61509195fb1b18953445fbdd4240ed3"/> <project name="libconflate" path="libconflate" revision="c0d3e26a51f25a2b020713559cb344d43ce0b06c"/> <project name="libmemcached" path="libmemcached" revision="ca739a890349ac36dc79447e37da7caa9ae819f5" remote="membase"/> <project name="libvbucket" path="libvbucket" revision="408057ec55da3862ab8d75b1ed25d2848afd640f"/> <project name="couchbase-cli" path="couchbase-cli" revision="3c3aa79db86684ba2bb01a952c43995b28797cd9" remote="couchbase"/> <project name="memcached" path="memcached" revision="b6ceb46fc26ac6f1d6be7a5866d6c6c0f6e6d32a" remote="membase"/> <project name="moxi" path="moxi" revision="4b391021af7a453bf88716724d2c644916ebd969"/> <project name="ns_server" path="ns_server" revision="2251cebb7efa5b0f77e73dc53435ce9348faae9e"/> <project name="portsigar" path="portsigar" revision="159b6179ea8a3c2075ee9eb2afa6f91c98c0fda6"/> <project name="sigar" path="sigar" revision="63a3cd1b316d2d4aa6dd31ce8fc66101b983e0b0"/> <project name="couchbase-examples" path="couchbase-examples" revision="cd9c8600589a1996c1ba6dbea9ac171b937d3379"/> <project name="couchbase-python-client" path="couchbase-python-client" revision="d443169c0694fca1be67d8f6934a8c50f0175ee7"/> <project name="couchdb" path="couchdb" revision="586e4bb73b92db4362192616370c4e3edb8c34a0"/> <project name="couchdbx-app" path="couchdbx-app" 
revision="e83b255bc7f7548e2bc36e709666e564c2a488dd"/> <project name="couchstore" path="couchstore" revision="963fc26eafc67514eed5c9a3752d5d4cbdf5971d"/> <project name="geocouch" path="geocouch" revision="ed9ad43aa361df0829262fef811b5236331b44c8"/> <project name="testrunner" path="testrunner" revision="c184e199382d5af23ac4a282deb12924f0635cd3"/> <project name="healthchecker" path="healthchecker" revision="29d45e7776ecb20800f6ad97aec585a1e1636370"/> <project name="otp" path="otp" revision="b6dc1a844eab061d0a7153d46e7e68296f15a504" remote="erlang"/> <project name="icu4c" path="icu4c" revision="26359393672c378f41f2103a8699c4357c894be7" remote="couchbase"/> <project name="snappy" path="snappy" revision="5681dde156e9d07adbeeab79666c9a9d7a10ec95" remote="couchbase"/> <project name="v8" path="v8" revision="447decb75060a106131ab4de934bcc374648e7f2" remote="couchbase"/> <project name="gperftools" path="gperftools" revision="44a584d1de8c89addfb4f1d0522bdbbbed83ba48" remote="couchbase"/> <project name="pysqlite" path="pysqlite" revision="0ff6e32ea05037fddef1eb41a648f2a2141009ea" remote="couchbase"/> </manifest> |
||
| Attachments: |
|
| Operating System: | Centos 64-bit |
| Description |
|
Destination node with XDCR (heavy DGM) goes into pending.
Setup Information:
- Cluster Config: 4 nodes, OS: Centos 6.3, CPU: 6 Core, RAM: 16G, Disk: 500G
- XDCR Topology: Unidirectional (Master(3) -> Slave(1))
- 4 buckets: AbRegNums(2GB), MsgsCalls(2GB), RevAB(3GB), UserInfo(2GB)
- Data loaded using the Viber workload with 6.7M (RR ~45%), 0.1M (RR 100%), 11M (RR 100%), 12M (RR ~30%) data on the 4 buckets
- Unidirectional XDCR set up to 1 node (master -> slave) for 3 buckets: AbRegNums, RevAB, UserInfo
The destination node goes into pending state (screenshot attached). Both beam.smp and memcached are running on the node. There are a lot of crash reports like the ones below, but there are no views/design docs in the system.
[error_logger:error,2013-05-16T1:40:30.265,ns_1@127.0.0.1:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
  crasher:
    initial call: set_view_update_daemon:init/1
    pid: <0.9511.707>
    registered_name: set_view_update_daemon
    exception exit: {noproc, {gen_server,call, ['capi_set_view_manager-UserInfo', {foreach_doc, #Fun<capi_ddoc_replication_srv.2.102018441>}, infinity]}}
      in function gen_server:terminate/6
    ancestors: [ns_server_sup,ns_server_cluster_sup,<0.58.0>]
    messages: []
    links: [<0.298.0>,<0.9512.707>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 121393
    stack_size: 24
    reductions: 7537
  neighbours:
=========================CRASH REPORT=========================
  crasher:
    initial call: compaction_daemon:-spawn_bucket_compactor/3-fun-2-/0
    pid: <0.9508.707>
    registered_name: []
    exception exit: {noproc, {gen_server,call, ['capi_set_view_manager-RevAB', {foreach_doc, #Fun<capi_ddoc_replication_srv.1.36030090>}, infinity]}}
      in function gen_server:call/3
      in call from capi_ddoc_replication_srv:foreach_live_ddoc_id/2
      in call from capi_ddoc_replication_srv:fetch_ddoc_ids/1
      in call from compaction_daemon:'-spawn_bucket_compactor/3-fun-2-'/4
    ancestors: [compaction_daemon,ns_server_sup,ns_server_cluster_sup, <0.58.0>]
    messages: []
    links: [<0.412.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 28657
    stack_size: 24
    reductions: 30607
  neighbours:
The pending node is still in the same state and can be used for investigation: coconut-h20804.hq.couchbase.com |
| Comments |
| Comment by Deepkaran Salooja [ 16/May/13 ] |
|
Collect_info destination node https://s3.amazonaws.com/bugdb/jira/MB-8294/e9125b6b/coconut-h20804.hq.couchbase.com-5162013-244-diag.zip Collect_info source cluster https://s3.amazonaws.com/bugdb/jira/MB-8294/e9125b6b/coconut-h20801.hq.couchbase.com-5162013-235-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-8294/e9125b6b/coconut-h20802.hq.couchbase.com-5162013-238-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-8294/e9125b6b/coconut-h20803.hq.couchbase.com-5162013-241-diag.zip |
| Comment by Aliaksey Artamonau [ 16/May/13 ] |
| Because of a race that we found recently, bucket supervisors went down on the destination node. A fix for this is already merged. Unfortunately, I cannot figure out the cause of the system_limit errors that triggered the race (in builds without the fix). Could you please rerun the same test with the latest build? The fix in question is: http://review.couchbase.org/26331. |
| Comment by Maria McDuff [ 18/May/13 ] |
| deep, pls update this bug with your re-run using the latest build. |
[MB-8165] timeout error appeared while trying to read replica from a bucket with active resident ratio 30% Created: 29/Apr/13 Updated: 18/May/13 |
|
| Status: | Open |
| Project: | Couchbase Server |
| Component/s: | couchbase-bucket |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Critical |
| Reporter: | Iryna Mironava | Assignee: | Mike Wiederhold |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
build 2.0.2-779-rel
<manifest><remote name="couchbase" fetch="git://10.1.1.210/"/><remote name="membase" fetch="git://10.1.1.210/"/><remote name="apache" fetch="git://github.com/apache/"/><remote name="erlang" fetch="git://github.com/erlang/"/><default remote="couchbase" revision="master"/><project name="tlm" path="tlm" revision="14fb7cc05baf418a57d33ab7dd0e7239645ec156"><copyfile src="Makefile.top" dest="Makefile"/></project><project name="bucket_engine" path="bucket_engine" revision="2a797a8d97f421587cce728f2e6aa2cd42c8fa26"/><project name="ep-engine" path="ep-engine" revision="e94e5336f9107f1375504b205a114902b66af374"/><project name="libconflate" path="libconflate" revision="c0d3e26a51f25a2b020713559cb344d43ce0b06c"/><project name="libmemcached" path="libmemcached" revision="ca739a890349ac36dc79447e37da7caa9ae819f5" remote="membase"/><project name="libvbucket" path="libvbucket" revision="026c79ae424a6daed4bb9345e86cc8fc21759b28"/><project name="couchbase-cli" path="couchbase-cli" revision="df2bf841fe1a88ce22cd931ee20fd8821cfcbd88" remote="couchbase"/><project name="memcached" path="memcached" revision="f5f43c6971d88c839ee78bcf87d6e7f177cef7b4" remote="membase"/><project name="moxi" path="moxi" revision="4b391021af7a453bf88716724d2c644916ebd969"/><project name="ns_server" path="ns_server" revision="44176c27097947806b721de4b84dc92f24802dbb"/><project name="portsigar" path="portsigar" revision="159b6179ea8a3c2075ee9eb2afa6f91c98c0fda6"/><project name="sigar" path="sigar" revision="63a3cd1b316d2d4aa6dd31ce8fc66101b983e0b0"/><project name="couchbase-examples" path="couchbase-examples" revision="cd9c8600589a1996c1ba6dbea9ac171b937d3379"/><project name="couchbase-python-client" path="couchbase-python-client" revision="d443169c0694fca1be67d8f6934a8c50f0175ee7"/><project name="couchdb" path="couchdb" revision="586e4bb73b92db4362192616370c4e3edb8c34a0"/><project name="couchdbx-app" path="couchdbx-app" revision="cf709acdb8ee24cef158a2007189184e1e0f8016"/><project name="couchstore" 
path="couchstore" revision="1011b9d3d6baf87103531c89acc7d2f44be37e2e"/><project name="geocouch" path="geocouch" revision="ed9ad43aa361df0829262fef811b5236331b44c8"/><project name="testrunner" path="testrunner" revision="7a76cbb6954e29750e17388ab0d559a720d3a8c2"/><project name="otp" path="otp" revision="b6dc1a844eab061d0a7153d46e7e68296f15a504" remote="erlang"/><project name="icu4c" path="icu4c" revision="26359393672c378f41f2103a8699c4357c894be7" remote="couchbase"/><project name="snappy" path="snappy" revision="5681dde156e9d07adbeeab79666c9a9d7a10ec95" remote="couchbase"/><project name="v8" path="v8" revision="447decb75060a106131ab4de934bcc374648e7f2" remote="couchbase"/><project name="gperftools" path="gperftools" revision="8f60ba949fb8576c530ef4be148bff97106ddc59" remote="couchbase"/><project name="pysqlite" path="pysqlite" revision="0ff6e32ea05037fddef1eb41a648f2a2141009ea" remote="couchbase"/></manifest> |
||
| Operating System: | Centos 64-bit |
| Description |
|
test to reproduce
newmemcapable.GetrTests.getr_dgm_test,nodes_init=3,GROUP=P1,,dgm_run=true,wait_timeout=250 -p get-delays=true attaching logs and delays |
| Comments |
| Comment by Iryna Mironava [ 29/Apr/13 ] |
|
https://s3.amazonaws.com/bugdb/jira/MB-8165/a3ab787f/diaf.zip https://s3.amazonaws.com/bugdb/jira/MB-8165/a3ab787f/delay.zip |
| Comment by Mike Wiederhold [ 02/May/13 ] |
|
Iryna,
The replica read command follows the same code path as a normal get request so I am not sure why you would be seeing a timeout. Can you please point out the place in the logs that shows this timeout? Also, does this test always fail or is it sporadic? |
| Comment by Iryna Mironava [ 07/May/13 ] |
|
I get a socket timeout error each time I run this test.
When I try to get the replica for a key, I get a Not my vbucket error. |
| Comment by Maria McDuff [ 14/May/13 ] |
|
mike, looks like iryna is seeing this issue for each test she runs. can u take a look at the logs? thanks. |
| Comment by Mike Wiederhold [ 14/May/13 ] |
| It looks like you forgot to set replica_to_read=1 in this test. From looking at the test code your getreplica commands are hitting active vbuckets and if an active vbucket receives a replica read command it returns not my vbucket. |
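For readers following the Not my vbucket discussion, here is a hedged Python sketch of the two pieces involved. `vbucket_for_key` follows the CRC32 mapping that, to our understanding, clients use when the config says `"hashAlgorithm": "CRC"` (bits 16 to 30 of the CRC32, masked by the vBucket count). `handle_get_replica` is a hypothetical toy model of the rule Mike describes, not ep-engine's code: a replica read is only served by a vBucket in replica state.

```python
import zlib


def vbucket_for_key(key: bytes, num_vbuckets: int) -> int:
    """Map a key to a vBucket id with the CRC32-based scheme (assumed)."""
    if num_vbuckets <= 0 or num_vbuckets & (num_vbuckets - 1):
        raise ValueError("num_vbuckets must be a positive power of two")
    crc = zlib.crc32(key) & 0xFFFFFFFF  # unsigned 32-bit CRC
    return ((crc >> 16) & 0x7FFF) & (num_vbuckets - 1)


def handle_get_replica(vb_states: dict, vb: int) -> str:
    """Toy server-side rule: only a vBucket in 'replica' state serves a
    replica read; an active (or unknown) vBucket answers NOT_MY_VBUCKET."""
    return "OK" if vb_states.get(vb) == "replica" else "NOT_MY_VBUCKET"


if __name__ == "__main__":
    vb = vbucket_for_key(b"some-key", 1024)
    print(vb, handle_get_replica({vb: "active"}, vb))  # ... NOT_MY_VBUCKET
```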
| Comment by Deepkaran Salooja [ 17/May/13 ] |
|
The test is supposed to work fine with replica_to_read=0 (which is the default). Running it with replica_to_read=1 returns an error, e.g.
newmemcapable.GetrTests.getr_dgm_test,nodes_init=3,GROUP=P1,replica_to_read=1,dgm_run=true,wait_timeout=250
"IndexError: list index out of range"
You need 2 replicas if replica_to_read=1, e.g.
newmemcapable.GetrTests.getr_dgm_test,nodes_init=3,GROUP=P1,replica_to_read=1,dgm_run=true,wait_timeout=250,resident_ratio=90,replicas=2
But I still get the same Not my vbucket error:
======================================================================
ERROR: getr_dgm_test (newmemcapable.GetrTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "pytests/newmemcapable.py", line 265, in getr_dgm_test
    replica_to_read=self.replica_to_read, batch_size=1)
  File "pytests/basetestcase.py", line 455, in verify_cluster_stats
    batch_size=batch_size)
  File "pytests/basetestcase.py", line 386, in _verify_all_buckets
    task.result(timeout)
  File "lib/tasks/future.py", line 159, in result
    return self.__get_result()
  File "lib/tasks/future.py", line 111, in __get_result
    raise self._exception
MemcachedError: Memcached error #7 'Not my vbucket': Connection reset
----------------------------------------------------------------------
The test code looks OK to me so far. Investigating more. |
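The `IndexError` with `replica_to_read=1` is consistent with how a vBucketMap row is laid out: `[active, replica 0, replica 1, ...]`, holding indexes into `serverList`, with -1 for an unassigned copy. With one replica each row has two entries, so asking for replica index 1 runs off the end. A hypothetical Python helper (not the testrunner code) that makes both failure modes explicit:

```python
from typing import List, Sequence


def replica_server(vbucket_map: Sequence[Sequence[int]], server_list: List[str],
                   vb: int, replica_to_read: int) -> str:
    """Return the server holding replica number `replica_to_read` (0-based) of `vb`.

    Raises IndexError when the bucket has fewer replicas than requested, and
    ValueError when the replica slot exists but is unassigned (-1).
    """
    row = vbucket_map[vb]
    if replica_to_read + 1 >= len(row):
        raise IndexError(f"vBucket {vb} has only {len(row) - 1} replica slot(s)")
    idx = row[1 + replica_to_read]  # slot 0 is the active copy
    if idx < 0:
        # Guard explicitly: in Python, server_list[-1] would silently
        # return the last server instead of failing.
        raise ValueError(f"replica {replica_to_read} of vBucket {vb} is unassigned")
    return server_list[idx]


if __name__ == "__main__":
    servers = ["10.0.0.1:11210", "10.0.0.2:11210", "10.0.0.3:11210"]  # hypothetical
    vbmap = [[0, 1], [1, 2], [2, -1]]  # one replica per vBucket
    print(replica_server(vbmap, servers, 0, 0))  # 10.0.0.2:11210
    try:
        replica_server(vbmap, servers, 0, 1)
    except IndexError as err:
        print("IndexError:", err)
```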
| Comment by Mike Wiederhold [ 17/May/13 ] |
| Okay. I just looked again and I think you're right. I got thrown off by one of the functions and didn't realize you guys keep separate maps of replica and active vbuckets. I'll try to add some logging in ep-engine and run the test one more time. If you could add logging in testrunner that says which server and vbucket a getr is sent to when it fails, that would be helpful too. |
| Comment by Deepkaran Salooja [ 17/May/13 ] |
|
Added logging to testrunner. It will now log the replica vbucket and server. The very first getr always returns a Not my vbucket error. http://review.couchbase.org/#/c/26387/ |
| Comment by Maria McDuff [ 18/May/13 ] |
| mike, pls take a look and provide some update to your ep-engine test run. |
[MB-8023] Port server moxi on node 'ns_1@10.3.121.97' exited with status 1. Created: 02/Apr/13 Updated: 17/May/13 |
|
| Status: | Open |
| Project: | Couchbase Server |
| Component/s: | moxi |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Critical |
| Reporter: | Thuan Nguyen | Assignee: | Thuan Nguyen |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | ubuntu 11.04 64 bit | ||
| Attachments: |
|
| Description |
|
I rebooted a VM with Couchbase Server 2.0.2-746 installed.
After the VM came up and Couchbase Server started, I saw the error "Port server moxi on node 'ns_1@10.3.121.97' exited with status 1." on the log page:
Port server moxi on node 'ns_1@10.3.121.97' exited with status 1. Restarting. Messages:
2013-04-02 15:48:43: (cproxy_config.c.317) env: MOXI_SASL_PLAIN_USR (13)
2013-04-02 15:48:43: (cproxy_config.c.326) env: MOXI_SASL_PLAIN_PWD (8)
2013-04-02 15:48:45: (cproxy.c.330) ERROR: could not listen on port 11211. Please use -Z port_listen=PORT_NUM to specify a different port number.
ns_port_server000 ns_1@10.3.121.97 15:48:43 - Tue Apr 2, 2013
Port server moxi on node 'ns_1@10.3.121.97' exited with status 1. Restarting. Messages:
2013-04-02 15:48:38: (cproxy_config.c.317) env: MOXI_SASL_PLAIN_USR (13)
2013-04-02 15:48:38: (cproxy_config.c.326) env: MOXI_SASL_PLAIN_PWD (8)
2013-04-02 15:48:40: (cproxy.c.330) ERROR: could not listen on port 11211. Please use -Z port_listen=PORT_NUM to specify a different port number.
ns_port_server000 ns_1@10.3.121.97 15:48:38 - Tue Apr 2, 2013
Service moxi exited on node 'ns_1@10.3.121.97' in 0.01s
supervisor_cushion001 ns_1@10.3.121.97 15:48:33 - Tue Apr 2, 2013
Port server moxi on node 'ns_1@10.3.121.97' exited with status 1. Restarting. Messages:
2013-04-02 15:48:33: (cproxy_config.c.317) env: MOXI_SASL_PLAIN_USR (13)
2013-04-02 15:48:33: (cproxy_config.c.326) env: MOXI_SASL_PLAIN_PWD (8)
2013-04-02 15:48:35: (cproxy.c.330) ERROR: could not listen on port 11211. Please use -Z port_listen=PORT_NUM to specify a different port number.
ns_port_server000 ns_1@10.3.121.97 15:48:33 - Tue Apr 2, 2013
Port server moxi on node 'ns_1@10.3.121.97' exited with status 1. Restarting. Messages:
2013-04-02 15:48:28: (cproxy_config.c.317) env: MOXI_SASL_PLAIN_USR (13)
2013-04-02 15:48:28: (cproxy_config.c.326) env: MOXI_SASL_PLAIN_PWD (8)
2013-04-02 15:48:30: (cproxy.c.330) ERROR: could not listen on port 11211. Please use -Z port_listen=PORT_NUM to specify a different port number.
ns_port_server000 ns_1@10.3.121.97 15:48:28 - Tue Apr 2, 2013
I will attach the collect_info file soon. |
| Comments |
| Comment by Steve Yen [ 02/Apr/13 ] |
|
That looks like some other process is already using port 11211. For example, perhaps there's another memcached that is configured to auto-launch after a server reboot and it grabs port 11211 first before couchbase/moxi can start. |
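One quick way to test Steve's theory before Couchbase starts is to try binding the port yourself. A minimal Python sketch follows; it only reports whether the bind fails (any bind failure, including a permission error, is reported as "in use") and cannot tell you which process owns the port, which tools such as `netstat -tlnp` or `lsof -i :11211` can.

```python
import socket


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if binding TCP `port` on `host` fails, e.g. because another
    process (such as a stray memcached) already listens there.

    SO_REUSEADDR is deliberately not set, so an existing listener makes the
    bind fail instead of being silently shared.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return True
    return False


if __name__ == "__main__":
    print("11211 in use:", port_in_use(11211))
```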
| Comment by Thuan Nguyen [ 02/Apr/13 ] |
|
Link to collect info file of this node: https://s3.amazonaws.com/packages.couchbase/collect_info/2_0_2/2013_04/node10.3.121.97.zip
Link to manifest file of this build: http://builds.hq.northscale.net/latestbuilds/couchbase-server-community_x86_64_2.0.2-746-rel.deb.manifest.xml |
| Comment by Maria McDuff [ 13/May/13 ] |
|
tony, is this still an issue with the latest 2.0.2 build? if so, can you analyze the log and pinpoint what is causing the error? assign to steve yen afterwards. thanks. |
| Comment by Thuan Nguyen [ 14/May/13 ] |
|
I still see the error in Windows build 2.0.2-801:
[user:info,2013-05-13T18:08:54.777,ns_1@10.3.2.143:<0.4483.4>:ns_log:crash_consumption_loop:64]Port server moxi on node 'babysitter_of_ns_1@127.0.0.1' exited with status 0. Restarting. Messages:
2013-05-13 18:05:28: (cproxy_config.c.317) env: MOXI_SASL_PLAIN_USR (13)
2013-05-13 18:05:28: (cproxy_config.c.326) env: MOXI_SASL_PLAIN_PWD (8)
2013-05-13 18:07:55: (agent_config.c.705) ERROR: bad JSON configuration from http://127.0.0.1:8091/pools/default/saslBucketsStreaming: Number of vBuckets must be a power of two > 0 and <= 65536 ({ "name": "default", "nodeLocator": "vbucket", "saslPassword": "", "nodes": [{ "hostname": "10.3.2.143:8091", "ports": { "direct": 11210, "proxy": 11211 } }], "vBucketServerMap": { "hashAlgorithm": "CRC", "numReplicas": 1, "serverList": ["10.3.2.143:11210"], "vBucketMap": [] } })
I will dig into the log to investigate it. |
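The moxi error above quotes the rule it applies to the streamed bucket config, and the config it printed has an empty `vBucketMap`, so the count is 0. A Python sketch of the same check follows; it illustrates the quoted rule only, is not moxi's or libvbucket's parser, and assumes the count is taken from the length of `vBucketMap`.

```python
import json

MAX_VBUCKETS = 65536


def vbucket_count(config_json: str) -> int:
    """Return the vBucket count from a bucket config, or raise ValueError if it
    breaks the rule quoted in the log: a power of two, > 0 and <= 65536."""
    config = json.loads(config_json)
    n = len(config["vBucketServerMap"]["vBucketMap"])
    # n & (n - 1) is zero only for powers of two (0 is rejected first).
    if n <= 0 or n > MAX_VBUCKETS or n & (n - 1):
        raise ValueError(
            f"Number of vBuckets must be a power of two > 0 and <= {MAX_VBUCKETS} (got {n})")
    return n


# Trimmed from the config printed in the log above.
EMPTY_MAP = ('{"name": "default", "vBucketServerMap": {"hashAlgorithm": "CRC", '
             '"numReplicas": 1, "serverList": ["10.3.2.143:11210"], "vBucketMap": []}}')

if __name__ == "__main__":
    try:
        vbucket_count(EMPTY_MAP)
    except ValueError as err:
        print(err)
```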
| Comment by Maria McDuff [ 14/May/13 ] |
| tony, pls respond to steve's inquiry (i.e., what other process is using 11211). |
| Comment by Thuan Nguyen [ 17/May/13 ] |
|
Moxi restarts after a bucket is deleted. Tested with the default bucket:
- Create the default bucket.
- Grep for the moxi process ID:
[root@cen-1907 ~]# ps aux | grep moxi
101 29434 0.0 0.0 158060 2612 ? Ssl 13:32 0:00 /opt/couchbase/bin/moxi -Z port_listen=11211,default_bucket_name=default,downstream_max=1024,downstream_conn_max=4,connect_max_errors=5,connect_retry_interval=30000,connect_timeout=400,auth_timeout=100,cycle=200,downstream_conn_queue_timeout=200,downstream_timeout=5000,wait_queue_timeout=200 -z url=http://127.0.0.1:8091/pools/default/saslBucketsStreaming -p 0 -Y y -O stderr
root 29524 0.0 0.0 61228 744 pts/3 S+ 13:39 0:00 grep moxi
- Load 10K items into the default bucket:
root@cen-1907 ~]# /opt/couchbase/bin/cbworkloadgen -n localhost:8091 -i 10000 -r .5 -s 100 --threads=10
[####################] 100.0% (199990/200000 msgs)
bucket: default, msgs transferred...
      : total | last | per sec
batch : 200 | 200 | 8.4
byte : 19999000 | 19999000 | 844890.7
msg : 199990 | 199990 | 8448.9
done
- Delete the default bucket; moxi crashes and restarts.
- Grep for the moxi process ID right after deleting the default bucket; there is a new process ID:
[root@cen-1907 ~]# ps aux | grep moxi
101 29561 0.0 0.0 92260 1704 ? Ssl 13:41 0:00 /opt/couchbase/bin/moxi -Z port_listen=11211,default_bucket_name=default,downstream_max=1024,downstream_conn_max=4,connect_max_errors=5,connect_retry_interval=30000,connect_timeout=400,auth_timeout=100,cycle=200,downstream_conn_queue_timeout=200,downstream_timeout=5000,wait_queue_timeout=200 -z url=http://127.0.0.1:8091/pools/default/saslBucketsStreaming -p 0 -Y y -O stderr
root 29571 0.0 0.0 61228 748 pts/3 S+ 13:41 0:00 grep moxi
- Log from info.1:
[ns_server:info,2013-05-17T13:41:27.912,ns_1@127.0.0.1:ns_memcached-default<0.771.0>:ns_storage_conf:delete_couch_database:428]Deleting database <<"default/1">>: ok
[ns_server:info,2013-05-17T13:41:27.913,ns_1@127.0.0.1:ns_memcached-default<0.771.0>:ns_storage_conf:delete_couch_database:428]Deleting database <<"default/0">>: ok
[ns_server:info,2013-05-17T13:41:27.913,ns_1@127.0.0.1:ns_memcached-default<0.771.0>:ns_storage_conf:delete_databases_and_files:471]Couch dbs are deleted. Proceeding with bucket directory
[ns_server:info,2013-05-17T13:41:28.122,ns_1@127.0.0.1:<0.659.0>:ns_orchestrator:idle:536]Restarting moxi on nodes ['ns_1@127.0.0.1']
[user:info,2013-05-17T13:41:28.124,ns_1@127.0.0.1:<0.620.0>:ns_log:crash_consumption_loop:64]Port server moxi on node 'babysitter_of_ns_1@127.0.0.1' exited with status 0. Restarting. Messages:
2013-05-17 13:32:07: (cproxy_config.c.317) env: MOXI_SASL_PLAIN_USR (13)
2013-05-17 13:32:07: (cproxy_config.c.326) env: MOXI_SASL_PLAIN_PWD (8)
EOL on stdin. Exiting
[menelaus:info,2013-05-17T13:41:28.126,ns_1@127.0.0.1:<0.11748.0>:menelaus_web_buckets:handle_bucket_delete:345]Deleted bucket "default"
I will test with a non-default bucket next. |
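The manual restart check above (compare the moxi PID before and after the bucket delete) can be scripted. This is a minimal Python sketch, assuming the standard 11-column `ps aux` layout shown in the output (COMMAND last); the function names are hypothetical and not part of any Couchbase tool.

```python
from typing import List


def moxi_pids(ps_output: str) -> List[int]:
    """Return the PIDs of moxi processes found in `ps aux`-style output.

    Assumes the standard 11-column `ps aux` layout (USER PID %CPU %MEM VSZ
    RSS TTY STAT START TIME COMMAND) and ignores the `grep moxi` line that
    `ps aux | grep moxi` also prints.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 10)  # keep COMMAND (with its arguments) whole
        if len(fields) < 11:
            continue
        executable = fields[10].split()[0]
        # Match the moxi binary itself (e.g. /opt/couchbase/bin/moxi), not grep.
        if executable.rsplit("/", 1)[-1] == "moxi":
            pids.append(int(fields[1]))
    return pids


def moxi_restarted(before: str, after: str) -> bool:
    """True if the set of moxi PIDs differs between two `ps` snapshots."""
    return set(moxi_pids(before)) != set(moxi_pids(after))
```

Fed the two snapshots above (PID 29434 before the delete, 29561 after), `moxi_restarted` reports a restart; a snapshot compared with itself does not.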
| Comment by Thuan Nguyen [ 17/May/13 ] |
|
Test on a non-default bucket with build 2.0.2-806 on CentOS 5.8 64-bit. Moxi also restarted after deleting the sasl bucket.
- Create the sasl bucket.
- Grep for the moxi process ID:
[root@cen-1907 ~]# ps aux | grep moxi
101 29561 0.0 0.0 92908 2464 ? Ssl 13:41 0:00 /opt/couchbase/bin/moxi -Z port_listen=11211,default_bucket_name=default,downstream_max=1024,downstream_conn_max=4,connect_max_errors=5,connect_retry_interval=30000,connect_timeout=400,auth_timeout=100,cycle=200,downstream_conn_queue_timeout=200,downstream_timeout=5000,wait_queue_timeout=200 -z url=http://127.0.0.1:8091/pools/default/saslBucketsStreaming -p 0 -Y y -O stderr
root 29757 0.0 0.0 61228 748 pts/3 S+ 14:25 0:00 grep moxi
- Load items into the sasl bucket:
[root@cen-1907 ~]# /opt/couchbase/bin/cbworkloadgen -n localhost:8091 -i 10000 -r .5 -s 100 --bucket=sasl -u Administrator -p password --threads=10
[####################] 100.0% (199990/200000 msgs)
bucket: default, msgs transferred...
      : total | last | per sec
batch : 200 | 200 | 8.5
byte : 19999000 | 19999000 | 846631.0
msg : 199990 | 199990 | 8466.3
done
- Delete the sasl bucket; moxi crashes and restarts:
[root@cen-1907 ~]# ps aux | grep moxi
101 29761 0.0 0.0 92260 1700 ? Ssl 14:25 0:00 /opt/couchbase/bin/moxi -Z port_listen=11211,default_bucket_name=default,downstream_max=1024,downstream_conn_max=4,connect_max_errors=5,connect_retry_interval=30000,connect_timeout=400,auth_timeout=100,cycle=200,downstream_conn_queue_timeout=200,downstream_timeout=5000,wait_queue_timeout=200 -z url=http://127.0.0.1:8091/pools/default/saslBucketsStreaming -p 0 -Y y -O stderr
root 29770 0.0 0.0 61228 748 pts/3 S+ 14:25 0:00 grep moxi
- In the info log:
ns_server:info,2013-05-17T14:25:31.812,ns_1@127.0.0.1:ns_memcached-sasl<0.15869.0>:ns_storage_conf:delete_couch_database:428]Deleting database <<"sasl/1001">>: ok
[ns_server:info,2013-05-17T14:25:31.813,ns_1@127.0.0.1:ns_memcached-sasl<0.15869.0>:ns_storage_conf:delete_couch_database:428]Deleting database <<"sasl/1000">>: ok
[ns_server:info,2013-05-17T14:25:31.815,ns_1@127.0.0.1:ns_memcached-sasl<0.15869.0>:ns_storage_conf:delete_couch_database:428]Deleting database <<"sasl/100">>: ok
[ns_server:info,2013-05-17T14:25:31.816,ns_1@127.0.0.1:ns_memcached-sasl<0.15869.0>:ns_storage_conf:delete_couch_database:428]Deleting database <<"sasl/10">>: ok
[ns_server:info,2013-05-17T14:25:31.817,ns_1@127.0.0.1:ns_memcached-sasl<0.15869.0>:ns_storage_conf:delete_couch_database:428]Deleting database <<"sasl/1">>: ok
[ns_server:info,2013-05-17T14:25:31.818,ns_1@127.0.0.1:ns_memcached-sasl<0.15869.0>:ns_storage_conf:delete_couch_database:428]Deleting database <<"sasl/0">>: ok
[ns_server:info,2013-05-17T14:25:31.818,ns_1@127.0.0.1:ns_memcached-sasl<0.15869.0>:ns_storage_conf:delete_databases_and_files:471]Couch dbs are deleted. Proceeding with bucket directory
[ns_server:info,2013-05-17T14:25:32.026,ns_1@127.0.0.1:<0.659.0>:ns_orchestrator:idle:536]Restarting moxi on nodes ['ns_1@127.0.0.1']
[user:info,2013-05-17T14:25:32.029,ns_1@127.0.0.1:<0.620.0>:ns_log:crash_consumption_loop:64]Port server moxi on node 'babysitter_of_ns_1@127.0.0.1' exited with status 0. Restarting. Messages:
2013-05-17 13:41:28: (cproxy_config.c.317) env: MOXI_SASL_PLAIN_USR (13)
2013-05-17 13:41:28: (cproxy_config.c.326) env: MOXI_SASL_PLAIN_PWD (8)
2013-05-17 13:49:55: (agent_config.c.705) ERROR: bad JSON configuration from http://127.0.0.1:8091/pools/default/saslBucketsStreaming: Number of vBuckets must be a power of two > 0 and <= 65536 ({ "name": "sasl", "nodeLocator": "vbucket", "saslPassword": "password", "nodes": [{ "hostname": "127.0.0.1:8091", "ports": { "direct": 11210, "proxy": 11211 } }], "vBucketServerMap": { "hashAlgorithm": "CRC", "numReplicas": 1, "serverList": ["127.0.0.1:11210"], "vBucketMap": [] } })
EOL on stdin. Exiting
[menelaus:info,2013-05-17T14:25:32.033,ns_1@127.0.0.1:<0.1865.1>:menelaus_web_buckets:handle_bucket_delete:345]Deleted bucket "sasl"
So every time any bucket is deleted, moxi crashes and restarts. |
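The rule moxi enforces is quoted in its own error message: the number of vBuckets must be a power of two, greater than 0 and at most 65536. The minimal Python restatement below is a sketch of that check, not moxi's actual C code in agent_config.c, and it assumes the count is taken as the length of `vBucketMap`. It shows why the configs streamed during bucket deletion are rejected: they arrive with `"vBucketMap": []`, so the count is 0.

```python
import json

MAX_VBUCKETS = 65536


def is_valid_vbucket_count(n: int) -> bool:
    """Power of two, > 0 and <= 65536 (the rule quoted in moxi's error)."""
    # n & (n - 1) clears the lowest set bit, so it is 0 only for powers of two.
    return 0 < n <= MAX_VBUCKETS and (n & (n - 1)) == 0


def check_bucket_config(config_json: str) -> int:
    """Return the vBucket count, or raise ValueError if it breaks the rule."""
    config = json.loads(config_json)
    count = len(config["vBucketServerMap"]["vBucketMap"])
    if not is_valid_vbucket_count(count):
        raise ValueError(
            f"bucket {config['name']!r}: {count} vBuckets is not "
            "a power of two > 0 and <= 65536")
    return count
```

Whether an empty map during deletion should be fatal for moxi is a separate design question; the sketch only shows that such a config cannot pass the stated rule.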
| Comment by Wayne Siu [ 17/May/13 ] |
|
Tony, can you quantify the impact when this happens? |
[MB-8266] erlang crashed with error: "no match of right hand side value {error, closed}... Created: 13/May/13 Updated: 17/May/13 |
|
| Status: | Open |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Thuan Nguyen | Assignee: | Thuan Nguyen |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | windows 2008 R2 64bit | ||
| Attachments: |
|
| Operating System: | Windows 64-bit |
| Description |
|
Environment:
4 Windows VMs with 4 cores and 4 GB RAM each
Couchbase Server version 2.0.2-798
Ran a sanity test that includes rebalance and saw an Erlang crash in the log. Checked whether there is an Erlang crash dump in the cluster => no core dump.
Error in diags:
=========================CRASH REPORT=========================
  crasher:
    initial call: erlang:apply/2
    pid: <0.13944.1>
    registered_name: []
    exception error: no match of right hand side value {error,closed}
      in function mc_binary:quick_stats_recv/3
      in call from mc_binary:quick_stats_loop/5
      in call from mc_binary:quick_stats/5
      in call from ns_memcached:do_handle_call/3
      in call from ns_memcached:worker_loop/3
    ancestors: ['ns_memcached-default','single_bucket_sup-default', <0.13913.1>]
    messages: []
    links: [<0.13929.1>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 17711
    stack_size: 24
    reductions: 8661
  neighbours:
Link to manifest file of this build: http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.2-798-rel.setup.exe.manifest.xml |
| Comments |
| Comment by Maria McDuff [ 14/May/13 ] |
| per bug triage, bumping up to blocker. |
| Comment by Wayne Siu [ 17/May/13 ] |
| QE to try with a stable build (next week) to see if this issue is reproducible. |
[MB-8284] Rebalance fails with reason replicator_died Created: 15/May/13 Updated: 17/May/13 Resolved: 17/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Critical |
| Reporter: | Deepkaran Salooja | Assignee: | Jin Lim |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
2.0.2-803-rel
<manifest> <remote name="couchbase" fetch="git://github.com/couchbase/"/> <remote name="membase" fetch="git://github.com/membase/"/> <remote name="apache" fetch="git://github.com/apache/"/> <remote name="erlang" fetch="git://github.com/erlang/"/> <default remote="couchbase" revision="master"/> <project name="tlm" path="tlm" revision="f30cd57af02e51eafa0b6d5fb71176c2a46a2cf9"> <copyfile src="Makefile.top" dest="Makefile"/> </project> <project name="bucket_engine" path="bucket_engine" revision="2a797a8d97f421587cce728f2e6aa2cd42c8fa26"/> <project name="ep-engine" path="ep-engine" revision="0d6b7b00df999bef2b9e7ff160fe908b3650e407"/> <project name="libconflate" path="libconflate" revision="c0d3e26a51f25a2b020713559cb344d43ce0b06c"/> <project name="libmemcached" path="libmemcached" revision="ca739a890349ac36dc79447e37da7caa9ae819f5" remote="membase"/> <project name="libvbucket" path="libvbucket" revision="408057ec55da3862ab8d75b1ed25d2848afd640f"/> <project name="couchbase-cli" path="couchbase-cli" revision="45f1370e3c440bde9763b124e88e26ee98941bcd" remote="couchbase"/> <project name="memcached" path="memcached" revision="b6ceb46fc26ac6f1d6be7a5866d6c6c0f6e6d32a" remote="membase"/> <project name="moxi" path="moxi" revision="4b391021af7a453bf88716724d2c644916ebd969"/> <project name="ns_server" path="ns_server" revision="232663cc06c71b92434ad70f9949b49e46269f9b"/> <project name="portsigar" path="portsigar" revision="159b6179ea8a3c2075ee9eb2afa6f91c98c0fda6"/> <project name="sigar" path="sigar" revision="63a3cd1b316d2d4aa6dd31ce8fc66101b983e0b0"/> <project name="couchbase-examples" path="couchbase-examples" revision="cd9c8600589a1996c1ba6dbea9ac171b937d3379"/> <project name="couchbase-python-client" path="couchbase-python-client" revision="d443169c0694fca1be67d8f6934a8c50f0175ee7"/> <project name="couchdb" path="couchdb" revision="586e4bb73b92db4362192616370c4e3edb8c34a0"/> <project name="couchdbx-app" path="couchdbx-app" 
revision="e83b255bc7f7548e2bc36e709666e564c2a488dd"/> <project name="couchstore" path="couchstore" revision="963fc26eafc67514eed5c9a3752d5d4cbdf5971d"/> <project name="geocouch" path="geocouch" revision="ed9ad43aa361df0829262fef811b5236331b44c8"/> <project name="testrunner" path="testrunner" revision="1f15e11d443be385ff8362d049a389331b502f9a"/> <project name="healthchecker" path="healthchecker" revision="29d45e7776ecb20800f6ad97aec585a1e1636370"/> <project name="otp" path="otp" revision="b6dc1a844eab061d0a7153d46e7e68296f15a504" remote="erlang"/> <project name="icu4c" path="icu4c" revision="26359393672c378f41f2103a8699c4357c894be7" remote="couchbase"/> <project name="snappy" path="snappy" revision="5681dde156e9d07adbeeab79666c9a9d7a10ec95" remote="couchbase"/> <project name="v8" path="v8" revision="447decb75060a106131ab4de934bcc374648e7f2" remote="couchbase"/> <project name="gperftools" path="gperftools" revision="44a584d1de8c89addfb4f1d0522bdbbbed83ba48" remote="couchbase"/> <project name="pysqlite" path="pysqlite" revision="0ff6e32ea05037fddef1eb41a648f2a2141009ea" remote="couchbase"/> </manifest> |
||
| Operating System: | Centos 64-bit |
| Description |
|
Rebalance failures are seen in the Centos64 sanity job: http://qa.hq.northscale.net/job/CouchbaseServer-SanityTest-4Node-Centos64/73/consoleFull
** Reason for termination ==
** {exited, {'EXIT',<0.28006.6>, {replicator_died, {'EXIT',<17674.28481.2>,{badmatch,{error,closed}}}}}}
[ns_server:info,2013-05-15T3:24:33.837,ns_1@10.3.3.32:janitor_agent-default<0.23324.6>:janitor_agent:handle_info:761]Undoing temporary vbucket states caused by rebalance
[user:info,2013-05-15T3:24:33.837,ns_1@10.3.3.32:<0.30432.2>:ns_orchestrator:handle_info:403]Rebalance exited with reason {exited, {'EXIT',<0.28006.6>, {replicator_died, {'EXIT',<17674.28481.2>, {badmatch,{error,closed}}}}}}
ns_server:error,2013-05-15T3:24:33.584,ns_1@10.3.3.32:<0.28006.6>:ns_replicas_builder:build_replicas_main:90]Got premature exit from one of ebucketmigrators: {'EXIT',<17674.28481.2>, {badmatch,{error,closed}}}
[error_logger:error,2013-05-15T3:24:33.585,ns_1@10.3.3.32:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
  crasher:
    initial call: ebucketmigrator_srv:init/1
    pid: <17674.28481.2>
    registered_name: []
    exception error: no match of right hand side value {error,closed}
      in function mc_client_binary:cmd_binary_vocal_recv/5
      in call from mc_client_binary:set_vbucket/3
      in call from ebucketmigrator_srv:'-init/1-lc$^0/1-0-'/3
      in call from ebucketmigrator_srv:init/1
    ancestors: [<0.28006.6>,<0.28000.6>,<0.27823.6>,<0.26685.6>]
    messages: []
    links: [#Port<17674.80060>,<0.28006.6>,#Port<17674.80059>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 987
    stack_size: 24
    reductions: 1725
  neighbours:
Attaching collect_info. |
| Comments |
| Comment by Deepkaran Salooja [ 15/May/13 ] |
|
https://s3.amazonaws.com/bugdb/jira/MB-8284/e9125b6b/10.3.3.32-5152013-635-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-8284/e9125b6b/10.3.3.33-5152013-638-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-8284/e9125b6b/10.3.3.30-5152013-640-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-8284/e9125b6b/10.3.3.224-5152013-643-diag.zip |
| Comment by Maria McDuff [ 15/May/13 ] |
|
FYI -- this is the newest build that contains Jin's rebalance fixes. |
| Comment by Aliaksey Artamonau [ 15/May/13 ] |
|
2013-05-15 03:26:26.249 ns_log:0:info:message(ns_1@10.3.3.224) - Port server memcached on node 'babysitter_of_ns_1@127.0.0.1' exited with status 134. Restarting ... MUTEX ERROR: Failed to acquire lock: Invalid argument |
| Comment by Jin Lim [ 16/May/13 ] |
|
* MRW37 passed http://qa.hq.northscale.net/job/CouchbaseServer-SanityTest-4Node-Centos64/75 except for a single view test.
* Checked the failed node, 10.3.3.32, and found none of the previously detected errors (rebalance hang, mutex acquire abort, crash, etc.).
* It seems a network glitch (slowness) caused connection timeouts to and from this node.
Will merge the fix that went into MRW37 to the 2.0.2 branch on Thursday morning. |
| Comment by Aliaksey Artamonau [ 16/May/13 ] |
| Please see my comment above. In fact there was a mutex abort on node 10.3.3.224. |
| Comment by Jin Lim [ 16/May/13 ] |
|
Thanks for reiterating the issue, but the mutex error was from the test run on build 803. Ketaki and Jin built another toy build with a fix, MRW37, later this evening and ran the same sanity test. No MUTEX abort is now found on any node, including 10.3.3.224. The results (all passing except a single view test) can be found here: http://qa.hq.northscale.net/job/CouchbaseServer-SanityTest-4Node-Centos64/75. |
| Comment by Aliaksey Artamonau [ 16/May/13 ] |
| I misunderstood. Sorry then. |
| Comment by Jin Lim [ 16/May/13 ] |
|
The overnight test ran without incident: http://qa.hq.northscale.net/job/CouchbaseServer-SanityTest-4Node-Centos64/77/. This is again with the toy build MRW37. Thanks. |
| Comment by Jin Lim [ 17/May/13 ] |
| Deep, this bug has been tracking the MUTEX::acquire failure during rebalance (in the path within couch_notifier::resetConnection()). Please confirm that we no longer see this error in the latest test results. If so, please close; otherwise, reopen and assign back to Jin. Thanks. |
| Comment by Jin Lim [ 17/May/13 ] |
| Build 805 and up should have included the fix. |
[MB-8240] memslap vbucketkeygen vbuckettool is not working with MAC for latest build Created: 09/May/13 Updated: 17/May/13 |
|
| Status: | In Progress |
| Project: | Couchbase Server |
| Component/s: | tools |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Chisheng Hong | Assignee: | Traun Leyden |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | build 2.0.2-793-rel | ||
| Operating System: | MacOSX 64-bit |
| Description |
|
Chishengs-MacBook-Pro:tools chisheng$ ./memslap -h
dyld: Library not loaded: /opt/couchbase/lib/libmemcached.6.dylib
  Referenced from: /Applications/Couchbase Server.app/Contents/Resources/couchbase-core/bin/tools/./memslap
  Reason: image not found
Trace/BPT trap: 5
Chishengs-MacBook-Pro:tools chisheng$ ./vbuckettool -h
dyld: Library not loaded: /opt/couchbase/lib/libvbucket.1.dylib
  Referenced from: /Applications/Couchbase Server.app/Contents/Resources/couchbase-core/bin/tools/./vbuckettool
  Reason: image not found
Trace/BPT trap: 5
Chishengs-MacBook-Pro:tools chisheng$ ./vbucketkeygen -h
dyld: Library not loaded: /opt/couchbase/lib/libvbucket.1.dylib
  Referenced from: /Applications/Couchbase Server.app/Contents/Resources/couchbase-core/bin/tools/./vbucketkeygen
  Reason: image not found
Trace/BPT trap: 5 |
| Comments |
| Comment by Ravi Mayuram [ 13/May/13 ] |
| Have asked Traun L to look into this. Can't assign it to him yet. |
| Comment by Traun Leyden [ 13/May/13 ] |
|
I'm getting the same error in my local installation. It's trying to load some .dylib libraries from a non-existent, hardcoded path (/opt/couchbase/lib). The obvious fix would be to load the libraries from a relative path (e.g., "../../lib/"). Ravi: to dig deeper, I'd need to know the location of the source code repository where these tools are stored (e.g., the memslap tool). |
| Comment by Ravi Mayuram [ 13/May/13 ] |
|
/Users/ravi/couchsrc/2.0.2 (or wherever your source tree is)
./libvbucket/src/vbucket.c
./libvbucket/src/vbucketkeygen.c
./libvbucket/src/vbuckettool.c |
| Comment by Traun Leyden [ 13/May/13 ] |
|
I don't have a source tree. I don't even know which repository to clone from. In Couchbase Mobile the repositories are hosted on GitHub; not sure what the situation is for Couchbase Server. |
| Comment by Traun Leyden [ 16/May/13 ] |
|
Update: I've got the source tree and it's building now.
After digging into the issue, I've got a "lead". On the buildbot machine, the script that is supposed to re-link the .dylib libraries does not seem to be doing anything: http://qa.hq.northscale.net:8010/builders/mac-x64-21-builder/builds/61/steps/couchbase-server%20make%20community%20/logs/stdio
Actual output
==========
Fixing library imports in bin ...
/Users/buildbot/mac-x64-21-builder/build/build/couchdbx-app/Couchbase Server/install_libraries.rb:38: warning: Insecure world writable dir /opt/couchbase in PATH, mode 040777
Fixing library imports in lib ...
Done fixing library imports!
Expected output
============
It should be outputting "Change import x to y" for memslap and the other binaries.
I sat down with Phil and we dug into it, but were not able to find the root cause. In the end we decided that I should commit some debugging information into the install_libraries.rb script, and he would re-run the build so we could look at the updated output. |
| Comment by Traun Leyden [ 17/May/13 ] |
|
OK, this is fixed by modifying the install_libraries.rb script to recurse into the bin/tools directory. Prior to this fix, the script only fixed up the .dylib links in the bin directory, not in subdirectories such as bin/tools, where memslap, vbucketkeygen, and vbuckettool live.
I verified the fix by doing the following:
- Install a clean version of Mountain Lion in a VMware Fusion virtual machine
- Install Couchbase Server from this build: http://builder.hq.couchbase.com/get/couchbase-server-community_x86_64_2.0.2-806-rel.zip
- cd into the /Applications/Couchbase Server.app/../../bin/tools directory
- Run ./memslap and the other tools in this directory and verify that there are no .dylib-related errors
This fix has only been done on the 2.0.2 branch. @Phil: will this fix automatically make it back into master, or do I need to push it to master explicitly? (I will close the ticket after I hear back on this.) |
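install_libraries.rb itself is not shown in this ticket, so the following is a hypothetical Python analogue, not the actual script. It illustrates the two ideas discussed in this thread: walking bin/ recursively so that bin/tools is included, and replacing the hardcoded /opt/couchbase/lib path with one relative to the binary (Traun's "../../lib/" suggestion), expressed with dyld's `@loader_path` prefix. All helper names are invented; the only real tool referenced is macOS `install_name_tool -change old new file`, whose command line is built here but not executed. A real version would also skip files that are not Mach-O binaries (scripts, for example).

```python
import os
import posixpath

# Hardcoded location the tools were linked against (from the dyld errors above).
HARDCODED_LIB_DIR = "/opt/couchbase/lib"


def relative_install_name(binary_relpath: str, dylib_name: str) -> str:
    """Build an @loader_path-relative install name for <root>/lib/<dylib_name>.

    binary_relpath is the binary's path relative to the install root,
    e.g. "bin/tools/memslap" -> "@loader_path/../../lib/<dylib_name>".
    """
    binary_dir = posixpath.dirname(binary_relpath) or "."
    rel = posixpath.relpath(posixpath.join("lib", dylib_name), binary_dir)
    return "@loader_path/" + rel


def binaries_to_fix(root: str, recursive: bool = True):
    """Yield paths (relative to root) of files under root/bin.

    With recursive=False only bin/ itself is scanned, mirroring the behavior
    before the fix, which skipped subdirectories such as bin/tools.
    """
    for dirpath, _dirnames, filenames in os.walk(os.path.join(root, "bin")):
        for name in filenames:
            yield os.path.relpath(os.path.join(dirpath, name), root)
        if not recursive:
            break  # os.walk is top-down, so the first iteration is bin/ itself


def install_name_tool_cmd(root: str, binary_relpath: str, dylib_name: str):
    """Return (not run) the install_name_tool command that rewrites one import."""
    old = posixpath.join(HARDCODED_LIB_DIR, dylib_name)
    new = relative_install_name(binary_relpath, dylib_name)
    return ["install_name_tool", "-change", old, new,
            os.path.join(root, binary_relpath)]
```

The `recursive=False` path reproduces the bug: bin/tools/memslap is never visited, so its /opt/couchbase/lib import is left untouched.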
[MB-8238] NO menu bar to access menu commands after installation and click the Couchbase icon on MAC Created: 09/May/13 Updated: 17/May/13 |
|
| Status: | In Progress |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Chisheng Hong | Assignee: | Traun Leyden |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | build-2.0.2-793 | ||
| Attachments: |
|
| Operating System: | MacOSX 64-bit |
| Description |
|
1. Install Couchbase Server on Mac OS X Lion.
2. Click the Couchbase icon: no access menu pops up.
In previous builds such as 755, this access menu was present. The screenshot is from 755. |
| Comments |
| Comment by Maria McDuff [ 10/May/13 ] |
| assigning to bin. has nothing to do with ns_server. |
| Comment by Ravi Mayuram [ 13/May/13 ] |
|
Chisheng, the image shows the menu... can you clarify the issue and show it to Traun L? thanks, ravi |
| Comment by Chisheng Hong [ 13/May/13 ] |
| There should be a Couchbase icon in the menu bar on the Mac after you open the Couchbase app; when you click it, you see options such as "about couchbase server" and "open admin console". But in build 793, I no longer see it. The screenshot shows what the Couchbase menu icon looks like in the previous build; it's in the upper-right part of the screenshot. |
| Comment by Maria McDuff [ 16/May/13 ] |
| Per bug triage, promoting to blocker, since there's no way to access Couchbase Server from the Mac without this icon. |
| Comment by Traun Leyden [ 16/May/13 ] |
|
I'm installing VMware Fusion with a Mountain Lion OS X guest, so I can do repeatable experiments and have control over the environment (and not mess with the released version of Couchbase Server running on my MacBook). I'm currently waiting for a 4.4 GB download to finish (it's crawling). |
| Comment by Traun Leyden [ 17/May/13 ] |
|
I have installed this build:
http://builder.hq.couchbase.com/get/couchbase-server-community_x86_64_2.0.2-806-rel.zip on a virtual machine running a clean install of Mountain Lion. What I see is that it's actually in the menu bar, but the icon is just a small "dot" rather than the couch icon (see the CouchbaseMenubar attachment). @Chisheng, is this what you are seeing too? Or in your case, is there not even a "dot" icon? |
| Comment by Ravi Mayuram [ 17/May/13 ] |
| Now that you mention it, I see that as well (just a period) on build 793, so Chisheng must be seeing it too. |
| Comment by Traun Leyden [ 17/May/13 ] |
|
Don't have a fix yet, but found a workaround that sheds some light on the issue.
If I resize the Couchbase-Status-bw.png icon on the file system from 16x16 -> 116x116, it appears closer to its normal size (see attached screenshot Couchbase-Menubar-Resized.png). |
| Comment by Traun Leyden [ 17/May/13 ] |
|
I modified the image to have 72 DPI (was: 499 DPI), left the size the same, and it fixed the issue. Screenshot attached: CouchbaseMenubar-Fixed.png
Have submitted to Gerrit for code review: http://review.couchbase.org/#/c/26392/ |
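The "dot" and both fixes line up with simple size arithmetic, assuming (this is an inference from the observations, not something stated in the ticket) that the menu bar image's size in points is derived from the PNG's DPI metadata at 72 points per inch. Under that assumption, 16 px at 499 DPI is about 2.3 points (a dot), 116 px at 499 DPI is about 16.7 points (close to normal), and 16 px at 72 DPI is exactly 16 points. The Python sketch below does that arithmetic and reads the DPI from a PNG's pHYs chunk, whose layout (X and Y pixels per unit as 4-byte big-endian integers, then a 1-byte unit where 1 means metre) comes from the PNG specification. The function names are hypothetical.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def menubar_points(pixels: int, dpi: float) -> float:
    """Size in points of an image whose DPI metadata is honored (72 pt per inch)."""
    return pixels * 72 / dpi


def png_dpi(data: bytes):
    """Return (dpi_x, dpi_y) from a PNG's pHYs chunk, or None if absent/unitless."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"pHYs":
            ppu_x, ppu_y, unit = struct.unpack(">IIB", body)
            if unit != 1:  # 1 = pixels per metre; 0 = aspect ratio only
                return None
            return (ppu_x * 0.0254, ppu_y * 0.0254)  # metres -> inches
        if ctype == b"IDAT":  # pHYs must appear before the image data
            return None
        pos += 12 + length  # 4-byte length + 4-byte type + data + 4-byte CRC
    return None
```

This also explains why changing only the DPI (leaving the pixel size alone) fixed the icon: at 72 DPI the 16x16 image maps back to 16 points.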
[MB-8239] cbcollect_info only collects couchbase.log and misses other logs Created: 09/May/13 Updated: 17/May/13 |
|
| Status: | Open |
| Project: | Couchbase Server |
| Component/s: | tools |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Critical |
| Reporter: | Chisheng Hong | Assignee: | Anil Kumar |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | build 2.0.2-793 | ||
| Attachments: |
|
| Operating System: | MacOSX 64-bit |
| Description |
|
Chishengs-MacBook-Pro:bin chisheng$ ./cbcollect_info info.zip
uname (uname -a) - OK
Directory structure membase - previous versions (ls -lR /opt/membase /var/membase /var/opt/membase /etc/opt/membase) - Exit code 1
sysctl settings (sysctl -a) - OK
Process list snapshot (top -l 1) - OK
Disk activity (iostat 1 10) - OK
Process list (ps -Aww -o user,pid,lwp,ppid,nlwp,pcpu,pri,nice,vsize,rss,tty,stat,wchan:12,start,bsdtime,command) - Exit code 1
Network configuration (ifconfig -a) - OK
Taking sample 2 after 10.000000 seconds - OK
Network status (netstat -an) - OK
Network routing table (netstat -rn) - OK
Arp cache (arp -na) - OK
Filesystem (df -ha) - OK
System activity reporter (sar 1 10) - OK
System paging activity (vmstat 1 10) - Exit code 127
System uptime (uptime) - OK
couchbase user definition (getent passwd couchbase) - Exit code 127
couchbase user limits (su couchbase -c "ulimit -a") - skipped (needs root privs)
membase user definition (getent passwd membase) - Exit code 127
couchbase user limits (su couchbase -c "ulimit -a") - skipped (needs root privs)
membase user limits (su membase -c "ulimit -a") - skipped (needs root privs)
Interrupt status (intrstat 1 10) - Exit code 127
Processor status (mpstat 1 10) - Exit code 127
System log (cat /var/adm/messages) - Exit code 1
Kernel log buffer (dmesg) - Exit code 1
Checking for server guts in /Applications/Couchbase Server.app/Contents/Resources/couchbase-core/var/lib/couchbase/initargs...
./erl: line 28: /lib/erlang/erts-5.8.*/bin/erlexec: No such file or directory
./erl: line 28: exec: /lib/erlang/erts-5.8.*/bin/erlexec: cannot execute: No such file or directory
Checking for server guts in /opt/couchbase/var/lib/couchbase/initargs...
./erl: line 28: /lib/erlang/erts-5.8.*/bin/erlexec: No such file or directory
./erl: line 28: exec: /lib/erlang/erts-5.8.*/bin/erlexec: cannot execute: No such file or directory
Checking for server guts in /Users/chisheng/Library/Application Support/Couchbase/var/lib/couchbase/initargs...
./erl: line 28: /lib/erlang/erts-5.8.*/bin/erlexec: No such file or directory
./erl: line 28: exec: /lib/erlang/erts-5.8.*/bin/erlexec: cannot execute: No such file or directory |
| Comments |
| Comment by Maria McDuff [ 10/May/13 ] |
| need for 2.0.2. bumping up to critical. |
| Comment by Maria McDuff [ 14/May/13 ] |
|
anil to sit with alk k and loan his mac. chisheng will help if needed. |
| Comment by Anil Kumar [ 14/May/13 ] |
| Tested this on the 2.0.1 build; it works as expected and collects all the logs. Snapshot included. Thanks! |
| Comment by Aleksey Kondratenko [ 14/May/13 ] |
| That's not enough. May I have those 2.0.1 collectinfos from the Mac? |
| Comment by Anil Kumar [ 14/May/13 ] |
| attached all the log files |
| Comment by Wayne Siu [ 17/May/13 ] |
| Anil to provide a live system for Alk to look at. |
[MB-8270] node down with error net_kernel_terminated Created: 13/May/13 Updated: 17/May/13 Resolved: 17/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Critical |
| Reporter: | Thuan Nguyen | Assignee: | Aleksey Kondratenko |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | windows 2008 R2 64bit | ||
| Operating System: | Windows 64-bit |
| Description |
|
Environment:
4 Windows Server R2 64-bit machines with 4 cores and 4 GB RAM each
Couchbase: Couchbase Server version 2.0.2-801
Ran a sanity test on Windows. When investigating the log file for bug
2013-05-13 17:59:28: (agent_config.c.705) ERROR: bad JSON configuration from http://127.0.0.1:8091/pools/default/saslBucketsStreaming: Number of vBuckets must be a power of two > 0 and <= 65536 ({ "name": "default", "nodeLocator": "vbucket", "saslPassword": "", "nodes": [{ "hostname": "127.0.0.1:8091", "ports": { "direct": 11210, "proxy": 11211 } }], "vBucketServerMap": { "hashAlgorithm": "CRC", "numReplicas": 1, "serverList": ["127.0.0.1:11210"], "vBucketMap": [] } })
EOL on stdin. Exiting
[menelaus:info,2013-05-13T18:02:35.839,ns_1@127.0.0.1:<0.20352.3>:menelaus_web_buckets:handle_bucket_delete:345]Deleted bucket "default"
[ns_server:debug,2013-05-13T18:02:36.980,ns_1@127.0.0.1:ns_config_log<0.284.0>:ns_config_log:log_common:111]config change: memory_quota -> 2184
[ns_server:debug,2013-05-13T18:02:36.980,ns_1@127.0.0.1:ns_config_rep<0.308.0>:ns_config_rep:do_push_keys:317]Replicating some config keys ([memory_quota]..)
[cluster:debug,2013-05-13T18:02:38.136,ns_1@127.0.0.1:ns_cluster<0.273.0>:ns_cluster:handle_call:135]handling add_node("10.3.2.142", 8091, ..)
[cluster:info,2013-05-13T18:02:38.136,ns_1@127.0.0.1:ns_cluster<0.273.0>:ns_cluster:do_change_address:315]Decided to change address to "10.3.2.143"
[user:warn,2013-05-13T18:02:38.136,nonode@nohost:ns_node_disco<0.301.0>:ns_node_disco:handle_info:168]Node nonode@nohost saw that node 'ns_1@127.0.0.1' went down. Details: [{nodedown_reason, net_kernel_terminated}]
[ns_server:info,2013-05-13T18:02:38.136,nonode@nohost:dist_manager<0.263.0>:dist_manager:handle_call:249]Adjusted IP to "10.3.2.143"
[ns_server:info,2013-05-13T18:02:38.136,nonode@nohost:dist_manager<0.263.0>:dist_manager:bringup:227]Attempting to bring up net_kernel with name 'ns_1@10.3.2.143'
[error_logger:info,2013-05-13T18:02:38.136,nonode@nohost:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
=========================PROGRESS REPORT=========================
  supervisor: {local,net_sup}
  started: [{pid,<0.4435.4>}, {name,erl_epmd}, {mfargs,{erl_epmd,start_link,[]}}, {restart_type,permanent}, {shutdown,2000}, {child_type,worker}]
Link to manifest file of this build: http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.2-801-rel.setup.exe.manifest.xml
Link to collect info of all nodes: https://s3.amazonaws.com/packages.couchbase/collect_info/2_0_2/2013_05/4nodes-202-801_reb_hang_20130513-185157.tgz |
| Comments |
| Comment by Maria McDuff [ 14/May/13 ] |
| per bug triage, bumping up to critical. |
| Comment by Aleksey Kondratenko [ 17/May/13 ] |
| Seeing net_kernel_terminated is not in fact an error, so this is not a bug. |
[MB-7856] better to prevent running two instances of couchbase Install Shield wizards Created: 04/Mar/13 Updated: 17/May/13 |
|
| Status: | Open |
| Project: | Couchbase Server |
| Component/s: | installer |
| Affects Version/s: | 2.0.1, 2.0.2 |
| Fix Version/s: | 2.1 |
| Security Level: | Public |
| Type: | Bug | Priority: | Minor |
| Reporter: | Andrei Baranouski | Assignee: | Maria McDuff |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Description |
|
build 170
Can run 2 installer instances at the same time. Result:
---------------------------
Couchbase Server - InstallShield Wizard
---------------------------
Cannot open source file for reading.
---------------------------
OK
---------------------------
[Window Title]
Setup Launcher
[Main Instruction]
Setup Launcher has stopped working
[Content]
Windows can check online for a solution to the problem.
[V] View problem details
[Check online for a solution and close the program] [Close the program] |
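The general technique the title asks for is a single-instance guard: the first copy takes an exclusive, system-wide marker and any later copy exits when it cannot. The Python sketch below shows the idea with a lock file created atomically via `O_CREAT | O_EXCL`; it is an illustration only, not how InstallShield implements it, and the class name and lock path are hypothetical. One caveat matters for the repro in this ticket: if the owning process is killed (e.g., from Task Manager), a lock file is left behind and blocks later runs, whereas a Windows named mutex is released by the OS when its owner dies, which is why installers on Windows typically rely on one.

```python
import os


class SingleInstanceLock:
    """Refuse to start a second copy while the first holds the lock.

    os.open with O_CREAT | O_EXCL fails atomically if the file already exists,
    so only one process can create it.
    """

    def __init__(self, path: str):
        self.path = path
        self.fd = None

    def acquire(self) -> bool:
        try:
            self.fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False  # another instance already holds the lock
        os.write(self.fd, str(os.getpid()).encode())  # record the owner for debugging
        return True

    def release(self) -> None:
        if self.fd is not None:
            os.close(self.fd)
            os.remove(self.path)
            self.fd = None
```

A second installer would call `acquire()` at startup and show a "setup is already running" message instead of continuing when it returns False.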
| Comments |
| Comment by Maria McDuff [ 26/Apr/13 ] |
| Tony, can you quickly repro this? We want to know the impact when there are 2 installer wizards. Does the install proceed or stall? Can you check whether the instances in Task Manager still exist or actually terminate? Please confirm today, 4/26. Thanks. |
| Comment by Thuan Nguyen [ 17/May/13 ] |
|
I could repro this bug in the latest build 2.0.2-806 on Windows 2008 R2 64-bit. I could run 2 installer instances, and the installation failed. After killing the install process in Task Manager, I re-installed with only one instance. The installation completed, and I could go through the initial Couchbase Server setup in the Couchbase web console. |
[MB-8314] rebalance exited ns_vbucket_mover failed to initiate_indexing Created: 17/May/13 Updated: 17/May/13 |
|
| Status: | Open |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | None |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Tommie McAfee | Assignee: | Aleksey Kondratenko |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Description |
|
I have a 5-node cluster that was upgraded from 1.8.1 to 2.0.2 (build 805).
After the upgrade, I attempted to swap out the orchestrator (10.3.121.69) and add a new 2.0.2 node (10.3.3.131). Rebalance fails, with ns_vbucket_mover000 on the orchestrator reporting:
<0.11037.1> exited with {noproc, {gen_server,call, [{'janitor_agent-saslbucket', 'ns_1@10.3.3.131'}, {if_rebalance,<0.1.1>,initiate_indexing}, infinity]}}
On the new 2.0.2 node, couchdb went down and an Erlang crash dump was generated (attached). I saw these errors from mccouch, which may be why vbuckets couldn't be moved to this node:
Fri May 17 07:27:36.981472 PDT 3: (saslbucket) Trying to connect to mccouch: "127.0.0.1:11213"
Fri May 17 07:27:36.981615 PDT 3: (saslbucket) Connected to mccouch: "127.0.0.1:11213"
Fri May 17 07:27:37.019496 PDT 3: (saslbucket) Connection closed by mccouch
Fri May 17 07:27:37.019527 PDT 3: (saslbucket) Resetting connection to mccouch, lastReceivedCommand = notify_vbucket_update lastSentCommand = notify_vbucket_update currentCommand =unknown
Fri May 17 07:27:37.019595 PDT 3: (saslbucket) Trying to connect to mccouch: "127.0.0.1:11213"
Fri May 17 07:27:37.019730 PDT 3: (saslbucket) Connected to mccouch: "127.0.0.1:11213"
Fri May 17 07:27:37.021763 PDT 3: (saslbucket) Connection closed by mccouch
Fri May 17 07:27:37.021788 PDT 3: (saslbucket) Resetting connection to mccouch, lastReceivedCommand = select_bucket lastSentCommand = notify_vbucket_update currentCommand =unknown
=========================CRASH REPORT=========================
  crasher:
    initial call: mc_connection:init/1
    pid: <0.965.0>
    registered_name: []
    exception error: no case clause matching {error,system_limit}
      in function mc_connection:do_notify_vbucket_update/3
      in call from mc_connection:handle_message/9
      in call from mc_connection:read_full_message/2
      in call from mc_connection:run_loop/2
    ancestors: [mc_conn_sup,mc_sup,ns_server_sup,ns_server_cluster_sup, <0.59.0>]
    messages: []
    links: [<0.641.0>,#Port<0.6784>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 1597
    stack_size: 24
    reductions: 1094838
  neighbours:
[error_logger:error,2013-05-17T7:27:36.622,ns_1@10.3.3.131:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
=========================SUPERVISOR REPORT=========================
     Supervisor: {local,mc_conn_sup}
     Context:    child_terminated
     Reason:     {case_clause,{error,system_limit}}
     Offender:   [{pid,<0.965.0>}, {name,mc_connection}, {mfargs,{mc_connection,start_link,undefined}}, {restart_type,temporary}, {shutdown,brutal_kill}, {child_type,worker}]
=========================CRASH REPORT=========================
  crasher:
    initial call: couch_file:spawn_reader/2
    pid: <0.652.0>
    registered_name: []
    exception exit: {problem_reopening_file, {error,system_limit}, {set_close_after,infinity,<0.650.0>}, <0.652.0>, "/opt/couchbase/var/lib/couchbase/data/_replicator.couch.1", 10}
      in function couch_file:reader_loop/3
    ancestors: [<0.650.0>,couch_server,couch_primary_services, couch_server_sup,cb_couch_sup,ns_server_cluster_sup, <0.59.0>]
    messages: []
    links: [<0.650.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 377
    stack_size: 24
    reductions: 504
  neighbours: |
| Comments |
| Comment by Tommie McAfee [ 17/May/13 ] |
|
The final trace here is similar to the {problem_reopening_file error in MB-8235, although the context is very different. |
| Comment by Tommie McAfee [ 17/May/13 ] |
|
1st attempt to retry rebalance fails with: Server error during processing: ["web request failed", {path,"/pools/default/tasks"}, {type,exit}, {what, {{bulk_set_vbucket_state_failed, [{'ns_1@10.3.3.131', {'EXIT', {{{kill, {gen_server,call, [couch_server, {open,<<"saslbucket/master">>,[]}, infinity]}}, {gen_server,call, ['capi_set_view_manager-saslbucket', {set_vbucket_states, [replica,replica,replica,replica, replica,replica,replica,replica, replica,replica,replica,replica, replica,replica,replica,replica, replica,replica,replica,replica, replica,replica,replica,replica, replica,replica,replica,replica, replica,replica,replica,replica, replica,replica,replica,replica, replica,replica,replica,replica, replica,replica,replica,replica, replica,replica,replica,replica, replica,replica,replica,replica, missing,missing,missing,missing, missing,missing,missing,missing, missing,missing,missing,missing, |
| Comment by Aleksey Kondratenko [ 17/May/13 ] |
| Tommie, please avoid just giving raw logs. That's very very inconvenient compared to diag or cbcollectinfo. |
| Comment by Aleksey Kondratenko [ 17/May/13 ] |
|
We hit the fd limit; it's not clear why. I need cbcollect_info from the point in time when this happened. |
| Comment by Tommie McAfee [ 17/May/13 ] |
|
Sorry alk, I've restarted the test. Should I run: ulimit -n unlimited? |
| Comment by Aleksey Kondratenko [ 17/May/13 ] |
| No. Just grab me cbcollect_info ASAP from the moment it fails. That's important. |
[MB-7300] [Doc'd] cbtransfer/cbbackup/cbrestore - option to just transfer design-docs only Created: 30/Nov/12 Updated: 17/May/13 |
|
| Status: | Reopened |
| Project: | Couchbase Server |
| Component/s: | documentation |
| Affects Version/s: | 2.0, 2.0.1, 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Improvement | Priority: | Major |
| Reporter: | Steve Yen | Assignee: | Chisheng Hong |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | 2.0.2-release-notes, info-request | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Flagged: |
Release Note
|
| Description |
|
cbtransfer/cbbackup/cbrestore should optionally allow transferring only design docs, without transferring any data items.
|
| Comments |
| Comment by Maria McDuff [ 27/Mar/13 ] |
| candidate for deferral. |
| Comment by Anil Kumar [ 01/Apr/13 ] |
| moving to 2.1. given other high prio tasks in the tools area, we can't get to this in 2.0.2. |
| Comment by Bin Cui [ 18/Apr/13 ] |
|
It is addressed in CBD-798.
cbtransfer -x design_doc_only=1 or cbtransfer -x data_only=1 |
| Comment by Maria McDuff [ 22/Apr/13 ] |
| in 2.0.2 build. pls verify/close. |
| Comment by Perry Krug [ 22/Apr/13 ] |
| Reopening to make sure we have documentation updates as well. |
| Comment by Karen Zeller [ 30/Apr/13 ] |
|
Hi Bin,
Can you come by or send me a message with the option to transfer, back up, or restore only the design documents using these commands? Thanks, Karen |
| Comment by Bin Cui [ 02/May/13 ] |
|
I mentioned it in the comments above.
-x design_doc_only=1 means transfer design docs only.
-x data_only=1 means transfer data only, without any design docs.
Note: we cannot specify -x design_doc_only=1,data_only=1 because the two options are mutually exclusive. |
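The mutual-exclusion rule described above can be illustrated with a small option parser. This is a hypothetical sketch, not cbtransfer's actual code: it assumes -x takes comma-separated key=value pairs with integer values (as in the examples in this issue), and the name parse_extra_options is made up for illustration.

```python
def parse_extra_options(spec: str) -> dict:
    """Parse a comma-separated 'key=value' string such as 'design_doc_only=1'.

    Hypothetical helper: values are assumed to be integers, and the
    design_doc_only/data_only exclusivity rule is taken from the comment above.
    """
    opts = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas such as "a=1,,b=2"
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        opts[key.strip()] = int(value)
    # Setting both flags would ask for "only design docs" and "only data" at once.
    if opts.get("design_doc_only") and opts.get("data_only"):
        raise ValueError("design_doc_only and data_only are mutually exclusive")
    return opts
```

With this sketch, "design_doc_only=1" and "data_only=1" parse on their own, while "design_doc_only=1,data_only=1" is rejected with a ValueError.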
| Comment by Karen Zeller [ 14/May/13 ] |
|
Hi Andrei,
I just talked to Bin about getting sample output from these command options for cbtransfer, cbbackup, and cbrestore. There isn't a current stable build on Mac yet for me to get this. Could you please send this? Thanks, Karen |
| Comment by Andrei Baranouski [ 15/May/13 ] |
| I believe that Chisheng is the best candidate to provide information related to the cbtransfer/cbbackup/cbrestore tools |
| Comment by Karen Zeller [ 16/May/13 ] |
|
From Chisheng:
[root@cen-1725 bin]# ./cbtransfer http://10.5.2.30:8091 http://10.3.1.10:8091 -x design_doc_only=1 -b default -B default
transfer design doc only. bucket msgs will be skipped.
done
[root@cen-1725 bin]# ./cbbackup http://10.5.2.30:8091 ~/backup -x design_doc_only=1 -b default
transfer design doc only. bucket msgs will be skipped.
done
[root@cen-1725 bin]# ls ~/backup/
bucket-default
[root@cen-1725 bin]# ls ~/backup/bucket-default/
design.json
[{"controllers": {"compact": "/pools/default/buckets/default/ddocs/_design%2Fddoc1/controller/compactView", "setUpdateMinChanges": "/pools/default/buckets/default/ddocs/_design%2Fddoc1/controller/setUpdateMinChanges"}, "doc": {"json": {"views": {"view1": {"map": "function(doc){emit(doc.key,doc.key_num);}"}, "view2": {"map": "function(doc,meta){emit(meta.id,doc.key);}"}}}, "meta": {"rev": "1-6f9bfe0a", "id": "_design/ddoc1"}}}, {"controllers": {"compact": "/pools/default/buckets/default/ddocs/_design%2Fddoc2/controller/compactView", "setUpdateMinChanges": "/pools/default/buckets/default/ddocs/_design%2Fddoc2/controller/setUpdateMinChanges"}, "doc": {"json": {"views": {"chisheng": {"map": "function (doc, meta) {\n emit(meta.id, null);\n}"}}}, "meta": {"rev": "1-4b533871", "id": "_design/ddoc2"}}}, {"controllers": {"compact": "/pools/default/buckets/default/ddocs/_design%2Fdev_ddoc2/controller/compactView", "setUpdateMinChanges": "/pools/default/buckets/default/ddocs/_design%2Fdev_ddoc2/controller/setUpdateMinChanges"}, "doc": {"json": {"views": {"chisheng": {"map": "function (doc, meta) {\n emit(meta.id, null);\n}"}}}, "meta": {"rev": "1-a8b6f59b", "id": "_design/dev_ddoc2"}}}]
[root@cen-1725 bin]# ./cbrestore ~/backup http://10.3.1.10:8091 -x design_doc_only=1 -b default -B default
transfer design doc only. bucket msgs will be skipped.
done |
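The design.json file in the backup above is a JSON list in which each entry holds a design document under doc.json and its id under doc.meta.id. A minimal sketch, assuming that layout holds for other backups as well, that lists each design doc with its view names; summarize_design_docs and is_development_ddoc are hypothetical helper names.

```python
import json


def summarize_design_docs(raw: str) -> dict:
    """Map each design doc id in a design.json backup to its sorted view names."""
    summary = {}
    for entry in json.loads(raw):
        doc = entry["doc"]
        # doc.json holds the design document body; its "views" key may be absent
        summary[doc["meta"]["id"]] = sorted(doc["json"].get("views", {}))
    return summary


def is_development_ddoc(ddoc_id: str) -> bool:
    """Development design docs carry the dev_ prefix, e.g. _design/dev_ddoc2."""
    return ddoc_id.startswith("_design/dev_")
```

Reading ~/backup/bucket-default/design.json from the session above and passing its contents to summarize_design_docs would map _design/ddoc1 to view1 and view2, and both _design/ddoc2 and _design/dev_ddoc2 to chisheng.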
| Comment by Karen Zeller [ 17/May/13 ] |
|
Sent to Anil: In the past this was experimental only. Bin says this will be officially supported in 2.0.2. Please confirm. Thanks, Karen |
| Comment by Anil Kumar [ 17/May/13 ] |
| Karen, this is a different improvement. This one was never experimental; it's a new feature for cbtransfer/cbbackup/cbrestore to optionally allow transferring only design docs, without transferring any data items. Thanks! |
| Comment by Karen Zeller [ 17/May/13 ] |
|
Yes I documented this for 2.0.2.
Sending for review. |
[MB-8309] online_upgrade_swap_rebalance 1.8.1->2.0.2: Rebalance exited with reason {{change_filter_failed,{'EXIT',{{unexpected_reason,killed} Created: 17/May/13 Updated: 17/May/13 |
|
| Status: | Open |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Task | Priority: | Blocker |
| Reporter: | Andrei Baranouski | Assignee: | Aleksey Kondratenko |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | centos 32, 2.0.2-805-rel | ||
| Description |
|
http://qa.hq.northscale.net/job/centos-32-2.0-upgrade-P1/4/consoleFull
./testrunner -i /tmp/centos-32-2.0-upgrade.ini upgrade_version=2.0.2-805-rel,get-cbcollect-info=True,GROUP=P1 -t newupgradetests.MultiNodesUpgradeTests.online_upgrade_swap_rebalance,initial_version=1.8.1-942-rel,standard_buckets=1,items=500000,max_verify=1000,GROUP=1_8;ONLINE;WINDOWS;P1
Steps:
1) 10.3.3.151 and 10.3.3.152 with the 1.8.1 build; 10.3.3.153 with 2.0.2-805-rel
2) [2013-05-17 01:11:42,198] - [rest_client:925] INFO - rebalance params : password=password&ejectedNodes=ns_1%4010.3.3.151&user=Administrator&knownNodes=ns_1%4010.3.3.152%2Cns_1%4010.3.3.151%2Cns_1%4010.3.3.153
...
[2013-05-17 01:26:15,908] - [task:340] INFO - rebalancing was completed with progress: 100.0% in 873.699863911 sec
3) [2013-05-17 01:26:37,009] - [rest_client:925] INFO - rebalance params : password=password&ejectedNodes=ns_1%4010.3.3.152&user=Administrator&knownNodes=ns_1%4010.3.3.152%2Cns_1%4010.3.3.153
...
[2013-05-17 01:31:07,549] - [rest_client:1014] ERROR - {u'status': u'none', u'errorMessage': u'Rebalance failed. See logs for detailed reason.
You can try rebalance again.'} - rebalance failed
[2013-05-17 01:31:07,550] - [rest_client:1015] INFO - Latest logs from UI:
[2013-05-17 01:31:07,601] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.3.153', u'code': 2, u'text': u"Rebalance exited with reason {{change_filter_failed,\n {'EXIT',\n {{unexpected_reason,killed},\n [{misc,executing_on_new_process,1},\n {ns_vbm_sup,local_change_vbucket_filter,4},\n {rpc,local_call,3},\n {ns_vbm_sup,change_vbucket_filter,4},\n {ns_vbm_sup,'-set_replicas/3-fun-2-',5},\n {lists,foldl,3},\n {ns_vbm_sup,set_replicas,3},\n {ns_vbm_sup,\n '-set_replicas_on_nodes/3-fun-1-',3}]}}},\n [{ns_vbm_sup,change_vbucket_filter,4},\n {ns_vbm_sup,'-set_replicas/3-fun-2-',5},\n {lists,foldl,3},\n {ns_vbm_sup,set_replicas,3},\n {ns_vbm_sup,'-set_replicas_on_nodes/3-fun-1-',\n 3},\n {lists,foreach,2},\n {janitor_agent,\n do_bulk_set_vbucket_state_old_style,4},\n {ns_vbucket_mover,\n update_replication_post_move,3}]}\n", u'shortText': u'message', u'module': u'ns_orchestrator', u'tstamp': 1368779444497.0, u'type': u'info'}
[2013-05-17 01:31:07,602] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.3.153', u'code': 0, u'text': u'Failed to get tap stats after 5 attempts (repeated 43 times)', u'shortText': u'message', u'module': u'ebucketmigrator_srv', u'tstamp': 1368779418787.0, u'type': u'critical'}
[2013-05-17 01:31:07,603] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.3.153', u'code': 0, u'text': u'Failed to get tap stats after 5 attempts', u'shortText': u'message', u'module': u'ebucketmigrator_srv', u'tstamp': 1368779358861.0, u'type': u'critical'}
[2013-05-17 01:31:07,603] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.3.153', u'code': 0, u'text': u'Failed to get tap stats after 5 attempts (repeated 179 times)', u'shortText': u'message', u'module': u'ebucketmigrator_srv', u'tstamp': 1368779358787.0, u'type': u'critical'}
[2013-05-17 01:31:07,604] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.3.153', u'code': 0, u'text': u'Failed to get tap stats after 5 attempts', u'shortText': u'message', u'module': u'ebucketmigrator_srv', u'tstamp': 1368779299138.0, u'type': u'critical'}
[2013-05-17 01:31:07,604] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.3.153', u'code': 0, u'text': u'Failed to get tap stats after 5 attempts (repeated 154 times)', u'shortText': u'message', u'module': u'ebucketmigrator_srv', u'tstamp': 1368779298786.0, u'type': u'critical'}
[2013-05-17 01:31:07,605] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.3.153', u'code': 0, u'text': u'Failed to get tap stats after 5 attempts', u'shortText': u'message', u'module': u'ebucketmigrator_srv', u'tstamp': 1368779238798.0, u'type': u'critical'}
[2013-05-17 01:31:07,605] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.3.153', u'code': 0, u'text': u'Failed to get tap stats after 5 attempts (repeated 132 times)', u'shortText': u'message', u'module': u'ebucketmigrator_srv', u'tstamp': 1368779238787.0, u'type': u'critical'}
[2013-05-17 01:31:07,605] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.3.153', u'code': 0, u'text': u'Failed to get tap stats after 5 attempts', u'shortText': u'message', u'module': u'ebucketmigrator_srv', u'tstamp': 1368779184396.0, u'type': u'critical'}
[2013-05-17 01:31:07,605] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.3.153', u'code': 0, u'text': u'Bucket "standard_bucket0" rebalance does not seem to be swap rebalance', u'shortText': u'message', u'module': u'ns_vbucket_mover', u'tstamp': 1368779184144.0, u'type': u'info'} |
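The rebalance params logged by rest_client are URL-encoded (%40 is @ and %2C is a comma). A small sketch using the standard library's urllib.parse.parse_qs to decode them and work out which nodes should remain after each step; decode_rebalance_params is a hypothetical helper, not part of testrunner.

```python
from urllib.parse import parse_qs


def decode_rebalance_params(query: str) -> dict:
    """Decode the knownNodes/ejectedNodes fields of a logged rebalance request.

    parse_qs percent-decodes values, so 'ns_1%4010.3.3.152' becomes
    'ns_1@10.3.3.152' and '%2C' becomes the comma separating node names.
    """
    fields = parse_qs(query)

    def nodes(name):
        raw = fields.get(name, [""])[0]
        return [n for n in raw.split(",") if n]

    known, ejected = nodes("knownNodes"), nodes("ejectedNodes")
    return {
        "knownNodes": known,
        "ejectedNodes": ejected,
        # Nodes expected to stay in the cluster once the rebalance finishes
        "remainingNodes": [n for n in known if n not in ejected],
    }
```

Applied to the step 3 params, this shows ns_1@10.3.3.152 being ejected and ns_1@10.3.3.153 (the 2.0.2 node) as the only node left.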
| Comments |
[MB-8246] [system test] Rebalance exited with reason timeout waiting for backfill determination Created: 10/May/13 Updated: 17/May/13 Resolved: 17/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Chisheng Hong | Assignee: | Mike Wiederhold |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | build 2.0.2-789-rel | ||
| Operating System: | Centos 64-bit |
| Description |
|
Cluster ip is 172.23.105.23
1. Create an 8-node cluster; each node has 12 GB RAM and an HDD.
2. Create 2 buckets, default and saslbucket, with memory quotas of 6 GB and 4 GB.
3. Run the KV use case for 1 day: load 35M items into each bucket, then access the data at 4k ops/sec with 5% creates, 5% deletes, 5% expires, 5% updates, and 80% gets for several hours. Then, with the same workload, run some rebalance and failover operations. This works well.
4. Continue the workload for another day, then try to rebalance in one node. The rebalance exits with a timeout:
Rebalance exited with reason {unexpected_exit, {'EXIT',<0.7961.56>, {{badmatch, [{'EXIT', {timeout, {gen_server,call, [<12869.26294.2>,had_backfill,30000]}}}]}, [{ns_single_vbucket_mover, '-wait_backfill_determination/1-fun-1-', 1}]}}}
ns_orchestrator002 ns_1@172.23.105.23 14:36:20 - Fri May 10, 2013
<0.7953.56> exited with {unexpected_exit, {'EXIT',<0.7961.56>, {{badmatch, [{'EXIT', {timeout, {gen_server,call, [<12869.26294.2>,had_backfill,30000]}}}]}, [{ns_single_vbucket_mover, '-wait_backfill_determination/1-fun-1-',1}]}}}
The link for diags is https://s3.amazonaws.com/bugdb/jira/MB-8246/ns-diag-20130510162640.txt.zip |
| Comments |
| Comment by Chisheng Hong [ 10/May/13 ] |
| cbcollect info link is https://s3.amazonaws.com/bugdb/jira/MB-8246/10nodes_202-789_rebalance_timetout_20130510-173321.tgz |
| Comment by Chisheng Hong [ 13/May/13 ] |
| Aleksey K thought this was related to http://www.couchbase.com/issues/browse/MB-8231 |
| Comment by Maria McDuff [ 14/May/13 ] |
|
per bug triage, bumping up to blocker. if this is related to pls update with your findings nonetheless. |
| Comment by Maria McDuff [ 14/May/13 ] |
| per bug triage, wait for ep-engine to stabilize. |
| Comment by Mike Wiederhold [ 17/May/13 ] |
|
Duplicate of |
[MB-8235] error restarting couch_server: {{read_loop_died, {problem_reopening_file, Created: 09/May/13 Updated: 17/May/13 |
|
| Status: | Open |
| Project: | Couchbase Server |
| Component/s: | cross-datacenter-replication, ns_server |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Critical |
| Reporter: | Tommie McAfee | Assignee: | Aleksey Kondratenko |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Description |
|
Junyi > For some reason, the CouchDB updater crashed during XDCR, causing a cascade: a babysitting process restarted CouchDB multiple times, and the XDCR replicator then crashed because of the inconsistent instance startup time (Source database out of sync).
Test topology:
source:bucket0 <- bidirectional -> dest1:bucket0
source:bucket0 -> dest2:bucket0, dest3:bucket0, dest4:bucket0
We have 4 outbound streams from the source and 1 inbound.
There is a 20k frontend load on bucket0 with get:70%, delete:10%, update:10%, set:10%, expire:5%.
The inbound load from the destination comes from a 4k load with get:90%, delete:2%, update:5%, set:5%, expire:5%.
Data is loaded until about 70% DGM. No views, no rebalancing.
In the CouchDB logs we see:
[couchdb:error,2013-05-09T7:44:46.318,ns_1@172.23.105.55:couch_server<0.8331.208>:couch_log:error:42]Unexpected message, restarting couch_server: {'EXIT',<0.16961.208>, {{read_loop_died, {problem_reopening_file, {error,system_limit}, {set_close_after,infinity, <0.16958.208>}, <0.16959.208>, "/opt/couchbase/var/lib/couchbase/data/bucket1/345.couch.2", 10}}, {gen_server,call, [<0.16958.208>,snapshot_reads, infinity]}}}
In XDCR there are sync errors suggesting we increase max_dbs_open, but it seems we are already hitting a limit, since couchdb is restarting:
Replication `a1c985cbafac10e773b130f01d1ba85c/bucket0/bucket0` ...failed: Source database out of sync. Try to increase max_dbs_open at the source's server.
Attaching logs here from the time of the crash. Full logs were copied up and left here: 172.23.105.55:/0509_couchdb/ (use rsa key) |
| Comments |
| Comment by Tommie McAfee [ 09/May/13 ] |
|
Given the 1->4 outbound topology, I'm wondering whether we are hitting OS limits for reads of db files, or whether this may be caused by a bug in the server (excessive reads?). The source cluster has 4 nodes with 6 cores, accepting inbound replicas from another 6-core server, with 3 of the destination clusters having 4-core CPUs. |
| Comment by Filipe Manana [ 09/May/13 ] |
|
You are reaching the Erlang VM process limit, after which point spawning a process returns the system_limit error. This is unrelated to file descriptor limits. Use the Erlang VM startup option +P N (somewhere in the server startup script) to increase the hard limit (default is 32768). |
| Comment by Maria McDuff [ 09/May/13 ] |
| per QE, couchdb keeps restarting. |
| Comment by Tommie McAfee [ 09/May/13 ] |
| Thanks Filipe, also wondering if hitting this limit is related to the out of sync errors in xdcr? |
| Comment by Junyi Xie [ 09/May/13 ] |
|
Tommie, the out-of-sync errors in XDCR are due to the db instance restarting during replication, which is probably caused by hitting this limit. Please increase the limit and rerun the test to see if it works. The test probably overloaded the cluster with too many replications. |
| Comment by Filipe Manana [ 09/May/13 ] |
|
Yes it is Tommie. Replicator barfs if any of the database erlang processes restart. Here they restart because when file activity is needed, they attempt to open file processes which fail because the process limit was hit. Why there are so many processes on the system, it's another question. Probably way too many replicators (and all their child processes) and/or several buckets case is also another possibility (which doesn't seem to be the case from your short description). |
| Comment by Tommie McAfee [ 10/May/13 ] |
|
Updated to +P 131072 (4x the default) and no longer hitting this problem. |
| Comment by Junyi Xie [ 10/May/13 ] |
|
Thanks Tommie and Filipe.
Dipti, shall we document this behavior somewhere so customers won't hit the wall? The issue is that we need to bump up the Erlang process limit, especially when users want to start multiple replications on a cluster, as in Tommie's test. Otherwise, XDCR may not work properly even if the cluster itself is powerful enough to handle that many replications. |
| Comment by Maria McDuff [ 10/May/13 ] |
| bumping up erlang thread limit is 1 solution. it may have an impact on ns_server and view engine. tech lead from xdcr, view engine and ns_server need to decide on best setting to support multiple xdcr strings. |
| Comment by Aleksey Kondratenko [ 10/May/13 ] |
|
Tommie, how exactly did you add +P? I'm asking because there's a chance you actually bumped the limit of the babysitter VM, which has nothing at all to do with XDCR. So your report that +P fixes the problem looks really weird, and I'd like to understand this a bit more. Also, given that you hit the error when opening a file, my understanding is that this is the file descriptor limit rather than the process limit. |
| Comment by Aleksey Kondratenko [ 10/May/13 ] |
| Also let me note that I've double checked our code and we still run ns_server with +P 327680 which should be plenty. |
| Comment by Ravi Mayuram [ 10/May/13 ] |
|
Looks like this limit was raised in the past to ~300K. Perhaps that is either not set in the process init, or not there in the code. either way, raising the limit is acceptable here. Aleksey will review the code and guide Tommie as needed. |
| Comment by Aleksey Kondratenko [ 10/May/13 ] |
| See my reply above. I believe there's no problem with process count limit as we continue having it pretty high (327680). |
| Comment by Tommie McAfee [ 13/May/13 ] |
|
In /opt/couchbase/bin/couchbase-server I added +P here:
exec erl +P 131072 \
    +A 16 \
    -smp enable \
    -kernel inet_dist_listen_min 21100 inet_dist_listen_max 21299 \
    error_logger false \
    -sasl sasl_error_logger false \
    -hidden \
    -name 'babysitter_of_ns_1@127.0.0.1' \
Looks like this is the babysitter; where is ns_server's option for this? |
| Comment by Ravi Mayuram [ 13/May/13 ] |
| Which OS is this? If this is how the server is started it looks incomplete (at least based on what I'd to do for fixing OSX). |
| Comment by Tommie McAfee [ 13/May/13 ] |
|
CentOS 6.3 The rest of the command is omitted as I only changed the top line |
| Comment by Aleksey Kondratenko [ 13/May/13 ] |
|
There is no way to change the process limit of ns_server, at least not yet. And as I said, it's already 320k. So I'd like you to reproduce this a few more times and verify that adding whatever you added to the babysitter's beam args indeed does not help. Then I'd recommend experimenting with the file descriptor limit, which could be bumped to 100k; most likely that will help. |
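The Erlang +P process limit and the OS file descriptor limit are separate settings. On Linux, the effective descriptor limits of a running process can be read from /proc/<pid>/limits. A minimal sketch, assuming a Linux host and that you already know the pid of the ns_server VM (finding it is not shown); the 100k target comes from the comment above, and the helper names are hypothetical.

```python
def _parse_limit(value: str):
    """Convert one /proc limits field to an int, or None for 'unlimited'."""
    return None if value == "unlimited" else int(value)


def max_open_files(limits_text: str):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits."""
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            # Remaining columns are: soft limit, hard limit, units
            soft, hard = line[len(prefix):].split()[:2]
            return _parse_limit(soft), _parse_limit(hard)
    raise ValueError("no 'Max open files' row found")


def below_target(soft, target: int = 100_000) -> bool:
    """True if a finite soft limit is below the target (None means unlimited)."""
    return soft is not None and soft < target
```

For example, reading /proc/<pid>/limits with a `with open(...)` block and passing the text to max_open_files returns the (soft, hard) pair, and below_target(soft) flags a soft limit under 100k.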
| Comment by Tommie McAfee [ 13/May/13 ] |
| Going back to vanilla install of 789 to see if it's reproduced without +P in babysitter. |
| Comment by Tommie McAfee [ 14/May/13 ] |
|
Seems I ran into another issue in most recent run( |
| Comment by Maria McDuff [ 14/May/13 ] |
|
tommie, can u confirm that this is no longer happening? pls confirm asap. Thanks. if not reproducible, pls close it. |
| Comment by Tommie McAfee [ 14/May/13 ] |
| This did not happen in latest run but I don't think it should be closed. There may be some other cause. In my opinion this test is unstable |
| Comment by Maria McDuff [ 14/May/13 ] |
| abhinav, pls update this bug. per tommie, you are now running the test. |
| Comment by Abhinav Dangeti [ 17/May/13 ] |
|
Still seen: http://guinep-s10501:8091/ (Live cluster) xdcr_error: Error replicating vbucket 510: <<"Source database out of sync. Try to increase max_dbs_open at the source's server.">> Build 2.0.2-803-rel |
| Comment by Wayne Siu [ 17/May/13 ] |
| Please provide Ali the live system. |
[MB-8291] [system test] warmup access log corrupt during warmup in windows Created: 15/May/13 Updated: 17/May/13 |
|
| Status: | Open |
| Project: | Couchbase Server |
| Component/s: | couchbase-bucket |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2, 2.1 |
| Security Level: | Public |
| Type: | Bug | Priority: | Critical |
| Reporter: | Thuan Nguyen | Assignee: | Thuan Nguyen |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | windows 2008 r2 64-bit | ||
| Attachments: |
|
| Operating System: | Windows 64-bit |
| Description |
|
Environment:
9 nodes, Windows Server 2008 R2 64-bit. Each node has 8 GB RAM and a 40+ GB SSD.
Cluster: create a 7-node cluster from the nodes below:
1: 10.3.121.173
2: 10.3.121.169
3: 10.3.121.171
4: 10.3.3.214
5: 10.3.121.47
6: 10.3.3.180
7: 10.3.3.181
8: 10.3.3.182
9: 10.3.121.243
Create 2 buckets:
default: 3 GB, 1 replica, replica index enabled
saslbucket: 3 GB, 1 replica, replica index disabled
Run the KV-only tests using the specification from http://hub.internal.couchbase.com/confluence/display/QA/views+%28and+now+with+XDCR%29+tests
In the warmup phase (stop and restart Couchbase Server on node 10.3.121.243), monitor the warmup process.
Check the warmup status of this node; the access log is reported as corrupt:
$ /cygdrive/c/Program\ Files/Couchbase/Server/bin/cbstats.exe localhost:11210 raw warmup
ep_warmup: enabled
ep_warmup_access_log: corrupt
ep_warmup_dups: 0
ep_warmup_estimate_time: 28569
ep_warmup_estimated_key_count: 3828057
ep_warmup_estimated_value_count: 3828057
ep_warmup_item_expired: 0
ep_warmup_key_count: 3828057
ep_warmup_keys_time: 45952020
ep_warmup_min_item_threshold: 100
ep_warmup_min_memory_threshold: 100
ep_warmup_oom: 0
ep_warmup_state: done
ep_warmup_thread: complete
ep_warmup_time: 131417617
ep_warmup_value_count: 3804872
Administrator@WIN-5IC1JB7LHUB ~
$ /cygdrive/c/Program\ Files/Couchbase/Server/bin/cbstats.exe localhost:11210 raw warmup -b saslbucket -p password
ep_warmup: enabled
ep_warmup_access_log: corrupt
ep_warmup_dups: 0
ep_warmup_estimate_time: 40933
ep_warmup_estimated_key_count: 23217866
ep_warmup_estimated_value_count: 23217866
ep_warmup_item_expired: 0
ep_warmup_key_count: 23217866
ep_warmup_keys_time: 243091809
ep_warmup_min_item_threshold: 100
ep_warmup_min_memory_threshold: 100
ep_warmup_oom: 0
ep_warmup_state: done
ep_warmup_thread: complete
ep_warmup_time: 243243953
ep_warmup_value_count: 1
Link to manifest file: http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.2-802-rel.setup.exe.manifest.xml
Link to cbcollect_info of all nodes:
https://s3.amazonaws.com/packages.couchbase/collect_info/2_0_2/2013_05/9ndoes_202-802_access-log-corrupt_at_warmup_20130515-125236.tgz |
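The cbstats raw warmup output above is a list of "key: value" lines, so checking many nodes for ep_warmup_access_log: corrupt is easy to script. A sketch, assuming the output of a single cbstats invocation (one bucket) has been captured as text; the helper names are hypothetical.

```python
def parse_warmup_stats(text: str) -> dict:
    """Collect ep_warmup* stats from one 'cbstats ... raw warmup' run.

    Numeric values become ints; others (e.g. 'corrupt', 'done') stay strings.
    Lines that are not ep_warmup* stats, such as shell prompts, are ignored.
    """
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if sep and key.startswith("ep_warmup"):
            stats[key] = int(value) if value.isdigit() else value
    return stats


def access_log_corrupt(stats: dict) -> bool:
    """True if warmup reported the access log as corrupt."""
    return stats.get("ep_warmup_access_log") == "corrupt"
```

Running it once per node and bucket gives a quick summary like the per-node list in the comments that follow.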
| Comments |
| Comment by Thuan Nguyen [ 15/May/13 ] |
|
Link to access logs of default bucket and saslbucket https://s3.amazonaws.com/packages.couchbase/access_logs/2013_05/2_access_logs_202-802_access_log_corrupt_20130515-135037.tgz |
| Comment by Maria McDuff [ 15/May/13 ] |
| tony, does the data still load even when it's corrupt? what is the impact of this bug? is the system still running? |
| Comment by Thuan Nguyen [ 16/May/13 ] |
|
Even though the access logs were corrupted, all items were loaded back into the cluster. The cluster is running normally. |
| Comment by Thuan Nguyen [ 16/May/13 ] |
|
Warmup of the whole cluster:
Before shutting down the whole cluster, 10.3.121.47:
default bucket curr_items: 1908843
saslbucket curr_items: 11571700
After shutdown and warmup completed, 10.3.121.47:
default bucket curr_items: 1312298, ep_warmup_key_count: 2633615
saslbucket curr_items: 3198476, ep_warmup_key_count: 6418492
Warmup stats on all nodes for both default and saslbucket:
10.3.3.182 ep_warmup_access_log: corrupt, ep_warmup_thread: complete
10.3.3.214 ep_warmup_thread: complete
10.3.121.171 ep_warmup_access_log: corrupt, ep_warmup_thread: complete
10.3.121.243 ep_warmup_access_log: corrupt, ep_warmup_thread: complete
10.3.121.47 ep_warmup_access_log: corrupt, ep_warmup_thread: complete
10.3.121.169 ep_warmup_access_log: corrupt, ep_warmup_thread: complete
10.3.3.180 ep_warmup_access_log: corrupt, ep_warmup_thread: complete
Since the test includes item expiration within one day, we should see some drop in curr_items, but not as much as in the results above. We need to re-run the test without expiration to confirm the result. |
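To put a number on the drop reported for 10.3.121.47, the relative decrease in curr_items can be computed directly from the before and after values above; percent_drop is just an illustrative helper.

```python
def percent_drop(before: int, after: int) -> float:
    """Relative decrease from before to after, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 100.0 * (before - after) / before


# curr_items on 10.3.121.47 before shutdown and after warmup
for bucket, before, after in [("default", 1908843, 1312298),
                              ("saslbucket", 11571700, 3198476)]:
    print(f"{bucket}: {percent_drop(before, after):.1f}% drop")
```

This gives roughly a 31% drop for the default bucket and roughly 72% for saslbucket, which is why the expiration-free re-run is needed to tell expiry apart from possible data loss.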
| Comment by Wayne Siu [ 17/May/13 ] |
| Tony will run another test to confirm if there is data loss. |
[MB-7735] Memcached crash on a node on the destination cluster Created: 13/Feb/13 Updated: 17/May/13 |
|
| Status: | Open |
| Project: | Couchbase Server |
| Component/s: | couchbase-bucket |
| Affects Version/s: | 2.0.1 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Abhinav Dangeti | Assignee: | Mike Wiederhold |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.1-153-rel.deb.manifest.xml
source : destination :: 7 : 5 default, saslbucket ( both unidirectional) 2 views for each bucket on each cluster 4 core SSDs 30GB CentOS |
||
| Description |
|
Live clusters:
Source: http://10.6.2.37:8091/ Destination: http://10.6.2.89:8091/ On Source: default: ~>70% Resident ratio saslbucket: ~>40% Resident ratio System uptime: >36 hours On Destination: 2 nodes went down: 10.6.2.68, 10.6.2.69 - "ip seems to have changed" seen for both the nodes - ns_1@10.6.2.68 Server error during processing: ["web request failed", {path,"/pools/default"}, {type,exit}, {what, {timeout, {gen_server,call, [ns_node_disco,nodes_wanted]}}}, {trace, [{gen_server,call,2}, {ns_orchestrator,needs_rebalance,0}, {ns_cluster_membership,is_balanced,0}, {menelaus_web,build_pool_info,4}, {menelaus_web,handle_pool_info,2}, {menelaus_web,loop,3}, {mochiweb_http,headers,5}, {proc_lib,init_p_do_apply,3}]}] (repeated 15 times) - ns_1@10.6.2.69 Server error during processing: ["web request failed", {path,"/pools/default"}, {type,exit}, {what, {{timeout, {gen_server,call, [{'stats_reader-default', 'ns_1@10.6.2.69'}, {latest,minute,1}]}}, {gen_server,call, [menelaus_web_alerts_srv,fetch_alert]}}}, {trace, [{gen_server,call,2}, {diag_handler,diagnosing_timeouts,1}, {menelaus_web,build_pool_info,4}, {menelaus_web,handle_pool_info,2}, {menelaus_web,loop,3}, {mochiweb_http,headers,5}, {proc_lib,init_p_do_apply,3}]}] - Memcached core on 10.6.2.68 The Back trace: [root@pine-11803 data]# gdb /opt/couchbase/bin/memcached core.memcached.17636 GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6) Copyright (C) 2010 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /opt/couchbase/bin/memcached...done. 
[New Thread 17656] [New Thread 17645] [New Thread 17654] [New Thread 17646] [New Thread 17652] [New Thread 17653] [New Thread 17655] [New Thread 17636] Reading symbols from /opt/couchbase/lib/memcached/libmemcached_utilities.so.0...done. Loaded symbols for /opt/couchbase/lib/memcached/libmemcached_utilities.so.0 Reading symbols from /opt/couchbase/lib/libevent-2.0.so.5...done. Loaded symbols for /opt/couchbase/lib/libevent-2.0.so.5 Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/libdl.so.2 Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done. Loaded symbols for /lib64/libm.so.6 Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done. Loaded symbols for /lib64/librt.so.1 Reading symbols from /opt/couchbase/lib/libtcmalloc_minimal.so.4...done. Loaded symbols for /opt/couchbase/lib/libtcmalloc_minimal.so.4 Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done. [Thread debugging using libthread_db enabled] Loaded symbols for /lib64/libpthread.so.0 Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done. Loaded symbols for /lib64/libc.so.6 Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/ld-linux-x86-64.so.2 Reading symbols from /usr/lib64/libstdc++.so.6...(no debugging symbols found)...done. Loaded symbols for /usr/lib64/libstdc++.so.6 Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done. Loaded symbols for /lib64/libgcc_s.so.1 Reading symbols from /opt/couchbase/lib/memcached/stdin_term_handler.so...done. Loaded symbols for /opt/couchbase/lib/memcached/stdin_term_handler.so Reading symbols from /opt/couchbase/lib/memcached/file_logger.so...done. Loaded symbols for /opt/couchbase/lib/memcached/file_logger.so Reading symbols from /opt/couchbase/lib/memcached/bucket_engine.so...done. 
Loaded symbols for /opt/couchbase/lib/memcached/bucket_engine.so Reading symbols from /opt/couchbase/lib/memcached/ep.so...done. Loaded symbols for /opt/couchbase/lib/memcached/ep.so Reading symbols from /opt/couchbase/lib/libcouchstore.so.1...done. Loaded symbols for /opt/couchbase/lib/libcouchstore.so.1 Reading symbols from /opt/couchbase/lib/libsnappy.so.1...done. Loaded symbols for /opt/couchbase/lib/libsnappy.so.1 Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/libnss_files.so.2 Core was generated by `/opt/couchbase/bin/memcached -X /opt/couchbase/lib/memcached/stdin_term_handler'. Program terminated with signal 6, Aborted. #0 0x00007fc8afec28a5 in raise () from /lib64/libc.so.6 Missing separate debuginfos, use: debuginfo-install couchbase-server-2.0.1-153.x86_64 (gdb) t a a bt Thread 8 (Thread 0x7fc8b1376720 (LWP 17636)): #0 0x00007fc8aff6a43d in write () from /lib64/libc.so.6 #1 0x00007fc8aff01033 in _IO_new_file_write () from /lib64/libc.so.6 #2 0x00007fc8aff00efa in _IO_new_file_xsputn () from /lib64/libc.so.6 #3 0x00007fc8afef692c in fputs () from /lib64/libc.so.6 #4 0x00007fc8aeb61143 in logger_log (severity=EXTENSION_LOG_WARNING, client_cookie=<value optimized out>, fmt=0x7fc8aab1bebb "Schedule cleanup of \"%s\"") at extensions/loggers/file_logger.c:275 #5 0x00007fc8aaad06a4 in TapConnMap::shutdownAllTapConnections (this=0x636e240) at src/tapconnmap.cc:366 #6 0x00007fc8aaa989d1 in EventuallyPersistentEngine::destroy (this=0x6374000, force=<value optimized out>) at src/ep_engine.cc:1401 #7 0x00007fc8aaa98ace in EvpDestroy (handle=<value optimized out>, force=false) at src/ep_engine.cc:130 #8 0x00007fc8adf52bb5 in bucket_shutdown_engine (key=<value optimized out>, nkey=<value optimized out>, val=0x63262a0, nval=<value optimized out>, args=<value optimized out>) at bucket_engine.c:1290 #9 0x00007fc8adf5966c in genhash_iter (h=0x632a000, iterfunc=0x7fc8adf52b80 <bucket_shutdown_engine>, arg=0x0) 
at genhash.c:275 #10 0x00007fc8adf53f46 in bucket_destroy (handle=0x7fc8ae15c640, force=<value optimized out>) at bucket_engine.c:1327 #11 0x0000000000409777 in main (argc=<value optimized out>, argv=<value optimized out>) at daemon/memcached.c:7927 Thread 7 (Thread 0x7fc8a8627700 (LWP 17655)): #0 0x00007fc8b022d7bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007fc8aaa76f28 in wait (this=0x63a22d0, d=...) at src/syncobject.hh:58 #2 IdleTask::run (this=0x63a22d0, d=...) at src/dispatcher.cc:336 #3 0x00007fc8aaa795ea in Dispatcher::run (this=0x636b880) at src/dispatcher.cc:173 #4 0x00007fc8aaa79eeb in launch_dispatcher_thread (arg=0x636b880) at src/dispatcher.cc:28 #5 0x00007fc8b0229851 in start_thread () from /lib64/libpthread.so.0 #6 0x00007fc8aff776dd in clone () from /lib64/libc.so.6 Thread 6 (Thread 0x7fc8a9a29700 (LWP 17653)): #0 0x00007fc8b022d7bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007fc8aaa76f28 in wait (this=0x63a2120, d=...) at src/syncobject.hh:58 #2 IdleTask::run (this=0x63a2120, d=...) 
at src/dispatcher.cc:336 #3 0x00007fc8aaa795ea in Dispatcher::run (this=0x636ac40) at src/dispatcher.cc:173 #4 0x00007fc8aaa79eeb in launch_dispatcher_thread (arg=0x636ac40) at src/dispatcher.cc:28 #5 0x00007fc8b0229851 in start_thread () from /lib64/libpthread.so.0 #6 0x00007fc8aff776dd in clone () from /lib64/libc.so.6 Thread 5 (Thread 0x7fc8aa638700 (LWP 17652)): #0 0x00007fc8aff3b97d in nanosleep () from /lib64/libc.so.6 #1 0x00007fc8aff70b34 in usleep () from /lib64/libc.so.6 #2 0x00007fc8aaab67f5 in updateStatsThread (arg=0x1ab44c0) at src/memory_tracker.cc:31 #3 0x00007fc8b0229851 in start_thread () from /lib64/libpthread.so.0 #4 0x00007fc8aff776dd in clone () from /lib64/libc.so.6 Thread 4 (Thread 0x7fc8aeb5d700 (LWP 17646)): #0 0x00007fc8b022d7bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007fc8aeb614d6 in logger_thead_main (arg=0x1ab4040) at extensions/loggers/file_logger.c:368 #2 0x00007fc8b0229851 in start_thread () from /lib64/libpthread.so.0 #3 0x00007fc8aff776dd in clone () from /lib64/libc.so.6 Thread 3 (Thread 0x7fc8a9028700 (LWP 17654)): #0 0x00007fc8b022d7bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007fc8aaa76f28 in wait (this=0x63a2090, d=...) at src/syncobject.hh:58 #2 IdleTask::run (this=0x63a2090, d=...) 
at src/dispatcher.cc:336 #3 0x00007fc8aaa795ea in Dispatcher::run (this=0x636aa80) at src/dispatcher.cc:173 #4 0x00007fc8aaa79eeb in launch_dispatcher_thread (arg=0x636aa80) at src/dispatcher.cc:28 #5 0x00007fc8b0229851 in start_thread () from /lib64/libpthread.so.0 #6 0x00007fc8aff776dd in clone () from /lib64/libc.so.6 Thread 2 (Thread 0x7fc8af772700 (LWP 17645)): #0 0x00007fc8aff6a3dd in read () from /lib64/libc.so.6 #1 0x00007fc8aff01248 in _IO_new_file_underflow () from /lib64/libc.so.6 #2 0x00007fc8aff02d4e in _IO_default_uflow_internal () from /lib64/libc.so.6 #3 0x00007fc8afef753a in _IO_getline_info_internal () from /lib64/libc.so.6 #4 0x00007fc8afef6399 in fgets () from /lib64/libc.so.6 #5 0x00007fc8af773939 in check_stdin_thread (arg=<value optimized out>) at extensions/daemon/stdin_check.c:37 #6 0x00007fc8b0229851 in start_thread () from /lib64/libpthread.so.0 #7 0x00007fc8aff776dd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7fc8a7c26700 (LWP 17656)): #0 0x00007fc8afec28a5 in raise () from /lib64/libc.so.6 #1 0x00007fc8afec4085 in abort () from /lib64/libc.so.6 #2 0x0000000000404315 in release_cookie (cookie=<value optimized out>) at daemon/memcached.c:6707 #3 0x00007fc8adf55009 in bucket_engine_release_cookie (cookie=0x62c3b80) at bucket_engine.c:2565 #4 0x00007fc8aaa9461a in EventuallyPersistentEngine::releaseCookie (this=0x6374000, cookie=0x62c3b80) at src/ep_engine.cc:1230 #5 0x00007fc8aaabf376 in TapConnection::releaseReference (this=0x638a000, force=<value optimized out>) at src/tapconnection.cc:110 #6 0x00007fc8aaad2aeb in TapConnectionReaperCallback::callback(Dispatcher&, SingleThreadedRCPtr<Task>&) () from /opt/couchbase/lib/memcached/ep.so #7 0x00007fc8aaa795ea in Dispatcher::run (this=0x636b6c0) at src/dispatcher.cc:173 #8 0x00007fc8aaa79eeb in launch_dispatcher_thread (arg=0x636b6c0) at src/dispatcher.cc:28 #9 0x00007fc8b0229851 in start_thread () from /lib64/libpthread.so.0 #10 0x00007fc8aff776dd in clone () from /lib64/libc.so.6 |
| Comments |
| Comment by Junyi Xie [ 13/Feb/13 ] |
|
memcached crashed after a 36-hour test. On the source side, XDCR was replicating as expected, but at some point the destination node became inaccessible because of the memcached crash. The web UI at the destination was also inaccessible at that time.
ep_engine team, can you please take a first look? Thanks. |
| Comment by Abhinav Dangeti [ 13/Feb/13 ] |
|
Very high Erlang and memcached usage on the nodes with inbound XDCR and multiple views. 'top' results (columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND):
10.6.2.68:
7851 couchbas 20 0 4069m 2.4g 40m S 61.9 7.8 1636:56 beam.smp
1963 couchbas 20 0 15.0g 14g 2684 S 21.5 47.3 89:04.92 memcached
10.6.2.45:
10617 couchbas 20 0 6853m 4.3g 40m S 134.4 14.0 2363:19 beam.smp
10783 couchbas 20 0 16.6g 11g 2600 S 1.0 37.0 573:37.54 memcached
10.6.2.66:
30917 couchbas 20 0 7418m 4.9g 40m S 110.4 16.6 2128:11 beam.smp
6793 couchbas 20 0 15.0g 14g 2552 S 1.8 50.0 12:47.97 memcached
10.6.2.69:
29428 couchbas 20 0 8233m 5.7g 40m S 95.9 18.4 1845:56 beam.smp
879 couchbas 20 0 15.1g 13g 2572 S 5.2 44.9 434:52.01 memcached |
| Comment by Junyi Xie [ 13/Feb/13 ] |
|
Abhinav, the destination seems inaccessible to me. Can you please upload the diags or logs of the destination cluster? |
| Comment by Abhinav Dangeti [ 13/Feb/13 ] |
|
Sure, could you try http://10.6.2.89:8091/ ? The other 4 nodes seem inaccessible.
https://s3.amazonaws.com/bugdb/MB-7735/ns-diag-20130213122639.txt.zip |
| Comment by Abhinav Dangeti [ 14/Feb/13 ] |
|
Based on a recent conversation, Chiyoung mentioned that this is a known issue that sometimes happens when there is heavy load on the cluster (indexing + compaction of views running on the destination cluster: 2 views per bucket, 2 buckets, 2 inbound replication streams from the source for the 2 buckets), and that there is another, duplicate bug with a similar memcached crash (possibly occurring during the TAP connections in the destination cluster). Junyi also said that XDCR backs off based on how the destination is doing (if the disk write queue on the destination is very high and increasing). We could try to reduce the load on the destination (by reducing the number of views or the number of vbucket replicators) and see how the destination does. We still need to figure out why the other 3 nodes went into pending state while a core was generated on just one node. |
| Comment by Jin Lim [ 14/Feb/13 ] |
|
Chiyoung/Abhinav - maybe this is the similar bug you are referring to? http://www.couchbase.com/issues/browse/MB-7601 I guess if the crash turns out to be the same issue as |
| Comment by Jin Lim [ 15/Feb/13 ] |
| Per bug scrubs, any memcached crash is to be treated as a blocker for 2.0.1. Thanks. |
| Comment by Chiyoung Seo [ 15/Feb/13 ] |
|
Mike,
Please take a second look at this issue. |
| Comment by Mike Wiederhold [ 19/Feb/13 ] |
|
Abhinav,
Please re-run this test and let me know if you can get a core dump for this crash so I can investigate it further. If you cannot reproduce it then let's close the issue for now. |
| Comment by Mike Wiederhold [ 19/Feb/13 ] |
|
See |
| Comment by Ketaki Gangal [ 19/Feb/13 ] |
|
We are running views independent of xdcr for the current sprint. Will resume the xdcr+views from the next sprint onwards. |
| Comment by Jin Lim [ 20/Feb/13 ] |
|
Deferring this to 2.0.2 as it is a duplicate of |
| Comment by Mike Wiederhold [ 21/Feb/13 ] |
| Please assign back to me if you find a core dump. Otherwise I'm not sure what else I can do here. |
| Comment by Jin Lim [ 05/Mar/13 ] |
| The core file is in the location specified above, but there is no cbcollect info. |
| Comment by Mike Wiederhold [ 05/Mar/13 ] |
|
This thread crashed in memcached because the thread in the connection is null. Thread 1 (Thread 0x47718940 (LWP 17718)): #0 add_conn_to_pending_io_list (cookie=0x164c7340, status=ENGINE_SUCCESS) at daemon/thread.c:722 #1 notify_io_complete (cookie=0x164c7340, status=ENGINE_SUCCESS) at daemon/thread.c:488 #2 0x00002aaaaaf4a4fd in notifyIOComplete (this=<value optimized out>, tc=0x16d13400) at src/ep_engine.h:439 #3 TapConnMap::notifyPausedConnection_UNLOCKED (this=<value optimized out>, tc=0x16d13400) at src/tapconnmap.cc:347 #4 0x00002aaaaaee4901 in performTapOp<void*> (this=0x173d3f80, d=<value optimized out>, t=<value optimized out>) at src/tapconnmap.hh:119 #5 BackfillDiskLoad::callback (this=0x173d3f80, d=<value optimized out>, t=<value optimized out>) at src/backfill.cc:78 #6 0x00002aaaaaef473a in Dispatcher::run (this=0x16583880) at src/dispatcher.cc:173 #7 0x00002aaaaaef503b in launch_dispatcher_thread (arg=0x16583880) at src/dispatcher.cc:28 #8 0x00002b9d8d23a77d in start_thread () from /lib64/libpthread.so.0 #9 0x00002b9d8d52225d in clone () from /lib64/libc.so.6 (gdb) (gdb) frame #0 add_conn_to_pending_io_list (cookie=0x164c7340, status=ENGINE_SUCCESS) at daemon/thread.c:722 722 in daemon/thread.c (gdb) info locals notify = 0 (gdb) info args c = 0x164c7340 (gdb) print c $1 = (conn *) 0x164c7340 (gdb) print *c $2 = {sfd = -1, nevents = 1, sasl_conn = 0x0, state = 0x405c80 <conn_immediate_close>, substate = bin_reading_packet, registered_in_libevent = false, event = { ev_active_next = {tqe_next = 0x11ce1a08, tqe_prev = 0x11cdc080}, ev_next = {tqe_next = 0x0, tqe_prev = 0x164c78f0}, ev_timeout_pos = {ev_next_with_common_timeout = { tqe_next = 0xffffffff, tqe_prev = 0x0}, min_heap_idx = -1}, ev_fd = 71, ev_base = 0x16546280, _ev = {ev_io = {ev_io_next = {tqe_next = 0x0, tqe_prev = 0x1e413e20}, ev_timeout = {tv_sec = 0, tv_usec = 0}}, ev_signal = {ev_signal_next = {tqe_next = 0x0, tqe_prev = 0x1e413e20}, ev_ncalls = 0, ev_pncalls = 0x0}}, ev_events = 22, 
ev_res = 0, ev_flags = 128, ev_pri = 0 '\000', ev_closure = 2 '\002', ev_timeout = {tv_sec = 0, tv_usec = 0}, ev_callback = 0x405d60 <event_handler>, ev_arg = 0x164c7340}, ev_flags = 22, which = 2, rbuf = 0x1652f800 "", rcurr = 0x1eec0000 "", rsize = 2048, rbytes = 0, wbuf = 0x164c8000 "\200A", wcurr = 0x164c8000 "\200A", wsize = 2048, wbytes = 284795, write_and_go = 0x413960 <conn_ship_log>, write_and_free = 0x0, ritem = 0x1eec0810 "\201A", rlbytes = 0, item = 0x0, store_op = 0, sbytes = 0, iov = 0x164d2800, iovsize = 400, iovused = 0, msglist = 0x164c4fc0, msgsize = 10, msgused = 1, msgcurr = 0, msgbytes = 0, ilist = 0x164b2000, isize = 200, icurr = 0x164b2000, ileft = 0, suffixlist = 0x164700a0, suffixsize = 20, suffixcurr = 0x164700a0, suffixleft = 0, protocol = binary_prot, transport = tcp_transport, request_id = 0, request_addr = {ss_family = 0, __ss_align = 0, __ss_padding = '\000' <repeats 111 times>}, request_addr_size = 0, hdrbuf = 0x0, hdrsize = 0, noreply = false, refcount = 1 '\001', dynamic_buffer = {buffer = 0x0, size = 2048, offset = 32}, engine_storage = 0x0, ascii_cmd = 0x0, binary_header = { request = {magic = 129 '\201', opcode = 65 'A', keylen = 0, extlen = 0 '\000', datatype = 0 '\000', vbucket = 0, bodylen = 0, opaque = 1082130944, cas = 0}, bytes = "\201A", '\000' <repeats 11 times>, "\002\200@\000\000\000\000\000\000\000"}, cas = 0, cmd = 65, opaque = 1082130944, keylen = 0, list_state = 0, next = 0x164a38c0, thread = 0x0, aiostat = ENGINE_SUCCESS, ewouldblock = true, tap_iterator = 0, parent_port = 11209} (gdb) |
| Comment by Mike Wiederhold [ 05/Mar/13 ] |
|
Trond,
Please see my comments above. The core file is also still available if you need to dig any further. |
| Comment by Trond Norbye [ 05/Mar/13 ] |
|
The code calls release_cookie on a cookie that is already closed. Could there be code paths where release_cookie is called _twice_, or where reserve_cookie isn't called?
All operations that happen "async" on connections that are running duplex need to call reserve_cookie if they do work with a cookie in the background. Without that, the front end may disconnect, and because memcached doesn't think the cookie is in use, it will close the connection immediately. |
| Comment by Mike Wiederhold [ 07/Mar/13 ] |
|
See |
| Comment by Thuan Nguyen [ 07/Mar/13 ] |
|
memcached core dump from an online upgrade from 1.8.1-945 to 2.0.1-170: https://s3.amazonaws.com/bugdb/jira/MB-7735/core.memcached-7735-2.0.1-6895.gz
memcached crashed during swap rebalance, as I noted in bug |
| Comment by Thuan Nguyen [ 07/Mar/13 ] |
|
Links to diags, collect info https://s3.amazonaws.com/packages.couchbase/collect_info/2_0_1/201303/6nodes-diags-colinfo-ec2-online-upgrade-181_945-201_170-memcached_crashed-20130307.tgz Link to stack trace https://friendpaste.com/7QRfJfkG2r9w7roPLyRdMJ |
| Comment by Mike Wiederhold [ 11/Mar/13 ] |
|
(gdb) bt #0 add_conn_to_pending_io_list (cookie=0x164c7340, status=ENGINE_SUCCESS) at daemon/thread.c:722 #1 notify_io_complete (cookie=0x164c7340, status=ENGINE_SUCCESS) at daemon/thread.c:488 #2 0x00002aaaaaf4a4fd in notifyIOComplete (this=<value optimized out>, tc=0x16d13400) at src/ep_engine.h:439 #3 TapConnMap::notifyPausedConnection_UNLOCKED (this=<value optimized out>, tc=0x16d13400) at src/tapconnmap.cc:347 #4 0x00002aaaaaee4901 in performTapOp<void*> (this=0x173d3f80, d=<value optimized out>, t=<value optimized out>) at src/tapconnmap.hh:119 #5 BackfillDiskLoad::callback (this=0x173d3f80, d=<value optimized out>, t=<value optimized out>) at src/backfill.cc:78 #6 0x00002aaaaaef473a in Dispatcher::run (this=0x16583880) at src/dispatcher.cc:173 #7 0x00002aaaaaef503b in launch_dispatcher_thread (arg=0x16583880) at src/dispatcher.cc:28 #8 0x00002b9d8d23a77d in start_thread () from /lib64/libpthread.so.0 #9 0x00002b9d8d52225d in clone () from /lib64/libc.so.6 (gdb) frame 3 #3 TapConnMap::notifyPausedConnection_UNLOCKED (this=<value optimized out>, tc=0x16d13400) at src/tapconnmap.cc:347 347 src/tapconnmap.cc: No such file or directory. in src/tapconnmap.cc (gdb) p *((TapConnection*)tc) $1 = {_vptr.TapConnection = 0x2aaaab1c4590, static tapCounter = {value = 408}, engine = @0x16542000, cookie = 0x164c6840, name = "eq_tapq:replication_ns_1@10.3.3.94", created = 460, connToken = 8109446884534031, expiryTime = 4294967295, connected = true, disconnect = false, numDisconnects = {value = 14}, supportAck = true, supportCheckpointSync = true, reserved = {value = true}, stats = @0x165422e0, logString = "TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 -"} (gdb) frame 2 #2 0x00002aaaaaf4a4fd in notifyIOComplete (this=<value optimized out>, tc=0x16d13400) at src/ep_engine.h:439 439 src/ep_engine.h: No such file or directory. 
in src/ep_engine.h (gdb) info args status = ENGINE_SUCCESS cookie = 0x164c7340 this = 0x16542000

Notice how in frame 3, when I inspect the cookie on the TAP connection, it has address 0x164c6840, but in the next function (frame 2), the cookie passed to it has address 0x164c7340. All that happens to get this cookie from the TAP connection is a call to tc->getCookie(), which simply returns the TAP connection's cookie. It is also not possible to change the cookie between these calls, since the only place you can change the cookie requires the same lock as the one we are currently holding. Below is the code for the 2 functions I mentioned.

void TapConnMap::notifyPausedConnection_UNLOCKED(TapProducer *tc) {
    if (tc && tc->paused) {
        engine.notifyIOComplete(tc->getCookie(), ENGINE_SUCCESS);
        tc->notifySent.set(true);
    }
}

void notifyIOComplete(const void *cookie, ENGINE_ERROR_CODE status) {
    if (cookie == NULL) {
        LOG(EXTENSION_LOG_WARNING, "Tried to signal a NULL cookie!");
    } else {
        BlockTimer bt(&stats.notifyIOHisto);
        EventuallyPersistentEngine *epe = ObjectRegistry::onSwitchThread(NULL, true);
        serverApi->cookie->notify_io_complete(cookie, status);
        ObjectRegistry::onSwitchThread(epe);
    }
}
|
| Comment by Xiaoqin Ma [ 14/Mar/13 ] |
|
When we call notifyPausedConnection_UNLOCKED(tp), we don't hold any lock. I am not sure whether we do this on purpose or whether it is a potential bug:

bool performTapOp(const std::string &name, TapOperation<V> &tapop, V arg) {
    bool ret(true);
    LockHolder lh(notifySync);
    TapConnection *tc = findByName_UNLOCKED(name);
    if (tc) {
        TapProducer *tp = dynamic_cast<TapProducer*>(tc);
        assert(tp != NULL);
        tapop.perform(tp, arg);
        lh.unlock();
        notifyPausedConnection_UNLOCKED(tp);
    } else {
        ret = false;
    }
    return ret;
}
|
| Comment by Mike Wiederhold [ 14/Mar/13 ] |
|
Xiaoqin, we grab the notifySync lock in the performTapOp function, so we don't need to grab the lock in notifyPausedConnection_UNLOCKED. In ep_engine, a function name ending in _UNLOCKED means that the caller must grab the lock before calling that function. |
| Comment by Xiaoqin Ma [ 14/Mar/13 ] |
| See the code: right before calling notifyPausedConnection_UNLOCKED, we release the lock. |
| Comment by Mike Wiederhold [ 14/Mar/13 ] |
| I missed that. Thanks for pointing that out and that might fix the problem. I will upload a code review, but I would also like to get Chiyoung's opinion on this since he introduced the code that released the lock there. |
| Comment by Maria McDuff [ 25/Mar/13 ] |
|
bug scrub: chiyoung -- have you taken a look at this? pls update. thanks. |
| Comment by Jin Lim [ 25/Mar/13 ] |
| EP Engine team (MV) has been working on this with possible fixes. Please assign this to Mike so he can reassign to the right engineer. Thanks. |
| Comment by Mike Wiederhold [ 25/Mar/13 ] |
| Chiyoung is the right engineer to look at this. I would like to get his perspective. I will also take another look sometime this week. |
| Comment by Mike Wiederhold [ 09/Apr/13 ] |
|
Abhinav,
Please verify that you no longer see this crash in the toy build below. It is a 2.0.2 build with an ep-engine patch. http://builds.hq.northscale.net/latestbuilds/couchbase-server-community_toy-mikewied-x86_64_2.0.0-17-toy-mikewied.rpm |
| Comment by Maria McDuff [ 16/Apr/13 ] |
| abhinav will repro in toy build by tomorrow (end of this week at the latest). looking for hw resource. |
| Comment by Mike Wiederhold [ 16/Apr/13 ] |
| I need to make a new toy build and will assign this back to Abhinav when I am done. |
| Comment by Mike Wiederhold [ 16/Apr/13 ] |
| http://builds.hq.northscale.net/latestbuilds/couchbase-server-community_toy-mikewied-x86_64_2.0.0-19-toy-mikewied.rpm |
| Comment by Maria McDuff [ 22/Apr/13 ] |
| abhinav, pls update with your test result using the toy build from mike. thanks. |
| Comment by Chisheng Hong [ 22/Apr/13 ] |
|
With Mike's toy build 2.0.0 mikewied edition (build-19), I set up a 4-node cluster (10.6.2.66) as the XDCR source and a 3-node cluster (10.6.2.43) as the XDCR destination. On both clusters, I created 2 buckets (12.7G memory quota for the default bucket and 7G memory quota for saslbucket) and 1 design doc with 2 views for each bucket.
On the source cluster, I loaded around 40M items (512 bytes each) to push both buckets into DGM state: 70% resident ratio for default and 40% for saslbucket. During the initial load, I created unidirectional XDCR from source to destination.
After initial indexing and XDCR replication of the initial load finished on both clusters, I started the data access phase on Saturday morning, April 20. During this phase, the workload ratio was set:5, update:5, get:80, delete:5, expire:5, at 8K ops per second on the 4-node source cluster, with 120 view queries per second. No workload was running on the destination.
After the workload had been running for around 12 hours, core.memcached.xxxx files began to be generated. After the workload ran for 2 days, all the nodes on the source are now in pending state and the source cluster is unresponsive. |
| Comment by Chisheng Hong [ 22/Apr/13 ] |
| Hi Mike if you want me or Abhinav to do some other test, assign this ticket back. |
| Comment by Maria McDuff [ 29/Apr/13 ] |
| per mike this morning's scrub, he will be fixing this for 2.0.2. |
| Comment by Wayne Siu [ 03/May/13 ] |
|
@Mike, also see comments from ChiYoung in tickets |
| Comment by Maria McDuff [ 07/May/13 ] |
|
we got a toy build from mike today: http://builds.hq.northscale.net/latestbuilds/couchbase-server-community_toy-mikewied-x86_64_2.0.0-19-toy-mikewied.rpm chisheng/abhinav will re-run this test as soon as hw is free'd up. |
| Comment by Maria McDuff [ 08/May/13 ] |
|
mike to drop a new toy build to chisheng with mrw code in it. he'll also try to loan 4 vms to QE for this test run... thanks, mike. |
| Comment by Maria McDuff [ 09/May/13 ] |
| mike delivered a toy build an hour ago. chisheng is running the test. he will update this bug with his test result. |
| Comment by Maria McDuff [ 14/May/13 ] |
| per mike, he has to give chisheng another toy build. |
| Comment by Maria McDuff [ 16/May/13 ] |
| per mike, wait until build is stable. he will issue another toybuild today --- QE has to let him know as soon as the build is stable. Maria will update this bug in the next few hours. |
| Comment by Wayne Siu [ 17/May/13 ] |
| Mike to give a build to QE by Monday 05.20. |
[MB-7153] /opt/couchbase/bin/tools/unsupported ( and the equivalent on mac ) is empty folder Created: 11/Nov/12 Updated: 17/May/13 |
|
| Status: | Open |
| Project: | Couchbase Server |
| Component/s: | installer, tools |
| Affects Version/s: | 2.0, 2.0.1, 2.0.2 |
| Fix Version/s: | 2.0.3 |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Don Pinto | Assignee: | Bin Cui |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | MacOSX 64-bit |
| Description |
|
reported in bug-bash-athon
The tools/unsupported folder is now empty. This folder contained unsupported and experimental scripts, all of which were removed in 2.0, so maybe it makes sense to delete this folder and add it back later when needed? |
| Comments |
| Comment by Maria McDuff [ 01/Apr/13 ] |
| per bug scrub: Requesting Phil to check with Bin to locate this empty folder for deletion. |
| Comment by Phil Labee [ 01/May/13 ] |
| Empty directories are created in installation. These directories are no longer needed. |
| Comment by Maria McDuff [ 16/May/13 ] |
| per bug triage, Bin -- can you pls try to fix by the end of this week? Thanks. |
| Comment by Bin Cui [ 17/May/13 ] |
|
There are many places that reference the unsupported directory. I don't think it is a good idea to remove it on the eve of the release. |
| Comment by Wayne Siu [ 17/May/13 ] |
| Moving it to 2.0.3. |
[MB-8243] memcached crashed in Configuration::getCouchBucket Created: 10/May/13 Updated: 17/May/13 Resolved: 17/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | couchbase-bucket |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Andrei Baranouski | Assignee: | Andrei Baranouski |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | 2.0.2-795 | ||
| Operating System: | Ubuntu 64-bit |
| Description |
|
I see a crash from the http://qa.hq.northscale.net/job/CouchbaseServer-SanityTest-4Node-Centos64/63/consoleFull run on build 2.0.2-795-rel. It seems to be this test:
./testrunner -i fournode.ini -t viewquerytests.ViewQueryTests.test_employee_dataset_all_queries,limit=1000,docs-per-day=2,wait_persistence=true
However, this job doesn't generate collect info after failures, and the logs I have were obtained much later.
gdb /opt/couchbase/bin/memcached core.memcached.7524 GNU gdb (GDB) CentOS (7.0.1-45.el5.centos) Copyright (C) 2009 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /opt/couchbase/bin/memcached...done. [New Thread 7705] [New Thread 8584] [New Thread 8583] [New Thread 8582] [New Thread 7709] [New Thread 7708] [New Thread 7707] [New Thread 7706] [New Thread 7704] [New Thread 7703] [New Thread 7702] [New Thread 7701] [New Thread 7539] [New Thread 7538] [New Thread 7537] [New Thread 7536] [New Thread 7535] [New Thread 7533] [New Thread 7532] [New Thread 7524] warning: .dynamic section for "/usr/lib64/libstdc++.so.6" is not at the expected address warning: difference appears to be caused by prelink, adjusting expectations warning: .dynamic section for "/lib64/libgcc_s.so.1" is not at the expected address warning: difference appears to be caused by prelink, adjusting expectations Reading symbols from /opt/couchbase/lib/memcached/libmemcached_utilities.so.0...done. Loaded symbols for /opt/couchbase/lib/memcached/libmemcached_utilities.so.0 Reading symbols from /opt/couchbase/lib/libevent-2.0.so.5...done. Loaded symbols for /opt/couchbase/lib/libevent-2.0.so.5 Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done. 
Loaded symbols for /lib64/libdl.so.2 Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done. Loaded symbols for /lib64/libm.so.6 Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done. Loaded symbols for /lib64/librt.so.1 Reading symbols from /opt/couchbase/lib/libtcmalloc_minimal.so.4...done. Loaded symbols for /opt/couchbase/lib/libtcmalloc_minimal.so.4 Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done. [Thread debugging using libthread_db enabled] Loaded symbols for /lib64/libpthread.so.0 Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done. Loaded symbols for /lib64/libc.so.6 Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/ld-linux-x86-64.so.2 Reading symbols from /usr/lib64/libstdc++.so.6...(no debugging symbols found)...done. Loaded symbols for /usr/lib64/libstdc++.so.6 Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done. Loaded symbols for /lib64/libgcc_s.so.1 Reading symbols from /opt/couchbase/lib/memcached/stdin_term_handler.so...done. Loaded symbols for /opt/couchbase/lib/memcached/stdin_term_handler.so Reading symbols from /opt/couchbase/lib/memcached/file_logger.so...done. Loaded symbols for /opt/couchbase/lib/memcached/file_logger.so Reading symbols from /opt/couchbase/lib/memcached/bucket_engine.so...done. Loaded symbols for /opt/couchbase/lib/memcached/bucket_engine.so Reading symbols from /opt/couchbase/lib/memcached/ep.so...done. Loaded symbols for /opt/couchbase/lib/memcached/ep.so Reading symbols from /opt/couchbase/lib/libcouchstore.so.1...done. Loaded symbols for /opt/couchbase/lib/libcouchstore.so.1 Reading symbols from /opt/couchbase/lib/libsnappy.so.1...done. Loaded symbols for /opt/couchbase/lib/libsnappy.so.1 Reading symbols from /opt/couchbase/lib/libicuuc.so.44...done. 
Loaded symbols for /opt/couchbase/lib/libicuuc.so.44
Reading symbols from /opt/couchbase/lib/libicudata.so.44...(no debugging symbols found)...done.
Loaded symbols for /opt/couchbase/lib/libicudata.so.44
Reading symbols from /opt/couchbase/lib/libicui18n.so.44...done.
Loaded symbols for /opt/couchbase/lib/libicui18n.so.44
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff9dbe8000
Core was generated by `/opt/couchbase/bin/memcached -X /opt/couchbase/lib/memcached/stdin_term_handler'.
Program terminated with signal 6, Aborted.
#0 0x0000003866c30285 in raise () from /lib64/libc.so.6
(gdb) t aa bt
A syntax error in expression, near `bt'.
(gdb) t a a bt

Thread 20 (Thread 0x2af656d41220 (LWP 7524)):
#0 0x0000003866cd3648 in epoll_wait () from /lib64/libc.so.6
#1 0x00002af6568c7576 in epoll_dispatch (base=0x222b6000, tv=<value optimized out>) at epoll.c:404
#2 0x00002af6568b2e44 in event_base_loop (base=0x222b6000, flags=<value optimized out>) at event.c:1558
#3 0x00000000004097d6 in main (argc=<value optimized out>, argv=<value optimized out>) at daemon/memcached.c:7926

Thread 19 (Thread 7532):
#0 0x0000003866cc545b in read () from /lib64/libc.so.6
#1 0x0000003866c6b677 in _IO_new_file_underflow () from /lib64/libc.so.6
#2 0x0000003866c6c03e in _IO_default_uflow_internal () from /lib64/libc.so.6
#3 0x0000003866c61124 in _IO_getline_info_internal () from /lib64/libc.so.6
#4 0x0000003866c5ffc9 in fgets () from /lib64/libc.so.6
#5 0x00002af656d42939 in check_stdin_thread (arg=<value optimized out>) at extensions/daemon/stdin_check.c:37
#6 0x000000386740677d in start_thread () from /lib64/libpthread.so.0
#7 0x0000003866cd325d in clone () from /lib64/libc.so.6

Thread 18 (Thread 7533):
#0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aaaaaaae4d6 in logger_thead_main (arg=0x1da4e040) at extensions/loggers/file_logger.c:368
#2 0x000000386740677d in start_thread () from /lib64/libpthread.so.0
#3 0x0000003866cd325d in clone () from /lib64/libc.so.6

Thread 17 (Thread 7535):
#0 0x00002af656b17f6f in tcmalloc::CentralFreeList::Populate() () from /opt/couchbase/lib/libtcmalloc_minimal.so.4
#1 0x00002af656b17c93 in tcmalloc::CentralFreeList::FetchFromSpansSafe() () from /opt/couchbase/lib/libtcmalloc_minimal.so.4
#2 0x00002af656b17bcc in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4
#3 0x00002af656b1ccd5 in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4
#4 0x00002af656b12054 in tcmalloc::ThreadCache::Allocate(unsigned long, unsigned long) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4
#5 0x00002af656b10b60 in (anonymous namespace)::do_malloc(unsigned long) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4
#6 0x00002af656b10892 in (anonymous namespace)::do_malloc_or_cpp_alloc(unsigned long) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4
#7 0x00002af656b10bec in (anonymous namespace)::do_calloc(unsigned long, unsigned long) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4
#8 0x00002af656b24764 in tc_calloc () from /opt/couchbase/lib/libtcmalloc_minimal.so.4
#9 0x00002aaaaaef94b9 in HashTable (this=<value optimized out>, i=918, newState=vbucket_state_replica, st=..., checkpointConfig=..., checkpointId=1, initState=vbucket_state_dead) at src/stored-value.hh:814
#10 VBucket::VBucket (this=<value optimized out>, i=918, newState=vbucket_state_replica, st=..., checkpointConfig=..., checkpointId=1, initState=vbucket_state_dead) at src/vbucket.hh:126
#11 0x00002aaaaaefb932 in EventuallyPersistentStore::setVBucketState (this=0x2a424400, vbid=918, to=vbucket_state_replica) at src/ep.cc:795
#12 0x00002aaaaaf1e11d in setVBucketState (h=0x2230e900, cookie=0x222858c0, request=0x22276800, response=0x4075e0 <binary_response_handler>) at src/ep_engine.h:508
#13 setVBucket (h=0x2230e900, cookie=0x222858c0, request=0x22276800, response=0x4075e0 <binary_response_handler>) at src/ep_engine.cc:738
#14 processUnknownCommand (h=0x2230e900, cookie=0x222858c0, request=0x22276800, response=0x4075e0 <binary_response_handler>) at src/ep_engine.cc:884
#15 0x00002aaaaaf1e99c in EvpUnknownCommand (handle=<value optimized out>, cookie=0x222858c0, request=0x22276800, response=0x4075e0 <binary_response_handler>) at src/ep_engine.cc:1021
#16 0x00002aaaaacc4dd4 in bucket_unknown_command (handle=<value optimized out>, cookie=0x222858c0, request=0x22276800, response=0x4075e0 <binary_response_handler>) at bucket_engine.c:2475
#17 0x0000000000411b4e in process_bin_unknown_packet (c=0x222858c0) at daemon/memcached.c:2882
#18 process_bin_packet (c=0x222858c0) at daemon/memcached.c:3170
#19 complete_nread_binary (c=0x222858c0) at daemon/memcached.c:3744
#20 complete_nread (c=0x222858c0) at daemon/memcached.c:3826
#21 conn_nread (c=0x222858c0) at daemon/memcached.c:5679
#22 0x0000000000405ec5 in event_handler (fd=<value optimized out>, which=<value optimized out>, arg=0x222858c0) at daemon/memcached.c:5942
#23 0x00002af6568b2f3c in event_process_active_single_queue (base=0x222b6500, flags=<value optimized out>) at event.c:1308
#24 event_process_active (base=0x222b6500, flags=<value optimized out>) at event.c:1375
#25 event_base_loop (base=0x222b6500, flags=<value optimized out>) at event.c:1572
#26 0x0000000000414604 in worker_libevent (arg=0x1da51900) at daemon/thread.c:301
#27 0x000000386740677d in start_thread () from /lib64/libpthread.so.0
#28 0x0000003866cd325d in clone () from /lib64/libc.so.6

Thread 16 (Thread 7536):
#0 0x0000003866cd3648 in epoll_wait () from /lib64/libc.so.6
#1 0x00002af6568c7576 in epoll_dispatch (base=0x222b6280, tv=<value optimized out>) at epoll.c:404
#2 0x00002af6568b2e44 in event_base_loop (base=0x222b6280, flags=<value optimized out>) at event.c:1558
#3 0x0000000000414604 in worker_libevent (arg=0x1da519f8) at daemon/thread.c:301
#4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0
#5 0x0000003866cd325d in clone () from /lib64/libc.so.6

Thread 15 (Thread 7537):
#0 0x0000003866cd3648 in epoll_wait () from /lib64/libc.so.6
#1 0x00002af6568c7576 in epoll_dispatch (base=0x222b6c80, tv=<value optimized out>) at epoll.c:404
#2 0x00002af6568b2e44 in event_base_loop (base=0x222b6c80, flags=<value optimized out>) at event.c:1558
#3 0x0000000000414604 in worker_libevent (arg=0x1da51af0) at daemon/thread.c:301
#4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0
#5 0x0000003866cd325d in clone () from /lib64/libc.so.6

Thread 14 (Thread 7538):
#0 0x0000003866cd3648 in epoll_wait () from /lib64/libc.so.6
#1 0x00002af6568c7576 in epoll_dispatch (base=0x222b6a00, tv=<value optimized out>) at epoll.c:404
#2 0x00002af6568b2e44 in event_base_loop (base=0x222b6a00, flags=<value optimized out>) at event.c:1558
#3 0x0000000000414604 in worker_libevent (arg=0x1da51be8) at daemon/thread.c:301
#4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0
#5 0x0000003866cd325d in clone () from /lib64/libc.so.6

Thread 13 (Thread 7539):
#0 0x0000003866cd3648 in epoll_wait () from /lib64/libc.so.6
#1 0x00002af6568c7576 in epoll_dispatch (base=0x222b6780, tv=<value optimized out>) at epoll.c:404
#2 0x00002af6568b2e44 in event_base_loop (base=0x222b6780, flags=<value optimized out>) at event.c:1558
#3 0x0000000000414604 in worker_libevent (arg=0x1da51ce0) at daemon/thread.c:301
#4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0
#5 0x0000003866cd325d in clone () from /lib64/libc.so.6

Thread 12 (Thread 7701):
#0 0x0000003866c99221 in nanosleep () from /lib64/libc.so.6
#1 0x0000003866cccba4 in usleep () from /lib64/libc.so.6
#2 0x00002aaaaaf35125 in updateStatsThread (arg=0x1da4e4c0) at src/memory_tracker.cc:31
#3 0x000000386740677d in start_thread () from /lib64/libpthread.so.0
#4 0x0000003866cd325d in clone () from /lib64/libc.so.6

Thread 11 (Thread 7702):
#0 0x000000386740d594 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x0000003867408e8a in _L_lock_1034 () from /lib64/libpthread.so.0
#2 0x0000003867408d4c in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00002aaaaaf3601a in Mutex::acquire (this=0x223310f0) at src/mutex.cc:79
#4 0x00002aaaaaf7d303 in lock (this=0x22331000, vbs=..., file_version=1, header_offset=4096, cb=...) at ./src/locks.hh:48
#5 LockHolder (this=0x22331000, vbs=..., file_version=1, header_offset=4096, cb=...) at ./src/locks.hh:26
#6 CouchNotifier::notify_update (this=0x22331000, vbs=..., file_version=1, header_offset=4096, cb=...) at src/couch-kvstore/couch-notifier.cc:753
#7 0x00002aaaaaf73183 in CouchKVStore::setVBucketState (this=0x26595800, vbucketId=936, vbstate=..., vb_change_type=1, newfile=96, notify=true) at src/couch-kvstore/couch-kvstore.cc:745
#8 0x00002aaaaaf74089 in CouchKVStore::snapshotVBuckets (this=0x26595800, vbstates=Traceback (most recent call last):
  File "/usr/share/gdb/python/libstdcxx/v6/printers.py", line 288, in children
    nodetype = gdb.lookup_type('std::_Rb_tree_node< std::pair< %s, %s > >' % (keytype, valuetype))
RuntimeError: No type named std::_Rb_tree_node< std::pair< const unsigned short, vbucket_state > >.
std::map with 1 elements) at src/couch-kvstore/couch-kvstore.cc:596
#9 0x00002aaaaaefc2b3 in EventuallyPersistentStore::snapshotVBuckets (this=0x2a424400, priority=..., shardId=<value optimized out>) at src/ep.cc:760
#10 0x00002aaaaaf54bef in VBSnapshotTask::run (this=<value optimized out>) at src/tasks.cc:78
#11 0x00002aaaaaf397a0 in ExecutorThread::run (this=0x22361ba0) at src/scheduler.cc:153
#12 0x00002aaaaaf39ebd in launch_executor_thread (arg=0x22361ba0) at src/scheduler.cc:34
#13 0x000000386740677d in start_thread () from /lib64/libpthread.so.0
#14 0x0000003866cd325d in clone () from /lib64/libc.so.6

Thread 10 (Thread 7703):
#0 0x000000386740d594 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x0000003867408e8a in _L_lock_1034 () from /lib64/libpthread.so.0
#2 0x0000003867408d4c in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00002aaaaaf3601a in Mutex::acquire (this=0x223310f0) at src/mutex.cc:79
#4 0x00002aaaaaf7d303 in lock (this=0x22331000, vbs=..., file_version=1, header_offset=4096, cb=...) at ./src/locks.hh:48
#5 LockHolder (this=0x22331000, vbs=..., file_version=1, header_offset=4096, cb=...) at ./src/locks.hh:26
#6 CouchNotifier::notify_update (this=0x22331000, vbs=..., file_version=1, header_offset=4096, cb=...) at src/couch-kvstore/couch-notifier.cc:753
#7 0x00002aaaaaf73183 in CouchKVStore::setVBucketState (this=0x26595200, vbucketId=937, vbstate=..., vb_change_type=1, newfile=96, notify=true) at src/couch-kvstore/couch-kvstore.cc:745
#8 0x00002aaaaaf74089 in CouchKVStore::snapshotVBuckets (this=0x26595200, vbstates=Traceback (most recent call last):
  File "/usr/share/gdb/python/libstdcxx/v6/printers.py", line 288, in children
    nodetype = gdb.lookup_type('std::_Rb_tree_node< std::pair< %s, %s > >' % (keytype, valuetype))
RuntimeError: No type named std::_Rb_tree_node< std::pair< const unsigned short, vbucket_state > >.
std::map with 1 elements) at src/couch-kvstore/couch-kvstore.cc:596
#9 0x00002aaaaaefc2b3 in EventuallyPersistentStore::snapshotVBuckets (this=0x2a424400, priority=..., shardId=<value optimized out>) at src/ep.cc:760
#10 0x00002aaaaaf54bef in VBSnapshotTask::run (this=<value optimized out>) at src/tasks.cc:78
#11 0x00002aaaaaf397a0 in ExecutorThread::run (this=0x22361a00) at src/scheduler.cc:153
#12 0x00002aaaaaf39ebd in launch_executor_thread (arg=0x22361a00) at src/scheduler.cc:34
#13 0x000000386740677d in start_thread () from /lib64/libpthread.so.0
#14 0x0000003866cd325d in clone () from /lib64/libc.so.6

Thread 9 (Thread 7704):
#0 0x000000386740d594 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x0000003867408e8a in _L_lock_1034 () from /lib64/libpthread.so.0
#2 0x0000003867408d4c in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00002aaaaaf3601a in Mutex::acquire (this=0x223310f0) at src/mutex.cc:79
#4 0x00002aaaaaf7d303 in lock (this=0x22331000, vbs=..., file_version=1, header_offset=4096, cb=...) at ./src/locks.hh:48
#5 LockHolder (this=0x22331000, vbs=..., file_version=1, header_offset=4096, cb=...) at ./src/locks.hh:26
#6 CouchNotifier::notify_update (this=0x22331000, vbs=..., file_version=1, header_offset=4096, cb=...) at src/couch-kvstore/couch-notifier.cc:753
#7 0x00002aaaaaf73183 in CouchKVStore::setVBucketState (this=0x26594c00, vbucketId=934, vbstate=..., vb_change_type=1, newfile=96, notify=true) at src/couch-kvstore/couch-kvstore.cc:745
#8 0x00002aaaaaf74089 in CouchKVStore::snapshotVBuckets (this=0x26594c00, vbstates=Traceback (most recent call last):
  File "/usr/share/gdb/python/libstdcxx/v6/printers.py", line 288, in children
    nodetype = gdb.lookup_type('std::_Rb_tree_node< std::pair< %s, %s > >' % (keytype, valuetype))
RuntimeError: No type named std::_Rb_tree_node< std::pair< const unsigned short, vbucket_state > >.
std::map with 1 elements) at src/couch-kvstore/couch-kvstore.cc:596
#9 0x00002aaaaaefc2b3 in EventuallyPersistentStore::snapshotVBuckets (this=0x2a424400, priority=..., shardId=<value optimized out>) at src/ep.cc:760
#10 0x00002aaaaaf54bef in VBSnapshotTask::run (this=<value optimized out>) at src/tasks.cc:78
#11 0x00002aaaaaf397a0 in ExecutorThread::run (this=0x22361860) at src/scheduler.cc:153
#12 0x00002aaaaaf39ebd in launch_executor_thread (arg=0x22361860) at src/scheduler.cc:34
#13 0x000000386740677d in start_thread () from /lib64/libpthread.so.0
#14 0x0000003866cd325d in clone () from /lib64/libc.so.6

Thread 8 (Thread 7706):
#0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aaaaaf39631 in wait (this=0x22388d00) at src/syncobject.hh:57
#2 ExecutorThread::run (this=0x22388d00) at src/scheduler.cc:139
#3 0x00002aaaaaf39ebd in launch_executor_thread (arg=0x22388d00) at src/scheduler.cc:34
#4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0
#5 0x0000003866cd325d in clone () from /lib64/libc.so.6

Thread 7 (Thread 7707):
#0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aaaaaf39631 in wait (this=0x22388b60) at src/syncobject.hh:57
#2 ExecutorThread::run (this=0x22388b60) at src/scheduler.cc:139
#3 0x00002aaaaaf39ebd in launch_executor_thread (arg=0x22388b60) at src/scheduler.cc:34
#4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0
#5 0x0000003866cd325d in clone () from /lib64/libc.so.6

Thread 6 (Thread 7708):
#0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aaaaaf39631 in wait (this=0x223889c0) at src/syncobject.hh:57
#2 ExecutorThread::run (this=0x223889c0) at src/scheduler.cc:139
#3 0x00002aaaaaf39ebd in launch_executor_thread (arg=0x223889c0) at src/scheduler.cc:34
#4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0
#5 0x0000003866cd325d in clone () from /lib64/libc.so.6

Thread 5 (Thread 7709):
#0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aaaaaf39631 in wait (this=0x22388820) at src/syncobject.hh:57
#2 ExecutorThread::run (this=0x22388820) at src/scheduler.cc:139
#3 0x00002aaaaaf39ebd in launch_executor_thread (arg=0x22388820) at src/scheduler.cc:34
#4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0
#5 0x0000003866cd325d in clone () from /lib64/libc.so.6

Thread 4 (Thread 8582):
#0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aaaaaf0fe0f in wait (this=0x2230e900) at src/syncobject.hh:57
#2 wait (this=0x2230e900) at src/syncobject.hh:73
#3 wait (this=0x2230e900) at src/tapconnmap.hh:169
#4 EventuallyPersistentEngine::notifyPendingConnections (this=0x2230e900) at src/ep_engine.cc:3379
#5 0x00002aaaaaf0fef3 in EvpNotifyPendingConns (arg=0x2230e900) at src/ep_engine.cc:1153
#6 0x000000386740677d in start_thread () from /lib64/libpthread.so.0
#7 0x0000003866cd325d in clone () from /lib64/libc.so.6

Thread 3 (Thread 8583):
#0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aaaaaef3688 in wait (this=0x222aa1b0, d=...) at src/syncobject.hh:57
#2 IdleTask::run (this=0x222aa1b0, d=...) at src/dispatcher.cc:342
#3 0x00002aaaaaef61ea in Dispatcher::run (this=0x22313c00) at src/dispatcher.cc:184
#4 0x00002aaaaaef69ad in launch_dispatcher_thread (arg=<value optimized out>) at src/dispatcher.cc:28
#5 0x000000386740677d in start_thread () from /lib64/libpthread.so.0
#6 0x0000003866cd325d in clone () from /lib64/libc.so.6

Thread 2 (Thread 8584):
#0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aaaaaef3688 in wait (this=0x222ab0e0, d=...) at src/syncobject.hh:57
#2 IdleTask::run (this=0x222ab0e0, d=...) at src/dispatcher.cc:342
#3 0x00002aaaaaef61ea in Dispatcher::run (this=0x22313500) at src/dispatcher.cc:184
#4 0x00002aaaaaef69ad in launch_dispatcher_thread (arg=<value optimized out>) at src/dispatcher.cc:28
#5 0x000000386740677d in start_thread () from /lib64/libpthread.so.0
#6 0x0000003866cd325d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x49688940 (LWP 7705)):
#0 0x0000003866c30285 in raise () from /lib64/libc.so.6
#1 0x0000003866c31d30 in abort () from /lib64/libc.so.6
#2 0x00002aaaaaf36070 in Mutex::acquire (this=0x2230e828) at src/mutex.cc:83
#3 0x00002aaaaaf83fe8 in lock (this=0x2230e828, key="couch_bucket") at src/locks.hh:48
#4 LockHolder (this=0x2230e828, key="couch_bucket") at src/locks.hh:26
#5 Configuration::getString (this=0x2230e828, key="couch_bucket") at src/configuration.cc:38
#6 0x00002aaaaaf8e5eb in Configuration::getCouchBucket (this=0x2230e828) at src/generated_configuration.cc:71
#7 0x00002aaaaaf7c59e in CouchNotifier::selectBucket (this=0x22331000) at src/couch-kvstore/couch-notifier.cc:721
#8 0x00002aaaaaf7cc0f in CouchNotifier::processInput (this=0x22331000) at src/couch-kvstore/couch-notifier.cc:606
#9 0x00002aaaaaf7c199 in maybeProcessInput (this=0x22331000, rh=0x22385540) at src/couch-kvstore/couch-notifier.cc:546
#10 CouchNotifier::sendCommand (this=0x22331000, rh=0x22385540) at src/couch-kvstore/couch-notifier.cc:439
#11 0x00002aaaaaf7d478 in CouchNotifier::notify_update (this=0x22331000, vbs=..., file_version=1, header_offset=4096, cb=...) at src/couch-kvstore/couch-notifier.cc:774
#12 0x00002aaaaaf73183 in CouchKVStore::setVBucketState (this=0x26594600, vbucketId=935, vbstate=..., vb_change_type=1, newfile=96, notify=true) at src/couch-kvstore/couch-kvstore.cc:745
#13 0x00002aaaaaf74089 in CouchKVStore::snapshotVBuckets (this=0x26594600, vbstates=Traceback (most recent call last):
  File "/usr/share/gdb/python/libstdcxx/v6/printers.py", line 288, in children
    nodetype = gdb.lookup_type('std::_Rb_tree_node< std::pair< %s, %s > >' % (keytype, valuetype))
RuntimeError: No type named std::_Rb_tree_node< std::pair< const unsigned short, vbucket_state > >.
std::map with 1 elements) at src/couch-kvstore/couch-kvstore.cc:596
#14 0x00002aaaaaefc2b3 in EventuallyPersistentStore::snapshotVBuckets (this=0x2a424400, priority=..., shardId=<value optimized out>) at src/ep.cc:760
#15 0x00002aaaaaf54bef in VBSnapshotTask::run (this=<value optimized out>) at src/tasks.cc:78
#16 0x00002aaaaaf397a0 in ExecutorThread::run (this=0x22388ea0) at src/scheduler.cc:153
#17 0x00002aaaaaf39ebd in launch_executor_thread (arg=0x22388ea0) at src/scheduler.cc:34
#18 0x000000386740677d in start_thread () from /lib64/libpthread.so.0
#19 0x0000003866cd325d in clone () from /lib64/libc.so.6
core file: root(couchbase)@10.3.3.30:/tmp/core.memcached.7524 |
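In the dump above, Threads 9, 10 and 11 each have a Mutex::acquire (this=0x223310f0) frame under CouchNotifier::notify_update (couch-notifier.cc:753), while Thread 1 is at couch-notifier.cc:774 in the same function and aborts in Mutex::acquire (this=0x2230e828), called from Configuration::getString. With 20 threads per dump and dozens of core files, grouping threads by the mutex address in their Mutex::acquire frames can make this pattern quicker to spot. The sketch below is a hypothetical helper (threads_by_mutex is an illustrative name, not part of Couchbase or gdb). It assumes the frames are formatted exactly like the "t a a bt" output in this report, and it only reports which threads contain such a frame, not whether a thread is blocked on or holds the lock.

```python
import re
from collections import defaultdict

# A gdb thread section starts with e.g. "Thread 9 (Thread 7704):" or
# "Thread 1 (Thread 0x49688940 (LWP 7705)):".
THREAD_HEADER = re.compile(r"\bThread (\d+) \(")
# A frame inside ep-engine's Mutex::acquire, capturing the mutex address.
ACQUIRE_FRAME = re.compile(r"Mutex::acquire \(this=(0x[0-9a-f]+)\)")


def threads_by_mutex(backtrace: str) -> dict[str, list[int]]:
    """Map each mutex address found in a Mutex::acquire frame to the gdb
    thread numbers whose stacks contain that frame, in dump order.

    Splits on thread headers rather than on newlines, so it also works on
    backtraces that were flattened onto a single line."""
    result: dict[str, list[int]] = defaultdict(list)
    headers = list(THREAD_HEADER.finditer(backtrace))
    for i, header in enumerate(headers):
        # The section for this thread runs up to the next thread header.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(backtrace)
        for addr in ACQUIRE_FRAME.findall(backtrace[header.end():end]):
            result[addr].append(int(header.group(1)))
    return dict(result)


if __name__ == "__main__":
    # Abridged excerpt of the core.memcached.7524 dump above.
    excerpt = (
        "Thread 11 (Thread 7702): #3 0x00002aaaaaf3601a in Mutex::acquire (this=0x223310f0) at src/mutex.cc:79 "
        "Thread 10 (Thread 7703): #3 0x00002aaaaaf3601a in Mutex::acquire (this=0x223310f0) at src/mutex.cc:79 "
        "Thread 9 (Thread 7704): #3 0x00002aaaaaf3601a in Mutex::acquire (this=0x223310f0) at src/mutex.cc:79 "
        "Thread 1 (Thread 0x49688940 (LWP 7705)): #2 0x00002aaaaaf36070 in Mutex::acquire (this=0x2230e828) at src/mutex.cc:83"
    )
    for addr, threads in threads_by_mutex(excerpt).items():
        print(addr, threads)
    # 0x223310f0 [11, 10, 9]
    # 0x2230e828 [1]
```

Splitting on thread headers rather than newlines was chosen because the dumps in this report were flattened when pasted; the same function works unchanged on a freshly saved gdb log.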
| Comments |
| Comment by Andrei Baranouski [ 10/May/13 ] |
|
https://s3.amazonaws.com/bugdb/jira/MB-8243/10.3.3.224-5102013-1219-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-8243/10.3.3.30-5102013-1218-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-8243/10.3.3.32-5102013-1214-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-8243/10.3.3.33-5102013-1216-diag.zip |
| Comment by Maria McDuff [ 10/May/13 ] |
| Bumping up to Blocker. |
| Comment by Jin Lim [ 10/May/13 ] |
|
* A fix has been uploaded for code review: http://review.couchbase.org/#/c/26253/
* In the meantime, please try the toy build 2.0.0-MRW33-toy at http://builds.hq.northscale.net:8010/builders/ec2-centos-x64_toy-couchstore-builder/builds/166; it contains the same fix, for validation.
* Reassign this back to ep-engine (Jin) if the same symptom persists. Thanks. |
| Comment by Andrei Baranouski [ 11/May/13 ] |
|
Numerous occurrences of the same memcached crash in http://qa.hq.northscale.net/job/ubuntu-32-2.0-swaprebalance-test-P0/74/
Core files are stored on the servers:
10.3.2.153
-rw------- 1 couchbase couchbase 290811904 2013-05-10 15:19 core.memcached.993
10.3.2.155
-rw------- 1 couchbase couchbase 289701888 2013-05-10 14:50 core.memcached.11617
-rw------- 1 couchbase couchbase 289763328 2013-05-10 05:43 core.memcached.1511
-rw------- 1 couchbase couchbase 335925248 2013-05-10 06:33 core.memcached.1764
10.3.2.158
-rw------- 1 couchbase couchbase 445218816 2013-05-10 13:28 core.memcached.14070
-rw------- 1 couchbase couchbase 398946304 2013-05-10 14:50 core.memcached.15749
-rw------- 1 couchbase couchbase 280260608 2013-05-10 15:40 core.memcached.20753
-rw------- 1 couchbase couchbase 322297856 2013-05-10 06:33 core.memcached.2353
-rw------- 1 couchbase couchbase 441024512 2013-05-10 04:16 core.memcached.28046
-rw------- 1 couchbase couchbase 398950400 2013-05-10 05:43 core.memcached.29729
10.3.2.154
-rw------- 1 couchbase couchbase 225599488 2013-05-10 10:00 core.memcached.15916
-rw------- 1 couchbase couchbase 584019968 2013-05-09 02:05 core.memcached.22381
10.3.2.156
-rw------- 1 couchbase couchbase 506163200 2013-05-10 07:51 core.memcached.20281
-rw------- 1 couchbase couchbase 423190528 2013-04-26 11:28 core.memcached.20425
-rw------- 1 couchbase couchbase 279216128 2013-05-10 15:47 core.memcached.9400
-rw------- 1 couchbase couchbase 439971840 2013-05-10 04:16 core.memcached.9825
10.3.2.157
-rw------- 1 couchbase couchbase 401043456 2013-05-10 14:49 core.memcached.13490
-rw------- 1 couchbase couchbase 490438656 2013-05-10 15:39 core.memcached.18141
-rw------- 1 couchbase couchbase 398950400 2013-05-10 05:42 core.memcached.21944
-rw------- 1 couchbase couchbase 447315968 2013-05-10 06:54 core.memcached.26362
-rw------- 1 couchbase couchbase 299196416 2013-04-27 03:26 core.memcached.7099 |
| Comment by Andrei Baranouski [ 11/May/13 ] |
|
Saw it on ubuntu32: http://qa.hq.northscale.net/job/ubuntu-32-2.0-swaprebalance-test-P0/74/
Launched the toy build with the same suite, but on centos64: http://qa.hq.northscale.net/job/centos-64-2.0-basic-rebalance-tests-P0/443/console |
| Comment by Andrei Baranouski [ 11/May/13 ] |
|
Rebalance hangs as in
No crashes for now.
Test to reproduce:
./testrunner -i /tmp/rebalance-tests.ini get-cbcollect-info=True,GROUP=P0 -t swaprebalance.SwapRebalanceFailedTests.test_add_back_failed_node,replica=1,num-buckets=1,num-swap=3,GROUP=P0
INI file for the centos-64-2.0-basic-rebalance-tests-P0 job:
[global]
port:8091
[servers]
1:vm1
2:vm2
3:vm3
4:vm4
5:vm5
6:vm6
#7:vm7
[vm1]
ip:10.5.2.13
username:jenkins
ssh_key:/home/couchbase/QAkey.pem
[vm2]
ip:10.5.2.14
username:jenkins
ssh_key:/home/couchbase/QAkey.pem
[vm3]
ip:10.5.2.15
username:jenkins
ssh_key:/home/couchbase/QAkey.pem
[vm4]
ip:10.3.121.63
username:root
password:couchbase
[vm5]
ip:10.3.121.64
username:root
password:couchbase
[vm6]
ip:10.3.121.66
username:root
password:couchbase
#[vm7]
#ip:10.3.121.69
#username:root
#password:couchbase
[membase]
rest_username:Administrator
rest_password:password |
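The INI file above lists server slots under [servers] (slot:section-name), gives each host its own [vmN] section, and disables entries with a leading # (as with #7:vm7). Before re-running the suite against different hosts, it can help to resolve the slots into per-host settings and check them. The sketch below is an assumption for illustration: it uses Python's standard configparser with ':' as the only delimiter, it is not testrunner's own parser, and load_servers and INI are hypothetical names.

```python
import configparser

# Abridged copy of the INI file above (vm2, vm3, vm5 and vm6 omitted).
INI = """\
[global]
port:8091
[servers]
1:vm1
4:vm4
#7:vm7
[vm1]
ip:10.5.2.13
username:jenkins
ssh_key:/home/couchbase/QAkey.pem
[vm4]
ip:10.3.121.63
username:root
password:couchbase
#[vm7]
#ip:10.3.121.69
[membase]
rest_username:Administrator
rest_password:password
"""


def load_servers(text: str) -> list[dict[str, str]]:
    """Resolve [servers] slot entries into the settings of the sections they name."""
    # ':' is the only key/value delimiter; interpolation is off so values are literal.
    parser = configparser.ConfigParser(delimiters=(":",), interpolation=None)
    parser.read_string(text)
    # Full-line '#' comments such as "#7:vm7" are skipped by configparser.
    return [dict(parser[name]) for _, name in parser.items("servers")]


if __name__ == "__main__":
    for server in load_servers(INI):
        print(server["ip"], server["username"])
```

A slot that names a missing section raises KeyError, which is a cheap way to catch a [servers] entry left pointing at a commented-out host.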
| Comment by Maria McDuff [ 13/May/13 ] |
|
Andrei, with toy build 33, is memcached crashing at all? Or are you seeing the same hang as |
| Comment by Jin Lim [ 13/May/13 ] |
|
Please see the latest update in
Thanks much for your help and time!
Jin |
| Comment by Andrei Baranouski [ 13/May/13 ] |
|
We still don't have a build that includes the fix (http://review.couchbase.org/#/c/26253/). The latest one is http://builds.hq.northscale.net/latestbuilds/couchbase-server-community_x86_2.0.2-799-rel.deb.manifest.xml, with ep-engine at:
<project name="ep-engine" path="ep-engine" revision="e657fe4789a4a8be3ef145d602548278b48ad3de"/> |
| Comment by Maria McDuff [ 13/May/13 ] |
|
Andrei, use build 800; it's ready. Thanks. |
| Comment by Andrei Baranouski [ 15/May/13 ] |
|
801 still has |
| Comment by Jin Lim [ 15/May/13 ] |
|
EINVAL The mutex was created with the protocol attribute having the value PTHREAD_PRIO_PROTECT and the calling thread's priority is higher than the mutex's current priority ceiling. |
| Comment by Jin Lim [ 15/May/13 ] |
| To confirm, please provide the core dump from this getCouchBucket() call. Thanks. |
| Comment by Andrei Baranouski [ 16/May/13 ] |
|
Seeing many crashes on 2.0.2-803: http://qa.hq.northscale.net/job/centos-64-2.0-basic-rebalance-tests-P0/445/consoleFull
python scripts/ssh.py -i centos-64-2.0-basic-rebalance-tests-P0.ini "ls -la /tmp/"

10.3.121.64
total 650768
drwxrwxrwt 5 couchbase couchbase 4096 May 15 16:43 .
drwxr-xr-x 24 root root 4096 Apr 30 15:29 ..
-rw------- 1 couchbase couchbase 316329984 May 15 09:26 core.memcached.12720
-rw------- 1 couchbase couchbase 279449600 May 15 10:19 core.memcached.12989
-rw------- 1 couchbase couchbase 333119488 May 15 10:27 core.memcached.15922
-rw------- 1 couchbase couchbase 310018048 May 15 10:30 core.memcached.16218
-rw------- 1 couchbase couchbase 327880704 May 15 10:54 core.memcached.16435
-rw------- 1 couchbase couchbase 282595328 May 12 06:36 core.memcached.23225
-rw-r--r-- 1 root root 128307838 May 15 00:28 couchbase-server-enterprise_x86_64_2.0.2-803-rel.rpm
drwxrwxrwt 2 root root 4096 Apr 30 15:43 .font-unix
drwxrwxrwt 2 root root 4096 Apr 30 15:29 .ICE-unix
drwxrwxr-x 3 1000 1000 4096 May 9 04:02 measure-sched-delays

10.3.121.69
total 179824
drwxrwxrwt 9 couchbase couchbase 4096 May 16 00:30 .
drwxr-xr-x 25 root root 4096 Apr 30 15:30 ..
drwxr-xr-x 3 1000 1000 4096 Apr 25 20:06 automake-1.11.1
drwxr-xr-x 2 root root 4096 May 15 16:47 backup
-rw-r--r-- 1 jenkins jenkins 55587177 Jul 18 2012 couchbase-server-enterprise_x86_64_1.8.0r-55-g80f24f2.rpm
-rw-r--r-- 1 root root 128307838 May 15 00:28 couchbase-server-enterprise_x86_64_2.0.2-803-rel.rpm
drwxrwxrwt 2 root root 4096 Apr 30 15:45 .font-unix
drwxrwxrwt 2 root root 4096 Apr 30 15:30 .ICE-unix
drwxrwxr-x 2 501 staff 4096 Apr 25 20:06 libtool-2.4.2
drwxrwxr-x 3 1000 1000 4096 May 9 04:02 measure-sched-delays
drwxr-xr-x 3 root root 4096 Apr 25 20:06 s3cmd

10.3.121.66
total 179812
drwxrwxrwt 6 couchbase couchbase 4096 May 16 00:30 .
drwxr-xr-x 24 root root 4096 Apr 30 15:30 ..
drwxr-xr-x 2 root root 4096 May 15 16:46 backup
-rw-r--r-- 1 jenkins jenkins 55587177 Jul 18 2012 couchbase-server-enterprise_x86_64_1.8.0r-55-g80f24f2.rpm
-rw-r--r-- 1 root root 128307838 May 15 00:28 couchbase-server-enterprise_x86_64_2.0.2-803-rel.rpm
drwxrwxrwt 2 root root 4096 Apr 30 15:45 .font-unix
drwxrwxrwt 2 root root 4096 Apr 30 15:30 .ICE-unix
drwxrwxr-x 3 1000 1000 4096 May 9 04:02 measure-sched-delays

10.3.121.63
total 757504
drwxrwxrwt 5 couchbase couchbase 4096 May 15 16:42 .
drwxr-xr-x 24 root root 4096 Apr 30 15:29 ..
-rw------- 1 couchbase couchbase 490033152 May 15 10:19 core.memcached.1355
-rw------- 1 couchbase couchbase 401608704 May 15 10:30 core.memcached.5139
-rw------- 1 couchbase couchbase 303599616 May 15 10:42 core.memcached.6619
-rw------- 1 couchbase couchbase 326811648 May 15 10:54 core.memcached.6701
-rw-r--r-- 1 root root 128307838 May 15 00:28 couchbase-server-enterprise_x86_64_2.0.2-803-rel.rpm
drwxrwxrwt 2 root root 4096 Apr 30 15:44 .font-unix
drwxrwxrwt 2 root root 4096 Apr 30 15:29 .ICE-unix
drwxrwxr-x 3 1000 1000 4096 May 15 00:15 measure-sched-delays

10.5.2.13
total 54400
drwxrwxrwt 6 couchbase couchbase 20480 May 16 00:30 .
drwxr-xr-x 25 root root 4096 Mar 5 18:31 ..
drwxrwxrwt 2 root root 4096 Mar 5 18:31 .ICE-unix
-r--r--r-- 1 root root 11 Mar 5 18:35 .X0-lock
drwxrwxrwt 2 root root 4096 Mar 5 18:35 .X11-unix
drwxrwxrwt 2 root root 4096 Mar 5 18:35 .font-unix
srw-rw-rw- 1 root root 0 Mar 5 18:35 .gdm_socket
-rw-r--r-- 1 jenkins jenkins 55587177 Jul 18 2012 couchbase-server-enterprise_x86_64_1.8.0r-55-g80f24f2.rpm
drwxr-xr-x 3 jenkins jenkins 4096 May 9 04:01 measure-sched-delays

10.5.2.15
total 1776236
drwxrwxrwt 7 membase membase 24576 May 16 00:30 .
drwxr-xr-x 24 root root 4096 Mar 5 18:30 ..
drwxrwxrwt 2 root root 4096 Mar 5 18:30 .ICE-unix
-r--r--r-- 1 root root 11 Mar 5 18:33 .X0-lock
drwxrwxrwt 2 root root 4096 Mar 5 18:33 .X11-unix
drwxrwxrwt 2 root root 4096 Mar 5 18:33 .font-unix
srw-rw-rw- 1 root root 0 Mar 5 18:33 .gdm_socket
drwxr-xr-x 2 root root 4096 May 15 16:46 backup
-rw-r--r-- 1 root root 5098 May 15 16:46 core-10.5.2.15-0.log
-rw-r--r-- 1 root root 5386 May 15 16:46 core-10.5.2.15-1.log
-rw-r--r-- 1 root root 5098 May 15 16:46 core-10.5.2.15-2.log
-rw-r--r-- 1 root root 4810 May 15 16:46 core-10.5.2.15-3.log
-rw-r--r-- 1 root root 4810 May 15 16:46 core-10.5.2.15-4.log
-rw-r--r-- 1 root root 4810 May 15 16:46 core-10.5.2.15-5.log
-rw-r--r-- 1 root root 5098 May 15 16:46 core-10.5.2.15-6.log
-rw-r--r-- 1 root root 4522 May 15 16:46 core-10.5.2.15-7.log
-rw-r--r-- 1 root root 5098 May 15 16:46 core-10.5.2.15-8.log
-rw------- 1 couchbase couchbase 647872512 May 12 06:43 core.memcached.14945
-rw------- 1 couchbase couchbase 336265216 May 15 09:15 core.memcached.16949
-rw------- 1 couchbase couchbase 690114560 May 15 09:33 core.memcached.17634
-rw------- 1 couchbase couchbase 278417408 May 15 09:48 core.memcached.19427
-rw------- 1 couchbase couchbase 278417408 May 15 10:14 core.memcached.20148
-rw------- 1 couchbase couchbase 497389568 May 15 10:27 core.memcached.21502
-rw------- 1 couchbase couchbase 311087104 May 15 10:30 core.memcached.22944
-rw------- 1 couchbase couchbase 319340544 May 15 10:42 core.memcached.23156
-rw------- 1 couchbase couchbase 375361536 May 15 10:54 core.memcached.23256
-rw-r--r-- 1 jenkins jenkins 55587177 Jul 18 2012 couchbase-server-enterprise_x86_64_1.8.0r-55-g80f24f2.rpm
drwxr-xr-x 3 jenkins jenkins 4096 May 9 04:01 measure-sched-delays

10.5.2.14
total 1411284
drwxrwxrwt 7 couchbase couchbase 24576 May 16 00:30 .
drwxr-xr-x 25 root root 4096 Mar 5 18:31 ..
drwxrwxrwt 2 root root 4096 Mar 5 18:31 .ICE-unix
-r--r--r-- 1 root root 11 Mar 5 18:35 .X0-lock
drwxrwxrwt 2 root root 4096 Mar 5 18:35 .X11-unix
drwxrwxrwt 2 root root 4096 Mar 5 18:35 .font-unix
srw-rw-rw- 1 root root 0 Mar 5 18:35 .gdm_socket
drwxr-xr-x 2 root root 4096 May 15 16:46 backup
-rw-r--r-- 1 root root 5386 May 15 16:46 core-10.5.2.14-0.log
-rw-r--r-- 1 root root 5386 May 15 16:46 core-10.5.2.14-1.log
-rw-r--r-- 1 root root 5386 May 15 16:46 core-10.5.2.14-2.log
-rw-r--r-- 1 root root 4810 May 15 16:46 core-10.5.2.14-3.log
-rw-r--r-- 1 root root 5098 May 15 16:46 core-10.5.2.14-4.log
-rw-r--r-- 1 root root 4810 May 15 16:46 core-10.5.2.14-5.log
-rw-r--r-- 1 root root 5386 May 15 16:46 core-10.5.2.14-6.log
-rw-r--r-- 1 root root 5098 May 15 16:46 core-10.5.2.14-7.log
-rw-r--r-- 1 root root 5098 May 15 16:46 core-10.5.2.14-8.log
-rw------- 1 couchbase couchbase 336265216 May 15 09:15 core.memcached.15346
-rw------- 1 couchbase couchbase 689065984 May 15 09:33 core.memcached.16002
-rw------- 1 couchbase couchbase 278413312 May 15 09:48 core.memcached.17262
-rw------- 1 couchbase couchbase 278401024 May 15 10:06 core.memcached.17992
-rw------- 1 couchbase couchbase 278413312 May 15 10:14 core.memcached.18717
-rw------- 1 couchbase couchbase 493174784 May 15 10:27 core.memcached.19403
-rw------- 1 couchbase couchbase 310022144 May 15 10:30 core.memcached.20835
-rw------- 1 couchbase couchbase 318291968 May 15 10:42 core.memcached.21046
-rw------- 1 couchbase couchbase 374312960 May 15 10:54 core.memcached.21154
-rw-r--r-- 1 jenkins jenkins 55587177 Jul 18 2012 couchbase-server-enterprise_x86_64_1.8.0r-55-g80f24f2.rpm
drwxr-xr-x 3 jenkins jenkins 4096 May 9 04:01 measure-sched-delays

gdb /opt/couchbase/bin/memcached core.memcached.12720
GNU gdb (GDB) CentOS (7.0.1-45.el5.centos)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/couchbase/bin/memcached...done.
[New Thread 12915]
[New Thread 12988]
[New Thread 12987]
[New Thread 12986]
[New Thread 12920]
[New Thread 12919]
[New Thread 12918]
[New Thread 12917]
[New Thread 12916]
[New Thread 12914]
[New Thread 12913]
[New Thread 12912]
[New Thread 12735]
[New Thread 12734]
[New Thread 12733]
[New Thread 12732]
[New Thread 12731]
[New Thread 12730]
[New Thread 12729]
[New Thread 12720]
Reading symbols from /opt/couchbase/lib/memcached/libmemcached_utilities.so.0...done.
Loaded symbols for /opt/couchbase/lib/memcached/libmemcached_utilities.so.0
Reading symbols from /opt/couchbase/lib/libevent-2.0.so.5...done.
Loaded symbols for /opt/couchbase/lib/libevent-2.0.so.5
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /opt/couchbase/lib/libtcmalloc_minimal.so.4...done.
Loaded symbols for /opt/couchbase/lib/libtcmalloc_minimal.so.4
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/lib64/libstdc++.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libstdc++.so.6
Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgcc_s.so.1
Reading symbols from /opt/couchbase/lib/memcached/stdin_term_handler.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/stdin_term_handler.so
Reading symbols from /opt/couchbase/lib/memcached/file_logger.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/file_logger.so
Reading symbols from /opt/couchbase/lib/memcached/bucket_engine.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/bucket_engine.so
Reading symbols from /opt/couchbase/lib/memcached/ep.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/ep.so
Reading symbols from /opt/couchbase/lib/libcouchstore.so.1...done.
Loaded symbols for /opt/couchbase/lib/libcouchstore.so.1
Reading symbols from /opt/couchbase/lib/libsnappy.so.1...done.
Loaded symbols for /opt/couchbase/lib/libsnappy.so.1
Reading symbols from /opt/couchbase/lib/libicuuc.so.44...done.
Loaded symbols for /opt/couchbase/lib/libicuuc.so.44
Reading symbols from /opt/couchbase/lib/libicudata.so.44...(no debugging symbols found)...done.
Loaded symbols for /opt/couchbase/lib/libicudata.so.44
Reading symbols from /opt/couchbase/lib/libicui18n.so.44...done.
Loaded symbols for /opt/couchbase/lib/libicui18n.so.44
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff55dfd000
Core was generated by `/opt/couchbase/bin/memcached -X /opt/couchbase/lib/memcached/stdin_term_handler'.
Program terminated with signal 6, Aborted.
#0 0x0000003866c30285 in raise () from /lib64/libc.so.6
(gdb) t a a bt

Thread 20 (Thread 0x2b5633c46220 (LWP 12720)):
#0 0x0000003866cd3648 in epoll_wait () from /lib64/libc.so.6
#1 0x00002b56337ca576 in epoll_dispatch (base=0x6b4a000, tv=<value optimized out>) at epoll.c:404
#2 0x00002b56337b5e44 in event_base_loop (base=0x6b4a000, flags=<value optimized out>) at event.c:1558
#3 0x00000000004097d6 in main (argc=<value optimized out>, argv=<value optimized out>) at daemon/memcached.c:7926

Thread 19 (Thread 12729):
#0 0x0000003866cc545b in read () from /lib64/libc.so.6
#1 0x0000003866c6b677 in _IO_new_file_underflow () from /lib64/libc.so.6
#2 0x0000003866c6c03e in _IO_default_uflow_internal () from /lib64/libc.so.6
#3 0x0000003866c61124 in _IO_getline_info_internal () from /lib64/libc.so.6
#4 0x0000003866c5ffc9 in fgets () from /lib64/libc.so.6
#5 0x00002b5633c47939 in check_stdin_thread (arg=<value optimized out>) at extensions/daemon/stdin_check.c:37
#6 0x000000386740677d in start_thread () from /lib64/libpthread.so.0
#7 0x0000003866cd325d in clone () from /lib64/libc.so.6

Thread 18 (Thread 12730):
#0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aaaaaaae4d6 in logger_thead_main (arg=0x22e6040) at extensions/loggers/file_logger.c:368
#2 0x000000386740677d in start_thread () from /lib64/libpthread.so.0
#3 0x0000003866cd325d in clone () from /lib64/libc.so.6

Thread 17 (Thread 12731):
#0 0x0000003866cd3648 in epoll_wait () from /lib64/libc.so.6
#1 0x00002b56337ca576 in epoll_dispatch (base=0x6b4a500, tv=<value optimized out>) at epoll.c:404
#2 0x00002b56337b5e44 in event_base_loop (base=0x6b4a500, flags=<value optimized out>) at event.c:1558
#3 0x0000000000414604 in worker_libevent (arg=0x22e9900) at daemon/thread.c:301
#4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0
#5 0x0000003866cd325d in clone () from /lib64/libc.so.6

Thread 16 (Thread 12732):
#0 0x0000003866cd3648 in epoll_wait () from
/lib64/libc.so.6 #1 0x00002b56337ca576 in epoll_dispatch (base=0x6b4a280, tv=<value optimized out>) at epoll.c:404 #2 0x00002b56337b5e44 in event_base_loop (base=0x6b4a280, flags=<value optimized out>) at event.c:1558 #3 0x0000000000414604 in worker_libevent (arg=0x22e99f8) at daemon/thread.c:301 #4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #5 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 15 (Thread 12733): #0 0x0000003866cd3648 in epoll_wait () from /lib64/libc.so.6 #1 0x00002b56337ca576 in epoll_dispatch (base=0x6b4ac80, tv=<value optimized out>) at epoll.c:404 #2 0x00002b56337b5e44 in event_base_loop (base=0x6b4ac80, flags=<value optimized out>) at event.c:1558 #3 0x0000000000414604 in worker_libevent (arg=0x22e9af0) at daemon/thread.c:301 #4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #5 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 14 (Thread 12734): #0 0x0000003866cd3648 in epoll_wait () from /lib64/libc.so.6 #1 0x00002b56337ca576 in epoll_dispatch (base=0x6b4aa00, tv=<value optimized out>) at epoll.c:404 #2 0x00002b56337b5e44 in event_base_loop (base=0x6b4aa00, flags=<value optimized out>) at event.c:1558 #3 0x0000000000414604 in worker_libevent (arg=0x22e9be8) at daemon/thread.c:301 #4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #5 0x0000003866cd325d in clone () from /lib64/libc.so.6 ---Type <return> to continue, or q <return> to quit--- Thread 13 (Thread 12735): #0 0x0000003866cd3648 in epoll_wait () from /lib64/libc.so.6 #1 0x00002b56337ca576 in epoll_dispatch (base=0x6b4a780, tv=<value optimized out>) at epoll.c:404 #2 0x00002b56337b5e44 in event_base_loop (base=0x6b4a780, flags=<value optimized out>) at event.c:1558 #3 0x0000000000414604 in worker_libevent (arg=0x22e9ce0) at daemon/thread.c:301 #4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #5 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 12 (Thread 12912): #0 
0x0000003866c99221 in nanosleep () from /lib64/libc.so.6 #1 0x0000003866cccba4 in usleep () from /lib64/libc.so.6 #2 0x00002aaaaaf351a5 in updateStatsThread (arg=0x22e64c0) at src/memory_tracker.cc:31 #3 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #4 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 11 (Thread 12913): #0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaaaf399f1 in wait (this=0x6bf5ba0) at src/syncobject.hh:57 #2 ExecutorThread::run (this=0x6bf5ba0) at src/scheduler.cc:139 #3 0x00002aaaaaf3a27d in launch_executor_thread (arg=0x6bf5ba0) at src/scheduler.cc:34 #4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #5 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 10 (Thread 12914): #0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaaaf399f1 in wait (this=0x6bf5a00) at src/syncobject.hh:57 #2 ExecutorThread::run (this=0x6bf5a00) at src/scheduler.cc:139 #3 0x00002aaaaaf3a27d in launch_executor_thread (arg=0x6bf5a00) at src/scheduler.cc:34 #4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #5 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 9 (Thread 12916): #0 0x0000003866ccc767 in fdatasync () from /lib64/libc.so.6 #1 0x00002aaaab1da67f in couch_sync (handle=<value optimized out>) at src/os.c:117 #2 0x00002aaaaaf7a25f in cfs_sync (h=0x89cac40) at src/couch-kvstore/couch-fs-stats.cc:88 #3 0x00002aaaab1d475f in couchstore_commit (db=0x6b51ce0) at src/couch_db.c:193 #4 0x00002aaaaaf73d46 in CouchKVStore::setVBucketState (this=0x8908600, vbucketId=323, vbstate=..., vb_change_type=1, newfile=96, notify=true) at src/couch-kvstore/couch-kvstore.cc:728 #5 0x00002aaaaaf74b69 in CouchKVStore::snapshotVBuckets (this=0x8908600, vbstates=Traceback (most recent call last): File "/usr/share/gdb/python/libstdcxx/v6/printers.py", line 288, in children nodetype = 
gdb.lookup_type('std::_Rb_tree_node< std::pair< %s, %s > >' % (keytype, valuetype)) RuntimeError: No type named std::_Rb_tree_node< std::pair< const unsigned short, vbucket_state > >. std::map with 1 elements) at src/couch-kvstore/couch-kvstore.cc:596 #6 0x00002aaaaaefc3f3 in EventuallyPersistentStore::snapshotVBuckets (this=0x8d4a800, priority=..., shardId=<value optimized out>) at src/ep.cc:760 #7 0x00002aaaaaf54daf in VBSnapshotTask::run (this=<value optimized out>) at src/tasks.cc:78 #8 0x00002aaaaaf39b61 in ExecutorThread::run (this=0x6c1cea0) at src/scheduler.cc:153 #9 0x00002aaaaaf3a27d in launch_executor_thread (arg=0x6c1cea0) at src/scheduler.cc:34 #10 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #11 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 8 (Thread 12917): #0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaaaf399f1 in wait (this=0x6c1cd00) at src/syncobject.hh:57 #2 ExecutorThread::run (this=0x6c1cd00) at src/scheduler.cc:139 #3 0x00002aaaaaf3a27d in launch_executor_thread (arg=0x6c1cd00) at src/scheduler.cc:34 #4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 ---Type <return> to continue, or q <return> to quit--- #5 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 7 (Thread 12918): #0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaaaf399f1 in wait (this=0x6c1cb60) at src/syncobject.hh:57 #2 ExecutorThread::run (this=0x6c1cb60) at src/scheduler.cc:139 #3 0x00002aaaaaf3a27d in launch_executor_thread (arg=0x6c1cb60) at src/scheduler.cc:34 #4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #5 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 6 (Thread 12919): #0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaaaf399f1 in wait (this=0x6c1c9c0) at src/syncobject.hh:57 #2 ExecutorThread::run 
(this=0x6c1c9c0) at src/scheduler.cc:139 #3 0x00002aaaaaf3a27d in launch_executor_thread (arg=0x6c1c9c0) at src/scheduler.cc:34 #4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #5 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 5 (Thread 12920): #0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaaaf399f1 in wait (this=0x6c1c820) at src/syncobject.hh:57 #2 ExecutorThread::run (this=0x6c1c820) at src/scheduler.cc:139 #3 0x00002aaaaaf3a27d in launch_executor_thread (arg=0x6c1c820) at src/scheduler.cc:34 #4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #5 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 4 (Thread 12986): #0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaaaf101df in wait (this=0x8d1ad00) at src/syncobject.hh:57 #2 wait (this=0x8d1ad00) at src/syncobject.hh:73 #3 wait (this=0x8d1ad00) at src/tapconnmap.hh:169 #4 EventuallyPersistentEngine::notifyPendingConnections (this=0x8d1ad00) at src/ep_engine.cc:3377 #5 0x00002aaaaaf102c3 in EvpNotifyPendingConns (arg=0x8d1ad00) at src/ep_engine.cc:1153 #6 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #7 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 3 (Thread 12987): #0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaaaef37c8 in wait (this=0x71f2fc0, d=...) at src/syncobject.hh:57 #2 IdleTask::run (this=0x71f2fc0, d=...) 
at src/dispatcher.cc:342 #3 0x00002aaaaaef632a in Dispatcher::run (this=0x7e73c00) at src/dispatcher.cc:184 #4 0x00002aaaaaef6aed in launch_dispatcher_thread (arg=<value optimized out>) at src/dispatcher.cc:28 #5 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #6 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 2 (Thread 12988): #0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaaaef37c8 in wait (this=0x71f2ab0, d=...) at src/syncobject.hh:57 #2 IdleTask::run (this=0x71f2ab0, d=...) at src/dispatcher.cc:342 #3 0x00002aaaaaef632a in Dispatcher::run (this=0x7e73dc0) at src/dispatcher.cc:184 #4 0x00002aaaaaef6aed in launch_dispatcher_thread (arg=<value optimized out>) at src/dispatcher.cc:28 #5 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #6 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 1 (Thread 0x48664940 (LWP 12915)): ---Type <return> to continue, or q <return> to quit--- #0 0x0000003866c30285 in raise () from /lib64/libc.so.6 #1 0x0000003866c31d30 in abort () from /lib64/libc.so.6 #2 0x00002aaaaaf360f0 in Mutex::acquire (this=0x6ba2828) at src/mutex.cc:83 #3 0x00002aaaaaf84a71 in lock (this=<value optimized out>, key="couch_port") at src/locks.hh:48 #4 LockHolder (this=<value optimized out>, key="couch_port") at src/locks.hh:26 #5 Configuration::getInteger (this=<value optimized out>, key="couch_port") at src/configuration.cc:77 #6 0x00002aaaaaf8da15 in Configuration::getCouchPort (this=0x6ba2828) at src/generated_configuration.cc:83 #7 0x00002aaaaaf7c056 in CouchNotifier::ensureConnection (this=0x6bc5000) at src/couch-kvstore/couch-notifier.cc:317 #8 0x00002aaaaaf7cc91 in CouchNotifier::sendCommand (this=0x6bc5000, rh=0x8f81460) at src/couch-kvstore/couch-notifier.cc:437 #9 0x00002aaaaaf7d1b4 in CouchNotifier::selectBucket (this=0x6bc5000) at src/couch-kvstore/couch-notifier.cc:739 #10 0x00002aaaaaf7d6cf in CouchNotifier::processInput 
(this=0x6bc5000) at src/couch-kvstore/couch-notifier.cc:606 #11 0x00002aaaaaf7cce9 in maybeProcessInput (this=0x6bc5000, rh=0x8f81400) at src/couch-kvstore/couch-notifier.cc:546 #12 CouchNotifier::sendCommand (this=0x6bc5000, rh=0x8f81400) at src/couch-kvstore/couch-notifier.cc:439 #13 0x00002aaaaaf7df58 in CouchNotifier::notify_update (this=0x6bc5000, vbs=..., file_version=1, header_offset=4096, cb=...) at src/couch-kvstore/couch-notifier.cc:772 #14 0x00002aaaaaf73c63 in CouchKVStore::setVBucketState (this=0x8ff8f00, vbucketId=682, vbstate=..., vb_change_type=1, newfile=96, notify=true) at src/couch-kvstore/couch-kvstore.cc:745 #15 0x00002aaaaaf74b69 in CouchKVStore::snapshotVBuckets (this=0x8ff8f00, vbstates=Traceback (most recent call last): File "/usr/share/gdb/python/libstdcxx/v6/printers.py", line 288, in children nodetype = gdb.lookup_type('std::_Rb_tree_node< std::pair< %s, %s > >' % (keytype, valuetype)) RuntimeError: No type named std::_Rb_tree_node< std::pair< const unsigned short, vbucket_state > >. 
std::map with 1 elements) at src/couch-kvstore/couch-kvstore.cc:596 #16 0x00002aaaaaefc3f3 in EventuallyPersistentStore::snapshotVBuckets (this=0x8d4a800, priority=..., shardId=<value optimized out>) at src/ep.cc:760 #17 0x00002aaaaaf54daf in VBSnapshotTask::run (this=<value optimized out>) at src/tasks.cc:78 #18 0x00002aaaaaf39b61 in ExecutorThread::run (this=0x6bf5860) at src/scheduler.cc:153 #19 0x00002aaaaaf3a27d in launch_executor_thread (arg=0x6bf5860) at src/scheduler.cc:34 #20 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #21 0x0000003866cd325d in clone () from /lib64/libc.so.6

vms:
ip:10.5.2.13 username:jenkins ssh_key:QAkey.pem
[vm2] ip:10.5.2.14 username:jenkins ssh_key:QAkey.pem
[vm3] ip:10.5.2.15 username:jenkins ssh_key:QAkey.pem
[vm4] ip:10.3.121.63 username:root password:couchbase
[vm5] ip:10.3.121.64 username:root password:couchbase
[vm6] ip:10.3.121.66 username:root password:couchbase
[vm7] ip:10.3.121.69 username:root password:couchbase |
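Thread 1 in the backtrace above ends in `abort()` called from `Mutex::acquire` (src/mutex.cc:83) while `Configuration::getInteger` takes its `LockHolder`, which is why the core reports signal 6 (SIGABRT). The source of `Mutex::acquire` is not quoted in this thread, so the following is only a hypothetical sketch, not the ep-engine code: a lock wrapper that treats any non-zero return from `pthread_mutex_lock` as fatal, plus a demo of one error POSIX defines, `EDEADLK` from an error-checking mutex that its owner tries to relock. The names `init_errorcheck_mutex`, `acquire_or_abort` and `demo_relock` are illustrative assumptions; whether this crash was a relock, a destroyed mutex, or something else is not established here.

```c
#define _XOPEN_SOURCE 700 /* expose PTHREAD_MUTEX_ERRORCHECK under strict C modes */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: initialise a mutex that reports misuse (such as the
 * owning thread relocking it) as an error code instead of deadlocking. */
static int init_errorcheck_mutex(pthread_mutex_t *m) {
    pthread_mutexattr_t attr;
    int e = pthread_mutexattr_init(&attr);
    if (e != 0) return e;
    e = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (e == 0) e = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return e;
}

/* Hypothetical wrapper: any locking error is treated as fatal. abort()
 * raises SIGABRT, so a core taken at this point shows raise()/abort()
 * on top of the lock wrapper's frame. */
static void acquire_or_abort(pthread_mutex_t *m) {
    int e = pthread_mutex_lock(m);
    if (e != 0) {
        fprintf(stderr, "MUTEX ERROR: failed to acquire lock: %s\n", strerror(e));
        abort();
    }
}

/* Demo: lock once through the wrapper, then relock from the same thread
 * directly so the error code can be inspected without aborting. */
static int demo_relock(void) {
    pthread_mutex_t m;
    int e = init_errorcheck_mutex(&m);
    assert(e == 0);
    acquire_or_abort(&m);                /* first lock succeeds */
    int relock = pthread_mutex_lock(&m); /* owner relocks: error-checking mutex returns EDEADLK */
    pthread_mutex_unlock(&m);            /* still held once; release it */
    pthread_mutex_destroy(&m);
    return relock;
}
```

Calling `acquire_or_abort(&m)` a second time instead of the direct `pthread_mutex_lock` would print the `strerror(EDEADLK)` message and abort with SIGABRT. Link with `-pthread` on older glibc.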
| Comment by Andrei Baranouski [ 16/May/13 ] |
|
Seems like it was fixed in 2.0.0-MRW37-toy: http://qa.hq.northscale.net/job/CouchbaseServer-SanityTest-4Node-Centos64/77/consoleFull |
| Comment by Andrei Baranouski [ 16/May/13 ] |
| Please assign it back to me when we get a build with the corresponding commit. |
| Comment by Jin Lim [ 17/May/13 ] |
| Build 806 no longer runs into this issue. Please confirm and close the bug. Thanks. |
[MB-7938] 2.0.2 memcached crashes in EventuallyPersistentStore::flushVBucket Created: 19/Mar/13 Updated: 17/May/13 Resolved: 17/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | couchbase-bucket |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Andrei Baranouski | Assignee: | Andrei Baranouski |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | version=2.0.2-741-rel, Ubuntu 11.04 | ||
| Description |
|
http://qa.hq.northscale.net/job/centos-64-2.0-rebalance-regressions/196/consoleFull
./testrunner -i /tmp/rebalance_regression.ini wait_timeout=100,get-cbcollect-info=True -t swaprebalance.SwapRebalanceFailedTests.test_failover_swap_rebalance,replica=2,num-buckets=2,num-swap=2,keys-count=1000000,swap-orchestrator=True test logs: [2013-03-18 08:35:42,154] - [data_helper:289] INFO - creating direct client 10.3.121.94:11210 bucket-1 [2013-03-18 08:35:42,432] - [rest_client:913] ERROR - {u'status': u'none', u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try rebalance again.'} - rebalance failed [2013-03-18 08:35:42,432] - [rest_client:914] INFO - Latest logs from UI: [2013-03-18 08:35:42,565] - [rest_client:915] ERROR - {u'node': u'ns_1@10.3.121.93', u'code': 1, u'text': u"Node 'ns_1@10.3.121.93' is leaving cluster.", u'shortText': u'message', u'module': u'ns_cluster', u'tstamp': 1363621308275, u'type': u'info'} [2013-03-18 08:35:42,567] - [rest_client:915] ERROR - {u'node': u'ns_1@10.3.121.93', u'code': 4, u'text': u'Node ns_1@10.3.121.93 left cluster', u'shortText': u'message', u'module': u'ns_cluster', u'tstamp': 1363621307949, u'type': u'info'} [2013-03-18 08:35:42,567] - [rest_client:915] ERROR - {u'node': u'ns_1@10.3.121.95', u'code': 4, u'text': u"Control connection to memcached on 'ns_1@10.3.121.95' disconnected: {badmatch,\n {error,\n closed}}", u'shortText': u'message', u'module': u'ns_memcached', u'tstamp': 1363621307350, u'type': u'info'} [2013-03-18 08:35:42,570] - [rest_client:915] ERROR - {u'node': u'ns_1@10.3.121.93', u'code': 2, u'text': u"Rebalance exited with reason {{bulk_set_vbucket_state_failed,\n [{'ns_1@10.3.121.95',\n {'EXIT',\n {{{{unexpected_reason,\n {{badmatch,{error,closed}},\n [{mc_binary,quick_stats_recv,3},\n {mc_binary,\n mass_get_last_closed_checkpoint_loop,\n 5},\n {mc_binary,\n mass_get_last_closed_checkpoint,3},\n {ebucketmigrator_srv,handle_call,3},\n {gen_server,handle_msg,5},\n {proc_lib,init_p_do_apply,3}]}},\n [{misc,executing_on_new_process,1},\n {tap_replication_manager,\n 
change_vbucket_filter,4},\n {tap_replication_manager,\n '-do_set_incoming_replication_map/3-lc$^5/1-5-',\n 2},\n {tap_replication_manager,\n do_set_incoming_replication_map,3},\n {tap_replication_manager,handle_call,3},\n {gen_server,handle_msg,5},\n {proc_lib,init_p_do_apply,3}]},\n {gen_server,call,\n ['tap_replication_manager-bucket-1',\n {change_vbucket_replication,531,\n 'ns_1@10.3.121.96'},\n infinity]}},\n {gen_server,call,\n [{'janitor_agent-bucket-1',\n 'ns_1@10.3.121.95'},\n {if_rebalance,<0.13276.97>,\n {update_vbucket_state,531,replica,\n undefined,'ns_1@10.3.121.96'}},\n infinity]}}}},\n {'ns_1@10.3.121.94',\n {'EXIT',\n {{{{unexpected_reason,\n {{badmatch,{error,closed}},\n [{mc_binary,quick_stats_recv,3},\n {mc_binary,quick_stats_loop,5},\n {mc_binary,quick_stats,5},\n {mc_client_binary,\n get_zero_open_checkpoint_vbuckets,3},\n {ebucketmigrator_srv,handle_call,3},\n {gen_server,handle_msg,5},\n {proc_lib,init_p_do_apply,3}]}},\n [{misc,executing_on_new_process,1},\n {tap_replication_manager,\n change_vbucket_filter,4},\n {tap_replication_manager,\n '-do_set_incoming_replication_map/3-lc$^5/1-5-',\n 2},\n {tap_replication_manager,\n do_set_incoming_replication_map,3},\n {tap_replication_manager,handle_call,3},\n {gen_server,handle_msg,5},\n {proc_lib,init_p_do_apply,3}]},\n {gen_server,call,\n ['tap_replication_manager-bucket-1',\n {change_vbucket_replication,531,\n 'ns_1@10.3.121.95'},\n infinity]}},\n {gen_server,call,\n [{'janitor_agent-bucket-1',\n 'ns_1@10.3.121.94'},\n {if_rebalance,<0.13276.97>,\n {update_vbucket_state,531,replica,\n undefined,'ns_1@10.3.121.95'}},\n infinity]}}}}]},\n [{janitor_agent,bulk_set_vbucket_state,4},\n {ns_vbucket_mover,\n update_replication_post_move,3},\n {ns_vbucket_mover,on_move_done,2},\n {gen_server,handle_msg,5},\n {proc_lib,init_p_do_apply,3}]}\n", u'shortText': u'message', u'module': u'ns_orchestrator', u'tstamp': 1363621307292, u'type': u'info'} [2013-03-18 08:35:42,571] - [rest_client:915] ERROR - 
{u'node': u'ns_1@10.3.121.95', u'code': 0, u'text': u'Port server memcached on node \'ns_1@10.3.121.95\' exited with status 134. Restarting. Messages: Mon Mar 18 08:41:35.578716 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_451 - Reset vbucket 538 was completed succecssfully.\nMon Mar 18 08:41:35.704327 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_452 - disconnected\nMon Mar 18 08:41:35.805105 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_450"\nMon Mar 18 08:41:35.805157 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_452"\nMon Mar 18 08:41:35.805446 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.96 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:35.830322 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 538, cookie 0x6ccf080\nMon Mar 18 08:41:36.347550 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_451 - disconnected\nMon Mar 18 08:41:36.444078 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_453 - Reset vbucket 870 was completed succecssfully.\nMon Mar 18 08:41:36.479457 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_451"\nMon Mar 18 08:41:36.480162 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.94 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:36.714685 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 870, cookie 0x6ca6000\nMon Mar 18 08:41:36.976727 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_453 - disconnected\nMon Mar 18 08:41:37.019371 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_454 - Reset vbucket 537 was completed succecssfully.\nMon Mar 18 08:41:37.206467 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_455 - disconnected\nMon Mar 18 08:41:37.271568 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 537, cookie 0x6ca6000\nMon Mar 18 08:41:37.285754 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_453"\nMon Mar 18 
08:41:37.285827 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_455"\nMon Mar 18 08:41:37.286268 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.96 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:37.643780 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_456 - Reset vbucket 869 was completed succecssfully.\nMon Mar 18 08:41:37.799981 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 869, cookie 0x6cf1b80\nMon Mar 18 08:41:37.837434 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_456 - disconnected\nMon Mar 18 08:41:37.907834 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_454 - disconnected\nMon Mar 18 08:41:37.984234 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_457 - Reset vbucket 868 was completed succecssfully.\nMon Mar 18 08:41:38.035570 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_454"\nMon Mar 18 08:41:38.035669 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_456"\nMon Mar 18 08:41:38.036087 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.94 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:38.180437 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_458 - disconnected\nMon Mar 18 08:41:38.250083 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_458"\nMon Mar 18 08:41:38.383886 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.96 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:38.544533 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 868, cookie 0x6ca62c0\nMon Mar 18 08:41:38.692906 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_459 - Reset vbucket 536 was completed succecssfully.\nMon Mar 18 08:41:38.933451 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_457 - disconnected\nMon Mar 18 08:41:38.934109 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 536, cookie 0x6ca7080\nMon 
Mar 18 08:41:39.241149 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_460 - Reset vbucket 867 was completed succecssfully.\nMon Mar 18 08:41:39.396965 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_461 - disconnected\nMon Mar 18 08:41:39.472257 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_457"\nMon Mar 18 08:41:39.472345 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_461"\nMon Mar 18 08:41:39.506632 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.96 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:39.553585 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_459 - disconnected\nMon Mar 18 08:41:39.572591 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 867, cookie 0x6ca6000\nMon Mar 18 08:41:39.579114 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_459"\nMon Mar 18 08:41:39.695310 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_460 - disconnected\nMon Mar 18 08:41:39.900652 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_460"\nMon Mar 18 08:41:39.900985 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.94 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:40.151512 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_462 - disconnected\nMon Mar 18 08:41:40.266294 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_462"\nMon Mar 18 08:41:40.266461 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.96 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:40.856053 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_463 - Reset vbucket 866 was completed succecssfully.\nMon Mar 18 08:41:41.091008 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 866, cookie 0x6ca6000\nMon Mar 18 08:41:41.323419 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_463 - disconnected\nMon Mar 18 08:41:41.342314 PDT 3: (bucket-1) Schedule cleanup of 
"eq_tapq:anon_463"\nMon Mar 18 08:41:41.371404 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_464 - Reset vbucket 535 was completed succecssfully.\nMon Mar 18 08:41:41.661043 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_465 - disconnected\nMon Mar 18 08:41:41.717840 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 535, cookie 0x6ca62c0\nMon Mar 18 08:41:41.742040 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_465"\nMon Mar 18 08:41:41.742448 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.96 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:41.954347 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_466 - Reset vbucket 865 was completed succecssfully.\nMon Mar 18 08:41:42.166879 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 865, cookie 0x6cf1600\nMon Mar 18 08:41:42.269048 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_464 - disconnected\nMon Mar 18 08:41:42.340677 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_464"\nMon Mar 18 08:41:42.340711 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.94 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:42.384967 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_467 - Reset vbucket 864 was completed succecssfully.\nMon Mar 18 08:41:42.413198 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_466 - disconnected\nMon Mar 18 08:41:42.441804 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_466"\nMon Mar 18 08:41:42.684453 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 864, cookie 0x6ccf080\nMon Mar 18 08:41:42.818667 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_468 - disconnected\nMon Mar 18 08:41:42.875395 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_467 - disconnected\nMon Mar 18 08:41:42.904356 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_469 - Reset vbucket 534 was completed succecssfully.\nMon Mar 18 
08:41:42.978903 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_467"\nMon Mar 18 08:41:42.978974 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_468"\nMon Mar 18 08:41:42.979235 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.96 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:43.243646 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_470 - disconnected\nMon Mar 18 08:41:43.403811 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.96 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:43.403750 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_470"\nMon Mar 18 08:41:43.494580 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 534, cookie 0x6ccf600\nMon Mar 18 08:41:43.580734 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_471 - Reset vbucket 533 was completed succecssfully.\nMon Mar 18 08:41:43.893154 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 533, cookie 0x6ca6000\nMon Mar 18 08:41:44.327385 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_472 - Reset vbucket 532 was completed succecssfully.\nMon Mar 18 08:41:44.412144 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_469 - disconnected\nMon Mar 18 08:41:44.495170 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_469"\nMon Mar 18 08:41:44.502353 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_471 - disconnected\nMon Mar 18 08:41:44.516347 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_471"\nMon Mar 18 08:41:44.516369 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.94 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:44.519367 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 532, cookie 0x6ca62c0\nMon Mar 18 08:41:44.855524 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.94 - Sending TAP_OPAQUE with command 
"complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:45.087666 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_473 - Reset vbucket 863 was completed succecssfully.\nMon Mar 18 08:41:45.320777 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_472 - disconnected\nMon Mar 18 08:41:45.440897 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 863, cookie 0x6cceb00\nMon Mar 18 08:41:45.445425 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_472"\nMon Mar 18 08:41:45.446922 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.94 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:45.531819 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_474 - Reset vbucket 531 was completed succecssfully.\nMon Mar 18 08:41:45.566986 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_473 - disconnected\nMon Mar 18 08:41:45.641574 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_473"\nMon Mar 18 08:41:45.822884 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 531, cookie 0x6ccf600\nMon Mar 18 08:41:45.850503 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_475 - disconnected\nMon Mar 18 08:41:45.932729 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_475"\nMon Mar 18 08:41:45.932798 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.96 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:46.151290 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_476 - Reset vbucket 530 was completed succecssfully.\nMon Mar 18 08:41:46.393310 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_474 - disconnected\nmemcached: src/ep.cc:1790: virtual void PersistenceCallback::callback(mutation_result&): Assertion `stats->diskQueueSize < ((size_t)1<<(sizeof(size_t)*8-1))\' failed.', u'shortText': u'message', u'module': u'ns_port_server', u'tstamp': 1363621307102, u'type': u'info'} [2013-03-18 08:35:42,577] - [rest_client:915] ERROR - {u'node': 
u'ns_1@10.3.121.93', u'code': 0, u'text': u'Bucket "bucket-1" rebalance does not seem to be swap rebalance', u'shortText': u'message', u'module': u'ns_vbucket_mover', u'tstamp': 1363620900761, u'type': u'info'} [2013-03-18 08:35:42,578] - [rest_client:915] ERROR - {u'node': u'ns_1@10.3.121.95', u'code': 1, u'text': u'Bucket "bucket-1" loaded on node \'ns_1@10.3.121.95\' in 0 seconds.', u'shortText': u'message', u'module': u'ns_memcached', u'tstamp': 1363620899668, u'type': u'info'} [2013-03-18 08:35:42,579] - [rest_client:915] ERROR - {u'node': u'ns_1@10.3.121.96', u'code': 1, u'text': u'Bucket "bucket-1" loaded on node \'ns_1@10.3.121.96\' in 0 seconds.', u'shortText': u'message', u'module': u'ns_memcached', u'tstamp': 1363620899353, u'type': u'info'} [2013-03-18 08:35:42,580] - [rest_client:915] ERROR - {u'node': u'ns_1@10.3.121.94', u'code': 5, u'text': u"Node 'ns_1@10.3.121.94' saw that node 'ns_1@10.3.121.98' went down.", u'shortText': u'node down', u'module': u'ns_node_disco', u'tstamp': 1363620899198, u'type': u'warning'} [2013-03-18 08:35:42,585] - [rest_client:915] ERROR - {u'node': u'ns_1@10.3.121.95', u'code': 5, u'text': u"Node 'ns_1@10.3.121.95' saw that node 'ns_1@10.3.121.98' went down.", u'shortText': u'node down', u'module': u'ns_node_disco', u'tstamp': 1363620899195, u'type': u'warning'} ERROR root@ubuntu1104-64:/tmp# sudo gdb /opt/couchbase/bin/memcached core.memcached.30589 GNU gdb (Ubuntu/Linaro 7.2-1ubuntu11) 7.2 Copyright (C) 2010 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /opt/couchbase/bin/memcached...done. 
[New Thread 30607] [New Thread 30597] [New Thread 30611] [New Thread 30589] [New Thread 30604] [New Thread 30598] [New Thread 30602] [New Thread 30600] [New Thread 30603] [New Thread 30601] [New Thread 30606] [New Thread 30610] [New Thread 30609] [New Thread 30608] warning: Can't read pathname for load map: Input/output error. Reading symbols from /opt/couchbase/lib/memcached/libmemcached_utilities.so.0...done. Loaded symbols for /opt/couchbase/lib/memcached/libmemcached_utilities.so.0 Reading symbols from /opt/couchbase/lib/libevent-2.0.so.5...done. Loaded symbols for /opt/couchbase/lib/libevent-2.0.so.5 Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/x86_64-linux-gnu/libdl.so.2 Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...(no debugging symbols found)...done. Loaded symbols for /lib/x86_64-linux-gnu/libm.so.6 Reading symbols from /lib/x86_64-linux-gnu/librt.so.1...(no debugging symbols found)...done. Loaded symbols for /lib/x86_64-linux-gnu/librt.so.1 Reading symbols from /opt/couchbase/lib/libtcmalloc_minimal.so.4...done. Loaded symbols for /opt/couchbase/lib/libtcmalloc_minimal.so.4 Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...(no debugging symbols found)...done. Loaded symbols for /lib/x86_64-linux-gnu/libpthread.so.0 Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...(no debugging symbols found)...done. Loaded symbols for /lib/x86_64-linux-gnu/libc.so.6 Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/ld-linux-x86-64.so.2 Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6...(no debugging symbols found)...done. Loaded symbols for /usr/lib/x86_64-linux-gnu/libstdc++.so.6 Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1...(no debugging symbols found)...done. 
Loaded symbols for /lib/x86_64-linux-gnu/libgcc_s.so.1 Reading symbols from /opt/couchbase/lib/memcached/stdin_term_handler.so...done. Loaded symbols for /opt/couchbase/lib/memcached/stdin_term_handler.so Reading symbols from /opt/couchbase/lib/memcached/file_logger.so...done. Loaded symbols for /opt/couchbase/lib/memcached/file_logger.so Reading symbols from /lib/x86_64-linux-gnu/libz.so.1...(no debugging symbols found)...done. Loaded symbols for /lib/x86_64-linux-gnu/libz.so.1 Reading symbols from /opt/couchbase/lib/memcached/bucket_engine.so...done. Loaded symbols for /opt/couchbase/lib/memcached/bucket_engine.so Reading symbols from /opt/couchbase/lib/memcached/ep.so...done. Loaded symbols for /opt/couchbase/lib/memcached/ep.so Reading symbols from /opt/couchbase/lib/libcouchstore.so.1...done. Loaded symbols for /opt/couchbase/lib/libcouchstore.so.1 Reading symbols from /opt/couchbase/lib/libsnappy.so.1...done. Loaded symbols for /opt/couchbase/lib/libsnappy.so.1 Reading symbols from /opt/couchbase/lib/libicuuc.so.44...done. Loaded symbols for /opt/couchbase/lib/libicuuc.so.44 Reading symbols from /opt/couchbase/lib/libicudata.so.44...(no debugging symbols found)...done. Loaded symbols for /opt/couchbase/lib/libicudata.so.44 Reading symbols from /opt/couchbase/lib/libicui18n.so.44...done. Loaded symbols for /opt/couchbase/lib/libicui18n.so.44 Reading symbols from /lib/x86_64-linux-gnu/libnss_files.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/x86_64-linux-gnu/libnss_files.so.2 Core was generated by `/opt/couchbase/bin/memcached -X /opt/couchbase/lib/memcached/stdin_term_handler'. Program terminated with signal 6, Aborted. #0 0x00007f9d01257d05 in raise () from /lib/x86_64-linux-gnu/libc.so.6 (gdb) t a a bt Thread 14 (Thread 30608): #0 0x00007f9d015c7f2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007f9cfca013d8 in wait (this=0x24ee2d0, d=...) 
at src/syncobject.hh:57 #2 IdleTask::run (this=0x24ee2d0, d=...) at src/dispatcher.cc:328 #3 0x00007f9cfca042ca in Dispatcher::run (this=0x6d9b880) at src/dispatcher.cc:171 #4 0x00007f9cfca04b7d in launch_dispatcher_thread (arg=0x6d9b8d4) at src/dispatcher.cc:28 #5 0x00007f9d015c2d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #6 0x00007f9d0130cfdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #7 0x0000000000000000 in ?? () Thread 13 (Thread 30609): #0 0x00007f9d015c7f2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007f9cfca013d8 in wait (this=0x24ee240, d=...) at src/syncobject.hh:57 #2 IdleTask::run (this=0x24ee240, d=...) at src/dispatcher.cc:328 #3 0x00007f9cfca042ca in Dispatcher::run (this=0x6d9b6c0) at src/dispatcher.cc:171 #4 0x00007f9cfca04b7d in launch_dispatcher_thread (arg=0x6d9b714) at src/dispatcher.cc:28 #5 0x00007f9d015c2d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #6 0x00007f9d0130cfdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #7 0x0000000000000000 in ?? () Thread 12 (Thread 30610): #0 0x00007f9d015c7f2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007f9cfca013d8 in wait (this=0x24ee5a0, d=...) at src/syncobject.hh:57 #2 IdleTask::run (this=0x24ee5a0, d=...) at src/dispatcher.cc:328 #3 0x00007f9cfca042ca in Dispatcher::run (this=0x6d9b500) at src/dispatcher.cc:171 #4 0x00007f9cfca04b7d in launch_dispatcher_thread (arg=0x6d9b554) at src/dispatcher.cc:28 #5 0x00007f9d015c2d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #6 0x00007f9d0130cfdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #7 0x0000000000000000 in ?? 
() Thread 11 (Thread 30606): #0 0x00007f9d012d44ed in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007f9d01305914 in usleep () from /lib/x86_64-linux-gnu/libc.so.6 #2 0x00007f9cfca416a5 in updateStatsThread (arg=<value optimized out>) at src/memory_tracker.cc:31 #3 0x00007f9d015c2d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #4 0x00007f9d0130cfdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #5 0x0000000000000000 in ?? () Thread 10 (Thread 30601): #0 0x00007f9d0130d633 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007f9d020dbf36 in epoll_dispatch (base=0x6d4e280, tv=<value optimized out>) at epoll.c:404 #2 0x00007f9d020c7394 in event_base_loop (base=0x6d4e280, flags=<value optimized out>) at event.c:1558 #3 0x0000000000414b94 in worker_libevent (arg=0x24e94f8) at daemon/thread.c:301 #4 0x00007f9d015c2d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #5 0x00007f9d0130cfdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #6 0x0000000000000000 in ?? 
() Thread 9 (Thread 30603): #0 mc_get_allocation_size (ptr=0x7977200) at daemon/alloc_hooks.c:115 #1 0x00007f9cfca41264 in DeleteHook (ptr=0x7977200) at src/memory_tracker.cc:56 #2 0x00007f9d01801402 in MallocHook::InvokeDeleteHookSlow(void const*) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4 #3 0x00007f9d017f455a in MallocHook::InvokeDeleteHook(void const*) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4 #4 0x00007f9d018077f6 in tc_delete () from /opt/couchbase/lib/libtcmalloc_minimal.so.4 #5 0x00007f9d00fb3e19 in std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::~basic_stringstream() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #6 0x00007f9cfca3809a in void core_engine::add_casted_stat<unsigned long>(char const*, unsigned long, void (*)(char const*, unsigned short, char const*, unsigned int, void const*), void const*) () from /opt/couchbase/lib/memcached/ep.so #7 0x00007f9cfca28af0 in addCheckpointStat (this=<value optimized out>, vb=...) at src/ep_engine.cc:2858 #8 EventuallyPersistentEngine::StatCheckpointVisitor::visitBucket (this=<value optimized out>, vb=...) at src/ep_engine.cc:2842 #9 0x00007f9cfca09adc in EventuallyPersistentStore::visit (this=<value optimized out>, visitor=...) 
at src/ep.cc:2417 #10 0x00007f9cfca2d315 in EventuallyPersistentEngine::doCheckpointStats (this=0x6da2000, cookie=0x6cf18c0, add_stat=0x40d1d0 <append_stats>, stat_key=<value optimized out>, nkey=<value optimized out>) at src/ep_engine.cc:2868 #11 0x00007f9cfca2fdcc in EventuallyPersistentEngine::getStats (this=0x6da2000, cookie=0x6cf18c0, stat_key=0x6cf5018 "checkpointn_ns_1@10.3.121.94", nkey=10, add_stat=0x40d1d0 <append_stats>) at src/ep_engine.cc:3328 #12 0x00007f9cfca30416 in EvpGetStats (handle=0x6da2000, cookie=0x6cf18c0, stat_key=0x6cf5018 "checkpointn_ns_1@10.3.121.94", nkey=10, add_stat=<value optimized out>) at src/ep_engine.cc:193 #13 0x00007f9cff4e1c20 in bucket_get_stats (handle=<value optimized out>, cookie=0x6cf18c0, stat_key=0x6cf5018 "checkpointn_ns_1@10.3.121.94", nkey=10, add_stat=0x40d1d0 <append_stats>) at bucket_engine.c:1720 #14 0x00000000004113f6 in process_bin_stat (c=0x6cf18c0) at daemon/memcached.c:2199 #15 0x0000000000411d65 in complete_nread_binary (c=0x7977200) at daemon/memcached.c:3708 #16 complete_nread (c=0x7977200) at daemon/memcached.c:3820 #17 conn_nread (c=0x7977200) at daemon/memcached.c:5673 #18 0x00000000004068a5 in event_handler (fd=<value optimized out>, which=<value optimized out>, arg=0x6cf18c0) at daemon/memcached.c:5936 #19 0x00007f9d020c748c in event_process_active_single_queue (base=0x6d4ea00, flags=<value optimized out>) at event.c:1308 #20 event_process_active (base=0x6d4ea00, flags=<value optimized out>) at event.c:1375 #21 event_base_loop (base=0x6d4ea00, flags=<value optimized out>) at event.c:1572 #22 0x0000000000414b94 in worker_libevent (arg=0x24e96e8) at daemon/thread.c:301 #23 0x00007f9d015c2d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #24 0x00007f9d0130cfdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #25 0x0000000000000000 in ?? 
() Thread 8 (Thread 30600): #0 0x00007f9d015ca955 in __lll_unlock_wake () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007f9d015c6e7a in _L_unlock_1177 () from /lib/x86_64-linux-gnu/libpthread.so.0 #2 0x00007f9d015c6da3 in pthread_mutex_unlock () from /lib/x86_64-linux-gnu/libpthread.so.0 #3 0x00007f9cfca42636 in Mutex::release (this=0xed948a8) at src/mutex.cc:94 #4 0x00007f9cfc9fab10 in unlock (this=<value optimized out>, qi=..., vbucket=...) at src/locks.hh:58 #5 ~LockHolder (this=<value optimized out>, qi=..., vbucket=...) at src/locks.hh:41 #6 CheckpointManager::queueDirty (this=<value optimized out>, qi=..., vbucket=...) at src/checkpoint.cc:717 #7 0x00007f9cfca07423 in EventuallyPersistentStore::queueDirty (this=0x6d9e480, vb=..., key=..., vbid=<value optimized out>, op=queue_op_set, seqno=<value optimized out>, tapBackfill=false) at src/ep.cc:2168 #8 0x00007f9cfca0bc2d in EventuallyPersistentStore::setWithMeta (this=0x6d9e480, itm=..., cas=<value optimized out>, cookie=<value optimized out>, force=<value optimized out>, allowExisting=<value optimized out>, nru=3 '\003') at src/ep.cc:1399 #9 0x00007f9cfca2ac53 in EventuallyPersistentEngine::tapNotify(const void *, void *, uint16_t, uint8_t, uint16_t, <anonymous enum>, uint32_t, const void *, size_t, uint32_t, uint32_t, uint64_t, const void *, size_t, uint16_t) (this=0x6da2000, cookie=0x6ccf080, engine_specific=<value optimized out>, nengine=<value optimized out>, tap_flags=<value optimized out>, tap_event=TAP_MUTATION, tap_seqno=974, key=0x1102c031, nkey=25, flags=0, exptime=0, cas=15671462831700303, data=0x1102c04a, ndata=369, vbucket=530) at src/ep_engine.cc:2067 #10 0x00007f9cfca2b368 in EvpTapNotify(ENGINE_HANDLE *, const void *, void *, uint16_t, uint8_t, uint16_t, <anonymous enum>, uint32_t, const void *, size_t, uint32_t, uint32_t, uint64_t, const void *, size_t, uint16_t) (handle=0x6da2000, cookie=0x6ccf080, engine_specific=0x1102c028, nengine=65535, ttl=254 '\376', tap_flags=0, 
tap_event=TAP_MUTATION, tap_seqno=974, key=0x1102c031, nkey=25, flags=0, exptime=0, cas=15671462831700303, data=0x1102c04a, ndata=369, vbucket=<value optimized out>) at src/ep_engine.cc:1037 #11 0x00007f9cff4e0a04 in bucket_tap_notify (handle=<value optimized out>, cookie=0x6ccf080, engine_specific=0x1102c028, nengine=65535, ttl=254 '\376', tap_flags=23, tap_event=TAP_MUTATION, tap_seqno=974, key=0x1102c031, nkey=25, flags=0, exptime=0, cas=15671462831700303, data=0x1102c04a, ndata=369, vbucket=<value optimized out>) at bucket_engine.c:1942 #12 0x0000000000409eb2 in process_bin_tap_packet (event=TAP_MUTATION, c=0x6ccf080) at daemon/memcached.c:3031 #13 0x00000000004120c3 in process_bin_packet (c=0x6ccf080) at daemon/memcached.c:3117 #14 complete_nread_binary (c=0x6ccf080) at daemon/memcached.c:3738 #15 complete_nread (c=0x6ccf080) at daemon/memcached.c:3820 #16 conn_nread (c=0x6ccf080) at daemon/memcached.c:5673 #17 0x00000000004068a5 in event_handler (fd=<value optimized out>, which=<value optimized out>, arg=0x6ccf080) at daemon/memcached.c:5936 #18 0x00007f9d020c748c in event_process_active_single_queue (base=0x6d4e500, flags=<value optimized out>) at event.c:1308 #19 event_process_active (base=0x6d4e500, flags=<value optimized out>) at event.c:1375 #20 event_base_loop (base=0x6d4e500, flags=<value optimized out>) at event.c:1572 #21 0x0000000000414b94 in worker_libevent (arg=0x24e9400) at daemon/thread.c:301 #22 0x00007f9d015c2d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #23 0x00007f9d0130cfdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #24 0x0000000000000000 in ?? 
() Thread 7 (Thread 30602): #0 0x00007f9d0130d633 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007f9d020dbf36 in epoll_dispatch (base=0x6d4ec80, tv=<value optimized out>) at epoll.c:404 #2 0x00007f9d020c7394 in event_base_loop (base=0x6d4ec80, flags=<value optimized out>) at event.c:1558 #3 0x0000000000414b94 in worker_libevent (arg=0x24e95f0) at daemon/thread.c:301 #4 0x00007f9d015c2d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #5 0x00007f9d0130cfdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #6 0x0000000000000000 in ?? () Thread 6 (Thread 30598): #0 0x00007f9d015c7f2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007f9d00102176 in logger_thead_main (arg=<value optimized out>) at extensions/loggers/file_logger.c:368 #2 0x00007f9d015c2d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 Thread 5 (Thread 30604): #0 0x00007f9d0130d633 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007f9d020dbf36 in epoll_dispatch (base=0x6d4e780, tv=<value optimized out>) at epoll.c:404 #2 0x00007f9d020c7394 in event_base_loop (base=0x6d4e780, flags=<value optimized out>) at event.c:1558 #3 0x0000000000414b94 in worker_libevent (arg=0x24e97e0) at daemon/thread.c:301 #4 0x00007f9d015c2d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #5 0x00007f9d0130cfdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #6 0x0000000000000000 in ?? 
() Thread 4 (Thread 30589): #0 0x00007f9d0130d633 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007f9d020dbf36 in epoll_dispatch (base=0x6d4e000, tv=<value optimized out>) at epoll.c:404 #2 0x00007f9d020c7394 in event_base_loop (base=0x6d4e000, flags=<value optimized out>) at event.c:1558 #3 0x000000000040c2e1 in main (argc=<value optimized out>, argv=<value optimized out>) at daemon/memcached.c:7918 Thread 3 (Thread 30611): #0 0x00007f9d015c7f2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007f9cfca2090f in wait (this=0x6da2000) at src/syncobject.hh:57 #2 wait (this=0x6da2000) at src/syncobject.hh:73 #3 wait (this=0x6da2000) at src/tapconnmap.hh:169 #4 EventuallyPersistentEngine::notifyPendingConnections (this=0x6da2000) at src/ep_engine.cc:3406 #5 0x00007f9cfca209f3 in EvpNotifyPendingConns (arg=0x6da2000) at src/ep_engine.cc:1139 #6 0x00007f9d015c2d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #7 0x00007f9d0130cfdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #8 0x0000000000000000 in ?? () Thread 2 (Thread 30597): #0 0x00007f9d012fe2ed in read () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007f9d01299798 in _IO_file_underflow () from /lib/x86_64-linux-gnu/libc.so.6 #2 0x00007f9d0129a7be in _IO_default_uflow () from /lib/x86_64-linux-gnu/libc.so.6 #3 0x00007f9d0128e8fa in _IO_getline_info () from /lib/x86_64-linux-gnu/libc.so.6 #4 0x00007f9d0128d7ca in fgets () from /lib/x86_64-linux-gnu/libc.so.6 #5 0x00007f9d00b05b19 in fgets (arg=<value optimized out>) at /usr/include/bits/stdio2.h:255 #6 check_stdin_thread (arg=<value optimized out>) at extensions/daemon/stdin_check.c:37 #7 0x00007f9d015c2d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #8 0x00007f9d0130cfdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #9 0x0000000000000000 in ?? 
() Thread 1 (Thread 30607): #0 0x00007f9d01257d05 in raise () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007f9d0125bab6 in abort () from /lib/x86_64-linux-gnu/libc.so.6 #2 0x00007f9d012507c5 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6 #3 0x00007f9cfca1d8db in PersistenceCallback::callback(std::pair<int, long>&) () from /opt/couchbase/lib/memcached/ep.so #4 0x00007f9cfca78724 in CouchKVStore::commitCallback(CouchRequest **, int, <anonymous enum>) (this=0x6dbc000, committedReqs=<value optimized out>, numReqs=10, errCode=COUCHSTORE_SUCCESS) at src/couch-kvstore/couch-kvstore.cc:1655 #5 0x00007f9cfca7c07c in CouchKVStore::commit2couchstore (this=0x6dbc000) at src/couch-kvstore/couch-kvstore.cc:1488 #6 0x00007f9cfca7c25a in CouchKVStore::commit (this=0x777d) at src/couch-kvstore/couch-kvstore.cc:871 #7 0x00007f9cfca0ee06 in EventuallyPersistentStore::flushVBucket (this=0x6d9e480, vbid=<value optimized out>) at src/ep.cc:1977 #8 0x00007f9cfca3ca1a in doFlush (this=0x6dba5a0, d=..., tid=...) at src/flusher.cc:215 #9 Flusher::step (this=0x6dba5a0, d=..., tid=...) at src/flusher.cc:153 #10 0x00007f9cfca042ca in Dispatcher::run (this=0x6d9aa80) at src/dispatcher.cc:171 #11 0x00007f9cfca04b7d in launch_dispatcher_thread (arg=0x777d) at src/dispatcher.cc:28 #12 0x00007f9d015c2d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #13 0x00007f9d0130cfdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #14 0x0000000000000000 in ?? () |
| Comments |
| Comment by Andrei Baranouski [ 19/Mar/13 ] |
|
core file: root(couchbase)@10.3.121.95:/core/7938-2.0.2-741.core.memcached.30589 https://s3.amazonaws.com/bugdb/jira/MB-7938/10.3.121.92-3182013-842-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-7938/10.3.121.93-3182013-836-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-7938/10.3.121.94-3182013-838-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-7938/10.3.121.95-3182013-839-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-7938/10.3.121.96-3182013-840-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-7938/10.3.121.97-3182013-841-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-7938/10.3.121.98-3182013-837-diag.zip |
| Comment by Xiaoqin Ma [ 26/Mar/13 ] |
|
It looks like the value overflowed: we did 0 - 1 on diskQueueSize, which wrapped it around to the largest possible size_t value:

(gdb) p stats->diskQueueSize
$6 = {value = 18446744073709551615}
(gdb) p (size_t) - 1
$7 = 18446744073709551615 |
| Comment by Xiaoqin Ma [ 26/Mar/13 ] |
|
Hi Andrei, can you give me more details about the setup? What is the rebalance setup? Are there any read/write operations running on the cluster at the same time? Is the failed node a new node being added, or an existing node in the cluster? Thanks! |
| Comment by Xiaoqin Ma [ 26/Mar/13 ] |
| Also, is it possible for me to run the script myself to do live debugging? Does it happen on every test run or just occasionally? How long does it run before the crash? |
| Comment by Mike Wiederhold [ 10/Apr/13 ] |
|
See |
| Comment by Maria McDuff [ 26/Apr/13 ] |
| pls verify / close. |
| Comment by Thuan Nguyen [ 27/Apr/13 ] |
|
Integrated in github-ep-engine-2-0 #485 (See [http://qa.hq.northscale.net/job/github-ep-engine-2-0/485/]) Result = SUCCESS Mike Wiederhold : Files : * src/ep.cc |
| Comment by Andrei Baranouski [ 08/May/13 ] |
|
reproduced on 2.0.2-789 http://qa.hq.northscale.net/job/centos-64-2.0-rebalance-regressions-P1/206/consoleFull gdb /opt/couchbase/bin/memcached core.memcached.26798 GNU gdb (Ubuntu/Linaro 7.2-1ubuntu11) 7.2 Copyright (C) 2010 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /opt/couchbase/bin/memcached...done. [New Thread 26893] [New Thread 26895] [New Thread 26897] [New Thread 26898] [New Thread 26896] [New Thread 2986] [New Thread 2988] [New Thread 2987] [New Thread 2996] [New Thread 2995] [New Thread 2997] [New Thread 3003] [New Thread 3004] [New Thread 26798] [New Thread 26806] [New Thread 26807] [New Thread 26811] [New Thread 26813] [New Thread 26890] [New Thread 26809] [New Thread 26812] [New Thread 3005] [New Thread 26810] [New Thread 26892] [New Thread 26891] [New Thread 26894] warning: Can't read pathname for load map: Input/output error. Reading symbols from /opt/couchbase/lib/memcached/libmemcached_utilities.so.0...done. Loaded symbols for /opt/couchbase/lib/memcached/libmemcached_utilities.so.0 Reading symbols from /opt/couchbase/lib/libevent-2.0.so.5...done. Loaded symbols for /opt/couchbase/lib/libevent-2.0.so.5 Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/x86_64-linux-gnu/libdl.so.2 Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...(no debugging symbols found)...done. Loaded symbols for /lib/x86_64-linux-gnu/libm.so.6 Reading symbols from /lib/x86_64-linux-gnu/librt.so.1...(no debugging symbols found)...done. 
Loaded symbols for /lib/x86_64-linux-gnu/librt.so.1 Reading symbols from /opt/couchbase/lib/libtcmalloc_minimal.so.4...done. Loaded symbols for /opt/couchbase/lib/libtcmalloc_minimal.so.4 Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...(no debugging symbols found)...done. Loaded symbols for /lib/x86_64-linux-gnu/libpthread.so.0 Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...(no debugging symbols found)...done. Loaded symbols for /lib/x86_64-linux-gnu/libc.so.6 Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/ld-linux-x86-64.so.2 Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6...(no debugging symbols found)...done. Loaded symbols for /usr/lib/x86_64-linux-gnu/libstdc++.so.6 Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1...(no debugging symbols found)...done. Loaded symbols for /lib/x86_64-linux-gnu/libgcc_s.so.1 Reading symbols from /opt/couchbase/lib/memcached/stdin_term_handler.so...done. Loaded symbols for /opt/couchbase/lib/memcached/stdin_term_handler.so Reading symbols from /opt/couchbase/lib/memcached/file_logger.so...done. Loaded symbols for /opt/couchbase/lib/memcached/file_logger.so Reading symbols from /lib/x86_64-linux-gnu/libz.so.1...(no debugging symbols found)...done. Loaded symbols for /lib/x86_64-linux-gnu/libz.so.1 Reading symbols from /opt/couchbase/lib/memcached/bucket_engine.so...done. Loaded symbols for /opt/couchbase/lib/memcached/bucket_engine.so Reading symbols from /opt/couchbase/lib/memcached/ep.so...done. Loaded symbols for /opt/couchbase/lib/memcached/ep.so Reading symbols from /opt/couchbase/lib/libcouchstore.so.1...done. Loaded symbols for /opt/couchbase/lib/libcouchstore.so.1 Reading symbols from /opt/couchbase/lib/libsnappy.so.1...done. Loaded symbols for /opt/couchbase/lib/libsnappy.so.1 Reading symbols from /opt/couchbase/lib/libicuuc.so.44...done. 
Loaded symbols for /opt/couchbase/lib/libicuuc.so.44 Reading symbols from /opt/couchbase/lib/libicudata.so.44...(no debugging symbols found)...done. Loaded symbols for /opt/couchbase/lib/libicudata.so.44 Reading symbols from /opt/couchbase/lib/libicui18n.so.44...done. Loaded symbols for /opt/couchbase/lib/libicui18n.so.44 Core was generated by `/opt/couchbase/bin/memcached -X /opt/couchbase/lib/memcached/stdin_term_handler'. Program terminated with signal 6, Aborted. #0 0x00007fb4f811ad05 in raise () from /lib/x86_64-linux-gnu/libc.so.6 (gdb) t a a bt Thread 26 (Thread 26894): #0 0x00007fb4f81c843d in fdatasync () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fb4f2e8faaf in couch_sync (handle=<value optimized out>) at src/os.c:117 #2 0x00007fb4f314857f in cfs_sync (h=0x48f8ea40) at src/couch-kvstore/couch-fs-stats.cc:88 #3 0x00007fb4f2e89e2f in couchstore_commit (db=0x5dd2ee0) at src/couch_db.c:193 #4 0x00007fb4f313dff6 in CouchKVStore::saveDocs (this=0x3b6e4c00, vbid=731, rev=<value optimized out>, docs=<value optimized out>, docinfos=0x48f8f9a0, docCount=<value optimized out>) at src/couch-kvstore/couch-kvstore.cc:1486 #5 0x00007fb4f313e74b in CouchKVStore::commit2couchstore (this=0x3b6e4c00) at src/couch-kvstore/couch-kvstore.cc:1411 #6 0x00007fb4f313e93a in CouchKVStore::commit (this=0x42) at src/couch-kvstore/couch-kvstore.cc:806 #7 0x00007fb4f30ca4c6 in EventuallyPersistentStore::flushVBucket (this=0x92c5000, vbid=<value optimized out>) at src/ep.cc:1919 #8 0x00007fb4f30f55d9 in doFlush (this=0x9ac59e0, tid=26148) at src/flusher.cc:222 #9 Flusher::step (this=0x9ac59e0, tid=26148) at src/flusher.cc:152 #10 0x00007fb4f3106140 in ExecutorThread::run (this=0x5eacea0) at src/scheduler.cc:148 #11 0x00007fb4f310686d in launch_executor_thread (arg=0x42) at src/scheduler.cc:34 #12 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #13 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #14 0x0000000000000000 in ?? 
() Thread 25 (Thread 26891): #0 0x00007fb4f81c843d in fdatasync () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fb4f2e8faaf in couch_sync (handle=<value optimized out>) at src/os.c:117 #2 0x00007fb4f314857f in cfs_sync (h=0xc1ade20) at src/couch-kvstore/couch-fs-stats.cc:88 #3 0x00007fb4f2e89e03 in couchstore_commit (db=0x5dd36c0) at src/couch_db.c:184 #4 0x00007fb4f313dff6 in CouchKVStore::saveDocs (this=0xb06cc00, vbid=1020, rev=<value optimized out>, docs=<value optimized out>, docinfos=0x4b03c660, docCount=<value optimized out>) at src/couch-kvstore/couch-kvstore.cc:1486 #5 0x00007fb4f313e74b in CouchKVStore::commit2couchstore (this=0xb06cc00) at src/couch-kvstore/couch-kvstore.cc:1411 #6 0x00007fb4f313e93a in CouchKVStore::commit (this=0x79) at src/couch-kvstore/couch-kvstore.cc:806 #7 0x00007fb4f30ca4c6 in EventuallyPersistentStore::flushVBucket (this=0x92c5000, vbid=<value optimized out>) at src/ep.cc:1919 #8 0x00007fb4f30f55d9 in doFlush (this=0x9ac5680, tid=26146) at src/flusher.cc:222 #9 Flusher::step (this=0x9ac5680, tid=26146) at src/flusher.cc:152 #10 0x00007fb4f3106140 in ExecutorThread::run (this=0x5e81ba0) at src/scheduler.cc:148 #11 0x00007fb4f310686d in launch_executor_thread (arg=0x79) at src/scheduler.cc:34 #12 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #13 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #14 0x0000000000000000 in ?? 
() Thread 24 (Thread 26892): #0 0x00007fb4f81c2e93 in poll () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fb4f314b8b5 in CouchNotifier::waitForReadable (this=0x3fd70000, tryOnce=<value optimized out>) at src/couch-kvstore/couch-notifier.cc:629 #2 0x00007fb4f314bfa5 in waitOnce (this=0x3fd70000, vbs=..., file_version=<value optimized out>, header_offset=<value optimized out>, cb=<value optimized out>) at src/couch-kvstore/couch-notifier.cc:674 #3 CouchNotifier::notify_update (this=0x3fd70000, vbs=..., file_version=<value optimized out>, header_offset=<value optimized out>, cb=<value optimized out>) at src/couch-kvstore/couch-notifier.cc:752 #4 0x00007fb4f313e138 in notify_headerpos_update (this=0x3b6e5800, vbid=973, rev=<value optimized out>, docs=<value optimized out>, docinfos=0x2736fd80, docCount=<value optimized out>) at ./src/couch-kvstore/couch-notifier.hh:127 #5 CouchKVStore::saveDocs (this=0x3b6e5800, vbid=973, rev=<value optimized out>, docs=<value optimized out>, docinfos=0x2736fd80, docCount=<value optimized out>) at src/couch-kvstore/couch-kvstore.cc:1499 #6 0x00007fb4f313e74b in CouchKVStore::commit2couchstore (this=0x3b6e5800) at src/couch-kvstore/couch-kvstore.cc:1411 #7 0x00007fb4f313e93a in CouchKVStore::commit (this=0x7fb4f0cd2680) at src/couch-kvstore/couch-kvstore.cc:806 #8 0x00007fb4f30ca4c6 in EventuallyPersistentStore::flushVBucket (this=0x92c5000, vbid=<value optimized out>) at src/ep.cc:1919 #9 0x00007fb4f30f55d9 in doFlush (this=0x9ac5d40, tid=26149) at src/flusher.cc:222 #10 Flusher::step (this=0x9ac5d40, tid=26149) at src/flusher.cc:152 #11 0x00007fb4f3106140 in ExecutorThread::run (this=0x5e81a00) at src/scheduler.cc:148 #12 0x00007fb4f310686d in launch_executor_thread (arg=0x7fb4f0cd2680) at src/scheduler.cc:34 #13 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #14 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #15 0x0000000000000000 in ?? 
() Thread 23 (Thread 26810): #0 0x00007fb4f81d0633 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fb4f8f8ef36 in epoll_dispatch (base=0x5dcc280, tv=<value optimized out>) at epoll.c:404 #2 0x00007fb4f8f7a394 in event_base_loop (base=0x5dcc280, flags=<value optimized out>) at event.c:1558 #3 0x0000000000414c84 in worker_libevent (arg=0x15674f8) at daemon/thread.c:301 #4 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #5 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #6 0x0000000000000000 in ?? () Thread 22 (Thread 3005): #0 0x00007fb4f848af2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007fb4f30bc778 in wait (this=0x81463f0, d=...) at src/syncobject.hh:57 #2 IdleTask::run (this=0x81463f0, d=...) at src/dispatcher.cc:342 #3 0x00007fb4f30bf32a in Dispatcher::run (this=0x8162700) at src/dispatcher.cc:184 #4 0x00007fb4f30bfafd in launch_dispatcher_thread (arg=0x8162754) at src/dispatcher.cc:28 #5 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #6 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #7 0x0000000000000000 in ?? () Thread 21 (Thread 26812): #0 0x00007fb4f81d0633 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fb4f8f8ef36 in epoll_dispatch (base=0x5dcca00, tv=<value optimized out>) at epoll.c:404 #2 0x00007fb4f8f7a394 in event_base_loop (base=0x5dcca00, flags=<value optimized out>) at event.c:1558 #3 0x0000000000414c84 in worker_libevent (arg=0x15676e8) at daemon/thread.c:301 #4 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #5 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #6 0x0000000000000000 in ?? 
() Thread 20 (Thread 26809): #0 0x00007fb4f81d0633 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fb4f8f8ef36 in epoll_dispatch (base=0x5dcc500, tv=<value optimized out>) at epoll.c:404 #2 0x00007fb4f8f7a394 in event_base_loop (base=0x5dcc500, flags=<value optimized out>) at event.c:1558 #3 0x0000000000414c84 in worker_libevent (arg=0x1567400) at daemon/thread.c:301 #4 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #5 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #6 0x0000000000000000 in ?? () Thread 19 (Thread 26890): #0 0x00007fb4f81974ed in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fb4f81c8914 in usleep () from /lib/x86_64-linux-gnu/libc.so.6 #2 0x00007fb4f3101385 in updateStatsThread (arg=<value optimized out>) at src/memory_tracker.cc:31 #3 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #4 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #5 0x0000000000000000 in ?? () Thread 18 (Thread 26813): #0 0x00007fb4f81d0633 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fb4f8f8ef36 in epoll_dispatch (base=0x5dcc780, tv=<value optimized out>) at epoll.c:404 #2 0x00007fb4f8f7a394 in event_base_loop (base=0x5dcc780, flags=<value optimized out>) at event.c:1558 #3 0x0000000000414c84 in worker_libevent (arg=0x15677e0) at daemon/thread.c:301 #4 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #5 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #6 0x0000000000000000 in ?? 
() Thread 17 (Thread 26811): #0 0x00007fb4f81d0633 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fb4f8f8ef36 in epoll_dispatch (base=0x5dccc80, tv=<value optimized out>) at epoll.c:404 #2 0x00007fb4f8f7a394 in event_base_loop (base=0x5dccc80, flags=<value optimized out>) at event.c:1558 #3 0x0000000000414c84 in worker_libevent (arg=0x15675f0) at daemon/thread.c:301 #4 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #5 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #6 0x0000000000000000 in ?? () Thread 16 (Thread 26807): #0 0x00007fb4f848af2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007fb4f6fc5176 in logger_thead_main (arg=<value optimized out>) at extensions/loggers/file_logger.c:368 #2 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #3 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #4 0x0000000000000000 in ?? () Thread 15 (Thread 26806): #0 0x00007fb4f81c12ed in read () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fb4f815c798 in _IO_file_underflow () from /lib/x86_64-linux-gnu/libc.so.6 #2 0x00007fb4f815d7be in _IO_default_uflow () from /lib/x86_64-linux-gnu/libc.so.6 #3 0x00007fb4f81518fa in _IO_getline_info () from /lib/x86_64-linux-gnu/libc.so.6 #4 0x00007fb4f81507ca in fgets () from /lib/x86_64-linux-gnu/libc.so.6 #5 0x00007fb4f79c8b19 in fgets (arg=<value optimized out>) at /usr/include/bits/stdio2.h:255 #6 check_stdin_thread (arg=<value optimized out>) at extensions/daemon/stdin_check.c:37 #7 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #8 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #9 0x0000000000000000 in ?? 
() Thread 14 (Thread 26798): #0 0x00007fb4f81d0633 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fb4f8f8ef36 in epoll_dispatch (base=0x5dcc000, tv=<value optimized out>) at epoll.c:404 #2 0x00007fb4f8f7a394 in event_base_loop (base=0x5dcc000, flags=<value optimized out>) at event.c:1558 #3 0x000000000040c841 in main (argc=<value optimized out>, argv=<value optimized out>) at daemon/memcached.c:7926 Thread 13 (Thread 3004): #0 0x00007fb4f848af2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007fb4f30bc778 in wait (this=0x81465a0, d=...) at src/syncobject.hh:57 #2 IdleTask::run (this=0x81465a0, d=...) at src/dispatcher.cc:342 #3 0x00007fb4f30bf32a in Dispatcher::run (this=0x81621c0) at src/dispatcher.cc:184 #4 0x00007fb4f30bfafd in launch_dispatcher_thread (arg=0x8162214) at src/dispatcher.cc:28 #5 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #6 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #7 0x0000000000000000 in ?? () Thread 12 (Thread 3003): #0 0x00007fb4f848af2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007fb4f30daf7f in wait (this=0x5dc8400) at src/syncobject.hh:57 #2 wait (this=0x5dc8400) at src/syncobject.hh:73 #3 wait (this=0x5dc8400) at src/tapconnmap.hh:169 #4 EventuallyPersistentEngine::notifyPendingConnections (this=0x5dc8400) at src/ep_engine.cc:3379 #5 0x00007fb4f30db063 in EvpNotifyPendingConns (arg=0x5dc8400) at src/ep_engine.cc:1153 #6 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #7 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #8 0x0000000000000000 in ?? () Thread 11 (Thread 2997): #0 0x00007fb4f848af2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007fb4f30bc778 in wait (this=0x6e46900, d=...) at src/syncobject.hh:57 #2 IdleTask::run (this=0x6e46900, d=...) 
at src/dispatcher.cc:342 #3 0x00007fb4f30bf32a in Dispatcher::run (this=0x5e0b6c0) at src/dispatcher.cc:184 #4 0x00007fb4f30bfafd in launch_dispatcher_thread (arg=0x5e0b714) at src/dispatcher.cc:28 #5 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #6 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #7 0x0000000000000000 in ?? () Thread 10 (Thread 2995): #0 0x00007fb4f848af2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007fb4f30daf7f in wait (this=0x5dc6000) at src/syncobject.hh:57 #2 wait (this=0x5dc6000) at src/syncobject.hh:73 #3 wait (this=0x5dc6000) at src/tapconnmap.hh:169 #4 EventuallyPersistentEngine::notifyPendingConnections (this=0x5dc6000) at src/ep_engine.cc:3379 #5 0x00007fb4f30db063 in EvpNotifyPendingConns (arg=0x5dc6000) at src/ep_engine.cc:1153 #6 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #7 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #8 0x0000000000000000 in ?? () Thread 9 (Thread 2996): #0 0x00007fb4f848af2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007fb4f30bc778 in wait (this=0x6e46630, d=...) at src/syncobject.hh:57 #2 IdleTask::run (this=0x6e46630, d=...) at src/dispatcher.cc:342 #3 0x00007fb4f30bf32a in Dispatcher::run (this=0x5e0ba40) at src/dispatcher.cc:184 #4 0x00007fb4f30bfafd in launch_dispatcher_thread (arg=0x5e0ba94) at src/dispatcher.cc:28 #5 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #6 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #7 0x0000000000000000 in ?? () Thread 8 (Thread 2987): #0 0x00007fb4f848af2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007fb4f30bc778 in wait (this=0x8146990, d=...) at src/syncobject.hh:57 #2 IdleTask::run (this=0x8146990, d=...) 
at src/dispatcher.cc:342 #3 0x00007fb4f30bf32a in Dispatcher::run (this=0xbe02000) at src/dispatcher.cc:184 #4 0x00007fb4f30bfafd in launch_dispatcher_thread (arg=0xbe02054) at src/dispatcher.cc:28 #5 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #6 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #7 0x0000000000000000 in ?? () Thread 7 (Thread 2988): #0 0x00007fb4f848af2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007fb4f30bc778 in wait (this=0x8147680, d=...) at src/syncobject.hh:57 #2 IdleTask::run (this=0x8147680, d=...) at src/dispatcher.cc:342 #3 0x00007fb4f30bf32a in Dispatcher::run (this=0xbe02a80) at src/dispatcher.cc:184 #4 0x00007fb4f30bfafd in launch_dispatcher_thread (arg=0xbe02ad4) at src/dispatcher.cc:28 #5 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #6 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #7 0x0000000000000000 in ?? () Thread 6 (Thread 2986): #0 0x00007fb4f848af2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007fb4f30daf7f in wait (this=0x5dc9600) at src/syncobject.hh:57 #2 wait (this=0x5dc9600) at src/syncobject.hh:73 #3 wait (this=0x5dc9600) at src/tapconnmap.hh:169 #4 EventuallyPersistentEngine::notifyPendingConnections (this=0x5dc9600) at src/ep_engine.cc:3379 #5 0x00007fb4f30db063 in EvpNotifyPendingConns (arg=0x5dc9600) at src/ep_engine.cc:1153 #6 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #7 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #8 0x0000000000000000 in ?? 
() Thread 5 (Thread 26896): #0 0x00007fb4f848af2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007fb4f3105fd1 in wait (this=0x5eacb60) at src/syncobject.hh:57 #2 ExecutorThread::run (this=0x5eacb60) at src/scheduler.cc:134 #3 0x00007fb4f310686d in launch_executor_thread (arg=0x5eacba4) at src/scheduler.cc:34 #4 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #5 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #6 0x0000000000000000 in ?? () Thread 4 (Thread 26898): #0 0x00007fb4f848af2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007fb4f3105fd1 in wait (this=0x5eac820) at src/syncobject.hh:57 #2 ExecutorThread::run (this=0x5eac820) at src/scheduler.cc:134 #3 0x00007fb4f310686d in launch_executor_thread (arg=0x5eac864) at src/scheduler.cc:34 #4 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #5 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #6 0x0000000000000000 in ?? () Thread 3 (Thread 26897): #0 0x00007fb4f848af2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007fb4f3105fd1 in wait (this=0x5eac9c0) at src/syncobject.hh:57 #2 ExecutorThread::run (this=0x5eac9c0) at src/scheduler.cc:134 #3 0x00007fb4f310686d in launch_executor_thread (arg=0x5eaca04) at src/scheduler.cc:34 #4 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #5 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #6 0x0000000000000000 in ?? 
() Thread 2 (Thread 26895): #0 0x00007fb4f848af2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007fb4f3105fd1 in wait (this=0x5eacd00) at src/syncobject.hh:57 #2 ExecutorThread::run (this=0x5eacd00) at src/scheduler.cc:134 #3 0x00007fb4f310686d in launch_executor_thread (arg=0x5eacd44) at src/scheduler.cc:34 #4 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #5 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #6 0x0000000000000000 in ?? () Thread 1 (Thread 26893): #0 0x00007fb4f811ad05 in raise () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fb4f811eab6 in abort () from /lib/x86_64-linux-gnu/libc.so.6 #2 0x00007fb4f81137c5 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6 #3 0x00007fb4f30d816a in PersistenceCallback::callback(std::pair<int, long>&) () from /opt/couchbase/lib/memcached/ep.so #4 0x00007fb4f313cc34 in CouchKVStore::commitCallback(CouchRequest **, int, <anonymous enum>) (this=0x840c300, committedReqs=<value optimized out>, numReqs=981, errCode=COUCHSTORE_SUCCESS) at src/couch-kvstore/couch-kvstore.cc:1591 #5 0x00007fb4f313e766 in CouchKVStore::commit2couchstore (this=0x840c300) at src/couch-kvstore/couch-kvstore.cc:1418 #6 0x00007fb4f313e93a in CouchKVStore::commit (this=0x68ae) at src/couch-kvstore/couch-kvstore.cc:806 #7 0x00007fb4f30ca4c6 in EventuallyPersistentStore::flushVBucket (this=0x8c4cc00, vbid=<value optimized out>) at src/ep.cc:1919 #8 0x00007fb4f30f55d9 in doFlush (this=0x9ac46c0, tid=24958) at src/flusher.cc:222 #9 Flusher::step (this=0x9ac46c0, tid=24958) at src/flusher.cc:152 #10 0x00007fb4f3106140 in ExecutorThread::run (this=0x5e81860) at src/scheduler.cc:148 #11 0x00007fb4f310686d in launch_executor_thread (arg=0x68ae) at src/scheduler.cc:34 #12 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #13 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #14 0x0000000000000000 in ?? 
()
root(couchbase)@10.3.121.98 /cores/7938-2.0.2-789-core.memcached.26798
https://s3.amazonaws.com/bugdb/jira/MB-7938/13460c18/10.3.121.92-582013-546-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-7938/13460c18/10.3.121.93-582013-541-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-7938/13460c18/10.3.121.94-582013-543-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-7938/13460c18/10.3.121.95-582013-545-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-7938/13460c18/10.3.121.96-582013-544-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-7938/13460c18/10.3.121.97-582013-547-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-7938/13460c18/10.3.121.98-582013-543-diag.zip |
| Comment by Maria McDuff [ 08/May/13 ] |
|
upgrading to blocker. also see MB-7735. |
| Comment by Mike Wiederhold [ 08/May/13 ] |
| This crash has nothing to do with MB-7735. |
| Comment by Mike Wiederhold [ 08/May/13 ] |
| Please take a look at this issue. I can take a look at it next week if you don't have time to get it resolved by then. |
| Comment by Andrei Baranouski [ 10/May/13 ] |
|
see the same crash on 2.0.2-787-rel http://qa.hq.northscale.net/job/centos-64-2.0-failover-tests-P0/609/consoleFull
Thread 1 (Thread 0x46a74940 (LWP 22854)): #0 0x0000003828630265 in raise () from /lib64/libc.so.6 #1 0x0000003828631d10 in abort () from /lib64/libc.so.6 #2 0x00000038286296e6 in __assert_fail () from /lib64/libc.so.6 #3 0x00002aaaaaf0dd0a in PersistenceCallback::callback(std::pair<int, long>&) () from /opt/couchbase/lib/memcached/ep.so #4 0x00002aaaaaf65a84 in CouchKVStore::commitCallback (this=0x2aaabdb7e300, committedReqs=<value optimized out>, numReqs=31, errCode=COUCHSTORE_SUCCESS) at src/couch-kvstore/couch-kvstore.cc:1591 #5 0x00002aaaaaf67596 in CouchKVStore::commit2couchstore (this=0x2aaabdb7e300) at src/couch-kvstore/couch-kvstore.cc:1418 #6 0x00002aaaaaf6776a in CouchKVStore::commit (this=<value optimized out>) at src/couch-kvstore/couch-kvstore.cc:806 #7 0x00002aaaaaf01aa6 in EventuallyPersistentStore::flushVBucket (this=0x1d1f2c00, vbid=599) at src/ep.cc:2059 #8 0x00002aaaaaf2ae59 in doFlush (this=0x86f3a40, d=..., tid=...) at src/flusher.cc:226 #9 Flusher::step (this=0x86f3a40, d=..., tid=...) at src/flusher.cc:157 #10 0x00002aaaaaef582a in Dispatcher::run (this=0x190a1180) at src/dispatcher.cc:184 #11 0x00002aaaaaef5fed in launch_dispatcher_thread (arg=<value optimized out>) at src/dispatcher.cc:28 #12 0x000000382920673d in start_thread () from /lib64/libpthread.so.0 #13 0x00000038286d44bd in clone () from /lib64/libc.so.6 vms: root(couchbase)@10.1.3.118:/tmp/core.memcached.17604 & /tmp/core.memcached.23647 root(couchbase)@10.1.3.117:/tmp/core.memcached.730 |
| Comment by Jin Lim [ 10/May/13 ] |
|
* Both the first crash reported in March and the latest crash were caused by incorrect accounting of the disk queue stat. It appears that the stat arithmetically underflowed. However, I believe the root cause that led to this condition was different for each crash (a few changes were made in this code path between March and now, e.g. KVShard).
* A fix is uploaded for review: http://review.couchbase.org/#/c/26242/
* QE (Andrei), please pick up the toy build 2.0.0-MRW31-toy-community at http://builds.hq.northscale.net:8010/builders/ec2-centos-x64_toy-couchstore-builder/builds/165 and validate the above fix (as usual, we could not reproduce the same crash on the dev side). Many thanks!
* Will mark this as fixed after code review and QE validation. |
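To make "arithmetically underflowed" concrete: if a stat such as the disk queue size is kept in an unsigned 64-bit integer (an assumption here; the comment does not name the type), one decrement too many wraps it to a huge value instead of going negative. The sketch below is a hypothetical toy counter, not ep-engine code and not the fix under review at http://review.couchbase.org/#/c/26242/; it only contrasts an unchecked decrement with a guarded one.

```python
# Hypothetical toy model; simulates a C/C++ uint64_t with a bit mask.
UINT64_MASK = (1 << 64) - 1


class DiskQueueStat:
    """Toy disk-queue size counter (illustration only, not ep-engine code)."""

    def __init__(self) -> None:
        self.value = 0

    def incr(self, n: int = 1) -> None:
        self.value = (self.value + n) & UINT64_MASK

    def decr_unchecked(self, n: int = 1) -> None:
        # Like `stat -= n` on an unsigned type: wraps around on underflow.
        self.value = (self.value - n) & UINT64_MASK

    def decr_guarded(self, n: int = 1) -> bool:
        # Refuses to go below zero and reports the accounting error instead.
        if n > self.value:
            return False
        self.value -= n
        return True


stat = DiskQueueStat()
stat.incr(2)
stat.decr_unchecked(3)  # one decrement more than there were increments
print(stat.value)       # 18446744073709551615, i.e. 2**64 - 1

safe = DiskQueueStat()
safe.incr(2)
print(safe.decr_guarded(3), safe.value)  # False 2
```

A wrapped value this large is usually easy to spot in stats output, which is one reason an unsigned underflow tends to surface as an obviously impossible queue size rather than a small negative number.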
| Comment by Maria McDuff [ 10/May/13 ] |
| andrei, pls provide test result from this toy build. thanks. |
| Comment by Andrei Baranouski [ 15/May/13 ] |
| toy build hangs on rebalance http://qa.hq.northscale.net/job/centos-64-2.0-failover-tests-P0/614/console |
| Comment by Jin Lim [ 15/May/13 ] |
|
|
| Comment by Andrei Baranouski [ 15/May/13 ] |
|
okay, but 2.0.0-MRW31 hangs on rebalance: http://qa.hq.northscale.net/job/centos-64-2.0-failover-tests-P0/614/consoleFull
Only one test passed in this run, so I can't verify the entire set of tests where we had the crash. I guess 2.0.0-MRW31 doesn't contain the fix for |
| Comment by Jin Lim [ 15/May/13 ] |
| Yes, thanks. The crash must have occurred before the hang condition, but I agree we need to validate the fix with a complete run of all relevant tests. |
| Comment by Jin Lim [ 17/May/13 ] |
| The latest build, 806, no longer shows the flushVBucket crash. Please confirm and close it. Thanks! |
[MB-7972] Severe performance dip during rebalancing observed in demo environment Created: 26/Mar/13 Updated: 17/May/13
|
| Status: | In Progress |
| Project: | Couchbase Server |
| Component/s: | couchbase-bucket, moxi, ns_server |
| Affects Version/s: | 2.0.1 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Perry Krug | Assignee: | Chiyoung Seo |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | customer, performance | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Description |
|
Environment:
-memcachetest running through client-side Moxi.
-memcachetest workload: /root/mctest/Users/ingenthr/tmp/memcachetest-centos_x86 -h localhost -i 1000 -t 4 -K test -M 10 -F -S -l
-Moxi config: CLI: /opt/moxi/bin/moxi -u nobody -Z usr=Administrator,pwd=password http://10.197.1.147:8091/pools/default/bucketsStreaming/beer-sample -vv -O /var/log/moxi, config: port_listen=11211, default_bucket_name=default, downstream_max=1024, downstream_conn_max=4, downstream_conn_queue_timeout=200, downstream_timeout=5000, wait_queue_timeout=200, connect_max_errors=5, connect_retry_interval=30000, connect_timeout=400, auth_timeout=100, cycle=200
-4 nodes of Couchbase Server 2.0.1, c1.medium EC2 instances, beer-sample database bucket
-2 nodes of the same config added to the cluster and rebalanced
Observation:
-Workload very steady around 8k ops/sec before rebalance
-Workload drops to around 4k ops/sec during rebalance. Both gets and sets are affected; compaction is not running when load initially drops (it kicks in eventually). Workload is very choppy during rebalance.
-Workload returns to 8k ops/sec after rebalance
Client memcachetest and moxi logs:
https://s3.amazonaws.com/customers.couchbase.com/demo_performance/client1.zip
https://s3.amazonaws.com/customers.couchbase.com/demo_performance/client2.zip
Cluster logs:
https://s3.amazonaws.com/customers.couchbase.com/demo_performance/node1.zip
https://s3.amazonaws.com/customers.couchbase.com/demo_performance/node2.zip
https://s3.amazonaws.com/customers.couchbase.com/demo_performance/node3.zip
https://s3.amazonaws.com/customers.couchbase.com/demo_performance/node4.zip
https://s3.amazonaws.com/customers.couchbase.com/demo_performance/node5.zip (added to first 4)
https://s3.amazonaws.com/customers.couchbase.com/demo_performance/node6.zip (added to first 4) |
| Comments |
| Comment by Pavel Paulau [ 27/Mar/13 ] |
| c1.medium has 1.7 GB RAM. Are you sure? That's probably enough for the beer samples, but generally it's a very strange setup. |
| Comment by Pavel Paulau [ 27/Mar/13 ] |
| Ok, I reproduced that. Will try to investigate. |
| Comment by Sharon Barr [ 28/Mar/13 ] |
|
related bug is |
| Comment by Pavel Paulau [ 10/Apr/13 ] |
|
I tried a physical environment with SSDs. Instead of memcachetest and moxi I used YCSB, which is based on the Java SDK.
It's a non-DGM rebalance from 3 to 4 nodes, just 5M items. However, there is a noticeable drop in throughput (ycsb_01.png) in my case as well. Performance improves shortly after that, but throughput is still choppy (ycsb_02.png). |
| Comment by Pavel Paulau [ 11/Apr/13 ] |
|
Answering Jin's question: "Was the performance regression compared against the same test running on memcachetest and moxi, or against an existing baseline? EP Engine tweaks priorities among I/Os from rebalancing vbuckets and front loads, which might have caused choppy throughput for front loads. However, everything should have gone back to normal after the rebalancing. Did you see different behavior on your latest test?" It's not necessarily a regression. At least, I haven't tried 2.0 yet (and I'm going to do that). Perry reported the issue, so there is no baseline so far. Both Perry and I observed that everything returns to normal once rebalance finishes. |
| Comment by Pavel Paulau [ 11/Apr/13 ] |
| Well, 2.0 looks perfect, no dip, no choppy throughput. |
| Comment by Anil Kumar [ 11/Apr/13 ] |
|
Assigning this to Jin. We need resolution from EP-Engine team. |
| Comment by Perry Krug [ 12/Apr/13 ] |
| Pavel, I had reproduced this behavior on both 2.0 and 2.0.1...is it possible something changed with your test? |
| Comment by Chiyoung Seo [ 23/Apr/13 ] |
|
I tried to reproduce this issue, but didn't see the performance drop in the following scenario:
1) Set up a two-node cluster and loaded 1M items with a mixed load
2) Added two new nodes to the cluster under a constant mixed load (50% SET and 50% GET)
I saw that performance dropped at the beginning of the rebalance, but came back after a few seconds. Let me continue to debug this issue with different scenarios. |
| Comment by Chiyoung Seo [ 23/Apr/13 ] |
|
Pavel,
I'd like the perf team to measure throughput and latency during rebalance at large scale through our perf test framework. I didn't see this issue in my simple scenarios. Thanks |
| Comment by Pavel Paulau [ 25/Apr/13 ] |
|
Chiyoung, the performance testing framework uses moxi, and I wouldn't rely on it in this case. How do you simulate the workload? |
| Comment by Chiyoung Seo [ 25/Apr/13 ] |
|
Pavel, I just used mcsoda with moxi and a mixed load, and didn't see this issue. As you know, we measured latency and throughput during rebalance as part of the performance tests for the 2.0 GA release, but didn't see this issue. Ronnie showed me lots of graphs in the past. I just want to find out whether this performance issue is new in 2.0.1 or 2.0.2. Thanks |
| Comment by Perry Krug [ 26/Apr/13 ] |
|
I just reproduced this myself again on the latest 2.0.2 (build 778). See the attached screenshot showing performance drop from a very steady 8k ops/sec down to 4k and then it stays very low and choppy for many minutes until the rebalance completes and it immediately picks back up (see second screenshot). All of the details of the environment and workload are above, what else do you need in order to reproduce and/or diagnose this? |
| Comment by Pavel Paulau [ 29/Apr/13 ] |
|
I tried two different java workload generators: wrath from AOL and RoadRunner from Michael Nitschinger.
Both demonstrate certain issues during rebalance. However there is a huge difference between 2.0.0 and 2.0.1/2.0.2 throughput patterns. See attached screenshot for details. |
| Comment by Pavel Paulau [ 29/Apr/13 ] |
|
Attached packages and sample config file for wrath:
$ java -jar wrath.jar
$ java -jar RoadRunner-0.3-jar-with-dependencies.jar -c 16 -d 10000000 -n 172.23.96.15,172.23.96.16,172.23.96.17 -s 1 S 2048 --ratio 0 |
| Comment by Chiyoung Seo [ 29/Apr/13 ] |
|
Thanks Pavel.
I saw the same issue in my tests too when I rebalanced in from 3 -> 4 nodes with a 2.0.2 build. One of the interesting things was that the erlang beam.smp CPU usage on a couple of nodes frequently spiked above 200% and stayed there for a while during rebalance. The memcached CPU usage usually remains between 50% - 100%. Each node has four cores. I think we need to run this test through our performance framework to monitor the CPU usage of both the beam.smp and memcached processes on each node before / during / after rebalance. Please let me know if you have any other suggestions. |
| Comment by Pavel Paulau [ 30/Apr/13 ] |
|
It makes sense to me. By the way, do you observe the same issue with 2.0.1? |
| Comment by Chiyoung Seo [ 30/Apr/13 ] |
|
Pavel, Yes, I also saw the same issue with 2.0.1. |
| Comment by Wayne Siu [ 02/May/13 ] |
| Hi Pavel, do you have an update on this? Thanks. |
| Comment by Pavel Paulau [ 06/May/13 ] |
|
CPU stats for memcached and beam.smp.
Total CPU utilization is collected via ns_server, so its maximum is 100%; the per-process values can reach 2400% (I have 24 cores). |
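The two scales differ because a per-process reading counts 100% per fully busy core, while the ns_server total is normalized to the whole machine. A small conversion sketch, assuming that per-core convention (the one tools like top use), makes the readings in this thread comparable; the function name is hypothetical.

```python
def per_process_to_total(process_pct: float, n_cores: int) -> float:
    """Convert a per-process CPU reading (100% == one busy core)
    into a share of the whole machine (100% == all cores busy)."""
    if n_cores <= 0:
        raise ValueError("n_cores must be positive")
    return process_pct / n_cores


# On the 24-core box above, 2400% per-process is the ceiling.
print(per_process_to_total(2400, 24))  # 100.0
# A 200% beam.smp spike on a 4-core node is half the machine.
print(per_process_to_total(200, 4))    # 50.0
```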
| Comment by Chiyoung Seo [ 06/May/13 ] |
|
Thanks Pavel. Clearly, this isn't CPU bound. In your tests, ops / sec dropped by 15 - 20% during rebalance and came back afterwards. Btw, I also ran the full backup client against a single-node cluster to see if the frontend ops / sec drops while the backup is performed through TAP, but didn't see any performance drop, although memcached CPU usage increased by 30 - 60% during the backup period. I ran this additional test to see whether TAP-based streaming can significantly affect frontend performance. Let me continue to work on this issue. |
| Comment by Chiyoung Seo [ 07/May/13 ] |
| I even saw the ops / sec drop in 2.0.0 and 1.8.1 during rebalance. |
| Comment by Chiyoung Seo [ 07/May/13 ] |
|
We saw a fair number of "not_my_vbucket" errors when rebalancing in from 3 to 4 nodes:
Chiyoung-MacBook:ep-engine chiyoung$ ./management/cbstats 10.5.2.35:11211 all | grep not
ep_num_not_my_vbuckets: 1018
The rebalance took 3 minutes, which means there were approximately 5.7 "not_my_vbucket" errors per second (1018 / 180 s). I ran the mcsoda load generator with 8 threads and saw a 20% ~ 40% performance drop during the rebalance. I need to discuss with Steve (moxi's inventor) how we can handle these not_my_vbucket errors better. My understanding is that if moxi receives a "not_my_vbucket" error from a given downstream node, it simply tries each of the other nodes, one at a time. |
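The rate estimate can be reproduced from the cbstats output. A minimal sketch, assuming the "name: value" line format shown above and, as the comment does, that all 1018 errors accumulated during the 3-minute rebalance. Since the counter appears to be cumulative, taking one snapshot before and one after the rebalance and subtracting is the safer measurement. `parse_cbstats` and `errors_per_second` are hypothetical helpers, not part of cbstats.

```python
from __future__ import annotations

import re


def parse_cbstats(output: str) -> dict[str, str]:
    """Parse cbstats-style text output with one "name: value" pair per line."""
    stats = {}
    for line in output.splitlines():
        m = re.match(r"\s*([^:\s]+):\s*(.*\S)\s*$", line)
        if m:
            stats[m.group(1)] = m.group(2)
    return stats


def errors_per_second(count_before: int, count_after: int, seconds: float) -> float:
    """Average error rate between two snapshots of a cumulative counter."""
    return (count_after - count_before) / seconds


sample = " ep_num_not_my_vbuckets: 1018\n"
after = int(parse_cbstats(sample)["ep_num_not_my_vbuckets"])
rate = errors_per_second(0, after, 3 * 60)
print(f"{rate:.2f} not_my_vbucket errors/sec")  # 5.66 not_my_vbucket errors/sec
```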
| Comment by Chiyoung Seo [ 09/May/13 ] |
|
To debug this issue further, I implemented a simple python client that intentionally sends some GET / SET requests to their replica vbuckets, so that it gets "NOT_MY_VBUCKET" responses from the nodes and then resends the requests to their corresponding active nodes. When the ratio of incorrect GET / SET requests is under 10%, I didn't see a significant impact on ops / sec. The rate of "NOT_MY_VBUCKET" responses during rebalance is obviously much lower than 10%. However, in my tests, I assume that the new partition map is immediately available when the client resends GET / SET requests to their current active nodes. |
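The client itself is not attached to the issue. The following is a hypothetical, in-memory sketch of the same experiment: every N-th request goes to the node holding the replica vbucket, the NOT_MY_VBUCKET reply triggers a resend to the active node, and two counters show the extra round trips. The tiny four-vbucket map, the simplified CRC32 hash, and all class names are illustrative assumptions rather than the real client or libvbucket. Like the test described above, it assumes the client already knows the current active map.

```python
from __future__ import annotations

import zlib
from dataclasses import dataclass, field

NOT_MY_VBUCKET = "NOT_MY_VBUCKET"
SUCCESS = "SUCCESS"
NUM_VBUCKETS = 4  # tiny map for illustration only


def vbucket_for(key: str) -> int:
    # Simplified key-to-vbucket hash used only in this sketch.
    return zlib.crc32(key.encode()) % NUM_VBUCKETS


@dataclass
class FakeNode:
    """In-memory stand-in for a server node that owns some active vbuckets."""
    name: str
    active: set[int]
    data: dict[str, str] = field(default_factory=dict)

    def handle(self, vbid: int, op: str, key: str, value: str | None = None):
        if vbid not in self.active:
            return NOT_MY_VBUCKET, None
        if op == "set":
            self.data[key] = value
            return SUCCESS, None
        return SUCCESS, self.data.get(key)


class MisroutingClient:
    """Sends every `misroute_every`-th request to the replica node first."""

    def __init__(self, nodes, active_map, replica_map, misroute_every: int):
        self.nodes = nodes
        self.active_map = active_map    # vbid -> node holding the active copy
        self.replica_map = replica_map  # vbid -> node holding the replica
        self.misroute_every = misroute_every
        self.ops = 0   # logical GET/SET operations issued
        self.sent = 0  # requests actually sent, including resends

    def request(self, op: str, key: str, value: str | None = None):
        self.ops += 1
        vbid = vbucket_for(key)
        misroute = self.ops % self.misroute_every == 0
        target = self.replica_map[vbid] if misroute else self.active_map[vbid]
        self.sent += 1
        status, result = self.nodes[target].handle(vbid, op, key, value)
        if status == NOT_MY_VBUCKET:
            # Resend to the active node, assuming the map is already current.
            self.sent += 1
            status, result = self.nodes[self.active_map[vbid]].handle(vbid, op, key, value)
        return status, result


nodes = {"a": FakeNode("a", {0, 1}), "b": FakeNode("b", {2, 3})}
active_map = {0: "a", 1: "a", 2: "b", 3: "b"}
replica_map = {0: "b", 1: "b", 2: "a", 3: "a"}
client = MisroutingClient(nodes, active_map, replica_map, misroute_every=10)

for i in range(1000):
    client.request("set", f"key{i}", str(i))
print(client.ops, client.sent)  # 1000 1100
```

With a 10% misroute ratio every misrouted operation costs exactly one extra round trip, so the request count grows by 10%. The sketch deliberately leaves out the part the comment flags as untested: the time a real client spends waiting for a new vbucket map.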
| Comment by Chiyoung Seo [ 09/May/13 ] |
|
Here is a summary of my findings so far:
1) When the number of threads in the client load generator is <= 8, I saw ops / sec drop by 20% - 40% during rebalance.
2) When the number of threads is > 8 (I tried 16 and 32 threads), ops / sec didn't drop much (less than 5%) during rebalance.
3) When ops / sec starts to drop, I observed that the number of "NOT_MY_VBUCKET" errors increased suddenly.
I used mcsoda and client-side moxi and did 3->4 rebalance-in or 4->3 rebalance-out operations in my tests. I'm actually running out of ideas, but I don't think this issue is a blocker as long as we don't see much of a drop with more threads in the client load generator. As you know, most of our customers run many more than 8 threads in their applications. Here are my suggestions:
1) We may need to ask the perf team to increase the number of threads in the client load generator to see if they observe the same behavior as in my tests.
2) We need to measure ops / sec during rebalance on a large cluster (at least 10 nodes). In our tests we simply use 3 or 4 nodes, which is a micro test case.
3) I also suggest that the perf team monitor "NOT_MY_VBUCKET" errors during rebalance in order to match them with the performance drop over time. |
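Suggestion 3 amounts to turning the cumulative NOT_MY_VBUCKET counter into per-interval increments and lining them up with ops/sec samples from the same intervals. A minimal sketch of that bookkeeping, using a plain Pearson correlation; the helper names are hypothetical, and the sample numbers are made up purely to exercise the code, not measurements from this issue.

```python
from __future__ import annotations

from math import sqrt


def deltas(cumulative: list[int]) -> list[int]:
    """Turn samples of a cumulative counter into per-interval increments."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with >= 2 points")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)


# Made-up samples, one every 10 s, purely to exercise the code.
nmvb_cumulative = [0, 0, 5, 40, 90, 100, 102]
ops_per_sec = [8000, 7900, 6000, 4200, 4300, 7600]  # one value per interval
r = pearson(deltas(nmvb_cumulative), ops_per_sec)
print(f"correlation: {r:.2f}")
```

A strongly negative coefficient on real samples would support the idea that NOT_MY_VBUCKET bursts and the throughput dip move together; it would not by itself show which causes which.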
| Comment by Pavel Paulau [ 10/May/13 ] |
|
1. From my perspective, I can confirm that adding more worker threads does help a lot.
2. At this point I can hardly help with a > 10-node cluster since I don't have such resources.
3. Will check that. |
| Comment by Maria McDuff [ 13/May/13 ] |
| assigning to wayne to identify resources for item 2 (see pavel's comment). if no resources, can we do this in ec2? or should it be physical hw? |
| Comment by Perry Krug [ 13/May/13 ] |
|
Team,
We're still missing something here. I just tried again with 16 threads... no difference. The main point in my situation that is not being addressed is that the performance drop is on a bucket that is _NOT_ being rebalanced at first. Here is my setup (as I have detailed above):
-4 node cluster
-2 buckets, beer-sample and gamesim-sample
-memcachetest through Moxi to beer-sample
-Add 2 nodes and rebalance
-You can see that gamesim-sample starts moving its vbuckets first, but the performance drop is on beer-sample immediately
-I see no "not_my_vbuckets" reported on this bucket during the performance drop, and only a few hundred reported by the end of the second bucket's rebalance
Can we please confirm that we see the same behavior in the exact same environment? I will happily hand over a RightScale environment that reproduces this 100% of the time; it is a major problem for the sales team and their demos. It may very well be a problem with Moxi, but I don't have a load generator that works with rebalance that I can try out (MB-7991), and I'm surprised that no one has seen this before. |
| Comment by Pavel Paulau [ 13/May/13 ] |
|
I can confirm the performance drop when running a test against a bucket that is not being rebalanced. I used Java- and Python-based workload generators; both avoid moxi.
https://github.com/daschl/RoadRunner
https://github.com/pavel-paulau/spring |
| Comment by Maria McDuff [ 13/May/13 ] |
| chiyoung, assigning back to you. can you repro per perry's comment? |
| Comment by Chiyoung Seo [ 14/May/13 ] |
|
Maria,
I observed the same issue, but don't know how that can happen. As mentioned, I debugged this issue on the engine side, but couldn't find anything suspicious. I'm running out of ideas. Please assign it to other engineers for a second investigation. |
| Comment by Maria McDuff [ 14/May/13 ] |
|
per bug triage, mike to take a look at and will reach out to trond if needed.
|
| Comment by Maria McDuff [ 16/May/13 ] |
| per bug triage, put in jin's queue. when we get a stable 2.0.2 build with rebalance scheduling mods, this will need to be re-tested by Pavel. Jin will assign to him as soon as it is ready for Pavel. |
[MB-8316] Fix cbtransfer --help output Created: 17/May/13 Updated: 17/May/13 |
|
| Status: | Open |
| Project: | Couchbase Server |
| Component/s: | None |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | None |
| Security Level: | Public |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Karen Zeller | Assignee: | Anil Kumar |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
Need to Engineering to update descriptions for cbtransfer --h to match with documentation. We just edited these for 2.0.2 int he docs and made them readable/understandable.
Not sure which release this should be in.

-h, --help
    Command help
-b BUCKET_SOURCE
    Single named bucket from the source cluster to transfer
-B BUCKET_DESTINATION, --bucket-destination=BUCKET_DESTINATION
    Single named bucket on the destination cluster that receives the transfer. This allows you to transfer to a bucket with a different name from your source bucket. If not provided, defaults to the same name as the source bucket.
-i ID, --id=ID
    Transfer only items that match a vbucketID
-k KEY, --key=KEY
    Transfer only items with keys that match a regexp
-n, --dry-run
    No actual transfer; just validate parameters, files, connectivity and configurations
-u USERNAME, --username=USERNAME
    REST username for the source cluster or server node
-p PASSWORD, --password=PASSWORD
    REST password for the cluster or server node
-t THREADS, --threads=THREADS
    Number of concurrent worker threads performing the transfer. Defaults to 4.
-v, --verbose
    Verbose logging; provide more verbosity
-x EXTRA, --extra=EXTRA
    Provide extra, uncommon config parameters
--single-node
    Transfer from a single server node in a source cluster. This single server node is a source node URL.
--source-vbucket-state=SOURCE_VBUCKET_STATE
    Only transfer from source vbuckets in this state, such as 'active' (default) or 'replica'. Must be used with a Couchbase cluster as the source.
--destination-vbucket-state=DESTINATION_VBUCKET_STATE
    Only transfer to destination vbuckets in this state, such as 'active' (default) or 'replica'. Must be used with a Couchbase cluster as the destination.
--destination-operation=DESTINATION_OPERATION
    Perform this operation on transfer. 'set' will override an existing document, 'add' will not override, 'get' will load all keys transferred from a source cluster into the caching layer at the destination.

The following are extra, specialized command options you use in the form cbtransfer -x [EXTRA OPTIONS]:

batch_max_bytes=400000    Transfer this # of bytes per batch.
batch_max_size=1000       Transfer this # of documents per batch.
cbb_max_mb=100000         Split the backup file on the destination cluster if it exceeds this many MB.
max_retry=10              Max number of sequential retries if the transfer fails.
nmv_retry=1               0 or 1, where 1 retries the transfer after a NOT_MY_VBUCKET message. Default of 1.
recv_min_bytes=4096       Number of bytes for every TCP/IP batch transferred.
report=5                  Number of batches transferred before updating the progress bar in the console.
report_full=2000          Number of batches transferred before emitting progress information in the console.
try_xwm=1                 As of 2.0.2, transfer documents with metadata. 1 is the default. 0 should only be used if you transfer from 1.8.x to 1.8.x.

For 2.0.2:

data_only=0               For value 1, only transfer data from a backup file or cluster.
design_doc_only=0         For value 1, transfer design documents only from a backup file or cluster. Defaults to 0. |
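A minimal sketch of how the -x extras listed above combine with their defaults, assuming the option takes comma-separated key=value pairs (for example `-x batch_max_size=500,design_doc_only=1`). The defaults are copied from the list above; the helper name `parse_extra` is hypothetical, and this is not cbtransfer's actual parser.

```python
# Hypothetical sketch: defaults copied from the cbtransfer -x option list above.
EXTRA_DEFAULTS = {
    "batch_max_bytes": 400000,
    "batch_max_size": 1000,
    "cbb_max_mb": 100000,
    "max_retry": 10,
    "nmv_retry": 1,
    "recv_min_bytes": 4096,
    "report": 5,
    "report_full": 2000,
    "try_xwm": 1,
    "data_only": 0,
    "design_doc_only": 0,
}
# Options documented as taking only 0 or 1.
BINARY_FLAGS = {"nmv_retry", "try_xwm", "data_only", "design_doc_only"}


def parse_extra(extra: str) -> dict:
    """Merge an assumed comma-separated 'key=value' string over the defaults."""
    opts = dict(EXTRA_DEFAULTS)
    for pair in filter(None, (p.strip() for p in extra.split(","))):
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        if key not in opts:
            raise ValueError(f"unknown extra option {key!r}")
        opts[key] = int(value)
        if key in BINARY_FLAGS and opts[key] not in (0, 1):
            raise ValueError(f"{key} must be 0 or 1")
    return opts
```

Unknown keys are rejected rather than ignored, so a typo such as `batch_max_sizes` fails loudly instead of silently falling back to the default.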
[MB-8154] ep_queue_size is not decreasing 20 mins after load was stopped Created: 25/Apr/13 Updated: 17/May/13 Resolved: 17/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | couchbase-bucket |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Critical |
| Reporter: | Iryna Mironava | Assignee: | Mike Wiederhold |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
CentOS 64-bit
build 2.0.2-772-rel <manifest><remote name="couchbase" fetch="git://github.com/couchbase/"/><remote name="membase" fetch="git://github.com/membase/"/><remote name="apache" fetch="git://github.com/apache/"/><remote name="erlang" fetch="git://github.com/erlang/"/><default remote="couchbase" revision="master"/><project name="tlm" path="tlm" revision="14fb7cc05baf418a57d33ab7dd0e7239645ec156"><copyfile src="Makefile.top" dest="Makefile"/></project><project name="bucket_engine" path="bucket_engine" revision="2a797a8d97f421587cce728f2e6aa2cd42c8fa26"/><project name="ep-engine" path="ep-engine" revision="e38e9e49855362bcab0fa72258d888cf2423e4d5"/><project name="libconflate" path="libconflate" revision="c0d3e26a51f25a2b020713559cb344d43ce0b06c"/><project name="libmemcached" path="libmemcached" revision="ca739a890349ac36dc79447e37da7caa9ae819f5" remote="membase"/><project name="libvbucket" path="libvbucket" revision="026c79ae424a6daed4bb9345e86cc8fc21759b28"/><project name="couchbase-cli" path="couchbase-cli" revision="af83ea2e04736c1e9977f59bdba3f2e3390a86d8" remote="couchbase"/><project name="memcached" path="memcached" revision="f5f43c6971d88c839ee78bcf87d6e7f177cef7b4" remote="membase"/><project name="moxi" path="moxi" revision="4b391021af7a453bf88716724d2c644916ebd969"/><project name="ns_server" path="ns_server" revision="3d51cec3c9bc31e9d4d4dd496993aa5e9c39a00b"/><project name="portsigar" path="portsigar" revision="159b6179ea8a3c2075ee9eb2afa6f91c98c0fda6"/><project name="sigar" path="sigar" revision="63a3cd1b316d2d4aa6dd31ce8fc66101b983e0b0"/><project name="couchbase-examples" path="couchbase-examples" revision="cd9c8600589a1996c1ba6dbea9ac171b937d3379"/><project name="couchbase-python-client" path="couchbase-python-client" revision="006c1aa8b76f6bce11109af8a309133b57079c4c"/><project name="couchdb" path="couchdb" revision="586e4bb73b92db4362192616370c4e3edb8c34a0"/><project name="couchdbx-app" path="couchdbx-app" 
revision="cf709acdb8ee24cef158a2007189184e1e0f8016"/><project name="couchstore" path="couchstore" revision="ddc4ba05ac9459994464aac973f5815abb9d8aa6"/><project name="geocouch" path="geocouch" revision="ed9ad43aa361df0829262fef811b5236331b44c8"/><project name="testrunner" path="testrunner" revision="96018840bf35a31ae43bc2c409cd6012ac27879e"/><project name="otp" path="otp" revision="b6dc1a844eab061d0a7153d46e7e68296f15a504" remote="erlang"/><project name="icu4c" path="icu4c" revision="26359393672c378f41f2103a8699c4357c894be7" remote="couchbase"/><project name="snappy" path="snappy" revision="5681dde156e9d07adbeeab79666c9a9d7a10ec95" remote="couchbase"/><project name="v8" path="v8" revision="447decb75060a106131ab4de934bcc374648e7f2" remote="couchbase"/><project name="gperftools" path="gperftools" revision="8f60ba949fb8576c530ef4be148bff97106ddc59" remote="couchbase"/><project name="pysqlite" path="pysqlite" revision="0ff6e32ea05037fddef1eb41a648f2a2141009ea" remote="couchbase"/></manifest> |
||
| Description |
|
One default bucket, 3 replicas, 4-node cluster.
Loaded 1M items; 20 minutes after the load finished, ep_queue_size is still 16. Attaching logs. |
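A minimal sketch of the check this report describes, polling ep_queue_size until it reaches 0 or a timeout expires. It assumes cbstats text output has one `stat_name: value` pair per line (as from `cbstats HOST:11210 all`); `fetch_stats` is a hypothetical callable that returns that text. As the comments below note, the stuck value turned out to be a stat bug, so this only checks the reported stat, not the real disk write queue.

```python
import time
from typing import Callable, Dict


def parse_cbstats(text: str) -> Dict[str, str]:
    """Parse assumed 'stat_name: value' lines into a dict (split on the first colon)."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = value.strip()
    return stats


def wait_for_queue_drain(fetch_stats: Callable[[], str], timeout_s: float,
                         poll_s: float = 5.0) -> bool:
    """Return True once ep_queue_size reads 0, False if timeout_s elapses first."""
    deadline = time.monotonic() + timeout_s
    while True:
        # KeyError here means the stat is missing, which should not count as drained.
        if int(parse_cbstats(fetch_stats())["ep_queue_size"]) == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```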
| Comments |
| Comment by Iryna Mironava [ 25/Apr/13 ] |
|
logs: https://s3.amazonaws.com/bugdb/jira/MB-8154/16fc64ab/172.27.33.10-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-8154/16fc64ab/172.27.33.11-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-8154/16fc64ab/172.27.33.12-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-8154/16fc64ab/172.27.33.13-diag.zip |
| Comment by Chiyoung Seo [ 26/Apr/13 ] |
| I looked at the logs, and it seems to me that there are actually no dirty items in the disk write queue. This is just a stat bug. I will look into it further to see where the stat is not decremented correctly. |
| Comment by Mike Wiederhold [ 02/May/13 ] |
|
Chiyoung, I've been looking into a similar issue. If you don't have time to look at this please feel free to assign it to me. |
| Comment by Chiyoung Seo [ 02/May/13 ] |
| This is a stat update bug. I don't think we should promote it to critical. Please don't change the priority without understanding this issue and discussing it with the assignee. |
| Comment by Chiyoung Seo [ 10/May/13 ] |
|
Mike,
This stat issue happened before we merged the MRW implementation, but we still see it after the MRW merge. I think you did a lot of refactoring around the flusher before. Please take a look at it. I'm running out of ideas. |
| Comment by Maria McDuff [ 14/May/13 ] |
| Per bug triage, bumping to Critical. We need a status update on this. |
| Comment by Mike Wiederhold [ 17/May/13 ] |
| We think the multi-reader/writer code fixed this issue, and we have not seen it lately. Please file another issue if you run into it again. |
[MB-8315] Need Decision on cbtransfer, recv_min_bytes=4096 Created: 17/May/13 Updated: 17/May/13 |
|
| Status: | Open |
| Project: | Couchbase Server |
| Component/s: | None |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | None |
| Security Level: | Public |
| Type: | Bug | Priority: | Minor |
| Reporter: | Karen Zeller | Assignee: | Anil Kumar |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
It is the batch size for TCP/IP used in the transfer. Not sure whether it should be public/tunable or not.
Public for now. |
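A minimal sketch of what a receive-side batch threshold like recv_min_bytes typically controls: each `recv()` call asks the kernel for at least that many bytes, and anything beyond the current message is buffered for the next read. This reading of "batch size for TCP/IP" is an assumption; `BufferedReceiver` is a hypothetical name, not cbtransfer's code.

```python
import socket


class BufferedReceiver:
    """Hypothetical sketch: read from a socket in chunks of at least recv_min_bytes."""

    def __init__(self, sock: socket.socket, recv_min_bytes: int = 4096):
        self.sock = sock
        self.recv_min_bytes = recv_min_bytes
        self.buf = b""

    def read(self, nbytes: int) -> bytes:
        """Return exactly nbytes, keeping any surplus for the next call."""
        while len(self.buf) < nbytes:
            # Ask for a large chunk to reduce the number of recv() system calls.
            chunk = self.sock.recv(max(self.recv_min_bytes, nbytes - len(self.buf)))
            if not chunk:
                raise ConnectionError("peer closed the connection")
            self.buf += chunk
        data, self.buf = self.buf[:nbytes], self.buf[nbytes:]
        return data
```

The trade-off such a knob exposes: larger values mean fewer system calls per batch, at the cost of a bigger per-connection buffer.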
[MB-8306] Number of document mutations pending XDC replication is rapidly growing Created: 17/May/13 Updated: 17/May/13 |
|
| Status: | Open |
| Project: | Couchbase Server |
| Component/s: | cross-datacenter-replication |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Critical |
| Reporter: | Pavel Paulau | Assignee: | Pavel Paulau |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
VMs, 24GB RAM, 4 cores.
build 2.0.2-802, 4 <-> 4 bidirectional replication, non-DGM. |
||
| Attachments: |
|
| Operating System: | Centos 64-bit |
| Description |
|
Also, time spent waiting for checkpointing is much higher.
It seems to be a 2.0.2 regression.
Diags (internal IP address): http://172.23.96.10:8080/view/xdcr/job/xperf-lnx-bi/25/artifact/
PDF report (internal IP address): http://172.23.96.10:8080/view/graphs/job/graph-loop/159/artifact/xperf-mixed-bi.loop_2.0.1-170-rel-enterprise_2.0.2-802-rel-enterprise_DEST_May-16-2013_22%3A42%3A54.pdf |
| Comments |
| Comment by Junyi Xie [ 17/May/13 ] |
|
In the log, XDCR worked as expected. No error was recorded within XDCR during the test except one or two checkpoint timeouts, which should not matter much in terms of performance.
The increasing number of docs pending XDC replication is most likely due to the increasing drain rate (hence an increasing inflow rate to XDCR), not because XDCR replicated significantly slower. This can be verified with the drain rate stat. This is not a regression. The remaining question is why checkpointing at the destination takes a bit longer. There are around 12 checkpoints per vbucket during the test; the average checkpoint time in 2.0.2 is roughly 48K/(1024*12) = 4 sec, while in 2.0.1 it is around 1 sec. If I am correct, this should be a priority checkpoint in ep-engine, which is also used in rebalance. I am not sure whether it is a side effect of MRW. |
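Junyi's per-checkpoint estimate, written out. This assumes "48K" is the total checkpoint wait time in seconds summed over all vbuckets and that the bucket uses 1024 vbuckets; neither assumption is stated explicitly in the comment.

```latex
\bar{t}_{\text{ckpt}}
  \approx \frac{T_{\text{total}}}{N_{\text{vb}} \times C_{\text{per vb}}}
  = \frac{48\text{K}}{1024 \times 12}
  = \frac{48\,000}{12\,288}
  \approx 3.9\ \text{s}
  \qquad (\text{exactly } 4\ \text{s if } 1\text{K} = 1024)
```

Against roughly 1 s per checkpoint in 2.0.1, that is about a fourfold slowdown per checkpoint.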
| Comment by Junyi Xie [ 17/May/13 ] |
|
1. Increasing pending docs
This is not an XDCR regression to me, but we can try increasing maxConcurrentReps from 32 to a higher number to see if it helps.
2. Slower checkpoint
From discussion with Mike, a recent ep_engine change to the flusher may make checkpoints slower. XDCR will not complain much since we only issue a checkpoint once every 30 min, but rebalance may slow down due to slower checkpoints. |
[MB-7935] Don't set vbuckets to dead state when warming up Created: 18/Mar/13 Updated: 17/May/13 |
|
| Status: | Open |
| Project: | Couchbase Server |
| Component/s: | couchbase-bucket |
| Affects Version/s: | 2.0.1 |
| Fix Version/s: | 2.1 |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Mike Wiederhold | Assignee: | Abhinav Dangeti |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
[MB-8313] Table 7.4. Administration — Standard couchbase Tool Options should be updated with new options Created: 17/May/13 Updated: 17/May/13 |
|
| Status: | Open |
| Project: | Couchbase Server |
| Component/s: | documentation |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | None |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Andrei Baranouski | Assignee: | Karen Zeller |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
http://www.couchbase.com/docs/couchbase-manual-2.0.pdf
Table 7.4. Administration — Standard couchbase Tool Options doesn't contain the full list of available options. The actual information we can get from './couchbase-cli -h':

CLUSTER:
  --cluster=HOST[:PORT] or -c HOST[:PORT]

OPTIONS:
  -u USERNAME, --user=USERNAME  admin username of the cluster
  -p PASSWORD, --password=PASSWORD  admin password of the cluster
  -o KIND, --output=KIND  KIND is json or standard
  -d, --debug

server-add OPTIONS:
  --server-add=HOST[:PORT]  server to be added
  --server-add-username=USERNAME  admin username for the server to be added
  --server-add-password=PASSWORD  admin password for the server to be added

server-readd OPTIONS:
  --server-add=HOST[:PORT]  server to be added
  --server-add-username=USERNAME  admin username for the server to be added
  --server-add-password=PASSWORD  admin password for the server to be added

rebalance OPTIONS:
  --server-add*  see server-add OPTIONS
  --server-remove=HOST[:PORT]  the server to be removed

failover OPTIONS:
  --server-failover=HOST[:PORT]  server to failover

cluster-* OPTIONS:
  --cluster-username=USER  new admin username
  --cluster-password=PASSWORD  new admin password
  --cluster-port=PORT  new cluster REST/http port
  --cluster-ramsize=RAMSIZEMB  per node ram quota in MB

node-init OPTIONS:
  --node-init-data-path=PATH  per node path to store data
  --node-init-index-path=PATH  per node path to store index

bucket-* OPTIONS:
  --bucket=BUCKETNAME  bucket to act on
  --bucket-type=TYPE  memcached or couchbase
  --bucket-port=PORT  supports ASCII protocol and is auth-less
  --bucket-password=PASSWORD  standard port, exclusive with bucket-port
  --bucket-ramsize=RAMSIZEMB  ram quota in MB
  --bucket-replica=COUNT  replication count
  --enable-flush=[0|1]  enable/disable flush
  --enable-index-replica=[0|1]  enable/disable index replicas
  --wait  wait for bucket create to be complete before returning
  --force  force to execute command without asking for confirmation
  --data-only  compact datbase data only
  --view-only  compact view data only

setting-compacttion OPTIONS:
  --compaction-db-percentage=PERCENTAGE  at which point database compaction is triggered
  --compaction-db-size=SIZE[MB]  at which point database compaction is triggered
  --compaction-view-percentage=PERCENTAGE  at which point view compaction is triggered
  --compaction-view-size=SIZE[MB]  at which point view compaction is triggered
  --compaction-period-from=HH:MM  allow compaction time period from
  --compaction-period-to=HH:MM  allow compaction time period to
  --enable-compaction-abort=[0|1]  allow compaction abort when time expires
  --enable-compaction-parallel=[0|1]  allow parallel compaction for database and view

setting-notification OPTIONS:
  --enable-notification=[0|1]  allow notification

setting-alert OPTIONS:
  --enable-email-alert=[0|1]  allow email alert
  --email-recipients=RECIPIENT  email recipents, separate addresses with , or ;
  --email-sender=SENDER  sender email address
  --email-user=USER  email server username
  --email-password=PWD  email server password
  --email-host=HOST  email server host
  --email-port=PORT  email server port
  --enable-email-encrypt=[0|1]  email encrypt
  --alert-auto-failover-node  node was auto failover
  --alert-auto-failover-max-reached  maximum number of auto failover nodes was reached
  --alert-auto-failover-node-down  node wasn't auto failover as other nodes are down at the same time
  --alert-auto-failover-cluster-small  node wasn't auto fail over as cluster was too small
  --alert-ip-changed  node ip address has changed unexpectedly
  --alert-disk-space  disk space used for persistent storgage has reached at least 90% capacity
  --alert-meta-overhead  metadata overhead is more than 50%
  --alert-meta-oom  bucket memory on a node is entirely used for metadata
  --alert-write-failed  writing data to disk for a specific bucket has failed

setting-autofailover OPTIONS:
  --enable-auto-failover=[0|1]  allow auto failover
  --auto-failover-timeout=TIMEOUT (>=30)  specify timeout that expires to trigger auto failover

setting-xdcr OPTIONS:
  --max-concurrent-reps=[32]  maximum concurrent replications per bucket, 8 to 256.
  --checkpoint-interval=[1800]  intervals between checkpoints, 60 to 14400 seconds.
  --worker-batch-size=[500]  doc batch size, 500 to 10000.
  --doc-batch-size=[2048]KB  document batching size, 10 to 100000 KB
  --failure-restart-interval=[30]  interval for restarting failed xdcr, 1 to 300 seconds
  --optimistic-replication-threshold=[256]  document body size threshold (bytes) to trigger optimistic replication

xdcr-setup OPTIONS:
  --create  create a new xdcr configuration
  --edit  modify existed xdcr configuration
  --delete  delete existed xdcr configuration
  --xdcr-cluster-name=CLUSTERNAME  cluster name
  --xdcr-hostname=HOSTNAME  remote host name to connect to
  --xdcr-username=USERNAME  remote cluster admin username
  --xdcr-password=PASSWORD  remtoe cluster admin password

xdcr-replicate OPTIONS:
  --create  create and start a new replication
  --delete  stop and cancel a replication
  --xdcr-from-bucket=BUCKET  local bucket name to replicate from
  --xdcr-clucter-name=CLUSTERNAME  remote cluster to replicate to
  --xdcr-to-bucket=BUCKETNAME  remote bucket to replicate to |
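The setting-xdcr ranges quoted above are easy to violate when scripting changes, for example when raising --max-concurrent-reps above its default of 32. A minimal sketch that builds (but does not run) a `couchbase-cli setting-xdcr` argument list and checks values against those ranges. The ranges come from the help text; the helper name, host, and credentials in the usage are hypothetical placeholders.

```python
# Ranges copied from the couchbase-cli 2.0.2 help text quoted above.
XDCR_RANGES = {
    "max-concurrent-reps": (8, 256),
    "checkpoint-interval": (60, 14400),      # seconds
    "worker-batch-size": (500, 10000),
    "doc-batch-size": (10, 100000),          # KB
    "failure-restart-interval": (1, 300),    # seconds
}
# Documented without a range in the help text (value in bytes).
UNBOUNDED = {"optimistic-replication-threshold"}


def setting_xdcr_args(cluster: str, username: str, password: str, **settings: int) -> list:
    """Build a couchbase-cli setting-xdcr command line; keyword names use underscores."""
    args = ["couchbase-cli", "setting-xdcr", "-c", cluster, "-u", username, "-p", password]
    for name, value in settings.items():
        flag = name.replace("_", "-")
        if flag in XDCR_RANGES:
            lo, hi = XDCR_RANGES[flag]
            if not lo <= value <= hi:
                raise ValueError(f"--{flag}={value} is outside {lo}..{hi}")
        elif flag not in UNBOUNDED:
            raise ValueError(f"unknown setting-xdcr option --{flag}")
        args.append(f"--{flag}={value}")
    return args
```

Usage, with placeholder host and credentials: `setting_xdcr_args("10.1.1.1:8091", "Administrator", "password", max_concurrent_reps=64)`. The resulting list can then be passed to `subprocess.run`.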
[MB-8311] Table 7.3. Administration — couchbase Tool Commands should be updated with new features Created: 17/May/13 Updated: 17/May/13 |
|
| Status: | Open |
| Project: | Couchbase Server |
| Component/s: | documentation |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Andrei Baranouski | Assignee: | Karen Zeller |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
according to http://www.couchbase.com/docs/couchbase-manual-2.0.pdf
Table 7.3. Administration — couchbase Tool Commands:

Command  Description
server-list  List all servers in a cluster
server-info  Show details on one server
server-add  Add one or more servers to the cluster
server-readd  Re-add a server that was failed over to the cluster
rebalance  Start a cluster rebalancing
rebalance-stop  Stop current cluster rebalancing
rebalance-status  Show status of current cluster rebalancing
failover  Failover one or more servers
cluster-init  Set the username, password and port of the cluster
node-init  Set node specific parameters
bucket-list  List all buckets in a cluster
bucket-create  Add a new bucket to the cluster
bucket-edit  Modify an existing bucket
bucket-delete  Delete an existing bucket
bucket-flush  Flush a given bucket
help  Show longer usage/help and examples

But for 2.0.2 the full list should be as follows (output of ./couchbase-cli):

usage: couchbase-cli COMMAND CLUSTER [OPTIONS]
CLUSTER is --cluster=HOST[:PORT] or -c HOST[:PORT]

COMMANDs include
  server-list  list all servers in a cluster
  server-info  show details on one server
  server-add  add one or more servers to the cluster
  server-readd  readd a server that was failed over
  rebalance  start a cluster rebalancing
  rebalance-stop  stop current cluster rebalancing
  rebalance-status  show status of current cluster rebalancing
  failover  failover one or more servers
  cluster-init  set the username,password and port of the cluster
  cluster-edit  modify cluster settings
  node-init  set node specific parameters
  bucket-list  list all buckets in a cluster
  bucket-create  add a new bucket to the cluster
  bucket-edit  modify an existing bucket
  bucket-delete  delete an existing bucket
  bucket-flush  flush all data from disk for a given bucket
  bucket-compact  compact database and index data
  setting-compaction  set auto compaction settings
  setting-notification  set notification settings
  setting-alert  set email alert settings
  setting-autofailover  set auto failover settings
  setting-xdcr  set xdcr related settings
  xdcr-setup  set up XDCR connection
  xdcr-replicate  xdcr operations
  help  show longer usage/help and examples |
[MB-8215] [windows] firewalled node is seen as healthy Created: 08/May/13 Updated: 17/May/13 |
|
| Status: | Open |
| Project: | Couchbase Server |
| Component/s: | None |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Iryna Mironava | Assignee: | Deepkaran Salooja |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
build 2.0.2-786
<manifest><remote name="couchbase" fetch="git://10.1.1.210/"/><remote name="membase" fetch="git://10.1.1.210/"/><remote name="apache" fetch="git://github.com/apache/"/><remote name="erlang" fetch="git://github.com/erlang/"/><default remote="couchbase" revision="master"/><project name="tlm" path="tlm" revision="8adbb64f2fd38c89cd8e2f21e49d593577ca548f"><copyfile dest="Makefile" src="Makefile.top"/></project><project name="bucket_engine" path="bucket_engine" revision="2a797a8d97f421587cce728f2e6aa2cd42c8fa26"/><project name="ep-engine" path="ep-engine" revision="b27577b5e1f476a50432b5f57549821b2c189cc6"/><project name="libconflate" path="libconflate" revision="c0d3e26a51f25a2b020713559cb344d43ce0b06c"/><project name="libmemcached" path="libmemcached" revision="ca739a890349ac36dc79447e37da7caa9ae819f5" remote="membase"/><project name="libvbucket" path="libvbucket" revision="026c79ae424a6daed4bb9345e86cc8fc21759b28"/><project name="couchbase-cli" path="couchbase-cli" revision="f550cdac33c231a13b6025a281d6308a518bcab2" remote="couchbase"/><project name="memcached" path="memcached" revision="b6ceb46fc26ac6f1d6be7a5866d6c6c0f6e6d32a" remote="membase"/><project name="moxi" path="moxi" revision="4b391021af7a453bf88716724d2c644916ebd969"/><project name="ns_server" path="ns_server" revision="477f595ebbfc6ef1429a7ba7215830fd56269688"/><project name="portsigar" path="portsigar" revision="159b6179ea8a3c2075ee9eb2afa6f91c98c0fda6"/><project name="sigar" path="sigar" revision="63a3cd1b316d2d4aa6dd31ce8fc66101b983e0b0"/><project name="couchbase-examples" path="couchbase-examples" revision="cd9c8600589a1996c1ba6dbea9ac171b937d3379"/><project name="couchbase-python-client" path="couchbase-python-client" revision="d443169c0694fca1be67d8f6934a8c50f0175ee7"/><project name="couchdb" path="couchdb" revision="586e4bb73b92db4362192616370c4e3edb8c34a0"/><project name="couchdbx-app" path="couchdbx-app" revision="ce8722ea78596663a1932881e1ea51af0164a313"/><project name="couchstore" 
path="couchstore" revision="8de31a9e4232688de0b0fa70e218601881cdd0af"/><project name="geocouch" path="geocouch" revision="ed9ad43aa361df0829262fef811b5236331b44c8"/><project name="testrunner" path="testrunner" revision="e98e0059f8da7927a9e5c7896d8a69e5656befbb"/><project name="healthchecker" path="healthchecker" revision="53b4ae787cb93f53c3dbaa90c266882a211ff4d8"/><project name="otp" path="otp" revision="b6dc1a844eab061d0a7153d46e7e68296f15a504" remote="erlang"/><project name="icu4c" path="icu4c" revision="26359393672c378f41f2103a8699c4357c894be7" remote="couchbase"/><project name="snappy" path="snappy" revision="5681dde156e9d07adbeeab79666c9a9d7a10ec95" remote="couchbase"/><project name="v8" path="v8" revision="447decb75060a106131ab4de934bcc374648e7f2" remote="couchbase"/><project name="gperftools" path="gperftools" revision="8f60ba949fb8576c530ef4be148bff97106ddc59" remote="couchbase"/><project name="pysqlite" path="pysqlite" revision="0ff6e32ea05037fddef1eb41a648f2a2141009ea" remote="couchbase"/></manifest> |
||
| Operating System: | Windows 64-bit |
| Description |
|
.29 node was firewalled:

Domain Profile Settings:
----------------------------------------------------------------------
State                                 ON
Firewall Policy                       BlockInbound,AllowOutbound
LocalFirewallRules                    N/A (GPO-store only)
LocalConSecRules                      N/A (GPO-store only)
InboundUserNotification               Disable
RemoteManagement                      Disable
UnicastResponseToMulticast            Enable

Logging:
LogAllowedConnections                 Disable
LogDroppedConnections                 Disable
FileName                              %systemroot%\system32\LogFiles\Firewall\pfirewall.log
MaxFileSize                           4096

Private Profile Settings:
----------------------------------------------------------------------
State                                 ON
Firewall Policy                       BlockInbound,AllowOutbound
LocalFirewallRules                    N/A (GPO-store only)
LocalConSecRules                      N/A (GPO-store only)
InboundUserNotification               Disable
RemoteManagement                      Disable
UnicastResponseToMulticast            Enable

Logging:
LogAllowedConnections                 Disable
LogDroppedConnections                 Disable
FileName                              %systemroot%\system32\LogFiles\Firewall\pfirewall.log
MaxFileSize                           4096

Public Profile Settings:
----------------------------------------------------------------------
State                                 ON
Firewall Policy                       BlockInbound,AllowOutbound
LocalFirewallRules                    N/A (GPO-store only)
LocalConSecRules                      N/A (GPO-store only)
InboundUserNotification               Disable
RemoteManagement                      Disable
UnicastResponseToMulticast            Enable

Logging:
LogAllowedConnections                 Disable
LogDroppedConnections                 Disable
FileName                              %systemroot%\system32\LogFiles\Firewall\pfirewall.log
MaxFileSize                           4096
Ok. |
| Comments |
| Comment by Iryna Mironava [ 08/May/13 ] |
|
https://s3.amazonaws.com/bugdb/jira/MB-8215/16fc64ab/172.27.33.26-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-8215/16fc64ab/172.27.33.27-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-8215/16fc64ab/172.27.33.29-diag.zip |
| Comment by Maria McDuff [ 08/May/13 ] |
|
Iryna, can you check whether the Couchbase ports are enabled or disabled?
Did you also specifically disable the ports for Couchbase? Deep, since Iryna is on vacation, can you take a look? Thanks. |
| Comment by Aliaksey Artamonau [ 08/May/13 ] |
| I see that .29 receives fresh heartbeats from other nodes when it is supposed to be firewalled. I know nothing about the Windows firewall, but it seems that you're doing something wrong. |
| Comment by Deepkaran Salooja [ 17/May/13 ] |
|
Tested the following on a 4-node Windows cluster (2.0.2-803-rel):
1. Created a 4-node cluster and the default bucket.
2. Enabled the Windows firewall on node 10.3.2.25. Verified connecting to this node by the following means:
   bash> rdesktop 10.3.2.25
   Autoselected keyboard map en-us
   ERROR: 10.3.2.25: unable to connect
   bash> ssh Administrator@10.3.2.25
   ssh: connect to host 10.3.2.25 port 22: Connection timed out
   HTTP access returns: Error 118 (net::ERR_CONNECTION_TIMED_OUT): The operation timed out. http://10.3.2.25:8091/
   [root@caper-012 bin]# ./cbstats 10.3.2.25:11210 all
   Stats '' are not available from the requested engine.
   But node .25 is still shown as healthy in the cluster. Enabled auto-failover, but .25 didn't get failed over.
3. Loaded 10k items into the default bucket. 7.5k items got loaded. No error is returned to the client.
The cluster is available for debugging at http://10.3.2.23:8091
Attaching logs from 3 nodes, as I can't log in to the .25 node now. |
| Comment by Deepkaran Salooja [ 17/May/13 ] |
|
https://s3.amazonaws.com/bugdb/jira/MB-8215/e9125b6b/10.3.2.23-5172013-457-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-8215/e9125b6b/10.3.2.24-5172013-458-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-8215/e9125b6b/10.3.2.73-5172013-52-diag.zip |
| Comment by Aliaksey Artamonau [ 17/May/13 ] |
|
Our auto failover mechanism is based on the heartbeats that every node sends to every other node every 5 seconds. Of course we don't print out all of the heartbeats to the logs. But we do log all node statuses every 60 seconds. Every status has a timestamp attached to it. So here's the differences between pairs of consecutive timestamps for node .25 in microseconds starting from the time when auto failover was enabled: [60000000,60000000,59999999,60000001,60000000,60000000, 60000000,60000000,60000000,60000006,59999994,60000000, 60000000,60000000,60000000,60000000,60000002,59999998, 60000000,59999999,60000003,59999998,60000000,60000000, 59999999,60000001,60000000,60000000,60000000,60000000, 59999999,60000001,60000000,60000000,59999999,60000001, 60000000,60000000,60000000,60000000,59999999,60000001, 60000000,60000000,60000002,59999998,60000000,60000000, 60000000,60000000,60000000,59999999,60000001,60000000, 60000000,60000000,60000000,60000000,60000004,59999996, 60000000,60000000,60000000,60000000,59999999,60000001, 60000000,60000000,59999999,60000003,59999998,59999999, 60000001,60000000,60000000,60000000,60000000,60000000, 60000000,60000000,60000000,60000000,60000000,60000000, 59999999,60000001,60000000,59999999,60000001,60000000, 60000000,60000000,60000000,60000000,60000000,59999999, 60000001,60000000,59999999,60000001,60000000,59999999, 60000001,60000000,60000000,59999999,60000001,59999999, 60000001,60000000,60000000,59999999,60000001,59999999, 60000001,60000000,59999999,60000001,60000000,60000002, 59999998,60000000,60000000,60000000,60000000,60000000, 60000000,59999999,60000000,60000001,59999999,60000000, 60000001,59999999,60000001,60000000,60000000,59999999, 60000001,59999999,60000001,60000000,59999999,60000001, 60000000,60000000,59999999,60000001,59999999,60000000, 60000001,60000000,59999999,60000001,60000000,60000000, 59999999,60000001,60000000,59999999,60000000,60000001, 60000000,59999999,60000001,60000006,59999993,60000001, 
60000000,60000000,60000000,59999999,60000001,60000000, 60000000,60000000,59983999,60016001,60000000,60000000, 60000000,60000000,59999999,60000015,59999986,60000000, 60000000,59999999,60000001,60000000,60000000,59999999, 60000000,60000001,60000000,60000000,60000002,59999998, 60000000,60000000,60000000,60000000,60000000,60000000, 59999999,60000001,59999999,60000001,60000000,60000000, 59999999,60000001,60000000,60000000,60000000,60000004] You can see that all of them are very close to 60 seconds. So if node .25 ever went down, it could not have been down for more than 60 seconds. Here's more evidence for this:
2013-05-17 02:03:22.347 ns_node_disco:5:warning:node down(ns_1@10.3.2.75) - Node 'ns_1@10.3.2.75' saw that node 'ns_1@10.3.2.25' went down. Details: [{nodedown_reason, connection_closed}]
2013-05-17 02:03:30.441 ns_node_disco:4:info:node up(ns_1@10.3.2.75) - Node 'ns_1@10.3.2.75' saw that node 'ns_1@10.3.2.25' came up. Tags: []
I guess you waited for more than 60 seconds. And there's absolutely no evidence that the auto failover process ever saw node .25 down for even one heartbeat period, though it briefly saw nodes .24 and .75 down. So I am compelled to reiterate my previous conclusion that you are doing something wrong with the firewall configuration, though I have no idea what exactly. Maybe already established connections are not dropped. Or maybe the firewall is enabled in only one direction. Or maybe something else. I don't know. But something is definitely wrong. |
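The argument above rests on every difference between consecutive status timestamps being close to the 60-second logging period. A minimal sketch of that check: any outlier, either much shorter or much longer than the period, would hint that the timestamp stopped advancing for a while. The 1-second tolerance and the function name are hypothetical choices, not anything ns_server does.

```python
from itertools import accumulate
from typing import List, Tuple


def outlier_gaps(timestamps_us: List[int], period_us: int = 60_000_000,
                 tolerance_us: int = 1_000_000) -> List[Tuple[int, int]]:
    """Return (index, gap_us) for consecutive timestamps not spaced ~period_us apart."""
    gaps = [later - earlier for earlier, later in zip(timestamps_us, timestamps_us[1:])]
    return [(i, gap) for i, gap in enumerate(gaps) if abs(gap - period_us) > tolerance_us]


# Rebuild absolute timestamps from a few of the differences quoted above.
sample_diffs = [60000000, 59983999, 60016001, 60000015, 59999986]
sample_ts = [0, *accumulate(sample_diffs)]
```

On the quoted differences, the largest deviation from 60 s is about 16 ms, so nothing is flagged.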
| Comment by Aliaksey Artamonau [ 17/May/13 ] |
| And by the way, the node was auto-failed over in the end. |
| Comment by Aleksey Kondratenko [ 17/May/13 ] |
|
Resolution from ns_server team lead. Please show us evidence that all packets on already established connections are indeed rejected after the firewall is enabled. We have good reasons to suspect that the firewall is not quite rejecting everything. And keep in mind that you're trying to use the firewall to simulate a network failure, so you have to make sure that you are indeed simulating what you think you're simulating. |
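The manual checks in the test above (rdesktop, ssh, HTTP, cbstats) all open new connections. A minimal sketch that scripts that part, probing ports 8091 and 11210 (both taken from the comments above). The helper names are hypothetical. As the ns_server team points out, blocked new connections say nothing about connections established before the firewall was enabled, which may be how heartbeats kept arriving, so this sketch covers only half of the evidence they ask for.

```python
import socket
from typing import Dict, Iterable


def tcp_port_open(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a *new* TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False


def probe_node(host: str, ports: Iterable[int] = (8091, 11210),
               timeout_s: float = 3.0) -> Dict[int, bool]:
    """Probe each port with a fresh connection; established sessions are not tested."""
    return {port: tcp_port_open(host, port, timeout_s) for port in ports}
```

Run from another cluster node after enabling the firewall, a result of `{8091: False, 11210: False}` only shows that new inbound connections are blocked.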
[MB-8312] we don't have Table 7.3. and Table 7.4. (couchbase-cli stuff) in online documentation Created: 17/May/13 Updated: 17/May/13 Resolved: 17/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | documentation |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | None |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Andrei Baranouski | Assignee: | Karen Zeller |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
I see that they are present in http://www.couchbase.com/docs/couchbase-manual-2.0.pdf
(Table 7.3. Administration — couchbase Tool Commands and Table 7.4. Administration — Standard couchbase Tool Options), but they are missing from the online documentation for '7.4. couchbase-cli Tool' at http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-admin-cmdline-couchbase-cli.html |
| Comments |
| Comment by Karen Zeller [ 17/May/13 ] |
|
Hi Andre,
This is because I have been asked to comment out the 2.0.2 command-line commands and options until we actually release 2.0.2. They only appear in the PDF sent for review. Thanks, Karen |
[MB-8199] [2.0.2 - RN + Docs] many requests to views causes resource leak, crash Created: 04/May/13 Updated: 17/May/13 Resolved: 16/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | 2.0.1 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Matt Ingenthron | Assignee: | Abhinav Dangeti |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | customer, documentation | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | 4-core CPU, 16GB RAM, Linux | ||
| Operating System: | Centos 64-bit |
| Description |
|
In response to many view requests against the scatter/gather view merger, a node can allocate so many resources that it will fail to recover.
In one case, this did cause many timeouts in the log leading to max_restart_intensity:
[error_logger:error,2013-04-25T15:23:26.047,ns_1@10.128.16.171:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
=========================SUPERVISOR REPORT=========================
Supervisor: {local,ns_node_disco_sup}
Context: shutdown
Reason: reached_max_restart_intensity
Offender: [{pid,<0.17237.774>},
{name,ns_config_rep},
{mfargs,{ns_config_rep,start_link,[]}},
{restart_type,permanent},
{shutdown,1000},
{child_type,worker}] |
| Comments |
| Comment by Matt Ingenthron [ 04/May/13 ] |
| Note, I put this on 2.0.2 since I know it shouldn't be 2.1 and there does not appear to be a 2.0.3. I feared it would be lost if it didn't have a fixfor version. Please move as appropriate. |
| Comment by Maria McDuff [ 07/May/13 ] |
| per bug scrub, alk - can you chk if aleksey a. can take a look at this? |
| Comment by Aleksey Kondratenko [ 07/May/13 ] |
|
We know this problem so I don't believe we should look again. Fixing it for 2.0.2 feels a bit late but possible if really needed |
| Comment by Dipti Borkar [ 07/May/13 ] |
|
When you say, "we know this problem" can you elaborate on it a bit more? With more customers using views, they are likely to hit this as well. Can you help us understand the scenario a bit more? When this problem can happen? What is the probability of hitting this? |
| Comment by Aleksey Kondratenko [ 07/May/13 ] |
| If you send too many view requests to any node it'll swamp it and kill. I recall seeing that during pre-2.0 testing and there must be MB- somewhere. |
| Comment by Maria McDuff [ 09/May/13 ] |
|
per bug triage, upgrading to blocker.
the fix is to throttle the requests and not to crash/terminate. it's fine to be slow but not crash. alk k to take a look for 2.0.2 |
| Comment by Aliaksey Artamonau [ 16/May/13 ] |
| We merged a simple request throttling mechanism that can be configured via internal settings: http://review.couchbase.org/26334. |
| Comment by Aleksey Kondratenko [ 16/May/13 ] |
|
It should also be noted that, given we don't have experience with how well this approach works in production, we decided to make "unlimited" the default limits. We can try playing with that stuff in-house, plus get some experience with customers after 2.0.2 is out, and then we'll have enough data to enable it by default and set the right limits. |
| Comment by Aleksey Kondratenko [ 16/May/13 ] |
| CHANGES text is here: http://review.couchbase.org/#/c/26361/2/CHANGES,unified |
| Comment by Matt Ingenthron [ 16/May/13 ] |
| Alk: we should request QE to develop a test for this. See it cause the problem in 2.0.1 and see it not cause the problem in 2.0.2, right? Assigning it to Maria for that purpose, then it should be closed perhaps when verified? Not sure what QE's process is here now. |
| Comment by Matt Ingenthron [ 16/May/13 ] |
| Maria: Can you work with the team on the appropriate way to test that this is fixed and won't cause other problems? |
| Comment by Maria McDuff [ 17/May/13 ] |
|
Abhinav, pls verify by:
- instrumenting a test that sends many view requests. do manual first, then automate (if you already have a test that does a similar test scenario, just tweak that and use it here for this verification testing).
- verifying no crashes happen. if you observe slowness, note it here. slowness is ok.
- noting alk k's "unlimited" dflt limit set. verify all his changes on the review link.
- using a stable build of 2.0.2, which should be built tonight or tomorrow.
thanks. |
| Comment by Dipti Borkar [ 17/May/13 ] |
|
We also need to document this.
* (
Its behavior is controlled by three parameters which can be set via the /internalSettings REST endpoint:
- restRequestLimit: Maximum number of simultaneous connections each node should accept on the REST port. Diagnostics-related endpoints and /internalSettings are not counted.
- capiRequestLimit: Maximum number of simultaneous connections each node should accept on the CAPI port. Note that this includes XDCR connections.
- dropRequestMemoryThresholdMiB: The amount of memory used by the Erlang VM that should not be exceeded. If it is exceeded, the server will start dropping incoming connections.
When the server decides to reject an incoming connection because some limit was exceeded, it responds with status code 503 and a Retry-After header set appropriately (more or less). On the REST port, a textual description of why the request was rejected is returned in the body. On the CAPI port, in CouchDB tradition, a JSON object with "error" and "reason" fields is returned.
By default all the thresholds are set to be unlimited. |
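A minimal Python sketch of how a client or operator script might work with these settings. It assumes `/internalSettings` accepts a form-encoded POST with HTTP Basic auth, like other ns_server REST endpoints; the CHANGES text above only names the endpoint and the three parameters. The host, credentials and limit values are placeholders, and `retry_delay` only understands the delta-seconds form of `Retry-After`, not the HTTP-date form.

```python
import base64
import json
import urllib.parse
import urllib.request

# Parameter names come from the CHANGES text; everything else is illustrative.
THROTTLE_PARAMS = {"restRequestLimit", "capiRequestLimit", "dropRequestMemoryThresholdMiB"}


def build_internal_settings_request(base_url, user, password, **settings):
    """Build (but do not send) a POST to /internalSettings carrying throttle limits."""
    unknown = set(settings) - THROTTLE_PARAMS
    if unknown:
        raise ValueError(f"not a throttling parameter: {sorted(unknown)}")
    body = urllib.parse.urlencode(settings).encode("ascii")
    req = urllib.request.Request(
        base_url.rstrip("/") + "/internalSettings", data=body, method="POST"
    )
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    req.add_header("Authorization", "Basic " + token)
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


def retry_delay(status, headers, default=1.0):
    """Seconds to wait before retrying a rejected request, or None if not rejected."""
    if status != 503:
        return None
    try:
        return max(0.0, float(headers.get("Retry-After")))
    except (TypeError, ValueError):
        # Header missing or in HTTP-date form: fall back to a fixed delay.
        return default


def rejection_reason(port, body):
    """Describe a 503 body: JSON with error/reason on CAPI, plain text on REST."""
    if port == "capi":
        payload = json.loads(body)
        return f"{payload['error']}: {payload['reason']}"
    return body.strip()
```

To actually apply the settings you would pass the request to `urllib.request.urlopen(req)`. Keeping request construction separate from sending makes the sketch easy to check offline, and a client that honours `Retry-After` on 503 is the behaviour the throttling design expects: slow down, don't crash.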
[MB-8307] Litmus dashboard becomes unresponsive after fetching big set of results Created: 17/May/13 Updated: 17/May/13 |
|
| Status: | Open |
| Project: | Couchbase Server |
| Component/s: | None |
| Affects Version/s: | 2.0.1, 2.0.2, 2.1 |
| Fix Version/s: | None |
| Security Level: | Public |
| Type: | Task | Priority: | Major |
| Reporter: | Pavel Paulau | Assignee: | Ronnie Sun |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Chrome latest, Firefox latest | ||
| Attachments: |
|
| Description |
|
I can see that it eats one of my cores for a couple of seconds.
Firefox tries to kill the script from time to time. |
| Comments |
| Comment by Ronnie Sun [ 17/May/13 ] |
|
Pavel, would you plz provide more info: What every user action? What are the network latencies you have thru vpn? It worked fine on my computer, in Chrome and Safari. |
| Comment by Ronnie Sun [ 17/May/13 ] |
| need more info |
| Comment by Pavel Paulau [ 17/May/13 ] |
|
Sorry, not every:
-- Initial load
-- Clicking filter buttons, especially if there are many results (like ALL or KV)
Latency is about 200-300ms. |
[MB-8066] Observed rebalance regression from 2.0.1 to 2.0.2 Created: 10/Apr/13 Updated: 17/May/13 |
|
| Status: | Open |
| Project: | Couchbase Server |
| Component/s: | performance |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Abhinav Dangeti | Assignee: | Ronnie Sun |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
2.0.1-170, 2.0.2-755
Centos Physical servers |
||
| Attachments: |
|
| Description |
|
Noticed that rebalance is consistently taking longer in 2.0.2-755 than in 2.0.1-170.
Consider the last or last 2 results from each rebalance job on: http://dashboard.hq.couchbase.com/litmus/dashboard/?env=thor Also consider the graphs of rebalance jobs attached in http://www.couchbase.com/issues/browse/CBD-914, to note the increase in rebalance times. Logs and graphs from reb-out-large-2 attached. |
| Comments |
| Comment by Aleksey Kondratenko [ 17/Apr/13 ] |
|
attaching visualizations of last rebalances from logs above
and about 1.5x time difference is obvious there |
| Comment by Aleksey Kondratenko [ 17/Apr/13 ] |
|
Looks like the difference is mostly due to waiting for checkpoint persistence.
root@beta:~/src/altoros/moxi/ns_server# ./scripts/analyze-rebalance-waiting.rb rebalance-2.0.2-master-events.json
total waitings: 41420.484508275986
total index waitings: 0.4656705856323242
total checkpoint waitings: 41420.01883769035
total vbucket moves: 512
root@beta:~/src/altoros/moxi/ns_server# ./scripts/analyze-rebalance-waiting.rb rebalance-2.0.1-master-events.json
total waitings: 19539.872338056564
total index waitings: 0.6576874256134033
total checkpoint waitings: 19539.21465063095
total vbucket moves: 512
May I ask you guys to retry the same litmus test on 2.0.2 but with a reduced rebalanceMovesBeforeCompaction value? |
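The two script outputs above can be compared directly. A small Python sketch, assuming only the plain `key: value` line format shown; the units of the totals are not stated in the comment, and because the totals presumably sum per-move waits, they can exceed wall-clock rebalance time when moves overlap.

```python
RUN_202 = """\
total waitings: 41420.484508275986
total index waitings: 0.4656705856323242
total checkpoint waitings: 41420.01883769035
total vbucket moves: 512
"""

RUN_201 = """\
total waitings: 19539.872338056564
total index waitings: 0.6576874256134033
total checkpoint waitings: 19539.21465063095
total vbucket moves: 512
"""


def parse_waitings(text):
    """Turn 'total ...: <number>' lines into a dict; other lines (e.g. prompts) are skipped."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.startswith("total "):
            stats[key] = float(value)
    return stats


def summarize(new, old):
    """Ratios that show where the extra rebalance time goes."""
    return {
        "total_ratio": new["total waitings"] / old["total waitings"],
        "checkpoint_share_new": new["total checkpoint waitings"] / new["total waitings"],
        "per_move_new": new["total waitings"] / new["total vbucket moves"],
        "per_move_old": old["total waitings"] / old["total vbucket moves"],
    }


summary = summarize(parse_waitings(RUN_202), parse_waitings(RUN_201))
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

With the numbers above, 2.0.2 waits roughly 2.1x as long in total (about 81 vs 38 per vbucket move), and virtually all of that waiting is checkpoint persistence, which matches the reading in the comment.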
| Comment by Aleksey Kondratenko [ 17/Apr/13 ] |
|
Also you will simplify my life a bit by grabbing master events in addition to collectinfos |
| Comment by Aleksey Kondratenko [ 17/Apr/13 ] |
| Hm. This is unrelated, but apparently this hardware has some sort of forcefully disabled NUMA. |
| Comment by Aleksey Kondratenko [ 17/Apr/13 ] |
|
Here's another per-vbucket visualization: http://i.imgur.com/fI2EEeI.png We can see quite a dramatic difference in per-vbucket move times. My guess is that current 2.0.2 ends up waiting on 64 vbuckets at a time where 2.0.1 did 16, and the vbucket persistence prioritization code in ep-engine is no longer effective. That large gap between the final takeover and the last master event before it is likely waiting for the second checkpoint persistence, which I've found we are not sending to master events. So my proposal is the following: wait for the next build with better diagnostics and re-run 2.0.2 with the default rebalanceMovesBeforeCompaction and with the old value of 16. |
| Comment by Aleksey Kondratenko [ 17/Apr/13 ] |
| See above |
| Comment by Ketaki Gangal [ 19/Apr/13 ] |
|
Hi Aliaksey, Is there new diagnostic info pushed into the newer builds ("wait next build with better diagnostics"), or does this mean we get cbcollect_info on the bug as additional diags? We plan to re-run this w/ the latest 202, 773. -Ketaki |
| Comment by Aleksey Kondratenko [ 19/Apr/13 ] |
|
I don't know what 773 means. I'm seeing 768 as latest in http://builds.hq.northscale.net/latestbuilds/ Anyway 2.0.2-768 has newer diagnostics I need. |
| Comment by Ketaki Gangal [ 19/Apr/13 ] |
|
Yes, 768*. Ok, will start up the runs w/ latest diags and changed default value. Thanks! Ketaki |
| Comment by Abhinav Dangeti [ 24/Apr/13 ] |
|
The latest run against build 772 still suggests that there is some regression in rebalance times.
Attached reb-in, reb-out and reb-swap comparison charts between 170 and 772. *Results from litmus jobs. |
| Comment by Ketaki Gangal [ 24/Apr/13 ] |
| Does this run also have "rebalanceMovesBeforeCompaction" set to 16? |
| Comment by Abhinav Dangeti [ 24/Apr/13 ] |
| Yes. |
| Comment by Aleksey Kondratenko [ 30/Apr/13 ] |
| I didn't explicitly mention it here, but for any perf runs that affect rebalance I also need so called master events gathered and attached. In general I can often extract them from collect infos but explicitly getting them is more convenient. |
| Comment by Abhinav Dangeti [ 30/Apr/13 ] |
|
For the reb-in-litmus job on 2.0.2-772-rel:
All the logs that the parent collected from all the nodes:
- https://s3.amazonaws.com/bugdb/abhinav/772_reb_in_litmus_logs/logs.zip
From the client:
Phase1: Stats from the load phase - https://s3.amazonaws.com/bugdb/abhinav/772_reb_in_litmus_logs/load_phase_client.zip
Phase2: Stats from the hot load phase - https://s3.amazonaws.com/bugdb/abhinav/772_reb_in_litmus_logs/hot_load_phase_client.zip
Phase3: Stats from the access phase - https://s3.amazonaws.com/bugdb/abhinav/772_reb_in_litmus_logs/access_phase_client.zip
Master_events are available under each of the client's stats (phase1, phase2, phase3) |
| Comment by Aleksey Kondratenko [ 30/Apr/13 ] |
| Don't have all data I need. Abhinav is aware and will reassign |
| Comment by Abhinav Dangeti [ 01/May/13 ] |
|
reb-in-litmus job 2.0.2-772-rel:: rebalanceMovesBeforeCompaction=16
Logs: https://s3.amazonaws.com/bugdb/abhinav/772_reb_in_litmus/logs_772.zip
Load_phase: https://s3.amazonaws.com/bugdb/abhinav/772_reb_in_litmus/client_load_phase.zip
Hot_load_phase: https://s3.amazonaws.com/bugdb/abhinav/772_reb_in_litmus/client_hot_load_phase.zip
Access_phase: https://s3.amazonaws.com/bugdb/abhinav/772_reb_in_litmus/client_access_phase.zip
2.0.1-170-rel:
Logs: https://s3.amazonaws.com/bugdb/abhinav/772_reb_in_litmus/logs_170.zip
Access_phase: https://s3.amazonaws.com/bugdb/abhinav/772_reb_in_litmus/170_client_access_phase.zip
Master_events available in access_phase logs.
Comparison graphs: https://s3.amazonaws.com/bugdb/abhinav/772_reb_in_litmus/reb-in-litmus.loop_2.0.1-170-rel-enterprise_2.0.2-772-rel-enterprise_litmuses_May-01-2013_19-11-29.pdf |
| Comment by Aleksey Kondratenko [ 02/May/13 ] |
|
Indeed still seeing about 2x difference in rebalance time: http://i.imgur.com/XBX3PyI.png With 2.0.2 being clearly slower |
| Comment by Aleksey Kondratenko [ 02/May/13 ] |
|
Traced this down to a quite massive difference between 2.0.1 and 2.0.2 in the time it takes for the replica-building tap stream to complete its backfill phase. See below:
10 matches for "0.8075.0" in buffer: ns_server.debug.log
16317:[ns_server:info,2013-04-12T18:01:49.319,ns_1@10.6.2.38:<0.8075.0>:ebucketmigrator_srv:init:509]Setting {"10.6.2.38",11209} vbucket 1021 to state replica
16318:[ns_server:debug,2013-04-12T18:01:49.424,ns_1@10.6.2.38:<0.8075.0>:ebucketmigrator_srv:kill_tapname:1005]killing tap named: replication_building_1021_'ns_1@10.6.2.38'
16319:[rebalance:info,2013-04-12T18:01:49.428,ns_1@10.6.2.38:<0.8075.0>:ebucketmigrator_srv:init:568]Starting tap stream:
16331:[rebalance:debug,2013-04-12T18:01:49.429,ns_1@10.6.2.38:<0.8075.0>:ebucketmigrator_srv:init:605]upstream_sender pid: <0.8076.0>
16332:[rebalance:debug,2013-04-12T18:01:49.429,ns_1@10.6.2.38:<0.8075.0>:ebucketmigrator_srv:handle_call:336]Suspended had_backfill waiter
16334:[rebalance:info,2013-04-12T18:01:49.429,ns_1@10.6.2.38:<0.8075.0>:ebucketmigrator_srv:process_upstream:932]Initial stream for vbucket 1021
16335:[rebalance:debug,2013-04-12T18:01:49.430,ns_1@10.6.2.38:<0.8075.0>:ebucketmigrator_srv:process_downstream:896]Replied had_backfill: true to [{<17946.9219.1>,#Ref<17946.0.32.211193>}]
16365:[ns_server:debug,2013-04-12T18:01:49.585,ns_1@10.6.2.38:<0.8075.0>:ebucketmigrator_srv:process_upstream:946]seen backfill-close message
16424:[ns_server:debug,2013-04-12T18:01:50.011,ns_1@10.6.2.38:<0.8075.0>:ebucketmigrator_srv:confirm_sent_messages:727]Going to wait for reception of opaque message ack
16427:[rebalance:info,2013-04-12T18:01:50.011,ns_1@10.6.2.38:<0.8075.0>:ebucketmigrator_srv:do_confirm_sent_messages:701]Got close ack!
10 matches for "0.8649.0" in buffer: ns_server.debug.log|cbcollect_info_ns_1@10.6.2.38_20130502-011157
14362:[ns_server:info,2013-05-01T15:05:38.736,ns_1@10.6.2.38:<0.8649.0>:ebucketmigrator_srv:init:509]Setting {"10.6.2.38",11209} vbucket 1021 to state replica
14371:[ns_server:debug,2013-05-01T15:05:38.860,ns_1@10.6.2.38:<0.8649.0>:ebucketmigrator_srv:kill_tapname:1005]killing tap named: replication_building_1021_'ns_1@10.6.2.38'
14372:[rebalance:info,2013-05-01T15:05:38.865,ns_1@10.6.2.38:<0.8649.0>:ebucketmigrator_srv:init:568]Starting tap stream:
14384:[rebalance:debug,2013-05-01T15:05:38.866,ns_1@10.6.2.38:<0.8649.0>:ebucketmigrator_srv:init:605]upstream_sender pid: <0.8653.0>
14385:[rebalance:debug,2013-05-01T15:05:38.866,ns_1@10.6.2.38:<0.8649.0>:ebucketmigrator_srv:handle_call:336]Suspended had_backfill waiter
14387:[rebalance:info,2013-05-01T15:05:38.869,ns_1@10.6.2.38:<0.8649.0>:ebucketmigrator_srv:process_upstream:932]Initial stream for vbucket 1021
14388:[rebalance:debug,2013-05-01T15:05:38.870,ns_1@10.6.2.38:<0.8649.0>:ebucketmigrator_srv:process_downstream:896]Replied had_backfill: true to [{<15619.4214.1>,#Ref<15619.0.18.184492>}]
14431:[ns_server:debug,2013-05-01T15:05:39.409,ns_1@10.6.2.38:<0.8649.0>:ebucketmigrator_srv:process_upstream:946]seen backfill-close message
14464:[ns_server:debug,2013-05-01T15:05:39.714,ns_1@10.6.2.38:<0.8649.0>:ebucketmigrator_srv:confirm_sent_messages:727]Going to wait for reception of opaque message ack
14467:[rebalance:info,2013-05-01T15:05:39.715,ns_1@10.6.2.38:<0.8649.0>:ebucketmigrator_srv:do_confirm_sent_messages:701]Got close ack!
Both sets of lines are replica-building ebucketmigrator logs for the same vbucket into the same node (and hopefully the same number of items). The former is for 2.0.1 and the latter is for 2.0.2. You can see a gap of about 500 milliseconds between tap stream initiation and the "seen backfill-close message" message in 2.0.2, versus a much shorter gap of roughly 150 milliseconds in 2.0.1.
And this is consistent with the numbers I'm seeing for other nodes and other vbuckets. Given ns_server didn't have any changes in this area, it must be something in either memcached or ep-engine. |
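The gap can be measured straight from the timestamps in the two log excerpts. A Python sketch, assuming only the `[category:level,<timestamp>,...]` prefix visible in the lines above; the two sample lines per version are copied from the excerpts.

```python
import re
from datetime import datetime, timedelta

# The timestamp is the second comma-separated field inside the leading [...] tag.
TS = re.compile(r"\[[^,\]]+,(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}),")

V201 = [
    "16319:[rebalance:info,2013-04-12T18:01:49.428,ns_1@10.6.2.38:<0.8075.0>:ebucketmigrator_srv:init:568]Starting tap stream:",
    "16365:[ns_server:debug,2013-04-12T18:01:49.585,ns_1@10.6.2.38:<0.8075.0>:ebucketmigrator_srv:process_upstream:946]seen backfill-close message",
]
V202 = [
    "14372:[rebalance:info,2013-05-01T15:05:38.865,ns_1@10.6.2.38:<0.8649.0>:ebucketmigrator_srv:init:568]Starting tap stream:",
    "14431:[ns_server:debug,2013-05-01T15:05:39.409,ns_1@10.6.2.38:<0.8649.0>:ebucketmigrator_srv:process_upstream:946]seen backfill-close message",
]


def timestamp(line):
    """Parse the millisecond timestamp out of an ns_server log line."""
    match = TS.search(line)
    if match is None:
        raise ValueError("no timestamp in: " + line[:60])
    return datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")


def backfill_gap_ms(lines):
    """Whole milliseconds from 'Starting tap stream' to 'seen backfill-close message'."""
    start = next(timestamp(l) for l in lines if "Starting tap stream" in l)
    close = next(timestamp(l) for l in lines if "seen backfill-close message" in l)
    return (close - start) // timedelta(milliseconds=1)


print("2.0.1:", backfill_gap_ms(V201), "ms")
print("2.0.2:", backfill_gap_ms(V202), "ms")
```

For this vbucket the sketch yields 157 ms for 2.0.1 and 544 ms for 2.0.2, about 3.5x. Running the same function over every replica-building ebucketmigrator process in a log would turn the "consistent for other nodes and vbuckets" observation into a distribution.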
| Comment by Aleksey Kondratenko [ 02/May/13 ] |
| See above. Given I don't know how is best to own this on ep-engine side I'm just passing it to Ravi for further distribution. |
| Comment by Aleksey Kondratenko [ 02/May/13 ] |
| Sorry, accidentally assigned back to me |
| Comment by Ravi Mayuram [ 02/May/13 ] |
| Chiyoung, can you pls look into this. Thanks,ravi |
| Comment by Chiyoung Seo [ 06/May/13 ] |
|
Abhinav,
Here are the steps that I took for debugging this issue:
1) Install 2.0.1-185 package (final build for 2.0.1 release) and set up one node cluster
2) Load 1M items
3) Access phase with 200K items and the mixed load (50% set, 50% get)
4) Add the second node and rebalance
Rebalance time ranged from 6 mins 20 secs to 6 mins 50 secs in multiple runs.
Repeat the above steps with the 2.0.2-786 build: Rebalance time ranged from 4 mins 40 secs to 5 mins 10 secs in multiple runs.
Please let me know if your scenarios are quite different from the above ones. Can you also please compare 2.0.1-185 with 2.0.2-786 or the latest build? |
| Comment by Abhinav Dangeti [ 07/May/13 ] |
|
Chiyoung, when I compared 2.0.1-185 against 2.0.2-786, I saw a slight regression: http://qa.hq.northscale.net/job/litmuses-graph-loop/142/
Rebalance-in time went from 727s to 800s.
However, the 2.0.1 GA was 2.0.1-170 (if I'm not wrong), so I've been comparing all 2.0.2 builds against it: http://qa.hq.northscale.net/job/litmuses-graph-loop/140/
Rebalance-in time went from 491s to 800s.
So the bigger regression was from 2.0.1-170 to 2.0.1-185, then? http://qa.hq.northscale.net/job/litmuses-graph-loop/144/ |
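Putting the rebalance-in times quoted above on a common scale shows why one comparison reads as "slight" and the other does not. A trivial Python sketch; the labels are ours, and the second pair is taken from the 2.0.1-170 baseline comparison described in the comment.

```python
def pct_change(before_s, after_s):
    """Relative change in rebalance time, in percent (positive means slower)."""
    return (after_s - before_s) / before_s * 100.0


# Rebalance-in times in seconds, as quoted in the comment.
COMPARISONS = {
    "2.0.1-185 -> 2.0.2-786": (727, 800),
    "2.0.1-170 -> 2.0.2": (491, 800),
}

for label, (before, after) in COMPARISONS.items():
    print(f"{label}: {pct_change(before, after):+.1f}%")
# 2.0.1-185 -> 2.0.2-786: +10.0%
# 2.0.1-170 -> 2.0.2: +62.9%
```

Roughly 10% against the 185 build versus about 63% against the 170 build is what points the investigation at the 170 to 185 step rather than at 2.0.2.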
| Comment by Chiyoung Seo [ 08/May/13 ] |
|
Abhinav, From the graphs that you attached, it seems to me that there is definitely a regression between 2.0.1-170 and 2.0.1-185. Let me take a look at the changes in ep_engine between these two builds. |
| Comment by Chiyoung Seo [ 08/May/13 ] |
|
There is only one change in ep-engine between 2.0.1-170 and 2.0.1-185: http://review.couchbase.org/#/c/25284/ The above commit simply changed a type off_t ==> cs_off_t in couchstore stats. It is NOT related to the rebalance regression. |
| Comment by Maria McDuff [ 08/May/13 ] |
|
per chiyoung, abhinav running more tests. abhinav -- pls update with your latest test result. |
| Comment by Abhinav Dangeti [ 14/May/13 ] |
|
Agree with chiyoung, multiple tests' results show that this rebalance regression (between 170 and 185 at least) is not consistent.
For example, consider this larger scale run, showing no rebalance regression, http://qa.hq.northscale.net/job/litmuses-graph-loop/171/ |
| Comment by Chiyoung Seo [ 14/May/13 ] |
|
Abhinav,
If you don't see any rebalance regression in a large-scale test, I don't think it's a regression. I leave it up to you to make a decision regarding if we close this bug or not. |
| Comment by Maria McDuff [ 14/May/13 ] |
| per bug triage, assigning to mike to investigate if this regression issue (between 2.0.1 170 vs 185) belongs to ep-engine. |
| Comment by Chiyoung Seo [ 15/May/13 ] |
| I need to discuss this with Abhinav to see if we can close this bug or not. |
| Comment by Chiyoung Seo [ 15/May/13 ] |
|
Discussed it further with Abhinav.
Here is the summary of the things that we observed so far:
1) Rebalance performance in the latest 2.0.2 including MRW is comparable to (or sometimes better than) the 2.0.1-185 build.
2) There is a high variation in rebalance performance between 2.0.1-170 (Linux GA) and the 2.0.1-185 build in a small-scale test. However, the variation becomes very minor, or almost no difference, in a large-scale test.
There is only one minor warning fix in ep-engine between the 2.0.1-170 and 2.0.1-185 builds, which doesn't have any impact on rebalance performance.
Abhinav will work with the perf team and do more small-scale tests between 2.0.1-170 and 2.0.1-185 on a physical-node cluster.
Abhinav and I agreed that this is not a blocker at this time. |
| Comment by Maria McDuff [ 15/May/13 ] |
| per bug mtg, wayne to coordinate with ronnie on large scale test between 2.0.1 185 vs 2.0.2 with MRW latest build -- not toybuild. |
| Comment by Wayne Siu [ 16/May/13 ] |
|
Ronnie, Can we run the perf tests on the same hardware set with 2.0.1-185 and the latest 2.0.2? Can you give us an estimate when we can expect the results? Thanks. |
| Comment by Ronnie Sun [ 16/May/13 ] |
|
- There is no regression from 2.0.1-170 to 2.0.1-185 in the reb-in large scale test.
- For 2.0.2 large scale tests, noticed there are some bizarre cache misses in the access phase for 2.0.2. Now falling back to medium scale test. ETA tmr morning. |
| Comment by Ronnie Sun [ 17/May/13 ] |
|
The medium scale tests look ok so far; there are minor regressions that MIGHT be variance-related. Will continue the full sets (probably multiple runs), then move on to large scale reb out/in tests. |
[MB-8259] Disk not flush on xdcr dest node Created: 13/May/13 Updated: 17/May/13 |
|
| Status: | Open |
| Project: | Couchbase Server |
| Component/s: | couchbase-bucket |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Ronnie Sun | Assignee: | Jin Lim |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | linux | ||
| Attachments: |
|
| Description |
|
On the ec2 cluster I used for cbrecovery, the disk write queue didn't flush for a very long time. (this symptom was observed twice)
It's an xdcr dest cluster, with no ops other than incoming xdcr traffic. Filed as blocker, since it blocks the cbrecovery testing. The clusters are left intact as they are for developers to diagnose; please email me back when the problem is resolved so I can continue the test. Thanks! Ronnie
Destination: http://ec2-184-169-190-197.us-west-1.compute.amazonaws.com:8091
And the source: http://ec2-23-21-15-103.compute-1.amazonaws.com:8091
User: Administrator Passwd: password
SSH: user: root passwd: couchbase
Now attaching the original email thread with diagnosis from Jin and Junyi:
This should be a bug. There are a lot of crashes in ns_server/baby_sitter and the couchdb/writer process. Couchdb is unable to spawn the writer process for some reason. I am not 100% sure which caused which, but it should be unrelated to XDCR. Please file a 2.0.2 bug and assign it to Filipe to take a first look. Thanks. Junyi
On May 13, 2013, at 10:56 AM, Jin Lim wrote:
Logs from the ec2-184-169-190-197.us-west-1.compute.amazonaws.com. I see the following error. After the error I see couch_db/updater kept crashing. I will wait for Junyi's quick look at it from the xdcr side, then figure out what is next. Thanks, Jin
[ns_server:info,2013-05-10T18:00:22.665,ns_1@ec2-184-169-190-197.us-west-1.compute.amazonaws.com:ns_config_log<0.562.0>:ns_config_log:handle_info:57]config change: rest_creds -> ********
[user:info,2013-05-10T18:00:22.718,ns_1@ec2-184-169-190-197.us-west-1.compute.amazonaws.com:<0.570.0>:ns_log:crash_consumption_loop:64]Port server moxi on node 'babysitter_of_ns_1@127.0.0.1' exited with status 0. Restarting. Messages: WARNING: curl error: transfer closed with outstanding read data remaining from: http://127.0.0.1:8091/pools/default/saslBucketsStreaming WARNING: curl error: couldn't connect to host from: http://127.0.0.1:8091/pools/default/saslBucketsStreaming ERROR: could not contact REST server(s): http://127.0.0.1:8091/pools/default/saslBucketsStreaming EOL on stdin. Exiting
[user:info,2013-05-10T18:00:32.169,ns_1@ec2-184-169-190-197.us-west-1.compute.amazonaws.com:<0.570.0>:ns_log:crash_consumption_loop:64]Port server moxi on node 'babysitter_of_ns_1@127.0.0.1' exited with status 0. Restarting. Messages: 2013-05-10 18:00:22: (cproxy_config.c.317) env: MOXI_SASL_PLAIN_USR (13) 2013-05-10 18:00:22: (cproxy_config.c.326) env: MOXI_SASL_PLAIN_PWD (8)
On May 13, 2013, at 10:41 AM, Ronnie Sun wrote:
Hi Jin, Junyi
On the ec2 cluster I used for cbrecovery, the disk write queue didn't flush for a very long time. (this symptom was observed twice) It's an xdcr dest cluster, no other ops rather than xdcr incoming traffic. Everything looks normal, not sure if it's related to the recent reb regression. Would you please take a look at it, thanks! Ronnie
Destination: http://ec2-184-169-190-197.us-west-1.compute.amazonaws.com:8091
And the source: http://ec2-23-21-15-103.compute-1.amazonaws.com:8091
User: Administrator Passwd: password
SSH: user: root passwd: couchbase |
| Comments |
| Comment by Ronnie Sun [ 14/May/13 ] |
|
BTW: Please let me know when you are done investigating the clusters, so I can use them for other tasks (cbrecovery), or save some ec2 $:)
|
| Comment by Maria McDuff [ 14/May/13 ] |
| per bug triage, junyi to diagnose/debug issue. |
| Comment by Jin Lim [ 14/May/13 ] |
| There was a deadlock issue from ep engine, which might have caused this disk not flush issue. Please wait for a build that includes http://review.couchbase.org/#/c/26300/. Thanks. |
| Comment by Jin Lim [ 14/May/13 ] |
|
Per Junyi, providing a more detailed description of the deadlock here:
* When a persistent write completes, ep engine broadcasts a header update notification via the mccouch connection
* For whatever reason (network glitch or heavy load over the single port), mccouch causes ep engine to reset the connection
* The deadlock issue was introduced with the recent code changes in the above connection-reset path
* Unfortunately, once ep engine gets into this deadlock, the flusher task that belongs to the deadlocked thread will never flush any mutation
* The fix has been merged, and the Perf team may want to pick up tomorrow's build and rerun this test
* Again, this deadlock fix may or may not address the xdcr issue here, but it's worth a try ;) |
| Comment by Junyi Xie [ 14/May/13 ] |
|
Ronnie,
I will keep investigating. In the meantime, since ep-engine has already fixed the deadlock issue, can you please rerun the test with the latest build that includes Jin's fix? Thanks. |
| Comment by Ronnie Sun [ 14/May/13 ] |
|
sure, will rerun as it gets thru the build system. Ronnie |
| Comment by Junyi Xie [ 15/May/13 ] |
|
Problem reproduced with build 803 with Jin's commit.
Some investigation with Damien showed that, at the time the flusher got stuck, CouchDB was in good shape, and we did not see file descriptors used up by the Erlang beam process (checked via lsof on that node):
[root@ip-10-196-8-158 bin]# lsof -u couchbase | wc -l
439
This is supported by the fact that the ep_engine process was idle at the time of the stall: the flusher is not woken up at all. Talked to Jin; there are two possibilities: 1) the disk_write_queue size stat itself is fooling us, or 2) there are bugs introduced by recent ep_engine changes that block the flusher forever. To verify, Ronnie will rerun the test with a very small number of items. From the XDCR perspective, this is a simple test case which we ran many times before and never saw this issue. In this test, XDCR has successfully replicated all 200M items to the memory of the destination cluster, so IMHO it is unlikely to be caused by the core XDCR code. It seems to me that some recent underlying enhancement (ep_engine, storage, etc.) may have modified the flusher behavior incorrectly (this is just my guess, which could be wrong). Jin, can you please take a look into ep_engine to see why the flusher is not woken up? Thanks. |
| Comment by Jin Lim [ 15/May/13 ] |
|
Sure, thanks. Will wait for Ronnie's test result first. If this is indeed a bug in the flusher not waking up, we probably would have hit the issue much earlier as well. So this is a very mysterious, scandalous bug ;( |
| Comment by Junyi Xie [ 15/May/13 ] |
| Ronnie's small test (1000 item only) does not see this issue. |
| Comment by Jin Lim [ 16/May/13 ] |
|
* If this is truly an ep engine + MRW feature induced bug, we should have run into a similar bug constantly, regardless of i/o load
* Nonetheless, the latest MRW feature might have turned a previously circumvented issue into a real issue
* Plan for debugging this issue will be:
1) Jin to build a special toy build instrumenting more logs + stats
2) Jin to add per-dispatcher (write thread) monitoring stats to pinpoint where/what/when these items stop flushing
3) Perf (Ronnie) to run the toy build and try to reproduce the same symptom |
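The symptom being chased is a disk write queue that stays non-empty without draining. As a hypothetical illustration of the kind of check such monitoring stats could feed (not the toy build's actual instrumentation), this Python sketch takes `(seconds, queue_size)` samples polled from a node and reports when a stall began. The function name, sampling interval and window are assumptions.

```python
from typing import Iterable, Optional, Tuple


def find_flush_stall(samples: Iterable[Tuple[float, int]], window_s: float) -> Optional[float]:
    """Return the time a flush stall began, or None if there was none.

    A stall is a stretch of at least window_s seconds during which the
    write queue stays non-empty and never shrinks between consecutive samples.
    """
    stall_start = None
    prev_t = prev_size = None
    for t, size in samples:
        if prev_size is not None and size > 0 and size >= prev_size:
            if stall_start is None:
                stall_start = prev_t  # the queue stopped shrinking after the previous sample
            if t - stall_start >= window_s:
                return stall_start
        else:
            stall_start = None  # the queue shrank or emptied, so progress was made
        prev_t, prev_size = t, size
    return None


# Queue drains to 40 items and then sits there with no other traffic.
samples = [(0, 100), (10, 50), (20, 40), (30, 40), (40, 40), (50, 40)]
print(find_flush_stall(samples, window_s=30))  # 20
```

A queue that grows because inbound traffic outpaces a working flusher would also be flagged. In the scenario here, with XDCR replication already complete and no other ops, a flat non-zero queue is the more telling signal.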
| Comment by Junyi Xie [ 16/May/13 ] |
|
Thanks so much, Jin. Once Ronnie is done with your toybuild, we could work together to figure out what happened. |
| Comment by Maria McDuff [ 16/May/13 ] |
| per bug triage, ronnie is running right now with 100million items and will update this bug. he will re-assign back to jin. |
| Comment by Ronnie Sun [ 16/May/13 ] |
|
100M with 200 bytes avg value size, problem reproduced, screenshot and diag attached. cluster is left intact on http://ec2-184-169-190-197.us-west-1.compute.amazonaws.com:8091/ |
| Comment by Ronnie Sun [ 16/May/13 ] |
|
Hi Jin, Would you please take a look and advise next steps, Thanks! Ronnie |
| Comment by Jin Lim [ 16/May/13 ] |
|
A toy build is available to re-run the test. 2.0.0-XDCR001-toy at http://builds.hq.northscale.net/latestbuilds/couchbase-server-community_toy-couchstore-x86_64_2.0.0-XDCR001-toy.rpm |
| Comment by Maria McDuff [ 16/May/13 ] |
| Assigning bk to ronnie. |
| Comment by Jin Lim [ 16/May/13 ] |
| For further detail about this toy build: this build is to determine whether the issue here is an incorrect stat or the flusher not running (leading to a possibly inconsistent state between memory and storage) |
| Comment by Ronnie Sun [ 17/May/13 ] |
|
still have the issue:
assign back to Jin. cluster on: http://ec2-184-169-190-197.us-west-1.compute.amazonaws.com:8091/ |
| Comment by Jin Lim [ 17/May/13 ] |
| It does not appear to be a case of an incorrect stat; rather, a relatively small number of pending mutations on a replica vbucket are not being flushed. Continuing to debug the issue as the highest priority now. |
Couchbase logo needs to be updated on UI, desktop and program-settings icon
(MB-7804)
|
|
| Status: | Reopened |
| Project: | Couchbase Server |
| Component/s: | UI |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Technical task | Priority: | Blocker |
| Reporter: | Anil Kumar | Assignee: | Pavel Blagodov |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Comments |
| Comment by Anil Kumar [ 10/Apr/13 ] |
| Can you take look at the Spec and logo assets and let me know if you've what you need to make the changes. |
| Comment by Pavel Blagodov [ 13/Apr/13 ] |
| Updated http://review.couchbase.org/25647 |
| Comment by Pavel Blagodov [ 16/Apr/13 ] |
| http://review.couchbase.org/25697 |
| Comment by Maria McDuff [ 26/Apr/13 ] |
| tony, merged/fixed. can u verify/close. Thanks. |
| Comment by Anil Kumar [ 26/Apr/13 ] |
| Minor corrections to the logo images are needed; we will fix them and then resolve this bug. |
| Comment by Steve Yen [ 30/Apr/13 ] |
|
fyi that assets are needed for both windows and mac (but none are needed for linux (which has no U/I for its installer or desktop/menubar widgets)). Please attach here when they're ready (not in Dropbox, as we don't all have Dropbox access). Thanks. |
| Comment by Anil Kumar [ 02/May/13 ] |
| Attached the images |
| Comment by Pavel Blagodov [ 06/May/13 ] |
| http://review.couchbase.org/26117 |
| Comment by Anil Kumar [ 14/May/13 ] |
| Pavel, can you merge the changes and resolve the bug to QE for verifying it. thanks! |
| Comment by Maria McDuff [ 14/May/13 ] |
| tony, pls verify/ close. |
| Comment by Thuan Nguyen [ 16/May/13 ] |
|
The logo in the couchbase server web console is not consistent with the one on the couchbase.com web site. Here is an example:
couchbase server web console: http://10.3.2.46:8091/images/couchbase_small_2.0.2.png
couchbase.com web site: http://www.couchbase.com/sites/all/themes/nosql/logo.png
We need to fix the logo in the couchbase server web console to match the one on the couchbase.com web site. There is a gap between "couchbase" and the red u shape, as you can see in the logo on couchbase.com |
| Comment by Pavel Blagodov [ 17/May/13 ] |
| http://review.couchbase.org/26381 |
[MB-8003] investigate whether hot-fix commits were included in later releases Created: 01/Apr/13 Updated: 17/May/13 |
|
| Status: | Open |
| Project: | Couchbase Server |
| Component/s: | build |
| Affects Version/s: | 2.0.1 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Task | Priority: | Critical |
| Reporter: | Phil Labee | Assignee: | Phil Labee |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
Commits should be merged forward once they are found to be good.
This is especially true for changes that are included in hot-fixes. Track the changes that were included in hot-fixes and verify which releases they are in. |
| Comments |
| Comment by Phil Labee [ 02/Apr/13 ] |
|
2.0.0 Hot-Fixes: CBSE-473: from 2.0.0-1988, need to trace commits |
| Comment by Wayne Siu [ 22/Apr/13 ] |
| Phil, can you confirm whether these commits are in the 2.0.1 and 2.0.2 branches? Once confirmed, please comment and update the ticket. Thanks. |
| Comment by Anil Kumar [ 14/May/13 ] |
| Phil, can you update the ticket? |
| Comment by Maria McDuff [ 16/May/13 ] |
| Can you update this ticket by EOD 5/16? If this is resolved, please change it to the RESOLVED-FIXED state. Thanks. |
| Comment by Phil Labee [ 17/May/13 ] |
|
Update wiki pages: CBSE-473 :: http://hub.internal.couchbase.com/confluence/display/CR/CBSE-473+*+backup+taken+from+1.8.1+could+not+be+restored+fully+to+2.0+cluster
There are changes that were merged to 2.0.1 but NOT to 2.0.2 in the couchdbx-app and membase-cli repos. It is unclear what the situation is in couchbase-python-client and couchdb, because the commits listed in the manifests are no longer in the history of those repos. Likewise, I couldn't find the commit for libconflate, but we stopped using branches on libconflate due to a bad check-in. |
[MB-8241] Refactor set_view code Created: 10/May/13 Updated: 17/May/13 |
|
| Status: | Open |
| Project: | Couchbase Server |
| Component/s: | view-engine |
| Affects Version/s: | 2.1 |
| Fix Version/s: | 2.1 |
| Security Level: | Public |
| Type: | Task | Priority: | Major |
| Reporter: | Volker Mische | Assignee: | Volker Mische |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
Refactor the MapReduce-specific code into its own module. The goal is to share most of the code between MapReduce views and Spatial Views.
|
| Comments |
| Comment by Thuan Nguyen [ 17/May/13 ] |
|
Integrated in github-couchdb-preview #578 (See [http://qa.hq.northscale.net/job/github-couchdb-preview/578/])
MB-8241: Refactor couch_set_view_updater module (Revision edfebea5c5db0338eecf6a8d0abb547829ca433b)
MB-8241: Refactor couch_set_view_group module (Revision bddf446702dfd1d949c1eb30b01e060f13615a8e)
MB-8241: Rename ViewBtreeStates to just ViewStates (Revision c76d9393d8ec5615836da966dc69a666ebce882e)
MB-8241: Refactor couch_set_view_util module (Revision 326a463af1cc6162705c867794bb4ab44c99d569)
MB-8241: Refactor couch_set_view_compactor module (Revision 3e627d2c8405ec9bf8bee1dbd0d28aa275a5abc9)
Result = SUCCESS
vmx : Files : * src/couch_set_view/Makefile.am * src/couch_set_view/src/couch_set_view_updater.erl * src/couch_set_view/include/couch_set_view.hrl * src/couch_set_view/src/mapreduce_view.erl
vmx : Files : * src/couch_set_view/src/couch_set_view_group.erl * src/couch_set_view/src/couch_set_view.erl * src/couch_set_view/src/mapreduce_view.erl * src/couch_set_view/include/couch_set_view.hrl * src/couch_set_view/src/couch_set_view_util.erl
vmx : Files : * src/couch_set_view/src/couch_set_view_util.erl
vmx : Files : * src/couch_set_view/src/couch_set_view_util.erl * src/couch_set_view/src/mapreduce_view.erl
vmx : Files : * src/couch_set_view/src/mapreduce_view.erl * src/couch_set_view/src/couch_set_view_compactor.erl |
| Comment by Thuan Nguyen [ 17/May/13 ] |
|
Integrated in github-couchdb-preview #579 (See [http://qa.hq.northscale.net/job/github-couchdb-preview/579/])
MB-8241: Refactor couch_set_view module (Revision aed89e3d1f230f838cec5d0a8a32d5c68ce2cd50)
MB-8241: Move MapReduce View specific stuff into its own record (Revision 880e127299b389586941810d17936b88327ae5b4)
MB-8241: Make make_views_fun clearer (Revision 6da854cec35d1f8ec3757c7430f95026a92aff42)
MB-8241: Move records into common header file (Revision ffef2b2c61ff5d8e66815c2937dd2e0579ffb818)
Result = SUCCESS
vmx : Files : * src/couch_set_view/test/22-compactor-cleanup.t * src/couch_set_view/test/14-duplicated-keys-per-doc.t * src/couch_set_view/test/13-progressive-cleanup.t * src/couch_set_view/test/06-main-compaction.t * src/couch_set_view/test/17-unindexable-partitions.t * src/couch_set_view/test/26-multiple-reductions.t * src/couch_set_view/src/couch_set_view_compactor.erl * src/couch_set_view/test/20-debug-params.t * src/couch_set_view/test/21-updater-cleanup.t * src/couch_set_view/test/24-updater-add-more-passive-partitions.t * src/couch_set_view/src/mapreduce_view.erl * src/couch_set_view/test/15-passive-partitions.t * src/couch_set_view/test/12-errors.t * src/couch_set_view/test/02-old-index-cleanup.t * src/couch_set_view/test/18-monitor-partition-updates.t * src/couch_set_view/test/25-util-stats.t * src/couch_set_view/test/19-compaction-retry.t * src/couch_set_view/test/05-replicas-transfer.t * src/couch_set_view/test/16-pending-transition.t * src/couch_set_view/test/11-updates-cleanup-many-views.t * src/couch_set_view/test/09-deletes-cleanup-many-views.t * src/couch_set_view/test/10-updates-cleanup.t * src/couch_set_view/test/08-deletes-cleanup.t * src/couch_set_view/src/couch_set_view_http.erl * src/couch_set_view/test/04-handle-db-deletes.t * src/couch_set_view/src/couch_set_view.erl * src/couch_set_view/test/07-replica-compaction.t * src/couch_set_view/test/03-db-compaction-file-leaks.t * src/couch_set_view/test/23-replica-group-missing.t
vmx : Files : * src/couch_set_view/src/couch_set_view_compactor.erl * src/couch_set_view/test/22-compactor-cleanup.t * src/couch_set_view/include/couch_set_view.hrl * src/couch_set_view/test/16-pending-transition.t * src/couch_set_view/test/08-deletes-cleanup.t * src/couch_set_view/test/17-unindexable-partitions.t * src/couch_set_view/src/couch_set_view_updater.erl * src/couch_set_view/src/couch_set_view_util.erl * src/couch_set_view/src/couch_set_view_http.erl * src/couch_set_view/src/mapreduce_view.erl * src/couch_set_view/test/21-updater-cleanup.t * src/couch_set_view/test/14-duplicated-keys-per-doc.t * src/couch_set_view/test/19-compaction-retry.t * src/couch_set_view/test/09-deletes-cleanup-many-views.t * src/couch_set_view/test/11-updates-cleanup-many-views.t * src/couch_set_view/test/26-multiple-reductions.t * src/couch_set_view/test/24-updater-add-more-passive-partitions.t * src/couch_set_view/src/couch_set_view.erl * src/couch_set_view/src/couch_set_view_group.erl * src/couch_set_view/test/10-updates-cleanup.t * src/couch_set_view/test/05-replicas-transfer.t * src/couch_set_view/test/15-passive-partitions.t * src/couch_set_view/src/couch_set_view_mapreduce.erl * src/couch_set_view/test/13-progressive-cleanup.t
vmx : Files : * src/couch_set_view/src/mapreduce_view.erl * src/couch_set_view/src/couch_set_view_group.erl
vmx : Files : * src/couch_set_view/src/couch_set_view_updater.erl * src/couch_set_view/src/mapreduce_view.erl * src/couch_set_view/src/couch_set_view_updater.hrl |
[MB-8310] Couchdb gerrit changes for master must use manifest 2.1-unstable.xml Created: 17/May/13 Updated: 17/May/13 Resolved: 17/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | build |
| Affects Version/s: | 2.1 |
| Fix Version/s: | None |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Filipe Manana | Assignee: | Phil Labee |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
2.1-stable.xml was abandoned, and every couchdb master Gerrit change depends on a couchstore revision more recent than the one listed in 2.1-stable.xml. As a result, the jobs couchdb-gerrit-views-master and couchdb-gerrit-views-pre-merge-master always fail.
|
| Comments |
| Comment by Volker Mische [ 17/May/13 ] |
| I've changed the builds to 2.1-unstable.xml. |
[MB-8308] Rebalance time regression in large scale DGM vperf tests Created: 17/May/13 Updated: 17/May/13 |
|
| Status: | Open |
| Project: | Couchbase Server |
| Component/s: | ns_server, performance, view-engine |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Critical |
| Reporter: | Pavel Paulau | Assignee: | Pavel Paulau |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
4 physical boxes, 128 GB RAM, 24 cores, 2x SATA.
3 -> 4 nodes, build 2.0.2-799 |
||
| Operating System: | Centos 64-bit |
| Description |
|
For instance, with aggressive auto-compaction settings, rebalance is more than 2x slower.
Need to summarize and post all results. |
[MB-7209] [RN 2.0.2 + Doc]2.0.0 cluster restore with data and index Created: 16/Nov/12 Updated: 17/May/13 |
|
| Status: | Open |
| Project: | Couchbase Server |
| Component/s: | documentation, tools |
| Affects Version/s: | 2.0, 2.0.1 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Thuan Nguyen | Assignee: | Karen Zeller |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | info-request, system-test | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | all os | ||
| Description |
|
Our documentation on restoring from one cluster to another identical cluster (http://www.couchbase.com/docs/couchbase-manual-1.8/couchbase-backup-restore-restore.html, section 5.2.2.2, "Restoring to a different cluster")
only covers restoring data. In 2.0.0, indexes are stored in the @indexes directory. What are the instructions for a cluster restore (data and indexes) from one cluster to another? |
| Comments |
| Comment by Aleksey Kondratenko [ 16/Nov/12 ] |
| AFAIK there's no need to do anything; indexes will be rebuilt automatically. Or at least I think so. Not sure we should care about that, however. |
| Comment by Aleksey Kondratenko [ 16/Nov/12 ] |
| Pinging Farshid |
| Comment by Aleksey Kondratenko [ 16/Nov/12 ] |
| And Yaseen |
| Comment by Thuan Nguyen [ 16/Nov/12 ] |
| In a large cluster with more than 30 million items, rebuilding the index may take more than a day. So for a cluster restore in 2.0.0, do we need to copy both the data and index files to the new cluster to skip the initial index rebuild? |
| Comment by Farshid Ghods [ 16/Nov/12 ] |
|
This is a use case we need to discuss for the 2.1 or 3.0 release.
As part of cluster restore, which is meant for large deployments, we want to allow users to copy their index files as well as their data files. Currently, cbbackup/cbrestore restore not only the data but also the design document definitions. The cluster restore feature, on the other hand, is currently designed to work for data files, not indexes. Tony, it is clear that indexes are not restored, but can you confirm whether the design docs are created after the restore? |
| Comment by Farshid Ghods [ 16/Nov/12 ] |
| Please reassign to Yaseen after commenting. |
| Comment by Steve Yen [ 19/Nov/12 ] |
| looks like the instructions may need a scrub, too. |
| Comment by Steve Yen [ 20/Nov/12 ] |
| bug-scrub: since this is docs, just need alk to evaluate docs (he believes cluster-restore works) |
| Comment by Aleksey Kondratenko [ 21/Nov/12 ] |
| Tried manually. Everything works as expected: the master couch database is restored along with everything else, and the indexes come back; they are just rebuilt after the restore. |
| Comment by Aleksey Kondratenko [ 21/Nov/12 ] |
|
The only thing that may be somewhat problematic is the lack of _exact_ instructions on which data files are meant here (quoting from the documentation):
The necessary steps for migrating data using this method are as follows:
1. Take a backup of the data files of all nodes, using the above procedure. Alternately, shut down the couchbase-server on all nodes and copy the DB files.
2. Install Couchbase Server (of at least version 1.7.1) on new nodes and cluster together. If using the web console to setup your cluster, a 'default' bucket will be created. Please delete this bucket before proceeding. |
| Comment by Farshid Ghods [ 09/Jan/13 ] |
|
Per bug scrub:
The cluster restore feature in 2.0 will restore only data files. Users can use the cbbackup/cbrestore tools to back up the design doc definitions and restore them to the new cluster, and then wait for the indexes to build on the new cluster. |
| Comment by Karen Zeller [ 27/Mar/13 ] |
| Removing documentation as component until we confirm this item for 1) release notes, or 2) required new content. |
| Comment by Karen Zeller [ 16/Apr/13 ] |
| Is this something that needs to be documented? If so, what is the point we want to make? |
| Comment by Thuan Nguyen [ 18/Apr/13 ] |
| I think we need to add more information about restore in 2.0.0 and later for users who have design docs and views. After restoring the data to the cluster, just re-create the design docs and views, and the indexes will be rebuilt. |
| Comment by Maria McDuff [ 25/Apr/13 ] |
| Abhinav, please work with Karen so she can document the additional restore functionality/expected behavior. |
| Comment by Maria McDuff [ 10/May/13 ] |
| Tony, before closing this bug, verify whether there's any documentation for Karen to do. Thanks. |
| Comment by Wayne Siu [ 14/May/13 ] |
| Updating the fix version to 2.0.2. Please review the current instructions and provide additional/new steps if needed. |
| Comment by Anil Kumar [ 14/May/13 ] |
|
Tony, can you provide additional info and assign this to Karen for documentation. |
| Comment by Thuan Nguyen [ 14/May/13 ] |
|
Just talked to Aleksey: all we need to back up is the database files, and then restore them to the new cluster. The backup/restore documentation for 2.0 (http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-backup-restore-restore.html) does not explain how to restore design docs and views; we need to add that to the page. To restore design docs and views, all users need to do is re-create them after restoring the data to the new cluster. The indexes will then be rebuilt automatically. |
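Re-creating a design doc after the data restore could look roughly like this. A minimal sketch, assuming the 2.0 view REST API on port 8092 (`PUT /<bucket>/_design/<name>`) and a bucket without a password; the host, bucket, design doc and view names are hypothetical. It only builds the request so it can be inspected before anything is sent.

```python
import json
import urllib.request


def build_design_doc_request(host, bucket, ddoc_name, views):
    """Build, without sending, a PUT that (re)creates one design document.

    Assumes the 2.0 view REST API on port 8092 and a bucket with no password.
    """
    url = "http://%s:8092/%s/_design/%s" % (host, bucket, ddoc_name)
    body = json.dumps({"views": views}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical design doc with one map-only view, re-created on the new cluster.
req = build_design_doc_request(
    "10.0.0.1",
    "default",
    "by_type",
    {"all_by_type": {"map": "function (doc, meta) { emit(doc.type, null); }"}},
)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req)
```

Once the PUT succeeds for each saved design doc, the indexes would be built on the new cluster as described above; on a large data set that rebuild is where the time goes.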
| Comment by Maria McDuff [ 17/May/13 ] |
|
Karen, please work with Tony to get this documented for 2.0.2. Thanks. |
[MB-8305] 2.1 manifests have reference to membase-cli Created: 17/May/13 Updated: 17/May/13 |
|
| Status: | Open |
| Project: | Couchbase Server |
| Component/s: | None |
| Affects Version/s: | 2.1 |
| Fix Version/s: | 2.1 |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Pavel Paulau | Assignee: | Phil Labee |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
The manifests reference membase/membase-cli instead of couchbase/couchbase-cli.
For instance: https://github.com/couchbase/manifest/blob/master/2.1-stable.xml
I'm not sure we are using those manifests, though. |
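Spotting stale references like this can be scripted. A rough sketch, assuming the manifests use the usual repo-tool layout (`<remote>`, `<default>`, `<project name=... remote=...>`); the sample manifest is made up and is not the contents of 2.1-stable.xml, and `find_membase_references` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def find_membase_references(manifest_xml):
    """Return (remote, name) pairs for <project> entries that still point at membase."""
    root = ET.fromstring(manifest_xml)
    default = root.find("default")
    default_remote = default.get("remote") if default is not None else None
    hits = []
    for project in root.iter("project"):
        # A project without its own remote inherits the <default> one.
        remote = project.get("remote", default_remote)
        name = project.get("name", "")
        if remote == "membase" or name.startswith("membase"):
            hits.append((remote, name))
    return hits


SAMPLE = """<manifest>
  <remote name="couchbase" fetch="git://github.com/couchbase/"/>
  <remote name="membase" fetch="git://github.com/membase/"/>
  <default remote="couchbase" revision="master"/>
  <project name="couchstore"/>
  <project name="membase-cli" path="couchbase-cli" remote="membase"/>
</manifest>"""

print(find_membase_references(SAMPLE))  # [('membase', 'membase-cli')]
```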
[MB-8304] Grommit synchronization is not automated Created: 17/May/13 Updated: 17/May/13 |
|
| Status: | Open |
| Project: | Couchbase Server |
| Component/s: | build |
| Affects Version/s: | 2.0.2, 2.1 |
| Fix Version/s: | None |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Pavel Paulau | Assignee: | Phil Labee |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
Currently, any project update requires a manual checkout on the build slaves.
Apart from being annoying, this introduces bugs. |
[MB-7849] cbtransfer crashes with Python 2.4 and couchstore-files as source Created: 01/Mar/13 Updated: 17/May/13 Resolved: 17/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | build, tools |
| Affects Version/s: | 2.0.1 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Critical |
| Reporter: | Deepkaran Salooja | Assignee: | Deepkaran Salooja |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | build 2.0.1-170-rel | ||
| Description |
|
Steps to reproduce (build 170, VM to reproduce: 10.3.3.95):
1. Create the default bucket and load 100k items using mcsoda.
2. Use the following command to transfer the data to files (there is sufficient disk space in /tmp):
/opt/couchbase/bin/cbtransfer -v -v -v couchstore-files:///opt/couchbase/var/lib/couchbase/data/ /tmp/backup
This produces the following error:
[root@caper-012 ~]# /opt/couchbase/bin/cbtransfer -v -v -v couchstore-files:///opt/couchbase/var/lib/couchbase/data/ /tmp/backup
2013-03-01 03:19:31,359: mt cbtransfer...
2013-03-01 03:19:31,360: mt source : couchstore-files:///opt/couchbase/var/lib/couchbase/data/
2013-03-01 03:19:31,360: mt sink : /tmp/backup
2013-03-01 03:19:31,360: mt opts : {'username': None, 'source_vbucket_state': 'active', 'destination_vbucket_state': 'active', 'verbose': 3, 'dry_run': False, 'extra': {'max_retry': 10.0, 'rehash': 0.0, 'nmv_retry': 1.0, 'cbb_max_mb': 100000.0, 'try_xwm': 1.0, 'batch_max_bytes': 400000.0, 'report_full': 2000.0, 'batch_max_size': 1000.0, 'report': 5.0, 'recv_min_bytes': 4096.0}, 'single_node': False, 'bucket_destination': None, 'destination_operation': None, 'threads': 4, 'key': None, 'password': None, 'id': None, 'bucket_source': None}
2013-03-01 03:19:31,361: mt source_class: <class 'pump_sfd.SFDSource'>
2013-03-01 03:19:31,395: mt sink_class: <class 'pump_bfd.BFDSink'>
2013-03-01 03:19:31,395: mt source_buckets: default
2013-03-01 03:19:31,395: mt bucket: default
2013-03-01 03:19:31,396: mt source_nodes: N/A
2013-03-01 03:19:31,411: mt enqueueing node: N/A
2013-03-01 03:19:31,411: w0 node: N/A
2013-03-01 03:19:31,509: s0 create_db: /tmp/backup/bucket-default/node-N%2FA/data-0000.cbb
2013-03-01 03:19:31,509: s0 connect_db: /tmp/backup/bucket-default/node-N%2FA/data-0000.cbb
....................python: Objects/obmalloc.c:765: PyObject_Malloc: Assertion `bp != ((void *)0)' failed.
Aborted
With fewer items (e.g. 10k), this works fine. |
| Comments |
| Comment by Deepkaran Salooja [ 01/Mar/13 ] |
|
With build 1976(2.0), I am hitting the original issue filed in |
| Comment by Bin Cui [ 01/Mar/13 ] |
|
You can modify the cbb_max_mb parameter to limit the batch size. The default is 100000 MB; a smaller value may do it. It looks like the Python SDK fails to allocate memory for the sqlite db operations. For example:
/opt/couchbase/bin/cbtransfer -v -v -v -x cbb_max_mb=1000 couchstore-files:///opt/couchbase/var/lib/couchbase/data/ /tmp/backup |
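One way to picture what lowering `cbb_max_mb` changes, assuming (this is not confirmed in the ticket) that it caps the size of each `data-NNNN.cbb` file the sink writes and that a new file is started once the cap is exceeded. The helpers are hypothetical and are not cbtransfer's code; the file-name pattern comes from the `create_db` log lines in this ticket, and the 400000-byte batches mirror the `batch_max_bytes` value in the logged opts.

```python
import os


def cbb_path(node_dir, index):
    """Hypothetical helper: data-0000.cbb, data-0001.cbb, ... under one node directory."""
    return os.path.join(node_dir, "data-%04d.cbb" % index)


def should_roll_over(current_size_bytes, cbb_max_mb):
    """True once the current file has grown past the (assumed) per-file cap."""
    return current_size_bytes > cbb_max_mb * 1024 * 1024


def plan_files(batch_sizes_bytes, cbb_max_mb):
    """Return, for each batch, the index of the .cbb file it would be written to."""
    index, size, plan = 0, 0, []
    for batch_bytes in batch_sizes_bytes:
        if should_roll_over(size, cbb_max_mb):
            index, size = index + 1, 0  # close the full file, start the next one
        plan.append(index)
        size += batch_bytes
    return plan


print(plan_files([400000] * 6, cbb_max_mb=1))  # [0, 0, 0, 1, 1, 1]
print(cbb_path("/tmp/backup/bucket-default/node-N%2FA", 1))
```

Under this reading, a 1 MB cap puts the first three batches in data-0000.cbb and the rest in data-0001.cbb; a smaller cap just means more, smaller sqlite files.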
| Comment by Bin Cui [ 01/Mar/13 ] |
|
[root@caper-012 ~]# /opt/couchbase/bin/cbtransfer -v -v -v couchstore-files:///opt/couchbase/var/lib/couchbase/data/ /tmp/backup -x cbb_max_mb=100
2013-03-01 10:55:35,071: mt cbtransfer...
2013-03-01 10:55:35,072: mt source : couchstore-files:///opt/couchbase/var/lib/couchbase/data/
2013-03-01 10:55:35,072: mt sink : /tmp/backup
2013-03-01 10:55:35,072: mt opts : {'username': None, 'source_vbucket_state': 'active', 'destination_vbucket_state': 'active', 'verbose': 3, 'dry_run': False, 'extra': {'max_retry': 10.0, 'rehash': 0.0, 'nmv_retry': 1.0, 'cbb_max_mb': 100.0, 'try_xwm': 1.0, 'batch_max_bytes': 400000.0, 'report_full': 2000.0, 'batch_max_size': 1000.0, 'report': 5.0, 'recv_min_bytes': 4096.0}, 'single_node': False, 'bucket_destination': None, 'destination_operation': None, 'threads': 4, 'key': None, 'password': None, 'id': None, 'bucket_source': None}
2013-03-01 10:55:35,073: mt source_class: <class 'pump_sfd.SFDSource'>
2013-03-01 10:55:35,107: mt sink_class: <class 'pump_bfd.BFDSink'>
2013-03-01 10:55:35,108: mt source_buckets: default
2013-03-01 10:55:35,108: mt bucket: default
2013-03-01 10:55:35,108: mt source_nodes: N/A
2013-03-01 10:55:35,124: mt enqueueing node: N/A
2013-03-01 10:55:35,125: w0 node: N/A
2013-03-01 10:55:35,226: s0 create_db: /tmp/backup/bucket-default/node-N%2FA/data-0000.cbb
2013-03-01 10:55:35,227: s0 connect_db: /tmp/backup/bucket-default/node-N%2FA/data-0000.cbb
...............Traceback (most recent call last):
  File "source/callbacks.c", line 206, in 'calling callback function'
  File "/opt/couchbase/lib/python/couchstore.py", line 358, in callback
    fn(DocumentInfo._fromStruct(docInfoPtr.contents, self))
  File "/opt/couchbase/lib/python/pump_sfd.py", line 211, in change_callback
    cas, exp, flg = struct.unpack(SFD_REV_META, doc_info.revMeta)
TypeError: unpack() argument 2 must be string or read-only buffer, not CArgObject
Segmentation fault |
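The failing line in `pump_sfd.py` expects `doc_info.revMeta` to be a byte string it can split into CAS, expiration and flags; in this crash it received a ctypes `CArgObject` instead. The sketch shows that decoding step with an explicit type and length check. The `>QII` layout (big-endian 8-byte CAS, 4-byte expiration, 4-byte flags) is an assumption, since the ticket does not show `SFD_REV_META`, and `decode_rev_meta` is a hypothetical name. A check like this would only turn bad input into a clear error; it does not address the memory corruption behind the crash.

```python
import struct

# Assumed revision-metadata layout (SFD_REV_META itself is not shown in this
# ticket): big-endian CAS (8 bytes), expiration (4 bytes), flags (4 bytes).
REV_META_FORMAT = ">QII"


def decode_rev_meta(rev_meta):
    """Decode (cas, exp, flg), rejecting anything that is not bytes-like."""
    if not isinstance(rev_meta, (bytes, bytearray, memoryview)):
        # e.g. the CArgObject seen in the traceback
        raise TypeError("revMeta must be bytes-like, got %s" % type(rev_meta).__name__)
    raw = bytes(rev_meta)
    expected = struct.calcsize(REV_META_FORMAT)  # 16 bytes
    if len(raw) < expected:
        raise ValueError("revMeta is %d bytes, expected at least %d" % (len(raw), expected))
    # Any trailing bytes beyond the assumed layout are ignored.
    return struct.unpack(REV_META_FORMAT, raw[:expected])


meta = struct.pack(REV_META_FORMAT, 123456789, 0, 42)
print(decode_rev_meta(meta))  # (123456789, 0, 42)
```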
| Comment by Jin Lim [ 01/Mar/13 ] |
| Bin, this isn't a regression from 2.0.1, but can you please advise on how likely (how often) users are to run into this? We need to decide whether to push this to 2.0.2 based on your input. Thanks! |
| Comment by Jin Lim [ 01/Mar/13 ] |
|
From Bin: It’s a use case that we have never tested before. The good thing is that the bug doesn’t sit in the critical path for backup/restore, so we can defer the fix to 2.0.2. Moving this to 2.0.2. |
| Comment by Maria McDuff [ 25/Mar/13 ] |
| bug scrub: would be good to fix in 2.0.2 release. |
| Comment by Steve Yen [ 16/Apr/13 ] |
| Following up on Jin's comment: this is an edge-case usage, but Bin's analysis shows it exposes a bug that a regular backup might also trigger. So leaving it in 2.0.2, as Bin would like to address this for 2.0.2. |
| Comment by Maria McDuff [ 29/Apr/13 ] |
| Per bug committee, this is critical for the 2.0.2 release. Must fix, because the backup issue can affect customers. |
| Comment by Pavel Paulau [ 02/May/13 ] |
|
1. 100K isn't a magic number. The issue is intermittent, and a large number of items increases its probability.
2. I analyzed several core dumps. cbtransfer usually crashes with a segfault or with one of:
Objects/obmalloc.c:765: PyObject_Malloc: Assertion `bp != ((void *)0)' failed.
Objects/obmalloc.c:953: PyObject_Free: Assertion `pool->ref.count > 0' failed
  File "source/callbacks.c", line 206, in 'calling callback function'
  File "/opt/couchbase/lib/python/couchstore.py", line 358, in callback
    fn(DocumentInfo._fromStruct(docInfoPtr.contents, self))
  File "/opt/couchbase/lib/python/couchstore.py", line 128, in _fromStruct
    self = DocumentInfo(str(info.id))
TypeError: __str__ returned non-string (type buffer)
The backtraces vary as well:

#0 0x00002b8a11230e23 in PyObject_Malloc () from /usr/lib64/libpython2.4.so.1.0
#1 0x00002b8a112301cd in _PyObject_New () from /usr/lib64/libpython2.4.so.1.0
#2 0x00002b8a1630b950 in new_CArgObject () at source/callproc.c:288
#3 0x00002b8a16305a39 in PointerType_paramfunc (self=0x38) at source/_ctypes.c:564
#4 0x00002b8a1630b3de in ConvParam (obj=0x38, index=1, pa=0x2b8a1c157620) at source/callproc.c:477
#5 0x00002b8a1630bafd in _CallProc (pProc=0x2b8a163075b0 <string_at>, argtuple=0x1f152b48, flags=4097, argtypes=0x1ef82cb0, restype=0x1ef16790, checker=0x0) at source/callproc.c:959
#6 0x00002b8a16306bf3 in CFuncPtr_call (self=0x1eebf710, inargs=0x2, kwds=0x0) at source/_ctypes.c:3362
#7 0x00002b8a112057e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0
#8 0x00002b8a11262fbe in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#9 0x00002b8a112666d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0
#10 0x00002b8a11264e2f in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#11 0x00002b8a112666d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0
#12 0x00002b8a1121baa7 in ?? () from /usr/lib64/libpython2.4.so.1.0
#13 0x00002b8a112057e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0
#14 0x00002b8a1120b90f in ?? () from /usr/lib64/libpython2.4.so.1.0
#15 0x00002b8a112057e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0
#16 0x00002b8a1126032d in PyEval_CallObjectWithKeywords () from /usr/lib64/libpython2.4.so.1.0
#17 0x00002b8a112445e8 in ?? () from /usr/lib64/libpython2.4.so.1.0
#18 0x00002b8a1122fe45 in PyObject_Str () from /usr/lib64/libpython2.4.so.1.0
#19 0x00002b8a1123a617 in ?? () from /usr/lib64/libpython2.4.so.1.0
#20 0x00002b8a1123ff53 in ?? () from /usr/lib64/libpython2.4.so.1.0
#21 0x00002b8a112057e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0
#22 0x00002b8a11262fbe in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#23 0x00002b8a112666d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0
#24 0x00002b8a11264e2f in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#25 0x00002b8a112666d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0
#26 0x00002b8a1121baa7 in ?? () from /usr/lib64/libpython2.4.so.1.0
#27 0x00002b8a112057e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0
#28 0x00002b8a1126032d in PyEval_CallObjectWithKeywords () from /usr/lib64/libpython2.4.so.1.0
#29 0x00002b8a1630aacc in _CallPythonObject (cif=<value optimized out>, resp=0x2b8a1c158af0, args=0x2b8a1c158960, userdata=<value optimized out>) at source/callbacks.c:206
#30 closure_fcn (cif=<value optimized out>, resp=0x2b8a1c158af0, args=0x2b8a1c158960, userdata=<value optimized out>) at source/callbacks.c:252
#31 0x00002b8a1631054b in ffi_closure_unix64_inner (closure=0x2b8a1c15bab0, rvalue=0x2b8a1c158af0, reg_args=0x2b8a1c158a40, argp=0x2b8a1c158b10 "j\003") at /tmp/ctypes/source/libffi/src/x86/ffi64.c:563
#32 0x00002b8a16310800 in ffi_closure_unix64 () at /tmp/ctypes/source/libffi/src/x86/unix64.S:228
#33 0x00002b8a1651dc3d in lookup_callback (rq=<value optimized out>, k=0x2b8a1c158bb0, v=<value optimized out>) at src/couch_db.c:623
#34 0x00002b8a1651b714 in btree_lookup_inner (rq=0x2b8a1c158d10, diskpos=<value optimized out>, current=0, end=1) at src/btree_read.c:78
#35 0x00002b8a1651b608 in btree_lookup_inner (rq=0x2b8a1c158d10, diskpos=<value optimized out>, current=0, end=1) at src/btree_read.c:52
#36 0x00002b8a1651c08c in couchstore_changes_since (db=<value optimized out>, since=<value optimized out>, options=<value optimized out>, callback=<value optimized out>, ctx=<value optimized out>) at src/couch_db.c:667
#37 0x00002b8a163106d4 in ffi_call_unix64 () at /tmp/ctypes/source/libffi/src/x86/unix64.S:73
#38 0x00002b8a16310244 in ffi_call (cif=0x2b8a1c159090, fn=0x2b8a1651bf40 <couchstore_changes_since>, rvalue=0x2b8a1c158f90, avalue=0x2b8a1c158f50) at /tmp/ctypes/source/libffi/src/x86/ffi64.c:428
#39 0x00002b8a1630bce1 in _call_function_pointer (pProc=0x2b8a1651bf40 <couchstore_changes_since>, argtuple=0x1f1a2ef0, flags=<value optimized out>, argtypes=0x0, restype=0x1ef9cce0, checker=0x0) at source/callproc.c:668
#40 _CallProc (pProc=0x2b8a1651bf40 <couchstore_changes_since>, argtuple=0x1f1a2ef0, flags=<value optimized out>, argtypes=0x0, restype=0x1ef9cce0, checker=0x0) at source/callproc.c:991
#41 0x00002b8a16306bf3 in CFuncPtr_call (self=0x1f073e90, inargs=0x1f1a2ef0, kwds=0x0) at source/_ctypes.c:3362
#42 0x00002b8a112057e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0
#43 0x00002b8a11262fbe in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#44 0x00002b8a112666d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0
#45 0x00002b8a11264e2f in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#46 0x00002b8a112666d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0
#47 0x00002b8a1121bb9a in ?? () from /usr/lib64/libpython2.4.so.1.0
#48 0x00002b8a112057e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0
#49 0x00002b8a11263c1c in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#50 0x00002b8a11265256 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#51 0x00002b8a112666d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0
#52 0x00002b8a1121baa7 in ?? () from /usr/lib64/libpython2.4.so.1.0
#53 0x00002b8a112057e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0
#54 0x00002b8a1120b90f in ?? () from /usr/lib64/libpython2.4.so.1.0
#55 0x00002b8a112057e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0
#56 0x00002b8a1126032d in PyEval_CallObjectWithKeywords () from /usr/lib64/libpython2.4.so.1.0
#57 0x00002b8a1128c96d in ?? () from /usr/lib64/libpython2.4.so.1.0
#58 0x00002b8a1150a83d in start_thread () from /lib64/libpthread.so.0
#59 0x00002b8a11e7ffad in clone () from /lib64/libc.so.6

#0 0x00002b7dd51b0285 in raise () from /lib64/libc.so.6
#1 0x00002b7dd51b1d30 in abort () from /lib64/libc.so.6
#2 0x00002b7dd51a9716 in __assert_fail () from /lib64/libc.so.6
#3 0x00002b7dd4606204 in PyObject_Malloc () from /usr/lib64/libpython2.4.so.1.0
#4 0x00002b7dd45decbe in ?? () from /usr/lib64/libpython2.4.so.1.0
#5 0x00002b7dd45def36 in ?? () from /usr/lib64/libpython2.4.so.1.0
#6 0x00002b7dd4614f53 in ?? () from /usr/lib64/libpython2.4.so.1.0
#7 0x00002b7dd45da7e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0
#8 0x00002b7dd4637fbe in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#9 0x00002b7dd463b6d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0
#10 0x00002b7dd45f0b9a in ?? () from /usr/lib64/libpython2.4.so.1.0
#11 0x00002b7dd45da7e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0
#12 0x00002b7dd4638c1c in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#13 0x00002b7dd463a256 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#14 0x00002b7dd463b6d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0
#15 0x00002b7dd45f0aa7 in ?? () from /usr/lib64/libpython2.4.so.1.0
#16 0x00002b7dd45da7e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0
#17 0x00002b7dd45e090f in ?? () from /usr/lib64/libpython2.4.so.1.0
#18 0x00002b7dd45da7e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0
#19 0x00002b7dd463532d in PyEval_CallObjectWithKeywords () from /usr/lib64/libpython2.4.so.1.0
#20 0x00002b7dd466196d in ?? () from /usr/lib64/libpython2.4.so.1.0
#21 0x00002b7dd48df83d in start_thread () from /lib64/libpthread.so.0
#22 0x00002b7dd5254fad in clone () from /lib64/libc.so.6

#0 0x00002b00a0fa2285 in raise () from /lib64/libc.so.6
#1 0x00002b00a0fa3d30 in abort () from /lib64/libc.so.6
#2 0x00002b00a0f9b716 in __assert_fail () from /lib64/libc.so.6
#3 0x00002b00a03f8204 in PyObject_Malloc () from /usr/lib64/libpython2.4.so.1.0
#4 0x00002b00a040139f in PyString_FromString () from /usr/lib64/libpython2.4.so.1.0
#5 0x00002b00a04014d9 in PyString_InternFromString () from /usr/lib64/libpython2.4.so.1.0
#6 0x00002b00a03f58b6 in PyObject_GetAttrString () from /usr/lib64/libpython2.4.so.1.0
#7 0x00002b00a54d1b62 in ConvParam (obj=0x126d5510, index=1, pa=0x2b00ab31ec90) at source/callproc.c:562
#8 0x00002b00a54d1f7a in _CallProc (pProc=0x2b00a56e3300 <couchstore_open_doc_with_docinfo>, argtuple=0x124ec680, flags=4097, argtypes=0x0, restype=0x1231a980, checker=0x0) at source/callproc.c:966
#9 0x00002b00a54ccd0a in CFuncPtr_call (self=<value optimized out>, inargs=<value optimized out>, kwds=0x0) at source/_ctypes.c:3362
#10 0x00002b00a03cc7e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0
#11 0x00002b00a0429fbe in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#12 0x00002b00a042d6d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0
#13 0x00002b00a042be2f in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#14 0x00002b00a042d6d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0
#15 0x00002b00a042be2f in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#16 0x00002b00a042d6d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0
#17 0x00002b00a03e2aa7 in ?? () from /usr/lib64/libpython2.4.so.1.0
#18 0x00002b00a03cc7e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0
#19 0x00002b00a042732d in PyEval_CallObjectWithKeywords () from /usr/lib64/libpython2.4.so.1.0
#20 0x00002b00a54d0ef3 in _CallPythonObject (cif=<value optimized out>, resp=0x2b00ab31fac0, args=<value optimized out>, userdata=<value optimized out>) at source/callbacks.c:206
#21 closure_fcn (cif=<value optimized out>, resp=0x2b00ab31fac0, args=<value optimized out>, userdata=<value optimized out>) at source/callbacks.c:252
#22 0x00002b00a54d6639 in ffi_closure_unix64_inner (closure=0x2b00ab322e10, rvalue=0x2b00ab31fac0, reg_args=0x2b00ab31fa10, argp=0x2b00ab31fae0 "\021\005") at /home/buildbot/centos-x64-201-builder/build/build/ctypes/source/libffi/src/x86/ffi64.c:563
#23 0x00002b00a54d6fd4 in ffi_closure_unix64 () at /home/buildbot/centos-x64-201-builder/build/build/ctypes/source/libffi/src/x86/unix64.S:228
#24 0x00002b00a56e4c3d in lookup_callback (rq=<value optimized out>, k=0x2b00ab31fb80, v=<value optimized out>) at src/couch_db.c:623
#25 0x00002b00a56e2714 in btree_lookup_inner (rq=0x2b00ab31fce0, diskpos=<value optimized out>, current=0, end=1) at src/btree_read.c:78
#26 0x00002b00a56e2608 in btree_lookup_inner (rq=0x2b00ab31fce0, diskpos=<value optimized out>, current=0, end=1) at src/btree_read.c:52
#27 0x00002b00a56e308c in couchstore_changes_since (db=<value optimized out>, since=<value optimized out>, options=<value optimized out>, callback=<value optimized out>, ctx=<value optimized out>) at src/couch_db.c:667
#28 0x00002b00a54d6ea8 in ffi_call_unix64 () at /home/buildbot/centos-x64-201-builder/build/build/ctypes/source/libffi/src/x86/unix64.S:73
#29 0x00002b00a54d6c35 in ffi_call (cif=0x2b00ab320080, fn=0x2b00a56e2f40 <couchstore_changes_since>, rvalue=<value optimized out>, avalue=<value optimized out>) at /home/buildbot/centos-x64-201-builder/build/build/ctypes/source/libffi/src/x86/ffi64.c:428
#30 0x00002b00a54d215d in _call_function_pointer (pProc=0x2b00a56e2f40 <couchstore_changes_since>, argtuple=0x124ba7d0, flags=4097, argtypes=0x0, restype=0x1231a980, checker=0x0) at source/callproc.c:668
#31 _CallProc (pProc=0x2b00a56e2f40 <couchstore_changes_since>, argtuple=0x124ba7d0, flags=4097, argtypes=0x0, restype=0x1231a980, checker=0x0) at source/callproc.c:991
#32 0x00002b00a54ccd0a in CFuncPtr_call (self=<value optimized out>, inargs=<value optimized out>, kwds=0x0) at source/_ctypes.c:3362
#33 0x00002b00a03cc7e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0
#34 0x00002b00a0429fbe in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#35 0x00002b00a042d6d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0
#36 0x00002b00a042be2f in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#37 0x00002b00a042d6d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0
#38 0x00002b00a03e2b9a in ?? () from /usr/lib64/libpython2.4.so.1.0
#39 0x00002b00a03cc7e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0
#40 0x00002b00a042ac1c in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#41 0x00002b00a042c256 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#42 0x00002b00a042d6d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0
#43 0x00002b00a03e2aa7 in ?? () from /usr/lib64/libpython2.4.so.1.0
#44 0x00002b00a03cc7e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0
#45 0x00002b00a03d290f in ?? () from /usr/lib64/libpython2.4.so.1.0
#46 0x00002b00a03cc7e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0
#47 0x00002b00a042732d in PyEval_CallObjectWithKeywords () from /usr/lib64/libpython2.4.so.1.0
#48 0x00002b00a045396d in ?? () from /usr/lib64/libpython2.4.so.1.0
#49 0x00002b00a06d183d in start_thread () from /lib64/libpthread.so.0
#50 0x00002b00a1046fad in clone () from /lib64/libc.so.6

3. Installing Python 2.6 on this machine fixed the problem.
However I'd not blame Python 2.4 - as I mentioned before absolutely identical setup worked pretty well on other machine. 4. ctypes that we ship is another suspect. That was a workaround, not necessarily it addresses all edge cases. And couchstore + Python 2.4 is one of them, in fact all other sources work fine. |
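The second backtrace shows a nested call chain: ctypes calls the C function couchstore_changes_since (frame #27), couchstore invokes a Python callback through a libffi closure (frames #20-#24), and that Python code calls couchstore_open_doc_with_docinfo through ctypes again (frame #8), where the interpreter aborts inside PyObject_Malloc. A minimal sketch of the same C -> Python -> C calling pattern follows, using libc's qsort and abs as stand-ins for couchstore; the stand-ins and function names are assumptions for illustration. It targets a modern Python 3 ctypes on Linux or macOS, does not reproduce the crash, and is not the cbtransfer source.

```python
import ctypes

# dlopen(NULL): symbols already loaded into the process, including libc's
# qsort and abs. Works on Linux and macOS, not on Windows.
libc = ctypes.CDLL(None)

# C prototype of the comparator: int (*)(const int *, const int *)
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_int),
                           ctypes.POINTER(ctypes.c_int))

libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, CMPFUNC]
libc.qsort.restype = None
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int


def by_magnitude(a, b):
    # Python code invoked from C through a libffi closure (like frames
    # #20-#23), which then calls back into C via ctypes (like frame #8).
    return libc.abs(a[0]) - libc.abs(b[0])


def sort_by_magnitude(values):
    arr = (ctypes.c_int * len(values))(*values)
    callback = CMPFUNC(by_magnitude)  # must stay referenced while qsort runs
    libc.qsort(arr, len(arr), ctypes.sizeof(ctypes.c_int), callback)
    return list(arr)


if __name__ == "__main__":
    print(sort_by_magnitude([-5, 1, -4, 2, 3]))  # [1, 2, 3, -4, -5]
```

Keeping a Python reference to the callback object for the whole C call is the standard ctypes rule; if it is garbage-collected while C still holds the function pointer, the process can crash.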
| Comment by Pavel Paulau [ 02/May/13 ] |
|
Deep,
Could you try to reproduce it on another machine? This is a very confusing issue... |
| Comment by Deepkaran Salooja [ 06/May/13 ] |
|
Reproduced on VM 10.3.3.104 with 100k items loaded by mcsoda into the default bucket, build 2.0.2-781-rel. The crash is reproducible.
[root@caper-016 ~]# /opt/couchbase/bin/cbtransfer -v -v -v couchstore-files:///opt/couchbase/var/lib/couchbase/data/ /tmp/backup
2013-05-06 10:41:23,838: mt cbtransfer...
2013-05-06 10:41:23,838: mt source : couchstore-files:///opt/couchbase/var/lib/couchbase/data/
2013-05-06 10:41:23,839: mt sink : /tmp/backup
2013-05-06 10:41:23,839: mt opts : {'username': None, 'source_vbucket_state': 'active', 'destination_vbucket_state': 'active', 'verbose': 3, 'dry_run': False, 'extra': {'max_retry': 10.0, 'rehash': 0.0, 'data_only': 0.0, 'nmv_retry': 1.0, 'cbb_max_mb': 100000.0, 'try_xwm': 1.0, 'batch_max_bytes': 400000.0, 'report_full': 2000.0, 'batch_max_size': 1000.0, 'report': 5.0, 'design_doc_only': 0.0, 'recv_min_bytes': 4096.0}, 'single_node': False, 'bucket_destination': None, 'destination_operation': None, 'vbucket_list': None, 'threads': 4, 'key': None, 'password': None, 'id': None, 'bucket_source': None}
2013-05-06 10:41:23,840: mt source_class: <class 'pump_sfd.SFDSource'>
2013-05-06 10:41:24,094: mt sink_class: <class 'pump_bfd.BFDSink'>
2013-05-06 10:41:24,095: mt source_buckets: default
2013-05-06 10:41:24,095: mt bucket: default
2013-05-06 10:41:24,096: mt source_nodes: N/A
2013-05-06 10:41:24,108: mt enqueueing node: N/A
2013-05-06 10:41:24,108: w0 node: N/A
2013-05-06 10:41:24,217: s0 create_db: /tmp/backup/bucket-default/node-N%2FA/data-0000.cbb
2013-05-06 10:41:24,217: s0 connect_db: /tmp/backup/bucket-default/node-N%2FA/data-0000.cbb
...Segmentation fault |
| Comment by Pavel Paulau [ 08/May/13 ] |
| In fact 10.3.3.104 is a virtual copy of 10.3.3.95. |
| Comment by Pavel Paulau [ 13/May/13 ] |
|
http://review.couchbase.org/#/c/26260/ |
| Comment by Maria McDuff [ 14/May/13 ] |
| pls verify / close. |
| Comment by Pavel Paulau [ 15/May/13 ] |
| Actually, it wasn't resolved. We still ship an old version of ctypes, and without direct access to the builders I can't fix that. |
| Comment by Phil Labee [ 15/May/13 ] |
|
tlm commit af7b66f4c81b890adfcb8b7520a89436f2e4d0cd adds a 'clean-all' target and a 'clean-grommit' target. The 'clean-grommit' target removes directories named:
ctypes*
curl*
google-perftools*
gperftools*
libevent*
sqlite*
Buildbot master.cfg now uses the 'clean-all' target for "make clean". |
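The effect of the 'clean-grommit' target can be sketched in Python: delete every directory whose name matches one of the listed globs, so stale third-party trees such as an old ctypes build cannot survive into the next build. This is a hypothetical illustration, not the Makefile rule from the tlm commit; the function name and the assumption that only immediate subdirectories of a single build root are matched are ours.

```python
import fnmatch
import os
import shutil
import tempfile

# Directory patterns listed for the 'clean-grommit' target.
GROMMIT_PATTERNS = ("ctypes*", "curl*", "google-perftools*",
                    "gperftools*", "libevent*", "sqlite*")


def clean_grommit(root, patterns=GROMMIT_PATTERNS):
    """Delete direct subdirectories of `root` whose names match a pattern.

    Hypothetical equivalent of the Makefile target: which directory it runs
    in and whether it recurses are assumptions. Returns the removed names.
    """
    removed = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        # Case-sensitive shell-style matching; plain files are left alone.
        if os.path.isdir(path) and any(fnmatch.fnmatchcase(name, p) for p in patterns):
            shutil.rmtree(path)
            removed.append(name)
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for d in ("ctypes-1.0.2", "curl-7.21", "memcached", "sqlite-3.7.2"):
            os.mkdir(os.path.join(root, d))
        print(clean_grommit(root))
```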
| Comment by Maria McDuff [ 16/May/13 ] |
|
Phil, is this fixed? Can you confirm? Pavel's last comment states it's not resolved. |
| Comment by Pavel Paulau [ 17/May/13 ] |
| It should be fixed. At least build 805 includes the required version of ctypes. |
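Because the fix depends on which ctypes the tools actually load (the interpreter's own or the copy Couchbase ships), a quick check is to print the module's location and version from the interpreter the tools run under. A minimal sketch with a hypothetical function name; the thread does not state which ctypes version is required, so it only reports and does not compare.

```python
import ctypes
import sys


def ctypes_report():
    """Describe the interpreter and the ctypes module it actually imported."""
    return {
        "python": sys.version.split()[0],
        # __version__ is defined by ctypes itself; fall back if absent.
        "ctypes_version": getattr(ctypes, "__version__", "unknown"),
        # Shows whether a bundled copy shadows the interpreter's own ctypes.
        "ctypes_path": ctypes.__file__,
    }


if __name__ == "__main__":
    for key, value in sorted(ctypes_report().items()):
        print("%s: %s" % (key, value))
```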
[MB-8153] [Doc'd] cbworkloadgen shows error import sqlite3 module Created: 24/Apr/13 Updated: 16/May/13 Resolved: 16/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | documentation, tools |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Critical |
| Reporter: | Thuan Nguyen | Assignee: | Thuan Nguyen |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | centos 5.7 64 bit | ||
| Description |
|
Installed Couchbase Server 2.0.2-773.
cbworkloadgen does not work. Error:
[root@cen-0408 thuan]# /opt/couchbase/bin/cbworkloadgen -h
Error: could not import sqlite3 module
[root@cen-0408 thuan]# cd /opt/couchbase/bin
[root@cen-0408 bin]# ./cbworkloadgen -h
Error: could not import sqlite3 module
[root@cen-0408 bin]# ./cbworkloadgen
Error: could not import sqlite3 module |
| Comments |
| Comment by Pavel Paulau [ 25/Apr/13 ] |
|
Interesting:
# cat /etc/redhat-release
CentOS release 5.8 (Final)
# cat /opt/couchbase/VERSION.txt
2.0.2-774-rel
# /opt/couchbase/bin/cbworkloadgen -h
Usage: cbworkloadgen [options]
Generate workload to destination.
Examples:
  cbworkloadgen -n localhost:8091
  cbworkloadgen -n 10.3.121.192:8091 -r .9 -i 100000 \
    -s 100 -b my-other-bucket --threads=10
Options:
  -h, --help  show this help message and exit
  -r .95, --ratio-sets=.95  set/get operation ratio
  -n 127.0.0.1:8091, --node=127.0.0.1:8091  node's ns_server ip:port
  -b default, --bucket=default  insert data to a different bucket other than default
  -i 10000, --max-items=10000  number of items to be inserted
  -s 10, --size=10  minimum value size
  --prefix=pymc  prefix to use for memcached keys or json ids
  -j, --json  insert json data
  -l, --loop  loop forever until interrupted by users
  -u USERNAME, --username=USERNAME  REST username for cluster or server node
  -p PASSWORD, --password=PASSWORD  REST password for cluster or server node
  -t 1, --threads=1  number of concurrent workers
  -v, --verbose  verbose logging; more -v's provide more verbosity |
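To make the main options concrete, here is a hypothetical sketch of the operation mix they describe, with defaults copied from the help text. It assumes -r is the fraction of operations that are sets, keys are the --prefix followed by an item index, and every value is exactly -s bytes (the help text only says "minimum value size"). None of this is taken from cbworkloadgen's source, and nothing is sent to a server.

```python
import random


def plan_ops(ratio_sets=0.95, max_items=10000, size=10, prefix="pymc", seed=None):
    """Yield (op, key, value) tuples; defaults mirror the help text."""
    rng = random.Random(seed)
    for i in range(max_items):
        key = "%s%d" % (prefix, i)
        # Assumed meaning of -r: probability that an operation is a set.
        if rng.random() < ratio_sets:
            yield ("set", key, b"x" * size)
        else:
            yield ("get", key, None)


if __name__ == "__main__":
    ops = list(plan_ops(ratio_sets=0.9, max_items=1000, seed=42))
    sets = sum(1 for op, _, _ in ops if op == "set")
    print("sets:", sets, "gets:", len(ops) - sets)
```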
| Comment by Pavel Paulau [ 25/Apr/13 ] |
|
Thuan, could you provide output of: # ls -l /opt/couchbase/lib/python/ |
| Comment by Thuan Nguyen [ 26/Apr/13 ] |
|
This VM is available at 10.1.3.140 (log in using the key). The Python version is 2.7. |
| Comment by Bin Cui [ 26/Apr/13 ] |
| Looks like we cannot load sqlite again for this python version. |
| Comment by Bin Cui [ 26/Apr/13 ] |
|
-bash-3.2$ python
Python 2.7 (r27:82500, Jul 29 2012, 09:49:59)
[GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlite3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/python27/lib/python2.7/sqlite3/__init__.py", line 24, in <module>
    from dbapi2 import *
  File "/opt/python27/lib/python2.7/sqlite3/dbapi2.py", line 27, in <module>
    from _sqlite3 import *
ImportError: No module named _sqlite3
-bash-3.2$ cd /opt/couchbase/lib/python
-bash-3.2$ python
Python 2.7 (r27:82500, Jul 29 2012, 09:49:59)
[GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from pysqlite2 import dbapi2 as sqlite3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pysqlite2/dbapi2.py", line 27, in <module>
    from pysqlite2._sqlite import *
ImportError: pysqlite2/_sqlite.so: undefined symbol: PyUnicodeUCS4_DecodeUTF8
>>> |
| Comment by Bin Cui [ 26/Apr/13 ] |
|
Quote: This is usually caused by a mismatch in the Unicode mode of the python interpreter and the extension module. Python can be built to use either 2-byte or 4-byte Unicode code points. If you build an extension on a Python interpreter that uses one, but use it on another, this error is the most common result. |
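The undefined PyUnicodeUCS4_DecodeUTF8 symbol suggests the bundled pysqlite2/_sqlite.so was built against a wide (UCS4) Python 2 while /opt/python27 is probably a narrow (UCS2) build; that is an inference, not something confirmed in this thread. A minimal sketch to check an interpreter, with hypothetical function names; it needs ctypes (standard since Python 2.5), and on Python 3.3+ the narrow/wide distinction no longer exists.

```python
import ctypes
import sys


def unicode_build():
    """Return "UCS2" for a narrow build, "UCS4" for a wide build.

    Meaningful on Python 2.x and 3.0-3.2, where sys.maxunicode is 0xFFFF on
    narrow builds and 0x10FFFF on wide ones. Since Python 3.3 (PEP 393) it
    is always 0x10FFFF.
    """
    return "UCS2" if sys.maxunicode == 0xFFFF else "UCS4"


def interpreter_exports(symbol):
    """True if the running interpreter exports the given C symbol.

    An extension built against a UCS4 Python 2 references names such as
    PyUnicodeUCS4_DecodeUTF8; a UCS2 interpreter only provides the
    PyUnicodeUCS2_* variants, hence the "undefined symbol" error.
    """
    return hasattr(ctypes.pythonapi, symbol)


if __name__ == "__main__":
    print("unicode build:", unicode_build())
    for name in ("PyUnicodeUCS2_DecodeUTF8", "PyUnicodeUCS4_DecodeUTF8"):
        print(name, interpreter_exports(name))
```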
| Comment by Maria McDuff [ 29/Apr/13 ] |
| Per Bin, this is a release blocker... he's working with Pavel on this issue. |
| Comment by Pavel Paulau [ 29/Apr/13 ] |
| Thuan, just a quick question. How did you install Python 2.7 on this machine? Was it installed after Couchbase Server? |
| Comment by Pavel Paulau [ 29/Apr/13 ] |
| http://review.couchbase.org/#/c/25938/ |
| Comment by Thuan Nguyen [ 29/Apr/13 ] |
| On this VM I only installed Couchbase Server, so I think Python 2.7 was pre-installed. |
| Comment by Maria McDuff [ 06/May/13 ] |
| pls verify / close. |
| Comment by Thuan Nguyen [ 09/May/13 ] |
|
I can still reproduce this bug in build 2.0.2-793 on CentOS 5.7 64-bit server 10.1.3.140 (same VM).
couchbase-server-enterprise_x86_64_2.0.2-793-rel.rpm
[root@cen-0408 thuan]# rpm -i couchbase-server-enterprise_x86_64_2.0.2-793-rel.rpm
Minimum RAM required : 4 GB
System RAM configured : 3945388 kB
Minimum number of processors required : 4 cores
Number of processors on the system : 4 cores
Starting couchbase-server[ OK ]
You have successfully installed Couchbase Server.
Please browse to http://cen-0408:8091/ to configure your server.
Please refer to http://couchbase.com for additional resources.
Please note that you have to update your firewall configuration to allow connections to the following ports: 11211, 11210, 11209, 4369, 8091, 8092 and from 21100 to 21299.
By using this software you agree to the End User License Agreement.
See /opt/couchbase/LICENSE.txt.
[root@cen-0408 thuan]# /opt/couchbase/bin/cbworkloadgen -h
Error: could not import sqlite3 module
[root@cen-0408 thuan]# /opt/couchbase/bin/sqlite3
SQLite version 3.7.2
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite>
[1]+ Stopped /opt/couchbase/bin/sqlite3
[root@cen-0408 thuan]# cd /opt/couchbase/bin/
[root@cen-0408 bin]# ./cbworkloadgen -h
Error: could not import sqlite3 module
[root@cen-0408 bin]# ./sqlite3
SQLite version 3.7.2
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite>
[2]+ Stopped ./sqlite3
[root@cen-0408 bin]# python -V
Python 2.7 |
| Comment by Pavel Paulau [ 10/May/13 ] |
|
Normally Python 2.7 includes sqlite3; however, you have a custom installation that was compiled without SQLite support, so it fails.
Expected:
$ python2.7 -c "import sqlite3"
$ echo $?
0
Your machine:
$ python2.7 -c "import sqlite3"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/python27/lib/python2.7/sqlite3/__init__.py", line 24, in <module>
    from dbapi2 import *
  File "/opt/python27/lib/python2.7/sqlite3/dbapi2.py", line 27, in <module>
    from _sqlite3 import *
ImportError: No module named _sqlite3
The normal recommendation in such cases is to install sqlite-devel and rebuild Python, or to use the default OS setup. Addressing such edge cases is too expensive, IMHO. This is my input; PMs may have other suggestions. |
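The exit-status check above can also be done from Python in a way that separates the failure seen here (the pure-Python sqlite3 package is present but its compiled _sqlite3 extension is not) from a missing package. A minimal sketch with a hypothetical function name; it uses Python 3's importlib.util, so it is meant for modern interpreters rather than the 2.x ones discussed in this thread.

```python
import importlib.util


def diagnose_sqlite3():
    """Return a short description of the interpreter's sqlite3 support."""
    # sqlite3 is a pure-Python package layered over the C extension _sqlite3.
    if importlib.util.find_spec("sqlite3") is None:
        return "sqlite3 package missing"
    # _sqlite3 is only built if the SQLite headers (sqlite-devel on
    # CentOS/RHEL) were installed when the interpreter was compiled.
    if importlib.util.find_spec("_sqlite3") is None:
        return "_sqlite3 extension missing: rebuild Python with SQLite headers"
    import sqlite3
    return "ok, linked against SQLite " + sqlite3.sqlite_version


if __name__ == "__main__":
    print(diagnose_sqlite3())
```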
| Comment by Bin Cui [ 10/May/13 ] |
|
The current assumption is that we support the following Python environments:
1. Python 2.4, which doesn't have sqlite3 bundled. We install our bundled version.
2. Python 2.5 and above, where Python has its own version of sqlite3 installed. If it doesn't meet our sqlite3 version requirement, we install our bundled version.
3. This QA setup is something we have never encountered before, and it is not a standard environment, to say the least. |
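A minimal sketch of the selection policy described in this comment, assuming the bundled module is importable as pysqlite2.dbapi2 (the name seen in the traceback earlier in this issue). MIN_SQLITE is a hypothetical placeholder because the thread does not state the actual version requirement, and this is not the code cbworkloadgen ships.

```python
import sys

# Hypothetical minimum SQLite version; the real requirement is not stated.
MIN_SQLITE = (3, 3, 0)


def load_sqlite_module():
    """Return a DB-API 2.0 sqlite module following the stated policy."""
    # Python 2.5+ ships sqlite3; prefer it if it links a new enough SQLite.
    if sys.version_info >= (2, 5):
        try:
            import sqlite3
            if sqlite3.sqlite_version_info >= MIN_SQLITE:
                return sqlite3
        except ImportError:
            # e.g. "No module named _sqlite3" on an interpreter built
            # without the SQLite headers
            pass
    # Python 2.4, or an unusable system module: fall back to the bundled copy.
    try:
        from pysqlite2 import dbapi2
    except ImportError:
        raise ImportError("could not import sqlite3 module")
    return dbapi2
```

Falling back only when the system module is missing or too old keeps the interpreter's own build in use wherever possible; the failure on the QA VM corresponds to both branches failing.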
| Comment by Anil Kumar [ 10/May/13 ] |
|
Tony to test this on a clean VM to verify whether it reproduces:
1. Install Python 2.7 on a clean VM.
2. Check whether it already comes with sqlite3. |
| Comment by Pavel Paulau [ 11/May/13 ] |
| 0. Package "sqlite-devel" must be installed *before* Python 2.7 installation. |
| Comment by Maria McDuff [ 13/May/13 ] |
|
Karen, pls doc that sqlite-devel needs to be installed before Python 2.7. Thanks. Tony, pls verify / close. Thanks. |
| Comment by Karen Zeller [ 16/May/13 ] |
|
Added to RN 2.0.2: <rnentry> <version ver="2.0.0m"/> <class id="fix"/> <issue type="cb" ref=" <rntext> <para> In the past when you used <command>cbworkloadgen</command> you see this error <literal>ImportError: No module named _sqlite3</literal>. This has been fixed.</para> </rntext> </rnentry> |
| Comment by Thuan Nguyen [ 16/May/13 ] |
|
Tested on build 2.0.2-804 with 3 VMs running Python 2.4.3, 2.6.5 and 2.7.1; cbworkloadgen works as expected:
/opt/couchbase/bin/cbworkloadgen -h
Usage: cbworkloadgen [options]
Generate workload to destination.
Examples:
  cbworkloadgen -n localhost:8091
  cbworkloadgen -n 10.3.121.192:8091 -r .9 -i 100000 \
    -s 100 -b my-other-bucket --threads=10
Options:
  -h, --help  show this help message and exit
  -r .95, --ratio-sets=.95  set/get operation ratio
  -n 127.0.0.1:8091, --node=127.0.0.1:8091  node's ns_server ip:port
  -b default, --bucket=default  insert data to a different bucket other than default
  -i 10000, --max-items=10000  number of items to be inserted
  -s 10, --size=10  minimum value size
  --prefix=pymc  prefix to use for memcached keys or json ids
  -j, --json  insert json data
  -l, --loop  loop forever until interrupted by users
  -u USERNAME, --username=USERNAME  REST username for cluster or server node
  -p PASSWORD, --password=PASSWORD  REST password for cluster or server node
  -t 1, --threads=1  number of concurrent workers
  -v, --verbose  verbose logging; more -v's provide more verbosity |
Couchbase logo needs to be updated on UI, desktop and program-settings icon
(MB-7804)
|
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | installer |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Technical task | Priority: | Blocker |
| Reporter: | Anil Kumar | Assignee: | Bin Cui |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Comments |
| Comment by Anil Kumar [ 10/Apr/13 ] |
| Can you take a look at the spec and logo assets and let me know if you have what you need to make the changes? |
| Comment by Steve Yen [ 16/Apr/13 ] |
|
Anil, assigning to you to get the right assets from the visual design folks. Bin has sent you an email about what he needs. Once you have the new assets, please attach them here and reassign this JIRA issue back to Bin. Thanks, Steve |
| Comment by Bin Cui [ 18/Apr/13 ] |
| http://review.membase.org/#/c/25770/ |
| Comment by Maria McDuff [ 23/Apr/13 ] |
| pls verify / close. |
| Comment by Shashank Gupta [ 24/Apr/13 ] |
|
Verified. Build : 2.0.2-772-rel |
| Comment by Anil Kumar [ 02/May/13 ] |
| As discussed, the new images are attached; please fix. |
| Comment by Bin Cui [ 02/May/13 ] |
| Updated the revised bmp and icn files. |
| Comment by Maria McDuff [ 06/May/13 ] |
| pls verify / close. |
| Comment by Shashank Gupta [ 07/May/13 ] |
| I couldn't find the banner image during setup in the latest build. Attaching screenshots of the old (with banner image) and new (no banner) setup processes. |
| Comment by Bin Cui [ 07/May/13 ] |
| I found the problem and fixed it at http://review.couchbase.org/#/c/26141/. It should be included in the next build. |
| Comment by Shashank Gupta [ 07/May/13 ] |
| Ok. Will verify then. |
| Comment by Shashank Gupta [ 09/May/13 ] |
| Verified with build 2.0.2-787 |
| Comment by Thuan Nguyen [ 16/May/13 ] |
|
The icon in
/cygdrive/c/Program Files/Couchbase/Server/share/couchdb/www/favicon.ico needs to be updated to the new logo. |
| Comment by Bin Cui [ 16/May/13 ] |
| http://review.couchbase.org/#/c/26369/ |
[MB-8019] [2.0.2 RN + doc?] healthchecker - refresh for the new stats available in 2.0.0 Created: 10/Dec/12 Updated: 16/May/13 Resolved: 16/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | documentation, tools |
| Affects Version/s: | 2.0.1 |
| Fix Version/s: | 2.0.2 |
| Type: | Improvement | Priority: | Major |
| Reporter: | Steve Yen | Assignee: | Bin Cui |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | 2.0.2-release-notes, PM-PRIORITIZED | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Comments |
| Comment by Bin Cui [ 14/Dec/12 ] |
|
Here are the new stats related to 2.0:
Doc stats:
1. data size
2. disk size
3. actual disk size
View performance:
1. data size
2. disk size
3. view ops
CompactionPerformance - fine-grained analysis with thresholds:
1. view fragmentation
2. doc fragmentation
IncomingXDCRPerformance - get/set ops ratio with thresholds
OutgoingXDCRPerformance:
1. ops
2. replication queue length |
| Comment by Dipti Borkar [ 28/Feb/13 ] |
|
Anil has set up weekly meetings, will put together requirements, and will work with Bin.
This is getting more important as we have more customers, and support wants more visibility into the cluster. |
| Comment by Anil Kumar [ 10/Apr/13 ] |
| Bin: any update on this bug? Will it be fixed before code freeze on Friday? |
| Comment by Anil Kumar [ 11/Apr/13 ] |
|
Bin to update the bug with details on which stats were fixed.
|
| Comment by Bin Cui [ 23/Apr/13 ] |
| All of the above stats have already been added to healthchecker. |
| Comment by Maria McDuff [ 23/Apr/13 ] |
| pls verify / close. |
| Comment by Chisheng Hong [ 16/May/13 ] |
|
Doc stats + view performance:
{ "description": "View data size", "formula": "N/A", "status": "OK", "value": "882.565 MB" },
{ "description": "View total disk size", "formula": "N/A", "status": "OK", "value": "2.303 GB" },
{ "description": "Doc data size", "formula": "N/A", "status": "OK", "value": "16.711 GB" },
{ "description": "Docs total disk size", "formula": "N/A", "status": "OK", "value": "19.995 GB" },
{ "description": "Docs actual disk size", "formula": "N/A", "status": "OK", "value": "17.692 GB" } |
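Each healthchecker entry has the same four fields: description, formula, status, value. A minimal sketch of how a threshold-based check, such as the fragmentation analysis listed under CompactionPerformance, could emit an entry in that shape; the function name, the 30/50 percent thresholds, and the "Warning"/"Error" status labels are hypothetical, not healthchecker's.

```python
import json


def fragmentation_entry(description, fragmentation_pct, warn_at=30.0, error_at=50.0):
    """Build a healthchecker-style report entry for a fragmentation stat.

    warn_at/error_at and the non-"OK" status names are hypothetical.
    """
    if fragmentation_pct >= error_at:
        status = "Error"
    elif fragmentation_pct >= warn_at:
        status = "Warning"
    else:
        status = "OK"
    return {
        "description": description,
        "formula": "N/A",
        "status": status,
        "value": "%.1f%%" % fragmentation_pct,
    }


if __name__ == "__main__":
    print(json.dumps(fragmentation_entry("Doc fragmentation", 12.0)))
```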
| Comment by Chisheng Hong [ 16/May/13 ] |
| CompactionPerformance stats are not available in build 2.0.2-804-rel. |
| Comment by Bin Cui [ 16/May/13 ] |
|
http://review.couchbase.org/#/c/26367/ http://review.couchbase.org/#/c/26368/ |
[MB-8269] rebalance hang but log page shows it is completed Created: 13/May/13 Updated: 16/May/13 Resolved: 14/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Thuan Nguyen | Assignee: | Anil Kumar |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | regression | ||
| Remaining Estimate: | Not Specified | ||