[MB-7849] cbtransfer crashes with Python 2.4 and couchstore-files as source Created: 01/Mar/13 Updated: 19/May/13 Resolved: 17/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | build, tools |
| Affects Version/s: | 2.0.1 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Critical |
| Reporter: | Deepkaran Salooja | Assignee: | Deepkaran Salooja |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | build 2.0.1-170-rel | ||
| Description |
|
Steps to reproduce (build 170, VM to reproduce - 10.3.3.95):
1. Create default bucket and load 100k items using mcsoda.
2. Use the below command to transfer data to file:
/opt/couchbase/bin/cbtransfer -v -v -v couchstore-files:///opt/couchbase/var/lib/couchbase/data/ /tmp/backup
(there is sufficient disk space in /tmp)
Getting the below error:
[root@caper-012 ~]# /opt/couchbase/bin/cbtransfer -v -v -v couchstore-files:///opt/couchbase/var/lib/couchbase/data/ /tmp/backup
2013-03-01 03:19:31,359: mt cbtransfer...
2013-03-01 03:19:31,360: mt source : couchstore-files:///opt/couchbase/var/lib/couchbase/data/
2013-03-01 03:19:31,360: mt sink : /tmp/backup
2013-03-01 03:19:31,360: mt opts : {'username': None, 'source_vbucket_state': 'active', 'destination_vbucket_state': 'active', 'verbose': 3, 'dry_run': False, 'extra': {'max_retry': 10.0, 'rehash': 0.0, 'nmv_retry': 1.0, 'cbb_max_mb': 100000.0, 'try_xwm': 1.0, 'batch_max_bytes': 400000.0, 'report_full': 2000.0, 'batch_max_size': 1000.0, 'report': 5.0, 'recv_min_bytes': 4096.0}, 'single_node': False, 'bucket_destination': None, 'destination_operation': None, 'threads': 4, 'key': None, 'password': None, 'id': None, 'bucket_source': None}
2013-03-01 03:19:31,361: mt source_class: <class 'pump_sfd.SFDSource'>
2013-03-01 03:19:31,395: mt sink_class: <class 'pump_bfd.BFDSink'>
2013-03-01 03:19:31,395: mt source_buckets: default
2013-03-01 03:19:31,395: mt bucket: default
2013-03-01 03:19:31,396: mt source_nodes: N/A
2013-03-01 03:19:31,411: mt enqueueing node: N/A
2013-03-01 03:19:31,411: w0 node: N/A
2013-03-01 03:19:31,509: s0 create_db: /tmp/backup/bucket-default/node-N%2FA/data-0000.cbb
2013-03-01 03:19:31,509: s0 connect_db: /tmp/backup/bucket-default/node-N%2FA/data-0000.cbb
....................python: Objects/obmalloc.c:765: PyObject_Malloc: Assertion `bp != ((void *)0)' failed.
Aborted
With fewer items, e.g. 10k, this works fine. |
| Comments |
| Comment by Deepkaran Salooja [ 01/Mar/13 ] |
|
With build 1976(2.0), I am hitting the original issue filed in |
| Comment by Bin Cui [ 01/Mar/13 ] |
|
You can modify the cbb_max_mb parameter to limit the batch size. The default is 100000 MB; a smaller value may avoid the crash. It looks like the Python SDK fails to allocate memory for the sqlite db operation.
/opt/couchbase/bin/cbtransfer -v -v -v -x cbb_max_mb=1000 couchstore-files:///opt/couchbase/var/lib/couchbase/data/ /tmp/backup |
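For scripted runs, the suggested workaround could be wrapped in a small helper. The following is a minimal sketch that only assembles the argument list seen in this ticket (`-v`, `-x cbb_max_mb=N`, source, sink). The function name `build_cbtransfer_args` and its parameters are hypothetical, and nothing here actually runs cbtransfer.

```python
def build_cbtransfer_args(source, sink, cbb_max_mb=None, verbosity=3,
                          binary="/opt/couchbase/bin/cbtransfer"):
    """Assemble a cbtransfer argument list (hypothetical helper).

    Only flags that appear in this ticket are used: repeated -v for
    verbosity and -x cbb_max_mb=N as the suggested workaround.
    """
    args = [binary] + ["-v"] * verbosity
    if cbb_max_mb is not None:
        # Extra parameters are passed as "-x name=value".
        args += ["-x", "cbb_max_mb=%d" % cbb_max_mb]
    args += [source, sink]
    return args


if __name__ == "__main__":
    cmd = build_cbtransfer_args(
        "couchstore-files:///opt/couchbase/var/lib/couchbase/data/",
        "/tmp/backup",
        cbb_max_mb=1000,
    )
    # Print the command line instead of executing it.
    print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.call` on a machine where the tool is installed; keeping it as a list avoids shell quoting issues with the source URL.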
| Comment by Bin Cui [ 01/Mar/13 ] |
|
[root@caper-012 ~]# /opt/couchbase/bin/cbtransfer -v -v -v couchstore-files:///opt/couchbase/var/lib/couchbase/data/ /tmp/backup -x cbb_max_mb=100
2013-03-01 10:55:35,071: mt cbtransfer...
2013-03-01 10:55:35,072: mt source : couchstore-files:///opt/couchbase/var/lib/couchbase/data/
2013-03-01 10:55:35,072: mt sink : /tmp/backup
2013-03-01 10:55:35,072: mt opts : {'username': None, 'source_vbucket_state': 'active', 'destination_vbucket_state': 'active', 'verbose': 3, 'dry_run': False, 'extra': {'max_retry': 10.0, 'rehash': 0.0, 'nmv_retry': 1.0, 'cbb_max_mb': 100.0, 'try_xwm': 1.0, 'batch_max_bytes': 400000.0, 'report_full': 2000.0, 'batch_max_size': 1000.0, 'report': 5.0, 'recv_min_bytes': 4096.0}, 'single_node': False, 'bucket_destination': None, 'destination_operation': None, 'threads': 4, 'key': None, 'password': None, 'id': None, 'bucket_source': None}
2013-03-01 10:55:35,073: mt source_class: <class 'pump_sfd.SFDSource'>
2013-03-01 10:55:35,107: mt sink_class: <class 'pump_bfd.BFDSink'>
2013-03-01 10:55:35,108: mt source_buckets: default
2013-03-01 10:55:35,108: mt bucket: default
2013-03-01 10:55:35,108: mt source_nodes: N/A
2013-03-01 10:55:35,124: mt enqueueing node: N/A
2013-03-01 10:55:35,125: w0 node: N/A
2013-03-01 10:55:35,226: s0 create_db: /tmp/backup/bucket-default/node-N%2FA/data-0000.cbb
2013-03-01 10:55:35,227: s0 connect_db: /tmp/backup/bucket-default/node-N%2FA/data-0000.cbb
...............Traceback (most recent call last):
  File "source/callbacks.c", line 206, in 'calling callback function'
  File "/opt/couchbase/lib/python/couchstore.py", line 358, in callback
    fn(DocumentInfo._fromStruct(docInfoPtr.contents, self))
  File "/opt/couchbase/lib/python/pump_sfd.py", line 211, in change_callback
    cas, exp, flg = struct.unpack(SFD_REV_META, doc_info.revMeta)
TypeError: unpack() argument 2 must be string or read-only buffer, not CArgObject
Segmentation fault |
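The TypeError in the traceback above says that `struct.unpack` was handed a ctypes object instead of a plain string. As an illustration only, the sketch below copies the bytes out of a ctypes buffer before unpacking. It is written for a current Python 3, not Python 2.4, and it is not the fix that was applied for this ticket. The format string assigned to `SFD_REV_META` (big-endian 8-byte CAS, 4-byte expiry, 4-byte flags) is an assumption, and `unpack_rev_meta` is a hypothetical helper.

```python
import ctypes
import struct

# Assumed layout for illustration: CAS (uint64), expiry (uint32), flags (uint32).
SFD_REV_META = ">QII"


def unpack_rev_meta(c_buf, size):
    """Copy `size` bytes out of a ctypes buffer, then unpack them.

    ctypes.string_at() returns an ordinary bytes object, so struct.unpack
    never sees a ctypes wrapper type.
    """
    raw = ctypes.string_at(c_buf, size)
    if len(raw) != struct.calcsize(SFD_REV_META):
        raise ValueError("unexpected revMeta length: %d" % len(raw))
    return struct.unpack(SFD_REV_META, raw)


if __name__ == "__main__":
    # Build a fake revMeta blob and place it in C-owned memory.
    packed = struct.pack(SFD_REV_META, 12345, 0, 7)
    c_buf = (ctypes.c_char * len(packed)).from_buffer_copy(packed)
    cas, exp, flg = unpack_rev_meta(c_buf, len(packed))
    print(cas, exp, flg)
```

Checking the length before unpacking turns a corrupted or truncated buffer into a Python exception rather than a silent misread.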
| Comment by Jin Lim [ 01/Mar/13 ] |
| Bin, this isn't a regression from 2.0.1, but can you please advise on how likely (how often) users may run into this? We need to figure out whether to push this to 2.0.2 or not based on your input. Thanks! |
| Comment by Jin Lim [ 01/Mar/13 ] |
|
From Bin: It’s a use case that we never tested before. The good thing is the bug doesn’t sit in the critical path for backup/restore. So we can defer the fix to 2.0.2. Moving this to 2.0.2. |
| Comment by Maria McDuff [ 25/Mar/13 ] |
| bug scrub: would be good to fix in 2.0.2 release. |
| Comment by Steve Yen [ 16/Apr/13 ] |
| Following up on Jin's comment, this is an edge case of usage; but from Bin's analysis, it exposes a bug that regular backup might also trigger. So leaving in 2.0.2 as Bin would like to address this for 2.0.2. |
| Comment by Maria McDuff [ 29/Apr/13 ] |
| Per bug committee, this is critical for the 2.0.2 release. Must fix, since the backup problem can affect customers. |
| Comment by Pavel Paulau [ 02/May/13 ] |
|
1. 100K isn't a magic number. The issue is occasional and large number of items increases probability. 2. I analyzed several core dumps, cbtransfer usually crashes because of segfault or because of: Objects/obmalloc.c:765: PyObject_Malloc: Assertion `bp != ((void *)0)' failed. Objects/obmalloc.c:953: PyObject_Free: Assertion `pool->ref.count > 0' failed File "source/callbacks.c", line 206, in 'calling callback function' File "/opt/couchbase/lib/python/couchstore.py", line 358, in callback fn(DocumentInfo._fromStruct(docInfoPtr.contents, self)) File "/opt/couchbase/lib/python/couchstore.py", line 128, in _fromStruct self = DocumentInfo(str(info.id)) TypeError: __str__ returned non-string (type buffer) backtraces in turn vary as well: #0 0x00002b8a11230e23 in PyObject_Malloc () from /usr/lib64/libpython2.4.so.1.0 #1 0x00002b8a112301cd in _PyObject_New () from /usr/lib64/libpython2.4.so.1.0 #2 0x00002b8a1630b950 in new_CArgObject () at source/callproc.c:288 #3 0x00002b8a16305a39 in PointerType_paramfunc (self=0x38) at source/_ctypes.c:564 #4 0x00002b8a1630b3de in ConvParam (obj=0x38, index=1, pa=0x2b8a1c157620) at source/callproc.c:477 #5 0x00002b8a1630bafd in _CallProc (pProc=0x2b8a163075b0 <string_at>, argtuple=0x1f152b48, flags=4097, argtypes=0x1ef82cb0, restype=0x1ef16790, checker=0x0) at source/callproc.c:959 #6 0x00002b8a16306bf3 in CFuncPtr_call (self=0x1eebf710, inargs=0x2, kwds=0x0) at source/_ctypes.c:3362 #7 0x00002b8a112057e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0 #8 0x00002b8a11262fbe in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0 #9 0x00002b8a112666d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0 #10 0x00002b8a11264e2f in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0 #11 0x00002b8a112666d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0 #12 0x00002b8a1121baa7 in ?? 
() from /usr/lib64/libpython2.4.so.1.0 #13 0x00002b8a112057e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0 #14 0x00002b8a1120b90f in ?? () from /usr/lib64/libpython2.4.so.1.0 #15 0x00002b8a112057e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0 #16 0x00002b8a1126032d in PyEval_CallObjectWithKeywords () from /usr/lib64/libpython2.4.so.1.0 #17 0x00002b8a112445e8 in ?? () from /usr/lib64/libpython2.4.so.1.0 #18 0x00002b8a1122fe45 in PyObject_Str () from /usr/lib64/libpython2.4.so.1.0 #19 0x00002b8a1123a617 in ?? () from /usr/lib64/libpython2.4.so.1.0 #20 0x00002b8a1123ff53 in ?? () from /usr/lib64/libpython2.4.so.1.0 #21 0x00002b8a112057e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0 #22 0x00002b8a11262fbe in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0 #23 0x00002b8a112666d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0 #24 0x00002b8a11264e2f in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0 #25 0x00002b8a112666d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0 #26 0x00002b8a1121baa7 in ?? 
() from /usr/lib64/libpython2.4.so.1.0 #27 0x00002b8a112057e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0 #28 0x00002b8a1126032d in PyEval_CallObjectWithKeywords () from /usr/lib64/libpython2.4.so.1.0 #29 0x00002b8a1630aacc in _CallPythonObject (cif=<value optimized out>, resp=0x2b8a1c158af0, args=0x2b8a1c158960, userdata=<value optimized out>) at source/callbacks.c:206 #30 closure_fcn (cif=<value optimized out>, resp=0x2b8a1c158af0, args=0x2b8a1c158960, userdata=<value optimized out>) at source/callbacks.c:252 #31 0x00002b8a1631054b in ffi_closure_unix64_inner (closure=0x2b8a1c15bab0, rvalue=0x2b8a1c158af0, reg_args=0x2b8a1c158a40, argp=0x2b8a1c158b10 "j\003") at /tmp/ctypes/source/libffi/src/x86/ffi64.c:563 #32 0x00002b8a16310800 in ffi_closure_unix64 () at /tmp/ctypes/source/libffi/src/x86/unix64.S:228 #33 0x00002b8a1651dc3d in lookup_callback (rq=<value optimized out>, k=0x2b8a1c158bb0, v=<value optimized out>) at src/couch_db.c:623 #34 0x00002b8a1651b714 in btree_lookup_inner (rq=0x2b8a1c158d10, diskpos=<value optimized out>, current=0, end=1) at src/btree_read.c:78 #35 0x00002b8a1651b608 in btree_lookup_inner (rq=0x2b8a1c158d10, diskpos=<value optimized out>, current=0, end=1) at src/btree_read.c:52 #36 0x00002b8a1651c08c in couchstore_changes_since (db=<value optimized out>, since=<value optimized out>, options=<value optimized out>, callback=<value optimized out>, ctx=<value optimized out>) at src/couch_db.c:667 #37 0x00002b8a163106d4 in ffi_call_unix64 () at /tmp/ctypes/source/libffi/src/x86/unix64.S:73 #38 0x00002b8a16310244 in ffi_call (cif=0x2b8a1c159090, fn=0x2b8a1651bf40 <couchstore_changes_since>, rvalue=0x2b8a1c158f90, avalue=0x2b8a1c158f50) at /tmp/ctypes/source/libffi/src/x86/ffi64.c:428 #39 0x00002b8a1630bce1 in _call_function_pointer (pProc=0x2b8a1651bf40 <couchstore_changes_since>, argtuple=0x1f1a2ef0, flags=<value optimized out>, argtypes=0x0, restype=0x1ef9cce0, checker=0x0) at source/callproc.c:668 #40 _CallProc 
(pProc=0x2b8a1651bf40 <couchstore_changes_since>, argtuple=0x1f1a2ef0, flags=<value optimized out>, argtypes=0x0, restype=0x1ef9cce0, checker=0x0) at source/callproc.c:991 #41 0x00002b8a16306bf3 in CFuncPtr_call (self=0x1f073e90, inargs=0x1f1a2ef0, kwds=0x0) at source/_ctypes.c:3362 #42 0x00002b8a112057e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0 #43 0x00002b8a11262fbe in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0 #44 0x00002b8a112666d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0 #45 0x00002b8a11264e2f in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0 #46 0x00002b8a112666d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0 #47 0x00002b8a1121bb9a in ?? () from /usr/lib64/libpython2.4.so.1.0 #48 0x00002b8a112057e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0 #49 0x00002b8a11263c1c in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0 #50 0x00002b8a11265256 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0 #51 0x00002b8a112666d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0 #52 0x00002b8a1121baa7 in ?? () from /usr/lib64/libpython2.4.so.1.0 #53 0x00002b8a112057e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0 #54 0x00002b8a1120b90f in ?? () from /usr/lib64/libpython2.4.so.1.0 #55 0x00002b8a112057e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0 #56 0x00002b8a1126032d in PyEval_CallObjectWithKeywords () from /usr/lib64/libpython2.4.so.1.0 #57 0x00002b8a1128c96d in ?? () from /usr/lib64/libpython2.4.so.1.0 #58 0x00002b8a1150a83d in start_thread () from /lib64/libpthread.so.0 #59 0x00002b8a11e7ffad in clone () from /lib64/libc.so.6 #0 0x00002b7dd51b0285 in raise () from /lib64/libc.so.6 #1 0x00002b7dd51b1d30 in abort () from /lib64/libc.so.6 #2 0x00002b7dd51a9716 in __assert_fail () from /lib64/libc.so.6 #3 0x00002b7dd4606204 in PyObject_Malloc () from /usr/lib64/libpython2.4.so.1.0 #4 0x00002b7dd45decbe in ?? 
() from /usr/lib64/libpython2.4.so.1.0 #5 0x00002b7dd45def36 in ?? () from /usr/lib64/libpython2.4.so.1.0 #6 0x00002b7dd4614f53 in ?? () from /usr/lib64/libpython2.4.so.1.0 #7 0x00002b7dd45da7e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0 #8 0x00002b7dd4637fbe in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0 #9 0x00002b7dd463b6d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0 #10 0x00002b7dd45f0b9a in ?? () from /usr/lib64/libpython2.4.so.1.0 #11 0x00002b7dd45da7e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0 #12 0x00002b7dd4638c1c in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0 #13 0x00002b7dd463a256 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0 #14 0x00002b7dd463b6d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0 #15 0x00002b7dd45f0aa7 in ?? () from /usr/lib64/libpython2.4.so.1.0 #16 0x00002b7dd45da7e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0 #17 0x00002b7dd45e090f in ?? () from /usr/lib64/libpython2.4.so.1.0 #18 0x00002b7dd45da7e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0 #19 0x00002b7dd463532d in PyEval_CallObjectWithKeywords () from /usr/lib64/libpython2.4.so.1.0 #20 0x00002b7dd466196d in ?? 
() from /usr/lib64/libpython2.4.so.1.0 #21 0x00002b7dd48df83d in start_thread () from /lib64/libpthread.so.0 #22 0x00002b7dd5254fad in clone () from /lib64/libc.so.6 #0 0x00002b00a0fa2285 in raise () from /lib64/libc.so.6 #1 0x00002b00a0fa3d30 in abort () from /lib64/libc.so.6 #2 0x00002b00a0f9b716 in __assert_fail () from /lib64/libc.so.6 #3 0x00002b00a03f8204 in PyObject_Malloc () from /usr/lib64/libpython2.4.so.1.0 #4 0x00002b00a040139f in PyString_FromString () from /usr/lib64/libpython2.4.so.1.0 #5 0x00002b00a04014d9 in PyString_InternFromString () from /usr/lib64/libpython2.4.so.1.0 #6 0x00002b00a03f58b6 in PyObject_GetAttrString () from /usr/lib64/libpython2.4.so.1.0 #7 0x00002b00a54d1b62 in ConvParam (obj=0x126d5510, index=1, pa=0x2b00ab31ec90) at source/callproc.c:562 #8 0x00002b00a54d1f7a in _CallProc (pProc=0x2b00a56e3300 <couchstore_open_doc_with_docinfo>, argtuple=0x124ec680, flags=4097, argtypes=0x0, restype=0x1231a980, checker=0x0) at source/callproc.c:966 #9 0x00002b00a54ccd0a in CFuncPtr_call (self=<value optimized out>, inargs=<value optimized out>, kwds=0x0) at source/_ctypes.c:3362 #10 0x00002b00a03cc7e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0 #11 0x00002b00a0429fbe in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0 #12 0x00002b00a042d6d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0 #13 0x00002b00a042be2f in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0 #14 0x00002b00a042d6d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0 #15 0x00002b00a042be2f in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0 #16 0x00002b00a042d6d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0 #17 0x00002b00a03e2aa7 in ?? 
() from /usr/lib64/libpython2.4.so.1.0 #18 0x00002b00a03cc7e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0 #19 0x00002b00a042732d in PyEval_CallObjectWithKeywords () from /usr/lib64/libpython2.4.so.1.0 #20 0x00002b00a54d0ef3 in _CallPythonObject (cif=<value optimized out>, resp=0x2b00ab31fac0, args=<value optimized out>, userdata=<value optimized out>) at source/callbacks.c:206 #21 closure_fcn (cif=<value optimized out>, resp=0x2b00ab31fac0, args=<value optimized out>, userdata=<value optimized out>) at source/callbacks.c:252 #22 0x00002b00a54d6639 in ffi_closure_unix64_inner (closure=0x2b00ab322e10, rvalue=0x2b00ab31fac0, reg_args=0x2b00ab31fa10, argp=0x2b00ab31fae0 "\021\005") at /home/buildbot/centos-x64-201-builder/build/build/ctypes/source/libffi/src/x86/ffi64.c:563 #23 0x00002b00a54d6fd4 in ffi_closure_unix64 () at /home/buildbot/centos-x64-201-builder/build/build/ctypes/source/libffi/src/x86/unix64.S:228 #24 0x00002b00a56e4c3d in lookup_callback (rq=<value optimized out>, k=0x2b00ab31fb80, v=<value optimized out>) at src/couch_db.c:623 #25 0x00002b00a56e2714 in btree_lookup_inner (rq=0x2b00ab31fce0, diskpos=<value optimized out>, current=0, end=1) at src/btree_read.c:78 #26 0x00002b00a56e2608 in btree_lookup_inner (rq=0x2b00ab31fce0, diskpos=<value optimized out>, current=0, end=1) at src/btree_read.c:52 #27 0x00002b00a56e308c in couchstore_changes_since (db=<value optimized out>, since=<value optimized out>, options=<value optimized out>, callback=<value optimized out>, ctx=<value optimized out>) at src/couch_db.c:667 #28 0x00002b00a54d6ea8 in ffi_call_unix64 () at /home/buildbot/centos-x64-201-builder/build/build/ctypes/source/libffi/src/x86/unix64.S:73 #29 0x00002b00a54d6c35 in ffi_call (cif=0x2b00ab320080, fn=0x2b00a56e2f40 <couchstore_changes_since>, rvalue=<value optimized out>, avalue=<value optimized out>) at /home/buildbot/centos-x64-201-builder/build/build/ctypes/source/libffi/src/x86/ffi64.c:428 #30 0x00002b00a54d215d in 
_call_function_pointer (pProc=0x2b00a56e2f40 <couchstore_changes_since>, argtuple=0x124ba7d0, flags=4097, argtypes=0x0, restype=0x1231a980, checker=0x0) at source/callproc.c:668 #31 _CallProc (pProc=0x2b00a56e2f40 <couchstore_changes_since>, argtuple=0x124ba7d0, flags=4097, argtypes=0x0, restype=0x1231a980, checker=0x0) at source/callproc.c:991 #32 0x00002b00a54ccd0a in CFuncPtr_call (self=<value optimized out>, inargs=<value optimized out>, kwds=0x0) at source/_ctypes.c:3362 #33 0x00002b00a03cc7e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0 #34 0x00002b00a0429fbe in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0 #35 0x00002b00a042d6d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0 #36 0x00002b00a042be2f in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0 #37 0x00002b00a042d6d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0 #38 0x00002b00a03e2b9a in ?? () from /usr/lib64/libpython2.4.so.1.0 #39 0x00002b00a03cc7e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0 #40 0x00002b00a042ac1c in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0 #41 0x00002b00a042c256 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0 #42 0x00002b00a042d6d5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0 #43 0x00002b00a03e2aa7 in ?? () from /usr/lib64/libpython2.4.so.1.0 #44 0x00002b00a03cc7e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0 #45 0x00002b00a03d290f in ?? () from /usr/lib64/libpython2.4.so.1.0 #46 0x00002b00a03cc7e0 in PyObject_Call () from /usr/lib64/libpython2.4.so.1.0 #47 0x00002b00a042732d in PyEval_CallObjectWithKeywords () from /usr/lib64/libpython2.4.so.1.0 #48 0x00002b00a045396d in ?? () from /usr/lib64/libpython2.4.so.1.0 #49 0x00002b00a06d183d in start_thread () from /lib64/libpthread.so.0 #50 0x00002b00a1046fad in clone () from /lib64/libc.so.6 3. Installing Python 2.6 on this machine fixed the problem. 
However, I wouldn't blame Python 2.4: as I mentioned before, an absolutely identical setup worked pretty well on another machine. 4. The ctypes that we ship is another suspect. It was a workaround and does not necessarily address all edge cases. couchstore + Python 2.4 is one of them; in fact, all other sources work fine. |
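Since the analysis above points at the interpreter version and the bundled ctypes as suspects, a quick way to compare two machines is to print which Python and which ctypes module are actually loaded. This is a minimal diagnostic sketch, not part of cbtransfer; the function name `describe_ctypes` is hypothetical, and the version attribute is read defensively because not every ctypes build is guaranteed to expose it.

```python
import ctypes
import sys


def describe_ctypes():
    """Return the interpreter version and the ctypes module in use."""
    return {
        "python": "%d.%d.%d" % tuple(sys.version_info[:3]),
        # Falls back to "unknown" if the attribute is missing.
        "ctypes_version": getattr(ctypes, "__version__", "unknown"),
        # The file path shows whether a system or a bundled copy was imported.
        "ctypes_path": getattr(ctypes, "__file__", "unknown"),
    }


if __name__ == "__main__":
    for key, value in sorted(describe_ctypes().items()):
        print("%s: %s" % (key, value))
```

Running it with the same interpreter and module search path that the tool uses on both the failing and the working machine would show whether they really load the same ctypes.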
| Comment by Pavel Paulau [ 02/May/13 ] |
|
Deep,
May I ask you to reproduce it on another machine? This is a very confusing issue... |
| Comment by Deepkaran Salooja [ 06/May/13 ] |
|
Reproduced on VM 10.3.3.104. 100k items loaded with mcsoda on default bucket. Build 2.0.2-781-rel. Crash is reproducible. [root@caper-016 ~]# /opt/couchbase/bin/cbtransfer -v -v -v couchstore-files:///opt/couchbase/var/lib/couchbase/data/ /tmp/backup 2013-05-06 10:41:23,838: mt cbtransfer... 2013-05-06 10:41:23,838: mt source : couchstore-files:///opt/couchbase/var/lib/couchbase/data/ 2013-05-06 10:41:23,839: mt sink : /tmp/backup 2013-05-06 10:41:23,839: mt opts : {'username': None, 'source_vbucket_state': 'active', 'destination_vbucket_state': 'active', 'verbose': 3, 'dry_run': False, 'extra': {'max_retry': 10.0, 'rehash': 0.0, 'data_only': 0.0, 'nmv_retry': 1.0, 'cbb_max_mb': 100000.0, 'try_xwm': 1.0, 'batch_max_bytes': 400000.0, 'report_full': 2000.0, 'batch_max_size': 1000.0, 'report': 5.0, 'design_doc_only': 0.0, 'recv_min_bytes': 4096.0}, 'single_node': False, 'bucket_destination': None, 'destination_operation': None, 'vbucket_list': None, 'threads': 4, 'key': None, 'password': None, 'id': None, 'bucket_source': None} 2013-05-06 10:41:23,840: mt source_class: <class 'pump_sfd.SFDSource'> 2013-05-06 10:41:24,094: mt sink_class: <class 'pump_bfd.BFDSink'> 2013-05-06 10:41:24,095: mt source_buckets: default 2013-05-06 10:41:24,095: mt bucket: default 2013-05-06 10:41:24,096: mt source_nodes: N/A 2013-05-06 10:41:24,108: mt enqueueing node: N/A 2013-05-06 10:41:24,108: w0 node: N/A 2013-05-06 10:41:24,217: s0 create_db: /tmp/backup/bucket-default/node-N%2FA/data-0000.cbb 2013-05-06 10:41:24,217: s0 connect_db: /tmp/backup/bucket-default/node-N%2FA/data-0000.cbb ...Segmentation fault |
| Comment by Pavel Paulau [ 08/May/13 ] |
| In fact 10.3.3.104 is a virtual copy of 10.3.3.95. |
| Comment by Pavel Paulau [ 13/May/13 ] |
|
http://review.couchbase.org/#/c/26260/ |
| Comment by Maria McDuff [ 14/May/13 ] |
| pls verify / close. |
| Comment by Pavel Paulau [ 15/May/13 ] |
| Actually it wasn't resolved. We still ship an old version of ctypes; without direct access to the builders I can't fix that. |
| Comment by Phil Labee [ 15/May/13 ] |
|
tlm commit af7b66f4c81b890adfcb8b7520a89436f2e4d0cd adds a 'clean-all' target and a 'clean-grommit' target.
The 'clean-grommit' target will remove directories named:
ctypes*
curl*
google-perftools*
gperftools*
libevent*
sqlite*
Buildbot master.cfg now uses the 'clean-all' target for "make clean". |
| Comment by Maria McDuff [ 16/May/13 ] |
|
Phil, is this fixed? Can you confirm? Pavel's last comment states it's not resolved... |
| Comment by Pavel Paulau [ 17/May/13 ] |
| It should be fixed now. At least build 805 includes the required version of ctypes. |
| Comment by Deepkaran Salooja [ 19/May/13 ] |
| verified with 2.0.2-807-rel |
[MB-8284] Rebalance fails with reason replicator_died Created: 15/May/13 Updated: 17/May/13 Resolved: 17/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Critical |
| Reporter: | Deepkaran Salooja | Assignee: | Jin Lim |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
2.0.2-803-rel
<manifest> <remote name="couchbase" fetch="git://github.com/couchbase/"/> <remote name="membase" fetch="git://github.com/membase/"/> <remote name="apache" fetch="git://github.com/apache/"/> <remote name="erlang" fetch="git://github.com/erlang/"/> <default remote="couchbase" revision="master"/> <project name="tlm" path="tlm" revision="f30cd57af02e51eafa0b6d5fb71176c2a46a2cf9"> <copyfile src="Makefile.top" dest="Makefile"/> </project> <project name="bucket_engine" path="bucket_engine" revision="2a797a8d97f421587cce728f2e6aa2cd42c8fa26"/> <project name="ep-engine" path="ep-engine" revision="0d6b7b00df999bef2b9e7ff160fe908b3650e407"/> <project name="libconflate" path="libconflate" revision="c0d3e26a51f25a2b020713559cb344d43ce0b06c"/> <project name="libmemcached" path="libmemcached" revision="ca739a890349ac36dc79447e37da7caa9ae819f5" remote="membase"/> <project name="libvbucket" path="libvbucket" revision="408057ec55da3862ab8d75b1ed25d2848afd640f"/> <project name="couchbase-cli" path="couchbase-cli" revision="45f1370e3c440bde9763b124e88e26ee98941bcd" remote="couchbase"/> <project name="memcached" path="memcached" revision="b6ceb46fc26ac6f1d6be7a5866d6c6c0f6e6d32a" remote="membase"/> <project name="moxi" path="moxi" revision="4b391021af7a453bf88716724d2c644916ebd969"/> <project name="ns_server" path="ns_server" revision="232663cc06c71b92434ad70f9949b49e46269f9b"/> <project name="portsigar" path="portsigar" revision="159b6179ea8a3c2075ee9eb2afa6f91c98c0fda6"/> <project name="sigar" path="sigar" revision="63a3cd1b316d2d4aa6dd31ce8fc66101b983e0b0"/> <project name="couchbase-examples" path="couchbase-examples" revision="cd9c8600589a1996c1ba6dbea9ac171b937d3379"/> <project name="couchbase-python-client" path="couchbase-python-client" revision="d443169c0694fca1be67d8f6934a8c50f0175ee7"/> <project name="couchdb" path="couchdb" revision="586e4bb73b92db4362192616370c4e3edb8c34a0"/> <project name="couchdbx-app" path="couchdbx-app" 
revision="e83b255bc7f7548e2bc36e709666e564c2a488dd"/> <project name="couchstore" path="couchstore" revision="963fc26eafc67514eed5c9a3752d5d4cbdf5971d"/> <project name="geocouch" path="geocouch" revision="ed9ad43aa361df0829262fef811b5236331b44c8"/> <project name="testrunner" path="testrunner" revision="1f15e11d443be385ff8362d049a389331b502f9a"/> <project name="healthchecker" path="healthchecker" revision="29d45e7776ecb20800f6ad97aec585a1e1636370"/> <project name="otp" path="otp" revision="b6dc1a844eab061d0a7153d46e7e68296f15a504" remote="erlang"/> <project name="icu4c" path="icu4c" revision="26359393672c378f41f2103a8699c4357c894be7" remote="couchbase"/> <project name="snappy" path="snappy" revision="5681dde156e9d07adbeeab79666c9a9d7a10ec95" remote="couchbase"/> <project name="v8" path="v8" revision="447decb75060a106131ab4de934bcc374648e7f2" remote="couchbase"/> <project name="gperftools" path="gperftools" revision="44a584d1de8c89addfb4f1d0522bdbbbed83ba48" remote="couchbase"/> <project name="pysqlite" path="pysqlite" revision="0ff6e32ea05037fddef1eb41a648f2a2141009ea" remote="couchbase"/> </manifest> |
||
| Operating System: | Centos 64-bit |
| Description |
|
Rebalance failures are seen in Centos64 sanity job:
http://qa.hq.northscale.net/job/CouchbaseServer-SanityTest-4Node-Centos64/73/consoleFull
** Reason for termination ==
** {exited, {'EXIT',<0.28006.6>, {replicator_died, {'EXIT',<17674.28481.2>,{badmatch,{error,closed}}}}}}
[ns_server:info,2013-05-15T3:24:33.837,ns_1@10.3.3.32:janitor_agent-default<0.23324.6>:janitor_agent:handle_info:761]Undoing temporary vbucket states caused by rebalance
[user:info,2013-05-15T3:24:33.837,ns_1@10.3.3.32:<0.30432.2>:ns_orchestrator:handle_info:403]Rebalance exited with reason {exited, {'EXIT',<0.28006.6>, {replicator_died, {'EXIT',<17674.28481.2>, {badmatch,{error,closed}}}}}}
[ns_server:error,2013-05-15T3:24:33.584,ns_1@10.3.3.32:<0.28006.6>:ns_replicas_builder:build_replicas_main:90]Got premature exit from one of ebucketmigrators: {'EXIT',<17674.28481.2>, {badmatch,{error,closed}}}
[error_logger:error,2013-05-15T3:24:33.585,ns_1@10.3.3.32:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: ebucketmigrator_srv:init/1
pid: <17674.28481.2>
registered_name: []
exception error: no match of right hand side value {error,closed}
in function mc_client_binary:cmd_binary_vocal_recv/5
in call from mc_client_binary:set_vbucket/3
in call from ebucketmigrator_srv:'-init/1-lc$^0/1-0-'/3
in call from ebucketmigrator_srv:init/1
ancestors: [<0.28006.6>,<0.28000.6>,<0.27823.6>,<0.26685.6>]
messages: []
links: [#Port<17674.80060>,<0.28006.6>,#Port<17674.80059>]
dictionary: []
trap_exit: false
status: running
heap_size: 987
stack_size: 24
reductions: 1725
neighbours:
Attaching collect_info |
| Comments |
| Comment by Deepkaran Salooja [ 15/May/13 ] |
|
https://s3.amazonaws.com/bugdb/jira/MB-8284/e9125b6b/10.3.3.32-5152013-635-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-8284/e9125b6b/10.3.3.33-5152013-638-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-8284/e9125b6b/10.3.3.30-5152013-640-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-8284/e9125b6b/10.3.3.224-5152013-643-diag.zip |
| Comment by Maria McDuff [ 15/May/13 ] |
|
FYI -- this is the newest build that contains Jin's rebalance fixes. |
| Comment by Aliaksey Artamonau [ 15/May/13 ] |
|
2013-05-15 03:26:26.249 ns_log:0:info:message(ns_1@10.3.3.224) - Port server memcached on node 'babysitter_of_ns_1@127.0.0.1' exited with status 134. Restarting ... MUTEX ERROR: Failed to acquire lock: Invalid argument |
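The "exited with status 134" in the log line above is consistent with an abort: by the common shell convention, a status above 128 means the process was terminated by signal (status - 128), and 134 - 128 = 6, which is SIGABRT on Linux. The sketch below decodes such a status. The helper name `describe_exit_status` is hypothetical, and the assumption that the babysitter reports statuses using this 128+N convention is mine, not something stated in the ticket.

```python
import signal


def describe_exit_status(status):
    """Decode a shell-style exit status (128 + N means signal N)."""
    if status > 128:
        signum = status - 128
        try:
            # Map the number back to its symbolic name on this platform.
            return "killed by %s" % signal.Signals(signum).name
        except ValueError:
            return "killed by signal %d" % signum
    return "exited with code %d" % status


if __name__ == "__main__":
    print(describe_exit_status(134))
    print(describe_exit_status(0))
```

Signal numbering is platform specific, so the symbolic name printed for 134 should be read on the same kind of system that produced the log.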
| Comment by Jin Lim [ 16/May/13 ] |
|
* MRW37 passed http://qa.hq.northscale.net/job/CouchbaseServer-SanityTest-4Node-Centos64/75 except a single view test.
* Checked the failed node, 10.3.3.32, and found no previously detected errors (rebalance hang, mutex acquire abort, crash, etc.)
* It seems a network glitch due to slowness caused connection timeouts from and to this node.
Will merge the fix that went into MRW37 to the 2.0.2 branch on Thursday morning. |
| Comment by Aliaksey Artamonau [ 16/May/13 ] |
| Please see my comment above. In fact there was a mutex abort on node 10.3.3.224. |
| Comment by Jin Lim [ 16/May/13 ] |
|
Thanks for reiterating the issue, but the mutex error was from the test running on build 803. Ketaki and Jin built another toy build with a fix, MRW37, later this evening and ran the same sanity test. No MUTEX abort is now found on any nodes, including 10.3.3.224. The successful results, except for a single view test, can be found here: http://qa.hq.northscale.net/job/CouchbaseServer-SanityTest-4Node-Centos64/75 |
| Comment by Aliaksey Artamonau [ 16/May/13 ] |
| I misunderstood. Sorry then. |
| Comment by Jin Lim [ 16/May/13 ] |
|
The overnight test ran without an incident http://qa.hq.northscale.net/job/CouchbaseServer-SanityTest-4Node-Centos64/77/. This is again with the toy build MRW37. Thanks. |
| Comment by Jin Lim [ 17/May/13 ] |
| Deep, this bug has been tracking the MUTEX::acquire failure during rebalance (in the path within couch_notifier::resetConnection()). Please confirm that we no longer see this error with the latest test results. If so, please close; otherwise, reopen and assign back to Jin. Thanks. |
| Comment by Jin Lim [ 17/May/13 ] |
| Build 805 and up should have included the fix. |
[MB-8270] node down with error net_kernal_terminated Created: 13/May/13 Updated: 17/May/13 Resolved: 17/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Critical |
| Reporter: | Thuan Nguyen | Assignee: | Aleksey Kondratenko |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | windows 2008 R2 64bit | ||
| Operating System: | Windows 64-bit |
| Description |
|
Environment:
4 windows server r2 64-bit with 4 core, 4GB RAM for each server Couchbase: couchbase server version 2.0.2-801 Run sanity test on windows. When investigate log file for bug 2013-05-13 17:59:28: (agent_config.c.705) ERROR: bad JSON configuration from http://127.0.0.1:8091/pools/default/saslBucketsStreaming: Number of vBuckets must be a power of two > 0 and <= 65536 ({ "name": "default", "nodeLocator": "vbucket", "saslPassword": "", "nodes": [{ "hostname": "127.0.0.1:8091", "ports": { "direct": 11210, "proxy": 11211 } }], "vBucketServerMap": { "hashAlgorithm": "CRC", "numReplicas": 1, "serverList": ["127.0.0.1:11210"], "vBucketMap": [] } }) EOL on stdin. Exiting [menelaus:info,2013-05-13T18:02:35.839,ns_1@127.0.0.1:<0.20352.3>:menelaus_web_buckets:handle_bucket_delete:345]Deleted bucket "default" [ns_server:debug,2013-05-13T18:02:36.980,ns_1@127.0.0.1:ns_config_log<0.284.0>:ns_config_log:log_common:111]config change: memory_quota -> 2184 [ns_server:debug,2013-05-13T18:02:36.980,ns_1@127.0.0.1:ns_config_rep<0.308.0>:ns_config_rep:do_push_keys:317]Replicating some config keys ([memory_quota]..) [cluster:debug,2013-05-13T18:02:38.136,ns_1@127.0.0.1:ns_cluster<0.273.0>:ns_cluster:handle_call:135]handling add_node("10.3.2.142", 8091, ..) [cluster:info,2013-05-13T18:02:38.136,ns_1@127.0.0.1:ns_cluster<0.273.0>:ns_cluster:do_change_address:315]Decided to change address to "10.3.2.143" [user:warn,2013-05-13T18:02:38.136,nonode@nohost:ns_node_disco<0.301.0>:ns_node_disco:handle_info:168]Node nonode@nohost saw that node 'ns_1@127.0.0.1' went down. 
Details: [{nodedown_reason, net_kernel_terminated}] [ns_server:info,2013-05-13T18:02:38.136,nonode@nohost:dist_manager<0.263.0>:dist_manager:handle_call:249]Adjusted IP to "10.3.2.143" [ns_server:info,2013-05-13T18:02:38.136,nonode@nohost:dist_manager<0.263.0>:dist_manager:bringup:227]Attempting to bring up net_kernel with name 'ns_1@10.3.2.143' [error_logger:info,2013-05-13T18:02:38.136,nonode@nohost:error_logger<0.6.0>:ale_error_logger_handler:log_report:72] =========================PROGRESS REPORT========================= supervisor: {local,net_sup} started: [{pid,<0.4435.4>}, {name,erl_epmd}, {mfargs,{erl_epmd,start_link,[]}}, {restart_type,permanent}, {shutdown,2000}, {child_type,worker}] Link to manifest file of this build http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.2-801-rel.setup.exe.manifest.xml Link to collect info of all node https://s3.amazonaws.com/packages.couchbase/collect_info/2_0_2/2013_05/4nodes-202-801_reb_hang_20130513-185157.tgz |
| Comments |
| Comment by Maria McDuff [ 14/May/13 ] |
| per bug triage, bumping up to critical. |
| Comment by Aleksey Kondratenko [ 17/May/13 ] |
| Seeing net_kernel_terminated is not in fact an error, so this is not a bug. |
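The "bad JSON configuration" error quoted in the description states a concrete rule: the number of vBuckets must be a power of two, greater than 0 and at most 65536. The configuration it rejects has an empty `vBucketMap`. The following is a minimal Python sketch of that rule only; it is not the code from agent_config.c, and the helper names `is_valid_vbucket_count` and `check_bucket_config` are hypothetical.

```python
import json

# Upper bound quoted in the error message from the log.
MAX_VBUCKETS = 65536


def is_valid_vbucket_count(n):
    """Return True if n is a power of two with 0 < n <= MAX_VBUCKETS."""
    return 0 < n <= MAX_VBUCKETS and (n & (n - 1)) == 0


def check_bucket_config(raw_json):
    """Parse a bucket config and check the size of its vBucketMap."""
    config = json.loads(raw_json)
    vbucket_map = config["vBucketServerMap"]["vBucketMap"]
    return is_valid_vbucket_count(len(vbucket_map))


# Trimmed copy of the configuration shown in the log: vBucketMap is empty.
sample = (
    '{"name": "default", "vBucketServerMap": {"hashAlgorithm": "CRC", '
    '"numReplicas": 1, "serverList": ["127.0.0.1:11210"], "vBucketMap": []}}'
)

print(check_bucket_config(sample))
```

An empty map has zero entries, so the check fails for the logged configuration, which matches the message in the log.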
[MB-8246] [system test] Rebalance exited with reason timeout waiting for backfill determination Created: 10/May/13 Updated: 17/May/13 Resolved: 17/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Chisheng Hong | Assignee: | Mike Wiederhold |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | build 2.0.2-789-rel | ||
| Operating System: | Centos 64-bit |
| Description |
|
Cluster ip is 172.23.105.23
1. Create an 8-node cluster; each node has 12 GB RAM and HDD storage.
2. Create 2 buckets, default and saslbucket, with memory quotas of 6 GB and 4 GB.
3. Run the KV use case for 1 day: load 35M items into each bucket, then access the data at 4k ops/sec with 5% create, 5% delete, 5% expire, 5% update, and 80% gets for several hours. Then, with the same workload, run some rebalance and failover operations. This works fine.
4. Continue the workload for another day, then try to rebalance in one node. The rebalance exits with a timeout:
Rebalance exited with reason {unexpected_exit, {'EXIT',<0.7961.56>, {{badmatch, [{'EXIT', {timeout, {gen_server,call, [<12869.26294.2>,had_backfill,30000]}}}]}, [{ns_single_vbucket_mover, '-wait_backfill_determination/1-fun-1-', 1}]}}}
ns_orchestrator002 ns_1@172.23.105.23 14:36:20 - Fri May 10, 2013
<0.7953.56> exited with {unexpected_exit, {'EXIT',<0.7961.56>, {{badmatch, [{'EXIT', {timeout, {gen_server,call, [<12869.26294.2>,had_backfill,30000]}}}]}, [{ns_single_vbucket_mover, '-wait_backfill_determination/1-fun-1-',1}]}}}
The link for diags is https://s3.amazonaws.com/bugdb/jira/MB-8246/ns-diag-20130510162640.txt.zip |
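For anyone reproducing the workload in step 3, the per-operation rates follow directly from the stated mix. This is a small worked example in Python, assuming the "80 gets" figure is a percentage (the five shares then add up to 100); the function name `ops_per_second` is illustrative only.

```python
# "4k ops/sec" from the description.
TOTAL_OPS_PER_SEC = 4000

# Operation mix in percent; the 80 for gets is assumed to be a percentage.
MIX_PERCENT = {"create": 5, "delete": 5, "expire": 5, "update": 5, "get": 80}


def ops_per_second(total, mix_percent):
    """Split a total operation rate according to a percentage mix."""
    if sum(mix_percent.values()) != 100:
        raise ValueError("mix must add up to 100 percent")
    return {op: total * pct // 100 for op, pct in mix_percent.items()}


rates = ops_per_second(TOTAL_OPS_PER_SEC, MIX_PERCENT)
for op, rate in rates.items():
    print("%-7s %5d ops/sec" % (op, rate))
```

Under that assumption, each of the four mutating operation types runs at 200 ops/sec and gets run at 3200 ops/sec.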
| Comments |
| Comment by Chisheng Hong [ 10/May/13 ] |
| cbcollect info link is https://s3.amazonaws.com/bugdb/jira/MB-8246/10nodes_202-789_rebalance_timetout_20130510-173321.tgz |
| Comment by Chisheng Hong [ 13/May/13 ] |
| Aleksey K thought this was related to http://www.couchbase.com/issues/browse/MB-8231 |
| Comment by Maria McDuff [ 14/May/13 ] |
|
per bug triage, bumping up to blocker. if this is related to pls update with your findings nonetheless. |
| Comment by Maria McDuff [ 14/May/13 ] |
| per bug triage, wait for ep-engine to stabilize. |
| Comment by Mike Wiederhold [ 17/May/13 ] |
|
Duplicate of |
[MB-8243] memcached crashed in Configuration::getCouchBucket Created: 10/May/13 Updated: 17/May/13 Resolved: 17/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | couchbase-bucket |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Andrei Baranouski | Assignee: | Andrei Baranouski |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | 2.0.2-795 | ||
| Operating System: | Ubuntu 64-bit |
| Description |
|
I see a crash from http://qa.hq.northscale.net/job/CouchbaseServer-SanityTest-4Node-Centos64/63/consoleFull run on build 2.0.2-795-rel.
It seems to be this test: ./testrunner -i fournode.ini -t viewquerytests.ViewQueryTests.test_employee_dataset_all_queries,limit=1000,docs-per-day=2,wait_persistence=true
However, this job doesn't generate collect info after failures, and the logs I have were obtained much later.
gdb /opt/couchbase/bin/memcached core.memcached.7524 GNU gdb (GDB) CentOS (7.0.1-45.el5.centos) Copyright (C) 2009 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /opt/couchbase/bin/memcached...done. [New Thread 7705] [New Thread 8584] [New Thread 8583] [New Thread 8582] [New Thread 7709] [New Thread 7708] [New Thread 7707] [New Thread 7706] [New Thread 7704] [New Thread 7703] [New Thread 7702] [New Thread 7701] [New Thread 7539] [New Thread 7538] [New Thread 7537] [New Thread 7536] [New Thread 7535] [New Thread 7533] [New Thread 7532] [New Thread 7524] warning: .dynamic section for "/usr/lib64/libstdc++.so.6" is not at the expected address warning: difference appears to be caused by prelink, adjusting expectations warning: .dynamic section for "/lib64/libgcc_s.so.1" is not at the expected address warning: difference appears to be caused by prelink, adjusting expectations Reading symbols from /opt/couchbase/lib/memcached/libmemcached_utilities.so.0...done. Loaded symbols for /opt/couchbase/lib/memcached/libmemcached_utilities.so.0 Reading symbols from /opt/couchbase/lib/libevent-2.0.so.5...done. Loaded symbols for /opt/couchbase/lib/libevent-2.0.so.5 Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2 Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done. Loaded symbols for /lib64/libm.so.6 Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done. Loaded symbols for /lib64/librt.so.1 Reading symbols from /opt/couchbase/lib/libtcmalloc_minimal.so.4...done. Loaded symbols for /opt/couchbase/lib/libtcmalloc_minimal.so.4 Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done. [Thread debugging using libthread_db enabled] Loaded symbols for /lib64/libpthread.so.0 Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done. Loaded symbols for /lib64/libc.so.6 Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/ld-linux-x86-64.so.2 Reading symbols from /usr/lib64/libstdc++.so.6...(no debugging symbols found)...done. Loaded symbols for /usr/lib64/libstdc++.so.6 Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done. Loaded symbols for /lib64/libgcc_s.so.1 Reading symbols from /opt/couchbase/lib/memcached/stdin_term_handler.so...done. Loaded symbols for /opt/couchbase/lib/memcached/stdin_term_handler.so Reading symbols from /opt/couchbase/lib/memcached/file_logger.so...done. Loaded symbols for /opt/couchbase/lib/memcached/file_logger.so Reading symbols from /opt/couchbase/lib/memcached/bucket_engine.so...done. Loaded symbols for /opt/couchbase/lib/memcached/bucket_engine.so Reading symbols from /opt/couchbase/lib/memcached/ep.so...done. Loaded symbols for /opt/couchbase/lib/memcached/ep.so Reading symbols from /opt/couchbase/lib/libcouchstore.so.1...done. Loaded symbols for /opt/couchbase/lib/libcouchstore.so.1 Reading symbols from /opt/couchbase/lib/libsnappy.so.1...done. Loaded symbols for /opt/couchbase/lib/libsnappy.so.1 Reading symbols from /opt/couchbase/lib/libicuuc.so.44...done. 
Loaded symbols for /opt/couchbase/lib/libicuuc.so.44 Reading symbols from /opt/couchbase/lib/libicudata.so.44...(no debugging symbols found)...done. Loaded symbols for /opt/couchbase/lib/libicudata.so.44 Reading symbols from /opt/couchbase/lib/libicui18n.so.44...done. Loaded symbols for /opt/couchbase/lib/libicui18n.so.44 warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff9dbe8000 Core was generated by `/opt/couchbase/bin/memcached -X /opt/couchbase/lib/memcached/stdin_term_handler'. Program terminated with signal 6, Aborted. #0 0x0000003866c30285 in raise () from /lib64/libc.so.6 (gdb) t aa bt A syntax error in expression, near `bt'. (gdb) t a a bt Thread 20 (Thread 0x2af656d41220 (LWP 7524)): #0 0x0000003866cd3648 in epoll_wait () from /lib64/libc.so.6 #1 0x00002af6568c7576 in epoll_dispatch (base=0x222b6000, tv=<value optimized out>) at epoll.c:404 #2 0x00002af6568b2e44 in event_base_loop (base=0x222b6000, flags=<value optimized out>) at event.c:1558 #3 0x00000000004097d6 in main (argc=<value optimized out>, argv=<value optimized out>) at daemon/memcached.c:7926 Thread 19 (Thread 7532): #0 0x0000003866cc545b in read () from /lib64/libc.so.6 #1 0x0000003866c6b677 in _IO_new_file_underflow () from /lib64/libc.so.6 #2 0x0000003866c6c03e in _IO_default_uflow_internal () from /lib64/libc.so.6 #3 0x0000003866c61124 in _IO_getline_info_internal () from /lib64/libc.so.6 #4 0x0000003866c5ffc9 in fgets () from /lib64/libc.so.6 #5 0x00002af656d42939 in check_stdin_thread (arg=<value optimized out>) at extensions/daemon/stdin_check.c:37 #6 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #7 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 18 (Thread 7533): #0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaaaaae4d6 in logger_thead_main (arg=0x1da4e040) at extensions/loggers/file_logger.c:368 #2 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 
#3 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 17 (Thread 7535): #0 0x00002af656b17f6f in tcmalloc::CentralFreeList::Populate() () from /opt/couchbase/lib/libtcmalloc_minimal.so.4 #1 0x00002af656b17c93 in tcmalloc::CentralFreeList::FetchFromSpansSafe() () from /opt/couchbase/lib/libtcmalloc_minimal.so.4 #2 0x00002af656b17bcc in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4 #3 0x00002af656b1ccd5 in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4 #4 0x00002af656b12054 in tcmalloc::ThreadCache::Allocate(unsigned long, unsigned long) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4 #5 0x00002af656b10b60 in (anonymous namespace)::do_malloc(unsigned long) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4 #6 0x00002af656b10892 in (anonymous namespace)::do_malloc_or_cpp_alloc(unsigned long) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4 #7 0x00002af656b10bec in (anonymous namespace)::do_calloc(unsigned long, unsigned long) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4 #8 0x00002af656b24764 in tc_calloc () from /opt/couchbase/lib/libtcmalloc_minimal.so.4 #9 0x00002aaaaaef94b9 in HashTable (this=<value optimized out>, i=918, newState=vbucket_state_replica, st=..., checkpointConfig=..., checkpointId=1, initState=vbucket_state_dead) at src/stored-value.hh:814 #10 VBucket::VBucket (this=<value optimized out>, i=918, newState=vbucket_state_replica, st=..., checkpointConfig=..., checkpointId=1, initState=vbucket_state_dead) at src/vbucket.hh:126 #11 0x00002aaaaaefb932 in EventuallyPersistentStore::setVBucketState (this=0x2a424400, vbid=918, to=vbucket_state_replica) at src/ep.cc:795 #12 0x00002aaaaaf1e11d in setVBucketState (h=0x2230e900, cookie=0x222858c0, request=0x22276800, response=0x4075e0 <binary_response_handler>) at src/ep_engine.h:508 #13 setVBucket (h=0x2230e900, cookie=0x222858c0, 
request=0x22276800, response=0x4075e0 <binary_response_handler>) at src/ep_engine.cc:738 #14 processUnknownCommand (h=0x2230e900, cookie=0x222858c0, request=0x22276800, response=0x4075e0 <binary_response_handler>) at src/ep_engine.cc:884 #15 0x00002aaaaaf1e99c in EvpUnknownCommand (handle=<value optimized out>, cookie=0x222858c0, request=0x22276800, response=0x4075e0 <binary_response_handler>) at src/ep_engine.cc:1021 #16 0x00002aaaaacc4dd4 in bucket_unknown_command (handle=<value optimized out>, cookie=0x222858c0, request=0x22276800, response=0x4075e0 <binary_response_handler>) at bucket_engine.c:2475 #17 0x0000000000411b4e in process_bin_unknown_packet (c=0x222858c0) at daemon/memcached.c:2882 #18 process_bin_packet (c=0x222858c0) at daemon/memcached.c:3170 #19 complete_nread_binary (c=0x222858c0) at daemon/memcached.c:3744 #20 complete_nread (c=0x222858c0) at daemon/memcached.c:3826 #21 conn_nread (c=0x222858c0) at daemon/memcached.c:5679 #22 0x0000000000405ec5 in event_handler (fd=<value optimized out>, which=<value optimized out>, arg=0x222858c0) at daemon/memcached.c:5942 #23 0x00002af6568b2f3c in event_process_active_single_queue (base=0x222b6500, flags=<value optimized out>) at event.c:1308 #24 event_process_active (base=0x222b6500, flags=<value optimized out>) at event.c:1375 #25 event_base_loop (base=0x222b6500, flags=<value optimized out>) at event.c:1572 #26 0x0000000000414604 in worker_libevent (arg=0x1da51900) at daemon/thread.c:301 #27 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #28 0x0000003866cd325d in clone () from /lib64/libc.so.6 ---Type <return> to continue, or q <return> to quit--- Thread 16 (Thread 7536): #0 0x0000003866cd3648 in epoll_wait () from /lib64/libc.so.6 #1 0x00002af6568c7576 in epoll_dispatch (base=0x222b6280, tv=<value optimized out>) at epoll.c:404 #2 0x00002af6568b2e44 in event_base_loop (base=0x222b6280, flags=<value optimized out>) at event.c:1558 #3 0x0000000000414604 in worker_libevent (arg=0x1da519f8) 
at daemon/thread.c:301 #4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #5 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 15 (Thread 7537): #0 0x0000003866cd3648 in epoll_wait () from /lib64/libc.so.6 #1 0x00002af6568c7576 in epoll_dispatch (base=0x222b6c80, tv=<value optimized out>) at epoll.c:404 #2 0x00002af6568b2e44 in event_base_loop (base=0x222b6c80, flags=<value optimized out>) at event.c:1558 #3 0x0000000000414604 in worker_libevent (arg=0x1da51af0) at daemon/thread.c:301 #4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #5 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 14 (Thread 7538): #0 0x0000003866cd3648 in epoll_wait () from /lib64/libc.so.6 #1 0x00002af6568c7576 in epoll_dispatch (base=0x222b6a00, tv=<value optimized out>) at epoll.c:404 #2 0x00002af6568b2e44 in event_base_loop (base=0x222b6a00, flags=<value optimized out>) at event.c:1558 #3 0x0000000000414604 in worker_libevent (arg=0x1da51be8) at daemon/thread.c:301 #4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #5 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 13 (Thread 7539): #0 0x0000003866cd3648 in epoll_wait () from /lib64/libc.so.6 #1 0x00002af6568c7576 in epoll_dispatch (base=0x222b6780, tv=<value optimized out>) at epoll.c:404 #2 0x00002af6568b2e44 in event_base_loop (base=0x222b6780, flags=<value optimized out>) at event.c:1558 #3 0x0000000000414604 in worker_libevent (arg=0x1da51ce0) at daemon/thread.c:301 #4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #5 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 12 (Thread 7701): #0 0x0000003866c99221 in nanosleep () from /lib64/libc.so.6 #1 0x0000003866cccba4 in usleep () from /lib64/libc.so.6 #2 0x00002aaaaaf35125 in updateStatsThread (arg=0x1da4e4c0) at src/memory_tracker.cc:31 #3 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #4 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 11 
(Thread 7702): #0 0x000000386740d594 in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x0000003867408e8a in _L_lock_1034 () from /lib64/libpthread.so.0 #2 0x0000003867408d4c in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00002aaaaaf3601a in Mutex::acquire (this=0x223310f0) at src/mutex.cc:79 #4 0x00002aaaaaf7d303 in lock (this=0x22331000, vbs=..., file_version=1, header_offset=4096, cb=...) at ./src/locks.hh:48 #5 LockHolder (this=0x22331000, vbs=..., file_version=1, header_offset=4096, cb=...) at ./src/locks.hh:26 #6 CouchNotifier::notify_update (this=0x22331000, vbs=..., file_version=1, header_offset=4096, cb=...) at src/couch-kvstore/couch-notifier.cc:753 #7 0x00002aaaaaf73183 in CouchKVStore::setVBucketState (this=0x26595800, vbucketId=936, vbstate=..., vb_change_type=1, newfile=96, notify=true) at src/couch-kvstore/couch-kvstore.cc:745 #8 0x00002aaaaaf74089 in CouchKVStore::snapshotVBuckets (this=0x26595800, vbstates=Traceback (most recent call last): File "/usr/share/gdb/python/libstdcxx/v6/printers.py", line 288, in children nodetype = gdb.lookup_type('std::_Rb_tree_node< std::pair< %s, %s > >' % (keytype, valuetype)) RuntimeError: No type named std::_Rb_tree_node< std::pair< const unsigned short, vbucket_state > >. 
std::map with 1 elements) at src/couch-kvstore/couch-kvstore.cc:596 #9 0x00002aaaaaefc2b3 in EventuallyPersistentStore::snapshotVBuckets (this=0x2a424400, priority=..., shardId=<value optimized out>) at src/ep.cc:760 #10 0x00002aaaaaf54bef in VBSnapshotTask::run (this=<value optimized out>) at src/tasks.cc:78 ---Type <return> to continue, or q <return> to quit--- #11 0x00002aaaaaf397a0 in ExecutorThread::run (this=0x22361ba0) at src/scheduler.cc:153 #12 0x00002aaaaaf39ebd in launch_executor_thread (arg=0x22361ba0) at src/scheduler.cc:34 #13 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #14 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 10 (Thread 7703): #0 0x000000386740d594 in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x0000003867408e8a in _L_lock_1034 () from /lib64/libpthread.so.0 #2 0x0000003867408d4c in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00002aaaaaf3601a in Mutex::acquire (this=0x223310f0) at src/mutex.cc:79 #4 0x00002aaaaaf7d303 in lock (this=0x22331000, vbs=..., file_version=1, header_offset=4096, cb=...) at ./src/locks.hh:48 #5 LockHolder (this=0x22331000, vbs=..., file_version=1, header_offset=4096, cb=...) at ./src/locks.hh:26 #6 CouchNotifier::notify_update (this=0x22331000, vbs=..., file_version=1, header_offset=4096, cb=...) at src/couch-kvstore/couch-notifier.cc:753 #7 0x00002aaaaaf73183 in CouchKVStore::setVBucketState (this=0x26595200, vbucketId=937, vbstate=..., vb_change_type=1, newfile=96, notify=true) at src/couch-kvstore/couch-kvstore.cc:745 #8 0x00002aaaaaf74089 in CouchKVStore::snapshotVBuckets (this=0x26595200, vbstates=Traceback (most recent call last): File "/usr/share/gdb/python/libstdcxx/v6/printers.py", line 288, in children nodetype = gdb.lookup_type('std::_Rb_tree_node< std::pair< %s, %s > >' % (keytype, valuetype)) RuntimeError: No type named std::_Rb_tree_node< std::pair< const unsigned short, vbucket_state > >. 
std::map with 1 elements) at src/couch-kvstore/couch-kvstore.cc:596 #9 0x00002aaaaaefc2b3 in EventuallyPersistentStore::snapshotVBuckets (this=0x2a424400, priority=..., shardId=<value optimized out>) at src/ep.cc:760 #10 0x00002aaaaaf54bef in VBSnapshotTask::run (this=<value optimized out>) at src/tasks.cc:78 #11 0x00002aaaaaf397a0 in ExecutorThread::run (this=0x22361a00) at src/scheduler.cc:153 #12 0x00002aaaaaf39ebd in launch_executor_thread (arg=0x22361a00) at src/scheduler.cc:34 #13 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #14 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 9 (Thread 7704): #0 0x000000386740d594 in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x0000003867408e8a in _L_lock_1034 () from /lib64/libpthread.so.0 #2 0x0000003867408d4c in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00002aaaaaf3601a in Mutex::acquire (this=0x223310f0) at src/mutex.cc:79 #4 0x00002aaaaaf7d303 in lock (this=0x22331000, vbs=..., file_version=1, header_offset=4096, cb=...) at ./src/locks.hh:48 #5 LockHolder (this=0x22331000, vbs=..., file_version=1, header_offset=4096, cb=...) at ./src/locks.hh:26 #6 CouchNotifier::notify_update (this=0x22331000, vbs=..., file_version=1, header_offset=4096, cb=...) at src/couch-kvstore/couch-notifier.cc:753 #7 0x00002aaaaaf73183 in CouchKVStore::setVBucketState (this=0x26594c00, vbucketId=934, vbstate=..., vb_change_type=1, newfile=96, notify=true) at src/couch-kvstore/couch-kvstore.cc:745 #8 0x00002aaaaaf74089 in CouchKVStore::snapshotVBuckets (this=0x26594c00, vbstates=Traceback (most recent call last): File "/usr/share/gdb/python/libstdcxx/v6/printers.py", line 288, in children nodetype = gdb.lookup_type('std::_Rb_tree_node< std::pair< %s, %s > >' % (keytype, valuetype)) RuntimeError: No type named std::_Rb_tree_node< std::pair< const unsigned short, vbucket_state > >. 
std::map with 1 elements) at src/couch-kvstore/couch-kvstore.cc:596 #9 0x00002aaaaaefc2b3 in EventuallyPersistentStore::snapshotVBuckets (this=0x2a424400, priority=..., shardId=<value optimized out>) at src/ep.cc:760 #10 0x00002aaaaaf54bef in VBSnapshotTask::run (this=<value optimized out>) at src/tasks.cc:78 #11 0x00002aaaaaf397a0 in ExecutorThread::run (this=0x22361860) at src/scheduler.cc:153 #12 0x00002aaaaaf39ebd in launch_executor_thread (arg=0x22361860) at src/scheduler.cc:34 #13 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #14 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 8 (Thread 7706): #0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaaaf39631 in wait (this=0x22388d00) at src/syncobject.hh:57 #2 ExecutorThread::run (this=0x22388d00) at src/scheduler.cc:139 #3 0x00002aaaaaf39ebd in launch_executor_thread (arg=0x22388d00) at src/scheduler.cc:34 #4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #5 0x0000003866cd325d in clone () from /lib64/libc.so.6 ---Type <return> to continue, or q <return> to quit--- Thread 7 (Thread 7707): #0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaaaf39631 in wait (this=0x22388b60) at src/syncobject.hh:57 #2 ExecutorThread::run (this=0x22388b60) at src/scheduler.cc:139 #3 0x00002aaaaaf39ebd in launch_executor_thread (arg=0x22388b60) at src/scheduler.cc:34 #4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #5 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 6 (Thread 7708): #0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaaaf39631 in wait (this=0x223889c0) at src/syncobject.hh:57 #2 ExecutorThread::run (this=0x223889c0) at src/scheduler.cc:139 #3 0x00002aaaaaf39ebd in launch_executor_thread (arg=0x223889c0) at src/scheduler.cc:34 #4 0x000000386740677d in start_thread () from 
/lib64/libpthread.so.0 #5 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 5 (Thread 7709): #0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaaaf39631 in wait (this=0x22388820) at src/syncobject.hh:57 #2 ExecutorThread::run (this=0x22388820) at src/scheduler.cc:139 #3 0x00002aaaaaf39ebd in launch_executor_thread (arg=0x22388820) at src/scheduler.cc:34 #4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #5 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 4 (Thread 8582): #0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaaaf0fe0f in wait (this=0x2230e900) at src/syncobject.hh:57 #2 wait (this=0x2230e900) at src/syncobject.hh:73 #3 wait (this=0x2230e900) at src/tapconnmap.hh:169 #4 EventuallyPersistentEngine::notifyPendingConnections (this=0x2230e900) at src/ep_engine.cc:3379 #5 0x00002aaaaaf0fef3 in EvpNotifyPendingConns (arg=0x2230e900) at src/ep_engine.cc:1153 #6 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #7 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 3 (Thread 8583): #0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaaaef3688 in wait (this=0x222aa1b0, d=...) at src/syncobject.hh:57 #2 IdleTask::run (this=0x222aa1b0, d=...) at src/dispatcher.cc:342 #3 0x00002aaaaaef61ea in Dispatcher::run (this=0x22313c00) at src/dispatcher.cc:184 #4 0x00002aaaaaef69ad in launch_dispatcher_thread (arg=<value optimized out>) at src/dispatcher.cc:28 #5 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #6 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 2 (Thread 8584): #0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaaaef3688 in wait (this=0x222ab0e0, d=...) at src/syncobject.hh:57 #2 IdleTask::run (this=0x222ab0e0, d=...) 
at src/dispatcher.cc:342 #3 0x00002aaaaaef61ea in Dispatcher::run (this=0x22313500) at src/dispatcher.cc:184 #4 0x00002aaaaaef69ad in launch_dispatcher_thread (arg=<value optimized out>) at src/dispatcher.cc:28 #5 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #6 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 1 (Thread 0x49688940 (LWP 7705)): #0 0x0000003866c30285 in raise () from /lib64/libc.so.6 #1 0x0000003866c31d30 in abort () from /lib64/libc.so.6 ---Type <return> to continue, or q <return> to quit--- #2 0x00002aaaaaf36070 in Mutex::acquire (this=0x2230e828) at src/mutex.cc:83 #3 0x00002aaaaaf83fe8 in lock (this=0x2230e828, key="couch_bucket") at src/locks.hh:48 #4 LockHolder (this=0x2230e828, key="couch_bucket") at src/locks.hh:26 #5 Configuration::getString (this=0x2230e828, key="couch_bucket") at src/configuration.cc:38 #6 0x00002aaaaaf8e5eb in Configuration::getCouchBucket (this=0x2230e828) at src/generated_configuration.cc:71 #7 0x00002aaaaaf7c59e in CouchNotifier::selectBucket (this=0x22331000) at src/couch-kvstore/couch-notifier.cc:721 #8 0x00002aaaaaf7cc0f in CouchNotifier::processInput (this=0x22331000) at src/couch-kvstore/couch-notifier.cc:606 #9 0x00002aaaaaf7c199 in maybeProcessInput (this=0x22331000, rh=0x22385540) at src/couch-kvstore/couch-notifier.cc:546 #10 CouchNotifier::sendCommand (this=0x22331000, rh=0x22385540) at src/couch-kvstore/couch-notifier.cc:439 #11 0x00002aaaaaf7d478 in CouchNotifier::notify_update (this=0x22331000, vbs=..., file_version=1, header_offset=4096, cb=...) 
at src/couch-kvstore/couch-notifier.cc:774 #12 0x00002aaaaaf73183 in CouchKVStore::setVBucketState (this=0x26594600, vbucketId=935, vbstate=..., vb_change_type=1, newfile=96, notify=true) at src/couch-kvstore/couch-kvstore.cc:745 #13 0x00002aaaaaf74089 in CouchKVStore::snapshotVBuckets (this=0x26594600, vbstates=Traceback (most recent call last): File "/usr/share/gdb/python/libstdcxx/v6/printers.py", line 288, in children nodetype = gdb.lookup_type('std::_Rb_tree_node< std::pair< %s, %s > >' % (keytype, valuetype)) RuntimeError: No type named std::_Rb_tree_node< std::pair< const unsigned short, vbucket_state > >. std::map with 1 elements) at src/couch-kvstore/couch-kvstore.cc:596 #14 0x00002aaaaaefc2b3 in EventuallyPersistentStore::snapshotVBuckets (this=0x2a424400, priority=..., shardId=<value optimized out>) at src/ep.cc:760 #15 0x00002aaaaaf54bef in VBSnapshotTask::run (this=<value optimized out>) at src/tasks.cc:78 #16 0x00002aaaaaf397a0 in ExecutorThread::run (this=0x22388ea0) at src/scheduler.cc:153 #17 0x00002aaaaaf39ebd in launch_executor_thread (arg=0x22388ea0) at src/scheduler.cc:34 #18 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #19 0x0000003866cd325d in clone () from /lib64/libc.so.6 core file: root(couchbase)@10.3.3.30:/tmp/core.memcached.7524 |
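In the backtrace above, Threads 9, 10 and 11 are blocked in pthread_mutex_lock under Mutex::acquire (this=0x223310f0), while Thread 1 reaches abort() from Mutex::acquire (this=0x2230e828). With twenty threads in a flattened dump, this grouping is easy to miss. The following Python sketch shows one way to group threads by the mutex address seen in their Mutex::acquire frame. It assumes the text comes from gdb's "thread apply all bt" in the format shown above; the function name `threads_by_mutex` and the shortened sample are illustrative, not part of any Couchbase tool.

```python
import re
from collections import defaultdict

# Matches headers such as "Thread 19 (Thread 7532):" and
# "Thread 1 (Thread 0x49688940 (LWP 7705)):".
THREAD_HEADER = re.compile(r"Thread (\d+) \(Thread [^:]*\):")

# Matches the frame that names the mutex object being acquired.
MUTEX_FRAME = re.compile(r"Mutex::acquire \(this=(0x[0-9a-f]+)\)")


def threads_by_mutex(backtrace_text):
    """Map each mutex address to the gdb thread numbers inside Mutex::acquire on it."""
    # re.split with one capture group yields
    # [preamble, number1, body1, number2, body2, ...].
    parts = THREAD_HEADER.split(backtrace_text)
    waiters = defaultdict(list)
    for number, body in zip(parts[1::2], parts[2::2]):
        match = MUTEX_FRAME.search(body)
        if match:
            waiters[match.group(1)].append(int(number))
    return dict(waiters)


# Shortened excerpt built from frames that appear in the trace above.
sample = (
    "Thread 11 (Thread 7702): "
    "#2 0x0000003867408d4c in pthread_mutex_lock () from /lib64/libpthread.so.0 "
    "#3 0x00002aaaaaf3601a in Mutex::acquire (this=0x223310f0) at src/mutex.cc:79 "
    "Thread 8 (Thread 7706): "
    "#0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () "
    "from /lib64/libpthread.so.0 "
    "Thread 1 (Thread 0x49688940 (LWP 7705)): "
    "#1 0x0000003866c31d30 in abort () from /lib64/libc.so.6 "
    "#2 0x00002aaaaaf36070 in Mutex::acquire (this=0x2230e828) at src/mutex.cc:83 "
)

for address, threads in threads_by_mutex(sample).items():
    print(address, threads)
```

The sketch only reports which threads are inside Mutex::acquire for a given object. It does not determine which thread holds a mutex; that still has to be read from the frames below the lock call.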
| Comments |
| Comment by Andrei Baranouski [ 10/May/13 ] |
|
https://s3.amazonaws.com/bugdb/jira/MB-8243/10.3.3.224-5102013-1219-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-8243/10.3.3.30-5102013-1218-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-8243/10.3.3.32-5102013-1214-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-8243/10.3.3.33-5102013-1216-diag.zip |
| Comment by Maria McDuff [ 10/May/13 ] |
| bumping up to blocker. |
| Comment by Jin Lim [ 10/May/13 ] |
|
* A fix has been uploaded for code review, http://review.couchbase.org/#/c/26253/
* In the meantime, please try the toy build, 2.0.0-MRW33-toy at http://builds.hq.northscale.net:8010/builders/ec2-centos-x64_toy-couchstore-builder/builds/166 - this toy build has the same fix for validation
* Reassign it back to ep-engine (Jin) if the same symptom persists, thanks. |
| Comment by Andrei Baranouski [ 11/May/13 ] |
|
http://qa.hq.northscale.net/job/ubuntu-32-2.0-swaprebalance-test-P0/74/
Numerous identical memcached crashes. Core files stored on the servers:
10.3.2.153
-rw------- 1 couchbase couchbase 290811904 2013-05-10 15:19 core.memcached.993
10.3.2.155
-rw------- 1 couchbase couchbase 289701888 2013-05-10 14:50 core.memcached.11617
-rw------- 1 couchbase couchbase 289763328 2013-05-10 05:43 core.memcached.1511
-rw------- 1 couchbase couchbase 335925248 2013-05-10 06:33 core.memcached.1764
10.3.2.158
-rw------- 1 couchbase couchbase 445218816 2013-05-10 13:28 core.memcached.14070
-rw------- 1 couchbase couchbase 398946304 2013-05-10 14:50 core.memcached.15749
-rw------- 1 couchbase couchbase 280260608 2013-05-10 15:40 core.memcached.20753
-rw------- 1 couchbase couchbase 322297856 2013-05-10 06:33 core.memcached.2353
-rw------- 1 couchbase couchbase 441024512 2013-05-10 04:16 core.memcached.28046
-rw------- 1 couchbase couchbase 398950400 2013-05-10 05:43 core.memcached.29729
10.3.2.154
-rw------- 1 couchbase couchbase 225599488 2013-05-10 10:00 core.memcached.15916
-rw------- 1 couchbase couchbase 584019968 2013-05-09 02:05 core.memcached.22381
10.3.2.156
-rw------- 1 couchbase couchbase 506163200 2013-05-10 07:51 core.memcached.20281
-rw------- 1 couchbase couchbase 423190528 2013-04-26 11:28 core.memcached.20425
-rw------- 1 couchbase couchbase 279216128 2013-05-10 15:47 core.memcached.9400
-rw------- 1 couchbase couchbase 439971840 2013-05-10 04:16 core.memcached.9825
10.3.2.157
-rw------- 1 couchbase couchbase 401043456 2013-05-10 14:49 core.memcached.13490
-rw------- 1 couchbase couchbase 490438656 2013-05-10 15:39 core.memcached.18141
-rw------- 1 couchbase couchbase 398950400 2013-05-10 05:42 core.memcached.21944
-rw------- 1 couchbase couchbase 447315968 2013-05-10 06:54 core.memcached.26362
-rw------- 1 couchbase couchbase 299196416 2013-04-27 03:26 core.memcached.7099 |
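A listing like the one in this comment can be tallied per host to see where the crashes concentrate. The following Python sketch assumes the listing is available with one entry per line, as `ls -l` prints it, with each group of files preceded by a line holding only the host IP. The function name `tally_cores` is hypothetical, and the sample covers just the first two hosts.

```python
import re
from collections import Counter

# A line that holds only an IPv4 address starts a new host section.
HOST_LINE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")


def tally_cores(listing):
    """Count memcached core files and sum their sizes per host."""
    counts = Counter()
    sizes = Counter()
    host = None
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) == 1 and HOST_LINE.match(fields[0]):
            host = fields[0]
        elif (host is not None and len(fields) == 8
              and fields[-1].startswith("core.memcached.")):
            # ls -l fields: mode, links, owner, group, size, date, time, name
            counts[host] += 1
            sizes[host] += int(fields[4])
    return counts, sizes


sample = """10.3.2.153
-rw------- 1 couchbase couchbase 290811904 2013-05-10 15:19 core.memcached.993
10.3.2.155
-rw------- 1 couchbase couchbase 289701888 2013-05-10 14:50 core.memcached.11617
-rw------- 1 couchbase couchbase 289763328 2013-05-10 05:43 core.memcached.1511
-rw------- 1 couchbase couchbase 335925248 2013-05-10 06:33 core.memcached.1764
"""

counts, sizes = tally_cores(sample)
for host in counts:
    print(host, counts[host], "core files,", sizes[host], "bytes")
```

The eight-field check matches the date format in this listing (ISO date plus time). A listing with a different date layout would need a different field count.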
| Comment by Andrei Baranouski [ 11/May/13 ] |
|
saw it on ubuntu32 http://qa.hq.northscale.net/job/ubuntu-32-2.0-swaprebalance-test-P0/74/ launched the toy build with the same suite but on centos64 http://qa.hq.northscale.net/job/centos-64-2.0-basic-rebalance-tests-P0/443/console |
| Comment by Andrei Baranouski [ 11/May/13 ] |
|
rebalance hangs as in no crashes for now
test to reproduce:
./testrunner -i /tmp/rebalance-tests.ini get-cbcollect-info=True,GROUP=P0 -t swaprebalance.SwapRebalanceFailedTests.test_add_back_failed_node,replica=1,num-buckets=1,num-swap=3,GROUP=P0
ini file for centos-64-2.0-basic-rebalance-tests-P0 job
[global]
port:8091
[servers]
1:vm1
2:vm2
3:vm3
4:vm4
5:vm5
6:vm6
#7:vm7
[vm1]
ip:10.5.2.13
username:jenkins
ssh_key:/home/couchbase/QAkey.pem
[vm2]
ip:10.5.2.14
username:jenkins
ssh_key:/home/couchbase/QAkey.pem
[vm3]
ip:10.5.2.15
username:jenkins
ssh_key:/home/couchbase/QAkey.pem
[vm4]
ip:10.3.121.63
username:root
password:couchbase
[vm5]
ip:10.3.121.64
username:root
password:couchbase
[vm6]
ip:10.3.121.66
username:root
password:couchbase
#[vm7]
#ip:10.3.121.69
#username:root
#password:couchbase
[membase]
rest_username:Administrator
rest_password:password |
| Comment by Maria McDuff [ 13/May/13 ] |
|
Andrei, with this toy build 33, is memcached crashing at all? Or are you seeing the same hang as |
| Comment by Jin Lim [ 13/May/13 ] |
|
Please see the latest update in Thanks much for your help and time! Jin |
| Comment by Andrei Baranouski [ 13/May/13 ] |
|
We still don't have a build with the latest change (http://review.couchbase.org/#/c/26253/). The latest one is http://builds.hq.northscale.net/latestbuilds/couchbase-server-community_x86_2.0.2-799-rel.deb.manifest.xml, whose manifest lists: <project name="ep-engine" path="ep-engine" revision="e657fe4789a4a8be3ef145d602548278b48ad3de"/> |
| Comment by Maria McDuff [ 13/May/13 ] |
|
Andrei, use build 800. It's ready. Thanks. |
| Comment by Andrei Baranouski [ 15/May/13 ] |
|
801 still has |
| Comment by Jin Lim [ 15/May/13 ] |
|
EINVAL The mutex was created with the protocol attribute having the value PTHREAD_PRIO_PROTECT and the calling thread's priority is higher than the mutex's current priority ceiling. |
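The EINVAL text quoted above is one of the error conditions documented for `pthread_mutex_lock`. As a small aid when such codes turn up as bare numbers in logs, the sketch below maps an errno value to its symbolic name and the platform's message using Python's standard `errno` and `os` modules. The helper name `describe_errno` is hypothetical, and the message text varies by platform.

```python
import errno
import os


def describe_errno(code):
    """Format an errno value as 'NAME (number): message'."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return "%s (%d): %s" % (name, code, os.strerror(code))


print(describe_errno(errno.EINVAL))
```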
| Comment by Jin Lim [ 15/May/13 ] |
| To double-check, please provide the core dump from this getCouchBucket() call. Thanks. |
| Comment by Andrei Baranouski [ 16/May/13 ] |
|
see many crashes on 2.0.2-803 http://qa.hq.northscale.net/job/centos-64-2.0-basic-rebalance-tests-P0/445/consoleFull python scripts/ssh.py -i centos-64-2.0-basic-rebalance-tests-P0.ini "ls -la /tmp/" 10.3.121.64 total 650768 drwxrwxrwt 5 couchbase couchbase 4096 May 15 16:43 . drwxr-xr-x 24 root root 4096 Apr 30 15:29 .. -rw------- 1 couchbase couchbase 316329984 May 15 09:26 core.memcached.12720 -rw------- 1 couchbase couchbase 279449600 May 15 10:19 core.memcached.12989 -rw------- 1 couchbase couchbase 333119488 May 15 10:27 core.memcached.15922 -rw------- 1 couchbase couchbase 310018048 May 15 10:30 core.memcached.16218 -rw------- 1 couchbase couchbase 327880704 May 15 10:54 core.memcached.16435 -rw------- 1 couchbase couchbase 282595328 May 12 06:36 core.memcached.23225 -rw-r--r-- 1 root root 128307838 May 15 00:28 couchbase-server-enterprise_x86_64_2.0.2-803-rel.rpm drwxrwxrwt 2 root root 4096 Apr 30 15:43 .font-unix drwxrwxrwt 2 root root 4096 Apr 30 15:29 .ICE-unix drwxrwxr-x 3 1000 1000 4096 May 9 04:02 measure-sched-delays 10.3.121.69 total 179824 drwxrwxrwt 9 couchbase couchbase 4096 May 16 00:30 . drwxr-xr-x 25 root root 4096 Apr 30 15:30 .. drwxr-xr-x 3 1000 1000 4096 Apr 25 20:06 automake-1.11.1 drwxr-xr-x 2 root root 4096 May 15 16:47 backup -rw-r--r-- 1 jenkins jenkins 55587177 Jul 18 2012 couchbase-server-enterprise_x86_64_1.8.0r-55-g80f24f2.rpm -rw-r--r-- 1 root root 128307838 May 15 00:28 couchbase-server-enterprise_x86_64_2.0.2-803-rel.rpm drwxrwxrwt 2 root root 4096 Apr 30 15:45 .font-unix drwxrwxrwt 2 root root 4096 Apr 30 15:30 .ICE-unix drwxrwxr-x 2 501 staff 4096 Apr 25 20:06 libtool-2.4.2 drwxrwxr-x 3 1000 1000 4096 May 9 04:02 measure-sched-delays drwxr-xr-x 3 root root 4096 Apr 25 20:06 s3cmd 10.3.121.66 total 179812 drwxrwxrwt 6 couchbase couchbase 4096 May 16 00:30 . drwxr-xr-x 24 root root 4096 Apr 30 15:30 .. 
drwxr-xr-x 2 root root 4096 May 15 16:46 backup -rw-r--r-- 1 jenkins jenkins 55587177 Jul 18 2012 couchbase-server-enterprise_x86_64_1.8.0r-55-g80f24f2.rpm -rw-r--r-- 1 root root 128307838 May 15 00:28 couchbase-server-enterprise_x86_64_2.0.2-803-rel.rpm drwxrwxrwt 2 root root 4096 Apr 30 15:45 .font-unix drwxrwxrwt 2 root root 4096 Apr 30 15:30 .ICE-unix drwxrwxr-x 3 1000 1000 4096 May 9 04:02 measure-sched-delays 10.3.121.63 total 757504 drwxrwxrwt 5 couchbase couchbase 4096 May 15 16:42 . drwxr-xr-x 24 root root 4096 Apr 30 15:29 .. -rw------- 1 couchbase couchbase 490033152 May 15 10:19 core.memcached.1355 -rw------- 1 couchbase couchbase 401608704 May 15 10:30 core.memcached.5139 -rw------- 1 couchbase couchbase 303599616 May 15 10:42 core.memcached.6619 -rw------- 1 couchbase couchbase 326811648 May 15 10:54 core.memcached.6701 -rw-r--r-- 1 root root 128307838 May 15 00:28 couchbase-server-enterprise_x86_64_2.0.2-803-rel.rpm drwxrwxrwt 2 root root 4096 Apr 30 15:44 .font-unix drwxrwxrwt 2 root root 4096 Apr 30 15:29 .ICE-unix drwxrwxr-x 3 1000 1000 4096 May 15 00:15 measure-sched-delays 10.5.2.13 total 54400 drwxrwxrwt 6 couchbase couchbase 20480 May 16 00:30 . drwxr-xr-x 25 root root 4096 Mar 5 18:31 .. drwxrwxrwt 2 root root 4096 Mar 5 18:31 .ICE-unix -r--r--r-- 1 root root 11 Mar 5 18:35 .X0-lock drwxrwxrwt 2 root root 4096 Mar 5 18:35 .X11-unix drwxrwxrwt 2 root root 4096 Mar 5 18:35 .font-unix srw-rw-rw- 1 root root 0 Mar 5 18:35 .gdm_socket -rw-r--r-- 1 jenkins jenkins 55587177 Jul 18 2012 couchbase-server-enterprise_x86_64_1.8.0r-55-g80f24f2.rpm drwxr-xr-x 3 jenkins jenkins 4096 May 9 04:01 measure-sched-delays 10.5.2.15 total 1776236 drwxrwxrwt 7 membase membase 24576 May 16 00:30 . drwxr-xr-x 24 root root 4096 Mar 5 18:30 .. 
drwxrwxrwt 2 root root 4096 Mar 5 18:30 .ICE-unix -r--r--r-- 1 root root 11 Mar 5 18:33 .X0-lock drwxrwxrwt 2 root root 4096 Mar 5 18:33 .X11-unix drwxrwxrwt 2 root root 4096 Mar 5 18:33 .font-unix srw-rw-rw- 1 root root 0 Mar 5 18:33 .gdm_socket drwxr-xr-x 2 root root 4096 May 15 16:46 backup -rw-r--r-- 1 root root 5098 May 15 16:46 core-10.5.2.15-0.log -rw-r--r-- 1 root root 5386 May 15 16:46 core-10.5.2.15-1.log -rw-r--r-- 1 root root 5098 May 15 16:46 core-10.5.2.15-2.log -rw-r--r-- 1 root root 4810 May 15 16:46 core-10.5.2.15-3.log -rw-r--r-- 1 root root 4810 May 15 16:46 core-10.5.2.15-4.log -rw-r--r-- 1 root root 4810 May 15 16:46 core-10.5.2.15-5.log -rw-r--r-- 1 root root 5098 May 15 16:46 core-10.5.2.15-6.log -rw-r--r-- 1 root root 4522 May 15 16:46 core-10.5.2.15-7.log -rw-r--r-- 1 root root 5098 May 15 16:46 core-10.5.2.15-8.log -rw------- 1 couchbase couchbase 647872512 May 12 06:43 core.memcached.14945 -rw------- 1 couchbase couchbase 336265216 May 15 09:15 core.memcached.16949 -rw------- 1 couchbase couchbase 690114560 May 15 09:33 core.memcached.17634 -rw------- 1 couchbase couchbase 278417408 May 15 09:48 core.memcached.19427 -rw------- 1 couchbase couchbase 278417408 May 15 10:14 core.memcached.20148 -rw------- 1 couchbase couchbase 497389568 May 15 10:27 core.memcached.21502 -rw------- 1 couchbase couchbase 311087104 May 15 10:30 core.memcached.22944 -rw------- 1 couchbase couchbase 319340544 May 15 10:42 core.memcached.23156 -rw------- 1 couchbase couchbase 375361536 May 15 10:54 core.memcached.23256 -rw-r--r-- 1 jenkins jenkins 55587177 Jul 18 2012 couchbase-server-enterprise_x86_64_1.8.0r-55-g80f24f2.rpm drwxr-xr-x 3 jenkins jenkins 4096 May 9 04:01 measure-sched-delays 10.5.2.14 total 1411284 drwxrwxrwt 7 couchbase couchbase 24576 May 16 00:30 . drwxr-xr-x 25 root root 4096 Mar 5 18:31 .. 
drwxrwxrwt 2 root root 4096 Mar 5 18:31 .ICE-unix -r--r--r-- 1 root root 11 Mar 5 18:35 .X0-lock drwxrwxrwt 2 root root 4096 Mar 5 18:35 .X11-unix drwxrwxrwt 2 root root 4096 Mar 5 18:35 .font-unix srw-rw-rw- 1 root root 0 Mar 5 18:35 .gdm_socket drwxr-xr-x 2 root root 4096 May 15 16:46 backup -rw-r--r-- 1 root root 5386 May 15 16:46 core-10.5.2.14-0.log -rw-r--r-- 1 root root 5386 May 15 16:46 core-10.5.2.14-1.log -rw-r--r-- 1 root root 5386 May 15 16:46 core-10.5.2.14-2.log -rw-r--r-- 1 root root 4810 May 15 16:46 core-10.5.2.14-3.log -rw-r--r-- 1 root root 5098 May 15 16:46 core-10.5.2.14-4.log -rw-r--r-- 1 root root 4810 May 15 16:46 core-10.5.2.14-5.log -rw-r--r-- 1 root root 5386 May 15 16:46 core-10.5.2.14-6.log -rw-r--r-- 1 root root 5098 May 15 16:46 core-10.5.2.14-7.log -rw-r--r-- 1 root root 5098 May 15 16:46 core-10.5.2.14-8.log -rw------- 1 couchbase couchbase 336265216 May 15 09:15 core.memcached.15346 -rw------- 1 couchbase couchbase 689065984 May 15 09:33 core.memcached.16002 -rw------- 1 couchbase couchbase 278413312 May 15 09:48 core.memcached.17262 -rw------- 1 couchbase couchbase 278401024 May 15 10:06 core.memcached.17992 -rw------- 1 couchbase couchbase 278413312 May 15 10:14 core.memcached.18717 -rw------- 1 couchbase couchbase 493174784 May 15 10:27 core.memcached.19403 -rw------- 1 couchbase couchbase 310022144 May 15 10:30 core.memcached.20835 -rw------- 1 couchbase couchbase 318291968 May 15 10:42 core.memcached.21046 -rw------- 1 couchbase couchbase 374312960 May 15 10:54 core.memcached.21154 -rw-r--r-- 1 jenkins jenkins 55587177 Jul 18 2012 couchbase-server-enterprise_x86_64_1.8.0r-55-g80f24f2.rpm drwxr-xr-x 3 jenkins jenkins 4096 May 9 04:01 measure-sched-delays gdb /opt/couchbase/bin/memcached core.memcached.12720 GNU gdb (GDB) CentOS (7.0.1-45.el5.centos) Copyright (C) 2009 Free Software Foundation, Inc. 
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /opt/couchbase/bin/memcached...done. [New Thread 12915] [New Thread 12988] [New Thread 12987] [New Thread 12986] [New Thread 12920] [New Thread 12919] [New Thread 12918] [New Thread 12917] [New Thread 12916] [New Thread 12914] [New Thread 12913] [New Thread 12912] [New Thread 12735] [New Thread 12734] [New Thread 12733] [New Thread 12732] [New Thread 12731] [New Thread 12730] [New Thread 12729] [New Thread 12720] Reading symbols from /opt/couchbase/lib/memcached/libmemcached_utilities.so.0...done. Loaded symbols for /opt/couchbase/lib/memcached/libmemcached_utilities.so.0 Reading symbols from /opt/couchbase/lib/libevent-2.0.so.5...done. Loaded symbols for /opt/couchbase/lib/libevent-2.0.so.5 Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/libdl.so.2 Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done. Loaded symbols for /lib64/libm.so.6 Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done. Loaded symbols for /lib64/librt.so.1 Reading symbols from /opt/couchbase/lib/libtcmalloc_minimal.so.4...done. Loaded symbols for /opt/couchbase/lib/libtcmalloc_minimal.so.4 Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done. [Thread debugging using libthread_db enabled] Loaded symbols for /lib64/libpthread.so.0 Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done. Loaded symbols for /lib64/libc.so.6 Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done. 
Loaded symbols for /lib64/ld-linux-x86-64.so.2 Reading symbols from /usr/lib64/libstdc++.so.6...(no debugging symbols found)...done. Loaded symbols for /usr/lib64/libstdc++.so.6 Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done. Loaded symbols for /lib64/libgcc_s.so.1 Reading symbols from /opt/couchbase/lib/memcached/stdin_term_handler.so...done. Loaded symbols for /opt/couchbase/lib/memcached/stdin_term_handler.so Reading symbols from /opt/couchbase/lib/memcached/file_logger.so...done. Loaded symbols for /opt/couchbase/lib/memcached/file_logger.so Reading symbols from /opt/couchbase/lib/memcached/bucket_engine.so...done. Loaded symbols for /opt/couchbase/lib/memcached/bucket_engine.so Reading symbols from /opt/couchbase/lib/memcached/ep.so...done. Loaded symbols for /opt/couchbase/lib/memcached/ep.so Reading symbols from /opt/couchbase/lib/libcouchstore.so.1...done. Loaded symbols for /opt/couchbase/lib/libcouchstore.so.1 Reading symbols from /opt/couchbase/lib/libsnappy.so.1...done. Loaded symbols for /opt/couchbase/lib/libsnappy.so.1 Reading symbols from /opt/couchbase/lib/libicuuc.so.44...done. Loaded symbols for /opt/couchbase/lib/libicuuc.so.44 Reading symbols from /opt/couchbase/lib/libicudata.so.44...(no debugging symbols found)...done. Loaded symbols for /opt/couchbase/lib/libicudata.so.44 Reading symbols from /opt/couchbase/lib/libicui18n.so.44...done. Loaded symbols for /opt/couchbase/lib/libicui18n.so.44 warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff55dfd000 Core was generated by `/opt/couchbase/bin/memcached -X /opt/couchbase/lib/memcached/stdin_term_handler'. Program terminated with signal 6, Aborted. 
#0 0x0000003866c30285 in raise () from /lib64/libc.so.6 (gdb) t a a bt Thread 20 (Thread 0x2b5633c46220 (LWP 12720)): #0 0x0000003866cd3648 in epoll_wait () from /lib64/libc.so.6 #1 0x00002b56337ca576 in epoll_dispatch (base=0x6b4a000, tv=<value optimized out>) at epoll.c:404 #2 0x00002b56337b5e44 in event_base_loop (base=0x6b4a000, flags=<value optimized out>) at event.c:1558 #3 0x00000000004097d6 in main (argc=<value optimized out>, argv=<value optimized out>) at daemon/memcached.c:7926 Thread 19 (Thread 12729): #0 0x0000003866cc545b in read () from /lib64/libc.so.6 #1 0x0000003866c6b677 in _IO_new_file_underflow () from /lib64/libc.so.6 #2 0x0000003866c6c03e in _IO_default_uflow_internal () from /lib64/libc.so.6 #3 0x0000003866c61124 in _IO_getline_info_internal () from /lib64/libc.so.6 #4 0x0000003866c5ffc9 in fgets () from /lib64/libc.so.6 #5 0x00002b5633c47939 in check_stdin_thread (arg=<value optimized out>) at extensions/daemon/stdin_check.c:37 #6 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #7 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 18 (Thread 12730): #0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaaaaae4d6 in logger_thead_main (arg=0x22e6040) at extensions/loggers/file_logger.c:368 #2 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #3 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 17 (Thread 12731): #0 0x0000003866cd3648 in epoll_wait () from /lib64/libc.so.6 #1 0x00002b56337ca576 in epoll_dispatch (base=0x6b4a500, tv=<value optimized out>) at epoll.c:404 #2 0x00002b56337b5e44 in event_base_loop (base=0x6b4a500, flags=<value optimized out>) at event.c:1558 #3 0x0000000000414604 in worker_libevent (arg=0x22e9900) at daemon/thread.c:301 #4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #5 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 16 (Thread 12732): #0 0x0000003866cd3648 in epoll_wait () from 
/lib64/libc.so.6 #1 0x00002b56337ca576 in epoll_dispatch (base=0x6b4a280, tv=<value optimized out>) at epoll.c:404 #2 0x00002b56337b5e44 in event_base_loop (base=0x6b4a280, flags=<value optimized out>) at event.c:1558 #3 0x0000000000414604 in worker_libevent (arg=0x22e99f8) at daemon/thread.c:301 #4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #5 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 15 (Thread 12733): #0 0x0000003866cd3648 in epoll_wait () from /lib64/libc.so.6 #1 0x00002b56337ca576 in epoll_dispatch (base=0x6b4ac80, tv=<value optimized out>) at epoll.c:404 #2 0x00002b56337b5e44 in event_base_loop (base=0x6b4ac80, flags=<value optimized out>) at event.c:1558 #3 0x0000000000414604 in worker_libevent (arg=0x22e9af0) at daemon/thread.c:301 #4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #5 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 14 (Thread 12734): #0 0x0000003866cd3648 in epoll_wait () from /lib64/libc.so.6 #1 0x00002b56337ca576 in epoll_dispatch (base=0x6b4aa00, tv=<value optimized out>) at epoll.c:404 #2 0x00002b56337b5e44 in event_base_loop (base=0x6b4aa00, flags=<value optimized out>) at event.c:1558 #3 0x0000000000414604 in worker_libevent (arg=0x22e9be8) at daemon/thread.c:301 #4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #5 0x0000003866cd325d in clone () from /lib64/libc.so.6 ---Type <return> to continue, or q <return> to quit--- Thread 13 (Thread 12735): #0 0x0000003866cd3648 in epoll_wait () from /lib64/libc.so.6 #1 0x00002b56337ca576 in epoll_dispatch (base=0x6b4a780, tv=<value optimized out>) at epoll.c:404 #2 0x00002b56337b5e44 in event_base_loop (base=0x6b4a780, flags=<value optimized out>) at event.c:1558 #3 0x0000000000414604 in worker_libevent (arg=0x22e9ce0) at daemon/thread.c:301 #4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #5 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 12 (Thread 12912): #0 
0x0000003866c99221 in nanosleep () from /lib64/libc.so.6 #1 0x0000003866cccba4 in usleep () from /lib64/libc.so.6 #2 0x00002aaaaaf351a5 in updateStatsThread (arg=0x22e64c0) at src/memory_tracker.cc:31 #3 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #4 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 11 (Thread 12913): #0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaaaf399f1 in wait (this=0x6bf5ba0) at src/syncobject.hh:57 #2 ExecutorThread::run (this=0x6bf5ba0) at src/scheduler.cc:139 #3 0x00002aaaaaf3a27d in launch_executor_thread (arg=0x6bf5ba0) at src/scheduler.cc:34 #4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #5 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 10 (Thread 12914): #0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaaaf399f1 in wait (this=0x6bf5a00) at src/syncobject.hh:57 #2 ExecutorThread::run (this=0x6bf5a00) at src/scheduler.cc:139 #3 0x00002aaaaaf3a27d in launch_executor_thread (arg=0x6bf5a00) at src/scheduler.cc:34 #4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #5 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 9 (Thread 12916): #0 0x0000003866ccc767 in fdatasync () from /lib64/libc.so.6 #1 0x00002aaaab1da67f in couch_sync (handle=<value optimized out>) at src/os.c:117 #2 0x00002aaaaaf7a25f in cfs_sync (h=0x89cac40) at src/couch-kvstore/couch-fs-stats.cc:88 #3 0x00002aaaab1d475f in couchstore_commit (db=0x6b51ce0) at src/couch_db.c:193 #4 0x00002aaaaaf73d46 in CouchKVStore::setVBucketState (this=0x8908600, vbucketId=323, vbstate=..., vb_change_type=1, newfile=96, notify=true) at src/couch-kvstore/couch-kvstore.cc:728 #5 0x00002aaaaaf74b69 in CouchKVStore::snapshotVBuckets (this=0x8908600, vbstates=Traceback (most recent call last): File "/usr/share/gdb/python/libstdcxx/v6/printers.py", line 288, in children nodetype = 
gdb.lookup_type('std::_Rb_tree_node< std::pair< %s, %s > >' % (keytype, valuetype)) RuntimeError: No type named std::_Rb_tree_node< std::pair< const unsigned short, vbucket_state > >. std::map with 1 elements) at src/couch-kvstore/couch-kvstore.cc:596 #6 0x00002aaaaaefc3f3 in EventuallyPersistentStore::snapshotVBuckets (this=0x8d4a800, priority=..., shardId=<value optimized out>) at src/ep.cc:760 #7 0x00002aaaaaf54daf in VBSnapshotTask::run (this=<value optimized out>) at src/tasks.cc:78 #8 0x00002aaaaaf39b61 in ExecutorThread::run (this=0x6c1cea0) at src/scheduler.cc:153 #9 0x00002aaaaaf3a27d in launch_executor_thread (arg=0x6c1cea0) at src/scheduler.cc:34 #10 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #11 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 8 (Thread 12917): #0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaaaf399f1 in wait (this=0x6c1cd00) at src/syncobject.hh:57 #2 ExecutorThread::run (this=0x6c1cd00) at src/scheduler.cc:139 #3 0x00002aaaaaf3a27d in launch_executor_thread (arg=0x6c1cd00) at src/scheduler.cc:34 #4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 ---Type <return> to continue, or q <return> to quit--- #5 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 7 (Thread 12918): #0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaaaf399f1 in wait (this=0x6c1cb60) at src/syncobject.hh:57 #2 ExecutorThread::run (this=0x6c1cb60) at src/scheduler.cc:139 #3 0x00002aaaaaf3a27d in launch_executor_thread (arg=0x6c1cb60) at src/scheduler.cc:34 #4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #5 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 6 (Thread 12919): #0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaaaf399f1 in wait (this=0x6c1c9c0) at src/syncobject.hh:57 #2 ExecutorThread::run 
(this=0x6c1c9c0) at src/scheduler.cc:139 #3 0x00002aaaaaf3a27d in launch_executor_thread (arg=0x6c1c9c0) at src/scheduler.cc:34 #4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #5 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 5 (Thread 12920): #0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaaaf399f1 in wait (this=0x6c1c820) at src/syncobject.hh:57 #2 ExecutorThread::run (this=0x6c1c820) at src/scheduler.cc:139 #3 0x00002aaaaaf3a27d in launch_executor_thread (arg=0x6c1c820) at src/scheduler.cc:34 #4 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #5 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 4 (Thread 12986): #0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaaaf101df in wait (this=0x8d1ad00) at src/syncobject.hh:57 #2 wait (this=0x8d1ad00) at src/syncobject.hh:73 #3 wait (this=0x8d1ad00) at src/tapconnmap.hh:169 #4 EventuallyPersistentEngine::notifyPendingConnections (this=0x8d1ad00) at src/ep_engine.cc:3377 #5 0x00002aaaaaf102c3 in EvpNotifyPendingConns (arg=0x8d1ad00) at src/ep_engine.cc:1153 #6 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #7 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 3 (Thread 12987): #0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaaaef37c8 in wait (this=0x71f2fc0, d=...) at src/syncobject.hh:57 #2 IdleTask::run (this=0x71f2fc0, d=...) 
at src/dispatcher.cc:342 #3 0x00002aaaaaef632a in Dispatcher::run (this=0x7e73c00) at src/dispatcher.cc:184 #4 0x00002aaaaaef6aed in launch_dispatcher_thread (arg=<value optimized out>) at src/dispatcher.cc:28 #5 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #6 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 2 (Thread 12988): #0 0x000000386740b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaaaef37c8 in wait (this=0x71f2ab0, d=...) at src/syncobject.hh:57 #2 IdleTask::run (this=0x71f2ab0, d=...) at src/dispatcher.cc:342 #3 0x00002aaaaaef632a in Dispatcher::run (this=0x7e73dc0) at src/dispatcher.cc:184 #4 0x00002aaaaaef6aed in launch_dispatcher_thread (arg=<value optimized out>) at src/dispatcher.cc:28 #5 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #6 0x0000003866cd325d in clone () from /lib64/libc.so.6 Thread 1 (Thread 0x48664940 (LWP 12915)): ---Type <return> to continue, or q <return> to quit--- #0 0x0000003866c30285 in raise () from /lib64/libc.so.6 #1 0x0000003866c31d30 in abort () from /lib64/libc.so.6 #2 0x00002aaaaaf360f0 in Mutex::acquire (this=0x6ba2828) at src/mutex.cc:83 #3 0x00002aaaaaf84a71 in lock (this=<value optimized out>, key="couch_port") at src/locks.hh:48 #4 LockHolder (this=<value optimized out>, key="couch_port") at src/locks.hh:26 #5 Configuration::getInteger (this=<value optimized out>, key="couch_port") at src/configuration.cc:77 #6 0x00002aaaaaf8da15 in Configuration::getCouchPort (this=0x6ba2828) at src/generated_configuration.cc:83 #7 0x00002aaaaaf7c056 in CouchNotifier::ensureConnection (this=0x6bc5000) at src/couch-kvstore/couch-notifier.cc:317 #8 0x00002aaaaaf7cc91 in CouchNotifier::sendCommand (this=0x6bc5000, rh=0x8f81460) at src/couch-kvstore/couch-notifier.cc:437 #9 0x00002aaaaaf7d1b4 in CouchNotifier::selectBucket (this=0x6bc5000) at src/couch-kvstore/couch-notifier.cc:739 #10 0x00002aaaaaf7d6cf in CouchNotifier::processInput 
(this=0x6bc5000) at src/couch-kvstore/couch-notifier.cc:606 #11 0x00002aaaaaf7cce9 in maybeProcessInput (this=0x6bc5000, rh=0x8f81400) at src/couch-kvstore/couch-notifier.cc:546 #12 CouchNotifier::sendCommand (this=0x6bc5000, rh=0x8f81400) at src/couch-kvstore/couch-notifier.cc:439 #13 0x00002aaaaaf7df58 in CouchNotifier::notify_update (this=0x6bc5000, vbs=..., file_version=1, header_offset=4096, cb=...) at src/couch-kvstore/couch-notifier.cc:772 #14 0x00002aaaaaf73c63 in CouchKVStore::setVBucketState (this=0x8ff8f00, vbucketId=682, vbstate=..., vb_change_type=1, newfile=96, notify=true) at src/couch-kvstore/couch-kvstore.cc:745 #15 0x00002aaaaaf74b69 in CouchKVStore::snapshotVBuckets (this=0x8ff8f00, vbstates=Traceback (most recent call last): File "/usr/share/gdb/python/libstdcxx/v6/printers.py", line 288, in children nodetype = gdb.lookup_type('std::_Rb_tree_node< std::pair< %s, %s > >' % (keytype, valuetype)) RuntimeError: No type named std::_Rb_tree_node< std::pair< const unsigned short, vbucket_state > >. 
std::map with 1 elements) at src/couch-kvstore/couch-kvstore.cc:596 #16 0x00002aaaaaefc3f3 in EventuallyPersistentStore::snapshotVBuckets (this=0x8d4a800, priority=..., shardId=<value optimized out>) at src/ep.cc:760 #17 0x00002aaaaaf54daf in VBSnapshotTask::run (this=<value optimized out>) at src/tasks.cc:78 #18 0x00002aaaaaf39b61 in ExecutorThread::run (this=0x6bf5860) at src/scheduler.cc:153 #19 0x00002aaaaaf3a27d in launch_executor_thread (arg=0x6bf5860) at src/scheduler.cc:34 #20 0x000000386740677d in start_thread () from /lib64/libpthread.so.0 #21 0x0000003866cd325d in clone () from /lib64/libc.so.6 vms: ip:10.5.2.13 username:jenkins ssh_key:QAkey.pem [vm2] ip:10.5.2.14 username:jenkins ssh_key:QAkey.pem [vm3] ip:10.5.2.15 username:jenkins ssh_key:QAkey.pem [vm4] ip:10.3.121.63 username:root password:couchbase [vm5] ip:10.3.121.64 username:root password:couchbase [vm6] ip:10.3.121.66 username:root password:couchbase [vm7] ip:10.3.121.69 username:root password:couchbase |
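When a backtrace is pasted as one long run of text, as above, listing the function in each frame makes the crashing path easier to read (here: `raise`, `abort`, `Mutex::acquire`, and so on in Thread 1). The following is a minimal sketch, assuming frames follow gdb's usual `#N 0xADDR in FUNC (...)` layout; the helper name `frame_functions` is hypothetical, and inlined frames that gdb prints without an address are not matched.

```python
import re

# Matches frames such as
# "#2 0x00002aaaaaf360f0 in Mutex::acquire (this=0x6ba2828) at src/mutex.cc:83".
FRAME_RE = re.compile(r"#(\d+)\s+0x[0-9a-f]+ in (\S+) \(")

# First three frames of Thread 1 from the backtrace in this comment.
SAMPLE = (
    "Thread 1 (Thread 0x48664940 (LWP 12915)): "
    "#0 0x0000003866c30285 in raise () from /lib64/libc.so.6 "
    "#1 0x0000003866c31d30 in abort () from /lib64/libc.so.6 "
    "#2 0x00002aaaaaf360f0 in Mutex::acquire (this=0x6ba2828) at src/mutex.cc:83"
)


def frame_functions(backtrace):
    """Return (frame number, function name) pairs found in gdb backtrace text."""
    return [(int(num), func) for num, func in FRAME_RE.findall(backtrace)]


for num, func in frame_functions(SAMPLE):
    print(num, func)
```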
| Comment by Andrei Baranouski [ 16/May/13 ] |
|
seems like it was fixed in 2.0.0-MRW37-toy http://qa.hq.northscale.net/job/CouchbaseServer-SanityTest-4Node-Centos64/77/consoleFull |
| Comment by Andrei Baranouski [ 16/May/13 ] |
| Please assign back to me when we get a build with the corresponding commit. |
| Comment by Jin Lim [ 17/May/13 ] |
| Build 806 no longer runs into this issue. Please confirm and close the bug. Thanks. |
[MB-7938] 2.0.2 memcached crashes in EventuallyPersistentStore::flushVBucket Created: 19/Mar/13 Updated: 17/May/13 Resolved: 17/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | couchbase-bucket |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Andrei Baranouski | Assignee: | Andrei Baranouski |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | version=2.0.2-741-rel, Ubuntu 11.04 | ||
| Description |
|
http://qa.hq.northscale.net/job/centos-64-2.0-rebalance-regressions/196/consoleFull
./testrunner -i /tmp/rebalance_regression.ini wait_timeout=100,get-cbcollect-info=True -t swaprebalance.SwapRebalanceFailedTests.test_failover_swap_rebalance,replica=2,num-buckets=2,num-swap=2,keys-count=1000000,swap-orchestrator=True test logs: [2013-03-18 08:35:42,154] - [data_helper:289] INFO - creating direct client 10.3.121.94:11210 bucket-1 [2013-03-18 08:35:42,432] - [rest_client:913] ERROR - {u'status': u'none', u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try rebalance again.'} - rebalance failed [2013-03-18 08:35:42,432] - [rest_client:914] INFO - Latest logs from UI: [2013-03-18 08:35:42,565] - [rest_client:915] ERROR - {u'node': u'ns_1@10.3.121.93', u'code': 1, u'text': u"Node 'ns_1@10.3.121.93' is leaving cluster.", u'shortText': u'message', u'module': u'ns_cluster', u'tstamp': 1363621308275, u'type': u'info'} [2013-03-18 08:35:42,567] - [rest_client:915] ERROR - {u'node': u'ns_1@10.3.121.93', u'code': 4, u'text': u'Node ns_1@10.3.121.93 left cluster', u'shortText': u'message', u'module': u'ns_cluster', u'tstamp': 1363621307949, u'type': u'info'} [2013-03-18 08:35:42,567] - [rest_client:915] ERROR - {u'node': u'ns_1@10.3.121.95', u'code': 4, u'text': u"Control connection to memcached on 'ns_1@10.3.121.95' disconnected: {badmatch,\n {error,\n closed}}", u'shortText': u'message', u'module': u'ns_memcached', u'tstamp': 1363621307350, u'type': u'info'} [2013-03-18 08:35:42,570] - [rest_client:915] ERROR - {u'node': u'ns_1@10.3.121.93', u'code': 2, u'text': u"Rebalance exited with reason {{bulk_set_vbucket_state_failed,\n [{'ns_1@10.3.121.95',\n {'EXIT',\n {{{{unexpected_reason,\n {{badmatch,{error,closed}},\n [{mc_binary,quick_stats_recv,3},\n {mc_binary,\n mass_get_last_closed_checkpoint_loop,\n 5},\n {mc_binary,\n mass_get_last_closed_checkpoint,3},\n {ebucketmigrator_srv,handle_call,3},\n {gen_server,handle_msg,5},\n {proc_lib,init_p_do_apply,3}]}},\n [{misc,executing_on_new_process,1},\n {tap_replication_manager,\n 
change_vbucket_filter,4},\n {tap_replication_manager,\n '-do_set_incoming_replication_map/3-lc$^5/1-5-',\n 2},\n {tap_replication_manager,\n do_set_incoming_replication_map,3},\n {tap_replication_manager,handle_call,3},\n {gen_server,handle_msg,5},\n {proc_lib,init_p_do_apply,3}]},\n {gen_server,call,\n ['tap_replication_manager-bucket-1',\n {change_vbucket_replication,531,\n 'ns_1@10.3.121.96'},\n infinity]}},\n {gen_server,call,\n [{'janitor_agent-bucket-1',\n 'ns_1@10.3.121.95'},\n {if_rebalance,<0.13276.97>,\n {update_vbucket_state,531,replica,\n undefined,'ns_1@10.3.121.96'}},\n infinity]}}}},\n {'ns_1@10.3.121.94',\n {'EXIT',\n {{{{unexpected_reason,\n {{badmatch,{error,closed}},\n [{mc_binary,quick_stats_recv,3},\n {mc_binary,quick_stats_loop,5},\n {mc_binary,quick_stats,5},\n {mc_client_binary,\n get_zero_open_checkpoint_vbuckets,3},\n {ebucketmigrator_srv,handle_call,3},\n {gen_server,handle_msg,5},\n {proc_lib,init_p_do_apply,3}]}},\n [{misc,executing_on_new_process,1},\n {tap_replication_manager,\n change_vbucket_filter,4},\n {tap_replication_manager,\n '-do_set_incoming_replication_map/3-lc$^5/1-5-',\n 2},\n {tap_replication_manager,\n do_set_incoming_replication_map,3},\n {tap_replication_manager,handle_call,3},\n {gen_server,handle_msg,5},\n {proc_lib,init_p_do_apply,3}]},\n {gen_server,call,\n ['tap_replication_manager-bucket-1',\n {change_vbucket_replication,531,\n 'ns_1@10.3.121.95'},\n infinity]}},\n {gen_server,call,\n [{'janitor_agent-bucket-1',\n 'ns_1@10.3.121.94'},\n {if_rebalance,<0.13276.97>,\n {update_vbucket_state,531,replica,\n undefined,'ns_1@10.3.121.95'}},\n infinity]}}}}]},\n [{janitor_agent,bulk_set_vbucket_state,4},\n {ns_vbucket_mover,\n update_replication_post_move,3},\n {ns_vbucket_mover,on_move_done,2},\n {gen_server,handle_msg,5},\n {proc_lib,init_p_do_apply,3}]}\n", u'shortText': u'message', u'module': u'ns_orchestrator', u'tstamp': 1363621307292, u'type': u'info'} [2013-03-18 08:35:42,571] - [rest_client:915] ERROR - 
{u'node': u'ns_1@10.3.121.95', u'code': 0, u'text': u'Port server memcached on node \'ns_1@10.3.121.95\' exited with status 134. Restarting. Messages: Mon Mar 18 08:41:35.578716 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_451 - Reset vbucket 538 was completed succecssfully.\nMon Mar 18 08:41:35.704327 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_452 - disconnected\nMon Mar 18 08:41:35.805105 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_450"\nMon Mar 18 08:41:35.805157 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_452"\nMon Mar 18 08:41:35.805446 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.96 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:35.830322 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 538, cookie 0x6ccf080\nMon Mar 18 08:41:36.347550 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_451 - disconnected\nMon Mar 18 08:41:36.444078 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_453 - Reset vbucket 870 was completed succecssfully.\nMon Mar 18 08:41:36.479457 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_451"\nMon Mar 18 08:41:36.480162 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.94 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:36.714685 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 870, cookie 0x6ca6000\nMon Mar 18 08:41:36.976727 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_453 - disconnected\nMon Mar 18 08:41:37.019371 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_454 - Reset vbucket 537 was completed succecssfully.\nMon Mar 18 08:41:37.206467 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_455 - disconnected\nMon Mar 18 08:41:37.271568 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 537, cookie 0x6ca6000\nMon Mar 18 08:41:37.285754 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_453"\nMon Mar 18 
08:41:37.285827 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_455"\nMon Mar 18 08:41:37.286268 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.96 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:37.643780 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_456 - Reset vbucket 869 was completed succecssfully.\nMon Mar 18 08:41:37.799981 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 869, cookie 0x6cf1b80\nMon Mar 18 08:41:37.837434 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_456 - disconnected\nMon Mar 18 08:41:37.907834 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_454 - disconnected\nMon Mar 18 08:41:37.984234 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_457 - Reset vbucket 868 was completed succecssfully.\nMon Mar 18 08:41:38.035570 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_454"\nMon Mar 18 08:41:38.035669 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_456"\nMon Mar 18 08:41:38.036087 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.94 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:38.180437 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_458 - disconnected\nMon Mar 18 08:41:38.250083 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_458"\nMon Mar 18 08:41:38.383886 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.96 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:38.544533 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 868, cookie 0x6ca62c0\nMon Mar 18 08:41:38.692906 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_459 - Reset vbucket 536 was completed succecssfully.\nMon Mar 18 08:41:38.933451 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_457 - disconnected\nMon Mar 18 08:41:38.934109 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 536, cookie 0x6ca7080\nMon 
Mar 18 08:41:39.241149 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_460 - Reset vbucket 867 was completed succecssfully.\nMon Mar 18 08:41:39.396965 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_461 - disconnected\nMon Mar 18 08:41:39.472257 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_457"\nMon Mar 18 08:41:39.472345 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_461"\nMon Mar 18 08:41:39.506632 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.96 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:39.553585 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_459 - disconnected\nMon Mar 18 08:41:39.572591 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 867, cookie 0x6ca6000\nMon Mar 18 08:41:39.579114 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_459"\nMon Mar 18 08:41:39.695310 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_460 - disconnected\nMon Mar 18 08:41:39.900652 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_460"\nMon Mar 18 08:41:39.900985 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.94 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:40.151512 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_462 - disconnected\nMon Mar 18 08:41:40.266294 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_462"\nMon Mar 18 08:41:40.266461 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.96 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:40.856053 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_463 - Reset vbucket 866 was completed succecssfully.\nMon Mar 18 08:41:41.091008 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 866, cookie 0x6ca6000\nMon Mar 18 08:41:41.323419 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_463 - disconnected\nMon Mar 18 08:41:41.342314 PDT 3: (bucket-1) Schedule cleanup of 
"eq_tapq:anon_463"\nMon Mar 18 08:41:41.371404 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_464 - Reset vbucket 535 was completed succecssfully.\nMon Mar 18 08:41:41.661043 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_465 - disconnected\nMon Mar 18 08:41:41.717840 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 535, cookie 0x6ca62c0\nMon Mar 18 08:41:41.742040 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_465"\nMon Mar 18 08:41:41.742448 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.96 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:41.954347 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_466 - Reset vbucket 865 was completed succecssfully.\nMon Mar 18 08:41:42.166879 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 865, cookie 0x6cf1600\nMon Mar 18 08:41:42.269048 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_464 - disconnected\nMon Mar 18 08:41:42.340677 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_464"\nMon Mar 18 08:41:42.340711 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.94 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:42.384967 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_467 - Reset vbucket 864 was completed succecssfully.\nMon Mar 18 08:41:42.413198 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_466 - disconnected\nMon Mar 18 08:41:42.441804 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_466"\nMon Mar 18 08:41:42.684453 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 864, cookie 0x6ccf080\nMon Mar 18 08:41:42.818667 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_468 - disconnected\nMon Mar 18 08:41:42.875395 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_467 - disconnected\nMon Mar 18 08:41:42.904356 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_469 - Reset vbucket 534 was completed succecssfully.\nMon Mar 18 
08:41:42.978903 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_467"\nMon Mar 18 08:41:42.978974 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_468"\nMon Mar 18 08:41:42.979235 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.96 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:43.243646 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_470 - disconnected\nMon Mar 18 08:41:43.403811 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.96 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:43.403750 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_470"\nMon Mar 18 08:41:43.494580 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 534, cookie 0x6ccf600\nMon Mar 18 08:41:43.580734 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_471 - Reset vbucket 533 was completed succecssfully.\nMon Mar 18 08:41:43.893154 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 533, cookie 0x6ca6000\nMon Mar 18 08:41:44.327385 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_472 - Reset vbucket 532 was completed succecssfully.\nMon Mar 18 08:41:44.412144 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_469 - disconnected\nMon Mar 18 08:41:44.495170 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_469"\nMon Mar 18 08:41:44.502353 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_471 - disconnected\nMon Mar 18 08:41:44.516347 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_471"\nMon Mar 18 08:41:44.516369 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.94 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:44.519367 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 532, cookie 0x6ca62c0\nMon Mar 18 08:41:44.855524 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.94 - Sending TAP_OPAQUE with command 
"complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:45.087666 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_473 - Reset vbucket 863 was completed succecssfully.\nMon Mar 18 08:41:45.320777 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_472 - disconnected\nMon Mar 18 08:41:45.440897 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 863, cookie 0x6cceb00\nMon Mar 18 08:41:45.445425 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_472"\nMon Mar 18 08:41:45.446922 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.94 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:45.531819 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_474 - Reset vbucket 531 was completed succecssfully.\nMon Mar 18 08:41:45.566986 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_473 - disconnected\nMon Mar 18 08:41:45.641574 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_473"\nMon Mar 18 08:41:45.822884 PDT 3: (bucket-1) Notified the completion of checkpoint persistence for vbucket 531, cookie 0x6ccf600\nMon Mar 18 08:41:45.850503 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_475 - disconnected\nMon Mar 18 08:41:45.932729 PDT 3: (bucket-1) Schedule cleanup of "eq_tapq:anon_475"\nMon Mar 18 08:41:45.932798 PDT 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@10.3.121.96 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0\nMon Mar 18 08:41:46.151290 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_476 - Reset vbucket 530 was completed succecssfully.\nMon Mar 18 08:41:46.393310 PDT 3: (bucket-1) TAP (Consumer) eq_tapq:anon_474 - disconnected\nmemcached: src/ep.cc:1790: virtual void PersistenceCallback::callback(mutation_result&): Assertion `stats->diskQueueSize < ((size_t)1<<(sizeof(size_t)*8-1))\' failed.', u'shortText': u'message', u'module': u'ns_port_server', u'tstamp': 1363621307102, u'type': u'info'} [2013-03-18 08:35:42,577] - [rest_client:915] ERROR - {u'node': 
u'ns_1@10.3.121.93', u'code': 0, u'text': u'Bucket "bucket-1" rebalance does not seem to be swap rebalance', u'shortText': u'message', u'module': u'ns_vbucket_mover', u'tstamp': 1363620900761, u'type': u'info'} [2013-03-18 08:35:42,578] - [rest_client:915] ERROR - {u'node': u'ns_1@10.3.121.95', u'code': 1, u'text': u'Bucket "bucket-1" loaded on node \'ns_1@10.3.121.95\' in 0 seconds.', u'shortText': u'message', u'module': u'ns_memcached', u'tstamp': 1363620899668, u'type': u'info'} [2013-03-18 08:35:42,579] - [rest_client:915] ERROR - {u'node': u'ns_1@10.3.121.96', u'code': 1, u'text': u'Bucket "bucket-1" loaded on node \'ns_1@10.3.121.96\' in 0 seconds.', u'shortText': u'message', u'module': u'ns_memcached', u'tstamp': 1363620899353, u'type': u'info'} [2013-03-18 08:35:42,580] - [rest_client:915] ERROR - {u'node': u'ns_1@10.3.121.94', u'code': 5, u'text': u"Node 'ns_1@10.3.121.94' saw that node 'ns_1@10.3.121.98' went down.", u'shortText': u'node down', u'module': u'ns_node_disco', u'tstamp': 1363620899198, u'type': u'warning'} [2013-03-18 08:35:42,585] - [rest_client:915] ERROR - {u'node': u'ns_1@10.3.121.95', u'code': 5, u'text': u"Node 'ns_1@10.3.121.95' saw that node 'ns_1@10.3.121.98' went down.", u'shortText': u'node down', u'module': u'ns_node_disco', u'tstamp': 1363620899195, u'type': u'warning'} ERROR root@ubuntu1104-64:/tmp# sudo gdb /opt/couchbase/bin/memcached core.memcached.30589 GNU gdb (Ubuntu/Linaro 7.2-1ubuntu11) 7.2 Copyright (C) 2010 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /opt/couchbase/bin/memcached...done. 
[New Thread 30607] [New Thread 30597] [New Thread 30611] [New Thread 30589] [New Thread 30604] [New Thread 30598] [New Thread 30602] [New Thread 30600] [New Thread 30603] [New Thread 30601] [New Thread 30606] [New Thread 30610] [New Thread 30609] [New Thread 30608] warning: Can't read pathname for load map: Input/output error. Reading symbols from /opt/couchbase/lib/memcached/libmemcached_utilities.so.0...done. Loaded symbols for /opt/couchbase/lib/memcached/libmemcached_utilities.so.0 Reading symbols from /opt/couchbase/lib/libevent-2.0.so.5...done. Loaded symbols for /opt/couchbase/lib/libevent-2.0.so.5 Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/x86_64-linux-gnu/libdl.so.2 Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...(no debugging symbols found)...done. Loaded symbols for /lib/x86_64-linux-gnu/libm.so.6 Reading symbols from /lib/x86_64-linux-gnu/librt.so.1...(no debugging symbols found)...done. Loaded symbols for /lib/x86_64-linux-gnu/librt.so.1 Reading symbols from /opt/couchbase/lib/libtcmalloc_minimal.so.4...done. Loaded symbols for /opt/couchbase/lib/libtcmalloc_minimal.so.4 Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...(no debugging symbols found)...done. Loaded symbols for /lib/x86_64-linux-gnu/libpthread.so.0 Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...(no debugging symbols found)...done. Loaded symbols for /lib/x86_64-linux-gnu/libc.so.6 Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/ld-linux-x86-64.so.2 Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6...(no debugging symbols found)...done. Loaded symbols for /usr/lib/x86_64-linux-gnu/libstdc++.so.6 Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1...(no debugging symbols found)...done. 
Loaded symbols for /lib/x86_64-linux-gnu/libgcc_s.so.1 Reading symbols from /opt/couchbase/lib/memcached/stdin_term_handler.so...done. Loaded symbols for /opt/couchbase/lib/memcached/stdin_term_handler.so Reading symbols from /opt/couchbase/lib/memcached/file_logger.so...done. Loaded symbols for /opt/couchbase/lib/memcached/file_logger.so Reading symbols from /lib/x86_64-linux-gnu/libz.so.1...(no debugging symbols found)...done. Loaded symbols for /lib/x86_64-linux-gnu/libz.so.1 Reading symbols from /opt/couchbase/lib/memcached/bucket_engine.so...done. Loaded symbols for /opt/couchbase/lib/memcached/bucket_engine.so Reading symbols from /opt/couchbase/lib/memcached/ep.so...done. Loaded symbols for /opt/couchbase/lib/memcached/ep.so Reading symbols from /opt/couchbase/lib/libcouchstore.so.1...done. Loaded symbols for /opt/couchbase/lib/libcouchstore.so.1 Reading symbols from /opt/couchbase/lib/libsnappy.so.1...done. Loaded symbols for /opt/couchbase/lib/libsnappy.so.1 Reading symbols from /opt/couchbase/lib/libicuuc.so.44...done. Loaded symbols for /opt/couchbase/lib/libicuuc.so.44 Reading symbols from /opt/couchbase/lib/libicudata.so.44...(no debugging symbols found)...done. Loaded symbols for /opt/couchbase/lib/libicudata.so.44 Reading symbols from /opt/couchbase/lib/libicui18n.so.44...done. Loaded symbols for /opt/couchbase/lib/libicui18n.so.44 Reading symbols from /lib/x86_64-linux-gnu/libnss_files.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/x86_64-linux-gnu/libnss_files.so.2 Core was generated by `/opt/couchbase/bin/memcached -X /opt/couchbase/lib/memcached/stdin_term_handler'. Program terminated with signal 6, Aborted. #0 0x00007f9d01257d05 in raise () from /lib/x86_64-linux-gnu/libc.so.6 (gdb) t a a bt Thread 14 (Thread 30608): #0 0x00007f9d015c7f2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007f9cfca013d8 in wait (this=0x24ee2d0, d=...) 
at src/syncobject.hh:57 #2 IdleTask::run (this=0x24ee2d0, d=...) at src/dispatcher.cc:328 #3 0x00007f9cfca042ca in Dispatcher::run (this=0x6d9b880) at src/dispatcher.cc:171 #4 0x00007f9cfca04b7d in launch_dispatcher_thread (arg=0x6d9b8d4) at src/dispatcher.cc:28 #5 0x00007f9d015c2d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #6 0x00007f9d0130cfdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #7 0x0000000000000000 in ?? () Thread 13 (Thread 30609): #0 0x00007f9d015c7f2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007f9cfca013d8 in wait (this=0x24ee240, d=...) at src/syncobject.hh:57 #2 IdleTask::run (this=0x24ee240, d=...) at src/dispatcher.cc:328 #3 0x00007f9cfca042ca in Dispatcher::run (this=0x6d9b6c0) at src/dispatcher.cc:171 #4 0x00007f9cfca04b7d in launch_dispatcher_thread (arg=0x6d9b714) at src/dispatcher.cc:28 #5 0x00007f9d015c2d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #6 0x00007f9d0130cfdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #7 0x0000000000000000 in ?? () Thread 12 (Thread 30610): #0 0x00007f9d015c7f2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007f9cfca013d8 in wait (this=0x24ee5a0, d=...) at src/syncobject.hh:57 #2 IdleTask::run (this=0x24ee5a0, d=...) at src/dispatcher.cc:328 #3 0x00007f9cfca042ca in Dispatcher::run (this=0x6d9b500) at src/dispatcher.cc:171 #4 0x00007f9cfca04b7d in launch_dispatcher_thread (arg=0x6d9b554) at src/dispatcher.cc:28 #5 0x00007f9d015c2d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #6 0x00007f9d0130cfdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #7 0x0000000000000000 in ?? 
() Thread 11 (Thread 30606): #0 0x00007f9d012d44ed in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007f9d01305914 in usleep () from /lib/x86_64-linux-gnu/libc.so.6 #2 0x00007f9cfca416a5 in updateStatsThread (arg=<value optimized out>) at src/memory_tracker.cc:31 #3 0x00007f9d015c2d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #4 0x00007f9d0130cfdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #5 0x0000000000000000 in ?? () Thread 10 (Thread 30601): #0 0x00007f9d0130d633 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007f9d020dbf36 in epoll_dispatch (base=0x6d4e280, tv=<value optimized out>) at epoll.c:404 #2 0x00007f9d020c7394 in event_base_loop (base=0x6d4e280, flags=<value optimized out>) at event.c:1558 #3 0x0000000000414b94 in worker_libevent (arg=0x24e94f8) at daemon/thread.c:301 #4 0x00007f9d015c2d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #5 0x00007f9d0130cfdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #6 0x0000000000000000 in ?? 
() Thread 9 (Thread 30603): #0 mc_get_allocation_size (ptr=0x7977200) at daemon/alloc_hooks.c:115 #1 0x00007f9cfca41264 in DeleteHook (ptr=0x7977200) at src/memory_tracker.cc:56 #2 0x00007f9d01801402 in MallocHook::InvokeDeleteHookSlow(void const*) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4 #3 0x00007f9d017f455a in MallocHook::InvokeDeleteHook(void const*) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4 #4 0x00007f9d018077f6 in tc_delete () from /opt/couchbase/lib/libtcmalloc_minimal.so.4 #5 0x00007f9d00fb3e19 in std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::~basic_stringstream() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #6 0x00007f9cfca3809a in void core_engine::add_casted_stat<unsigned long>(char const*, unsigned long, void (*)(char const*, unsigned short, char const*, unsigned int, void const*), void const*) () from /opt/couchbase/lib/memcached/ep.so #7 0x00007f9cfca28af0 in addCheckpointStat (this=<value optimized out>, vb=...) at src/ep_engine.cc:2858 #8 EventuallyPersistentEngine::StatCheckpointVisitor::visitBucket (this=<value optimized out>, vb=...) at src/ep_engine.cc:2842 #9 0x00007f9cfca09adc in EventuallyPersistentStore::visit (this=<value optimized out>, visitor=...)
at src/ep.cc:2417 #10 0x00007f9cfca2d315 in EventuallyPersistentEngine::doCheckpointStats (this=0x6da2000, cookie=0x6cf18c0, add_stat=0x40d1d0 <append_stats>, stat_key=<value optimized out>, nkey=<value optimized out>) at src/ep_engine.cc:2868 #11 0x00007f9cfca2fdcc in EventuallyPersistentEngine::getStats (this=0x6da2000, cookie=0x6cf18c0, stat_key=0x6cf5018 "checkpointn_ns_1@10.3.121.94", nkey=10, add_stat=0x40d1d0 <append_stats>) at src/ep_engine.cc:3328 #12 0x00007f9cfca30416 in EvpGetStats (handle=0x6da2000, cookie=0x6cf18c0, stat_key=0x6cf5018 "checkpointn_ns_1@10.3.121.94", nkey=10, add_stat=<value optimized out>) at src/ep_engine.cc:193 #13 0x00007f9cff4e1c20 in bucket_get_stats (handle=<value optimized out>, cookie=0x6cf18c0, stat_key=0x6cf5018 "checkpointn_ns_1@10.3.121.94", nkey=10, add_stat=0x40d1d0 <append_stats>) at bucket_engine.c:1720 #14 0x00000000004113f6 in process_bin_stat (c=0x6cf18c0) at daemon/memcached.c:2199 #15 0x0000000000411d65 in complete_nread_binary (c=0x7977200) at daemon/memcached.c:3708 #16 complete_nread (c=0x7977200) at daemon/memcached.c:3820 #17 conn_nread (c=0x7977200) at daemon/memcached.c:5673 #18 0x00000000004068a5 in event_handler (fd=<value optimized out>, which=<value optimized out>, arg=0x6cf18c0) at daemon/memcached.c:5936 #19 0x00007f9d020c748c in event_process_active_single_queue (base=0x6d4ea00, flags=<value optimized out>) at event.c:1308 #20 event_process_active (base=0x6d4ea00, flags=<value optimized out>) at event.c:1375 #21 event_base_loop (base=0x6d4ea00, flags=<value optimized out>) at event.c:1572 #22 0x0000000000414b94 in worker_libevent (arg=0x24e96e8) at daemon/thread.c:301 #23 0x00007f9d015c2d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #24 0x00007f9d0130cfdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #25 0x0000000000000000 in ?? 
() Thread 8 (Thread 30600): #0 0x00007f9d015ca955 in __lll_unlock_wake () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007f9d015c6e7a in _L_unlock_1177 () from /lib/x86_64-linux-gnu/libpthread.so.0 #2 0x00007f9d015c6da3 in pthread_mutex_unlock () from /lib/x86_64-linux-gnu/libpthread.so.0 #3 0x00007f9cfca42636 in Mutex::release (this=0xed948a8) at src/mutex.cc:94 #4 0x00007f9cfc9fab10 in unlock (this=<value optimized out>, qi=..., vbucket=...) at src/locks.hh:58 #5 ~LockHolder (this=<value optimized out>, qi=..., vbucket=...) at src/locks.hh:41 #6 CheckpointManager::queueDirty (this=<value optimized out>, qi=..., vbucket=...) at src/checkpoint.cc:717 #7 0x00007f9cfca07423 in EventuallyPersistentStore::queueDirty (this=0x6d9e480, vb=..., key=..., vbid=<value optimized out>, op=queue_op_set, seqno=<value optimized out>, tapBackfill=false) at src/ep.cc:2168 #8 0x00007f9cfca0bc2d in EventuallyPersistentStore::setWithMeta (this=0x6d9e480, itm=..., cas=<value optimized out>, cookie=<value optimized out>, force=<value optimized out>, allowExisting=<value optimized out>, nru=3 '\003') at src/ep.cc:1399 #9 0x00007f9cfca2ac53 in EventuallyPersistentEngine::tapNotify(const void *, void *, uint16_t, uint8_t, uint16_t, <anonymous enum>, uint32_t, const void *, size_t, uint32_t, uint32_t, uint64_t, const void *, size_t, uint16_t) (this=0x6da2000, cookie=0x6ccf080, engine_specific=<value optimized out>, nengine=<value optimized out>, tap_flags=<value optimized out>, tap_event=TAP_MUTATION, tap_seqno=974, key=0x1102c031, nkey=25, flags=0, exptime=0, cas=15671462831700303, data=0x1102c04a, ndata=369, vbucket=530) at src/ep_engine.cc:2067 #10 0x00007f9cfca2b368 in EvpTapNotify(ENGINE_HANDLE *, const void *, void *, uint16_t, uint8_t, uint16_t, <anonymous enum>, uint32_t, const void *, size_t, uint32_t, uint32_t, uint64_t, const void *, size_t, uint16_t) (handle=0x6da2000, cookie=0x6ccf080, engine_specific=0x1102c028, nengine=65535, ttl=254 '\376', tap_flags=0, 
tap_event=TAP_MUTATION, tap_seqno=974, key=0x1102c031, nkey=25, flags=0, exptime=0, cas=15671462831700303, data=0x1102c04a, ndata=369, vbucket=<value optimized out>) at src/ep_engine.cc:1037 #11 0x00007f9cff4e0a04 in bucket_tap_notify (handle=<value optimized out>, cookie=0x6ccf080, engine_specific=0x1102c028, nengine=65535, ttl=254 '\376', tap_flags=23, tap_event=TAP_MUTATION, tap_seqno=974, key=0x1102c031, nkey=25, flags=0, exptime=0, cas=15671462831700303, data=0x1102c04a, ndata=369, vbucket=<value optimized out>) at bucket_engine.c:1942 #12 0x0000000000409eb2 in process_bin_tap_packet (event=TAP_MUTATION, c=0x6ccf080) at daemon/memcached.c:3031 #13 0x00000000004120c3 in process_bin_packet (c=0x6ccf080) at daemon/memcached.c:3117 #14 complete_nread_binary (c=0x6ccf080) at daemon/memcached.c:3738 #15 complete_nread (c=0x6ccf080) at daemon/memcached.c:3820 #16 conn_nread (c=0x6ccf080) at daemon/memcached.c:5673 #17 0x00000000004068a5 in event_handler (fd=<value optimized out>, which=<value optimized out>, arg=0x6ccf080) at daemon/memcached.c:5936 #18 0x00007f9d020c748c in event_process_active_single_queue (base=0x6d4e500, flags=<value optimized out>) at event.c:1308 #19 event_process_active (base=0x6d4e500, flags=<value optimized out>) at event.c:1375 #20 event_base_loop (base=0x6d4e500, flags=<value optimized out>) at event.c:1572 #21 0x0000000000414b94 in worker_libevent (arg=0x24e9400) at daemon/thread.c:301 #22 0x00007f9d015c2d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #23 0x00007f9d0130cfdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #24 0x0000000000000000 in ??
() Thread 7 (Thread 30602): #0 0x00007f9d0130d633 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007f9d020dbf36 in epoll_dispatch (base=0x6d4ec80, tv=<value optimized out>) at epoll.c:404 #2 0x00007f9d020c7394 in event_base_loop (base=0x6d4ec80, flags=<value optimized out>) at event.c:1558 #3 0x0000000000414b94 in worker_libevent (arg=0x24e95f0) at daemon/thread.c:301 #4 0x00007f9d015c2d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #5 0x00007f9d0130cfdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #6 0x0000000000000000 in ?? () Thread 6 (Thread 30598): #0 0x00007f9d015c7f2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007f9d00102176 in logger_thead_main (arg=<value optimized out>) at extensions/loggers/file_logger.c:368 #2 0x00007f9d015c2d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 Thread 5 (Thread 30604): #0 0x00007f9d0130d633 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007f9d020dbf36 in epoll_dispatch (base=0x6d4e780, tv=<value optimized out>) at epoll.c:404 #2 0x00007f9d020c7394 in event_base_loop (base=0x6d4e780, flags=<value optimized out>) at event.c:1558 #3 0x0000000000414b94 in worker_libevent (arg=0x24e97e0) at daemon/thread.c:301 #4 0x00007f9d015c2d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #5 0x00007f9d0130cfdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #6 0x0000000000000000 in ?? 
() Thread 4 (Thread 30589): #0 0x00007f9d0130d633 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007f9d020dbf36 in epoll_dispatch (base=0x6d4e000, tv=<value optimized out>) at epoll.c:404 #2 0x00007f9d020c7394 in event_base_loop (base=0x6d4e000, flags=<value optimized out>) at event.c:1558 #3 0x000000000040c2e1 in main (argc=<value optimized out>, argv=<value optimized out>) at daemon/memcached.c:7918 Thread 3 (Thread 30611): #0 0x00007f9d015c7f2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007f9cfca2090f in wait (this=0x6da2000) at src/syncobject.hh:57 #2 wait (this=0x6da2000) at src/syncobject.hh:73 #3 wait (this=0x6da2000) at src/tapconnmap.hh:169 #4 EventuallyPersistentEngine::notifyPendingConnections (this=0x6da2000) at src/ep_engine.cc:3406 #5 0x00007f9cfca209f3 in EvpNotifyPendingConns (arg=0x6da2000) at src/ep_engine.cc:1139 #6 0x00007f9d015c2d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #7 0x00007f9d0130cfdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #8 0x0000000000000000 in ?? () Thread 2 (Thread 30597): #0 0x00007f9d012fe2ed in read () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007f9d01299798 in _IO_file_underflow () from /lib/x86_64-linux-gnu/libc.so.6 #2 0x00007f9d0129a7be in _IO_default_uflow () from /lib/x86_64-linux-gnu/libc.so.6 #3 0x00007f9d0128e8fa in _IO_getline_info () from /lib/x86_64-linux-gnu/libc.so.6 #4 0x00007f9d0128d7ca in fgets () from /lib/x86_64-linux-gnu/libc.so.6 #5 0x00007f9d00b05b19 in fgets (arg=<value optimized out>) at /usr/include/bits/stdio2.h:255 #6 check_stdin_thread (arg=<value optimized out>) at extensions/daemon/stdin_check.c:37 #7 0x00007f9d015c2d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #8 0x00007f9d0130cfdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #9 0x0000000000000000 in ??
() Thread 1 (Thread 30607): #0 0x00007f9d01257d05 in raise () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007f9d0125bab6 in abort () from /lib/x86_64-linux-gnu/libc.so.6 #2 0x00007f9d012507c5 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6 #3 0x00007f9cfca1d8db in PersistenceCallback::callback(std::pair<int, long>&) () from /opt/couchbase/lib/memcached/ep.so #4 0x00007f9cfca78724 in CouchKVStore::commitCallback(CouchRequest **, int, <anonymous enum>) (this=0x6dbc000, committedReqs=<value optimized out>, numReqs=10, errCode=COUCHSTORE_SUCCESS) at src/couch-kvstore/couch-kvstore.cc:1655 #5 0x00007f9cfca7c07c in CouchKVStore::commit2couchstore (this=0x6dbc000) at src/couch-kvstore/couch-kvstore.cc:1488 #6 0x00007f9cfca7c25a in CouchKVStore::commit (this=0x777d) at src/couch-kvstore/couch-kvstore.cc:871 #7 0x00007f9cfca0ee06 in EventuallyPersistentStore::flushVBucket (this=0x6d9e480, vbid=<value optimized out>) at src/ep.cc:1977 #8 0x00007f9cfca3ca1a in doFlush (this=0x6dba5a0, d=..., tid=...) at src/flusher.cc:215 #9 Flusher::step (this=0x6dba5a0, d=..., tid=...) at src/flusher.cc:153 #10 0x00007f9cfca042ca in Dispatcher::run (this=0x6d9aa80) at src/dispatcher.cc:171 #11 0x00007f9cfca04b7d in launch_dispatcher_thread (arg=0x777d) at src/dispatcher.cc:28 #12 0x00007f9d015c2d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #13 0x00007f9d0130cfdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #14 0x0000000000000000 in ?? () |
| Comments |
| Comment by Andrei Baranouski [ 19/Mar/13 ] |
|
core file: root(couchbase)@10.3.121.95:/core/7938-2.0.2-741.core.memcached.30589 https://s3.amazonaws.com/bugdb/jira/MB-7938/10.3.121.92-3182013-842-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-7938/10.3.121.93-3182013-836-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-7938/10.3.121.94-3182013-838-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-7938/10.3.121.95-3182013-839-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-7938/10.3.121.96-3182013-840-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-7938/10.3.121.97-3182013-841-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-7938/10.3.121.98-3182013-837-diag.zip |
| Comment by Xiaoqin Ma [ 26/Mar/13 ] |
|
It looks like the value wrapped around (underflowed): we did 0 - 1 on diskQueueSize, which turns it into the largest possible size_t value: (gdb) p stats->diskQueueSize $6 = {value = 18446744073709551615} (gdb) p (size_t) - 1 $7 = 18446744073709551615 |
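The wrap-around described in the comment above can be sketched in a few lines. This is only an illustration in Python, which emulates 64-bit unsigned arithmetic with a mask because Python integers do not wrap on their own. The 64-bit width is an assumption taken from the x86_64 gdb banner, and the helper names `size_t_sub`, `passes_assertion`, and `saturating_sub` are hypothetical. The clamping helper shows one possible guard and is not the change that was made in src/ep.cc.

```python
# Emulate size_t arithmetic; assumes a 64-bit build (x86_64 per the gdb banner).
SIZE_T_BITS = 64
SIZE_T_MAX = (1 << SIZE_T_BITS) - 1
# Same bound as the failed assertion: (size_t)1 << (sizeof(size_t)*8 - 1)
ASSERT_LIMIT = 1 << (SIZE_T_BITS - 1)


def size_t_sub(value, delta):
    """Subtract the way an unsigned 64-bit counter does: modulo 2**64."""
    return (value - delta) & SIZE_T_MAX


def passes_assertion(disk_queue_size):
    """Mirror the check that failed in PersistenceCallback::callback."""
    return disk_queue_size < ASSERT_LIMIT


def saturating_sub(value, delta):
    """Hypothetical guard: clamp at zero instead of wrapping around."""
    if value >= delta:
        return value - delta
    return 0


if __name__ == "__main__":
    wrapped = size_t_sub(0, 1)
    print(wrapped)                                 # 18446744073709551615, as gdb printed
    print(passes_assertion(wrapped))               # False: the assertion fires
    print(passes_assertion(saturating_sub(0, 1)))  # True: the clamped counter stays at 0
```

A clamp like this would only hide the symptom. A counter that is decremented more often than it is incremented still points to an accounting mismatch somewhere between queueing and persistence.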
| Comment by Xiaoqin Ma [ 26/Mar/13 ] |
|
Hi Andrei, Can you give me more details about the setup? What is the rebalance configuration? Are there any read/write operations running on the cluster at the same time? Is the failed node the new node being added or an existing node in the cluster? Thanks! |
| Comment by Xiaoqin Ma [ 26/Mar/13 ] |
| Also, would it be possible for me to run the script myself to do live debugging? Does the crash happen on every test run or only occasionally? How long does the test run before the crash? |
| Comment by Mike Wiederhold [ 10/Apr/13 ] |
|
See |
| Comment by Maria McDuff [ 26/Apr/13 ] |
| pls verify / close. |
| Comment by Thuan Nguyen [ 27/Apr/13 ] |
|
Integrated in github-ep-engine-2-0 #485 (See [http://qa.hq.northscale.net/job/github-ep-engine-2-0/485/]) Result = SUCCESS Mike Wiederhold : Files : * src/ep.cc |
| Comment by Andrei Baranouski [ 08/May/13 ] |
|
reproduced on 2.0.2-789 http://qa.hq.northscale.net/job/centos-64-2.0-rebalance-regressions-P1/206/consoleFull gdb /opt/couchbase/bin/memcached core.memcached.26798 GNU gdb (Ubuntu/Linaro 7.2-1ubuntu11) 7.2 Copyright (C) 2010 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /opt/couchbase/bin/memcached...done. [New Thread 26893] [New Thread 26895] [New Thread 26897] [New Thread 26898] [New Thread 26896] [New Thread 2986] [New Thread 2988] [New Thread 2987] [New Thread 2996] [New Thread 2995] [New Thread 2997] [New Thread 3003] [New Thread 3004] [New Thread 26798] [New Thread 26806] [New Thread 26807] [New Thread 26811] [New Thread 26813] [New Thread 26890] [New Thread 26809] [New Thread 26812] [New Thread 3005] [New Thread 26810] [New Thread 26892] [New Thread 26891] [New Thread 26894] warning: Can't read pathname for load map: Input/output error. Reading symbols from /opt/couchbase/lib/memcached/libmemcached_utilities.so.0...done. Loaded symbols for /opt/couchbase/lib/memcached/libmemcached_utilities.so.0 Reading symbols from /opt/couchbase/lib/libevent-2.0.so.5...done. Loaded symbols for /opt/couchbase/lib/libevent-2.0.so.5 Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/x86_64-linux-gnu/libdl.so.2 Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...(no debugging symbols found)...done. Loaded symbols for /lib/x86_64-linux-gnu/libm.so.6 Reading symbols from /lib/x86_64-linux-gnu/librt.so.1...(no debugging symbols found)...done. 
Loaded symbols for /lib/x86_64-linux-gnu/librt.so.1 Reading symbols from /opt/couchbase/lib/libtcmalloc_minimal.so.4...done. Loaded symbols for /opt/couchbase/lib/libtcmalloc_minimal.so.4 Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...(no debugging symbols found)...done. Loaded symbols for /lib/x86_64-linux-gnu/libpthread.so.0 Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...(no debugging symbols found)...done. Loaded symbols for /lib/x86_64-linux-gnu/libc.so.6 Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/ld-linux-x86-64.so.2 Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6...(no debugging symbols found)...done. Loaded symbols for /usr/lib/x86_64-linux-gnu/libstdc++.so.6 Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1...(no debugging symbols found)...done. Loaded symbols for /lib/x86_64-linux-gnu/libgcc_s.so.1 Reading symbols from /opt/couchbase/lib/memcached/stdin_term_handler.so...done. Loaded symbols for /opt/couchbase/lib/memcached/stdin_term_handler.so Reading symbols from /opt/couchbase/lib/memcached/file_logger.so...done. Loaded symbols for /opt/couchbase/lib/memcached/file_logger.so Reading symbols from /lib/x86_64-linux-gnu/libz.so.1...(no debugging symbols found)...done. Loaded symbols for /lib/x86_64-linux-gnu/libz.so.1 Reading symbols from /opt/couchbase/lib/memcached/bucket_engine.so...done. Loaded symbols for /opt/couchbase/lib/memcached/bucket_engine.so Reading symbols from /opt/couchbase/lib/memcached/ep.so...done. Loaded symbols for /opt/couchbase/lib/memcached/ep.so Reading symbols from /opt/couchbase/lib/libcouchstore.so.1...done. Loaded symbols for /opt/couchbase/lib/libcouchstore.so.1 Reading symbols from /opt/couchbase/lib/libsnappy.so.1...done. Loaded symbols for /opt/couchbase/lib/libsnappy.so.1 Reading symbols from /opt/couchbase/lib/libicuuc.so.44...done. 
Loaded symbols for /opt/couchbase/lib/libicuuc.so.44 Reading symbols from /opt/couchbase/lib/libicudata.so.44...(no debugging symbols found)...done. Loaded symbols for /opt/couchbase/lib/libicudata.so.44 Reading symbols from /opt/couchbase/lib/libicui18n.so.44...done. Loaded symbols for /opt/couchbase/lib/libicui18n.so.44 Core was generated by `/opt/couchbase/bin/memcached -X /opt/couchbase/lib/memcached/stdin_term_handler'. Program terminated with signal 6, Aborted. #0 0x00007fb4f811ad05 in raise () from /lib/x86_64-linux-gnu/libc.so.6 (gdb) t a a bt Thread 26 (Thread 26894): #0 0x00007fb4f81c843d in fdatasync () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fb4f2e8faaf in couch_sync (handle=<value optimized out>) at src/os.c:117 #2 0x00007fb4f314857f in cfs_sync (h=0x48f8ea40) at src/couch-kvstore/couch-fs-stats.cc:88 #3 0x00007fb4f2e89e2f in couchstore_commit (db=0x5dd2ee0) at src/couch_db.c:193 #4 0x00007fb4f313dff6 in CouchKVStore::saveDocs (this=0x3b6e4c00, vbid=731, rev=<value optimized out>, docs=<value optimized out>, docinfos=0x48f8f9a0, docCount=<value optimized out>) at src/couch-kvstore/couch-kvstore.cc:1486 #5 0x00007fb4f313e74b in CouchKVStore::commit2couchstore (this=0x3b6e4c00) at src/couch-kvstore/couch-kvstore.cc:1411 #6 0x00007fb4f313e93a in CouchKVStore::commit (this=0x42) at src/couch-kvstore/couch-kvstore.cc:806 #7 0x00007fb4f30ca4c6 in EventuallyPersistentStore::flushVBucket (this=0x92c5000, vbid=<value optimized out>) at src/ep.cc:1919 #8 0x00007fb4f30f55d9 in doFlush (this=0x9ac59e0, tid=26148) at src/flusher.cc:222 #9 Flusher::step (this=0x9ac59e0, tid=26148) at src/flusher.cc:152 #10 0x00007fb4f3106140 in ExecutorThread::run (this=0x5eacea0) at src/scheduler.cc:148 #11 0x00007fb4f310686d in launch_executor_thread (arg=0x42) at src/scheduler.cc:34 #12 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #13 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #14 0x0000000000000000 in ?? 
() Thread 25 (Thread 26891): #0 0x00007fb4f81c843d in fdatasync () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fb4f2e8faaf in couch_sync (handle=<value optimized out>) at src/os.c:117 #2 0x00007fb4f314857f in cfs_sync (h=0xc1ade20) at src/couch-kvstore/couch-fs-stats.cc:88 #3 0x00007fb4f2e89e03 in couchstore_commit (db=0x5dd36c0) at src/couch_db.c:184 #4 0x00007fb4f313dff6 in CouchKVStore::saveDocs (this=0xb06cc00, vbid=1020, rev=<value optimized out>, docs=<value optimized out>, docinfos=0x4b03c660, docCount=<value optimized out>) at src/couch-kvstore/couch-kvstore.cc:1486 #5 0x00007fb4f313e74b in CouchKVStore::commit2couchstore (this=0xb06cc00) at src/couch-kvstore/couch-kvstore.cc:1411 #6 0x00007fb4f313e93a in CouchKVStore::commit (this=0x79) at src/couch-kvstore/couch-kvstore.cc:806 #7 0x00007fb4f30ca4c6 in EventuallyPersistentStore::flushVBucket (this=0x92c5000, vbid=<value optimized out>) at src/ep.cc:1919 #8 0x00007fb4f30f55d9 in doFlush (this=0x9ac5680, tid=26146) at src/flusher.cc:222 #9 Flusher::step (this=0x9ac5680, tid=26146) at src/flusher.cc:152 #10 0x00007fb4f3106140 in ExecutorThread::run (this=0x5e81ba0) at src/scheduler.cc:148 #11 0x00007fb4f310686d in launch_executor_thread (arg=0x79) at src/scheduler.cc:34 #12 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #13 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #14 0x0000000000000000 in ?? 
() Thread 24 (Thread 26892): #0 0x00007fb4f81c2e93 in poll () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fb4f314b8b5 in CouchNotifier::waitForReadable (this=0x3fd70000, tryOnce=<value optimized out>) at src/couch-kvstore/couch-notifier.cc:629 #2 0x00007fb4f314bfa5 in waitOnce (this=0x3fd70000, vbs=..., file_version=<value optimized out>, header_offset=<value optimized out>, cb=<value optimized out>) at src/couch-kvstore/couch-notifier.cc:674 #3 CouchNotifier::notify_update (this=0x3fd70000, vbs=..., file_version=<value optimized out>, header_offset=<value optimized out>, cb=<value optimized out>) at src/couch-kvstore/couch-notifier.cc:752 #4 0x00007fb4f313e138 in notify_headerpos_update (this=0x3b6e5800, vbid=973, rev=<value optimized out>, docs=<value optimized out>, docinfos=0x2736fd80, docCount=<value optimized out>) at ./src/couch-kvstore/couch-notifier.hh:127 #5 CouchKVStore::saveDocs (this=0x3b6e5800, vbid=973, rev=<value optimized out>, docs=<value optimized out>, docinfos=0x2736fd80, docCount=<value optimized out>) at src/couch-kvstore/couch-kvstore.cc:1499 #6 0x00007fb4f313e74b in CouchKVStore::commit2couchstore (this=0x3b6e5800) at src/couch-kvstore/couch-kvstore.cc:1411 #7 0x00007fb4f313e93a in CouchKVStore::commit (this=0x7fb4f0cd2680) at src/couch-kvstore/couch-kvstore.cc:806 #8 0x00007fb4f30ca4c6 in EventuallyPersistentStore::flushVBucket (this=0x92c5000, vbid=<value optimized out>) at src/ep.cc:1919 #9 0x00007fb4f30f55d9 in doFlush (this=0x9ac5d40, tid=26149) at src/flusher.cc:222 #10 Flusher::step (this=0x9ac5d40, tid=26149) at src/flusher.cc:152 #11 0x00007fb4f3106140 in ExecutorThread::run (this=0x5e81a00) at src/scheduler.cc:148 #12 0x00007fb4f310686d in launch_executor_thread (arg=0x7fb4f0cd2680) at src/scheduler.cc:34 #13 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #14 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #15 0x0000000000000000 in ?? 
() ---Type <return> to continue, or q <return> to quit--- Thread 23 (Thread 26810): #0 0x00007fb4f81d0633 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fb4f8f8ef36 in epoll_dispatch (base=0x5dcc280, tv=<value optimized out>) at epoll.c:404 #2 0x00007fb4f8f7a394 in event_base_loop (base=0x5dcc280, flags=<value optimized out>) at event.c:1558 #3 0x0000000000414c84 in worker_libevent (arg=0x15674f8) at daemon/thread.c:301 #4 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #5 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #6 0x0000000000000000 in ?? () Thread 22 (Thread 3005): #0 0x00007fb4f848af2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007fb4f30bc778 in wait (this=0x81463f0, d=...) at src/syncobject.hh:57 #2 IdleTask::run (this=0x81463f0, d=...) at src/dispatcher.cc:342 #3 0x00007fb4f30bf32a in Dispatcher::run (this=0x8162700) at src/dispatcher.cc:184 #4 0x00007fb4f30bfafd in launch_dispatcher_thread (arg=0x8162754) at src/dispatcher.cc:28 #5 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #6 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #7 0x0000000000000000 in ?? () Thread 21 (Thread 26812): #0 0x00007fb4f81d0633 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fb4f8f8ef36 in epoll_dispatch (base=0x5dcca00, tv=<value optimized out>) at epoll.c:404 #2 0x00007fb4f8f7a394 in event_base_loop (base=0x5dcca00, flags=<value optimized out>) at event.c:1558 #3 0x0000000000414c84 in worker_libevent (arg=0x15676e8) at daemon/thread.c:301 #4 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #5 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #6 0x0000000000000000 in ?? 
() Thread 20 (Thread 26809): #0 0x00007fb4f81d0633 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fb4f8f8ef36 in epoll_dispatch (base=0x5dcc500, tv=<value optimized out>) at epoll.c:404 #2 0x00007fb4f8f7a394 in event_base_loop (base=0x5dcc500, flags=<value optimized out>) at event.c:1558 #3 0x0000000000414c84 in worker_libevent (arg=0x1567400) at daemon/thread.c:301 #4 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #5 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #6 0x0000000000000000 in ?? () Thread 19 (Thread 26890): #0 0x00007fb4f81974ed in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fb4f81c8914 in usleep () from /lib/x86_64-linux-gnu/libc.so.6 #2 0x00007fb4f3101385 in updateStatsThread (arg=<value optimized out>) at src/memory_tracker.cc:31 #3 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #4 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #5 0x0000000000000000 in ?? () Thread 18 (Thread 26813): #0 0x00007fb4f81d0633 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fb4f8f8ef36 in epoll_dispatch (base=0x5dcc780, tv=<value optimized out>) at epoll.c:404 #2 0x00007fb4f8f7a394 in event_base_loop (base=0x5dcc780, flags=<value optimized out>) at event.c:1558 #3 0x0000000000414c84 in worker_libevent (arg=0x15677e0) at daemon/thread.c:301 #4 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #5 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #6 0x0000000000000000 in ?? 
() Thread 17 (Thread 26811): #0 0x00007fb4f81d0633 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fb4f8f8ef36 in epoll_dispatch (base=0x5dccc80, tv=<value optimized out>) at epoll.c:404 ---Type <return> to continue, or q <return> to quit--- #2 0x00007fb4f8f7a394 in event_base_loop (base=0x5dccc80, flags=<value optimized out>) at event.c:1558 #3 0x0000000000414c84 in worker_libevent (arg=0x15675f0) at daemon/thread.c:301 #4 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #5 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #6 0x0000000000000000 in ?? () Thread 16 (Thread 26807): #0 0x00007fb4f848af2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007fb4f6fc5176 in logger_thead_main (arg=<value optimized out>) at extensions/loggers/file_logger.c:368 #2 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #3 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #4 0x0000000000000000 in ?? () Thread 15 (Thread 26806): #0 0x00007fb4f81c12ed in read () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fb4f815c798 in _IO_file_underflow () from /lib/x86_64-linux-gnu/libc.so.6 #2 0x00007fb4f815d7be in _IO_default_uflow () from /lib/x86_64-linux-gnu/libc.so.6 #3 0x00007fb4f81518fa in _IO_getline_info () from /lib/x86_64-linux-gnu/libc.so.6 #4 0x00007fb4f81507ca in fgets () from /lib/x86_64-linux-gnu/libc.so.6 #5 0x00007fb4f79c8b19 in fgets (arg=<value optimized out>) at /usr/include/bits/stdio2.h:255 #6 check_stdin_thread (arg=<value optimized out>) at extensions/daemon/stdin_check.c:37 #7 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #8 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #9 0x0000000000000000 in ?? 
() Thread 14 (Thread 26798): #0 0x00007fb4f81d0633 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fb4f8f8ef36 in epoll_dispatch (base=0x5dcc000, tv=<value optimized out>) at epoll.c:404 #2 0x00007fb4f8f7a394 in event_base_loop (base=0x5dcc000, flags=<value optimized out>) at event.c:1558 #3 0x000000000040c841 in main (argc=<value optimized out>, argv=<value optimized out>) at daemon/memcached.c:7926 Thread 13 (Thread 3004): #0 0x00007fb4f848af2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007fb4f30bc778 in wait (this=0x81465a0, d=...) at src/syncobject.hh:57 #2 IdleTask::run (this=0x81465a0, d=...) at src/dispatcher.cc:342 #3 0x00007fb4f30bf32a in Dispatcher::run (this=0x81621c0) at src/dispatcher.cc:184 #4 0x00007fb4f30bfafd in launch_dispatcher_thread (arg=0x8162214) at src/dispatcher.cc:28 #5 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #6 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #7 0x0000000000000000 in ?? () Thread 12 (Thread 3003): #0 0x00007fb4f848af2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007fb4f30daf7f in wait (this=0x5dc8400) at src/syncobject.hh:57 #2 wait (this=0x5dc8400) at src/syncobject.hh:73 #3 wait (this=0x5dc8400) at src/tapconnmap.hh:169 #4 EventuallyPersistentEngine::notifyPendingConnections (this=0x5dc8400) at src/ep_engine.cc:3379 #5 0x00007fb4f30db063 in EvpNotifyPendingConns (arg=0x5dc8400) at src/ep_engine.cc:1153 #6 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #7 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #8 0x0000000000000000 in ?? () Thread 11 (Thread 2997): #0 0x00007fb4f848af2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007fb4f30bc778 in wait (this=0x6e46900, d=...) at src/syncobject.hh:57 #2 IdleTask::run (this=0x6e46900, d=...) 
at src/dispatcher.cc:342 #3 0x00007fb4f30bf32a in Dispatcher::run (this=0x5e0b6c0) at src/dispatcher.cc:184 ---Type <return> to continue, or q <return> to quit--- #4 0x00007fb4f30bfafd in launch_dispatcher_thread (arg=0x5e0b714) at src/dispatcher.cc:28 #5 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #6 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #7 0x0000000000000000 in ?? () Thread 10 (Thread 2995): #0 0x00007fb4f848af2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007fb4f30daf7f in wait (this=0x5dc6000) at src/syncobject.hh:57 #2 wait (this=0x5dc6000) at src/syncobject.hh:73 #3 wait (this=0x5dc6000) at src/tapconnmap.hh:169 #4 EventuallyPersistentEngine::notifyPendingConnections (this=0x5dc6000) at src/ep_engine.cc:3379 #5 0x00007fb4f30db063 in EvpNotifyPendingConns (arg=0x5dc6000) at src/ep_engine.cc:1153 #6 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #7 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #8 0x0000000000000000 in ?? () Thread 9 (Thread 2996): #0 0x00007fb4f848af2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007fb4f30bc778 in wait (this=0x6e46630, d=...) at src/syncobject.hh:57 #2 IdleTask::run (this=0x6e46630, d=...) at src/dispatcher.cc:342 #3 0x00007fb4f30bf32a in Dispatcher::run (this=0x5e0ba40) at src/dispatcher.cc:184 #4 0x00007fb4f30bfafd in launch_dispatcher_thread (arg=0x5e0ba94) at src/dispatcher.cc:28 #5 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #6 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #7 0x0000000000000000 in ?? () Thread 8 (Thread 2987): #0 0x00007fb4f848af2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007fb4f30bc778 in wait (this=0x8146990, d=...) at src/syncobject.hh:57 #2 IdleTask::run (this=0x8146990, d=...) 
at src/dispatcher.cc:342 #3 0x00007fb4f30bf32a in Dispatcher::run (this=0xbe02000) at src/dispatcher.cc:184 #4 0x00007fb4f30bfafd in launch_dispatcher_thread (arg=0xbe02054) at src/dispatcher.cc:28 #5 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #6 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #7 0x0000000000000000 in ?? () Thread 7 (Thread 2988): #0 0x00007fb4f848af2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007fb4f30bc778 in wait (this=0x8147680, d=...) at src/syncobject.hh:57 #2 IdleTask::run (this=0x8147680, d=...) at src/dispatcher.cc:342 #3 0x00007fb4f30bf32a in Dispatcher::run (this=0xbe02a80) at src/dispatcher.cc:184 #4 0x00007fb4f30bfafd in launch_dispatcher_thread (arg=0xbe02ad4) at src/dispatcher.cc:28 #5 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #6 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #7 0x0000000000000000 in ?? () Thread 6 (Thread 2986): #0 0x00007fb4f848af2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007fb4f30daf7f in wait (this=0x5dc9600) at src/syncobject.hh:57 #2 wait (this=0x5dc9600) at src/syncobject.hh:73 #3 wait (this=0x5dc9600) at src/tapconnmap.hh:169 #4 EventuallyPersistentEngine::notifyPendingConnections (this=0x5dc9600) at src/ep_engine.cc:3379 #5 0x00007fb4f30db063 in EvpNotifyPendingConns (arg=0x5dc9600) at src/ep_engine.cc:1153 #6 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #7 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #8 0x0000000000000000 in ?? 
() ---Type <return> to continue, or q <return> to quit--- Thread 5 (Thread 26896): #0 0x00007fb4f848af2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007fb4f3105fd1 in wait (this=0x5eacb60) at src/syncobject.hh:57 #2 ExecutorThread::run (this=0x5eacb60) at src/scheduler.cc:134 #3 0x00007fb4f310686d in launch_executor_thread (arg=0x5eacba4) at src/scheduler.cc:34 #4 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #5 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #6 0x0000000000000000 in ?? () Thread 4 (Thread 26898): #0 0x00007fb4f848af2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007fb4f3105fd1 in wait (this=0x5eac820) at src/syncobject.hh:57 #2 ExecutorThread::run (this=0x5eac820) at src/scheduler.cc:134 #3 0x00007fb4f310686d in launch_executor_thread (arg=0x5eac864) at src/scheduler.cc:34 #4 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #5 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #6 0x0000000000000000 in ?? () Thread 3 (Thread 26897): #0 0x00007fb4f848af2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007fb4f3105fd1 in wait (this=0x5eac9c0) at src/syncobject.hh:57 #2 ExecutorThread::run (this=0x5eac9c0) at src/scheduler.cc:134 #3 0x00007fb4f310686d in launch_executor_thread (arg=0x5eaca04) at src/scheduler.cc:34 #4 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #5 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #6 0x0000000000000000 in ?? 
() Thread 2 (Thread 26895): #0 0x00007fb4f848af2b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007fb4f3105fd1 in wait (this=0x5eacd00) at src/syncobject.hh:57 #2 ExecutorThread::run (this=0x5eacd00) at src/scheduler.cc:134 #3 0x00007fb4f310686d in launch_executor_thread (arg=0x5eacd44) at src/scheduler.cc:34 #4 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #5 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #6 0x0000000000000000 in ?? () Thread 1 (Thread 26893): #0 0x00007fb4f811ad05 in raise () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fb4f811eab6 in abort () from /lib/x86_64-linux-gnu/libc.so.6 #2 0x00007fb4f81137c5 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6 #3 0x00007fb4f30d816a in PersistenceCallback::callback(std::pair<int, long>&) () from /opt/couchbase/lib/memcached/ep.so #4 0x00007fb4f313cc34 in CouchKVStore::commitCallback(CouchRequest **, int, <anonymous enum>) (this=0x840c300, committedReqs=<value optimized out>, numReqs=981, errCode=COUCHSTORE_SUCCESS) at src/couch-kvstore/couch-kvstore.cc:1591 #5 0x00007fb4f313e766 in CouchKVStore::commit2couchstore (this=0x840c300) at src/couch-kvstore/couch-kvstore.cc:1418 #6 0x00007fb4f313e93a in CouchKVStore::commit (this=0x68ae) at src/couch-kvstore/couch-kvstore.cc:806 #7 0x00007fb4f30ca4c6 in EventuallyPersistentStore::flushVBucket (this=0x8c4cc00, vbid=<value optimized out>) at src/ep.cc:1919 #8 0x00007fb4f30f55d9 in doFlush (this=0x9ac46c0, tid=24958) at src/flusher.cc:222 #9 Flusher::step (this=0x9ac46c0, tid=24958) at src/flusher.cc:152 #10 0x00007fb4f3106140 in ExecutorThread::run (this=0x5e81860) at src/scheduler.cc:148 #11 0x00007fb4f310686d in launch_executor_thread (arg=0x68ae) at src/scheduler.cc:34 #12 0x00007fb4f8485d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #13 0x00007fb4f81cffdd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #14 0x0000000000000000 in ?? 
() root(couchbase)@10.3.121.98 /cores/7938-2.0.2-789-core.memcached.26798 https://s3.amazonaws.com/bugdb/jira/MB-7938/13460c18/10.3.121.92-582013-546-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-7938/13460c18/10.3.121.93-582013-541-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-7938/13460c18/10.3.121.94-582013-543-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-7938/13460c18/10.3.121.95-582013-545-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-7938/13460c18/10.3.121.96-582013-544-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-7938/13460c18/10.3.121.97-582013-547-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-7938/13460c18/10.3.121.98-582013-543-diag.zip |
| Comment by Maria McDuff [ 08/May/13 ] |
|
upgrading to blocker. also see MB-7735. |
| Comment by Mike Wiederhold [ 08/May/13 ] |
| This crash has nothing to do with MB-7735. |
| Comment by Mike Wiederhold [ 08/May/13 ] |
| Please take a look at this issue. I can take a look at it next week if you don't have time to get it resolved by then. |
| Comment by Andrei Baranouski [ 10/May/13 ] |
|
see the same crash on 2.0.2-787-rel http://qa.hq.northscale.net/job/centos-64-2.0-failover-tests-P0/609/consoleFull
Thread 1 (Thread 0x46a74940 (LWP 22854)): #0 0x0000003828630265 in raise () from /lib64/libc.so.6 #1 0x0000003828631d10 in abort () from /lib64/libc.so.6 #2 0x00000038286296e6 in __assert_fail () from /lib64/libc.so.6 #3 0x00002aaaaaf0dd0a in PersistenceCallback::callback(std::pair<int, long>&) () from /opt/couchbase/lib/memcached/ep.so #4 0x00002aaaaaf65a84 in CouchKVStore::commitCallback (this=0x2aaabdb7e300, committedReqs=<value optimized out>, numReqs=31, errCode=COUCHSTORE_SUCCESS) at src/couch-kvstore/couch-kvstore.cc:1591 #5 0x00002aaaaaf67596 in CouchKVStore::commit2couchstore (this=0x2aaabdb7e300) at src/couch-kvstore/couch-kvstore.cc:1418 #6 0x00002aaaaaf6776a in CouchKVStore::commit (this=<value optimized out>) at src/couch-kvstore/couch-kvstore.cc:806 #7 0x00002aaaaaf01aa6 in EventuallyPersistentStore::flushVBucket (this=0x1d1f2c00, vbid=599) at src/ep.cc:2059 #8 0x00002aaaaaf2ae59 in doFlush (this=0x86f3a40, d=..., tid=...) at src/flusher.cc:226 #9 Flusher::step (this=0x86f3a40, d=..., tid=...) at src/flusher.cc:157 #10 0x00002aaaaaef582a in Dispatcher::run (this=0x190a1180) at src/dispatcher.cc:184 #11 0x00002aaaaaef5fed in launch_dispatcher_thread (arg=<value optimized out>) at src/dispatcher.cc:28 #12 0x000000382920673d in start_thread () from /lib64/libpthread.so.0 #13 0x00000038286d44bd in clone () from /lib64/libc.so.6 vms: root(couchbase)@10.1.3.118:/tmp/core.memcached.17604 & /tmp/core.memcached.23647 root(couchbase)@10.1.3.117:/tmp/core.memcached.730 |
| Comment by Jin Lim [ 10/May/13 ] |
|
* Both the first crash reported in March and the latest crash were caused by incorrect accounting of the disk queue stat. It appears that the stat underflowed arithmetically. However, I believe the root cause that led to this condition was different for each crash (a few changes have been made in the code path between March and now, e.g. KVShard).
* A fix is uploaded for review: http://review.couchbase.org/#/c/26242/
* QE (Andrei), please pick up a toy build, 2.0.0-MRW31-toy-community, at http://builds.hq.northscale.net:8010/builders/ec2-centos-x64_toy-couchstore-builder/builds/165 and validate the above fix (as usual, we could not reproduce the same crash on the dev side). Many thanks!
* I will mark this as fixed after the code review and QE's validation. |
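One general way to keep a queue-size stat from underflowing is to guard the decrement. The sketch below is an illustration only and is not the patch under review: the class `DiskQueueStat` and its methods are hypothetical names, and clamping at zero while counting the mismatch is an assumed policy, not something the comment above describes.

```python
import threading


class DiskQueueStat:
    """Hypothetical non-negative counter; not the ep-engine stat class."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()
        # Number of decrements that would have taken the counter below zero.
        self.underflow_attempts = 0

    def increment(self, n: int = 1) -> None:
        with self._lock:
            self._value += n

    def decrement(self, n: int = 1) -> bool:
        """Subtract n; clamp at zero and return False on a mismatch."""
        with self._lock:
            if n > self._value:
                self.underflow_attempts += 1
                self._value = 0
                return False
            self._value -= n
            return True

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


stat = DiskQueueStat()
stat.increment(2)
stat.decrement(1)   # normal path, value is now 1
stat.decrement(5)   # mismatched accounting, clamped instead of wrapping
print(stat.value, stat.underflow_attempts)
```

Clamping hides the symptom rather than the accounting error, so recording the mismatch (here in `underflow_attempts`) keeps the underlying bug visible for diagnosis.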
| Comment by Maria McDuff [ 10/May/13 ] |
| andrei, pls provide test result from this toy build. thanks. |
| Comment by Andrei Baranouski [ 15/May/13 ] |
| toy build hangs on rebalance http://qa.hq.northscale.net/job/centos-64-2.0-failover-tests-P0/614/console |
| Comment by Jin Lim [ 15/May/13 ] |
|
|
| Comment by Andrei Baranouski [ 15/May/13 ] |
|
okay, but 2.0.0-MRW31 hangs on rebalance http://qa.hq.northscale.net/job/centos-64-2.0-failover-tests-P0/614/consoleFull Only one test passed in this run, so I can't verify the entire set of tests where we had the crash. I guess 2.0.0-MRW31 doesn't contain the fix for |
| Comment by Jin Lim [ 15/May/13 ] |
| Yes, thanks. The crash must have occurred prior to the hang condition, though, but I agree we need to validate the fix with a complete run of any relevant test. |
| Comment by Jin Lim [ 17/May/13 ] |
| The last build, 806, no longer shows the flushVBucket crash. Please confirm and close it. Thanks! |
[MB-8154] ep_queue_size is not decreasing 20 mins after load was stopped Created: 25/Apr/13 Updated: 17/May/13 Resolved: 17/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | couchbase-bucket |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Critical |
| Reporter: | Iryna Mironava | Assignee: | Mike Wiederhold |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
centOS 64 bit
build 2.0.2-772-rel <manifest><remote name="couchbase" fetch="git://github.com/couchbase/"/><remote name="membase" fetch="git://github.com/membase/"/><remote name="apache" fetch="git://github.com/apache/"/><remote name="erlang" fetch="git://github.com/erlang/"/><default remote="couchbase" revision="master"/><project name="tlm" path="tlm" revision="14fb7cc05baf418a57d33ab7dd0e7239645ec156"><copyfile src="Makefile.top" dest="Makefile"/></project><project name="bucket_engine" path="bucket_engine" revision="2a797a8d97f421587cce728f2e6aa2cd42c8fa26"/><project name="ep-engine" path="ep-engine" revision="e38e9e49855362bcab0fa72258d888cf2423e4d5"/><project name="libconflate" path="libconflate" revision="c0d3e26a51f25a2b020713559cb344d43ce0b06c"/><project name="libmemcached" path="libmemcached" revision="ca739a890349ac36dc79447e37da7caa9ae819f5" remote="membase"/><project name="libvbucket" path="libvbucket" revision="026c79ae424a6daed4bb9345e86cc8fc21759b28"/><project name="couchbase-cli" path="couchbase-cli" revision="af83ea2e04736c1e9977f59bdba3f2e3390a86d8" remote="couchbase"/><project name="memcached" path="memcached" revision="f5f43c6971d88c839ee78bcf87d6e7f177cef7b4" remote="membase"/><project name="moxi" path="moxi" revision="4b391021af7a453bf88716724d2c644916ebd969"/><project name="ns_server" path="ns_server" revision="3d51cec3c9bc31e9d4d4dd496993aa5e9c39a00b"/><project name="portsigar" path="portsigar" revision="159b6179ea8a3c2075ee9eb2afa6f91c98c0fda6"/><project name="sigar" path="sigar" revision="63a3cd1b316d2d4aa6dd31ce8fc66101b983e0b0"/><project name="couchbase-examples" path="couchbase-examples" revision="cd9c8600589a1996c1ba6dbea9ac171b937d3379"/><project name="couchbase-python-client" path="couchbase-python-client" revision="006c1aa8b76f6bce11109af8a309133b57079c4c"/><project name="couchdb" path="couchdb" revision="586e4bb73b92db4362192616370c4e3edb8c34a0"/><project name="couchdbx-app" path="couchdbx-app" 
revision="cf709acdb8ee24cef158a2007189184e1e0f8016"/><project name="couchstore" path="couchstore" revision="ddc4ba05ac9459994464aac973f5815abb9d8aa6"/><project name="geocouch" path="geocouch" revision="ed9ad43aa361df0829262fef811b5236331b44c8"/><project name="testrunner" path="testrunner" revision="96018840bf35a31ae43bc2c409cd6012ac27879e"/><project name="otp" path="otp" revision="b6dc1a844eab061d0a7153d46e7e68296f15a504" remote="erlang"/><project name="icu4c" path="icu4c" revision="26359393672c378f41f2103a8699c4357c894be7" remote="couchbase"/><project name="snappy" path="snappy" revision="5681dde156e9d07adbeeab79666c9a9d7a10ec95" remote="couchbase"/><project name="v8" path="v8" revision="447decb75060a106131ab4de934bcc374648e7f2" remote="couchbase"/><project name="gperftools" path="gperftools" revision="8f60ba949fb8576c530ef4be148bff97106ddc59" remote="couchbase"/><project name="pysqlite" path="pysqlite" revision="0ff6e32ea05037fddef1eb41a648f2a2141009ea" remote="couchbase"/></manifest> |
||
| Description |
|
1 default bucket, 3 replicas, 4 nodes cluster
Loaded 1M items. After the load stopped, ep_queue_size stayed at 16 for 20 minutes. Attaching logs. |
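A report like this one can be checked against collected stat samples. The following sketch assumes the samples are already available as (timestamp in seconds, ep_queue_size) pairs; how they are collected is out of scope here, and the function name `stuck_nonzero_seconds` is hypothetical.

```python
from typing import Sequence, Tuple


def stuck_nonzero_seconds(samples: Sequence[Tuple[float, int]]) -> float:
    """Return how long the latest nonzero stat value has been unchanged.

    samples: (timestamp_seconds, stat_value) pairs in ascending time order.
    Returns 0.0 if there are no samples or the latest value is zero.
    """
    if not samples:
        return 0.0
    last_time, last_value = samples[-1]
    if last_value == 0:
        return 0.0
    start = last_time
    # Walk backwards until the value differs from the latest one.
    for timestamp, value in reversed(samples):
        if value != last_value:
            break
        start = timestamp
    return last_time - start


# Illustrative samples only: the stat drops to 16 and then stops moving.
samples = [(0, 100), (60, 16), (600, 16), (1260, 16)]
print(stuck_nonzero_seconds(samples))  # 1200.0
```

A long plateau at a small nonzero value after the load has stopped is consistent with a counter that missed some decrements, as opposed to a queue that is still draining.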
| Comments |
| Comment by Iryna Mironava [ 25/Apr/13 ] |
|
logs: https://s3.amazonaws.com/bugdb/jira/MB-8154/16fc64ab/172.27.33.10-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-8154/16fc64ab/172.27.33.11-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-8154/16fc64ab/172.27.33.12-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-8154/16fc64ab/172.27.33.13-diag.zip |
| Comment by Chiyoung Seo [ 26/Apr/13 ] |
| I looked at the logs and seems to me that there are actually no dirty items in the disk write queue. This is just a stat bug. I will take a look at it more to see where the stat is not decremented correctly. |
| Comment by Mike Wiederhold [ 02/May/13 ] |
|
Chiyoung, I've been looking into a similar issue. If you don't have time to look at this please feel free to assign it to me. |
| Comment by Chiyoung Seo [ 02/May/13 ] |
| This is a stat update bug. I don't think we should promote it to critical. Please don't change the priority without understanding this issue and discussing it with the assignee. |
| Comment by Chiyoung Seo [ 10/May/13 ] |
|
Mike,
This stat issue happened before merging the MRW implementation. But, we still see this issue after merging the MRW. I think you did lots of refactoring around the flusher before. Please take a look at it. I'm running out of ideas. |
| Comment by Maria McDuff [ 14/May/13 ] |
| per bug triage, bumping to critical. we need status on this. |
| Comment by Mike Wiederhold [ 17/May/13 ] |
| We think the code with the multi-reader/writer fixed this issue and we have not seen it lately. Please file another issue if you run into it again. |
[MB-8312] we don't have Table 7.3. and Table 7.4. (couchbase-cli stuff) in online documentation Created: 17/May/13 Updated: 17/May/13 Resolved: 17/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | documentation |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | None |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Andrei Baranouski | Assignee: | Karen Zeller |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
I see that they are presented in http://www.couchbase.com/docs/couchbase-manual-2.0.pdf
Table 7.3. Administration — couchbase Tool Commands Table 7.4. Administration — Standard couchbase Tool Options but missed in online documentation for '7.4. couchbase-cli Tool' in http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-admin-cmdline-couchbase-cli.html |
| Comments |
| Comment by Karen Zeller [ 17/May/13 ] |
|
HI Andre,
This is because I have been asked to comment out the 2.0.2 command line commands + options until we actually release 2.0.2. They only appear in the PDF sent for review. Thanks, Karen |
[MB-8199] [2.0.2 - RN + Docs] many requests to views causes resource leak, crash Created: 04/May/13 Updated: 17/May/13 Resolved: 16/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | 2.0.1 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Matt Ingenthron | Assignee: | Abhinav Dangeti |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | customer, documentation | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | 4-core CPU, 16GB RAM, Linux | ||
| Operating System: | Centos 64-bit |
| Description |
|
In response to many view requests against the scatter/gather view merger, a node can allocate so many resources that it will fail to recover.
In one case, this did cause many timeouts in the log leading to max_restart_intensity: [error_logger:error,2013-04-25T15:23:26.047,ns_1@10.128.16.171:error_logger<0.6.0>:ale_error_logger_handler:log_report:72] =========================SUPERVISOR REPORT========================= Supervisor: {local,ns_node_disco_sup} Context: shutdown Reason: reached_max_restart_intensity Offender: [{pid,<0.17237.774>}, {name,ns_config_rep}, {mfargs,{ns_config_rep,start_link,[]}}, {restart_type,permanent}, {shutdown,1000}, {child_type,worker}] |
| Comments |
| Comment by Matt Ingenthron [ 04/May/13 ] |
| Note, I put this on 2.0.2 since I know it shouldn't be 2.1 and there does not appear to be a 2.0.3. I feared it would be lost if it didn't have a fixfor version. Please move as appropriate. |
| Comment by Maria McDuff [ 07/May/13 ] |
| per bug scrub, alk - can you chk if aleksey a. can take a look at this? |
| Comment by Aleksey Kondratenko [ 07/May/13 ] |
|
We know this problem so I don't believe we should look again. Fixing it for 2.0.2 feels a bit late but possible if really needed |
| Comment by Dipti Borkar [ 07/May/13 ] |
|
When you say, "we know this problem" can you elaborate on it a bit more? With more customers using views, they are likely to hit this as well. Can you help us understand the scenario a bit more? When this problem can happen? What is the probability of hitting this? |
| Comment by Aleksey Kondratenko [ 07/May/13 ] |
| If you send too many view requests to any node it'll swamp it and kill. I recall seeing that during pre-2.0 testing and there must be MB- somewhere. |
| Comment by Maria McDuff [ 09/May/13 ] |
|
per bug triage, upgrading to blocker.
the fix is to throttle the requests and not to crash/terminate. it's fine to be slow but not crash. alk k to take a look for 2.0.2 |
| Comment by Aliaksey Artamonau [ 16/May/13 ] |
| We merged a simple request that can be configured via internal settings: http://review.couchbase.org/26334. |
| Comment by Aleksey Kondratenko [ 16/May/13 ] |
|
It should also be noted that given we don't have experience how well this approach works in production we decided to have "unlimited" as default limits. We can try playing with that stuff in-house plus get some experience with customers after 2.0.2 is out and then we'll have enough data to enable it by default and set right limits. |
| Comment by Aleksey Kondratenko [ 16/May/13 ] |
| CHANGES text is here: http://review.couchbase.org/#/c/26361/2/CHANGES,unified |
| Comment by Matt Ingenthron [ 16/May/13 ] |
| Alk: we should request QE to develop a test for this. See it cause the problem in 2.0.1 and see it not cause the problem in 2.0.2, right? Assigning it to Maria for that purpose, then it should be closed perhaps when verified? Not sure what QE's process is here now. |
| Comment by Matt Ingenthron [ 16/May/13 ] |
| Maria: Can you work with the team on the appropriate way to test that this is fixed and won't cause other problems? |
| Comment by Maria McDuff [ 17/May/13 ] |
|
Abhinav, pls verify by:
- instrumenting a test that sends many view requests. do manual first then automate (if you already have a test that does similar test scenario such as this, just tweak that and use it here for this verification testing).
- verifying no crashes happen. if you observe slowness, note it here. slowness is ok.
- noting alk k's "unlimited" dflt limit set. verify all his changes on review link.
- using stable build of 2.0.2 which should be built tonight or tomorrow.
thanks. |
| Comment by Dipti Borkar [ 17/May/13 ] |
|
We also need to document this.

* (

  It's behavior is controlled by three parameters which can be set via
  /internalSettings REST endpoint:

  - restRequestLimit

    Maximum number of simultaneous connections each node should
    accept on REST port. Diagnostics related endpoints and
    /internalSettings are not counted.

  - capiRequestLimit

    Maximum number of simultaneous connections each node should
    accept on CAPI port. It should be noted that it includes XDCR
    connections.

  - dropRequestMemoryThresholdMiB

    The amount of memory used by Erlang VM that should not be
    exceeded. If it's exceeded the server will start dropping
    incoming connections.

  When the server decides to reject incoming connection because some
  limit was exceeded, it does so by responding with status code of 503
  and Retry-After header set appropriately (more or less). On REST
  port textual description of why request was rejected returned in a
  body. On CAPI port in CouchDB tradition a JSON object is returned
  with "error" and "reason" fields.

  By default all the thresholds are set to be unlimited. |
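A minimal client-side sketch of the settings described in the CHANGES text above. The three parameter names, the /internalSettings endpoint, the 503 status and the Retry-After header come from that text; everything else is an assumption: the helper names `build_limits_request` and `retry_delay` are hypothetical, the request is assumed to be a form-encoded POST on REST port 8091, and authentication is left out. The request is only built here, not sent.

```python
import urllib.parse
import urllib.request

# Parameter names taken from the CHANGES text quoted in this ticket.
ALLOWED = {"restRequestLimit", "capiRequestLimit", "dropRequestMemoryThresholdMiB"}


def build_limits_request(host, limits, port=8091):
    """Build (but do not send) a POST to /internalSettings.

    Assumption: the endpoint accepts form-encoded key=value pairs.
    """
    unknown = set(limits) - ALLOWED
    if unknown:
        raise ValueError("unknown settings: %s" % sorted(unknown))
    body = urllib.parse.urlencode(sorted(limits.items())).encode("ascii")
    url = "http://%s:%d/internalSettings" % (host, port)
    return urllib.request.Request(url, data=body, method="POST")


def retry_delay(status, headers, default=1.0):
    """Return seconds to wait after a 503 rejection, or None otherwise."""
    if status != 503:
        return None
    try:
        return float(headers.get("Retry-After", default))
    except ValueError:
        # Retry-After may also be an HTTP date; fall back to the default.
        return default


req = build_limits_request("127.0.0.1", {"capiRequestLimit": 50})
print(req.get_method(), req.full_url, req.data)
```

A client that receives 503 would sleep for `retry_delay(...)` seconds before retrying, which matches the intent stated in the ticket: requests may get slower, but the node should not crash.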
[MB-8310] Couchdb gerrit changes for master must use manifest 2.1-unstable.xml Created: 17/May/13 Updated: 17/May/13 Resolved: 17/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | build |
| Affects Version/s: | 2.1 |
| Fix Version/s: | None |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Filipe Manana | Assignee: | Phil Labee |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
2.1-stable.xml was abandoned, and any couchdb master gerrit change depends on couchstore revision more recent than the one listed in 2.1-stable.xml - this means the jobs couchdb-gerrit-views-master and couchdb-gerrit-views-pre-merge-master always fail.
|
| Comments |
| Comment by Volker Mische [ 17/May/13 ] |
| I've changed the builds to 2.1-unstable.xml. |
[MB-8153] [Doc'd] cbworkloadgen shows error import sqlite3 module Created: 24/Apr/13 Updated: 16/May/13 Resolved: 16/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | documentation, tools |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Critical |
| Reporter: | Thuan Nguyen | Assignee: | Thuan Nguyen |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | centos 5.7 64 bit | ||
| Description |
|
Install couchbase server 2.0.2-773
cbworkloadgen does not work. Error:
[root@cen-0408 thuan]# /opt/couchbase/bin/cbworkloadgen -h
Error: could not import sqlite3 module
[root@cen-0408 thuan]# cd /opt/couchbase/bin
[root@cen-0408 bin]# ./cbworkloadgen -h
Error: could not import sqlite3 module
[root@cen-0408 bin]# ./cbworkloadgen
Error: could not import sqlite3 module |
| Comments |
| Comment by Pavel Paulau [ 25/Apr/13 ] |
|
Interesting: # cat /etc/redhat-release CentOS release 5.8 (Final) # cat /opt/couchbase/VERSION.txt 2.0.2-774-rel # /opt/couchbase/bin/cbworkloadgen -h Usage: cbworkloadgen [options] Generate workload to destination. Examples: cbworkloadgen -n localhost:8091 cbworkloadgen -n 10.3.121.192:8091 -r .9 -i 100000 \ -s 100 -b my-other-bucket --threads=10 Options: -h, --help show this help message and exit -r .95, --ratio-sets=.95 set/get operation ratio -n 127.0.0.1:8091, --node=127.0.0.1:8091 node's ns_server ip:port -b default, --bucket=default insert data to a different bucket other than default -i 10000, --max-items=10000 number of items to be inserted -s 10, --size=10 minimum value size --prefix=pymc prefix to use for memcached keys or json ids -j, --json insert json data -l, --loop loop forever until interrupted by users -u USERNAME, --username=USERNAME REST username for cluster or server node -p PASSWORD, --password=PASSWORD REST password for cluster or server node -t 1, --threads=1 number of concurrent workers -v, --verbose verbose logging; more -v's provide more verbosity |
| Comment by Pavel Paulau [ 25/Apr/13 ] |
|
Thuan, could you provide output of: # ls -l /opt/couchbase/lib/python/ |
| Comment by Thuan Nguyen [ 26/Apr/13 ] |
|
this vm is available at 10.1.3.140 using key to login python version 2.7 |
| Comment by Bin Cui [ 26/Apr/13 ] |
| Looks like we cannot load sqlite again for this python version. |
| Comment by Bin Cui [ 26/Apr/13 ] |
|
-bash-3.2$ python Python 2.7 (r27:82500, Jul 29 2012, 09:49:59) [GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import sqlite3 Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/opt/python27/lib/python2.7/sqlite3/__init__.py", line 24, in <module> from dbapi2 import * File "/opt/python27/lib/python2.7/sqlite3/dbapi2.py", line 27, in <module> from _sqlite3 import * ImportError: No module named _sqlite3 -bash-3.2$ cd /opt/couchbase/lib/python -bash-3.2$ python Python 2.7 (r27:82500, Jul 29 2012, 09:49:59) [GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> from pysqlite2 import dbapi2 as sqlite3 Traceback (most recent call last): File "<stdin>", line 1, in <module> File "pysqlite2/dbapi2.py", line 27, in <module> from pysqlite2._sqlite import * ImportError: pysqlite2/_sqlite.so: undefined symbol: PyUnicodeUCS4_DecodeUTF8 >>> |
| Comment by Bin Cui [ 26/Apr/13 ] |
|
Quote: This is usually caused by a mismatch in the Unicode mode of the python interpreter and the extension module. Python can be built to use either 2-byte or 4-byte Unicode code points. If you build an extension on a Python interpreter that uses one, but use it on another, this error is the most common result. |
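The mismatch described in the quote above can be checked from inside the interpreter. On Python 2, `sys.maxunicode` is 65535 on a 2-byte (UCS2) build and 1114111 on a 4-byte (UCS4) build; from Python 3.3 onward it is always 1114111. The helper name `unicode_width` below is hypothetical, and this is only a diagnostic sketch, not part of the Couchbase tools.

```python
import sys


def unicode_width(maxunicode=None):
    """Classify an interpreter as a UCS2 or UCS4 build from sys.maxunicode."""
    if maxunicode is None:
        maxunicode = sys.maxunicode
    return "UCS4" if maxunicode > 0xFFFF else "UCS2"


print("this interpreter:", unicode_width())
```

An extension that references `PyUnicodeUCS4_*` symbols, like the `_sqlite.so` in the traceback above, was built against a UCS4 interpreter, so it would be expected to fail on an interpreter where this check reports UCS2.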
| Comment by Maria McDuff [ 29/Apr/13 ] |
| Per Bin, this is a release blocker... he's working with Pavel on this issue. |
| Comment by Pavel Paulau [ 29/Apr/13 ] |
| Thuan, just a quick question. How did you install Python 2.7 on this machine? Was it installed after Couchbase Server? |
| Comment by Pavel Paulau [ 29/Apr/13 ] |
| http://review.couchbase.org/#/c/25938/ |
| Comment by Thuan Nguyen [ 29/Apr/13 ] |
| On this vm, I just install couchbase server only. So I think python 2.7 is pre-installed before |
| Comment by Maria McDuff [ 06/May/13 ] |
| pls verify / close. |
| Comment by Thuan Nguyen [ 09/May/13 ] |
|
I still repro this bug in build 2.0.2-793 on centos 5.7 64 bit server 10.1.3.140 (same vm)
couchbase-server-enterprise_x86_64_2.0.2-793-rel.rpm [root@cen-0408 thuan]# rpm -i couchbase-server-enterprise_x86_64_2.0.2-793-rel.rpm Minimum RAM required : 4 GB System RAM configured : 3945388 kB Minimum number of processors required : 4 cores Number of processors on the system : 4 cores Starting couchbase-server[ OK ] You have successfully installed Couchbase Server. Please browse to http://cen-0408:8091/ to configure your server. Please refer to http://couchbase.com for additional resources. Please note that you have to update your firewall configuration to allow connections to the following ports: 11211, 11210, 11209, 4369, 8091, 8092 and from 21100 to 21299. By using this software you agree to the End User License Agreement. See /opt/couchbase/LICENSE.txt. [root@cen-0408 thuan]# /opt/couchbase/bin/cbworkloadgen -h Error: could not import sqlite3 module [[root@cen-0408 thuan]# /opt/couchbase/bin/sqlite3 SQLite version 3.7.2 Enter ".help" for instructions Enter SQL statements terminated with a ";" sqlite> [1]+ Stopped /opt/couchbase/bin/sqlite3 [root@cen-0408 thuan]# cd /opt/couchbase/bin/ [root@cen-0408 bin]# ./cbworkloadgen -h Error: could not import sqlite3 module [root@cen-0408 bin]# ./sqlite3 SQLite version 3.7.2 Enter ".help" for instructions Enter SQL statements terminated with a ";" sqlite> [2]+ Stopped ./sqlite3 [root@cen-0408 bin]# python -V Python 2.7 |
| Comment by Pavel Paulau [ 10/May/13 ] |
|
Normally Python 2.7 includes sqlite3, however you have a custom installation that was compiled without sqlite support. So it obviously fails.
Expected: $ python2.7 -c "import sqlite3" $ echo $? 0 Your machine: $ python2.7 -c "import sqlite3" Traceback (most recent call last): File "<string>", line 1, in <module> File "/opt/python27/lib/python2.7/sqlite3/__init__.py", line 24, in <module> from dbapi2 import * File "/opt/python27/lib/python2.7/sqlite3/dbapi2.py", line 27, in <module> from _sqlite3 import * ImportError: No module named _sqlite3 Normal recommendation in such cases is to install sqlite-devel and rebuild Python. Or to use default OS setup. Addressing such edge cases is too expensive effort IMHO. This is my input, PMs may have other suggestions. |
| Comment by Bin Cui [ 10/May/13 ] |
|
The current assumption is that we support the following python environments: 1. python 2.4 which doesn't have sqlite3 bundled. we will install our bundled version. 2. python 2.5 and above. python will have its own version of sqlite3 installed. But if it doesn't meet our sqlite3 version requirement, we will install our bundled version. 3. This QA setup is something that we never meet before and it is not a standard environment, to say the least. |
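The fallback order described above (use the interpreter's own sqlite3 if it loads, otherwise the bundled copy) could be sketched as follows. The module names `sqlite3` and `pysqlite2.dbapi2` appear in the interpreter session earlier in this ticket; the helper name `load_sqlite` is hypothetical, and this is not the actual code used by cbworkloadgen. The version requirement mentioned in the comment is not checked because the ticket does not state it.

```python
import importlib


def load_sqlite(candidates=("sqlite3", "pysqlite2.dbapi2")):
    """Return (module, errors): the first importable candidate, plus the
    import errors collected from candidates that failed before it."""
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name), errors
        except ImportError as exc:
            # Covers both a missing module and a broken extension
            # (e.g. an undefined-symbol failure while loading a .so).
            errors.append("%s: %s" % (name, exc))
    return None, errors


module, errors = load_sqlite()
if module is None:
    print("Error: could not import sqlite3 module")
    for line in errors:
        print("  " + line)
```

Reporting the collected errors, rather than only the final generic message, would have shown both failures seen in this ticket (the missing `_sqlite3` and the Unicode symbol mismatch) in one run.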
| Comment by Anil Kumar [ 10/May/13 ] |
|
tony to test this on clean VM to verify if this repro. 1. installing python 2.7 on clean vm 2. check if it comes with sqlite3 already |
| Comment by Pavel Paulau [ 11/May/13 ] |
| 0. Package "sqlite-devel" must be installed *before* Python 2.7 installation. |
| Comment by Maria McDuff [ 13/May/13 ] |
|
karen, pls doc sqlite-devel need to be installed first prior to python 2.7. thanks. tony, pls verify / close. thanks. |
| Comment by Karen Zeller [ 16/May/13 ] |
|
Added to RN 2.0.2: <rnentry> <version ver="2.0.0m"/> <class id="fix"/> <issue type="cb" ref=" <rntext> <para> In the past when you used <command>cbworkloadgen</command> you see this error <literal>ImportError: No module named _sqlite3</literal>. This has been fixed.</para> </rntext> </rnentry> |
| Comment by Thuan Nguyen [ 16/May/13 ] |
|
Test on build 2.0.2-804 with 3 vms with python 2.4.3, 2.6.5 and 2.7.1 cbworkloadgen works as expected /opt/couchbase/bin/cbworkloadgen -h Usage: cbworkloadgen [options] Generate workload to destination. Examples: cbworkloadgen -n localhost:8091 cbworkloadgen -n 10.3.121.192:8091 -r .9 -i 100000 \ -s 100 -b my-other-bucket --threads=10 Options: -h, --help show this help message and exit -r .95, --ratio-sets=.95 set/get operation ratio -n 127.0.0.1:8091, --node=127.0.0.1:8091 node's ns_server ip:port -b default, --bucket=default insert data to a different bucket other than default -i 10000, --max-items=10000 number of items to be inserted -s 10, --size=10 minimum value size --prefix=pymc prefix to use for memcached keys or json ids -j, --json insert json data -l, --loop loop forever until interrupted by users -u USERNAME, --username=USERNAME REST username for cluster or server node -p PASSWORD, --password=PASSWORD REST password for cluster or server node -t 1, --threads=1 number of concurrent workers -v, --verbose verbose logging; more -v's provide more verbosity |
Couchbase logo needs to be updated on UI, desktop and program-settings icon
(MB-7804)
|
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | installer |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Technical task | Priority: | Blocker |
| Reporter: | Anil Kumar | Assignee: | Bin Cui |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Comments |
| Comment by Anil Kumar [ 10/Apr/13 ] |
| Can you take a look at the Spec and logo assets and let me know if you have what you need to make the changes? |
| Comment by Steve Yen [ 16/Apr/13 ] |
|
Anil, Assigning to you to get the right assets from the visual design folk. Bin's sent email to you regarding what he needs. Once you've got the new stuff, please attach them here and reassign this jira issue back to Bin. Thanks, Steve |
| Comment by Bin Cui [ 18/Apr/13 ] |
| http://review.membase.org/#/c/25770/ |
| Comment by Maria McDuff [ 23/Apr/13 ] |
| pls verify / close. |
| Comment by Shashank Gupta [ 24/Apr/13 ] |
|
Verified. Build : 2.0.2-772-rel |
| Comment by Anil Kumar [ 02/May/13 ] |
| As discussed attached are new images please fix. |
| Comment by Bin Cui [ 02/May/13 ] |
| update revised bmp and icn files |
| Comment by Maria McDuff [ 06/May/13 ] |
| pls verify / close. |
| Comment by Shashank Gupta [ 07/May/13 ] |
| I couldn't find banner image in the latest build during setup. Attaching the screenshots of the old(having banner image) and new(no banner) setup processes. |
| Comment by Bin Cui [ 07/May/13 ] |
| I found the problem and fixed it at http://review.couchbase.org/#/c/26141/. It should be included in the next build. |
| Comment by Shashank Gupta [ 07/May/13 ] |
| Ok. Will verify then. |
| Comment by Shashank Gupta [ 09/May/13 ] |
| Verified with build 2.0.2-787 |
| Comment by Thuan Nguyen [ 16/May/13 ] |
|
The icon in
/cygdrive/c/Program Files/Couchbase/Server/share/couchdb/www/favicon.ico needs to be updated to the new logo |
| Comment by Bin Cui [ 16/May/13 ] |
| http://review.couchbase.org/#/c/26369/ |
[MB-8019] [2.0.2 RN + doc?] healthchecker - refresh for the new stats available in 2.0.0 Created: 10/Dec/12 Updated: 16/May/13 Resolved: 16/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | documentation, tools |
| Affects Version/s: | 2.0.1 |
| Fix Version/s: | 2.0.2 |
| Type: | Improvement | Priority: | Major |
| Reporter: | Steve Yen | Assignee: | Bin Cui |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | 2.0.2-release-notes, PM-PRIORITIZED | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Comments |
| Comment by Bin Cui [ 14/Dec/12 ] |
|
Here are new stats related to 2.0
Doc stats:
1. data size
2. disk size
3. actual disk size
View performance
1. data size
2. disk size
3. view ops
CompactionPerformance - fine grain analysis with thresholds
1. view fragmentation
2. doc fragmentation
IncomingXDCRPerformance - get/set ops ratio with thresholds
OutgoingXDCRPerformance
1. ops
2. replication queue length |
| Comment by Dipti Borkar [ 28/Feb/13 ] |
|
Anil has setup weekly meetings and will put together requirements and work with Bin.
this is getting more important as we have more customers, and support wants more visibility into the cluster. |
| Comment by Anil Kumar [ 10/Apr/13 ] |
| Bin: Any update on this bug? Will this be fixed before code-freeze on Friday? |
| Comment by Anil Kumar [ 11/Apr/13 ] |
|
Bin to update the bug with details on which stats were fixed.
|
| Comment by Bin Cui [ 23/Apr/13 ] |
| All the above stats are added to healthchecker already. |
| Comment by Maria McDuff [ 23/Apr/13 ] |
| pls verify / close. |
| Comment by Chisheng Hong [ 16/May/13 ] |
|
Doc stats + view performance: { "description": "View data size", "formula": "N/A", "status": "OK", "value": "882.565 MB" }, { "description": "View total disk size", "formula": "N/A", "status": "OK", "value": "2.303 GB" }, { "description": "Doc data size", "formula": "N/A", "status": "OK", "value": "16.711 GB" }, { "description": "Docs total disk size", "formula": "N/A", "status": "OK", "value": "19.995 GB" }, { "description": "Docs actual disk size", "formula": "N/A", "status": "OK", "value": "17.692 GB" } |
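As an illustration of how the entries above could feed the fragmentation stats requested earlier in this ticket, the sketch below parses two of them and derives a fragmentation ratio. The field names and values are copied from the output above. The formula `(disk size - data size) / disk size`, the 1024-based unit table and the helper names are assumptions for illustration, not the healthchecker's implementation.

```python
import json

# Assumption: units are binary multiples; the ticket does not say.
UNITS = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}

SAMPLE = """[
  {"description": "Doc data size", "formula": "N/A", "status": "OK", "value": "16.711 GB"},
  {"description": "Docs total disk size", "formula": "N/A", "status": "OK", "value": "19.995 GB"}
]"""


def to_bytes(text):
    """Convert a value such as '16.711 GB' to a number of bytes."""
    number, unit = text.split()
    return float(number) * UNITS[unit]


def fragmentation(data_size, disk_size):
    """Fraction of the on-disk size not occupied by live data (assumed formula)."""
    return (disk_size - data_size) / disk_size


entries = {e["description"]: e for e in json.loads(SAMPLE)}
doc_frag = fragmentation(
    to_bytes(entries["Doc data size"]["value"]),
    to_bytes(entries["Docs total disk size"]["value"]),
)
print("doc fragmentation: %.1f%%" % (100 * doc_frag))
```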
| Comment by Chisheng Hong [ 16/May/13 ] |
| CompactionPerformance stats are not available in build 2.0.2-804-rel |
| Comment by Bin Cui [ 16/May/13 ] |
|
http://review.couchbase.org/#/c/26367/ http://review.couchbase.org/#/c/26368/ |
[MB-8269] rebalance hang but log page shows it is completed Created: 13/May/13 Updated: 16/May/13 Resolved: 14/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Thuan Nguyen | Assignee: | Anil Kumar |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | regression | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | windows server 2008 R2 64bit | ||
| Attachments: |
|
| Operating System: | Windows 64-bit |
| Description |
|
Environment:
windows server 2008 R2 64bit Build: couchbase server 2.0.2-801 Run rebalance test ./testrunner -i fournode.ini -t failovertests.FailoverTests.test_failover_normal,replica=2,load_ratio=1 rebalance hang with zero percent progress but log page shows it is completed Link to manifest file of this build http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.2-801-rel.setup.exe.manifest.xml Link to cbcollect info of all nodes https://s3.amazonaws.com/packages.couchbase/collect_info/2_0_2/2013_05/4nodes-202-801_reb_hang_20130513-185157.tgz |
| Comments |
| Comment by Aliaksey Artamonau [ 14/May/13 ] |
| That's just how our logging works. If message is seen more than one time during a minute, only first occurrence is logged. And after some time we log how many repeated messages we've seen. |
| Comment by Maria McDuff [ 14/May/13 ] |
| anil, is this ok to close? |
| Comment by Anil Kumar [ 16/May/13 ] |
| had discussion with Aliaksey A, he will be updating this ticket with details. thanks |
| Comment by Anil Kumar [ 16/May/13 ] |
| The scenario was that three rebalance operations were run sequentially. The first and second finished quickly and produced two 'rebalance completed' messages. When there are multiple similar messages, only the first occurrence gets logged in the UI. The third operation was still running and had not completed yet. |
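The logging behavior described in the comments above (only the first occurrence of a message within a minute is logged, and a count of repeats is logged later) can be sketched as below. The one-minute window comes from the comment; the class name `DedupLog`, its methods and the summary wording are hypothetical and do not reflect the ns_server implementation, which is written in Erlang.

```python
class DedupLog:
    """Log the first occurrence of a message; count repeats inside the window."""

    def __init__(self, window=60.0):
        self.window = window
        self.seen = {}   # message -> [time of first occurrence, repeat count]
        self.out = []    # lines that would be shown on the log page

    def _flush_one(self, message):
        entry = self.seen.pop(message, None)
        if entry is not None and entry[1] > 0:
            self.out.append("%s (repeated %d times)" % (message, entry[1]))

    def log(self, now, message):
        entry = self.seen.get(message)
        if entry is not None and now - entry[0] < self.window:
            entry[1] += 1          # suppressed repeat
            return
        self._flush_one(message)   # report repeats from an expired window
        self.seen[message] = [now, 0]
        self.out.append(message)

    def flush(self, now):
        """Emit repeat summaries for messages whose window has expired."""
        for message in list(self.seen):
            if now - self.seen[message][0] >= self.window:
                self._flush_one(message)


log = DedupLog()
log.log(0, "rebalance completed")    # first rebalance: shown
log.log(5, "rebalance completed")    # second rebalance: suppressed for now
print(log.out)
log.flush(61)
print(log.out)
```

In the scenario from this ticket, the page shows a single 'rebalance completed' line while the third rebalance is still running, which is why the UI looked inconsistent with the progress bar.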
[MB-8242] Lazy computed outbound XDCR stats per replication are incorrect Created: 10/May/13 Updated: 16/May/13 Resolved: 16/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | cross-datacenter-replication |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Critical |
| Reporter: | Ketaki Gangal | Assignee: | Ketaki Gangal |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Operating System: | Centos 64-bit |
| Description |
|
1. Setup the Longevity cluster, West Master to East Master and West Master to West3 cluster on bucket0.
2. Load items.
3. Wait for replication to catch up for both the outbound replication streams.
Notice that the "Percentage complete" stats are exactly identical (incorrect) for the 2 replications (see attached screenshot).
1. Percentage Complete - Although it is 100 percent replicated, replication2 shows the same number as replication1, which is still replicating.
Cluster here : http://guinep-s10501:8091 |
| Comments |
| Comment by Junyi Xie [ 10/May/13 ] |
|
Is this the same test Tommie is running? If so, at the time of writing, there is nothing wrong with "Percentage Complete", both replication are done and "Percentage Complete" = 100% Where is your screenshot? Thanks. |
| Comment by Junyi Xie [ 10/May/13 ] |
| My screenshot looks fine. |
| Comment by Ketaki Gangal [ 10/May/13 ] |
| added |
| Comment by Ketaki Gangal [ 10/May/13 ] |
| That shows 100 percent after replication2 is complete, added the pending one above. |
| Comment by Maria McDuff [ 10/May/13 ] |
| bumping up to critical. |
| Comment by Junyi Xie [ 10/May/13 ] |
| The root cause of this issue is that today, in the ns_server stats collection infrastructure, lazy computed stats are on a per-bucket basis. If there is more than one outbound replication from that bucket, the current stats gathering code in menelaus_stats.erl is unable to compute per-replication XDCR lazy computed stats (percentage_of_completeness, meta_ops_latency and docs_ops_latency). |
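The difference between a per-bucket and a per-replication computation can be illustrated with a toy example. The counters `done` and `left`, the formula `100 * done / (done + left)` and the function names are all hypothetical, chosen only to show why one bucket-level number cannot describe two replications in different states. This is not the code in ns_server.

```python
def percent_complete(done, left):
    """Percentage of work finished; an idle replication counts as complete."""
    total = done + left
    return 100.0 if total == 0 else 100.0 * done / total


def per_bucket(replications):
    """One number for the whole bucket: every replication appears identical."""
    done = sum(r["done"] for r in replications.values())
    left = sum(r["left"] for r in replications.values())
    return percent_complete(done, left)


def per_replication(replications):
    """One number per outbound replication."""
    return {name: percent_complete(r["done"], r["left"])
            for name, r in replications.items()}


replications = {
    "replication1": {"done": 500, "left": 500},   # still replicating
    "replication2": {"done": 1000, "left": 0},    # caught up
}
print(per_bucket(replications))
print(per_replication(replications))
```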
| Comment by Junyi Xie [ 10/May/13 ] |
|
fix pending review on gerrit. http://review.couchbase.org/#/c/26247/ Thanks Aliaksey A for help |
| Comment by Junyi Xie [ 13/May/13 ] |
|
Alk has some concerns about the fix and wants to leave the bug as it is and address the issue later in 2.1. The discussion is at codereview. Leaving it to Dipti, Maria, Ravi and Yasheen to decide if we want to fix it in 2.0.2 |
| Comment by Damien Katz [ 14/May/13 ] |
| Talked with Alk. He may merge Junyi's code as is or modify to make it more efficient, he wants to look closer to see what should be done short term and long term as this module is getting a bit messy. |
| Comment by Junyi Xie [ 16/May/13 ] |
|
Fix merged. http://review.couchbase.org/#/c/26247/ |
| Comment by Maria McDuff [ 16/May/13 ] |
| fix merged. wait for the new build. |
[MB-8273] System test : observed slow swap-rebalance in 2.0.2 ( 15 hours+) Created: 14/May/13 Updated: 16/May/13 Resolved: 16/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | couchbase-bucket |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Critical |
| Reporter: | Ketaki Gangal | Assignee: | Chiyoung Seo |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Centos
Build - 202-800-rel cluster 10.6.2.42:8091 |
||
| Attachments: |
|
| Description |
|
1. Setup a 6 node cluster with 2 buckets.
2. Each bucket has 1 ddoc and 2 views.
3. Load 60M items on both buckets, data-size 512 bytes consistent
4. Start mutations on the cluster for 2 hours.
5. Wait for initial indexing to complete.
6. Swap rebalance 1-1 node.

- Seeing very high swap ~ 5G on most of the nodes.
- Further inspection of process using swap shows memcached is using most of swap space.
- The memory stats show very high fragmentation
- Views are highly fragmented (91 percent) across the cluster.

]# pgrep memcached
9551
# grep --color VmSwap /proc/9551/status
VmSwap: 5219956 kB

[root@orange-11601 ~]# /opt/couchbase/bin/cbstats localhost:11210 -b default allocator
NOTE: SMALL MEMORY MODEL IS IN USE, PERFORMANCE MAY SUFFER.
------------------------------------------------
MALLOC: 14060977616 (13409.6 MiB) Bytes in use by application
MALLOC: + 263405568 ( 251.2 MiB) Bytes in page heap freelist
MALLOC: + 5191139360 ( 4950.7 MiB) Bytes in central cache freelist
MALLOC: + 0 ( 0.0 MiB) Bytes in transfer cache freelist
MALLOC: + 2662928 ( 2.5 MiB) Bytes in thread cache freelists
MALLOC: + 137932952 ( 131.5 MiB) Bytes in malloc metadata
MALLOC: ------------
MALLOC: = 19656118424 (18745.5 MiB) Actual memory used (physical + swap)
MALLOC: + 126885888 ( 121.0 MiB) Bytes released to OS (aka unmapped)
MALLOC: ------------
MALLOC: = 19783004312 (18866.5 MiB) Virtual address space used
MALLOC:
MALLOC: 2348103 Spans in use
MALLOC: 20 Thread heaps in use
MALLOC: 8192 Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
------------------------------------------------
Size class breakdown
------------------------------------------------
class 1 [ 8 bytes ] : 838 objs; 0.0 MiB; 0.0 cum MiB
class 2 [ 16 bytes ] : 2856430 objs; 43.6 MiB; 43.6 cum MiB
class 3 [ 32 bytes ] : 684331 objs; 20.9 MiB; 64.5 cum MiB
class 4 [ 48 bytes ] : 10987 objs; 0.5 MiB; 65.0 cum MiB
class 5 [ 64 bytes ] : 21645 objs; 1.3 MiB; 66.3 cum MiB
class 6 [ 80 bytes ] : 5347446 objs; 408.0 MiB; 474.3 cum MiB
class 7 [ 96 bytes ] : 905 objs; 0.1 MiB; 474.4 cum MiB
class 8 [ 112 bytes ] : 130 objs; 0.0 MiB; 474.4 cum MiB
class 9 [ 128 bytes ] : 337 objs; 0.0 MiB; 474.4 cum MiB
class 10 [ 144 bytes ] : 224 objs; 0.0 MiB; 474.4 cum MiB
class 11 [ 160 bytes ] : 260 objs; 0.0 MiB; 474.5 cum MiB
class 12 [ 176 bytes ] : 43 objs; 0.0 MiB; 474.5 cum MiB
class 13 [ 192 bytes ] : 83 objs; 0.0 MiB; 474.5 cum MiB
class 14 [ 208 bytes ] : 64 objs; 0.0 MiB; 474.5 cum MiB
class 15 [ 224 bytes ] : 70 objs; 0.0 MiB; 474.5 cum MiB
class 16 [ 240 bytes ] : 66 objs; 0.0 MiB; 474.6 cum MiB
class 17 [ 256 bytes ] : 127 objs; 0.0 MiB; 474.6 cum MiB
class 18 [ 288 bytes ] : 41 objs; 0.0 MiB; 474.6 cum MiB
class 19 [ 320 bytes ] : 76 objs; 0.0 MiB; 474.6 cum MiB
class 20 [ 352 bytes ] : 119 objs; 0.0 MiB; 474.7 cum MiB
class 21 [ 384 bytes ] : 57 objs; 0.0 MiB; 474.7 cum MiB
class 22 [ 416 bytes ] : 28 objs; 0.0 MiB; 474.7 cum MiB
class 23 [ 448 bytes ] : 52 objs; 0.0 MiB; 474.7 cum MiB
class 24 [ 480 bytes ] : 36 objs; 0.0 MiB; 474.7 cum MiB
class 25 [ 512 bytes ] : 907 objs; 0.4 MiB; 475.2 cum MiB
class 26 [ 576 bytes ] : 165 objs; 0.1 MiB; 475.3 cum MiB
class 27 [ 640 bytes ] : 18 objs; 0.0 MiB; 475.3 cum MiB
class 28 [ 704 bytes ] : 60 objs; 0.0 MiB; 475.3 cum MiB
class 29 [ 768 bytes ] : 4872097 objs; 3568.4 MiB; 4043.7 cum MiB
class 30 [ 896 bytes ] : 42 objs; 0.0 MiB; 4043.8 cum MiB
class 31 [ 1024 bytes ] : 96 objs; 0.1 MiB; 4043.9 cum MiB
class 32 [ 1152 bytes ] : 50 objs; 0.1 MiB; 4043.9 cum MiB
class 33 [ 1280 bytes ] : 38 objs; 0.0 MiB; 4044.0 cum MiB
class 34 [ 1408 bytes ] : 33 objs; 0.0 MiB; 4044.0 cum MiB
class 35 [ 1536 bytes ] : 20 objs; 0.0 MiB; 4044.0 cum MiB
class 36 [ 1792 bytes ] : 78 objs; 0.1 MiB; 4044.2 cum MiB
class 37 [ 2048 bytes ] : 122 objs; 0.2 MiB; 4044.4 cum MiB
class 38 [ 2304 bytes ] : 12 objs; 0.0 MiB; 4044.4 cum MiB
class 39 [ 2560 bytes ] : 5 objs; 0.0 MiB; 4044.5 cum MiB
class 40 [ 2816 bytes ] : 16 objs; 0.0 MiB; 4044.5 cum MiB
class 42 [ 3328 bytes ] : 52 objs; 0.2 MiB; 4044.7 cum MiB
class 43 [ 4096 bytes ] : 29 objs; 0.1 MiB; 4044.8 cum MiB
class 47 [ 6656 bytes ] : 64 objs; 0.4 MiB; 4045.2 cum MiB
class 48 [ 8192 bytes ] : 14 objs; 0.1 MiB; 4045.3 cum MiB
class 49 [ 9216 bytes ] : 27 objs; 0.2 MiB; 4045.5 cum MiB
class 53 [ 16384 bytes ] : 7 objs; 0.1 MiB; 4045.6 cum MiB
class 54 [ 20480 bytes ] : 1 objs; 0.0 MiB; 4045.7 cum MiB
class 57 [ 32768 bytes ] : 13 objs; 0.4 MiB; 4046.1 cum MiB
class 61 [ 65536 bytes ] : 1 objs; 0.1 MiB; 4046.1 cum MiB
class 85 [ 262144 bytes ] : 1 objs; 0.2 MiB; 4046.4 cum MiB
------------------------------------------------
PageHeap: 26 sizes; 251.2 MiB free; 121.0 MiB unmapped
------------------------------------------------
1 pages * 27211 spans ~ 212.6 MiB; 212.6 MiB cum; unmapped: 5.4 MiB; 5.4 MiB cum
2 pages * 3062 spans ~ 47.8 MiB; 260.4 MiB cum; unmapped: 12.3 MiB; 17.7 MiB cum
3 pages * 1018 spans ~ 23.9 MiB; 284.3 MiB cum; unmapped: 18.4 MiB; 36.1 MiB cum
4 pages * 721 spans ~ 22.5 MiB; 306.8 MiB cum; unmapped: 20.1 MiB; 56.2 MiB cum
5 pages * 175 spans ~ 6.8 MiB; 313.7 MiB cum; unmapped: 6.8 MiB; 63.0 MiB cum
6 pages * 113 spans ~ 5.3 MiB; 319.0 MiB cum; unmapped: 5.3 MiB; 68.3 MiB cum
7 pages * 50 spans ~ 2.7 MiB; 321.7 MiB cum; unmapped: 2.7 MiB; 71.0 MiB cum
8 pages * 101 spans ~ 6.3 MiB; 328.0 MiB cum; unmapped: 6.1 MiB; 77.1 MiB cum
9 pages * 23 spans ~ 1.6 MiB; 329.6 MiB cum; unmapped: 1.6 MiB; 78.7 MiB cum
10 pages * 18 spans ~ 1.4 MiB; 331.0 MiB cum; unmapped: 1.4 MiB; 80.1 MiB cum
11 pages * 10 spans ~ 0.9 MiB; 331.9 MiB cum; unmapped: 0.9 MiB; 80.9 MiB cum
12 pages * 11 spans ~ 1.0 MiB; 332.9 MiB cum; unmapped: 1.0 MiB; 82.0 MiB cum
13 pages * 1 spans ~ 0.1 MiB; 333.0 MiB cum; unmapped: 0.1 MiB; 82.1 MiB cum
14 pages * 2 spans ~ 0.2 MiB; 333.2 MiB cum; unmapped: 0.2 MiB; 82.3 MiB cum
15 pages * 3 spans ~ 0.4 MiB; 333.6 MiB cum; unmapped: 0.4 MiB; 82.6 MiB cum
16 pages * 3 spans ~ 0.4 MiB; 334.0 MiB cum; unmapped: 0.2 MiB; 82.9 MiB cum
17 pages * 1 spans ~ 0.1 MiB; 334.1 MiB cum; unmapped: 0.0 MiB; 82.9 MiB cum
20 pages * 1 spans ~ 0.2 MiB; 334.2 MiB cum; unmapped: 0.2 MiB; 83.0 MiB cum
21 pages * 1 spans ~ 0.2 MiB; 334.4 MiB cum; unmapped: 0.2 MiB; 83.2 MiB cum
29 pages * 1 spans ~ 0.2 MiB; 334.6 MiB cum; unmapped: 0.2 MiB; 83.4 MiB cum
49 pages * 18 spans ~ 6.9 MiB; 341.5 MiB cum; unmapped: 6.9 MiB; 90.3 MiB cum
58 pages * 1 spans ~ 0.5 MiB; 342.0 MiB cum; unmapped: 0.5 MiB; 90.8 MiB cum
75 pages * 1 spans ~ 0.6 MiB; 342.6 MiB cum; unmapped: 0.6 MiB; 91.4 MiB cum
98 pages * 8 spans ~ 6.1 MiB; 348.7 MiB cum; unmapped: 6.1 MiB; 97.5 MiB cum
100 pages * 1 spans ~ 0.8 MiB; 349.5 MiB cum; unmapped: 0.8 MiB; 98.3 MiB cum
122 pages * 1 spans ~ 1.0 MiB; 350.4 MiB cum; unmapped: 1.0 MiB; 99.2 MiB cum
>255 large * 13 spans ~ 21.8 MiB; 372.2 MiB cum; unmapped: 21.8 MiB; 121.0 MiB cum
[root@orange-11601 ~]#

[root@orange-11601 ~]# /opt/couchbase/bin/cbstats localhost:11210 -b default memory
bytes: 8468018888
ep_kv_size: 8190890236
ep_max_data_size: 13212057600
ep_mem_high_wat: 11230248960
ep_mem_low_wat: 9909043200
ep_mem_tracker_enabled: true
ep_oom_errors: 0
ep_overhead: 128884714
ep_tmp_oom_errors: 0
ep_value_size: 6628874716
mem_used: 8468018888
tcmalloc_current_thread_cache_bytes: 2581856
tcmalloc_max_thread_cache_bytes: 4194304
tcmalloc_unmapped_bytes: 126844928
total_allocated_bytes: 14068670960
total_fragmentation_bytes: 5313338896
total_free_bytes: 263061504
total_heap_bytes: 19645071360

added screenshot of the cluster |
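The memory counters in this description can be cross-checked with a little arithmetic. The sketch below is only illustrative: the function name is hypothetical, and the relationship "fragmentation = heap - allocated - free" is an inference from the reported numbers (it happens to reproduce total_fragmentation_bytes exactly), not something stated in the ticket.

```python
def heap_breakdown(stats):
    """Derive fragmentation figures from cbstats-style memory counters."""
    heap = stats["total_heap_bytes"]
    allocated = stats["total_allocated_bytes"]
    free = stats["total_free_bytes"]
    # Assumption: bytes that are neither allocated nor on the page-heap
    # freelist are counted as fragmentation.
    fragmentation = heap - allocated - free
    return {
        "fragmentation_bytes": fragmentation,
        "fragmentation_pct_of_heap": 100.0 * fragmentation / heap,
    }


# Values copied from the `cbstats ... memory` output in this ticket.
reported = {
    "total_heap_bytes": 19645071360,
    "total_allocated_bytes": 14068670960,
    "total_free_bytes": 263061504,
    "total_fragmentation_bytes": 5313338896,
}

result = heap_breakdown(reported)
print(result)
```

On these figures roughly 27 percent of the memcached heap is fragmentation, which is in line with the 5GB overhead mentioned later in the comments.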
| Comments |
| Comment by Mike Wiederhold [ 15/May/13 ] |
| Just for bug distribution. |
| Comment by Maria McDuff [ 16/May/13 ] |
| Per bug triage, leaving this at critical; we need to determine what the degradation is (if any) between 2.0.1 and 2.0.2 with swap rebalance (a system in very high swap is a known issue). |
| Comment by Jin Lim [ 16/May/13 ] |
| Per discussion with Chiyoung, he has worked on an existing bug tracking a similar symptom. Assigning it to him to follow up from his end. Thanks. |
| Comment by Chiyoung Seo [ 16/May/13 ] |
|
Each node has 32GB of physical memory, and we allocated 20GB for the default and sasl-bucket memory quota. The heap size of the memcached process always remained under 20GB, although it has 5GB of fragmentation overhead. If the Erlang indexer is also memory intensive, I can easily imagine heavy swap happening on that node. As we know, this is a known issue.
As this bug mainly indicates slow swap rebalance in 2.0.2 compared with 2.0.1, I will close it as a duplicate of http://www.couchbase.com/issues/browse/MB-8066
I have been working with the QE and perf teams on the rebalance performance regression. Please refer to the above bug for more details. |
| Comment by Chiyoung Seo [ 16/May/13 ] |
| http://www.couchbase.com/issues/browse/MB-8066 |
| Comment by Maria McDuff [ 16/May/13 ] |
| MB-8066 |
[MB-8303] [system test] Rebalance in hangs with some initial vbucket movement due to bucket state bouncing between not ready and active Created: 16/May/13 Updated: 16/May/13 Resolved: 16/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | bucket-engine, ns_server |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Chisheng Hong | Assignee: | Aliaksey Artamonau |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | build 2.0.2-800-rel | ||
| Operating System: | Centos 64-bit |
| Description |
|
Cluster ip is 172.23.105.23
1. Create an 8-node cluster; each node has 12G RAM and HDD storage.
2. Create 2 buckets, default and saslbucket, with memory quotas of 6G and 4G.
3. Run the KV use case for 2 days: load 35M items into each bucket, bring the resident ratio to 70%~80%, and access the data at 15k ops/sec with 5% create, 5% delete, 5% expire, 5% update, 80% gets for several hours.
4. Then, with the same workload, run some rebalance operations.

When proceeding to rebalance in 2 nodes, rebalance hangs after several vbucket movements (17 vbuckets for saslbucket). The log shows the bucket state changing repeatedly between not ready and active:

root@cola-s10305:/opt/couchbase/var/lib/couchbase/logs# tail -f error.1
<<"replication_building_1013_'ns_1@172.23.105.33'">>} took too long: 1196997 us
[ns_server:error,2013-05-16T15:13:17.111,ns_1@172.23.105.32:<0.11769.0>:ns_memcached:verify_report_long_call:294]call {stats,<<>>} took too long: 556521 us
[ns_server:error,2013-05-16T15:19:19.834,ns_1@172.23.105.32:<0.11769.0>:ns_memcached:verify_report_long_call:294]call {get_tap_docs_estimate,1013,
<<"replication_building_1013_'ns_1@172.23.105.33'">>} took too long: 725657 us
[ns_server:error,2013-05-16T15:22:18.618,ns_1@172.23.105.32:ns_memcached-saslbucket<0.11752.0>:ns_memcached:handle_info:671]handle_info(ensure_bucket,..) took too long: 661061 us
[ns_server:error,2013-05-16T15:23:35.127,ns_1@172.23.105.32:<0.11704.0>:ns_memcached:verify_report_long_call:294]call {stats,<<>>} took too long: 568954 us
[ns_server:error,2013-05-16T15:25:59.291,ns_1@172.23.105.32:<0.11705.0>:ns_memcached:verify_report_long_call:294]call topkeys took too long: 1217646 us
[ns_server:error,2013-05-16T15:30:17.553,ns_1@172.23.105.32:ns_doctor<0.9998.0>:ns_doctor:update_status:234]The following buckets became not ready on node 'ns_1@172.23.105.23': ["saslbucket"], those of them are active ["saslbucket"]
[ns_server:error,2013-05-16T15:30:37.540,ns_1@172.23.105.32:ns_doctor<0.9998.0>:ns_doctor:update_status:234]The following buckets became not ready on node 'ns_1@172.23.105.23': ["saslbucket"], those of them are active ["saslbucket"]

diag links: https://s3.amazonaws.com/bugdb/jira/MB-8303/8nodes_202-800_rebalance_hang_20130516-143321.tgz |
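When triaging a log like the excerpt in this description, it can help to pull out the slow-call durations and count the "became not ready" reports per node and bucket. The following is a minimal sketch; the function names are hypothetical, and the regular expressions are written only against the message shapes quoted in this ticket, so other ns_server log formats may need different patterns.

```python
import re
from collections import Counter

SLOW_CALL = re.compile(r"took too long: (\d+) us")
NOT_READY = re.compile(
    r"buckets became not ready on node '([^']+)': \[([^\]]*)\]"
)


def slow_call_durations(log_text):
    """Return the duration in microseconds of every 'took too long' entry."""
    return [int(m.group(1)) for m in SLOW_CALL.finditer(log_text)]


def not_ready_events(log_text):
    """Count 'became not ready' reports per (node, bucket) pair."""
    counts = Counter()
    for node, buckets in NOT_READY.findall(log_text):
        for bucket in re.findall(r'"([^"]+)"', buckets):
            counts[(node, bucket)] += 1
    return counts


# Three entries copied from the log excerpt in this ticket.
SAMPLE = r'''[ns_server:error,2013-05-16T15:13:17.111,ns_1@172.23.105.32:<0.11769.0>:ns_memcached:verify_report_long_call:294]call {stats,<<>>} took too long: 556521 us
[ns_server:error,2013-05-16T15:25:59.291,ns_1@172.23.105.32:<0.11705.0>:ns_memcached:verify_report_long_call:294]call topkeys took too long: 1217646 us
[ns_server:error,2013-05-16T15:30:17.553,ns_1@172.23.105.32:ns_doctor<0.9998.0>:ns_doctor:update_status:234]The following buckets became not ready on node 'ns_1@172.23.105.23': ["saslbucket"], those of them are active ["saslbucket"]
'''

durations = slow_call_durations(SAMPLE)
events = not_ready_events(SAMPLE)
print(durations, dict(events))
```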
| Comments |
| Comment by Aliaksey Artamonau [ 16/May/13 ] |
| You're using an old build. Some fixes have been merged for exactly this issue. |
| Comment by Aliaksey Artamonau [ 16/May/13 ] |
|
|
[MB-8272] there is README file from Erlang distro (on Windows) that should be removed Created: 14/May/13 Updated: 16/May/13 Resolved: 16/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | installer |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Thuan Nguyen | Assignee: | Thuan Nguyen |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | centos, redhat and deb | ||
| Operating System: | Centos 64-bit |
| Description |
|
In Couchbase 2.0.2-xx for Mac and Windows, there is a README file in the root directory. I don't see this README file in the rpm and deb packages after install.
Is that what we intend for the rpm and deb packages, or is it a bug? |
| Comments |
| Comment by Bin Cui [ 15/May/13 ] |
|
http://review.couchbase.org/#/c/26318/
We should not include this README file since it comes from the Erlang distribution. We should remove it from the Mac release too. |
| Comment by Bin Cui [ 16/May/13 ] |
| We need to remove this README file from erlang from MAC release too. |
| Comment by Maria McDuff [ 16/May/13 ] |
| fix merged. assigning to QE. |
| Comment by Maria McDuff [ 16/May/13 ] |
| Tony, just to clarify, are you referring to the couchbase readme file or erlang readme file? |
[MB-8013] Implement detailed rebalance progress report Created: 21/May/12 Updated: 16/May/13 Resolved: 16/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | 2.0, 2.0.1 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Improvement | Priority: | Blocker |
| Reporter: | Dipti Borkar | Assignee: | Maria McDuff |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | 2.0.2-release-notes, PM-PRIORITIZED, info-request | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Description |
|
"We need to measure progress better. When vbuckets have lots of items we'll currently display progress as 'stuck' because rebalance progress is currently measure in terms of vbucket movements.
From the user's perspective we should be able to track how many items/bytes need to be moved and how many are left. This will likely need some help from the ep-engine folks.

One of the goals of 1.8.1 is better rebalance progress estimation. In particular, we should present the user with an ETA that's not too far off.

We discussed this with Chiyoung today. Here's what we came up with as an initial approach.

ns_server is aware of all vbucket movements that need to be done. In 1.8.x (maybe later than 1.8.1) we will also build replicas during rebalance, so building of replicas will be taken into account in the same way as takeovers.

For each needed movement we will look at vbucket stats on the source and destination and see whether backfill is needed. If backfill is needed, we know how many items (or bytes, hopefully) will be moved. If backfill is not needed, we will look at checkpoint stats and get the same information.

Then, for each in-flight vbucket movement, we already have a stat that tells us how many items are pending in a particular tap cursor.

We will use vbucket movement completion (how many vbucket movements are done out of the total count of movements we need) and this stat (how far the currently in-flight movements are from done) to get the % of completion. From the rate of change of % completion we'll get the ETA.

We know this does not take into account ongoing mutations or temporary OOM NAKs, but hopefully the estimate still won't be too far off. I think that because we're going to refresh our estimates periodically by looking at vbucket stats for vbuckets whose movement is still pending, it will account for ongoing mutations and should work well enough.

http://www.pivotaltracker.com/story/show/22972101 |
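The estimation approach described here (completed movements plus partial credit for in-flight movements, then an ETA from the rate of change) could be sketched as follows. This is an illustration under stated assumptions, not the ns_server implementation: the function names, the (items_total, items_pending) input shape, and the equal weighting of every vbucket movement are all hypothetical.

```python
def rebalance_progress(total_moves, done_moves, in_flight):
    """Estimate the fraction of the rebalance that is complete.

    in_flight is a list of (items_total, items_pending) pairs, one per
    vbucket movement that has started but not finished.
    """
    if total_moves == 0:
        return 1.0
    partial = 0.0
    for items_total, items_pending in in_flight:
        if items_total > 0:
            # Credit the share of items already drained from the tap cursor.
            partial += 1.0 - min(items_pending, items_total) / items_total
    return (done_moves + partial) / total_moves


def eta_seconds(prev_fraction, cur_fraction, elapsed_seconds):
    """Extrapolate time remaining from the change between two samples."""
    rate = (cur_fraction - prev_fraction) / elapsed_seconds
    if rate <= 0:
        return None  # no measurable progress, so no meaningful ETA
    return (1.0 - cur_fraction) / rate


# 4 movements needed, 1 done, 1 in flight with half of its items pending.
fraction = rebalance_progress(4, 1, [(100, 50)])
eta = eta_seconds(0.25, fraction, 10.0)
print(fraction, eta)
```

Refreshing the inputs periodically, as the description suggests, lets the estimate absorb ongoing mutations instead of modelling them explicitly.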
| Comments |
| Comment by Aleksey Kondratenko [ 05/Oct/12 ] |
|
Assigning back to Dipti to reflect the real-world situation. We discussed this matter yesterday, and it's clear that a better understanding of our options is needed. In particular, we need to understand better what sort of information we can get from existing ep-engine stats. |
| Comment by Aleksey Kondratenko [ 08/Oct/12 ] |
|
We discussed this matter. The decision is: a) fix replica building to reflect its progress in the REBALANCE column of stats; b) maintain a 'count of items moved' and display it in the UI. It will be integrated from the rebalance tap streams' drain rate minus the rebalance tap streams' backoff rate. |
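Option b) amounts to integrating a net rate over time. A minimal sketch, assuming periodic samples of (timestamp, drain rate, backoff rate) are available; the sample format, the function name, and the clamping of negative net rates to zero are assumptions made for illustration only.

```python
def items_moved(samples):
    """Integrate (drain rate - backoff rate) over time.

    samples is a time-ordered list of (timestamp_seconds, drain_rate,
    backoff_rate) tuples with rates in items per second. Each sample's
    net rate is held until the next sample (a left Riemann sum).
    """
    moved = 0.0
    for (t0, drain, backoff), (t1, _, _) in zip(samples, samples[1:]):
        net_rate = max(drain - backoff, 0.0)  # assumption: never count backwards
        moved += net_rate * (t1 - t0)
    return moved


total = items_moved([(0, 100, 10), (1, 100, 0), (3, 50, 50)])
print(total)
```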
| Comment by Dipti Borkar [ 08/Oct/12 ] |
| raising all must have's to blocker |
| Comment by Dipti Borkar [ 11/Oct/12 ] |
|
based on further discussion with Aliaksey K and Chiyoung, we do not currently have the infrastructure to easily add an increasing count for number of items transferred.
moving this out to .next (candidate for 2.0.1) Aliaksey A and Chiyoung, PLEASE ADD details of the investigation here. |
| Comment by Aleksey Kondratenko [ 15/Feb/13 ] |
|
We need a bit of support from ep-engine folks. See |
| Comment by Farshid Ghods [ 10/Mar/13 ] |
|
Anil, Can you please share the word document which had more information about this feature with QE ( iryna is the QE lead for this feature - Iryna Mironava <irynamironava@yandex.ru> ) |
| Comment by Maria McDuff [ 10/Apr/13 ] |
|
Per Anil (PM), a must-have for 2.0.2. Mike to sync up with Alk today and reach out to Chiyoung. Need ETA from Dev to complete this feature.
https://docs.google.com/spreadsheet/ccc?key=0AqedrMvnwW3FdHJkQ3h3WS1SRVF1a1VkU2NjMG1ZX0E#gid=0 |
| Comment by Wayne Siu [ 12/Apr/13 ] |
| Adding Karen to the watch list as this requires documentation update. |
| Comment by Maria McDuff [ 08/May/13 ] |
|
code already merged.
|
| Comment by Maria McDuff [ 08/May/13 ] |
|
Deep, pls verify / close. thanks. karen, pls update doc progress. thanks. |
| Comment by Anil Kumar [ 13/May/13 ] |
|
We went through the current changes for this feature. We decided to make a few changes to the representation of the stats to make it much clearer to users.
Below are the details, intended to make rebalance progress clearer on the Source and Destination Nodes.

1. When the node is a Source Node, and also when it is both a Source and a Destination Node, we should have the text below.

[BOLD] Data being transferred out
Bucket: default (2 out 2)
Total number of keys: 1572823
Estimated number of keys: 524371
Number of Active# vBuckets and Replica vBuckets
Active#-111 Replica#-111

[BOLD] Data being transferred in
Bucket: sasl (1 out 2)
Total number of keys: 15728
Estimated number of keys: 5243
Number of Active# vBuckets and Replica vBuckets
Active#-344 Replica#-121

2. When the node is only a Destination Node, we should have the text below.

[BOLD] Data being transferred in
Bucket: sasl (1 out 2)
Total number of keys: 15728
Estimated number of keys: 5243
Number of Active# vBuckets and Replica vBuckets
Active#-344 Replica#-121

Thanks! |
| Comment by Karen Zeller [ 13/May/13 ] |
|
Ok, is this ready to get a screen shot now (UI frozen)?
Consolidating: Doc counterpart at MB-8062 |
| Comment by Aliaksey Artamonau [ 13/May/13 ] |
| No, there will be a slight change. |
| Comment by Anil Kumar [ 16/May/13 ] |
| update from Aliaksey A, this is out for code-review and will be merged today. |
| Comment by Aliaksey Artamonau [ 16/May/13 ] |
| http://review.couchbase.org/26333 |
[MB-7269] cbtransfer/cbrestore throws BadStatusLine exception when using wrong port number Created: 27/Nov/12 Updated: 16/May/13 Resolved: 16/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | tools |
| Affects Version/s: | 2.0 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Steve Yen | Assignee: | Bin Cui |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
Thanks to Tony Fonager....
When cbtransfer is invoked incorrectly against the wrong port (11211 rather than 8091), the tool should give a more polite, useful error message rather than an ugly stack trace.

[root@cbnode05 bin]# ./cbtransfer /root/Backup/cbtransfer http://192.168.0.75:11211 -b quizmo -B quizmo -u username -p password
Traceback (most recent call last):
  File "/opt/couchbase/lib/python/cbtransfer", line 33, in <module>
    pump_transfer.exit_handler(pump_transfer.Transfer().main(sys.argv))
  File "/opt/couchbase/lib/python/pump_transfer.py", line 85, in main
    sink_class, sink).run()
  File "/opt/couchbase/lib/python/pump.py", line 100, in run
    rv, source_map, sink_map = self.check_endpoints()
  File "/opt/couchbase/lib/python/pump.py", line 150, in check_endpoints
    rv, sink_map = self.sink_class.check(self.opts, self.sink_spec, source_map)
  File "/opt/couchbase/lib/python/pump_cb.py", line 71, in check
    rv, sink_map = pump.rest_couchbase(opts, spec)
  File "/opt/couchbase/lib/python/pump.py", line 879, in rest_couchbase
    rest_request_json(host, int(port), user, pswd, path)
  File "/opt/couchbase/lib/python/pump.py", line 856, in rest_request_json
    reason=reason)
  File "/opt/couchbase/lib/python/pump.py", line 829, in rest_request
    resp = conn.getresponse()
  File "/usr/lib64/python2.6/httplib.py", line 990, in getresponse
    response.begin()
  File "/usr/lib64/python2.6/httplib.py", line 391, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python2.6/httplib.py", line 355, in _read_status
    raise BadStatusLine(line)
httplib.BadStatusLine |
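One way to replace the stack trace with a readable message is to catch the low-level exceptions around the REST call and map them to one-line errors. The sketch below is written for Python 3 (`http.client` rather than the `httplib` module in the traceback) and is not the actual pump.py fix; `describe_rest_failure` and `check_rest_endpoint` are hypothetical names, and the message wording is illustrative.

```python
import http.client
import socket


def describe_rest_failure(exc, host, port):
    """Turn a low-level exception into a one-line message for the user."""
    target = "%s:%d" % (host, port)
    if isinstance(exc, http.client.BadStatusLine):
        # The peer answered, but not with HTTP: probably not the REST port.
        return ("error: %s did not answer with a valid HTTP response; "
                "please check that this is the REST port" % target)
    if isinstance(exc, socket.timeout):
        return "error: timed out waiting for a REST response from %s" % target
    if isinstance(exc, OSError):
        return ("error: could not access REST API: %s; exception: %s"
                % (target, exc))
    return "error: unexpected failure talking to %s: %r" % (target, exc)


def check_rest_endpoint(host, port, path="/pools/default/buckets", timeout=5.0):
    """Return (status, None) on success or (None, message) on failure."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status, None
    except (http.client.HTTPException, OSError) as exc:
        return None, describe_rest_failure(exc, host, port)
    finally:
        conn.close()


# Offline demonstration of the message mapping only (no network needed).
bad_status_msg = describe_rest_failure(
    http.client.BadStatusLine("ERROR"), "192.168.0.75", 11211)
refused_msg = describe_rest_failure(
    ConnectionRefusedError(111, "Connection refused"), "10.3.3.100", 11223)
print(bad_status_msg)
print(refused_msg)
```

Passing a timeout to the connection also bounds the case where the remote port accepts the connection but never replies.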
| Comments |
| Comment by Bin Cui [ 27/Nov/12 ] |
| http://review.couchbase.org/#/c/22854/ |
| Comment by Maria McDuff [ 27/Mar/13 ] |
| Bin -- can you ask Pavel to review your code changes? Thanks. |
| Comment by Anil Kumar [ 28/Mar/13 ] |
| Pavel will code review and this will be checked-in. |
| Comment by Anil Kumar [ 10/Apr/13 ] |
| Bin: If this is already code-reviewed can you merge the changes. |
| Comment by Steve Yen [ 16/Apr/13 ] |
|
Hi Bin, marking this resolved as Bin says it's been merged. Steve |
| Comment by Maria McDuff [ 18/Apr/13 ] |
| pls verify / close using 2.0.2 latest build. |
| Comment by Shashank Gupta [ 29/Apr/13 ] |
|
Tried using build 2.0.2-767. Command used:
./cbtransfer http://Administrator:password@10.3.3.100:11223 http://Administrator:password@10.3.3.100:8091 -b default -B def
Got the following message:
error: could not access REST API: 10.3.3.100:11223/pools/default/buckets; please check source URL, username (-u) and password (-p); exception: (111, 'Connection refused')
which I guess is correct.
But when I give the port number as 11211, it does nothing and hangs. No error message or exception is thrown. Below is the command I used:
./cbtransfer http://Administrator:password@10.3.3.100:11211 http://Administrator:password@10.3.3.100:8091 -b default -B def
The same is the case with cbrestore. |
| Comment by Maria McDuff [ 06/May/13 ] |
| pls see shashank's comment. |
| Comment by Bin Cui [ 13/May/13 ] |
| 11211 is used by moxi for the memcached ASCII protocol, and it will wait for a response from the customer. That's why it hangs. |
| Comment by Maria McDuff [ 13/May/13 ] |
|
Bin, do you think the same err msg shld be returned even for port 11211? see below: error: could not access REST API: 10.3.3.100:11223/pools/default/buckets; please check source URL, username (-u) and password (-p); exception: (111, 'Connection refused') |
| Comment by Bin Cui [ 13/May/13 ] |
|
Unfortunately, the answer is no. 11211 is a valid port number, but it is reserved for moxi rather than ns_server, so we connect to moxi instead. I talked to Steve about this issue, and he suggested it is a low-priority bug.
Two options to solve this problem:
1. Implement a timeout for this operation, which is the preferred way.
2. Check the port number up front. This is not recommended because we don't want to hard-code any port number in our logic.
All in all, this is an edge case. |
| Comment by Maria McDuff [ 14/May/13 ] |
| per bug triage, #1 is fine to implement. |
| Comment by Bin Cui [ 14/May/13 ] |
|
Having thought it over again, option one is still not good or clear enough. First, after a timeout we still cannot clearly tell customers that the port number is incorrect for the cbtransfer tool, and a timeout may be caused by other issues too. Second, we have a clear context: these ports are reserved for the moxi service and are by no means used for REST API calls. We can identify this case and give customers an accurate answer about the input error.
http://review.couchbase.org/#/c/26301/ |
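The up-front check favoured in this last comment can be sketched as a lookup on the URL's port before any connection is attempted. This is an assumption-laden illustration rather than the change in the linked review: the function name and message are hypothetical, and the table lists only port 11211 because that is the only moxi port named in this ticket.

```python
from urllib.parse import urlsplit

# Hypothetical table of ports known not to serve the REST API.
NON_REST_PORTS = {
    11211: "moxi (memcached ASCII protocol)",
}


def check_rest_port(url):
    """Return an error message if the URL points at a known non-REST port."""
    port = urlsplit(url).port
    service = NON_REST_PORTS.get(port)
    if service is None:
        return None
    return ("error: port %d is reserved for %s and does not serve the "
            "REST API; please check the port number in %s"
            % (port, service, url))


ok = check_rest_port("http://Administrator:password@10.3.3.100:8091")
bad = check_rest_port("http://Administrator:password@10.3.3.100:11211")
print(ok)
print(bad)
```

Unlike a timeout, this lets the tool say exactly what is wrong with the input, which is the argument made in the comment.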
[MB-8280] cbrecovery should check if tap batch requests are processed correctly before moving to next batch Created: 14/May/13 Updated: 16/May/13 Resolved: 16/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | tools |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Improvement | Priority: | Major |
| Reporter: | Bin Cui | Assignee: | Bin Cui |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
All recovered messages are pumped into the destination cluster in batch mode. Batch message processing may fail because of slow traffic or poor server performance.
The recovery tool should make sure every batch is processed correctly before moving to the next one. Requesting an acknowledgement for the last message in the batch may be an option. |
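The behaviour requested here, confirming each batch before starting the next, can be sketched with a generic sender. Everything below is hypothetical scaffolding: `send_batch` stands in for whatever transport cbrecovery uses and is assumed to return True once the destination has acknowledged the last message of the batch; the retry limit is likewise an assumption.

```python
def send_in_batches(messages, batch_size, send_batch, max_retry=3):
    """Send messages batch by batch, moving on only after an acknowledgement.

    send_batch(batch) must return True when the destination acknowledged
    the last message in the batch, and False otherwise.
    """
    sent = 0
    for start in range(0, len(messages), batch_size):
        batch = messages[start:start + batch_size]
        for _ in range(max_retry):
            if send_batch(batch):
                break  # acknowledged, safe to move to the next batch
        else:
            raise RuntimeError(
                "batch starting at message %d was not acknowledged after "
                "%d attempts" % (start, max_retry))
        sent += len(batch)
    return sent


# Demonstration with a fake destination that drops the first attempt.
attempts = []


def flaky_destination(batch):
    attempts.append(list(batch))
    return len(attempts) != 1


total_sent = send_in_batches(list(range(5)), 2, flaky_destination)
print(total_sent, attempts)
```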
| Comments |
| Comment by Bin Cui [ 14/May/13 ] |
| http://review.couchbase.org/#/c/26299/ |
[MB-8226] Implement ns_server side memcached API for new get_meta_batch and update_meta_batch Created: 08/May/13 Updated: 16/May/13 Resolved: 16/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | cross-datacenter-replication, ns_server |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.1 |
| Security Level: | Public |
| Type: | Bug | Priority: | Critical |
| Reporter: | Junyi Xie | Assignee: | Junyi Xie |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | PM-PRIORITIZED | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
In MB-8213, ep_engine team will create two new batch operations, get_meta_batch and update_meta_batch, which will be built on new protocol. From ns_server side, we need to implement new memcached APIs correspondingly.
|
| Comments |
| Comment by Matt Ingenthron [ 08/May/13 ] |
| What's the motivation for these batch operations? I ask because there is one project this may affect. |
| Comment by Junyi Xie [ 08/May/13 ] |
|
Matt,
We need these because XDCR will be able to replicate to the remote memcached directly in 2.1 (today XDCR can only replicate to CAPI), which is expected to boost XDCR performance in different ways.
Today memcached only supports single-key metadata operations and document updates (getMeta and set/delWithMeta respectively). We do replicate in batches to CAPI today, but within CAPI there is a loop that translates the batch into a list of getMeta or set/delWithMeta ops and then relays them to ep_engine. Tomorrow we will bypass these loops and talk to memcached directly, so ep_engine needs to have such batch operations. |
| Comment by Junyi Xie [ 16/May/13 ] |
| Please use MB-8300, a subtask of MB-8213 to track it. Close as duplicate |
[MB-8034] couchbase-cli needs to add the new rest api for optimistic XDCR Created: 06/Apr/13 Updated: 16/May/13 Resolved: 16/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | tools |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Improvement | Priority: | Major |
| Reporter: | Junyi Xie | Assignee: | Bin Cui |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
Please see Dipti's comment in Task to document at this ticket: http://www.couchbase.com/issues/secure/RapidBoard.jspa?rapidView=22&view=planning&selectedIssue=MB-7959&epics=hidden |
| Comments |
| Comment by Maria McDuff [ 29/Apr/13 ] |
| Bin will work on this feature this week. |
| Comment by Aliaksey Artamonau [ 01/May/13 ] |
| ns_server part merged: http://review.couchbase.org/#/c/26003/ |
| Comment by Bin Cui [ 01/May/13 ] |
| http://review.couchbase.org/#/c/26011/ |
| Comment by Andrei Baranouski [ 08/May/13 ] |
|
Build 2.0.2-789
[root@localhost bin]# ./couchbase-cli setting-xdcr -c localhost --optimistic-replication-threshold=100
Traceback (most recent call last):
  File "/opt/couchbase/lib/python/couchbase-cli", line 195, in <module>
    main()
  File "/opt/couchbase/lib/python/couchbase-cli", line 180, in main
    c = commands[cmd]()
  File "/opt/couchbase/lib/python/xdcr.py", line 21, in __init__
    self.rest_cmd = rest_cmds['xdcr-setup']
NameError: global name 'rest_cmds' is not defined |
| Comment by Bin Cui [ 08/May/13 ] |
| http://review.couchbase.org/#/c/26201/ |
| Comment by Maria McDuff [ 13/May/13 ] |
| pls verify / close. |
| Comment by Abhinav Dangeti [ 16/May/13 ] |
|
The cli command doesn't work right.
root@plum-008:~# /opt/couchbase/bin/couchbase-cli setting-xdcr -c 10.3.3.61:8091 --optimistic-replication-threshold=512
ERROR: command: setting-xdcr: 10.3.3.61:8091, local variable 'opts' referenced before assignment |
| Comment by Bin Cui [ 16/May/13 ] |
| http://review.couchbase.org/#/c/26351/ |
[MB-8298] remote_cluster_info module should return remote memcached access info Created: 16/May/13 Updated: 16/May/13 Resolved: 16/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | cross-datacenter-replication, ns_server |
| Affects Version/s: | 2.1 |
| Fix Version/s: | 2.1 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Junyi Xie | Assignee: | Aliaksey Artamonau |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
In 2.1, XDCR will provide an option to replicate to remote memcached instead of CAPI.
This requires the XDCR core infrastructure to understand how to access memcached at the remote cluster. For example, in order to replicate vbucket 123 of remote bucket "default", XDCR needs to know:
1) the node (IP address) for this vbucket at the remote cluster;
2) the memcached port of the node where the vbucket lives;
3) the credentials to access memcached at that node.
Module remote_cluster_info is a natural home for such information. Today 1) is already encoded in the vbucketmap maintained by remote_cluster_info, but 2) and 3) are not available. This task will expand the remote_cluster_info module to include 2) and 3).
In particular, XDCR needs an API to return 1), 2) and 3) above, e.g.,
remote_cluster_info:fetch_remote_memcached_info(RemoteClusterId, Bucket, VBucket)
returns
{"10.3.114.2", 11998, "_admin", "_password"} |
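The lookup described here can be modelled in a few lines. The real module is Erlang; the Python sketch below only illustrates the data flow from vbucket map to node, port, and credentials. The dictionary layout, the "first entry of a vbucket map row is the active node" convention, and the per-bucket credential storage are all assumptions made for the example.

```python
def fetch_remote_memcached_info(cluster, bucket, vbucket):
    """Return (node, memcached_port, user, password) for a remote vbucket."""
    bucket_info = cluster["buckets"][bucket]
    # Assumption: each vbucket map row lists the active node first.
    node = bucket_info["vbucket_map"][vbucket][0]
    port = cluster["memcached_ports"][node]
    user, password = bucket_info["credentials"]
    return (node, port, user, password)


# Hypothetical remote cluster description, using the example values
# from this ticket for vbucket 123 of bucket "default".
remote_cluster = {
    "memcached_ports": {"10.3.114.2": 11998, "10.3.114.3": 11998},
    "buckets": {
        "default": {
            "vbucket_map": {123: ["10.3.114.2", "10.3.114.3"]},
            "credentials": ("_admin", "_password"),
        },
    },
}

info = fetch_remote_memcached_info(remote_cluster, "default", 123)
print(info)
```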
| Comments |
| Comment by Junyi Xie [ 16/May/13 ] |
| Assign to owner of remote_cluster_info |
| Comment by Junyi Xie [ 16/May/13 ] |
| close this one, please use MB-8299 to track. |
| Comment by Junyi Xie [ 16/May/13 ] |
| MB-8299 |
[MB-8139] couchbase-cli - not able to add server Created: 22/Apr/13 Updated: 16/May/13 Resolved: 16/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | tools |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Critical |
| Reporter: | Shashank Gupta | Assignee: | Bin Cui |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Linux 64-bit | ||
| Description |
|
I am trying to add a server using couchbase-cli. Command:
./couchbase-cli server-add -c caper-002.hq.northscale.net:8091 --server-add=caper-004.hq.northscale.net:8091 -u Administrator -p password
and getting the following error:
ERROR: unable to server-add caper-004.hq.northscale.net:8091 (400) Bad Request
[u'Prepare join failed. Authentication failed. Verify username and password. Got HTTP status 401 from REST call post to http://caper-004.hq.northscale.net:8091/engageCluster2. Body was: []']
ERROR: command: server-add: caper-002.hq.northscale.net:8091, 2

Tried using IP address instead of hostname:
./couchbase-cli server-add -c caper-002.hq.northscale.net:8091 --server-add=10.3.3.96:8091 -u Administrator -p password
but getting the same error:
ERROR: unable to server-add 10.3.3.96:8091 (400) Bad Request
[u'Prepare join failed. Authentication failed. Verify username and password. Got HTTP status 401 from REST call post to http://10.3.3.96:8091/engageCluster2. Body was: []']
ERROR: command: server-add: 10.3.3.100:8091, 2

Build used: 2.0.2-769-rel |
| Comments |
| Comment by Maria McDuff [ 29/Apr/13 ] |
| just synced with Bin. He is working on this at the moment.... |
| Comment by Maria McDuff [ 29/Apr/13 ] |
| per bug scrub, moving to 2.1. |
| Comment by Bin Cui [ 16/May/13 ] |
|
When adding a node to a cluster, you need to give not only the cluster user/password but also the user/password for the node being added.
Example:
./couchbase-cli server-add -c node1:8091 --server-add=node2:8091 -u Administrator -p password --server-add-user=Administrator --server-add-password=password |
[MB-8229] [Doc'd] Couchbase UI shows a node down when Rest API is used to rename a node in a multiple node cluster Created: 09/May/13 Updated: 16/May/13 Resolved: 16/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Critical |
| Reporter: | Shashank Gupta | Assignee: | Karen Zeller |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Windows 64-bit
Build - 2.0.2-781 |
||
| Attachments: |
|
| Operating System: | Windows 64-bit |
| Description |
|
When we change a node's reference from hostname to IP in a multiple-node cluster using the REST API, the UI of another node shows that the first node is down.
In detail, steps followed:
1. Take a 2-node cluster whose nodes are referred to by hostname (let's say node A and node B).
2. Rename node A from its hostname to its corresponding IP, using the REST API node/controller/rename.
Result: Node A gets renamed to its corresponding IP, but the UI of the other node (i.e. node B) shows that node A is down.
Attaching cbcollectinfo from both nodes. Also uploading a screenshot of the UI. Please see. |
| Comments |
| Comment by Maria McDuff [ 10/May/13 ] |
| bumping up to critical. |
| Comment by Aliaksey Artamonau [ 13/May/13 ] |
| We merged the change that disallows renaming nodes that are part of a cluster: http://review.couchbase.org/26252. |
| Comment by Anil Kumar [ 13/May/13 ] |
|
Aliaksey A, can you resolve this bug and assign it to QE for verifying it. |
| Comment by Maria McDuff [ 14/May/13 ] |
| pls verify (update with your test output) then assign to karen for documentation. |
| Comment by Shashank Gupta [ 15/May/13 ] |
|
Verified. I got this message after running my test: error 400 reason: unknown ["Renaming is disallowed for nodes that are already part of a cluster"] |
| Comment by Karen Zeller [ 15/May/13 ] |
|
Hi,
Can we advise people to ignore this error? Or is there another workaround/fix? Regards, Karen |
| Comment by Aliaksey Artamonau [ 15/May/13 ] |
| The last error is intended behavior. So there's no workaround. Hostnames can only be assigned and changed when node is uninitialized or if it's the only node in the cluster. |
| Comment by Karen Zeller [ 15/May/13 ] |
|
Added to RN, Noted in REST section that this change must be made prior to adding a node to the cluster:
+ <rnentry> + + <version ver="2.0.0m"/> + + <class id="cluster"/> + + <issue type="cb" ref=" + + + <rntext> + + <para> +If you use the REST-API to change a node hostname when the node is already part of a cluster, Couchbase Web Console +will show that the node is down. You should only provide a hostname via the REST-API when a node is not yet part of a cluster. + For more information, see <xref linkend="couchbase-getting-started-hostnames" />.</para> + + + </rntext> + + </rnentry> |
| Comment by Aliaksey Artamonau [ 15/May/13 ] |
| As discussed in person, nothing should be added to the release notes. Only the behavior that a node cannot be renamed after it has been added to a cluster should be documented. If one tries to do so, Couchbase Server won't show the node as down; it will just reject the request with the error referred to above. |
| Comment by Karen Zeller [ 16/May/13 ] |
|
Removed from release notes. Added note in REST section: If you use this method, you should provide the hostname before you add a node to a cluster. If you provide a hostname for a node that is already part of a Couchbase cluster, the server will reject the request and return <literal>error 400 reason: unknown ["Renaming is disallowed for nodes that are already part of a cluster"]</literal>. |
| Comment by Aliaksey Artamonau [ 16/May/13 ] |
| This "error 400 reason: unknown" comes apparently from testing framework. ns_server just returns "Renaming is disallowed for nodes that are already part of a cluster" in the response body with status code 400. |
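A minimal sketch of how a client might interpret the rename response, assuming only what the comments above state: ns_server answers with status code 400 and the quoted message in the response body. The function name and return shape are hypothetical and are not part of ns_server or any Couchbase tool.

```python
# Message quoted in the comments above; everything else here is illustrative.
RENAME_REJECTED_MSG = (
    "Renaming is disallowed for nodes that are already part of a cluster"
)


def interpret_rename_response(status_code, body):
    """Classify a rename response as (ok, reason).

    Hypothetical helper: it only inspects a status code and a body string
    that the caller has already obtained from the REST API.
    """
    if 200 <= status_code < 300:
        return True, "renamed"
    if status_code == 400 and RENAME_REJECTED_MSG in body:
        # Intended behavior: rename must happen before the node joins.
        return False, "node already belongs to a cluster; rename it before adding it"
    return False, "unexpected response: %d %s" % (status_code, body)
```

With this shape, a caller can tell the intended rejection apart from other failures and advise the user to set the hostname before adding the node to a cluster.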
[MB-8282] -b bucket selection option is not filtering other buckets for cbhealthchecker Created: 14/May/13 Updated: 16/May/13 Resolved: 15/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | tools |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Chisheng Hong | Assignee: | Chisheng Hong |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | build-2.0.2-800-rel | ||
| Operating System: | Centos 64-bit |
| Description |
|
1. Create 2 buckets, saslbucket and default.
2. Start the front-end workload.
3. When the system is in DGM state, run cbhealthchecker for the default bucket only:
[jenkins@centos-54-x64-302 ~]$ /opt/couchbase/bin/cbhealthchecker -c 172.23.105.23:8091 -u Administrator -p password -b default
bucket: default
node: 172.23.105.29 11210
node: 172.23.105.23 11210
node: 172.23.105.30 11210
node: 172.23.105.31 11210
node: 172.23.105.25 11210
node: 172.23.105.28 11210
node: 172.23.105.32 11210
node: 172.23.105.26 11210
................................
Traceback (most recent call last):
  File "/opt/couchbase/lib/python/cbhealthchecker", line 134, in ?
    main()
  File "/opt/couchbase/lib/python/cbhealthchecker", line 121, in main
    scale = retriever.collect_data(bucket, cluster, user, password, inputfile, statsfile, scale, opts, outputdir)
  File "/opt/couchbase/lib/python/collector.py", line 316, in collect_data
    self.get_ns_stats(bucketlist, server, port, user, password, bucketname, scale, opts)
  File "/opt/couchbase/lib/python/collector.py", line 279, in get_ns_stats
    if stats_buffer.bucket_info[bucket_name]["bucketType"] == 'memcached':
KeyError: u'saslbucket'
The -b option is not working properly. When we iterate over stats_buffer.bucket_info, we still look up the sasl bucket. |
| Comments |
| Comment by Bin Cui [ 15/May/13 ] |
| http://review.couchbase.org/#/c/26315/ |
| Comment by Chisheng Hong [ 16/May/13 ] |
|
Verified with build 2.0.2-804-rel on centos.
[root@slv-0701 bin]# ./cbhealthchecker -c 172.23.105.23:8091 -u Administrator -p password -b default
bucket: default
node: 172.23.105.23 11210
node: 172.23.105.25 11210
node: 172.23.105.28 11210
Traceback (most recent call last):
  File "/opt/couchbase/lib/python/collector.py", line 208, in get_mc_stats_per_node
    node_stats = mc.stats(cmd)
  File "/opt/couchbase/lib/python/cb_bin_client.py", line 383, in stats
    cmd, opaque, cas, klen, extralen, data = self._handleKeyedResponse(None)
  File "/opt/couchbase/lib/python/cb_bin_client.py", line 89, in _handleKeyedResponse
    cmd, errcode, opaque, cas, keylen, extralen, rv = self._recvMsg()
  File "/opt/couchbase/lib/python/cb_bin_client.py", line 71, in _recvMsg
    raise exceptions.EOFError("Got empty data (remote died?).")
EOFError: Got empty data (remote died?).
node: 172.23.105.29 11210
node: 172.23.105.32 11210
node: 172.23.105.27 11210
node: 172.23.105.26 11210
node: 172.23.105.30 11210
node: 172.23.105.31 11210
node: 172.23.105.33 11210
Traceback (most recent call last):
  File "/opt/couchbase/lib/python/collector.py", line 208, in get_mc_stats_per_node
    node_stats = mc.stats(cmd)
  File "/opt/couchbase/lib/python/cb_bin_client.py", line 383, in stats
    cmd, opaque, cas, klen, extralen, data = self._handleKeyedResponse(None)
  File "/opt/couchbase/lib/python/cb_bin_client.py", line 89, in _handleKeyedResponse
    cmd, errcode, opaque, cas, keylen, extralen, rv = self._recvMsg()
  File "/opt/couchbase/lib/python/cb_bin_client.py", line 71, in _recvMsg
    raise exceptions.EOFError("Got empty data (remote died?).")
EOFError: Got empty data (remote died?).
................................
The run finished successfully. Please find html output at '/opt/couchbase/bin/reports/2013-05-16_12-28-47.html' and text output at '/opt/couchbase/bin/reports/2013-05-16_12-28-47.txt'. |
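The KeyError in the description comes from looking up a bucket name that is not present in the filtered bucket_info mapping. The following is a minimal sketch of one way to avoid that kind of lookup failure; it is not the change made in review 26315, and both function names are hypothetical. Only the "bucketType" key and the 'memcached' value are taken from the traceback above.

```python
def select_buckets(bucket_info, requested=None):
    """Return only the buckets to check.

    bucket_info maps bucket name -> settings dict. When `requested` is
    given (as with the -b option), every other bucket is filtered out.
    """
    if requested is None:
        return dict(bucket_info)
    if requested not in bucket_info:
        raise ValueError("unknown bucket: %s" % requested)
    return {requested: bucket_info[requested]}


def memcached_buckets(bucket_info, bucket_names):
    """List the names whose type is 'memcached'.

    Names missing from bucket_info are skipped instead of raising KeyError.
    """
    return [
        name
        for name in bucket_names
        if bucket_info.get(name, {}).get("bucketType") == "memcached"
    ]
```

The design choice here is to treat the filtered mapping as the single source of truth, so any later loop over cluster-wide bucket names tolerates names that were filtered out.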
[MB-7996] [system test] rebalance hang when add a node to cluster Created: 01/Apr/13 Updated: 16/May/13 Resolved: 16/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | couchbase-bucket |
| Affects Version/s: | 2.0.1 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Thuan Nguyen | Assignee: | Thuan Nguyen |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | system-test, windows | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | windows physical servers 2008 R2 64bit | ||
| Attachments: |
|
| Description |
|
Install Couchbase Server 2.0.1-185 on 4 physical servers with 2 separate disks.
Create a cluster with 3 nodes: 10.2.1.61, 10.2.1.62, 10.2.1.63.
Create 2 buckets: default (14GB) and sasl (10GB).
No views or XDCR created.
Load 20+ million items into both buckets until the resident ratio on both buckets is around 90%.
Access the cluster for 3 hours with the spec on this page: http://hub.internal.couchbase.com/confluence/pages/viewpage.action?pageId=6785119
Add node 10.2.1.64 to the cluster and rebalance. Rebalance failed. Filed bug
Start rebalance again. Rebalance hangs.
Link to collect info of all nodes: https://s3.amazonaws.com/packages.couchbase/collect_info/2_0_1/201304/4phy-win-201_185-reb-not-moving_node-61-erl-frozen-20130401-122033.tgz
Link to manifest file of the build: http://builds.hq.northscale.net/latestbuilds/couchbase-server-community_x86_64_2.0.1-185-rel.setup.exe.manifest.xml |
| Comments |
| Comment by Maria McDuff [ 01/Apr/13 ] |
| see screen shot during fail rebalance. rebalance fails on each attempt. upgrading to blocker. |
| Comment by Thuan Nguyen [ 01/Apr/13 ] |
|
We may need erlang coredump from node 61. So don't restart couchbase server on node 61. We need to log into erlang shell of another unix server and kill erlang process on node 61 from that unix server. I am waiting for what to do next in this cluster. It is in failed state now. |
| Comment by Maria McDuff [ 02/Apr/13 ] |
| moving to 2.0.2 |
| Comment by Thuan Nguyen [ 02/Apr/13 ] |
|
Link to erl dump https://s3.amazonaws.com/packages.couchbase/erlang/windows/erl.DMP Link to erl dump compressed https://s3.amazonaws.com/packages.couchbase/erlang/windows/erl.DMP.zip |
| Comment by Karen Zeller [ 16/Apr/13 ] |
| Confirmed with Abhinav- internal only, not for RN 4/16 |
| Comment by Mike Wiederhold [ 24/Apr/13 ] |
|
From the memcached logs
Fri Mar 29 18:33:17.306365 Pacific Daylight Time 3: Warning: failed to save docs to database, numDocs = 8913 error=error reading file [errno = 0: `No error', WINAPI error = 0: `The operation completed successfully. ']
Fri Mar 29 18:33:17.306365 Pacific Daylight Time 3: Warning: commit failed, cannot save CouchDB docs for vbucket = 98 rev = 5
Fri Mar 29 18:33:17.306365 Pacific Daylight Time 3: Fatal error in persisting DELETE ``00036F335614CE7BFA4BA5C5'' on vb 98!!! Requeue it...
Fri Mar 29 18:33:17.306365 Pacific Daylight Time 3: Fatal error in persisting SET ``00082BE4EBB99A947E0AFC47'' on vb 98!!! Requeue it...
The rebalance hangs because persistence stops, which causes the server memory to become full, which in turn causes the tap streams to stop moving data. See |
| Comment by Chiyoung Seo [ 03/May/13 ] |
|
Tony,
From the log, there were no database compactions for the sasl bucket that had the above write commit failure. It seems to me that this wasn't caused by the race between the database compactor and the flusher. As the issue happened during the rebalance, it might be caused by the race between the vbucket database reset and the flusher. We recently had a similar race issue between the vbucket reset and the flusher while working on the multi reader and writer, and merged a couple of fixes. Tony, as the issue was seen in a 2.0.1 build, I suggest you test the same scenario with the latest 2.0.2 build to see if we still have the same issue. |
| Comment by Thuan Nguyen [ 07/May/13 ] |
|
I could not verify this bug since I was blocked by bug I will verify it when bug MB-7735 is fixed. |
| Comment by Mike Wiederhold [ 07/May/13 ] |
| I just gave Abhinav a toy build with Chiyoung's fix for MB-7735. If the results look okay we can merge that fix. Otherwise I might need help from Chiyoung with that issue. |
| Comment by Maria McDuff [ 10/May/13 ] |
| chiyoung waiting for toybuild test result (chisheng/abhinav testing). |
| Comment by Maria McDuff [ 16/May/13 ] |
|
possible dupe of |
[MB-8293] Alert UI annoyingly flashes as metadata is used Created: 15/May/13 Updated: 16/May/13 Resolved: 16/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | UI |
| Affects Version/s: | 2.0.1 |
| Fix Version/s: | None |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Matt Ingenthron | Assignee: | Aleksey Kondratenko |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | customer | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Description |
|
As we go beyond 50% metadata, the UI will show alerts in the console. Problem is, the alerts display non-stop, with no way to disable them, making use of the UI very hard. One has to constantly close the alert box.
Scenario: Load system with many items, such that metadata is consumed.
Expected behavior: Alert can be acknowledged and won't be displayed again as long as we're in the alert area. Obviously, if we exit the alert range and hit it again, we'd expect to see the alert again.
Observed behavior: While workload is running, the alert box is always popping up. Incessantly. It cannot be stopped. Yes, I know I'm above 50% metadata, go away. No, really, please go away. GO! |
[MB-8231] Rebalance hangs with progress zero percent Created: 09/May/13 Updated: 15/May/13 Resolved: 15/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | couchbase-bucket, ns_server |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Deepkaran Salooja | Assignee: | Jin Lim |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
windows 2008 R2 64 bit and centos 64-bit
|
||
| Operating System: | Centos 64-bit |
| Description |
|
Manifest info:
<manifest>
<remote name="couchbase" fetch="git://github.com/couchbase/"/>
<remote name="membase" fetch="git://github.com/membase/"/>
<remote name="apache" fetch="git://github.com/apache/"/>
<remote name="erlang" fetch="git://github.com/erlang/"/>
<default remote="couchbase" revision="master"/>
<project name="tlm" path="tlm" revision="9f8a97b773c2b97cd63893a84a2fef2562c8860f">
<copyfile src="Makefile.top" dest="Makefile"/>
</project>
<project name="bucket_engine" path="bucket_engine" revision="2a797a8d97f421587cce728f2e6aa2cd42c8fa26"/>
<project name="ep-engine" path="ep-engine" revision="7fa9438353fa783d72e3e7f28e32703ab8922733"/>
<project name="libconflate" path="libconflate" revision="c0d3e26a51f25a2b020713559cb344d43ce0b06c"/>
<project name="libmemcached" path="libmemcached" revision="ca739a890349ac36dc79447e37da7caa9ae819f5" remote="membase"/>
<project name="libvbucket" path="libvbucket" revision="408057ec55da3862ab8d75b1ed25d2848afd640f"/>
<project name="couchbase-cli" path="couchbase-cli" revision="87dcaa935efb0eac4e75d529ab7e3c81b4439e61" remote="couchbase"/>
<project name="memcached" path="memcached" revision="b6ceb46fc26ac6f1d6be7a5866d6c6c0f6e6d32a" remote="membase"/>
<project name="moxi" path="moxi" revision="4b391021af7a453bf88716724d2c644916ebd969"/>
<project name="ns_server" path="ns_server" revision="d77b8c4d9eb27fbd60778ca299cec29bca749e4c"/>
<project name="portsigar" path="portsigar" revision="159b6179ea8a3c2075ee9eb2afa6f91c98c0fda6"/>
<project name="sigar" path="sigar" revision="63a3cd1b316d2d4aa6dd31ce8fc66101b983e0b0"/>
<project name="couchbase-examples" path="couchbase-examples" revision="cd9c8600589a1996c1ba6dbea9ac171b937d3379"/>
<project name="couchbase-python-client" path="couchbase-python-client" revision="d443169c0694fca1be67d8f6934a8c50f0175ee7"/>
<project name="couchdb" path="couchdb" revision="586e4bb73b92db4362192616370c4e3edb8c34a0"/>
<project name="couchdbx-app" path="couchdbx-app" revision="833f5f2491c1d307d4d1ea291fd446a740df83f0"/>
<project name="couchstore" path="couchstore" revision="abc2af1310ca375697e08aad4fa78e5e5d61adcf"/>
<project name="geocouch" path="geocouch" revision="ed9ad43aa361df0829262fef811b5236331b44c8"/>
<project name="testrunner" path="testrunner" revision="2ba2259a62e609b2bbaffc179be2d6c6be6ca61a"/>
<project name="healthchecker" path="healthchecker" revision="72dab0d4f293e80644b38321f001b42846701890"/>
<project name="otp" path="otp" revision="b6dc1a844eab061d0a7153d46e7e68296f15a504" remote="erlang"/>
<project name="icu4c" path="icu4c" revision="26359393672c378f41f2103a8699c4357c894be7" remote="couchbase"/>
<project name="snappy" path="snappy" revision="5681dde156e9d07adbeeab79666c9a9d7a10ec95" remote="couchbase"/>
<project name="v8" path="v8" revision="447decb75060a106131ab4de934bcc374648e7f2" remote="couchbase"/>
<project name="gperftools" path="gperftools" revision="44a584d1de8c89addfb4f1d0522bdbbbed83ba48" remote="couchbase"/>
<project name="pysqlite" path="pysqlite" revision="0ff6e32ea05037fddef1eb41a648f2a2141009ea" remote="couchbase"/>
</manifest>
Rebalance 1 -> 4 hangs sometimes with status stuck at 0% and below logs repeating:
{move_state,985, ['ns_1@10.3.3.100',undefined], ['ns_1@10.3.3.98','ns_1@10.3.3.96'], [{replica_building_stats,'ns_1@10.3.3.98',1,1,<<>>}, {replica_building_stats,'ns_1@10.3.3.96',1,1,<<>>}]},
{move_state,986, ['ns_1@10.3.3.100',undefined], ['ns_1@10.3.3.98','ns_1@10.3.3.96'], [{replica_building_stats,'ns_1@10.3.3.98',1,1,<<>>}, {replica_building_stats,'ns_1@10.3.3.96',1,1,<<>>}]},
{move_state,987, ['ns_1@10.3.3.100',undefined], ['ns_1@10.3.3.98','ns_1@10.3.3.96'], [{replica_building_stats,'ns_1@10.3.3.98',1,1,<<>>}, {replica_building_stats,'ns_1@10.3.3.96',1,1,<<>>}]},
{move_state,988, ['ns_1@10.3.3.100',undefined], ['ns_1@10.3.3.98','ns_1@10.3.3.96'], [{replica_building_stats,'ns_1@10.3.3.98',1,1,<<>>}, {replica_building_stats,'ns_1@10.3.3.96',1,1,<<>>}]},
Reproducer test: ./testrunner -i ../ini/vm-4nodes-sanity.ini -t warmupcluster.WarmUpClusterTest.test_warmUpCluster,num_of_docs=1000
Attaching collect info |
| Comments |
| Comment by Deepkaran Salooja [ 09/May/13 ] |
|
https://s3.amazonaws.com/bugdb/jira/MB-8231/e9125b6b/10.3.3.100-592013-937-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-8231/e9125b6b/10.3.3.95-592013-938-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-8231/e9125b6b/10.3.3.96-592013-939-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-8231/e9125b6b/10.3.3.98-592013-940-diag.zip |
| Comment by Maria McDuff [ 09/May/13 ] |
|
Deep, how frequent is this hang happening? did this test pass before this build? is this build 789? |
| Comment by Aliaksey Artamonau [ 09/May/13 ] |
|
We are stuck waiting for a response from memcached to the CMD_CHECKPOINT_PERSISTENCE command:
{backtrace,[<<"Program counter: 0x00002b1c96cd57f8 (prim_inet:recv0/3 + 224)">>, <<"CP: 0x0000000000000000 (invalid)">>,<<"arity = 0">>, <<>>, <<"0x00002b1ca32fa4a0 Return addr 0x00002b1c9b19ffb0 (mc_binary:recv/3 + 280)">>, <<"y(0) 8361">>,<<"y(1) #Port<0.12567>">>,<<>>, <<"x00002b1ca32fa4b8 Return addr 0x00002b1c9b1ec6e8 (mc_client_binary:cmd_binary_vocal_recv/5">>, <<"y(0) infinity">>,<<"y(1) res">>, <<"y(2) #Port<0.12567>">>,<<>>, <<"x00002b1ca32fa4d8 Return addr 0x00002b1c9b1f3198 (mc_client_binary:wait_for_checkpoint_per">>, <<"y(0) []">>,<<"y(1) []">>, <<"y(2) infinity">>,<<"y(3) undefined">>, <<"y(4) undefined">>,<<"y(5) #Port<0.12567>">>, <<"y(6) 177">>,<<>>, <<"x00002b1ca32fa518 Return addr 0x00002b1c9ab11628 (ns_memcached:perform_wait_for_checkpoint">>, <<>>, <<"0x00002b1ca32fa520 Return addr 0x00002b1c96d0d420 (proc_lib:init_p_do_apply/3 + 56)">>, <<"y(0) {<0.5971.0>,#Ref<0.0.0.149700>}">>, <<"y(1) []">>,<<"y(2) []">>, <<"y(3) <0.5805.0>">>,<<>>, <<"0x00002b1ca32fa548 Return addr 0x0000000000872b58 (<terminate process normally>)">>, <<"y(0) Catch 0x00002b1c96d0d440 (proc_lib:init_p_do_apply/3 + 88)">>, <<>>]}, |
| Comment by Ketaki Gangal [ 10/May/13 ] |
|
Hi Jin, Can you please take a look at this bug? Most of our tests are stuck due to this error code. thanks, Ketaki |
| Comment by Jin Lim [ 10/May/13 ] |
|
Will take a look, it is now in my today's working queue. Reviewing the symptom and other logs, it looks like both Thanks. |
| Comment by Andrei Baranouski [ 10/May/13 ] |
|
Jin, you can take a look at http://qa.hq.northscale.net/job/CouchbaseServer-SanityTest-4Node-Centos64/63/consoleFull where rebalance still hangs rebalance params : password=password&ejectedNodes=&user=Administrator&knownNodes=ns_1%4010.3.3.30%2Cns_1%4010.3.3.224%2Cns_1%4010.3.3.33%2Cns_1%4010.3.3.32 believe that this is the same problem |
| Comment by Jin Lim [ 10/May/13 ] |
|
Thanks Andrei, can you please provide me the access info of the node where the rebalance is hung. Or please check if you see the same mutex error message as |
| Comment by Andrei Baranouski [ 11/May/13 ] |
|
root/couchbase ini file for the job: [global] username:root port:8091 [cluster1] 1:_1 2:_2 [cluster2] 1:_3 2:_4 [servers] 1:_1 2:_2 3:_3 4:_4 [_1] ip:10.3.3.32 password:couchbase [_2] ip:10.3.3.33 password:couchbase [_3] ip:10.3.3.30 password:couchbase [_4] ip:10.3.3.224 password:password [membase] rest_username:Administrator rest_password:password |
| Comment by Andrei Baranouski [ 11/May/13 ] |
|
see http://www.couchbase.com/issues/browse/MB-8243?focusedCommentId=57921&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-57921 where rebalance hangs with 2.0.0-MRW33-toy build |
| Comment by Jin Lim [ 13/May/13 ] |
|
Thanks for testing out the MRW33-toy build. Following the above link for
1) cen-1711:8091 indicates that "rebalancing 3 nodes" hung for an infinite time
2) ssh to each node 10.5.2.13 - 15, and found 10.5.2.15 (cen-1711) suffered a memcached crash due to the same MUTEX error at getCouchBucket().
3) compared time stamps of this MUTEX error reported in debug.2 and ~tmp/core.memcached.14945, both matched at "May 12 06:43"
4) gdb info for where the crash was ->
#1 0x0000003f14231d10 in abort () from /lib64/libc.so.6
#2 0x00002aaaaaf36070 in Mutex::acquire (this=0x1a444828) at src/mutex.cc:83
#3 0x00002aaaaaf83fe8 in lock (this=0x1a444828, key="couch_bucket") at src/locks.hh:48
#4 LockHolder (this=0x1a444828, key="couch_bucket") at src/locks.hh:26
#5 Configuration::getString (this=0x1a444828, key="couch_bucket") at src/configuration.cc:38
#6 0x00002aaaaaf8e5eb in Configuration::getCouchBucket (this=0x1a444828) at src/generated_configuration.cc:71
#7 0x00002aaaaaf7c59e in CouchNotifier::selectBucket (this=0x1a4c3000) at src/couch-kvstore/couch-notifier.cc:721
#8 0x00002aaaaaf7cc0f in CouchNotifier::processInput (this=0x1a4c3000) at src/couch-kvstore/couch-notifier.cc:606
#9 0x00002aaaaaf7c199 in maybeProcessInput (this=0x1a4c3000, rh=0x2c16a540) at src/couch-kvstore/couch-notifier.cc:546
5) looking at thread frame #7, getCouchBucket() was invoked from selectBucket() at couch-notifier.cc#721
6) which means this node didn't get the fix for
This might have been a usual case where the toy build didn't pick up the right materials or the node didn't get installed correctly with the toy build. Anyhow, I think it would be a good idea to (a) wipe out all previous installations from all 3 nodes, (b) get the latest 2.0.2 build which has |
| Comment by Andrei Baranouski [ 13/May/13 ] |
|
we still don't have the latest build with (http://review.couchbase.org/#/c/26253/) the latest one is http://builds.hq.northscale.net/latestbuilds/couchbase-server-community_x86_2.0.2-799-rel.deb.manifest.xml <project name="ep-engine" path="ep-engine" revision="e657fe4789a4a8be3ef145d602548278b48ad3de"/> |
| Comment by Ketaki Gangal [ 13/May/13 ] |
| Seeing this w/ the latest build 202-800 as well. Zero percent rebalance progress on the sanity-job runs. |
| Comment by Jin Lim [ 13/May/13 ] |
|
There were two issues here regarding the rebalance hang:
1) MUTEX::acquire() abort - caused memcached to crash (fixed in http://review.couchbase.org/#/c/26253/)
2) rebalance never proceeds if vbuckets are empty (fix has been merged to review http://review.couchbase.org/#/c/26275/) |
| Comment by Jin Lim [ 13/May/13 ] |
|
the toy build based on the above possible fix (2) is here for testing http://builds.hq.northscale.net/latestbuilds/couchbase-server-community_toy-couchstore-x86_64_2.0.0-MRW34-toy.rpm |
| Comment by Thuan Nguyen [ 13/May/13 ] |
|
Repro this bug in windows with build 2.0.2-801 Run test ./testrunner -i fournode.ini -t failovertests.FailoverTests.test_failover_normal,replica=2,load_ratio=1 Rebalance hang at 0% See this error in diags of orchestrator node: [ns_server:info,2013-05-13T18:29:26.120,babysitter_of_ns_1@127.0.0.1:<0.197.0>:ns_port_server:log:168]moxi<0.197.0>: 2013-05-13 18:29:27: (agent_config.c.705) ERROR: bad JSON configuration from http://127.0.0.1:8091/pools/default/saslBucketsStreaming: Number of vBuckets must be a power of two > 0 and <= 65536 ({ moxi<0.197.0>: "name": "default", moxi<0.197.0>: "nodeLocator": "vbucket", moxi<0.197.0>: "saslPassword": "", moxi<0.197.0>: "nodes": [{ moxi<0.197.0>: "hostname": "10.3.2.143:8091", moxi<0.197.0>: "ports": { moxi<0.197.0>: "direct": 11210, moxi<0.197.0>: "proxy": 11211 moxi<0.197.0>: } moxi<0.197.0>: }], moxi<0.197.0>: "vBucketServerMap": { moxi<0.197.0>: "hashAlgorithm": "CRC", moxi<0.197.0>: "numReplicas": 1, moxi<0.197.0>: "serverList": ["10.3.2.143:11210"], moxi<0.197.0>: "vBucketMap": [] moxi<0.197.0>: } moxi<0.197.0>: }) Manifest file of this build http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.2-801-rel.setup.exe.manifest.xml Link to cbcollect_info file https://s3.amazonaws.com/packages.couchbase/collect_info/2_0_2/2013_05/4nodes-202-801_reb_hang_20130513-185157.tgz |
| Comment by Jin Lim [ 13/May/13 ] |
|
This regression hang issue would occur across platforms. There is no need to verify further whether it is platform-specific. Thanks.
The toy build based on the above possible fix (2) is here for testing http://builds.hq.northscale.net/latestbuilds/couchbase-server-community_toy-couchstore-x86_64_2.0.0-MRW34-toy.rpm |
| Comment by Ketaki Gangal [ 13/May/13 ] |
|
Hi Jin, Seeing the same issue w/ the toy build as well. http://qa.hq.northscale.net/job/CouchbaseServer-SanityTest-4Node-Centos64/68/console -Ketaki |
| Comment by Jin Lim [ 14/May/13 ] |
| Can you give me access to one of the nodes where the rebalance hangs? Thanks. |
| Comment by Thuan Nguyen [ 14/May/13 ] |
|
Here is one of nodes for windows cluster 10.3.2.21 Administrator/Membase123 They are in hang status now |
| Comment by Jin Lim [ 15/May/13 ] |
| The build 803 has the latest fix that we tested manually on QE's (Tony) two nodes cluster. All rebalance under various scenarios passed there. Please pick up the build 803 for validation. Thanks. |
| Comment by Deepkaran Salooja [ 15/May/13 ] |
|
With build 803, no longer seeing the hang issue. But the sanity job has rebalance failures. Filed http://qa.hq.northscale.net/job/CouchbaseServer-SanityTest-4Node-Centos64/73/console |
| Comment by Maria McDuff [ 15/May/13 ] |
| hang is fixed in build 803. closing as fixed. |
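The moxi error quoted in the comments above rejects a bucket configuration whose "vBucketMap" is empty. A minimal sketch of that kind of check, assuming only the constraint stated in the log message (a power of two, greater than 0 and at most 65536); the function name is hypothetical and this is not moxi's actual code.

```python
MAX_VBUCKETS = 65536  # upper bound quoted in the moxi error message


def is_valid_vbucket_count(n):
    """Return True when n is a power of two with 0 < n <= MAX_VBUCKETS."""
    # A positive power of two has exactly one bit set, so n & (n - 1) is 0.
    return 0 < n <= MAX_VBUCKETS and (n & (n - 1)) == 0


# An empty vBucketMap, as in the logged JSON, has zero entries and fails.
empty_map_ok = is_valid_vbucket_count(len([]))
```

Under this reading, a bucket that has no vbucket map yet is rejected by the proxy configuration check until a map is published.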
[MB-8245] During rpm upgrade, encountered error while copying opt/couchbase/var/lib/couchbase/config/config.dat (already exists) Created: 10/May/13 Updated: 15/May/13 Resolved: 15/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | None |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Maria McDuff | Assignee: | Bin Cui |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | Centos 64-bit |
| Description |
|
Steps:
-linux centos64
-upgraded (rpm -Uvh) from 2.0.2 build 786 to 795.
-did not shutdown service.
-Encountered error. It failed to copy /opt/couchbase/var/lib/couchbase/config/config.dat see below.
-rw-r--r-- 1 root root 127886701 May 4 00:26 couchbase-server-enterprise_x86_64_2.0.2-786-rel.rpm
drwxr-xr-x 3 root root 4096 May 6 17:46 mm
-rw-r--r-- 1 root root 128296356 May 10 09:54 couchbase-server-enterprise_x86_64_2.0.2-795-rel.rpm
[root@cen-1910 ~]# rpm -Uvh couchbase-server-enterprise_x86_64_2.0.2-795-rel.rpm
Preparing... ########################################### [100%]
Stopping couchbase-server
Minimum RAM required : 4 GB
System RAM configured : 8174464 kB
Minimum number of processors required : 4 cores
Number of processors on the system : 4 cores
1:couchbase-server ########################################### [100%]
Upgrading couchbase-server ...
/opt/couchbase/bin/install/cbupgrade -c /opt/couchbase/var/lib/couchbase/config -a yes
Automatic mode: running without interactive questions or confirmations.
Analysing...
Previous config.dat file is /opt/couchbase/var/lib/couchbase/config/config.dat
Target node: ns_1@10.3.2.49
Upgrading from 2.0
Couchbase should not be running. Please use: /etc/init.d/couchbase-server stop
Database dir: /opt/couchbase/var/lib/couchbase/data
Buckets to upgrade: beer-sample,gamesim-sample,default
Checking disk space available for buckets in directory: /opt/couchbase/var/lib/couchbase/data
Free disk bucket space wanted: 0.0
Free disk bucket space available: 15459573760
Free disk space factor: 2.0
Ok.
Analysis complete.
Copying /opt/couchbase/var/lib/couchbase/config/config.dat
cp /opt/couchbase/var/lib/couchbase/config/config.dat /opt/couchbase/bin/install/../../var/lib/couchbase/config/config.dat
Error: /opt/couchbase/bin/install/../../var/lib/couchbase/ip already exists while copying /opt/couchbase/var/lib/couchbase/ip.rpmsave
Starting couchbase-server[ OK ]
You have successfully installed Couchbase Server.
Please browse to http://cen-1910:8091/ to configure your server.
Please refer to http://couchbase.com for additional resources.
Please note that you have to update your firewall configuration to allow connections to the following ports: 11211, 11210, 11209, 4369, 8091, 8092 and from 21100 to 21299.
By using this software you agree to the End User License Agreement. See /opt/couchbase/LICENSE.txt.
[root@cen-1910 ~]# rpm -qa|grep couchbase
couchbase-server-2.0.2-795
[root@cen-1910 ~]# |
| Comments |
| Comment by Bin Cui [ 14/May/13 ] |
| http://www.couchbase.com/issues/browse/MB-8245 |
| Comment by Maria McDuff [ 15/May/13 ] |
|
fixed. verified against build 803. see below: [root@cen-1910 ~]# rpm -Uvh couchbase-server-enterprise_x86_64_2.0.2-803-rel.rpm Preparing... ########################################### [100%] Stopping couchbase-server Minimum RAM required : 4 GB System RAM configured : 8174464 kB Minimum number of processors required : 4 cores Number of processors on the system : 4 cores 1:couchbase-server ########################################### [100%] Upgrading couchbase-server ... /opt/couchbase/bin/install/cbupgrade -c /opt/couchbase/var/lib/couchbase/config -a yes Automatic mode: running without interactive questions or confirmations. Analysing... Previous config.dat file is /opt/couchbase/var/lib/couchbase/config/config.dat Target node: ns_1@10.3.2.49 Done: previous node configuration is empty. Starting couchbase-server[ OK ] You have successfully installed Couchbase Server. Please browse to http://cen-1910:8091/ to configure your server. Please refer to http://couchbase.com for additional resources. Please note that you have to update your firewall configuration to allow connections to the following ports: 11211, 11210, 11209, 4369, 8091, 8092 and from 21100 to 21299. By using this software you agree to the End User License Agreement. See /opt/couchbase/LICENSE.txt. |
[MB-8275] installation document in 2.0 in rpm shows old printout after installation Created: 14/May/13 Updated: 15/May/13 Resolved: 15/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | documentation |
| Affects Version/s: | 2.0 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Thuan Nguyen | Assignee: | Karen Zeller |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | centos, redhat | ||
| Operating System: | Centos 64-bit |
| Description |
|
The information in http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-getting-started-install-redhat.html
shows the old printout from version 1.8.1:
[root@cen-1907 ~]# rpm -i couchbase-server-enterprise_x86_64_1.8.1-937-rel.rpm
Starting couchbase-server[ OK ]
You have successfully installed Couchbase Server.
Please browse to http://cen-1907:8091/ to configure your server.
Please refer to http://couchbase.com for additional resources.
Please note that you have to update your firewall configuration to allow connections to the following ports: 11211, 11210, 11209, 4369, 8091 and from 21100 to 21299.
By using this software you agree to the End User License Agreement. See /opt/couchbase/LICENSE.txt.
From 2.0 onward, it should show the hardware requirements, as in the following printout:
[root@cen-1907 ~]# rpm -i couchbase-server-enterprise_x86_64_2.0.x-xx-rel.rpm
Minimum RAM required : 4 GB
System RAM configured : 8174464 kB
Minimum number of processors required : 4 cores
Number of processors on the system : 4 cores
Starting couchbase-server[ OK ]
You have successfully installed Couchbase Server.
Please browse to http://cen-1907:8091/ to configure your server.
Please refer to http://couchbase.com for additional resources.
Please note that you have to update your firewall configuration to allow connections to the following ports: 11211, 11210, 11209, 4369, 8091, 8092 and from 21100 to 21299.
By using this software you agree to the End User License Agreement. See /opt/couchbase/LICENSE.txt.
The printout may also be out of date for other packages. |
| Comments |
| Comment by Karen Zeller [ 15/May/13 ] |
| added! |
[MB-8100] installer error on missing dependencies (centos5.8). tried other vm on same OS. worked fine. Created: 15/Apr/13 Updated: 15/May/13 Resolved: 13/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | installer |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maria McDuff | Assignee: | Maria McDuff |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
rpm -i couchbase-server-enterprise_x86_64_2.0.2-764-rel.rpm
[root@grape-016 maria]# hostname grape-016 [root@grape-016 maria]# Tried installing build 2.0.2 build 764. failed with the following missing dependencies. i tried in another vm. install succeeded. [root@grape-016 maria]# cat /etc/*release* [root@grape-016 maria]# rm -rf /opt/couchbase/ [root@grape-016 maria]# rpm -i couchbase-server-enterprise_x86_64_2.0.2-764-rel.rpm error: Failed dependencies: libc.so.6()(64bit) is needed by couchbase-server-2.0.2-764.x86_64 libc.so.6(GLIBC_2.2.5)(64bit) is needed by couchbase-server-2.0.2-764.x86_64 libc.so.6(GLIBC_2.3)(64bit) is needed by couchbase-server-2.0.2-764.x86_64 libc.so.6(GLIBC_2.3.2)(64bit) is needed by couchbase-server-2.0.2-764.x86_64 libc.so.6(GLIBC_2.3.4)(64bit) is needed by couchbase-server-2.0.2-764.x86_64 libc.so.6(GLIBC_2.4)(64bit) is needed by couchbase-server-2.0.2-764.x86_64 libcrypt.so.1()(64bit) is needed by couchbase-server-2.0.2-764.x86_64 libcrypto.so.6()(64bit) is needed by couchbase-server-2.0.2-764.x86_64 libdl.so.2()(64bit) is needed by couchbase-server-2.0.2-764.x86_64 libdl.so.2(GLIBC_2.2.5)(64bit) is needed by couchbase-server-2.0.2-764.x86_64 libgcc_s.so.1()(64bit) is needed by couchbase-server-2.0.2-764.x86_64 libgcc_s.so.1(GCC_3.0)(64bit) is needed by couchbase-server-2.0.2-764.x86_64 libm.so.6()(64bit) is needed by couchbase-server-2.0.2-764.x86_64 libm.so.6(GLIBC_2.2.5)(64bit) is needed by couchbase-server-2.0.2-764.x86_64 libncurses.so.5()(64bit) is needed by couchbase-server-2.0.2-764.x86_64 libnsl.so.1()(64bit) is needed by couchbase-server-2.0.2-764.x86_64 libpthread.so.0()(64bit) is needed by couchbase-server-2.0.2-764.x86_64 libpthread.so.0(GLIBC_2.2.5)(64bit) is needed by couchbase-server-2.0.2-764.x86_64 libpthread.so.0(GLIBC_2.3.2)(64bit) is needed by couchbase-server-2.0.2-764.x86_64 libresolv.so.2()(64bit) is needed by couchbase-server-2.0.2-764.x86_64 librt.so.1()(64bit) is needed by couchbase-server-2.0.2-764.x86_64 librt.so.1(GLIBC_2.2.5)(64bit) is needed by 
couchbase-server-2.0.2-764.x86_64 libssl.so.6()(64bit) is needed by couchbase-server-2.0.2-764.x86_64 libstdc++.so.6()(64bit) is needed by couchbase-server-2.0.2-764.x86_64 libstdc++.so.6(CXXABI_1.3)(64bit) is needed by couchbase-server-2.0.2-764.x86_64 libstdc++.so.6(CXXABI_1.3.1)(64bit) is needed by couchbase-server-2.0.2-764.x86_64 libstdc++.so.6(GLIBCXX_3.4)(64bit) is needed by couchbase-server-2.0.2-764.x86_64 libutil.so.1()(64bit) is needed by couchbase-server-2.0.2-764.x86_64 libutil.so.1(GLIBC_2.2.5)(64bit) is needed by couchbase-server-2.0.2-764.x86_64 ---------------- [root@cen-1910 ~]# rpm -i couchbase-server-enterprise_x86_64_2.0.2-764-rel.rpm Minimum RAM required : 4 GB System RAM configured : 8174464 kB Minimum number of processors required : 4 cores Number of processors on the system : 4 cores Starting couchbase-server[ OK ] You have successfully installed Couchbase Server. Please browse to http://cen-1910:8091/ to configure your server. Please refer to http://couchbase.com for additional resources. Please note that you have to update your firewall configuration to allow connections to the following ports: 11211, 11210, 11209, 4369, 8091, 8092 and from 21100 to 21299. By using this software you agree to the End User License Agreement. See /opt/couchbase/LICENSE.txt. |
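The dependency error above repeats the same libraries once per symbol version, which makes it hard to see what is actually missing. The sketch below condenses such a message into distinct library names. It assumes the entry format seen in this report (`<library>(<version>)(64bit) is needed by <package>`); the function name `missing_libraries` and the shortened `sample` string are illustrative only.

```python
import re

def missing_libraries(rpm_output):
    """Return the distinct shared-library names in an rpm 'Failed dependencies' message."""
    # Each entry looks like 'libc.so.6(GLIBC_2.3)(64bit) is needed by <package>'.
    names = re.findall(r"(\S+?\.so(?:\.\d+)*)\(\S*\s+is needed by", rpm_output)
    return sorted(set(names))

# Shortened excerpt of the error quoted in this report.
sample = (
    "error: Failed dependencies: "
    "libc.so.6()(64bit) is needed by couchbase-server-2.0.2-764.x86_64 "
    "libc.so.6(GLIBC_2.2.5)(64bit) is needed by couchbase-server-2.0.2-764.x86_64 "
    "libssl.so.6()(64bit) is needed by couchbase-server-2.0.2-764.x86_64 "
    "libstdc++.so.6(CXXABI_1.3)(64bit) is needed by couchbase-server-2.0.2-764.x86_64"
)
print(missing_libraries(sample))
```

On the full message this would reduce the list to a handful of base libraries, which is easier to compare against the platform requirements.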
| Comments |
| Comment by Maria McDuff [ 29/Apr/13 ] |
| Anil to check the system requirements. |
| Comment by Anil Kumar [ 13/May/13 ] |
|
Here is the documentation for hardware requirements that are recommended for installation. http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-getting-started-prepare-hardware.html |
| Comment by Maria McDuff [ 13/May/13 ] |
|
These are the hardware requirements. Where are the software or system requirements? This machine meets the hardware requirements for installation. |
| Comment by Anil Kumar [ 13/May/13 ] |
| Supported platforms: http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-getting-started-prepare-platforms.html Check the instructions there for installing openssl before installing Couchbase. |
[MB-8274] items not draining seems... ep-engine is deadlocked Created: 14/May/13 Updated: 15/May/13 Resolved: 14/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | couchbase-bucket |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Tommie McAfee | Assignee: | Tommie McAfee |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Description |
|
During an XDCR longevity test, memcached kept restarting on a node.
ui reports: Control connection to memcached on 'ns_1@172.23.105.57' disconnected: {badmatch, {error, couldnt_connect_to_memcached}} (guinep-s1050.sc.couchbase.com) everytime I do a gdb backtrace I see threads 16,17,18 in the lock() method. Thread 18 (Thread 0x7f1212849700 (LWP 21113)): #0 0x00007f121a5b4054 in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007f121a5af388 in _L_lock_854 () from /lib64/libpthread.so.0 #2 0x00007f121a5af257 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00007f1214436f7a in Mutex::acquire (this=0x641e0f0) at src/mutex.cc:79 #4 0x00007f121447d566 in lock (this=0x641e000) at ./src/locks.hh:48 #5 LockHolder (this=0x641e000) at ./src/locks.hh:26 #6 CouchNotifier::selectBucket (this=0x641e000) at src/couch-kvstore/couch-notifier.cc:723 #7 0x00007f121447dbcf in CouchNotifier::processInput (this=0x641e000) at src/couch-kvstore/couch-notifier.cc:606 #8 0x00007f121447e475 in waitOnce (this=0x641e000, vbs=..., file_version=1, header_offset=212992, cb=...) at src/couch-kvstore/couch-notifier.cc:675 #9 CouchNotifier::notify_update (this=0x641e000, vbs=..., file_version=1, header_offset=212992, cb=...) 
at src/couch-kvstore/couch-notifier.cc:755 #10 0x00007f1214475cb8 in notify_headerpos_update (this=0x643fb00, vbid=760, rev=1, docs=0x323bd880, docinfos=0x323bda40, docCount=56) at ./src/couch-kvstore/couch-notifier.hh:144 #11 CouchKVStore::saveDocs (this=0x643fb00, vbid=760, rev=1, docs=0x323bd880, docinfos=0x323bda40, docCount=56) at src/couch-kvstore/couch-kvstore.cc:1498 #12 0x00007f121447628b in CouchKVStore::commit2couchstore (this=0x643fb00) at src/couch-kvstore/couch-kvstore.cc:1410 #13 0x00007f121447647a in CouchKVStore::commit (this=<value optimized out>) at src/couch-kvstore/couch-kvstore.cc:806 #14 0x00007f1214401f56 in EventuallyPersistentStore::flushVBucket (this=0x6395c00, vbid=760) at src/ep.cc:1919 #15 0x00007f121442aeb9 in doFlush (this=0x520b7a0, tid=1098) at src/flusher.cc:222 #16 Flusher::step (this=0x520b7a0, tid=1098) at src/flusher.cc:152 #17 0x00007f121443ac10 in ExecutorThread::run (this=0x52bfba0) at src/scheduler.cc:148 #18 0x00007f121443b32d in launch_executor_thread (arg=0x52bfba0) at src/scheduler.cc:34 #19 0x00007f121a5ad851 in start_thread () from /lib64/libpthread.so.0 #20 0x00007f121a2fb90d in clone () from /lib64/libc.so.6 Thread 17 (Thread 0x7f1211e48700 (LWP 21114)): #0 0x00007f121a5b4054 in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007f121a5af388 in _L_lock_854 () from /lib64/libpthread.so.0 #2 0x00007f121a5af257 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00007f1214436f7a in Mutex::acquire (this=0x641e0f0) at src/mutex.cc:79 #4 0x00007f121447e2c3 in lock (this=0x641e000, vbs=..., file_version=1, header_offset=217088, cb=...) at ./src/locks.hh:48 #5 LockHolder (this=0x641e000, vbs=..., file_version=1, header_offset=217088, cb=...) at ./src/locks.hh:26 #6 CouchNotifier::notify_update (this=0x641e000, vbs=..., file_version=1, header_offset=217088, cb=...) 
at src/couch-kvstore/couch-notifier.cc:753 #7 0x00007f1214475cb8 in notify_headerpos_update (this=0x64d4c00, vbid=693, rev=1, docs=0x93f4480, docinfos=0x94186c0, docCount=72) at ./src/couch-kvstore/couch-notifier.hh:144 #8 CouchKVStore::saveDocs (this=0x64d4c00, vbid=693, rev=1, docs=0x93f4480, docinfos=0x94186c0, docCount=72) at src/couch-kvstore/couch-kvstore.cc:1498 #9 0x00007f121447628b in CouchKVStore::commit2couchstore (this=0x64d4c00) at src/couch-kvstore/couch-kvstore.cc:1410 #10 0x00007f121447647a in CouchKVStore::commit (this=<value optimized out>) at src/couch-kvstore/couch-kvstore.cc:806 #11 0x00007f1214401f56 in EventuallyPersistentStore::flushVBucket (this=0x6395c00, vbid=693) at src/ep.cc:1919 #12 0x00007f121442aeb9 in doFlush (this=0x520b680, tid=1095) at src/flusher.cc:222 #13 Flusher::step (this=0x520b680, tid=1095) at src/flusher.cc:152 #14 0x00007f121443ac10 in ExecutorThread::run (this=0x52bfa00) at src/scheduler.cc:148 #15 0x00007f121443b32d in launch_executor_thread (arg=0x52bfa00) at src/scheduler.cc:34 #16 0x00007f121a5ad851 in start_thread () from /lib64/libpthread.so.0 #17 0x00007f121a2fb90d in clone () from /lib64/libc.so.6 Thread 16 (Thread 0x7f1211447700 (LWP 21115)): #0 0x00007f121a5b4054 in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007f121a5af388 in _L_lock_854 () from /lib64/libpthread.so.0 #2 0x00007f121a5af257 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00007f1214436f7a in Mutex::acquire (this=0x641e0f0) at src/mutex.cc:79 #4 0x00007f121447e2c3 in lock (this=0x641e000, vbs=..., file_version=1, header_offset=212992, cb=...) at ./src/locks.hh:48 #5 LockHolder (this=0x641e000, vbs=..., file_version=1, header_offset=212992, cb=...) at ./src/locks.hh:26 #6 CouchNotifier::notify_update (this=0x641e000, vbs=..., file_version=1, header_offset=212992, cb=...) 
at src/couch-kvstore/couch-notifier.cc:753 #7 0x00007f1214475cb8 in notify_headerpos_update (this=0x64d4600, vbid=750, rev=1, docs=0x321a4480, docinfos=0x321a4fc0, docCount=65) at ./src/couch-kvstore/couch-notifier.hh:144 #8 CouchKVStore::saveDocs (this=0x64d4600, vbid=750, rev=1, docs=0x321a4480, docinfos=0x321a4fc0, docCount=65) at src/couch-kvstore/couch-kvstore.cc:1498 #9 0x00007f121447628b in CouchKVStore::commit2couchstore (this=0x64d4600) at src/couch-kvstore/couch-kvstore.cc:1410 #10 0x00007f121447647a in CouchKVStore::commit (this=<value optimized out>) at src/couch-kvstore/couch-kvstore.cc:806 #11 0x00007f1214401f56 in EventuallyPersistentStore::flushVBucket (this=0x6395c00, vbid=750) at src/ep.cc:1919 #12 0x00007f121442aeb9 in doFlush (this=0x520b560, tid=1096) at src/flusher.cc:222 #13 Flusher::step (this=0x520b560, tid=1096) at src/flusher.cc:152 #14 0x00007f121443ac10 in ExecutorThread::run (this=0x52bf860) at src/scheduler.cc:148 #15 0x00007f121443b32d in launch_executor_thread (arg=0x52bf860) at src/scheduler.cc:34 #16 0x00007f121a5ad851 in start_thread () from /lib64/libpthread.so.0 #17 0x00007f121a2fb90d in clone () from /lib64/libc.so.6 Thread 15 (Thread 0x7f1210a46700 (LWP 21116)): #0 0x00007f121a5b4054 in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007f121a5af388 in _L_lock_854 () from /lib64/libpthread.so.0 #2 0x00007f121a5af257 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00007f1214436f7a in Mutex::acquire (this=0x528f0f0) at src/mutex.cc:79 #4 0x00007f121447d566 in lock (this=0x528f000) at ./src/locks.hh:48 #5 LockHolder (this=0x528f000) at ./src/locks.hh:26 #6 CouchNotifier::selectBucket (this=0x528f000) at src/couch-kvstore/couch-notifier.cc:723 #7 0x00007f121447dbcf in CouchNotifier::processInput (this=0x528f000) at src/couch-kvstore/couch-notifier.cc:606 #8 0x00007f121447e475 in waitOnce (this=0x528f000, vbs=..., file_version=2, header_offset=1798144, cb=...) 
at src/couch-kvstore/couch-notifier.cc:675 #9 CouchNotifier::notify_update (this=0x528f000, vbs=..., file_version=2, header_offset=1798144, cb=...) at src/couch-kvstore/couch-notifier.cc:755 #10 0x00007f1214475cb8 in notify_headerpos_update (this=0x5295b00, vbid=1023, rev=2, docs=0x1a897c00, docinfos=0x1a896a80, docCount=107) at ./src/couch-kvstore/couch-notifier.hh:144 #11 CouchKVStore::saveDocs (this=0x5295b00, vbid=1023, rev=2, docs=0x1a897c00, docinfos=0x1a896a80, docCount=107) at src/couch-kvstore/couch-kvstore.cc:1498 #12 0x00007f121447628b in CouchKVStore::commit2couchstore (this=0x5295b00) at src/couch-kvstore/couch-kvstore.cc:1410 #13 0x00007f121447647a in CouchKVStore::commit (this=<value optimized out>) at src/couch-kvstore/couch-kvstore.cc:806 #14 0x00007f1214401f56 in EventuallyPersistentStore::flushVBucket (this=0x527e000, vbid=1023) at src/ep.cc:1919 #15 0x00007f121442aeb9 in doFlush (this=0x520a360, tid=12) at src/flusher.cc:222 #16 Flusher::step (this=0x520a360, tid=12) at src/flusher.cc:152 #17 0x00007f121443ac10 in ExecutorThread::run (this=0x52e6ea0) at src/scheduler.cc:148 #18 0x00007f121443b32d in launch_executor_thread (arg=0x52e6ea0) at src/scheduler.cc:34 #19 0x00007f121a5ad851 in start_thread () from /lib64/libpthread.so.0 #20 0x00007f121a2fb90d in clone () from /lib64/libc.so.6 there are also errors in memcached.logs about "Too many connections" I have full backtrace attached and machine is currently live. |
| Comments |
| Comment by Maria McDuff [ 14/May/13 ] |
| per bug triage, upgrading to blocker. |
| Comment by Ketaki Gangal [ 14/May/13 ] |
| Probably a related issue here http://www.couchbase.com/issues/browse/MB-8259 |
| Comment by Chiyoung Seo [ 14/May/13 ] |
|
Jin,
Please take a look to see if this is a regression from MRW. |
| Comment by Mike Wiederhold [ 14/May/13 ] |
|
Jin, This looks like a deadlock in the couch-notifier. Also, those "Too many connections" messages mean that memcached has too many open connections and cannot accept a new one. |
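One reading of the backtrace above: thread 18 is inside `CouchNotifier::notify_update` and, further down the same call chain, `CouchNotifier::selectBucket` tries to lock the same mutex (`0x641e0f0`) that threads 17 and 16 are also waiting on. If `notify_update` already holds that mutex, the inner lock can never succeed. The toy Python sketch below shows that pattern only. The class and method names are illustrative stand-ins, not the ep-engine code, and it says nothing about how the actual fix works.

```python
import threading

class Notifier:
    """Toy stand-in for a notifier guarded by one non-reentrant mutex."""

    def __init__(self):
        self.mutex = threading.Lock()  # non-reentrant, like a plain pthread mutex

    def select_bucket(self):
        # Tries to take the mutex that notify_update already holds.
        # A blocking acquire would hang forever; the timeout makes the hang observable.
        acquired = self.mutex.acquire(timeout=0.1)
        if acquired:
            self.mutex.release()
        return acquired

    def notify_update(self):
        with self.mutex:
            return self.select_bucket()

print(Notifier().notify_update())  # False: the inner acquire cannot succeed
```

Any other thread that calls `notify_update` in the meantime would queue up on the same mutex, which matches the picture of several flusher threads parked in `pthread_mutex_lock`.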
| Comment by Jin Lim [ 14/May/13 ] |
| Yep, there is a deadlock. Excellent finding! Thanks. |
| Comment by Jin Lim [ 14/May/13 ] |
| http://review.couchbase.org/#/c/26300/, fix is uploaded for the review. |
| Comment by Jin Lim [ 14/May/13 ] |
| the fix got merged |
| Comment by Maria McDuff [ 15/May/13 ] |
| pls verify / close. |
[MB-8287] couchbase web console is down although erlang process is running Created: 15/May/13 Updated: 15/May/13 Resolved: 15/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Thuan Nguyen | Assignee: | Aleksey Kondratenko |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | windows 2008 r2 64bit | ||
| Attachments: |
|
| Operating System: | Windows 64-bit |
| Description |
|
Environment:
4-node cluster on Windows Server 2008 R2 64-bit (each node has a 4-core CPU and 4 GB RAM). Ran the sanity test on build 2.0.2-803. After uninstalling the old build and installing build 2.0.2-803, the connection to Couchbase Server on one node failed. On that node (10.3.2.21) the erl process was running. Stopped Couchbase Server and restarted it: the erlang process is running, but the web console is not up. Checked the firewall; all firewalls are turned off. The node is in the failed state now (10.3.2.21 with Administrator/Membase123) |
| Comments |
| Comment by Aleksey Kondratenko [ 15/May/13 ] |
|
Logger couldn't create log files because of eaccess errors.
Windows is weird, but it looks like something from the old installation was causing that. I don't know exactly what; maybe a log file from the previous installation was open in an editor, or something like that. A reboot of the machine fixed the problem. |
[MB-7954] erl_crash.dump when stop couchbase: Stopping couchbase-server{error_logger,{{2013,3,21},{4,54,2}},"Protocol: ~p: register error: ~p~n",["inet_tcp",{{badmatch,{error,duplicate_name}}, Created: 21/Mar/13 Updated: 15/May/13 Resolved: 14/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Task | Priority: | Critical |
| Reporter: | Andrei Baranouski | Assignee: | Andrei Baranouski |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | 2.0.2-741-rel, CentOS release 5.8 | ||
| Description |
|
http://qa.hq.northscale.net/view/2.0.1/job/ubuntu-64-2.0-new-rebalance-tests-P0/395/consoleFull
./testrunner -i /tmp/rebalance-tests.ini get-logs=True,wait_timeout=180,GROUP=P0,get-cbcollect-info=True -t rebalance.rebalanceout.RebalanceOutTests.rebalance_out_with_warming_up,nodes_out=3,items=500000,replicas=2,max_verify=100000,GROUP=OUT;P0 there are no any specific cases: 1. cluster 7 nodes, 1 bucket 2. try to stop server om 10.3.121.64 [root@cen-2707 ~]# /etc/init.d/couchbase-server status couchbase-server is running [root@cen-2707 ~]# /etc/init.d/couchbase-server stop Stopping couchbase-server{error_logger,{{2013,3,21},{4,54,2}},"Protocol: ~p: register error: ~p~n",["inet_tcp",{{badmatch,{error,duplicate_name}},[{inet_tcp_dist,listen,1},{net_kernel,start_protos,4},{net_kernel,start_protos,3},{net_kernel,init_node,2},{net_kernel,init,1},{gen_server,init_it,6},{proc_lib,init_p_do_apply,3}]}]} {error_logger,{{2013,3,21},{4,54,2}},crash_report,[[{initial_call,{net_kernel,init,['Argument__1']}},{pid,<0.20.0>},{registered_name,[]},{error_info,{exit,{error,badarg},[{gen_server,init_it,6},{proc_lib,init_p_do_apply,3}]}},{ancestors,[net_sup,kernel_sup,<0.9.0>]},{messages,[]},{links,[#Port<0.53>,<0.17.0>]},{dictionary,[{longnames,true}]},{trap_exit,true},{status,running},{heap_size,377},{stack_size,24},{reductions,475}],[]]} {error_logger,{{2013,3,21},{4,54,2}},supervisor_report,[{supervisor,{local,net_sup}},{errorContext,start_error},{reason,{'EXIT',nodistribution}},{offender,[{pid,undefined},{name,net_kernel},{mfargs,{net_kernel,start_link,[['executioner@executioner',longnames]]}},{restart_type,permanent},{shutdown,2000},{child_type,worker}]}]} {error_logger,{{2013,3,21},{4,54,2}},supervisor_report,[{supervisor,{local,kernel_sup}},{errorContext,start_error},{reason,shutdown},{offender,[{pid,undefined},{name,net_sup},{mfargs,{erl_distribution,start_link,[]}},{restart_type,permanent},{shutdown,infinity},{child_type,supervisor}]}]} {error_logger,{{2013,3,21},{4,54,2}},std_info,[{application,kernel},{exited,{shutdown,{kernel,start,[normal,[]]}}},{type,permanent}]} 
{"Kernel pid terminated",application_controller,"{application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}}"} Crash dump was written to: erl_crash.dump.03-21-2013-04:54:02.29366 Kernel pid terminated (application_controller) ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}}) |
| Comments |
| Comment by Andrei Baranouski [ 21/Mar/13 ] |
|
https://s3.amazonaws.com/bugdb/jira/MB-7954/erl_crash.dump.03-21-2013-04:54:02.29366
https://s3.amazonaws.com/bugdb/jira/MB-7954/10.3.121.63-3212013-519-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-7954/10.3.121.66-3212013-542-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-7954/10.3.121.69-3212013-544-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-7954/10.5.2.13-3212013-551-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-7954/10.5.2.14-3212013-553-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-7954/10.5.2.15-3212013-556-diag.zip Please, note that test hanged on [2013-03-20 10:47:44,316] - [remote_util:1264] INFO - running command.raw /etc/init.d/couchbase-server stop ( will fix it adding timeout for ssh session connection) output that are above I got manually [root@cen-2707 ~]# ps -ef| grep couch 101 17431 1 0 Mar20 ? 00:00:00 /opt/couchbase/lib/erlang/erts-5.8.5/bin/epmd -daemon 101 17447 1 6 Mar20 ? 01:59:16 /opt/couchbase/lib/erlang/erts-5.8.5/bin/beam.smp -A 16 -sbt u -P 327680 -K true -MMmcs 30 -- -root /opt/couchbase/lib/erlang -progname erl -- -home /opt/couchbase -- -smp enable -setcookie nocookie -kernel inet_dist_listen_min 21100 inet_dist_listen_max 21299 error_logger false -sasl sasl_error_logger false -noshell -noinput -noshell -noinput -run ns_bootstrap -- -couch_ini /opt/couchbase/etc/couchdb/default.ini /opt/couchbase/etc/couchdb/default.d/capi.ini /opt/couchbase/etc/couchdb/default.d/geocouch.ini /opt/couchbase/etc/couchdb/local.ini -ns_server config_path "/opt/couchbase/etc/couchbase/static_config" -ns_server pidfile "/opt/couchbase/var/lib/couchbase/couchbase-server.pid" -ns_server nodefile "/opt/couchbase/var/lib/couchbase/couchbase-server.node" -ns_server cookiefile "/opt/couchbase/var/lib/couchbase/couchbase-server.cookie" -ns_server enable_mlockall true 101 17476 17447 0 Mar20 ? 00:00:01 /opt/couchbase/lib/erlang/lib/os_mon-2.2.7/priv/bin/memsup 101 17477 17447 0 Mar20 ? 00:00:00 /opt/couchbase/lib/erlang/lib/os_mon-2.2.7/priv/bin/cpu_sup 101 17478 17447 0 Mar20 ? 
00:00:00 /opt/couchbase/lib/erlang/lib/ssl-4.1.6/priv/bin/ssl_esock 101 24170 17447 0 Mar20 ? 00:00:00 /opt/couchbase/bin/moxi -Z port_listen=11211,default_bucket_name=default,downstream_max=1024,downstream_conn_max=4,connect_max_errors=5,connect_retry_interval=30000,connect_timeout=400,auth_timeout=100,cycle=200,downstream_conn_queue_timeout=200,downstream_timeout=5000,wait_queue_timeout=200 -z url=http://127.0.0.1:8091/pools/default/saslBucketsStreaming -p 0 -Y y -O stderr 101 24171 17447 97 Mar20 ? 19:17:37 /opt/couchbase/bin/memcached -X /opt/couchbase/lib/memcached/stdin_term_handler.so -X /opt/couchbase/lib/memcached/file_logger.so,cyclesize=104857600;sleeptime=19;filename=/opt/couchbase/var/lib/couchbase/logs/memcached.log -l 0.0.0.0:11210,0.0.0.0:11209:1000 -p 11210 -E /opt/couchbase/lib/memcached/bucket_engine.so -B binary -r -c 10000 -e admin=_admin;default_bucket_name=default;auto_create=false root 25227 25097 0 Mar20 ? 00:00:00 /bin/sh /etc/init.d/couchbase-server stop root 25249 25227 0 Mar20 ? 00:00:00 /bin/sh /opt/couchbase/bin/couchbase-server -k root 25254 25249 0 Mar20 ? 00:00:02 /opt/couchbase/lib/erlang/erts-5.8.5/bin/beam.smp -- -root /opt/couchbase/lib/erlang -progname erl -- -home /root -- -name executioner@executioner -noshell -hidden -setcookie zplcnapphyjqgsun -eval ns_bootstrap:remote_stop('ns_1@10.3.121.64') root 29904 1 0 05:27 ? 00:00:00 /usr/bin/python /opt/couchbase/bin/cbcollect_info 10.3.121.64-3212013-522-diag.zip root 30260 29904 0 05:28 ? 00:00:00 python /opt/couchbase/lib/python/cbstats -a 127.0.0.1:11210 all -b _admin -p _admin root 30594 1 0 06:04 ? 00:00:00 /usr/bin/python /opt/couchbase/bin/cbcollect_info 10.3.121.64-3212013-558-diag.zip root 30951 30594 0 06:05 ? 
00:00:00 python /opt/couchbase/lib/python/cbstats -a 127.0.0.1:11210 all -b _admin -p _admin root 31069 29328 0 06:37 pts/0 00:00:00 grep couch [root@cen-2707 ~]# /etc/init.d/couchbase-server status couchbase-server is running each time it generated new dump: [root@cen-2707 ~]# /etc/init.d/couchbase-server stop Stopping couchbase-server{error_logger,{{2013,3,21},{6,37,41}},"Protocol: ~p: register error: ~p~n",["inet_tcp",{{badmatch,{error,duplicate_name}},[{inet_tcp_dist,listen,1},{net_kernel,start_protos,4},{net_kernel,start_protos,3},{net_kernel,init_node,2},{net_kernel,init,1},{gen_server,init_it,6},{proc_lib,init_p_do_apply,3}]}]} {error_logger,{{2013,3,21},{6,37,41}},crash_report,[[{initial_call,{net_kernel,init,['Argument__1']}},{pid,<0.20.0>},{registered_name,[]},{error_info,{exit,{error,badarg},[{gen_server,init_it,6},{proc_lib,init_p_do_apply,3}]}},{ancestors,[net_sup,kernel_sup,<0.9.0>]},{messages,[]},{links,[#Port<0.53>,<0.17.0>]},{dictionary,[{longnames,true}]},{trap_exit,true},{status,running},{heap_size,377},{stack_size,24},{reductions,475}],[]]} {error_logger,{{2013,3,21},{6,37,41}},supervisor_report,[{supervisor,{local,net_sup}},{errorContext,start_error},{reason,{'EXIT',nodistribution}},{offender,[{pid,undefined},{name,net_kernel},{mfargs,{net_kernel,start_link,[['executioner@executioner',longnames]]}},{restart_type,permanent},{shutdown,2000},{child_type,worker}]}]} {error_logger,{{2013,3,21},{6,37,41}},supervisor_report,[{supervisor,{local,kernel_sup}},{errorContext,start_error},{reason,shutdown},{offender,[{pid,undefined},{name,net_sup},{mfargs,{erl_distribution,start_link,[]}},{restart_type,permanent},{shutdown,infinity},{child_type,supervisor}]}]} {error_logger,{{2013,3,21},{6,37,41}},std_info,[{application,kernel},{exited,{shutdown,{kernel,start,[normal,[]]}}},{type,permanent}]} {"Kernel pid terminated",application_controller,"{application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}}"} Crash dump was written to: 
erl_crash.dump.03-21-2013-06:37:41.31081 Kernel pid terminated (application_controller) ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}}) |
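The comment above mentions that the test hung on the stop command and that a timeout will be added. A minimal local sketch of that idea is below. It assumes the command is run with Python's `subprocess` module rather than over the test framework's SSH session, and the helper name `run_with_timeout` is made up for illustration.

```python
import subprocess
import sys

def run_with_timeout(command, timeout_seconds):
    """Run command; return its exit code, or None if it did not finish in time."""
    try:
        completed = subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so no process is left behind.
        return None
    return completed.returncode

# Stand-in for a hanging '/etc/init.d/couchbase-server stop': a child that just sleeps.
hanging = [sys.executable, "-c", "import time; time.sleep(5)"]
print(run_with_timeout(hanging, 0.5))  # None
```

A caller can treat `None` as "stop did not complete" and collect diagnostics instead of blocking the whole test run.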
| Comment by Maria McDuff [ 01/Apr/13 ] |
| per bug scrub: moving up to critical. |
| Comment by Andrei Baranouski [ 08/Apr/13 ] |
|
reproduced against 2.0.2-758 http://qa.hq.northscale.net/view/2.0.1/job/ubuntu-64-2.0-new-rebalance-tests-P0/408/consoleFull now it hangs on 'memcached stats all' for cbcollect_info [root@cen-2707 ~]# /opt/couchbase/bin/cbcollect_info 10.3.121.64-472013-1213-diag.zip escript: Failed to open file: cbdump-config uname (uname -a) - OK Directory structure (ls -lR '/opt/couchbase' /opt/membase /var/membase /etc/opt/membase) - Exit code 2 Database directory structure (ls -lR /opt/couchbase/var/lib/couchbase/data) - OK Directory structure membase - previous versions (ls -lR /opt/membase /var/membase /var/opt/membase /etc/opt/membase) - Exit code 2 Process list snapshot (export TERM=''; top -Hb -n1 || top -H n1) - OK Process list (ps -AwwL -o user,pid,lwp,ppid,nlwp,pcpu,maj_flt,min_flt,pri,nice,vsize,rss,tty,stat,wchan:12,start,bsdtime,command) - OK Raw /proc/vmstat (cat /proc/vmstat) - OK Raw /proc/mounts (cat /proc/mounts) - OK Raw /proc/partitions (cat /proc/partitions) - OK Raw /proc/diskstats (cat /proc/diskstats) - OK Raw /proc/interrupts (cat /proc/interrupts) - OK Swap configuration (free -t) - OK Swap configuration (swapon -s) - OK Kernel modules (lsmod) - OK Distro version (cat /etc/redhat-release) - OK Distro version (lsb_release -a) - OK Installed software (rpm -qa) - OK Installed software (COLUMNS=300 dpkg -l) - Exit code 127 Extended iostat (iostat -x -p ALL 1 10 || iostat -x 1 10) - Exit code 127 Core dump settings (find /proc/sys/kernel -type f -name '*core*' -print -exec cat '{}' ';') - OK sysctl settings (sysctl -a) - OK netstat -nap (netstat -nap) - OK relevant lsof output (lsof -n | grep 'moxi\|memcached\|vbucketmigrator\|beam\|couch_compact\|godu\|portsigar') - OK Network configuration (ifconfig -a) - OK Taking sample 2 after 10.000000 seconds - OK Network configuration (echo link addr neigh rule route netns | xargs -n1 -- sh -x -c 'ip $1 list' --) - Exit code 124 Network status (netstat -an) - OK Network routing table (netstat -rn) - OK Arp cache (arp 
-na) - OK Filesystem (df -ha) - OK System activity reporter (sar 1 10) - Exit code 127 System paging activity (vmstat 1 10) - OK System uptime (uptime) - OK couchbase user definition (getent passwd couchbase) - OK couchbase user limits (su couchbase -c "ulimit -a") - OK membase user definition (getent passwd membase) - Exit code 2 couchbase user limits (su couchbase -c "ulimit -a") - OK membase user limits (su membase -c "ulimit -a") - Exit code 1 Interrupt status (intrstat 1 10) - Exit code 127 Processor status (mpstat 1 10) - Exit code 127 System log (cat /var/adm/messages) - Exit code 1 System log (cat /var/log/syslog) - Exit code 1 System log (cat /var/log/messages) - OK All logs (tar cz /var/log/syslog* /var/log/dmesg /var/log/messages* /var/log/daemon* /var/log/debug* /var/log/kern.log* 2>/dev/null) - Exit code 2 Relevant proc data ((pgrep moxi; pgrep beam.smp; pgrep memcached; pgrep couch_compact; pgrep portsigar ; pgrep godu) | xargs -n1 -- sh -c 'echo $1; cat /proc/$1/status; cat /proc/$1/limits; cat /proc/$1/smaps; cat /proc/$1/numa_maps; echo' --) - OK NUMA data (numactl --hardware) - OK NUMA data (numactl --show) - OK NUMA data (cat /sys/devices/system/node/node*/numastat) - OK Version file (cat '/opt/couchbase/VERSION.txt') - OK Manifest file (cat '/opt/couchbase/manifest.txt') - OK Manifest file (cat '/opt/couchbase/manifest.xml') - OK Memcached logs (cd '/opt/couchbase'/var/lib/couchbase/logs && for file in $(ls -tr memcached.log.*); do cat "$file"; done) - OK Ini files (cd '/opt/couchbase'/etc && for file in $(find . 
-type f -name '*.ini'); do echo -e " File: ${file} ";cat "$file"; done) - OK Kernel log buffer (dmesg) - OK couchbase config ('/opt/couchbase/bin'/escript '/opt/couchbase/bin'/cbdump-config '/opt/couchbase/var/lib/couchbase/config/config.dat') - OK couchbase logs (debug) (cbbrowse_logs) - OK couchbase logs (info) (cbbrowse_logs info) - OK couchbase logs (error) (cbbrowse_logs error) - OK couchbase logs (couchdb) (cbbrowse_logs couchdb) - OK couchbase logs (xdcr) (cbbrowse_logs xdcr) - OK couchbase logs (xdcr_errors) (cbbrowse_logs xdcr_errors) - OK couchbase logs (views) (cbbrowse_logs views) - OK couchbase logs (mapreduce errors) (cbbrowse_logs mapreduce_errors) - OK couchbase logs (stats) (cbbrowse_logs stats) - OK couchbase logs (babysitter) (cbbrowse_logs babysitter) - OK memcached stats all (cbstats -a 127.0.0.1:11210 all -b _admin -p _admin) - |
| Comment by Andrei Baranouski [ 08/Apr/13 ] |
| the same test on other environment http://qa.hq.northscale.net/view/2.0.1/job/centos-64-2.0-new-rebalance-mixed-cluster/60/consoleFull |
| Comment by Thuan Nguyen [ 08/Apr/13 ] |
|
Integrated in ui-testing #35 (See [http://qa.hq.northscale.net/job/ui-testing/35/]) Result = SUCCESS |
| Comment by Aleksey Kondratenko [ 08/Apr/13 ] |
| Andrei, we'll need backtraces of all memcached threads here. |
| Comment by Aleksey Kondratenko [ 08/Apr/13 ] |
|
Getting this off me.
Memcached being stuck is something to investigate. Plus, when this error happens we _know_ we've initiated another shutdown earlier and haven't waited for its completion. |
| Comment by Andrei Baranouski [ 18/Apr/13 ] |
| passed against 2.0.2-766 http://qa.hq.northscale.net/view/2.0.1/job/ubuntu-64-2.0-new-rebalance-tests-P0/417/consoleFull |
| Comment by Anil Kumar [ 14/May/13 ] |
| Andrei, can you update the bug? If the tests are passing, add the details to the bug and close it. |
| Comment by Maria McDuff [ 14/May/13 ] |
| andrei, pls verify this is still passing in latest build. |
[MB-7211] cluster_reference link on remote cluster is broken Created: 18/Nov/12 Updated: 15/May/13 Resolved: 13/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | cross-datacenter-replication, UI |
| Affects Version/s: | 2.0 |
| Fix Version/s: | 2.0.2, 2.1 |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Andrei Baranouski | Assignee: | Andrei Baranouski |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | 2.0-release-notes | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Description |
|
build 1953
Steps:
1. On the XDCR tab, create a cluster reference to another cluster.
Result: a link to the cluster appears in cluster_reference_list_container, but it is broken (see screenshot). |
| Comments |
| Comment by Steve Yen [ 19/Nov/12 ] |
| link is missing the 8091 port |
| Comment by Steve Yen [ 20/Nov/12 ] |
| moved to 2.0.1 per bug scrub |
| Comment by Karen Zeller [ 05/Dec/12 ] |
|
Added to RN as :
Under the XDCR tab for Couchbase Web Console, the link to a destination cluster takes you to a missing URL. |
| Comment by Karen Zeller [ 05/Dec/12 ] |
|
Added to RN: Under the XDCR tab for Couchbase Web Console, the link to a destination cluster
takes you to a missing URL. A simple workaround is to append ':8091' to the address in the newly opened browser tab. |
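The workaround in the release note (appending ':8091' to the address) can be sketched in a few lines. This is only an illustration of the manual workaround, not the fix that was merged; the function name `with_default_port` and the example host are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def with_default_port(url, default_port=8091):
    """Return url unchanged if it already names a port, else add default_port.

    Hypothetical helper: it mirrors the manual workaround of appending
    ':8091' to the address, nothing more.
    """
    parts = urlsplit(url)
    if parts.port is not None:
        return url
    netloc = f"{parts.netloc}:{default_port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Example host is made up for illustration.
print(with_default_port("http://cluster-b.example.com/index.html"))
```

A link that already carries a port is returned untouched, so the helper is safe to apply to every cluster reference.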
| Comment by Farshid Ghods [ 10/Dec/12 ] |
| deferring to 2.1 per bug scrub meeting ( Dipti & Farshid -December 7th ) |
| Comment by Junyi Xie [ 08/May/13 ] |
|
This bug was filed a while back. It does not break functionality, but it is pretty annoying and will impact the user experience of XDCR.
Is there any chance we can merge the fix to 2.0.2? |
| Comment by Pavel Blagodov [ 13/May/13 ] |
| http://review.couchbase.org/26262 |
| Comment by Aleksey Kondratenko [ 13/May/13 ] |
| Merged |
| Comment by Maria McDuff [ 13/May/13 ] |
| pls verify / close. |
| Comment by Andrei Baranouski [ 15/May/13 ] |
| works on 2.0.2-803 |
[MB-7493] UI (and backend) allow failing over last active server if it's down Created: 04/Jan/13 Updated: 15/May/13 Resolved: 15/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | ns_server, UI |
| Affects Version/s: | 2.0 |
| Fix Version/s: | 2.1 |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Aleksey Kondratenko | Assignee: | Pavel Blagodov |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
Near same steps as MB-7490
* start and cluster 2 nodes
* fail over 1st
* shut down both
* start back only 1st (failed over)
* observe how UI now allows you to fail over second node even if it's last active cluster member (http://i.imgur.com/G6CMq.png) |
| Comments |
| Comment by Pavel Blagodov [ 01/May/13 ] |
| http://review.couchbase.org/26007 |
[MB-8206] memcached crashing/restarting with Assertion `metadata.size == 16' failed. Created: 06/May/13 Updated: 15/May/13 Resolved: 13/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | couchbase-bucket |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Critical |
| Reporter: | Tommie McAfee | Assignee: | Tommie McAfee |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Description |
|
Encountered this issue during an xdcr test on 2.0.2 build 780.
Created 3 buckets on local/remote site. Started unidirectional replication from the default bucket on the local site to remote. I waited about 5 minutes after xdcr pairing to start loading data. When I checked on the test about 30 minutes into the access phase, I noticed nodes were constantly restarting with these messages in UI logs:
Port server memcached on node 'babysitter_of_ns_1@127.0.0.1' exited with status 134. Restarting.
and from babysitter.1:
[ns_server:info,2013-05-06T16:36:18.200,babysitter_of_ns_1@127.0.0.1:<0.134.0>:ns_port_server:log:168]memcached<0.134.0>: memcached: src/couch-kvstore/couch-kvstore.cc:1331: static int CouchKVStore::recordDbDump(Db*, DocInfo*, void*): Assertion `metadata.size == 16' failed.
[ns_server:info,2013-05-06T16:36:18.307,babysitter_of_ns_1@127.0.0.1:<0.133.0>:supervisor_cushion:handle_info:58]Cushion managed supervisor for memcached failed: {abnormal,134}
[ns_server:debug,2013-05-06T16:36:18.308,babysitter_of_ns_1@127.0.0.1:<0.135.0>:supervisor_cushion:init:39]starting ns_port_server with delay of 5000
[error_logger:error,2013-05-06T16:36:18.307,babysitter_of_ns_1@127.0.0.1:error_logger<0.6.0>:ale_error_logger_handler:log_msg:76]** Generic server <0.134.0> terminating
** Last message in was {#Port<0.3039>,{exit_status,134}}
** When Server state == {state,#Port<0.3039>,memcached,
{["memcached: src/couch-kvstore/couch-kvstore.cc:1331: static int CouchKVStore::recordDbDump(Db*, DocInfo*, void*): Assertion `metadata.size == 16' failed.",
"Mon May 6 16:36:15.660505 PDT 3: (saslbucket) metadata loaded in 2819 ms",
"Mon May 6 16:36:13.707220 PDT 3: (default) warmup completed in 804 ms",
"Mon May 6 16:36:13.694935 PDT 3: (default) metadata loaded in 792 ms",
"Mon May 6 16:36:13.018266 PDT 3: (default) Failed to load mutation log, falling back to key dump",
"Mon May 6 16:36:12.942281 PDT 3: Extension support isn't implemented in this version of bucket_engine",
"Mon May 6 16:36:12.936368 PDT 3: (saslbucket) Failed to load mutation log, falling back to key dump",
"Mon May 6 16:36:12.886616 PDT 3: (default) Connected to mccouch: \"127.0.0.1:11213\"",
"Mon May 6 16:36:12.885946 PDT 3: (default) Trying to connect to mccouch: \"127.0.0.1:11213\"",
"Mon May 6 16:36:12.879580 PDT 3: Extension support isn't implemented in this version of bucket_engine",
"Mon May 6 16:36:12.844079 PDT 3: (saslbucket1) Failed to load mutation log, falling back to key dump",
"Mon May 6 16:36:12.834687 PDT 3: (saslbucket) Connected to mccouch: \"127.0.0.1:11213\"",
"Mon May 6 16:36:12.834366 PDT 3: (saslbucket) Trying to connect to mccouch: \"127.0.0.1:11213\"",
"Mon May 6 16:36:12.810127 PDT 3: Extension support isn't implemented in this version of bucket_engine",
"Mon May 6 16:36:12.767939 PDT 3: (saslbucket1) Connected to mccouch: \"127.0.0.1:11213\"",
"Mon May 6 16:36:12.767549 PDT 3: (saslbucket1) Trying to connect to mccouch: \"127.0.0.1:11213\"",
empty],
Seems the cluster never stabilizes but memcached keeps restarting. Logs from suspected host attached (172.23.105.45), remaining logs pending. |
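The exit status 134 in the messages above fits the common shell-style convention of 128 plus a signal number, which would make it signal 6 (SIGABRT), the signal raised by abort() after a failed C assertion. Whether the babysitter reports status this way is an assumption here, and the helper name `signal_from_exit_status` is hypothetical. A minimal sketch for POSIX systems:

```python
import signal


def signal_from_exit_status(status):
    """Return the signal implied by a 128+N exit status, or None.

    Assumes the shell-style convention (status = 128 + signal number);
    this is an assumption about how the status was reported.
    """
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128)
    except ValueError:
        # Not a signal number known on this platform.
        return None


sig = signal_from_exit_status(134)
print(sig.name if sig else "not a signal exit")
```

Read this way, status 134 agrees with the `Assertion ... failed` line: the process aborted itself rather than being killed from outside.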
| Comments |
| Comment by Tommie McAfee [ 06/May/13 ] |
|
ui alerts now starting to show: Metadata overhead warning. Over 51% of RAM allocated to bucket "saslbucket1" on node "172.23.105.44" is taken up by keys and metadata. may have some relation to this metadata assertion. |
| Comment by Mike Wiederhold [ 06/May/13 ] |
| Do you have a core dump for this? If so please let me know how I can access it otherwise just assign this issue back to me. |
| Comment by Tommie McAfee [ 07/May/13 ] |
|
Ok, did not find any cores. I've grabbed diags from all hosts and stored here: 172.23.105.69:/tmp/ should be able to use your rsa key to login |
| Comment by Tommie McAfee [ 07/May/13 ] |
| This may be result of very small bucket size. (256MB) As I'm about to run the test again I see in my testcfg that I didn't update bucket size. |
| Comment by Maria McDuff [ 07/May/13 ] |
| per bug triage, upgrading to critical. if bucket size is small, it shldn't de-stabilize memcached. |
| Comment by Jin Lim [ 08/May/13 ] |
| Tommie please zip and upload (or provide access info) the db files (vbuckets) under the data directory of the crashing node. We basically want to figure out whether any of the vbucket files is corrupted. Thanks. Please ping Jin if you need to coordinate the file transfer of these files. |
| Comment by Tommie McAfee [ 10/May/13 ] |
| Unfortunately vbucket files no longer available. they were deleted when build was upgraded. |
| Comment by Maria McDuff [ 10/May/13 ] |
| per bug triage, can you repro and attach the log files immediately for dev investigation? |
| Comment by Maria McDuff [ 13/May/13 ] |
|
Tommie unable to repro this.
Advised to open a new bug if same issue happens again. |
| Comment by Maria McDuff [ 15/May/13 ] |
| will re-open if happens again. |
[MB-8268] flag value not retained after items restored through cbrecovery Created: 13/May/13 Updated: 14/May/13 Resolved: 14/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | tools |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Abhinav Dangeti | Assignee: | Abhinav Dangeti |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | all | ||
| Description |
|
Flag values of items don't match after they're recovered through cbrecovery.
- Set the flag value at 100 during the loading phase
- During verification: Exception: Bad result for flag value: 1677721600 != the value we set: 100
- Set the flag value at 2 during the loading phase
- During verification: Exception: Bad result for flag value: 33554432 != the value we set: 2 |
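Both bad values above equal the original flag with its four bytes reversed: 100 is 0x00000064 and 1677721600 is 0x64000000. That is consistent with a 32-bit byte-order mix-up somewhere in the recovery path, though the ticket itself does not state the root cause. A minimal sketch of the arithmetic (the name `swap32` is hypothetical and not taken from the tools):

```python
import struct


def swap32(value):
    """Reinterpret a 32-bit unsigned integer with the opposite byte order."""
    # Pack as big-endian, then read the same four bytes as little-endian.
    return struct.unpack("<I", struct.pack(">I", value))[0]


for flag in (100, 2):
    print(flag, "->", swap32(flag))
```

Applying the swap twice restores the original value, so a check like this can confirm whether a restored flag is byte-swapped or corrupted in some other way.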
| Comments |
| Comment by Bin Cui [ 13/May/13 ] |
| http://review.couchbase.org/#/c/26282/ |
| Comment by Maria McDuff [ 14/May/13 ] |
| pls verify / close. |
| Comment by Abhinav Dangeti [ 14/May/13 ] |
| Verified fix. |
[MB-7860] add section to couchbase 2.0 manual for downloading couchbase-server CE via yum/apt-get Created: 04/Mar/13 Updated: 14/May/13 Resolved: 14/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | documentation |
| Affects Version/s: | 2.0.1 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Karen Zeller | Assignee: | Phil Labee |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | info-request | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
users can now install couchbase server community edition by adding the couchbase repository to centos/redhat and ubuntu and then running the yum or apt-get command
the instructions are documented here and need to be migrated to couchbase.com 2.0 manual http://hub.internal.couchbase.com/confluence/display/CR/How+to+Use+a+Linux+Repo+--+ubuntu+debian http://hub.internal.couchbase.com/confluence/display/CR/How+to+Use+a+Linux+Repo+--+yum+rpm |
| Comments |
| Comment by Karen Zeller [ 08/Mar/13 ] |
|
Hi Phil,
Is this truly ready for primetime? I created this ticket based on an email thread from Jin/Farshi, but when I look at the link, there are quite a few dependencies and steps. I had expected something simpler in terms of steps like that on the C Library: http://www.couchbase.com/develop/c/current |
| Comment by Karen Zeller [ 14/May/13 ] |
|
dup of |
[MB-7859] Error replicating vbucket XX: {badmatch, {error,timeout}} on a bidirectional replication setup. Created: 04/Mar/13 Updated: 14/May/13 Resolved: 14/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | 2.0.1 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Critical |
| Reporter: | Ketaki Gangal | Assignee: | Ketaki Gangal |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | 2.0.1-170-rc | ||
| Attachments: |
|
| Description |
|
- Load 40M items on cluster1, 2M items on cluster2.
- Set up a bidirectional replication between the 2 clusters.
- No front-end load on either cluster after loading the initial data.
- After replicating 1.8M items, cluster2 replication shows errors w/ Badmatch timeout on several vbuckets.
- On cluster1, seeing replication errors as "Failed to grab remote bucket info from any of known nodes" after replicating 25M items.
Adding logs. Adding screenshots from both clusters. |
| Comments |
| Comment by Ketaki Gangal [ 04/Mar/13 ] |
|
Clusters : http://ec2-23-20-140-198.compute-1.amazonaws.com:8091/index.html#sec=replications http://ec2-23-22-53-131.compute-1.amazonaws.com:8091/index.html#sec=replications |
| Comment by Junyi Xie [ 04/Mar/13 ] |
| Looks like ns_server or ep_engine still have time_out issue we have hit many times before. If it is something expected and unavoidable, that means our system is unable to handle this type of XDCR workload. |
| Comment by Ketaki Gangal [ 04/Mar/13 ] |
| https://s3.amazonaws.com/bugdb/jira/MB-7858/bug01.tar logs from the clusters. |
| Comment by Junyi Xie [ 26/Mar/13 ] |
| Known timeout issue. Please assign to ns_server team. |
| Comment by Maria McDuff [ 22/Apr/13 ] |
| assigning to alk. |
| Comment by Maria McDuff [ 23/Apr/13 ] |
|
ketaki, shld this be a blocker? assigning critical for now... pls update this bug and assign to alk for investigation. also, is this sizing related, and does it recover? |
| Comment by Maria McDuff [ 13/May/13 ] |
| assigning to alk k. issue is still there in 2.0.2 |
| Comment by Aleksey Kondratenko [ 13/May/13 ] |
|
There is a high chance that the original issue is now addressed.
We need fresh logs in order to see if we're hitting the same issue or not and, if not, what the new issue is. |
| Comment by Maria McDuff [ 13/May/13 ] |
| Alk, ok, we will get you fresh logs. thanks. |
| Comment by Ketaki Gangal [ 14/May/13 ] |
| Not able to reproduce this w/ 202-800 latest runs. Will reopen if seen again. |
[MB-8236] Server error on {global,ns_rebalance_observer}, Created: 09/May/13 Updated: 14/May/13 Resolved: 14/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Ketaki Gangal | Assignee: | Ketaki Gangal |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Centos 64, Build 2.0.2-789 | ||
| Description |
|
Seeing server errors during rebalance on build 789.
1. Set up a 2-node cluster.
2. Create 2 ddocs.
3. Continuous front-end load + indexing.
4. Start rebalance.
Server error during processing:
["web request failed",
{path,"/pools/default/tasks"},
{type,exit},
{what, {timeout, {gen_server,call, [{global,ns_rebalance_observer}, get_detailed_progress,10000]}}},
{trace, [{gen_server,call,3}, {ns_rebalance_observer, get_detailed_progress,0}, {ns_doctor,get_detailed_progress,0}, {ns_doctor,do_build_tasks_list,4}, {menelaus_web,handle_tasks,2}, {menelaus_web,loop,3}, {mochiweb_http,headers,5}, {proc_lib,init_p_do_apply,3}]}] (repeated 4 times)
Logs at https://s3.amazonaws.com/bugdb/jira/bug-ns-server/bug.tar |
| Comments |
| Comment by Aliaksey Artamonau [ 09/May/13 ] |
| See my comment to MB-8237. |
| Comment by Maria McDuff [ 14/May/13 ] |
| ketaki, are you still seeing lots of global,ns_rebalance_observer msgs? or has the count been reduced? the fix is to reduce the no. of these msgs. |
| Comment by Ketaki Gangal [ 14/May/13 ] |
| Not seen this on any recent run yet. |
| Comment by Ketaki Gangal [ 14/May/13 ] |
| Not able to reproduce. Will reopen on new instance of this bug. |
[MB-8263] [system test] Erlang crash during data access phase with Mike's toy build Created: 13/May/13 Updated: 14/May/13 Resolved: 14/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Chisheng Hong | Assignee: | Aleksey Kondratenko |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | http://builds.hq.northscale.net/latestbuilds/couchbase-server-community_toy-mikewied-x86_64_2.0.0-21-toy.rpm.manifest.xml | ||
| Operating System: | Centos 64-bit |
| Description |
|
cluster ip is 10.5.2.30
1. 3-node cluster, create 2 buckets, default 2.5G quota and saslbucket 1.3G
2. Load items into buckets to make it dgm. Around 80% resident ratio
3. After initial loading, do a lot of "gets", around 80%. 20% cache miss maximum. Total ops per sec is 3k ops/sec
During data access phase, nodes went down in "Pend" state. bucket is not accessible from UI.
Find erlang_crash.dump on 10.5.2.31 under /opt/couchbase/var/lib/couchbase.
Link to the core_dump file: https://s3.amazonaws.com/bugdb/jira/MB-8263/erl_crash.dump.05-10-2013-19:05:36.7640 |
| Comments |
| Comment by Chisheng Hong [ 13/May/13 ] |
| diags link https://s3.amazonaws.com/bugdb/jira/MB-8263/3nodes_mike-210_erlang_crash__20130513-183321.tgz |
| Comment by Maria McDuff [ 14/May/13 ] |
| bumping up to blocker. |
| Comment by Maria McDuff [ 14/May/13 ] |
| per bug triage, assigning to mike. |
| Comment by Mike Wiederhold [ 14/May/13 ] |
|
Alk,
I saw wait backfill determination issues in the logs which is an ep-engine issue. I will get the code in the toy build re-run once we fix that issue. There was also an erlang crash dump attached to this bug. Please take a quick look at it and then close this bug as won't fix if you don't see anything interesting. [EDIT]: My comments here are actually incorrect. I was looking at the wrong logs. In any case the crash dump should be investigated. |
| Comment by Aleksey Kondratenko [ 14/May/13 ] |
|
It looks like the crash dump is unrelated to this. In fact Aliaksey managed to look at the logs here and we found that the logs have rotated well past the interesting times. We believe there is indeed one subtle and rare race in ns_server which we will address. But because the logs are rotated we are not sure what actually happened here. |
| Comment by Aleksey Kondratenko [ 14/May/13 ] |
| And we don't think this race could initiate any problems. |
| Comment by Mike Wiederhold [ 14/May/13 ] |
| Thanks Alk. I will create another toy build for this issue soon. |
[MB-8279] System Test : Error replicating vbucket XXX {http_request_failed, "POST", {error,{code,500}} Created: 14/May/13 Updated: 14/May/13 Resolved: 14/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | cross-datacenter-replication |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | None |
| Security Level: | Public |
| Type: | Bug | Priority: | Critical |
| Reporter: | Ketaki Gangal | Assignee: | Ketaki Gangal |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Centos
Build 202-800 XDCR unidirectional on source - destination cluster. |
||
| Attachments: |
|
| Description |
|
1. Set up a 3-node cluster with 4 buckets.
2. Set up another destination cluster with 3 buckets.
3. Set up 3 replications from source to destination bucket.
Bucket RevAB shows xdcr replication errors.
Error replicating vbucket 758: {http_request_failed, "POST", "http://Administrator:*****@10.3.4.30:8092/RevAB%2f758%3b12b42cb2c0aa11f587ddc265ac174f81/_bulk_docs", {error,{code,500}}
Results.
- Replication not catching up, 0.5M items still to be replicated, no more sets on destination node.
- The replication status shows 100 percent complete - this is incorrect. There are 0.5M to be replicated to destination.
Adding logs. |
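The percent-encoded path segment in the failing `_bulk_docs` URL can be decoded to see which bucket and vbucket the request targeted. The sketch below assumes the segment has the shape bucket/vbucket;suffix, which matches the error text (bucket RevAB, vbucket 758); treating the trailing hex string as an opaque identifier is an assumption, and the helper name `split_bulk_docs_db` is hypothetical.

```python
from urllib.parse import unquote


def split_bulk_docs_db(encoded):
    """Split an encoded 'bucket%2fvbucket%3bsuffix' segment into its parts.

    The bucket/vbucket;suffix layout is inferred from the URL quoted in
    this ticket and is an assumption, not a documented format.
    """
    decoded = unquote(encoded)  # %2f -> '/', %3b -> ';'
    bucket, _, rest = decoded.partition("/")
    vbucket, _, suffix = rest.partition(";")
    return bucket, int(vbucket), suffix


print(split_bulk_docs_db("RevAB%2f758%3b12b42cb2c0aa11f587ddc265ac174f81"))
```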
| Comments |
| Comment by Ketaki Gangal [ 14/May/13 ] |
|
Live clusters available for debugging here http://coconut-h20801.hq.couchbase.com:8091 and http://10.3.4.30:8091 |
| Comment by Ketaki Gangal [ 14/May/13 ] |
| Source cluster. |
| Comment by Junyi Xie [ 14/May/13 ] |
|
Ketaki, there are quite a lot of memcached errors at the destination cluster during replication of bucket RevAB. The destination memcached returned the ENOMEM error code to some capi_replication requests to update docs; this caused the http_error seen at the source cluster. Per the ep_engine team, ENOMEM usually means we are running out of memory, but there is a bunch of memory available at this time. Nothing suspicious found in the XDCR core logic. It would be nice to have ep_engine folks take a look.
[couchdb:error,2013-05-14T14:36:39.304,ns_1@127.0.0.1:<0.30614.500>:couch_log:error:42]Uncaught error in HTTP request: {error, {case_clause, {memcached_error,enomem,undefined}}}
Stacktrace: [{capi_replication,do_update_replicated_doc_loop,3}, {capi_replication,'-update_replicated_docs/3-fun-0-',4}, {lists,foldr,3}, {capi_replication,update_replicated_docs,3}, {capi_frontend,update_docs,4}, {couch_httpd_db,db_req,2}, {couch_db_frontend,do_db_req,2}, {couch_httpd,handle_request,6}]
[couchdb:error,2013-05-14T14:36:39.342,ns_1@127.0.0.1:<0.30647.500>:couch_log:error:42]Uncaught error in HTTP request: {error, {case_clause, {memcached_error,enomem,undefined}}}
Stacktrace: [{capi_replication,do_update_replicated_doc_loop,3}, {capi_replication,'-update_replicated_docs/3-fun-0-',4}, {lists,foldr,3}, {capi_replication,update_replicated_docs,3}, {capi_frontend,update_docs,4}, {couch_httpd_db,db_req,2}, {couch_db_frontend,do_db_req,2}, {couch_httpd,handle_request,6}]
[couchdb:error,2013-05-14T14:36:39.406,ns_1@127.0.0.1:<0.30647.500>:couch_log:error:42]Uncaught error in HTTP request: {error, {case_clause, {memcached_error,enomem,undefined}}}
Stacktrace: [{capi_replication,do_update_replicated_doc_loop,3}, {capi_replication,'-update_replicated_docs/3-fun-0-',4}, {lists,foldr,3}, {capi_replication,update_replicated_docs,3}, {capi_frontend,update_docs,4}, {couch_httpd_db,db_req,2}, {couch_db_frontend,do_db_req,2}, {couch_httpd,handle_request,6}] |
| Comment by Ketaki Gangal [ 14/May/13 ] |
| https://s3.amazonaws.com/bugdb/MB-7259/7259.tar |
| Comment by Ketaki Gangal [ 14/May/13 ] |
|
Yes, the memory is used up.
Closing this as a low provisioning system error. |
[MB-6849] [windows] cbbrowse_logs always grabs debug logs (was: Separate log files by log type) Created: 09/Oct/12 Updated: 14/May/13 Resolved: 14/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | installer |
| Affects Version/s: | 1.8.1, 2.0 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Critical |
| Reporter: | Perry Krug | Assignee: | Thuan Nguyen |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | windows | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
At the moment, it seems that (almost) all log messages are going to all log files. i.e., there are debug messages in the info files.
|
| Comments |
| Comment by Aleksey Kondratenko [ 09/Oct/12 ] |
| There's seemingly some misunderstanding. There are no debug messages in info files. But I can agree that sometimes not-so-important stuff gets into info. And yes I'd like to fix that too. |
| Comment by Perry Krug [ 09/Oct/12 ] |
|
grep debug ns_server.info.log
[ns_server:debug] [2012-10-06 12:44:19] [ns_1@cwsrabbit04.cw-london.co.uk:<0.2858.3350>:ebucketmigrator_srv:init:228] killing tap named: replication_ns_1@cwsrabbit06.cw-london.co.uk [ns_server:debug] [2012-10-07 3:44:20] [ns_1@cwsrabbit04.cw-london.co.uk:<0.9309.3385>:ebucketmigrator_srv:init:228] killing tap named: replication_ns_1@cwsrabbit05.cw-london.co.uk [rebalance:debug] [2012-10-07 3:44:20] [ns_1@cwsrabbit04.cw-london.co.uk:<0.9309.3385>:ebucketmigrator_srv:init:263] upstream_sender pid: <0.9310.3385> There are many more... |
| Comment by Aleksey Kondratenko [ 09/Oct/12 ] |
| Thanks Perry, I wasn't aware. IMHO that's embarrassing enough for 'fix or we'll die' kind of severity |
| Comment by Liang Guo [ 10/Oct/12 ] |
|
Ran cbcollect_info on 2-node cluster while load running, but I didn't see any ns_server:debug messages in the ns_server.info.log file. I was running build 2.0.0-1827_rel on linux-vm.
[jenkins@cen-1722 cbcollect_info_20121010-190218]$ grep debug ns_server.info.log {loglevel_default,debug}, {loglevel_ns_server,debug}, {loglevel_error_logger,debug}, {loglevel_user,debug}, {loglevel_menelaus,debug}, {loglevel_ns_doctor,debug}, {loglevel_stats,debug}, {loglevel_rebalance,debug}, {loglevel_cluster,debug}, {loglevel_views,debug}, {loglevel_mapreduce_errors,debug}] 'sink-disk_debug','sink-disk_couchdb', |
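The two grep results above differ in kind: the first matches real log entries whose header carries the debug level, while the second only matches configuration terms such as {loglevel_default,debug}. A plain `grep debug` cannot tell them apart. The sketch below checks the level field of the bracketed header instead; the regular expression is an assumption derived from the log lines quoted in this export, not a documented log format.

```python
import re

# Matches headers like "[ns_server:debug] ..." and "[ns_server:info,2013-...".
# Assumed shape: "[category:level" followed by "]" or ",".
HEADER = re.compile(r"^\[(?P<category>\w+):(?P<level>\w+)[\],]")


def log_level(line):
    """Return the level named in the line's header, or None if there is none."""
    match = HEADER.match(line)
    return match.group("level") if match else None


def debug_lines(lines):
    """Keep only lines that are themselves logged at debug level."""
    return [line for line in lines if log_level(line) == "debug"]


sample = [
    "[ns_server:debug] [2012-10-06 12:44:19] killing tap named: replication",
    "{loglevel_default,debug}, {loglevel_ns_server,debug},",
    "[ns_server:info,2013-05-06T16:36:18.200,node]memcached restarted",
]
print(debug_lines(sample))
```

Only the first sample line is reported, which is the distinction the thread was trying to establish.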
| Comment by Liang Guo [ 10/Oct/12 ] |
| Perry, any easy way to reproduce this from your end? Thanks, |
| Comment by Aleksey Kondratenko [ 10/Oct/12 ] |
| Perry, we need more details on how this can happen |
| Comment by Perry Krug [ 11/Oct/12 ] |
|
Alk, the logs that this showed up in are available here: http://pickup.citywire.co.uk/jsaid/rabbit04.zip
Perhaps this is only a Windows issue? |
| Comment by Aleksey Kondratenko [ 11/Oct/12 ] |
|
Thanks Perry, indeed there's some massive duplication. Most likely due to cbbrowse_logs.bat that's part of voltron. It's not trivial for me to check if 1.8.1's voltron has right cbbrowse_logs. Thus I'm passing it to voltron folks.
I believe (but haven't tested) that 2.0 should not have this problem. |
| Comment by Aleksey Kondratenko [ 11/Oct/12 ] |
| I think it makes sense to mark 'fix-for' 2.0 so that we double check it works on 2.0. Hopefully 1.8.x refresh (if any) will also include proper fix for that. |
| Comment by Wayne Siu [ 14/May/13 ] |
| Updating the fix version to 2.0.2 (for testing) to confirm if this is still an issue. |
| Comment by Thuan Nguyen [ 14/May/13 ] |
|
Test on build 2.0.2-802 on windows 2008 Re 64bit
cbbrowse_logs.bat collects logs from info, debug and error logs. |
[MB-8193] [system test] Memcached segfault during rebalance with workload running causes rebalance failure Created: 02/May/13 Updated: 14/May/13 Resolved: 12/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | couchbase-bucket |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Chisheng Hong | Assignee: | Chisheng Hong |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | system-test | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
2.0.2-780-rel
NAME="Ubuntu" VERSION="12.04.2 LTS, Precise Pangolin" |
||
| Operating System: | Ubuntu 64-bit |
| Description |
|
15G RAM 500G regular disc machine
cluster_ip: 172.23.105.23
Create an 8-node cluster, create default bucket with 6G RAM, 1 replica, sasl bucket with 5G RAM, 1 replica
Load items to both buckets into dgm state, around 70%.
Access the data for 2 hours, with workload:
default bucket: create:5,update:5,get:80,delete:5,expire:5,cache_miss:5, 10K ops/sec
saslbucket: create:5,update:5,get:20,delete:10,expire:60,cache_miss:5, 10K ops/sec
Then rebalance in one node while the workload continues.
Found core.memcached under /data path.
Saw a lot of errors like this:
Control connection to memcached on 'ns_1@172.23.105.23' disconnected: {{badmatch, {error, timeout}}, [{mc_client_binary, cmd_binary_vocal_recv, 5}, {mc_client_binary, select_bucket, 2}, {ns_memcached, ensure_bucket, 2}, {ns_memcached, handle_info, 2}, {gen_server, handle_msg, 5}, {ns_memcached, init, 1}, {gen_server, init_it, 6}, {proc_lib, init_p_do_apply, 3}]} (repeated 1 times)
Rebalance failed in a short time.
This also happened in Windows the bug number is
The stack trace of the core.memcached.xxxx on some node will be added later |
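The workload strings in the description can be sanity-checked by parsing them into name/percentage pairs. The sketch below assumes that `cache_miss` is a separate ratio rather than part of the operation mix, since the remaining fields sum to 100 for both buckets; that reading is an assumption, and the helper names are hypothetical rather than part of the test framework.

```python
def parse_mix(spec):
    """Parse 'create:5,update:5,...' into a dict of integer percentages."""
    mix = {}
    for field in spec.split(","):
        name, _, value = field.strip().partition(":")
        mix[name] = int(value)
    return mix


def op_total(mix, exclude=("cache_miss",)):
    """Sum the operation percentages, skipping fields assumed to be ratios."""
    return sum(value for name, value in mix.items() if name not in exclude)


default_mix = parse_mix("create:5,update:5,get:80,delete:5,expire:5,cache_miss:5")
sasl_mix = parse_mix("create:5,update:5,get:20,delete:10,expire:60,cache_miss:5")
print(op_total(default_mix), op_total(sasl_mix))
```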
| Comments |
| Comment by Chisheng Hong [ 02/May/13 ] |
| diags from all the nodes: https://s3.amazonaws.com/bugdb/jira/MB-8193/8nodes_202-780_memcahced_segfault_20130502-143321.tgz |
| Comment by Chisheng Hong [ 02/May/13 ] |
|
link to the stack trace of the core dump (core.memcached.30969) for 172.23.105.25 https://friendpaste.com/4mVnVwuBmN2NeIWrrtEIHR
The original core dump file is https://s3.amazonaws.com/bugdb/jira/MB-8193/core.memcached.30969.gz |
| Comment by Chiyoung Seo [ 03/May/13 ] |
|
This issue is NOT the same as |
| Comment by Chiyoung Seo [ 09/May/13 ] |
|
I have been looking at this issue by looking at all the changes that were made recently in checkpoint queue, but was not able to find any bug in those recent changes.
Chisheng also mentioned that he didn't see this crash in the recent system tests anymore. I've also run the DGM rebalance tests many times, but didn't get the same crash. I'm closing this issue as "can't reproduce" at this time. We will see if this crash happens again in the subsequent builds and tests. |
| Comment by Maria McDuff [ 14/May/13 ] |
| chisheng, if can't reproduce, pls close. |
| Comment by Chisheng Hong [ 14/May/13 ] |
| With the same test case and test environment with build 2.0.2-789, can not reproduce this bug. |
[MB-7804] Couchbase logo needs to be updated on UI, desktop and program-settings icon Created: 21/Feb/13 Updated: 14/May/13 Resolved: 14/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | installer, UI |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Improvement | Priority: | Blocker |
| Reporter: | Anil Kumar | Assignee: | Thuan Nguyen |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Σ Remaining Estimate: | Not Specified | Remaining Estimate: | Not Specified |
| Σ Time Spent: | Not Specified | Time Spent: | Not Specified |
| Σ Original Estimate: | Not Specified | Original Estimate: | Not Specified |
| Sub-Tasks: |
|
| Description |
|
Couchbase logo needs to be updated on UI, desktop and program-settings icon
|
| Comments |
| Comment by Farshid Ghods [ 21/Feb/13 ] |
|
is this scheduled for 2.0.1 or 2.0.2 ? given the state of 2.0.1 release wanted to confirm whether this is really a 2.0.1 blocker ? |
| Comment by Anil Kumar [ 21/Feb/13 ] |
| you're right it's 2.0.2 |
| Comment by Anil Kumar [ 14/Mar/13 ] |
| Can you help getting the current logo sizes. |
| Comment by Dipti Borkar [ 14/Mar/13 ] |
|
we will need mockups for this. The UI piece is ns_server, the install piece is the only piece managed by the database tools team. maybe create another bug? |
| Comment by Dipti Borkar [ 14/Mar/13 ] |
|
btw, we need to figure this out and give any requirements to Melinda ASAP. |
| Comment by Bin Cui [ 15/Mar/13 ] |
|
You can find all the images used by the Windows installer at:
https://github.com/couchbase/voltron/tree/2.0.0/server-overlay-win/images |
| Comment by Dipti Borkar [ 02/Apr/13 ] |
| Aliaksey also needs to be in the loop on this for the setup screen. |
| Comment by Aleksey Kondratenko [ 02/Apr/13 ] |
|
As soon as we have new logo images, assign the UI task to Pavel with those images. |
| Comment by Anil Kumar [ 02/Apr/13 ] |
| Sounds good. I've created sub-task for logo change in UI. |
| Comment by Aleksey Kondratenko [ 16/Apr/13 ] |
| ns_server part is done |
| Comment by Maria McDuff [ 19/Apr/13 ] |
| tony, pls verify the UI and the installer. |
| Comment by Anil Kumar [ 02/May/13 ] |
| this is uber bug tracking sub-tasks for browser, windows and mac. will close this once logo update is complete on sub-tasks. |
| Comment by Maria McDuff [ 14/May/13 ] |
| tony, pls verify / close. |
[MB-8267] cbrecovery doesn't keep flag values intact for recovered msgs Created: 13/May/13 Updated: 13/May/13 Resolved: 13/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | tools |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Bin Cui | Assignee: | Bin Cui |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Comments |
| Comment by Bin Cui [ 13/May/13 ] |
|
It is duplicated with |
| Comment by Maria McDuff [ 13/May/13 ] |
|
|
[MB-8230] Rebalance exited with reason {badarg,[{erlang,'++', [{'EXIT', {{janitor_agent_servant_died, Created: 09/May/13 Updated: 13/May/13 Resolved: 12/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Deepkaran Salooja | Assignee: | Jin Lim |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
<manifest>
<remote name="couchbase" fetch="git://github.com/couchbase/"/> <remote name="membase" fetch="git://github.com/membase/"/> <remote name="apache" fetch="git://github.com/apache/"/> <remote name="erlang" fetch="git://github.com/erlang/"/> <default remote="couchbase" revision="master"/> <project name="tlm" path="tlm" revision="9f8a97b773c2b97cd63893a84a2fef2562c8860f"> <copyfile src="Makefile.top" dest="Makefile"/> </project> <project name="bucket_engine" path="bucket_engine" revision="2a797a8d97f421587cce728f2e6aa2cd42c8fa26"/> <project name="ep-engine" path="ep-engine" revision="4a6f6f232ee140dabd4d7f56f75e1c082678d21b"/> <project name="libconflate" path="libconflate" revision="c0d3e26a51f25a2b020713559cb344d43ce0b06c"/> <project name="libmemcached" path="libmemcached" revision="ca739a890349ac36dc79447e37da7caa9ae819f5" remote="membase"/> <project name="libvbucket" path="libvbucket" revision="408057ec55da3862ab8d75b1ed25d2848afd640f"/> <project name="couchbase-cli" path="couchbase-cli" revision="87dcaa935efb0eac4e75d529ab7e3c81b4439e61" remote="couchbase"/> <project name="memcached" path="memcached" revision="b6ceb46fc26ac6f1d6be7a5866d6c6c0f6e6d32a" remote="membase"/> <project name="moxi" path="moxi" revision="4b391021af7a453bf88716724d2c644916ebd969"/> <project name="ns_server" path="ns_server" revision="d77b8c4d9eb27fbd60778ca299cec29bca749e4c"/> <project name="portsigar" path="portsigar" revision="159b6179ea8a3c2075ee9eb2afa6f91c98c0fda6"/> <project name="sigar" path="sigar" revision="63a3cd1b316d2d4aa6dd31ce8fc66101b983e0b0"/> <project name="couchbase-examples" path="couchbase-examples" revision="cd9c8600589a1996c1ba6dbea9ac171b937d3379"/> <project name="couchbase-python-client" path="couchbase-python-client" revision="d443169c0694fca1be67d8f6934a8c50f0175ee7"/> <project name="couchdb" path="couchdb" revision="586e4bb73b92db4362192616370c4e3edb8c34a0"/> <project name="couchdbx-app" path="couchdbx-app" revision="dec50d163809c636559c5b365983fb5895ffcd0a"/> 
<project name="couchstore" path="couchstore" revision="abc2af1310ca375697e08aad4fa78e5e5d61adcf"/> <project name="geocouch" path="geocouch" revision="ed9ad43aa361df0829262fef811b5236331b44c8"/> <project name="testrunner" path="testrunner" revision="d652cc820d888ed9e83784e302d3ff630d3f82ea"/> <project name="healthchecker" path="healthchecker" revision="72dab0d4f293e80644b38321f001b42846701890"/> <project name="otp" path="otp" revision="b6dc1a844eab061d0a7153d46e7e68296f15a504" remote="erlang"/> <project name="icu4c" path="icu4c" revision="26359393672c378f41f2103a8699c4357c894be7" remote="couchbase"/> <project name="snappy" path="snappy" revision="5681dde156e9d07adbeeab79666c9a9d7a10ec95" remote="couchbase"/> <project name="v8" path="v8" revision="447decb75060a106131ab4de934bcc374648e7f2" remote="couchbase"/> <project name="gperftools" path="gperftools" revision="44a584d1de8c89addfb4f1d0522bdbbbed83ba48" remote="couchbase"/> <project name="pysqlite" path="pysqlite" revision="0ff6e32ea05037fddef1eb41a648f2a2141009ea" remote="couchbase"/> </manifest> |
||
| Operating System: | Centos 64-bit |
| Description |
|
Rebalance fails with the below test ./testrunner -i ../ini/vm-4nodes-sanity.ini -t failovertests.FailoverTests.test_failover_normal,replica=2,load_ratio=1 Below crash report is seen in the logs: =========================CRASH REPORT========================= crasher: initial call: ns_vbucket_mover:init/1 pid: <0.1175.16> registered_name: [] exception exit: {badarg, [{erlang,'++', [{'EXIT', {{janitor_agent_servant_died, {{badmatch,{error,closed}}, {gen_server,call, ['ns_memcached-default', {get_mass_tap_docs_estimate, [86,87,88,89,90,91,92,93,94,95,96,97,98,99,100, 101,102,103,104,105,106,107,108,109,110,111, 112,113,114,115,116,117,118,119,120,121,122, 123,124,125,126,127,128,129,130,131,132,133, 134,135,136,137,138,139,140,141,142,143,144, 145,146,147,148,149,150,151,152,153,154,155, 156,157,158,159,160,161,162,163,164,165,166, 167,168,169,170,342,343,344,345,346,347,348, 349,350,351,352,353,354,355,356,357,358,359, 360,361,362,363,364,365,366,367,368,369,370, 371,372,373,374,375,376,377,378,379,380,381, 382,383,384,385,386,387,388,389,390,391,392, 393,394,395,396,397,398,399,400,401,402,403, 404,405,406,407,408,409,410,411,412,413,414, Attaching the diags |
| Comments |
| Comment by Deepkaran Salooja [ 09/May/13 ] |
|
https://s3.amazonaws.com/bugdb/jira/MB-8230/e9125b6b/10.3.3.100-592013-54-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-8230/e9125b6b/10.3.3.95-592013-56-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-8230/e9125b6b/10.3.3.96-592013-58-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-8230/e9125b6b/10.3.3.98-592013-510-diag.zip |
| Comment by Deepkaran Salooja [ 09/May/13 ] |
| promoted to blocker. Seeing multiple tests fail due to this. |
| Comment by Maria McDuff [ 09/May/13 ] |
| can u confirm that this is build 789? |
| Comment by Aliaksey Artamonau [ 09/May/13 ] |
|
Memcached aborted on several nodes: 2013-05-09 04:44:16.512 ns_log:0:info:message(ns_1@10.3.3.98) - Port server memcached on node 'babysitter_of_ns_1@127.0.0.1' exited with status 134. Restarting. ... MUTEX ERROR: Failed to acquire lock: Invalid argument |
| Comment by Jin Lim [ 10/May/13 ] |
|
This suffered the same issue as |
| Comment by Jin Lim [ 10/May/13 ] |
|
Duplicate of |
| Comment by Maria McDuff [ 13/May/13 ] |
|
|
[MB-8256] snapshotStats crash during shutdown bucket Created: 13/May/13 Updated: 13/May/13 Resolved: 13/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | couchbase-bucket |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Jin Lim | Assignee: | Jin Lim |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
A scheduled snapshotStats task can be picked up and executed during bucket destruction. This causes a crash due to an invalid reference to an internal structure (epstat, configuration, etc.) that is being deallocated.
|
| Comments |
| Comment by Maria McDuff [ 13/May/13 ] |
| per bug triage, upgrading to blocker. |
| Comment by Jin Lim [ 13/May/13 ] |
| http://review.couchbase.org/#/c/26253/3, fix got merged after verification. |
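For readers unfamiliar with this class of bug, here is a minimal sketch of the race described above and one common way to guard against it. It is not the merged ep-engine fix (that lives in the review linked above); the `Bucket` class, its `stats` dictionary, and the `_shutting_down` flag are hypothetical names used only for illustration.

```python
import threading


class Bucket:
    """Hypothetical stand-in for a bucket that owns stats and a scheduled task."""

    def __init__(self):
        self._lock = threading.Lock()
        self._shutting_down = False
        self.stats = {"curr_items": 0}

    def snapshot_stats(self):
        """Scheduled task body: return a copy of the stats, or None if skipped."""
        with self._lock:
            if self._shutting_down:
                # Teardown has started, so internal state must not be touched.
                return None
            return dict(self.stats)

    def destroy(self):
        """Flag the shutdown first, then release internal state under the same lock."""
        with self._lock:
            self._shutting_down = True
            self.stats = None
```

The point of the sketch is only the ordering: the task checks the shutdown flag under the same lock that the destructor holds while it releases state, so a late-running task becomes a no-op instead of dereferencing freed data.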
[MB-5383] rebalancing should finish vbucket movements for those healthy tap streams instead of aborting the entire rebalancing because one stream was shut down by ep-engine Created: 25/May/12 Updated: 13/May/13 Resolved: 13/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | 1.8.0 |
| Fix Version/s: | Backlog |
| Security Level: | Public |
| Type: | Improvement | Priority: | Major |
| Reporter: | Farshid Ghods | Assignee: | Aleksey Kondratenko |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
This feature would be very helpful for large clusters: when a user is rebalancing 40 nodes and memcached on one node misbehaves temporarily for whatever reason, we abort the entire rebalance.
Instead, ns_server should continue the tap streams that are in progress, keep moving more items, and at the end print a summary of all streams that failed. |
| Comments |
| Comment by Anil Kumar [ 13/May/13 ] |
| Closing this since it's incremental starting with 1.8.1. |
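The behaviour requested in the description (let healthy streams finish, then report the failures) can be sketched as below. This is an illustration only, not ns_server code (ns_server is written in Erlang); `move_vbuckets` and the stream names are hypothetical.

```python
def move_vbuckets(streams):
    """Run every stream and collect failures instead of aborting on the first one.

    `streams` maps a stream name to a zero-argument callable that performs
    the vBucket movement for that stream (hypothetical interface).
    """
    completed, failed = [], {}
    for name, run in streams.items():
        try:
            run()
            completed.append(name)
        except Exception as exc:
            # Record the failure and keep going with the remaining streams.
            failed[name] = str(exc)
    return completed, failed
```

A caller would print `failed` as the end-of-rebalance summary and decide whether to retry only those streams.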
[MB-7384] it would be nice to display the setting changes in the UI logs that have been made through the tools( for instance: start/stop persistence) Created: 10/Dec/12 Updated: 13/May/13 Resolved: 13/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | ns_server, UI |
| Affects Version/s: | 2.0, 2.0.1, 2.0.2 |
| Fix Version/s: | 2.1 |
| Security Level: | Public |
| Type: | Improvement | Priority: | Major |
| Reporter: | Andrei Baranouski | Assignee: | Aleksey Kondratenko |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
A user can stop persistence via
./cbepctl localhost:11210 stop
Persistence stopped
If problems appear later, they are not easy to understand, because the user has not been informed of the change in a convenient form. |
| Comments |
| Comment by Maria McDuff [ 27/Mar/13 ] |
| deferring to 2.1 release. |
| Comment by Anil Kumar [ 13/May/13 ] |
| closing this improvement since most of the ep_engine related stuff should not be exposed |
[MB-7501] [RN 2.1] Release Notes Created: 07/Jan/13 Updated: 13/May/13 Resolved: 13/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | documentation |
| Affects Version/s: | 2.1 |
| Fix Version/s: | None |
| Security Level: | Public |
| Type: | Improvement | Priority: | Major |
| Reporter: | Karen Zeller | Assignee: | Karen Zeller |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Flagged: |
Release Note
|
| Description |
|
FEATURES:
-Multiple readers/writers per bucket: http://hub.internal.couchbase.com/confluence/display/cbeng/EP-Engine+Roadmap#EP-EngineRoadmap-MultipleReadersandWritersPerBucket(Jin,2.1release) BUGs, KNOWN Issues: |
| Comments |
| Comment by Anil Kumar [ 13/May/13 ] |
| This is being tracked separately. |
[MB-7786] [RN 2.0.2] Frequent replication start error messages "Failed to grab remote bucket info, vbucket.." at start of replication. Created: 19/Feb/13 Updated: 13/May/13 Resolved: 13/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | 2.0.1 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Improvement | Priority: | Major |
| Reporter: | Ketaki Gangal | Assignee: | Ketaki Gangal |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | 2.0.2-release-notes | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | 2.0.1.-160-rel | ||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Flagged: |
Release Note
|
||||||||
| Description |
|
Seeing two issues when trying to set up replication for the first time between two clusters:
1. Frequently seeing error messages when creating the first replication from cluster1 to cluster2: "Failed to grab remote bucket info, vbucket.." Both the source and destination buckets have been available for a long period of time, so it does not look like an issue with the bucket not being ready. I don't have logs on this currently; will add soon. Seeing this across platforms (Linux / Windows) and on most 2.0.1 runs.
2. Replication replicates data as expected, but these error messages persist for over an hour in the xdcr-last 10 errors list. This gives the user a wrong idea about the state of replication.
The initial replication call should either wait long enough to avoid these errors or figure out if something else can be done here. Also, how frequently do we clean up the XDCR error messages on the console? Can we clear them sooner than we do currently? |
| Comments |
| Comment by Ketaki Gangal [ 20/Feb/13 ] |
| Please release-note this. |
| Comment by Ketaki Gangal [ 25/Feb/13 ] |
|
Logs from the node it is trying to reach :
clusters http://ec2-54-235-229-199.compute-1.amazonaws.com:8091/index.html#sec=replications to http://ec2-107-22-40-124.compute-1.amazonaws.com:8091/index.html#sec=analytics&statsBucket=%2Fpools%2Fdefault%2Fbuckets%2Fsasl%3Fbucket_uuid%3De1f9d1e199f28b83c35f26c61ee90ec9 |
| Comment by Ketaki Gangal [ 25/Feb/13 ] |
|
Hi Aliaksey,
I've added logs from one of the nodes. Could you take a look? Please re-assign this to me / Jin after you do so. Thanks, Ketaki |
| Comment by Ketaki Gangal [ 07/Mar/13 ] |
| Change added here http://review.couchbase.org/#/c/24986/, will be part of next branch. |
| Comment by Ketaki Gangal [ 12/Mar/13 ] |
| http://review.couchbase.org/#/c/24986/ |
| Comment by Karen Zeller [ 15/Mar/13 ] |
|
Added as known issue to RN 2.0.1: When you create a replication between two clusters, you may experience the incorrect error message "Failed to grab remote bucket info, vbucket". Replication will start and function as expected, but the incorrect error message may persist for some time. Please ignore this incorrect error. |
| Comment by Aliaksey Artamonau [ 15/Mar/13 ] |
| I would not call the error incorrect. It's just that replication is able to recover from it. |
| Comment by Karen Zeller [ 15/Mar/13 ] |
|
Redo as: When you create a replication between two clusters, you may experience two error messages: "Failed to grab remote bucket info, vbucket" and "Error replicating vbucket X". Nonetheless, replication will still start and then function as expected, but the error messages may appear for some time in the Web Console. Please ignore this behavior. |
| Comment by Aliaksey Artamonau [ 15/Mar/13 ] |
| Looks good to me. |
| Comment by Karen Zeller [ 15/Mar/13 ] |
| Yes indeed.... : ) |
| Comment by Karen Zeller [ 15/Mar/13 ] |
| http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-server-rn_2-0-0l.html |
| Comment by Aleksey Kondratenko [ 02/Apr/13 ] |
|
Not sure who to assign this to.
Code-wise we've fixed it. It was caused by a thundering herd of those remote bucket info requests, and we don't allow that anymore. I believe folks wanted to add this to the release notes. Anyway, Aliaksey is done with that. |
| Comment by Maria McDuff [ 16/Apr/13 ] |
|
karen, are you finished documenting this? just flagging this for you for release note. Will assign to Ketaki for verification/closing. Thanks. |
| Comment by Karen Zeller [ 16/Apr/13 ] |
| I added this to the 2.0.1 release notes as a minor known issue to ignore. Is the message now fixed for 2.0.2? |
| Comment by Aleksey Kondratenko [ 16/Apr/13 ] |
|
As can be seen above it is fixed. |
| Comment by Maria McDuff [ 13/May/13 ] |
|
pls verify / close.
if issue is fixed, karen does not need to RN for 2.0.2 |
| Comment by Karen Zeller [ 13/May/13 ] |
| Relabeled in RN 2.0.2 as Fix. For earlier versions was in release notes as known issue. |
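The "thundering herd" mentioned in the 02/Apr/13 comment is the situation where many replicators request the same remote bucket info at once. One generic way to prevent it is to coalesce concurrent requests for the same key into a single fetch. The sketch below shows that technique only; it is not the ns_server change (see the review linked above for that), and `CoalescingFetcher` and its `fetch` callback are hypothetical names.

```python
import threading


class CoalescingFetcher:
    """Share one in-flight fetch among all concurrent callers for the same key."""

    def __init__(self, fetch):
        self._fetch = fetch          # hypothetical callable: key -> bucket info
        self._lock = threading.Lock()
        self._inflight = {}          # key -> shared entry for the running fetch

    def get(self, key):
        with self._lock:
            entry = self._inflight.get(key)
            leader = entry is None
            if leader:
                entry = {"done": threading.Event(), "value": None, "error": None}
                self._inflight[key] = entry
        if leader:
            # Only the first caller performs the remote request.
            try:
                entry["value"] = self._fetch(key)
            except Exception as exc:
                entry["error"] = exc
            finally:
                with self._lock:
                    del self._inflight[key]
                entry["done"].set()
        else:
            # Later callers wait for the leader's result instead of re-fetching.
            entry["done"].wait()
        if entry["error"] is not None:
            raise entry["error"]
        return entry["value"]
```

Results are not cached after the fetch completes, so a later call still sees fresh bucket info; only simultaneous requests are merged.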
[MB-8262] raise a warning if more than 10 views are created for a bucket Created: 03/Oct/12 Updated: 13/May/13 Resolved: 13/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | UI |
| Affects Version/s: | 2.0, 2.0.1, 2.0.2 |
| Fix Version/s: | 2.1 |
| Security Level: | Public |
| Type: | Improvement | Priority: | Critical |
| Reporter: | Sharon Barr | Assignee: | Aleksey Kondratenko |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Comments |
| Comment by Aleksey Kondratenko [ 08/Oct/12 ] |
| This was not assigned to anybody. But marked for 2.0. Assigning to Peter to prevent us losing track of it |
| Comment by Aleksey Kondratenko [ 11/Oct/12 ] |
| BTW, given that my English is, lets say, suboptimal, it would be nice if tickets like that could suggest actual text to display |
| Comment by Aleksey Kondratenko [ 11/Oct/12 ] |
| http://review.couchbase.org/#/c/21560/ |
| Comment by Deepkaran Salooja [ 16/Oct/12 ] |
|
Need confirmation on the points below, if these are the intended behavior (tested with build 1854, which has this fix):
- The limit of 10 views is per design document and not per bucket as mentioned in the subject of the ticket. There is no warning given if 10 views each are created for 2 design documents.
- Spatial views are not taken into account as of now. After creating 10 views in a design document, a couple of spatial views can be added without warning. |
| Comment by Aleksey Kondratenko [ 16/Oct/12 ] |
| Yes, Deepkaran, I explicitly ignored spatial views. And yes, I interpreted this as 10 views per ddoc. Maybe incorrectly. |
| Comment by Aleksey Kondratenko [ 17/Oct/12 ] |
|
Waiting for somebody's reaction and want this off my list while waiting. Here's what email I sent: Hi. Looks like I misunderstood ticket description. Indeed text clearly indicates we should warn on 10 views per bucket, but I did 10 views per ddoc (and there's no warning on count of ddocs). It won't take long to fix, but given lateness I'd like to have confirmation that indeed I did it wrong and indeed I should fix. |
| Comment by Dipti Borkar [ 17/Oct/12 ] |
|
Can you create a warning for 10 design docs as well on a bucket level?
I'm assuming it's a non-invasive change. |
| Comment by Dipti Borkar [ 01/Nov/12 ] |
|
Aliaksey, did not hear back from you on this one?
what's the current status? We may not be able to fix it for 2.0 but need to know what currently happens. |
| Comment by Farshid Ghods [ 09/Jan/13 ] |
|
per bug scrub Dipti/Farshid/Steve/Jin
deferring this to 2.0.2 |
| Comment by Dipti Borkar [ 13/May/13 ] |
| Given that views performance may change in future releases, closing this as no plan to fix. We need to revisit this, if it becomes a problem. |
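The thread above turns on two readings of the limit: more than 10 views per design document (what was implemented) versus more than 10 views per bucket (what the title asks for). A small sketch makes the difference concrete. It is not the UI code from the linked review; `view_warnings`, the input layout, and the warning texts are hypothetical, and spatial views are ignored by default to mirror the behaviour discussed in the comments.

```python
VIEW_WARNING_THRESHOLD = 10  # "more than 10 views", from the ticket title


def view_warnings(design_docs, include_spatial=False):
    """Return warning strings for both the per-ddoc and the per-bucket reading.

    `design_docs` maps a design document name to a dict with optional
    "views" and "spatial" mappings (hypothetical layout).
    """
    def count(ddoc):
        n = len(ddoc.get("views", {}))
        if include_spatial:
            n += len(ddoc.get("spatial", {}))
        return n

    per_ddoc = {name: count(ddoc) for name, ddoc in design_docs.items()}
    warnings = [
        f"design document {name!r} has {n} views"
        for name, n in sorted(per_ddoc.items())
        if n > VIEW_WARNING_THRESHOLD
    ]
    total = sum(per_ddoc.values())
    if total > VIEW_WARNING_THRESHOLD:
        warnings.append(f"bucket has {total} views in total")
    return warnings
```

With two design documents of 10 views each, only the per-bucket check fires, which is the case reported in the 16/Oct/12 comment.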
[MB-8166] assertion with crash and possible memory leak in ObjectRegistry::onCreateBlob at src/objectregistry.cc:57 Created: 29/Apr/13 Updated: 13/May/13 Resolved: 13/May/13 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | couchbase-bucket |
| Affects Version/s: | 2.0.1 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Matt Ingenthron | Assignee: | Deepkaran Salooja |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | community | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
OS:
Description of XDCR setup: Description of application workload: |
||
| Operating System: | Centos 64-bit |
| Description |
|
With two clusters doing XDCR replication, on a regular basis the memory utilization will grow very slowly and eventually will crash with a core dump.
Basic crash dump analysis of core.29720. Please send the file to support@couchbase.com -------------------------------------------------------------------------------- File information: -rwxr-xr-x 1 bin bin 1535008 Feb 28 21:36 /opt/couchbase/bin/memcached 30c4e0e7d6ff487a025921dd773ec30b /opt/couchbase/bin/memcached memcached 1.4.4_601_ge6f892c memcached 1.4.4_601_ge6f892c VERSION 1.4.4_601_ge6f892c -rw------- 1 couchbase couchbase 2588475392 Apr 27 12:54 core.29720 645f66306da0be2d16b9e2d872f8d0f1 core.29720 -------------------------------------------------------------------------------- Core file callstacks: GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6) Copyright (C) 2010 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "i686-redhat-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /opt/couchbase/bin/memcached...done. [New Thread 29726] [New Thread 29727] [New Thread 29732] [New Thread 29734] [New Thread 29736] [New Thread 29720] [New Thread 29721] [New Thread 29723] [New Thread 29737] [New Thread 29724] [New Thread 29722] [New Thread 29725] [New Thread 29735] [New Thread 29733] Reading symbols from /opt/couchbase/lib/memcached/libmemcached_utilities.so.0...done. Loaded symbols for /opt/couchbase/lib/memcached/libmemcached_utilities.so.0 Reading symbols from /opt/couchbase/lib/libevent-2.0.so.5...done. Loaded symbols for /opt/couchbase/lib/libevent-2.0.so.5 Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/libdl.so.2 Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done. 
Loaded symbols for /lib/libm.so.6 Reading symbols from /lib/librt.so.1...(no debugging symbols found)...done. Loaded symbols for /lib/librt.so.1 Reading symbols from /opt/couchbase/lib/libtcmalloc_minimal.so.4...done. Loaded symbols for /opt/couchbase/lib/libtcmalloc_minimal.so.4 Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done. [Thread debugging using libthread_db enabled] Loaded symbols for /lib/libpthread.so.0 Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done. Loaded symbols for /lib/libc.so.6 Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/ld-linux.so.2 Reading symbols from /usr/lib/libstdc++.so.6...(no debugging symbols found)...done. Loaded symbols for /usr/lib/libstdc++.so.6 Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done. Loaded symbols for /lib/libgcc_s.so.1 Reading symbols from /opt/couchbase/lib/memcached/stdin_term_handler.so...done. Loaded symbols for /opt/couchbase/lib/memcached/stdin_term_handler.so Reading symbols from /opt/couchbase/lib/memcached/file_logger.so...done. Loaded symbols for /opt/couchbase/lib/memcached/file_logger.so Reading symbols from /opt/couchbase/lib/memcached/bucket_engine.so...done. Loaded symbols for /opt/couchbase/lib/memcached/bucket_engine.so Reading symbols from /opt/couchbase/lib/memcached/ep.so...done. Loaded symbols for /opt/couchbase/lib/memcached/ep.so Reading symbols from /opt/couchbase/lib/libcouchstore.so.1...done. Loaded symbols for /opt/couchbase/lib/libcouchstore.so.1 Reading symbols from /opt/couchbase/lib/libsnappy.so.1...done. Loaded symbols for /opt/couchbase/lib/libsnappy.so.1 Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/libnss_files.so.2 Core was generated by `/opt/couchbase/bin/memcached -X /opt/couchbase/lib/memcached/stdin_term_handler'. Program terminated with signal 6, Aborted. 
#0 0xb77ea424 in __kernel_vsyscall () Thread 14 (Thread 0xb360bb70 (LWP 29733)): #0 JSON_checker_char (jc=0x581e40f0, next_char=<value optimized out>) at tools/JSON_checker.c:299 #1 0xb3ef150b in checkUTF8JSON (data=0xa074b2bc ":1364725540,\"process\":\"fqc\",\"status\":0,\"errors\":[{\"type\":\"SOFT EDGE\\uff084 CORNERS\\uff09\"}]}],\"part_numbers\":[\"604-3157\"],\"id\":\"DYH0100024769156\",\"_source\":\"products\",\"project\":\"X128\",\"part\":\"Housing\""..., size=908) at tools/JSON_checker.c:417 #2 0xb3ed53d4 in isJSON (this=0x5a42afc0, it=..., rev=3, cb=..., del=false) at src/couch-kvstore/couch-kvstore.cc:61 #3 CouchRequest::CouchRequest (this=0x5a42afc0, it=..., rev=3, cb=..., del=false) at src/couch-kvstore/couch-kvstore.cc:248 #4 0xb3ee2c4c in CouchKVStore::set (this=0xc501680, itm=..., cb=...) at src/couch-kvstore/couch-kvstore.cc:343 #5 0xb3e6349a in EventuallyPersistentStore::flushOneDelOrSet (this=0x9415600, qi=..., rejectQueue=std::queue wrapping: std::deque with 0 elements, vb=...) at src/ep.cc:2420 #6 0xb3e6374a in EventuallyPersistentStore::flushOne (this=0x9415600, queue=std::queue wrapping: std::deque with 1941 elements = {...}, rejectQueue=std::queue wrapping: std::deque with 0 elements, vb=...) at src/ep.cc:2468 #7 0xb3e67750 in EventuallyPersistentStore::flushVBQueue (this=0x9415600, vb=..., vb_queue=std::queue wrapping: std::deque with 1941 elements = {...}, vbid=201, data_age=0) at src/ep.cc:2022 #8 0xb3e68ef7 in EventuallyPersistentStore::flushOutgoingQueue (this=0x9415600, flushQueue=0x9415798, flushPhase=@0x941a9ec, nextVbid=@0x941a9f0) at src/ep.cc:1964 #9 0xb3e9824a in Flusher::doFlush (this=0x941a960) at src/flusher.cc:245 #10 0xb3e999e0 in Flusher::step (this=0x941a960, d=..., tid=...) at src/flusher.cc:158 #11 0xb3e5b3a2 in Task::run (this=0xc4e8d20, d=..., t=...) 
at src/dispatcher.hh:136 #12 0xb3e59fe9 in Dispatcher::run (this=0xc511100) at src/dispatcher.cc:173 #13 0xb3e5a9d5 in launch_dispatcher_thread (arg=0xc511100) at src/dispatcher.cc:28 #14 0x4363ba49 in start_thread () from /lib/libpthread.so.0 #15 0x43578e1e in clone () from /lib/libc.so.6 Thread 13 (Thread 0xb2609b70 (LWP 29735)): #0 0xb77ea424 in __kernel_vsyscall () #1 0x4363f664 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0xb3e57b49 in wait (this=0xc4e8b90, d=...) at src/syncobject.hh:58 #3 IdleTask::run (this=0xc4e8b90, d=...) at src/dispatcher.cc:336 #4 0xb3e59fe9 in Dispatcher::run (this=0xc510f00) at src/dispatcher.cc:173 #5 0xb3e5a9d5 in launch_dispatcher_thread (arg=0xc510f00) at src/dispatcher.cc:28 #6 0x4363ba49 in start_thread () from /lib/libpthread.so.0 #7 0x43578e1e in clone () from /lib/libc.so.6 Thread 12 (Thread 0xb573fb70 (LWP 29725)): #0 0xb77ea424 in __kernel_vsyscall () #1 0x43579696 in epoll_wait () from /lib/libc.so.6 #2 0xb77c9f97 in epoll_dispatch (base=0xc500d80, tv=0x0) at epoll.c:404 #3 0xb77b6463 in event_base_loop (base=0xc500d80, flags=0) at event.c:1558 #4 0x0805cd47 in worker_libevent (arg=0x9414ed8) at daemon/thread.c:301 #5 0x4363ba49 in start_thread () from /lib/libpthread.so.0 #6 0x43578e1e in clone () from /lib/libc.so.6 Thread 11 (Thread 0xb6f4db70 (LWP 29722)): #0 0xb77ea424 in __kernel_vsyscall () #1 0x4363f664 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0xb6f509f5 in logger_thead_main (arg=0x9412040) at extensions/loggers/file_logger.c:368 #3 0x4363ba49 in start_thread () from /lib/libpthread.so.0 #4 0x43578e1e in clone () from /lib/libc.so.6 Thread 10 (Thread 0xb5f40b70 (LWP 29724)): #0 0xb7789b3a in MallocExtension_GetAllocatedSize () from /opt/couchbase/lib/libtcmalloc_minimal.so.4 #1 0xb3e9e4d9 in DeleteHook (ptr=0x5c31c6a0) at src/memory_tracker.cc:56 #2 0xb77862c0 in MallocHook::InvokeDeleteHookSlow(void const*) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4 
#3 0xb777a0d2 in MallocHook::InvokeDeleteHook(void const*) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4 #4 0xb778c2f4 in tc_delete () from /opt/couchbase/lib/libtcmalloc_minimal.so.4 #5 0xb3eaf055 in ~SingleThreadedRCPtr (this=0x5c6aaa00, shouldPause=@0xb5f3df9f) at src/atomic.hh:412 #6 TapProducer::nextFgFetched_UNLOCKED (this=0x5c6aaa00, shouldPause=@0xb5f3df9f) at src/tapconnection.cc:1539 #7 0xb3eb23b0 in TapProducer::getNextItem (this=0x5c6aaa00, c=0xc49e000, vbucket=0xb5f40228, ret=@0xb5f3e048, referenced=@0xb5f3e04f) at src/tapconnection.cc:1850 #8 0xb3e94d38 in EventuallyPersistentEngine::doWalkTapQueue (this=0xc54e000, cookie=0xc49e000, itm=0xb5f40224, es=0xb5f40220, nes=0xb5f4022c, ttl=0xb5f4022f "\377", flags=0xb5f4022a, seqno=0xb5f4021c, vbucket=0xb5f40228, connection=0x5c6aaa00, retry=@0xb5f3e0bf) at src/ep_engine.cc:1659 #9 0xb3e84bc3 in EventuallyPersistentEngine::walkTapQueue (this=0xc54e000, cookie=0xc49e000, itm=0xb5f40224, es=0xb5f40220, nes=0xb5f4022c, ttl=0xb5f4022f "\377", flags=0xb5f4022a, seqno=0xb5f4021c, vbucket=0xb5f40228) at src/ep_engine.cc:1734 #10 0xb3e84d56 in EvpTapIterator (handle=0xc54e000, cookie=0xc49e000, itm=0xb5f40224, es=0xb5f40220, nes=0xb5f4022c, ttl=0xb5f4022f "\377", flags=0xb5f4022a, seqno=0xb5f4021c, vbucket=0xb5f40228) at src/ep_engine.cc:1054 #11 0xb674702f in bucket_tap_iterator_shim (handle=0xb674c220, cookie=0xc49e000, itm=0xb5f40224, engine_specific=0xb5f40220, nengine_specific=0xb5f4022c, ttl=0xb5f4022f "\377", flags=0xb5f4022a, seqno=0xb5f4021c, vbucket=0xb5f40228) at bucket_engine.c:1971 #12 0x08052de0 in ship_tap_log (c=0xc49e000) at daemon/memcached.c:2614 #13 0x0805c0fe in conn_ship_log (c=0xc49e000) at daemon/memcached.c:5523 #14 0x0804d885 in event_handler (fd=67, which=2, arg=0xc49e000) at daemon/memcached.c:5936 #15 0xb77b6568 in event_process_active_single_queue (base=0xc500600, flags=0) at event.c:1308 #16 event_process_active (base=0xc500600, flags=0) at event.c:1375 #17 event_base_loop 
(base=0xc500600, flags=0) at event.c:1572 #18 0x0805cd47 in worker_libevent (arg=0x9414e4c) at daemon/thread.c:301 #19 0x4363ba49 in start_thread () from /lib/libpthread.so.0 #20 0x43578e1e in clone () from /lib/libc.so.6 Thread 9 (Thread 0xb1607b70 (LWP 29737)): #0 0xb77ea424 in __kernel_vsyscall () #1 0x4363f664 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0xb3e79a0e in wait (this=0xc54e000) at src/syncobject.hh:58 #3 wait (this=0xc54e000) at src/syncobject.hh:74 #4 wait (this=0xc54e000) at src/tapconnmap.hh:169 #5 EventuallyPersistentEngine::notifyPendingConnections (this=0xc54e000) at src/ep_engine.cc:3423 #6 0xb3e79b22 in EvpNotifyPendingConns (arg=0xc54e000) at src/ep_engine.cc:1145 #7 0x4363ba49 in start_thread () from /lib/libpthread.so.0 #8 0x43578e1e in clone () from /lib/libc.so.6 Thread 8 (Thread 0xb6741b70 (LWP 29723)): #0 0xb77ea424 in __kernel_vsyscall () #1 0x43579696 in epoll_wait () from /lib/libc.so.6 #2 0xb77c9f97 in epoll_dispatch (base=0xc500180, tv=0x0) at epoll.c:404 #3 0xb77b6463 in event_base_loop (base=0xc500180, flags=0) at event.c:1558 #4 0x0805cd47 in worker_libevent (arg=0x9414dc0) at daemon/thread.c:301 #5 0x4363ba49 in start_thread () from /lib/libpthread.so.0 #6 0x43578e1e in clone () from /lib/libc.so.6 Thread 7 (Thread 0xb7761b70 (LWP 29721)): #0 0xb77ea424 in __kernel_vsyscall () #1 0x43568aeb in read () from /lib/libc.so.6 #2 0x4350461b in _IO_new_file_underflow () from /lib/libc.so.6 #3 0x4350633b in _IO_default_uflow_internal () from /lib/libc.so.6 #4 0x4350794a in __uflow () from /lib/libc.so.6 #5 0x434fa33c in _IO_getline_info_internal () from /lib/libc.so.6 #6 0x434fa281 in _IO_getline_internal () from /lib/libc.so.6 #7 0x434f91ba in fgets () from /lib/libc.so.6 #8 0xb77a67b7 in check_stdin_thread (arg=0x804a790) at extensions/daemon/stdin_check.c:37 #9 0x4363ba49 in start_thread () from /lib/libpthread.so.0 #10 0x43578e1e in clone () from /lib/libc.so.6 Thread 6 (Thread 0xb7763900 (LWP 29720)): 
#0 0xb77ea424 in __kernel_vsyscall () #1 0x43579696 in epoll_wait () from /lib/libc.so.6 #2 0xb77c9f97 in epoll_dispatch (base=0xc500000, tv=0xbfd5d254) at epoll.c:404 #3 0xb77b6463 in event_base_loop (base=0xc500000, flags=0) at event.c:1558 #4 0x08051671 in main (argc=19, argv=0xbfd5e8a4) at daemon/memcached.c:7918 Thread 5 (Thread 0xb1e08b70 (LWP 29736)): #0 0xb77ea424 in __kernel_vsyscall () #1 0x4363f664 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0xb3e57b49 in wait (this=0xc4e8b40, d=...) at src/syncobject.hh:58 #3 IdleTask::run (this=0xc4e8b40, d=...) at src/dispatcher.cc:336 #4 0xb3e59fe9 in Dispatcher::run (this=0xc511500) at src/dispatcher.cc:173 #5 0xb3e5a9d5 in launch_dispatcher_thread (arg=0xc511500) at src/dispatcher.cc:28 #6 0x4363ba49 in start_thread () from /lib/libpthread.so.0 #7 0x43578e1e in clone () from /lib/libc.so.6 Thread 4 (Thread 0xb2e0ab70 (LWP 29734)): #0 0xb77ea424 in __kernel_vsyscall () #1 0x4363f664 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0xb3e57b49 in wait (this=0xc4e8a50, d=...) at src/syncobject.hh:58 #3 IdleTask::run (this=0xc4e8a50, d=...) 
at src/dispatcher.cc:336
#4 0xb3e59fe9 in Dispatcher::run (this=0xc511000) at src/dispatcher.cc:173
#5 0xb3e5a9d5 in launch_dispatcher_thread (arg=0xc511000) at src/dispatcher.cc:28
#6 0x4363ba49 in start_thread () from /lib/libpthread.so.0
#7 0x43578e1e in clone () from /lib/libc.so.6
Thread 3 (Thread 0xb3e20b70 (LWP 29732)):
#0 0xb77ea424 in __kernel_vsyscall ()
#1 0x43537be6 in nanosleep () from /lib/libc.so.6
#2 0x43571d2c in usleep () from /lib/libc.so.6
#3 0xb3e9e7ff in updateStatsThread (arg=0x940c180) at src/memory_tracker.cc:31
#4 0x4363ba49 in start_thread () from /lib/libpthread.so.0
#5 0x43578e1e in clone () from /lib/libc.so.6
Thread 2 (Thread 0xb473db70 (LWP 29727)):
#0 0xb77ea424 in __kernel_vsyscall ()
#1 0x43579696 in epoll_wait () from /lib/libc.so.6
#2 0xb77c9f97 in epoll_dispatch (base=0xc501500, tv=0x0) at epoll.c:404
#3 0xb77b6463 in event_base_loop (base=0xc501500, flags=0) at event.c:1558
#4 0x0805cd47 in worker_libevent (arg=0x9414ff0) at daemon/thread.c:301
#5 0x4363ba49 in start_thread () from /lib/libpthread.so.0
#6 0x43578e1e in clone () from /lib/libc.so.6
Thread 1 (Thread 0xb4f3eb70 (LWP 29726)):
#0 0xb77ea424 in __kernel_vsyscall ()
#1 0x434c4b01 in raise () from /lib/libc.so.6
#2 0x434c63da in abort () from /lib/libc.so.6
#3 0x434bdddb in __assert_fail_base () from /lib/libc.so.6
#4 0x434bde96 in __assert_fail () from /lib/libc.so.6
#5 0xb3ef2116 in ObjectRegistry::onCreateBlob (blob=0x59fa5700) at src/objectregistry.cc:57
#6 0xb3e7c6b4 in Blob (this=0xc54e000, cookie=0xc49e3c0, request=0x577f4000, response=0x804f240 <binary_response_handler>) at src/item.hh:116
#7 New (this=0xc54e000, cookie=0xc49e3c0, request=0x577f4000, response=0x804f240 <binary_response_handler>) at src/item.hh:60
#8 setData (this=0xc54e000, cookie=0xc49e3c0, request=0x577f4000, response=0x804f240 <binary_response_handler>) at src/item.hh:348
#9 Item (this=0xc54e000, cookie=0xc49e3c0, request=0x577f4000, response=0x804f240 <binary_response_handler>) at src/item.hh:163
#10 EventuallyPersistentEngine::setWithMeta (this=0xc54e000, cookie=0xc49e3c0, request=0x577f4000, response=0x804f240 <binary_response_handler>) at src/ep_engine.cc:3876
#11 0xb3e867d6 in processUnknownCommand (h=<value optimized out>, cookie=0xc49e3c0, request=0x577f4000, response=0x804f240 <binary_response_handler>) at src/ep_engine.cc:949
#12 0xb3e87af1 in EvpUnknownCommand (handle=0xc54e000, cookie=0xc49e3c0, request=0x577f4000, response=0x804f240 <binary_response_handler>) at src/ep_engine.cc:1013
#13 0xb6746901 in bucket_unknown_command (handle=0xb674c220, cookie=0xc49e3c0, request=0x577f4000, response=0x804f240 <binary_response_handler>) at bucket_engine.c:2473
#14 0x08059e57 in process_bin_unknown_packet (c=0xc49e3c0) at daemon/memcached.c:2876
#15 process_bin_packet (c=0xc49e3c0) at daemon/memcached.c:3164
#16 complete_nread_binary (c=0xc49e3c0) at daemon/memcached.c:3738
#17 complete_nread (c=0xc49e3c0) at daemon/memcached.c:3820
#18 conn_nread (c=0xc49e3c0) at daemon/memcached.c:5673
#19 0x0804d885 in event_handler (fd=56, which=2, arg=0xc49e3c0) at daemon/memcached.c:5936
#20 0xb77b6568 in event_process_active_single_queue (base=0xc500a80, flags=0) at event.c:1308
#21 event_process_active (base=0xc500a80, flags=0) at event.c:1375
#22 event_base_loop (base=0xc500a80, flags=0) at event.c:1572
#23 0x0805cd47 in worker_libevent (arg=0x9414f64) at daemon/thread.c:301
#24 0x4363ba49 in start_thread () from /lib/libpthread.so.0
#25 0x43578e1e in clone () from /lib/libc.so.6
--------------------------------------------------------------------------------
Module information:
/opt/couchbase/lib/memcached/libmemcached_utilities.so.0:
/opt/couchbase/lib/libevent-2.0.so.5:
/lib/libdl.so.2:
/lib/libm.so.6:
/lib/librt.so.1:
/opt/couchbase/lib/libtcmalloc_minimal.so.4:
/lib/libpthread.so.0:
/lib/libc.so.6:
/lib/ld-linux.so.2:
/usr/lib/libstdc++.so.6:
/lib/libgcc_s.so.1:
/opt/couchbase/lib/memcached/stdin_term_handler.so:
/opt/couchbase/lib/memcached/file_logger.so:
/opt/couchbase/lib/memcached/bucket_engine.so:
/opt/couchbase/lib/memcached/ep.so:
/opt/couchbase/lib/libcouchstore.so.1:
/opt/couchbase/lib/libsnappy.so.1:
/lib/libnss_files.so.2: |
| Comments |
| Comment by Damien Katz [ 29/Apr/13 ] |
| This looks to be an ep-engine issue. |
| Comment by G Woo [ 29/Apr/13 ] |
|
OS: cat /proc/version
Linux version 2.6.39-400.17.2.el6uek.i686 (mockbuild@ca-build44.us.oracle.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Wed Mar 13 12:19:21 PDT 2013
Description of XDCR setup:
- DC1: 4 nodes with 8GB of RAM. All allocated to one bucket. 1GB left for OS.
- DC2: 8 nodes with 8GB of RAM. All allocated to one bucket. 1GB left for OS.
Description of application workload: Averaging about 80 ops/sec. A single document may have 4-6 ops. 50% sets. 5.17 million items. 5GB total docs data size. 6.48GB total disk size. No views.
Hope this helps. Let me know if there is anything else I can add. |
| Comment by Chiyoung Seo [ 09/May/13 ] |
| Unfortunately, it's difficult for me to debug this possible memory leak or stat issue just by looking at the gdb backtrace. I need both the memcached and stats log files that are collected by running collect_info. |
| Comment by Matt Ingenthron [ 09/May/13 ] |
| I'll email the user to see if more info is available. Given that it's an assertion, can't we do anything? I would think an assertion with a code stack would be something we could trace backwards to find possible causes. |
| Comment by Chiyoung Seo [ 09/May/13 ] |
| The assertion was caused by the overflow of a memory usage stat value. It seems to me that the memory stat wasn't incremented or decremented correctly in other places. I need to look at other memory stats and tcmalloc stats as well. |
| Comment by Matt Ingenthron [ 09/May/13 ] |
| Thanks Chiyoung. |
| Comment by Chiyoung Seo [ 13/May/13 ] |
|
I'm still waiting for the log files and the gdb output on the item value size. Meanwhile, I found that we didn't check the value size limit in the setWithMeta API, which can cause a huge memory allocation. http://review.couchbase.org/#/c/26269/ |
| Comment by Maria McDuff [ 13/May/13 ] |
| pls verify. |
| Comment by Dipti Borkar [ 13/May/13 ] |
| Since setWithMeta is used for XDCR, we should raise this to a blocker. |
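The missing value size check mentioned in the 13/May/13 comment can be illustrated with a minimal sketch. This is not the change under review at the linked URL: the function name `checkValueSize`, the `Status` codes, and the `maxItemSize` parameter are hypothetical and exist only for illustration. The sketch assumes the value length is derived from the memcached binary protocol header as total body length minus key length and extras length, and that it is validated before any memory is allocated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical status codes for this sketch; not the engine's real error enum.
enum class Status { Success, Invalid, TooBig };

// Derive the value length from binary-protocol header fields and validate it
// before allocating anything. maxItemSize is an assumed upper bound on the
// size of a single value; on success the length is written to *vallen.
Status checkValueSize(std::uint32_t bodylen, std::uint16_t keylen,
                      std::uint8_t extlen, std::size_t maxItemSize,
                      std::size_t *vallen) {
    assert(vallen != nullptr);

    const std::uint32_t overhead = static_cast<std::uint32_t>(keylen) + extlen;
    if (overhead > bodylen) {
        // bodylen - overhead would wrap around to a huge unsigned value.
        return Status::Invalid;
    }

    const std::size_t len = bodylen - overhead;
    if (len > maxItemSize) {
        // Reject oversized values instead of attempting the allocation.
        return Status::TooBig;
    }

    *vallen = len;
    return Status::Success;
}
```

Under these assumptions, a request with a 100-byte body, a 10-byte key, and 24 bytes of extras yields a 66-byte value, while a malformed header whose key and extras exceed the body is rejected rather than being turned into a very large unsigned length.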
[MB-8253] windows 2.0.2 builds failing due to Cheetah not being installed correctly Created: 13/May/13 Updated: 13/May/13 Resolved: 13/May/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | build |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Phil Labee | Assignee: | Bin Cui |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | |||