[MB-7140] [system test] tcmalloc segfault Created: 09/Nov/12 Updated: 11/Apr/13 Resolved: 18/Jan/13
|Reporter:||Thuan Nguyen||Assignee:||Thuan Nguyen|
|Remaining Estimate:||Not Specified|
|Time Spent:||Not Specified|
|Original Estimate:||Not Specified|
|Environment:||CentOS 6.2 64-bit, build 2.0.0-1945|
Cluster: 7 nodes
Node to be added: node 45
Build # 2.0.0-1945
Environment: 8 nodes with 390 GB SSD drives and 32 GB RAM
Bucket: 1 default bucket (1 replica), replica index disabled.
Number of clients: 1
Load 40 million items into the default bucket, pushing the resident ratio down to around 62%.
Maintain a load of about 600 ops and 600 queries per second.
Create 1 design doc with 8 views. Let the initial indexing complete.
Then add node 45 to the cluster and rebalance.
Rebalance completed at 01:35:40 on Friday, Nov 9, 2012.
Then at 07:19:06 on Friday, Nov 9, 2012, the control connection to memcached on node 42 was disconnected.
Generating a memcached backtrace on node 42 produced the following:
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `/opt/couchbase/bin/memcached -X /opt/couchbase/lib/memcached/stdin_term_handler'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007fb8600f3d03 in tcmalloc::CentralFreeList::FetchFromSpans() ()
Missing separate debuginfos, use: debuginfo-install couchbase-server-2.0.0-1945.x86_64
This bug looks similar to bug
Link to manifest file http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.0-1945-rel.rpm.manifest.xml
Link to memcached stack trace https://friendpaste.com/mhLCkxtmbXmmO0t3ZtQkV
Link to collect info of all nodes https://s3.amazonaws.com/packages.couchbase/collect_info/orange/2_0_0/201211/8nodes-ci-1945-tcmalloc-segfault-20121109-120057.tgz
|Comment by Chiyoung Seo [ 09/Nov/12 ]|
I don't see anything suspicious in ep-engine. For now, I'm moving this to 2.0.1.
I have been keeping track of the tcmalloc bug reports, and some users have recently reported crashes as well. I will create a separate bug for patching in the latest tcmalloc version in 2.0.1.
|Comment by Chiyoung Seo [ 21/Nov/12 ]|
|Moving this to 2.0.2, as it happens very rarely. As mentioned above, we need to patch in the latest tcmalloc in 2.0.1, as it includes fixes for some crash issues.|
|Comment by Chiyoung Seo [ 18/Jan/13 ]|
|I was not able to reproduce this issue, but we opened a separate bug to patch in the latest tcmalloc.|
|Comment by Maria McDuff [ 05/Apr/13 ]|
|Hi Tony, please check whether the current 2.0.2 build is segfaulting; if not, please close. Thanks.|
|Comment by Thuan Nguyen [ 11/Apr/13 ]|
|Re-tested on 4 nodes (CentOS 5.8 64-bit) with build 2.0.2-760. I don't see any memcached segfault crashes. I will close this bug.|