[MB-7140] [system test] tcmalloc segfault Created: 09/Nov/12 Updated: 11/Apr/13 Resolved: 18/Jan/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | couchbase-bucket |
| Affects Version/s: | 2.0 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Thuan Nguyen | Assignee: | Thuan Nguyen |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | system-test | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | centos 6.2 64bit build 2.0.0-1945 | ||
| Description |
|
Cluster: 7 nodes
10.6.2.37 10.6.2.38 10.6.2.39 10.6.2.40 10.6.2.42 10.6.2.43 10.6.2.44
Node will be added: 10.6.2.45
Build # 2.0.0-1945
Environment: 8 nodes with 390GB SSD drive, 32GB RAM
Bucket: 1 default bucket (1 replica), disable replica index.
Number of clients: 1

Load 40 million items into the default bucket, which pushes the resident ratio down to around 62%.
Maintain a load of about 600 ops and 600 queries per second.
Create 1 design doc with 8 views. Let the initial indexing complete.
Then add node 45 to the cluster and rebalance. Rebalance was done at 01:35:40 - Friday Nov 9, 2012.
Then at 07:19:06 - Fri Nov 9, 2012, the control connection to memcached on node 42 was disconnected.

Generated a memcached backtrace on node 42 and got this error:

Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `/opt/couchbase/bin/memcached -X /opt/couchbase/lib/memcached/stdin_term_handler'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007fb8600f3d03 in tcmalloc::CentralFreeList::FetchFromSpans() () from /opt/couchbase/lib/libtcmalloc_minimal.so.4
Missing separate debuginfos, use: debuginfo-install couchbase-server-2.0.0-1945.x86_64

This bug looks similar to bug

Link to manifest file
http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.0-1945-rel.rpm.manifest.xml

Link to memcached stack trace
https://friendpaste.com/mhLCkxtmbXmmO0t3ZtQkV

Link to collect info of all nodes
https://s3.amazonaws.com/packages.couchbase/collect_info/orange/2_0_0/201211/8nodes-ci-1945-tcmalloc-segfault-20121109-120057.tgz |
| Comments |
| Comment by Chiyoung Seo [ 09/Nov/12 ] |
|
I don't see anything suspicious in ep-engine. For now, I am moving this to 2.0.1.
I have been keeping track of the tcmalloc bug reports, and some users have recently reported crashes as well. I will create a separate bug for patching in the latest tcmalloc version in 2.0.1. |
| Comment by Chiyoung Seo [ 21/Nov/12 ] |
| Moving this to 2.0.2 as it happens very rarely. As mentioned above, we need to patch in the latest tcmalloc in 2.0.1, as it has fixes for some crash issues. |
| Comment by Chiyoung Seo [ 18/Jan/13 ] |
| I was not able to reproduce this issue, but we opened a bug to patch in the latest tcmalloc. |
| Comment by Maria McDuff [ 05/Apr/13 ] |
| Hi Tony, please check whether the current 2.0.2 build is segfaulting. If not, please close. Thanks. |
| Comment by Thuan Nguyen [ 11/Apr/13 ] |
| Re-tested on 4 nodes (centos 5.8 64bit) with build 2.0.2-760. I don't see any memcached crashes with segfault. I will close this bug. |