[MB-4824] memcached silently closes connection in auth request [was: ns_server: throws exception in ebucketmigrator_srv during rebalance test] Created: 21/Feb/12 Updated: 10/Jan/13 Resolved: 03/Jun/12 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | bucket-engine |
| Affects Version/s: | 2.0-developer-preview-4 |
| Fix Version/s: | 2.0 |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Tommie McAfee | Assignee: | Chiyoung Seo |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | 2.0-dev-preview-4-release-notes | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | cluster-run 6 nodes. | ||
| Attachments: |
|
| Description |
|
Rebalance-in while inserting spatial docs causes rebalance to fail. During the time of failure, I see the following in the diags:
[error_logger:error] [2012-02-21 10:49:04] [n_0@10.17.4.51:error_logger:ale_error_logger_handler:log_report:72] =========================CRASH REPORT========================= crasher: initial call: ebucketmigrator_srv:init/1 pid: <0.25874.6> registered_name: [] exception error: no match of right hand side value {error,closed} in function mc_client_binary:cmd_binary_vocal_recv/5 (src/mc_client_binary.erl, line 103) in call from mc_client_binary:auth/2 (src/mc_client_binary.erl, line 145) in call from ebucketmigrator_srv:connect/4 (src/ebucketmigrator_srv.erl, line 281) in call from ebucketmigrator_srv:init/1 (src/ebucketmigrator_srv.erl, line 133) ancestors: [<0.25873.6>] messages: [] links: [#Port<0.454477>,<0.25873.6>,#Port<0.454476>] dictionary: [] trap_exit: false status: running heap_size: 987 stack_size: 24 reductions: 960 neighbours: SpatialRebalanceTests.test_insert_x_docs_during_rebalance |
| Comments |
| Comment by Aleksey Kondratenko [ 22/Feb/12 ] |
|
this error means that source node closed connection when we tried to establish tap connection. I see no traces of ns_server-side issues that could lead to this. We honestly did what we're normally doing.
BTW, thumbs up for using R15 which provides line numbers in backtrace. So it closes connection when SASL auth is performed. Just after establishing tcp connection and prior to selecting bucket. And much prior to sending TAP_CONNECT command. This could be something OS-related (like exhausting file descriptors in memcached for example). |
| Comment by Aleksey Kondratenko [ 22/Feb/12 ] |
|
Assigning to our memcached & bucket engine guy. Trond, ns_server is using bucket name and password in AUTH request. But that's bucket default with forcibly empty password. So it should not be AUTH failure. Even if it is, it should not be intermittent and silent (i.e. no error is returned but socket just closed). |
| Comment by Trond Norbye [ 24/Feb/12 ] |
| I'll look into this today |
| Comment by Trond Norbye [ 24/Feb/12 ] |
|
Are you absolutely sure it's not sending any data before the socket is closed? I just added plaintext sasl support for mcstat and ran memcached like:
export ISASL_PWFILE=/Users/trondnorbye/compile/ (isasl.pw contains: default ) When I connect ./mcstat -u default -h localhost:11211 -P ' ' It receives an error package before the socket is closed (as memcached prints out:) ./memcached -vvv -S Loaded isasl db from /Users/trondnorbye/compile/ Initialized SASL. isasl checking DB every 60s Loaded engine: Default engine v0.1 Supplying the following features: LRU, compare and swap <22 server listening (binary) <23 server listening (binary) <24 send buffer was 9216, now 3728270 <24 server listening (udp) <25 send buffer was 9216, now 3728270 <24 server listening (udp) <25 server listening (udp) <25 server listening (udp) <24 server listening (udp) <24 server listening (udp) <25 server listening (udp) <25 server listening (udp) 23 - Running task: (conn_listening) <26 new binary client connection. 26 - Running task: (conn_new_cmd) 26: going from conn_new_cmd to conn_waiting 26 - Running task: (conn_waiting) 26: going from conn_waiting to conn_read 26 - Running task: (conn_read) 26: going from conn_read to conn_parse_cmd 26 - Running task: (conn_parse_cmd) >26 Read binary protocol data: >26 0x80 0x21 0x00 0x05 >26 0x00 0x00 0x00 0x00 >26 0x00 0x00 0x00 0x10 >26 0x00 0x00 0x00 0x00 >26 0x00 0x00 0x00 0x00 >26 0x00 0x00 0x00 0x00 26: authenticated() in cmd 0x21 is true 26: going from conn_parse_cmd to conn_nread 26 - Running task: (conn_nread) 26 - Running task: (conn_nread) 26 - Running task: (conn_nread) 26 - Running task: (conn_nread) 26: mech: ``PLAIN'' with 11 bytes of data 26: sasl result code: -1 26: Unknown sasl response: -1 >26 Writing an error: Auth failure <26 Writing bin response: <26 0x81 0x21 0x00 0x00 <26 0x00 0x00 0x00 0x20 <26 0x00 0x00 0x00 0x0c <26 0x00 0x00 0x00 0x00 <26 0x00 0x00 0x00 0x00 <26 0x00 0x00 0x00 0x00 26: going from conn_nread to conn_mwrite 26 - Running task: (conn_mwrite) 26 - Running task: (conn_mwrite) 26: going from conn_mwrite to conn_new_cmd 26 - Running task: (conn_new_cmd) 26: going from conn_new_cmd to conn_waiting 26 - Running task: (conn_waiting) 26: going from conn_waiting to conn_read 26 - Running task: (conn_read) 26: going from conn_read to conn_closing 26 - Running task: (conn_closing) -1: going from conn_closing to conn_immediate_close -1 - Running task: (conn_immediate_close) Immediate close of 0x7ff80053d6c0 Could it be that the code detects that the socket is closed and doesn't read out the error message? If no, how can I reproduce this issue? |
| Comment by Trond Norbye [ 24/Feb/12 ] |
| Please let me know how to reproduce if you're absolutely sure that you're not getting an error message |
| Comment by Aleksey Kondratenko [ 24/Feb/12 ] |
|
Yes I am absolutely sure it is not being sent auth reply. It could be unrelated to AUTH, because error is seemingly intermittent.
And apparently the way to reproduce it is by running testrunner test that was mentioned as part of bug report. |
| Comment by Trond Norbye [ 24/Feb/12 ] |
| If you spawn memcached with more logging you should see when it disconnects you and you can map that with the error you're seeing.. try add -vvv (but you probably need to extend the logger to not drop lines...) |
| Comment by Farshid Ghods [ 29/Feb/12 ] |
| results in a rebalance hung |
| Comment by Aleksey Kondratenko [ 29/Feb/12 ] |
| no it does not. If rebalance hungs anywhere it's a different bug. |
| Comment by Aleksey Kondratenko [ 25/Apr/12 ] |
| We've seen very similar issue in 1.8.1 recently. I've added better diagnostics and hopefully will soon find out whats wrong |
| Comment by Farshid Ghods [ 03/Jun/12 ] |
|
there are couple of other 1.8.1 bugs related to this issue. please close if this is a dupe |
| Comment by Chiyoung Seo [ 03/Jun/12 ] |
|
These issues are basically caused by the same reason. http://www.couchbase.com/issues/browse/MB-5343 |