[MB-8149] Status Message Garbage against a Windows Cluster Created: 24/Apr/13  Updated: 21/Aug/13  Resolved: 21/Aug/13

Status: Resolved
Project: Couchbase Server
Component/s: clients
Affects Version/s: 2.0, 2.0.1
Fix Version/s: 2.2.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Michael Nitschinger Assignee: Trond Norbye
Resolution: Fixed Votes: 0
Labels: windows
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File 130424_ml_opstatus_message_unicode_bug.png    
Operating System: Windows 32-bit

This was reported earlier somewhere, but I could never track it down until now.

I'm currently at the DevDay Vienna, where one attendee used the Java SDK against a local Windows Couchbase Server 2.0 install. Everything works well, but when he prints the status message on async operations, the string that comes back is total garbage. Changing the encoding inside the IDE doesn't help, so I guess the cause is something different.

The interesting part is that when I let him point the same code to my Mac OS Couchbase Server 2.0 instance, the message comes back perfectly fine. I've never seen this on Mac/Linux before.

So my first question is: is there anything different on Windows in terms of encoding that makes a difference here? Win client -> Win server breaks, Win client -> Mac server works.

I'll attach a screenshot from him to show what it looks like. It comes up when you use getStatus().getMessage() on an async operation.


Comment by Michael Nitschinger [ 26/Apr/13 ]
One more addition:

During the current dev day here in Prague, one attendee showed me the same problem, but he was using the .NET SDK against a Windows machine. I told him to also try against my local bucket on Mac, and it worked fine again.

Since both Java and .NET face the same issue against a Windows machine, maybe it's a bug in our Windows build?
Comment by Michael Nitschinger [ 26/Apr/13 ]
The person (already a customer) said that this has been an issue in 1.8.X as well, so it must have been in there for some time.
Comment by Trond Norbye [ 26/Apr/13 ]
Can you add a code snippet on how to reproduce the problem in a client so I know where to look on the server?

Comment by Michael Nitschinger [ 05/May/13 ]
Hey Trond,

can you pass me an IP address where I can get access to a Windows box to run this? I don't have a specific snippet for you, but it should be easy to write. Let me give you the information that I recall.

- I think the message is broken only on some of the responses (in the screenshot you can see it's a failed CAS response).

It always happens when you inspect the status of the returned future. So the snippet would be: do something, especially with cas(), then grab the future, call getStatus().getMessage() on it, and just print the result to STDOUT. This should suffice.

Again, if you have a box where I can try I'll do so.
Comment by Matt Ingenthron [ 31/May/13 ]
Siri: this seems to be a problem with character encoding in responses on Windows. Unfortunately, we don't have a test case, but since it's an error code response over the memcached protocol, it may be possible to find the cause by code inspection.

Note that Windows -> Mac works, but Windows -> Windows fails. Does not appear to be a client issue.
Comment by Maria McDuff (Inactive) [ 04/Jun/13 ]
Deferring to 2.0.3 since this is not a regression.
Comment by Sriram Melkote [ 05/Jun/13 ]
Sorry Michael. I've now extended the volume to give an additional 30 GB of free space. Thanks!
Comment by Michael Nitschinger [ 13/Jun/13 ]
Okay, it is easily reproducible on the target system.

I left IntelliJ open, but in case it's closed, just reopen it. The only project is called "mb8149" and contains a main() method that you just need to run (select the file on the left, right-click, "Run MainApp.main()").

You'll see the garbled message in the log underneath. It just does a cas() update of the document with an invalid CAS value. If I run the same code locally, I correctly get "Data exists for key", so there is no weird character in there. The message on the Windows box is completely broken.

Let me know if you need anything else!
Comment by Michael Nitschinger [ 13/Jun/13 ]
Assigning back to you for investigation!
Comment by Sriram Melkote [ 16/Aug/13 ]
I looked at this briefly before the VM got stopped (due to too low a price) on EC2. There didn't seem to be anything wrong from ns_server - either the message from memcached was wrong, or the buffer was being decoded wrongly by the client. In any case, since I'm not the correct person for ns_server (Alk), memcached (Mike) or the client (Michael), I'm reassigning rather than sit on this any longer for lack of time.
Comment by Matt Ingenthron [ 19/Aug/13 ]
I'm assigning this to Trond, raising it to critical and re-adding 2.2 per a discussion with Ravi. The concern here is that if we have garbage or invalid encoding in error messages, what else has garbage or invalid encoding?

Trond: the request here is to inspect the path to try to find any possible causes. We hope it's just a simple problem with string encoding. Please check with Michael if additional info is needed.
Comment by Sriram Melkote [ 19/Aug/13 ]
Note that the message seems to contain a lot of \u0000 characters, so it may be that we encoded 0's as a string for some reason.
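
One way to check this on the client side is to print the status message with every non-printable character escaped, so NUL characters become visible instead of rendering as garbage. The following is only a minimal sketch: the class name, the helper, and the sample bytes are made up for illustration and were not captured from the affected server.

```java
import java.nio.charset.StandardCharsets;

public class StatusMessageDump {

    /** Returns the message with anything outside printable ASCII written as a backslash-u escape. */
    public static String escape(String message) {
        StringBuilder out = new StringBuilder();
        for (char c : message.toCharArray()) {
            if (c >= 0x20 && c < 0x7f) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical response body with two zero bytes in the middle.
        byte[] body = {'D', 'a', 0, 0, 't'};
        String message = new String(body, StandardCharsets.UTF_8);
        System.out.println(escape(message));
    }
}
```

In a real reproduction, the string passed to escape() would be the result of getStatus().getMessage() on the returned future.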
Comment by Trond Norbye [ 21/Aug/13 ]
The error function passed a stack buffer for transfer, which seemed extremely likely to be the root cause here. It has been fixed in http://review.couchbase.org/#/c/28388/
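
For readers unfamiliar with this class of bug: a buffer handed to a deferred transfer has to stay valid until the transfer actually happens. Java has no stack buffers, so the sketch below is only an analogy and not the server code or the actual fix. It uses a hypothetical DeferredSend class to show a scratch buffer being reused before the queued write runs, and how queueing a private copy avoids that.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

public class DeferredSend {
    // Hypothetical stand-in for a queue of responses waiting to be written.
    private final Queue<byte[]> pending = new ArrayDeque<>();

    /** Risky: queues the caller's buffer itself, so later changes to it leak into the response. */
    public void queueByReference(byte[] scratch) {
        pending.add(scratch);
    }

    /** Safer: queues a private copy that stays valid until the deferred write happens. */
    public void queueCopy(byte[] scratch) {
        pending.add(Arrays.copyOf(scratch, scratch.length));
    }

    /** Simulates the deferred write by decoding whatever the queued buffer holds now. */
    public String flush() {
        return new String(pending.remove(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        DeferredSend sender = new DeferredSend();

        byte[] scratch = "Data exists for key".getBytes(StandardCharsets.US_ASCII);
        sender.queueByReference(scratch);
        Arrays.fill(scratch, (byte) 0); // the buffer is reused before the write happens
        System.out.println(sender.flush().equals("Data exists for key")); // prints false

        scratch = "Data exists for key".getBytes(StandardCharsets.US_ASCII);
        sender.queueCopy(scratch);
        Arrays.fill(scratch, (byte) 0); // reuse no longer affects the queued copy
        System.out.println(sender.flush().equals("Data exists for key")); // prints true
    }
}
```

In the by-reference case the decoded message consists entirely of NUL characters, which resembles the \u0000 runs noted earlier in this ticket.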
Generated at Thu Nov 27 14:06:12 CST 2014 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.