[MB-7161] Under higher sets per second, clients don't get all view results after observing for all mutations Created: 11/Nov/12  Updated: 19/Nov/12  Resolved: 19/Nov/12

Status: Resolved
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 2.0-beta-2
Fix Version/s: 2.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Farshid Ghods (Inactive) Assignee: Mark Nunberg
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File observe-test.php     File view_results.out    

 Description   
We have just recently found that under load, doing a series of writes, waiting until they're durable, and then querying a view with stale=false gives incorrect results. I should note that this is definitely tied to throughput load. Under minimal load, it works fine. Under a lot of load, we see the incorrect results. This could be a client issue, but I admit that it sounds more like a server side issue at the moment. We're going to try to reproduce it with another client before raising a server issue. It's JCBC-142 at the moment.

 Comments   
Comment by Farshid Ghods (Inactive) [ 11/Nov/12 ]
If possible, please run the stale=false query with debug=true.

More information about running queries with that option:

http://hub.internal.couchbase.com/confluence/display/QA/Debugging+view+engine+issues+and+reporting+them
Comment by Matt Ingenthron [ 11/Nov/12 ]
Please attach your PHP tests where we repro'd this after the JCBC issue, and assign back to Farshid.
Comment by Mark Nunberg [ 12/Nov/12 ]
I've reproduced this issue with both PHP and Java, so it's definitely a server issue.

In a nutshell:

PHP 'test' which fails (really an ad-hoc script to reproduce this on something other than the Java SDK):

http://www.couchbase.com/issues/secure/attachment/15765/observe-test.php

Modified Java tests - they function a lot like the PHP test:

https://github.com/mnunberg/couchbase-java-client/commit/3d788ab9d3a88c1dc20717c4dd110e3a8bb5f5bc


I should note something interesting here:

1) I could not reproduce this when running under a single-node cluster (sometimes the keys failed to endure within the constraints, but the ones which did were always returned by the view).

2) Under a two-node cluster, this happens a lot more. The most interesting note (and what gives it away as a server issue) is this:

About half of the time that the test failed, it failed with the view returning *exactly* half the expected number of rows.

For example, if I set 500 keys (i.e. ensured that they persisted) and then requested the view using stale=false, I would often get back *exactly* 250.

I've reproduced this behavior under both Java and PHP.
Comment by Dipti Borkar [ 13/Nov/12 ]
Farshid, this is a blocker. Has there been any update on this correctness issue?
Comment by Farshid Ghods (Inactive) [ 13/Nov/12 ]
Deep,

can you please reproduce this issue using the steps given by Mark?
Comment by Filipe Manana [ 14/Nov/12 ]
Is someone sure, or does someone have evidence, that this is a view-engine issue? For example, by checking whether things were really persisted to disk before the query?
I can bet a lunch the problem is not there.
Comment by Deepkaran Salooja [ 14/Nov/12 ]
Looking at the code, I think we are not doing "full_set" queries here.

Mark, can you please try the same queries with full_set=true option.

Also, can you give me an idea of how many items you are loading/mutating? I have tried with 100K items.
Comment by Deepkaran Salooja [ 14/Nov/12 ]
Adding server logs here would be helpful.

Mark, please run cbcollect_info on your server node, e.g.:

cd /opt/couchbase/bin/
./cbcollect_info out_file.zip

and attach the out_file.zip here.
Comment by Matt Ingenthron [ 14/Nov/12 ]
@deep: Have you run this?? The view is not a development view, thus full_set is not needed. Even if it were, the results should be deterministic. I would think the first step would be for you to try running it yourself. Is there something you're missing to be able to run this and gather whatever info you need?

@filipe: I won't take your bet. :) I don't know that it is a view-engine issue. In fact, there are some clues that it may not be. Here's what we know thus far:
* The issue is only manifested when there is a large number of items
* The issue only comes up if there is more than one server in the cluster (evidence that it could be in merging)
* The issue has been reproduced by two separate client libraries, verified by SDK QE (Mark) before escalating it

From what I know of our architecture, it could certainly be in ep-engine or couchstore. The client library is getting back (via the observe command) that data has been persisted, but perhaps it hasn't really been persisted? The other thing that could be happening is something at the view merging layer, since it only manifests itself when we have two or more nodes?
Comment by Steve Yen [ 14/Nov/12 ]
to give comments/instructions to Deep, etc., on what info to get next.
Comment by Filipe Manana [ 14/Nov/12 ]
So, to nail it down, there are several things that need to be done:

1) Check that there are no indexing errors due to runtime JavaScript errors (check the mapreduce_errors.1 log file on each node). This is unlikely to be the issue, apparently;

2) To know whether stuff is not persisted to disk after observe unblocks the client, we need to know the IDs of all keys, and in which vbuckets (per node) the keys will end up (by running the hash function, for which there is a sketch at the end of this comment, or just by analyzing the vbucket database files after a few minutes). Adding a new key (document) to a vbucket database file increases its seq number. So at query time, if we pass ?debug=true, we'll get information about the seq numbers of all vbucket database files at the time the query arrived - this will allow us to see if these sequences are lower than what we expected.

Deep, do you think you can help on this? (Someone from the SDKs can also do it, appreciated.)

As soon as I have a list of all keys (and the result of the hash function, to know in which vbucket they end up on each node) and the output of a query with ?debug=true missing some rows, I'll explain here, and add an entry to the wiki [1], how to analyze this issue based on all this information, so that in the future others can do this autonomously.

[1] http://hub.internal.couchbase.com/confluence/display/QA/Debugging+view+engine+issues+and+reporting+them
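
For reference, a minimal Python sketch of the key-to-vbucket mapping mentioned in item 2. It assumes the standard CRC32-based hashing used by the 2.0 client libraries (and that Python's zlib.crc32 matches the client's CRC32 implementation); treat it as a sketch, not the reference implementation.

import zlib

NUM_VBUCKETS = 1024  # default vbucket count on Linux for Couchbase Server 2.0

def vbucket_for_key(key, num_vbuckets=NUM_VBUCKETS):
    # assumed mapping: CRC32 of the key, shifted right 16 bits,
    # masked to 15 bits, then taken modulo the number of vbuckets
    digest = (zlib.crc32(key.encode()) >> 16) & 0x7FFF
    return digest % num_vbuckets

for i in range(3):
    key = "observeseq_%d" % i
    print(key, "->", vbucket_for_key(key))

Cross-checking these ids against the per-vbucket seq numbers in the ?debug=true output tells us which database files should have contained each key.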
Comment by Filipe Manana [ 14/Nov/12 ]
See comment above Deep. Thanks.
Comment by Deepkaran Salooja [ 15/Nov/12 ]
Matt,

I tested yesterday with a similar reproducer I wrote in Python, but couldn't reproduce it except for dev views (without full_set).

I have taken up the Java reproducer by Mark and set it up (with my limited knowledge of Java) as below:

> git clone https://github.com/mnunberg/couchbase-java-client.git
> cd couchbase-java-client
(Modified the SERVER_URI to my server in ViewTest.java)
> ant compile jar
and run the test:
> java -cp .:build/ivy/lib/couchbase-client/common/*:build/jars/* org.junit.runner.JUnitCore com.couchbase.client.ViewTest

The test testObserveWithStaleFalse is always passing on a 2-node (and 4-node) setup (1024 vbuckets, 1 replica). I am using build #1949.
I tried with docAmount set to 500 and 2500, running the same test multiple times.

Please let me know if I am doing anything different/incorrect.
Comment by Matt Ingenthron [ 15/Nov/12 ]
Deep: can you share your environment with Mark?

Mark: can you try to repro there?
Comment by Farshid Ghods (Inactive) [ 15/Nov/12 ]
Mark,

As part of your testing, can you also grab the query results by appending ?debug=true?
>> comments from Filipe <<
As soon as I have a list of all keys (and the result of the hash function to know in which vbucket they end up on each node) and the output of a query with ?debug=true missing some rows,

Also, when you say you ran a process to consume lots of CPU, is this on the same node as the one Couchbase Server is running on?
Comment by Matt Ingenthron [ 15/Nov/12 ]
Mark may need logins to the servers. Can you help there Farshid?
Comment by Mark Nunberg [ 15/Nov/12 ]
The script has been modified to dump the entire view output (with debug=true) into a file called 'view_results.out' in the same directory.

Yes, the cpuburn process is running on one of the cluster nodes (which also happens to be the one running the client).

I don't need SSH access to those other machines. I've worked around this by adding the current machine (10.3.2.100) to the cluster, making it a three-node cluster.

The server version for all those nodes is the same (1945 or such).
Comment by Farshid Ghods (Inactive) [ 15/Nov/12 ]
Thanks, Mark. If you were able to replicate this issue, did you happen to capture the view_results.out?


Also, starting another CPU-intensive process, or any other process that eats up I/O, on a Couchbase cluster which is doing indexing or other work is not recommended. At this small scale, however, if enough CPU is left for the server, the indexing should happen right after persistence.

Comment by Farshid Ghods (Inactive) [ 15/Nov/12 ]
upload view results here: http://s3.amazonaws.com/bugdb/jira/MB-7161/view-results.out
Comment by Filipe Manana [ 15/Nov/12 ]
Sorry, but can you get the raw JSON?

It's a lot easier that way, as it makes it easier to paste long arrays into a JavaScript interpreter, or Python, or even Erlang.

From what I see, the view engine didn't miss anything; that is, it indexed everything that was persisted in the vbucket database files.

Here's how I see it:

In the debug_info object, you have an entry for each node. That entry is an object. One of its keys is "wanted_seqs". This corresponds to the seq number of each vbucket database file (active vbuckets) at the time the query request was received.
Then another field (again, per node) is named "indexable_seqs". This field tells us the seq number of each vbucket database file (active vbuckets) after the updater ran (for stale=false queries).

As we can see (by eye), for each node, indexable_seqs == wanted_seqs. This means the view engine didn't miss any persisted changes to the database files.

If the output were in JSON, we could just paste these hash objects into a JavaScript console (or Erlang) and, with no need for eyeballing, compare the seq numbers directly via == or >=.
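
As a rough illustration, here is a minimal Python sketch of that comparison, assuming the ?debug=true response was saved as raw JSON (e.g. in view_results.out) with the debug_info layout described above (per-node objects containing "wanted_seqs" and "indexable_seqs" maps of vbucket id to seq number):

import json

with open("view_results.out") as f:          # raw JSON view response
    result = json.load(f)

for node, info in result["debug_info"].items():
    if not isinstance(info, dict):
        continue                             # skip any non-node entries
    wanted = info.get("wanted_seqs", {})
    indexable = info.get("indexable_seqs", {})
    for vb, want in wanted.items():
        have = indexable.get(vb, 0)
        if have < want:
            print("node %s, vbucket %s: indexable %s < wanted %s"
                  % (node, vb, have, want))

Any line printed would mean the updater indexed less than what was on disk when the query arrived; no output matches the eyeball conclusion above.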

What I suggest, after talking with Farshid, is to send ep-engine the "stop/pause persistence" command to all nodes right after the observe command returns, and then use couch_dbdump against all database files to find out / confirm which documents are missing.
Comment by Deepkaran Salooja [ 16/Nov/12 ]
I was able to reproduce this with the PHP reproducer, with 5 instances of burnP6 running on 1 node (starving Erlang and memcached).
We have some rows missing in the view output. I picked one of the missing rows and checked that it is indeed present in the vbucket database file.

e.g. one of the missing keys, observeseq_1, exists in 145.couch.1:

root@ubuntu1104-64:/opt/couchbase/var/lib/couchbase/data/default# ../../../../../bin/couch_dbdump 145.couch.1
Doc seq: 27
     id: observeseq_1
     rev: 27
     content_meta: 128
     cas: 5032553164630947, expiry: 0, flags: 0
     data: (snappy) {"php_value":"observeseq_1"}

Total docs: 1

For 145.couch.1 the Doc seq is 27, but in the view query debug output for vbucket 145, we have:
indexable_seqs - 26
wanted_seqs - 26

So that means that when the view query request was made, the doc seq was 26, and it later got updated to 27.

If I do a stale=false query after this, I get back the missing rows, i.e. in this case doc seq 27 is now picked up from vbucket 145.

So somewhere, due to slowness of the system, things go out of sync: observe says the key has been persisted and the client code issues a view query, but the view engine doesn't yet see the updated doc seq.

I modified the PHP reproducer to output the query debug information as JSON. The modified code and the JSON output are attached.
Comment by Filipe Manana [ 16/Nov/12 ]
Thanks Deep, excellent analysis.

So my gut feeling is that observe replies to the client after stuff is written to the database file, but *before* ep-engine/couchkvstore notifies the Erlang side that the vbucket database has a new header/snapshot.

Will confirm with Chiyoung/Mike if this is the case.
Comment by Mike Wiederhold [ 16/Nov/12 ]
We do not respond until after the notification takes place so I don't think that would be the issue. We can discuss more offline.
Comment by Filipe Manana [ 16/Nov/12 ]
See comment/question above Chiyoung/Mike.
Comment by Steve Yen [ 16/Nov/12 ]
bug-scrub mtg:

- feels like a race with mccouch notification / processing. The design, though, is synchronous (?).

- this scenario also artificially saturates server CPU ("cpuburn" ?).

- need to understand the cost / risk of a fix.
Comment by Filipe Manana [ 16/Nov/12 ]
Deep, this might have been a race/timing issue.

Did you do the couch_dbdump run right after observe returned and before the query? (My guess is not, unless you have Flash powers.)

Discussing with Chiyoung: without code changes to debug this issue, we can't reliably find out the source of the problem. My idea is to make a change (not to be committed/submitted to Gerrit) that disables persistence (on all nodes) right *before observe returns and after writing the stuff to disk and synchronously notifying mccouch*.
Comment by Deepkaran Salooja [ 17/Nov/12 ]
Filipe, couch_dbdump was done after the stale=false view query, so that's in steady state, long after the test finished.
Comment by Chiyoung Seo [ 17/Nov/12 ]
Deep,

I ran "observe-test.php" with 5 processes of burnP6 on a single node many times, and got the following output:

chiyoung@ubu-1706:~$ php observe-test.php
Have ID observe
Ooops.. key observeseq_0 didn't persist
Ooops.. key observeseq_2 didn't persist
Ooops.. key observeseq_3 didn't persist
...
Ooops.. key observeseq_297 didn't persist
Ooops.. key observeseq_298 didn't persist
Ooops.. key observeseq_299 didn't persist
Found key observeseq_48, but we didn't manage to persist it
Found key observeseq_82, but we didn't manage to persist it
Found key observeseq_87, but we didn't manage to persist it
Found key observeseq_94, but we didn't manage to persist it
Found key observeseq_100, but we didn't manage to persist it
Found key observeseq_105, but we didn't manage to persist it
Found key observeseq_113, but we didn't manage to persist it
Found key observeseq_140, but we didn't manage to persist it
Found key observeseq_145, but we didn't manage to persist it
Found key observeseq_156, but we didn't manage to persist it
Found key observeseq_199, but we didn't manage to persist it
Found key observeseq_225, but we didn't manage to persist it
Found key observeseq_233, but we didn't manage to persist it
Found key observeseq_236, but we didn't manage to persist it
Found key observeseq_260, but we didn't manage to persist it
Found key observeseq_273, but we didn't manage to persist it

Total keys that should have been deleted, but weren't: 0
Total rows: 284


I looked at the PHP test code and saw the following code snippet:

# Ensure that they've all persisted (sometimes not all of them will).
$status = $cb->keyDurabilityMulti($casvals, array("persist_to" => 2, "replicate_to" => 1));

I guess that "keyDurabilityMulti()" doesn't wait until all the keys are persisted. If this is true, I don't see anything wrong from my tests.

Comment by Deepkaran Salooja [ 17/Nov/12 ]
You are right, Chiyoung: keyDurabilityMulti doesn't wait until all the keys are persisted. But it returns a status array from which the test figures out which keys were persisted and which weren't. So in the view query output, the test expects only those keys which were persisted.

$didnt_persist = array();
foreach ($status as $k => $v) {
    # For each of those keys which did not yet persist, store them in
    # a lookup. These are the keys we don't expect to show up in our view
    if (!$v) {
        printf("Ooops.. key %s didn't persist\n", $k);
        $didnt_persist[$k] = true;
    }
}

The problematic case is output like the below:


Have ID observe
Hrrm.. value observeseq_1 still has key {"php_value":"observeseq_1"}
Hrrm.. value observeseq_4 still has key {"php_value":"observeseq_4"}
Hrrm.. value observeseq_8 still has key {"php_value":"observeseq_8"}
Hrrm.. value observeseq_11 still has key {"php_value":"observeseq_11"}
Hrrm.. value observeseq_14 still has key {"php_value":"observeseq_14"}
......
Hrrm.. value observeseq_18 still has key {"php_value":"observeseq_18"}
Hrrm.. value observeseq_20 still has key {"php_value":"observeseq_20"}
Hrrm.. value observeseq_25 still has key {"php_value":"observeseq_25"}
Hrrm.. value observeseq_29 still has key {"php_value":"observeseq_29"}
Hrrm.. value observeseq_33 still has key {"php_value":"observeseq_33"}

Total keys that should have been deleted, but weren't: 94
Total rows: 206

This output is generated when the test detects a key which was reported as persisted but didn't show up in the view query results (all the keys returned by the view query are deleted, so anything left over was missed by the view).

With a single node, I wasn't able to repro this error. I have a 3-node cluster; on one of the nodes I am running 5 burnP6 instances, and on the same node I run the PHP code.
Comment by Chiyoung Seo [ 17/Nov/12 ]
Thanks for your quick response.

I set up a three-node cluster (10.5.2.35, 10.5.2.36, 10.5.2.37) and was running 5 burnP6 instances and the PHP client on 10.5.2.37. However, I haven't seen any issues in 100 runs yet:

chiyoung@ubu-1706:~$ ps aux | grep burn
root 21640 73.0 0.0 100 16 pts/0 R 23:16 9:25 burnP6
root 21641 75.5 0.0 100 16 pts/0 R 23:16 9:44 burnP6
root 21642 81.2 0.0 100 12 pts/0 R 23:16 10:29 burnP6
root 21643 74.9 0.0 100 12 pts/0 R 23:16 9:40 burnP6
root 21644 80.1 0.0 100 16 pts/0 R 23:16 10:20 burnP6

chiyoung@ubu-1706:~$ ./observe-test.sh > result.txt
chiyoung@ubu-1706:~$
chiyoung@ubu-1706:~$
chiyoung@ubu-1706:~$ cat result.txt | grep "Total keys that should have been deleted"
Total keys that should have been deleted, but weren't: 0
Total keys that should have been deleted, but weren't: 0
...
Total keys that should have been deleted, but weren't: 0
Total keys that should have been deleted, but weren't: 0
Total keys that should have been deleted, but weren't: 0


Can you give me access to your cluster? I can take a closer look tomorrow.

Comment by Chiyoung Seo [ 17/Nov/12 ]
Okay, I saw the same issue after iterating the test 1000 times:

Hrrm.. value observeseq_0 still has key {"php_value":"observeseq_0"}
Hrrm.. value observeseq_10 still has key {"php_value":"observeseq_10"}
Total keys that should have been deleted, but weren't: 2
Total rows: 292
Comment by Matt Ingenthron [ 19/Nov/12 ]
Unfortunately, adding a delay is not a workaround for the lack of correctness. How long do we delay? We'll never know. One can't tell by the results in a real deployment since there'd be other actors in the system.
Comment by Farshid Ghods (Inactive) [ 19/Nov/12 ]
Deep has been working on reproducing this deterministically under a higher mutation rate, and he is going to update the ticket shortly.
Comment by Deepkaran Salooja [ 19/Nov/12 ]
This can also be reproduced by running the PHP test while a data load is in progress (20k items, 500 sets/sec) on 1 node, without any CPU burning.
It is much more consistently reproducible this way.
E.g., on a 3-node cluster, start the load on a node using mcsoda (no CPU burning):
./lib/perf_engines/mcsoda.py 10.3.121.100 vbuckets=1024 doc-gen=0 doc-cache=0 ratio-creates=1 ratio-sets=1 ratio-misses=0 min-value-size=256,512 max-items=20000 exit-after-creates=1 prefix=aone

Run the test while load is in progress and the same problem happens:

root@ubuntu1104-64:~/php-tests# ./runany.pl misc/observe-test.php
php misc/observe-test.php
Have ID observe
Hrrm.. value observeseq_8 still has key {"php_value":"observeseq_8"}
Hrrm.. value observeseq_18 still has key {"php_value":"observeseq_18"}
Hrrm.. value observeseq_39 still has key {"php_value":"observeseq_39"}
Hrrm.. value observeseq_58 still has key {"php_value":"observeseq_58"}
Hrrm.. value observeseq_81 still has key {"php_value":"observeseq_81"}
Hrrm.. value observeseq_82 still has key {"php_value":"observeseq_82"}
Hrrm.. value observeseq_91 still has key {"php_value":"observeseq_91"}
Hrrm.. value observeseq_92 still has key {"php_value":"observeseq_92"}
Hrrm.. value observeseq_94 still has key {"php_value":"observeseq_94"}
Hrrm.. value observeseq_97 still has key {"php_value":"observeseq_97"}
....
Hrrm.. value observeseq_265 still has key {"php_value":"observeseq_265"}
Hrrm.. value observeseq_266 still has key {"php_value":"observeseq_266"}
Hrrm.. value observeseq_270 still has key {"php_value":"observeseq_270"}
Hrrm.. value observeseq_275 still has key {"php_value":"observeseq_275"}
Hrrm.. value observeseq_276 still has key {"php_value":"observeseq_276"}
Hrrm.. value observeseq_288 still has key {"php_value":"observeseq_288"}
Total keys that should have been deleted, but weren't: 46
Total rows: 1662
Comment by Farshid Ghods (Inactive) [ 19/Nov/12 ]
Deep,

do you have a Python test which can be run with a cluster_run configuration to reproduce this issue?
Comment by Steve Yen [ 19/Nov/12 ]
Looks like there is no intra-server-component race.

Chiyoung provided an explanation of the test case (observe with persist_to=2 / 1 replica) and how the client mistakenly thinks the active server has persisted.
Comment by Steve Yen [ 19/Nov/12 ]
bug-scrub - down to critical, as it's no longer a server internal issue.
Comment by Chiyoung Seo [ 19/Nov/12 ]
I set up the three-node cluster (10.5.2.35, 10.5.2.36, 10.5.2.37) and ran 5 instances of burnP6 on 10.5.2.37.

I ran "observe-test.php" and got the following ep-engine logs when the test failed:

1) From 10.5.2.37 where active vbucket 764 is placed:

Mon Nov 19 10:54:15.275393 PST 3: PersistenceCallback: Set the seq no 345 for key 'observeseq_228' and vbucket 764: seq no 345 from DB

2) From 10.5.2.35 where replica vbucket 764 is placed:

Mon Nov 19 10:54:14.694152 PST 3: PersistenceCallback: Set the seq no 198 for key 'observeseq_228' and vbucket 764
Mon Nov 19 10:54:14.722671 PST 3: OBSERVE: key 'observeseq_228' and vbucket 764 was persisted

3) From the view debug output:

"wanted_seqs":
"0764" : 344

From the test, I confirmed that the PHP client gets a response indicating that "observeseq_228" was persisted in vbucket 764, but it was actually persisted only in the replica vbucket on 10.5.2.35. However, the view indexer looks at the active vbucket 764 on 10.5.2.37, where the item wasn't persisted yet.

It seems to me that this is a client library issue (especially libcouchbase).
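
For anyone who wants to double-check which node should hold the active copy of a given key, here is a minimal Python sketch. It assumes the usual bucket config layout returned by the REST API (a vBucketServerMap with serverList and vBucketMap, where vBucketMap[vb][0] indexes the active node) and the CRC32-based key hashing; the host below is just one of the nodes from this ticket.

import json
import urllib.request
import zlib

def active_node_for_key(config, key):
    # vBucketMap[vb] lists indexes into serverList; entry 0 is the active copy
    vbmap = config["vBucketServerMap"]
    num_vb = len(vbmap["vBucketMap"])
    vb = ((zlib.crc32(key.encode()) >> 16) & 0x7FFF) % num_vb
    return vb, vbmap["serverList"][vbmap["vBucketMap"][vb][0]]

url = "http://10.5.2.37:8091/pools/default/buckets/default"  # node from this ticket
config = json.load(urllib.request.urlopen(url))
print(active_node_for_key(config, "observeseq_228"))

If persistence was only ever acknowledged by a node other than the one printed here, a stale=false query can legitimately miss the key, since the indexer only reads the active vbucket's database file.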
Comment by Mark Nunberg [ 19/Nov/12 ]
I haven't been able to determine any specific client bug as far as libcouchbase is concerned with respect to observe.

There are some things that should be noted, though.

The observe as implemented in libcouchbase is a simple low-level observe; that is, libcouchbase will effectively "broadcast" requests to the relevant nodes, and will invoke a php-ext level callback for each observe response.

Then the php-ext will go ahead and count the number of times those callbacks were received with a successful response (i.e. it was persisted and exists with the user-specified CAS).

This logic is implemented here:
https://github.com/couchbase/php-ext-couchbase/blob/ac35571a79c87281ea8fe3f03f89b7f8316ae71c/observe.c#L489

Basically, it sits and waits until the durability requirements have been satisfied.
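
As a rough conceptual model (not the actual observe.c code), the poll-until-durable loop looks like the Python sketch below. The transport is abstracted behind a hypothetical observe_fn callable that yields one (node, status, cas) tuple per node hosting the key's vbucket; master_node stands in for the vbucket-map lookup.

import time

OBS_FOUND = 0x00       # key found in memory, not yet persisted
OBS_PERSISTED = 0x01   # key found and persisted to disk

def wait_for_durability(observe_fn, master_node, cas,
                        persist_to=1, replicate_to=0,
                        timeout=5.0, interval=0.05):
    deadline = time.time() + timeout
    while time.time() < deadline:
        persisted = replicated = 0
        for node, status, obs_cas in observe_fn():
            if obs_cas != cas:             # a newer mutation replaced ours
                continue
            if status == OBS_PERSISTED:
                persisted += 1
            if node != master_node:        # a copy on a replica counts as replicated
                replicated += 1
        if persisted >= persist_to and replicated >= replicate_to:
            return True
        time.sleep(interval)
    return False

The important detail for this ticket is that only OBS_PERSISTED responses should count toward persist_to; counting any successful OBSERVE callback is exactly the kind of criteria mix-up discussed further down.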

Despite the PHP client lacking any kind of 'master-first' logic, my initial tests were implemented with a bucket employing a single replica. The test script requests persistence to two nodes; considering the fact that there are only two nodes for any given vbucket, a persistence requirement of two effectively includes waiting for the item to be persisted to the master.

The script may of course be modified to persist to an arbitrary number of nodes (if the count is too high, the PHP client will warn and cap the number to the maximum available: persist_to <= (num_replicas + 1) <= 4).

I've modified observe.c in the php ext (on 10.3.121.100) to print out keys as they receive positive responses from the master, and indeed I can reproduce this issue when such responses are received.

I'll actually go ahead now and dump the packets from libcouchbase to the screen as well (i.e. a hex dump). Hopefully this will remove any confusion about this.

Now, as far as the keys not being replicated to the master go: the php-ext only knows what libcouchbase tells it, and libcouchbase only knows what it receives on the memcached connection.

Unless libcouchbase is doing something seriously wrong with the protocol handling, I'm guessing the problem isn't there.
Comment by Mike Wiederhold [ 19/Nov/12 ]
From the logs:

Ooops.. key observeseq_599 didn't persist

[libcouchbase] observe_response_handler:459 Response from 10.3.3.95
MAGIC=RES OP=92 STATUS=SUCCESS KLEN=0 EXTLEN=0 NBODY=27 OPAQUE=4a9 CAS=4020000
Body:
[0000] 03 40 00 0E 6F 62 73 65 72 76 65 73 65 71 5F 35 ....obse rveseq.5
[0010] 39 39 00 00 1B 84 08 DD EC B6 E7 99...... ...
Key observeseq_599 persisted to master..


The key status in this message is 0x00, which corresponds to "Found, Not Persisted", but the log message reports that the item was persisted to the master. Can you look at the libcouchbase implementation to see if this is the possible error?
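
For reference, here is a minimal Python sketch that decodes an OBSERVE (opcode 0x92) response body like the dump above. It assumes the per-key layout vbucket(2) | keylen(2) | key | keystate(1) | cas(8), big-endian, which matches the byte arithmetic of this dump (2 + 2 + 14 + 1 + 8 = 27 = NBODY), and the usual key states (0x00 found/not persisted, 0x01 persisted, 0x80 not found).

import struct

KEY_STATES = {0x00: "found, not persisted",
              0x01: "found, persisted",
              0x80: "not found"}

def decode_observe_body(body):
    entries, offset = [], 0
    while offset < len(body):
        vbucket, keylen = struct.unpack_from(">HH", body, offset)
        offset += 4
        key = body[offset:offset + keylen].decode()
        offset += keylen
        state = body[offset]
        offset += 1
        (cas,) = struct.unpack_from(">Q", body, offset)
        offset += 8
        entries.append((vbucket, key, KEY_STATES.get(state, hex(state)), cas))
    return entries

# The 27 body bytes from the hex dump above.
body = bytes.fromhex("0340000e6f6273657276657365715f353939"
                     "00001b8408ddecb6e7")
print(decode_observe_body(body))

This prints "found, not persisted" for observeseq_599, matching the analysis above.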
Comment by Mike Wiederhold [ 19/Nov/12 ]
Please see my comment.
Comment by Mark Nunberg [ 19/Nov/12 ]
You are correct; the logging statement was placed at the wrong location (after *any* success status, rather than specifically a persisted one), and the persistence check in the php extension code checked against the wrong criteria (typo).

I'm not sure how the Java client failed, but I can't reproduce it with PHP anymore in light of those items which I've fixed up.

I've filed a bug in PHP for this (PCBC-148, PCBC-149), and there's already JCBC-142 (from whence the bug originated).

Additionally, means to test this (be it via the mock, or via the cluster) should be available. In reality there's usually a sub-millisecond time difference between having a key persisted and having it replicated.

So it seems that, thanks to our extra debugging, it's no longer a server issue.
Comment by Mike Wiederhold [ 19/Nov/12 ]
Thanks for following up on this; I'm glad we have a fix. The Java client is a separate issue that I will take care of tomorrow. Please close this issue as invalid if there is no server fix needed. If you have any other questions, let me know.
Comment by Chiyoung Seo [ 19/Nov/12 ]
I think we have enough evidence indicating that this issue was NOT caused by the server engine side. Separate bugs were created for the client sides.
Comment by Matt Ingenthron [ 19/Nov/12 ]
Great to know we can put this issue to rest. I'm glad we resolved where it was.

Mike: thanks for stepping through it and reviewing everything Mark submitted. It surely helped to get everyone looking at the same thing.

Mark: thanks for repro'ing the Java issue and then continuing to iterate, providing enough info that the issue could be definitively identified. Also, great follow-up in filing the related issues.