[JCBC-134] resubscriber IllegalArgumentException during topology changes Created: 19/Oct/12  Updated: 23/Jan/13  Resolved: 23/Jan/13

Status: Resolved
Project: Couchbase Java Client
Component/s: Core
Affects Version/s: 1.0, 1.1.1
Fix Version/s: 1.1.1
Security Level: Public

Type: Bug Priority: Critical
Reporter: Mark Nunberg Assignee: Michael Nitschinger
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Got this during a swap-rebalance. I don't have any other relevant information to reproduce it for now (FWIW, I've run this many times and this is the first time I've seen it, though I saw it again on the very next run).

During the second run, the cluster rebalance is actually hanging.

Attachments: Text File daschl-4-rebealance_two_nodes.log     Text File daschl-4-restart.log     Zip Archive junit.zip     File log2.txt.bz2     File log2.txt.bz2    
Issue Links:
Dependency

 Description   
Exception in thread "couchbase cluster resubscriber - running" java.lang.IllegalArgumentException: Bucket name cannot be null and must never be re-set to a new object.
        at com.couchbase.client.vbucket.ConfigurationProviderHTTP.subscribe(ConfigurationProviderHTTP.java:240)
        at com.couchbase.client.vbucket.ConfigurationProviderHTTP.finishResubscribe(ConfigurationProviderHTTP.java:215)
        at com.couchbase.client.CouchbaseConnectionFactory$Resubscriber.run(CouchbaseConnectionFactory.java:322)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:679)


Unfortunately I don't have much more insight into what's happening, but the stack trace might be helpful to examine. Assigning to myself until I have more info.

 Comments   
Comment by Michael Nitschinger [ 08/Nov/12 ]
Hi Mark,

Can you elaborate a bit more on what's going on during the test? This particular exception can come up when the Java SDK tries to subscribe to a new node when the old connection is closed. I think the test scenario should give us a connection to what is happening at runtime in the Java SDK.

Thanks,
Michael
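
To make the failure mode described above concrete, here is a minimal sketch of how a guard of this kind could trip during a resubscribe. It is not the actual ConfigurationProviderHTTP code: the class ResubscribeSketch, the nested SketchProvider, the field subscribedBucket, and the helper tryResubscribe are all hypothetical names, and the guard condition is only inferred from the exception message in the description. Whether a null or changed bucket name is the real trigger in this bug is an assumption.

```java
/**
 * Hypothetical sketch, NOT the real ConfigurationProviderHTTP.
 * The guard below is inferred from the exception message in the report.
 */
public class ResubscribeSketch {

    /** Stand-in for a configuration provider that remembers its subscribed bucket. */
    static class SketchProvider {
        private String subscribedBucket; // set by the first successful subscribe

        void subscribe(String bucketName) {
            // Assumed guard: reject a null name, or a name that differs from
            // the one this provider is already subscribed to.
            if (bucketName == null
                    || (subscribedBucket != null && !subscribedBucket.equals(bucketName))) {
                throw new IllegalArgumentException(
                    "Bucket name cannot be null and must never be re-set to a new object.");
            }
            subscribedBucket = bucketName;
        }

        /** Resubscribes with whatever bucket name the caller currently holds. */
        void finishResubscribe(String currentBucketName) {
            subscribe(currentBucketName);
        }
    }

    /**
     * Runs one resubscribe attempt. Returns null on success, or the exception
     * message on failure, so the calling worker thread is not killed by the
     * unchecked exception.
     */
    static String tryResubscribe(SketchProvider provider, String bucketName) {
        try {
            provider.finishResubscribe(bucketName);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        SketchProvider provider = new SketchProvider();
        provider.subscribe("default");

        // Resubscribing with the same bucket name passes the guard.
        System.out.println("same name: " + tryResubscribe(provider, "default"));

        // If the name is lost while the old connection is torn down
        // (an assumed scenario), the guard rejects the call.
        System.out.println("null name: " + tryResubscribe(provider, null));
    }
}
```

In the stack trace from the description, the exception escapes Resubscriber.run and ends the "couchbase cluster resubscriber" thread. Catching it at the call site, as tryResubscribe does here, is one possible way to keep such a thread alive for a retry; it does not address why the bucket name would be invalid in the first place.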
Comment by Mark Nunberg [ 12/Dec/12 ]
I've run the Java SDKD tests several times already and cannot reproduce this. Will re-open it if I see it again.
Comment by Mark Nunberg [ 21/Jan/13 ]
Seen again at:

http://review.couchbase.org/#/c/24092/
Comment by Mark Nunberg [ 21/Jan/13 ]
So as I mentioned in the bug, I closed it because I hadn't seen this error again. Just now, both Michael and I encountered this error while running the SDKD tests.
Comment by Michael Nitschinger [ 22/Jan/13 ]
Attaching the logs from some test runs of http://review.couchbase.org/#/c/24092 changeset 4 (prefixed with daschl-4-).

Deepti is currently running the changeset against the brun cluster, expect some info in 2 hours.
Comment by Michael Nitschinger [ 22/Jan/13 ]
./stester -i 20devcluster.ini --service ALL --svcaction RESTART --num_nodes 3 --no_fo 1 -c failover.Once --dsw_timeres 1 -d -o restart.log -C 127.0.0.1:8050
Comment by Michael Nitschinger [ 22/Jan/13 ]
./stester -i 20devcluster.ini -c rebalance.Once --mode out --rbcount 2 --dsw_timeres 1 -d -o rebealance_two_nodes.log -C 127.0.0.1:8050
Comment by Deepti Dawar [ 22/Jan/13 ]
Attaching the functional test results.
This was run against a local 2.0.0 node.
The pass rate is better this time: 92%.
Comment by Deepti Dawar [ 22/Jan/13 ]
For the Hybrid tests, failures are still occurring.

Attaching the intermittent log.
Comment by Deepti Dawar [ 22/Jan/13 ]
The error that seems to be problematic in the unit test logs is this one:

'Timeout occurred. Please note the time in the report does not reflect the time until the timeout.
junit.framework.AssertionFailedError: Timeout occurred. Please note the time in the report does not reflect the time until the timeout.'

Most of the failures are due to timeouts.

Note: these tests were run against a local cluster, so such problems should not be occurring.
Comment by Michael Nitschinger [ 23/Jan/13 ]
Merged in today, right before the 1.1.1 release.
Generated at Wed Sep 17 19:25:13 CDT 2014 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.