[MB-7661] [Done, RN 2.1.0] View query on Rebalance.Out fails with Reason: A view spec can not consist of merges exclusively. Created: 01/Feb/13  Updated: 13/Aug/13  Resolved: 08/Aug/13

Status: Resolved
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.0
Fix Version/s: 2.2.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Michael Nitschinger Assignee: Andrei Baranouski
Resolution: Fixed Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Four 2.0.0 machines, running Linux.

Attachments: Text File 192.168.56.105-Rebalance-Out.log    
Issue Links:
Dependency
blocks JCBC-189 Views having odd timeout issues on so... Closed
Duplicate
is duplicated by MB-6922 ns_server should send 302 reply when ... Closed

 Description   
I've been running SDKQE stester runs for the Java SDK, specifically testing the following scenario:

/stester -C 127.0.0.1:8050 -i 20devcluster.ini -c rebalance.Once --vdsw_dvname ddoc/vquery --rbcount 2 --dsw_timeres 1 --hdsw_mc_threads 10 --workload dsw.Hybrid --mode out --hdsw_http_threads 5 --hdsw_cb_threads 10 --rebound 90 -d -o rebalance-once.log

My cluster is a 4 node cluster, and this scenario rebalances 2 nodes out of the cluster. During the rebalance, view queries are issued.

During this rebalance (I assume towards the end of it), I get the following exception:

java.lang.RuntimeException: Failed to access the view
at com.couchbase.client.CouchbaseClient.query(CouchbaseClient.java:838)
at com.couchbase.sdkd.cbclient.ViewQueryCommandContext.execIter(ViewQueryCommandContext.java:252)
at com.couchbase.sdkd.cbclient.CommandContext.execute(CommandContext.java:311)
at com.couchbase.sdkd.server.SdkServer.executeCommand(SdkServer.java:135)
at com.couchbase.sdkd.server.SdkServer.handleRequest(SdkServer.java:156)
at com.couchbase.sdkd.server.SdkServer.run(SdkServer.java:212)
Caused by: java.util.concurrent.ExecutionException: OperationException: SERVER: error Reason: A view spec can not consist of merges exclusively.
at com.couchbase.client.internal.HttpFuture.waitForAndCheckOperation(HttpFuture.java:89)
at com.couchbase.client.internal.HttpFuture.get(HttpFuture.java:73)
at com.couchbase.client.internal.HttpFuture.get(HttpFuture.java:63)
at com.couchbase.client.CouchbaseClient.query(CouchbaseClient.java:834)
... 5 more
Caused by: OperationException: SERVER: error Reason: A view spec can not consist of merges exclusively.
at com.couchbase.client.protocol.views.NoDocsOperationImpl.parseError(NoDocsOperationImpl.java:106)
at com.couchbase.client.protocol.views.ViewOperationImpl.handleResponse(ViewOperationImpl.java:68)
at com.couchbase.client.ViewNode$MyHttpRequestExecutionHandler.handleResponse(ViewNode.java:199)
at org.apache.http.nio.protocol.AsyncNHttpClientHandler.processResponse(AsyncNHttpClientHandler.java:417)
at org.apache.http.nio.protocol.AsyncNHttpClientHandler.inputReady(AsyncNHttpClientHandler.java:242)
at com.couchbase.client.http.AsyncConnectionManager$ManagedClientHandler.inputReady(AsyncConnectionManager.java:244)
at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:172)
at org.apache.http.impl.nio.DefaultClientIOEventDispatch.inputReady(DefaultClientIOEventDispatch.java:155)
at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:161)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:335)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:275)
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:542)
at java.lang.Thread.run(Thread.java:680)
Feb 1, 2013 1:04:30 PM com.couchbase.sdkd.cbclient.CommandResult warnAbout
WARNING: Unknown exception encountered (for operation) future warnings will be suppressed
java.lang.RuntimeException: Failed to access the view
at com.couchbase.client.CouchbaseClient.query(CouchbaseClient.java:838)
at com.couchbase.sdkd.cbclient.ViewQueryCommandContext.execIter(ViewQueryCommandContext.java:252)
at com.couchbase.sdkd.cbclient.CommandContext.execute(CommandContext.java:311)
at com.couchbase.sdkd.server.SdkServer.executeCommand(SdkServer.java:135)
at com.couchbase.sdkd.server.SdkServer.handleRequest(SdkServer.java:156)
at com.couchbase.sdkd.server.SdkServer.run(SdkServer.java:212)
Caused by: java.util.concurrent.ExecutionException: OperationException: SERVER: no_active_vbuckets Reason: Cannot execute view query since the node has no active vbuckets
at com.couchbase.client.internal.HttpFuture.waitForAndCheckOperation(HttpFuture.java:89)
at com.couchbase.client.internal.HttpFuture.get(HttpFuture.java:73)
at com.couchbase.client.internal.HttpFuture.get(HttpFuture.java:63)
at com.couchbase.client.CouchbaseClient.query(CouchbaseClient.java:834)
... 5 more
Caused by: OperationException: SERVER: no_active_vbuckets Reason: Cannot execute view query since the node has no active vbuckets
at com.couchbase.client.protocol.views.NoDocsOperationImpl.parseError(NoDocsOperationImpl.java:106)
at com.couchbase.client.protocol.views.ViewOperationImpl.handleResponse(ViewOperationImpl.java:68)
at com.couchbase.client.ViewNode$MyHttpRequestExecutionHandler.handleResponse(ViewNode.java:199)
at org.apache.http.nio.protocol.AsyncNHttpClientHandler.processResponse(AsyncNHttpClientHandler.java:417)
at org.apache.http.nio.protocol.AsyncNHttpClientHandler.inputReady(AsyncNHttpClientHandler.java:242)
at com.couchbase.client.http.AsyncConnectionManager$ManagedClientHandler.inputReady(AsyncConnectionManager.java:244)
at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:172)
at org.apache.http.impl.nio.DefaultClientIOEventDispatch.inputReady(DefaultClientIOEventDispatch.java:155)
at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:161)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:335)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:275)
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:542)

Note that the Java details are not of particular interest for this ticket, but the error messages are. First, I get

error Reason: A view spec can not consist of merges exclusively.

and then

no_active_vbuckets Reason: Cannot execute view query since the node has no active vbuckets

The Java client currently removes a ViewNode during rebalance only when it vanishes from the couchNodes list. I'll attach a rebalance log from one of the nodes being rebalanced out; as expected, the node leaves the list only at the very last step, so it is still in the list after it no longer has any vbuckets.
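To make that window concrete, here is a hypothetical model of the behavior described above (ViewNodeList, onNewConfig and pick are illustrative names, not the SDK's ViewNode code). It assumes the node list is synced from the couchNodes list carried by each new cluster configuration: a node being rebalanced out stays a query target until a configuration that omits it arrives, even if it has already lost its last vbucket.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the client-side node bookkeeping; not SDK code.
final class ViewNodeList {
    private final Set<String> nodes = new LinkedHashSet<>(); // keeps config order
    private int next = 0;

    /** Called with the couchNodes list from each new cluster configuration. */
    synchronized void onNewConfig(List<String> couchNodes) {
        nodes.retainAll(couchNodes); // drop nodes that vanished from couchNodes
        nodes.addAll(couchNodes);    // add nodes that appeared
    }

    /** Round-robin choice of the node for the next view query. */
    synchronized String pick() {
        List<String> snapshot = new ArrayList<>(nodes);
        if (snapshot.isEmpty()) {
            throw new IllegalStateException("no view nodes");
        }
        String node = snapshot.get(Math.floorMod(next, snapshot.size()));
        next++;
        return node;
    }
}
```

Between the moment .104 hands off its last vbucket and the configuration that removes it, pick() will still return .104, which is when the errors above appear.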

Should the server not be able to handle view requests, even when it has no vbuckets attached?

Thanks,
Michael


 Comments   
Comment by Michael Nitschinger [ 01/Feb/13 ]
For reference, the nodes are 192.168.56.101 through .104; .103 and .104 get removed. The log is from .104.
Comment by Aleksey Kondratenko [ 01/Feb/13 ]
This is a known problem; a duplicate ticket should exist somewhere. We cannot easily fix it. Perhaps the best thing we can do is send back a redirect to a node that is likely to work.
Comment by Matt Ingenthron [ 01/Feb/13 ]
Alk: but do we currently redirect, or do we currently error? It looks like we currently error.
Comment by Aleksey Kondratenko [ 01/Feb/13 ]
No. We don't redirect AFAIK.

You can try to mitigate this on your side, either by trying a different node on any error or by detecting this particular error.

I cannot yet promise a particular release in which we'll start redirecting.
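The second mitigation, detecting this particular error and moving on to another node, might look like the minimal sketch below. It assumes the query can be sent to an explicit node and that a failure surfaces as an exception exposing the server's error and reason strings; ViewFailover, ViewError and queryWithFailover are hypothetical names, not Java SDK API.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: ViewFailover and ViewError are illustrative names,
// not classes from the Couchbase Java SDK.
final class ViewFailover {

    /** Illustrative error type carrying the server's "error" and "reason" values. */
    static final class ViewError extends RuntimeException {
        final String error;
        final String reason;

        ViewError(String error, String reason) {
            super(error + " Reason: " + reason);
            this.error = error;
            this.reason = reason;
        }
    }

    /** True for the two errors seen while a node is being rebalanced out. */
    static boolean isRebalanceError(ViewError e) {
        return "no_active_vbuckets".equals(e.error)
            || (e.reason != null && e.reason.contains("can not consist of merges exclusively"));
    }

    /**
     * Sends the query to each node in turn. A rebalance error moves on to the
     * next node; any other error is rethrown immediately.
     */
    static <R> R queryWithFailover(List<String> nodes, Function<String, R> query) {
        ViewError last = null;
        for (String node : nodes) {
            try {
                return query.apply(node);
            } catch (ViewError e) {
                if (!isRebalanceError(e)) {
                    throw e;
                }
                last = e; // remember it in case every node fails the same way
            }
        }
        if (last != null) {
            throw last;
        }
        throw new IllegalStateException("no nodes to query");
    }
}
```

Matching on the reason text is brittle if the server wording changes, which is one argument for a distinct status code instead.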
Comment by Aleksey Kondratenko [ 08/Apr/13 ]
Too late for 2.0.2, but IMHO a must-have for 2.1.
Comment by Matt Ingenthron [ 30/May/13 ]
Is the most recent comment here still relevant? We still see this in integration tests. 2.1 or ...
Comment by Aleksey Kondratenko [ 30/May/13 ]
Yes. We will not have it in the upcoming release.
Comment by Aleksey Kondratenko [ 30/May/13 ]
Let me clarify: we still don't plan to address this in the next release (previously known as 2.0.2 and being renamed to 2.1).
Comment by Maria McDuff (Inactive) [ 03/Jun/13 ]
moving to 2.0.3
Comment by Anil Kumar [ 03/Jun/13 ]
This is a must-have for 2.0.3.
Comment by Michael Nitschinger [ 04/Jun/13 ]
FYI, in the meantime we remedy this on the client side by simply retrying on another node (if we get a 300 or 500 back).
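The status-code rule above can be sketched as follows, reading "300 or 500" as the whole 3xx and 5xx ranges (an assumption). The send function stands in for an HTTP request that returns only the status code; StatusRetry and its methods are hypothetical.

```java
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical sketch; StatusRetry is not part of the Java SDK.
final class StatusRetry {

    /** 3xx (such as a 302 redirect) and 5xx responses mean "try another node". */
    static boolean retryOnAnotherNode(int status) {
        return (status >= 300 && status < 400) || (status >= 500 && status < 600);
    }

    /**
     * Sends the request to each node in order and returns the first node whose
     * status is conclusive (success or a non-retryable error such as 404),
     * or null if every node sent us elsewhere.
     */
    static String firstConclusiveNode(List<String> nodes, ToIntFunction<String> send) {
        for (String node : nodes) {
            if (!retryOnAnotherNode(send.applyAsInt(node))) {
                return node;
            }
        }
        return null;
    }
}
```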
Comment by kzeller [ 18/Jun/13 ]
Added to 2.1, 2.0.1, and 2.0 RN:

<rnentry type="knownissue">

<version ver="2.1.0a"/>

<class id="cluster"/>

<issue type="cb" ref="MB-7661"/>


<rntext>

<para>
If you query a view during a cluster rebalance, the query may fail and return the messages
"error Reason: A view spec can not consist of merges exclusively" and then
"no_active_vbuckets Reason: Cannot execute view query since the node has no active vbuckets."
The workaround for this situation is to handle these errors and retry later in your code. Alternatively,
the latest version of the Java SDK will automatically retry upon these errors.
</para>


</rntext>

</rnentry>
Comment by Matt Ingenthron [ 24/Jun/13 ]
I would turn that last sentence around and say that, in general, an application developer should retry the request if they get these error responses. The 1.1.8 (not yet released) version of the Java SDK is planned to retry them automatically for applications.
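Application-side "retry later" handling could look like the minimal sketch below. RetryLater and withBackoff are hypothetical names, and exponential backoff is one common choice rather than something this ticket prescribes; the caller supplies the predicate that decides which failures (for example, the two rebalance errors) count as transient.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper; the retry logic in the Java SDK may differ.
final class RetryLater {

    /**
     * Runs the query, retrying with exponentially growing delays while the
     * failure is classified as transient. A non-transient failure, or the
     * transient failure from the final attempt, is rethrown.
     */
    static <T> T withBackoff(Supplier<T> query, Predicate<RuntimeException> isTransient,
                             int maxAttempts, long initialDelayMillis) {
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return query.get();
            } catch (RuntimeException e) {
                if (!isTransient.test(e) || attempt >= maxAttempts) {
                    throw e;
                }
            }
            try {
                Thread.sleep(delay);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                throw new IllegalStateException("interrupted while waiting to retry", ie);
            }
            delay *= 2; // back off: 1x, 2x, 4x, ... the initial delay
        }
    }
}
```

Capping attempts keeps a query from waiting indefinitely if the cluster stays in a bad state long after the rebalance.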
Comment by Christian Weirich [ 22/Jul/13 ]
And what about the non-Java SDKs? Will they be updated too?
Comment by Aleksey Kondratenko [ 23/Jul/13 ]
As usual, things are getting messy rapidly due to undetected duplicates.

As of 2.0 we do send a redirect. Matt, is that something you're reasonably happy with (i.e., will all SDKs honor those redirects)?

The corresponding commit is:

commit 06eed990928b99d5223094a61d68971f427df9ed
Author: Aliaksey Kandratsenka <alk@tut.by>
Date: Mon Oct 15 17:07:57 2012 -0700

    MB-6922: send 302 when handling no active vbuckets on view query
    
    So that clients can clearly distinguish hitting node being
    rebalanced-in or -out and hitting dead ddoc or bucket. Also Location
    header will point client to better node which is helpful as well.
    
    Change-Id: I5ed1066ba646a67d0197b67f3988251822dfec31
    Reviewed-on: http://review.couchbase.org/21657
    Tested-by: Aliaksey Kandratsenka <alkondratenko@gmail.com>
    Reviewed-by: Aliaksey Artamonau <aliaksiej.artamonau@gmail.com>
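A client can honor the 302 described in this commit with a loop like the minimal sketch below. Response, RedirectFollower and the send function are hypothetical stand-ins for whatever HTTP client is in use, and the maxHops cap is an added assumption so a client does not follow redirects forever if nodes keep pointing at each other. Relative Location values are resolved against the URI that produced them via java.net.URI.resolve.

```java
import java.net.URI;
import java.util.function.Function;

// Hypothetical sketch of honoring the 302 sent for "no active vbuckets".
final class RedirectFollower {

    /** Minimal stand-in for an HTTP response (hypothetical type). */
    static final class Response {
        final int status;
        final String location; // value of the Location header, may be null
        final String body;

        Response(int status, String location, String body) {
            this.status = status;
            this.location = location;
            this.body = body;
        }
    }

    /** Re-sends the request to the node named in Location on each 302, up to maxHops times. */
    static Response query(URI start, Function<URI, Response> send, int maxHops) {
        URI target = start;
        for (int hop = 0; ; hop++) {
            Response r = send.apply(target);
            if (r.status != 302 || r.location == null) {
                return r; // not a redirect: hand the response to the caller
            }
            if (hop >= maxHops) {
                throw new IllegalStateException("too many redirects, last target " + target);
            }
            target = target.resolve(r.location);
        }
    }
}
```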
Comment by Matt Ingenthron [ 23/Jul/13 ]
Alk: did you mean "as of 2.0", or some other version? I ask because this has definitely been observed with recent releases.
Comment by Matt Ingenthron [ 23/Jul/13 ]
Christian: yes, we'll update other clients as soon as possible.
Comment by Aleksey Kondratenko [ 23/Jul/13 ]
Yes, 2.0. So my own comments above from April are somewhat invalid; apparently I forgot that we redirect.

Now the question is: is a redirect a sensible solution, or do you want the server to deal with this (at some additional cost, complexity- and efficiency-wise)?
Comment by Matt Ingenthron [ 23/Jul/13 ]
Two parts. One: I think a redirect is better than proxying or some other expensive solution. Two: we've seen this recently, so something must be wrong.
Comment by Aleksey Kondratenko [ 23/Jul/13 ]
Noted. Good that we agree on redirect.

That the issue still occurs with regular views must be a bug. We'll need reproduction instructions or diags from a recent reproduction in order to act on it.
Comment by Christian Weirich [ 24/Jul/13 ]
Matt: thank you, good to know.
Comment by Matt Ingenthron [ 29/Jul/13 ]
Deepti: have we seen the error message identified in the summary line of this issue in any recent testing of 2.1.0 or 2.1.1? My recollection is that we have. Can you spend 30 minutes or so looking over past results? If you can identify a situation, we may need to repro again to gather logs for the cluster dev team.
Comment by Deepti Dawar [ 31/Jul/13 ]
Hi Matt, I looked through the test results from the 2.1.1 and cbc 1.1.8 testing and did not find these errors recurring. Now only the Invalid view exception appears frequently, and that has been raised as a bug.
Comment by Matt Ingenthron [ 08/Aug/13 ]
Marking as resolved, as SDKQE reports this is not something we see any longer when client libraries correctly handle redirects.
Generated at Sat Aug 30 07:19:15 CDT 2014 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.