Error: Incorrect parameter for prepared name in encoded plan

Hi,

I am getting this error in the response when a query is executed through an application (Java).

Prepared name in encoded plan parameter is not [127.0.0.1:8091]b217fd0b75bc740ab648222d07cdc6814eee0e9f

This started happening only today. However, I am not aware of any change in the Couchbase server configuration that could cause an encoded plan to become corrupted.
I checked query.log but couldn’t find any log related to this error.

Can anyone tell me what this error actually means and how I can debug it further?

Hi Krishan,

Can you please enable client logging (see https://blog.couchbase.com/logging-couchbase-java-client/) and post the resulting log?

Regards,
-Jae

Hi @Jae,

I am not sure what I need to do here. Can you please be more specific?

Thanks,
Krishan

It looks like the SDK sent the prepared name and encoded_plan for execution, but the prepared name sent doesn’t match the prepared name inside the encoded_plan. That is why it returns an error. cc @marcoeg

@vsr1, is there any way I can retrieve the encoded plan that is being sent by the SDK?
Also, can you explain the flow of execution? I am interested in how this works internally so that I can visualize the issue better.

FYI – Couchbase version: 5.5 Enterprise

Hi @krishan.jangid - I’m having a look at the code and trying to repro (though so far a reproduction has eluded me); bear with me and I’ll have some news soon…

HTH,
Marco

Actually, I have found the issue by code inspection.
I have logged MB-31270.


Hi @Marco_Greco, any update on this?

I was going to reply today. Fixed during the weekend, you’ll get it in 5.5.2


By when will this version be available?

If all goes well, it should be GA in a couple of weeks.


@Marco_Greco, is there any workaround for this issue until the fixed version is available? Our development and testing are blocked by this bug.
Thanks
Thanks

Hi Krishan, let me say this.
In 5.5.x your applications should never have to resort to the encoded_plan REST API parameter; N1QL should be able to handle prepared statements without ever looking at it.

If N1QL has to fall back to inspecting encoded_plan, something nasty is happening to your prepared cache, which shouldn’t be the case.

It seems to me like your developers are churning the cache beyond control - the N1QL prepared cache by default holds 16k statements managed by LRU, meaning that an old prepared statement goes away once it has not been used and 16383 new prepared statements have been created since its last use.

Either that, or you have something that’s periodically deleting from system:prepareds.

Or somehow your prepared limit has been shrunk from the default.

If the first or third case applies, you can increase the prepareds cache:

curl http://localhost:8093/admin/settings -u Administrator:password -d '{"prepareds-limit": 65536}'

In the first case, you should also look into why so many prepared statements are being created that the cache can’t hold them.
In general terms, I would say that such churn is not normal.
In my 20 years in relational engine development, the norm I have seen is a few thousand prepareds system-wide.

If something is deleting from system:prepareds - well, that process needs to stop because it is affecting your operations.

HTH,
Marco

@Marco_Greco, I’m not sure what makes you think the N1QL prepared cache is not storing as many statements as expected. I just verified, and system:prepareds holds 16384 statements on our dev environment, as expected. Also, there is no process written that deletes from system:prepareds periodically. Yet the issue persisted.
Can you explain how this relates to the bug you resolved with your code fix?

Since I had to do something to unblock our testing team, I deleted the old cached plans and restarted all the nodes. For now the issue is not visible, but I’m not sure that will hold for long.

I’d appreciate it if you could explain what’s going on behind the curtain, so that I don’t have to shoot in the dark. :stuck_out_tongue:

Also, I was experimenting with the REST API when this issue first appeared, and found that I was able to run the query when I sent the parameters name and encoded_plan to the query service via Postman. The prepared statement was available on the server for that document, yet the service was still throwing the same error.

The encoded_plan REST API parameter (the one we are not using correctly) is only used if the prepared statement cannot be found in the cache.
That means that, out of the 16384 statements you have in the cache, the failing statement was not among them.

The cache is processed by LRU, meaning that the most recently used statements are kept at the top, while the least recently used ones eventually get deleted to make space for new statements.
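The LRU behaviour described above can be sketched with a plain `LinkedHashMap` in access order. This is a toy model only, not N1QL internals: the capacity of 3 stands in for the real default of 16384, and the names and plan strings are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PreparedCacheSketch {
    // Toy LRU cache: evicts the least recently used entry once capacity
    // is exceeded. Illustrative only; not how N1QL is actually coded.
    static <K, V> Map<K, V> lru(int capacity) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) { // accessOrder = true
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, String> cache = lru(3); // real default: 16384 entries
        cache.put("p1", "plan1");
        cache.put("p2", "plan2");
        cache.put("p3", "plan3");
        cache.get("p1");          // touching p1 makes it most recently used
        cache.put("p4", "plan4"); // evicts p2, now the least recently used
        System.out.println(cache.containsKey("p1")); // true
        System.out.println(cache.containsKey("p2")); // false: evicted
    }
}
```

A statement that keeps getting executed stays resident; one that goes unused while thousands of new statements are prepared eventually falls out, which is when the client has to resend name and encoded_plan.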

If the statement that you are trying to execute is not found (and it isn’t, because you are resorting to encoded_plan), then the cache quite simply is not big enough, and you should increase it as per my suggestion.
But you should also investigate why you have so many prepared statements that they don’t fit in the cache in the first place.

@Marco_Greco thanks for clarifying the process. I have two more questions:

  1. As per my last comment, for the one document for which the error was occurring, the statement was available in system:prepareds. When I ran the REST API call manually it worked, yet the service still threw the same error. Why was that happening?

  2. I observed that for one query, one parameter keeps changing in the WHERE clause, and for each of those parameter values a different prepared statement is saved in system:prepareds. Is this normal?

The problem I am seeing here is that the statement is available, but encoded_plan is still being inspected. Please look into this.

If encoded_plan is being inspected even though the prepared statement is available, that means the two plans don’t match.
I would suspect that you have clients that have been running for a long time, a change has occurred in your schema (e.g. an index was dropped and recreated), and the N1QL service has generated a new plan while the application still carries the old one.
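To make the mismatch concrete, here is a toy model of that check. The hashing scheme, method names, and plan strings are my own invention for illustration (the real logic is internal to N1QL); the only grounded detail is that the prepared name in this error looks like a node address plus a 40-hex-digit digest, and that the error fires when the name derived from the plan the client carries no longer matches the name the client sends.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class EncodedPlanCheck {
    // Toy model: derive a prepared name from a host and a plan's text,
    // the way the observed names look ("[host]<40 hex chars>").
    static String planName(String host, String planText) throws Exception {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        byte[] digest = sha1.digest(planText.getBytes(StandardCharsets.UTF_8));
        StringBuilder name = new StringBuilder("[" + host + "]");
        for (byte b : digest) name.append(String.format("%02x", b));
        return name.toString();
    }

    // Returns null on success, or an error string when the name derived
    // from the (possibly stale) client-side plan no longer matches the
    // name the client requested.
    static String execute(String requestedName, String clientPlanText,
                          String host) throws Exception {
        String nameInPlan = planName(host, clientPlanText);
        if (!nameInPlan.equals(requestedName)) {
            return "Prepared name in encoded plan parameter is not "
                    + requestedName;
        }
        return null; // names agree: plan accepted
    }

    public static void main(String[] args) throws Exception {
        String host = "127.0.0.1:8091";
        // Server reprepared after an index rebuild; client kept the old plan.
        String currentName = planName(host, "plan after index rebuild");
        System.out.println(execute(currentName, "plan before index rebuild", host));
        // prints a "Prepared name in encoded plan parameter is not ..." error
    }
}
```

In this model, a long-running client holding a pre-rebuild plan keeps tripping the check until it is restarted and picks up the current plan, which is exactly the remedy suggested below.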

If you restart the applications, they will carry the new plan, and everything will match.
You can also disable the encoded_plan REST API at the application level, and then there would never be any mismatch.
@daschl, could you remind me on how you do that?

Also, yes, you can have duplicate statements in the prepared cache; it all comes down to how your application is written (e.g. named statements vs anonymous statements, placeholders vs embedding values in the statement text, etc.).

If you find that you have lots of prepareds that differ only by one value, you could suggest to your developers to use placeholders.
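As an illustration of why placeholders help, here is a pure-Java sketch of the cache-keying effect; it is a toy model, not SDK or server code. Each distinct statement text is a distinct prepared statement, so embedding the value creates one cache entry per value, while the placeholder form stays at one. (With the 2.x Java SDK this corresponds to using parameterized queries, e.g. `N1qlQuery.parameterized(...)`, rather than concatenating values into the statement string.)

```java
import java.util.HashSet;
import java.util.Set;

public class PlaceholderChurn {
    // Toy model: a Set of statement texts stands in for the prepared
    // cache's distinct entries. Not Couchbase code.
    public static void main(String[] args) {
        Set<String> embedded = new HashSet<>();
        Set<String> placeholder = new HashSet<>();
        for (int id = 1; id <= 1000; id++) {
            // Literal embedded in the text: a new statement per value.
            embedded.add("SELECT * FROM orders WHERE id = " + id);
            // Placeholder: one statement; the value travels as a parameter.
            placeholder.add("SELECT * FROM orders WHERE id = $1");
        }
        System.out.println(embedded.size());    // 1000 distinct entries
        System.out.println(placeholder.size()); // 1 distinct entry
    }
}
```

A thousand executions with embedded literals churn a thousand cache slots; the same thousand executions through one placeholder statement touch a single slot and keep it hot in the LRU.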

@Marco_Greco, I suspected the same for #1, but after I restarted the application responsible for executing N1QL queries in Couchbase, along with the Couchbase server, all saved plans were removed from the server. I could not verify the same on the application side, as I don’t know how to do that. The error was still occurring.

Now, the only frequent change I know of is cluster rebalancing, which occurs every few months on various environments (removing/adding nodes) and requires indexes to be dropped and recreated. But this issue occurred for only one particular query. Shouldn’t all queries be affected by the same issue?

cc @pankaj.sharma