Random IO error using CE 4.5

[Fri Oct 20 15:52:56.962760 2017] [php7:notice] [pid 6829] [client] [cb,EROR] (http-io L:207 I:0) <rd.cb2.example.com:8093>Got error while performing I/O on HTTP stream. Err=0x2e
[Fri Oct 20 15:52:56.962888 2017] [php7:notice] [pid 6829] [client] [cb,INFO] (connection L:465 I:0) <rd.cb2.example.com:8093> (SOCK=0x5606ff091aa0) Starting. Timeout=75000000us
[Fri Oct 20 15:52:57.339802 2017] [php7:notice] [pid 6829] [client] [cb,INFO] (connection L:139 I:0) <rd.cb2.example.com:8093> (SOCK=0x5606ff091aa0) Connected established
[Fri Oct 20 15:52:57.795874 2017] [php7:notice] [pid 6829] [client] [cb,EROR] (http-io L:351 I:0) <rd.cb2.example.com:8093> Failed to add redirect URL (om:8093/query/service/query/service)
[Fri Oct 20 15:52:57.795965 2017] [php7:notice] [pid 6829] [client] [cb,WARN] (pcbc/n1ql L:51) Failed to decode N1QL row as JSON: json_last_error=4. I=0x5606ff0750f0
[Fri Oct 20 15:52:57.796200 2017] [php7:notice] [pid 6829] [client] [cb,EROR] (pcbc/n1ql L:88) Failed to perform N1QL query. 301: . I=0x5606ff0750f0
[Fri Oct 20 15:52:57.796243 2017] [php7:notice] [pid 6829] [client] [cb,INFO] (lcbio_mgr L:463 I:0) <rd.cb2.example.com:8093> (HE=0x5606ff092630) Placing socket back into the pool. I=0x5606ff0ad580,C=0x5606ff091aa0

This is what I get randomly when sending N1QL insert commands to my cluster. The issue is intermittent and cannot be easily reproduced. I verified the json being sent when this error occurred, and it was formatted correctly without any syntax errors.

Is it just that the IO error sent a partial part of the json which then caused a failure? Why would an error like this occur?

Do you have exact error return by query service? Is there any errors in query.log

it seems like N1QL service returns 301 Redirect with empty payload and the client cannot handle body as JSON. I will look at this deeper to come up with reproduction and the fix.

Are there changes to the cluster state at that moment (like rebalance, failover, other failures)?

All the errors I have I have provided. I checked the CB log page in the CB dashboard, but saw nothing of interest there.

There were no failovers, or rebalances. query.log also has no real interesting info in there during the failure.

I don’t see an obvious problem. Are you running 4.5.1?

At this point, I think your best bet is to try to reproduce the problem on a minimal setup: one machine, simple data, simple client. If you got that it would let us repro the problem here and debug it.

could you get tcpdump capture of the traffic on port 8093?