We have several environments set up. N1QL queries work correctly on some of them but not on others. All environments are running the same version (4.1.0). After searching this site and others, I tracked the issue down to nothing listening on the query port (8093).
I tried connecting to it with cbq on the box where Couchbase is installed and got this:
cbq> select * from default limit 1;
ERROR 5000 : Post http://localhost:8093/query: dial tcp 127.0.0.1:8093: ConnectEx tcp: No connection could be made because the target machine actively refused it.
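A quick way to confirm the same thing from a script, without cbq, is to attempt a plain TCP connection to each port. This is only a minimal Python sketch: the helper name `port_is_open` is made up for illustration, and it assumes the check is run on the Couchbase box itself.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError (e.g. connection refused, timeout)
        # when nothing is accepting connections on that port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 8091 serves the web console, 8093 is the query port discussed above.
    for name, port in (("web console", 8091), ("query", 8093)):
        state = "open" if port_is_open("127.0.0.1", port) else "closed or refused"
        print(f"{name} port {port}: {state}")
```

Running this on a working box and on a failing box should show whether 8093 is the only difference between them.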
I looked through the logs but couldn’t find anything relevant recorded. Both servers where 8093 is closed had this in their logs, but it’s unclear whether it’s related:
Service 'goxdcr' exited with status 1. Restarting. Messages: MetadataService 2016-11-30T04:41:08.908Z [ERROR] metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: ConnectEx tcp: No connection could be made because the target machine actively refused it., num_of_retry=3
MetadataService 2016-11-30T04:41:08.908Z [ERROR] metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: ConnectEx tcp: No connection could be made because the target machine actively refused it., num_of_retry=4
RemoteClusterService 2016-11-30T04:41:08.908Z [ERROR] Failed to get all entries, err=metakv failed for max number of retries = 5
Error starting remote cluster service. err=metakv failed for max number of retries = 5
[goport] 2016/11/30 04:41:08 c:/Program Files/Couchbase/Server/bin/goxdcr.exe terminated: exit status 1
Where is a good place to begin troubleshooting this issue?
Note that Windows 10 Anniversary Update is not currently a supported platform, and Couchbase doesn’t work on it yet. The 4.6.0 Developer Preview build does work, and Windows 10 AU will be supported once 4.6.0 is released.
Our Couchbase installs were all done by a scripted process that was run in the same manner on all of the boxes. It is working on some and not others.
When I ran the installer for 4.1.0 on my development box, it did not give any options for a partial install. It installed everything, including query. In fact, the only option presented at all was the directory the files are copied to. After the software was installed and the initial setup pages came up at http://localhost:8091/, there was no mention of query. Despite this, the port is open and it accepts queries.
Are you thinking that the query parts were not installed at all? Or more that they were installed and are failing to start somehow?
Is there a configuration file that might show if query is enabled?
It might be that query was never installed at all, but I can’t verify that until I see the logs. If the query service is enabled, you will have query logs.
A couple of questions:
For the nodes that didn’t install query, even though the script is the same, can you open the UI on that node and see which services have been enabled?
When you say the installs were scripted on all of the boxes, what does "boxes" refer to? Docker instances? VMs? AWS instances? Local machines?
If you see the Data and/or Index services, that means only the query service was not brought up. In that case, can you upload all the logs for the nodes where query was not successfully installed?
(Maybe you can upload them to Dropbox and add a link here so that I can download them and take a look.)
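If it is easier than clicking through the UI on every node, the enabled services can probably also be read from the REST API on port 8091. The sketch below is Python and rests on some assumptions: that `/pools/default` returns a `nodes` list in which each node carries a `services` array, and that the query service appears there as `n1ql`. I believe that is how 4.x reports it, but please check against your own output. The function names and the credentials are placeholders.

```python
import base64
import json
import urllib.request


def services_by_node(pools_default: dict) -> dict:
    """Map each node's hostname to the services it reports.

    Assumes the /pools/default layout described above ("nodes" list,
    per-node "services" array); missing keys fall back to empty values.
    """
    return {
        node.get("hostname", "?"): node.get("services", [])
        for node in pools_default.get("nodes", [])
    }


def fetch_pools_default(base_url: str, user: str, password: str) -> dict:
    """GET <base_url>/pools/default with HTTP basic auth and parse the JSON."""
    req = urllib.request.Request(base_url + "/pools/default")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Placeholder credentials; replace with the cluster administrator's.
    info = fetch_pools_default("http://localhost:8091", "Administrator", "password")
    for host, services in services_by_node(info).items():
        # "n1ql" as the query service name is an assumption, see above.
        note = "" if "n1ql" in services else "  <-- no query service listed"
        print(f"{host}: {services}{note}")
```

A node that lists the Data and Index services but not query would point to the service never having been enabled, rather than to a service that is enabled but failing to start.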