EOF in cbq when dataset is large

I am running a query that joins a 3M sized bucket and performs aggregations over it before filtering the projected results by using “Having”.
When i am running it i get an EOF error by cbq, but when filtering the data-set by another (indexed) property, i get the result within several seconds.
What can cause that EOF error from happening? is there a log with more description of the error?

Still having this problem, even when trying to rephrase the query, still crashes when workingset is large (i guess).
I am running the query via cbq:

cbq> SELECT meta(usr).id user_id, IFMISSINGORNULL(usr.sso_providers.FACEBOOK.email,usr.sso_providers.GOOGLE.email,usr.emails[0]) as email, IFMISSINGORNULL(usr.sso_providers.FACEBOOK.type, usr.sso_providers.GOOGLE.type, “EMAIL”) as source FROM users dvc JOIN users usr ON KEYS dvc.user_id WHERE dvc.last_run BETWEEN 1452067057 AND 1452097057 GROUP BY meta(usr).id EXCEPT SELECT meta(usr).id user_id, IFMISSINGORNULL(usr.sso_providers.FACEBOOK.email,usr.sso_providers.GOOGLE.email,usr.emails[0]) as email, IFMISSINGORNULL(usr.sso_providers.FACEBOOK.type, usr.sso_providers.GOOGLE.type, “EMAIL”) as source FROM users dvc JOIN users usr ON KEYS dvc.user_id WHERE dvc.last_run > 1452097057 GROUP BY meta(usr).id;

ERROR 5000 : Post http://Drippler:99Tv8KAVt52x@localhost:8093/query: EOF

The query uses the GSI that i have created over dvc.last_run

Sorry to hear you are having an N1QL problem. I’m a developer on the N1QL team. I can help.

Let’s try to narrow down the problem by determining whether it is happening on the front end or the back end.

Try sending the query directly to the back end using curl. There’s an example on this page:
http://developer.couchbase.com/documentation/server/4.0/n1ql/n1ql-rest-api/exsuccessful.html

Tried it and after about an hour got: Write failed: Broken pipe

The query takes time to calculate the results since it fetches lots of data.

Full request and response:

curl -v http://Username:password@localhost:8093/query/service -d “statement=SELECT meta(usr).id user_id, IFMISSINGORNULL(usr.sso_providers.FACEBOOK.email,usr.sso_providers.GOOGLE.email,usr.emails[0]) as email, IFMISSINGORNULL(usr.sso_providers.FACEBOOK.type, usr.sso_providers.GOOGLE.type, ‘EMAIL’) as source FROM users dvc JOIN users usr ON KEYS dvc.user_id WHERE dvc.last_run BETWEEN 1452067057 AND 1452097057 GROUP BY meta(usr).id EXCEPT SELECT meta(usr).id user_id, IFMISSINGORNULL(usr.sso_providers.FACEBOOK.email,usr.sso_providers.GOOGLE.email,usr.emails[0]) as email, IFMISSINGORNULL(usr.sso_providers.FACEBOOK.type, usr.sso_providers.GOOGLE.type, ‘EMAIL’) as source FROM users dvc JOIN users usr ON KEYS dvc.user_id WHERE dvc.last_run > 1452097057 GROUP BY meta(usr).id”

  • Hostname was NOT found in DNS cache
  • Trying 127.0.0.1…
  • Connected to localhost (127.0.0.1) port 8093 (#0)
  • Server auth using Basic with user ‘Username’

POST /query/service HTTP/1.1
Authorization: Basic RHJpcHBsZXI6STlUdjhLQVZ0NTK4
User-Agent: curl/7.35.0
Host: localhost:8093
Accept: /
Content-Length: 711
Content-Type: application/x-www-form-urlencoded

  • upload completely sent off: 711 out of 711 bytes
    Write failed: Broken pipe

OK. Try adding the timeout parameter to the request. Set it to considerably longer than you expect the request to take.

timeout
string
[Optional] Maximum time to spend on the request. Default value is “0s”, which means no timeout is applied, i.e. the request runs for however long it takes. There is also a server wide timeout parameter, and the minimum of that and the request timeout is what gets applied. Its format includes an amount and a unit. Valid units are “ns”, “us”, “ms”, “s”, “m”, “h”, for nanoseconds, microseconds, milliseconds, seconds, minutes and hours, respectively. The unit is mandatory. Examples are “10ms”, (which means 10 milliseconds), “1m” (which means 1 minute) and “0.5s” (which means half a second).

Also, could you do an explain-plan on the query and add the results to this thread? Put the “explain” keyword before the plan in CBQ, and it should return a complicated blob of JSON that shows how the system is trying to compute the query.

Tried again and my machine crashed (cannot connect using ssh anymore).
Anyway, EXPLAIN shows that we use the GSIs, however the amount of documents which satisfy the WHERE clause is very large, i guess something like 500k documents.

Explain:
cbq> EXPLAIN SELECT

meta(usr).id user_id,
IFMISSINGORNULL(usr.sso_providers.FACEBOOK.email,usr.sso_providers.GOOGLE.email,usr.emails[0]) as email,
IFMISSINGORNULL(usr.sso_providers.FACEBOOK.type, usr.sso_providers.GOOGLE.type, ‘EMAIL’) as source
FROM users dvc JOIN users usr ON KEYS dvc.user_id
WHERE
dvc.last_run BETWEEN 1452067057 AND 1452097057
GROUP BY meta(usr).id
EXCEPT
SELECT
meta(usr).id user_id,
IFMISSINGORNULL(usr.sso_providers.FACEBOOK.email,usr.sso_providers.GOOGLE.email,usr.emails[0]) as email,
IFMISSINGORNULL(usr.sso_providers.FACEBOOK.type, usr.sso_providers.GOOGLE.type, ‘EMAIL’) as source
FROM users dvc JOIN users usr ON KEYS dvc.user_id
WHERE
dvc.last_run > 1452097057
GROUP BY meta(usr).id;

{
“requestID”: “dc253169-3b40-42e5-b0d8-6a96cf0c73e4”,
“signature”: “json”,
“results”: [
{
#operator”: “ExceptAll”,
“first”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “IndexScan”,
“index”: “SX_DEVICES_LAST_RUN”,
“keyspace”: “users”,
“limit”: 9.223372036854776e+18,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
“1.452097057e+09”
],
“Inclusion”: 3,
“Low”: [
“1.452067057e+09”
]
},
“Seek”: null
}
],
“using”: “gsi”
},
{
#operator”: “Parallel”,
“~child”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “Fetch”,
“as”: “dvc”,
“keyspace”: “users”,
“namespace”: “default”
},
{
#operator”: “Join”,
“as”: “usr”,
“keyspace”: “users”,
“namespace”: “default”,
“on_keys”: “(dvc.user_id)”
},
{
#operator”: “Filter”,
“condition”: “((dvc.last_run) between 1.452067057e+09 and 1.452097057e+09)”
},
{
#operator”: “InitialGroup”,
“aggregates”: ,
“group_keys”: [
“(meta(usr).id)”
]
}
]
}
},
{
#operator”: “IntermediateGroup”,
“aggregates”: ,
“group_keys”: [
“(meta(usr).id)”
]
},
{
#operator”: “FinalGroup”,
“aggregates”: ,
“group_keys”: [
“(meta(usr).id)”
]
},
{
#operator”: “Parallel”,
“~child”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “InitialProject”,
“result_terms”: [
{
“as”: “user_id”,
“expr”: “(meta(usr).id)”
},
{
“as”: “email”,
“expr”: “ifmissingornull((((usr.sso_providers).FACEBOOK).email), (((usr.sso_providers).GOOGLE).email), ((usr.emails)[0]))”
},
{
“as”: “source”,
“expr”: “ifmissingornull((((usr.sso_providers).FACEBOOK).type), (((usr.sso_providers).GOOGLE).type), "EMAIL")”
}
]
},
{
#operator”: “Distinct”
},
{
#operator”: “FinalProject”
}
]
}
},
{
#operator”: “Distinct”
}
]
},
“second”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “IndexScan”,
“index”: “SX_DEVICES_LAST_RUN”,
“keyspace”: “users”,
“limit”: 9.223372036854776e+18,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: null,
“Inclusion”: 0,
“Low”: [
“1.452097057e+09”
]
},
“Seek”: null
}
],
“using”: “gsi”
},
{
#operator”: “Parallel”,
“~child”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “Fetch”,
“as”: “dvc”,
“keyspace”: “users”,
“namespace”: “default”
},
{
#operator”: “Join”,
“as”: “usr”,
“keyspace”: “users”,
“namespace”: “default”,
“on_keys”: “(dvc.user_id)”
},
{
#operator”: “Filter”,
“condition”: “(1.452097057e+09 \u003c (dvc.last_run))”
},
{
#operator”: “InitialGroup”,
“aggregates”: ,
“group_keys”: [
“(meta(usr).id)”
]
}
]
}
},
{
#operator”: “IntermediateGroup”,
“aggregates”: ,
“group_keys”: [
“(meta(usr).id)”
]
},
{
#operator”: “FinalGroup”,
“aggregates”: ,
“group_keys”: [
“(meta(usr).id)”
]
},
{
#operator”: “Parallel”,
“~child”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “InitialProject”,
“result_terms”: [
{
“as”: “user_id”,
“expr”: “(meta(usr).id)”
},
{
“as”: “email”,
“expr”: “ifmissingornull((((usr.sso_providers).FACEBOOK).email), (((usr.sso_providers).GOOGLE).email), ((usr.emails)[0]))”
},
{
“as”: “source”,
“expr”: “ifmissingornull((((usr.sso_providers).FACEBOOK).type), (((usr.sso_providers).GOOGLE).type), "EMAIL")”
}
]
},
{
#operator”: “Distinct”
},
{
#operator”: “FinalProject”
}
]
}
},
{
#operator”: “Distinct”
}
]
}
}
],
“status”: “success”,
“metrics”: {
“elapsedTime”: “148.145007ms”,
“executionTime”: “145.411779ms”,
“resultCount”: 1,
“resultSize”: 9547
}
}