Problem retrieving the "datas" field

Hello,

I have a Couchbase server running through Docker images (2 nodes). I uploaded 123,000 nested JSON documents into the bucket.
In each JSON document, I have a "datas" field which is a list of objects.
To query the data inside these objects, I use UNNEST, but with no success.


SELECT f1.date_record, f1.id_cycle, f1.train, f1.vehicule, f1.dcu, datas.Time, datas.PPOR, datas.IPOR, datas.TPOR
FROM R2N_door_system.opening.f1 UNNEST datas;

And I get this message in the Couchbase Web Console:

{
"status": "No data returned from server"
}

With the Python SDK, I have this message:

couchbase.exceptions.TimeoutException: <RC=0xC9[LCB_ERR_TIMEOUT (201)], HTTP Request failed. Examine 'objextra' for full result, Results=1, C Source=(src/pycbc_http.c,197), OBJ=ViewResult<rc=0xC9[LCB_ERR_TIMEOUT (201)], value=b'}', http_status=0, tracing_context=0, tracing_output=None>, Context={'first_error_code': 0, 'http_response_code': 0, 'first_error_message': '', 'statement': 'SELECT f1.date_record, f1.id_cycle, f1.train, f1.vehicule, f1.dcu, datas.Time, datas.PPOR, datas.IPOR, datas.TPOR, datas.FDCF, datas.FDCV FROM R2N_door_system.opening.f1 UNNEST datas;', 'client_context_id': '8e66d45c8f488ae3', 'query_params': '', 'http_response_body': '', 'endpoint': '', 'type': 'QueryErrorContext'}, Tracing Output={":nokey:0": null}>

What is the problem? I know that the data is OK, because when I try this query:

SELECT date_record, id_cycle, train, vehicule, dcu, datas[*].Time, datas[*].IPOR, datas[*].UPOR, datas[*].PPOR 
FROM R2N_door_system.opening.f1

I have no problem getting the results, but they are not in the right structure for me.
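The difference between the two result shapes can be sketched in plain Python (toy values, not the real documents):

```python
# Hypothetical document shaped like the ones in this thread.
doc = {
    "id_cycle": 1,
    "datas": [{"Time": 0, "PPOR": 10}, {"Time": 1, "PPOR": 11}],
}

# datas[*].Time style: one row per document, each field collected into an array.
collected = {
    "id_cycle": doc["id_cycle"],
    "Time": [d["Time"] for d in doc["datas"]],
}
print(collected)     # {'id_cycle': 1, 'Time': [0, 1]}

# UNNEST style: one row per array element, with the document fields repeated.
flattened = [{"id_cycle": doc["id_cycle"], **d} for d in doc["datas"]]
print(flattened[0])  # {'id_cycle': 1, 'Time': 0, 'PPOR': 10}
```

The second, flattened shape is what UNNEST produces, at the cost of multiplying the row count by the array length.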

Regards.

Rémy

Does nobody know why UNNEST causes the LCB_ERR_TIMEOUT (201) error?

To narrow down the issue, try the query in the web console, the cbq shell, or curl.
Also try it with a LIMIT and see if that works.

How many documents are there, and how large is the average "datas" array? You can get the total with SELECT SUM(ARRAY_LENGTH(datas)) FROM R2N_door_system.opening.f1.

Thanks for your reply,

I ran the query with UNNEST in the Web Console and I got:

{
"status": "No data returned from server"
}

@remy ,

Is datas an ARRAY? If not, UNNEST eliminates those documents (if needed, use LEFT UNNEST). Also try this:

SELECT f1.date_record, f1.id_cycle, f1.train, f1.vehicule,
                 f1.dcu, datas.Time, datas.PPOR, datas.IPOR, datas.TPOR
FROM R2N_door_system.opening.f1  AS f1
UNNEST f1.datas AS datas;

Thanks for your reply.
Hereunder is the result of your query 🙂

[
{
"$1": 10047582
}
]

@remy With 123000 documents the average array length must then be ~81.

If @vsr1’s suggestion makes no difference, is it possible to add a LIMIT clause and profile the query to confirm where the time is spent?

(cbq)

\set -profile timings;
SELECT f1.date_record, f1.id_cycle, f1.train, f1.vehicule,
                 f1.dcu, datas.Time, datas.PPOR, datas.IPOR, datas.TPOR
FROM R2N_door_system.opening.f1  AS f1
UNNEST f1.datas AS datas
LIMIT 1000000;
  • If a LIMIT of 1,000,000 is unsuccessful then try 100,000 too.

(My guess is that the time taken to fetch 123000 documents is close enough to the timeout that the addition of the UNNEST to multiply each ~80 times results in exceeding the timeout. The profile should highlight where the bulk of the time is spent.)
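A quick back-of-the-envelope check of that guess, using only the numbers already posted in this thread:

```python
# Numbers quoted earlier in the thread.
total_docs = 123_000        # documents loaded
total_rows = 10_047_582     # result of SUM(ARRAY_LENGTH(datas))

# Each document expands into roughly 80+ rows after UNNEST.
avg_len = total_rows / total_docs
print(f"average datas length: {avg_len:.1f}")  # average datas length: 81.7
```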

I was surprised it did not return even a single result, because it should start streaming as soon as the first result qualifies.
Try LIMIT 10. If it still doesn't work, post the EXPLAIN along with a sample document.

Thanks a lot for your answer. Hereunder is the profile (with LIMIT 1000000):

{
    "status": "success",
    "metrics": {
        "elapsedTime": "55.543421917s",
        "executionTime": "55.543316709s",
        "resultCount": 1000000,
        "resultSize": 245927171,
        "serviceLoad": 2
    },
    "profile": {
        "phaseTimes": {
            "authorize": "2.587804ms",
            "fetch": "1.193609946s",
            "filter": "8.130481005s",
            "instantiate": "44.858µs",
            "parse": "1.171928ms",
            "plan": "111.163105ms",
            "primaryScan": "42.451567ms",
            "run": "55.430916329s"
        },
        "phaseCounts": {
            "fetch": 13328,
            "filter": 12812,
            "primaryScan": 13842
        },
        "phaseOperators": {
            "authorize": 1,
            "fetch": 1,
            "filter": 1,
            "primaryScan": 1
        },
        "requestTime": "2022-02-18T15:39:44.066Z",
        "servicingHost": "172.16.101.12:8091",
        "executionTimings": {
            "#operator": "Authorize",
            "#stats": {
                "#phaseSwitches": 4,
                "execTime": "2.45µs",
                "servTime": "2.585354ms"
            },
            "privileges": {
                "List": [
                    {
                        "Target": "default:R2N_door_system.opening.f1",
                        "Priv": 7,
                        "Props": 0
                    }
                ]
            },
            "~child": {
                "#operator": "Sequence",
                "#stats": {
                    "#phaseSwitches": 1,
                    "execTime": "10.326µs"
                },
                "~children": [
                    {
                        "#operator": "PrimaryScan3",
                        "#stats": {
                            "#heartbeatYields": 2,
                            "#itemsOut": 13841,
                            "#phaseSwitches": 55369,
                            "execTime": "24.80946ms",
                            "kernTime": "55.385727993s",
                            "servTime": "17.667504ms"
                        },
                        "as": "f1",
                        "bucket": "R2N_door_system",
                        "index": "#primary",
                        "index_projection": {
                            "primary_key": true
                        },
                        "keyspace": "f1",
                        "namespace": "default",
                        "scope": "opening",
                        "using": "gsi"
                    },
                    {
                        "#operator": "Fetch",
                        "#stats": {
                            "#heartbeatYields": 2,
                            "#itemsIn": 13329,
                            "#itemsOut": 13324,
                            "#phaseSwitches": 53364,
                            "execTime": "108.120232ms",
                            "kernTime": "54.234581867s",
                            "servTime": "1.085507691s"
                        },
                        "as": "f1",
                        "bucket": "R2N_door_system",
                        "keyspace": "f1",
                        "namespace": "default",
                        "scope": "opening"
                    },
                    {
                        "#operator": "Sequence",
                        "#stats": {
                            "#phaseSwitches": 2,
                            "execTime": "308.678µs",
                            "kernTime": "109ns",
                            "state": "running"
                        },
                        "~children": [
                            {
                                "#operator": "Filter",
                                "#stats": {
                                    "#itemsIn": 12812,
                                    "#itemsOut": 12811,
                                    "#phaseSwitches": 51250,
                                    "execTime": "8.130516232s",
                                    "kernTime": "47.2977198s"
                                },
                                "condition": "is_array((`f1`.`datas`))"
                            },
                            {
                                "#operator": "Unnest",
                                "#stats": {
                                    "#heartbeatYields": 144,
                                    "#itemsIn": 12299,
                                    "#itemsOut": 1000510,
                                    "#phaseSwitches": 2025622,
                                    "execTime": "18.386252616s",
                                    "kernTime": "37.041984296s"
                                },
                                "as": "datas",
                                "expr": "(`f1`.`datas`)",
                                "filter": "(`datas` is not missing)"
                            }
                        ]
                    },
                    {
                        "#operator": "InitialProject",
                        "#stats": {
                            "#itemsIn": 1000001,
                            "#itemsOut": 1000000,
                            "#phaseSwitches": 3000006,
                            "execTime": "32.942177451s",
                            "kernTime": "495.974539ms"
                        },
                        "result_terms": [
                            {
                                "expr": "(`f1`.`date_record`)"
                            },
                            {
                                "expr": "(`f1`.`id_cycle`)"
                            },
                            {
                                "expr": "(`f1`.`train`)"
                            },
                            {
                                "expr": "(`f1`.`vehicule`)"
                            },
                            {
                                "expr": "(`f1`.`dcu`)"
                            },
                            {
                                "expr": "(`datas`.`Time`)"
                            },
                            {
                                "expr": "(`datas`.`PPOR`)"
                            },
                            {
                                "expr": "(`datas`.`IPOR`)"
                            },
                            {
                                "expr": "(`datas`.`TPOR`)"
                            }
                        ]
                    },
                    {
                        "#operator": "Limit",
                        "#stats": {
                            "#itemsIn": 1000000,
                            "#itemsOut": 1000000,
                            "#phaseSwitches": 3000002,
                            "execTime": "244.590616ms",
                            "kernTime": "21.592877202s"
                        },
                        "expr": "1000000"
                    },
                    {
                        "#operator": "Stream",
                        "#stats": {
                            "#itemsIn": 1000000,
                            "#itemsOut": 1000000,
                            "#phaseSwitches": 2000004,
                            "execTime": "15.576071528s",
                            "kernTime": "39.852182036s"
                        }
                    }
                ]
            },
            "~versions": [
                "7.0.2-N1QL",
                "7.0.3-7031-enterprise"
            ]
        }
    }
}

To be honest, I don’t understand how to use this information!

Rémy

And hereunder the EXPLAIN JSON:

{
    "#operator": "Sequence",
    "~children": [
        {
            "#operator": "PrimaryScan3",
            "as": "f1",
            "bucket": "R2N_door_system",
            "index": "#primary",
            "index_projection": {
                "primary_key": true
            },
            "keyspace": "f1",
            "namespace": "default",
            "scope": "opening",
            "using": "gsi"
        },
        {
            "#operator": "Fetch",
            "as": "f1",
            "bucket": "R2N_door_system",
            "keyspace": "f1",
            "namespace": "default",
            "scope": "opening"
        },
        {
            "#operator": "Parallel",
            "~child": {
                "#operator": "Sequence",
                "~children": [
                    {
                        "#operator": "Filter",
                        "condition": "is_array((`f1`.`datas`))"
                    },
                    {
                        "#operator": "Unnest",
                        "as": "datas",
                        "expr": "(`f1`.`datas`)",
                        "filter": "(`datas` is not missing)"
                    }
                ]
            }
        },
        {
            "#operator": "Parallel",
            "~child": {
                "#operator": "Sequence",
                "~children": [
                    {
                        "#operator": "InitialProject",
                        "result_terms": [
                            {
                                "expr": "(`f1`.`date_record`)"
                            },
                            {
                                "expr": "(`f1`.`id_cycle`)"
                            },
                            {
                                "expr": "(`f1`.`train`)"
                            },
                            {
                                "expr": "(`f1`.`vehicule`)"
                            },
                            {
                                "expr": "(`f1`.`dcu`)"
                            },
                            {
                                "expr": "(`datas`.`Time`)"
                            },
                            {
                                "expr": "(`datas`.`PPOR`)"
                            },
                            {
                                "expr": "(`datas`.`IPOR`)"
                            },
                            {
                                "expr": "(`datas`.`TPOR`)"
                            }
                        ]
                    }
                ]
            }
        }
    ]
}

That clearly tells us the Query service returned 1M items. Not sure why the client side returns no data.

You have a timeout of 60 seconds, and your query takes somewhat more time than that to stream all the results to you. It worked with LIMIT 1,000,000 because streaming the complete results took only ~56 seconds. As @vsr1 noted, results start streaming back to the client long before the timeout - this must be what is happening here.

The bulk of the time is spent formatting the results for the wire and actually streaming them - this then comes down to how quickly you can consume them.

If running through cbq or curl, you can also set pretty = false (\set -pretty false; in cbq) to reduce the payload size, but I doubt it will allow you to stream all 10 million results in 60 seconds. (In my test this doubles the throughput in 60 seconds - still far short of the 10+ times you’d need to stream the entire result set.)

The other option would be to change your timeout to allow this query to complete - perhaps set it to 0 (i.e. disable it) then see how long it actually takes? You could then perhaps adjust your timeout value accordingly, or limit/filter to meet your desired timeout setting.

In the profile I can see that little time was actually spent in the index and fetch operators, ~8 s in the filter (to confirm an array for UNNEST) and ~18 s in the UNNEST itself. The projection (~32 s) and streaming (~15 s) will, I suspect, come to dominate over a larger result set.
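Extrapolating from the profiled run (a rough sketch, assuming throughput stays roughly constant as the result set grows):

```python
# From the profile above: 1,000,000 rows streamed in ~55.5 s.
rows_profiled = 1_000_000
secs_profiled = 55.5
rows_total = 10_047_582     # full unnested result set

est = secs_profiled * rows_total / rows_profiled
print(f"estimated full-stream time: ~{est:.0f} s")  # ~558 s
```

That is an order of magnitude past a 60-75 s client timeout, which is why raising or disabling the timeout (or limiting the result set) is needed.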

Using cbq, I’m able to read all 123,423 documents. Of course, since I run the query with UNNEST, I get a final result with 10,047,582 rows and 11 columns.

So it’s clearly a problem with the Python connector.

As I really need to extract the data from the bucket with the Python connector, do you have an idea how to tweak it so that I don’t get the error message?
How can I disable the timeout parameter, and by which means?

Regards.

Rémy

All SDKs set the default query timeout to 75 s. Check the documentation and increase the timeout for the specific request.

Check out How to raise timeout threshold in Python SDK for analytics_query - #2 by naftali (query_timeout)

cc @jcasey

Quick sample below to show adjusting the timeouts. You can adjust the global query timeout and also set the timeout on a per-query basis. The timeout used will be the lower of the two.

Hope this helps.

from datetime import timedelta

from couchbase.cluster import Cluster, ClusterOptions, ClusterTimeoutOptions, QueryOptions
from couchbase.auth import PasswordAuthenticator

def run_sample_code():
    try:
        host = "couchbase://localhost"
        username = "Administrator"
        pw = "password"
        # default query timeout = 75s
        options = ClusterTimeoutOptions(kv_timeout=timedelta(seconds=60),
                                        query_timeout=timedelta(seconds=120))

        cluster = Cluster(host, ClusterOptions(
            PasswordAuthenticator(username, pw), timeout_options=options))

        query_iter = cluster.query("SELECT * FROM `travel-sample` LIMIT 100")
        for r in query_iter.rows():
            print(f"Found row: {r}")

        # if the timeout is set in the QueryOptions, the effective
        # timeout will be the lower of the global timeout and the
        # timeout provided in the QueryOptions
        q_opts = QueryOptions(timeout=timedelta(seconds=1))
        query_iter = cluster.query("SELECT * FROM `travel-sample` LIMIT 10000", q_opts)
        count = 0
        for r in query_iter.rows():
            count += 1
        print(f"Got {count} rows")

    except Exception:
        import traceback
        traceback.print_exc()


if __name__ == '__main__':
    run_sample_code()


Thanks for your help,

Your proposal did the job. I can now go ahead …

Thanks a lot.

Rémy