Batch insertion - Strange behavior in ops/sec

I’ve batch-inserted 10k documents into the database to populate it. Since the insertion finished, I can see periodic spikes in the number of operations per second (on the order of 5k ops/sec) on the bucket, which slow the application massively: a single query that usually takes under 1 second took more than 54 seconds.

I’ve then proceeded to flush the bucket, and I can see that even with an empty bucket, those spikes still occur. I then deleted the bucket and recreated it under the same name, only to find that the number of operations per second still rises from time to time.

Since this is a small dev environment, we currently have a single node with 2 GB RAM and an 80 GB disk. Here’s a picture of the current status of the server:

What causes this behavior? How can I solve it? Let me know if you need more information.

The graph indicates insertion of more than 10K documents.

Can you post the explain for your queries?
Also, if you’re using 4.5.1, post the output of `SELECT * FROM system:completed_requests;`
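For reference, prefixing any statement with `EXPLAIN` returns the query plan without executing it, which is what’s being asked for above. The bucket and field names below are placeholders, not taken from the original statement:

```sql
-- Show the plan (operators such as PrimaryScan, IndexScan, Sort)
-- without running the query:
EXPLAIN SELECT name FROM `mybucket` WHERE country = "Portugal";

-- Recent completed requests are kept in this system keyspace:
SELECT * FROM system:completed_requests;
```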

Hi @keshav_m. We only inserted 10k documents (as you can see in the second image); I don’t know what’s causing the subsequent spikes.

The query above yields 4000 results, all with success status, in 1.36 s. Here’s the first one:

    "completed_requests": {
      "ElapsedTime": "13.313917777s",
      "ErrorCount": 0,
      "PhaseCounts": {
        "Fetch": 9841,
        "PrimaryScan": 9841,
        "Sort": 10
      "PhaseOperators": {
        "Fetch": 1,
        "PrimaryScan": 1,
        "Sort": 1
      "RequestId": "da8f0981-a3b0-4a0e-8745-9fe693a30938",
      "ResultCount": 5,
      "ResultSize": 2726,
      "ServiceTime": "13.312824699s",
      "State": "completed",
      "Statement": "*removed for privacy*",
      "Time": "2016-10-25 17:30:59.906734729 +0000 UTC"

Which are you using to insert the documents, N1QL INSERT or key-value?

Key-value, we’re using an ODM that does that for us. We insert one document at a time. Is it a bad approach?

I just wanted @keshav_m to have that context as he is helping you.


From the completed_requests, I see your queries have the following:

  1. A PrimaryScan – the query scans the entire data set (all 10K documents) to produce just 5 results.
  2. A Sort operation.

Please see the N1QL articles in DZone to design the right indexes and avoid ORDER BY, if possible.
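As a concrete illustration of that advice, a secondary index covering the filtered and sorted fields lets the planner replace the PrimaryScan with an IndexScan and, when the index order matches the ORDER BY, skip the Sort phase as well. The bucket and field names here (`mybucket`, `country`, `name`) are assumptions for the sketch, not taken from the redacted statement:

```sql
-- Hypothetical index on the fields used in WHERE and ORDER BY.
CREATE INDEX idx_country_name ON `mybucket`(country, name);

-- A query of this shape can then use the index for both
-- filtering and ordering:
SELECT name, country
FROM `mybucket`
WHERE country = "Portugal"
ORDER BY country, name
LIMIT 5;
```

Running `EXPLAIN` on the query before and after creating the index should show the PrimaryScan disappear from the plan.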

@keshav_m And you believe the lack of indexes is the cause of such an enormous impact with such a low number of documents?

Regarding ORDER BY: it will be required for some operations, since the results populate a listing that supports sorting. I’ll look into creating indexes for my data.

@keshav_m, @geraldss, just to give you some more context, I’ve inserted 10k documents containing info about cities (name, country, coordinates…). All other documents concern other things, like Users. Do you have any suggestions on how to keep those 10k city documents from affecting queries on the other documents?
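One common approach, if the documents carry a distinguishing field (many ODMs add a `type` attribute), is a partial index restricted to the other document types, so queries against them never touch the 10k city documents. The bucket name, the `type` field, and the `email` field below are assumptions for the sketch:

```sql
-- Hypothetical partial index: index only User documents.
CREATE INDEX idx_users_email ON `mybucket`(email)
WHERE type = "User";

-- A query must repeat the index's WHERE predicate to qualify for it:
SELECT email
FROM `mybucket`
WHERE type = "User" AND email = "someone@example.com";
```

With this shape, the index contains only User entries, so its size and scan cost are independent of how many city documents sit in the same bucket.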