Load test indexer timeout

Index scans time out after a few hours of load. We have 3 nodes and all services, as shown below.
After a few hours of load, all queries take longer than they did under the initial load.
We have 4.6 million items in the bucket.

What should we check to tune this?

2019-10-25T00:04:37.034+08:00 [Warn] scan failed: requestId 8f8a60a6-7cb6-4f59-ad6f-7c6aae896164 queryport 10.131.108.13:9101 inst 14761998452382644509 partition [0]
2019-10-25T00:04:37.034+08:00 [Warn] Scan failed with error for index 1995489534182958865. Trying scan again with replica, reqId:8f8a60a6-7cb6-4f59-ad6f-7c6aae896164 : Index scan timed out from [10.131.108.13:9101] …
2019-10-25T00:04:37.034+08:00 [Error] PickRandom: Fail to find indexer for all index partitions. Num partition 1. Partition with instances 0
2019-10-25T00:04:37.034+08:00 [Warn] Fail to find indexers to satisfy query request. Trying scan again for index 1995489534182958865, reqId:8f8a60a6-7cb6-4f59-ad6f-7c6aae896164 : Index scan timed out from [10.131.108.13:9101] …
2019-10-25T00:04:37.034+08:00 [Error] [GsiScanClient:"10.131.108.13:9101"] Scans(8f8a60a6-7cb6-4f59-ad6f-7c6aae896164) response failed Index scan timed out
2019-10-25T00:04:37.034+08:00 [Warn] scan failed: requestId 8f8a60a6-7cb6-4f59-ad6f-7c6aae896164 queryport 10.131.108.13:9101 inst 14761998452382644509 partition [0]
2019-10-25T00:04:37.034+08:00 [Warn] Scan failed with error for index 1995489534182958865. Trying scan again with replica, reqId:8f8a60a6-7cb6-4f59-ad6f-7c6aae896164 : Index scan timed out from [10.131.108.13:9101] …
2019-10-25T00:04:37.034+08:00 [Error] PickRandom: Fail to find indexer for all index partitions. Num partition 1. Partition with instances 0
2019-10-25T00:04:37.034+08:00 [Warn] Fail to find indexers to satisfy query request. Trying scan again for index 1995489534182958865, reqId:8f8a60a6-7cb6-4f59-ad6f-7c6aae896164 : Index scan timed out from [10.131.108.13:9101] …
2019-10-25T00:04:37.034+08:00 [Info] GSIC[default/optima-1566884730456027365] request(b97761b2-56c9-4c07-b47f-85d2156a6a0a) removing backfill file /opt/couchbase/var/lib/couchbase/tmp/scan-results16574128825450 …
2019-10-25T00:04:37.181+08:00 [Info] GSIC[default/optima_contact-1566884916609657008] logstats "optima_contact" {"gsi_scan_count":3353784,"gsi_scan_duration":116680731947867528,"gsi_throttle_duration":1049095740668,"gsi_prime_duration":71196097182147836,"gsi_blocked_duration":4466097771965,"gsi_totalbackfills":147701}
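
The logstats line above carries cumulative counters, so a rough per-scan average can be derived from it. The following is a minimal sketch, not an official tool: it assumes the `*_duration` fields are cumulative nanoseconds (the log does not state the unit), and the helper names `parse_logstats` and `summarize` are made up for illustration.

```python
import json
import re

# Sample logstats line copied from the log excerpt (quotes normalized to ASCII).
LOG_LINE = (
    '2019-10-25T00:04:37.181+08:00 [Info] '
    'GSIC[default/optima_contact-1566884916609657008] logstats "optima_contact" '
    '{"gsi_scan_count":3353784,"gsi_scan_duration":116680731947867528,'
    '"gsi_throttle_duration":1049095740668,"gsi_prime_duration":71196097182147836,'
    '"gsi_blocked_duration":4466097771965,"gsi_totalbackfills":147701}'
)


def parse_logstats(line):
    """Return the JSON stats object embedded in a logstats log line."""
    # Forum software may turn straight quotes into typographic ones; undo that.
    line = line.replace("\u201c", '"').replace("\u201d", '"')
    match = re.search(r"\{.*\}", line)
    if match is None:
        raise ValueError("no JSON object found in line")
    return json.loads(match.group(0))


def summarize(stats):
    """Derive per-scan averages from the cumulative counters.

    ASSUMPTION: durations are in nanoseconds; adjust the divisor if not.
    """
    scans = stats["gsi_scan_count"]
    if scans == 0:
        return {"avg_scan_seconds": 0.0, "backfill_ratio": 0.0}
    return {
        "avg_scan_seconds": stats["gsi_scan_duration"] / scans / 1e9,
        "backfill_ratio": stats["gsi_totalbackfills"] / scans,
    }


summary = summarize(parse_logstats(LOG_LINE))
print(f"average scan duration: {summary['avg_scan_seconds']:.1f} s (if nanoseconds)")
print(f"scans that used a backfill file: {summary['backfill_ratio']:.1%}")
```

Comparing these averages between an early and a late logstats line from the same run would show whether scans are slowing down over time, which matches the symptom described in the question.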

@mohamedw, the default index scan timeout is 2 minutes. Judging by the logs alone, the indexer is taking a long time to respond. Could you please provide more information: which Couchbase Server version are you using, and is it Community or Enterprise Edition? How many rows is the indexer expected to return for the queries that are failing?

Also check the CPU consumption on the machine running the indexer process (maybe node 10.131.108.13?).

Also, can you please capture a CPU profile of the indexer process (ideally while these index scan timeouts are occurring) and attach it here?

The command to get the CPU profile is:

curl -X GET -u <cluster-admin-username>:<password> http://<ip>:9102/debug/pprof/profile > cpu_profile.pprof
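
For a load test it can be handy to script the same request so the profile is captured while the timeouts are happening. Below is a minimal Python sketch of the curl command above. It assumes the endpoint behaves like Go's standard `net/http/pprof` handler, which accepts a `seconds` query parameter for the sampling window; that is not confirmed for the indexer in this thread. The function names are hypothetical.

```python
import base64
import urllib.parse
import urllib.request


def profile_url(host, seconds=30, port=9102):
    """Build the indexer CPU profile URL used by the curl command.

    ASSUMPTION: the `seconds` parameter is honored, as in Go's net/http/pprof.
    """
    query = urllib.parse.urlencode({"seconds": seconds})
    return f"http://{host}:{port}/debug/pprof/profile?{query}"


def fetch_cpu_profile(host, user, password, out_path, seconds=30):
    """Download a CPU profile with HTTP basic auth and save it to out_path."""
    request = urllib.request.Request(profile_url(host, seconds))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    # The server replies only after sampling ends, so allow extra time.
    with urllib.request.urlopen(request, timeout=seconds + 30) as response:
        data = response.read()
    with open(out_path, "wb") as out_file:
        out_file.write(data)
    return len(data)


# Example call (not executed here; needs a reachable indexer node):
# fetch_cpu_profile("<ip>", "<cluster-admin-username>", "<password>", "cpu_profile.pprof")
```

The saved file can be attached to the thread in the same way as the output of the curl command.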

Thanks.

Thanks Amit,

  • Enterprise Edition
  • 5.5.4-4338
  • Yes, it's taking time during peak load. Let me try to get the CPU profile on the next run.

Hi Amit,

Please find the CPU profile attached. I ran a simple N1QL query over REST. With more concurrent users, the indexer process shows the CPU usage below, and the average response time rises to about 15 seconds, which does not happen when only a couple of users run in parallel.

This is not the actual load, but I tried it because the team is on vacation. Can you please check this and advise what can be tuned?

top - 11:31:45 up 85 days, 8:02, 3 users, load average: 1.94, 2.05, 2.11
Tasks: 235 total, 1 running, 234 sleeping, 0 stopped, 0 zombie
%Cpu(s): 52.6 us, 2.0 sy, 0.0 ni, 44.8 id, 0.0 wa, 0.0 hi, 0.6 si, 0.0 st
KiB Mem : 32763912 total, 4732904 free, 16274596 used, 11756412 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 16112852 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6529 couchba+ 20 0 7943148 5.5g 13764 S 198.0 17.5 14866:38 indexer
4146 couchba+ 20 0 2198364 1.0g 3648 S 12.7 3.2 22269:12 beam.smp

cpuprofile.zip (65.6 KB)