@varun.velamuri - what settings would we like to change here? I don’t see this one: numSliceWriters.
Also, if you suspect IO is a potential bottleneck, how can we confirm that from the indexer logs, and is there any way to change settings for it?
We don’t have to change this setting, as max_cpu_percent is set to ‘0’. This means the indexer can use all available cores, which is 48 in this case.
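For reference, here is a minimal sketch of how one might confirm that value on an index node, assuming the indexer admin REST endpoint on port 9102 and placeholder hostname/credentials (these are illustrative, not taken from your cluster):

```bash
# Hypothetical check of the current indexer settings on one index node.
# Host, user and password are placeholders; 9102 is the indexer admin port.
curl -s -u Administrator:password http://tccbldc-cluster-0000:9102/settings \
  | python3 -m json.tool | grep max_cpu_percent
```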
Any other meaningful info you see from the indexer logs? I see a bunch of errors here; do they make any difference?
2020-01-01T22:35:56.397+00:00 [Info] KVSender::sendShutdownTopic Projector tccbldc-cluster-0000.tccbldc-cluster.bi-cb.svc:9999 Topic MAINT_STREAM_TOPIC_19b6ddb4f3a578506eb29278edca6951
2020-01-01T22:35:56.401+00:00 [Error] KVSender::sendShutdownTopic Unexpected Error During Shutdown Projector tccbldc-cluster-0000.tccbldc-cluster.bi-cb.svc:9999 Topic MAINT_STREAM_TOPIC_19b6ddb4f3a578506eb29278edca6951. Err projector.topicMissing
2020-01-01T22:35:56.401+00:00 [Error] KVSender::closeMutationStream MAINT_STREAM Error Received projector.topicMissing from tccbldc-cluster-0000.tccbldc-cluster.bi-cb.svc:9999
2020-01-01T22:35:56.401+00:00 [Info] KVSender::closeMutationStream MAINT_STREAM Treating projector.topicMissing As Success
2020-01-01T22:35:56.401+00:00 [Info] KVSender::sendShutdownTopic Projector tccbldc-cluster-0008.tccbldc-cluster.bi-cb.svc:9999 Topic MAINT_STREAM_TOPIC_19b6ddb4f3a578506eb29278edca6951
2020-01-01T22:35:56.404+00:00 [Error] KVSender::sendShutdownTopic Unexpected Error During Shutdown Projector tccbldc-cluster-0008.tccbldc-cluster.bi-cb.svc:9999 Topic MAINT_STREAM_TOPIC_19b6ddb4f3a578506eb29278edca6951. Err projector.topicMissing
2020-01-01T22:35:56.404+00:00 [Error] KVSender::closeMutationStream MAINT_STREAM Error Received projector.topicMissing from tccbldc-cluster-0008.tccbldc-cluster.bi-cb.svc:9999
2020-01-01T22:35:56.404+00:00 [Info] KVSender::closeMutationStream MAINT_STREAM Treating projector.topicMissing As Success
2020-01-01T22:35:56.404+00:00 [Info] KVSender::sendShutdownTopic Projector tccbldc-cluster-0009.tccbldc-cluster.bi-cb.svc:9999 Topic
And here:
2020-01-02T02:01:19.659+00:00 [Error] PeerPipe.doRecieve() : ecounter error when received mesasage from Peer 10.42.5.236:9100. Error = read tcp 10.42.4.225:40144->10.42.5.236:9100: use of closed network connection. Kill Pipe.
2020-01-02T02:01:19.659+00:00 [Error] PeerPipe.doRecieve() : ecounter error when received mesasage from Peer 10.42.4.225:9100. Error = read tcp 10.42.4.225:56554->10.42.4.225:9100: use of closed network connection. Kill Pipe.
2020-01-02T02:01:19.659+00:00 [Error] PeerPipe.doRecieve() : ecounter error when received mesasage from Peer 10.42.4.225:56554. Error = EOF. Kill Pipe.
2020-01-02T02:01:19.659+00:00 [Error] PeerPipe.doRecieve() : ecounter error when received mesasage from Peer 10.42.7.88:9100. Error = read tcp 10.42.4.225:33552->10.42.7.88:9100: use of closed network connection. Kill Pipe.
2020-01-01T22:35:56.449+00:00 [Error] ServiceMgr::getWithAuth Error setting auth Unable to find given hostport in cbauth database: `127.0.0.1:9102'
2020-01-02T02:01:19.207+00:00 [Warn] CommandListener: Failed to process create index token. Skip /indexing/ddl/commandToken/create/11025158931531399033/0. Internal Error = invalid character ' ' in literal false (expecting 'e').
2020-01-02T04:24:58.904+00:00 [Error] receiving packet: read tcp 10.42.4.225:9103->10.42.4.224:48508: use of closed network connection
2020-01-02T04:24:58.904+00:00 [Error] receiving packet: read tcp 10.42.4.225:9103->10.42.3.114:60276: use of closed network connection
2020-01-02T04:24:58.904+00:00 [Error] DATP[->dataport ":9103"] worker "10.42.4.224:48508" exit: read tcp 10.42.4.225:9103->10.42.4.224:48508: use of closed network connection
2020-01-02T04:24:58.904+00:00 [Error] receiving packet: read tcp 10.42.4.225:9103->10.42.5.230:57952: use of closed network connection
2020-01-02T04:24:58.904+00:00 [Error] receiving packet: read tcp 10.42.4.225:9103->10.42.6.245:50346: use of closed network connection
2020-01-02T04:24:58.904+00:00 [Error] DATP[->dataport ":9103"] worker "10.42.5.230:57952" exit: read tcp 10.42.4.225:9103->10.42.5.230:57952: use of closed network connection
2020-01-02T04:24:58.904+00:00 [Error] DATP[->dataport ":9103"] worker "10.42.3.114:60276" exit: read tcp 10.42.4.225:9103->10.42.3.114:60276: use of closed network connection
2020-01-02T04:24:58.904+00:00 [Error] DATP[->dataport ":9103"] worker "10.42.6.245:50346" exit: read tcp 10.42.4.225:9103->10.42.6.245:50346: use of closed network connection
2020-01-02T04:24:58.910+00:00 [Info] PlasmaSlice Slice Id 0, IndexInstId 17656313694005354126, PartitionId 6 Created recovery point (took 296.792452ms)
2020-01-02T04:24:58.939+00:00 [Info] KVSender::sendDelBucketsRequest Projector tccbldc-cluster-0000.tccbldc-cluster.bi-cb.svc:9999 Topic INIT_STREAM_TOPIC_19b6ddb4f3a578506eb29278edca6951 Buckets [ALL_RATES]
2020-01-02T04:24:58.955+00:00 [Info] Plasma: Adaptive memory quota tuning RSS:40009109504, freePercent:15.204558071964332, currentQuota=48318382080
2020-01-02T04:24:58.971+00:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector tccbldc-cluster-0000.tccbldc-cluster.bi-cb.svc:9999 Topic INIT_STREAM_TOPIC_19b6ddb4f3a578506eb29278edca6951 Buckets [ALL_RATES]. Err genServer.closed
2020-01-02T04:24:58.971+00:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM ALL_RATES Error Received genServer.closed from tccbldc-cluster-0000.tccbldc-cluster.bi-cb.svc:9999
2020-01-02T04:24:58.971+00:00 [Info] KVSender::deleteBucketsFromStream INIT_STREAM ALL_RATES Treating genServer.closed As Success
2020-01-02T04:24:58.971+00:00 [Info] KVSender::sendDelBucketsRequest Projector tccbldc-cluster-0008.tccbldc-cluster.bi-cb.svc:9999 Topic INIT_STREAM_TOPIC_19b6ddb4f3a578506eb29278edca6951 Buckets [ALL_RATES]
2020-01-02T04:24:58.995+00:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector tccbldc-cluster-0008.tccbldc-cluster.bi-cb.svc:9999 Topic INIT_STREAM_TOPIC_19b6ddb4f3a578506eb29278edca6951 Buckets [ALL_RATES]. Err genServer.closed
2020-01-02T04:24:58.995+00:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM ALL_RATES Error Received genServer.closed from
These errors are not a problem. A build was just finished and a clean-up was done.
Okay, thanks, but I need your and @deepkaran.salooja’s help to nail this down, as it is a major bottleneck for us to move forward.
Appreciate your help.
Thanks for creating the ticket and sharing the cbcollect logs. The results from our perf cluster show that, for 50M docs, a non-partitioned index with 11 keys took 640 sec to index the docs, while a partitioned index took 220 sec.
I will try to compare both sets of logs and identify the bottleneck. I will update my analysis on the ticket.
Thank you for sharing the details. I appreciate it and am looking forward to the update on the bottleneck.
@varun.velamuri - Any update on the raised ticket, please?
@eldorado, I am looking at the ticket. I will update you shortly.
Thanks,
Varun
Thanks, but I don’t see any update on the ticket yet except its assignment to other people.
I have updated the ticket now. Looking at the symptoms, we think that the network might be the bottleneck in this case. My complete analysis is on the ticket.
Can you try the following to confirm our hypothesis:
a. Create a non-partitioned index. Measure the network utilisation on all KV nodes while the index build is happening. Let’s say that the build time is t seconds (from your experiment before, it is around 884 seconds; we need to try this index build again so that we can also measure the network utilisation).
b. Build a partitioned index with num_partition as 2. Measure the network utilisation during the index build. If the network is a bottleneck, this index build should take approximately 2 * t seconds, and network utilisation should look saturated during this time (see the sampling sketch below).
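One simple way to sample network utilisation while each build runs is to watch the per-interface counters on every KV and index node; a rough sketch, assuming the sysstat package is installed and eth0 is the relevant interface (both are assumptions):

```bash
# Print per-interface throughput (rx/tx kB/s) once per second during the build.
# Requires the sysstat package; replace eth0 with the actual interface name.
sar -n DEV 1 | grep -E --line-buffered 'IFACE|eth0'
```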
Thanks,
Varun
@eldorado, You also need to change your index definition. For a partitioned index, the partition key has to be immutable (Index Partitioning | Couchbase Docs). Please change the index definition to partition by hash(meta().id) or some other immutable field (for index ALL_RATES_DXPPART_INDEX). This should also show some improvement in partitioned index build performance (though it may not reach the non-partitioned index build time until the network is no longer a bottleneck).
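To illustrate the suggested change, a hedged sketch of such a definition issued through the query service REST API (the bucket name ALL_RATES comes from your logs, but the index keys, host and credentials below are placeholders, since the actual definition is not shown here):

```bash
# Hypothetical re-creation of the index, partitioned on the immutable meta().id.
# field1/field2 stand in for the real index keys; adjust num_partition as needed.
curl -s -u Administrator:password http://localhost:8093/query/service \
  --data-urlencode 'statement=CREATE INDEX ALL_RATES_DXPPART_INDEX
      ON ALL_RATES(field1, field2)
      PARTITION BY HASH(META().id)
      WITH {"num_partition": 2}'
```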
@varun.velamuri - Thanks for your report, and sorry for the delay in responding. Help me understand how I should trace the network utilisation. I don’t quite get how the network could be a bottleneck, since the data nodes and index nodes all reside in the same K8s CB cluster, and the indexer has free access to the data belonging to its own node. In this case the cluster is 3 nodes, so I understand that the data could be on one node and the index on another, where cross-node access between data and index could run into network exchange issues. But practically, if I run one node with both data and indexer, it shouldn’t be a problem at all, because the data and index belong to a single IP address. Or am I missing something? In our test with one node running all services, the index took just as long.
Also, to your point, creating the index with partition by hash(meta().id) has been tested, and the performance is not great either: it took ~30 minutes to build the index on 184M documents. So this approach is certainly not working unless we figure out what the exact issues are.
thanks
@eldorado, With data and indexer on the same node, was the index 100% resident in memory? If the index is not 100% resident in memory, then we cannot expect the build time to equal the case where it is. Also, with one node running all services, we need not use a partitioned index, because all partitions reside on the same node.
The index with hash(meta().id) is the correct way to define it, as the PARTITION BY clause requires immutable fields. Also, when the index is built with hash(meta().id), it transmits less data over the network compared to partition by hash(11 fields), so the index build would be faster if the network is a bottleneck.
For tracing the network utilisation, I am not an expert in this area, but you can start with the iperf command-line utility to measure the node-to-node bandwidth. This will give us the max rate at which data can be transferred between the nodes. In our perf cluster, the node-to-node bandwidth is set up at 10Gbps. We want to understand this value in your case.
Additionally, you can capture the output of “cat /proc/net/dev” before and after the index build. This will give the amount of data transmitted during the index build phase. That value, coupled with the network bandwidth, can tell us whether the network is a bottleneck or not (a rough sketch follows below). I am not sure if there is an easier way to do this; I will let you know if I find one.
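A rough sketch of both measurements, assuming iperf3 (or classic iperf) is installed on two of the nodes and eth0 is the interface of interest (both assumptions):

```bash
# 1) Node-to-node bandwidth: run a server on one node and a client on another.
iperf3 -s                      # on node A
iperf3 -c <node-A-ip> -t 30    # on node B: 30-second throughput test

# 2) Data transferred during the build: snapshot the counters before and after.
cat /proc/net/dev > netdev_before.txt   # just before starting the index build
cat /proc/net/dev > netdev_after.txt    # right after the build completes
# The difference in the eth0 byte counters between the two files is the data
# sent/received during the build; divide by the build time for average throughput.
diff netdev_before.txt netdev_after.txt
```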
Thanks,
Varun
The index we have is SGI (standard GSI) and not MOI, so it is definitely not 100% resident in memory. Are you saying that if we use MOI instead of SGI, we can potentially get the index created faster?
hash(meta().id) is what we started with, and what you said makes perfect sense. But performance-wise I don’t see a huge gain; it’s similar, > 20 minutes.
I will certainly check iperf and am also seeking advice from our network team.
cat /proc/net/dev changes every second… I can certainly capture a snapshot just before the build and another after the build, but given that it changes every single second, does it really help you to see the packets / drops / errors there?
I don’t see any errors there so far.
@varun.velamuri - I figured out the problem of slowness (maybe not completely): after changing the indexerThreads setting to 1024, it sped up the SGI creation process by 2x. But I need some details: how do other parameters like memorySnapshotInterval and stableSnapshotInterval affect index creation time?
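For context, those GSI settings can usually be read and changed through the cluster manager REST API; a hedged sketch with placeholder host, credentials and values (the exact endpoint and accepted parameters may vary by Couchbase Server version):

```bash
# Read the current cluster-wide GSI settings (indexerThreads, snapshot intervals, ...).
curl -s -u Administrator:password http://localhost:8091/settings/indexes

# Hypothetical change: set indexerThreads and the snapshot intervals
# (the values shown are placeholders, not recommendations).
curl -s -u Administrator:password -X POST http://localhost:8091/settings/indexes \
  -d indexerThreads=1024 \
  -d memorySnapshotInterval=200 \
  -d stableSnapshotInterval=5000
```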
Also, I have noted some information below from while the index creation was in progress.
Can you help me understand why this blocking happened, and whether it is an alert and a possible cause of the slowdown?
“mem_throttled”: true,
2020-01-09T03:46:44.504+00:00 [Info] DATP[->dataport ":9103"] DATP -> Indexer 0.220047% blocked
2020-01-09T03:46:44.505+00:00 [Info] DATP[->dataport ":9103"] DATP -> Indexer 0.220208% blocked
2020-01-09T03:46:44.505+00:00 [Info] DATP[->dataport ":9103"] DATP -> Indexer 0.219352% blocked
2020-01-09T03:46:45.583+00:00 [Info] DATP[->dataport ":9103"] DATP -> Indexer 0.375356% blocked
2020-01-09T03:46:47.267+00:00 [Info] DATP[->dataport ":9103"] DATP -> Indexer 0.587540% blocked
2020-01-09T03:46:48.774+00:00 [Info] DATP[->dataport ":9103"] DATP -> Indexer 0.698418% blocked
2020-01-09T03:46:49.918+00:00 [Info] ALL_RATES/ALL_RATES_PART_INDEX/Mainstore#16274952622666069320:2 Plasma: Warning: not enough memory to hold records in memory. Stats:{
2020-01-09T04:15:33.935+00:00 [Error] PeerPipe.doRecieve() : ecounter error when received mesasage from Peer 10.42.3.119:41418. Error = EOF. Kill Pipe.
2020-01-09T04:15:33.935+00:00 [Error] PeerPipe.doRecieve() : ecounter error when received mesasage from Peer 10.42.5.11:9100. Error = read tcp 10.42.3.119:54056->10.42.5.11:9100: use of closed network connection. Kill Pipe.
2020-01-09T04:15:33.935+00:00 [Error] PeerPipe.doRecieve() : ecounter error when received mesasage from Peer 10.42.4.181:9100. Error = read tcp 10.42.3.119:56246->10.42.4.181:9100: use of closed network connection. Kill Pipe.
2020-01-09T04:15:33.935+00:00 [Error] PeerPipe.doRecieve() : ecounter error when received mesasage from Peer 10.42.3.119:9100. Error = read tcp 10.42.3.119:41418->10.42.3.119:9100: use of closed network connection. Kill Pipe.
2020-01-09T04:15:34.057+00:00 [Error] PeerPipe.doRecieve() : ecounter error when received mesasage from Peer 10.30.183.208:42236. Error = EOF. Kill Pipe.
I am thinking this could cause a major issue later on, because communication was lost from the indexer node to the other data nodes.
@eldorado, Using MOI will improve index build performance but it might not be an option for you as your index might not fit into memory.
Previously, partition by hash(11 fields) took 42 minutes. Are you saying that partition by hash(meta().id) is taking 20 minutes for the same index?
What I am more interested in about the network is: what is the bandwidth available between nodes, and how much of it is being utilised while the index is being built?
Thanks,
Varun
Yes, after changing indexerThreads to 1024, which is the max limit, I see considerable performance improvement and the indexer node CPU is working hard now. It took ~20 mins to build the index with hash(meta().id), which was taking ~40 minutes before this settings change. So I am wondering what other hidden settings can make the indexer work harder.
I know you asked for the network bandwidth usage, which I will try to figure out, but in the meantime can you look at the errors logged while the index build was in progress and tell me whether they can be ignored or whether they are the cause of the bottleneck?
I also see “mem_throttled”: true printed somewhere, so could that indicate throttling in memory or CPU, rather than the network?
The DATP blocked messages mean that mutations are being pushed but the indexer channels are completely full. So, the indexer does not receive new mutations on the dataport until the channels become free.
What is the index build time without changing the indexer threads but with the changed hash key? I am asking because you made two changes: (a) changed the partition key, and (b) increased the number of threads.
We don’t know which change is causing the perf improvement. Can you try changing only the hash key, without changing the indexer threads? I expect this to also be around ~20 min. Also, what indexer CPU percentage do you see after changing to 1024 threads? Does it saturate at 4800%?
mem_throttled: true means that the storage is throttled because there is not enough memory available. You can increase the memory quota of the indexer node.
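A minimal sketch of bumping that quota through the cluster REST API (host, credentials and the quota value are placeholders; the value is in MB):

```bash
# Hypothetical: raise the Index service memory quota to ~60 GB (value in MB).
curl -s -u Administrator:password -X POST http://localhost:8091/pools/default \
  -d indexMemoryQuota=61440
```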