Different result set when using the USE INDEX hint

I created the following index:
CREATE INDEX namedindex ON namedbucket(
  (kvp.token),
  (kvp.expiryepoch),
  ((kvp.resources).resources),
  (distinct (array [((k.keys).base), ((k.keys).term), ((k.keys).source), ((k.keys).type)]
    for k in ((kvp.resources).resources) end))
)

With query 1 below, I am able to retrieve the record as expected, while query 2 returns no record. I also note that query 2 does return the result if the key size of that specific document is smaller (it might be due to the maximum size of the index key).

  1. select * from namedbucket
     where
     kvp.token = 'd507f6cc-dab2-41a2-bfe6-e0c84d3e0b5d' AND
     kvp.expiryepoch > 1475910798011 AND
     ANY k in kvp.resources.resources SATISFIES
     (k.keys.base = 'eur' OR k.keys.base = '') AND
     (k.keys.term = 'usd' OR k.keys.term = '') AND
     (k.keys.source = 'at' OR k.keys.source = '') AND
     (k.keys.type = 'SPOT' OR k.keys.type = '') AND
     ARRAY_CONTAINS(k.rights, 'readkvp')
     END

  2. select * from namedbucket USE INDEX (namedindex USING GSI)
     where
     kvp.token = 'd507f6cc-dab2-41a2-bfe6-e0c84d3e0b5d' AND
     kvp.expiryepoch > 1475910798011 AND
     ANY k in kvp.resources.resources SATISFIES
     (k.keys.base = 'eur' OR k.keys.base = '') AND
     (k.keys.term = 'usd' OR k.keys.term = '') AND
     (k.keys.source = 'at' OR k.keys.source = '') AND
     (k.keys.type = 'SPOT' OR k.keys.type = '') AND
     ARRAY_CONTAINS(k.rights, 'readkvp')
     END
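
For reference, prefixing either statement with EXPLAIN shows which index the planner actually picks (a sketch, with the ANY predicates abbreviated; plan output varies by version):

    EXPLAIN select * from namedbucket USE INDEX (namedindex USING GSI)
    where kvp.token = 'd507f6cc-dab2-41a2-bfe6-e0c84d3e0b5d' AND
          kvp.expiryepoch > 1475910798011 AND
          ANY k in kvp.resources.resources SATISFIES
            ARRAY_CONTAINS(k.rights, 'readkvp')
          END

Comparing the "index" field in the two plans confirms whether the hint is honored and whether the same index is chosen without it.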

Hi @wai.kwang.mak,
what is the total index key size? By default, it shouldn't exceed 10K. Look for any errors in the indexer and query logs. You have kvp.resources.resources as one of the index keys, which seems to be a large object in itself.

Try setting the indexer setting max_array_seckey_size appropriately:
curl -X POST http://localhost:9102/settings -u Administrator:asdasd -d '{"indexer.settings.max_array_seckey_size": 51200}'
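
To verify that the change took effect, the same endpoint can be read back with a GET (assuming the indexer admin port 9102 as above; substitute your own credentials):

    curl http://localhost:9102/settings -u Administrator:asdasd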

-Prasad

Hi @prasad, @siri, @deepkaran.salooja, the index lookup should fail in that case. We cannot have an index returning wrong results.

Yes - if the problem here is indeed the item being too large and being skipped (there should be a warning in the indexer log file, as Prasad points out), we will provide the option to not use such an index in an upcoming release.

Please also note that we do not advise changing max_array_seckey_size significantly without carefully evaluating its effect on memory and throughput first.

Finally, it is not always desirable to stop using an index just because it has skipped documents that did not meet the indexable criteria. So while the default behavior in the future may be to avoid such an index, the user should definitely have the option to keep using it.


Sounds good, @siri. Let's define a new index state, and we will provide an option for queries to either accept or reject indexes in that state.

@prasad I am certain that the missing record has an index key larger than 10K, but nevertheless I would think 4K is enough for the record to be skipped.
Here is an extract from the documentation at
http://developer.couchbase.com/documentation/server/4.5/n1ql/n1ql-language-reference/createindex.html

The total size of the index keys cannot exceed 4K for a single document. Index key size is calculated using the total size of all the expressions being indexed in a single document. If an index key size exceeds 4K, it will be skipped. The following error is logged to indicate that an item is skipped when building the index: "Encoded secondary key is too long" in the indexer.log file. The indexer.log file is included in cbcollect_info output.
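
To check for that message on a default Linux install (assuming the standard log location):

    grep "Encoded secondary key is too long" /opt/couchbase/var/lib/couchbase/logs/indexer.log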

@geraldss @siri
Indeed. I believe that by default the creation of such an index should fail if there are records larger than 4K. Once the index is created, inserting any record larger than 4K should also fail.

@siri @geraldss
We are facing issues with the default 10K size. How do you recommend we optimize/set the size, given the following?

This is the index in use:
CREATE INDEX `ctx2-index` ON `activity`((-`createdAt`),(distinct (array `t` for `t` within `ctx` end)))

If I run a query that hits this index, the response is the following (it is quite nondeterministic):

[
  {
    "code": 5000,
    "msg": "dial tcp 127.0.0.1:9101: connection refused from 127.0.0.1:9101 - cause: dial tcp 127.0.0.1:9101: connection refused from 127.0.0.1:9101",
    "query_from_user": "SELECT RAW META().id FROM `activity` USE INDEX (`ctx2-index`)\nWHERE -createdAt IS NOT NULL AND ANY t WITHIN `ctx` SATISFIES t LIKE 'searchTerm' END \nLIMIT 100 OFFSET 0"
  }
]

Indexer Log

panic: runtime error: slice bounds out of range
goroutine 161 [running]:
panic(0xf293a0, 0xc820016050)
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.6/go/src/runtime/panic.go:464 +0x3e6 fp=0xc8200e1990 sp=0xc8200e1910
runtime.panicslice()
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.6/go/src/runtime/panic.go:21 +0x49 fp=0xc8200e19b8 sp=0xc8200e1990
github.com/couchbase/indexing/secondary/collatejson.(*Codec).code2json(0xc8200bb6a0, 0xc824916018, 0x2fcf, 0x101a, 0xc8248e2000, 0x13, 0x4000, 0x0, 0x0, 0x0, ...)
	/home/couchbase/jenkins/workspace/watson-unix/goproj/src/github.com/couchbase/indexing/secondary/collatejson/collate.go:361 +0x159a fp=0xc8200e1b80 sp=0xc8200e19b8
github.com/couchbase/indexing/secondary/collatejson.(*Codec).Decode(0xc8200bb6a0, 0xc824916004, 0x2fe3, 0x3ffc, 0xc8248e2000, 0x0, 0x4000, 0x0, 0x0, 0x0, ...)
	/home/couchbase/jenkins/workspace/watson-unix/goproj/src/github.com/couchbase/indexing/secondary/collatejson/collate.go:169 +0xb7 fp=0xc8200e1c00 sp=0xc8200e1b80
github.com/couchbase/indexing/secondary/indexer.secondaryIndexEntry.ReadSecKey(0xc824916004, 0x2fec, 0x3ffc, 0xc8248e2000, 0x0, 0x4000, 0x0, 0x0, 0x0, 0x0, ...)
	/home/couchbase/jenkins/workspace/watson-unix/goproj/src/github.com/couchbase/indexing/secondary/indexer/index_entry.go:217 +0x1fc fp=0xc8200e1cd8 sp=0xc8200e1c00
github.com/couchbase/indexing/secondary/indexer.siSplitEntry(0xc824916004, 0x2fec, 0x3ffc, 0xc8248e2000, 0x0, 0x4000, 0x0, 0x0, 0x0, 0x0, ...)
	/home/couchbase/jenkins/workspace/watson-unix/goproj/src/github.com/couchbase/indexing/secondary/indexer/scan_pipeline.go:238 +0xbd fp=0xc8200e1da8 sp=0xc8200e1cd8
github.com/couchbase/indexing/secondary/indexer.(*IndexScanDecoder).Routine(0xc82015c870, 0x0, 0x0)
	/home/couchbase/jenkins/workspace/watson-unix/goproj/src/github.com/couchbase/indexing/secondary/indexer/scan_pipeline.go:161 +0x466 fp=0xc8200e1ef8 sp=0xc8200e1da8
github.com/couchbase/indexing/secondary/pipeline.(*Pipeline).runIt.func1(0xc8232c5d20, 0xc8237f1580)
	/home/couchbase/jenkins/workspace/watson-unix/goproj/src/github.com/couchbase/indexing/secondary/pipeline/pipeline.go:75 +0x5d fp=0xc8200e1f90 sp=0xc8200e1ef8
runtime.goexit()
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.6/go/src/runtime/asm_amd64.s:1998 +0x1 fp=0xc8200e1f98 sp=0xc8200e1f90
created by github.com/couchbase/indexing/secondary/pipeline.(*Pipeline).runIt
	/home/couchbase/jenkins/workspace/watson-unix/goproj/src/github.com/couchbase/indexing/secondary/pipeline/pipeline.go:80 +0x62
goroutine 1 [select]:
runtime.gopark(0x1322e00, 0xc826409228, 0x10bbef8, 0x6, 0x18, 0x2)
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.6/go/src/runtime/proc.go:262 +0x163 fp=0xc826408fc0 sp=0xc826408f98
runtime.selectgoImpl(0xc826409228, 0x0, 0x18)
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.6/go/src/runtime/select.go:392 +0xa67 fp=0xc826409180 sp=0xc826408fc0
runtime.selectgo(0xc826409228)
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.6/go/src/runtime/select.go:215 +0x12 fp=0xc8264091a0 sp=0xc826409180
github.com/couchbase/indexing/secondary/indexer.(*indexer).run(0xc8201078c0)
	/home/couchbase/jenkins/workspace/watson-unix/goproj/src/github.com/couchbase/indexing/secondary/indexer/indexer.go:479 +0x292 fp=0xc8264092f8 sp=0xc8264091a0
github.com/couchbase/indexing/secondary/indexer.NewIndexer(0xc820114930, 0x0, 0x0, 0x0, 0x0)
	/home/couchbase/jenkins/workspace/watson-unix/goproj/src/github.com/couchbase/indexing/secondary/indexer/indexer.go:365 +0x3c0e fp=0xc826409c20 sp=0xc8264092f8
main.main()
	/home/couchbase/jenkins/workspace/watson-unix/goproj/src/github.com/couchbase/indexing/secondary/cmd/indexer/main.go:153 +0x15b5 fp=0xc826409f40 sp=0xc826409c20
runtime.main()
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.6/go/src/runtime/proc.go:188 +0x2b0 fp=0xc826409f90 sp=0xc826409f40
runtime.goexit()
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.6/go/src/runtime/asm_amd64.s:1998 +0x1 fp=0xc826409f98 sp=0xc826409f90
goroutine 17 [syscall, locked to thread]:

These are examples of the indexed docs.
ctx is a dynamic object. It needs to be indexed and searchable by its values; property names can be ignored.

{
    "ctx" : {
        "randomKey" : "randomValue",
        "randomKey2" : {
            "innerKey" : "value",
            "arrayKey" : ["A","B"]
        }
    }
},
{
    "ctx" : {
        "nexKey" : "val",
        "num" : 123
    }
}
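
Note that t WITHIN ctx enumerates every nested value, including intermediate objects and arrays, so the array index key grows with the whole ctx subtree. A quick way to see exactly which values get indexed per document (a sketch reusing the index's array comprehension):

    SELECT ARRAY t FOR t WITHIN `ctx` END AS indexed_values
    FROM `activity`
    LIMIT 5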

I've changed max_array_seckey_size to 20K, but some queries are still failing. (Using Couchbase 4.5 Community, the latest stable.)

What do you recommend in this case? How should max_array_seckey_size be calculated? The bucket contains ~2M documents. Only the values need to be searchable; keys can be ignored.

In 4.6 we have the TOKENS() function for this:
https://dzone.com/articles/more-than-like-efficient-json-search-with-couchbas
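
For example, the 4.6 pattern from that article looks roughly like this (a sketch; the index name is illustrative):

    CREATE INDEX `ctx-tokens` ON `activity`(DISTINCT ARRAY t FOR t IN TOKENS(`ctx`) END)

    SELECT RAW META().id FROM `activity`
    WHERE ANY t IN TOKENS(`ctx`) SATISFIES t = 'searchTerm' END
    LIMIT 100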

@vsr1 Thanks, I am aware of this function; unfortunately we are not able to use it because it is not available in CB 4.5.

Regarding the TOKENS() function: is max_array_seckey_size also important for it, or am I mistaken? And when is CB 4.6 Community supposed to be released?

Thanks

If documents are big, max_array_seckey_size is important. @don might be able to answer about CB 4.6 CE.
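
If you want a rough estimate per document, and assuming ENCODED_SIZE() is available in your build, something like the following reports the JSON-encoded size of the indexed field (a sketch; the indexer's internal key encoding differs, so treat the number only as an approximation):

    SELECT MAX(ENCODED_SIZE(`ctx`)) AS max_ctx_bytes FROM `activity`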

Even after increasing max_array_seckey_size to 40K (which is 4x the default value), I still see indexer errors after the query is submitted. The strange thing is that the index creation was successful (the indexer log was even clear of warnings/errors).

CREATE INDEX `ctx2-index` ON `activity`((-`createdAt`),(distinct (array `t` for `t` within `ctx` end)))

Query

SELECT * FROM `activity`  
WHERE -createdAt IS NOT NULL AND ANY t WITHIN `ctx` SATISFIES t LIKE 'searchTerm' END 
ORDER BY -createdAt
LIMIT 100 OFFSET 0

Is there any way how to resolve this issue?

This is the indexer config:

{
   "indexer.settings.bufferPoolBlockSize":16384,
   "indexer.settings.compaction.abort_exceed_interval":false,
   "indexer.settings.compaction.check_period":30,
   "indexer.settings.compaction.compaction_mode":"circular",
   "indexer.settings.compaction.days_of_week":"sunday,monday,tuesday,wednesday,thursday,friday,saturday",
   "indexer.settings.compaction.interval":"00:00,00:00",
   "indexer.settings.compaction.min_frag":30,
   "indexer.settings.compaction.min_size":524288000,
   "indexer.settings.cpuProfFname":"",
   "indexer.settings.cpuProfile":false,
   "indexer.settings.fast_flush_mode":true,
   "indexer.settings.gc_percent":100,
   "indexer.settings.inmemory_snapshot.fdb.interval":200,
   "indexer.settings.inmemory_snapshot.interval":200,
   "indexer.settings.inmemory_snapshot.moi.interval":20,
   "indexer.settings.largeSnapshotThreshold":200,
   "indexer.settings.log_level":"info",
   "indexer.settings.maxVbQueueLength":0,
   "indexer.settings.max_array_seckey_size":40960,
   "indexer.settings.max_cpu_percent":0,
   "indexer.settings.max_writer_lock_prob":20,
   "indexer.settings.memProfFname":"",
   "indexer.settings.memProfile":false,
   "indexer.settings.memory_quota":1073741824,
   "indexer.settings.minVbQueueLength":250,
   "indexer.settings.moi.debug":false,
   "indexer.settings.moi.persistence_threads":8,
   "indexer.settings.moi.recovery_threads":4,
   "indexer.settings.persisted_snapshot.fdb.interval":5000,
   "indexer.settings.persisted_snapshot.interval":5000,
   "indexer.settings.persisted_snapshot.moi.interval":600000,
   "indexer.settings.persisted_snapshot_init_build.fdb.interval":5000,
   "indexer.settings.persisted_snapshot_init_build.interval":5000,
   "indexer.settings.persisted_snapshot_init_build.moi.interval":600000,
   "indexer.settings.recovery.max_rollbacks":5,
   "indexer.settings.scan_getseqnos_retries":30,
   "indexer.settings.scan_timeout":120000,
   "indexer.settings.send_buffer_size":1024,
   "indexer.settings.sliceBufSize":50000,
   "indexer.settings.smallSnapshotThreshold":30,
   "indexer.settings.statsLogDumpInterval":60,
   "indexer.settings.storage_mode":"forestdb",
   "indexer.settings.wal_size":4096,
   "projector.settings.log_level":"warn",
   "queryport.client.settings.poolOverflow":30,
   "queryport.client.settings.poolSize":1000
}

@petojurkovic What is the error you see in the indexer logs when you issue the query? Is it the same as the stack trace that appeared earlier in this thread, in collatejson.(*Codec).code2json?