How to model tags

konstantin_dev · September 13, 2022, 6:17pm

OK, wow. This changed everything, in every dimension.
The data must be stored in an array, not a map, but once it is, bleve can use it directly.

previous:

Disk size of the index: > 1 TB
Indexing time: > 92 h
Query time: ~1-60 s, depending on the query
Query aggregates (e.g. count by tag): > 15 min

Specs from the small test dataset:

22 million rows
1,500+ tags
~20 tags per row on average
Data size: ~50 GB

now:

Disk size of the complete index: dropped to about 1% of the previous size, 15 GB
Indexing time: < 1.5 h
Query time: 0.1-10 s
Query aggregations: supported out of the box via facets, < 10 s

Wow. I’m impressed by the performance and feature set of bleve and Couchbase.
Huge shoutout to you guys. I’ll write a blog post describing everything in a little more detail.

TL;DR

If you want to search for tags within millions of documents, I highly recommend storing them in a flat array and using Couchbase FTS (bleve) with the keyword analyzer.

{"data":"","tags":["tag_1","tag_2"]}

-> Search -> Quick Index ->  "Index this field as an identifier"

SELECT * FROM data._default.data WHERE SEARCH(`app`, {
  "query": {"analyzer": "keyword", "field": "tags", "match": "<tag>"},
  "explain": false,
  "score": "none",
  "size": 10,
  "sort": ["_id"],
  "fields": ["*"]
});
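
For anyone who still has tags stored as a map, here is a minimal Python sketch of the conversion to the flat-array shape shown above. The map layout (`{"tags": {"tag_1": true, ...}}`) and the helper name `tags_map_to_array` are assumptions for illustration only; the original map structure is not shown in this thread, so adjust the filter to your own data.

```python
from typing import Any


def tags_map_to_array(doc: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of doc whose "tags" field is a flat, sorted list of tag names.

    Assumption (hypothetical layout): a map-style document stores tags as
    {"tag_name": <truthy or falsy value>}, and only truthy entries count.
    """
    tags = doc.get("tags")
    if isinstance(tags, dict):
        # Map style: keep the names of all entries with a truthy value.
        flat = sorted(name for name, enabled in tags.items() if enabled)
    elif isinstance(tags, list):
        # Already an array: just de-duplicate and sort.
        flat = sorted(set(tags))
    else:
        # Missing or unexpected value: store an empty tag list.
        flat = []
    # Copy every other field unchanged and replace only "tags".
    return {**doc, "tags": flat}
```

Calling `tags_map_to_array({"data": "", "tags": {"tag_1": True, "tag_2": True}})` gives a document in the `{"data":"","tags":["tag_1","tag_2"]}` shape, which can then be indexed with the keyword analyzer as described above.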
