How to model tags

Ok wow. This changed everything - in every dimension.
The data must be stored in an array, not a map, but after that it can be used directly by bleve.
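For anyone facing the same migration, here is a minimal sketch of the map-to-array conversion. The shape of the old map is an assumption on my side (tag name as key, a truthy flag as value), and `tags_map_to_array` is a hypothetical helper name, not part of any Couchbase or bleve API.

```python
import json

def tags_map_to_array(doc):
    """Return a copy of doc whose "tags" field is a flat, sorted list of tag names."""
    tags = doc.get("tags", {})
    if isinstance(tags, dict):
        # Assumed old shape: {"tag_1": true, "tag_2": true}; keep only enabled tags.
        flat = sorted(name for name, enabled in tags.items() if enabled)
    else:
        # Already a list: just de-duplicate and sort.
        flat = sorted(set(tags))
    # Build a new dict so the input document is left untouched.
    return {**doc, "tags": flat}

old_doc = {"data": "", "tags": {"tag_2": True, "tag_1": True}}
new_doc = tags_map_to_array(old_doc)
print(json.dumps(new_doc, separators=(",", ":")))
# prints {"data":"","tags":["tag_1","tag_2"]}
```

Sorting is optional; it only makes the stored documents deterministic.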

previous:

  • Disk size of the index: > 1 TB
  • Indexing time: > 92 h
  • Query time: ~1-60 s, depending on the query
  • Query aggregates (e.g. count by tag): > 15 min

Specs from the small test dataset:

  • 22 million rows
  • 1,500+ tags
  • ~20 tags per row on average
  • data size: ~50 GB

now:

  • Disk size of the complete index dropped to about 1% of the previous size: 15 GB
  • Indexing time: < 1.5 h
  • Query time: 0.1-10 s
  • Query aggregations: supported out of the box via facets, < 10 s
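To illustrate the facet-based aggregation, here is a rough sketch of an FTS request body that counts documents per tag. It only builds the JSON; sending it to the search endpoint is left out. The facet name `tag_counts` and the helper `tag_count_request` are hypothetical, and it assumes `tags` is indexed with the keyword analyzer.

```python
import json

def tag_count_request(tag, top_n=10):
    """Build an FTS request body that returns per-tag counts for documents matching `tag`."""
    return {
        # Same match query as for a normal tag search.
        "query": {"field": "tags", "match": tag, "analyzer": "keyword"},
        # No hits needed, only the facet result.
        "size": 0,
        # One terms facet over the tags field; "size" limits how many tags are returned.
        "facets": {"tag_counts": {"field": "tags", "size": top_n}},
    }

body = tag_count_request("tag_1", top_n=5)
print(json.dumps(body, indent=2))
```

With `"size": 0` the response carries just the facet counts, which is usually all you want for a "count by tags" view.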

Wow. I'm impressed by the performance and feature set of bleve and Couchbase.
Huge shoutout to you guys. I'll write a blog post describing everything a little more clearly.

TL;DR

If you want to search for tags within millions of documents, I highly recommend storing them in a flat array and using Couchbase FTS (bleve) with the keyword analyzer.

{"data":"","tags":["tag_1","tag_2"]}
-> Search -> Quick Index ->  "Index this field as an identifier"
SELECT * FROM data._default.data WHERE SEARCH(`app`,{
"query":{ "analyzer":"keyword","field":"tags","match":"<tag>"},
"explain":"false",
"score":"false",
"size":10,
"sort":"_id",
"fields":["*"]
});
