When selecting in Couchbase, I want to select all document values. I understand that using select * from bk.sc.coll scans the primary index. Would it be better to create a different index, or perhaps use a KV range scan instead? If there’s a recommended approach, please let me know
If you are just retrieving all the documents without any filtering or ordering etc, you can just use SELECT * FROM ...; in 7.6 as long as you’re an admin user or you have been granted permission to use sequential scans, a primary index isn’t needed. Ref: Sequential Scans
Obviously this does mean all documents move from Data nodes to the Query node to the client; they are moved untouched if you are just passing them through unfiltered/otherwise accessed. Depending on the volume of data this may be acceptable. If not and/or overheads must be reduced, then you can use KV range scans (with documents or key-only in conjunction with GETs) in the application to remove that extra hop. Ref: KV Range Scans
Any recommendation must relate to the frequency of the operation, the chances for changes in the future (ease of maintenance) and the volume of data.
If this is something done infrequently, without particular constraint on the time it takes, I’d use the SQL statement with a sequential scan.
If it is performed a bit more frequently and key-ordering is necessary, then I’d create a primary index (this effectively pre-sorts the keys and statements then use the cached pre-sorted results), otherwise it is unnecessary overhead. (If sufficiently infrequently then the overhead of maintaining the primary index may not be worth it.)
If it is something that will be performed often or must be as highly optimised as possible, then I’d go the direct application approach - it just makes sense in such circumstances to eliminate the extra hop the data must take. You can also choose in your application to drive your data nodes to their limits at the expense of concurrency; the Query service will attempt to not do this ever.
That said, if you may want to start filtering or varying the order etc. in the future then SQL++ would once again likely make the most sense.