Query w/ limited aggregation

Hello! Is it possible to have a query where you return the total value of records AND the first n records of that query? I have the following query, which works if there are 5 or more records, but doesn’t work for less records than that:

SELECT ARRAY_AGG(gm.userId)[0:5] as users, COUNT(1) as totalUsers
FROM default gm
WHERE gm.docType="GroupMessage"
GROUP BY gm.groupId

Also, it probably fetches the whole array into memory, just to fetch 5 elements, is this correct?

Ideally, in the end, I’d like to have a query where I’d join the array of userIds with the users bucket, while keeping the information of the group chat and group messages. Do you guys have any suggestions?

Thanks!

First grouping needs to complete before apply array slice. i.e. If given group if there are 100 items ARRAY_AGG() needs to collect all items will not terminate after reaching 5.

If you need to JOIN don’t reduce slice during projection reduce during JOIN using this
users[0:LEAST(5,ARRAY_LENGTH(users))]

SELECT gc.message, ARRAY_AGG(users.firstName) as users, innerQuery.totalUsers
FROM (
  SELECT ("GroupChat::" || gm.groupId) as chats, ARRAY_AGG("User::" || gm.userId) as users, COUNT(1) as totalUsers
  FROM default gm
  WHERE gm.docType="GroupMessage"
  GROUP BY gm.groupId
) as innerQuery
JOIN default users ON KEYS innerQuery.users0:LEAST(5,ARRAY_LENGTH(innerQuery.users))]
JOIN default gc ON KEYS innerQuery.chats
WHERE users.docType="User"
GROUP BY gc, innerQuery.totalUsers;

Thanks for the fast reply! Could you elaborate on what you mean on the grouping before the slice?

I was able to come up with this query:

SELECT gc.message, ARRAY_AGG(users.firstName) as users, innerQuery.totalUsers
FROM (
  SELECT ("GroupChat::" || gm.groupId) as chats, ARRAY_AGG("User::" || gm.userId)[0:5] as users, COUNT(1) as totalUsers
  FROM default gm
  WHERE gm.docType="GroupMessage"
  GROUP BY gm.groupId
) as innerQuery
JOIN default users ON KEYS innerQuery.users
JOIN default gc ON KEYS innerQuery.chats
WHERE users.docType="User"
GROUP BY gc, innerQuery.totalUsers

which gives me what I want. The problems that persist are the following:

  • If the array has less than 5 elements, the result doesn’t come up
  • Ideally, I’d like to process as few info as possible in memory. If I could limit the number of elements in the array a priori it would be the best.

Nice, that works! But if I have 1M users in a group message, and have 10 group messages, I’ll be sending all of that info from the inner select to the outer scope, right? There’s no way to limit that right of the bat?

Try this and see if it works.

SELECT gc.message, ARRAY_AGG(users.firstName) as users, innerQuery.totalUsers
FROM (
  SELECT ("GroupChat::" || gm.groupId) as chats, allusers[0:LEAST(5,ARRAY_LENGTH(allusers)] as users, COUNT(1) as totalUsers
  FROM default gm
  WHERE gm.docType="GroupMessage"
  GROUP BY gm.groupId
  LETTING allusers = ARRAY_AGG("User::" || gm.userId)
) as innerQuery
JOIN default users ON KEYS innerQuery.users
JOIN default gc ON KEYS innerQuery.chats
WHERE users.docType="User"
GROUP BY gc, innerQuery.totalUsers;
1 Like

Yes, this one works as well, and it seems more efficient :slight_smile:

But it still does load all of the user entries into an array before limiting it. Do you reckon that performance in this query won’t suffer if the number of elements is too big? I guess that what I am asking is that if there’s a way to use a sort of a LIMIT statement just for the records we’re fetching, or maybe structure the query in a different way…

Query uses GROUP BY and aggregation and there is no way aggregation to stop when reached certain limit. cc: @geraldss

@vsr1 How about replacing the ARRAY_AGG() with a subquery containing LIMIT 5.

SELECT
    (SELECT ... LIMIT 5) AS users
FROM ...
GROUP BY ...
;

@geraldss, Query has GROUP BY all projections needs to be in GROUP BY or it needs to be aggregates.

@vsr1

I think subqueries can be projected from GROUP BY. Will check.

Can the subquery have a JOIN or WHERE clause.

I think subquery can have WHERE, not JOIN. Also input to subquery needs to be group results.