How can I JOIN documents with multiple reference in other documents and apply aggregate function in property with array of objects

Hi Couchbase Gurus,
I need some help with my N1QL Queries. I had a hard time solving this one.
So, my question are:

I. How can I JOIN documents with multiple reference in other documents ?
II. How can I JOIN or MERGED array of objects in the location and territories properties
calculate “totalSales” with the same “entityId” ?
III. And, to be unique object in each property?

I have 2 types of document product and product-sales-history:
First, the product documents with id as unique identifier.

{
  "id": "company::product::0717",
  "productCode": "98721",
  "name": "MAGNESIUM PLUS VITAMIN D3 180 CAPLETS",
  "price": 12.45,
  "type": "product"
}

Second, the product-sales-history with id as unique identifier.

{
  "id": "company::product-sales-history::00001",
  "product": "company::product::0717",
  "period": "2020-04-08",
  "type": "product-sales-history",
  "location": [
    {
      "entityId": "company::location::0002",
      "averageSales": 113.72,
      "totalSales": 1300
    },
    {
      "entityId": "company::location::0004",	
      "averageSales": 233.72,
      "totalSales": 300
    }
  ],
  "territories": [
    {
      "entityId": "company::territories::0004",	
      "averageSales": 4113.72,
      "totalSales": 7300
    },
    {
      "entityId": "company::territories::0001",
      "averageSales": 2233.72,
      "totalSales": 200
    }
  ]
}

another product-sales-history document

{
  "id": "company::product-sales-history::00002",
  "product": "company::product::0717",
  "period": "2020-04-13",
  "type": "product-sales-history",
  "location": [
    {
      "entityId": "company::location::0001",
      "averageSales": 113.72,
      "totalSales": 1300
    },
    {
      "entityId": "company::location::0002",	
      "averageSales": 233.72,
      "totalSales": 300
    }
  ],
  "territories": [
    {
      "entityId": "company::territories::0001", 	
      "averageSales": 4113.72,
      "totalSales": 7300
    },
    {
      "entityId": "company::territories::0002",
      "averageSales": 2233.72,
      "totalSales": 200
    }
  ]
}

My expected result would be:

{
  "id": "company::product::0717",
  "productCode": "98721",
  "name": "MAGNESIUM PLUS VITAMIN D3 180 CAPLETS",
  "price": 12.45,
  "type": "product",
  "location": [
    {
      "entityId": "company::location::0002",
      "averageSales": 347.44,
      "totalSales": 1600
    },
    {
      "entityId": "company::location::0004",	
      "averageSales": 233.72,
      "totalSales": 300
    },
    {
      "entityId": "company::location::0001",
      "averageSales": 113.72,
      "totalSales": 1300
    }
  ],
  "territories": [
    {
      "entityId": "company::territories::0004",	
      "averageSales": 4113.72,
      "totalSales": 7300
    },
    {
      "entityId": "company::territories::0001",
      "averageSales": 6347.44,
      "totalSales": 7500
    },
    {
      "entityId": "company::territories::0002",
      "averageSales": 2233.72,
      "totalSales": 200
    }
  ]
}

The values for “location” and “territories” property should be unique and totalSales are already sum-up.

Thank you,

CREATE INDEX ix1 ON default(product) WHERE type = "product";
CREATE INDEX ix2 ON default(product) WHERE type = "product-sales-history";
SELECT d.p.*,
       (SELECT l.entityId, SUM(l.averageSales) AS averageSales, SUM(l.totalSales) AS totalSales
        FROM d.psh AS sh
        UNNEST sh.location AS l
        GROUP BY l.entityId) AS location,
       (SELECT t.entityId, SUM(t.averageSales) AS averageSales, SUM(t.totalSales) AS totalSales
        FROM d.psh AS sh
        UNNEST sh.territories AS t
        GROUP BY t.entityId) AS territories
FROM (SELECT p, psh
      FROM default AS p
      LEFT JOIN NEST AS psh ON p.product = psh.product AND psh.type = "product-sales-history"
      WHERE p.type = "product") AS d
;

Also i think following is right one do actual avg

SUM(t.averageSales) AS averageSales
==> SUM(t.totalSales) / SUM(ROUND(t.totalSales/t.averageSales)) AS averageSales

Thank you @vsr1 :slight_smile: cool solution. I forgot to mention on my comment the I’m using Couchbase Analytics N1QL. I cannot use the NEST keyword.

Do you have any idea to substitute NEST ?

NEST ==> GROUP BY + ARRAY_AGG()
As analytic doesn’t have ARRAY_AGG() but they have equivalent as GROUP AS

Analytic query https://docs.couchbase.com/server/current/analytics/3_query.html#Group_By_clauses

SELECT p.*,
       (SELECT l.entityId, SUM(l.averageSales) AS averageSales, SUM(l.totalSales) AS totalSales
        FROM d.psh AS sh
        UNNEST sh.location AS l
        GROUP BY l.entityId) AS location,
       (SELECT t.entityId, SUM(t.averageSales) AS averageSales, SUM(t.totalSales) AS totalSales
        FROM d.psh AS sh
        UNNEST sh.territories AS t
        GROUP BY t.entityId) AS territories
FROM default AS p
LEFT JOIN psh ON p.product = psh.product AND psh.type = "product-sales-history"
WHERE p.type = "product"
GROUP BY p
GROUP AS d(psh) ;

Hi @vsr1,
Sorry, I’m bit confused :frowning: on GROUP AS I got a syntax error:

[
  {
    "code": 24000,
    "msg": "Syntax error: In line 14 >>GROUP AS d(psh) ;;<< Encountered \")\" at column 15. ",
    "query_from_user": "SELECT p.*,\n       (SELECT l.entityId, SUM(l.averageSales) AS averageSales, SUM(l.totalSales) AS totalSales\n        FROM d.psh AS sh\n        UNNEST sh.location AS l\n        GROUP BY l.entityId) AS location,\n       (SELECT t.entityId, SUM(t.averageSales) AS averageSales, SUM(t.totalSales) AS totalSales\n        FROM d.psh AS sh\n        UNNEST sh.territories AS t\n        GROUP BY t.entityId) AS territories\nFROM default AS p\nLEFT JOIN psh ON p.product = psh.product AND psh.`type` = \"product-sales-history\"\nWHERE p.`type` = \"product\"\nGROUP BY p\nGROUP AS d(psh) ;"
  }
]

Please check it showing two semicolons. remove semicolon and try.

If still issue try GROUP AS d(psh AS psh) or GROUP AS d

Hi @vsri
Cool, thanks the GROUP AS d (psh AS psh) works . Just figured it out it produces:

  "d": [
     {
        "psh": { "location" : []  }
     }...
   ]

Can ask another help for getting the sum? I’m bit confused with this :frowning: thank you

Hi @lyndondonz ,

SELECT p.*,
       (SELECT l.entityId, SUM(l.averageSales) AS averageSales, SUM(l.totalSales) AS totalSales
        FROM d AS sh
        UNNEST sh.psh.location AS l
        GROUP BY l.entityId) AS location,
       (SELECT t.entityId, SUM(t.averageSales) AS averageSales, SUM(t.totalSales) AS totalSales
        FROM d AS sh
        UNNEST sh.psh.territories AS t
        GROUP BY t.entityId) AS territories
FROM default AS p
LEFT JOIN psh ON p.product = psh.product AND psh.type = "product-sales-history"
WHERE p.type = "product"
GROUP BY p
GROUP AS d(psh AS psh) ;

Hi @vsr1
the GROUP BY and GROUP AS is already :slight_smile: thanks so much. But, I have a weird error when I do this is in query:

  (SELECT 
    l.entityId,
    l.averageSales
   FROM d AS sh
   UNNEST sh.psh.location AS l
   GROUP BY l.entityId) AS location
[
  {
    "code": 24041,
    "msg": "Cannot resolve alias reference for undefined identifier l (in line 6, at column 5)",
    "query_from_user": "SELECT \n  p.`name`,\n  p.`rank`,\n  (SELECT \n    l.entityId,\n    l.averageSales\n   FROM d AS sh\n   UNNEST sh.psh.location AS l\n   GROUP BY l.entityId) AS location\nFROM company_analytics.`products` AS p\n\n\n\n\nLEFT JOIN company_analytics.`products-sales-histories` psh ON meta(p).`id` = psh.`product`\nWHERE p.`type`=\"product\"\nGROUP BY p\nGROUP AS d (psh AS psh)\n\nORDER BY p.`rank`.`level` ASC, p.`rank`.`position` ASC\nLIMIT 2 OFFSET 0"
  }
]

do have any idea? :slight_smile: thanks

Hi @vsr1
really need help :slight_smile:

Hi @lyndondonz,

l.averageSales this may be issue (try SUM(l.averageSales ) AS averageSales . It is group query and not part of group expressions. So it must be aggregate. cc @dmitry.lychagin, @till

Hi @lyndondonz,

l.averageSales cannot be used directly in the SELECT clause after the GROUP BY. Only group expressions (l.entityId) are visible after the GROUP BY . As the previous comment suggested, you need to use an aggregate function ( SUM(l.averageSales ))

Hi @vsr1 and @dmitry.lychagin,
Thank you for you help :slight_smile:
Yeah, I already tried to aggregate the fields but still the error persist, I’m confused why can’t I use GROUP BY

  ( SELECT 
     l.`entityId`,
     l.`totalProductShipped`,
     l.`averageProductOrdered`,
     l.`averageProductShipped`,
     (SUM(l.`averageSales`)) AS averageSales,
     (SUM(l.`totalSales`)) AS totalSales
    FROM d AS sh
    UNNEST sh.psh.location AS l
    GROUP BY l.`entityId`) AS location
[
  {
    "code": 24041,
    "msg": "Cannot resolve alias reference for undefined identifier l (in line 7, at column 6)",
    "query_from_user": "SELECT \n  p.`id`,\n  p.`name`,\n  p.`rank`,\n  ( SELECT \n     l.`entityId`,\n     l.`totalProductShipped`,\n     l.`averageProductOrdered`,\n     l.`averageProductShipped`,\n     (SUM(l.`averageSales`)) AS averageSales,\n     (SUM(l.`totalSales`)) AS totalSales\n    FROM d AS sh\n    UNNEST sh.psh.location AS l\n    GROUP BY l.`entityId`) AS location\nFROM company_analytics.`products` AS p\n\n\nLEFT JOIN company.`products-sales-histories` psh ON meta(p).`id` = psh.`product`\nGROUP BY p\nGROUP AS d (psh AS psh)\n\nORDER BY p.`rank`.`level` ASC, p.`rank`.`position` ASC\nLIMIT 2 OFFSET 0"
  }
]

Hi @lyndondonz,

If the query block uses group by or aggregate function, The projection must contain expression based on group by expressions or aggregates or constants only. Any other expressions are invalid.

In your case the following are in vlaid.

l.`totalProductShipped`,
 l.`averageProductOrdered`,
 l.`averageProductShipped`,

Hi @vsr1
Sorry, I bit confused with syntax error the produced by GROUP AS

 (SELECT
       l.`entityId`,
       lg,
       (SUM(l.`averageSales`)) AS averageSales,
       (SUM(l.`totalSales`)) AS totalSales
   FROM d AS sh
   UNNEST sh.psh.location AS l
   GROUP BY l.`entityId`
   GROUP AS lg (
     l.`totalProductShipped` AS totalProductShipped, 
     l.`averageProductOrdered` AS averageProductOrdered,
     l.`averageProductShipped` AS averageProductShipped)
   ) AS location
[
  {
    "code": 24000,
    "msg": "Syntax error: In line 14 >>     l.`totalProductShipped` AS totalProductShipped, << Encountered \".\" at column 7. ",
    "query_from_user": "SELECT \n  p.`id`, \n  p.`name`,\n  p.`rank`,\n  (SELECT\n       l.`entityId`,\n       lg,\n       (SUM(l.`averageSales`)) AS averageSales,\n       (SUM(l.`totalSales`)) AS totalSales\n   FROM d AS sh\n   UNNEST sh.psh.location AS l\n   GROUP BY l.`entityId`\n   GROUP AS lg (\n     l.`totalProductShipped` AS totalProductShipped, \n     l.`averageProductOrdered` AS averageProductOrdered,\n     l.`averageProductShipped` AS averageProductShipped)\n   ) AS location\nFROM company.`products` AS p\nLEFT JOIN company.`products-sales-histories` psh ON meta(p).`id` = psh.`product`\n\nGROUP BY p\nGROUP AS d (psh AS psh)\n\n\nORDER BY p.`rank`.`level` ASC, p.`rank`.`position` ASC\nLIMIT 1 OFFSET 0"
  }
]

I read the documentation on Analytics GROUP AS the syntax should be like this:

  Variable ("(" VariableReference  Identifier ("," VariableReference  Identifier )* ")")?

I’m kinda lost with this one :frowning:

i removed that . u have to change to aggregate

Hi @vsr1,
Thank you for your patience and my apology for causing confusion. Just a clarification can I use multiple aggregate function SUM() ?

    ( SELECT 
     l.`entityId`,
     SUM(l.`totalProductShipped`) AS totalProductShipped,
     SUM(l.`averageProductOrdered`) AS averageProductOrdered,
     SUM(l.`averageProductShipped`) AS averageProductShipped,
     SUM(l.`averageSales`) AS averageSales,
     SUM(l.`totalSales`) AS totalSales
    FROM d AS sh
    UNNEST sh.psh.location AS l
    GROUP BY l.`entityId`) AS location

I got this error:

[
  {
    "code": 25000,
    "msg": "Internal error",
    "query_from_user": "SELECT \n  p.`id`, \n  p.`name`,\n  p.`rank`,\n    ( SELECT \n     l.`entityId`,\n     SUM(l.`totalProductShipped`) AS totalProductShipped,\n     SUM(l.`averageProductOrdered`) AS averageProductOrdered,\n     SUM(l.`averageProductShipped`) AS averageProductShipped,\n     SUM(l.`averageSales`) AS averageSales,\n     SUM(l.`totalSales`) AS totalSales\n    FROM d AS sh\n    UNNEST sh.psh.location AS l\n    GROUP BY l.`entityId`) AS location\nFROM company_analytics.`products` AS p\nLEFT JOIN company_analytics.`products-sales-histories` psh ON meta(p).`id` = psh.`product`\n\nGROUP BY p\nGROUP AS d (psh AS psh)\n\n\nORDER BY p.`rank`.`level` ASC, p.`rank`.`position` ASC\nLIMIT 1 OFFSET 0"
  }
]

Hi @lyndondonz,
Yes, you can use multiple aggregates function calls in the same clause.
Which version of the Couchbase Server are you using?

Hi @dmitry.lychagin,

We’re using 6.5

Hi @lyndondonz,

I can reproduce that internal error in my environment. Let me investigate what’s going on there.