How to avoid Duplicate Rows on joins

aponnath · April 6, 2020, 12:07am

I have a Grid which returns data for a query so in my case i run 2 querys, the first one will get me the total row count, the other gets me the data and supports the paging / scrolling in the grid.
Below is the first query to get row count in this case its 29

select count(*) as count FROM Contacts AS f WHERE f._type="farm" and f.tract IN [12979,12980,12981,12982,13255,13260,13261] and LOWER(f.PropertyAddress.streetName) = LOWER("Mikro")

The Query below returns the data for grid but has duplicate rows, in this case 58 rows where the first query has only 29

SELECT META(f).id AS DocId,
       f.Owners.owner1FName || IFMISSINGORNULL((CASE WHEN (LENGTH(f.Owners.owner1SpouseFName) > 0) THEN " & " || f.Owners.owner1SpouseFName || " " ELSE " " END), " ") || f.Owners.owner1LName AS owner_1,
       TRIM(f.Owners.owner2FName || IFMISSINGORNULL((CASE WHEN (LENGTH(f.Owners.owner2SpouseFName) > 0) THEN " & " || f.Owners.owner1SpouseFName || " " ELSE " " END), " ") || f.Owners.owner2LName) AS owner2,
       f.Marketing.privacy,
       ARRAY { CASE WHEN v.type IS NOT MISSING THEN LOWER(v.type) ELSE "default" END: CASE WHEN LENGTH(v. `number`) = 10 THEN "(" || SUBSTR(v.`number` , 0,3) || ") " || SUBSTR(v.`number` , 3,3) || "-" || SUBSTR(v.`number` , 6,4) ELSE v.`number` END } FOR v IN f.phones END AS phones,
       ARRAY {CASE WHEN v.type IS NOT MISSING THEN LOWER(v.type) ELSE "default" END :v.`address`} FOR v IN f.emails END AS emails,
       f.Marketing.farm_id,
       f.PropertyAddress.houseNumber,
       f.PropertyAddress.streetName || IFMISSINGORNULL(" " || f.PropertyAddress.streetType || " ", "") || f.PropertyAddress.streetSuffix AS streetName,
       f.PropertyAddress.city,
       f.PropertyAddress.state,
       f.beds,
       f.baths,
       f.partialBaths,
       f.`pool`,
       f.fireplace,
       f.ownerOccupied,
       f.tract,
       t.name AS tract_name,
       f.mailingAddress.address || " " || f.mailingAddress.city || " " || f.mailingAddress.state || " " || f.mailingAddress.zip ||"-" || f.mailingAddress.zip4 AS mailing_address
FROM Contacts AS f LEFT
    JOIN Contacts AS t ON t._type ="tract_info"
    AND ANY b IN t.tract_id SATISFIES b = f.tract END
WHERE f._type="farm"
    AND f.PropertyAddress.streetName IS NOT MISSING
    AND TONUMBER(f.PropertyAddress.houseNumber) IS NOT MISSING
    AND f.tract IN [12979,12980,12981,12982,13255,13260,13261]
    AND LOWER(f.PropertyAddress.streetName) = LOWER("Mikro")
ORDER BY f.PropertyAddress.streetName,
         TONUMBER(f.PropertyAddress.houseNumber)
LIMIT 100
OFFSET 0

So how can i prevent duplicate rows as in this case

vsr1 · April 6, 2020, 12:32am

You can use SELECT DISTINCT …
From JOIN you only projecting only track name right?

OR remove JOIN use the following in projection tract_name

FIRST v.name v IN (SELECT DISTINCT track_id, t.name FROM Contacts AS t UNNEST t.tract_id AS track_id WHERE t._type ="tract_info" AND tack_id IN [12979,12980,12981,12982,13255,13260,13261]) WHEN f.track = v.track_id END

if you want all track_names use ARRAY instead FIRST

If using 6.50 you can move this query into WITH clause (CTE)

(FYI: I removed other post).

aponnath · April 6, 2020, 5:26am

Yes, i am on 6.5
There reason i did this this way because prior to 6.5 it was the way to go. How would the CTE look like as i get the error

msg": “Unable to run subquery - cause: FROM in correlated subquery must have USE KEYS clause: FROM Contacts.”,

if i run the sub query like this

( SELECT DISTINCT t.name
 FROM Contacts AS t
 UNNEST t.tract_id AS track_id
 WHERE t._type ="tract_info"
  AND track_id = f.tract ) AS track_name,

the property has a single tract_id so there is no need to do the IN query for this one. Its only used to get the list of all properties in that tract. Also do we have to do the UNNEST and if not is there a performance benefit to just look in ARRAY like this.

SELECT t.name
FROM Contacts t
WHERE _type = "tract_info" and ARRAY_CONTAINS(t.tract_id, f.tract )
Limit 1

vsr1 · April 6, 2020, 1:30pm

CREATE INDEX trackix1 ON Contacts(ALL tract_id, name) WHERE _type ="tract_info";

WITH tracknames AS (SELECT DISTINCT track_id, t.name
                    FROM Contacts AS t
                    UNNEST t.tract_id AS track_id
                    WHERE t._type ="tract_info"
                          AND tack_id IN [12979,12980,12981,12982,13255,13260,13261])
SELECT META(f).id AS DocId,
       f.Owners.owner1FName || IFMISSINGORNULL((CASE WHEN (LENGTH(f.Owners.owner1SpouseFName) > 0) THEN " & " || f.Owners.owner1SpouseFName || " " ELSE " " END), " ") || f.Owners.owner1LName AS owner_1,
       TRIM(f.Owners.owner2FName || IFMISSINGORNULL((CASE WHEN (LENGTH(f.Owners.owner2SpouseFName) > 0) THEN " & " || f.Owners.owner1SpouseFName || " " ELSE " " END), " ") || f.Owners.owner2LName) AS owner2,
       f.Marketing.privacy,
       ARRAY { CASE WHEN v.type IS NOT MISSING THEN LOWER(v.type) ELSE "default" END: CASE WHEN LENGTH(v. `number`) = 10 THEN "(" || SUBSTR(v.`number` , 0,3) || ") " || SUBSTR(v.`number` , 3,3) || "-" || SUBSTR(v.`number` , 6,4) ELSE v.`number` END } FOR v IN f.phones END AS phones,
       ARRAY {CASE WHEN v.type IS NOT MISSING THEN LOWER(v.type) ELSE "default" END :v.`address`} FOR v IN f.emails END AS emails,
       f.Marketing.farm_id,
       f.PropertyAddress.houseNumber,
       f.PropertyAddress.streetName || IFMISSINGORNULL(" " || f.PropertyAddress.streetType || " ", "") || f.PropertyAddress.streetSuffix AS streetName,
       f.PropertyAddress.city,
       f.PropertyAddress.state,
      f.beds,
       f.baths,
       f.partialBaths,
       f.`pool`,
       f.fireplace,
       f.ownerOccupied,
       f.tract,
       FIRST v.name FOR v IN tracknames WHEN v.track_id = f.track END AS tract_name,
       f.mailingAddress.address || " " || f.mailingAddress.city || " " || f.mailingAddress.state || " " || f.mailingAddress.zip ||"-" || f.mailingAddress.zip4 AS mailing_address
FROM Contacts AS f
WHERE f._type="farm"
    AND f.PropertyAddress.streetName IS NOT MISSING
    AND TONUMBER(f.PropertyAddress.houseNumber) IS NOT MISSING
    AND f.tract IN [12979,12980,12981,12982,13255,13260,13261]
    AND LOWER(f.PropertyAddress.streetName) = LOWER("Mikro")
ORDER BY f.PropertyAddress.streetName,
         TONUMBER(f.PropertyAddress.houseNumber)
LIMIT 100
OFFSET 0;

aponnath · April 6, 2020, 5:31pm

Thanks, that works but now if i do a query with no criteria like track or street it takes almost 2.5 sec to get the first 100 records. I deployed all recommended Indexes. If i have a filter on tract it takes 200 ms

vsr1 · April 6, 2020, 5:41pm

Even though you need 100 due to ORDER BY it needs to get 3300+

post the query and index.

CREATE INDEX ix1 ON  Contacts(LOWER(PropertyAddress.streetName),  TONUMBER(PropertyAddress.houseNumber), DISTINCT tract) WHERE _type="farm";


Change query some thing like this. If it is acceptable (order BY on lower streetName)

FROM Contacts AS f
WHERE f._type="farm" 
    LOWER(f.PropertyAddress.streetName)  IS NOT MISSING
    AND TONUMBER(f.PropertyAddress.houseNumber) IS NOT MISSING
    AND  ANY v IN f.tract SATISFIES v IN [12979,12980,12981,12982,13255,13260,13261] END
    AND LOWER(f.PropertyAddress.streetName) = LOWER("Mikro")
ORDER BY LOWER(f.PropertyAddress.streetName),
         TONUMBER(f.PropertyAddress.houseNumber)
LIMIT 100
OFFSET 0;

Topic		Replies	Views
Getting count for group using N1QL SQL++	9	943	September 3, 2020
Query Performance on Join across 2 Buckets SQL++	2	696	August 21, 2022
N1QL Performance when filtering on Array SQL++	1	593	June 12, 2019
N1Ql Performance issue on Large Query's SQL++	8	998	January 21, 2022
Help for N1QL query SQL++	5	880	December 30, 2018

How to avoid Duplicate Rows on joins

Related topics