DISTINCT in MySQL query removing records from the result set
I have three tables
TBL1       TBL2        TBL3
----       ------      --------
tbl1_id    tbl2_id     tbl3_id
cid        fkcid       fkcid
           fktbl1_id   fktbl2_id
I have a query to get records from TBL3:
select distinct tbl3.* from TBL3 tbl3
inner join TBL2 tbl2 on tbl2.tbl2_id = tbl3.fktbl2_id and tbl2.fkcid = tbl3.fkcid
inner join TBL1 tbl1 on tbl1.tbl1_id = tbl2.fktbl1_id and tbl2.fkcid = tbl1.cid;
This query gives me around 1000 records.
But when I remove DISTINCT from the query, it gives me around 1100 records.
There are no duplicate records in the table, and I confirmed that these extra 100 records are not duplicates. Please note that these extra 100 records do not appear in the result of the query with the DISTINCT keyword.
Why is this query behaving unexpectedly? Please help me understand this more clearly, and correct me if I am making a mistake.
Thank you
You have multiple records in tbl1 or tbl2 that map to the same tbl3, and since you're only selecting tbl3.* in your output, DISTINCT removes the duplication. To instead find what the duplicates are, remove the DISTINCT, add a COUNT(*) to the SELECT clause, and add at the end a GROUP BY and HAVING, such as:
select tbl3.*, count(*)
from TBL3 tbl3
inner join TBL2 tbl2 on tbl2.tbl2_id = tbl3.fktbl2_id and tbl2.fkcid = tbl3.fkcid
inner join TBL1 tbl1 on tbl1.tbl1_id = tbl2.fktbl1_id and tbl2.fkcid = tbl1.cid
group by tbl3.tbl3_id, tbl3.fkcid, tbl3.fktbl2_id having count(*) > 1;
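To see the fan-out concretely, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows are invented, and the sketch assumes that tbl1_id is not unique in TBL1, which is one way a single TBL3 row can match more than once.

```python
import sqlite3

# Hypothetical miniature of the three tables; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TBL1 (tbl1_id INTEGER, cid INTEGER);
    CREATE TABLE TBL2 (tbl2_id INTEGER, fkcid INTEGER, fktbl1_id INTEGER);
    CREATE TABLE TBL3 (tbl3_id INTEGER, fkcid INTEGER, fktbl2_id INTEGER);
    -- Assumption: two TBL1 rows share (tbl1_id, cid), so the join fans out.
    INSERT INTO TBL1 VALUES (1, 10), (1, 10);
    INSERT INTO TBL2 VALUES (100, 10, 1);
    INSERT INTO TBL3 VALUES (1000, 10, 100);
""")

# The join from the question, shared by all three statements below.
join_sql = """
    FROM TBL3 tbl3
    INNER JOIN TBL2 tbl2 ON tbl2.tbl2_id = tbl3.fktbl2_id AND tbl2.fkcid = tbl3.fkcid
    INNER JOIN TBL1 tbl1 ON tbl1.tbl1_id = tbl2.fktbl1_id AND tbl2.fkcid = tbl1.cid
"""

# Without DISTINCT the single TBL3 row is returned once per matching TBL1 row.
plain_rows = conn.execute("SELECT tbl3.* " + join_sql).fetchall()
# DISTINCT collapses those identical output rows into one.
distinct_rows = conn.execute("SELECT DISTINCT tbl3.* " + join_sql).fetchall()
# The diagnostic query from the answer: which TBL3 rows matched more than once.
fanned_out = conn.execute(
    "SELECT tbl3.tbl3_id, COUNT(*) " + join_sql +
    " GROUP BY tbl3.tbl3_id, tbl3.fkcid, tbl3.fktbl2_id HAVING COUNT(*) > 1"
).fetchall()

print(plain_rows, distinct_rows, fanned_out)
```

The extra rows in the non-DISTINCT result are therefore repeats of TBL3 rows that are already in the DISTINCT result, not rows missing from it.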
Related
Let's assume I have tbl2, with a foreign key to another table (tbl1). For each record in the tbl1, we can have multiple or no records in tbl2.
I want to have only one record from tbl2 (the last record, based on time), which matches with a record on tbl1. The following query only returns one record:
select * from tbl2
where fk in (select id from tbl1 where some_criteria)
order by time DESC LIMIT 1
This query also returns all records from tbl2:
select * from tbl2
where fk in (select id from tbl1 where some_criteria)
order by time DESC
I want one row for each id returned by select id from tbl1 where some_criteria, carrying all the details of the latest matching record in tbl2.
You want a lateral join, available since MySQL 8.0.14:
select *
from tbl1
left outer join lateral
(
select *
from tbl2
where tbl2.fk = tbl1.id
order by time desc
limit 1
) newest_tbl2 on true;
Here is a solution for old MySQL versions: Aggregate the tbl2 by fk to get the maximum time per fk. Then use this result in your joins.
select *
from tbl1
left outer join
(
select fk, max(time) as max_time
from tbl2
group by fk
) mx on mx.fk = tbl1.id
left outer join tbl2 on tbl2.fk = mx.fk and tbl2.time = mx.max_time;
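The aggregate-then-join variant can be tried out with a small sketch. It uses Python's sqlite3 module in place of MySQL (SQLite has no LATERAL, so only the older approach is shown); the note column and the sample rows are invented for illustration.

```python
import sqlite3

# SQLite stands in for MySQL; the 'note' column and all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl1 (id INTEGER PRIMARY KEY);
    CREATE TABLE tbl2 (fk INTEGER, time TEXT, note TEXT);
    INSERT INTO tbl1 VALUES (1), (2), (3);
    INSERT INTO tbl2 VALUES
        (1, '2024-01-01', 'old'),
        (1, '2024-02-01', 'new'),
        (2, '2024-01-15', 'only');
""")

# Step 1 (derived table mx): the maximum time per fk.
# Step 2: join back to tbl2 to fetch the full row carrying that time.
rows = conn.execute("""
    SELECT tbl1.id, tbl2.time, tbl2.note
    FROM tbl1
    LEFT OUTER JOIN (
        SELECT fk, MAX(time) AS max_time
        FROM tbl2
        GROUP BY fk
    ) mx ON mx.fk = tbl1.id
    LEFT OUTER JOIN tbl2 ON tbl2.fk = mx.fk AND tbl2.time = mx.max_time
    ORDER BY tbl1.id
""").fetchall()

print(rows)
```

One caveat of this approach: if two tbl2 rows for the same fk share the maximum time, both are returned, whereas the lateral join with limit 1 returns exactly one.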
Essentially I need to only LEFT JOIN on the 2 tables if there is one customerid matching in table2. If there is more than 1 record matching table2 it should not count as a match.
Currently I am doing the following:
SELECT DISTINCT
table1.customerid,
table1.name AS customer,
table2.locationid,
table2.locationname
FROM
table1
LEFT JOIN table2 ON table1.customerid = table2.customerid
ORDER BY
name ASC
The issue is it matches on all records.
To clarify -- if customerid is in table2 more than once, it should not join on the match, only if customerid is listed once for a record in table2.
How can this be done?
Join with a subquery that only returns customer IDs that have count = 1.
SELECT DISTINCT
table1.customerid,
table1.name AS customer,
table2.locationid,
table2.locationname
FROM table1
LEFT JOIN (
SELECT customerid, MAX(locationid) AS locationid, MAX(locationname) AS locationname
FROM table2
GROUP BY customerid
HAVING COUNT(*) = 1
) AS table2 ON table1.customerid = table2.customerid
ORDER BY name ASC
Note that using LEFT JOIN means that if a customerid is not in table2 at all, the query will still return a row for that customer, with NULL in the table2 fields. With this change, it will also return such NULL rows for customers that have more than one matching row. If you want those rows omitted from the results entirely, use INNER JOIN rather than LEFT JOIN.
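The difference between the two join types can be checked with a small sketch, again with Python's sqlite3 module standing in for MySQL. The customer and location rows are made up: customer 1 has one location, customer 2 has two, and customer 3 has none.

```python
import sqlite3

# SQLite stands in for MySQL; all sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (customerid INTEGER, name TEXT);
    CREATE TABLE table2 (customerid INTEGER, locationid INTEGER, locationname TEXT);
    INSERT INTO table1 VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO table2 VALUES (1, 11, 'North'), (2, 21, 'East'), (2, 22, 'West');
""")

# The subquery keeps only customers with exactly one table2 row. MAX() is just
# a way to name the single value in each one-row group.
query = """
    SELECT DISTINCT t1.customerid, t1.name AS customer,
           t2.locationid, t2.locationname
    FROM table1 t1
    {join_type} JOIN (
        SELECT customerid,
               MAX(locationid) AS locationid,
               MAX(locationname) AS locationname
        FROM table2
        GROUP BY customerid
        HAVING COUNT(*) = 1
    ) AS t2 ON t1.customerid = t2.customerid
    ORDER BY t1.name ASC
"""

left_rows = conn.execute(query.format(join_type="LEFT")).fetchall()
inner_rows = conn.execute(query.format(join_type="INNER")).fetchall()

print(left_rows)
print(inner_rows)
```

With LEFT JOIN, Bob (two locations) and Cy (no locations) both appear with NULL location fields; with INNER JOIN only Ann remains.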
I'm new to MySQL, and I'd like some help in setting up a MySQL query to pull some data from a few tables (~100,000 rows) in a particular output format.
This problem involves three SQL tables:
allusers : This one contains user information. The columns of interest are userid and vip
table1 and table2 contain data, but they also have a userid column, which matches the userid column in allusers.
What I'd like to do:
I'd like to create a query which searches through allusers, finds the userid of those that are VIP, and then count the number of records in each of table1 and table2 grouped by the userid. So, my desired output is:
userid | Count in Table1 | Count in Table2
1 | 5 | 21
5 | 16 | 31
8 | 21 | 12
What I've done so far:
I've created this statement:
SELECT userid, count(1)
FROM table1
WHERE userid IN (SELECT userid FROM allusers WHERE vip IS NOT NULL)
GROUP BY userid
This gets me close to what I want. But now, I want to add another column with the respective counts from table2
I also tried using joins like this:
select A.userid, count(T1.userid), count(T2.userid) from allusers A
left join table1 T1 on T1.userid = A.userid
left join table2 T2 on T2.userid = A.userid
where A.vip is not null
group by A.userid
However, this query took a very long time and I had to kill the query. I'm assuming this is because using Joins for such large tables is very inefficient.
Similar Questions
This one is looking for a similar result as I am, but doesn't need nearly as much filtering with subqueries
This one sums up the counts across tables, while I need the counts separated into columns
Could someone help me set up the query to generate the data I need?
Thanks!
You need to pre-aggregate first, then join, otherwise the results will not be what you expect if a user has several rows in both table1 and table2. Besides, pre-aggregation is usually more efficient than outer aggregation in a situation such as yours.
Consider:
select a.userid, t1.cnt cnt1, t2.cnt cnt2
from allusers a
left join (select userid, count(*) cnt from table1 group by userid) t1
on t1.userid = a.userid
left join (select userid, count(*) cnt from table2 group by userid) t2
on t2.userid = a.userid
where a.vip is not null
This is a case where I would recommend correlated subqueries:
select a.userid,
(select count(*) from table1 t1 where t1.userid = a.userid) as cnt1,
(select count(*) from table2 t2 where t2.userid = a.userid) as cnt2
from allusers a
where a.vip is not null;
I recommend this approach because you are filtering the allusers table. The pre-aggregation approach counts rows for every userid in table1 and table2, including non-VIP users that the outer query then discards, so it may be doing additional, unnecessary work.
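A small sketch makes the difference between these queries visible. It runs the asker's double LEFT JOIN, the pre-aggregated join, and the correlated subqueries against invented data, with Python's sqlite3 module standing in for MySQL.

```python
import sqlite3

# SQLite stands in for MySQL; all sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE allusers (userid INTEGER, vip INTEGER);
    CREATE TABLE table1 (userid INTEGER);
    CREATE TABLE table2 (userid INTEGER);
    INSERT INTO allusers VALUES (1, 1), (2, NULL), (5, 1);
    INSERT INTO table1 VALUES (1), (1), (5);      -- user 1: 2 rows, user 5: 1 row
    INSERT INTO table2 VALUES (1), (1), (1), (2); -- user 1: 3 rows, user 5: none
""")

# Joining both tables before counting pairs every table1 row with every
# table2 row of the same user, so user 1 is counted 2 * 3 = 6 times.
naive = conn.execute("""
    SELECT a.userid, COUNT(t1.userid), COUNT(t2.userid)
    FROM allusers a
    LEFT JOIN table1 t1 ON t1.userid = a.userid
    LEFT JOIN table2 t2 ON t2.userid = a.userid
    WHERE a.vip IS NOT NULL
    GROUP BY a.userid
    ORDER BY a.userid
""").fetchall()

# Pre-aggregation: each derived table has at most one row per userid.
pre_aggregated = conn.execute("""
    SELECT a.userid, t1.cnt AS cnt1, t2.cnt AS cnt2
    FROM allusers a
    LEFT JOIN (SELECT userid, COUNT(*) AS cnt FROM table1 GROUP BY userid) t1
        ON t1.userid = a.userid
    LEFT JOIN (SELECT userid, COUNT(*) AS cnt FROM table2 GROUP BY userid) t2
        ON t2.userid = a.userid
    WHERE a.vip IS NOT NULL
    ORDER BY a.userid
""").fetchall()

# Correlated subqueries: one count per VIP user and table.
correlated = conn.execute("""
    SELECT a.userid,
           (SELECT COUNT(*) FROM table1 t1 WHERE t1.userid = a.userid) AS cnt1,
           (SELECT COUNT(*) FROM table2 t2 WHERE t2.userid = a.userid) AS cnt2
    FROM allusers a
    WHERE a.vip IS NOT NULL
    ORDER BY a.userid
""").fetchall()

print(naive, pre_aggregated, correlated)
```

Note one difference between the two correct forms: for a user with no rows in a table, the pre-aggregated join yields NULL while the correlated subquery yields 0. Wrapping the counts in COALESCE(cnt, 0) would align them.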
I am running a query like this:
SELECT DISTINCT `tableA`.`field1`,
`tableA`.`field2` AS field2Alias,
`tableA`.`field3`,
`tableB`.`field4` AS field4Alias,
`tableA`.`field6` AS field6Alias
FROM (`tableC`)
RIGHT JOIN `tableA` ON `tableC`.`idfield` = `tableA`.`idfield`
JOIN `tableB` ON `tableB`.`idfield` = `tableA`.`idfield`
AND tableA.field2 IN
(SELECT field2
FROM tableA
GROUP BY tableA.field2 HAVING count(*)>1)
ORDER BY tableA.field2
This is meant to find all the duplicate entries, but it now takes a long time to execute. Any suggestions for optimization?
It looks like you are trying to find all duplicates on field2 in TableA. The first step would be to move the in subquery to the from clause:
SELECT DISTINCT a.`field1`, a.`field2` AS field2Alias,
       a.`field3`, b.`field4` AS field4Alias, a.`field6` AS field6Alias
FROM tableA a
LEFT JOIN tableC c ON c.`idfield` = a.`idfield`
JOIN tableB b ON b.`idfield` = a.`idfield`
JOIN (SELECT field2
      FROM tableA
      GROUP BY field2
      HAVING COUNT(*) > 1
     ) asum ON asum.field2 = a.field2
ORDER BY a.field2
There may be additional optimizations, but it is very hard to tell. Your question "find duplicates" and your query "join a bunch of tables together and filter them" don't quite match. It would also be helpful to know what tables have which indexes and unique/primary keys.
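As a rough check that moving the subquery into the FROM clause does not change the result, the sketch below runs both forms on a reduced, invented version of tableA (only idfield, field1, field2). It uses Python's sqlite3 module rather than MySQL, and the index name idx_tablea_field2 is hypothetical; it only illustrates the kind of index the answer asks about.

```python
import sqlite3

# SQLite stands in for MySQL; the reduced schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (idfield INTEGER, field1 TEXT, field2 TEXT);
    -- Hypothetical index on the column being grouped and joined on.
    CREATE INDEX idx_tablea_field2 ON tableA (field2);
    INSERT INTO tableA VALUES
        (1, 'a', 'x'), (2, 'b', 'x'), (3, 'c', 'y'), (4, 'd', 'z'), (5, 'e', 'z');
""")

# Original shape: filter with an IN subquery.
in_rows = conn.execute("""
    SELECT a.idfield, a.field2
    FROM tableA a
    WHERE a.field2 IN (SELECT field2 FROM tableA
                       GROUP BY field2 HAVING COUNT(*) > 1)
    ORDER BY a.field2, a.idfield
""").fetchall()

# Rewritten shape: join against the same subquery as a derived table.
join_rows = conn.execute("""
    SELECT a.idfield, a.field2
    FROM tableA a
    JOIN (SELECT field2 FROM tableA
          GROUP BY field2 HAVING COUNT(*) > 1) asum
        ON asum.field2 = a.field2
    ORDER BY a.field2, a.idfield
""").fetchall()

print(in_rows == join_rows, join_rows)
```

Whether the join form is actually faster depends on the MySQL version and the available indexes, so comparing the EXPLAIN output of both forms on the real tables is the safer guide.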
Can I get some help with this query? I will explain in detail. To make it easier, I'll use an example with HTML tags and attributes.
I have 3 tables:
tbl1 - contains all the HTML tags (tagId, tagName)
tbl2 - contains all the available attributes that might appear in the tags (attId, attName)
tbl3 - is a map table for tbl1 and tbl2 (tagId, attId)
I want to select all the attributes (and related information about the attributes) that belong to the chosen tag (in the example below tag id 4) and the ID of the tag.
Here is an example of what I'd like to get from the query:
attId  tagId  attName
50     4      The name of the attribute with id 50
89     4      The name of the attribute with id 89
114    4      The name of the attribute with id 114
Below is the query that I've made, but I believe there is a better way.
SELECT tbl2.*, tbl3.tagId
FROM tbl2 JOIN tbl3
WHERE tbl3.attId IN (
SELECT tbl3.attId FROM tbl3 where tbl3.tagId=4
)
AND tbl3.attId = tbl2.attId
GROUP BY tbl3.attId
Thanks in advance.
Issue 1
You're doing a cross join that you later reduce to an inner join by putting a filter in the where clause.
This is the old-school way, left over from implicit join syntax.
On most SQL dialects (but not MySQL) this gives an error.
Never do a join without an ON clause.
If you want to do a cross join, use this syntax: select * from a cross join b
Issue 2
The sub select is really not needed, you can just do the test inside a where clause.
SELECT tbl2.*, tbl3.tagId
FROM tbl2
INNER JOIN tbl3 ON (tbl3.attId = tbl2.attId)
WHERE tbl3.tagId = 4
GROUP BY tbl3.attId
Issue 3a
You are doing a group by on tbl3.attId, which in this context is the same as doing a group by on tbl2.attId (because of the ON (tbl3.attId = tbl2.attId) clause).
The latter makes a bit more sense, because you are doing select tbl2.*, not select tbl3.*
Issue 3b
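If attId is not a primary or unique key for tbl2, then the result of the query will be indeterminate.
That means the query will return an arbitrary row from the rows sharing the same attId (and MySQL rejects the query outright when ONLY_FULL_GROUP_BY is enabled, which is the default since 5.7.5). If you don't want that, you'll have to choose the row explicitly based on some criterion. A HAVING clause cannot do this, because it filters whole groups after the arbitrary row has already been picked; join against a per-attId aggregate instead.
Example
/* selects the most recent row per attId, instead of an arbitrary one */
SELECT tbl2.*, tbl3.tagId
FROM tbl2
INNER JOIN tbl3 ON (tbl3.attId = tbl2.attId)
INNER JOIN (
  SELECT attId, MAX(dateadded) AS dateadded
  FROM tbl2
  GROUP BY attId
) latest ON latest.attId = tbl2.attId AND latest.dateadded = tbl2.dateadded
WHERE tbl3.tagId = 4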
You can do it like this instead.
SELECT tbl2.*, tbl3.tagId
FROM tbl2 JOIN tbl3
ON tbl3.attId = tbl2.attId
WHERE tbl3.tagId = <selected id>
GROUP BY tbl3.attId
SELECT tbl2.attId, tbl1.tagId, tbl2.attName FROM tbl3, tbl1, tbl2
WHERE tbl1.tagId = tbl3.tagId AND tbl2.attId = tbl3.attId AND tbl3.tagId = 4
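To compare the explicit join with the comma-style join above, here is a minimal sketch on invented tag and attribute rows, using Python's sqlite3 module as a stand-in for MySQL. It assumes attId is the primary key of tbl2 and that each (tagId, attId) pair appears only once in tbl3, so no GROUP BY is needed.

```python
import sqlite3

# SQLite stands in for MySQL; tag and attribute names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl1 (tagId INTEGER PRIMARY KEY, tagName TEXT);
    CREATE TABLE tbl2 (attId INTEGER PRIMARY KEY, attName TEXT);
    CREATE TABLE tbl3 (tagId INTEGER, attId INTEGER);
    INSERT INTO tbl1 VALUES (4, 'a'), (7, 'img');
    INSERT INTO tbl2 VALUES (50, 'href'), (89, 'target'), (114, 'rel'), (200, 'src');
    INSERT INTO tbl3 VALUES (4, 50), (4, 89), (4, 114), (7, 200);
""")

# Explicit join syntax: tbl1 is not needed, because tbl3 already holds tagId.
explicit_rows = conn.execute("""
    SELECT tbl2.attId, tbl3.tagId, tbl2.attName
    FROM tbl2
    INNER JOIN tbl3 ON tbl3.attId = tbl2.attId
    WHERE tbl3.tagId = 4
    ORDER BY tbl2.attId
""").fetchall()

# Comma-style (implicit) join across all three tables, as in the last answer.
implicit_rows = conn.execute("""
    SELECT tbl2.attId, tbl1.tagId, tbl2.attName
    FROM tbl3, tbl1, tbl2
    WHERE tbl1.tagId = tbl3.tagId
      AND tbl2.attId = tbl3.attId
      AND tbl3.tagId = 4
    ORDER BY tbl2.attId
""").fetchall()

print(explicit_rows)
```

Both forms return the same three rows for tag 4 on this data; the explicit form simply keeps the join condition next to the join and avoids touching tbl1.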