I select duplicate rows from a MySQL table using:
SELECT * FROM `table` GROUP BY `col1`,`col2` Having COUNT(`col1`)>1 and COUNT(`col2`)>1
Actual result
The above query returns the first duplicate entry. In the data above, row 1 and row 7 contain the same values in col1 and col2.
But I need to get the last duplicate entry (the highlighted duplicate row).
Expected Result
I need to get last duplicate entry.
How do you define the last duplicate? In a database table, records are not inherently ordered, and you did not say which column we should use for ordering.
If you want to order by col3, then you can just use aggregation, like so:
select col1, col2, max(col3) -- or min(col3)
from mytable
group by col1, col2
-- having count(*) > 1
-- uncomment the above line if you want to see only records for which a duplicate exists
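As a rough, runnable illustration of the aggregation above, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The id column and the sample rows are made up for the demonstration, and the HAVING line is enabled so that only groups with duplicates are returned.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 INTEGER)"
)
# Hypothetical sample data: ('a', 'x') appears three times.
conn.executemany(
    "INSERT INTO mytable (col1, col2, col3) VALUES (?, ?, ?)",
    [("a", "x", 10), ("b", "y", 20), ("a", "x", 30), ("c", "z", 40), ("a", "x", 25)],
)

# One row per (col1, col2) group that has duplicates, with the largest col3.
latest = conn.execute(
    """
    SELECT col1, col2, MAX(col3) AS col3
    FROM mytable
    GROUP BY col1, col2
    HAVING COUNT(*) > 1
    ORDER BY col1, col2
    """
).fetchall()
print(latest)
```

Note that this picks the largest col3 in each group, which is not necessarily the most recently inserted row.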
If you have some other column that you want to order by, say id, then you can filter with a correlated subquery:
select col1, col2, col3
from mytable t
where id = (
select max(id) from mytable t1 where t1.col1 = t.col1 and t1.col2 = t.col2
)
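To see how the correlated subquery behaves, here is a minimal sketch run through Python's sqlite3 module rather than MySQL. The auto-incrementing id and the sample rows are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 INTEGER)"
)
# Hypothetical sample data; ids 1..5 are assigned in insertion order.
conn.executemany(
    "INSERT INTO mytable (col1, col2, col3) VALUES (?, ?, ?)",
    [("a", "x", 10), ("b", "y", 20), ("a", "x", 30), ("c", "z", 40), ("a", "x", 25)],
)

# Keep only the row with the highest id within each (col1, col2) group.
last_rows = conn.execute(
    """
    SELECT col1, col2, col3
    FROM mytable t
    WHERE id = (
        SELECT MAX(id) FROM mytable t1
        WHERE t1.col1 = t.col1 AND t1.col2 = t.col2
    )
    ORDER BY id
    """
).fetchall()
print(last_rows)
```

For the ('a', 'x') group this returns the row with col3 = 25 (the last one inserted), whereas MAX(col3) would have returned 30.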
I have a query in MYSQL like this (simplified) :
SELECT col1, SUM(DISTINCT col2) AS S
FROM tbl1
WHERE col1='abbc'
GROUP BY col1
ORDER BY S ASC
I know that an index on col1 would be useful for that kind of query. I would like to know if a covering index on (col1, col2) would be more useful or if it doesn't make any difference.
I tried it, and it does seem to make a difference: the covering index looks more useful.
Execution plan with the index:
Without DISTINCT:
SELECT col1, SUM(col2) AS S
FROM tbl1
WHERE col1='abbc'
GROUP BY col1
ORDER BY S ASC;
With DISTINCT:
SELECT col1, SUM(distinct col2) AS S
FROM tbl1
WHERE col1='abbc'
GROUP BY col1
ORDER BY S ASC;
SQL Fiddle
Execution plan without the index:
In this case the plans are no different.
SQL Fiddle
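One way to check this kind of thing yourself is to compare plans with and without the second column in the index. The sketch below does that with Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN, which is only an analogy: MySQL's planner is different, and there you would run EXPLAIN and look for "Using index" in the Extra column. The index names, the col3 padding column and the sample rows are made up.

```python
import sqlite3

QUERY = """
    SELECT col1, SUM(DISTINCT col2) AS S
    FROM tbl1
    WHERE col1 = 'abbc'
    GROUP BY col1
    ORDER BY S ASC
"""

def plan_details(index_sql):
    """Build a fresh table with one index and return the plan's detail strings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl1 (col1 TEXT, col2 INTEGER, col3 TEXT)")
    rows = [("abbc", n % 5, "pad") for n in range(100)]
    rows += [("zzz", n, "pad") for n in range(100)]
    conn.executemany("INSERT INTO tbl1 VALUES (?, ?, ?)", rows)
    conn.execute(index_sql)
    plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable detail text.
    return [row[-1] for row in plan]

single = plan_details("CREATE INDEX idx_col1 ON tbl1 (col1)")
covering = plan_details("CREATE INDEX idx_col1_col2 ON tbl1 (col1, col2)")
print(single)
print(covering)
```

With the (col1, col2) index the query can be answered from the index alone, without touching the table rows, which is what "covering" means here.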
I know it can be done with UNION, but that is kind of repetitive -- so is there a way to split every multi-column row into multiple single-column rows?
Example:
SPLIT(SELECT col1,col2,col3 FROM tbl);
SPLIT is my imaginary function, for:
SELECT col1 FROM tbl
UNION
SELECT col2 FROM tbl
UNION
SELECT col3 FROM tbl;
So, is there such UNION/SPLIT equivalent?
First, you should use union all -- unless you intend to incur the overhead to remove duplicates:
SELECT col1 FROM tbl
UNION ALL
SELECT col2 FROM tbl
UNION ALL
SELECT col3 FROM tbl;
The above requires scanning the table 3 times. You can scan the table just once using CROSS JOIN, but the logic is a little more cumbersome:
SELECT (CASE col WHEN 1 THEN col1 WHEN 2 THEN col2 WHEN 3 THEN col3 END) as col
FROM tbl CROSS JOIN
(SELECT 1 as col UNION ALL SELECT 2 UNION ALL SELECT 3) x;
I should note that other databases support unpivot and lateral joins, either of which can be used for this purpose. However, MySQL has no UNPIVOT, and it only gained lateral derived tables in version 8.0.14.
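The two approaches above can be compared side by side. This is a small sketch using Python's sqlite3 module in place of MySQL, with made-up sample rows; the derived table is aliased n here instead of x purely for readability.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (col1 TEXT, col2 TEXT, col3 TEXT)")
# Hypothetical sample data; the value 'a' occurs twice across columns.
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", [("a", "b", "c"), ("d", "e", "a")])

# Three scans of the table, duplicates kept.
union_all = conn.execute(
    """
    SELECT col1 FROM tbl
    UNION ALL SELECT col2 FROM tbl
    UNION ALL SELECT col3 FROM tbl
    """
).fetchall()

# One scan: each row is repeated three times and CASE picks a column per copy.
cross_join = conn.execute(
    """
    SELECT CASE n.col WHEN 1 THEN col1 WHEN 2 THEN col2 WHEN 3 THEN col3 END AS val
    FROM tbl CROSS JOIN
         (SELECT 1 AS col UNION ALL SELECT 2 UNION ALL SELECT 3) AS n
    """
).fetchall()

# Plain UNION removes the duplicate 'a'.
union_distinct = conn.execute(
    "SELECT col1 FROM tbl UNION SELECT col2 FROM tbl UNION SELECT col3 FROM tbl"
).fetchall()

print(sorted(union_all), sorted(cross_join), sorted(union_distinct))
```

Both UNION ALL and the CROSS JOIN return the same six values (row order aside), while plain UNION returns five.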
In a stored procedure, I need to INSERT the result of a long UNION into a temp table.
The WHERE clause is the same for every table: col1 must be in the result of a SELECT DISTINCT.
Simplified for readability, it goes like this:
INSERT INTO #MyTemp
SELECT col1, col2, col3 FROM tab1 WHERE col1 in (SELECT DISTINCT myId FROM TabIds) UNION
SELECT col1, col2, col3 FROM tab2 WHERE col1 in (SELECT DISTINCT myId FROM TabIds) UNION
SELECT col1, col2, col3 FROM tab3 WHERE col1 in (SELECT DISTINCT myId FROM TabIds) UNION
.
.
.
SELECT col1, col2, col3 FROM tab20 WHERE col1 in (SELECT DISTINCT myId FROM TabIds)
Although TabIds is a small temp table, typically 3-6 records long, this seems to be pretty inefficient.
Is there a better way to do this?
Summarizing my question:
Is there a way I can do SELECT DISTINCT myId FROM TabIds just once and assign it to a kind of array/list/set (not to another temp table) and just use that in the WHERE clauses, and if there is a way, does it really matter for such a small (3-6 recs) temp table?
I'm ignoring your requirement ("not to another temp table") because I don't believe it is well-founded. Try and see if this solution gives you better performance:
SELECT i = myId
INTO #x
FROM dbo.TabIds -- please always use schema prefix
GROUP BY myId;
CREATE UNIQUE CLUSTERED INDEX x ON #x(i);
INSERT INTO #MyTemp(col1, col2, col3)
SELECT col1, col2, col3
FROM
(
SELECT col1, col2, col3 FROM dbo.tab1 WHERE EXISTS -- likely better than IN
(SELECT 1 FROM #x WHERE i = tab1.col1)
UNION ALL
SELECT col1, col2, col3 FROM dbo.tab2 WHERE EXISTS
(SELECT 1 FROM #x WHERE i = tab2.col1)
UNION ALL
...
UNION ALL
SELECT col1, col2, col3 FROM dbo.tab20 WHERE EXISTS
(SELECT 1 FROM #x WHERE i = tab20.col1)
) AS x
GROUP BY col1, col2, col3; -- likely more efficient than `UNION` to remove dupes
Of course this will work best if col1 is indexed in all 20 tables, and if that index includes col2 and col3.
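The shape of the query above can be tried out on a small scale. The following sketch uses Python's sqlite3 module, so it is not T-SQL: an ordinary table x_ids stands in for #x, three tables stand in for the twenty, and the sample rows are invented. It is meant only to show the EXISTS filter and the GROUP BY de-duplication working together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
tables = ["tab1", "tab2", "tab3"]  # stand-ins for tab1 ... tab20
for name in tables:
    conn.execute(f"CREATE TABLE {name} (col1 INTEGER, col2 TEXT, col3 TEXT)")
conn.execute("CREATE TABLE TabIds (myId INTEGER)")
conn.execute("CREATE TABLE MyTemp (col1 INTEGER, col2 TEXT, col3 TEXT)")

# Hypothetical sample data; TabIds contains a repeated id.
conn.executemany("INSERT INTO TabIds VALUES (?)", [(1,), (2,), (2,)])
conn.executemany("INSERT INTO tab1 VALUES (?, ?, ?)", [(1, "a", "b"), (9, "x", "y")])
conn.executemany("INSERT INTO tab2 VALUES (?, ?, ?)", [(1, "a", "b"), (2, "c", "d")])
conn.executemany("INSERT INTO tab3 VALUES (?, ?, ?)", [(2, "c", "d"), (3, "e", "f")])

# Materialize the distinct ids once and index them (plays the role of #x).
conn.execute("CREATE TABLE x_ids AS SELECT myId AS i FROM TabIds GROUP BY myId")
conn.execute("CREATE UNIQUE INDEX x_ids_i ON x_ids (i)")

# One EXISTS-filtered branch per source table, glued together with UNION ALL.
branches = " UNION ALL ".join(
    f"SELECT col1, col2, col3 FROM {t} "
    f"WHERE EXISTS (SELECT 1 FROM x_ids WHERE i = {t}.col1)"
    for t in tables
)
conn.execute(
    "INSERT INTO MyTemp (col1, col2, col3) "
    f"SELECT col1, col2, col3 FROM ({branches}) AS u "
    "GROUP BY col1, col2, col3"  # removes duplicates across branches
)
result = conn.execute("SELECT col1, col2, col3 FROM MyTemp ORDER BY col1").fetchall()
print(result)
```

Whether this is actually faster than the original on SQL Server is something to measure on the real tables; the sketch says nothing about performance.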
The reason I suggested a view is not because I thought it would make this code run faster. Just that you could create a view that generates this UNION for you, making this code simpler (and any other code that repeats this monotonous UNION). It was a suggestion for convenience, not for performance - though I need to make it clear that using a view does not magically make things slower. Sometimes it can, but that's a dangerous and illogical reason to avoid views.
Finally, I'd strongly consider normalization. Why are these 20 different tables in the first place, when they could all be in one single table?
CREATE TABLE dbo.Normal
(
SourceTableID INT,
col1 <data type>,
col2 <data type>,
col3 <data type>
);
-- indexes / constraints
INSERT dbo.Normal
SELECT 1, col1, col2, col3 FROM dbo.tab1
UNION ALL
SELECT 2, col1, col2, col3 FROM dbo.tab2
UNION ALL
...
UNION ALL
SELECT 20, col1, col2, col3 FROM dbo.tab20;
Now all your queries can simply reference this new table. If you will commonly look for only one of the sources (e.g. tab5), then indexing or partitioning on SourceTableID would be useful.
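As a rough sketch of what querying the consolidated table might look like, here is the same idea in Python's sqlite3 module (SQLite, not SQL Server). Two source tables stand in for the twenty, and the column types, index names and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tab1 (col1 INTEGER, col2 TEXT, col3 TEXT);
    CREATE TABLE tab2 (col1 INTEGER, col2 TEXT, col3 TEXT);
    CREATE TABLE TabIds (myId INTEGER);
    INSERT INTO tab1 VALUES (1, 'a', 'b'), (9, 'x', 'y');
    INSERT INTO tab2 VALUES (1, 'a', 'b'), (2, 'c', 'd');
    INSERT INTO TabIds VALUES (1), (2), (2);

    -- One table for all sources, tagged with where each row came from.
    CREATE TABLE Normal (SourceTableID INTEGER, col1 INTEGER, col2 TEXT, col3 TEXT);
    INSERT INTO Normal
    SELECT 1, col1, col2, col3 FROM tab1
    UNION ALL
    SELECT 2, col1, col2, col3 FROM tab2;

    -- Hypothetical indexes: one for the id lookup, one for per-source queries.
    CREATE INDEX ix_normal_col1 ON Normal (col1, col2, col3);
    CREATE INDEX ix_normal_source ON Normal (SourceTableID);
    """
)

# The long UNION collapses into a single filtered SELECT.
matching = conn.execute(
    """
    SELECT DISTINCT col1, col2, col3
    FROM Normal
    WHERE col1 IN (SELECT myId FROM TabIds)
    ORDER BY col1
    """
).fetchall()

# Looking at a single source is just a filter on SourceTableID.
only_tab2 = conn.execute(
    "SELECT col1, col2, col3 FROM Normal WHERE SourceTableID = 2 ORDER BY col1"
).fetchall()
print(matching, only_tab2)
```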
What you're doing, conceptually, is fine for one-offs and data loads. I hope this isn't part of a bigger pattern in production code, though.
What you're looking for is a Common Table Expression.
My T-SQL is a bit rusty, but with a CTE, your query would go something like:
WITH TabIds_CTE AS (SELECT DISTINCT myId FROM TabIds)
INSERT INTO #MyTemp
SELECT col1, col2, col3 FROM tab1 WHERE col1 IN (SELECT * FROM TabIds_CTE)
UNION ALL ...
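A complete, runnable version of that pattern might look like the sketch below. It uses Python's sqlite3 module, which also accepts a WITH clause in front of INSERT; two tables stand in for the twenty, an ordinary MyTemp table replaces #MyTemp, and UNION is used so duplicates are removed as in the question. Keep in mind that a CTE mainly removes the textual repetition; whether the engine evaluates it only once is up to the optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tab1 (col1 INTEGER, col2 TEXT, col3 TEXT);
    CREATE TABLE tab2 (col1 INTEGER, col2 TEXT, col3 TEXT);
    CREATE TABLE TabIds (myId INTEGER);
    CREATE TABLE MyTemp (col1 INTEGER, col2 TEXT, col3 TEXT);
    INSERT INTO tab1 VALUES (1, 'a', 'b'), (9, 'x', 'y');
    INSERT INTO tab2 VALUES (1, 'a', 'b'), (2, 'c', 'd');
    INSERT INTO TabIds VALUES (1), (2), (2);
    """
)

# The id list is written once in the CTE and referenced by every branch.
conn.execute(
    """
    WITH TabIds_CTE AS (SELECT DISTINCT myId FROM TabIds)
    INSERT INTO MyTemp
    SELECT col1, col2, col3 FROM tab1 WHERE col1 IN (SELECT myId FROM TabIds_CTE)
    UNION
    SELECT col1, col2, col3 FROM tab2 WHERE col1 IN (SELECT myId FROM TabIds_CTE)
    """
)
cte_rows = conn.execute("SELECT col1, col2, col3 FROM MyTemp ORDER BY col1").fetchall()
print(cte_rows)
```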
I think the following might be better for small tables, but still - it's a horrible idea to leave it like this in some production process :)
INSERT INTO #MyTemp (col1,col2,col3)
select distinct
x.col1,x.col2,x.col3
from (
SELECT col1, col2, col3 FROM tab1 union all
SELECT col1, col2, col3 FROM tab2 union all
SELECT col1, col2, col3 FROM tab3 union all
-- ...
SELECT col1, col2, col3 FROM tab20
) x
join (
SELECT DISTINCT myId FROM TabIds
) y
on x.col1=y.myid
I have the following query (MySQL):
SELECT col1, col2 FROM database1.table
WHERE col3 != ANY(SELECT col1 FROM database2.table)
ORDER BY this, that;
And I had hoped this would allow me to select col1 and col2 from a table in database1 where col3 (still in database1) does not equal anything from col1 in a table in database2.
Naturally, this won't work because SELECT col1 FROM database2.table returns more than one row: if col3 is equal to the value in row 1, then it's not equal to the value in row 2, so the row is still returned.
Any thoughts on how to do this the right way?
Use NOT IN
SELECT col1, col2 FROM database1.table
WHERE col3 NOT IN (SELECT col1 FROM database2.table)
ORDER BY this, that;
Keep in mind, though, that older MySQL versions (before 5.6) optimized subqueries like this poorly, so if there are a lot of records in database1.table this could be slow. A JOIN is often faster; there are a lot of examples on SO.
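For reference, the JOIN-based form is usually written as a LEFT JOIN that keeps only unmatched rows. The sketch below compares it with NOT IN using Python's sqlite3 module; the tables t1 and t2 are hypothetical stand-ins for database1.table and database2.table, and the rows are made up. It also shows a well-known difference: a NULL in the subquery column makes NOT IN return nothing, while the anti-join is unaffected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t1 (col1 TEXT, col2 INTEGER, col3 INTEGER);  -- database1.table
    CREATE TABLE t2 (col1 INTEGER);                           -- database2.table
    INSERT INTO t1 VALUES ('a', 1, 10), ('b', 2, 20), ('c', 3, 30);
    INSERT INTO t2 VALUES (10), (30);
    """
)

NOT_IN = "SELECT col1, col2 FROM t1 WHERE col3 NOT IN (SELECT col1 FROM t2)"
ANTI_JOIN = """
    SELECT t1.col1, t1.col2
    FROM t1
    LEFT JOIN t2 ON t2.col1 = t1.col3
    WHERE t2.col1 IS NULL
"""

not_in_rows = conn.execute(NOT_IN).fetchall()
anti_join_rows = conn.execute(ANTI_JOIN).fetchall()

# Add a NULL to the lookup column and run both queries again.
conn.execute("INSERT INTO t2 VALUES (NULL)")
not_in_with_null = conn.execute(NOT_IN).fetchall()
anti_join_with_null = conn.execute(ANTI_JOIN).fetchall()

print(not_in_rows, anti_join_rows, not_in_with_null, anti_join_with_null)
```

If the subquery column can contain NULLs, either filter them out in the subquery or prefer the anti-join (or NOT EXISTS).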
WHERE col3 NOT IN (SELECT col1 FROM database2.table)
You can use the NOT IN operator for this:
SELECT col1, col2 FROM database1.table
WHERE col3 NOT IN (SELECT col1 FROM database2.table)
ORDER BY this, that;
Just use ALL instead of ANY:
SELECT col1, col2 FROM database1.table
WHERE col3 != ALL(SELECT col1 FROM database2.table)
ORDER BY this, that;
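The difference between != ANY and != ALL can be mimicked with Python's built-in any() and all(). This is only an analogy for the quantifier logic: the helper names are made up, and it ignores SQL's NULL handling.

```python
def not_equal_any(value, candidates):
    """Mimics `value != ANY (subquery)`: true if value differs from at least one row."""
    return any(value != c for c in candidates)


def not_equal_all(value, candidates):
    """Mimics `value != ALL (subquery)`: true only if value differs from every row."""
    return all(value != c for c in candidates)


subquery_rows = [10, 30]  # hypothetical values of col1 in database2.table

# 10 is present in the list, yet it differs from 30, so != ANY still accepts it.
print(not_equal_any(10, subquery_rows))
# != ALL rejects 10 because it equals one of the rows.
print(not_equal_all(10, subquery_rows))
# 20 differs from every row, so != ALL accepts it (same as NOT IN).
print(not_equal_all(20, subquery_rows))
```

This is why the original query returned almost every row: with two or more distinct values in the subquery, every col3 differs from at least one of them.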