I am trying to select duplicate rows from a series of MySQL tables. The following query...
SELECT *
FROM table_name
WHERE column_name
IN (SELECT *
FROM (SELECT column_name
FROM table_name
GROUP BY column_name
HAVING COUNT(*) > 1
) AS subquery
);
...is producing wildly different performance when run against different tables with identical schemas and similar numbers of rows. On one table it executes within a few seconds; on another, with identical data types and a similar number of rows, it hangs for an extended period (currently at 30 minutes and counting). What possible explanations are there for such a discrepancy?
EDIT - EXPLAIN shows "Impossible WHERE noticed after reading const tables" for the dependent subquery in all of the queries. This is probably a good time to mention that there are no indexes on any of the tables (which I inherited...). The point of this entire snipe hunt is to find duplicate values in what is supposed to be a uniqid column so that I can turn it into a proper primary key.
I'd suggest splitting the subquery out into a temporary table.
CREATE TEMPORARY TABLE IF NOT EXISTS DupeColumn AS (
SELECT column_name
FROM table_name
GROUP BY column_name
HAVING COUNT(*) > 1
);
SELECT t.*
FROM DupeColumn dc
INNER JOIN table_name t
ON dc.column_name = t.column_name;
DROP TEMPORARY TABLE DupeColumn;
In my experience, MySQL is very poor at optimizing queries of this form:
SELECT *
FROM table1
WHERE col1 in (SELECT col2 FROM table2 WHERE ...)
Instead of performing the subquery once and then looking up all the col2 values in table1, it performs a full scan of table1 and then searches for col1 in table2.col2.
It does better when you write a JOIN:
SELECT table1.*
FROM table1
JOIN table2 ON table1.col1 = table2.col2
In your case, this would be done using a subquery for table2:
SELECT t1.*
FROM table_name AS t1
JOIN (SELECT column_name
FROM table_name
GROUP BY column_name
HAVING COUNT(*) > 1) AS t2
ON t1.column_name = t2.column_name
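Since the question mentions that the tables have no indexes at all, the GROUP BY and the join both have to scan the whole table. A minimal sketch of the indexing steps, using the placeholder names table_name and column_name from the question; the index name idx_column_name is hypothetical:

```sql
-- Add a plain (non-unique) index first so the duplicate search can use it.
ALTER TABLE table_name ADD INDEX idx_column_name (column_name);

-- Once the duplicates have been resolved, promote the column to a primary key.
-- This statement fails with a duplicate-key error if any duplicates remain.
ALTER TABLE table_name ADD PRIMARY KEY (column_name);
```

After the primary key exists, the plain index is redundant and can be dropped.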
I have a table with 2 columns; each column is a FK to the same entity.
Col1 and col2 are unique.
I'm looking to create a query that recursively attempts a self join from col2 -> col1, matching the ID in col2 of one row to the ID in col1 of another row.
I cannot fathom this further than:
select *
from table as t1
join table as t2 on t1.col2 = t2.col1
That query only does a single join, but I'd like to keep joining for as long as it succeeds, to retrieve the total number of successful joins.
It's not possible for me to write the joins manually, because there could be none, one, or many.
You can use a recursive CTE to achieve this (MySQL 8.0 or later).
Sample:
WITH RECURSIVE cte (id) AS (
    SELECT 1 AS id FROM dual
    UNION ALL
    SELECT id + 1 FROM cte WHERE id < 10
)
SELECT * FROM cte;
Here the values are generated recursively until id reaches 10.
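Applied to the self-join in the question, the same idea might look like the sketch below. The table name links is hypothetical (the question only calls it "table", which is a reserved word); col1 and col2 are the columns from the question, and start_id and depth are illustrative names:

```sql
WITH RECURSIVE chain (start_id, col1, col2, depth) AS (
    -- Anchor: every row starts its own chain with zero joins so far.
    SELECT col1, col1, col2, 0
    FROM links
    UNION ALL
    -- Recursive step: follow col2 of the current row to col1 of the next row.
    SELECT c.start_id, l.col1, l.col2, c.depth + 1
    FROM chain AS c
    JOIN links AS l ON l.col1 = c.col2
)
-- The deepest level reached from each starting row is its number of successful joins.
SELECT start_id, MAX(depth) AS successful_joins
FROM chain
GROUP BY start_id;
```

This assumes the data contains no cycles. If a chain loops back on itself, the recursion keeps going until MySQL aborts the statement at cte_max_recursion_depth (1000 by default).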
How do I go about selecting COUNT(*)s from multiple tables in MySQL?
Such as:
SELECT COUNT(*) AS table1Count FROM table1 WHERE someCondition
JOIN??
SELECT COUNT(*) AS table2Count FROM table2 WHERE someCondition
CROSS JOIN? subqueries?
SELECT COUNT(*) AS table3Count FROM table3 WHERE someCondition
Edit:
The goal is to return this:
+-------------+-------------+-------------+
| table1Count | table2Count | table3Count |
+-------------+-------------+-------------+
|          14 |          27 |           0 |
+-------------+-------------+-------------+
You can do it by using subqueries, one subquery for each tableCount :
SELECT
(SELECT COUNT(*) FROM table1 WHERE someCondition) as table1Count,
(SELECT COUNT(*) FROM table2 WHERE someCondition) as table2Count,
(SELECT COUNT(*) FROM table3 WHERE someCondition) as table3Count
You can do this with subqueries, e.g.:
select (SELECT COUNT(*) FROM table1 WHERE someCondition) as table1Count,
(SELECT COUNT(*) FROM table2 WHERE someCondition) as table2Count
Here is a simple approach to get just the row counts from multiple tables, if there are no conditions on specific tables.
Note:
For InnoDB this count is an approximation. However, for MyISAM the count is accurate.
Quoted from the docs:
The number of rows. Some storage engines, such as MyISAM, store the
exact count. For other storage engines, such as InnoDB, this value is
an approximation, and may vary from the actual value by as much as 40%
to 50%. In such cases, use SELECT COUNT(*) to obtain an accurate
count.
Using the information_schema.tables table you can use:
SELECT
table_name,
table_rows
FROM
information_schema.tables
WHERE
table_name like 'my_table%';
Output:
table_name table_rows
my_table_1 0
my_table_2 15
my_table_3 30
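You can use UNION ALL (plain UNION removes duplicate rows, so two tables with the same count would collapse into a single row):
SELECT COUNT(*) FROM table1 WHERE someCondition
UNION ALL
SELECT COUNT(*) FROM table2 WHERE someCondition
UNION ALL
SELECT COUNT(*) FROM table3 WHERE someCondition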
You can do this in this way.
SELECT (select count(*) from table1) + (select count(*) from table2) as total_rows
You can add as many tables as you want.
Try changing to:
SELECT
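-- MySQL does not accept COUNT(table1.*); id is an assumed primary key column in each table
COUNT(DISTINCT table1.id) as t1,
COUNT(DISTINCT table2.id) as t2,
COUNT(DISTINCT table3.id) as t3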
FROM table1
LEFT JOIN table2 ON condition
LEFT JOIN table3 ON condition
Not directly, but it works...
1.- Create a list of the tables you need to count rows.
2.- Put that list in the first column of a spreadsheet.
3.- In the second column, first row, use this formula:
="SELECT '"&A1&"', (SELECT count(*) FROM "&A1&") UNION"
4.- Fill down the formula.
5.- Copy the second column and paste it to do the query.
6.- Don't forget to remove the last UNION.
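For illustration, if column A of the spreadsheet held the two hypothetical table names table1 and table2, the pasted result (with the last UNION removed) would look roughly like this:

```sql
-- One generated line per table name; the trailing UNION on the last line has been removed.
SELECT 'table1', (SELECT count(*) FROM table1) UNION
SELECT 'table2', (SELECT count(*) FROM table2)
```

Each row of the result pairs a table name with its row count.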
I have two MySQL tables that have the exact same structure and mostly the same data. Some of the rows would be different between the two because my client updated the old website instead of the new website. There are hundreds of records and a column is not in place for the last modified date. I have created a new database on localhost and imported the old and new tables. All of the rows of data will need to be compared and differences between the old and new databases will need to be returned. Once the differences are identified, would there be a way to easily migrate the updated data from the old table to the new table? I am a MySQL novice, but I can usually muddle my way through issues. Thanks in advance for your assistance.
I have been looking at the following code, but I am not sure if it is the best answer.
SELECT *,'table_1' AS o FROM table_1
UNION
SELECT *,'table_2' AS o FROM table_2
WHERE some_id IN (
SELECT some_id
FROM (
SELECT * FROM table_1
UNION
SELECT * FROM table_2
) AS x
GROUP BY some_id
HAVING COUNT(*) > 1
)
ORDER BY some_id, o;
This should do the trick. The subselect in the WHERE clause finds the primary keys of all rows where every value is the same across both tables. Rows with those primary keys are then excluded from the unioned result set. How you go about reconciling the differences is a totally different story :)
SELECT * FROM (
SELECT *, 'table 1' FROM table_1
UNION ALL
SELECT *, 'table 2' FROM table_2
) AS combined
WHERE combined.primary_key_field
NOT IN (
SELECT t1.primary_key_field
FROM table_1 AS t1
INNER JOIN table_2 AS t2
ON t1.primary_key_field = t2.primary_key_field
AND t1.some_other_field = t2.some_other_field
AND ... /* join on all fields in tables */
)
A single INSERT INTO ... SELECT query will do (this copies only the rows whose some_id is missing from table_new):
insert into table_new
select * from table_old
where some_id NOT IN (select some_id from table_new)
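For rows that exist in both tables but differ, a multi-table UPDATE is one possible way to carry the old values across. This is only a sketch under stated assumptions: table_old is treated as authoritative, some_id identifies the same record in both tables, and some_other_field stands in for each column that needs to be compared and copied:

```sql
UPDATE table_new AS n
JOIN table_old AS o ON o.some_id = n.some_id
SET n.some_other_field = o.some_other_field
-- <=> is MySQL's NULL-safe equality, so a NULL on one side still counts as a difference.
WHERE NOT (n.some_other_field <=> o.some_other_field);
```

It is worth running the comparison query first and reviewing the differences before updating anything, since the old row is not always the one with the newer data.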
I am currently using this query. Is there an alternative that would run faster?
SELECT
SUM(result1),
SUM(result2),
SUM(result3)
FROM (
(
SELECT
0 as result1,0 as result2,COUNT(*) as result3
FROM
table1
)
UNION
(
SELECT
count(*) as result1,0 as result2,0 as result3
FROM
table2
)
UNION
(
SELECT
0 as result1,count(*) as result2,0 as result3
FROM
table3
)
) as allresult
An alternative to the query above:
SELECT (SELECT COUNT(1) FROM table2) AS result1,
(SELECT COUNT(1) FROM table3) AS result2,
(SELECT COUNT(1) FROM table1) AS result3;
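On SQL Server (sys.dm_db_partition_stats and sys.tables are SQL Server system views and are not available in MySQL), add the table names to the WHERE clause and execute the query below: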
SELECT
T.Name AS TableName,
S.Row_count AS RecordsCount
FROM
sys.dm_db_partition_stats S
INNER JOIN sys.tables T ON T.object_id = S.object_id
Where
Object_Name(S.Object_Id) IN ('Employees','Country')
A very simple way to shave some load off this query:
Use UNION ALL instead of UNION. UNION ALL returns duplicates if there are any; the only difference from plain UNION, which you are using, is that UNION removes those duplicates at the cost of performance. In other words, it does a UNION ALL and then goes back and removes the duplicate entries.
It should improve your query's performance.
(Copying my comment from this answer)
You can get the row counts for a table from the INFORMATION_SCHEMA as follows (but see caveat below):
SELECT table_rows
FROM information_schema.tables
WHERE table_schema = DATABASE()
AND table_name IN ('table1', 'table2', 'table3');
However, the MySQL documentation notes that these values are not exact for InnoDB tables: "For InnoDB tables, the row count is only a rough estimate used in SQL optimization. (This is also true if the InnoDB table is partitioned.)". If you are using MyISAM, this approach may be sufficient.
I am trying to get all rows from Table1 that are not present in Table2, using a NOT IN query in MySQL. But I am getting a timeout exception and the query never completes. Below is the MySQL query I am using.
SELECT * FROM identity WHERE
unique_id NOT IN (SELECT Message_Queue.index FROM Message_Queue);
Could anyone please tell me the reason, or suggest a replacement for the NOT IN operation?
When you have so many records in the IN() clause, you should use a join instead:
SELECT t1.*
FROM table1 t1
left join table2 t2 on t2.myId = t1.myId
where t2.myId is null
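Written against the tables from the question, the same anti-join might look like the sketch below. The aliases i and mq are illustrative, and index is backtick-quoted because it is a reserved word in MySQL:

```sql
SELECT i.*
FROM identity AS i
LEFT JOIN Message_Queue AS mq ON mq.`index` = i.unique_id
-- Keep only the identity rows for which no matching Message_Queue row was found.
WHERE mq.`index` IS NULL;
```

One behavioural difference to be aware of: if Message_Queue.index contains any NULL values, the original NOT IN query returns no rows at all, whereas this join still returns the unmatched rows.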
Because NOT IN performs poorly in MySQL, try using NOT EXISTS:
SELECT *
FROM identity a
WHERE NOT EXISTS
(
SELECT null
FROM Message_Queue b
WHERE b.index = a.unique_id
);
You should also put an index on those columns.
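A minimal sketch of those indexes, assuming neither column is indexed yet; the index names are hypothetical:

```sql
-- Index the column on the outer table.
CREATE INDEX idx_identity_unique_id ON identity (unique_id);

-- Index the lookup column on the inner table; index is a reserved word, hence the backticks.
CREATE INDEX idx_message_queue_index ON Message_Queue (`index`);
```

If either column is a TEXT or BLOB type, MySQL requires a prefix length in the index definition, for example (unique_id(32)).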