I have a table:
--id---name---col1--col2--col3...-colN--created.
--1---myName---col1--col2--col3...-colN--created1.
--2---myName---col1--col2--col3...-colN--created2.
--3---myOtherName---Othercol1--Othercol2--Othercol3...-OthercolN--created3.
The id and created fields are unique.
The rest of the rows have duplicates: exactly the same set of values (name+col1+col2+col3+..+colN).
However, a few rows are completely unique. How can I find them (row 3 in my example)?
You can use NOT EXISTS and a correlated subquery selecting rows from the same table with a different ID but equal values.
SELECT *
FROM elbat t1
WHERE NOT EXISTS (SELECT *
FROM elbat t2
WHERE t2.id <> t1.id
AND t2.col1 = t1.col1
AND t2.col2 = t1.col2
AND t2.col3 = t1.col3
...
AND t2.coln = t1.coln);
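To sanity-check the NOT EXISTS approach on a tiny data set, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table layout (only name, col1 and col2 as the compared columns) and the sample values are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE elbat (id INTEGER PRIMARY KEY, name TEXT,
                        col1 TEXT, col2 TEXT, created TEXT);
    INSERT INTO elbat VALUES
        (1, 'myName',      'a', 'b', '2020-01-01'),
        (2, 'myName',      'a', 'b', '2020-01-02'),
        (3, 'myOtherName', 'x', 'y', '2020-01-03');
""")

# Keep a row only if no other row (different id) has the same values.
unique_ids = [row[0] for row in conn.execute("""
    SELECT t1.id
    FROM elbat t1
    WHERE NOT EXISTS (SELECT 1
                      FROM elbat t2
                      WHERE t2.id <> t1.id
                        AND t2.name = t1.name
                        AND t2.col1 = t1.col1
                        AND t2.col2 = t1.col2)
    ORDER BY t1.id
""")]
print(unique_ids)
```

Note that = never matches NULL values, so nullable columns would need a NULL-safe comparison (<=> in MySQL) for two rows with NULLs to count as duplicates.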
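You can group by the fields that must be unique and then select the rows where the count equals one.
SELECT *
FROM
mytable
INNER JOIN (
-- COUNT(*) = 1 means the group holds a single row, so MIN(id) is that row's id
SELECT MIN(id) AS id
FROM
mytable
GROUP BY
-- list every column that defines a duplicate here
col1, col2, col3
HAVING
COUNT(*) = 1
) t
ON mytable.id = t.id;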
There are a number of solutions. Depending on the amount of data and performance requirements you could add indexes and test a couple of solutions to get the optimal results.
I need to run a selection on MySQL 5.7.22 in a single query.
select id from t1 where type_id=(select type_id from t2 where id=1 limit 1) and id not in
(select obj_id from t2
where
type_id = (select type_id from t2 where id=1 limit 1)
and
type2_id = (select type2_id from t2 where id=1 limit
...
)
I have some duplicate subqueries in the WHERE clause (this is only part of the query; the subquery is repeated many times):
'(select type_id from t2 where id=1 limit 1)'
Can I somehow factor it out into one place to reduce the verbosity?
So I want to select once
select type_id, type2_id from t2 where id=1 limit 1
and make type_id, type2_id available in all query context.
I know mysql 8.0 has WITH syntax, but I am using 5.7.22
I want to do this in one query without transactions.
It's hard to give you complete advice without seeing more of your query. But you have some choices.
You could try creating a view as follows then using it.
CREATE VIEW selector
AS SELECT MAX(type_id) type_id, MAX(obj_id) obj_id
FROM t2
WHERE id = 1
It looks possible that the t2 query returns multiple rows. This view deals with that by using MAX() instead of LIMIT 1. But if t2.id is a primary key, then all you need is
CREATE VIEW selector
AS SELECT type_id, obj_id
FROM t2
WHERE id = 1
Then you can use the view in your query.
For example
SELECT id
FROM t1
WHERE type_id = (SELECT type_id FROM selector)
AND obj_id <> (SELECT obj_id FROM selector)
Or you could figure out how to use join operations rather than subqueries.
SELECT id
FROM t1
JOIN selector ON t1.type_id = selector.type_id AND t1.obj_id <> selector.obj_id
Try this:
select t1.id from t1, (select type_id, type2_id from t2 where id=1 limit 1) t where t1.type_id=t.type_id and t1.id not in
(select obj_id from t2
where
t2.type_id = t.type_id
and
t2.type2_id = t.type2_id
...
)
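The derived-table idea can be tried end to end with the following minimal sketch, which uses Python's built-in sqlite3 module as a stand-in for MySQL 5.7. The column layout of t1 and t2, the alias sel, and the sample rows are assumptions for illustration; only the names type_id, type2_id and obj_id come from the question.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, type_id INTEGER);
    CREATE TABLE t2 (id INTEGER PRIMARY KEY, obj_id INTEGER,
                     type_id INTEGER, type2_id INTEGER);
    INSERT INTO t1 VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO t2 VALUES (1, 10, 1, 5), (2, 11, 1, 6), (3, 12, 2, 5);
""")

# The selector row is computed once as a derived table ("sel") and then
# referenced by name everywhere the repeated subquery used to appear.
ids = [row[0] for row in conn.execute("""
    SELECT t1.id
    FROM t1
    JOIN (SELECT type_id, type2_id FROM t2 WHERE id = 1 LIMIT 1) sel
      ON t1.type_id = sel.type_id
    WHERE t1.id NOT IN (SELECT t2.obj_id
                        FROM t2
                        WHERE t2.type_id = sel.type_id
                          AND t2.type2_id = sel.type2_id)
    ORDER BY t1.id
""")]
print(ids)
```

Qualifying every column with its table or alias avoids ambiguity, since both t1 and the derived table expose a type_id column.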
I have two tables with a one-to-many relationship.
Table1
ID name email
Table2
ID table1_ID date
I need to get all the data from Table1 where :
MAX(date) from Table2 < "2016-01-01"
This doesn't work: MAX() is rejected as invalid in the WHERE clause. What I did was:
SELECT Table1.name, Table1.email, tmp.maxdate
FROM Table1
JOIN ( SELECT MAX(date) maxdate, table1_ID
FROM Table2
GROUP BY table1_ID ) as tmp
ON tmp.table1_ID = table1.id
WHERE tmp.maxdate < "2016-01-01"
AND (other conditions)
So this works, but I think the performance is going to be awful: EXPLAIN shows that all of Table2 is being read, and this table will grow a lot.
Any idea on how I could do it otherwise, or how to improve my current query's performance?
Try:
SELECT Table1.name, Table1.email, tmp.maxdate
FROM Table1
INNER JOIN ( SELECT MAX(date) maxdate, table1_ID
FROM Table2
GROUP BY table1_ID
HAVING maxdate < "2016-01-01" ) as tmp
ON tmp.table1_ID = Table1.id
WHERE (other conditions)
Before, you were bringing back every group from Table2 and joining it with Table1. This discards all groups whose maxdate is not < "2016-01-01" inside the subquery, and only then joins the rest with Table1.
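As a quick check of filtering on the aggregate with HAVING (using the question's condition MAX(date) < "2016-01-01"), here is a minimal sketch on Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE Table2 (id INTEGER PRIMARY KEY, table1_id INTEGER, date TEXT);
    INSERT INTO Table1 VALUES (1, 'ann', 'ann@example.com'),
                              (2, 'bob', 'bob@example.com');
    INSERT INTO Table2 VALUES (1, 1, '2015-03-01'), (2, 1, '2015-11-30'),
                              (3, 2, '2015-06-01'), (4, 2, '2016-02-01');
""")

# HAVING filters whole groups after MAX() has been computed,
# which a WHERE clause cannot do.
rows = conn.execute("""
    SELECT Table1.name, tmp.maxdate
    FROM Table1
    JOIN (SELECT table1_id, MAX(date) AS maxdate
          FROM Table2
          GROUP BY table1_id
          HAVING MAX(date) < '2016-01-01') AS tmp
      ON tmp.table1_id = Table1.id
""").fetchall()
print(rows)
```

Only ann qualifies here: bob has a row dated 2016-02-01, so his group's maximum is not before the cutoff.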
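First of all, don't guess: test it yourself and check.
Secondly, you can try NOT EXISTS(), which might be slightly faster because it can use an index on Table2 and needs no GROUP BY clause. MAX(date) < "2016-01-01" holds exactly when no row for that Table1 id has date >= "2016-01-01":
SELECT * FROM Table1 t1
WHERE NOT EXISTS(SELECT 1 FROM Table2 t2
WHERE t2.date >= "2016-01-01"
AND t1.id = t2.table1_id)
AND <Other Conditions>
Unlike the join, this also returns Table1 rows that have no rows in Table2 at all; add an EXISTS() check if those must be excluded.
Do not simply add a date filter to the WHERE clause of your grouped subquery: MAX(date) would then be computed over the filtered rows only, which changes the result.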
In addition, consider adding the following indexes:
Table1(id,name,email)
Table2(table1_id,date)
Note that I recommend these indexes based on the query you provided; if there are extra conditions, these indexes might not be complete.
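Since MAX(date) < "2016-01-01" is equivalent to "no row on or after 2016-01-01", an anti-join formulation backed by the suggested Table2(table1_id, date) index can be sketched as follows. This uses Python's built-in sqlite3 module as a stand-in for MySQL; the index name and the sample rows are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE Table2 (id INTEGER PRIMARY KEY, table1_id INTEGER, date TEXT);
    -- Composite index so the per-id date lookup does not scan all of Table2.
    CREATE INDEX idx_table2_owner_date ON Table2 (table1_id, date);
    INSERT INTO Table1 VALUES (1, 'ann', 'ann@example.com'),
                              (2, 'bob', 'bob@example.com'),
                              (3, 'cy',  'cy@example.com');
    INSERT INTO Table2 VALUES (1, 1, '2015-03-01'), (2, 1, '2015-11-30'),
                              (3, 2, '2015-06-01'), (4, 2, '2016-02-01');
""")

# No row on or after the cutoff means the maximum date is before the cutoff.
# Table1 rows without any Table2 rows (cy) also pass this test.
no_recent = [r[0] for r in conn.execute("""
    SELECT t1.name
    FROM Table1 t1
    WHERE NOT EXISTS (SELECT 1 FROM Table2 t2
                      WHERE t2.table1_id = t1.id
                        AND t2.date >= '2016-01-01')
    ORDER BY t1.id
""")]

# Adding EXISTS restricts the result to ids that have at least one Table2 row,
# which matches the behaviour of the inner join in the question.
no_recent_with_rows = [r[0] for r in conn.execute("""
    SELECT t1.name
    FROM Table1 t1
    WHERE EXISTS (SELECT 1 FROM Table2 t2 WHERE t2.table1_id = t1.id)
      AND NOT EXISTS (SELECT 1 FROM Table2 t2
                      WHERE t2.table1_id = t1.id
                        AND t2.date >= '2016-01-01')
    ORDER BY t1.id
""")]
print(no_recent, no_recent_with_rows)
```

Whether this actually beats the grouped join depends on the data distribution and the optimizer, so it is worth comparing the EXPLAIN output of both on real data.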
How do I display 1 in the SELECT results if a value exists in another table and 0 if not? Is it possible directly, or do I have to JOIN? And if it is only possible with a JOIN, what if my SELECT is really complicated and I want to LIMIT it before joining?
Let's say that table 1 and table 2 both contain a column named pid. I would like to select * from table 1, limit it to 100 results (limit 100), and add one column to the results indicating whether the pid of a result in table 1 is in table 2.
Try
SELECT t1.*, (t2.pid IS NOT NULL) exists_in_table2
FROM
(
SELECT *
FROM table1
ORDER BY pid
LIMIT 100
) t1 LEFT JOIN table2 t2
ON t1.pid = t2.pid
Try @peterm's solution; if you want to check more conditions, you can use a CASE expression:
SELECT t1.*,
case when t2.pid IS NULL then 0 else 1 end as exists_in_table2
FROM
(
SELECT *
FROM table1
ORDER BY pid
LIMIT 100
) t1 LEFT JOIN table2 t2
ON t1.pid = t2.pid
This is the same as what @peterm suggested; I just replaced the check with a CASE expression.
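The limit-first, join-second pattern can be checked on a small data set with the minimal sketch below, which uses Python's built-in sqlite3 module as a stand-in for MySQL. The label column, the sample rows, and the LIMIT of 2 (instead of 100) are assumptions for illustration, and pid is assumed to be unique in table2 so the LEFT JOIN cannot multiply rows.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (pid INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE table2 (pid INTEGER PRIMARY KEY);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (2), (3);
""")

# The derived table applies ORDER BY / LIMIT first; the LEFT JOIN then only
# has to look up the rows that survived the limit.
rows = conn.execute("""
    SELECT t1.pid, (t2.pid IS NOT NULL) AS exists_in_table2
    FROM (SELECT * FROM table1 ORDER BY pid LIMIT 2) t1
    LEFT JOIN table2 t2 ON t1.pid = t2.pid
    ORDER BY t1.pid
""").fetchall()
print(rows)
```

If pid can repeat in table2, an EXISTS subquery in the select list is a safer way to compute the flag, because it never produces more than one output row per table1 row.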
If I have 2 tables and want to find if they have the same data, what is the most straightforward way to do it in MySQL?
I have read about doing a correlated subquery and UNION ALL, but that query is about 2 pages long (!) and I cannot really follow what it is doing. There must be an easier way.
Even if it means, e.g., making MySQL copy the table data to files and running vimdiff (I am not sure that this is even possible, is it? Just thinking out loud).
UPDATE
I am interested only in the table data and not the structure. I am clarifying this because of an ambiguous comment I made.
If you just want to tell whether the tables are identical or not as efficiently as possible, use this query:
SELECT 1 FROM (
SELECT * FROM table1
UNION ALL
SELECT * FROM table2
) t
GROUP BY col1, col2, col3
HAVING count(*) = 1
LIMIT 1
List all the columns in GROUP BY to compare the entire table.
If the result is an empty set, the two tables are identical.
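A minimal sketch of this identity test, using Python's built-in sqlite3 module as a stand-in for MySQL, might look like this. The two-column layout, the helper name tables_differ, and the sample rows are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 INTEGER, col2 TEXT);
    CREATE TABLE table2 (col1 INTEGER, col2 TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO table2 VALUES (2, 'b'), (1, 'a');
""")

def tables_differ(conn):
    """Return True if some row appears exactly once across both tables."""
    row = conn.execute("""
        SELECT 1 FROM (
            SELECT col1, col2 FROM table1
            UNION ALL
            SELECT col1, col2 FROM table2
        ) t
        GROUP BY col1, col2
        HAVING COUNT(*) = 1
        LIMIT 1
    """).fetchone()
    return row is not None

differ_before = tables_differ(conn)   # same rows, different insert order
conn.execute("INSERT INTO table2 VALUES (3, 'c')")
differ_after = tables_differ(conn)    # table2 now has an extra row
print(differ_before, differ_after)
```

The check assumes neither table contains duplicate rows: a row that occurs twice in one table and never in the other would produce a count of 2 and go unnoticed.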
If you want to see the differences, use this query:
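-- each surviving group has exactly one row, so MIN(tname) is the table it came from
SELECT MIN(tname) tname, col1, col2, col3 FROM (
SELECT 'table1' tname, col1, col2, col3 FROM table1
UNION ALL
SELECT 'table2' tname, col1, col2, col3 FROM table2
) t
GROUP BY col1, col2, col3
HAVING count(*) = 1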
List the same columns in the inner SELECT as in the GROUP BY, plus a column to distinguish the two tables.
Just throwing this out there, you could emulate a full outer join and then return the rows where just the right or the left side is null.
select t1.*
from table1 t1
LEFT OUTER JOIN table2 t2
ON t1.col1 = t2.col1
AND t1.col2 = t2.col2
AND ...
WHERE t2.id is null
UNION
select t2.*
from table2 t2
LEFT OUTER JOIN table1 t1
ON t2.col1 = t1.col1
AND t2.col2 = t1.col2
AND ...
WHERE t1.id is null
The emulated FULL OUTER JOIN lets you show every row that has no matching row in the other table.
Use the following query:
SELECT c1 = cjoin AND c2 = cjoin equiv
FROM (SELECT COUNT(*) c1 FROM Table1) t1,
(SELECT COUNT(*) c2 FROM Table2) t2,
(SELECT COUNT(*) cjoin
FROM Table1 t1
JOIN Table2 t2
ON t1.col1 = t2.col1 AND t1.col2 = t2.col2 AND t1.col3 = t2.col3 ...) tjoin
Assuming the tables have a unique key, this will return equiv = 1 if the tables are equal. It doesn't show the differences, it's just a binary test.
I was reading the SQL Cookbook by A. Molinaro when I came across a solution.
It is based on a table
emp(empno,ename,job,mgr,hiredate,sal,comm,deptno)
and a view
V
which has the same columns but different rows. The columns mgr and comm might be NULL; the other columns cannot.
The solution in the book is very long and does not show all differences, although this was the stated problem in 3.7.
I came up with my own solution, which is shorter and shows all differences (meaning all rows which have different counts in the two tables).
select * from
# those which are contained in the (distinct) union of (col1,col2,...,coln, count) of both tables:
( select empno,ename,job,mgr,hiredate,comm,deptno, count(*) cnt from emp group by empno,ename,job,mgr,hiredate,comm,deptno
union
select empno,ename,job,mgr,hiredate,comm,deptno, count(*) cnt from V group by empno,ename,job,mgr,hiredate,comm,deptno
) as unionOfBoth
where (empno,ename,job,mgr,hiredate,comm,deptno,cnt)
not in
# those which are contained in the intersection of both tables with the equal number of counts:
( select e.empno,e.ename,e.job,e.mgr,e.hiredate,e.comm,e.deptno,e.cnt
from
(select empno, ename,job,mgr,hiredate,comm,deptno, count(*) cnt from emp group by empno,ename,job,mgr,hiredate,comm,deptno) e,
(select empno, ename,job,mgr,hiredate,comm,deptno, count(*) cnt from V group by empno,ename,job,mgr,hiredate,comm,deptno) v
where
e.empno = v.empno
and e.ename = v.ename
and e.job = v.job
and ifnull(e.mgr,0) = ifnull(v.mgr,0)
and e.hiredate = v.hiredate
and e.deptno = v.deptno
and ifnull(e.comm,0) = ifnull(v.comm,0)
and e.cnt = v.cnt
);
Basically you count the distinct rows in both tables and do a union (not union all) to get the temporary table unionOfBoth. Then you remove those rows which both tables have in common.
Here two rows r1 from table t1 and r2 from table t2 are considered the same if
(r1,count of r1 in t1) = (r2, count of r2 in t2), which is equivalent to r1=r2 (on all columns) and (count of r1 in t1) = (count of r2 in t2).
If the tables are small enough, you can export both tables as CSV files and then paste one table's rows side by side with the other's. You can then go row by row and see whether the outputs are the same.
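For small tables, the export-and-compare idea can also be automated instead of checked by eye. The following is a minimal sketch that uses Python's built-in sqlite3, csv and difflib modules, with SQLite standing in for MySQL; the helper name table_to_csv_lines, the two-column layout, and the sample rows are assumptions for illustration.

```python
import csv
import difflib
import io
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 INTEGER, col2 TEXT);
    CREATE TABLE table2 (col1 INTEGER, col2 TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO table2 VALUES (1, 'a'), (2, 'c');
""")

def table_to_csv_lines(conn, table):
    """Dump a table as CSV lines in a stable order (trusted table names only)."""
    cur = conn.execute(f"SELECT * FROM {table} ORDER BY 1, 2")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([col[0] for col in cur.description])  # header row
    writer.writerows(cur)
    return buf.getvalue().splitlines()

left = table_to_csv_lines(conn, "table1")
right = table_to_csv_lines(conn, "table2")

# Keep only the added/removed data lines, dropping the "---"/"+++" file headers.
changed = [
    line
    for line in difflib.unified_diff(left, right, "table1", "table2", lineterm="")
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print(changed)
```

Sorting both exports the same way matters: without a stable order, identical tables could still produce a non-empty diff.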