I have a large MySQL database with more than 1 million rows. How can I find the missing eid values?
+----+-----+
| id | eid |
+----+-----+
| 1 | 1 |
+----+-----+
| 2 | 2 |
+----+-----+
| 3 | 4 |
+----+-----+
I would like to list all missing eid values, which is 3 in this example. I've tried many things, but everything I try takes too much time.
I hope someone can help me.
Thanks
You can use NOT EXISTS to find the required rows.
create table t(id integer, eid integer);
insert into t values(1,1);
insert into t values(2,2);
insert into t values(3,4);
SELECT id
FROM t a
WHERE NOT EXISTS
( SELECT 1
FROM t b
WHERE b.eid = a.id );
or use NOT IN:
SELECT ID
FROM t
WHERE ID NOT IN
(SELECT EID
FROM t);
produces:
| id |
|----|
| 3 |
Try the below query
SELECT ID FROM table WHERE ID NOT IN(SELECT EID FROM table );
Finding duplicate numbers is easy:
select id, count(*) from sequence
group by id
having count(*) > 1;
In this case there are no duplicates, since I’m not concentrating on that in this post (finding duplicates is straightforward enough that I hope you can see how it’s done). I had to scratch my head for a second to find missing numbers in the sequence, though. Here is my first shot at it:
select l.id + 1 as start
from sequence as l
left outer join sequence as r on l.id + 1 = r.id
where r.id is null;
The idea is to exclusion join against the same sequence, but shifted by one position. Any number with an adjacent number will join successfully, and the WHERE clause will eliminate successful matches, leaving the missing numbers. Here is the result:
https://www.xaprb.com/blog/2005/12/06/find-missing-numbers-in-a-sequence-with-sql/
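As a minimal sketch, the shifted exclusion join can be tried on the eid data from the question. This uses an in-memory SQLite database through Python as a stand-in for MySQL, and the extra MAX(eid) filter is my own addition: without it, the successor of the largest eid is always reported as missing.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table and columns follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, eid INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 1), (2, 2), (3, 4)])

# Anti-join against the same column shifted by one:
# a row survives only when eid + 1 does not exist in the table.
# The MAX(eid) filter (an assumption added here) drops the last row,
# whose successor would otherwise always count as "missing".
gap_starts = [row[0] for row in conn.execute("""
    SELECT l.eid + 1 AS start
    FROM t AS l
    LEFT OUTER JOIN t AS r ON l.eid + 1 = r.eid
    WHERE r.eid IS NULL
      AND l.eid < (SELECT MAX(eid) FROM t)
    ORDER BY start
""")]
print(gap_starts)  # [3]
```

Note that this returns only the first missing number of each gap, so a gap of several consecutive values shows up as a single row.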
If you want a lighter way to search millions of rows of data,
here is what I tried on a table with more than 23 million rows and an old CPU (12.6 GB of data needs about 1 GB of free RAM):
Affected rows: 0 Found rows: 346.764 Warnings: 0 Duration for 2 queries: 00:04:48.0 (+ 2,656 sec. network)
SET @idBefore=0, @st=0, @diffSt=0, @diffEnd=0;
SELECT res.idBefore `betweenID`, res.ID `andNextID`
, res.startEID, res.endEID
, res.diff `diffEID`
-- DON'T USE this missingEID column for more than a thousand rows
-- this is just for a sample view
, GROUP_CONCAT(b.aNum) `missingEID`
FROM (
SELECT
@idBefore `idBefore`
, @idBefore:=(a.id) `ID`
, @diffSt:=(@st) `startEID`
, @diffEnd:=(a.eid) `endEID`
, @st:=a.eid `end`
, @diffEnd-@diffSt-1 `diff`
FROM eid a
ORDER BY a.ID
) res
-- DON'T USE this integers table for more than a thousand rows
-- this is just for a sample view
CROSS JOIN (SELECT a.ID + (b.ID * 10) + (c.ID * 100) AS aNum FROM integers a, integers b, integers c) b
WHERE res.diff>0 AND b.aNum BETWEEN res.startEID+1 AND res.endEID-1
GROUP BY res.ID;
Check out this fiddle: http://sqlfiddle.com/#!9/33deb3/9
and this one for missing IDs: http://sqlfiddle.com/#!9/3ea00c/9
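The core of the query above is a single ordered pass that remembers the previous eid. A rough sketch of the same idea on the client side is below, written in Python against an in-memory SQLite table (an assumption, standing in for MySQL). The sample rows and the gap_ranges helper are made up for illustration, and the pass is ordered by eid rather than by id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eid (id INTEGER PRIMARY KEY, eid INTEGER)")
conn.executemany("INSERT INTO eid VALUES (?, ?)", [(1, 1), (2, 2), (3, 4), (4, 9)])

def gap_ranges(connection):
    """Yield (first_missing, last_missing) for every gap in one ordered pass."""
    previous = 0  # same starting point as @st=0 in the query above
    for (current,) in connection.execute("SELECT eid FROM eid ORDER BY eid"):
        if current - previous > 1:
            yield (previous + 1, current - 1)
        previous = current

gaps = list(gap_ranges(conn))
print(gaps)  # [(3, 3), (5, 8)]
```

Reporting ranges instead of individual numbers keeps the output small even when a gap spans thousands of values, which is the same reason the query warns against GROUP_CONCAT on large gaps.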
I have multiple SELECT statements that all return the same columns but may return different resultsets. Is there any way to select all rows that are in all resultsets on database level?
E.g.
|---------------------|------------------|---------|
| ID | Name | Age |
|---------------------|------------------|---------|
| 1 | Paul | 50 |
| 2 | Peter | 40 |
| 3 | Frank | 20 |
| 4 | Pascal | 60 |
|---------------------|------------------|---------|
SELECT 1
SELECT name FROM table WHERE age > 40
Result: Paul, Pascal
SELECT 2
SELECT name FROM table where name like 'P%'
Result: Paul, Peter, Pascal
SELECT 3
SELECT name FROM table where id > 3
Result: Pascal
EDIT: This is a very simplified example of my problem. The statements can get very complex (joins over multiple tables), so a simple AND in the WHERE part is not the final solution.
The result should be Pascal. What I am looking for is something like a "reverse UNION".
Alternatively, it would be possible to achieve that programmatically (NodeJS), but I would like to avoid iterating over all resultsets, because they might be quite huge.
Thanks in advance!
Is there any way to select all rows that are in all resultsets?
You seem to want AND:
select name
from table
where age > 40 and name like 'P%' and id > 3
If using AND between the WHERE conditions is not possible, you could use multiple IN expressions on subqueries using your initial queries.
SELECT name
FROM table
WHERE id IN (SELECT id FROM table WHERE age > 40)
AND id IN (SELECT id FROM table where name like 'P%')
AND id IN (SELECT id FROM table where id > 3)
If you have different result sets and you want to see the intersection, you can use join:
select q1.id
from (<query 1>) q1 join
(<query 2>) q2
on q1.id = q2.id join
(<query 3>) q3
on q1.id = q3.id;
That said, I think GMB has the most concise answer to the question that you actually asked.
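A small sketch of the join-based intersection, using the sample rows and the three conditions from the question. It runs on in-memory SQLite through Python as a stand-in for MySQL, and the table name people is made up for the example. The second query uses INTERSECT, which SQLite supports; as far as I know MySQL only added it in 8.0.31, so treat that part as an assumption about your server version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "Paul", 50), (2, "Peter", 40), (3, "Frank", 20), (4, "Pascal", 60)],
)

# Each derived table is one of the original statements; joining on id keeps
# only the rows that appear in every result set.
names = [row[0] for row in conn.execute("""
    SELECT q1.name
    FROM (SELECT id, name FROM people WHERE age > 40) AS q1
    JOIN (SELECT id FROM people WHERE name LIKE 'P%') AS q2 ON q1.id = q2.id
    JOIN (SELECT id FROM people WHERE id > 3) AS q3 ON q1.id = q3.id
""")]

# The same intersection written with the INTERSECT set operator.
intersect_ids = [row[0] for row in conn.execute("""
    SELECT id FROM people WHERE age > 40
    INTERSECT
    SELECT id FROM people WHERE name LIKE 'P%'
    INTERSECT
    SELECT id FROM people WHERE id > 3
""")]
print(names, intersect_ids)  # ['Pascal'] [4]
```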
If your statements are complex, you could use a procedure in which each statement puts its matching ids into a temp table, and then select the rows whose id occurs once per statement. This will most likely also be more efficient than one huge query that combines all the complex statements.
create procedure sp_match_all()
begin
drop temporary table if exists match_tmp;
create temporary table match_tmp (
id int
);
insert into match_tmp
SELECT id FROM table WHERE age > 40;
insert into match_tmp
SELECT id FROM table where name like 'P%';
insert into match_tmp
SELECT id FROM table where id > 3;
select t.name
from table t
join (
select id
from match_tmp
group by id
having count(*)=3
) q on q.id=t.id;
drop temporary table match_tmp;
end
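The counting idea can be checked outside a stored procedure. Below is a minimal sketch in Python with in-memory SQLite (a stand-in for MySQL); the table name people is hypothetical, and the DISTINCT on each insert is my own addition so that a statement returning the same id twice cannot inflate the count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "Paul", 50), (2, "Peter", 40), (3, "Frank", 20), (4, "Pascal", 60)],
)
conn.execute("CREATE TEMP TABLE match_tmp (id INTEGER)")

statements = [
    "SELECT id FROM people WHERE age > 40",
    "SELECT id FROM people WHERE name LIKE 'P%'",
    "SELECT id FROM people WHERE id > 3",
]
for statement in statements:
    # One batch of ids per statement; DISTINCT keeps each id to one vote.
    conn.execute(f"INSERT INTO match_tmp SELECT DISTINCT id FROM ({statement})")

# An id that was inserted once per statement matched all of them.
matched = [row[0] for row in conn.execute("""
    SELECT p.name
    FROM people AS p
    JOIN (SELECT id FROM match_tmp GROUP BY id HAVING COUNT(*) = ?) AS q
      ON q.id = p.id
""", (len(statements),))]
print(matched)  # ['Pascal']
```

Passing the number of statements as a parameter avoids hard-coding the 3 in the HAVING clause when statements are added or removed.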
I have a bit of a strange requirement in MySQL.
I should select all records from a table where the last 6 characters are not unique.
For example, if I have the table:
I should select rows 1 and 3, since the last 6 letters of these values are not unique.
Do you have any idea how to implement this?
Thank you for help.
I use a JOIN against a subquery in which I count the occurrences of each unique combination of the last n characters (2 in my example)
SELECT t.*
FROM t
JOIN (SELECT RIGHT(value, 2) r, COUNT(RIGHT(value, 2)) rc
FROM t
GROUP BY r) c ON c.r = RIGHT(value, 2) AND c.rc > 1
Something like that should work:
SELECT `mytable`.*
FROM (SELECT RIGHT(`value`, 6) AS `ending` FROM `mytable` GROUP BY `ending` HAVING COUNT(*) > 1) `grouped`
INNER JOIN `mytable` ON `grouped`.`ending` = RIGHT(`value`, 6)
but it is not fast. This requires a full table scan. Maybe you should rethink your problem.
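For a quick check of the grouped-ending join, here is a minimal sketch on five sample rows, run in Python against in-memory SQLite. SQLite has no RIGHT(), so substr(value, -6) is used as its counterpart; that substitution and the sample rows are assumptions for the demo, not part of the MySQL query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        (1, "abcdePuzzle"),
        (2, "aaaaaaaaaaaaaa"),
        (3, "abcPuzzle"),
        (4, "aaaaaaaaaaaaaaB"),
        (5, "Hello"),
    ],
)

# substr(value, -6) plays the role of MySQL's RIGHT(value, 6).
# The derived table keeps only endings shared by more than one row;
# joining back returns the full rows that carry such an ending.
rows = conn.execute("""
    SELECT m.id, m.value
    FROM (SELECT substr(value, -6) AS ending
          FROM mytable
          GROUP BY ending
          HAVING COUNT(*) > 1) AS grouped
    JOIN mytable AS m ON grouped.ending = substr(m.value, -6)
    ORDER BY m.id
""").fetchall()
print(rows)  # [(1, 'abcdePuzzle'), (3, 'abcPuzzle')]
```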
EDITED: I had a wrong understanding of the question previously and I don't really want to change anything from my initial answer. But if my previous answer is not acceptable in some environment and it might mislead people, I have to correct it anyhow.
SELECT GROUP_CONCAT(id),RIGHT(VALUE,6)
FROM table1
GROUP BY RIGHT(VALUE,6) HAVING COUNT(RIGHT(VALUE,6)) > 1;
Since this question already has good answers, I wrote my query in a slightly different way. And I've tested it with sql_mode=ONLY_FULL_GROUP_BY. ;)
This is what you need: a subquery to get the duplicated right(value,6) values, and the main query to get the rows matching that condition.
SELECT t.* FROM t WHERE RIGHT(`value`,6) IN (
SELECT RIGHT(`value`,6)
FROM t
GROUP BY RIGHT(`value`,6) HAVING COUNT(*) > 1);
UPDATE
This is the solution to avoid the mysql error in the case you have sql_mode=only_full_group_by
SELECT t.* FROM t WHERE RIGHT(`value`,6) IN (
SELECT DISTINCT right_value FROM (
SELECT RIGHT(`value`,6) AS right_value,
COUNT(*) AS TOT
FROM t
GROUP BY RIGHT(`value`,6) HAVING COUNT(*) > 1) t2
)
This might be fast, as there is no counting involved.
Live test: https://www.db-fiddle.com/f/dBdH9tZd4W6Eac1TCRXZ8U/0
select *
from tbl outr
where not exists
(
select 1 / 0 -- just a proof that this is not evaluated. won't cause division by zero
from tbl inr
where
inr.id <> outr.id
and right(inr.value, 6) = right(outr.value, 6)
)
Output:
| id | value |
| --- | --------------- |
| 2 | aaaaaaaaaaaaaa |
| 4 | aaaaaaaaaaaaaaB |
| 5 | Hello |
The logic is to test the other rows, meaning those whose id differs from the outer row's id. If any of those rows has the same last 6 characters as the outer row, the outer row is not shown.
UPDATE
I misunderstood the OP's intent; it is the reverse. Anyway, just reverse the logic: use EXISTS instead of NOT EXISTS.
Live test: https://www.db-fiddle.com/f/dBdH9tZd4W6Eac1TCRXZ8U/3
select *
from tbl outr
where exists
(
select 1 / 0 -- just a proof that this is not evaluated. won't cause division by zero
from tbl inr
where
inr.id <> outr.id
and right(inr.value, 6) = right(outr.value, 6)
)
Output:
| id | value |
| --- | ----------- |
| 1 | abcdePuzzle |
| 3 | abcPuzzle |
UPDATE
I tested the query. The performance of my answer (the correlated EXISTS approach) is not optimal. I am keeping my answer so others will know which approach to avoid :)
GhostGambler's answer is faster than the correlated EXISTS approach. For 5 million rows, his answer takes only 2.762 seconds:
explain analyze
SELECT
tbl.*
FROM
(
SELECT
RIGHT(value, 6) AS ending
FROM
tbl
GROUP BY
ending
HAVING
COUNT(*) > 1
) grouped
JOIN tbl ON grouped.ending = RIGHT(value, 6)
My answer (correlated EXISTS) takes 4.08 seconds:
explain analyze
select *
from tbl outr
where exists
(
select 1 / 0 -- just a proof that this is not evaluated. won't cause division by zero
from tbl inr
where
inr.id <> outr.id
and right(inr.value, 6) = right(outr.value, 6)
)
The straightforward query is the fastest: no join, just a plain IN query, at 2.722 seconds. It has practically the same performance as the JOIN approach, since they have the same execution plan. This is kiks73's answer. I just don't know why he made his second answer unnecessarily complicated.
So it's just a matter of taste, or of choosing which code is more readable: SELECT ... IN or SELECT ... JOIN.
explain analyze
SELECT *
FROM tbl
where right(value, 6) in
(
SELECT
RIGHT(value, 6) AS ending
FROM
tbl
GROUP BY
ending
HAVING
COUNT(*) > 1
)
Test data used:
CREATE TABLE tbl (
id INTEGER primary key,
value VARCHAR(20)
);
INSERT INTO tbl
(id, value)
VALUES
('1', 'abcdePuzzle'),
('2', 'aaaaaaaaaaaaaa'),
('3', 'abcPuzzle'),
('4', 'aaaaaaaaaaaaaaB'),
('5', 'Hello');
insert into tbl(id, value)
select x.y, 'Puzzle'
from generate_series(6, 5000000) as x(y);
create index ix_tbl__right on tbl(right(value, 6));
Performance without the index, and with an index on tbl(right(value, 6)):
JOIN approach:
Without index: 3.805 seconds
With index: 2.762 seconds
IN approach:
Without index: 3.719 seconds
With index: 2.722 seconds
Just slightly neater code (if using MySQL 8.0). I can't guarantee the performance, though.
Live test: https://www.db-fiddle.com/f/dBdH9tZd4W6Eac1TCRXZ8U/1
select x.*
from
(
select
*,
count(*) over(partition by right(value, 6)) as unique_count
from tbl
) as x
where x.unique_count = 1
Output:
| id | value | unique_count |
| --- | --------------- | ------------ |
| 2 | aaaaaaaaaaaaaa | 1 |
| 4 | aaaaaaaaaaaaaaB | 1 |
| 5 | Hello | 1 |
UPDATE
I misunderstood the OP's intent; it is the reverse. Just change the count:
select x.*
from
(
select
*,
count(*) over(partition by right(value, 6)) as unique_count
from tbl
) as x
where x.unique_count > 1
Output:
| id | value | unique_count |
| --- | ----------- | ------------ |
| 1 | abcdePuzzle | 2 |
| 3 | abcPuzzle | 2 |
I have a problem with a MySQL query.
Table A is:
A.id
A.value1
A.user
Table B is:
B.id
B.user
I need to find value_that_i_need from a query by searching for B.user.
But I don't need only the rows matching A.user; I need all rows from Table A that share an A.id with a row matching B.user.
So I need all distinct ids (where B.user = A.user) and then to look them up in Table A by A.id.
I want to avoid 2 separate queries! I have already tried different JOINs, but nothing works for me.
EDIT
OK, I will try to explain the problem in an easier way.
I have this table:
+---------+------------+
| id_user | another_id |
+---------+------------+
id_user -> unique id for each user
another_id -> an id related to something like a group
another_id can be shared by several users, but I need to take only the users who are in the same groups as me.
So I have to check my groups (by searching for my id_user) and then find all users with the same another_id.
The problem is that if I query something like this:
SELECT * FROM table0 AS t0, something_like_groups AS slg
JOIN user_inside_group as uig ON slg.id_group=uig.group_id AND slg.id_user='my_user_id'
WHERE slg.id='id_group' AND t0.user_id=uig.user_id
Actually I have to join 3 tables, but the problem is that I need to find the groups I am in and get all information about all users in those same groups (without an additional query).
Perhaps you just want to find the min id based on the B user and then get all the rows from A that match. For example:
drop table if exists t,t1;
create table t( id int,user varchar(10));
create table t1( id int,user varchar(10));
insert into t values
(1,'aaa'),(1,'bbb'),(2,'ccc');
insert into t1 values
(1,'bbb'),(2,'ccc')
;
select t.id,t.user
from t
join
(
select t1.user,min(t.id) minid
from t1
join t on t.user = t1.user
group by t1.user
) s
on t.id = s.minid;
+------+------+
| id | user |
+------+------+
| 1 | aaa |
| 1 | bbb |
| 2 | ccc |
+------+------+
3 rows in set (0.00 sec)
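If the goal is "all users who share any group with me", a self-join on the group column may be closer to what the EDIT describes. This is only a sketch under assumptions: the membership table name and sample rows are made up, the columns id_user and another_id come from the question, and it runs on in-memory SQLite through Python rather than MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical membership table: one row per (user, group) pair.
conn.execute("CREATE TABLE membership (id_user INTEGER, another_id INTEGER)")
conn.executemany(
    "INSERT INTO membership VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 20), (1, 30), (4, 30), (5, 20)],
)

my_user_id = 1
# "mine" finds the groups of the given user; "other" returns every
# membership row in those groups, all in a single query.
group_members = conn.execute("""
    SELECT DISTINCT other.id_user, other.another_id
    FROM membership AS mine
    JOIN membership AS other ON other.another_id = mine.another_id
    WHERE mine.id_user = ?
    ORDER BY other.another_id, other.id_user
""", (my_user_id,)).fetchall()
print(group_members)  # [(1, 10), (2, 10), (1, 30), (4, 30)]
```

Further tables (user details, group details) could then be joined onto the other alias.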
Not sure if this is possible but I have a schema like this:
id | user_id | thread_id
1 | 1 | 1
2 | 4 | 1
3 | 1 | 2
4 | 3 | 2
I am trying to retrieve the thread_id where user_id = 1 and 4. I know that IN (1,4) does not fit my needs, as it is pretty much an OR and will pull up record 3 as well, and EXISTS only returns a bool.
You may use JOIN (that answer already exists) or HAVING, like this:
SELECT
thread_id,
COUNT(1) AS user_count
FROM
t
WHERE
user_id IN (1,4)
GROUP BY
thread_id
HAVING
user_count=2
Check the demo. HAVING fits better when there are many ids, because with JOIN you need one join per id. There is a catch, however: a plain = comparison is only correct when rows are unique per (user_id, thread_id). If a user_id can repeat within a thread, the linked demo uses >= instead, although comparing COUNT(DISTINCT user_id) is the safer test, because repeated rows for a single user could otherwise reach the threshold on their own.
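To see why duplicates matter, here is a small sketch in Python with in-memory SQLite (standing in for MySQL). The extra row with id 5 is made up: it lets user 1 appear twice in thread 2, so a row count reaches 2 even though user 4 never posted there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, user_id INTEGER, thread_id INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    # Row 5 is an added duplicate: user 1 appears twice in thread 2.
    [(1, 1, 1), (2, 4, 1), (3, 1, 2), (4, 3, 2), (5, 1, 2)],
)

def threads(having):
    """Return thread ids for users 1 and 4 under the given HAVING condition."""
    sql = f"""SELECT thread_id FROM t WHERE user_id IN (1, 4)
              GROUP BY thread_id HAVING {having} ORDER BY thread_id"""
    return [row[0] for row in conn.execute(sql)]

by_row_count = threads("COUNT(*) >= 2")              # fooled by the duplicate
by_distinct = threads("COUNT(DISTINCT user_id) = 2")  # needs both users
print(by_row_count, by_distinct)  # [1, 2] [1]
```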
Try this with a self-join. Since you need an AND (the thread must contain both user_id 1 and user_id 4), join the table to itself on thread_id:
SELECT
t1.thread_id
FROM
TABLE t1
JOIN TABLE t2
ON (t1.thread_id = t2.thread_id)
WHERE t1.user_id = 1
AND t2.user_id = 4
I have a table containing several fields. The primary key is userId. Currently the user id column contains values '1,2,3,4...etc' like so:
+------+
|userId|
+------+
| 1 |
| 2 |
| 3 |
| 4 |
...etc
I now want to add new rows ending in a,b,c, like so:
+------+
|userId|
+------+
| 1 |
| 1a |
| 1b |
| 1c |
| 2 |
| 2a |
| 2b |
| 2c |
...etc
The new rows should be identical to their parent row, except for the userId. (i.e. 1a,1b & 1c should match 1)
Also I can't guarantee that there won't already be a few 'a', 'b' or 'c's in userid column.
Is there a way to write an sql query to do this quickly and easily?
DON'T DO IT! You will run into more problems than the one you are trying to solve.
Add a new column to store the letter, and make the primary key cover the original UserId and this new column.
With concatenated IDs, whenever you want just the userId you need to split the letter portion off, which will be expensive for your query and a real pain.
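A minimal sketch of the two-column key, in Python with in-memory SQLite as a stand-in for MySQL. The users table, the name column, and the empty-string suffix for the parent row are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: the letter lives in its own column and the
# primary key spans both columns. '' marks the original (parent) row.
conn.execute("""
    CREATE TABLE users (
        userId INTEGER NOT NULL,
        suffix TEXT NOT NULL DEFAULT '',
        name   TEXT,
        PRIMARY KEY (userId, suffix)
    )
""")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "", "alice"), (1, "a", "alice"), (2, "", "bob")],
)

# Filtering on the base id needs no string splitting.
suffixes = [row[0] for row in conn.execute(
    "SELECT suffix FROM users WHERE userId = 1 ORDER BY suffix")]

# The composite key still rejects a repeated (userId, suffix) pair.
try:
    conn.execute("INSERT INTO users VALUES (1, 'a', 'alice again')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(suffixes, duplicate_rejected)  # ['', 'a'] True
```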
I agree with KM. I'm not sure why you're creating these duplicate/composite IDs, but it feels like an uncomfortable direction to take.
That said, there is really only one obstacle to overcome: apparently you can't select from and insert into the same table in MySQL.
So you need to insert into a temporary table first, then insert into the real table...
CREATE TEMPORARY TABLE MyNewUserIDs (
UserID VARCHAR(32)
);
INSERT INTO
MyNewUserIDs
SELECT
CONCAT(myTable.UserID, suffix.val)
FROM
myTable
INNER JOIN
(SELECT 'A' as val UNION ALL SELECT 'B' UNION ALL SELECT 'C' UNION ALL SELECT 'D') AS suffix
ON RIGHT(myTable.UserID, 1) <> suffix.val
WHERE
NOT EXISTS (SELECT * FROM myTable AS lookup WHERE UserID = CONCAT(myTable.UserID, suffix.val));
INSERT INTO
myTable
SELECT
UserID
FROM
MyNewUserIDs;
Depending on your environment, you may want to look into locking the tables, so that changes are not made between creating the list of IDs and inserting them into your table.
Generating the extra rows is quite simple from a SQL perspective; I'll do that here.
@KM's answer tells you how to store it as 2 distinct values, which I've assumed here. Feel free to concatenate userid and suffix if you prefer.
INSERT myTable (userid, suffix, col1, col2, ...coln)
SELECT M.userid, X.suffix, M.col1, M.col2, ..., M.coln
FROM
myTable M
CROSS JOIN
(SELECT 'a' AS Suffix UNION ALL SELECT 'b' UNION ALL SELECT 'c') X
WHERE
NOT EXISTS (SELECT *
FROM
MyTable M2
WHERE
M2.userid = M.userid AND M2.Suffix = X.Suffix)
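Here is a runnable sketch of the CROSS JOIN insert, using Python with in-memory SQLite instead of MySQL. Two things are assumptions on my part: parent rows carry an empty suffix, and the M.suffix = '' filter restricts the source to those parent rows, so an already existing 1a row does not produce a second copy of 1b.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE myTable (
        userid INTEGER NOT NULL,
        suffix TEXT NOT NULL DEFAULT '',
        col1   TEXT,
        PRIMARY KEY (userid, suffix)
    )
""")
# User 1 already has an 'a' row, as the question warns may happen.
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [(1, "", "x"), (1, "a", "x"), (2, "", "y")],
)

# Every parent row is paired with every suffix; NOT EXISTS skips pairs
# that are already present. The M.suffix = '' filter is an assumption
# added here so that only parent rows are copied.
conn.execute("""
    INSERT INTO myTable (userid, suffix, col1)
    SELECT M.userid, X.suffix, M.col1
    FROM myTable AS M
    CROSS JOIN (SELECT 'a' AS suffix UNION ALL SELECT 'b' UNION ALL SELECT 'c') AS X
    WHERE M.suffix = ''
      AND NOT EXISTS (SELECT 1
                      FROM myTable AS M2
                      WHERE M2.userid = M.userid AND M2.suffix = X.suffix)
""")

rows = conn.execute(
    "SELECT userid, suffix FROM myTable ORDER BY userid, suffix").fetchall()
print(rows)
```

After the insert, each user has the parent row plus a, b and c exactly once, and the pre-existing 1a row is left untouched.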