Selecting rows after a row based on the WHERE condition - mysql

For example, let's say I have the table test, with only one column.
CREATE TABLE test (
alphabet VARCHAR(1) NOT NULL
);
+----------+
| alphabet |
+----------+
| a |
| b |
| c |
| d |
| e |
+----------+
I only want to show the rows after b, so I would do something like:
SELECT * FROM test WHERE alphabet="b"
But that gives me only row b, so I tried to use LIMIT to show the other rows. I don't know how to do that with LIMIT, though: as far as I can tell LIMIT needs an id, and my table has no id or indicator of any kind. How do I get the following result?
+----------+
| alphabet |
+----------+
| c |
| d |
| e |
+----------+

SELECT * FROM test
WHERE alphabet > 'b'
or
SELECT * FROM test
ORDER BY alphabet
LIMIT 2, 100

Your table schema is not good: a table should always have a unique id to use (look into normalizing your table). The following does provide the result you are asking for, but it is not a good way to do this, because without a defined row order it can sometimes return a different result.
SETUP:
CREATE TABLE customers (
`alphabet` VARCHAR(55)
);
insert into customers (`alphabet`)
values
('bill'),
('bob'),
('harry'),
('abe'),
('ben'),
('ashley'),
('cal'),
('fes'),
('parker'),
('gabe'),
('barry'),
('ruben'),
('sam'),
('john'),
('tim');
QUERY:
Run the first query, which sets a user-defined variable, and then run the second one:
SET @QUERY := (
SELECT counter FROM(
SELECT @A := @A + 1 counter, alphabet FROM customers
CROSS JOIN (SELECT @A := 0) t
) temp
WHERE alphabet = 'ashley');
SELECT * FROM(
SELECT @A := @A + 1 counter, alphabet FROM customers
CROSS JOIN (SELECT @A := 0) t
) AS temp
WHERE counter > @QUERY;

This should work, because most comparison operators can also be applied to text columns:
Select * from test where alphabet > 'b'
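As a quick check of the comparison approach, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. SQLite standing in for MySQL is an assumption made only for this demo, and the ORDER BY is added because the row order is otherwise not guaranteed.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (alphabet VARCHAR(1) NOT NULL)")
conn.executemany("INSERT INTO test (alphabet) VALUES (?)", [(c,) for c in "abcde"])

# String comparison keeps every row that sorts after 'b'.
rows_after_b = [
    row[0]
    for row in conn.execute(
        "SELECT alphabet FROM test WHERE alphabet > 'b' ORDER BY alphabet"
    )
]
print(rows_after_b)
```

Note that this selects by value, not by physical position, which is usually what is wanted when the table has no id column.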

SELECT * FROM test WHERE alphabet NOT IN ('b');

Related

How to increment an id based on a field having a certain value going row by row

I'm importing data where groups of rows need to be given an id, but there is nothing unique and common to them in the incoming data. What there is is a known indicator of the first row of each group, and the data is in order, so we can step through row by row, setting an id and incrementing it whenever this indicator is found. I've done this, but it's incredibly slow. Is there a better way to do this in MySQL, or am I better off pre-processing the text data line by line to add the id?
Example of the incoming data; I need to increment an id whenever we see "NEW":
id,linetype,number,text
1,NEW,1234,sometext
2,CONTINUE,2412,anytext
3,CONTINUE,1,hello
4,NEW,2333,bla bla
5,CONTINUE,333,hello
6,NEW,1234,anything
So I'll end up with:
id,linetype,number,text,group_id
1,NEW,1234,sometext,1
2,CONTINUE,2412,anytext,1
3,CONTINUE,1,hello,1
4,NEW,2333,bla bla,2
5,CONTINUE,333,hello,2
6,NEW,1234,anything,3
I've tried a stored procedure where I go row by row, updating as I go, but it's super slow.
select count(*) from mytable into n;
set i=1;
while i<=n do
select linetype into l_linetype from mytable where id = i;
if l_linetype = 'NEW' then
set l_id = l_id + 1;
end if;
update mytable set group_id = l_id where id = i;
set i = i + 1; -- advance to the next row
end while;
There are no errors. It's just that I could read and write the text file line by line and be done in a second, while in MySQL it takes 100 seconds. It would be nice if there was a reasonably fast way to do this within MySQL, so that separate pre-processing is not needed.
In the absence of MySQL 8+ (that is, when window functions are not available), you can use a correlated subquery instead.
EDIT: As pointed out by @Paul in the comments, the query can be written as:
SELECT t1.*,
(SELECT COUNT(*)
FROM your_table t2
WHERE t2.id <= t1.id
AND t2.linetype = 'NEW'
) group_id
FROM your_table t1
The above query can be more performant if we define the composite index (linetype, id). The order of the columns is important, because we have a range condition on id.
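To see the correlated subquery at work on the sample rows from the question, here is a minimal sketch using Python's sqlite3 module. SQLite is only a stand-in for MySQL here (an assumption), and the index name ix_linetype_id is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table ("
    "id INTEGER PRIMARY KEY, linetype TEXT, number INTEGER, text TEXT)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?)",
    [
        (1, "NEW", 1234, "sometext"),
        (2, "CONTINUE", 2412, "anytext"),
        (3, "CONTINUE", 1, "hello"),
        (4, "NEW", 2333, "bla bla"),
        (5, "CONTINUE", 333, "hello"),
        (6, "NEW", 1234, "anything"),
    ],
)
# Composite index with the equality column first and the range column second.
conn.execute("CREATE INDEX ix_linetype_id ON your_table (linetype, id)")

# For every row, count the NEW rows seen up to and including that row.
group_ids = conn.execute(
    """
    SELECT t1.id,
           (SELECT COUNT(*)
              FROM your_table t2
             WHERE t2.id <= t1.id
               AND t2.linetype = 'NEW') AS group_id
      FROM your_table t1
     ORDER BY t1.id
    """
).fetchall()
print(group_ids)
```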
Previously:
SELECT t1.*,
(SELECT SUM(t2.linetype = 'NEW')
FROM your_table t2
WHERE t2.id <= t1.id
) group_id
FROM your_table t1
The above query requires an index on id.
Another approach using User-defined Variables (Session variables) would be:
SELECT
t1.*,
@g := IF(t1.linetype = 'NEW', @g + 1, @g) AS group_id
FROM your_table t1
CROSS JOIN (SELECT @g := 0) vars
ORDER BY t1.id
This works like a loop: a session variable keeps its previous value while the next row is calculated during the SELECT. So we initialize the variable @g to 0 and then compute it row by row. If we encounter a row with the NEW linetype, we increment it; otherwise we use the previous row's value. You can also check https://stackoverflow.com/a/53465139/2469308 for more discussion and for caveats to take care of when using this approach.
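The row-by-row logic can also be modelled outside the database, which is close to the pre-processing route mentioned in the question. The following is only a sketch in plain Python; the function name assign_group_ids is made up for illustration, and the input is assumed to be sorted by id already.

```python
# Sample rows from the question as (id, linetype) pairs, already in id order.
rows = [
    (1, "NEW"),
    (2, "CONTINUE"),
    (3, "CONTINUE"),
    (4, "NEW"),
    (5, "CONTINUE"),
    (6, "NEW"),
]


def assign_group_ids(ordered_rows):
    """Carry a counter from row to row and bump it on every NEW row."""
    group_id = 0  # plays the role of the session variable initialised to 0
    result = []
    for row_id, linetype in ordered_rows:
        if linetype == "NEW":
            group_id += 1
        result.append((row_id, linetype, group_id))
    return result


grouped = assign_group_ids(rows)
print(grouped)
```

This is a single pass over the data, which is why it finishes quickly compared with issuing one SELECT and one UPDATE per row.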
For MySQL 8.0+ you can use the SUM() window function:
select *,
sum(linetype = 'NEW') over (order by id) group_id
from tablename
For previous versions you can simulate this functionality with the use of a variable:
set @group_id := 0;
select *,
@group_id := @group_id + (linetype = 'NEW') group_id
from tablename
order by id
Results:
| id | linetype | number | text | group_id |
| --- | -------- | ------ | -------- | -------- |
| 1 | NEW | 1234 | sometext | 1 |
| 2 | CONTINUE | 2412 | anytext | 1 |
| 3 | CONTINUE | 1 | hello | 1 |
| 4 | NEW | 2333 | bla bla | 2 |
| 5 | CONTINUE | 333 | hello | 2 |
| 6 | NEW | 1234 | anything | 3 |

Select all records where last n characters in column are not unique

I have a slightly strange requirement in MySQL.
I need to select all records from a table where the last 6 characters of a column are not unique.
For example, if I have this table:
I should select rows 1 and 3, since the last 6 letters of their values are not unique.
Do you have any idea how to implement this?
Thank you for help.
I use a JOIN against a subquery in which I count the occurrences of each unique combination of the last n characters (2 in my example):
SELECT t.*
FROM t
JOIN (SELECT RIGHT(value, 2) r, COUNT(RIGHT(value, 2)) rc
FROM t
GROUP BY r) c ON c.r = RIGHT(value, 2) AND c.rc > 1
Something like that should work:
SELECT `mytable`.*
FROM (SELECT RIGHT(`value`, 6) AS `ending` FROM `mytable` GROUP BY `ending` HAVING COUNT(*) > 1) `grouped`
INNER JOIN `mytable` ON `grouped`.`ending` = RIGHT(`value`, 6)
But it is not fast: it requires a full table scan. Maybe you should rethink your problem.
EDITED: I misunderstood the question at first, and I did not really want to change anything in my initial answer. But since that answer might not be acceptable in some environments and might mislead people, I have corrected it.
SELECT GROUP_CONCAT(id),RIGHT(VALUE,6)
FROM table1
GROUP BY RIGHT(VALUE,6) HAVING COUNT(RIGHT(VALUE,6)) > 1;
Since this question already has good answers, I wrote my query in a slightly different way, and I've tested it with sql_mode=ONLY_FULL_GROUP_BY. ;)
This is what you need: a subquery to get the duplicated RIGHT(value, 6) values, and a main query to get the rows matching that condition.
SELECT t.* FROM t WHERE RIGHT(`value`,6) IN (
SELECT RIGHT(`value`,6)
FROM t
GROUP BY RIGHT(`value`,6) HAVING COUNT(*) > 1);
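The IN-plus-GROUP BY idea can be tried on the five sample rows from the test data posted further down in this thread. This is a sketch using Python's sqlite3 module; SQLite has no RIGHT() function, so substr(value, -6) is used as a substitute, which is an assumption of this demo rather than MySQL syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, value VARCHAR(20))")
conn.executemany(
    "INSERT INTO t (id, value) VALUES (?, ?)",
    [
        (1, "abcdePuzzle"),
        (2, "aaaaaaaaaaaaaa"),
        (3, "abcPuzzle"),
        (4, "aaaaaaaaaaaaaaB"),
        (5, "Hello"),
    ],
)

# The inner query finds endings that occur more than once;
# the outer query returns every row carrying one of those endings.
duplicate_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT id
          FROM t
         WHERE substr(value, -6) IN (
               SELECT substr(value, -6)
                 FROM t
                GROUP BY substr(value, -6)
               HAVING COUNT(*) > 1)
         ORDER BY id
        """
    )
]
print(duplicate_ids)
```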
UPDATE
This is the solution that avoids the MySQL error when you have sql_mode=only_full_group_by:
SELECT t.* FROM t WHERE RIGHT(`value`,6) IN (
SELECT DISTINCT right_value FROM (
SELECT RIGHT(`value`,6) AS right_value,
COUNT(*) AS TOT
FROM t
GROUP BY RIGHT(`value`,6) HAVING COUNT(*) > 1) t2
)
This might be fast, as there is no counting involved.
Live test: https://www.db-fiddle.com/f/dBdH9tZd4W6Eac1TCRXZ8U/0
select *
from tbl outr
where not exists
(
select 1 / 0 -- just a proof that this is not evaluated. won't cause division by zero
from tbl inr
where
inr.id <> outr.id
and right(inr.value, 6) = right(outr.value, 6)
)
Output:
| id | value |
| --- | --------------- |
| 2 | aaaaaaaaaaaaaa |
| 4 | aaaaaaaaaaaaaaB |
| 5 | Hello |
The logic is to test the other rows, i.e. those whose id differs from the outer row's id. If any of those rows has the same last 6 characters as the outer row, the outer row is not shown.
UPDATE
I misunderstood the OP's intent; it's the reverse. Anyway, just reverse the logic: use EXISTS instead of NOT EXISTS.
Live test: https://www.db-fiddle.com/f/dBdH9tZd4W6Eac1TCRXZ8U/3
select *
from tbl outr
where exists
(
select 1 / 0 -- just a proof that this is not evaluated. won't cause division by zero
from tbl inr
where
inr.id <> outr.id
and right(inr.value, 6) = right(outr.value, 6)
)
Output:
| id | value |
| --- | ----------- |
| 1 | abcdePuzzle |
| 3 | abcPuzzle |
UPDATE
I tested the query. The performance of my answer (the correlated EXISTS approach) is not optimal. I'm keeping my answer so others will know which approach to avoid :)
GhostGambler's answer is faster than the correlated EXISTS approach. For 5 million rows, his answer takes only 2.762 seconds:
explain analyze
SELECT
tbl.*
FROM
(
SELECT
RIGHT(value, 6) AS ending
FROM
tbl
GROUP BY
ending
HAVING
COUNT(*) > 1
) grouped
JOIN tbl ON grouped.ending = RIGHT(value, 6)
My answer (correlated EXISTS) takes 4.08 seconds:
explain analyze
select *
from tbl outr
where exists
(
select 1 / 0 -- just a proof that this is not evaluated. won't cause division by zero
from tbl inr
where
inr.id <> outr.id
and right(inr.value, 6) = right(outr.value, 6)
)
The straightforward query is the fastest: no join, just a plain IN query, at 2.722 seconds. It has practically the same performance as the JOIN approach, since they have the same execution plan. This is kiks73's answer. I just don't know why he made his second answer unnecessarily complicated.
So it's just a matter of taste, or of which code is more readable: SELECT ... IN versus SELECT ... JOIN.
explain analyze
SELECT *
FROM tbl
where right(value, 6) in
(
SELECT
RIGHT(value, 6) AS ending
FROM
tbl
GROUP BY
ending
HAVING
COUNT(*) > 1
)
Test data used:
CREATE TABLE tbl (
id INTEGER primary key,
value VARCHAR(20)
);
INSERT INTO tbl
(id, value)
VALUES
('1', 'abcdePuzzle'),
('2', 'aaaaaaaaaaaaaa'),
('3', 'abcPuzzle'),
('4', 'aaaaaaaaaaaaaaB'),
('5', 'Hello');
insert into tbl(id, value)
select x.y, 'Puzzle'
from generate_series(6, 5000000) as x(y);
create index ix_tbl__right on tbl(right(value, 6));
Performances without the index, and with index on tbl(right(value, 6)):
JOIN approach:
Without index: 3.805 seconds
With index: 2.762 seconds
IN approach:
Without index: 3.719 seconds
With index: 2.722 seconds
This is slightly neater code if you are using MySQL 8.0, though I can't guarantee the performance.
Live test: https://www.db-fiddle.com/f/dBdH9tZd4W6Eac1TCRXZ8U/1
select x.*
from
(
select
*,
count(*) over(partition by right(value, 6)) as unique_count
from tbl
) as x
where x.unique_count = 1
Output:
| id | value | unique_count |
| --- | --------------- | ------------ |
| 2 | aaaaaaaaaaaaaa | 1 |
| 4 | aaaaaaaaaaaaaaB | 1 |
| 5 | Hello | 1 |
UPDATE
I misunderstood the OP's intent; it's the reverse. Just change the count:
select x.*
from
(
select
*,
count(*) over(partition by right(value, 6)) as unique_count
from tbl
) as x
where x.unique_count > 1
Output:
| id | value | unique_count |
| --- | ----------- | ------------ |
| 1 | abcdePuzzle | 2 |
| 3 | abcPuzzle | 2 |

Find two closest elements from one table to other element from another table

I have two tables:
DROP TABLE IF EXISTS `left_table`;
CREATE TABLE `left_table` (
`l_id` INT(11) NOT NULL AUTO_INCREMENT,
`l_curr_time` INT(11) NOT NULL,
PRIMARY KEY(l_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
DROP TABLE IF EXISTS `right_table`;
CREATE TABLE `right_table` (
`r_id` INT(11) NOT NULL AUTO_INCREMENT,
`r_curr_time` INT(11) NOT NULL,
PRIMARY KEY(r_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO left_table(l_curr_time) VALUES
(3),(4),(6),(10),(13);
INSERT INTO right_table(r_curr_time) VALUES
(1),(5),(7),(8),(11),(12);
I want to map (where they exist) the two closest r_curr_time values from right_table to each l_curr_time from left_table, such that r_curr_time is greater than or equal to l_curr_time.
The expected result for the given values is:
+------+-------------+-------------+
| l_id | l_curr_time | r_curr_time |
+------+-------------+-------------+
| 1 | 3 | 5 |
| 1 | 3 | 7 |
| 2 | 4 | 5 |
| 2 | 4 | 7 |
| 3 | 6 | 7 |
| 3 | 6 | 8 |
| 4 | 10 | 11 |
| 4 | 10 | 12 |
+------+-------------+-------------+
I have the following solution, which works for the single closest value. But I do not like it very much, because it silently relies on GROUP BY keeping the first occurrence from each group:
SELECT l_id, l_curr_time, r_curr_time, time_diff FROM
(
SELECT *, ABS(r_curr_time - l_curr_time) AS time_diff
FROM left_table
JOIN right_table ON 1=1
WHERE r_curr_time >= l_curr_time
ORDER BY l_id ASC, time_diff ASC
) t
GROUP BY l_id;
The output is as follows:
+------+-------------+-------------+-----------+
| l_id | l_curr_time | r_curr_time | time_diff |
+------+-------------+-------------+-----------+
| 1 | 3 | 5 | 2 |
| 2 | 4 | 5 | 1 |
| 3 | 6 | 7 | 1 |
| 4 | 10 | 11 | 1 |
+------+-------------+-------------+-----------+
4 rows in set (0.00 sec)
As you can see, I am doing JOIN ON 1=1. Is this OK for large data as well (e.g. if both left_table and right_table have 10000 rows, the Cartesian product will have 10^8 rows)? Despite this drawback, I think JOIN ON 1=1 is the only possible solution, because first I need to create all possible combinations from the existing tables and then pick the ones which satisfy the condition. If I'm wrong, please correct me. Thanks.
This question is not trivial. In SQL Server or PostgreSQL it would be very easy because of the row_number() over (...) construct. That is not available in MySQL before 8.0, where you have to work with variables and nested SELECT statements instead.
To solve this problem you have to combine multiple concepts. I will explain them one after the other to arrive at a solution that fits your question.
Let's start easy: how do we build a table that contains the information of both left_table and right_table?
Use a join. In this particular problem we use a left join, and as the join condition we require that l_curr_time is smaller than or equal to r_curr_time. To make the rest easier, we order this table by l_curr_time and r_curr_time. The statement looks like this:
SELECT l_id, l_curr_time, r_curr_time
FROM left_table l
LEFT JOIN right_table r ON l.l_curr_time<=r.r_curr_time
ORDER BY l.l_curr_time, r.r_curr_time;
Now we have a table that is ordered and contains the information we want... but too much of it ;) Because the table is ordered, it would be great if MySQL could select only the first two rows for each value of l_curr_time. That is not possible directly, so we have to do it ourselves.
Middle part: how do we number rows?
Use a variable! If you want to number the rows of a table, you can use a MySQL user variable. There are two things to do: first we have to declare and initialize the variable, and second we have to increment it. Let's say we have a table with names and we want to know the position of each name when we order them by name:
SELECT name, @num:=@num+1 /* increment */
FROM table t, (SELECT @num:=0) as c
ORDER BY name ASC;
Hard part: how do we number subsets of rows depending on the value of one field?
Use one variable to count (see above) and a second variable as a state holder. We use the same principle as above, but now the second variable stores the value of the field we depend on. If that value changes, we reset the counter variable to 1. Again, this second variable has to be declared and initialized. The new part is resetting one variable depending on the content of the state variable:
SELECT
l_id,
l_curr_time,
r_curr_time,
@num := IF( /* (re)set num (the counter)... */
@l_curr_time = l_curr_time,
@num:= @num + 1, /* increment if the variable equals the actual l_curr_time field value */
1 /* reset to 1 if the values are not equal */
) as row_num,
@l_curr_time:=l_curr_time as lct /* state variable that holds the l_curr_time value */
FROM ( /* table from Step 1 of the explanation */
SELECT l_id, l_curr_time, r_curr_time
FROM left_table l
LEFT JOIN right_table r ON l.l_curr_time<=r.r_curr_time
ORDER BY l.l_curr_time, r.r_curr_time
) as joinedTable
Now we have a table that holds all the combinations we want (but too many of them), and all rows are numbered depending on the value of the l_curr_time field. In other words, each subset is numbered from 1 up to the number of matching r_curr_time values that are greater than or equal to l_curr_time.
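The counter-plus-state-variable pattern can be traced in plain Python on the sample data from the question. This is only a sketch of the logic, not of how MySQL evaluates user variables; it uses an inner join (so l_curr_time = 13, which has no match, simply produces no row) and also applies the final filter on the row number.

```python
# Sample data from the question: (l_id, l_curr_time) pairs and r_curr_time values.
left_table = [(1, 3), (2, 4), (3, 6), (4, 10), (5, 13)]
right_table = [1, 5, 7, 8, 11, 12]

# Step 1: join on r_curr_time >= l_curr_time, ordered by l_curr_time, r_curr_time.
joined = sorted(
    (
        (l_id, l_time, r_time)
        for l_id, l_time in left_table
        for r_time in right_table
        if r_time >= l_time
    ),
    key=lambda row: (row[1], row[2]),
)

# Step 2: number the rows inside each l_curr_time group.
numbered = []
prev_l_time = None  # state variable: l_curr_time of the previous row
num = 0             # counter variable
for l_id, l_time, r_time in joined:
    num = num + 1 if l_time == prev_l_time else 1  # reset to 1 on a new group
    prev_l_time = l_time
    numbered.append((l_id, l_time, r_time, num))

# Step 3: keep only the two closest values per group.
closest_two = [(l_id, l_time, r_time) for l_id, l_time, r_time, n in numbered if n <= 2]
print(closest_two)
```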
Again an easy part: select the values we want depending on the row number.
This part is easy. Because the table we created in step 3 is ordered and numbered, we can filter by the number (it has to be smaller than or equal to 2). Furthermore, we select only the columns we're interested in:
SELECT l_id, l_curr_time, r_curr_time, row_num
FROM ( /* table from step 3. */
SELECT
l_id,
l_curr_time,
r_curr_time,
@num := IF(
@l_curr_time = l_curr_time,
@num:= @num + 1,
1
) as row_num,
@l_curr_time:=l_curr_time as lct
FROM (
SELECT l_id, l_curr_time, r_curr_time
FROM left_table l
LEFT JOIN right_table r ON l.l_curr_time<=r.r_curr_time
ORDER BY l.l_curr_time, r.r_curr_time
) as joinedTable
) as numberedJoinedTable,(
SELECT @l_curr_time:='',@num:=0 /* define the state variable and the number variable */
) as counterTable
HAVING row_num<=2; /* the number has to be smaller or equal to 2 */
That's it. This statement returns exactly what you want.
JoshuaK has the right idea. I just think it could be expressed a little more succinctly...
How about:
SELECT n.l_id
, n.l_curr_time
, n.r_curr_time
FROM
( SELECT a.*
, CASE WHEN @prev = l_id THEN @i:=@i+1 ELSE @i:=1 END i
, @prev := l_id prev
FROM
( SELECT l.*
, r.r_curr_time
FROM left_table l
JOIN right_table r
ON r.r_curr_time >= l.l_curr_time
) a
JOIN
( SELECT @prev := null,@i:=0) vars
ORDER
BY l_id,r_curr_time
) n
WHERE i<=2;
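On engines that do have window functions (the row_number() mentioned earlier in this thread, which MySQL gained in 8.0), the same result needs no variables. Here is a minimal sketch through Python's sqlite3 module; it assumes a bundled SQLite of version 3.25 or later, and SQLite is used only as a stand-in for such an engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE left_table  (l_id INTEGER PRIMARY KEY, l_curr_time INTEGER NOT NULL);
    CREATE TABLE right_table (r_id INTEGER PRIMARY KEY, r_curr_time INTEGER NOT NULL);
    INSERT INTO left_table (l_curr_time)  VALUES (3),(4),(6),(10),(13);
    INSERT INTO right_table (r_curr_time) VALUES (1),(5),(7),(8),(11),(12);
    """
)

# Number the candidate r_curr_time values per l_id, closest first, and keep two.
top_two = conn.execute(
    """
    SELECT l_id, l_curr_time, r_curr_time
      FROM (SELECT l.l_id, l.l_curr_time, r.r_curr_time,
                   ROW_NUMBER() OVER (PARTITION BY l.l_id
                                      ORDER BY r.r_curr_time) AS rn
              FROM left_table l
              JOIN right_table r ON r.r_curr_time >= l.l_curr_time) ranked
     WHERE rn <= 2
     ORDER BY l_id, r_curr_time
    """
).fetchall()
print(top_two)
```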

How can i find missing id's in mysql

I have a large MySQL database with more than 1 million rows. How can I find the missing eids?
+----+-----+
| id | eid |
+----+-----+
| 1 | 1 |
+----+-----+
| 2 | 2 |
+----+-----+
| 3 | 4 |
+----+-----+
I'd like to list all missing eids (3 in this example). I've tried many things, but everything I tried takes too much time.
I hope someone can help me.
Thanks
You can use NOT EXISTS to find the required rows.
create table t(id integer, eid integer);
insert into t values(1,1);
insert into t values(2,2);
insert into t values(3,4);
SELECT id
FROM t a
WHERE NOT EXISTS
( SELECT 1
FROM t b
WHERE b.eid = a.id );
or use NOT IN:
SELECT ID
FROM t
WHERE ID NOT IN
(SELECT EID
FROM t);
produces:
| id |
|----|
| 3 |
Try the query below:
SELECT ID FROM table WHERE ID NOT IN(SELECT EID FROM table );
Finding duplicate numbers is easy:
select id, count(*) from sequence
group by id
having count(*) > 1;
In this case there are no duplicates, since I’m not concentrating on that in this post (finding duplicates is straightforward enough that I hope you can see how it’s done). I had to scratch my head for a second to find missing numbers in the sequence, though. Here is my first shot at it:
select l.id + 1 as start
from sequence as l
left outer join sequence as r on l.id + 1 = r.id
where r.id is null;
The idea is to do an exclusion join against the same sequence, but shifted by one position. Any number with an adjacent number will join successfully, and the WHERE clause will eliminate successful matches, leaving the missing numbers.
(from https://www.xaprb.com/blog/2005/12/06/find-missing-numbers-in-a-sequence-with-sql/)
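The shifted self-join can be tried on a tiny sequence with Python's sqlite3 module. This is a sketch under a few assumptions: SQLite stands in for MySQL, the values 1, 2, 4 are borrowed from the eid column of the question, and the extra MAX(id) condition is an addition so that the number just past the end of the sequence is not reported as missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequence (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO sequence (id) VALUES (?)", [(1,), (2,), (4,)])

# Join each id to its successor; rows without a successor mark the start of a gap.
gap_starts = [
    row[0]
    for row in conn.execute(
        """
        SELECT l.id + 1 AS gap_start
          FROM sequence AS l
          LEFT OUTER JOIN sequence AS r ON l.id + 1 = r.id
         WHERE r.id IS NULL
           AND l.id < (SELECT MAX(id) FROM sequence)
         ORDER BY gap_start
        """
    )
]
print(gap_starts)
```

Note that this reports only the first missing number of each gap; a gap of several consecutive numbers still shows up as a single row.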
If you want a lighter way to search millions of rows of data, try the following.
I tried it on more than 23 million rows with an old CPU (12.6 GB of data needs about 1 GB of free RAM):
Affected rows: 0 Found rows: 346.764 Warnings: 0 Duration for 2 queries: 00:04:48.0 (+ 2,656 sec. network)
SET @idBefore=0, @st=0,@diffSt=0,@diffEnd=0;
SELECT res.idBefore `betweenID`, res.ID `andNextID`
, res.startEID, res.endEID
, res.diff `diffEID`
-- DON'T USE this missingEID column for more than a thousand rows;
-- it is just for a sample view
, GROUP_CONCAT(b.aNum) `missingEID`
FROM (
SELECT
@idBefore `idBefore`
, @idBefore:=(a.id) `ID`
, @diffSt:=(@st) `startEID`
, @diffEnd:=(a.eid) `endEID`
, @st:=a.eid `end`
, @diffEnd-@diffSt-1 `diff`
FROM eid a
ORDER BY a.ID
) res
-- DON'T USE this integers join for more than a thousand rows;
-- it is just for a sample view
CROSS JOIN (SELECT a.ID + (b.ID * 10) + (c.ID * 100) AS aNum FROM integers a, integers b, integers c) b
WHERE res.diff>0 AND b.aNum BETWEEN res.startEID+1 AND res.endEID-1
GROUP BY res.ID;
Check out this example: http://sqlfiddle.com/#!9/33deb3/9
and this one for missing IDs: http://sqlfiddle.com/#!9/3ea00c/9

Query to Segment Results Based on Equal Sets of Column Value

I'd like to construct a single query (or as few as possible) to group a data set. So given a number of buckets, I'd like to return results based on a specific column.
So given a column called score which is a double which contains:
90.00
91.00
94.00
96.00
98.00
99.00
I'd like to be able to use a GROUP BY clause with a function like:
SELECT MIN(score), MAX(score), SUM(score) FROM table GROUP BY BUCKETS(score, 3)
Ideally this would return 3 rows (grouping the results into 3 buckets with as close to equal count in each group as is possible):
90.00, 91.00, 181.00
94.00, 96.00, 190.00
98.00, 99.00, 197.00
Is there some function that would do this? I'd like to avoid returning all the rows and figuring out the bucket segments myself.
Dave
create table test (
id int not null auto_increment primary key,
val decimal(4,2)
) engine = myisam;
insert into test (val) values
(90.00),
(91.00),
(94.00),
(96.00),
(98.00),
(99.00);
select min(val) as lower,max(val) as higher,sum(val) as total from (
select id,val,@row:=@row+1 as row
from test,(select @row:=0) as r order by id
) as t
group by ceil(row/2)
+-------+--------+--------+
| lower | higher | total |
+-------+--------+--------+
| 90.00 | 91.00 | 181.00 |
| 94.00 | 96.00 | 190.00 |
| 98.00 | 99.00 | 197.00 |
+-------+--------+--------+
3 rows in set (0.00 sec)
Unfortunately, MySQL (before 8.0) has no analytic function for numbering rows such as ROW_NUMBER(), so you have to emulate it with a variable. Once you have done that, you can simply use the ceil() function to group the rows into chunks of whatever size you like. I hope that helps despite my English.
set @r = (select count(*) from test);
select min(val) as lower,max(val) as higher,sum(val) as total from (
select id,val,@row:=@row+1 as row
from test,(select @row:=0) as r order by id
) as t
group by ceil(row/ceil(@r/3))
or, with a single query
select min(val) as lower,max(val) as higher,sum(val) as total from (
select id,val,@row:=@row+1 as row,tot
from test,(select count(*) as tot from test) as t2,(select @row:=0) as r order by id
) as t
group by ceil(row/ceil(tot/3))
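The bucket arithmetic used in these queries, ceil(row / ceil(total / buckets)), can be checked in plain Python. This is a sketch only: the function name bucket_stats is made up for illustration, and the values are sorted by score here, whereas the queries above order by id.

```python
import math

scores = [90.0, 91.0, 94.0, 96.0, 98.0, 99.0]


def bucket_stats(values, bucket_count):
    """Return (min, max, sum) per bucket using ceil(row / ceil(n / buckets))."""
    ordered = sorted(values)
    rows_per_bucket = math.ceil(len(ordered) / bucket_count)
    buckets = {}
    for row, value in enumerate(ordered, start=1):  # row numbers start at 1
        buckets.setdefault(math.ceil(row / rows_per_bucket), []).append(value)
    return [(min(b), max(b), sum(b)) for _, b in sorted(buckets.items())]


stats = bucket_stats(scores, 3)
print(stats)
```

One caveat of this formula: when the row count does not divide evenly, the buckets can be lopsided or fewer than requested. For example, 4 rows split into 3 buckets gives a chunk size of 2 and therefore only 2 buckets.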