I have a table in a MySQL database with a UNIQUE INT(10) column. The table is well populated, and the values in that column are non-consecutive integers. I would like a query that returns the smallest number (or the n smallest numbers) not present in any row.
Example: the table contains rows with the values (1, 2, 3, 5, 7, 8, 10, 12, 15) in that column. The SQL statement should return, e.g., the five lowest missing values, which are 4, 6, 9, 11, 13 in this case.
Is this possible with MySQL?
You can use a "numbers" table (it's handy for various operations):
CREATE TABLE num
( i INT UNSIGNED NOT NULL
, PRIMARY KEY (i)
) ;
INSERT INTO num (i)
VALUES
(1), (2), ..., (1000000) ;
Then:
SELECT
num.i
FROM
num
LEFT JOIN
tableX AS t
ON num.i = t.columnX
WHERE
t.columnX IS NULL
ORDER BY
num.i
LIMIT 5
or:
SELECT
num.i
FROM
num
WHERE
NOT EXISTS
( SELECT *
FROM tableX AS t
WHERE num.i = t.columnX
)
ORDER BY
num.i
LIMIT 5
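To try the numbers-table idea without a MySQL server, here is a minimal sketch that runs the same LEFT JOIN anti-join against an in-memory SQLite database from Python. SQLite is only a stand-in here; the helper name `lowest_missing` and the `upper` bound on the numbers table are assumptions made for the demo, not part of the answer above.

```python
import sqlite3


def lowest_missing(values, how_many, upper=1000):
    """Return the `how_many` smallest integers in 1..upper absent from `values`."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tableX (columnX INTEGER UNIQUE)")
    con.executemany("INSERT INTO tableX VALUES (?)", [(v,) for v in values])
    # Helper "numbers" table holding 1..upper (upper is an arbitrary demo bound).
    con.execute("CREATE TABLE num (i INTEGER PRIMARY KEY)")
    con.executemany("INSERT INTO num VALUES (?)", [(i,) for i in range(1, upper + 1)])
    rows = con.execute(
        """
        SELECT num.i
        FROM num
        LEFT JOIN tableX AS t ON num.i = t.columnX
        WHERE t.columnX IS NULL      -- keep only numbers with no match
        ORDER BY num.i
        LIMIT ?
        """,
        (how_many,),
    ).fetchall()
    con.close()
    return [r[0] for r in rows]


print(lowest_missing([1, 2, 3, 5, 7, 8, 10, 12, 15], 5))
```

With the example values from the question this prints the five lowest missing numbers. The numbers table has to reach at least as far as the highest gap you care about, which is why its size matters.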
Another approach, without using an auxiliary table, is to use MySQL user variables. You can test it in SQL-Fiddle, test-2. The output differs from the previous queries (it returns the start and end of each gap rather than individual numbers), just to show that it can be done:
SELECT start_id, end_id
FROM
( SELECT
IF( t.columnX <> @id, @id, NULL) AS start_id
, IF( t.columnX <> @id, t.columnX-1, NULL) AS end_id
, @rows := @rows + (t.columnX - @id) AS r
, @id := t.columnX + 1 AS running_id
FROM
tableX AS t
CROSS JOIN
( SELECT @rows := 0
, @id := 1
) AS dummy
WHERE
@rows < 5
ORDER BY
t.columnX
) AS tmp
WHERE
start_id IS NOT NULL
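The running-variable logic of that query may be easier to follow outside SQL. The following is a rough Python sketch of the same scan, assuming the values are visited in ascending order; `gap_ranges`, `missing` and `expected` are names invented here, standing in for the query and its two user variables.

```python
def gap_ranges(values, wanted):
    """Scan sorted values and collect (start, end) ranges of missing integers.

    `missing` mirrors the running count of missing numbers and `expected`
    mirrors the next id the scan hopes to see, starting from 1.
    """
    ranges = []
    missing = 0
    expected = 1
    for v in sorted(values):
        if missing >= wanted:          # same role as the WHERE ... < 5 filter
            break
        if v != expected:              # a gap sits between expected and v - 1
            ranges.append((expected, v - 1))
        missing += v - expected
        expected = v + 1
    return ranges


print(gap_ranges([1, 2, 3, 5, 7, 8, 10, 12, 15], 5))
```

Because the stop condition is checked before each row is processed, the last range can overshoot the requested count: here the final gap (13, 14) brings the total to six missing numbers.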
This will work, but I think it is pretty inefficient. You won't need an extra table, though (a table that would take (2^31-1)*4/1024^3 = 8GB for all positive numbers in INT). I also advise you to look at why you need this, because it might not be necessary.
Note that it returns the start and end of each range, not all numbers in that range (e.g. if you have the numbers 1 and 5, it will return {0,2,4,6}).
SELECT (t.num-1) AS bound FROM t
WHERE t.num-1 NOT IN (SELECT t.num FROM t)
UNION
SELECT (t.num+1) AS bound FROM t
WHERE t.num+1 NOT IN (SELECT t.num FROM t)
As I said, this will be pretty inefficient. JOINs might be faster, but you would need to benchmark it.
SELECT (t.num-1) AS bound FROM t
LEFT JOIN t AS u ON t.num-1 = u.num
WHERE u.num IS NULL
UNION
SELECT (t.num+1) AS bound FROM t
LEFT JOIN t AS u ON t.num+1 = u.num
WHERE u.num IS NULL
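For a quick check of what the bounds query returns, here is a small sketch that runs the JOIN variant on an in-memory SQLite database from Python (SQLite is an assumption used only to make the example self-contained; `gap_bounds` is a made-up helper name).

```python
import sqlite3


def gap_bounds(values):
    """Return the sorted neighbours (num-1, num+1) that are not in the table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (num INTEGER PRIMARY KEY)")
    con.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])
    rows = con.execute(
        """
        SELECT (t.num - 1) AS bound FROM t
        LEFT JOIN t AS u ON t.num - 1 = u.num
        WHERE u.num IS NULL
        UNION
        SELECT (t.num + 1) AS bound FROM t
        LEFT JOIN t AS u ON t.num + 1 = u.num
        WHERE u.num IS NULL
        """
    ).fetchall()
    con.close()
    return sorted(r[0] for r in rows)


print(gap_bounds([1, 5]))
```

For the numbers 1 and 5 this yields the four bounds mentioned above, including 0 and 6, which lie outside the stored range and may need filtering depending on what you want.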
This is obviously wrong, but what would be the correct way to average the SUM of 3 columns and exclude the 0's?
SELECT (
AVG(NULLIF(`dices`.`Die1`,0)) +
AVG(NULLIF(`dices`.`Die2`,0)) +
AVG(NULLIF(`dices`.`Die3`,0))
) /3 as avgAllDice
FROM (
SELECT `Die1`,`Die2`,`Die3` FROM `GameLog`
WHERE PlayerId = "12345"
) dices
Thanks.
If I were keeping the inline view query (it's not clear why it's needed), I'd probably do something like this:
SELECT AVG( NULLIF( CASE d.i
WHEN 1 THEN dices.`Die1`
WHEN 2 THEN dices.`Die2`
WHEN 3 THEN dices.`Die3`
END
,0)
) AS `avgAllDice`
FROM ( SELECT gl.`Die1`
, gl.`Die2`
, gl.`Die3`
FROM `GameLog` gl
WHERE gl.playerId = '12345'
) dices
CROSS
JOIN ( SELECT 1 AS i UNION ALL SELECT 2 UNION ALL SELECT 3 ) d
The trick is the cross join operation, giving me three rows for each row returned from dices, and an expression that picks out values of Die1, Die2 and Die3 on each of three rows, respectively.
To exclude values of 0, we replace 0 with NULL (since AVG ignores NULL values).
Now with all of the non-zero DieN values stacked into a single column, we can just use the AVG function.
Another way to do it would be to get the numerator and denominator for each of Die1, Die2, Die3.... and then total up the numerators, total up the denominators, and then divide the total numerator by the total denominator.
This should give an equivalent result.
SELECT ( IFNULL(t.n_die1,0) + IFNULL(t.n_die2,0) + IFNULL(t.n_die3,0) )
/ ( t.d_die1 + t.d_die2 + t.d_die3 )
AS avgAllDice
FROM ( SELECT SUM( NULLIF(gl.die1,0)) AS n_die1
, COUNT(NULLIF(gl.die1,0)) AS d_die1
, SUM( NULLIF(gl.die2,0)) AS n_die2
, COUNT(NULLIF(gl.die2,0)) AS d_die2
, SUM( NULLIF(gl.die3,0)) AS n_die3
, COUNT(NULLIF(gl.die3,0)) AS d_die3
FROM `GameLog` gl
WHERE gl.playerid = '12345'
) t
(I didn't work out what gets returned in the edge and corner cases... no matching rows in GameLog, all values of Die1, Die2 and Die3 are zero, etc., for either query. The results might be slightly different, returning a zero instead of NULL, divide by zero edge case, etc.)
FOLLOWUP
I ran a quick test of both queries.
CREATE DATABASE d20170228 ;
USE d20170228 ;
CREATE TABLE GameLog
( playerid VARCHAR(5) DEFAULT '12345'
, die1 TINYINT
, die2 TINYINT
, die3 TINYINT
);
INSERT INTO GameLog (die1,die2,die3)
VALUES (3,0,0),(2,1,0),(4,3,3),(3,3,3),(0,0,0),(4,4,4),(5,4,0),(0,0,2)
;
SELECT (3+2+1+4+3+3+3+3+3+4+4+4+5+4+2)/15 AS manual_avg
manual_avg is coming out 3.2.
Both queries are also returning 3.2
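If you want to repeat that check without a MySQL instance, the sketch below loads the same eight rows into an in-memory SQLite database from Python and compares the pooled average with the "average of the three column averages" from the question. SQLite and the function names `pooled_average` and `mean_of_column_averages` are assumptions for illustration only.

```python
import sqlite3

ROWS = [(3, 0, 0), (2, 1, 0), (4, 3, 3), (3, 3, 3),
        (0, 0, 0), (4, 4, 4), (5, 4, 0), (0, 0, 2)]


def _connect():
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE GameLog (playerid TEXT DEFAULT '12345', die1 INT, die2 INT, die3 INT)"
    )
    con.executemany("INSERT INTO GameLog (die1, die2, die3) VALUES (?, ?, ?)", ROWS)
    return con


def pooled_average():
    """Stack Die1..Die3 into one column via the cross join, then average non-zeros."""
    return _connect().execute(
        """
        SELECT AVG(NULLIF(CASE d.i WHEN 1 THEN gl.die1
                                   WHEN 2 THEN gl.die2
                                   WHEN 3 THEN gl.die3 END, 0))
        FROM GameLog gl
        CROSS JOIN (SELECT 1 AS i UNION ALL SELECT 2 UNION ALL SELECT 3) d
        WHERE gl.playerid = '12345'
        """
    ).fetchone()[0]


def mean_of_column_averages():
    """The approach from the question: average the three per-column averages."""
    return _connect().execute(
        """
        SELECT (AVG(NULLIF(die1, 0)) + AVG(NULLIF(die2, 0)) + AVG(NULLIF(die3, 0))) / 3
        FROM GameLog
        WHERE playerid = '12345'
        """
    ).fetchone()[0]


print(pooled_average(), mean_of_column_averages())
```

The two numbers differ because the columns contain different counts of non-zero values (6, 5 and 4 here), so averaging the per-column averages weights each die value unequally.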
If you want to eliminate zeroes and NULLs, you can simply SELECT from the filtered master set multiple times, doing a UNION ALL on the results, then averaging against that.
SELECT AVG(`allDice`.`DieResult`)
FROM (
SELECT `Die1` AS `DieResult` FROM `GameLog` WHERE COALESCE(`Die1`, 0) <> 0 AND PlayerId = '12345'
UNION ALL
SELECT `Die2` FROM `GameLog` WHERE COALESCE(`Die2`, 0) <> 0 AND PlayerId = '12345'
UNION ALL
SELECT `Die3` FROM `GameLog` WHERE COALESCE(`Die3`, 0) <> 0 AND PlayerId = '12345'
) AS `allDice`
There's no need to overthink this one; it's not too difficult a problem.
Let's say I was looking for the record with the second-highest value.
Sample Table:
CREATE TABLE `my_table` (
`id` int(2) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`value` int(10),
PRIMARY KEY (`id`)
);
INSERT INTO `my_table` (`id`, `name`, `value`) VALUES (NULL, 'foo', '200'), (NULL, 'bar', '100'), (NULL, 'baz', '0'), (NULL, 'quux', '300');
The second highest value is foo. How many ways can you get this result?
The obvious example is:
SELECT name FROM my_table ORDER BY value DESC LIMIT 1 OFFSET 1;
Can you think of other examples?
I was trying this one, but LIMIT & IN/ALL/ANY/SOME subquery is not supported.
SELECT name FROM my_table WHERE value IN (
SELECT MIN(value) FROM my_table ORDER BY value DESC LIMIT 1
) LIMIT 1;
Eduardo's solution in standard SQL:
select *
from (
select id,
name,
value,
row_number() over (order by value desc) as rn
from my_table t
) t
where rn = 2 -- 2 = second highest; change the constant to pick any other row
This works on any modern DBMS, including MySQL 8.0 and later (earlier MySQL versions have no window functions). This solution is usually faster than solutions using sub-selects. By changing the constant it can easily return any other row (2nd, 3rd, ...), which is achievable with Eduardo's solution as well.
It can also be adjusted to count by groups (adding a partition by) so the "greatest-n-per-group" problem can be solved with the same pattern.
Here is a SQLFiddle to play around with: http://sqlfiddle.com/#!12/286d0/1
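As a self-contained way to experiment with the window-function pattern, here is a sketch using Python's built-in sqlite3 module (an assumption: it needs a bundled SQLite of 3.25 or newer for window functions). The helper `nth_highest` is a made-up name; the rows are ordered by value descending so that rank 2 is the second-highest.

```python
import sqlite3


def nth_highest(n):
    """Return the name of the row with the n-th highest value in my_table."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT NOT NULL, value INT)"
    )
    con.executemany(
        "INSERT INTO my_table (name, value) VALUES (?, ?)",
        [("foo", 200), ("bar", 100), ("baz", 0), ("quux", 300)],
    )
    row = con.execute(
        """
        SELECT name
        FROM (
            SELECT name,
                   value,
                   row_number() OVER (ORDER BY value DESC) AS rn
            FROM my_table
        ) t
        WHERE rn = ?
        """,
        (n,),
    ).fetchone()
    con.close()
    return row[0] if row else None


print(nth_highest(2))
```

Note that row_number() breaks ties arbitrarily; if equal values should share a rank, rank() or dense_rank() would be the functions to look at instead.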
This only works for exactly the second highest:
SELECT * FROM my_table two
WHERE EXISTS (
SELECT * FROM my_table one
WHERE one.value > two.value
AND NOT EXISTS (
SELECT * FROM my_table zero
WHERE zero.value > one.value
)
)
LIMIT 1
;
This one emulates a window function rank() for platforms that don't have them. It can also be adapted for ranks <> 2 by altering one constant:
SELECT one.*
-- , 1+COALESCE(agg.rnk,0) AS rnk
FROM my_table one
LEFT JOIN (
SELECT one.id , COUNT(*) AS rnk
FROM my_table one
JOIN my_table cnt ON cnt.value > one.value
GROUP BY one.id
) agg ON agg.id = one.id
WHERE agg.rnk=1 -- the aggregate starts counting at zero
;
Both solutions need functional self-joins (I don't know if MySQL allows them; IIRC it only disallows them if the table is the target of an update or delete).
The below one does not need window functions, but uses a recursive query to enumerate the rankings:
WITH RECURSIVE agg AS (
SELECT one.id
, one.value
, 1 AS rnk
FROM my_table one
WHERE NOT EXISTS (
SELECT * FROM my_table zero
WHERE zero.value > one.value
)
UNION ALL
SELECT two.id
, two.value
, agg.rnk+1 AS rnk
FROM my_table two
JOIN agg ON two.value < agg.value
WHERE NOT EXISTS (
SELECT * FROM my_table nx
WHERE nx.value > two.value
AND nx.value < agg.value
)
)
SELECT * FROM agg
WHERE rnk = 2
;
(the recursive query will not work in MySQL versions before 8.0, which added WITH RECURSIVE)
You can use inline initialization like this:
select * from (
select id,
name,
value,
@curRank := @curRank + 1 AS rank
from my_table t, (SELECT @curRank := 0) r
order by value desc
) tb
where tb.rank = 2
SELECT name
FROM my_table
WHERE value < (SELECT max(value) FROM my_table)
ORDER BY value DESC
LIMIT 1
SELECT name
FROM my_table
WHERE value = (
SELECT min(r.value)
FROM (
SELECT name, value
FROM my_table
ORDER BY value DESC
LIMIT 2
) r
)
LIMIT 1
I need to fill the Code column of about 400 rows in the table BabyCode with the values A01, B03, Z11 and X21, repeating in that order.
[Image: the current table, with no values in the 'Code' column]
[Image: the table after the update, with the repeating values in the 'Code' column]
You can do this:
INSERT INTO BabyCode
SELECT Codes.Code
FROM
(
SELECT id
FROM
(
SELECT t3.digit * 100 + t2.digit * 10 + t1.digit + 1 AS id
FROM TEMP AS t1
CROSS JOIN TEMP AS t2
CROSS JOIN TEMP AS t3
) t
WHERE id <= 100 -- 100 ids cross-joined with 4 codes = 400 rows
) t,
(
SELECT 1 AS ID, 'A01' AS Code
UNION ALL
SELECT 2, 'B03'
UNION ALL
SELECT 3, 'Z11'
UNION ALL
SELECT 4, 'X21'
) codes;
But you will need to define a temp table, to use as an anchor table:
CREATE TABLE TEMP (Digit int);
INSERT INTO Temp VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
SQL Fiddle Demo
This will insert 400 rows of the values A01, B03, Z11, and X21 into the Code column of the table BabyCode.
You could put the four values into a virtual table identical to that used in @Mahmoud Gamal's answer, and, if the ID values in your table start at 1 and are sequential (have neither gaps nor duplicates), you could use the following method to join to the virtual table and update the target's Code column:
UPDATE YourTable t
INNER JOIN (
SELECT 1 AS ID, 'A01' AS Code
UNION ALL SELECT 2, 'B03'
UNION ALL SELECT 3, 'Z11'
UNION ALL SELECT 4, 'X21'
) x
ON (t.ID - 1) MOD 4 + 1 = x.ID
SET t.Code = x.Code
;
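To see the `(ID - 1) MOD 4 + 1` mapping in action without MySQL, here is a sketch on an in-memory SQLite database from Python. Since MySQL's UPDATE ... JOIN syntax is not available in SQLite, it swaps in a correlated subquery that looks up the code for each row; the helper name `assign_codes` and the generated ID range are assumptions for the demo.

```python
import sqlite3


def assign_codes(row_count):
    """Fill Code for IDs 1..row_count, cycling through the four values."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE YourTable (ID INTEGER PRIMARY KEY, Code TEXT)")
    con.executemany(
        "INSERT INTO YourTable (ID) VALUES (?)",
        [(i,) for i in range(1, row_count + 1)],
    )
    # Correlated subquery instead of MySQL's UPDATE ... INNER JOIN.
    con.execute(
        """
        UPDATE YourTable
        SET Code = (
            SELECT x.Code
            FROM (
                SELECT 1 AS ID, 'A01' AS Code
                UNION ALL SELECT 2, 'B03'
                UNION ALL SELECT 3, 'Z11'
                UNION ALL SELECT 4, 'X21'
            ) x
            WHERE x.ID = (YourTable.ID - 1) % 4 + 1   -- maps 1,2,3,4,5,... to 1,2,3,4,1,...
        )
        """
    )
    codes = [r[0] for r in con.execute("SELECT Code FROM YourTable ORDER BY ID")]
    con.close()
    return codes


print(assign_codes(6))
```

The mapping only produces a clean repeating pattern when the IDs are gap-free and start at 1, which is the same precondition stated for the MySQL query.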
Otherwise, you could use variables to assign 1, 2, 3, 4 cyclically to every row of your table; then you would be able to join to the virtual table using those values:
UPDATE YourTable t
INNER JOIN (
SELECT ID, @rnk := CASE WHEN @rnk = 4 THEN 0 ELSE @rnk END + 1 AS rnk
FROM YourTable
CROSS JOIN (SELECT @rnk := 0) x
ORDER BY ID
) r ON t.ID = r.ID
INNER JOIN (
SELECT 1 AS ID, 'A01' AS Code
UNION ALL SELECT 2, 'B03'
UNION ALL SELECT 3, 'Z11'
UNION ALL SELECT 4, 'X21'
) x
ON r.rnk = x.ID
SET t.Code = x.Code
;
Both queries can be played with at SQL Fiddle:
Method 1
Method 2
I have two tables like this in mysql
a.cardnumber (unique)
a.position (numerical 3 digits or null)
a.serial
b.serial (unique)
b.lastused
I want to update any rows in "a" where position is above 600 AND "a.serial" is blank with any serial from "b.serial" where "b.lastused" is either null or more than 30 days ago. When the serial is copied into "a.serial" I want to update "b.lastused" with today's date so I know that the relevant "b.serial" has been used today.
There is no relation to the two tables apart from the serial and any serial from b can be used with any cardnumber in a.
I've tried this using my limited knowledge of mysql but I keep getting an error from my mysql desktop program to say I have an error in my query :(
Any help much appreciated!
I'm assuming here that you want to use a separate b.serial for each row to be updated in a. (This isn't specifically stated, but it seems to me to be most likely; please feel free to correct my assumption if it is wrong.)
I set up a small example. It wasn't clear what the datatypes of the columns are, so I used INT where I wasn't sure. I used the DATE datatype (rather than DATETIME) for lastused.
CREATE TABLE a (`cardnumber` VARCHAR(10) NOT NULL PRIMARY KEY, `position` INT, `serial` INT);
CREATE TABLE b (`serial` INT NOT NULL PRIMARY KEY, lastused DATE);
INSERT INTO a VALUES ('x0000',555,NULL),('x0001',700,123),('a1111',601,NULL),('a2222',602,NULL);
INSERT INTO b VALUES (100,'2012-07-15'),(101,NULL),(102,'2010-01-01'),(103,NULL),(104,NULL);
SELECT * FROM a;
SELECT * FROM b;
Based on the conditions you give, the rows with cardnumbers 'a1111' and 'a2222' should get updated, the other two rows should not (position <= 600, serial already assigned).
Before we run an UPDATE, we want to first run a SELECT that returns the rows to be updated, along with the values that will be assigned. Once we get that, we can convert that to a multi-table UPDATE statement.
SELECT a.cardnumber AS `a.cardnumber`
, a.position AS `a.position`
, a.serial AS `a.serial`
, b.serial AS `b.serial`
, b.lastused AS `b.lastused`
FROM (
SELECT @i := @i + 1 AS i
, aa.*
FROM a aa
JOIN (SELECT @i := 0) ii
WHERE aa.position > 600 /* assuming `position` is numeric datatype */
AND aa.serial IS NULL /* assuming 'blank' represented by NULL */
ORDER BY aa.cardnumber
) ia
JOIN (
SELECT @j := @j + 1 AS j
, bb.serial
, bb.lastused
FROM b bb
JOIN (SELECT @j := 0) jj
WHERE bb.lastused IS NULL
OR bb.lastused < DATE_ADD(NOW(),INTERVAL -30 DAY)
ORDER BY bb.serial
) jb
ON ia.i = jb.j
JOIN a ON a.cardnumber = ia.cardnumber
JOIN b ON b.serial = jb.serial
To convert that to an UPDATE, replace the SELECT ... FROM with UPDATE, and add a SET clause to assign new values to the tables.
UPDATE (
SELECT @i := @i + 1 AS i
, aa.*
FROM a aa
JOIN (SELECT @i := 0) ii
WHERE aa.position > 600
AND aa.serial IS NULL
ORDER BY aa.cardnumber
) ia
JOIN (
SELECT @j := @j + 1 AS j
, bb.serial
, bb.lastused
FROM b bb
JOIN (SELECT @j := 0) jj
WHERE bb.lastused IS NULL
OR bb.lastused < DATE_ADD(NOW(),INTERVAL -30 DAY)
ORDER BY bb.serial
) jb
ON ia.i = jb.j
JOIN a ON a.cardnumber = ia.cardnumber
JOIN b ON b.serial = jb.serial
SET a.serial = b.serial
, b.lastused = DATE(NOW())
-- 4 row(s) affected
You can run the queries for the inline views (ia, jb) separately to verify that they return the rows you want to update.
The join from ia to a, and from jb to b, should be on the primary key or a unique key.
The purpose of the ia and jb inline views is to get sequential numbers assigned to those rows so we can match them to each other.
The joins to a and b are to get back to the row in the original table, which is what we want to update.
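The pairing step is the heart of this approach, so here is a small runnable sketch of just that part. It uses an in-memory SQLite database from Python and swaps the MySQL user variables for the ROW_NUMBER() window function (both are assumptions made to keep the example self-contained; `pair_rows` is a made-up name). It only selects the pairs and does not perform the update.

```python
import sqlite3


def pair_rows():
    """Pair each eligible row of a with one eligible row of b by sequence number."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE a (cardnumber TEXT PRIMARY KEY, position INT, serial INT);
        CREATE TABLE b (serial INT PRIMARY KEY, lastused TEXT);
        INSERT INTO a VALUES ('x0000',555,NULL),('x0001',700,123),
                             ('a1111',601,NULL),('a2222',602,NULL);
        INSERT INTO b VALUES (100,'2012-07-15'),(101,NULL),(102,'2010-01-01'),
                             (103,NULL),(104,NULL);
        """
    )
    rows = con.execute(
        """
        SELECT ia.cardnumber, jb.serial
        FROM (SELECT cardnumber,
                     ROW_NUMBER() OVER (ORDER BY cardnumber) AS i
              FROM a
              WHERE position > 600 AND serial IS NULL) ia
        JOIN (SELECT serial,
                     ROW_NUMBER() OVER (ORDER BY serial) AS j
              FROM b
              WHERE lastused IS NULL
                 OR lastused < date('now', '-30 days')) jb
          ON ia.i = jb.j          -- first eligible a row gets first eligible b row, etc.
        ORDER BY ia.cardnumber
        """
    ).fetchall()
    con.close()
    return rows


print(pair_rows())
```

With the sample data, the two eligible cards are matched to the two lowest eligible serials; if there are more eligible cards than serials, the inner join simply leaves the surplus cards unpaired.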
(Obviously, some adjustments need to be made if serial is not an INT, or lastused is a DATETIME rather than a DATE.)
But this is an example of how I would go about doing the UPDATE you want to do (as best I understood it.)
NOTE: This approach works with MySQL versions that support subqueries. For MySQL 4.0, you would need to run this in steps, storing the results from the "ia" and "jb" inline views (subqueries) into actual tables. Then reference those tables in the query in place of the inline views. The ii and jj subqueries can be removed, and replaced with a separate SELECT @i := 0, @j := 0 statement prior to the execution of the queries that reference these variables.
Let me know if this works:
update table_a
set serial =
(
-- note: limit 1 picks a single serial, which is reused for every updated row
select b.serial from table_b b
where b.lastused is null
or b.lastused < current_date - interval 30 day
limit 1
)
where position > 600
and serial is null;
-- note: this marks every eligible serial as used today
update table_b b
set b.lastused = current_date
where b.lastused is null
or b.lastused < current_date - interval 30 day;
Suppose there is a table with only two columns (an example is shown below). Every '1' entry should be followed (in the sorted order given below) by a '0'. However, as you can see, in the table, there are some 'orphans' where there are two consecutive '1's.
How can I create a query that returns all the rows, except for the first of any consecutive '1's? (This would reduce the example below from 16 rows to 14)
1 E
0 A
1 T
0 S
1 R
0 E
1 F
0 T
1 G
1 T
0 R
1 X
1 R
0 R
1 E
0 T
I'm going to try and clarify my problem, I think that above I simplified it too much. Imagine one table called logs, with four columns:
user (a string containing a username)
machine (a string uniquely identifying various PCs)
type (event's type: a 1 for login and a 0 for logout)
time (the time of the event being logged)
[The machine/time pair provides a unique key, as no machine can be logged in or out of twice at the same instant. Presumably an 'ID' column could be artificially created based on machine/time sort if needed.]
The idea is that every login event should be accompanied by a logout event. In an ideal world it would be fairly easy to match logins to logouts, and hence analyse the time spent logged in.
However, in the case of a power cut, the logout will not be recorded. Therefore (considering only one machine's data, sorted by time) if there are two login events in a row, we want to ignore the first login, because we don't have any reliable data from it. This is the problem I am trying to solve.
Provided, that
only 1's are dupes, never 0's
You want to get rid of all the first 1's if there are more.
Your text says "except for the first of any consecutive", but I think, this is what you want. Or there can only ever be 2, then it is the same.
SELECT x.*
FROM x
LEFT JOIN x y on y.id = (x.id + 1)
WHERE (x.nr = y.nr) IS NOT TRUE -- OR x.nr = 0
ORDER BY x.id
If you want to preserve double 0's, use the commented clause additionally, but probably not needed.
Edit after question edit:
You may want to add an auto-increment column to your data to make this simpler:
Generate (i.e. write) a row number index column in MySQL
Other RDBMS (PostgreSQL, Oracle, SQL Server, ..) have window functions like row_number() or lag() and lead() that make such an operation much easier.
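As an illustration of the window-function route, here is a sketch using lead() on the 16-row example from the question, run on an in-memory SQLite database from Python (an assumption: the bundled SQLite must be 3.25 or newer). The table and column names follow the query above; `drop_orphans` is a made-up helper name.

```python
import sqlite3

ROWS = [(1, "E"), (0, "A"), (1, "T"), (0, "S"), (1, "R"), (0, "E"),
        (1, "F"), (0, "T"), (1, "G"), (1, "T"), (0, "R"), (1, "X"),
        (1, "R"), (0, "R"), (1, "E"), (0, "T")]


def drop_orphans(rows):
    """Return all rows except a 1 that is immediately followed by another 1."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE x (id INTEGER PRIMARY KEY, nr INT, code TEXT)")
    con.executemany("INSERT INTO x (nr, code) VALUES (?, ?)", rows)
    result = con.execute(
        """
        SELECT id, nr, code
        FROM (
            SELECT id, nr, code,
                   LEAD(nr) OVER (ORDER BY id) AS next_nr   -- value on the following row
            FROM x
        ) t
        WHERE NOT (nr = 1 AND COALESCE(next_nr, 0) = 1)     -- last row has no successor
        ORDER BY id
        """
    ).fetchall()
    con.close()
    return result


kept = drop_orphans(ROWS)
print(len(kept), [r[0] for r in kept])
```

For the log table described in the question, the same pattern would presumably use `PARTITION BY machine ORDER BY time` inside the OVER clause, so that each machine's events are compared only with each other.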
Assuming you add an id column (set id = record number in the table), the following returns the orphaned '1' rows, i.e. those not immediately followed by a '0':
select a.*
from the_table a
left join the_table b on b.id = a.id + 1
and b.col1 = 0
where a.col1 = 1
and b.id is null
Try:
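select l.*
from logs l
where l.type = 0
or coalesce(
-- type of the next event for the same machine and user
(select n.type
from logs n
where n.machine = l.machine and
n.user = l.user and
n.`time` > l.`time`
order by n.`time`
limit 1), 0) = 0 -- keep a login unless the next event is another login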
Using a CTE to separate the lag logic from the selection criteria:
DROP TABLE tmp.bits;
CREATE TABLE tmp.bits
( id SERIAL NOT NULL
, bit INTEGER NOT NULL
, code CHAR(1)
);
INSERT INTO tmp.bits(bit, code) VALUES
(1, 'T' )
, (0, 'S' )
, (1, 'R' )
, (0, 'E' )
, (1, 'F' )
, (0, 'T' )
, (1, 'G' )
, (1, 'T' )
, (0, 'R' )
, (1, 'X' )
, (1, 'R' )
, (0, 'R' )
, (1, 'E' )
, (0, 'T' )
;
SET search_path='tmp';
SELECT * FROM bits;
-- EXPLAIN ANALYZE
WITH prevnext AS (
SELECT
bt.id AS thisid
, bt.bit AS thisbit
, bt.code AS thiscode
, bp.bit AS prevbit
, bp.code AS prevcode
FROM bits bt
LEFT JOIN bits bp ON (bt.id > bp.id)
AND NOT EXISTS ( SELECT * FROM bits nx
WHERE nx.id > bp.id
AND nx.id < bt.id
)
)
SELECT thisid, thisbit, thiscode
FROM prevnext
WHERE thisbit=0
OR prevbit IS NULL OR thisbit <> prevbit
;
EDIT:
For those poor souls who cannot use CTEs, it is easy to create a view instead:
CREATE VIEW prevnext AS (
SELECT
bt.id AS thisid
, bt.bit AS thisbit
,bt.code AS thiscode
, bp.bit AS prevbit
, bp.code AS prevcode
FROM bits bt
LEFT JOIN bits bp ON (bt.id > bp.id)
AND NOT EXISTS ( SELECT * FROM bits nx
WHERE nx.id > bp.id
AND nx.id < bt.id
)
)
;
SELECT thisid, thisbit, thiscode
FROM prevnext
WHERE thisbit=0
OR prevbit IS NULL OR thisbit <> prevbit
;