How do I get a list of numbers in MySQL? - mysql

I've got a database of movies, and I'd like a list of years where I don't have a movie for that year. So all I need is a list (1900 .. 2012) and then I can JOIN and IN and NOT IN on that all I want.
I've got:
CREATE PROCEDURE build_years(p1 SMALLINT)
BEGIN
CREATE TEMPORARY TABLE year (year SMALLINT(5) UNSIGNED);
label1: LOOP
INSERT INTO year VALUES (p1);
SET p1 = p1 + 1;
IF p1 > 2012 THEN LEAVE label1; END IF;
END LOOP;
END
But that seems so unSQL and only marginally less kludgy than running Python code to create the same table. I'd really like something that didn't use a stored procedure, didn't use looping and didn't use an actual table, in that order of concern.

This should work until you need more than 195 years, at which point you'll need to add a UNION ALL:
SELECT Year
FROM ( SELECT @i:= @i + 1 AS YEAR
FROM INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY,
( SELECT @i:= 1899) AS i
) As Y
WHERE Year BETWEEN 1900 AND 2012
ORDER BY Year;
Although I am assuming that the COLLATION_CHARACTER_SET_APPLICABILITY System table has a default size of 195 based on my trusty testing ground SQL Fiddle

I had similar problem a few years ago. My solution was:
1. Sequence table
I created a table filled with an integer sequence from 0 up to as many values as will be required:
CREATE TABLE numbers (n INT);
INSERT INTO numbers VALUES (0),(1),(2),(3),(4);
INSERT INTO numbers SELECT n+5 FROM numbers;
INSERT INTO numbers SELECT n+10 FROM numbers;
INSERT INTO numbers SELECT n+20 FROM numbers;
INSERT INTO numbers SELECT n+40 FROM numbers;
etc.
It is executed only once, so can be created from outside of your app, even by hand.
2. Select data of a needed type and range
For integers it is obvious - i.e. range 1..99:
SELECT n FROM numbers WHERE n BETWEEN 1 AND 99;
Dates - 2h intervals from now to +2 days:
SELECT date_add(now(),INTERVAL 2*n HOUR) FROM numbers WHERE n BETWEEN 0 AND 23;
So in your case it could be:
SELECT n+1900 AS n_year FROM numbers WHERE n BETWEEN 0 AND 112;
Then JOIN it on n_year.
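The two steps above can be sketched end to end. This is only an illustration in Python using the standard sqlite3 module as a stand-in for MySQL; the movies table, its columns and its sample rows are assumptions made up for the example, and the year range is cut down to 1900..1905 to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical movie table (name, columns and rows are assumptions).
cur.execute("CREATE TABLE movies (title TEXT, year INTEGER)")
cur.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [("A", 1900), ("B", 1902), ("C", 1903), ("D", 1905)],
)

# Step 1: build the numbers table by repeated doubling (0..159 here).
cur.execute("CREATE TABLE numbers (n INTEGER)")
cur.execute("INSERT INTO numbers VALUES (0),(1),(2),(3),(4)")
for offset in (5, 10, 20, 40, 80):
    cur.execute("INSERT INTO numbers SELECT n + ? FROM numbers", (offset,))

# Step 2: turn the numbers into years, LEFT JOIN the movies,
# and keep only the years that found no matching movie.
cur.execute(
    """
    SELECT y.n_year
    FROM (SELECT n + 1900 AS n_year FROM numbers WHERE n BETWEEN 0 AND 5) AS y
    LEFT JOIN movies AS m ON m.year = y.n_year
    WHERE m.year IS NULL
    ORDER BY y.n_year
    """
)
missing_years = [row[0] for row in cur.fetchall()]
print(missing_years)  # [1901, 1904]
```

For the real case, the inner WHERE would be n BETWEEN 0 AND 112, as in the query above.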

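This returns the years from the current year (2012) back to 1900, if you really want to keep it to a single query. Note that it uses Oracle syntax (CONNECT BY LEVEL, ADD_MONTHS, TO_CHAR), which MySQL does not support: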
SELECT
TO_CHAR (ADD_MONTHS (TRUNC (SYSDATE, 'YYYY'), ((rno - 1) * -12)), 'YYYY') AS "years"
FROM
(
SELECT
LEVEL rno
FROM DUAL
CONNECT BY LEVEL <=
(SELECT TO_CHAR (TRUNC (SYSDATE, 'YYYY'), 'YYYY')
- 1899
yearstobuild
FROM DUAL))

The only solution I can think of according to your wishes sucks also ...
SELECT years.year FROM
(
SELECT 1900 AS year
UNION SELECT 1901
...
UNION SELECT 2012
) AS years
LEFT OUTER JOIN yourmovietable USING (year)
WHERE yourmovietable.year IS NULL;

Using this generic query is faster:
INSERT INTO numbers SELECT n+(SELECT COUNT(*) FROM numbers) FROM numbers;
Each execution doubles the number of rows:
INSERT INTO numbers VALUES (0),(1),(2),(3),(4);
INSERT INTO numbers SELECT n+(SELECT COUNT(*) FROM numbers) FROM numbers;
INSERT INTO numbers SELECT n+(SELECT COUNT(*) FROM numbers) FROM numbers;
INSERT INTO numbers SELECT n+(SELECT COUNT(*) FROM numbers) FROM numbers;
...

select year into temporary table blaa from generate_series(1900, 2000) as year where year not in (select distinct year from films);
I don't know if this will work (generate_series is a PostgreSQL function, not a MySQL one), but you get the drift.

Related

Is there any way in SQL or function in MYSQL that sums up all the increments in a column?

I want to find a way to sum up all the increments in the value of a column.
We provide delivery services to our customers. A customer can pay as he goes, but if he pays an upfront fee, he gets a better deal. There is a table that holds the customer's balance over time, so I want to sum all the increments to the balance. I can't change the way payments are recorded.
I have already coded a stored procedure that works, but it is kind of slow, so I'm looking for alternatives. I think that a single SQL statement might outperform my stored procedure, which uses loops.
My stored procedure selects the customer's rows in a given date range and inserts the result into a temp table X. After that, it pops rows from X, comparing the balance in each row against the previous row to detect whether there is an increment. If there is no increment, it pops another row and repeats the routine; if there is an increment, it calculates the difference between that row and the previous one and inserts the result into another temp table Y.
When there are no rows left, the stored procedure performs a SUM over temp table Y, which tells you how much the customer has "refilled" their balance.
This is an example of the table X, and the expected result:
DATE BALANCE
---- -------
2019-02-01 200
2019-02-02 195 //from 200 to 195 there is a decrement, so it doesn't matter
2019-02-03 180
2019-02-04 150
2019-02-05 175 //there is an increment from 150 to 175, it's 25 that must be inserted in the temp table
2019-02-06 140
2019-02-07 180 //there is another increment, from 140 to 180, it's 40
So the resulting temp table Y must be something like this:
REFILL
------
25
40
The expected result is 65. My stored procedure returns this value but, as I said, it is kind of slow (it takes about 22 seconds to process 3900 rows, equivalent to approximately 3 days), I think because of the loops. I would like to explore other alternatives. Because of some details that I don't mention here, for a single customer I can have 1300 rows per day (the example is given in days, but I have rows by the minute). My tables are indexed, I think properly. I can't post my stored procedure, but it works as described (I know that "the devil is in the details"). So any suggestion will be appreciated.
Use a user-defined variable to hold the balance from the previous row, and then subtract it from the current row's balance.
SELECT SUM(refill) AS total_refill
FROM (
SELECT GREATEST(0, balance - @prev_balance) AS refill, @prev_balance := balance
FROM (
SELECT balance
FROM tableX
ORDER BY date) AS t
CROSS JOIN (SELECT @prev_balance := NULL) AS ars
) AS t
There is a quite well-known mechanism to deal with these: Use a variable inside a field.
SELECT @result:=0;
SELECT @lastbalance:=9999999999; -- whatever value is sure to be higher than any real balance
SELECT SUM(increments) AS total FROM (
SELECT
IF(balance>@lastbalance, balance-@lastbalance, 0) AS increments,
@lastbalance:=balance AS ignored -- IGNORE is a reserved word in MySQL, so it is not used as the alias
FROM X -- insert real table name here
WHERE
-- insert selector here
ORDER BY
-- insert real chronological sorter here
) AS baseview;
Use lag() in MySQL 8+:
select sum(balance - prev_balance) as refills
from (select t.*, lag(balance) over (order by date) prev_balance
from t
) t
where balance > prev_balance;
In older versions of MySQL this is tricky. If the values are continuous dates, then a simple JOIN works:
select sum(t.balance - tprev.balance) as refills
from t join
t tprev
on tprev.date = t.date - interval 1 day
where t.balance > tprev.balance;
This may not be the case. Then the next best method is variables. But you have to be very careful. MySQL does not declare the order of evaluation of expressions in a SELECT. As the documentation explains:
The order of evaluation for expressions involving user variables is undefined. For example, there is no guarantee that SELECT @a, @a:=@a+1 evaluates @a first and then performs the assignment.
The variables need to be assigned and used in the same expression:
select sum(balance - prev_balance) as refills
from (select t.*,
(case when (@temp_prevb := @prevb) = NULL -- intentionally false
then -1
when (@prevb := balance)
then @temp_prevb
end) as prev_balance
from (select t.* from t order by date) t cross join
(select @prevb := NULL) params
) t
where balance > prev_balance;
And the final method is a correlated subquery:
select sum(balance - prev_balance) as refills
from (select t.*,
(select t2.balance
from t t2
where t2.date < t.date
order by t2.date desc
limit 1
) as prev_balance
from t
) t
where balance > prev_balance;
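As a cross-check for any of the queries above, the same "previous row" logic can be written outside the database. This is a minimal Python sketch, assuming the balances have already been fetched in date order; the function name total_refill is made up for the illustration.

```python
# Balances from the question's sample table X, ordered by date.
balances = [200, 195, 180, 150, 175, 140, 180]


def total_refill(values):
    """Sum only the positive differences between consecutive balances."""
    return sum(max(0, cur - prev) for prev, cur in zip(values, values[1:]))


print(total_refill(balances))  # 65 (25 + 40)
```

The first row has no previous row, so it contributes nothing, which matches what the lag() version does when prev_balance is NULL.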

Finding the area available for the date range

Suppose you have a room which is 100sqft and you want to rent it from 1st Aug to 31st Aug.
Bookings Table schema
startdate|enddate|area|storageid
you have following bookings
06-Aug|25-Aug|50|'abc'
05-Aug|11-Aug|40|'xyz'
18-Aug|23-Aug|30|'pqr'
13-Aug|16-Aug|10|'qwe'
Now somebody requests for booking from 08-Aug to 20-Aug. For this date range the maximum area available is 10sqft (Since, for dates 8,9,10 and 11 Aug only 10sq ft is available.)
How would you create an efficient SQL query to get this? Right now I have very messy and inefficient query which gives wrong results for some cases. I am not posting the query because It is so messy that I can't explain it myself.
I don't necessarily want to solve it using SQL only. If there is an algorithm that can solve it efficiently I would extract all the data from database.
Someone removed SQL Server, but here is the algorithm:
DECLARE @startDate date = '2016-08-09';
DECLARE @endDate date = '2016-08-20';
DECLARE @totalArea decimal(19,2) = 100;
WITH Src AS --Your source table
(
SELECT * FROM (VALUES
('2016-08-06', '2016-08-25', 50, 'abc'),
('2016-08-05', '2016-08-11', 40, 'xyz'),
('2016-08-18','2016-08-23',30,'pqr'),
('2016-08-13','2016-08-16',10,'qwe')
)T(startdate, enddate, area, storageid)
), Nums AS --Numbers table 0..N, N must be greater than ranges calculated
(
SELECT 0 N
UNION ALL
SELECT N+1 N FROM Nums
WHERE N<DATEDIFF(DAY,@startDate,@endDate)
) --Query
--You can use total-maxUsed from range of days
SELECT @totalArea-MAX(Used) FROM
(
--Group by day, sum all used areas
SELECT MidDate, SUM(Used) Used FROM
(
--Join table with numbers, split every day, if room used, return area
SELECT DATEADD(DAY, N, @startDate) MidDate, CASE WHEN DATEADD(DAY, N, @startDate) BETWEEN startDate AND endDate THEN area END Used
FROM Src
CROSS APPLY Nums
) T
GROUP BY MidDate
) T
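Since the question also allows solving this outside SQL, here is the same day-by-day idea as a small Python sketch. The year 2016 is taken from the answer above; the function name max_available and the tuple layout are assumptions for the illustration, and the loop is the plain per-day scan rather than a tuned algorithm.

```python
from datetime import date, timedelta

TOTAL_AREA = 100

# (startdate, enddate, area, storageid), both dates inclusive.
bookings = [
    (date(2016, 8, 6), date(2016, 8, 25), 50, "abc"),
    (date(2016, 8, 5), date(2016, 8, 11), 40, "xyz"),
    (date(2016, 8, 18), date(2016, 8, 23), 30, "pqr"),
    (date(2016, 8, 13), date(2016, 8, 16), 10, "qwe"),
]


def max_available(start, end, bookings, total_area):
    """Return total_area minus the busiest day's booked area in [start, end]."""
    peak = 0
    day = start
    while day <= end:
        used = sum(area for s, e, area, _ in bookings if s <= day <= e)
        peak = max(peak, used)
        day += timedelta(days=1)
    return total_area - peak


print(max_available(date(2016, 8, 8), date(2016, 8, 20), bookings, TOTAL_AREA))  # 10
```

The busiest days in the requested range are 8 to 11 Aug (50 + 40 booked), which leaves 10 sqft.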

How to add dates in two dates weekwise?

I have a table in which rows have dates as monday dates of the weeks. Some consecutive rows may not have consecutive weekdate and thus has gaps in between. This image will clear the situation:
As clear from the image, there is a gap between weekdates 2016-08-08 and 2016-09-05 as rows with weekdates '2016-08-15','2016-08-22','2016-08-29' are not there before '2016-09-05'.
So, how can I fill this gap with rows for all these dates and null for rest two columns?
Use a tally table
either from a physically stored numbers table, see code here
or create one on-the-fly with a CTE.
You might try this code, which will generate a list of Mondays
DECLARE @start INT=0;
DECLARE @end INT=20;
DECLARE @step INT=7;
WITH x AS(SELECT 1 AS N FROM(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS tbl(N))--10^1
,N3 AS (SELECT 1 AS N FROM x CROSS JOIN x AS N2 CROSS JOIN x N3) --10^3
,Tally AS(SELECT TOP(@end-@start +1) (ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) + @start -1) * @step AS Nr FROM N3
CROSS JOIN N3 N6 CROSS JOIN N3 AS N9)
SELECT DATEADD(DAY,Nr,{d'2016-08-01'}) AS Monday
FROM Tally
You can specify the count of generated rows with @start and @end, the @step should be 7 in your case. This will add 0, 7, 14, 21, ... to a given date (which should be a Monday in your case).
Now use a LEFT JOIN to combine this with your table data. This should result in a gap-less list of all Mondays together with values - if there are any...
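The tally-plus-LEFT-JOIN idea can be sketched in a few lines of Python. The rows dictionary below is made-up sample data (the real table is only shown in the image), and fill_weeks is a hypothetical helper name; only the week dates 2016-08-08 and 2016-09-05 and the gap between them come from the question.

```python
from datetime import date, timedelta

# Hypothetical existing rows: week date -> the other two columns.
rows = {
    date(2016, 8, 1): ("a", 1),
    date(2016, 8, 8): ("b", 2),
    date(2016, 9, 5): ("c", 3),
}


def fill_weeks(rows, first_monday, weeks):
    """Generate every Monday and attach row values, or None where a week is missing."""
    filled = []
    for k in range(weeks):
        monday = first_monday + timedelta(days=7 * k)
        filled.append((monday, *rows.get(monday, (None, None))))
    return filled


result = fill_weeks(rows, date(2016, 8, 1), 6)
gaps = [d.isoformat() for d, col1, col2 in result if col1 is None]
print(gaps)  # ['2016-08-15', '2016-08-22', '2016-08-29']
```

The dictionary lookup with a default plays the role of the LEFT JOIN: every generated Monday is kept, and missing weeks get nulls for the other columns.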
Try this:
SELECT DATEADD(week,1,[select query]) -- In select query write the query
-- to get the date to which you need to add 1 week
In the above query it will take the current date i.e 2016-09-08 15:19:06.950 and add 1 week to it and give the resultant date i.e 2016-09-15 15:19:40.657

Calculating the Median with Mysql

I'm having trouble with calculating the median of a list of values, not the average.
I found this article
Simple way to calculate median with MySQL
It has a reference to the following query which I don't understand properly.
SELECT x.val from data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val))) = (COUNT(*)+1)/2
If I have a time column and I want to calculate the median value, what do the x and y columns refer to?
I propose a faster way.
Get the row count:
SELECT CEIL(COUNT(*)/2) FROM data;
Then take the middle value in a sorted subquery:
SELECT max(val) FROM (SELECT val FROM data ORDER BY val limit @middlevalue) x;
I tested this with a 5x10e6 dataset of random numbers and it will find the median in under 10 seconds.
This will find an arbitrary percentile by replacing the COUNT(*)/2 with COUNT(*)*n where n is the percentile (.5 for median, .75 for 75th percentile, etc).
val is your time column, x and y are two references to the data table (you can write data AS x, data AS y).
EDIT:
To avoid computing your sums twice, you can store the intermediate results.
CREATE TEMPORARY TABLE average_user_total_time
(SELECT SUM(time) AS time_taken
FROM scores
WHERE created_at >= '2010-10-10'
and created_at <= '2010-11-11'
GROUP BY user_id);
Then you can compute median over these values which are in a named table.
EDIT: Temporary table won't work here. You could try using a regular table with "MEMORY" table type. Or just have your subquery that computes the values for the median twice in your query. Apart from this, I don't see another solution. This doesn't mean there isn't a better way, maybe somebody else will come with an idea.
First try to understand what the median is: it is the middle value in the sorted list of values.
Once you understand that, the approach is two steps:
sort the values in either order
pick the middle value (if not an odd number of values, pick the average of the two middle values)
Example:
Median of 0 1 3 7 9 10: 5 (because (7+3)/2=5)
Median of 0 1 3 7 9 10 11: 7 (because 7 is the middle value)
So, to sort dates you need a numerical value; you can get their time stamp (as seconds elapsed from epoch) and use the definition of median.
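The two steps (sort, then pick the middle) look like this as a minimal Python sketch; the function name median is just for the illustration, and the inputs are the two examples above.

```python
def median(values):
    """Sort the values, then take the middle one (or the mean of the two middle ones)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([0, 1, 3, 7, 9, 10]))      # 5.0
print(median([0, 1, 3, 7, 9, 10, 11]))  # 7
```

For a time column, the same function would be applied to the timestamps converted to seconds.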
Finding median in mysql using group_concat
Query:
SELECT
IF(count%2=1,
SUBSTRING_INDEX(substring_index(data_str,",",pos),",",-1),
(SUBSTRING_INDEX(substring_index(data_str,",",pos),",",-1)
+ SUBSTRING_INDEX(substring_index(data_str,",",pos+1),",",-1))/2)
as median
FROM (SELECT group_concat(val order by val) data_str,
CEILING(count(*)/2) pos,
count(*) as count from data)temp;
Explanation:
Sorting is done using order by inside group_concat function
Position(pos) and Total number of elements (count) is identified. CEILING to identify position helps us to use substring_index function in the below steps.
Based on count, even or odd number of values is decided.
Odd values: Directly choose the element belonging to the pos using substring_index.
Even values: Find the element belonging to the pos and pos+1, then add them and divide by 2 to get the median.
Finally the median is calculated.
If you have a table R with a column named A, and you want the median of A, you can do as follows:
SELECT A FROM R R1
WHERE ( SELECT COUNT(A) FROM R R2 WHERE R2.A < R1.A ) = ( SELECT COUNT(A) FROM R R3 WHERE R3.A > R1.A )
Note: This will only work if there are no duplicated values in A. Also, null values are not allowed.
Simplest ways me and my friend have found out... ENJOY!!
SELECT count(*) INTO @c from station;
select ROUND((@c+1)/2) into @final;
SELECT round(lat_n,4) from station a where @final-1=(select count(lat_n) from station b where b.lat_n > a.lat_n);
Here is a solution that is easy to understand. Just replace Your_Column and Your_Table as per your requirement.
SET @r = 0;
SELECT AVG(Your_Column)
FROM (SELECT (@r := @r + 1) AS r, Your_Column FROM Your_Table ORDER BY Your_Column) Temp
WHERE
r = (SELECT CEIL(COUNT(*) / 2) FROM Your_Table) OR
r = (SELECT FLOOR((COUNT(*) / 2) + 1) FROM Your_Table)
Originally adopted from this thread.

SQL query that returns all dates not used in a table

So let's say I have some records that look like:
2011-01-01 Cat
2011-01-02 Dog
2011-01-04 Horse
2011-01-06 Lion
How can I construct a query that will return 2011-01-03 and 2011-01-05, i.e. the unused dates? I postdate blogs into the future and I want a query that will show me the days I don't have anything posted yet. It would look from the current date to 2 weeks into the future.
Update:
I am not too excited about building a permanent table of dates. After thinking about it though it seems like the solution might be to make a small stored procedure that creates a temp table. Something like:
CREATE PROCEDURE MISSING_DATES()
BEGIN
CREATE TEMPORARY TABLE DATES (FUTURE DATETIME NULL);
INSERT INTO DATES (FUTURE) VALUES (CURDATE());
INSERT INTO DATES (FUTURE) VALUES (ADDDATE(CURDATE(), INTERVAL 1 DAY));
...
INSERT INTO DATES (FUTURE) VALUES (ADDDATE(CURDATE(), INTERVAL 14 DAY));
SELECT FUTURE FROM DATES WHERE FUTURE NOT IN (SELECT POSTDATE FROM POSTS);
DROP TEMPORARY TABLE DATES;
END
I guess it just isn't possible to select the absence of data.
You're right — SQL does not make it easy to identify missing data. The usual technique is to join your sequence (with gaps) against a complete sequence, and select those elements in the latter sequence without a corresponding partner in your data.
So, @BenHoffstein's suggestion to maintain a permanent date table is a good one.
Short of that, you can dynamically create that date range with an integers table. Assuming the integers table has a column i with numbers at least 0 – 13, and that your table has its date column named datestamp:
SELECT candidate_date AS missing
FROM (SELECT CURRENT_DATE + INTERVAL i DAY AS candidate_date
FROM integers
WHERE i < 14) AS next_two_weeks
LEFT JOIN my_table ON candidate_date = datestamp
WHERE datestamp is NULL;
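To try the integers-table approach without a MySQL server, here is a rough Python sketch using the standard sqlite3 module. It is an adaptation, not the query above verbatim: SQLite's date() function replaces CURRENT_DATE + INTERVAL i DAY, the start date is fixed at 2011-01-01 so the result is reproducible, the window is shortened to 6 days, and the animal column name is made up for the sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Sample rows from the question; the second column name is an assumption.
conn.execute("CREATE TABLE my_table (datestamp TEXT, animal TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [
        ("2011-01-01", "Cat"),
        ("2011-01-02", "Dog"),
        ("2011-01-04", "Horse"),
        ("2011-01-06", "Lion"),
    ],
)

# Integers table with a column i holding 0..13.
conn.execute("CREATE TABLE integers (i INTEGER)")
conn.executemany("INSERT INTO integers VALUES (?)", [(i,) for i in range(14)])

# Build the candidate dates, LEFT JOIN the posts, keep dates with no match.
query = """
    SELECT candidate_date
    FROM (SELECT date(?, '+' || i || ' days') AS candidate_date
          FROM integers
          WHERE i < ?) AS candidates
    LEFT JOIN my_table ON candidate_date = datestamp
    WHERE datestamp IS NULL
    ORDER BY candidate_date
"""
missing = [row[0] for row in conn.execute(query, ("2011-01-01", 6))]
print(missing)  # ['2011-01-03', '2011-01-05']
```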
One solution would be to create a separate table with one column to hold all dates from now until eternity (or whenever you expect to stop blogging). For example:
CREATE TABLE Dates (dt DATE);
INSERT INTO Dates VALUES ('2011-01-01');
INSERT INTO Dates VALUES ('2011-01-02');
...etc...
INSERT INTO Dates VALUES ('2099-12-31');
Once this reference table is set up, you can simply outer join to determine the unused dates like so:
SELECT d.dt
FROM Dates d LEFT JOIN Blogs b ON d.dt = b.dt
WHERE b.dt IS NULL
If you want to limit the search to two weeks in the future, you could add this to the WHERE clause:
AND d.dt BETWEEN CURDATE() AND ADDDATE(CURDATE(), INTERVAL 14 DAY)
The way to extract rows from the mysql database is via SELECT. Thus you cannot select rows that do not exist.
What I would do is fill my blog table with all possible dates (for a year, then repeat the process)
create table blog (
thedate date not null,
thetext text null,
primary key (thedate));
doing a loop to create all dates entries for 2011 (using a program, eg $mydate is the date you want to insert)
insert IGNORE into blog (thedate,thetext) values ($mydate, null);
(the IGNORE keyword to not create an error (thedate is a primary key) if thedate exists already).
Then you insert the values normally
insert into blog (thedate,thetext) values ($mydate, "newtext")
on duplicate key update thetext="newtext";
Finally to select empty entries, you just have to
select thedate from blog where thetext is null;
You're probably not going to like this:
select '2011-01-03', count(*) from TABLE where postdate='2011-01-03'
having count(*)=0 union
select '2011-01-04', count(*) from TABLE where postdate='2011-01-04'
having count(*)=0 union
select '2011-01-05', count(*) from TABLE where postdate='2011-01-05'
having count(*)=0 union
... repeat for 2 weeks
OR
create a table with all days in 2011, then do a left join, like
select a.days_2011
from all_days_2011 a
left join TABLE on a.days_2011=TABLE.postdate
where a.days_2011 between date(now()) and date(date_add(now(), interval 2 week))
and TABLE.postdate is null;