MySQL: select random individual from available to populate new table

I am trying to automate the production of a roster based on leave dates and working preferences. I have generated some data to work with and I now have two tables: one with a list of individuals and their preferences for working on particular days of the week (e.g. some prefer to work on a Tuesday, others only every other Wednesday, etc.), and another with leave dates for individuals. The first table looks like this, where firstpref and secondpref represent weekdays (Mon = 1, Sun = 7) and firstprefclw and secondprefclw mark which week of a two-week pattern someone prefers (0 = no preference, 1 = week 1 preferred, 2 = week 2 preferred):
initials | firstpref | firstprefclw | secondpref | secondprefclw
KP | 3 | 0 | 1 | 0
BD | 2 | 1 | 1 | 0
LW | 3 | 0 | 4 | 1
Then there is a table leave_entries which basically has the initials, a start date, and an end date for each leave request.
Finally, there is a pre-calculated clwdates table which contains a marker (a 1 or 2) for each day in one of its columns as to what week of the roster pattern it is.
I have run this query:
SELECT @tdate, DATE_FORMAT(@tdate,'%W') AS whatDay, GROUP_CONCAT(t1.initials separator ',') AS available
FROM people AS t1
WHERE ((t1.firstpref = (DAYOFWEEK(@tdate))-1
AND (t1.firstprefclw = 0 OR (t1.firstprefclw = (SELECT c_dates.clw from clwdates AS c_dates LIMIT i,1))))
OR (t1.secondpref = (DAYOFWEEK(@tdate))-1
AND (t1.secondprefclw = 0 OR (t1.secondprefclw = (SELECT c_dates.clw from clwdates AS c_dates LIMIT i,1))))
OR ((DAYOFWEEK(@tdate))-1 IN (0,5,6))
AND t1.initials NOT IN (SELECT initials FROM leave_entries WHERE @tdate BETWEEN leave_entries.start_date and leave_entries.end_date)
);
My output from that is a list of dates with initials of the pattern:
2018-01-03;Wednesday;KP,LW,TH
My desired output is
2018-01-03;Wednesday;KP
Where the initials of the person have been randomly selected from the list of available people generated by the first set of SELECTs.
I have seen a SO post where a suggestion of how to do this has been made involving SUBSTRING_INDEX (How to select Random Sub string,which seperated by coma(",") From a string), however I note the comment that CSV is not the way to go, and since I have a table which is not CSV, I am wondering:
How can I randomly select an individual's initials from the available ones and create a table which is basically date ; random_person?

So I figured out how to do it.
The first select (outlined above) forms the heart of a PROCEDURE called ROWPERROW() and generates a table called available_people
This is probably filthy MySQL code, but it works:
SET @tdate = 0;
DROP TABLE IF EXISTS working;
CREATE TABLE working(tdate DATE, whatDay VARCHAR(20), selected VARCHAR(255));
DROP PROCEDURE IF EXISTS ROWPERROW2;
DELIMITER //
CREATE PROCEDURE ROWPERROW2()
BEGIN
DECLARE n INT DEFAULT 0;
DECLARE kk INT DEFAULT 0;
SET n=90; -- or however many days the roster is going to run for
SET kk=0;
WHILE kk<n DO
-- take the kk-th date from the pre-calculated calendar table
SET @tdate = (SELECT c_dates.fulldate from clwdates AS c_dates LIMIT kk,1);
-- pick one random available person for that date
INSERT INTO working
SELECT @tdate, DATE_FORMAT(@tdate,'%W') AS whatDay, t1.available
FROM available_people AS t1 -- this is the table created by the first query above
WHERE tdate = @tdate ORDER BY RAND() LIMIT 1;
SET kk = kk + 1;
END WHILE;
END;
//
DELIMITER ;
CALL ROWPERROW2();
SELECT * from working;
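The selection step can also be checked outside the database. The following is only a sketch in Python, under assumptions: the `available` mapping and the `pick_roster` helper are hypothetical names that do not appear in the procedure above, and the sample data is limited to the single date shown in the question.

```python
import random

def pick_roster(available, seed=None):
    """Pick one random person per date from the people available that day."""
    rng = random.Random(seed)  # seed is optional, only for repeatable runs
    roster = {}
    for tdate in sorted(available):
        people = available[tdate]
        if people:  # skip dates on which nobody is available
            roster[tdate] = rng.choice(people)
    return roster

# Hypothetical input mirroring the sample row from the question.
available = {"2018-01-03": ["KP", "LW", "TH"]}
roster = pick_roster(available, seed=1)
for tdate, person in roster.items():
    print(tdate, person)
```

Like ORDER BY RAND() LIMIT 1, this picks uniformly among the available people and leaves a date out when nobody is available.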

Related

Different results for the same query but inside a function

I have a table with a score (Pontuacao) and a unique number for each accommodation (Estadia), and I want to calculate the average score of each accommodation.
This is the table:
Estadia | Pontuacao
-------------------
5 | 5
5 | 5
So I wrote this function:
delimiter $$
create function mediapontuacao(estadia int)
returns float
begin
declare media float;
select sum(Pontuacao)/count(*) into media
from EstadiaUtilizador
where Estadia = estadia;
return media;
end $$
If I run this:
select mediapontuacao(5); -- calculate the average score of accommodation number 5
the query returns 3.965.
But if I run this:
select sum(Pontuacao)/count(*)
from EstadiaUtilizador
where Estadia = 5;
In other words, this calculates the average score of accommodation number 5, exactly what the function should do, and it returns 5.00, which is the correct answer.
I am puzzled as to why I get different values when both should give the same result.
The problem is here:
where Estadia = estadia
which is the same as, say,
where 1 = 1
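Inside the function both names resolve to the parameter, so the condition is always true and the function averages the whole table. Give the parameter a name that differs from the column (for example p_estadia), so the DBMS knows which one you mean.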
You must use a GROUP BY clause:
select
Estadia,
sum(Pontuacao) / count(*) as mediapontuacao
from
EstadiaUtilizador
group by
Estadia
having Estadia = 5

select one row multiple time when using IN()

I have this query :
select
name
from
provinces
WHERE
province_id IN(1,3,2,1)
ORDER BY FIELD(province_id, 1,3,2,1)
The number of values in IN() is dynamic.
How can I get all rows, including duplicates (province_id 1 in this example), in the given ORDER BY order?
The result should look like this:
name1
name3
name2
name1
Also, I shouldn't use UNION ALL like this:
select * from provinces WHERE province_id=1
UNION ALL
select * from provinces WHERE province_id=3
UNION ALL
select * from provinces WHERE province_id=2
UNION ALL
select * from provinces WHERE province_id=1
You need a helper table here. On SQL Server that can be something like:
SELECT name
FROM (Values (1),(3),(2),(1)) As list (id) --< List of values to join to as a table
INNER JOIN provinces ON province_id = list.id
Update: In MySQL Split Comma Separated String Into Temp Table can be used to split string parameter into a helper table.
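The join against a list of values can be tried end to end with a small script. This is a minimal sketch using Python's built-in sqlite3 module rather than MySQL or SQL Server, so the SQL dialect is an assumption; the `id_list` table, its `pos` column, the `names_in_order` helper and the sample province names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE provinces (province_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO provinces VALUES (?, ?)",
                 [(1, "name1"), (2, "name2"), (3, "name3")])

def names_in_order(conn, ids):
    """Return one name per requested id, duplicates included, in list order."""
    # Helper table: pos keeps the requested order, id is the lookup key.
    conn.execute("CREATE TEMP TABLE id_list (pos INTEGER, id INTEGER)")
    conn.executemany("INSERT INTO id_list VALUES (?, ?)", list(enumerate(ids)))
    rows = conn.execute(
        "SELECT p.name FROM id_list AS l "
        "JOIN provinces AS p ON p.province_id = l.id "
        "ORDER BY l.pos"
    ).fetchall()
    conn.execute("DROP TABLE id_list")
    return [name for (name,) in rows]

result = names_in_order(conn, [1, 3, 2, 1])
print(result)
```

Because the duplicates live in the helper table rather than in an IN() list, each occurrence produces its own joined row, and the position column preserves the requested order.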
To get the same row more than once you need to join in another table. I suggest to create, only once(!), a helper table. This table will just contain a series of natural numbers (1, 2, 3, 4, ... etc). Such a table can be useful for many other purposes.
Here is the script to create it:
create table seq (num int);
insert into seq values (1),(2),(3),(4),(5),(6),(7),(8);
insert into seq select num+8 from seq;
insert into seq select num+16 from seq;
insert into seq select num+32 from seq;
insert into seq select num+64 from seq;
/* continue doubling the number of records until you feel you have enough */
For the task at hand it is not necessary to add many records, as you only need to make sure you never have more repetitions in your in condition than in the above seq table. I guess 128 will be good enough, but feel free to double the number of records a few times more.
Once you have the above, you can write queries like this:
select province_id,
name,
@pos := instr(@in2 := insert(@in2, @pos+1, 1, '#'),
concat(',',province_id,',')) ord
from (select @in := '0,1,2,3,1,0', @in2 := @in, @pos := 10000) init
inner join provinces
on find_in_set(province_id, @in)
inner join seq
on num <= length(replace(@in, concat(',',province_id,','),
concat(',+',province_id,',')))-length(@in)
order by ord asc
Output for the sample data and sample in list:
| province_id | name | ord |
|-------------|--------|-----|
| 1 | name 1 | 2 |
| 2 | name 2 | 4 |
| 3 | name 3 | 6 |
| 1 | name 1 | 8 |
How it works
You need to put the list of values in the assignment to the variable @in. For it to work, every valid id must be wrapped between commas, so that is why there is a dummy zero at the start and the end.
By joining in the seq table the result set can grow. The number of records joined in from seq for a particular provinces record is equal to the number of occurrences of the corresponding province_id in the list @in.
There is no out-of-the-box function to count the number of such occurrences, so the expression at the right of num <= may look a bit complex. But it just adds a character for every match in @in and checks how much the length grows by that action. That growth is the number of occurrences.
In the select clause the position of the province_id in the @in list is returned and used to order the result set, so it corresponds to the order in the @in list. In fact, the position is taken with reference to @in2, which is a copy of @in, but is allowed to change:
While this @pos is being calculated, the number at the previously found @pos in @in2 is destroyed with a # character, so the same province_id cannot be found again at the same position.
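The occurrence-counting trick described above (insert one character per match and measure how much the string grows) can be illustrated in isolation. This is a sketch in Python, not the MySQL expression itself; `count_occurrences` is a hypothetical helper name.

```python
def count_occurrences(in_list, province_id):
    """Count how often province_id occurs in a comma-wrapped list string."""
    needle = "," + str(province_id) + ","
    grown = in_list.replace(needle, ",+" + str(province_id) + ",")
    # Each match adds exactly one character, so the growth equals the match count.
    return len(grown) - len(in_list)

in_list = "0,1,2,3,1,0"  # sample list from the answer, padded with dummy zeros
counts = {pid: count_occurrences(in_list, pid) for pid in (1, 2, 3, 4)}
print(counts)
```

One caveat in this Python version: the replacement does not match overlapping occurrences, so two adjacent equal ids such as "0,1,1,0" share a comma and are counted once. It is worth checking whether the same applies to the list you pass to the query.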
It's unclear exactly what you want, but here's why it's not working the way you expect. The IN keyword is shorthand for a statement like ... WHERE province_id = 1 OR province_id = 2 OR province_id = 3 OR province_id = 1. Since province_id = 1 is evaluated as true at the beginning of that statement, it doesn't matter that it is included again later; it is already true. Repeating a value in the list therefore has no bearing on whether the result returns a duplicate.

MySql, Without loop filling a table with semi/random data

How can I get this output in MySQL?
I had a recursive CTE in MSSQL that fills a table with random data without a loop (e.g. BEGIN/END). I searched for similar logic in MySQL, but most or all solutions used BEGIN/END or FOR loops. Could you suggest a solution without a loop in MySQL?
Thanks
--MSSQL cte:------------------------------------
with t1( idi,val ) as
(
select
idi=1
,val=cast( 1 as real)
union all
select
idi=idi+1
,val=cast(val+rand() as real)
from t1
where idi<5
)
select idi,val from t1
-----------------------------------------------
Output in MSSQL:( semi random values)
idi | val
-------------
1 | 1
2 | 1.11
3 | 1.23
4 | 1.35
5 | 1.46
Edit:
Regarding the argument that set-based code is really loop-based code under the hood: I understand that, but out of interest I tried both in MSSQL 2008 R2. Here is the result:
1- above code with 32000 recursion took 2.812 sec
2- above output created with WHILE BEGIN END loop for 32000 took 53.640 sec
Obviously this is a big difference in execution time.
Here is the loop based code:
insert into #t1(idi,val)
select
idi=1
,val=1
declare @ii int = 2
while @ii<32000
begin
insert into #t1(idi,val)
select
idi=idi+1
,val=val+rand()
from #t1
where idi=@ii-1
set @ii=@ii+1
end
select * from #t1
MySQL did not support CTEs before version 8.0, which added WITH RECURSIVE.
On older versions you need a procedure or a tricky query like this one:
set @id=0;
set @val=0;
SELECT @id:=@id+1 As id,
@val:=@val+rand() As val
FROM information_schema.tables x
CROSS JOIN information_schema.tables y
LIMIT 10
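What both the recursive CTE and the variable-based query compute is a running sum of random increments. As a language-neutral illustration, here is a sketch in Python; `semi_random_series` is a hypothetical name, and the starting value of 1 follows the MSSQL example in the question rather than the MySQL query above, which starts from 0.

```python
import random
from itertools import accumulate

def semi_random_series(n, seed=None):
    """Return [(idi, val), ...] where val starts at 1 and grows by a random step per row."""
    if n < 1:
        return []
    rng = random.Random(seed)  # seed is optional, only for repeatable runs
    steps = [1.0] + [rng.random() for _ in range(n - 1)]
    # accumulate() yields the running total, one value per row.
    return list(enumerate(accumulate(steps), start=1))

series = semi_random_series(5, seed=42)
for idi, val in series:
    print(idi, round(val, 2))
```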

Efficient way to remove successive duplicate rows in MySQL

I have a table with columns like (PROPERTY_ID, GPSTIME, STATION_ID, PROPERTY_TYPE, VALUE) where PROPERTY_ID is primary key and STATION_ID is foreign key.
This table records state changes; each row represents property value of some station at given time. However, its data was converted from old table where each property was a column (like (STATION_ID, GPSTIME, PROPERTY1, PROPERTY2, PROPERTY3, ...)). Because usually only one property changed at time I have lots of duplicates.
I need to remove all successive rows with same values.
Example. Old table contained values like
time stn prop1 prop2
100 7 red large
101 7 red small
102 7 blue small
103 7 red small
The converted table is
(order by time,type) (order by type,time)
time stn type value time stn type value
100 7 1 red 100 7 1 red
100 7 2 large 101 7 1 red
101 7 1 red 102 7 1 blue
101 7 2 small 103 7 1 red
102 7 1 blue 100 7 2 large
102 7 2 small 101 7 2 small
103 7 1 red 102 7 2 small
103 7 2 small 103 7 2 small
should be changed to
time stn type value
100 7 1 red
100 7 2 large
101 7 2 small
102 7 1 blue
103 7 1 red
The table contains about 22 million rows.
My current approach is to use procedure to iterate over the table and remove duplicates:
BEGIN
DECLARE done INT DEFAULT FALSE;
DECLARE id INT;
DECLARE psid,nsid INT DEFAULT null;
DECLARE ptype,ntype INT DEFAULT null;
DECLARE pvalue,nvalue VARCHAR(50) DEFAULT null;
DECLARE cur CURSOR FOR
SELECT station_property_id,station_id,property_type,value
FROM station_property
ORDER BY station_id,property_type,gpstime;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
OPEN cur;
read_loop: LOOP
FETCH cur INTO id,nsid,ntype,nvalue;
IF done THEN
LEAVE read_loop;
END IF;
IF (psid = nsid and ptype = ntype and pvalue = nvalue) THEN
delete from station_property where station_property_id=id;
END IF;
SET psid = nsid;
SET ptype = ntype;
SET pvalue = nvalue;
END LOOP;
CLOSE cur;
END
However, it is too slow. On a test table with 20000 rows it takes 6 minutes to remove 10000 duplicates. Is there a way to optimize the procedure?
P.S. I still have my old table intact, so maybe it is better to try and convert it without the duplicates rather than dealing with duplicates after conversion.
UPDATE.
To clarify which duplicates I want to allow and which not.
If a property changes, then changes back, I want all 3 records to be saved, even though first and the last contains same station_id, type, and value.
If there are several successive (by GPSTIME) records with same station_id, type, and value, I want only the first one (which represents the change to that value) to be saved.
In short, a -> b -> b -> a -> a should be optimized to a -> b -> a.
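To make the intended rule concrete, here is a small sketch in Python that applies it to the sample rows from the tables above. The function names are hypothetical, and the rows are (time, stn, type, value) tuples.

```python
from itertools import groupby

def collapse_runs(values):
    """Keep the first element of every run of equal neighbours."""
    return [key for key, _ in groupby(values)]

def dedupe_rows(rows):
    """Drop successive repeats of a value within each (stn, type) series."""
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[0]))
    kept = []
    previous = None  # (stn, type, value) of the last row seen
    for time, stn, ptype, value in ordered:
        current = (stn, ptype, value)
        if current != previous:
            kept.append((time, stn, ptype, value))
        previous = current
    return sorted(kept, key=lambda r: (r[0], r[2]))

rows = [
    (100, 7, 1, "red"), (100, 7, 2, "large"),
    (101, 7, 1, "red"), (101, 7, 2, "small"),
    (102, 7, 1, "blue"), (102, 7, 2, "small"),
    (103, 7, 1, "red"), (103, 7, 2, "small"),
]
print(collapse_runs("abbaa"))
print(dedupe_rows(rows))
```

The comparison is always against the immediately preceding row of the same station and property type, which is why a value that changes and then changes back is kept.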
SOLUTION
As @Kickstart suggested, I've created a new table, populated with filtered data. To refer to previous rows, I've used an approach similar to the one used in this question.
rename table station_property to station_property_old;
create table station_property like station_property_old;
set @lastsid=-1;
set @lasttype=-1;
set @lastvalue='';
INSERT INTO station_property(station_id,gpstime,property_type,value)
select newsid as station_id,gpstime,newtype as type,newvalue as value from
-- this subquery adds columns with previous values
(select station_property_id,gpstime,@lastsid as lastsid,@lastsid:=station_id as newsid,
@lasttype as lasttype,@lasttype:=property_type as newtype,
@lastvalue as lastvalue,@lastvalue:=value as newvalue
from station_property_old
order by newsid,newtype,gpstime) sub
-- we filter the data, removing unnecessary duplicates
where lastvalue != newvalue or lastsid != newsid or lasttype != newtype;
drop table station_property_old;
Possibly create a new table, populated with a select from the existing table using a GROUP BY. Something like this (not tested so excuse any typos):-
INSERT INTO station_property_new
SELECT station_property_id, station_id, property_type, value
FROM (SELECT station_property_id, station_id, property_type, value, COUNT(*) FROM station_property GROUP BY station_property_id, station_id, property_type, value) Sub1
Regarding changing properties, can't you add a unique constraint to ensure the combination of station/type/value columns is unique? That way you will not be able to change a property to a value that would create a duplicate.

Advanced SQL Select Query

week cookie
1 a
1 b
1 c
1 d
2 a
2 b
3 a
3 c
3 d
This table represents visits to a website in a particular week. Each cookie represents an individual person, and each entry means that person visited the site in that week. For example, the last entry means 'd' came to the site in week 3.
I want to find out how many of the same people keep coming back in the following weeks, given a start week to look at.
For example, if I look at week 1, I will get a result like:
1 | 4
2 | 2
3 | 1
Because 4 users came in week 1. Only 2 of them (a, b) came back in week 2. Only 1 of them (a) came in all 3 weeks.
How can I write a select query to find this out? The table will be big: there might be 100 weeks, so I want to find the right way to do it.
This query uses variables to track adjacent weeks and work out if they are consecutive:
set @start_week = 2, @week := 0, @conseq := 0, @cookie:='';
select conseq_weeks, count(*)
from (
select
cookie,
if (cookie != @cookie or week != @week + 1, @conseq := 0, @conseq := @conseq + 1) + 1 as conseq_weeks,
(cookie != @cookie and week <= @start_week) or (cookie = @cookie and week = @week + 1) as conseq,
@cookie := cookie as lastcookie,
@week := week as lastweek
from (select week, cookie from webhist where week >= @start_week order by 2, 1) x
) y
where conseq
group by 1;
This is for week 2. For another week, change the start_week variable at the top.
Here's the test:
create table webhist(week int, cookie char);
insert into webhist values (1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'), (2, 'a'), (2, 'b'), (3, 'a'), (3, 'c'), (3, 'd');
Output of above query with where week >= 1:
+--------------+----------+
| conseq_weeks | count(*) |
+--------------+----------+
| 1 | 4 |
| 2 | 2 |
| 3 | 1 |
+--------------+----------+
Output of above query with where week >= 2:
+--------------+----------+
| conseq_weeks | count(*) |
+--------------+----------+
| 1 | 2 |
| 2 | 1 |
+--------------+----------+
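For checking the expected numbers independently of any SQL dialect, the same consecutive-week count can be sketched in Python. The `retention` function is a hypothetical helper; the data is the test data above.

```python
from collections import Counter

def retention(visits, start_week):
    """Map k -> number of cookies seen in k consecutive weeks from start_week."""
    weeks_by_cookie = {}
    for week, cookie in visits:
        weeks_by_cookie.setdefault(cookie, set()).add(week)
    counts = Counter()
    for weeks in weeks_by_cookie.values():
        streak = 0
        while start_week + streak in weeks:
            streak += 1
            counts[streak] += 1  # this cookie reached a streak of this length
    return dict(sorted(counts.items()))

visits = [(1, "a"), (1, "b"), (1, "c"), (1, "d"),
          (2, "a"), (2, "b"),
          (3, "a"), (3, "c"), (3, "d")]
print(retention(visits, 1))
print(retention(visits, 2))
```

A cookie that misses a week stops counting from that point on, so 'c' and 'd' contribute only to the first week when starting at week 1, even though they return in week 3.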
p.s. Good question, but a bit of a ball-breaker
For some reason most of these answers are very overcomplicated; it doesn't need cursors or for loops or anything of the sort...
I want to find out how many (same) people keep coming back in the
following week, when given a start week to look at.
If you want to know how many users for any week visited one week and then the week after for each future week:
SELECT visits.week, COUNT(1) AS [NumRepeatUsers]
FROM visits
WHERE EXISTS (
SELECT TOP 1 1
FROM visits AS nextWeek
WHERE nextWeek.week = visits.week+1
AND nextWeek.cookie = visits.cookie
)
AND EXISTS (
SELECT TOP 1 1
FROM visits AS searchWeek
WHERE searchWeek.week = @week
AND searchWeek.cookie = visits.cookie
)
GROUP BY visits.week
ORDER BY visits.week
However, this will not show you diminishing results over time. If you have 10 users in week 1 and then 5 different users visit in each of the next 5 weeks, you would keep seeing 1=10, 2=5, 3=5, 4=5, 5=5, 6=5 and so on. Instead you want to see that 5=x, where x is the number of users who visited every week for 5 weeks straight. To do this, see below:
SELECT visits.week, COUNT(1) AS [NumRepeatUsers]
FROM visits
WHERE EXISTS (
SELECT TOP 1 1
FROM visits AS nextWeek
WHERE nextWeek.week = visits.week+1
AND nextWeek.cookie = visits.cookie
)
AND EXISTS (
SELECT TOP 1 1
FROM visits AS searchWeek
WHERE searchWeek.week = @week
AND searchWeek.cookie = visits.cookie
)
AND visits.week - @week = (
SELECT COUNT(1) AS [Count]
FROM visits AS searchWeek
WHERE searchWeek.week BETWEEN @week+1 AND visits.week
AND searchWeek.cookie = visits.cookie
)
GROUP BY visits.week
ORDER BY visits.week
This will give you 1=10,2=5,3=4,4=3,5=2,6=1 or the like
This is an interesting one.
I try to work out when was the final week each person visited.
This is calculated as the first week on or after the start where the following week doesn't have a visit.
Once you know each user's final visiting week you just count up, for every week, the number of different users whose final visit was on or after that week.
SELECT wks.week, COUNT(cookie) as Visitors
FROM (SELECT a.cookie, MIN(a.week) AS FinalVisit
FROM WeekVisits a
INNER JOIN WeekVisits FirstWeek
ON a.cookie = FirstWeek.cookie
WHERE a.week >= 1
AND FirstWeek.week = 1
AND NOT EXISTS (SELECT 1
FROM WeekVisits b
WHERE b.week = a.week + 1
AND b.cookie = a.cookie)
GROUP BY a.cookie) fv
INNER JOIN
(SELECT DISTINCT week
FROM WeekVisits
WHERE week >= 1) wks
ON fv.FinalVisit >= wks.week
GROUP BY wks.week
ORDER BY wks.week
EDIT
-Thanks ypercube for noticing. I had also lost the group by from the "fv" query. Oops.
-I've removed the comments denoting parameters.
-I've removed the unnecessary distinct.
EDIT again
-Added in a extra stuff for FirstWeek because it didn't cope with starting on week 2
When I run this (admittedly on MS Access)
starting week 1 I get:
+------+----------+
| week | Visitors |
| 1 | 4 |
| 2 | 2 |
| 3 | 1 |
+------+----------+
starting week 2 I get:
+------+----------+
| week | Visitors |
| 2 | 2 |
| 3 | 1 |
+------+----------+
.. as expected.
(To start on week 2 you would change the 1 to 2 in the three places where it is compared with the week column)
The method seems sound but the syntax may need adjusting for MySQL.
Okay let's say your table is called visits and you are interested in week number n. You want to know, for every week number w >= n, which users appear in every single such week w.
So how many such weeks are there?
select count(distinct week)
from visits
where week >= n;
And in how many such weeks did each user visit?
select user, count(user)
from visits
where week >= n
group by user;
Suppose you have weeks 1, 3, 4, 5, 6, 7, 9, 10, and 13, and you are interested in week 5. So the first query above gives you 6, because there are 6 weeks of interest: 5, 6, 7, 9, 10, and 13. The second query will give you, for each user, how many of those weeks they visited in. Now you want to know for how many of those users the count is 6.
I think this works:
select user, count(user)
from visits
where week >= n
group by user
having count(user) = (
select count(distinct week)
from visits
where week >= n);
but I don't have access to MySQL right now. If it doesn't work, then perhaps the approach makes some sense and sets you in the right direction. EDIT: I will be able to test tomorrow.
This is my solution, is not really straightforward but -as I have tested- it does solve your problem:
First we declare a stored procedure that returns the visitors for a particular week as a comma-separated string. You can use group_concat if you wish, but I did it this way; take into account that group_concat has a length limit.
DELIMITER $$
DROP PROCEDURE IF EXISTS `db`.`get_visitors_for_week`$$
CREATE DEFINER=`root`@`localhost` PROCEDURE `get_visitors_for_week`(id_week INTEGER, OUT result TEXT)
BEGIN
DECLARE should_continue INT DEFAULT 0;
DECLARE c_cookie CHAR(1);
DECLARE r CURSOR FOR SELECT v.cookie
FROM visits v WHERE v.week = id_week;
DECLARE CONTINUE HANDLER FOR NOT FOUND
SET should_continue = 1;
OPEN r;
REPEAT
SET c_cookie = NULL;
FETCH r INTO c_cookie;
IF c_cookie IS NOT NULL THEN
IF result IS NULL OR result = '' THEN
SET result = c_cookie;
ELSE SET result = CONCAT(result,',',c_cookie);
END IF;
END IF;
UNTIL should_continue = 1
END REPEAT;
CLOSE r;
END$$
DELIMITER ;
Then we declare a function to wrap that stored procedure, so we can call it conveniently inside a query:
DELIMITER $$
DROP FUNCTION IF EXISTS `db`.`concat_values`$$
CREATE DEFINER=`root`@`localhost` FUNCTION `concat_values`(id_week INTEGER) RETURNS TEXT CHARSET latin1
BEGIN
DECLARE result TEXT;
CALL get_visitors_for_week(id_week, result);
RETURN result;
END$$
DELIMITER ;
And then we must count the visitors that have come this week and last week, for each week of course. We 'see' that by searching for our cookie string in the concatenated list. This is the final query:
SELECT
v.week,
SUM(IF(ISNULL(concat_values(v.week - 1)) OR INSTR(concat_values(v.week - 1),v.cookie) > 0, 1, 0)) AS Visitors
FROM (SELECT
v.week,
v.cookie,
vt.visitors
FROM visits v
INNER JOIN (SELECT DISTINCT
v.week,
concat_values(v.week) AS visitors
FROM visits v) AS vt
ON v.week = vt.week) AS v
WHERE v.week >= 1
GROUP BY v.week
Substitute the 1 in the condition v.week >= 1 with the week number you want to start from.
Use self-join:
SELECT ... FROM visits AS v1 LEFT JOIN visits AS v2 ON v2.week = v1.week+1
WHERE v2.week IS NOT NULL
GROUP BY cookie
This will give you records of second and later visits.
But I think that better would be just to GROUP BY cookie which can get you number of visits per cookie; any number above 1 is a returning user.