Advanced SQL Select Query - mysql

week cookie
1 a
1 b
1 c
1 d
2 a
2 b
3 a
3 c
3 d
This table records visits to a website by week. Each cookie represents an individual person, and each row means that person visited the site in that week. For example, the last row means 'd' came to the site in week 3.
I want to find out how many of the same people keep coming back in each following week, given a start week to look at.
For example, if I look at week 1, I will get a result like:
1 | 4
2 | 2
3 | 1
Because 4 users came in week 1. Only 2 of them (a, b) came back in week 2, and only 1 of them (a) came in all 3 weeks.
How can I do a select query to find out? The table will be big: there might be 100 weeks, so I want to find the right way to do it.

This query uses variables to track adjacent weeks and work out if they are consecutive:
set @start_week = 2, @week := 0, @conseq := 0, @cookie := '';
select conseq_weeks, count(*)
from (
  select
    cookie,
    if (cookie != @cookie or week != @week + 1, @conseq := 0, @conseq := @conseq + 1) + 1 as conseq_weeks,
    (cookie != @cookie and week <= @start_week) or (cookie = @cookie and week = @week + 1) as conseq,
    @cookie := cookie as lastcookie,
    @week := week as lastweek
  from (select week, cookie from webhist where week >= @start_week order by 2, 1) x
) y
where conseq
group by 1;
This is for week 2. For another week, change the start_week variable at the top.
Here's the test:
create table webhist(week int, cookie char);
insert into webhist values (1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'), (2, 'a'), (2, 'b'), (3, 'a'), (3, 'c'), (3, 'd');
Output of above query with where week >= 1:
+--------------+----------+
| conseq_weeks | count(*) |
+--------------+----------+
| 1 | 4 |
| 2 | 2 |
| 3 | 1 |
+--------------+----------+
Output of above query with where week >= 2:
+--------------+----------+
| conseq_weeks | count(*) |
+--------------+----------+
| 1 | 2 |
| 2 | 1 |
+--------------+----------+
p.s. Good question, but a bit of a ball-breaker

For some reason most of these answers are overcomplicated. This doesn't need cursors, loops, or anything of the sort.
I want to find out how many (same) people keep coming back in the
following week, when given a start week to look at.
If you want to know how many users for any week visited one week and then the week after for each future week:
SELECT visits.week, COUNT(1) AS NumRepeatUsers
FROM visits
-- the user also visited in the week after this one
WHERE EXISTS (
  SELECT 1
  FROM visits AS nextWeek
  WHERE nextWeek.week = visits.week + 1
  AND nextWeek.cookie = visits.cookie
)
-- the user visited in the start week (@week)
AND EXISTS (
  SELECT 1
  FROM visits AS searchWeek
  WHERE searchWeek.week = @week
  AND searchWeek.cookie = visits.cookie
)
GROUP BY visits.week
ORDER BY visits.week
However, this will not show diminishing results over time. If you have 10 users in week 1, and then 5 different users visit in each of the next 5 weeks, you would keep seeing 1=10, 2=5, 3=5, 4=5, 5=5, 6=5 and so on. Instead you want 5=x, where x is the number of users who visited every week for 5 weeks straight. To do this, see below:
SELECT visits.week, COUNT(1) AS NumRepeatUsers
FROM visits
WHERE EXISTS (
  SELECT 1
  FROM visits AS nextWeek
  WHERE nextWeek.week = visits.week + 1
  AND nextWeek.cookie = visits.cookie
)
AND EXISTS (
  SELECT 1
  FROM visits AS searchWeek
  WHERE searchWeek.week = @week
  AND searchWeek.cookie = visits.cookie
)
-- the user must have a visit in every week between @week+1 and this week
AND visits.week - @week = (
  SELECT COUNT(1)
  FROM visits AS searchWeek
  WHERE searchWeek.week BETWEEN @week+1 AND visits.week
  AND searchWeek.cookie = visits.cookie
)
GROUP BY visits.week
ORDER BY visits.week
This will give you 1=10,2=5,3=4,4=3,5=2,6=1 or the like

This is an interesting one.
I try to work out the final week in which each person visited.
This is calculated as the first week on or after the start where the following week doesn't have a visit.
Once you know each user's final visiting week you just count up, for every week, the number of different users whose final visit was on or after that week.
SELECT wks.week, COUNT(cookie) as Visitors
FROM (SELECT a.cookie, MIN(a.week) AS FinalVisit
FROM WeekVisits a
INNER JOIN WeekVisits FirstWeek
ON a.cookie = FirstWeek.cookie
WHERE a.week >= 1
AND FirstWeek.week = 1
AND NOT EXISTS (SELECT 1
FROM WeekVisits b
WHERE b.week = a.week + 1
AND b.cookie = a.cookie)
GROUP BY a.cookie) fv
INNER JOIN
(SELECT DISTINCT week
FROM WeekVisits
WHERE week >= 1) wks
ON fv.FinalVisit >= wks.week
GROUP BY wks.week
ORDER BY wks.week
EDIT
-Thanks ypercube for noticing. I had also lost the group by from the "fv" query. Oops.
-I've removed the comments denoting parameters.
-I've removed the unnecessary distinct.
EDIT again
-Added extra logic for FirstWeek because it didn't cope with starting on week 2
When I run this (admittedly on MS Access)
starting week 1 I get:
+------+----------+
| week | Visitors |
| 1 | 4 |
| 2 | 2 |
| 3 | 1 |
+------+----------+
starting week 2 I get:
+------+----------+
| week | Visitors |
| 2 | 2 |
| 3 | 1 |
+------+----------+
.. as expected.
(To start on week 2 you would change the 1 to 2 in the three places where it is compared with the week column)
The method seems sound but the syntax may need adjusting for MySQL.
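The "final visit week" query can be checked against the sample data from the question. What follows is only a minimal sketch: it assumes Python's built-in sqlite3 module as a stand-in for MySQL and keeps the WeekVisits table name from the query. The function name retention and the :start parameter are made up for illustration.

```python
import sqlite3

# Sample rows from the question: (week, cookie).
ROWS = [(1, "a"), (1, "b"), (1, "c"), (1, "d"),
        (2, "a"), (2, "b"),
        (3, "a"), (3, "c"), (3, "d")]

# Same shape as the query above; :start replaces the hard-coded start week.
QUERY = """
SELECT wks.week, COUNT(fv.cookie) AS Visitors
FROM (SELECT a.cookie, MIN(a.week) AS FinalVisit
      FROM WeekVisits a
      INNER JOIN WeekVisits FirstWeek ON a.cookie = FirstWeek.cookie
      WHERE a.week >= :start
        AND FirstWeek.week = :start
        AND NOT EXISTS (SELECT 1 FROM WeekVisits b
                        WHERE b.week = a.week + 1 AND b.cookie = a.cookie)
      GROUP BY a.cookie) fv
INNER JOIN (SELECT DISTINCT week FROM WeekVisits WHERE week >= :start) wks
        ON fv.FinalVisit >= wks.week
GROUP BY wks.week
ORDER BY wks.week
"""


def retention(start_week):
    """Return [(week, visitors whose unbroken streak from start_week reaches it)]."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE WeekVisits (week INTEGER, cookie TEXT)")
    conn.executemany("INSERT INTO WeekVisits VALUES (?, ?)", ROWS)
    result = conn.execute(QUERY, {"start": start_week}).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(retention(1))
    print(retention(2))
```

On the sample data this reproduces the two result tables shown above for start weeks 1 and 2.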

Okay let's say your table is called visits and you are interested in week number n. You want to know, for every week number w >= n, which users appear in every single such week w.
So how many such weeks are there?
select count(distinct week)
from visits
where week >= n;
And in how many such weeks did each user visit?
select cookie, count(distinct week)
from visits
where week >= n
group by cookie;
Suppose you have weeks 1, 3, 4, 5, 6, 7, 9, 10, and 13, and you are interested in week 5. So the first query above gives you 6, because there are 6 weeks of interest: 5, 6, 7, 9, 10, and 13. The second query will give you, for each user, how many of those weeks they visited in. Now you want to know for how many of those users the count is 6.
I think this works:
select cookie, count(distinct week)
from visits
where week >= n
group by cookie
having count(distinct week) = (
select count(distinct week)
from visits
where week >= n);
but I don't have access to MySQL right now. If it doesn't work, then perhaps the approach makes some sense and sets you in the right direction. EDIT: I will be able to test tomorrow.
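For anyone who wants to try the idea without a MySQL server, here is a rough sketch using Python's built-in sqlite3 module. It assumes the table is called visits with the question's week and cookie columns, and it counts distinct weeks so that duplicate rows would not inflate the totals. The function name every_week_visitors is hypothetical.

```python
import sqlite3

# Sample rows from the question: (week, cookie).
ROWS = [(1, "a"), (1, "b"), (1, "c"), (1, "d"),
        (2, "a"), (2, "b"),
        (3, "a"), (3, "c"), (3, "d")]

# Keep the cookies whose number of visited weeks (from week n on)
# equals the number of weeks that exist from week n on.
QUERY = """
SELECT cookie
FROM visits
WHERE week >= :n
GROUP BY cookie
HAVING COUNT(DISTINCT week) = (SELECT COUNT(DISTINCT week)
                               FROM visits
                               WHERE week >= :n)
ORDER BY cookie
"""


def every_week_visitors(n):
    """Return the cookies seen in every recorded week >= n."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (week INTEGER, cookie TEXT)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", ROWS)
    cookies = [row[0] for row in conn.execute(QUERY, {"n": n})]
    conn.close()
    return cookies


if __name__ == "__main__":
    for start in (1, 2, 3):
        print(start, every_week_visitors(start))
```

Note that this yields a single list of users who never missed a week from n onward, not a week-by-week breakdown.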

This is my solution. It is not really straightforward, but in my tests it does solve your problem:
First we declare a stored procedure that returns the visitors for a particular week as a comma-separated string. You can use group_concat if you wish, but I did it this way; bear in mind that group_concat has a length limit.
DELIMITER $$
DROP PROCEDURE IF EXISTS `db`.`get_visitors_for_week`$$
CREATE DEFINER=`root`@`localhost` PROCEDURE `get_visitors_for_week`(id_week INTEGER, OUT result TEXT)
BEGIN
DECLARE should_continue INT DEFAULT 0;
DECLARE c_cookie CHAR(1);
DECLARE r CURSOR FOR SELECT v.cookie
FROM visits v WHERE v.week = id_week;
DECLARE CONTINUE HANDLER FOR NOT FOUND
SET should_continue = 1;
OPEN r;
REPEAT
SET c_cookie = NULL;
FETCH r INTO c_cookie;
IF c_cookie IS NOT NULL THEN
IF result IS NULL OR result = '' THEN
SET result = c_cookie;
ELSE SET result = CONCAT(result,',',c_cookie);
END IF;
END IF;
UNTIL should_continue = 1
END REPEAT;
CLOSE r;
END$$
DELIMITER ;
Then we declare a function to wrap that stored procedure, so we can call it conveniently inside a query:
DELIMITER $$
DROP FUNCTION IF EXISTS `db`.`concat_values`$$
CREATE DEFINER=`root`@`localhost` FUNCTION `concat_values`(id_week INTEGER) RETURNS TEXT CHARSET latin1
BEGIN
DECLARE result TEXT;
CALL get_visitors_for_week(id_week, result);
RETURN result;
END$$
DELIMITER ;
Then we count the visitors who came both this week and the previous week, for each week. We detect that by searching for the cookie in the concatenated list of the previous week's visitors. This is the final query:
SELECT
v.week,
SUM(IF(ISNULL(concat_values(v.week - 1)) OR INSTR(concat_values(v.week - 1),v.cookie) > 0, 1, 0)) AS Visitors
FROM (SELECT
v.week,
v.cookie,
vt.visitors
FROM visits v
INNER JOIN (SELECT DISTINCT
v.week,
concat_values(v.week) AS visitors
FROM visits v) AS vt
ON v.week = vt.week) AS v
WHERE v.week >= 1
GROUP BY v.week
Substitute the 1 in the condition v.week >= 1 with the week number you want to start from.

Use self-join:
SELECT ... FROM visits AS v1 LEFT JOIN visits AS v2 ON v2.week = v1.week+1 AND v2.cookie = v1.cookie
WHERE v2.week IS NOT NULL
GROUP BY v1.cookie
This will give you records of second and later visits.
But I think it would be better to just GROUP BY cookie, which gets you the number of visits per cookie; any number above 1 is a returning user.

Related

MySQL: select random individual from available to populate new table

I am trying to automate the production of a roster based on leave dates and working preferences. I have generated some data to work with and now have two tables: one with a list of individuals and their preferences for working on particular days of the week (e.g. some prefer to work on a Tuesday, others only every other Wednesday, etc.), and another with leave dates for individuals. The first looks like this, where firstpref and secondpref represent weekdays with Mon = 1, Sun = 7, and firstprefclw is a marker for which week of a 2-week pattern someone prefers (0 = no preference, 1 = week 1 preferred, 2 = week 2 preferred):
initials | firstpref | firstprefclw | secondpref | secondprefclw
KP | 3 | 0 | 1 | 0
BD | 2 | 1 | 1 | 0
LW | 3 | 0 | 4 | 1
Then there is a table leave_entries which basically has the initials, a start date, and an end date for each leave request.
Finally, there is a pre-calculated clwdates table which contains a marker (a 1 or 2) for each day in one of its columns as to what week of the roster pattern it is.
I have run this query:
SELECT @tdate, DATE_FORMAT(@tdate,'%W') AS whatDay, GROUP_CONCAT(t1.initials separator ',') AS available
FROM people AS t1
WHERE ((t1.firstpref = (DAYOFWEEK(@tdate))-1
AND (t1.firstprefclw = 0 OR (t1.firstprefclw = (SELECT c_dates.clw from clwdates AS c_dates LIMIT i,1))))
OR (t1.secondpref = (DAYOFWEEK(@tdate))-1
AND (t1.secondprefclw = 0 OR (t1.secondprefclw = (SELECT c_dates.clw from clwdates AS c_dates LIMIT i,1))))
OR ((DAYOFWEEK(@tdate))-1 IN (0,5,6))
AND t1.initials NOT IN (SELECT initials FROM leave_entries WHERE @tdate BETWEEN leave_entries.start_date and leave_entries.end_date)
);
My output from that is a list of dates with initials of the pattern:
2018-01-03;Wednesday;KP,LW,TH
My desired output is
2018-01-03;Wednesday;KP
Where the initials of the person have been randomly selected from the list of available people generated by the first set of SELECTs.
I have seen a SO post where a suggestion of how to do this has been made involving SUBSTRING_INDEX (How to select Random Sub string,which seperated by coma(",") From a string), however I note the comment that CSV is not the way to go, and since I have a table which is not CSV, I am wondering:
How can I randomly select an individual's initials from the available ones and create a table which is basically date ; random_person?
So I figured out how to do it.
The first select (outlined above) forms the heart of a PROCEDURE called ROWPERROW() and generates a table called available_people
This is probably filthy MySQL code, but it works:
SET @tdate = 0;
DROP TABLE IF EXISTS working;
CREATE TABLE working(tdate DATE, whatDay VARCHAR(20), selected VARCHAR(255));
DROP PROCEDURE IF EXISTS ROWPERROW2;
DELIMITER //
CREATE PROCEDURE ROWPERROW2()
BEGIN
DECLARE n INT DEFAULT 0;
DECLARE kk INT DEFAULT 0;
SET n=90; -- or however many days the roster is going to run for
SET kk=0;
WHILE kk<n DO
SET @tdate = (SELECT c_dates.fulldate from clwdates AS c_dates LIMIT kk,1);
INSERT INTO working
SELECT @tdate, DATE_FORMAT(@tdate,'%W') AS whatDay, t1.available
FROM available_people AS t1 -- this is the table created by the first query above
WHERE tdate = @tdate ORDER BY RAND() LIMIT 1;
SET kk = kk + 1;
END WHILE;
END
//
DELIMITER ;
CALL ROWPERROW2();
SELECT * from working;
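A set-based alternative, sketched here under assumptions rather than taken from the procedure above: if availability is stored as one row per (date, person) instead of a GROUP_CONCAT string, a correlated subquery can pick one random person per date without a loop. The example runs on Python's built-in sqlite3 module, where the random function is RANDOM(); MySQL would use RAND(). The available table, its rows for 2018-01-04, and the function name pick_roster are hypothetical.

```python
import sqlite3

# One row per (date, available person). The 2018-01-03 rows mirror the
# question's example output; the 2018-01-04 rows are made up.
AVAILABLE = [("2018-01-03", "KP"), ("2018-01-03", "LW"), ("2018-01-03", "TH"),
             ("2018-01-04", "BD"), ("2018-01-04", "KP")]

# For each distinct date, shuffle that date's candidates and keep one.
QUERY = """
SELECT d.tdate,
       (SELECT a.initials
        FROM available AS a
        WHERE a.tdate = d.tdate
        ORDER BY RANDOM()
        LIMIT 1) AS selected
FROM (SELECT DISTINCT tdate FROM available) AS d
ORDER BY d.tdate
"""


def pick_roster():
    """Return [(date, randomly chosen initials), ...], one row per date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE available (tdate TEXT, initials TEXT)")
    conn.executemany("INSERT INTO available VALUES (?, ?)", AVAILABLE)
    roster = conn.execute(QUERY).fetchall()
    conn.close()
    return roster


if __name__ == "__main__":
    for tdate, selected in pick_roster():
        print(tdate, selected)
```

Because the choice is random, the selected initials differ between runs; only the set of dates is stable.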

MySQL count how many times foreign key has a given value [duplicate]

I have one table with states and one table with dates under a given state
s_state
---------
#id name
1 State 1
2 State 2
d_date
--------
#date #user state
2017-01-01 1 1
2017-01-02 1 1
2017-01-03 2 1
I am trying to get, for a given user, how many times (how many days) he had been with each state. My current query works if the state is used, but my problem is that it doesn't return "count 0" for the states not used. (It would, for user 1, return only "State 1 used 2 times", but I want it to return "State 1 count = 2, State 2 count = 0")
Here is my current query:
SELECT s_state.id, COUNT(date)
FROM s_state
LEFT JOIN d_date ON s_state.id = d_date.state
WHERE d_date.user = 1
GROUP BY s_state.id
Try this
SELECT
s_state.id AS 'State Id',
IFNULL(COUNT(date), 0) AS 'Count'
FROM s_state
LEFT JOIN d_date ON s_state.id = d_date.state AND user = 1
GROUP BY
s_state.id,
user
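If you filter on user in the WHERE clause, the rows for states with no matching dates are removed, because user is NULL in those rows. Putting the condition in the JOIN's ON clause keeps those rows with NULLs instead, and COUNT(date) then reports 0 for them.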
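The difference between filtering in WHERE and filtering in ON can be seen side by side on the question's data. This is a minimal sketch that assumes Python's built-in sqlite3 module instead of MySQL; the function name state_counts and its filter_in_on flag are invented for the example.

```python
import sqlite3

STATES = [(1, "State 1"), (2, "State 2")]
DATES = [("2017-01-01", 1, 1), ("2017-01-02", 1, 1), ("2017-01-03", 2, 1)]

# Filter applied after the join: NULL-extended rows are thrown away.
FILTER_IN_WHERE = """
SELECT s.id, COUNT(d.date)
FROM s_state AS s
LEFT JOIN d_date AS d ON s.id = d.state
WHERE d.user = :user
GROUP BY s.id
ORDER BY s.id
"""

# Filter applied while joining: unmatched states survive with NULLs.
FILTER_IN_ON = """
SELECT s.id, COUNT(d.date)
FROM s_state AS s
LEFT JOIN d_date AS d ON s.id = d.state AND d.user = :user
GROUP BY s.id
ORDER BY s.id
"""


def state_counts(user, filter_in_on):
    """Return [(state_id, day_count), ...] for one user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE s_state (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE d_date (date TEXT, user INTEGER, state INTEGER)")
    conn.executemany("INSERT INTO s_state VALUES (?, ?)", STATES)
    conn.executemany("INSERT INTO d_date VALUES (?, ?, ?)", DATES)
    query = FILTER_IN_ON if filter_in_on else FILTER_IN_WHERE
    result = conn.execute(query, {"user": user}).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(state_counts(1, filter_in_on=False))
    print(state_counts(1, filter_in_on=True))
```

For user 1, only the ON variant returns a row for State 2, with a count of 0.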

MySQL - How to get a 0 when COUNT(*) returns null?

I'm having some difficulties with an SQL request.
In that request, I want to get, for a given week and year (both passed as parameters), the number of open tickets per tech.
My goal is to get this kind of table :
YEAR WEEK TECH_ID BACKLOG_TICKETS
2017 1 5 11
2017 1 6 1
2017 1 6 0
But the problem is that, when a tech has no ticket in backlog (0), the record is not created, certainly because the COUNT(*) returns a null value.
So here is what I really have :
YEAR WEEK TECH_ID BACKLOG_TICKETS
2017 1 5 11
2017 1 6 1
and here is my request :
SET @selectedDate = DATE_ADD(STR_TO_DATE('01-01-2017', '%d-%m-%Y'), INTERVAL _week WEEK);
INSERT INTO whd_stats.backlog_tickets_by_tech_week (YEAR, WEEK, TECH_ID, BACKLOG_TICKETS_NUMBER)
SELECT _year AS 'YEAR', _week AS 'WEEK',
coalesce(j.ASSIGNED_TECH_ID , 99999) AS 'TECH',
@backlogNumber := COUNT(j.JOB_TICKET_ID)
FROM whd.job_ticket j
LEFT OUTER JOIN whd.tech t ON j.ASSIGNED_TECH_ID = t.CLIENT_ID
LEFT OUTER JOIN whd.STATUS_TYPE s ON j.STATUS_TYPE_ID = s.STATUS_TYPE_ID
WHERE j.DELETED = 0
-- Create Date with the given year, then add the number of week
AND j.REPORT_DATE <= @selectedDate
AND (j.CLOSE_DATE > @selectedDate
OR (j.CLOSE_DATE IS NULL AND s.STATUS_TYPE_NAME IN ('Open', 'Pending', 'Approval Pending')))
GROUP BY YEAR, WEEK, TECH
ON DUPLICATE KEY UPDATE BACKLOG_TICKETS_NUMBER = @backlogNumber;
I have tried to replace COUNT(j.JOB_TICKET_ID) by IFNULL(COUNT(j.JOB_TICKET_ID), 0), I also tried COALESCE(COUNT(j.JOB_TICKET_ID), 0) but none is working, and I have no more idea...
Can you please help me?
Thanks!
One option here would be the "calendar table" approach. You can create a new table looking something like this:
TECH_ID
1
2
3
4
5
You could change your query so that it begins with this calendar table and then left joins outwards. This would guarantee that every TECH_ID appears in the result set, even if it is absent from your recorded data.
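The calendar-table idea in miniature, as a sketch only: it uses Python's built-in sqlite3 module rather than MySQL, and the table names tech_list and job_ticket, their columns, and the sample rows are all hypothetical stand-ins for the real schema. The function name backlog_per_tech is likewise made up.

```python
import sqlite3

TECHS = [(5,), (6,), (7,)]               # every tech that should be reported
TICKETS = [(101, 5), (102, 5), (103, 6)]  # (ticket id, assigned tech id)

# Start from the full list of techs and join outwards, so a tech with
# no tickets still produces one row, for which COUNT(...) yields 0.
QUERY = """
SELECT t.tech_id, COUNT(j.job_ticket_id) AS backlog_tickets
FROM tech_list AS t
LEFT JOIN job_ticket AS j ON j.assigned_tech_id = t.tech_id
GROUP BY t.tech_id
ORDER BY t.tech_id
"""


def backlog_per_tech():
    """Return [(tech_id, ticket_count), ...] including techs with zero tickets."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tech_list (tech_id INTEGER)")
    conn.execute("CREATE TABLE job_ticket (job_ticket_id INTEGER, assigned_tech_id INTEGER)")
    conn.executemany("INSERT INTO tech_list VALUES (?)", TECHS)
    conn.executemany("INSERT INTO job_ticket VALUES (?, ?)", TICKETS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(backlog_per_tech())
```

Any extra ticket filters would need to go into the ON clause rather than WHERE, otherwise the zero rows disappear again.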
try this one
IFNULL(j.JOB_TICKET_ID,0) AS ticketcount
Try to use IF
IF(COUNT(j.JOB_TICKET_ID) is NULL, 0, COUNT(j.JOB_TICKET_ID)) as backlogNumber
or CASE
(
CASE
WHEN COUNT(j.JOB_TICKET_ID) IS NULL
THEN 0
ELSE COUNT(j.JOB_TICKET_ID)
END
) as backlogNumber
With the (great) help of Tim, I could find a solution, using a "calendar table", containing all my tech IDs.
So here is the working request :
SELECT 2017 AS 'YEAR', 31 AS 'WEEK',
tct.tech_id AS 'TECH',
COUNT(j.JOB_TICKET_ID)
FROM whd.job_ticket j
LEFT OUTER JOIN whd.STATUS_TYPE s ON j.STATUS_TYPE_ID = s.STATUS_TYPE_ID
RIGHT JOIN whd_stats.tech_calendar_table tct ON j.ASSIGNED_TECH_ID = tct.tech_id
AND j.DELETED = 0
AND j.REPORT_DATE <= @selectedDate
AND (j.CLOSE_DATE > @selectedDate
OR (j.CLOSE_DATE IS NULL AND s.STATUS_TYPE_NAME IN ('Open', 'Pending', 'Approval Pending')))
GROUP BY 1, 2, 3;
I really would like to thank Tim for teaching me the "calendar table" concept and for his patience.

MySQL: Select first record, last record and 200 evenly spaced records in-between in table

I have a MySQL-table of values with 10.000+ records collected from a datalogging device.
The table has the following columns:
+----+------+------+------+
| ID | Time | par1 | par2 |
+----+------+------+------+
| 0  | ..   | ..   | ..   |
| 1  | ..   | ..   | ..   |
+----+------+------+------+
.. and so on.
The ID is auto-incrementing.
I would like to query the table for values to plot, so I'm not interested in selecting all 10.000+ records.
If it's possible, I would like select the first record, the last record and 200 evenly spaced records in-between with a single MySQL-query, so that the result can be passed to my plotting algorithm.
I have seen other similar solutions, where every n'th row starting at (1,2,3.. etc) is selected, like explained here:
How to select every nth row in mySQL starting at n
So far I have managed to select the first record plus 200 evenly spaced records:
set @row:=-1;
set @numrows := (SELECT COUNT(*) FROM mytable);
set @everyNthRow := FLOOR(@numrows / 200);
SELECT r.*
FROM (
SELECT *
FROM mytable
ORDER BY ID ASC
) r
CROSS
JOIN ( SELECT @i := 0 ) s
HAVING ( @i := @i + 1) MOD @everyNthRow = 1
But I can't figure out how to include the last record as well.
How can this be accomplished?
EDIT: Furthermore, I would like to check whether the table is actually containing more than 200 records, before applying the desired logic. If not, the select statement should just output every record (so that the first 200 entries of the datalogging-session will also appear).
Try expanding your HAVING clause to include the last row.
HAVING ( @i := @i + 1) MOD @everyNthRow = 1 OR @i = @numrows
You may need to experiment as it could be @i = @numrows - 1 that gives the right result.
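A variant that avoids user variables, offered as a sketch under a strong assumption: the IDs are contiguous and start at 0, as in the question's table, so the ID itself can serve as the row number. It runs on Python's built-in sqlite3 module instead of MySQL. The function name sample_ids and the target parameter are invented, and the step calculation falls back to 1 so that small tables are returned in full.

```python
import sqlite3


def sample_ids(n_rows, target=200):
    """Return the IDs kept when thinning n_rows rows to roughly `target` rows.

    Keeps every step-th ID starting at 0, plus the highest ID, so the first
    and last records are always present.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (ID INTEGER PRIMARY KEY, par1 REAL)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)",
                     [(i, i * 0.5) for i in range(n_rows)])
    # With fewer rows than the target the step is 1, i.e. keep everything.
    step = max(1, n_rows // target)
    rows = conn.execute(
        """
        SELECT ID
        FROM mytable
        WHERE ID % :step = 0
           OR ID = (SELECT MAX(ID) FROM mytable)
        ORDER BY ID
        """,
        {"step": step},
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    ids = sample_ids(1000)
    print(len(ids), ids[0], ids[-1])
```

If rows have been deleted and the IDs have gaps, the spacing is no longer even and a real row number would be needed instead.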

mysql combine rows on conditions

I have a table like:
user | area | start | end
1 1 12 18
1 1 19 27
1 1 29 55
1 1 80 99
This means: a 'user' appeared in an 'area' from time 'start' to time 'end'; areas can overlap.
What I want is to get a result like:
user | start-end
1 12-18,19-27,29-55
1 80-99
which means: combine appearances whose time difference is less than a specified value, i.e. (row2.start - row1.end < 10), so that one result row stands for one 'visit' to the area by a user.
Currently I can distinguish each visit and get the count of visits by comparing the same table using one sql statement. But I'm not able to find a way to get the above result.
Any help is appreciated.
Explanation: The first 3 appearances are linked together as one visit because row2.start - row1.end < 10 and row3.start - row2.end < 10; the last appearance is a new visit because 80 (row4.start) - 55 (row3.end) >= 10.
We need two steps:
1 - combine a row with its predecessor to have start and last end in the same row
SELECT
user, area, start, end, @lastend AS lastend, @lastend:=end AS ignoreme
FROM
tablename,
(SELECT @lastend:=0) AS init
ORDER BY user, area, start, end;
2 - use the difference as a grouping criterion
SELECT
...
FROM
...
(SELECT @groupnum:=0) AS groupinit
GROUP BY
... ,
IF(start-lastend>=10,@groupnum:=@groupnum+1,@groupnum)
Now let's combine it:
SELECT
user, area,
GROUP_CONCAT(CONCAT(start,"-",end)) AS start_end
FROM (
SELECT
user, area, start, end, @lastend AS lastend, @lastend:=end AS ignoreme
FROM
tablename,
(SELECT @lastend:=0) AS init
ORDER BY user, area, start, end
) AS baseview,
(SELECT @groupnum:=0) AS groupinit
GROUP BY
user, area,
IF(start-lastend>=10,@groupnum:=@groupnum+1,@groupnum)
Edit
Fixed typos and verified: SQLfiddle
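The same two steps can also be written with window functions instead of user variables. This is a different technique from the answer above, shown only as a sketch: LAG fetches the previous row's end, and a running SUM of "new visit" flags numbers the visits. It assumes Python's built-in sqlite3 module backed by SQLite 3.25 or later (MySQL 8.0 also has LAG and SUM ... OVER). The table name appearances, the column names start_t and end_t (renamed to avoid the END keyword), and the function name combine_visits are all made up; the final string is assembled in Python to keep the ordering explicit.

```python
import sqlite3
from itertools import groupby

# Rows from the question: (user, area, start, end).
ROWS = [(1, 1, 12, 18), (1, 1, 19, 27), (1, 1, 29, 55), (1, 1, 80, 99)]

QUERY = """
WITH marked AS (
    -- Step 1: flag a row as a new visit unless it starts less than
    -- :gap after the previous row's end (the first row has no previous).
    SELECT user_id, area, start_t, end_t,
           CASE WHEN start_t - LAG(end_t) OVER (
                         PARTITION BY user_id, area ORDER BY start_t) < :gap
                THEN 0 ELSE 1 END AS is_new
    FROM appearances
)
-- Step 2: a running total of the flags gives each row its visit number.
SELECT user_id, area, start_t, end_t,
       SUM(is_new) OVER (PARTITION BY user_id, area ORDER BY start_t) AS visit_no
FROM marked
ORDER BY user_id, area, start_t
"""


def combine_visits(rows, gap=10):
    """Return [(user, "start-end,start-end,..."), ...], one entry per visit."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE appearances "
                 "(user_id INTEGER, area INTEGER, start_t INTEGER, end_t INTEGER)")
    conn.executemany("INSERT INTO appearances VALUES (?, ?, ?, ?)", rows)
    numbered = conn.execute(QUERY, {"gap": gap}).fetchall()
    conn.close()
    visits = []
    # Rows arrive ordered, so consecutive rows with the same key form a visit.
    for (user_id, _area, _visit_no), group in groupby(
            numbered, key=lambda r: (r[0], r[1], r[4])):
        spans = ",".join(f"{s}-{e}" for _, _, s, e, _ in group)
        visits.append((user_id, spans))
    return visits


if __name__ == "__main__":
    for user_id, spans in combine_visits(ROWS):
        print(user_id, spans)
```

Partitioning by user and area also means the comparison never crosses from one user's last row to the next user's first row.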