Generating complex sql tables

I currently have an employee logging sql table that has 3 columns
fromState: String,
toState: String,
timestamp: DateTime
fromState is either In or Out. In means employee came in and Out means employee went out. Each row can only transition from In to Out or Out to In.
I'd like to generate a temporary table in SQL that tracks, hour by hour, how many employees are in the company. That is, the resulting table has columns HourBucket and NumEmployees.
In non-SQL code I can do this by initializing numEmployees to 0, going through the table row by row (sorted by timestamp), and adding (employee came in) or subtracting (employee went out) to numEmployees, bucketed by timestamp hour.
I'm clueless as to how to do this in SQL. Any clues?

Use a COUNT ... GROUP BY query. I can't see what you're using toState for from your description, though! I'm also assuming you have an employeeID field.
E.g.
SELECT fromState AS 'Status', COUNT(*) AS 'Number'
FROM StaffinBuildingTable
INNER JOIN (SELECT employeeID AS 'empID', MAX(timestamp) AS 'latest' FROM StaffinBuildingTable GROUP BY employeeID) AS LastEntry ON StaffinBuildingTable.employeeID = LastEntry.empID AND StaffinBuildingTable.timestamp = LastEntry.latest
GROUP BY fromState
The LastEntry subquery produces one row per employeeID with that employee's latest timestamp.
The INNER JOIN limits the main table to the rows that match both the employeeID and that latest timestamp, i.e. each employee's most recent entry.
The outer GROUP BY produces the count.
SELECT HOUR(SBT.timestamp) AS 'Hour', SBT.fromState AS 'Status', COUNT(*) AS 'Number'
FROM StaffinBuildingTable AS SBT
INNER JOIN (
SELECT SBIJ.employeeID AS 'empID', MAX(timestamp) AS 'latest'
FROM StaffinBuildingTable AS SBIJ
WHERE DATE(SBIJ.timestamp) = CURDATE()
GROUP BY SBIJ.employeeID) AS LastEntry ON SBT.employeeID = LastEntry.empID AND SBT.timestamp = LastEntry.latest
GROUP BY SBT.fromState, HOUR(SBT.timestamp)
Replace CURDATE() with whatever date you are interested in.
Note this is non-optimal as it calculates the HOUR twice - once for the data and once for the group.
Again you are using the INNER JOIN to limit the number of returned rows, this time to the last timestamp on a given day.

To me your descriptions of FromState and ToState seem the wrong way round; I'd expect to do this based on ToState. But assuming I'm wrong on that, the following should point you in the right direction:
First, I create a "Numbers" table containing 24 rows one for each hour of the day:
create table tblHours
(Number int);
insert into tblHours values
(0),(1),(2),(3),(4),(5),(6),(7),
(8),(9),(10),(11),(12),(13),(14),(15),
(16),(17),(18),(19),(20),(21),(22),(23);
Then for each date in your employee logging table, I create a row in another new table to contain your counts:
create table tblDailyHours
(
HourBucket datetime,
NumEmployees int
);
insert into tblDailyHours (HourBucket, NumEmployees)
select distinct
date_add(date(t.timeStamp), interval h.Number HOUR) as HourBucket,
0 as NumEmployees
from
tblEmployeeLogging t
CROSS JOIN tblHours h;
Then I update this table to contain all the relevant counts:
update tblDailyHours h
join
(select
h2.HourBucket,
sum(case when el.fromState = 'In' then 1 else -1 end) as cnt
from
tblDailyHours h2
join tblEmployeeLogging el on
h2.HourBucket >= el.timeStamp
group by h2.HourBucket
) cnt ON
h.HourBucket = cnt.HourBucket
set NumEmployees = cnt.cnt;
You can now retrieve the counts with
select *
from tblDailyHours
order by HourBucket;
The counts give the number on site at each of the times displayed. If you want the number on site during the hour in question, we'd need to tweak this a little.
There is a working version of this code (using not very realistic data in the logging table) here: rextester.com/DYOR23344
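As a rough alternative sketch, on MySQL 8.0 or later the same running head count can be computed without helper tables by using a window function. This assumes the tblEmployeeLogging table and the fromState = 'In' convention used above; it only returns hours in which at least one event occurred, and each value is the head count at the end of that hour.

```sql
-- Sketch only: assumes MySQL 8.0+ (window functions) and the
-- tblEmployeeLogging(fromState, toState, timeStamp) table from this answer.
SELECT hourly.HourBucket,
       -- running total of the per-hour net change, in chronological order
       SUM(hourly.delta) OVER (ORDER BY hourly.HourBucket) AS NumEmployees
FROM (
    SELECT DATE_FORMAT(el.timeStamp, '%Y-%m-%d %H:00:00') AS HourBucket,
           -- net change within the hour: +1 per arrival, -1 per departure
           SUM(CASE WHEN el.fromState = 'In' THEN 1 ELSE -1 END) AS delta
    FROM tblEmployeeLogging el
    GROUP BY HourBucket
) AS hourly
ORDER BY hourly.HourBucket;
```

Hours with no log rows do not appear in this result; joining against an hours table such as tblDailyHours would fill those gaps.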
Original Answer (based on a single overall count)
If you're happy to search over all rows, and want the current "head count" you can use this:
select
sum(case when t.FromState = 'In' then 1 else -1 end) as Heads
from
MyTable t
But if you know that there will always be no-one there at midnight, you can add a where clause to prevent it looking at more rows than it needs to:
where
date(t.timestamp) = curdate()
Again, on the assumption that the head count reaches zero at midnight, you can generalise that method to get a headcount at any time as follows:
where
date(t.timestamp) = "CENSUS DATE" AND
t.timestamp <= "CENSUS DATETIME"
Obviously you'd need to replace my quoted strings with code which returned the date and datetime of interest. If the headcount doesn't return to zero at midnight, you can achieve the same by removing the first line of the where clause.

Related

MySQL - get users who placed 25th order during period

I have users and orders tables with this structure (simplified for question):
USERS
userid
registered(date)
ORDERS
id
date (order placed date)
user_id
I need to get an array of users (array of userid) who placed their 25th order during a specified period (for example, in May 2019), along with the date of each user's 25th order and the number of days it took to place it (the difference between the user's registration date and the date of the 25th order).
For example, if a user registered in April 2018, then placed 20 orders in 2018, and then placed orders 21-30 in Jan-May 2019, this user should be in the array if he placed his 25th order (overall for his account) in May 2019.
How can I do this with a MySQL query?
Sample data and structure: http://www.sqlfiddle.com/#!9/998358 (for testing you can look for the 3rd order instead of the 25th, to avoid adding a lot of sample data records).
A single query is not required: if this can't be done in one query, a few are allowed.

You can use a correlated subquery to get the count of orders placed before the current one by a user. If that's 24 the current order is the 25th. Then check if the date is in the desired range.
SELECT o1.user_id,
o1.date,
datediff(o1.date, u1.registered)
FROM orders o1
INNER JOIN users u1
ON u1.userid = o1.user_id
WHERE (SELECT count(*)
FROM orders o2
WHERE o2.user_id = o1.user_id
AND (o2.date < o1.date
OR (o2.date = o1.date
AND o2.id < o1.id))) = 24
AND o1.date >= '2019-01-01'
AND o1.date < '2019-06-01';

The basic inefficient way of doing this would be to get the user_id for every row in ORDERS where the date is in your target range AND the count of rows in ORDERS with the same user_id and a lower date is exactly 24.
This can get very ugly, very quickly, though.
If you're calling this from code you control, can't you do it from the code?
If not, there should be a way to assign to each row an index describing its rank among the orders for its specific user_id, and then select every user_id from the rows with an index of 25 and a date in range. This will give you a select from select from select, but it should be much faster. The difficulty here is controlling the order of the rows, so here are the selects I envision:
1. Select all rows, ordered by user_id asc, date asc, union-ed to nothing from a table made of two vars you'll initialize at 0.
2. From this, select all while updating a var to know if a row's user_id is the same as the last, and adding a field that reports so (so for each user_id the first line in order will have a specific value like 0, while the other rows for the same user_id will have a 1).
3. From this, select all plus a field that equals itself plus one if the first added field is 1, else 0.
4. From this, select the user_id from the rows where the second added field is 25 and the date is in range.
The union thingy is only necessary if you need to do it all in one request (you have to initialize the vars in a lower select than the one they're used in).
Edit: Well if you need the date too you can just select it along with the user_id, but calculating the number of days in sql will be a pain. Just join the result table to the users table and get both the date of 25th order and their date of registration, you'll surely be able to do the difference in code.
I'll try building an actual request, however if you want to truly understand what you need to make this you gotta read up on mysql variables, unions, and conditional statements.
"Looks too complicated. I am sure that this can be done with current DB structure and 1-2 requests." Well, yeah. Use the COUNT request, it will be easy, and slow as hell.
For the complex answer, see http://www.sqlfiddle.com/#!9/998358/21
Since you can use multiple requests, you can just initialize the vars first.
It isn't actually THAT complicated; you just have to understand how to concretely express what you mean by "a user's 25th order" to a SQL engine.
See http://www.sqlfiddle.com/#!9/998358/24 for the difference in days, turns out there's a method for that.
Edit 5: seems you're going with the COUNT method. I'll pray your DB is small.
Edit 6: For posterity:
The count method will take years on very large databases. Since OP didn't come back, I'm assuming his is small enough to overlook query speed. If that's not your case and let's say it's 10 years from now and the sqlfiddle links are dead; here's the two-queries solution:
SET @PREV_USR:=0;
SELECT user_id, date_ FROM (
SELECT user_id, date_, SAME_USR AS IGNORE_SMUSR,
@RANK_USR:=(CASE SAME_USR WHEN 0 THEN 1 ELSE @RANK_USR+1 END) AS `RANK` FROM (
SELECT orders.*, CASE WHEN @PREV_USR = user_id THEN 1 ELSE 0 END AS SAME_USR,
@PREV_USR:=user_id AS IGNORE_USR FROM
orders
ORDER BY user_id ASC, date_ ASC, id ASC
) AS DERIVED_1
) AS DERIVED_2
WHERE `RANK` = 25 AND YEAR(date_) = 2019 AND MONTH(date_) = 4 ;
Just change RANK = ? and the conditions to fit your needs. If you want to fully understand it, start by the innermost SELECT then work your way high; this version fuses the points 1 & 2 of my explanation.
Now sometimes you will have to use an API or something and it wont let you keep variable values in memory unless you commit it or some other restriction, and you'll need to do it in one query. To do that, you put the initialization one step lower and make it so it does not affect the higher statements. IMO the best way to do this is in a UNION with a fake table where the only row is excluded. You'll avoid the hassle of a JOIN and it's just better overall.
SELECT user_id, date_ FROM (
SELECT user_id, date_, SAME_USR AS IGNORE_SMUSR,
@RANK_USR:=(CASE SAME_USR WHEN 0 THEN 1 ELSE @RANK_USR+1 END) AS `RANK` FROM (
SELECT DERIVED_4.*, CASE WHEN @PREV_USR = user_id THEN 1 ELSE 0 END AS SAME_USR,
@PREV_USR:=user_id AS IGNORE_USR FROM
(SELECT * FROM orders
UNION
SELECT * FROM (
SELECT (@PREV_USR:=0) AS INIT_PREV_USR, 0 AS COL_2, 0 AS COL_3
) AS DERIVED_3
WHERE INIT_PREV_USR <> 0
) AS DERIVED_4
ORDER BY user_id ASC, date_ ASC, id ASC
) AS DERIVED_1
) AS DERIVED_2
WHERE `RANK` = 25 AND YEAR(date_) = 2019 AND MONTH(date_) = 4 ;
With that method, the thing to watch for is the number and type of columns in your base table. Here the first field of orders is an int, so I put INIT_PREV_USR first; then there are two more fields, so I just add two zeroes with names and call it a day. Most types work, since the union doesn't actually do anything, but I wouldn't try this when your first field is a blob (if worst comes to worst you can use a JOIN).
You'll note this is derived from a method of pagination in MySQL. If you want to apply this to other engines, just check out their best pagination calls and you should be able to work things out.
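For readers on MySQL 8.0 or later, the per-user ranking described above can also be sketched with the ROW_NUMBER() window function instead of user variables. This is only an illustration under the question's schema (orders.id, orders.date, orders.user_id, users.userid, users.registered); the aliases order_no, order_date and days_to_25th are made up for the example.

```sql
-- Sketch only: assumes MySQL 8.0+ and the table/column names from the question.
SELECT r.user_id,
       r.date AS order_date,
       DATEDIFF(r.date, u.registered) AS days_to_25th
FROM (
    SELECT o.user_id,
           o.date,
           -- number each user's orders chronologically; id breaks ties on the same date
           ROW_NUMBER() OVER (PARTITION BY o.user_id
                              ORDER BY o.date, o.id) AS order_no
    FROM orders o
) AS r
INNER JOIN users u
        ON u.userid = r.user_id
WHERE r.order_no = 25
  AND r.date >= '2019-05-01'
  AND r.date <  '2019-06-01';
```

Change order_no = 25 and the date bounds to suit; with the sample data, order_no = 3 is easier to test.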

MySQL query to select all hostels with at least X spaces between start and end dates?

I have 2 tables, one with hostels (effectively a single-room hotel with lots of beds), and the other with bookings.
Hostel table: unique ID, total_spaces
Bookings table: start_date, end_date, num_guests, hostel_ID
I need a (My)SQL query to generate a list of all hostels that have at least num_guests free spaces between start_date and end_date.
Logical breakdown of what I'm trying to achieve:
For each hostel:
Get all bookings that overlap start_date and end_date
For each day between start_date and end_date, sum the total bookings for that day (taking into account num_guests for each booking) and compare with total_spaces, ensuring that there are at least num_guests spaces free on that day (if there aren't on any day then that hostel can be discounted from the results list)
Any suggestions on a query that would do this please? (I can modify the tables if necessary)

I built an example for you here, with more comments, which you can test out:
http://sqlfiddle.com/#!9/10219/9
What's probably tricky for you is to join ranges of overlapping dates. The way I would approach this problem is with a DATES table. It's kind of like a tally table, but for dates. If you join to the DATES table, you basically break down all the booking ranges into bookings for individual dates, and then you can filter and sum them all back up to the particular date range you care about. Helpful code for populating a DATES table can be found here: Get a list of dates between two dates and that's what I used in my example.
Other than that, the query basically follows the logical steps you've already outlined.
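In case the fiddle link stops working, here is a minimal sketch of the DATES-table idea. It is not the code from the fiddle: the table names hostels and bookings, the helper table dates with a single DATE column d, the sample date range, and the party size of 3 are all assumptions for illustration, and end_date is treated as the check-out day (not occupied).

```sql
-- Sketch only: hostels(id, total_spaces), bookings(hostel_id, start_date,
-- end_date, num_guests) and dates(d) are assumed names, not the fiddle's schema.
SELECT h.id, h.total_spaces
FROM hostels h
LEFT JOIN (
    -- break each booking into one row per occupied day, then total guests per day
    SELECT b.hostel_id, dt.d, SUM(b.num_guests) AS booked
    FROM dates dt
    JOIN bookings b
      ON b.start_date <= dt.d
     AND b.end_date   >  dt.d
    WHERE dt.d >= '2017-01-02'
      AND dt.d <  '2017-01-05'
    GROUP BY b.hostel_id, dt.d
) AS daily
       ON daily.hostel_id = h.id
GROUP BY h.id, h.total_spaces
-- the busiest day in the range must still leave room for 3 more guests
HAVING COALESCE(MAX(daily.booked), 0) + 3 <= h.total_spaces;
```

Hostels with no bookings in the range have no rows in daily, so COALESCE treats their peak occupancy as 0.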

OK, if you are using MySQL 8.0.2 or above, you can use window functions, in which case you can use the solution below. This solution does not need to compute the number of guests for each day in the query interval; it only looks at the days on which the number of hostel guests changes. Therefore, there is no helper table with dates.
with query as
(
select * from bookings where end_date > '2017-01-02' and start_date < '2017-01-05'
)
select hostel.*, bookingsSum.intervalMax
from hostel
join
(
select tmax.id, max(tmax.intervalCount) intervalMax
from
(
select hostel.id, t.dat, sum(coalesce(sum(t.gn),0)) over (partition by t.id order by t.dat) intervalCount
from hostel
left join
(
select id, start_date dat, guest_num as gn from query
union all
select id, end_date dat, -1 * guest_num as gn from query
) t on hostel.id = t.id
group by hostel.id, t.dat
) tmax
group by tmax.id
) bookingsSum on hostel.id = bookingsSum.id and hostel.total_spaces >= bookingsSum.intervalMax + <num_of_people_you_want_accomodate>
demo
It uses a simple trick, where each start_date adds +guest_num to the overall number of guests and each end_date adds -guest_num to the overall number of guests. We then do the necessary summarizations in order to find the peak number of guests (intervalMax) in the query interval.
If you change '2017-01-05' in my query to '2017-01-06', then only two hostels are in the result, and if you use '2017-01-07', then just hostel id 3 is in the result, since it does not have any bookings yet.

MySQL - group by and count - best query

We have a statistics database of which we would like to group some results. Every entry has a timestamp 'tstarted'.
We would like to group by every quarter of the day. For each quarter, we would like to know the day count where we have > 0 results (for that quarter).
We could resolve this by using a subquery:
select quarter, sum(q), count(quarter), sum(q) / count(quarter) as average
from (
select SEC_TO_TIME((TIME_TO_SEC(tstarted) DIV 900) * 900) as quarter, sum(qdelivered) as q
from statistics
where stat_field = 1
group by SEC_TO_TIME((TIME_TO_SEC(tstarted) DIV 900) * 900), date(tstarted)
order by SEC_TO_TIME((TIME_TO_SEC(tstarted) DIV 900) * 900) asc
) as sub
group by quarter
My question: is there a more efficient way to retrieve this result (e.g. join or other way)?

Efficiency could be improved by eliminating the inline view (derived table aliased as sub), and doing all the work in a single query. (This is because of the way that MySQL processes the inline view, creating and populating a temporary MyISAM table.)
At first I didn't see why the expression date(tstarted) needed to be included in the GROUP BY clause of the inline view. I do now see its effect: the inline view returns one row per quarter per day, so the outer count(quarter) is the number of days that had results for that quarter.
I think this query returns the same result as the original:
SELECT SEC_TO_TIME((TIME_TO_SEC(s.tstarted) DIV 900) * 900) AS `quarter`
, SUM(s.qdelivered) AS `q`
, COUNT(DISTINCT DATE(s.tstarted)) AS `day_count`
, SUM(s.qdelivered) / COUNT(DISTINCT DATE(s.tstarted)) AS `average`
FROM statistics s
WHERE s.stat_field = 1
GROUP BY SEC_TO_TIME((TIME_TO_SEC(s.tstarted) DIV 900) * 900)
This should be more efficient since it avoids materializing an intermediate derived table.
Your question said you wanted a "day count"; that sounds like you want a count of the days that had a row within a particular quarter hour.
To get that, you could just add an aggregate expression to the SELECT list,
, COUNT(DISTINCT DATE(s.tstarted)) AS `day_count`

I would be tempted to set up a table of quarters in the day. Use this table and LEFT JOIN your statistics table to it.
CREATE TABLE quarters
(
id INT,
start_qtr INT,
end_qtr INT
);
INSERT INTO quarters (id, start_qtr, end_qtr) VALUES
(1,0,899),
(2,900,1799),
(3,1800,2699),
(4,2700,3599),
(5,3600,4499),
(6,4500,5399),
(7,5400,6299),
(8,6300,7199),
etc;
Your query can then be:-
SELECT SEC_TO_TIME(quarters.start_qtr) AS quarter,
sum(statistics.qdelivered),
count(statistics.qdelivered),
sum(statistics.qdelivered) / count(statistics.qdelivered) as average
FROM quarters
LEFT OUTER JOIN statistics
ON TIME_TO_SEC(statistics.tstarted) BETWEEN quarters.start_qtr AND quarters.end_qtr
AND statistics.stat_field = 1
AND DATE(statistics.tstarted) = '2014-06-30'
GROUP BY quarter
ORDER BY quarter;
Advantage of this is that it will give you entries with a count of 0 (and an average of NULL) for quarters where there are no statistics, and it saves some of the calculations.
You could save more calculations by adding time columns to the quarters table:-
CREATE TABLE quarters
(
id INT,
start_qtr INT,
end_qtr INT,
start_qtr_time TIME,
end_qtr_time TIME
);
INSERT INTO quarters (id, start_qtr, end_qtr, start_qtr_time, end_qtr_time) VALUES
(1,0,899, '00:00:00', '00:14:59'),
(2,900,1799, '00:15:00', '00:29:59'),
(3,1800,2699, '00:30:00', '00:44:59'),
(4,2700,3599, '00:45:00', '00:59:59'),
(5,3600,4499, '01:00:00', '01:14:59'),
(6,4500,5399, '01:15:00', '01:29:59'),
(7,5400,6299, '01:30:00', '01:44:59'),
(8,6300,7199, '01:45:00', '01:59:59'),
etc
Then this saves the use of a function on the JOIN:-
SELECT start_qtr_time AS quarter,
sum(statistics.qdelivered),
count(statistics.qdelivered),
sum(statistics.qdelivered) / count(statistics.qdelivered) as average
FROM quarters
LEFT OUTER JOIN statistics
ON TIME(statistics.tstarted) BETWEEN quarters.start_qtr_time AND quarters.end_qtr_time
AND statistics.stat_field = 1
AND DATE(statistics.tstarted) = '2014-06-30'
GROUP BY quarter
ORDER BY quarter;
These both assume you are interested in a particular day.

Update MySQL table with counts from subquery if string match, else set to zero

I have a MySQL table department_members that contains rows with a string field (member_name) and an int field (recent_actions) for every person in a single department. Recent_actions is currently NULL for all rows.
I have another, much larger table company_actions that contains a row for every time someone in the whole company has performed that type of action in the past year. Each row has a member_name, timestamp, and a unique action_id.
I want to update department_members.recent_actions with a count of how many times that member has performed that type of action within the past two weeks. If they haven't performed any actions recently, I want to update department_members.recent_actions with 0.
I've tried various CASE and IF approaches, but I can't get the syntax right.
In pseudocode, this is what I'm trying to do:
UPDATE department_members AS d,
(SELECT COUNT(action_id) AS recent, member_name
FROM company_actions
WHERE timestamp > DATE_SUB(NOW(), INTERVAL 14 DAY) AS tmp
/* then do something like this, only for real: */
IF d.member_name IN tmp(member_names) THEN d.recent_actions = tmp.recent
WHERE d.member_name = tmp.member_name
ELSE IF d.member_name NOT IN tmp(member_names) THEN d.recent_actions = 0
Hopefully that gets across what I'm going for? Any help would be appreciated! Been beating my head against this problem all day.

Join department_members with a subquery that calculates the total number of action_id in table company_actions using LEFT JOIN.
The COALESCE() returns the first non-null value in the params list.
UPDATE department_members a
LEFT JOIN
(
SELECT member_name, COUNT(*) TotalAction
FROM company_actions
WHERE timestamp > DATE_SUB(NOW(), INTERVAL 14 DAY)
GROUP BY member_name
) b ON a.member_name = b.member_name
SET a.recent_actions = COALESCE(b.TotalAction, 0)

MySQL Query - Include dates without records

I have a report that displays a graph. The X axis uses the date from the below query. Where the query returns no date, I am getting gaps and would prefer to return a value. Is there any way to force a date where there are no records?
SELECT
DATE(instime),
CASE
WHEN direction = 1 AND duration > 0 THEN 'Incoming'
WHEN direction = 2 THEN 'Outgoing'
WHEN direction = 1 AND duration = 0 THEN 'Missed'
END AS type,
COUNT(*)
FROM taxticketitem
GROUP BY
DATE(instime),
CASE
WHEN direction = 1 AND duration > 0 THEN 'Incoming'
WHEN direction = 2 THEN 'Outgoing'
WHEN direction = 1 AND duration = 0 THEN 'Missed'
END
ORDER BY DATE(instime)

One possible way is to create a table of dates and LEFT JOIN your table with them. The table could look something like this:
CREATE TABLE `datelist` (
`date` DATE NOT NULL,
PRIMARY KEY (`date`)
);
and filled with all dates between, say Jan-01-2000 through Dec-31-2050 (here is my Date Generator script).
Next, write your query like this:
SELECT datelist.date, COUNT(taxticketitem.id) AS c
FROM datelist
LEFT JOIN taxticketitem ON datelist.date = DATE(taxticketitem.instime)
WHERE datelist.date BETWEEN '2012-01-01' AND '2012-12-31'
GROUP BY datelist.date
ORDER BY datelist.date
Using LEFT JOIN and counting the non-NULL values from the right table ensures that the count is correct (0 if no row exists for a given date).

You would need to have a set of dates to LEFT JOIN your table to it. Unfortunately, MySQL lacks a way to generate it on the fly.
You would need to prepare a table with, say, 100000 consecutive integers from 0 to 99999 (or how long you think your maximum report range would be):
CREATE TABLE series (number INT NOT NULL PRIMARY KEY);
and use it like this:
SELECT '2013-01-01' + INTERVAL number DAY AS r_date, CASE ... END AS type, COUNT(instime)
FROM series s
LEFT JOIN
taxticketitem ti
ON ti.instime >= '2013-01-01' + INTERVAL number DAY
AND ti.instime < '2013-01-01' + INTERVAL number + 1 DAY
WHERE s.number <= DATEDIFF('2013-02-01', '2013-01-01')
GROUP BY
r_date, type
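On MySQL 8.0 or later, a recursive common table expression is one way to generate the dates on the fly instead of keeping a series table. The following is a sketch under that version assumption; the date range and the alias calls are illustrative, and the per-type CASE breakdown from the question is left out for brevity.

```sql
-- Sketch only: requires MySQL 8.0+ (WITH RECURSIVE).
WITH RECURSIVE days (d) AS (
    SELECT DATE('2013-01-01')            -- first day of the report
    UNION ALL
    SELECT d + INTERVAL 1 DAY            -- add one day per iteration
    FROM days
    WHERE d < '2013-01-31'               -- stop at the last day of the report
)
SELECT days.d AS r_date,
       COUNT(ti.instime) AS calls        -- 0 for days with no matching rows
FROM days
LEFT JOIN taxticketitem ti
       ON ti.instime >= days.d
      AND ti.instime <  days.d + INTERVAL 1 DAY
GROUP BY days.d
ORDER BY days.d;
```

Note that MySQL caps recursion with the cte_max_recursion_depth system variable (1000 by default), so ranges longer than that need the limit raised.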

Had to do something similar before.
You need to have a subselect to generate a range of dates. All the dates you want. Easiest with a start date added to a number:-
SELECT DATE_ADD(SomeStartDate, INTERVAL (a.i + b.i * 10) DAY)
FROM integers a, integers b
Given a table called integers with a single column called i and 10 rows containing 0 to 9, that SQL will give you a range of 100 days starting at SomeStartDate.
You can then left join your actual data against that to get the full range.
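Putting those two steps together might look like the sketch below. It assumes the integers table described above and the taxticketitem table from the question; the start date and the aliases dates, d and calls are placeholders for illustration.

```sql
-- Sketch only: integers(i) holds the ten rows 0..9, as described above.
SELECT dates.d,
       COUNT(ti.instime) AS calls        -- 0 for days without records
FROM (
    -- 100 consecutive days starting at the chosen start date
    SELECT DATE_ADD(DATE('2013-01-01'), INTERVAL (a.i + b.i * 10) DAY) AS d
    FROM integers a
    CROSS JOIN integers b
) AS dates
LEFT JOIN taxticketitem ti
       ON DATE(ti.instime) = dates.d
GROUP BY dates.d
ORDER BY dates.d;
```

Adding a third copy of integers (multiplied by 100) would extend the range to 1000 days.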
