SQL comparison within max function - mysql

I'm trying to get a list of 20 events grouped by their Ids and sorted by whether they are in progress, pending, or already finished. The problem is that there are events with the same id that include finished, pending, and in progress events and I want to have 20 distinct Ids in the end. What I want to do is group these events together but if one of them is in progress then sort that group by that event. So basically I want to sort by the latest end time that is also before now().
What I have so far is something like this where end and start are end/start times. I'm not sure if what is inside max() is behaving how I should expect.
select * from event_schedule as t1
JOIN (
SELECT DISTINCT(event_id) as e
from event_schedule
GROUP BY event_id
order by MAX(end < unix_timestamp(now())) asc,
MIN(start >= unix_timestamp(now())) asc,
MAX(start) desc
limit 0, 20
)
as t2 on (t1.event_id = t2.e)
This results in some running / pending events to be mixed around in order when I want them to be in the order running -> pending -> Ended.

I would suggest first creating a view, so that the SELECT statement does not get overcomplicated:
CREATE VIEW v_event_schedule AS
SELECT *,
       CASE
           WHEN end < unix_timestamp(now())
               THEN 3  -- past
           WHEN start > unix_timestamp(now())
               THEN 2  -- pending
           ELSE 1      -- running
       END AS category
FROM event_schedule;
This view v_event_schedule returns an extra column, in addition to the columns of event_schedule, which represents the priority of the category (running, pending, past):
1. running (in progress)
2. pending (future)
3. past
Then the following will do what you want:
SELECT a.*
FROM v_event_schedule a
INNER JOIN (
    SELECT event_id,
           MIN(category) category
    FROM v_event_schedule b
    GROUP BY event_id
) b
   ON a.event_id = b.event_id
  AND a.category = b.category
ORDER BY a.category,
         a.start DESC
LIMIT 20;
The ORDER BY can be further adapted to your needs as to how you want to sort within the same category. I added start DESC as that seemed what you were doing in your attempt.
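To see the categorisation at work, here is a minimal sketch that assigns each event its best (lowest) category and sorts by it. It runs against SQLite through Python's sqlite3 module purely so that it is self-contained; that, the column names start_ts and end_ts, the fixed NOW value and the sample rows are all assumptions for illustration, not part of the original schema.

```python
import sqlite3

NOW = 100  # hypothetical stand-in for unix_timestamp(now())

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE event_schedule (event_id INTEGER, start_ts INTEGER, end_ts INTEGER);
INSERT INTO event_schedule VALUES
  (1, 10, 50),    -- event 1: one occurrence already finished ...
  (1, 90, 150),   -- ... and one running right now
  (2, 120, 200),  -- event 2: pending only
  (3, 10, 20);    -- event 3: finished only
""")

# One row per event_id, carrying the best category found among its rows.
rows = conn.execute("""
SELECT event_id,
       MIN(CASE WHEN end_ts < :now   THEN 3      -- past
                WHEN start_ts > :now THEN 2      -- pending
                ELSE 1 END) AS category          -- running
FROM event_schedule
GROUP BY event_id
ORDER BY category, event_id
LIMIT 20
""", {"now": NOW}).fetchall()

print(rows)
```

Event 1 is treated as running even though one of its rows has already ended, which is the behaviour the question asks for.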
About the original ORDER BY
You had this:
order by MAX(end < unix_timestamp(now())) asc,
MIN(start >= unix_timestamp(now())) asc,
The expressions you have there evaluate to boolean values, and both elements in the ORDER BY each divide the groups into two sections, one for false and one for true, so in total 4 groups.
The first of the two sorts to the front those IDs that have no record with an end value in the past: only then is the boolean expression false for every record, which is the only way for the MAX of them to be false as well.
Now let's say that for the same ID you have both records with an end date in the future and records with an end date in the past. In that case the MAX aggregates to true, so the ID is sorted second. This is not intended, as this ID might have a "running" record.
I did not look into making your query work based on such aggregates on boolean expressions. It requires some time to understand what they are doing. A CASE WHEN to determine the category with a number really makes the SQL a lot easier to understand, at least to me.
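The effect described above can be checked on a few rows. The sketch below is an illustration only: it uses SQLite via Python's sqlite3 module rather than MySQL, and the column names start_ts and end_ts, the NOW value and the sample data are assumptions. It puts the question's first sort key next to two alternative aggregates.

```python
import sqlite3

NOW = 100  # hypothetical stand-in for unix_timestamp(now())

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE event_schedule (event_id INTEGER, start_ts INTEGER, end_ts INTEGER);
INSERT INTO event_schedule VALUES
  (1, 10, 50), (1, 90, 150),   -- event 1: past + running
  (2, 120, 200),               -- event 2: pending only
  (3, 10, 20);                 -- event 3: past only
""")

rows = conn.execute("""
SELECT event_id,
       MAX(end_ts < :now) AS any_ended,   -- the question's first sort key
       MIN(end_ts < :now) AS all_ended,   -- true only if every row has ended
       MAX(start_ts <= :now AND end_ts >= :now) AS any_running
FROM event_schedule
GROUP BY event_id
ORDER BY event_id
""", {"now": NOW}).fetchall()

for row in rows:
    print(row)
```

Event 1 has a running row, yet any_ended is 1 for it just as for the finished-only event 3, so ordering by that key ascending places it after the pending event 2. An aggregate such as any_running (sorted descending) would separate it out, but as said, the CASE-based category is easier to read.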

Related

MySQL - get users who placed 25th order during period

I have users and orders tables with this structure (simplified for question):
USERS
userid
registered(date)
ORDERS
id
date (order placed date)
user_id
I need to get array of users (array of userid) who placed their 25th order during specified period (for example in May 2019), date of 25th order for each user, number of days to place 25th order (difference between registration date for user and date of 25th order placed).
For example if user registered in April 2018, then placed 20 orders in 2018, and then placed 21-30th orders in Jan-May 2019 - this user should be in this array, if he placed 25th (overall for his account) order in May 2019.
How can I do this with a MySQL query?
Sample data and structure: http://www.sqlfiddle.com/#!9/998358 (for testing you can look for the 3rd order, for example, rather than the 25th, to avoid adding a lot of sample data records).
A single query is not required: if this can't be done in one query, a few are possible and allowed.
You can use a correlated subquery to get the count of orders placed before the current one by a user. If that's 24 the current order is the 25th. Then check if the date is in the desired range.
SELECT o1.user_id,
       o1.date,
       datediff(o1.date, u1.registered)
FROM orders o1
INNER JOIN users u1
        ON u1.userid = o1.user_id
WHERE (SELECT count(*)
       FROM orders o2
       WHERE o2.user_id = o1.user_id
         AND (o2.date < o1.date
              OR (o2.date = o1.date
                  AND o2.id < o1.id))) = 24
  AND o1.date >= '2019-01-01'
  AND o1.date < '2019-06-01';
The basic inefficient way of doing this would be to get the user_id for every row in ORDERS where the date is in your target range AND the count of rows in ORDERS with the same user_id and a lower date is exactly 24.
This can get very ugly, very quickly, though.
If you're calling this from code you control, can't you do it from the code?
If not, there should be a way to assign to each row an index describing its rank among orders for its specific user_id, and select from this all user_id from rows with an index of 25 and a correct date. This will give you a select from select from select, but it should be much faster. The difficulty here is to control the order of the rows, so here are the selects I envision:
1. Select all rows, ordered by user_id asc, date asc, union-ed to nothing from a table made of two vars you'll initialize at 0.
2. From this, select all while updating a var to know if a row's user_id is the same as the last, and adding a field that will report so (so for each user_id the first line in order will have a specific value like 0 while the other rows for the same user_id will have a 1).
3. From this, select all plus a field that equals itself plus one in case the first added field is 1, else 0.
4. From this, select the user_id from the rows where the second added field is 25 and the date is in range.
The union thingy is only necessary if you need to do it all in one request (you have to initialize them in a lower select than the one they're used in).
Edit: Well if you need the date too you can just select it along with the user_id, but calculating the number of days in sql will be a pain. Just join the result table to the users table and get both the date of 25th order and their date of registration, you'll surely be able to do the difference in code.
I'll try building an actual request, however if you want to truly understand what you need to make this you gotta read up on mysql variables, unions, and conditional statements.
"Looks too complicated. I am sure that this can be done with current DB structure and 1-2 requests." Well, yeah. Use the COUNT request, it will be easy, and slow as hell.
For the complex answer, see http://www.sqlfiddle.com/#!9/998358/21
Since you can use multiple requests, you can just initialize the vars first.
It isn't actually THAT complicated, you just have to understand how to concretely express what you mean by "a user's 25th order" to a SQL engine.
See http://www.sqlfiddle.com/#!9/998358/24 for the difference in days, turns out there's a method for that.
Edit 5: seems you're going with the COUNT method. I'll pray your DB is small.
Edit 6: For posterity:
The count method will take years on very large databases. Since OP didn't come back, I'm assuming his is small enough to overlook query speed. If that's not your case and let's say it's 10 years from now and the sqlfiddle links are dead; here's the two-queries solution:
SET @PREV_USR:=0;
SELECT user_id, date_ FROM (
    SELECT user_id, date_, SAME_USR AS IGNORE_SMUSR,
    @RANK_USR:=(CASE SAME_USR WHEN 0 THEN 1 ELSE @RANK_USR+1 END) AS RANK FROM (
        SELECT orders.*, CASE WHEN @PREV_USR = user_id THEN 1 ELSE 0 END AS SAME_USR,
        @PREV_USR:=user_id AS IGNORE_USR FROM
        orders
        ORDER BY user_id ASC, date_ ASC, id ASC
    ) AS DERIVED_1
) AS DERIVED_2
WHERE RANK = 25 AND YEAR(date_) = 2019 AND MONTH(date_) = 4 ;
Just change RANK = ? and the conditions to fit your needs. If you want to fully understand it, start by the innermost SELECT then work your way high; this version fuses the points 1 & 2 of my explanation.
Now sometimes you will have to use an API or something and it wont let you keep variable values in memory unless you commit it or some other restriction, and you'll need to do it in one query. To do that, you put the initialization one step lower and make it so it does not affect the higher statements. IMO the best way to do this is in a UNION with a fake table where the only row is excluded. You'll avoid the hassle of a JOIN and it's just better overall.
SELECT user_id, date_ FROM (
    SELECT user_id, date_, SAME_USR AS IGNORE_SMUSR,
    @RANK_USR:=(CASE SAME_USR WHEN 0 THEN 1 ELSE @RANK_USR+1 END) AS RANK FROM (
        SELECT DERIVED_4.*, CASE WHEN @PREV_USR = user_id THEN 1 ELSE 0 END AS SAME_USR,
        @PREV_USR:=user_id AS IGNORE_USR FROM
        (SELECT * FROM orders
         UNION
         SELECT * FROM (
             SELECT (@PREV_USR:=0) AS INIT_PREV_USR, 0 AS COL_2, 0 AS COL_3
         ) AS DERIVED_3
         WHERE INIT_PREV_USR <> 0
        ) AS DERIVED_4
        ORDER BY user_id ASC, date_ ASC, id ASC
    ) AS DERIVED_1
) AS DERIVED_2
WHERE RANK = 25 AND YEAR(date_) = 2019 AND MONTH(date_) = 4 ;
With that method, the thing to watch for is the amount and the type of columns in your basic table. Here orders' first field is an int, so I put INIT_PREV_USR in first then there are two more fields so I just add two zeroes with names and call it a day. Most types work, since the union doesn't actually do anything, but I wouldn't try this when your first field is a blob (worst comes to worst you can use a JOIN).
You'll note this is derived from a method of pagination in mysql. If you want to apply this to other engines, just check out their best pagination calls and you should be able to work things out.
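On MySQL 8.0 or later, the per-user ranking can also be written with the ROW_NUMBER() window function instead of user variables. The following is a minimal sketch of that idea, not something tested on the original fiddle: it runs against SQLite (3.25 or newer) through Python's sqlite3 module so that it is self-contained, the sample rows and N = 3 are made up, and julianday() stands in for MySQL's DATEDIFF().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (userid INTEGER, registered TEXT);
CREATE TABLE orders (id INTEGER, date_ TEXT, user_id INTEGER);
INSERT INTO users VALUES (1, '2019-01-01'), (2, '2019-03-01');
INSERT INTO orders VALUES
  (1, '2019-02-01', 1), (2, '2019-04-10', 1), (3, '2019-05-05', 1),
  (4, '2019-03-02', 2), (5, '2019-03-03', 2), (6, '2019-03-04', 2),
  (7, '2019-05-20', 2);
""")

N = 3  # the question asks for 25; 3 keeps the sample data small

rows = conn.execute("""
SELECT r.user_id,
       r.date_,
       CAST(julianday(r.date_) - julianday(u.registered) AS INTEGER) AS days
FROM (
    SELECT o.*,
           ROW_NUMBER() OVER (PARTITION BY user_id
                              ORDER BY date_, id) AS rn   -- rank within each user
    FROM orders AS o
) AS r
JOIN users AS u ON u.userid = r.user_id
WHERE r.rn = ?
  AND r.date_ >= '2019-05-01' AND r.date_ < '2019-06-01'
""", (N,)).fetchall()

print(rows)
```

User 2 also has a third order, but it falls in March, so only user 1 is returned. Each order is ranked once, which avoids the per-row COUNT of the correlated version.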

Generating complex sql tables

I currently have an employee logging sql table that has 3 columns
fromState: String,
toState: String,
timestamp: DateTime
fromState is either In or Out. In means employee came in and Out means employee went out. Each row can only transition from In to Out or Out to In.
I'd like to generate a temporary table in sql to keep track during a given hour (hour by hour), how many employees are there in the company. Aka, resulting table has columns HourBucket, NumEmployees.
In non-SQL code I can do this by initializing the numEmployees as 0 and go through the table row by row (sorted by timestamp) and add (employee came in) or subtract (went out) to numEmployees (bucketed by timestamp hour).
I'm clueless as how to do this in SQL. Any clues?
Use a COUNT ... GROUP BY query. Can't see what you're using toState from your description though! Also, assuming you have an employeeID field.
E.g.
SELECT fromState AS 'Status', COUNT(*) AS 'Number'
FROM StaffinBuildingTable
INNER JOIN (SELECT employeeID AS 'empID', MAX(timestamp) AS 'latest' FROM StaffinBuildingTable GROUP BY employeeID) AS LastEntry ON StaffinBuildingTable.employeeID = LastEntry.empID
GROUP BY fromState
The LastEntry subquery will produce a list of employeeIDs limited to the last timestamp for each employee.
The INNER JOIN will limit the main table to just the employeeIDs that match both sides.
The outer GROUP BY produces the count.
SELECT HOUR(SBT.timestamp) AS 'Hour', SBT.fromState AS 'Status', COUNT(*) AS 'Number'
FROM StaffinBuildingTable AS SBT
INNER JOIN (
SELECT SBIJ.employeeID AS 'empID', MAX(timestamp) AS 'latest'
FROM StaffinBuildingTable AS SBIJ
WHERE DATE(SBIJ.timestamp) = CURDATE()
GROUP BY SBIJ.employeeID) AS LastEntry ON SBT.employeeID = LastEntry.empID
GROUP BY SBT.fromState, HOUR(SBT.timestamp)
Replace CURDATE() with whatever date you are interested in.
Note this is non-optimal as it calculates the HOUR twice - once for the data and once for the group.
Again you are using the INNER JOIN to limit the number of returned row, this time to the last timestamp on a given day.
To me your description of fromState and toState seems the wrong way round; I'd expect to be doing this based on toState. But assuming I'm wrong on that, the following should point you in the right direction:
First, I create a "Numbers" table containing 24 rows one for each hour of the day:
create table tblHours
(Number int);
insert into tblHours values
(0),(1),(2),(3),(4),(5),(6),(7),
(8),(9),(10),(11),(12),(13),(14),(15),
(16),(17),(18),(19),(20),(21),(22),(23);
Then for each date in your employee logging table, I create a row in another new table to contain your counts:
create table tblDailyHours
(
HourBucket datetime,
NumEmployees int
);
insert into tblDailyHours (HourBucket, NumEmployees)
select distinct
date_add(date(t.timeStamp), interval h.Number HOUR) as HourBucket,
0 as NumEmployees
from
tblEmployeeLogging t
CROSS JOIN tblHours h;
Then I update this table to contain all the relevant counts:
update tblDailyHours h
join
(select
h2.HourBucket,
sum(case when el.fromState = 'In' then 1 else -1 end) as cnt
from
tblDailyHours h2
join tblEmployeeLogging el on
h2.HourBucket >= el.timeStamp
group by h2.HourBucket
) cnt ON
h.HourBucket = cnt.HourBucket
set NumEmployees = cnt.cnt;
You can now retrieve the counts with
select *
from tblDailyHours
order by HourBucket;
The counts give the number on site at each of the times displayed, if you want during the hour in question, we'd need to tweak this a little.
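The cumulative counting can be tried on a handful of rows. The sketch below is a simplified variation rather than the code above: it runs on SQLite through Python's sqlite3 module, drops the toState column, builds the hour buckets for a single hard-coded date, and uses a LEFT JOIN so that empty buckets show 0 without a separate UPDATE. The sample timestamps are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblEmployeeLogging (fromState TEXT, timeStamp TEXT);
INSERT INTO tblEmployeeLogging VALUES
  ('In',  '2024-01-01 08:15:00'),
  ('In',  '2024-01-01 08:45:00'),
  ('Out', '2024-01-01 09:30:00'),
  ('In',  '2024-01-01 10:05:00');
CREATE TABLE tblHours (Number INTEGER);
INSERT INTO tblHours VALUES (8), (9), (10), (11);
""")

rows = conn.execute("""
SELECT b.HourBucket,
       -- +1 for every arrival and -1 for every departure up to the bucket time
       COALESCE(SUM(CASE el.fromState WHEN 'In'  THEN 1
                                      WHEN 'Out' THEN -1 END), 0) AS NumEmployees
FROM (SELECT datetime('2024-01-01', '+' || Number || ' hours') AS HourBucket
      FROM tblHours) AS b
LEFT JOIN tblEmployeeLogging AS el
       ON el.timeStamp <= b.HourBucket
GROUP BY b.HourBucket
ORDER BY b.HourBucket
""").fetchall()

for bucket, heads in rows:
    print(bucket, heads)
```

As in the original, each figure is the head count at the start of the hour, not the number of people seen during it.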
There is a working version of this code (using not very realistic data in the logging table) here: rextester.com/DYOR23344
Original Answer (Based on a single over all count)
If you're happy to search over all rows, and want the current "head count" you can use this:
select
sum(case when t.FromState = 'In' then 1 else -1 end) as Heads
from
MyTable t
But if you know that there will always be no-one there at midnight, you can add a where clause to prevent it looking at more rows than it needs to:
where
date(t.timestamp) = curdate()
Again, on the assumption that the head count reaches zero at midnight, you can generalise that method to get a headcount at any time as follows:
where
date(t.timestamp) = "CENSUS DATE" AND
t.timestamp <= "CENSUS DATETIME"
Obviously you'd need to replace my quoted strings with code which returned the date and datetime of interest. If the headcount doesn't return to zero at midnight, you can achieve the same by removing the first line of the where clause.

Access Query for TopN by Group

I've reviewed quite a few of the sites (e.g. Allen Browne) on creating a query that produces the top 5 (or N) values by group. I think I am getting hung up on the creation of a subquery because I'm referencing a previous query rather than a table.
I have a query started which counts by month the number of PIs (qryPICountbyMonth). Currently the below gives a data mismatch expression error:
SELECT qryPI.EventMonth, qryPI.PI_Issue, Count(qryPI.PI_Issue) AS
CountOfPI_Issue
FROM qryPI
GROUP BY qryPI.EventMonth, qryPI.PI_Issue
HAVING (((Count(qryPI.PI_Issue)) In (Select Top 5 [PI_Issue] From [qryPI]
Where [EventMonth]=[qryPI].[EventMonth] Order By [PI_Issue] Desc)))
ORDER BY qryPI.EventMonth DESC , Count(qryPI.PI_Issue) DESC;
It is built off a separate query, qryPI:
SELECT tblPI.EventDate, Format([EventDate],'yyyy-mm',1,1) AS EventMonth, tblPI.PI_Issue
FROM tblPI
WHERE (((tblPI.EventDate) >= #4/1/2016# And (tblPI.EventDate) <= #5/31/2016#))
GROUP BY tblPI.EventDate, Format([EventDate],'yyyy-mm',1,1), tblPI.PI_Issue;
I'm hoping to have it generate the top 5 counts of PI_Issue by EventMonth. If I haven't provided enough info let me know.
The problem (or at least a problem) is with [EventMonth]=[qryPI].[EventMonth]. Both your primary source and your lookup are called qryPI. You have to alias at least one of them.
You can't do this:
HAVING (((Count(qryPI.PI_Issue)) In (Select Top 5 [PI_Issue] From [qryPI]
count(field) will return an integer, not the set of values you're counting
I thought you could specify TopN in an Access query (it's in the properties), but you have to specify an order by clause, so it knows how to determine the TOP.
Have you tried:
SELECT top 5
tblPI.EventDate, Format([EventDate],'yyyy-mm',1,1) AS EventMonth, tblPI.PI_Issue
FROM tblPI
WHERE (((tblPI.EventDate) >= #4/1/2016# And (tblPI.EventDate) <= #5/31/2016#))
GROUP BY tblPI.EventDate, Format([EventDate],'yyyy-mm',1,1), tblPI.PI_Issue
order by PI_Issue
also not sure why you're using GROUP BY in your inner query as you're not returning any aggregate functions. Do you just need DISTINCT instead?
try:
SELECT distinct top 5
tblPI.EventDate, Format([EventDate],'yyyy-mm',1,1) AS EventMonth, tblPI.PI_Issue
FROM tblPI
WHERE (((tblPI.EventDate) >= #4/1/2016# And (tblPI.EventDate) <= #5/31/2016#))
order by PI_Issue
Actually, if I understand what you want, you need that GROUP BY instead of DISTINCT, but you also need to return the COUNT(*):
SELECT
Year([eventDate]) AS yr,
Month([eventDate]) AS mo,
tblPI.PI_issue,
Min(tblPI.eventDate) AS MinOfeventDate,
Max(tblPI.eventDate) AS MaxOfeventDate,
Count(tblPI.PI_issue) AS CountOfPI_issue
FROM tblPI
WHERE
(((tblPI.EventDate)>=#4/1/2016# And
(tblPI.EventDate)<#6/1/2016#))
GROUP BY
Year([eventDate]),
Month([eventDate]),
tblPI.PI_issue;
then you want to apply TOP N to CountOfPI_issue in an outer query (with the query above saved as qry_inner):
SELECT TOP 5 * FROM qry_inner
ORDER BY CountOfPI_issue DESC
except that TOP5 applies to all the query results, not the results grouped by yy/mm, which is what I'm assuming you want, so try this:
SELECT TOP 5
qry_inner.yr,
qry_inner.mo,
qry_inner.CountOfPI_issue,
qry_inner.PI_issue,
qry_inner.MinOfeventDate,
qry_inner.MaxOfeventDate
FROM qry_inner
ORDER BY qry_inner.CountOfPI_issue DESC;
As far as I know, Access doesn't allow you to select the top number of rows within a group, so you'll need to limit your outer query results to one month, then apply the TOP function.
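A commonly used workaround for top N per group is a correlated subquery that counts how many rows in the same group rank higher. The sketch below shows the idea on SQLite through Python's sqlite3 module, not on Access, so the syntax would need adapting there. The table issue_counts is a hypothetical stand-in for the saved counting query, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE issue_counts (mo TEXT, PI_issue TEXT, cnt INTEGER);
INSERT INTO issue_counts VALUES
  ('2016-04', 'A', 9), ('2016-04', 'B', 7), ('2016-04', 'C', 2),
  ('2016-05', 'A', 1), ('2016-05', 'B', 4), ('2016-05', 'D', 6);
""")

N = 2  # top N per month; the question uses 5

rows = conn.execute("""
SELECT c.mo, c.PI_issue, c.cnt
FROM issue_counts AS c
WHERE (SELECT COUNT(*)              -- rows in the same month with a higher count
       FROM issue_counts AS x
       WHERE x.mo = c.mo
         AND x.cnt > c.cnt) < ?
ORDER BY c.mo DESC, c.cnt DESC
""", (N,)).fetchall()

for row in rows:
    print(row)
```

Note that ties on the count can return more than N rows for a month.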

How do I group a table of datetimes together as long as there is a continuous chain at least every hour?

I have a table called 'events'.
It contains eventID (INT), eventDateTime(DATETIME), and eventMessage(VARCHAR).
I want to be able to group the rows by eventDateTime where there is another row with eventDateTime within 1 hour each side. This should propagate forever (for example, a group should be able to go on for years, as long as there is never a gap longer than an hour between a linking chain of eventDateTime values within that time period). Ideally I want to end up selecting MIN(eventID) for each group, and both the MIN and MAX of eventDateTime, which will give me the time span in which the group runs.
I assume I need some kind of iterating loop to do this? Where would I start?
Let's start from subqueries we need
SET @row_number1 = 0;
SET @row_number2 = 0;
The query returns us the events table ordered with row numbers (rn)
SELECT
(@row_number1:=@row_number1 + 1) AS rn, eventID, eventDateTime
FROM
events
ORDER BY eventDateTime
Let's mark two copies of it as SUB1 and SUB2 (the second copy uses @row_number2)
Then let's join them so that each row is paired with the next one
select *
from SUB1 join SUB2 on SUB2.rn = SUB1.rn + 1
So we have in one row the eventDateTime of both the current and the next row and can calculate the time difference
TIMESTAMPDIFF(MINUTE, SUB1.eventDateTime, SUB2.eventDateTime) as minutesDiff
(TIMESTAMPDIFF returns a whole number, so with HOUR a gap of 1 hour 59 minutes would come back as 1; counting minutes avoids that.)
Then we can add HAVING minutesDiff > 60 to keep only the rule-breaking intervals. For such records SUB1.eventDateTime is the end of the previous group and SUB2.eventDateTime is the beginning of the next group.
So our query will return us
SUB1.eventID as previousGroupEndEventId,
SUB1.eventDateTime as previousGroupEndeventDateTime,
SUB2.eventID as currentGroupStartEventId,
SUB2.eventDateTime as currentGroupStarteventDateTime,
TIMESTAMPDIFF(MINUTE, SUB1.eventDateTime, SUB2.eventDateTime) as breakInterval
And you can use the query results to get all your info
For complex problems requiring some form of looping, some databases allow recursive queries; MySQL only gained them (as recursive common table expressions) in version 8.0.
Fortunately, in your case I don't think it is necessary. You can instead look for any rows which don't have another row in the preceding hour, thus:
select *
from events as A
where not exists (
select 1
from events as B
where B.eventDateTime < A.eventDateTime
and B.eventDateTime > DATE_ADD(A.eventDateTime, INTERVAL -1 HOUR)
)
Example kept simple. Fix up the details to meet your requirements.
Working example is here: http://sqlfiddle.com/#!9/c3b73c/1
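The NOT EXISTS query finds where each group starts. One way to go from there to the MIN/MAX per group is to give every event the latest group start at or before it and then group on that value. The sketch below is an assumption-laden illustration: it runs on SQLite through Python's sqlite3 module (datetime(x, '-1 hours') plays the role of DATE_ADD), uses invented rows, and treats a gap of exactly one hour as still linked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (eventID INTEGER, eventDateTime TEXT, eventMessage TEXT);
INSERT INTO events VALUES
  (1, '2024-01-01 10:00:00', 'a'),
  (2, '2024-01-01 10:40:00', 'b'),
  (3, '2024-01-01 11:30:00', 'c'),
  (4, '2024-01-01 14:00:00', 'd'),
  (5, '2024-01-01 14:20:00', 'e');
""")

rows = conn.execute("""
SELECT MIN(g.eventID), MIN(g.eventDateTime), MAX(g.eventDateTime)
FROM (
    SELECT e.eventID, e.eventDateTime,
           -- latest "group start" (row with nothing in the hour before it)
           -- that is not later than this event
           (SELECT MAX(A.eventDateTime)
            FROM events AS A
            WHERE A.eventDateTime <= e.eventDateTime
              AND NOT EXISTS (
                  SELECT 1 FROM events AS B
                  WHERE B.eventDateTime <  A.eventDateTime
                    AND B.eventDateTime >= datetime(A.eventDateTime, '-1 hours')
              )) AS groupStart
    FROM events AS e
) AS g
GROUP BY g.groupStart
ORDER BY g.groupStart
""").fetchall()

for row in rows:
    print(row)
```

Events 1 to 3 form one group even though the first and last are 90 minutes apart, because event 2 links them. The nested correlated subqueries will be slow on a large table without an index on eventDateTime.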

MySql retrieve products and prices

I would like to retrieve a list of all the products, with their associated prices for a given period.
The Product table:
Id
Name
Description
The Price table:
Id
Product_id
Name
Amount
Start
End
Duration
The most important thing to note here is that a Product can have multiple prices, even over the same period, but not with the same duration.
For example, a price from "2013-06-01 -> 2013-06-08" and another from "2013-06-01 -> 2013-06-05"
So my aim is to retrieve, for a given period, the list of all products, paginated by 10 products for example, joined to the prices existing over the period.
The basic way to do so would be:
SELECT *
FROM product
LEFT JOIN prices ON ...
WHERE prices.start >= XXX And prices.end <= YYY
LIMIT 0,10
The problem while using this simple solution, is that I can't retrieve only 10 Products, but 10 Products*Prices, which is not acceptable in my case.
So the solution would be:
SELECT *
FROM product
LEFT JOIN prices ON ...
WHERE prices.start >= XXX And prices.end <= YYY
GROUP BY product.id
LIMIT 0,10
But the problem here is, i'll only retrieve "1" price for each product.
So I wonder what would be the best way to handle this.
I could for example use a group function, like "group_concat", and retrieve in a field all the prices in a string, like "200/300/100" and so on. That seem weird, and would need work on server-language side to transform to a readable information, but it could work.
Another solution would be to use different column for each prices, depending on duration:
SELECT
IF( NOT ISNULL(price.start) AND price.duration = 1, price.amount , NULL) AS price_1_day
---- same here for all possible durations ---
From ...
That would work too, I guess (I'm not really sure this is possible, however), but I may need to create about 250 columns to cover all possibilities. Is that a safe option?
Any help will be much appreciated
I believe that a group_concat would be the best way forward on this, as its very purpose is to aggregate multiple pieces of data relating to a particular column.
However, adapting on peterm's SQL fiddle, this is possible to do in 1 query if using user defined variables. (If one ignores the initial query for setting the vars)
http://dev.mysql.com/doc/refman/5.7/en/user-variables.html
SET @productTemp := '', @increment := 0;
SELECT
@increment := if(@productTemp != Product_id, @increment + 1, @increment) AS limiter,
@productTemp :=Product_id as Product_id,
Product.name,
Price.id as Price_id,
Price.start,
Price.end
FROM
Product
LEFT JOIN
Price ON Product.Id=Price.Product_id
WHERE
`start` >= '2013-05-01' AND `end` <= '2013-05-15'
GROUP BY
Price_id
HAVING
limiter <=2
What we're doing here is incrementing the user-defined variable "increment" only when the product id is not the same as the last one that was encountered.
As aliases cannot be used in the WHERE condition, we must GROUP by the unique ID (in this case price ID) so that we can reduce the result using HAVING. In this case, I have a full result set that should include 3 Product IDs, reduced to only showing 2.
Please note: This is not a solution I would recommend on large data sets, or in a production environment. Even the mysql manual makes a point of highlighting that user defined vars can behave somewhat erratically depending on what paths the optimizer takes. However, I have used them to great effect for some internal statistics in the past.
Fiddle: http://sqlfiddle.com/#!2/96c92/3
It's hard to tell without sample data and desired output but you can try something like this
SELECT p.*, q2.*
FROM
(
SELECT Product_id
FROM Price
WHERE `start` >= '2013-05-01' AND `end` <= '2013-05-15'
GROUP BY Product_id
LIMIT 0,10
) q1 JOIN
(
SELECT *
FROM Price
WHERE `start` >= '2013-05-01' AND `end` <= '2013-05-15'
) q2 ON q1.Product_id = q2.Product_id JOIN product p
ON q1.Product_id = p.Id
Here is SQLFiddle demo
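The idea of that query, limiting the list of products first and only then joining the prices, can be sketched as follows. This is an illustration under assumptions: it runs on SQLite through Python's sqlite3 module, the columns are renamed start_date and end_date, the sample rows are made up, and an ORDER BY is added inside the limited subquery so that the pages are stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (Id INTEGER, Name TEXT);
CREATE TABLE Price (Id INTEGER, Product_id INTEGER, Amount INTEGER,
                    start_date TEXT, end_date TEXT);
INSERT INTO Product VALUES (1, 'Chair'), (2, 'Table'), (3, 'Lamp');
INSERT INTO Price VALUES
  (1, 1, 100, '2013-05-01', '2013-05-08'),
  (2, 1,  60, '2013-05-01', '2013-05-05'),
  (3, 2, 200, '2013-05-02', '2013-05-10'),
  (4, 2, 250, '2013-06-01', '2013-06-08'),  -- outside the period
  (5, 3, 300, '2013-05-03', '2013-05-12');
""")

PAGE_SIZE, OFFSET = 2, 0  # the question paginates by 10 products

rows = conn.execute("""
SELECT p.Id, p.Name, pr.Amount
FROM (
    SELECT DISTINCT Product_id          -- one row per product: this is what gets paginated
    FROM Price
    WHERE start_date >= '2013-05-01' AND end_date <= '2013-05-15'
    ORDER BY Product_id
    LIMIT ? OFFSET ?
) AS page
JOIN Product AS p ON p.Id = page.Product_id
JOIN Price AS pr  ON pr.Product_id = p.Id
                 AND pr.start_date >= '2013-05-01' AND pr.end_date <= '2013-05-15'
ORDER BY p.Id, pr.Amount
""", (PAGE_SIZE, OFFSET)).fetchall()

for row in rows:
    print(row)
```

The page holds two products but three rows, since the first product has two prices in the period.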