How can I get the value from the next or previous row in a query result? I want to compute the difference in days between the date_created values of consecutive rows.
SELECT
-- *,
h.date_created,
-- (select date_created where id = h.id and pci_s = h.pci_s + 1) as dc,
h.id,
h.date_created,
CONCAT('B', h.pci_b) AS batch,
h.pci_s,
DATEDIFF(h.date_created, h.date_created) as days_in_stage
FROM
historical h
WHERE
h.pci_b = 1
;
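Expected:
| date_created         | id | date_created         | batch | pci_s | days_in_stage |
|----------------------|----|----------------------|-------|-------|---------------|
| 2021-07-18T06:32:26Z | 1  | 2021-07-18T06:32:26Z | B1    | 0     | 0             |
| 2021-07-20T06:32:26Z | 4  | 2021-07-20T06:32:26Z | B1    | 1     | 2             |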
Here's the SQL Fiddle:
http://sqlfiddle.com/#!9/f32a242/3
Currently using: MySQL 5.7.33
To get the succeeding or previous row, what you are looking for is the LEAD or LAG window function.
However, your MySQL version is lower than 8.0, which does not support window functions.
You can use a correlated subquery to emulate LEAD or LAG instead.
Query 1:
SELECT
h.date_created,
h.id,
h.date_created,
CONCAT('B', h.pci_b) AS batch,
h.pci_s,
COALESCE(DATEDIFF(h.date_created,(
SELECT hh.date_created
FROM historical hh
WHERE h.pci_b = hh.pci_b AND h.date_created > hh.date_created
ORDER BY hh.date_created DESC
LIMIT 1
)),0) as days_in_stage
FROM
historical h
WHERE
h.pci_b = 1
Results:
| date_created | id | date_created | batch | pci_s | days_in_stage |
|----------------------|----|----------------------|-------|-------|---------------|
| 2021-07-18T06:32:26Z | 1 | 2021-07-18T06:32:26Z | B1 | 0 | 0 |
| 2021-07-20T06:32:26Z | 4 | 2021-07-20T06:32:26Z | B1 | 1 | 2 |
With MySQL 8 you can simply use the LAG() or LEAD() window functions.
https://dev.mysql.com/doc/refman/8.0/en/window-function-descriptions.html#function_lag
MySQL 5.x does not have them, so you can use correlated subqueries instead, for example:
SELECT
-- *,
(SELECT date_created FROM historical WHERE date_created < h.date_created ORDER BY date_Created DESC LIMIT 1) AS prev_date_created,
-- (select date_created where id = h.id and pci_s = h.pci_s + 1) as dc,
h.id,
h.date_created,
CONCAT('B', h.pci_b) AS batch,
h.pci_s,
DATEDIFF(h.date_created, h.date_created) as days_in_stage
FROM
historical h
WHERE
h.pci_b = 1
ORDER BY
h.date_created
;
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=0ce3b601688e1de11eda1007bffea9f9
In MySQL 8.0 and later you can use the LEAD and LAG functions to get data from the next and previous rows. On older versions this will not work, and a different solution is needed.
Please share your MySQL version if you are not on a recent one.
For example, on a recent version:
SELECT
-- *,
h.date_created,
-- (select date_created where id = h.id and pci_s = h.pci_s + 1) as dc,
h.id,
h.date_created,
CONCAT('B', h.pci_b) AS batch,
h.pci_s,
DATEDIFF(h.date_created, h.date_created) as days_in_stage,
lead(h.date_created)over (order by h.date_created) next_row_date,
lag(h.date_created)over (order by h.date_created) pre_row_date
FROM
historical h
WHERE
h.pci_b = 1
;
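The query above returns the neighbouring dates but still leaves days_in_stage at 0. As a rough sketch of how LAG can feed the day difference directly, here is a self-contained version that runs against SQLite through Python's sqlite3 module (an assumption made only so the example is runnable anywhere; it needs SQLite 3.25 or later for window functions). The third row is hypothetical data added to show the batch filter. On MySQL 8 the equivalent expression would presumably be COALESCE(DATEDIFF(date_created, LAG(date_created) OVER (ORDER BY date_created)), 0).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE historical (id INTEGER, date_created TEXT, pci_b INTEGER, pci_s INTEGER)"
)
conn.executemany(
    "INSERT INTO historical VALUES (?, ?, ?, ?)",
    [
        (1, "2021-07-18 06:32:26", 1, 0),
        (4, "2021-07-20 06:32:26", 1, 1),
        (2, "2021-07-19 06:32:26", 2, 0),  # hypothetical row from another batch
    ],
)

# SQLite has no DATEDIFF, so the calendar-day difference is computed with
# julianday(date(...)); date() drops the time part, as MySQL's DATEDIFF does.
query = """
SELECT
    id,
    pci_s,
    COALESCE(
        CAST(julianday(date(date_created))
             - julianday(date(LAG(date_created) OVER (ORDER BY date_created)))
             AS INTEGER),
        0
    ) AS days_in_stage
FROM historical
WHERE pci_b = 1
ORDER BY date_created
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 0, 0), (4, 1, 2)]
```

The WHERE clause is applied before the window function, so LAG only ever sees rows from batch 1; the first row has no predecessor and falls back to 0 through COALESCE.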
I know MS SQL fairly well (no expert), but am new to MySQL and can't quite get this to work. I have a game that is composed of 6 stages which are timed. There are penalties for misses and procedural errors where each miss adds 5 seconds, and each procedural adds 10 seconds.
I have to return a table that calculates total time for each stage, a rank for each stage's total time, total match time, total match misses, and overall match rank, in addition to all of the existing fields in the table, where MatchId equals a specified match ID. I should note that I am forced to use a MySQL 5.5.x database, so window functions such as RANK() are not available.
Here is my partial table, as I mentioned there are 6 stages per match.
+---------+--------+------------+---------+-------+-------+---------+-------+-------+-------------+
| MatchID | UserID | CategoryID | Stg1Raw | Stg1M | Stg1P | Stg2Raw | Stg2M | Stg2P | ...-> Stg6p |
+---------+--------+------------+---------+-------+-------+---------+-------+-------+-------------+
| 1 | 1 | 9 | 30.75 | NULL | NULL | 28.47 | 1 | NULL | NULL |
| 1 | 2 | 15 | 24.07 | 3 | NULL | 22.20 | NULL | NULL | NULL |
| 1 | 3 | 12 | 23.96 | NULL | NULL | 22.87 | NULL | NULL | NULL |
| 2 | 2 | 15 | 25.72 | 1 | 1 | 28.95 | NULL | 1 | NULL |
+---------+--------+------------+---------+-------+-------+---------+-------+-------+-------------+
I am trying to create a stored procedure that takes a parameter for the match ID, i.e. p_matchId, and returns the scores for the match. What I can do is create a stored procedure that calculates the total time for each player for each stage, as well as the total time for each player across all stages. I even figured out how to rank each stage, as well as the overall ranking, in a standard query. I just can't put it all together.
Here is my query for calculating stage and overall total scores
SELECT sc.*,
(sc.stg1Total + sc.stg2Total + sc.stg3Total +
sc.stg4Total + sc.stg5Total + sc.stg6Total) AS matchTotal
FROM (SELECT m.Alias, c.Category,
(ifnull(s.stg1M,0) + ifnull(s.stg2M,0) + ifnull(s.stg3M,0) +
ifnull(s.stg4M,0) + ifnull(s.stg5M,0) + ifnull(s.stg6M,0)) AS mTotal,
s.stg1Raw, s.stg1M, s.stg1P, ((s.stg1Raw + (ifnull(s.stg1M,0) * 5)) + (ifnull(s.stg1P,0) * 10)) AS stg1Total,
s.stg2Raw, s.stg2M, s.stg2P, ((s.stg2Raw + (ifnull(s.stg2M,0) * 5)) + (ifnull(s.stg2P,0) * 10)) AS stg2Total,
s.stg3Raw, s.stg3M, s.stg3P, ((s.stg3Raw + (ifnull(s.stg3M,0) * 5)) + (ifnull(s.stg3P,0) * 10)) AS stg3Total,
s.stg4Raw, s.stg4M, s.stg4P, ((s.stg4Raw + (ifnull(s.stg4M,0) * 5)) + (ifnull(s.stg4P,0) * 10)) AS stg4Total,
s.stg5Raw, s.stg5M, s.stg5P, ((s.stg5Raw + (ifnull(s.stg5M,0) * 5)) + (ifnull(s.stg5P,0) * 10)) AS stg5Total,
s.stg6Raw, s.stg6M, s.stg6P, ((s.stg6Raw + (ifnull(s.stg6M,0) * 5)) + (ifnull(s.stg6P,0) * 10)) AS stg6Total
FROM Scores s
INNER JOIN Members m
ON s.ShooterId = m.id
INNER JOIN Categories c
ON s.CatId = c.ID
where (MatchID = p_matchId)) AS sc
I know there is probably a better/shorter way to write the above query so any suggestions on that would be appreciated as well.
I tried putting the results of the above query into a temporary table and then running the ranking for each stage on the temp table, but MySQL doesn't appear to like the way I was attempting to do it. What would be the shortest, most efficient way to create this SP?
UPDATE: To clarify, How could I create a rank column for each of the following in the above sql statement: stg1Total, stg2Total, stg3Total, stg4Total, stg5Total, stg6Total, and matchTotal?
UPDATE 2: I got it working but had to create multiple temp tables to get it to run, and achieve the desired results
BEGIN
CREATE TEMPORARY TABLE A SELECT sc.*,
(sc.stg1Total + sc.stg2Total + sc.stg3Total +
sc.stg4Total + sc.stg5Total + sc.stg6Total) AS matchTotal
FROM
(SELECT m.Alias, c.Category, (ifnull(s.stg1M,0) + ifnull(s.stg2M,0) + ifnull(s.stg3M,0) +
ifnull(s.stg4M,0) + ifnull(s.stg5M,0) + ifnull(s.stg6M,0)) AS mTotal,
s.stg1Raw, s.stg1M, s.stg1P,
((s.stg1Raw + (ifnull(s.stg1M,0) * 5)) + (ifnull(s.stg1P,0) * 10)) AS stg1Total,
s.stg2Raw, s.stg2M, s.stg2P,
((s.stg2Raw + (ifnull(s.stg2M,0) * 5)) + (ifnull(s.stg2P,0) * 10)) AS stg2Total,
s.stg3Raw, s.stg3M, s.stg3P,
((s.stg3Raw + (ifnull(s.stg3M,0) * 5)) + (ifnull(s.stg3P,0) * 10)) AS stg3Total,
s.stg4Raw, s.stg4M, s.stg4P,
((s.stg4Raw + (ifnull(s.stg4M,0) * 5)) + (ifnull(s.stg4P,0) * 10)) AS stg4Total,
s.stg5Raw, s.stg5M, s.stg5P,
((s.stg5Raw + (ifnull(s.stg5M,0) * 5)) + (ifnull(s.stg5P,0) * 10)) AS stg5Total,
s.stg6Raw, s.stg6M, s.stg6P,
((s.stg6Raw + (ifnull(s.stg6M,0) * 5)) + (ifnull(s.stg6P,0) * 10)) AS stg6Total
FROM Scores s
INNER JOIN Members m
ON s.ShooterId = m.id
INNER JOIN Categories c
ON s.CatId = c.ID
where (MatchID = p_matchId)) AS sc;
CREATE TEMPORARY TABLE B SELECT * FROM A;
CREATE TEMPORARY TABLE C SELECT * FROM A;
CREATE TEMPORARY TABLE D SELECT * FROM A;
CREATE TEMPORARY TABLE E SELECT * FROM A;
CREATE TEMPORARY TABLE F SELECT * FROM A;
CREATE TEMPORARY TABLE G SELECT * FROM A;
CREATE TEMPORARY TABLE H SELECT * FROM A;
SELECT Alias, Category, mTotal, stg1Raw, stg1M ,stg1P, stg1Total,
(SELECT COUNT(*)+1 FROM B WHERE A.stg1Total>B.stg1Total)AS stg1Rnk,
stg2Raw, stg2M ,stg2P, stg2Total,
(SELECT COUNT(*)+1 FROM C WHERE A.stg2Total>C.stg2Total)AS stg2Rnk,
stg3Raw, stg3M ,stg3P, stg3Total,
(SELECT COUNT(*)+1 FROM D WHERE A.stg3Total>D.stg3Total)AS stg3Rnk,
stg4Raw, stg4M ,stg4P, stg4Total,
(SELECT COUNT(*)+1 FROM E WHERE A.stg4Total>E.stg4Total)AS stg4Rnk,
stg5Raw, stg5M ,stg5P, stg5Total,
(SELECT COUNT(*)+1 FROM F WHERE A.stg5Total>F.stg5Total)AS stg5Rnk,
stg6Raw, stg6M ,stg6P, stg6Total,
(SELECT COUNT(*)+1 FROM G WHERE A.stg6Total>G.stg6Total)AS stg6Rnk,
matchTotal,
(SELECT COUNT(*)+1 FROM H WHERE A.matchTotal>H.matchTotal)AS oaRnk
FROM A;
DROP TEMPORARY TABLE A;
DROP TEMPORARY TABLE B;
DROP TEMPORARY TABLE C;
DROP TEMPORARY TABLE D;
DROP TEMPORARY TABLE E;
DROP TEMPORARY TABLE F;
DROP TEMPORARY TABLE G;
DROP TEMPORARY TABLE H;
END
It's a little unruly, and probably not that performant, so I would still like to see a more efficient method of achieving the same results.
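For reference, the ranking step in the procedure above boils down to counting, for each row, how many rows have a strictly smaller total, and adding one. MySQL does not let a TEMPORARY table be referenced more than once in the same query, which would explain why the copies B to H were needed. A minimal sketch of the counting technique on its own, run against SQLite through Python's sqlite3 module (an assumption made only to keep it self-contained); stage_totals and its columns are hypothetical names, and the values are the stage 1 totals for match 1 derived from the sample table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stage_totals (user_id INTEGER, stg1_total REAL)")
# total = raw time + misses * 5 + procedurals * 10
conn.executemany(
    "INSERT INTO stage_totals VALUES (?, ?)",
    [(1, 30.75), (2, 24.07 + 3 * 5), (3, 23.96)],
)

# Rank = 1 + number of rows with a strictly smaller total (lower time is better).
# Tied totals receive the same rank, as with RANK().
rank_query = """
SELECT a.user_id,
       a.stg1_total,
       (SELECT COUNT(*) + 1
          FROM stage_totals b
         WHERE b.stg1_total < a.stg1_total) AS stg1_rank
FROM stage_totals a
ORDER BY stg1_rank
"""
ranks = [(user_id, rank) for user_id, _, rank in conn.execute(rank_query)]
print(ranks)  # [(3, 1), (1, 2), (2, 3)]
```

Because the table here is an ordinary one, it can be aliased twice (a and b) in a single statement, so one copy of the data is enough for every rank column.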
I am writing a SQL query to find all numbers that appear at least three times consecutively:
| Id | Num|
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
| 5 | 1 |
| 6 | 2 |
| 7 | 2 |
For example, given the above Logs table, 1 is the only number that appears consecutively for at least three times.
My original query returns 1 and 2:
SELECT l1.Num
FROM Logs l1, Logs l2, Logs l3
WHERE l1.Id + 2 = l2.Id + 1 = l3.Id
AND l1.Num = l2.Num = l3.Num;
Then I changed my query to this (basically replacing a+2 = b+1 = c with a+1 = b and b+1 = c), and it returns the correct answer:
SELECT l1.Num
FROM Logs l1, Logs l2, Logs l3
WHERE l1.Id + 1 = l2.Id AND l2.Id + 1 = l3.Id
AND l1.Num = l2.Num AND l1.Num= l3.Num;
It drives me crazy; I cannot figure out why. Could anyone kindly explain this to me? Thanks in advance!
Well, let's have a look at this query:
WHERE l1.Id + 2 = l2.Id + 1 = l3.Id
In MySQL, as in most programming languages, = is just a binary operator: it takes a left and a right side and returns true if they are the same and false otherwise.
In your query, you're first comparing l1.Id + 2 with l2.Id + 1, which results in a truth value. Then you compare that truth value with l3.Id. (The truth value is automatically cast to 0/1.) This leads to a different result.
tl;dr: = in MySQL is not like = often used in mathematical notation.
When you chain multiple = operators, MySQL evaluates them from left to right, and the result of the leftmost comparison is then fed to the next one. Have a look at this SQL Fiddle:
id=id=2 returns false because id=id returns 1 and 1 != 2.
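The left-to-right evaluation described above can be reproduced with the Logs data from the question. This is only a sketch: it runs against SQLite through Python's sqlite3 module (an assumption made to keep it self-contained), which treats a chained = the same way, as (a = b) = c with the inner result being 0 or 1. DISTINCT and ORDER BY are added here just to make the output compact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Logs (Id INTEGER, Num INTEGER)")
conn.executemany(
    "INSERT INTO Logs VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 2), (5, 1), (6, 2), (7, 2)],
)

# Chained form: parsed as ((l1.Id + 2) = (l2.Id + 1)) = l3.Id, and likewise for Num.
chained = """
SELECT DISTINCT l1.Num
FROM Logs l1, Logs l2, Logs l3
WHERE l1.Id + 2 = l2.Id + 1 = l3.Id
  AND l1.Num = l2.Num = l3.Num
ORDER BY l1.Num
"""

# Explicit form: each comparison stands on its own and they are joined with AND.
explicit = """
SELECT DISTINCT l1.Num
FROM Logs l1, Logs l2, Logs l3
WHERE l1.Id + 1 = l2.Id AND l2.Id + 1 = l3.Id
  AND l1.Num = l2.Num AND l1.Num = l3.Num
ORDER BY l1.Num
"""

chained_result = [row[0] for row in conn.execute(chained)]
explicit_result = [row[0] for row in conn.execute(explicit)]
single = conn.execute("SELECT 2 = 2 = 2, (2 = 2) = 1").fetchone()

print(chained_result)   # [1, 2]
print(explicit_result)  # [1]
print(single)           # (0, 1)
```

In the chained form the first comparison yields 1 only for adjacent rows, so l3 is forced to be the row with Id 1 (whose Num is also 1). The query therefore matches any two adjacent rows with equal Num, which is why 2 shows up via rows 6 and 7.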
As far as the query itself is concerned, I believe there is a simpler way to do it, using variables in SELECT. With your approach, finding all the numbers that occur 4 times in a row would require another join. Instead, we can do something like this:
SELECT id, num,
       @counter := IF(@previousNum = num, @counter + 1, 1) AS counter,
       @previousNum := num as prevNum
FROM test, (select @counter := 1, @previousNum := -1) a
ORDER BY id;
This returns the id and num values along with a counter that is incremented whenever consecutive ids have the same num value.
Now, if we want to keep only the values with at least 3 consecutive appearances, we need to wrap this query in another select statement, e.g.:
SELECT id
FROM
  (SELECT id, num,
          @counter := IF(@previousNum = num, @counter + 1, 1) AS counter,
          @previousNum := num as prevNum
   FROM test, (select @counter := 1, @previousNum := -1) a
   ORDER BY id) a
WHERE a.counter >= 3;
Here is the SQL Fiddle. You can easily change the counter value to find values with a different number of consecutive appearances.
Say I have two tables: businesses, and reviews for businesses.
businesses table:
+----+-------+
| id | title |
+----+-------+
reviews table:
+----+-------------+---------+------+
| id | business_id | message | rate |
+----+-------------+---------+------+
Each review has a rate (1 to 5 stars).
I want to sort businesses by their review rates, using Bayesian ranking, on the condition that a business has at least 2 reviews.
Here is my query:
SELECT b.id,
(SELECT COUNT(r.rate) as rr FROM reviews r WHERE r.business_id = b.id) as rr,
(SELECT
((COUNT(r.rate) / (COUNT(r.rate) + 2)) AVG(r.rate) +
(2 /(COUNT(r.rate) + 2)) 4)
FROM reviews r where r.business_id = b.id AND rr > 2
) as score
FROM businesses b
order by score desc
LIMIT 4
This outputs:
+------+----+------------+
| id | rr | score |
+------+----+------------+
| 992 | 14 | 4.31250000 |
+------+----+------------+
| 237 | 3 | 4.2000000 |
+------+----+------------+
| 19 | 5 | 4.0000000 |
+------+----+------------+
| 1009 | 12 | 3.9285142 |
+------+----+------------+
I have two questions:
As you can see, in ((COUNT(r.rate) / (COUNT(r.rate) + 2)) AVG(r.rate) +
(2 /(COUNT(r.rate) + 2)) 4) FROM reviews r where r.business_id = b.id AND rr > 2 ) some functions, like COUNT or AVG, are called more than once. Does MySQL run them once in the background and cache the result, or are they run for every single call?
Is there an equivalent but more optimized query?
Thanks in advance.
I would hope that MySQL optimises the multiple counts away, but I am not certain.
However, you could rearrange your query to join against a subquery. This way you are not performing 2 subqueries for every row.
SELECT b.id,
sub0.rr,
sub0.score
FROM businesses b
INNER JOIN
(
SELECT r.business_id,
COUNT(r.rate) AS rr ,
((COUNT(r.rate) / (COUNT(r.rate) + 2)) * AVG(r.rate) + (2 / (COUNT(r.rate) + 2)) * 4) AS score
FROM reviews r
GROUP BY r.business_id
HAVING rr > 2
) sub0
ON sub0.business_id = b.id
ORDER BY score DESC
LIMIT 4
Note that the results here are very slightly different, as this query will exclude records with only 2 reviews, while your query will still return them but with a score of NULL. Your original query appears to be missing two operators (before AVG(r.rate) and before 4) AS score); I have assumed both are multiplications.
Using the above idea you could recode it to return both the count and the average rate in the sub query, and just use the values of those returned columns for the calculation.
SELECT b.id,
sub0.rr,
((rr / (rr + 2)) * arr + (2 / (rr + 2)) * 4) AS score
FROM businesses b
INNER JOIN
(
SELECT r.business_id,
COUNT(r.rate) AS rr ,
AVG(r.rate) AS arr
FROM reviews r
GROUP BY r.business_id
HAVING rr > 2
) sub0
ON sub0.business_id = b.id
ORDER BY score DESC
LIMIT 4
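The score expression in these queries reads as a weighted average of a business's own average rate and a fixed prior, assuming the operators missing before AVG(r.rate) and before 4 are multiplications. A minimal sketch of that arithmetic in Python; the function and parameter names are hypothetical, and the defaults 2 and 4 are the constants taken from the query:

```python
def bayesian_score(review_count: int, average_rate: float,
                   prior_weight: float = 2.0, prior_mean: float = 4.0) -> float:
    """Weighted average of the observed average rate and a prior mean.

    The more reviews a business has, the more weight its own average gets;
    with few reviews the score stays close to prior_mean.
    """
    total_weight = review_count + prior_weight
    return ((review_count / total_weight) * average_rate
            + (prior_weight / total_weight) * prior_mean)


# A business with 5 reviews averaging exactly 4.0 scores about 4.0,
# matching the row with id 19 in the sample output.
print(round(bayesian_score(5, 4.0), 4))

# Hypothetical input: 3 reviews averaging 13/3 gives about 4.2,
# which is consistent with the row with id 237.
print(round(bayesian_score(3, 13 / 3), 4))
```

Computing the count and the average once per business, as the second query above does, and then applying this formula to the two values avoids repeating the aggregates in the expression.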
I have a data set that looks like this:
User | Task | Time
--------|--------|--------
User A | Task X | 100
User A | Task Y | 200
User A | Task Z | 300
User B | Task X | 400
User B | Task Y | 500
User B | Task Z | 600
User C | Task X | 700
User C | Task Y | 800
User C | Task Z | 900
User D | Task X | 1000
User D | Task Y | 1100
User D | Task Z | 1200
When I do my initial grouping, the data looks like this:
| Avg User | Avg Task X | Avg Task Y | Avg Task Z
User | Time | Time | Time | Time
-------|----------|------------|------------|------------
User A | 200 | 100 | 200 | 300
User B | 500 | 400 | 500 | 600
User C | 800 | 700 | 800 | 900
User D | 1100 | 1000 | 1100 | 1200
I need it to look like this:
| Avg User | Avg Task X | Avg Task Y | Avg Task Z
User | Time | Time | Time | Time
------|----------|------------|------------|------------
All | 650 | 550 | 650 | 750
This is how I got those numbers:
650 = (200+500+800+1100) / 4
550 = (100+400+700+1000) / 4
650 = (200+500+800+1100) / 4
750 = (300+600+900+1200) / 4
In other words, I have a column group on Task and a row group on User. The problem is that I want the row group to get summarized an extra time.
At first glance I could just return the user's name back as 'All' and it would summarize but this doesn't actually give me the averages that I need. I need to first SUM the times by user, and then find the average per user. If I change the way the original data is shaped, my task groups will no longer work properly.
If I try to use a "Totals" row on my row group, it aggregates the ORIGINAL data and not the summarized/grouped data. That is rather disappointing because it is actually incorrect in my eyes.
The only way I was able to achieve this type of functionality was to use the Code section of the report. I would keep track of the group data I wanted to summarize in a global variable, which I would later output to the field I wanted.
Here is a Microsoft article describing how to embed code in your report:
http://msdn.microsoft.com/en-us/library/ms159238.aspx
Here is a much more detailed way to solve your problem. Link
Assuming your source is SQL Server 2008 you might be able to use a combination of grouping sets:
http://technet.microsoft.com/en-us/library/bb522495.aspx
And the SSRS Aggregate Function:
http://msdn.microsoft.com/en-us/library/ms155830(v=sql.90).aspx
This blog has an example that may also be helpful
http://beyondrelational.com/blogs/jason/archive/2010/07/03/aggregate-of-an-aggregate-function-in-ssrs.aspx
Good Luck
I would do this in a SQL script; doing it in the report would be overkill (although it would probably be possible).
I have an example script right here:
drop table #tmp, #tmp2, #tmp3
select 'User A' as [User],' Task X ' as [Task],100.00 as [Time]
into #tmp
union all
select 'User A ',' Task Y ',200
union all
select 'User A ',' Task Z ',300
union all
select 'User B ',' Task X ',400
union all
select 'User B ',' Task Y ',500
union all
select 'User B ',' Task Z ',600
union all
select 'User C ',' Task X ',700
union all
select 'User C ',' Task Y ',800
union all
select 'User C ',' Task Z ',900
union all
select 'User D ',' Task X ',1000
union all
select 'User D ',' Task Y ',1100
union all
select 'User D ',' Task Z ',1200
select [User],
Task,
Sum(time) as time
into #tmp2
from #tmp
group by [User],
[Task]
select [User],
avg(time) as time
into #tmp3
from #tmp2
group by [User];
declare @statement nvarchar(max);
select @statement =
'with cteTimes as (
select *
from #tmp2 t
pivot (sum (t.[time]) for Task in (' + stuff((select ', ' + quotename([Task]) from #tmp group by [Task] for xml path, type).value('.','varchar(max)'), 1, 2, '') + ')) as Task
)
select ''All'' as [User],
(select avg(usr.time) from #tmp3 usr),'
+ stuff((select ', avg(' + quotename([Task]) + ') as ' + quotename([Task]) from #tmp group by [Task] for xml path, type).value('.','varchar(max)'), 1, 2, '')
+ ' from cteTimes x ';
exec sp_executesql @statement;
The script can probably be optimized by using a pivot instead of multiple joins while creating the #tmp4.
My example is just explanatory.
Here's the query I would write that works... The "PreQuery" is done to group the counts and sum of each element for a given user... Then that is rolled-up to the top-most level of "All". Now, this is based on your data sample.
SELECT
      AVG( TaskTime / TaskCount ) as TaskAvg,
      SUM( XTime ) / SUM( XCount ) as XAvg,
      SUM( YTime ) / SUM( YCount ) as YAvg,
      SUM( ZTime ) / SUM( ZCount ) as ZAvg
   from
      ( SELECT
              user,
              COUNT(*) as TaskCount,
              SUM( Time ) as TaskTime,
              -- each CASE is wrapped in SUM so it is aggregated per user
              SUM( CASE WHEN Task = "Task X" THEN 1 ELSE 0 END ) as XCount,
              SUM( CASE WHEN Task = "Task X" THEN Time ELSE 0 END ) as XTime,
              SUM( CASE WHEN Task = "Task Y" THEN 1 ELSE 0 END ) as YCount,
              SUM( CASE WHEN Task = "Task Y" THEN Time ELSE 0 END ) as YTime,
              SUM( CASE WHEN Task = "Task Z" THEN 1 ELSE 0 END ) as ZCount,
              SUM( CASE WHEN Task = "Task Z" THEN Time ELSE 0 END ) as ZTime
           FROM
              AllUsersTasks
           group by
              user ) PreQuery;
If a given user can have multiple entries for a single task in your data (for example, User A has three entries for Task X with times of 95, 100 and 105), those three entries sum to 300 and average to 100. This could skew the OVERALL average for that task, and the query would have to be modified. Let me know whether a person can have multiple entries per task in production data. If so, then THAT element would probably need to go into its OWN pre-query in place of the "FROM AllUsersTasks" table.
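The two-level aggregation described in this thread (average per user first, then average those per-user figures) can be checked against the sample data. The following is only a sketch, run on SQLite through Python's sqlite3 module (an assumption made to keep it self-contained); it uses AVG over a CASE without ELSE, so non-matching rows become NULL and are ignored, which is a variation on the SUM/COUNT approach above rather than the same query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AllUsersTasks (user TEXT, task TEXT, time REAL)")

# Rebuild the sample data: User A..D x Task X..Z with times 100, 200, ..., 1200.
sample = [
    (f"User {u}", f"Task {t}", 100.0 * (3 * i + j + 1))
    for i, u in enumerate("ABCD")
    for j, t in enumerate("XYZ")
]
conn.executemany("INSERT INTO AllUsersTasks VALUES (?, ?, ?)", sample)

# Inner query: one row per user with that user's averages.
# Outer query: average of the per-user averages, i.e. the "All" row.
query = """
SELECT AVG(user_avg), AVG(x_avg), AVG(y_avg), AVG(z_avg)
FROM (
    SELECT user,
           AVG(time) AS user_avg,
           AVG(CASE WHEN task = 'Task X' THEN time END) AS x_avg,
           AVG(CASE WHEN task = 'Task Y' THEN time END) AS y_avg,
           AVG(CASE WHEN task = 'Task Z' THEN time END) AS z_avg
    FROM AllUsersTasks
    GROUP BY user
) per_user
"""
result = conn.execute(query).fetchone()
print(result)  # (650.0, 550.0, 650.0, 750.0)
```

Because each user is reduced to a single row before the outer AVG runs, a user with several entries for the same task still counts once in the overall figure.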