Calculating duration percentage - mysql

I have a table which logs the HTTP status code of a website whenever the status changes, so the table looks like this...
id status date
-----------------------------------
1 404 2015-10-01 13:30:00
2 200 2015-10-02 13:30:00
3 404 2015-10-03 13:30:00
I want to use this data to display a table on my website showing how many times each status has been logged and what percentage of the total time, up to the present, each status was in effect.
I have managed to get the total count for each status using the following query...
SELECT
`status`,
COUNT(*) AS `status_count`
FROM `table_name`
GROUP BY `status`
ORDER BY `status`
...when executed gives me something like this...
status status_count
----------------------
200 1
404 2
I would like to modify my SQL to add a duration column to my results, calculated from the date column. My goal is to end up with this...
status status_count duration (%)
-----------------------------------
200 1 20
404 2 80

Here is SQL FIDDLE DEMO
SELECT t1.status
,COUNT(t1.id) as status_count
,SUM(TIMESTAMPDIFF(SECOND, t1.date, IFNULL(t2.date, NOW()))) / TIMESTAMPDIFF(SECOND, t3.start_date, NOW()) as duration
FROM table_name t1
LEFT JOIN table_name t2 ON t1.id = (t2.id - 1)
,(SELECT MIN(date) as start_date FROM table_name) t3
GROUP BY t1.status

My query is more complicated than Nick's, but it gives a different result.
I checked the values in Excel to verify that they are correct.
I shifted the dates to start at 2015-07-01 13:30:00 so that the NOW() function can work.
That gives the following durations in seconds:
status | seconds | share
404 | 86400 (1 day) | 0.05101
200 | 86400 (1 day) | 0.05101
404 | 1521138 (17 days) | 0.89799
total | 1693938 |
Final result:
status | count | share
404 | 2 | 0.94899
200 | 1 | 0.05101
SQL FIDDLE DEMO
SELECT status, Count(status), SUM(secdiff) / MAX(sectotal) as percentage
FROM
(
SELECT
h1.status,
h2.dateupdate d1,
h1.dateupdate d2,
TIMESTAMPDIFF(SECOND,h1.dateupdate, h2.dateupdate) secdiff,
TIMESTAMPDIFF(SECOND,
(SELECT MIN(dateupdate) from logHttp),
NOW()) sectotal
FROM
logHttp as h1 INNER JOIN
(
(Select * from logHttp)
union
(select MAX(id) +1, 0, NOW() from logHttp)
) as h2
On h1.id + 1 = h2.id
) as t1
group by status;
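The arithmetic in this answer can be cross-checked outside the database. The following is only a sketch in Python, not the answer's method: the `log` rows mirror the shifted sample dates, and `now` is an assumed fixed timestamp chosen so that the total span equals the 1693938 seconds quoted above. The name `duration_share` is made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Log rows (status, changed_at) from the worked example above, oldest first.
log = [
    (404, datetime(2015, 7, 1, 13, 30)),
    (200, datetime(2015, 7, 2, 13, 30)),
    (404, datetime(2015, 7, 3, 13, 30)),
]

# Assumption: "now" is pinned so the total span matches the 1693938 seconds above.
now = log[0][1] + timedelta(seconds=1693938)


def duration_share(rows, now):
    """Return {status: (count, share of total time)} for rows sorted by time."""
    total = (now - rows[0][1]).total_seconds()
    counts = defaultdict(int)
    seconds = defaultdict(float)
    # Each status lasts until the next row's timestamp; the last one lasts until now.
    ends = [ts for _, ts in rows[1:]] + [now]
    for (status, start), end in zip(rows, ends):
        counts[status] += 1
        seconds[status] += (end - start).total_seconds()
    return {status: (counts[status], seconds[status] / total) for status in counts}


shares = duration_share(log, now)
for status, (count, share) in sorted(shares.items()):
    print(status, count, round(share, 5))
```

Multiplying each share by 100 gives the percentage column the question asks for.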

Use this query
select a.*,(a.status_count/b.tot)*100 as duration_per from (SELECT
`status`,
COUNT(*) AS `status_count`
FROM `table_name`
GROUP BY `status`
ORDER BY `status`) a join (select count(*) as tot from `table_name`) b

Related

MySQL - SELECT MAX(), and get corresponding fields

The task is: get a list with the ID of every employee and the ID of the last department where they worked. It gets more complicated because one person can work in several departments at the same time, so we need the last department where the employee has the max rate.
table:
ID_employee| ID_department | end_date | rate
1 22 2016-01-01 1
2 25 NULL 0.3
2 27 NULL 1
3 22 2013-12-12 0.5
3 22 2014-05-05 0.5
end_date is the last day the employee worked; a NULL value means the contract is still active today.
The result must look like:
ID_employee | ID_department | end_date | rate
1 22 2016-01-01 1
2 27 NULL 1
3 22 2014-05-05 0.5
I found out how to select max() with corresponding fields by using join:
SELECT table.id_employee, id_department
FROM table
JOIN ( SELECT id_employee,
IF (MAX( end_date IS NULL ) = 1 , "0000-00-00", MAX( end_date )) as max_end_date
FROM table GROUP BY id_employee) maxs ON maxs.id_employee = table.id_employee
WHERE maxs.max_end_date = IFNULL(table.end_date, "0000-00-00")
GROUP BY table.id_employee
However, there are ALL corresponding rows in the result:
ID_employee | ID_department | end_date | rate
1 22 2016-01-01 1
2 25 NULL 0.3
2 27 NULL 1
3 22 2014-05-05 0.5
The question is: how do I get only the rows that correspond not just to MAX(end_date) but to MAX(rate) as well? I assume that HAVING might help, but I still don't know exactly what should go there.
And maybe there are other ways to solve the problem with better performance, because this query takes about 16s while the table has ~30 000 rows.
Could you try with the query below:
SELECT T1.ID_employee,
T1.ID_department,
CASE WHEN maxs.max_end_date = "0000-00-00" THEN NULL ELSE maxs.max_end_date END AS end_date,
T1.rate
FROM TestTable T1
JOIN ( SELECT id_employee,
MAX(ID_department) AS ID_department,
IF (MAX( end_date IS NULL ) = 1, "0000-00-00", MAX( end_date )) AS max_end_date
FROM TestTable
GROUP BY id_employee ) maxs ON maxs.id_employee = T1.id_employee AND maxs.ID_department = T1.ID_department
WHERE maxs.max_end_date = IFNULL(T1.end_date, "0000-00-00")
GROUP BY T1.id_employee
Please find the Live Demo
UPDATE:
As per the comments the following query helped to achieve the result:
SET @CurrentDate := CURDATE();
SELECT T2.ID_employee,
T2.ID_department,
CASE WHEN MR.Max_end_date = @CurrentDate THEN NULL ELSE T2.end_date END AS end_date,
MR.MaxRate AS rate
FROM TestTable T2
JOIN (
SELECT T1.ID_employee, MAX(T1.rate) AS MaxRate, MD.Max_end_date
FROM TestTable T1
JOIN (
SELECT ID_employee,
MAX(CASE WHEN end_date IS NULL THEN @CurrentDate ELSE end_date END) AS Max_end_date
FROM TestTable
GROUP BY ID_employee
) MD ON MD.ID_employee = T1.ID_employee
WHERE MD.Max_end_date = IFNULL(T1.end_date, @CurrentDate)
GROUP BY T1.ID_employee
) MR ON MR.ID_employee = T2.ID_employee AND MR.MaxRate = T2.rate
WHERE MR.Max_end_date = IFNULL(T2.end_date, @CurrentDate)
Working Demo
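To check the selection rule against the sample rows from the question, here is a small Python sketch. It is an illustration of the rule (latest end_date, with NULL treated as "still active", then highest rate), not a replacement for the SQL. The function name `last_department` and the use of `date.max` as a stand-in for NULL are assumptions made for this example.

```python
from datetime import date

# Rows (id_employee, id_department, end_date, rate); None means the contract is still active.
rows = [
    (1, 22, date(2016, 1, 1), 1.0),
    (2, 25, None, 0.3),
    (2, 27, None, 1.0),
    (3, 22, date(2013, 12, 12), 0.5),
    (3, 22, date(2014, 5, 5), 0.5),
]


def last_department(rows):
    """Keep, per employee, the row with the latest end_date, breaking ties by highest rate."""
    best = {}
    for row in rows:
        emp, _dept, end, rate = row
        # A missing end_date (still active) must win over every real date.
        key = (end or date.max, rate)
        if emp not in best or key > best[emp][0]:
            best[emp] = (key, row)
    # Return the winning rows ordered by employee id.
    return [row for _key, row in sorted(best.values(), key=lambda kr: kr[1][0])]


result = last_department(rows)
for row in result:
    print(row)
```

On the sample data this keeps department 27 for employee 2, because both of that employee's rows are still active and 27 has the higher rate.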
I think this query will work for you.
SELECT ID_employee, ID_department, end_date, MAX(rate)
FROM test_max
GROUP BY ID_employee

Calculate average days between events?

If I have a table that includes:
user_id | event_time
How can I calculate the average days between events? To get something like:
days_diff | count
1 | 100
2 | 90
3 | 20
A user may have 1 day between events, but may also have 3 days between two subsequent events. How can I count them in both buckets?
Sample data (note in this case the DAY DIFF is 0/1 but this is just a small subset of data)
user_id | event_time
82770 2015-05-04 02:34:53
1 2015-05-04 08:45:53
82770 2015-05-04 20:38:24
82770 2015-05-04 20:38:24
82770 2015-05-04 20:38:24
1 2015-05-05 09:31:42
82770 2015-05-05 13:33:36
82770 2015-05-05 13:33:53
1 2015-05-06 09:53:59
1 2015-05-06 23:31:18
1 2015-05-06 23:31:35
1 2015-05-07 12:31:41
82770 2015-05-07 16:01:16
Here's a solution without using a temporary table:
select daybetweenevents as days_diff,
count(daybetweenevents) as count
from (select t1.user_id,
t1.event_time,
datediff(day, t1.event_time, min(t2.event_time)) as daybetweenevents
from yourtable t1
inner join yourtable t2
on t1.user_id = t2.user_id
and t1.event_time < t2.event_time
group by t1.user_id, t1.event_time) temp
group by daybetweenevents
Use DATEDIFF and a correlated subquery to get the previous date for the same user.
SELECT user_id, event_time,
DATEDIFF(event_time, (SELECT MAX(event_time)
FROM yourtable
WHERE user_id = a.user_id
AND event_time < a.event_time)) AS days_diff
FROM yourtable AS a
I went with a temporary table of sorted user events to make the correlation lookup easier and handle users with more than two events. This should get you the output you are asking for.
create table #tempOrderedUserEvents
(
id int identity (1,1),
userid int,
event_time datetime
)
insert into #tempOrderedUserEvents (userid, event_time)
select [user_id], event_time
from YourUserDataTable A
order by [user_id], event_time
select interval, count(*) as [count]
from
(
select A.userid, datediff(day, A.event_time, B.event_time) as interval
from #tempOrderedUserEvents A
JOIN #tempOrderedUserEvents B on A.id+1 = B.id and A.userid = B.userid
) as Intervals
group by interval
drop table #tempOrderedUserEvents
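The bucketing these answers aim for can be tried on the sample rows from the question with a short Python sketch. It assumes that "days between events" means the number of calendar-day boundaries crossed between one event and the same user's next event, which is how DATEDIFF on days behaves. The name `gap_histogram` is hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Sample rows (user_id, event_time) copied from the question.
events = [
    (82770, "2015-05-04 02:34:53"), (1, "2015-05-04 08:45:53"),
    (82770, "2015-05-04 20:38:24"), (82770, "2015-05-04 20:38:24"),
    (82770, "2015-05-04 20:38:24"), (1, "2015-05-05 09:31:42"),
    (82770, "2015-05-05 13:33:36"), (82770, "2015-05-05 13:33:53"),
    (1, "2015-05-06 09:53:59"), (1, "2015-05-06 23:31:18"),
    (1, "2015-05-06 23:31:35"), (1, "2015-05-07 12:31:41"),
    (82770, "2015-05-07 16:01:16"),
]


def gap_histogram(events):
    """Return {days_diff: count} over consecutive events of the same user."""
    by_user = defaultdict(list)
    for user_id, ts in events:
        by_user[user_id].append(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"))
    buckets = Counter()
    for times in by_user.values():
        times.sort()
        # Pair every event with the next event of the same user.
        for prev, cur in zip(times, times[1:]):
            buckets[(cur.date() - prev.date()).days] += 1
    return dict(sorted(buckets.items()))


histogram = gap_histogram(events)
print(histogram)
```

Every gap lands in exactly one bucket, so a user with a 1-day gap and a later 3-day gap contributes once to each.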

Finding a previous, non-contiguous date using SQL

Suppose a table, tableX, like this:
| date | hours |
| 2014-07-02 | 10 |
| 2014-07-03 | 10 |
| 2014-07-07 | 20 |
| 2014-07-08 | 40 |
The dates are 'workdays' -- that is, no weekends or holidays.
I want to find the increase in hours between consecutive workdays, like this:
| date | hours |
| 2014-07-03 | 0 |
| 2014-07-07 | 10 |
| 2014-07-08 | 20 |
The challenge is dealing with the gaps. If there were no gaps, something like
SELECT t1.date1 AS 'first day', t2.date1 AS 'second day', (t2.hours - t1.hours)
FROM tableX t1
LEFT JOIN tableX t2 ON t2.date1 = DATE_add(t1.date1, INTERVAL 1 DAY)
ORDER BY t2.date1;
would get it done, but that doesn't work in this case as there is a gap between 2014-07-03 and 2014-07-07.
Just use a correlated subquery instead. You have two fields, so you can do this with two correlated subqueries, or a correlated subquery with a join back to the table. Here is the first version:
SELECT t.date1 as `first day`,
(select t2.date1
from tableX t2
where t2.date1 > t.date1
order by t2.date1 asc
limit 1
) as `next day`,
(select t2.hours
from tableX t2
where t2.date1 > t.date1
order by t2.date1 asc
limit 1
) - t.hours
FROM tableX t
ORDER BY t.date1;
Another alternative is to rank the data by date and then subtract the hours of the previous workday's date from the hours of the current workday's date.
SELECT
ranked_t1.date1 date,
ranked_t1.hours - ranked_t2.hours hours
FROM
(
SELECT t.*,
@rownum := @rownum + 1 AS rank
FROM (SELECT * FROM tableX ORDER BY date1) t,
(SELECT @rownum := 0) r
) ranked_t1
INNER JOIN
(
SELECT t.*,
@rownum2 := @rownum2 + 1 AS rank
FROM (SELECT * FROM tableX ORDER BY date1) t,
(SELECT @rownum2 := 0) r
) ranked_t2
ON ranked_t2.rank = ranked_t1.rank - 1;
SQL Fiddle demo
Note:
Obviously an index on tableX.date1 would speed up the query.
Instead of a correlated subquery, a join is used in the above query.
Reference:
Mysql rank function on SO
Unfortunately, MySQL before version 8.0 doesn't have analytic functions which would allow you to access the "previous row" or the "next row" of the data stream (8.0 added LAG() and LEAD()). On older versions, you can duplicate it with this:
select h2.LogDate, h2.Hours - h1.Hours as Added_Hours
from Hours h1
left join Hours h2
on h2.LogDate =(
select Min( LogDate )
from Hours
where LogDate > h1.LogDate )
where h2.LogDate is not null;
Check it out here. Note the index on the date field. If that field is not indexed, this query will take forever.
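All of the approaches in this thread pair each workday with the previous row in date order rather than with the previous calendar day. A minimal Python sketch of that pairing on the question's sample table follows; the function name `increases` is made up for illustration.

```python
from datetime import date

# (date, hours) rows from the question; weekends and holidays are simply absent.
rows = [
    (date(2014, 7, 2), 10),
    (date(2014, 7, 3), 10),
    (date(2014, 7, 7), 20),
    (date(2014, 7, 8), 40),
]


def increases(rows):
    """Return (date, increase in hours over the previous listed workday)."""
    ordered = sorted(rows)
    # zip pairs each row with its successor, so gaps in the calendar do not matter.
    return [
        (cur_day, cur_hours - prev_hours)
        for (_prev_day, prev_hours), (cur_day, cur_hours) in zip(ordered, ordered[1:])
    ]


deltas = increases(rows)
for day, delta in deltas:
    print(day.isoformat(), delta)
```

The first workday has no predecessor and therefore produces no row, which matches the expected output in the question.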

SQL: Find the biggest percentage change in response time

I'm trying to calculate and list the websites in order of biggest overall reduction in response time from one time period to the next.
I don't strictly need to use a single query to do this, I can potentially run multiple queries.
websites:
| id | url |
| 1 | stackoverflow.com |
| 2 | serverfault.com |
| 3 | stackexchange.com |
responses:
| id | website_id | response_time | created_at |
| 1 | 1 | 93.26 | 2014-01-28 11:51:39 |
| 2 | 1 | 99.46 | 2014-01-28 11:52:38 |
| 2 | 1 | 94.51 | 2014-01-28 11:53:38 |
| 2 | 1 | 104.46 | 2014-01-28 11:54:38 |
| 2 | 1 | 85.46 | 2014-01-28 11:56:38 |
| 2 | 1 | 100.00 | 2014-01-28 11:57:36 |
| 2 | 1 | 50.00 | 2014-01-28 11:58:37 |
| 2 | 2 | 100.00 | 2014-01-28 11:58:38 |
| 2 | 2 | 80 | 2014-01-28 11:58:39 |
Ideally the result would look like:
| percentage_change | website_id |
| 52 | 1 |
| 20 | 2 |
I've got as far as figuring out the largest response time, but no idea how to do another query to calculate the lowest response time and then do the math, then sort the maths.
SELECT * FROM websites
LEFT JOIN (
SELECT distinct *
FROM responses
ORDER BY response_time desc) responsetable
ON websites.id=responsetable.website_id group by website_id
Thanks
You need the equivalent of the lag() or lead() function. In MySQL, I do this using a correlated subquery:
select website_id, max(1 - (response_time / prev_response_time)) * 100
from (select t.*,
(select t2.response_time
from responses t2
where t2.website_id = t.website_id and
t2.created_at < t.created_at
order by t2.created_at desc
limit 1
) as prev_response_time
from responses t
) t
group by website_id;
EDIT:
If you want the change from the highest to the lowest:
select website_id, (1 - min(response_time) / max(response_time)) * 100
from responses t
group by website_id;
Using a couple of sequence numbers:-
SELECT a.id, a.url, MAX(100 * (LeadingResponse.response_time - TrailingResponse.response_time) / LeadingResponse.response_time)
FROM
(
SELECT website_id, created_at, response_time, @aCnt1 := @aCnt1 + 1 AS SeqCnt
FROM responses
CROSS JOIN
(
SELECT @aCnt1:=1
) Deriv1
ORDER BY website_id, created_at
) TrailingResponse
INNER JOIN
(
SELECT website_id, created_at, response_time, @aCnt2 := @aCnt2 + 1 AS SeqCnt
FROM responses
CROSS JOIN
(
SELECT @aCnt2:=2
) Deriv2
) Deriv2
ORDER BY website_id, created_at
) LeadingResponse
ON LeadingResponse.SeqCnt = TrailingResponse.SeqCnt
AND LeadingResponse.website_id = TrailingResponse.website_id
INNER JOIN websites a
ON LeadingResponse.website_id = a.id
GROUP BY a.id, a.url
SQL fiddle for this:-
http://www.sqlfiddle.com/#!2/ace08/1
EDIT - different way of doing it. This will only work if the id on the responses table is in date / time order.
SELECT a.id, a.url, MAX(100 * (r2.response_time - r1.response_time) / r2.response_time)
FROM responses r1
INNER JOIN responses r2
ON r1.website_id = r2.website_id
INNER JOIN
(
SELECT r1.website_id, r1.id, MAX(r2.id) AS prev_id
FROM responses r1
INNER JOIN responses r2
ON r1.website_id = r2.website_id
AND r1.id > r2.id
GROUP BY r1.website_id, r1.id
) ordering_query
ON r1.website_id = ordering_query.website_id
AND r1.id = ordering_query.id
AND r2.id = ordering_query.prev_id
INNER JOIN websites a
ON r1.website_id = a.id
GROUP BY a.id, a.url
You could do a similar thing based on the response_time field rather than the id, but that would require the response_time for a website to be unique.
EDIT
Just seen that you do not just want consecutive changes, rather just the highest to lowest response. Assuming that the lowest doesn't have to come after the highest:-
SELECT id, url, MAX(100 * (max_response - min_response) / max_response)
FROM
(
SELECT a.id, a.url, MIN(r1.response_time) AS min_response, MAX(r1.response_time) AS max_response
FROM responses r1
INNER JOIN websites a
ON r1.website_id = a.id
GROUP BY a.id, a.url
) Sub1
If you are only interested in the lower response time being after the higher one:-
SELECT id, url, MAX(100 * (max_response - min_following_response) / max_response)
FROM
(
SELECT a.id, a.url, MAX(r1.response_time) AS max_response, MIN(r2.response_time) AS min_following_response
FROM responses r1
INNER JOIN responses r2
ON r1.website_id = r2.website_id
AND (r1.created_at < r2.created_at
OR (r1.created_at = r2.created_at
AND r1.id < r2.id))
INNER JOIN websites a
ON r1.website_id = a.id
GROUP BY a.id, a.url
) Sub1
(assuming that the id field on the response table is unique and in created at order)
From your "I've got as far as figuring out the largest response time, but no idea how to do another query to calculate the lowest response time and then do the math, then sort the maths." I understand that you want the smallest response time and the largest response time, and then to do your math.
drop table #test
create table #test(
id int, website_id int, response_time decimal, created_at datetime)
insert into #test (id , website_id , response_time , created_at) values ( 1 , 1 , 93.26, '2014-01-28 11:51:39')
insert into #test (id , website_id , response_time , created_at) values ( 2 , 1 , 99.46 , '2014-01-28 11:52:38')
insert into #test (id , website_id , response_time , created_at) values ( 2 , 1 , 94.51 , '2014-01-28 11:53:38')
insert into #test (id , website_id , response_time , created_at) values ( 2 , 1 , 104.46 , '2014-01-28 11:54:38')
insert into #test (id , website_id , response_time , created_at) values ( 2 , 1 , 85.46 , '2014-01-28 11:56:38')
insert into #test (id , website_id , response_time , created_at) values ( 2 , 1 , 100.00 , '2014-01-28 11:57:38')
insert into #test (id , website_id , response_time , created_at) values ( 2 , 1 , 50.00 , '2014-01-28 11:58:38')
insert into #test (id , website_id , response_time , created_at) values ( 2 , 2 , 100.00 , '2014-01-28 11:58:38')
insert into #test (id , website_id , response_time , created_at) values ( 2 , 2 , 80 , '2014-01-28 11:58:38')
select * from #test
select distinct MINT.website_id,MINT.MINRT,maxT.MAXRT,(MINT.MINRT/maxT.MAXRT)*100--Do your calculation here---
from #test t0
inner join(select min(response_time) as MINRT,website_id from #test group by website_id ) MINT
on MINT.website_id = t0.website_id
inner join(select max(response_time) as MAXRT,website_id from #test group by website_id ) maxT
on maxT.website_id = t0.website_id
You want to divide the minimum response time by the maximum response time per website? That would simply be:
select
websites.id as website_id,
100 - min(response_time) / max(response_time) * 100 as percentage_change
from websites
left join responses on websites.id = responses.website_id
group by websites.id;
(I assume response_time can never be zero. In case it can, you will have to use a case statement for that.)
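The min/max calculation in the query above can be cross-checked against the question's sample data with a short Python sketch. It assumes, as the answer does, that the change is measured from the highest to the lowest response time regardless of order, and it adds an explicit guard for a zero maximum. The name `max_to_min_drop` is hypothetical.

```python
from collections import defaultdict

# (website_id, response_time) pairs taken from the responses table in the question.
responses = [
    (1, 93.26), (1, 99.46), (1, 94.51), (1, 104.46),
    (1, 85.46), (1, 100.00), (1, 50.00),
    (2, 100.00), (2, 80.0),
]


def max_to_min_drop(responses):
    """Return [(website_id, percentage drop from max to min)], largest drop first."""
    times = defaultdict(list)
    for website_id, response_time in responses:
        times[website_id].append(response_time)
    drops = {}
    for website_id, values in times.items():
        highest = max(values)
        # Guard against division by zero, which the SQL assumes cannot happen.
        drops[website_id] = 0.0 if highest == 0 else 100 * (highest - min(values)) / highest
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)


ranking = max_to_min_drop(responses)
for website_id, drop in ranking:
    print(website_id, round(drop))
```

Rounded to whole percentages, this reproduces the 52 and 20 shown in the question's desired result.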
Group the response times by website_id, find MIN(response_time) and MAX(response_time) and compare whether MIN() happened after MAX() to filter only websites which improved their performance.
<?php
$rows = $db->fetchAll('
select
r.website_id, min(r.response_time) min_time, max(r.response_time) max_time,
(select
rmin.created_at
FROM
responses rmin
WHERE
rmin.response_time = min(r.response_time) AND
rmin.website_id = r.website_id
ORDER BY rmin.created_at
LIMIT 1) min_created_at,
(select
rmax.created_at
FROM
responses rmax
WHERE
rmax.response_time = max(r.response_time) AND
rmax.website_id = r.website_id
ORDER BY rmax.created_at DESC
LIMIT 1) max_created_at
FROM
responses r
GROUP BY
r.website_id');
foreach($rows as $row) {
if($row['max_created_at'] < $row['min_created_at']) {
echo 'Website '.$row['website_id'].
' improved by '.
(100 - (($row['min_time'] / $row['max_time']) * 100)).
"%\n";
}
}
However, the query will most probably be pretty slow with large datasets. You'll have to optimize the indexes and/or the query.
sqlfiddle: http://www.sqlfiddle.com/#!2/fa8f9/8

same column calculation / percentage

How can I, in MySQL, compare a user's score on one date with their score on another date, effectively returning the user's percentage increase from one date to the other?
I have been trying to wrap my head around this for a few days, am running out of ideas, and feel my SQL knowledge is limited. I'm not sure whether I should use a join or a subquery. The MySQL table consists of 3 fields: name, score, and date.
TABLE: userData
name score date
joe 5 2014-01-01
bob 10 2014-01-01
joe 15 2014-01-08
bob 12 2014-01-08
returned query idea
user %inc last date
joe 33% 2014-01-08
bob 17% 2014-01-08
It seems like such a simple thing for a database to do, yet it is beyond my grasp.
You need to use SUBQUERIES. Something like:
SELECT name,
((SELECT score
FROM userData as u2
WHERE u2.name = u1.name
ORDER BY date desc
LIMIT 1
)
/
(
SELECT score
FROM userData as u3
WHERE u3.name = u1.name
ORDER BY date desc
LIMIT 1,1
)
* 100.0
) as inc_perc,
max(date) as last_date
FROM userData as u1
GROUP BY name
Simple solution assuming that the formula for %Inc column = total/sum *100
select name,total/sum * 100, date from (
select name,sum(score) as total,count(*) as num,date from table group by name
)as resultTable
select a.name as [user],(cast(cast(b.score as float)-a.score as float)/cast(a.score as float))*100 as '% Inc',b.[date] as lastdate
from userdata a inner join userdata b on a.name = b.name and a.date < b.date
I guess you are looking for the % increase in the score compared to the past date
Another way. Note that my result differs from yours: going by the name "percinc" (percentage increase), I calculated it the way I believe is correct. If you want your result, just calculate it with t1.score / t2.score * 100:
Sample data:
CREATE TABLE t
(`name` varchar(3), `score` int, `date` varchar(10))
;
INSERT INTO t
(`name`, `score`, `date`)
VALUES
('joe', 5, '2014-01-01'),
('bob', 10, '2014-01-01'),
('joe', 15, '2014-01-08'),
('bob', 12, '2014-01-08')
;
Query:
select
t1.name,
t1.score first_score,
t1.date first_date,
t2.score last_score,
t2.date last_date,
t2.score / t1.score * 100 percinc
from
t t1
join t t2 on t1.name = t2.name
where
t1.date = (select min(date) from t where t.name = t1.name)
and t2.date = (select max(date) from t where t.name = t1.name);
Result:
| NAME | FIRST_SCORE | FIRST_DATE | LAST_SCORE | LAST_DATE | PERCINC |
|------|-------------|------------|------------|------------|---------|
| joe | 5 | 2014-01-01 | 15 | 2014-01-08 | 300 |
| bob | 10 | 2014-01-01 | 12 | 2014-01-08 | 120 |
live demo
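Since the answers in this thread disagree on the formula, the following Python sketch computes two candidate figures side by side on the sample data: the last score as a percentage of the first (what the query above returns as percinc) and the relative increase over the first score. The function name and the dictionary keys are made up for this example, and which formula the asker actually wants is left open.

```python
# (name, score, date) rows from the sample data above.
scores = [
    ("joe", 5, "2014-01-01"),
    ("bob", 10, "2014-01-01"),
    ("joe", 15, "2014-01-08"),
    ("bob", 12, "2014-01-08"),
]


def first_last_change(scores):
    """Compare each user's earliest score with their latest score."""
    first, last = {}, {}
    # ISO dates sort correctly as strings, so sorting on the date field is enough.
    for name, score, day in sorted(scores, key=lambda row: row[2]):
        first.setdefault(name, score)  # keeps the earliest score per user
        last[name] = (score, day)      # ends up holding the latest score per user
    return {
        name: {
            "last_date": last[name][1],
            "ratio_pct": 100 * last[name][0] / first[name],
            "increase_pct": 100 * (last[name][0] - first[name]) / first[name],
        }
        for name in first
    }


changes = first_last_change(scores)
for name, figures in changes.items():
    print(name, figures)
```

For joe this gives a ratio of 300 and an increase of 200; for bob, 120 and 20.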