How to use GROUP BY with subquery range? - mysql

I'm quite new here and I am tinkering with MySQL to get a sort of pivot table.
For now I'm stuck here:
SELECT `range`,
Sum(IF(`Vrange` = '< 5',1,0)) as `<5`,
Sum(IF(`Vrange` = ' 5-10',1,0)) as `5-10`,
Sum(IF(`Vrange` = ' 10-15',1,0)) as `10-15`,
Sum(IF(`Vrange` = ' 15-20',1,0)) as `15-20`,
Sum(IF(`Vrange` = ' 20-25',1,0)) as `20-25`,
Sum(IF(`Vrange` = ' 25-30',1,0)) as `25-30`,
Sum(IF(`Vrange` = '> 30',1,0)) as `>30`
FROM(
select `Time`,`HDG`, `Vitesse`,
case
when `HDG` between 1 and 90 then ' 0-90'
when `HDG` between 91 and 180 then ' 91-180'
when `HDG` between 181 and 270 then ' 181-270'
else '271-360'
end as `range`,
case
when `Vitesse` between 0 and 5 then '< 5'
when `Vitesse` between 6 and 10 then ' 5-10'
when `Vitesse` between 11 and 15 then ' 10-15'
when `Vitesse` between 16 and 20 then ' 15-20'
when `Vitesse` between 21 and 25 then ' 20-25'
when `Vitesse` between 25 and 30 then ' 25-30'
else '> 30'
end as `Vrange`
from DataPort
WHERE `Time` > now() - interval 1 day
ORDER BY `Time` DESC
)as SQ
GROUP BY `range`;
I get the following answer:
| range | <5 | 5-10 | ...
|---------|---------|------------|--------
| 0-90 | 5 | 3 |
| 180-270 | 12 | 20 |
I would like every range (0-90 / 91-180 / 181-270 / 271-360) to appear as its own row, even when it has no data. How can I do that? Something like this:
| range | <5 | 5-10 | ...
|---------|---------|------------|--------
| 0-90 | 1 | 1 |
| 91-180 |0 or null| 0 or null |
| 180-270 | 12 | 20 |
| 271-360 |0 or null| 0 or null |
Many thanks in advance

Welcome to S/O. This should help get what you had going. You did not need to do an explicit pre-query to get ranges, then sum again in the outer query for the counts.
select
-- RANGE is a reserved word in MySQL 8, so the alias is back-ticked
AllRanges.Required as `Range`,
sum( case when DP.Vitesse >= 0 and DP.Vitesse < 5 then 1 else 0 end ) ' < 5',
sum( case when DP.Vitesse >= 5 and DP.Vitesse < 10 then 1 else 0 end ) '5-10',
sum( case when DP.Vitesse >= 10 and DP.Vitesse < 15 then 1 else 0 end ) '10-15',
sum( case when DP.Vitesse >= 15 and DP.Vitesse < 20 then 1 else 0 end ) '15-20',
sum( case when DP.Vitesse >= 20 and DP.Vitesse < 25 then 1 else 0 end ) '20-25',
sum( case when DP.Vitesse >= 25 and DP.Vitesse < 30 then 1 else 0 end ) '25-30',
sum( case when DP.Vitesse >= 30 then 1 else 0 end ) '>30'
from
-- one row per heading range, so every range appears in the result
( select '0-90' Required
UNION select '91-180'
UNION select '181-270'
UNION select '271-360' ) AllRanges
LEFT JOIN DataPort DP
ON AllRanges.Required =
case when DP.HDG >= 0 and DP.HDG <= 90 then '0-90'
when DP.HDG > 90 and DP.HDG <= 180 then '91-180'
when DP.HDG > 180 and DP.HDG <= 270 then '181-270'
else '271-360' end
AND DP.`Time` > now() - interval 1 day
-- group on the range list itself: DP.HDG is NULL for ranges without data,
-- so grouping on the CASE would fold every empty range into '271-360'
group by
AllRanges.Required
Now, having said that (and the above will work), I would like to point out some less-than-optimal parts of it.
Your "HDG" is, I believe, a directional heading, which technically always falls within 0-359 degrees, since 360 is back to 0.
In your Vitesse brackets (I don't know whether you have fractional / decimal values), the labels share boundaries, such as 5-10 and 10-15. Shouldn't 10 fall within only one of the brackets? Your BETWEEN tested 11 to 15, so shouldn't the column label match that? Also, with decimal values, something like 5.5 passes neither BETWEEN 0 AND 5 nor BETWEEN 6 AND 10 and lands in the ELSE bucket.
Your result columns should have plain names: no spaces, and especially no special characters, dashes, etc. The result should be a table with simple column names. It is the job of your OUTPUT layer, such as a report or web page, to show headings with proper context, rather than the query naming the columns as you were attempting.
Finally, be careful with your column names, such as 'Time'. Try NOT to use SQL keywords as table or column names. Take a look at the commands available, function names, etc. Instead of just Time, maybe use EntryTime, LogTime, CreateTime, or similar. With a bit more explicit context you'll avoid having to add tick marks to everything. Also, qualifying with table.column or alias.column helps prevent ambiguity when joining multiple tables that have similar column names.
Just trying to suggest improvements for this and future queries as you grow with SQL.
FEEDBACK
Regarding the issue of not getting all ranges, I have revised the query. Notice the inner (select ---- ranges) AllRanges, which is LEFT JOINed to the DataPort table. The primary table is now the AllRanges alias, with 4 rows, one for each range you want. THEN I LEFT JOIN to the DataPort table. The join condition is the AllRanges.Required column matching the conditional CASE, PLUS the Time condition.
If you put the Time condition in a WHERE clause instead, the rows for ranges with no match (where DP.`Time` is NULL) are filtered out, which effectively turns the LEFT JOIN into an INNER JOIN and prevents you from getting all 4 ranges.
Should be good to go now.
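To see the ON versus WHERE difference in isolation, here is a minimal sketch. It runs against an in-memory SQLite database from Python rather than MySQL, and the `recent` column is a hypothetical stand-in for the `Time` > now() - interval 1 day test; the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DataPort (HDG INTEGER, Vitesse INTEGER, recent INTEGER);
    -- recent = 1 stands in for: `Time` > now() - interval 1 day
    INSERT INTO DataPort VALUES (45, 3, 1), (60, 7, 1), (200, 12, 1), (120, 4, 0);
""")

# Derived table listing every heading range we want to see in the output.
ranges = """
    (SELECT '0-90' AS Required UNION ALL SELECT '91-180'
     UNION ALL SELECT '181-270' UNION ALL SELECT '271-360') AS AllRanges
"""
# Maps a heading to its range label.
bucket = """
    CASE WHEN DP.HDG <= 90 THEN '0-90'
         WHEN DP.HDG <= 180 THEN '91-180'
         WHEN DP.HDG <= 270 THEN '181-270'
         ELSE '271-360' END
"""

# Filter inside ON: ranges without a recent row survive with a count of 0.
on_rows = dict(conn.execute(f"""
    SELECT AllRanges.Required, COUNT(DP.HDG)
    FROM {ranges} LEFT JOIN DataPort DP
      ON AllRanges.Required = {bucket} AND DP.recent = 1
    GROUP BY AllRanges.Required
""").fetchall())

# Filter in WHERE: the NULL-extended rows are discarded after the join.
where_rows = dict(conn.execute(f"""
    SELECT AllRanges.Required, COUNT(DP.HDG)
    FROM {ranges} LEFT JOIN DataPort DP
      ON AllRanges.Required = {bucket}
    WHERE DP.recent = 1
    GROUP BY AllRanges.Required
""").fetchall())

print(on_rows)     # all four ranges, two of them with 0
print(where_rows)  # only the ranges that have recent rows
```

Grouping by AllRanges.Required (and counting a DataPort column rather than `*`) is what keeps the empty ranges visible with a zero.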

I am not sure anything is wrong with your code... Are you saying you are missing "91-180" and "271-360"?
Are you sure you have rows that match that range in your subquery?

Related

How to get all records before any specified month in mysql

I want to select all records before any specified month using mysql. Here is my attempted statement:
SELECT SUM(amount) as allPreviousAmount FROM `fn_table`
WHERE MONTH(transdate) < 1 AND YEAR(transdate) = YEAR(CURRENT_DATE())
transdate is datetime data type.
I have data for December 2018, but this does not select it. Even after I remove the YEAR part, no data is selected. The transdate is 2018-12-31 15:59:41.
Please fix it and explain why this is not working.
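MONTH() returns a value from 1 to 12, so MONTH(transdate) < 1 is never true, and the YEAR condition additionally excludes anything from 2018. This will do (assuming there are no future dates):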
SELECT SUM(amount) as allPreviousAmount
FROM `fn_table`
WHERE MONTH(transdate) < ? OR YEAR(transdate) < YEAR(CURRENT_DATE())
Replace ? with the month that you want the results for.
Multiply the year by 100, add the month (on both sides), and compare.
set @dt1 = '2019-10-01';
select @dt1,current_date,
year(@dt1) * 100 + month(@dt1),
case
when year(@dt1) * 100 + month(@dt1) < year(current_date) * 100 + month(current_date) then
'Less than'
else 'other'
end as result;
+------------+--------------+--------------------------------+-----------+
| @dt1 | current_date | year(@dt1) * 100 + month(@dt1) | result |
+------------+--------------+--------------------------------+-----------+
| 2019-10-01 | 2019-11-14 | 201910 | Less than |
+------------+--------------+--------------------------------+-----------+
1 row in set (0.00 sec)
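The same year * 100 + month idea, sketched in Python so the comparison can be checked by hand. The helper names year_month_key and is_before_month are made up for this illustration; the dates come from the question.

```python
from datetime import date


def year_month_key(d: date) -> int:
    """Collapse a date to a sortable YYYYMM integer, e.g. 2018-12-31 -> 201812."""
    return d.year * 100 + d.month


def is_before_month(d: date, reference: date) -> bool:
    """True when d falls in a calendar month strictly before reference's month."""
    return year_month_key(d) < year_month_key(reference)


transdate = date(2018, 12, 31)
reference = date(2019, 1, 15)

# Comparing the month number alone fails across a year boundary (12 < 1 is False).
print(transdate.month < reference.month)
# Comparing the combined key works (201812 < 201901 is True).
print(is_before_month(transdate, reference))
```

Because the year is the more significant part of the key, December of one year always sorts before January of the next.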

MySQL - How to check continuity of data

I have a database containing information about sick days of employees in the following structure ( example ):
date || login
2018-01-02 || TestLogin1
2018-01-03 || TestLogin2
2018-01-04 || TestLogin5
2018-01-05 || TestLogin1
2018-01-06 || TestLogin2
And I want to check whether someone had 23 Sick Days in a row within previous 60 days.
I know how to do this in PHP using loops, but I was wondering whether it is possible to do this in raw MySQL.
This is the output I want to achieve:
login || NumberOfDaysOnSickLeaveWithinPrevious2Month
TestLogin4 || 32
TestLogin7 || 30
TestLogin12 || 20
TestLogin3 || 15
TestLogin1 || 10
Will be thankful for the support,
Thanks in advance,
Your sample data suggests that you just want aggregation:
select login,
count(*) as NumberOfDaysOnSickLeaveWithinPrevious2Month
from t
where date >= curdate() - interval 2 month
group by login;
That has nothing to do with "consecutive days". But your sample data doesn't even show two days in a row with the same login -- nor even any dates within the past two months.
It's a lot easier to develop this if you shrink the numbers, for example to 2 or more consecutive days absent in the last 5 days.
drop table if exists t;
create table t(employee_id int, dt date);
insert into t values
(1,'2018-07-10'),(1,'2018-07-11'),(1,'2018-07-12'),
(2,'2018-07-10'),(2,'2018-07-15'),
(3,'2018-07-10'),(3,'2018-07-11'),(3,'2018-07-13'),(3,'2018-07-14')
;
select employee_id, bn, count(*)
from
(
select t.*, concat(employee_id,year(dt) * 10000 + month(dt) * 100 + day(dt))
- @p = 1 diff,
if(
concat(employee_id,year(dt) * 10000 + month(dt) * 100 + day(dt))
- @p = 1 ,@bn:=@bn,@bn:=@bn+1) bn,
@p:=concat(employee_id,year(dt) * 10000 + month(dt) * 100 + day(dt)) p
from t
cross join (select @bn:=0,@p:=0) b
where dt >= date_add(date(now()), interval -5 day)
order by employee_id,dt
) s
group by employee_id,bn having count(*) >= 2 ;
+-------------+------+----------+
| employee_id | bn | count(*) |
+-------------+------+----------+
| 1 | 1 | 3 |
| 3 | 4 | 2 |
| 3 | 5 | 2 |
+-------------+------+----------+
3 rows in set (0.06 sec)
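Note the use of variables to work out a block number, and the HAVING clause. Concatenating employee and date creates a pseudo key and simplifies the calculation. Be aware that the yyyymmdd number does not increase by 1 across a month boundary (20180731 to 20180801), so a run that spans two months is split into separate blocks.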
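For comparison, here is a rough sketch of the same "block of consecutive days" idea in Python. It is not the SQL above: it swaps the yyyymmdd pseudo key for the date's ordinal (day count), which does increase by exactly 1 across month boundaries. The function name longest_streaks is hypothetical, and the sample rows mirror the test data in the answer plus one made-up month-boundary case.

```python
from datetime import date
from itertools import groupby


def longest_streaks(rows):
    """rows: iterable of (login, date). Returns {login: longest run of consecutive days}."""
    days_by_login = {}
    for login, day in rows:
        days_by_login.setdefault(login, set()).add(day)

    result = {}
    for login, days in days_by_login.items():
        ordered = sorted(days)
        # Within a run of consecutive dates, (ordinal - position) stays constant,
        # so it serves as the block number.
        blocks = groupby(enumerate(ordered), key=lambda p: p[1].toordinal() - p[0])
        result[login] = max(sum(1 for _ in block) for _, block in blocks)
    return result


rows = [
    (1, date(2018, 7, 10)), (1, date(2018, 7, 11)), (1, date(2018, 7, 12)),
    (2, date(2018, 7, 10)), (2, date(2018, 7, 15)),
    (3, date(2018, 7, 10)), (3, date(2018, 7, 11)),
    (3, date(2018, 7, 13)), (3, date(2018, 7, 14)),
    (4, date(2018, 7, 31)), (4, date(2018, 8, 1)),  # spans a month boundary
]
streaks = longest_streaks(rows)
print(streaks)
```

Filtering the result for values of 23 or more would answer the original "23 sick days in a row" question, once the input is limited to the previous 60 days.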

How to Efficiently Find Number of Specific Day Between Two Dates in MySQL?

Different variations of this question have been asked before, but none for the use case that I'm looking for. I'd like to find the specific number of weekdays between two dates for each row of a MySQL table and then update a column of each row with the result of that operation. This is part of an ETL process, and I'd like to keep this in a stored procedure if at all possible.
Data
Dates are of DATE type, and I'd like to find the number of occurrences of a specific weekday because I have 7 day columns, each holding a flag that says whether a record occurs on that day of the week. Like this (1 is Monday):
day_1 | day_2 | day_3 | day_4 | day_5 | day_6 | day_7
----- | ----- | ----- | ----- | ----- | ----- | -----
0 | 1 | 0 | 1 | 1 | 0 | 1
Example Use Case
I'm doing this because I'm trying to find the frequency of rows for a timeframe that's not available in the input data (call it input). So for a record that had start and end date values of 2016-01-01 and 2016-03-01, I'd want to know how often that record would have occurred only from 2016-01-01 to 2016-01-31, inclusive. I initially tried to do this by making a table that contained all datevalues for many years into the future like:
datevalue
---------
2016-01-01
2016-01-02
...
and then joining input to that table on start_date and end_date and then aggregating up while counting the number of each day like this:
SUM(CASE WHEN WEEKDAY(B.datevalue) + 1 = 1 THEN 1 ELSE 0 END) * day_1 +
SUM(CASE WHEN WEEKDAY(B.datevalue) + 1 = 2 THEN 1 ELSE 0 END) * day_2 +
SUM(CASE WHEN WEEKDAY(B.datevalue) + 1 = 3 THEN 1 ELSE 0 END) * day_3 +
SUM(CASE WHEN WEEKDAY(B.datevalue) + 1 = 4 THEN 1 ELSE 0 END) * day_4 +
SUM(CASE WHEN WEEKDAY(B.datevalue) + 1 = 5 THEN 1 ELSE 0 END) * day_5 +
SUM(CASE WHEN WEEKDAY(B.datevalue) + 1 = 6 THEN 1 ELSE 0 END) * day_6 +
SUM(CASE WHEN WEEKDAY(B.datevalue) + 1 = 7 THEN 1 ELSE 0 END) * day_7 AS adj_total_frequency
That worked perfectly on a smaller dataset, but input has > 30 million records, and when I tried running on that procedure it ran for 36 hours before I killed it.
Is there a more efficient way of doing this in MySQL?
Too long for a comment, but combined with the pre-calculation of the weekday I originally suggested, how well does this (using a single SUM with a complete CASE) work out for you?
SUM(CASE WHEN B.weekdayval = 1 AND day_1 THEN 1
WHEN B.weekdayval = 2 AND day_2 THEN 1
WHEN B.weekdayval = 3 AND day_3 THEN 1
WHEN B.weekdayval = 4 AND day_4 THEN 1
WHEN B.weekdayval = 5 AND day_5 THEN 1
WHEN B.weekdayval = 6 AND day_6 THEN 1
WHEN B.weekdayval = 7 AND day_7 THEN 1
ELSE 0 END) AS adj_total_frequency
Actually, this could be better: a simple CASE on B.weekdayval picks the matching flag directly, so in theory B.weekdayval only needs to be evaluated once per row (I say in theory because MySQL makes no promise about how it evaluates the branches internally). The weekday has to drive the CASE; testing the day_N flags first would stop at the first flag that is set and miscount rows that have several flags set.
SUM(CASE B.weekdayval
WHEN 1 THEN day_1
WHEN 2 THEN day_2
WHEN 3 THEN day_3
WHEN 4 THEN day_4
WHEN 5 THEN day_5
WHEN 6 THEN day_6
WHEN 7 THEN day_7
ELSE 0 END) AS adj_total_frequency
Edit: As far as the DATEDIFF method goes (I originally wrote "datesub"), I don't have the time to write a full solution, but to start you (or other potential answerers) on that...
you can get the number of whole weeks between the start and end with DATEDIFF(end, start) DIV 7
multiply that by the number of days in a week that apply to get an approximation
then (the hardest part), figure out the number of days to add for the fractional week not covered by div.
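The three steps above can be sketched as plain arithmetic. This is an illustration in Python, not a MySQL stored procedure; the function names count_weekday and adjusted_frequency are made up here, and the date range is assumed to be inclusive on both ends, as in the question's January example.

```python
from datetime import date


def count_weekday(start: date, end: date, weekday: int) -> int:
    """Count how often weekday (0 = Monday ... 6 = Sunday) occurs in [start, end], inclusive."""
    total_days = (end - start).days + 1
    if total_days <= 0:
        return 0
    # Every whole week contains each weekday exactly once.
    full_weeks, remainder = divmod(total_days, 7)
    # Days from start until the first occurrence of the requested weekday.
    offset = (weekday - start.weekday()) % 7
    # The leftover partial week covers the weekday only if it starts early enough.
    return full_weeks + (1 if offset < remainder else 0)


def adjusted_frequency(start: date, end: date, day_flags) -> int:
    """day_flags: seven 0/1 values, day_1 (Monday) first, as in the question's table."""
    return sum(flag * count_weekday(start, end, wd) for wd, flag in enumerate(day_flags))


# Flags from the question's example row: Tuesday, Thursday, Friday, Sunday.
flags = (0, 1, 0, 1, 1, 0, 1)
total = adjusted_frequency(date(2016, 1, 1), date(2016, 1, 31), flags)
print(total)  # 18 for January 2016 (4 Tuesdays, 4 Thursdays, 5 Fridays, 5 Sundays)
```

This needs no join against a dates table at all, so the cost per input row is constant regardless of how long the date range is.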
MySQL sometimes has a lot of trouble optimizing GROUP BY statements that involve a JOIN. To work around that, you can store the joined result in a temporary table, so that the GROUP BY runs against a single table.
drop temporary table if exists tmp;
create temporary table tmp (id int unsigned not null)
engine=myisam
select i.id
from input i
straight_join dates B
on B.datevalue >= i.`start`
and B.datevalue < i.`end`
where (
-- WEEKDAY() returns 0 for Monday through 6 for Sunday, and day_1 is Monday
(WEEKDAY(B.datevalue ) = 0) AND i.day_1 OR
(WEEKDAY(B.datevalue ) = 1) AND i.day_2 OR
(WEEKDAY(B.datevalue ) = 2) AND i.day_3 OR
(WEEKDAY(B.datevalue ) = 3) AND i.day_4 OR
(WEEKDAY(B.datevalue ) = 4) AND i.day_5 OR
(WEEKDAY(B.datevalue ) = 5) AND i.day_6 OR
(WEEKDAY(B.datevalue ) = 6) AND i.day_7
)
-- and i.id > 000000
-- and i.id <= 100000
;
drop temporary table if exists tmp1;
create temporary table tmp1 (id int unsigned not null, cnt int unsigned not null)
engine=myisam
select id, count(1) as cnt
from tmp
group by id
;
update input i
join tmp1 using(id)
set i.numdays = tmp1.cnt
where 1=1;
My test data contains 1M rows with random day bits (round(rand())) and an average date range of 50 days. So the tmp table contains about 25M rows.
On my system it takes about 500 msec for 10K rows, 5 sec for 100K rows and 2 mins for 1M rows. So if you split the updates into chunks of 100K rows (using the commented-out id range condition in the first statement), you should be done in about 30 minutes.

MySQL CASE for value range doesn't work but nested IF's do?

I'm probably missing something that is really, really simple, but I can't for the life of me figure out what it is that I'm not doing correctly...
I have this query which is used to pull out hours people have completed in volunteering and then assign them an award based on the amount of hours submitted. Not difficult...
The nested IF solution is horrible and was only a fallback to see if it was just the CASE that was messing up. Turns out, the janky nested IF solution works perfectly, whereas my CASE solution is still broken.
The query is only run once annually to pull off final results, so performance isn't really a problem (the nested IF query currently has an execution time of 0.0095 seconds / 700 rows, which is perfectly adequate). It's more the fact that it is thoroughly annoying me that it's not working, and I want to understand why for future reference.
For reference the hour values are stored as DECIMAL(8,2), subsequently the value of total_hours is also of the same type.
The output I'm looking for is:
| id | first_name | last_name | total_hours | award |
|----|------------|------------|-------------|----------|
| 1 | Bob | Harrington | 0.50 | Silver |
| 2 | Jim | Halpert | 800.00 | Platinum |
| 3 | Dwight | Shrute | 130.00 | Gold |
| 4 | Michael | Scott | 5.00 | Bronze |
The CASE statement results in all rows having the value of 'Less than 1 hour' for award, EXCEPT those where total_hours equals 1.00, in which the value of award equals 'Bronze'.
The nested IF statements result in the table being generated correctly, as per the example above.
Here is my current CASE query, that doesn't work:
SELECT
m.id,
m.first_name,
m.last_name,
total_hours,
CASE total_hours
WHEN total_hours >= 1 <= 50 THEN
'Bronze'
WHEN total_hours >= 51 <= 125 THEN
'Silver'
WHEN total_hours >= 126 <= 249 THEN
'Gold'
WHEN total_hours >= 250 THEN
'Platinum'
ELSE
'Less than 1 hour'
END AS award
FROM (
SELECT member_id, sum(hours) total_hours
FROM volunteering_hours
WHERE authorise_date > 0 AND validate_date > 0 AND delete_date = 0
GROUP BY member_id
) hour_query
LEFT JOIN members m ON m.id = member_id
ORDER BY total_hours DESC
What I've tried so far:
Placing the raw comparison numeric values in quotes.
Giving the comparison numeric values decimal places.
Trying the CASE statement with only one comparison, just as a test, that being: WHEN total_hours > 1 THEN 'GT 1' ELSE 'LT 1' END award. All rows were still coming up as LT 1 after running the query, meaning it failed.
Grouping the CASE statement
Changing the syntax of each range comparison to total_hours >= 1 && total_hours <= 50, etc.. and it still yielded the same failed result
My current nested IF solution which looks horrible, but at least is working, is:
SELECT
m.id,
m.first_name,
m.last_name,
total_hours,
IF(total_hours >= 1 && total_hours <= 50, 'Bronze',
IF(total_hours >= 51 && total_hours <= 125, 'Silver',
IF(total_hours >= 126 && total_hours <= 249, 'Gold',
IF(total_hours >= 250, 'Platinum', 'Less than 1 hour')
)
)
) award
FROM (
SELECT member_id, sum(hours) total_hours
FROM volunteering_hours
WHERE authorise_date > 0 AND validate_date > 0 AND delete_date = 0
GROUP BY member_id
) hour_query
LEFT JOIN members m ON m.id = member_id
ORDER BY total_hours DESC
Can someone please shower me in some knowledge as to why the CASE isn't working?
Thanks in advance. :)
You were close but had some syntax errors. Do this instead:
CASE
WHEN total_hours >= 1 AND total_hours <= 50 THEN
'Bronze'
WHEN total_hours >= 51 AND total_hours <= 125 THEN
'Silver'
WHEN total_hours >= 126 AND total_hours <= 249 THEN
'Gold'
WHEN total_hours >= 250 THEN
'Platinum'
ELSE
'Less than 1 hour'
END AS award
Sample simplified SQL Fiddle
You mixed the (different) case syntax.
If you are using case xx, then your WHEN should not contain xx again:
case xx
when 1 then statement
when 2 then statement
If you are using just case, then you need to provide the full comparison in each WHEN:
case
when xx=1 then statement
when xx=2 then statement
See example here:
http://sqlfiddle.com/#!9/d8348/6
Compared with programming languages, the first one is equivalent to
switch(variable){
case 1: statement; break;
case 2: statement; break;
}
while the second one is
if (variable==1){
statement;
} else if (variable==2){
statement;
}
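A small sketch of why the mixed form misbehaves. It uses SQLite from Python as a stand-in for MySQL (an assumption; both evaluate total_hours >= 1 <= 50 left to right as (total_hours >= 1) <= 50, which is always 1), and the sample hours are made up. In the simple form, CASE total_hours WHEN ... then compares total_hours with that 0/1 result, so only a value of exactly 1 can match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hour_query (total_hours REAL)")
conn.executemany("INSERT INTO hour_query VALUES (?)",
                 [(0.5,), (1.0,), (5.0,), (130.0,), (800.0,)])

# Mixed syntax from the question: each WHEN value collapses to 0 or 1,
# and CASE total_hours then tests total_hours = <that 0 or 1>.
mixed = [row[0] for row in conn.execute("""
    SELECT CASE total_hours
             WHEN total_hours >= 1 <= 50 THEN 'Bronze'
             WHEN total_hours >= 51 <= 125 THEN 'Silver'
             WHEN total_hours >= 126 <= 249 THEN 'Gold'
             WHEN total_hours >= 250 THEN 'Platinum'
             ELSE 'Less than 1 hour' END
    FROM hour_query ORDER BY total_hours
""")]

# Searched CASE: every WHEN is a complete boolean condition.
searched = [row[0] for row in conn.execute("""
    SELECT CASE
             WHEN total_hours >= 1 AND total_hours <= 50 THEN 'Bronze'
             WHEN total_hours >= 51 AND total_hours <= 125 THEN 'Silver'
             WHEN total_hours >= 126 AND total_hours <= 249 THEN 'Gold'
             WHEN total_hours >= 250 THEN 'Platinum'
             ELSE 'Less than 1 hour' END
    FROM hour_query ORDER BY total_hours
""")]

print(mixed)     # only the 1.0 row gets 'Bronze'; everything else hits ELSE
print(searched)  # each row lands in its intended bracket
```

This reproduces the symptom described in the question: every row shows 'Less than 1 hour' except the one where total_hours equals 1.00.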
Try the following.
SELECT
m.id,
m.first_name,
m.last_name,
total_hours,
CASE WHEN total_hours between 1 and 50 THEN
'Bronze'
WHEN total_hours between 51 and 125 THEN
'Silver'
WHEN total_hours between 126 and 249 THEN
'Gold'
WHEN total_hours >= 250 THEN
'Platinum'
ELSE
'Less than 1 hour'
END AS award
FROM (
SELECT member_id, sum(hours) total_hours
FROM volunteering_hours
WHERE authorise_date > 0 AND validate_date > 0 AND delete_date = 0
GROUP BY member_id
) hour_query
LEFT JOIN members m ON m.id = member_id
ORDER BY total_hours DESC
BETWEEN is valid inside a CASE expression and is inclusive on both ends, so total_hours between 1 and 50 is the same as total_hours >= 1 and total_hours <= 50. It always needs both a lower and an upper bound, so an open-ended bracket has to be written as a plain comparison such as total_hours >= 250.
As for your query, I would recommend restructuring it something like
select
m.id,
m.first_name,
m.last_name,
sum(vh.hours) total_hours,
case when sum(vh.hours) between 1 and 50 then
'bronze'
when sum(vh.hours) between 51 and 125 then
'silver'
when sum(vh.hours) between 126 and 249 then
'gold'
when sum(vh.hours) >= 250 then
'platinum'
else
'less than 1 hour'
end as award
from
volunteering_hours vh
left join members m on
m.id = member_id
where
authorise_date > 0 and
validate_date > 0 and
delete_date = 0
group by
vh.member_id,
m.id,
m.first_name,
m.last_name
order by
total_hours desc
Nested (derived) tables are not great, and you should avoid them where possible. This version is a bit cleaner, and I think the grouping still works fine.

MySQL grouping totals by multiple date and results using case

I have a results table which lists a set of values, each linking to another table containing the date that result was made.
I have working SQL to get all the dates (using CASE's) however I can only retrieve a single range of results.
Select
count(CASE
WHEN results.test_id IN ( SELECT id
FROM `test`
WHERE `posted`
BETWEEN '2011-07-01 00:00:00'
AND '2011-07-01 23:59:59')
THEN results.test_id
ELSE NULL
END) AS "1st July"
from `results`
WHERE results.window_id = 2 and results.mark > 90;
I also have another SQL query which gets all the ranges but can only work for one date at a time.
SELECT
CASE
when mark > 90 then '>90%'
when mark > 80 then '>80%'
when mark > 70 then '>70%'
END as mark_results,
COUNT(*) AS count
FROM (SELECT mark from results where window_id =2) as derived
GROUP BY mark_results
ORDER BY mark_results;
What I'd like is to have everything in one unified query, displaying the relevant totals for each range of results. such as below:
Result Range | 1st July | 2nd July | 3rd July | 4th July
>90% | 0 | 0 | 0 | 1
>80% | 1 | 2 | 1 | 1
>70% | 4 | 5 | 5 | 4
So that the totals for each range are displayed under their date.
I assume it's possible.
The following statement joins results and tests in the FROM clause. It then aggregates the query by the mark range, with the counts per day:
Select (CASE when mark > 90 then '>90%'
when mark > 80 then '>80%'
when mark > 70 then '>70%'
END) as mark_results,
sum(case when posted BETWEEN '2011-07-01 00:00:00' AND '2011-07-01 23:59:59' then 1 else 0 end) as July01,
sum(case when posted BETWEEN '2011-07-02 00:00:00' AND '2011-07-02 23:59:59' then 1 else 0 end) as July02,
. . .
from `results` r join
test t
on r.test_id = t.id
WHERE r.window_id = 2 and r.mark > 70
group by (CASE when mark > 90 then '>90%'
when mark > 80 then '>80%'
when mark > 70 then '>70%'
END)
order by 1
Just add whatever days you want to the SELECT clause.
I should add . . . if you want all the dates, you need to put them on separate rows:
Select date(posted) as PostedDate,
(CASE when mark > 90 then '>90%'
when mark > 80 then '>80%'
when mark > 70 then '>70%'
END) as mark_results,
count(*) as cnt
. . .
from `results` r join
test t
on r.test_id = t.id
WHERE r.window_id = 2 and r.mark > 70
group by date(posted),
(CASE when mark > 90 then '>90%'
when mark > 80 then '>80%'
when mark > 70 then '>70%'
END)
order by 1, 2
In fact, you might consider having a separate row for each date, with the ranges pivoted as columns.
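A minimal sketch of that last layout, one row per date with the mark ranges pivoted into columns. It runs on an in-memory SQLite database from Python rather than MySQL, the sample rows are invented, and the column names over_90 / over_80 / over_70 are hypothetical. The join uses test.id, as in the question's subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (id INTEGER PRIMARY KEY, posted TEXT);
    CREATE TABLE results (test_id INTEGER, window_id INTEGER, mark INTEGER);
    INSERT INTO test VALUES (1, '2011-07-01 09:00:00'),
                            (2, '2011-07-01 15:30:00'),
                            (3, '2011-07-02 10:00:00');
    INSERT INTO results VALUES (1, 2, 95), (1, 2, 85), (2, 2, 75), (2, 2, 72),
                               (3, 2, 91), (3, 2, 65), (3, 1, 99);
""")

# Conditional aggregation: each SUM counts only the rows in its own bracket.
rows = conn.execute("""
    SELECT date(t.posted) AS posted_date,
           SUM(CASE WHEN r.mark > 90 THEN 1 ELSE 0 END) AS over_90,
           SUM(CASE WHEN r.mark > 80 AND r.mark <= 90 THEN 1 ELSE 0 END) AS over_80,
           SUM(CASE WHEN r.mark > 70 AND r.mark <= 80 THEN 1 ELSE 0 END) AS over_70
    FROM results r
    JOIN test t ON r.test_id = t.id
    WHERE r.window_id = 2
    GROUP BY date(t.posted)
    ORDER BY posted_date
""").fetchall()

for row in rows:
    print(row)
```

With this shape, adding another day requires no change to the query, whereas the dates-as-columns version needs one more SUM per day.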