Creating columns per where clause - mysql

Is it possible to construct a query that creates multiple columns based on multiple conditions and aggregates by time? For example, take a simple table like so:
id value created
1 45 datetime
2 52 datetime
3 24 datetime
4 33 datetime
5 20 datetime
I can get all values between 10 and 20 per week of the year like so:
SELECT count(*) as '10-20', week(created) as 'week', year(created) as 'year'
FROM table
WHERE value > 10 AND value < 20
GROUP BY year(created), week(created)
This will give me for example:
10-20 week year
40 1 2014
21 2 2014
3 33 2014
I could repeat the query for the ranges 20-30, 30-40 and 40-50 and manually join the outputs, but I'd like a single query that combines these into one table, so the output would be:
10-20 20-30 30-40 40-50 week year
40 0 33 42 1 2014
21 1 0 2 1 2014
0 0 32 0 12 2014
3 42 34 32 33 2014

You can use SUM instead of COUNT, with an IF inside the brackets of the sum
Something like this
SELECT SUM(IF(value > 10 AND value < 20, 1, 0)) as '10-20',
SUM(IF(value > 20 AND value < 30, 1, 0)) as '20-30',
SUM(IF(value > 30 AND value < 40, 1, 0)) as '30-40',
SUM(IF(value > 40 AND value < 50, 1, 0)) as '40-50',
week(created) as 'week', year(created) as 'year'
FROM table
GROUP BY year(created), week(created)
Note that there is an issue here (and in your original code) with values that fall exactly on a boundary (e.g. 20): the strict comparisons place them in no range at all.
For flexibility, you would probably be better off with another table that stores the ranges: join to it to get one row per range per week/year, and then turn the rows into columns in your script. That saves amending the code whenever a range is added.
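The range-table idea can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the table names `measurements` and `ranges` and the sample rows are hypothetical, and SQLite's `strftime('%W', ...)` does not number weeks exactly like MySQL's `WEEK()`. The ranges are treated as half-open (lo <= value < hi), which is one possible way to handle the boundary values.

```python
import sqlite3

# SQLite stands in for MySQL here; `measurements` and `ranges` are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE measurements (id INTEGER PRIMARY KEY, value INTEGER, created TEXT);
    CREATE TABLE ranges (label TEXT, lo INTEGER, hi INTEGER);
    INSERT INTO measurements (value, created) VALUES
        (15, '2014-01-06'), (20, '2014-01-07'), (45, '2014-01-08'), (33, '2014-01-14');
    INSERT INTO ranges VALUES
        ('10-20', 10, 20), ('20-30', 20, 30), ('30-40', 30, 40), ('40-50', 40, 50);
""")

# One row per (year, week, range). Half-open bounds (lo <= value < hi) put a
# boundary value such as 20 into exactly one range.
rows = conn.execute("""
    SELECT strftime('%Y', m.created) AS yr,
           strftime('%W', m.created) AS wk,
           r.label,
           COUNT(*) AS n
    FROM measurements AS m
    JOIN ranges AS r ON m.value >= r.lo AND m.value < r.hi
    GROUP BY yr, wk, r.lo, r.label
    ORDER BY yr, wk, r.lo
""").fetchall()

# Turn the rows into columns in the script: {(year, week): {label: count}}.
labels = [label for (label,) in conn.execute("SELECT label FROM ranges ORDER BY lo")]
pivot = {}
for yr, wk, label, n in rows:
    pivot.setdefault((yr, wk), dict.fromkeys(labels, 0))[label] = n
```

Adding a range then only means inserting a row into `ranges`; the query and the pivot loop stay unchanged.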

Related

SQL - sum data for all time, 30 days and 90 days for multiple columns individually

BACKGROUND:
I have data that looks like this
date src subsrc subsubsrc param1 param2
2020-02-01 src1 ksjd dfd8 47 31
2020-02-02 src1 djsk zmnc 44 95
2020-02-03 src2 skdj awes 92 100
2020-02-04 src2 mxsf kajs 80 2
2020-02-05 src3 skdj asio 46 53
2020-02-06 src3 dekl jdqo 19 18
2020-02-07 src3 dskl dqqq 69 18
2020-02-08 src4 sqip riow 64 46
2020-02-09 src5 ss01 qwep 34 34
I am trying to aggregate for all time, last 30 days and last 90 days (no rolling sum)
So my final data would look like this:
src subsrc subsubsrc p1_all p1_30 p1_90 p2_all p2_30 p2_90
src1 ksjd dfd8 7 1 7 98 7 98
src1 djsk zmnc 0 0 0 0 0 0
src2 skdj awes 12 12 12 4 4 4
src2 mxsf kajs 6 6 6 31 31 31
src3 skdj asio 0 0 0 0 0 0
src3 dekl jdqo 20 20 20 17 17 17
src3 dskl dqqq 3 3 3 4 4 4
src4 sqip qwep 0 0 0 0 0 0
src5 ss01 qwes 15 15 15 2 2 2
ABOUT DATA:
This is only dummy data and therefore incorrect.
There are tens of thousands of rows in my data.
There are a dozen src columns that make up the key for the table.
There are a dozen param columns that I have to sum for 30 days, 90 days and all time.
Also, there are null values in the param columns.
Also, there might be multiple rows for the same day and src column.
New data is being added every day and the query is probably going to be run every day to get the latest 30, 90, all time data.
WHAT I HAVE TRIED:
This is what I have come up with:
SELECT src, subsrc, subsubsrc,
SUM(param1) as param1_all,
SUM(CASE WHEN DATE_DIFF(CURRENT_DATE,date,day) <= 30 THEN param1 END) as param1_30,
SUM(CASE WHEN DATE_DIFF(CURRENT_DATE,date,day) <= 90 THEN param1 END) as param1_90,
SUM(param2) as param2_all,
SUM(CASE WHEN DATE_DIFF(CURRENT_DATE,date,day) <= 30 THEN param2 END) as param2_30,
SUM(CASE WHEN DATE_DIFF(CURRENT_DATE,date,day) <= 90 THEN param2 END) as param2_90,
FROM `MY_TABLE`
GROUP BY src
ORDER BY src
This actually works but I can anticipate how long this query is going to become for multiple sources and even more param columns.
I have been trying something called "Filtered aggregate functions (or manual pivot)" explained HERE. But I am unable to understand/implement it for my case.
Also I have looked at dozens of answers and most of them are running sums for each day OR are complicated cases of this basic calculation. Maybe I am not searching it correctly.
As you can see, I am a newbie in SQL and would really appreciate any help.
Your query looks quite good; conditional aggregation is the canonical method to pivot a dataset.
One way to possibly increase performance would be to change the date filter in the conditional expressions: using a date function precludes the use of an index.
Instead, you could phrase this as:
select
src,
subsrc,
subsubsrc,
sum(param1) as param1_all,
sum(case when date >= current_date - interval 30 day then param1 end) as param1_30,
sum(case when date >= current_date - interval 90 day then param1 end) as param1_90,
sum(param2) as param2_all,
sum(case when date >= current_date - interval 30 day then param2 end) as param2_30,
sum(case when date >= current_date - interval 90 day then param2 end) as param2_90
from my_table
group by src, subsrc, subsubsrc
order by src, subsrc, subsubsrc
For performance, the following index may be helpful: (src, subsrc, subsubsrc, date).
Note that I included all three non-aggregated columns (src, subsrc, subsubsrc) in the group by clause: starting with MySQL 5.7, this is mandatory by default (although you can play around with SQL modes to alter that behavior), and most other databases implement the same constraint.
Your first approach isn't a bad one if you are able to build the query programmatically. One alternative might be to create side tables for the 30 and 90 day cases first so you can effectively select all columns from each. This could also be done in sub-queries but there are performance considerations.
Some untested pseudo code to hopefully clarify:
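-- Each derived table is aggregated before it is joined: joining the raw rows
-- on src alone would multiply the matches and inflate every sum.
SELECT
a.src,
a.subsrc,
a.subsubsrc,
a.param1_all,
-- other "all" sums here
t30.param1_30,
-- other "30" sums here
t90.param1_90
-- other "90" sums here
FROM (
SELECT src, subsrc, subsubsrc, SUM(param1) AS param1_all
FROM MY_TABLE
GROUP BY src, subsrc, subsubsrc
) AS a
LEFT JOIN (
SELECT src, subsrc, subsubsrc, SUM(param1) AS param1_30
FROM MY_TABLE
WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY src, subsrc, subsubsrc
) AS t30 ON t30.src = a.src AND t30.subsrc = a.subsrc AND t30.subsubsrc = a.subsubsrc
LEFT JOIN (
SELECT src, subsrc, subsubsrc, SUM(param1) AS param1_90
FROM MY_TABLE
WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
GROUP BY src, subsrc, subsubsrc
) AS t90 ON t90.src = a.src AND t90.subsrc = a.subsrc AND t90.subsubsrc = a.subsubsrc
ORDER BY a.src, a.subsrc, a.subsubsrc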
Note the date conditions have been switched to not use a function on the date column but instead compare to a date value. Your original approach would defeat any index on date (which you will want to make this more efficient).
If you first put these sub-queries into side tables that have a key on src the joins will be more efficient too. You could even group/sum directly into those side tables first rather than creating larger copies of your data, and then join the aggregated data together.
Your code looks good. Your RDBMS needs to loop all records under the hood and do some calculations. One thing that you can improve is that you are calculating date differences for all records. It would make sense to calculate the moment 30 days ago and 90 days ago beforehand, respectively and only compare the dates against those.
Since you already know that the number of rows and parameters will increase in the future, it makes sense to create a cron job which daily computes this in the following manner:
the first time it calculates the values, it should store all the results along with the date it was running at (maybe into a table dedicated for this analytics)
on subsequent days you can calculate the all time sum by loading the items which were created since the last check
you will still need to calculate the 30 and 90 day stuff, but that would be much less of a problem than calculating this for all time
If you do this properly and have daily information, then later on you will be able to analyze trends in history as well.
I'd recommend you use 3 different queries for that:
Sum for all time
Sum for 30 days
Sum for 90 days
Because when you try to do it all in one query, you end up with a full table scan because of the CASE-WHEN-END expressions (BTW, MySQL also has the compact form IF()). This is extremely non-optimal.
If you split it into 3 different queries and add an index on the date column, then the 2nd and 3rd queries won't do a full scan. Only the 1st query will, and it can be optimised separately (for example by caching).
Also this approach: DATE_DIFF(CURRENT_DATE,date,day) <= 90
should be changed to: date >= 'date-90-days-ago' (where 'date-90-days-ago' is a fixed date)
Thus you won't have to compute the difference of 2 dates for every row. You'll just have to compute 2 dates (30 days ago and 90 days ago) and compare all other dates to these two. This approach will benefit from the date column index.
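A minimal sketch of computing the two cutoff dates once and passing them in as fixed values, assuming Python's built-in sqlite3 module as a stand-in for the real database. The helper name `window_sums`, the reduced `my_table` schema and the sample rows are hypothetical; dates are stored as ISO strings so that plain comparison orders them correctly.

```python
import sqlite3
from datetime import date, timedelta

def window_sums(conn, today):
    """Sum param1 per src for all time, the last 30 days and the last 90 days."""
    # The cutoffs are computed once here, not once per row inside the query.
    cutoff_30 = (today - timedelta(days=30)).isoformat()
    cutoff_90 = (today - timedelta(days=90)).isoformat()
    return conn.execute("""
        SELECT src,
               SUM(param1) AS param1_all,
               SUM(CASE WHEN date >= ? THEN param1 END) AS param1_30,
               SUM(CASE WHEN date >= ? THEN param1 END) AS param1_90
        FROM my_table
        GROUP BY src
        ORDER BY src
    """, (cutoff_30, cutoff_90)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (date TEXT, src TEXT, param1 INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", [
    ("2020-02-01", "src1", 47),    # inside both windows
    ("2019-12-20", "src1", 44),    # inside the 90-day window only
    ("2019-06-01", "src1", 10),    # counted for all time only
    ("2020-02-05", "src2", None),  # NULLs are skipped by SUM
])

# A fixed "today" keeps the example reproducible.
result = window_sums(conn, date(2020, 2, 9))
```

A group whose values are all NULL comes back as NULL rather than 0; wrapping each SUM in COALESCE(..., 0) would change that if zeros are preferred.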

Mysql group by 1-15 and 16-30/31 days of a month

I have a table with a lot of rows and I need to get the count of them grouped by ID and date, bimonthly.
For example
**ID Date**
15 2016/01/01
15 2016/01/04
15 2016/01/05
15 2016/01/22
15 2016/01/30
15 2016/02/01
15 2016/02/16
15 2016/03/01
15 2016/03/16
15 2016/03/22
Expected results:
**Count ID Date**
3 15 2016/01/01
2 15 2016/01/15
1 15 2016/02/01
1 15 2016/02/15
1 15 2016/03/01
2 15 2016/03/15
Currently i have this:
SELECT count(*) as '#', ID, from_unixtime(Date, '%Y-%m-%d') as 'Date'
FROM table
GROUP BY country,WEEK(FROM_UNIXTIME(Date))
ORDER BY Date
This does indeed group by week, but starting from the first input and so on (which is not what I want, but it is as close as I have gotten).
EDIT: Changed the term to bimonthly
The 1st thru 15th and 16th thru 30/31 groups are not "biweekly". That grouping would be referred to as bimonthly. (I'd prefer the term "dimonthly" if that were a word.)
It's odd that you would want to return a value of the 15th, for the group that would not contain the 15th, rather than returning the 16th.
You could use expressions to get you the year and month, and then the 1st or 15th.
GROUP BY CONCAT(DATE_FORMAT(t.date,'%Y-%m-'),IF(DAY(t.date)<16,'01','15'))
+ INTERVAL 0 DAY
You could use the same expression in the SELECT list to return the date value.
(My personal preference would be to return a date value of yyyy-mm-16, rather than -15.)
Note that this won't return a "zero count" for a bimonthly period where there aren't any rows. A row for such a period would be "missing".
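As an illustration of grouping on a computed period-start date, here is a small sketch using Python's built-in sqlite3 module in place of MySQL. The table name `events` and column name `d` are hypothetical, the dates are the sample rows from the question stored as ISO strings, and the second half of each month is labelled with the 16th.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, d TEXT)")
dates = ["2016-01-01", "2016-01-04", "2016-01-05", "2016-01-22", "2016-01-30",
         "2016-02-01", "2016-02-16", "2016-03-01", "2016-03-16", "2016-03-22"]
conn.executemany("INSERT INTO events VALUES (15, ?)", [(d,) for d in dates])

# Build 'YYYY-MM-01' for days 1-15 and 'YYYY-MM-16' for the rest of the month,
# then group on that label together with the id.
rows = conn.execute("""
    SELECT COUNT(*) AS n,
           id,
           strftime('%Y-%m-', d)
             || CASE WHEN CAST(strftime('%d', d) AS INTEGER) < 16 THEN '01' ELSE '16' END
             AS period_start
    FROM events
    GROUP BY id, period_start
    ORDER BY id, period_start
""").fetchall()
```

The counts per half-month match the expected results in the question; periods without any rows simply do not appear.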
GROUP BY clauses can be arbitrary expressions, e.g.
GROUP BY (DAY(FROM_UNIXTIME(Date)) <= 15)
which would split things up into days 1-15 and days 16 onward. To keep each month separate, also group by the year and month.

MySQL get records by hour interval

I need to show a timeline from MySQL table. Basically retrieve count of records by each hour. My TimeSigned column is DateStamp
Login
Id TimeSigned MarkedBy
1. 2016-03-14 05:12:17 James
2. 2016-03-14 05:30:10 Mark
3. 2016-03-14 06:10:00 James
4. 2016-03-14 07:30:10 Mary
I am using following query but it brings wrong results.
SELECT COUNT(Id) From Logins WHERE HOUR(TimeSigned) > 5 AND HOUR(TimeSigned) < 6
I was expecting it to return a count of 2 (i.e. 1 and the 2 record are within the 5-6 time range) but it brings back 0.
I have created a sqlfiddle here SQL Fiddle
Use >= in your first condition. HOUR() returns an integer and there is no integer strictly between 5 and 6, so the count is 0.
SELECT COUNT(Id) From Logins WHERE HOUR(TimeSigned) >= 5 AND HOUR(TimeSigned) < 6
HOUR() returns the hour part so it's whole numbers.
There are no whole numbers that are greater than 5 and less than 6.
I think you want to just look for the hour is equal to 5
SELECT COUNT(Id) From Logins WHERE HOUR(TimeSigned) = 5
Or if you want you could return counts for each hour by doing
SELECT COUNT(Id) as Count,HOUR(TimeSigned) as Hour From Logins GROUP BY HOUR(TimeSigned)
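A quick way to check both variants against the question's sample rows, sketched with Python's built-in sqlite3 module. This assumes SQLite as a stand-in, so `strftime('%H', ...)` plays the role of MySQL's HOUR(); the half-open timestamp range in the second query is an additional option, not something taken from the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Logins (Id INTEGER, TimeSigned TEXT, MarkedBy TEXT)")
conn.executemany("INSERT INTO Logins VALUES (?, ?, ?)", [
    (1, "2016-03-14 05:12:17", "James"),
    (2, "2016-03-14 05:30:10", "Mark"),
    (3, "2016-03-14 06:10:00", "James"),
    (4, "2016-03-14 07:30:10", "Mary"),
])

# Count of records per hour of the day.
per_hour = conn.execute("""
    SELECT CAST(strftime('%H', TimeSigned) AS INTEGER) AS hr, COUNT(Id) AS n
    FROM Logins
    GROUP BY hr
    ORDER BY hr
""").fetchall()

# Count for one specific hour, written as a half-open range on the column itself.
in_hour_5 = conn.execute("""
    SELECT COUNT(Id)
    FROM Logins
    WHERE TimeSigned >= '2016-03-14 05:00:00'
      AND TimeSigned <  '2016-03-14 06:00:00'
""").fetchone()[0]
```

Note that grouping on the hour alone merges the same hour from different days; for a timeline spanning several days, the date would need to be part of the grouping as well.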

SQL Server 2008 SP

I have two tables actual and forecast
Actual
month actual
6 20
7 60
8 70
and Forecast
month forecast
9 50
10 150
11 85
I have to combine them into a single column: while data is available in the actual table it should come from there, and when it is not available it should come from the forecast table.
month actual/forecast
6 20
7 60
8 70
9 50
10 150
11 85
I'd do something like the following. The Source column is only for your test purposes. I'm also assuming that you might require some year or something included in your queries if your data spans multiple years.
SELECT month, actual as 'actual/forecast', 'A' as Source FROM Actual
UNION
SELECT month, forecast as 'actual/forecast', 'F' As Source FROM Forecast
WHERE month NOT IN (SELECT month FROM Actual)
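The UNION approach can be tried out with a small sketch like the one below, which uses Python's built-in sqlite3 module rather than SQL Server. The extra forecast row for month 8 is a hypothetical addition to show that an actual value takes precedence when both tables cover the same month, and the ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Actual (month INTEGER, actual INTEGER);
    CREATE TABLE Forecast (month INTEGER, forecast INTEGER);
    INSERT INTO Actual VALUES (6, 20), (7, 60), (8, 70);
    -- The month 8 forecast is hypothetical; it should be ignored in the result.
    INSERT INTO Forecast VALUES (8, 999), (9, 50), (10, 150), (11, 85);
""")

rows = conn.execute("""
    SELECT month, actual AS value, 'A' AS source FROM Actual
    UNION
    SELECT month, forecast, 'F' FROM Forecast
    WHERE month NOT IN (SELECT month FROM Actual)
    ORDER BY month
""").fetchall()
```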
Maybe this will help:
DECLARE @row_count INT;
WITH T AS (
SELECT month, actual AS value FROM Actual
UNION ALL
SELECT month, forecast AS value FROM Forecast
)
SELECT @row_count = COUNT(*) FROM T;
IF @row_count = 0
BEGIN
-- UPDATE the forecast table here
PRINT 'no rows';
END
ELSE
BEGIN
-- UPDATE using the combined rows here
PRINT 'rows found';
END

MySQL select/where statement

I have a web application linked to a MySQL database with the following fields:
field 1:trip_id
field 2:trip_destination
field 3:trip_description
field 4:trip_duration
In the web application I have a listbox based on the following:
ListBox value =1: trip duration 1 - 5 days
ListBox value =2: trip duration 6 - 10 days
Listbox value =3: trip duration 11 -20 days
ListBox value =4: trip duration over 20 days
How do I put this in the sql select statement?
SELECT * FROM trip_table WHERE trip_duration BETWEEN start_day AND end_day;
BETWEEN is inclusive at both ends, so you just need to replace start_day and end_day with your periods, e.g. start_day = 6 and end_day = 10.
Hope this helps.
In its simplest form, using the ranges your listbox controls internally (I'm not a PHP programmer, so I'll leave filling in the blanks to you), the query could be:
select *
from TripTable
where trip_duration >= ?MinimumDays
AND trip_duration <= ?MaximumDays
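One possible way to wire the listbox value to the two parameters is sketched below in Python with the built-in sqlite3 module, purely as an illustration (the application in the question may well use another language). The mapping `DURATION_RANGES`, the helper `trips_for_choice` and the sample trips are hypothetical; `None` stands for "no upper bound" for the "over 20 days" option.

```python
import sqlite3

# Listbox value -> (minimum days, maximum days); None means no upper bound.
DURATION_RANGES = {1: (1, 5), 2: (6, 10), 3: (11, 20), 4: (21, None)}

def trips_for_choice(conn, choice):
    """Return (trip_id, trip_duration) rows for the selected listbox value."""
    lo, hi = DURATION_RANGES[choice]
    return conn.execute(
        """
        SELECT trip_id, trip_duration
        FROM trip_table
        WHERE trip_duration >= ?
          AND (? IS NULL OR trip_duration <= ?)
        ORDER BY trip_id
        """,
        (lo, hi, hi),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE trip_table (
        trip_id INTEGER PRIMARY KEY,
        trip_destination TEXT,
        trip_description TEXT,
        trip_duration INTEGER
    )
""")
conn.executemany(
    "INSERT INTO trip_table (trip_destination, trip_description, trip_duration) VALUES (?, ?, ?)",
    [("A", "", 5), ("B", "", 6), ("C", "", 10), ("D", "", 11), ("E", "", 25)],
)

short_trips = trips_for_choice(conn, 1)
medium_trips = trips_for_choice(conn, 2)
long_trips = trips_for_choice(conn, 4)
```

Keeping the ranges in one mapping and binding them as parameters means the listbox value itself never ends up inside the SQL text.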
If you are trying to get all trips, and have them pre-stamped with a 1-4 classification, I would apply a CASE WHEN
select *,
case
when trip_duration >= 1 and trip_duration <=5 then 1
when trip_duration >= 6 and trip_duration <=10 then 2
when trip_duration >= 11 and trip_duration <=20 then 3
when trip_duration > 20 then 4
end as TripDurationType
from
TripTable