I have a web application linked to a MySQL database table with the following fields:
field 1:trip_id
field 2:trip_destination
field 3:trip_description
field 4:trip_duration
In the web application I have a listbox with the following options:
ListBox value = 1: trip duration 1 - 5 days
ListBox value = 2: trip duration 6 - 10 days
ListBox value = 3: trip duration 11 - 20 days
ListBox value = 4: trip duration over 20 days
How do I put this in the SQL SELECT statement?
SELECT * FROM trip_table WHERE trip_duration BETWEEN start_day AND end_day;
BETWEEN is inclusive at both ends, so replace start_day and end_day with the bounds of the selected period, e.g. start_day = 6 and end_day = 10.
Hope this helps.
In its simplest form, using the listbox ranges you control internally (I'm not a PHP programmer, so I can't fill in the blanks), the query could be:
select *
from TripTable
where trip_duration >= ?MinimumDays
AND trip_duration <= ?MaximumDays
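The listbox value can be translated into day bounds in application code before the query runs. Below is a minimal sketch in Python, using the standard sqlite3 module as a stand-in for a MySQL connection (MySQL drivers typically use a different placeholder style than ?). The names DURATION_RANGES, trips_for_listbox_value and demo_connection are hypothetical, the sample rows are made up, and the table name trip_table is borrowed from the first answer.

```python
import sqlite3

# Hypothetical mapping from listbox value to (min_days, max_days).
# None means "no upper bound" (the "over 20 days" option).
DURATION_RANGES = {
    1: (1, 5),
    2: (6, 10),
    3: (11, 20),
    4: (21, None),
}


def trips_for_listbox_value(conn, value):
    """Return the trip_ids whose duration falls in the selected range."""
    min_days, max_days = DURATION_RANGES[value]
    sql = "SELECT trip_id FROM trip_table WHERE trip_duration >= ?"
    params = [min_days]
    if max_days is not None:
        # Only add the upper bound when the option has one.
        sql += " AND trip_duration <= ?"
        params.append(max_days)
    sql += " ORDER BY trip_id"
    return [row[0] for row in conn.execute(sql, params)]


def demo_connection():
    """In-memory table with made-up rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trip_table (trip_id INTEGER PRIMARY KEY, "
        "trip_destination TEXT, trip_description TEXT, trip_duration INTEGER)"
    )
    conn.executemany(
        "INSERT INTO trip_table VALUES (?, ?, ?, ?)",
        [(1, "A", "", 3), (2, "B", "", 6), (3, "C", "", 10),
         (4, "D", "", 15), (5, "E", "", 30)],
    )
    return conn
```

Because the bounds come from a fixed dictionary and are bound as parameters, the value posted by the form never ends up inside the SQL text itself.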
If you are trying to get all trips and have them pre-stamped with a 1-4 classification, I would apply a CASE WHEN:
select *,
case
when trip_duration >= 1 and trip_duration <=5 then 1
when trip_duration >= 6 and trip_duration <=10 then 2
when trip_duration >= 11 and trip_duration <=20 then 3
when trip_duration > 20 then 4
end as TripDurationType
from
TripTable
BACKGROUND:
I have data that looks like this
date src subsrc subsubsrc param1 param2
2020-02-01 src1 ksjd dfd8 47 31
2020-02-02 src1 djsk zmnc 44 95
2020-02-03 src2 skdj awes 92 100
2020-02-04 src2 mxsf kajs 80 2
2020-02-05 src3 skdj asio 46 53
2020-02-06 src3 dekl jdqo 19 18
2020-02-07 src3 dskl dqqq 69 18
2020-02-08 src4 sqip riow 64 46
2020-02-09 src5 ss01 qwep 34 34
I am trying to aggregate for all time, last 30 days and last 90 days (no rolling sum)
So my final data would look like this:
src subsrc subsubsrc p1_all p1_30 p1_90 p2_all p2_30 p2_90
src1 ksjd dfd8 7 1 7 98 7 98
src1 djsk zmnc 0 0 0 0 0 0
src2 skdj awes 12 12 12 4 4 4
src2 mxsf kajs 6 6 6 31 31 31
src3 skdj asio 0 0 0 0 0 0
src3 dekl jdqo 20 20 20 17 17 17
src3 dskl dqqq 3 3 3 4 4 4
src4 sqip qwep 0 0 0 0 0 0
src5 ss01 qwes 15 15 15 2 2 2
ABOUT DATA:
This is only dummy data and therefore incorrect.
There are tens of thousands of rows in my data.
There are a dozen src columns that make up the key for the table.
There are a dozen param columns that I have to sum for 30 days, 90 days and all time.
Also, there are null values in the param columns.
Also, there might be multiple rows for the same day and src column.
New data is being added every day and the query is probably going to be run every day to get the latest 30, 90, all time data.
WHAT I HAVE TRIED:
This is what I have come up with:
SELECT src, subsrc, subsubsrc,
SUM(param1) as param1_all,
SUM(CASE WHEN DATE_DIFF(CURRENT_DATE,date,day) <= 30 THEN param1 END) as param1_30,
SUM(CASE WHEN DATE_DIFF(CURRENT_DATE,date,day) <= 90 THEN param1 END) as param1_90,
SUM(param2) as param2_all,
SUM(CASE WHEN DATE_DIFF(CURRENT_DATE,date,day) <= 30 THEN param2 END) as param2_30,
SUM(CASE WHEN DATE_DIFF(CURRENT_DATE,date,day) <= 90 THEN param2 END) as param2_90,
FROM `MY_TABLE`
GROUP BY src
ORDER BY src
This actually works but I can anticipate how long this query is going to become for multiple sources and even more param columns.
I have been trying something called "Filtered aggregate functions (or manual pivot)" explained HERE. But I am unable to understand/implement it for my case.
Also, I have looked at dozens of answers, and most of them compute running sums for each day or are complicated variants of this basic calculation. Maybe I am not searching for it correctly.
As you can see, I am a newbie in SQL and would really appreciate any help.
Your query looks quite good; conditional aggregation is the canonical method to pivot a dataset.
One way to possibly increase performance would be to change the date filter in the conditional expressions: using a date function precludes the use of an index.
Instead, you could phrase this as:
select
src,
subsrc,
subsubsrc,
sum(param1) as param1_all,
sum(case when date >= current_date - interval 30 day then param1 end) as param1_30,
sum(case when date >= current_date - interval 90 day then param1 end) as param1_90,
sum(param2) as param2_all,
sum(case when date >= current_date - interval 30 day then param2 end) as param2_30,
sum(case when date >= current_date - interval 90 day then param2 end) as param2_90
from my_table
group by src, subsrc, subsubsrc
order by src, subsrc, subsubsrc
For performance, the following index may be helpful: (src, subsrc, subsubsrc, date).
Note that I included all three non-aggregated columns (src, subsrc, subsubsrc) in the group by clause: starting with MySQL 5.7, this is mandatory by default (although you can play around with SQL modes to alter that behavior), and most other databases implement the same constraint.
Your first approach isn't a bad one if you are able to build the query programmatically. One alternative might be to create side tables for the 30 and 90 day cases first so you can effectively select all columns from each. This could also be done in sub-queries but there are performance considerations.
Some untested pseudo code to hopefully clarify:
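SELECT
a.src,
a.subsrc,
a.subsubsrc,
a.param1_all,
-- other "all" sums here
t30.param1_30,
-- other "30" sums here
t90.param1_90
-- other "90" sums here
FROM (
SELECT src, subsrc, subsubsrc, SUM(param1) AS param1_all
FROM MY_TABLE
GROUP BY src, subsrc, subsubsrc
) AS a
-- each side query is aggregated before joining, so the joins cannot multiply rows
LEFT JOIN (
SELECT src, subsrc, subsubsrc, SUM(param1) AS param1_30
FROM MY_TABLE
WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY src, subsrc, subsubsrc
) AS t30 ON t30.src = a.src AND t30.subsrc = a.subsrc AND t30.subsubsrc = a.subsubsrc
LEFT JOIN (
SELECT src, subsrc, subsubsrc, SUM(param1) AS param1_90
FROM MY_TABLE
WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
GROUP BY src, subsrc, subsubsrc
) AS t90 ON t90.src = a.src AND t90.subsrc = a.subsrc AND t90.subsubsrc = a.subsubsrc
ORDER BY a.src, a.subsrc, a.subsubsrc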
Note the date conditions have been switched to not use a function on the date column but instead compare to a date value. Your original approach would defeat any index on date (which you will want to make this more efficient).
If you first put these sub-queries into side tables that have a key on src the joins will be more efficient too. You could even group/sum directly into those side tables first rather than creating larger copies of your data, and then join the aggregated data together.
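If the query is built programmatically, the long list of conditional sums can be generated from a list of column names. The following is a rough sketch only, in Python with the standard sqlite3 module standing in for MySQL. The function names build_pivot_query and demo, the :cutoff_N parameter names and the sample rows are all hypothetical. Column and table names are interpolated into the SQL text, so they must come from trusted code and never from user input.

```python
import sqlite3
from datetime import date, timedelta


def build_pivot_query(table, key_cols, param_cols, windows=(30, 90)):
    """Build one conditional-aggregation query for every param column.

    Cutoff dates are bound later as named parameters
    (:cutoff_30, :cutoff_90, ...), so no date function runs per row.
    """
    select_parts = list(key_cols)
    for p in param_cols:
        select_parts.append(f"SUM({p}) AS {p}_all")
        for days in windows:
            select_parts.append(
                f"SUM(CASE WHEN date >= :cutoff_{days} THEN {p} END) AS {p}_{days}"
            )
    keys = ", ".join(key_cols)
    return (
        "SELECT " + ",\n       ".join(select_parts)
        + f"\nFROM {table}\nGROUP BY {keys}\nORDER BY {keys}"
    )


def demo():
    """Run the generated query on a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table "
        "(date TEXT, src TEXT, subsrc TEXT, param1 INTEGER, param2 INTEGER)"
    )
    conn.executemany(
        "INSERT INTO my_table VALUES (?, ?, ?, ?, ?)",
        [
            ("2020-02-20", "src1", "a", 10, 1),
            ("2020-01-10", "src1", "a", 5, None),  # NULLs are skipped by SUM
            ("2019-06-01", "src1", "a", 2, 3),
            ("2020-02-25", "src2", "b", 7, 4),
        ],
    )
    today = date(2020, 3, 2)  # fixed "today" so the example is repeatable
    cutoffs = {
        f"cutoff_{d}": (today - timedelta(days=d)).isoformat() for d in (30, 90)
    }
    sql = build_pivot_query("my_table", ["src", "subsrc"], ["param1", "param2"])
    return conn.execute(sql, cutoffs).fetchall()
```

Adding another param column then only means adding its name to the list passed to build_pivot_query.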
Your code looks good. Your RDBMS needs to loop over all records under the hood and do some calculations. One thing you can improve is that you are calculating a date difference for every record. It would make sense to calculate the dates 30 days ago and 90 days ago beforehand and only compare each row's date against those.
Since you already know that the number of rows and parameters will increase in the future, it makes sense to create a cron job which computes this daily in the following manner:
The first time it calculates the values, it should store all the results along with the date it was run (maybe in a table dedicated to these analytics).
On subsequent days you can calculate the all-time sum by loading only the items which were created since the last run.
You will still need to calculate the 30 and 90 day sums, but that will be much less of a problem than recalculating everything for all time.
If you do this properly and have daily information, then later on you will be able to analyze trends in history as well.
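The incremental all-time step could look roughly like the sketch below. It is written in Python with the standard sqlite3 module standing in for MySQL, and everything in it is hypothetical: the tables totals and job_state, the function names, and the sample rows. It also assumes that rows are never inserted with a date that an earlier run has already processed.

```python
import sqlite3


def update_all_time_totals(conn, run_date):
    """Add rows newer than the last processed date to the running totals."""
    last_date = conn.execute("SELECT last_date FROM job_state").fetchone()[0]
    new_sums = conn.execute(
        "SELECT src, COALESCE(SUM(param1), 0) FROM my_table "
        "WHERE date > ? AND date <= ? GROUP BY src",
        (last_date, run_date),
    ).fetchall()
    for src, added in new_sums:
        cur = conn.execute(
            "UPDATE totals SET param1_all = param1_all + ? WHERE src = ?",
            (added, src),
        )
        if cur.rowcount == 0:
            # First time this src is seen: create its totals row.
            conn.execute(
                "INSERT INTO totals (src, param1_all) VALUES (?, ?)",
                (src, added),
            )
    # Remember how far we got, so the next run only reads newer rows.
    conn.execute("UPDATE job_state SET last_date = ?", (run_date,))
    conn.commit()


def demo():
    """Two simulated daily runs over made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (date TEXT, src TEXT, param1 INTEGER)")
    conn.execute("CREATE TABLE totals (src TEXT PRIMARY KEY, param1_all INTEGER)")
    conn.execute("CREATE TABLE job_state (last_date TEXT)")
    conn.execute("INSERT INTO job_state VALUES ('')")  # '' sorts before any date
    conn.executemany(
        "INSERT INTO my_table VALUES (?, ?, ?)",
        [("2020-02-01", "src1", 47), ("2020-02-02", "src1", 44)],
    )
    update_all_time_totals(conn, "2020-02-02")
    conn.executemany(
        "INSERT INTO my_table VALUES (?, ?, ?)",
        [("2020-02-03", "src2", 92), ("2020-02-03", "src1", None)],
    )
    update_all_time_totals(conn, "2020-02-03")
    return dict(conn.execute("SELECT src, param1_all FROM totals"))
```

The 30 and 90 day sums are not covered here; they would still be recomputed on each run from a date-filtered query.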
I'd recommend you use 3 different queries for that:
Sum for all time
Sum for 30 days
Sum for 90 days
Because when you try to do it all in one query, there is no WHERE clause to narrow the rows, so you end up with a full table scan that evaluates CASE-WHEN-END for every row (BTW, MySQL has the compact form IF()). This is far from optimal.
If you split it into 3 different queries and add an index to the date column then it won't do full-scan for the 2nd and 3rd query. Only for the 1st query, which can be optimised separately (for example by caching).
Also this approach: DATE_DIFF(CURRENT_DATE,date,day) <= 90
should be changed to: date >= 'date-90-days-ago' (where 'date-90-days-ago' is a fixed date)
That way you don't have to compute the difference of two dates for every row. You just compute two dates, 30 days ago and 90 days ago, and compare every row's date to these two. This approach can benefit from an index on the date column.
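As a small illustration of computing the cutoffs once and running one query per window, here is a sketch in Python with the standard sqlite3 module standing in for MySQL. The function names, the index name and the sample rows are hypothetical, and only one param column is shown.

```python
import sqlite3
from datetime import date, timedelta


def cutoff_dates(today, windows=(30, 90)):
    """Compute each cutoff once, as an ISO date string."""
    return {days: (today - timedelta(days=days)).isoformat() for days in windows}


def sums_per_window(conn, today):
    """Run one query per window; only the all-time query has no date filter."""
    results = {
        "all": conn.execute(
            "SELECT src, SUM(param1) FROM my_table GROUP BY src ORDER BY src"
        ).fetchall()
    }
    for days, cutoff in cutoff_dates(today).items():
        # The column is compared to a constant, so a date index can be used.
        results[days] = conn.execute(
            "SELECT src, SUM(param1) FROM my_table "
            "WHERE date >= ? GROUP BY src ORDER BY src",
            (cutoff,),
        ).fetchall()
    return results


def demo():
    """Made-up rows and a fixed 'today' so the example is repeatable."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (date TEXT, src TEXT, param1 INTEGER)")
    conn.execute("CREATE INDEX idx_my_table_date ON my_table (date)")
    conn.executemany(
        "INSERT INTO my_table VALUES (?, ?, ?)",
        [
            ("2020-02-20", "src1", 10),
            ("2020-01-10", "src1", 5),
            ("2019-06-01", "src1", 2),
            ("2020-02-25", "src2", 7),
        ],
    )
    return sums_per_window(conn, date(2020, 3, 2))
```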
I need to show a timeline from a MySQL table, basically retrieving a count of records for each hour. My TimeSigned column is DateStamp
Login
Id TimeSigned MarkedBy
1. 2016-03-14 05:12:17 James
2. 2016-03-14 05:30:10 Mark
3. 2016-03-14 06:10:00 James
4. 2016-03-14 07:30:10 Mary
I am using the following query, but it returns the wrong result.
SELECT COUNT(Id) From Logins WHERE HOUR(TimeSigned) > 5 AND HOUR(TimeSigned) < 6
I was expecting it to return a count of 2 (i.e. records 1 and 2 are within the 5-6 time range), but it returns 0.
I have created a SQL Fiddle for this.
Use >= in your first condition. There is no whole number strictly between 5 and 6, so the count is 0.
SELECT COUNT(Id) From Logins WHERE HOUR(TimeSigned) >= 5 AND HOUR(TimeSigned) < 6
HOUR() returns the hour part, so it is always a whole number.
There are no whole numbers that are greater than 5 and less than 6.
I think you just want the rows where the hour is equal to 5:
SELECT COUNT(Id) From Logins WHERE HOUR(TimeSigned) = 5
Or, if you want, you could return the counts for each hour by doing:
SELECT COUNT(Id) as Count,HOUR(TimeSigned) as Hour From Logins GROUP BY HOUR(TimeSigned)
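To check both points on the sample rows, here is a small sketch in Python using the standard sqlite3 module as a stand-in for MySQL. SQLite has no HOUR() function, so strftime('%H', ...) cast to an integer is used in its place; that substitution, the function names and the HOUR_EXPR constant are assumptions of this sketch.

```python
import sqlite3

# SQLite stand-in for MySQL's HOUR(TimeSigned).
HOUR_EXPR = "CAST(strftime('%H', TimeSigned) AS INTEGER)"


def demo_connection():
    """Load the four sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Logins (Id INTEGER PRIMARY KEY, TimeSigned TEXT, MarkedBy TEXT)"
    )
    conn.executemany(
        "INSERT INTO Logins VALUES (?, ?, ?)",
        [
            (1, "2016-03-14 05:12:17", "James"),
            (2, "2016-03-14 05:30:10", "Mark"),
            (3, "2016-03-14 06:10:00", "James"),
            (4, "2016-03-14 07:30:10", "Mary"),
        ],
    )
    return conn


def count_strictly_between(conn, low, high):
    """The original filter: hour > low AND hour < high."""
    sql = f"SELECT COUNT(Id) FROM Logins WHERE {HOUR_EXPR} > ? AND {HOUR_EXPR} < ?"
    return conn.execute(sql, (low, high)).fetchone()[0]


def counts_per_hour(conn):
    """One row per hour that has at least one login."""
    sql = f"SELECT {HOUR_EXPR} AS hr, COUNT(Id) FROM Logins GROUP BY hr ORDER BY hr"
    return conn.execute(sql).fetchall()
```

Note that the grouped query only returns hours that actually have rows; hours with no logins do not appear at all.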
This is a bit difficult to describe, and I'm not sure if this can be done in SQL. Using the following example data set:
ID Count Date
1 0 1/1/2015
2 3 1/5/2015
3 4 1/6/2015
4 3 1/9/2015
5 9 1/15/2015
I want to return records where the Date column falls into a range. But, if the "from" date doesn't exist in the table, I want to use the most recent date as my "From" select. For example, if my date range is between 1/5 and 1/9, I would expect to have records 2,3, and 4 returned. But, if I have a date range of 1/3 - 1/6 I want to return records 1,2,and 3. I want to include record 1 because, as 1/3 does not exist, I want the value of the Count that is rounded down.
Any thoughts on how this can be done? I'm using MySQL.
Basically, you need to replace the from date with the latest date before or on that date. Let me assume that the variables are @v_from and @v_to.
select e.*
from example e
where e.date >= (select max(e2.date) from example e2 where e2.date <= @v_from) and
e.date <= @v_to;
EDIT AFTER EDIT:
SELECT *
FROM example
WHERE Date BETWEEN (
SELECT Date
FROM example
WHERE Date <= @Start
ORDER BY Date DESC
LIMIT 1
)
AND @End
Or
SELECT *
FROM example
WHERE Date BETWEEN (
SELECT MAX(Date)
FROM example
WHERE Date <= @Start
)
AND @End
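The "round the start date down" idea can be checked against the sample data with a short sketch. It uses Python's standard sqlite3 module as a stand-in for MySQL, stores the dates as ISO strings so they compare correctly, and renames the Count column to cnt; those choices, the function names, and the COALESCE fallback for a start date that precedes every row are assumptions of this sketch rather than part of the answers above.

```python
import sqlite3


def demo_connection():
    """Load the five sample rows from the question, with ISO dates."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE example (id INTEGER PRIMARY KEY, cnt INTEGER, date TEXT)")
    conn.executemany(
        "INSERT INTO example VALUES (?, ?, ?)",
        [
            (1, 0, "2015-01-01"),
            (2, 3, "2015-01-05"),
            (3, 4, "2015-01-06"),
            (4, 3, "2015-01-09"),
            (5, 9, "2015-01-15"),
        ],
    )
    return conn


def ids_in_range(conn, v_from, v_to):
    """Return ids in [floor(v_from), v_to], where floor(v_from) is the
    latest existing date on or before v_from."""
    sql = (
        "SELECT id FROM example "
        "WHERE date >= COALESCE("
        "  (SELECT MAX(date) FROM example WHERE date <= :v_from), :v_from) "
        "AND date <= :v_to "
        "ORDER BY id"
    )
    return [r[0] for r in conn.execute(sql, {"v_from": v_from, "v_to": v_to})]
```

Without the COALESCE, a start date earlier than every row makes the subquery return NULL, and the comparison then matches nothing.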
I have two tables actual and forecast
Actual
month actual
6 20
7 60
8 70
and Forecast
month forecast
9 50
10 150
11 85
I have to combine them into the same column, i.e. for months where actual data is available the value should come from the actual table, and for months where it is not available the value should come from the forecast table.
month actual/forecast
6 20
7 60
8 70
9 50
10 150
11 85
I'd do something like the following. The Source column is only for your test purposes. I'm also assuming that you might require some year or something included in your queries if your data spans multiple years.
SELECT month, actual as 'actual/forecast', 'A' as Source FROM Actual
UNION
SELECT month, forecast as 'actual/forecast', 'F' As Source FROM Forecast
WHERE month NOT IN (SELECT month FROM Actual)
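A quick way to try the UNION approach on the sample data is the sketch below, in Python with the standard sqlite3 module standing in for MySQL. The function name, the output column names and the extra forecast row for month 8 are hypothetical; the extra row is only there to show that an actual value takes precedence when both tables have the month.

```python
import sqlite3


def combined_series():
    """Actual values where they exist, forecast values otherwise."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE actual (month INTEGER, actual INTEGER)")
    conn.execute("CREATE TABLE forecast (month INTEGER, forecast INTEGER)")
    conn.executemany("INSERT INTO actual VALUES (?, ?)", [(6, 20), (7, 60), (8, 70)])
    conn.executemany(
        "INSERT INTO forecast VALUES (?, ?)",
        # (8, 75) is a made-up overlap with the actual table.
        [(8, 75), (9, 50), (10, 150), (11, 85)],
    )
    sql = (
        "SELECT month, actual AS value, 'A' AS source FROM actual "
        "UNION ALL "
        "SELECT month, forecast, 'F' FROM forecast "
        "WHERE month NOT IN (SELECT month FROM actual) "
        "ORDER BY month"
    )
    return conn.execute(sql).fetchall()
```

UNION ALL is used here because the NOT IN filter already keeps the two halves from overlapping on month.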
maybe this will help
WITH T AS(
SELECT COUNT(*) FROM (SELECT month,actual FROM Actual
UNION ALL
SELECT MONTH,forecast FROM Forecast)
)
IF (SELECT COUNT(*) FROM T) = 0
BEGIN
--UPDATE here forecast TABLE
END
ELSE
--UPDATE here T table
I want to get the number of registrations for a time period (say a week), which isn't that hard to do, but I was wondering if it is in any way possible in MySQL to return a zero for days that have no registrations.
An example:
DATA:
ID_Profile datCreate
1 2009-02-25 16:45:58
2 2009-02-25 16:45:58
3 2009-02-25 16:45:58
4 2009-02-26 10:23:39
5 2009-02-27 15:07:56
6 2009-03-05 11:57:30
SQL:
SELECT
DAY(datCreate) as RegistrationDate,
COUNT(ID_Profile) as NumberOfRegistrations
FROM tbl_profile
WHERE DATE(datCreate) > DATE_SUB(CURDATE(),INTERVAL 9 DAY)
GROUP BY RegistrationDate
ORDER BY datCreate ASC;
In this case the result would be:
RegistrationDate NumberOfRegistrations
25 3
26 1
27 1
5 1
Obviously I'm missing a couple of days in between. Currently I'm solving this in my php code, but I was wondering if MySQL has any way to automatically return 0 for the missing days/rows. This would be the desired result:
RegistrationDate NumberOfRegistrations
25 3
26 1
27 1
28 0
1 0
2 0
3 0
4 0
5 1
This way we can use MySQL to solve any problems concerning the number of days in a month, instead of relying on PHP code to calculate how many days each month has, since MySQL has this functionality built in.
Thanks in advance
No, but one workaround would be to create a single-column table with a date primary key, preloaded with dates for each day. You'd have dates from your earliest starting point right through to some far off future.
Now, you can LEFT JOIN your statistical data against it - then you'll get nulls for those days with no data. If you really want a zero rather than null, use IFNULL(colname, 0)
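The calendar-table idea can be tried out with the sketch below, written in Python with the standard sqlite3 module standing in for MySQL. The function names and the way the calendar is filled from Python are assumptions of this sketch. The join compares the calendar date with DATE(datCreate), because datCreate carries a time of day.

```python
import sqlite3
from datetime import date, timedelta


def fill_calendar(conn, start, end):
    """Insert one row per day from start to end, inclusive."""
    days = (end - start).days + 1
    conn.executemany(
        "INSERT INTO calendar (dt) VALUES (?)",
        [((start + timedelta(days=i)).isoformat(),) for i in range(days)],
    )


def registrations_per_day(conn, first_day, last_day):
    """Count registrations per calendar day, including days with none."""
    sql = (
        "SELECT c.dt, COUNT(p.ID_Profile) "
        "FROM calendar c "
        "LEFT JOIN tbl_profile p ON DATE(p.datCreate) = c.dt "
        "WHERE c.dt BETWEEN ? AND ? "
        "GROUP BY c.dt ORDER BY c.dt"
    )
    return conn.execute(sql, (first_day, last_day)).fetchall()


def demo():
    """Sample rows from the question, over 2009-02-25 to 2009-03-05."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE calendar (dt TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE tbl_profile (ID_Profile INTEGER, datCreate TEXT)")
    fill_calendar(conn, date(2009, 1, 1), date(2009, 12, 31))
    conn.executemany(
        "INSERT INTO tbl_profile VALUES (?, ?)",
        [
            (1, "2009-02-25 16:45:58"),
            (2, "2009-02-25 16:45:58"),
            (3, "2009-02-25 16:45:58"),
            (4, "2009-02-26 10:23:39"),
            (5, "2009-02-27 15:07:56"),
            (6, "2009-03-05 11:57:30"),
        ],
    )
    return registrations_per_day(conn, "2009-02-25", "2009-03-05")
```

Counting a column from the joined table (rather than COUNT(*)) already yields 0 for days without a match, so no IFNULL is needed for the count itself.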
Thanks to Paul Dixon I found the solution. Anyone interested in how I solved this read on:
First create a stored procedure I found somewhere to populate a table with all dates from this year.
CREATE Table calendar(dt date not null);
CREATE PROCEDURE sp_calendar(IN start_date DATE, IN end_date DATE, OUT result_text TEXT)
BEGIN
SET @begin = 'INSERT INTO calendar(dt) VALUES ';
SET @date = start_date;
SET @max = SUBDATE(end_date, INTERVAL 1 DAY);
SET @temp = '';
REPEAT
SET @temp = concat(@temp, '(''', @date, '''), ');
SET @date = ADDDATE(@date, INTERVAL 1 DAY);
UNTIL @date > @max
END REPEAT;
SET @temp = concat(@temp, '(''', @date, ''')');
SET result_text = concat(@begin, @temp);
END
call sp_calendar('2009-01-01', '2010-01-01', @z);
select @z;
Then change the query to add the left join:
SELECT
DAY(dt) as RegistrationDate,
COUNT(ID_Profile) as NumberOfRegistrations
FROM calendar
LEFT JOIN
tbl_profile ON calendar.dt = DATE(tbl_profile.datCreate)
WHERE dt BETWEEN DATE_SUB(CURDATE(),INTERVAL 6 DAY) AND CURDATE()
GROUP BY RegistrationDate
ORDER BY dt ASC
And we're done.
Thanks all for the quick replies and solution.