Calculating the tendency of a value over time - mysql

I've got a table with (amongst other values) the temperature for the last 4 hours. When I graph it, I can see the 'tendency' of the graph at a glance:
The thick red line obviously has a negative direction, while the green line has a positive direction.
How can I calculate this 'direction' value for the last 3 hours' worth of data? The data can be retrieved from the database with the following SQL statement:
SELECT temp FROM weather WHERE time_utc >= NOW() - INTERVAL 3 HOUR
Is there a function like AVG() or something to calculate this, or am I overthinking this?

What about this:
SELECT HOUR(time_utc) as hour_group, AVG(temp)
FROM weather
WHERE time_utc >= NOW() - INTERVAL 3 HOUR
GROUP BY hour_group
This way you divide your measurements into hourly blocks and can compare the first block with the last.
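Comparing the first and last hourly averages gives the sign of the change. If a single number for the 'direction' is wanted, a least-squares slope over the samples is a common alternative. Note that this is a different technique from the hourly grouping above, sketched here in Python on rows already fetched from the table. The function name `trend_per_hour` and the sample data are made up for illustration.

```python
from datetime import datetime, timedelta


def trend_per_hour(samples):
    """Least-squares slope of temperature against time, in degrees per hour.

    `samples` is a list of (datetime, temperature) pairs, e.g. the rows
    returned for the last 3 hours (hypothetical input shape).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    t0 = min(t for t, _ in samples)
    # x axis: hours elapsed since the earliest sample
    xs = [(t - t0).total_seconds() / 3600.0 for t, _ in samples]
    ys = [float(temp) for _, temp in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all samples share the same timestamp")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Illustrative data: one reading every 30 minutes, dropping 0.5 degrees each time
start = datetime(2024, 1, 1, 12, 0)
falling = [(start + timedelta(minutes=30 * i), 20.0 - 0.5 * i) for i in range(7)]
rising = [(t, 10.0 + (t - start).total_seconds() / 3600.0 * 2.0) for t, _ in falling]

print(trend_per_hour(falling))  # negative: temperature is falling
print(trend_per_hour(rising))   # positive: temperature is rising
```

A negative result corresponds to the red line in the question and a positive one to the green line.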

Related

How to get data of the current time from mysql database

I have per-day data in my MySQL database. What I want to do is select the km field at the current time of day minus the km field at the start of the day (00:00 of the current day).
I tried this, but it didn't work:
SELECT km from position where serverTime=startOfTheDay
SELECT km from position where serverTime=currentTime()
These two queries don't work. I just need an idea of what to do.
I also need to subtract the first km value from the last one. Can anyone help?
You can try this approach - it finds the difference between the smallest and largest km for each day.
select min(serverTime) as start,
max(serverTime) as stop,
max(km) - min(km) as km_difference
from position
group by date(serverTime);
For the first one (this only works if the serverTime column is properly defined): SELECT... curdate()

How to add extra 5 minutes in mysql using query

I'm new to MySQL. I already have a value in a time field, and I want to add 5 minutes to it using a query. I tried many things, but none of them worked.
Here is my query:
UPDATE STUDENT SET START_TIME = ADDTIME(START_TIME, 500) WHERE ID = 1;
The query above works, but there is one issue: if the field contains 23:55:00,
I want the result after running the query to be 00:00:00, but it is updated to 24:00:00.
Can anyone help me?
Thanks in advance!
This is a bit tricky, because you only have the time, and you want it to wrap around to 0 after hitting 24 hours. My approach is to extract the number of seconds from START_TIME, add 5 minutes, take the result modulo 24 hours so that it wraps around to zero once it reaches one day's worth of seconds, and then convert the seconds back to a TIME.
UPDATE STUDENT
SET START_TIME = SEC_TO_TIME(MOD(TIME_TO_SEC(START_TIME) + 300, 86400))
WHERE ID = 1;
In the demo below, you can see the logic in action which correctly converts 23:55:00 with five minutes added to become 00:00:00.
SQLFiddle
However, the easiest solution in your case might be to just use a DATETIME and ignore the date component. Then the time should wrap automatically to a new day.
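The wrap-around arithmetic described above (seconds since midnight, plus 300, modulo 86400) can be checked outside the database. The following is only a Python sketch of that arithmetic, not a replacement for the UPDATE statement, and `add_minutes_wrapping` is a name made up for this illustration.

```python
from datetime import time

SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def add_minutes_wrapping(t, minutes):
    """Add minutes to a time of day, wrapping past midnight back to 00:00:00."""
    total = t.hour * 3600 + t.minute * 60 + t.second
    # the modulo is what turns 24:00:00 into 00:00:00
    total = (total + minutes * 60) % SECONDS_PER_DAY
    hours, rest = divmod(total, 3600)
    mins, secs = divmod(rest, 60)
    return time(hours, mins, secs)


print(add_minutes_wrapping(time(23, 55, 0), 5))  # 00:00:00
print(add_minutes_wrapping(time(10, 0, 0), 5))   # 10:05:00
```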
select addtime('23:55:00', '00:06:00');
Output: 24:01:00 (Strictly speaking this is right, because the TIME data type represents only a time; if the result were converted to 00:01:00, the time component would lose 24 hours, which is wrong.)
select addtime('2016-09-01 23:55:00', '00:06:00');
Output: 2016-09-02 00:01:00 (In this case the 24 hours are carried into the date, so the time component is shown as 00:01:00.)
If the requirement is to get a wrapped value such as 00:01:00, here is a workaround:
SELECT TIME((ADDTIME(TIME('23:59:59'), TIME('02:00:00')))%(TIME('24:00:00')));
Reference:
ADDTIME() return 24 hour time

R how to select 150 days with only month and day information

I was able to select the last 150 days from the database when the table had a 'year' column, as follows:
data1 = dbGetQuery(conn_data, statement=paste("SELECT *, STR_TO_DATE(CONCAT(yyyy,'-',mm,'-',dd),'%Y-%m-%d') as dt FROM stations_daily_data", "WHERE STR_TO_DATE(CONCAT(yyyy,'-',mm,'-',dd),'%Y-%m-%d') >= DATE_SUB(CURDATE(), INTERVAL 150 DAY)"))
But now all the data have been averaged by calendar date, so there are only 'month' and 'day' columns (no 'year' column), and I am stuck on how to select the last 150 days. Here is a simplified example of the data frame; the original has 17 million rows:
df <- data.frame(ID=c(1:5,50001:50005),mm=c(rep(1,5),rep(12,5)),dd=c(1:5,27:31),value=c(21:30))
Feb 29 can be ignored, since 150 days is a long enough period.
I tried adding a 'year' column so that I could use the code above, but that would be wrong if, say, the current date is at the beginning of a year. Also, making changes to such a big table in R runs out of memory. I'm not familiar with database queries. Is it possible to do this with a query alone, instead of reading the table into R and then modifying the data frame there? Any suggestion would be appreciated!
EDIT:
The 'year' column is no longer needed since everything has been averaged by date, which means that May 5th is now the average of 60 years of May 5ths. Next I would like to select the last 150 days (averaged); the only reason I tried to add a 'year' column was to make the selection easier.
I need to run this every day. If the current month is after June, it would be easy to just use the current year, but if it is February, some rows would need the current year minus 1. This could be done if the data were much smaller, but when I change the data frame now, R fails with an 'out of memory' error. That's why I was wondering whether there is a way to do the selection in a database query, or with R functions that wouldn't use much memory. Thanks!
You could write a function to calculate year based on a reference year plus an adjustment based on a cut off month. Then you could use the order function to order the data.frame based on calculated year, month, and day, without inserting the new calculated year field into the data.frame.
It won't perform well on a 17-million-row dataset, though, since you are still ordering every row.
# some dummy data (not worrying about illegal dates like Feb 31)
set.seed(123)
da <- data.frame(mm=sample(1:12, 20, replace=TRUE),
                 dd=sample(1:31, 20, replace=TRUE))
# function to calculate year from reference year and cut off month
calc_year <- function(mm_vec, ref_year, cut_month) {
ref_year + ifelse(mm_vec >= cut_month, 0, -1)
}
# order the data.frame by year, month, and day
# (taking 2014 as ref. year & assuming months before June are from prior year)
da[with(da, order(calc_year(mm_vec=mm, ref_year=2014, cut_month=6), mm, dd)), ]
# if you want just the first 5 rows
da[with(da, order(calc_year(mm_vec=mm, ref_year=2014, cut_month=6), mm, dd)), ][1:5,]
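Another way to look at the problem, sketched here in Python rather than R, is to enumerate the calendar dates of the window first and keep only their (month, day) pairs, so the year boundary is handled by ordinary date arithmetic and no 'year' column is needed. This is not the approach of the answer above. The helper names are hypothetical, the window is assumed to mean the 150 days ending today (inclusive), and the sample rows mirror the `df` from the question.

```python
from datetime import date, timedelta


def last_n_month_days(today, n_days):
    """Return (month, day) pairs for the n_days days ending at `today`, oldest first."""
    days = [today - timedelta(days=offset) for offset in range(n_days - 1, -1, -1)]
    return [(d.month, d.day) for d in days]


def select_recent(rows, today, n_days):
    """Keep rows whose (mm, dd) falls in the window, in chronological order.

    Each row is a tuple (ID, mm, dd, value). Assumes n_days < 365 so that
    no (month, day) pair occurs twice in the window.
    """
    order = {md: pos for pos, md in enumerate(last_n_month_days(today, n_days))}
    picked = [r for r in rows if (r[1], r[2]) in order]
    return sorted(picked, key=lambda r: order[(r[1], r[2])])


# Same shape as the question's example data frame: Jan 1-5 and Dec 27-31
rows = [(i, 1, i, 20 + i) for i in range(1, 6)] + \
       [(50000 + k, 12, 26 + k, 25 + k) for k in range(1, 6)]

# A 5-day window ending on 3 January spans the year boundary
for row in select_recent(rows, date(2015, 1, 3), 5):
    print(row)
```

Because the window is at most 150 pairs, the same list could in principle be turned into a filter condition for the database query, so that only the matching rows are ever read into memory.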

average rows in a column that are between 5 minutes

I would like to ask how I can average the rows in a column that fall within the same 5 minutes.
To be more precise, I have a table like this:
id   link_id   date                    speed
0    123       (24/4/2014 12:03:34)    45
1    123       (24/4/2014 12:04:34)    43
2    127       (24/4/2014 12:04:37)    50
3    123       (28/4/2014 12:03:34)    60
I would like to create a new table that holds the average speed of the rows that have the same link_id and fall within the same 5 minutes.
In the example above, only the first two rows meet these requirements,
and I want a new table like this:
id   link_id   date                    speed
0    123       (24/4/2014 12:00:00)    44
2    127       (24/4/2014 12:00:00)    50
3    123       (28/4/2014 12:00:00)    60
Which query do I have to use to create a new table that meets those requirements?
Thank you in advance.
It is not clear what you mean by 'average of speed for rows that ... are between five minutes.' So I will guess.
I guess you want to compute the averages for each distinct five-minute interval. For example, you want averages of all items with timestamps from 2014-04-24 12:00:00 to 2014-04-24 12:04:59, then another average for items with timestamps from 2014-04-24 12:05:00 to 2014-04-24 12:09:59, and so forth.
To do this, you need to start with an expression that will take any DATETIME value and round it down to the beginning of its five-minute interval. How do you get that?
First, this expression will round down a timestamp to the beginning of the minute in which it occurs:
DATE_FORMAT(`date`,'%Y-%m-%d %H:%i:00')
This expression gives the number of minutes past the hour, modulo 5.
MINUTE(`date`)%5
So, this expression gives you the rounded-down DATETIME you need:
DATE_FORMAT(`date`,'%Y-%m-%d %H:%i:00') - INTERVAL (MINUTE(`date`)%5) MINUTE
Great. Now we need to use that in an aggregate query to compute the average speeds.
SELECT link_id,
DATE_FORMAT(`date`,'%Y-%m-%d %H:%i:00') - INTERVAL (MINUTE(`date`)%5) MINUTE AS five_min,
AVG(speed) AS avg_speed
FROM mytable
GROUP BY link_id,
DATE_FORMAT(`date`,'%Y-%m-%d %H:%i:00') - INTERVAL (MINUTE(`date`)%5) MINUTE
ORDER BY link_id,
DATE_FORMAT(`date`,'%Y-%m-%d %H:%i:00') - INTERVAL (MINUTE(`date`)%5) MINUTE;
This will do the trick you need done. There will be one row for each distinct link_id and five-minute interval of time. The time interval will be named by giving the time at which it begins. Each row will contain the average speed for observations in that time interval.
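To see what the rounding and grouping produce, here is the same logic sketched in Python on the four sample rows from the question. It only illustrates the bucketing and is not meant to replace the query; the function names are made up for this example.

```python
from collections import defaultdict
from datetime import datetime


def floor_to_five_minutes(ts):
    """Round a datetime down to the start of its five-minute interval."""
    return ts.replace(minute=ts.minute - ts.minute % 5, second=0, microsecond=0)


def average_speed_by_bucket(rows):
    """Average speed per (link_id, five-minute interval start)."""
    groups = defaultdict(list)
    for link_id, ts, speed in rows:
        groups[(link_id, floor_to_five_minutes(ts))].append(speed)
    return {key: sum(speeds) / len(speeds) for key, speeds in sorted(groups.items())}


# (link_id, date, speed) rows taken from the question
rows = [
    (123, datetime(2014, 4, 24, 12, 3, 34), 45),
    (123, datetime(2014, 4, 24, 12, 4, 34), 43),
    (127, datetime(2014, 4, 24, 12, 4, 37), 50),
    (123, datetime(2014, 4, 28, 12, 3, 34), 60),
]

for (link_id, bucket), avg in average_speed_by_bucket(rows).items():
    print(link_id, bucket, avg)
```

The first two rows land in the same 12:00:00 bucket for link 123 and average to 44, which matches the table the question asks for.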
It's helpful when creating your specification for this kind of query to think very carefully about what you want each row of your result set to contain. If you do that, you will probably find that your query flows naturally from your specification.
Here's a more extensive writeup on how to do this sort of thing.
http://www.plumislandmedia.net/mysql/sql-reporting-time-intervals/

subtract the data for every 5 minutes between two particular times

I have a problem with MySQL. I need to subtract the data between two particular times, for every 5 minutes, and then average the result over those 5 minutes.
What I am doing now is:
select (avg(columnname)),convert((min(datetime) div 500)*500, datetime) + INTERVAL 5 minute as endOfInterval
from Databasename.Tablename
where datetime BETWEEN '2012-09-12 10:50:00' AND '2012-09-12 14:50:00'
group by datetime div 500;
This gives the cumulative average.
Suppose I get 500 at 11:00 and 700 at 11:05; the average I need is (700-500)/5 = 40.
But now I am getting (500+700)/5 = 240.
I don't need the cumulative average.
Kindly help me.
For the kind of average you're talking about, you don't want to aggregate multiple rows using a GROUP BY clause. Instead, you want to compute your result using exactly two different rows from the same table. This calls for a self-join:
SELECT (b.columnname - a.columnname)/5, a.datetime, b.datetime
FROM Databasename.Tablename a, Databasename.Tablename b
WHERE b.datetime = a.datetime + INTERVAL 5 MINUTE
AND a.datetime BETWEEN '2012-09-12 10:50:00' AND '2012-09-12 14:45:00';
a and b refer to two different rows of the same table. The WHERE clause ensures that they are exactly 5 minutes apart.
If there is no second row at exactly that temporal distance, no resulting row will be included in the query result. If your table doesn't have data points exactly every five minutes, but you have to search for the suitable partner instead, then things become much more difficult. This answer might perhaps be adjusted for that use case. Or you might implement this at the application level, instead of on the database server.
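For the application-level route, the pairing that the self-join performs can be sketched in Python as a lookup by timestamp. This is a minimal illustration that assumes the readings are already loaded as (timestamp, value) pairs; `rate_per_minute` is a hypothetical name, and like the self-join it only pairs rows that are exactly five minutes apart.

```python
from datetime import datetime, timedelta


def rate_per_minute(readings, step=timedelta(minutes=5)):
    """Pair each reading with the one exactly `step` later.

    Returns (start, end, change per minute) tuples. Readings without a
    partner exactly `step` later are skipped, as in the self-join.
    """
    by_time = dict(readings)
    minutes = step.total_seconds() / 60
    result = []
    for ts in sorted(by_time):
        later = ts + step
        if later in by_time:
            result.append((ts, later, (by_time[later] - by_time[ts]) / minutes))
    return result


readings = [
    (datetime(2012, 9, 12, 11, 0), 500),
    (datetime(2012, 9, 12, 11, 5), 700),
    (datetime(2012, 9, 12, 11, 12), 900),  # no partner exactly 5 minutes away
]

for start, end, rate in rate_per_minute(readings):
    print(start, end, rate)  # 2012-09-12 11:00:00 2012-09-12 11:05:00 40.0
```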