We have records taken every 30 seconds, and over the past few years quite a few have built up. I need to create a windowing search on those records based on some interval of time.
For example:
The user inputs a 15-minute increment between records based on the current time. If the time now is 8:53, the next points would be 9:08, 9:23, etc., within 1 hour.
The user wants records between two dates (which could be as much as 3 years of records at a 15-minute increment).
The records are not guaranteed to fall exactly at those times; we need to return the record closest to each 15-minute point.
There are several other query conditions that further filter these records, but the main issue is the problem above.
I have already created something that meets this goal, but it can likely be refined and made more performant. The way the UI queries and applies the incremented records can likely be improved as well.
Does anyone have an idea for a solution? Thanks in advance.
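For reference, here is a minimal sketch of one approach in MySQL 8, assuming a hypothetical table records(id, recorded_at, value); the start/end times and the half-window tolerance of 450 seconds (half of 15 minutes) are illustrative. A recursive CTE generates the 15-minute points and a lateral subquery picks the nearest row for each (the default cte_max_recursion_depth of 1000 would need raising for multi-year ranges):

-- Nearest record to each 15-minute point over one hour (names are assumptions)
WITH RECURSIVE ticks AS (
  SELECT TIMESTAMP('2018-06-01 08:53:00') AS tick
  UNION ALL
  SELECT tick + INTERVAL 15 MINUTE FROM ticks
  WHERE tick < TIMESTAMP('2018-06-01 09:53:00')
)
SELECT t.tick, r.id, r.recorded_at, r.value
FROM ticks t,
LATERAL (
  SELECT id, recorded_at, value
  FROM records
  WHERE recorded_at BETWEEN t.tick - INTERVAL 450 SECOND
                        AND t.tick + INTERVAL 450 SECOND
  ORDER BY ABS(TIMESTAMPDIFF(SECOND, t.tick, recorded_at))
  LIMIT 1
) AS r;

An index on recorded_at keeps each lateral lookup a small range scan instead of a full pass over the table.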
Currently I put hourly traffic (total number of input requests) of my website in a MySQL table. I keep data for the last 90 days.
I want to check every hour, let's say the 6th hour, whether the traffic has increased or decreased beyond some threshold compared to the 6th-hour traffic over the last 7 days or last 30 days. Basically, I see a pattern in the traffic; different hours have different values.
-> Is there some alerting framework which I can use as-is for this purpose?
-> If yes, can someone suggest an open-source one?
-> If no, I am thinking of keeping a running average of the last 7 days / last 30 days in a MySQL table for every hour, and accordingly writing a script to generate alerts based on those numbers. I am not very sure whether I should use the mean, median, or standard deviation. Can someone enlighten me there?
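If you go the script route, here is a minimal sketch of the baseline query, assuming a hypothetical table traffic(day, hour, requests): take the mean and sample standard deviation of the same hour over the trailing 30 days, then alert when the current count falls outside mean ± 3 standard deviations (the multiplier of 3 is a common starting point, not a rule):

-- Baseline for the 6th hour over the last 30 days (table name is an assumption)
SELECT AVG(requests)         AS mean_req,
       STDDEV_SAMP(requests) AS sd_req
FROM traffic
WHERE hour = 6
  AND day >= CURDATE() - INTERVAL 30 DAY
  AND day <  CURDATE();

-- Alert if today's 6th-hour count is outside mean_req ± 3 * sd_req.

A standard-deviation band has the advantage of adapting to how noisy each hour is; a plain median would need a separate, hand-tuned threshold per hour.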
There is some value, x, which I am recording every 30 seconds, currently into a database with three fields:
ID
Time
Value
I am then creating a mobile app which will use that data to plot charts in views of:
Last hour
Last 24 hours
Last 7 days
Last 30 days
Last year
Obviously, saving a value every 30 seconds for a whole year and then sending that data to a mobile device would be too much (it would mean sending 1,051,200 values).
My second thought was that perhaps I could use the AVG function in MySQL, for example collecting an average for every 7 days (creating 52 points for a year), and send those points. This would work, but MySQL would still be trawling through the raw data creating averages, and if many users connect, it's going to be bad.
So simply put, given these views, I do not need to keep all of that data. Nobody should care what x was a year ago to a precision of 30 seconds, so this is fine. I should be able to use triggers to create some averages.
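As a rough sketch of that second thought, assuming a hypothetical table samples(time, value), the year view could be computed on the fly like this, at the cost of scanning a year of rows on every request:

-- One averaged point per week for the year view (names are assumptions)
SELECT YEARWEEK(time) AS wk,
       AVG(value)     AS avg_value
FROM samples
WHERE time >= NOW() - INTERVAL 1 YEAR
GROUP BY wk
ORDER BY wk;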
I'm looking for someone to check what I have below is reasonable:
Store values every 30s in a table (this will be used for the hour view, 120 points)
When there are 120 rows in the 30s table (120 × 30s = 60 mins = 1 hour), use a trigger to average the first half hour into a "half hour average" table and remove the first 60 entries from the 30s table. This new table will need an id, start time, end time and value. The half-hour averages will be used for the 24-hour view (48 data points).
When the half-hour table has more than 24 entries (12 hours), average the first 12 (12 × 30 mins = 6 hours) into a 6-hour average table and then remove them. These 6-hour averages will be used for the 7-day view (28 data points).
When there are 8 entries in the 6-hour table, remove the first 4 and store them as a one-day average, to be used in the 30-day view (30 data points).
When there are 14 entries in the day table, remove the first 7 and store them in a week table; this will be used for the year view. (One rollup step is sketched after this list.)
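Here is a minimal sketch of the first rollup step, assuming hypothetical tables samples_30s(time, value) and half_hour_avg(id, start_time, end_time, value). One caveat: MySQL does not allow a trigger on samples_30s to delete from samples_30s itself, so a scheduled event is used instead of a trigger:

CREATE EVENT rollup_half_hour
ON SCHEDULE EVERY 30 MINUTE
DO
  BEGIN
    -- Average the oldest 60 samples (30 minutes) into one row...
    INSERT INTO half_hour_avg (start_time, end_time, value)
    SELECT MIN(time), MAX(time), AVG(value)
    FROM (SELECT time, value FROM samples_30s
          ORDER BY time LIMIT 60) AS oldest;
    -- ...then drop them from the 30s table.
    DELETE FROM samples_30s ORDER BY time LIMIT 60;
  END;

(The compound body needs a custom DELIMITER when entered through the mysql client, and this version trusts the 30-minute schedule rather than counting rows, so treat it as a sketch of the idea rather than a drop-in.)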
This cascade doesn't seem like the best way to me, as it is more complicated than I imagine it should be.
The alternative is to keep all of the data and let MySQL compute averages as and when needed. This would create a monstrously large database, and I have no idea about the performance yet. The id is an INT, the time is a DATETIME and the value is a FLOAT. Is 1,051,200 records too many? Now is a good time to add that I would like to run this on a Raspberry Pi; if not, I do have my main machine which I could use.
Your proposed design looks OK. Perhaps there are more elegant ways of doing this, but your proposal should work too.
RRD (http://en.wikipedia.org/wiki/Round-Robin_Database) is a specialised database designed to do all of this automatically, and it should be much more performant than MySQL for this specialised purpose.
An alternative is the following: keep only the original table (1,051,200 records), but have a trigger that regenerates the last hour/day/year etc. views every time a new record is added (i.e. every 30 seconds) and store/cache the result somewhere. Then your number-crunching workload is independent of the number of requests/clients you have to serve.
1051200 records may or may not be too many. Test in your Raspberry Pi to find out.
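One concrete way to realise that cached-view idea, assuming hypothetical tables samples(time, value) and view_cache(name, value) with name as the primary key; a scheduled event is shown here rather than a trigger, but the principle is the same:

-- Refresh the cached aggregate whenever a new sample should have arrived
CREATE EVENT refresh_view_cache
ON SCHEDULE EVERY 30 SECOND
DO
  REPLACE INTO view_cache (name, value)
  SELECT 'last_hour_avg', AVG(value)
  FROM samples
  WHERE time >= NOW() - INTERVAL 1 HOUR;

Clients then read the precomputed row (SELECT value FROM view_cache WHERE name = 'last_hour_avg') instead of aggregating the raw table on every request.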
Let me give a suggestion on the physical layout of your table, regardless of whether you decide to keep all data or "prune" it from time to time...
Since you generate a new row every 30 seconds, Time can serve as a natural key without fear of exceeding the resolution of the underlying data type and causing duplicate keys. You don't need ID in this scenario¹, so your table is simply:
Time (PK)
Value
And since InnoDB tables are clustered, not having secondary indexes² means the whole table is stored in a single B-tree, which is as efficient as it gets from a storage and querying perspective. On top of that, Value is automatically covered, which may not have been the case in your original design unless you specifically designed your index(es) for that.
Using time as a key can be tricky in general, but I think it may be worth it in this particular case.
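As a sketch, the resulting definition is simply (the table name is illustrative):

CREATE TABLE samples (
  time  DATETIME NOT NULL,
  value FLOAT    NOT NULL,
  PRIMARY KEY (time)
) ENGINE = InnoDB;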
¹ Unless there are other tables that reference it through FOREIGN KEYs, or you have already written too much code that depends on it.
² Which would be necessary in the original design to support efficient aggregation.
I want to retrieve the data at lightning speed for my analytics solution. The problem is as follows.
I am processing a lot of data every 15 minutes and creating different cubes [tuples] with a very large number of distinct dimensions. In detail, I am segregating data into 100 countries × 10,000 states × 5,000 TV channels × 999 programs × ... × 15-minute time intervals, so every 15 minutes this creates lots of different tuples. Currently I am using a MySQL database and load the data via file dumps, which is much faster to write. I also have separate aggregated tables at 15-minute, 1-hour, 1-day, 1-week, 1-month and 1-year granularities, which I use for different types of queries. But retrieval is taking a lot of time, even with the best indexing done on the database tables.
Can anyone provide a solution to this problem? If possible, contrasting NoSQL with MySQL databases?
I am getting my data as a server log in a simple txt file, generated by my Java web service application using its logging functionality, like this ...
As I mentioned, I am creating different dimensions by processing these logs every 15 minutes and saving them into a MySQL table, aggregating on the same dimensions over that period (for example, for 1 hour, four 15-minute dimensions are aggregated to create one). In more detail, I am creating a dimension [/object] like time=2012-11-16 10:00:00, Country='US', Location='X', His=15; and another like time=2012-11-16 10:15:00, Country='US', Location='X', His=67. These values then go into the database tables, so I can retrieve data at any time by hour, 15 minutes or day.
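For what it's worth, each 15-minute-to-hour aggregation described above can be expressed as a single rollup statement; a sketch, assuming hypothetical tables cube_15m and cube_1h that both carry (time, country, location, his) columns:

-- Roll one hour of 15-minute tuples up into the hourly table
INSERT INTO cube_1h (time, country, location, his)
SELECT DATE_FORMAT(time, '%Y-%m-%d %H:00:00') AS hr,
       country, location, SUM(his)
FROM cube_15m
WHERE time >= '2012-11-16 10:00:00'
  AND time <  '2012-11-16 11:00:00'
GROUP BY hr, country, location;

The same shape, with a different DATE_FORMAT mask and window, covers the day/week/month/year tables.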
I'm making an air conditioning scheduler website for school. A user will be able to add a temperature and humidity setting for any of the 30 minute intervals throughout the day, for seven days of the week. For example, a user will be able to say that on Sunday, at 3:30 PM, they want the cooler (rather than the heater) to cool their home down to 70 degrees and a humidity index of 50 for 15 minutes. I could use advice setting up a MySQL table (or tables) to handle such commands. It's not the individual variables for all the potential settings I'm worried about, but rather handling all those times for all seven days.
So far I am thinking of having one big table called Scheduler which would handle the entire week. The day AND time slots for the seven days of the week would go into a VARCHAR column called time_slot, holding both the day and the time slot in military time. For example:
time_slot (a VARCHAR column)
sunday_0000 (this is sunday at midnight)
.....
sunday_1630 (this is sunday at 4:30 pm)
.....
sunday_2330 (this is the final possible Sunday time slot, at 11:30 PM)
monday_0000 (this is the start of monday)
(continue for all seven days)
The remaining columns of the table would be all the necessary settings a user could apply, as well as a duration from 30 seconds up to the full 30 minutes before the next potential time slot. Does anyone have any ideas for a more efficient MySQL table? Perhaps something that gives each individual day its own table?
You may want to consider having multiple columns instead, using TINYINT for the day (1-7) and TIME for the time slot (00:00-23:59). This way one could set the time for each day individually, or for all days at once.
e.g.
UPDATE scheduler
SET ...
WHERE time = '12:00';
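A fuller sketch of that layout (column names and settings are illustrative):

CREATE TABLE scheduler (
  day         TINYINT NOT NULL,  -- 1 = Sunday ... 7 = Saturday
  time        TIME    NOT NULL,  -- 00:00:00, 00:30:00, ..., 23:30:00
  temperature INT     NOT NULL,
  humidity    INT     NOT NULL,
  duration    INT     NOT NULL,  -- seconds, 30 up to 1800
  PRIMARY KEY (day, time)
);

Sunday at 4:30 PM is then just WHERE day = 1 AND time = '16:30', and the composite primary key guarantees one row per slot.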