Currently I put hourly traffic (total number of input requests) of my website in a MySQL table. I keep data for the last 90 days.
I want to check every hour, say the 6th hour, whether the traffic has increased or decreased beyond some threshold compared with the 6th-hour traffic of the last 7 days or last 30 days. Basically, I see a pattern in the traffic: different hours have different values.
-> Is there some alerting framework I can use as-is for this purpose?
-> If yes, can someone suggest an open-source one?
-> If no, I am thinking of keeping a running average of the last 7 days / last 30 days in a MySQL table for every hour, and accordingly writing a script to generate alerts based on those numbers. I am not very sure whether I should use the mean, median or standard deviation. Can someone enlighten me there?
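On the mean-vs-median-vs-standard-deviation question: the mean and standard deviation together give a z-score, i.e. "how many standard deviations is this hour's traffic from the same hour's historical values". A minimal sketch, assuming the history has already been pulled from the MySQL table (function name and threshold are my own, not from any framework):

```python
import statistics

def hourly_alert(current, history, z_threshold=3.0):
    """Flag `current` if it is more than `z_threshold` standard
    deviations away from the mean of `history` (the same hour's
    traffic on previous days)."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # sample standard deviation
    if stdev == 0:
        return current != mean, 0.0
    z = (current - mean) / stdev
    return abs(z) > z_threshold, z

# Example: 6th-hour traffic for the last 7 days (illustrative numbers)
last_7_days = [980, 1020, 995, 1010, 1005, 990, 1000]
fired, z = hourly_alert(2500, last_7_days)
```

The median (with median absolute deviation) is more robust if the history itself contains outliers, e.g. a past incident day; the mean/stddev version is simpler and usually adequate for 7-30 samples.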
Related
We have records taken every 30 seconds. Over the past few years quite a few have built up. We are creating a windowing search on those records based on some interval of time.
For Example:
User inputs a 15-minute increment between records based on the current time. Time now: 8:53, so the next boundaries would be 9:08, 9:23, etc., within 1 hour.
User wants records between two dates (could be as big as 3 years of records at a 15 minute increment).
The records are not guaranteed to fall exactly at those times, so we need to return the record closest to each 15-minute boundary.
There are several other query additions to filter these records by but the main issue is the problem above.
I have already created something that meets this goal, but it can likely be refined and made more performant. The way the UI queries and applies the incremented records can likely be improved as well.
Does anyone have an idea for a solution? Thanks in advance.
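For the core problem (closest record to each 15-minute boundary), one approach is a binary search per boundary over the sorted timestamps, comparing the neighbouring records on either side. A sketch under my own naming, not the original code:

```python
import bisect
from datetime import datetime, timedelta

def closest_records(times, start, end, step=timedelta(minutes=15)):
    """For each boundary start, start+step, ... up to `end`, return
    the timestamp in `times` closest to it. `times` must be sorted."""
    out = []
    t = start
    while t <= end:
        i = bisect.bisect_left(times, t)
        # the closest record is either just before or just after t
        candidates = times[max(i - 1, 0):i + 1]
        if candidates:
            out.append(min(candidates, key=lambda c: abs(c - t)))
        t += step
    return out

# Example: sparse readings, boundaries at 8:53 and 9:08
times = [datetime(2024, 1, 1, 8, 53),
         datetime(2024, 1, 1, 9, 7, 40),
         datetime(2024, 1, 1, 9, 9)]
picked = closest_records(times, times[0], datetime(2024, 1, 1, 9, 8))
```

With an index on the time column, the SQL equivalent is one small lookup per boundary; for very large ranges you may also want to dedupe boundaries that resolve to the same record.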
I am developing an application that consists of a server in Node.js, basically a socket that listens for incoming connections.
The data arriving at my server comes from GPS trackers (approximately 30), each sending 5 records per minute, so in a minute I will have 5 * 30 = 150 records, in an hour 150 * 60 = 9,000 records, in a day 9,000 * 24 = 216,000 and in a month 216,000 * 30 = 6,480,000 records.
In addition to the latitude and longitude, I have to store in the database (MySQL) the cumulative distance of each tracker. Each tracker sends positions to the server, and I have to calculate the kilometers between 2 points every time I receive data (to reduce the work for the database once it holds millions of records).
So the question is, what is the correct way to sum the kilometers and store them?
I think summing over the entire table is not a solution, because with millions of records it will be very slow. Maybe, every time I have to store a new point (150 times per minute), I can select the last record in the database and add the newly calculated distance to its cumulative kilometers?
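A sketch of that incremental approach: compute the great-circle (haversine) distance for each new point and keep a running total per tracker, so the table is never scanned. All names here are illustrative; persisting the total would just be an extra column on the inserted row:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in km."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Keep the last point and running total per tracker in memory,
# so each insert only adds one increment -- no full-table sums.
last_point = {}  # tracker_id -> (lat, lon, cumulative_km)

def record_position(tracker_id, lat, lon):
    prev = last_point.get(tracker_id)
    total = 0.0
    if prev is not None:
        total = prev[2] + haversine_km(prev[0], prev[1], lat, lon)
    last_point[tracker_id] = (lat, lon, total)
    return total  # store alongside the row in MySQL
```

If the process restarts, the in-memory state can be rebuilt with one `SELECT` of the latest row per tracker, which an index on (tracker, time) makes cheap.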
2.5 inserts per second is only a modest rate. 6M records/month -- no problem.
How do you compute the KMs? The distance from the previous GPS reading to the current one? Or maybe back to the start? Keep in mind that GPS readings can be a bit flaky; a car going in a straight line may look drunk when plotted every 12 seconds. Meanwhile, I will assume you need some kind of index on (sensor, sequence) to locate the previous (or first) reading when computing the distance.
But, what will you do with the distance? Is it being continually read out for display somewhere? That is, are you updating some non-MySQL thingie 150 times per minute? If so, you have an app that should receive the new GPS reading, store it into MySQL, read the starting point (or remember it), compute the kms and update the graph. That is, MySQL is not the focus here, but your app is.
As for representation of lat/lng, I reference my cheat sheet to see that FLOAT may be optimal.
Kilometers should almost certainly be stored as FLOAT. That gives you about 7 significant digits of precision. You should decide whether the value represents "meters" or "kilometers". (The precision is the same.)
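The "about 7 significant digits" claim can be checked by round-tripping a value through a 4-byte IEEE-754 float, which is what a MySQL FLOAT column stores. A quick sketch:

```python
import struct

def as_mysql_float(x):
    """Round-trip a value through a 4-byte IEEE-754 float,
    as a MySQL FLOAT column would store it."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

# ~7 significant digits survive; digits beyond that are lost.
stored = as_mysql_float(12345.678)  # an example cumulative distance in km
```

A cumulative distance of 12345.678 km keeps its metre-level digits, but the 8th significant digit is gone; if you ever need more, DOUBLE is the next step up.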
I'm currently working on an application which requires detection of activity within the past 12 hours (divided into 10-minute intervals, i.e. 6 per hour). My initial thought has been to run a cronjob every 10 minutes that detects and accumulates the activity level (rows) from a table within the past 10 minutes, and then inserts these into a new table, which collects and updates the activity level for e.g. 1 hour 20 minutes ago (relative to the current timestamp).
Though I'm struggling to figure out the logic of "pushing" (for lack of a better word) all other values to the next slot within the table. E.g. the value for 1 hour and 20 minutes ago should then be "pushed" to 1 hour and 30 minutes ago, etc.
I realize that my thoughts on the setup are limited by my understanding of PHP/MySQL, but I am open to other stacks such as Node.js/MongoDB if that seems more flexible and feasible. The output should be a JSON file showing the activity level for each 10-minute slot over the past 12 hours.
Would love some thoughts/feedback as to approach and way to handle this. Thanks a bunch in advance.
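One way to avoid the "pushing" entirely: store each bucket with its absolute start timestamp and compute the relative labels ("1 h 20 min ago") at read time; a fixed-length queue then drops old buckets automatically. A sketch with my own names, not an existing API:

```python
from collections import deque
from datetime import datetime

BUCKETS = 12 * 6  # 12 hours of 10-minute intervals

# Oldest buckets fall off automatically once 12 hours are stored.
activity = deque(maxlen=BUCKETS)

def record_interval(bucket_start, count):
    """Called by the 10-minute cron job with that interval's row count."""
    activity.append((bucket_start, count))

def as_relative(now):
    """Label each bucket relative to `now`, newest first.
    Nothing is ever moved; labels are derived on the fly."""
    return [
        {"minutes_ago": int((now - start).total_seconds() // 60),
         "count": count}
        for start, count in reversed(activity)
    ]

# Example: two cron runs, then a snapshot relative to "now"
record_interval(datetime(2024, 1, 1, 11, 40), 1)
record_interval(datetime(2024, 1, 1, 11, 50), 2)
snapshot = as_relative(datetime(2024, 1, 1, 12, 0))
```

`json.dumps(snapshot)` gives the JSON output described; the same idea works in PHP/MySQL by selecting the last 72 bucket rows and labelling them relative to NOW().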
There is some value, x, which I am recording every 30 seconds, currently into a database with three fields:
ID
Time
Value
I am then creating a mobile app which will use that data to plot charts in views of:
Last hour
Last 24 hours
Last 7 days
Last 30 days
Last year
Obviously, saving a value every 30 seconds for the last year and then sending all that data to a mobile device would be too much (it would mean sending 1,051,200 values).
My second thought was that perhaps I could use MySQL's average function, for example collecting the average for every 7-day period (creating 52 points for a year) and sending those points. This would work, but MySQL would still be trawling through the data to create the averages, and if many users connect, that's going to be bad.
So simply put, given these views, I do not need to keep all that data. Nobody should care what x was a year ago to a precision of 30 seconds. I should be able to use "triggers" to create some averages.
I'm looking for someone to check what I have below is reasonable:
Store values every 30s in a table (this will be used for the hour view, 120 points)
When there are 120 rows in the 30s table (120 * 30 s = 60 min = 1 hour), use a trigger to average the first half hour into a "half hour average" table and remove the first 60 entries from the 30s table. This new table will need an id, start time, end time and value. The half hour averages will be used for the 24-hour view (48 data points).
When the half hour table has more than 24 entries (12 hours), average the first 12 (6 hours' worth) into a "6 hour average" table and then remove them. The 6-hour averages will be used for the 7-day view (28 data points).
When there are 8 entries in the 6 hour table, remove the first 4 and store their average as a day, to be used in the 30-day view (30 data points).
When there are 14 entries in the day table, remove the first 7 and store their average in a week table; this will be used for the year view.
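The cascade above is the same operation at every level, so it can be expressed as one generic step. A sketch of the rollup logic (each list stands in for one of the MySQL tables; rows are (start_time, end_time, value) tuples):

```python
def rollup(fine, coarse, group_size, flush_at):
    """When `fine` has at least `flush_at` rows, average the oldest
    `group_size` rows into one `coarse` row and delete them."""
    if len(fine) < flush_at:
        return
    group = fine[:group_size]
    coarse.append((
        group[0][0],                            # start of first row
        group[-1][1],                           # end of last row
        sum(r[2] for r in group) / group_size,  # plain average
    ))
    del fine[:group_size]
```

The four cascade levels are then `rollup(rows_30s, half_hour, 60, 120)`, `rollup(half_hour, six_hour, 12, 24)`, `rollup(six_hour, day, 4, 8)` and `rollup(day, week, 7, 14)`, run after each insert or from a cron job.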
This doesn't seem like the best way to me, as it seems to be more complicated than I would imagine it should be.
The alternative is to keep all of the data and let MySQL compute averages as and when needed. This would create a monstrously huge database; I have no idea about the performance yet. The id is an INT, time is a DATETIME and value is a FLOAT. Are 1,051,200 records too many? Now is a good time to add that I would like to run this on a Raspberry Pi, but if that's not feasible I do have my main machine which I could use.
Your proposed design looks OK. Perhaps there are more elegant ways of doing this, but your proposal should work too.
RRD (http://en.wikipedia.org/wiki/Round-Robin_Database) is a specialised database designed to do all of this automatically, and it should be much more performant than MySQL for this specialised purpose.
An alternative is the following: keep only the original table (1051200 records), but have a trigger that generates the last hour/day/year etc views every time a new record is added (e.g. every 30 seconds) and store/cache the result somewhere. Then your number-crunching workload is independent of the number of requests/clients you have to serve.
1051200 records may or may not be too many. Test in your Raspberry Pi to find out.
Let me give a suggestion on the physical layout of your table, regardless on whether you decide to keep all data or "prune" it from time to time...
Since you generate a new row "every 30 seconds", Time can serve as a natural key without fear of exceeding the resolution of the underlying data type and causing duplicate keys. You don't need ID in this scenario¹, so your table is simply:
Time (PK)
Value
And since InnoDB tables are clustered, having no secondary indexes² means the whole table is stored in a single B-Tree, which is as efficient as it gets from a storage and querying perspective. On top of that, Value is automatically covered, which may not have been the case in your original design unless you specifically designed your index(es) for that.
Using time as key can be tricky in general, but I think may be worth it in this particular case.
¹ Unless there are other tables that reference it through FOREIGN KEYs, or you have already written too much code that depends on it.
² Which would be necessary in the original design to support efficient aggregation.
We have a MySQL database table with statistical data that we want to present as a graph, with timestamp used as the x axis. We want to be able to zoom in and out of the graph between resolutions of, say, 1 day and 2 years.
In the zoomed-out state, we will not want to fetch all the data from the table, since that would mean too much data being shipped through the servers, and the graph resolution will be good enough with less data anyway.
In MySQL you can write queries that select only, e.g., every tenth row, which could be usable in this case. However, the intervals between values stored in the database aren't consistent; two values can be separated by as little as 10 minutes or as much as 6 hours, possibly more.
So the issue is that it is difficult to calculate a good stepping interval for the query: if we skip every tenth value for some resolution, that may work for series with 10 minutes between values, but for 6-hour intervals we will throw away too much, and the graph will end up with too low a resolution for comfort.
My impression is that MySQL isn't able to make the stepping interval depend on time, so that it would skip rows that are, e.g., within five minutes of an already included row.
One solution could be to set 6 hours as the minimal resolution requirement for the graph, so we don't throw away values unless 6 hours is represented by a sufficiently small distance in the graph. I fear that this may result in too much data being read and sent through the system if the actual interval is smaller.
Another solution is to put more intelligence in the Java code, reading data iteratively from low resolution downwards until the data is good enough.
Any ideas for a solution that would enable us to get optimal resolution in one read, without too large result sets being read from the database, while not putting too much load on the database? I'm having wild ideas about installing an intermediate NoSQL component to store the values in, that might support time intervals the way I want - not sure if that actually is an option in the organisation.
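One read-side option: derive the bucket width from the requested range and a target point count, and keep the first row that falls into each bucket. The result size then depends on the zoom level rather than on how dense the stored data happens to be. A sketch with parameter names of my own choosing:

```python
def downsample(rows, start, end, target_points=500):
    """Pick at most one row per time bucket. `rows` is a sorted list
    of (timestamp, value) pairs; timestamps are plain numbers here
    (e.g. Unix seconds). Bucket width scales with the zoom range."""
    bucket = max((end - start) / target_points, 1)
    seen = set()
    out = []
    for t, v in rows:
        if start <= t <= end:
            b = int((t - start) // bucket)
            if b not in seen:   # keep only the first row per bucket
                seen.add(b)
                out.append((t, v))
    return out
```

The same grouping can be pushed into MySQL with something along the lines of `GROUP BY FLOOR(UNIX_TIMESTAMP(ts) / bucket)`, keeping e.g. `MIN(ts)` per group, so only the reduced set ever leaves the database, in one read, regardless of how irregular the stored intervals are.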