I have a very large data set (10-minute wind speed observations over 10 years) which I want to compare to data sets with far fewer records, by way of a normalised/standardised S-curve (cumulative probability distribution). In Excel I would sort the data by wind speed (lowest to highest), take each record's row number, divide it by the total number of rows, and get two normalised probability distributions from P0 to P100 to compare (then graph them with speed on the y axis and 0 to 100 on the x axis).
The table is Analysis and the query is Master, with fields "Timestamp" and "Average wind speed ms". I would like Excel to be able to pull records (time and speed) from Access at P0, P1, ..., P100, for comparison against the same output from another query with the same fields. Alternatively, combine the two sets of speed data into one query in which each row holds the Px wind speed from A and the Px wind speed from B.
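A minimal sketch, in Python, of the sort-and-rank procedure described above; the helper name and the sample speeds are invented for illustration:

```python
def percentile_records(speeds, n_points=101):
    """Pick one value per percentile (P0..P100) by rank position."""
    s = sorted(speeds)
    last = len(s) - 1
    # Rank p/(n_points-1) of the way through the sorted list, as in
    # the Excel row-number / total-rows approach described above.
    return [s[round(last * p / (n_points - 1))] for p in range(n_points)]

# Invented sample speeds standing in for the two Access queries.
series_a = [3.1, 4.7, 2.2, 5.9, 4.1, 6.3, 1.8, 5.0]
series_b = [2.9, 4.2, 3.3, 5.1, 4.8, 5.5, 2.4, 4.9]

# Paired rows: Px from A and Px from B, ready to graph against 0..100.
paired = list(zip(percentile_records(series_a), percentile_records(series_b)))
```

Note that P0 through P100 inclusive is 101 points, so the sketch defaults to 101 records rather than 100.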
I have a problem deciding on an index. I have a MySQL table, say X, with at least 70 million records. I need to query a few fields with a filter like Year = X AND Quarter = X AND (user = X OR manager = X OR ...).
I have an index on year and quarter, but it is not considered, even though if it were used, less than 10% of the data would need to be read.
I also have an index on year, quarter and all the user fields. Even then the index is not considered.
What am I doing wrong?
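One common culprit with a filter of this shape is the OR: MySQL generally cannot serve `year = ? AND quarter = ? AND (user = ? OR manager = ?)` from a single composite index, and a frequent workaround is rewriting the OR as a UNION of two index-friendly queries. A sketch of the rewrite, using SQLite here purely to show the two forms are equivalent; the table and column names are assumptions based on the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE X (year INT, quarter INT, user TEXT, manager TEXT);
-- One composite index per OR branch, each prefixed by the AND columns.
CREATE INDEX ix_yqu ON X (year, quarter, user);
CREATE INDEX ix_yqm ON X (year, quarter, manager);
INSERT INTO X VALUES (2016, 1, 'alice', 'bob'),
                     (2016, 1, 'carol', 'alice'),
                     (2016, 2, 'alice', 'dave');
""")

# The original OR form.
or_rows = conn.execute("""
    SELECT * FROM X
    WHERE year = 2016 AND quarter = 1
      AND (user = 'alice' OR manager = 'alice')
""").fetchall()

# The UNION rewrite: each branch can use one composite index.
union_rows = conn.execute("""
    SELECT * FROM X WHERE year = 2016 AND quarter = 1 AND user = 'alice'
    UNION
    SELECT * FROM X WHERE year = 2016 AND quarter = 1 AND manager = 'alice'
""").fetchall()
```

Whether MySQL's index-merge optimization kicks in for the OR form depends on the version and statistics, so the UNION rewrite is the more predictable option.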
I'm stuck with this situation:
I have a MySQL table with just 3 columns: ID, CREATED, TOTAL_VALUE.
A new TOTAL_VALUE is recorded roughly every 60 seconds, so about 1440 times a day.
I am using PHP to generate some CanvasJS code that plots the MySQL records as a line graph, so that I can see how TOTAL_VALUE changes over time.
It works great for displaying one day's worth of data, but when displaying a week (7 × 1440 = 10,080 plot points) things get really slow.
And a date range of, for example, 1-JAN-2016 to 1-SEP-2016 just leads to timeouts in the PHP script.
How can I write some MySQL that still selects records in a date range but limits the rows returned to, say, a maximum of 1000?
I need to optimize this by limiting the number of data points that need to be plotted.
Can MySQL do something clever where it skips every so many rows and returns 1000 averaged values, so that my line graph is still approximately correct but uses fewer data points?
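MySQL has no built-in "give me ~1000 points" switch, but the usual trick is exactly the averaging described above: derive a bucket number from the timestamp and GROUP BY it. A sketch of the idea, run on SQLite with integer epoch seconds; the table name, column names, and sample data are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, created INT, total_value REAL)")
# One reading per minute for a simulated week.
conn.executemany("INSERT INTO readings (created, total_value) VALUES (?, ?)",
                 [(60 * i, float(i)) for i in range(7 * 1440)])

lo, hi = 0, 7 * 1440 * 60   # requested date range, as epoch seconds
n_buckets = 1000            # cap on returned plot points

# Map each row's position within [lo, hi) to a bucket 0..n_buckets-1
# (integer division), then average TOTAL_VALUE per bucket.
rows = conn.execute("""
    SELECT (created - ?) * ? / (? - ?) AS bucket,
           MIN(created)               AS bucket_start,
           AVG(total_value)           AS avg_value
    FROM readings
    WHERE created >= ? AND created < ?
    GROUP BY bucket
    ORDER BY bucket
""", (lo, n_buckets, hi, lo, lo, hi)).fetchall()
```

In MySQL you would compute the bucket from UNIX_TIMESTAMP(created) the same way; each returned row is one averaged plot point, so the graph never exceeds n_buckets points regardless of the date range.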
I need to build the backend for a chart, which needs to have a fixed amount of data points, let's assume 10 for this example. I need to get all entries in a table, have them split into 10 chunks (by their respective date column) and show how many entries there were between each date interval.
I have managed to do kind of the opposite (I can get the entries for a fixed interval, and variable number of data points), but now I need a fixed number of data points and variable date interval.
What I was thinking (which didn't work) was to get the difference between the min and max date in the table, divide it by 10 (the number of data points), then divide each row's date column by that result and group by it. I either screwed up the query somewhere or my logic is faulty, because it didn't work.
Something along these lines:
SELECT (UNIX_TIMESTAMP(created_at) DIV
        (SELECT (MAX(UNIX_TIMESTAMP(created_at)) - MIN(UNIX_TIMESTAMP(created_at))) / 10
         FROM user)) AS x
FROM user
GROUP BY x;
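The logic above is close; one likely fix is to anchor at MIN(created_at) before dividing by the bucket width, so bucket numbers start at 0 regardless of where the data begins, and then COUNT per bucket. A sketch of that correction, with SQLite integer timestamps standing in for UNIX_TIMESTAMP(created_at) and invented data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (created_at INT)")
# 100 evenly spaced timestamps, purely for illustration.
conn.executemany("INSERT INTO user VALUES (?)", [(t,) for t in range(100)])

# Subtract the minimum before dividing, so the 10 buckets span the
# actual data range; then count entries per bucket.
rows = conn.execute("""
    WITH bounds AS (
        SELECT MIN(created_at) AS mn, MAX(created_at) AS mx FROM user
    )
    SELECT (created_at - mn) * 10 / (mx - mn + 1) AS bucket,
           COUNT(*) AS entries
    FROM user, bounds
    GROUP BY bucket
    ORDER BY bucket
""").fetchall()
```

The "+ 1" on the divisor keeps the row at MAX(created_at) inside bucket 9 instead of spilling into an eleventh bucket.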
I have, in part, the following MySQL schema:
ServiceRequests
----------
id int
RequestDateTime datetime
This is what a typical collection of records might look like.
1 | 2009-10-11 14:34:22
2 | 2009-10-11 14:34:56
3 | 2009-10-11 14:35:01
In this case the average request time is (34+5)/2 = 19.5 seconds, being
14:34:22 ---> (34 seconds) ----> 14:34:56 ------> (5 seconds) -----> 14:35:01
Basically I need to work out the difference in time between consecutive records, sum that up and divide by the number of records.
The closest thing I can think of is to convert the timestamp to epoch time and start there. I can add a field to the table to precalculate the epoch time if necessary.
How do I determine 19.5 using a SQL statement (or statements)?
You don't really need to know the time difference of each record to get the average. You have x data points ranging from some time t0 to t1. Notice that last time minus first time is also 39 sec, so (max - min) / (count - 1) should work for you:
select (max(RequestDateTime) - min(RequestDateTime)) / (count(id) - 1) from ServiceRequests;
Note: This will divide by zero if the table contains only one record.
Note2: Different databases handle subtraction of dates differently so you may need to turn that difference into seconds.
Hint: maybe using TIMEDIFF(expr1,expr2) and/or TIME_TO_SEC(expr3)
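Putting the hint together with the formula, here is a runnable sketch using the sample records from the question; SQLite's strftime('%s', ...) stands in for the TIME_TO_SEC/UNIX_TIMESTAMP conversion:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ServiceRequests (id INT, RequestDateTime TEXT)")
conn.executemany("INSERT INTO ServiceRequests VALUES (?, ?)", [
    (1, "2009-10-11 14:34:22"),
    (2, "2009-10-11 14:34:56"),
    (3, "2009-10-11 14:35:01"),
])

# (max - min) / (count - 1), with the timestamps converted to epoch
# seconds first; the * 1.0 forces floating-point division.
avg_gap = conn.execute("""
    SELECT (MAX(CAST(strftime('%s', RequestDateTime) AS INTEGER)) -
            MIN(CAST(strftime('%s', RequestDateTime) AS INTEGER))) * 1.0
           / (COUNT(id) - 1)
    FROM ServiceRequests
""").fetchone()[0]
```

With the three sample rows this yields the 19.5 seconds worked out above; the MySQL equivalent would use UNIX_TIMESTAMP(RequestDateTime) or TIME_TO_SEC in place of strftime.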
I have two columns in a table: time and volumes.
The time column has a resolution of seconds (format: YYMMDDHHmmss), and the volumes are traffic volumes. I want to write a script that calculates a time series of total volumes over a 5-minute window, i.e. a time series with 5-minute bins.
How can I do that? With GROUP BY?
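Yes, GROUP BY on a derived bin key works. In the YYMMDDHHmmss format the first eight characters are YYMMDDHH and characters 9-10 are the minutes, so flooring the minutes to a multiple of 5 gives the bin. A sketch; the table name and sample rows are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traffic (time TEXT, volumes INT)")
conn.executemany("INSERT INTO traffic VALUES (?, ?)", [
    ("160901120001", 10),  # 2016-09-01 12:00:01
    ("160901120359", 20),  # same 5-minute bin
    ("160901120501", 30),  # next bin
])

# Bin key = YYMMDDHH + minutes floored to a multiple of 5;
# summing volumes per key gives the 5-minute time series.
rows = conn.execute("""
    SELECT substr(time, 1, 8) ||
           printf('%02d', (CAST(substr(time, 9, 2) AS INTEGER) / 5) * 5) AS bin,
           SUM(volumes) AS total_volume
    FROM traffic
    GROUP BY bin
    ORDER BY bin
""").fetchall()
```

The same expression works in MySQL with SUBSTR and LPAD; alternatively, if the column were a real DATETIME, you could group on FLOOR(UNIX_TIMESTAMP(time) / 300).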