MySQL: limit results by a calculated step interval - mysql

I have a need to return a specific number of rows from a query within a given start and stop time at a dynamically calculated step interval.
I've kept it simple here with a table consisting of a unix timestamp and a corresponding integer value.
In my example, I need to have 200 rows returned with an INCLUSIVE start time of 1307455099 and an INCLUSIVE end time of 1307462455.
Here's the current query I've developed so far. It uses the modulus of the total rows to calculate the step interval:
SELECT timestamp, value FROM soh_data
WHERE timestamp % (CAST((1307462455 - 1307455099)/200 AS SIGNED INTEGER)) = 0
AND timestamp BETWEEN 1307455099 AND 1307462455
ORDER BY timestamp;
The first problem is that because I'm using a modulus, the start and end times aren't always inclusive (that's solvable with an extra query... I'm fine with that).
The second, and more difficult, issue to tackle is that the total number of rows returned in this case is only 196. In most queries it falls just short, around n-1.
FYI, this is on a MySQL database with millions of rows of data.
Any insights?

Since I'm fine with throwing away a few rows, but I'm not alright with too little data, I've come up with two different approaches.
First: I've decided to adapt my query to use FLOOR instead of CAST. In my example, the quotient of the division was 21.805, which CAST rounded up to 22. The right step interval for gathering more than 200 results was 21 (yielding 205 results), and FLOOR gives me that 21. Unfortunately, I haven't fully tested this to ensure consistent results across larger sets:
SELECT DISTINCT timestamp FROM soh_data
WHERE timestamp % (FLOOR((1307459460 - 1307455099)/200)) = 0
AND timestamp BETWEEN 1307455099 AND 1307459460
ORDER BY timestamp;
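To sanity-check the two rounding choices, here's a small Python sketch (assuming densely packed one-per-second timestamps, which real data won't match exactly) that counts how many multiples of each candidate step fall inside the range:

```python
def count_matches(lower, upper, step):
    # Multiples of `step` inside the inclusive range [lower, upper].
    return upper // step - (lower + step - 1) // step + 1

lower, upper, limit = 1307455099, 1307459460, 200
span = upper - lower  # 4361, so span / limit = 21.805

print(count_matches(lower, upper, round(span / limit)))  # step 22 -> 198 rows, too few
print(count_matches(lower, upper, span // limit))        # step 21 -> 208 rows, enough to trim
```

With dense data, rounding up to 22 undershoots the 200-row target while flooring to 21 overshoots it slightly, which matches the behaviour described above.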
The more reliable solution is to pre-calculate the step in code. This way, I can zero in on the step programmatically. In the following example, I use Ruby for readability, but my ultimate solution will be coded in C++:
lower = 1307455099
upper = 1307459460
limit = 200
range = lower..upper
matches = 0
stepFactor = ((upper - 1) - (lower + 1)) / limit

while matches <= (limit - 2) do
  matches = 0
  range.each { |ts| matches += 1 if ts % stepFactor == 0 }
  stepFactor -= 1 # For the next attempt
  puts "Step factor = #{stepFactor + 1}"
  puts "Matches = #{matches}"
end

The number of rows returned would depend entirely on how many timestamps match your condition, of course. Let's say your step value comes out to 2, so your modulo math boils down to 'only even numbered timestamps'. If by chance all items in your table have odd time stamps, then you're going to get 0 rows returned, even though there's (say) 500+ items within the time range.
If you need exactly 200, you'd probably be better off using LIMIT in some way.
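One way to get exactly 200 in application code, sketched in Python with a synthetic list standing in for the ordered query result: thin with a coarse stride, then cap the count (the in-SQL analogue of the cap would be LIMIT).

```python
# Stand-in for the ordered query result (real data would be sparser).
rows = list(range(1307455099, 1307462456))

limit = 200
stride = max(1, len(rows) // limit)  # coarse step, erring on the side of too many
sample = rows[::stride][:limit]      # thin, then cap at exactly `limit` rows

print(len(sample))  # 200
```

The start timestamp is always included; the end is only approximately reached, which mirrors the inclusivity caveat from the question.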

Related

MySQL Limit amount of records returned between 2 dates

I'm afraid I'm stuck with this situation:
I have a MySQL table with just 3 columns: ID, CREATED, TOTAL_VALUE.
A new TOTAL_VALUE is recorded roughly every 60 seconds, so about 1440 times a day.
I am using PHP to generate some CanvasJS code that plots the MySQL records into line graph - this so that I can see how TOTAL_VALUE changes over time.
It works great for displaying 1 day's worth of data, but when doing 1 week (7 × 1440 = 10080 plot points) things get really slow.
And a date range of, for example, 1-JAN-2016 to 1-SEP-2016 just leads to timeouts in the PHP script.
How can I write some MySQL that still selects records between a date range but limits the rows returned to, say, a maximum of 1000 rows?
I need to optimize this by limiting the number of data points that need to be plotted.
Can MySQL do some clever stuff where it decides to skip every so many rows and return 1000 averaged values, so that my line graph would by approximation still be correct, but use fewer data points?
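That kind of down-sampling can be done with a plain GROUP BY on a computed bucket number, averaging each bucket. A sketch using SQLite from Python for portability (the table, column names, and data are made up; in MySQL you would bucket on UNIX_TIMESTAMP(CREATED) the same way):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (id INTEGER, created INTEGER, total_value REAL)")

# Synthetic data: one reading per minute for a week, value = minute index.
start = 1451606400  # 2016-01-01 00:00:00 UTC
con.executemany("INSERT INTO readings VALUES (?, ?, ?)",
                [(i, start + i * 60, float(i)) for i in range(7 * 1440)])

# Collapse the range into at most 1000 buckets and average each bucket.
span = 7 * 1440 * 60
bucket = span // 1000 + 1
points = con.execute(
    "SELECT (created - ?) / ? AS bucket_no, AVG(total_value) "
    "FROM readings WHERE created BETWEEN ? AND ? "
    "GROUP BY bucket_no ORDER BY bucket_no",
    (start, bucket, start, start + span),
).fetchall()

print(len(points))  # 1000 averaged plot points instead of 10080 rows
```

One aggregated row per bucket keeps the line graph approximately correct while bounding the number of points the PHP/CanvasJS layer has to plot.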

Efficiently selecting every nth row without ROW_NUMBER

I have a table consisting of about 20 million rows, totalling approximately 2 GB. I need to select every nth row, leaving me with only a few hundred rows. But I cannot for the life of me figure out how to do it without getting a timeout.
ROW_NUMBER is not available, and keeping track of the current row number with a variable (e.g. @row) causes a timeout. I presume this is because it is still iterating over every row, but I'm not too sure. There's no integer index for me to use either; a DATETIME field is used instead. This is an example query using @row:
SET @row = 0;
SELECT `field` FROM `table` WHERE (@row := @row + 1) % 1555200 = 0;
Is there anything else I haven't tried?
Thanks in advance!
It's a tricky one for sure. You could work out the minimum date and then use a DATEDIFF to get you sequential values, but this probably isn't sargable (as below). For me, it took 18 seconds on a table with 16 million rows, but your mileage may vary.
** EDIT ** I should also add that this was with a nonclustered index scan against an index which included the date column (pretty sure this is forced by the function around the date but perhaps someone with more knowledge can expand on this). After creating an index against that column, I got 12 seconds.
Try it out and let me know how it goes :)
DECLARE @n INT = 5;

SELECT
    DATEDIFF(DAY, first_date.min_date, DATE_COLUMN) AS ROWNUM
FROM
    ss.YOUR_TABLE
OUTER APPLY
    ( SELECT MIN(a.DATE_COLUMN) AS min_date
      FROM ss.YOUR_TABLE a
    ) first_date
WHERE DATEDIFF(DAY, first_date.min_date, DATE_COLUMN) % @n = 0
Edit again:
Just noticed this has been accepted as an answer... In case anyone else comes across this, it probably shouldn't be. On review, this only works if your datetime field has one entry per day and the datetime is sequential (in that rows are added in the same order as the datetime, or if the datetime is the primary key).
Again, this only works per day with the above caveats; you can change the DATEDIFF to use any unit (MONTH, YEAR, MINUTE, etc.) if you have one row added per unit of time.
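The DATEDIFF-modulus idea is easy to check outside the database. A Python sketch with synthetic one-row-per-day data (exactly the situation the caveat above requires):

```python
from datetime import date, timedelta

n = 5
rows = [date(2016, 1, 1) + timedelta(days=i) for i in range(100)]  # one row per day

min_date = min(rows)
kept = [d for d in rows if (d - min_date).days % n == 0]

print(len(kept))  # 20: every 5th day survives
```

With multiple rows per day, every row of each fifth day would match, which is why the answer's one-entry-per-unit caveat matters.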

get count of leading zeros in mysql

Is it possible to create a select query that computes the number of leading zeros from a bit-operation in mysql? I would need to compare this to a threshold and return the results.
The SELECT query would be called very often, but I think any SQL-based solution is better than our current strategy of loading everything into memory and then doing it in Java.
example of the functionality:
unsigned INT with the value 1 -> 31, since there are 32 bits and only the rightmost is set
unsigned INT with the value 0x0C000000 -> 4, since there are 4 zero bits from the highest-order bit down to the first set bit
I can then compare the result to a threshold and only get the ones above the threshold.
Pseudocode example query:
SELECT *
FROM data
WHERE numberOfLeadingZeros(data.data XOR parameter) >= threshold;
This should work, but please take a look at #viraptor's suggestion:
SELECT * FROM data WHERE (32 - LENGTH(BIN(data))) >= threshold
This just converts the integer into a binary string without the leading zeros, so subtracting the length of this string from 32 gives you the number of leading zeros. (For a value of 0, BIN() returns '0', so the expression yields 31 rather than 32.)
Fiddle: http://sqlfiddle.com/#!9/e56c31/2
You can ignore the counting of zeros, and just compare to a threshold. For example, you know how many leading 0s are in the 32-bit value 2 (thirty). Anything with fewer leading 0s will be > 2, and anything with more leading 0s will be < 2.
So for your solution, instead of doing the query number_of_zeros(x) >= threshold, do a query for x < 2^(32 - threshold): a 32-bit value has at least `threshold` leading zeros exactly when it is below that bound.
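Both tricks are easy to verify in Python, where x.bit_length() plays the role of LENGTH(BIN(x)) (valid for x > 0):

```python
def leading_zeros_32(x):
    # Leading zero bits of a 32-bit unsigned value (x > 0).
    return 32 - x.bit_length()

print(leading_zeros_32(1))           # 31
print(leading_zeros_32(0x0C000000))  # 4

# Range trick: lz(x) >= t  is equivalent to  x < 2**(32 - t)
t = 4
for x in (0x07FFFFFF, 0x0C000000, 0x10000000):
    assert (leading_zeros_32(x) >= t) == (x < 2 ** (32 - t))
```

The range form is what makes the query fast: `x < constant` can use an index, while a function of `x` generally cannot.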

get average interarrival time for service requests by timestamp

I have partly the following MySQL schema
ServiceRequests
----------
id int
RequestDateTime datetime
This is what a typical collection of records might look like.
1 | 2009-10-11 14:34:22
2 | 2009-10-11 14:34:56
3 | 2009-10-11 14:35:01
In this case the average request time is (34+5)/2 = 19.5 seconds, being
14:34:22 ---> (34 seconds) ----> 14:34:56 ------> (5 seconds) -----> 14:35:01
Basically I need to work out the difference in time between consecutive records, sum that up and divide by the number of records.
The closest thing I can think of is to convert the timestamp to epoch time and start there. I can add a field to the table to precalculate the epoch time if necessary.
How do I determine 19.5 using a sql statement(s)?
You don't really need to know the time difference of each record to get the average. You have x data points ranging from some point t0 to t1. Notice that last time - first time is also 39 sec. (max - min) / (count - 1) should work for you:
select (max(RequestDateTime) - min(RequestDateTime)) / (count(id) - 1) from ServiceRequests;
Note: This will not work if the table has fewer than two rows, due to a divide by zero.
Note2: Different databases handle subtraction of dates differently so you may need to turn that difference into seconds.
Hint: maybe using TIMEDIFF(expr1,expr2) and/or TIME_TO_SEC(expr3)
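The identity behind the shortcut is that the consecutive gaps telescope, so their mean equals (max - min) / (count - 1). A quick Python check using the sample rows from the question:

```python
from datetime import datetime

stamps = [datetime(2009, 10, 11, 14, 34, 22),
          datetime(2009, 10, 11, 14, 34, 56),
          datetime(2009, 10, 11, 14, 35, 1)]

# Mean of the consecutive gaps, computed directly.
gaps = [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
mean_gap = sum(gaps) / len(gaps)

# The telescoping shortcut: no per-row differences needed.
shortcut = (max(stamps) - min(stamps)).total_seconds() / (len(stamps) - 1)

print(mean_gap, shortcut)  # 19.5 19.5
```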

Filtering MySQL query result according to a interval of timestamp

Let's say I have a very large MySQL table with a timestamp field. So I want to filter out some of the results not to have too many rows because I am going to print them.
Let's say the timestamps increase as the number of rows increases, and they arrive roughly every minute on average. (It doesn't necessarily have to be exactly once every minute, ex: 2010-06-07 03:55:14, 2010-06-07 03:56:23, 2010-06-07 03:57:01, 2010-06-07 03:57:51, 2010-06-07 03:59:21 ...)
As I mentioned earlier, I want to filter out some of the records. I do not have a specific rule for doing that, but I was thinking of filtering the rows according to the timestamp interval. After filtering, I want a result set that has a certain number of minutes between timestamps on average (ex: 2010-06-07 03:20:14, 2010-06-07 03:29:23, 2010-06-07 03:38:01, 2010-06-07 03:49:51, 2010-06-07 03:59:21 ...)
Last but not least, the operation should not take an incredible amount of time; I need this functionality to be almost as fast as a normal select operation.
Do you have any suggestions?
I wasn't able to come up with a query that would do this off the top of my head, but here's what I was thinking:
If you have a lot of entries within a single minute, figure out a way to collapse the results such that there is max 1 entry for any given minute (DISTINCT, DATE_FORMAT maybe?).
Limit the number of results by using modulus on the minute value, something like this (if you'd like an entry from every 10 minutes):
WHERE MOD(MINUTE(tstamp_column), 10) = 0
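A quick Python sketch of that minute-modulus filter, assuming at most one row per minute (synthetic data; with denser data you'd need the collapse step described above first):

```python
from datetime import datetime, timedelta

# Synthetic data: one reading per minute for two hours.
start = datetime(2010, 6, 7, 3, 0)
stamps = [start + timedelta(minutes=i) for i in range(120)]

# Equivalent of WHERE MOD(MINUTE(tstamp_column), 10) = 0
kept = [t for t in stamps if t.minute % 10 == 0]

print(len(kept))  # 12: one row per 10-minute boundary
```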
If your goal is to filter records, presumably what you really want is a small percentage of the records, but not the first 10 or 100. In that case, why not just select them randomly? The MySQL RAND() function will return a floating point number n, such that 0 <= n < 1.0. Convert your desired percentage to a floating point number, and use it like this:
SELECT * FROM table
WHERE RAND() < 0.001
If you want repeatable results (for testing), you can use a seed parameter to force the function to always return the same set of numbers.
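The same idea sketched in Python, with a seeded generator standing in for RAND(seed) (the row values are made up):

```python
import random

rows = range(100_000)
fraction = 0.001         # keep roughly 0.1% of the rows
rng = random.Random(42)  # fixed seed -> repeatable sample, like RAND(seed)

sample = [r for r in rows if rng.random() < fraction]
print(len(sample))  # close to 100; the exact count depends on the seed
```

Note that the sample size is only approximate: each row is kept independently, so the count varies around rows × fraction rather than hitting it exactly.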