I've been using a pretty simple array formula in Excel to crunch some datasets, but they're getting too large and absolutely destroying my computer's performance whenever I update the calculations.
The Excel sheet and MySQL database are laid out like so:
+------------+-------+
| Timestamp  | value |
+------------+-------+
| 1340816430 | .02   |
+------------+-------+
x 600,000 rows
Here's the Excel formula:
{=AVERAGEIFS(B:B,A:A,"<"&A1+1000,A:A,">"&A1-1000)}
That returns the average of the values, and is the third column in the Excel sheet. Is there any plausible way for me to create a MySQL query that performs a similar operation and returns a column with the values that would have been in the third column had I run Excel's formula?
If you are happy using Excel formulas, you can speed up this calculation a lot (by a factor of over 3000 on my system). This assumes that column A contains the timestamps in ASCENDING ORDER and column B the values (if they are not already sorted, use Excel's Sort).
In column C put =IFERROR(MATCH(A1-1000,$A:$A,1),1) and copy down. This finds the row number of the last row whose timestamp is at most 1000 less than the current one.
In column D put =IFERROR(MATCH(A1+1000,$A:$A,1),1048576) and copy down. This finds the row number of the last row whose timestamp is at most 1000 more than the current one.
In column E put =AVERAGE(OFFSET(B1,C1-ROW(),0,D1-C1+1,1)) and copy down. This calculates the average of the subset range from the first row to the last row. On my system this does a full calculation of 1000K rows in 20 seconds. The disadvantage of this method is that it's volatile (OFFSET is a volatile function), so it will recalculate whenever you make a change, but I assume that you are in Manual calculation mode anyway.
MySQL code:
select
a.timestamp t1,
avg(x.value) average_value
from
mydata a inner join (
select
timestamp,
value
from mydata
) x
on x.timestamp between a.timestamp - 1000 and a.timestamp + 1000
group by
a.timestamp
order by
t1
;
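I would like to think that without the Excel overhead this will perform far better, but I can't promise it will be lightning fast on 600k rows. You will definitely want an index on Timestamp. Note that BETWEEN is inclusive at both ends, whereas the Excel formula uses strict inequalities, so rows exactly 1000 away are included here. See also the SQL Fiddle I created.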
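As a rough cross-check of what either approach should return, here is a minimal Python sketch of the same windowed average. It is only an illustration: the function name windowed_average and the half_width parameter are made up for this example, it assumes the timestamps are already sorted ascending and half_width is positive, and it uses strict inequalities like the Excel formula.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def windowed_average(timestamps, values, half_width=1000):
    """Average of values whose timestamp is strictly within +/- half_width.

    Hypothetical helper: assumes timestamps are sorted ascending and
    half_width > 0, so every window contains at least its own row.
    """
    # prefix[k] is the sum of values[0:k]; a slice sum is then one subtraction
    prefix = [0.0] + list(accumulate(values))
    result = []
    for t in timestamps:
        lo = bisect_right(timestamps, t - half_width)  # first index with ts > t - w
        hi = bisect_left(timestamps, t + half_width)   # first index with ts >= t + w
        result.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return result


averages = windowed_average([0, 500, 1000, 2500], [1.0, 2.0, 3.0, 10.0])
print(averages)
```

Each row costs two binary searches, so the whole column is O(n log n) rather than the O(n^2) of comparing every row against every other row.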
@Peter You can stick with Excel if you want to. Just use http://xllarray.codeplex.com. The formula you want is =AVERAGE(ARRAY.MASK((A:A>A1 - 1000)*(A:A<A1 + 1000), B:B)). 1MM rows on my junky laptop calculate in under 1 second. Be sure to Ctrl-Shift-Enter it as an array formula.
If you don't want to build the code, you can grab the add-in and help file off my SkyDrive: http://sdrv.ms/JtaMIV
@Charles. Ah, no. It is only for one formula. Misread the spec.
If you wanted to push the calculation into C++ and expose it as an xll, here is how you might do that:
#include <algorithm>
#include <numeric>
#include "xll/xll.h"
using namespace xll;
typedef traits<XLOPER12>::xword xword;
static AddIn12 xai_windowed_average(
L"?xll_windowed_average", XLL_FP12 XLL_FP12 XLL_FP12 XLL_DOUBLE12,
L"WINDOWED.AVERAGE", L"Time, Value, Window"
);
_FP12* WINAPI
xll_windowed_average(_FP12* pt, _FP12* pv, double dt)
{
#pragma XLLEXPORT
static xll::FP12 a(size(*pt), 1);
double* bt0 = &pt->array[0];
double* bv0 = &pv->array[0];
// window [bt, et) over the sorted times; starts at the front
double* bt = begin(*pt);
double* et = begin(*pt);
for (xword i = 0; i < size(*pt); ++i) {
// move the window to row i BEFORE averaging, so a[i] uses its own window
bt = std::lower_bound(bt, end(*pt), pt->array[i] - dt);
et = std::lower_bound(bt, end(*pt), pt->array[i] + dt);
// 0.0 (not 0) so that accumulate sums as double instead of truncating to int
a[i] = (bt == et) ? 0 : std::accumulate(bv0 + (bt - bt0), bv0 + (et - bt0), 0.0)/(et - bt);
}
return a.get();
}
Related
I am dealing with a table of decimal values that represent binary numbers. My goal is to count the number of times Bit(0), Bit(1),... Bit(n) are high.
For example, if a table entry is 5 this converts to '101' which can be done using the BIN() function.
What I would like to do is increment a variable 'bit0Count' and 'bit2Count'
I have looked into the BIT_COUNT() function however this would only return 2 for the above example.
Any insight would be greatly appreciated.
SELECT SUM(n & (1<<2) > 0) AS bit2Count FROM ...
The & operator is a bitwise AND.
1<<2 is a number with only 1 bit set, left-shifted by two places, so it is binary 100. The result of a bitwise AND against your column n is either binary 100 or binary 000.
Testing that with > 0 returns either 1 or 0, since in MySQL, boolean results are literally the integers 1 for true and 0 for false (note this is not standard in other implementations of SQL).
Then you can SUM() these 1's and 0's to get a count of the occurrences where the bit was set.
To tell if bit N is set, use 1 << N to create a mask for that bit and then use bitwise AND to test it. So (column & (1 << N)) != 0 will be 1 if bit N is set, 0 if it's not set.
To total these across rows, use the SUM() aggregation function.
If you need to do this frequently, you could define a stored function:
CREATE FUNCTION bit_set(val INT UNSIGNED, which TINYINT) RETURNS TINYINT DETERMINISTIC
RETURN (val & (1 << which)) != 0;
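The same mask-and-sum idea can be sketched outside the database to check the counts by hand. This is only an illustration in Python; count_set_bits is a made-up helper name, not a MySQL function.

```python
def count_set_bits(values, n_bits):
    """Return a list whose entry N is how many of the values have bit N set."""
    # (v & (1 << n)) != 0 is True/False; summing booleans counts the Trues,
    # which mirrors SUM(n & (1 << N) > 0) in the query
    return [sum((v & (1 << n)) != 0 for v in values) for n in range(n_bits)]


# 5 = 101, 3 = 011, 4 = 100
print(count_set_bits([5, 3, 4], 3))  # [2, 1, 2]
```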
I am trying to find a reliable query which returns the first instance of an acceptable insert range.
Research:
Some of the links below address similar questions, but I could get none of them to work for me.
Find first available date, given a date range in SQL
Find closest date in SQL Server
MySQL difference between two rows of a SELECT Statement
How to find a gap in range in SQL
and more...
Objective Query Function:
InsertRange(1) = (StartRange(i) - EndRange(i-1)) > NewValue
Where InsertRange(1) is the value the query should return. In other words, this would be the first instance where the above condition is satisfied.
Table Structure:
Primary Key: StartRange
StartRange(i-1) < StartRange(i)
StartRange(i-1) + EndRange(i-1) < StartRange(i)
Example Dataset
Below is an example User table (3 columns) with a set range distribution. StartRange values are always in strictly ascending order and UserID values are arbitrary strings; only the sequences of StartRange and EndRange matter:
StartRange  EndRange  UserID
312         6896      user0
7134        16268     user1
16877       22451     user2
23137       25142     user3
25955       28272     user4
28313       35172     user5
35593       38007     user6
38319       38495     user7
38565       45200     user8
46136       48007     user9
My current Query
I am trying to use this query at the moment:
SELECT t2.StartRange, t2.EndRange
FROM user AS t1, user AS t2
WHERE (t1.StartRange - t2.StartRange+1) > NewValue
ORDER BY t1.EndRange
LIMIT 1
Example Case
Given the table, if NewValue = 800, then the returned answer should be 23137. This means, the first available slot would be between user3 and user4 (with an actual slot size = 813):
InsertRange(1) = (StartRange(i) - EndRange(i-1)) > NewValue
InsertRange = (StartRange(5) - EndRange(4)) > NewValue
23137 = 25955 - 25142 > 800
More Comments
My query above seemed to work for the special case where StartRanges were tightly packed (i.e. StartRange(i) = StartRange(i-1) + EndRange(i-1) + 1). It no longer works with a less tightly packed set of StartRanges.
Keep in mind that SQL tables have no implicit row order. It seems fair to order your table by StartRange value, though.
We can start to solve this by writing a query to obtain each row paired with the row preceding it. In MySQL versions before 8.0, it's hard to do this beautifully because they lack window functions such as ROW_NUMBER() and LAG().
This works (http://sqlfiddle.com/#!9/4437c0/7/0). It may have nasty performance because it generates O(n^2) intermediate rows. There's no row for user0; it can't be paired with any preceding row because there is none.
select MAX(a.StartRange) SA, MAX(a.EndRange) EA,
b.StartRange SB, b.EndRange EB , b.UserID
from user a
join user b ON a.EndRange <= b.StartRange
group by b.StartRange, b.EndRange, b.UserID
Then, you can use that as a subquery, and apply your conditions, which are:
gap >= 800
first matching row (lowest StartRange value): ORDER BY SB
just one row: LIMIT 1
Here's the query (http://sqlfiddle.com/#!9/4437c0/11/0)
SELECT SB-EA Gap,
EA+1 Beginning_of_gap, SB-1 Ending_of_gap,
UserId UserID_after_gap
FROM (
select MAX(a.StartRange) SA, MAX(a.EndRange) EA,
b.StartRange SB, b.EndRange EB , b.UserID
from user a
join user b ON a.EndRange <= b.StartRange
group by b.StartRange, b.EndRange, b.UserID
) pairs
WHERE SB-EA >= 800
ORDER BY SB
LIMIT 1
Notice that you may actually want the smallest matching gap instead of the first matching gap. That's called best fit, rather than first fit. To get that you use ORDER BY SB-EA instead.
Edit: There is another way to use MySQL to join adjacent rows, that doesn't have the O(n^2) performance issue. It involves employing user variables to simulate a row_number() function. The query involved is a hairball (that's a technical term). It's described in the third alternative of the answer to this question. How do I pair rows together in MYSQL?
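For comparison, the first-fit logic itself is a single pass over adjacent rows once they are sorted. Below is a minimal Python sketch of that pass, not a replacement for the query: first_fit_gap is a made-up name, the data is the example table from the question, and it assumes the ranges do not overlap.

```python
def first_fit_gap(ranges, min_gap):
    """Return (gap_start, gap_end, gap) for the first gap >= min_gap, else None.

    ranges is a list of (StartRange, EndRange) tuples, assumed non-overlapping.
    """
    ordered = sorted(ranges)
    # pair each row with the row that follows it, like the self-join does
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        gap = next_start - prev_end  # same as SB - EA in the query
        if gap >= min_gap:
            return prev_end + 1, next_start - 1, gap
    return None


users = [(312, 6896), (7134, 16268), (16877, 22451), (23137, 25142),
         (25955, 28272), (28313, 35172), (35593, 38007), (38319, 38495),
         (38565, 45200), (46136, 48007)]
print(first_fit_gap(users, 800))  # (25143, 25954, 813)
```

For best fit instead of first fit, one would collect every qualifying gap and take the smallest rather than returning at the first match.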
Hey, is there any way to create a query with a simple formula?
I have a table data with two columns, value_one and value_two; both are decimal values. I want to select those rows where the difference between value_one and value_two is greater than 5. How can I do this?
Can I do something like this?
SELECT * FROM data WHERE (MAX(value_one, value_two) - MIN(value_one, value_two)) > 5
Example values
value_one, value_two
1,6
9,3
2,3
3,2
so the corresponding differences are 5, 6, 1, 1, and the selected rows would be only the first and second.
Consider an example where a bigger number is subtracted from a smaller number:
2 - 5 = -3
So the result is the difference of the two numbers with a negative sign.
Now consider the reverse scenario, where the smaller number is subtracted from the bigger number:
5 - 2 = 3
Pretty simple, right?
Basically, the difference of the two numbers stays the same if you ignore the sign. This is called the absolute value of a number.
Now the question is how to find the absolute value in MySQL.
The answer is MySQL's built-in ABS() function, which returns the absolute value of a number.
ABS(X):
Returns the absolute value of X.
mysql> SELECT ABS(2);
-> 2
mysql> SELECT ABS(-32);
-> 32
Therefore, without worrying about finding min and max number, we can directly focus on the difference of two numbers and then, retrieving the absolute value of the result. Finally, check if it is greater than 5.
So, the final query becomes:
SELECT *
FROM data
WHERE abs(value_one - value_two) > 5;
You can also do complex operations once the absolute value is calculated like adding or dividing with the third value. Check the code below:
SELECT *
FROM
data
WHERE
(abs(value_one - value_two) / value_three) + value_four > 5;
You can also combine multiple conditions using logical operators such as AND, OR and NOT:
SELECT *
FROM
data
WHERE
((abs(value_one - value_two) / value_three) + value_four > 5)
AND (value_five != 0);
Here is the link with various functions available in MySQL:
https://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html
No, you would just use a simple where clause:
select *
from data
where abs(value_one - value_two) > 5;
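To see which of the example rows the condition actually keeps, here is a small Python sketch that runs the same WHERE clause against an in-memory SQLite database. SQLite is only a stand-in for MySQL here (an assumption for the sake of a self-contained example); abs() behaves the same way for this case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (value_one REAL, value_two REAL)")
conn.executemany("INSERT INTO data VALUES (?, ?)",
                 [(1, 6), (9, 3), (2, 3), (3, 2)])

# strictly greater than 5: the row (1, 6) has a difference of exactly 5
strict = conn.execute(
    "SELECT value_one, value_two FROM data "
    "WHERE abs(value_one - value_two) > 5 ORDER BY rowid").fetchall()

# use >= 5 if a difference of exactly 5 should also be selected
inclusive = conn.execute(
    "SELECT value_one, value_two FROM data "
    "WHERE abs(value_one - value_two) >= 5 ORDER BY rowid").fetchall()

print(strict)     # [(9.0, 3.0)]
print(inclusive)  # [(1.0, 6.0), (9.0, 3.0)]
```

With the sample data, > 5 returns only the second row; getting both the first and second rows, as the question expects, needs >= 5.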
I am creating a matrix report.
I need to show the row number in the matrix table under column and row grouping,
like this:
   A  B  C
X  1  1  1
Y  2  2  2
Z  3  3  3
Presently I am using this expression for the row number, but I am getting an incorrect count.
=RunningValue(CountDistinct("YourTableName"),Count,"YourTableName")
Riddle-Master has a decent answer if your matrix is 100% full, ie there are no empty cells.
RowNumber("DataSet1") will contain the running sum of all fields
RowNumber("RowGroup") will contain the number of fields in each row group
However, if you have empty cells, you'll end up with some fractions and staggered numbers.
I found what I believe to be a better answer on this site, as long as you have a value in each row that is unique from the last. In my case, I'm using Customer_Num.
Go to Report Properties, open the Code property and paste the following code:
public dim LastNum as String
public dim rowCount as int32 = 0
Public Function GetRowNum(byval Num as String) as Integer
if Num <> LastNum then
rowCount = rowCount + 1
LastNum = Num
End If
Return rowCount
End Function
Then in the cell where you want the row number, paste the following expression:
=Code.GetRowNum(Fields!Customer_Num.Value)
This should compare each value against the previous for each row, and give you an incrementing row number, no matter how many cell values are in each row.
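The counter logic is easy to check outside the report. Here is a minimal Python sketch of the same change-detection idea; row_numbers is a made-up name, and the input simulates the key value being seen once per cell, row by row.

```python
def row_numbers(keys):
    """Return a running row number that increments only when the key changes."""
    numbers = []
    last = object()  # sentinel that never equals a real key
    count = 0
    for key in keys:
        if key != last:
            count += 1
            last = key
        numbers.append(count)
    return numbers


# three cells for X, two for Y (one empty cell skipped), one for Z
print(row_numbers(["X", "X", "X", "Y", "Y", "Z"]))  # [1, 1, 1, 2, 2, 3]
```

As with the report code, this relies on consecutive rows having different key values; two adjacent rows with the same key would share a number.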
In my experience, RowNumber alone gives the sum of all the dynamic fields from every row.
But RowNumber scoped to one field only gives the sum of the dynamic fields on one row alone.
Thus what solved it for me was:
=RowNumber("DataSet1")/RowNumber("StudentId")
Here DataSet1 is the dataset from which I take all my matrix info, and StudentId is just one of the fields in a row.
What worked for me for a matrix with both row and column grouping is:
=RowNumber("YourColumnGroupName") / CountRows()
I'm having trouble with this SQL:
$sql = mysql_query("SELECT $menucompare ,
(COUNT($menucompare ) * 100 / (SELECT COUNT( $menucompare )
FROM data WHERE $ww = $button )) AS percentday FROM data WHERE $ww >0 ");
$menucompare is the name of the table field (whichever field is selected); it contains data like the examples below
$button is the selected week number (let's say week '6')
$ww is the name of the table field that holds the week number
For example, I have data in $menucompare like that:
123456bool
521478bool
122555heel
147788itoo
and I want to group the entries that end with the same word and calculate the percentage for each.
The output should be like that:
bool -- 50% (2 entries)
heel -- 25% (1 entry)
itoo -- 25% (1 entry)
Any clarification of my SQL would be much appreciated.
I didn't find anything like this elsewhere.
Well, keeping data in such a format is probably not the best way; if possible, split the field into 2 separate ones.
First, you need to extract the string part from the end of the field.
If the length of the string / numeric parts is fixed, then it's quite easy;
if not, you should use regular expressions; unfortunately, a regular-expression replace is not available by default in MySQL (before 8.0). There's a solution, check this question: How to do a regular expression replace in MySQL?
I'll assume that the numeric part has a fixed length:
SELECT s.str, CAST(count(s.str) AS decimal) / t.cnt * 100 AS pct
FROM (SELECT substr(entry, 7) AS str FROM data) AS s
JOIN (SELECT count(*) AS cnt FROM data) AS t ON 1=1
GROUP BY s.str, t.cnt;
If you have a regexp_replace function, then substr(entry, 7) should be replaced with regexp_replace(entry, '^[0-9]*', '') to achieve the required result.
Variant with substr can be tested here.
When sorting out problems like this, I would do it in two steps:
Sort out the SQL independently of the presentation language (PHP?).
Sort out the parameterization of the query and the presentation of the results after you know you've got the correct query.
Since this question is tagged 'SQL', I'm only going to address the first step.
The first step is to unclutter the query:
SELECT menucompare,
(COUNT(menucompare) * 100 / (SELECT COUNT(menucompare) FROM data WHERE ww = 6))
AS percentday
FROM data
WHERE ww > 0;
This removes the $ signs from most of the variable bits, and substitutes 6 for the button value. That makes it a bit easier to understand.
Your desired output seems to need the last four characters of the string held in menucompare for grouping and counting purposes.
The data to be aggregated would be selected by:
SELECT SUBSTR(MenuCompare, -4) AS Last4
FROM Data
WHERE ww = 6
The divisor in the percentage is the count of such rows, but the sub-stringing isn't necessary to count them, so we can write:
SELECT COUNT(*) FROM Data WHERE ww = 6
This is exactly what you have anyway.
The dividend in the percentage will be the group count of each substring.
SELECT Last4, COUNT(Last4) * 100.0 / (SELECT COUNT(*) FROM Data WHERE ww = 6)
FROM (SELECT SUBSTR(MenuCompare, -4) AS Last4
FROM Data
WHERE ww = 6
) AS Week6
GROUP BY Last4
ORDER BY Last4;
When you've demonstrated that this works, you can re-parameterize the query and deal with the presentation of the results.
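One way to demonstrate that it works before going back to PHP is to run the query against a few sample rows. The sketch below does that from Python with an in-memory SQLite database, which is only a stand-in for MySQL (an assumption; SUBSTR with a negative start counts from the end in both). The extra week-5 row is made up to show that the WHERE clause filters it out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Data (MenuCompare TEXT, ww INTEGER)")
conn.executemany("INSERT INTO Data VALUES (?, ?)", [
    ("123456bool", 6), ("521478bool", 6),
    ("122555heel", 6), ("147788itoo", 6),
    ("999999zzzz", 5),  # hypothetical row from another week
])

query = """
SELECT Last4, COUNT(Last4) * 100.0 / (SELECT COUNT(*) FROM Data WHERE ww = 6)
FROM (SELECT SUBSTR(MenuCompare, -4) AS Last4
      FROM Data
      WHERE ww = 6
     ) AS Week6
GROUP BY Last4
ORDER BY Last4
"""
percentages = conn.execute(query).fetchall()
print(percentages)  # [('bool', 50.0), ('heel', 25.0), ('itoo', 25.0)]
```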