Fastest way to calculate correlation in every row? - mysql

Well, I have a table, Data, with millions of rows. I want to compute the correlation for every row, using all rows from the 1st up to the row before the current one. For example, the 1st row is skipped. The 2nd row's result column gets the correlation computed from the 1st row. The 3rd row's result column gets the correlation computed from the 1st and 2nd rows. And so on.
Correlation for the entire table can be calculated using:
SELECT (Count(*)*Sum(x*y)-Sum(x)*Sum(y))/
(sqrt(Count(*)*Sum(x*x)-Sum(x)*Sum(x))*
sqrt(Count(*)*Sum(y*y)-Sum(y)*Sum(y))) AS TotalCorelation FROM Data;
I want to avoid joins as much as possible because they take a lot of time (sometimes even a timeout error, above 300 seconds). What is the alternative?
Example table Data Structure:
id, x, y, result
1 , 4, 2, null
2 , 6, 3, -0.2312
3 , 5, 5, 0.42312
4 , 6, 2, -0.5231
5 , 5, 5, 0.22312
6 , 3, 7, -0.2312
7 , 2, 9, 0.42231
8 , 7, 2, 0.32253
9 , 9, 5, 0.32431
id : primary key
x and y : The data
result: correlation

I think this is it:
SELECT d2.ID, d2.x, d2.y, d2.result,
(Count(*)*Sum(d1.x*d1.y)-Sum(d1.x)*Sum(d1.y))/
(sqrt(Count(*)*Sum(d1.x*d1.x)-Sum(d1.x)*Sum(d1.x))*
sqrt(Count(*)*Sum(d1.y*d1.y)-Sum(d1.y)*Sum(d1.y))) AS TotalCorelation
FROM Data d1
RIGHT JOIN Data d2 ON d1.id < d2.id
GROUP BY d2.ID
ORDER BY d2.ID
Without a closed form for calculating correlation of N+1 from N rows, you have to use a quadratic join like this.
I'm assuming that your basic formula is correct. But I'm not sure it is -- when I just run it on the total dataset, I don't get the result 0.32431, I get -0.552773693079.
Here's a linear implementation:
SET @SumX = 0;
SET @SumY = 0;
SET @Count = 0;
SET @SumX2 = 0;
SET @SumY2 = 0;
SET @SumXY = 0;
SELECT id, x, y,
@SumX := @SumX + x AS SumX,
@SumY := @SumY + y AS SumY,
@Count := @Count + 1 AS ct,
@SumX2 := @SumX2 + x*x AS SumX2,
@SumY2 := @SumY2 + y*y AS SumY2,
@SumXY := @SumXY + x*y AS SumXY,
IF(@Count > 1,
(@Count*@SumXY-@SumX*@SumY)/
(sqrt(@Count*@SumX2-@SumX*@SumX)*
sqrt(@Count*@SumY2-@SumY*@SumY)), NULL) AS TotalCorelation
FROM DATA
ORDER BY id
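To sanity-check the running-sum idea outside MySQL, here is a minimal Python sketch of the same single-pass accumulation. The function name running_correlation is made up for illustration. Like the linear query above, it includes the current row in each result. The explicit guard against a zero denominator is my own addition; it is not part of the original query.

```python
from math import sqrt

def running_correlation(rows):
    """Return the correlation over rows 1..k for every k, in one pass.

    rows is a list of (x, y) pairs. The first entry is None, as is any
    entry where x or y has had no variance so far (assumed behaviour).
    """
    n = sum_x = sum_y = sum_xx = sum_yy = sum_xy = 0
    out = []
    for x, y in rows:
        # Update the six running totals, exactly as the user variables do.
        n += 1
        sum_x += x
        sum_y += y
        sum_xx += x * x
        sum_yy += y * y
        sum_xy += x * y
        var_x = n * sum_xx - sum_x * sum_x
        var_y = n * sum_yy - sum_y * sum_y
        if n > 1 and var_x > 0 and var_y > 0:
            out.append((n * sum_xy - sum_x * sum_y) / (sqrt(var_x) * sqrt(var_y)))
        else:
            out.append(None)
    return out

# The (x, y) columns of the example table from the question.
data = [(4, 2), (6, 3), (5, 5), (6, 2), (5, 5), (3, 7), (2, 9), (7, 2), (9, 5)]
result = running_correlation(data)
print(result[-1])  # about -0.5528 for the whole table
```

To get exactly what the question asks for (only the rows before the current one), shift the output down by one row, so that row k receives the value computed for row k-1.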

Related

MySQL Tagging Rows in Sequence based on a pattern

I have a column which I am trying to convert in MySQL into another column with a pattern wherever there are consecutive 1s in the data. Please see the example dataset below.
Dataset Sample: https://1drv.ms/x/s!ApGNZAoiMmX3gi9OR7SUxt3ou84v?e=tuSV7f
Following is the code I have written, but I am not able to make it work, and any suggestions would be helpful.
select rownum,result,movingsum,new_result
from
(select rownum,result,movingsum,
if(result=0,0,if(movingsum=1,1,0)) as new_result
from
(select rownum,result,
sum(result) over (order by rownum rows between 2 preceding and current row) as movingsum
from mytable) a) b;
The issue is that the above code does not return the output needed for both of the required rules:
when result column is 0 new_result should be 0
when result is 1, new_result = 1 but only when previous 2 new_results are 0
Any suggestion on how I should approach this will be useful.
Thanks!
After some tries I was able to find a solution which is close to what I need, shown below. I used 2 variables to do the trick:
select rownum,result,
if (result= 0, 0, if(@n = 1, if(@m >= 7, 1 , 0), 1)) as new_max,
if (result= 0, 0, if(@n = 1,
case when @m >= 7 then @m:=0 else 0 end
, 1)) as new_max1,
if (result= 0, if(@m>0,@m:=@m-1,@m:=0), if(@n = 1, @m:=@m+1,@m:=@m-1)) as new_m,
@n := result
from mytable a, (select @n:= 0, @m:= 0) b
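For comparison, the two rules as literally stated in the question can be written as a short sequential loop. This is only a sketch in Python, not a translation of the variable-based query above: the name tag_rows is hypothetical, and I am assuming that rows before the start of the data count as 0.

```python
def tag_rows(results):
    """Tag a 0/1 sequence using the two rules from the question.

    - result 0 -> new_result 0
    - result 1 -> new_result 1, but only if the previous two
      new_result values are both 0 (missing rows count as 0)
    """
    new_results = []
    for r in results:
        previous_two = new_results[-2:]
        if r == 1 and not any(previous_two):
            new_results.append(1)
        else:
            new_results.append(0)
    return new_results

print(tag_rows([1, 1, 1, 1, 0, 1, 1]))  # [1, 0, 0, 1, 0, 0, 1]
```

Each output value depends on earlier output values, which is why a plain window sum over the input column cannot express the rule directly.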

Mask integer field in mysql

I need to mask an integer field in MySQL such that 9999911111 becomes 9900001111. I want to keep the first 2 digits and the last 4 digits and set the rest of the digits to 0 for the integers stored in the field.
I have created a query and it works, but I am not sure whether this is the right way to do it for integers.
update table_name
set field_name=CONCAT(SUBSTR(field_name, 1, 2),
REPEAT('0', CHAR_LENGTH(field_name) - 6),
SUBSTR(field_name, CHAR_LENGTH(field_name)-3, CHAR_LENGTH(field_name)));
Just trying a different approach:
SET @myVar = 344553543534;
SELECT @myVar - (SUBSTRING(@myVar, 4, LENGTH(@myVar) - 7) * 10000) ;
The formula above gives 344000003534 as the result. I tried it with different combinations and found it working.
So your query needs to change as given below:
UPDATE table_name
SET field_name=
(field_name - (SUBSTRING(field_name, 4, LENGTH(field_name) - 7) * 10000));
Explanation:
Consider the number, a = 344553543534;
Expected result, b = 344000003534;
c = (a - b) = 344553543534 - 344000003534 = 553540000;
Looking at c, 55354 are the digits that need masking, and 0000 stands for the last 4 digits, which are left untouched.
So to get the masked value, we can use the formula b = a - c;
To get c, I used SUBSTRING(a, 4, LENGTH(a) - 7) * 10000
EDIT: To keep only the first two digits, use 3 instead of 4 and 6 instead of 7. I assumed that you needed to keep the first 3.
SET @myVar = 344553543534;
SELECT @myVar - (SUBSTRING(@myVar, 3, LENGTH(@myVar) - 6) * 10000) ;
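The arithmetic behind the formula above can be checked with a small Python sketch. The function name mask_middle and its keep_first / keep_last parameters are made up for illustration, and I am assuming non-negative integers only.

```python
def mask_middle(n, keep_first=2, keep_last=4):
    """Zero out the middle digits of a non-negative integer.

    Mirrors b = a - c, where c is the middle digit run multiplied
    by 10 ** keep_last.
    """
    digits = str(n)
    if len(digits) <= keep_first + keep_last:
        return n  # nothing in the middle to mask
    middle = int(digits[keep_first:len(digits) - keep_last])
    return n - middle * 10 ** keep_last

print(mask_middle(9999911111))                  # 9900001111
print(mask_middle(344553543534, keep_first=3))  # 344000003534
print(mask_middle(344553543534))                # 340000003534
```

Subtracting keeps the value an integer throughout, whereas the CONCAT version in the question builds a string that MySQL then converts back to a number.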

mysql between show matched value

I have a table with columns showing ranges, like
id from to
1 10 100
2 200 300
I have a query which will be a list of values, like 17, 20, 44, 288 etc.
Is it possible to have a result set which would include the where condition, so I get:
id from to input
1 10 100 7
1 10 100 20
1 10 100 144
2 200 300 288
Right now the code runs one query per value and it works, but I'm looking to increase performance by combining them into one large WHERE clause, like
SELECT *
FROM `table`
WHERE (`from`<=7 AND `to`>=7)
OR (`from`<=20 AND `to`>=20)
OR (`from`<=144 AND `to`>=144)
OR (`from`<=288 AND `to`>=288)
What you want makes no sense as far as the ranges go.
7 and 144 fall in no range, yet you want to put them into the first range.
With a long list of values you will probably end up with too many conditions.
What you can do is show the values that are not in any range without a matching row. Like this:
With the structure being:
create table test (
id integer,
vfrom integer,
vto integer
);
insert into test values
(1, 10, 100),
(2, 200, 300);
create table vals(
val integer
);
insert into vals values (7), (20), (144), (288);
You can use this query:
select val, id, vfrom, vto
from vals v left join
test t on ( t.vfrom <= v.val and t.vto >= v.val )
It will bring you:
7 null null null
20 1 10 100
144 null null null
288 2 200 300
see it here on fiddle: http://sqlfiddle.com/#!2/f68fd/8
Maybe it isn't what you want but it is more logical.
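To check which rows the left join above should return, the same matching can be mimicked outside the database. This is only a sketch in Python; the name match_ranges is hypothetical, and the data is the test / vals content from the answer.

```python
def match_ranges(values, ranges):
    """Pair every value with each range that contains it (left-join style).

    ranges is a list of (id, vfrom, vto) tuples with inclusive bounds.
    A value that falls in no range is kept, padded with None.
    """
    rows = []
    for v in values:
        hits = [(v, rid, lo, hi) for rid, lo, hi in ranges if lo <= v <= hi]
        rows.extend(hits or [(v, None, None, None)])
    return rows

ranges = [(1, 10, 100), (2, 200, 300)]
for row in match_ranges([7, 20, 144, 288], ranges):
    print(row)
```

With an inner join instead, the rows for 7 and 144 would simply disappear from the result.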
Sure, there is a query for this. The trouble is that the specific values have to come from a table in order to show up, which means a subquery with UNION SELECTs:
SELECT t.*, vals.val AS input
FROM (SELECT 7 AS val UNION SELECT 20 AS val UNION SELECT 144 AS val UNION SELECT 288 AS val) AS vals
JOIN `table` t ON t.`from` <= vals.val AND t.`to` >= vals.val
This should do the trick. Note that you only have to specify the column name in the first SELECT within a UNION.
I will suppose you are using Java as your application language. You could build your query this way:
public String buildQuery(int[] myList) {
    StringBuilder query = new StringBuilder();
    for (int i = 0; i < myList.length; i++) {
        // Join the per-value selects with UNION, skipping it before the first one
        if (i > 0) {
            query.append(" union ");
        }
        // The inputs are ints, so appending them cannot inject SQL text
        query.append("(select `id`, `from`, `to`, ").append(myList[i]).append(" as input")
             .append(" from MyTable")
             .append(" where `from` <= ").append(myList[i])
             .append(" and ").append(myList[i]).append(" <= `to`)");
    }
    return query.toString();
}
Then run the returned query.

Concat different tables?

I need to concatenate from two different tables.
Compare s.panelid (result like "AA") to b.modulecodes and return number_of_strings. Then put s.panelid (result like "AA") and number_of_string together.
select concat(Mid(s.panelid, 5, 2), ' - ' , '??') as `Module Type-Strings`
from r2rtool.stringtopanel s, be.modulecodes b
where s.insertts > '2011-07-15' and s.insertts < '2011-07-26' and Mid(s.panelid, 5, 2) != 99
group by date(insertts), `Module Type-Strings`
order by `Module Type-Strings`;
Be (Table): modulecodes, number_of_strings
AA - 12
AB - 4
AD - 3
AE - 12
When I run the above query it returns things like: Module Type-Strings = 'AA-??' and "AB-??" of course.
I am looking for: Module Type-Strings = 'AA-12'
Just in case you haven't tried it already...
Have you tried this?
select concat(Mid(s.panelid, 5, 2), ' - ' , b.number_of_string) as `Module Type-Strings`
from r2rtool.stringtopanel s, be.modulecodes b
where s.insertts > '2011-07-15' and s.insertts < '2011-07-26' and Mid(s.panelid, 5, 2) != 99
group by date(insertts), `Module Type-Strings`
order by `Module Type-Strings`;
There I'm basically replacing the '??' with the column you are asking about, number_of_string in the be.modulecodes table (aliased as b in the from clause).

sql loop and set properties

In SQL, I would like to query a list, in order by pageNumber
SELECT * FROM `comics`
WHERE 1
ORDER BY pageNumber ASC
Then, I would like to set their pageNumbers based on their index in the query (starting with 1 instead of 0).
Here is my pseudo code for the functionality as desired; where list is the return value of the Select Query above.
for(var n:int = 0; n<list.length; n++){
if(list[n].pageNumber != n+1){
list[n].pageNumber = n+1
}
}
For example I might have pageNumbers 5, 17, 23, 24, 18, 7
The ORDER BY pageNumber ASC will sort this to 5, 7, 17, 18, 23, 24
I would then like to alter the pageNumbers in order to be 1, 2, 3, 4, 5, 6
edit:
@fortheworld MySQL
@cyberkiwi UPDATE
sorry for being unclear. guess i need to learn more for my questions to be clear :)
thanks for all your help
SET @I := 0;
SELECT *,
@I := @I + 1 AS newPageNumber
FROM comics
ORDER BY pageNumber ASC
I don't understand why people insist on writing an SQL batch when a single statement will do.
SELECT comics.*, @n := @n + 1 AS PageNumber2
FROM (SELECT @n := 0) X CROSS JOIN comics
ORDER BY pageNumber ASC
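The renumbering itself, as described in the question's pseudo code, is small enough to sketch in Python for checking the expected values. The function name renumber is made up for illustration, and I am assuming page numbers are unique.

```python
def renumber(page_numbers):
    """Return (old, new) pairs: each page number with its 1-based sorted position."""
    return [(old, new) for new, old in enumerate(sorted(page_numbers), start=1)]

print(renumber([5, 17, 23, 24, 18, 7]))
# [(5, 1), (7, 2), (17, 3), (18, 4), (23, 5), (24, 6)]
```

The queries above only select the new number. Writing it back to the table still needs an UPDATE, which is what the question's edit asks about.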