optimizing short text comparisons - mysql

i have 2 tables qs and local.
qs has 2 columns (actually built from several other columns) that are part of the comparison i need to do:
f1 | t1
abcdaa | abcdbb
local just has one column that's part of the comparison:
rangeA
abcd
I am trying to find the entries in qs that do not have a matching substring in local.
I've tried this in about a dozen different ways, and I must be missing something, since it's taking an unusual amount of time.
here is the fastest method I've found so far:
CREATE TEMPORARY TABLE `tempB` SELECT f1, t1,
LEFT(f1,2) AS l2,LEFT(f1,3) AS l3,LEFT(f1,4) AS l4,LEFT(f1,5) AS l5,LEFT(f1,6) AS l6,LEFT(f1,7) AS l7,LEFT(f1,8) AS l8,
LEFT(f1,9) AS l9,LEFT(f1,10) AS l10,LEFT(f1,11) AS l11,LEFT(f1,12) AS l12,LEFT(f1,13) AS l13,
LEFT(t1,2) AS lt2,LEFT(t1,3) AS lt3,LEFT(t1,4) AS lt4,LEFT(t1,5) AS lt5,LEFT(t1,6) AS lt6,LEFT(t1,7) AS lt7,LEFT(t1,8) AS lt8,
LEFT(t1,9) AS lt9,LEFT(t1,10) AS lt10,LEFT(t1,11) AS lt11,LEFT(t1,12) AS lt12,LEFT(t1,13) AS lt13 FROM
(SELECT CONCAT(c1,n1,s1) AS f1, CONCAT(c1,n1,s2) AS t1 FROM qs WHERE c1 ='a')tab0 ORDER BY f1 ASC;
CREATE TEMPORARY TABLE `tempB2` SELECT rangeA FROM local WHERE rangeA LIKE 'a%' ORDER BY rangeA ASC;
CREATE TEMPORARY TABLE `tempB3` SELECT rangeA AS rangeAA FROM local WHERE rangeA LIKE 'a%' ORDER BY rangeA ASC;
SELECT f1,t1, rangeA, rangeAA FROM tempB
LEFT JOIN tempB2 ON rangeA IN(l2,l3,l4,l5,l6,l7,l8,l9,l10,l11,l12,l13)
LEFT JOIN tempB3 ON rangeAA IN(lt2,lt3,lt4,lt5,lt6,lt7,lt8,lt9,lt10,lt11,lt12,lt13)
WHERE rangeA IS NULL OR rangeAA IS NULL
creating the temp tables is fast and starting with one character at a time (in this case 'a') significantly reduces the size of the datasets, but this is still very very slow even with only a few hundred thousand rows in each temp table.
I've tried using just f1 and t1 with a
ON f1 LIKE CONCAT (rangeA,'%')
but that seemed to be even slower.
Any other ideas?
Note that rangeA is at least 2 characters long and at most 13 characters long. hence the LEFTs.
example data:
qs :
c1 | n1 | s1 | s2
ab | cd | aa | bb
bb | bbb | bb | bc
cbc | cc | cdd | ddd
ddd | e | ddf | def
local :
rangeA
abcd
bdddd
cbcccdd
dddedd
expected result:
f1 | t1 | f1match | t1match
bbbbbbb | bbbbbbc | NULL | NULL
cbccccdd | cbcccddd | NULL | cbcccdd
dddeddf | dddedef | dddedd | NULL

Thank you Paul Spiegel for making this work.
Let's set up some test data.
mysql> select * from qs;
+----+---------------+-------------------+
| id | f1 | t1 |
+----+---------------+-------------------+
| 6 | match1 | no match |
| 7 | match1 | match2 |
| 8 | foo match1 | match1 bar |
| 9 | no match | abc match2 123 |
| 10 | no match | no match |
| 11 | also no match | again not a match |
+----+---------------+-------------------+
mysql> select * from local;
+--------+
| rangeA |
+--------+
| match1 |
| match2 |
+--------+
And we expect only those rows in which neither f1 nor t1 matches any row in local.
+----+---------------+-------------------+
| id | f1 | t1 |
+----+---------------+-------------------+
| 10 | no match | no match |
| 11 | also no match | again not a match |
+----+---------------+-------------------+
UPDATE: Indexing qs(f1,t1) and local(rangeA) will help performance.
create index index_qs_fields on qs(f1,t1);
create index index_local_rangeA on local(rangeA);
instr finds a substring in a string, which simplifies many things.
We can do this with a left excluding join. That is, we get only the rows on the left side (qs) which have no match on the right (local).
We do a normal left join to check for matches.
select qs.*, rangeA
from qs
left join local on
instr(f1,rangeA) or
instr(t1,rangeA)
+----+---------------+-------------------+--------+
| id | f1 | t1 | rangeA |
+----+---------------+-------------------+--------+
| 6 | match1 | no match | match1 |
| 7 | match1 | match2 | match1 |
| 8 | foo match1 | match1 bar | match1 |
| 7 | match1 | match2 | match2 |
| 9 | no match | abc match2 123 | match2 |
| 10 | no match | no match | NULL |
| 11 | also no match | again not a match | NULL |
+----+---------------+-------------------+--------+
And turn it into an excluding join by filtering for only those which don't match at all.
select qs.*, rangeA
from qs
left join local on
instr(f1,rangeA) or
instr(t1,rangeA)
where rangeA is null
+----+---------------+-------------------+
| id | f1 | t1 |
+----+---------------+-------------------+
| 10 | no match | no match |
| 11 | also no match | again not a match |
+----+---------------+-------------------+
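The same excluding logic can also be written with NOT EXISTS, which some readers find easier to follow than a left join plus an IS NULL filter. This is only a sketch: the CREATE TABLE statements and column sizes are assumptions made so the example can be run on its own, and it makes no claim to be faster than the join.

```sql
-- Hypothetical setup mirroring the test data above (column types are assumed).
CREATE TABLE qs (id INT PRIMARY KEY, f1 VARCHAR(50), t1 VARCHAR(50));
CREATE TABLE local (rangeA VARCHAR(13));

INSERT INTO qs (id, f1, t1) VALUES
  (6,  'match1',        'no match'),
  (7,  'match1',        'match2'),
  (8,  'foo match1',    'match1 bar'),
  (9,  'no match',      'abc match2 123'),
  (10, 'no match',      'no match'),
  (11, 'also no match', 'again not a match');
INSERT INTO local (rangeA) VALUES ('match1'), ('match2');

-- Keep a qs row only if no rangeA occurs anywhere in f1 or t1.
SELECT qs.*
FROM qs
WHERE NOT EXISTS (
    SELECT 1
    FROM local
    WHERE INSTR(qs.f1, local.rangeA) > 0
       OR INSTR(qs.t1, local.rangeA) > 0
);
-- returns the rows with id 10 and 11
```

If you only care about matches at the start of the string, as in the original question, compare against 1 instead (INSTR(...) = 1).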
dbfiddle
UPDATE: Lots of entries in local can make this slow. We can try optimizing it by joining all the matches together into one regular expression. This might be faster.
We can build the regex by using group_concat to join all the rows of local into a single alternation.
select group_concat(rangeA separator '|')
into @range_re
from local;
select qs.*
from qs
where not f1 regexp(@range_re) and not t1 regexp(@range_re);
Note that you'll need to be careful to escape regex characters in your matches.
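One further caveat worth checking before relying on the regex approach: GROUP_CONCAT silently truncates its result at group_concat_max_len, which defaults to 1024, so a large local table would produce a cut-off pattern. A minimal sketch, assuming the local table from above already exists; the limit of 1000000 is an arbitrary illustrative value, not a recommendation.

```sql
-- Raise the limit for this session only (the value is illustrative).
SET SESSION group_concat_max_len = 1000000;

-- Rebuild the pattern with the higher limit in place.
SELECT GROUP_CONCAT(rangeA SEPARATOR '|')
INTO @range_re
FROM local;

-- Sanity check: the pattern should be roughly the summed length of all
-- rangeA values plus one separator between each pair.
SELECT CHAR_LENGTH(@range_re) AS pattern_length;
```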
The original, way too complicated answer follows.
This query tells us which entries in qs fail to match which entries in local.
select qs.id, f1, t1, rangeA
from qs
left join local on 1=1
where instr(f1,rangeA) = 0 and instr(t1,rangeA) = 0;
+----+---------------+-------------------+--------+
| id | f1 | t1 | rangeA |
+----+---------------+-------------------+--------+
| 6 | match1 | no match | match2 |
| 8 | foo match1 | match1 bar | match2 |
| 9 | no match | abc match2 123 | match1 |
| 10 | no match | no match | match1 |
| 10 | no match | no match | match2 |
| 11 | also no match | again not a match | match1 |
| 11 | also no match | again not a match | match2 |
+----+---------------+-------------------+--------+
But we want those which don't match all of local. We can do that by counting up how many times a row appears in our list of not matches.
select qs.id, f1, t1, count(id)
from qs
left join local on 1=1
where instr(f1,rangeA) = 0
and instr(t1,rangeA) = 0
group by qs.id;
+----+---------------+-------------------+-----------+
| id | f1 | t1 | count(id) |
+----+---------------+-------------------+-----------+
| 6 | match1 | no match | 1 |
| 8 | foo match1 | match1 bar | 1 |
| 9 | no match | abc match2 123 | 1 |
| 10 | no match | no match | 2 |
| 11 | also no match | again not a match | 2 |
+----+---------------+-------------------+-----------+
And then select only those whose count is the same as the number of matches.
mysql> select qs.id, f1, t1
from qs
left join local on 1=1
where instr(f1,rangeA) = 0
and instr(t1,rangeA) = 0
group by qs.id
having count(id) = (select count(*) from local);
+----+---------------+-------------------+
| id | f1 | t1 |
+----+---------------+-------------------+
| 10 | no match | no match |
| 11 | also no match | again not a match |
+----+---------------+-------------------+
dbfiddle

here's what i have so far, which works pretty well for <50k rows. Thank you to Schwern for the helpful discussion about INSTR().
CREATE TEMPORARY TABLE `tempB` SELECT f1, t1 FROM
(SELECT LEFT(CONCAT(c1,n1,s1),17) AS f1, LEFT(CONCAT(c1,n1,s2),17) AS t1 FROM qs WHERE c1 ='a')tab0 ORDER BY f1 ASC;
CREATE TEMPORARY TABLE `tempB2` SELECT rangeA FROM local WHERE rangeA LIKE 'a%' ORDER BY rangeA ASC;
CREATE TEMPORARY TABLE `tempB3` SELECT rangeA AS rangeAA FROM local WHERE rangeA LIKE 'a%' ORDER BY rangeA ASC;
SELECT f1,t1, rangeA, rangeAA FROM tempB
LEFT JOIN tempB2 ON INSTR(f1,rangeA) =1
LEFT JOIN tempB3 ON INSTR(t1,rangeAA) =1
WHERE rangeA IS NULL OR rangeAA IS NULL

If I understand your question correctly, I think you should look into using LOCATE() or POSITION(). I don't really see the need for all those LEFT() calls.
An overly simplified version of what I think you want is this:
CREATE TEMPORARY TABLE `tempB`
SELECT CONCAT(c1,n1,s1) AS f1, CONCAT(c1,n1,s2) AS t1 FROM qs ORDER BY f1 ASC;
CREATE TEMPORARY TABLE `tempB2` SELECT rangeA FROM local ;
SELECT tempB.f1, tempB.t1
from tempB
WHERE (SELECT COUNT(*) from tempB2
WHERE POSITION(rangeA IN tempB.f1) != 0 AND POSITION(rangeA IN tempB.t1) != 0) = 0;

Related

MySQL: SUM function applied to a formula contained in field selected by another query

I need to perform a SELECT SUM(formula), where the formula is an expression stored in a field selected by another query.
Example:
table_A (the "formula" field contains, in each cell, an arithmetic expression involving columns from table B):
+------------+--------------+------------+
| Product_id | related_prod | formula |
+------------+--------------+------------+
| U1 | C2 | col2-col1 |
| U2 | C3 | col3-col2 |
| U3 | C4 | col3-col1 |
+------------+--------------+------------+
table_B:
+------------+---------+------------+----------+------+------+------+
| Product_id | year_id | company_id | month_id | col1 | col2 | col3 |
+------------+---------+------------+----------+------+------+------+
| C2 | 2017 | 1 | 2 | 100 | 200 | 300 |
| C3 | 2017 | 1 | 2 | 400 | 500 | 600 |
| C4 | 2017 | 1 | 2 | 700 | 800 | 900 |
+------------+---------+------------+----------+------+------+------+
I do, then, the following query:
SELECT
SUM(totals.relaz) as final_sum,
totals.relaz as 'col',
totals.prod as 'prod',
totals.cons as 'cons',
m.company_id, m.month_id, m.year_id FROM `table_B` m,
( SELECT formula as relaz,
related_prod as prod,
p.product_id as cons FROM table_A p )
AS totals
WHERE m.product_id=totals.prod
GROUP BY m.company_id, m.year_id, m.month_id, m.product_id, totals.cons
After the select I'd expect that, considering only product 'U1' for example, the corresponding row would be
+-----------+-----------+------+------+------------+----------+---------+
| final_sum | col | prod | cons | company_id | month_id | year_id |
+-----------+-----------+------+------+------------+----------+---------+
| 100 | col2-col1 | C2 | U1 | 1 | 2 | 2017 |
+-----------+-----------+------+------+------------+----------+---------+
Instead, what I get is
+-----------+-----------+------+------+------------+----------+---------+
| final_sum | col | prod | cons | company_id | month_id | year_id |
+-----------+-----------+------+------+------------+----------+---------+
| 0 | col2-col1 | C2 | U1 | 1 | 2 | 2017 |
+-----------+-----------+------+------+------------+----------+---------+
i.e. the final_sum field is always set to 0, even though the 'col' field contains the correct expression.
What am I doing wrong?
Thank you in advance
Alex
You are trying to SUM a string column (table_A.formula), which results in 0. MySQL/MariaDB will not convert the string into column references and evaluate the formula it contains.
Another thing: you should list every column that is not inside an aggregate function in the GROUP BY.
To get the result you want, use:
SELECT
SUM(CASE
WHEN a.formula = 'col2-col1' THEN b.col2-b.col1
WHEN a.formula = 'col3-col1' THEN b.col3-b.col1
WHEN a.formula = 'col3-col2' THEN b.col3-b.col2
END
) AS final_sum,
a.formula as 'col',
a.related_prod as 'prod',
a.Product_id as 'cons',
b.company_id,
b.month_id,
b.year_id
FROM table_B b
JOIN table_A a on a.related_prod=b.Product_id
GROUP BY a.formula, a.related_prod, a.Product_id, b.company_id, b.month_id, b.year_id
It may be possible to build a stored routine that fetches the string col2-col1 and inserts it (using CONCAT) into a SQL string, then PREPAREs and EXECUTEs that string.
That is, dynamically build the SQL, perhaps along the lines of @slaakso's answer.
It would be messy.
I have needed something like this; I chose to do eval() in PHP, which was the client language. I use it for evaluating VARIABLES and GLOBAL STATUS. Example: Table_open_cache_misses / Uptime gives the "misses per second", which, if high, indicates the need for increasing the setting table_open_cache.
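For a single product, the dynamic SQL idea could be sketched as below. The table definitions are assumptions added so the example runs on its own, and @formula, @sql and stmt are illustrative names. Note that the formula text is concatenated straight into the statement, so this is only reasonable if table_A is fully trusted.

```sql
-- Hypothetical setup based on the question's tables (types are assumed).
CREATE TABLE table_A (Product_id VARCHAR(10), related_prod VARCHAR(10), formula VARCHAR(50));
CREATE TABLE table_B (Product_id VARCHAR(10), year_id INT, company_id INT, month_id INT,
                      col1 INT, col2 INT, col3 INT);
INSERT INTO table_A VALUES ('U1', 'C2', 'col2-col1'), ('U2', 'C3', 'col3-col2');
INSERT INTO table_B VALUES ('C2', 2017, 1, 2, 100, 200, 300),
                           ('C3', 2017, 1, 2, 400, 500, 600);

-- Fetch the formula text for one product.
SET @formula = (SELECT formula FROM table_A WHERE Product_id = 'U1');

-- Splice the formula into the statement text so MySQL parses it as an expression.
SET @sql = CONCAT(
    'SELECT SUM(', @formula, ') AS final_sum, ',
    'b.company_id, b.month_id, b.year_id ',
    'FROM table_B b ',
    'JOIN table_A a ON a.related_prod = b.Product_id ',
    'WHERE a.Product_id = ''U1'' ',
    'GROUP BY b.company_id, b.month_id, b.year_id');

PREPARE stmt FROM @sql;
EXECUTE stmt;             -- final_sum is 100 (200 - 100) for U1
DEALLOCATE PREPARE stmt;
```

Handling every product in one pass would need a loop or a generated UNION, which is where this starts to get messy.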

MySQL Inner Join changes the order of records

I have a table Table1 which has 5 columns like this
| ID | Name | V1 | V2 | V3 |
| 1 | A | 103 | 507 | 603 |
| 2 | B | 514 | 415 | 117 |
and another table Table2 which has values like this
| Values | Rooms |
| 103 | ABC |
| 507 | DEF |
| 603 | GHI |
| 514 | JKL |
| 415 | MNO |
| 117 | PQR |
I am running a join query to get rooms from Table2 joined by Table1 as
SELECT t2.values, t2.rooms, t1.Name FROM Table2 t2
INNER JOIN Table1 t1 ON t1.V1 = t2.Values
OR t1.V2 = t2.Values
OR t1.V3 = t2.Values;
this query gets the result but in ascending order of t2.values. I do not want to change any order. I just want to get result in whatever the Table1 has values.
| Values | Rooms | Names |
| 103 | ABC | A |
| 117 | PQR | B |
| 415 | MNO | B |
| 507 | DEF | A |
| 514 | JKL | B |
| 603 | GHI | A |
The above result is ordered by t2.Values, and these values come from t1.V1, t1.V2, t1.V3. I do not want the ordered result; I want the result to follow the order of the t1.V1, t1.V2, t1.V3 values. Looking at Table1, the values would be 103, 507, 603, 514, 415, 117, and therefore the result should be
| Values | Rooms | Names |
| 103 | ABC | A |
| 507 | DEF | A |
| 603 | GHI | A |
| 514 | JKL | B |
| 415 | MNO | B |
| 117 | PQR | B |
I hope I have made my explanation somewhat clearer. If it still isn't clear, please let me know and I will edit it further.
As paxdiablo suggested, I tried adding ORDER BY t1.name but that is not sorting and result is same. Why?
I just want to get result in whatever the Table1 has values.
This is where you've made your mistake. Table1, at least as far as SQL is concerned, doesn't have an order. Tables are unordered sets on which you impose an order when extracting the data (if you wish).
SQL select statements make absolutely no guarantee about the order in which results are returned unless you specifically use order by. (Older MySQL versions also sorted group by results implicitly, but MySQL 8.0 removed that behaviour, so don't rely on it.) Even select * from table1 can return the rows in whatever order the DBMS sees fit to give them to you.
If you want a specific ordering, you need to ask for it explicitly. For example, if you want them ordered by name, whack an order by t1.name at the end of your query. Though I'd probably go the whole hog and use a secondary sort order as well, with order by t1.name, t2.rooms.
Or, to sort on the values, add order by t2.values.
For example, punching this schema/data into SQLFiddle:
create table table1(
id integer,
name varchar(10),
v1 integer,
v2 integer,
v3 integer);
insert into table1 (id,name,v1,v2,v3) values (1,'a',103,507,603);
insert into table1 (id,name,v1,v2,v3) values (2,'b',514,415,117);
create table table2 (
val integer,
room varchar(10));
insert into table2(val,room) values (103,'abc');
insert into table2(val,room) values (507,'def');
insert into table2(val,room) values (603,'ghi');
insert into table2(val,room) values (514,'jkl');
insert into table2(val,room) values (415,'mno');
insert into table2(val,room) values (117,'pqr');
and then executing:
select t2.val, t2.room, t1.name from table2 t2
inner join table1 t1 on t1.v1 = t2.val
or t1.v2 = t2.val
or t1.v3 = t2.val
gives us an arbitrary ordering (it may look like it's ordering by room within name, but that's not guaranteed):
| val | room | name |
|-----|------|------|
| 103 | abc | a |
| 507 | def | a |
| 603 | ghi | a |
| 514 | jkl | b |
| 415 | mno | b |
| 117 | pqr | b |
When we change that to sort on two descending keys order by t1.name desc, t2.room desc, we can see it re-orders based on that:
| val | room | name |
|-----|------|------|
| 117 | pqr | b |
| 415 | mno | b |
| 514 | jkl | b |
| 603 | ghi | a |
| 507 | def | a |
| 103 | abc | a |
And, finally, changing the ordering clause to order by t2.val asc, we get it in value order:
| val | room | name |
|-----|------|------|
| 103 | abc | a |
| 117 | pqr | b |
| 415 | mno | b |
| 507 | def | a |
| 514 | jkl | b |
| 603 | ghi | a |
Finally, if your intent is to order it by the order of columns in each row of table1 (so the order is left to right: v1, v2, v3), you can introduce an artificial sort key, either by using a case statement to select based on which column matched, or by running multiple queries, which may be more efficient since:
you're not executing per-row functions, which tend not to scale very well; and
in larger DBMSs, they can be parallelised.
The multiple query option would go something like:
select 1 as minor, t2.val as val, t2.room as room, t1.name as name from table2 t2
inner join table1 t1 on t1.v1 = t2.val
union all select 2 as minor, t2.val as val, t2.room as room, t1.name as name from table2 t2
inner join table1 t1 on t1.v2 = t2.val
union all select 3 as minor, t2.val as val, t2.room as room, t1.name as name from table2 t2
inner join table1 t1 on t1.v3 = t2.val
order by name, minor
and generates:
| minor | val | room | name |
|-------|-----|------|------|
| 1 | 103 | abc | a |
| 2 | 507 | def | a |
| 3 | 603 | ghi | a |
| 1 | 514 | jkl | b |
| 2 | 415 | mno | b |
| 3 | 117 | pqr | b |
You can see there that it uses name as the primary key and the position of the value in the row as the minor key.
Now some people may think it an ugly approach to introduce a fake column for sorting but it's a tried and tested method for increasing performance. However, you shouldn't trust me (or anyone) on that. My primary mantra for optimisation is measure, don't guess.
I know you've already accepted an answer, but it looks to me like you want them sorted by the order of ID in table1, and then order of the column (v1, v2, v3) that you've matched on. In which case, something like this should work:
SELECT t2.`values`, t2.rooms, t1.Name FROM Table2 t2
INNER JOIN Table1 t1 ON t1.V1 = t2.`values`
OR t1.V2 = t2.`values`
OR t1.V3 = t2.`values`
ORDER BY
t1.id,
CASE
WHEN t1.v1 = t2.`values` THEN 1
WHEN t1.v2 = t2.`values` THEN 2
WHEN t1.V3 = t2.`values` THEN 3
END
(Note I'm quoting values because it's a keyword in SQL...)
What I'm doing here is:
First, I'm ordering by t1.id, which gets you the rough sort order based on the rows in the t1 table.
Then I'm adding a secondary sort based on which Values column was matched in the result row, using a CASE statement. For each row of your query results, if the result was produced by a match between t1.v1 and t2.values, then the CASE statement evaluates to 1. If the result was because of a match between t1.v2 and t2.values, then we get 2. If the result was because of a match between t1.v3 and t2.values, then we get 3.
So the overall sort order is based first on the order of the rows in t1, and then within that on the order of which column got matched between t1 and t2 for each row in your results, which seems to be the requirement (though it's hard to put into words!)
The query is sorting the result by Values in ascending order (103 < 117 < 415 and so on), but you want the rows in the order they appear in the actual table (103, then 507, then 603 and so on), which is the order in which they were inserted. One possible way to achieve that is to add an extra timestamp field to the second table that records when each row was inserted, and then use it in your query with ORDER BY timestamp.

How to write a mysql query to get data from like mixed column and rows

I'd like to query the table which stores the entries somehow mixed in rows and columns.
Here is the table:
| id | class | field | value |
|-----|-------|-------|-------|
| 1 | 1 | a | AA |
| 2 | 1 | b | BB |
| 3 | 1 | c | CC |
| 4 | 2 | a | DD |
| 5 | 2 | b | EE |
| 6 | 2 | c | FF |
What should be the query to get a result like:
a)
| class | new_a | new_c |
|-------|-------|-------|
| 1 | AA | CC |
| 2 | DD | FF |
My pseudo query I imagine it would be something like:
select class, value(where field=a) as new_a, value(where field=c) as new_c, from table;
b)
| class | new_a | new_c |
|-------|-------|-------|
| 2 | DD | FF |
For this one I guess it should be like:
select class, value(where field=a) as new_a, value(where field=c) as new_c, from table where class = '2';
Unfortunately I rarely use MySQL and I'm not sure how to build this query. All constructive suggestions are appreciated.
Try this query
For a) The query is
-- table is a reserved word in MySQL, hence the backticks
SELECT t1.class, t1.value as new_a, t2.value as new_c
FROM `table` t1
JOIN `table` t2 ON(t2.class=t1.class )
WHERE t1.field='a' AND t2.field='c'
For b) The query is
-- table is a reserved word in MySQL, hence the backticks
SELECT t1.class, t1.value as new_a, t2.value as new_c
FROM `table` t1
JOIN `table` t2 ON(t2.class=t1.class )
WHERE t1.field='a' AND t2.field='c' AND t1.class='2'
1) You are trying to convert rows into columns, so I joined the table to itself on the condition that both copies have the same 'class' value.
2) Then I added conditions for what to fetch from each copy: t1.field='a' and t2.field='c'.
3) In the second query you only need class '2', so I added the condition t1.class='2'.
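A different technique from the self-join above, offered only as an alternative sketch, is conditional aggregation: group by class and pick out each field's value with a CASE inside MAX. The table name entries and the column types are assumptions made for the example, standing in for the question's table.

```sql
-- Hypothetical table standing in for the question's table.
CREATE TABLE entries (id INT PRIMARY KEY, class INT, field CHAR(1), value VARCHAR(10));
INSERT INTO entries (id, class, field, value) VALUES
  (1, 1, 'a', 'AA'), (2, 1, 'b', 'BB'), (3, 1, 'c', 'CC'),
  (4, 2, 'a', 'DD'), (5, 2, 'b', 'EE'), (6, 2, 'c', 'FF');

-- a) One output row per class; MAX ignores the NULLs produced by
--    the CASE for rows whose field does not match.
SELECT class,
       MAX(CASE WHEN field = 'a' THEN value END) AS new_a,
       MAX(CASE WHEN field = 'c' THEN value END) AS new_c
FROM entries
GROUP BY class;
-- class 1 -> AA, CC; class 2 -> DD, FF

-- b) The same, restricted to class 2.
SELECT class,
       MAX(CASE WHEN field = 'a' THEN value END) AS new_a,
       MAX(CASE WHEN field = 'c' THEN value END) AS new_c
FROM entries
WHERE class = 2
GROUP BY class;
```

This reads the table once and extends to more fields by adding one MAX(CASE ...) per column, whereas the self-join needs one extra join per field.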

Merging Corresponding MySQL Records

I have a MySQL table called "objecttable" that has the following structure and data in it. (The data is just a sequence, there is a whole lot more).
ID | Name | posX | posY | posZ |rotX | rotY | rotZ | rotW |
3562 | LODpmedhos1_LAe | 2062 | -1703 | 16 | 0 | 45 | 22 | 1 |
3559 | LODpmedhos5_LAe | 2021 | -1717 | 15 | 0 | 45 | 34 | 1 |
3561 | LODpmedhos3_LAe | 2021 | -1717 | 15 | 0 | 45 | 34 | 1 |
I want to figure out which records have the same posX, posY, posZ, rotX, rotY and rotZ values and insert them into a table called "matchtable", and in the end I want it to look like this (I have the table structure ready)
ID1 | Name | ID2 | Name |
3559 | LODpmedhos5_LAe | 3561 | LODpmedhos3_LAe|
I'd appreciate if someone could give me the correct SQL query for it. I don't have more than two matching coordinates and not all coordinates match.
Sorry if the table representations suck, I'll try to make a HTML table if necessary.
Thanks!
This query will do the trick, but the number of results might be a lot more than you expect. Because of the ot.ID > ot2.ID condition, each matching pair is returned once, so if 5 rows share the same coordinates you get 10 result rows (n*(n-1)/2); with ot.ID <> ot2.ID you would get each pair twice, i.e. 20 rows (n*(n-1)).
SELECT ot.ID AS ID1, ot.Name AS Name1, ot2.ID AS ID2, ot2.Name AS Name2
FROM objecttable ot
JOIN objecttable ot2
ON ot.ID > ot2.ID
AND ot.posX = ot2.posX
AND ot.posY = ot2.posY
AND ot.posZ = ot2.posZ
AND ot.rotX = ot2.rotX
AND ot.rotY = ot2.rotY
AND ot.rotZ = ot2.rotZ
EDIT
In reply to lserni's comment:
ON ot.ID > ot2.ID
The above condition is there to remove results like this one, where a row is paired with itself:
ID1 | Name | ID2 | Name |
3559 | LODpmedhos5_LAe | 3559 | LODpmedhos5_LAe|
try this:
-- insert into matchtable -- uncomment to insert the data
select alias1.Id,
alias1.Name,
alias2.Id,
alias2.Name
from objecttable as alias1
join objecttable as alias2
on alias1.posx = alias2.posx
and alias1.posy = alias2.posy
and alias1.posz = alias2.posz
and alias1.rotx = alias2.rotx
and alias1.roty = alias2.roty
and alias1.rotz = alias2.rotz
and alias1.Id > alias2.Id

Efficient assignment of percentile/rank in MYSQL

I have a couple of very large tables (over 400,000 rows) that look like the following:
+---------+--------+---------------+
| ID | M1 | M1_Percentile |
+---------+--------+---------------+
| 3684514 | 3.2997 | NULL |
| 3684515 | 3.0476 | NULL |
| 3684516 | 2.6499 | NULL |
| 3684517 | 0.3585 | NULL |
| 3684518 | 1.6919 | NULL |
| 3684519 | 2.8515 | NULL |
| 3684520 | 4.0728 | NULL |
| 3684521 | 4.0224 | NULL |
| 3684522 | 5.8207 | NULL |
| 3684523 | 6.8291 | NULL |
+---------+--------+---------------+...about 400,000 more
I need to assign each row in the M1_Percentile column a value that represents "the percentage of rows with M1 values equal to or lower than the current row's M1 value".
In other words, I need:
M1_Percentile = (number of rows with M1 <= the current row's M1) / (total number of rows) * 100
I implemented this successfully, but it is FAR FAR too slow. If anyone could create a more efficient version of the following code, I would really appreciate it!
UPDATE myTable AS X JOIN (
SELECT
s1.ID, COUNT(s2.ID)/ (SELECT COUNT(*) FROM myTable) * 100 AS percentile
FROM
myTable s1 JOIN myTable s2 on (s2.M1 <= s1.M1)
GROUP BY s1.ID
ORDER BY s1.ID) AS Z
ON (X.ID = Z.ID)
SET X.M1_Percentile = Z.percentile;
This is the (correct but slow) result from the above query if the number of rows is limited to the ones you see (10 rows):
+---------+--------+---------------+
| ID | M1 | M1_Percentile |
+---------+--------+---------------+
| 3684514 | 3.2997 | 60 |
| 3684515 | 3.0476 | 50 |
| 3684516 | 2.6499 | 30 |
| 3684517 | 0.3585 | 10 |
| 3684518 | 1.6919 | 20 |
| 3684519 | 2.8515 | 40 |
| 3684520 | 4.0728 | 80 |
| 3684521 | 4.0224 | 70 |
| 3684522 | 5.8207 | 90 |
| 3684523 | 6.8291 | 100 |
+---------+--------+---------------+
Producing the same results for the entire 400,000 rows takes orders of magnitude longer.
I cannot test this, but you could try something like:
update table t
set mi_percentile = (
select count(*)
from table t1
where M1 < t.M1 / (
select count(*)
from table));
UPDATE:
update test t
set m1_pc = (
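(select count(*) from test t1 where t1.M1 <= t.M1) * 100 /
( select count(*) from test));
This works in Oracle (the only database I have available). I do remember MySQL rejecting this kind of statement, where the table being updated is also read in a subquery (error 1093, "You can't specify target table ... for update in FROM clause"). It is very annoying.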
Fair warning: mysql isn't my native environment. However, after a little research, I think the following query should be workable:
UPDATE myTable AS X
JOIN (
SELECT X2.ID, (
SELECT COUNT(*)
FROM myTable X1
WHERE (X2.M1, X2.ID) >= (X1.M1, X1.ID)) AS rnk -- RANK is a reserved word in MySQL 8.0+
FROM myTable AS X2
) AS RowRank
ON (X.ID = RowRank.ID)
CROSS JOIN (
SELECT COUNT(*) AS TotalCount
FROM myTable
) AS TotalCount
-- multiply by 100 to get a percentage, as in the question's expected output
SET X.M1_Percentile = RowRank.rnk / TotalCount.TotalCount * 100;
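If you are on MySQL 8.0 or later (an assumption, since the question does not state a version), the window function CUME_DIST() computes exactly "the fraction of rows with a value less than or equal to the current row's value" in a single sorted pass, which avoids the self-join entirely. A minimal sketch with a hypothetical four-row table:

```sql
-- Hypothetical small version of the question's table (types are assumed).
CREATE TABLE myTable (ID INT PRIMARY KEY, M1 DECIMAL(10,4), M1_Percentile DOUBLE);
INSERT INTO myTable (ID, M1) VALUES
  (3684514, 3.2997),
  (3684515, 3.0476),
  (3684516, 2.6499),
  (3684517, 0.3585);

-- CUME_DIST() returns a value in (0, 1]; scale it to a percentage.
UPDATE myTable AS X
JOIN (
    SELECT ID, CUME_DIST() OVER (ORDER BY M1) * 100 AS percentile
    FROM myTable
) AS Z ON X.ID = Z.ID
SET X.M1_Percentile = Z.percentile;

SELECT ID, M1, M1_Percentile FROM myTable ORDER BY ID;
-- 3684514 -> 100, 3684515 -> 75, 3684516 -> 50, 3684517 -> 25
```

Rows with equal M1 values receive the same percentile, which matches the "equal or lower" definition in the question. As always, measure on the real 400,000-row table before adopting it.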