MySQL: using @variable in select statement takes hundreds of times longer

I'm trying to understand a huge performance difference that I'm seeing in equivalent code. Or at least code I think is equivalent.
I have a table with about 10 million records in it. It contains an indexed field, defined as:
USPatentNum char(8)
If I set a variable within MySQL to a value and use it in the query, the query takes over 218 seconds. The exact same query with a string literal takes under 1/4 of a second.
In the code below, the first select statement (with where USPatentNum = @pn;) takes forever, but the second, with the literal value
(where USPatentNum = '5288812';), is nearly instant:
mysql> select @pn := '5288812';
+------------------+
| @pn := '5288812' |
+------------------+
| 5288812          |
+------------------+
1 row in set (0.00 sec)
mysql> select patentId, USPatentNum, grantDate from patents where USPatentNum = @pn;
+----------+-------------+------------+
| patentId | USPatentNum | grantDate  |
+----------+-------------+------------+
|   306309 | 5288812     | 1994-02-22 |
+----------+-------------+------------+
1 row in set (3 min 38.17 sec)
mysql> select @pn;
+---------+
| @pn     |
+---------+
| 5288812 |
+---------+
1 row in set (0.00 sec)
mysql> select patentId, USPatentNum, grantDate from patents where USPatentNum = '5288812';
+----------+-------------+------------+
| patentId | USPatentNum | grantDate  |
+----------+-------------+------------+
|   306309 | 5288812     | 1994-02-22 |
+----------+-------------+------------+
1 row in set (0.21 sec)
Two questions:
Why is the use of @pn so much slower?
Can I change the select statement so that the performance will be the same?

Declare @pn as char(8) before setting its value.
I suspect it is a varchar the way you set it now. If so, the performance loss occurs because MySQL can't match the index against your variable.

It doesn't matter whether you use a constant or @var. You get different timings because the second time MySQL returns the result from its cache. If you run your scenario again with a different value, but swap the order of the constant query and the @var query, you will see the same pattern: the first query will be slow and the second will be fast.
Hope it helps

Related

MySQL string splitting on delimiters

Based on https://stackoverflow.com/a/59666211/4250302 I created the stored function get_enum_item to process the lists of possible values in ENUM() type fields.
It works well enough, but I can't work out what to do when the delimiter itself is part of a string being split. For example:
(square brackets are for readability)
mysql> set @q=",v1,',v2'"; -- empty string, "v1", "comma-v2"
mysql> select concat('[',get_enum_item(@q,',',0),']') as item;
+------+
| item |
+------+
| []   |
+------+
it is OK
mysql> select concat('[',get_enum_item(@q,',',1),']') as item;
+------+
| item |
+------+
| [v1] |
+------+
it is also OK
mysql> select concat('[',get_enum_item(@q,',',2),']') as item;
+------+
| item |
+------+
| [']  |
+------+
It is not OK
@q contains 3 commas: the first two are real delimiters, while the last one is part of the third possible value, "comma-v2". I have no idea how to keep the splitting function from being confused. MySQL Workbench in "form editor" mode solves this somehow, but how can I solve it in MySQL code?
Well, I can rely on the fact that the show_columns-like queries show the enums in "hardcoded" manner:
select column_name,column_type
from information_schema.columns
where data_type='enum' and table_name='assemblies';
+--------------+------------------------------------------+
| COLUMN_NAME  | COLUMN_TYPE                              |
+--------------+------------------------------------------+
| AssetTagType | enum('','И/Н','Н/Н',',fgg')              |
| PCTagType    | enum('','И/Н','Н/Н')                     |
| MonTagType   | enum('','И/Н','Н/Н')                     |
| UPSTagType   | enum('','И/Н','Н/Н')                     |
| OtherTagType | enum('','И/Н','Н/Н')                     |
| state        | enum('в работе','на списание','списано') |
+--------------+------------------------------------------+
Thus I can try to use ',' as a delimiter, but this will not save me from the case if the "comma-apostrophe" combination is the part of possible value... :-(
The only thing I can imagine is to count apostrophes and if the delimiting comma is after the even number of ''s, then it is the delimiter, while if it follows an odd number of ''s, it is the part of the value.
I can't think of anything other than dumbly scanning the input string in a loop. But maybe there are other suggestions for splitting the values correctly?
Please don't suggest using PHP, Python, AWK, and so on. The query will be executed from a Pascal (Lazarus, CodeTyphon) application, and calling external processors is highly unsafe.
As a last resort, I can process the column_type in Pascal code, but first I must make sure that the task is not solvable with MySQL's own features.
edit:
select column_type from information_schema.columns
where column_name='assettagtype' and table_name='assemblies';
+--------------------------------------+
| COLUMN_TYPE                          |
+--------------------------------------+
| enum('','И/Н','Н/Н',''''',fgg','''') |
+--------------------------------------+
1 row in set (0.00 sec)
Fourth field: '',fgg, fifth field: '
set @q="'в работе','на списание','списано'";
WITH RECURSIVE cte as (
select 1 as a union all
select a+1 from cte where a<35
)
select distinct regexp_substr(@q,'''[^,]*''',a) as E from cte;
Setting the upper bound much higher than 35 raises ERROR 3686 (HY000): Index out of bounds in regular expression search. (I filed a bug report for this.)
The null value should be filtered out... 😉
output:
E
'в работе'
'на списание'
'списано'
null
EDIT: With some effort, this also works for a more complex example (not for every "staged" example!)
set @q="'в работе','на списание','списано',''',fgg'";
select @q;
WITH RECURSIVE cte as (
select 1 as a union all
select a+1 from cte where a<35
)
select distinct regexp_substr(@q,'(''([^,]|[^''][^''])*'')',a) E from cte;
output:
E
'в работе'
'на списание'
'списано'
''',fgg'

Check if IP is in CIDR netmask (range)

I have 2 tables ip and cidr.
In the first one I store IP's. (2 column table, id, ip), here is an example (the values are fictional):
id | ip
---+-------------
 1 | 172.922.2.10
---+-------------
 2 | 194.22.10.13
In the second one I store CIDR netmask's (2 column table, id, cidr), here is an example (the values are fictional):
id | cidr
---+---------------
 1 | 26.232.49.0/20
---+---------------
 2 | 14.44.182.0/24
Is there any way to make a mysql query to check whether the ip's from the first table are in the range of any of my cidr netmasks?
Note: To convert a cidr netmask to a range of ip's click here
I'd personally recommend using postgres as it has a CIDR data-type and powerful functions to go with it, but there's an interesting discussion on doing similar things in MySQL, too.
http://planet.mysql.com/entry/?id=29283
This has come up in a related project of mine, and this appears to be the top google result for the question, so, you get the answer!
create function get_lowest_ipv4(cidr char(18)) returns bigint deterministic return INET_ATON(SUBSTRING_INDEX(cidr, '/', 1));
create function get_highest_ipv4(cidr char(18)) returns bigint deterministic return get_lowest_ipv4(cidr) + (0x100000000 >> SUBSTRING_INDEX(cidr,'/', -1)) - 1;
You can then do ... from ip_map where INET_ATON("ip.add.re.ss") between get_lowest_ipv4(ip) AND get_highest_ipv4(ip)
Because you declare the functions as deterministic, it'll get cached inside mysql and the calculation will only need to be run once. Then it'll just be 'is integer greater than y and less than x', which will be effectively instant.
MySQL [astpp]> set @cidr="10.11.0.0/16";
Query OK, 0 rows affected (0.00 sec)
MySQL [astpp]> select get_lowest_ipv4(@cidr), get_highest_ipv4(@cidr), INET_NTOA(get_lowest_ipv4(@cidr)), INET_NTOA(get_highest_ipv4(@cidr));
+------------------------+-------------------------+-----------------------------------+------------------------------------+
| get_lowest_ipv4(@cidr) | get_highest_ipv4(@cidr) | INET_NTOA(get_lowest_ipv4(@cidr)) | INET_NTOA(get_highest_ipv4(@cidr)) |
+------------------------+-------------------------+-----------------------------------+------------------------------------+
|              168493056 |               168558591 | 10.11.0.0                         | 10.11.255.255                      |
+------------------------+-------------------------+-----------------------------------+------------------------------------+
1 row in set (0.00 sec)
MySQL [astpp]> set @cidr="10.11.12.1/32";
Query OK, 0 rows affected (0.00 sec)
MySQL [astpp]> select get_lowest_ipv4(@cidr), get_highest_ipv4(@cidr), INET_NTOA(get_lowest_ipv4(@cidr)), INET_NTOA(get_highest_ipv4(@cidr));
+------------------------+-------------------------+-----------------------------------+------------------------------------+
| get_lowest_ipv4(@cidr) | get_highest_ipv4(@cidr) | INET_NTOA(get_lowest_ipv4(@cidr)) | INET_NTOA(get_highest_ipv4(@cidr)) |
+------------------------+-------------------------+-----------------------------------+------------------------------------+
|              168496129 |               168496129 | 10.11.12.1                        | 10.11.12.1                         |
+------------------------+-------------------------+-----------------------------------+------------------------------------+
1 row in set (0.01 sec)
MySQL [astpp]>
The only important caveat is that you insert VALID CIDRs. For example, 10.11.12.13/24 is not valid. That's an IP Address INSIDE the 10.11.12.0/24 network.
If you are unable to validate the CIDRs before inserting them (for some crazy reason), you could change get_lowest_ipv4 to do a bitwise comparison on the source, but that's much less elegant.
INET_ATON(SUBSTRING_INDEX(`ip`, '/', 1)) & (0xffffffff ^ ((0x1 << (32 - SUBSTRING_INDEX(`ip`, '/', -1))) - 1))
is an (untested) stab at matching invalid CIDRs.
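The arithmetic behind the two functions can be cross-checked outside the database. The following is a minimal Python sketch, not the answer's own code: the names `cidr_bounds` and `ip_in_cidr` are hypothetical, and it assumes plain dotted-quad IPv4 input. It also clears the host bits first, in the spirit of the untested expression above, so a sloppy CIDR such as 10.11.12.13/24 is treated as 10.11.12.0/24. The mask is parenthesized explicitly because the relative precedence of `&` and `^` differs between MySQL and Python.

```python
import ipaddress

def cidr_bounds(cidr: str) -> tuple:
    """Return (lowest, highest) address of an IPv4 CIDR block as integers."""
    base, prefix = cidr.split("/")
    prefix_len = int(prefix)
    address = int(ipaddress.IPv4Address(base))    # same value INET_ATON yields for a full dotted quad
    host_mask = (1 << (32 - prefix_len)) - 1      # bits that vary inside the block
    lowest = address & (0xFFFFFFFF ^ host_mask)   # clear the host bits
    highest = lowest + (0x100000000 >> prefix_len) - 1
    return lowest, highest

def ip_in_cidr(ip: str, cidr: str) -> bool:
    """True if ip falls inside the block, mirroring the BETWEEN test."""
    lowest, highest = cidr_bounds(cidr)
    return lowest <= int(ipaddress.IPv4Address(ip)) <= highest

print(cidr_bounds("10.11.0.0/16"))   # (168493056, 168558591), as in the session above
print(ip_in_cidr("10.11.200.7", "10.11.0.0/16"))
```

The first printed pair matches the get_lowest_ipv4 / get_highest_ipv4 values shown in the MySQL session for 10.11.0.0/16.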

How to insert the default value in temporary tables in MySQL?

I want to create a temporary table from a SELECT statement in MySQL. It involves several JOINs, and it can produce NULL values that I want MySQL to treat as zeroes. It sounds like an easy problem (simply default to zero), but MySQL (5.6.12) fails to apply the default value.
For example, take the following two tables:
mysql> select * from TEST1;
+------+------+
| a    | b    |
+------+------+
|    1 |    2 |
|    4 |   25 |
+------+------+
2 rows in set (0.00 sec)
mysql> select * from TEST2;
+------+------+
| b    | c    |
+------+------+
|    2 |  100 |
|    3 |  100 |
+------+------+
2 rows in set (0.00 sec)
A left join gives:
mysql> select TEST1.*,c from TEST1 left join TEST2 on TEST1.b=TEST2.b;
+------+------+------+
| a    | b    | c    |
+------+------+------+
|    1 |    2 |  100 |
|    4 |   25 | NULL |
+------+------+------+
2 rows in set (0.00 sec)
Now, if I want to save these values in a temporary table (replacing NULL with zero), this is the code I would use:
mysql> create temporary table TEST_JOIN (a int, b int, c int default 0 not null)
select TEST1.*,c from TEST1 left join TEST2 on TEST1.b=TEST2.b;
ERROR 1048 (23000): Column 'c' cannot be null
What am I doing wrong? The worst part is that this code used to work before I did a system-wide upgrade (I don't remember which version of MySQL I had, but surely it was lower than my current 5.6). It used to produce the behavior I would expect: if it's NULL, use the default, not the frustrating error I'm getting now.
From the documentation of 5.6 (unchanged since 4.1):
Inserting NULL into a column that has been declared NOT NULL. For
multiple-row INSERT statements or INSERT INTO ... SELECT statements,
the column is set to the implicit default value for the column data
type. This is 0 for numeric types, the empty string ('') for string
types, and the “zero” value for date and time types. INSERT INTO ...
SELECT statements are handled the same way as multiple-row inserts
because the server does not examine the result set from the SELECT to
see whether it returns a single row. (For a single-row INSERT, no
warning occurs when NULL is inserted into a NOT NULL column. Instead,
the statement fails with an error.)
My current workaround is to store the NULL values in the temporary table and then replace them with zeroes, but that seems rather cumbersome with many columns (and terribly inefficient). Is there a better way to do it?
BTW, I cannot simply ignore some columns in the query (as suggested for another question), because it's a multirow query.
IFNULL(`my_column`,0);
That would set NULLs to 0. Other values stay as is.
Just wrap your values/column names with IFNULL and it will convert them to whatever default value you put into the function. E.g. 0. Or "european swallow", or whatever you want.
Then you can keep strict mode on and still handle NULLs gracefully.
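A minimal sketch of that approach using the question's own tables. It runs against an in-memory SQLite database from Python purely so that it is self-contained; SQLite is only a stand-in here, and its CREATE TABLE ... AS SELECT form differs slightly from MySQL's CREATE TABLE ... SELECT. The alias `AS c` is an addition, so that the wrapped column keeps its name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TEST1 (a INT, b INT);
    CREATE TABLE TEST2 (b INT, c INT);
    INSERT INTO TEST1 VALUES (1, 2), (4, 25);
    INSERT INTO TEST2 VALUES (2, 100), (3, 100);

    -- IFNULL replaces the NULL from the unmatched left-join row before it is stored.
    CREATE TEMPORARY TABLE TEST_JOIN AS
    SELECT TEST1.a, TEST1.b, IFNULL(TEST2.c, 0) AS c
    FROM TEST1 LEFT JOIN TEST2 ON TEST1.b = TEST2.b;
""")

rows = conn.execute("SELECT a, b, c FROM TEST_JOIN ORDER BY a").fetchall()
print(rows)  # [(1, 2, 100), (4, 25, 0)]
```

The row with b = 25 has no match in TEST2, so its c is stored as 0 instead of NULL, without a second pass over the table.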

WHERE clause produces weird result

I want to get the maximum value of a column over the first 1000 non-null results for some condition, then over the next 1000, and so on. I do this for different conditions, but I found something strange when I use dayofweek. The first command I show works:
mysql> select max(id),max(d20) from (select id, d20 from data where d20 is not null and id<1000000 and dayofweek(day)=1 limit 1000) x;
+---------+----------+
| max(id) | max(d20) |
+---------+----------+
|  100281 |    13785 |
+---------+----------+
1 row in set (0.44 sec)
but actually I want this second command, which doesn't work as expected.
mysql> select max(id),max(d20) from (select id, d20 from data where d20 is not null and id>100000 and dayofweek(day)=1 limit 1000) x;
+---------+----------+
| max(id) | max(d20) |
+---------+----------+
|  303765 |        0 |
+---------+----------+
1 row in set (0.02 sec)
Any clue?
Take the extreme case of the limit being 1.
That means, the subquery returns any row (there's no order by to make the row deterministic) that has id<1000000, which makes MAX(id) and MAX(d20) return the values from that row only. Hardly representative of the total collection.
Raising the limit to 1000 just makes the sample bigger; the result is still nondeterministic, because it depends on which 1000 rows are sampled (assuming more than 1000 rows match). You may very well get a different result every time you execute the query, so expecting a particular result won't work.
If you need a deterministic result, add an ORDER BY to your subquery before limiting the results.
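A minimal sketch of the batched query with the ORDER BY in place. It uses an in-memory SQLite database from Python as a stand-in for MySQL, with hypothetical sample data, and it leaves out the dayofweek(day) filter because SQLite has no such function. The helper name `batch_max` is an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, d20 INTEGER)")
# Hypothetical sample: ids 1..5000, d20 is NULL for every third row.
conn.executemany(
    "INSERT INTO data (id, d20) VALUES (?, ?)",
    [(i, None if i % 3 == 0 else i * 2) for i in range(1, 5001)],
)

def batch_max(after_id, batch_size=1000):
    """MAX(id), MAX(d20) over the next batch_size non-NULL rows with id > after_id."""
    return conn.execute(
        """
        SELECT MAX(id), MAX(d20) FROM (
            SELECT id, d20 FROM data
            WHERE d20 IS NOT NULL AND id > ?
            ORDER BY id          -- pins down which rows the LIMIT keeps
            LIMIT ?
        ) x
        """,
        (after_id, batch_size),
    ).fetchone()

first = batch_max(0)
second = batch_max(first[0])   # continue after the last id of the previous batch
print(first, second)
```

Feeding the previous batch's MAX(id) back in as the lower bound gives consecutive, non-overlapping batches, which is only meaningful because the ORDER BY fixes which rows each batch contains.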

Is there a way with MySQL to include the time a query takes in the results table?

I want to include the time it takes to run a query as part of the output. Is this possible?
For example, this query:
mysql> SELECT count(*) AS NumberOfUsers FROM mdl_user;
+---------------+
| NumberOfUsers |
+---------------+
|          5741 |
+---------------+
1 row in set (0.16 sec)
I want to run it so that the "0.16 sec" value appears in a second column. Something like:
mysql> SELECT
count(*) AS NumberOfUsers
, QUERY_TIME() AS TimeToRunQuery
FROM mdl_user;
+---------------+----------------+
| NumberOfUsers | TimeToRunQuery |
+---------------+----------------+
|          5741 | 0.16 sec       |
+---------------+----------------+
1 row in set (0.16 sec)
Nope, sorry. If you're interested just for informational purposes, you can have your script simply time the query by recording the time when the query is sent and subtracting that from the time when the query completes.
In PHP, it'd look something like this:
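$start_time = microtime(true); // true returns a float; without it microtime() returns a "msec sec" string
execute_query(); // placeholder for whatever call runs the query
$elapsed = microtime(true) - $start_time; // execution time in seconds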
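The same client-side timing idea, sketched in Python. This is only an illustration under stated assumptions: an in-memory SQLite table stands in for the MySQL mdl_user table, and the helper name `timed_query` is hypothetical. With a real MySQL connection the measured time would also include the network round trip, so it will not match the server-reported figure exactly.

```python
import sqlite3
import time

def timed_query(conn, sql, params=()):
    """Run a query and return (rows, elapsed_seconds) as measured on the client."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    return rows, elapsed

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mdl_user (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO mdl_user (id) VALUES (?)", [(i,) for i in range(5741)])

rows, elapsed = timed_query(conn, "SELECT count(*) AS NumberOfUsers FROM mdl_user")
print(rows[0][0], f"{elapsed:.4f} sec")
```

The elapsed time is returned alongside the rows, so the caller can print it as an extra column if that is the desired presentation.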