MySQL string splitting on delimiters - mysql

Based on https://stackoverflow.com/a/59666211/4250302 I created the stored function get_enum_item to process the lists of possible values of ENUM() type fields.
It works well enough, but I can't work out what to do when the delimiter itself is part of a string being split. For example
(square brackets are for readability):
mysql> set @q=",v1,',v2'"; -- empty string, "v1", "comma-v2"
mysql> select concat('[',get_enum_item(@q,',',0),']') as item;
+------+
| item |
+------+
| [] |
+------+
it is OK
mysql> select concat('[',get_enum_item(@q,',',1),']') as item;
+------+
| item |
+------+
| [v1] |
+------+
it is also OK
mysql> select concat('[',get_enum_item(@q,',',2),']') as item;
+------+
| item |
+------+
| ['] |
+------+
It is not OK
@q contains 3 commas. The first two are real delimiters, while the last one is part of the third possible value, "comma-v2". I have no idea how to keep the splitting function from being confused. MySQL Workbench in "form editor" mode handles this somehow, but how can I solve it in MySQL code?
Well, I can rely on the fact that SHOW COLUMNS-like queries display the enums in a "hardcoded" manner:
select column_name,column_type
from information_schema.columns
where data_type='enum' and table_name='assemblies';
+--------------+------------------------------------------------------------------+
| COLUMN_NAME | COLUMN_TYPE |
+--------------+------------------------------------------------------------------+
| AssetTagType | enum('','И/Н','Н/Н',',fgg') |
| PCTagType | enum('','И/Н','Н/Н') |
| MonTagType | enum('','И/Н','Н/Н') |
| UPSTagType | enum('','И/Н','Н/Н') |
| OtherTagType | enum('','И/Н','Н/Н') |
| state | enum('в работе','на списание','списано') |
+--------------+------------------------------------------------------------------+
Thus I can try to use ',' as the delimiter, but this will not save me if the "comma-apostrophe" combination is itself part of a possible value... :-(
The only thing I can imagine is to count apostrophes: if a comma follows an even number of apostrophes, it is a delimiter, while if it follows an odd number, it is part of the value.
And I can't come up with anything other than dumbly scanning the input string in a loop. But maybe there are other suggestions for splitting the values correctly?
Please don't suggest using PHP, Python, AWK, and so on. The query will be executed from a Pascal (Lazarus, CodeTyphon) application, and calling external processors is highly unsafe.
As a last resort, I can process column_type in Pascal code, but first I must make sure that the task cannot be solved with MySQL's own features.
edit:
select column_type from information_schema.columns
where column_name='assettagtype' and table_name='assemblies';
+------------------------------------------+
| COLUMN_TYPE |
+------------------------------------------+
| enum('','И/Н','Н/Н',''''',fgg','''') |
+------------------------------------------+
1 row in set (0.00 sec)
Fourth field: '',fgg, fifth field: '
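To pin down the apostrophe-counting scan described in the question, here is a minimal sketch of the loop. It is written in Python only to make the state machine easy to read and check; it is not meant to be called from the application, and the same loop should port to a MySQL stored function or to Pascal. The function name split_enum_values is hypothetical, and the sketch assumes that COLUMN_TYPE always escapes an apostrophe inside a value by doubling it, as the output above suggests.

```python
def split_enum_values(column_type):
    """Split a COLUMN_TYPE string such as enum('a','b') into its values.

    Assumption: an apostrophe inside a value is written as two apostrophes.
    """
    # Keep only the text between the outermost parentheses.
    body = column_type[column_type.index("(") + 1:column_type.rindex(")")]
    values = []
    current = []
    in_quotes = False
    i = 0
    while i < len(body):
        ch = body[i]
        if not in_quotes:
            if ch == "'":
                # Opening apostrophe: start collecting a new value.
                in_quotes = True
                current = []
            # Anything else outside quotes is a separating comma: skip it.
        elif ch == "'":
            if i + 1 < len(body) and body[i + 1] == "'":
                # Doubled apostrophe: one literal apostrophe in the value.
                current.append("'")
                i += 1
            else:
                # Closing apostrophe: the value is complete.
                in_quotes = False
                values.append("".join(current))
        else:
            # Ordinary character, including a comma inside a value.
            current.append(ch)
        i += 1
    return values


print(split_enum_values("enum('','И/Н','Н/Н',''''',fgg','''')"))
```

A comma is treated as a delimiter only while the scanner is outside a quoted value, which is the same rule as "the comma follows an even number of apostrophes".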

set @q="'в работе','на списание','списано'";
WITH RECURSIVE cte as (
select 1 as a union all
select a+1 from cte where a<35
)
select distinct regexp_substr(@q,'''[^,]*''',a) as E from cte;
Values much higher than 35 raise an error: ERROR 3686 (HY000): Index out of bounds in regular expression search. (I filed a bug report for this.)
The null value should be filtered out... 😉
output:
E
'в работе'
'на списание'
'списано'
null
EDIT: With some effort, this also works for a more complex example (not for every "staged" example!)
set @q="'в работе','на списание','списано',''',fgg'";
select @q;
WITH RECURSIVE cte as (
select 1 as a union all
select a+1 from cte where a<35
)
select distinct regexp_substr(@q,'(''([^,]|[^''][^''])*'')',a) E from cte;
output:
E
'в работе'
'на списание'
'списано'
''',fgg'

Related

How can I extract a number from string using regex with various formats (MySQL)?

I have a MySQL data table which stores metadata for client transactions. I am writing a query to extract a number out of the metadata column, which is essentially a JSON stored as a string.
I am trying to find 'clients' and extract the first number after clients. The data can be stored in several different ways; see the examples below:
..."type\":\"temp\",\"typeOther\":\"\",\"clients\":\"2\",\"hours\":\"5\",\...
..."id\":31457,\"clients\":2,\"cancel\":false\...
I've tried the following Regexp:
(?<=clients.....)[0-9]+
(?<=clients...)[0-9]*(?=[[^:digit:]])
And I've tried the following json_extract, but it returned a null value:
json_extract(rd.meta, '$.clients')
The regexp functions do work, but the first one only works on the first example, while the second only works on the second example.
What regexp should I use such that it's dynamic and will pull the number nested between two non-word char sets after 'clients'?
clients.*?([0-9]+)
^^^^^^^             -- exact match
       ^^^          -- non-greedy string of anything
          ^      ^  -- capture
           ^^^^^^   -- string of 1 or more digits (greedy)
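To check how that pattern behaves on both sample strings, here is a small sketch using Python's re module. It is only an illustration of the pattern itself, under the assumption that the engine supports non-greedy quantifiers and capture groups; the helper name extract_clients is hypothetical, and how you retrieve the captured group inside MySQL depends on your server version.

```python
import re

# Literal "clients", the shortest possible run of anything, then the first digits.
CLIENTS_RE = re.compile(r"clients.*?([0-9]+)")


def extract_clients(meta):
    """Return the first number that follows 'clients', or None if absent."""
    match = CLIENTS_RE.search(meta)
    return int(match.group(1)) if match else None


# The two sample rows from the question, kept verbatim as raw strings.
s1 = r'..."type\":\"temp\",\"typeOther\":\"\",\"clients\":\"2\",\"hours\":\"5\",\...'
s2 = r'..."id\":31457,\"clients\":2,\"cancel\":false\...'

print(extract_clients(s1), extract_clients(s2))
```

Because the search starts at the word clients, the 31457 that precedes it in the second sample is never considered.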
I did this test on MySQL 8.0.29, but it should work on MySQL 5.x too:
mysql> set @s1 = '..."type\\":\\"temp\\",\\"typeOther\\":\\"\\",\\"clients\\":\\"2\\",\\"hours\\":\\"5\\",\\...';
mysql> set @s2 = '..."id\\":31457,\\"clients\\":2,\\"cancel\\":false\\...';
mysql> select trim(leading '\\"' from substring_index(@s1, '\\"clients\\":', -1)) as clients;
+--------------------------+
| clients |
+--------------------------+
| 2\",\"hours\":\"5\",\... |
+--------------------------+
mysql> select trim(leading '\\"' from substring_index(@s2, '\\"clients\\":', -1)) as clients;
+------------------------+
| clients |
+------------------------+
| 2,\"cancel\":false\... |
+------------------------+
Then cast the result as an integer to get rid of the non-numeric part following the number.
mysql> select cast(trim(leading '\\"' from substring_index(@s1, '\\"clients\\":', -1)) as unsigned) as clients;
+---------+
| clients |
+---------+
| 2 |
+---------+
mysql> select cast(trim(leading '\\"' from substring_index(@s2, '\\"clients\\":', -1)) as unsigned) as clients;
+---------+
| clients |
+---------+
| 2 |
+---------+

How to CAST a VARCHAR value to INT if the value is comma separated

The MySQL database I am working with has a column with comma separated values similar to -
mysql> select * from performance;
+----+------------------+
| id | maximums |
+----+------------------+
| 1 | 10000RPM, 60KM/h |
| 2 | 5000RPM, 30KM/h |
| 3 | 25mph, 3000RPM |
| 4 | 200KM/h, 2000RPM |
+----+------------------+
4 rows in set (0.00 sec)
I am trying to cast the numbers found into their own INT columns.
mysql> select maximums,
CASE WHEN maximums like "%mph%" THEN CAST(SUBSTRING_INDEX(maximums, 'mph', 1) AS UNSIGNED) END AS mph_int,
CASE WHEN maximums like "%KM/h%" THEN CAST(SUBSTRING_INDEX(maximums, 'KM/h', 1) AS UNSIGNED) END AS kmh_int,
CASE WHEN maximums like "%RPM%" THEN CAST(SUBSTRING_INDEX(maximums, 'RPM', 1) AS UNSIGNED) END AS rpm_int
from performance;
+------------------+---------+---------+---------+
| maximums | mph_int | kmh_int | rpm_int |
+------------------+---------+---------+---------+
| 10000RPM, 60KM/h | NULL | 10000 | 10000 |
| 5000RPM, 30KM/h | NULL | 5000 | 5000 |
| 25mph, 3000RPM | 25 | NULL | 25 |
| 200KM/h, 2000RPM | NULL | 200 | 200 |
+------------------+---------+---------+---------+
4 rows in set, 4 warnings (0.00 sec)
I expect the output to show me the values as INTs in new columns, however am unsure how to achieve this.
Let's give this a whirl, using the good ol'-fashioned blunt instrument approach. I am guessing that you only need this to work once, to convert an old, poorly-designed schema into something more workable. Given that, I have made no effort at elegance or performance.
(If you are not using this to fix your data schema, you should, because the pain you are experiencing now is only the beginning.)
First, we need to split the maximums value into two pieces and process them separately. The first half is:
SUBSTRING_INDEX(`maximums`, ',', 1)
The second half is similar, but there is a stray space:
TRIM(SUBSTRING_INDEX(`maximums`, ',', -1))
From here on, let's just always trim, in case there is variation in the data. Now we need to see if the first section has 'mph' in it, and if so capture the value as you did in your question (this is essentially like your example but operating on only the first part of the maximums value):
IF(TRIM(SUBSTRING_INDEX(`maximums`, ',', 1)) LIKE '%mph', SUBSTRING_INDEX(TRIM(SUBSTRING_INDEX(`maximums`, ',', 1)), 'mph', 1), NULL)
Let's name that chunk of code "mph test on first half". The mph test on the second half is almost identical, just using -1 as the index. Finally, we need to put the non-null value (if either) into the column using COALESCE. Once we create all six variations of the test, we end up with the following:
SELECT
...
COALESCE([mph test on first half], [mph test on second half]) AS mph_int,
COALESCE([kph test on first half], [kph test on second half]) AS kph_int,
COALESCE([rpm test on first half], [rpm test on second half]) AS rpm_int
WHERE
...
Chances are you don't actually need to formally cast the string of digits into an integer; if you are inserting into a table with columns of those types, MySQL will cast the value for you.
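For anyone who wants to verify the expected values before writing out all six SQL expressions, the split-then-test logic can be sketched like this. It is a Python illustration of the same steps (split on the comma, trim, test each half for each unit, keep the non-null result), not a replacement for the query. The function name split_maximums is hypothetical, and the column names are taken from the question.

```python
# Unit suffix -> name of the output column it should populate.
UNITS = {"mph": "mph_int", "KM/h": "kmh_int", "RPM": "rpm_int"}


def split_maximums(maximums):
    """Return one integer per unit found in a value like '25mph, 3000RPM'."""
    result = {column: None for column in UNITS.values()}
    for part in maximums.split(","):
        part = part.strip()  # always trim, in case spacing varies
        for unit, column in UNITS.items():
            if part.endswith(unit):
                digits = part[:-len(unit)]
                if digits.isdigit():
                    result[column] = int(digits)
    return result


for row in ["10000RPM, 60KM/h", "5000RPM, 30KM/h", "25mph, 3000RPM", "200KM/h, 2000RPM"]:
    print(row, split_maximums(row))
```

Each unit is looked up in each half independently, which is what the COALESCE of the "first half" and "second half" tests does in the SQL version.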

How to match an ip address in mysql?

For example, I am having a column storing data like this.
Apple
12.5.126.40
Smite
Abby
127.0.0.1
56.5.4.8
9876543210
Notes
How to select out only the rows with data in IP format?
I have tried with '^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$'
but I have no idea why it also matches 9876543210
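You're going to need to use REGEXP to match the IP address dotted quad pattern, with the backslashes doubled. Inside a MySQL string literal, \. is reduced to a plain dot, which matches any character, and that is why your pattern also matched 9876543210.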
SELECT *
FROM yourtable
WHERE
thecolumn REGEXP '^[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}$'
Technically, this will match values that are not valid IP addresses, like 999.999.999.999, but that may not be important. What is important is fixing your data so that IP addresses are stored in their own column, separate from whatever other data you have in here. It is almost always a bad idea to mix data types in one column.
mysql> SELECT '9876543210' REGEXP '^[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}$';
+---------------------------------------------------------------------------+
| '9876543210' REGEXP '^[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}$' |
+---------------------------------------------------------------------------+
| 0 |
+---------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> SELECT '987.654.321.0' REGEXP '^[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}$';
+------------------------------------------------------------------------------+
| '987.654.321.0' REGEXP '^[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}$' |
+------------------------------------------------------------------------------+
| 1 |
+------------------------------------------------------------------------------+
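The difference between the escaped and the unescaped dot can be reproduced outside MySQL. The sketch below uses Python's re module as a stand-in, on the assumption that both engines treat a bare dot as "any character"; the names OCTET, unescaped and escaped are illustrative only.

```python
import re

OCTET = "[0-9]{1,3}"

# What the server sees when the literal contains a single backslash:
# the dot is no longer escaped and matches any character.
unescaped = re.compile("^" + ".".join([OCTET] * 4) + "$")

# What the server sees when the backslash is doubled: a literal dot.
escaped = re.compile("^" + r"\.".join([OCTET] * 4) + "$")

for value in ["12.5.126.40", "9876543210", "987.654.321.0", "Notes"]:
    print(value, bool(unescaped.match(value)), bool(escaped.match(value)))
```

With the unescaped pattern, 9876543210 can be carved up as 987, 6, 54, 3, 2, 1, 0, so it matches. With the escaped pattern it cannot match, while 987.654.321.0 still does, as the outputs above show.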
Another method is to attempt to convert the IP address to a long integer via MySQL's INET_ATON() function. An invalid address will return NULL.
This method is likely to be more efficient than the regular expression.
You may embed it in a WHERE condition like: WHERE INET_ATON(thecolumn) IS NOT NULL
SELECT INET_ATON('127.0.0.1');
+------------------------+
| INET_ATON('127.0.0.1') |
+------------------------+
| 2130706433 |
+------------------------+
SELECT INET_ATON('notes');
+--------------------+
| INET_ATON('notes') |
+--------------------+
| NULL |
+--------------------+
SELECT INET_ATON('56.99.9999.44');
+----------------------------+
| INET_ATON('56.99.9999.44') |
+----------------------------+
| NULL |
+----------------------------+
IS_IPV4() is a native MySQL function that lets you check whether a value is a valid IPv4 address.
SELECT *
FROM ip_containing_table
WHERE IS_IPV4(ip_containing_column);
I don't have data, but I reckon that this must be the most solid and efficient way to do this.
There are also similar native functions that check for IP Version 6 etc.
This may not be the most efficient way, and it's not technically regex, but it should work:
SELECT col1 FROM t1 WHERE col1 LIKE '%.%.%.%';
you could also use the useful function inet_aton()
SELECT *
FROM yourtable
WHERE inet_aton(thecolumn) is not null
Lengthy but works fine:
mysql> SELECT '1.0.0.127' regexp '^([0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\\.([0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\\.([0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\\.([0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])$';

MySql: using @variable in select statement takes hundreds of times longer

I'm trying to understand a huge performance difference that I'm seeing in equivalent code. Or at least code I think is equivalent.
I have a table with about 10 million records in it. It contains an indexed field, defined as:
USPatentNum char(8)
If I set a variable within MySQL to a value, the query takes over 218 seconds. The exact same query with a string literal takes under 1/4 of a second.
In the code below, the first select statement (with where USPatentNum = @pn;) takes forever, but the second, with the literal value
(where USPatentNum = '5288812';), is nearly instant:
mysql> select @pn := '5288812';
+------------------+
| @pn := '5288812' |
+------------------+
| 5288812 |
+------------------+
1 row in set (0.00 sec)
mysql> select patentId, USPatentNum, grantDate from patents where USPatentNum = @pn;
+----------+-------------+------------+
| patentId | USPatentNum | grantDate |
+----------+-------------+------------+
| 306309 | 5288812 | 1994-02-22 |
+----------+-------------+------------+
1 row in set (3 min 38.17 sec)
mysql> select @pn;
+---------+
| @pn |
+---------+
| 5288812 |
+---------+
1 row in set (0.00 sec)
mysql> select patentId, USPatentNum, grantDate from patents where USPatentNum = '5288812';
+----------+-------------+------------+
| patentId | USPatentNum | grantDate |
+----------+-------------+------------+
| 306309 | 5288812 | 1994-02-22 |
+----------+-------------+------------+
1 row in set (0.21 sec)
Two questions:
Why is the use of @pn so much slower?
Can I change the select statement so that the performance will be the same?
Declare @pn as char(8) before setting its value.
I suspect it is treated as a varchar the way you set it now. If so, the performance loss occurs because MySQL can't match the index with your variable.
It doesn't matter whether you use a constant or @var. You get different timings because the second time MySQL returns the result from the cache. If you run your scenario again, using another value, but swap the order of the constant query and the @var query, you will see the same pattern: the first will be slow and the second will be fast.
Hope it helps

In MySQL, should I quote numbers or not?

For example - I create database and a table from cli and insert some data:
CREATE DATABASE testdb CHARACTER SET 'utf8' COLLATE 'utf8_general_ci';
USE testdb;
CREATE TABLE test (id INT, str VARCHAR(100)) ENGINE=InnoDB CHARACTER SET 'utf8' COLLATE 'utf8_general_ci';
INSERT INTO test VALUES (9, 'some string');
Now I can do this and these examples do work (so - quotes don't affect anything it seems):
SELECT * FROM test WHERE id = '9';
INSERT INTO test VALUES ('11', 'some string');
So - in these examples I've selected a row by a string that is actually stored as INT in MySQL, and then I inserted a string into a column that is INT.
I don't quite get why this works the way it works here. Why is string allowed to be inserted in an INT column?
Can I insert all MySQL data types as strings?
Is this behavior standard across different RDBMS?
MySQL is a lot like PHP, and will auto-convert data types as best it can. Since you're working with an int field (left-hand side), it'll try to transparently convert the right-hand-side of the argument into an int as well, so '9' just becomes 9.
Strictly speaking, the quotes are unnecessary, and force MySQL to do a typecasting/conversion, so it wastes a bit of CPU time. In practice, unless you're running a Google-sized operation, such conversion overhead is going to be microscopically small.
You should never put quotes around numbers. There is a valid reason for this.
The real issue comes down to type casting. When you put numbers inside quotes, it is treated as a string and MySQL must convert it to a number before it can execute the query. While this may take a small amount of time, the real problems start to occur when MySQL doesn't do a good job of converting your string. For example, MySQL will convert basic strings like '123' to the integer 123, but will convert some larger numbers, like '18015376320243459', to floating point. Since floating point can be rounded, your queries may return inconsistent results. Learn more about type casting here. Depending on your server hardware and software, these results will vary. MySQL explains this.
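The rounding effect mentioned for '18015376320243459' can be seen with any IEEE 754 double. The sketch below uses Python's float, which is a double, purely as an illustration; it assumes the comparison on the server side goes through double precision, as the answer states, and does not reproduce MySQL itself.

```python
a = "18015376320243459"
b = "18015376320243460"

# Compared as exact integers, the two values are different.
print(int(a) == int(b))

# Compared as doubles, they collapse to the same value, because
# integers above 2**53 cannot all be represented exactly.
print(float(a) == float(b))

# The double nearest to the first value is the second value.
print(int(float(a)))
```

Two distinct identifiers that become equal once converted are what lead to inconsistent query results.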
If you are worried about SQL injections, always check the value first and use PHP to strip out any non numbers. You can use preg_replace for this: preg_replace("/[^0-9]/", "", $string)
In addition, if you write your SQL queries with quotes they will not work on databases like PostgreSQL or Oracle.
Check this, you can understand better ...
mysql> EXPLAIN SELECT COUNT(1) FROM test_no WHERE varchar_num=0000194701461220130201115347;
+----+-------------+------------------------+-------+-------------------+-------------------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------------+-------+-------------------+-------------------+---------+------+---------+--------------------------+
| 1 | SIMPLE | test_no | index | Uniq_idx_varchar_num | Uniq_idx_varchar_num | 63 | NULL | 3126240 | Using where; Using index |
+----+-------------+------------------------+-------+-------------------+-------------------+---------+------+---------+--------------------------+
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT COUNT(1) FROM test_no WHERE varchar_num='0000194701461220130201115347';
+----+-------------+------------------------+-------+-------------------+-------------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------------+-------+-------------------+-------------------+---------+-------+------+-------------+
| 1 | SIMPLE | test_no | const | Uniq_idx_varchar_num | Uniq_idx_varchar_num | 63 | const | 1 | Using index |
+----+-------------+------------------------+-------+-------------------+-------------------+---------+-------+------+-------------+
1 row in set (0.00 sec)
mysql>
mysql>
mysql> SELECT COUNT(1) FROM test_no WHERE varchar_num=0000194701461220130201115347;
+----------+
| COUNT(1) |
+----------+
| 1 |
+----------+
1 row in set, 1 warning (7.94 sec)
mysql> SELECT COUNT(1) FROM test_no WHERE varchar_num='0000194701461220130201115347';
+----------+
| COUNT(1) |
+----------+
| 1 |
+----------+
1 row in set (0.00 sec)
AFAIK it is standard, but it is considered bad practice because
- using it in a WHERE clause will prevent the optimizer from using indices (explain plan should show that)
- the database has to do additional work to convert the string to a number
- if you're using this for floating-point numbers ('9.4'), you'll run into trouble if client and server use different language settings (9.4 vs 9,4)
In short: don't do it (but YMMV)
This is not standard behavior.
For MySQL 5.5, this is the default SQL mode:
mysql> select @@sql_mode;
+------------+
| @@sql_mode |
+------------+
| |
+------------+
1 row in set (0.00 sec)
ANSI and TRADITIONAL are used more rigorously by Oracle and PostgreSQL. The SQL Modes MySQL permits must be set IF AND ONLY IF you want to make the SQL more ANSI-compliant. Otherwise, you don't have to touch a thing. I've never done so.
It depends on the column type!
if you run
SELECT * FROM `users` WHERE `username` = 0;
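in MySQL/MariaDB you will get every record whose username does not start with a non-zero number, because each non-numeric string is converted to 0 for the comparison. In practice that is usually all the records where username IS NOT NULL.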
Always quote values if the column is of type string (char, varchar,...) otherwise you'll get unexpected results!
You don't need to quote the numbers but it is always a good habit if you do as it is consistent.
The issue is this: say we have a table called users with a column called current_balance of type FLOAT. If you run this query:
UPDATE `users` SET `current_balance`='231608.09' WHERE `user_id`=9;
The current_balance field will be updated to 231608, because MySQL rounds the value. Similarly, if you try this query:
UPDATE `users` SET `current_balance`='231608.55' WHERE `user_id`=9;
The current_balance field will be updated to 231609
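The two results above are consistent with single-precision storage. Here is a rough sketch of the arithmetic in Python. It assumes FLOAT is a 4-byte IEEE 754 single, and it assumes the value is shown with about six significant digits, which is my reading of the outputs above rather than something the answer states. The helper names to_float32 and six_digits are hypothetical.

```python
import struct


def to_float32(value):
    """Round-trip a number through 4-byte single precision."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def six_digits(value):
    """Format with six significant digits (assumed display precision)."""
    return "%.6g" % value


for text in ["231608.09", "231608.55"]:
    stored = to_float32(float(text))
    print(text, stored, six_digits(stored))
```

Under these assumptions the rounding comes from the FLOAT column rather than from the quotes, so an unquoted 231608.09 would be expected to behave the same way.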