MySQL - efficient regexp (or like) query

I have two tables, a performer table and a redirect table. The performer table has a column called slug. The redirect table has a column called source.
Both the source and slug columns have unique key indexes.
An example value in the slug column:
this-is-a-slug
An example value in the source column:
this-is-a-slug.s12345
I want an efficient query that returns all rows in redirect whose source starts with a slug, followed by the characters ".s" and then one or more digits.
I tried this:
select source from redirect
join performer on
source regexp concat('^', slug, '\\.s[0-9]+$');
It was extremely slow. So I decided to be less restrictive and tried this:
select source from redirect
join performer on
source like concat(slug, '.s%');
It was still slow.
Is there a way I can do this efficiently?

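Abandon the current plans.
Add a column to redirect that holds the slug. This is a one-time change to the table, plus a change to your code so that it fills in the column on insert. The join then becomes a plain equality, redirect.slug = performer.slug, which can use the unique index on performer.slug (and an index you add on redirect.slug).
If you are running MySQL 5.7+ or MariaDB, you can use a generated (virtual) column instead, possibly with an index on it.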
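A minimal sketch of the extra-column idea, run against SQLite (Python's built-in sqlite3) as a stand-in for MySQL, since the point does not depend on the engine. The sample rows and the index name idx_redirect_slug are made up; in MySQL the backfill could use SUBSTRING_INDEX(source, '.s', 1), or a generated column, instead of substr/instr.

```python
import sqlite3

# Sketch only: SQLite stands in for MySQL. Table/column names follow the
# question; the sample rows and the index name are made up.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE performer (slug TEXT UNIQUE)")
cur.execute("CREATE TABLE redirect (source TEXT UNIQUE, slug TEXT)")
cur.executemany("INSERT INTO performer (slug) VALUES (?)",
                [("this-is-a-slug",), ("another-slug",)])
cur.executemany("INSERT INTO redirect (source) VALUES (?)",
                [("this-is-a-slug.s12345",), ("another-slug.invalid",), ("no-performer.s7",)])

# One-time backfill of the new column: the text before the first '.s'.
# Rows without '.s' keep slug = NULL, so they never match the join below.
cur.execute("""
    UPDATE redirect
    SET slug = substr(source, 1, instr(source, '.s') - 1)
    WHERE instr(source, '.s') > 0
""")
cur.execute("CREATE INDEX idx_redirect_slug ON redirect (slug)")

# The join is now a plain equality between two indexed columns.
rows = cur.execute("""
    SELECT r.source
    FROM redirect AS r
    JOIN performer AS p ON p.slug = r.slug
    ORDER BY r.source
""").fetchall()
```

This does not check that the part after ".s" is all digits; if that matters, the code that fills in the column can check it before storing the slug.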
BTW, here's another way to split the string:
mysql> SELECT SUBSTRING_INDEX('this-is-a-slug.s12345', '.', 1);
+--------------------------------------------------+
| SUBSTRING_INDEX('this-is-a-slug.s12345', '.', 1) |
+--------------------------------------------------+
| this-is-a-slug |
+--------------------------------------------------+
If the 's' is critical, then study these:
mysql> SELECT SUBSTRING_INDEX('this-is-a-slug.s12345', '.s', 1);
+---------------------------------------------------+
| SUBSTRING_INDEX('this-is-a-slug.s12345', '.s', 1) |
+---------------------------------------------------+
| this-is-a-slug |
+---------------------------------------------------+
mysql> SELECT SUBSTRING_INDEX('this-is-a-slug.invalid', '.s', 1);
+----------------------------------------------------+
| SUBSTRING_INDEX('this-is-a-slug.invalid', '.s', 1) |
+----------------------------------------------------+
| this-is-a-slug.invalid |
+----------------------------------------------------+
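To make the behaviour of the last two examples explicit, here is a rough Python model of SUBSTRING_INDEX (my approximation, not MySQL's code; an empty delimiter is simply not modelled). It shows the pitfall: when the delimiter is missing, the whole string comes back, so comparing the result with the input tells you whether ".s" was there at all. slug_of is a hypothetical helper.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Rough model of MySQL's SUBSTRING_INDEX(str, delim, count).

    count > 0: everything left of the count-th delimiter, counting from the left.
    count < 0: everything right of the |count|-th delimiter, counting from the right.
    With fewer delimiters than |count|, the whole string is returned.
    Matching is case-sensitive, as in MySQL.
    """
    if delim == "":
        raise ValueError("empty delimiter is not modelled here")
    if count == 0:
        return ""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


def slug_of(source: str):
    """Hypothetical helper: the slug if '.s' is present, otherwise None."""
    head = substring_index(source, ".s", 1)
    # An unchanged result means the delimiter was not found.
    return head if head != source else None
```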

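Maybe
join performer on left(source, length(slug) + 2) = concat(slug, '.s')
But I suspect it performs the same, because the expression still has to be evaluated for every combination of rows, so no index can be used.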

Related

How to create a loop to replace values in mysql

I have two tables, something like this:
Table 1:
+---------------------+
| name_fr | name_en |
+---------------------+
| valfr1 | valen1 |
+---------------------+
Table 2:
+------------------------+
| id | value |
+------------------------+
| 1 | valfr1 is thiss |
+------------------------+
| 2 | something random |
+------------------------+
I want to loop over each row of table 1, take the values of its fields, and then, for each row in table 2, do a replacement in the value field. Given the example tables, the loop would do something like this:
update table2 set value = replace(value, 'valfr1', 'valen1');
This would change the row in table2 with id 1 so that its value becomes 'valen1 is thiss'.
But imagine table1 has, for example, 100 rows: how can I loop over all of them and apply each replacement?
Thanks for the help, and sorry if I haven't explained myself well.
Introduction
You can easily achieve this with an UPDATE statement. Don't worry: under the hood it still loops, but it is a loop that has been optimized for decades, so a hand-written loop is unlikely to perform as well, at least not without a great deal of effort. For this answer I will assume that an UPDATE is good enough for this purpose.
Reference: https://www.mysqltutorial.org/mysql-update-join/
The query
UPDATE table2
JOIN table1
ON table2.value LIKE CONCAT('%', table1.name_fr, '%')
SET table2.value = REPLACE(table2.value, table1.name_fr, table1.name_en);
Explanation
This query matches every record in table2 with the records in table1 whose name_fr occurs in table2.value. For each match, name_fr is replaced with name_en in table2.value, according to the mapping stored in table1.
Edge-case
If one name_fr value contains another, the longer one should be evaluated first, because the shorter one could otherwise make a premature replacement. For this purpose you could join against table1 ordered descendingly by LENGTH(name_fr), as a derived table with an alias.
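If you do want an explicit loop, here is a sketch in Python, with SQLite standing in for MySQL. It runs one UPDATE per table1 row, longest name_fr first, which handles the edge case above. This also sidesteps two documented limits of the single-statement version: the MySQL manual says that in a multiple-table UPDATE each matching row is updated once, even if it matches several times, and that ORDER BY and LIMIT cannot be used there. So a value containing two different name_fr words may need more than one pass with the JOIN query. The 'chat'/'chateau' rows are invented to show why the order matters.

```python
import sqlite3

# SQLite stands in for MySQL; table1/table2 follow the question, and the
# 'chat'/'chateau' rows are made up to show the ordering issue.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE table1 (name_fr TEXT, name_en TEXT)")
cur.execute("CREATE TABLE table2 (id INTEGER PRIMARY KEY, value TEXT)")
cur.executemany("INSERT INTO table1 VALUES (?, ?)",
                [("valfr1", "valen1"), ("chat", "cat"), ("chateau", "castle")])
cur.executemany("INSERT INTO table2 VALUES (?, ?)",
                [(1, "valfr1 is thiss"), (2, "something random"), (3, "le chateau")])

# Longest name_fr first, so 'chateau' is replaced before 'chat' can break it up.
mapping = cur.execute(
    "SELECT name_fr, name_en FROM table1 ORDER BY length(name_fr) DESC"
).fetchall()

# One UPDATE per mapping row; instr() > 0 limits it to rows containing the word.
for name_fr, name_en in mapping:
    cur.execute(
        "UPDATE table2 SET value = replace(value, ?, ?) WHERE instr(value, ?) > 0",
        (name_fr, name_en, name_fr),
    )
conn.commit()

result = [value for (value,) in cur.execute("SELECT value FROM table2 ORDER BY id")]
```

With a MySQL driver the loop looks the same; only the connection setup and the placeholder style change.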

Process TEXT BLOBs fields in MySQL line by line

I have a MEDIUMTEXT blob in a table, which contains paths separated by newline characters. I'd like to add a "/" to the beginning of each line if it is not already there. Is there a way to write a query that does this with built-in functions?
I suppose an alternative would be to write a Python script that fetches the field, splits it into a list, processes each line, and updates the record. There aren't that many records in the DB (about 8K+ rows), so I can accept the processing delay, as long as it doesn't lock the entire DB or table.
Either way would be fine. If the second option is recommended, do I need to know about specific locking semantics before getting into this? This would be run on a live production DB (of course, I'd take a DB snapshot first), so in-place updates would be best to avoid downtime.
Demo:
mysql> create table mytable (id int primary key, t text );
mysql> insert into mytable values (1, 'path1\npath2\npath3');
mysql> select * from mytable;
+----+-------------------+
| id | t |
+----+-------------------+
| 1 | path1
path2
path3 |
+----+-------------------+
1 row in set (0.00 sec)
mysql> update mytable set t = concat('/', replace(t, '\n', '\n/'));
mysql> select * from mytable;
+----+----------------------+
| id | t |
+----+----------------------+
| 1 | /path1
/path2
/path3 |
+----+----------------------+
However, I would strongly recommend storing each path on its own row, so you don't have to think about this. In SQL, each column should store one value per row, not a set of values.
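The UPDATE in the demo adds "/" to every line, including lines that already start with one, while the question asks to add it only where it is missing. One way to do that with plain string functions: prefix every line, then collapse the resulting "//" at line starts back to "/". The sketch runs the idea in SQLite through Python's sqlite3; the MySQL spelling in the comment is my own translation and is untested here. Lines ending in "\r\n" are not handled.

```python
import sqlite3

# SQLite stands in for MySQL: char(10) is the newline and || concatenates.
# Possible MySQL equivalent (untested here):
#   UPDATE mytable
#   SET t = SUBSTRING(REPLACE(CONCAT('\n/', REPLACE(t, '\n', '\n/')), '\n//', '\n/'), 2);
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, t TEXT)")
cur.executemany("INSERT INTO mytable VALUES (?, ?)",
                [(1, "path1\npath2\npath3"), (2, "/already\nmissing\n/also-there")])

cur.execute("""
    UPDATE mytable
    SET t = substr(
        replace(
            char(10) || '/' || replace(t, char(10), char(10) || '/'),  -- add '/' to every line
            char(10) || '//',                                           -- where that produced '//'
            char(10) || '/'                                             -- collapse it back to '/'
        ),
        2                                                               -- drop the helper newline
    )
""")
result = dict(cur.execute("SELECT id, t FROM mytable"))
```

A line that originally started with "//" keeps both slashes, since only the one slash added by the first step is removed.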

Sorting order behaviour between Postgres and Mysql

I have run into some strange differences in sort order between Postgres and MySQL.
For example, I created a simple table with a varchar column and inserted two records, as below, in both Postgres and MySQL.
create table mytable(name varchar(100));
insert into mytable values ('aaaa'), ('aa_a');
Then I ran a simple select query ordered by that column.
Postgres sort order:
test=# select * from mytable order by (name) asc;
name
------
aa_a
aaaa
(2 rows)
Mysql sort order:
mysql> select * from mytable order by name asc;
+------+
| name |
+------+
| aaaa |
| aa_a |
+------+
2 rows in set (0.00 sec)
Postgres and MySQL both return the same records, but in a different order.
My question is: which one is correct?
How can I get the results in the same order in both databases?
Edit:
I tried a query with ORDER BY ... COLLATE, and it solved my problem.
I tried it like this:
mysql> select * from t order by name COLLATE utf8_bin;
+------+
| name |
+------+
| aa_a |
| aaaa |
+------+
3 rows in set (0.00 sec)
Thanks.
There is no "correct" way to sort data.
You need to read up on "locales".
Different locales provide (among other things) different sort orders. You might have a database using ISO-8859-1 or UTF-8, either of which can represent several different languages, and the rules for sorting English differ from those for French or German.
PostgreSQL uses the underlying operating system's support for locales, and not all locales are available on all platforms. The alternative is to provide your own locale support, but then you can have incompatibilities within one machine.
I believe MySQL takes the second option, but I'm no expert on MySQL.
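A toy illustration of why the two results can both be "right": it only uses Python's own string ordering, not MySQL's or PostgreSQL's collation code. Plain code-point order is roughly what a binary collation such as utf8_bin gives. The key=str.upper line is an assumption about how a case-insensitive collation like utf8_general_ci weighs ASCII letters against "_", offered as a simplified model only.

```python
names = ["aaaa", "aa_a"]

# Code-point (binary) order: '_' is U+005F and 'a' is U+0061,
# so 'aa_a' sorts first. This matches the COLLATE utf8_bin result.
binary_order = sorted(names)

# Simplified model of a case-insensitive collation that compares letters by
# their uppercase form: 'A' is U+0041, below '_', so 'aaaa' sorts first.
case_folded_order = sorted(names, key=str.upper)
```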

How can I use the LIKE operator on a list of strings to compare?

I have a query I need to run on almost 2000 strings, where it would be very helpful to be able to use a list as you can with the "IN" operator, but with the LIKE comparison.
For example, I want to check whether pet_name is like any of these (but not an exact match): barfy, max, whiskers, champ, big-D, Big D, Sally
With LIKE the match wouldn't be case-sensitive, but a name can also have an underscore instead of a dash, or a space. It will be a huge pain in the ass to write a large series of OR conditions. I am running this on MySQL 5.1.
In my particular case I am looking for file names, where the difference is usually a dash where there should be an underscore, or vice versa.
For this task I would suggest using MySQL's REGEXP capabilities, like this:
select * from EMP where name RLIKE 'jo|ith|der';
This match is case-insensitive (except for binary strings), and it saves you from writing multiple LIKE / OR conditions.
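For the dash/underscore/space case in the question, the alternation can be generated instead of typed by hand. A sketch, with separator_tolerant_pattern as a hypothetical helper: each name is split on "-", "_" and spaces, and the pieces are rejoined with the class [-_ ]?, so the resulting string can be passed to RLIKE. I only checked the matching with Python's re; the syntax used (bracket expressions, ?, |, ^, $) should behave the same in MySQL, and whether the match ignores case depends on the column's collation. Drop the ^ and $ anchors if you want substring matches, e.g. for file names.

```python
import re


def separator_tolerant_pattern(names):
    """Build one regular expression matching any of `names`, treating
    '-', '_', ' ' (or no separator at all) as interchangeable."""
    alternatives = []
    for name in names:
        # Assumption: names contain only letters, digits, '-', '_' and spaces,
        # so nothing else needs escaping.
        if not re.fullmatch(r"[A-Za-z0-9 _-]+", name):
            raise ValueError(f"unsupported characters in {name!r}")
        chunks = [c for c in re.split(r"[-_ ]+", name) if c]
        alternatives.append("[-_ ]?".join(chunks))
    return "^(" + "|".join(alternatives) + ")$"


pattern = separator_tolerant_pattern(
    ["barfy", "max", "whiskers", "champ", "big-D", "Big D", "Sally"])
```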
You could do something like this -
SELECT FIND_IN_SET(
'bigD',
REPLACE(REPLACE('barfy,max,whiskers,champ,big-D,Big D,Sally', '-', ''), ' ', '')
) has_petname;
+-------------+
| has_petname |
+-------------+
| 5 |
+-------------+
It returns a non-zero value (the position in the list) if the pet name we are looking for is present, and 0 otherwise.
But I'd suggest creating a petnames table and using the SOUNDS LIKE operator to compare names; in that case 'bigD' is equal to 'big-D', e.g.:
SELECT 'bigD' SOUNDS LIKE 'big-D';
+---------------------------+
| 'bigD'SOUNDS LIKE 'big-D' |
+---------------------------+
| 1 |
+---------------------------+
Example:
CREATE TABLE petnames(name VARCHAR(40));
INSERT INTO petnames VALUES
('barfy'),('max'),('whiskers'),('champ'),('big-D'),('Big D'),('Sally');
SELECT name FROM petnames WHERE 'bigD' SOUNDS LIKE name;
+-------+
| name |
+-------+
| big-D |
| Big D |
+-------+
As a first step, put all the static values into a table (a temporary table will do); this will be your lookup dictionary.
SELECT * FROM YourTable t
WHERE EXISTS (
SELECT *
FROM LookupTable l
WHERE t.PetName LIKE CONCAT('%', l.Value, '%')
)
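A runnable sketch of this lookup-table approach, using SQLite through Python's sqlite3 as a stand-in for MySQL. Both sides are normalized by stripping "-", "_" and spaces and lowercasing, so 'Big_D' matches the lookup value 'big-D'. The table names pets and lookup and the norm helper are made up for the example. In MySQL you need CONCAT('%', x, '%'): + is numeric addition there, and || means logical OR unless the PIPES_AS_CONCAT SQL mode is enabled.

```python
import sqlite3

# SQLite stands in for MySQL; '%' || x || '%' would be CONCAT('%', x, '%') there.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE pets (pet_name TEXT)")
cur.execute("CREATE TABLE lookup (value TEXT)")  # the "lookup dictionary"
cur.executemany("INSERT INTO pets VALUES (?)",
                [("Big_D",), ("bigd",), ("Rex",), ("whiskers-2",)])
cur.executemany("INSERT INTO lookup VALUES (?)",
                [("barfy",), ("max",), ("whiskers",), ("big-D",), ("Sally",)])


def norm(expr):
    """Hypothetical helper: SQL that lowercases and strips '-', '_' and spaces."""
    return f"lower(replace(replace(replace({expr}, '-', ''), '_', ''), ' ', ''))"


query = f"""
    SELECT p.pet_name FROM pets AS p
    WHERE EXISTS (
        SELECT 1 FROM lookup AS l
        WHERE {norm('p.pet_name')} LIKE '%' || {norm('l.value')} || '%'
    )
    ORDER BY p.pet_name
"""
matches = [name for (name,) in cur.execute(query)]
```

Because the normalized expressions are computed on the fly, this scans both tables; if it is too slow, store the normalized name in its own indexed column.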
Configure the column containing those 2000 values for full-text searching. Then you can use MySQL's full-text search feature (in MySQL 5.1, FULLTEXT indexes are only supported on MyISAM tables). Refer to the MySQL documentation for details.
You could use REGEXP instead. It worked like a charm for me:
pet_name regexp 'barfy|max|whiskers|champ|you name it'

MySQL search comma separated value syntax

I am using MySQL. In one of my tables, a column stores a serial number description such as "SM,ST,SK" for one device.
When a user enters SM, ST, or SK, I want my query to return that row.
My current query looks like this:
SELECT CONCAT(lvl1_id,',',lvl2_id)
FROM hier_menus
LEFT JOIN labels ON (hier_menus.id=label_id AND tbl=65 AND fld=2 AND lang_id=5)
WHERE
hm_type=13 AND lvl1_id=141 AND lvl2_id=id AND label='".addslashes($serial)."'";
It only seems to look at the first comma-separated part of the serial number column: when a user enters ST, it returns nothing.
Is it possible to search the whole of the long string "SM,ST,SK" to return a matching row?
mysql> select find_in_set('SK', 'SM,ST,SK');
+-------------------------------+
| find_in_set('SK', 'SM,ST,SK') |
+-------------------------------+
| 3 |
+-------------------------------+
1 row in set (0.00 sec)
mysql> select find_in_set('SP', 'SM,ST,SK');
+-------------------------------+
| find_in_set('SP', 'SM,ST,SK') |
+-------------------------------+
| 0 |
+-------------------------------+
You are looking for find_in_set;
however, this is not an optimal solution, because it cannot use an index.
You should seek to normalize your serial numbers into another table,
where each of SM, ST, and SK is stored as its own row.
Another way is to convert the column's data type to SET.
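A sketch of the normalized layout suggested above: one row per (device, code) pair, so a lookup becomes an equality test on an indexed column. The table and column names (device, device_serial, code) are hypothetical, and SQLite via Python's sqlite3 stands in for MySQL.

```python
import sqlite3

# Hypothetical normalized layout: one row per (device, serial code) instead of
# a comma-separated "SM,ST,SK" string. SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE device (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""
    CREATE TABLE device_serial (
        device_id INTEGER REFERENCES device(id),
        code TEXT,
        PRIMARY KEY (device_id, code)
    )
""")
cur.execute("CREATE INDEX idx_device_serial_code ON device_serial (code)")
cur.execute("INSERT INTO device VALUES (1, 'device A')")

# Split the old comma-separated value once, while migrating.
old_value = "SM,ST,SK"
cur.executemany("INSERT INTO device_serial VALUES (?, ?)",
                [(1, code) for code in old_value.split(",")])


def devices_for(code):
    """Plain equality on an indexed column, with the user input bound as a parameter."""
    sql = """SELECT d.name FROM device AS d
             JOIN device_serial AS s ON s.device_id = d.id
             WHERE s.code = ?"""
    return [name for (name,) in cur.execute(sql, (code,))]
```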
Try FIND_IN_SET():
SELECT ... WHERE FIND_IN_SET('$serial', label)
And, as ajreal pointed out, don't use addslashes. Use mysql_real_escape_string (or whatever your DB abstraction library provides). addslashes is hopelessly broken and WILL allow someone to attack your database with ease.
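Most DB libraries also support bound parameters (placeholders), which avoid manual escaping altogether; in PHP, both PDO and mysqli offer prepared statements. A sketch of the idea in Python with sqlite3: since SQLite has no FIND_IN_SET, a rough Python model of it is registered with create_function (case-sensitive, unlike MySQL's default collations). The labels table is simplified from the question.

```python
import sqlite3


def find_in_set(needle, haystack):
    """Rough model of MySQL's FIND_IN_SET(str, strlist): 1-based position,
    0 if absent or if strlist is empty, None (NULL) if either argument is NULL.
    Case-sensitive here; MySQL follows the column's collation."""
    if needle is None or haystack is None:
        return None
    if haystack == "":
        return 0
    parts = haystack.split(",")
    return parts.index(needle) + 1 if needle in parts else 0


conn = sqlite3.connect(":memory:")
conn.create_function("find_in_set", 2, find_in_set)  # SQLite has no FIND_IN_SET of its own
cur = conn.cursor()
cur.execute("CREATE TABLE labels (label_id INTEGER PRIMARY KEY, label TEXT)")
cur.executemany("INSERT INTO labels VALUES (?, ?)", [(1, "SM,ST,SK"), (2, "XA,XB")])


def matching_labels(serial):
    # The user's input is bound with a placeholder instead of being pasted
    # into the SQL text, so no addslashes-style escaping is needed.
    return [row[0] for row in cur.execute(
        "SELECT label_id FROM labels WHERE find_in_set(?, label) > 0", (serial,))]
```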