MySQL LIKE '%search%' finds no match on concatenated columns

I create this table:
create table if not exists `example`(
`firstNames` varchar(45) not null,
`secondNames` varchar(45) not null)
ENGINE = InnoDB;
Now I insert one row:
insert into example values('Jose Alonzo', 'Pena Palma');
And check that it is correct:
select * from example;
| firstNames | secondNames |
----------------------------
| Jose Alonzo| Pena Palma |
It's OK!
Easy.
Now I write a statement to search for this row:
set @search = 'jose alonzo pena';
select * from example
where concat(firstNames, ' ', secondNames) like concat('%',@search,'%');
This returns:
| firstNames | secondNames |
----------------------------
| Jose Alonzo| Pena Palma |
Now I change the value of @search to 'jose pena':
set @search = 'jose pena';
select * from example
where concat(firstNames, ' ', secondNames) like concat('%',@search,'%');
And it returns nothing!
| firstNames | secondNames |
What is happening?
Can't I use LIKE for characters that are in the middle of the varchar?

You can, but LIKE treats only % and _ as wildcards; every other character, including a space, matches itself literally. In other words, a space character matches a space character, not an arbitrary string of characters. If you turn the spaces in the search term into % wildcards, the following would match:
where concat(firstNames, ' ', secondNames) like concat('%', replace(@search, ' ', '%'), '%')
The order of the words matters: 'jose pena' matches 'Jose Alonzo Pena Palma', but 'pena jose' would not, because the pattern is tested against concat(firstNames, ' ', secondNames), not concat(secondNames, ' ', firstNames).
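A small sketch can confirm this behaviour. It uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption: SQLite concatenates with || instead of CONCAT, but its LIKE also treats only % and _ as wildcards and ignores ASCII case, as MySQL's default case-insensitive collations do. The search helper and its wildcard_spaces flag are hypothetical names for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption: only the
# LIKE semantics matter here; SQLite uses || where MySQL uses CONCAT).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (firstNames TEXT NOT NULL, secondNames TEXT NOT NULL)")
conn.execute("INSERT INTO example VALUES ('Jose Alonzo', 'Pena Palma')")


def search(term: str, wildcard_spaces: bool) -> list:
    """Return rows whose 'firstNames secondNames' contains the search term.

    With wildcard_spaces=True each space in the term becomes '%', like
    replace(@search, ' ', '%') in the answer above."""
    pattern = "replace(?, ' ', '%')" if wildcard_spaces else "?"
    sql = (
        "SELECT firstNames, secondNames FROM example "
        "WHERE firstNames || ' ' || secondNames LIKE '%' || " + pattern + " || '%'"
    )
    return conn.execute(sql, (term,)).fetchall()


print(search("jose alonzo pena", False))  # literal spaces, contiguous text: match
print(search("jose pena", False))         # literal space: no match
print(search("jose pena", True))          # spaces as wildcards: match
print(search("pena jose", True))          # words out of order: no match
```

One caveat of the replace() trick: any % or _ typed by the user also acts as a wildcard.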
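If you are interested in these kinds of searches, you should investigate full-text indexes (FULLTEXT with MATCH ... AGAINST). In addition to being more powerful, they are also faster, because a LIKE pattern that starts with % cannot use an ordinary B-tree index.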

Related

How to replace all the digits before hyphen with a new digit using MySQL? [duplicate]

I have a table called myTable which has a column called col1. This column contains data in this format: (1 or 2 digits)(hyphen)(8 digits).
I want to update every value in this column, replacing everything before the hyphen with 4. For example:
--------------------------------
| Old value    | New value     |
--------------------------------
| 1-654283568  | 4-654283568   |
| 2-467862833  | 4-467862833   |
| 8-478934293  | 4-478934293   |
| 12-573789475 | 4-573789475   |
| 16-574738575 | 4-574738575   |
--------------------------------
I am using MySQL 5.7.19, and I believe REGEXP_REPLACE is only available in MySQL 8+. How can this be achieved?
You don't need regex; you can use SUBSTRING_INDEX to extract everything after the hyphen and concatenate 4- to that:
UPDATE myTable
SET col1 = CONCAT('4-', SUBSTRING_INDEX(col1, '-', -1))
Demo on dbfiddle
This works regardless of the number of characters after the hyphen, as long as each value contains a single hyphen.
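To see what SUBSTRING_INDEX(col1, '-', -1) returns for the sample values, here is a rough Python model of the function (a sketch of its documented behaviour for a non-empty delimiter, not MySQL itself); substring_index is a hypothetical helper name. It also shows why a fixed-width RIGHT(col1, 8) is fragile for these 9-digit samples.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Python model of MySQL SUBSTRING_INDEX(str, delim, count).

    Positive count: everything left of the count-th delimiter from the left.
    Negative count: everything right of the |count|-th delimiter from the right.
    Assumes a non-empty delimiter."""
    if count == 0:
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count] if count > 0 else parts[count:])


old_values = ["1-654283568", "2-467862833", "8-478934293", "12-573789475", "16-574738575"]

# CONCAT('4-', SUBSTRING_INDEX(col1, '-', -1))
new_values = ["4-" + substring_index(v, "-", -1) for v in old_values]
print(new_values)

# CONCAT('4-', RIGHT(col1, 8)), with RIGHT modelled as a slice, loses a digit
# because the samples have 9 digits after the hyphen.
print("4-" + old_values[0][-8:])  # 4-54283568
```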
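Given your pattern (a fixed number of digits after the hyphen), you can avoid regular expressions altogether. With the 8 digits you describe:
update myTable
set col1 = concat('4-', right(col1,8))
or
update myTable
set col1 = concat('4', right(col1,9))
Note that the sample values above actually have 9 digits after the hyphen; for those, use right(col1,9) and right(col1,10) respectively.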
Try this:
UPDATE testing SET val=REPLACE(val,SUBSTRING(val,1,LOCATE('-',val)),'4-');
Fiddle here: https://www.db-fiddle.com/f/4mU5ctLh8NB9iKSKZF9Ue2/2
This uses LOCATE to find the position of '-', then SUBSTRING to take everything up to and including the '-', which REPLACE then swaps for '4-'.
SELECT CONCAT( @new_prefix, SUBSTRING(old_value FROM LOCATE('-', old_value)) ) AS new_value
UPDATE sourcetable
SET fieldname = CONCAT( '4', SUBSTRING(fieldname FROM LOCATE('-', fieldname)) )
WHERE LOCATE('-', fieldname)
/* AND other conditions */
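The LOCATE/SUBSTRING approach can be modelled the same way. The Python helpers below (locate, substring_from and reprefix are hypothetical names) mimic MySQL's 1-based positions and the WHERE LOCATE('-', fieldname) guard, so values without a hyphen are left alone. This is a sketch of the logic, not MySQL itself.

```python
def locate(substr: str, s: str) -> int:
    """Model of MySQL LOCATE(substr, str): 1-based position, 0 if not found.
    Case-sensitive here; MySQL follows the collation, which is irrelevant for '-'."""
    return s.find(substr) + 1


def substring_from(s: str, pos: int) -> str:
    """Model of MySQL SUBSTRING(str FROM pos) for pos >= 1."""
    return s[pos - 1:]


def reprefix(value: str, new_prefix: str = "4") -> str:
    """CONCAT(new_prefix, SUBSTRING(value FROM LOCATE('-', value))),
    applied only when the WHERE LOCATE('-', value) guard is true."""
    pos = locate("-", value)
    if pos == 0:  # guard is false: the row is not updated
        return value
    return new_prefix + substring_from(value, pos)


print(reprefix("12-573789475"))  # 4-573789475
print(reprefix("654283568"))     # no hyphen: unchanged
```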

MySQL Select and Remove JSON Characters from a Column [duplicate]

I'm trying to replace a bunch of characters in a MySQL field. I know the REPLACE function but that only replaces one string at a time. I can't see any appropriate functions in the manual.
Can I replace or delete multiple strings at once? For example I need to replace spaces with dashes and remove other punctuation.
You can chain REPLACE functions:
select replace(replace('hello world','world','earth'),'hello','hi')
This will print hi earth.
You can even use subqueries to replace multiple strings!
select replace(london_english,'hello','hi') as warwickshire_english
from (
select replace('hello world','world','earth') as london_english
) sub
Or use a JOIN to replace them:
select group_concat(newword separator ' ')
from (
select 'hello' as oldword
union all
select 'world'
) orig
inner join (
select 'hello' as oldword, 'hi' as newword
union all
select 'world', 'earth'
) trans on orig.oldword = trans.oldword
I'll leave translation using common table expressions as an exercise for the reader ;)
Cascading (nesting) REPLACE calls is the simplest and most straightforward way to replace multiple characters in MySQL:
UPDATE table1
SET column1 = replace(replace(REPLACE(column1, '\r\n', ''), '<br />',''), '<\r>','')
REPLACE does a good simple job of replacing characters or phrases everywhere they appear in a string. But when cleansing punctuation you may need to look for patterns - e.g. a sequence of whitespace or characters in the middle of a word or after a full stop. If that's the case, a regular expression replace function would be much more powerful.
UPDATE: If using MySQL version 8+, a REGEXP_REPLACE function is provided and can be invoked as follows:
SELECT txt,
REGEXP_REPLACE(REPLACE(txt, ' ', '-'),
'[^a-zA-Z0-9-]+',
'') AS `reg_replaced`
FROM test;
See this DB Fiddle online demo.
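For readers without a MySQL 8 server at hand, the effect of that pattern can be approximated with Python's re module. This is a sketch under an assumption: MySQL 8 uses the ICU regex engine, but for this plain ASCII character class the two should behave the same. slugify_like_answer is a hypothetical name.

```python
import re


def slugify_like_answer(txt: str) -> str:
    """Approximates REGEXP_REPLACE(REPLACE(txt, ' ', '-'), '[^a-zA-Z0-9-]+', '')."""
    dashed = txt.replace(" ", "-")                # REPLACE(txt, ' ', '-')
    return re.sub(r"[^a-zA-Z0-9-]+", "", dashed)  # drop anything else


print(slugify_like_answer("hello world, it's me!"))  # hello-world-its-me
print(slugify_like_answer("a  b"))                   # a--b
```

Consecutive spaces become consecutive dashes, because the REPLACE step runs before the regex and '-' is kept by the character class.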
PREVIOUS ANSWER - only read on if using a version of MySQL before version 8:
The bad news is that MySQL versions before 8 don't provide a regular expression replace function, but the good news is that a workaround is possible - see this blog post.
Can I replace or delete multiple strings at once? For example I need to replace spaces with dashes and remove other punctuation.
The above can be achieved with a combination of the regular expression replacer and the standard REPLACE function. It can be seen in action in this online Rextester demo.
SQL (excluding the function code for brevity):
SELECT txt,
reg_replace(REPLACE(txt, ' ', '-'),
'[^a-zA-Z0-9-]+',
'',
TRUE,
0,
0
) AS `reg_replaced`
FROM test;
CREATE FUNCTION IF NOT EXISTS num_as_word (name TEXT) RETURNS TEXT RETURN
(
SELECT
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(IFNULL(name, ''),
'1', 'one'),
'2', 'two'),
'3', 'three'),
'4', 'four'),
'5', 'five'),
'6', 'six'),
'7', 'seven'),
'8', 'eight'),
'9', 'nine')
);
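Nested REPLACE calls act as a left fold: the innermost call runs first and each outer call works on its result. A Python sketch of num_as_word (DIGIT_WORDS is a hypothetical name; this models the SQL, it is not the stored function itself) makes that order explicit.

```python
from functools import reduce
from typing import Optional

# Innermost REPLACE first: '1' -> 'one', then '2' -> 'two', ..., '9' -> 'nine'.
DIGIT_WORDS = [("1", "one"), ("2", "two"), ("3", "three"), ("4", "four"),
               ("5", "five"), ("6", "six"), ("7", "seven"), ("8", "eight"),
               ("9", "nine")]


def num_as_word(name: Optional[str]) -> str:
    """Python model of the nested REPLACE(...) calls in num_as_word."""
    start = "" if name is None else name  # IFNULL(name, '')
    return reduce(lambda acc, pair: acc.replace(pair[0], pair[1]), DIGIT_WORDS, start)


print(num_as_word("room 42"))  # room fourtwo
print(num_as_word("10"))       # one0
```

'0' is not in the list, so it passes through unchanged.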
I've been using lib_mysqludf_preg for this, which lets you use PCRE regular expressions directly in MySQL.
With this library installed you could do something like this:
SELECT preg_replace('/(\\.|com|www)/','','www.example.com');
Which would give you:
example
In PHP, you can generate the nested REPLACE expression in a loop:
$dataToReplace = [1 => 'one', 2 => 'two', 3 => 'three'];
$sqlReplace = '';
foreach ($dataToReplace as $key => $val) {
$sqlReplace = 'REPLACE(' . ($sqlReplace ? $sqlReplace : 'replace_field') . ', "' . $key . '", "' . $val . '")';
}
echo $sqlReplace;
Result:
REPLACE(REPLACE(REPLACE(replace_field, "1", "one"), "2", "two"), "3", "three")
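MySQL evaluates the assignments of a single-table UPDATE from left to right, so each step can build on the previous value of the same column:
UPDATE schools SET
slug = lower(name),
slug = REPLACE(slug, '|', ' '),
slug = replace(slug, '.', ' '),
slug = replace(slug, '"', ' '),
slug = replace(slug, '#', ' '),
slug = replace(slug, ',', ' '),
slug = replace(slug, '\'', ''),
slug = trim(slug),
slug = replace(slug, ' ', '-'),
slug = replace(slug, '--', '-');
REPLACE does not rescan its own output, so a run of three or more dashes still leaves '--' behind; running the last replacement again cleans that up:
UPDATE schools SET
slug = replace(slug, '--', '-');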
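A Python sketch of the slug pipeline shows why the extra UPDATE can be needed. slug_pass and collapse_dashes are hypothetical helpers; TRIM is modelled as stripping spaces only, which is MySQL's default, and Python's str.replace, like MySQL's REPLACE, does not rescan its own output.

```python
def slug_pass(name: str) -> str:
    """Model of the first UPDATE, applying the SET assignments left to right."""
    slug = name.lower()
    for ch in ("|", ".", '"', "#", ","):
        slug = slug.replace(ch, " ")
    slug = slug.replace("'", "")
    slug = slug.strip(" ")          # TRIM() removes spaces only by default
    slug = slug.replace(" ", "-")
    return slug.replace("--", "-")


def collapse_dashes(slug: str) -> str:
    """Repeat REPLACE(slug, '--', '-') until no double dash is left."""
    while "--" in slug:
        slug = slug.replace("--", "-")
    return slug


first = slug_pass("Hello | World")
print(first)                       # hello--world
print(first.replace("--", "-"))    # hello-world (the second UPDATE)
print(collapse_dashes("a-----b"))  # a-b
```

Two passes are not always enough for long runs; in MySQL 8+, REGEXP_REPLACE(slug, '-+', '-') collapses any run of dashes in one call.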
If you are using MySQL 8+, the built-in REGEXP_REPLACE function may serve you better.
| String | Replace | Output |
|--------|---------|--------|
| w"w\'w. ex%a&m:p l–e.c)o(m | "'%&:)(– | www.example.com |
MySQL Query:
SELECT REGEXP_REPLACE('w"w\'w. ex%a&m:p l–e.c)o(m', '[("\'%[:blank:]&:–)]', '');
A pattern that covers almost all troublesome characters:
SELECT REGEXP_REPLACE(column, '[\("\'%[[:blank:]]&:–,#$#!;\\[\\]\)<>\?\*\^]+','')
Real-life scenario.
I had to update all the file names containing special characters that were saved in the 'demo' table.
SELECT * FROM demo;
| uri |
|------------------------------------------------------------------------------|
| private://webform/applicant_details/152/offers upload winners .png |
| private://webform/applicant_details/153/student : class & teacher data.pdf |
| private://webform/applicant_details/153/tax---user's---data__upload.pdf |
| private://webform/applicant_details/154/Applicant Details _ report_0_2.pdf |
| private://webform/applicant_details/154/india&asia%population huge.pdf |
Test case:
The table has multiple rows with special characters in the file name.
Goal:
Remove all special characters from the file name, keeping only a-z, A-Z, 0-9, dot and underscore, and convert the file name to lower case.
The expected result is:
| uri |
|------------------------------------------------------------------------------|
| private://webform/applicant_details/152/offers_upload_winners_.png |
| private://webform/applicant_details/153/student_class_teacher_data.pdf |
| private://webform/applicant_details/153/tax_user_s_data_upload.pdf |
| private://webform/applicant_details/154/applicant_details_report_0_2.pdf |
| private://webform/applicant_details/154/india_asia_population_huge.pdf |
Okay, let's plan step by step:
1st - find the file name
2nd - run all the find-and-replace operations on the file name part only
3rd - replace the old file name with the new one
How can we do this?
Let's break the whole action down into chunks for a better understanding.
The function below extracts only the file name from the full path, e.g. "Applicant Details _ report_0_2.pdf"
SELECT -- MySQL SELECT statement
SUBSTRING_INDEX -- MySQL built-in function
( -- Function start Parentheses
uri, -- my table column
'/', -- delimiter
-1 -- negative count: return everything after the last '/' (counting from the right)
) -- Function end Parentheses
from -- MySQL FROM statement
demo; -- My table name
#1 Query result
| uri |
|------------------------------------|
| offers upload winners .png |
| student : class & teacher data.pdf |
| tax---user's---data__upload.pdf |
| Applicant Details _ report_0_2.pdf |
| india&asia%population huge.pdf |
Now we have to find and replace within the generated file name result.
SELECT
REGEXP_REPLACE( -- MySQL REGEXP_REPLACE built-in function (string, pattern, replace)
SUBSTRING_INDEX(uri, '/', -1), -- File name only
'[^a-zA-Z0-9_.]+', -- Find everything which is not a-z, A-Z, 0-9, . or _.
'_' -- Replace with _
) AS uri -- give the whole result an alias column name
from
demo;
#2 Query result
| uri |
|------------------------------------|
| offers_upload_winners_.png |
| student_class_teacher_data.pdf |
| tax_user_s_data__upload.pdf |
| Applicant_Details___report_0_2.pdf |
| india_asia_population_huge.pdf |
FYI - the trailing '+' in the pattern collapses a run of matching characters, such as ---- or multiple spaces, into a single _. Notice the result without '+' in the regex pattern below.
SELECT
REGEXP_REPLACE( -- MySQL REGEXP_REPLACE built-in function (string, pattern, replace)
SUBSTRING_INDEX(uri, '/', -1), -- File name only
'[^a-zA-Z0-9_.]', -- Find everything which is not a-z, A-Z, 0-9, . or _.
'_' -- Replace with _
) AS uri -- give the whole result an alias column name
from
demo;
#3 Query result
| uri |
|------------------------------------|
| offers___upload__winners_.png |
| student___class___teacher_data.pdf |
| tax___user_s___data__upload.pdf |
| Applicant_Details___report_0_2.pdf |
| india_asia_population__huge.pdf |
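The difference the '+' makes can be reproduced outside MySQL with Python's re module, assuming ICU (used by MySQL 8) and Python agree on this simple ASCII character class. The input is one of the sample file names.

```python
import re

name = "tax---user's---data__upload.pdf"

with_plus = re.sub(r"[^a-zA-Z0-9_.]+", "_", name)   # pattern '[^a-zA-Z0-9_.]+'
without_plus = re.sub(r"[^a-zA-Z0-9_.]", "_", name)  # pattern '[^a-zA-Z0-9_.]'

print(with_plus)     # tax_user_s_data__upload.pdf
print(without_plus)  # tax___user_s___data__upload.pdf
```

Either way, the existing "__" survives, because _ is in the allowed set; that is what the second REGEXP_REPLACE later deals with.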
Now we have a file name without special characters (. and _ are allowed), but it still has capital letters and runs of multiple underscores.
Let's lower-case the file name first.
SELECT
LOWER(
REGEXP_REPLACE(
SUBSTRING_INDEX(uri, '/', -1),
'[^a-zA-Z0-9_.]+',
'_'
)
) AS uri
from
demo;
#4 Query result
| uri |
|------------------------------------|
| offers_upload_winners_.png |
| student_class_teacher_data.pdf |
| tax_user_s_data__upload.pdf |
| applicant_details___report_0_2.pdf |
| india_asia_population_huge.pdf |
Now everything is in lower case, but the runs of underscores are still there, so we wrap the whole REGEXP_REPLACE in one more REGEXP_REPLACE:
SELECT
LOWER(
REGEXP_REPLACE( -- this wrapper will solve the multiple underscores issue
REGEXP_REPLACE(
SUBSTRING_INDEX(uri, '/', -1),
'[^a-zA-Z0-9_.]+',
'_'
),
'[_]+', -- find runs of underscores left by the first REGEXP_REPLACE
'_' -- and replace them with single _
)
) AS uri
from
demo;
#5 Query result
| uri |
|----------------------------------|
| offers_upload_winners_.png |
| student_class_teacher_data.pdf |
| tax_user_s_data_upload.pdf |
| applicant_details_report_0_2.pdf |
| india_asia_population_huge.pdf |
Congratulations! We have found what we were looking for. Now it's UPDATE time!
UPDATE -- run a MySQL UPDATE statement
demo -- tell MySQL which table you want to update
SET -- the SET clause assigns the updated values to the desired column
uri = REPLACE( -- tell MySQL which column you want to update;
-- REPLACE swaps the existing file name for the new one
-- REPLACE(string, from_str, to_str)
uri, -- my column to replace
SUBSTRING_INDEX(uri, '/', -1), -- my file name part "Applicant Details _ report_0_2.pdf"
-- without doing any action
LOWER( -- "applicant_details_report_0_2.pdf"
REGEXP_REPLACE( -- "Applicant_Details_report_0_2.pdf"
REGEXP_REPLACE( -- "Applicant_Details___report_0_2.pdf"
SUBSTRING_INDEX(uri, '/', -1), -- "Applicant Details _ report_0_2.pdf"
'[^a-zA-Z0-9_.]+',
'_'
),
'[_]+',
'_'
)
)
);
After the UPDATE query, the result looks like this:
| uri |
|--------------------------------------------------------------------------|
| private://webform/applicant_details/152/offers_upload_winners_.png |
| private://webform/applicant_details/153/student_class_teacher_data.pdf |
| private://webform/applicant_details/153/tax_user_s_data_upload.pdf |
| private://webform/applicant_details/154/applicant_details_report_0_2.pdf |
| private://webform/applicant_details/154/india_asia_population_huge.pdf |
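As a cross-check, the whole UPDATE can be modelled in Python on the sample rows from the data script below. This is a sketch, not MySQL: substring_index_last, clean_file_name and update_uri are hypothetical helpers that mirror SUBSTRING_INDEX(uri, '/', -1), the nested REGEXP_REPLACE/LOWER calls and REPLACE respectively.

```python
import re


def substring_index_last(s: str, delim: str) -> str:
    """SUBSTRING_INDEX(s, delim, -1): everything after the last delimiter."""
    return s.rsplit(delim, 1)[-1]


def clean_file_name(file_name: str) -> str:
    """LOWER(REGEXP_REPLACE(REGEXP_REPLACE(f, '[^a-zA-Z0-9_.]+', '_'), '[_]+', '_'))."""
    step1 = re.sub(r"[^a-zA-Z0-9_.]+", "_", file_name)  # special chars -> _
    step2 = re.sub(r"[_]+", "_", step1)                 # collapse runs of _
    return step2.lower()


def update_uri(uri: str) -> str:
    """uri = REPLACE(uri, <old file name>, <cleaned file name>)."""
    file_name = substring_index_last(uri, "/")
    return uri.replace(file_name, clean_file_name(file_name))


SAMPLE_URIS = [
    "private://webform/applicant_details/152/offers upload winners .png",
    "private://webform/applicant_details/153/student : class & teacher data.pdf",
    "private://webform/applicant_details/153/tax---user's---data__upload.pdf",
    "private://webform/applicant_details/154/Applicant Details _ report_0_2.pdf",
    "private://webform/applicant_details/154/india&asia%population huge.pdf",
]

for uri in SAMPLE_URIS:
    print(update_uri(uri))

# Edge case: REPLACE substitutes every occurrence, so a directory with the
# same name as the file is rewritten too.
print(update_uri("dir/My File/My File"))  # dir/my_file/my_file
```

If that edge case matters for your data, rebuilding the path from its directory part plus the cleaned file name avoids it.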
Sample data script
DROP TABLE IF EXISTS `demo`;
CREATE TABLE `demo` (
`uri` varchar(255) CHARACTER SET utf8mb3 COLLATE utf8_bin NOT NULL DEFAULT '' COMMENT 'The S3 URI of the file.',
`filesize` bigint unsigned NOT NULL DEFAULT '0' COMMENT 'The size of the file in bytes.',
`timestamp` int unsigned NOT NULL DEFAULT '0' COMMENT 'UNIX timestamp for when the file was added.',
`dir` int NOT NULL DEFAULT '0' COMMENT 'Boolean indicating whether or not this object is a directory.',
`version` varchar(255) CHARACTER SET utf8mb3 COLLATE utf8_bin DEFAULT '' COMMENT 'The S3 VersionId of the object.'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
INSERT INTO `demo` (`uri`, `filesize`, `timestamp`, `dir`, `version`) VALUES
('private://webform/applicant_details/152/offers upload winners .png', 14976905, 1658397516, 0, ''),
('private://webform/applicant_details/153/student : class & teacher data.pdf', 0, 1659525447, 1, ''),
('private://webform/applicant_details/153/tax---user\'s---data__upload.pdf', 98449, 1658397516, 0, ''),
('private://webform/applicant_details/154/Applicant Details _ report_0_2.pdf', 0, 1659525447, 1, ''),
('private://webform/applicant_details/154/india&asia%population huge.pdf', 13301, 1658397517, 0, '');
Big Thanks:
MySQL: SELECT, UPDATE, REPLACE, SUBSTRING_INDEX, LOWER, REGEXP_REPLACE
MySQL Query Formatter: Thanks to CodeBeautify for such an awesome tool.

Create search query

I have a table named Customers with columns FirstName, LastName, Email.
Let's pretend I have the customer: John | Williams | johnW1234@gmail.com
How do I write my query so that searching for:
"williams" => match
"will john" => match
"will john 55" => NO match
"will 1234" => match
My query currently looks like this:
SELECT * FROM `Customers` WHERE `FirstName` LIKE _search_ OR `LastName` LIKE _search_
But if someone were to search for "will john", my query returns no matches.
It seems you want to do something like this:
select * from Customers where
(FirstName like concat('%','john', '%') and LastName like concat('%','smith', '%'))
or
(LastName like concat('%','john', '%') and FirstName like concat('%','smith', '%'))
Here john and smith stand for the parts of the search term after it has been split on spaces and converted to lowercase (you can do that either in application code or in the database).
Link to Fiddle
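The query above fixes the number of search words at two and only looks at the name columns. A more general sketch, again with Python's sqlite3 standing in for MySQL (an assumption: LIKE with % and case-insensitive ASCII matching behaves the same for this data) and search_customers as a hypothetical helper, splits the search term on spaces and requires every word to appear in at least one of FirstName, LastName or Email, in any order. Binding each word as a parameter also avoids the quoting problems of string-built SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (FirstName TEXT, LastName TEXT, Email TEXT)")
conn.execute("INSERT INTO Customers VALUES ('John', 'Williams', 'johnW1234@gmail.com')")


def search_customers(term: str) -> list:
    """Every space-separated word must appear in FirstName, LastName or Email."""
    words = term.lower().split()
    if not words:
        return []
    # One OR-group per word, joined with AND; only placeholders go into the SQL.
    clause = " AND ".join(
        "(FirstName LIKE ? OR LastName LIKE ? OR Email LIKE ?)" for _ in words
    )
    params = [f"%{w}%" for w in words for _ in range(3)]
    return conn.execute(f"SELECT * FROM Customers WHERE {clause}", params).fetchall()


for term in ("williams", "will john", "will john 55", "will 1234"):
    print(term, "=>", "match" if search_customers(term) else "NO match")
```

As with any LIKE-based search, a % or _ inside a word acts as a wildcard.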
I think this works:
select * from Customers
where (_search_ regexp '^[^ ]+ [^ ]+$' or _search_ regexp '^[^ ]+$')
and (LastName like concat(substring_index(_search_, ' ', 1), '%')
or FirstName like concat(substring_index(_search_, ' ', -1), '%'));
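Dynamic SQL can help you:
EXECUTE 'SELECT * FROM Customers WHERE FirstName LIKE ' ||
_search_ || ' OR LastName LIKE ' || _search_ || ';';
"_search_" should be converted to text (explicitly or implicitly).
Of course, the quoting still needs your attention. Note that this is PostgreSQL (PL/pgSQL) syntax: in MySQL, || is a logical OR by default and dynamic SQL goes through PREPARE and EXECUTE.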
Crudely...
SET @string = 'John|Williams|johnW1234@gmail.com';
SELECT IF(@string LIKE "%will%",IF(@string LIKE "%john%",IF(@string LIKE "%55%",1,0),0),0)x;
+---+
| x |
+---+
| 0 |
+---+
SELECT IF(@string LIKE "%will%",IF(@string LIKE "%john%",IF(@string LIKE "%12%",1,0),0),0)x;
+---+
| x |
+---+
| 1 |
+---+
