MySQL simple query to extract substring given 2 delimiters - mysql

Specifically, I need to extract the part of the field label in table tmp.label that lies between the delimiters <!-- rwbl_1 --> and <!-- rwbl_2 -->, for rows where label contains <span, so that I can update that field in records erroneously formatted with HTML tags.
The result amounts to something like strip_tags. Obviously, this is only possible because the above-mentioned (or similar) delimiters are present.

Here is a simple solution for the abovementioned case to extract a substring given 2 delimiters.
SELECT
SUBSTRING_INDEX(
SUBSTRING_INDEX(label,
'<!-- rwbl_1 -->', -1),
'<!-- rwbl_2 -->', 1) AS cutout
FROM tmp.label
WHERE label LIKE '<span%'
ORDER BY 1
LIMIT 1;
More abstract:
SELECT
SUBSTRING_INDEX(
SUBSTRING_INDEX(string_field, # field name
'delimiter_1', -1), # take the part to the right of the last delimiter_1
'delimiter_2', 1) # take the part to the left of the first delimiter_2 in that substring
AS cutout
FROM my_table
WHERE my_condition;
And the update now works like this:
UPDATE my_table
SET string_field = SUBSTRING_INDEX(
SUBSTRING_INDEX(string_field,
'delimiter_1', -1),
'delimiter_2', 1)
WHERE my_condition;
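To see what each step returns, here is a minimal sketch on a literal value. The sample string and the @label variable are made up for illustration; only the two delimiters come from the question.

```sql
-- Hypothetical sample value; only the delimiters are taken from the question.
SET @label = '<span class="x"><!-- rwbl_1 -->Some label<!-- rwbl_2 --></span>';

SELECT
  -- Step 1: keep everything to the right of the last '<!-- rwbl_1 -->'
  SUBSTRING_INDEX(@label, '<!-- rwbl_1 -->', -1) AS after_first,
  -- Step 2: from that, keep everything to the left of the first '<!-- rwbl_2 -->'
  SUBSTRING_INDEX(
    SUBSTRING_INDEX(@label, '<!-- rwbl_1 -->', -1),
    '<!-- rwbl_2 -->', 1) AS cutout;
-- after_first: Some label<!-- rwbl_2 --></span>
-- cutout:      Some label
```

SUBSTRING_INDEX returns the whole string when the delimiter is not found, so for the UPDATE it is probably worth making sure my_condition only matches rows that contain both delimiters, for example by adding LOCATE('delimiter_1', string_field) > 0 and the same check for the second delimiter.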

Related

using a regex to remove invalid values from image

I am working on code that stores images, but some image names end with stray characters
such as a comma, %2C, or -X1 to -X10 (or more); they always end with .jpg.
How can I use a regex to turn each image name into a valid name?
Here is an example of what I have:
PCpaste_10_g,-X1,-X2,-X3
SNBar_NEW,-X1
The suffixes can go up to -X10.
So I want a regex that removes the comma and everything after it.
I tried using REPLACE, but that only handles one pattern at a time.
If your data is consistent and the part you need is always the string before the first comma, you can try SUBSTRING_INDEX.
Let's use this as your sample table, with your sample data:
CREATE TABLE mytable (
val VARCHAR(255));
INSERT INTO mytable VALUES
('PCpaste_10_g,-X1,-X2,-X3.jpg'),
('SNBar_NEW,-X1.jpg');
val
PCpaste_10_g,-X1,-X2,-X3.jpg
SNBar_NEW,-X1.jpg
Then first you extract the first string before comma occurrence:
SELECT SUBSTRING_INDEX(val,',',1) extracted
FROM mytable
returns
extracted
PCpaste_10_g
SNBar_NEW
Then to add back .jpg:
SELECT CONCAT(SUBSTRING_INDEX(val,',',1),'.jpg') extracted_combined
FROM mytable
If your image extension is not consistently .jpg, you can use another SUBSTRING_INDEX() to get the extension and then CONCAT() the two parts:
SELECT CONCAT(SUBSTRING_INDEX(val,',',1) ,'.',
SUBSTRING_INDEX(val,'.',-1)) Extracted_combined
FROM mytable;
Demo fiddle
You can use LOCATE to find the first occurrence of "," in the field and LEFT to grab everything up to the first "," -
SET @value := 'PCpaste_10_g,-X1,-X2,-X3';
SELECT CONCAT(LEFT(@value, LOCATE(',', @value) - 1), '.jpg');
or for your update -
UPDATE <table>
SET image_name = CONCAT(LEFT(image_name, LOCATE(',', image_name) - 1), '.jpg')
WHERE image_name LIKE '%,%';
or to handle your %2C at the same time -
UPDATE <table>
SET image_name = CASE
WHEN image_name LIKE '%,%'
THEN CONCAT(LEFT(image_name, LOCATE(',', image_name) - 1), '.jpg')
WHEN image_name LIKE '%\%2C%'
THEN CONCAT(LEFT(image_name, LOCATE('%2C', image_name) - 1), '.jpg')
END
WHERE image_name LIKE '%,%'
OR image_name LIKE '%\%2C%';
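Since the question asked for a regex: on MySQL 8.0 or later (an assumption about your server version), REGEXP_REPLACE can do the same in one call. This is only a sketch; the pattern and the extra 'Plain_name.jpg' row are my own additions for illustration.

```sql
-- Requires MySQL 8.0+ (REGEXP_REPLACE does not exist in 5.x).
-- Pattern: a comma or a literal %2C, then everything up to the last dot,
-- replaced by a single dot so the extension is kept.
SELECT val,
       REGEXP_REPLACE(val, '(,|%2C).*[.]', '.') AS cleaned
FROM (
    SELECT 'PCpaste_10_g,-X1,-X2,-X3.jpg' AS val
    UNION ALL SELECT 'SNBar_NEW,-X1.jpg'
    UNION ALL SELECT 'Plain_name.jpg'      -- hypothetical row with nothing to strip
) AS samples;
-- PCpaste_10_g,-X1,-X2,-X3.jpg -> PCpaste_10_g.jpg
-- SNBar_NEW,-X1.jpg            -> SNBar_NEW.jpg
-- Plain_name.jpg               -> Plain_name.jpg
```

The character class [.] is used instead of an escaped dot to avoid having to double the backslash inside the SQL string literal.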

Wrong flag setting for SUBSTRING_INDEX operator

I compiled a query that chooses a link from the 'ss' column:
'ss' column:
<img data-src="https://asd.com/123.jpg" srcset="https://asd.com/234.jpg">
<img data-src="https://asd.com/123.jpg" srcset="https://asd.com/234.jpg">
<img data-src="https://asd.com/123.jpg" srcset="https://asd.com/234.jpg">
Here is my SQL:
UPDATE post SET still1 = SUBSTRING_INDEX( SUBSTRING_INDEX( ss,'nk\"><img data-src=\"',-3 ),'.jpg\" s',1 )
UPDATE post SET still2 = SUBSTRING_INDEX( SUBSTRING_INDEX( ss,'nk\"><img data-src=\"',-2 ),'.jpg\" s',1 )
UPDATE post SET still3 = SUBSTRING_INDEX( SUBSTRING_INDEX( ss,'nk\"><img data-src=\"',-1 ),'.jpg\" s',1 )
I refer to the first link using the count -3. When there are three links, everything works fine. But when there are fewer (there can never be more than three), the result is wrong. Is there another way to address the first link, for cases with only two links, for example?
Here is one of the non-working options I tried:
UPDATE post SET still1 = SUBSTRING_INDEX( SUBSTRING_INDEX( ss,'nk\"><img data-src=\"',1 ),'.jpg\" s',-1 )
I cannot understand why this option does not work.
UPD:
Okay, here is the simple query I was looking for:
UPDATE post SET still1 = SUBSTRING_INDEX( SUBSTRING_INDEX( ss,'.jpg\" s',1 ),'nk\"><img data-src=\"',-1 )
UPDATE post SET still2 = SUBSTRING_INDEX( SUBSTRING_INDEX( ss,'.jpg\" s',2 ),'nk\"><img data-src=\"',-1 )
...
However, I still did not understand the recursive MySQL logic.
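The nesting may be easier to follow on a shortened string. The sketch below uses a made-up stand-in for the ss column in which '[' plays the role of the opening marker and ']' the role of the closing marker; these are not the real delimiters from the question.

```sql
-- Simplified, made-up stand-in for the ss column: '[' opens a link, ']' closes it.
SET @ss = '[A.jpg][B.jpg]';

SELECT
  -- Inner call: keep everything before the n-th closing marker (counted from the left).
  SUBSTRING_INDEX(@ss, ']', 1)                            AS inner_1,    -- [A.jpg
  -- Outer call: from that, keep what follows the last opening marker.
  SUBSTRING_INDEX(SUBSTRING_INDEX(@ss, ']', 1), '[', -1)  AS link_1,     -- A.jpg
  SUBSTRING_INDEX(@ss, ']', 2)                            AS inner_2,    -- [A.jpg][B.jpg
  SUBSTRING_INDEX(SUBSTRING_INDEX(@ss, ']', 2), '[', -1)  AS link_2,     -- B.jpg
  -- Counting from the right breaks down with fewer than three links:
  -- only two '[' exist, so a count of -3 returns the whole string.
  SUBSTRING_INDEX(@ss, '[', -3)                           AS from_right; -- [A.jpg][B.jpg]
```

This also suggests why the non-working attempt fails: with a count of 1 on the opening marker, the inner call keeps only the text before the first opening marker, which does not contain any link at all.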

substring_index skips delimiter from right

I have a table 'car_purchases' with a 'description' column. The column is a string that includes first name initial followed by full stop, space and last name.
An example of the Description column is
'Car purchased by J. Blow'
I am using 'substring_index' function to extract the letter preceding the '.' in the column string. Like so:
SELECT
Description,
SUBSTRING_INDEX(Description, '.', 1) as TrimInitial,
SUBSTRING_INDEX(
SUBSTRING_INDEX(Description, '.', 1),' ', -1) as trimmed,
length(SUBSTRING_INDEX(
SUBSTRING_INDEX(Description, '.', 1),' ', -1)) as length
from car_purchases;
I will call this query 1.
picture of the result set (Result 1) is as follows
As you can see the problem is that the 'trimmed' column in the select statement starts counting the 2nd delimiter ' ' instead of the first from the right and produces the result 'by J' instead of just 'J'. Further the length column indicates that the string length is 5 instead of 4 so WTF?
However when I perform the following select statement;
select SUBSTRING_INDEX(
SUBSTRING_INDEX('Car purchased by J. Blow', '.', 1),' ', -1); -- query 2
Result = 'J' as 'Result 2'.
As you can see from result 1 the string in column 'Description' is exactly (as far as I can tell) the same as the string from 'Result 2'. But when the substring_index is performed on the column (instead of just the string itself) the result ignores the first delimiter and selects a string from the 2nd delimiter from the right of the string.
I've racked my brains over this and have tried 'by ' and ' by' as delimiters but both options do not produce the desired result of a single character. I do not want to add further complexity to query 1 by using a trim function. I've also tried the cast function on result column 'trimmed' but still no success. I do not want to concat it either.
There is an anomaly in the 'length' column of query 1 where if I change the length function to char_length function like so:
select length(SUBSTRING_INDEX(
SUBSTRING_INDEX(Description, '.', 1),' ', -1)) as length -- result = 5
select char_length(SUBSTRING_INDEX(
SUBSTRING_INDEX(Description, '.', 1),' ', -1)) as length -- result = 4
Can anyone please explain to me why the above select statement would produce 2 different results? I think this is the reason why I am not getting my desired result.
But just to be clear my desired outcome is to get 'J' not 'by J'.
I guess I could try REVERSE, but I don't think this is an acceptable compromise. Also, I am not familiar with collation and charset principles; I just use the defaults.
Cheers Players!!!!
CHAR_LENGTH returns the length in characters, so a string of four 2-byte characters returns 4. LENGTH, however, returns the length in bytes, so the same string returns 8. The discrepancy in your results (including SUBSTRING_INDEX) says that the "space" between by and J is not actually a single-byte space (ASCII 0x20) but a 2-byte character that looks like a space. To work around this, you could try replacing all non-ASCII characters with spaces using CONVERT and REPLACE. In this example, I have an en-space Unicode character in the string between by and J. CONVERT changes it to a ?, and REPLACE then turns that ? into a space:
SELECT SUBSTRING_INDEX( SUBSTRING_INDEX("Car purchased by J. Blow", '.', 1),' ', -1)
Output:
by J
With CONVERT and REPLACE:
SELECT SUBSTRING_INDEX( SUBSTRING_INDEX(REPLACE(CONVERT("Car purchased by J. Blow" USING ASCII), '?', ' '), '.', 1),' ', -1)
Output
J
For your query, you would replace the string with your column name i.e.
SELECT SUBSTRING_INDEX( SUBSTRING_INDEX(REPLACE(CONVERT(description USING ASCII), '?', ' '), '.', 1),' ', -1)
Demo on DBFiddle
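If you would rather find out which character it is and replace only that one, HEX() shows the raw bytes. The sketch below assumes, purely for illustration, that the culprit is a no-break space (UTF-8 bytes C2 A0); the variables @nbsp and @d are hypothetical, and your data may contain a different character.

```sql
-- Assumption for illustration: the look-alike character is a no-break space (C2 A0 in UTF-8).
SET @nbsp = CONVERT(UNHEX('C2A0') USING utf8mb4);
SET @d    = CONCAT('Car purchased by', @nbsp, 'J. Blow');

SELECT
  LENGTH(@d)      AS byte_len,   -- bytes
  CHAR_LENGTH(@d) AS char_len,   -- characters; differs from byte_len if anything is multi-byte
  HEX(@d)         AS raw_bytes,  -- look for byte sequences other than 20 where a space should be
  -- Replace only the identified character, then extract the initial as before.
  SUBSTRING_INDEX(
    SUBSTRING_INDEX(REPLACE(@d, @nbsp, ' '), '.', 1),
    ' ', -1)      AS initial;
```

Against the real table you would run HEX(Description) on one of the affected rows first, and then put whatever byte sequence shows up into UNHEX() instead of C2A0.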

Remove trailing spaces and add them as leading spaces

I would like to remove the trailing spaces from the expressions in my column and add them to beginning of the expression. So for instance, I currently have the expressions:
Sample_four_space
Sample_two_space
Sample_one_space
I would like to transform this column into:
    Sample_four_space
  Sample_two_space
 Sample_one_space
I have tried this expression:
UPDATE My_Table
SET name = REPLACE(name,'% ',' %')
However, I would like a more robust query that would work for any length of trailing spaces. Can you help me develop a query that will remove all trailing spaces and add them to the beginning of the expression?
If you know all spaces are at the end (as in your example), then you can count them and put them at the beginning:
select concat(space(length(name) - length(replace(name, ' ', ''))),
replace(name, ' ', '')
)
Otherwise the better solution is:
select concat(space( length(name) - length(trim(trailing ' ' from name)) ),
trim(trailing ' ' from name)
)
or:
select concat(space( length(name) - length(rtrim(name)) ),
rtrim(name)
)
Each of these counts spaces: anywhere in the string for the first query (which also removes them all), and only at the end for the other two. The space() function then produces that many spaces and concat() puts them at the beginning.
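Since the question asked for an UPDATE, the rtrim() variant could be applied along these lines. This is a sketch that assumes name is a VARCHAR column (CHAR columns do not keep trailing spaces) and uses char_length() so that multi-byte characters do not inflate the count.

```sql
-- Sketch: move trailing spaces to the front, assuming name is VARCHAR.
UPDATE My_Table
SET name = CONCAT(
        SPACE(CHAR_LENGTH(name) - CHAR_LENGTH(RTRIM(name))),  -- as many spaces as were trailing
        RTRIM(name))                                          -- the value without them
WHERE name LIKE '% ';                                         -- only rows that end in a space
```

Running the SELECT form first on a few rows is a cheap way to check the result before changing any data.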

Search for text between delimiters in MySQL

I am trying to extract a certain part of a column that is between delimiters.
e.g. find foo in the following
test 'esf :foo: bar
So in the above I'd want to return foo, but all the regexp functions only return true|false,
is there a way to do this in MySQL
Here ya go, bud:
SELECT
SUBSTR(column,
LOCATE(':',column)+1,
(CHAR_LENGTH(column) - LOCATE(':',REVERSE(column)) - LOCATE(':',column)))
FROM table
Yea, no clue why you're doing this, but this will do the trick.
By performing a LOCATE, we can find the first ':'. To find the last ':', there's no reverse LOCATE, so we have to do it manually by performing a LOCATE(':', REVERSE(column)).
With the index of the first ':', the number of chars from the last ':' to the end of the string, and the CHAR_LENGTH (don't use LENGTH() for this), we can use a little math to discover the length of the string between the two instances of ':'.
This way we can perform a SUBSTR and dynamically pluck out the characters between the two ':'.
Again, it's gross, but to each his own.
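The arithmetic described above can be checked on the sample string from the question. The @s variable is only there to keep the sketch self-contained.

```sql
-- The sample string from the question; the single quote is doubled to escape it.
SET @s = 'test ''esf :foo: bar';

SELECT
  LOCATE(':', @s)           AS first_pos,   -- 11: position of the first ':'
  LOCATE(':', REVERSE(@s))  AS from_end,    -- 5: position of the last ':' counted from the end
  CHAR_LENGTH(@s)           AS total,       -- 19
  SUBSTR(@s,
         LOCATE(':', @s) + 1,
         CHAR_LENGTH(@s) - LOCATE(':', REVERSE(@s)) - LOCATE(':', @s)
  )                         AS between_colons;  -- 19 - 5 - 11 = 3 characters: foo
```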
This should work if the delimiter appears exactly twice in your column. I am doing something similar:
substring_index(substring_index(column,':',-2),':',1)
A combination of LOCATE and MID would probably do the trick.
If the value "test 'esf :foo: bar" was in the field fooField:
MID( fooField, LOCATE('foo', fooField), 3);
I don't know if you have this kind of authority, but if you have to do queries like this it might be time to renormalize your tables, and have these values in a lookup table.
With only one set of delimiters, the following should work:
SUBSTR(
SUBSTR(fooField,LOCATE(':',fooField)+1),
1,
LOCATE(':',SUBSTR(fooField,LOCATE(':',fooField)+1))-1
)
mid(col,
locate('?m=',col) + char_length('?m='),
locate('&o=',col) - locate('?m=',col) - char_length('?m=')
)
A slightly more compact form, replacing char_length('?m=') with the number 3:
mid(col, locate('?m=',col) + 3, locate('&o=',col) - locate('?m=',col) - 3)
The patterns I have used are '?m=' and '&o='.
select mid(col from locate(':',col) + 1 for
locate(':',col,locate(':',col)+1)-locate(':',col) - 1 )
from table where col rlike ':.*:';
If you know the position you want to extract from, as opposed to what the data itself is:
$colNumber = 2; // 2nd position
$sql = "REPLACE(SUBSTRING(SUBSTRING_INDEX(fooField, ':', $colNumber),
LENGTH(SUBSTRING_INDEX(fooField,
':',
$colNumber - 1)) + 1),
':', '')";
This is what I am extracting from (mainly colon ':' as delimiter but some exceptions), as column theline255 in table loaddata255:
23856.409:0023:trace:message:SPY_EnterMessage (0x2003a) L"{#32769}" [0081] WM_NCCREATE sent from self wp=00000000 lp=0023f0b0
This is the MySQL code (it quickly did what I wanted and is straightforward):
select
time('2000-01-01 00:00:00' + interval substring_index(theline255, '.', 1) second) as hhmmss
, substring_index(substring_index(theline255, ':', 1), '.', -1) as logMilli
, substring_index(substring_index(theline255, ':', 2), ':', -1) as logTid
, substring_index(substring_index(theline255, ':', 3), ':', -1) as logType
, substring_index(substring_index(theline255, ':', 4), ':', -1) as logArea
, substring_index(substring_index(theline255, ' ', 1), ':', -1) as logFunction
, substring(theline255, length(substring_index(theline255, ' ', 1)) + 2) as logText
from loaddata255
and this is the result:
# LogTime, LogTimeMilli, LogTid, LogType, LogArea, LogFunction, LogText
'06:37:36', '409', '0023', 'trace', 'message', 'SPY_EnterMessage', '(0x2003a) L\"{#32769}\" [0081] WM_NCCREATE sent from self wp=00000000 lp=0023f0b0'
This one looks elegant to me. Strip everything after the n-th separator, reverse the string, strip everything after the first separator, then reverse it back.
select
reverse(
substring_index(
reverse(substring_index(str,separator,substrindex)),
separator,
1)
);
For example (this returns 'mysql'):
select
reverse(
substring_index(
reverse(substring_index('www.mysql.com','.',2)),
'.',
1
)
);
You can do this with SUBSTR and LOCATE in a single expression.
Here is a nice tutorial:
http://infofreund.de/mysql-select-substring-2-different-delimiters/
Following what is described there, the command for your case should look like this:
SELECT substr(text,Locate(' :', text )+2,Locate(': ', text )-(Locate(' :', text )+2)) FROM testtable
where text is the text field that contains "test 'esf :foo: bar".
So foo can be fooooo or fo; the length doesn't matter :).