I have a table 'car_purchases' with a 'description' column. The column is a string that includes first name initial followed by full stop, space and last name.
An example of the Description column is
'Car purchased by J. Blow'
I am using 'substring_index' function to extract the letter preceding the '.' in the column string. Like so:
SELECT
Description,
SUBSTRING_INDEX(Description, '.', 1) as TrimInitial,
SUBSTRING_INDEX(
SUBSTRING_INDEX(Description, '.', 1),' ', -1) as trimmed,
length(SUBSTRING_INDEX(
SUBSTRING_INDEX(Description, '.', 1),' ', -1)) as length
from car_purchases;
I will call this query 1.
picture of the result set (Result 1) is as follows
As you can see, the problem is that the 'trimmed' column in the select statement splits on the 2nd ' ' delimiter from the right instead of the first, and produces the result 'by J' instead of just 'J'. Further, the length column reports a string length of 5 instead of 4, which makes no sense to me.
However when I perform the following select statement;
select SUBSTRING_INDEX(
SUBSTRING_INDEX('Car purchased by J. Blow', '.', 1),' ', -1); -- query 2
Result = 'J' as 'Result 2'.
As you can see from result 1 the string in column 'Description' is exactly (as far as I can tell) the same as the string from 'Result 2'. But when the substring_index is performed on the column (instead of just the string itself) the result ignores the first delimiter and selects a string from the 2nd delimiter from the right of the string.
I've racked my brains over this and have tried 'by ' and ' by' as delimiters but both options do not produce the desired result of a single character. I do not want to add further complexity to query 1 by using a trim function. I've also tried the cast function on result column 'trimmed' but still no success. I do not want to concat it either.
There is an anomaly in the 'length' column of query 1 where if I change the length function to char_length function like so:
select length(SUBSTRING_INDEX(
SUBSTRING_INDEX(Description, '.', 1),' ', -1)) as length -- result = 5
select char_length(SUBSTRING_INDEX(
SUBSTRING_INDEX(Description, '.', 1),' ', -1)) as length -- result = 4
Can anyone please explain to me why the above select statement would produce 2 different results? I think this is the reason why I am not getting my desired result.
But just to be clear my desired outcome is to get 'J' not 'by J'.
I guess I could try reverse, but I don't think this is an acceptable compromise. Also, I am not familiar with collation and charset principles; I just use the defaults.
Cheers Players!!!!
CHAR_LENGTH returns the length in characters, so a string of four 2-byte characters returns 4. LENGTH returns the length in bytes, so the same string returns 8. The discrepancy in your results (including SUBSTRING_INDEX) says that the "space" between by and J is not actually a single-byte space (ASCII 0x20) but a 2-byte character that looks like a space. To work around this, you could try replacing all non-ASCII characters with spaces using CONVERT and REPLACE. In this example, I have an en-space Unicode character in the string between by and J. CONVERT changes that to a ?, and REPLACE then turns the ? into a space:
SELECT SUBSTRING_INDEX( SUBSTRING_INDEX("Car purchased by J. Blow", '.', 1),' ', -1)
Output:
by J
With CONVERT and REPLACE:
SELECT SUBSTRING_INDEX( SUBSTRING_INDEX(REPLACE(CONVERT("Car purchased by J. Blow" USING ASCII), '?', ' '), '.', 1),' ', -1)
Output
J
For your query, you would replace the string with your column name i.e.
SELECT SUBSTRING_INDEX( SUBSTRING_INDEX(REPLACE(CONVERT(description USING ASCII), '?', ' '), '.', 1),' ', -1)
Demo on DBFiddle
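For anyone who wants to see the byte/character mismatch outside MySQL, here is a rough Python sketch. The substring_index helper is my own stand-in that mimics SUBSTRING_INDEX for simple delimiters, and I am assuming the odd character is a no-break space (U+00A0, 2 bytes in UTF-8), which would match the 4 vs 5 lengths in the question; the real data may well contain a different character.

```python
def substring_index(s, delim, count):
    """Rough stand-in for MySQL's SUBSTRING_INDEX (hypothetical helper)."""
    if count == 0 or delim == "":
        return ""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # everything before the count-th delimiter
    return delim.join(parts[count:])       # everything after the count-th delimiter from the right


# Assumption: the "space" before J is really a no-break space (U+00A0).
description = "Car purchased by\u00a0J. Blow"

trimmed = substring_index(substring_index(description, ".", 1), " ", -1)
char_length = len(trimmed)                  # like CHAR_LENGTH: counts characters
byte_length = len(trimmed.encode("utf-8"))  # like LENGTH: counts bytes

# Same idea as CONVERT(... USING ASCII) followed by REPLACE('?', ' '):
# anything non-ASCII becomes '?', which is then turned into a real space.
ascii_only = description.encode("ascii", errors="replace").decode("ascii")
cleaned = ascii_only.replace("?", " ")
fixed = substring_index(substring_index(cleaned, ".", 1), " ", -1)

print(repr(trimmed), char_length, byte_length, repr(fixed))
```

Note that this replacement also turns any genuine question mark in the text into a space, which is worth keeping in mind before running it over a whole column.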
I have a column which returns
a:2:{i:0;s:10:"Properties";i:1;s:14:"Movable Assets";}
I would like to return only:
Properties, Movable Assets
How can I use a select statement to retrieve the values between the " symbols
These are PHP-serialized values, so you can use PHP to get your desired result.
unserialize will return an array, and you can then use implode to get the comma-separated values.
Use SUBSTRING_INDEX().
SUBSTRING_INDEX() takes a string argument followed by a delimiter character and the number of parts to return. After you break up the string using the delimiter, that number of parts is returned as a single string.
select concat(
SUBSTRING_INDEX(
SUBSTRING_INDEX(
SUBSTRING_INDEX(
'a:2:{i:0;s:10:"Properties";i:1;s:14:"Movable Assets";}',
'"',
4
),
'"',
2
),
'"',
-1
),
",",
SUBSTRING_INDEX(
SUBSTRING_INDEX(
SUBSTRING_INDEX(
'a:2:{i:0;s:10:"Properties";i:1;s:14:"Movable Assets";}',
'"',
4
),
'"',
4
),
'"',
-1
)
);
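To see why counting quote characters picks out the two values, here is a small Python sketch of the same idea. The substring_index helper is a hypothetical stand-in for the MySQL function, not a real library call: keep everything before the 2nd (or 4th) double quote, then take the last quote-delimited piece.

```python
def substring_index(s, delim, count):
    """Rough stand-in for MySQL's SUBSTRING_INDEX (hypothetical helper)."""
    if count == 0 or delim == "":
        return ""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


serialized = 'a:2:{i:0;s:10:"Properties";i:1;s:14:"Movable Assets";}'

# Everything before the 2nd quote ends with the first value,
# everything before the 4th quote ends with the second value.
first_value = substring_index(substring_index(serialized, '"', 2), '"', -1)
second_value = substring_index(substring_index(serialized, '"', 4), '"', -1)

joined = ", ".join([first_value, second_value])
print(joined)
```

This only holds while the column always contains exactly two quoted values and none of them contains a double quote.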
Use combination of LOCATE() and SUBSTRING().
Definitions: https://dev.mysql.com/doc/refman/5.0/en/string-functions.html
Or better -- migrate the data to actually be retrievable.
I have a column that has comma separated data:
1,2,3
3,2,1
4,5,6
5,5,5
I'm trying to run a search that would query each value of the CSV string individually.
0<first<5 and 1<second<3 and 2<third<4
I get that I could return all queries and split it myself and compare it myself. I'm curious if there is a way to do this so MySQL does that processing work.
Thanks!
Use
substring_index(`column`,',',1) ==> first value
substring_index(substring_index(`column`,',',-2),',',1)=> second value
substring_index(substring_index(`column`,',',-1),',',1)=> third value
in your where clause.
SELECT * FROM `table`
WHERE
substring_index(`column`,',',1) > 0 -- 0 < first
AND
substring_index(`column`,',',1) < 5 -- first < 5
It seems to work:
substring_index ( substring_index ( context,',',1 ), ',', -1)
substring_index ( substring_index ( context,',',2 ), ',', -1)
substring_index ( substring_index ( context,',',3 ), ',', -1)
substring_index ( substring_index ( context,',',4 ), ',', -1)
it means 1st value, 2nd, 3rd, etc.
Explanation:
The inner substring_index returns the first n values that are comma separated. So if your original string is "34,7,23,89", substring_index( context,',', 3) returns "34,7,23".
The outer substring_index takes the value returned by the inner substring_index and the -1 allows you to take the last value. So you get "23" from the "34,7,23".
Instead of -1 if you specify -2, you'll get "7,23", because it took the last two values.
Example:
select * from MyTable where substring_index(substring_index(prices,',',1),',',-1)=3382;
Here, prices is the name of a column in MyTable.
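The inner/outer trick is easy to check by hand. Below is a Python sketch, where substring_index is my own approximation of the MySQL function and nth_value is a made-up helper name, applied to the "34,7,23,89" example and to the range filter from the question.

```python
def substring_index(s, delim, count):
    """Rough stand-in for MySQL's SUBSTRING_INDEX (hypothetical helper)."""
    if count == 0 or delim == "":
        return ""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


def nth_value(csv, n):
    """n-th comma-separated value (1-based): inner call keeps n values, outer keeps the last."""
    return substring_index(substring_index(csv, ",", n), ",", -1)


inner = substring_index("34,7,23,89", ",", 3)   # first three values
last_one = substring_index(inner, ",", -1)      # last of those
last_two = substring_index(inner, ",", -2)      # last two of those

# The filter from the question: 0<first<5 and 1<second<3 and 2<third<4
rows = ["1,2,3", "3,2,1", "4,5,6", "5,5,5"]
matches = [
    row for row in rows
    if 0 < int(nth_value(row, 1)) < 5
    and 1 < int(nth_value(row, 2)) < 3
    and 2 < int(nth_value(row, 3)) < 4
]
print(inner, last_one, last_two, matches)
```

In MySQL the same comparisons would go in the WHERE clause, with the string results compared against numbers.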
Usually substring_index does what you want:
mysql> select substring_index("foo@gmail.com","@",-1);
+-----------------------------------------+
| substring_index("foo@gmail.com","@",-1) |
+-----------------------------------------+
| gmail.com |
+-----------------------------------------+
1 row in set (0.00 sec)
You may get what you want by using the MySQL REGEXP or LIKE.
See the MySQL Docs on Pattern Matching
As an addendum to this, I've strings of the form:
Some words 303
where I'd like to split off the numerical part from the tail of the string.
This seems to point to a possible solution:
http://lists.mysql.com/mysql/222421
The problem however, is that you only get the answer "yes, it matches", and not the start index of the regexp match.
Here is another variant I posted on related question. The REGEX check to see if you are out of bounds is useful, so for a table column you would put it in the where clause.
SET @Array = 'one,two,three,four';
SET @ArrayIndex = 2;
SELECT CASE
WHEN @Array REGEXP CONCAT('((,).*){',@ArrayIndex,'}')
THEN SUBSTRING_INDEX(SUBSTRING_INDEX(@Array,',',@ArrayIndex+1),',',-1)
ELSE NULL
END AS Result;
SUBSTRING_INDEX(string, delim, n) returns everything before the n-th delimiter, i.e. the first n items
SUBSTRING_INDEX(string, delim, -1) returns only the last item
REGEXP '((delim).*){n}' checks that there are at least n delimiters (i.e. you are in bounds)
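Here is a Python sketch of why the bounds check matters. substring_index and nth_item are hypothetical helpers of mine, and Python's re module stands in for MySQL's REGEXP; without the check, an index past the end silently returns the last item.

```python
import re


def substring_index(s, delim, count):
    """Rough stand-in for MySQL's SUBSTRING_INDEX (hypothetical helper)."""
    if count == 0 or delim == "":
        return ""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


def nth_item(csv, index):
    """Zero-based item lookup that returns None when index is out of bounds."""
    # Same pattern as in the CASE expression: require at least `index` commas.
    if re.search("((,).*){%d}" % index, csv) is None:
        return None
    return substring_index(substring_index(csv, ",", index + 1), ",", -1)


array = "one,two,three,four"
in_bounds = nth_item(array, 2)
out_of_bounds = nth_item(array, 4)
# Without the check, asking for a 5th item just hands back the last one.
unchecked = substring_index(substring_index(array, ",", 5), ",", -1)
print(in_bounds, out_of_bounds, unchecked)
```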
Building on @Oleksiy's answer, here is one that can work with strings of variable segment lengths (within reasonable limits), for example comma-separated addresses:
SELECT substring_index ( substring_index ( address,',',1 ), ',', -1) AS address_line_1,
IF(address_parts > 1, substring_index ( substring_index ( address,',',2 ), ',', -1), '') AS address_line_2,
IF(address_parts > 2, substring_index ( substring_index ( address,',',3 ), ',', -1), '') AS address_line_3,
IF(address_parts > 3, substring_index ( substring_index ( address,',',4 ), ',', -1), '') AS address_line_4,
IF(address_parts > 4, substring_index ( substring_index ( address,',',5 ), ',', -1), '') AS address_line_5
FROM (
SELECT address, LENGTH(address) - LENGTH(REPLACE(address, ',', '')) + 1 AS address_parts -- number of commas + 1 = number of parts
FROM mytable
) AS addresses
It's working..
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(
SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(col,'1', 1), '2', 1), '3', 1), '4', 1), '5', 1), '6', 1)
, '7', 1), '8', 1), '9', 1), '0', 1) as new_col
FROM table_name group by new_col;
I am trying to extract a certain part of a column that is between delimiters.
e.g. find foo in the following
test 'esf :foo: bar
So in the above I'd want to return foo, but all the regexp functions only return true|false.
Is there a way to do this in MySQL?
Here ya go, bud:
SELECT
SUBSTR(column,
LOCATE(':',column)+1,
(CHAR_LENGTH(column) - LOCATE(':',REVERSE(column)) - LOCATE(':',column)))
FROM table
Yea, no clue why you're doing this, but this will do the trick.
By performing a LOCATE, we can find the first ':'. To find the last ':', there's no reverse LOCATE, so we have to do it manually by performing a LOCATE(':', REVERSE(column)).
With the index of the first ':', the number of chars from the last ':' to the end of the string, and the CHAR_LENGTH (don't use LENGTH() for this), we can use a little math to discover the length of the string between the two instances of ':'.
This way we can perform a SUBSTR and dynamically pluck out the characters between the two ':'.
Again, it's gross, but to each his own.
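If the arithmetic is hard to follow, here is the same calculation as a Python sketch. The function name is made up, positions are 1-based as in MySQL, and it assumes the string contains at least two ':' characters.

```python
def between_first_and_last(text, delim=":"):
    """Mimics SUBSTR(col, LOCATE(d,col)+1, CHAR_LENGTH(col) - LOCATE(d,REVERSE(col)) - LOCATE(d,col))."""
    first = text.find(delim) + 1                # LOCATE(':', col), 1-based
    from_end = text[::-1].find(delim) + 1       # LOCATE(':', REVERSE(col)), 1-based
    length = len(text) - from_end - first       # characters strictly between the two ':'
    # SUBSTR starts at 1-based position first+1, which is 0-based index `first`.
    return text[first:first + length]


sample = "test 'esf :foo: bar"
extracted = between_first_and_last(sample)
print(extracted)
```

With the sample string the first ':' is at position 11, the last ':' is 5 characters from the end, and the string is 19 characters long, so the length works out to 19 - 5 - 11 = 3.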
This should work if the delimiter appears exactly twice in your column. I am doing something similar...
substring_index(substring_index(column,':',-2),':',1)
A combination of LOCATE and MID would probably do the trick.
If the value "test 'esf :foo: bar" was in the field fooField:
MID( fooField, LOCATE('foo', fooField), 3);
I don't know if you have this kind of authority, but if you have to do queries like this it might be time to renormalize your tables, and have these values in a lookup table.
With only one set of delimiters, the following should work:
SUBSTR(
SUBSTR(fooField,LOCATE(':',fooField)+1),
1,
LOCATE(':',SUBSTR(fooField,LOCATE(':',fooField)+1))-1
)
mid(col,
locate('?m=',col) + char_length('?m='),
locate('&o=',col) - locate('?m=',col) - char_length('?m=')
)
A slightly more compact form, replacing char_length(.) with the number 3:
mid(col, locate('?m=',col) + 3, locate('&o=',col) - locate('?m=',col) - 3)
The patterns I have used are '?m=' and '&o='.
select mid(col from locate(':',col) + 1 for
locate(':',col,locate(':',col)+1)-locate(':',col) - 1 )
from table where col rlike ':.*:';
If you know the position you want to extract from as opposed to what the data itself is:
$colNumber = 2; // 2nd position
$sql = "REPLACE(SUBSTRING(SUBSTRING_INDEX(fooField, ':', $colNumber),
LENGTH(SUBSTRING_INDEX(fooField,
':',
$colNumber - 1)) + 1),
':',
'')";
This is what I am extracting from (mainly colon ':' as delimiter but some exceptions), as column theline255 in table loaddata255:
23856.409:0023:trace:message:SPY_EnterMessage (0x2003a) L"{#32769}" [0081] WM_NCCREATE sent from self wp=00000000 lp=0023f0b0
This is the MySQL code (it quickly did what I wanted, and is straightforward):
select
time('2000-01-01 00:00:00' + interval substring_index(theline255, '.', 1) second) as hhmmss
, substring_index(substring_index(theline255, ':', 1), '.', -1) as logMilli
, substring_index(substring_index(theline255, ':', 2), ':', -1) as logTid
, substring_index(substring_index(theline255, ':', 3), ':', -1) as logType
, substring_index(substring_index(theline255, ':', 4), ':', -1) as logArea
, substring_index(substring_index(theline255, ' ', 1), ':', -1) as logFunction
, substring(theline255, length(substring_index(theline255, ' ', 1)) + 2) as logText
from loaddata255
and this is the result:
# LogTime, LogTimeMilli, LogTid, LogType, LogArea, LogFunction, LogText
'06:37:36', '409', '0023', 'trace', 'message', 'SPY_EnterMessage', '(0x2003a) L\"{#32769}\" [0081] WM_NCCREATE sent from self wp=00000000 lp=0023f0b0'
This one looks elegant to me. Strip everything after the n-th separator, reverse the string, strip everything after the first separator, then reverse it back.
select
reverse(
substring_index(
reverse(substring_index(str,separator,substrindex)),
separator,
1)
);
For example:
select
reverse(
substring_index(
reverse(substring_index('www.mysql.com','.',2)),
'.',
1
)
);
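Tracing the example step by step in Python makes the trick clearer. This is a sketch only: substring_index is my own approximation of the MySQL function, and nth_segment is a hypothetical name.

```python
def substring_index(s, delim, count):
    """Rough stand-in for MySQL's SUBSTRING_INDEX (hypothetical helper)."""
    if count == 0 or delim == "":
        return ""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


def nth_segment(text, separator, n):
    """n-th segment (1-based) via truncate, reverse, truncate, reverse."""
    head = substring_index(text, separator, n)                  # keep the first n segments
    reversed_head = head[::-1]                                  # n-th segment is now at the front
    first_piece = substring_index(reversed_head, separator, 1)  # cut at the first separator
    return first_piece[::-1]                                    # restore the original order


head = substring_index("www.mysql.com", ".", 2)
segment = nth_segment("www.mysql.com", ".", 2)
print(head, segment)
```

The reversal works as shown for a single-character separator; a multi-character separator would itself need reversing in the middle step.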
You can combine the substring and locate functions in one command.
Here is a nice tutorial:
http://infofreund.de/mysql-select-substring-2-different-delimiters/
The command, as described there, should look like this for you:
SELECT substr(text,Locate(' :', text )+2,Locate(': ', text )-(Locate(' :', text )+2)) FROM testtable
where text is the text field which contains "test 'esf :foo: bar"
So foo can be fooooo or fo; the length doesn't matter :).