Odd behavior of UPPER in select - mysql

I am trying to do some string manipulated within a MySQL select and I seem to have some strange behavior from the UPPER function.
I am trying to return the first letter of a word (delimited with spaces) and convert it to upper case. However if I use UPPER on the single returned letter I get a blank, while if I use UPPER on the whole word before getting the first letter from it I do get the first letter.
Stripping the SQL down to the minimum I have come up with this test SQL:-
SELECT
'verbatim h',
SUBSTRING(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(' ','verbatim h',' '), ' ', 2), ' ', -1), 1, 1),
UPPER(SUBSTRING(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(' ','verbatim h',' '), ' ', 2), ' ', -1), 1, 1)),
SUBSTRING(UPPER(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(' ','verbatim h',' '), ' ', 2), ' ', -1)), 1, 1)
This is taking the string 'verbatim h', concatenating spaces at either end then getting the string between the 1st and 2nd space (so it will get verbatim).
The first column is the full string, the 2nd column is the first letter of the first word, the 3rd column is the first letter converted to upper case of the first word while the 4th column is the first letter of the first word converted to upper case.
I think columns 3 and 4 should have the same values (the only difference being that one converts the 1st word to upper case before grabbing the first letter, while the other grabs the 1st letter then converts it to upper case), but instead one contains the letter V as I would expect while the other contains blank.
If I modify the above to get the HEX values of the resulting characters, the blank one is a 1 character string with hex value 00, while the V is a hex value of 56.
Any suggestions? Am I missing something obvious?

The string becomes a binary string. And for these you can't use LOWER and UPPER, as stated in the mysql reference docu
So how to solve?
Use the convert funcion like this:
SELECT
'verbatim h',
SUBSTRING(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(' ','verbatim h',' '), ' ', 2), ' ', -1), 1, 1) AS c1,
UPPER(CONVERT((SUBSTRING(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(' ','verbatim h',' '), ' ', 2), ' ', -1), 1, 1)) USING latin1)) AS c1_upper;
here is a sqlfiddle

Related

Insert white space before the last three characters

I have a table of UK postcodes. All of them are in different format, some are capitalized with white space some are not. What I want to do is format them so they can follow the UK postcode standard. For instance AB1 2BB.
I used this query for the purpose which does work, but some postcodes have a longer or shorter first part so it does not succeed for all.
SELECT UPPER(INSERT((REPLACE(postcode , ' ', '')) , 4, 0, ' ')) AS postcode
However if I try to do it the other way around
SELECT UPPER(INSERT((REPLACE(postcode , ' ', '')) , -4, 0, ' ')) AS postcode
It does not work and returns all the postcodes glued together e.g AB12BB
What I want is to put a space before the last 3 characters.
I think you want:
select concat_ws(' ',
left(replace(postcode, ' ', ''), 3),
right(replace(postcode, ' ', ''), 3
) as standardized_postcode
Insert a space at the 3th char from the end of the string, after you remove all the spaces:
SELECT UPPER(INSERT(REPLACE(postcode , ' ', ''), LENGTH(REPLACE(postcode , ' ', '')) - 2, 0, ' ')) AS postcode
It sounds like you're going to be dealing with postcodes like this:
LS10 1DH
LS101DH
LS63DR
etc. We should start by removing all of the spaces:
REPLACE(postcode,' ','') -- LS10 1DH becomes LS101DH
taking the last 3 characters:
RIGHT(REPLACE(postcode,' ',''), 3) -- 1DH
and all of the characters up to the 3rd from last:
LEFT(REPLACE(postcode,' ',''), LEN(REPLACE(postcode,' ','')) - 3))
Then use CONCAT to bring it all together:
SET #postcode = 'LS101DH';
SELECT CONCAT(LEFT(REPLACE(#postcode,' ',''), LENGTH(REPLACE(#postcode,' ','')) - 3),
' ', -- add a space in
RIGHT(REPLACE(#postcode,' ',''), 3));
for pc in postcodes:
print('{} {}'.format(pc[:-3], pc[-3:]))
Seems to work for a given list of UK postcodes for me.

substring_index skips delimiter from right

I have a table 'car_purchases' with a 'description' column. The column is a string that includes first name initial followed by full stop, space and last name.
An example of the Description column is
'Car purchased by J. Blow'
I am using 'substring_index' function to extract the letter preceding the '.' in the column string. Like so:
SELECT
Description,
SUBSTRING_INDEX(Description, '.', 1) as TrimInitial,
SUBSTRING_INDEX(
SUBSTRING_INDEX(Description, '.', 1),' ', -1) as trimmed,
length(SUBSTRING_INDEX(
SUBSTRING_INDEX(Description, '.', 1),' ', -1)) as length
from car_purchases;
I will call this query 1.
picture of the result set (Result 1) is as follows
As you can see the problem is that the 'trimmed' column in the select statement starts counting the 2nd delimiter ' ' instead of the first from the right and produces the result 'by J' instead of just 'J'. Further the length column indicates that the string length is 5 instead of 4 so WTF?
However when I perform the following select statement;
select SUBSTRING_INDEX(
SUBSTRING_INDEX('Car purchased by J. Blow', '.', 1),' ', -1); -- query 2
Result = 'J' as 'Result 2'.
As you can see from result 1 the string in column 'Description' is exactly (as far as I can tell) the same as the string from 'Result 2'. But when the substring_index is performed on the column (instead of just the string itself) the result ignores the first delimiter and selects a string from the 2nd delimiter from the right of the string.
I've racked my brains over this and have tried 'by ' and ' by' as delimiters but both options do not produce the desired result of a single character. I do not want to add further complexity to query 1 by using a trim function. I've also tried the cast function on result column 'trimmed' but still no success. I do not want to concat it either.
There is an anomaly in the 'length' column of query 1 where if I change the length function to char_length function like so:
select length(SUBSTRING_INDEX(
SUBSTRING_INDEX(Description, '.', 1),' ', -1)) as length -- result = 5
select char_length(SUBSTRING_INDEX(
SUBSTRING_INDEX(Description, '.', 1),' ', -1)) as length -- result = 4
Can anyone please explain to me why the above select statement would produce 2 different results? I think this is the reason why I am not getting my desired result.
But just to be clear my desired outcome is to get 'J' not 'by J'.
I guess I could try reverse but I dont think this is an acceptable compromise. Also I am not familiar with collation and charset principles except that I just use the defaults.
Cheers Players!!!!
CHAR_LENGTH returns length in characters, so a string with 4 2-byte characters would return 4. LENGTH however returns length in bytes, so a string with 4 2-byte characters would return 8. The discrepancy in your results (including SUBSTRING_INDEX) says that the "space" between by and J is not actually a single-byte space (ASCII 0x20) but a 2-byte character that looks like a space. To workaround this, you could try replacing all unicode characters with spaces using CONVERT and REPLACE. In this example, I have an en-space unicode character in the string between by and J. The CONVERT changes that to a ?, and the REPLACE then converts that to a space:
SELECT SUBSTRING_INDEX( SUBSTRING_INDEX("Car purchased by J. Blow", '.', 1),' ', -1)
Output:
by J
With CONVERT and REPLACE:
SELECT SUBSTRING_INDEX( SUBSTRING_INDEX(REPLACE(CONVERT("Car purchased by J. Blow" USING ASCII), '?', ' '), '.', 1),' ', -1)
Output
J
For your query, you would replace the string with your column name i.e.
SELECT SUBSTRING_INDEX( SUBSTRING_INDEX(REPLACE(CONVERT(description USING ASCII), '?', ' '), '.', 1),' ', -1)
Demo on DBFiddle

Ho to strip and replace values in mysql

I have around 200,000 records of data with phone numbers, but the numbers are inconsistent.
for example, some may be 10 digits (missing a 0 at the beginning), some have spaces in there, some have a '-' in the middle and some begin with '+44' instead of 0.
Is there a way in mySQL to condition all these and cleanse the data in one query?
Without sample data and without an example output this is purely speculative and assuming you want the output in the format of 01234567891.
Use a combination of LENGTH, REPLACE' ANDLEFT` functions to resolve the 4 issues you highlighted:
Missing 0 at beggining.
Spaces in the string.
-'s in the string.
+44 rather than 0.
SELECT CASE WHEN LENGTH(REPLACE(REPLACE(numberfield, '-', ''), ' ', '')) = 10
THEN CONCAT('0', REPLACE(REPLACE(numberfield, '-', ''), ' ', ''))
WHEN LEFT(REPLACE(REPLACE(numberfield, '-', ''), ' ', ''), 3) = '+44'
THEN REPLACE(REPLACE(REPLACE(numberfield, '-', ''), ' ', ''), '+44', '0'
END AS Cleannumber
FROM yourtable
Assuming the phone number field is a string - the following should deal with the conditions you specified :
RIGHT( LPAD( REPLACE( REPLACE( REPLACE('phonenumber', '-', ''), '+44', ''), ' ', ''), 11, '0' ), 11 )
first any '-' are removed, then '+44' is removed, then spaces are removed, then 11 '0's are added to the start of the number, finally the rightmost 11 characters are taken.
So you would do an UPDATE query replacing the phonenumber column.

SQL match first part of string and compare

I use Oracle and MySql, so if you have any answer in both codes please let me know,
Issue:
I have 2 columns in one table called: USers
Column#1= Names
Column#2= UCNames
This list contain names that are from different sources but partially match like:
Names
Alex Jones Marfex
UCNames
Alex Jonnes Mike Marfex
I want to compare both of the columns and find match based on the following attributes:
Search on the first 4 for letters and 4 last words and to store in new column called: verifiyed
Thanks
This gives you the first word in a string
Substring(Col,1,(Locate (' ',Col + ' ')-1)) First
This gives you the last word in a string
Reverse(Substring(Reverse(Col), 1, Locate(' ',Reverse(Col)) - 1)) Last
So your compare could be
Where
Substring(Col1,1,( Locate(' ',Col1 + ' ')-1))
= Substring(Col2,1,( Locate(' ',Col2 + ' ')-1))
And
Reverse(Substring(Reverse(Col1), 1, Locate(' ',Reverse(Col1)) - 1))
= Reverse(Substring(Reverse(Col2), 1, Locate(' ',Reverse(Col2)) - 1))
I went with words which seemed safer for what you are trying to do with just a bit more effort. If you do want to keep the 4 characters, just replace the Locate es with 4.
Based on the answer by #asantaballa , but using MySQL SUBSTRING_INDEX function :-
WHERE SUBSTRING_INDEX(Col1, ' ', 1) = SUBSTRING_INDEX(Col2, ' ', 1)
AND SUBSTRING_INDEX(Col1, ' ', -1) = SUBSTRING_INDEX(Col2, ' ', -1)
or to crudely check the first 4 characters and the last 4 words:-
WHERE SUBSTRING(Col1, 1, 4) = SUBSTRING(Col2, 1, 4)
AND SUBSTRING_INDEX(Col1, ' ', -4) = SUBSTRING_INDEX(Col2, ' ', -4)

Search for text between delimiters in MySQL

I am trying to extract a certain part of a column that is between delimiters.
e.g. find foo in the following
test 'esf :foo: bar
So in the above I'd want to return foo, but all the regexp functions only return true|false,
is there a way to do this in MySQL
Here ya go, bud:
SELECT
SUBSTR(column,
LOCATE(':',column)+1,
(CHAR_LENGTH(column) - LOCATE(':',REVERSE(column)) - LOCATE(':',column)))
FROM table
Yea, no clue why you're doing this, but this will do the trick.
By performing a LOCATE, we can find the first ':'. To find the last ':', there's no reverse LOCATE, so we have to do it manually by performing a LOCATE(':', REVERSE(column)).
With the index of the first ':', the number of chars from the last ':' to the end of the string, and the CHAR_LENGTH (don't use LENGTH() for this), we can use a little math to discover the length of the string between the two instances of ':'.
This way we can peform a SUBSTR and dynamically pluck out the characters between the two ':'.
Again, it's gross, but to each his own.
This should work if the two delimiters only appear twice in your column. I am doing something similar...
substring_index(substring_index(column,':',-2),':',1)
A combination of LOCATE and MID would probably do the trick.
If the value "test 'esf :foo: bar" was in the field fooField:
MID( fooField, LOCATE('foo', fooField), 3);
I don't know if you have this kind of authority, but if you have to do queries like this it might be time to renormalize your tables, and have these values in a lookup table.
With only one set of delimeters, the following should work:
SUBSTR(
SUBSTR(fooField,LOCATE(':',fooField)+1),
1,
LOCATE(':',SUBSTR(fooField,LOCATE(':',fooField)+1))-1
)
mid(col,
locate('?m=',col) + char_length('?m='),
locate('&o=',col) - locate('?m=',col) - char_length('?m=')
)
A bit compact form by replacing char_length(.) with the number 3
mid(col, locate('?m=',col) + 3, locate('&o=',col) - locate('?m=',col) - 3)
the patterns I have used are '?m=' and '&o'.
select mid(col from locate(':',col) + 1 for
locate(':',col,locate(':',col)+1)-locate(':',col) - 1 )
from table where col rlike ':.*:';
If you know the position you want to extract from as opposed to what the data itself is:
$colNumber = 2; //2nd position
$sql = "REPLACE(SUBSTRING(SUBSTRING_INDEX(fooField, ':', $colNumber),
LENGTH(SUBSTRING_INDEX(fooField,
':',
$colNumber - 1)) + 1)";
This is what I am extracting from (mainly colon ':' as delimiter but some exceptions), as column theline255 in table loaddata255:
23856.409:0023:trace:message:SPY_EnterMessage (0x2003a) L"{#32769}" [0081] WM_NCCREATE sent from self wp=00000000 lp=0023f0b0
This is the MySql code (It quickly did what I want, and is straight forward):
select
time('2000-01-01 00:00:00' + interval substring_index(theline255, '.', 1) second) as hhmmss
, substring_index(substring_index(theline255, ':', 1), '.', -1) as logMilli
, substring_index(substring_index(theline255, ':', 2), ':', -1) as logTid
, substring_index(substring_index(theline255, ':', 3), ':', -1) as logType
, substring_index(substring_index(theline255, ':', 4), ':', -1) as logArea
, substring_index(substring_index(theline255, ' ', 1), ':', -1) as logFunction
, substring(theline255, length(substring_index(theline255, ' ', 1)) + 2) as logText
from loaddata255
and this is the result:
# LogTime, LogTimeMilli, LogTid, LogType, LogArea, LogFunction, LogText
'06:37:36', '409', '0023', 'trace', 'message', 'SPY_EnterMessage', '(0x2003a) L\"{#32769}\" [0081] WM_NCCREATE sent from self wp=00000000 lp=0023f0b0'
This one looks elegant to me. Strip all after n-th separator, rotate string, strip everything after 1. separator, rotate back.
select
reverse(
substring_index(
reverse(substring_index(str,separator,substrindex)),
separator,
1)
);
For example:
select
reverse(
substring_index(
reverse(substring_index('www.mysql.com','.',2)),
'.',
1
)
);
you can use the substring / locate function in 1 command
here is a mice tutorial:
http://infofreund.de/mysql-select-substring-2-different-delimiters/
The command as describes their should look for u:
**SELECT substr(text,Locate(' :', text )+2,Locate(': ', text )-(Locate(' :', text )+2)) FROM testtable**
where text is the textfield which contains "test 'esf :foo: bar"
So foo can be fooooo or fo - the length doesnt matter :).