I have around 200,000 records of data with phone numbers, but the numbers are inconsistent.
For example, some are 10 digits (missing the leading 0), some contain spaces, some have a '-' in the middle, and some begin with '+44' instead of 0.
Is there a way in MySQL to handle all of these cases and cleanse the data in one query?
Without sample data or an example of the expected output this is speculative; I am assuming you want the output in the format 01234567891.
Use a combination of the LENGTH, REPLACE and LEFT functions to resolve the four issues you highlighted:
Missing 0 at the beginning.
Spaces in the string.
-'s in the string.
+44 rather than 0.
SELECT CASE WHEN LENGTH(REPLACE(REPLACE(numberfield, '-', ''), ' ', '')) = 10
THEN CONCAT('0', REPLACE(REPLACE(numberfield, '-', ''), ' ', ''))
WHEN LEFT(REPLACE(REPLACE(numberfield, '-', ''), ' ', ''), 3) = '+44'
THEN REPLACE(REPLACE(REPLACE(numberfield, '-', ''), ' ', ''), '+44', '0')
-- otherwise only strip dashes and spaces (without an ELSE the CASE returns NULL)
ELSE REPLACE(REPLACE(numberfield, '-', ''), ' ', '')
END AS Cleannumber
FROM yourtable
Assuming the phone number field is a string - the following should deal with the conditions you specified :
RIGHT( LPAD( REPLACE( REPLACE( REPLACE(phonenumber, '-', ''), '+44', ''), ' ', ''), 11, '0' ), 11 )
First any '-' characters are removed, then '+44' is removed, then spaces are removed. LPAD then left-pads the result with '0' characters up to a length of 11, and finally the rightmost 11 characters are taken. Note that phonenumber is the column name here, not a quoted string.
So you would do an UPDATE query replacing the phonenumber column.
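Before running an UPDATE across 200,000 rows, it can help to check the intended rules against a few sample values outside the database. The following is a minimal Python sketch of the four rules listed above, not the SQL itself. The function name `normalise_uk_number` and the sample inputs are made up for illustration, and it assumes the target format is 11 digits with a leading 0.

```python
def normalise_uk_number(raw: str) -> str:
    """Apply the four clean-up rules to one phone number string."""
    # Drop spaces and dashes.
    digits = raw.replace(" ", "").replace("-", "")
    # A leading +44 stands in for the leading 0.
    if digits.startswith("+44"):
        digits = "0" + digits[3:]
    # A 10-digit number is assumed to be missing its leading 0.
    if len(digits) == 10:
        digits = "0" + digits
    return digits


# Hypothetical sample values, one per rule.
samples = ["1234567891", "01234 567891", "01234-567891", "+441234567891"]
for sample in samples:
    print(sample, "->", normalise_uk_number(sample))
```

Values that come out of this check with a length other than 11 are the ones worth inspecting by hand before updating the table.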
I have a table of UK postcodes. They are in different formats: some are capitalized and contain white space, some are not. I want to format them so that they follow the UK postcode standard, for instance AB1 2BB.
I used this query for the purpose which does work, but some postcodes have a longer or shorter first part so it does not succeed for all.
SELECT UPPER(INSERT((REPLACE(postcode , ' ', '')) , 4, 0, ' ')) AS postcode
However if I try to do it the other way around
SELECT UPPER(INSERT((REPLACE(postcode , ' ', '')) , -4, 0, ' ')) AS postcode
It does not work and returns all the postcodes glued together e.g AB12BB
What I want is to put a space before the last 3 characters.
I think you want:
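select concat_ws(' ',
-- everything except the last 3 characters, so longer or shorter first parts also work
left(replace(postcode, ' ', ''), length(replace(postcode, ' ', '')) - 3),
right(replace(postcode, ' ', ''), 3)
) as standardized_postcode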
Insert a space before the third character from the end of the string, after removing all the spaces:
SELECT UPPER(INSERT(REPLACE(postcode , ' ', ''), LENGTH(REPLACE(postcode , ' ', '')) - 2, 0, ' ')) AS postcode
It sounds like you're going to be dealing with postcodes like this:
LS10 1DH
LS101DH
LS63DR
etc. We should start by removing all of the spaces:
REPLACE(postcode,' ','') -- LS10 1DH becomes LS101DH
taking the last 3 characters:
RIGHT(REPLACE(postcode,' ',''), 3) -- 1DH
and all of the characters up to the 3rd from last:
LEFT(REPLACE(postcode,' ',''), LENGTH(REPLACE(postcode,' ','')) - 3) -- LS10
Then use CONCAT to bring it all together:
SET @postcode = 'LS101DH';
SELECT CONCAT(LEFT(REPLACE(@postcode,' ',''), LENGTH(REPLACE(@postcode,' ','')) - 3),
' ', -- add a space in
RIGHT(REPLACE(@postcode,' ',''), 3));
for pc in postcodes:
    print('{} {}'.format(pc[:-3], pc[-3:]))
Seems to work for a given list of UK postcodes for me.
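A slightly fuller version of the same idea, as a sketch: strip any existing spaces, upper-case the result, and put a single space before the last three characters. The function name `format_postcode` is made up for illustration, and the code assumes every input has at least one character before its final three. It does not validate that the result is a real postcode.

```python
def format_postcode(raw: str) -> str:
    """Upper-case a postcode and put one space before its last 3 characters."""
    compact = raw.replace(" ", "").upper()
    # The part after the space (the inward code) is always 3 characters long.
    return "{} {}".format(compact[:-3], compact[-3:])


for pc in ["LS10 1DH", "LS101DH", "ls63dr", "ab12bb"]:
    print(format_postcode(pc))
```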
I have a table 'car_purchases' with a 'description' column. The column is a string that includes first name initial followed by full stop, space and last name.
An example of the Description column is
'Car purchased by J. Blow'
I am using 'substring_index' function to extract the letter preceding the '.' in the column string. Like so:
SELECT
Description,
SUBSTRING_INDEX(Description, '.', 1) as TrimInitial,
SUBSTRING_INDEX(
SUBSTRING_INDEX(Description, '.', 1),' ', -1) as trimmed,
length(SUBSTRING_INDEX(
SUBSTRING_INDEX(Description, '.', 1),' ', -1)) as length
from car_purchases;
I will call this query 1.
picture of the result set (Result 1) is as follows
As you can see the problem is that the 'trimmed' column in the select statement starts counting the 2nd delimiter ' ' instead of the first from the right and produces the result 'by J' instead of just 'J'. Further the length column indicates that the string length is 5 instead of 4 so WTF?
However when I perform the following select statement;
select SUBSTRING_INDEX(
SUBSTRING_INDEX('Car purchased by J. Blow', '.', 1),' ', -1); -- query 2
Result = 'J' as 'Result 2'.
As you can see from result 1 the string in column 'Description' is exactly (as far as I can tell) the same as the string from 'Result 2'. But when the substring_index is performed on the column (instead of just the string itself) the result ignores the first delimiter and selects a string from the 2nd delimiter from the right of the string.
I've racked my brains over this and have tried 'by ' and ' by' as delimiters but both options do not produce the desired result of a single character. I do not want to add further complexity to query 1 by using a trim function. I've also tried the cast function on result column 'trimmed' but still no success. I do not want to concat it either.
There is an anomaly in the 'length' column of query 1 where if I change the length function to char_length function like so:
select length(SUBSTRING_INDEX(
SUBSTRING_INDEX(Description, '.', 1),' ', -1)) as length -- result = 5
select char_length(SUBSTRING_INDEX(
SUBSTRING_INDEX(Description, '.', 1),' ', -1)) as length -- result = 4
Can anyone please explain to me why the above select statement would produce 2 different results? I think this is the reason why I am not getting my desired result.
But just to be clear my desired outcome is to get 'J' not 'by J'.
I guess I could try reverse, but I don't think this is an acceptable compromise. Also, I am not familiar with collation and charset principles, except that I just use the defaults.
Cheers Players!!!!
CHAR_LENGTH returns the length in characters, so a string of four 2-byte characters returns 4. LENGTH, however, returns the length in bytes, so the same string returns 8. The discrepancy in your results (including the SUBSTRING_INDEX output) says that the "space" between by and J is not actually a single-byte space (ASCII 0x20) but a multi-byte character that looks like a space. To work around this, you could try replacing all non-ASCII characters with spaces using CONVERT and REPLACE. In this example, I have an en-space Unicode character in the string between by and J. The CONVERT changes that to a ?, and the REPLACE then converts that to a space:
SELECT SUBSTRING_INDEX( SUBSTRING_INDEX("Car purchased by J. Blow", '.', 1),' ', -1)
Output:
by J
With CONVERT and REPLACE:
SELECT SUBSTRING_INDEX( SUBSTRING_INDEX(REPLACE(CONVERT("Car purchased by J. Blow" USING ASCII), '?', ' '), '.', 1),' ', -1)
Output
J
For your query, you would replace the string with your column name i.e.
SELECT SUBSTRING_INDEX( SUBSTRING_INDEX(REPLACE(CONVERT(description USING ASCII), '?', ' '), '.', 1),' ', -1)
Demo on DBFiddle
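The character-count versus byte-count difference can also be reproduced outside MySQL. This is a small Python sketch, not the asker's data: it assumes the look-alike character is a no-break space (U+00A0), chosen only because it takes two bytes in UTF-8 and so gives the same 4 versus 5 result. The helper name `to_ascii_spaces` is made up for illustration.

```python
def to_ascii_spaces(text: str) -> str:
    """Replace every non-ASCII character with a plain ASCII space."""
    return "".join(ch if ch.isascii() else " " for ch in text)


# Assumption: a no-break space (U+00A0) sits between "by" and "J".
description = "Car purchased by\u00a0J. Blow"

before_dot = description.split(".")[0]   # like SUBSTRING_INDEX(..., '.', 1)
token = before_dot.split(" ")[-1]        # like SUBSTRING_INDEX(..., ' ', -1)

char_count = len(token)                  # counts characters, like CHAR_LENGTH
byte_count = len(token.encode("utf-8"))  # counts bytes, like LENGTH
cleaned = to_ascii_spaces(before_dot).split(" ")[-1]

print(repr(token), char_count, byte_count)
print(repr(cleaned))
```

Splitting on an ASCII space does not see the no-break space as a delimiter, which is why the uncleaned token still contains "by".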
I need a little help in queries.
I have a table like this:
id value
1 rs-123
2 rsa-123
I need to get the first row if the user queries in following ways : rs123, rs-123, rs 123 (either using or skipping the space and dash).
WHERE REPLACE(REPLACE(value, '-', ''), ' ', '')
= REPLACE(REPLACE($val , '-', ''), ' ', '')
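The idea is to strip dashes and spaces from both sides before comparing. Here is a minimal Python sketch of that comparison against the two sample rows. The names `normalise_key` and `find_ids` are made up for illustration, and the dictionary stands in for the table.

```python
def normalise_key(value: str) -> str:
    """Remove dashes and spaces so that 'rs-123', 'rs 123' and 'rs123' compare equal."""
    return value.replace("-", "").replace(" ", "")


# Stand-in for the table: id -> value.
rows = {1: "rs-123", 2: "rsa-123"}


def find_ids(query: str) -> list:
    """Return the ids whose normalised value equals the normalised query."""
    wanted = normalise_key(query)
    return [row_id for row_id, value in rows.items() if normalise_key(value) == wanted]


for query in ["rs123", "rs-123", "rs 123"]:
    print(query, "->", find_ids(query))
```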
In the below piece of code, I am creating an Address field by concatenating various parts of an address.
However, if for instance address2 was empty, the trailing , will still be concatenated into Address.
This means if all fields were empty, I end up with a result of ,,,,.
If address1 is "House Number" and everything else is empty, I end up with House Number,,,,.
CONCAT( COALESCE(address1,'') , ', ' ,
COALESCE(address2,'') , ', ' ,
COALESCE(address3,'') , ', ' ,
COALESCE(city,'') , ', ' ,
COALESCE(zip, '')
) AS Address,
Is there some way of conditionally placing the commas between address parts only if the content of an address part is not empty?
Such as something along the lines of (pseudo-code) IF(address1) is NULL use '' ELSE use ','
Thank you.
CONCAT_WS(', ',
IF(LENGTH(`address1`),`address1`,NULL),
IF(LENGTH(`address2`),`address2`,NULL),
IF(LENGTH(`address3`),`address3`,NULL),
IF(LENGTH(`city`),`city`,NULL),
IF(LENGTH(`zip`),`zip`,NULL)
)
Using CONCAT_WS as Mat says is a very good idea, but I thought I'd do it a different way, with messy IF() statements:
CONCAT( COALESCE(address1,''), IF(LENGTH(address1), ', ', ''),
COALESCE(address2,''), IF(LENGTH(address2), ', ', ''),
COALESCE(address3,''), IF(LENGTH(address3), ', ', ''),
COALESCE(city,''), IF(LENGTH(city), ', ', ''),
COALESCE(zip,'')
) AS Address,
The IF()s check whether the field has a length and, if so, return a comma. Otherwise, they return an empty string. No separator is needed after zip because it is the last part.
Try MAKE_SET:
SELECT MAKE_SET(31,`address1`,`address2`,`address3`,`city`,`zip`) AS Address
It returns a string with all the non-NULL values separated by ','. The first argument is a bitmask, so 31 (binary 11111) selects all five columns.
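To see how the bitmask argument picks values, here is a rough Python model of MAKE_SET. It is a sketch for illustration only: bit 0 selects the first string, bit 1 the second, and so on, and NULL (None here) values are skipped. The sample address values are invented.

```python
def make_set(bits: int, *values):
    """Rough model of MySQL's MAKE_SET(bits, str1, str2, ...)."""
    chosen = [
        value
        for position, value in enumerate(values)
        # Keep a value only if its bit is set and it is not NULL.
        if bits & (1 << position) and value is not None
    ]
    return ",".join(chosen)


# 31 is binary 11111, so all five positions are selected.
print(make_set(31, "1 High St", None, None, "Leeds", "LS1 1AA"))
# 5 is binary 101, so only the first and third values are selected.
print(make_set(5, "hello", "nice", "world"))
```

Note that the separator is a bare comma with no following space, so the output differs slightly from the ', ' used in the other answers.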
CONCAT_WS(', ',
NULLIF(`address1`,''),
NULLIF(`address2`,''),
NULLIF(`address3`,''),
NULLIF(`city`,''),
NULLIF(`zip`,'')
)
CONCAT_WS combines non-NULL strings.
NULLIF returns NULL if its two arguments are equal. Here that happens when the value is an empty string '', so empty parts are skipped along with NULL ones.
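The combined effect of NULLIF and CONCAT_WS can be sketched in Python as "drop the missing and empty parts, then join what is left". The function name `join_address` and the sample values are made up for illustration.

```python
def join_address(*parts):
    """Join the parts with ', ', skipping None (NULL) and empty strings."""
    kept = [part for part in parts if part is not None and part != ""]
    return ", ".join(kept)


print(join_address("House Number", "", None, "", None))
print(join_address("1 High St", "", "", "Leeds", "LS1 1AA"))
```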
I have a MySQL database with negative numbers that are enclosed in parentheses,
e.g. (14,500), which is supposed to be -14500.
I am storing the numbers as varchar. I am trying to convert all the numbers to a double or float format and also format the negative numbers with a minus sign.
My code:
select case
when substr(gross_d,1,1) = '(' then
ltrim('(') and rtrim(')') *-1
else
(gross_d)
end gross_d_num
from buy;
convert(gross_d_num,Double);
The problem with my current method is that all the negative numbers in parentheses are converted to zero. Is there a different method to get my result?
Edit:
I also removed the *-1 to see if the parentheses are removed, and I still get a value of zero.
Something like
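CONVERT(
CASE
-- strip the parentheses and the thousands separator, then prepend the minus sign
WHEN gross_d LIKE '(%)' THEN CONCAT('-', REPLACE(REPLACE(REPLACE(gross_d, ')', ''), '(', ''), ',', ''))
ELSE REPLACE(gross_d, ',', '')
END, DECIMAL(19,6))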
Here you are trimming the literal strings '(' and ')' rather than the gross_d column. Neither string is numeric, so the whole expression evaluates to zero:
ltrim('(') and rtrim(')') *-1
CONVERT(
IF( gross_d LIKE '(%)'
,CONCAT( '-', REPLACE( SUBSTR( gross_d, 2, LENGTH( gross_d ) - 2 ), ',', '' ) )
,REPLACE( gross_d, ',', '' ) )
,DECIMAL );
At our company we don't have control over the currency formatting used by external parties uploading Excel sheets. We currently use this to convert the currencies, and add a case whenever something new shows up:
SET @netSale := '$ (154.00)';
SELECT CONVERT (
CASE
when @netSale LIKE '$ (%)' THEN CONCAT('-', REPLACE(REPLACE(REPLACE(@netSale, '$ ', ''), ')', ''), '(', ''))
when @netSale LIKE '(%)' THEN CONCAT('-', REPLACE(REPLACE(REPLACE(@netSale, '$ ', ''), ')', ''), '(', ''))
else REPLACE(REPLACE(@netSale,'$',''),',', '')
END, DECIMAL(10,2)
)
This deals with most formatting styles we have encountered and is especially useful when loading a converted CSV file to a table.
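If the files are cleaned before they reach the database, the same rules can be applied in a loading script. Below is a minimal Python sketch under stated assumptions: amounts use '$' as the only currency symbol, ',' as the thousands separator, and parentheses to mark negatives. The function name `parse_amount` is made up for illustration.

```python
from decimal import Decimal


def parse_amount(raw: str) -> Decimal:
    """Convert strings such as '$ (154.00)' or '(14,500)' to a signed Decimal."""
    text = raw.strip()
    # Accounting style: a value wrapped in parentheses is negative.
    negative = "(" in text and text.endswith(")")
    # Remove the currency symbol, parentheses, thousands separators and spaces.
    for ch in "$(), ":
        text = text.replace(ch, "")
    value = Decimal(text)
    return -value if negative else value


for sample in ["$ (154.00)", "(14,500)", "$1,234.50", "99"]:
    print(sample, "->", parse_amount(sample))
```

Anything that does not match these assumptions raises an exception from Decimal instead of silently becoming zero, which makes new formats easier to spot.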