MySQL script to search for all non-ASCII characters and replace them with a space - mysql

I'm looking for a script that searches for ASCII control characters in the range from DLE up to US (see any ASCII table).
I have code that is supposed to find these characters, but I also want to replace each one with a space at the same time.
Here's my code:
SELECT *
FROM `TABLE_NAME`   -- sample name only
WHERE COLUMN_NAME LIKE '%<1st non-keyboard character>%'   -- sample column name only
   OR COLUMN_NAME LIKE '%<2nd non-keyboard character>%'
   ...
The real query is long because it repeats the LIKE condition for every character from DLE through US.
Any advice will be greatly appreciated.

You can search for the range from DLE to US using REGEXP. With REGEXP you can specify a control character by name, like [.DLE.].
SELECT col
FROM tab
WHERE col REGEXP '[[.DLE.]-[.US.]]';
Unfortunately, there is no way to replace using REGEXP with something like a REG_STR_REPLACE() function (MySQL 8.0 and later do provide REGEXP_REPLACE()). Please refer to this.
Without such a function you have to call REPLACE() several times, which is slow. You have three choices.
1.
SELECT REPLACE(REPLACE(REPLACE(col, CHAR(16), ''), CHAR(17), ''), CHAR(18), '')
FROM ...
2. Install a UDF, as explained on SO.
3. Do the replacement on the client side.
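For the third choice, a minimal client-side sketch in Python might look like the following. The function name replace_control_chars is made up for illustration, and reading the rows from MySQL and writing them back is left to whatever driver you use.

```python
import re

# Control characters DLE (0x10) through US (0x1F), the range asked about.
CONTROL_RANGE = re.compile(r"[\x10-\x1f]")


def replace_control_chars(text: str, replacement: str = " ") -> str:
    """Replace every character in the DLE..US range with `replacement`."""
    return CONTROL_RANGE.sub(replacement, text)


cleaned = replace_control_chars("abc\x10def\x1fghi")
print(repr(cleaned))  # 'abc def ghi'
```

Passing an empty string as the replacement removes the characters instead, which matches the nested REPLACE() example in choice 1.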

Try this one
update tablename
set columnToCheck = replace(columnToCheck , char(146), '');
or
UPDATE tablename
SET columnToCheck = REPLACE(CONVERT(columnToCheck USING ascii), '?', '')
WHERE ...
Source: http://www.xaprb.com/blog/2006/04/14/bad-character-data-in-mysql/

Related

how to handle white spaces in sql

I want to write an SQL query that will fetch all the students who live in a specific Post Code. Following is my query.
SELECT * FROM `students` AS ss WHERE ss.`postcode` LIKE 'SE4 1NA';
Now the issue is that in the database some records are saved without the white space in the postcode, like SE41NA, and some may also be in lowercase, like se41na or se4 1na.
The query gives me different results based on how the record is saved. Is there any way in which I can handle this?
Using regexp is one way to do it. This performs a case insensitive match by default.
SELECT * FROM students AS ss
WHERE ss.postcode REGEXP '^SE4[[:space:]]?1NA$';
[[:space:]]? matches an optional space character.
REGEXP documentation MySQL
Whether case matters depends on the collation of the string/column/database/server. But, you can get around it by doing:
WHERE UPPER(ss.postcode) LIKE 'SE4%1NA'
The % will match any number of characters, including none. It is a bit too general for what you might really need -- but it should work fine in practice.
The more important issue is that your database does not validate the data being put into it. You should fix the application so the postal codes are correct and follow a standard format.
Use a combination of UPPER and REPLACE.
SELECT *
FROM students s
WHERE UPPER(REPLACE(s.postcode, ' ', '')) LIKE '%SE41NA%'
SELECT * FROM students AS ss
WHERE UPPER(REPLACE(ss.postcode, ' ', '')) = 'SE41NA' ;
SELECT *
FROM students AS ss
WHERE UPPER(ss.postcode) LIKE REPLACE(UPPER('SE4 1NA'), ' ', '%');
I propose replacing the spaces in the search term with the '%' placeholder. Also transform the case to upper on both sides of the LIKE operator.
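If the comparison ends up being done in application code, the same normalization idea (upper-case and drop spaces on both sides) can be sketched in Python. The function name normalize_postcode and the sample list are only illustrative, not taken from the question.

```python
def normalize_postcode(postcode: str) -> str:
    """Upper-case the value and drop spaces, like UPPER(REPLACE(col, ' ', ''))."""
    return postcode.replace(" ", "").upper()


# Hypothetical stored values in the different formats described in the question.
stored = ["SE4 1NA", "SE41NA", "se41na", "se4 1na", "SE5 1NA"]
wanted = normalize_postcode("SE4 1NA")

matches = [p for p in stored if normalize_postcode(p) == wanted]
print(matches)
```

The same function could also be applied before saving, so the database only ever holds one format.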

Cleaning out a field of Phone numbers in mySql

I'm not a database guy, but I have mixed-up data in a MySQL database that I inherited.
Some phone numbers are formatted (512) 555-1212 (call it dirty)
Others 5125551212 (call it clean)
I need a SQL statement that says
UPDATE table_name
SET Phone = 'clean' (some sort of cleaning code - regex?)
WHERE Phone = 'Dirty'
Unfortunately there's no regex replace/update in MySQL (before version 8.0, which added REGEXP_REPLACE()). If it's just parentheses and dashes and spaces then some nested REPLACE calls will do the trick:
UPDATE table_name
SET Phone = REPLACE(REPLACE(REPLACE(REPLACE(Phone, '-', ''), ')', ''), '(', ''), ' ', '')
To my knowledge you can't run a regexp to replace data during the update process, only to match rows in a SELECT statement.
Your best bet is to use a scripting language that you're familiar with: retrieve all the entries, use a regex replace with a simple pattern such as [^\d]* to remove the unwanted characters, then update the table with the new values.
Also, see this answer:
How to do a regular expression replace in MySQL?
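A rough sketch of the scripting approach in Python, covering only the string cleaning step. The name clean_phone is hypothetical, and fetching and updating the rows would go through your MySQL driver of choice.

```python
import re


def clean_phone(phone: str) -> str:
    """Remove everything that is not an ASCII digit."""
    return re.sub(r"[^0-9]", "", phone)


print(clean_phone("(512) 555-1212"))  # 5125551212
```

Numbers that are already clean pass through unchanged, so the function can be run over the whole column without a WHERE filter.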

How to remove unconvertable characters to ascii with SELECT in mySQL

I use a CONVERT function in my SELECT statement in order to avert utf8 errors, but MySQL leaves question marks behind. Is there a way to convert the unconvertable characters to blank or space characters?
SELECT MeetId,
ResId,
Special,
CONVERT(proposal USING ascii) as Proposal,
Analysis,
Vote,
Vote_for,
Oppose,
Discret,
Abstain,
gpVote %s
FROM RESO
WHERE RESO.MeetId = %s
As an example a typical result may have this in a field: 'The current issue ?A? is on the table '
What about just using REPLACE:
SELECT
REPLACE(CONVERT('§123' USING ascii), '?', '')
And the Fiddle.
Good luck.
Be careful, sgeddes's solution also removes any genuine question marks from your string!
For example:
SELECT REPLACE(CONVERT('§How are you?' USING ascii), '?', '')
Output will be : How are you
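If the rows pass through client code anyway, one way around this is to replace the non-ASCII characters there, before any '?' substitution happens. This is only a sketch in Python, and the function name non_ascii_to_space is made up for illustration.

```python
def non_ascii_to_space(text: str) -> str:
    """Replace each character outside 0-127 with a space; real '?' marks survive."""
    return "".join(ch if ord(ch) < 128 else " " for ch in text)


print(repr(non_ascii_to_space("§How are you?")))  # ' How are you?'
```

Because the original string is inspected directly, a genuine question mark is never confused with a conversion placeholder.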

Format Phone Numbers in MySQL

I have a bunch of phone numbers in a DB that are formatted as such: (999) 123-3456.
I'm needing them to look like 123-123-1234
Is there any sort of regex or something I can do in MySQL to quickly format all these phone numbers?
Also, frustratingly, some are NOT formatted like that, so I couldn't just apply this to an entire column.
Thanks!
A quick solution would be to run these two queries:
UPDATE table_name set PhoneCol = REPLACE(PhoneCol, '(', '');
UPDATE table_name set PhoneCol = REPLACE(PhoneCol, ') ', '-');
Just write a small PHP script that loops through all the values and updates them. Making that change is pretty simple in PHP. Then just run an update on the row to overwrite the value.
Maybe a two-pass solution:
1. Strip out all non-numeric characters (and spaces).
2. Insert the formatting characters '(', ')', ' ', and '-' into the correct spots
(or better yet, leave them off and format only during SELECT in your reports).
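The two passes could be sketched like this in Python. It assumes 10-digit numbers and the 123-456-7890 target format from the question; format_phone is a made-up name, and anything that does not come out as exactly 10 digits is returned untouched.

```python
import re


def format_phone(phone: str) -> str:
    """Pass 1: keep only digits. Pass 2: insert dashes as 123-456-7890."""
    digits = re.sub(r"[^0-9]", "", phone)
    if len(digits) != 10:
        # Assumption: anything that is not a 10-digit number is left as-is.
        return phone
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


print(format_phone("(999) 123-3456"))  # 999-123-3456
print(format_phone("1234567890"))      # 123-456-7890
```

Since the first pass discards the existing punctuation, it does not matter that some rows are formatted differently from others.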
I had a similar problem, made worse because some phone numbers had dashes and others did not. This is the command that let me update the numbers that did not have the hyphens.
Phone before the command: 1234567890
Phone after command: 123-456-7890
The phone field is called phone_number and is a VARCHAR
The command I used is:
UPDATE database.table
SET phone_number = concat(SUBSTRING(phone_number,1,3) , '-' , SUBSTRING(phone_number,4,3) , '-' , SUBSTRING(phone_number,7,4))
WHERE LOCATE('-', phone_number) = 0;
I think your command could be like this:
UPDATE database.table
SET phone_number = concat(SUBSTRING(phone_number,2,3) , '-' , SUBSTRING(phone_number,7,8));
I would remove the WHERE clause under the assumption that all phones would be formatted with the (). Also, the second string of characters would start from position 7 because there appears to be a space after the parentheses.

How can I find non-ASCII characters in MySQL?

I'm working with a MySQL database that has some data imported from Excel. The data contains non-ASCII characters (em dashes, etc.) as well as hidden carriage returns or line feeds. Is there a way to find these records using MySQL?
MySQL provides comprehensive character set management that can help with this kind of problem.
SELECT whatever
FROM tableName
WHERE columnToCheck <> CONVERT(columnToCheck USING ASCII)
The CONVERT(col USING charset) function turns the unconvertable characters into replacement characters. Then, the converted and unconverted text will be unequal.
See this for more discussion. https://dev.mysql.com/doc/refman/8.0/en/charset-repertoire.html
You can use any character set name you wish in place of ASCII. For example, if you want to find out which characters won't render correctly in code page 1257 (Lithuanian, Latvian, Estonian) use CONVERT(columnToCheck USING cp1257)
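The same compare-after-conversion idea can be tried outside the database. Below is a small Python sketch; has_non_ascii is a hypothetical helper name. Like the query above, it flags non-ASCII characters only; carriage returns and line feeds are ASCII and pass this check.

```python
def has_non_ascii(text: str) -> bool:
    """Convert to ASCII with replacement characters and compare with the original."""
    converted = text.encode("ascii", errors="replace").decode("ascii")
    return converted != text


print(has_non_ascii("plain text"))     # False
print(has_non_ascii("em\u2014dash"))   # True
```

Swapping "ascii" for another codec name, such as "cp1257", gives the analogue of CONVERT(columnToCheck USING cp1257).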
You can define ASCII as all characters that have a decimal value of 0 - 127 (0x00 - 0x7F) and find columns with non-ASCII characters using the following query
SELECT * FROM TABLE WHERE NOT HEX(COLUMN) REGEXP '^([0-7][0-9A-F])*$';
This was the most comprehensive query I could come up with.
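To see why the HEX pattern works, here is the same test sketched in Python on raw bytes: every byte becomes two hex digits, and a byte is in the range 0x00 to 0x7F exactly when its first hex digit is 0 through 7. The name is_ascii_bytes is made up for illustration.

```python
import re

# Same pattern as the query: pairs of hex digits whose first digit is 0-7.
ASCII_HEX = re.compile(r"([0-7][0-9A-F])*")


def is_ascii_bytes(data: bytes) -> bool:
    """True when every byte of `data` is in the range 0x00-0x7F."""
    return ASCII_HEX.fullmatch(data.hex().upper()) is not None


print(is_ascii_bytes(b"hello"))              # True
print(is_ascii_bytes("é".encode("utf-8")))   # False
```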
It depends exactly what you're defining as "ASCII", but I would suggest trying a variant of a query like this:
SELECT * FROM tableName WHERE columnToCheck REGEXP '[^A-Za-z0-9]';
That query will return all rows where columnToCheck contains any non-alphanumeric characters. If you have other characters that are acceptable, add them to the negated character class in the regular expression. For example, if periods, commas, and hyphens are OK, change the query to:
SELECT * FROM tableName WHERE columnToCheck REGEXP '[^A-Za-z0-9.,-]';
The most relevant page of the MySQL documentation is probably 12.5.2 Regular Expressions.
This is probably what you're looking for:
select * from TABLE where COLUMN regexp '[^ -~]';
It should return all rows where COLUMN contains non-ASCII characters (or non-printable ASCII characters such as newline).
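The character class [^ -~] means "anything outside the range from space (0x20) to tilde (0x7E)", that is, anything that is not printable ASCII. A quick way to experiment with it outside MySQL is a Python sketch like this one, where has_unexpected_chars is a made-up name:

```python
import re

# Anything outside space (0x20) .. tilde (0x7E), i.e. not printable ASCII.
NOT_PRINTABLE_ASCII = re.compile(r"[^ -~]")


def has_unexpected_chars(text: str) -> bool:
    """True if the text contains a non-ASCII or non-printable character."""
    return NOT_PRINTABLE_ASCII.search(text) is not None


print(has_unexpected_chars("Hello, world!"))   # False
print(has_unexpected_chars("line\nbreak"))     # True
print(has_unexpected_chars("GR\u00dcNBERG"))   # True
```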
One missing character from everyone's examples above is the termination character (\0). This is invisible to the MySQL console output and is not discoverable by any of the queries heretofore mentioned. The query to find it is simply:
select * from TABLE where COLUMN like '%\0%';
Based on the correct answer, but taking into account ASCII control characters as well, the solution that worked for me is this:
SELECT * FROM `table` WHERE NOT `field` REGEXP "[\\x00-\\xFF]|^$";
It does the same thing: searches for violations of the ASCII range in a column, but lets you search for control characters too, since it uses hexadecimal notation for code points. Since there is no comparison or conversion (unlike @Ollie's answer), this should be significantly faster, too. (Especially if MySQL does early-termination on the regex query, which it definitely should.)
It also avoids returning fields that are zero-length. If you want a slightly-longer version that might perform better, you can use this instead:
SELECT * FROM `table` WHERE `field` <> "" AND NOT `field` REGEXP "[\\x00-\\xFF]";
It does a separate check for length to avoid zero-length results, without considering them for a regex pass. Depending on the number of zero-length entries you have, this could be significantly faster.
Note that if your default character set is something bizarre where 0x00-0xFF don't map to the same values as ASCII (is there such a character set in existence anywhere?), this would return a false positive. Otherwise, enjoy!
Try using this query to search for records with special characters:
SELECT *
FROM tableName
WHERE fieldName REGEXP '[^a-zA-Z0-9#:. \'\-`,\&]'
@zende's answer was the only one that covered columns with a mix of ASCII and non-ASCII characters, but it also had that problematic hex thing. I used this:
SELECT * FROM `table` WHERE NOT `column` REGEXP '^[ -~]+$' AND `column` !=''
In Oracle, we can use the following:
SELECT * FROM TABLE_A WHERE ASCIISTR(COLUMN_A) <> COLUMN_A;
For this question we can also use this method.
Question from SQLZoo:
Find all details of the prize won by PETER GRÜNBERG
Non-ASCII characters
ans: SELECT * FROM nobel WHERE winner LIKE 'P% GR%_%berg';