Basically, I have a problem with the REPLACE() function in MySQL (via phpMyAdmin). One table got messed up and some special characters (each followed by an empty space) appeared inside words. So all I wanted to do was:
UPDATE myTable SET columnName =
(replace(columnName, 'Å house',
'house'))
But MySQL returns
0 row(s) affected. ( Query took 0.0107 sec )
The same happens when I try to replace the names of foreign towns that contain special characters (Swedish towns, German towns, etc.).
Am I doing something wrong?
Å house
Is likely to actually be:
Å house
That is, with a U+00A0 NO-BREAK SPACE character and not a normal space. Normally you cannot see the difference, but a string replace can, so a search string typed with an ordinary space won't match it.
This was probably originally just a single non-breaking-space character, that has been mangled through a classic UTF-8-read-as-ISO-8859-1 encoding screw-up. Other non-ASCII characters in your database are likely to have been similarly messed up.
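The mangling described above can be reproduced outside the database. The following is a minimal Python sketch, not the poster's actual data: it assumes the stored text began as a single no-break space, and the variable names are made up for illustration.

```python
# UTF-8 encodes U+00A0 (NO-BREAK SPACE) as the two bytes C2 A0.
nbsp = "\u00a0"
utf8_bytes = nbsp.encode("utf-8")

# Reading those two bytes as ISO-8859-1 yields two characters:
# U+00C2 (A with circumflex) followed by U+00A0 (still a no-break space).
mangled = utf8_bytes.decode("iso-8859-1")

# What ends up stored, versus what gets typed into the query
# (the typed version uses an ordinary U+0020 space).
stored = mangled + "house"
typed = "\u00c2 house"

print(utf8_bytes.hex())                # c2a0
print([hex(ord(c)) for c in mangled])  # ['0xc2', '0xa0']
print(typed in stored)                 # False
```

The last line is the same situation as the `0 row(s) affected` result: the search string differs from the stored one by a single invisible character.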
I have a table in Microsoft Access with words that contain Romanian characters. I want to query for words that contain ț (Latin small letter T with cedilla).
The following query gives back all the entries of my table "words", just as if it was another wildcard:
SELECT words.[word]
FROM words
WHERE (((words.[word]) Like "*ț*"));
Any idea how to search for words that contain that character?
By the way, if I search for words with "ă" (Latin small letter a with breve) it works as expected.
The reason seems to be that "Like" works in ASCII and turns every character it can't represent into a question mark, which it then treats as a single-character wildcard. Try this instead:
SELECT words.[word]
FROM words
WHERE instr(words.[word],"ț")>0
Banging my head against the wall with this one.
I have a table containing postcodes and street names, and another table where houses are listed for sale (where the street name is missing). I am trying to get the street name for each postcode.
The problem is that table 1 stores the postcode without the space and table 2 which I am trying to update stores the post code with the space.
So in table 1 the postcode is stored as "l249pb" and table 2 it is stored as "l24 9pb".
Now if the postcodes were both stored in exactly the same format, i.e. without the space, I would expect this query to work:
UPDATE Table1
INNER JOIN Table2 ON ( Table1.PostCode = Table2.PostCode )
SET Table1.StreetName = Table2.StreetName
I have tried this but it won't work:
UPDATE Table1
INNER JOIN Table2 ON ( Table1.PostCode = REPLACE(Table2.PostCode,' ',''))
SET Table1.StreetName = Table2.StreetName
Can anyone tell me how to check for a match while ignoring spaces (like a trim, but removing every space)?
Many thanks for any help you can offer.
With the data you've given your UPDATE runs just fine. Probably the whitespaces you see are not actually spaces, but something else, e.g. non-breaking spaces, tabs etc.
After the normal SPACE, the next most common whitespace characters (which are not line breaks) are CHARACTER TABULATION (i.e. horizontal tab) and NO-BREAK SPACE. You could use CHAR(9) and CHAR(160), respectively, to reference them in your query.
It also might be possible that your table viewer application shows line breaks as a space for brevity, so if replacing space, tab and nbsp isn't enough, try replacing those, too.
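Before deciding which characters to replace, it helps to find out which whitespace characters are actually in the data. The sketch below assumes the suspect values have been pulled into Python strings; `describe_whitespace` and the sample postcodes are hypothetical and are not taken from the poster's table.

```python
import unicodedata


def describe_whitespace(text):
    """Return (code point, Unicode name) for each whitespace character in text."""
    found = []
    for ch in text:
        if ch.isspace():
            # Control characters such as TAB have no Unicode name,
            # so a fallback label is supplied instead of raising ValueError.
            name = unicodedata.name(ch, "<control>")
            found.append((f"U+{ord(ch):04X}", name))
    return found


# Four postcodes that look the same in most table viewers.
samples = ["l24 9pb", "l24\u00a09pb", "l24\t9pb", "l24\u20049pb"]
for sample in samples:
    print(repr(sample), describe_whitespace(sample))
```

Only the first sample contains the character that `REPLACE(PostCode, ' ', '')` removes.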
If you really need to replace all whitespace characters… Unfortunately there is no "whitespace wildcard" to use in MySQL. Technically, you could make a monster REPLACE(REPLACE(REPLACE(REPLACE…-call, which, in the end, would replace all whitespace characters with ''. For example, to replace every THREE-PER-EM SPACE, first look up its Unicode code point (U+2004), then replace its occurrences with e.g.:
REPLACE(PostCode, CHAR(0x2004 using ucs2), '')
There is a hackish shortcut to this: if you are sure that your data should contain only Latin-1 characters and no ? (question mark), you could CONVERT() the string to latin1 first, which turns every character that latin1 cannot represent into ?, and then replace every ? with '':
REPLACE(CONVERT(PostCode using latin1), '?', '')
This can be useful in one-off, manual queries, but for continuing use, better replace the characters explicitly.
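One caveat with this shortcut is easy to miss: the no-break space itself exists in Latin-1, so a Latin-1 conversion leaves it alone. The sketch below uses Python's `latin-1` codec as a stand-in for the conversion. That MySQL's latin1 treats these two characters the same way is an assumption here, and `latin1_question_marks` is a made-up helper name.

```python
def latin1_question_marks(text):
    """Replace every character Latin-1 cannot represent with '?'."""
    # errors="replace" substitutes "?" for unencodable characters.
    return text.encode("latin-1", errors="replace").decode("latin-1")


three_per_em = "l24\u20049pb"   # U+2004 is outside Latin-1
no_break = "l24\u00a09pb"       # U+00A0 is inside Latin-1 (byte 0xA0)

print(repr(latin1_question_marks(three_per_em)))  # 'l24?9pb'
print(repr(latin1_question_marks(no_break)))      # 'l24\xa09pb'
```

So the `?` trick would catch the exotic spaces, but a no-break space would probably still need its own explicit replacement.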
But first you should check your data input sanitizer/validator, so future records won't be such a mess. Perhaps you could consider running a bulk replace to normalize the data on PostCode column(s), if possible, before even trying to do your join query. Legacy systems with legacy data only get worse over time.
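If the normalization is done in application code rather than in SQL, removing every kind of whitespace takes one line. This is only a sketch of the idea: `normalize_postcode` and the street lookup are hypothetical, and it assumes postcodes never legitimately contain whitespace.

```python
def normalize_postcode(raw):
    """Remove every Unicode whitespace character from a postcode."""
    # str.split() with no arguments splits on any Unicode whitespace,
    # including TAB, NO-BREAK SPACE and THREE-PER-EM SPACE.
    return "".join(raw.split())


# Hypothetical lookup built from the table that stores postcodes without spaces.
streets_by_postcode = {"l249pb": "Example Street"}

for listed in ["l24 9pb", "l24\u00a09pb", "l24\t9pb", "l24\u20049pb"]:
    key = normalize_postcode(listed)
    print(repr(listed), "->", key, "->", streets_by_postcode.get(key))
```

Running the same normalization once over the stored column, as suggested above, means the join no longer needs any REPLACE() at all.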
Somewhere along the way, between all the imports and exports I have done, a lot of the text on a blog I run is full of weird accented A characters.
When I export the data using mysqldump and load it into a text editor with the intention of using search-and-replace to clear out the bad characters, searching just matches every "a" character.
Does anyone know any way I can successfully hunt down these characters and get rid of them, either directly in MySQL or by using mysqldump and then reimporting the content?
This is an encoding problem; the Â is a non-breaking space (HTML entity &nbsp;) in Unicode being displayed in Latin1.
You might try something like this... first we check to make sure the matching is working:
SELECT * FROM some_table WHERE some_field LIKE BINARY '%Â%'
This should return any rows in some_table where some_field has a bad character. Assuming that works properly and you find the rows you're looking for, try this:
UPDATE some_table SET some_field = REPLACE( some_field, BINARY 'Â', '' )
And that should remove those characters (based on the page you linked, you don't really want an nbsp there as you would end up with three spaces in a row between sentences etc, you should only have one).
If it doesn't work then you'll need to look at the encoding and collation being used.
EDIT: Just added BINARY to the strings; this should hopefully make it work regardless of encoding.
The accepted answer did not work for me.
From here http://nicj.net/mysql-converting-an-incorrect-latin1-column-to-utf8/ I have found that the binary code for the Â character is c2a0 (by converting the column to VARBINARY and looking at what it turns into).
Then here http://www.oneminuteinfo.com/2013/11/mysql-replace-non-ascii-characters.html found the actual solution to remove (replace) it:
update entry set english_translation = unhex(replace(hex(english_translation),'C2A0','20')) where entry_id = 4008;
The query above replaces it with a space, after which a normal trim can be applied; alternatively, simply replace it with '' instead.
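One thing to watch with the hex-and-replace approach is that the hex string is searched character by character, so `C2A0` can also be found straddling two unrelated bytes. The Python sketch below shows the difference between replacing in the hex text and replacing the byte pair directly. The sample values are made up, and it only mirrors the logic of the query above rather than running it.

```python
# Intended case: a UTF-8 no-break space (bytes C2 A0) becomes a plain space.
text_bytes = "caf\u00e9\u00a0bar".encode("utf-8")
cleaned = text_bytes.replace(b"\xc2\xa0", b" ")
print(cleaned.decode("utf-8"))  # café bar

# Problem case: the bytes 4C 2A 0A ("L*" plus a newline) read as the hex
# string "4C2A0A", which contains "C2A0" starting in the middle of a byte.
tricky = b"L*\n"
via_hex = bytes.fromhex(tricky.hex().upper().replace("C2A0", "20"))
via_bytes = tricky.replace(b"\xc2\xa0", b" ")

print(via_hex)    # b'B\n'  (data silently changed)
print(via_bytes)  # b'L*\n' (left alone, as it should be)
```

Checking the affected rows with a SELECT before running the UPDATE is a cheap way to rule this out.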
I have had this problem and it is annoying, but solvable. As well as Â you may find you have a whole load of characters showing up in your data like these:
“
This is connected to encoding changes in the database, but so long as you do not have any of these characters in your database that you want to keep (e.g. if you are actually using a Euro symbol) then you can strip them out with a few MySQL commands as previously suggested.
In my case I had this problem with a WordPress database that I had inherited, and I found a useful set of ready-made queries that work for WordPress here http://digwp.com/2011/07/clean-up-weird-characters-in-database/
It's also worth noting that one of the causes of the problem in the first place is opening a database in a text editor which might change the encoding in some way. So if you can possibly manipulate the database using MySQL only and not a text editor this will reduce the risk of causing further trouble.
How can I detect and delete rows with Chinese characters in MySQL?
Here is the table "Chinese_Test", which contains Chinese characters, as shown in phpMyAdmin (data and structure views).
Notice that the collation is utf8, so let's take a look at how Chinese characters are encoded in UTF-8:
http://www.ansell-uebersetzungen.com/gbuni.html
Notice that the UTF-8 encoding of a Chinese character starts with a byte from E4 to E9, hence we use this query:
select number
from Chinese_Test
where HEX(contents) REGEXP '^(..)*(E[4-9])';
and here is the result:
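The `^(..)*` prefix in that pattern matters: it forces the match to start on a byte boundary in the hex string. The Python sketch below mirrors the same check outside MySQL; the function names and test strings are invented for illustration. The lead bytes E4 to E9 cover code points U+4000 through U+9FFF, so a few non-ideograph symbols in that range would match as well.

```python
import re

# Same pattern as the query: an even number of hex digits, then E4..E9.
ALIGNED = re.compile(r"^(..)*(E[4-9])")
# For comparison: the same test without the byte-alignment prefix.
UNALIGNED = re.compile(r"E[4-9]")


def utf8_hex(text):
    """Upper-case hex of the UTF-8 encoding, like MySQL's HEX() on a utf8 column."""
    return text.encode("utf-8").hex().upper()


def has_e4_to_e9_lead_byte(text):
    return ALIGNED.search(utf8_hex(text)) is not None


print(utf8_hex("中"), has_e4_to_e9_lead_byte("中"))    # E4B8AD True
print(utf8_hex("nA"), has_e4_to_e9_lead_byte("nA"))    # 6E41 False
print(UNALIGNED.search(utf8_hex("nA")) is not None)    # True (false positive)
```

Without the prefix, the plain ASCII string "nA" would be flagged, because its hex form 6E41 contains "E4" across two bytes.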
If all the other rows have alphanumeric values try the following:
DELETE FROM tableName WHERE NOT columnToCheck REGEXP '[A-Za-z0-9.,-]';
Do check the results before deletion, using the following:
SELECT * FROM tableName WHERE NOT columnToCheck REGEXP '[A-Za-z0-9.,-]';
I don't have an answer, but to provide you with a starting point: Chinese characters occupy certain blocks of the Unicode character set, for example CJK Unified Ideographs (U+4E00 to U+9FFF).
You would have to query for rows that contain characters between the first and the last point of that block. I can't think of a way to automate this though (i.e. to query for characters inside a certain range without naming each character explicitly).
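Outside SQL, a range test of this kind is straightforward, because characters can be compared by code point. The following Python sketch assumes the rows can be read into the application. It checks only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) and deliberately ignores the extension blocks, and `contains_cjk_unified` is a made-up name.

```python
CJK_FIRST = "\u4e00"  # first code point of CJK Unified Ideographs
CJK_LAST = "\u9fff"   # last code point of that block


def contains_cjk_unified(text):
    """True if any character falls inside the CJK Unified Ideographs block."""
    return any(CJK_FIRST <= ch <= CJK_LAST for ch in text)


rows = ["hello world", "price: 42", "你好", "mixed 中 text", "café"]
to_delete = [row for row in rows if contains_cjk_unified(row)]
print(to_delete)  # ['你好', 'mixed 中 text']
```

Unlike the alphanumeric REGEXP test, this also flags rows that mix Chinese characters with digits or Latin letters.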
Another untested idea that comes to mind is using iconv() to convert the string to a specifically Chinese encoding, using //IGNORE, and seeing whether any data is left. If anything is left, the string may contain Chinese characters, although this would probably be thrown off by any numbers inside the string.
It's an interesting problem.
I have a MySQL database with roughly 70,000 records. I want to be able to search the "Person" table on the "Address1" column for all addresses that contain only capital letters, spaces, and numbers.
The end goal is to flag any addresses that look like this: 124 DOLPHIN STREET so they can be converted to 124 Dolphin Street.
I tried using the MySQL REGEXP, but it seems that it doesn't bother with case, because I get results with lowercase characters in them.
Query:
SELECT *
FROM `Person`
WHERE `Address1`
REGEXP '[A-Z\\s0-9]+';
The collation of the table and column is: latin1_general_cs
MySQL is, by default, not case sensitive (as I've occasionally had to learn the hard way as you're doing now). You need to use a regular expression without alphabetic characters in it, and anchor it with ^ and $ so that the whole value has to match rather than just one character somewhere in it. Try .... REGEXP '^[[:upper:][:space:]0-9]+$'
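The effect of the missing anchors is easy to see in isolation. The sketch below uses Python's `re` module, which is case sensitive by default, so it shows only the anchoring issue and says nothing about how MySQL collations treat case. The sample addresses are made up.

```python
import re

# The pattern from the question: matches if ANY run of allowed characters exists.
UNANCHORED = re.compile(r"[A-Z\s0-9]+")
# Anchored version: the WHOLE value must consist of allowed characters.
ANCHORED = re.compile(r"^[A-Z\s0-9]+$")

addresses = ["124 DOLPHIN STREET", "124 Dolphin Street", "po box 9"]

for address in addresses:
    loose = UNANCHORED.search(address) is not None
    strict = ANCHORED.search(address) is not None
    print(f"{address!r}: unanchored={loose} anchored={strict}")

# Flag the all-caps rows and show one possible conversion.
flagged = [a for a in addresses if ANCHORED.search(a)]
print([a.title() for a in flagged])  # ['124 Dolphin Street']
```

The unanchored pattern reports a match for all three addresses, because each contains at least one digit or space.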