Funny issue with MariaDB field - mysql

I am trying to save some value prefixed with the currency symbol like in €10. Yet when I manually enter them in the DB the euro sign gets turned now and then in ?. When then I query the line I get sometimes again the question mark with the value and some other times NaN for the full value.
The issue changes if I query the line by using the email field or the unique identifier. Using $ or ® instead of € presents no problems; even ™ is turned to ?, though.
What is strange is that if I try to replace the question mark with the original character, MariaDB complains that there is no change in the line, like if that character were in fact present even if not shown!
I tried restarting MariaDB, just in case, but the problem remained.
I am using UTF32 for encodage and utf32_unicode_ci for collation.
I am testing the thing with Sequel_pro without even touching php not to stack things.
At any rate if I execute the query from a php script and parse the result with JSON I get null for the value.
What could be the issue with those special characters?

Plan A: Store the amount as a string and do not try to get the value out of it. This requires, as already mentioned, "utf8 all the way through".
Plan B: Store only the amount in a numeric field. Either store the 'currency' in another field as 'EUR' or 'USD' or ... Or simply assume that all amounts are Euros. Then put the Euro sign in front of the amount when you print it.
Do not use DOUBLE or FLOAT, you get an undesirable extra rounding. Instead, consider DECIMAL(11,2). That will exactly handle amounts in most countries. (A few countries need 4 decimal places; some can live with 0.)
Do not use utf32; use utf8 (or utf8mb4).
A database is a repository of data, not a formatting tool. Keeping this distinction will help avoid problem like this.

Related

Why phone numbers in MySQL database are being truncated

I have created a database table in mySQL of which two column names are "landPhone" and "mobilePhone" to store phone numbers (in the format of: 123-456-8000 for land and 098-765-6601 for mobile). These two columns' data type are set to VARCHAR(30). The data have been inserted in the table. But after SQL query, I found the phone numbers have been truncated. It shows (above two data for example) only first 3 digits (123) for landPhone and only first 2 digits after removing the leading '0' (98) for mobilePhone.
Why this is happening ?
Phone numbers are not actually numbers; they are strings that happen to contain digits (and, in your case, dashes). If you try to interpret one as a number, two things typically happen:
Leading zeros are forgotten.
Everything from the first non-digit to the end of the string is stripped off.
That sounds exactly like the result you're describing. Even if you end up stuffing the result into a string field, it's too late -- the data has already been corrupted.
Make sure you're not treating phone numbers as integers at any point in the process.
You must use
insert into sample values('123-456-8000', '098-765-6601' )
instead of
insert into sample values(123-456-8000, 098-765-6601 )
see this SQLFiddle.
Thanks all for your solution. As cHao suspected, it was me who did the mistake. When I first time created the table, I declared the datatype of the phone columns as INT, later I corrected them to VARCHAR().
When I dropped the table and inserted the same data to the new table, it is working fine.
That sounds exactly like the result you're describing. Even if you end up stuffing the result into a string field, it's too late -- the data has already been corrupted. ..cHao
Question to understand: Why mySQL doesn't override the previous datatype with the new one ?

Ideas for Find and Replace character

I need to search address fields and change one character to upper case if there is an apartment number. So '521 Main St. #3b' would change to '521 Main St. #3B'.
The way I know to do this would be to write a program that loops through the recordset, looks at the address field for the last character to see if it's an alpha, then if the character before it is a numeric, change the case of the last char and update the record.
Is this something that would be quicker/simpler with regular expressions (haven't ever used)?
If so, is this best done from within a programming environmnet or using a text editor such as Textmate or vi ? The data is in MySQL and Excel, but I can export it to a text file.
Thanks.
I solved this using TextMate which, once I began to understand a little regex, was simple. (details here Regex Syntax for making the last character Uppercase in TextMate)
Still, I wonder if something like sed or awk, (which I started to try out) might be a better tool. And the SQL solution that Olexa provided works. I just don't know how to have it apply to the entire recordset.
If the data is stored in MySQL, then it is better to process it there:
UPDATE addresses
SET address = CONCAT(LEFT(address, CHAR_LENGTH(address) - 1), UPPER(RIGHT(address, 1)))
WHERE address REGEXP BINARY '#[[:digit:]]+[[:lower:]]{1}$'
;
I've added BINARY because otherwise REGEXP is not case-sensitive, but BINARY may need to be omitted to support multi-byte strings. In this case, surplus updates will be made, but the result would be correct anyway.
P. S. An example on SQL Fiddle showing which values are affected, and how they are affected: http://sqlfiddle.com/#!2/b29326/1

MSSQL to MySQL migration - char encoding issues with UCS-2 surrogate pairs, how can I remove these from MSSQL database?

I have been tasked with migrating a Microsoft SQL Server 2005 database to MySQL 5.6 (these are both database servers runnig locally) and would really appreciate some help.
-MSSQL source database has latin1 collation (so has ISO 8859-1 character set right?) but doesn't have any char/varchar fields (any string field is nvarchar/nchar) so all this data should be using the UCS-2 character set.
-MySQL target database wants the character set UTF-8
I decided to use the database migration toolkit in the latest version of the MySQL workbench. at first it worked fine and migrated everything as expected. But I have been totally tripped up upon encountering UCS-2 surrogate pair characters in the MSSQL database.
The migration toolkit copytable program did not provide a very useful error message: "Error during charset conversion of wstring: No error". It also did not provide any field/row information on the problem-causing data and would fail within chunks of 100 rows. So after searching through the 100 rows after the last successful insert I found that the issue seemed to be caused by two UCS-2 characters in one of the nvarchar fields. They are listed as surrogate pairs in the UCS-2 character set. They were specifically the characters DBC0 and DC83 (I got this by looking at the binary data for the field and comparing byte pairs (little endian) with data that was being migrated successfully).
When this surrogate pair was removed from the MSSQL database the row was migrated successfully to MySQL.
Here is the problem:
I have tried to search for these characters in a test MSSQL table (this chartest table is just various test strings an nvarchar field) to prepare a replacement script and keep getting strange results... I must be doing something incorrectly.
Searching for
SELECT * FROM chartest WHERE text LIKE NCHAR(0xdc83)
Will return any surrogate pair character (whether or not it uses DC83), but obviously, only if it is the only character (or part of the pair) in that field. This isn't a big deal since I would like to remove any instance of these anyway (I dont like to remove data like this but I think we can afford it).
Searching for
SELECT * FROM chartest WHERE text LIKE '%' + (NCHAR(0xdc83))+ '%'
Will return every row! Regardless of whether it even has a unicode character present in the field let alone the DC83 character. Is there a better way to find and replace these characters? Or something else I should try?
I have also tried setting the target databse, table, and field character set to UCS-2 but it seems as though it does not make a difference.
I should also mention that this migration is using live data (~50GB database!) while one of the sites that feeds it is taken offline so any solutions to this need to have a quick running time...
I would appreciate any suggestions very much! Please let me know if there is any information I have left out.
I had this error, and now I have discovered the source of the problem. I had a hard time finding out, so maybe this will be useful to someone, even though I realize, my problem and workaround may not be spot on matching op's original trouble.
I am migrating data from MSSQL to MySQL, and the content being migrated is html-content from Sitecore CMS (target CMS is Drupal, btw).
I've found, that I get this error when converting the database and hitting records, that contain Instagram-embeds. Instagram-embeds work in the way, that the embedded post data is copied to the embed code (instead of being loaded async., et.c. - even the image is included as base64-css...), and the young people nowadays tend to put a lot of emoji's in their image-descriptions (using their iPhones with Emoji keyboard). Emoji's are represented by 4-byte encoded characters, but MySQL utf8 only allows for 3-byte encoded unicode characters.
My initial error from running wbcopytables.exe (which is the non-GUI way of doing Migration Wizard in MySQL Workbench) was the
Error during charset conversion of wstring: No error
but upgrading MySQL Workbench to recent version (from 5.something to 6.x) makes the error a bit more descriptive, hinting table and column (alas, not row):
ERROR: Could not successfully convert UCS-2 string to UTF-8 in table
[MyDatabase].[dbo].[MyTable] (column MyColumn).
Original string: ...
Anyway - a solution *could* be to use utf8mb4 which would allow for the emoji's. Read more here.
But it looks like, it's a bad idea to do this in e.g. my case with Drupal.
So - the solution I ended up with was simply to strip these characters in my migrate-script. There is no point in keeping these for users of the site in question, since they are being displayed as rectangles on the webpage anyway. Since you can't search-and-replace with regex in SQL Server, I processed the data using a DAL and c# .NET, and I found the help here (thanks a ton, Jon Skeet) - turns out there is a regex-pattern for matching one half of a surrogate pair in UTF-16. See below (and use the pattern in another language if needed).
var noUcs2SurrogatePairsString = Regex.Replace(stringWithUcs2SurrogatePairs, #"\p{Cs}", string.Empty);
I had a very similar problem today, and I found that it was caused by empty strings, replaced them with NULLs or a value representing no data and the migration worked fine.
I solved just editing the "import data script.cmd" where it reads columns "As NVARCHAR" by replacing those with "VARCHAR" only.
Note: My table columns was VARCHAR type already, so... for some stupid reason the migration script improperly cast it to UNICODE (NVARCHAR) type.
This issue has now been resolved. I used user Remus Rusanu's suggestion here for finding the rows with these surrogate pair characters using CHARINDEX and have decided to use SUBSTRING to exclude the troublesome characters like so:
UPDATE test
SET a = SUBSTRING(a, 1, (CHARINDEX(0x83dc, CAST(a AS VARBINARY(8000)))+1)/2 - 1) -- string before the unwanted character
+ SUBSTRING(a, (CHARINDEX(0x83dc, CAST(a AS VARBINARY(8000)))+1)/2 +1, LEN(a) ) -- string after the unwanted character
WHERE CHARINDEX(0x83dc, CAST(a AS VARBINARY(8000))) % 2 = 1 -- only odd numbered charindexes (to signify match at beginning of byte pair character)

Removing strange characters from MySQL data

Somewhere along the way, between all the imports and exports I have done, a lot of the text on a blog I run is full of weird accented A characters.
When I export the data using mysqldump and load it into a text editor with the intention of using search-and-replace to clear out the bad characters, searching just matches every "a" character.
Does anyone know any way I can successfully hunt down these characters and get rid of them, either directly in MySQL or by using mysqldump and then reimporting the content?
This is an encoding problem; the  is a non-breaking space (HTML entity ) in Unicode being displayed in Latin1.
You might try something like this... first we check to make sure the matching is working:
SELECT * FROM some_table WHERE some_field LIKE BINARY '%Â%'
This should return any rows in some_table where some_field has a bad character. Assuming that works properly and you find the rows you're looking for, try this:
UPDATE some_table SET some_field = REPLACE( some_field, BINARY 'Â', '' )
And that should remove those characters (based on the page you linked, you don't really want an nbsp there as you would end up with three spaces in a row between sentences etc, you should only have one).
If it doesn't work then you'll need to look at the encoding and collation being used.
EDIT: Just added BINARY to the strings; this should hopefully make it work regardless of encoding.
The accepted answer did not work for me.
From here http://nicj.net/mysql-converting-an-incorrect-latin1-column-to-utf8/ I have found that the binary code for  character is c2a0 (by converting the column to VARBINARY and looking what it turns to).
Then here http://www.oneminuteinfo.com/2013/11/mysql-replace-non-ascii-characters.html found the actual solution to remove (replace) it:
update entry set english_translation = unhex(replace(hex(english_translation),'C2A0','20')) where entry_id = 4008;
The query above replaces it to a space, then a normal trim can be applied or simply replace to '' instead.
I have had this problem and it is annoying, but solvable. As well as  you may find you have a whole load of characters showing up in your data like these:
“
This is connected to encoding changes in the database, but so long as you do not have any of these characters in your database that you want to keep (e.g. if you are actually using a Euro symbol) then you can strip them out with a few MySQL commands as previously suggested.
In my case I had this problem with a Wordpress database that I had inherited, and I found a useful set of pre-formed queries that work for Wordpress here http://digwp.com/2011/07/clean-up-weird-characters-in-database/
It's also worth noting that one of the causes of the problem in the first place is opening a database in a text editor which might change the encoding in some way. So if you can possibly manipulate the database using MySQL only and not a text editor this will reduce the risk of causing further trouble.

Escape characters in MySQL, in Ruby

I have a couple escaped characters in user-entered fields that I can't figure out.
I know they are the "smart" single and double quotes, but I don't know how to search for them in mysql.
The characters in ruby, when output from Ruby look like \222, \223, \224 etc
irb> "\222".length => 1
So - do you know how to search for these in mysql? When I look in mysql, they look like '?'.
I'd like to find all records that have this character in the text field. I tried
mysql> select id from table where field LIKE '%\222%'
but that did not work.
Some more information - after doing a mysqldump, this is how one of the characters is represented - '\\xE2\\x80\\x99'. It's the smart single quote.
Ultimately, I'm building an RTF file and the characters are coming out completely wrong, so I'm trying to replace them with 'dumb' quotes for now. I was able to do a gsub(/\222\, "'").
Thanks.
I don't quite understand your problem but here is some info for you:
First, there are no escaped characters in the database. Because every character being stored as is, with no escaping.
they don't "look ilke ?". I's just wrong terminal settings. SET NAMES query always should be executed first, to match client encoding.
you have to determine character set and use it on every stage - in the database, in the mysql client, in ruby.
you should distinguish ruby strings representation from character itself.
To enter character in the mysql query, you can use char function. But in terminal only. In ruby just use the character itself.
smart quotes looks like 2-byte encoded in the unicode. You have to determine your encoding first.