I recently switched to a new server, and now when I add new topics to my forum, apostrophes become &#039; as they enter the database, and they also get backslashed.
For example:
Mike's website becomes: Mike\'s website. While in the database it becomes: Mike\' website.
I have checked php.ini to see whether magic_quotes is on -- it is off.
I have spent days troubleshooting without finding the reason.
Strangely, when I type &#039; while creating the topic in my forum, it comes out as an apostrophe! I am assuming there is some kind of character-set problem. My forum is programmed to use UTF-8, and so is the database.
I checked database entries from before the switch, and oddly, they were recorded as plain apostrophes ' and not as &#039; ... like they should be. I know I could use stripslashes(), but I don't want to, since it worked properly without it before!
Which Apache/MySQL settings are causing this to occur?
Furthermore, the characters < and > are also converted to HTML entities.
I've got a web app (JSP) which uses a MySQL database. I put some data into the database to test whether it shows up correctly in the JSP. There was an issue with UTF-8 characters (Polish letters), but I fixed it by adding <parameter-encoding default-charset="UTF-8" /> to glassfish-web.xml. However, I still have a problem with putting data into the database from a form: instead of the Polish characters, the database contains "?????". I've tried many things and nothing works, and I don't really know where to look to fix it.
OK, problem solved. I'll post the answer for other people having the same problem.
In the JSP where I open my database connection, I changed url="jdbc:mysql://localhost/databasename" into
url="jdbc:mysql://localhost/databasename?useUnicode=true&characterEncoding=UTF-8"
and now everything in the database looks like it should.
The problem you have faced is that Java has escaped the UTF-8 character sequence.
You can use StringEscapeUtils, provided by Apache Commons Lang, to unescape any characters that come out as ??? or anything else.
try this :
str = org.apache.commons.lang.StringEscapeUtils.unescapeJava(str);
I have Thai characters in MySQL but they don't transfer to my ASP-generated web page correctly. I've linked to screenshots of the crucial factors showing the data is OK in the table but not on the website, as seen in the last screenshot. Any thoughts on what I'm doing wrong?
http://www.transum.com/Temp/ThaiScript.PNG
The first two pictures are from phpMyAdmin
The third picture is the header of my webpage.
The fourth picture shows what appears on the webpage.
I have already added the UTF-8 instruction to the connection:
Conn.execute ("SET NAMES utf8")
SQL = "select * from Phrases WHERE Checked = TRUE Order by English ASC"
set RSrecord = Conn.execute (SQL)
Response.CharSet = "utf-8"
Without seeing more of your source, my first question (or suggestion) is whether you are HTML encoding your output?
Response.Write Server.HTMLEncode(RSrecord.Fields("Thai_Script"))
If this doesn't work for you, can you show a bit more code?
I think I've worked this one out. First of all I should repeat that if you can possibly install MyODBC v5.1 then do so and don't bother trying to configure for 3.51. Unfortunately it would appear that the older driver is all that a certain large and well known web host seems to offer.
Here's a link to a version of my Russian Cyrillic page. It uses the old driver, (MyODBC 3.51)
http://clubdanceholidays.co.uk/aboutusruansi.asp
The crucial points to note are:
1) In the first line:
<%@ LANGUAGE="VBSCRIPT" CODEPAGE="1252" LCID="2057" %>
Use 1252 rather than 65001 as your codepage value - I realise just how counterintuitive this is, but in this specific situation it works.
2) Your line
Conn.execute ("SET NAMES utf8")
Adding this query immediately before the query to populate my recordset was the missing piece of the jigsaw for me.
3) The Charset meta tag
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
This should definitely be UTF-8. Essentially we're telling the server that it's a Windows-1252 page and the browser that it's a UTF-8 page.
4) What kind of encoding to use when you save the file
This is a little more complicated. If you're using Notepad then you need to use ANSI. The UTF-8 option is actually UTF-8 + BOM, which seems to create a conflict with the codepage definition in point 1. The problem with ANSI comes if you need to hardcode any non-western characters into your page - you couldn't insert them as they are, you would need to manually encode them first, e.g. for เครื่องบิน you would need to insert à¹€à¸„à¸£à¸·à¹ˆà¸à¸‡à¸šà¸´à¸™
If you have a more powerful editor than Notepad, (I use Editplus - http://www.editplus.com/ - it's not free but it's quite cheap) you may find two UTF-8 options. Choose the one without BOM.
I have been tasked with migrating a Microsoft SQL Server 2005 database to MySQL 5.6 (these are both database servers running locally) and would really appreciate some help.
- The MSSQL source database has a latin1 collation (so it uses the ISO 8859-1 character set, right?) but doesn't have any char/varchar fields (every string field is nvarchar/nchar), so all of this data should be using the UCS-2 character set.
- The MySQL target database wants the UTF-8 character set.
I decided to use the database migration toolkit in the latest version of MySQL Workbench. At first it worked fine and migrated everything as expected, but I have been totally tripped up upon encountering UCS-2 surrogate pair characters in the MSSQL database.
The migration toolkit copytable program did not provide a very useful error message: "Error during charset conversion of wstring: No error". It also did not provide any field/row information on the problem-causing data and would fail within chunks of 100 rows. So after searching through the 100 rows after the last successful insert I found that the issue seemed to be caused by two UCS-2 characters in one of the nvarchar fields. They are listed as surrogate pairs in the UCS-2 character set. They were specifically the characters DBC0 and DC83 (I got this by looking at the binary data for the field and comparing byte pairs (little endian) with data that was being migrated successfully).
When this surrogate pair was removed from the MSSQL database the row was migrated successfully to MySQL.
Here is the problem:
I have tried to search for these characters in a test MSSQL table (this chartest table is just various test strings in an nvarchar field) to prepare a replacement script and keep getting strange results... I must be doing something incorrectly.
Searching for
SELECT * FROM chartest WHERE text LIKE NCHAR(0xdc83)
Will return any surrogate pair character (whether or not it uses DC83), but obviously only if it is the only character (or part of the pair) in that field. This isn't a big deal, since I would like to remove any instance of these anyway (I don't like removing data like this, but I think we can afford it).
Searching for
SELECT * FROM chartest WHERE text LIKE '%' + (NCHAR(0xdc83))+ '%'
Will return every row, regardless of whether it even has a Unicode character present in the field, let alone the DC83 character. Is there a better way to find and replace these characters? Or something else I should try?
I have also tried setting the target database, table, and field character set to UCS-2, but it seems as though it does not make a difference.
I should also mention that this migration is using live data (~50GB database!) while one of the sites that feeds it is taken offline so any solutions to this need to have a quick running time...
I would appreciate any suggestions very much! Please let me know if there is any information I have left out.
I had this error, and now I have discovered the source of the problem. I had a hard time finding it, so maybe this will be useful to someone, even though I realize my problem and workaround may not exactly match the OP's original trouble.
I am migrating data from MSSQL to MySQL, and the content being migrated is html-content from Sitecore CMS (target CMS is Drupal, btw).
I've found that I get this error when converting the database and hitting records that contain Instagram embeds. Instagram embeds work in such a way that the embedded post data is copied into the embed code (instead of being loaded asynchronously, etc. - even the image is included as base64 CSS...), and young people nowadays tend to put a lot of emojis in their image descriptions (using their iPhones with the emoji keyboard). Emojis are represented by 4-byte encoded characters, but MySQL's utf8 only allows for 3-byte encoded Unicode characters.
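To make the limit concrete, here is a throwaway sketch (emoji_test is a hypothetical table; the connection itself also has to use utf8mb4, otherwise the literal gets mangled before it ever reaches the server):
SET NAMES utf8mb4;                  -- connection charset that allows 4-byte characters
CREATE TABLE emoji_test (
  t3 TEXT CHARACTER SET utf8,       -- MySQL's 3-byte "utf8"
  t4 TEXT CHARACTER SET utf8mb4     -- real 4-byte UTF-8
);
INSERT INTO emoji_test (t4) VALUES ('😀');  -- fine
INSERT INTO emoji_test (t3) VALUES ('😀');  -- "Incorrect string value" error (or silent truncation, depending on sql_mode)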
My initial error from running wbcopytables.exe (which is the non-GUI way of doing Migration Wizard in MySQL Workbench) was the
Error during charset conversion of wstring: No error
but upgrading MySQL Workbench to a recent version (from 5.something to 6.x) makes the error a bit more descriptive, hinting at the table and column (alas, not the row):
ERROR: Could not successfully convert UCS-2 string to UTF-8 in table
[MyDatabase].[dbo].[MyTable] (column MyColumn).
Original string: ...
Anyway - a solution *could* be to use utf8mb4, which would allow for the emojis. Read more here.
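For reference, a minimal sketch of that route (my_table and my_column are hypothetical names; the connection would also need to use utf8mb4):
-- convert a whole table so its text columns can hold 4-byte characters
ALTER TABLE my_table CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
-- or convert just the affected column
ALTER TABLE my_table MODIFY my_column TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;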
But it looks like it's a bad idea to do this in, e.g., my case with Drupal.
So the solution I ended up with was simply to strip these characters in my migrate script. There is no point in keeping them for users of the site in question, since they are displayed as rectangles on the webpage anyway. Since you can't search-and-replace with regex in SQL Server, I processed the data using a DAL and C# .NET, and I found the help here (thanks a ton, Jon Skeet) - it turns out there is a regex pattern for matching one half of a surrogate pair in UTF-16. See below (and use the pattern in another language if needed).
var noUcs2SurrogatePairsString = Regex.Replace(stringWithUcs2SurrogatePairs, @"\p{Cs}", string.Empty);
I had a very similar problem today, and I found that it was caused by empty strings. I replaced them with NULLs (or a value representing no data) and the migration worked fine.
I solved it just by editing the "import data script.cmd" where it reads columns "As NVARCHAR", replacing those with "VARCHAR" only.
Note: my table columns were already of VARCHAR type, so for some stupid reason the migration script improperly cast them to the Unicode (NVARCHAR) type.
This issue has now been resolved. I used user Remus Rusanu's suggestion here for finding the rows with these surrogate pair characters using CHARINDEX and have decided to use SUBSTRING to exclude the troublesome characters like so:
UPDATE test
SET a = SUBSTRING(a, 1, (CHARINDEX(0x83dc, CAST(a AS VARBINARY(8000)))+1)/2 - 1) -- string before the unwanted character
+ SUBSTRING(a, (CHARINDEX(0x83dc, CAST(a AS VARBINARY(8000)))+1)/2 +1, LEN(a) ) -- string after the unwanted character
WHERE CHARINDEX(0x83dc, CAST(a AS VARBINARY(8000))) % 2 = 1 -- only odd numbered charindexes (to signify match at beginning of byte pair character)
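As a sanity check afterwards, the same CHARINDEX expression can be reused as a SELECT (a sketch against the same test table and column as above); once the update has run it should return no rows:
SELECT a
FROM test
WHERE CHARINDEX(0x83dc, CAST(a AS VARBINARY(8000))) % 2 = 1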
Somewhere along the way, between all the imports and exports I have done, a lot of the text on a blog I run is full of weird accented A characters.
When I export the data using mysqldump and load it into a text editor with the intention of using search-and-replace to clear out the bad characters, searching just matches every "a" character.
Does anyone know any way I can successfully hunt down these characters and get rid of them, either directly in MySQL or by using mysqldump and then reimporting the content?
This is an encoding problem; the Â is a non-breaking space (HTML entity &nbsp;) in Unicode being displayed in Latin1.
You might try something like this... first we check to make sure the matching is working:
SELECT * FROM some_table WHERE some_field LIKE BINARY '%Â%'
This should return any rows in some_table where some_field has a bad character. Assuming that works properly and you find the rows you're looking for, try this:
UPDATE some_table SET some_field = REPLACE( some_field, BINARY 'Â', '' )
And that should remove those characters (based on the page you linked, you don't really want an nbsp there as you would end up with three spaces in a row between sentences etc, you should only have one).
If it doesn't work then you'll need to look at the encoding and collation being used.
EDIT: Just added BINARY to the strings; this should hopefully make it work regardless of encoding.
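If you do end up having to look at the encoding and collation, a quick sketch using the placeholder names from above:
SHOW CREATE TABLE some_table;                         -- table default charset and collation
SHOW FULL COLUMNS FROM some_table LIKE 'some_field';  -- per-column collation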
The accepted answer did not work for me.
From here http://nicj.net/mysql-converting-an-incorrect-latin1-column-to-utf8/ I have found that the binary code for the Â character is c2a0 (by converting the column to VARBINARY and looking at what it turns into).
Then here http://www.oneminuteinfo.com/2013/11/mysql-replace-non-ascii-characters.html I found the actual solution to remove (replace) it:
update entry set english_translation = unhex(replace(hex(english_translation),'C2A0','20')) where entry_id = 4008;
The query above replaces it with a space; a normal trim can then be applied, or you can simply replace it with '' instead (a sketch of that variant follows below).
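For example, here is a sketch of the "replace with nothing" variant (same entry table and column as the query above; like that query, it rewrites the hex dump, so it assumes C2A0 only ever appears as a complete byte pair):
UPDATE entry
SET english_translation = TRIM(UNHEX(REPLACE(HEX(english_translation), 'C2A0', '')))
WHERE HEX(english_translation) LIKE '%C2A0%';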
I have had this problem and it is annoying, but solvable. As well as Â you may find you have a whole load of characters showing up in your data like these:
“
This is connected to encoding changes in the database, but so long as you do not have any of these characters in your database that you want to keep (e.g. if you are actually using a Euro symbol) then you can strip them out with a few MySQL commands as previously suggested.
In my case I had this problem with a Wordpress database that I had inherited, and I found a useful set of pre-formed queries that work for Wordpress here http://digwp.com/2011/07/clean-up-weird-characters-in-database/
It's also worth noting that one of the causes of the problem in the first place is opening a database in a text editor which might change the encoding in some way. So if you can possibly manipulate the database using MySQL only and not a text editor this will reduce the risk of causing further trouble.
I am currently having an issue saving special characters (® & ©) to a MySQL database.
On a local stack (on my development machine) the operation runs smoothly, and what is entered in the front end gets saved as expected.
With the same front-end code on the centralised server, when saving to the DB the character is preceded by other characters: it is saved as Â®.
This isn't an issue when viewing the record on the front-end as it reverses what it does and displays correctly.
The issue comes from another, separate system which utilises the same database and doesn't do any alteration, so the value appears as Â® instead of simply ®.
The reason for my confusion is that it seems to work as required on my local stack but not on the centralised server.
I am looking for ideas as to what in the server's configuration, compared to my local one, might be causing this.
Thanks in advance
Mark
@Pranav Hosangadi (thanks) covers three areas to check for consistency of encoding. The following solution adds to that. It may also be worth considering (a variation of) @Soaice Mircea's answer (thanks also) for scenarios where this answer doesn't fix the problem, although that wasn't required when I was able to reproduce and find a solution to your problem. @Pranav's line of thinking seems to be successful for this problem, as it's about consistently using one character set everywhere rather than any particular one.
Five things to do:
ensure the database and its tables use the same charset throughout; inspect this in phpMyAdmin, for example, and note this charset for use below (see also the sketch after this list)
use the php header() function with the database charset, e.g. for a latin1 database:
header('Content-Type: text/html; charset=ISO-8859-1');
insert a meta tag in the html header, e.g.:
<meta http-equiv="content-type"
content="text/html;charset=ISO-8859-1">
add accept-charset to the form tag:
<form action="testsubmit.php" method="post" accept-charset="ISO-8859-1">
set the charset of the mysql connection, e.g.:
$con = mysql_connect("localhost","test","test");
mysql_set_charset("latin1", $con);  // charset name, not a collation such as latin1_swedish_ci
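To double-check the first point from the MySQL side as well, here is a sketch (mydb and mytable are placeholder names) comparing what the server, the connection and the table itself report:
SHOW VARIABLES LIKE 'character_set%';   -- server, client and connection character sets
SHOW CREATE DATABASE mydb;              -- default charset of the database
SHOW CREATE TABLE mydb.mytable;         -- charset/collation of the table and its columns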
After you select the DB in your PHP code, add this:
mysql_query("set names 'utf8'");
Also check that the column where you store the text has utf8 encoding.