I have made some changes in a .sql file, and when I try to save it, I get an encoding warning.
But the problem is that I haven't used any characters in that script that I hadn't used before. So what I would like to know is: is it possible to find out which character is causing the encoding to change?
I opened the .sql file in Notepad++ and changed it to the encoding I needed. This is how I solved the problem, but that still doesn't answer the question of how to determine which characters have a different encoding from the rest of the file.
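One way to locate the offending characters is to read the file as raw bytes and report every byte outside the ASCII range, with its line and column. A minimal sketch (the inlined sample data stands in for reading your actual .sql file):

```python
def find_non_ascii(data: bytes):
    """Return (line, column, byte) for every byte outside plain ASCII."""
    hits = []
    line, col = 1, 1
    for b in data:
        if b == 0x0A:          # newline: advance to the next line
            line, col = line + 1, 1
            continue
        if b > 0x7F:           # outside the 7-bit ASCII range
            hits.append((line, col, b))
        col += 1
    return hits

# Stand-in for: data = open("script.sql", "rb").read()
data = "SELECT 'café';\n".encode("utf-8")
for line, col, b in find_non_ascii(data):
    print(f"line {line}, col {col}: byte 0x{b:02X}")
```

Any hit outside ASCII is a candidate for what flipped the editor's encoding detection.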
Related
We run a site where users upload image files. When these files are produced on a Mac, they sometimes have UTF-8 characters in their file names (since macOS uses UTF-8 as its file-system character set).
When our PHP7 code receives these files, we have to store them in the local file system, which is Debian Linux and does not support UTF-8.
Also, while PHP7 can support UTF-8, it does not do so natively or automatically.
So, the question is: what's the current best practice for handling this?
Thought 1:
Save the original name in the database (collation = utf8mb4_unicode_ci?), and store the images on disk under a UUID. Then use the download="" attribute to have the file download under its original name.
Pro: Seems to solve the problem.
Con: multibyte support seems kludgy and clunky in PHP (even in 7.2.x+). Does this require a ton of checks to deal with it?
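The question is about PHP, but the storage scheme in Thought 1 is language-agnostic. A minimal sketch of the idea (in Python here; names are illustrative):

```python
import uuid

def store_upload(original_name):
    """Return (disk_name, original_name).

    The file lives on disk under a UUID plus the original extension,
    so the file system only ever sees ASCII. The UTF-8 original name
    is kept only in the database (and later in the download="" link).
    """
    ext = original_name.rsplit(".", 1)[-1] if "." in original_name else ""
    disk_name = uuid.uuid4().hex + ("." + ext if ext else "")
    return disk_name, original_name

disk_name, original = store_upload("touché.pdf")
print(disk_name)   # 32 hex chars + ".pdf", pure ASCII
print(original)    # touché.pdf, stored only in the DB
```

Because the disk name is generated, no multibyte checks are needed on the file-system side at all; the only place UTF-8 survives is a utf8mb4 column.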
Thought 2:
Sanitize / filter out the UTF-8 characters from the file name to avoid the problem altogether.
Pro: I can use latin collation in MySQL / MariaDB like we always have AND I don't have to worry about the file system charsets.
Con: This is lossy. A file named touché.pdf will get renamed touch.pdf, OR I have to create some equivalency tables to turn é into e.
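If the sanitizing route is taken, the "equivalency table" largely already exists in the form of Unicode decomposition, which also avoids the worst of the loss (é becomes e rather than disappearing). A sketch:

```python
import unicodedata

def asciify(name):
    """Decompose accented characters (é -> e + combining accent),
    then drop whatever still isn't ASCII."""
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(asciify("touché.pdf"))   # touche.pdf, not touch.pdf
```

This is still lossy for scripts with no ASCII decomposition (Cyrillic, CJK, emoji), so it only softens the con rather than removing it.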
Thought 3:
I have over-thought this problem, or I am missing a simple solution.
What's the best way to deal with uploaded filenames that are UTF-8 / Multibyte?
Consider PHP's urlencode() to turn UTF-8 characters into % plus hex.
fn 'smiley-☺'
urlencode 'smiley-%E2%98%BA'
bin2hex '736d696c65792de298ba'
I might prefer simply applying urlencode to every name -- names in plain ASCII will be unchanged, and I don't think the % will cause trouble. Other punctuation may cause trouble (e.g. /).
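The PHP calls shown above behave like Python's urllib.parse.quote and bytes.hex, which makes the round trip easy to verify:

```python
from urllib.parse import quote, unquote

fn = "smiley-☺"
encoded = quote(fn)                 # percent-encodes the UTF-8 bytes
print(encoded)                      # smiley-%E2%98%BA
print(fn.encode("utf-8").hex())     # 736d696c65792de298ba
assert unquote(encoded) == fn       # lossless round trip
```

Unlike the NFKD approach, this is fully reversible, which is what makes it attractive as a storage-name scheme.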
I know this question has been asked in many forms before but I could not find an answer to my specific question.
THE SCENARIO
Database: MySQL, West European (Latin-1)
Exported to a .sql file using Sequel Pro
Reimported the exact same file and had quotes, apostrophes, em dashes, en dashes, ™ and ® characters turn into question marks.
I fixed these by opening the .sql file in vim and replacing the <94>, <95>, etc. hex codes with their correct counterparts.
Reference:
:%s/<93>/\'/g
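The <93>/<94>/<95> bytes vim shows are Windows-1252 "smart" punctuation (typically pasted in from Word). Rather than hand-replacing each one, the whole dump can be reinterpreted in one pass; a sketch, with inline bytes standing in for the real file:

```python
# 0x93/0x94 are Windows-1252 curly double quotes, 0x96 is an en dash.
# Decoding the dump as cp1252 (and re-encoding as UTF-8 if desired)
# fixes every such byte at once.
raw = b"He said \x93hello\x94 \x96 twice."   # stand-in for open('dump.sql','rb').read()
fixed = raw.decode("cp1252")
print(fixed)                                 # curly quotes and en dash restored
print(fixed.encode("utf-8"))                 # bytes safe to reimport as UTF-8
```

This preserves the smart quotes instead of downgrading them to ASCII, which matters if the table is converted to utf8 afterwards.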
MY QUESTION
How can I prevent this from happening in the future? My guess is that these characters are being pasted in from a Word doc in the admin part of the website, but I don't understand why the file re-imports incorrectly. For now I have replaced all the characters and everything is close to normal. Should I convert the table to utf8? I do have several translations of the site in the DB.
NOTES
I have exported this database several times before and never had this problem. I have exported it to a staging server (which is the same server as the live server, just a different host) and to a local server. Any ideas why this might happen all of a sudden?
I have looked into a lot of issues like double character encoding and wrong encoding formats but I think this situation is different. I also tried several of those solutions and they did not work.
Similar Questions
Special characters get lost in MySQL export/import
In a web application I'm working on, all the content seems fine except for the content retrieved from the database. Some special characters are used, and they break, so it's definitely a character-encoding error somewhere.
When I manually switch the browser from iso-8859-1 to utf-8, the database content looks fine but the static content is messed up, and vice versa. So I suspect that the static content is iso-8859-1 and the content from the database is utf-8.
I've looked around for configuration files that state the charset, but when I try changing it, nothing happens.
Would converting my database content to iso-8859-1 help, so that it matches the static content? If so, how? I've tried changing the schema and collation, but it had no effect.
Edit: My apologies. This is a MySQL database.
Maybe it is an issue with your client library that communicates with MySQL.
Before you try to retrieve data from the database, execute this query:
SET NAMES 'utf8'
reference:
http://forums.mysql.com/read.php?103,46870,47245#msg-47245
Hope this helps.
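For reference, SET NAMES 'utf8' is shorthand for setting the three session variables that control the charset of the client connection:

```sql
-- Equivalent to SET NAMES 'utf8':
SET character_set_client = utf8;
SET character_set_connection = utf8;
SET character_set_results = utf8;
```

On modern MySQL, utf8mb4 is usually the better choice, since MySQL's utf8 is limited to three bytes per character and cannot store, for example, emoji.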
I have this Joomla site. I have two versions of it, both using the same database. One has these characters all over the text; the other seems to be all right. What could be the problem? The database is the same, so I don't believe it is a character-set issue in the DB. Anyone?
Something is wrong with the encoding. You're not saying where you get this, so my guess is that it's the browser. Try to change the page encoding (check the menus of your browser) to UTF-8 (or "Unicode"). That should fix it.
If that works, check the code and make sure pages are sent with correct charset settings.
Another source of these problems is that the code reads UTF-8 data from the database as bytes and then encodes that again as UTF-8.
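That failure mode, reading UTF-8 bytes as if they were single-byte text and then encoding the result as UTF-8 again, is easy to reproduce and to recognize (a Python sketch of the principle):

```python
text = "café"

# Failure: UTF-8 bytes are read back as Latin-1, so each of the two
# bytes of "é" becomes its own character -> the classic "Ã" mojibake.
broken = text.encode("utf-8").decode("latin-1")
print(broken)    # cafÃ©

# As long as no bytes were lost, the repair is the inverse trip.
repaired = broken.encode("latin-1").decode("utf-8")
print(repaired)  # café
```

If you see "Ã" followed by odd punctuation in page output, this double-trip is almost always what happened.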
I recently deployed my application. For development I used SQLite and everything worked fine. I have a controller that uses Nokogiri to populate data into my database.
The problem is on production I'm using MySQL instead of SQLite and now my script is populating the data with the wrong encoding.
For instance, it writes "Aragón" instead of "Aragón". MySQL is using utf8 for both the database and every table.
Nokogiri is probably returning things correctly. I suspect you have a mismatch in the character set of the content you are parsing with Nokogiri, and the database.
Your data being parsed might be ISO-8859-1 or WIN-1252, which are among the most common encodings on the internet. You'll need to look in the data to see what it is declared as. Also look at the source for the word "Aragón" and see whether it has embedded upper-bit characters or entity-encoded characters. By looking at the values of the accented characters, you can also get an idea of what encoding the characters are in.
Odds are good they're not UTF-8, so when Nokogiri passes them to your code that writes to the database, they will be wrong.
To fix the problem you'll need to either tell Nokogiri what the encoding is, or convert the text to UTF-8 before storing it.
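Nokogiri is Ruby, but the underlying principle is independent of the library: decode the fetched bytes with their real charset before anything is stored. The exact symptom from the question can be reproduced in a few lines (a Python sketch):

```python
raw = b"Arag\xc3\xb3n"          # the UTF-8 bytes for "Aragón"

# What the question describes: the bytes get treated as Latin-1,
# so each of the two UTF-8 bytes of "ó" becomes its own character.
wrong = raw.decode("latin-1")
print(wrong)    # Aragón

# Telling the decoder the real encoding fixes it before storage.
right = raw.decode("utf-8")
print(right)    # Aragón
```

In Nokogiri terms, this corresponds to passing the document's actual encoding when parsing (or converting the text to UTF-8 yourself before the INSERT), per the advice above.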
You've got the encoding wrong somewhere in your stack. I bet it's set wrong in MySQL.
Take a look at this: I need help fixing Broken UTF8 encoding