Character encoding not being picked up - html

http://www.mamstore.co.uk/bin/pxisapi1.exe/catalogue?level=805838
Look where it's meant to say £5 T-shirts. Instead, the '£' comes up as an invalid character, yet the exact same character is shown just below on the products.
I am getting the same issue when I pull a PHP file's contents in with jQuery. The actual PHP file shows the characters correctly (without any head/body set, etc.), but as soon as I pull it into the site it suddenly has issues with them.
It's stored in an SQL database on a custom-built CMS/WMS system.
Any suggestions would be much appreciated.
Cheers

Your page is encoded as UTF-8, but the character in the breadcrumbs is encoded as ISO-8859-1. What encoding do you have in your database?
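To make the mismatch concrete: '£' is the single byte 0xA3 in ISO-8859-1, which is not a valid byte sequence in UTF-8, so a UTF-8 page renders it as an invalid character. A minimal Java sketch of the effect (just an illustration of the two encodings, not code from your CMS):

import java.nio.charset.StandardCharsets;

public class PoundSignDemo {
    public static void main(String[] args) {
        // "£5" as written by a system using ISO-8859-1 (Latin-1): bytes 0xA3 0x35
        byte[] latin1Bytes = "£5".getBytes(StandardCharsets.ISO_8859_1);

        // Decoded by a page that assumes UTF-8: 0xA3 is not valid UTF-8,
        // so the pound sign turns into the replacement character
        System.out.println(new String(latin1Bytes, StandardCharsets.UTF_8));

        // Decoded with the encoding it was actually written in: correct
        System.out.println(new String(latin1Bytes, StandardCharsets.ISO_8859_1));
    }
}

Whichever encoding the database and the page actually use, they (and the Content-Type header or meta charset) need to agree.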

Related

How do you save a JSON response with Emojis as Unicode?

Currently I am scraping Instagram comments for a sentiment analysis project, and am using an Instagram scraper. It is supposed to output a comment file but it doesn't, so a workaround is to find the query URL in the log file and paste it into a browser.
An example URL would be this https://www.instagram.com/graphql/query/?query_hash=33ba35852cb50da46f5b5e889df7d159&variables={%22shortcode%22:%22CMex-IGn1G-%22,%22first%22:50,%22after%22:%22QVFCaERkTm84aWF3T1Exbmw5V0xhb05haVBEY2JaYmxhSTNGWVZ4M2RQWi0yVzVUSExlUlRYOUtsOVEtM0trRzBmSGxyYjdJV094a1hlYm1aLXZjdkVpZQ==%22}.
On Firefox I am able to view the JSON response and am also able to download it in two ways:
CTRL + A to select all and paste into a JSON file.
Download webpage as a JSON file.
The issue with these methods is that neither of them retains the emoji data. The first loses the emojis: they are not stored as Unicode, but as question marks (???). I assumed this was related to the encoding, so I tried pasting the raw response into Unicode-encoded files. There the emojis are kept as actual emoji characters (🙌👏😍), but not as Unicode escapes.
The second method either saves only the message {"message":"rate limited","status":"fail"} or some other incorrect format.
The thing is, a few months ago I scraped some pages and managed to save the comments with the emojis stored in Unicode format. This is frustrating because I know it can be done, but I can't remember how I did it; it would have been something basic, along the lines of what I have outlined.
I am out of ideas and would greatly appreciate any help. Thank you.
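For reference, one way to take the browser out of the loop is to fetch the query URL directly and write the body to disk with an explicit UTF-8 charset, so the emoji code points survive (either as raw characters or as \uXXXX escapes, depending on what the endpoint returns) instead of becoming question marks. A rough sketch in Java 11+; the url value is a placeholder for your query URL, and this does not address the rate limiting seen with the second method:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SaveJsonResponse {
    public static void main(String[] args) throws Exception {
        String url = "https://www.instagram.com/graphql/query/?query_hash=...";  // placeholder
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();

        // Decode the response body explicitly as UTF-8 rather than the platform default
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString(StandardCharsets.UTF_8));

        // Write the JSON out as UTF-8 so the emoji are not replaced with '?'
        Files.writeString(Path.of("comments.json"), response.body(), StandardCharsets.UTF_8);
    }
}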

Freemarker CSV generation - CSV with Chinese text truncates the csv contents

I have this very weird problem. I'm using Java 8, Struts2 and Freemarker 2.3.23 to generate reports in CSV and HTML file formats (via .csv.ftl and .html.ftl templates, both saved in UTF-8 encoding), with data coming from a Postgres database.
The data contains Chinese characters. When I generate the report in HTML format, it is fine and complete, and the Chinese characters are displayed properly. But when the report is generated in CSV, I have observed that:
If I run the app with the -Dfile.encoding=UTF-8 VM option, the Chinese characters are generated properly but the report is incomplete (i.e. the text is truncated near the end).
If I run the app without the -Dfile.encoding=UTF-8 VM option, the Chinese characters are displayed as question marks (?????) but the report is complete.
Also, the app uses a StringWriter to write the data to the CSV and HTML templates.
So, what could be the problem? Am I hitting a Java character limit? I do not see any errors in the logs either. Appreciate your help. Thanks in advance.
UPDATE:
The StringWriter returns the data in whole; it is when the data is written to the OutputStream that some of it gets lost.
ANOTHER UPDATE:
Looks like the issue is with contentLength (the app is a webapp and the CSV is generated as a file download), which was being computed from the data as a String using String.length(). String.length() returns a smaller value than it should, because it counts characters rather than bytes, and the Chinese characters take more than one byte each in UTF-8, so the content length was under-reported.
I was able to resolve the contentLength issue by using String.getBytes("UTF-8").length instead.
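In servlet/Struts terms the fix looks roughly like this (a sketch only, assuming the CSV string from the StringWriter is written straight to the HttpServletResponse; names are illustrative):

import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.servlet.http.HttpServletResponse;

public class CsvDownloadSketch {
    // csvData is the complete report text produced by FreeMarker via the StringWriter
    static void writeCsv(HttpServletResponse response, String csvData) throws java.io.IOException {
        byte[] bytes = csvData.getBytes(StandardCharsets.UTF_8);

        response.setContentType("text/csv; charset=UTF-8");
        // String.length() counts UTF-16 chars, but each Chinese character is 3 bytes in UTF-8;
        // Content-Length must be the byte count, otherwise the tail of the file is cut off
        response.setContentLength(bytes.length);

        OutputStream out = response.getOutputStream();
        out.write(bytes);
        out.flush();
    }
}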

How to pass a backslash in an HTML form

I know very little HTML. I have a backend application that does a MongoDB lookup, and I am building a simple HTML screen with a form that passes a value to a web service, which runs the Mongo query and shows the reply on the screen.
When I pass a file path field in my form like this:
\\test.server.com\filetest\test
in my web service app, I see the value coming in as
%5c%5Ctest.server.com%5cfiletest%5ctest
How can I get the value without this translation?
As a matter of fact, I was hoping it would come in like this:
\\\\test.server.com\\filetest\\test
as that is how things got stored in mongo.
You cannot pass a backslash directly as it is, because URLs can only contain ASCII characters. This means that when you need to pass special characters like Ü, or characters that need to be escaped in URLs (such as spaces and backslashes), you need a way to represent them with ASCII symbols.
In your case the URL is getting percent-encoded and the backslashes are converted to %5c. To get them back as '\' you need to either:
Decode them in your server-side code. This is your best bet, and it is done in different ways depending on the technology your backend uses; in PHP, for example, you can use the urldecode function (see the sketch below).
Decode the characters in MongoDB itself before querying. This you will need to work out yourself, because I'm not aware of functionality that does this for you out of the box.
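If your web service happens to be Java-based (an assumption; use the equivalent in whatever your backend actually runs), the standard library does the decoding for you:

import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    public static void main(String[] args) {
        // The value exactly as it arrives from the form submission
        String raw = "%5c%5Ctest.server.com%5cfiletest%5ctest";
        String decoded = URLDecoder.decode(raw, StandardCharsets.UTF_8);
        System.out.println(decoded);  // prints \\test.server.com\filetest\test
    }
}

Note that many web frameworks already apply this decoding to request parameters before your code sees them, so check what your service actually receives first.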
More info on URL encoding can be found here.
Hope this helps!

User import from CSV with German characters

Using the Moodle user import from CSV, we have the problem that some German names with letters like Ö, ä, ü are imported incorrectly. I presume the problem is in the encoding; here are the two possibilities I tested:
ANSI encoding: the German letters disappear; for example, Michael Dürr appears as Michael Drr in the list of users to import.
UTF-8 encoding: the letters appear as Michael Drürr.
Does anyone have a solution for this problem, or does it have to be fixed one by one in the user list?
I'm guessing the original file is using a different encoding. Try converting the CSV file to UTF-8, then import it:
How do I correct the character encoding of a file?
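If you prefer to script the conversion instead of re-saving the file by hand, here is a minimal sketch (assuming the export is in Windows-1252, i.e. what is often labelled "ANSI"; the file names and the source charset are placeholders to adjust):

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ConvertCsvToUtf8 {
    public static void main(String[] args) throws Exception {
        // Read the CSV with the encoding it was actually saved in...
        byte[] raw = Files.readAllBytes(Paths.get("users-ansi.csv"));
        String content = new String(raw, Charset.forName("windows-1252"));

        // ...and write the same text back out as UTF-8 for the Moodle import
        Files.write(Paths.get("users-utf8.csv"), content.getBytes(StandardCharsets.UTF_8));
    }
}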
You have to configure the database connection to make sure the encoding you chose for your web application (Moodle) is the same as the encoding your database connection will use.
Look for SET NAMES 'utf8' or similar if you use MariaDB/MySQL as the database.
And compare, of course, with the encoding of your import file; maybe you will need to convert it first. In any case, the encoding of your web GUI, the file, and the database connection (client character set) should all be the same.
For the web application, check in your browser via View -> Encoding or something similar, or check the meta tag for the encoding in your HTML source code.
For the file, use an editor that displays the characters correctly and indicates the charset.
For the database, it depends on your database.

How to display localized characters

I have a MySQL database, PHP code to retrieve data from the database, and an Android program that receives the output from the PHP code via HTTP POST.
My localized characters display as question marks in my program. I have tried different charsets in my database: utf8_general_ci, utf8_unicode_ci, latin1_general_ci - still question marks. In HTML I could use the entity code for ø, but not in an Android program - and I shouldn't have to.
First of all, where is this problem coming from? The database itself has no problems displaying localized characters with utf8_*. Android also has no problems. Is it the HTTP POST request or the PHP code that has problems with this?
I would check layer by layer:
Even if the DB encoding is UTF-8 - are you sure the value is properly stored?
Do you have a way to test the API (e.g. by using a web interface)? If needed, you could do packet inspection and check for the proper UTF-8 value.
Is your PHP API sending the correct encoding?
When reading the HTTP response in Android (i.e. parsing the stream), are you supplying an encoding, and is it the correct one? See the sketch below.
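For that last point, a minimal sketch of reading the response stream with an explicit charset on the Android side (assuming you have the raw InputStream from the connection; the method name is just illustrative):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

public class ResponseReader {
    // Read the HTTP response body as UTF-8 instead of relying on the platform default charset
    static String readBody(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
        StringBuilder body = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            body.append(line).append('\n');
        }
        return body.toString();
    }
}

If the question marks are already present in the bytes coming over the wire, the problem is upstream (the PHP script or the MySQL connection charset), not the Android parsing.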