How to keep special characters (ö, é etc.) intact from R to CSV to HTML?

I create some data in R (using RStudio) that I export as a CSV. This CSV is then uploaded to an HTML page.
However, characters like é, ö and ä always come out garbled.
Is there a way to encode them in my R script so that they look right in the HTML, i.e. readable as é/ä/ö/ü?
Thank you!

You need to encode special characters as HTML entities, e.g. instead of ä, write &auml;.
You can find the full list here: https://dev.w3.org/html5/html-author/charref
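The question is about R, but the lookup itself is the same in any language. As a short sketch in Python (not R), the standard library's html.entities.codepoint2name table maps code points to the HTML 4 named entities; the linked list is larger, so treat this table as a subset of it.

```python
from html.entities import codepoint2name

# Map each accented letter to its named HTML entity (HTML 4 entity table).
entities = {ch: "&{};".format(codepoint2name[ord(ch)]) for ch in "éöäü"}

for ch, entity in entities.items():
    print(ch, entity)  # e.g. "ä &auml;"
```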

Related

How do I make a Hebrew JSON file show the correct characters in an HTML page?

I am having a problem showing Hebrew letters on my HTML page. I am using (as far as I know; maybe I'm wrong) a JSON file from here:
https://getbible.net/json?scripture=Psa%20119&version=bhs
I want it to look like this one:
https://www.biblegateway.com/passage/?search=Ps.119&version=WLC
But I only get this: \u05d0\u05b7\u05e9\u05c1\u05b0\u05e8\u05b5\u05d9 \u05e0\u05b9\u05e6\u05b0\u05e8\u05b5\u05d9 \u05e2\u05b5\u05d3\u05b9\u05ea\u05b8\u05d9\u05d5
I want it to show the actual Hebrew characters. Does anyone know how to fix that on the HTML page? Thank you.
I don't know which framework or library you are using, but the text should display correctly unless you, or the library, escape the Unicode characters a second time ("\u05d0" becoming "\\u05d0").
var verse ={"verse_nr":1,"verse":"\u05d0\u05b7\u05e9\u05c1\u05b0\u05e8\u05b5\u05d9 \u05ea\u05b0\u05de\u05b4\u05d9\u05de\u05b5\u05d9\u05be\u05d3\u05b8\u05e8\u05b6\u05da\u05b0 \u05d4\u05b7\u05d4\u05b9\u05dc\u05b0\u05db\u05b4\u05d9\u05dd \u05d1\u05bc\u05b0\u05ea\u05b9\u05d5\u05e8\u05b7\u05ea \u05d9\u05b0\u05d4\u05d5\u05b8\u05d4\u05c3\r\n"};
document.getElementById("content").textContent = verse.verse;
<div id="content"></div>
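The double-escaping described above can be reproduced with Python's standard json module. This is a minimal sketch with illustrative strings, not tied to any particular framework: one level of \u escaping decodes to a Hebrew letter, while an escaped backslash decodes to six literal characters.

```python
import json

# JSON text with one level of \u escaping: this is what the server should send.
once = '"\\u05d0"'
# The same text after something escaped the backslash a second time.
twice = '"\\\\u05d0"'

print(json.loads(once))   # א (a single character, U+05D0)
print(json.loads(twice))  # \u05d0 (six literal characters)

# Possible repair if you cannot fix the source: parse the leftover escapes once more.
# Assumes the decoded text contains no unescaped double quotes.
repaired = json.loads('"' + json.loads(twice) + '"')
print(repaired == json.loads(once))  # True
```

Fixing whichever layer adds the second escape is preferable to the repair step, which is only a workaround.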

How to insert these non-ascii characters as html content?

Any non-ASCII character is written as &#xYYYY;, as per the code below.
My editor is Sublime Text.
How do I represent these emoticons in HTML?
I found this Sublime Text 3 Plugin to insert emojis into the editor.
https://packagecontrol.io/packages/Emoji
Is this what you are looking for?
As long as you save the file you're editing in the UTF-8 character encoding, and make sure it is delivered with a suitable content type header, such as Content-Type: text/html; charset=utf-8, you don't need to do anything at all.
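A minimal Python sketch of the "save it as UTF-8" advice, writing a hypothetical emoji.html into a temporary directory. The <meta charset> line is only a fallback for when you cannot set the Content-Type header; an HTTP charset header takes precedence over it.

```python
import tempfile
from pathlib import Path

# Emoji typed directly into the markup; no entities needed.
page = (
    "<!DOCTYPE html>\n"
    '<html><head><meta charset="utf-8"><title>Emoji</title></head>\n'
    "<body><div>😀 😐 😳</div></body></html>\n"
)

# Save with an explicit UTF-8 encoding instead of the platform default.
path = Path(tempfile.mkdtemp()) / "emoji.html"  # hypothetical file name
path.write_text(page, encoding="utf-8")

# Each of these emoji is stored as four UTF-8 bytes (U+1F600 -> F0 9F 98 80).
print(path.read_bytes().count("😀".encode("utf-8")))
```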
Another option, as others have noted, is to add them as HTML entities instead. To do that, you need to know their character codes. How to find them differs between environments, but there are multiple questions on SO about that.
Here's how you could do it in Python (Python 3, you'd need to use u"" strings in earlier versions):
chars = [
    "😀",
    "😐",
    "😳",
    "😫",
    "💩",
]
# Print each character next to its hexadecimal numeric character reference, e.g. &#x1f600;
for char in chars:
    print("{}: &#x{:x};".format(char, ord(char)))
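To check that such a reference really stands for the emoji, Python's html.unescape decodes numeric character references in both hexadecimal and decimal form. A small sketch, using 😁 as an arbitrary example:

```python
import html

smile = "😁"
hex_ref = "&#x{:X};".format(ord(smile))  # hexadecimal form
dec_ref = "&#{};".format(ord(smile))     # decimal form

print(hex_ref, dec_ref)  # &#x1F601; &#128513;
# A browser (or html.unescape) turns either reference back into the emoji.
print(html.unescape(hex_ref) == smile, html.unescape(dec_ref) == smile)  # True True
```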
You can represent emoticons in HTML by their Unicode code point, formatted as &#xAAAAA; (some unicode list):
<div>😁</div>
I am using Sublime Text as my editor, and when you view it in a browser it should look like this:

dropdown doesn't display UTF-8 correctly

I have a <select> element with some options in a dropdown. Some of the products in that dropdown have names with special characters like é. But on the front end, instead of é it shows ä characters.
As a fix, I tried using special character codes like &Eacute; for é inside a text field. But when I replace é with &Eacute; inside the text field, the front end shows &Eacute;. My Magento store's charset is UTF-8.
I want to use é, $, ä, etc. in my Magento store. Is there any way to solve this problem that doesn't affect the rest of the website?
You will have to save the file in UTF-8 as well: both the file presenting the text and the file that outputs the data that populates the select box.
A common mistake, at least for me, is forgetting that when working with UTF-8 you have to ensure everything is saved in it: scripts, code-behind, HTML, everything.
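A quick Python illustration of what goes wrong when one link in the chain is not UTF-8: the UTF-8 bytes for é read back as ISO-8859-1 turn into "Ã©". The name José is just sample data.

```python
text = "José"

# UTF-8 bytes read back as if they were Latin-1: the classic "Ã©" mojibake.
utf8_as_latin1 = text.encode("utf-8").decode("iso-8859-1")
print(utf8_as_latin1)  # JosÃ©

# Same encoding on both ends: the text round-trips unchanged.
round_trip = text.encode("utf-8").decode("utf-8")
print(round_trip)  # José
```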
David Johansson is correct.
I had the same problem with a box with a list of names.
I populated it via a function that looked up the people and created the lines for each person found. However, people with names containing accents didn't display correctly.
I resolved it by running my result through iconv before returning the value.
return iconv('ISO-8859-1','UTF-8', $retval);
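For comparison, the conversion that iconv('ISO-8859-1','UTF-8', ...) performs can be sketched in Python, assuming (as the answer does) that the stored bytes really are ISO-8859-1. The sample name is made up.

```python
# Bytes as they might come back from a Latin-1 (ISO-8859-1) data source.
latin1_bytes = "José".encode("iso-8859-1")  # b'Jos\xe9'

# Sending these bytes to a UTF-8 page is the bug: 0xE9 alone is not valid UTF-8.
print(latin1_bytes.decode("utf-8", errors="replace"))  # ends in U+FFFD

# Rough Python equivalent of iconv('ISO-8859-1', 'UTF-8', $retval):
utf8_bytes = latin1_bytes.decode("iso-8859-1").encode("utf-8")
print(utf8_bytes)  # b'Jos\xc3\xa9'
```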

Encode only non-ASCII characters to HTML entities, keeping HTML tags

I'm pulling text from a database, processing it, and uploading it as plain text to an HTML email creator. The email tool is internal to my company. It can take simple HTML tags, but it can't handle non-ASCII characters. They will be displayed as ¿ to the end user. As an example of what I'm working with, the source text from the database might look like this:
The café was…<br/>“delicious”.
My desired output would be
The caf&eacute; was&hellip;<br/>&ldquo;delicious&rdquo;.
If I use an HTML entity encoder like HTMLEntities it encodes everything, including the tag brackets (< and >). Here's the output from using HTMLEntities:
The caf&eacute; was&hellip;&lt;br/&gt;&ldquo;delicious&rdquo;.
If I upload the above to the HTML email tool, the end-user would see this in their email:
The café was…<br/>“delicious”
Is there any way to get the best of both worlds, where the tags are left alone but the non-ASCII characters are encoded as HTML entities? I could continue using HTMLEntities and just use a gsub; something like this:
require 'htmlentities'
coder = HTMLEntities.new
string = "The café was…<br/>“delicious”."
coder.encode(string, :named).gsub(/&lt;/, "<").gsub(/&gt;/, ">")
#=> "The caf&eacute; was&hellip;<br/>&ldquo;delicious&rdquo;."
This seems pretty fragile to me. Any better way to do it?
Check the encoding of your data.
Make sure your database is saving your data in UTF-8, and add:
# encoding: UTF-8
at the top of your Ruby file.
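The question uses Ruby's HTMLEntities gem; as a language-neutral illustration, here is a Python sketch of the behaviour it asks for: encode only characters outside ASCII and leave everything else, including < and >, untouched. It uses the standard library's html.entities table for named entities and falls back to numeric references; encode_non_ascii is a hypothetical helper name.

```python
import re
from html.entities import codepoint2name


def encode_non_ascii(text: str) -> str:
    """Replace each non-ASCII character with an HTML entity; ASCII (including < and >) is left alone."""

    def to_entity(match):
        code_point = ord(match.group())
        name = codepoint2name.get(code_point)
        # Prefer a named entity such as &eacute;, else fall back to a numeric one.
        return "&{};".format(name) if name else "&#{};".format(code_point)

    return re.sub(r"[^\x00-\x7f]", to_entity, text)


source = "The café was…<br/>“delicious”."
print(encode_non_ascii(source))  # The caf&eacute; was&hellip;<br/>&ldquo;delicious&rdquo;.
```

Because the regular expression only matches code points above 0x7F, the tags never need to be restored afterwards, which avoids the fragile gsub step.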

HTML Character Encoding

When outputting HTML content from a database, some encoded characters are being properly interpreted by the browser while others are not.
For example, %20 properly becomes a space, but %AE does not become the registered trademark symbol.
Am I missing some sort of content encoding specifier?
(Note: I cannot realistically change the content to, for example, &reg;, as I do not have control over the input editor's generated markup.)
%AE is not valid HTML-safe ASCII.
You can view the table here: http://www.ascii.cl/htmlcodes.htm
It looks like you are dealing with Microsoft Word's encoding (Windows-1252 or something similar). It will NOT convert to HTML-safe text unless you do some sort of translation in between.
The byte AE is the ISO-8859-1 representation of the registered trademark sign. If you don't see anything, then the URL decoder is apparently using another charset to URL-decode it. In UTF-8, for example, this byte on its own does not represent any valid character.
To fix this, you need to URL-decode it using ISO-8859-1, or to convert the existing data to be URL-encoded using UTF-8.
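Both fixes can be sketched with Python's urllib.parse, used here only as a stand-in for whatever URL decoder your server actually uses:

```python
from urllib.parse import quote, unquote

# Decoding %AE as UTF-8 fails: 0xAE on its own is not a valid UTF-8 sequence.
print(repr(unquote("%AE")))                   # '\ufffd' (replacement character)
# Decoding with ISO-8859-1 recovers the registered sign.
print(unquote("%AE", encoding="iso-8859-1"))  # ®
# Re-encoding the text as UTF-8 URL encoding gives a two-byte sequence.
print(quote("®"))                             # %C2%AE
```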
That said, you should not confuse HTML (XML) encoding like &reg; with URL encoding like %AE.
The '%20' encoding is URL encoding. It's only useful for URLs, not for displaying HTML.
If you want to display the reg character in an HTML page, you have two options: Either use an HTML entity, or transmit your page as UTF-8.
If you do decide to use entities, it's fairly simple to convert them en masse, since you can use numeric entities; you don't have to use the named ones, i.e. use &#174; rather than &reg;.
If you need to know entity codes for every character, I find this cheat-sheet very helpful: http://www.evotech.net/blog/2007/04/named-html-entities-in-numeric-order/
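One way to do the en-masse numeric conversion in Python is the xmlcharrefreplace error handler, which replaces every character the target encoding cannot represent with a decimal &#NNN; reference. It leaves &, < and > alone, so it does not HTML-escape markup. The product names are made up:

```python
text = "Acme® Widget™"

# Characters outside ASCII become decimal numeric character references.
numeric = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(numeric)  # Acme&#174; Widget&#8482;
```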
What server-side language are you using? Check for a URL-decode function.
If you are using PHP, you can use urldecode(), but be careful with + characters: urldecode() turns them into spaces, whereas rawurldecode() leaves them alone.
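Python's urllib.parse has the same split, which makes the + pitfall easy to see: unquote() behaves like PHP's rawurldecode() and unquote_plus() like urldecode(). The input string is only an example.

```python
from urllib.parse import unquote, unquote_plus

encoded = "1+1%3D2"

# Percent-decoding only: the literal + survives.
print(unquote(encoded))       # 1+1=2
# Form-style decoding: + becomes a space before percent-decoding.
print(unquote_plus(encoded))  # 1 1=2
```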