Html.Raw() and a "TM" character - html

I'm getting XML data which contains some embedded HTML formatting. I need to display this in an MVC.NET Razor .cshtml page, and I use the Html.Raw() method to do this. However, this same XML data also contains the "TM" (trademark, ™) character inline, and for some reason this gets converted into some jumbled text in the HTML. I'm not sure what needs to be fixed here.

The problem was actually in retrieving the XML: the WebClient was defaulting to a Latin charset instead of UTF-8, so the ™ was being garbled at that step. Forcing it to UTF-8 fixed the XML, and the HTML in the page then displayed correctly.
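For anyone hitting the same symptom: the trademark sign (U+2122) is three bytes in UTF-8, so decoding those bytes with a Latin charset produces three junk characters. A minimal Python sketch of the byte mechanics (the actual fix above was on the .NET WebClient side; this only illustrates why the text jumbles):

tm = "\u2122"                  # the trademark sign '™'
raw = tm.encode("utf-8")       # b'\xe2\x84\xa2' - three bytes in UTF-8
print(raw.decode("cp1252"))    # 'â„¢' - decoded with a Latin default, jumbled
print(raw.decode("utf-8"))     # '™' - forcing UTF-8 round-trips cleanly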


How to decode this text and display it as Arabic?

I have some text in a database that is encoded in some way, but I don't know what exactly, and the person who built the database and website left no documentation or comments of any kind. As an example, ÌÈÑÇä-1914 is stored in the database, but on the website it is displayed as جبران-1914. How do I decode ÌÈÑÇä-1914 in order to get the Arabic text?
I've tried decoding ÌÈÑÇä-1914 as windows-1256, ISO-8859-1, and UTF-8, and I've put it in an HTML file with a lang attribute on the html tag and meta charset=utf-8, but nothing lets me display it as Arabic text instead of some indecipherable characters. How can I convert the encoded text into Arabic?
I realized the problem was that I was exporting the data incorrectly; when I exported it as ISO 8859-2 and converted it to windows-1256 in Atom, it displayed properly.
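For the record, the stored example round-trips in a couple of lines of Python; a sketch assuming the bytes are windows-1256 but were decoded as Latin-1 somewhere along the way:

stored = "ÌÈÑÇä-1914"              # what the database shows
raw = stored.encode("latin-1")     # recover the original bytes
print(raw.decode("windows-1256"))  # 'جبران-1914' - the intended Arabic text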

write_html() method in fpdf not using font/encoding specified

I'm creating a PDF with a large collection of quotes that I've imported into Python with docx2python, using html=True so that they keep some tags. I've done some processing so they only really contain bold, italics, underline, or break tags. I've sorted them and am trying to write them onto a PDF using the fpdf library, specifically the pdf.write_html(quote) method. The trouble comes with several special characters I have, so I'm hoping to encode the PDF as UTF-8. To use .write_html(), I had to create a new class, as shown in their readthedocs under the .write_html() method (at the very bottom of the left-hand side):
from fpdf import FPDF, HTMLMixin

class htmlFPDF(FPDF, HTMLMixin):
    pass

pdf = htmlFPDF()
pdf.add_page()
# set the overall PDF to utf-8 to preserve special characters
pdf.set_doc_option('core_fonts_encoding', 'utf-8')
pdf.write_html(quote)
The list of quotes going into the PDF all appear with their special characters and the HTML tags (<u> or <i>) in the debugger, but after the .write_html() step they show up in the PDF file as mojibake, even before being saved, as seen through the debugger. An example is "dayâ€ÂTMs demands", when it should be "day's demands" (with a right single quotation mark, which this textbox doesn't support).
I've tried updating the font I use by
pdf.add_font('NotoSans', '', 'NotoSans-Regular.ttf', uni=True)
pdf.set_font('NotoSans', '', size=12)
added after the .add_page() method, but this doesn't change the current font (or fix the mojibake) on the PDF unless I use the more common .write(text_height, quote) method, which renders the underline/italic tags into the PDF as literal text. The .write() method does preserve the special characters. I'm not really trying to change the font, but to make sure that what's written onto the PDF preserves the special characters instead of mojibaking them.
I've also attempted some .encode()/.decode() calls before going into .write_html(), as well as some methods from the ftfy library, and tried adding '' to the start of each quote, to no effect.
If anyone has ideas for a way to iterate through each line on the PDF that'd be terrific, since then I could use ftfy to fix the mojibake. But ideally, it would be some other html tag at the start of each quote or a way to change the font/encoding of the .write_html() method, maybe in the class declaration?
Or maybe I'm at a dead end and should just split each quote on '<', use if statements to detect underline, italics, etc., and use the .write() method after all.
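For what it's worth, if the mangled text can be intercepted as a string before it reaches the page, the ftfy repair itself is straightforward; a minimal sketch (the sample string is reconstructed from the mojibake pattern in the question, not taken from the actual data):

import ftfy  # pip install ftfy

broken = "dayâ€™s demands"    # U+2019 apostrophe, encoded as UTF-8, read back as cp1252
print(ftfy.fix_text(broken))  # day's demands - with the curly apostrophe restored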
Extracting docx to HTML works really badly with docx2python; I did this a few months ago, and I recommend PyDocX instead. docx2python is good for extracting docx file content, not for converting it into HTML.

HTML5 placeholder showing numerical HTML encoding

I have an application on Apache. My Apache is configured with ISO-8859 as the default encoding, and I'm not able to change it because the same Apache supports other applications that need this.
So, in my application I'm using numeric HTML encoding for special characters, like this: Usu&#225;rio (this is Usuário).
It's working fine, but in placeholder and title attributes (HTML5 elements), the interface is showing &#225; literally instead of showing á.
Any ideas?
Thanks
You could rename your .html file to .php and add following line to the first row:
<?php header('Content-Type:text/html; charset=UTF-8'); ?>
This will send a response from server that the content which is sent is encoded in utf-8.
By adding the above code nothing will be broken and you won't see any difference except for the correct encoding.
In case you need to move the site from one server to another, you can undo those steps and everything will still work as expected.
I tried to reproduce your issue with the given HTML entity, and the placeholder renders the character correctly.
Resolved. I used the Unicode code point directly instead of numeric HTML encoding. Take a look at a UTF-8 encoding table and Unicode character reference.
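As a quick sanity check on what a numeric character reference should display, Python's standard library can decode it; a minimal sketch:

import html

print(html.unescape("Usu&#225;rio"))  # Usuário - what the browser should render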

HTML encoding issues - "Â" character showing up instead of "&nbsp;"

I've got a legacy app just starting to misbehave, for whatever reason I'm not sure. It generates a bunch of HTML that gets turned into PDF reports by ActivePDF.
The process works like this:
1. Pull an HTML template from a DB with tokens in it to be replaced (e.g. "~CompanyName~", "~CustomerName~", etc.)
2. Replace the tokens with real data
3. Tidy the HTML with a simple regex function that properly formats HTML tag attribute values (ensures quotation marks, etc., since ActivePDF's rendering engine hates anything but single quotes around attribute values)
4. Send the HTML off to a web service that creates the PDF.
Somewhere in that mess, the non-breaking spaces from the HTML template (the &nbsp;s) are being encoded as ISO-8859-1 so that they show up incorrectly as an "Â" character when viewing the document in a browser (Firefox). ActivePDF pukes on these non-UTF-8 characters.
My question: since I don't know where the problem stems from and don't have time to investigate it, is there an easy way to re-encode or find-and-replace the bad characters? I've tried sending it through this little function I threw together, but it doesn't change anything.
Private Shared Function ConvertToUTF8(ByVal html As String) As String
    Dim isoEncoding As Encoding = Encoding.GetEncoding("iso-8859-1")
    Dim source As Byte() = isoEncoding.GetBytes(html)
    Return Encoding.UTF8.GetString(Encoding.Convert(isoEncoding, Encoding.UTF8, source))
End Function
Any ideas?
EDIT:
I'm getting by with this for now, though it hardly seems like a good solution:
Private Shared Function ReplaceNonASCIIChars(ByVal html As String) As String
    Return Regex.Replace(html, "[^\u0000-\u007F]", "&nbsp;")
End Function
Somewhere in that mess, the non-breaking spaces from the HTML template (the &nbsp;s) are encoding as ISO-8859-1 so that they show up incorrectly as an "Â" character
That'd be encoding to UTF-8 then, not ISO-8859-1. The non-breaking space character is byte 0xA0 in ISO-8859-1; when encoded to UTF-8 it'd be 0xC2,0xA0, which, if you (incorrectly) view it as ISO-8859-1, comes out as "Â" followed by a non-breaking space. That includes a trailing nbsp which you might not be noticing; if that byte isn't there, then something else has mauled your document and we need to see further up to find out what.
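The byte arithmetic in that explanation is easy to verify; a minimal Python sketch:

nbsp = "\u00a0"                         # the non-breaking space
mangled = nbsp.encode("utf-8")          # b'\xc2\xa0'
print(mangled.decode("latin-1"))        # 'Â' plus an invisible trailing nbsp
print(mangled.decode("utf-8") == nbsp)  # True - read as UTF-8 it is intact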
What's the regexp, and how does the templating work? There would seem to be a proper HTML parser involved somewhere if your &nbsp; strings are (correctly) being turned into U+00A0 NON-BREAKING SPACE characters. If so, you could just process your template natively in the DOM and ask it to serialise using the ASCII encoding to keep non-ASCII characters as character references. That would also save you having to do regex post-processing on the HTML itself, which is always a highly dodgy business.
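To illustrate the serialise-as-ASCII suggestion: if the pipeline were in Python, it might look like the sketch below, assuming lxml (which is not necessarily what this poster's stack uses - the original is .NET):

from lxml import etree  # pip install lxml

root = etree.fromstring("<p>caf\u00e9\u00a0menu</p>")
print(etree.tostring(root, encoding="ascii"))
# b"<?xml version='1.0' encoding='ascii'?>\n<p>caf&#233;&#160;menu</p>"
# non-ASCII characters survive as numeric character references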
Well anyway, for now you can add one of the following to your document's <head> and see if that makes it look right in the browser:
for HTML4: <meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
for HTML5: <meta charset="utf-8">
If you've done that, then any remaining problem is ActivePDF's fault.
If anyone had the same problem as me and the charset was already correct, simply do this:
1. Copy all the code inside the .html file.
2. Open Notepad (or any basic text editor) and paste the code.
3. Go to File -> Save As.
4. Enter your file name, e.g. "example.html" (select "Save as type: All Files (*.*)").
5. Select UTF-8 as the encoding.
6. Hit Save. You can now delete your old .html file, and the encoding should be fixed.
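The same re-save can be scripted; a short sketch that assumes the file is currently Latin-1 (adjust the source encoding to whatever the file really is):

with open("example.html", "r", encoding="latin-1") as f:
    text = f.read()
with open("example.html", "w", encoding="utf-8") as f:
    f.write(text)  # rewritten as UTF-8, the scripted version of the Notepad re-save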
Problem:
I was facing the problem where we were sending '£' with some string content in a POST request to a CRM system, but when we did the GET call from the CRM, it returned 'Â£' with the string content. So what we analysed is that '£' was getting converted to 'Â£'.
Analysis:
The glitch we found after doing research is that in the POST call we had set the HttpWebRequest ContentType to "text/xml", while in the GET call it was "text/xml; charset=utf-8".
Solution:
So as part of the solution we included charset=utf-8 in the POST request's ContentType, and it works.
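That conversion is the classic UTF-8-read-as-Latin-1 round trip; a quick Python sketch of what was happening:

pound = "\u00a3"                                  # '£'
garbled = pound.encode("utf-8").decode("latin-1")
print(garbled)                                    # 'Â£' - what the GET call returned
print(garbled.encode("latin-1").decode("utf-8"))  # '£' - reversing the mistake repairs it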
In my case this Â (A with a circumflex) occurred in code I generated from Visual Studio using my own code-generation tool. It was easy to solve:
Select a single space ( ) in the document. You should be able to see lots of single spaces that look different from the other single spaces; they are not selected. These other single spaces are the ones responsible for the unwanted characters in the browser. Go to Find and Replace and replace them with a regular single space ( ). Done.
PS: It's easier to see all the similar characters when you place the cursor on one, or if you select one in VS2017+; other IDEs may have similar features.
In my case I was getting a Latin cross sign instead of the nbsp, even though the page was correctly encoded as UTF-8. Nothing of the above helped in resolving the issue, and I tried it all.
In the end, changing the font for IE (with browser-specific CSS) helped: I was using Helvetica Neue as the body font, and changing it to Arial resolved the issue.
I was having the same sort of problem. Apparently it's simply because PHP doesn't recognise UTF-8.
I was tearing my hair out at first when a '£' sign kept showing up as 'Â£', despite it appearing OK in Dreamweaver. Eventually I remembered I had been having problems with links relative to the index file, where pages viewed directly would work with slideshows but not when used with an include (but that's beside the point). Anyway, I wondered if this might be a similar problem, so instead of putting the charset declaration into the page I was having problems with, I simply put it into the index.php file, and the problem was fixed throughout.
Here you can check a reference for all special characters in HTML:
http://www.degraeve.com/reference/specialcharacters.php
Well, I got this issue too on a few of my websites, and all I needed to do was customize the content filter for HTML entities. Before that, the more of them I deleted, the more I got, so just change your HTML filter or parsing function for the page and it should work. It's mainly due to the HTML editors in most CMSs: the way they store and parse the data caused this issue (in my case). Maybe this would help in your case too.

File upload mojibake

How do you do a file upload in an HTML form without running into mojibake?
I have a form that has three fields:
a file field
a required text field
a text field which accepts Japanese characters
I've set up my HTML form with the attribute enctype='multipart/form-data'. But when the form submission fails due to the missing required field, I get redirected to the same page, and my second text field (the one that accepts the Japanese characters) is already mojibake.
However, if I remove the enctype or change it to anything else, then when the form submission fails, I see the Japanese characters as they are (no mojibake). The problem is, if the submission succeeds, I am unable to read the uploaded files.
Any ideas how to fix this?
Mojibake (mangled display of Japanese characters) can have two causes:
The data on the page is in the right character encoding, but the browser does not recognize it.
Some characters on the page use the wrong encoding (the server wrote them in an incorrect encoding).
If the other characters on the page (outside of your form) show correctly, you produced broken output on your server.
If everything is clobbered, and you can fix it by manually setting a different encoding from the browser's menu, then the page encoding is not properly specified.
What kind of content-type headers and HTML meta tags do you use?
I've figured it out (by reverse-engineering AppFuse (appfuse.org), which does not seem to be affected by mojibake in its file upload form).
It solves it by setting the character encoding to UTF-8 on the server side (with Spring's org.springframework.web.filter.CharacterEncodingFilter). So I guess multipart/form-data really does screw up the character encoding (or at least for Java).