Generating HTML with enclosed HTML stored in DB - html

I'm generating HTML pages with dynamic content. Users can personalize their pages by adding a footer. My user control panel stores the footer in a UTF-8 table (MySQL). The footer itself can contain HTML.
When I generate my page I'm inserting the footer inside a DIV. My doctype is
'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'+
'<html xmlns="http://www.w3.org/1999/xhtml">'+#10+
'<head>'+#10+
'<meta http-equiv="content-type" content="text/html;charset=utf-8" />'+#10+
This works, but accented characters and characters like "•" don't display correctly. I've tried using HTMLEscape, but it breaks the footer HTML.
My question is: what is the simplest way to correct this without iterating over all the special characters and escaping them one by one?

Things I would check:
Does the compiler issue warnings about string conversion problems?
Can you save the HTML to a file and open it with an editor like Notepad++ to verify that it is UTF-8 encoded?
Does the web server set the Content-Type header to text/html; charset=utf-8?
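The second check can also be scripted. Here is a minimal sketch in Python (not the Delphi-style code from the question), assuming the generated page is available as raw bytes. The helper names is_valid_utf8 and misread_as_cp1252 are made up for this illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def misread_as_cp1252(text: str) -> str:
    """Show what UTF-8 text turns into when it is wrongly decoded as Windows-1252."""
    return text.encode("utf-8").decode("cp1252", errors="replace")


if __name__ == "__main__":
    bullet = "\u2022"  # the bullet character mentioned in the question
    print(is_valid_utf8(bullet.encode("utf-8")))   # True: proper UTF-8 bytes
    print(is_valid_utf8("\u00e9".encode("latin-1")))  # False: a lone 0xE9 byte is not UTF-8
    # ascii() keeps the demo printable on consoles that are not UTF-8 themselves
    print(ascii(misread_as_cp1252(bullet)))
```

If the page bytes pass the first check but the browser still shows three odd characters where the bullet should be, the bytes are probably fine and the declared or served charset is the more likely culprit.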

Related

How do I display Unicode as text in HTML?

I can't find a way to do this. For example, how do I get ∞ (the infinity symbol) to display as text in an HTML document?
First, check which Content-Type header your server returns. Is it Content-Type: text/html; charset=UTF-8? See Character_encodings_in_HTML. If the server returns a charset, either fix it or use it, because it overrides the encoding declared in the document (see HTML entities).
If your server does not provide a charset, then add one in the document, as early as possible (the declaration should fit entirely within the first 1024 bytes). Again, see Character_encodings_in_HTML. The following header should do:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
or for HTML 5:
<meta charset="utf-8">
or for XHTML (the first line):
<?xml version="1.0" encoding="UTF-8"?>
And if you do not or cannot use UTF-8 for your document, use HTML entities, as C Travel suggests.
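To see which charset a given Content-Type value actually declares, a small sketch in Python can help. It only parses a header string you already have (for example, copied from the browser's network tab); the function name charset_from_content_type is hypothetical.

```python
from email.message import Message


def charset_from_content_type(header_value: str):
    """Return the charset parameter of a Content-Type value (lower-cased), or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


if __name__ == "__main__":
    print(charset_from_content_type("text/html; charset=UTF-8"))  # utf-8
    print(charset_from_content_type("text/html"))                 # None
```

A None result would mean the server sends no charset, which is the case where the in-document declaration matters.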
You write the character, e.g. “∞”, in your authoring program, save the file as UTF-8 with BOM, and make sure that the fonts that you have declared for the page, or the relevant piece of text, contain the character(s) you have included. For more information, see my Guide to using special characters in HTML. If problems remain, please post the code you have tried and specify how it fails (and on which browsers).
You can use a numeric character reference of the form &#NNNN;, for example &#8734; for ∞.
For codes: http://unicode-table.com/en/
You also have to save the file with UTF-8 encoding and put the UTF-8 meta tag in the head (if you don't already have it).
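If you need to produce such references for text you already have, one possible approach is sketched below in Python. The function name to_char_refs is made up; the xmlcharrefreplace error handler is part of the standard codec machinery.

```python
import html


def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(to_char_refs("\u221e"))          # &#8734;
    # html.unescape goes the other way; ascii() keeps the output console-safe
    print(ascii(html.unescape("&#8734;")))
```

Note that this leaves ASCII markup characters such as < and & untouched, so it is not a substitute for HTML escaping.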

Encoding issue in Mailchimp emails

I created a Mailchimp template for the email newsletter of the company I work for. There's an issue with some links and I can't work out how to fix them.
I add a link into the email like so:
Contact Us
And the link appears fine, but when clicked within Gmail it takes you to the site's 404 page, even though the URL is (on the surface) correct.
After clicking the link, the URL displayed in the address bar is http://www.nameofcompany.com/contact-us.php, which is the correct URL and which when typed into the address bar directly goes to the correct page. But when I visit this URL from the email, then copy and paste it from the address bar into a new email in Gmail, I see: http://www.nameofcompany.com/contact%E2%80%90us.php
So this appears to be an issue with character encoding. I have no idea how to fix it though.
Here's the doctype, charset etc from the HTML of the email.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
The strangest thing is, most URLs in the email work perfectly fine, even those with dashes in.
What's causing this issue and how can I fix it?
Cheers
Okay, fixed it. I found the template on Mailchimp and used "Edit this template's code" to edit the HTML within my browser. Then I found the tag that was causing the trouble, deleted it and typed it back out again. Bit of a crude fix and I'm not sure why the problem originally arose but it's done the job!
E2 80 90 is the UTF-8 byte sequence for the Unicode hyphen character U+2010 (HYPHEN); you should be using the ASCII hyphen-minus (U+002D) instead.
Which application do you use to write your HTML files? This has to do with the encoding of the hyphen. Try using a simple text font (such as Courier) if your code editor gives you that option.
I don't know why the hyphen was encoded into 3 bytes here; usually non-alphanumeric characters in a URL are encoded into one byte.
Try replacing hyphen with %2D, so that hyphen won't be transformed into %E2%80%90.
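The two hyphens look identical on screen, so it can help to inspect the URL programmatically. A rough sketch in Python, using the placeholder file name from the question; the helper non_ascii_chars is made up for this example.

```python
import unicodedata
from urllib.parse import quote


def non_ascii_chars(url: str):
    """List (index, character, Unicode name) for every non-ASCII character in the URL."""
    return [(i, ch, unicodedata.name(ch, "UNKNOWN"))
            for i, ch in enumerate(url) if ord(ch) > 127]


if __name__ == "__main__":
    suspicious = "contact\u2010us.php"   # contains U+2010 HYPHEN
    plain = "contact-us.php"             # contains ASCII hyphen-minus
    print(quote(suspicious))             # contact%E2%80%90us.php
    print(quote(plain))                  # contact-us.php
    print([(i, name) for i, _, name in non_ascii_chars(suspicious)])
```

Running something like this over the href values in a template should point at exactly the link that needs retyping.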

Why does the web page I fetch with Perl look odd?

I have a Perl script that opens the page http://svejo.net/popular/all/new/ and filters the names of the posts, but apart from the headers, everything seems encrypted. Nothing can be read.
When I open the same page in a browser everything looks fine, including the source code. How is it possible to encrypt a page for a script and not for a browser? My Perl script sends the same headers as my browser (Google Chrome).
The page looks fine to me, although I don't read Bulgarian.
#!perl
use LWP::Simple;
getprint( 'http://svejo.net/popular/all/new/' );
This script returns the plain page without anything that looks odd or encrypted:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="bg" lang="bg">
<head>
<title>Svejo — Популярните новини </title>
What were you trying, and which versions of perl and the modules are you using? What is the output that you are seeing?
You clarify that you are using ActivePerl on Windows (please update your question with additional details). Remember, not only do you need to do the right Unicode things in your programs, but your terminal has to be set up to display Unicode properly.
What happens when you explicitly binmode your output?
binmode STDOUT, ':utf8';
Try saving the output to a file and looking at it in an editor that understands UTF-8.
Okay, that didn't work. Let's get even more general and set all handles to use UTF-8 by default:
use open IO => ':utf8';
The page is encoded with UTF-8.
Perhaps your Perl script is using a different encoding?
I found this page that describes Processing UTF-8 Files with Perl.
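For comparison only, here is the "save it to a file and look at the bytes" check sketched in Python rather than Perl. The function name save_as_utf8 and the file name page.html are assumptions for the example; the point is that the encoding is stated explicitly instead of being left to the platform default.

```python
import os
import tempfile


def save_as_utf8(text: str, path: str) -> bytes:
    """Write text to path as UTF-8 and return the raw bytes that ended up on disk."""
    with open(path, "w", encoding="utf-8", newline="") as fh:
        fh.write(text)
    with open(path, "rb") as fh:
        return fh.read()


if __name__ == "__main__":
    word = "\u041f\u043e\u043f"  # first three Cyrillic letters of the page title
    target = os.path.join(tempfile.mkdtemp(), "page.html")
    raw = save_as_utf8(word, target)
    print(raw)  # each Cyrillic letter takes two bytes in UTF-8
```

If the file looks right in a UTF-8 aware editor, the fetched data is fine and the garbling happens only when the console displays it.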

Validate html 4.01 tags with weblogic

I need to validate a web application as HTML 4.01 Transitional. In my project I'm working on skeletons/head.jsp to add meta tags. The problem is that I want to add tags like:
<meta name="robots" content="follow">
without the closing tag. The document type is defined in skeleton.xml as HTML 4.01 Transitional. But when I remove the slash, the WorkShop Framework (Eclipse) fails if there is no end tag.
Is head.js where I have to put the meta tags?
“But when erase the slash the program fails.”
I’m not familiar with WebLogic, but when you say “the program fails”, do you mean your website stops working? Or the resulting HTML page doesn’t validate?

Why are my foreign-language (Malayalam) characters stored as HTML character references in the database?

On my website, using the Google language API, I type Malayalam text into a text box and a text area,
ഇതു ഒരു നല്ല സിനിമ ആണ്
like this, but when I look in the MySQL database, the table contains
&#3335;&#3364;&#3393; &#3346;&#3376;&#3393;
&#3368;&#3378;&#3405;&#3378; &#3384;&#3391;&#3368;&#3391;&#3374;
&#3334;&#3363;&#3405;
There is no problem viewing it on my website, but I use the same table data in my desktop application, and when that application fetches the data from the database it shows &#3335;&#3364;&#3393; &#3346;&#3376;&#3393; and so on,
not text in my native language. If I do a manual update on the database with an SQL query, the text is shown correctly in my language in both the website and the desktop application. My web server is Tomcat, my database is MySQL, and the database is UTF enabled.
What should I do in my Java application to save the text directly in my language rather than as &#3384; etc.?
You need to make sure that your web page is encoded in UTF-8, and declared so in the Content-Type HTTP header. Otherwise, the web browser may choose to send HTML-escaped characters when characters outside the presumed page encoding are entered.
Declare a Unicode encoding at the beginning of the document:
HTML5
<meta charset="utf-8">
HTML 4.01
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
XHTML 1.0
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
XML
<?xml version="1.0" encoding="UTF-8"?>
Stop HTML escaping your data. But do continue to database-escape it.
Something in your website code may well be HTML encoding the characters before storing them in the database.
Have a look at exactly how your strings are represented just before the database insert/update. If necessary, HTMLDecode the data before inserting into the DB.
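As an illustration of the decode step, here is a short sketch in Python rather than the Java used in the question (Java would need its own library for this). The function name decode_char_refs is made up; html.unescape is the standard-library call doing the work.

```python
import html


def decode_char_refs(value: str) -> str:
    """Turn HTML character references such as &#3384; back into the characters they stand for."""
    return html.unescape(value)


if __name__ == "__main__":
    stored = "&#3384;&#3391;"  # two Malayalam code points as numeric references
    decoded = decode_char_refs(stored)
    print(len(decoded))        # 2
    print(ascii(decoded))      # escaped form, safe on any console
```

Be aware that html.unescape also decodes references like &lt; and &amp;, so it should only be applied to values that are meant to be plain text. Fixing the page encoding so the browser never sends the references in the first place is the cleaner solution.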
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml">
Mallu ഇതു ഒരു നല്ല സിനിമ ആണ്