Why does the web page I fetch with Perl look odd?

I have a Perl script that opens the page http://svejo.net/popular/all/new/ and extracts the names of the posts, but apart from the headers, everything looks encrypted. Nothing can be read.
When I open the same page in a browser, everything looks fine, including the source code. How is it possible to encrypt a page for a script but not for a browser? My Perl script sends the same headers as my browser (Google Chrome).

The page looks fine to me, although I don't read Bulgarian.
#!perl
use LWP::Simple;
getprint( 'http://svejo.net/popular/all/new/' );
This script returns the plain page without anything that looks odd or encrypted:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="bg" lang="bg">
<head>
<title>Svejo — Популярните новини </title>
What were you trying, and which versions of Perl and the modules are you using? What output are you seeing?
You've clarified that you are using ActivePerl on Windows (please update your question with these details). Remember, not only do you need to do the right Unicode things in your programs, but your terminal also has to be set up to display Unicode properly.
What happens when you explicitly binmode your output?
binmode STDOUT, ':utf8';
Try saving the output to a file and looking at it in an editor that understands UTF-8.
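Okay, that didn't work. Let's get even more general and set all handles to use UTF-8 by default. The :std subpragma is what makes this apply to STDIN, STDOUT and STDERR as well as to the handles you open yourself:
use open qw(:std :utf8);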
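The idea behind these layers is "decode once on input, encode once on output". Here is a minimal sketch using only the core Encode module, assuming the page body arrives as raw UTF-8 bytes; the byte string is a hard-coded stand-in for the fetched page, not real data from the site.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hard-coded stand-in for raw bytes received over HTTP:
# "Svejo " followed by three Cyrillic letters, encoded as UTF-8.
my $bytes = "Svejo \xD0\x9F\xD0\xBE\xD0\xBF";

# Decode once, right after input: bytes -> Perl character string.
my $text = decode('UTF-8', $bytes);

# Encode once, right before output: the layer turns characters back into UTF-8 bytes.
binmode STDOUT, ':encoding(UTF-8)';
print "$text\n";

# Each Cyrillic letter is one character but two bytes.
printf "%d characters, %d bytes\n", length($text), length($bytes);   # prints "9 characters, 12 bytes"
```

On output, :encoding(UTF-8) does the same job as the :utf8 layer in the binmode call above; on input handles it is usually preferred because it also checks that the data is well-formed UTF-8.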

The page is encoded with UTF-8.
Perhaps your Perl script is using a different encoding?
I found this page that describes Processing UTF-8 Files with Perl.
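The reading side of that advice, as a minimal sketch: open the file through an :encoding(UTF-8) layer so Perl decodes the bytes for you. The temporary file and its contents are made up for the example; File::Temp is a core module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small UTF-8 file to stand in for a saved page (hypothetical content).
my ($fh, $path) = tempfile();
binmode $fh, ':encoding(UTF-8)';
print {$fh} "\x{041D}\x{043E}\x{0432}\x{0438}\x{043D}\x{0438}\n";   # "Новини"
close $fh or die "close: $!";

# Read it back: the :encoding(UTF-8) layer decodes bytes into characters.
open my $in, '<:encoding(UTF-8)', $path or die "open $path: $!";
my $line = <$in>;
close $in;
chomp $line;

# 6 characters here; the same line read without the layer would be 12 bytes.
print length($line), "\n";
unlink $path;
```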

Related

System settings on Mac give an error for HTML

I just started learning HTML. I'm using a Mac with Sublime as my text editor. I have written six lines of HTML, but unfortunately my browser shows strange symbols in the output. What could be the problem? I think it has to do with either the system or the browser settings on my computer.
My output on my Chrome/Safari Browser
My basic HTML code
Any advice would be appreciated, thanks!

It's an encoding problem. You have to declare the correct encoding in the HTML head via a meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

You need to make sure your default encoding is set to UTF-8.
Below is the snippet from the default settings.
You need to add this to your user settings.
Go to your User Settings: Preferences > Settings - User and paste the snippet.
// Encoding used when saving new files, and files opened with an undefined
// encoding (e.g., plain ascii files). If a file is opened with a specific
// encoding (either detected or given explicitly), this setting will be
// ignored, and the file will be saved with the encoding it was opened
// with.
"default_encoding": "UTF-8",

Try deleting the whitespace first and see whether any invisible characters are being added. If that doesn't work, check the encoding of your file; it's possible you have a charset issue. You could also try another editor and/or another browser to see whether the errors persist there as well. If the charset or invisible characters aren't the issue, that will help you find the source of the problem more easily.
Check here.
http://www.w3schools.com/html/html_charset.asp

Generating HTML with enclosed HTML stored in DB

I'm generating HTML pages with dynamic content. Users can personalize their pages by adding a footer. My user control panel stores the footer in a UTF-8 table (MySQL). The footer itself can contain HTML.
When I generate my page I'm inserting the footer inside a DIV. My doctype is
'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'+
'<html xmlns="http://www.w3.org/1999/xhtml">'+#10+
'<head>'+#10+
'<meta http-equiv="content-type" content="text/html;charset=utf-8" />'+#10+
This works, but accented characters and characters like "•" don't display correctly. I've tried using HTMLEscape, but it breaks the HTML in the footer.
My question is: what is the simplest way to correct this without iterating over all the special characters and escaping them one by one?

Things I would check:
does the compiler issue warnings about string conversion problems?
can you save the HTML to a file, and open it with an editor like Notepad++ to verify it is UTF-8 encoding?
does the web server set the Content-Type header to text/html; charset=utf-8?
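The second check can also be scripted instead of done in an editor. A minimal sketch in Perl (the language used elsewhere on this page, not the asker's), using the core Encode module in strict mode; the helper name is_valid_utf8 is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: returns 1 if $bytes is well-formed UTF-8, 0 otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input instead of silently
    # substituting U+FFFD; LEAVE_SRC keeps $bytes unmodified.
    return eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;
}

# A bullet saved as UTF-8 (E2 80 A2) vs. the same bullet saved as Windows-1252 (95).
print is_valid_utf8("\xE2\x80\xA2") ? "UTF-8\n" : "not UTF-8\n";   # prints "UTF-8"
print is_valid_utf8("\x95")         ? "UTF-8\n" : "not UTF-8\n";   # prints "not UTF-8"
```

To check a saved page, read the file in :raw mode and pass its contents to the helper.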

Unicode is not shown in meta tag

I have used unicode in my website's meta tag as follows.
<meta property="og:title" content="ශ්‍රී ලංකා" />
But when I get view source in browser, it is shown as follows.
<meta property="og:title" content="????????" />
How can I avoid this?
Thank you.

With an editor like Notepad++, you must change the file encoding to UTF-8.

The Sinhala characters in your file have been converted to question marks somewhere in the process of uploading to the server or in server actions. They are actual question marks “?”, U+003F, not problem indicators used by browsers or source viewers. Question marks also appear near the very end of the page in visible content, line 445: ?????
The page appears to be served simply from a static HTML file by an Apache server, with no special server-side technology (though one cannot be sure, when looking from outside). This suggests that something has gone wrong in the upload process, like incorrect character code conversion (assuming you have checked that the file in your authoring system is UTF-8 encoded and displays correctly). This may happen if you transfer a file in “text mode” or “Ascii mode”, so I suggest uploading it again, in as raw mode as possible.
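One plausible way real question marks (U+003F) end up in a file is a conversion to a single-byte charset that has no Sinhala letters. A minimal Perl sketch of that mechanism; ISO-8859-1 is an arbitrary choice for the demonstration, not a claim about what the asker's upload tool actually did, and the number of question marks need not match the page.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# "ශ්‍රී ලංකා" written with escapes so the source file itself stays ASCII
# (the third code point is an invisible ZERO WIDTH JOINER).
my $title = "\x{0DC1}\x{0DCA}\x{200D}\x{0DBB}\x{0DD3} \x{0DBD}\x{0D82}\x{0D9A}\x{0DCF}";

# Encoding to a charset with no Sinhala letters: by default, Encode
# substitutes '?' for every character it cannot map. The data is now lost.
my $latin1 = encode('iso-8859-1', $title);
print "$latin1\n";   # prints "????? ????"
```

Once the substitution has happened, no later setting in the browser or server can bring the original characters back, which is why re-uploading the intact UTF-8 file is the fix.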

HTML file displaying weird characters when copied from Windows to Mac

OK, I think the title sums the question up nicely. Basically, I've written a help file in HTML on my Windows machine, so it includes characters like the following:
®, ', ", ...
It displays fine on Windows, but when I copy the file to my Mac and view it there, the characters above turn into gibberish and look foreign. I could retype them on my Mac and save the file, but I'm worried that I need to do something to prevent the same thing from happening on other computers and environments.
If anybody knows how I can stop this from happening, as easily as possible, I'd be grateful to know. Thanks in advance.

Make sure your HTML file is saved as UTF8 and use the UTF8 meta tag:
To save a file as UTF-8, open it in Notepad, choose "Save As", and make sure the encoding is set to UTF-8.
To add the UTF-8 meta tag to your HTML file, just add the following line in the "head" section: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
UTF8 is designed for backward compatibility with ASCII and to avoid the complications of endianness and byte order marks in UTF-16 and UTF-32. See: Wikipedia
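The backward compatibility is easy to see at the byte level. A minimal Perl sketch using the core Encode module: plain ASCII text produces exactly the same bytes in UTF-8, while a character such as ® (U+00AE) needs two bytes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Pure ASCII: the UTF-8 bytes are identical to the ASCII bytes.
my $ascii = encode('UTF-8', 'Help');
print "$ascii\n";   # prints "Help"

# Non-ASCII: "®" (U+00AE) becomes the two bytes C2 AE.
my $reg = encode('UTF-8', "\x{AE}");
print join(' ', map { sprintf '%02X', ord } split //, $reg), "\n";   # prints "C2 AE"
```

Those two bytes are exactly what an editor or browser misreads as two separate characters when it assumes a single-byte encoding instead.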

My guess is that it's due either to file encoding (maybe one uses UTF-8 and the other ISO-8859-1) or to differences between editors. On the Windows machine, try pasting the code into Notepad or WordPad, then send that code to the Mac.
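If the file was saved in a Windows code page, it can be converted to UTF-8 once instead of retyping the characters. A minimal Perl sketch, assuming (not confirmed by the question) that the original was saved as Windows-1252; the byte string stands in for the file's contents.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes as a Windows editor might save them in Windows-1252 (an assumption):
# AE = "®", 93 and 94 = curly double quotes.
my $cp1252 = "\xAE \x93Help\x94";

# Interpret the bytes as Windows-1252, then re-encode the characters as UTF-8.
my $utf8 = encode('UTF-8', decode('cp1252', $cp1252));

print join(' ', map { sprintf '%02X', ord } split //, $utf8), "\n";
```

For a real file, read it in :raw mode, convert as above, write the result in :raw mode, and then add the UTF-8 meta tag so browsers know which encoding to expect.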

You can save it as unicode and add the meta like John Riche said or replace it by its HTML entities:
® = &reg;
http://www.w3schools.com/tags/ref_entities.asp
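Replacing characters by hand is tedious; it can be automated. A minimal Perl sketch that swaps in numeric character references (&#174; instead of the named &reg;, which browsers treat the same) for every non-ASCII character, so the file becomes pure ASCII. The helper name to_ncr is hypothetical; the CPAN module HTML::Entities provides encode_entities if you prefer named entities.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace every non-ASCII character with &#NNN;.
# Expects a decoded Perl character string, not raw bytes.
sub to_ncr {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord $1)/ge;
    return $text;
}

print to_ncr("Acme\x{AE} \x{201C}Help\x{201D}"), "\n";
# prints: Acme&#174; &#8220;Help&#8221;
```

Because the output contains only ASCII, it displays the same whichever encoding the viewing machine assumes.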

Eclipse HTML editor for HTML template files

I'm trying to edit a phpBB HTML template file with Eclipse Ganymede (version 3.4.1) with the Web Developer Tools installed.
These template files contain HTML markup with template variables in the form {variable_name}. When I open such a file, Eclipse tries to validate these template variables as well.
For example template contains
<meta http-equiv="content-type" content="text/html; charset={S_CONTENT_ENCODING}" />
After opening Eclipse shows on editor body:
Unsupported Character Body
Character encoding "{S_CONTENT_ENCODING}" is not supported by this platform.
[Set encoding...] (button)
How can I solve this using WTP, or is there a better editor for editing templates?

Eclipse is trying to determine the text encoding from your meta tags and fails.
To override this behavior open the file in eclipse so you can see the error. Open the File menu and choose Properties (Alt-Enter) and eclipse will show you the properties dialog for the file where you can change the text file encoding.
I don't know if this can be disabled for all the files.

I've never used Eclipse on Linux, but it looks like the problem isn't really about Eclipse supporting variables. It's that Eclipse is trying to use a character set that it thinks is called "{S_CONTENT_ENCODING}".
You can probably get around the problem by changing {S_CONTENT_ENCODING} to utf-8 (or latin-1, or whatever you use) in all of your templates. (This assumes that you aren't changing the encoding from one template to the next, and I really doubt you are.)
Copy-paste utf-8 wherever you see {S_CONTENT_ENCODING} in one of the templates, and Eclipse should handle the other {foo} placeholders from there.
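The same substitution can be done across all templates at once. A minimal Perl sketch of the replacement on a single template line; the helper name hardcode_charset is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: hard-code the charset placeholder in a template's text.
sub hardcode_charset {
    my ($template, $charset) = @_;
    $template =~ s/\{S_CONTENT_ENCODING\}/$charset/g;
    return $template;
}

my $line = '<meta http-equiv="content-type" content="text/html; charset={S_CONTENT_ENCODING}" />';
print hardcode_charset($line, 'utf-8'), "\n";
# prints: <meta http-equiv="content-type" content="text/html; charset=utf-8" />
```

Applied to real files, the equivalent one-liner would be something like perl -pi.bak -e 's/\{S_CONTENT_ENCODING\}/utf-8/g' *.html, run in the template directory (assuming the templates end in .html; -i.bak keeps backups). The trade-off is that phpBB can no longer fill in the charset itself.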

We Keep Coding
