Way validate HTML code for gmail? - html

I got some auto generated HMLT code working. Made sure it is correctly parsed by http://validator.w3.org/, and it is a working HTML4.01 Strict.
Now, when I embed this code in a email and send it to gmail, the result is quite adverse (messes up the formatting).
The code is quite long, and apparently only happens when it has such size. This tells me two things:
not worth putting a code snippet here
it is probably some conflicting tag, but still considered valid the parser
Would you guys know of any tool even more strict to validate my HTML? Maybe even something specific for gmail?
Or maybe, just some pro tip on what usually screws the code for gmail.
ps.: the code, although is long, is also quite simple, only a few tables and styles - I took some care in making sure used only "email friendly" tags and formats.

Did you say styles? Oh, boy! Email clients are all doing things differently and even if you get it working for GMail, it may not work for Yahoo.
You may want to look at something like Email CSS guide to start, but really you also want to use some of the Inbox Analysis services (e.g. Inbox Inspector from MailChimp) to get a better picture.
I haven't done this myself (yet), but I have seen mentioned over and over that this is one area you can lose your hair over.

You have to code like its 1999 and not worry about being so tied to conformity to HTML.

Unfortunately valid HTML just doesn't work for some (most) email clients. Even Gmail will strip or ignore things probably for security reasons. The best bet for an email is basically HTML 3. Some inline styles for fonts. I know that <p> tags break in Gmail, and in general colspan and rowspan won't work as intended and you have to use nested tables. Those are just a few things I can think of off the top of my head.

All the answers here helped, but the actual problem was elsewhere.
The problem was In the HTML as I thought, but not exactly in my HTML.
Turns out the the email client will wrap lines too big before handling it to render, regardless if it is HTML code or whatever, more exactly, breaking tags in the middle - that explain why it was happening only when the report reached a certain length.
What tipped me that was when I looked at the code generated by MailChimp (suggested by Alexandre Rafalovitch) and noticed it was formatted as quoted-printable, cropped at exactly 75chars for every line.
After that was easy enough to do the same in my own code generator. Well, actually, I didn't even format as quoted-printable, only made sure it would wrap too long lines by itself.
Apart from that, for all I can tell, a HTML 4.01 Strict code will work pretty much fine in a Gmail client.
Hope it helps post-1999 generations.
cheers.

Related

Emails sent using PHP have intermittent missing sections

I have a test program that logs the users answers and grade to a log file. I also have it email off this in a nicely formatted html email to the administrator of the tests.
For the most part, this system works. But strangley I've noticed that different email clients are removing portions of the code. It's a table, so a lot of the code is very repetitive, and the sections that get removed are the same every time for each email client (outlook and gmail are the ones I've tested). So for example, if I have a section of the table:
<tr><td style="background:#a66;text-align:center">This is the answer</td></tr>
Then it may come out as:
<tr><td style="backgrouter">This is the answer</td></tr>
And I can't find any correlation between where it does this in each file. Sometimes its near the end, sometimes near the beginning. In some cases, if the test was particularly long, it won't even finish the email.
I have my php outputting the same exact html to a log file on the server, and that always comes out perfect.
What's going on? How do I fix it?
I think this is because of the 998 characters per line limitation on MIME Email.
For more details, you can see below posts:
Reasoning behing 76 being the line length limit for MIME sections, as defined by RFC 2045? (See answer by appleleaf)
HTML safe wrapping of long lines
My solution is to add "\r\n" between HTML tags so it won't exceed 998 characters per line. This works for me.
The one thing I can think to try is to append an !important to the each CSS statement.
<td style="background:#a66 !important;text-align:center !important">
Oh, and just noticed you are missing a closing ;.
That is weird!
Ok, firstly, have you tried spacing and terminating all the styles? eg:
<tr><td style="background-color: #aa6666; text-align: center;">
Secondly, it may be some strange HTML interpretation that Gmail is picking up on although I can't think of a reason for this to happen (eg. style names or reserved function names etc).
Otherwise, I'm stumped. I've only ever seen this happen with yahoo mail, where the HTML in an email broke the yahoo mail layout...
I'd be tempted to use css classes and style them inside a style tag. I've never seen them break.
I am afraid your problem has a deeper problem.
Those clients have some obscure way of processing data, and end up repeatably sent mails from same email dress to render it as it was a quote from some other mail.
I would advise you to check the html consistency of the mails, and to read up suported html for emails.
Also make sure that your email header is saying it is a html formatted email and not plain text. Formatting in the header is also important i would command utf8

Using direct HTML tags instead of div.class

I have some special tags on my blogsite which need to be as simple as possible so that my colleges who don't know anything about HTML can use it. For example
<question>...</question>
and
<answer>...</answer>
and then these are styled in CSS. It's far easier for HTML-idiots than to use the <div class="answer">...</div> format.
I've just found out IE8 is displaying it all wrong while Firefox and Chrome do it right. Is that expected or am I doing something wrong? Do you know of any hack to fix this since there are tons of blogsposts I'll have to manually change otherwise!!
You want to create <question>...</question> etc.
These are not HTML (not even HTML5), and you will struggle to get browsers to understand them reliably.
A quick tip that might help you:
You say you've got it working in all browsers except IE. If so, you might be able to hack IE to get it working as well, using a technique similar to the hacks like HTML5Shiv that are being used to get IE to work with the new HTML5 tags. These use Javascript to create a DOM element with the new tag name, after which IE suddenly starts to recognise that tag as being valid HTML.
It might just work. But be aware that it is a hack, and it only targets IE. And since you're using non-standard tags, you also have no way of knowing what will happen in the future in terms of it breaking browsers, even if they work now. (in fact, I would say the worse case scenario would be if one of the tags you've invented is added to the HTML standard at a later date, because then you'll start getting weird layout glitches as it gets added to the default stylesheet)
If you can get it working that way, then well done. But consider yourself warned that it's not good practice.
What you have actually asked for is not HTML, but XML markup. This is perfectly fine, but shouldn't be put directly into a web page in the way you're hoping.
There are a number of well-documented ways to get raw XML code into a browser.
One option is to use XSL to transform it into valid HTML. Another way would be to load it into a DOM object in Javascript and process it using a script. (this is where the 'X' comes in 'Ajax').
My guess is that a simple XSL transformation would do the trick for you. (In fact, it sounds like your use case might be simple enough that even just basic string replacement might suffice for the same end result). You can get your colleauges to create the code using <whatever> <tags> <they> <want>, and you write a script that parses it and converts it to regular HTML prior to merging it with the rest of the page.
In the long term, this would probably be a far better solution than the hack I've described above.
Hope that helps.
I don't know if this answer fits your needs but imho using custom html tags is basically NOT using HTML. Therefore the absence of compatibilty.
If you need to render data in HTML wouldn't be better using XML + XSLT?
You can find guides on w3schools
You can't add new elements like that. HTML has some fixed elements that browsers understand, but if you add your own, browser don't know what to do with them.
HTML5 has some new elements you can maybe find useful : http://www.w3schools.com/html5/html5_new_elements.asp but this won't work with older browser without some kind of javascript to fix things. For example http://remysharp.com/2009/01/07/html5-enabling-script/
However, if you really want to add new tags, it is possible to do so and then "modify" them via javascript to known tags (actually it's what the html5 enabling script of IE do), but it won't be possible to apply CSS easily to the new tags.
In short, I strongly advise against adding new tags. It's not that hard to understand something like <div class="answer">.
sounds like you want to write XML and convert to HTML using XSLT. This is an old tutorial (includes defining an DTD), but a further web search will garner more results that might suit
here you go fella:
http://net.tutsplus.com/tutorials/html-css-techniques/how-to-make-all-browsers-render-html5-mark-up-correctly-even-ie6/
You need to use createElement :)

How should I format my markup?

When it comes to my markup, I'm anal. It always has to be perfectly indented, easily readable to me, and 100% valid with the W3C. Often time, when viewing the markup of other websites, I'm appalled with the lack of effort by the developer to try to and keep their markup in the browser clean, organized, and valid.
On the flip side, there's a lot of people who will force all their markup on to one, continuous line for the size saving benefits. This annoys me as well, though not to the same extent because it is done with a purpose. But for the most part, it seems like no developer ever actually looks at their markup in the browser and does anything about it.
Understanding that, to the parser in the browser, indents and spaces (usually) don't matter, how should I be handling my markup? Is it worth the extra time to get my markup perfectly easily readable to humans as well as the browser? Are all my \t's and \n's being used in vain?
There are some browsers who has bugs that renders indented well formed html completely wrong. Such as some versions of Internet explorer with tables and images.
Other than that, i try to keep sane indention, I don't spend to much time with it, just enough to make it easy to debug.
Is it worth the extra time to get my markup perfectly easily readable
My answer is no. The arguments:
Whoever tries to look at the code probably will want modify it so, for editing the code you need good code editor with code formatting (e.g. Netbeans). You'll very soon need other features like, syntax coloring.
Some users might prefer other type of formatting than you.
Anyone interested in readable HTML may use Tidy (of Tidy extension to Firefox) to format it.
It's a performance issue too: additional overload of formatting + stripping whitespace (and minifying when possible) will speed up the site. It's very important for sites with high traffic.
It's worth the effort imho since it helps you understand what exactly is going on in your html page, and that's definitely worth something.
If we want to write clean, elegant code in general this means we should want to generate nice, clean elegant html as well, not?
Not sure if this answers your question, but as long as the code is valid by W3C, is structured as intended. As far as your view-ability of the code (like view source) structure, that's really up to you, but I would not add too much clutter (comments etc). Use the correct DOCTYPE for your markup and you should be fine with that. I don't see any reason to "waste" time on making the source code from the browser "book" readable. The view source would only be beneficial to you so you can quickly see what's happening at a glance through source view.
I like to correctly format my markup, and I think it makes it easier to manage when I do.
Then again, I use ASP.NET and a lot of markup is generated through various controls and classes. In this case, I've decided it is not worth trying to track down each mis-aligned markup and see if something can be done to get the associated control to produce the correct result.
In short, nicely formatted markup is worth it if it can be accomplished without a huge effort.
Yes, in my opinion it is worth. It will be easier to maintain, for you and for other collegues, now and in the future.
About the disadvantage of lower performance, why not to develop a well indented and commented source file and to generate a minimized version to run on the server? It can be acheived with a simple series of regex replacements.

Apart from <script> tags, what should I strip to make sure user-entered HTML is safe?

I have an app that reprocesses HTML in order to do nice typography. Now, I want to put it up on the web to let users type in their text. So here's the question: I'm pretty sure that I want to remove the SCRIPT tag, plus closing tags like </form>. But what else should I remove to make it totally safe?
Oh good lord you're screwed.
Take a look at this
Basically, there are so many things you want to strip out. Plus, there's stuff that's valid, but could be used in malicious ways. What if the user wants to set their font size smaller on a footnote? Do you care if that get applied to your entire page? How about setting colors? Now all the words on your page are white on a white background.
I would look into the requirements phase again.
Is a markdown-like alternative possible?
Can you restrict access to the final content, reducing risk of exposure? (meaning, can you set it up so the user only screws themselves, and can't harm other people?)
You should take the white-list rather than the black-list approach: Decide which features are desired, rather than try to block any unwanted feature.
Make a list of desired typographic features that match your application. Note that there is probably no one-size-fits-all list: It depends both on the nature of the site (programming questions? teenagers' blog?) and the nature of the text box (are you leaving a comment or writing an article?). You can take a look at some good and useful text boxes in open source CMSs.
Now you have to chose between your own markup language and HTML. I would chose a markup language. The pros are better security, the cons are incapability to add unexpected internet contents, like youtube videos. A good idea to prevent users' rage is adding an "HTML to my-site" feature that translates the corresponding HTML tags to your markup language, and delete all other tags.
The pros for HTML are consistency with standards, extendability to new contents types and simplicity. The big con is code injection security issues. Should you pick HTML tags, try to adopt some working system for filtering HTML (I think Drupal is doing quite a good job in this case).
Instead of blacklisting some tags, it's always safer to whitelist. See what stackoverflow does: What HTML tags are allowed on Stack Overflow?
There are just too many ways to embed scripts in the markup. javascript: URLs (encoded of course)? CSS behaviors? I don't think you want to go there.
There are plenty of ways that code could be sneaked in - especially watch for situations like <img src="http://nasty/exploit/here.php"> that can feed a <script> tag to your clients, I've seen <script> blocked on sites before, but the tag got right through, which resulted in 30-40 passwords stolen.
<iframe>
<style>
<form>
<object>
<embed>
<bgsound>
Is what I can think of. But to be sure, use a whitelist instead - things like <a>, <img>† that are (mostly) harmless.
† Just make sure that any javascript:... / on*=... are filtered out too... as you can see, it can get quite complicated.
I disagree with person-b. You're forgetting about javascript attributes, like this:
<img src="xyz.jpg" onload="javascript:alert('evil');"/>
Attackers will always be more creative than you when it comes to this. Definitely go with the whitelist approach.
MediaWiki is more permissive than this site; yes, it accepts setting colors (even white on white), margins, indents and absolute positioning (including those that would put the text completely out of screen), null, clippings and "display;none", font sizes (even if they are ridiculously small or excessively large) and font-names (even if this is a legacy non-Unicode Symbol font name that will not render text successfully), as opposed to this site which strips out almost everything.
But MediaWiki successifully strips out the dangerous active scripts from CSS (i.e. the behaviors, the onEvent handlers, the active filters or javascript link targets) without filtering completely the style attribute, and bans a few other active elements like object, embed, bgsound.
Both sits are banning marquees as well (not standard HTML, and needlessly distracting).
But MediaWiki sites are patrolled by lots of users and there are policy rules to ban those users that are abusing repeatedly.
It offers support for animated iamges, and provides support for active extensions, such as to render TeX maths expressions, or other active extensions that have been approved (like timeline), or to create or customize a few forms.

What guidelines for HTML email design are there? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
What guidelines can you give for rich HTML formatting in emails while maintaining good visual stability across many clients and web based email interfaces?
An unrelated answer on a question on Stack Overflow suggested:
http://www.campaignmonitor.com/blog/archives/2008/05/2008_email_design_guidelines.html
Which contains the following guidelines:
Place stylesheet in <body> instead of <head>
Some email clients will strip CSS out of the head, but leave it if the style block is (invalidly) in the body.
Use inline styles where ever possible
Gmail will strip any stylesheet, whether in the <head> or in the <body>, but honor inline styles assigned using the style="" attribute
Return to tables
Email standards have actually taken a giant step backwards in recent years thanks to Outlook 2007 using the Microsoft Word rendering engine. Unlearn most of what you learned about positioning without stylesheets.
Don't rely on images
Most clients and most web based email clients will not display images unless the user specifically requests them to be displayed.
I also have a few "unconfirmed" truths that I don't remember where I read them.
Don't use more than two levels of nesting in tables
Is this true. What is likely to happen if I do? Is there any particular client/clients that choke on this?
Be careful of nesting background images in cells/tables
As I understand you may encounter situations where the background image is applied in the descending table/cell completely anew, and not just "shining through". Again, true or not? Which clients?
I would like to flesh out this list with more guidelines and experiences from the trenches.
Can you offer any further suggestions?
Update: I'm specifially asking for guidelines for the design part in HTML and consistency there of. Questions about general guidelines for avoiding spam filters, and common courtesy are already on SO.
It's actually really hard to make a decent HTML email, if you approach it from a 'modern HTML and CSS' perspective.
For best results, imagine it's 1999.
Go back to tables for layout (or preferably - don't attempt any complex layout)
Be afraid of background images (they break in Outlook 2007 and Gmail).
The style-tag-in-the-body thing is because Hotmail used to accept it that way - I'm pretty sure they strip it out now though. Use inline styles with the style attribute if you must use CSS.
Forget entirely about float
Remember your images will probably be blocked - use background and text colour to your advantage - make sure there is some readable text with images disabled
Be very careful with links, be especially wary of anything that looks like a URL in the link text - you will anger 'phishing' filters (eg www.someotherdomain.tld is bad)
Remember that the "fold" on webmail clients tends to be extremely high up the page (on a 1024x768 screen, most interfaces won't show more than a hundred pixels or so) - get your identity stuff in right at the top so the recipient knows who you are.
Recent version of outlook have a "portrait" preview pane which is significantly narrower than you may be expecting - be very wary of fixed-width layouts, if you must use them, make them as narrow as you can.
Don't even think about flash, Javascript, SVG, canvas, or anything like that.
Test, a lot. Make sure you test in a recent Outlook (things have changed a lot! It now uses Word as its HTML rendering engine, and it's crippled: Word 2007 HTML/CSS support). Gmail is pretty finicky also. Surprisingly Yahoo's webmail is extremely good, with nice CSS support.
Good luck ;)
Update to answer further questions:
Don't use more than two levels of nesting in tables
I believe this is an older guideline pertaining to Lotus Notes. Nested tables should be okay, but really, if you have a layout that's complicated enough to need them, you're probably going to have trouble anyway. Keep your layout simple.
Be careful of nesting background images in cells/tables
This may be related to the above, and the same applies, if you're getting that complicated then you will have problems. Recent versions of Outlook don't support background images at all, so you'd be best advised to forget about them entirely.
Always use multipart mime and provide a plain text alternative.
The folks behind Campaign Monitor also started a Email Standards Project web site with a lot of good information.
Take a look at this boilerplate, it is like html5boilerplate, but for emails:
http://htmlemailboilerplate.com/
I think this is lower level than the question you are asking, but if you really want an html email to be correctly viewed by as many clients as possible, make sure it's using valid MIME. In particular, for an email to be considered as valid MIME, the headers MUST (in the RFC sense of the word) contain both of these headers:
MIME-Version:
Content-Type:
Very strict clients will display your HTML as raw text if one or the other of these is missing. You'd be surprised how many large online vendors who should know better have screwed this up (notably, I've gotten HTML emails w/ missing MIME-Version: headers from Amazon and the ACM in the past)
Background images are not reliable.
Practically a no-brainer, but no javascript.
Use an editor that lets you send the current file/buffer as an email, or at the very least, find a program that would let you send the contents of a file as an HTML email. do not test your emails by copying the HTML, and pasting it into outlook (or any other mail program for that matter).
Three words of advice: test, test, test.
Check out LitmusApp.com's email testing service. You send them a message and they render it in a bunch of clients and show you screenshots of the results. It's not perfect, but it's pretty good.
(Lotus Notes prior to 8.0 really, really stinks for HTML mail, by the way)
Also, beyond just inline CSS styles, I recommend switching to tags wherever possible.
Embed your images, don't link to them.
This is bad :
<img src="http://myserver.com/myImage.jpg" alt="Lolkat"/>
This is good :
<img src=cid:myImage/>
Yeah, it looks weird but check out this guide regarding embedding images in emails.
If you're including a style block don't begin any new line with ".classname" or "." anything. Put a brace or something before the period. If you don't do this some web mail systems will not properly display your style sheets.
Many people have incorrectly assumed they cannot use CSS blocks in emails because of this behavior... IIRC "." is the body delimiter for SMTP. Systems will tend to escape in their mail stores to prevent the contents of one message from being misrecognized as a new message. The way this is handled tends to break any style starting on a new line with a period.